2. Novogene Overview
• Founded in 2011
• Rapid growth each year, now 1000 employees
• Providing high-quality next-generation sequencing and bioinformatics services in the research
and clinical markets
• Currently the largest Illumina customer and the only Illumina IGN partner in China
• The largest sequencing center in China by capacity
• Preparing for IPO
• Revenue recently surpassed BGI's in the China research market
Headquarters in Beijing
4. Founder and Chief Executive
Dr. Ruiqiang Li
• One of the world’s leading experts in genomics and
bioinformatics
• Best known for developing the software SOAP for
ultra-fast sequence mapping, variation detection,
and de novo genome assembly.
• Prior experience
• Vice President of BGI
• Principal Investigator at Peking University &
Peking/Tsinghua Center for Life Sciences
• 70 publications (30 in Nature and Science series)
that are cited over 12,000 times
• PhD in Biology from University of Copenhagen
5. The Development of Novogene
[Bar charts: Employees and Revenue by year, 2011-2014. Plotted values for 2011, 2012, 2013, 2014: 28, 86, 198, 506; axis 0 to 600.]
Both revenue and the number of employees have more than doubled each year since our
founding.
7. The Largest Sequencing Center in China
Platform | Read Length | Q30 (Data Quality Guarantee)
HiSeq X | 2×150 bp | ≥80%
HiSeq 2500/2000 | 2×250 bp |
 | 2×125 bp | ≥85%
 | 1×50 bp | ≥90%
HiSeq 4000 | 2×150 bp | ≥80%
MiSeq | 2×300 bp |
 | 2×250 bp |
NextSeq 500 | 2×75 bp |
 | 1×75 bp |
Total Output/Month: 234 Tb
8. Illumina’s Official Quality Guarantee
Our data quality guarantee exceeds Illumina’s official guarantee.
We are the only company providing this guarantee.
9. The Largest Sequencing Center in China
Platform | Read Length | Novogene Q30 Guarantee | Average Q30 Delivered | Illumina Q30 Guarantee
10 HiSeq X | 2×150 bp | ≥80% | 88.11% | ≥75%
10 HiSeq 2500/2000 | 2×250 bp | ≥80% | 91.22% | ≥80%
 | 2×125 bp | ≥85% | 88.29% | ≥80%
 | 1×50 bp | ≥90% | 96.57% | ≥80%
1 HiSeq 4000 | 2×150 bp | ≥80% | 90.10% | ≥75%
4 MiSeq | 2×300 bp | - | 75.20% | ≥70%
 | 2×250 bp | ≥80% | - | ≥75%
1 NextSeq 500 | 2×75 bp | ≥80% | 90.37% | ≥80%
 | 1×75 bp | ≥80% | 85.58% | ≥80%
Total Output/Month: 234 Tb
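For readers unfamiliar with the metric: Q30 is the share of base calls with a Phred quality score of at least 30, which corresponds to an estimated error probability of at most 1 in 1,000. A minimal Python sketch of how such a percentage could be computed from FASTQ quality strings is given here. It assumes the standard Phred+33 ASCII encoding; the function names and the sample quality strings are illustrative and are not taken from any Novogene or Illumina tool.

```python
def phred_to_error_probability(q):
    """Convert a Phred quality score to an estimated base-call error probability."""
    return 10 ** (-q / 10)


def q30_fraction(quality_strings, offset=33):
    """Return the fraction of bases whose Phred score is >= 30.

    quality_strings: iterable of FASTQ quality lines (assumed Phred+33).
    """
    total = 0
    high_quality = 0
    for qual in quality_strings:
        for ch in qual:
            total += 1
            if ord(ch) - offset >= 30:
                high_quality += 1
    return high_quality / total if total else 0.0


if __name__ == "__main__":
    # Illustrative quality strings: 'I' is Q40, '?' is Q30, '5' is Q20.
    reads = ["IIII5555", "????IIII"]
    print(f"Q30 = {q30_fraction(reads):.2%}")
    print(f"Error probability at Q30 = {phred_to_error_probability(30)}")
```

Under this definition, a guarantee of "Q30 ≥ 80%" means that at least 80% of delivered bases carry a score of 30 or higher.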
10. Our Human Whole Genome Sequencing Service
Service Parameter
Platform | HiSeq X Ten
Read length | 2×150 bp
Turnaround time | 15 working days
Standard analysis | Additional 8 working days
Advanced analysis | Upon request
Data Output and Quality
Different Batch | Flow Cell 1 | Flow Cell 2
Output | 972.0 G | 943.2 G
Q30 | 90.30% | 88.90%
Different Sample | Sample 1 | Sample 2
Raw Data | 105.5 G | 105.8 G
Mapping Ratio | 99.90% | 99.90%
Effective Coverage | 31.7 | 31.8
“I am extremely satisfied with the quality of the WGS results Novogene delivered.”
From customer Justin Loe, CEO of Full Genomes Corporation, Maryland, USA
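The Raw Data and Effective Coverage figures on this slide can be related with simple arithmetic: mean sequencing depth is the total number of sequenced bases divided by the genome size. The sketch below assumes that "G" means gigabases and uses a round human genome size of 3.1 Gb. Both are assumptions made for illustration, not values stated on the slides, and the helper name is hypothetical.

```python
# Assumed haploid human genome size in bases (a round figure, not from the slides).
HUMAN_GENOME_BP = 3.1e9


def mean_depth(total_bases, genome_size=HUMAN_GENOME_BP):
    """Mean sequencing depth: total sequenced bases divided by genome size."""
    return total_bases / genome_size


if __name__ == "__main__":
    # Raw data per sample as listed on the slide, read as gigabases (assumption).
    for name, raw_gb in [("Sample 1", 105.5), ("Sample 2", 105.8)]:
        depth = mean_depth(raw_gb * 1e9)
        print(f"{name}: about {depth:.1f}x raw depth")
```

Under these assumptions, 105.5 G of raw data corresponds to roughly 34x raw depth. The listed effective coverage of 31.7 is somewhat lower, which would be consistent with excluding filtered, unmapped, or duplicate reads, although the slides do not state how effective coverage was calculated.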
11. Human Whole Genome Sequencing (WGS)
Standard Bioinformatics Pipeline of WGS
Raw data → Clean data → Alignment
Variant calls: SNP, InDel, SV, CNV
Annotation
Case Control? Yes: Somatic SNV, Somatic InDel
13. Data Analysis on HPC (High Performance Computing) Platform
DELL Computing Nodes
Memory Size: 17 TB
Computing Power: 73 TFLOPS
Storage: 3.2 PB
Data Type | Data Analysis Capacity / Month
Human Genome | 360 Tb / 4000 samples
Exome, Transcriptome | 40 Tb / 8000 samples
14. Products and Services Overview
Life Science Research Services
• Human whole genome & exome sequencing
• Transcriptome sequencing
• Plant and animal sequencing
• Microbial sequencing
• Bioinformatics analysis
Clinical Genetic Testing (China)
• Cancer genetic testing & risk assessment
• Cancer drug panel
• ctDNA detection
• NIPT
15. Service Portfolio for Global Researchers
Whole Genome Sequencing
• Whole genome re-sequencing
• Whole exome sequencing
• Single-cell DNA sequencing
• Target region sequencing
Transcriptome Sequencing
• mRNA sequencing
• Single-cell RNA sequencing
• lncRNA sequencing
• Whole genome bisulfite sequencing
Microbial Genome Sequencing
• Microbial genome re-sequencing
• Microbial de novo sequencing
• Metagenomic sequencing
• 16S/18S/ITS amplicon sequencing
Animal & Plant Genome Sequencing
• Animal & Plant re-sequencing
• Animal & Plant de novo sequencing
• Pan-genome re-sequencing
• Genotyping by Sequencing
17. Whole Exome Sequencing (WES)
Service Parameter (State-of-the-Art Platform)
Platform | HiSeq 4000
Exome Capture | Agilent SureSelect V6 (58M) / V5
Read length | 2×150 bp (longer reads with 20% more data than PE125)
Turnaround time | 15 working days
Standard analysis | Additional 5 working days
Standard Bioinformatics Pipeline
Raw data → Clean data → Alignment
Variant calls: SNP, InDel
Annotation
Case Control? Yes: Somatic SNP, Somatic InDel
ExAC database, including 17 international exome databases, for free
18. Cancer Panel Solutions
Inherited Susceptibility Gene Screening: NovoCR™
• Individual cancer panels
• Multi-cancer testing
Personalized Cancer Therapy: NovoPM™
• Tissue samples: Standard (47 genes), Professional (483 genes)
• ctDNA: Standard (40 genes), Professional (483 genes)
Notes
• NovoPM detects SNVs, indels, CNVs, fusions, and their relationships with cancer
drugs to guide personalized cancer therapy.
• NovoCR assesses an individual’s risk of developing cancer.
• We also offer a custom panel service based on Agilent capture and HiSeq 4000.
19. RNA Sequencing
Service Parameters
Platform | HiSeq 4000
Read length | 2×150 bp
Turnaround time | 15 working days
Standard analysis | Additional 15 working days
Novogene Advantages
• Sequencing strategy: HiSeq paired-end 150 bp (longer reads)
• Rich experience: over 3,000 customer projects successfully completed
• Bioinformatics analysis: self-developed software (NovoFinder) that aims to find the genes you
need
20. RNA Sequencing Data Analysis
Workflow: Total RNA → mRNA Library → Sequencing → Data QC
Genome Available: Genome Mapping
• Gene Structure: Alternative Splicing, Antisense Transcripts, SNP & InDel, Differential Exon Usage
• Gene Expression: Gene Expression Level, Sample Correlation, Differentially Expressed Genes, GO/KEGG Enrichment
Genome Unavailable: Transcriptome Assembly
• Transcript Sequences, Length Distribution, Function Annotation, SNP & InDel
21. Long Non-coding RNA Sequencing
Service Parameter
Platform | HiSeq 4000
Read length | Paired-end 150 bp
Turnaround time | 40 working days
Standard Bioinformatics Analysis (by an all-PhD team)
Long non-coding RNAs play important regulatory roles. Our service enables
researchers to simultaneously obtain information on mRNAs and lncRNAs.
22. Microbial Genome Sequencing
• Bacterial genome sequencing: Draft map, Fine map, Complete map, Re-sequencing
• Fungal genome sequencing: Draft map, Fine map, Re-sequencing
• Amplicon sequencing: 16S, 18S, ITS
• Small genome sequencing: Fosmid, plasmid, Mitochondria, Chloroplast, Virus
• Meta sequencing: Meta-genomic sequencing, Meta-survey
Platform: HiSeq PE 150
1000+ samples sequenced per month
24. Single Cell Sequencing
Novogene Advantages
• Amplification technology: MALBAC for DNA, SMARTer for RNA
• Rich experience: 2 papers published (Nature & Science)
• Sequencing strategy: HiSeq X for human, HiSeq 2500/4000 for other species
25. Customer Projects Completed in 2014
[Chart: customer projects completed in 2014, by category]
Transcriptome Sequencing: 3122
Human Resequencing: 3181
Microbial Sequencing: 813
Plant & Animal Resequencing: 243
Plant & Animal de novo Sequencing: 32
26. We Understand Science
• >100 published articles with a total impact factor of 649 in just 4 years
• 34 patents in NGS and bioinformatics
• Numerous articles in submission
27. Novogene Case 1: Human preimplantation embryos and embryonic stem cells
Methods: Single Cell lncRNA+mRNA Seq
Sample groups:
• Metaphase II oocyte, zygote, 2-cell, 4-cell, 8-cell, morula, and late blastocyst at hatching stage
• 3-30 biological replicates per group
• 124 cells in total
Method:
• lncRNA+mRNA, HiSeq 2000, PE100
• 20M-60M clean reads; 438 Gb of data in total
Findings:
• 22,687 maternally expressed genes detected, including 8,701 lncRNAs; 9,735 more than detected by microarray
• 2,733 novel lncRNAs discovered, many of them expressed in specific developmental stages
• EPI cells and primary hESC outgrowth have dramatically different transcriptomes, with 1,498 genes showing differential expression
28. Novogene Case 2: Human Single Sperm Cells
Methods: Single Cell Whole Genome Sequencing
Sample: one healthy Asian male in his late 40s
Figure values:
• 99 sperm, MALBAC
• 93 sperm: ~1x, 23% coverage
• 6 sperm: ~5x, 43% coverage
• 70x; 2.8 million SNPs; 1.4 million hetSNPs
Findings:
• 2,368 autosomal crossover events in the sperm cells; 26.6 crossovers per cell on average
• Constructed a genetic map of recombination for the individual
• 5% of sperm were detected as having autosomal aneuploidy
29. Novogene Case 3: Allotetraploid Cotton Genome DNA
Methods: de novo and Transcriptome Sequencing
De novo assembly:
• ~96% of the estimated allotetraploid genome (total scaffold length 2.4/2.5 Gb)
• 265,279 contigs (N50 = 34.0 kb) and 40,407 scaffolds (N50 = 1.6 Mb)
• Total length 2.43 Gb
Sequencing data:
• 245x
• RNA-seq: 97 samples (from different organs, developmental stages, and adverse conditions)
Findings:
• Evolutionary mechanism and function of the A-subgenome and D-subgenome in allotetraploid cotton
• A branch of the MYB gene family plays an important role in fiber development
• Many CESA genes underwent significant positive selection during domestication
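The contig and scaffold N50 values quoted for the cotton assembly follow the usual definition of N50: the length L such that sequences of length L or longer together contain at least half of the total assembled bases. A minimal Python sketch of that calculation is shown here. The function name and the toy lengths are illustrative only and are unrelated to the actual cotton data.

```python
def n50(lengths):
    """Return the N50 of a collection of contig or scaffold lengths.

    N50 is the length of the shortest sequence in the smallest set of
    longest sequences that together cover at least half of the total length.
    """
    if not lengths:
        raise ValueError("lengths must not be empty")
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length


if __name__ == "__main__":
    # Toy contig lengths for illustration (not from the cotton assembly).
    toy_contigs = [2, 2, 2, 3, 3, 4, 8, 8]
    print("N50 =", n50(toy_contigs))
```

A higher N50 indicates that more of the assembly sits in long sequences, which is why scaffolding raises the figure from the contig level to the scaffold level.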
31. Pan-genome Sequencing
Brief Introduction
Applying next-generation sequencing and state-of-the-art assembly algorithms makes
the construction of pan-genome maps feasible. Constructing genome maps for several
individuals provides unprecedented opportunities to investigate detailed genetic
diversity at the population level.
[Figure: example sequence ATGCTACGGTAACCCTGATTGCAATG repeated across several individuals]
32. Key Points
The pan-genome is a superset of all the genes in all the strains of a species:
• Core Genome: genes present in all strains
• Dispensable Genome: genes present in two or more strains, but not all
• Specific Genes: genes specific to a single strain
[Figure: diagram of the core genome, dispensable genome, and specific genes]
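The three categories can be illustrated with a small Python sketch that classifies genes by how many strains carry them. It is a simplified illustration under stated assumptions: it takes gene presence/absence per strain as already known, treats "dispensable" as present in two or more strains but not all, and uses hypothetical strain and gene names. It is not the pipeline used for the service.

```python
from collections import Counter


def classify_pan_genome(strain_genes):
    """Split genes into core, dispensable, and strain-specific sets.

    strain_genes: dict mapping a strain name to an iterable of gene names.
    """
    n_strains = len(strain_genes)
    # Count in how many strains each gene occurs (each strain counted once).
    counts = Counter(
        gene for genes in strain_genes.values() for gene in set(genes)
    )
    result = {"core": set(), "dispensable": set(), "specific": set()}
    for gene, count in counts.items():
        if count == n_strains:
            result["core"].add(gene)         # present in all strains
        elif count >= 2:
            result["dispensable"].add(gene)  # present in some, but not all
        else:
            result["specific"].add(gene)     # present in a single strain
    return result


if __name__ == "__main__":
    # Hypothetical strains and genes, for illustration only.
    strains = {
        "strain_A": ["g1", "g2", "g3"],
        "strain_B": ["g1", "g2", "g4"],
        "strain_C": ["g1", "g5"],
    }
    for category, genes in classify_pan_genome(strains).items():
        print(category, sorted(genes))
```

The pan-genome itself is the union of the three sets.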
33. Pan-Genome Sequencing
Workflow:
Material selection and QC → Library Construction → Preliminary genome assembly → Pan-genome Construction → Customized bioinformatics analysis
Analyses:
• Gene annotation
• Comparative genomics analysis
• SNP/InDel/SV/CNV/novel sequence
• Gene family analysis
• Phylogenetic analysis
• Co-linear analysis
Sequencing parameters:
• Simple genome: 60X
• Complex genome: 100X
• Libraries: 230 bp / 500 bp / 2K / 5K