Mathu Malar C., Jennifer Yuzon, Takao Kasuga and Sucheta Tripathy
UC Davis, CA, USA.
CSIR-Indian Institute of Chemical Biology, Kolkata, India.
Background
- Phytophthora ramorum is a highly destructive pathogen with a wide host range. It causes Sudden Oak Death in western North America and Sudden Larch Death in the UK.
- P. ramorum was first reported in 1995, and the origin of the pathogen is still unclear.
- P. ramorum can spread over several miles in mists, air currents, watercourses and rain splash. Phytophthora pathogens are also known to spread on footwear, dogs' paws, bicycle wheels, tools and equipment.
Parke, J. L., and S. Lucas. 2008. Sudden oak death and ramorum blight. The Plant Health Instructor. DOI: 10.1094/PHI-I-2008-0227-01
https://sites.google.com/site/phytophthoragenomicslab/home
For strain Pr102
Platform | No. of reads generated | Total reads used for assembly | Organism | Read coverage
PacBio | 435399 | 33% and 47% | Pr102 | 25X
Illumina | 20942377 | 20942377 (100%) | Pr102 | 10X

For strain ND886
Platform | No. of reads generated | Total reads used for assembly | Organism | Read coverage
PacBio | 402170 | 285487 (70%) | ND886 | 50X
Illumina | 43676830 | 43676830 (100%) | ND886 | 50X
- V1 Assembly (Tyler BM et al., 2006) by Sanger sequencing: 65 Mb, genome coverage 7.7X, total gaps 12 Mb.
- V2 Assembly (September 2015)
- V3 Assembly (December 2015)
- V4 Assembly (March 2016)
- V5 Assembly (April 2016)
Improved 3-way error correction protocol
PacBio Pr102: 435399 raw reads
1. ECTools with Sanger unitigs from the 2006 phyra V1 assembly
   Corrected: 147429 reads (33%)
   Uncorrected: 287970 reads (67%)
2. ECTools with a mock intermediate assembly (Illumina reads + 6K and 20K simulated libraries derived from the V1 unitigs using ALLPATHS)
   Corrected: 1418 reads (0.49%)
   Uncorrected: 286552 reads (66.50%)
3. PBCR auto error-correction assembly used as input to ECTools for error correction
   Corrected: 57640 reads (13.2%)
   Uncorrected: 228912 reads (52%)
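The read counts from the three correction passes can be tallied to see how much each pass adds. The sketch below is only an illustration of that arithmetic: the counts are the ones reported on the slide, while the dictionary labels and the function name are hypothetical. The running total after the third pass agrees with the "Total error corrected reads 206487" figure quoted for the V4 assembly.

```python
# Read counts as reported on the slide for strain Pr102.
RAW_READS = 435_399
corrected_per_pass = {
    "ECTools with Sanger unitigs (V1)": 147_429,
    "ECTools with mock intermediate assembly": 1_418,
    "ECTools with PBCR auto-corrected assembly": 57_640,
}


def cumulative_correction(raw_reads, per_pass):
    """Return (label, running total, percent of raw reads) for each pass."""
    rows = []
    total = 0
    for label, newly_corrected in per_pass.items():
        total += newly_corrected
        rows.append((label, total, 100.0 * total / raw_reads))
    return rows


rows = cumulative_correction(RAW_READS, corrected_per_pass)
for label, total, pct in rows:
    print(f"{label}: {total} reads corrected so far ({pct:.1f}% of raw reads)")
```

Each pass is run only on reads the previous pass left uncorrected, which is why the counts can simply be summed.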
An overview of assemblers and tools used in this study
Tool | Input type | Function
ECTools | PacBio reads with a reference dataset (unitigs) for read error correction | Corrects errors in PacBio reads
PBCR (PacBioToCA) | PacBio reads | Error correction and assembly
Canu | PacBio reads | Successor of the PBCR assembler
SSPACE (stand-alone scaffolder of pre-assembled contigs using paired-read data) | Pre-assembled contigs, short reads (paired end and mate pair) | Not a de novo assembler; used for scaffolding and extending contigs
SSPACE Long Reads | Pre-assembled contigs and long reads (here, the PacBio reads) | Successor of SSPACE; performs better on a case-by-case basis
Dedupe | Sequence reads | Removes PCR duplicates and identical sequences prior to mapping
Redundans | Hybrid datasets | Recently developed (2016); specifically effective for heterozygous genomes
Improved error-corrected reads (49%)
Illumina reads
Dedupe
Redundans
2325 scaffolds, 76 Mb, largest = 781884, N50 = 65030
Canu
Largest scaffold = 655506, smallest = 3055, total scaffolds = 920, N50 = 116386, size = 61 Mb
V3 Assembly
Celera
minimus
SSPACE
SSPACE Long Reads
1114 scaffolds, largest = 886281, smallest = 15009, total length = 79285078
Previous error-corrected protocol (33%)
V2 Assembly
Other Assembly Protocols
minimus
SSPACE
SSPACE Long Reads
SSPACE
SSPACE Long Reads
Improved error-corrected reads (49%)
V4 Assembly
Total error-corrected reads: 206487
Celera assembly with length cutoff 10k (2735 contigs, 77 Mb)
Input data for Redundans
Library | No. of reads | Read length
Illumina | R1 = 10157419, R2 = 10784958 | varies from 50 nt to 100 nt
V1 unitigs, MP 20k | R1 = 28379, R2 = 28379 | 101
PacBio corrected, MP 10k | R1 = 5234, R2 = 5234 | 150
PacBio corrected, MP 6k | R1 = 59180, R2 = 59180 | 101
V1 unitigs (2006 assembly) | 7589 (unitigs) | variable
Protocol for V5 assembly
Redundans assembly: 65 Mb, 1825 scaffolds, N50 = 76861, largest = 781884, smallest = 2000
Comparison with Phyra unitigs using MUMmer; CAP3 on unmapped sequences from the V1 unitigs, appended back to the assembly
No. of scaffolds = 2005, largest scaffold = 781884, smallest scaffold = 2000, N50 = 76032, total length = 67996746, gaps = 220 bases
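N50 and gap counts are the main statistics used to compare the assemblies here. As a reminder of what they measure, the following is a minimal sketch using the common definitions: N50 is the length of the scaffold at which the running sum of lengths, taken from longest to shortest, first reaches half of the total assembly length, and gaps are counted as N characters. The slides do not state how the reported values were computed, so both definitions are assumptions, and the toy scaffolds are made up for illustration.

```python
def n50(lengths):
    """Length L such that scaffolds of length >= L cover at least half the assembly."""
    ordered = sorted(lengths, reverse=True)
    half = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half:
            return length
    raise ValueError("no scaffold lengths given")


def gap_bases(sequences):
    """Count gap bases, assuming gaps are written as runs of N (either case)."""
    return sum(seq.upper().count("N") for seq in sequences)


# Toy scaffolds (hypothetical, not taken from any P. ramorum assembly).
toy_scaffolds = ["ACGT" * 5, "ACGTNNACGT", "ACGTAC", "ACG"]
toy_lengths = [len(seq) for seq in toy_scaffolds]
print("N50:", n50(toy_lengths))
print("Gap bases:", gap_bases(toy_scaffolds))
```

Because N50 depends only on the length distribution, an assembly can gain N50 through scaffolding while also gaining gap bases, so the two numbers are best read together.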
[Figure: Assembly sizes in Mb, V1 to V5]
[Figure: Number of scaffolds, V1 to V5]
[Figure: Gaps in nucleotides, V1 to V5]
Assembly validation using QUAST
[Figure: Gaps filled in the new PacBio version of the genome assembly; categories shown: gaps filled, scaffolds broken, mis-assemblies]
Gene prediction statistics using Augustus and mapping statistics
[Figure: Number of genes, average gene length and largest gene, V1 to V5]
Repeat Regions captured in the genome
[Figure: Bases masked, total interspersed repeats and LTR / Gypsy elements as a percentage of the genome, V1 to V5]
Assembly version | No. of core proteins (of 248 completely, highly conserved CEGs) | Unique gene | % completeness | Core genes present in genome (of 458)
V1 | 236 | KOG0948 Nuclear exosomal RNA helicase MTR4 | 95.16 | 412
V2 | 237 | KOG0434 Isoleucyl-tRNA synthetase | 95.56 | 412
V3 | 236 | KOG0734 AAA+-type ATPase containing the peptide | 95.16 | 413
V4 | 237 | KOG2311 NAD/FAD-utilizing protein | 95.56 | 416
V5 | 238 | KOG1158 NADP/FAD dependent oxidoreductase | 95.97 | 414
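The completeness column follows directly from the core protein counts: it is the number of core proteins found divided by the 248 highly conserved CEGs. A short sketch of that calculation, using the counts from the table (the variable and function names are hypothetical):

```python
CEG_TOTAL = 248  # completely, highly conserved core eukaryotic genes

# Core proteins found per assembly version, as reported in the table.
core_found = {"V1": 236, "V2": 237, "V3": 236, "V4": 237, "V5": 238}


def completeness(found, total=CEG_TOTAL):
    """Percent of the core gene set recovered, rounded to two decimals."""
    return round(100.0 * found / total, 2)


percent = {version: completeness(n) for version, n in core_found.items()}
for version, value in percent.items():
    print(f"{version}: {value}%")
```

A difference of one core protein moves the figure by about 0.4 percentage points, so the five assemblies differ by at most two core proteins on this measure.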
Effector Prediction Pipeline
1. V5 assembly
2. SignalP-predicted protein sequences (7159)
3. Removed proteins with transmembrane domains
4. RXLR motifs at the N terminus (373 sequences)
5. Motif prediction with MEME (W, Y domains): 343 sequences were detected in MEME
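The RXLR step of the pipeline can be illustrated with a simple pattern search. This is a minimal sketch and not the authors' method: the slides do not say which tool, pattern or window was used, so the `R.LR` regular expression, the 100-residue N-terminal window and the toy protein sequences are all assumptions made for illustration. It also assumes the input proteins have already passed the signal peptide and transmembrane filters.

```python
import re

# Assumption: the whole RxLR motif must lie within this many N-terminal residues.
N_TERMINAL_WINDOW = 100
# R, any residue, L, R.
RXLR = re.compile(r"R.LR")


def has_n_terminal_rxlr(protein, window=N_TERMINAL_WINDOW):
    """True if an RxLR motif occurs within the first `window` residues."""
    return RXLR.search(protein[:window].upper()) is not None


def filter_rxlr(proteins):
    """Keep only the proteins (name -> sequence) with an N-terminal RxLR motif."""
    return {name: seq for name, seq in proteins.items() if has_n_terminal_rxlr(seq)}


# Toy secreted proteins (hypothetical sequences, not from P. ramorum).
candidates = {
    "toy_effector": "MKLSAVLLATAAASSRSLRAEEERGGG",
    "toy_non_effector": "MKLSAVLLATAAAGGGGGGG",
    "toy_late_motif": "M" + "A" * 120 + "RSLR",
}
print(sorted(filter_rxlr(candidates)))
```

A plain regular expression is permissive, so in practice a search like this would be one filter among several, as in the pipeline above where MEME is used afterwards to look for W and Y domains.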
ND886 genome assembly
ND886 error correction and read statistics
PacBio raw reads: 402170
ECTools with Sanger unitigs (V1 assembly)
Corrected: 285487 reads (70.9%)
Uncorrected: 29.1%
ND886 assembly
Total error-corrected reads: 285487
Celera Assembly
Minimus
Dedupe
Read statistics
Library | No. of reads | Read length
Illumina | R1 = 28389986, R2 = 28334221 | varies from 50 nt to 100 nt
PacBio corrected, MP 10k | R1 = 91555, R2 = 91555 | 101
PacBio corrected, MP 6k | R1 = 131703, R2 = 131703 | 101
SSPACE (with Illumina reads): total contigs = 6443, largest contig = 648889, smallest contig = 2098, assembly size = 150 Mb
Redundans: No. of scaffolds = 2225, largest = 648906, smallest = 2745, N50 = 48161, total length = 92877686, gaps = 4133
Assembly | No. of core proteins (of 248) | % completeness | No. of core genes (of 458)
ND886 | 234 | 94.35 | 410
Comparison of ND886 against Pr102 2006 assembly
[Figure: alignment plot with axes P. ramorum ND886 and P. ramorum Pr102 (2006)]
- De novo assemblers alone are not enough for a good genome assembly.
- PacBio reads are error-prone, and a single error correction protocol does not always produce the best result.
- Hybrid assembly combined with a scaffolder and duplicate removers is effective for assembly.
- No single protocol worked best for both genomes; the steps had to be mixed and matched.
- Assembly improvement does not necessarily change the gene space; rather, it improves repetitive regions and corrects mis-assemblies.
Repeat Regions captured in the genome
Category | V1 | V2 | V3 | V4 | V5
Bases masked | 11.77% | 20.68% | 27.00% | 21.06% | 24.34%
Small RNA | 0.01% | 0.00% | 0.45% | 0.11% | 0.07%
Simple repeats | 0.36% | 0.44% | 0.75% | 0.53% | 0.49%
Low complexity | 0.03% | 0.04% | 0.12% | 0.05% | 0.06%
GC content | 53.86% | 54.32% | 52.40% | 54.09% | 53.98%
Total interspersed repeats | 11.37% | 20.20% | 25.70% | 20.37% | 23.73%
LINE [R2/R4/NeSL] | 0.13% | 0.32% | 0.39% | 0.29% | 0.37%
Ty1/Copia, Gypsy/DIRS1 LTR elements | 10.01% | 5.64% | 23.85% | 18.70% | 21.89%
DNA transposon | 1.23% | 1.48% | 1.46% | 1.38% | 1.47%
PiggyBac | 0.16% | 0.16% | 0.17% | 0.17% | 0.15%
Tourist/Harbinger | 0.01% | 0.01% | 0.01% | 0.01% | 0.01%
- Long reads, ranging from 14,000 to 48,000 base pairs, longer than Sanger and NGS reads.
- Shortest run time (30 min).
- Least GC bias.
- No amplification bias.
- Handles highly repetitive genomes and can fill gaps efficiently.
Reference: http://www.pacificbiosciences.com/products/smrt-technology/smrt-sequencing-advantage/
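Repeat Regions captured in the genome
Each entry gives the number of elements (where reported), the length occupied in bp and the percentage of the assembly.

P. ramorum 2006
Bases masked: 7847064 bp (11.77%)
Small RNA: 11 (6033 bp), 0.01%
Simple repeats: 5336 (242077 bp), 0.36%
Low complexity: 422 (20747 bp), 0.03%
GC content: 53.86%
Total interspersed repeats: 7580618 bp (11.37%)
LINE [R2/R4/NeSL]: 53 (88470 bp), 0.13%
Ty1/Copia, Gypsy/DIRS1 LTR elements: 5972 (6669143 bp), 10.01%
DNA transposon: 1174 (823005 bp), 1.23%
PiggyBac: 200 (104977 bp), 0.16%
Tourist/Harbinger: 12 (5609 bp), 0.01%

Protocol 1b
Bases masked: 16553511 bp (24.34%)
Small RNA: 75 (49453 bp), 0.07%
Simple repeats: 7122 (331100 bp), 0.49%
Low complexity: 816 (40373 bp), 0.06%
GC content: 53.98%
Total interspersed repeats: 16138229 bp (23.73%)
LINE [R2/R4/NeSL]: 87 (250632 bp), 0.37%
Ty1/Copia, Gypsy/DIRS1 LTR elements: 8822 (14885437 bp), 21.89%
DNA transposon: 1419 (1002160 bp), 1.47%
PiggyBac: 198 (104684 bp), 0.15%
Tourist/Harbinger: 13 (5809 bp), 0.01%

Protocol 2
Bases masked: 21185972 bp (27.00%)
Small RNA: 605 (349328 bp), 0.45%
Simple repeats: 11702 (586604 bp), 0.75%
Low complexity: 1787 (91389 bp), 0.12%
GC content: 52.40%
Total interspersed repeats: 20163370 bp (25.70%)
LINE [R2/R4/NeSL]: 112 (308607 bp), 0.39%
Ty1/Copia, Gypsy/DIRS1 LTR elements: 11127 (18710327 bp), 23.85%
DNA transposon: 1756 (1144436 bp), 1.46%
PiggyBac: 231 (129697 bp), 0.17%
Tourist/Harbinger: 12 (5417 bp), 0.01%

Protocol 3
Bases masked: 12854764 bp (21.06%)
Small RNA: 64 (69255 bp), 0.11%
Simple repeats: 6801 (323105 bp), 0.53%
Low complexity: 679 (33221 bp), 0.05%
GC content: 54.09%
Total interspersed repeats: 12434182 bp (20.37%)
LINE [R2/R4/NeSL]: 64 (176881 bp), 0.29%
Ty1/Copia, Gypsy/DIRS1 LTR elements: 6752 (11415393 bp), 18.70%
DNA transposon: 1133 (841908 bp), 1.38%
PiggyBac: 191 (105824 bp), 0.17%
Tourist/Harbinger: 12 (6211 bp), 0.01%

Bangalore meeting
Bases masked: 16192690 bp (20.68%)
Small RNA: 8 (3317 bp), 0.00%
Simple repeats: 7549 (340933 bp), 0.44%
Low complexity: 699 (33372 bp), 0.04%
GC content: 54.32%
Total interspersed repeats: 15819353 bp (20.20%)
LINE [R2/R4/NeSL]: 92 (250092 bp), 0.32%
Ty1/Copia, Gypsy/DIRS1 LTR elements: 2498 (4413567 bp), 5.64%
DNA transposon: 1560 (1155118 bp), 1.48%
PiggyBac: 228 (126831 bp), 0.16%
Tourist/Harbinger: 15 (6376 bp), 0.01%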
Error correction
Assembly | No. of genes predicted | Average gene length | Largest gene | Mapping with V1 assembly | Mapping with V2 assembly | Mapping with V3 assembly | Mapping with V4 assembly | Mapping with V5 assembly
V1 | 16134 | 1673 | 21479 | NA | 15978 | 15645 | 15855 | 16072
V2 | 20741 | 2162.78 | 31832 | 20739 | NA | 20377 | 20519 | 20675
V3 | 15110 | 2005.05 | 46572 | 15055 | 15019 | NA | 14990 | 15073
V4 | 17311 | 1821.26 | 47518 | 17307 | 17245 | 16906 | NA | 17277
V5 | 19278 | 1829.68 | 31832 | 19273 | 19167 | 18861 | 19051 | NA
[Figure: NUCmer and PROmer alignment plots for the Pr102 2016 PacBio assembly]

Vishram Singh - Textbook of Anatomy Upper Limb and Thorax.. Volume 1 (1).pdf
 
Explore beautiful and ugly buildings. Mathematics helps us create beautiful d...
Explore beautiful and ugly buildings. Mathematics helps us create beautiful d...Explore beautiful and ugly buildings. Mathematics helps us create beautiful d...
Explore beautiful and ugly buildings. Mathematics helps us create beautiful d...
 
Google Gemini An AI Revolution in Education.pptx
Google Gemini An AI Revolution in Education.pptxGoogle Gemini An AI Revolution in Education.pptx
Google Gemini An AI Revolution in Education.pptx
 
Making communications land - Are they received and understood as intended? we...
Making communications land - Are they received and understood as intended? we...Making communications land - Are they received and understood as intended? we...
Making communications land - Are they received and understood as intended? we...
 
Key note speaker Neum_Admir Softic_ENG.pdf
Key note speaker Neum_Admir Softic_ENG.pdfKey note speaker Neum_Admir Softic_ENG.pdf
Key note speaker Neum_Admir Softic_ENG.pdf
 
UGC NET Paper 1 Mathematical Reasoning & Aptitude.pdf
UGC NET Paper 1 Mathematical Reasoning & Aptitude.pdfUGC NET Paper 1 Mathematical Reasoning & Aptitude.pdf
UGC NET Paper 1 Mathematical Reasoning & Aptitude.pdf
 
Unit-IV- Pharma. Marketing Channels.pptx
Unit-IV- Pharma. Marketing Channels.pptxUnit-IV- Pharma. Marketing Channels.pptx
Unit-IV- Pharma. Marketing Channels.pptx
 
ComPTIA Overview | Comptia Security+ Book SY0-701
ComPTIA Overview | Comptia Security+ Book SY0-701ComPTIA Overview | Comptia Security+ Book SY0-701
ComPTIA Overview | Comptia Security+ Book SY0-701
 
How to Give a Domain for a Field in Odoo 17
How to Give a Domain for a Field in Odoo 17How to Give a Domain for a Field in Odoo 17
How to Give a Domain for a Field in Odoo 17
 
FSB Advising Checklist - Orientation 2024
FSB Advising Checklist - Orientation 2024FSB Advising Checklist - Orientation 2024
FSB Advising Checklist - Orientation 2024
 
Introduction to Nonprofit Accounting: The Basics
Introduction to Nonprofit Accounting: The BasicsIntroduction to Nonprofit Accounting: The Basics
Introduction to Nonprofit Accounting: The Basics
 
ICT Role in 21st Century Education & its Challenges.pptx
ICT Role in 21st Century Education & its Challenges.pptxICT Role in 21st Century Education & its Challenges.pptx
ICT Role in 21st Century Education & its Challenges.pptx
 
Food safety_Challenges food safety laboratories_.pdf
Food safety_Challenges food safety laboratories_.pdfFood safety_Challenges food safety laboratories_.pdf
Food safety_Challenges food safety laboratories_.pdf
 
Basic Civil Engineering first year Notes- Chapter 4 Building.pptx
Basic Civil Engineering first year Notes- Chapter 4 Building.pptxBasic Civil Engineering first year Notes- Chapter 4 Building.pptx
Basic Civil Engineering first year Notes- Chapter 4 Building.pptx
 
How to Create and Manage Wizard in Odoo 17
How to Create and Manage Wizard in Odoo 17How to Create and Manage Wizard in Odoo 17
How to Create and Manage Wizard in Odoo 17
 
Activity 01 - Artificial Culture (1).pdf
Activity 01 - Artificial Culture (1).pdfActivity 01 - Artificial Culture (1).pdf
Activity 01 - Artificial Culture (1).pdf
 
Micro-Scholarship, What it is, How can it help me.pdf
Micro-Scholarship, What it is, How can it help me.pdfMicro-Scholarship, What it is, How can it help me.pdf
Micro-Scholarship, What it is, How can it help me.pdf
 
Jual Obat Aborsi Hongkong ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Hongkong ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...Jual Obat Aborsi Hongkong ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Hongkong ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
 
Spellings Wk 3 English CAPS CARES Please Practise
Spellings Wk 3 English CAPS CARES Please PractiseSpellings Wk 3 English CAPS CARES Please Practise
Spellings Wk 3 English CAPS CARES Please Practise
 

Ramorum2016 final

  • 1. Mathu Malar C., Jennifer Yuzon, Takao Kasuga and Sucheta Tripathy. UC Davis, CA, USA; CSIR-Indian Institute of Chemical Biology, Kolkata, India.
  • 2. Background: Phytophthora ramorum is a highly destructive pathogen with a wide host range that causes Sudden Oak Death in western North America and Sudden Larch Death in the UK. P. ramorum was first reported in 1995, and the origin of the pathogen is still unclear. P. ramorum can be spread over several miles in mists, air currents, watercourses and rain splash. Phytophthora pathogens are also known to spread on footwear, dogs’ paws, bicycle wheels, tools and equipment. Reference: Parke, J. L., and S. Lucas. 2008. Sudden oak death and ramorum blight. The Plant Health Instructor. DOI: 10.1094/PHI-I-2008-0227-01 https://sites.google.com/site/phytophthoragenomicslab/home
  • 3. For strain Pr102:
      Platform | No. of reads generated | Total reads used for assembly | Organism | Read coverage
      PacBio | 435399 | 33% and 47% | Pr102 | 25X
      Illumina | 20942377 | 20942377 (100%) | Pr102 | 10X
      For ND886:
      Platform | No. of reads generated | Total reads used for assembly | Organism | Read coverage
      PacBio | 402170 | 285487 (70%) | ND886 | 50X
      Illumina | 43676830 | 43676830 (100%) | ND886 | 50X
  • 4. Assembly versions:
      V1 Assembly (Tyler BM et al., 2006): Sanger sequencing, 65 Mb, genome coverage 7.7X, total gaps 12 Mb.
      V2 Assembly (September 2015)
      V3 Assembly (December 2015)
      V4 Assembly (March 2016)
      V5 Assembly (April 2016)
  • 5. Improved 3-way error correction protocol. Input: PacBio Pr102, 435399 raw reads.
      1. ECTools with Sanger unitigs from the 2006 Phyra V1 assembly: corrected 147429 reads (33%); uncorrected 287970 reads (67%).
      2. ECTools with a mock intermediate assembly (Illumina reads + V1 unitig-derived 6K and 20K simulated libraries using ALLPATHS): corrected 1418 reads (0.49%); uncorrected 286552 reads (66.50%).
      3. PBCR auto error correction assembly used as input to ECTools for error correction: corrected 57640 reads (13.2%); uncorrected 228912 reads (52%).
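The read counts on this slide can be cross-checked with a small tally. The sketch below is only an illustration: it assumes each pass is applied to the reads left uncorrected by the previous pass (which is what the reported counts suggest), and the function and variable names are hypothetical.

```python
# Read counts as reported on the slide for strain Pr102.
RAW_READS = 435399
CORRECTED_PER_PASS = {
    "ECTools + Sanger unitigs (V1)": 147429,
    "ECTools + mock intermediate assembly": 1418,
    "PBCR-corrected input to ECTools": 57640,
}


def tally(raw, corrected_per_pass):
    """Return (pass name, corrected, still uncorrected) for successive passes.

    Assumption: every pass only sees reads the earlier passes left uncorrected.
    """
    remaining = raw
    rows = []
    for name, corrected in corrected_per_pass.items():
        remaining -= corrected
        rows.append((name, corrected, remaining))
    return rows


rows = tally(RAW_READS, CORRECTED_PER_PASS)
total_corrected = sum(corrected for _, corrected, _ in rows)
percent_corrected = round(100 * total_corrected / RAW_READS, 1)

for name, corrected, remaining in rows:
    print(f"{name}: corrected {corrected}, uncorrected {remaining}")
print(f"total corrected: {total_corrected} ({percent_corrected}% of raw reads)")
```

Under that assumption the running "uncorrected" counts reproduce the slide's 287970, 286552 and 228912, and the three passes together account for 206487 corrected reads, about 47% of the raw reads.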
  • 6. An overview of assemblers and tools used in this study
      Tool | Input type | Function
      ECTools | PacBio reads with a reference dataset (unitigs) for read error correction | Corrects errors in PacBio reads
      PBCR (PacBioToCA) | PacBio reads | Error correction and assembly
      Canu | PacBio reads | Successor of the PBCR assembler
      SSPACE (stand-alone scaffolder of pre-assembled contigs using paired-read data) | Pre-assembled contigs, short reads (paired end and mate pair) | Not a de novo assembler; used for scaffolding and extending contigs
      SSPACE Long Reads | Pre-assembled contigs and long (PacBio) reads | Successor of SSPACE; performs better on a case-by-case basis
      Dedupe | Sequence reads | Removes PCR duplicates and identical sequences prior to mapping
      Redundans | Hybrid datasets | Recently developed (2016); specifically effective for heterozygous genomes
  • 7. Assembly protocols (flow chart): Improved error-corrected reads (49%); Illumina reads; Dedupe; Redundans: 2325 scaffolds, 76 Mb, largest = 781884, N50 = 65030; Canu: largest scaffold = 655506, smallest = 3055, total scaffolds = 920, N50 = 116386, size = 61 Mb; V3 Assembly; Celera; minimus; SSPACE; SSPACE Long Reads: 1114 scaffolds, largest = 886281, smallest = 15009, total length = 79285078; Previous error-corrected protocol (33%); V2 Assembly; Other assembly protocols; minimus; SSPACE; SSPACE Long Reads; SSPACE; SSPACE Long Reads; Improved error-corrected reads (49%); V4 Assembly.
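The N50 values quoted for these assemblies follow the usual definition: the length of the scaffold at which the running total, taken from the longest scaffold downwards, first reaches half of the assembly size. A minimal sketch of that calculation follows; the toy lengths are made up for illustration and are not taken from any assembly in this talk.

```python
def n50(lengths):
    """Return the N50 of a collection of contig or scaffold lengths."""
    if not lengths:
        raise ValueError("need at least one sequence length")
    half_total = sum(lengths) / 2
    running = 0
    # Walk from the longest sequence down until half the assembly is covered.
    for length in sorted(lengths, reverse=True):
        running += length
        if running >= half_total:
            return length


# Hypothetical toy data: total = 54, so half = 27, reached at 10 + 9 + 8.
toy_lengths = [2, 3, 4, 5, 6, 7, 8, 9, 10]
print(n50(toy_lengths))
```

A higher N50 at a similar total length generally indicates a more contiguous assembly, which is why it is reported next to scaffold counts and largest-scaffold sizes.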
  • 8. Protocol for V5 assembly
      1. Total error-corrected reads: 206487.
      2. Celera assembly with length cutoff 10k (2735 contigs, 77 Mb).
      3. Redundans assembly: 65 Mb, 1825 scaffolds, N50 = 76861, largest = 781884, smallest = 2000.
      4. Comparison with Phyra unitigs using MUMmer.
      5. CAP3 on unmapped sequences from V1 unitigs, appended back to the assembly: no. of scaffolds = 2005, largest scaffold = 781884, smallest scaffold = 2000, N50 = 76032, total length = 67996746, gaps = 220 bases.
      Input data for Redundans:
      Library | No. of reads | Read length
      Illumina | R1 = 10157419, R2 = 10784958 | varies from 50 nt to 100 nt
      V1 unitigs MP 20k | R1 = 28379, R2 = 28379 | 101
      PacBio corrected MP 10k | R1 = 5234, R2 = 5234 | 150
      PacBio corrected MP 6k | R1 = 59180, R2 = 59180 | 101
      V1 unitigs (2006 assembly) | 7589 (unitigs) | variable
  • 9. [Figure: bar charts for V1 to V5 showing assembly sizes in Mb, number of scaffolds, and gaps in nucleotides.]
  • 10. Gaps filled in the new PacBio-based genome assembly. [Figure labels: gaps filled, scaffolds broken, mis-assemblies.]
  • 12. Gene prediction statistics using Augustus and mapping statistics. [Figure: bar charts for V1 to V5 showing number of genes, average gene length, and largest gene.]
  • 13. Repeat regions captured in the genome. [Figure: percentages for V1 to V5 of bases masked, total interspersed repeats, and LTR/Gypsy elements.]
  • 14. [Figure with panels labelled V1, V2, V3, V4 and V5.]
  • 15. Core gene completeness
      Assembly version | No. of core proteins (of 248 complete, highly conserved CEGs) | Unique gene | % completeness | Core genes present in genome (out of 458)
      V1 | 236 | KOG0948 Nuclear exosomal RNA helicase MTR4 | 95.16 | 412
      V2 | 237 | KOG0434 Isoleucyl-tRNA synthetase | 95.56 | 412
      V3 | 236 | KOG0734 AAA+-type ATPase containing the peptide | 95.16 | 413
      V4 | 237 | KOG2311 NAD/FAD-utilizing protein | 95.56 | 416
      V5 | 238 | KOG1158 NADP/FAD dependent oxidoreductase | 95.97 | 414
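The "% completeness" column is consistent with the number of core proteins found divided by the 248 conserved CEGs. The short sketch below recomputes it from the counts on the slide; the dictionary and function names are illustrative only.

```python
TOTAL_CEGS = 248  # size of the highly conserved core set named on the slide

# Core proteins found per assembly version, as reported on the slide.
CORE_PROTEINS_FOUND = {"V1": 236, "V2": 237, "V3": 236, "V4": 237, "V5": 238}


def completeness(found, total=TOTAL_CEGS):
    """Percentage of the core set recovered, rounded to two decimals."""
    return round(100 * found / total, 2)


for version, found in CORE_PROTEINS_FOUND.items():
    print(f"{version}: {found}/{TOTAL_CEGS} = {completeness(found)}%")
```

The same formula gives 94.35% for the 234 core proteins reported for ND886.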
  • 16. Effector prediction pipeline (V5 assembly)
      1. SignalP-predicted protein sequences (7159).
      2. Removed proteins with transmembrane domains.
      3. RXLR motifs at the N terminus (373 sequences).
      4. Motif prediction with MEME (W, Y domains): 343 sequences were detected by MEME.
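The slide does not say how the N-terminal RXLR search was implemented. As a rough illustration of that step only, the sketch below scans a protein for an R-x-L-R motif that starts inside an N-terminal window. The window bounds, the function name and the test sequences are all hypothetical and would need to be replaced by whatever criteria the actual pipeline used.

```python
import re

# Lookahead so that overlapping R-x-L-R occurrences are all reported.
RXLR = re.compile(r"(?=R.LR)")


def has_n_terminal_rxlr(protein, window_start=20, window_end=100):
    """True if an R-x-L-R motif starts at a 0-based position in
    [window_start, window_end). The default window is an assumption
    made for this sketch, not a value taken from the slides."""
    return any(
        window_start <= match.start() < window_end
        for match in RXLR.finditer(protein.upper())
    )


# Synthetic sequences, for illustration only.
with_motif = "M" + "A" * 29 + "RSLR" + "A" * 20   # motif starts at index 30
late_motif = "M" + "A" * 149 + "RSLR"             # motif starts at index 150
no_motif = "M" + "A" * 50

print(has_n_terminal_rxlr(with_motif))
print(has_n_terminal_rxlr(late_motif))
print(has_n_terminal_rxlr(no_motif))
```

In a pipeline like the one on the slide, a filter of this kind would run only on proteins that already passed the signal peptide and transmembrane-domain steps.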
  • 18. ND886 error correction and read statistics: PacBio raw reads (402170); ECTools with Sanger unitigs (V1 assembly): corrected 285487 (70.9%), uncorrected (29.1%).
  • 19. ND886 assembly
      1. Total error-corrected reads: 285487.
      2. Celera assembly.
      3. Minimus.
      4. Dedupe.
      5. SSPACE (with Illumina reads): total contigs = 6443, largest contig = 648889, smallest contig = 2098, assembly size = 150 Mb.
      6. Redundans: no. of scaffolds = 2225, largest = 648906, smallest = 2745, N50 = 48161, total length = 92877686, gaps = 4133.
      Read statistics:
      Library | No. of reads | Read length
      Illumina | R1 = 28389986, R2 = 28334221 | varies from 50 nt to 100 nt
      PacBio corrected MP 10k | R1 = 91555, R2 = 91555 | 101
      PacBio corrected MP 6k | R1 = 131703, R2 = 131703 | 101
      Core gene completeness:
      Assembly | No. of core proteins (of 248) | % completeness | No. of core genes (out of 458)
      ND886 | 234 | 94.35 | 410
  • 20. Comparison of ND886 against the Pr102 2006 assembly. [Figure labels: P. ramorum ND886, P. ramorum Pr102 (2006).]
  • 21. Conclusions:
      De novo assemblers alone are not enough for a good genome assembly.
      PacBio reads are error-prone, and a single error correction protocol does not always produce the best result.
      Hybrid assembly combined with scaffolders and duplicate removers is effective for assembly.
      No single protocol works best for both genomes; protocols have to be mixed and matched.
      Assembly improvement does not necessarily change the gene space; rather, it improves repetitive regions and corrects the assembly.
  • 23. Repeat regions captured in the genome
      Assembly name | Bases masked | Small RNA | Simple repeats | Low complexity | GC content | Total interspersed repeats | LINE [R2/R4/NeSL] | LTR elements [Ty1/Copia, Gypsy/DIRS1] | DNA transposons | PiggyBac | Tourist/Harbinger
      V1 | 11.77% | 0.01% | 0.36% | 0.03% | 53.86% | 11.37% | 0.13% | 10.01% | 1.23% | 0.16% | 0.01%
      V2 | 20.68% | 0.00% | 0.44% | 0.04% | 54.32% | 20.20% | 0.32% | 5.64% | 1.48% | 0.16% | 0.01%
      V3 | 27.00% | 0.45% | 0.75% | 0.12% | 52.40% | 25.70% | 0.39% | 23.85% | 1.46% | 0.17% | 0.01%
      V4 | 21.06% | 0.11% | 0.53% | 0.05% | 54.09% | 20.37% | 0.29% | 18.70% | 1.38% | 0.17% | 0.01%
      V5 | 24.34% | 0.07% | 0.49% | 0.06% | 53.98% | 23.73% | 0.37% | 21.89% | 1.47% | 0.15% | 0.01%
  • 24. Advantages of PacBio sequencing:
      Long reads, ranging from 14,000 to 48,000 base pairs, longer than Sanger and NGS reads.
      Shortest run time (30 min).
      Least GC bias.
      No amplification bias.
      Handles highly repetitive genomes and can fill gaps efficiently.
      Reference: http://www.pacificbiosciences.com/products/smrt-technology/smrt-sequencing-advantage/
  • 25. Repeat regions captured in the genome (element counts, base pairs and percentages)
      Assembly name | Bases masked | Small RNA | Simple repeats | Low complexity | GC content | Total interspersed repeats | LINE [R2/R4/NeSL] | LTR elements [Ty1/Copia, Gypsy/DIRS1] | DNA transposons | PiggyBac | Tourist/Harbinger
      P. ramorum 2006 | 7847064 bp (11.77%) | 11 (6033 bp) 0.01% | 5336 (242077 bp) 0.36% | 422 (20747 bp) 0.03% | 53.86% | 7580618 bp (11.37%) | 53 (88470 bp) 0.13% | 5972 (6669143 bp) 10.01% | 1174 (823005 bp) 1.23% | 200 (104977 bp) 0.16% | 12 (5609 bp) 0.01%
      Protocol 1b | 16553511 bp (24.34%) | 75 (49453 bp) 0.07% | 7122 (331100 bp) 0.49% | 816 (40373 bp) 0.06% | 53.98% | 16138229 bp (23.73%) | 87 (250632 bp) 0.37% | 8822 (14885437 bp) 21.89% | 1419 (1002160 bp) 1.47% | 198 (104684 bp) 0.15% | 13 (5809 bp) 0.01%
      Protocol 2 | 21185972 bp (27.00%) | 605 (349328 bp) 0.45% | 11702 (586604 bp) 0.75% | 1787 (91389 bp) 0.12% | 52.40% | 20163370 bp (25.70%) | 112 (308607 bp) 0.39% | 11127 (18710327 bp) 23.85% | 1756 (1144436 bp) 1.46% | 231 (129697 bp) 0.17% | 12 (5417 bp) 0.01%
      Protocol 3 | 12854764 bp (21.06%) | 64 (69255 bp) 0.11% | 6801 (323105 bp) 0.53% | 679 (33221 bp) 0.05% | 54.09% | 12434182 bp (20.37%) | 64 (176881 bp) 0.29% | 6752 (11415393 bp) 18.70% | 1133 (841908 bp) 1.38% | 191 (105824 bp) 0.17% | 12 (6211 bp) 0.01%
      Bangalore meeting | 16192690 bp (20.68%) | 8 (3317 bp) 0.00% | 7549 (340933 bp) 0.44% | 699 (33372 bp) 0.04% | 54.32% | 15819353 bp (20.20%) | 92 (250092 bp) 0.32% | 2498 (4413567 bp) 5.64% | 1560 (1155118 bp) 1.48% | 228 (126831 bp) 0.16% | 15 (6376 bp) 0.01%
  • 28. Gene prediction and mapping statistics
      Assembly | No. of genes predicted | Average gene length | Largest gene | Mapping with V1 assembly | Mapping with V2 assembly | Mapping with V3 assembly | Mapping with V4 assembly | Mapping with V5 assembly
      V1 | 16134 | 1673 | 21479 | NA | 15978 | 15645 | 15855 | 16072
      V2 | 20741 | 2162.78 | 31832 | 20739 | NA | 20377 | 20519 | 20675
      V3 | 15110 | 2005.05 | 46572 | 15055 | 15019 | NA | 14990 | 15073
      V4 | 17311 | 1821.26 | 47518 | 17307 | 17245 | 16906 | NA | 17277
      V5 | 19278 | 1829.68 | 31832 | 19273 | 19167 | 18861 | 19051 | NA
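The mapping counts are easier to compare as percentages of each assembly's predicted genes. The sketch below assumes that each row gives the number of that assembly's predicted genes that mapped onto the assembly named in the column; that reading is an interpretation of the table, and the names used are illustrative.

```python
ASSEMBLIES = ["V1", "V2", "V3", "V4", "V5"]

# Values copied from the table; None marks the NA (self-comparison) cells.
PREDICTED = {"V1": 16134, "V2": 20741, "V3": 15110, "V4": 17311, "V5": 19278}
MAPPED = {
    "V1": [None, 15978, 15645, 15855, 16072],
    "V2": [20739, None, 20377, 20519, 20675],
    "V3": [15055, 15019, None, 14990, 15073],
    "V4": [17307, 17245, 16906, None, 17277],
    "V5": [19273, 19167, 18861, 19051, None],
}


def percent_mapped(source, target):
    """Percentage of `source` gene models mapped onto `target` (assumed reading)."""
    count = MAPPED[source][ASSEMBLIES.index(target)]
    if count is None:
        raise ValueError("no value for an assembly mapped onto itself")
    return round(100 * count / PREDICTED[source], 2)


for source in ASSEMBLIES:
    for target in ASSEMBLIES:
        if source != target:
            print(f"{source} -> {target}: {percent_mapped(source, target)}%")
```

Read this way, nearly all gene models map in every pairing, which fits the conclusion that assembly improvement changes the gene space very little.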