www.citrusgreening.org
High quality arthropod genome assembly with single molecule reads and long-range scaffolding
Prashant S Hosmani¹, Mirella Flores-Gonzalez¹, Wayne Hunter², Lukas A. Mueller¹, Susan Brown³, and Surya Saha¹
¹Boyce Thompson Institute; ²USDA-ARS U.S. Horticultural Research Laboratory; ³Kansas State University
ss2489@cornell.edu @SahaSurya
Entomology 2017
Advances in Arthropod Genomics Workshop
Acknowledgements
Mueller Lab: Mirella Flores, Prashant Hosmani
Kansas State University: Sue Brown
Cornell University/BTI: Michelle (Cilia) Heck
USDA/ARS: Wayne Hunter, Robert Shatters
University of California, Davis: Carolyn Slupsky
Indian River State College: Tom D'Elia
Citrus Greening: Huanglongbing
• Most significant disease of citrus worldwide
• More than $4.5 billion in lost citrus production and more than 8,200 lost jobs (2006/07 to 2010/11)
• Associated with the Gram-negative bacterium Candidatus Liberibacter asiaticus (CLas)
• Spread by an insect vector, Diaphorina citri (Asian citrus psyllid, ACP)
Annie Kruse
Omics resources and databases are required for identification of targets for interdiction
Genome Annotation
Target for interdiction molecules
Pathway Databases
Expression Networks
…….
Host
Vector
Pathogen
Current Illumina assembly

Genome          Diaci1.1
Contigs         161,988
Total length    485 Mb
Longest         1 Mb
Shortest        201 bp
Ns              19.3 Mb
Scaffold N50    109,898 bp
Contig N50      34,407 bp

Highly fragmented
Many examples of misassemblies!!
http://biobeans.blogspot.com/2012/11/bioinformatics-genome-assembly.html
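The scaffold and contig N50 values quoted for this assembly follow the usual definition: sort the sequence lengths from longest to shortest and report the length at which the running total first reaches half of the assembly size. A minimal Python sketch of that calculation, assuming a plain list of lengths in bp; the function name `n50` and the toy lengths are illustrative and are not the Diaci1.1 data:

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths (in bp)."""
    if not lengths:
        raise ValueError("need at least one sequence length")
    total = sum(lengths)
    running = 0
    # Walk from the longest sequence down until half the assembly is covered.
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length


# Hypothetical contig lengths, used only to exercise the function.
toy_contigs = [100, 80, 60, 40, 20]
print(n50(toy_contigs))
```

A few long sequences raise N50 sharply, which is why the statistic is commonly paired with contig counts and the longest/shortest lengths when assemblies are compared.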
PacBio assembly

                     Error rate 0.013    Error rate 0.015
Number of contigs    7,832               8,030
Total bases          462.8 Mb            493.1 Mb
Longest              1.6 Mb              1.7 Mb
Shortest             4.4 Kb              5 Kb
Average length       59.9 Kb             61.4 Kb
Contig N50           85.8 Kb             92.6 Kb

Koren 2017
Contiguous assembly with longer contigs
Multiple individuals in DNA sample
http://canu.readthedocs.io/en/stable/
PBJelly scaffolding

                     Canu assembly    Scaffolded assembly v1.9
Number of contigs    7,832            8,352
Total bases          462.8 Mb         591.7 Mb
Longest              1.6 Mb           2 Mb
Shortest             4.4 Kb           1.5 Kb
Average length       59 Kb            70.8 Kb
Contig N50           85.8 Kb          115.8 Kb

5,290 gap extensions
535 gaps filled
Number of Ns: 0 bp
English 2012
                     v1.91      v1.92 REFERENCE    v1.92 ALTERNATE
Number of contigs    3,681      1,918              1,763
Total bases          596 Mb     513 Mb             83.4 Mb
Longest              4.2 Mb     4.2 Mb             760.6 Kb
Shortest             1.5 Kb     6 Kb               1.5 Kb
Average length       162 Kb     267 Kb             47.3 Kb
Contig N50           620 Kb     755.7 Kb           75.1 Kb
Ns                   5.1 Mb     4.6 Mb             467 Kb
500 ng input DNA from a single male psyllid
Duplicated contigs added to alternate assembly
https://github.com/Gabaldonlab/redundans
https://github.com/broadinstitute/pilon/wiki
Error correction
• DNA sequencing data
• RNA sequencing data
• Duplication removal
• Scaffolding
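If duplicated contigs were only moved into the alternate assembly and none were dropped, the v1.92 reference and alternate sets should together account for v1.91. A rough consistency check in Python using the rounded figures reported on this slide; the variable names and tolerances are assumptions chosen to allow for rounding:

```python
# Figures as reported on the slide (rounded); sizes are in Mb.
v191 = {"contigs": 3681, "total_mb": 596.0, "ns_mb": 5.1}
reference = {"contigs": 1918, "total_mb": 513.0, "ns_mb": 4.6}
alternate = {"contigs": 1763, "total_mb": 83.4, "ns_mb": 0.467}


def combined(key):
    """Sum one statistic over the reference and alternate sets."""
    return reference[key] + alternate[key]


# Contig counts are exact, so they should match exactly.
contigs_match = combined("contigs") == v191["contigs"]
# Sizes are rounded on the slide, so compare within a small tolerance.
bases_match = abs(combined("total_mb") - v191["total_mb"]) < 1.0
ns_match = abs(combined("ns_mb") - v191["ns_mb"]) < 0.1
# Share of all bases that ended up in the alternate set.
alt_fraction = alternate["total_mb"] / combined("total_mb")

print(contigs_match, bases_match, ns_match)
print(f"alternate share of bases: {alt_fraction:.1%}")
```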
Gene isoform sequencing (Iso-Seq)
Accurate gene models are necessary for targeting assays
• The majority of genes are alternatively spliced to produce multiple transcript isoforms.
• Iso-Seq generates full-length cDNA sequences (full-length transcripts and gene isoforms).
The current MCOT (de novo and genome-based) transcriptome is useful but fragmented
Korf 2013
Sequencing full-length gene isoforms
Mapping to D. citri genome
Isoforms mapped to D. citri v1.92
Total isoforms: 314,275
Isoseq provides a comprehensive (de novo and genome-based) transcriptome with full-length transcripts and a range of isoforms

Counts
Number of genes                    18,799 (30,562 in MCOT)
Number of isoforms                 61,086
Average number of isoforms/gene    3.24
N50                                2.7 Kb
Longest                            9 Kb
Shortest                           100 bp
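The isoforms-per-gene figure can be reproduced from the two counts on this slide. A short Python sketch, assuming the average is simply the number of isoforms divided by the number of genes; the variable names are illustrative:

```python
genes = 18_799      # number of genes, per the slide
isoforms = 61_086   # number of isoforms, per the slide

# Mean number of isoforms per gene.
isoforms_per_gene = isoforms / genes
print(f"{isoforms_per_gene:.2f} isoforms per gene")
```

The ratio comes out just under 3.25, so it agrees with the reported 3.24 if that value was truncated rather than rounded.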
Evaluating the assembly

Benchmarking sets of Universal Single-Copy Orthologs, based on a set of 3350 single-copy orthologs from hemipteran species

              Complete    Fragmented    Missing
Diaci 1.1     74.8%       0.3%          24.9%
Diaci 1.92    85.2%       0.1%          14.7%

Paired-end RNAseq alignment

              Overall alignment rate    Concordant alignment rate
Diaci 1.1     82%                       0.62%
Diaci 1.92    88%                       60%

Average length of aligned coding sequence

              MCOT       Isoseq (full-length transcripts)
Diaci 1.1     1054 bp    470 bp
Diaci 1.92    1321 bp    699 bp
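To put the ortholog percentages in absolute terms, they can be converted back to approximate counts out of the 3350-ortholog set. A Python sketch under that assumption; the counts are estimates recovered from rounded percentages, and the helper name `approx_counts` is hypothetical:

```python
BUSCO_SET_SIZE = 3350  # single-copy hemipteran orthologs, per the slide

# Percentages as reported: (complete, fragmented, missing).
busco = {
    "Diaci 1.1": (74.8, 0.3, 24.9),
    "Diaci 1.92": (85.2, 0.1, 14.7),
}


def approx_counts(percentages, set_size=BUSCO_SET_SIZE):
    """Convert category percentages to approximate ortholog counts."""
    return tuple(round(p / 100 * set_size) for p in percentages)


for assembly, pct in busco.items():
    # The three categories should cover the whole ortholog set.
    assert abs(sum(pct) - 100.0) < 0.05
    complete, fragmented, missing = approx_counts(pct)
    print(f"{assembly}: ~{complete} complete, ~{fragmented} fragmented, ~{missing} missing")

# Change in the complete category, in percentage points.
gain = busco["Diaci 1.92"][0] - busco["Diaci 1.1"][0]
print(f"gain in complete orthologs: {gain:.1f} percentage points")
```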
Improved genome and annotation will expedite identification of targets for interdiction
Genome: PacBio v1.92
Annotation: Iso-Seq
Target for interdiction molecules
Pathway Databases
Expression Networks
…….
Host
Vector
Pathogen
Thank you!!
Utilizing system biology resources to decipher a tritrophic disease complex
Prashant Hosmani
Wednesday, 10:30 AM - 10:45 AM
Member Symposium: Applying Emerging Genomic Techniques to Control Invasive Species
