How to sequence a large eukaryotic genome, and how we sequenced the cod genome. Lex Nederbragt, Norwegian High-Throughput Sequencing Centre (NSC) and Centre for Ecological and Evolutionary Synthesis (CEES)
What is a genome assembly? A hierarchical data structure that maps the sequence data to a putative reconstruction of the target (Miller et al. 2010, Genomics 95 (6): 315-327)
Hierarchical structure
Sequence data Reads http://www.cbcb.umd.edu/research/assembly_primer.shtml
Reads! http://www.sciencephoto.com/media/210915/enlarge
Contigs Building contigs
Contigs: building contigs. Repeat copy 1 and repeat copy 2 end up in one collapsed repeat consensus, which leaves open questions: contig orientation? Contig order? http://www.cbcb.umd.edu/research/assembly_primer.shtml
Mate pairs: another read type. Mate pair reads come from the two ends of (much) longer fragments, so a pair can span repeat copy 1 and repeat copy 2.
Scaffolds: ordered, oriented contigs. Mate pairs link the contigs and give a gap size estimate.
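The gap size estimate on the scaffold slide can be illustrated with a little arithmetic. This is a minimal sketch, not what any particular scaffolder does: it assumes a single known insert size for the library, and the function name `gap_estimate` and the example numbers are made up for illustration.

```python
from statistics import mean

def gap_estimate(insert_size, links):
    """Estimate the gap between two contigs from mate pairs that link them.

    insert_size: expected fragment length of the mate pair library (bp).
    links: one (left, right) tuple per linking pair, where `left` is the
           number of fragment bases that fall inside the left contig and
           `right` the number that fall inside the right contig.
    """
    # Whatever part of the fragment is not covered by either contig
    # must lie in the gap between them.
    return mean(insert_size - left - right for left, right in links)

# Three hypothetical pairs from a 3 kb library (assumed values)
links = [(1200, 1300), (1000, 1400), (1500, 1100)]
estimate = gap_estimate(3000, links)
```

Real libraries have a spread of insert sizes, so an estimate like this gets more reliable as more pairs link the same two contigs.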
Hierarchical structure
Algorithms: all are graph-based. In the figure, Read 1 and Read 2 are nodes joined by an overlap. Graph theory!
Algorithms: Hamiltonian path, a path that contains all the nodes (each exactly once). http://www.cbcb.umd.edu/research/assembly_primer.shtml
Algorithms: overlap calculation (alignment) between Read 1 and Read 2 is computationally intensive.
Algorithms: a path through the graph (Read 1, Read 2, Read 3, Read 4, joined by overlaps) gives a contig.
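The overlap and path ideas on the Algorithms slides can be sketched in a few lines. This is a toy illustration under strong assumptions: error-free reads, exact suffix-prefix matching instead of alignment, and a path that is handed in rather than searched for. The function names and the three reads are invented for the example.

```python
from itertools import permutations

def overlap(a, b, min_len=3):
    """Length of the longest suffix of `a` that equals a prefix of `b`."""
    for n in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:n]):
            return n
    return 0  # no overlap of at least min_len bases

def overlap_graph(reads, min_len=3):
    """Edges of the overlap graph: {(read_a, read_b): overlap length}."""
    graph = {}
    for a, b in permutations(reads, 2):  # every ordered pair of reads
        n = overlap(a, b, min_len)
        if n:
            graph[(a, b)] = n
    return graph

def spell_contig(path, graph):
    """Walk a path of reads and append each read's non-overlapping tail."""
    contig = path[0]
    for a, b in zip(path, path[1:]):
        contig += b[graph[(a, b)]:]
    return contig

reads = ["ACGCGATTCA", "GATTCAGGTT", "AGGTTACCACG"]
graph = overlap_graph(reads)
contig = spell_contig(reads, graph)
```

Comparing every read against every other read is what makes the overlap step expensive. The toy graph also picks up a three-base overlap from the last read back to the first, a reminder that short overlaps can arise by chance.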
Greedy extension: the oldest approach. http://www.cbcb.umd.edu/research/assembly_primer.shtml
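A minimal sketch of greedy extension, assuming error-free reads and exact suffix-prefix overlaps: at every step the two sequences with the longest overlap are merged, until nothing overlaps any more. The helper names are illustrative and do not come from any specific assembler.

```python
def overlap(a, b, min_len=3):
    """Length of the longest suffix of `a` that equals a prefix of `b`."""
    for n in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:n]):
            return n
    return 0

def greedy_assemble(reads, min_len=3):
    """Repeatedly merge the pair of sequences with the longest overlap."""
    seqs = list(reads)
    while len(seqs) > 1:
        best_len, best_i, best_j = 0, None, None
        for i, a in enumerate(seqs):
            for j, b in enumerate(seqs):
                if i != j:
                    n = overlap(a, b, min_len)
                    if n > best_len:
                        best_len, best_i, best_j = n, i, j
        if best_len == 0:
            break  # nothing left to merge
        merged = seqs[best_i] + seqs[best_j][best_len:]
        seqs = [s for k, s in enumerate(seqs) if k not in (best_i, best_j)]
        seqs.append(merged)
    return seqs

contigs = greedy_assemble(["ACGCGATTCA", "GATTCAGGTT", "AGGTTACCACG"])
```

Because each merge is only locally optimal, a greedy assembler can join reads across repeat copies, which is one reason graph-based methods replaced it.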
Overlap-Layout-Consensus: typical for Sanger-type reads; also used by newbler from 454 Life Sciences. Steps: (1) Overlap: overlap computation; (2) Layout: graph simplification; (3) Consensus: sequence.
Overlap-Layout-Consensus, overlap phase: k-mer seeds initiate overlap. Example sequence: ACGCGATTCAGGTTACCACG
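The idea that k-mer seeds initiate overlap can be sketched as an index lookup: only read pairs that share at least one k-mer are passed on to the expensive alignment step. This is an assumed, simplified picture (exact k-mers, one strand only); the function names and the value k=4 are chosen for illustration.

```python
from collections import defaultdict
from itertools import combinations

def kmers(seq, k):
    """All substrings of length k, in order."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]

def candidate_pairs(reads, k=4):
    """Pairs of read indices that share at least one k-mer seed."""
    index = defaultdict(set)            # k-mer -> ids of reads containing it
    for read_id, seq in enumerate(reads):
        for kmer in kmers(seq, k):
            index[kmer].add(read_id)
    pairs = set()
    for read_ids in index.values():
        pairs.update(combinations(sorted(read_ids), 2))
    return pairs

reads = ["ACGCGATTCA", "GATTCAGGTT", "AGGTTACCACG", "TTTTTTTT"]
pairs = candidate_pairs(reads)
```

Only the pairs returned here would go on to a full overlap alignment; the fourth read shares no seed with the others and is never compared.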
de Bruijn graphs: developed outside of DNA-related work. Best solution for very short reads (≤100 nt). Read: GACCTACA. K-mers (K=3): GAC, ACC, CCT, CTA, TAC, ACA. In the de Bruijn graph, consecutive k-mers overlap by K-1 bases.
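The slide's example can be reproduced directly. The sketch below follows the picture on the slide, where k-mers are the nodes and an edge joins two k-mers that overlap by K-1 bases; some formulations instead put (K-1)-mers on the nodes and k-mers on the edges. Function names are illustrative.

```python
def kmers(seq, k):
    """All substrings of length k, in order."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]

def de_bruijn(reads, k):
    """Map each k-mer to the set of k-mers that follow it in some read."""
    graph = {}
    for read in reads:
        read_kmers = kmers(read, k)
        # Consecutive k-mers share k-1 bases by construction.
        for a, b in zip(read_kmers, read_kmers[1:]):
            graph.setdefault(a, set()).add(b)
    return graph

nodes = kmers("GACCTACA", 3)
graph = de_bruijn(["GACCTACA"], 3)
```

No read-against-read alignment is needed: building the graph is a single pass over the reads, which is part of why the approach suits large numbers of very short reads.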
Graphs Schatz M C et al. Genome Res. 2010;20:1165-1173
Graphs: simplify the graph; add scaffolding information.
Sequence data: sequencing errors add complexity to the graph and create new k-mers. Correction of errors uses k-mer frequency (Kelley et al. Genome Biology 2010, 11:R116).
How to sequence a genome: human, 1990s; cod 1, 2009-2011; cod 2, 2011-2012.
Human genome, public effort: BAC-by-BAC sequencing (hierarchical shotgun sequencing). Genome, then BACs (100-150 kb), then select BACs, then shotgun sequencing. http://www.cbcb.umd.edu/research/assembly_primer.shtml
Human genome, Celera: shotgun sequencing. Entire genome shotgun; use of mate pairs.
How to sequence a genome: preparations; BAC-by-BAC; add shotgun and mate pairs.
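Why errors create new k-mers, and why k-mer frequency exposes them, can be shown with a small count. This is only the counting idea, not the correction method of the cited paper; the threshold `min_count=2` and the toy reads are assumptions for illustration.

```python
from collections import Counter

def kmer_counts(reads, k):
    """Count every k-mer across all reads."""
    counts = Counter()
    for read in reads:
        counts.update(read[i:i + k] for i in range(len(read) - k + 1))
    return counts

def weak_kmers(reads, k, min_count=2):
    """k-mers seen fewer than min_count times: likely caused by errors."""
    return {kmer for kmer, n in kmer_counts(reads, k).items() if n < min_count}

# Five correct copies of a read and one copy with a single substitution (A -> G)
reads = ["GACCTACA"] * 5 + ["GACCTGCA"]
suspicious = weak_kmers(reads, 3)
```

A single wrong base produces up to k new k-mers (three here), each seen only once, while true k-mers are seen roughly as often as the coverage.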
The cod genome project Preparations * From a different individual
Cod: strategy. ‘454 only’: NO subcloning; pure ‘shotgun’ approach; 454-specific paired end libraries. Supplementary: BAC ends using Sanger sequencing.
Cod: sequencing
Cod: assembly. Input for assembly: 84 million reads, 28 billion bases (Gb), 34x coverage. Assembly programs: Newbler from 454; Celera from Venter Inst. Computing nodes: 24 cpus, 128 GB of memory.
Cod: assembly. 611 Mb in 6 467 scaffolds, but: 35% gap bases, short contigs, incomplete genes.
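The input figures on this slide are linked by simple arithmetic. The sketch below assumes coverage was computed as total bases divided by genome size; the mean read length and the implied genome size are derived here and are not stated on the slide.

```python
reads = 84_000_000          # from the slide
bases = 28_000_000_000      # from the slide
coverage = 34               # from the slide

# Derived values (assumption: coverage = bases / genome size)
mean_read_length = bases / reads          # about 333 bases per read
implied_genome_size = bases / coverage    # about 824 million bases
```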
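A gap fraction like the 35% on this slide can be measured from the scaffold sequences themselves. The sketch assumes the common convention that gaps are written as runs of N; the function names and the two toy scaffolds are made up for the example.

```python
import re

def gap_fraction(scaffolds):
    """Fraction of all scaffold bases that are gap characters (N)."""
    total = sum(len(s) for s in scaffolds)
    gaps = sum(s.upper().count("N") for s in scaffolds)
    return gaps / total

def split_contigs(scaffold):
    """Cut a scaffold into its contigs at every run of N."""
    return [c for c in re.split(r"[Nn]+", scaffold) if c]

scaffolds = ["ACGTNNNNNNACGTACGTAC", "ACGTACGTAC"]
fraction = gap_fraction(scaffolds)
contigs = split_contigs(scaffolds[0])
```

Applied to the slide's numbers, 35% of 611 Mb is roughly 214 Mb of gap, leaving about 397 Mb of actual sequence in contigs.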
Cod: gaps. Heterozygosity: Contig 1, polymorphic contig 2, polymorphic contig 3, Contig 4. Short Tandem Repeats: reads that consist entirely of a repeated unit, e.g. ACACACACACACACACACACACACACACACACACACACACACACACACACACACACACA
Cod: annotation. Ensembl: 'repair' genes based on stickleback sequence; ~22 000 genes. http://pre.ensembl.org/Gadus_morhua/
Cod 2: 2011-2012. Close the gaps: increase contig size. Pseudochromosomes: use a genetic linkage map to turn scaffolds into 'chromosomes' (anchoring, ordering and orienting).
Cod 2: strategy. New data: Illumina reads; longer 454 reads (~700 bases); PacBio reads? Improved programs: newbler. New programs: assembly, gap closing.
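Anchoring, ordering and orienting scaffolds with a linkage map can be sketched as follows. This is a simplified illustration, not the procedure used for the cod genome: it assumes every marker has already been located on a scaffold, that all markers on a scaffold agree on the linkage group, and the marker tuples are invented.

```python
from collections import defaultdict
from statistics import mean

def anchor(markers):
    """Place scaffolds on linkage groups.

    markers: (scaffold, position_bp, linkage_group, map_position_cM) tuples.
    Returns {linkage_group: [(scaffold, strand), ...]} in map order.
    """
    hits_by_scaffold = defaultdict(list)
    for scaffold, pos, group, cm in markers:
        hits_by_scaffold[scaffold].append((pos, group, cm))

    placed = defaultdict(list)
    for scaffold, hits in hits_by_scaffold.items():
        hits.sort()                      # order markers along the scaffold
        group = hits[0][1]               # assumption: markers agree on the group
        cms = [cm for _, _, cm in hits]
        if cms[0] == cms[-1]:
            strand = "?"                 # one marker or no recombination: unknown
        else:
            strand = "+" if cms[-1] > cms[0] else "-"
        placed[group].append((mean(cms), scaffold, strand))

    # Order scaffolds within each group by their mean map position.
    return {g: [(s, strand) for _, s, strand in sorted(v)] for g, v in placed.items()}

markers = [
    ("s2", 100, "LG1", 12.0), ("s2", 900, "LG1", 10.0),
    ("s1", 50, "LG1", 1.0), ("s1", 700, "LG1", 3.5),
    ("s3", 10, "LG1", 20.0),
]
layout = anchor(markers)
```

A scaffold needs at least two markers at different map positions before it can be oriented, which is why larger scaffolds are easier to place.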
Many programs to choose from
Assembly competitions: Assemblathon 1, simulated datasets. ALLPATHS-LG (Broad Institute MIT, US); SOAPdenovo (BGI, China); SGA (Sanger Institute, UK).
Assembly competitions: Assemblathon 2, real datasets. Snake: Illumina only; cichlid fish: Illumina only; parrot: Illumina, 454 FLX+, PacBio. http://assemblathon.org/
How to sequence a genome in 2011. Cheap alternative: RAD-tag sequencing.
How to sequence a genome. Foundation of Illumina data: 100x coverage; Paired End reads (2x100 bp); several Mate Pair libraries (2 kb, 3 kb, 8 kb, 10 kb, bigger?). This is now very cheap! Fill gaps with long reads: 454 or PacBio.
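The 100x, 2x100 bp recipe translates into a number of read pairs once a genome size is chosen. The sketch below is plain arithmetic; the 800 Mb genome size is a hypothetical example, not a figure from the slides.

```python
import math

def read_pairs_needed(genome_size, coverage=100, read_length=100):
    """Number of paired-end read pairs needed to reach the target coverage."""
    bases_needed = genome_size * coverage
    # Each pair contributes two reads of read_length bases.
    return math.ceil(bases_needed / (2 * read_length))

# Hypothetical 800 Mb genome (assumed value)
pairs = read_pairs_needed(800_000_000)
```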
How to sequence a genome Add lots of bioinformatics... http://cores.montana.edu/index.php?page=bioinformatics-core-facility
Thank you! lex.nederbragt@bio.uio.no www.sequencing.uio.no

How to sequence a large eukaryotic genome

  • 1. How to sequence a large eukaryotic genomeand how we sequenced the cod genome Lex Nederbragt Norwegian High-Throughput Sequencing Centre (NSC) and Centre for Ecological and Evolutionary Synthesis (CEES)
  • 3. What is a genome assembly? A hierarchical data structure that maps the sequence data to a putative reconstruction of the target Miller et al 2010, Genomics 95 (6): 315-327
  • 5. Sequence data Reads http://www.cbcb.umd.edu/research/assembly_primer.shtml
  • 8. Contigs Building contigs Repeat copy 1 Repeat copy 2 Contig orientation? Contig order? Collapsed repeat consensus http://www.cbcb.umd.edu/research/assembly_primer.shtml
  • 9. Mate pairs Other read type Repeat copy 1 Repeat copy 2 (much) longer fragments mate pair reads
  • 10. Scaffolds Ordered, oriented contigs mate pairs contigs gap size estimate
  • 12. Algorithms All are graph-based Read 2 Read 1 Overlap Graph-theory!
  • 13. Algorithms Hamiltonian path a path that contains all the nodes http://www.cbcb.umd.edu/research/assembly_primer.shtml
  • 14. Algorithms Overlap calculation (alignment) computationally intensive Read 2 Read 1 Overlap
  • 15. Algorithms Path through the graph contig Read 2 Read 3 Read 4 Read 1 Overlap Overlap Overlap
  • 16. Greedy extension Oldest http://www.cbcb.umd.edu/research/assembly_primer.shtml
  • 17. Overlap-Layout-Consensus Typical for Sanger-type reads also used by newbler from 454 Life Sciences Steps Overlap computation Layout: graph simplification Consensus: sequence
  • 18. Overlap-Layout-Consensus Overlap phase: K-mer seeds initiate overlap ACGCGATTCAGGTTACCACG
  • 19. de Bruijn graphs Developed outside of DNA-related work Best solution for very short reads ≤100 nt GACCTACA GAC ACC CCT CTA TAC ACA Read de Bruijn graph K-mers (K=3) K-1 bases overlap
  • 20. Graphs Schatz M C et al. Genome Res. 2010;20:1165-1173
  • 21. Graphs Simplify the graph Add scaffolding information
  • 22. Sequence data Sequencing errors add complexity to graph create new k-mers Correction of errors k-mer frequency Kelley et al. Genome Biology 2010 11:R116
  • 23. How to sequence a genome human 1990's cod 1 2009 - 2011 cod 2 2011 - 2012
  • 24. Human genome Public effort BAC-by-BAC sequencing hierarchical shotgun sequencing Genome BACs Select BACs 100-150 kb shotgun sequencing http://www.cbcb.umd.edu/research/assembly_primer.shtml
  • 25. Human genome Celera: shotgun sequencing entire genome shotgun use of mate pairs
  • 26. How to sequence a genome Preparations BAC-by-BAC Add shotgun and mate pairs
  • 27. The cod genome project Preparations * From a different individual
  • 28. Cod: strategy ‘454 only’ NO subcloning Pure ‘shotgun’ approach 454 specific paired end libraries Supplementary BAC ends using Sanger sequencing
  • 30. Cod: assembly Input for assembly 84 million reads 28 billion bases (Gb) 34x coverage Assembly program Newbler from 454 Celera from Venter Inst. Computing nodes 24 cpus 128 GB of memory
  • 31. Cod: assembly 611 Mb in 6 467 scaffolds but 35% gap bases short contigs incomplete genes
  • 32. Cod: gaps Polymorphic contig 2 Heterozygosity Contig 4 Contig 1 Polymorphic contig 3 Short Tandem Repeats ACACACACACACACACACACACACACACACACACACACACACACACACACACACACACA
  • 33. Cod: annotation Ensembl 'repair' genes based on stickleback sequence ~22 000 genes http://pre.ensembl.org/Gadus_morhua/
  • 35. Cod 2: 2011-2012 Close the gaps increase contig size Pseudochromosomes genetic linkage map scaffolds to 'chromosomes' anchoring ordering and orienting
  • 36. Cod 2: strategy New data Illumina reads longer 454 reads ~700 bases PacBio reads? Improved programs newbler New programs assembly gap closing
  • 37. Many programs to choose from
  • 38. Assembly competitions Assemblathon 1 simulated datasets ALLPATHS_LG – Broad Institute MIT (US) Soapdenovo – BGI (China) SGA – Sanger Institute (UK)
  • 39. Assembly competitions Assemblathon 2 real datasets snake – Illumina only cichlid fish – Illumina only parrot Illumina 454 FLX+ PacBio http://assemblathon.org/
  • 40. How to sequence a genome In 2011 Cheap alternative: RAD-tag sequencing
  • 41. How to sequence a genome Foundation of Illumina data 100x coverage Paired End reads (2x100bp) several Mate Pair libraries 2kb, 3kb, 8kb, 10kb, bigger? this is now very cheap! Fill gaps with long reads 454 or PacBio
  • 42. How to sequence a genome Add lots of bioinformatics... http://cores.montana.edu/index.php?page=bioinformatics-core-facility
  • 43. Thank you! lex.nederbragt@bio.uio.no www.sequencing.uio.no
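
Slide 19 breaks the read GACCTACA into k-mers with K=3 and notes that neighbouring k-mers overlap by K-1 bases. The Python sketch below reproduces that decomposition and links each k-mer to the next. The function names `kmers` and `de_bruijn_edges` are illustrative and do not come from the slides or from any assembler; a real de Bruijn assembler would also handle reverse complements, sequencing errors and graph simplification.

```python
from collections import defaultdict


def kmers(read, k):
    """Return every overlapping k-mer of a read, left to right."""
    return [read[i:i + k] for i in range(len(read) - k + 1)]


def de_bruijn_edges(read, k):
    """Link each k-mer to the k-mer that follows it in the read.

    Consecutive k-mers share k-1 bases, which is the overlap
    the slide refers to. Hypothetical helper for illustration only.
    """
    graph = defaultdict(list)
    mers = kmers(read, k)
    for left, right in zip(mers, mers[1:]):
        # The last k-1 bases of one k-mer are the first k-1 of the next.
        assert left[1:] == right[:-1]
        graph[left].append(right)
    return dict(graph)


if __name__ == "__main__":
    read = "GACCTACA"  # the read shown on slide 19
    print(kmers(read, 3))
    for node, successors in de_bruijn_edges(read, 3).items():
        print(node, "->", successors)
```

For this single read the graph is a simple chain; with many reads, k-mers shared between reads collapse into the same node, which is what keeps the graph compact for very short reads.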

Editor's Notes

  1. Greedy assemblers - The first assembly programs followed a simple but effective strategy in which the assembler greedily joins together the reads that are most similar to each other. An example is shown in Figure 8, where the assembler joins, in order, reads 1 and 2 (overlap = 200 bp), then reads 3 and 4 (overlap = 150 bp), then reads 2 and 3 (overlap = 50 bp), thereby creating a single contig from the four reads provided in the input. One disadvantage of the simple greedy approach is that, because only local information is considered at each step, the assembler can be easily confused by complex repeats, leading to mis-assemblies.
  2. BAC-by-BAC approach. The long lines represent individual BACs. The minimal tiling path is represented by thick lines. Each BAC in the tiling path is then sequenced through the shotgun method.
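
The greedy strategy in note 1 can be sketched in a few lines of Python. This is a toy illustration under stated assumptions, not how any production assembler works: reads are error-free, all on the same strand, and overlaps are exact suffix-prefix matches. The names `overlap`, `greedy_assemble` and `min_len`, and the four short reads, are made up for the example.

```python
def overlap(a, b, min_len=3):
    """Length of the longest suffix of a that equals a prefix of b (0 if < min_len)."""
    for n in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:n]):
            return n
    return 0


def greedy_assemble(reads, min_len=3):
    """Repeatedly merge the pair of sequences with the largest overlap."""
    reads = list(reads)
    while len(reads) > 1:
        best_len, best_i, best_j = 0, None, None
        for i, a in enumerate(reads):
            for j, b in enumerate(reads):
                if i != j:
                    n = overlap(a, b, min_len)
                    if n > best_len:
                        best_len, best_i, best_j = n, i, j
        if best_len == 0:
            break  # no remaining overlaps: the contigs stay separate
        merged = reads[best_i] + reads[best_j][best_len:]
        reads = [r for idx, r in enumerate(reads)
                 if idx not in (best_i, best_j)] + [merged]
    return reads


if __name__ == "__main__":
    # Four toy reads; the overlaps are 7 (reads 1-2), 6 (reads 3-4) and 3 (reads 2-3).
    reads = ["ATTAGACCTG", "AGACCTGCCG", "CCGGAATAC", "GAATACGT"]
    print(greedy_assemble(reads))
```

As in the note, the largest overlap is joined first and the smallest last. Because each step looks only at the current best pair, two reads from different copies of a repeat can win the comparison and be merged incorrectly.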