De novo assembly, a
multi-technology approach:
Illumina, PacBio, and OpGen
PhD. Francesco Vezzi
Senior Bioinformatician, NGI-Stockholm
Both Stockholm and Uppsala nodes
Instrument                               Count
Illumina HiSeq 2000/2500                 16
Illumina MiSeq                           3
Life Technologies SOLiD 5500xl           4
Life Technologies SOLiD 5500wildfire     2
Life Technologies Ion Torrent            2
Life Technologies Ion Proton             6
Life Technologies Sanger ABI3730         2
Pacific Biosciences RSII                 1
Argus Whole Genome Mapping System        1
One of the 3 best-equipped sequencing sites in Europe
In this talk
Illumina (Stockholm):
• 100/150 bp paired reads (low error rate)
• 900/200 Gbp in 6/2 day(s)
PacBio (Uppsala):
• 8.5 Kbp reads (max 30 Kbp, high error rate)
• 375 Mbp (1 SMRT Cell) in 10 hours
OpGen Argus System (Stockholm):
• ~300 Kbp maps
• 10 Gbp in ~1 day
Optical Maps
• Restriction Map
◦ Representation of the cut sites on a
given DNA molecule to provide spatial
information of genetic loci
• An enzyme is selected and used
to cut the molecules. This
provides a 2D representation of
the molecule structure
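A minimal sketch of the restriction-map idea above, not the Argus software itself: locate an enzyme's recognition site in a sequence, then record the cut positions and the resulting fragment lengths. The function name, the `cut_offset` parameter, and the toy sequence are assumptions made for illustration.

```python
def restriction_map(sequence, site, cut_offset=0):
    """Return (cut_positions, fragment_lengths) for a linear DNA molecule.

    `site` is the enzyme recognition sequence; `cut_offset` is where the
    enzyme cuts relative to the start of the site (hypothetical parameters).
    """
    sequence = sequence.upper()
    site = site.upper()
    cuts = []
    start = sequence.find(site)
    while start != -1:
        cuts.append(start + cut_offset)
        # Move one base forward so overlapping sites are also found.
        start = sequence.find(site, start + 1)
    # Fragment lengths are the distances between consecutive cut points,
    # with the two molecule ends acting as the outer boundaries.
    boundaries = [0] + cuts + [len(sequence)]
    fragments = [b - a for a, b in zip(boundaries, boundaries[1:])]
    return cuts, fragments


# Toy molecule with two GGATCC (BamHI) sites; BamHI cuts after the first G.
toy = "AAAAGGATCCTTTTTTGGATCCAA"
cuts, fragments = restriction_map(toy, "GGATCC", cut_offset=1)
print(cuts, fragments)  # [5, 17] [5, 12, 7]
```

The ordered list of fragment lengths is the information an optical map carries: no bases, only the spacing between cut sites.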
Optical Maps: workflow
Step                                     Time
DNA extraction directly from culture     3-8h
Quality control of extracted material    1h
Prepare a chip                           1.5h
Run Argus System                         1h
Data assembly                            2-8h
Closing genomes with Optical Maps
De novo reconstructs parts
missing in the reference strain
Correctly assembles long tandem
repeats
De Novo assembly
(Illumina, PacBio)
Set of unordered and
unoriented contigs
Optical Map
Contigs
Case Study: Combining all the technologies
~15 Mbp genome sequenced at high coverage with:
• Illumina HiSeq:
  ◦ 500X PE libraries (180 bp and 650 bp insert)
  ◦ 150X MP library (3 Kbp)
  ◦ 150X MP library (7 Kbp)
• PacBio:
  ◦ 50-60X with reads longer than 2 Kbp
• OpGen:
  ◦ 3 chips (only one worked really well)
  ◦ 300X coverage
  ◦ Average map length 320 Kbp
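To make the coverage figures concrete: fold coverage is the total number of sequenced bases divided by the genome size. The sketch below uses the ~15 Mbp genome size and the 375 Mbp per SMRT Cell yield quoted earlier in the talk; the function names are hypothetical, and the estimate ignores read filtering.

```python
import math

GENOME_SIZE_BP = 15_000_000  # ~15 Mbp, from the slide


def bases_for_coverage(coverage, genome_size=GENOME_SIZE_BP):
    """Bases that must be sequenced to reach a given fold coverage."""
    return coverage * genome_size


def units_needed(coverage, yield_per_unit_bp, genome_size=GENOME_SIZE_BP):
    """Whole runs (e.g. SMRT Cells) needed, ignoring any read filtering."""
    needed = bases_for_coverage(coverage, genome_size)
    return math.ceil(needed / yield_per_unit_bp)


illumina_pe_bases = bases_for_coverage(500)         # bases for 500X
smrt_cells_for_50x = units_needed(50, 375_000_000)  # 375 Mbp per SMRT Cell
print(illumina_pe_bases, smrt_cells_for_50x)  # 7500000000 2
```

Because the PacBio coverage on the slide counts only reads longer than 2 Kbp, the real number of SMRT Cells would be higher than this lower bound.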
Assembly Strategy
https://github.com/vezzi/de_novo_scilife
Semi-automated pipeline for de novo assembly:
• Global configuration file: tools and system configuration
• Sample configuration file: samples description
3 modules:
1. QC-module (Illumina only):
• Adaptor removal, k-mer analysis, FastQC, (insert size estimation)
2. Assemble-module (Illumina only):
• Runs specified assemblers and outputs executed commands
3. Validation-module:
• FRCbam, coverage analysis, GC-analysis, (N50)
I NEED USERS/FEEDBACK/CONTRIBUTIONS
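As an illustration of the GC-analysis step listed under the validation module, GC content can be computed per window along a contig. This is a generic sketch, not code from the de_novo_scilife pipeline; the function names and the window size are assumptions.

```python
def gc_fraction(sequence):
    """Fraction of G and C among the unambiguous bases (A, C, G, T)."""
    sequence = sequence.upper()
    gc = sequence.count("G") + sequence.count("C")
    at = sequence.count("A") + sequence.count("T")
    # Windows made only of ambiguous bases (e.g. N) report 0.0.
    return gc / (gc + at) if (gc + at) else 0.0


def windowed_gc(sequence, window):
    """GC fraction for consecutive non-overlapping windows."""
    return [gc_fraction(sequence[i:i + window])
            for i in range(0, len(sequence), window)]


gc_profile = windowed_gc("GGCCAATTGCGCNNNN", 4)
print(gc_profile)  # [1.0, 0.0, 1.0, 0.0]
```

Plotting such a profile against read coverage is one common way to spot GC-related coverage bias.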
QC-Module
K-mer analysis:
• Sample complexity
• Error rate
• Heterozygosity
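The k-mer analysis above rests on a k-mer spectrum: how many distinct k-mers occur once, twice, and so on. Sequencing errors mostly create k-mers seen only once or a few times. A minimal sketch with invented reads follows; it does not merge reverse-complement k-mers, which dedicated k-mer counters usually do.

```python
from collections import Counter


def kmer_counts(reads, k):
    """Count every k-mer occurring in a collection of reads."""
    counts = Counter()
    for read in reads:
        read = read.upper()
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def kmer_spectrum(counts):
    """Map multiplicity -> number of distinct k-mers seen that many times."""
    return Counter(counts.values())


# The third read carries one substitution, which yields new rare k-mers.
reads = ["ACGTAC", "CGTACG", "ACGTTC"]
spectrum = kmer_spectrum(kmer_counts(reads, 4))
print(sorted(spectrum.items()))  # [(1, 3), (2, 3)]
```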
[Figure: Insert Size Histogram for All_Reads in file lib_3000.bam. x-axis: Insert Size (0 to 6000); y-axis: Count (0 to 20000); series: FR, RF, TANDEM]
FASTQC
Adaptor removal
Alignment (partial assembly)
Assemble-Module
Illumina only:
• SOAPdenovo
• MaSuRCA
• Allpaths-LG
PacBio only:
• HGAP
• CABOG
Hybrid:
• PB-jelly (HAH)
>5000
Assembler    #scaffolds  totalLength  maxContigLength  N50     N80     percentageNs
Allpaths-LG  227         14513103     596012           139364  57619   15%
MaSuRCA      163         18549484     1188669          526519  282507  2%
HGAP         290         14399273     763592           142483  37117   0%
PB-Jelly     179         14718213     747750           195225  85127   13%
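For reference, the N50 and N80 columns follow the usual definition: sort the sequences from longest to shortest and report the length at which the running total reaches 50% (or 80%) of all assembled bases. A minimal sketch with invented lengths, using integer arithmetic to avoid rounding issues:

```python
def nxx(lengths, percent):
    """Length L such that sequences of length >= L hold `percent`% of all bases."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        # Integer comparison: running / total >= percent / 100.
        if running * 100 >= percent * total:
            return length
    return 0


toy_lengths = [100, 80, 60, 40, 20]  # invented lengths, total 300
print(nxx(toy_lengths, 50), nxx(toy_lengths, 80))  # 80 60
```

A higher N50 only indicates longer sequences, not a more correct assembly, which is why the talk pairs it with the validation module.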
• Trial-and-error process
• Automated pipeline developed in order to
streamline these analyses
• MaSuRCA is, surprisingly, the “best” assembler
Validation-Module
[Figure: panels for MaSuRCA, HGAP, PB-Jelly (HAH)]
Validation-Module: FRCbam
The PacBio-only assembly clearly
outperforms the others
Optical Maps
PacBio produces the best assembly; however, it still consists of 290 contigs.
Optical maps made it possible to obtain
a 2D representation of the 7
chromosomes.
N.B. The chromosome number was
one of the biological questions of
this project!
But much more can be done!
Incredible tool to finish (or almost finish) genomes
Combination                     % contigs placed  Total size of placed contigs  % size placed  % genome covered
PacBio+OpGen                    94.12             11578995                      97%            77.05
Allpaths+OpGen                  71.88             10692027                      84%            52.88
Allpaths+MaSuRCA+OpGen          80.65             27506424                      92%            69.64
Allpaths+PacBio+OpGen           82.32             22271022                      91%            83.05
MaSuRCA+PacBio+OpGen            94.44             28393392                      98%            83.79
Allpaths+MaSuRCA+PacBio+OpGen   85.42             39085419                      94%            87.39
Combining all the technologies
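The slides do not define how the placement columns were computed. One plausible reading, sketched below under that assumption, is: count the contigs anchored to the optical map, sum their lengths, and measure how much of the genome their placements cover once overlaps are counted a single time. The data layout, function names, and toy numbers are all hypothetical, and the toy uses a single chromosome.

```python
def merged_length(intervals):
    """Total length covered by (start, end) intervals, overlaps counted once."""
    covered = 0
    current_start = current_end = None
    for start, end in sorted(intervals):
        if current_end is None or start > current_end:
            # Close the previous merged block before opening a new one.
            if current_end is not None:
                covered += current_end - current_start
            current_start, current_end = start, end
        else:
            current_end = max(current_end, end)
    if current_end is not None:
        covered += current_end - current_start
    return covered


def placement_stats(contig_lengths, placements, genome_size):
    """contig_lengths: id -> bp; placements: (id, start, end) on the map."""
    placed = {name for name, _, _ in placements}
    placed_size = sum(contig_lengths[name] for name in placed)
    covered = merged_length([(s, e) for _, s, e in placements])
    return {
        "pct_contigs_placed": 100 * len(placed) / len(contig_lengths),
        "placed_size": placed_size,
        "pct_size_placed": 100 * placed_size / sum(contig_lengths.values()),
        "pct_genome_covered": 100 * covered / genome_size,
    }


toy_contigs = {"a": 400, "b": 300, "c": 200, "d": 100}
toy_placements = [("a", 0, 400), ("b", 300, 600), ("c", 700, 900)]
stats = placement_stats(toy_contigs, toy_placements, genome_size=1000)
print(stats)
```

Counting overlaps once matters for the merged rows of the table, where the total size of placed contigs exceeds the ~15 Mbp genome because several assemblies cover the same regions.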
Conclusions – Take home message
Attempt to automate de novo assembly process:
• https://github.com/vezzi/de_novo_scilife
• Not 100% automated
Illumina, PacBio, Hybrid assemblies:
• PacBio alone seems to produce the best assemblies
• Hybrid assembly does not seem able to correct merged-assembly
problems
Mixing technologies is always a good idea:
• Makes it possible to compensate for technology-specific biases
• Makes it possible to produce better assemblies
Thanks
https://github.com/vezzi/de_novo_scilife

SeRC: de novo assembly workshop. Francesco Vezzi
