1. Laboratory Aspects of Generating
High Quality Assemblies
MGI Reference Genomes Workshop
Bob Fulton
February 13th 2017
2. Primary Objectives
• Develop Tools and Techniques to Provide High-Quality, Haplotype-Resolved
Genome Assemblies That Sample and Capture as Much Human Diversity
as Possible
3. Sequencing Strategy for Reference Genomes
• PacBio Large Insert Library Construction
• Linked Reads with 10X Genomics
• Validation Using BioNano Physical Map
5. PacBio WGS Library Construction
• High Molecular Weight Genomic DNA
• DNA must be of sufficient quality to allow for 50 kb shearing to
produce PacBio Continuous Long Reads (CLR)
• Consistent Shearing to 50 kb
• Preferred method: Diagenode Megaruptor
• Fragment size setting – 50 kb
• Working on 3 Methods for Library Construction
• PacBio SMRTbell – Current Standard PacBio SMRTbell Template Prep
Kit 1.0 and SMRTbell Damage Repair Kit
• Hybrid Library – Swift Accel-NGS XL Library Prep Kit, but substituting the
PacBio Damage Repair Kit for the Swift DNA Repair Enzymes
• Swift Library - Swift Accel-NGS XL Library Prep Kit Including Swift
DNA Repair Enzymes
• New Data Recently Available with New Repair Process
6. HG02818 Library Preparation and Sequencing
• Three library reactions (15 ug each) of HG02818 were processed using the
PacBio SMRTbell, Hybrid, and Swift library preps.
• Library recoveries leading into BluePippin (BP) size selection for the Hybrid and
Swift methods were roughly double that of the PacBio library prep.
• All libraries were size selected on the BP at 20Kb-50Kb.
• The PacBio SMRTbell library generated over a Gb of data for the first two
SMRT cells. Additional SMRT cells produced less data as the library appeared to
degrade.
Library Method     Library Recovery Pre-BP    ROI Read Length (bp)
PacBio SMRTbell    35.8% (5.3 ug)             12,178
Hybrid             68.8% (10.3 ug)            13,511
Swift              70.9% (10.6 ug)            10,232
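The recovery figures above are simple ratios of recovered library mass to input DNA mass. A minimal sketch of that arithmetic follows, assuming the 15 ug input per reaction stated on this slide; the function and variable names are hypothetical and not part of any lab software. Because the masses on the slide are rounded, the recomputed percentages come out close to, but not exactly equal to, the reported ones (presumably the originals were calculated from unrounded measurements).

```python
def library_recovery_percent(input_ug, recovered_ug):
    """Percent of input DNA mass recovered as library before size selection."""
    if input_ug <= 0:
        raise ValueError("input mass must be positive")
    return 100.0 * recovered_ug / input_ug


# Input mass per reaction and rounded recovered masses, as given on the slide.
INPUT_UG = 15.0
recovered_ug = {
    "PacBio SMRTbell": 5.3,
    "Hybrid": 10.3,
    "Swift": 10.6,
}

for method, mass in recovered_ug.items():
    pct = library_recovery_percent(INPUT_UG, mass)
    print(f"{method}: {pct:.1f}% recovered ({mass} ug of {INPUT_UG} ug)")

# Fold difference relative to the standard PacBio prep (illustrative only).
baseline = recovered_ug["PacBio SMRTbell"]
for method in ("Hybrid", "Swift"):
    print(f"{method} vs PacBio SMRTbell: {recovered_ug[method] / baseline:.2f}x")
```

The fold-difference lines make the "roughly double" comparison between the Swift-based preps and the standard PacBio prep explicit.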
7. HG02818 Library Preparation and Sequencing
[Figure: Read-of-insert Mbases per SMRT cell (y-axis, 0 to 1800) by date of PacBio RSII sequencing run (x-axis, 11/6/16 to 12/6/16), plotted for the PacBio SMRTbell, Hybrid, and Swift libraries.]
8. Subread Length Comparisons - HG02818
SMRTbell Library
• Mean Subread Length: 11,391 bp
• N50 Subread Length: 17,007 bp
Hybrid Libraries
• Mean Subread Length: 13,406 bp
• N50 Subread Length: 18,649 bp
9. Subread Length Comparisons - HG02818
Swift Library
• Mean Subread Length: 10,163 bp
• N50 Subread Length: 15,220 bp
E. coli, New Swift-Only Kit
• Mean Subread Length: 16,387 bp
• N50 Subread Length: 22,625 bp
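For readers less familiar with the two metrics compared on these slides: the mean subread length is the plain average, while the N50 is the length at which subreads of that length or longer contain at least half of all sequenced bases. A minimal sketch of both calculations follows. The function names and the example lengths are made up for illustration; this is not the PacBio software's implementation.

```python
def mean_length(lengths):
    """Arithmetic mean of read lengths (bp)."""
    if not lengths:
        raise ValueError("no read lengths given")
    return sum(lengths) / len(lengths)


def n50(lengths):
    """Length L such that reads of length >= L hold at least half of all bases."""
    if not lengths:
        raise ValueError("no read lengths given")
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length


# Hypothetical subread lengths in bp, not data from the HG02818 runs.
example_subreads = [2000, 4000, 6000, 8000, 10000]

print("mean:", mean_length(example_subreads))
print("N50: ", n50(example_subreads))
```

Because long reads contribute more bases each, the N50 is typically higher than the mean, which matches the pattern in the subread statistics reported for every library here.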
10. Agilent Tape Station Assessment of Library Size
PacBio SMRTbell No BluePippin Size Selection
11. Agilent Tape Station Assessment of Library Size
PacBio SMRTbell 6Kb-50Kb BluePippin Size Selection
12. Agilent Tape Station Assessment of Library Size
Hybrid Prep Pre-BluePippin Size Selection
13. Agilent Tape Station Assessment of Library Size
PacBio SMRTbell 8Kb-50Kb BluePippin Size Selection
14. Agilent Tape Station Assessment of Library Size
Hybrid Prep 18Kb-50Kb BluePippin Size Selection
16. 10X Genomics
• Chromium Instrument
• Long-Range Linking Information on a Genome-Wide
Scale
• Phasing Information Across a Genome
• Enhanced Variant Calling and Structural Variation
Detection
• De Novo Assembly of Diploid Genomes
• Both WGS and Targeted Approaches
32. BioNano Map
[Figure: BioNano map with nick sites labeled; an annotation indicates the flipped loop of an inverted repeat.]
33. Future Plans
• Refine Existing Platforms
• Longer Linking
• Longer Sequences
• Cost Reductions
• Investigate New Platforms
• PacBio Sequel
• Oxford Nanopore
• Investigate New Techniques
• Hybridization of Long Linked Reads in Lieu of Large Insert Clones to
Capture Allelic Diversity Across as Many Humans as Possible
34. Summary
• Goal: Generate Robust Data Sets for Additional High-Quality
Reference Genomes Representing the Full Range of
Genetic Diversity in Humans
• These Long Read (Long Range) Sequencing/Mapping
Applications Provide Orthogonal Synergistic Data Sets to
Help Accomplish Our Goal.
• Each System Possesses Unique Challenges and Requires
Optimization of Protocols and Running Conditions Specific
to Our Needs.
• Experience and Communication Are Key.
(Magrini)
35. Acknowledgements
The McDonnell Genome Institute at
Washington University in St. Louis
Tina Graves
Amy Ly
Lisa Cook
Catrina Fronick
Karyn Meltz Steinberg
Wes Warren
Chad Tomlinson
Eddie Belter
Susan Dutcher
10X Genomics
Deanna Church
Michael Chase
BioNano Genomics
Alex Hastie
Pacific Biosciences
Nick Sisneros
Laura Nolden
Nationwide Children’s Hospital
Rick Wilson
Vince Magrini
Sean McGrath
NCBI
Valerie Schneider