Telomere-to-telomere assembly of a complete human X chromosome
1. Adam M. Phillippy
Head, Genome Informatics Section
AGBT – March 2, 2019
2. • The human reference genome is incomplete
• 368 unresolved issues, 102 gaps
• Segmental duplications, rDNAs
• Centromeres, telomeres, heterochromatin
• These gaps contain important information
• Missing reference sequence leads to analysis artifacts
• Variation in these gaps is unexplored (e.g. rDNAs)
• We don’t know what we don’t know…
I have some troubling news…
4. • Repeats are long, reads are short
• “If the overlap is of sufficient length to distinguish it from being a repeat in the sequence the two sequences must be contiguous.”
• Rodger Staden, 1979, MRC Laboratory of Molecular Biology
What’s the problem?
5. • The return of closed (bacterial) genomes
• Bibersteinia trehalosi 192
Flashback to AGBT 2012
6. • How long are the repeats?
• 7 kbp LINEs
• 1 Mbp+ rDNA arrays
• 1 Mbp+ centromere arrays
• 10 Mbp+ heterochromatin blocks
• Coverage and accuracy matter too
• 1,000X of 100 bp reads at 100% accuracy? NO
• 10X of 10,000,000 bp reads at 100% accuracy? YES
• 100X of 100,000 bp reads at 90% accuracy? MAYBE
How long do reads need to be, for human?
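The trade-off between coverage and read length on this slide can be made concrete with a back-of-envelope sketch. It assumes reads of a fixed length start uniformly at random along the genome (an idealization not stated on the slide) and it ignores accuracy entirely. The function name and the `anchor` parameter are hypothetical. Under that assumption, the expected number of reads that fully span a repeat of length R is about coverage × (L − R − 2·anchor) / L for reads of length L, and zero when the read is too short.

```python
def expected_spanning_reads(coverage, read_len, repeat_len, anchor=0):
    """Expected number of reads that contain a whole repeat plus `anchor` bp
    of unique flanking sequence on each side.

    Assumption (not from the slides): fixed-length reads with uniformly
    random start positions; sequencing errors are ignored.
    """
    usable = read_len - repeat_len - 2 * anchor
    if usable <= 0:
        return 0.0  # a read this short can never bridge the repeat
    return coverage * usable / read_len


# The three scenarios from the slide, tested against a 7 kbp LINE
# and a 1 Mbp satellite or rDNA array.
scenarios = [(1000, 100), (10, 10_000_000), (100, 100_000)]
for cov, length in scenarios:
    for repeat in (7_000, 1_000_000):
        n = expected_spanning_reads(cov, length, repeat)
        print(f"{cov}X of {length} bp reads, {repeat} bp repeat: {n:.1f} spanning reads")
```

Under this model, 100 bp reads never span even a LINE regardless of depth, while 100 kb reads span LINEs comfortably but not Mbp-scale arrays, which is one way to read the "MAYBE" on the slide.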
7. • ONT R9 pore: E. coli CsgG membrane protein
• Read lengths >1 Mbp possible
Ultra-long nanopore sequencing
*Assuming 3.4 Å per bp, 1 Mbp = 3,400,000 Å (0.34 mm) = 40,000x height of the pore
[Figure: pore schematic labeled 120 Å and 85 Å; scale-analogy labels: 3.2 km in 37 m, 8 cm]
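The footnote's scale arithmetic can be checked with a short sketch. The 3.4 Å per bp figure comes from the footnote. Treating 85 Å as the pore height is an assumption, chosen because it reproduces the footnote's 40,000x ratio. All names below are hypothetical.

```python
ANGSTROM_PER_BP = 3.4        # rise per base pair, as assumed in the footnote
PORE_HEIGHT_ANGSTROM = 85.0  # assumed to be the pore height (figure label)
MM_PER_ANGSTROM = 1e-7       # 1 Å = 1e-10 m = 1e-7 mm


def dna_length_angstrom(n_bp):
    """Contour length in Å of a stretched DNA molecule of n_bp base pairs."""
    return n_bp * ANGSTROM_PER_BP


length_a = dna_length_angstrom(1_000_000)   # a 1 Mbp read
length_mm = length_a * MM_PER_ANGSTROM
ratio = length_a / PORE_HEIGHT_ANGSTROM     # read length relative to the pore

print(f"1 Mbp = {length_a:,.0f} Å = {length_mm:.2f} mm = {ratio:,.0f}x pore height")
```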
8. LONG READ CLUB
Really very long reads indeed
@pathogenomenick
Nick Loman
@mattloose
Matt Loose
9. It’s time to finish the human genome
CHM13 cell line from Urvashi Surti, Pitt; SKY karyotype from Jennifer Gerton, Stowers (N=46; XX)
The Telomere-to-Telomere (T2T) consortium is an
open, community-based effort to generate the
first complete assembly of a human genome.
10. • 30x Nanopore ultra-long: contig building
• 60x PacBio: polishing
• 50x 10x Genomics: polishing
• BioNano: structural validation
We need long reads. Lots of long reads
11. • Nanopore UL read length distribution is long-tailed
It pays to go deep
12. • From May 1 – October 29, 2018
• 62 MinION/GridION flow cells
• 8.9M reads, 98 Gb, 1.6 Gb / cell
• N50 read length 76 kb
• 44 Gb in reads >100 kb
• Max read length 1.03 Mb
• Assembled with Canu
• 10x cov of 100 kb at 90% acc
CHM13 sequencing
Now upwards of 90 flow cells and counting…
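For readers unfamiliar with the statistics on this slide, the following is a minimal sketch of how a read-length N50 and a fold-coverage figure are usually computed. It is not the consortium's own tooling. The function names are hypothetical, and the 3.1 Gb genome size is an assumption not given on the slide.

```python
def n50(lengths):
    """Return the length L such that reads of length >= L hold at least
    half of all sequenced bases (the usual N50 definition)."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    raise ValueError("n50() needs at least one read length")


def fold_coverage(total_bases, genome_size):
    """Average depth: sequenced bases divided by genome size."""
    return total_bases / genome_size


# Toy read lengths (illustrative only, not CHM13 data)
print(n50([2, 3, 4, 5, 6]))

# Slide totals against an ASSUMED 3.1 Gb human genome size
ASSUMED_GENOME_SIZE = 3.1e9
print(f"{fold_coverage(98e9, ASSUMED_GENOME_SIZE):.1f}x from 98 Gb")
```

With the assumed genome size, 98 Gb works out to a little over 30x, in line with the "30x Nanopore ultra-long" figure quoted earlier in the deck.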
23. • Almost!
• Have proven it’s possible for the X chromosome
• T2T assembly of all chrs within the next 2 years
• Remaining challenges
• Satellite arrays, rDNA arrays, segmental duplications
• Nanopore consensus quality
• Targeted long-read sequencing
• Better methods for phasing repeats and haplotypes
Are we there yet?
24. • github.com/nanopore-wgs-consortium/chm13
• Draft whole-genome assemblies
• Nanopore ultra-long reads
• 10x Genomics reads
• BioNano DLS (WashU)
• PacBio (SRA)
• Coming soon:
• Hi-C (Arima Genomics)
All our CHM13 data is openly released
25. NHGRI
• Sergey Koren
• Arang Rhie
• Jim Mullikin
• Alice Young
• Shelise Brooks
• Valerie Maduro
• Gerard Bouffard
• Sofia Barreira
• Andy Baxevanis
• Nancy Hansen
• Karen Miga, UCSC
• Jennifer Gerton, Stowers
• Tamara Potapova, Stowers
• Tina Graves Lindsay, WashU
• Ira Hall, WashU
• Valerie Schneider, NCBI
• Kerstin Howe, Sanger
• Jo Wood, Sanger
• Matt Loose, Nottingham
• Nick Loman, Birmingham
• Urvashi Surti, Pitt (ret.)
Acknowledgements