Presentation by Valerie Schneider at GRC/GIAB ASHG 2017 workshop "Getting the most from the reference assembly and reference materials" on updates to the human reference assembly, GRCh38.
1. GRC/GIAB Workshop:
Getting the Most from the Reference
Assembly and Reference Materials
Oct 17: 1-4 pm
Valerie Schneider (NCBI): GRCh38 assembly basics and updates
Tina Lindsay (MGI): Reference-grade human assemblies
Karen Miga (UCSC): Centromere assemblies
BREAK (15 min)
Benedict Paten (UCSC): Building human variation graphs
Fritz Sedlazeck (BCM): Structural Variation Characterization Across the Human Genome and Populations
Justin Zook (NIST): GIAB benchmarks for difficult variants
2. GRCh38 assembly basics and updates
Valerie Schneider, Ph.D.
NCBI
17 October 2017
https://genomereference.org
9. HuRef
SOAPdenovo
NA12878
ALLPATHS
NA12878
Lander and Waterman (1988) Genomics
The likelihood a base is sequenced, by coverage:
Coverage        Not sequenced   Sequenced
1X Coverage     37%             63%
5X Coverage     0.6%            99.4%
10X Coverage    0.005%          99.995%
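The percentages in the Lander and Waterman coverage figures follow from a simple Poisson model: if reads land uniformly at random (an idealizing assumption), the chance that a given base is covered by no read at average coverage c is e^(-c). The sketch below reproduces the slide's numbers under that assumption; the function names are illustrative and not from the presentation. The slide appears to truncate the 5X value (the formula gives about 0.67% not sequenced).

```python
import math


def fraction_not_sequenced(coverage: float) -> float:
    """Probability that a base is covered by zero reads at the given
    average coverage, assuming uniformly random read placement."""
    return math.exp(-coverage)


def fraction_sequenced(coverage: float) -> float:
    """Probability that a base is covered by at least one read."""
    return 1.0 - fraction_not_sequenced(coverage)


if __name__ == "__main__":
    for c in (1, 5, 10):
        missed = fraction_not_sequenced(c)
        print(f"{c}X: {missed:.4%} not sequenced, {1 - missed:.4%} sequenced")
```

Real libraries have coverage biases (GC content, repeats), so actual gaps are typically larger than this model predicts.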
Contig N50
MHAP
CHM1
Chaisson and Eichler (2015), with modification
Measure of contiguity. Half of the bases in the assembly are in contigs of this length or greater.
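The contig N50 definition above can be made concrete with a short calculation. This is a minimal sketch; the function name and the toy contig lengths are hypothetical, not taken from the presentation. Contigs are sorted from longest to shortest and summed until the running total reaches half of the assembly size; the length of the contig that crosses that point is the N50.

```python
def contig_n50(lengths):
    """Return the N50 of a collection of contig lengths: the length L such
    that contigs of length >= L contain at least half of all bases."""
    if not lengths:
        raise ValueError("at least one contig length is required")
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        # Compare 2 * running with total to avoid fractional halves.
        if 2 * running >= total:
            return length


if __name__ == "__main__":
    toy_contigs = [80, 70, 50, 40, 30, 20, 10]  # hypothetical lengths, total 300
    print(contig_n50(toy_contigs))  # 80 + 70 = 150 reaches half of 300
```

Because N50 depends only on the length distribution, it says nothing about whether the contigs are assembled correctly; a misjoined assembly can have a high N50.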
Reference Assembly Basics
AK1
HX1
NA12878_prelim
10. Why all this matters:
Longer haplotype blocks
Fewer collapsed repeats & segmental duplications
Better annotation
More robust mapping target
11. Today’s reference assembly does not represent:
1. The most common allele/haplotype
2. The longest allele/haplotype
3. The ancestral allele/haplotype
It represents the sequence available from the HGP
13. Sequences from haplotype 1
Sequences from haplotype 2
Original assembly model:
compress into a consensus
false
gap
chromosome
Current assembly model:
represent both haplotypes
alt loci scaffold
chromosome
many
Gene1 Gene2
Sample
Gene2
Gene1
chromosome
alt scaffold
Reference
14. GRCh38 (Dec. 2013)
• 178 regions with alt loci: 2% of chromosome sequence (61.9 Mb)
• 261 Alt Loci: 3.6 Mb novel sequence relative to chromosomes
• Average alt length = 400 kb, max = ~5 Mb
• >150 genes only represented on alt loci
15. Reference Assembly Basics
• Closed gaps
• Targeted base fixes
• Corrected path errors
• Addition of missing paralogs
• Better representation of variation
• Better annotation
• Modeled centromeres
• Genome Research 27(5):849-864 (2017)
• PubMed: 28396521
GRCh38
• Changed coordinates
• Remapping challenges
• Alt Loci Usability
• Allelic duplication/Aligners
• Reporting multiple locations
• Variant analysis
• Clinical validation
[Figure: 2016 growth in SRA submissions over prior year, GRCh38 vs. GRCh37]
21. GRCh38 Updates
• Ideals:
• Chromosome context for any common human sequence >500 bp
• Unambiguous data interpretation at all clinically relevant loci
• No systematic error/bias in genome-wide analyses
• Real-World:
• Community interest
• Resources for curation
• GRCh39
• Substantial added value
• User must-haves
31. Credits
GRCh38 Collaborators
• NCBI RefSeq and gpipe annotation team
• Havana annotators
• Karen Miga
• Karyn Meltz Steinberg
• David Schwartz
• Steve Goldstein
• Mario Caceres
• Giulio Genovese
• Jeff Kidd
• Peter Lansdorp
• Mark Hills
• David Page
• Jim Knight
• Stephan Schuster
• 1000 Genomes
GRC SAB
• Rick Myers
• Granger Sutton
• Evan Eichler
• Jim Kent
• Roderic Guigo
• Carol Bult
• Derek Stemple
• Jan Korbel
• Liz Worthey
• Matthew Hurles
• Richard Gibbs
GRC
Tina Graves-Lindsay
Tayebeh Rezaie
Kerstin Howe
Richard Durbin
Paul Flicek
Laura Clarke
Deanna Church
Curators!
Developers!