SlideShare a Scribd company logo
1 of 47
Characterizing extreme diversity in
the human genome using a single
haplotype genomic resource
Karyn Meltz Steinberg, Ph.D.
AGBT 2015 GRC Workshop
@KMS_Meltzy
1 bp 1 chr
Frequency
SNP
Trisomy
monosomy
Copy number
variants
Size of variant
1 kb 1 Mb
Types of genetic variants
Slide courtesy of S. Girirajan
Human Genetic Variation
1 bp 1 chr
Frequency
SNP
Trisomy
monosomy
Copy number
variants
Size of variant
1 kb 1 Mb
Types of genetic variants
1 bp 1 chr
Throughput
1 kb 1 Mb
Size of variant
How do we assay them?
Slide courtesy of S. Girirajan
Human Genetic Variation
1 bp 1 chr
Frequency
SNP
Trisomy
monosomy
Copy number
variants
Size of variant
1 kb 1 Mb
Types of genetic variants
SNP genotyping
1 bp 1 chr
Throughput
1 kb 1 Mb
Size of variant
How do we assay them?
Slide courtesy of S. Girirajan
Human Genetic Variation
1 bp 1 chr
Frequency
SNP
Trisomy
monosomy
Copy number
variants
Size of variant
1 kb 1 Mb
Types of genetic variants
Array-CGH
Karyotyping
SNP genotyping
1 bp 1 chr
Throughput
1 kb 1 Mb
Size of variant
How do we assay them?
Slide courtesy of S. Girirajan
Human Genetic Variation
1 bp 1 chr
Frequency
SNP
Trisomy
monosomy
Copy number
variants
Size of variant
1 kb 1 Mb
Types of genetic variants
Array-CGH
Karyotyping
Sequencing
SNP genotyping
1 bp 1 chr
Throughput
1 kb 1 Mb
Size of variant
How do we assay them?
Slide courtesy of S. Girirajan
Human Genetic Variation
Extreme diversity in the human genome
• <99.5% identity to the reference
• Refractory to traditional sequencing efforts
• Loci often contain gene families associated with
immune response and xenobiotic metabolism
HLA is a classic example of an extremely diverse locus
• Critical to immune response
• Characterized by overdominant
selection
• Alleles are linked and segregate as
distinct haplotypes
• Shaped by gene duplication and
diversification
Segmental duplications can predispose loci to further
rearrangement via NAHR
Segmental duplications can predispose loci to further
rearrangement via NAHR
A
A
C
T
C
G
C
C
Repeat Copies (noted by color difference)
Allelic
Copies
Diploid Genome
With a diploid genome, there is significant ambiguity
sorting allelic copies from repeat copies
A C C C
Haploid Genome
Repeat Copies
(ONLY but noted by color differences)
With a haploid genome, allelic differences are eliminated, and
base differences are likely indicative of repeat copies
Hydatidiform mole
SRGAP2 Homology between genes
Shows nearly identical segments between SRGAP2A and SRGAP2 paralogs
Shows homology between SRGAP2B and SRGAP2C
Dennis, et.al. 2012
SRGAP2A
SRGAP2B
SRGAP2C
1q21
1q21 patch alignment to chromosome 1
1q32 1q21 1p21
Hydatidiform mole
Let’s sequence and assemble the whole genome!
CHM1_1.1 Assembly
• Reference-guided assembly
• SRPRISM v2.3, R. Agarwala
• Alignment of Illumina reads to GRCh37 primary assembly
• CHORI-17 BAC clone tilepaths were then incorporated
• 428 total clones
• 324 clones in 45 tilepaths
• 104 clones as singletons
http://www.ncbi.nlm.nih.gov/assembly/GCF_000306695.
2
Total Sequence Length 3,037,866,619 bp
Total Assembly Gap Length 210,229,812 bp
Number of Scaffolds 163
Scaffold N50 50,362,920 bp
CHM1 Assembly Paper - Genome Research Steinberg et al. 2014
CHM1_1.1 assembly is highly contiguous compared to
other WGS based assemblies
Integrating BAC tiling paths improved assembly
Integrating BAC tiling paths improved assembly
Alignment of CHM1 Illumina data to assembly revealed
regions of extreme heterogeneity
Heterozygous Homozygous Total
Variants 64033 22513 86546
In RepeatMasked (RM) sequence 37060 14833 51893
In Segmental duplication (SD) 30670 4843 35513
In RM and SD 51466 17174 68640
Ts:Tv 1.5 0.7 1.2
Mean SNV density/kb 0.02 0.008 0.03
There are significantly more heterozygous variants in repetitive
sequence than expected (p<1x10-16). BAC ends mapping discordantly
and in multiple loci are significantly enriched for segmental
duplications (p<1x10-5).
Identified 549 novel protein coding genes not annotated
in GRCh37
CHM1 BioNano Genome Map Aligned to GRCh38
GRCh38
CHM1 BioNano Map
~15kb additional data
BioNano SV Calls Identified a Assembly Problems
Collapse
Expansion
inAssembly
Gap in SequenceCHM1_1.1 Assembly
CHM1 BioNano Map
Conclusion
• Extremely diverse regions of the genome are difficult to
characterize due to issues distinguishing allelic from
paralogous duplications
• CHM1_1.1 highly contiguous single haplotype
representation of the genome
• Identified regions of misassembly or reference-ized
regions
• Utilize long read technology and nanopore technology to
attempt to fix these regions
Need to add more diversity to reference
• Finish another hydatidiform mole to platinum
status
• Finish 5 genomes to gold status
• NA19240 (Yoruban)
• NA12878 (European)
• HG00513 (Han Chinese)
• 2 “wildcards”
• Looking for underrepresented minority population
• Add high quality alternative sequences to
reference to create a population reference graph
or “pan genome”
Use colored de Bruijn graph structure to represent
population reference graph
Bioinformatic tool development in the future
• Alignment of short reads to population reference
graph
• Variant calling
• Variant reporting/Haplotype resolution
Adapted from Weinstein et al, 2009
The GRCh37 reference sequence was assembled
from three lymphoblastoid cell lines
Not a true haplotype
Incomplete
The CH17 haplotype is quite different from the reference
Novel insertion
The CH17 haplotype is quite different from the reference
Complex Indel
The CH17 haplotype is quite different from the reference
Hotspot/Recurrent Mutation
The CH17 haplotype is quite different from the reference
60 kbp Insertion
(Hotspot)
African Asian European
Duplication (influenza)
The CH17 haplotype is quite different from the reference
44 kbp Duplication
(influenza)
African Asian European
Summary of hydatidiform mole sequence
• 47 functional V genes
• 24 total variants (SNV and CNV) involving 29 IGHV
genes
• 5 structural variants
• 19 single nucleotide variants
• 15 non-synonymous mutations
• 20 out of 24 variants represent differences in amino acid
sequence or gene copy number
Summary of hydatidiform mole sequence
• 47 functional V genes
• 24 total variants (SNV and CNV) involving 29 IGHV
genes
• 5 structural variants
• 19 single nucleotide variants
• 15 non-synonymous mutations
• 20 out of 24 variants represent differences in amino acid
sequence or gene copy number
100 kbp of novel sequence
Current status of CHM1 resources
• CHORI-17 BAC Library (created from CHM1 cell line)
• CHORI-17 BAC end sequences (n=325,659)
• CHORI-17 multiple enzyme fingerprint map (1560 fpc contigs)
• CHORI-17 BACs (>750 have been sequenced, with 592 of them in
Genbank as phase 3)
• Active cell line
• >100X coverage Illumina 100bp reads
• 300, 500bp, 3kb inserts
• Reference assisted assembly CHM1_1.1
• BioNano genome map
• >50X coverage of PacBio long read data
CHM1_1.1 Assembly
• Reference-guided assembly – SRPRISM v2.3, R. Agarwala
• Alignment of Illumina reads to GRCh37 primary assembly
• CHORI-17 BAC clone tilepaths were then incorporated
• 428 total clones
• 324 clones in 45 tilepaths
• 104 clones as singletons
• Comparison back to GRCh37 reference to provide appropriate gaps
sizes
• Assembly submitted to Genbank
• http://www.ncbi.nlm.nih.gov/assembly/GCF_000306695.2
• Steinberg et al, 2014
• Genome Research (Dec;24(12):2066-76)
LILR (leukocyte
immunoglobulin-like
receptor)/KIR (killer
immunoglobulin receptor)
Immunoglobulin Kappa chain
Immunoglobulin Lambda chain
TCRA/B
17q21.31 inversion
polymorphism
Immunoglobulin
heavy chain locus
CYP2D6
SRGAP2
15q13.3
inversion
polymorphism

More Related Content

What's hot

Variation graphs and population assisted genome inference copy
Variation graphs and population assisted genome inference copyVariation graphs and population assisted genome inference copy
Variation graphs and population assisted genome inference copyGenome Reference Consortium
 
Understanding the reference assembly: CSHL Hackathon
Understanding the reference assembly: CSHL HackathonUnderstanding the reference assembly: CSHL Hackathon
Understanding the reference assembly: CSHL HackathonGenome Reference Consortium
 
Why graph genome storage and updating wakes me up at 4 am
Why graph genome storage and updating wakes me up at 4 amWhy graph genome storage and updating wakes me up at 4 am
Why graph genome storage and updating wakes me up at 4 amGenome Reference Consortium
 
Telomere-to-telomere assembly of a complete human chromosomes
Telomere-to-telomere assembly of a complete human chromosomesTelomere-to-telomere assembly of a complete human chromosomes
Telomere-to-telomere assembly of a complete human chromosomesGenome Reference Consortium
 
Getting the most from the reference assembly
Getting the most from the reference assemblyGetting the most from the reference assembly
Getting the most from the reference assemblyGenome Reference Consortium
 
Making genome edits in mammalian cells
Making genome edits in mammalian cellsMaking genome edits in mammalian cells
Making genome edits in mammalian cellsChris Thorne
 

What's hot (20)

Grc ashg2015 workshop_mudge
Grc ashg2015 workshop_mudgeGrc ashg2015 workshop_mudge
Grc ashg2015 workshop_mudge
 
Ashg2015 schneider final
Ashg2015 schneider finalAshg2015 schneider final
Ashg2015 schneider final
 
Grc workshop agbt2015_tg
Grc workshop agbt2015_tgGrc workshop agbt2015_tg
Grc workshop agbt2015_tg
 
Ashg2014 grc workshop_schneider
Ashg2014 grc workshop_schneiderAshg2014 grc workshop_schneider
Ashg2014 grc workshop_schneider
 
Ashg grc workshop2014_tg
Ashg grc workshop2014_tgAshg grc workshop2014_tg
Ashg grc workshop2014_tg
 
Variant Calling II
Variant Calling IIVariant Calling II
Variant Calling II
 
Variation graphs and population assisted genome inference copy
Variation graphs and population assisted genome inference copyVariation graphs and population assisted genome inference copy
Variation graphs and population assisted genome inference copy
 
GRCWorkshop_geval_1KG_slides
GRCWorkshop_geval_1KG_slidesGRCWorkshop_geval_1KG_slides
GRCWorkshop_geval_1KG_slides
 
Understanding the reference assembly: CSHL Hackathon
Understanding the reference assembly: CSHL HackathonUnderstanding the reference assembly: CSHL Hackathon
Understanding the reference assembly: CSHL Hackathon
 
TAGC2016 schneider
TAGC2016 schneiderTAGC2016 schneider
TAGC2016 schneider
 
Why graph genome storage and updating wakes me up at 4 am
Why graph genome storage and updating wakes me up at 4 amWhy graph genome storage and updating wakes me up at 4 am
Why graph genome storage and updating wakes me up at 4 am
 
Schneider grc workshop_final
Schneider grc workshop_finalSchneider grc workshop_final
Schneider grc workshop_final
 
agbt 2016 workshop lindsay
agbt 2016 workshop lindsayagbt 2016 workshop lindsay
agbt 2016 workshop lindsay
 
Telomere-to-telomere assembly of a complete human chromosomes
Telomere-to-telomere assembly of a complete human chromosomesTelomere-to-telomere assembly of a complete human chromosomes
Telomere-to-telomere assembly of a complete human chromosomes
 
Getting the most from the reference assembly
Getting the most from the reference assemblyGetting the most from the reference assembly
Getting the most from the reference assembly
 
Ashg grc workshop2015_tg
Ashg grc workshop2015_tgAshg grc workshop2015_tg
Ashg grc workshop2015_tg
 
Making genome edits in mammalian cells
Making genome edits in mammalian cellsMaking genome edits in mammalian cells
Making genome edits in mammalian cells
 
Agbt2015 workshop schneider
Agbt2015 workshop schneiderAgbt2015 workshop schneider
Agbt2015 workshop schneider
 
ABGT 2016 Workshop Schneider
ABGT 2016 Workshop SchneiderABGT 2016 Workshop Schneider
ABGT 2016 Workshop Schneider
 
Ashg2017 workshop schneider
Ashg2017 workshop schneiderAshg2017 workshop schneider
Ashg2017 workshop schneider
 

Viewers also liked

Creating Reference-Grade Human Genome Assemblies
Creating Reference-Grade Human Genome AssembliesCreating Reference-Grade Human Genome Assemblies
Creating Reference-Grade Human Genome AssembliesGenome Reference Consortium
 
Graph and assembly strategies for the MHC and ribosomal DNA regions
Graph and assembly strategies for the MHC and ribosomal DNA regionsGraph and assembly strategies for the MHC and ribosomal DNA regions
Graph and assembly strategies for the MHC and ribosomal DNA regionsGenome Reference Consortium
 
Variation reference graphs and the variation graph toolkit vg
Variation reference graphs and the variation graph toolkit vgVariation reference graphs and the variation graph toolkit vg
Variation reference graphs and the variation graph toolkit vgGenome Reference Consortium
 
Exploiting long read sequencing technology to build a substantially improved ...
Exploiting long read sequencing technology to build a substantially improved ...Exploiting long read sequencing technology to build a substantially improved ...
Exploiting long read sequencing technology to build a substantially improved ...Genome Reference Consortium
 
The Transforming Genetic Medicine Initiative (TGMI)
The Transforming Genetic Medicine Initiative (TGMI)The Transforming Genetic Medicine Initiative (TGMI)
The Transforming Genetic Medicine Initiative (TGMI)Genome Reference Consortium
 
ClinVar: Getting the most from the reference assembly and reference materials
ClinVar: Getting the most from the reference assembly and reference materialsClinVar: Getting the most from the reference assembly and reference materials
ClinVar: Getting the most from the reference assembly and reference materialsGenome Reference Consortium
 
Haplotype resolved structural variation assembly with long reads
Haplotype resolved structural variation assembly with long readsHaplotype resolved structural variation assembly with long reads
Haplotype resolved structural variation assembly with long readsGenome Reference Consortium
 
Creating Reference-Grade Human Genome Assemblies
Creating Reference-Grade Human Genome AssembliesCreating Reference-Grade Human Genome Assemblies
Creating Reference-Grade Human Genome AssembliesGenome Reference Consortium
 

Viewers also liked (13)

Creating Reference-Grade Human Genome Assemblies
Creating Reference-Grade Human Genome AssembliesCreating Reference-Grade Human Genome Assemblies
Creating Reference-Grade Human Genome Assemblies
 
Everyday de novo assembly
Everyday de novo assemblyEveryday de novo assembly
Everyday de novo assembly
 
Graph and assembly strategies for the MHC and ribosomal DNA regions
Graph and assembly strategies for the MHC and ribosomal DNA regionsGraph and assembly strategies for the MHC and ribosomal DNA regions
Graph and assembly strategies for the MHC and ribosomal DNA regions
 
Variation reference graphs and the variation graph toolkit vg
Variation reference graphs and the variation graph toolkit vgVariation reference graphs and the variation graph toolkit vg
Variation reference graphs and the variation graph toolkit vg
 
Exploiting long read sequencing technology to build a substantially improved ...
Exploiting long read sequencing technology to build a substantially improved ...Exploiting long read sequencing technology to build a substantially improved ...
Exploiting long read sequencing technology to build a substantially improved ...
 
Transitioning to gr_ch38
Transitioning to gr_ch38Transitioning to gr_ch38
Transitioning to gr_ch38
 
AGBT 2016 Workshop Magrini
AGBT 2016 Workshop MagriniAGBT 2016 Workshop Magrini
AGBT 2016 Workshop Magrini
 
The Transforming Genetic Medicine Initiative (TGMI)
The Transforming Genetic Medicine Initiative (TGMI)The Transforming Genetic Medicine Initiative (TGMI)
The Transforming Genetic Medicine Initiative (TGMI)
 
Everyday de novo diploid assembly
Everyday de novo diploid assemblyEveryday de novo diploid assembly
Everyday de novo diploid assembly
 
Genome in a Bottle
Genome in a BottleGenome in a Bottle
Genome in a Bottle
 
ClinVar: Getting the most from the reference assembly and reference materials
ClinVar: Getting the most from the reference assembly and reference materialsClinVar: Getting the most from the reference assembly and reference materials
ClinVar: Getting the most from the reference assembly and reference materials
 
Haplotype resolved structural variation assembly with long reads
Haplotype resolved structural variation assembly with long readsHaplotype resolved structural variation assembly with long reads
Haplotype resolved structural variation assembly with long reads
 
Creating Reference-Grade Human Genome Assemblies
Creating Reference-Grade Human Genome AssembliesCreating Reference-Grade Human Genome Assemblies
Creating Reference-Grade Human Genome Assemblies
 

Similar to 150224 grc kms

Aug2015 analysis team 04 10x genomics
Aug2015 analysis team 04 10x genomicsAug2015 analysis team 04 10x genomics
Aug2015 analysis team 04 10x genomicsGenomeInABottle
 
London Calling 2019: Karen Miga
London Calling 2019: Karen MigaLondon Calling 2019: Karen Miga
London Calling 2019: Karen MigaKaren Hayden Miga
 
SNPs Presentation Cavalcanti Lab
SNPs Presentation Cavalcanti LabSNPs Presentation Cavalcanti Lab
SNPs Presentation Cavalcanti Labjsrep91
 
Molecular Markers and Applications-Lecture.pdf
Molecular Markers and Applications-Lecture.pdfMolecular Markers and Applications-Lecture.pdf
Molecular Markers and Applications-Lecture.pdfaisha159367
 
Genomic Analyses: QTLs, etc.
Genomic Analyses:  QTLs, etc.Genomic Analyses:  QTLs, etc.
Genomic Analyses: QTLs, etc.gfb1
 
2007. stephen chanock. technologic issues in gwas and follow up studies
2007. stephen chanock. technologic issues in gwas and follow up studies2007. stephen chanock. technologic issues in gwas and follow up studies
2007. stephen chanock. technologic issues in gwas and follow up studiesFOODCROPS
 
Architecture and evolution of neochromosomes
Architecture and evolution of neochromosomesArchitecture and evolution of neochromosomes
Architecture and evolution of neochromosomesAnthony Papenfuss
 
DNA MARKERS 2023 DNA FINGERPRINTING TYPE OF METHODS OF DNA FINGERPRINTING
DNA MARKERS 2023 DNA FINGERPRINTING TYPE OF METHODS OF DNA FINGERPRINTINGDNA MARKERS 2023 DNA FINGERPRINTING TYPE OF METHODS OF DNA FINGERPRINTING
DNA MARKERS 2023 DNA FINGERPRINTING TYPE OF METHODS OF DNA FINGERPRINTINGshooterzgame09
 
Dna based tools in fish identification
Dna based tools in fish identificationDna based tools in fish identification
Dna based tools in fish identificationDEVIKA ANTHARJANAM
 
Isothermal Nucleic Acid Amplification Techniques
Isothermal Nucleic Acid Amplification TechniquesIsothermal Nucleic Acid Amplification Techniques
Isothermal Nucleic Acid Amplification TechniquesAref Farokhi Fard
 
How we revealed genomes secrets?
How we revealed genomes secrets? How we revealed genomes secrets?
How we revealed genomes secrets? ehsan sepahi
 
Genome evolution - tales of scales DNA to crops,months to billions of years, ...
Genome evolution - tales of scales DNA to crops,months to billions of years, ...Genome evolution - tales of scales DNA to crops,months to billions of years, ...
Genome evolution - tales of scales DNA to crops,months to billions of years, ...Pat (JS) Heslop-Harrison
 
Understanding mechanisms underlying human gene expression variation with RNA ...
Understanding mechanisms underlying human gene expression variation with RNA ...Understanding mechanisms underlying human gene expression variation with RNA ...
Understanding mechanisms underlying human gene expression variation with RNA ...Joseph Pickrell
 
Genetic variation in individual & population, polymorphism, Hardy-Weinberg Eq...
Genetic variation in individual & population, polymorphism, Hardy-Weinberg Eq...Genetic variation in individual & population, polymorphism, Hardy-Weinberg Eq...
Genetic variation in individual & population, polymorphism, Hardy-Weinberg Eq...maysoethu
 

Similar to 150224 grc kms (20)

Aug2015 analysis team 04 10x genomics
Aug2015 analysis team 04 10x genomicsAug2015 analysis team 04 10x genomics
Aug2015 analysis team 04 10x genomics
 
London Calling 2019: Karen Miga
London Calling 2019: Karen MigaLondon Calling 2019: Karen Miga
London Calling 2019: Karen Miga
 
SNPs Presentation Cavalcanti Lab
SNPs Presentation Cavalcanti LabSNPs Presentation Cavalcanti Lab
SNPs Presentation Cavalcanti Lab
 
Molecular Markers and Applications-Lecture.pdf
Molecular Markers and Applications-Lecture.pdfMolecular Markers and Applications-Lecture.pdf
Molecular Markers and Applications-Lecture.pdf
 
Genomic Analyses: QTLs, etc.
Genomic Analyses:  QTLs, etc.Genomic Analyses:  QTLs, etc.
Genomic Analyses: QTLs, etc.
 
Mapping genetic diversity through genetic markers
Mapping genetic diversity through genetic markersMapping genetic diversity through genetic markers
Mapping genetic diversity through genetic markers
 
2007. stephen chanock. technologic issues in gwas and follow up studies
2007. stephen chanock. technologic issues in gwas and follow up studies2007. stephen chanock. technologic issues in gwas and follow up studies
2007. stephen chanock. technologic issues in gwas and follow up studies
 
ArraySNP(Tsuchiya2012).ppt
ArraySNP(Tsuchiya2012).pptArraySNP(Tsuchiya2012).ppt
ArraySNP(Tsuchiya2012).ppt
 
Architecture and evolution of neochromosomes
Architecture and evolution of neochromosomesArchitecture and evolution of neochromosomes
Architecture and evolution of neochromosomes
 
Hamas 1
Hamas 1Hamas 1
Hamas 1
 
DNA MARKERS 2023 DNA FINGERPRINTING TYPE OF METHODS OF DNA FINGERPRINTING
DNA MARKERS 2023 DNA FINGERPRINTING TYPE OF METHODS OF DNA FINGERPRINTINGDNA MARKERS 2023 DNA FINGERPRINTING TYPE OF METHODS OF DNA FINGERPRINTING
DNA MARKERS 2023 DNA FINGERPRINTING TYPE OF METHODS OF DNA FINGERPRINTING
 
Dna based tools in fish identification
Dna based tools in fish identificationDna based tools in fish identification
Dna based tools in fish identification
 
Gene mapping
Gene mappingGene mapping
Gene mapping
 
Isothermal Nucleic Acid Amplification Techniques
Isothermal Nucleic Acid Amplification TechniquesIsothermal Nucleic Acid Amplification Techniques
Isothermal Nucleic Acid Amplification Techniques
 
CSHL
CSHLCSHL
CSHL
 
How we revealed genomes secrets?
How we revealed genomes secrets? How we revealed genomes secrets?
How we revealed genomes secrets?
 
Genome evolution - tales of scales DNA to crops,months to billions of years, ...
Genome evolution - tales of scales DNA to crops,months to billions of years, ...Genome evolution - tales of scales DNA to crops,months to billions of years, ...
Genome evolution - tales of scales DNA to crops,months to billions of years, ...
 
Understanding mechanisms underlying human gene expression variation with RNA ...
Understanding mechanisms underlying human gene expression variation with RNA ...Understanding mechanisms underlying human gene expression variation with RNA ...
Understanding mechanisms underlying human gene expression variation with RNA ...
 
MGG2003-cDNA-AFLP
MGG2003-cDNA-AFLPMGG2003-cDNA-AFLP
MGG2003-cDNA-AFLP
 
Genetic variation in individual & population, polymorphism, Hardy-Weinberg Eq...
Genetic variation in individual & population, polymorphism, Hardy-Weinberg Eq...Genetic variation in individual & population, polymorphism, Hardy-Weinberg Eq...
Genetic variation in individual & population, polymorphism, Hardy-Weinberg Eq...
 

More from Genome Reference Consortium

What's new and what's next for the human reference assembly?
What's new and what's next for the human reference assembly?What's new and what's next for the human reference assembly?
What's new and what's next for the human reference assembly?Genome Reference Consortium
 
Advancements in the human genome reference assembly (GRCh38)
Advancements in the human genome reference assembly (GRCh38)Advancements in the human genome reference assembly (GRCh38)
Advancements in the human genome reference assembly (GRCh38)Genome Reference Consortium
 
Genome variation graphs with the vg toolkit
Genome variation graphs with the vg toolkitGenome variation graphs with the vg toolkit
Genome variation graphs with the vg toolkitGenome Reference Consortium
 
The Matched Annotation from NCBI and EMBL-EBI (MANE) Project
The Matched Annotation from NCBI and EMBL-EBI (MANE) ProjectThe Matched Annotation from NCBI and EMBL-EBI (MANE) Project
The Matched Annotation from NCBI and EMBL-EBI (MANE) ProjectGenome Reference Consortium
 

More from Genome Reference Consortium (15)

What's new and what's next for the human reference assembly?
What's new and what's next for the human reference assembly?What's new and what's next for the human reference assembly?
What's new and what's next for the human reference assembly?
 
Advancements in the human genome reference assembly (GRCh38)
Advancements in the human genome reference assembly (GRCh38)Advancements in the human genome reference assembly (GRCh38)
Advancements in the human genome reference assembly (GRCh38)
 
Genome variation graphs with the vg toolkit
Genome variation graphs with the vg toolkitGenome variation graphs with the vg toolkit
Genome variation graphs with the vg toolkit
 
The Matched Annotation from NCBI and EMBL-EBI (MANE) Project
The Matched Annotation from NCBI and EMBL-EBI (MANE) ProjectThe Matched Annotation from NCBI and EMBL-EBI (MANE) Project
The Matched Annotation from NCBI and EMBL-EBI (MANE) Project
 
Mane v2 final
Mane v2 finalMane v2 final
Mane v2 final
 
Lrg and mane 16 oct 2018
Lrg and mane   16 oct 2018Lrg and mane   16 oct 2018
Lrg and mane 16 oct 2018
 
20181016 grc presentation-pa
20181016 grc presentation-pa20181016 grc presentation-pa
20181016 grc presentation-pa
 
2018 1016 trio_binning_ashg_arhie_final
2018 1016 trio_binning_ashg_arhie_final2018 1016 trio_binning_ashg_arhie_final
2018 1016 trio_binning_ashg_arhie_final
 
Ashg2017 workshop tg
Ashg2017 workshop tgAshg2017 workshop tg
Ashg2017 workshop tg
 
Ashg sedlazeck grc_share
Ashg sedlazeck grc_shareAshg sedlazeck grc_share
Ashg sedlazeck grc_share
 
171017 giab for giab grc workshop
171017 giab for giab grc workshop171017 giab for giab grc workshop
171017 giab for giab grc workshop
 
101717.kh miga ashg_grc
101717.kh miga ashg_grc101717.kh miga ashg_grc
101717.kh miga ashg_grc
 
AGBT2017 Reference Workshop: Fulton
AGBT2017 Reference Workshop: FultonAGBT2017 Reference Workshop: Fulton
AGBT2017 Reference Workshop: Fulton
 
AGBT2017 Reference Workshop: Schneider
AGBT2017 Reference Workshop: SchneiderAGBT2017 Reference Workshop: Schneider
AGBT2017 Reference Workshop: Schneider
 
AGBT2017 Reference Workshop: Lindsay
AGBT2017 Reference Workshop: LindsayAGBT2017 Reference Workshop: Lindsay
AGBT2017 Reference Workshop: Lindsay
 

Recently uploaded

Low Rate Call Girls Mumbai Suman 9910780858 Independent Escort Service Mumbai
Low Rate Call Girls Mumbai Suman 9910780858 Independent Escort Service MumbaiLow Rate Call Girls Mumbai Suman 9910780858 Independent Escort Service Mumbai
Low Rate Call Girls Mumbai Suman 9910780858 Independent Escort Service Mumbaisonalikaur4
 
VIP Call Girls Mumbai Arpita 9910780858 Independent Escort Service Mumbai
VIP Call Girls Mumbai Arpita 9910780858 Independent Escort Service MumbaiVIP Call Girls Mumbai Arpita 9910780858 Independent Escort Service Mumbai
VIP Call Girls Mumbai Arpita 9910780858 Independent Escort Service Mumbaisonalikaur4
 
Mumbai Call Girls Service 9910780858 Real Russian Girls Looking Models
Mumbai Call Girls Service 9910780858 Real Russian Girls Looking ModelsMumbai Call Girls Service 9910780858 Real Russian Girls Looking Models
Mumbai Call Girls Service 9910780858 Real Russian Girls Looking Modelssonalikaur4
 
Call Girls Service Chennai Jiya 7001305949 Independent Escort Service Chennai
Call Girls Service Chennai Jiya 7001305949 Independent Escort Service ChennaiCall Girls Service Chennai Jiya 7001305949 Independent Escort Service Chennai
Call Girls Service Chennai Jiya 7001305949 Independent Escort Service ChennaiNehru place Escorts
 
call girls in munirka DELHI 🔝 >༒9540349809 🔝 genuine Escort Service 🔝✔️✔️
call girls in munirka  DELHI 🔝 >༒9540349809 🔝 genuine Escort Service 🔝✔️✔️call girls in munirka  DELHI 🔝 >༒9540349809 🔝 genuine Escort Service 🔝✔️✔️
call girls in munirka DELHI 🔝 >༒9540349809 🔝 genuine Escort Service 🔝✔️✔️saminamagar
 
Glomerular Filtration and determinants of glomerular filtration .pptx
Glomerular Filtration and  determinants of glomerular filtration .pptxGlomerular Filtration and  determinants of glomerular filtration .pptx
Glomerular Filtration and determinants of glomerular filtration .pptxDr.Nusrat Tariq
 
call girls in Connaught Place DELHI 🔝 >༒9540349809 🔝 genuine Escort Service ...
call girls in Connaught Place  DELHI 🔝 >༒9540349809 🔝 genuine Escort Service ...call girls in Connaught Place  DELHI 🔝 >༒9540349809 🔝 genuine Escort Service ...
call girls in Connaught Place DELHI 🔝 >༒9540349809 🔝 genuine Escort Service ...saminamagar
 
Call Girls Hsr Layout Just Call 7001305949 Top Class Call Girl Service Available
Call Girls Hsr Layout Just Call 7001305949 Top Class Call Girl Service AvailableCall Girls Hsr Layout Just Call 7001305949 Top Class Call Girl Service Available
Call Girls Hsr Layout Just Call 7001305949 Top Class Call Girl Service Availablenarwatsonia7
 
Low Rate Call Girls Pune Esha 9907093804 Short 1500 Night 6000 Best call girl...
Low Rate Call Girls Pune Esha 9907093804 Short 1500 Night 6000 Best call girl...Low Rate Call Girls Pune Esha 9907093804 Short 1500 Night 6000 Best call girl...
Low Rate Call Girls Pune Esha 9907093804 Short 1500 Night 6000 Best call girl...Miss joya
 
Call Girls Jp Nagar Just Call 7001305949 Top Class Call Girl Service Available
Call Girls Jp Nagar Just Call 7001305949 Top Class Call Girl Service AvailableCall Girls Jp Nagar Just Call 7001305949 Top Class Call Girl Service Available
Call Girls Jp Nagar Just Call 7001305949 Top Class Call Girl Service Availablenarwatsonia7
 
Dwarka Sector 6 Call Girls ( 9873940964 ) Book Hot And Sexy Girls In A Few Cl...
Dwarka Sector 6 Call Girls ( 9873940964 ) Book Hot And Sexy Girls In A Few Cl...Dwarka Sector 6 Call Girls ( 9873940964 ) Book Hot And Sexy Girls In A Few Cl...
Dwarka Sector 6 Call Girls ( 9873940964 ) Book Hot And Sexy Girls In A Few Cl...rajnisinghkjn
 
Russian Call Girl Brookfield - 7001305949 Escorts Service 50% Off with Cash O...
Russian Call Girl Brookfield - 7001305949 Escorts Service 50% Off with Cash O...Russian Call Girl Brookfield - 7001305949 Escorts Service 50% Off with Cash O...
Russian Call Girl Brookfield - 7001305949 Escorts Service 50% Off with Cash O...narwatsonia7
 
Kolkata Call Girls Services 9907093804 @24x7 High Class Babes Here Call Now
Kolkata Call Girls Services 9907093804 @24x7 High Class Babes Here Call NowKolkata Call Girls Services 9907093804 @24x7 High Class Babes Here Call Now
Kolkata Call Girls Services 9907093804 @24x7 High Class Babes Here Call NowNehru place Escorts
 
Housewife Call Girls Hsr Layout - Call 7001305949 Rs-3500 with A/C Room Cash ...
Housewife Call Girls Hsr Layout - Call 7001305949 Rs-3500 with A/C Room Cash ...Housewife Call Girls Hsr Layout - Call 7001305949 Rs-3500 with A/C Room Cash ...
Housewife Call Girls Hsr Layout - Call 7001305949 Rs-3500 with A/C Room Cash ...narwatsonia7
 
Call Girls Service in Bommanahalli - 7001305949 with real photos and phone nu...
Call Girls Service in Bommanahalli - 7001305949 with real photos and phone nu...Call Girls Service in Bommanahalli - 7001305949 with real photos and phone nu...
Call Girls Service in Bommanahalli - 7001305949 with real photos and phone nu...narwatsonia7
 
Russian Call Girls Chickpet - 7001305949 Booking and charges genuine rate for...
Russian Call Girls Chickpet - 7001305949 Booking and charges genuine rate for...Russian Call Girls Chickpet - 7001305949 Booking and charges genuine rate for...
Russian Call Girls Chickpet - 7001305949 Booking and charges genuine rate for...narwatsonia7
 
VIP Call Girls Lucknow Nandini 7001305949 Independent Escort Service Lucknow
VIP Call Girls Lucknow Nandini 7001305949 Independent Escort Service LucknowVIP Call Girls Lucknow Nandini 7001305949 Independent Escort Service Lucknow
VIP Call Girls Lucknow Nandini 7001305949 Independent Escort Service Lucknownarwatsonia7
 
Bangalore Call Girls Marathahalli đź“ž 9907093804 High Profile Service 100% Safe
Bangalore Call Girls Marathahalli đź“ž 9907093804 High Profile Service 100% SafeBangalore Call Girls Marathahalli đź“ž 9907093804 High Profile Service 100% Safe
Bangalore Call Girls Marathahalli đź“ž 9907093804 High Profile Service 100% Safenarwatsonia7
 
Call Girl Bangalore Nandini 7001305949 Independent Escort Service Bangalore
Call Girl Bangalore Nandini 7001305949 Independent Escort Service BangaloreCall Girl Bangalore Nandini 7001305949 Independent Escort Service Bangalore
Call Girl Bangalore Nandini 7001305949 Independent Escort Service Bangalorenarwatsonia7
 
Call Girls Service Noida Maya 9711199012 Independent Escort Service Noida
Call Girls Service Noida Maya 9711199012 Independent Escort Service NoidaCall Girls Service Noida Maya 9711199012 Independent Escort Service Noida
Call Girls Service Noida Maya 9711199012 Independent Escort Service NoidaPooja Gupta
 

Recently uploaded (20)

Low Rate Call Girls Mumbai Suman 9910780858 Independent Escort Service Mumbai
Low Rate Call Girls Mumbai Suman 9910780858 Independent Escort Service MumbaiLow Rate Call Girls Mumbai Suman 9910780858 Independent Escort Service Mumbai
Low Rate Call Girls Mumbai Suman 9910780858 Independent Escort Service Mumbai
 
VIP Call Girls Mumbai Arpita 9910780858 Independent Escort Service Mumbai
VIP Call Girls Mumbai Arpita 9910780858 Independent Escort Service MumbaiVIP Call Girls Mumbai Arpita 9910780858 Independent Escort Service Mumbai
VIP Call Girls Mumbai Arpita 9910780858 Independent Escort Service Mumbai
 
Mumbai Call Girls Service 9910780858 Real Russian Girls Looking Models
Mumbai Call Girls Service 9910780858 Real Russian Girls Looking ModelsMumbai Call Girls Service 9910780858 Real Russian Girls Looking Models
Mumbai Call Girls Service 9910780858 Real Russian Girls Looking Models
 
Call Girls Service Chennai Jiya 7001305949 Independent Escort Service Chennai
Call Girls Service Chennai Jiya 7001305949 Independent Escort Service ChennaiCall Girls Service Chennai Jiya 7001305949 Independent Escort Service Chennai
Call Girls Service Chennai Jiya 7001305949 Independent Escort Service Chennai
 
call girls in munirka DELHI 🔝 >༒9540349809 🔝 genuine Escort Service 🔝✔️✔️
call girls in munirka  DELHI 🔝 >༒9540349809 🔝 genuine Escort Service 🔝✔️✔️call girls in munirka  DELHI 🔝 >༒9540349809 🔝 genuine Escort Service 🔝✔️✔️
call girls in munirka DELHI 🔝 >༒9540349809 🔝 genuine Escort Service 🔝✔️✔️
 
Glomerular Filtration and determinants of glomerular filtration .pptx
Glomerular Filtration and  determinants of glomerular filtration .pptxGlomerular Filtration and  determinants of glomerular filtration .pptx
Glomerular Filtration and determinants of glomerular filtration .pptx
 
call girls in Connaught Place DELHI 🔝 >༒9540349809 🔝 genuine Escort Service ...
call girls in Connaught Place  DELHI 🔝 >༒9540349809 🔝 genuine Escort Service ...call girls in Connaught Place  DELHI 🔝 >༒9540349809 🔝 genuine Escort Service ...
call girls in Connaught Place DELHI 🔝 >༒9540349809 🔝 genuine Escort Service ...
 
Call Girls Hsr Layout Just Call 7001305949 Top Class Call Girl Service Available
Call Girls Hsr Layout Just Call 7001305949 Top Class Call Girl Service AvailableCall Girls Hsr Layout Just Call 7001305949 Top Class Call Girl Service Available
Call Girls Hsr Layout Just Call 7001305949 Top Class Call Girl Service Available
 
Low Rate Call Girls Pune Esha 9907093804 Short 1500 Night 6000 Best call girl...
Low Rate Call Girls Pune Esha 9907093804 Short 1500 Night 6000 Best call girl...Low Rate Call Girls Pune Esha 9907093804 Short 1500 Night 6000 Best call girl...
Low Rate Call Girls Pune Esha 9907093804 Short 1500 Night 6000 Best call girl...
 
Call Girls Jp Nagar Just Call 7001305949 Top Class Call Girl Service Available
Call Girls Jp Nagar Just Call 7001305949 Top Class Call Girl Service AvailableCall Girls Jp Nagar Just Call 7001305949 Top Class Call Girl Service Available
Call Girls Jp Nagar Just Call 7001305949 Top Class Call Girl Service Available
 
Dwarka Sector 6 Call Girls ( 9873940964 ) Book Hot And Sexy Girls In A Few Cl...
Dwarka Sector 6 Call Girls ( 9873940964 ) Book Hot And Sexy Girls In A Few Cl...Dwarka Sector 6 Call Girls ( 9873940964 ) Book Hot And Sexy Girls In A Few Cl...
Dwarka Sector 6 Call Girls ( 9873940964 ) Book Hot And Sexy Girls In A Few Cl...
 
Russian Call Girl Brookfield - 7001305949 Escorts Service 50% Off with Cash O...
Russian Call Girl Brookfield - 7001305949 Escorts Service 50% Off with Cash O...Russian Call Girl Brookfield - 7001305949 Escorts Service 50% Off with Cash O...
Russian Call Girl Brookfield - 7001305949 Escorts Service 50% Off with Cash O...
 
Kolkata Call Girls Services 9907093804 @24x7 High Class Babes Here Call Now
Kolkata Call Girls Services 9907093804 @24x7 High Class Babes Here Call NowKolkata Call Girls Services 9907093804 @24x7 High Class Babes Here Call Now
Kolkata Call Girls Services 9907093804 @24x7 High Class Babes Here Call Now
 
Housewife Call Girls Hsr Layout - Call 7001305949 Rs-3500 with A/C Room Cash ...
Housewife Call Girls Hsr Layout - Call 7001305949 Rs-3500 with A/C Room Cash ...Housewife Call Girls Hsr Layout - Call 7001305949 Rs-3500 with A/C Room Cash ...
Housewife Call Girls Hsr Layout - Call 7001305949 Rs-3500 with A/C Room Cash ...
 
Call Girls Service in Bommanahalli - 7001305949 with real photos and phone nu...
Call Girls Service in Bommanahalli - 7001305949 with real photos and phone nu...Call Girls Service in Bommanahalli - 7001305949 with real photos and phone nu...
Call Girls Service in Bommanahalli - 7001305949 with real photos and phone nu...
 
Russian Call Girls Chickpet - 7001305949 Booking and charges genuine rate for...
Russian Call Girls Chickpet - 7001305949 Booking and charges genuine rate for...Russian Call Girls Chickpet - 7001305949 Booking and charges genuine rate for...
Russian Call Girls Chickpet - 7001305949 Booking and charges genuine rate for...
 
VIP Call Girls Lucknow Nandini 7001305949 Independent Escort Service Lucknow
VIP Call Girls Lucknow Nandini 7001305949 Independent Escort Service LucknowVIP Call Girls Lucknow Nandini 7001305949 Independent Escort Service Lucknow
VIP Call Girls Lucknow Nandini 7001305949 Independent Escort Service Lucknow
 
Bangalore Call Girls Marathahalli đź“ž 9907093804 High Profile Service 100% Safe
Bangalore Call Girls Marathahalli đź“ž 9907093804 High Profile Service 100% SafeBangalore Call Girls Marathahalli đź“ž 9907093804 High Profile Service 100% Safe
Bangalore Call Girls Marathahalli đź“ž 9907093804 High Profile Service 100% Safe
 
Call Girl Bangalore Nandini 7001305949 Independent Escort Service Bangalore
Call Girl Bangalore Nandini 7001305949 Independent Escort Service BangaloreCall Girl Bangalore Nandini 7001305949 Independent Escort Service Bangalore
Call Girl Bangalore Nandini 7001305949 Independent Escort Service Bangalore
 
Call Girls Service Noida Maya 9711199012 Independent Escort Service Noida
Call Girls Service Noida Maya 9711199012 Independent Escort Service NoidaCall Girls Service Noida Maya 9711199012 Independent Escort Service Noida
Call Girls Service Noida Maya 9711199012 Independent Escort Service Noida
 

150224 grc kms

  • 1. Characterizing extreme diversity in the human genome using a single haplotype genomic resource Karyn Meltz Steinberg, Ph.D. AGBT 2015 GRC Workshop @KMS_Meltzy
  • 2. 1 bp 1 chr Frequency SNP Trisomy monosomy Copy number variants Size of variant 1 kb 1 Mb Types of genetic variants Slide courtesy of S. Girirajan Human Genetic Variation
  • 3. 1 bp 1 chr Frequency SNP Trisomy monosomy Copy number variants Size of variant 1 kb 1 Mb Types of genetic variants 1 bp 1 chr Throughput 1 kb 1 Mb Size of variant How do we assay them? Slide courtesy of S. Girirajan Human Genetic Variation
  • 4. 1 bp 1 chr Frequency SNP Trisomy monosomy Copy number variants Size of variant 1 kb 1 Mb Types of genetic variants SNP genotyping 1 bp 1 chr Throughput 1 kb 1 Mb Size of variant How do we assay them? Slide courtesy of S. Girirajan Human Genetic Variation
  • 5. 1 bp 1 chr Frequency SNP Trisomy monosomy Copy number variants Size of variant 1 kb 1 Mb Types of genetic variants Array-CGH Karyotyping SNP genotyping 1 bp 1 chr Throughput 1 kb 1 Mb Size of variant How do we assay them? Slide courtesy of S. Girirajan Human Genetic Variation
  • 6. 1 bp 1 chr Frequency SNP Trisomy monosomy Copy number variants Size of variant 1 kb 1 Mb Types of genetic variants Array-CGH Karyotyping Sequencing SNP genotyping 1 bp 1 chr Throughput 1 kb 1 Mb Size of variant How do we assay them? Slide courtesy of S. Girirajan Human Genetic Variation
  • 7. Extreme diversity in the human genome • <99.5% identity to the reference • Refractory to traditional sequencing efforts • Loci often contain gene families associated with immune response and xenobiotic metabolism
  • 8. HLA is a classic example of an extremely diverse locus • Critical to immune response • Characterized by overdominant selection • Alleles are linked and segregate as distinct haplotypes • Shaped by gene duplication and diversification
  • 9.
  • 10.
  • 11.
  • 12.
  • 13. Segmental duplications can predispose loci to further rearrangement via NAHR
  • 14. Segmental duplications can predispose loci to further rearrangement via NAHR
  • 15. A A C T C G C C Repeat Copies (noted by color difference) Allelic Copies Diploid Genome With a diploid genome, there is significant ambiguity sorting allelic copies from repeat copies A C C C Haploid Genome Repeat Copies (ONLY but noted by color differences) With a haploid genome, allelic differences are eliminated, and base differences are likely indicative of repeat copies
  • 17. SRGAP2 Homology between genes Shows nearly identical segments between SRGAP2A and SRGAP2 paralogs Shows homology between SRGAP2B and SRGAP2C Dennis, et.al. 2012 SRGAP2A SRGAP2B SRGAP2C
  • 18. 1q21 1q21 patch alignment to chromosome 1 1q32 1q21 1p21
  • 19. Hydatidiform mole Let’s sequence and assemble the whole genome!
  • 20. CHM1_1.1 Assembly • Reference-guided assembly • SRPRISM v2.3, R. Agarwala • Alignment of Illumina reads to GRCh37 primary assembly • CHORI-17 BAC clone tilepaths were then incorporated • 428 total clones • 324 clones in 45 tilepaths • 104 clones as singletons http://www.ncbi.nlm.nih.gov/assembly/GCF_000306695. 2 Total Sequence Length 3,037,866,619 bp Total Assembly Gap Length 210,229,812 bp Number of Scaffolds 163 Scaffold N50 50,362,920 bp CHM1 Assembly Paper - Genome Research Steinberg et al. 2014
  • 21. CHM1_1.1 assembly is highly contiguous compared to other WGS based assemblies
  • 22. Integrating BAC tiling paths improved assembly
  • 23. Integrating BAC tiling paths improved assembly
  • 24. Alignment of CHM1 Illumina data to assembly revealed regions of extreme heterogeneity Heterozygous Homozygous Total Variants 64033 22513 86546 In RepeatMasked (RM) sequence 37060 14833 51893 In Segmental duplication (SD) 30670 4843 35513 In RM and SD 51466 17174 68640 Ts:Tv 1.5 0.7 1.2 Mean SNV density/kb 0.02 0.008 0.03 There are significantly more heterozygous variants in repetitive sequence than expected (p<1x10-16). BAC ends mapping discordantly and in multiple loci are significantly enriched for segmental duplications (p<1x10-5).
  • 25. Identified 549 novel protein coding genes not annotated in GRCh37
  • 26. CHM1 BioNano Genome Map Aligned to GRCh38 GRCh38 CHM1 BioNano Map ~15kb additional data
  • 27. BioNano SV Calls Identified a Assembly Problems Collapse Expansion inAssembly Gap in SequenceCHM1_1.1 Assembly CHM1 BioNano Map
  • 28. Conclusion • Extremely diverse regions of the genome are difficult to characterize due to issues distinguishing allelic from paralogous duplications • CHM1_1.1 highly contiguous single haplotype representation of the genome • Identified regions of misassembly or reference-ized regions • Utilize long read technology and nanopore technology to attempt to fix these regions
  • 29.
  • 30.
  • 31. Need to add more diversity to reference • Finish another hydatidiform mole to platinum status • Finish 5 genomes to gold status • NA19240 (Yoruban) • NA12878 (European) • HG00513 (Han Chinese) • 2 “wildcards” • Looking for underrepresented minority population • Add high quality alternative sequences to reference to create a population reference graph or “pan genome”
  • 32. Use colored de Bruijn graph structure to represent population reference graph
  • 33. Bioinformatic tool development in the future • Alignment of short reads to population reference graph • Variant calling • Variant reporting/Haplotype resolution
  • 34. Adapted from Weinstein et al, 2009
  • 35. The GRCh37 reference sequence was assembled from three lymphoblastoid cell lines Not a true haplotype Incomplete
  • 36. The CH17 haplotype is quite different from the reference
  • 37. Novel insertion The CH17 haplotype is quite different from the reference
  • 38. Complex Indel The CH17 haplotype is quite different from the reference
  • 39. Hotspot/Recurrent Mutation The CH17 haplotype is quite different from the reference
  • 41. Duplication (influenza) The CH17 haplotype is quite different from the reference
  • 43. Summary of hydatidiform mole sequence • 47 functional V genes • 24 total variants (SNV and CNV) involving 29 IGHV genes • 5 structural variants • 19 single nucleotide variants • 15 non-synonymous mutations • 20 out of 24 variants represent differences in amino acid sequence or gene copy number
  • 44. Summary of hydatidiform mole sequence • 47 functional V genes • 24 total variants (SNV and CNV) involving 29 IGHV genes • 5 structural variants • 19 single nucleotide variants • 15 non-synonymous mutations • 20 out of 24 variants represent differences in amino acid sequence or gene copy number 100 kbp of novel sequence
  • 45. Current status of CHM1 resources • CHORI-17 BAC Library (created from CHM1 cell line) • CHORI-17 BAC end sequences (n=325,659) • CHORI-17 multiple enzyme fingerprint map (1560 fpc contigs) • CHORI-17 BACs (>750 have been sequenced, with 592 of them in Genbank as phase 3) • Active cell line • >100X coverage Illumina 100bp reads • 300, 500bp, 3kb inserts • Reference assisted assembly CHM1_1.1 • BioNano genome map • >50X coverage of PacBio long read data
  • 46. CHM1_1.1 Assembly • Reference-guided assembly – SRPRISM v2.3, R. Agarwala • Alignment of Illumina reads to GRCh37 primary assembly • CHORI-17 BAC clone tilepaths were then incorporated • 428 total clones • 324 clones in 45 tilepaths • 104 clones as singletons • Comparison back to GRCh37 reference to provide appropriate gaps sizes • Assembly submitted to Genbank • http://www.ncbi.nlm.nih.gov/assembly/GCF_000306695.2 • Steinberg et al, 2014 • Genome Research (Dec;24(12):2066-76)
  • 47. LILR (leukocyte immunoglobulin-like receptor)/KIR (killer immunoglobulin receptor) Immunoglobulin Kappa chain Immunoglobulin Lambda chain TCRA/B 17q21.31 inversion polymorphism Immunoglobulin heavy chain locus CYP2D6 SRGAP2 15q13.3 inversion polymorphism

Editor's Notes

  1. There are many types of genetic variation in the human genome ranging from single nucleotide variants up to chromosomal abnormalities. Copy number variants fall within these two types of variation.
  2. The techniques we use to assay these variants depends on the size of the variant and this in turn affects the throughput
  3. SNP genotyping has very high throughput but the amount of genome that can be assayed is small
  4. Array CGH is ideal for copy number variants but the throughput is much lower than SNP genotyping and is less effective for smaller variants Karyotyping can detect large abnormalities and the throughput is similar to Array CGH
  5. Finally sequencing should be able to assay all forms of genetic variation and the goal of next generation sequencing is to keep increasing the throughput
  6. So I became interested in how to effectively identify and assay extreme genetic diversity. We expect any two human haplotypes to be 99.8-100% identical to one another. We define extreme genetic diversity as less than 99.5% identity to the reference sequence which is approximately 1 variant per 500 base pairs. These loci are often refractory to traditional sequencing efforts and are enriched for genes and gene families related to immune response and environmental detoxification.
  7. The human leukocyte antigen locus is a classic example of high diversity in the human genome. HLA is critical to human disease as it binds to the T cell receptor and plays a critical role in antigen processing and presentation. The locus is characterized by overdominant selection where individuals with a heterozygous genotype have higher fitness than those with the homozygous genotype. Alleles at this locus are linked and segregate as distinct haplotypes and the locus has been shaped by extensive gene duplication and diversification.
  8. Now what do I mean by gene duplication and diversification? Let’s say there is a gene with 4 functions that duplicates. It has a few different fates.
  9. First the two copies could each acquire different mutations that inactivate certain functions but overall the two genes retain the same function as before
  10. Secondly the duplicate copy could acquire mutations that endow it with novel functions
  11. Third the duplicate gene could acquire mutations to the point that it no longer functions leading to loss of the gene or pseudogenization
  12. A duplicated region of the genome that is larger than 1 kilobase and has greater than 90% identity is called a segmental duplication. These duplications can predispose loci to further rearrangement via non-allelic homologous recombination. This can lead to deletions or duplications of intervening sequence
  13. Segmental duplications may also lead to inversions. They are an important part of the genome’s architecture as they serve as hotspots that can create more complex architecture and diversity.
  14. Here is an example of one of those segmentally duplicated regions. The SRGAP2 gene family maps to 3 specific regions on chr 1. All three of these loci were poorly assembled in GRCh37, This region was resequenced using the CH17 single haplotype BAC library. This diagram shows the homologous regions using Miropeats, where the green lines indicate nearly identical segments between SRGAP2A and the duplicate paralogs. The blue lines delineate the larger extent the homology between SRGAP2B and C. Notice the scale of these region, the red boxed regions are 244 kb of sequence that is nearly identical among among all three loci. In GRCh37, these regions contained multiple haplotypes and were very fragmented By sequencing these regions with the CH17 BACs, we were able to full resolve all three of these regions
  15. Here is another representation of this region. The graphic at the bottom shows the alignment of the fix patch to the reference assembly. The blue boxes highlight sequences on the patches that were completely missing from GRCh37. By sequencing these regions from a single haplotype source, approximately 500kb of sequence, that was previously misassembled, incorrectly oriented or completely missing, has now been resolved.
  16. As I mentioned, the CHM1_1.1, which is the Illumina-based assembly, was a reference guided assembly created by Richa Agarwala at NCBI, using her SRPRISM assembler. The process involves alignment of the Illumina data to the GRCh37 primary assembly. One thing that is unique about this assembly compared to other whole genome assemblies is the fact that we used many BAC tilepaths in segmentally duplicated regions. There were 45 total paths used in the assembly and then another 104 singletons that were included. This assembly is available for download from Genbank using the following link. We have recently published a paper on this assembly with further analysis.
  17. At the time this assembly was generated, it had the longest N50 contig length of any human whole genome assembly in Genbank.
  18. Another CHM1 resource I mentioned is the BioNano Genome Map. BioNano is a nanochannel technology where the DNA is nicked and labeled at specific recognition sites, so you end up with nick sites along the DNA molecules in context, similar to a restriction digest, only you have the added benefit of the nicks being in context. This is showing the same region as the previous example – The top green line here is the GRCh38 reference in silico representation and the bottom blue lines are the map. The alignment of these two data sets shows the same size discrepancy, so the BioNano map data confirms the extra data found in CHM1 at this position. If this data were accessioned, we could add this sequence to the reference, it would become a fix patch if we think there is an error in GRCh38, or a novel patch if we believe this to be variation
  19. Here is another way to use the BioNano map to identify potential assembly issues. This time, I have the BioNano map aligned to the CHM1 Illumina assembly. The green bar represents our CHM1 Illumina based sequence assembly and the blue bars are the map contigs. The first tag, labeled here as a collapse is indicating that there is more data present here in the map contig. To look into the possibility that we have a collapse in our assembly, we examined the Illumina reads aligned back to the CHM1_1.1 assembly. We found that the reads were piling up in the region indicating a collapse, so it looks as if the BioNano map is correct through this region. The expansion label is indicating that there is too much data in the assembly. Within this region we have a gap in our assembly, so our gap is likely sized too big. I think in this example, a standard 50Kb gap size was used in the assembly, which likely indicates that we were not sure what the size was.
  20. Immunoglobulin molecules are formed when somatic recombination occurs between one V, one D and one J gene.
  21. The second major issue is that the current reference sequence was assembled from three lymphoblastoid cell lines. These will be subject to somatic recombination and don’t reflect a true haplotype
  22. Across the V gene region we find that the CH17 haplotype is quite different from the reference haplotype. I have indicated the reference genome sequence on the bottom—each of the green boxes is a functional V gene. On the top line is the mole haplotype and as you can see there are many differences both allelic as well as structural. We identified and characterized five large structural variants including the novel insertion of the 7-4-1 gene, an insertion deletion event where two genes were deleted and replaced with two additional genes and a duplication and insertion event at V1-69 and V2-70. Interestingly, V1-69 is preferentially used against hemagglutinin epitopes of particular influenza strains and copy number correlates with expression levels.
  23. Across the V gene region we find that the CH17 haplotype is quite different from the reference haplotype. I have indicated the reference genome sequence on the bottom—each of the green boxes is a functional V gene. On the top line is the mole haplotype and as you can see there are many differences both allelic as well as structural. We identified and characterized five large structural variants including the novel insertion of the 7-4-1 gene, an insertion deletion event where two genes were deleted and replaced with two additional genes and a duplication and insertion event at V1-69 and V2-70. Interestingly, V1-69 is preferentially used against hemagglutinin epitopes of particular influenza strains and copy number correlates with expression levels.
  24. Across the V gene region we find that the CH17 haplotype is quite different from the reference haplotype. I have indicated the reference genome sequence on the bottom—each of the green boxes is a functional V gene. On the top line is the mole haplotype and as you can see there are many differences both allelic as well as structural. We identified and characterized five large structural variants including the novel insertion of the 7-4-1 gene, an insertion deletion event where two genes were deleted and replaced with two additional genes and a duplication and insertion event at V1-69 and V2-70. Interestingly, V1-69 is preferentially used against hemagglutinin epitopes of particular influenza strains and copy number correlates with expression levels.
  25. Across the V gene region we find that the CH17 haplotype is quite different from the reference haplotype. I have indicated the reference genome sequence on the bottom—each of the green boxes is a functional V gene. On the top line is the mole haplotype and as you can see there are many differences both allelic as well as structural. We identified and characterized five large structural variants including the novel insertion of the 7-4-1 gene, an insertion deletion event where two genes were deleted and replaced with two additional genes and a duplication and insertion event at V1-69 and V2-70. Interestingly, V1-69 is preferentially used against hemagglutinin epitopes of particular influenza strains and copy number correlates with expression levels.
  26. This is reflected in the enormous and highly significant Fst values shown here.
  27. Across the V gene region we find that the CH17 haplotype is quite different from the reference haplotype. I have indicated the reference genome sequence on the bottom—each of the green boxes is a functional V gene. On the top line is the mole haplotype and as you can see there are many differences both allelic as well as structural. We identified and characterized five large structural variants including the novel insertion of the 7-4-1 gene, an insertion deletion event where two genes were deleted and replaced with two additional genes and a duplication and insertion event at V1-69 and V2-70. Interestingly, V1-69 is preferentially used against hemagglutinin epitopes of particular influenza strains and copy number correlates with expression levels.
  28. Using Fst we see that this locus is significantly differentiated between the Asians and Africans.
  29. We identify and annotate 47 functional V genes, 27 functional D genes and six functional J genes in the mole haplotype. There are 7 functional V genes present in mole not in reference; 3 V genes deleted from mole
  30. In conclusion, using the CH17 hydatidiform mole BAC library, we identify approximately 100 kbp of novel sequence not present in the human reference
  31. Knowing the utility of this single haploid source, it was decided to sequence the whole genome of CHM1. At the time we started this project, Illumina data was the only cost effective method of generating a whole genome assembly. We generated over 100X coverage of Illumina paired end reads, a reference guided assembly was produced using this data. More recently Pac Bio generated >50X coverage of CHM1 in long read data and we have also had a BioNano Genome map generated.
  32. As I mentioned, the CHM1_1.1 assembly was a reference guided assembly created by Richa Agarwala at NCBI, using her SRPRISM assembler. The process involves alignment of the Illumina data to the GRCh37 primary assembly. One thing that is unique about this assembly compared to other whole genome assemblies is the fact that we used many BAC tilepaths in segmentally duplicated regions. There were 45 total paths used in the assembly and then another 104 singletons that were included. Then a final step was done to compare the assembly back to GRCh37 to provide appropriate gap sizes. This assembly is available for download from Genbank using the following link. We also have a paper that should be published soon and you can go to the Bioarchive to find it there.