Graph and assembly strategies for the
MHC and ribosomal DNA regions
Alexander Dilthey
The MHC is the zebrafish of the genome!
(model region)
PRGs – Population Reference Graphs
• Simple: acyclic, directed (sub-class of general variation graphs)
• Usually built from MSA, preserve gap positions
(i.e. global homology between input sequences).
• Generative model: Recombination
• Ploidy well-defined (0, 1, 2)
[Figure: example PRG; shared segments (TA, CT, A, G) joined by alternative edges carrying bases or gap characters (_)]
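The PRG properties listed above can be illustrated with a minimal sketch. It assumes the simplest possible construction, in which every MSA column becomes one graph level and the edge labels at that level are the distinct characters (including the gap character) seen in that column. The function names `build_prg`, `count_paths` and `contains` are hypothetical and are not taken from any published PRG implementation.

```python
from math import prod

GAP = "_"  # gap character, as used in the MSA examples on these slides


def build_prg(msa):
    """Collapse an MSA into one level per column.

    Each level is the sorted list of distinct edge labels (bases or GAP)
    observed in that column, so gap positions are preserved.
    """
    length = len(msa[0])
    if any(len(row) != length for row in msa):
        raise ValueError("all MSA rows must have the same length")
    return [sorted({row[i] for row in msa}) for i in range(length)]


def count_paths(prg):
    """Number of distinct paths, i.e. haplotypes reachable by recombination."""
    return prod(len(labels) for labels in prg)


def contains(prg, aligned_sequence):
    """True if the gapped sequence is spelled by some path through the graph."""
    return len(aligned_sequence) == len(prg) and all(
        char in labels for char, labels in zip(aligned_sequence, prg)
    )


# Toy MSA (hypothetical sequences, not from the talk).
msa = ["AC_T", "ACGT", "TC_T"]
prg = build_prg(msa)
print(prg)               # [['A', 'T'], ['C'], ['G', '_'], ['T']]
print(count_paths(prg))  # 4 paths from 3 input sequences
```

In this toy graph the path `TCGT` is accepted by `contains` although it is not one of the input sequences: it is a recombinant of rows 2 and 3, which is what the recombination-based generative model allows.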
Outline
• Quick recap:
What we know about the utility of graph genome approaches
• New results:
Haplotyping in hypervariable regions (HLA)
Pseudo graph alignment
• De novo assembly of ribosomal DNA
In most of the MHC, single-reference
approaches work just fine…
[Figure: number of kmers (millions), split into kmers recovered and kmers not recovered, for PGF reference, Platypus, PRG-Viterbi and PRG-Mapped]
+ long-read validation with consistent results (not shown)
Dilthey et al., Nature Genetics 2015
… graph genomes outperform in the most
complex sub-region of the MHC …
Dilthey et al., Nature Genetics 2015
… remaining problems driven by incomplete
input haplotypes + algorithmics.
[Figure: aligned kmers; x-axis: chromotype position (kb), y-axis: read position (kb)]
Incomplete input haplotypes:
Large uncharacterized inversion
Algorithmics:
Incorrect HLA haplotyping.
Dilthey et al., Nature Genetics 2015
HLA haplotyping
• Hypothesis: Whole-genome sequencing data contains the information
necessary for accurate HLA typing
• “HLA typing”: HLA gene exon sequences
• HLA class I: exons 2 and 3
• HLA class II: exon 2
• Challenge: align reads to the right gene – homology hell.
• Proper read-to-graph alignment instead of k-Mers.
Class I exon homology
Exon 2 Exon 3
HLA-A 3284 alleles
HLA-B 4077 alleles
HLA-C 2799 alleles
Approach: deep PRG + mapping
Exonic MSA
T*01:01 _ _ A C G T A C T _ _
T*01:02 C A A C A T A C T _ _
T*01:03 _ _ A C G C G C T _ _
T*01:04 _ _ A T C C G C T A C
T*01:05 _ _ A T C C C C T _ _
T*01:06 _ _ _ C C T A C T _ _
Genomic MSA
T*01:01 A G C A _ _ A C G T A C T _ _ C C T A
T*01:02 A C C A C A A C A T A C T _ _ C C T A
T*01:04 _ T T A _ _ A T C C G C T A C C C T A
8 xMHC reference haplotypes
PGF (with T*01:01) A C T A G C A _ _ A C G T A C T _ _ C C T A T G A
MANN (with T*01:04) T T T _ T T A _ _ A T C C G C T A C C C T A T G A
1) Gene-only PRG – 46 (pseudo) genes, mostly HLA
[Figure: Gene 1 |--NNN--| Gene 2 |--NNN--| Gene 3; each gene laid out as Padding, UTR, Exon 1, Intron 1, Exon 2, UTR, Padding]
[Figure: number of reference sequences along the PRG; region covered by 'genomic' sequences]
2) Varying numbers of input sequences across PRG
3) Use hierarchical MSA approach to combine in
[Figure: read aligned to the PRG across levels 1 to 26; at variant levels the graph branches into alternative edges carrying bases or gap characters (_)]
4) Seed-and-extend paired-end mapping to PRG
5) Likelihood-based inference: maximize L( aligned reads | HLA types )
(independently per locus)
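Step 5 can be sketched as follows for a single locus. This is a minimal illustration under stated assumptions, not the model used in the talk: reads are taken as already aligned without gaps to a known start position, sequencing errors are uniform with rate `error_rate`, and each read is assumed to come from either of the two alleles with probability 0.5. The function names are hypothetical; the allele names and sequences are the toy alleles from the exonic MSA slide with gap characters removed.

```python
from itertools import combinations_with_replacement
from math import log


def read_likelihood(read, start, allele, error_rate=0.01):
    """P(read | allele) for a gap-free alignment starting at `start`."""
    likelihood = 1.0
    for offset, base in enumerate(read):
        if allele[start + offset] == base:
            likelihood *= 1.0 - error_rate
        else:
            likelihood *= error_rate / 3.0  # any of the 3 other bases
    return likelihood


def best_genotype(aligned_reads, alleles, error_rate=0.01):
    """Return the allele pair maximizing log L(aligned reads | HLA types)."""
    best_pair, best_ll = None, float("-inf")
    for a1, a2 in combinations_with_replacement(sorted(alleles), 2):
        ll = sum(
            log(
                0.5 * read_likelihood(read, start, alleles[a1], error_rate)
                + 0.5 * read_likelihood(read, start, alleles[a2], error_rate)
            )
            for read, start in aligned_reads
        )
        if ll > best_ll:
            best_pair, best_ll = (a1, a2), ll
    return best_pair, best_ll


# Toy alleles (ungapped versions of rows from the exonic MSA slide).
alleles = {
    "T*01:01": "ACGTACT",
    "T*01:03": "ACGCGCT",
    "T*01:05": "ATCCCCT",
}
# Hypothetical aligned reads: (sequence, 0-based start position).
reads = [("ACGT", 0), ("TACT", 3), ("ATCC", 0), ("CCCT", 3)]
pair, _ = best_genotype(reads, alleles)
print(pair)  # ('T*01:01', 'T*01:05')
```

Running the same function separately for each locus corresponds to the "independently per locus" note above.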
High-quality WGS data enables gold-standard
accuracy
(of note: 2/3 original discrepancies with validation data were errors in the validation data!)
… but not from exome, MiSeq data
Sequencing error?
Effective fragment length? [2 × read length + insert size (IS)]
Conclusion (intermediate)
• If the input sequencing data is “good enough”, we manage near-perfect
haplotyping in the genome's most polymorphic region
• Effective fragment length likely the most important factor
• Not-so-good sequencing data: joint haplotyping + alignment
(i.e. alignment location is not independent of inferred haplotype)
• Read mapping implementation SLOW
Pseudo graph mapping
Input sequences
Graph
Align short reads to input sequences...
... transpose onto graph
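The "transpose onto graph" step can be sketched as a coordinate lookup, assuming the graph levels correspond one-to-one to the columns of the MSA the graph was built from. The function names are hypothetical, and columns are 0-based here, whereas the slides count from 1.

```python
GAP = "_"


def column_index(msa_row):
    """MSA column of every non-gap character of one aligned input sequence."""
    return [col for col, char in enumerate(msa_row) if char != GAP]


def transpose(read_start, read_length, msa_row):
    """Map a gap-free linear alignment onto MSA (graph level) coordinates.

    `read_start` is the 0-based start of the read on the ungapped input
    sequence; the result lists the graph level of every read base.
    """
    columns = column_index(msa_row)
    return columns[read_start:read_start + read_length]


# Seq1 from the scrubbing slide; its ungapped form is AACACTTT.
seq1_row = "AACAC_TTT"
print(column_index(seq1_row))     # [0, 1, 2, 3, 4, 6, 7, 8]
print(transpose(3, 4, seq1_row))  # [3, 4, 6, 7]
```

In the second call the read jumps from level 4 to level 6: on the graph it traverses the gap edge at level 5, which a linear aligner never sees.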
Scrubbing, cutting, cleaning
Input MSA         Lin. alignment    MSA coor.     Scrubbed
     123456789                      123456X789    123456789
Seq1 AACAC_TTT    Seq1 AACAC_TTT    AACAC__TTT    AACAC_TTT
Seq2 TTCACGTTT    Read AACACGTTT    AACAC_GTTT    AACACGTTT

Graph TTCAC {-, G} TTT
Scrubbing: get rid of INDEL-induced changes in the alignment coordinate system
Cutting: Examine alignment gap structure; cut in “bad” areas; use longest stretch
Cleaning: Find the best gap-less sequence-to-graph alignment + extension with gaps
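One possible reading of the scrubbing step, based only on the example above, is sketched here: when the read carries an insertion relative to the input sequence and the input sequence has a gap column at that point in the MSA, the inserted base is placed into the existing column instead of opening a new one (the "X" column). The function name `scrub` and the handling of insertions that cannot be absorbed are assumptions; the sketch also assumes the pairwise alignment starts at the first base of the input sequence.

```python
GAP = "_"


def scrub(msa_row, ref_aln, read_aln):
    """Project a pairwise alignment onto MSA columns without adding columns.

    msa_row  : the input sequence as it appears in the MSA (with gaps)
    ref_aln  : the input sequence row of the linear pairwise alignment
    read_aln : the read row of the linear pairwise alignment
    Returns the read in MSA coordinates plus the insertions that found no
    free gap column, as (column, base) pairs.
    """
    read_in_msa = [GAP] * len(msa_row)
    unplaced = []
    col = 0  # next MSA column to fill
    for ref_char, read_char in zip(ref_aln, read_aln):
        if ref_char == GAP:
            # Insertion relative to the input sequence: reuse an MSA gap
            # column if the input sequence has one right here.
            if col < len(msa_row) and msa_row[col] == GAP:
                read_in_msa[col] = read_char
                col += 1
            else:
                unplaced.append((col, read_char))
        else:
            # Skip MSA gap columns not used by an insertion, then place
            # the aligned read character (base or deletion).
            while msa_row[col] == GAP:
                col += 1
            read_in_msa[col] = read_char
            col += 1
    return "".join(read_in_msa), unplaced


# Example from the slide (0-based columns internally).
print(scrub("AACAC_TTT", "AACAC_TTT", "AACACGTTT"))  # ('AACACGTTT', [])
```

Insertions returned in the second element would still need their own treatment, for example in the cutting or cleaning steps; the sketch does not attempt that.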
Graph alignment
123456789
Graph AACACGTTT
Seq1 AACACGTTT
Accuracy slightly worse; fast!
Conclusion: perhaps there is a middle ground between graph and linear sequence
alignment. Work in progress. Further tuning?
Highest resolution

                          MHC-PRG-2                        HLA*PRG
Cohort         Locus  N   Inferred  Accuracy  Call Rate    Inferred  Accuracy  Call Rate
Platinum Trio  A      6   6         1.00      1.00         6         1.00      1.00
Platinum Trio  B      6   6         1.00      1.00         6         1.00      1.00
Platinum Trio  C      6   6         1.00      1.00         6         1.00      1.00
Platinum Trio  DQA1   6   6         1.00      1.00         6         1.00      1.00
Platinum Trio  DQB1   6   6         1.00      1.00         6         1.00      1.00
Platinum Trio  DRB1   6   6         1.00      1.00         6         1.00      1.00
1000 Genomes   A      22  22        0.86      1.00         22        1.00      1.00
1000 Genomes   B      22  22        1.00      1.00         22        1.00      1.00
1000 Genomes   C      22  22        1.00      1.00         22        1.00      1.00
1000 Genomes   DQA1   12  12        1.00      1.00         12        1.00      1.00
1000 Genomes   DQB1   22  22        1.00      1.00         22        1.00      1.00
1000 Genomes   DRB1   22  22        0.91      1.00         22        0.95      1.00
Towards additional high-quality reference
haplotypes…
Remaining challenges: extreme repeats, haplotypes.
Sergey Koren
Ribosomal DNA
• Encodes ribosomal RNA
• Hundreds of copies
(tandem repeat arrays)
• Variation poorly characterized
• Step 1: Targeted approach
• Step 2: WGS-based
• Step 3: Variation graph
Read error vs variation
… from whole-genome data?
Long reads  de Bruijn graph Technology!
6% > 50k
Summary
• Variation graphs are worth the effort – at least in highly complex regions.
• Evidence: MHC “model system”
+ overall improvement of genome inference accuracy
+ complex-locus haplotyping
• Incorporate LD?
• Middle ground between full graph alignment and linear sequence
alignment?
• Ribosomal DNA – let me know if you're also interested!
Acknowledgements
NIH
Adam Phillippy
Sergey Koren
Brian Walenz
Jung-Hyun Kim
Vladimir Larionov
Oxford
Gil McVean
Zam Iqbal
Alexander Mentzer
Histogenetics
Nezih Cereb
UCSF/Nantes
Pierre-Antoine Gourraud
GSK
Matt Nelson
Charles Cox

Call Girls Horamavu WhatsApp Number 7001035870 Meeting With Bangalore EscortsCall Girls Horamavu WhatsApp Number 7001035870 Meeting With Bangalore Escorts
Call Girls Horamavu WhatsApp Number 7001035870 Meeting With Bangalore Escorts
 
Call Girls Faridabad Just Call 9907093804 Top Class Call Girl Service Available
Call Girls Faridabad Just Call 9907093804 Top Class Call Girl Service AvailableCall Girls Faridabad Just Call 9907093804 Top Class Call Girl Service Available
Call Girls Faridabad Just Call 9907093804 Top Class Call Girl Service Available
 

Graph and assembly strategies for the MHC and ribosomal DNA regions

  • 1. Graph and assembly strategies for the MHC and ribosomal DNA regions Alexander Dilthey
  • 2. The MHC is the zebrafish of the genome! (model region)
  • 3. PRGs – Population Reference Graphs • Simple: acyclic, directed (sub-class of general variation graphs) • Usually built from MSA, preserve gap positions (i.e. global homology between input sequences). • Generative model: Recombination • Ploidy well-defined (0, 1, 2) [Figure: example PRG with branching nucleotide and gap nodes]
  • 4. Outline • Quick recap: What we know about the utility of graph genome approaches • New results: Haplotyping in hypervariable regions (HLA) Pseudo graph alignment • De novo assembly of ribosomal DNA
  • 5. In most of the MHC, single-reference approaches work just fine… [Figure: number of kmers (millions), recovered vs. not recovered, for PGF reference, Platypus, PRG-Viterbi and PRG-Mapped] + long-read validation with consistent results (not shown) Dilthey et al., Nature Genetics 2015
  • 6. … graph genomes outperform in the most complex sub-region of the MHC … Dilthey et al., Nature Genetics 2015
  • 7. … remaining problems driven by incomplete input haplotypes + algorithmics. [Figure: aligned kmers, chromotype position (kb) vs. read position (kb)] Incomplete input haplotypes: Large uncharacterized inversion. Algorithmics: Incorrect HLA haplotyping. Dilthey et al., Nature Genetics 2015
  • 8. HLA haplotyping • Hypothesis: Whole-genome sequencing data contains the information necessary for accurate HLA typing • “HLA typing” → HLA gene exon sequences • HLA class I: exons 2 and 3 • HLA class II: exon 2 • Challenge: align reads to the right gene – homology hell. • Proper read-to-graph alignment instead of k-mers.
  • 9. Class I exon homology [Figure panels: Exon 2, Exon 3] HLA-A: 3284 alleles; HLA-B: 4077 alleles; HLA-C: 2799 alleles
  • 10. Approach: deep PRG + mapping
      Exonic MSA:
        T*01:01  _ _ A C G T A C T _ _
        T*01:02  C A A C A T A C T _ _
        T*01:03  _ _ A C G C G C T _ _
        T*01:04  _ _ A T C C G C T A C
        T*01:05  _ _ A T C C C C T _ _
        T*01:06  _ _ _ C C T A C T _ _
      Genomic MSA:
        T*01:01  A G C A _ _ A C G T A C T _ _ C C T A
        T*01:02  A C C A C A A C A T A C T _ _ C C T A
        T*01:04  _ T T A _ _ A T C C G C T A C C C T A
      8 xMHC reference haplotypes:
        PGF (with T*01:01)   A C T A G C A _ _ A C G T A C T _ _ C C T A T G A
        MANN (with T*01:04)  T T T _ T T A _ _ A T C C G C T A C C C T A T G A
      1) Gene-only PRG – 46 (pseudo) genes, mostly HLA [Figure: Gene 1 |--NNN--| Gene 2 |--NNN--| Gene 3; gene layout: Padding, UTR, Exon 1, Intron 1, Exon 2, UTR, Padding; number of reference sequences along the gene; region covered by 'genomic' sequences]
      2) Varying numbers of input sequences across PRG
      3) Use hierarchical MSA approach to combine in
  • 11. Approach: deep PRG + mapping [Figure: a read aligned against PRG levels 1–26] 4) Seed-and-extend paired-end mapping to PRG 5) Likelihood-based inference: maximize L( aligned reads | HLA types ) (independently per locus)
  • 12. High-quality WGS data enables gold-standard accuracy (of note: 2/3 original discrepancies with validation data were errors in the validation data!)
  • 13. … but not from exome, MiSeq data
  • 15. Effective fragment length? [2 × read length + insert size (IS)]
  • 16. Conclusion (intermediate) • If the input sequencing data is "good enough", we manage near-perfect haplotyping in the genome's most polymorphic region • Effective fragment length likely the most important factor • Not-so-good sequencing data: joint haplotyping + alignment (i.e. alignment location is not independent of inferred haplotype) • Read mapping implementation SLOW
  • 18. Pseudo graph mapping Input sequences Graph
  • 19. Pseudo graph mapping Input sequences Graph Align short reads to input sequences...
  • 20. Pseudo graph mapping Input sequences Graph Align short reads to input sequences... ... transpose onto graph
  • 21. Scrubbing, cutting, cleaning
      Input MSA         Lin. alignment    MSA coor.     Scrubbed
            123456789                     123456X789    123456789
      Seq1  AACAC_TTT   Seq1  AACAC_TTT   AACAC__TTT    AACAC_TTT
      Seq2  TTCACGTTT   Read  AACACGTTT   AACAC_GTTT    AACACGTTT
      [Figure: graph built from the input MSA]
      Scrubbing: get rid of INDEL-induced changes in the alignment coordinate system
      Cutting: Examine alignment gap structure; cut in "bad" areas; use longest stretch
      Cleaning: Find the best gap-less sequence-to-graph alignment + extension with gaps
      Graph alignment:
             123456789
      Graph  AACACGTTT
      Seq1   AACACGTTT
  • 22. Accuracy slightly worse; fast! Conclusion: perhaps there is a middle ground between graph and linear sequence alignment. Work in progress. Further tuning?
      Highest resolution:
                                 MHC-PRG-2                       HLA*PRG
      Cohort         Locus  N    Inferred  Accuracy  Call Rate   Inferred  Accuracy  Call Rate
      Platinum Trio  A      6    6         1.00      1.00        6         1.00      1.00
      Platinum Trio  B      6    6         1.00      1.00        6         1.00      1.00
      Platinum Trio  C      6    6         1.00      1.00        6         1.00      1.00
      Platinum Trio  DQA1   6    6         1.00      1.00        6         1.00      1.00
      Platinum Trio  DQB1   6    6         1.00      1.00        6         1.00      1.00
      Platinum Trio  DRB1   6    6         1.00      1.00        6         1.00      1.00
      1000 Genomes   A      22   22        0.86      1.00        22        1.00      1.00
      1000 Genomes   B      22   22        1.00      1.00        22        1.00      1.00
      1000 Genomes   C      22   22        1.00      1.00        22        1.00      1.00
      1000 Genomes   DQA1   12   12        1.00      1.00        12        1.00      1.00
      1000 Genomes   DQB1   22   22        1.00      1.00        22        1.00      1.00
      1000 Genomes   DRB1   22   22        0.91      1.00        22        0.95      1.00
  • 23. Towards additional high-quality reference haplotypes… Remaining challenges: extreme repeats, haplotypes. Sergey Koren
  • 24. Ribosomal DNA • Encodes ribosomal RNA • Hundreds of copies (tandem repeat arrays) • Variation poorly characterized • Step 1: Targeted approach • Step 2: WGS-based • Step 3: Variation graph
  • 25. Read error vs variation … from whole-genome data? Long reads → de Bruijn graph. Technology! 6% > 50k
  • 26. Summary • Variation graphs are worth the effort – at least in highly complex regions. • Evidence: MHC "model system" + overall improvement of genome inference accuracy + complex-locus haplotyping • Incorporate LD? • Middle ground between full graph alignment and linear sequence alignment? • Ribosomal DNA – let me know if you're also interested!
  • 27. Acknowledgements • NIH: Adam Phillippy, Sergey Koren, Brian Walenz, Jung-Hyun Kim, Vladimir Larionov • Oxford: Gil McVean, Zam Iqbal, Alexander Mentzer • Histogenetics: Nezih Cereb • UCSF/Nantes: Pierre-Antoine Gourraud • GSK: Matt Nelson, Charles Cox
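
Slides 3 and 10 describe a PRG as an acyclic, directed graph built from an MSA, with gap positions preserved and recombination as the generative model. A minimal Python sketch of that idea follows, using the exonic MSA from slide 10. The function names (`build_prg`, `is_path`) and the one-node-per-symbol-per-column layout are assumptions made for illustration, not the construction used in the talk.

```python
# Toy PRG: one level per MSA column, one node per distinct symbol in that column.
# Edges connect symbols that are adjacent in at least one input sequence.
# This layout is an illustrative assumption, not the published data structure.

EXONIC_MSA = {
    "T*01:01": "__ACGTACT__",
    "T*01:02": "CAACATACT__",
    "T*01:03": "__ACGCGCT__",
    "T*01:04": "__ATCCGCTAC",
    "T*01:05": "__ATCCCCT__",
    "T*01:06": "___CCTACT__",
}


def build_prg(msa):
    """Return (nodes, edges) for equal-length aligned rows; '_' is a gap symbol."""
    lengths = {len(row) for row in msa.values()}
    if len(lengths) != 1:
        raise ValueError("all MSA rows must have the same length")
    n_cols = lengths.pop()
    # nodes[i] lists the symbols observed in column i (gaps are kept as nodes).
    nodes = [sorted({row[i] for row in msa.values()}) for i in range(n_cols)]
    # edges[(i, symbol)] is the set of symbols reachable in column i + 1.
    edges = {}
    for row in msa.values():
        for i in range(n_cols - 1):
            edges.setdefault((i, row[i]), set()).add(row[i + 1])
    return nodes, edges


def is_path(nodes, edges, seq):
    """True if seq (in MSA coordinates, gaps included) is a walk through the graph."""
    if len(seq) != len(nodes) or seq[0] not in nodes[0]:
        return False
    return all(seq[i + 1] in edges.get((i, seq[i]), ()) for i in range(len(seq) - 1))


nodes, edges = build_prg(EXONIC_MSA)

# A recombinant of T*01:01 (prefix) and T*01:04 (suffix): not an input row,
# but still a valid path, which is the sense in which the graph is generative.
recombinant = "__ACGTACTAC"
print(is_path(nodes, edges, recombinant))
```

Because every level corresponds to one MSA column, all edges point from column i to column i + 1, so the graph is acyclic by construction and each haplotype visits every level exactly once.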
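
Slide 11 states the inference step as maximizing L( aligned reads | HLA types ) independently per locus. The sketch below shows one simple way such a likelihood could be evaluated for a single locus. Everything specific in it is an assumption for illustration: the uniform per-base error rate, the equal 0.5 weighting of the two alleles, the fixed read positions, and the names `read_likelihood` and `best_genotype`. The allele strings are columns 3 to 9 of the slide-10 exonic MSA, and the reads are made up.

```python
import math
from itertools import combinations_with_replacement

# Gap-free slice (columns 3-9) of the exonic MSA shown on slide 10.
ALLELES = {
    "T*01:01": "ACGTACT",
    "T*01:02": "ACATACT",
    "T*01:03": "ACGCGCT",
    "T*01:04": "ATCCGCT",
    "T*01:05": "ATCCCCT",
}

# Hypothetical aligned reads as (start position, bases); invented for this example.
READS = [(0, "ACGT"), (3, "TACT"), (0, "ATCC"), (3, "CGCT")]


def read_likelihood(start, read, allele, err=0.01):
    """P(read | allele) under an assumed uniform substitution error rate."""
    p = 1.0
    for offset, base in enumerate(read):
        p *= (1.0 - err) if allele[start + offset] == base else err / 3.0
    return p


def best_genotype(alleles, reads, err=0.01):
    """Return the allele pair with the highest log-likelihood, and that value."""
    best_pair, best_ll = None, -math.inf
    for a, b in combinations_with_replacement(sorted(alleles), 2):
        # Each read is assumed to come from either chromosome with probability 0.5.
        ll = sum(
            math.log(
                0.5 * read_likelihood(s, r, alleles[a], err)
                + 0.5 * read_likelihood(s, r, alleles[b], err)
            )
            for s, r in reads
        )
        if ll > best_ll:
            best_pair, best_ll = (a, b), ll
    return best_pair, best_ll


genotype, loglik = best_genotype(ALLELES, READS)
print(genotype, round(loglik, 3))
```

With these toy reads, two support T*01:01 and two support T*01:04, so the heterozygous pair scores highest. Enumerating all pairs is quadratic in the number of alleles, which is acceptable for a sketch but would need pruning for loci with thousands of alleles such as those on slide 9.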
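
Slides 19 to 21 outline pseudo graph mapping: align reads to the linear input sequences, then transpose the alignment onto graph (MSA) coordinates and scrub INDEL-induced coordinate changes. The following sketch reproduces only the slide-21 example, where a read with an inserted G is aligned to Seq1 and the G ends up in the MSA column where Seq1 has a gap. The function name `transpose_to_msa` and the rule "place an inserted base in the next column if the input sequence has a gap there" are simplifying assumptions, not the talk's actual scrubbing, cutting and cleaning procedure.

```python
def transpose_to_msa(aligned_row, alignment, gap="_"):
    """Map a linear alignment onto MSA columns.

    aligned_row: the input sequence as it appears in the MSA (with gaps).
    alignment:   list of (sequence position or None, read base); None marks a
                 base inserted relative to the input sequence.
    Returns a list of (MSA column or None, read base); None means the inserted
    base has no matching MSA column under this simplified rule.
    """
    # Position in the ungapped sequence -> column in the MSA.
    columns = [col for col, sym in enumerate(aligned_row) if sym != gap]
    placed = []
    last_col = -1
    for seq_pos, base in alignment:
        if seq_pos is not None:
            col = columns[seq_pos]
        elif last_col + 1 < len(aligned_row) and aligned_row[last_col + 1] == gap:
            # The insertion fits a gap column of the input sequence, so no
            # extra coordinate ("X" on slide 21) is needed.
            col = last_col + 1
        else:
            col = None
        placed.append((col, base))
        if col is not None:
            last_col = col
    return placed


# Slide 21 example: Seq1 is AACAC_TTT in the MSA; the read AACACGTTT aligns to
# the ungapped Seq1 (AACACTTT) with one inserted G after position 4.
seq1_row = "AACAC_TTT"
linear_alignment = [
    (0, "A"), (1, "A"), (2, "C"), (3, "A"), (4, "C"),
    (None, "G"),
    (5, "T"), (6, "T"), (7, "T"),
]
placed = transpose_to_msa(seq1_row, linear_alignment)
print(placed)
```

In this example the read occupies MSA columns 0 to 8 (1 to 9 in the slide's 1-based numbering), matching the "Scrubbed" panel.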