Using long-read data to reveal variation and advance the human reference genome
Peter Audano
Motivation
• GRCh38 – 3.10 Gbp
• 3.26 Gbp with patches/ALTs
• Needs diversity: 70% derives from a single donor library (RP11)
• Structural variants (SVs) affect many bases
• Indels ≥ 50 bp and inversions
• 11 Mbp / genome (7x more than SNPs/indels)
• More likely to be an eQTL
• Illumina cannot capture all SVs
• 53% DEL, 22% INS (Chaisson 2018)
Alkan (2011)
Project goals
1. Sequence-resolve common structural variation
2. Correct errors and minor alleles in the reference
3. Build an alternate reference to support SV analysis with Illumina data
Constructing a diversity panel
GRC Sequenced
• CHM1 (Mole)
• CHM13 (Mole)
• HG00514 (CHB)
• HG00733 (PUR)
• NA19240 (YRI)
• HG02818 (GWD)
• NA19434 (LWK)
• HG01352 (CLM)
• HG02059 (KHV)
• NA12878 (CEU)
• HG04217 (ITU)
• HG02106 (PEL)
• HG00268 (FIN)
Public
• AK1 (Korean)
• Seo 2016
• HX1 (Chinese)
• Shi 2016
• New PacBio data on 11 genomes.
• 7 are new long-read biological samples.
• Selected females to balance X
Building a non-redundant discovery set
• 99,604 SVs
• 21.3 Mbp INS
• 18.5 Mbp DEL
• 2,238 shared
• 1.2 Mbp INS
• 0.1 Mbp DEL
• 5 coding
• 160 regulatory
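One way to picture how calls from many samples collapse into a non-redundant set is sketched below. The slides do not state the merging rule, so this assumes a 50% reciprocal-overlap criterion between calls of the same type, which is a common convention rather than the method used here. The `SV` tuple, `reciprocal_overlap`, and `merge_calls` are hypothetical names for illustration.

```python
from typing import NamedTuple


class SV(NamedTuple):
    chrom: str
    start: int    # 0-based start on the reference
    length: int   # SV length in bp
    svtype: str   # "INS" or "DEL"
    sample: str


def reciprocal_overlap(a: SV, b: SV) -> float:
    """Shared interval as a fraction of the longer call.

    Insertions are treated as if they spanned [start, start + length),
    an assumption made so that one rule covers both SV types.
    """
    shared = min(a.start + a.length, b.start + b.length) - max(a.start, b.start)
    if shared <= 0:
        return 0.0
    return shared / max(a.length, b.length)


def merge_calls(calls, min_ro=0.5):
    """Greedily cluster calls; return a list of (representative, samples)."""
    merged = []  # each entry: [representative SV, set of sample names]
    for call in sorted(calls, key=lambda c: (c.chrom, c.svtype, c.start)):
        for entry in merged:
            rep = entry[0]
            if (rep.chrom == call.chrom and rep.svtype == call.svtype
                    and reciprocal_overlap(rep, call) >= min_ro):
                entry[1].add(call.sample)
                break
        else:
            # No existing cluster matched, so this call starts a new one.
            merged.append([call, {call.sample}])
    return [(rep, sorted(samples)) for rep, samples in merged]


calls = [
    SV("chr1", 1000, 300, "DEL", "HG00514"),
    SV("chr1", 1010, 310, "DEL", "HG00733"),  # same event, shifted breakpoints
    SV("chr1", 1000, 300, "INS", "NA19240"),  # different type, kept separate
    SV("chr2", 5000, 60, "DEL", "HG00514"),
]
nonredundant = merge_calls(calls)
```

Variants found in every sample would then be the entries whose sample list covers the whole panel.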
Discovery set growth slows
Merged set variant length distribution
• 223 full-length L1 elements
• 3,383 full-length AluY elements
Half the call set is novel; most is sequence-resolved for the first time
90% of the genome is resolved
• CEN / Peri-CEN
• 254.5 Mbp
• Low depth
• 0.4 Mbp
• Unresolved DUP
• 19.9 Mbp
• High-identity segmental duplications
Recovered 200 missing bases in FOXO6 and corrected its reading frame
Credit: Max Dougherty
VNTR distribution is non-random
• VNTR enrichment in subtelomeres
• 4.8-fold enrichment (Wilcoxon p = 2.9 × 10⁻⁹)
• Correlates with male meiotic recombination and double-strand breaks
Credit: Arvis Sulovari
Patching GRCh38
• Add SVs to reference on alternate contigs
• Map reads with an ALT-aware aligner
• Recovers 2.62% of unmapped reads
• Improves mapping quality for 25.68% of SV-insertion mapped reads
• 2,228 SNPs and indels per sample within SV insertions (GQ 20+)
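As a rough illustration of placing an SV on an alternate contig, the sketch below builds a contig for one insertion by joining reference flanks around the inserted sequence. The slides do not describe how the contigs were constructed, so the flank size and the `build_alt_contig` function are assumptions made for this example only.

```python
def build_alt_contig(ref_seq, pos, ins_seq, flank=1000):
    """Return (alt_sequence, ref_start, ref_end) for an insertion at `pos`.

    `pos` is the 0-based reference coordinate where `ins_seq` is inserted.
    `flank` (hypothetical default) is how much reference sequence to keep
    on each side so that reads can anchor on the contig.
    """
    if not 0 <= pos <= len(ref_seq):
        raise ValueError("pos must lie within the reference sequence")
    ref_start = max(0, pos - flank)
    ref_end = min(len(ref_seq), pos + flank)
    alt = ref_seq[ref_start:pos] + ins_seq + ref_seq[pos:ref_end]
    return alt, ref_start, ref_end


# Toy example: 20 bp reference, 5 bp insertion at position 10, 4 bp flanks.
reference = "ACGT" * 5
alt_seq, ref_start, ref_end = build_alt_contig(reference, 10, "TTTTT", flank=4)
```

The returned coordinates give the span of the primary assembly that the contig corresponds to, which an ALT-aware aligner needs in some form to relate the two.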
Genotyping SVs in Illumina samples
[Figure labels: GRCh38 primary contig, SV contig]
• Extract features around SVs
• Train a machine learning model to predict genotypes
• 91-95% accuracy
• 15% no-call
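The two summary figures above can be computed from a set of predicted and known genotypes as sketched below. This assumes accuracy is measured only over genotypes that received a call, which the slides do not state. `genotype_summary` is a hypothetical helper and is not part of SMRT-SV.

```python
def genotype_summary(predicted, truth):
    """Return (accuracy among called genotypes, no-call rate).

    `predicted` holds genotype strings such as "0/1", with None for a no-call.
    `truth` holds the known genotype for the same variant/sample pairs.
    """
    if not predicted or len(predicted) != len(truth):
        raise ValueError("inputs must be non-empty and of equal length")
    called = [(p, t) for p, t in zip(predicted, truth) if p is not None]
    no_call_rate = 1 - len(called) / len(predicted)
    if called:
        accuracy = sum(p == t for p, t in called) / len(called)
    else:
        accuracy = float("nan")  # nothing was called, so accuracy is undefined
    return accuracy, no_call_rate


predicted = ["0/1", "1/1", None, "0/0", "0/1"]
truth = ["0/1", "1/1", "0/1", "0/0", "1/1"]
accuracy, no_call_rate = genotype_summary(predicted, truth)
```

In this toy input, three of the four called genotypes match and one of five is a no-call.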
Genotyper is accurate over SV types and coverage
Genotyping enables eQTL and sQTL analysis
• 376 samples (avg. 6-fold)
• 379 SV eQTLs (411 genes)
• 34 significant after accounting for SNP eQTLs
• 244 SV sQTLs (197 genes)
Credit: Yang Li and Ankeeta Shah
Resources available soon
• Variant calls
• VCF of SVs linked to contig breakpoints
• BAM of contigs
• SMRT-SV v2 genotyper
• Contigs (PRJNA481779)
• Variants in dbVar (nstd163)
• Patched reference
• ALTs for BWA-MEM
• Graph for vg (Garrison 2018)
Future work
• Sequence 50 additional genomes
• Phase genomes
• 10X and Strand-Seq
• Phased-SV (Chaisson 2017, bioRxiv)
• Genotype additional genomes
• 2,500 high-coverage 1000 Genomes samples
• 10,000 autism genomes
• Improve the human reference
• Patch GRCh38
• Build a human pangenome reference
Acknowledgments
• Evan Eichler
• Tonia Brown
• Arvis Sulovari
• David Gordon
• Benson Hsieh
• Zev Kronenberg
• Tina Graves-Lindsay
• Susan Dutcher
• Wesley Warren
• Vince Magrini
• Sean McGrath
• Richard Wilson
• Yang Li
• Ankeeta Shah
Supplementary Slides
15 long-read samples
Sample   Population  Super-population  Source     New Data  Accession    Platform  Mean Depth  Longest Coverage  Subread Coverage  Longest N50  Subread N50
CHM1     Mole        NA                Reference  No        PRJNA246220  RS II     65x         63x               66x               19,728       19,226
CHM13    Mole        NA                Reference  No        PRJNA269593  RS II     67x         63x               72x               11,954       11,320
HG00514  CHB         EA                Reference  Yes*      PRJNA300843  RS II     76x         93x               104x              17,472       16,653
HG00733  PUR         AMR               Reference  Yes*      PRJNA300840  RS II     55x         63x               69x               16,195       15,461
NA19240  YRI         AFR               Reference  Yes*      PRJNA288807  RS II     59x         65x               71x               17,343       16,584
HG02818  GWD         AFR               Reference  Yes       PRJNA339722  RS II     79x         90x               98x               16,807       16,221
NA19434  LWK         AFR               Reference  Yes       PRJNA385272  RS II     59x         62x               71x               17,635       16,853
HG01352  CLM         AMR               Reference  Yes       PRJNA339719  RS II     56x         69x               75x               20,738       20,049
HG02059  KHV         SA                Reference  Yes       PRJNA339726  RS II     64x         71x               77x               18,533       17,890
NA12878  CEU         EUR               Reference  Yes*      PRJNA323611  RS II     50x         66x               75x               17,121       16,376
HG04217  ITU         SA                Reference  Yes       PRJNA481794  RS II     40x         46x               51x               18,149       16,871
HG02106  PEL         AMR               Reference  Yes       PRJNA480858  Sequel    73x         66x               69x               21,540       20,646
HG00268  FIN         EUR               Reference  Yes       PRJNA480712  Sequel    85x         76x               79x               25,245       24,487
AK1      Korean      EA                Public     No        PRJNA298944  RS II     77x         89x               102x              15,609       14,721
HX1      Chinese     EA                Public     No        PRJNA301527  RS II     98x         79x               103x              13,412       12,002
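The N50 columns in the table summarize read-length distributions. For readers unfamiliar with the statistic, the sketch below uses the standard definition: the length L such that reads of length L or longer contain at least half of all sequenced bases. It is an illustration, not the code used to produce the table.

```python
def n50(lengths):
    """Return the N50 of a collection of read (or contig) lengths."""
    total = sum(lengths)
    running = 0
    # Walk from the longest read down until half of all bases are covered.
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    raise ValueError("lengths must be non-empty")


example_n50 = n50([2, 3, 4, 5, 6])
```

Because the statistic weights reads by their length, it is usually higher than the mean read length.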
Methods
• SMRT-SV
• Align reads to GRCh38
• Assemble in windows (40-60 kbp)
• Polish assemblies (quiver/arrow)
• Align assemblies to GRCh38
• Call SVs
• SV calls (VCF) with contig breakpoints
• Contig BAM
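The windowed assembly step can be pictured as tiling each chromosome with overlapping intervals and assembling the reads that fall in each one. The sketch below only generates the intervals. The 60 kbp window and 20 kbp step are assumptions chosen for illustration, since the slides give a 40-60 kbp range without stating how windows overlap, and `tile_windows` is a hypothetical name.

```python
def tile_windows(chrom_length, window=60_000, step=20_000):
    """Yield half-open (start, end) windows covering [0, chrom_length)."""
    if window <= 0 or step <= 0:
        raise ValueError("window and step must be positive")
    start = 0
    while start < chrom_length:
        end = min(start + window, chrom_length)
        yield start, end
        if end == chrom_length:
            break  # the last window reached the chromosome end
        start += step


windows = list(tile_windows(150_000))
```

Overlapping windows mean that an SV near the edge of one window sits closer to the middle of a neighbouring one.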
SVs are resolved in 90% of the genome
Closing muted gaps
Sensitivity increases over Illumina
Published:
• Illumina
• 1000 Genomes 1 & 3
• Mills 2011
• Sudmant 2015
• GoNL
An augmented reference reveals hidden variation
Type Count AC SVs
DEL 3,582 8,835 206
INS 2,407 11,008 35
SNV 15,980 48,813 0
All 21,969 68,656 241
20181016 grc presentation-pa

  • 1. Using long-read data to reveal variation and advance the human reference genome Peter Audano
  • 2. Motivation • GRCh38 – 3.10 Gbp • 3.26 Gbp with patches/ALTs • Needs diversity: 70% RP11 • Structural variants (SVs) affect many bases • Indels ≥ 50 bp and inversions • 11 Mbp / genome (7x more than SNPs/indel) • More likely to be an eQTL • Illumina cannot capture all SVs • 53% DEL, 22% INS (Chaisson 2018) Alkan (2011)
  • 3. Project goals 1. Sequence-resolve common structural variation 2. Correct errors and minor alleles in the reference 3. Build an alternate reference to support SV analysis with Illumina data
  • 4. Constructing a diversity panel GRC Sequenced • CHM1 (Mole) • CHM13 (Mole) • HG00514 (CHB) • HG00733 (PUR) • NA19240 (YRI) • HG02818 (GWD) • NA19434 (LWK) • HG01352 (CLM) • HG02059 (KHV) • NA12878 (CEU) • HG04217 (ITU) • HG02106 (PEL) • HG00268 (FIN) Public • AK1 (Korean) • Seo, 2016 • HX1 (Chinese) • Shi 2016 • New PacBio data on 11 genomes. • 7 are new long-read biological samples. • Selected females to balance X
  • 5. Building a non-redundant discovery set • 99,604 SVs • 21.3 Mbp INS • 18.5 Mbp DEL • 2,238 shared • 1.2 Mbp INS • 0.1 Mbp DEL • 5 coding • 160 regulatory
  • 7. Merged set variant length distribution • 223 full-length L1 elements• 3,383 full-length AluY elements
  • 8. Half the call set is novel, most is sequence- resolved for the first time
  • 9. 90% of the genome is resolved • CEN / Peri-CEN • 254.5 Mbp • Low depth • 0.4 Mbp • Unresolved DUP • 19.9 Mbp • High-identity segmental duplications
  • 10. Recovered 200 missing bases in FOXO6 and corrected its reading frame Credit: Max Dougherty
  • 11. VNTR distribution is non-random • VNTR enrichment in subtelomeres • 4.8-fold enrichment (Wilcoxon p = 2.9 × 10⁻⁹) • Correlates with male meiotic recombination and double-strand breaks Credit: Arvis Sulovari
  • 12. Patching GRCh38 • Add SVs to reference on alternate contigs • Map reads with an ALT-aware aligner • Recovers 2.62% unmapped reads • Improves mapping quality for 25.68% of SV-insertion mapped reads • 2,228 SNPs and indels per sample within SV insertions (GQ 20+)
  • 13. Genotyping SVs in Illumina samples [Figure labels: GRCh38 primary contig, SV contig] • Extract features around SVs • Train a machine learning model to predict genotypes • 91-95% accuracy • 15% no-call
  • 14. Genotyper is accurate over SV types and coverage
  • 15. Genotyping enables eQTL and sQTL analysis • 376 samples (avg. 6-fold) • 379 SV eQTLs (411 genes) • 34 significant after accounting for SNP eQTLs • 244 SV sQTLs (197 genes) Credit: Yang Li and Ankeeta Shah
  • 16. Resources available soon • Variant calls • VCF of SVs linked to contig breakpoints • BAM of contigs • SMRT-SV v2 genotyper • Contigs (PRJN481779) • Variants in dbVar (ntsd163) • Patched reference • ALTs for BWA-MEM • Graph for vg (Garrison 2018)
  • 17. Future work • Sequence 50 additional genomes • Phase genomes • 10X and Strand-Seq • Phased-SV (Chaisson 2017, bioRxiv) • Genotype additional genomes • 2,500 high-coverage 1000 Genomes samples • 10,000 autism genomes • Improve the human reference • Patch GRCh38 • Build a human pan genome reference
  • 18. Acknowledgments • Evan Eichler • Tonia Brown • Arvis Sulovari • David Gordon • Benson Hsieh • Zev Kronenberg • Tina Graves-Lindsay • Susan Dutcher • Wesley Warren • Vince Magrini • Sean McGrath • Richard Wilson • Yang Li • Ankeeta Shah
  • 20. 15 long-read samples
      Sample  | Population | Super-population | Source    | New data | Accession   | Platform | Mean depth | Longest coverage | Subread coverage | Longest N50 | Subread N50
      CHM1    | Mole       | NA               | Reference | No       | PRJNA246220 | RS II    | 65x        | 63x              | 66x              | 19,728      | 19,226
      CHM13   | Mole       | NA               | Reference | No       | PRJNA269593 | RS II    | 67x        | 63x              | 72x              | 11,954      | 11,320
      HG00514 | CHB        | EA               | Reference | Yes*     | PRJNA300843 | RS II    | 76x        | 93x              | 104x             | 17,472      | 16,653
      HG00733 | PUR        | AMR              | Reference | Yes*     | PRJNA300840 | RS II    | 55x        | 63x              | 69x              | 16,195      | 15,461
      NA19240 | YRI        | AFR              | Reference | Yes*     | PRJNA288807 | RS II    | 59x        | 65x              | 71x              | 17,343      | 16,584
      HG02818 | GWD        | AFR              | Reference | Yes      | PRJNA339722 | RS II    | 79x        | 90x              | 98x              | 16,807      | 16,221
      NA19434 | LWK        | AFR              | Reference | Yes      | PRJNA385272 | RS II    | 59x        | 62x              | 71x              | 17,635      | 16,853
      HG01352 | CLM        | AMR              | Reference | Yes      | PRJNA339719 | RS II    | 56x        | 69x              | 75x              | 20,738      | 20,049
      HG02059 | KHV        | SA               | Reference | Yes      | PRJNA339726 | RS II    | 64x        | 71x              | 77x              | 18,533      | 17,890
      NA12878 | CEU        | EUR              | Reference | Yes*     | PRJNA323611 | RS II    | 50x        | 66x              | 75x              | 17,121      | 16,376
      HG04217 | ITU        | SA               | Reference | Yes      | PRJNA481794 | RS II    | 40x        | 46x              | 51x              | 18,149      | 16,871
      HG02106 | PEL        | AMR              | Reference | Yes      | PRJNA480858 | Sequel   | 73x        | 66x              | 69x              | 21,540      | 20,646
      HG00268 | FIN        | EUR              | Reference | Yes      | PRJNA480712 | Sequel   | 85x        | 76x              | 79x              | 25,245      | 24,487
      AK1     | Korean     | EA               | Public    | No       | PRJNA298944 | RS II    | 77x        | 89x              | 102x             | 15,609      | 14,721
      HX1     | Chinese    | EA               | Public    | No       | PRJNA301527 | RS II    | 98x        | 79x              | 103x             | 13,412      | 12,002
  • 21. Methods • SMRT-SV • Align reads to GRCh38 • Assemble in windows (40-60 kbp) • Polish assemblies (quiver/arrow) • Align assemblies to GRCh38 • Call SVs • SV calls (VCF) with contig breakpoints • Contig BAM
  • 22. SVs are resolved in 90% of the genome
  • 24. Sensitivity increases over Illumina Published: • Illumina • 1000 Genomes 1 & 3 • Mills 2011 • Sudmant 2015 • GoNL
  • 25. An augmented reference reveals hidden variation
      Type | Count  | AC     | SVs
      DEL  | 3,582  | 8,835  | 206
      INS  | 2,407  | 11,008 | 35
      SNV  | 15,980 | 48,813 | 0
      All  | 21,969 | 68,656 | 241
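
The "All" row of the slide 25 table should equal the column-wise sum of the DEL, INS, and SNV rows. A minimal Python sketch of that check follows; the names `ROWS`, `REPORTED_ALL`, and `column_totals` are illustrative assumptions and not part of the talk or of any SMRT-SV tooling.

```python
# Per-type rows transcribed from slide 25 (columns: Count, AC, SVs).
# The dictionary layout is an assumption made for this illustration.
ROWS = {
    "DEL": (3582, 8835, 206),
    "INS": (2407, 11008, 35),
    "SNV": (15980, 48813, 0),
}

# The "All" row as reported on the slide.
REPORTED_ALL = (21969, 68656, 241)


def column_totals(rows):
    """Sum each column (Count, AC, SVs) across the per-type rows."""
    return tuple(sum(column) for column in zip(*rows.values()))


if __name__ == "__main__":
    totals = column_totals(ROWS)
    # Report whether the recomputed totals agree with the slide.
    print("totals:", totals, "match:", totals == REPORTED_ALL)
```

The same pattern applies to any of the summary tables in the deck: transcribe the rows, recompute the aggregate, and compare it with the reported figure.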

Editor's Notes

  1. 341,331 calls Mean 11,043,749 / sample 13,480 INS, 9,212 DEL, 64 INS Shared: 1.19 Mbp INS (n = 1,670) 0.33 Mbp DEL (n = 557)
  2. Add 1: Grow 2.1% Add 35 (50 total): Grow 39% Double: 327 total samples
  3. Add funding
  4. Chr15 chr15-34346634-INS-2159 2.1 kbp shared INS
  5. In 15 samples, we discovered 22% of the SVs in more than 3,000 samples.