Converting from Analog to Digital
Integrating the historical archive of human variation in an NGS world
Deanna M. Church
Staff Scientist, NCBI
@deannachurch Genome Informatics Alliance 2013
Acknowledgements
GeT-RM
Lisa Kalman (CDC)
Birgit Funke (Harvard)
Madhuri Hegde (Emory)
Maryam Halavi
Chao Chen
Jon Trow
Douglas Slotta
Peter Meric
Daniel Frishberg
Victor Ananiev
ClinVar
Alex Astashyn
Shanmuga Chitipiralla
Douglas Hoffman
Wonhee Jang
Brandi Kattman
Melissa Landrum
Jennifer Lee
Adriana Malheiro
Wendy Rubinstein
George Riley
Amanjeev Sethi
Ricardo Villamarin
ISCA
Christa Lese Martin (Geisinger)
Erin Riggs (Geisinger)
Jose Mena
Mike Feolo
Tim Hefferon
John Garner
John Lopez
GRC
Valerie Schneider (NCBI)
The Genome Institute at Washington University
The Wellcome Trust Sanger Institute
The European Bioinformatics Institute
Church gia13
[Diagram: data flow for clinical variation and phenotype data. Sources: clinical labs (variant calls as dbVar submissions, array data files, phenotypes). Steps: QC analysis, curation, data regularization, assembly remapping. Destinations: dbVar, ClinVar, ISCA, UCSC, DGV, DGVa, and NCBI via web and FTP access; dbGaP controlled access for approved users; linked by BioProject ID.]
dbGaP projects need a sponsoring NIH institute to run the DAC (NICHD)
ASD: Atrial Septal Defect or Autism Spectrum Disorder??
Cases without HPO terms: 1,814
Cases with HPO terms: 6,770
~2 HPO terms/case (max of 16) (Riggs et al., 2012)
The Human Phenotype Ontology
http://www.ncbi.nlm.nih.gov/medgen
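The ASD example shows why free-text phenotypes need ontology terms: one abbreviation can stand for unrelated conditions. A minimal sketch, assuming a hypothetical hand-built abbreviation table (not HPO's actual synonym data), that flags terms with more than one candidate meaning so a curator can resolve them against HPO:

```python
# Hypothetical abbreviation table for illustration only; a production
# system would draw candidate terms from an ontology such as HPO.
ABBREVIATIONS: dict[str, list[str]] = {
    "ASD": ["Atrial septal defect", "Autism spectrum disorder"],
}


def candidate_phenotypes(term: str) -> list[str]:
    """Return the candidate phenotypes a free-text term could mean."""
    key = term.strip().upper()
    # Unknown terms pass through unchanged as their own single candidate.
    return ABBREVIATIONS.get(key, [term.strip()])


def needs_curation(term: str) -> bool:
    """True when a term is ambiguous and must be resolved by a curator."""
    return len(candidate_phenotypes(term)) > 1
```

Here `needs_curation("ASD")` returns `True`, so the case would be routed to a curator instead of being silently assigned one of the two meanings.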
Variation
[Figure: size in gigabytes (log scale, 1 to 100,000) by data component: sequences (FASTQ), alignments (BAM), genotype likelihoods (VCF), individual variants (VCF)]
1092 genomes (low coverage + exome): 38.2M SNPs, 3.9M short indels, and 14K deletions
Figure: Steve Sherry, NCBI
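Counts such as SNPs, short indels, and deletions come from classifying the REF/ALT alleles in VCF records. A minimal sketch, assuming a plain-text, tab-delimited VCF and classifying each REF/ALT pair only by allele length; real pipelines normalize and left-align variants first, and larger deletions are often written as symbolic alleles such as `<DEL>`:

```python
from collections import Counter
from typing import Iterable


def classify(ref: str, alt: str) -> str:
    """Classify one REF/ALT pair by allele length only (no normalization)."""
    if len(ref) == 1 and len(alt) == 1:
        return "SNP"
    if len(ref) != len(alt):
        return "indel"
    return "other"  # equal-length multi-base changes (MNPs)


def count_variant_types(vcf_lines: Iterable[str]) -> Counter:
    """Tally variant types across the data lines of a tab-delimited VCF."""
    counts: Counter = Counter()
    for line in vcf_lines:
        if line.startswith("#") or not line.strip():
            continue  # skip meta-information, header, and blank lines
        fields = line.rstrip("\n").split("\t")
        ref, alts = fields[3], fields[4].split(",")  # REF and ALT columns
        for alt in alts:
            if alt == ".":
                continue  # no alternate allele at this site
            if alt.startswith("<"):
                counts["symbolic"] += 1  # e.g. <DEL> for larger deletions
            else:
                counts[classify(ref, alt)] += 1
    return counts
```

Multi-allelic sites are counted once per ALT allele, which is one of several reasonable conventions.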
http://www.bioplanet.com/gcat
http://www.ncbi.nlm.nih.gov/variation/tools/1000genomes
http://genomereference.org
GRCh37
Dennis et al., 2012
[Figure: regions 1q32, 1q21, and 1p21; 1p21 patch alignment to chromosome 1]
Hydin: chr16 (16q22.2)
Hydin2: chr1 (1q21.1)
Missing in NCBI35 | Unlocalized in NCBI36/GRCh37 | Finished in GRCh38
[Figure: alignment to Hydin2 (genomic, 300 kb, 99.4% identity); alignment to Hydin1 (CHM1_1.0, >99.9% identity)]
Doggett et al., 2006
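The alignments above are summarized as percent identity. Conventions differ between tools; a minimal sketch, assuming identity = identical columns divided by gap-free aligned columns (one common choice, not necessarily the one used for these alignments). Under that convention, 99.4% identity over roughly 300,000 compared columns means about 0.006 × 300,000 = 1,800 mismatched positions, which is still close enough for short reads to mismap between paralogs.

```python
def percent_identity(a: str, b: str) -> float:
    """Identity over aligned columns where neither sequence has a gap ('-')."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    compared = matches = 0
    for x, y in zip(a.upper(), b.upper()):
        if x == "-" or y == "-":
            continue  # gap columns are excluded under this convention
        compared += 1
        matches += x == y
    if compared == 0:
        raise ValueError("no gap-free columns to compare")
    return 100.0 * matches / compared
```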
APOBEC cluster (Kidd et al., 2007)
Part of chr22 assembly
Alternate locus for chr22
Legend: white = insertion, black = deletion
http://www.ncbi.nlm.nih.gov/variation/tools/1000genomes
Human Resolved for GRCh38
http://genomereference.org
GRCh38 is coming
(September, 2013)
http://www.ncbi.nlm.nih.gov/variation/tools/get-rm
[Figure: tracks Calls, Tests, cSRA; legend: Concordant, Discordant, NA]
Target audience: Clinical testing labs
Submissions from: Clinical and Research labs
Reporting Standards: Not standard
Twelve submitting labs to date
Twelve custom scripts to regularize data
Despite defined formats here:
http://www.ncbi.nlm.nih.gov/projects/variation/get-rm
What are the issues?
Reporting Standards: Not standard
What are the issues?
Better example: QUAL*
Values reported across submissions:
10.01-18357.11 | 2.6-21.2 | 0-21.2 | 20-3070 | Allele string | 34.79-44624.03 | None | 20-46006
*QUAL is the required sixth column in a VCF file
c.1956+15C>CT
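In the VCF specification, QUAL is a Phred-scaled quality for the ALT assertion, and `.` marks a missing value. The spread of values across labs suggests different callers or scales, and some submissions put non-numeric text in the column. A minimal sketch that audits column 6 of tab-delimited VCF lines; it checks only that values parse, not that scales are comparable across labs:

```python
import math
from typing import Iterable, Optional


def parse_qual(value: str) -> Optional[float]:
    """Parse a VCF QUAL field; '.' is the specification's missing value."""
    value = value.strip()
    if value == ".":
        return None
    q = float(value)  # raises ValueError for text such as allele strings
    if math.isnan(q) or q < 0:
        raise ValueError(f"QUAL must be a non-negative number: {value!r}")
    return q


def error_probability(q: float) -> float:
    """Phred scale: QUAL = -10 * log10(P(ALT call is wrong))."""
    return 10 ** (-q / 10)


def audit_qual(vcf_lines: Iterable[str]) -> dict:
    """Tally numeric, missing, and unparseable QUAL values (column 6)."""
    summary = {"numeric": 0, "missing": 0, "invalid": []}
    for line in vcf_lines:
        if line.startswith("#") or not line.strip():
            continue  # skip meta-information, header, and blank lines
        qual = line.rstrip("\n").split("\t")[5]
        try:
            q = parse_qual(qual)
        except ValueError:
            summary["invalid"].append(qual)
            continue
        if q is None:
            summary["missing"] += 1
        else:
            summary["numeric"] += 1
    return summary
```

A value of `None` written as text, or an allele string, ends up in the `invalid` list rather than being silently coerced.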
Reporting Standards: Not standard
What are the issues?
Lab reporting a heterozygous single-nucleotide change (C->T) as:
c.1956+15C>T[=]
HGVS standards say this should be reported as:
Lab reporting a homozygous single-nucleotide change (A->G) as:
c.670+9A>G
HGVS standards say this should be reported as:
c.[670+9A>G];[670+9A>G]
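A minimal sketch of emitting the allele-pair notation from a change plus a zygosity call. The homozygous form reproduces the example above; the heterozygous form uses `[=]` for the unchanged allele, which is our reading of the HGVS allele syntax and should be checked against the current HGVS recommendations. The function name and the `"het"`/`"hom"` codes are hypothetical.

```python
def hgvs_genotype(change: str, zygosity: str) -> str:
    """Format a coding change (e.g. "670+9A>G") as an HGVS allele pair.

    `zygosity` is a hypothetical code: "hom" or "het".
    """
    change = change.removeprefix("c.")  # accept "c.670+9A>G" as well
    if zygosity == "hom":
        return f"c.[{change}];[{change}]"
    if zygosity == "het":
        return f"c.[{change}];[=]"  # [=]: second allele matches the reference
    raise ValueError(f"unknown zygosity: {zygosity!r}")
```

For example, `hgvs_genotype("670+9A>G", "hom")` returns `c.[670+9A>G];[670+9A>G]`, matching the slide.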
Defining a reference sequence: data validation
Reported as: NM_007171.3:c.942T>C
The base at this position in the transcript is a 'C', not a 'T'
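A minimal sketch of this validation step: parse a simple coding substitution and compare its stated reference base with the transcript. It assumes the caller supplies the coding sequence starting at c.1 (the A of the ATG) and handles only exonic substitutions, not intronic offsets such as `+15`. Retrieving NM_007171.3 itself is out of scope, so the usage below runs on a made-up toy sequence and a hypothetical accession.

```python
import re

# Simple exonic substitution such as "c.942T>C"; intronic offsets are
# deliberately not handled in this sketch.
SUBSTITUTION = re.compile(r"c\.(\d+)([ACGT])>([ACGT])")


def reference_base_matches(hgvs: str, cds: str) -> bool:
    """True if the reference base stated in `hgvs` matches `cds` there.

    `hgvs` may carry an accession prefix ("NM_...:c.942T>C"); `cds` must
    start at c.1 (the A of the start codon).
    """
    change = hgvs.rpartition(":")[2]  # drop any accession prefix
    m = SUBSTITUTION.fullmatch(change)
    if m is None:
        raise ValueError(f"unsupported notation: {hgvs!r}")
    pos, ref = int(m[1]), m[2]
    if not 1 <= pos <= len(cds):
        raise ValueError(f"position {pos} is outside the sequence")
    return cds[pos - 1].upper() == ref


toy_cds = "ATGGCTTAA"  # hypothetical toy sequence, not NM_007171.3
print(reference_base_matches("c.4G>A", toy_cds))           # True
print(reference_base_matches("NM_TOY.1:c.4T>C", toy_cds))  # False
```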
http://www.ncbi.nlm.nih.gov/clinvar
Standardize data: what is the variation?
607008.0001
985A>G
985A>G (K304E)
A985G
ACADM, LYS304GLU
K304E
K304E (985 A->G)
K304E (K329E)
K304E only
K329E
K329E(985A>G)
LYS304GLU
Mutation c.985A>G (p.K304E)
c.985A>G
c.985A>G (p.K304E)
c.985A>G (p.Lys304Glu
includes: K304E (985A>G)
p.K304E
p.Lys329Glu
previously known as p.Lys329Glu
Analysis of ACADM 985A>G mutation
NC_000001.10:g.76226846A>G
NG_007045.1:g.41804A>G
NM_000016.4:c.985A>G
NP_000007.1:p.Lys329Glu
rs77931234
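Collapsing the submitted labels onto one canonical description can be partly automated. A minimal sketch, assuming the canonical identifiers listed above and assuming that legacy labels such as K304E number residues 25 lower than NP_000007.1 (inferred only from 329 - 304 = 25 in the list); the regular expressions are illustrative, not a general HGVS parser.

```python
import re
from typing import Optional

# Canonical description taken from the slide.
CANONICAL = "NM_000016.4:c.985A>G"
CDNA_CHANGE = ("A", 985, "G")
PROTEIN_CHANGE = ("K", 329, "E")  # NP_000007.1:p.Lys329Glu
LEGACY_OFFSET = 25  # assumption: K304E-style labels are 25 residues lower

THREE_TO_ONE = {
    "ALA": "A", "ARG": "R", "ASN": "N", "ASP": "D", "CYS": "C",
    "GLN": "Q", "GLU": "E", "GLY": "G", "HIS": "H", "ILE": "I",
    "LEU": "L", "LYS": "K", "MET": "M", "PHE": "F", "PRO": "P",
    "SER": "S", "THR": "T", "TRP": "W", "TYR": "Y", "VAL": "V",
}

# 985A>G, c.985A>G, 985 A->G
CDNA_ARROW = re.compile(r"(\d+)\s*([ACGT])\s*-?>\s*([ACGT])")
# A985G (reference base, position, alternate base)
CDNA_LEGACY = re.compile(r"\b([ACGT])(\d+)([ACGT])\b")
# K304E, p.Lys329Glu, LYS304GLU
PROTEIN = re.compile(r"\b([A-Za-z]{3}|[A-Z])(\d+)([A-Za-z]{3}|[A-Z])\b")


def one_letter(code: str) -> str:
    code = code.upper()
    return code if len(code) == 1 else THREE_TO_ONE.get(code, "?")


def normalize(label: str) -> Optional[str]:
    """Map a free-text label to CANONICAL, or None if it cannot be matched."""
    # Nucleotide-level descriptions take priority over protein-level ones.
    m = CDNA_ARROW.search(label)
    if m:
        return CANONICAL if (m[2], int(m[1]), m[3]) == CDNA_CHANGE else None
    m = CDNA_LEGACY.search(label)
    if m:
        return CANONICAL if (m[1], int(m[2]), m[3]) == CDNA_CHANGE else None
    m = PROTEIN.search(label)
    if m:
        ref, pos, alt = one_letter(m[1]), int(m[2]), one_letter(m[3])
        ok_pos = pos in (PROTEIN_CHANGE[1], PROTEIN_CHANGE[1] - LEGACY_OFFSET)
        if (ref, alt) == (PROTEIN_CHANGE[0], PROTEIN_CHANGE[2]) and ok_pos:
            return CANONICAL
    return None
```

For example, `normalize("A985G")` and `normalize("LYS304GLU")` both return the canonical string, while the OMIM-style ID `607008.0001` returns `None` and would need a lookup table rather than pattern matching.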
Miki et al., 1994


Editor's Notes

  1. The reference is not just the chromosome sequences of the primary assembly unit; it also includes the alternate loci and patches, which provide additional sequence representations at selected genomic regions. The GRC has been releasing patches to the human assembly on a quarterly cycle, and we’re now at GRCh37.p12. There are two varieties of patches:
     - FIX patches correct existing assembly problems: the chromosome will be updated, and these patches will be integrated in GRCh38.
     - NOVEL patches add new sequence representations: they will become alternate loci.
     This ideogram shows the current distribution of patches and alternate loci, and you can see that many regions have changed since GRCh37. Note that approximately 3% of the current public human assembly GRCh37 is associated with a region that is represented by a patch or alternate locus.