2. Variation Resources Team at NCBI
Ming Ward
Lon Phan
Brad Holmes
Anna Glodek
Michael Kholodov
Rama Maiti
Juliana Sampson
David Shao
Eugene Shekhtman
Qiang Wang
Hua Zhang
Key Collaborators
Heidi Rehm, Harvard Partners
Christa Lese Martin, Geisinger
Sherri Bale, GeneDx
Lisa Kalman, CDC
Birgit Funke, Harvard Partners
Madhuri Hegde, Emory
Donna Maglott
Melissa Landrum
Jennifer Lee
George Riley
Ray Tully
Craig Wallin
Shanmuga Chitipiralla
Douglas Hoffman
Wonhee Jang
Ken Katz
Michael Ovetsky
Ricardo Villamarin
Tim Hefferon
John Lopez
John Garner
Chao Chen
21. Submitter Information
• Contact and author information
Study Information
• Study metadata (description, PMID, ProjectID, etc.)
Sample/Sampleset data
• Sample IDs (if samples are consented)
• Sampleset ID for pooled samples (case vs. control sets)
Experiment data
• Assay method (sequencing, array)
• Platform and analysis information
Variants
• Variant definitions
23. Variant Call Ambiguity
[Figure: array-based call. A run of probes with decreased signal intensity is flanked by probes with expected signal intensity. Each breakpoint falls somewhere between the two, giving an outer start and inner start on the left and an inner stop and outer stop on the right.]
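The inner/outer coordinates on this slide can be derived directly from probe-level calls. The sketch below assumes sorted probe positions, one boolean per probe marking decreased signal intensity, and a single contiguous run; the names `CallInterval` and `call_interval` are illustrative, not part of any NCBI tool.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class CallInterval:
    outer_start: Optional[int]  # last expected-intensity probe before the run (None at array edge)
    inner_start: int            # first probe with decreased intensity
    inner_stop: int             # last probe with decreased intensity
    outer_stop: Optional[int]   # first expected-intensity probe after the run (None at array edge)


def call_interval(positions: Sequence[int], decreased: Sequence[bool]) -> Optional[CallInterval]:
    """Bound the first run of decreased-intensity probes (hypothetical helper).

    Assumes positions are sorted ascending and aligned with `decreased`.
    The true breakpoints lie between outer_start and inner_start, and
    between inner_stop and outer_stop.
    """
    if len(positions) != len(decreased):
        raise ValueError("positions and decreased must have the same length")
    n = len(positions)
    i = 0
    while i < n and not decreased[i]:  # skip leading probes with expected intensity
        i += 1
    if i == n:
        return None  # no probe shows decreased intensity, so there is no call
    j = i
    while j + 1 < n and decreased[j + 1]:  # extend to the end of the run
        j += 1
    return CallInterval(
        outer_start=positions[i - 1] if i > 0 else None,
        inner_start=positions[i],
        inner_stop=positions[j],
        outer_stop=positions[j + 1] if j + 1 < n else None,
    )


probes = [100, 200, 300, 400, 500, 600]
signal_down = [False, False, True, True, True, False]
print(call_interval(probes, signal_down))
# CallInterval(outer_start=200, inner_start=300, inner_stop=500, outer_stop=600)
```

Reporting both pairs of coordinates keeps the call honest: the deleted region certainly covers the inner interval and cannot extend past the outer one, but probe spacing leaves the exact breakpoints unknown.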
24. Fosmid clone (40 Kb +/- 1 Kb)
Variant Call Ambiguity
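[Figure: fosmid end-sequence pairs placed on the genome, where only an outer start and outer stop can be given. Ends mapping 60 Kb apart: the clone has a deletion relative to the genome. Ends mapping 20 Kb apart: the clone has an insertion relative to the genome.]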
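Paired-end fosmid evidence can be interpreted by comparing the distance the two end sequences span on the reference with the expected insert size. The sketch below takes the 40 Kb +/- 1 Kb figure from the slide; the function name and the choice to return a size range are illustrative assumptions.

```python
from typing import Tuple

EXPECTED_INSERT_KB = 40.0  # fosmid insert size stated on the slide
TOLERANCE_KB = 1.0         # the +/- 1 Kb spread


def classify_end_pair(mapped_span_kb: float,
                      expected_kb: float = EXPECTED_INSERT_KB,
                      tol_kb: float = TOLERANCE_KB) -> Tuple[str, float, float]:
    """Classify a fosmid end pair by the distance its ends span on the reference.

    Returns (kind, min_size_kb, max_size_kb). The size range reflects the
    insert-size tolerance, so the variant size is itself ambiguous.
    """
    diff = mapped_span_kb - expected_kb
    if abs(diff) <= tol_kb:
        return ("concordant", 0.0, 0.0)
    if diff > 0:
        # The reference holds more sequence between the ends than the clone does.
        return ("deletion", diff - tol_kb, diff + tol_kb)
    # The clone holds more sequence than the reference between the mapped ends.
    return ("insertion", -diff - tol_kb, -diff + tol_kb)


for span in (20.0, 40.5, 60.0):
    print(span, classify_end_pair(span))
```

Because only the mapped ends are known, the breakpoints can lie anywhere between them, which is why such calls carry outer coordinates only.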
28. ClinVar data model and display
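[Figure: ClinVar data model and display. Each submitter's assertion about a variant (allele) and a phenotype is accessioned as an SCV; SCVs for the same variant and phenotype are aggregated into an RCV.]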
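The SCV-to-RCV relationship amounts to grouping submissions by their (variant, phenotype) pair. The sketch below uses placeholder accessions and identifiers, and its conflict rule is a deliberate simplification, not ClinVar's actual aggregation logic.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class SCV:
    accession: str     # placeholder IDs in the example below, not real accessions
    submitter: str
    variant: str       # e.g. an HGVS expression
    phenotype: str
    significance: str


def aggregate_rcvs(scvs: List[SCV]) -> Dict[Tuple[str, str], List[SCV]]:
    """Group submitted assertions so each (variant, phenotype) pair forms one RCV."""
    groups: Dict[Tuple[str, str], List[SCV]] = defaultdict(list)
    for scv in scvs:
        groups[(scv.variant, scv.phenotype)].append(scv)
    return dict(groups)


def summarize(group: List[SCV]) -> str:
    """Simplified rule: report the shared significance, or flag a conflict."""
    calls = {scv.significance for scv in group}
    return calls.pop() if len(calls) == 1 else "conflicting"


submissions = [
    SCV("SCV-1", "Lab A", "var1", "Disease X", "Pathogenic"),
    SCV("SCV-2", "Lab B", "var1", "Disease X", "Pathogenic"),
    SCV("SCV-3", "Lab C", "var1", "Disease Y", "Uncertain significance"),
]
for key, group in aggregate_rcvs(submissions).items():
    print(key, len(group), summarize(group))
```

The same variant reported for two phenotypes yields two RCVs, while two labs agreeing on one variant-phenotype pair share a single RCV.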
29. ClinVar RCV report - Overview
Interpretation
• Significance
• Review status *
• Accession.version *
Allele summary
• Gene
• Variant type
• Genomic location
• HGVS expressions*
• Molecular consequence*
• Links*
• Frequency*
Phenotype summary
• Names
• Links*
• Age of onset *
• Prevalence *
* May be provided by NCBI
30. ClinVar RCV report – Summary of assertions
• Each submission is accessioned and versioned
• Terms provided by the submitter are mapped to controlled values
• The review method is clearly reported, so primary data can be distinguished from data reported in the literature
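Two of these points are mechanical: splitting an accession.version into its parts, and mapping free-text submitter terms onto controlled values. The sketch below assumes a prefix-digits-dot-version layout such as `SCV000012345.2`; the mapping table is a hypothetical illustration, not ClinVar's actual term list.

```python
import re
from typing import Tuple

# Assumed layout: prefix, digits, then ".version", e.g. "SCV000012345.2".
ACCESSION_RE = re.compile(r"^(SCV|RCV)(\d+)\.(\d+)$")

# Hypothetical mapping from free-text submitter terms to controlled values.
CONTROLLED_SIGNIFICANCE = {
    "pathogenic": "Pathogenic",
    "likely pathogenic": "Likely pathogenic",
    "vus": "Uncertain significance",
    "uncertain significance": "Uncertain significance",
    "likely benign": "Likely benign",
    "benign": "Benign",
}


def parse_accession(acc_version: str) -> Tuple[str, int]:
    """Split an accession.version string into (accession, version)."""
    m = ACCESSION_RE.match(acc_version)
    if m is None:
        raise ValueError(f"not an accession.version: {acc_version!r}")
    prefix, digits, version = m.groups()
    return prefix + digits, int(version)


def map_term(term: str) -> str:
    """Normalize case and whitespace, then look up the controlled value."""
    key = " ".join(term.lower().split())
    if key not in CONTROLLED_SIGNIFICANCE:
        raise KeyError(f"no controlled value for {term!r}")
    return CONTROLLED_SIGNIFICANCE[key]


print(parse_accession("SCV000012345.2"))
print(map_term("  Likely   Pathogenic "))
```

Keeping the version separate from the stable accession lets a record be cited unambiguously while still being updated by its submitter.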
42. Target audience: Clinical testing labs
Submissions from: Clinical and Research labs
[Figure: calls across tests, marked Concordant, Discordant, or NA; cSRA]
http://www.ncbi.nlm.nih.gov/variation/tools/get-rm
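The Concordant/Discordant/NA labels can be read as a per-variant comparison of the calls made by different tests. The sketch below assumes that `None` marks a test with no call at the position and that NA means fewer than two calls are available to compare; both are assumptions about how the labels are defined.

```python
from typing import Optional, Sequence


def call_status(calls: Sequence[Optional[str]]) -> str:
    """Label one variant's calls across tests (hypothetical rule).

    None marks a test that made no call at this position; with fewer than
    two calls there is nothing to compare, so the variant is labeled NA.
    """
    observed = [c for c in calls if c is not None]
    if len(observed) < 2:
        return "NA"
    return "Concordant" if len(set(observed)) == 1 else "Discordant"


print(call_status(["0/1", "0/1", None]))
print(call_status(["0/1", "1/1"]))
print(call_status([None, "0/1"]))
```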
43. Twelve submitting labs to date
Twelve custom scripts to regularize data
Defined formats here:
http://www.ncbi.nlm.nih.gov/projects/variation/get-rm
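Each lab's custom script maps its own output onto a shared format. A minimal sketch of one such normalizer, assuming hypothetical input fields `chrom`, `pos`, `ref` and `alt`, is shown below; real lab formats and the defined GeT-RM formats will differ.

```python
from typing import Dict, Mapping


def normalize_record(raw: Mapping[str, object]) -> Dict[str, object]:
    """Map one lab-specific row onto a shared layout (hypothetical field names)."""
    chrom = str(raw["chrom"]).strip()
    if chrom.lower().startswith("chr"):
        chrom = chrom[3:]  # 'chr7' and '7' become the same key
    return {
        "chrom": chrom.upper(),                        # 'x' -> 'X'
        "pos": int(str(raw["pos"]).replace(",", "")),  # tolerate '1,000'
        "ref": str(raw["ref"]).strip().upper(),
        "alt": str(raw["alt"]).strip().upper(),
    }


print(normalize_record({"chrom": "chr7", "pos": "1,000", "ref": "a", "alt": "g "}))
```

Writing one such function per lab keeps each lab's quirks isolated while every script emits the same record shape.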
45. Lab Provided Validation
Variants validated in this sample using another platform
Variants validated in another sample using another platform
Variants seen in other samples from submitting lab using this platform
Variants seen in public data set
Variants that are novel
Variants that were not assessed
46. Supporting Read Counts
[Figure: histogram of the number of variants (y-axis) per read count bin (x-axis; bins labeled 0, 10, 50, 100, 500, 1000, 5000). Based on May 2013 data release.]
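A histogram like this one assigns each variant's supporting read count to a bin. The sketch below takes the bin labels from the slide's x-axis and assumes they are the lower edges of left-closed bins, with everything at or above 5000 in a final open bin; the slide does not state the exact edges.

```python
from bisect import bisect_right
from collections import Counter
from typing import Iterable

# Lower bin edges from the slide's axis labels (left-closed bins assumed).
BIN_EDGES = [0, 10, 50, 100, 500, 1000, 5000]


def bin_label(read_count: int) -> str:
    if read_count < 0:
        raise ValueError("read counts cannot be negative")
    i = bisect_right(BIN_EDGES, read_count) - 1  # index of the bin's lower edge
    if i == len(BIN_EDGES) - 1:
        return f">={BIN_EDGES[-1]}"
    return f"{BIN_EDGES[i]}-{BIN_EDGES[i + 1] - 1}"


def read_count_histogram(counts: Iterable[int]) -> Counter:
    """Number of variants per read count bin."""
    return Counter(bin_label(c) for c in counts)


print(read_count_histogram([3, 12, 12, 700, 9000]))
```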
50. Gene level concordance
$$C_{\text{gene}} = \frac{\sum_{v} \max_{i} x_{v,i}}{\sum_{v} T_{v}}$$
i = genotype call
x_{v,i} = count of genotype call i for variant v
T_v = total genotype calls for variant v
Sums are taken over all variants v in a gene.
Tested regions taken into account
Phasing ignored
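The gene-level score is the fraction of all genotype calls in a gene that agree with the most common call at their variant. The sketch below reads the formula as a ratio of sums and encodes the two caveats under stated assumptions: a lab whose tested region does not cover a variant contributes `None` and is dropped, and phasing is ignored by treating `0|1`, `1|0` and `0/1` as the same genotype.

```python
from collections import Counter
from typing import List, Optional


def normalize_genotype(gt: str) -> str:
    """Ignore phasing: '1|0', '0|1' and '0/1' all become '0/1'."""
    alleles = gt.replace("|", "/").split("/")
    return "/".join(sorted(alleles))  # any canonical order works for comparison


def gene_concordance(variants: List[List[Optional[str]]]) -> float:
    """Sum of majority-call counts over sum of total calls, across a gene's variants.

    Each inner list holds one variant's calls; None means the lab did not
    test that region, so it is excluded from both numerator and denominator.
    """
    numerator = 0
    denominator = 0
    for calls in variants:
        tested = [normalize_genotype(c) for c in calls if c is not None]
        if not tested:
            continue
        counts = Counter(tested)
        numerator += max(counts.values())  # max_i x_{v,i}
        denominator += len(tested)         # T_v
    return numerator / denominator if denominator else float("nan")


gene = [
    ["0/1", "0|1", "1|0", None],  # three labs agree once phasing is ignored
    ["1/1", "0/1", "1/1", "1/1"],  # one discordant call
]
print(round(gene_concordance(gene), 3))  # 0.857 (6 of 7 calls)
```

Taking a ratio of sums weights each variant by how many calls it received; averaging per-variant ratios instead would weight every variant equally, which is a different metric.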
Editor's Notes
Put a plug for Tim here…
RefSeqGene/LRG screenshot: a stable coordinate system for gene-level reporting, based on gene-centric genomic sequences.