Advances in sequencing technologies are identifying thousands of new variants. However, much of the data is stored in separate and sometimes private databases, which makes it difficult to use for evaluating the clinical significance of variants, especially rare ones. To improve access to this type of data, ClinVar maintains a freely available, public archive of human variation and its relationship to disease. The data can be used interactively on the web; a monthly full release in XML format and weekly summary files of genes and variants are also available for incorporation into analysis pipelines. Submissions include variants identified by direct testing in clinical or research labs, as well as reviewed variant-phenotype relationships from expert groups, such as InSiGHT and CFTR2, and professional societies, such as ACMG. In addition to the variant and phenotype, individual submissions may also provide a clinical assertion and evidence for that interpretation. The data model is flexible for many data elements: a variant may be defined by sequence or cytogenetic nomenclature; the phenotype may be a diagnostic term or features of a disease; and evidence for the interpretation may be structured as counts or provided as free text. For submitters who maintain their own website for variants, such as locus-specific databases (LSDBs), ClinVar links to the submitter’s site for each submitted variant, so that users who start at ClinVar become aware of the LSDB’s curated variants and can reach any additional information on the variant available there. Each individual submission is accessioned and versioned, in the format SCV000000000.1, so that the submitter can update the record as the interpretation of the variant is re-evaluated over time. ClinVar uses standard terminologies, such as those for variant nomenclature, phenotypes, and pathogenicity, to avoid data ambiguity and to promote comparison of information from multiple sources.
ClinVar also adds related variant data, such as allele frequencies and HGVS expressions mapped across molecule types. While ClinVar staff members provide some curation of the variants and phenotypes represented in ClinVar, clinical significance values are provided by submitters. As part of the submission process, ClinVar provides feedback to submitters. This feedback flags invalid HGVS expressions, as well as submissions whose clinical significance conflicts with an existing record for the same variant and phenotype and which may therefore warrant further curation. Submissions for the same variant-phenotype pair from different submitters are aggregated into a record that is accessioned and versioned in the format RCV000000000.1. Aggregation allows ClinVar to indicate when multiple submitters agree or conflict in the clinical interpretation of the variant, which can help clinical labs and curation groups to identify high-confidence interpretations as well as those that should be prioritized for curation efforts.
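The accession formats mentioned above (SCV000000000.1 for a submission, RCV000000000.1 for an aggregate record) can be handled in a pipeline with a small parser. The following is a minimal sketch, not ClinVar's own code: the pattern is inferred only from the two examples in the text, and the names `Accession`, `parse_accession` and `is_newer` are hypothetical.

```python
import re
from typing import NamedTuple

# Pattern inferred from the examples in the text (an assumption):
# prefix SCV or RCV, nine digits, a dot, then an integer version.
ACCESSION_RE = re.compile(r"^(SCV|RCV)(\d{9})\.(\d+)$")


class Accession(NamedTuple):
    kind: str      # "SCV" (submission) or "RCV" (aggregate record)
    number: int    # numeric part of the accession
    version: int   # version after the dot


def parse_accession(text: str) -> Accession:
    """Split a versioned accession such as 'SCV000000010.2' into its parts."""
    match = ACCESSION_RE.match(text.strip())
    if match is None:
        raise ValueError(f"not a versioned SCV/RCV accession: {text!r}")
    return Accession(match.group(1), int(match.group(2)), int(match.group(3)))


def is_newer(candidate: str, current: str) -> bool:
    """True if both strings name the same record and candidate has a higher version."""
    a, b = parse_accession(candidate), parse_accession(current)
    return (a.kind, a.number) == (b.kind, b.number) and a.version > b.version
```

A check such as `is_newer("SCV000000010.2", "SCV000000010.1")` could then be used to decide whether a locally cached record is out of date.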
10. ClinVar Review Status
• classified by single submitter
• classified by multiple submitters
• conflicting data from submitters
• reviewed by expert panel
• reviewed by professional society
Expert panels – both medical and research experts with published criteria and process for evaluating variant pathogenicity
• CFTR2, InSiGHT
Professional society – groups that provide practice guidelines
• American College of Medical Genetics (ACMG)
11. ClinVar aggregates by variant
Variant | Phenotype | Submitter | Accession
PTPN11:c.205G>C | Noonan syndrome | Lab A | SCV000000010
PTPN11:c.205G>C | Noonan syndrome | Lab B | SCV000000020
PTPN11:c.205G>C | Rasopathy | Lab C | SCV000000030
Variant | Phenotype | Accession
PTPN11:c.205G>C | Noonan syndrome | RCV000000050
PTPN11:c.205G>C | Rasopathy | RCV000000050
Variant: PTPN11:c.205G>C
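The grouping on this slide can be sketched in a few lines. This is an illustrative approximation only, not ClinVar's actual aggregation logic: the variant, phenotype, submitter and SCV values come from the slide, while the clinical significance strings are hypothetical placeholders added so that agreement can be checked, and the function names `aggregate` and `review_status` are assumptions. Only the three submitter-level review statuses are modelled.

```python
from collections import defaultdict

# (variant, phenotype, submitter, SCV accession, clinical significance)
# The first four fields are taken from the slide; the significance values
# are hypothetical placeholders, not real ClinVar data.
SUBMISSIONS = [
    ("PTPN11:c.205G>C", "Noonan syndrome", "Lab A", "SCV000000010", "Pathogenic"),
    ("PTPN11:c.205G>C", "Noonan syndrome", "Lab B", "SCV000000020", "Pathogenic"),
    ("PTPN11:c.205G>C", "Rasopathy", "Lab C", "SCV000000030", "Pathogenic"),
]


def aggregate(submissions):
    """Group submissions by their (variant, phenotype) pair."""
    groups = defaultdict(list)
    for variant, phenotype, submitter, scv, significance in submissions:
        groups[(variant, phenotype)].append((submitter, scv, significance))
    return dict(groups)


def review_status(group):
    """Derive a submitter-level review status for one aggregated group."""
    submitters = {submitter for submitter, _, _ in group}
    significances = {significance for _, _, significance in group}
    if len(significances) > 1:
        return "conflicting data from submitters"
    if len(submitters) > 1:
        return "classified by multiple submitters"
    return "classified by single submitter"


if __name__ == "__main__":
    for (variant, phenotype), group in aggregate(SUBMISSIONS).items():
        print(variant, "|", phenotype, "|", review_status(group))
```

With the placeholder values above, the Noonan syndrome pair has two agreeing submitters and the Rasopathy pair has one; changing one lab's significance value would move the first pair into the conflicting category.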
13. Accessing ClinVar data
• Interactively on the web, updated weekly
• Monthly full releases
– Comprehensive XML extraction
– VCF files
– Tab-delimited summary files for genes, variants
• E-utilities as web service or via command line
• Annotation on graphic sequence displays
• Variation Viewer
www.ncbi.nlm.nih.gov/variation/view/
• Variation Reporter
www.ncbi.nlm.nih.gov/variation/tools/reporter
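As one illustration of the E-utilities route listed above, the sketch below builds an ESearch request against the `clinvar` database and extracts record IDs from a JSON response. It assumes the public E-utilities base URL and the standard `db`, `term`, `retmode` and `retmax` parameters; the function names and the example search term are hypothetical, and the response is assumed to carry its IDs under `esearchresult.idlist`.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Public NCBI E-utilities base URL (assumed here; not given on the slide).
EUTILS_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils"


def build_esearch_url(term: str, retmax: int = 20) -> str:
    """Build an ESearch URL that queries the ClinVar database for `term`."""
    params = {"db": "clinvar", "term": term, "retmode": "json", "retmax": retmax}
    return f"{EUTILS_BASE}/esearch.fcgi?{urlencode(params)}"


def extract_ids(payload: str) -> list:
    """Pull the list of record IDs out of an ESearch JSON response body."""
    return json.loads(payload)["esearchresult"]["idlist"]


def search_clinvar(term: str, retmax: int = 20) -> list:
    """Run the search over the network and return the matching record IDs."""
    with urlopen(build_esearch_url(term, retmax)) as response:
        return extract_ids(response.read().decode("utf-8"))


if __name__ == "__main__":
    # Example term is an assumption; any ClinVar query string can be used.
    print(search_clinvar("PTPN11[gene]", retmax=5))
```

Separating URL construction and response parsing from the network call keeps the two pure functions easy to check offline; anything beyond light interactive use should follow NCBI's usage guidelines for E-utilities.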
14. Submitting data to ClinVar
• Minimal or data-rich submissions are accepted
• Multiple submission formats
– Excel spreadsheet templates
– tsv, csv files
– XML
• Online documentation
http://www.ncbi.nlm.nih.gov/clinvar/docs/submit/
Contact us with questions:
clinvar@ncbi.nlm.nih.gov
15. Acknowledgements
ClinVar/GTR/RefSeqGene/Gene/MedGen staff
Alex Astashyn
Chao Chen
Shanmuga Chitipiralla
Baoshan Gu
Douglas Hoffman
Wonhee Jang
Brandi Kattman
Ken Katz
Jennifer Lee
Donna Maglott
Adriana Malheiro
Michael Ovetsky
George Riley
Wendy Rubinstein
Amanjeev Sethi
Ray Tully
Ricardo Villamarin
dbSNP/dbVar/dbGaP
Michael Feolo
John Garner
Tim Hefferon
Brad Holmes
John Lopez
Rama Maiti
Jose Mena
Lon Phan
David Shao
Ming Ward
All of NCBI
Jim Ostell
Steve Sherry
clinvar@ncbi.nlm.nih.gov
Editor's Notes
Describe standardized names as HGVS in several coordinate systems