The community needs terminology standards to achieve interoperability between clinical research databases that include phenotype descriptions. This is crucial for interpreting genomic rearrangements as well as future high-throughput sequence data. The aim of our work was to promote a core terminology of phenotypes that is interoperable with all the terminologies in use. Relevant terminologies used by different communities to describe phenomes were cross-referenced: PhenoDB (2846 terms), London Dysmorphology Database (LDDB; 1318 terms), Orphanet (1243 terms), Human Phenotype Ontology (9895 terms, 22/08/2012), Elements of Morphology (AJMG; 423 terms) and ICD10 (1230 terms), as well as general medical terminologies: UMLS (7,957,179 distinct concept terms), SNOMED CT (>311,000 concepts), MeSH (26,853 concepts) and MedDRA (69,389 concepts). We established a strategy to compare them and find commonalities and differences, using ONAGUI as a tool to pick up exact matches. Non-exact matches were verified manually by an expert. A core terminology of 2,300 terms was derived and analysed by a panel of experts (International Consortium for Human Phenotype Terminologies, ICHPT). The resulting consensus terminology will be freely available on a dedicated website (www.ichpt.org), and mappings to other terminologies will be provided to ease interoperability between databases without disturbing the habits of the different user groups.
Phenotype terminologies in use for genotype-phenotype databases: a common core for standardisation and interoperability - Ana Rath
1. www.orpha.net
Phenotype terminologies in use for genotype-phenotype databases:
A common core for standardisation and interoperability
HVP5 – UNESCO, Paris – 22 May 2014
Aymé S., Rath A., Chanas L., Hamosh A., Robinson P.N. &
The International Consortium for Human Phenotype Terminologies
Different resources, different terminologies
(e)HR:
• SNOMED CT
• Others?
• Free text
Mutation/patient registries, databases:
• HPO
• LDDB
• PhenoDB
• Elements of Morphology
• Others? Free text?
Tools for diagnosis:
• HPO
• LDDB
• Orphanet
Levels of granularity
Disorders
• Purpose: coding diagnoses (e.g. medical records)
Clinical manifestations
• Purpose: describing patients, genotype-phenotype correlations, … (e.g. assistance-to-diagnosis tools, research databases)
Specialised terms
• Fit the particular needs of a disease-focused database/registry (e.g. Phe values in PKU and related disorders)
For phenotype annotations, interoperability between terminologies is needed at the clinical manifestations level.
Interoperability based on mappings
Syntactic:
• The terms are identical
Can be done by machines
Semantic:
• The concepts are identical
Should be done by humans
Structural:
• The comprehension of the concepts is identical
Impossible to maintain
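The syntactic level is the one the slide says machines can handle. A minimal sketch of what that could look like, assuming each terminology is available as a dictionary from term identifier to label: labels are normalised (case, accents, punctuation) and matched when the normalised strings are identical. The identifiers, labels, and function names below are hypothetical and only illustrate the idea; they are not taken from the project's scripts.

```python
import re
import unicodedata


def normalize(label: str) -> str:
    """Lowercase, strip accents, and collapse punctuation and whitespace."""
    text = unicodedata.normalize("NFKD", label)
    text = "".join(ch for ch in text if not unicodedata.combining(ch))
    text = re.sub(r"[^a-z0-9]+", " ", text.lower())
    return text.strip()


def exact_matches(source: dict, target: dict) -> list:
    """Return (source_id, target_id) pairs whose normalised labels are identical."""
    index = {}
    for target_id, label in target.items():
        index.setdefault(normalize(label), []).append(target_id)
    pairs = []
    for source_id, label in source.items():
        for target_id in index.get(normalize(label), []):
            pairs.append((source_id, target_id))
    return pairs


# Hypothetical identifiers and labels, for illustration only.
terminology_a = {"A:1": "Cleft palate", "A:2": "Short stature"}
terminology_b = {"B:7": "cleft-palate", "B:8": "Tall stature"}
print(exact_matches(terminology_a, terminology_b))
```

Anything this step does not catch (different wording for the same concept) falls under the semantic level, which the slide reserves for human review.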
Phenotype terminology project
Aims:
• Map commonly used clinical terminologies (Orphanet, LDDB, HPO, Elements of Morphology, PhenoDB, UMLS, SNOMED CT, MeSH, MedDRA):
automatic mapping, expert validation, detection and correction of inconsistencies
• Find common terms in the terminologies
• Produce a core terminology
Common denominator allowing phenotypic data to be shared/exchanged between databases
Mapped to every single terminology
Overview of project progress
Sept 2012: start of mappings (Orphanet)
EUGT2 – EUCERD workshop (Paris, September 2012)
• Formation of the International Consortium for Human Phenotype Terminologies (ICHPT)
ICHPT workshop (ASHG, Boston, October 2013)
• Selection of 2,300 core terms
PhenoDB
HPO
Orphanet
LDDB
Elements of Morphology
POSSUM
SNOMED CT (IHTSDO)
DECIPHER
IRDiRC
Step 1: mapping terminologies
Orphanet: 1357 terms (Orphanet database, version 2008)
LDDB: 1348 dysmorphological terms (installation CD)
Elements of Morphology: 423 terms (retrieved manually from the AJMG publication, January 2009)
HPO: 9895 terms (downloaded from BioPortal, OBO format, 30/08/12)
PhenoDB: 2846 terms (provided in OBO format, 02/05/2012)
UMLS: version 2012AA (integrating MeSH, MedDRA, SNOMED CT)
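Two of the sources above were obtained in OBO format, so a first practical step is pulling identifiers and labels out of `[Term]` stanzas. The following is a minimal sketch of such a reader, not the project's own conversion script (the slides mention Perl scripts for that). The sample text and its `EX:` identifiers are made up for illustration.

```python
def parse_obo_terms(text: str) -> list:
    """Collect the id and name tags of every [Term] stanza in OBO-formatted text."""
    terms = []
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("["):
            # A new stanza header closes the previous stanza; only [Term] is kept.
            current = {} if line == "[Term]" else None
            if current is not None:
                terms.append(current)
        elif current is not None and ": " in line:
            tag, value = line.split(": ", 1)
            if tag in ("id", "name"):
                current[tag] = value
    return terms


# Hypothetical OBO fragment, for illustration only.
sample = """format-version: 1.2

[Term]
id: EX:0000001
name: Cleft palate

[Typedef]
id: part_of
name: part of

[Term]
id: EX:0000002
name: Short stature
"""
terms = parse_obo_terms(sample)
print(len(terms), [t["name"] for t in terms])
```

Counting the returned entries is one way to check a download against the term counts listed on the slide.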
Tools
OnaGUI (INSERM U729): ontology alignment tool
• Works with files in OWL format
• I-Sub algorithm: detects syntactic similarity
• Graphical interface to check automatic mappings and add mappings manually
MetaMap (National Library of Medicine): a tool to map biomedical text to the UMLS Metathesaurus
Perl scripts: format conversion, launching MetaMap, comparison of results…
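To give a feel for what syntactic similarity scoring does, here is a small sketch that ranks candidate label pairs by string similarity so that an expert can review them. It deliberately uses Python's standard `difflib.SequenceMatcher` as a stand-in: this is not the I-Sub algorithm used by OnaGUI, and the 0.85 threshold and all identifiers and labels are assumptions chosen for the example.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Case-insensitive similarity ratio in [0, 1] (stand-in for I-Sub)."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def candidate_mappings(source: dict, target: dict, threshold: float = 0.85) -> list:
    """Return (source_id, target_id, score) triples at or above the threshold, best first."""
    candidates = []
    for source_id, source_label in source.items():
        for target_id, target_label in target.items():
            score = similarity(source_label, target_label)
            if score >= threshold:
                candidates.append((source_id, target_id, round(score, 3)))
    return sorted(candidates, key=lambda c: -c[2])


# Hypothetical identifiers and labels, for illustration only.
source_terms = {"A:1": "Macrocephaly"}
target_terms = {"B:1": "Macrocephalus", "B:2": "Short stature"}
for candidate in candidate_mappings(source_terms, target_terms):
    print(candidate)
```

Pairs scoring below 1.0 are only suggestions; in the workflow described here they would still go through manual verification.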
Comparison of mappings and deduction
Perl script to compare all the mappings and infer mappings between non-Orphanet terminologies
E.g.: Orphanet ID XX mapped to YY in HPO and ZZ in LDDB -> deduction: YY and ZZ should probably map to each other
Retrieve HPO mappings versus UMLS, MeSH
First figures:
              LDDB      El. Morpho   PhenoDB   HPO       UMLS…
Orphanet      E: 1062   E: 416       E: 978    E: 2228   E: 6948
LDDB                    D: 275       D: 533    D: 1123   D: 2678
El. Morpho                           D: 177    D: 716    D: 409
PhenoDB                                        D: 1045   D: 3268
HPO                                                      D: 6307+4800
UMLS…
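The deduction rule from this slide (an Orphanet term mapped to YY in HPO and to ZZ in LDDB suggests that YY and ZZ map to each other) can be sketched as follows. This is an illustrative Python version under an assumed input structure, not the Perl script itself; the function name and the nested-dictionary layout are hypothetical, and the XX/YY/ZZ identifiers are the placeholders from the slide.

```python
from collections import defaultdict
from itertools import combinations


def deduce_mappings(orphanet_mappings: dict) -> dict:
    """Infer candidate mappings between terminologies that share an Orphanet pivot term.

    orphanet_mappings: {orphanet_id: {terminology_name: set_of_term_ids}}
    Returns {(terminology_a, terminology_b): set of (id_a, id_b)}.
    """
    deduced = defaultdict(set)
    for by_terminology in orphanet_mappings.values():
        # Sort by terminology name so each pair is always keyed in the same order.
        ordered = sorted(by_terminology.items(), key=lambda item: item[0])
        for (name_a, ids_a), (name_b, ids_b) in combinations(ordered, 2):
            for id_a in ids_a:
                for id_b in ids_b:
                    deduced[(name_a, name_b)].add((id_a, id_b))
    return dict(deduced)


# Placeholder identifiers as used on the slide.
pivot = {"XX": {"HPO": {"YY"}, "LDDB": {"ZZ"}}}
print(deduce_mappings(pivot))
```

As the slide's "should probably map" suggests, deduced pairs are candidates only and still need checking.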
First list of common terms
Present in at least 3 terminologies
Definition of rules for nomenclature
Addition of terms present in each terminology as synonyms
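The selection rule above (keep a concept when it is present in at least 3 terminologies, and keep each terminology's wording as a synonym) can be sketched as below. The input layout, function name, and labels are hypothetical. The choice of preferred label here is an arbitrary placeholder (first label alphabetically), because the nomenclature rules themselves are not given on the slide.

```python
def select_core(concepts: list, min_terminologies: int = 3) -> list:
    """Keep concepts mapped in at least min_terminologies terminologies.

    Each concept is a dict {terminology_name: label_used_in_that_terminology}.
    """
    core = []
    for concept in concepts:
        if len(concept) >= min_terminologies:
            labels = sorted(set(concept.values()))
            core.append({
                # Placeholder choice: the actual nomenclature rules are not specified.
                "preferred": labels[0],
                "synonyms": labels[1:],
                "sources": sorted(concept),
            })
    return core


# Hypothetical mapped concepts, for illustration only.
concepts = [
    {"HPO": "Cleft palate", "LDDB": "Cleft palate", "Orphanet": "Palatal cleft"},
    {"HPO": "Short stature", "PhenoDB": "Short stature"},
]
core = select_core(concepts)
print(core)
```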
Next steps
Cleaning-up: around 2,300 terms as a result
Re-do mappings
• In order to provide exact matches
Revision process by the group
Addition of definitions
• Elements of Morphology
• HPO
• New definitions
Release in a dedicated website, hosted by
• Visualisation
• Download