Our conference presentation at the 6th International Conference on Biomedical Ontology (ICBO), held in Lisbon, Portugal, from 27 to 30 July 2015. Conference proceedings: http://icbo2015.fc.ul.pt/ICBO2015Proceedings.pdf
Investigating Term Reuse and Overlap in Biomedical Ontologies
International Conference on Biomedical Ontology
Lisbon, 27-30 July 2015
Maulik R. Kamdar, Tania Tudorache and Mark A. Musen
Are we there yet?
[Figure: Examples of term reuse. In UMLS, the CUI C0011849 links the term "Diabetes Mellitus" in SNOMEDCT and ICD9CM. In the OBO Foundry, the Gene Ontology (GO) term "RNA Binding" (GO:0003723) is shared by IRI or linked by xref, e.g., from "Binding to RNA" (GRO#BindingToRNA), across the Gene Expression Ontology (GEXO) and the Gene Regulation Ontology (GRO).]
OBO Reuse vs Overlap in 2010
[Figure: Reuse and overlap among OBO Foundry ontologies in September 2009 and September 2010. The figure distinguishes terms shared with the same IRI, xref mappings, and terms showing an intent for reuse.]
Ghazvinian, Amir, et al. "How orthogonal are the OBO Foundry ontologies?" J. Biomedical Semantics 2.S-2 (2011): S2.
Key Findings
• ~3% term reuse
• Only popular or upper-level ontologies reused
• 14.4% term overlap
• Semantically-similar terms reused together
• Similarity metric for a recommender system
14.4% Naïve Term Overlap!
• Normalized string matching on term labels: 14.4% (823,621)
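The following is a minimal Python sketch of naïve overlap computed by normalized string matching on term labels. The normalization it uses is an assumption: strip accents, lowercase, turn punctuation into spaces and collapse whitespace. The study's exact rules are not given here. Each ontology is represented as a mapping from term IRI to label, and the ontology names, IRIs and labels are illustrative.

```python
import re
import unicodedata
from collections import defaultdict


def normalize_label(label: str) -> str:
    """Assumed normalization: strip accents, lowercase, turn punctuation and
    underscores into spaces, and collapse runs of whitespace."""
    text = unicodedata.normalize("NFKD", label)
    text = "".join(ch for ch in text if not unicodedata.combining(ch))
    text = re.sub(r"[\W_]+", " ", text.lower())
    return " ".join(text.split())


def naive_overlap(ontologies: dict[str, dict[str, str]]) -> dict[str, set[str]]:
    """Return each normalized label that occurs in two or more ontologies,
    mapped to the set of ontologies in which it occurs."""
    label_to_onts: dict[str, set[str]] = defaultdict(set)
    for ont, terms in ontologies.items():
        for label in terms.values():
            label_to_onts[normalize_label(label)].add(ont)
    return {lab: onts for lab, onts in label_to_onts.items() if len(onts) > 1}


# Illustrative input: ontology -> {term IRI: label}; the IRIs are hypothetical.
ontologies = {
    "GO": {"http://example.org/GO_0003723": "RNA binding"},
    "GEXO": {"http://example.org/GO_0003723": "RNA Binding"},
    "GRO": {"http://example.org/GRO#BindingToRNA": "Binding to RNA"},
}
shared = naive_overlap(ontologies)
print(shared)  # one shared label, 'rna binding', found in GO and GEXO
```

"Binding to RNA" is not matched to "RNA binding". This shows that label matching alone misses lexically-different terms for the same concept.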
156/377 ontologies reuse no terms from other ontologies!
<5% of terms reused from other ontologies!
[Figure: IRI reuse]
315/377 ontologies have no xref links to terms from other ontologies!
<5% of terms reused from other ontologies!
[Figure: Xref reuse]
263/377 ontologies have no terms reused by other ontologies!
Reuse from a small set of ontologies only!
[Figure: IRI reuse]
286/377 ontologies have no terms xref-linked by other ontologies!
Reuse from a small set of ontologies only!
[Figure: Xref reuse]
0-5% of total terms reused explicitly or via xrefs, with >150 ontologies showing 0% reuse.
Average term reuse ~3%
Reuse from a small set of ontologies only, with terms from >250 ontologies never reused
>100% term reuse from some ontologies! Why?
… Reuse from a small set of popular or upper-level ontologies only, with terms from >250 ontologies never reused
>100% of terms reused w.r.t. the current versions of the BFO, PATO, CARO, UO and SO ontologies!
Needs rigorous analysis through term overlap …
Minimal sharing of CUIs, especially across UMLS procedural terminologies
- ICD10PCS, HCPCS and CPT
Several unique terms are introduced as we migrate from ICD9CM -> ICD10CM, leading to a decrease in term reuse.
Should there actually be term reuse?
Overlap decreases using correct representations!
• Normalized string matching on term labels: 14.4% (823,621)
• Removing explicitly reused terms: 13.2% (752,176)
• Removing terms mapped to the same UMLS CUI: 10.8% (617,509)
• Removing almost-similar terms (same identifier and source ontology but different representation): 1.6% (93,650)
An average of 3% term reuse across ontologies using any method, yet a 14.4% naïve term overlap!
Term overlap decreases substantially on removing almost-similar terms …
Examples of almost-similar terms?
[Figure: Examples of almost-similar terms, grouped as Different Versions (Version 1.0/Version 1.1), Different Notations and Different Namespaces. The examples involve BFO, NCIT, FMA, MESH and SNOMEDCT, the Subcellular Anatomy Ontology (SAO) and the Suggested Ontology for Pharmacogenomics (SOPHARM).]
Ontology engineers show an intent for reuse!
Different versions, notations and namespaces
• >100% reuse of a few source ontologies
• Increase in term overlap
Incorrect representations without mappings do not provide the advantages of term reuse!
Semantically-similar terms (parent-child or siblings) are reused together …
The similarity metric and BioPortal can be used to provide recommendations to ontology developers through a WebProtégé plugin!
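This hypothetical Python sketch shows how one might check whether terms reused together by an ontology are parent-child pairs or siblings. It assumes a simple map from each term to its direct is-a parents. The hierarchy, ontology names and terms are illustrative, and the code does not implement the similarity metric from the study.

```python
from collections import Counter
from itertools import combinations


def relation(a: str, b: str, parents: dict[str, set[str]]) -> str:
    """Classify two terms using direct is-a parents only (a simplifying assumption)."""
    pa, pb = parents.get(a, set()), parents.get(b, set())
    if a in pb or b in pa:
        return "parent-child"
    if pa & pb:
        return "siblings"
    return "other"


def co_reuse_relations(reused_by: dict[str, set[str]],
                       parents: dict[str, set[str]]) -> Counter:
    """Count the relation type of every pair of terms reused by the same ontology."""
    counts: Counter = Counter()
    for terms in reused_by.values():
        for a, b in combinations(sorted(terms), 2):
            counts[relation(a, b, parents)] += 1
    return counts


# Hypothetical hierarchy (child -> direct parents) and reuse data.
parents = {"neuron": {"cell"}, "muscle cell": {"cell"}, "meter": {"length unit"}}
reused_by = {"OntA": {"cell", "neuron", "muscle cell"}, "OntB": {"neuron", "meter"}}
print(co_reuse_relations(reused_by, parents))
```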
Challenges to Term Reuse
• Substantial term overlap but less than 5% reuse.
• Lexically-similar terms may represent different concepts (e.g., anatomical concepts between ZFA and XAO).
• Lexically-different terms may represent the same concepts (e.g., myocardium and cardiac muscle).
• The same terms use different IRI representations, without explicit CUI or xref mappings.
• Lack of guidelines and semi-automated tools.
Future Work: WebProtégé Plugin
Term reuse recommendations using an item-based collaborative filtering method.
Two-fold (A Posteriori and User-Centered) Evaluation
[Figure: example set of Gene Ontology terms, including GO:0033036, GO:0008104 and GO:0007050.]
The Road Ahead …
- Still far from achieving ideal term reuse beyond upper-level and popular ontologies
- Newer ontologies being added to BioPortal
- Without strict guidelines and semi-automated tools, we will drift further away …
To support interoperability, the Unified Medical Language System (UMLS) uses the notion of a Concept Unique Identifier (CUI) to map terms with similar meanings in different terminologies.
Ghazvinian, Amir, Natalya Fridman Noy, and Mark A. Musen. "How orthogonal are the OBO Foundry ontologies?" J. Biomedical Semantics 2.S-2 (2011): S2.
Contributions:
A set of descriptive statistics describing the level of reuse in biomedical ontologies stored in BioPortal
An interactive visualization technique for displaying the reuse dependencies among biomedical ontologies
A clustering method to help identify patterns of reuse using semantic similarity between the terms
A discussion on the state and challenges of reuse in biomedical ontologies, and the development of a semi-automated tool enabling reuse
Dresden Ontology Generator for Directed Acyclic Graphs
MIREOT (Minimum Information to Reference an External Ontology Term) principles
175,347 terms (3.1%) were explicitly shared using the same IRIs. We identified the source ontology for all but 37 of these terms; the remaining terms come from ontologies not present in BioPortal (e.g., owl:Thing and time#datetimedescription). After removing the terms of imported ontologies (term reuse > 35% threshold), only 59,618 terms (1.1%) were actually reused.
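Below is a minimal sketch of counting explicit IRI reuse with the 35% import threshold. It rests on two assumptions. First, each term's source ontology is already known; here it is derived from a hypothetical ID prefix. Second, "term reuse > 35%" is read as an ontology containing more than 35% of a source ontology's native terms, in which case those terms count as an import rather than reuse. The function names and data are illustrative.

```python
from collections import defaultdict

IMPORT_THRESHOLD = 0.35  # reuse above 35% of a source is treated as an import


def reused_terms(ontologies: dict[str, set[str]],
                 source_of: dict[str, str],
                 threshold: float = IMPORT_THRESHOLD) -> dict[str, set[str]]:
    """For each ontology, return the IRIs it reuses from other ontologies,
    dropping sources it appears to import wholesale."""
    # Terms each ontology declares natively (IRIs whose source is itself).
    native = {name: {t for t in terms if source_of.get(t) == name}
              for name, terms in ontologies.items()}
    result = {}
    for name, terms in ontologies.items():
        by_source: dict[str, set[str]] = defaultdict(set)
        for iri in terms:
            src = source_of.get(iri)
            # Terms with an unknown source are skipped (cf. the 37 unresolved terms).
            if src is not None and src != name:
                by_source[src].add(iri)
        kept: set[str] = set()
        for src, iris in by_source.items():
            size = len(native.get(src, ()))
            # Sources with no native terms loaded cannot be checked, so they are kept.
            if size and len(iris) / size > threshold:
                continue  # looks like an import of the whole source ontology
            kept |= iris
        result[name] = kept
    return result


# Hypothetical data: CURIE-like IDs, with the source ontology taken from the prefix.
prefix_owner = {"bfo": "BFO", "go": "GO", "my": "MyOnt"}
ontologies = {
    "BFO": {"bfo:1", "bfo:2"},
    "GO": {"go:1", "go:2", "go:3", "go:4"},
    "MyOnt": {"my:1", "bfo:1", "bfo:2", "go:1"},
}
source_of = {t: prefix_owner[t.split(":")[0]]
             for terms in ontologies.values() for t in terms}
print(reused_terms(ontologies, source_of))
```

MyOnt contains all of the toy BFO, so those terms are treated as an import. The single GO term (25% of GO) counts as reuse.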
We found a total of 4,370,350 xref axioms across all the BioPortal ontologies. After extracting the xrefs that assert equivalence between BioPortal ontology terms, we found 171,069 ‘outlinking’ terms (3.9%) xref-linked to 386,442 ‘inlinking’ terms (8.84%).
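This hypothetical Python sketch splits xref-linked terms into outlinking terms (the subjects of xrefs) and inlinking terms (the targets). As a stand-in for the equivalence check, it keeps only xrefs whose target resolves to a term in the corpus. The xref values and corpus below are illustrative.

```python
def xref_links(xrefs: list[tuple[str, str]],
               corpus_terms: set[str]) -> tuple[set[str], set[str]]:
    """Split xref-linked terms into 'outlinking' (subjects) and 'inlinking' (targets).

    Only xrefs whose target resolves to a term in the corpus are kept. This is an
    assumed proxy for checking that an xref asserts equivalence between two terms
    rather than pointing at, e.g., a publication.
    """
    outlinking: set[str] = set()
    inlinking: set[str] = set()
    for subject, target in xrefs:
        if target in corpus_terms and target != subject:
            outlinking.add(subject)
            inlinking.add(target)
    return outlinking, inlinking


# Hypothetical xref axioms as (term, xref value) pairs.
xrefs = [
    ("GRO:BindingToRNA", "GO:0003723"),
    ("MYONT:0002", "GO:0003723"),
    ("MYONT:0003", "PMID:0000001"),  # a literature xref, not a term
]
corpus_terms = {"GO:0003723", "GRO:BindingToRNA", "MYONT:0002", "MYONT:0003"}
out_terms, in_terms = xref_links(xrefs, corpus_terms)
print(len(out_terms), len(in_terms))  # 2 1
```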
BFO (Basic Formal Ontology), PATO (Phenotypic Quality Ontology), CARO (Common Anatomy Reference Ontology), UO (Units of Measurement Ontology) and SO (Sequence and Cell Feature Types ontology)
Healthcare Common Procedure Coding System (HCPCS)
Current Procedural Terminology (CPT)
Executing normalized string matching on the term labels, we found a term overlap of 823,621 shared term labels (14.4%).
Removing explicitly-reused terms reduced the list to 752,176 labels (13.2%).
Removing terms mapped to the same UMLS CUI reduced the list to 617,509 labels (10.8%).
After extracting the resource identifier from each term IRI, we removed terms with almost-similar term IRIs (same identifier and source ontology, but a different or incorrect representation), which reduced the list to 93,650 term labels (1.6%).
The last step does not represent actual reuse between ontologies. Rather, it shows that ontology developers intended to reuse terms but used different and sometimes incorrect term representations (discussed below).
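A minimal sketch of the last step, which detects almost-similar IRIs by extracting a resource identifier and grouping by (source ontology, identifier). The identifier rule is an assumption: take the last '#'- or '/'-separated segment, map ':' to '_', and lowercase. The IRIs below are illustrative notation and version variants, not data from the study.

```python
import re
from collections import defaultdict


def resource_identifier(iri: str) -> str:
    """Take the last '#'- or '/'-separated segment of an IRI (or a bare CURIE)
    and normalize ':' to '_' and case, so GO:0003723 and .../GO_0003723 agree."""
    local = re.split(r"[#/]", iri.rstrip("/#"))[-1]
    return local.replace(":", "_").lower()


def almost_similar(terms: list[tuple[str, str]]) -> dict[tuple[str, str], set[str]]:
    """Group (source ontology, IRI) pairs by (source, identifier) and keep the
    groups in which the same identifier appears under more than one IRI."""
    groups: dict[tuple[str, str], set[str]] = defaultdict(set)
    for source, iri in terms:
        groups[(source, resource_identifier(iri))].add(iri)
    return {key: iris for key, iris in groups.items() if len(iris) > 1}


# Illustrative IRI variants for the same terms (different notation or version).
terms = [
    ("GO", "http://purl.obolibrary.org/obo/GO_0003723"),
    ("GO", "http://purl.org/obo/owl/GO#GO_0003723"),
    ("GO", "GO:0003723"),
    ("BFO", "http://www.ifomis.org/bfo/1.0/snap#Continuant"),
    ("BFO", "http://www.ifomis.org/bfo/1.1/snap#Continuant"),
    ("GO", "http://purl.obolibrary.org/obo/GO_0008104"),
]
for key, iris in almost_similar(terms).items():
    print(key, len(iris))
```

Lowercasing keeps the matching lenient. The trade-off is that it could merge identifiers that differ only in case, so a stricter variant might drop `.lower()`.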
Term-ontology matrix: the rows contain the explicitly-reused terms and the columns contain the ontologies in which each term appears.
Sparse K-means algorithm, with the Gap-Estimate method for choosing the number of clusters (K=6).
For each pair of terms in each cluster, we compute similarity scores.
Spectral clustering with the resulting term-term affinity matrix.
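A minimal sketch of the data structures in these steps: the binary term-ontology matrix and a symmetric term-term affinity matrix. As an assumption, cosine similarity of the terms' co-occurrence vectors stands in for the study's similarity scores. The sparse K-means and spectral clustering steps are not reproduced here. A precomputed affinity like this one could be passed to a library implementation such as scikit-learn's `SpectralClustering(affinity="precomputed")`. The data is illustrative.

```python
from math import sqrt


def term_ontology_matrix(usage: dict[str, set[str]]):
    """Binary matrix: rows = reused terms, columns = ontologies containing them."""
    ontologies = sorted(usage)
    terms = sorted(set().union(*usage.values()))
    matrix = [[1 if t in usage[o] else 0 for o in ontologies] for t in terms]
    return terms, ontologies, matrix


def cosine(u: list[int], v: list[int]) -> float:
    """Cosine similarity of two vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = sqrt(sum(a * a for a in u)) * sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def affinity_matrix(rows: list[list[int]]) -> list[list[float]]:
    """Symmetric term-term affinity (cosine of co-occurrence vectors, an assumption)."""
    return [[cosine(r, s) for s in rows] for r in rows]


# Hypothetical reuse data: ontology -> explicitly reused terms.
usage = {"OntA": {"t1", "t2"}, "OntB": {"t1", "t2", "t3"}, "OntC": {"t3"}}
terms, onts, matrix = term_ontology_matrix(usage)
affinity = affinity_matrix(matrix)
for term, row in zip(terms, affinity):
    print(term, [round(x, 2) for x in row])
```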
Reuse dependencies could guide term reuse based on the structure of ontologies in related domains.
Identifying reuse patterns and providing personalized recommendations could help increase term reuse.
Use an item-based collaborative filtering method (as used by Amazon) to provide term reuse recommendations to users through a WebProtégé plugin, and also to allow automated updating.
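This is a minimal, generic sketch of item-based collaborative filtering applied to term reuse, not the plugin's implementation. It assumes that ontologies play the role of users and reused terms the role of items, with cosine similarity between items' binary usage vectors. Function names and data are hypothetical.

```python
from collections import defaultdict
from math import sqrt


def item_similarities(usage: dict[str, set[str]]) -> dict[str, dict[str, float]]:
    """Cosine similarity between terms (items), where each term is a binary
    vector over the ontologies (users) that reuse it."""
    users_of: dict[str, set[str]] = defaultdict(set)
    for ont, terms in usage.items():
        for term in terms:
            users_of[term].add(ont)
    sims: dict[str, dict[str, float]] = defaultdict(dict)
    items = sorted(users_of)
    for i, a in enumerate(items):
        for b in items[i + 1:]:
            common = len(users_of[a] & users_of[b])
            if common:
                score = common / sqrt(len(users_of[a]) * len(users_of[b]))
                sims[a][b] = sims[b][a] = score
    return sims


def recommend(current: set[str], sims: dict[str, dict[str, float]], k: int = 3) -> list[str]:
    """Rank terms not yet reused by summing their similarity to terms already reused."""
    scores: dict[str, float] = defaultdict(float)
    for term in current:
        for other, score in sims.get(term, {}).items():
            if other not in current:
                scores[other] += score
    # Highest score first; ties broken alphabetically for a stable order.
    return sorted(scores, key=lambda t: (-scores[t], t))[:k]


# Hypothetical reuse data: ontology -> GO terms it reuses.
usage = {
    "OntA": {"GO:0008104", "GO:0033036", "GO:0007050"},
    "OntB": {"GO:0008104", "GO:0033036"},
    "OntC": {"GO:0008104", "GO:0045786"},
}
sims = item_similarities(usage)
print(recommend({"GO:0008104"}, sims, k=2))  # ['GO:0033036', 'GO:0007050']
```

Item-item similarities depend only on the reuse data. They can be precomputed and refreshed as ontologies change, which fits the automated updating mentioned above.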
Two-fold evaluation:
a posteriori: check whether the term-reuse recommendations match the terms actually reused by users, as analyzed from the logs.
user-centered: monitor term reuse when developers build an ontology by combining existing ontologies, and conduct surveys.
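For the a posteriori check, one option is to score recommendations against the terms users actually reused. The sketch below uses precision@k and recall@k, averaged over logged sessions. These metrics are an assumption, since no specific metric is named above, and the session data is hypothetical.

```python
def precision_recall_at_k(recommended: list[str], reused: set[str], k: int) -> tuple[float, float]:
    """Compare the top-k recommendations with the terms a user actually reused."""
    top = recommended[:k]
    hits = sum(1 for term in top if term in reused)
    precision = hits / len(top) if top else 0.0
    recall = hits / len(reused) if reused else 0.0
    return precision, recall


def mean_scores(sessions: list[tuple[list[str], set[str]]], k: int) -> tuple[float, float]:
    """Average precision@k and recall@k over logged sessions."""
    if not sessions:
        return 0.0, 0.0
    pairs = [precision_recall_at_k(rec, reused, k) for rec, reused in sessions]
    return (sum(p for p, _ in pairs) / len(pairs),
            sum(r for _, r in pairs) / len(pairs))


# Hypothetical log-derived sessions: (recommendations shown, terms actually reused).
sessions = [
    (["GO:0033036", "GO:0007050", "GO:0045786"], {"GO:0033036", "GO:0071850"}),
    (["GO:0008104", "GO:0044770"], {"GO:0008104", "GO:0044770"}),
]
print(mean_scores(sessions, k=2))  # (0.75, 0.75)
```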