ONTOLOGY MAPPING
FOR LIFE SCIENCE LINKED DATA
ISWC2016:::BMDID::Dumontier1
Amrapali Zaveri and Michel Dumontier
Stanford Center for Biomedical Informatics Research
Stanford University
Large and growing network of Linked Data
Linking Open Data cloud diagram 2014, by Max Schmachtenberg, Christian Bizer, Anja Jentzsch and Richard Cyganiak. http://lod-cloud.net/
Linked Data for the Life Sciences
Bio2RDF is an open source project to unify the
representation and interlinking of biological data using RDF.
chemicals/drugs/formulations,
genomes/genes/proteins, domains
Interactions, complexes & pathways
animal models and phenotypes
Disease, genetic markers, treatments
Terminologies & publications
• 11B+ interlinked statements from 35 biomedical
datasets and 400+ ontologies
• dataset description, provenance & statistics
• A growing interoperable ecosystem with the EBI,
NCBI, DBCLS, NCBO, OpenPHACTS, and
commercial tool providers
Biomedical Linked Data
The lack of coordination around a global schema
makes Linked Data chaotic and unwieldy
Federated queries require intimate
knowledge of each dataset schema
Get all protein catabolic processes (and more specific GO terms) in biomodels
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
# Count distinct BioModels reactions per GO term
SELECT ?go ?label (COUNT(DISTINCT ?x) AS ?count)
WHERE {
  SERVICE <http://bioportal.bio2rdf.org/sparql> {
    ?go rdfs:label ?label .
    ?go rdfs:subClassOf+ ?tgo .
    ?tgo rdfs:label ?tlabel .
    FILTER regex(?tlabel, "^protein catabolic process")
  }
  SERVICE <http://biomodels.bio2rdf.org/sparql> {
    ?x <http://bio2rdf.org/biopax_vocabulary:identical-to> ?go .
    ?x a <http://www.biopax.org/release/biopax-level3.owl#BiochemicalReaction> .
  }
}
GROUP BY ?go ?label
Previous work involved manual mappings between
Bio2RDF types and relations and the Semanticscience
Integrated Ontology (SIO)
[Diagram: three layers labelled dataset, ontology and Knowledge Base. Records (uniprot:P05067, pharmgkb:PA30917, omim:189931) are linked by "is a" to dataset types (uniprot:Protein, refseq:Protein, omim:Gene, pharmgkb:Gene), which are linked by "is a" to SIO classes (sio:gene).]
Querying Bio2RDF Linked Open Data with a Global Schema. Alison Callahan, José Cruz-Toledo and
Michel Dumontier. Bio-ontologies 2012.
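The idea of a global schema can be sketched in a few lines of Python. This is only an illustration: the pairings in `TYPE_TO_SIO` are assumed for the example (they are not the published Bio2RDF-SIO mappings), and `dataset_types_for` and `values_clause` are hypothetical helper names.

```python
# Hypothetical mapping table: dataset-specific types -> SIO classes.
# The pairings are assumptions for illustration, not the published mappings.
TYPE_TO_SIO = {
    "uniprot:Protein": "sio:protein",
    "refseq:Protein": "sio:protein",
    "omim:Gene": "sio:gene",
    "pharmgkb:Gene": "sio:gene",
}


def dataset_types_for(sio_class: str) -> list[str]:
    """Return every dataset-specific type mapped to the given SIO class."""
    return sorted(t for t, s in TYPE_TO_SIO.items() if s == sio_class)


def values_clause(var: str, sio_class: str) -> str:
    """Build a SPARQL VALUES clause listing the dataset types for one SIO class."""
    types = " ".join(dataset_types_for(sio_class))
    return f"VALUES ?{var} {{ {types} }}"
```

With such a table, a query author names one SIO class and the dataset-specific types are filled in mechanically, instead of having to be remembered for each dataset.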
Semanticscience Integrated Ontology (SIO)
An effective upper level ontology.
1500+ classes
207 object properties (inc. inverses)
1 datatype property
Bio2RDF- and SIO-powered SPARQL federated query:
Find chemicals (from CTD) and proteins (from SGD) that
participate in the same process (from GOA)
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX sio: <http://semanticscience.org/resource/>
SELECT ?chem ?prot ?proc
FROM <http://bio2rdf.org/ctd>
WHERE {
  SERVICE <http://ctd.bio2rdf.org/sparql> {
    ?chemical a sio:chemical-entity .
    ?chemical rdfs:label ?chem .
    ?chemical sio:is-participant-in ?process .
    ?process rdfs:label ?proc .
    # regex needs a string, so convert the process IRI first
    FILTER regex(str(?process), "http://bio2rdf.org/go:")
  }
  SERVICE <http://sgd.bio2rdf.org/sparql> {
    ?protein a sio:protein .
    ?protein sio:is-participant-in ?process .
    ?protein rdfs:label ?prot .
  }
}
Many vocabularies, ontologies
and community-based standards
are now available
PubChem uses multiple terminologies
Existing limitations
with Bio2RDF mappings
• New datasets have been added
• Existing datasets have changed
• The target ontology (SIO) has changed
• The target ontology (SIO) is incomplete and there
may be better ontologies to use
• These ontologies are evolving: today’s mappings
may be invalid or imprecise tomorrow
• Manual process -> not easy and not reproducible
-> must automate
Goal
Develop a semi-automated procedure to
generate high quality mappings between
Bio2RDF and SIO.
Approach
[Diagram: mapping approaches placed on an axis from Automated to Manual: distance metrics, graph-based, instance-based, BioPortal, crowdsourcing. Two positions are marked "Our work" and "previous work*".]
Idea: Create mappings between SIO and
Bio2RDF using ontologies in BioPortal
[Diagram: Bio2RDF, NCBO Annotator/Recommender, SIO]
Bio2RDF-SIO mappings via transitive
closure through BioPortal ontologies
[Diagram: a Bio2RDF class is matched to a mapped class in a BioPortal ontology; a super class of that mapped class is matched to a SIO class.]
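A minimal sketch of the hierarchy walk, assuming the BioPortal class hierarchy is available as a simple child-to-parents dictionary. The function name `map_via_superclasses` and both dictionaries are hypothetical; the class names are borrowed from the deck's own example, and this is not the authors' implementation.

```python
from collections import deque


def map_via_superclasses(start, parents, sio_matches):
    """Walk up the hierarchy breadth-first from `start`.

    Returns (matched_class, sio_class, hops) for the nearest class
    (the start class itself or an ancestor) that has a SIO match,
    or None when no class on the way up is matched.
    """
    queue = deque([(start, 0)])
    seen = {start}
    while queue:
        cls, hops = queue.popleft()
        if cls in sio_matches:
            return cls, sio_matches[cls], hops
        for parent in parents.get(cls, []):
            if parent not in seen:  # guard against cycles and diamonds
                seen.add(parent)
                queue.append((parent, hops + 1))
    return None


# Toy data (assumed shape): child class -> list of super classes,
# and BioPortal class -> matched SIO class.
parents = {"edda:clinical_trial": ["Edda:Study_Design"]}
sio_matches = {"Edda:Study_Design": "sio:001041"}

result = map_via_superclasses("edda:clinical_trial", parents, sio_matches)
```

Breadth-first order means the closest matched ancestor wins, which keeps the resulting mapping as specific as the hierarchy allows.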
Results
Bio2RDF: 319 (of 6093) classes kept after pruning
(remove blank nodes, general resources, OWL
vocabulary & non-Bio2RDF types/relations).
1. NCBO Annotator: 174 Bio2RDF classes
matched directly and exactly to SIO
2. NCBO Recommender: 94 Bio2RDF classes
matched to BioPortal ontologies
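The pruning step could look roughly like the filter below. It is a sketch under assumptions: classes are given as IRI strings with blank nodes written as `_:` labels, and the contents of `GENERAL_RESOURCES` are a hypothetical stand-in, since the deck does not list which "general resources" were removed.

```python
OWL_NS = "http://www.w3.org/2002/07/owl#"
BIO2RDF_NS = "http://bio2rdf.org/"

# Hypothetical stand-in for the "general resources" removed in the deck.
GENERAL_RESOURCES = {"http://bio2rdf.org/bio2rdf_vocabulary:Resource"}


def keep_class(iri: str) -> bool:
    """Return True when a type IRI survives all four pruning rules."""
    if iri.startswith("_:"):             # blank node label
        return False
    if iri.startswith(OWL_NS):           # OWL vocabulary
        return False
    if not iri.startswith(BIO2RDF_NS):   # non-Bio2RDF type or relation
        return False
    return iri not in GENERAL_RESOURCES  # general resources


def prune(classes):
    """Filter a list of type IRIs, preserving input order."""
    return [c for c in classes if keep_class(c)]


candidates = [
    "_:b0",
    OWL_NS + "Class",
    "http://xmlns.com/foaf/0.1/Document",
    "http://bio2rdf.org/bio2rdf_vocabulary:Resource",
    "http://bio2rdf.org/drugbank_vocabulary:Drug",
]
kept = prune(candidates)
```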
Results
SIO: 1500 classes
BioPortal: 475 ontologies
3. 393 BioPortal ontologies matched to SIO
Results
Bio2RDF: 319 classes
SIO: 1500 classes
94 Bio2RDF classes matched to BioPortal ontologies
393 BioPortal ontologies matched to SIO
4. Traverse hierarchy (mapped class to super class): 71 matches
Results: Example
Bio2RDF class: clinicaltrials:Clincial-Study
Mapped class: edda:clinical_trial
Super class: Edda:Study_Design
SIO class: sio:001041 (study design)
Relation: skos:broader
Mappings often occurred
to more than one class
sider:Drug-Indication-Association mapped to:
sio:010038 (drug)
sio:010299 (disease)
sio:000897 (association)
Manual validation of mappings
Bio2RDF Class                 | SIO Class                 | Annotation
drugbank:Biotech              | (none)                    | no match
clinicaltrials:Organization   | sio:00012 (organization)  | exact
drugbank:toxicity             | sio:001008 (toxicity)     | exact
sgd:GlycineCount              | sio:000794 (count)        | partial – is-a
wormbase:Genetic-Interaction  | sio:010035 (gene)         | partial – part-of
clinicaltrials:Serious-Event  | sio:000614 (attribute)    | incorrect
drugbank:Source               | sio:000510 (model)        | incorrect
All results available at https://goo.gl/eiijmQ
Conclusion
• Developed a semi-automated
methodology to map Bio2RDF classes to
SIO via BioPortal ontologies
• 245 of 319 Bio2RDF classes matched to
SIO
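The 245 figure appears to be the sum of the 174 direct Annotator matches and the 71 matches found by traversing the hierarchy. That reading is an inference from the numbers on the Results slides rather than something the deck states, and the short tally below only checks the arithmetic.

```python
TOTAL_CLASSES = 319   # Bio2RDF classes kept after pruning
DIRECT = 174          # matched directly and exactly to SIO (NCBO Annotator)
VIA_HIERARCHY = 71    # matched by traversing BioPortal hierarchies

matched = DIRECT + VIA_HIERARCHY
unmatched = TOTAL_CLASSES - matched
coverage_percent = round(100 * matched / TOTAL_CLASSES, 1)

print(f"{matched} of {TOTAL_CLASSES} matched ({coverage_percent}%), {unmatched} unmatched")
```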
Limitations
• Unmatched classes: neither SIO nor other
ontologies have complete coverage
• Overly general concepts: Semantically
incompatible classes
• Incorrect mappings: Matches to part of the
class
• Mappings are insufficient to precisely
retrieve data across different datasets
Future Work
• Extend SIO to include classes that are
ultimately not found
• Explore mid-level portion of SIO to eliminate
root level mappings
• Scalable validation via crowdsourcing
• Pursue query rewriting
michel.dumontier@stanford.edu
Website: http://dumontierlab.com

Mais conteúdo relacionado

Mais procurados

Nucl. Acids Res.-2014-Howe-nar-gku1244
Nucl. Acids Res.-2014-Howe-nar-gku1244Nucl. Acids Res.-2014-Howe-nar-gku1244
Nucl. Acids Res.-2014-Howe-nar-gku1244
Yasel Cruz
 
dkNET Webinar: "The Microphysiology Systems Database (MPS-Db): A Platform For...
dkNET Webinar: "The Microphysiology Systems Database (MPS-Db): A Platform For...dkNET Webinar: "The Microphysiology Systems Database (MPS-Db): A Platform For...
dkNET Webinar: "The Microphysiology Systems Database (MPS-Db): A Platform For...
dkNET
 
Metadata in the BioSample Online Repository are Impaired by Numerous Anomalie...
Metadata in the BioSample Online Repository are Impaired by Numerous Anomalie...Metadata in the BioSample Online Repository are Impaired by Numerous Anomalie...
Metadata in the BioSample Online Repository are Impaired by Numerous Anomalie...
CEDAR: Center for Expanded Data Annotation and Retrieval
 
Building linked data large-scale chemistry platform - challenges, lessons and...
Building linked data large-scale chemistry platform - challenges, lessons and...Building linked data large-scale chemistry platform - challenges, lessons and...
Building linked data large-scale chemistry platform - challenges, lessons and...
Valery Tkachenko
 

Mais procurados (20)

Generating Biomedical Hypotheses Using Semantic Web Technologies
Generating Biomedical Hypotheses Using Semantic Web TechnologiesGenerating Biomedical Hypotheses Using Semantic Web Technologies
Generating Biomedical Hypotheses Using Semantic Web Technologies
 
Citing data in research articles: principles, implementation, challenges - an...
Citing data in research articles: principles, implementation, challenges - an...Citing data in research articles: principles, implementation, challenges - an...
Citing data in research articles: principles, implementation, challenges - an...
 
FAIR Agronomy, where are we? The KnetMiner Use Case
FAIR Agronomy, where are we? The KnetMiner Use CaseFAIR Agronomy, where are we? The KnetMiner Use Case
FAIR Agronomy, where are we? The KnetMiner Use Case
 
David Tyrpak CV
David Tyrpak CVDavid Tyrpak CV
David Tyrpak CV
 
dkNET Poster Experimental Biology 2019
dkNET Poster Experimental Biology 2019dkNET Poster Experimental Biology 2019
dkNET Poster Experimental Biology 2019
 
Presentation from Code Camp 2017
Presentation from Code Camp 2017Presentation from Code Camp 2017
Presentation from Code Camp 2017
 
Nucl. Acids Res.-2014-Howe-nar-gku1244
Nucl. Acids Res.-2014-Howe-nar-gku1244Nucl. Acids Res.-2014-Howe-nar-gku1244
Nucl. Acids Res.-2014-Howe-nar-gku1244
 
Embracing Semantic Technology for Better Metadata Authoring in Biomedicine (S...
Embracing Semantic Technology for Better Metadata Authoring in Biomedicine (S...Embracing Semantic Technology for Better Metadata Authoring in Biomedicine (S...
Embracing Semantic Technology for Better Metadata Authoring in Biomedicine (S...
 
Open PHACTS : Linked Data Future Challenges
Open PHACTS : Linked Data Future ChallengesOpen PHACTS : Linked Data Future Challenges
Open PHACTS : Linked Data Future Challenges
 
dkNET Webinar: "The Microphysiology Systems Database (MPS-Db): A Platform For...
dkNET Webinar: "The Microphysiology Systems Database (MPS-Db): A Platform For...dkNET Webinar: "The Microphysiology Systems Database (MPS-Db): A Platform For...
dkNET Webinar: "The Microphysiology Systems Database (MPS-Db): A Platform For...
 
Metadata in the BioSample Online Repository are Impaired by Numerous Anomalie...
Metadata in the BioSample Online Repository are Impaired by Numerous Anomalie...Metadata in the BioSample Online Repository are Impaired by Numerous Anomalie...
Metadata in the BioSample Online Repository are Impaired by Numerous Anomalie...
 
The CEDAR Workbench: An Ontology-Assisted Environment for Authoring Metadata ...
The CEDAR Workbench: An Ontology-Assisted Environment for Authoring Metadata ...The CEDAR Workbench: An Ontology-Assisted Environment for Authoring Metadata ...
The CEDAR Workbench: An Ontology-Assisted Environment for Authoring Metadata ...
 
Building linked data large-scale chemistry platform - challenges, lessons and...
Building linked data large-scale chemistry platform - challenges, lessons and...Building linked data large-scale chemistry platform - challenges, lessons and...
Building linked data large-scale chemistry platform - challenges, lessons and...
 
Can machines understand the scientific literature
Can machines understand the scientific literatureCan machines understand the scientific literature
Can machines understand the scientific literature
 
Implementing chemistry platform for OpenPHACTS
Implementing chemistry platform for OpenPHACTSImplementing chemistry platform for OpenPHACTS
Implementing chemistry platform for OpenPHACTS
 
Towards a gold standard and regarding quality in public domain chemistry data...
Towards a gold standard and regarding quality in public domain chemistry data...Towards a gold standard and regarding quality in public domain chemistry data...
Towards a gold standard and regarding quality in public domain chemistry data...
 
Canadian health census to lod
Canadian health census to lodCanadian health census to lod
Canadian health census to lod
 
Structure verification and elucidation using the ChemSpider database
Structure verification and elucidation using the ChemSpider databaseStructure verification and elucidation using the ChemSpider database
Structure verification and elucidation using the ChemSpider database
 
Hosting a compound centric community resource for chemistry data
Hosting a compound centric community resource for chemistry dataHosting a compound centric community resource for chemistry data
Hosting a compound centric community resource for chemistry data
 
Better Data for a Better World
Better Data for a Better WorldBetter Data for a Better World
Better Data for a Better World
 

Destaque

Ontologies
OntologiesOntologies
Ontologies
Michel Dumontier
 
Modular Ontologies - A Formal Investigation of Semantics and Expressivity
Modular Ontologies - A Formal Investigation of Semantics and ExpressivityModular Ontologies - A Formal Investigation of Semantics and Expressivity
Modular Ontologies - A Formal Investigation of Semantics and Expressivity
Jie Bao
 
Linked Data in Healthcare and Life Sciences
Linked Data in Healthcare and Life SciencesLinked Data in Healthcare and Life Sciences
Linked Data in Healthcare and Life Sciences
James G. Boram Kim
 
CHPC Afternoon Session
CHPC Afternoon SessionCHPC Afternoon Session
CHPC Afternoon Session
Ntino Krampis
 

Destaque (20)

Data Science for the Win
Data Science for the WinData Science for the Win
Data Science for the Win
 
Towards metrics to assess and encourage FAIRness
Towards metrics to assess and encourage FAIRnessTowards metrics to assess and encourage FAIRness
Towards metrics to assess and encourage FAIRness
 
Ontologies
OntologiesOntologies
Ontologies
 
Bio2RDF : A biological knowledge base for the Semantic Web
Bio2RDF : A biological knowledge base for the Semantic WebBio2RDF : A biological knowledge base for the Semantic Web
Bio2RDF : A biological knowledge base for the Semantic Web
 
Knowledge Discovery using an Integrated Semantic Web
Knowledge Discovery using an Integrated Semantic WebKnowledge Discovery using an Integrated Semantic Web
Knowledge Discovery using an Integrated Semantic Web
 
Modular Ontologies - A Formal Investigation of Semantics and Expressivity
Modular Ontologies - A Formal Investigation of Semantics and ExpressivityModular Ontologies - A Formal Investigation of Semantics and Expressivity
Modular Ontologies - A Formal Investigation of Semantics and Expressivity
 
Deliverable_5.1.2
Deliverable_5.1.2Deliverable_5.1.2
Deliverable_5.1.2
 
ABSTAT: Ontology-driven Linked Data Summaries with Pattern Minimalization
ABSTAT: Ontology-driven Linked Data Summaries with Pattern MinimalizationABSTAT: Ontology-driven Linked Data Summaries with Pattern Minimalization
ABSTAT: Ontology-driven Linked Data Summaries with Pattern Minimalization
 
The General Ontology Evaluation Framework (GOEF) & the I-Choose Use Case A ...
The General Ontology Evaluation Framework (GOEF) & the I-Choose Use CaseA ...The General Ontology Evaluation Framework (GOEF) & the I-Choose Use CaseA ...
The General Ontology Evaluation Framework (GOEF) & the I-Choose Use Case A ...
 
IAS 16 Ontology Dojo
IAS 16 Ontology DojoIAS 16 Ontology Dojo
IAS 16 Ontology Dojo
 
OWL Web Ontology Language Overview
OWL Web Ontology Language OverviewOWL Web Ontology Language Overview
OWL Web Ontology Language Overview
 
Schema.org: Where did that come from!
Schema.org: Where did that come from!Schema.org: Where did that come from!
Schema.org: Where did that come from!
 
On Beyond OWL: challenges for ontologies on the Web
On Beyond OWL: challenges for ontologies on the WebOn Beyond OWL: challenges for ontologies on the Web
On Beyond OWL: challenges for ontologies on the Web
 
Crowdsourcing Linked Data Quality Assessment
Crowdsourcing Linked Data Quality AssessmentCrowdsourcing Linked Data Quality Assessment
Crowdsourcing Linked Data Quality Assessment
 
Linked Data in Healthcare and Life Sciences
Linked Data in Healthcare and Life SciencesLinked Data in Healthcare and Life Sciences
Linked Data in Healthcare and Life Sciences
 
Wither OWL
Wither OWLWither OWL
Wither OWL
 
Semantic Web - Ontology 101
Semantic Web - Ontology 101Semantic Web - Ontology 101
Semantic Web - Ontology 101
 
CHPC Afternoon Session
CHPC Afternoon SessionCHPC Afternoon Session
CHPC Afternoon Session
 
Generation Myth
Generation MythGeneration Myth
Generation Myth
 
Geekmeet Iasi Intro
Geekmeet Iasi IntroGeekmeet Iasi Intro
Geekmeet Iasi Intro
 

Semelhante a 2016 bmdid-mappings

Bio2RDF Release 2: Improved coverage, interoperability and provenance of Link...
Bio2RDF Release 2: Improved coverage, interoperability and provenance of Link...Bio2RDF Release 2: Improved coverage, interoperability and provenance of Link...
Bio2RDF Release 2: Improved coverage, interoperability and provenance of Link...
Michel Dumontier
 
Use of open_linked_data_in_bioinformatics
Use of open_linked_data_in_bioinformaticsUse of open_linked_data_in_bioinformatics
Use of open_linked_data_in_bioinformatics
Remzi Çelebi
 
BioPAX Models and Pathways
BioPAX Models and PathwaysBioPAX Models and Pathways
BioPAX Models and Pathways
Michel Dumontier
 
Building and mining a heterogeneous biomedical knowledge graph
Building and mining a heterogeneous biomedical knowledge graphBuilding and mining a heterogeneous biomedical knowledge graph
Building and mining a heterogeneous biomedical knowledge graph
Andrew Su
 

Semelhante a 2016 bmdid-mappings (20)

2013 eswc-bio2rdf-r2
2013 eswc-bio2rdf-r22013 eswc-bio2rdf-r2
2013 eswc-bio2rdf-r2
 
DisGeNET: A discovery platform for the dynamical exploration of human disease...
DisGeNET: A discovery platform for the dynamical exploration of human disease...DisGeNET: A discovery platform for the dynamical exploration of human disease...
DisGeNET: A discovery platform for the dynamical exploration of human disease...
 
The beauty of workflows and models
The beauty of workflows and modelsThe beauty of workflows and models
The beauty of workflows and models
 
Bryant orcid pkp_08212013_v2
Bryant orcid pkp_08212013_v2Bryant orcid pkp_08212013_v2
Bryant orcid pkp_08212013_v2
 
Bio2RDF Release 2: Improved coverage, interoperability and provenance of Link...
Bio2RDF Release 2: Improved coverage, interoperability and provenance of Link...Bio2RDF Release 2: Improved coverage, interoperability and provenance of Link...
Bio2RDF Release 2: Improved coverage, interoperability and provenance of Link...
 
Use of open_linked_data_in_bioinformatics
Use of open_linked_data_in_bioinformaticsUse of open_linked_data_in_bioinformatics
Use of open_linked_data_in_bioinformatics
 
GBIF and Open Science
GBIF and Open ScienceGBIF and Open Science
GBIF and Open Science
 
BioPAX Models and Pathways
BioPAX Models and PathwaysBioPAX Models and Pathways
BioPAX Models and Pathways
 
Omdi2021 Ontologies for (Materials) Science in the Digital Age
Omdi2021 Ontologies for (Materials) Science in the Digital AgeOmdi2021 Ontologies for (Materials) Science in the Digital Age
Omdi2021 Ontologies for (Materials) Science in the Digital Age
 
2010 CASCON - Towards a integrated network of data and services for the life ...
2010 CASCON - Towards a integrated network of data and services for the life ...2010 CASCON - Towards a integrated network of data and services for the life ...
2010 CASCON - Towards a integrated network of data and services for the life ...
 
A Centralized Model Organism Database (CMOD) for the Long Tail of Sequenced G...
A Centralized Model Organism Database (CMOD) for the Long Tail of Sequenced G...A Centralized Model Organism Database (CMOD) for the Long Tail of Sequenced G...
A Centralized Model Organism Database (CMOD) for the Long Tail of Sequenced G...
 
2013 CrossRef Annual Meeting Flash Update ORCID, Ed Pentz.
2013 CrossRef Annual Meeting Flash Update ORCID, Ed Pentz.2013 CrossRef Annual Meeting Flash Update ORCID, Ed Pentz.
2013 CrossRef Annual Meeting Flash Update ORCID, Ed Pentz.
 
Collaboratively Creating the Knowledge Graph of Life
Collaboratively Creating the Knowledge Graph of LifeCollaboratively Creating the Knowledge Graph of Life
Collaboratively Creating the Knowledge Graph of Life
 
From Biological Data to Clinical Applications: Positioning a digital infrastr...
From Biological Data to Clinical Applications: Positioning a digital infrastr...From Biological Data to Clinical Applications: Positioning a digital infrastr...
From Biological Data to Clinical Applications: Positioning a digital infrastr...
 
Guideline based CDSS for COVID-19
Guideline based CDSS for COVID-19Guideline based CDSS for COVID-19
Guideline based CDSS for COVID-19
 
Linked Data for Federation of OER Data &amp; Repositories
Linked Data for Federation of OER Data &amp; RepositoriesLinked Data for Federation of OER Data &amp; Repositories
Linked Data for Federation of OER Data &amp; Repositories
 
Towards semantic systems chemical biology
Towards semantic systems chemical biology Towards semantic systems chemical biology
Towards semantic systems chemical biology
 
PSI-MI standards and PSICQUIC
PSI-MI standards and PSICQUICPSI-MI standards and PSICQUIC
PSI-MI standards and PSICQUIC
 
Building and mining a heterogeneous biomedical knowledge graph
Building and mining a heterogeneous biomedical knowledge graphBuilding and mining a heterogeneous biomedical knowledge graph
Building and mining a heterogeneous biomedical knowledge graph
 
The Role of Metadata in Reproducible Computational Research
The Role of Metadata in Reproducible Computational ResearchThe Role of Metadata in Reproducible Computational Research
The Role of Metadata in Reproducible Computational Research
 

Mais de Michel Dumontier

CIKM2020 Keynote: Accelerating discovery science with an Internet of FAIR dat...
CIKM2020 Keynote: Accelerating discovery science with an Internet of FAIR dat...CIKM2020 Keynote: Accelerating discovery science with an Internet of FAIR dat...
CIKM2020 Keynote: Accelerating discovery science with an Internet of FAIR dat...
Michel Dumontier
 
Are we FAIR yet? And will it be worth it?
Are we FAIR yet? And will it be worth it?Are we FAIR yet? And will it be worth it?
Are we FAIR yet? And will it be worth it?
Michel Dumontier
 

Mais de Michel Dumontier (19)

A metadata standard for Knowledge Graphs
A metadata standard for Knowledge GraphsA metadata standard for Knowledge Graphs
A metadata standard for Knowledge Graphs
 
Data-Driven Discovery Science with FAIR Knowledge Graphs
Data-Driven Discovery Science with FAIR Knowledge GraphsData-Driven Discovery Science with FAIR Knowledge Graphs
Data-Driven Discovery Science with FAIR Knowledge Graphs
 
Evaluating FAIRness
Evaluating FAIRnessEvaluating FAIRness
Evaluating FAIRness
 
The Role of the FAIR Guiding Principles for an effective Learning Health System
The Role of the FAIR Guiding Principles for an effective Learning Health SystemThe Role of the FAIR Guiding Principles for an effective Learning Health System
The Role of the FAIR Guiding Principles for an effective Learning Health System
 
CIKM2020 Keynote: Accelerating discovery science with an Internet of FAIR dat...
CIKM2020 Keynote: Accelerating discovery science with an Internet of FAIR dat...CIKM2020 Keynote: Accelerating discovery science with an Internet of FAIR dat...
CIKM2020 Keynote: Accelerating discovery science with an Internet of FAIR dat...
 
The role of the FAIR Guiding Principles in a Learning Health System
The role of the FAIR Guiding Principles in a Learning Health SystemThe role of the FAIR Guiding Principles in a Learning Health System
The role of the FAIR Guiding Principles in a Learning Health System
 
Acclerating biomedical discovery with an internet of FAIR data and services -...
Acclerating biomedical discovery with an internet of FAIR data and services -...Acclerating biomedical discovery with an internet of FAIR data and services -...
Acclerating biomedical discovery with an internet of FAIR data and services -...
 
Accelerating Biomedical Research with the Emerging Internet of FAIR Data and ...
Accelerating Biomedical Research with the Emerging Internet of FAIR Data and ...Accelerating Biomedical Research with the Emerging Internet of FAIR Data and ...
Accelerating Biomedical Research with the Emerging Internet of FAIR Data and ...
 
Are we FAIR yet? And will it be worth it?
Are we FAIR yet? And will it be worth it?Are we FAIR yet? And will it be worth it?
Are we FAIR yet? And will it be worth it?
 
The Future of FAIR Data: An international social, legal and technological inf...
The Future of FAIR Data: An international social, legal and technological inf...The Future of FAIR Data: An international social, legal and technological inf...
The Future of FAIR Data: An international social, legal and technological inf...
 
Keynote at the 2018 Maastricht University Dinner
Keynote at the 2018 Maastricht University DinnerKeynote at the 2018 Maastricht University Dinner
Keynote at the 2018 Maastricht University Dinner
 
The future of science and business - a UM Star Lecture
The future of science and business - a UM Star LectureThe future of science and business - a UM Star Lecture
The future of science and business - a UM Star Lecture
 
Are we FAIR yet?
Are we FAIR yet?Are we FAIR yet?
Are we FAIR yet?
 
Developing and assessing FAIR digital resources
Developing and assessing FAIR digital resourcesDeveloping and assessing FAIR digital resources
Developing and assessing FAIR digital resources
 
Advancing Biomedical Knowledge Reuse with FAIR
Advancing Biomedical Knowledge Reuse with FAIRAdvancing Biomedical Knowledge Reuse with FAIR
Advancing Biomedical Knowledge Reuse with FAIR
 
A Framework to develop the FAIR Metrics
A Framework to develop the FAIR MetricsA Framework to develop the FAIR Metrics
A Framework to develop the FAIR Metrics
 
FAIR principles and metrics for evaluation
FAIR principles and metrics for evaluationFAIR principles and metrics for evaluation
FAIR principles and metrics for evaluation
 
Making the most of phenotypes in ontology-based biomedical knowledge discovery
Making the most of phenotypes in ontology-based biomedical knowledge discoveryMaking the most of phenotypes in ontology-based biomedical knowledge discovery
Making the most of phenotypes in ontology-based biomedical knowledge discovery
 
1st Network-of-BioThings Hackathon
1st Network-of-BioThings Hackathon1st Network-of-BioThings Hackathon
1st Network-of-BioThings Hackathon
 

Último

Asymmetry in the atmosphere of the ultra-hot Jupiter WASP-76 b
Asymmetry in the atmosphere of the ultra-hot Jupiter WASP-76 bAsymmetry in the atmosphere of the ultra-hot Jupiter WASP-76 b
Asymmetry in the atmosphere of the ultra-hot Jupiter WASP-76 b
Sérgio Sacani
 
Module for Grade 9 for Asynchronous/Distance learning
Module for Grade 9 for Asynchronous/Distance learningModule for Grade 9 for Asynchronous/Distance learning
Module for Grade 9 for Asynchronous/Distance learning
levieagacer
 
Pests of cotton_Sucking_Pests_Dr.UPR.pdf
Pests of cotton_Sucking_Pests_Dr.UPR.pdfPests of cotton_Sucking_Pests_Dr.UPR.pdf
Pests of cotton_Sucking_Pests_Dr.UPR.pdf
PirithiRaju
 
Pests of cotton_Borer_Pests_Binomics_Dr.UPR.pdf
Pests of cotton_Borer_Pests_Binomics_Dr.UPR.pdfPests of cotton_Borer_Pests_Binomics_Dr.UPR.pdf
Pests of cotton_Borer_Pests_Binomics_Dr.UPR.pdf
PirithiRaju
 

Último (20)

PSYCHOSOCIAL NEEDS. in nursing II sem pptx
PSYCHOSOCIAL NEEDS. in nursing II sem pptxPSYCHOSOCIAL NEEDS. in nursing II sem pptx
PSYCHOSOCIAL NEEDS. in nursing II sem pptx
 
COST ESTIMATION FOR A RESEARCH PROJECT.pptx
COST ESTIMATION FOR A RESEARCH PROJECT.pptxCOST ESTIMATION FOR A RESEARCH PROJECT.pptx
COST ESTIMATION FOR A RESEARCH PROJECT.pptx
 
Asymmetry in the atmosphere of the ultra-hot Jupiter WASP-76 b
Asymmetry in the atmosphere of the ultra-hot Jupiter WASP-76 bAsymmetry in the atmosphere of the ultra-hot Jupiter WASP-76 b
Asymmetry in the atmosphere of the ultra-hot Jupiter WASP-76 b
 
FAIRSpectra - Enabling the FAIRification of Analytical Science
FAIRSpectra - Enabling the FAIRification of Analytical ScienceFAIRSpectra - Enabling the FAIRification of Analytical Science
FAIRSpectra - Enabling the FAIRification of Analytical Science
 
Pulmonary drug delivery system M.pharm -2nd sem P'ceutics
Pulmonary drug delivery system M.pharm -2nd sem P'ceuticsPulmonary drug delivery system M.pharm -2nd sem P'ceutics
Pulmonary drug delivery system M.pharm -2nd sem P'ceutics
 
GBSN - Microbiology (Unit 3)
GBSN - Microbiology (Unit 3)GBSN - Microbiology (Unit 3)
GBSN - Microbiology (Unit 3)
 
Clean In Place(CIP).pptx .
Clean In Place(CIP).pptx                 .Clean In Place(CIP).pptx                 .
Clean In Place(CIP).pptx .
 
300003-World Science Day For Peace And Development.pptx
300003-World Science Day For Peace And Development.pptx300003-World Science Day For Peace And Development.pptx
300003-World Science Day For Peace And Development.pptx
 
Module for Grade 9 for Asynchronous/Distance learning
Module for Grade 9 for Asynchronous/Distance learningModule for Grade 9 for Asynchronous/Distance learning
Module for Grade 9 for Asynchronous/Distance learning
 
Locating and isolating a gene, FISH, GISH, Chromosome walking and jumping, te...
Locating and isolating a gene, FISH, GISH, Chromosome walking and jumping, te...Locating and isolating a gene, FISH, GISH, Chromosome walking and jumping, te...
Locating and isolating a gene, FISH, GISH, Chromosome walking and jumping, te...
 
GBSN - Biochemistry (Unit 1)
GBSN - Biochemistry (Unit 1)GBSN - Biochemistry (Unit 1)
GBSN - Biochemistry (Unit 1)
 
pumpkin fruit fly, water melon fruit fly, cucumber fruit fly
pumpkin fruit fly, water melon fruit fly, cucumber fruit flypumpkin fruit fly, water melon fruit fly, cucumber fruit fly
pumpkin fruit fly, water melon fruit fly, cucumber fruit fly
 
9654467111 Call Girls In Raj Nagar Delhi Short 1500 Night 6000
9654467111 Call Girls In Raj Nagar Delhi Short 1500 Night 60009654467111 Call Girls In Raj Nagar Delhi Short 1500 Night 6000
9654467111 Call Girls In Raj Nagar Delhi Short 1500 Night 6000
 
Pests of cotton_Sucking_Pests_Dr.UPR.pdf
Pests of cotton_Sucking_Pests_Dr.UPR.pdfPests of cotton_Sucking_Pests_Dr.UPR.pdf
Pests of cotton_Sucking_Pests_Dr.UPR.pdf
 
Introduction to Viruses
Introduction to VirusesIntroduction to Viruses
Introduction to Viruses
 
Pests of cotton_Borer_Pests_Binomics_Dr.UPR.pdf
Pests of cotton_Borer_Pests_Binomics_Dr.UPR.pdfPests of cotton_Borer_Pests_Binomics_Dr.UPR.pdf
Pests of cotton_Borer_Pests_Binomics_Dr.UPR.pdf
 
SAMASTIPUR CALL GIRL 7857803690 LOW PRICE ESCORT SERVICE
SAMASTIPUR CALL GIRL 7857803690  LOW PRICE  ESCORT SERVICESAMASTIPUR CALL GIRL 7857803690  LOW PRICE  ESCORT SERVICE
SAMASTIPUR CALL GIRL 7857803690 LOW PRICE ESCORT SERVICE
 
9999266834 Call Girls In Noida Sector 22 (Delhi) Call Girl Service
9999266834 Call Girls In Noida Sector 22 (Delhi) Call Girl Service9999266834 Call Girls In Noida Sector 22 (Delhi) Call Girl Service
9999266834 Call Girls In Noida Sector 22 (Delhi) Call Girl Service
 
chemical bonding Essentials of Physical Chemistry2.pdf
chemical bonding Essentials of Physical Chemistry2.pdfchemical bonding Essentials of Physical Chemistry2.pdf
chemical bonding Essentials of Physical Chemistry2.pdf
 
Sector 62, Noida Call girls :8448380779 Model Escorts | 100% verified
Sector 62, Noida Call girls :8448380779 Model Escorts | 100% verifiedSector 62, Noida Call girls :8448380779 Model Escorts | 100% verified
Sector 62, Noida Call girls :8448380779 Model Escorts | 100% verified
 

2016 bmdid-mappings

  • 1. ONTOLOGY MAPPING FOR LIFE SCIENCE LINKED DATA ISWC2016:::BMDID::Dumontier1 Amrapali Zaveri and Michel Dumontier Stanford Center for Biomedical Informatics Research Stanford University
  • 2. Large and growing network of Linked Data. Linking Open Data cloud diagram 2014, by Max Schmachtenberg, Christian Bizer, Anja Jentzsch and Richard Cyganiak. http://lod-cloud.net/
  • 3. Linked Data for the Life Sciences. Bio2RDF is an open source project to unify the representation and interlinking of biological data using RDF. Coverage: chemicals/drugs/formulations; genomes/genes/proteins, domains; interactions, complexes & pathways; animal models and phenotypes; disease, genetic markers, treatments; terminologies & publications. Key facts: 11B+ interlinked statements from 35 biomedical datasets and 400+ ontologies; dataset description, provenance & statistics; a growing interoperable ecosystem with the EBI, NCBI, DBCLS, NCBO, OpenPHACTS, and commercial tool providers.
  • 5. The lack of coordination to a global schema makes Linked Data chaotic and unwieldy.
  • 6. Federated queries require intimate knowledge of each dataset schema. Example: get all protein catabolic processes (and more specific GO terms) in biomodels:
        SELECT ?go ?label (COUNT(DISTINCT ?x) AS ?count)
        WHERE {
          SERVICE <http://bioportal.bio2rdf.org/sparql> {
            ?go rdfs:label ?label .
            ?go rdfs:subClassOf+ ?tgo .
            ?tgo rdfs:label ?tlabel .
            FILTER regex(?tlabel, "^protein catabolic process")
          }
          SERVICE <http://biomodels.bio2rdf.org/sparql> {
            ?x <http://bio2rdf.org/biopax_vocabulary:identical-to> ?go .
            ?x a <http://www.biopax.org/release/biopax-level3.owl#BiochemicalReaction> .
          }
        }
        GROUP BY ?go ?label
  • 7. Previous work involved manual mappings between Bio2RDF types and relations and the Semanticscience Integrated Ontology (SIO). [Figure: "is a" links among uniprot:P05067, uniprot:Protein, refseq:Protein, pharmgkb:PA30917, pharmgkb:Gene, omim:189931, omim:Gene and sio:gene; layers labelled dataset, ontology and Knowledge Base.] Source: Querying Bio2RDF Linked Open Data with a Global Schema. Alison Callahan, José Cruz-Toledo and Michel Dumontier. Bio-ontologies 2012.
  • 8. Semanticscience Integrated Ontology (SIO): an effective upper level ontology. 1500+ classes; 207 object properties (inc. inverses); 1 datatype property.
  • 9. Bio2RDF and SIO powered SPARQL federated query: find chemicals (from CTD) and proteins (from SGD) that participate in the same process (from GOA):
        SELECT ?chem ?prot ?proc
        FROM <http://bio2rdf.org/ctd>
        WHERE {
          SERVICE <http://ctd.bio2rdf.org/sparql> {
            ?chemical a sio:chemical-entity .
            ?chemical rdfs:label ?chem .
            ?chemical sio:is-participant-in ?process .
            ?process rdfs:label ?proc .
            FILTER regex(str(?process), "http://bio2rdf.org/go:")
          }
          SERVICE <http://sgd.bio2rdf.org/sparql> {
            ?protein a sio:protein .
            ?protein sio:is-participant-in ?process .
            ?protein rdfs:label ?prot .
          }
        }
  • 10. Many vocabularies, ontologies and community-based standards are now available
  • 11. PubChem uses multiple terminologies
  • 12. Existing limitations with Bio2RDF mappings
      • New datasets have been added
      • Existing datasets have changed
      • The target ontology (SIO) has changed
      • The target ontology (SIO) is incomplete and there may be better ontologies to use
      • These ontologies are evolving; today's mappings may be invalid or imprecise tomorrow
      • Manual process -> not easy and not reproducible -> must automate
  • 13. Goal: Develop a semi-automated procedure to generate high quality mappings between Bio2RDF and SIO.
  • 15. Idea: Create mappings between SIO and Bio2RDF using ontologies in BioPortal. [Diagram labels: Bio2RDF, NCBO Annotator/Recommender, SIO.]
  • 16. Bio2RDF-SIO mappings via transitive closure through BioPortal ontologies. [Diagram labels: Bio2RDF, SIO, Super Class, Mapped Class, match.]
  • 17. Results. Pruning: 319 (of 6093) Bio2RDF classes remained after removing blank nodes, general resources, OWL vocabulary & non-Bio2RDF types/relations. Step 1, NCBO Annotator: 174 Bio2RDF classes matched directly and exactly to SIO. Step 2, NCBO Recommender: 94 Bio2RDF classes matched to BioPortal ontologies.
  • 18. Results. Step 3: of 475 BioPortal ontologies, 393 matched to SIO (1500 classes).
  • 19. Results. Step 4, traverse hierarchy: links the 94 Bio2RDF classes matched to BioPortal ontologies (from 319 Bio2RDF classes) with the 393 BioPortal ontologies matched to SIO (1500 classes).
  • 20. Results. Step 4, traverse hierarchy: links the 94 Bio2RDF classes matched to BioPortal ontologies (from 319 Bio2RDF classes) with the 393 BioPortal ontologies matched to SIO (1500 classes), yielding 71 matches. [Diagram labels: Mapped class, Super class.]
  • 21. Results, example: Bio2RDF class clinicaltrials:Clincial-Study; mapped class edda:clinical_trial; super class Edda:Study_Design; SIO class sio:001041 (study design); relation skos:broader.
  • 22. Mappings often occurred to more than one class: sider:Drug-Indication-Association mapped to sio:010038 (drug), sio:010299 (disease) and sio:000897 (association).
  • 23. Manual validation of mappings
        Bio2RDF Class                   SIO Class                    Annotation
        drugbank:Biotech                (none)                       no match
        clinicaltrials:Organization     sio:00012 (organization)     exact
        drugbank:toxicity               sio:001008 (toxicity)        exact
        sgd:GlycineCount                sio:000794 (count)           partial (is-a)
        wormbase:Genetic-Interaction    sio:010035 (gene)            partial (part-of)
        clinicaltrials:Serious-Event    sio:000614 (attribute)       incorrect
        drugbank:Source                 sio:000510 (model)           incorrect
      All results available at https://goo.gl/eiijmQ
  • 24. Conclusion
      • Developed a semi-automated methodology to map Bio2RDF classes to SIO via BioPortal ontologies
      • 245 of 319 Bio2RDF classes matched to SIO
  • 25. Limitations
      • Unmatched classes: neither SIO nor other ontologies have complete coverage
      • Overly general concepts: semantically incompatible classes
      • Incorrect mappings: matches to part of the class
      • Mappings are insufficient to precisely retrieve data across different datasets
  • 26. Future Work
      • Extend SIO to include classes that are ultimately not found
      • Explore mid-level portion of SIO to eliminate root level mappings
      • Scalable validation via crowdsourcing
      • Pursue query rewriting
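
The counts on the Results and Conclusion slides can be tallied as a quick check. The sketch below assumes that the 245 matched classes are the sum of the 174 direct NCBO Annotator matches and the 71 matches found by hierarchy traversal; the slides do not state this explicitly, and the function and constant names are illustrative only.

```python
# Counts reported on the Results and Conclusion slides.
TOTAL_PRUNED_CLASSES = 319  # Bio2RDF classes considered after pruning (of 6093)
DIRECT_MATCHES = 174        # matched directly and exactly to SIO (NCBO Annotator)
TRAVERSAL_MATCHES = 71      # matched to SIO by traversing BioPortal hierarchies


def coverage(direct: int, indirect: int, total: int) -> tuple[int, int, float]:
    """Return (matched, unmatched, fraction of classes matched)."""
    matched = direct + indirect  # assumption: the two sets do not overlap
    return matched, total - matched, matched / total


matched, unmatched, fraction = coverage(
    DIRECT_MATCHES, TRAVERSAL_MATCHES, TOTAL_PRUNED_CLASSES
)
print(f"{matched} of {TOTAL_PRUNED_CLASSES} matched ({fraction:.1%}); {unmatched} unmatched")
```

Under that assumption the tally reproduces the 245 of 319 figure from the Conclusion slide, leaving 74 classes unmatched.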

Editor's Notes

  1. Bio2RDF is an open-source project that offers a large and connected knowledge graph of Life Science Linked Data. Each dataset is expressed using its own vocabulary, which hinders integrating, searching, querying, and browsing data across similar or identical types. With growth and content changes in source data, a manual approach to maintaining mappings has proven untenable. The aim of this work is to develop a (semi)automated procedure to generate high quality mappings between Bio2RDF and SIO using BioPortal ontologies. Our preliminary results demonstrate that our approach is promising in that it can find new mappings using a transitive closure between ontology mappings. Further development of the methodology, coupled with improvements in the ontology, will offer a better-integrated view of the Life Science Linked Data.
  2. The Bio2RDF project transforms silos of life science data into a globally distributed network of linked data for biological knowledge discovery.
  3. Bio2RDF: 11 billion triples, 35 datasets with 6093 classes across all the datasets. Pruning: removing blank nodes, general resources, OWL vocabulary & other ontologies.
  4. SIO: 1500 classes, 208 properties. LogMap: large-scale ontology mapping. Ontologies such as CPO and FAO had no mappings, while others (e.g. GAZ, COGPO) were inconsistent and could not be used by LogMap.
  5. SIO-BioPortal & Bio2RDF-BioPortal
  6. We traversed the ancestors of the mapped BioPortal class up to the first super class that is mapped to a SIO class. In this way, the Bio2RDF type becomes a candidate subclass of the SIO class.
  7. http://ontologies.dbmi.pitt.edu/edda/StudyDesigns.owl
  8. sider:Drug-Indication-Association mapped to three SIO classes: sio:010038 (drug), sio:010299 (disease) and sio:000897 (association).
  9. We evaluated the mappings manually. Example of an incorrect mapping: drugbank:Source -> SNOMED CT Model Component -> model.
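
The traversal described in note 6 can be sketched as a breadth-first walk over the ancestors of the mapped BioPortal class. This is a minimal illustration rather than the authors' implementation: the three dictionaries and the function name are hypothetical. The toy data is modelled on the clinical trial example from the slides, and the direct parent link from edda:clinical_trial to Edda:Study_Design is an assumption made for brevity.

```python
from collections import deque
from typing import Optional

# Hypothetical toy data modelled on the slide example; not the real mappings.
BIO2RDF_TO_BIOPORTAL = {"clinicaltrials:Clincial-Study": "edda:clinical_trial"}
BIOPORTAL_PARENTS = {
    "edda:clinical_trial": ["Edda:Study_Design"],  # assumed direct parent
    "Edda:Study_Design": [],
}
BIOPORTAL_TO_SIO = {"Edda:Study_Design": "sio:001041"}


def first_sio_ancestor(bio2rdf_class: str) -> Optional[tuple[str, str]]:
    """Return (super class, SIO class) for the nearest mapped ancestor, or None."""
    mapped = BIO2RDF_TO_BIOPORTAL.get(bio2rdf_class)
    if mapped is None:
        return None  # no BioPortal match to start from
    queue = deque(BIOPORTAL_PARENTS.get(mapped, []))  # ancestors only
    seen = set()
    while queue:
        ancestor = queue.popleft()
        if ancestor in seen:
            continue  # guard against cycles and repeated parents
        seen.add(ancestor)
        if ancestor in BIOPORTAL_TO_SIO:
            return ancestor, BIOPORTAL_TO_SIO[ancestor]
        queue.extend(BIOPORTAL_PARENTS.get(ancestor, []))
    return None  # reached the root without finding a SIO mapping


result = first_sio_ancestor("clinicaltrials:Clincial-Study")
print(result)
```

Breadth-first order returns the nearest mapped ancestor, which matches the "first super class" wording in the note; the Bio2RDF type then becomes a candidate subclass of the returned SIO class.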