2016 ACS Semantic Approaches for Biochemical Knowledge Discovery
1. Semantic Approaches
for Biochemical Knowledge Discovery
Michel Dumontier, Ph.D.
Associate Professor of Medicine (Biomedical Informatics)
Stanford University
@micheldumontier::ACS:15-03-2016
6. Reusing raw and curated data in thousands of databases is
challenging: identifiers, formats, access methods, links
7. Various software tools are needed to analyze data
(problems: OS, versioning, input/output formats)
8. Ultimately, scientists develop fairly sophisticated
programs/workflows to test hypotheses
9. The absence of intelligent systems
requires vast amounts of
experience and technical expertise
10. How can we automatically find the evidence that supports
or disputes a scientific hypothesis using the latest data,
tools and scientific knowledge?
11. So what do we need to achieve this?
1. Data Science Tools and Methods
– To identify, represent, interlink, integrate, and query
data and services
– To identify and uncover support for known or novel
associations
2. Community Standards to share and interrogate a
massive, decentralized network of interconnected data
and software
12. First, we need FAIR data
Findable
– Globally unique identifiers for datasets and the data they contain
– Rich set of descriptors to search and filter with
– Indexed and searchable
Accessible
– Metadata is eternally available.
– Identifiers are used to retrieve representations using standard protocols (e.g.
HTTP)
Interoperable
– Data represented with formal knowledge representations
– Include links to other datasets/vocabularies
Reusable
– Licensing, Provenance, Community standards
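As a rough illustration of the checklist above, the sketch below models a dataset description in Python and reports which FAIR facets it leaves uncovered. The `DatasetRecord` class, its field names, the checks themselves, and the example identifier are all hypothetical; they are not taken from any FAIR specification and only approximate the bullet points on this slide.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class DatasetRecord:
    """Hypothetical, minimal dataset description (illustrative field names)."""
    identifier: str = ""        # globally unique identifier, e.g. an HTTP URI
    descriptors: Dict[str, str] = field(default_factory=dict)  # title, keywords, ...
    access_url: str = ""        # where a representation can be retrieved
    vocabularies: List[str] = field(default_factory=list)      # linked vocabularies
    license: str = ""
    provenance: str = ""


def is_http(value: str) -> bool:
    # Crude stand-in for "resolvable with a standard protocol".
    return value.startswith(("http://", "https://"))


def fair_gaps(record: DatasetRecord) -> Dict[str, List[str]]:
    """Return, per FAIR facet, the checks the record fails (empty dict = none)."""
    checks = {
        "Findable": [
            ("globally unique identifier", is_http(record.identifier)),
            ("descriptors to search and filter with", bool(record.descriptors)),
        ],
        "Accessible": [
            ("retrievable over a standard protocol", is_http(record.access_url)),
        ],
        "Interoperable": [
            ("links to other datasets/vocabularies", bool(record.vocabularies)),
        ],
        "Reusable": [
            ("license", bool(record.license)),
            ("provenance", bool(record.provenance)),
        ],
    }
    gaps: Dict[str, List[str]] = {}
    for facet, facet_checks in checks.items():
        failed = [name for name, ok in facet_checks if not ok]
        if failed:
            gaps[facet] = failed
    return gaps


# Made-up record: it has an identifier and a license, but no links or provenance.
example = DatasetRecord(
    identifier="http://example.org/dataset/42",
    descriptors={"title": "Example assay results"},
    access_url="http://example.org/dataset/42",
    license="CC-BY-4.0",
)
print(fair_gaps(example))
```

A real assessment would need far richer checks (for instance, whether the identifier actually resolves); the point here is only that each FAIR facet can be turned into something testable.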
“Numbers have no way of speaking for themselves. We need to
imbue them with meaning.” - Nate Silver, The Signal and the Noise
14. The Semantic Web
is the new global web of knowledge
standards for publishing, sharing and querying
facts, expert knowledge and services
scalable approach for the discovery
of independently formulated
and distributed knowledge
15. Linked Data is FAIR data
Linking Open Data cloud diagram 2014, by Max Schmachtenberg, Christian Bizer, Anja Jentzsch and Richard Cyganiak. http://lod-cloud.net/
16. Linked Data for the Life Sciences
Bio2RDF is an open source project to unify the
representation and interlinking of biological data using RDF.
chemicals/drugs/formulations,
genomes/genes/proteins, domains
Interactions, complexes & pathways
animal models and phenotypes
Disease, genetic markers, treatments
Terminologies & publications
• 11B+ interlinked statements from 35 biomedical
datasets
• dataset description, provenance & statistics
• A growing interoperable ecosystem with the EBI,
NCBI, DBCLS, NCBO, OpenPHACTS, and
commercial tool providers
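To make "unify the representation and interlinking" a little more concrete, here is a small sketch that mints URIs of the form `http://bio2rdf.org/<namespace>:<identifier>` (the pattern visible in the Bio2RDF URIs used elsewhere in this deck) and writes statements as N-Triples lines. The helper names, the predicate URI, and the record identifiers are placeholders made up for illustration, not actual Bio2RDF content.

```python
from typing import List, Tuple

BIO2RDF_BASE = "http://bio2rdf.org/"

Triple = Tuple[str, str, str]


def bio2rdf_uri(namespace: str, identifier: str) -> str:
    """Build a URI following the namespace:identifier pattern."""
    return f"{BIO2RDF_BASE}{namespace}:{identifier}"


def to_ntriples(triples: List[Triple]) -> str:
    """Serialize (subject, predicate, object) URI triples as N-Triples lines."""
    return "\n".join(f"<{s}> <{p}> <{o}> ." for s, p, o in triples)


# Hypothetical predicate; real datasets define their own vocabularies.
TARGETS = "http://example.org/vocabulary/targets"

# Made-up identifiers: a drug record linked to a protein record in another dataset.
triples = [
    (bio2rdf_uri("drugbank", "DB_EXAMPLE"), TARGETS, bio2rdf_uri("uniprot", "P_EXAMPLE")),
]
print(to_ntriples(triples))
```

Because both ends of the statement use the same URI pattern, a record in one dataset can point directly at a record in another, which is what allows the datasets to be queried together.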
18. Bio2RDF shows how datasets are
connected together
19. graph methods for data quality
to find mismatches and discover new links
W Hu, H Qiu, M Dumontier. Link Analysis of Life Science Linked Data.
International Semantic Web Conference (2) 2015: 446-462.
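The following toy sketch shows two simple graph checks in the spirit of this slide: flagging links whose two ends have different declared types, and proposing candidate links between entities that point at the same target. It is not the method of the cited paper; the function names, the type labels, and the identifiers are assumptions made up for illustration.

```python
from collections import defaultdict
from itertools import combinations
from typing import Dict, List, Set, Tuple

Link = Tuple[str, str]


def type_mismatches(links: List[Link], types: Dict[str, str]) -> List[Link]:
    """Equivalence-style links whose two ends have different declared types."""
    return [
        (a, b)
        for a, b in links
        if a in types and b in types and types[a] != types[b]
    ]


def candidate_links(links: List[Link]) -> Set[Link]:
    """Pairs of entities that link to the same target but not to each other."""
    sources_by_target: Dict[str, Set[str]] = defaultdict(set)
    for source, target in links:
        sources_by_target[target].add(source)
    existing = {frozenset(link) for link in links}
    found: Set[Link] = set()
    for sources in sources_by_target.values():
        for a, b in combinations(sorted(sources), 2):
            if frozenset((a, b)) not in existing:
                found.add((a, b))
    return found


# Toy data with made-up identifiers.
types = {"db1:A": "Gene", "db2:A": "Gene", "db3:A": "Protein", "db4:A": "Gene"}
links = [("db1:A", "db2:A"), ("db4:A", "db2:A"), ("db1:A", "db3:A")]

print(type_mismatches(links, types))  # the Gene-to-Protein link is flagged
print(candidate_links(links))         # db1:A and db4:A share a target
```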
20. Federated Queries
over public SPARQL endpoints
Get all protein catabolic processes (and more specific GO terms) in BioModels
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT ?go ?label (COUNT(DISTINCT ?x) AS ?count)
WHERE {
  SERVICE <http://bioportal.bio2rdf.org/sparql> {
    ?go rdfs:label ?label .
    ?go rdfs:subClassOf+ ?tgo .
    ?tgo rdfs:label ?tlabel .
    FILTER regex(?tlabel, "^protein catabolic process")
  }
  SERVICE <http://biomodels.bio2rdf.org/sparql> {
    ?x <http://bio2rdf.org/biopax_vocabulary:identical-to> ?go .
    ?x a <http://www.biopax.org/release/biopax-level3.owl#BiochemicalReaction> .
  }
}
GROUP BY ?go ?label
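A query like the one above can be sent to any SPARQL 1.1 endpoint that supports federation, using the standard SPARQL protocol over HTTP. The sketch below uses only the Python standard library; the helper names and the placeholder query are assumptions, and the endpoint shown on this slide may no longer be reachable, so `run_query` is defined but not called here.

```python
import json
import urllib.parse
import urllib.request
from typing import Any, Dict, List


def build_request(endpoint: str, query: str) -> urllib.request.Request:
    """Prepare a SPARQL-protocol GET request that asks for JSON results."""
    url = endpoint + "?" + urllib.parse.urlencode({"query": query})
    return urllib.request.Request(
        url, headers={"Accept": "application/sparql-results+json"}
    )


def run_query(endpoint: str, query: str, timeout: float = 60.0) -> List[Dict[str, Any]]:
    """Send the query and return the result bindings (needs network access)."""
    request = build_request(endpoint, query)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        payload = json.load(response)
    return payload["results"]["bindings"]


# Placeholder query; substitute the federated query text from the slide.
QUERY = "SELECT ?s WHERE { ?s ?p ?o } LIMIT 1"

request = build_request("http://bioportal.bio2rdf.org/sparql", QUERY)
print(request.full_url)
```

Each row returned by `run_query` is a dictionary keyed by variable name (for the slide's query: `go`, `label`, and the count), following the SPARQL JSON results format.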
21. EbolaKB
Using Linked Data and Software
Kamdar, Dumontier. An Ebola virus-centered knowledge base. Database. 2015 Jun 8;2015. doi: 10.1093/database/bav049.
28. smartAPI
The goal is to reduce the barrier for the discovery and
reuse of web APIs through richer semantic metadata.
i) a coordinated facility for the intelligent annotation of
smart APIs
ii) a web application to discover smart APIs and how
they connect to each other.
iii) the augmentation of existing APIs to provide FAIR
data
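One way to picture "how they connect to each other" is to annotate each API with the kind of entity it accepts and the kind it returns, then chain APIs whose types line up. The sketch below does exactly that; the `ApiDescription` fields, the type labels, and the API names are hypothetical and do not reflect the actual smartAPI metadata schema.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class ApiDescription:
    """Hypothetical annotation: what kind of entity goes in and what comes out."""
    name: str
    input_type: str
    output_type: str


def find_connections(apis: List[ApiDescription]) -> List[Tuple[str, str]]:
    """Pairs (producer, consumer) where one API's output type is another's input type."""
    return [
        (a.name, b.name)
        for a in apis
        for b in apis
        if a is not b and a.output_type == b.input_type
    ]


# Made-up API annotations.
apis = [
    ApiDescription("gene-to-protein", input_type="gene", output_type="protein"),
    ApiDescription("protein-to-pathway", input_type="protein", output_type="pathway"),
    ApiDescription("drug-to-target", input_type="drug", output_type="protein"),
]
print(find_connections(apis))
```

With richer semantic metadata, the plain string types here would be replaced by shared vocabulary terms, so that APIs described by different providers can still be matched.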
30. Evan’s Questions
• What should we be doing now?
– Encouraging researchers to publish FAIR data and
services
• How should we be doing it?
– As Linked Data
– In institutional repositories, and available in Wikidata and
other aggregators
• Where are things going in the future?
– Reproducible analyses over indexed, archived, and
massively connected knowledge graphs