Evaluating a hypothesis and its claims against experimental data is an essential scientific activity. However, this task is increasingly challenging given the ever-growing volume of publications and data sets. Towards addressing this challenge, we previously developed HyQue, a system for hypothesis formulation and evaluation. HyQue uses domain-specific rulesets to evaluate hypotheses based on well understood scientific principles. However, because scientists may apply differing scientific premises when exploring a hypothesis, flexibility is required in both crafting and executing rulesets to evaluate hypotheses. Here, we report on an extension of HyQue that incorporates rules specified using the SPARQL Inferencing Notation (SPIN). Hypotheses, background knowledge, queries, results and now rulesets are represented and executed using Semantic Web technologies, enabling users to explicitly trace a hypothesis to its evaluation as Linked Data, including the data and rules used by HyQue. We demonstrate the use of HyQue to evaluate hypotheses concerning the yeast galactose gene system.
Evaluating scientific hypotheses using the SPARQL Inferencing Notation
1. Evaluating scientific hypotheses using
the SPARQL Inferencing Notation
Alison Callahan and Michel Dumontier
Department of Biology, Carleton University
1 ESWC2012::HyQue-SPIN
4. Uncovering all the evidence to support/refute a
hypothesis is becoming increasingly difficult
and requires a lot of digging around
5. Continuous growth in research outputs
Source: http://www.nlm.nih.gov/bsd/stats/cit_added.html
http://homepages.cs.ncl.ac.uk/m.j.bell1/blog/?p=151
6. Semantic Web technologies for biological
knowledge management and discovery
• Capability to publish, link, retrieve and query
decentralized data
• A powerful integrative platform across data, ontology and
services
• Formal knowledge representation allows for automated
reasoning
• Massive growth in dataset availability, and soon, in
application development
7. A rapidly growing web of linked data
“Linking Open Data cloud diagram, by Richard Cyganiak and Anja Jentzsch. http://lod-cloud.net/”
10. SADI provides access to Semantic
Web Services
The Semantic Automated Discovery
and Integration (SADI) framework
makes it easy to create Semantic
Web Services using OWL classes as
service inputs and outputs
http://sadiframework.org
~700 bioinformatic services as of May 29, 2012
Mark Wilkinson, UBC
Michel Dumontier, Carleton University
Christopher Baker, UNB
11. HyQue
HyQue is the Hypothesis query and evaluation system
• A platform for knowledge discovery
• Facilitates hypothesis formulation and evaluation
• Leverages Semantic Web technologies to provide access to
facts, expert knowledge and web services
• Conforms to a simplified event-based model
• Supports evaluation against positive and negative findings
• Transparent and reproducible evidence prioritization
• Provenance across all elements of hypothesis testing
– trace a hypothesis to its evaluation, including the data and rules used
Callahan A, Dumontier M, Shah NH. HyQue: evaluating hypotheses using Semantic Web
technologies. J Biomed Semantics. 2011 May 17;2 Suppl 2:S3.
12. HyQue
• Background knowledge as OWL ontologies
hypotheses (HO), processes/events (GO), measurement
values (SIO), units (UO), evidence (ECO), molecules (ChEBI),
biopolymers (SO), etc
• Facts as RDF data
model organism data - genes and their chromosomal location,
proteins and their functions, localization and participation in
interactions, complexes, pathways, biological processes, etc
• Evaluation rules defined using SPIN
Domain-specific rules - scores based on external knowledge
System rules - scores based on hypothesis structure
Callahan A, Dumontier M. Evaluating scientific hypotheses using the SPARQL Inferencing Notation.
Extended Semantic Web Conference (ESWC 2012). Heraklion, Crete. May 27-31, 2012.
14. A HyQue hypothesis is a collection of propositions
• proposition: “a statement expressing something true or false”
• HyQue propositions specify events
• complex propositions can be formulated using logical operators
(AND, OR, XOR…) or decomposed using component relations
HyQue hypothesis ≡ ‘proposition’
that ‘specifies’ only ‘event’
HyQue hypothesis ≡ ‘proposition’
that ‘has component part’ only
(‘proposition’ that ‘specifies’ only ‘event’)
15. Event-based data model
HyQue events denote a phenomenon involving two
objects: ‘agent’ and ‘target’. In addition, we can specify the
context of this event (e.g. located in nucleus, or under
some genetic background)
Event
‘has agent’ agent
‘has target’ target
‘is located in’ location
‘is negated’ boolean
Currently supported events
1. protein-protein binding
2. protein-nucleic acid binding
3. molecular activation
4. molecular inhibition
5. gene induction
6. gene repression
7. transport
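The event model above can be pictured as a small record type. The following is a minimal Python sketch, not HyQue's implementation (HyQue represents events in RDF); the class names `Event` and `EventType` and the `describe` helper are assumptions introduced only for illustration.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class EventType(Enum):
    """The seven event types listed on the slide."""
    PROTEIN_PROTEIN_BINDING = "protein-protein binding"
    PROTEIN_NUCLEIC_ACID_BINDING = "protein-nucleic acid binding"
    MOLECULAR_ACTIVATION = "molecular activation"
    MOLECULAR_INHIBITION = "molecular inhibition"
    GENE_INDUCTION = "gene induction"
    GENE_REPRESSION = "gene repression"
    TRANSPORT = "transport"


@dataclass(frozen=True)
class Event:
    """Hypothetical record mirroring 'has agent', 'has target',
    'is located in' and 'is negated'."""
    event_type: EventType
    agent: str
    target: str
    location: Optional[str] = None  # context is optional
    is_negated: bool = False

    def describe(self) -> str:
        # Render the event as a short human-readable string.
        negation = "NOT " if self.is_negated else ""
        text = f"{self.agent} --[{negation}{self.event_type.value}]--> {self.target}"
        if self.location is not None:
            text += f" in {self.location}"
        return text


e1 = Event(EventType.GENE_INDUCTION, agent="Gal4p", target="GAL1")
print(e1.describe())
```

Keeping location optional reflects the slide's wording that context can be specified "in addition" to the agent and target.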
16. Example Hypothesis
• HyQue’s demonstrative knowledge
base is focused on galactose
metabolism and regulation.
The paper describes a union of
hypotheses:
(Gal4p induces the expression of GAL1 AND
Gal4p induces the expression of GAL7 AND
Gal3p induces the expression of GAL2)
OR
(Gal4p induces the expression of GAL7 AND
Gal80p induces the expression of GAL7 AND
Gal80p does not inhibit the activity of Gal4p
WHEN GAL3 is over-expressed)
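One way to picture this hypothesis is as a tree of logical operators over event statements. The sketch below is only an illustration in Python; the nested dict/tuple layout and the `collect_statements` helper are assumptions, not the N3 representation HyQue actually takes as input.

```python
# Each statement is (agent, relation, target, context); context may be None.
# The layout is a hypothetical stand-in for HyQue's RDF propositions.
hypothesis = {
    "OR": [
        {"AND": [
            ("Gal4p", "induces", "GAL1", None),
            ("Gal4p", "induces", "GAL7", None),
            ("Gal3p", "induces", "GAL2", None),
        ]},
        {"AND": [
            ("Gal4p", "induces", "GAL7", None),
            ("Gal80p", "induces", "GAL7", None),
            ("Gal80p", "does not inhibit", "Gal4p", "GAL3 is over-expressed"),
        ]},
    ]
}


def collect_statements(node):
    """Walk the operator tree and return every event statement, in order."""
    if isinstance(node, tuple):
        return [node]
    statements = []
    for children in node.values():
        for child in children:
            statements.extend(collect_statements(child))
    return statements


all_statements = collect_statements(hypothesis)
print(len(all_statements), "statements,", len(set(all_statements)), "distinct")
```

Note that "Gal4p induces the expression of GAL7" appears in both branches, so the number of distinct events is smaller than the number of statements.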
17. Users don’t need to know RDF to formulate hypotheses
User Interface with auto-completion
http://hyque.semanticscience.org
20. event:
Gal4p positively regulates the
expression of GAL1
HyQue’s SPIN rules retrieve event data,
and then score it and the overall hypothesis
HyQue currently contains 63 SPIN rules to evaluate hypotheses:
18 system rules, 45 domain specific rules
21. Combination of system and domain rules to
retrieve and score data, and add new triples
Event - induction SPIN induction rule
:e1 a go:0010628 ;
hyque:agent sgd:Gal4p ;
hyque:target sgd:GAL1 ;
hyque:is_negated "0" .
22. SPIN System Rule :
Link Hypothesis to Evaluation
CONSTRUCT {
?this ‘has attribute’ ?hypothesisEval .
?hypothesisEval a ‘evaluation’.
?hypothesisEval ‘obtained from’ ?propositionEval .
?hypothesisEval ‘has value’ ?hypothesisEvalScore .
} WHERE {
?this ‘has component part’ ?proposition .
?proposition ‘has attribute’ ?propositionEval .
BIND(:calculateHypothesisScore(?this) AS ?hypothesisEvalScore) .
BIND(IRI(fn:concat(afn:namespace(?this),
afn:localname(?this),"_", "evaluation")) AS ?hypothesisEval) .
}
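The last BIND in the rule above mints the evaluation's IRI from the hypothesis IRI by concatenating its namespace, its local name and the suffix "_evaluation". A rough Python equivalent is sketched below. The splitting rule (cut after the last '#' or '/') is a simplifying assumption and may differ from how `afn:namespace` and `afn:localname` split an IRI; the example IRI is hypothetical.

```python
def split_iri(iri: str) -> tuple:
    """Split an IRI into (namespace, local name) after the last '#' or '/'.

    Assumption: this is a simplification of the afn: functions' behaviour.
    """
    cut = max(iri.rfind("#"), iri.rfind("/")) + 1
    return iri[:cut], iri[cut:]


def evaluation_iri(hypothesis_iri: str) -> str:
    """Mirror fn:concat(namespace, localname, "_", "evaluation")."""
    namespace, local_name = split_iri(hypothesis_iri)
    return namespace + local_name + "_" + "evaluation"


# Hypothetical hypothesis IRI, used only for illustration.
print(evaluation_iri("http://example.org/hyque/hypothesis1"))
```

Deriving the evaluation IRI from the hypothesis IRI means the same hypothesis always maps to the same evaluation resource, which supports tracing a hypothesis to its evaluation.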
24. SPIN Domain Rule: Score experimental evidence
of Gene Expression Induction Event
SELECT ?induceEventScore
WHERE {
BIND (:calculateInduceAgentTypeScore(?arg1) AS ?agentTypeScore) .
BIND (:calculateInduceAgentFunctionTypeScore(?arg1) AS
?agentFunctionTypeScore) .
BIND (:calculateInduceTargetTypeScore(?arg1) AS ?targetTypeScore) .
BIND (:calculateInduceLogicalOperatorScore(?arg1) AS
?logicalOperatorScore) .
BIND (:calculateInduceEventLocationScore(?arg1) AS
?eventLocationScore) .
BIND (:penalizeNegation(?arg1) AS ?negationScore) .
BIND (5 AS ?maxScore) .
BIND (((((((?agentTypeScore + ?agentFunctionTypeScore) +
?targetTypeScore) + ?logicalOperatorScore) +
?eventLocationScore) + ?negationScore) / ?maxScore) AS
?induceEventScore) .
}
25. HyQue domain rules CALCULATE a quantitative
measure of evidence for an event
‘induce’ rule (maximum score: 5):
– Is event negated?
• If yes, subtract 2
– Is event of type ‘induce’ (GO:0010628)?
• If yes, add 1; if no, subtract 1
– Is agent of type ‘protein’ (CHEBI:36080) or ‘RNA’?
• If yes, add 1; if of type ‘gene’, subtract 1
– Is target of type ‘gene’ (SO:0000236)?
• If yes, add 1; if no, subtract 1
– Does agent have known ‘transcription factor activity’ (GO:0003700)?
• If yes, add 1
– Is event located in the ‘nucleus’ (GO:0005634)?
• If yes, add 1; if no, subtract 1
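The checklist above can be read as a simple additive scoring function. The following Python sketch restates it under stated assumptions: the function name and its boolean/string parameters are hypothetical, the real rule is a set of SPIN functions evaluated over RDF data, and agent types other than protein, RNA or gene are assumed to contribute nothing.

```python
MAX_SCORE = 5  # maximum score of the 'induce' rule, as stated on the slide


def score_induce_event(*, is_negated: bool, event_is_induce: bool,
                       agent_type: str, target_is_gene: bool,
                       agent_has_tf_activity: bool, in_nucleus: bool) -> float:
    """Hypothetical restatement of the 'induce' rule; returns points / MAX_SCORE."""
    points = 0
    if is_negated:
        points -= 2
    points += 1 if event_is_induce else -1
    if agent_type in ("protein", "RNA"):
        points += 1
    elif agent_type == "gene":
        points -= 1
    # Assumption: any other agent type neither adds nor subtracts.
    points += 1 if target_is_gene else -1
    if agent_has_tf_activity:
        points += 1
    points += 1 if in_nucleus else -1
    return points / MAX_SCORE


# Illustrative inputs only; not the data behind any score shown in the slides.
full = score_induce_event(is_negated=False, event_is_induce=True,
                          agent_type="protein", target_is_gene=True,
                          agent_has_tf_activity=True, in_nucleus=True)
no_tf = score_induce_event(is_negated=False, event_is_induce=True,
                           agent_type="protein", target_is_gene=True,
                           agent_has_tf_activity=False, in_nucleus=True)
print(full, no_tf)
```

With every criterion satisfied the event earns 5/5; dropping a single add-only criterion, such as known transcription factor activity, gives 4/5.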
26. SPIN rule, outcome and score
for a GAL gene induction event
4/5 = 0.80
27. Can customize rules to get more
evidence, but at a cost if not found
• calculateInhibitEventScore does not take into account the
physical location of the event participants
• Experimental evidence suggests that physical location
in the context of an inhibition event is important
• Inhibition of Gal4p activity by Gal80p is known to take place
in the nucleus, yet this inhibition is interrupted when
Gal80p is bound by Gal3p, which is typically found in the
cytoplasm
(Gal4p induces the expression of GAL1 (e1) AND
Gal3p induces the expression of GAL2 (e2) AND
Gal4p induces the expression of GAL7)
OR
(Gal4p induces the expression of GAL7 AND
Gal80p induces the expression of GAL7 AND
Gal80p does not inhibit the activity of Gal4p
WHEN GAL3 is over-expressed)
Adding a new rule to consider location weakens the event
due to lack of data (score 0.87 -> 0.78)
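Why an extra criterion can lower a score is easy to see with a normalized score: each new criterion raises the maximum, so a criterion with no supporting data dilutes the result. The Python sketch below uses made-up criteria and points, and assumes a missing datum earns 0; it does not reproduce the 0.87 and 0.78 values above, whose underlying computation is not shown in the slides.

```python
def normalized_score(points: dict) -> float:
    """Sum of points earned divided by the number of criteria.

    Assumption: every criterion is worth at most 1 point.
    """
    return sum(points.values()) / len(points)


# Hypothetical inhibition-event criteria, all supported by data.
base_rule = {"agent type": 1, "target type": 1, "agent function": 1}

# Same rule plus a location criterion for which no data is found
# (assumed to earn 0 points).
with_location = dict(base_rule)
with_location["event location"] = 0

print(normalized_score(base_rule), normalized_score(with_location))
```

The trade-off is the one named in the slide title: a customized rule can capture more evidence when the data exists, but costs score when it does not.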
30. Summary
• HyQue is a system that facilitates the
formulation and evaluation of scientific
hypotheses against formalized
knowledge on the Semantic Web.
• This work focused on the development
and incorporation of recursive SPIN rules
to obtain and score events and multi-event
hypotheses using OWL ontologies and
RDF-based LOD.
31. Future Directions
• Collaborative, end user-centered
environment to engineer, share, compare
and evaluate hypotheses
• Investigate alternative scoring systems
• Structure knowledge beyond the GAL
network
– EU/US Collaborations on disease-centered
research hypotheses
– Applications for clinical decision support
Can’t answer questions that require background knowledge
We represent a hypothesis as a collection of propositions
This is part of a hypothesis represented in N3 and used as input to HyQue. Note: Binding between galactose and Gal3p does not return any results; there IS binding between Gal3p and Gal80p
The RDF representing the evaluation of the input hypothesis is linked to both the hypothesis AND the data used to evaluate the hypothesis
This is a screenshot of some HyQue data in Virtuoso, a triple store system that we use to store and access RDF