Evaluating scientific hypotheses using
      the SPARQL Inferencing Notation




             Alison Callahan and Michel Dumontier

             Department of Biology, Carleton University




1                                                         ESWC2012::HyQue-SPIN
Uncovering all the evidence to support or refute a hypothesis is becoming increasingly difficult and requires a lot of digging around
Continuous growth in research outputs




    Source: http://www.nlm.nih.gov/bsd/stats/cit_added.html




                                                             http://homepages.cs.ncl.ac.uk/m.j.bell1/blog/?p=151

Semantic Web technologies for biological knowledge management and discovery

• Capability to publish, link, retrieve and query decentralized data

• A powerful integrative platform across data, ontology and
  services

• Formal knowledge representation allows for automated
  reasoning

• Massive growth in dataset availability, and soon, in
  application development
A rapidly growing web of linked data




“Linking Open Data cloud diagram”, by Richard Cyganiak and Anja Jentzsch. http://lod-cloud.net/
Bio2RDF covers the major biological databases
BioPortal gives up-to-date access to bio-ontologies
SADI provides access to Semantic Web Services


The Semantic Automated Discovery and Integration (SADI) framework makes it easy to create Semantic Web Services using OWL classes as service inputs and outputs

http://sadiframework.org

~700 bioinformatic services as of May 29, 2012
Mark Wilkinson, UBC
Michel Dumontier, Carleton University
Christopher Baker, UNB
HyQue

     HyQue is the Hypothesis query and evaluation system
     • A platform for knowledge discovery
     • Facilitates hypothesis formulation and evaluation
     • Leverages Semantic Web technologies to provide access to
       facts, expert knowledge and web services
     • Conforms to a simplified event-based model
     • Supports evaluation against positive and negative findings
     • Transparent and reproducible evidence prioritization
     • Provenance across all elements of hypothesis testing
        – trace a hypothesis to its evaluation, including the data and rules used



       Callahan A, Dumontier M, Shah NH. HyQue: evaluating hypotheses using Semantic Web
       technologies. J Biomed Semantics. 2011 May 17;2 Suppl 2:S3.
HyQue
  • Background knowledge as OWL ontologies
      hypotheses (HO), processes/events (GO), measurement
      values (SIO), units (UO), evidence (ECO), molecules (ChEBI),
      biopolymers (SO), etc.

  • Facts as RDF data
      model organism data - genes and their chromosomal location,
      proteins and their functions, localization and participation in
      interactions, complexes, pathways, biological processes, etc.

  • Evaluation rules defined using SPIN
           Domain-specific rules - scores based on external knowledge
           System rules - scores based on hypothesis structure


Callahan A, Dumontier M. Evaluating scientific hypotheses using the SPARQL Inferencing Notation.
Extended Semantic Web Conference (ESWC 2012). Heraklion, Crete. May 27-31, 2012.
HyQue Architecture
A HyQue hypothesis is a collection of propositions

• proposition: “a statement expressing something true or false”
• HyQue propositions specify events
• complex propositions can be formulated using logical operators
  (AND, OR, XOR…) or decomposed using component relations

HyQue hypothesis ≡ ‘proposition’
    that ‘specifies’ only ‘event’

HyQue hypothesis ≡ ‘proposition’
    that ‘has component part’ only
    (‘proposition’ that ‘specifies’ only ‘event’)
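To make the logical structure concrete, the following Python sketch models a hypothesis as a tree of propositions: a leaf specifies a single event, and a composite proposition combines its component parts with AND, OR or XOR. It is a hypothetical illustration that evaluates truth values only; HyQue itself scores propositions with SPIN rules, and the class and function names are not part of HyQue. Reading an n-ary XOR as "exactly one part is true" is also an assumption.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Dict, List, Union


@dataclass
class Leaf:
    """Hypothetical leaf proposition: specifies exactly one event, by ID."""
    event_id: str


@dataclass
class Composite:
    """Hypothetical composite proposition built from component parts."""
    operator: str  # "AND", "OR" or "XOR"
    parts: List[Proposition] = field(default_factory=list)


Proposition = Union[Leaf, Composite]


def truth_value(p: Proposition, event_truth: Dict[str, bool]) -> bool:
    """Evaluate a proposition tree given a truth value for each event."""
    if isinstance(p, Leaf):
        return event_truth[p.event_id]
    values = [truth_value(part, event_truth) for part in p.parts]
    if p.operator == "AND":
        return all(values)
    if p.operator == "OR":
        return any(values)
    if p.operator == "XOR":
        # Assumption: n-ary XOR means "exactly one component part is true".
        return values.count(True) == 1
    raise ValueError(f"unsupported operator: {p.operator}")
```

For example, `(e1 AND e2) OR e3` is `Composite("OR", [Composite("AND", [Leaf("e1"), Leaf("e2")]), Leaf("e3")])`, which is true when only `e1` and `e3` hold.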
Event-based data model

     HyQue events denote a phenomenon involving two
     objects: ‘agent’ and ‘target’. In addition, we can specify the
     context of this event (e.g. located in the nucleus, or under
     some genetic background).

     Event
      ‘has agent’ agent
      ‘has target’ target
      ‘is located in’ location
      ‘is negated’ boolean

     Currently supported events
      1. protein-protein binding
      2. protein-nucleic acid binding
      3. molecular activation
      4. molecular inhibition
      5. gene induction
      6. gene repression
      7. transport
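A minimal Python sketch of this event model, assuming a plain dataclass with one field per slide property and a check against the seven supported event types. The names `Event` and `SUPPORTED_EVENT_TYPES` are illustrative only and do not come from HyQue.

```python
from dataclasses import dataclass
from typing import Optional

# The seven event types listed on the slide.
SUPPORTED_EVENT_TYPES = {
    "protein-protein binding",
    "protein-nucleic acid binding",
    "molecular activation",
    "molecular inhibition",
    "gene induction",
    "gene repression",
    "transport",
}


@dataclass(frozen=True)
class Event:
    """Hypothetical in-memory version of a HyQue event."""
    event_type: str
    agent: str                      # 'has agent'
    target: str                     # 'has target'
    location: Optional[str] = None  # 'is located in' (optional context)
    negated: bool = False           # 'is negated'

    def __post_init__(self) -> None:
        # Reject event types outside the currently supported set.
        if self.event_type not in SUPPORTED_EVENT_TYPES:
            raise ValueError(f"unsupported event type: {self.event_type!r}")
```

Keeping location and negation optional mirrors the slide: agent and target are always present, while context is added only when known.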




Example Hypothesis
     • HyQue’s demonstrative knowledge
       base is focused on galactose
       metabolism and regulation.

     The paper describes a union of
     hypotheses:
     (Gal4p induces the expression of GAL1 AND
      Gal4p induces the expression of GAL7 AND
      Gal3p induces the expression of GAL2)
     OR
     (Gal4p induces the expression of GAL7 AND
      Gal80p induces the expression of GAL7 AND
      Gal80p does not inhibit the activity of Gal4p
       WHEN GAL3 is over-expressed)
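One way to encode this example in the event model is shown below: each leaf records agent, event type, target, negation and context, and the WHEN clause is treated as the event's context (a genetic background). The tuple layout, the mapping of "induces the expression of" to gene induction, and of "does not inhibit the activity of" to a negated molecular inhibition, are assumptions for illustration, not HyQue's internal format.

```python
from typing import Iterator, Optional, Tuple, Union

# Leaf event: (agent, event type, target, negated, context or None).
Event = Tuple[str, str, str, bool, Optional[str]]
# Composite: (operator, [parts]).
Node = Union[Event, Tuple[str, list]]

EXAMPLE: Node = ("OR", [
    ("AND", [
        ("Gal4p", "gene induction", "GAL1", False, None),
        ("Gal4p", "gene induction", "GAL7", False, None),
        ("Gal3p", "gene induction", "GAL2", False, None),
    ]),
    ("AND", [
        ("Gal4p", "gene induction", "GAL7", False, None),
        ("Gal80p", "gene induction", "GAL7", False, None),
        ("Gal80p", "molecular inhibition", "Gal4p", True, "GAL3 over-expressed"),
    ]),
])


def leaf_events(node: Node) -> Iterator[Event]:
    """Yield the leaf events of a nested (operator, [parts]) structure, left to right."""
    if len(node) == 2 and isinstance(node[1], list):
        for part in node[1]:
            yield from leaf_events(part)
    else:
        yield node
```

The two conjunctions share the event "Gal4p induces the expression of GAL7", so six leaf events appear, one of them negated and carrying a context.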


Users don’t need to know RDF to formulate hypotheses




     User Interface with auto-completion
     http://hyque.semanticscience.org
Hypothesis RDF Representation

  hypothesis --has component part--> proposition --specifies--> event

  :h rdf:type hyque:Hypothesis ;
     hyque:has-component-part :p1 .

  :p1 rdf:type hyque:Proposition ;
      hyque:specifies :e1 .

  :e1 rdf:type hyque:Event .
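The hypothesis, proposition and event chain can be followed with simple triple lookups. This Python sketch stores the slide's triples as string tuples with prefixed names (no RDF library, prefixes not expanded) and walks the `hyque:has-component-part` and `hyque:specifies` links; `objects` and `events_of` are hypothetical helpers.

```python
from typing import List, Set, Tuple

Triple = Tuple[str, str, str]

# The slide's triples as (subject, predicate, object), using prefixed names.
TRIPLES: Set[Triple] = {
    (":h", "rdf:type", "hyque:Hypothesis"),
    (":h", "hyque:has-component-part", ":p1"),
    (":p1", "rdf:type", "hyque:Proposition"),
    (":p1", "hyque:specifies", ":e1"),
    (":e1", "rdf:type", "hyque:Event"),
}


def objects(triples: Set[Triple], subject: str, predicate: str) -> List[str]:
    """All objects o such that (subject, predicate, o) is in the set, sorted."""
    return sorted(o for s, p, o in triples if s == subject and p == predicate)


def events_of(triples: Set[Triple], hypothesis: str) -> List[str]:
    """Follow hypothesis -has component part-> proposition -specifies-> event."""
    return [
        event
        for prop in objects(triples, hypothesis, "hyque:has-component-part")
        for event in objects(triples, prop, "hyque:specifies")
    ]
```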
Event RDF representation

  event: Gal4p positively regulates the expression of GAL1

  :e1 rdf:type hyque:Event ;
      # positive regulation of gene expression
      rdf:type <http://bio2rdf.org/go:0010628> ;
      hyque:agent <http://bio2rdf.org/sgd:Gal4p> ;
      hyque:target <http://bio2rdf.org/sgd:GAL1> ;
      hyque:is_negated "0" ;
      …
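The event description above can be generated from a few fields. The function below is a hypothetical helper that renders the same predicates as Turtle text. It omits `@prefix` declarations (as the slide does), and it writes `hyque:is_negated` as the string "0" or "1" to match the slide, which is an assumption about how HyQue encodes the flag.

```python
def event_to_turtle(subject: str, event_class: str, agent: str,
                    target: str, negated: bool) -> str:
    """Render one HyQue-style event as a Turtle snippet (prefix declarations omitted)."""
    lines = [
        f"{subject} rdf:type hyque:Event ;",
        f"    rdf:type <{event_class}> ;",   # e.g. a GO process class IRI
        f"    hyque:agent <{agent}> ;",
        f"    hyque:target <{target}> ;",
        f'    hyque:is_negated "{1 if negated else 0}" .',
    ]
    return "\n".join(lines)
```

Calling it with `":e1"`, `http://bio2rdf.org/go:0010628`, the Gal4p and GAL1 IRIs and `negated=False` reproduces the slide's statements, closed with a final period instead of the elided remainder.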




event: Gal4p positively regulates the expression of GAL1




                           HyQue’s SPIN rules retrieve event data,
                          and then score it and the overall hypothesis

              HyQue currently contains 63 SPIN rules to evaluate hypotheses:
                       18 system rules and 45 domain-specific rules



Combination of system and domain rules to retrieve and score data, and add new triples
 Event - induction         SPIN induction rule

 :e1 a go:0010628 ;
     hyque:agent sgd:Gal4p ;
     hyque:target sgd:GAL1 ;
     hyque:is_negated "0" .
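A rule's WHERE clause selects events like `:e1` by matching triple patterns. The sketch below is a naive, hypothetical basic-graph-pattern matcher in Python, written only to show how variables such as `?e`, `?agent` and `?target` get bound against the triples above. HyQue delegates this to a SPARQL engine running SPIN rules, not to code like this.

```python
from typing import Dict, Iterable, List, Optional, Tuple

Triple = Tuple[str, str, str]
Binding = Dict[str, str]

DATA: List[Triple] = [
    (":e1", "rdf:type", "go:0010628"),
    (":e1", "hyque:agent", "sgd:Gal4p"),
    (":e1", "hyque:target", "sgd:GAL1"),
    (":e1", "hyque:is_negated", "0"),
]


def match(pattern: Triple, triple: Triple, binding: Binding) -> Optional[Binding]:
    """Extend binding so that pattern matches triple, or return None."""
    new = dict(binding)
    for p, t in zip(pattern, triple):
        if p.startswith("?"):
            # Bind a fresh variable, or check consistency with an earlier binding.
            if new.setdefault(p, t) != t:
                return None
        elif p != t:
            return None
    return new


def query(patterns: List[Triple], data: Iterable[Triple]) -> List[Binding]:
    """Naively evaluate a conjunction of triple patterns (a basic graph pattern)."""
    data = list(data)
    results: List[Binding] = [{}]
    for pattern in patterns:
        extended: List[Binding] = []
        for b in results:
            for t in data:
                b2 = match(pattern, t, b)
                if b2 is not None:
                    extended.append(b2)
        results = extended
    return results
```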




SPIN System Rule: Link Hypothesis to Evaluation
     CONSTRUCT {
       ?this ‘has attribute’ ?hypothesisEval .
       ?hypothesisEval a ‘evaluation’ .
       ?hypothesisEval ‘obtained from’ ?propositionEval .
       ?hypothesisEval ‘has value’ ?hypothesisEvalScore .
     } WHERE {
       ?this ‘has component part’ ?proposition .
       ?proposition ‘has attribute’ ?propositionEval .
       BIND(:calculateHypothesisScore(?this) AS ?hypothesisEvalScore) .
       BIND(IRI(fn:concat(afn:namespace(?this),
             afn:localname(?this), "_", "evaluation")) AS ?hypothesisEval) .
     }
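The CONSTRUCT rule mints one evaluation IRI per hypothesis and links it to the proposition evaluations. This Python sketch mirrors that output under stated assumptions: `split_iri` only approximates `afn:namespace`/`afn:localname` by cutting at the last `#` or `/`, the score is passed in because `:calculateHypothesisScore` is not shown, and the property labels stand in for the real property IRIs.

```python
from typing import List, Tuple

Triple = Tuple[str, str, object]


def split_iri(iri: str) -> Tuple[str, str]:
    """Split an IRI into (namespace, local name) at the last '#' or '/'.
    A rough approximation of afn:namespace / afn:localname for simple IRIs."""
    cut = max(iri.rfind("#"), iri.rfind("/")) + 1
    return iri[:cut], iri[cut:]


def link_hypothesis_to_evaluation(hypothesis: str, proposition_evals: List[str],
                                  score: float) -> List[Triple]:
    """Return the triples the system rule would construct for one hypothesis."""
    ns, local = split_iri(hypothesis)
    evaluation = ns + local + "_" + "evaluation"   # mirrors fn:concat(...)
    triples: List[Triple] = [
        (hypothesis, "has attribute", evaluation),
        (evaluation, "rdf:type", "evaluation"),
        (evaluation, "has value", score),
    ]
    # One 'obtained from' link per proposition evaluation found in the WHERE clause.
    triples += [(evaluation, "obtained from", pe) for pe in proposition_evals]
    return triples
```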




SPIN Domain Rule: Score experimental evidence of Gene Expression Induction Event

 SELECT ?induceEventScore
 WHERE {
     BIND (:calculateInduceAgentTypeScore(?arg1) AS ?agentTypeScore) .
     BIND (:calculateInduceAgentFunctionTypeScore(?arg1) AS ?agentFunctionTypeScore) .
     BIND (:calculateInduceTargetTypeScore(?arg1) AS ?targetTypeScore) .
     BIND (:calculateInduceLogicalOperatorScore(?arg1) AS ?logicalOperatorScore) .
     BIND (:calculateInduceEventLocationScore(?arg1) AS ?eventLocationScore) .
     BIND (:penalizeNegation(?arg1) AS ?negationScore) .
     BIND (5 AS ?maxScore) .
     # sum of the six component scores, normalized by the maximum score
     BIND ((?agentTypeScore + ?agentFunctionTypeScore + ?targetTypeScore
            + ?logicalOperatorScore + ?eventLocationScore + ?negationScore)
           / ?maxScore AS ?induceEventScore) .
 }
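Written as a formula, the final BIND computes the event score as the sum of the six component scores divided by the maximum score. The symbols below are shorthand for the rule's SPARQL variables:

```latex
S_{\text{induce}} =
  \frac{s_{\text{agentType}} + s_{\text{agentFunctionType}} + s_{\text{targetType}}
        + s_{\text{logicalOperator}} + s_{\text{eventLocation}} + s_{\text{negation}}}
       {s_{\max}},
  \qquad s_{\max} = 5
```

Components summing to 4, for instance, give S_induce = 4/5 = 0.80.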


HyQue domain rules calculate a quantitative measure of evidence for an event
‘induce’ rule (maximum score: 5):
   – Is event negated?
       • If yes, subtract 2
   – Is event of type ‘induce’ (GO:0010628)?
       • If yes, add 1; if no, subtract 1
   – Is agent of type ‘protein’ (CHEBI:36080) or ‘RNA’?
       • If yes, add 1; if of type ‘gene’, subtract 1
   – Is target of type ‘gene’ (SO:0000236)?
       • If yes, add 1; if no, subtract 1
   – Does agent have known ‘transcription factor activity’ (GO:0003700)?
       • If yes, add 1
   – Is event located in the ‘nucleus’ (GO:0005634)?
       • If yes, add 1; if no, subtract 1
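The criteria above translate directly into code. This Python sketch assumes each criterion has already been resolved to a boolean (or an agent type string) from the ontology lookups. It also treats agent types other than protein, RNA or gene as scoring 0, which the slide does not specify. `InduceEvidence` and `induce_score` are hypothetical names; HyQue implements this logic as SPIN functions.

```python
from dataclasses import dataclass
from typing import Optional

MAX_SCORE = 5  # maximum score of the 'induce' rule


@dataclass
class InduceEvidence:
    """Pre-resolved answers to the 'induce' rule's questions (hypothetical)."""
    negated: bool
    is_induce_type: bool          # event typed as GO:0010628
    agent_type: Optional[str]     # "protein", "RNA", "gene", or None if unknown
    target_is_gene: bool          # target typed as SO:0000236
    agent_has_tf_activity: bool   # agent annotated with GO:0003700
    in_nucleus: bool              # event located in GO:0005634


def induce_score(ev: InduceEvidence) -> float:
    """Apply the add/subtract criteria and normalize by MAX_SCORE."""
    score = 0
    if ev.negated:
        score -= 2
    score += 1 if ev.is_induce_type else -1
    if ev.agent_type in ("protein", "RNA"):
        score += 1
    elif ev.agent_type == "gene":
        score -= 1
    # Other or unknown agent types add nothing (an assumption).
    score += 1 if ev.target_is_gene else -1
    if ev.agent_has_tf_activity:
        score += 1
    score += 1 if ev.in_nucleus else -1
    return score / MAX_SCORE
```

Hypothetical inputs meeting every criterion except known transcription factor activity give 4/5 = 0.8; this is one way to reach that value, not necessarily the combination behind the GAL gene induction result that follows.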
SPIN rule, outcome and score for a GAL gene induction event

                                   4/5 = 0.80




Can customize rules to get more evidence, but at a cost if not found
     • calculateInhibitEventScore does not take into account the
       physical location of the event participants
     • Experimental evidence suggests that physical location in the
       context of an inhibition event is important
     • Inhibition of Gal4p activity by Gal80p is known to take place
       in the nucleus, yet this inhibition is interrupted when Gal80p
       is bound by Gal3p, which is typically found in the cytoplasm

     (Gal4p induces the expression of GAL1 (e1) AND
      Gal3p induces the expression of GAL2 (e2) AND
      Gal4p induces the expression of GAL7)
     OR
     (Gal4p induces the expression of GAL7 AND
      Gal80p induces the expression of GAL7 AND
      Gal80p does not inhibit the activity of Gal4p
       WHEN GAL3 is over-expressed)

     Adding a new rule to consider location weakens the event due to
     lack of data (score drops from 0.87 to 0.78)
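The trade-off can be illustrated with a toy ruleset in which each rule returns points and a maximum, and the event score is the total normalized by the total maximum, like the domain rule's division by `?maxScore`. The rules and facts below are hypothetical and do not reproduce HyQue's inhibition rule or the 0.87 and 0.78 values; they only show why adding a location rule that penalizes missing data lowers the score.

```python
from typing import Callable, Dict, List, Tuple

Facts = Dict[str, str]
# A rule inspects an event's facts and returns (points, max_points).
Rule = Callable[[Facts], Tuple[int, int]]


def agent_is_protein(facts: Facts) -> Tuple[int, int]:
    return (1 if facts.get("agent_type") == "protein" else -1), 1


def target_is_gene(facts: Facts) -> Tuple[int, int]:
    return (1 if facts.get("target_type") == "gene" else -1), 1


def location_is_nucleus(facts: Facts) -> Tuple[int, int]:
    # Penalizes both a non-nuclear location and missing location data.
    return (1 if facts.get("location") == "nucleus" else -1), 1


def normalized_score(rules: List[Rule], facts: Facts) -> float:
    """Sum points over all rules and divide by the summed maxima."""
    points, maximum = 0, 0
    for rule in rules:
        p, m = rule(facts)
        points += p
        maximum += m
    return points / maximum
```

With the agent and target rules satisfied the score is 2/2 = 1.0; adding the location rule when no location data exists gives (2 - 1)/3, roughly 0.33.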

Customization of rules and rulesets can generate different evidence-based evaluations
Reproducible eScience
LOD for Hypothesis, Rules, Data and Evaluation
Summary

     • HyQue is a system that facilitates the
       formulation and evaluation of scientific
       hypotheses against formalized
       knowledge on the Semantic Web.
     • This work focused on the development
       and incorporation of recursive SPIN rules
       to obtain and score events and multi-event
       hypotheses using OWL ontologies and
       RDF-based LOD.

Future Directions

     • Collaborative, end user-centered
       environment to engineer, share, compare
       and evaluate hypotheses
     • Investigate alternative scoring systems
     • Structure knowledge beyond the GAL
       network
       – EU/US Collaborations on disease-centered
         research hypotheses
       – Applications for clinical decision support

dumontierlab.com
michel_dumontier@carleton.ca
                         Website: http://dumontierlab.com
    Presentations: http://slideshare.com/micheldumontier




                                       ESWC2012::HyQue-SPIN

Mais conteúdo relacionado

Destaque

Design Thinking in EFL Context
Design Thinking in EFL ContextDesign Thinking in EFL Context
Design Thinking in EFL Context
Debopriyo Roy
 
iPhone and Appstore
iPhone and AppstoreiPhone and Appstore
iPhone and Appstore
Home
 
Bangkok presentation3.17.13reduced
Bangkok presentation3.17.13reducedBangkok presentation3.17.13reduced
Bangkok presentation3.17.13reduced
Mark Bethel
 
Financiranje malih in srednjih podjetij
Financiranje malih in srednjih podjetijFinanciranje malih in srednjih podjetij
Financiranje malih in srednjih podjetij
Damjana Kocjanc
 
Going the extra mile in designing for the web
Going the extra mile in designing for the webGoing the extra mile in designing for the web
Going the extra mile in designing for the web
Home
 
RISI Latin American Conference, Eden Roc Hotel, Miami, Nov. 20, 2013
RISI Latin American Conference, Eden Roc Hotel, Miami, Nov. 20, 2013RISI Latin American Conference, Eden Roc Hotel, Miami, Nov. 20, 2013
RISI Latin American Conference, Eden Roc Hotel, Miami, Nov. 20, 2013
SteveScheibe
 

Destaque (20)

Cim2013 oboni oboni_zabolotoniuk
Cim2013 oboni oboni_zabolotoniukCim2013 oboni oboni_zabolotoniuk
Cim2013 oboni oboni_zabolotoniuk
 
Referaat 31 05 2011
Referaat 31 05 2011Referaat 31 05 2011
Referaat 31 05 2011
 
Balaur.ro - Cristian George Strat
Balaur.ro - Cristian George StratBalaur.ro - Cristian George Strat
Balaur.ro - Cristian George Strat
 
Design Thinking in EFL Context
Design Thinking in EFL ContextDesign Thinking in EFL Context
Design Thinking in EFL Context
 
iPhone and Appstore
iPhone and AppstoreiPhone and Appstore
iPhone and Appstore
 
Email etiquette
Email etiquetteEmail etiquette
Email etiquette
 
Bangkok presentation3.17.13reduced
Bangkok presentation3.17.13reducedBangkok presentation3.17.13reduced
Bangkok presentation3.17.13reduced
 
Cda esm waste oil disposal application part 2
Cda esm waste oil disposal application part 2Cda esm waste oil disposal application part 2
Cda esm waste oil disposal application part 2
 
Design for Innovation (D4I) Improvement Process
Design for Innovation (D4I) Improvement ProcessDesign for Innovation (D4I) Improvement Process
Design for Innovation (D4I) Improvement Process
 
The Beatles Parte II
The Beatles Parte IIThe Beatles Parte II
The Beatles Parte II
 
Line Upgrade Deferral Scenarios for Distributed Renewable Energy Resources
Line Upgrade Deferral Scenarios for Distributed Renewable Energy ResourcesLine Upgrade Deferral Scenarios for Distributed Renewable Energy Resources
Line Upgrade Deferral Scenarios for Distributed Renewable Energy Resources
 
GeekMeet Intro - Filip C.T.E.
GeekMeet Intro - Filip C.T.E.GeekMeet Intro - Filip C.T.E.
GeekMeet Intro - Filip C.T.E.
 
Financiranje malih in srednjih podjetij
Financiranje malih in srednjih podjetijFinanciranje malih in srednjih podjetij
Financiranje malih in srednjih podjetij
 
Going the extra mile in designing for the web
Going the extra mile in designing for the webGoing the extra mile in designing for the web
Going the extra mile in designing for the web
 
Tamk Conference Finished 2008
Tamk Conference Finished 2008Tamk Conference Finished 2008
Tamk Conference Finished 2008
 
RISI Latin American Conference, Eden Roc Hotel, Miami, Nov. 20, 2013
RISI Latin American Conference, Eden Roc Hotel, Miami, Nov. 20, 2013RISI Latin American Conference, Eden Roc Hotel, Miami, Nov. 20, 2013
RISI Latin American Conference, Eden Roc Hotel, Miami, Nov. 20, 2013
 
Zas
ZasZas
Zas
 
Free Software
Free SoftwareFree Software
Free Software
 
Uz big design talk may10
Uz big design talk may10Uz big design talk may10
Uz big design talk may10
 
Accurate biochemical knowledge starting with precise structure-based criteria...
Accurate biochemical knowledge starting with precise structure-based criteria...Accurate biochemical knowledge starting with precise structure-based criteria...
Accurate biochemical knowledge starting with precise structure-based criteria...
 

Semelhante a Evaluating scientific hypotheses using the SPARQL Inferencing Notation

HyQue: Evaluating scientific Hypotheses using semantic web technologies
HyQue: Evaluating scientific Hypotheses using semantic web technologiesHyQue: Evaluating scientific Hypotheses using semantic web technologies
HyQue: Evaluating scientific Hypotheses using semantic web technologies
Michel Dumontier
 
Building a Knowledge Graph with Spark and NLP: How We Recommend Novel Drugs t...
Building a Knowledge Graph with Spark and NLP: How We Recommend Novel Drugs t...Building a Knowledge Graph with Spark and NLP: How We Recommend Novel Drugs t...
Building a Knowledge Graph with Spark and NLP: How We Recommend Novel Drugs t...
Databricks
 

Semelhante a Evaluating scientific hypotheses using the SPARQL Inferencing Notation (20)

HyQue: Evaluating scientific Hypotheses using semantic web technologies
HyQue: Evaluating scientific Hypotheses using semantic web technologiesHyQue: Evaluating scientific Hypotheses using semantic web technologies
HyQue: Evaluating scientific Hypotheses using semantic web technologies
 
Knowledge Discovery using an Integrated Semantic Web
Knowledge Discovery using an Integrated Semantic WebKnowledge Discovery using an Integrated Semantic Web
Knowledge Discovery using an Integrated Semantic Web
 
Bio2RDF @ W3C HCLS2009
Bio2RDF @ W3C HCLS2009Bio2RDF @ W3C HCLS2009
Bio2RDF @ W3C HCLS2009
 
TAMALE Seminar: Evaluating scientific hypotheses using Semantic Web technologies
TAMALE Seminar: Evaluating scientific hypotheses using Semantic Web technologiesTAMALE Seminar: Evaluating scientific hypotheses using Semantic Web technologies
TAMALE Seminar: Evaluating scientific hypotheses using Semantic Web technologies
 
Representing and reasoning with biological knowledge
Representing and reasoning with biological knowledgeRepresenting and reasoning with biological knowledge
Representing and reasoning with biological knowledge
 
Omdi2021 Ontologies for (Materials) Science in the Digital Age
Omdi2021 Ontologies for (Materials) Science in the Digital AgeOmdi2021 Ontologies for (Materials) Science in the Digital Age
Omdi2021 Ontologies for (Materials) Science in the Digital Age
 
Overview of Next Gen Sequencing Data Analysis
Overview of Next Gen Sequencing Data AnalysisOverview of Next Gen Sequencing Data Analysis
Overview of Next Gen Sequencing Data Analysis
 
Pathways2GO: Converting BioPax pathways to GO-CAMs
Pathways2GO: Converting BioPax pathways to GO-CAMsPathways2GO: Converting BioPax pathways to GO-CAMs
Pathways2GO: Converting BioPax pathways to GO-CAMs
 
The Gene Ontology & Gene Ontology Annotation resources
The Gene Ontology & Gene Ontology Annotation resourcesThe Gene Ontology & Gene Ontology Annotation resources
The Gene Ontology & Gene Ontology Annotation resources
 
Neo4j and bioinformatics
Neo4j and bioinformaticsNeo4j and bioinformatics
Neo4j and bioinformatics
 
Scott Edmunds flashtalk slides from Beyond the PDF2
Scott Edmunds flashtalk slides from Beyond the PDF2Scott Edmunds flashtalk slides from Beyond the PDF2
Scott Edmunds flashtalk slides from Beyond the PDF2
 
Neo4j for Discovering Drugs and Biomarkers
Neo4j for Discovering Drugs and BiomarkersNeo4j for Discovering Drugs and Biomarkers
Neo4j for Discovering Drugs and Biomarkers
 
Tin-Lap Lee: Next-Gen Sequencing Analysis by GigaGalaxy
Tin-Lap Lee: Next-Gen Sequencing Analysis by GigaGalaxyTin-Lap Lee: Next-Gen Sequencing Analysis by GigaGalaxy
Tin-Lap Lee: Next-Gen Sequencing Analysis by GigaGalaxy
 
Building a Knowledge Graph with Spark and NLP: How We Recommend Novel Drugs t...
Building a Knowledge Graph with Spark and NLP: How We Recommend Novel Drugs t...Building a Knowledge Graph with Spark and NLP: How We Recommend Novel Drugs t...
Building a Knowledge Graph with Spark and NLP: How We Recommend Novel Drugs t...
 
2010 CASCON - Towards a integrated network of data and services for the life ...
2010 CASCON - Towards a integrated network of data and services for the life ...2010 CASCON - Towards a integrated network of data and services for the life ...
2010 CASCON - Towards a integrated network of data and services for the life ...
 
2nd Proj. Update: Integrating SWI-Prolog for Semantic Reasoning in Bioclipse
2nd Proj. Update: Integrating SWI-Prolog for Semantic Reasoning in Bioclipse2nd Proj. Update: Integrating SWI-Prolog for Semantic Reasoning in Bioclipse
2nd Proj. Update: Integrating SWI-Prolog for Semantic Reasoning in Bioclipse
 
NetBioSIG2013-Talk Robin Haw
NetBioSIG2013-Talk Robin Haw NetBioSIG2013-Talk Robin Haw
NetBioSIG2013-Talk Robin Haw
 
Implementing chemistry platform for OpenPHACTS
Implementing chemistry platform for OpenPHACTSImplementing chemistry platform for OpenPHACTS
Implementing chemistry platform for OpenPHACTS
 
Chunlei Wu BD2K 201601 MyGene.info and MyVariant.info
Chunlei Wu BD2K 201601 MyGene.info and MyVariant.infoChunlei Wu BD2K 201601 MyGene.info and MyVariant.info
Chunlei Wu BD2K 201601 MyGene.info and MyVariant.info
 
Bio4j
Bio4jBio4j
Bio4j
 

Mais de Michel Dumontier

CIKM2020 Keynote: Accelerating discovery science with an Internet of FAIR dat...
CIKM2020 Keynote: Accelerating discovery science with an Internet of FAIR dat...CIKM2020 Keynote: Accelerating discovery science with an Internet of FAIR dat...
CIKM2020 Keynote: Accelerating discovery science with an Internet of FAIR dat...
Michel Dumontier
 
Are we FAIR yet? And will it be worth it?
Are we FAIR yet? And will it be worth it?Are we FAIR yet? And will it be worth it?
Are we FAIR yet? And will it be worth it?
Michel Dumontier
 

Mais de Michel Dumontier (20)

A metadata standard for Knowledge Graphs
A metadata standard for Knowledge GraphsA metadata standard for Knowledge Graphs
A metadata standard for Knowledge Graphs
 
Data-Driven Discovery Science with FAIR Knowledge Graphs
Data-Driven Discovery Science with FAIR Knowledge GraphsData-Driven Discovery Science with FAIR Knowledge Graphs
Data-Driven Discovery Science with FAIR Knowledge Graphs
 
Evaluating FAIRness
Evaluating FAIRnessEvaluating FAIRness
Evaluating FAIRness
 
The Role of the FAIR Guiding Principles for an effective Learning Health System
The Role of the FAIR Guiding Principles for an effective Learning Health SystemThe Role of the FAIR Guiding Principles for an effective Learning Health System
The Role of the FAIR Guiding Principles for an effective Learning Health System
 
CIKM2020 Keynote: Accelerating discovery science with an Internet of FAIR dat...
CIKM2020 Keynote: Accelerating discovery science with an Internet of FAIR dat...CIKM2020 Keynote: Accelerating discovery science with an Internet of FAIR dat...
CIKM2020 Keynote: Accelerating discovery science with an Internet of FAIR dat...
 
The role of the FAIR Guiding Principles in a Learning Health System
The role of the FAIR Guiding Principles in a Learning Health SystemThe role of the FAIR Guiding Principles in a Learning Health System
The role of the FAIR Guiding Principles in a Learning Health System
 
Acclerating biomedical discovery with an internet of FAIR data and services -...
Acclerating biomedical discovery with an internet of FAIR data and services -...Acclerating biomedical discovery with an internet of FAIR data and services -...
Acclerating biomedical discovery with an internet of FAIR data and services -...
 
Accelerating Biomedical Research with the Emerging Internet of FAIR Data and ...
Accelerating Biomedical Research with the Emerging Internet of FAIR Data and ...Accelerating Biomedical Research with the Emerging Internet of FAIR Data and ...
Accelerating Biomedical Research with the Emerging Internet of FAIR Data and ...
 
Are we FAIR yet? And will it be worth it?
Are we FAIR yet? And will it be worth it?Are we FAIR yet? And will it be worth it?
Are we FAIR yet? And will it be worth it?
 
The Future of FAIR Data: An international social, legal and technological inf...
The Future of FAIR Data: An international social, legal and technological inf...The Future of FAIR Data: An international social, legal and technological inf...
The Future of FAIR Data: An international social, legal and technological inf...
 
Keynote at the 2018 Maastricht University Dinner
Keynote at the 2018 Maastricht University DinnerKeynote at the 2018 Maastricht University Dinner
Keynote at the 2018 Maastricht University Dinner
 
The future of science and business - a UM Star Lecture
The future of science and business - a UM Star LectureThe future of science and business - a UM Star Lecture
The future of science and business - a UM Star Lecture
 
Are we FAIR yet?
Are we FAIR yet?Are we FAIR yet?
Are we FAIR yet?
 
Developing and assessing FAIR digital resources
Developing and assessing FAIR digital resourcesDeveloping and assessing FAIR digital resources
Developing and assessing FAIR digital resources
 
Advancing Biomedical Knowledge Reuse with FAIR
Advancing Biomedical Knowledge Reuse with FAIRAdvancing Biomedical Knowledge Reuse with FAIR
Advancing Biomedical Knowledge Reuse with FAIR
 
A Framework to develop the FAIR Metrics
A Framework to develop the FAIR MetricsA Framework to develop the FAIR Metrics
A Framework to develop the FAIR Metrics
 
FAIR principles and metrics for evaluation
FAIR principles and metrics for evaluationFAIR principles and metrics for evaluation
FAIR principles and metrics for evaluation
 
Towards metrics to assess and encourage FAIRness
Towards metrics to assess and encourage FAIRnessTowards metrics to assess and encourage FAIRness
Towards metrics to assess and encourage FAIRness
 
Data Science for the Win
Data Science for the WinData Science for the Win
Data Science for the Win
 
2016 bmdid-mappings
2016 bmdid-mappings2016 bmdid-mappings
2016 bmdid-mappings
 

Último

Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire business
panagenda
 
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
?#DUbAI#??##{{(☎️+971_581248768%)**%*]'#abortion pills for sale in dubai@
 

Último (20)

Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire business
 
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
 
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingRepurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organization
 
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
 

Evaluating scientific hypotheses using the SPARQL Inferencing Notation

  • 1. Evaluating scientific hypotheses using the SPARQL Inferencing Notation. Alison Callahan and Michel Dumontier, Department of Biology, Carleton University. ESWC2012::HyQue-SPIN
  • 2.
  • 3.
  • 4. Uncovering all the evidence that supports or refutes a hypothesis is becoming increasingly difficult and requires extensive searching
  • 5. Continuous growth in research outputs. Sources: http://www.nlm.nih.gov/bsd/stats/cit_added.html; http://homepages.cs.ncl.ac.uk/m.j.bell1/blog/?p=151
  • 6. Semantic Web technologies for biological knowledge management and discovery • Capability to publish, link, retrieve and query decentralized data • A powerful integrative platform across data, ontologies and services • Formal knowledge representation enables automated reasoning • Massive growth in dataset availability and, soon, in application development
  • 7. A rapidly growing web of linked data. Image: “Linking Open Data cloud diagram”, by Richard Cyganiak and Anja Jentzsch, http://lod-cloud.net/
  • 8. Bio2RDF covers the major biological databases
  • 9. BioPortal gives up-to-date access to bio-ontologies
  • 10. SADI provides access to Semantic Web Services. The Semantic Automated Discovery and Integration (SADI) framework makes it easy to create Semantic Web Services using OWL classes as service inputs and outputs (http://sadiframework.org). ~700 bioinformatics services as of May 29, 2012. Mark Wilkinson, UBC; Michel Dumontier, Carleton University; Christopher Baker, UNB
  • 11. HyQue: the Hypothesis query and evaluation system • A platform for knowledge discovery • Facilitates hypothesis formulation and evaluation • Leverages Semantic Web technologies to provide access to facts, expert knowledge and web services • Conforms to a simplified event-based model • Supports evaluation against positive and negative findings • Transparent and reproducible evidence prioritization • Provenance across all elements of hypothesis testing: trace a hypothesis to its evaluation, including the data and rules used. Callahan A, Dumontier M, Shah NH. HyQue: evaluating hypotheses using Semantic Web technologies. J Biomed Semantics. 2011 May 17;2 Suppl 2:S3.
  • 12. HyQue • Background knowledge as OWL ontologies: hypotheses (HO), processes/events (GO), measurement values (SIO), units (UO), evidence (ECO), molecules (ChEBI), biopolymers (SO), etc. • Facts as RDF data: model organism data, such as genes and their chromosomal locations, proteins and their functions, localization and participation in interactions, complexes, pathways, biological processes, etc. • Evaluation rules defined using SPIN: domain-specific rules compute scores based on external knowledge; system rules compute scores based on hypothesis structure. Callahan A, Dumontier M. Evaluating scientific hypotheses using the SPARQL Inferencing Notation. Extended Semantic Web Conference (ESWC 2012). Heraklion, Crete. May 27-31, 2012.
  • 14. A HyQue hypothesis is a collection of propositions • Proposition: “a statement expressing something true or false” • HyQue propositions specify events • Complex propositions can be formulated using logical operators (AND, OR, XOR…) or decomposed using component relations. HyQue hypothesis ≡ ‘proposition’ that ‘specifies’ only ‘event’; HyQue hypothesis ≡ ‘proposition’ that ‘has component part’ only (‘proposition’ that ‘specifies’ only ‘event’)
  • 15. Event-based data model. HyQue events denote a phenomenon involving two objects, an ‘agent’ and a ‘target’. In addition, we can specify the context of the event (e.g. located in the nucleus, or under some genetic background). Event properties: ‘has agent’ agent; ‘has target’ target; ‘is located in’ location; ‘is negated’ boolean. Currently supported events: 1. protein-protein binding, 2. protein-nucleic acid binding, 3. molecular activation, 4. molecular inhibition, 5. gene induction, 6. gene repression, 7. transport
  • 16. Example hypothesis • HyQue’s demonstration knowledge base is focused on galactose metabolism and regulation. The paper describes a union of hypotheses: (Gal4p induces the expression of GAL1 AND Gal4p induces the expression of GAL7 AND Gal3p induces the expression of GAL2) OR (Gal4p induces the expression of GAL7 AND Gal80p induces the expression of GAL7 AND Gal80p does not inhibit the activity of Gal4p WHEN GAL3 is over-expressed)
  • 17. Users don’t need to know RDF to formulate hypotheses: a user interface with auto-completion is available at http://hyque.semanticscience.org
  • 18. Hypothesis RDF representation (hypothesis → ‘has component part’ → proposition → ‘specifies’ → event): :h rdf:type hyque:Hypothesis ; hyque:has-component-part :p1 . :p1 rdf:type hyque:Proposition ; hyque:specifies :e1 . :e1 rdf:type hyque:Event .
  • 19. Event RDF representation (event: Gal4p positively regulates the expression of GAL1; <http://bio2rdf.org/go:0010628> is the GO term “positive regulation of gene expression”): :e1 rdf:type hyque:event ; rdf:type <http://bio2rdf.org/go:0010628> ; hyque:agent <http://bio2rdf.org/sgd:Gal4p> ; hyque:target <http://bio2rdf.org/sgd:GAL1> ; hyque:is_negated "0" ; ….
  • 20. Event: Gal4p positively regulates the expression of GAL1. HyQue’s SPIN rules retrieve event data, then score it and the overall hypothesis. HyQue currently contains 63 SPIN rules to evaluate hypotheses: 18 system rules and 45 domain-specific rules
  • 21. A combination of system and domain rules retrieves and scores data, and adds new triples. Induction event matched by the SPIN induction rule: :e1 a go:0010628 ; hyque:agent sgd:Gal4p ; hyque:target sgd:GAL1 ; hyque:is_negated "0" .
  • 22. SPIN system rule: link hypothesis to evaluation. CONSTRUCT { ?this ‘has attribute’ ?hypothesisEval . ?hypothesisEval a ‘evaluation’ . ?hypothesisEval ‘obtained from’ ?propositionEval . ?hypothesisEval ‘has value’ ?hypothesisEvalScore . } WHERE { ?this ‘has component part’ ?proposition . ?proposition ‘has attribute’ ?propositionEval . BIND(:calculateHypothesisScore(?this) AS ?hypothesisEvalScore) . BIND(IRI(fn:concat(afn:namespace(?this), afn:localname(?this), "_", "evaluation")) AS ?hypothesisEval) . }
  • 23.
  • 24. SPIN domain rule: score experimental evidence of a gene expression induction event. SELECT ?induceEventScore WHERE { BIND (:calculateInduceAgentTypeScore(?arg1) AS ?agentTypeScore) . BIND (:calculateInduceAgentFunctionTypeScore(?arg1) AS ?agentFunctionTypeScore) . BIND (:calculateInduceTargetTypeScore(?arg1) AS ?targetTypeScore) . BIND (:calculateInduceLogicalOperatorScore(?arg1) AS ?logicalOperatorScore) . BIND (:calculateInduceEventLocationScore(?arg1) AS ?eventLocationScore) . BIND (:penalizeNegation(?arg1) AS ?negationScore) . BIND (5 AS ?maxScore) . BIND (((((((?agentTypeScore + ?agentFunctionTypeScore) + ?targetTypeScore) + ?logicalOperatorScore) + ?eventLocationScore) + ?negationScore) / ?maxScore) AS ?induceEventScore) . }
  • 25. HyQue domain rules calculate a quantitative measure of evidence for an event. ‘Induce’ rule (maximum score: 5): – Is the event negated? If yes, subtract 2 – Is the event of type ‘induce’ (GO:0010628)? If yes, add 1; if no, subtract 1 – Is the agent of type ‘protein’ (CHEBI:36080) or ‘RNA’? If yes, add 1; if of type ‘gene’, subtract 1 – Is the target of type ‘gene’ (SO:0000236)? If yes, add 1; if no, subtract 1 – Does the agent have known ‘transcription factor activity’ (GO:0003700)? If yes, add 1 – Is the event located in the ‘nucleus’ (GO:0005634)? If yes, add 1; if no, subtract 1
  • 26. SPIN rule, outcome and score for a GAL gene induction event: 4/5 = 0.80
  • 27. Rules can be customized to seek more evidence, but at a cost if that evidence is not found • calculateInhibitEventScore does not take into account the physical location of the event participants • Experimental evidence suggests that physical location is important in the context of an inhibition event • Inhibition of Gal4p activity by Gal80p is known to take place in the nucleus, yet this inhibition is interrupted when Gal80p is bound by Gal3p, which is typically found in the cytoplasm • Adding a new rule to consider location weakens the event due to lack of data (score: 0.87 → 0.78) • Hypothesis: (Gal4p induces the expression of GAL1 [e1] AND Gal3p induces the expression of GAL2 [e2] AND Gal4p induces the expression of GAL7) OR (Gal4p induces the expression of GAL7 AND Gal80p induces the expression of GAL7 AND Gal80p does not inhibit the activity of Gal4p WHEN GAL3 is over-expressed)
  • 28. Customization of rules and rulesets can generate different evidence-based evaluations
  • 29. Reproducible eScience: LOD for hypotheses, rules, data and evaluations
  • 30. Summary • HyQue is a system that facilitates the formulation and evaluation of scientific hypotheses against formalized knowledge on the Semantic Web. • This work focused on the development and incorporation of recursive SPIN rules to obtain and score events and multi-event hypotheses using OWL ontologies and RDF-based LOD.
  • 31. Future Directions • Collaborative, end user-centered environment to engineer, share, compare and evaluate hypotheses • Investigate alternative scoring systems • Structure knowledge beyond the GAL network – EU/US Collaborations on disease-centered research hypotheses – Applications for clinical decision support
  • 32. Contact: michel_dumontier@carleton.ca • Website: http://dumontierlab.com • Presentations: http://slideshare.com/micheldumontier
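
The event model on slide 15 can be sketched as a small Python data class. This is a minimal illustration, not HyQue's implementation: the class name `Event`, its field names and the string labels for the seven supported event types are hypothetical stand-ins for the ontology terms HyQue actually uses.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical labels for the seven event types listed on slide 15.
SUPPORTED_EVENT_TYPES = frozenset({
    "protein-protein binding",
    "protein-nucleic acid binding",
    "molecular activation",
    "molecular inhibition",
    "gene induction",
    "gene repression",
    "transport",
})


@dataclass(frozen=True)
class Event:
    """An agent acting on a target, with optional context (hypothetical model)."""
    event_type: str
    agent: str                      # 'has agent'
    target: str                     # 'has target'
    location: Optional[str] = None  # 'is located in'; None when no context is given
    negated: bool = False           # 'is negated'

    def __post_init__(self) -> None:
        # Only the event types listed on the slide are accepted.
        if self.event_type not in SUPPORTED_EVENT_TYPES:
            raise ValueError(f"unsupported event type: {self.event_type!r}")


e1 = Event("gene induction", agent="Gal4p", target="GAL1")
e_neg = Event("molecular inhibition", agent="Gal80p", target="Gal4p",
              location="nucleus", negated=True)
print(e1)
print(e_neg)
```

Freezing the data class keeps an event immutable once built, which loosely mirrors the idea that an asserted event is a fixed statement to be scored rather than edited.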
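
Slides 14 and 16 describe hypotheses as propositions combined with logical operators, and slide 30 notes that HyQue scores multi-event hypotheses recursively. The sketch below walks the slide 16 hypothesis recursively. The slides do not say how conjunctions and disjunctions are combined, so averaging AND branches and taking the maximum over OR branches is purely an assumption here, as are the event identifiers and the placeholder scores.

```python
from dataclasses import dataclass
from typing import Dict, List, Union


@dataclass
class EventProposition:
    """A proposition that 'specifies' exactly one event."""
    event_id: str


@dataclass
class CompoundProposition:
    """A proposition whose component parts are joined by a logical operator."""
    operator: str                    # "AND" or "OR" in this sketch
    parts: List["Proposition"]


Proposition = Union[EventProposition, CompoundProposition]


def score(prop: Proposition, event_scores: Dict[str, float]) -> float:
    """Recursively score a proposition tree (combination rules are assumptions)."""
    if isinstance(prop, EventProposition):
        return event_scores[prop.event_id]
    child_scores = [score(part, event_scores) for part in prop.parts]
    if prop.operator == "AND":
        return sum(child_scores) / len(child_scores)  # assumed: mean of conjuncts
    if prop.operator == "OR":
        return max(child_scores)                      # assumed: best disjunct
    raise ValueError(f"unsupported operator: {prop.operator}")


# The slide 16 hypothesis; event identifiers are hypothetical, and the
# 'WHEN GAL3 is over-expressed' context is folded into the last event.
hypothesis = CompoundProposition("OR", [
    CompoundProposition("AND", [EventProposition("Gal4p_induces_GAL1"),
                                EventProposition("Gal4p_induces_GAL7"),
                                EventProposition("Gal3p_induces_GAL2")]),
    CompoundProposition("AND", [EventProposition("Gal4p_induces_GAL7"),
                                EventProposition("Gal80p_induces_GAL7"),
                                EventProposition("Gal80p_does_not_inhibit_Gal4p")]),
])

# Placeholder event scores for illustration only, not HyQue results.
event_scores = {
    "Gal4p_induces_GAL1": 1.0,
    "Gal4p_induces_GAL7": 0.5,
    "Gal3p_induces_GAL2": 0.0,
    "Gal80p_induces_GAL7": 0.25,
    "Gal80p_does_not_inhibit_Gal4p": 0.0,
}
print(score(hypothesis, event_scores))
```

Swapping the AND/OR combiners (for example, minimum instead of mean) changes the hypothesis score without touching the event scores, which is one way to explore the alternative scoring systems mentioned on slide 31.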
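
Slide 17 notes that users never write RDF by hand. One way form input could be turned into the Turtle shown on slide 19 is sketched below. The function `event_to_turtle` is hypothetical, and because the slides never give the IRI behind the `hyque:` prefix, the prefix declarations are omitted exactly as on the slide. Encoding a negated event as `"1"` is an assumption; the slide only shows `"0"`.

```python
def event_to_turtle(event: str, event_class_iri: str, agent_iri: str,
                    target_iri: str, negated: bool = False) -> str:
    """Render one event as Turtle in the style of slide 19 (hypothetical helper).

    Prefix declarations for rdf: and hyque: are omitted, as on the slide.
    """
    # Assumption: "1" marks a negated event; the slide only shows "0".
    negation_literal = "1" if negated else "0"
    return "\n".join([
        f"{event} rdf:type hyque:event ;",
        f"    rdf:type <{event_class_iri}> ;",
        f"    hyque:agent <{agent_iri}> ;",
        f"    hyque:target <{target_iri}> ;",
        f'    hyque:is_negated "{negation_literal}" .',
    ])


ttl = event_to_turtle(
    ":e1",
    "http://bio2rdf.org/go:0010628",  # positive regulation of gene expression
    "http://bio2rdf.org/sgd:Gal4p",
    "http://bio2rdf.org/sgd:GAL1",
)
print(ttl)
```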
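
The system rule on slide 22 can be emulated over an in-memory set of triples to show what its CONSTRUCT produces. In this sketch the predicates are plain label strings mirroring the slide, the `example.org` IRIs are hypothetical, and `score_fn` stands in for `:calculateHypothesisScore`, whose definition the slides do not give. Concatenating `afn:namespace` and `afn:localname` effectively rebuilds the original IRI, so the evaluation node is the hypothesis IRI with `_evaluation` appended.

```python
from typing import Callable, Set, Tuple

Triple = Tuple[str, str, str]


def link_hypothesis_to_evaluation(graph: Set[Triple], this: str,
                                  score_fn: Callable[[str], float]) -> Set[Triple]:
    """Emulate the slide 22 CONSTRUCT for one hypothesis ?this."""
    # WHERE: ?this 'has component part' ?proposition .
    propositions = {o for s, p, o in graph
                    if s == this and p == "has component part"}
    # WHERE: ?proposition 'has attribute' ?propositionEval .
    proposition_evals = {o for s, p, o in graph
                         if s in propositions and p == "has attribute"}
    if not proposition_evals:
        return set()  # no solutions, so the CONSTRUCT template yields nothing
    # BIND(IRI(concat(namespace, localname, "_", "evaluation"))) -> ?hypothesisEval
    hypothesis_eval = this + "_evaluation"
    # BIND(:calculateHypothesisScore(?this)); score_fn is a stand-in
    score = score_fn(this)
    constructed = {
        (this, "has attribute", hypothesis_eval),
        (hypothesis_eval, "rdf:type", "evaluation"),
        (hypothesis_eval, "has value", str(score)),
    }
    # One 'obtained from' triple per matching proposition evaluation.
    constructed |= {(hypothesis_eval, "obtained from", pe) for pe in proposition_evals}
    return constructed


graph = {
    ("http://example.org/h", "has component part", "http://example.org/p1"),
    ("http://example.org/p1", "has attribute", "http://example.org/p1_evaluation"),
}
for triple in sorted(link_hypothesis_to_evaluation(graph, "http://example.org/h", lambda h: 0.8)):
    print(triple)
```

Because the template only fires when a proposition already has an evaluation, rules like this can be chained: event rules score propositions first, and this system rule then lifts those results to the hypothesis.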
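
The ‘induce’ checklist on slide 25 reduces to integer arithmetic divided by the maximum score of 5. The sketch below follows that checklist rather than the function names in the slide 24 query, and its class and field names are hypothetical. Scoring a missing location as 0 (neither reward nor penalty) is an assumption; under it, an event that passes the other four positive checks and has no location data scores 4/5 = 0.80, the same ratio reported on slide 26, although the slides do not say which checks that event passed.

```python
from dataclasses import dataclass
from typing import Optional

MAX_SCORE = 5  # maximum score for the 'induce' rule (slide 25)


@dataclass
class InduceEvidence:
    """Facts retrieved for one induction event (hypothetical field names)."""
    negated: bool
    is_induce_type: bool          # event typed as GO:0010628
    agent_type: str               # e.g. "protein", "RNA" or "gene"
    target_is_gene: bool
    agent_has_tf_activity: bool   # GO:0003700
    in_nucleus: Optional[bool]    # None = no location data (assumed to score 0)


def induce_score(ev: InduceEvidence) -> float:
    """Apply the slide 25 checklist and normalize by the maximum score."""
    total = 0
    if ev.negated:
        total -= 2
    total += 1 if ev.is_induce_type else -1
    if ev.agent_type in ("protein", "RNA"):
        total += 1
    elif ev.agent_type == "gene":
        total -= 1
    total += 1 if ev.target_is_gene else -1
    if ev.agent_has_tf_activity:
        total += 1
    if ev.in_nucleus is not None:
        total += 1 if ev.in_nucleus else -1
    return total / MAX_SCORE


# Illustrative evidence values, not retrieved HyQue data.
example = InduceEvidence(
    negated=False, is_induce_type=True, agent_type="protein",
    target_is_gene=True, agent_has_tf_activity=True, in_nucleus=None)
print(induce_score(example))
```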
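
Slides 27 and 28 make the point that adding a rule can lower a score when the data it needs is missing. The sketch below shows that mechanism with a hypothetical two-rule ruleset and a hypothetical location rule; it does not reproduce `calculateInhibitEventScore` or the 0.87 and 0.78 values on slide 27. Here a missing location earns nothing while the ruleset's maximum still grows, so the normalized score falls.

```python
from typing import Callable, Dict, List, Optional, Tuple

Event = Dict[str, Optional[str]]
# (rule name, scoring function, maximum contribution); all rules are hypothetical.
Rule = Tuple[str, Callable[[Event], int], int]


def agent_is_protein(event: Event) -> int:
    return 1 if event.get("agent_type") == "protein" else -1


def target_is_protein(event: Event) -> int:
    return 1 if event.get("target_type") == "protein" else -1


def located_in_nucleus(event: Event) -> int:
    location = event.get("location")
    if location is None:
        return 0  # no data: nothing earned, but the maximum still grows
    return 1 if location == "nucleus" else -1


def evaluate(event: Event, rules: List[Rule]) -> float:
    """Sum the rule scores and normalize by the ruleset's maximum."""
    earned = sum(fn(event) for _, fn, _ in rules)
    maximum = sum(max_score for _, _, max_score in rules)
    return earned / maximum


BASE_RULES: List[Rule] = [
    ("agent is protein", agent_is_protein, 1),
    ("target is protein", target_is_protein, 1),
]
WITH_LOCATION: List[Rule] = BASE_RULES + [("located in nucleus", located_in_nucleus, 1)]

# Inhibition of Gal4p by Gal80p with no location recorded (illustrative data).
inhibition: Event = {"agent_type": "protein", "target_type": "protein", "location": None}
print(evaluate(inhibition, BASE_RULES))     # both rules satisfied
print(evaluate(inhibition, WITH_LOCATION))  # same evidence, larger maximum
```

Keeping each rule as a separate entry in a list is what makes rulesets easy to customize: adding, removing or reweighting a rule changes the evaluation without editing the other rules.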

Editor's Notes

  1. Evaluating a hypothesis and its claims against experimental data is an essential scientific activity. However, this task is increasingly challenging given the ever-growing volume of publications and data sets. Towards addressing this challenge, we previously developed HyQue, a system for hypothesis formulation and evaluation. HyQue uses domain-specific rulesets to evaluate hypotheses based on well-understood scientific principles. However, because scientists may apply differing scientific premises when exploring a hypothesis, flexibility is required in both crafting and executing rulesets to evaluate hypotheses. Here, we report on an extension of HyQue that incorporates rules specified using the SPARQL Inferencing Notation (SPIN). Hypotheses, background knowledge, queries, results and now rulesets are represented and executed using Semantic Web technologies, enabling users to explicitly trace a hypothesis to its evaluation as Linked Data, including the data and rules used by HyQue. We demonstrate the use of HyQue to evaluate hypotheses concerning the yeast galactose gene system.
  2. Can’t answer questions that require background knowledge
  3. We represent a hypothesis as a collection of propositions
  4. This is part of a hypothesis represented in N3 and used as input to HyQue. Note: binding between galactose and Gal3p does not return any results; there is binding between Gal3p and Gal80p.
  5. The RDF representing the evaluation of the input hypothesis is linked to both the hypothesis AND the data used to evaluate the hypothesis
  6. This is a screenshot of some HyQue data in Virtuoso, a triple store system that we use to store and access RDF