Augmenting PROV with Plans in P-PLAN:
Scientific Processes as Linked Data

Daniel Garijo
OEG-DIA, Facultad de Informática
Universidad Politécnica de Madrid
dgarijo@delicias.dia.fi.upm.es

Yolanda Gil
Information Sciences Institute and Department of Computer Science
University of Southern California
http://www.isi.edu/~gil




USC Information Sciences                Yolanda Gil                 gil@isi.edu   1
W3C PROV
                           http://www.w3.org/2011/prov/




A Workflow Execution
in PROV
      Benefits:
       •   Makes the work
           inspectable
      Shortcomings:
       •   Hard to reproduce
       •   Not efficient to reuse




Reproducibility




Replication of Crohn’s Disease Association
     Study from [Duerr et al, Science 06]




Replication of Early-Onset Parkinson’s Disease
Study from [Bayrakli et al, Human Mutation 07]




Reusability
    Lower cost
      •   “Scientists and engineers spend more than
          60% of their time just preparing the data
          for model input or data-model
          comparison” (NASA A40)
    Better quality
      •   “We write QC without thinking about the
          best way to do the WC. Such approaches
          perpetuate mediocrity. If someone did it
          right once, it would benefit many people.”
          (EC WF CQ)
    More efficient
      •   “I often see that I’m repeating the work
          that 100 other people have been doing to
          obtain and process the data.” (EC WF CQ)
Access to Data Analytics Expertise [Science 2011]




The TB-Drugome [Kinnings et al., PLoS CompBio 2010]
                                 “We report a computational
                                 approach to construct a
                                 drug-target network…
                                 applied to the genome of
                                 tuberculosis…”
                                 “The TB-drugome reveals
                                 that approximately one-
                                 third of the drugs examined
                                 have the potential to… treat
                                 tuberculosis…”
                                 “The methodology can be
                                 applied to other pathogens
                                 of interest …”
Executable and Abstract Workflow
    Executable workflow: what I actually run
    Abstract workflow: the method that I followed




The Ontology for Biomedical Investigations
     http://obi-ontology.org/




Semantic Web Applications in Neuromedicine
     (SWAN) Ontology http://www.w3.org/TR/hcls-swan/




Research Objects
http://www.wf4ever-project.org/research-object-model




Executable and Abstract Workflow
    Executable workflow: what I actually run
    Abstract workflow: the method that I followed




Semantic Workflows in Wings
[Gil et al 10][Gil et al 09][Kim & Gil et al 08][Kim et al 06]
 Workflows are augmented with
 semantic constraints
   •   Each workflow constituent has a
       variable associated with it
        – Workflow components, arguments,
          datasets
   •   Constraints are used to restrict
       workflow variables
   •   Can define abstract classes of
       components
        – Concrete components model executable codes
 Workflow reasoners propagate and
 use semantic constraints
 Uses semantic web standards:
 OWL/RDF, SPARQL, rules
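
The idea of restricting abstract component classes with constraints can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the Wings API: the class hierarchy, the data type names and the `specialize` helper are all hypothetical, and a real system would express them in OWL/RDF and rules rather than dictionaries.

```python
# Hypothetical component hierarchy (child -> parent); names are illustrative.
SUBCLASS_OF = {
    "DecTreeModeler": "Modeler",
    "LinearRegressionModeler": "Modeler",
    "Weka-C4.5": "DecTreeModeler",
    "MatLab_LR": "LinearRegressionModeler",
    "R_LR": "LinearRegressionModeler",
}

# Hypothetical constraint: the input data type each concrete component accepts.
ACCEPTS = {
    "Weka-C4.5": "DiscreteFeatureVector",
    "MatLab_LR": "NumericFeatureVector",
    "R_LR": "NumericFeatureVector",
}


def is_a(cls, ancestor):
    """Return True if cls equals ancestor or is a transitive subclass of it."""
    while cls is not None:
        if cls == ancestor:
            return True
        cls = SUBCLASS_OF.get(cls)
    return False


def specialize(abstract_component, input_type):
    """List concrete components that fit an abstract step and its input constraint."""
    return sorted(
        name
        for name, accepted in ACCEPTS.items()
        if is_a(name, abstract_component) and accepted == input_type
    )


if __name__ == "__main__":
    print(specialize("Modeler", "NumericFeatureVector"))
```

Here the abstract step only says "some Modeler", and the constraint on the input variable narrows the choice to the concrete codes that can actually run on that data.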


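Ontologies for Data and Workflow Components
   [Figure: class hierarchies for data and for workflow components. Labels on the data side: Documents (Plain text, Markup, InDoc, htmlDoc, latexDoc), Language (En, Fr), Model (Dec Tree, SVM), Size, Feature Vector, WSJ-2010. Labels on the component side: Correlation Scoring (ChiSq, InfoGain, MutInfo), Modeler, DecTree Modeler (C4.5, J48, Weka-C4.5), Linear Regression (MatLab_LR, R_LR).]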
Semantic Workflows: Abstractions Based on Ontologies [Gil et al 2011]
   [Figure: abstract steps paired with specific components and their code. Term Weighting: TF-IDF (CODE). Correlation Scoring: Chi Squared (CODE).]
Publishing Workflows on the Web with OPMW
   http://www.opmw.org
  Extension of the Open Provenance Model
  Red: OPM model
  Black: OPMW profile (extension)
  [Figure: a Workflow Template (left) linked to its Execution Results (right).
   Template side: Workflow template (OPM Graph), Artifacts (Input artifact1, Input artifact2, Output artifact1), Abstract template Node (Process), Abstract component.
   Execution side: Artifacts (Execution Input1, Execution Input2, Execution result), Execution Node (Process), Specific component, Agent (user), Execution Account.
   Relations shown: hasArtifact, hasProcess, used, wasGeneratedBy, wasControlledBy, account, hasAbstractComponent, hasSpecificComponent, subClassOf (between Abstract component and Specific component), hasArtifactTemplate, hasProcessTemplate, hasWorkflowTemplate.]
Published as Linked Data: Executed Workflow
 + Abstract Workflow + Data + Steps + Codes…




P-PLAN: Extending PROV to represent
     plans
         Plan representations can be very complex
          •   Iteration, conditionals, decomposition, etc.
         P-PLAN is a core representation with only:
          •   Sequences of steps
          •   Parallel steps
         P-PLAN, like PROV, is a DAG
          •   Simplest representation of plans
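
A plan limited to sequences and parallel steps can be treated as a directed acyclic graph over steps. The sketch below is an illustration under stated assumptions, not part of P-PLAN itself: the step names and the `PRECEDED_BY` mapping are hypothetical, and the ordering is computed with Python's standard `graphlib` module (Python 3.9+).

```python
from graphlib import CycleError, TopologicalSorter

# Hypothetical plan: each step maps to the set of steps that must precede it.
PRECEDED_BY = {
    "termWeighting": set(),
    "correlationScoring": set(),
    "modeling": {"termWeighting", "correlationScoring"},
}


def execution_order(preceded_by):
    """Return one valid step ordering, or raise ValueError if the plan has a cycle."""
    try:
        return list(TopologicalSorter(preceded_by).static_order())
    except CycleError as err:
        raise ValueError(f"plan is not a DAG: {err.args[1]}") from err


def parallel_groups(preceded_by):
    """Group steps into batches whose members have no ordering between them."""
    sorter = TopologicalSorter(preceded_by)
    sorter.prepare()  # raises CycleError if the plan is not acyclic
    groups = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        groups.append(ready)
        sorter.done(*ready)
    return groups


if __name__ == "__main__":
    print(parallel_groups(PRECEDED_BY))
```

Steps in the same batch can run in parallel, and consecutive batches form the sequence. Iteration and conditionals have no counterpart in this structure, which is what keeps the representation simple.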




P-Plan




Queries about Workflows Published as
     Linked Data
    Find all abstract workflows (?plan) in which a
    given entity (?entity) has been used when
    executing them

    PREFIX prov:   <http://www.w3.org/ns/prov#>
    PREFIX p-plan: <http://purl.org/net/p-plan#>

    SELECT DISTINCT ?plan WHERE {
      ?entity a p-plan:Entity, prov:Entity ;
              p-plan:correspondsTo ?templVariable .
      ?templVariable a p-plan:Variable ;
              p-plan:isVariableOfPlan ?plan .
    }
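
To make the join in this query concrete, the same pattern can be walked by hand over a small in-memory set of triples. This is a minimal sketch under stated assumptions: the `ex:` identifiers and the sample data are hypothetical, the property names are copied from the query as written on the slide, and a real deployment would send the SPARQL query to an endpoint instead.

```python
# Hypothetical triples (subject, predicate, object); every ex: name is made up.
TRIPLES = {
    ("ex:dataset1", "rdf:type", "p-plan:Entity"),
    ("ex:dataset1", "rdf:type", "prov:Entity"),
    ("ex:dataset1", "p-plan:correspondsTo", "ex:inputVar"),
    ("ex:inputVar", "rdf:type", "p-plan:Variable"),
    ("ex:inputVar", "p-plan:isVariableOfPlan", "ex:planA"),
    ("ex:dataset2", "rdf:type", "p-plan:Entity"),
    ("ex:dataset2", "rdf:type", "prov:Entity"),
    ("ex:dataset2", "p-plan:correspondsTo", "ex:otherVar"),
    ("ex:otherVar", "rdf:type", "p-plan:Variable"),
    ("ex:otherVar", "p-plan:isVariableOfPlan", "ex:planB"),
}


def objects(triples, subject, predicate):
    """Return all objects of triples matching the given subject and predicate."""
    return {o for s, p, o in triples if s == subject and p == predicate}


def plans_using(triples, entity):
    """Follow entity -> template variable -> plan, mirroring the query pattern."""
    types = objects(triples, entity, "rdf:type")
    if not {"p-plan:Entity", "prov:Entity"} <= types:
        return set()
    plans = set()
    for variable in objects(triples, entity, "p-plan:correspondsTo"):
        if "p-plan:Variable" in objects(triples, variable, "rdf:type"):
            plans |= objects(triples, variable, "p-plan:isVariableOfPlan")
    return plans


if __name__ == "__main__":
    print(plans_using(TRIPLES, "ex:dataset1"))
```

The two hops (execution entity to template variable, then template variable to plan) are what connect provenance records back to the abstract workflow.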

Conclusions
         Linked data as a vehicle to publish science processes
          •   Workflows, experiments, …
         Important to publish method, not just provenance
          •   Reproducibility, efficiency, access to expertise
         W3C PROV useful to publish execution
         P-PLAN is an extension of PROV for publishing methods
          •   Plan, step, variable
         P-PLAN is applicable beyond science





Mais conteúdo relacionado

Mais procurados

Aplicación de la plataforma moodle para mejorar el rendimiento academmico
Aplicación de la plataforma moodle para mejorar el rendimiento academmicoAplicación de la plataforma moodle para mejorar el rendimiento academmico
Aplicación de la plataforma moodle para mejorar el rendimiento academmicoSandro Santiago A
 
Introduction au web sémantique
Introduction au web sémantiqueIntroduction au web sémantique
Introduction au web sémantiqueStéphane Traumat
 
Planificación modelo TPACK
Planificación modelo TPACKPlanificación modelo TPACK
Planificación modelo TPACKlucecita1
 
#NSD14 - La sécurité et l'Internet des objets
#NSD14 - La sécurité et l'Internet des objets#NSD14 - La sécurité et l'Internet des objets
#NSD14 - La sécurité et l'Internet des objetsNetSecure Day
 
Eva en la escuela tipos, modelo didáctico y rol del docente
Eva en la escuela tipos, modelo didáctico y rol del docenteEva en la escuela tipos, modelo didáctico y rol del docente
Eva en la escuela tipos, modelo didáctico y rol del docentekarenhdezaguirre
 
Inclusión digital
Inclusión digitalInclusión digital
Inclusión digitalaliciagr96
 

Mais procurados (8)

Aplicación de la plataforma moodle para mejorar el rendimiento academmico
Aplicación de la plataforma moodle para mejorar el rendimiento academmicoAplicación de la plataforma moodle para mejorar el rendimiento academmico
Aplicación de la plataforma moodle para mejorar el rendimiento academmico
 
Introduction au web sémantique
Introduction au web sémantiqueIntroduction au web sémantique
Introduction au web sémantique
 
Taller de tesis i
Taller de tesis iTaller de tesis i
Taller de tesis i
 
Planificación modelo TPACK
Planificación modelo TPACKPlanificación modelo TPACK
Planificación modelo TPACK
 
#NSD14 - La sécurité et l'Internet des objets
#NSD14 - La sécurité et l'Internet des objets#NSD14 - La sécurité et l'Internet des objets
#NSD14 - La sécurité et l'Internet des objets
 
Canvas ole
Canvas oleCanvas ole
Canvas ole
 
Eva en la escuela tipos, modelo didáctico y rol del docente
Eva en la escuela tipos, modelo didáctico y rol del docenteEva en la escuela tipos, modelo didáctico y rol del docente
Eva en la escuela tipos, modelo didáctico y rol del docente
 
Inclusión digital
Inclusión digitalInclusión digital
Inclusión digital
 

Semelhante a P-Plan

Software tools for high-throughput materials data generation and data mining
Software tools for high-throughput materials data generation and data miningSoftware tools for high-throughput materials data generation and data mining
Software tools for high-throughput materials data generation and data miningAnubhav Jain
 
Semantic Representation of Provenance in Wikipedia
Semantic Representation of Provenance in WikipediaSemantic Representation of Provenance in Wikipedia
Semantic Representation of Provenance in WikipediaFabrizio Orlandi
 
VIVO 2013 Topic Modeling Entity Extraction
VIVO 2013 Topic Modeling Entity ExtractionVIVO 2013 Topic Modeling Entity Extraction
VIVO 2013 Topic Modeling Entity ExtractionWilliam Gunn
 
Capturing Interactive Data Transformation Operations using Provenance Workflows
Capturing Interactive Data Transformation Operations using Provenance WorkflowsCapturing Interactive Data Transformation Operations using Provenance Workflows
Capturing Interactive Data Transformation Operations using Provenance WorkflowsAndre Freitas
 
Omitola o rian_eswc_idts final
Omitola o rian_eswc_idts finalOmitola o rian_eswc_idts final
Omitola o rian_eswc_idts finalTope Omitola
 
Towards Knowledge Graphs of Reusable Research Software Metadata
Towards Knowledge Graphs of Reusable Research Software MetadataTowards Knowledge Graphs of Reusable Research Software Metadata
Towards Knowledge Graphs of Reusable Research Software Metadatadgarijo
 
AI Based Student S Assignments Plagiarism Detector
AI Based Student S Assignments Plagiarism DetectorAI Based Student S Assignments Plagiarism Detector
AI Based Student S Assignments Plagiarism DetectorAsia Smith
 
The Research Object Initiative: Frameworks and Use Cases
The Research Object Initiative:Frameworks and Use CasesThe Research Object Initiative:Frameworks and Use Cases
The Research Object Initiative: Frameworks and Use CasesCarole Goble
 
Granatum_LiSIs_BIBE_2012_presentation_v4.0
Granatum_LiSIs_BIBE_2012_presentation_v4.0Granatum_LiSIs_BIBE_2012_presentation_v4.0
Granatum_LiSIs_BIBE_2012_presentation_v4.0Christos Kannas
 
How Can AI and IoT Power the Chemical Industry?
How Can AI and IoT Power the Chemical Industry?How Can AI and IoT Power the Chemical Industry?
How Can AI and IoT Power the Chemical Industry?Xiaonan Wang
 
FAIR Computational Workflows
FAIR Computational WorkflowsFAIR Computational Workflows
FAIR Computational WorkflowsCarole Goble
 
Keynote at-icpc-2020
Keynote at-icpc-2020Keynote at-icpc-2020
Keynote at-icpc-2020Ralf Laemmel
 
FAIR Computational Workflows
FAIR Computational WorkflowsFAIR Computational Workflows
FAIR Computational Workflows Carole Goble
 
Sci Know Mine 2013: What can we learn from topic modeling on 350M academic do...
Sci Know Mine 2013: What can we learn from topic modeling on 350M academic do...Sci Know Mine 2013: What can we learn from topic modeling on 350M academic do...
Sci Know Mine 2013: What can we learn from topic modeling on 350M academic do...William Gunn
 
Performance Analysis of Leading Application Lifecycle Management Systems for...
Performance Analysis of Leading Application Lifecycle  Management Systems for...Performance Analysis of Leading Application Lifecycle  Management Systems for...
Performance Analysis of Leading Application Lifecycle Management Systems for...Daniel van den Hoven
 
Algorithm for calculating relevance of documents in information retrieval sys...
Algorithm for calculating relevance of documents in information retrieval sys...Algorithm for calculating relevance of documents in information retrieval sys...
Algorithm for calculating relevance of documents in information retrieval sys...IRJET Journal
 
Linked Data: Een extra ontstluitingslaag op archieven
Linked Data: Een extra ontstluitingslaag op archieven Linked Data: Een extra ontstluitingslaag op archieven
Linked Data: Een extra ontstluitingslaag op archieven Richard Zijdeman
 
IRJET - Conversion of Unsupervised Data to Supervised Data using Topic Mo...
IRJET -  	  Conversion of Unsupervised Data to Supervised Data using Topic Mo...IRJET -  	  Conversion of Unsupervised Data to Supervised Data using Topic Mo...
IRJET - Conversion of Unsupervised Data to Supervised Data using Topic Mo...IRJET Journal
 
Open Chemistry: Realizing Open Data, Open Standards, and Open Source
Open Chemistry: Realizing Open Data, Open Standards, and Open SourceOpen Chemistry: Realizing Open Data, Open Standards, and Open Source
Open Chemistry: Realizing Open Data, Open Standards, and Open SourceMarcus Hanwell
 

Semelhante a P-Plan (20)

Software tools for high-throughput materials data generation and data mining
Software tools for high-throughput materials data generation and data miningSoftware tools for high-throughput materials data generation and data mining
Software tools for high-throughput materials data generation and data mining
 
Semantic Representation of Provenance in Wikipedia
Semantic Representation of Provenance in WikipediaSemantic Representation of Provenance in Wikipedia
Semantic Representation of Provenance in Wikipedia
 
VIVO 2013 Topic Modeling Entity Extraction
VIVO 2013 Topic Modeling Entity ExtractionVIVO 2013 Topic Modeling Entity Extraction
VIVO 2013 Topic Modeling Entity Extraction
 
Capturing Interactive Data Transformation Operations using Provenance Workflows
Capturing Interactive Data Transformation Operations using Provenance WorkflowsCapturing Interactive Data Transformation Operations using Provenance Workflows
Capturing Interactive Data Transformation Operations using Provenance Workflows
 
Omitola o rian_eswc_idts final
Omitola o rian_eswc_idts finalOmitola o rian_eswc_idts final
Omitola o rian_eswc_idts final
 
Towards Knowledge Graphs of Reusable Research Software Metadata
Towards Knowledge Graphs of Reusable Research Software MetadataTowards Knowledge Graphs of Reusable Research Software Metadata
Towards Knowledge Graphs of Reusable Research Software Metadata
 
AI Based Student S Assignments Plagiarism Detector
AI Based Student S Assignments Plagiarism DetectorAI Based Student S Assignments Plagiarism Detector
AI Based Student S Assignments Plagiarism Detector
 
The Research Object Initiative: Frameworks and Use Cases
The Research Object Initiative:Frameworks and Use CasesThe Research Object Initiative:Frameworks and Use Cases
The Research Object Initiative: Frameworks and Use Cases
 
Pine education-platform
Pine education-platformPine education-platform
Pine education-platform
 
Granatum_LiSIs_BIBE_2012_presentation_v4.0
Granatum_LiSIs_BIBE_2012_presentation_v4.0Granatum_LiSIs_BIBE_2012_presentation_v4.0
Granatum_LiSIs_BIBE_2012_presentation_v4.0
 
How Can AI and IoT Power the Chemical Industry?
How Can AI and IoT Power the Chemical Industry?How Can AI and IoT Power the Chemical Industry?
How Can AI and IoT Power the Chemical Industry?
 
FAIR Computational Workflows
FAIR Computational WorkflowsFAIR Computational Workflows
FAIR Computational Workflows
 
Keynote at-icpc-2020
Keynote at-icpc-2020Keynote at-icpc-2020
Keynote at-icpc-2020
 
FAIR Computational Workflows
FAIR Computational WorkflowsFAIR Computational Workflows
FAIR Computational Workflows
 
Sci Know Mine 2013: What can we learn from topic modeling on 350M academic do...
Sci Know Mine 2013: What can we learn from topic modeling on 350M academic do...Sci Know Mine 2013: What can we learn from topic modeling on 350M academic do...
Sci Know Mine 2013: What can we learn from topic modeling on 350M academic do...
 
Performance Analysis of Leading Application Lifecycle Management Systems for...
Performance Analysis of Leading Application Lifecycle  Management Systems for...Performance Analysis of Leading Application Lifecycle  Management Systems for...
Performance Analysis of Leading Application Lifecycle Management Systems for...
 
Algorithm for calculating relevance of documents in information retrieval sys...
Algorithm for calculating relevance of documents in information retrieval sys...Algorithm for calculating relevance of documents in information retrieval sys...
Algorithm for calculating relevance of documents in information retrieval sys...
 
Linked Data: Een extra ontstluitingslaag op archieven
Linked Data: Een extra ontstluitingslaag op archieven Linked Data: Een extra ontstluitingslaag op archieven
Linked Data: Een extra ontstluitingslaag op archieven
 
IRJET - Conversion of Unsupervised Data to Supervised Data using Topic Mo...
IRJET -  	  Conversion of Unsupervised Data to Supervised Data using Topic Mo...IRJET -  	  Conversion of Unsupervised Data to Supervised Data using Topic Mo...
IRJET - Conversion of Unsupervised Data to Supervised Data using Topic Mo...
 
Open Chemistry: Realizing Open Data, Open Standards, and Open Source
Open Chemistry: Realizing Open Data, Open Standards, and Open SourceOpen Chemistry: Realizing Open Data, Open Standards, and Open Source
Open Chemistry: Realizing Open Data, Open Standards, and Open Source
 

Mais de dgarijo

FOOPS!: An Ontology Pitfall Scanner for the FAIR principles
FOOPS!: An Ontology Pitfall Scanner for the FAIR principlesFOOPS!: An Ontology Pitfall Scanner for the FAIR principles
FOOPS!: An Ontology Pitfall Scanner for the FAIR principlesdgarijo
 
FAIR Workflows: A step closer to the Scientific Paper of the Future
FAIR Workflows: A step closer to the Scientific Paper of the FutureFAIR Workflows: A step closer to the Scientific Paper of the Future
FAIR Workflows: A step closer to the Scientific Paper of the Futuredgarijo
 
Towards Reusable Research Software
Towards Reusable Research SoftwareTowards Reusable Research Software
Towards Reusable Research Softwaredgarijo
 
SOMEF: a metadata extraction framework from software documentation
SOMEF: a metadata extraction framework from software documentationSOMEF: a metadata extraction framework from software documentation
SOMEF: a metadata extraction framework from software documentationdgarijo
 
A Template-Based Approach for Annotating Long-Tailed Datasets
A Template-Based Approach for Annotating Long-Tailed DatasetsA Template-Based Approach for Annotating Long-Tailed Datasets
A Template-Based Approach for Annotating Long-Tailed Datasetsdgarijo
 
OBA: An Ontology-Based Framework for Creating REST APIs for Knowledge Graphs
OBA: An Ontology-Based Framework for Creating REST APIs for Knowledge GraphsOBA: An Ontology-Based Framework for Creating REST APIs for Knowledge Graphs
OBA: An Ontology-Based Framework for Creating REST APIs for Knowledge Graphsdgarijo
 
Scientific Software Registry Collaboration Workshop: From Software Metadata r...
Scientific Software Registry Collaboration Workshop: From Software Metadata r...Scientific Software Registry Collaboration Workshop: From Software Metadata r...
Scientific Software Registry Collaboration Workshop: From Software Metadata r...dgarijo
 
WDPlus: Leveraging Wikidata to Link and Extend Tabular Data
WDPlus: Leveraging Wikidata to Link and Extend Tabular DataWDPlus: Leveraging Wikidata to Link and Extend Tabular Data
WDPlus: Leveraging Wikidata to Link and Extend Tabular Datadgarijo
 
OKG-Soft: An Open Knowledge Graph With Mathine Readable Scientific Software M...
OKG-Soft: An Open Knowledge Graph With Mathine Readable Scientific Software M...OKG-Soft: An Open Knowledge Graph With Mathine Readable Scientific Software M...
OKG-Soft: An Open Knowledge Graph With Mathine Readable Scientific Software M...dgarijo
 
Towards Human-Guided Machine Learning - IUI 2019
Towards Human-Guided Machine Learning - IUI 2019Towards Human-Guided Machine Learning - IUI 2019
Towards Human-Guided Machine Learning - IUI 2019dgarijo
 
Capturing Context in Scientific Experiments: Towards Computer-Driven Science
Capturing Context in Scientific Experiments: Towards Computer-Driven ScienceCapturing Context in Scientific Experiments: Towards Computer-Driven Science
Capturing Context in Scientific Experiments: Towards Computer-Driven Sciencedgarijo
 
A Controlled Crowdsourcing Approach for Practical Ontology Extensions and Met...
A Controlled Crowdsourcing Approach for Practical Ontology Extensions and Met...A Controlled Crowdsourcing Approach for Practical Ontology Extensions and Met...
A Controlled Crowdsourcing Approach for Practical Ontology Extensions and Met...dgarijo
 
WIDOCO: A Wizard for Documenting Ontologies
WIDOCO: A Wizard for Documenting OntologiesWIDOCO: A Wizard for Documenting Ontologies
WIDOCO: A Wizard for Documenting Ontologiesdgarijo
 
Towards Automating Data Narratives
Towards Automating Data NarrativesTowards Automating Data Narratives
Towards Automating Data Narrativesdgarijo
 
Automated Hypothesis Testing with Large Scale Scientific Workflows
Automated Hypothesis Testing with Large Scale Scientific WorkflowsAutomated Hypothesis Testing with Large Scale Scientific Workflows
Automated Hypothesis Testing with Large Scale Scientific Workflowsdgarijo
 
OntoSoft: A Distributed Semantic Registry for Scientific Software
OntoSoft: A Distributed Semantic Registry for Scientific SoftwareOntoSoft: A Distributed Semantic Registry for Scientific Software
OntoSoft: A Distributed Semantic Registry for Scientific Softwaredgarijo
 
OEG tools for supporting Ontology Engineering
OEG tools for supporting Ontology EngineeringOEG tools for supporting Ontology Engineering
OEG tools for supporting Ontology Engineeringdgarijo
 
Software Metadata: Describing "dark software" in GeoSciences
Software Metadata: Describing "dark software" in GeoSciencesSoftware Metadata: Describing "dark software" in GeoSciences
Software Metadata: Describing "dark software" in GeoSciencesdgarijo
 
Reproducibility Using Semantics: An Overview
Reproducibility Using Semantics: An OverviewReproducibility Using Semantics: An Overview
Reproducibility Using Semantics: An Overviewdgarijo
 
PhD Thesis: Mining abstractions in scientific workflows
PhD Thesis: Mining abstractions in scientific workflowsPhD Thesis: Mining abstractions in scientific workflows
PhD Thesis: Mining abstractions in scientific workflowsdgarijo
 

Mais de dgarijo (20)

FOOPS!: An Ontology Pitfall Scanner for the FAIR principles
FOOPS!: An Ontology Pitfall Scanner for the FAIR principlesFOOPS!: An Ontology Pitfall Scanner for the FAIR principles
FOOPS!: An Ontology Pitfall Scanner for the FAIR principles
 
FAIR Workflows: A step closer to the Scientific Paper of the Future
FAIR Workflows: A step closer to the Scientific Paper of the FutureFAIR Workflows: A step closer to the Scientific Paper of the Future
FAIR Workflows: A step closer to the Scientific Paper of the Future
 
Towards Reusable Research Software
Towards Reusable Research SoftwareTowards Reusable Research Software
Towards Reusable Research Software
 
SOMEF: a metadata extraction framework from software documentation
SOMEF: a metadata extraction framework from software documentationSOMEF: a metadata extraction framework from software documentation
SOMEF: a metadata extraction framework from software documentation
 
A Template-Based Approach for Annotating Long-Tailed Datasets
A Template-Based Approach for Annotating Long-Tailed DatasetsA Template-Based Approach for Annotating Long-Tailed Datasets
A Template-Based Approach for Annotating Long-Tailed Datasets
 
OBA: An Ontology-Based Framework for Creating REST APIs for Knowledge Graphs
OBA: An Ontology-Based Framework for Creating REST APIs for Knowledge GraphsOBA: An Ontology-Based Framework for Creating REST APIs for Knowledge Graphs
OBA: An Ontology-Based Framework for Creating REST APIs for Knowledge Graphs
 
Scientific Software Registry Collaboration Workshop: From Software Metadata r...
Scientific Software Registry Collaboration Workshop: From Software Metadata r...Scientific Software Registry Collaboration Workshop: From Software Metadata r...
Scientific Software Registry Collaboration Workshop: From Software Metadata r...
 
WDPlus: Leveraging Wikidata to Link and Extend Tabular Data
WDPlus: Leveraging Wikidata to Link and Extend Tabular DataWDPlus: Leveraging Wikidata to Link and Extend Tabular Data
WDPlus: Leveraging Wikidata to Link and Extend Tabular Data
 
OKG-Soft: An Open Knowledge Graph With Mathine Readable Scientific Software M...
OKG-Soft: An Open Knowledge Graph With Mathine Readable Scientific Software M...OKG-Soft: An Open Knowledge Graph With Mathine Readable Scientific Software M...
OKG-Soft: An Open Knowledge Graph With Mathine Readable Scientific Software M...
 
Towards Human-Guided Machine Learning - IUI 2019
Towards Human-Guided Machine Learning - IUI 2019Towards Human-Guided Machine Learning - IUI 2019
Towards Human-Guided Machine Learning - IUI 2019
 
Capturing Context in Scientific Experiments: Towards Computer-Driven Science
Capturing Context in Scientific Experiments: Towards Computer-Driven ScienceCapturing Context in Scientific Experiments: Towards Computer-Driven Science
Capturing Context in Scientific Experiments: Towards Computer-Driven Science
 
A Controlled Crowdsourcing Approach for Practical Ontology Extensions and Met...
A Controlled Crowdsourcing Approach for Practical Ontology Extensions and Met...A Controlled Crowdsourcing Approach for Practical Ontology Extensions and Met...
A Controlled Crowdsourcing Approach for Practical Ontology Extensions and Met...
 
WIDOCO: A Wizard for Documenting Ontologies
WIDOCO: A Wizard for Documenting OntologiesWIDOCO: A Wizard for Documenting Ontologies
WIDOCO: A Wizard for Documenting Ontologies
 
Towards Automating Data Narratives
Towards Automating Data NarrativesTowards Automating Data Narratives
Towards Automating Data Narratives
 
Automated Hypothesis Testing with Large Scale Scientific Workflows
Automated Hypothesis Testing with Large Scale Scientific WorkflowsAutomated Hypothesis Testing with Large Scale Scientific Workflows
Automated Hypothesis Testing with Large Scale Scientific Workflows
 
OntoSoft: A Distributed Semantic Registry for Scientific Software
OntoSoft: A Distributed Semantic Registry for Scientific SoftwareOntoSoft: A Distributed Semantic Registry for Scientific Software
OntoSoft: A Distributed Semantic Registry for Scientific Software
 
OEG tools for supporting Ontology Engineering
OEG tools for supporting Ontology EngineeringOEG tools for supporting Ontology Engineering
OEG tools for supporting Ontology Engineering
 
Software Metadata: Describing "dark software" in GeoSciences
Software Metadata: Describing "dark software" in GeoSciencesSoftware Metadata: Describing "dark software" in GeoSciences
Software Metadata: Describing "dark software" in GeoSciences
 
Reproducibility Using Semantics: An Overview
Reproducibility Using Semantics: An OverviewReproducibility Using Semantics: An Overview
Reproducibility Using Semantics: An Overview
 
PhD Thesis: Mining abstractions in scientific workflows
PhD Thesis: Mining abstractions in scientific workflowsPhD Thesis: Mining abstractions in scientific workflows
PhD Thesis: Mining abstractions in scientific workflows
 

P-Plan

  • 1. Augmenting PROV with Plans in P-PLAN: Scientific Processes as Linked Data Daniel Garijo Yolanda Gil OEG-DIA Information Sciences Institute and Facultad de Informática Department of Computer Science Universidad Politécnica de Madrid University of Southern California dgarijo@delicias.dia.fi.upm.es http://www.isi.edu/~gil USC Information Sciences Yolanda Gil gil@isi.edu 1
  • 2. W3C PROV: http://www.w3.org/2011/prov/
  • 3. A Workflow Execution in PROV
      Benefits:
        • Makes the work inspectable
      Shortcomings:
        • Hard to reproduce
        • Not efficient to reuse
  • 4. Reproducibility
  • 5. Replication of Crohn’s Disease Association Study from [Duerr et al, Science 06]
  • 6. Replication of Early-Onset Parkinson’s Disease Study from [Bayrakli et al, Human Mutation 07]
  • 7. Reusability
      Lower cost
        • “Scientists and engineers spend more than 60% of their time just preparing the data for model input or data-model comparison” (NASA A40)
      Better quality
        • “We write QC without thinking about the best way to do the WC. Such approaches perpetuate mediocrity. If someone did it right once, it would benefit many people.” (EC WF CQ)
      More efficient
        • “I often see that I’m repeating the work that 100 other people have been doing to obtain and process the data.” (EC WF CQ)
  • 8. Access to Data Analytics Expertise [Science 2011]
  • 9. The TB-Drugome [Kinnings et al., PLoS CompBio 2010]
        • “We report a computational approach to construct a drug-target network… applied to the genome of tuberculosis…”
        • “The TB-drugome reveals that approximately one-third of the drugs examined have the potential to… treat tuberculosis…”
        • “The methodology can be applied to other pathogens of interest …”
  • 10. Executable and Abstract Workflow: what I actually run (executable) and the method that I followed (abstract)
  • 11. The Ontology for Biomedical Investigations: http://obi-ontology.org/
  • 12. Semantic Web Applications in Neuromedicine (SWAN) Ontology: http://www.w3.org/TR/hcls-swan/
  • 13. Research Objects: http://www.wf4ever-project.org/research-object-model
  • 14. Executable and Abstract Workflow: what I actually run (executable) and the method that I followed (abstract)
  • 15. Semantic Workflows in Wings [Gil et al 10][Gil et al 09][Kim & Gil et al 08][Kim et al 06]
      Workflows are augmented with semantic constraints
        • Each workflow constituent has a variable associated with it
            – Workflow components, arguments, datasets
        • Constraints are used to restrict workflow variables
        • Can define abstract classes of components
            – Concrete components model executable codes
      Workflow reasoners propagate and use semantic constraints
      Uses semantic web standards: OWL/RDF, SPARQL, rules
  • 16. Ontologies for Data and Workflow Components
      [Figure: ontology diagram. Node labels include Documents, Language, Correlation Scoring, ChiSq, InfoGain, MutInfo, htmlDoc, latexDoc, Modeler, Model, DecTree, Linear Regression, SVM, Feature Vector, C4.5, J48, WSJ-2010, MatLab_LR, R_LR, Weka-C4.5]
  • 17. Semantic Workflows: Abstractions Based on Ontologies [Gil et al 2011]
      [Figure: TF-IDF code shown with the abstraction Term Weighting; Chi Squared code shown with the abstraction Correlation Scoring]
  • 18. Publishing Workflows on the Web with OPMW: http://www.opmw.org
      Extension of the Open Provenance Model
      Red: OPM model. Black: OPMW profile (extension)
      [Figure: OPMW diagram linking a Workflow Template to Workflow Execution Results. Node labels include Artifact, Process, Agent, Account, OPM Graph, artifact templates, process templates, abstract and specific components, and execution artifacts, nodes and accounts. Edge labels include used, wasGeneratedBy, wasControlledBy, account, subClassOf, hasArtifact, hasProcess, hasArtifactTemplate, hasProcessTemplate, hasWorkflowTemplate, hasAbstractComponent, hasSpecificComponent]
  • 19. Published as Linked Data: Executed Workflow + Abstract Workflow + Data + Steps + Codes…
  • 20. P-PLAN: Extending PROV to represent plans
      Plan representations can be very complex
        • Iteration, conditionals, decomposition, etc.
      P-PLAN is a core representation with only:
        • Sequences of steps
        • Parallel steps
      P-PLAN, like PROV, is a DAG
        • Simplest representation of plans
  • 21. P-Plan
  • 22. Queries about Workflows Published as Linked Data
      Find all abstract workflows (?plan) in which a given entity (?entity) has been used when executing them:
        SELECT DISTINCT ?plan WHERE {
          ?entity a p-plan:Entity, prov:Entity ;
                  p-plan:correspondsTo ?templVariable .
          ?templVariable a p-plan:Variable ;
                  p-plan:isVariableOfPlan ?plan .
        }
  • 23. Conclusions
      Linked data as a vehicle to publish science processes
        • Workflows, experiments, …
      Important to publish method, not just provenance
        • Reproducibility, efficiency, access to expertise
      W3C PROV useful to publish execution
      P-PLAN is an extension of PROV for publishing methods
        • Plan, step, variable
      P-PLAN is applicable beyond science
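
Slide 20 restricts plans to sequences of steps and parallel steps, which together form a DAG. The following is a minimal sketch of what that restriction buys, not part of P-PLAN itself: a plan is written as a mapping from each step to the steps that must precede it, and Python's standard `graphlib` groups the steps into stages that could run in parallel. The step names and the `execution_stages` helper are hypothetical and chosen only for illustration.

```python
from graphlib import TopologicalSorter

# Hypothetical plan (names invented for illustration): each step maps to
# the set of steps that must finish before it can start.
plan = {
    "load_documents": set(),
    "term_weighting": {"load_documents"},
    "correlation_scoring": {"load_documents"},
    "train_model": {"term_weighting", "correlation_scoring"},
}


def execution_stages(steps):
    """Group steps into ordered stages; steps within a stage are parallel."""
    sorter = TopologicalSorter(steps)
    sorter.prepare()  # raises graphlib.CycleError if the plan is not a DAG
    stages = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # all steps whose predecessors are done
        stages.append(ready)
        sorter.done(*ready)
    return stages


stages = execution_stages(plan)
for number, stage in enumerate(stages, start=1):
    print(number, stage)
```

Because iteration and conditionals are excluded, a plain topological ordering is enough to recover both the sequence and the parallelism of the plan; a cyclic plan is rejected when `prepare()` is called.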
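
The query on slide 22 follows two links: from an executed entity to the template variable it corresponds to, and from that variable to its plan. As a rough illustration of that traversal, the sketch below evaluates the same pattern over a small in-memory set of triples in plain Python. It is not a SPARQL engine; the `ex:` resources and the `plans_using` helper are invented for the example, while the property and class names are the ones that appear in the slide's query.

```python
RDF_TYPE = "rdf:type"

# Hypothetical data: (subject, predicate, object) triples. All ex: names
# are invented; the p-plan:/prov: terms are those used on slide 22.
triples = {
    ("ex:dataset1", RDF_TYPE, "p-plan:Entity"),
    ("ex:dataset1", RDF_TYPE, "prov:Entity"),
    ("ex:dataset1", "p-plan:correspondsTo", "ex:inputVar"),
    ("ex:inputVar", RDF_TYPE, "p-plan:Variable"),
    ("ex:inputVar", "p-plan:isVariableOfPlan", "ex:planA"),
    ("ex:dataset2", RDF_TYPE, "p-plan:Entity"),
    ("ex:dataset2", RDF_TYPE, "prov:Entity"),
    ("ex:dataset2", "p-plan:correspondsTo", "ex:otherVar"),
    ("ex:otherVar", RDF_TYPE, "p-plan:Variable"),
    ("ex:otherVar", "p-plan:isVariableOfPlan", "ex:planB"),
}


def plans_using(entity, graph):
    """Return the distinct plans whose variables the given entity corresponds to."""
    # ?entity a p-plan:Entity, prov:Entity
    if (entity, RDF_TYPE, "p-plan:Entity") not in graph:
        return set()
    if (entity, RDF_TYPE, "prov:Entity") not in graph:
        return set()
    # ?entity p-plan:correspondsTo ?templVariable . ?templVariable a p-plan:Variable
    variables = {
        o
        for s, p, o in graph
        if s == entity
        and p == "p-plan:correspondsTo"
        and (o, RDF_TYPE, "p-plan:Variable") in graph
    }
    # ?templVariable p-plan:isVariableOfPlan ?plan (a set gives DISTINCT)
    return {
        o
        for s, p, o in graph
        if s in variables and p == "p-plan:isVariableOfPlan"
    }


result = plans_using("ex:dataset1", triples)
print(result)
```

Against a real linked-data endpoint the SPARQL query itself would be sent instead; the sketch only makes the two-hop join explicit.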