SlideShare uma empresa Scribd logo
1 de 23
“looking back
            to gaze forwards”
            Janus
            Provenance

                         LOP: Linked Open Provenance
            (was: from Workfl
                          ows to Semantic Provenance
                                and Linked Open Data)
    Paolo Missier                       Jun Zhao                  Satya S. Sahoo
 Newcastle University                                               Amit Sheth
(University of Manchester)        University of Oxford, UK   Wright State University, USA




               Nesc workshop:
               Understanding Provenance and Linked Open Data


                             Edinburgh, March 30th, 2011
Baseline provenance of a workflow run
                        QTL →                                mmu:12575
                     Ensembl Genes
                                                                         v1     ...    vn                     w

    Ensembl Gene →                     Ensembl Gene →
      Uniprot Gene                       Entrez Gene
                                                                     path:mmu04012
                                                         exec
    Uniprot Gene →                     Entrez Gene →                     a1     ...    an           b1        ...      bm
      Kegg Gene                         Kegg Gene
                                                                                                    mmu:26416

                      merge gene IDs




                                                                path:mmu04010   y11                           ymn
                     Gene → Pathway
                                                                                              ...

                                                        path:mmu04010→derives_from→mmu:26416
                                                        path:mmu04012→derives_from→mmu:12575

        • The graph encodes all direct data dependency relations
        • Baseline query model: compute paths amongst sets of nodes
          • Transitive closure over data dependency relations

2
                                                                                 P.Missier - Provenance and Linked Data workshop
                                                                                 Edinburgh March 2011
Key ideas
    • Janus:
      – a semantic provenance model with domain-specific
        extensions
      – designed around the Taverna workflow model

    • From domain-agnostic provenance graphs
    • To domain-aware graphs through explicit
      annotations

    • From local provenance graphs and queries
      scoped to the graph
    • To
      – Graphs published as Linked Data
      – Queries extended into the Web of Data


3
                                                P.Missier - Provenance and Linked Data workshop
                                                Edinburgh March 2011
Reference user questions

                          QTL →
                       Ensembl Genes                      Q0: Find all intermediate and initial input
                                                          values that contribute to the computation of a
                                                          certain output value.
     Ensembl Gene →                      Ensembl Gene →
       Uniprot Gene                        Entrez Gene

                                                          Q1. Find all those genes within the input QTL
      Uniprot Gene →                     Entrez Gene →
                                                          region that are involved in a given KEGG
        Kegg Gene                         Kegg Gene       pathway.
                        merge gene IDs
                                                          Q2: Find all Uniprot-sourced genes

                       Gene → Pathway




    Q3: Find all Entrez genes that encode proteins involved in ATP binding (go:
    0005524).

    Q4: List relevant PubMed publications for the pathways listed in the result set.



4
                                                                                P.Missier - Provenance and Linked Data workshop
                                                                                Edinburgh March 2011
Query types and model capabilities

              Query formulation effort         Annotation requirements                 Query Scope

    Q0      - Requires knowledge of
              process structure and
              data values
                                               No annotations required            Single run graph or
                                                                                   Multi-run graphs
            - Graphical query
            constructor may be
            available
    Q1 Q2
                                            Requires domain annotations
            Use of domain terms                                                   Single run graph or
                                            on workflow tasks and on data
            facilitates query formulation                                          Multi-run graphs
                                            values

    Q3 Q4
            - Use of domain terms           - Requires domain annotations
              facilitates query               on workflow tasks and on
              formulation.                    data values
                                                                                    The Web of Data
            - Can be integrated with        - Relies on completeness of
            browsers for LoD sources        Linked Data Sources


5
                                                                   P.Missier - Provenance and Linked Data workshop
                                                                   Edinburgh March 2011
Query types and model capabilities

              Query formulation effort         Annotation requirements                 Query Scope

    Q0      - Requires knowledge of
              process structure and
              data values
                                               No annotations required            Single run graph or
                                                                                   Multi-run graphs
            - Graphical query
            constructor may be
            available
    Q1 Q2
                                            Requires domain annotations
            Use of domain terms                                                   Single run graph or
                                            on workflow tasks and on data
            facilitates query formulation                                          Multi-run graphs
                                            values

    Q3 Q4
            - Use of domain terms           - Requires domain annotations
              facilitates query               on workflow tasks and on
              formulation.                    data values
                                                                                    The Web of Data
            - Can be integrated with        - Relies on completeness of
            browsers for LoD sources        Linked Data Sources


5
                                                                   P.Missier - Provenance and Linked Data workshop
                                                                   Edinburgh March 2011
Janus structure




6
    P.Missier - Provenance and Linked Data workshop
    Edinburgh March 2011
Janus structure
                                                                 port
                                v1     v2     v3                value

    X1   X2    X3               X1    X2      X3
                                                                    port
         P             exec          P_inst
    Y1        Y2                Y1          Y2
                                                             processor
                    processor                                  exec
                      spec      w1          w2




6
                                                   P.Missier - Provenance and Linked Data workshop
                                                   Edinburgh March 2011
Annotated provenance

      Annotated workflow                                                Annotated provenance graph


                        QTL →                                             Gene
                     Ensembl Genes




    Ensembl Gene →                     Ensembl Gene →
      Uniprot Gene                       Entrez Gene



                                                          exec                                           ...
    Uniprot Gene →                     Entrez Gene →
      Kegg Gene                         Kegg Gene



                      merge gene IDs



                                                                 Kegg
                     Gene → Pathway




                                                        Q1 Q2


7
                                                                            P.Missier - Provenance and Linked Data workshop
                                                                            Edinburgh March 2011
Desiderata: from structure to values annotations




8
                                 P.Missier - Provenance and Linked Data workshop
                                 Edinburgh March 2011
Desiderata: from structure to values annotations




8
                                 P.Missier - Provenance and Linked Data workshop
                                 Edinburgh March 2011
Desiderata: from structure to values annotations

                                                                 protein
                                                                sequence

                                X1    X2      X3
                                                                    interpro
                                      P                              match
                                                                     report
                                Y1         Y2

                                                         processor
                              interpro                     spec
                                scan




8
                                  P.Missier - Provenance and Linked Data workshop
                                  Edinburgh March 2011
Desiderata: from structure to values annotations

                                                                                        protein
                                                                                       sequence

                                                       X1     X2     X3
                                                                                           interpro
                                                              P                             match
                                                                                            report
                                                       Y1          Y2

                                                                                processor
                                                     interpro                     spec
                                                       scan


                   protein
                  sequence
                                            port
                      v1     v2     v3     value

                      X1
                                                                                 exec
                             X2     X3
                                            port
                           P_inst

    interpro          Y1          Y2
                                         processor          interpro
      scan                                 exec
                      w1          w2                         match
8                                                            report
                                                         P.Missier - Provenance and Linked Data workshop
                                                         Edinburgh March 2011
Annotations propagation rules
                 X rdf:type Port    C = {c}    X has value type c
                      X has value v    v rdf:type PortValue
                                  v rdf:type C               denotes
                                                                                   data type
                                                                                   in the PL
                          protein                                                    sense
       has_value_type    sequence

      X1   X2    X3
                            interpro
           P                 match
                             report
      Y1        Y2

                      processor
    interpro            spec
      scan




9
                                                   P.Missier - Provenance and Linked Data workshop
                                                   Edinburgh March 2011
Annotations propagation rules
                 X rdf:type Port    C = {c}    X has value type c
                      X has value v    v rdf:type PortValue
                                  v rdf:type C               denotes
                                                                                              data type
                                                                                              in the PL
                          protein                                                               sense
       has_value_type    sequence

      X1   X2    X3
                            interpro
           P                 match
                             report
      Y1        Y2

                      processor
                                               ?
    interpro            spec                                           port
      scan                                v1       v2    v3           value

                                          X1       X2    X3
                                                                         port
                                               P_inst

                         interpro         Y1            Y2
                                                                  processor                     interpro
                           scan                                     exec
                                          w1            w2                                       match
9                                                                                                report
                                                              P.Missier - Provenance and Linked Data workshop
                                                              Edinburgh March 2011
Annotations propagation rules
                 X rdf:type Port    C = {c}    X has value type c
                      X has value v    v rdf:type PortValue
                                  v rdf:type C               denotes
                                                                                             data type
                                                                                             in the PL
                          protein                                                              sense
       has_value_type    sequence

      X1   X2    X3
                            interpro
           P                 match
                             report
      Y1        Y2
                                          protein
                      processor          sequence
    interpro            spec                                          port
      scan                                v1     v2     v3           value

                                          X1    X2      X3
                                                                        port
                                               P_inst

                         interpro         Y1          Y2
                                                                 processor                     interpro
                           scan                                    exec
                                          w1          w2                                        match
9                                                                                               report
                                                             P.Missier - Provenance and Linked Data workshop
                                                             Edinburgh March 2011
Annotations as semantic overlay

                                v1                                          w1




                                                X3



                                                            Y2
     Provenance graph          has_port_value                     has_port_value




                                                X2

                                                      P
     fragment




                                                            Y1
                                                X1
                                vn                                         wm




                      has-input-type                 Pathway                         has-output-type
                                                      search
                                                      service

                                          instance-of
               instance-of                                                            instance-of

       Gene                     v1                                           w1                           Pathway
                                                                                     has-source
                                                 X3


                  has-source
                               has_port_value                Y2    has_port_value
                                                 X2

                                                        P


                 instance-of
                                                                                   instance-of
                                                             Y1


       Kegg                                                                  wm
                                                 X1




                                vn                                                                               Kegg
              has-source                                                                has-source


10
                                                                              P.Missier - Provenance and Linked Data workshop
                                                                              Edinburgh March 2011
Extensions to Linked Data

       Annotated workflow                                           Annotated provenance graph


                         QTL →
                      Ensembl Genes




     Ensembl Gene →                     Ensembl Gene →
       Uniprot Gene                       Entrez Gene
                                                         exec                                       ...
     Uniprot Gene →                     Entrez Gene →
       Kegg Gene                         Kegg Gene



                       merge gene IDs




                      Gene → Pathway




11
                                                                        P.Missier - Provenance and Linked Data workshop
                                                                        Edinburgh March 2011
Extensions to Linked Data

       Annotated workflow                                           Annotated provenance graph


                         QTL →
                      Ensembl Genes




     Ensembl Gene →                     Ensembl Gene →
       Uniprot Gene                       Entrez Gene
                                                         exec                                       ...
     Uniprot Gene →                     Entrez Gene →
       Kegg Gene                         Kegg Gene



                       merge gene IDs




                      Gene → Pathway

                                                                                   - Publish
                                                                                   - I - Map IDs
                                                                                   - II - query



11
                                                                        P.Missier - Provenance and Linked Data workshop
                                                                        Edinburgh March 2011
I - Mapping data values to LoD URIs
     In our prototype we map data values to Bio2RDF as follows:
                                                                                      Entrez Genes


                                                                                      Uniprot Genes


                                                                                      KEGG Genes


                                                                                      KEGG Pathways


     <rdf:Description rdf:about="http://purl.org/net/taverna/janus/create_report/entrezGeneId">
       <janus:has_value_binding rdf:resource="http://purl.org/net/taverna/janus/test18"/>

     <rdf:Description rdf:about="http://purl.org/net/taverna/janus/test18">
       <rdf:type rdf:resource="http://purl.org/net/taverna/janus#port_value"/>
       <rdfs:comment>11835</rdfs:comment>
       <rdf:type rdf:resource="http://purl.org/obo/owl/sequence#gene"/>
       <janus:has_source rdf:resource="http://purl.org/net/taverna/janus#entrez_gene"/>

     <rdf:Description rdf:about="http://purl.org/net/taverna/janus/test18">
       <rdfs:seeAlso rdf:resource="http://bio2rdf.org/geneid:11835"/>
12
                                                                       P.Missier - Provenance and Linked Data workshop
                                                                       Edinburgh March 2011
II - extended queries in the LoD setting
     Strategy:
     - use the SQUIN LoD query engine to query multiple “Web of Data” sources
         - only Bio2RDF in our case
     - combine graph patterns on local provenance with conditions on remote LoD
       graphs


      Q5: Find all Entrez genes that encode proteins involved in ATP binding
                                  (GO:0005524).

     PREFIX uniprot: <http://purl.uniprot.org/core/>
     PREFIX : <http://www.taverna.org.uk/janus#>
     SELECT distinct ?entrezgene
     WHERE {
     ?protein uniprot:classifiedWith <http://bio2rdf.org/go:0005524> .                        Bio2RDF
     ?entrezgene <http://bio2rdf.org/bio2rdf_resource:xPath> ?protein .
     ?gene rdfs:seeAlso ?entrezgene
     ?gene rdf:type :port_gene
     ?gene :has_source :entrez_gene . }
                                                      local provenance graph

13
                                                               P.Missier - Provenance and Linked Data workshop
                                                               Edinburgh March 2011
Current status
         Current Taverna provenance architecture:
             Lab prototype                                  Production
         –   “Export as...” Janus RDF                     – “native” (relational) graphs
         –   currently only queried using                 – simple, efficient query language on
             SPARQL                                         native provenance
e
         –   manually published
         –   manually annotated
                                                                                                                 Taverna
                                       Taverna             <<Events stream>>                                  Provenance
                                       runtime                                 events capture



                               query   <scope ... />            Lineage
                                       <select .../>                                            relational
                                                                 query
                                       <focus .../>
                                                               processor                           DB
                            query response
                               OPM graph

                            complete Janus                        RDF
                                     graph                      exporter

                                                       Provenance
                                                       API (Java)
    14
                                                                               P.Missier - Provenance and Linked Data workshop
                                                                               Edinburgh March 2011
Summary and References
     • Janus: a semantic model for workflow provenance
        – OWL ontology, extension of Provenir
        – should include attribution + system level provenance
        – alignment with OPM?

     • Domain-aware graphs through annotations:
        – automatically propagated from workflow annotations when possible
        – but in practice no real workflows are annotated

     • LoD integration:
        – powerful provenance publishing and query broadening
        – mapping rules currently limited
        – no completeness guarantee -- all joins are outer joins!


      P. Missier, S.S. Sahoo, J. Zhao, A. Sheth, and C. Goble, "Janus: from Workflows to Semantic
         Provenance and Linked Open Data," Procs. IPAW 2010, Troy, NY: 2010

      S.S. Sahoo, J. Zhao, and P. Missier, "Extending Semantic provenance into the Web of Data,"
      Internet Computing, special issue on Provenance in Web Applications, vol. to appear, 2011.
15
                                                                          P.Missier - Provenance and Linked Data workshop
                                                                          Edinburgh March 2011

Mais conteúdo relacionado

Destaque

Claudia Bauzer Medeiros Digital preservation – caring for our data to foster...
Claudia Bauzer Medeiros  Digital preservation – caring for our data to foster...Claudia Bauzer Medeiros  Digital preservation – caring for our data to foster...
Claudia Bauzer Medeiros Digital preservation – caring for our data to foster...Beniamino Murgante
 
Efficient, Scalable, and Provenance-Aware Management of Linked Data
Efficient, Scalable, and Provenance-Aware Management of Linked DataEfficient, Scalable, and Provenance-Aware Management of Linked Data
Efficient, Scalable, and Provenance-Aware Management of Linked DataeXascale Infolab
 
What may I do with your data? What do I have to do with your data? Policie...
What may I do with your data? What do I have to do with your data? Policie...What may I do with your data? What do I have to do with your data? Policie...
What may I do with your data? What do I have to do with your data? Policie...Steffen Staab
 
Data and Workflow Provenance: (From Theory to Hands-­‐on Practice)^­‐1
Data and Workflow Provenance: (From Theory to Hands-­‐on Practice)^­‐1Data and Workflow Provenance: (From Theory to Hands-­‐on Practice)^­‐1
Data and Workflow Provenance: (From Theory to Hands-­‐on Practice)^­‐1Bertram Ludäscher
 
Tracing the Origins of Data and Ideas - Provenance Visualization for Biomedic...
Tracing the Origins of Data and Ideas - Provenance Visualization for Biomedic...Tracing the Origins of Data and Ideas - Provenance Visualization for Biomedic...
Tracing the Origins of Data and Ideas - Provenance Visualization for Biomedic...Nils Gehlenborg
 
Provenance Analysis and RDF Query Processing: W3C PROV for Data Quality and T...
Provenance Analysis and RDF Query Processing: W3C PROV for Data Quality and T...Provenance Analysis and RDF Query Processing: W3C PROV for Data Quality and T...
Provenance Analysis and RDF Query Processing: W3C PROV for Data Quality and T...satyasanket
 
The Provenance of Madame Bonnier: Museum Linked Data
The Provenance of Madame Bonnier: Museum Linked DataThe Provenance of Madame Bonnier: Museum Linked Data
The Provenance of Madame Bonnier: Museum Linked DataAmerican Art Collaborative
 
Provenance and Reuse of Open Data (PILOD 2.0 June 2014)
Provenance and Reuse of Open Data (PILOD 2.0 June 2014)Provenance and Reuse of Open Data (PILOD 2.0 June 2014)
Provenance and Reuse of Open Data (PILOD 2.0 June 2014)Rinke Hoekstra
 
Family tree of data – provenance and neo4j
Family tree of data – provenance and neo4jFamily tree of data – provenance and neo4j
Family tree of data – provenance and neo4jM. David Allen
 
Keystone summer school 2015 paolo-missier-provenance
Keystone summer school 2015 paolo-missier-provenanceKeystone summer school 2015 paolo-missier-provenance
Keystone summer school 2015 paolo-missier-provenancePaolo Missier
 

Destaque (10)

Claudia Bauzer Medeiros Digital preservation – caring for our data to foster...
Claudia Bauzer Medeiros  Digital preservation – caring for our data to foster...Claudia Bauzer Medeiros  Digital preservation – caring for our data to foster...
Claudia Bauzer Medeiros Digital preservation – caring for our data to foster...
 
Efficient, Scalable, and Provenance-Aware Management of Linked Data
Efficient, Scalable, and Provenance-Aware Management of Linked DataEfficient, Scalable, and Provenance-Aware Management of Linked Data
Efficient, Scalable, and Provenance-Aware Management of Linked Data
 
What may I do with your data? What do I have to do with your data? Policie...
What may I do with your data? What do I have to do with your data? Policie...What may I do with your data? What do I have to do with your data? Policie...
What may I do with your data? What do I have to do with your data? Policie...
 
Data and Workflow Provenance: (From Theory to Hands-­‐on Practice)^­‐1
Data and Workflow Provenance: (From Theory to Hands-­‐on Practice)^­‐1Data and Workflow Provenance: (From Theory to Hands-­‐on Practice)^­‐1
Data and Workflow Provenance: (From Theory to Hands-­‐on Practice)^­‐1
 
Tracing the Origins of Data and Ideas - Provenance Visualization for Biomedic...
Tracing the Origins of Data and Ideas - Provenance Visualization for Biomedic...Tracing the Origins of Data and Ideas - Provenance Visualization for Biomedic...
Tracing the Origins of Data and Ideas - Provenance Visualization for Biomedic...
 
Provenance Analysis and RDF Query Processing: W3C PROV for Data Quality and T...
Provenance Analysis and RDF Query Processing: W3C PROV for Data Quality and T...Provenance Analysis and RDF Query Processing: W3C PROV for Data Quality and T...
Provenance Analysis and RDF Query Processing: W3C PROV for Data Quality and T...
 
The Provenance of Madame Bonnier: Museum Linked Data
The Provenance of Madame Bonnier: Museum Linked DataThe Provenance of Madame Bonnier: Museum Linked Data
The Provenance of Madame Bonnier: Museum Linked Data
 
Provenance and Reuse of Open Data (PILOD 2.0 June 2014)
Provenance and Reuse of Open Data (PILOD 2.0 June 2014)Provenance and Reuse of Open Data (PILOD 2.0 June 2014)
Provenance and Reuse of Open Data (PILOD 2.0 June 2014)
 
Family tree of data – provenance and neo4j
Family tree of data – provenance and neo4jFamily tree of data – provenance and neo4j
Family tree of data – provenance and neo4j
 
Keystone summer school 2015 paolo-missier-provenance
Keystone summer school 2015 paolo-missier-provenanceKeystone summer school 2015 paolo-missier-provenance
Keystone summer school 2015 paolo-missier-provenance
 

Semelhante a Nesc invited presentation: Semantic Provenance and Linked Open Data

Paper talk @ Ipaw 2010: Janus: from Workflows to Semantic Provenance and Link...
Paper talk @ Ipaw 2010: Janus: from Workflows to Semantic Provenance and Link...Paper talk @ Ipaw 2010: Janus: from Workflows to Semantic Provenance and Link...
Paper talk @ Ipaw 2010: Janus: from Workflows to Semantic Provenance and Link...Paolo Missier
 
論文サーベイ(Sasaki)
論文サーベイ(Sasaki)論文サーベイ(Sasaki)
論文サーベイ(Sasaki)Hajime Sasaki
 
Presentation on Machine Learning and Data Mining
Presentation on Machine Learning and Data MiningPresentation on Machine Learning and Data Mining
Presentation on Machine Learning and Data Miningbutest
 
240325_JW_labseminar[node2vec: Scalable Feature Learning for Networks].pptx
240325_JW_labseminar[node2vec: Scalable Feature Learning for Networks].pptx240325_JW_labseminar[node2vec: Scalable Feature Learning for Networks].pptx
240325_JW_labseminar[node2vec: Scalable Feature Learning for Networks].pptxthanhdowork
 
Internet data mining 2006
Internet data mining   2006Internet data mining   2006
Internet data mining 2006raj_vij
 
Deep learning notes.pptx
Deep learning notes.pptxDeep learning notes.pptx
Deep learning notes.pptxPandi Gingee
 
network mining and representation learning
network mining and representation learningnetwork mining and representation learning
network mining and representation learningsun peiyuan
 
Workflows, experimental findings, and their provenance: towards semantically ...
Workflows, experimental findings, and their provenance: towards semantically ...Workflows, experimental findings, and their provenance: towards semantically ...
Workflows, experimental findings, and their provenance: towards semantically ...Paolo Missier
 
Network Intrusion Detection (1)-converted-1.pptx
Network Intrusion Detection (1)-converted-1.pptxNetwork Intrusion Detection (1)-converted-1.pptx
Network Intrusion Detection (1)-converted-1.pptxSubhrajyotiPayra
 
Adam Margolin & Nicole DeFlaux Science Online London 2011-09-01
Adam Margolin & Nicole DeFlaux Science Online London 2011-09-01Adam Margolin & Nicole DeFlaux Science Online London 2011-09-01
Adam Margolin & Nicole DeFlaux Science Online London 2011-09-01Sage Base
 
Indexing data on the web a comparison of schema level indices for data search
Indexing data on the web a comparison of schema level indices for data searchIndexing data on the web a comparison of schema level indices for data search
Indexing data on the web a comparison of schema level indices for data searchTill Blume
 
Wi iat-bootstrapping the analysis of large-scale web service networks-v3
Wi iat-bootstrapping the analysis of large-scale web service networks-v3Wi iat-bootstrapping the analysis of large-scale web service networks-v3
Wi iat-bootstrapping the analysis of large-scale web service networks-v3Shahab Mokarizadeh
 
Machine reading for cancer biology
Machine reading for cancer biologyMachine reading for cancer biology
Machine reading for cancer biologyLaura Berry
 
NS-CUK Seminar: H.B.Kim, Review on "metapath2vec: Scalable representation le...
NS-CUK Seminar: H.B.Kim,  Review on "metapath2vec: Scalable representation le...NS-CUK Seminar: H.B.Kim,  Review on "metapath2vec: Scalable representation le...
NS-CUK Seminar: H.B.Kim, Review on "metapath2vec: Scalable representation le...ssuser4b1f48
 
RAMSES: Robust Analytic Models for Science at Extreme Scales
RAMSES: Robust Analytic Models for Science at Extreme ScalesRAMSES: Robust Analytic Models for Science at Extreme Scales
RAMSES: Robust Analytic Models for Science at Extreme ScalesIan Foster
 
IRJET- Artificial Neural Network: Overview
IRJET-  	  Artificial Neural Network: OverviewIRJET-  	  Artificial Neural Network: Overview
IRJET- Artificial Neural Network: OverviewIRJET Journal
 
Extracting and Making Use of Materials Data from Millions of Journal Articles...
Extracting and Making Use of Materials Data from Millions of Journal Articles...Extracting and Making Use of Materials Data from Millions of Journal Articles...
Extracting and Making Use of Materials Data from Millions of Journal Articles...Anubhav Jain
 
2011-06-08 Taverna workflow system
2011-06-08 Taverna workflow system2011-06-08 Taverna workflow system
2011-06-08 Taverna workflow systemStian Soiland-Reyes
 

Semelhante a Nesc invited presentation: Semantic Provenance and Linked Open Data (20)

Paper talk @ Ipaw 2010: Janus: from Workflows to Semantic Provenance and Link...
Paper talk @ Ipaw 2010: Janus: from Workflows to Semantic Provenance and Link...Paper talk @ Ipaw 2010: Janus: from Workflows to Semantic Provenance and Link...
Paper talk @ Ipaw 2010: Janus: from Workflows to Semantic Provenance and Link...
 
論文サーベイ(Sasaki)
論文サーベイ(Sasaki)論文サーベイ(Sasaki)
論文サーベイ(Sasaki)
 
Presentation on Machine Learning and Data Mining
Presentation on Machine Learning and Data MiningPresentation on Machine Learning and Data Mining
Presentation on Machine Learning and Data Mining
 
Java
JavaJava
Java
 
240325_JW_labseminar[node2vec: Scalable Feature Learning for Networks].pptx
240325_JW_labseminar[node2vec: Scalable Feature Learning for Networks].pptx240325_JW_labseminar[node2vec: Scalable Feature Learning for Networks].pptx
240325_JW_labseminar[node2vec: Scalable Feature Learning for Networks].pptx
 
Internet data mining 2006
Internet data mining   2006Internet data mining   2006
Internet data mining 2006
 
Deep learning notes.pptx
Deep learning notes.pptxDeep learning notes.pptx
Deep learning notes.pptx
 
network mining and representation learning
network mining and representation learningnetwork mining and representation learning
network mining and representation learning
 
Workflows, experimental findings, and their provenance: towards semantically ...
Workflows, experimental findings, and their provenance: towards semantically ...Workflows, experimental findings, and their provenance: towards semantically ...
Workflows, experimental findings, and their provenance: towards semantically ...
 
Network Intrusion Detection (1)-converted-1.pptx
Network Intrusion Detection (1)-converted-1.pptxNetwork Intrusion Detection (1)-converted-1.pptx
Network Intrusion Detection (1)-converted-1.pptx
 
Adam Margolin & Nicole DeFlaux Science Online London 2011-09-01
Adam Margolin & Nicole DeFlaux Science Online London 2011-09-01Adam Margolin & Nicole DeFlaux Science Online London 2011-09-01
Adam Margolin & Nicole DeFlaux Science Online London 2011-09-01
 
Indexing data on the web a comparison of schema level indices for data search
Indexing data on the web a comparison of schema level indices for data searchIndexing data on the web a comparison of schema level indices for data search
Indexing data on the web a comparison of schema level indices for data search
 
Wi iat-bootstrapping the analysis of large-scale web service networks-v3
Wi iat-bootstrapping the analysis of large-scale web service networks-v3Wi iat-bootstrapping the analysis of large-scale web service networks-v3
Wi iat-bootstrapping the analysis of large-scale web service networks-v3
 
Machine reading for cancer biology
Machine reading for cancer biologyMachine reading for cancer biology
Machine reading for cancer biology
 
NS-CUK Seminar: H.B.Kim, Review on "metapath2vec: Scalable representation le...
NS-CUK Seminar: H.B.Kim,  Review on "metapath2vec: Scalable representation le...NS-CUK Seminar: H.B.Kim,  Review on "metapath2vec: Scalable representation le...
NS-CUK Seminar: H.B.Kim, Review on "metapath2vec: Scalable representation le...
 
AI and Deep Learning
AI and Deep Learning AI and Deep Learning
AI and Deep Learning
 
RAMSES: Robust Analytic Models for Science at Extreme Scales
RAMSES: Robust Analytic Models for Science at Extreme ScalesRAMSES: Robust Analytic Models for Science at Extreme Scales
RAMSES: Robust Analytic Models for Science at Extreme Scales
 
IRJET- Artificial Neural Network: Overview
IRJET-  	  Artificial Neural Network: OverviewIRJET-  	  Artificial Neural Network: Overview
IRJET- Artificial Neural Network: Overview
 
Extracting and Making Use of Materials Data from Millions of Journal Articles...
Extracting and Making Use of Materials Data from Millions of Journal Articles...Extracting and Making Use of Materials Data from Millions of Journal Articles...
Extracting and Making Use of Materials Data from Millions of Journal Articles...
 
2011-06-08 Taverna workflow system
2011-06-08 Taverna workflow system2011-06-08 Taverna workflow system
2011-06-08 Taverna workflow system
 

Mais de Paolo Missier

Towards explanations for Data-Centric AI using provenance records
Towards explanations for Data-Centric AI using provenance recordsTowards explanations for Data-Centric AI using provenance records
Towards explanations for Data-Centric AI using provenance recordsPaolo Missier
 
Interpretable and robust hospital readmission predictions from Electronic Hea...
Interpretable and robust hospital readmission predictions from Electronic Hea...Interpretable and robust hospital readmission predictions from Electronic Hea...
Interpretable and robust hospital readmission predictions from Electronic Hea...Paolo Missier
 
Data-centric AI and the convergence of data and model engineering: opportunit...
Data-centric AI and the convergence of data and model engineering:opportunit...Data-centric AI and the convergence of data and model engineering:opportunit...
Data-centric AI and the convergence of data and model engineering: opportunit...Paolo Missier
 
Realising the potential of Health Data Science: opportunities and challenges ...
Realising the potential of Health Data Science:opportunities and challenges ...Realising the potential of Health Data Science:opportunities and challenges ...
Realising the potential of Health Data Science: opportunities and challenges ...Paolo Missier
 
Provenance Week 2023 talk on DP4DS (Data Provenance for Data Science)
Provenance Week 2023 talk on DP4DS (Data Provenance for Data Science)Provenance Week 2023 talk on DP4DS (Data Provenance for Data Science)
Provenance Week 2023 talk on DP4DS (Data Provenance for Data Science)Paolo Missier
 
A Data-centric perspective on Data-driven healthcare: a short overview
A Data-centric perspective on Data-driven healthcare: a short overviewA Data-centric perspective on Data-driven healthcare: a short overview
A Data-centric perspective on Data-driven healthcare: a short overviewPaolo Missier
 
Capturing and querying fine-grained provenance of preprocessing pipelines in ...
Capturing and querying fine-grained provenance of preprocessing pipelines in ...Capturing and querying fine-grained provenance of preprocessing pipelines in ...
Capturing and querying fine-grained provenance of preprocessing pipelines in ...Paolo Missier
 
Tracking trajectories of multiple long-term conditions using dynamic patient...
Tracking trajectories of  multiple long-term conditions using dynamic patient...Tracking trajectories of  multiple long-term conditions using dynamic patient...
Tracking trajectories of multiple long-term conditions using dynamic patient...Paolo Missier
 
Delivering on the promise of data-driven healthcare: trade-offs, challenges, ...
Delivering on the promise of data-driven healthcare: trade-offs, challenges, ...Delivering on the promise of data-driven healthcare: trade-offs, challenges, ...
Delivering on the promise of data-driven healthcare: trade-offs, challenges, ...Paolo Missier
 
Digital biomarkers for preventive personalised healthcare
Digital biomarkers for preventive personalised healthcareDigital biomarkers for preventive personalised healthcare
Digital biomarkers for preventive personalised healthcarePaolo Missier
 
Digital biomarkers for preventive personalised healthcare
Digital biomarkers for preventive personalised healthcareDigital biomarkers for preventive personalised healthcare
Digital biomarkers for preventive personalised healthcarePaolo Missier
 
Data Provenance for Data Science
Data Provenance for Data ScienceData Provenance for Data Science
Data Provenance for Data SciencePaolo Missier
 
Capturing and querying fine-grained provenance of preprocessing pipelines in ...
Capturing and querying fine-grained provenance of preprocessing pipelines in ...Capturing and querying fine-grained provenance of preprocessing pipelines in ...
Capturing and querying fine-grained provenance of preprocessing pipelines in ...Paolo Missier
 
Quo vadis, provenancer?  Cui prodest?  our own trajectory: provenance of data...
Quo vadis, provenancer? Cui prodest? our own trajectory: provenance of data...Quo vadis, provenancer? Cui prodest? our own trajectory: provenance of data...
Quo vadis, provenancer?  Cui prodest?  our own trajectory: provenance of data...Paolo Missier
 
Data Science for (Health) Science: tales from a challenging front line, and h...
Data Science for (Health) Science:tales from a challenging front line, and h...Data Science for (Health) Science:tales from a challenging front line, and h...
Data Science for (Health) Science: tales from a challenging front line, and h...Paolo Missier
 
Analytics of analytics pipelines: from optimising re-execution to general Dat...
Analytics of analytics pipelines:from optimising re-execution to general Dat...Analytics of analytics pipelines:from optimising re-execution to general Dat...
Analytics of analytics pipelines: from optimising re-execution to general Dat...Paolo Missier
 
ReComp: optimising the re-execution of analytics pipelines in response to cha...
ReComp: optimising the re-execution of analytics pipelines in response to cha...ReComp: optimising the re-execution of analytics pipelines in response to cha...
ReComp: optimising the re-execution of analytics pipelines in response to cha...Paolo Missier
 
ReComp, the complete story: an invited talk at Cardiff University
ReComp, the complete story:  an invited talk at Cardiff UniversityReComp, the complete story:  an invited talk at Cardiff University
ReComp, the complete story: an invited talk at Cardiff UniversityPaolo Missier
 
Efficient Re-computation of Big Data Analytics Processes in the Presence of C...
Efficient Re-computation of Big Data Analytics Processes in the Presence of C...Efficient Re-computation of Big Data Analytics Processes in the Presence of C...
Efficient Re-computation of Big Data Analytics Processes in the Presence of C...Paolo Missier
 
Decentralized, Trust-less Marketplace for Brokered IoT Data Trading using Blo...
Decentralized, Trust-less Marketplacefor Brokered IoT Data Tradingusing Blo...Decentralized, Trust-less Marketplacefor Brokered IoT Data Tradingusing Blo...
Decentralized, Trust-less Marketplace for Brokered IoT Data Trading using Blo...Paolo Missier
 

Mais de Paolo Missier (20)

Towards explanations for Data-Centric AI using provenance records
Towards explanations for Data-Centric AI using provenance recordsTowards explanations for Data-Centric AI using provenance records
Towards explanations for Data-Centric AI using provenance records
 
Interpretable and robust hospital readmission predictions from Electronic Hea...
Interpretable and robust hospital readmission predictions from Electronic Hea...Interpretable and robust hospital readmission predictions from Electronic Hea...
Interpretable and robust hospital readmission predictions from Electronic Hea...
 
Data-centric AI and the convergence of data and model engineering: opportunit...
Data-centric AI and the convergence of data and model engineering:opportunit...Data-centric AI and the convergence of data and model engineering:opportunit...
Data-centric AI and the convergence of data and model engineering: opportunit...
 
Realising the potential of Health Data Science: opportunities and challenges ...
Realising the potential of Health Data Science:opportunities and challenges ...Realising the potential of Health Data Science:opportunities and challenges ...
Realising the potential of Health Data Science: opportunities and challenges ...
 
Provenance Week 2023 talk on DP4DS (Data Provenance for Data Science)
Provenance Week 2023 talk on DP4DS (Data Provenance for Data Science)Provenance Week 2023 talk on DP4DS (Data Provenance for Data Science)
Provenance Week 2023 talk on DP4DS (Data Provenance for Data Science)
 
A Data-centric perspective on Data-driven healthcare: a short overview
A Data-centric perspective on Data-driven healthcare: a short overviewA Data-centric perspective on Data-driven healthcare: a short overview
A Data-centric perspective on Data-driven healthcare: a short overview
 
Capturing and querying fine-grained provenance of preprocessing pipelines in ...
Capturing and querying fine-grained provenance of preprocessing pipelines in ...Capturing and querying fine-grained provenance of preprocessing pipelines in ...
Capturing and querying fine-grained provenance of preprocessing pipelines in ...
 
Tracking trajectories of multiple long-term conditions using dynamic patient...
Tracking trajectories of  multiple long-term conditions using dynamic patient...Tracking trajectories of  multiple long-term conditions using dynamic patient...
Tracking trajectories of multiple long-term conditions using dynamic patient...
 
Delivering on the promise of data-driven healthcare: trade-offs, challenges, ...
Delivering on the promise of data-driven healthcare: trade-offs, challenges, ...Delivering on the promise of data-driven healthcare: trade-offs, challenges, ...
Delivering on the promise of data-driven healthcare: trade-offs, challenges, ...
 
Digital biomarkers for preventive personalised healthcare
Digital biomarkers for preventive personalised healthcareDigital biomarkers for preventive personalised healthcare
Digital biomarkers for preventive personalised healthcare
 
Digital biomarkers for preventive personalised healthcare
Digital biomarkers for preventive personalised healthcareDigital biomarkers for preventive personalised healthcare
Digital biomarkers for preventive personalised healthcare
 
Data Provenance for Data Science
Data Provenance for Data ScienceData Provenance for Data Science
Data Provenance for Data Science
 
Capturing and querying fine-grained provenance of preprocessing pipelines in ...
Capturing and querying fine-grained provenance of preprocessing pipelines in ...Capturing and querying fine-grained provenance of preprocessing pipelines in ...
Capturing and querying fine-grained provenance of preprocessing pipelines in ...
 
Quo vadis, provenancer?  Cui prodest?  our own trajectory: provenance of data...
Quo vadis, provenancer? Cui prodest? our own trajectory: provenance of data...Quo vadis, provenancer? Cui prodest? our own trajectory: provenance of data...
Quo vadis, provenancer?  Cui prodest?  our own trajectory: provenance of data...
 
Data Science for (Health) Science: tales from a challenging front line, and h...
Data Science for (Health) Science:tales from a challenging front line, and h...Data Science for (Health) Science:tales from a challenging front line, and h...
Data Science for (Health) Science: tales from a challenging front line, and h...
 
Analytics of analytics pipelines: from optimising re-execution to general Dat...
Analytics of analytics pipelines:from optimising re-execution to general Dat...Analytics of analytics pipelines:from optimising re-execution to general Dat...
Analytics of analytics pipelines: from optimising re-execution to general Dat...
 
ReComp: optimising the re-execution of analytics pipelines in response to cha...
ReComp: optimising the re-execution of analytics pipelines in response to cha...ReComp: optimising the re-execution of analytics pipelines in response to cha...
ReComp: optimising the re-execution of analytics pipelines in response to cha...
 
ReComp, the complete story: an invited talk at Cardiff University
ReComp, the complete story:  an invited talk at Cardiff UniversityReComp, the complete story:  an invited talk at Cardiff University
ReComp, the complete story: an invited talk at Cardiff University
 
Efficient Re-computation of Big Data Analytics Processes in the Presence of C...
Efficient Re-computation of Big Data Analytics Processes in the Presence of C...Efficient Re-computation of Big Data Analytics Processes in the Presence of C...
Efficient Re-computation of Big Data Analytics Processes in the Presence of C...
 
Decentralized, Trust-less Marketplace for Brokered IoT Data Trading using Blo...
Decentralized, Trust-less Marketplacefor Brokered IoT Data Tradingusing Blo...Decentralized, Trust-less Marketplacefor Brokered IoT Data Tradingusing Blo...
Decentralized, Trust-less Marketplace for Brokered IoT Data Trading using Blo...
 

Nesc invited presentation: Semantic Provenance and Linked Open Data

  • 1. “looking back to gaze forwards” Janus Provenance LOP: Linked Open Provenance (was: from Workfl ows to Semantic Provenance and Linked Open Data) Paolo Missier Jun Zhao Satya S. Sahoo Newcastle University Amit Sheth (University of Manchester) University of Oxford, UK Wright State University, USA Nesc workshop: Understanding Provenance and Linked Open Data Edinburgh, March 30th, 2011
  • 2. Baseline provenance of a workflow run QTL → mmu:12575 Ensembl Genes v1 ... vn w Ensembl Gene → Ensembl Gene → Uniprot Gene Entrez Gene path:mmu04012 exec Uniprot Gene → Entrez Gene → a1 ... an b1 ... bm Kegg Gene Kegg Gene mmu:26416 merge gene IDs path:mmu04010 y11 ymn Gene → Pathway ... path:mmu04010→derives_from→mmu:26416 path:mmu04012→derives_from→mmu:12575 • The graph encodes all direct data dependency relations • Baseline query model: compute paths amongst sets of nodes • Transitive closure over data dependency relations 2 P.Missier - Provenance and Linked Data workshop Edinburgh March 2011
  • 3. Key ideas • Janus: – a semantic provenance model with domain-specific extensions – designed around the Taverna workflow model • From domain-agnostic provenance graphs • To domain-aware graphs through explicit annotations • From local provenance graphs and queries scoped to the graph • To – Graphs published as Linked Data – Queries extended into the Web of Data 3 P.Missier - Provenance and Linked Data workshop Edinburgh March 2011
  • 4. Reference user questions QTL → Ensembl Genes Q0: Find all intermediate and initial input values that contribute to the computation of a certain output value. Ensembl Gene → Ensembl Gene → Uniprot Gene Entrez Gene Q1. Find all those genes within the input QTL Uniprot Gene → Entrez Gene → region that are involved in a given KEGG Kegg Gene Kegg Gene pathway. merge gene IDs Q2: Find all Uniprot-sourced genes Gene → Pathway Q3: Find all Entrez genes that encode proteins involved in ATP binding (go: 0005524). Q4: List relevant PubMed publications for the pathways listed in the result set. 4 P.Missier - Provenance and Linked Data workshop Edinburgh March 2011
  • 5. Query types and model capabilities Query formulation effort Annotation requirements Query Scope Q0 - Requires knowledge of process structure and data values No annotations required Single run graph or Multi-run graphs - Graphical query constructor may be available Q1 Q2 Requires domain annotations Use of domain terms Single run graph or on workflow tasks and on data facilitates query formulation Multi-run graphs values Q3 Q4 - Use of domain terms - Requires domain annotations facilitates query on workflow tasks and on formulation. data values The Web of Data - Can be integrated with - Relies on completeness of browsers for LoD sources Linked Data Sources 5 P.Missier - Provenance and Linked Data workshop Edinburgh March 2011
  • 6. Query types and model capabilities Query formulation effort Annotation requirements Query Scope Q0 - Requires knowledge of process structure and data values No annotations required Single run graph or Multi-run graphs - Graphical query constructor may be available Q1 Q2 Requires domain annotations Use of domain terms Single run graph or on workflow tasks and on data facilitates query formulation Multi-run graphs values Q3 Q4 - Use of domain terms - Requires domain annotations facilitates query on workflow tasks and on formulation. data values The Web of Data - Can be integrated with - Relies on completeness of browsers for LoD sources Linked Data Sources 5 P.Missier - Provenance and Linked Data workshop Edinburgh March 2011
  • 7. Janus structure 6 P.Missier - Provenance and Linked Data workshop Edinburgh March 2011
  • 8. Janus structure port v1 v2 v3 value X1 X2 X3 X1 X2 X3 port P exec P_inst Y1 Y2 Y1 Y2 processor processor exec spec w1 w2 6 P.Missier - Provenance and Linked Data workshop Edinburgh March 2011
  • 9. Annotated provenance Annotated workflow Annotated provenance graph QTL → Gene Ensembl Genes Ensembl Gene → Ensembl Gene → Uniprot Gene Entrez Gene exec ... Uniprot Gene → Entrez Gene → Kegg Gene Kegg Gene merge gene IDs Kegg Gene → Pathway Q1 Q2 7 P.Missier - Provenance and Linked Data workshop Edinburgh March 2011
  • 10. Desiderata: from structure to values annotations 8 P.Missier - Provenance and Linked Data workshop Edinburgh March 2011
  • 11. Desiderata: from structure to values annotations 8 P.Missier - Provenance and Linked Data workshop Edinburgh March 2011
  • 12. Desiderata: from structure to values annotations protein sequence X1 X2 X3 interpro P match report Y1 Y2 processor interpro spec scan 8 P.Missier - Provenance and Linked Data workshop Edinburgh March 2011
  • 13. Desiderata: from structure to values annotations protein sequence X1 X2 X3 interpro P match report Y1 Y2 processor interpro spec scan protein sequence port v1 v2 v3 value X1 exec X2 X3 port P_inst interpro Y1 Y2 processor interpro scan exec w1 w2 match 8 report P.Missier - Provenance and Linked Data workshop Edinburgh March 2011
  • 14. Annotations propagation rules X rdf:type Port C = {c} X has value type c X has value v v rdf:type PortValue v rdf:type C denotes data type in the PL protein sense has_value_type sequence X1 X2 X3 interpro P match report Y1 Y2 processor interpro spec scan 9 P.Missier - Provenance and Linked Data workshop Edinburgh March 2011
  • 15. Annotations propagation rules X rdf:type Port C = {c} X has value type c X has value v v rdf:type PortValue v rdf:type C denotes data type in the PL protein sense has_value_type sequence X1 X2 X3 interpro P match report Y1 Y2 processor ? interpro spec port scan v1 v2 v3 value X1 X2 X3 port P_inst interpro Y1 Y2 processor interpro scan exec w1 w2 match 9 report P.Missier - Provenance and Linked Data workshop Edinburgh March 2011
  • 16. Annotations propagation rules X rdf:type Port C = {c} X has value type c X has value v v rdf:type PortValue v rdf:type C denotes data type in the PL protein sense has_value_type sequence X1 X2 X3 interpro P match report Y1 Y2 protein processor sequence interpro spec port scan v1 v2 v3 value X1 X2 X3 port P_inst interpro Y1 Y2 processor interpro scan exec w1 w2 match 9 report P.Missier - Provenance and Linked Data workshop Edinburgh March 2011
  • 17. Annotations as semantic overlay v1 w1 X3 Y2 Provenance graph has_port_value has_port_value X2 P fragment Y1 X1 vn wm has-input-type Pathway has-output-type search service instance-of instance-of instance-of Gene v1 w1 Pathway has-source X3 has-source has_port_value Y2 has_port_value X2 P instance-of instance-of Y1 Kegg wm X1 vn Kegg has-source has-source 10 P.Missier - Provenance and Linked Data workshop Edinburgh March 2011
  • 18. Extensions to Linked Data Annotated workflow Annotated provenance graph QTL → Ensembl Genes Ensembl Gene → Ensembl Gene → Uniprot Gene Entrez Gene exec ... Uniprot Gene → Entrez Gene → Kegg Gene Kegg Gene merge gene IDs Gene → Pathway 11 P.Missier - Provenance and Linked Data workshop Edinburgh March 2011
  • 19. Extensions to Linked Data Annotated workflow Annotated provenance graph QTL → Ensembl Genes Ensembl Gene → Ensembl Gene → Uniprot Gene Entrez Gene exec ... Uniprot Gene → Entrez Gene → Kegg Gene Kegg Gene merge gene IDs Gene → Pathway - Publish - I - Map IDs - II - query 11 P.Missier - Provenance and Linked Data workshop Edinburgh March 2011
  • 20. I - Mapping data values to LoD URIs In our prototype we map data values to Bio2RDF as follows: Entrez Genes Uniprot Genes KEGG Genes KEGG Pathways <rdf:Description rdf:about="http://purl.org/net/taverna/janus/create_report/entrezGeneId"> <janus:has_value_binding rdf:resource="http://purl.org/net/taverna/janus/test18"/> <rdf:Description rdf:about="http://purl.org/net/taverna/janus/test18"> <rdf:type rdf:resource="http://purl.org/net/taverna/janus#port_value"/> <rdfs:comment>11835</rdfs:comment> <rdf:type rdf:resource="http://purl.org/obo/owl/sequence#gene"/> <janus:has_source rdf:resource="http://purl.org/net/taverna/janus#entrez_gene"/> <rdf:Description rdf:about="http://purl.org/net/taverna/janus/test18"> <rdfs:seeAlso rdf:resource="http://bio2rdf.org/geneid:11835"/> 12 P.Missier - Provenance and Linked Data workshop Edinburgh March 2011
  • 21. II - extended queries in the LoD setting Strategy: - use the SQUIN LoD query engine to query multiple “Web of Data” sources - only Bio2RDF in our case - combine graph patterns on local provenance with conditions on remote LoD graphs Q5: Find all Entrez genes that encode proteins involved in ATP binding (GO:0005524). PREFIX uniprot: <http://purl.uniprot.org/core/> PREFIX : <http://www.taverna.org.uk/janus#> SELECT distinct ?entrezgene WHERE { ?protein uniprot:classifiedWith <http://bio2rdf.org/go:0005524> . Bio2RDF ?entrezgene <http://bio2rdf.org/bio2rdf_resource:xPath> ?protein . ?gene rdfs:seeAlso ?entrezgene ?gene rdf:type :port_gene ?gene :has_source :entrez_gene . } local provenance graph 13 P.Missier - Provenance and Linked Data workshop Edinburgh March 2011
  • 22. Current status Current Taverna provenance architecture: Lab prototype Production – “Export as...” Janus RDF – “native” (relational) graphs – currently only queried using – simple, efficient query language on SPARQL native provenance e – manually published – manually annotated Taverna Taverna <<Events stream>> Provenance runtime events capture query <scope ... /> Lineage <select .../> relational query <focus .../> processor DB query response OPM graph complete Janus RDF graph exporter Provenance API (Java) 14 P.Missier - Provenance and Linked Data workshop Edinburgh March 2011
  • 23. Summary and References • Janus: a semantic model for workflow provenance – OWL ontology, extension of Provenir – should include attribution + system level provenance – alignment with OPM? • Domain-aware graphs through annotations: – automatically propagated from workflow annotations when possible – but in practice no real workflows are annotated • LoD integration: – powerful provenance publishing and query broadening – mapping rules currently limited – no completeness guarantee -- all joins are outer joins! P. Missier, S.S. Sahoo, J. Zhao, A. Sheth, and C. Goble, "Janus: from Workflows to Semantic Provenance and Linked Open Data," Procs. IPAW 2010, Troy, NY: 2010 S.S. Sahoo, J. Zhao, and P. Missier, "Extending Semantic provenance into the Web of Data," Internet Computing, special issue on Provenance in Web Applications, vol. to appear, 2011. 15 P.Missier - Provenance and Linked Data workshop Edinburgh March 2011

Notas do Editor

  1. \n
  2. \n
  3. \n
  4. On a typical QTL region with ~150k base pairs, one execution of this workflow finds about 50 Ensembl genes. \nThese could correspond to about 30 genes in the UniProt database, and 60 in the NCBI Etrez genes database. \nEach gene may be involved in a number of pathways, for example the mouse genes Mapk13 (mmu:26415) and \nCdkn1a (mmu:12575) participate to 13 and 9 pathways, respectively. \n\n
  5. \n
  6. \n
  7. \n
  8. \n
  9. \n
  10. \n
  11. \n
  12. \n
  13. \n
  14. comments\nhard for users to formulate a suitable provenance query on scenario I\ncan be done visually but still good understanding of the process and its relation to data is needed\n
  15. comments\nhard for users to formulate a suitable provenance query on scenario I\ncan be done visually but still good understanding of the process and its relation to data is needed\n
  16. comments\nhard for users to formulate a suitable provenance query on scenario I\ncan be done visually but still good understanding of the process and its relation to data is needed\n
  17. comments\nhard for users to formulate a suitable provenance query on scenario I\ncan be done visually but still good understanding of the process and its relation to data is needed\n
  18. \n
  19. annotations done manually for this implementation but could have been inherited from service annotations -- we show this in the next slide\n
  20. \n
  21. \n
  22. \n
  23. \n
  24. \n