SlideShare uma empresa Scribd logo
1 de 26
Work at ISI,
                 Current Status,
                   Next Steps

                       Daniel Garijo Verdejo,
                           Oscar Corcho,
                            Yolanda Gil


Ontology Engineering Group. Laboratorio de Inteligencia Artificial
            Departamento de Inteligencia Artificial
                   Facultad de Informática
              Universidad Politécnica de Madrid
                                                           Date: 29/11/2012
What am I working on?


•Creation of abstractions in scientific workflows
   •Workflow Traces and template representation
       •Provenance representation
       •Plan representation

   •Abstraction catalog

   •Find ways to link the definitions to the provenance traces automatically


•Understandability and reuse of scientific workflows




                                                                               2
Outline


Index

1.   Motivation
2.   Overview
3.   Workflow systems used
4.   Summary of work done in my previous visit to ISI
     •   OPMW and provenance publishing
5. Summary of work done before second visit to ISI
     •   Workflow motif catalog
6. Summary of work done in my second visit to ISI
     •   OPMW-PROV and P-PLAN
     •   Automatic macro abstraction detection
7. Next Steps
8. Future work

                                                            3
Motivation


•As a designer: Discovery

   •Workflows with similar functionality fragments/methods

   •Design based in previous templates.

•As user/reuser: Understandability

   •Search workflows by functionality

   •Commonalities between execution runs

   •Component categorization




                                                                    4
Overview

                                             Descriptions/
Abstraction definitions and categorization   PSMS/Ontologies


                                             Data mining tools,
                                             graph analysis, etc.
   Algorithms for finding the different
       abstractions automatically



                                             RDF Stores
         Experiment publication


                                             Vocabularies
   Provenance                 Plan
 representation          representation



                                                                    5
Taverna and Wings




                                                        http://www.taverna.org.uk/




http://www.wings-workflows.org/




                                                                                       6
                              IEEE eScience 2012. Chicago, USA
Summary: Previous Work at ISI


Abstractions definitions and categorization




   Algorithms for finding the different
       abstractions automatically



                                              Virtuoso,
         Experiment Publication               Pubby,
                                              Wings (+Plugin)

                                              OPMW
    Provenance                 Plan
  representation          representation



                                                                7
High level architecture


                                                                                    Other
                                                                                    workflow
                  WINGS on local laptop                                             environments
                         Workflow
          Core           Template        OPM
         Portal          Workflow       export
                         Instance
                                                               Programatic access
                                                                 (external apps)
                  WINGS on shared host
                         Workflow                  Linked
          Core           Template        OPM
         Portal                         export      Data
                         Workflow
                         Instance                Publication       Interactive
                  WINGS on web server
                                                                    Browsing
                         Workflow                               (Pubby frontend)
          Core           Template      OPM
         Portal                       export                                            Users
                        Workflow
                        Instance



Wings workflow                OPM
                                                 Publication      Share               Reuse
  generation                conversion




                                                                                                8
OPMW: Process view




               9
OPMW: Attribution view




                   10
Work previous to second visit to ISI


                                              Motif Detection
Abstractions definitions and categorization




   Algorithms for automatic matching




                                              Virtuoso,
         Experiment Publication               Pubby,
                                              Wings (+Plugin)

    Provenance                 Plan            OPMW
  representation          representation



                                                                 11
Overview



      • Empirical analysis on 177 workflow templates from Taverna and
      Wings


      • Catalog of recurring patterns: scientific
      workflow motifs.

               • Data Oriented Motifs

               • Workflow Oriented Motifs


      •Understandability and reuse

http://sensefinancial.com/wp-content/uploads/2012/02/contribution.jpg
                                                                                                  12
                                                          IEEE eScience 2012. Chicago, USA
Approach


•Reverse-engineer the set of current practices in workflow
development through an analysis of empirical evidence

•Identify workflow abstractions that would facilitate
understandability and therefore effective re-use




                                                                  13
                      IEEE eScience 2012. Chicago, USA
Workflow Motifs


•Workflow motif: Domain independent conceptual abstraction on the workflow
steps.
1. Data-oriented motifs: What kind of manipulations does the workflow have?

     •E.g.:
          •Data retrieval
          •Data preparation
          • etc.
                                                          WHAT?
2.   Workflow-oriented motifs: How does the workflow perform its operations?

     •E.g.:
          •Stateful steps
          •Stateless steps
          •Human interactions
          •etc.
                                                                 HOW?
                                                                                14
                              IEEE eScience 2012. Chicago, USA
Motif Catalog
Data-Oriented Motifs                   Workflow-Oriented Motifs

  Data Retrieval                            Intra-Workflow Motifs
  Data Preparation                                      Stateful (Asynchronous) Invocations
          Format Transformation                         Stateless (Synchronous) Invocations
          Input Augmentation                            Internal Macros
           and Output Splitting
                                                        Human Interactions
          Data Organisation

  Data Analysis                             Inter-Workflow Motifs

  Data Curation/Cleaning                                Atomic Workflows

                                                         Composite Workflows
  Data Moving

  Data Visualisation                                    Workflow Overloading

                                                                                         15
                              IEEE eScience 2012. Chicago, USA
Summary: Work done at ISI


                                                 Motif Detection
Abstractions definitions and categorization


                                                  SUBDUE
                                                  exploration and
                                 Macro
Algorithms for automatic                          integration in
                               abstraction
        matching                                  RDF
                                detection



                                                 Virtuoso,
         Experiment Publication                  Pubby,
                                                 Wings (+Plugin)

    Provenance                  Plan              OPMW + PROV
  representation           representation         + P-PLAN



                                                                    16
PROV Compatibility

•OPMW fits naturally into PROV
    •Same usage-generation
    structure
    •Extension for the scientific
    workflow with PROV

•Binary relationships (no n-ary
patterns used).
    •Simplicity

•Publication of PROV as well as
OPMW.
    •Queries can be answered in
    both languages.
    •Flexibility.

•http://www.opmw.org/node/8



                                    17
P-PLAN




•Plans are not provenance
•P-PLAN: Simple plan model for binding traces to template representations
•Aligned with OPMW and PROV
•Documentation in progress
                                                                               18
Summary: Work done at ISI


                                                 Motif Detection
Abstractions definitions and categorization


                                                  SUBDUE
                                                  exploration and
                                 Macro
Algorithms for automatic                          integration in
                               abstraction
        matching                                  RDF
                                detection



                                                 Virtuoso,
         Experiment Publication                  Pubby,
                                                 Wings (+Plugin)

    Provenance                  Plan              OPMW + PROV
  representation           representation         + P-PLAN



                                                                    19
Macro abstraction detection

Problem statement:

Given a repository of workflow templates (either abstract or specific) or workflow
execution traces, what are the workflow fragments I can deduce from it?


Useful for:
•Systems like Taverna and Wings: (Many templates, little annotation to relate them)
     •Finding relationships between workflows and sub-workflows.
          •Most used fragments, most executed, etc.

    •Systems like GenePattern and Galaxy: (Many runs, nearly no templates
    published)
         •Proposing new templates with the popular fragments.




                                                                                      20
Macro abstraction detection

•Work in Progress (implementation and evaluation)
   •WINGS traces

•Similar to Sub-graph Isomorphism
•Kind of “Graph Clustering”

•Early results
     •Tool for finding common sub-graphs
          •Sequential graphs
          •Efficient
          •Scalable.

    •Integration with RDF (by me)

•TO DO:
    •Finish implementation: inference.
    •Evaluation!!

                                                                            21
Next Steps




       22
Next Steps


•Thesis:
    •Finish up implementation.
    •How to evaluate results?

•Publications:
    •Workshop:
         •Provenance Corpus (with Taverna Team). To have something citable
    •Conference:
         •KCAP: Macro detection implementation and evaluation.
    •Journal
         •Decay analysis publication in journal (January)
         •OPMW - PROV -P-PLAN publication in journal (December)
         •Motif extension publication in journal (Invited by special issue)
         (Now)




                                                                              23
Future work


•Thesis:

    •Other methods for detecting workflow abstraction automatically
        •Metadata and file analysis (diff, etc.): Filter, merge, etc.
        •Provenance reconstruction.

•Project:

    •RO model specifications
    •Testcases
    •Workflow abstraction with Isoco




                                                                            24
Thanks !
                     :oscarCorcho                                   :yolandGil




                             :supervises
                                                  :supervises

                                                    :danielGarijo
:khalidBelhajjame
                                                                                 :varunRatnakar
                                                            :collaboratesWith
                     :collaboratesWith




                              :collaboratesWith
                                                  :collaboratesWith


                    :caroleGoble




                                                          :pinarAlper

                                                                                                       25
Work at ISI,
        relation with wf4Ever,
              future steps
                        Daniel Garijo Verdejo

Ontology Engineering Group. Laboratorio de Inteligencia Artificial
            Departamento de Inteligencia Artificial
                   Facultad de Informática
              Universidad Politécnica de Madrid




                                                           Date: 03/10/2011

Mais conteúdo relacionado

Semelhante a Status update OEG - Nov 2012

Dreamforce_2012_Hadoop_Use_Cases
Dreamforce_2012_Hadoop_Use_CasesDreamforce_2012_Hadoop_Use_Cases
Dreamforce_2012_Hadoop_Use_Cases
Narayan Bharadwaj
 
Online performance modeling and analysis of message-passing parallel applicat...
Online performance modeling and analysis of message-passing parallel applicat...Online performance modeling and analysis of message-passing parallel applicat...
Online performance modeling and analysis of message-passing parallel applicat...
MOCA Platform
 
The SENSORIA Development Environment
The SENSORIA Development EnvironmentThe SENSORIA Development Environment
The SENSORIA Development Environment
Istvan Rath
 
Alfresco day madrid jeff potts - activiti
Alfresco day madrid   jeff potts - activitiAlfresco day madrid   jeff potts - activiti
Alfresco day madrid jeff potts - activiti
Alfresco Software
 
Alfresco Day Madrid - Jeff Potts - Activiti
Alfresco Day Madrid - Jeff Potts - ActivitiAlfresco Day Madrid - Jeff Potts - Activiti
Alfresco Day Madrid - Jeff Potts - Activiti
Toni de la Fuente
 
What's new in Nuxeo 5.2? - Solutions Linux 2009
What's new in Nuxeo 5.2? - Solutions Linux 2009What's new in Nuxeo 5.2? - Solutions Linux 2009
What's new in Nuxeo 5.2? - Solutions Linux 2009
Stefane Fermigier
 

Semelhante a Status update OEG - Nov 2012 (20)

Hadoop World 2011: Proven Tools to Manage Hadoop Environments - Joey Jablonsk...
Hadoop World 2011: Proven Tools to Manage Hadoop Environments - Joey Jablonsk...Hadoop World 2011: Proven Tools to Manage Hadoop Environments - Joey Jablonsk...
Hadoop World 2011: Proven Tools to Manage Hadoop Environments - Joey Jablonsk...
 
Dreamforce_2012_Hadoop_Use_Cases
Dreamforce_2012_Hadoop_Use_CasesDreamforce_2012_Hadoop_Use_Cases
Dreamforce_2012_Hadoop_Use_Cases
 
Introducing spring
Introducing springIntroducing spring
Introducing spring
 
Online performance modeling and analysis of message-passing parallel applicat...
Online performance modeling and analysis of message-passing parallel applicat...Online performance modeling and analysis of message-passing parallel applicat...
Online performance modeling and analysis of message-passing parallel applicat...
 
Chisimba - introduction to practical demo
Chisimba - introduction to practical demoChisimba - introduction to practical demo
Chisimba - introduction to practical demo
 
SharePoint 2010 as a Development Platform
SharePoint 2010 as a Development PlatformSharePoint 2010 as a Development Platform
SharePoint 2010 as a Development Platform
 
The SENSORIA Development Environment
The SENSORIA Development EnvironmentThe SENSORIA Development Environment
The SENSORIA Development Environment
 
Using R with Hadoop
Using R with HadoopUsing R with Hadoop
Using R with Hadoop
 
Webinar - An Open Source Platform for Business Process Mining
Webinar - An Open Source Platform for Business Process MiningWebinar - An Open Source Platform for Business Process Mining
Webinar - An Open Source Platform for Business Process Mining
 
Everything you wanted to know, but were afraid to ask about Oozie
Everything you wanted to know, but were afraid to ask about OozieEverything you wanted to know, but were afraid to ask about Oozie
Everything you wanted to know, but were afraid to ask about Oozie
 
Alfresco day madrid jeff potts - activiti
Alfresco day madrid   jeff potts - activitiAlfresco day madrid   jeff potts - activiti
Alfresco day madrid jeff potts - activiti
 
Alfresco Day Madrid - Jeff Potts - Activiti
Alfresco Day Madrid - Jeff Potts - ActivitiAlfresco Day Madrid - Jeff Potts - Activiti
Alfresco Day Madrid - Jeff Potts - Activiti
 
Code Reuse Made Easy: Uncovering the Hidden Gems of Corporate and Open Source...
Code Reuse Made Easy: Uncovering the Hidden Gems of Corporate and Open Source...Code Reuse Made Easy: Uncovering the Hidden Gems of Corporate and Open Source...
Code Reuse Made Easy: Uncovering the Hidden Gems of Corporate and Open Source...
 
Spring fundamentals
Spring fundamentalsSpring fundamentals
Spring fundamentals
 
OCCiware A Formal and Tooled Toolchain For Managing Everything as a Service
OCCiware A Formal and Tooled Toolchain For Managing Everything as a Service OCCiware A Formal and Tooled Toolchain For Managing Everything as a Service
OCCiware A Formal and Tooled Toolchain For Managing Everything as a Service
 
Single API for library services (poster)
Single API for library services (poster)Single API for library services (poster)
Single API for library services (poster)
 
Open Source
Open SourceOpen Source
Open Source
 
2013-01-17 Research Object
2013-01-17 Research Object2013-01-17 Research Object
2013-01-17 Research Object
 
What's new in Nuxeo 5.2? - Solutions Linux 2009
What's new in Nuxeo 5.2? - Solutions Linux 2009What's new in Nuxeo 5.2? - Solutions Linux 2009
What's new in Nuxeo 5.2? - Solutions Linux 2009
 
EclipseConEurope2012 SOA - Models As Operational Documentation
EclipseConEurope2012 SOA - Models As Operational DocumentationEclipseConEurope2012 SOA - Models As Operational Documentation
EclipseConEurope2012 SOA - Models As Operational Documentation
 

Mais de dgarijo

Towards Knowledge Graphs of Reusable Research Software Metadata
Towards Knowledge Graphs of Reusable Research Software MetadataTowards Knowledge Graphs of Reusable Research Software Metadata
Towards Knowledge Graphs of Reusable Research Software Metadata
dgarijo
 
Capturing Context in Scientific Experiments: Towards Computer-Driven Science
Capturing Context in Scientific Experiments: Towards Computer-Driven ScienceCapturing Context in Scientific Experiments: Towards Computer-Driven Science
Capturing Context in Scientific Experiments: Towards Computer-Driven Science
dgarijo
 
A Controlled Crowdsourcing Approach for Practical Ontology Extensions and Met...
A Controlled Crowdsourcing Approach for Practical Ontology Extensions and Met...A Controlled Crowdsourcing Approach for Practical Ontology Extensions and Met...
A Controlled Crowdsourcing Approach for Practical Ontology Extensions and Met...
dgarijo
 
Automated Hypothesis Testing with Large Scale Scientific Workflows
Automated Hypothesis Testing with Large Scale Scientific WorkflowsAutomated Hypothesis Testing with Large Scale Scientific Workflows
Automated Hypothesis Testing with Large Scale Scientific Workflows
dgarijo
 

Mais de dgarijo (20)

FOOPS!: An Ontology Pitfall Scanner for the FAIR principles
FOOPS!: An Ontology Pitfall Scanner for the FAIR principlesFOOPS!: An Ontology Pitfall Scanner for the FAIR principles
FOOPS!: An Ontology Pitfall Scanner for the FAIR principles
 
FAIR Workflows: A step closer to the Scientific Paper of the Future
FAIR Workflows: A step closer to the Scientific Paper of the FutureFAIR Workflows: A step closer to the Scientific Paper of the Future
FAIR Workflows: A step closer to the Scientific Paper of the Future
 
Towards Reusable Research Software
Towards Reusable Research SoftwareTowards Reusable Research Software
Towards Reusable Research Software
 
SOMEF: a metadata extraction framework from software documentation
SOMEF: a metadata extraction framework from software documentationSOMEF: a metadata extraction framework from software documentation
SOMEF: a metadata extraction framework from software documentation
 
A Template-Based Approach for Annotating Long-Tailed Datasets
A Template-Based Approach for Annotating Long-Tailed DatasetsA Template-Based Approach for Annotating Long-Tailed Datasets
A Template-Based Approach for Annotating Long-Tailed Datasets
 
OBA: An Ontology-Based Framework for Creating REST APIs for Knowledge Graphs
OBA: An Ontology-Based Framework for Creating REST APIs for Knowledge GraphsOBA: An Ontology-Based Framework for Creating REST APIs for Knowledge Graphs
OBA: An Ontology-Based Framework for Creating REST APIs for Knowledge Graphs
 
Towards Knowledge Graphs of Reusable Research Software Metadata
Towards Knowledge Graphs of Reusable Research Software MetadataTowards Knowledge Graphs of Reusable Research Software Metadata
Towards Knowledge Graphs of Reusable Research Software Metadata
 
Scientific Software Registry Collaboration Workshop: From Software Metadata r...
Scientific Software Registry Collaboration Workshop: From Software Metadata r...Scientific Software Registry Collaboration Workshop: From Software Metadata r...
Scientific Software Registry Collaboration Workshop: From Software Metadata r...
 
WDPlus: Leveraging Wikidata to Link and Extend Tabular Data
WDPlus: Leveraging Wikidata to Link and Extend Tabular DataWDPlus: Leveraging Wikidata to Link and Extend Tabular Data
WDPlus: Leveraging Wikidata to Link and Extend Tabular Data
 
OKG-Soft: An Open Knowledge Graph With Mathine Readable Scientific Software M...
OKG-Soft: An Open Knowledge Graph With Mathine Readable Scientific Software M...OKG-Soft: An Open Knowledge Graph With Mathine Readable Scientific Software M...
OKG-Soft: An Open Knowledge Graph With Mathine Readable Scientific Software M...
 
Towards Human-Guided Machine Learning - IUI 2019
Towards Human-Guided Machine Learning - IUI 2019Towards Human-Guided Machine Learning - IUI 2019
Towards Human-Guided Machine Learning - IUI 2019
 
Capturing Context in Scientific Experiments: Towards Computer-Driven Science
Capturing Context in Scientific Experiments: Towards Computer-Driven ScienceCapturing Context in Scientific Experiments: Towards Computer-Driven Science
Capturing Context in Scientific Experiments: Towards Computer-Driven Science
 
A Controlled Crowdsourcing Approach for Practical Ontology Extensions and Met...
A Controlled Crowdsourcing Approach for Practical Ontology Extensions and Met...A Controlled Crowdsourcing Approach for Practical Ontology Extensions and Met...
A Controlled Crowdsourcing Approach for Practical Ontology Extensions and Met...
 
WIDOCO: A Wizard for Documenting Ontologies
WIDOCO: A Wizard for Documenting OntologiesWIDOCO: A Wizard for Documenting Ontologies
WIDOCO: A Wizard for Documenting Ontologies
 
Towards Automating Data Narratives
Towards Automating Data NarrativesTowards Automating Data Narratives
Towards Automating Data Narratives
 
Automated Hypothesis Testing with Large Scale Scientific Workflows
Automated Hypothesis Testing with Large Scale Scientific WorkflowsAutomated Hypothesis Testing with Large Scale Scientific Workflows
Automated Hypothesis Testing with Large Scale Scientific Workflows
 
OntoSoft: A Distributed Semantic Registry for Scientific Software
OntoSoft: A Distributed Semantic Registry for Scientific SoftwareOntoSoft: A Distributed Semantic Registry for Scientific Software
OntoSoft: A Distributed Semantic Registry for Scientific Software
 
OEG tools for supporting Ontology Engineering
OEG tools for supporting Ontology EngineeringOEG tools for supporting Ontology Engineering
OEG tools for supporting Ontology Engineering
 
Software Metadata: Describing "dark software" in GeoSciences
Software Metadata: Describing "dark software" in GeoSciencesSoftware Metadata: Describing "dark software" in GeoSciences
Software Metadata: Describing "dark software" in GeoSciences
 
Reproducibility Using Semantics: An Overview
Reproducibility Using Semantics: An OverviewReproducibility Using Semantics: An Overview
Reproducibility Using Semantics: An Overview
 

Último

Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire business
panagenda
 

Último (20)

Apidays Singapore 2024 - Modernizing Securities Finance by Madhu Subbu
Apidays Singapore 2024 - Modernizing Securities Finance by Madhu SubbuApidays Singapore 2024 - Modernizing Securities Finance by Madhu Subbu
Apidays Singapore 2024 - Modernizing Securities Finance by Madhu Subbu
 
A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdf
 
FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
 
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
 
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024
 
DBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor PresentationDBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor Presentation
 
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
 
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodPolkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
 
Ransomware_Q4_2023. The report. [EN].pdf
Ransomware_Q4_2023. The report. [EN].pdfRansomware_Q4_2023. The report. [EN].pdf
Ransomware_Q4_2023. The report. [EN].pdf
 
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)
 
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt Robison
 
AXA XL - Insurer Innovation Award Americas 2024
AXA XL - Insurer Innovation Award Americas 2024AXA XL - Insurer Innovation Award Americas 2024
AXA XL - Insurer Innovation Award Americas 2024
 
ICT role in 21st century education and its challenges
ICT role in 21st century education and its challengesICT role in 21st century education and its challenges
ICT role in 21st century education and its challenges
 
Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire business
 

Status update OEG - Nov 2012

  • 1. Work at ISI, Current Status, Next Steps Daniel Garijo Verdejo, Oscar Corcho, Yolanda Gil Ontology Engineering Group. Laboratorio de Inteligencia Artificial Departamento de Inteligencia Artificial Facultad de Informática Universidad Politécnica de Madrid Date: 29/11/2012
  • 2. What am I working on? •Creation of abstractions in scientific workflows •Workflow Traces and template representation •Provenance representation •Plan representation •Abstraction catalog •Find ways to link the definitions to the provenance traces automatically •Understandability and reuse of scientific workflows 2
  • 3. Outline Index 1. Motivation 2. Overview 3. Workflow systems used 4. Summary of work done in my previous visit to ISI • OPMW and provenance publishing 5. Summary of work done before second visit to ISI • Workflow motif catalog 6. Summary of work done in my second visit to ISI • OPMW-PROV and P-PLAN • Automatic macro abstraction detection 7. Next Steps 8. Future work 3
  • 4. Motivation •As a designer: Discovery •Workflows with similar functionality fragments/methods •Design based in previous templates. •As user/reuser: Understandability •Search workflows by functionality •Commonalities between execution runs •Component categorization 4
  • 5. Overview Descriptions/ Abstraction definitions and categorization PSMS/Ontologies Data mining tools, graph analysis, etc. Algorithms for finding the different abstractions automatically RDF Stores Experiment publication Vocabularies Provenance Plan representation representation 5
  • 6. Taverna and Wings http://www.taverna.org.uk/ http://www.wings-workflows.org/ 6 IEEE eScience 2012. Chicago, USA
  • 7. Summary: Previous Work at ISI Abstractions definitions and categorization Algorithms for finding the different abstractions automatically Virtuoso, Experiment Publication Pubby, Wings (+Plugin) OPMW Provenance Plan representation representation 7
  • 8. High level architecture Other workflow WINGS on local laptop environments Workflow Core Template OPM Portal Workflow export Instance Programatic access (external apps) WINGS on shared host Workflow Linked Core Template OPM Portal export Data Workflow Instance Publication Interactive WINGS on web server Browsing Workflow (Pubby frontend) Core Template OPM Portal export Users Workflow Instance Wings workflow OPM Publication Share Reuse generation conversion 8
  • 11. Work previous to second visit to ISI Motif Detection Abstractions definitions and categorization Algorithms for automatic matching Virtuoso, Experiment Publication Pubby, Wings (+Plugin) Provenance Plan OPMW representation representation 11
  • 12. Overview • Empirical analysis on 177 workflow templates from Taverna and Wings • Catalog of recurring patterns: scientific workflow motifs. • Data Oriented Motifs • Workflow Oriented Motifs •Understandability and reuse http://sensefinancial.com/wp-content/uploads/2012/02/contribution.jpg 12 IEEE eScience 2012. Chicago, USA
  • 13. Approach •Reverse-engineer the set of current practices in workflow development through an analysis of empirical evidence •Identify workflow abstractions that would facilitate understandability and therefore effective re-use 13 IEEE eScience 2012. Chicago, USA
  • 14. Workflow Motifs •Workflow motif: Domain independent conceptual abstraction on the workflow steps. 1. Data-oriented motifs: What kind of manipulations does the workflow have? •E.g.: •Data retrieval •Data preparation • etc. WHAT? 2. Workflow-oriented motifs: How does the workflow perform its operations? •E.g.: •Stateful steps •Stateless steps •Human interactions •etc. HOW? 14 IEEE eScience 2012. Chicago, USA
  • 15. Motif Catalog Data-Oriented Motifs Workflow-Oriented Motifs Data Retrieval Intra-Workflow Motifs Data Preparation Stateful (Asynchronous) Invocations Format Transformation Stateless (Synchronous) Invocations Input Augmentation Internal Macros and Output Splitting Human Interactions Data Organisation Data Analysis Inter-Workflow Motifs Data Curation/Cleaning Atomic Workflows Composite Workflows Data Moving Data Visualisation Workflow Overloading 15 IEEE eScience 2012. Chicago, USA
  • 16. Summary: Work done at ISI Motif Detection Abstractions definitions and categorization SUBDUE exploration and Macro Algorithms for automatic integration in abstraction matching RDF detection Virtuoso, Experiment Publication Pubby, Wings (+Plugin) Provenance Plan OPMW + PROV representation representation + P-PLAN 16
  • 17. PROV Compatibility •OPMW fits naturally into PROV •Same usage-generation structure •Extension for the scientific workflow with PROV •Binary relationships (no n-ary patterns used). •Simplicity •Publication of PROV as well as OPMW. •Queries can be answered in both languages. •Flexibility. •http://www.opmw.org/node/8 17
  • 18. P-PLAN •Plans are not provenance •P-PLAN: Simple plan model for binding traces to template representations •Aligned with OPMW and PROV •Documentation in progress 18
  • 19. Summary: Work done at ISI Motif Detection Abstractions definitions and categorization SUBDUE exploration and Macro Algorithms for automatic integration in abstraction matching RDF detection Virtuoso, Experiment Publication Pubby, Wings (+Plugin) Provenance Plan OPMW + PROV representation representation + P-PLAN 19
  • 20. Macro abstraction detection Problem statement: Given a repository of workflow templates (either abstract or specific) or workflow execution traces, what are the workflow fragments I can deduce from it? Useful for: •Systems like Taverna and Wings: (Many templates, little annotation to relate them) •Finding relationships between workflows and sub-workflows. •Most used fragments, most executed, etc. •Systems like GenePattern and Galaxy: (Many runs, nearly no templates published) •Proposing new templates with the popular fragments. 20
  • 21. Macro abstraction detection •Work in Progress (implementation and evaluation) •WINGS traces •Similar to Sub-graph Isomorphism •Kind of “Graph Clustering” •Early results •Tool for finding common sub-graphs •Sequential graphs •Efficient •Scalable. •Integration with RDF (by me) •TO DO: •Finish implementation: inference. •Evaluation!! 21
  • 23. Next Steps •Thesis: •Finish up implementation. •How to evaluate results? •Publications: •Workshop: •Provenance Corpus (with Taverna Team). To have something citable •Conference: •KCAP: Macro detection implementation and evaluation. •Journal •Decay analysis publication in journal (January) •OPMW - PROV -P-PLAN publication in journal (December) •Motif extension publication in journal (Invited by special issue) (Now) 23
  • 24. Future work •Thesis: •Other methods for detecting workflow abstraction automatically •Metadata and file analysis (diff, etc.): Filter, merge, etc. •Provenance reconstruction. •Project: •RO model specifications •Testcases •Workflow abstraction with Isoco 24
  • 25. Thanks ! :oscarCorcho :yolandGil :supervises :supervises :danielGarijo :khalidBelhajjame :varunRatnakar :collaboratesWith :collaboratesWith :collaboratesWith :collaboratesWith :caroleGoble :pinarAlper 25
  • 26. Work at ISI, relation with wf4Ever, future steps Daniel Garijo Verdejo Ontology Engineering Group. Laboratorio de Inteligencia Artificial Departamento de Inteligencia Artificial Facultad de Informática Universidad Politécnica de Madrid Date: 03/10/2011