Slides for the e-Science 2012 presentation of the paper "Common Motifs in Scientific Workflows: An Empirical Analysis". The paper analyses 177 workflows from the Taverna and Wings workflow systems across diverse domains, and highlights the common motifs, or patterns, found in the templates based on the functionality of each workflow step.
Common Motifs in Scientific Workflows: An Empirical Analysis
1. Date: 10/11/2012
Common Motifs in Scientific Workflows: An Empirical Analysis
Daniel Garijo *, Pinar Alper ⱡ, Khalid Belhajjame ⱡ, Oscar Corcho *, Yolanda Gil Ŧ, Carole Goble ⱡ
* Universidad Politécnica de Madrid
ⱡ University of Manchester
Ŧ USC Information Sciences Institute
IEEE eScience 2012. Chicago, USA
2. Overview
• Empirical analysis of 177 workflow templates from Taverna and Wings
• Catalog of recurring patterns: scientific workflow motifs
  • Data-oriented motifs
  • Workflow-oriented motifs
• Understandability and reuse
3. Background
• Workflows as software artifacts that capture the scientific method
  • A complement to paper publication
  • Reuse
• Existing repositories of workflows (myExperiment)
  • Sharing workflows
  • Exploring existing workflows
• Problems to address:
  • Workflows are sometimes difficult to understand
  • Workflow descriptions depend on specific tools and files
  • Workflow decay
  • Identifying good practices for workflow design
http://www.myexperiment.org
4. Approach
• Reverse-engineer current practices in workflow development through an analysis of empirical evidence
• Identify workflow abstractions that would facilitate understandability and therefore effective reuse
5. Taverna and Wings
http://www.taverna.org.uk/
http://www.wings-workflows.org/
6. Workflow Motifs
• Workflow motif: a domain-independent conceptual abstraction over workflow steps.
1. Data-oriented motifs (WHAT?): what kinds of data manipulation does the workflow perform?
  • E.g.: data retrieval, data preparation, etc.
2. Workflow-oriented motifs (HOW?): how does the workflow perform its operations?
  • E.g.: stateful steps, stateless steps, human interactions, etc.
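The difference between stateless and stateful steps can be sketched in a few lines of Python. This is an illustration under stated assumptions, not Taverna or Wings code: `HypotheticalJobService` and its `submit`/`status`/`result` methods are invented stand-ins for an asynchronous service, and the reverse-complement computation is only a placeholder operation. A stateless step produces its output in a single call; a stateful step requires the workflow to keep a job identifier and poll until the job completes.

```python
import itertools
from typing import Dict


def stateless_step(sequence: str) -> str:
    """Stateless step: one call in, one result out, nothing kept between calls."""
    # Placeholder operation: reverse-complement a DNA sequence.
    complement = {"A": "T", "T": "A", "C": "G", "G": "C"}
    return "".join(complement[base] for base in reversed(sequence))


class HypotheticalJobService:
    """Invented stand-in for an asynchronous service; a job finishes after N polls."""

    def __init__(self, polls_needed: int = 3) -> None:
        self._ids = itertools.count(1)
        self._jobs: Dict[int, dict] = {}
        self._polls_needed = polls_needed

    def submit(self, sequence: str) -> int:
        job_id = next(self._ids)
        self._jobs[job_id] = {"input": sequence, "polls": 0}
        return job_id

    def status(self, job_id: int) -> str:
        job = self._jobs[job_id]
        job["polls"] += 1
        return "DONE" if job["polls"] >= self._polls_needed else "RUNNING"

    def result(self, job_id: int) -> str:
        return stateless_step(self._jobs[job_id]["input"])


def stateful_step(service: HypotheticalJobService, sequence: str) -> str:
    """Stateful step: submit, remember the job id, poll, then fetch the result."""
    job_id = service.submit(sequence)
    while service.status(job_id) != "DONE":
        pass  # a real client would wait between polls
    return service.result(job_id)


if __name__ == "__main__":
    print(stateless_step("ATGC"))
    print(stateful_step(HypotheticalJobService(), "ATGC"))
```

Because the stateful version spreads one logical operation over several calls, it can be worth recognising it as a single motif when reading a workflow.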
7. Data-Oriented Motifs
• Data Retrieval
• Data Preparation
  • Format Transformation
  • Input Augmentation and Output Splitting
  • Data Organisation
• Data Analysis
• Data Curation/Cleaning
• Data Moving
• Data Visualisation
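The data preparation sub-motifs are easiest to see on concrete steps. The Python sketch below is purely illustrative: the function names, the request keys, the tab-separated record layout and the identifiers are hypothetical, not taken from the analysed workflows. Input augmentation wraps a bare identifier into the structured request a service expects; output splitting keeps only the field the next step needs from a composite result; format transformation converts between representations (here, to FASTA).

```python
from typing import Dict


def augment_input(accession: str, database: str = "uniprot") -> Dict[str, str]:
    """Input augmentation: wrap a bare identifier in the request a service expects.

    The keys are hypothetical; a real service defines its own request format.
    """
    return {"db": database, "id": accession.strip().upper(), "format": "tsv"}


def split_output(record: str, field: int) -> str:
    """Output splitting: keep one column of a tab-separated record."""
    return record.split("\t")[field]


def to_fasta(header: str, sequence: str, width: int = 60) -> str:
    """Format transformation: emit a FASTA entry with wrapped sequence lines."""
    lines = [f">{header}"]
    lines += [sequence[i:i + width] for i in range(0, len(sequence), width)]
    return "\n".join(lines)


if __name__ == "__main__":
    print(augment_input(" x12345 "))
    # Invented service response: "id<TAB>name<TAB>sequence".
    response = "X12345\tTOY_PROTEIN\tMKTAYIAKQRQISFVKSHFS"
    name = split_output(response, 1)
    seq = split_output(response, 2)
    print(to_fasta(name, seq, width=10))
```

None of these steps is the scientific core of a workflow, which helps explain why such steps can dominate a workflow without being its main functionality.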
18. Experiment setup
• 177 workflow templates
  • 111 from Taverna, sampled from myExperiment
  • 66 from Wings, available on a public server (now as Linked Data)
• Diverse domains
[Chart: number of workflow templates per domain, Taverna vs. Wings]
19. Result Summary: Data-Oriented Motifs
• Over 60% of the motifs are data preparation motifs
• Of the 4 data preparation subcategories, the most common across domains are output splitting, input augmentation, and reformatting (format transformation) steps
• Data retrieval is common in domains where curated databases exist
• Data analysis is often the main functionality of the workflow
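Percentages like these come from annotating each workflow step with its motifs and then counting. A minimal Python sketch of the counting, assuming a hypothetical annotation format of (workflow, step, motif) tuples in which a step with several motifs contributes one tuple per motif; the six toy annotations are invented for illustration and are not data from the analysed corpus.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple

# Hypothetical annotation record: (workflow id, step name, data-oriented motif).
Annotation = Tuple[str, str, str]


def motif_shares(annotations: Iterable[Annotation]) -> Dict[str, float]:
    """Percentage of all motif annotations that each motif accounts for,
    ordered from most to least frequent."""
    counts = Counter(motif for _, _, motif in annotations)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {motif: 100.0 * n / total for motif, n in counts.most_common()}


if __name__ == "__main__":
    # Invented toy annotations, for illustration only.
    toy = [
        ("wf1", "fetch_sequence", "Data Retrieval"),
        ("wf1", "add_parameters", "Data Preparation"),
        ("wf1", "extract_ids", "Data Preparation"),
        ("wf1", "run_alignment", "Data Analysis"),
        ("wf2", "convert_format", "Data Preparation"),
        ("wf2", "plot_results", "Data Visualisation"),
    ]
    for motif, share in motif_shares(toy).items():
        print(f"{motif}: {share:.1f}%")
```

Grouping the same tuples by workflow system or by domain before calling `motif_shares` would give per-system or per-domain breakdowns.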
20. Result Summary: Workflow-Oriented Motifs
• Around 40% of the workflows are composite workflows or contain internal macros
• Workflow reuse is present even in some atomic workflows
• Human interaction steps are increasingly used in some domains
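Whether a workflow is atomic or composite can be decided structurally. The Python sketch assumes a toy model (the `Workflow` dataclass and `composite_share` are hypothetical) in which a step is either an operation name or a nested workflow. It does not distinguish reused external workflows from internal macros, which the slide lists separately.

```python
from dataclasses import dataclass, field
from typing import List, Union


@dataclass
class Workflow:
    """Toy workflow model: a step is either a named operation or a nested workflow."""
    name: str
    steps: List[Union[str, "Workflow"]] = field(default_factory=list)

    def is_composite(self) -> bool:
        """True if at least one step is itself a (sub-)workflow."""
        return any(isinstance(step, Workflow) for step in self.steps)

    def depth(self) -> int:
        """Nesting depth: 1 for an atomic workflow, plus 1 per level of sub-workflows."""
        nested = [s.depth() for s in self.steps if isinstance(s, Workflow)]
        return 1 + max(nested, default=0)


def composite_share(workflows: List[Workflow]) -> float:
    """Percentage of workflows containing at least one sub-workflow."""
    if not workflows:
        return 0.0
    return 100.0 * sum(w.is_composite() for w in workflows) / len(workflows)


if __name__ == "__main__":
    align = Workflow("align", ["fetch", "align_sequences"])
    pipeline = Workflow("pipeline", ["prepare", align, "plot"])
    atomic = Workflow("atomic", ["a", "b"])
    print(composite_share([pipeline, atomic]), pipeline.depth())
```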
21. Differences and commonalities of the workflow systems
• Data moving/retrieval, stateful interactions, and human interaction steps are not present in Wings
• Web services (Taverna) versus software components (Wings)
• Wings has layered execution through Pegasus
• Data preparation steps are common in both systems
• Use of sub-workflows is high
22. Discussion
Our observations:
• Obfuscation of scientific workflows
  • The abundance of data preparation steps makes the functionality of the workflow unclear.
• Decay of scientific workflows
  • Create an abstract description.
• Good practices for workflow design
  • Sub-workflows
  • Workflow overloading
[Figure: "Method in paper" vs. "Workflow"]
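One way to act on these observations is an abstract view that hides data preparation steps so the main functionality stands out. The Python sketch below is an illustration under stated assumptions, not the method proposed in the paper: it assumes a workflow given as a list of data-flow edges plus a motif label per step (all step names are hypothetical), and contracts away every step whose motif is in a configurable `hide` set.

```python
from typing import AbstractSet, Dict, List, Set, Tuple

Edge = Tuple[str, str]


def abstract_view(edges: List[Edge], motif: Dict[str, str],
                  hide: AbstractSet[str] = frozenset({"Data Preparation"})) -> Set[Edge]:
    """Hide steps whose motif is in `hide` while preserving data flow between the others.

    A visible step u is linked to a visible step v when v can be reached from u
    by passing only through hidden steps. Unannotated steps stay visible.
    """
    successors: Dict[str, List[str]] = {}
    for src, dst in edges:
        successors.setdefault(src, []).append(dst)

    steps = set(motif) | {s for edge in edges for s in edge}
    visible = {s for s in steps if motif.get(s) not in hide}

    abstract: Set[Edge] = set()
    for start in visible:
        stack = list(successors.get(start, []))
        seen: Set[str] = set()
        while stack:
            node = stack.pop()
            if node in seen:
                continue
            seen.add(node)
            if node in visible:
                abstract.add((start, node))  # reached a visible step: stop this path
            else:
                stack.extend(successors.get(node, []))  # walk through a hidden step
    return abstract


if __name__ == "__main__":
    edges = [("fetch", "reformat"), ("reformat", "align"),
             ("align", "extract_hits"), ("extract_hits", "plot")]
    motif = {"fetch": "Data Retrieval", "reformat": "Data Preparation",
             "align": "Data Analysis", "extract_hits": "Data Preparation",
             "plot": "Data Visualisation"}
    print(sorted(abstract_view(edges, motif)))
```

The five-step toy pipeline collapses to retrieval, analysis and visualisation; passing a different `hide` set gives other views of the same workflow.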
23. Conclusions and future work
• Empirical analysis of scientific workflows
  • 177 workflows
  • 2 different systems
  • A variety of heterogeneous domains
• Workflow motif catalog
  • Data-oriented motifs
  • Workflow-oriented motifs
• Future work: automatic abstractions on workflows
  • Template analysis
  • Trace analysis (provenance)
  • Include other workflow systems
24. Who are we?
• Pinar Alper
  School of Computer Science, University of Manchester
• Khalid Belhajjame
  School of Computer Science, University of Manchester
• Oscar Corcho
  Ontology Engineering Group, UPM
• Yolanda Gil
  Information Sciences Institute, USC
• Carole Goble
  School of Computer Science, University of Manchester
EU Wf4Ever project (270129), funded under EU FP7 (ICT-2009.4.1)
(http://www.wf4ever-project.org)