1. On Specifying and Sharing Scientific Workflow Optimization Results Using Research Objects
Member of the Helmholtz Association
8th Workshop On Workflows in Support of Large-Scale Science
17 November 2013 | Sonja Holl*, Daniel Garijo+, Khalid Belhajjame$, Olav Zimmermann*, Renato De Giovanni#, Matthias Obst~, Carole Goble$
*Jülich Supercomputing Centre (JSC), Forschungszentrum Jülich, Germany
+Ontology Engineering Group, Facultad de Informática, Universidad Politécnica de Madrid, Spain
$School of Computer Science, University of Manchester, UK
#Reference Center on Environmental Information, Campinas SP, Brazil
~Department of Biological and Environmental Sciences, University of Gothenburg, Sweden
2. Scientific Workflows
• Popular choice to design, manage, and execute in silico experiments
• Sharing and reuse via workflow repositories
Sunday Nov. 17, 2013
3. Ecological Niche Modeling
Model species adaptation to environmental changes (BioVeL Project)
4. Ecological Niche Modeling Workflow
[Figure: workflow diagram. Inputs: Parameter, Occurrence Data, Environmental Layer, Geographic Mask; pipeline: createModel → testModel → calcAUC → output AUC]
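calcAUC condenses the test results into one number, the area under the ROC curve (AUC), which is what the optimization later uses as fitness. A minimal sketch, assuming testModel yields one predicted suitability score per presence point and per absence (or pseudo-absence) point: AUC equals the probability that a randomly chosen presence outscores a randomly chosen absence, with ties counted as one half (the Mann-Whitney formulation). The function `auc` and its input layout are illustrative assumptions, not the workflow's actual calcAUC service.

```python
from typing import Sequence


def auc(presence_scores: Sequence[float], absence_scores: Sequence[float]) -> float:
    """Area under the ROC curve via pairwise comparison (Mann-Whitney U / (n_pos * n_neg)).

    Each presence/absence pair contributes 1 if the presence point scores higher,
    0.5 on a tie, and 0 otherwise.
    """
    if not presence_scores or not absence_scores:
        raise ValueError("need at least one presence and one absence score")
    wins = 0.0
    for p in presence_scores:
        for a in absence_scores:
            if p > a:
                wins += 1.0
            elif p == a:
                wins += 0.5
    return wins / (len(presence_scores) * len(absence_scores))


# Hypothetical model outputs for three presence and three absence points.
print(auc([0.9, 0.8, 0.4], [0.3, 0.5, 0.1]))  # 8 of 9 pairs ordered correctly, about 0.889
```

The quadratic pairwise loop keeps the definition visible; for large test sets a rank-based computation gives the same value faster.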
6. Ecological Niche Modeling Workflow
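[Figure: workflow diagram in which createModel can use one of several algorithms (SVM, Maxent, GARP), with algorithm parameters such as Gamma, Cost and NumberOfPseudoAbsences. Inputs: Occurrence Data, Environmental Layer, Geographic Mask; pipeline: createModel → testModel → calcAUC → output AUC]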
7. Ecological Niche Modeling Workflow
[Figure: the workflow with two open decisions, "Select Algorithms" (SVM, Maxent, GARP) and "Select Parameters" (Gamma, Cost, NumberOfPseudoAbsences), surrounded by scattered candidate values and options (e.g. 0.5, 100, -3.2, "gaussian") illustrating the many possible settings]
8. Common strategies to handle this challenge
• Default parameters & applications
• Trial and error
• Parameter sweeps
But:
• Increasing complexity of scientific workflows
• Rising number of parameters
• Work-time and compute intensive
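Why parameter sweeps become compute intensive is easy to quantify: an exhaustive grid needs one workflow run per combination, so the cost is the product of the number of values tried per parameter. The grids below are hypothetical values chosen for illustration, not settings from the study.

```python
import itertools
import math

# Hypothetical candidate values per parameter (illustrative only).
grid = {
    "Gamma": [0.01, 0.1, 1.0, 10.0],
    "Cost": [0.1, 1.0, 10.0, 100.0],
    "NumberOfPseudoAbsences": [100, 500, 1000],
}

# Every combination is one complete workflow execution in an exhaustive sweep.
combinations = list(itertools.product(*grid.values()))
print(len(combinations))  # 4 * 4 * 3 = 48

# One more parameter with 10 candidate values multiplies the number of runs by 10.
print(math.prod(len(values) for values in grid.values()) * 10)  # 480
```

This count grows further once the choice among algorithms (SVM, Maxent, GARP) is also swept, since each algorithm brings its own parameter grid.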
10. Intelligent automated optimization techniques
Goal:
• An automated way to find the workflow settings that optimize the output
• Define the workflow output(s) as a fitness value
• Use the fitness value for evaluation (e.g. AUC or correlation coefficient)
• Use a heuristic search algorithm to find the best settings
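The goal treats the workflow as a black box that maps a parameter setting to a fitness value. A minimal sketch of that loop, using stochastic hill climbing as a stand-in heuristic: this is swapped in only for brevity (the framework itself is presented with a genetic algorithm plugin), and `toy_workflow` is a hypothetical placeholder for executing the sub-workflow and reading its AUC output.

```python
import random
from typing import Callable, Dict, Tuple

Params = Dict[str, float]


def toy_workflow(params: Params) -> float:
    """Hypothetical stand-in for a workflow run; returns a fitness in (0, 1]."""
    # Arbitrary illustrative optimum at Gamma = 2.0, Cost = 5.0.
    return 1.0 / (1.0 + (params["Gamma"] - 2.0) ** 2 + 0.1 * (params["Cost"] - 5.0) ** 2)


def hill_climb(fitness: Callable[[Params], float], start: Params,
               bounds: Dict[str, Tuple[float, float]], steps: int = 500,
               step_size: float = 0.5, seed: int = 0) -> Tuple[Params, float]:
    """Maximize `fitness` by accepting random neighbours that do not lower it."""
    rng = random.Random(seed)
    best, best_fit = dict(start), fitness(start)
    for _ in range(steps):
        candidate = {}
        for name, value in best.items():
            low, high = bounds[name]
            # Perturb each parameter and clip it back into its allowed range.
            candidate[name] = min(high, max(low, value + rng.gauss(0.0, step_size)))
        cand_fit = fitness(candidate)
        if cand_fit >= best_fit:
            best, best_fit = candidate, cand_fit
    return best, best_fit


bounds = {"Gamma": (0.0, 10.0), "Cost": (0.0, 100.0)}
setting, fit = hill_climb(toy_workflow, {"Gamma": 8.0, "Cost": 20.0}, bounds)
print(setting, round(fit, 3))
```

Only the fitness function knows about the workflow; the search algorithm sees nothing but settings and scores, which is what makes it replaceable.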
11. How does it work?
• Development of an optimization framework that extends the Taverna workflow management system
• Abstracts the optimization process (e.g. parallel execution, security)
• Developer API allows rapid adaptation of new optimization methods
• Optimization plugins can be added independently
[Figure: layered architecture. Taverna (WMS) at the bottom, the optimization framework layer above it, and an API through which plugins (Parameter Optimization, Component Optimization) are attached]
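The developer API is what allows new optimization methods to be plugged in independently. The sketch below illustrates the idea in Python under stated assumptions: `OptimizationPlugin`, `register`, `PLUGINS` and `BestOfInitial` are hypothetical names, not the framework's real Taverna-side API. The framework supplies a batch evaluator (hiding details such as parallel execution), and a plugin only decides which settings to try.

```python
from abc import ABC, abstractmethod
from typing import Callable, Dict, List, Type

Params = Dict[str, float]
# The framework hands plugins a batch evaluator; how the batch is executed
# (e.g. in parallel, on remote resources) stays hidden behind it.
Evaluator = Callable[[List[Params]], List[float]]

PLUGINS: Dict[str, Type["OptimizationPlugin"]] = {}


def register(name: str):
    """Class decorator that adds a plugin to the registry under `name`."""
    def decorator(cls: Type["OptimizationPlugin"]) -> Type["OptimizationPlugin"]:
        PLUGINS[name] = cls
        return cls
    return decorator


class OptimizationPlugin(ABC):
    @abstractmethod
    def optimize(self, evaluate: Evaluator, initial: List[Params]) -> Params:
        """Return the best parameter setting found."""


@register("best-of-initial")
class BestOfInitial(OptimizationPlugin):
    """Trivial example plugin: evaluates the initial settings and keeps the best."""

    def optimize(self, evaluate: Evaluator, initial: List[Params]) -> Params:
        scores = evaluate(initial)
        return initial[max(range(len(initial)), key=scores.__getitem__)]


def sequential_evaluator(params_list: List[Params]) -> List[float]:
    # Hypothetical fitness: the closer Gamma is to 2.0, the better.
    return [-abs(p["Gamma"] - 2.0) for p in params_list]


plugin = PLUGINS["best-of-initial"]()
print(plugin.optimize(sequential_evaluator, [{"Gamma": 0.5}, {"Gamma": 2.5}, {"Gamma": 9.0}]))
# {'Gamma': 2.5}
```

Keeping evaluation on the framework side means a new plugin never has to deal with workflow execution or security, only with search logic.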
12. Taverna Optimization Framework & Plugin
(1) Define sub-workflow
(2) Specify input parameters (constraints)
(3) Select fitness output parameters (e.g. AUC)
(4) Define optimization method parameters (population size, termination criteria)
Display the optimization result
[Figure: Genetic Algorithm Parameter Optimization Plugin. Best fitness per generation: 0.34 (generation 1), 0.42 (generation 2), 0.48, …, 0.49 (final generation x)]
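The plugin shown is a genetic algorithm. A compact sketch of a real-valued GA exposing the settings from step (4), population size and a termination criterion (here simply a fixed number of generations), together with a mutation rate of 0.1 and a crossover rate of 0.7 as recorded later in the algorithm description. The operators (binary tournament selection, uniform crossover, clipped Gaussian mutation, elitism) and `toy_fitness` are assumptions chosen for brevity, not necessarily those of the actual plugin.

```python
import random
from typing import Callable, Dict, List, Tuple

Params = Dict[str, float]
Bounds = Dict[str, Tuple[float, float]]


def genetic_algorithm(fitness: Callable[[Params], float], bounds: Bounds,
                      population_size: int = 20, generations: int = 30,
                      mutation_rate: float = 0.1, crossover_rate: float = 0.7,
                      seed: int = 1) -> Tuple[Params, List[float]]:
    """Maximize `fitness`; return the best setting and the best-so-far fitness per generation."""
    rng = random.Random(seed)

    def random_individual() -> Params:
        return {k: rng.uniform(lo, hi) for k, (lo, hi) in bounds.items()}

    def tournament(scored: List[Tuple[float, Params]]) -> Params:
        # Binary tournament: the fitter of two randomly drawn individuals wins.
        a, b = rng.sample(scored, 2)
        return a[1] if a[0] >= b[0] else b[1]

    population = [random_individual() for _ in range(population_size)]
    best_fit, best = float("-inf"), population[0]
    history: List[float] = []
    for _ in range(generations):  # termination criterion: fixed number of generations
        scored = [(fitness(ind), ind) for ind in population]
        gen_fit, gen_ind = max(scored, key=lambda s: s[0])
        if gen_fit > best_fit:
            best_fit, best = gen_fit, gen_ind
        history.append(best_fit)
        next_population = [dict(best)]  # elitism: the best setting survives unchanged
        while len(next_population) < population_size:
            p1, p2 = tournament(scored), tournament(scored)
            child = dict(p1)
            if rng.random() < crossover_rate:
                # Uniform crossover: each parameter is taken from either parent.
                child = {k: (p1[k] if rng.random() < 0.5 else p2[k]) for k in bounds}
            for k, (lo, hi) in bounds.items():
                if rng.random() < mutation_rate:
                    # Gaussian mutation, clipped back into the parameter's range.
                    child[k] = min(hi, max(lo, child[k] + rng.gauss(0.0, 0.1 * (hi - lo))))
            next_population.append(child)
        population = next_population
    return best, history


def toy_fitness(p: Params) -> float:
    """Hypothetical stand-in for running the sub-workflow and reading its AUC."""
    return 1.0 / (1.0 + (p["Gamma"] - 2.0) ** 2 + 0.01 * (p["Cost"] - 5.0) ** 2)


setting, history = genetic_algorithm(toy_fitness, {"Gamma": (0.0, 10.0), "Cost": (0.0, 100.0)})
print(round(history[0], 3), round(history[-1], 3))
```

Because `history` tracks the best fitness seen so far, it never decreases, mirroring the rising "Best Fitness" values shown per generation.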
13. Status quo
• Workflow optimization starts from scratch each time
• Optimization meta-data are lost
Idea: Capture optimization meta-data alongside traditional provenance data
⇒ learn from/extend prior optimization runs
⇒ improve and accelerate the optimization process
14. Research Objects
• Aligned with W3C standards
• Aggregate various resources
• Describe scientific processes in a machine-readable format
• Specified by several ontologies
[Figure: a Research Object linked to the resources it aggregates (…) via ore:aggregates]
16. Optimization Research Object Ontology
[Figure: class diagram of the ontology; legend: rdfs:subClassOf, rdf:Property]
opt:OptimizationResearchObject is a subclass (rdfs:subClassOf) of ro:ResearchObject and aggregates (ore:aggregates) the following resources:
• opt:Algorithm: describes the optimization algorithm and its parameters
• opt:Fitness: describes the fitness functions
• opt:Generation: defines the population size and generation number for an optimization run
• opt:OptimizationRun: represents one result set (sub-workflow, parameters and obtained fitness values)
• opt:SearchSpace: describes the dependencies and parameter constraints
• opt:TerminationCondition: describes the termination condition defined by the user
• opt:Workflow: the workflow that was optimized
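To make the structure concrete, the sketch below writes a small Optimization Research Object as N-Triples statements using plain Python strings rather than an RDF library. The rdf:, rdfs: and ore: URIs are the standard W3C and OAI-ORE namespaces, and the ro: URI is, to our understanding, the Wf4Ever Research Object namespace; the opt: namespace URI and all instance URIs are hypothetical placeholders, and only rdf:type and ore:aggregates are modelled.

```python
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
RDFS_SUBCLASS_OF = "http://www.w3.org/2000/01/rdf-schema#subClassOf"
RO = "http://purl.org/wf4ever/ro#"
ORE = "http://www.openarchives.org/ore/terms/"
OPT = "http://example.org/opt#"                    # hypothetical placeholder for the opt: namespace
BASE = "http://example.org/ros/enm-optimization/"  # hypothetical Research Object URI


def triple(s: str, p: str, o: str) -> str:
    """Format one statement in N-Triples syntax (IRIs only, no literals)."""
    return f"<{s}> <{p}> <{o}> ."


# Class axiom from the diagram: an Optimization RO is a kind of Research Object.
triples = [triple(OPT + "OptimizationResearchObject", RDFS_SUBCLASS_OF, RO + "ResearchObject")]
triples.append(triple(BASE, RDF_TYPE, OPT + "OptimizationResearchObject"))

# One aggregated resource per class in the diagram (a real RO may hold many runs).
for cls in ["Algorithm", "Fitness", "Generation", "OptimizationRun",
            "SearchSpace", "TerminationCondition", "Workflow"]:
    resource = f"{BASE}{cls.lower()}/1"
    triples.append(triple(BASE, ORE + "aggregates", resource))
    triples.append(triple(resource, RDF_TYPE, OPT + cls))

print("\n".join(triples))
```

A production version would use an RDF library and the ontology's published namespace, and would attach literal values (parameters, fitness) through the ontology's own properties.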
17. Algorithm
• Genetic Algorithm
• Mutation rate: 0.1
• Crossover rate: 0.7
18. Search Space
Gamma:
• Type: Double
• Range: 0 to 10
• Constraint: Cost/2 < Gamma (fictional)
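A search-space entry thus combines a type, a range and cross-parameter constraints. A minimal sketch of validating a candidate setting against it, reusing the slide's own Gamma definition and its fictional constraint; the `SEARCH_SPACE` and `CONSTRAINTS` layouts are assumptions for illustration, and Cost is assumed to be defined elsewhere in the search space.

```python
from typing import Callable, Dict, List, Tuple

Params = Dict[str, float]

# Gamma: Double in [0, 10], as on the slide.
SEARCH_SPACE: Dict[str, Tuple[type, float, float]] = {
    "Gamma": (float, 0.0, 10.0),
}
# Fictional dependency from the slide: Cost/2 < Gamma.
CONSTRAINTS: List[Tuple[str, Callable[[Params], bool]]] = [
    ("Cost/2 < Gamma", lambda p: p["Cost"] / 2 < p["Gamma"]),
]


def violations(params: Params) -> List[str]:
    """Return human-readable reasons why `params` lies outside the search space."""
    problems = []
    for name, (kind, low, high) in SEARCH_SPACE.items():
        value = params[name]
        if not isinstance(value, kind):
            problems.append(f"{name} must be {kind.__name__}")
        elif not low <= value <= high:
            problems.append(f"{name}={value} outside [{low}, {high}]")
    for label, check in CONSTRAINTS:
        if not check(params):
            problems.append(f"constraint violated: {label}")
    return problems


print(violations({"Gamma": 3.0, "Cost": 4.0}))   # []
print(violations({"Gamma": 3.0, "Cost": 8.0}))   # ['constraint violated: Cost/2 < Gamma']
print(violations({"Gamma": 12.0, "Cost": 1.0}))  # ['Gamma=12.0 outside [0.0, 10.0]']
```

Storing the constraint label next to the check keeps the search space readable when it is shared with other scientists.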
19. Optimization Run
• Origin of result
• Parameter setting
• Fitness value
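A run record therefore needs exactly these three fields. A minimal sketch using a dataclass serialized to JSON, so that each evaluated result can be stored next to other provenance data and loaded back later; the field names, the `origin` string format and the parameter values are hypothetical, not the ontology's actual properties or data from the study.

```python
import json
from dataclasses import asdict, dataclass
from typing import Dict


@dataclass
class OptimizationRun:
    """One evaluated result: where it came from, the setting, and its fitness."""
    origin: str                   # e.g. sub-workflow plus generation/iteration (hypothetical format)
    parameters: Dict[str, float]  # the parameter setting that was evaluated
    fitness: float                # the fitness value obtained, e.g. AUC


# Hypothetical record; the parameter values are invented for illustration.
run = OptimizationRun(origin="enm-subworkflow/generation-1/iteration-6",
                      parameters={"Gamma": 3.0, "Cost": 4.0}, fitness=0.34)

encoded = json.dumps(asdict(run), sort_keys=True)  # store next to provenance data
restored = OptimizationRun(**json.loads(encoded))  # load it back for reuse
print(restored == run)  # True
```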
20. Taverna Optimization Framework & Plugin
[Figure: Genetic Algorithm Parameter Optimization Plugin view with steps (1) to (4) and the per-generation best fitness values, now also logging the first evaluation: Generation 1, Iteration 1, fitness 0.05]
21. Taverna Optimization Framework & Plugin
[Figure: Genetic Algorithm Parameter Optimization Plugin, logging each evaluation of generation 1]
• Generation 1, Iteration 1: fitness 0.05
• Generation 1, Iteration 2: fitness 0.22
• Generation 1, Iteration 3: fitness 0.27
• Generation 1, Iteration 4: fitness 0.19
• Generation 1, Iteration 5: fitness 0.31
• Generation 1, Iteration 6: fitness 0.34
Best fitness per generation: 0.34 (generation 1), 0.42 (generation 2), 0.48, …, 0.49 (final generation x)
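The log shows how a generation's "Best Fitness" arises: it is the maximum over that generation's evaluated iterations. Assuming the six logged iterations (fitness 0.05, 0.22, 0.27, 0.19, 0.31, 0.34) are all of generation 1 (the population size is not stated), the best-so-far value evolves as follows.

```python
# Fitness values of generation 1, iterations 1 to 6, as logged in the plugin view.
generation_1 = [0.05, 0.22, 0.27, 0.19, 0.31, 0.34]

running_best = []
best = float("-inf")
for fitness in generation_1:
    best = max(best, fitness)  # best-so-far never decreases
    running_best.append(best)

print(running_best)       # [0.05, 0.22, 0.27, 0.27, 0.31, 0.34]
print(max(generation_1))  # 0.34
```

The result matches the best fitness of 0.34 reported for generation 1.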
24. Benefits of sharing and exploiting Optimization Research Objects
• What is the optimal setting? - Reuse optimized settings
• What ranges have been explored? - Adopt previously used parameter ranges
• What algorithm settings were used? - Reuse algorithm settings
• Are there similar optimizations? - Reuse existing results
• Resume the optimization
• Embed optimization provenance into workflow infrastructures to be reused by other scientists
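Several of these benefits can be sketched directly on top of stored run records. The sketch assumes runs are available as (parameters, fitness) pairs, for example loaded from a shared Optimization Research Object; `best_setting`, `explored_ranges` and `seed_population` are hypothetical helpers, not part of the framework, and the stored runs are invented for illustration.

```python
import random
from typing import Dict, List, Tuple

Params = Dict[str, float]
Run = Tuple[Params, float]


def best_setting(runs: List[Run]) -> Params:
    """Reuse: the highest-fitness setting recorded so far."""
    return max(runs, key=lambda r: r[1])[0]


def explored_ranges(runs: List[Run]) -> Dict[str, Tuple[float, float]]:
    """Which parameter ranges have already been explored."""
    names = runs[0][0].keys()
    return {n: (min(p[n] for p, _ in runs), max(p[n] for p, _ in runs)) for n in names}


def seed_population(runs: List[Run], size: int, seed: int = 0) -> List[Params]:
    """Resume: start a new population from the best stored settings, topped up with perturbed copies."""
    rng = random.Random(seed)
    ranked = [p for p, _ in sorted(runs, key=lambda r: r[1], reverse=True)]
    population = [dict(p) for p in ranked[:size]]
    while len(population) < size:
        parent = rng.choice(ranked[: max(1, size // 2)])
        # Small multiplicative jitter; no re-validation against the search space here.
        population.append({k: v * rng.uniform(0.9, 1.1) for k, v in parent.items()})
    return population


# Hypothetical stored runs.
stored = [({"Gamma": 1.0, "Cost": 2.0}, 0.42),
          ({"Gamma": 3.0, "Cost": 4.0}, 0.49),
          ({"Gamma": 6.0, "Cost": 1.0}, 0.34)]
print(best_setting(stored))                  # {'Gamma': 3.0, 'Cost': 4.0}
print(explored_ranges(stored))               # {'Gamma': (1.0, 6.0), 'Cost': (1.0, 4.0)}
print(len(seed_population(stored, size=5)))  # 5
```

Seeding a new run with the best known settings is what lets an optimization resume instead of starting from scratch each time.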
25. Conclusion
• Scientific workflows are hard to configure
• Optimization can help, but meta-data get lost
• Extend Research Objects
• Build a new Optimization Research Object Ontology
• Reuse optimization meta-data to speed up optimization
• Share with the community in workflow infrastructures
Outlook: How to learn from similar workflows?