Workflow Partitioning and Resource Provisioning for Distributed Scientific Workflows
1. Integration of Workflow Partitioning and Resource Provisioning
Weiwei Chen, Ewa Deelman
{wchen,deelman}@isi.edu
Information Sciences Institute
University of Southern California
CCGrid 2012, Ottawa, Canada
2. Outline
• Introduction
• System Overview
• Solution
– Heuristics
– Genetic Algorithms
– Ant Colony Optimization
• Evaluation
– Heuristics
• Related Work
• Q&A
3. Introduction
• Scientific Workflows
– A set of jobs and the dependencies between them.
– DAG (Directed Acyclic Graph), where nodes represent computation and directed edges represent data flow dependencies.
• Pegasus Workflow Management System
– Workflow Planner: Pegasus
• Abstract Workflow: portable, execution-site independent
• Concrete Workflow: bound to specific sites
– Workflow Engine: DAGMan
– Resource Provisioner: Wrangler
– Execution/Scheduling System: Condor/Condor-G
– Environment: Grids, Clouds, Clusters, many-cores
[Diagram: an example DAG in which Job1 fans out to Job2, Job3, and Job4, which fan in to Job5.]
4. Introduction
• Background
– Large scale workflows require multiple execution sites to run.
– The entire CyberShake earthquake science workflow has 16,000 sub-workflows; each sub-workflow has ~24,000 jobs and requires ~58GB of storage.
– A Montage workflow covering an 8-degree square of sky has ~10,000 jobs and requires ~57GB of data; the Galactic Plane survey covers 360 degrees along the plane and +/-20 degrees on either side of it.
Figure 1.1 Output of the Montage workflow. The image above was recently created to verify a bar in the spiral galaxy M31.
Figure 1.2 CyberShake workflow and example output for the Southern California area.
5. Single Site
[Architecture diagram: on a single site, the Workflow Planner turns the abstract workflow (DAX) into an executable DAG for the Workflow Engine and Job Scheduler, supported by a VM Provisioner and a Data Staging service.]
9. Multiple Sites, Partitioning
[Architecture diagram: for multiple sites, a Workflow Partitioner splits the DAX into sub-workflow DAXes and a workflow scheduler assigns them to execution sites; each site runs its own Workflow Planner, Workflow Engine, Job Scheduler, VM Provisioner, and Data Staging service.]
10. Solution
• A hierarchical workflow
– It contains workflows (sub-workflows) as its jobs.
– Sub-workflows are planned at the execution sites and matched to the resources there.
• Workflow Partitioning vs. Job Grouping/Clustering
– Heterogeneous Environments
• MPIDAG, Condor DAG, etc.
– Data Placement Services
• Bulk Data Transfer
11. Solution
• Resource Provisioning
– Virtual Cluster Provisioning
– The number of resources and the types of VM instances (worker node, master node, and I/O node) are the parameters that indicate the storage and computational capability of a virtual cluster (an illustrative spec is sketched below).
– The topology and structure of the virtual cluster should balance the load across its services (scheduling service, data transfer service, etc.) and avoid bottlenecks.
– On grids, the data transfer service is usually already available and does not need further configuration.
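As an illustration of these provisioning parameters, a minimal sketch of a virtual-cluster specification follows; the field names and values are hypothetical and this is not Wrangler's actual request format.

# Hypothetical virtual-cluster specification (illustrative only, not
# Wrangler's real syntax). Node counts and VM instance types express the
# storage and computational capability; separate master and I/O nodes keep
# the scheduling and data-transfer services from becoming a bottleneck.
virtual_cluster = {
    "site": "example-cloud",
    "nodes": {
        "master": {"count": 1,  "instance_type": "m1.large"},   # scheduling service
        "io":     {"count": 2,  "instance_type": "m1.xlarge"},  # data transfer service
        "worker": {"count": 20, "instance_type": "c1.medium"},  # computation
    },
    "storage_gb": 70,  # must cover the sub-workflow's data footprint
}

def worker_slots(spec):
    """A rough proxy for the cluster's computational capability."""
    return spec["nodes"]["worker"]["count"]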
12. Data Transfer across Sites
• Pre- and post-scripts that transfer data before and after job execution
• A single data transfer job on demand
• A bulk data transfer job
– merges individual transfers into one job (sketched below)
[Diagram: computation jobs vs. data transfer jobs across sites.]
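As a rough illustration of the bulk option, the sketch below merges individual cross-site file transfers into one transfer job per site pair; the data structures are assumptions, not Pegasus's actual transfer interface.

from collections import defaultdict

def bulk_transfers(transfers):
    """Merge individual file transfers into one bulk transfer job per
    (source site, destination site) pair. `transfers` is a list of
    (src_site, dst_site, filename) tuples -- a sketch only."""
    grouped = defaultdict(list)
    for src, dst, filename in transfers:
        grouped[(src, dst)].append(filename)
    # One transfer job per site pair instead of one job per file.
    return [{"src": src, "dst": dst, "files": files}
            for (src, dst), files in grouped.items()]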
13. Backward Search Algorithm
• Targeting a workflow with a fan-in-fan-out
structure
• The search operation involves three steps (sketched below). It starts from the sink job and proceeds backward.
– First, check whether it is safe to add the whole fan structure into the sub-workflow (aggressive search). If not, a cut is issued between the fan-in job and its parents to avoid cyclic dependencies and to increase parallelism.
– Second, a neutral search is performed on its parent jobs, which includes all of its predecessors until the search reaches a fan-out job.
– Third, if this partition is still too large, a conservative search is performed that includes all of its predecessors until it reaches a fan-in job or a fan-out job.
Figure 2.3 Search Operation
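A minimal sketch of the three search modes, assuming the workflow is given as parent/child adjacency maps and per-job data sizes; the helper functions and their names are ours, not the authors' implementation.

from collections import deque

def all_predecessors(job, parents):
    """Every ancestor of `job` (used by the aggressive search)."""
    seen, queue = set(), deque(parents[job])
    while queue:
        j = queue.popleft()
        if j not in seen:
            seen.add(j)
            queue.extend(parents[j])
    return seen

def predecessors_until(job, parents, children, stop_at_fan_in):
    """Ancestors of `job`, but stop expanding past a fan-out job
    (and past a fan-in job too when stop_at_fan_in is True)."""
    found, queue = set(), deque(parents[job])
    while queue:
        j = queue.popleft()
        if j in found:
            continue
        found.add(j)
        fan_out = len(children[j]) > 1
        fan_in = len(parents[j]) > 1
        if fan_out or (stop_at_fan_in and fan_in):
            continue  # include this job but do not search beyond it
        queue.extend(parents[j])
    return found

def backward_search(sink, parents, children, sizes, limit, used):
    """Backward search from a fan-in job `sink` under a storage limit:
    1) aggressive: the whole fan structure; 2) neutral: stop at fan-out
    jobs; 3) conservative: stop at fan-in or fan-out jobs."""
    def fits(jobs):
        return used + sum(sizes[j] for j in jobs | {sink}) <= limit

    fan = all_predecessors(sink, parents)  # aggressive search
    if fits(fan):
        return fan | {sink}
    # Otherwise a cut is issued between the fan-in job and its parents.
    neutral = predecessors_until(sink, parents, children, stop_at_fan_in=False)
    if fits(neutral):
        return neutral | {sink}
    return predecessors_until(sink, parents, children, stop_at_fan_in=True) | {sink}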
14. Heuristics (Storage Constraints)
• Heuristics I
– Dependencies between sub-workflows should be reduced since they
represent data transfer between sites.
– Usually jobs that have parent-child relationships share a lot of data. It’s
reasonable to schedule such jobs into the same sub-workflow.
– Heuristic I only checks three types of nodes: the fan-out job, the fan-in job, and the parents of the fan-in job, and searches for potential candidate jobs that have parent-child relationships between them.
– The check operation verifies whether a job and its potential candidate jobs can be added to a sub-workflow without violating the constraints (sketched below).
– Our algorithm reduces the time complexity of the check operations by a factor of n, where n equals the average depth of the fan-in-fan-out structure.
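A minimal sketch of the check operation under a storage constraint, matching the Sum(CL + J + P) test used in the worked examples on the following slides; the example data sizes are hypothetical.

def check(job, candidates, partition, sizes, storage_limit):
    """True if `job` plus its candidate list can join `partition` without
    exceeding the site's storage constraint (a sketch)."""
    jobs = set(candidates) | set(partition) | {job}
    return sum(sizes[j] for j in jobs) <= storage_limit

# Hypothetical data sizes (GB) checked against a 50 GB constraint.
sizes = {"J8": 15, "J2": 5, "J3": 10, "J6": 10}
print(check("J8", ["J2", "J3", "J6"], [], sizes, 50))  # True: 40 <= 50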
15. Heuristic I: Worked Example
[Animation residue: Heuristic I applied to an example workflow with jobs J1–J10 (J1 fans out to J2–J5; J8 and J9 are fan-in jobs; J10 is the sink). Starting from J10, aggressive and less aggressive searches build candidate lists such as {J1, J2, J3, J4, J5, J6, J7, J8, J9}, {J4, J5, J7}, and {J2, J3, J6}; the check operation compares Sum(CL + J + P) (e.g. 100, 80, 40, 10) against the storage constraint of 50 before committing a partition. Legend: Scheduled, Being Examined, Partition Candidate, Not Examined.]
16. Heuristics/Hints
• Two other heuristics
– Heuristic II adds a job to a sub-workflow if all of its unscheduled children can be added to that sub-workflow.
– For a job with multiple children, Heuristic III adds it to a sub-workflow only after all of its children have been scheduled (both rules are sketched below).
Figure 2.4 Heuristic I, II, and III (from left to right) partition an example workflow into different sub-workflows.
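A minimal sketch of the two admission rules described above; `children` is a child-adjacency map, `scheduled` the set of already-partitioned jobs, and the storage check mirrors the one used by Heuristic I. All names here are ours, not the authors'.

def admissible_ii(job, children, scheduled, partition, sizes, limit):
    """Heuristic II: add `job` only if all of its unscheduled children can be
    added to the same sub-workflow without breaking the storage constraint."""
    unscheduled = [c for c in children[job] if c not in scheduled]
    total = sum(sizes[j] for j in set(partition) | set(unscheduled) | {job})
    return total <= limit

def admissible_iii(job, children, scheduled, partition, sizes, limit):
    """Heuristic III: a job with multiple children becomes eligible only after
    all of its children have been scheduled; then the storage check applies."""
    if len(children[job]) > 1 and any(c not in scheduled for c in children[job]):
        return False
    return sum(sizes[j] for j in set(partition) | {job}) <= limit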
17. Heuristic II: Check Unscheduled Children (Worked Example)
[Animation residue: the first step is similar to Heuristic I and puts J10 into P1; similarly to J8, jobs J2, J3, and J6 are put into P2. Candidate lists such as {J6} and {J4, J5, J7, J9} are checked with Sum(CL + J + P) (e.g. 20, 50, 90) against the storage constraint of 50. Final results: P1={J10}, P2={J8, J2, J3, J6}, P3={J1, J4, J5, J7, J9}. Legend: Scheduled, Being Examined, Partition Candidate, Not Examined.]
18. Heuristic III: All Children Should Be Examined (Worked Example)
[Animation residue: the first step is similar to Heuristic I and puts J10 into P1; similarly to J8, jobs J2, J3, and J6 are put into P2; then J9, J7, J4, J5, and J1 go into P3. During the check, J1 has a non-examined child (J4), while J6 has no non-examined children. Final results: P1={J10}, P2={J8, J2, J3, J6}, P3={J1, J4, J5, J7, J9}. Legend: Scheduled, Being Examined, Partition Candidate, Not Examined.]
22. Scheduling Sub-workflows
• Estimating the overall runtime of sub-workflows (the three estimators are sketched below)
– Critical Path
– Average CPU Time: the cumulative CPU time of all jobs divided by the number of available resources.
– Earliest Finish Time: the moment the last sink job completes.
• Provisioning resources based on the estimation
results
• Scheduling Sub-workflows on Sites
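A minimal sketch of the three estimators named above, assuming per-job runtimes and a parent-adjacency map; the earliest-finish-time version here is a simple lower-bound approximation, not the exact estimator from the paper.

def critical_path(runtimes, parents):
    """Length of the longest runtime-weighted path through the workflow."""
    finish = {}
    def finish_time(job):
        if job not in finish:
            finish[job] = runtimes[job] + max(
                (finish_time(p) for p in parents[job]), default=0.0)
        return finish[job]
    return max(finish_time(j) for j in runtimes)

def average_cpu_time(runtimes, num_resources):
    """Cumulative CPU time of all jobs divided by the available resources."""
    return sum(runtimes.values()) / num_resources

def earliest_finish_time(runtimes, parents, num_resources):
    """Approximate the moment the last sink job completes as the larger of
    the critical path and the ideal load-balanced runtime (a sketch)."""
    return max(critical_path(runtimes, parents),
               average_cpu_time(runtimes, num_resources))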
23. Evaluation: Heuristics
• In this example, we aim to reduce data movement and
makespan with storage constraints.
• Workflows used:
– Montage: an astronomy application, I/O intensive, ~10,000 tasks and 57GB of data.
– CyberShake: a seismology application, memory intensive, ~24,000 tasks and 58GB of data.
– Epigenomics: a bioinformatics application, CPU intensive, ~1,500 tasks and 23GB of data.
– Each workflow was run five times.
24. Performance: CyberShake
• Heuristic II produces 5 sub-workflows with 10 dependencies between them; Heuristic I produces 4 sub-workflows and 3 dependencies; Heuristic III produces 4 sub-workflows and 5 dependencies.
• Heuristics II and III simply add a job if it does not violate the storage or cross-dependency constraints.
• Heuristic I performs better in terms of
both runtime reduction and disk usage
because it tends to put the whole fan
structure into the same sub-workflow.
25. Performance: CyberShake
• Storage Constraints
• With more sites and partitions, data movement increases even though computational capability improves.
• The CyberShake workflow across two sites with a storage
constraint of 35GB performs best.
26. Performance of Estimator and Scheduler
• Three estimators and two schedulers are evaluated with the CyberShake workflow.
• The combination of the EFT estimator and the HEFT scheduler (EFT+HEFT) performs best (by about 10%).
• The HEFT scheduler is slightly better than the MinMin scheduler with all three estimators.
27. Publications
Integration of Workflow Partitioning and Resource Provisioning, Weiwei Chen, Ewa Deelman,
accepted, The 12th IEEE/ACM International Symposium on Cluster, Cloud and Grid
Computing (CCGrid 2012), Doctoral Symposium, Ottawa, Canada, May 13-15, 2012
Improving Scientific Workflow Performance using Policy Based Data Placement, Muhammad
Ali Amer, Ann Chervenak and Weiwei Chen, accepted, 2012 IEEE International Symposium on
Policies for Distributed Systems and Networks, Chapel Hill, NC, July 2012
Fault Tolerant Clustering in Scientific Workflows, Weiwei Chen, Ewa Deelman, IEEE
International Workshop on Scientific Workflows (SWF), accepted, in conjunction with the 8th IEEE
World Congress on Services, Honolulu, Hawaii, Jun 2012
Workflow Overhead Analysis and Optimizations, Weiwei Chen, Ewa Deelman, The 6th
Workshop on Workflows in Support of Large-Scale Science, in conjunction with
Supercomputing 2011, Seattle, Nov 2011
Partitioning and Scheduling Workflows across Multiple Sites with Storage Constraints, Weiwei
Chen, Ewa Deelman, 9th International Conference on Parallel Processing and Applied
Mathematics (PPAM 2011), Poland, Sep 2011
28. Future Work
• GA and ACO: Efficiency
• Provisioning Algorithms
• Other Algorithms
29. Q&A
Thank you!
For further info:
pegasus.isi.edu
www.isi.edu/~wchen