Integration of Workflow Partitioning and Resource Provisioning

Weiwei Chen, Ewa Deelman
{wchen,deelman}@isi.edu

Information Sciences Institute
University of Southern California

CCGrid 2012, Ottawa, Canada
Outline	

•  Introduction	

•  System Overview	

•  Solution	

   –  Heuristics	

   –  Genetic Algorithms	

   –  Ant Colony Optimization	

•  Evaluation	

   –  Heuristics	

•  Related Work	

•  Q&A	

Introduction	

•  Scientific Workflows

   –  A set of jobs and the dependencies between them.

   –  A DAG (Directed Acyclic Graph), where nodes represent computation and
      directed edges represent data-flow dependencies.

[Diagram: an example workflow in which Job1 fans out to Job2, Job3, and Job4,
which fan in to Job5.]

•  Pegasus Workflow Management System

   –  Workflow Planner: Pegasus

      •  Abstract Workflow: portable, execution-site independent

      •  Concrete Workflow: bound to specific sites

   –  Workflow Engine: DAGMan

   –  Resource Provisioner: Wrangler

   –  Execution/Scheduling System: Condor/Condor-G

   –  Environment: Grids, Clouds, Clusters, many-cores
Introduction	

•  Background	

   –  Large-scale workflows require multiple execution sites to run.

   –  The entire CyberShake earthquake science workflow has 16,000 sub-
      workflows; each sub-workflow has ~24,000 jobs and requires ~58GB of
      storage.

   –  A Montage workflow covering an 8-degree square of sky has ~10,000 jobs
      and requires ~57GB of data; the Galactic Plane survey covers 360
      degrees along the plane and +/-20 degrees on either side of it.





Figure 1.1: Output of the Montage workflow. The image above was recently
created to verify a bar in the spiral galaxy M31.

Figure 1.2: CyberShake workflow and example output for the Southern
California area.
Single Site	

[Diagram: at a single site, the Workflow Planner converts the DAX into a DAG,
the Workflow Engine releases jobs, and the Job Scheduler runs them, supported
by a VM Provisioner and a Data Staging service.]
Single Site	

•  Constraints/Concerns	

   –  Storage systems	

   –  File systems	

   –  Data transfer services	

   –  Data constraints	

   –  Services constraints	





Multiple Sites, No Partitioning	

[Diagram: with multiple sites and no partitioning, one Workflow Planner /
Workflow Engine / Job Scheduler stack dispatches jobs across the sites, each
of which has its own VM Provisioner and Data Staging service.]
Multiple Sites, No Partitioning	

     •  Constraints/Concerns	

           –  Job migration	

           –  Load balancing	

           –  Overhead	

           –  Cost	

           –  Deadline	

           –  Resource utilizations	

     	




Multiple Sites, Partitioning	

[Diagram: with partitioning, a Workflow Partitioner splits the DAX into
sub-DAXes and a Workflow Scheduler assigns them to sites; each site then runs
its own Workflow Planner, Workflow Engine, and Job Scheduler, with a local VM
Provisioner and Data Staging service.]
Solution	

•  A hierarchical workflow	

      Ø  It contains workflows (sub-workflow) as its jobs.	

      Ø  Sub-workflows are planned at the execution sites and
          matched to the resources in them. 	

•  Workflow Partitioning vs Job Grouping/Clustering	

      Ø  Heterogeneous Environments	

         §  MPIDAG, Condor DAG, etc. 	

      Ø  Data Placement Services 	

         §  Bulk Data Transfer	

	


Solution	

•  Resource Provisioning	

  Ø  Virtual Cluster Provisioning	

  Ø  The number of resources and the type of VM instances
      (worker node, master node and I/O node) are the
      parameters indicating the storage and computational
      capability of a virtual cluster. 	

  Ø  The topology and structure of a virtual cluster: balance the
      load in different services (scheduling service, data transfer
      service, etc.) and avoid a bottleneck. 	

  Ø  On grids, usually the data transfer service is already
      available and does not need further configuration. 	


Data Transfer across Sites	

•  Pre-/post-scripts that transfer data before and after job
   execution

•  A single data transfer job on demand	

•  A bulk data transfer job 	

      Ø  merge data transfer	



[Diagram: an example partitioned workflow with jobs labeled as either
Computation or Data Transfer.]
Backward Search Algorithm 	

	

•  Targeting a workflow with a fan-in-fan-out
   structure	

•  The search operation involves three steps (aggressive, neutral,
   and conservative searches). It starts from the sink job and
   proceeds backward.

    –  First, check if it’s safe to add the whole fan
       structure into the sub-workflow (aggressive
       search). 	

    –  If not, a cut is issued between this fan-in job and
       its parents to avoid cycle dependency and
       increase parallelism.	

    –  Second, a neutral search is performed on its
       parent jobs, which include all of its predecessors
       until the search reaches a fan-out job. 	

    –  If this partition is still too large, a conservative
       search is performed that includes all of its
       predecessors until it reaches a fan-in job or a fan-
       out job. 	

Figure 2.3: Search operation.




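The three search levels can be sketched in Python. The function names, the
parent-map input format, and the acceptance rule (a candidate set is kept
only if its total data size fits the storage constraint) are assumptions of
this sketch, not the paper's exact pseudocode:

```python
from collections import defaultdict

def backward_search(parents, sink, capacity, size):
    """Sketch of the backward search: starting from the sink job, try an
    aggressive, then a neutral, then a conservative search, and return the
    first candidate set whose total data size fits the storage constraint.

    parents: dict mapping each job to the list of its parent jobs
    size:    dict mapping each job to its data size
    """
    children = defaultdict(list)
    for job, ps in parents.items():
        for p in ps:
            children[p].append(job)

    def is_fan_out(j):  # a fan-out job has more than one child
        return len(children[j]) > 1

    def is_fan_in(j):   # a fan-in job has more than one parent
        return len(parents.get(j, [])) > 1

    def collect(stop):
        # Walk backward from the sink, skipping jobs at which `stop` holds.
        seen, stack = {sink}, list(parents.get(sink, []))
        while stack:
            j = stack.pop()
            if j in seen or stop(j):
                continue
            seen.add(j)
            stack.extend(parents.get(j, []))
        return seen

    searches = [
        lambda j: False,                         # aggressive: whole fan structure
        is_fan_out,                              # neutral: stop at fan-out jobs
        lambda j: is_fan_in(j) or is_fan_out(j)  # conservative: stop at both
    ]
    for stop in searches:
        candidate = collect(stop)
        if sum(size[j] for j in candidate) <= capacity:
            return candidate
    return {sink}  # nothing fits: a cut is issued above the sink job
```

On a diamond-shaped workflow, lowering the capacity makes the search fall
back from the aggressive to the narrower levels.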
Heuristics (Storage Constraints)	

	

•  Heuristics I	

    –  Dependencies between sub-workflows should be reduced since they
       represent data transfer between sites. 	

    –  Usually jobs that have parent-child relationships share a lot of data. It’s
       reasonable to schedule such jobs into the same sub-workflow. 	

    –  Heuristic I only checks three types of nodes (the fan-out job, the
       fan-in job, and the parents of the fan-in job) and searches for the
       potential candidate jobs that have parent-child relationships between
       them.

    –  Check operation means checking whether one job and its potential
       candidate jobs can be added to a sub-workflow without violating
       constraints. 	

    –  Our algorithm reduces the time complexity of check operations n-fold,
       where n equals the average depth of the fan-in-fan-out structure.



Heuristic I

[Slide: worked example of Heuristic I on a workflow of jobs J1–J10 with a
storage constraint of 50. The aggressive search from J10 takes the whole
candidate list {J1, ..., J9} (sum 100 > 50), so it fails and a cut is issued;
less aggressive searches then fill the partitions, yielding P1={J10},
P2={J2, J3, J6, J8}, P3={J4, J5, J7, J9}, P4={J1}.]
  
Heuristics/Hints	

	

•  Two other heuristics	

    –  Heuristic II adds a job to a sub-workflow if all of its unscheduled
       children can be added to that sub-workflow.

    –  For a job with multiple children, Heuristic III adds it to a sub-
       workflow only when all of its children have been scheduled.
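As a sketch (the function name and data layout are hypothetical), the
Heuristic II admission test can be written as:

```python
def heuristic2_can_add(job, partition, children, scheduled, size, capacity):
    """Heuristic II: `job` may join `partition` only if the job together
    with all of its not-yet-scheduled children still fits the storage
    constraint of the sub-workflow."""
    group = {job} | {c for c in children.get(job, ()) if c not in scheduled}
    return sum(size[j] for j in group | partition) <= capacity
```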





Figure 2.4: Heuristic I, II, and III (from left to right) partition an
example workflow into different sub-workflows.
  




Heuristic II: check unscheduled children

[Slide: worked example of Heuristic II on the same workflow. The first step,
as in Heuristic I, puts J10 into P1. J8 and its unscheduled child J6 are then
added to P2, followed by J2 and J3; the remaining jobs go into P3. Final
results: P1={J10}, P2={J8, J2, J3, J6}, P3={J1, J4, J5, J7, J9}.]
  
Heuristic III: all children should be examined

[Slide: worked example of Heuristic III on the same workflow. As before, J10
goes into P1 and J2, J3, J6, J8 into P2; J1, J4, J5, J7, J9 go into P3 (e.g.
J1 is held back while its child J4 is not yet examined). Final results:
P1={J10}, P2={J8, J2, J3, J6}, P3={J1, J4, J5, J7, J9}.]
  
Genetic Algorithm

[Chromosome encoding: one gene per job (Job1–Job5) and per VM (VM1–VM6),
each gene holding the site the job or VM is assigned to, e.g.
jobs → (1, 2, 2, 1, 2) and VMs → (2, 2, 2, 1, 1, 1).]




Fitness Functions	


      Makespan     Cost
Min(!          +        )
      Deadline    Budget
Min(Makespan), Cost  Budget
 Min(Cost), Makespan  Deadline

     With Constraints	
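The three fitness functions translate directly into code; the infinite
penalty for constraint violations is one common choice, assumed here rather
than taken from the slides:

```python
def fitness_combined(makespan, cost, deadline, budget):
    # Minimize normalized makespan plus normalized cost.
    return makespan / deadline + cost / budget

def fitness_min_makespan(makespan, cost, budget):
    # Minimize makespan subject to cost <= budget.
    return makespan if cost <= budget else float('inf')

def fitness_min_cost(makespan, cost, deadline):
    # Minimize cost subject to makespan <= deadline.
    return cost if makespan <= deadline else float('inf')
```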





Ant Colony Optimization	

[Encoding: global optimization works on the full chromosome (Job1–Job5,
VM1–VM6) across both sites; local optimization then refines the assignment
within each site, e.g. Job1–Job3 with VM1–VM3 on site 1 and Job4–Job5 with
VM4–VM6 on site 2.]




Scheduling Sub-workflows	

•  Estimating the overall runtime of sub-workflows	

   –  Critical Path 	

   –  Average CPU Time is cumulative CPU time of all jobs
      divided by the number of available resources. 	

   –  Earliest Finish Time is the moment the last sink job
      completes.

•  Provisioning resources based on the estimation
   results	

•  Scheduling Sub-workflows on Sites	
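The first two estimators above can be sketched as follows (the DAG layout,
given here as a parent map, is an assumption of this sketch):

```python
def critical_path(runtimes, parents):
    """Length of the longest runtime-weighted chain through the DAG."""
    memo = {}
    def finish(j):
        if j not in memo:
            memo[j] = runtimes[j] + max(
                (finish(p) for p in parents.get(j, [])), default=0.0)
        return memo[j]
    return max(finish(j) for j in runtimes)

def average_cpu_time(runtimes, n_resources):
    """Cumulative CPU time of all jobs divided by the number of
    available resources."""
    return sum(runtimes.values()) / n_resources
```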




Evaluation: Heuristics	

•  In this example, we aim to reduce data movement and
   makespan with storage constraints.	

•  Workflows used:	

   –  Montage: an astronomy application, I/O intensive, ~10,000 tasks and
      57GB of data.

   –  CyberShake: a seismology application, memory intensive, ~24,000 tasks
      and 58GB of data.

   –  Epigenomics: a bioinformatics application, CPU intensive, ~1,500 tasks
      and 23GB of data.

   –  Each workflow was run five times.



Performance: CyberShake	

•  Heuristic II produces 5 sub-workflows
   with 10 dependencies between them.
   Heuristic I produces 4 sub-workflows
   and 3 dependencies. Heuristic III
   produces 4 sub-workflows and 5
   dependencies.

•  Heuristic II and III simply add a job if
   it doesn’t violate the storage or cross
   dependency constraints. 	

•  Heuristic I performs better in terms of
   both runtime reduction and disk usage
   because it tends to put the whole fan
   structure into the same sub-workflow. 	

Performance: CyberShake	

•  Storage Constraints	

•  With more sites and partitions, data movement increases
   although computational capability improves.

•  The CyberShake workflow across two sites with a storage
   constraint of 35GB performs best. 	





Performance of Estimator and Scheduler	

 •  Three estimators and two schedulers are evaluated with the
    CyberShake workflow.

 •  The combination of the EFT estimator and the HEFT scheduler
    (EFT+HEFT) performs best (~10% improvement).

 •  The HEFT scheduler is slightly better than the MinMin scheduler
    with all three estimators.





Publications	

    Integration of Workflow Partitioning and Resource Provisioning, Weiwei Chen, Ewa Deelman,
accepted, The 12th IEEE/ACM International Symposium on Cluster, Cloud and Grid
Computing (CCGrid 2012), Doctoral Symposium, Ottawa, Canada, May 13-15, 2012	

	

    Improving Scientific Workflow Performance using Policy Based Data Placement, Muhammad
Ali Amer, Ann Chervenak and Weiwei Chen, accepted, 2012 IEEE International Symposium on
Policies for Distributed Systems and Networks, Chapel Hill, NC, July 2012	

	

     Fault Tolerant Clustering in Scientific Workflows, Weiwei Chen, Ewa Deelman, IEEE
International Workshop on Scientific Workflows (SWF), accepted, in conjunction with 8th IEEE
World Congress on Services, Honolulu, Hawaii, Jun 2012

	

    Workflow Overhead Analysis and Optimizations, Weiwei Chen, Ewa Deelman, The 6th
Workshop on Workflows in Support of Large-Scale Science, in conjunction with
Supercomputing 2011, Seattle, Nov 2011	

	

    Partitioning and Scheduling Workflows across Multiple Sites with Storage Constraints, Weiwei
Chen, Ewa Deelman, 9th International Conference on Parallel Processing and Applied
Mathematics (PPAM 2011), Poland, Sep 2011	



Future Work	


•  GA and ACO: Efficiency	

•  Provisioning Algorithms	

•  Other Algorithms	





QA	

   Thank you!	

 For further info: 
  pegasus.isi.edu	

www.isi.edu/~wchen	





Introduction to Cloud Data Center and Network IssuesIntroduction to Cloud Data Center and Network Issues
Introduction to Cloud Data Center and Network Issues
 
YARN Ready: Integrating to YARN with Tez
YARN Ready: Integrating to YARN with Tez YARN Ready: Integrating to YARN with Tez
YARN Ready: Integrating to YARN with Tez
 
Building a heterogeneous Hadoop Olap system with Microsoft BI stack. PABLO DO...
Building a heterogeneous Hadoop Olap system with Microsoft BI stack. PABLO DO...Building a heterogeneous Hadoop Olap system with Microsoft BI stack. PABLO DO...
Building a heterogeneous Hadoop Olap system with Microsoft BI stack. PABLO DO...
 

Último

Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebUiPathCommunity
 
DSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningDSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningLars Bell
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfAlex Barbosa Coqueiro
 
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024BookNet Canada
 
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxPasskey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxLoriGlavin3
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfAddepto
 
What is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdfWhat is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdfMounikaPolabathina
 
Advanced Computer Architecture – An Introduction
Advanced Computer Architecture – An IntroductionAdvanced Computer Architecture – An Introduction
Advanced Computer Architecture – An IntroductionDilum Bandara
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupFlorian Wilhelm
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada
 
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Mark Simos
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Commit University
 
From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .Alan Dix
 
TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024Lonnie McRorey
 
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsRizwan Syed
 
The State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxThe State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxLoriGlavin3
 
What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024Stephanie Beckett
 
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxMerck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxLoriGlavin3
 
How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.Curtis Poe
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsSergiu Bodiu
 

Último (20)

Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio Web
 
DSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningDSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine Tuning
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdf
 
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
 
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxPasskey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdf
 
What is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdfWhat is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdf
 
Advanced Computer Architecture – An Introduction
Advanced Computer Architecture – An IntroductionAdvanced Computer Architecture – An Introduction
Advanced Computer Architecture – An Introduction
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project Setup
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!
 
From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .
 
TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024
 
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL Certs
 
The State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxThe State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptx
 
What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024
 
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxMerck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
 
How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platforms
 

Workflow Partitioning and Resource Provisioning for Distributed Scientific Workflows

  • 1. Integration of Workflow Partitioning and Resource Provisioning. Weiwei Chen, Ewa Deelman ({wchen,deelman}@isi.edu), Information Sciences Institute, University of Southern California. CCGrid 2012, Ottawa, Canada.
  • 2. Outline: Introduction; System Overview; Solution (Heuristics, Genetic Algorithms, Ant Colony Optimization); Evaluation (Heuristics); Related Work; Q&A.
  • 3. Introduction
– Scientific Workflows: a set of jobs and the dependencies between them; a DAG (Directed Acyclic Graph), where nodes represent computation and directed edges represent data-flow dependencies.
– Pegasus Workflow Management System: Workflow Planner: Pegasus (an abstract workflow is portable and execution-site independent; a concrete workflow is bound to specific sites); Workflow Engine: DAGMan; Resource Provisioner: Wrangler; Execution/Scheduling System: Condor/Condor-G; Environment: grids, clouds, clusters, many-cores.
– [Figure: example DAG in which Job1 fans out to Job2, Job3, and Job4, which fan in to Job5.]
  • 4. Introduction: Background
– Large-scale workflows require multiple execution sites to run.
– The entire CyberShake earthquake science workflow has 16,000 sub-workflows; each sub-workflow has ~24,000 jobs and requires ~58 GB.
– A Montage workflow covering an 8-degree square of sky has ~10,000 jobs and requires ~57 GB of data; the Galactic Plane mosaic covers 360 degrees along the plane and +/-20 degrees on either side of it.
– Figure 1.1: Output of the Montage workflow; the image was recently created to verify a bar in the spiral galaxy M31. Figure 1.2: CyberShake workflow and example output for the Southern California area.
  • 5. Single Site. [Architecture diagram: a DAX goes to the Workflow Planner, which produces a DAG for the Workflow Engine and Job Scheduler; a VM Provisioner and Data Staging serve the single execution site.]
  • 6. Single Site: Constraints/Concerns. Storage systems, file systems, data transfer services, data constraints, services constraints.
  • 7. Multiple Sites, No Partitioning. [Architecture diagram: the same Workflow Planner / Workflow Engine / Job Scheduler pipeline now feeds two execution sites, each with its own VM Provisioner and Data Staging.]
  • 8. Multiple Sites, No Partitioning: Constraints/Concerns. Job migration, load balancing, overhead, cost, deadline, resource utilization.
  • 9. Multiple Sites, Partitioning. [Architecture diagram: a Workflow Partitioner splits the DAX into sub-workflow DAXes; each sub-workflow is then planned, scheduled, and executed at its own site, with its own VM Provisioner and Data Staging.]
  • 10. Solution
– A hierarchical workflow: it contains workflows (sub-workflows) as its jobs; sub-workflows are planned at the execution sites and matched to the resources there.
– Workflow partitioning vs. job grouping/clustering: heterogeneous environments (MPIDAG, Condor DAG, etc.); data placement services (bulk data transfer).
  • 11. Solution: Resource Provisioning
– Virtual cluster provisioning: the number of resources and the type of VM instances (worker node, master node, and I/O node) are the parameters indicating the storage and computational capability of a virtual cluster.
– The topology and structure of a virtual cluster balance the load across services (scheduling service, data transfer service, etc.) and avoid bottlenecks.
– On grids, the data transfer service is usually already available and does not need further configuration.
  • 12. Data Transfer across Sites
– A pre-script to transfer data before and after the job execution.
– A single data transfer job on demand.
– A bulk data transfer job that merges data transfers.
– [Figure legend: computation jobs vs. data transfer jobs.]
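The bulk-transfer option above can be sketched as merging per-file transfers that share the same endpoints into one job per site pair. This is an illustrative sketch, not Pegasus's actual transfer tooling; the site and file names are hypothetical.

```python
def merge_transfers(transfers):
    """Bulk data transfer: merge individual (source, destination, file)
    transfers into a single transfer job per site pair."""
    bulk = {}
    for src, dst, path in transfers:
        bulk.setdefault((src, dst), []).append(path)
    return bulk
```

Each key of the result corresponds to one bulk transfer job, replacing many small on-demand transfers between the same pair of sites.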
  • 13. Backward Search Algorithm
– Targets a workflow with a fan-in-fan-out structure.
– The search operation involves three steps; it starts from the sink job and proceeds backward.
– First, check whether it is safe to add the whole fan structure into the sub-workflow (aggressive search). If not, a cut is issued between this fan-in job and its parents to avoid cyclic dependencies and increase parallelism.
– Second, a neutral search is performed on its parent jobs, which includes all of its predecessors until the search reaches a fan-out job.
– If this partition is still too large, a conservative search is performed that includes all of its predecessors until it reaches a fan-in job or a fan-out job.
– Figure 2.3: Search operation.
  • 14. Heuristics (Storage Constraints): Heuristic I
– Dependencies between sub-workflows should be reduced, since they represent data transfer between sites.
– Jobs with parent-child relationships usually share a lot of data, so it is reasonable to schedule such jobs into the same sub-workflow.
– Heuristic I only checks three types of nodes (the fan-out job, the fan-in job, and the parents of the fan-in job) and searches for the potential candidate jobs that have parent-child relationships between them.
– The check operation tests whether a job and its potential candidate jobs can be added to a sub-workflow without violating constraints.
– Our algorithm reduces the time complexity of check operations by a factor of n, where n is the average depth of the fan-in-fan-out structure.
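The check operation described above can be illustrated as a storage-constraint test over the candidate list (CL), the job being examined (J), and the current partition (P), as in the slides' Sum(CL+J+P) comparison. A minimal sketch; the data sizes used below are hypothetical.

```python
def check(candidate_sizes, job_size, partition_sizes, storage_limit):
    """Check operation: can the job and its candidate jobs be added to
    the sub-workflow without exceeding the site's storage constraint?
    Passes only when Sum(CL + J + P) stays within the limit."""
    total = sum(candidate_sizes) + job_size + sum(partition_sizes)
    return total <= storage_limit
```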
  • 15. Heuristic I: worked example. [Figure: starting from the sink job J10, which forms partition P1={J10}, an aggressive search over the candidate list {J1, J2, J3, J4, J5, J6, J7, J8, J9} exceeds the storage constraint, so a less aggressive search is performed; each step's check operation compares Sum(CL+J+P) against the constraint. Final results: P1={J10}, P2={J2, J3, J6, J8}, P3={J4, J5, J7, J9}, P4={J1}. Legend: scheduled / being examined / partition candidate / not examined.]
  • 16. Heuristics/Hints: two other heuristics
– Heuristic II adds a job to a sub-workflow if all of its unscheduled children can be added to that sub-workflow.
– For a job with multiple children, Heuristic III adds it to a sub-workflow only when all of its children have been scheduled.
– Figure 2.4: Heuristics I, II, and III (from left to right) partition an example workflow into different sub-workflows.
  • 17. Heuristic II: worked example (check unscheduled children). [Figure: the first step, as in Heuristic I, puts J10 into P1; J8, J2, J3, and J6 are then placed into P2. Final results: P1={J10}, P2={J8, J2, J3, J6}, P3={J1, J4, J5, J7, J9}.]
  • 18. Heuristic III: worked example (all children must be examined). [Figure: as before, J10 goes into P1 and J8, J2, J3, J6 into P2; similarly, J9, J7, J5, J4, and J1 go into P3. Note that J1 has a non-examined child (J4), while J6 has none. Final results: P1={J10}, P2={J8, J2, J3, J6}, P3={J1, J4, J5, J7, J9}.]
  • 19. Genetic Algorithm. [Encoding figure: a chromosome is a string with one gene per job (Job1-Job5) and VM (VM1-VM6); each gene's value is the site the job or VM is assigned to, e.g. 1 2 2 1 2 2 2 2 1 1 1.]
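The encoding above can be sketched with standard GA operators. The slides do not specify the operators, so single-point crossover and uniform mutation are assumptions for illustration; site IDs start at 1 as in the figure.

```python
import random

def random_chromosome(num_genes, num_sites):
    # One gene per job/VM; the gene value is the site it is assigned to.
    return [random.randrange(1, num_sites + 1) for _ in range(num_genes)]

def crossover(parent_a, parent_b, point):
    # Single-point crossover on two job/VM-to-site assignment strings.
    return parent_a[:point] + parent_b[point:]

def mutate(chromosome, num_sites, rate=0.1):
    # Reassign each gene to a random site with probability `rate`.
    return [random.randrange(1, num_sites + 1) if random.random() < rate else g
            for g in chromosome]
```

A population of such chromosomes is then evolved under the fitness functions of the next slide.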
  • 20. Fitness Functions
– Min(α·Makespan + β·Cost)
– Min(Makespan) subject to Cost ≤ Budget
– Min(Cost) subject to Makespan ≤ Deadline
– Optimization with both Deadline and Budget constraints.
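The fitness variants above can be folded into one penalized objective, a common way to handle deadline and budget constraints in evolutionary search. The weighting scheme and penalty value below are illustrative assumptions, not the paper's exact formulation.

```python
def fitness(makespan, cost, deadline=None, budget=None, alpha=0.5, penalty=1e9):
    """Weighted objective alpha*makespan + (1-alpha)*cost, with a large
    penalty for violating an optional deadline or budget constraint.
    alpha=1 with a budget approximates Min(Makespan) s.t. Cost <= Budget;
    alpha=0 with a deadline approximates Min(Cost) s.t. Makespan <= Deadline."""
    value = alpha * makespan + (1 - alpha) * cost
    if deadline is not None and makespan > deadline:
        value += penalty  # infeasible: past the deadline
    if budget is not None and cost > budget:
        value += penalty  # infeasible: over budget
    return value
```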
  • 21. Ant Colony Optimization. [Figure: global optimization works on the full assignment of Job1-Job5 and VM1-VM6 to sites (e.g. 1 2 2 1 2 2 2 2 1 1 1); local optimization then works on each site's sub-problem separately, e.g. Job1-Job3 with VM1-VM3 on site 1, and Job4-Job5 with VM4-VM6 on site 2.]
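The global/local split above amounts to decomposing one global assignment into independent per-site sub-problems that can be optimized locally. A minimal sketch of that decomposition; the job/VM names come from the figure, but the helper itself is an illustrative assumption.

```python
def split_by_site(names, assignment):
    """Group jobs/VMs by the site they are assigned to, so each site's
    sub-workflow can be optimized as a separate local problem."""
    groups = {}
    for name, site in zip(names, assignment):
        groups.setdefault(site, []).append(name)
    return groups
```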
  • 22. Scheduling Sub-workflows
– Estimating the overall runtime of sub-workflows: Critical Path; Average CPU Time (the cumulative CPU time of all jobs divided by the number of available resources); Earliest Finish Time (the moment the last sink job completes).
– Provisioning resources based on the estimation results.
– Scheduling sub-workflows on sites.
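The first two estimators above can be sketched over a job-to-runtime map and a list of dependency edges. A minimal implementation under those assumptions; the job names in the example are hypothetical.

```python
from collections import defaultdict

def average_cpu_time(runtimes, num_resources):
    # Cumulative CPU time of all jobs divided by available resources.
    return sum(runtimes.values()) / num_resources

def critical_path(runtimes, edges):
    # Longest path through the DAG: a lower bound on the makespan.
    children = defaultdict(list)
    parents = defaultdict(list)
    indeg = defaultdict(int)
    for u, v in edges:
        children[u].append(v)
        parents[v].append(u)
        indeg[v] += 1
    # Kahn topological order; finish[j] is j's earliest finish time.
    order = [j for j in runtimes if indeg[j] == 0]
    finish = {}
    i = 0
    while i < len(order):
        job = order[i]; i += 1
        start = max((finish[p] for p in parents[job]), default=0.0)
        finish[job] = start + runtimes[job]
        for c in children[job]:
            indeg[c] -= 1
            if indeg[c] == 0:
                order.append(c)
    return max(finish.values())
```

For a diamond-shaped DAG (J1 fans out to J2 and J3, which fan in to J4), the critical path follows the slower branch.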
  • 23. Evaluation: Heuristics
– In this example, we aim to reduce data movement and makespan under storage constraints.
– Workflows used: Montage, an astronomy application, I/O intensive, ~10,000 tasks and 57 GB of data; CyberShake, a seismology application, memory intensive, ~24,000 tasks and 58 GB of data; Epigenomics, a bioinformatics application, CPU intensive, ~1,500 tasks and 23 GB of data.
– Each workflow was run five times.
  • 24. Performance: CyberShake
– Heuristic I produces 4 sub-workflows with 3 dependencies between them; Heuristic II produces 5 sub-workflows with 10 dependencies; Heuristic III produces 4 sub-workflows with 5 dependencies.
– Heuristics II and III simply add a job if it does not violate the storage or cross-dependency constraints.
– Heuristic I performs better in terms of both runtime reduction and disk usage because it tends to put the whole fan structure into the same sub-workflow.
  • 25. Performance: CyberShake (Storage Constraints)
– With more sites and partitions, data movement increases even though computational capability improves.
– The CyberShake workflow across two sites with a storage constraint of 35 GB performs best.
  • 26. Performance of Estimator and Scheduler
– Three estimators and two schedulers are evaluated with the CyberShake workflow.
– The combination of the EFT estimator and the HEFT scheduler (EFT+HEFT) performs best (by 10%).
– The HEFT scheduler is slightly better than the MinMin scheduler with all three estimators.
  • 27. Publications
– Integration of Workflow Partitioning and Resource Provisioning, Weiwei Chen, Ewa Deelman, accepted, The 12th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (CCGrid 2012), Doctoral Symposium, Ottawa, Canada, May 13-15, 2012.
– Improving Scientific Workflow Performance using Policy Based Data Placement, Muhammad Ali Amer, Ann Chervenak and Weiwei Chen, accepted, 2012 IEEE International Symposium on Policies for Distributed Systems and Networks, Chapel Hill, NC, July 2012.
– Fault Tolerant Clustering in Scientific Workflows, Weiwei Chen, Ewa Deelman, accepted, IEEE International Workshop on Scientific Workflows (SWF), in conjunction with the 8th IEEE World Congress on Services, Honolulu, Hawaii, June 2012.
– Workflow Overhead Analysis and Optimizations, Weiwei Chen, Ewa Deelman, The 6th Workshop on Workflows in Support of Large-Scale Science, in conjunction with Supercomputing 2011, Seattle, November 2011.
– Partitioning and Scheduling Workflows across Multiple Sites with Storage Constraints, Weiwei Chen, Ewa Deelman, 9th International Conference on Parallel Processing and Applied Mathematics (PPAM 2011), Poland, September 2011.
  • 28. Future Work: efficiency of GA and ACO; provisioning algorithms; other algorithms.
  • 29. Q&A. Thank you! For further info: pegasus.isi.edu, www.isi.edu/~wchen