1. EEDC - Execution Environments for Distributed Computing
34330
Scientific Programming Models
Master in Computer Architecture, Networks and Systems - CANS
Group members:
Francesc Lordan francesc.lordan@bsc.es
Roger Rafanell roger.rafanell@bsc.es
2. Outline
Scientific Programming Models
– Part 1: Introduction
– Part 2: Reference parallel programming models
– Part 3: Novel parallel programming models
– Part 4: Conclusions
– Part 5: Questions
3. Introduction
Scientific applications:
– Solve complex problems
– Are usually long-running applications
– Are implemented as a sequence of steps
– Each step (task) can be hard to compute
– So …
4. Introduction
In terms of execution time…
Scientific applications can no longer be
approached in a purely sequential way!!!
5. Introduction
We need solutions that distribute and
parallelize the work.
6. Introduction: MPI
1980s - early 1990s: distributed-memory parallel computing started
as a collection of incompatible software tools for writing programs.
In 1994, MPI (Message Passing Interface)
became the new reference standard.
It provides:
– Portability
– Performance
– Functionality
– Availability (many implementations)
Good for: parallelizing the processing by distributing the work among
different machines/nodes.
7. Introduction: OpenMP
In the early 90's, vendors of shared-memory machines supplied similar,
directive-based Fortran programming extensions:
The user could extend a serial Fortran program with directives specifying
which loops were to be parallelized.
The compiler would automatically parallelize such loops across the SMP
processors.
Implementations were all functionally similar, but were diverging (as usual).
Good for: parallelizing the computation among all the resources of a
single machine.
8. Reference PM: OpenMP
Programming model:
Computation is done by threads.
Fork-join model: Threads are dynamically created and destroyed.
Programmer can specify which variables are shared among threads
and which are private.
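OpenMP itself targets C, C++ and Fortran, but the fork-join idea above can be sketched in plain Java (an illustration only, not OpenMP): the snippet below forks the iterations of a loop across a thread pool, keeps the per-iteration variables private to each thread, shares the result array, and joins before the final reduction.

import java.util.Arrays;
import java.util.stream.IntStream;

// Fork-join sketch in plain Java (illustration only, not OpenMP):
// the main thread forks the loop iterations across a thread pool
// and joins before the final reduction.
public class ForkJoinSketch {
    public static void main(String[] args) {
        int n = 1_000_000;
        double step = 1.0 / n;
        double[] partial = new double[n];      // shared among all worker threads

        // Roughly the spirit of an OpenMP "parallel for" over i.
        IntStream.range(0, n).parallel().forEach(i -> {
            double x = (i + 0.5) * step;       // x and i are private to each thread
            partial[i] = 4.0 / (1.0 + x * x);
        });

        // Join: all worker threads are done before the reduction runs.
        double pi = step * Arrays.stream(partial).sum();
        System.out.println(pi);                // ~3.14159
    }
}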
11. Reference PM: OpenMP
Strong Points:
– Keeps the sequential version.
– Communication is implicit.
– Easy to program, debug and modify.
– Good performance and scalability.
Weaknesses:
– Communication is implicit (less control).
– Simple and flat memory model (does not run on clusters).
– No support for accelerators.
12. Reference PM: MPI
Programming model:
Computation is done by several processes that execute the same program.
Processes communicate by passing data (send/receive).
The programmer decides:
– Which role each process plays (by branching, e.g. on the process rank).
– Which communications take place, and in what order.
14. Reference PM: MPI
Strong Points:
– Any parallel algorithm can be expressed in terms of the MPI paradigm.
– Data placement problems are rarely observed.
– Suitable for clusters/supercomputers (large number of processors).
– Excellent performance and scalability.
Weaknesses:
– Communication is explicit.
– Re-fitting serial code using MPI often requires refactoring.
– Dynamic load balancing is difficult to implement.
15. Reference PM: The best of both worlds
Hybrid (MPI + OpenMP):
– MPI is most effective for problems with “coarse-grained” parallelism.
– “Fine-grained” parallelization is successfully handled by OpenMP.
When to use hybrid programming?
– The code exhibits limited scaling with MPI.
– The code could make use of dynamic load balancing.
– The code exhibits fine-grained parallelism, or a combination of fine-grained and
coarse-grained parallelism.
Some algorithms, such as computational fluid
dynamics, benefit greatly from a hybrid approach!!!
17. Reference PM: New reference approaches
Heterogeneous parallel computing:
– CUDA (from NVIDIA)
– OpenCL (Open Computing Language)
– Cross-platform
• Implementations for:
– ATI GPUs
– NVIDIA GPUs
– x86 CPUs
– API similar to OpenGL.
– Based on C.
18. Novel PMs
Workflows:
– Based on processes
– Require planning and scheduling
– Need flow control
– In-transit visibility
Novel PMs:
– Complex problems require simple solutions
(not based on the reference PMs)
19. Microsoft Dryad
The Dryad Project investigates a programming model
for writing parallel and distributed programs that scale from
a small cluster to a large data-center.
Theoretical approach (not widely used):
– Last (and only) publication in 2007.
The user defines:
– a set of methods
– a task dependency graph, written in a specific language.
21. MapReduce
The programmer only defines 2 functions:
– Map(KInput, VInput) → list(KTemp, VTemp)
– Reduce(KTemp, list(VTemp)) → list(VTemp)
The library is in charge of all the rest.
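As an illustration, here is a minimal, non-distributed Java sketch of the two user-supplied functions for a word-count job; the names and types are illustrative only and not tied to any particular MapReduce implementation.

import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Map;

// Word-count sketch of the two user-supplied MapReduce functions.
// A real framework (e.g. Hadoop) wraps these in Mapper/Reducer classes and
// handles partitioning, shuffling and fault tolerance; this only shows the signatures.
public class WordCountSketch {

    // Map(KInput, VInput) -> list(KTemp, VTemp): emit (word, 1) for every word in a line.
    static List<Map.Entry<String, Integer>> map(Long lineNumber, String line) {
        List<Map.Entry<String, Integer>> out = new ArrayList<>();
        for (String word : line.toLowerCase().split("\\W+")) {
            if (!word.isEmpty()) {
                out.add(new AbstractMap.SimpleEntry<>(word, 1));
            }
        }
        return out;
    }

    // Reduce(KTemp, list(VTemp)) -> list(VTemp): sum all counts emitted for one word.
    static List<Integer> reduce(String word, List<Integer> counts) {
        int sum = 0;
        for (int c : counts) {
            sum += c;
        }
        return Collections.singletonList(sum);
    }

    public static void main(String[] args) {
        // The framework would group map outputs by key before calling reduce.
        System.out.println(map(1L, "to be or not to be"));     // [to=1, be=1, or=1, not=1, to=1, be=1]
        System.out.println(reduce("to", Arrays.asList(1, 1))); // [2]
    }
}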
22. MapReduce
Weaknesses
– Very specific programming model.
– Not every problem is easy to express as key/value pairs.
Strong points
– Efficiency.
– Simplicity of the model.
– Community and tools.
24. COMPSs overview - Objective
Reduce the development complexity of
Grid/Cluster/Cloud applications to the minimum
– As easy as writing a sequential application.
Target applications: composed of tasks, most of them
repetitive
– Granularity of the tasks at the level of simulations or programs.
– Data: files, objects, arrays, primitive types.
25. COMPSs overview - Main idea
Sequential code:
...
for (i=0; i<N; i++){
T1 (data1, data2);
T2 (data4, data5);
T3 (data2, data5, data6);
T4 (data7, data8);
T5 (data6, data8, data9);
}
...
(Figure: runtime steps)
(a) Task selection + parameter direction (input, output, inout)
(b) Task graph creation based on data dependencies (tasks T1 … T5 across iterations)
(c) Scheduling, data transfer, task execution on the parallel resources (Resource 1 … Resource N)
(d) Task completion, synchronization
26. Programming model - Sample application
Main program
public void main(){
    Double sum = 0.0;
    double pi;
    double step = 1.0d / (double) num_steps;
    for (int i = 0; i < num_steps; i++){
        computeInterval(i, step, sum);
    }
    pi = sum * step;
}
Subroutine
public static void computeInterval(int index, double step, Double acum) {
    // contribution of one interval of the numerical integration of 4/(1+x*x),
    // whose sum approximates pi
    double x = (index - 0.5) * step;
    acum = acum + 4.0 / (1.0 + x * x);
}
27. Programming Model - Task Selection
Task selection interface
public interface PiItf {

    @Method(declaringClass = "Pi")      // implementation: class providing the method
    void computeInterval(
        @Parameter(direction = IN)
        int index,
        @Parameter(direction = IN)
        double step,
        @Parameter(direction = INOUT)   // parameter metadata: direction of each argument
        Double sum
    );
}
28. Programming Model – Main code (NO CHANGES!)
public static void main(String[] args) {
    Double sum = 0.0;
    double pi;
    double step = 1.0d / (double) num_steps;
    for (int i = 0; i < num_steps; i++){
        computeInterval(i, step, sum);
    }
    pi = sum * step;
}
(Figure: resulting task graph: the computeInterval tasks for steps 0, 1, …, N-1, linked through the INOUT sum parameter, followed by a synchronization (SYNCH) on sum.)
29. Programming Model – Real Example
HMMER
Inputs: a protein database and an amino acid query sequence
IQKKSGKWHTLTDLRA VNAVIQPMGPLQPGLP SPAMIPKDWPLIIIDLK DCFFTIPLAEQDCEKFA FTIPAINNKEPATRF
Output: per-model scores
Model     Score    E-value   N
--------  -------  --------  ---
IL6_2     -78.5    0.13      1
COLFI_2   -164.5   0.35      1
pgtp_13   -36.3    0.48      1
clf2      -15.6    3.6       1
PKD_9     -24.0    5         1
35. COMPSs
Strong points
– Sequential programming approach
– Parallelization at task level
– Transparent data management and remote execution
– Can operate on different infrastructures:
• Cluster/Grid
• Cloud (Public/Private)
– PaaS
– IaaS
• Web services
Weaknesses:
– Under continuous development
– Does not offer binding to other languages (currently)
37. Manjrasoft Aneka
.NET based Platform-as-a-Service
Allows the usage of:
– Private Clouds.
– Public Clouds: Amazon EC2, Azure, GoGrid.
Offers mechanisms to control, reserve and monitor the resources.
– Also offers autoscaling mechanisms.
3 programming models:
– Task-based: tasks are put in a bag of executable tasks.
– Thread-based: exposes the .NET thread API, but the threads are created remotely.
– MapReduce
No data dependency analysis!!
38. Microsoft Azure
.NET based Platform-as-a-Service
Computing services
– Web Role: Web Service frontend.
– Worker Role: Backend.
Storage Services
Strong Point
– Scalable architecture.
Weakness
– Platform-tied applications.
39. Conclusions
Scientific problems are usually complex.
The current reference PMs are usually unsuitable for them.
Novel, more flexible PMs have come into the game.
There is still a gap between scientists and user-friendly,
workflow-oriented programming models.
A sea of available solutions (DSLs).
The programming model can be defined as task-based and dependency-aware. In it, the programmer is only required to select a set of methods called from a sequential Java application, for them to be run as parallel tasks on the available distributed resources. Initially, the application starts running sequentially in one node and, whenever a call to a selected method is found, an asynchronous task is created instead, letting the main program continue its execution right away. The created tasks are processed by the runtime, which discovers the dependencies between them, building a task dependency graph. A renaming technique is used to avoid some kinds of dependencies. The parallelism exhibited by the graph is exploited as much as possible, scheduling the dependency-free tasks on the available resources. The scheduling is locality-aware: nodes can cache task data for later use, and a node that already has some or all the input data for a task gets more chances to run it. The runtime also manages these data - performing data copies or transfers if necessary - and controls the completion of tasks.
First, the user has to provide a Java interface which declares the methods that must be executed on the Grid, that is to say, the different kinds of tasks. As mentioned before, a task is a given call to one of these methods from the application code. In addition, the user can use Java annotations to provide: first, the class that implements the method; second, the constraints for each kind of task, i.e. the capabilities that a resource must have to run the task (this is optional); and third, the type and direction of the parameters for each kind of task, which is mandatory. Currently we support the file type, the String type and all the primitive types. A sketch of such an interface is given below.
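As a rough illustration (following the annotation style of the PiItf interface on slide 27; the interface, class and method names below are hypothetical, and the annotation values such as FILE, IN and OUT are assumptions about the COMPSs API rather than confirmed syntax), a task selection interface for an HMMER-like scoring method with file parameters could look like this:

// Hypothetical task selection interface, written in the style of PiItf above.
// HmmerItf, Hmmer and scoreSequence are illustrative names; the annotation
// values (FILE, IN, OUT) are assumptions about the COMPSs annotation API.
public interface HmmerItf {

    @Method(declaringClass = "Hmmer")            // class implementing the task
    void scoreSequence(
        @Parameter(type = FILE, direction = IN)  // input file: the protein database
        String database,
        @Parameter(direction = IN)               // plain String parameter: the query sequence
        String sequence,
        @Parameter(type = FILE, direction = OUT) // output file: the score report
        String reportFile
    );
}

The optional per-task resource constraints mentioned above would be declared with an additional annotation on the method; its exact syntax is not shown here.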