HPC with Clouds and Cloud Technologies
1. Jaliya Ekanayake and Geoffrey Fox
School of Informatics and Computing
Indiana University Bloomington
Cloud Computing and Software Services: Theory and Techniques
July 2010
Presented by:
Inderjeet Singh
2. Introduction
Problem
Data Analysis Applications
Evaluations and Analysis
Performance of MPI on Clouds
Benchmarks and Results
Conclusions and Future Work
Critique
3. (figure slide; no text content)
4. Cloud Technologies / Parallel Runtimes / Cloud Runtimes
Apache Hadoop (open-source implementation of Google MapReduce)
DryadLINQ (Microsoft API for Dryad)
CGL-MapReduce (an iterative variant of MapReduce)
5. On-demand provisioning of resources
Customizable virtual machines (VMs)
Root privileges
Provisioning is very fast (within minutes)
You pay only for what you use
Better resource utilization
6. Cloud Technologies vs. Traditional HPC
Cloud technologies:
◦ Moving computation to the data
◦ Better Quality of Service (QoS)
◦ Simple communication topologies
◦ Distributed file systems (HDFS, GFS)
Traditional HPC:
◦ Most HPC applications are based on MPI
◦ Many fine-grained communication topologies
◦ Use of fast networks
7. MapReduce: a software framework to support distributed computing on large datasets across clusters of computers
Map step: the master node takes the input, partitions it into smaller sub-problems, and distributes them to worker nodes. A worker node may repeat this in turn, leading to a multi-level tree structure. Each worker node processes its smaller problem and passes the answer back to its master node.
Reduce step: the master node collects the answers to all the sub-problems and combines them in some way to form the output, the answer to the original problem.
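To make the two steps concrete, here is a minimal single-process sketch of the pattern in Python, using the standard word-count example (the names map_fn/reduce_fn and the in-process shuffle are illustrative only, not the API of Hadoop, DryadLINQ, or CGL-MapReduce):

from collections import defaultdict

def map_fn(document):
    # Map step: emit an intermediate (key, value) pair per word.
    for word in document.split():
        yield (word, 1)

def reduce_fn(word, counts):
    # Reduce step: combine all values emitted for one key.
    return (word, sum(counts))

def run_mapreduce(documents):
    # A real framework shuffles intermediate pairs by key across
    # worker nodes; here we simply group them in one process.
    groups = defaultdict(list)
    for doc in documents:
        for key, value in map_fn(doc):
            groups[key].append(value)
    return [reduce_fn(k, v) for k, v in groups.items()]

print(run_mapreduce(["the cat sat", "the dog sat"]))
# [('the', 2), ('cat', 1), ('sat', 2), ('dog', 1)]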
8. Large-data / compute-intensive applications
Traditional approach:
◦ Execution on clusters, grids, or supercomputers
◦ Moving both application and data to the available computational power
◦ Efficiency decreases with large datasets
Better approach:
◦ Execution with cloud technologies
◦ Moving computation to the data to perform the processing
◦ A more data-centric approach
10. What applications are best handled by cloud technologies?
What overheads do they introduce?
Can traditional parallel runtimes such as MPI be used in the cloud?
If so, what overheads do they have?
11. Types of applications (based on communication pattern):
Map-only (Cap3)
MapReduce (HEP)
Iterative / complex style (matrix multiplication and K-means clustering)
12. Cap3: a sequence assembly program that operates on a collection of gene-sequence files to produce several outputs
HEP: a High Energy Physics data-analysis application
K-means clustering: iteratively refines a set of cluster centers
Matrix multiplication: Cannon's algorithm
13. (figure slide; no text content)
14. MapReduce does not support iterative/complex-style applications, so [Fox] built CGL-MapReduce
CGL-MapReduce supports long-running tasks and retains static data in memory across invocations
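To illustrate why retaining static data matters, consider K-means in MapReduce style: the input points are static across iterations, while only the centroids change. A minimal 1-D Python sketch of that iterative pattern follows (illustrative only, not the actual CGL-MapReduce API):

import random

def kmeans_iterations(points, centroids, iterations):
    # The static data (points) stays in memory across iterations,
    # mirroring CGL-MapReduce's long-running tasks; only the small
    # variable data (centroids) changes between invocations.
    for _ in range(iterations):
        # "Map" phase: assign each point to its nearest centroid.
        sums = [0.0] * len(centroids)
        counts = [0] * len(centroids)
        for p in points:
            i = min(range(len(centroids)), key=lambda j: abs(p - centroids[j]))
            sums[i] += p
            counts[i] += 1
        # "Reduce" phase: recompute each centroid as the mean of its
        # assigned points (kept unchanged if it received none).
        centroids = [sums[i] / counts[i] if counts[i] else centroids[i]
                     for i in range(len(centroids))]
    return centroids

points = [random.uniform(0.0, 10.0) for _ in range(1000)]
print(kmeans_iterations(points, centroids=[1.0, 9.0], iterations=10))

In stock Hadoop, each of those iterations would be a separate job that re-reads the points from disk; keeping them in memory is exactly the overhead CGL-MapReduce avoids.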
15. Performance (average running time)
Overhead = [P × T(P) − T(1)] / T(1), where P is the number of parallel processes, T(P) the running time on P processes, and T(1) the sequential running time
Runtimes compared: DryadLINQ, Hadoop, CGL-MapReduce, and MPI
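As a worked example with made-up numbers (not from the paper): if T(1) = 100 s and the same job completes in T(16) = 8 s on P = 16 processes, the overhead is (16 × 8 − 100) / 100 = 0.28, i.e., 28% of the sequential work is spent on parallelization costs; an overhead of 0 corresponds to perfect linear speedup.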
16. (figure slide; no text content)
17. (figure slide; no text content)
18. Cap3 (map-only) and HEP (MapReduce) perform well with cloud runtimes
K-means clustering and matrix multiplication (both iterative) show high overheads with cloud runtimes compared to the MPI runtime
CGL-MapReduce also incurs lower overhead for large datasets
19. Goals
Overhead of virtual machines (VMs) on parallel MPI applications
How do applications with different communication-to-computation (c/c) ratios perform in the cloud?
Effect of different strategies for assigning CPU cores to VMs, and of running these MPI applications on those VMs
20. Three MPI applications with different c/c-ratio requirements (a sketch of the first appears after this list):
Matrix multiplication (Cannon's algorithm)
K-means clustering
Concurrent wave equation solver
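For reference, Cannon's algorithm arranges P processes in a √P × √P grid: after an initial skew of the blocks of A and B, each of the √P steps multiplies the locally held blocks and then shifts A-blocks one step left and B-blocks one step up. A single-process NumPy sketch of that block schedule (illustrative; the benchmark runs one block per MPI process, and each shift becomes an MPI message exchange):

import numpy as np

def cannon_matmul(A, B, q):
    # Split A and B into a q x q grid of blocks, as if each block
    # lived on one of q*q MPI processes.
    n = A.shape[0]
    b = n // q
    blk = lambda M, i, j: M[i*b:(i+1)*b, j*b:(j+1)*b].copy()
    Ablk = [[blk(A, i, (i + j) % q) for j in range(q)] for i in range(q)]  # skew row i left by i
    Bblk = [[blk(B, (i + j) % q, j) for j in range(q)] for i in range(q)]  # skew column j up by j
    C = np.zeros_like(A)
    for _ in range(q):
        # Each "process" multiplies its current blocks and accumulates.
        for i in range(q):
            for j in range(q):
                C[i*b:(i+1)*b, j*b:(j+1)*b] += Ablk[i][j] @ Bblk[i][j]
        # Shift: A-blocks move one step left, B-blocks one step up
        # (a message exchange per process in the parallel version).
        Ablk = [[Ablk[i][(j + 1) % q] for j in range(q)] for i in range(q)]
        Bblk = [[Bblk[(i + 1) % q][j] for j in range(q)] for i in range(q)]
    return C

A = np.random.rand(8, 8); B = np.random.rand(8, 8)
assert np.allclose(cannon_matmul(A, B, q=4), A @ B)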
22. Eucalyptus- and Xen-based cloud infrastructure
16 nodes, each with two quad-core Intel Xeon processors and 32 GB of memory
Nodes connected by 1-gigabit Ethernet
Same software configuration for both bare-metal nodes and VMs:
• OS: Red Hat Enterprise Linux Server release 5.2
• Open MPI version 1.3.2
23. Different strategies for assigning CPU cores to virtual machines
Invariant used to select the number of MPI processes: number of MPI processes = number of CPU cores used
This keeps the process count fixed across configurations (bare metal, 1 VM per node, up to 8 VMs per node), so only the VM granularity varies
24. Matrix multiplication: performance (64 CPU cores) and speedup (fixed matrix size, 5184 × 5184)
◦ Speedup drops by 34% between bare metal and 8 VMs/node at 81 processes
◦ Due to the exchange of large messages and the algorithm's heavier communication
25. K-means clustering: performance (128 CPU cores) and total overhead (number of MPI processes = 128)
◦ Communication is much smaller than computation
◦ Communication here depends on the number of clusters formed
◦ Overhead is large for small data sizes, so less speedup is observed
26. Concurrent wave solver: performance (128 CPU cores) and total overhead (number of MPI processes = 128)
◦ The amount of communication is fixed; data-transfer rates are low
◦ The lower c/c ratio of O(1/n) leads to more latency and lower performance on VMs (see the sketch after this slide)
◦ 8 VMs per node incur 7% more overhead than a bare-metal node
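To see where the O(1/n) ratio comes from, here is a minimal serial sketch of the concurrent wave pattern (1-D wave equation with explicit time stepping; all names and parameters are illustrative). In the MPI version each rank holds a contiguous chunk of the string and exchanges only its two edge points with its neighbors per time step, so communication per rank is O(1) while computation is O(n):

import numpy as np

def wave_step(u_prev, u_curr, c2):
    # Explicit update for the 1-D wave equation u_tt = c^2 * u_xx.
    # The MPI version exchanges ONLY the two edge points of each
    # rank's chunk per step: O(1) communication, O(n) computation.
    u_next = np.empty_like(u_curr)
    u_next[1:-1] = (2 * u_curr[1:-1] - u_prev[1:-1]
                    + c2 * (u_curr[2:] - 2 * u_curr[1:-1] + u_curr[:-2]))
    u_next[0] = u_next[-1] = 0.0  # fixed string ends
    return u_next

n = 1024
x = np.linspace(0.0, 1.0, n)
u_prev = np.sin(np.pi * x)   # initial displacement
u_curr = u_prev.copy()       # zero initial velocity
for _ in range(100):
    u_prev, u_curr = u_curr, wave_step(u_prev, u_curr, c2=0.25)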
27. Figure: communication between dom0 and domUs when 1 VM per node is deployed (top), and when 8 VMs per node are deployed (bottom)
◦ In a multi-VM configuration, scheduling of the I/O operations of the domUs (user domains) happens via dom0 (the privileged OS)
28. Figure: LAM vs. Open MPI in different VM configurations
When using multiple VMs on multi-core CPUs, it is better to use runtimes that support in-node communication (Open MPI rather than LAM-MPI)
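For context, and as standard Open MPI usage rather than a detail from this deck: Open MPI selects its transports via MCA parameters, so same-node ranks can be kept on the shared-memory transport with a launch line such as mpirun --mca btl self,sm,tcp -np 8 ./app (./app is a placeholder application). Ranks within one VM then bypass the network stack entirely, which is consistent with the advantage shown in the figure.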
29. Cloud runtimes work well for pleasingly parallel (map-only and MapReduce) applications with large datasets
Overheads of cloud runtimes are high for parallel applications that require iterative/complex communication patterns (MPI-based applications)
Work needs to be done on finding cloud-friendly algorithms for these applications
CGL-MapReduce is efficient for iterative-style MapReduce applications (e.g., K-means)
30. Overheads for MPI applications increase as the number of VMs per node increases (22–50% performance degradation)
In-node communication is important
MapReduce applications (not susceptible to latencies) may perform well on VMs deployed on clouds
Future work: integration of MapReduce and MPI (a biological DNA sequencing application)
31. No results for MPI implementations of the pleasingly parallel applications (Cap3, HEP); time comparisons between MPI and the cloud runtimes are missing
Missing evaluations of HPC applications implemented with cloud runtimes on the private cloud, which would be needed to show the effect of multi-VM/multi-core configurations on the performance of these applications
Differences in memory sizes (16/32 GB) between the clusters running different operating systems could bias the results
32. Jaliya Ekanayake and Geoffrey Fox, "High Performance Parallel Computing with Clouds and Cloud Technologies," Lecture Notes of the Institute for Computer Sciences, Social Informatics and Telecommunications Engineering, vol. 34, 2010, 20 pages
"High Performance Parallel Computing with Clouds and Cloud Technologies" (slides): http://www.slideshare.net/jaliyae/high-performance-parallel-computing-with-clouds-and-cloud-technologies
"MapReduce," Wikipedia: http://en.wikipedia.org/wiki/MapReduce