1. Cloud Computing and Management
Mostafa Ead
October 17, 2011 CS854: Cloud Computing and Management 1
2. Objective
Harness the enormous power of today's clusters of
commodity machines, and react quickly to the
data tsunami.
3. Outline
Parallel Processing
Google MapReduce
DryadLINQ
Dryad
General Comments
Takeaways
Discussion
4. Parallel Processing
Why Parallel Processing?
Execution time reduction
Cheap clusters of commodity hardware
A data-driven world
Exploit multi-cores in your workstation or many
machines in your cluster.
5. Parallel Processing
Tasks of a parallel-program developer:
1. Identify the concurrent portions
2. Map the concurrent portions to multiple
processes running in parallel
Static mapping hinders scalability
3. Distribute the input, intermediate, and output data,
or any combination of them
4. Manage accesses to shared data
5. Handle failures
In commodity clusters, failure is the norm rather than the
exception.
6. Parallel Processing
A nightmare for the parallel-program developer.
Which of these tasks can be automated?
11. Parallel Processing
Tasks of a parallel-program developer: which can be automated?
1. Identify the concurrent portions: No
2. Map the concurrent portions to multiple
processes running in parallel: Yes, iff the mapping is dynamic
3. Distribute the input, intermediate, and output data,
or any combination of them: Yes (GFS and HDFS)
4. Manage accesses to shared data: Yes, iff access is read-only
5. Handle failures: Yes, with restrictions
12. MapReduce: Simplified Data
Processing on Large Clusters
Jeffrey Dean and Sanjay Ghemawat
Google, Inc.
OSDI-2004
Citation Count: 3487 (Google Scholar)
13. MapReduce: Programming model
The developer specifies one map function and one
reduce function.
Map:
Input: one key-value pair
Output: a set of intermediate key-value pairs
Reduce:
Input: intermediate key-value pairs grouped by key
Output: one or more output key-value pairs
14. MapReduce: Wordcount Example
Parallel execution of the map phase:
map1: key = offset in the input file, value = "Mostafa is presenting MapReduce"
  emits: Mostafa: 1, is: 1, presenting: 1, MapReduce: 1
map2: key = offset in the input file, value = "Mostafa is presenting DryadLINQ"
  emits: Mostafa: 1, is: 1, presenting: 1, DryadLINQ: 1
15. MapReduce: Wordcount Example
Shuffle & Sort groups the intermediate pairs by key:
map1 output: Mostafa: 1, is: 1, presenting: 1, MapReduce: 1
map2 output: Mostafa: 1, is: 1, presenting: 1, DryadLINQ: 1
grouped: DryadLINQ: [1]; is: [1, 1]; MapReduce: [1]; Mostafa: [1, 1]; presenting: [1, 1]
16. MapReduce: Wordcount Example
The reduce function is called 5 times, once per key:
DryadLINQ: [1] → DryadLINQ: 1
is: [1, 1] → is: 2
MapReduce: [1] → MapReduce: 1
Mostafa: [1, 1] → Mostafa: 2
presenting: [1, 1] → presenting: 2
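The wordcount walkthrough above can be reproduced with a small in-memory sketch of the model. The function and helper names here are illustrative only (Google's implementation is C++, Hadoop's is Java); the sketch just mirrors the map, shuffle-and-sort, and reduce phases:

```python
from collections import defaultdict

def map_fn(key, value):
    # key: offset in the input file (unused here); value: one line of text
    for word in value.split():
        yield (word, 1)

def reduce_fn(key, values):
    # values: every intermediate count emitted for this word
    yield (key, sum(values))

def run_mapreduce(inputs, map_fn, reduce_fn):
    groups = defaultdict(list)          # shuffle & sort: group values by key
    for key, value in inputs:
        for k, v in map_fn(key, value):
            groups[k].append(v)
    output = {}
    for k in sorted(groups):            # reduce is called once per key
        for ok, ov in reduce_fn(k, groups[k]):
            output[ok] = ov
    return output
```

Running it on the two slide inputs produces exactly the five reduce calls shown above: DryadLINQ: 1, is: 2, MapReduce: 1, Mostafa: 2, presenting: 2.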
18. MapReduce: Fault Tolerance
States of worker tasks: idle, in-progress, or completed
Worker Failure:
The master pings workers periodically.
In-progress tasks are reset to idle.
Completed map tasks are also reset to idle. Why? Their
output lives on the failed machine's local disk and is no
longer reachable.
All in-progress (not yet completed) reduce tasks are
notified of the re-execution of the map tasks.
19. MapReduce: Fault Tolerance
Master Failure:
The current (as of 2004) implementation aborts the whole MR
job.
It is not a free lunch:
The developer must write deterministic map/reduce
functions.
20. MapReduce: Backup Tasks
Problem: straggler tasks
Bad disk performance (e.g., reads at 1 MB/s instead of 30 MB/s)
Contention on the machine's resources from other competing
tasks
Stragglers increase the total elapsed time of the job.
Solution:
Near the end of the job, the master schedules backup
executions of the remaining in-progress tasks.
The first copy to finish is taken as the result; the others
are ignored.
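The backup-task policy reduces to "take the first finisher". A minimal sketch (the function and attempt names are hypothetical, not from the paper):

```python
def resolve_backup(attempts):
    # attempts: list of (attempt_id, finish_time) for duplicate copies of
    # one task; the master accepts the earliest finisher and ignores the rest
    winner_id, _ = min(attempts, key=lambda a: a[1])
    ignored = [a_id for a_id, _ in attempts if a_id != winner_id]
    return winner_id, ignored
```

If the backup copy finishes before the straggling original, the job proceeds without waiting for the slow machine.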
21. MapReduce: Refinements
1. Combiner Function:
The reduce function must be commutative and associative:
<“the”, 1>, <“the”, 1>, <“the”, 1>, <“the”, 1> → <“the”, 4>
<“the”, 2>, <“the”, 2> → <“the”, 4>
Decreases the size of the intermediate data sent over the
network from mappers to reducers.
The same reduce-function code can run as the
combiner on the map side.
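A sketch of running the reduce logic as a map-side combiner, assuming the associative and commutative sum from wordcount (helper name illustrative):

```python
from collections import defaultdict

def combine(pairs):
    # Pre-aggregate one mapper's (word, count) output before it is
    # shuffled, so fewer pairs cross the network to the reducers.
    acc = defaultdict(int)
    for word, count in pairs:
        acc[word] += count
    return sorted(acc.items())
```

Four <"the", 1> pairs collapse to one <"the", 4>, and combining two combiner outputs gives the same answer, which is exactly why associativity and commutativity are required.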
22. MapReduce: Refinements
2. Counters:
e.g., count the number of uppercase words, or the number
of German documents.
Also useful for sanity checks:
Sort: the number of input key-value pairs should equal the
number of output key-value pairs.
Each mapper/reducer keeps a local copy of each counter.
Counter values are periodically propagated to the master for
aggregation.
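The counter mechanism can be sketched as a per-worker local tally that the master sums. The periodic propagation is simulated here by a single final aggregation; in the real system the values piggyback on the workers' periodic responses to the master:

```python
def aggregate_counters(worker_counters):
    # worker_counters: one dict of local counts per mapper/reducer;
    # the master folds them into the global, job-wide totals.
    total = {}
    for counters in worker_counters:
        for name, value in counters.items():
            total[name] = total.get(name, 0) + value
    return total
```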
23. MapReduce: Performance
Cluster Configuration:
1,800 machines
Each machine: two 2 GHz Xeon processors, 4 GB RAM, two
160 GB local disks, and 1 Gbps Ethernet
Network topology: a two-level, tree-shaped switched
network
24. MapReduce: Performance
Grep:
Input: 10^10 100-byte records
A 3-character pattern occurs in 92,337 records.
M = 15,000 and R = 1
The input rate ramps up slowly: overhead of propagating the
program to all workers, and of the data-locality
optimizations.
Peak input rate of about 30 GB/s once 1,764 workers are
assigned.
27. MapReduce: Performance
Sort:
Input: 10^10 100-byte records
The map function extracts a sortKey from each record and
emits a sortKey-record pair.
Identity reduce function.
The actual sort happens at each reducer and is handled by
the library.
M = 15,000 and R = 4,000
28. MapReduce: Performance
Sort:
The input rate peaks at 13 GB/s, lower than Grep's.
The shuffle starts as soon as the first map task finishes.
The shuffle rate shows two humps: all workers are assigned
reduce tasks in a first batch, and the remaining tasks run
in a second batch as the first completes.
Note that input rate > shuffle rate > output rate.
32. MapReduce: Performance
Sort with no backup tasks:
After 960 s, only 5 straggler reduce tasks remain; they
finish 300 s later.
Result: a 44% increase in elapsed time.
33. MapReduce: Performance
Sort with 200 tasks deliberately killed:
Only a 5% increase in elapsed time.
34. Hadoop MapReduce
Hadoop [3] MapReduce is the open-source
implementation of Google's MapReduce.
Terminology mappings:
Google MR          Hadoop MR
Scheduling System  JobTracker
Worker             TaskTracker
GFS                HDFS
35. DryadLINQ: A System for General Purpose
Distributed Data-Parallel Computing Using a
High-Level Language
Yuan Yu, Michael Isard, Dennis Fetterly, Mihai Budiu, Ulfar
Erlingsson, Pradeep Kumar Gunda and Job Currey
OSDI-2008
36. DryadLINQ = Dryad + LINQ
Dryad [4]: a general-purpose distributed execution engine
for coarse-grain data-parallel applications.
LINQ [5]: Language INtegrated Query, a set of extensions to
the .NET Framework that encompass language-integrated
query and set operations.
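LINQ itself is a C#/.NET feature. As a rough Python analogue (not the real API), a query that chains Where, GroupBy, and Select over a word collection can be sketched with standard-library operations:

```python
from itertools import groupby

def word_histogram(words, min_len=3):
    # Where: keep only words of at least min_len characters
    kept = [w for w in words if len(w) >= min_len]
    kept.sort()                      # groupby requires sorted input
    # GroupBy + Select: one (key, count) pair per group
    return [(k, sum(1 for _ in g)) for k, g in groupby(kept)]
```

The point of the analogy is that the query is written as ordinary sequential-looking code over a collection, which is exactly the style DryadLINQ distributes.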
39. Dryad: Distributed Data-Parallel Programs from
Sequential Building Blocks
Michael Isard, Mihai Budiu, Yuan Yu, Andrew Birrell and
Dennis Fetterly
EuroSys-2007
40. Dryad: System Overview
A Dryad job is a Directed Acyclic Graph (DAG).
A vertex is a sequential program, or a program that
exploits the multiple cores on a chip.
An edge is a data communication channel.
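The vertex-and-channel structure can be sketched as a toy job graph executed in dependency order. This is purely illustrative, not Dryad's actual API: channels are modelled as in-memory values, and the "job manager" is a topological sort:

```python
from graphlib import TopologicalSorter  # Python 3.9+

def run_dag(vertices, edges, sources):
    # vertices: {name: fn(list_of_inputs)}; edges: (src, dst) channels;
    # sources: initial data for vertices with no incoming edge
    deps = {v: set() for v in vertices}
    for src, dst in edges:
        deps[dst].add(src)
    out = {}
    for v in TopologicalSorter(deps).static_order():  # dependency order
        inputs = [out[src] for src, dst in edges if dst == v]
        out[v] = vertices[v](inputs if inputs else [sources[v]])
    return out
```

Each vertex runs only after all its input channels are populated, which is the scheduling invariant the real job manager enforces across machines.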
41. Dryad: System Overview
Tasks of the Dryad Job Manager:
Instantiating the DAG
Assigning vertices to machines
Fault tolerance
Tracking job execution progress
42. Dryad: DAG Description Language
43. Dryad: Writing a Program
Use the graph description language to express the
concurrency between the tasks of a job.
Identify the channel types between communicating
vertices:
e.g., a shared-memory channel requires awareness of each
node's resources.
44. DryadLINQ: Objective
DryadLINQ compiles LINQ programs into distributed
computations that run on Dryad:
No need to use the DAG description language directly.
It automatically chooses the channel types in the DAG.
Targets a wide variety of developers:
Supports both declarative and imperative programming
paradigms.
Gives the illusion of writing programs that execute
sequentially.
Relies on the independence property of data.
46. DryadLINQ: Programming
DryadTable<T>
Backed by the underlying DFS, collections of NTFS files, or
sets of database tables.
Carries a schema for the data items.
Partitioning schemes:
HashPartition<T, K> and RangePartition<T, K>
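The two schemes can be sketched in Python. These hypothetical helpers only mirror the idea behind HashPartition<T, K> and RangePartition<T, K>; the real operators are .NET types:

```python
import bisect

def hash_partition(records, key, n):
    # Scatter records across n partitions by the hash of their key.
    parts = [[] for _ in range(n)]
    for r in records:
        parts[hash(key(r)) % n].append(r)
    return parts

def range_partition(records, key, boundaries):
    # boundaries: sorted split keys; produces len(boundaries)+1 partitions,
    # each holding the records whose key falls into that key range.
    parts = [[] for _ in range(len(boundaries) + 1)]
    for r in records:
        parts[bisect.bisect_right(boundaries, key(r))].append(r)
    return parts
```

Hash partitioning balances load for equality-based operations such as joins, while range partitioning keeps keys ordered across partitions, which sorting (e.g., TeraSort below) needs.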
47. DryadLINQ: Programming
If a computation cannot be expressed with the existing
LINQ operators:
Apply: e.g., windowed computations
Fork: e.g., sharing scans, or eliminating common
sub-expressions
50. DryadLINQ: EPG
Every LINQ operator is represented by one vertex.
Each vertex is replicated at runtime to form one
Dryad stage.
Vertices and edges carry annotations.
51. DryadLINQ: Optimizations
Static optimizations
Pipelining
Removing Redundancy
I/O Reduction
Dynamic optimizations:
Modifications to the DAG at runtime.
52. DryadLINQ: Dynamic Optimizations
Dynamic Aggregation (Combiners):
Aggregation at the node level, then the rack level, then
the cluster level.
The aggregation topology is computed at runtime.
The number of replicas of a vertex depends on the
number of independent partitions of the input data.
The job skeleton remains the same.
57. DryadLINQ: TeraSort
Data is partitioned on a
key other than the
sortKey.
Each machine stores
3.87 GB.
At n = 240, about a
terabyte is sorted
(240 x 3.87 GB ≈ 0.93 TB).
58. DryadLINQ: TeraSort
The data size grows with
the number of nodes, so
the elapsed time should
stay roughly constant.
At n = 1, no sampling, no
re-partitioning, and no
network communication
are needed.
For 2 ≤ n ≤ 20, the
machines are connected
to the same switch.
59. DryadLINQ: SkyServer
Compares the locations and colors of stars in a large
astronomical table.
Joins two tables: 11.8 GB and 41.8 GB.
The input tables are manually range-partitioned into 40
partitions on the join key.
The number of machines n is varied from 1 to 40.
The output of joining two partitions is stored locally.
60. DryadLINQ: SkyServer
DryadLINQ is 1.3 times
slower than hand-written
Dryad:
DryadLINQ programs are
written in a higher-level
language.
Overhead of
communication between
the .NET/DryadLINQ layer
and the Dryad layer.
61. General Comments
Stragglers and interaction with databases:
the mapred.map.tasks.speculative.execution property in
Hadoop MR
Fault tolerance and the blocking property
A scalability evaluation of Google MR is missing.
62. Takeaways
Parallel processing becomes an easier task:
Write deterministic functions.
Exploit the independence property of data.
65. References
[1] Principles of Parallel Algorithm Design.
[2] Dean, J., and Ghemawat, S. MapReduce: Simplified data processing on
large clusters. In Proceedings of the 6th Symposium on Operating Systems
Design and Implementation (OSDI), 2004.
[3] Hadoop MapReduce project. http://hadoop.apache.org/mapreduce/
[4] Isard, M., Budiu, M., Yu, Y., Birrell, A., and Fetterly, D. Dryad:
Distributed data-parallel programs from sequential building blocks. In
Proceedings of the European Conference on Computer Systems (EuroSys), 2007.
[5] The LINQ project. http://msdn.microsoft.com/netframework/future/linq/