1. Cloud Computing and Management
Mostafa Ead
October 17, 2011 CS854: Cloud Computing and Management 1
2. Objective
Harness the enormous power of today's clusters of
commodity machines, and react quickly to the
data tsunami.
3. Outline
Parallel Processing
Google MapReduce
DryadLINQ
Dryad
General Comments
Takeaways
Discussion
4. Parallel Processing
Why Parallel Processing?
Execution time reduction
Cheap clusters of commodity hardware
A data-driven world
Exploit multi-cores in your workstation or many
machines in your cluster.
5. Parallel Processing
Tasks of a parallel-program developer:
1. Identify the concurrent portions
2. Map the concurrent portions to multiple
processes running in parallel
Static mapping hinders scalability
3. Distribute the input, intermediate, and output data,
or any combination of them
4. Manage accesses to shared data
5. Handle failures
In commodity clusters, failure is the norm rather than the
exception.
6. Parallel Processing
A nightmare for the parallel-program developer.
Which of these tasks can be automated?
11. Parallel Processing
Tasks of a parallel-program developer: which can be automated?
1. Identify the concurrent portions: No
2. Map the concurrent portions to multiple
processes running in parallel: Yes, iff the mapping is dynamic
3. Distribute the input, intermediate, and output data,
or any combination of them: Yes (GFS and HDFS)
4. Manage accesses to shared data: Yes, iff access is read-only
5. Handle failures: Yes, with restrictions
12. MapReduce: Simplified Data
Processing on Large Clusters
Jeffrey Dean and Sanjay Ghemawat
Google, Inc.
OSDI-2004
Citation Count: 3487 (Google Scholar)
13. MapReduce: Programming model
The developer specifies one map function and one
reduce function.
Map:
Input: one key-value pair
Output: a set of intermediate key-value pairs
Reduce:
Input: intermediate key-value pairs grouped by key
Output: one or more output key-value pairs
14. MapReduce: Wordcount Example
Parallel execution of the map phase:
map1: key = offset in the input file, value = "Mostafa is presenting MapReduce"
  emits: Mostafa: 1, is: 1, presenting: 1, MapReduce: 1
map2: key = offset in the input file, value = "Mostafa is presenting DryadLINQ"
  emits: Mostafa: 1, is: 1, presenting: 1, DryadLINQ: 1
15. MapReduce: Wordcount Example
Shuffle & Sort groups the intermediate pairs by key:
map1 output: Mostafa: 1, is: 1, presenting: 1, MapReduce: 1
map2 output: Mostafa: 1, is: 1, presenting: 1, DryadLINQ: 1
grouped: DryadLINQ: [1]; is: [1, 1]; MapReduce: [1]; Mostafa: [1, 1]; presenting: [1, 1]
16. MapReduce: Wordcount Example
The reduce function is called 5 times, once per key:
DryadLINQ: [1] → DryadLINQ: 1
is: [1, 1] → is: 2
MapReduce: [1] → MapReduce: 1
Mostafa: [1, 1] → Mostafa: 2
presenting: [1, 1] → presenting: 2
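The wordcount walkthrough above can be reproduced with a small in-memory sketch of the model. The function and helper names here are illustrative only (Google's implementation is C++, Hadoop's is Java); the sketch just mirrors the map, shuffle-and-sort, and reduce phases:

```python
from collections import defaultdict

def map_fn(key, value):
    # key: offset in the input file (unused here); value: one line of text
    for word in value.split():
        yield (word, 1)

def reduce_fn(key, values):
    # values: every intermediate count emitted for this word
    yield (key, sum(values))

def run_mapreduce(inputs, map_fn, reduce_fn):
    groups = defaultdict(list)          # shuffle & sort: group values by key
    for key, value in inputs:
        for k, v in map_fn(key, value):
            groups[k].append(v)
    output = {}
    for k in sorted(groups):            # reduce is called once per key
        for ok, ov in reduce_fn(k, groups[k]):
            output[ok] = ov
    return output
```

Running it on the two slide inputs produces exactly the five reduce calls shown above: DryadLINQ: 1, is: 2, MapReduce: 1, Mostafa: 2, presenting: 2.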
18. MapReduce: Fault Tolerance
States of worker tasks: idle, in-progress, or completed
Worker Failure:
The master pings workers periodically.
In-progress tasks are reset to idle.
Completed map tasks are also reset to idle. Why? Their
output lives on the failed machine's local disk and is no
longer reachable.
All in-progress (not yet completed) reduce tasks are
notified of the re-execution of the map tasks.
19. MapReduce: Fault Tolerance
Master Failure:
The current (as of 2004) implementation aborts the whole MR
job.
It is not a free lunch:
The developer must write deterministic map/reduce
functions.
20. MapReduce: Backup Tasks
Problem: straggler tasks
Bad disk performance (e.g., reads at 1 MB/s instead of 30 MB/s)
Contention on the machine's resources from other competing
tasks
Stragglers increase the total elapsed time of the job.
Solution:
Near the end of the job, the master schedules backup
executions of the remaining in-progress tasks.
The first copy to finish is taken as the result; the others
are ignored.
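The backup-task policy reduces to "take the first finisher". A minimal sketch (the function and attempt names are hypothetical, not from the paper):

```python
def resolve_backup(attempts):
    # attempts: list of (attempt_id, finish_time) for duplicate copies of
    # one task; the master accepts the earliest finisher and ignores the rest
    winner_id, _ = min(attempts, key=lambda a: a[1])
    ignored = [a_id for a_id, _ in attempts if a_id != winner_id]
    return winner_id, ignored
```

If the backup copy finishes before the straggling original, the job proceeds without waiting for the slow machine.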
21. MapReduce: Refinements
1. Combiner Function:
The reduce function must be commutative and associative:
<“the”, 1>, <“the”, 1>, <“the”, 1>, <“the”, 1> → <“the”, 4>
<“the”, 2>, <“the”, 2> → <“the”, 4>
Decreases the size of the intermediate data sent over the
network from mappers to reducers.
The same reduce-function code can run as the
combiner on the map side.
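A sketch of running the reduce logic as a map-side combiner, assuming the associative and commutative sum from wordcount (helper name illustrative):

```python
from collections import defaultdict

def combine(pairs):
    # Pre-aggregate one mapper's (word, count) output before it is
    # shuffled, so fewer pairs cross the network to the reducers.
    acc = defaultdict(int)
    for word, count in pairs:
        acc[word] += count
    return sorted(acc.items())
```

Four <"the", 1> pairs collapse to one <"the", 4>, and combining two combiner outputs gives the same answer, which is exactly why associativity and commutativity are required.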
22. MapReduce: Refinements
2. Counters:
e.g., count the number of uppercase words, or the number
of German documents.
Also useful for sanity checks:
Sort: the number of input key-value pairs should equal the
number of output key-value pairs.
Each mapper/reducer keeps a local copy of each counter.
Counter values are periodically propagated to the master for
aggregation.
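The counter mechanism can be sketched as a per-worker local tally that the master sums. The periodic propagation is simulated here by a single final aggregation; in the real system the values piggyback on the workers' periodic responses to the master:

```python
def aggregate_counters(worker_counters):
    # worker_counters: one dict of local counts per mapper/reducer;
    # the master folds them into the global, job-wide totals.
    total = {}
    for counters in worker_counters:
        for name, value in counters.items():
            total[name] = total.get(name, 0) + value
    return total
```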
23. MapReduce: Performance
Cluster Configuration:
1,800 machines
Each machine: two 2 GHz Xeon processors, 4 GB RAM, two
160 GB local disks, and 1 Gbps Ethernet
Network topology: a two-level, tree-shaped switched
network
24. MapReduce: Performance
Grep:
Input: 10^10 100-byte records
A 3-character pattern occurs in 92,337 records.
M = 15,000 and R = 1
The input rate ramps up slowly: overhead of propagating the
program to all workers, and of the data-locality
optimizations.
Peak input rate of about 30 GB/s once 1,764 workers are
assigned.
27. MapReduce: Performance
Sort:
Input: 10^10 100-byte records
The map function extracts a sortKey from each record and
emits a sortKey-record pair.
Identity reduce function.
The actual sort happens at each reducer and is handled by
the library.
M = 15,000 and R = 4,000
28. MapReduce: Performance
Sort:
The input rate peaks at 13 GB/s, lower than Grep's.
The shuffle starts as soon as the first map task finishes.
The shuffle rate shows two humps: all workers are assigned
reduce tasks in a first batch, and the remaining tasks run
in a second batch as the first completes.
Note that input rate > shuffle rate > output rate.
32. MapReduce: Performance
Sort with no backup tasks:
After 960 s, only 5 straggler reduce tasks remain; they
finish 300 s later.
Result: a 44% increase in elapsed time.
33. MapReduce: Performance
Sort with 200 tasks deliberately killed:
Only a 5% increase in elapsed time.
34. Hadoop MapReduce
Hadoop [3] MapReduce is the open-source
implementation of Google's MapReduce.
Terminology mappings:
Google MR          Hadoop MR
Scheduling System  JobTracker
Worker             TaskTracker
GFS                HDFS
35. DryadLINQ: A System for General Purpose
Distributed Data-Parallel Computing Using a
High-Level Language
Yuan Yu, Michael Isard, Dennis Fetterly, Mihai Budiu, Ulfar
Erlingsson, Pradeep Kumar Gunda and Job Currey
OSDI-2008
36. DryadLINQ = Dryad + LINQ
Dryad [4]: a general-purpose distributed execution engine
for coarse-grain data-parallel applications.
LINQ [5]: Language INtegrated Query, a set of extensions to
the .NET Framework that encompass language-integrated
query and set operations.
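LINQ itself is a C#/.NET feature. As a rough Python analogue (not the real API), a query that chains Where, GroupBy, and Select over a word collection can be sketched with standard-library operations:

```python
from itertools import groupby

def word_histogram(words, min_len=3):
    # Where: keep only words of at least min_len characters
    kept = [w for w in words if len(w) >= min_len]
    kept.sort()                      # groupby requires sorted input
    # GroupBy + Select: one (key, count) pair per group
    return [(k, sum(1 for _ in g)) for k, g in groupby(kept)]
```

The point of the analogy is that the query is written as ordinary sequential-looking code over a collection, which is exactly the style DryadLINQ distributes.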
39. Dryad: Distributed Data-Parallel Programs from
Sequential Building Blocks
Michael Isard, Mihai Budiu, Yuan Yu, Andrew Birrell and
Dennis Fetterly
EuroSys-2007
40. Dryad: System Overview
A Dryad job is a Directed Acyclic Graph (DAG).
A vertex is a sequential program, or a program that
exploits the multiple cores on a chip.
An edge is a data communication channel.
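The vertex-and-channel structure can be sketched as a toy job graph executed in dependency order. This is purely illustrative, not Dryad's actual API: channels are modelled as in-memory values, and the "job manager" is a topological sort:

```python
from graphlib import TopologicalSorter  # Python 3.9+

def run_dag(vertices, edges, sources):
    # vertices: {name: fn(list_of_inputs)}; edges: (src, dst) channels;
    # sources: initial data for vertices with no incoming edge
    deps = {v: set() for v in vertices}
    for src, dst in edges:
        deps[dst].add(src)
    out = {}
    for v in TopologicalSorter(deps).static_order():  # dependency order
        inputs = [out[src] for src, dst in edges if dst == v]
        out[v] = vertices[v](inputs if inputs else [sources[v]])
    return out
```

Each vertex runs only after all its input channels are populated, which is the scheduling invariant the real job manager enforces across machines.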
41. Dryad: System Overview
Tasks of the Dryad Job Manager:
Instantiating the DAG
Assigning vertices to machines
Fault tolerance
Tracking job execution progress
42. Dryad: DAG Description Language
43. Dryad: Writing a Program
Use the graph description language to express the
concurrency between the tasks of a job.
Identify the channel types between communicating
vertices:
e.g., a shared-memory channel requires awareness of each
node's resources.
44. DryadLINQ: Objective
DryadLINQ compiles LINQ programs into distributed
computations that run on Dryad:
No need to use the DAG description language directly.
It automatically chooses the channel types in the DAG.
Targets a wide variety of developers:
Supports both declarative and imperative programming
paradigms.
Gives the illusion of writing programs that execute
sequentially.
Relies on the independence property of data.
46. DryadLINQ: Programming
DryadTable<T>
Backed by the underlying DFS, collections of NTFS files, or
sets of database tables.
Carries a schema for the data items.
Partitioning schemes:
HashPartition<T, K> and RangePartition<T, K>
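The two schemes can be sketched in Python. These hypothetical helpers only mirror the idea behind HashPartition<T, K> and RangePartition<T, K>; the real operators are .NET types:

```python
import bisect

def hash_partition(records, key, n):
    # Scatter records across n partitions by the hash of their key.
    parts = [[] for _ in range(n)]
    for r in records:
        parts[hash(key(r)) % n].append(r)
    return parts

def range_partition(records, key, boundaries):
    # boundaries: sorted split keys; produces len(boundaries)+1 partitions,
    # each holding the records whose key falls into that key range.
    parts = [[] for _ in range(len(boundaries) + 1)]
    for r in records:
        parts[bisect.bisect_right(boundaries, key(r))].append(r)
    return parts
```

Hash partitioning balances load for equality-based operations such as joins, while range partitioning keeps keys ordered across partitions, which sorting (e.g., TeraSort below) needs.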
47. DryadLINQ: Programming
If a computation cannot be expressed with the existing
LINQ operators:
Apply: e.g., windowed computations
Fork: e.g., sharing scans, or eliminating common
sub-expressions
50. DryadLINQ: EPG
Every LINQ operator is represented by one vertex.
Each vertex is replicated at runtime to form one
Dryad stage.
Vertices and edges carry annotations.
51. DryadLINQ: Optimizations
Static optimizations
Pipelining
Removing Redundancy
I/O Reduction
Dynamic optimizations:
Modifications to the DAG at runtime.
52. DryadLINQ: Dynamic Optimizations
Dynamic Aggregation (Combiners):
Aggregation at the node level, then the rack level, then
the cluster level.
The aggregation topology is computed at runtime.
The number of replicas of a vertex depends on the
number of independent partitions of the input data.
The job skeleton remains the same.
57. DryadLINQ: TeraSort
Data is partitioned on a
key other than the
sortKey.
Each machine stores
3.87 GB.
At n = 240, about a
terabyte is sorted
(240 x 3.87 GB ≈ 0.93 TB).
58. DryadLINQ: TeraSort
The data size grows with
the number of nodes, so
the elapsed time should
stay roughly constant.
At n = 1, no sampling, no
re-partitioning, and no
network communication
are needed.
For 2 ≤ n ≤ 20, the
machines are connected
to the same switch.
59. DryadLINQ: SkyServer
Compares the locations and colors of stars in a large
astronomical table.
Joins two tables: 11.8 GB and 41.8 GB.
The input tables are manually range-partitioned into 40
partitions on the join key.
The number of machines n is varied from 1 to 40.
The output of joining two partitions is stored locally.
60. DryadLINQ: SkyServer
DryadLINQ is 1.3 times
slower than hand-written
Dryad:
DryadLINQ programs are
written in a higher-level
language.
Overhead of
communication between
the .NET/DryadLINQ layer
and the Dryad layer.
61. General Comments
Stragglers and interaction with databases:
the mapred.map.tasks.speculative.execution property in
Hadoop MR
Fault tolerance and the blocking property
A scalability evaluation of Google MR is missing.
62. Takeaways
Parallel processing becomes an easier task:
Write deterministic functions.
Exploit the independence property of data.
65. References
[1] Principles of Parallel Algorithm Design.
[2] Dean, J., and Ghemawat, S. MapReduce: Simplified data processing on
large clusters. In Proceedings of the 6th Symposium on Operating Systems
Design and Implementation (OSDI), 2004.
[3] Hadoop MapReduce project. http://hadoop.apache.org/mapreduce/
[4] Isard, M., Budiu, M., Yu, Y., Birrell, A., and Fetterly, D. Dryad:
Distributed data-parallel programs from sequential building blocks. In
Proceedings of the European Conference on Computer Systems (EuroSys), 2007.
[5] The LINQ project. http://msdn.microsoft.com/netframework/future/linq/