This document provides an overview of Hadoop MapReduce scheduling algorithms. It discusses several commonly used algorithms like FIFO, fair scheduling, and capacity scheduler. It also introduces more advanced algorithms such as LATE, SAMR, ESAMR, locality-aware scheduling, and center-of-gravity scheduling that aim to improve metrics like fairness, throughput, response time, and resource utilization. The document concludes by listing references for further reading on MapReduce scheduling techniques.
3. Introduction
Job scheduling in multi-user environments is a key challenge in MapReduce
Each node is a physical machine with computational and storage capabilities
Hadoop uses the concept of slots on each node to control the maximum
number of tasks that can execute concurrently on that node.
Each slot can execute only one task at a time
Two types of slot: map slots and reduce slots.
5. Quality Metrics for MapReduce scheduling algorithms
Fairness
Throughput
Response time
Availability
Energy efficiency
Resource utilization
Scalability
Overheads
6. FIFO
The default Hadoop scheduler
First in, first out
The main objective:
to schedule jobs based on their priorities in first-come, first-served order
Limitations:
poor response times for short jobs compared to large jobs
low performance when running multiple types of jobs
gives good results only for a single type of job
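As a minimal sketch (not Hadoop's actual implementation), FIFO ordering with priorities can be modeled as a heap keyed on (priority, arrival order); the class and field names here are illustrative:

```python
import heapq
import itertools

class FIFOScheduler:
    """Sketch of Hadoop-style FIFO scheduling: jobs are ordered by
    priority first, then by submission order (first-come, first-served)."""

    def __init__(self):
        self._queue = []               # min-heap of (priority, seq, job)
        self._seq = itertools.count()  # tie-breaker preserving arrival order

    def submit(self, job, priority=0):
        # A lower numeric priority value is scheduled earlier.
        heapq.heappush(self._queue, (priority, next(self._seq), job))

    def next_job(self):
        # Return the highest-priority, earliest-submitted job, or None.
        if not self._queue:
            return None
        return heapq.heappop(self._queue)[2]

sched = FIFOScheduler()
sched.submit("job-A")
sched.submit("job-B")
sched.submit("urgent", priority=-1)
print(sched.next_job())  # "urgent" jumps ahead; job-A and job-B follow
```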
7. Fair Scheduling
All jobs get, on average, an equal share of resources over time
The objective:
an equal distribution of compute resources among the users/jobs in the system
Covers some limitations of FIFO:
works well in both small and large clusters
less complex
Disadvantage:
does not consider the job weight of each node
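The equal-distribution idea can be sketched as a max-min fair allocation: every job gets an equal share of slots, and any share a job cannot use spills over to the others. This is an illustrative sketch, not the Hadoop Fair Scheduler's code; `fair_shares` and its arguments are invented for the example:

```python
def fair_shares(capacity, demands):
    """Max-min fair-share sketch: split `capacity` slots equally among
    jobs, redistributing any share a job cannot use (because its demand
    is smaller) to the remaining jobs."""
    shares = {}
    remaining = dict(demands)
    cap = capacity
    while remaining and cap > 0:
        equal = cap / len(remaining)
        # Jobs demanding less than the equal share are fully satisfied.
        satisfied = {j: d for j, d in remaining.items() if d <= equal}
        if not satisfied:
            # Everyone wants more than the equal share: split evenly.
            for j in remaining:
                shares[j] = equal
            return shares
        for j, d in satisfied.items():
            shares[j] = d
            cap -= d
            del remaining[j]
    for j in remaining:
        shares[j] = 0
    return shares

print(fair_shares(100, {"A": 20, "B": 50, "C": 100}))  # A's unused share spills to B and C
```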
8. Capacity scheduler
Similar to fair scheduling, but uses queues instead of pools
Queues and sub-queues
Capacity Guarantee with elasticity
ACLs for security
Runtime changes/draining apps
Resource based scheduling
9. Speculative Execution
Identify slow tasks
The job progress score in Hadoop:
ps = M/N for map tasks
ps = (1/3)(k + M/N) for reduce tasks
where M/N is the fraction of input processed so far and k ∈ {0, 1, 2} is the number of completed reduce phases (copy, sort, reduce)
The average progress over the K running tasks:
ps_avg = (Σ_{i=1}^{K} ps[i]) / K
Tasks that need a backup: for task T_i, launch a backup if ps_i < ps_avg − 20%
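The progress-score bookkeeping above can be sketched as follows (the task dictionaries and field names are made up for the illustration):

```python
def progress_score(task):
    """Progress score ps, sketching Hadoop's definition: M/N for map
    tasks; (1/3)(k + M/N) for reduce tasks, where k is the number of
    finished reduce phases (copy, sort, reduce)."""
    m_over_n = task["processed"] / task["total"]
    if task["type"] == "map":
        return m_over_n
    return (task["phases_done"] + m_over_n) / 3.0

def backup_candidates(tasks):
    """Tasks whose progress lags the average by more than 20%."""
    scores = [progress_score(t) for t in tasks]
    avg = sum(scores) / len(scores)
    return [t["id"] for t, s in zip(tasks, scores) if s < avg - 0.2]

tasks = [
    {"id": "m1", "type": "map", "processed": 90, "total": 100},
    {"id": "m2", "type": "map", "processed": 80, "total": 100},
    {"id": "m3", "type": "map", "processed": 30, "total": 100},
]
print(backup_candidates(tasks))  # only m3 lags far enough behind
```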
10. Longest Approximate Time to End (LATE)
A scheduler that robustly improves performance in heterogeneous
environments by reducing the overhead of speculative task execution
Finds genuinely slow tasks by estimating the remaining time of all tasks
It ranks tasks by estimated time remaining and starts a copy of the highest-
ranked task whose progress rate is below the SlowTaskThreshold
PR = PS / T_r (progress rate, where T_r is the time the task has been running)
TTE = (1 − PS) / PR (estimated time to end)
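The two estimates can be sketched as below; the task records, the percentile cutoff standing in for the SlowTaskThreshold, and the function name are illustrative assumptions, not LATE's exact implementation:

```python
def late_candidates(tasks, now, slow_task_cap=0.25):
    """For each task compute PR = PS / T_r and TTE = (1 - PS) / PR.
    Tasks whose progress rate falls below the slow-task cutoff (here:
    the 25th percentile of rates) are speculation candidates, ranked
    by longest estimated time to end first."""
    stats = []
    for t in tasks:
        elapsed = now - t["start"]   # T_r: time the task has run
        pr = t["ps"] / elapsed       # progress rate
        tte = (1.0 - t["ps"]) / pr   # estimated time to end
        stats.append((t["id"], pr, tte))
    rates = sorted(pr for _, pr, _ in stats)
    threshold = rates[int(len(rates) * slow_task_cap)]
    slow = [(tid, tte) for tid, pr, tte in stats if pr < threshold]
    return sorted(slow, key=lambda x: -x[1])  # longest time-to-end first

tasks = [
    {"id": "t1", "ps": 0.9, "start": 0},
    {"id": "t2", "ps": 0.8, "start": 0},
    {"id": "t3", "ps": 0.2, "start": 0},
    {"id": "t4", "ps": 0.7, "start": 0},
]
print(late_candidates(tasks, now=10))  # only t3 is flagged as slow
```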
11. Longest Approximate Time to End (LATE)
The advantage:
robustness to node heterogeneity, since only some of the slowest tasks
are speculatively restarted.
This method does not break the synchronization phase between the map and
reduce phases, but only takes action on appropriate slow tasks.
12. Self-Adaptive MapReduce (SAMR)
Uses historical information about nodes and jobs (execution time,
system resources) to classify nodes as fast (finish a task in a
shorter time) or slow (finish a task in a longer time)
13. Self-Adaptive MapReduce (SAMR)
SAMR decreases execution time by up to 25% compared with Hadoop's default
scheduler and by 14% compared with the LATE scheduler.
14. Enhanced Self-Adaptive MapReduce (ESAMR)
SAMR does not consider that the size of datasets and the type of jobs
may lead to different weights for the map and reduce stages.
ESAMR classifies the historical information stored on every node into k
clusters using a machine-learning technique.
If a running job has completed some map tasks on a node, ESAMR calculates a
temporary map-phase weight (M1) on that node according to the job's map
tasks completed there.
15. Enhanced Self-Adaptive MapReduce (ESAMR)
The temporary M1 weight is used to find the cluster whose M1 weight is the
closest.
Uses the cluster’s stage weights to estimate the job’s map tasks’ TimeToEnd on
the node and identify slow tasks that need to be re-executed.
Reduce phase : similar procedure.
After a job has finished, ESAMR calculates the job's stage weights on every
node and saves these new weights as part of the historical information.
Applies k-means to re-classify the historical information stored on every worker
node into k clusters and saves the updated average stage weights for each of the
k clusters
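In miniature, the clustering step might look like the following: a tiny 1-D k-means over historical M1 weights, plus the nearest-cluster lookup for a running job. This is purely illustrative; the paper clusters full stage-weight records, not single scalars:

```python
def kmeans_1d(values, k, iters=20):
    """Tiny 1-D k-means over historical map-phase (M1) weights."""
    # Seed centers with evenly spaced sorted values.
    centers = sorted(values)[:: max(1, len(values) // k)][:k]
    for _ in range(iters):
        groups = [[] for _ in centers]
        for v in values:
            # Assign each value to its nearest center.
            idx = min(range(len(centers)), key=lambda i: abs(v - centers[i]))
            groups[idx].append(v)
        # Recompute each center as the mean of its group.
        centers = [sum(g) / len(g) if g else c for g, c in zip(groups, centers)]
    return centers

def closest_cluster(centers, temp_m1):
    """Pick the cluster whose average M1 weight is nearest the temporary
    M1 weight observed for the running job on this node."""
    return min(range(len(centers)), key=lambda i: abs(centers[i] - temp_m1))

historical_m1 = [0.61, 0.63, 0.60, 0.31, 0.29, 0.33]
centers = kmeans_1d(historical_m1, k=2)
print(closest_cluster(centers, temp_m1=0.58))  # nearest cluster index
```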
16. Delay
To address the conflict between locality and fairness
when a node requests a task,
if the head-of-line job cannot launch a local task
skip it and look at subsequent jobs
if a job has been skipped long enough
start allowing it to launch non-local tasks, to avoid starvation
temporarily relaxes fairness to improve locality by asking
jobs to wait for a scheduling opportunity on a node with
local data
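The skip-counting logic above can be sketched as follows (the `max_skips` counter stands in for the paper's wait-time threshold, and the job/node data shapes are assumed):

```python
def delay_schedule(node, jobs_in_fair_order, max_skips=3):
    """When `node` has a free slot, walk jobs in fair-share order.
    A job without local data on the node is skipped; only after being
    skipped more than `max_skips` times may it launch a non-local task,
    which prevents starvation."""
    for job in jobs_in_fair_order:
        if node in job["local_nodes"]:
            job["skips"] = 0
            return job["id"], "local"
        job["skips"] = job.get("skips", 0) + 1
        if job["skips"] > max_skips:
            job["skips"] = 0
            return job["id"], "non-local"
    return None

jobs = [{"id": "J1", "local_nodes": {"n2"}, "skips": 3},
        {"id": "J2", "local_nodes": {"n1"}}]
print(delay_schedule("n1", jobs))  # J1 has waited long enough: non-local
```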
17. Maestro
Avoids the non-local map task execution problem by relying on replica-aware
execution of map tasks
Keeps track of chunk and replica locations, along with the number of other
chunks hosted by each node
Efficiently schedules each map task on a data-local node that has minimal
impact on other nodes' local map task executions
18. Maestro
It does map task scheduling in two waves:
initially, it fills the empty slots of each data node based on the number of
hosted map tasks and on the replication scheme for their input data
second, runtime scheduling takes into account the probability of
scheduling a map task on a given machine depending on the replicas of
the task’s input data
Provides higher locality in the execution of map tasks and a more balanced
intermediate data distribution for the shuffle phase.
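One way to picture the replica-aware choice (the data model and selection rule here are simplifying assumptions, not Maestro's exact weighting):

```python
def pick_chunk_for_node(node, replica_map):
    """Among chunks with a replica on `node`, launch the one with the
    fewest replicas on other nodes: other nodes can still run the
    remaining chunks locally, so locality is preserved overall."""
    local = [(chunk, len(nodes - {node}))
             for chunk, nodes in replica_map.items() if node in nodes]
    if not local:
        return None
    # Choose the chunk whose loss hurts other nodes the least.
    return min(local, key=lambda x: x[1])[0]

replica_map = {"c1": {"n1", "n2", "n3"},
               "c2": {"n1"},
               "c3": {"n2", "n3"}}
print(pick_chunk_for_node("n1", replica_map))  # "c2": only n1 holds it
```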
19. Context-aware Scheduler
Exploits the existing heterogeneity of most clusters and the workload
mix, proposing optimizations for jobs that use the same dataset
The design is based on two key insights:
First, a large percentage of MapReduce jobs are run periodically and
roughly have the same characteristics regarding CPU, network, and
disk requirements
Second, the nodes in a Hadoop cluster become heterogeneous over
time due to failures, when newer nodes replace old ones
20. Context-aware Scheduler
The scheduler uses three steps to
accomplish its objective
classify jobs as CPU or I/O bound
classify nodes as Computational or I/O
map the tasks of a job with different
demands to the nodes that can fulfill the
demands
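A toy version of these three steps (the classification threshold and node labels are invented for the example):

```python
def classify_job(cpu_seconds, bytes_io):
    """Classify a job as CPU- or I/O-bound. The cutoff of 1e-6 CPU
    seconds per byte of I/O is an arbitrary illustrative threshold."""
    return "cpu" if cpu_seconds / max(bytes_io, 1) > 1e-6 else "io"

def assign(job_class, nodes):
    """Map a job to a node whose strength matches the job's demand;
    fall back to any node if none match."""
    matching = [n for n in nodes if n["class"] == job_class]
    return (matching or nodes)[0]["name"]

nodes = [{"name": "fat-cpu", "class": "cpu"},
         {"name": "fast-disk", "class": "io"}]
print(assign(classify_job(cpu_seconds=500, bytes_io=10**8), nodes))
```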
21. Locality-Aware Reduce Task Scheduler
Modifies reduce-phase scheduling to become aware of partition locations and sizes
Balances scheduling delay, scheduling skew, system utilization, and parallelism
Decreases network traffic
22. Center-of-Gravity Reduce Scheduler
A locality-aware and skew-aware Reduce task scheduler
Attempts to schedule every Reduce task at its center-of-gravity node,
determined by the network locations
Allows MapReduce jobs to co-exist on the same system while saving
MapReduce network traffic
23. COSHH
Considers heterogeneity at both the application and cluster levels
The main approach: use system information to make better scheduling decisions,
which improves performance.
Two main processes:
New job (from a user): a queuing process stores the incoming job in an appropriate queue
Heartbeat (from a free resource): triggers the routing process to assign a job to the current free resource
24. Self-Adaptive Scheduling Algorithm for Reduce Start Time (SARS)
An optimal scheduling policy for the reduce task start time
Works by delaying the reduce processes to:
shorten the copy duration of the reduce process
decrease the task completion time
save reduce slot resources
Limitation: focuses only on the reduce process
25. Summary
Scheduling Algorithm — Idea to Implementation
FIFO — schedules jobs based on their priorities in first-come, first-out order
Fair Scheduling — equal distribution of compute resources among the users/jobs in the system
Capacity — maximizes resource utilization and throughput in a multi-tenant cluster environment
Hybrid scheduler based on dynamic priority — designed for data-intensive workloads; tries to maintain data locality during job execution
LATE — fault tolerance
26. Summary
Scheduling Algorithm — Idea to Implementation
SAMR — improves MapReduce by saving execution time and system resources
Delay scheduling — addresses the conflict between locality and fairness
Maestro — proposed for map tasks, to improve the overall performance of the MapReduce computation
CREST — re-executes a combination of tasks on a group of computing nodes
Context-aware scheduler — optimizations for jobs using the same dataset
LARTS — decreases network traffic
27. Summary
Scheduling Algorithm — Idea to Implementation
CoGRS — attempts to schedule every Reduce task at its center-of-gravity node, determined by the network locations
MaRCO — achieves nearly full overlap via the novel idea of including the reduce in the overlap
COSHH — proposed to improve the mean completion time of jobs
SARS — shortens the copy duration of the reduce process, decreases the task completion time, and saves reduce slot resources
28. References
1. Varma, Rakesh. "Survey on MapReduce and Scheduling Algorithms in Hadoop." International Journal of Science and
Research 4.2 (2015).
2. Zaharia, Matei, et al. "Job scheduling for multi-user mapreduce clusters." EECS Department, University of California,
Berkeley, Tech. Rep. UCB/EECS-2009-55 (2009).
3. Tiwari, Nidhi, et al. "Classification framework of MapReduce scheduling algorithms." ACM Computing Surveys
(CSUR) 47.3 (2015): 49.
4. Zaharia, Matei, et al. "Improving MapReduce Performance in Heterogeneous Environments." OSDI. Vol. 8. No. 4.
2008.
5. Kumar, K. Arun, et al. "CASH: context aware scheduler for Hadoop." Proceedings of the International Conference on
Advances in Computing, Communications and Informatics. ACM, 2012.
6. Hammoud, Mohammad, M. Suhail Rehman, and Majd F. Sakr. "Center-of-gravity reduce task scheduling to lower
mapreduce network traffic." Cloud Computing (CLOUD), 2012 IEEE 5th International Conference on. IEEE, 2012.
7. Rasooli, Aysan, and Douglas G. Down. "COSHH: A classification and optimization based scheduler for heterogeneous
Hadoop systems." Future Generation Computer Systems 36 (2014): 1-15.
8. Lei, Lei, Tianyu Wo, and Chunming Hu. "CREST: Towards fast speculation of straggler tasks in MapReduce." e-
Business Engineering (ICEBE), 2011 IEEE 8th International Conference on. IEEE, 2011.
29. References
9. Zaharia, Matei, et al. "Delay scheduling: a simple technique for achieving locality and fairness in cluster scheduling."
Proceedings of the 5th European conference on Computer systems. ACM, 2010.
10. Sun, Xiaoyu, Chen He, and Ying Lu. "ESAMR: an enhanced self-adaptive MapReduce scheduling algorithm." Parallel
and Distributed Systems (ICPADS), 2012 IEEE 18th International Conference on. IEEE, 2012.
11. Nguyen, Phuong, et al. "A hybrid scheduling algorithm for data intensive workloads in a mapreduce environment."
Proceedings of the 2012 IEEE/ACM Fifth International Conference on Utility and Cloud Computing. IEEE Computer
Society, 2012.
12. Hammoud, Mohammad, and Majd F. Sakr. "Locality-aware reduce task scheduling for MapReduce." Cloud
Computing Technology and Science (CloudCom), 2011 IEEE Third International Conference on. IEEE, 2011.
13. Ibrahim, Shadi, et al. "Maestro: Replica-aware map scheduling for mapreduce." Cluster, Cloud and Grid Computing
(CCGrid), 2012 12th IEEE/ACM International Symposium on. IEEE, 2012.
14. Chen, Quan, et al. "Samr: A self-adaptive mapreduce scheduling algorithm in heterogeneous environment."
Computer and Information Technology (CIT), 2010 IEEE 10th International Conference on. IEEE, 2010.
15. Tang, Zhuo, et al. "A self-adaptive scheduling algorithm for reduce start time." Future Generation Computer Systems
43 (2015): 51-60.