HadoopCon 2016 In Taiwan - Maximizing the utilization of Hadoop computing power is the biggest challenge for a Hadoop administrator. In this talk I will explain how we use machine learning to build a prediction model for computing power requirements and set the MapReduce scheduler parameters dynamically, so that we fully utilize the computing power of our Hadoop cluster.
2. About Me
• Tony Liu
• TrendMicro Staff Engineer
• Big Data platform Administrator
• TSMC Big Data Consultant Project
• Keep improving Big Data platform
• tony_liu@trend.com.tw; ojavajava@gmail.com
3. Agenda
• Questions about YARN
• The ways to find the answers
• YARN resource consumption prediction
• Conclusion
4. Questions about YARN
YARN Fair Scheduler
• What is the proper setting for a container?
• What are the characteristics of the jobs run in the cluster?
• How do we properly allocate resources to queues?
• Why does the cluster have free resources but still have pending jobs?
5. The ways to find the answers
• Container Setting: appropriate configurations for the container
• Job Characteristics: CPU bound / I/O bound
• Proper Allocation of Resources to Queues: queue resource consumption in the cluster
• Resource Prediction: predict and allocate resources
6. My Thinking
Container Setting; Job CPU / I/O bound
• Correct container setting
• What are the primary constraints?
• Number of containers in the cluster
• Memory calculation
Queue Status
• Queue status in the cluster
• Allocate resources by job SLA
• Pending jobs and unused resources in a queue
• Bottleneck resource
Prediction
• Classify job type: CPU bound or I/O bound
• Predict resource consumption
• Allocate unused resources to queues according to job type
7. Appropriate configurations for Container
8. Appropriate configurations for Container
Container
• Total available resources
  - Available vmem: total memory - reserved memory
  - Available vcores: total CPU cores - reserved CPU cores
• Number of YARN containers (concurrent processing)
  - min(vcores, 2 * disks)
• RAM per container
  - max(2 GB, total available memory / number of containers)
* reserved: for the system and HBase
[Diagram: YARN components (Container, NodeManager, Scheduler, Map, Reduce, AM)]
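The sizing rules on this slide can be sketched as a small calculation. This is a minimal sketch under stated assumptions: the function name `container_sizing`, its parameter names, and the example node (128 GB RAM, 32 cores, 12 disks, with 16 GB and 4 cores reserved) are hypothetical and not from the talk.

```python
def container_sizing(total_mem_mb, reserved_mem_mb, total_cores,
                     reserved_cores, disks, min_container_mb=2048):
    """Apply the slide's rules to one worker node (all names are illustrative)."""
    # Resources left after reserving some for the system and HBase.
    available_mem_mb = total_mem_mb - reserved_mem_mb
    available_vcores = total_cores - reserved_cores
    # Number of containers: min(vcores, 2 * disks).
    containers = min(available_vcores, 2 * disks)
    # RAM per container: max(2 GB, available memory / containers).
    ram_per_container_mb = max(min_container_mb, available_mem_mb // containers)
    return {
        "available_mem_mb": available_mem_mb,
        "available_vcores": available_vcores,
        "containers": containers,
        "ram_per_container_mb": ram_per_container_mb,
    }


# Hypothetical node: 128 GB RAM, 32 cores, 12 data disks.
sizing = container_sizing(131072, 16384, 32, 4, 12)
print(sizing)
```

For this hypothetical node the disk count is the limiting factor (24 containers), so adding cores alone would not raise concurrency.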
9. Appropriate configurations for Container
YARN NodeManager Resource
• yarn.nodemanager.resource.memory-mb
  = containers * RAM per container
  = total available vmem
• yarn.nodemanager.resource.cpu-vcores
  = total cores - reserved cores
  = total available vcores
10. Appropriate configurations for Container
YARN Scheduler
• yarn.scheduler.minimum-allocation-mb
  = RAM per container
• yarn.scheduler.maximum-allocation-mb
  = containers * RAM per container
• yarn.scheduler.minimum-allocation-vcores
  = 1
• yarn.scheduler.maximum-allocation-vcores
  = total available cores
11. Appropriate configurations for Container
Map
• mapreduce.map.memory.mb
  = RAM per container
• mapreduce.map.java.opts
  = 0.8 * RAM per container
• mapreduce.map.cpu.vcores
  = 1
• mapreduce.map.disk
  = 0.5
13. Appropriate configurations for Container
AM
• yarn.app.mapreduce.am.resource.mb
  = 2 * RAM per container
• yarn.app.mapreduce.am.command-opts
  = 0.8 * (2 * RAM per container)
• yarn.app.mapreduce.am.resource.cpu-vcores
  = 1
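The NodeManager, scheduler, map and AM settings on the preceding slides all follow from three numbers: the container count, the RAM per container and the available vcores. The sketch below derives the memory and vcore properties from them. The function name `derive_settings` is hypothetical, and expressing the heap sizes as `-Xmx...m` strings is an assumption about how the "0.8 *" values would be written in practice.

```python
def derive_settings(containers, ram_per_container_mb, available_vcores):
    """Derive the memory/vcore properties listed on the slides (illustrative)."""
    node_mem_mb = containers * ram_per_container_mb
    am_mem_mb = 2 * ram_per_container_mb
    return {
        # NodeManager resources
        "yarn.nodemanager.resource.memory-mb": node_mem_mb,
        "yarn.nodemanager.resource.cpu-vcores": available_vcores,
        # Scheduler allocation bounds
        "yarn.scheduler.minimum-allocation-mb": ram_per_container_mb,
        "yarn.scheduler.maximum-allocation-mb": node_mem_mb,
        "yarn.scheduler.minimum-allocation-vcores": 1,
        "yarn.scheduler.maximum-allocation-vcores": available_vcores,
        # Map task: JVM heap is 80% of the container (assumed -Xmx form)
        "mapreduce.map.memory.mb": ram_per_container_mb,
        "mapreduce.map.java.opts": f"-Xmx{ram_per_container_mb * 8 // 10}m",
        "mapreduce.map.cpu.vcores": 1,
        # Application Master: twice the container size, heap at 80% of that
        "yarn.app.mapreduce.am.resource.mb": am_mem_mb,
        "yarn.app.mapreduce.am.command-opts": f"-Xmx{am_mem_mb * 8 // 10}m",
        "yarn.app.mapreduce.am.resource.cpu-vcores": 1,
    }


settings = derive_settings(containers=8, ram_per_container_mb=4096,
                           available_vcores=12)
for name, value in settings.items():
    print(f"{name} = {value}")
```

Keeping the heap at 80% of the container leaves headroom for non-heap JVM memory, so the container is less likely to be killed for exceeding its limit.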
14. Container Size - Memory Calculation
r = requested memory
The logic works as follows (requested 768 MB, minimum allocation 512 MB, step factor 512 MB, maximum allocation 1024 MB):
a. Take the max of the requested resource and the minimum resource: max(768, 512) = 768
b. Round up to a multiple of the step factor: roundUp(768, 512) = 1024
   roundUp computes ((768 + (512 - 1)) / 512) * 512 with integer division: 1279 / 512 = 2, and 2 * 512 = 1024
c. Cap the result at the maximum resource: min(roundUp(768, stepFactor), maximumResource) = min(1024, 1024) = 1024
So the allotted memory is 1024 MB, which is what you get.
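The three steps (take the max with the minimum, round up to the step factor, cap at the maximum) can be checked with a few lines of code. This is a minimal sketch; the function names `round_up` and `normalize_memory` are hypothetical, and the integer division mirrors the roundUp expression on the slide, which yields 1024 for a 768 MB request with a 512 MB step.

```python
def round_up(value, step):
    """Round value up to the next multiple of step using integer division."""
    return ((value + (step - 1)) // step) * step


def normalize_memory(requested_mb, minimum_mb, step_mb, maximum_mb):
    """Return the memory actually allotted for a request (illustrative)."""
    at_least_minimum = max(requested_mb, minimum_mb)   # step a
    rounded = round_up(at_least_minimum, step_mb)      # step b
    return min(rounded, maximum_mb)                    # step c


# The slide's example: 768 MB requested, 512 MB minimum and step, 1024 MB cap.
allotted = normalize_memory(768, 512, 512, 1024)
print(allotted)
```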
16. How Many Containers Launch
• Map splits (HDFS block size) determine the number of map tasks; each map task runs in its own map container
• Data locality (node-local, rack-local, any other NodeManager)
• The Application Master re-attempts failed tasks
  • A task that fails 4 times is marked as failed
  • The AM requests resources from the ResourceManager
• If the AM stops sending heartbeats, the RM re-attempts the AM
  • After 2 failed attempts the whole application fails
• Reducer tasks: set by the mapreduce.job.reduces parameter
• Reducers can be given resources before all the map tasks complete:
  mapreduce.job.reduce.slowstart.completedmaps
  • This can waste resources on processes that are waiting for work
  • It can create a deadlock when resources are constrained in a shared environment
[Diagram: an input file split into map tasks, each in a map container, plus a reducer container and an Application Master container]
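As a rough illustration of how the container count follows from the input size, the sketch below estimates map tasks from file sizes and adds the reducers and the Application Master. It is an approximation under stated assumptions: one split per HDFS block, a 128 MB block size, splittable input, and no retries or speculative execution. The function names are hypothetical.

```python
import math

BLOCK_SIZE_BYTES = 128 * 1024 * 1024  # assumed HDFS block size


def estimate_map_tasks(file_sizes_bytes, split_size_bytes=BLOCK_SIZE_BYTES):
    """Approximate the number of map tasks as one per block-sized split."""
    return sum(math.ceil(size / split_size_bytes)
               for size in file_sizes_bytes if size > 0)


def containers_launched(map_tasks, reducers):
    """Containers over the life of the job: maps + reducers + one AM."""
    return map_tasks + reducers + 1


# Hypothetical input: one 1 GiB file and one 200 MiB file, 2 reducers.
maps = estimate_map_tasks([1024 * 1024 * 1024, 200 * 1024 * 1024])
total = containers_launched(maps, reducers=2)
print(maps, total)
```

These are containers requested over the whole job; how many run at once is still bounded by the queue and cluster resources.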
17. Observe the configuration
• Observe which configuration is best for you through TeraGen and TeraSort
• hadoop jar $HADOOP_PATH/hadoop-examples.jar teragen
    -Dmapreduce.job.maps=$i
    -Dmapreduce.map.memory.mb=$k
    -Dmapreduce.map.java.opts.max.heap=$MAP_MB
• hadoop jar $HADOOP_PATH/hadoop-examples.jar terasort
    -Dmapreduce.job.maps=$i
    -Dmapreduce.job.reduces=$j
    -Dmapreduce.map.memory.mb=$k
    -Dmapreduce.map.java.opts.max.heap=$MAP_MB
    -Dmapreduce.reduce.memory.mb=$k
    -Dmapreduce.reduce.java.opts.max.heap=$RED_MB
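The variables $i, $j and $k suggest sweeping over several map counts, reducer counts and container sizes. A minimal sketch of generating such a sweep follows. It only builds the command lines and does not run them. The function name `terasort_commands`, the input and output directory arguments, and the choice of heap as 80% of the container size are assumptions for illustration.

```python
from itertools import product


def terasort_commands(jar, maps, reduces, memory_mb, input_dir, output_dir):
    """Build one terasort command per (maps, reduces, memory) combination."""
    commands = []
    for m, r, mem in product(maps, reduces, memory_mb):
        heap_mb = mem * 8 // 10  # assumed: heap is 80% of the container
        commands.append([
            "hadoop", "jar", jar, "terasort",
            f"-Dmapreduce.job.maps={m}",
            f"-Dmapreduce.job.reduces={r}",
            f"-Dmapreduce.map.memory.mb={mem}",
            f"-Dmapreduce.map.java.opts.max.heap={heap_mb}",
            f"-Dmapreduce.reduce.memory.mb={mem}",
            f"-Dmapreduce.reduce.java.opts.max.heap={heap_mb}",
            input_dir,
            # Separate output directory per run so results do not collide.
            f"{output_dir}/m{m}_r{r}_mem{mem}",
        ])
    return commands


# Hypothetical sweep: two map counts, one reducer count, two container sizes.
cmds = terasort_commands("hadoop-examples.jar", [10, 20], [5], [2048, 4096],
                         "/tmp/teragen-in", "/tmp/terasort-out")
for cmd in cmds:
    print(" ".join(cmd))
```

Each command could then be timed (for example with `subprocess.run`) to compare configurations.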
18. Container Resource Requirement Testing
19. Job Characteristics
• A container is the basic unit of processing capacity in YARN, and is an encapsulation of resource elements (memory, CPU, etc.).
• Different jobs place different workloads on the cluster, including CPU-bound and I/O-bound workloads.
• So, what are the characteristics of the jobs running in the cluster?
20. Job Characteristics
• Reference: Tian et al., 2009, which investigates the characteristics of MapReduce jobs in a practical data center
• Define a classification model that classifies MapReduce jobs as CPU-bound or I/O-bound
21. Job Characteristics
• The Map-Shuffle phase performs five actions:
  1) init input data
  2) compute map task
  3) store output result to local disk
  4) shuffle map task result data out
  5) shuffle reduce input data in
22. Job Characteristics
• Workloads in the Map-Shuffle phase of MapReduce are classified according to their utilization of I/O and CPU
• MID: map input data
• MOD: map output data
• SOD: shuffle out data (= MOD)
• SID: shuffle in data
• MTCT: map task completed time
• DIOR: disk I/O rate (DFSIO I/O rate)
• n: number of YARN containers (concurrent processing)
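The decision rule itself is not given in this section, so the sketch below is only one plausible reading of the quantities defined above. It assumes that a job counts as I/O bound when the aggregate data rate of n concurrent map tasks reaches the measured disk I/O rate, and as CPU bound otherwise. The function name `classify_job` and the units (MB, seconds, MB/s) are assumptions.

```python
def classify_job(mid_mb, mod_mb, sid_mb, mtct_s, dior_mb_per_s, n):
    """Classify a job as 'IO-bound' or 'CPU-bound' (assumed rule, see lead-in)."""
    sod_mb = mod_mb  # SOD equals MOD per the definitions above
    # Data moved by one map task during the Map-Shuffle phase.
    data_per_task_mb = mid_mb + mod_mb + sod_mb + sid_mb
    # Aggregate I/O demand of n tasks running concurrently.
    demand_mb_per_s = n * data_per_task_mb / mtct_s
    return "IO-bound" if demand_mb_per_s >= dior_mb_per_s else "CPU-bound"


# Hypothetical jobs on a node with a measured disk rate of 100 MB/s.
heavy_io = classify_job(64, 64, 64, mtct_s=10, dior_mb_per_s=100, n=4)
heavy_cpu = classify_job(64, 8, 8, mtct_s=60, dior_mb_per_s=100, n=4)
print(heavy_io, heavy_cpu)
```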
27. Queue Type
I/O Bound: domain_census, myrep, pathcensus
CPU Bound: alps, census, census-oozie, data_importer, domain_census-oozie, domain_census_ews, hdfs, magicQ, myspn, platinum, platinum-oozie, retroscan, retrosplunk, rnu, spnungle, threatconnect, threathub, threathub-oozie, user
28. Thinking
• Besides allocating resources based on the job's SLA, what other factors should I consider?
  - Job characteristics?
  - Queue type?
29. Queue Resource Consumption
30. Cluster Resource Allocation
• YARN Fair Scheduler
  - yarn.scheduler.fair.allocation.file
    fair-scheduler.xml
• The allocation file is reloaded every 10 seconds, allowing changes to be made on the fly.
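Because the allocation file is re-read periodically, queue limits can be changed by regenerating fair-scheduler.xml. The sketch below builds a minimal allocation document with per-queue minResources and maxResources. The function name `build_allocations`, the dictionary layout and the resource numbers are hypothetical. The queue names are taken from the Queue Type slide.

```python
import xml.etree.ElementTree as ET


def build_allocations(queues):
    """Return fair-scheduler allocation XML for the given queue limits."""
    root = ET.Element("allocations")
    for name, limits in queues.items():
        queue = ET.SubElement(root, "queue", name=name)
        # Resource values use the "X mb, Y vcores" form.
        ET.SubElement(queue, "minResources").text = (
            f"{limits['min_mb']} mb, {limits['min_vcores']} vcores")
        ET.SubElement(queue, "maxResources").text = (
            f"{limits['max_mb']} mb, {limits['max_vcores']} vcores")
    return ET.tostring(root, encoding="unicode")


# Hypothetical limits for two queues.
allocation_xml = build_allocations({
    "myrep": {"min_mb": 20480, "min_vcores": 10,
              "max_mb": 81920, "max_vcores": 40},
    "census": {"min_mb": 10240, "min_vcores": 8,
               "max_mb": 40960, "max_vcores": 32},
})
print(allocation_xml)
```

Writing the result to the path configured in yarn.scheduler.fair.allocation.file would let the scheduler pick it up on its next reload.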
43. Thinking
• Why can't the cluster's resources be fully utilized?
• Is there a resource limitation (bottleneck)?
• How can we reduce pending jobs when the cluster still has free resources?
44. Thinking
• Is it possible to predict when the cluster will have pending jobs?
• Can I predict the resource consumption at a specific time and allocate resources dynamically to fully utilize the cluster?
45. Predict Resource Consumption And Allocate Resource
47. Training Data
Fields                   Description                           Process
date                     date                                  ignore
time                     hour: 0 ~ 23                          feature
working day              0: working day, 1: non-working day    feature
weekday                  week day                              feature
cluster_appsPending      Pending apps in the cluster           feature
cluster_appsRunning      Running apps in the cluster           feature
cluster_availableMB      Available vmem in the cluster         feature
cluster_allocatedMB      Allocated vmem in the cluster         feature
cluster_availableVcore   Available vcore in the cluster        feature
cluster_allocatedVcore   Allocated vcore in the cluster        feature
• Data source: Job history log
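The three time-related fields (time, working day, weekday) can be derived from a record's timestamp. This is a minimal sketch: it assumes Monday to Friday are working days and ignores public holidays, and it uses Python's weekday numbering (0 = Monday), which may differ from the encoding used in the talk. The function name `time_features` is hypothetical.

```python
from datetime import datetime


def time_features(ts):
    """Derive the time, working day and weekday fields from a timestamp."""
    weekday = ts.weekday()  # 0 = Monday ... 6 = Sunday (Python convention)
    return {
        "time": ts.hour,                          # hour: 0 ~ 23
        "working day": 0 if weekday < 5 else 1,   # 0: working, 1: non-working
        "weekday": weekday,
    }


# A Saturday afternoon and a Monday morning.
saturday = time_features(datetime(2016, 9, 10, 14, 30))
monday = time_features(datetime(2016, 9, 12, 9, 0))
print(saturday, monday)
```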
48. Training Data
Fields                   Description                Process
queue_name               Queue name                 feature
minResources_memory      Min vmem for queue         feature
minResources_vcores      Min vcore for queue        feature
maxResources_memory      Max vmem for queue         feature
maxResources_vcores      Max vcore for queue        feature
numPendingApps           Pending apps in queue      feature
numActiveApps            Running apps in queue      feature
usedResources.memory     Used vmem in queue         feature
usedResources.vcore      Used vcore in queue        feature
label                    label (predict target)     label
50. Training Model
• Training model: RandomForest
• Predict: vcore
• Data source: Job history log
• Data set: 109,736 instances
• Test mode: split 66% train, remainder test
• Attributes: 19
=== Summary ===
Correlation coefficient        0.999
Mean absolute error            0.1262
Root mean squared error        0.8494
Relative absolute error        1.5905 %
Root relative squared error    4.5017 %
Total Number of Instances      37,310
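To make the summary figures easier to interpret, the sketch below computes the same kinds of error measures for a list of actual and predicted values. It assumes the relative errors are measured against a baseline that always predicts the mean of the evaluated targets. The tool that produced the summary may define its baseline differently, so this illustrates the definitions and does not reproduce the numbers. The function name `regression_summary` is hypothetical.

```python
import math


def regression_summary(actual, predicted):
    """Compute MAE, RMSE and the relative errors (in percent)."""
    n = len(actual)
    mean_actual = sum(actual) / n
    abs_err = sum(abs(p - a) for a, p in zip(actual, predicted))
    sq_err = sum((p - a) ** 2 for a, p in zip(actual, predicted))
    # Errors of the baseline that always predicts the mean (assumed baseline).
    base_abs = sum(abs(a - mean_actual) for a in actual)
    base_sq = sum((a - mean_actual) ** 2 for a in actual)
    return {
        "mean_absolute_error": abs_err / n,
        "root_mean_squared_error": math.sqrt(sq_err / n),
        "relative_absolute_error_pct": 100 * abs_err / base_abs,
        "root_relative_squared_error_pct": 100 * math.sqrt(sq_err / base_sq),
    }


# Tiny hypothetical example: one of four predictions is off by 1.
summary = regression_summary([1, 2, 3, 4], [1, 2, 3, 5])
print(summary)
```

A relative error well below 100% means the model does much better than always predicting the average.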
52. Training Model
• Training model: RandomForest
• Predict: vmemory
• Data source: Job history log
• Data set: 109,736 instances
• Test mode: split 66% train, remainder test
• Attributes: 19
=== Summary ===
Correlation coefficient        0.9995
Mean absolute error            0.0003
Root mean squared error        0.0019
Relative absolute error        1.4174 %
Root relative squared error    3.2014 %
Total Number of Instances      37,310
53. Training Model
• Training model: RandomForest
• Predict: pending jobs
• Data source: Job history log
• Data set: 122,120 instances
• Test mode: split 66% train, remainder test
• Attributes: 19
=== Summary ===
Correlation coefficient        0.9917
Mean absolute error            0.0002
Root mean squared error        0.0054
Relative absolute error        7.9308 %
Root relative squared error    14.4934 %
Total Number of Instances      41,521
55. Experiment Result
• Based on the prediction results, we reallocate the resources of the queues that are likely to have pending jobs on a specific weekday.
• Experiment result: pending jobs reduced by 82%
Pending jobs ratio
Before: 0.005
After: 0.0009
56. Experiment Result
• Something you should know:
  - The total of the queues' minResources should be less than the cluster fair share
  - A queue may not get its minResources immediately
  - Preemption kills containers in other queues to satisfy minResources, which also means wasted resources
57. Experiment Result
• Something you should know:
  - Modifying fair-scheduler.xml too frequently may cause the ResourceManager to behave strangely
  - A ResourceManager failover causes jobs submitted by Oozie to be retried
  - Does a cluster with tight resources need resource prediction?
58. Conclusion
• A deep understanding of the architecture is the key to tuning and management.
• Ask whether other tools, even from a different domain, could help with your daily job.
• Machine learning is used for prediction in many domains, and it can give you a different perspective.