Yarn Resource Management Using Machine Learning

YARN Resource Management
Using Machine Learning

TrendMicro
劉一正 Tony Liu

About Me
•  劉一正 Tony Liu
•  TrendMicro Staff Engineer
•  Big Data platform Administrator
•  TSMC Big Data Consultant Project
•  Keep improving Big Data platform
•  tony_liu@trend.com.tw; ojavajava@gmail.com

Agenda
•  Questions About YARN
•  The ways to find the answers
•  YARN resource consumption prediction
•  Conclusion

Questions about YARN
YARN Fair Scheduler
•  What are the proper settings for a container?
•  What are the characteristics of the jobs running in the cluster?
•  How should resources be allocated to queues?
•  Why does the cluster have pending jobs while it still has free resources?

The ways to find the answers
•  Appropriate configurations for Container
•  CPU bound / IO bound
•  Queue resource consumption in the cluster
•  Predict and allocate resources
Container Setting
Job Characteristics
Properly Allocate Resources to Queues
Resource Prediction

My Thinking
Container Setting
Job
CPU / IO bound
•  Correct container setting
•  What are the primary constraints?
•  Number of containers in the cluster
•  Memory calculation

Queue Status
•  Queue status in the cluster
•  Allocate resource by Job SLA
•  Pending Job and Unused
resource in queue
•  Bottleneck resource

Prediction
•  Classify job type: CPU bound or IO bound
•  Predict resource consumption
•  Allocate unused resources to queues according to job type

Appropriate configurations for Container
•  Predict and allocate resources
Container Setting
Job Characteristics
Properly Allocate Resources to Queues
Resource Prediction

Container
•  Total available resource
- Available vmems: total memory – reserved memory
- Available vcores: total cpu – reserved cpu

•  Number of YARN containers (concurrent processing)
min(vcores, 2 * disks)

•  RAM per container
max(2 GB, total available mem / number of containers)

* reserved: for system and HBase
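
The sizing rules above can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not a definitive implementation: the function name, its parameters, and the example node (128 GB RAM, 32 cores, 12 disks, with 16 GB and 4 cores reserved) are hypothetical, and "vcores" in the container-count rule is assumed to mean the vcores left after reservation.

```python
def size_containers(total_mem_mb, reserved_mem_mb, total_cores,
                    reserved_cores, disks, min_container_mb=2048):
    """Apply the sizing rules from the slide to one NodeManager host."""
    available_mem_mb = total_mem_mb - reserved_mem_mb
    available_vcores = total_cores - reserved_cores
    # Number of YARN containers: min(vcores, 2 * disks)
    containers = min(available_vcores, 2 * disks)
    # RAM per container: max(2 GB, available memory / number of containers)
    ram_per_container_mb = max(min_container_mb,
                               available_mem_mb // containers)
    return {
        "available_mem_mb": available_mem_mb,
        "available_vcores": available_vcores,
        "containers": containers,
        "ram_per_container_mb": ram_per_container_mb,
    }


# Hypothetical node: 128 GB RAM, 32 cores, 12 disks.
sizing = size_containers(131072, 16384, 32, 4, 12)
print(sizing)
```

On this hypothetical node the disks are the limiting factor (2 * 12 = 24 containers), so each container gets roughly 4.6 GB.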

[Diagram: YARN, Container, NodeManager, Scheduler, Map, Reduce, AM]
Container
•  yarn.nodemanager.resource.memory-mb
= containers * RAM per container
= total available vmems

•  yarn.nodemanager.resource.cpu-vcores
= total cores – reserved cores
= total available vcores

YARN NodeManager Resource

Container
•  yarn.scheduler.minimum-allocation-mb
= RAM per container

•  yarn.scheduler.maximum-allocation-mb
= containers * RAM per container

•  yarn.scheduler.minimum-allocation-vcores
= 1

•  yarn.scheduler.maximum-allocation-vcores
= total available cores

YARN Scheduler
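
As a rough illustration, the NodeManager and scheduler properties above can be derived from the sizing values in a few lines of Python. The function name and the example inputs (24 containers of 4096 MB, 28 available vcores) are hypothetical; only the property names and formulas come from the slides.

```python
def yarn_site_properties(containers, ram_per_container_mb, available_vcores):
    """Derive yarn-site.xml values using the formulas from the slides."""
    total_mb = containers * ram_per_container_mb
    return {
        "yarn.nodemanager.resource.memory-mb": total_mb,
        "yarn.nodemanager.resource.cpu-vcores": available_vcores,
        "yarn.scheduler.minimum-allocation-mb": ram_per_container_mb,
        "yarn.scheduler.maximum-allocation-mb": total_mb,
        "yarn.scheduler.minimum-allocation-vcores": 1,
        "yarn.scheduler.maximum-allocation-vcores": available_vcores,
    }


# Hypothetical inputs: 24 containers of 4096 MB and 28 available vcores.
props = yarn_site_properties(24, 4096, 28)
for name, value in props.items():
    print(f"{name} = {value}")
```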

Container
•  mapreduce.map.memory.mb
= RAM per container

•  mapreduce.map.java.opts
= 0.8 * RAM per container

•  mapreduce.map.cpu.vcores
= 1

•  mapreduce.map.disk
= 0.5

Map

Container
•  mapreduce.reduce.memory.mb
= 2 * RAM per container

•  mapreduce.reduce.java.opts
= 0.8 * ( 2 * RAM per container)

•  mapreduce.reduce.cpu.vcores
= 1

•  mapreduce.reduce.disk
= 1.33

Reduce

Container
•  yarn.app.mapreduce.am.resource.mb
= 2 * RAM per container

•  yarn.app.mapreduce.am.command-opts
= 0.8 * ( 2 * RAM per container)

•  yarn.app.mapreduce.am.resource.cpu-vcores
= 1

AM
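
The Map, Reduce, and AM memory settings follow the same pattern: the container size is a multiple of the RAM per container, and the JVM heap is 80% of the container. A minimal sketch, assuming the heap is passed as an -Xmx option in megabytes; the function name and the 4096 MB example value are hypothetical, and the disk settings are left out.

```python
def mapreduce_properties(ram_per_container_mb, heap_fraction=0.8):
    """Derive Map, Reduce, and AM settings from the RAM per container."""
    def xmx(container_mb):
        # JVM heap is a fraction of the container so that off-heap
        # memory still fits inside the container limit.
        return f"-Xmx{int(heap_fraction * container_mb)}m"

    map_mb = ram_per_container_mb
    reduce_mb = 2 * ram_per_container_mb
    am_mb = 2 * ram_per_container_mb
    return {
        "mapreduce.map.memory.mb": map_mb,
        "mapreduce.map.java.opts": xmx(map_mb),
        "mapreduce.map.cpu.vcores": 1,
        "mapreduce.reduce.memory.mb": reduce_mb,
        "mapreduce.reduce.java.opts": xmx(reduce_mb),
        "mapreduce.reduce.cpu.vcores": 1,
        "yarn.app.mapreduce.am.resource.mb": am_mb,
        "yarn.app.mapreduce.am.command-opts": xmx(am_mb),
        "yarn.app.mapreduce.am.resource.cpu-vcores": 1,
    }


# Hypothetical RAM per container of 4096 MB.
mr_props = mapreduce_properties(4096)
for name, value in mr_props.items():
    print(f"{name} = {value}")
```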

Container Size – Memory Calculation
r = requested memory

The logic works as follows:

a. Take the max of the requested resource and the minimum resource: max(768, 512) = 768

b. Round up to the step factor: roundUp(768, 512) = 1024

roundUp does (with integer division):
((768 + (512 - 1)) / 512) * 512 = (1279 / 512) * 512 = 2 * 512 = 1024

c. Cap at the maximum resource: min(roundUp(768, stepFactor), maximum resource) = min(1024, 1024) = 1024

So the allotted memory is 1024 MB, which is what you get.

Container Size – Memory Calculation
A map task asks for 1500 MB of memory per map container:

mapreduce.map.memory.mb = 1500
yarn.scheduler.minimum-allocation-mb = 1024

The RM will allocate a 2048 MB container (2 * yarn.scheduler.minimum-allocation-mb).
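
The rounding logic from the two memory-calculation slides can be reproduced in a few lines. This is a sketch, not the scheduler's actual code: the function name is hypothetical, the step factor is assumed to equal the minimum allocation as in both examples, and the 8192 MB maximum in the second call is an assumed value that the slides do not state.

```python
def normalize_memory(requested_mb, minimum_mb, maximum_mb, step_mb=None):
    """Round a memory request the way the slides describe."""
    if step_mb is None:
        step_mb = minimum_mb  # assumption: step factor = minimum allocation
    # a. never go below the minimum allocation
    wanted = max(requested_mb, minimum_mb)
    # b. round up to the next multiple of the step (integer division)
    rounded = ((wanted + step_mb - 1) // step_mb) * step_mb
    # c. never exceed the maximum allocation
    return min(rounded, maximum_mb)


first = normalize_memory(768, 512, 1024)     # first example: 768 MB request
second = normalize_memory(1500, 1024, 8192)  # second example: 1500 MB request
print(first, second)
```

The two calls return 1024 and 2048, matching the values on the slides.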

How Many Containers Launch
•  Map split (HDFS block size)
[Diagram: an input file split into Map Tasks, each running in a Map Container, next to a Reducer Container and an Application Master Container]
•  Data locality (data-local, rack-local, any other NM)
•  Application Master will re-attempt failed tasks
•  A task that fails 4 times is marked as failed

•  The Application Master requests resources from the Resource Manager
•  If the AM stops sending heartbeats, the RM will re-attempt it
•  After 2 failed attempts, the whole application fails

Reducer Task
•  mapreduce.job.reduces parameter
•  Reducers can be given resources before all the map tasks complete (mapreduce.job.reduce.slowstart.completedmaps)
•  Starting reducers too early wastes resources on processes that are waiting for work
•  It can also create a deadlock when resources are constrained in a shared environment

Observe the configuration
•  Observe which configuration is best for you through TeraGen and TeraSort
•  hadoop jar $HADOOP_PATH/hadoop-examples.jar teragen
-Dmapreduce.job.maps=$i
-Dmapreduce.map.memory.mb=$k
-Dmapreduce.map.java.opts.max.heap=$MAP_MB
•  hadoop jar $HADOOP_PATH/hadoop-examples.jar terasort
-Dmapreduce.job.maps=$i
-Dmapreduce.job.reduces=$j
-Dmapreduce.map.memory.mb=$k
-Dmapreduce.map.java.opts.max.heap=$MAP_MB
-Dmapreduce.reduce.memory.mb=$k
-Dmapreduce.reduce.java.opts.max.heap=$RED_MB

Container Resource Requirement Testing
Container Setting
Job Characteristics
Properly Allocate Resources to Queues
Resource Prediction

Job Characteristics
•  A container is the basic unit of processing capacity in YARN, and is an encapsulation of resource elements (memory, CPU, etc.).
•  Different jobs put different workloads on the cluster, including CPU-bound and I/O-bound ones.
•  So, what are the characteristics of the jobs running in the cluster?

Job Characteristics
•  Reference: Tian et al., 2009, who investigated the characteristics of MapReduce jobs in a practical data center
•  Define a classification model to classify whether a MapReduce job is CPU-bound or I/O-bound

Job Characteristics
•  The Map-Shuffle phase performs five actions:
1) init input data
2) compute map task
3) store output result to local disk
4) shuffle map tasks result data out
5) shuffle reduce input data in

Job Characteristics
•  Workloads in the Map-Shuffle phase of MapReduce are classified according to their I/O and CPU utilization
•  MID: map input data
•  MOD: map output data
•  SOD: shuffle out data (= MOD)
•  SID: shuffle in data
•  MTCT: map task completed time
•  DIOR: disk I/O rate (DFSIO I/O rate)
•  n: number of YARN containers (concurrent processing)

Job Characteristics
•  CPU-Bound
•  I/O-Bound
•  DIOR: DFSIO
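
A threshold rule over these quantities could look like the sketch below. The inequality is an assumption based on the form used by Tian et al.: a job is treated as I/O-bound when the aggregate data rate of n concurrent map tasks reaches the disk I/O rate, and as CPU-bound otherwise. The function name, the units, and the example numbers are hypothetical.

```python
def classify_job(mid, mod, sid, mtct, n, dior):
    """Classify a job as CPU-bound or I/O-bound (assumed threshold rule).

    mid, mod, sid: data volumes; mtct: map task completed time;
    n: concurrent containers; dior: disk I/O rate in matching units.
    """
    sod = mod  # shuffle out data equals map output data
    io_rate = n * (mid + mod + sod + sid) / mtct
    return "IO-bound" if io_rate >= dior else "CPU-bound"


# Hypothetical jobs measured against a hypothetical DIOR of 200.
light = classify_job(mid=100, mod=50, sid=50, mtct=10, n=4, dior=200)
heavy = classify_job(mid=1000, mod=500, sid=500, mtct=10, n=4, dior=200)
print(light, heavy)
```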

Job Characteristics
Program                    MID          MOD           MTCT
myspn_top_cve              1395184      620928        15185
myspn_top_url              54481169     52528135      9867
aggregate_url              286007534    1155960828    420225
USandbox Data Statistic    37612436     4921787       45423
file-solr-daily            75167686     4660452644    224488
aggregate_url_dedupe       639896245    561632270     73926
myspn_top_url_by_origin    499348380    506962079     53927
•  Data source: Job history log

Job Characteristics
•  Test data set: 5,942
•  Test mode: split 66% train, remainder test
•  Classifier model: RandomForest
•  Attributes: MID, MOD, MTCT, n, DIOR, label
=== Summary ===
Correlation coefficient 0.9934
Mean absolute error 0.0099
Root mean squared error 0.0513
Relative absolute error 2.4872 %
Root relative squared error 11.4997 %
Total Number of Instances 2020
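
For readers unfamiliar with the summary fields, the sketch below computes the same five metrics in plain Python using their textbook definitions. It assumes the relative errors are measured against the mean of the actual test values, which may differ slightly from how the tool that produced the summary computes them; the function name and sample values are hypothetical.

```python
import math


def regression_summary(actual, predicted):
    """Compute correlation, MAE, RMSE, RAE (%), and RRSE (%)."""
    n = len(actual)
    mean_a = sum(actual) / n
    mean_p = sum(predicted) / n
    abs_err = [abs(p - a) for a, p in zip(actual, predicted)]
    sq_err = [(p - a) ** 2 for a, p in zip(actual, predicted)]
    dev_a = [a - mean_a for a in actual]
    dev_p = [p - mean_p for p in predicted]
    covariance = sum(x * y for x, y in zip(dev_a, dev_p))
    return {
        "correlation": covariance / math.sqrt(
            sum(x * x for x in dev_a) * sum(y * y for y in dev_p)),
        "mae": sum(abs_err) / n,
        "rmse": math.sqrt(sum(sq_err) / n),
        # Errors relative to always predicting the mean of the actuals.
        "rae_percent": 100 * sum(abs_err) / sum(abs(x) for x in dev_a),
        "rrse_percent": 100 * math.sqrt(
            sum(sq_err) / sum(x * x for x in dev_a)),
    }


# Hypothetical labels (0 = CPU bound, 1 = IO bound) and model outputs.
summary = regression_summary([0, 1, 0, 1], [0.1, 0.9, 0.0, 1.0])
print(summary)
```

A 66% training split of 5,942 records leaves about 2,020 records for testing, which matches the instance count in the summary.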

Job Characteristics
[Chart: number of jobs per queue, split into IO Bound and CPU Bound; x-axis: Queue Name, y-axis: Number of jobs (0 to 1200)]

Queue Type
I/O Bound
domain_census
myrep
pathcensus

CPU Bound
alps
census
census-oozie
data_importer
domain_census-oozie
domain_census_ews
hdfs
magicQ
myspn
platinum
platinum-oozie
retroscan
retrosplunk
rnu
spnungle
threatconnect
threathub
threathub-oozie
user

Thinking
•  Besides allocating resources based on each job’s SLA, what other factors should I consider?
- Job characteristics?
- Queue type?

Queue Resource Consumption
•  Appropriate configurations for Container
Container Setting
Job Characteristics
Properly Allocate Resources to Queues
Resource Prediction

Cluster Resource Allocation
•  YARN fair scheduler
- yarn.scheduler.fair.allocation.file
fair-scheduler.xml
•  The allocation file is reloaded every 10 seconds,
allowing changes to be made on the fly.

•  Fair Scheduler
- default queue: root
- Hierarchical queues
- placement policy
- preemption
- resource reserved
•  Cluster resource
- FairShare
memory: x, vcores: y

•  Queue Properties
- minResources (soft limit)
- maxResources (hard limit)
- weight
weight1.0/weight
- maxRunningApps
- schedulingPolicy: fifo, fair, drf
[Diagram: YARN queues: Research, Production, Service, Marketing, Report, adhoc]
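
Because the allocation file is reloaded on the fly, it is convenient to generate it from data. The sketch below builds a fair-scheduler.xml document with the queue properties listed above using Python's standard library. The function name, the dictionary layout, and all resource values are hypothetical; only the element names and the "X mb, Y vcores" format follow the Fair Scheduler allocation file.

```python
import xml.etree.ElementTree as ET


def build_allocation_file(queues):
    """Render queue definitions as fair-scheduler.xml content."""
    root = ET.Element("allocations")
    for q in queues:
        queue = ET.SubElement(root, "queue", {"name": q["name"]})
        # Resources are (memory in MB, vcores) tuples.
        ET.SubElement(queue, "minResources").text = "%d mb, %d vcores" % q["min"]
        ET.SubElement(queue, "maxResources").text = "%d mb, %d vcores" % q["max"]
        ET.SubElement(queue, "weight").text = str(q["weight"])
        ET.SubElement(queue, "maxRunningApps").text = str(q["max_running_apps"])
        ET.SubElement(queue, "schedulingPolicy").text = q["policy"]
    return ET.tostring(root, encoding="unicode")


# Hypothetical limits for two of the queues shown on the slide.
xml_text = build_allocation_file([
    {"name": "Production", "min": (40960, 20), "max": (81920, 40),
     "weight": 2.0, "max_running_apps": 50, "policy": "drf"},
    {"name": "adhoc", "min": (8192, 4), "max": (16384, 8),
     "weight": 1.0, "max_running_apps": 10, "policy": "fair"},
])
print(xml_text)
```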

Analyze Cluster Status
•  Retrieve YARN metrics from YARN REST APIs
•  FileSystemCounter
•  JobCounters
•  Task Counters
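
A minimal sketch of pulling cluster-level numbers from the ResourceManager REST API is shown below. It assumes the cluster metrics endpoint at /ws/v1/cluster/metrics and the field names it returns in Hadoop 2.x; the ResourceManager URL passed to the fetch function and the sample payload values are hypothetical.

```python
import json
from urllib.request import urlopen

# Fields of interest from the "clusterMetrics" object.
FIELDS = ("appsPending", "appsRunning", "availableMB", "allocatedMB",
          "availableVirtualCores", "allocatedVirtualCores")


def parse_cluster_metrics(payload):
    """Extract the fields of interest from a metrics JSON document."""
    metrics = json.loads(payload)["clusterMetrics"]
    return {name: metrics[name] for name in FIELDS}


def fetch_cluster_metrics(rm_url):
    """Fetch and parse metrics, e.g. rm_url = 'http://rm-host:8088'."""
    with urlopen(rm_url + "/ws/v1/cluster/metrics") as response:
        return parse_cluster_metrics(response.read().decode("utf-8"))


# Hypothetical payload, used here instead of a live ResourceManager.
sample = json.dumps({"clusterMetrics": {
    "appsPending": 3, "appsRunning": 12,
    "availableMB": 58500, "allocatedMB": 41500,
    "availableVirtualCores": 1, "allocatedVirtualCores": 199,
    "totalMB": 100000, "totalVirtualCores": 200}})
snapshot = parse_cluster_metrics(sample)
print(snapshot)
```

Polling this on a fixed interval and appending each snapshot to a file gives the time series plotted on the following slides.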

Pending apps and Available Vcore
[Chart: appsPending and availableVCores over time, 03:20 to 10:50 in 15-minute steps; y-axis: Vcore, 0% to 100%]

Vcores Utilization
[Chart: total_vCores and used_vCores over time, 03:20 to 10:50 in 15-minute steps; y-axis: Vcore, 0% to 100%]

Vmemory Utilization
[Chart: used_memory and total_memory over time, 03:20 to 10:50 in 15-minute steps; y-axis: Vmemory, 0% to 100%]

Cluster Resource Utilization
Queue

Bottleneck Resource
•  Vcores become the bottleneck resource
Memory Usage: 41.5%
VCores Usage: 99.5%
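
The bottleneck is simply the resource with the highest utilization. A small sketch, with hypothetical used and total amounts chosen only so that the percentages match the ones above:

```python
def utilization_percent(used, total):
    """Return the share of a resource in use, as a percentage."""
    return 100.0 * used / total


def bottleneck(usage_percent):
    """Return the name of the most heavily used resource."""
    return max(usage_percent, key=usage_percent.get)


# Hypothetical totals that reproduce 41.5% memory and 99.5% vcore usage.
usage = {
    "memory": utilization_percent(415, 1000),
    "vcores": utilization_percent(199, 200),
}
print(usage, bottleneck(usage))
```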

Over Fair Share
•  Cluster still has resources

Thinking
•  Why can’t the cluster’s resources be fully utilized?
•  Is there any resource limitation (bottleneck)?
•  How can pending jobs be reduced when the cluster still has resources?

Thinking
•  Is it possible to predict when the cluster will have pending jobs?
•  Can I predict the resource consumption at a specific time and dynamically allocate resources to fully utilize the cluster?

Predict Resource Consumption and Allocate Resource
•  Appropriate configurations for Container
•  Queue resource consumption in the cluster
Container Size
Job Characteristics
Properly Allocate Resources to Queues
Resource Prediction

YARN resource consumption prediction
[Pipeline diagram: Collect Metrics; Data Processing; Training Model (Pre-processing, Training Model, Evaluate RMSE); Model Prediction; Predict Queue Consumption]

Training Data
Fields                    Description                           Process
date                      date                                  Ignore
time                      hour: 0 ~ 23                          feature
working day               0: working day, 1: non-working day    feature
weekday                   week day                              feature
cluster_appsPending       Pending apps in the cluster           feature
cluster_appsRunning       Running apps in the cluster           feature
cluster_availableMB       Available vmem in the cluster         feature
cluster_allocatedMB       Allocated vmem in the cluster         feature
cluster_availableVcore    Available vcore in the cluster        feature
cluster_allocatedVcore    Allocated vcore in the cluster        feature

Training Data
Fields                  Description               Process
queue_name              Queue name                feature
minResources_memory     Min vmem for queue        feature
minResources_vcores     Min vcore for queue       feature
maxResources_memory     Max vmem for queue        feature
maxResources_vcores     Max vcore for queue       feature
numPendingApps          Pending apps in queue     feature
numActiveApps           Running apps in queue     feature
usedResources.memory    Used vmem in queue        feature
usedResources.vcore     Used vcore in queue       feature
label                   label (predict target)    label
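
One way to assemble a training row from a timestamp plus cluster and queue snapshots is sketched below. The helper names are hypothetical, and two encodings are assumptions: weekdays are numbered from Monday = 0, and only weekends (plus an optional holiday list) count as non-working days.

```python
from datetime import datetime


def time_features(ts, holidays=()):
    """Derive the time, working day, and weekday fields from a timestamp."""
    non_working = ts.weekday() >= 5 or ts.date() in holidays
    return {
        "time": ts.hour,                        # hour: 0 ~ 23
        "working_day": 1 if non_working else 0,  # 0: working, 1: non-working
        "weekday": ts.weekday(),                # assumption: Monday = 0
    }


def build_row(ts, cluster, queue, label):
    """Merge time, cluster, and queue fields into one training row."""
    row = time_features(ts)
    row.update({"cluster_" + key: value for key, value in cluster.items()})
    row.update(queue)
    row["label"] = label
    return row


# Hypothetical snapshot taken on a Saturday morning.
row = build_row(
    datetime(2016, 3, 19, 9, 0),
    {"appsPending": 3, "appsRunning": 12},
    {"queue_name": "myspn", "numPendingApps": 1, "numActiveApps": 4},
    label=8,
)
print(row)
```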

Training Model
•  Training Model: Linear Regression
•  Predict: vcore

Training Model
•  Training model: RandomForest
•  Predict: vcore
•  Attributes: 19
=== Summary ===
Total Number of Instances 37,310

Training Model
•  Training Model: Linear Regression
•  Predict: vmemory

Training Model
•  Predict: vmemory
=== Summary ===

Training Model
•  Predict: Pending job
=== Summary ===

Attribute Evaluation
•  Predict: Pending jobs
•  Attribute Evaluator: Information Gain
•  Ranked attributes:

Attribute              Score
maxResource_memory     1.14465
maxResource_vcore      1.04186
usedResource_memory    0.53004
usedResource_vcore     0.51167
minResource_memory     0.47563
numActiveApps          0.34418
minResource_vcore      0.3179
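
For reference, information gain is the reduction in label entropy obtained by splitting on an attribute. The sketch below implements the textbook definition for already-discretized attribute values; how the tool behind the scores above discretizes numeric attributes is not shown in the slides, and the example data is hypothetical.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((count / total) * math.log2(count / total)
                for count in Counter(labels).values())


def information_gain(values, labels):
    """Entropy of the labels minus their entropy after splitting on values."""
    total = len(labels)
    groups = {}
    for value, label in zip(values, labels):
        groups.setdefault(value, []).append(label)
    remainder = sum(len(group) / total * entropy(group)
                    for group in groups.values())
    return entropy(labels) - remainder


# Hypothetical data: does the queue have pending jobs (1) or not (0)?
pending = [0, 0, 1, 1]
informative = information_gain(["low", "low", "high", "high"], pending)
uninformative = information_gain(["low", "high", "low", "high"], pending)
print(informative, uninformative)
```

An attribute that separates the classes perfectly scores the full label entropy, while one that is independent of the label scores zero.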

Experiment Result
•  According to the prediction result, we reallocated the resources of the queues that were likely to have pending jobs on specific weekdays.
•  Experiment result: pending jobs were reduced by 82%

Pending jobs ratio
Before    0.005
After     0.0009

Experiment Result
•  Something you should know:
- The total of the queues’ minResources should be less than the cluster fair share
- A queue may not get its minResources immediately
- Preemption kills containers from other queues to satisfy minResources, which also means wasted resources
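
The first point can be checked automatically before a new allocation is applied. A minimal sketch, with hypothetical function and field names and made-up numbers; it treats a total exactly equal to the fair share as acceptable, which is an assumption.

```python
def min_resources_fit(queues, fair_share):
    """Check that the queues' combined minResources fit the fair share."""
    total_mb = sum(q["min_mb"] for q in queues)
    total_vcores = sum(q["min_vcores"] for q in queues)
    return (total_mb <= fair_share["mb"]
            and total_vcores <= fair_share["vcores"])


# Hypothetical queue minimums and cluster fair share.
queues = [
    {"name": "Production", "min_mb": 40960, "min_vcores": 20},
    {"name": "adhoc", "min_mb": 8192, "min_vcores": 4},
]
fits = min_resources_fit(queues, {"mb": 100000, "vcores": 200})
too_small = min_resources_fit(queues, {"mb": 100000, "vcores": 16})
print(fits, too_small)
```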

Experiment Result
•  Something you should know:
- Modifying fair-scheduler.xml too frequently may cause the ResourceManager to behave erratically
- A ResourceManager failover will cause jobs submitted by Oozie to be retried
- Does a cluster with tight resources need resource prediction?

Conclusion
•  A deep understanding of the architecture is the key to tuning and management.
•  Consider whether other tools, even from a different domain, could help with your daily job.
•  Machine learning has been used for prediction in many domains; it can definitely give you a different perspective.
