The new YARN framework promises to make Hadoop a general-purpose platform for Big Data and enterprise data hub applications. In this talk, you'll learn about writing and taking advantage of applications built on YARN.
3. The OS analogy
Traditional Operating System
Storage: File System
Execution/Scheduling: Processes/Kernel Scheduler
4. The OS analogy
Hadoop
Storage: Hadoop Distributed File System (HDFS)
Execution/Scheduling: YARN!
5. Goal: Multitenancy
• Different types of applications on the same cluster
• Different users and organizations on the same cluster
6. ResourceManager (RM)
• Central service that tracks
  o Nodes
    § Resources
  o Applications
  o Containers
• Houses scheduler, which is in charge of all container placement decisions
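To make the bookkeeping above concrete, here is a minimal Python sketch of a central service that tracks nodes (with their resources), applications, and containers, and makes the placement decision itself. It is a toy model, not YARN's API: `ToyResourceManager`, `Node`, and the first-fit placement rule are all assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    memory_mb: int            # total memory the node offers
    vcores: int               # total cores the node offers
    used_memory_mb: int = 0
    used_vcores: int = 0

    def fits(self, memory_mb, vcores):
        """True if the node still has room for a container of this size."""
        return (self.memory_mb - self.used_memory_mb >= memory_mb
                and self.vcores - self.used_vcores >= vcores)


@dataclass
class ToyResourceManager:
    nodes: dict = field(default_factory=dict)         # node name -> Node
    applications: set = field(default_factory=set)    # known application ids
    containers: dict = field(default_factory=dict)    # container id -> (app id, node name)

    def register_node(self, node):
        self.nodes[node.name] = node

    def submit_application(self, app_id):
        self.applications.add(app_id)

    def place_container(self, app_id, memory_mb, vcores):
        """Toy placement (assumption): first node with enough free resources, else None."""
        if app_id not in self.applications:
            raise KeyError(app_id)
        for node in self.nodes.values():
            if node.fits(memory_mb, vcores):
                node.used_memory_mb += memory_mb
                node.used_vcores += vcores
                container_id = f"container_{len(self.containers) + 1}"
                self.containers[container_id] = (app_id, node.name)
                return container_id
        return None
```

With a single 2048 MB, 2-core node registered, two 1024 MB / 1-core containers fit and a third request returns `None`, because the placement decision is made centrally against the tracked node resources.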
7. NodeManager (NM)
• One on every node
• Launches container processes
• Enforces resource allocations
• Monitors liveness
8. Application Master (AM)
• User/application code
• Every application instance has one
• Runs inside a container on the cluster
• Requests resources from ResourceManager
10. Processing Frameworks / YARN apps
• MapReduce
  o Batch processing, fault tolerant
• Impala
  o Low latency SQL on Hadoop
• Spark
  o Load data into memory, great for iterative algorithms
• Storm
  o Stream processing
11. YARN app models
• Application master (AM) per job
  o Simplest for batch
  o Used by MapReduce
12. YARN app models
• Application master per session
  o Runs multiple jobs on behalf of the same user
  o Recently added in Tez
  o Spark interactive mode
13. YARN app models
• Singleton AM as permanent service
  o Always on, waits around for jobs to come in
  o Used for Impala
14. YARN/MR Scheduling
ResourceManager
Fair Scheduler
Decide which jobs to give resources to
MapReduce Application Master
Decide which tasks to give resources to within a job
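The two levels of decision making on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `pick_job` stands in for the cluster-level scheduler with a simplified "fewest containers first" rule (the real Fair Scheduler works with weighted shares of resources), and `pick_task` stands in for the MapReduce Application Master with a made-up "maps before reduces" rule. All function names are hypothetical.

```python
def pick_job(allocated, pending):
    """Cluster level: among jobs with pending tasks, pick the one holding the fewest containers."""
    candidates = [job for job, tasks in pending.items() if tasks]
    if not candidates:
        return None
    # Ties are broken by job name so the result is deterministic.
    return min(candidates, key=lambda job: (allocated.get(job, 0), job))


def pick_task(tasks):
    """Job level (toy policy, an assumption): map tasks first, then by task index."""
    order = {"map": 0, "reduce": 1}
    return min(tasks, key=lambda task: (order[task[0]], task[1]))


def schedule(free_containers, pending):
    """Hand out free containers one at a time: first choose a job, then a task within it."""
    allocated = {}
    assignments = []
    for _ in range(free_containers):
        job = pick_job(allocated, pending)
        if job is None:
            break
        task = pick_task(pending[job])
        pending[job].remove(task)
        allocated[job] = allocated.get(job, 0) + 1
        assignments.append((job, task))
    return assignments
```

The point of the split is that the cluster-level function never looks inside a job, and the job-level function never sees other jobs.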
16. Scheduling on Hadoop
I want 2 containers with 1024 MB and 1 core each
[Diagram: Application Master 1, Application Master 2, ResourceManager, Node 1, Node 2, Node 3]
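The request on this slide can be written down as a small data structure. The sketch below is an assumed shape for illustration, not YARN's actual request type; `ContainerRequest` and its field names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ContainerRequest:
    num_containers: int   # how many containers are wanted
    memory_mb: int        # memory per container
    vcores: int           # cores per container

    def total_memory_mb(self):
        return self.num_containers * self.memory_mb

    def total_vcores(self):
        return self.num_containers * self.vcores


# The request from the slide: 2 containers with 1024 MB and 1 core each.
request = ContainerRequest(num_containers=2, memory_mb=1024, vcores=1)
```

Expressing the per-container size separately from the count lets a scheduler place each container on a different node if no single node has room for the total.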
21. Scheduling on Hadoop
Here’s a security token to let you launch a container on Node 1
[Diagram: Application Master 1, Application Master 2, ResourceManager, Node 1, Node 2, Node 3]
22. Scheduling on Hadoop
Hey, launch my container with this shell command
[Diagram: Application Master 1, Application Master 2, ResourceManager, Node 1, Node 2, Node 3]
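The token-then-launch exchange from the last two slides can be illustrated with a small Python sketch. It assumes the token is a signature over the container and node names, made with a secret known to the ResourceManager and the NodeManagers; `SECRET`, `issue_token`, and `ToyNodeManager` are hypothetical names, and the sketch only records the shell command instead of running it.

```python
import hashlib
import hmac

SECRET = b"shared-rm-nm-secret"  # hypothetical key known to the RM and the NodeManagers


def issue_token(container_id, node):
    """RM side: sign the (container, node) pair so a NodeManager can verify the grant."""
    message = f"{container_id}@{node}".encode()
    return hmac.new(SECRET, message, hashlib.sha256).hexdigest()


class ToyNodeManager:
    def __init__(self, name):
        self.name = name
        self.launched = {}  # container id -> shell command

    def start_container(self, container_id, token, command):
        """NM side: accept the launch only if the token matches this container on this node."""
        expected = issue_token(container_id, self.name)
        if not hmac.compare_digest(expected, token):
            raise PermissionError("invalid container token")
        # A real NodeManager would spawn the process here; the toy only records the command.
        self.launched[container_id] = command
        return True
```

A token issued for Node 1 is accepted by the node manager named for that node and rejected by any other, which is what lets the Application Master talk to the node directly without the node having to trust it.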