Combine SAS High-Performance Capabilities with Hadoop YARN

1. We'll get started soon…
Q&A box is available for your questions
Webinar will be recorded for future viewing
Thank you for joining!
© Hortonworks Inc. 2011 – 2014. All Rights Reserved
3. Your speakers…
Arun Murthy, Founder and Architect
Hortonworks
@acmurthy
Paul Kent, Vice President Big Data
SAS
@hornpolish
4. Agenda
• Introduction to YARN
• SAS Workloads on the Cluster
• SAS Workloads: Resource Settings
• SAS and YARN
• YARN Futures
• Next Steps
5. The 1st Generation of Hadoop: Batch
HADOOP 1.0
Built for Web-Scale Batch Apps
• All other usage patterns must leverage that same infrastructure
• Forces the creation of silos for managing mixed workloads
[Diagram: separate single-app stacks (BATCH, INTERACTIVE, ONLINE), each running on its own HDFS silo]
6. Hadoop MapReduce Classic
JobTracker
§ Manages cluster resources and job scheduling
TaskTracker
§ Per-node agent
§ Manages tasks
7. MapReduce Classic: Limitations
Scalability
§ Maximum Cluster size – 4,000 nodes
§ Maximum concurrent tasks – 40,000
§ Coarse synchronization in JobTracker
Availability
§ Failure kills all queued and running jobs
Hard partition of resources into map and reduce slots
§ Low resource utilization
Lacks support for alternate paradigms and services
§ Iterative applications implemented using MapReduce are 10x slower
8. Our Vision: Hadoop as Next-Gen Platform
Hadoop 1
• Silos & largely batch
• Single processing engine: MapReduce (cluster resource management & data processing) on HDFS
Hadoop 2 w/ YARN
• Multiple engines, single data set
• Batch, interactive & real-time
[Diagram: YARN: Data Operating System (cluster resource management) over HDFS (Hadoop Distributed File System), hosting MapReduce, Script (Pig) / SQL (Hive) / Java (Cascading) on Tez, real-time HBase, and other engines such as Accumulo, Storm, Solr, Spark, and ISV engines]
9. YARN: Taking Hadoop Beyond Batch
Applications run natively IN Hadoop
Store ALL DATA in one place…
Interact with that data in MULTIPLE WAYS
with Predictable Performance and Quality of Service
[Diagram: YARN (cluster resource management) on HDFS2 (redundant, reliable storage), running BATCH (MapReduce), INTERACTIVE (Tez), STREAMING (Storm, S4, …), GRAPH (Giraph), IN-MEMORY (Spark), HPC MPI (OpenMPI), ONLINE (HBase), and OTHER (Search, Weave, …)]
10. YARN
Hortonworks Data Platform
[Diagram: YARN: Data Operating System (cluster resource management) over HDFS (Hadoop Distributed File System), running Batch (MR), Script (Pig) / SQL (Hive) / Java (Cascading) on Tez, NoSQL (HBase, Accumulo), Stream (Storm), In-Memory (Spark), PaaS (Kubernetes), SAS LASR and HPA, and other engines hosted via Slider]
11. 5 Key Benefits of YARN
1. Scale
2. New Programming Models & Services
3. Improved cluster utilization
4. Agility
5. Beyond Java
12. Concepts
Application
§ An application is a temporal job or a service submitted to YARN
§ Examples
– MapReduce job (job)
– HBase cluster (service)
Container
§ Basic unit of allocation
§ Fine-grained resource allocation across multiple resource types (memory, CPU, disk, network, GPU, etc.)
– container_0 = 2 GB, 1 CPU
– container_1 = 1 GB, 6 CPU
§ Replaces the fixed map/reduce slots
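Because containers replace fixed map/reduce slots, how many containers fit on a node is simply whichever resource runs out first. A toy shell sketch (numbers are made up, and this is plain arithmetic, not a YARN API call):

```shell
# Toy illustration: pack containers of 2 GB / 1 vcore onto one node.
NODE_MEM_GB=48
NODE_VCORES=16
C_MEM_GB=2
C_VCORES=1
BY_MEM=$(( NODE_MEM_GB / C_MEM_GB ))     # 24 containers fit by memory
BY_CPU=$(( NODE_VCORES / C_VCORES ))     # 16 containers fit by CPU
# The tighter of the two limits wins.
FIT=$(( BY_MEM < BY_CPU ? BY_MEM : BY_CPU ))
echo "$FIT containers fit"               # prints: 16 containers fit
```

With the old fixed-slot model, the same node would be limited by its preconfigured slot counts regardless of what each task actually needs.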
13. Design Centre
Split up the two major functions of JobTracker
§ Cluster resource management
§ Application life-cycle management
MapReduce becomes a user-land library
14. YARN Architecture - Walkthrough
[Diagram: a ResourceManager (with its Scheduler) coordinating many NodeManagers; Client2 submits work, each application's ApplicationMaster (AM1, AM2) runs in a container, and its task containers (1.1–1.3, 2.1–2.4) are spread across the NodeManagers]
15. Multi-Tenancy with YARN
Economics as queue capacity
§ Hierarchical queues
SLAs
§ Preemption
Resource isolation
§ Linux: cgroups
§ MS Windows: Job Control
§ Roadmap: virtualization (Xen, KVM)
Administration
§ Queue ACLs
§ Run-time re-configuration for queues
§ Charge-back
[Diagram: ResourceManager with Capacity Scheduler and hierarchical queues. root splits into Adhoc 10%, DW 70%, and Mrkting 20%; DW into Dev 10%, Reserved 20%, and Prod 70%; Mrkting into Dev 20% and Prod 80%; Prod into P0 70% and P1 30%]
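A hierarchy like the one in the diagram is expressed in the Capacity Scheduler's capacity-scheduler.xml. This is an illustrative fragment only (queue names follow the diagram; it is not a tested configuration), showing the top level and the start of the DW sub-tree:

```xml
<!-- Illustrative capacity-scheduler.xml fragment for the queues above -->
<property>
  <name>yarn.scheduler.capacity.root.queues</name>
  <value>adhoc,dw,mrkting</value>
</property>
<property>
  <name>yarn.scheduler.capacity.root.adhoc.capacity</name>
  <value>10</value>
</property>
<property>
  <name>yarn.scheduler.capacity.root.dw.capacity</name>
  <value>70</value>
</property>
<property>
  <name>yarn.scheduler.capacity.root.mrkting.capacity</name>
  <value>20</value>
</property>
<property>
  <name>yarn.scheduler.capacity.root.dw.queues</name>
  <value>dev,reserved,prod</value>
</property>
<property>
  <name>yarn.scheduler.capacity.root.dw.prod.capacity</name>
  <value>70</value>
</property>
```

Sibling capacities at each level sum to 100, which is how the scheduler turns queue definitions into guaranteed shares of the cluster.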
16. YARN Applications
Data processing applications and services
§ Services - Slider
§ Real-time event processing – Storm, S4, other commercial platforms
§ Tez – Generic framework to run a complex DAG
§ MPI: OpenMPI, MPICH2
§ Master-Worker
§ Machine Learning: Spark
§ Graph processing: Giraph
§ Enabled by allowing the use of paradigm-specific application masters
Run all on the same Hadoop cluster!
17. SHARE!
Customers are:
wrapping up POCs
building Bigger Clusters
assembling their Data { Lake, Reservoir }
…and want their software to SHARE the cluster
Copyright © 2014, SAS Institute Inc. All rights reserved.
18. SAS Workloads on the Cluster
19. SAS Workloads on the Cluster - Video
20. SAS Workloads on the Cluster
Some requests are for a significant slice of the cluster:
Will the reservation be ALL DAY, ALL WEEK, ALL MONTH?
Memory is typically fixed (~15% of the cluster)
CPU floor, but would like the spare capacity when available
Other requests are more short-term:
Memory can be estimated
Duration can be capped
CPU floor, but would like spare capacity
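As a back-of-the-envelope example of the "~15% of cluster memory" sizing above (the cluster size here is made up):

```shell
# Hypothetical sizing: reserve ~15% of total cluster memory for the
# long-running request (e.g. a LASR server). Numbers are illustrative.
CLUSTER_MEM_GB=2048
RESERVE_PCT=15
RESERVE_GB=$(( CLUSTER_MEM_GB * RESERVE_PCT / 100 ))
echo "Reserve ${RESERVE_GB} GB"   # prints: Reserve 307 GB
```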
21. SAS Workloads on the Cluster
22. SAS Workloads – Resource Settings
How much should you reserve? Not a perfect science yet.
Long running? Size the LASR server by percent of total memory.
More like a batch request? Size the HPA procedure from anecdotal experience.
23. SAS Workloads – Resource Settings
if [ "$USER" = "lasradm" ]; then
    # Custom settings for running under the lasradm account.
    export TKMPI_ULIMIT="-v 50000000"       # ulimit -v takes KB, so ~50 GB
    export TKMPI_MEMSIZE=50000
    export TKMPI_CGROUP="cgexec -g cpu:75"
fi

# Alternative: key off the application name instead of the account.
# if [ "$TKMPI_APPNAME" = "lasr" ]; then
#     # Custom settings for a lasr process running under any account.
#     export TKMPI_ULIMIT="-v 50000000"
#     export TKMPI_MEMSIZE=50000
#     export TKMPI_CGROUP="cgexec -g cpu:75"
# fi
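To see what the TKMPI_ULIMIT value amounts to: `ulimit -v` is expressed in kilobytes, so 50000000 caps a process at roughly 50 GB of virtual memory. A quick check in a throwaway subshell (assuming the current limit is not already lower):

```shell
# Apply the same cap in a subshell and read it back.
# 50000000 KB is approximately 50 GB of virtual memory.
( ulimit -v 50000000; ulimit -v )   # prints: 50000000
```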
24. YARN: Taking Hadoop Beyond Batch
Store ALL DATA in one place…
Interact with that data in MULTIPLE WAYS
with Predictable Performance and Quality of Service
Applications run natively IN Hadoop
[Diagram: YARN (cluster resource management) on HDFS2 (redundant, reliable storage), running BATCH (MapReduce), INTERACTIVE (Tez), STREAMING (Storm, S4, …), GRAPH (Giraph), IN-MEMORY (Spark), and ONLINE (HBase)]
26. YARN – Delegated Container Model
[Diagram: (1) AM1 sends an allocate! request to the ResourceManager's Scheduler, (2) receives a container, and (3) issues startContainer! to the NodeManager, which launches container 1.1]
27. YARN – Delegated Container Model
[Diagram: (1) AM1 sends an allocate! request to the ResourceManager's Scheduler, (2) receives a container, then (3–4) issues delegateContainer! to hand the container over to a long-running ServiceX]
28. YARN – Delegated Container Model
[Diagram: (5) ServiceX now runs directly in the delegated container on a NodeManager, while AM1 and the ResourceManager/Scheduler remain in place]
29. YARN – Delegated Container Model
[Diagram: (6) ServiceX keeps running in its delegated container across the NodeManagers]
30. PaaS - Kubernetes-on-YARN
YARN as the default enterprise-class scheduler and resource manager for Kubernetes and OpenShift 3
§ First-class support for containerization and mainstream PaaS
§ Updated Go language bindings for YARN
§ Uses the container delegation model
31. Labels – Constraint Specifications
[Diagram: a cluster in which some NodeManagers carry a "w/ GPU" label; an MR AM schedules its tasks (map1.1, map1.2, reduce1.1) across the cluster, while a DL-AM constrains its containers (DL1.1, DL1.2, DL1.3) to the GPU-labeled nodes via the ResourceManager's Scheduler]
32. Reservations - SLAs via Allocation Planning
33. YARN
Hortonworks Data Platform
[Diagram: the same platform view as slide 10: YARN: Data Operating System (cluster resource management) over HDFS, running Batch (MR), Pig/Hive/Cascading on Tez, HBase, Accumulo, Storm, Spark, Kubernetes (PaaS), SAS LASR and HPA, and other engines via Slider]
34. Next Steps…
More about SAS & Hortonworks
http://hortonworks.com/partner/SAS/
Download the Hortonworks Sandbox
Learn Hadoop
Build Your Analytic App
Try Hadoop 2
Contact us: events@hortonworks.com