Hadoop 2.0 – YARN
Yet Another Resource Negotiator
Rommel Garcia
Solutions Engineer

Agenda
• Hadoop 1.X & 2.X – Concepts Recap
• YARN Architecture – How does this affect MRv1?
• Slots be gone – What does this mean for MapReduce?
• Building YARN Applications
• Q & A

Hadoop 1.X vs. 2.X
A recap of the differences

The 1st Generation of Hadoop: Batch
HADOOP 1.0 – Built for Web-Scale Batch Apps
[Diagram: three single-app BATCH silos, each on its own HDFS, alongside separate single-app INTERACTIVE and ONLINE silos]
• All other usage patterns must leverage that same infrastructure
• Forces the creation of silos for managing mixed workloads

Hadoop MapReduce Classic
• JobTracker
– Manages cluster resources and job scheduling
• TaskTracker
– Per-node agent
– Manages tasks

Hadoop 1
• Limited to roughly 4,000 nodes per cluster
• Scales as O(# of tasks in a cluster)
• JobTracker bottleneck – resource management, job scheduling and monitoring
• Only one namespace for managing HDFS
• Map and Reduce slots are static
• The only job that can run is MapReduce

Hadoop 1.X Stack
[Diagram: the Hortonworks Data Platform (HDP) 1.x stack – HADOOP CORE (HDFS, MapReduce) with PLATFORM SERVICES for enterprise readiness (High Availability, Disaster Recovery, Security and Snapshots; NFS and WebHDFS access); DATA SERVICES (Hive & Pig, HBase, HCatalog; Flume and Sqoop for load & extract; Oozie); OPERATIONAL SERVICES (Ambari); deployable on OS, VM, Cloud or Appliance]

Our Vision: Hadoop as Next-Gen Platform
[Diagram: HADOOP 1.0 is a single-use system for batch apps – MapReduce (cluster resource management & data processing) on HDFS (redundant, reliable storage). HADOOP 2.0 is a multi-purpose platform for batch, interactive, online, streaming and more – MapReduce and other data-processing engines on YARN (cluster resource management) on HDFS2 (redundant, reliable storage)]

YARN: Taking Hadoop Beyond Batch
Store ALL DATA in one place…
Interact with that data in MULTIPLE WAYS, with predictable performance and quality of service
Applications run natively IN Hadoop:
• BATCH (MapReduce)
• INTERACTIVE (Tez)
• ONLINE (HBase)
• STREAMING (Storm, S4, …)
• GRAPH (Giraph)
• IN-MEMORY (Spark)
• HPC MPI (OpenMPI)
• OTHER (Search, Weave, …)
All running on YARN (Cluster Resource Management) over HDFS2 (Redundant, Reliable Storage)

Hadoop 2
• Potentially up to 10,000 nodes per cluster
• Scales as O(cluster size)
• Supports multiple namespaces for managing HDFS
• Efficient cluster utilization (YARN)
• MRv1 backward- and forward-compatible
• Any app can integrate with Hadoop
• Beyond Java

Hadoop 2.X Stack
[Diagram: the HDP 2.x stack – HADOOP CORE (HDFS, YARN, MapReduce, Tez*) with PLATFORM SERVICES for enterprise readiness (High Availability, Disaster Recovery, Rolling Upgrades, Security and Snapshots; NFS, WebHDFS and Knox* access); DATA SERVICES (Hive & Pig, HBase, HCatalog, Falcon*; Flume and Sqoop for load & extract; Oozie); OPERATIONAL SERVICES (Ambari); deployable on OS/VM, Cloud or Appliance. *Included Q1 2013]

YARN Architecture

A Brief History of YARN
• Originally conceived & architected by the team at Yahoo!
– Arun Murthy created the original JIRA in 2008 and led the PMC
– Arun is currently the lead for MapReduce/YARN/Tez at Hortonworks and was formerly architect of Hadoop MapReduce at Yahoo
• The team at Hortonworks has been working on YARN for 4 years
• YARN-based architecture running at scale at Yahoo!
– Deployed on 35,000 nodes for about a year
– Implemented Storm-on-YARN, which processes 133,000 events per second

Concepts
• Application
– A job submitted to the framework
– Example: a MapReduce job
• Container
– Basic unit of allocation
– Fine-grained resource allocation across multiple resource types (memory, CPU, disk, network, GPU, etc.)
– container_0 = 2 GB, 1 CPU
– container_1 = 1 GB, 6 CPUs
– Replaces the fixed map/reduce slots

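For reference, a container ask like the ones above is expressed through the Resource record. A minimal sketch, assuming the Hadoop 2.x YARN API is on the classpath; the numbers mirror container_0 and container_1:

```java
import org.apache.hadoop.yarn.api.records.Resource;

public class ContainerSpecs {
    public static void main(String[] args) {
        // 2 GB of memory (in MB) and 1 virtual core, as in container_0
        Resource container0 = Resource.newInstance(2048, 1);
        // 1 GB of memory and 6 virtual cores, as in container_1
        Resource container1 = Resource.newInstance(1024, 6);
        System.out.println(container0 + " / " + container1);
    }
}
```
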
Architecture
• Resource Manager
– Global resource scheduler
– Hierarchical queues
– Application management
• Node Manager
– Per-machine agent
– Manages the life-cycle of containers
– Container resource monitoring
• Application Master
– Per-application
– Manages application scheduling and task execution
– E.g. the MapReduce Application Master

YARN – Running Apps
[Diagram: Hadoop Client 1 and Client 2 create and submit app1 and app2 to the ResourceManager. Inside the RM, the ASM negotiates containers and receives reports from the per-app ApplicationMasters, while the Scheduler partitions resources across queues. NodeManagers spread over Rack1 through RackN host AM1 and AM2 plus their containers (C1.1–C1.4, C2.1–C2.3) and send status reports back to the RM]

Slots be gone!
How does MapReduce run on YARN?

Apache Hadoop MapReduce on YARN
• Original use-case
• Most complex application to build
– Data-locality
– Fault tolerance
– ApplicationMaster recovery: checkpoint to HDFS
– Intra-application priorities: maps vs. reduces
– Needed a complex feedback mechanism from the ResourceManager
– Security
– Isolation
• Binary compatible with Apache Hadoop 1.x

Apache Hadoop MapReduce on YARN
[Diagram: the ResourceManager/Scheduler coordinates a grid of NodeManagers. MR AM 1 runs map1.1, map1.2 and reduce1.1 in containers on other nodes; MR AM 2 runs map2.1, map2.2, reduce2.1 and reduce2.2]

Efficiency Gains of YARN
• Key optimizations
– No hard segmentation of resources into map and reduce slots
– The YARN scheduler is more efficient
– All resources are fungible
• Yahoo has over 30,000 nodes running YARN across over 365 PB of data.
• They calculate running about 400,000 jobs per day for about 10 million hours of compute time.
• They estimate a 60% – 150% improvement in node usage per day.
• Yahoo retired a whole colo (a 10,000-node datacenter) because of the increased utilization.

An Example: Calculating Node Capacity
• Important parameters
– mapreduce.[map|reduce].memory.mb
– The physical RAM hard-limit enforced by Hadoop on the task
– mapreduce.[map|reduce].java.opts
– The heap size of the JVM (-Xmx)
– yarn.scheduler.minimum-allocation-mb
– The smallest container YARN will allow
– yarn.nodemanager.resource.memory-mb
– The amount of physical RAM on the node available for containers
– yarn.nodemanager.vmem-pmem-ratio
– The amount of virtual RAM each container is allowed, calculated as containerMemoryRequest * vmem-pmem-ratio

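These keys live in ordinary Hadoop configuration (mapred-site.xml / yarn-site.xml). A minimal sketch of setting them programmatically with the stock Configuration API; the values are the ones worked through on the next slide:

```java
import org.apache.hadoop.conf.Configuration;

public class NodeCapacityConfig {
    public static void main(String[] args) {
        Configuration conf = new Configuration();
        // Hard physical-memory limits per task container
        conf.setInt("mapreduce.map.memory.mb", 1536);
        conf.setInt("mapreduce.reduce.memory.mb", 2560);
        // JVM heap must fit inside the container limit
        conf.set("mapreduce.map.java.opts", "-Xmx1g");
        conf.set("mapreduce.reduce.java.opts", "-Xmx2g");
        // Scheduler granularity and per-node memory for containers
        conf.setInt("yarn.scheduler.minimum-allocation-mb", 512);
        conf.setInt("yarn.nodemanager.resource.memory-mb", 36864);
        // Virtual memory allowance: containerMemoryRequest * ratio
        conf.setFloat("yarn.nodemanager.vmem-pmem-ratio", 2.1f);
    }
}
```
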
Calculating Node Capacity, Continued
• Let's say we need a 1 GB map and a 2 GB reduce
• mapreduce.[map|reduce].java.opts = [-Xmx1g | -Xmx2g]
• Remember, a container has more overhead than just your heap! Add 512 MB to the container limit for overhead
• mapreduce.[map|reduce].memory.mb = [1536 | 2560]
• We have 36 GB per node and minimum allocations of 512 MB
• yarn.nodemanager.resource.memory-mb = 36864
• yarn.scheduler.minimum-allocation-mb = 512
• Virtual memory for each container:
– Map: 1536 MB * vmem-pmem-ratio (default is 2.1) = 3225.6 MB
– Reduce: 2560 MB * vmem-pmem-ratio = 5376 MB
• Our 36 GB node can support:
– 24 maps OR 14 reducers OR any combination allowed by the resources on the node

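The same arithmetic as a tiny self-contained check (a sketch; rounding down to whole containers is my assumption about how a node is packed):

```java
public class NodeCapacityMath {
    public static void main(String[] args) {
        final int nodeMemoryMb = 36864;       // yarn.nodemanager.resource.memory-mb
        final int mapContainerMb = 1536;      // 1 GB heap + 512 MB overhead
        final int reduceContainerMb = 2560;   // 2 GB heap + 512 MB overhead
        final double vmemPmemRatio = 2.1;     // yarn.nodemanager.vmem-pmem-ratio

        System.out.println("Max maps:    " + nodeMemoryMb / mapContainerMb);      // 24
        System.out.println("Max reduces: " + nodeMemoryMb / reduceContainerMb);   // 14
        System.out.println("Map vmem:    " + mapContainerMb * vmemPmemRatio);     // ≈ 3225.6
        System.out.println("Reduce vmem: " + reduceContainerMb * vmemPmemRatio);  // ≈ 5376
    }
}
```
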
Building YARN Apps
Super Simple APIs

YARN – Implementing Applications
• What APIs do I need to use?
– Only three protocols
– Client to ResourceManager: application submission
– ApplicationMaster to ResourceManager: container allocation
– ApplicationMaster to NodeManager: container launch
– Use the client libraries for all 3 actions
– Module yarn-client
– Provides both synchronous and asynchronous libraries
– Or use a 3rd-party library like Weave
– http://continuuity.github.io/weave/

YARN – Implementing Applications
• What do I need to do?
– Write a submission client
– Write an ApplicationMaster (well, copy-paste one)
– DistributedShell is the new WordCount
– Get containers, run whatever you want!

YARN – Implementing Applications
• What else do I need to know?
– Resource Allocation & Usage
– ResourceRequest
– Container
– ContainerLaunchContext
– LocalResource
– ApplicationMaster
– ApplicationId
– ApplicationAttemptId
– ApplicationSubmissionContext

YARN – Resource Allocation & Usage
• ResourceRequest
– A fine-grained resource ask to the ResourceManager
– Asks for a specific amount of resources (memory, CPU, etc.) on a specific machine or rack
– Use the special value * as the resource name to mean "any machine"
– Fields: priority, resourceName, capability, numContainers

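A minimal sketch of constructing one, assuming the Hadoop 2.x records API; ResourceRequest.newInstance takes exactly the four fields listed above, and ResourceRequest.ANY is the * wildcard:

```java
import org.apache.hadoop.yarn.api.records.Priority;
import org.apache.hadoop.yarn.api.records.Resource;
import org.apache.hadoop.yarn.api.records.ResourceRequest;

public class Requests {
    public static void main(String[] args) {
        // One container of <2gb, 1 core> on any machine (resourceName = "*")
        ResourceRequest anyHost = ResourceRequest.newInstance(
                Priority.newInstance(0),        // priority
                ResourceRequest.ANY,            // resourceName ("*")
                Resource.newInstance(2048, 1),  // capability
                1);                             // numContainers
        System.out.println(anyHost);
    }
}
```
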
YARN – Resource Allocation & Usage
• ResourceRequest example:

priority | resourceName | capability    | numContainers
---------|--------------|---------------|--------------
0        | host01       | <2gb, 1 core> | 1
0        | rack0        | <2gb, 1 core> | 1
0        | *            | <2gb, 1 core> | 1
1        | *            | <4gb, 1 core> | 1

YARN – Resource Allocation & Usage
• Container
– The basic unit of allocation in YARN
– The result of a ResourceRequest, provided by the ResourceManager to the ApplicationMaster
– A specific amount of resources (CPU, memory, etc.) on a specific machine
– Fields: containerId, resourceName, capability, tokens

YARN – Resource Allocation & Usage
• ContainerLaunchContext
– The context provided by the ApplicationMaster to the NodeManager to launch the Container
– Complete specification for a process
– LocalResource is used to specify the container binary and its dependencies
– The NodeManager is responsible for downloading them from a shared namespace (typically HDFS)
– Fields: container, commands, environment, localResources; each LocalResource has a uri and a type

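A sketch of assembling a launch context, assuming the Hadoop 2.x records API; the HDFS path, jar name and main class are hypothetical placeholders:

```java
import java.util.Collections;
import java.util.List;
import java.util.Map;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.yarn.api.records.*;
import org.apache.hadoop.yarn.util.ConverterUtils;

public class LaunchContexts {
    // Builds the ContainerLaunchContext for a process whose jar lives in HDFS.
    public static ContainerLaunchContext forAppMaster(Configuration conf) throws Exception {
        // LocalResource pointing at a (hypothetical) jar; the NodeManager downloads it
        FileSystem fs = FileSystem.get(conf);
        Path jar = new Path("/apps/myapp/AppMaster.jar");
        FileStatus stat = fs.getFileStatus(jar);
        LocalResource amJar = LocalResource.newInstance(
                ConverterUtils.getYarnUrlFromPath(jar),
                LocalResourceType.FILE, LocalResourceVisibility.APPLICATION,
                stat.getLen(), stat.getModificationTime());
        Map<String, LocalResource> localResources =
                Collections.singletonMap("AppMaster.jar", amJar);

        // Environment and command line for the process run inside the container
        Map<String, String> env = Collections.singletonMap("CLASSPATH", "./*");
        List<String> commands = Collections.singletonList(
                "$JAVA_HOME/bin/java -Xmx512m my.pkg.AppMaster 1>stdout 2>stderr");

        return ContainerLaunchContext.newInstance(
                localResources, env, commands, null, null, null);
    }
}
```
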
YARN – ApplicationMaster
• ApplicationMaster
– Per-application controller, aka container_0
– Parent for all containers of the application
– The ApplicationMaster negotiates all of its containers from the ResourceManager
– The ApplicationMaster container is a child of the ResourceManager
– Think of the init process in Unix
– The RM restarts the ApplicationMaster attempt if required (with a unique ApplicationAttemptId)
– Code for the application is submitted along with the application itself

YARN – ApplicationMaster
• ApplicationMaster
– ApplicationSubmissionContext is the complete specification of the ApplicationMaster, provided by the client
– The ResourceManager is responsible for allocating and launching the ApplicationMaster container
– Fields: resourceRequest, containerLaunchContext, appName, queue

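A sketch of filling those fields in on a context obtained from YarnClient (shown in full a few slides below); the application name, queue and sizes are illustrative:

```java
import org.apache.hadoop.yarn.api.records.ApplicationSubmissionContext;
import org.apache.hadoop.yarn.api.records.ContainerLaunchContext;
import org.apache.hadoop.yarn.api.records.Priority;
import org.apache.hadoop.yarn.api.records.Resource;

public class SubmissionContexts {
    public static void fill(ApplicationSubmissionContext ctx,
                            ContainerLaunchContext amContainer) {
        ctx.setApplicationName("my-yarn-app");          // appName
        ctx.setQueue("default");                        // queue
        ctx.setPriority(Priority.newInstance(0));
        ctx.setResource(Resource.newInstance(1024, 1)); // resources for the AM container
        ctx.setAMContainerSpec(amContainer);            // containerLaunchContext
    }
}
```
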
YARN Application API – Overview
• YarnClient is the submission client API
• Both synchronous & asynchronous APIs for resource allocation and container start/stop
• Synchronous API
– AMRMClient
– AMNMClient
• Asynchronous API
– AMRMClientAsync
– AMNMClientAsync

YARN Application API – The Client
[Diagram: (1) New Application Request – YarnClient.createApplication goes to the ResourceManager; (2) Submit Application – YarnClient.submitApplication. The Scheduler then places each application's AM and containers (AM 1 with Containers 1.1–1.3, AM 2 with Containers 2.1–2.4) across the NodeManagers]

YARN Application API – The Client
• YarnClient
– createApplication to create an application
– submitApplication to start an application
– The application developer needs to provide an ApplicationSubmissionContext
– APIs to get other information from the ResourceManager
– getAllQueues
– getApplications
– getNodeReports
– APIs to manipulate a submitted application, e.g. killApplication

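Pulling the client together: a minimal sketch assuming the yarn-client module, with the ApplicationSubmissionContext populated as in the earlier sketch:

```java
import org.apache.hadoop.yarn.api.records.ApplicationId;
import org.apache.hadoop.yarn.api.records.ApplicationSubmissionContext;
import org.apache.hadoop.yarn.client.api.YarnClient;
import org.apache.hadoop.yarn.client.api.YarnClientApplication;
import org.apache.hadoop.yarn.conf.YarnConfiguration;

public class SubmissionClient {
    public static void main(String[] args) throws Exception {
        YarnClient yarnClient = YarnClient.createYarnClient();
        yarnClient.init(new YarnConfiguration());
        yarnClient.start();

        // 1. New application request: gets an ApplicationId from the RM
        YarnClientApplication app = yarnClient.createApplication();
        ApplicationSubmissionContext ctx = app.getApplicationSubmissionContext();
        // ... fill in appName, queue, AM resources, AM ContainerLaunchContext ...

        // 2. Submit; the RM allocates and launches the AM container
        ApplicationId appId = yarnClient.submitApplication(ctx);
        System.out.println("Submitted " + appId);

        // Other RM queries and controls, per the bullets above:
        yarnClient.getAllQueues();
        yarnClient.getApplications();
        yarnClient.getNodeReports();
        // yarnClient.killApplication(appId);
        yarnClient.stop();
    }
}
```
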
YARN Application API – Resource Allocation
[Diagram: the ApplicationMaster's conversation with the ResourceManager – (1) registerApplicationMaster, (2)–(3) repeated AMRMClient.allocate calls that return Containers from the Scheduler, (4) unregisterApplicationMaster]

YARN Application API – Resource Allocation
• AMRMClient – synchronous API for the ApplicationMaster to interact with the ResourceManager
– Prologue / epilogue – registerApplicationMaster / unregisterApplicationMaster
– Resource negotiation with the ResourceManager
– Internal book-keeping – addContainerRequest / removeContainerRequest / releaseAssignedContainer
– Main API – allocate
– Helper APIs for cluster information
– getAvailableResources
– getClusterNodeCount

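A sketch of the synchronous flow, assuming the Hadoop 2.x yarn-client API; the progress value, request shape and polling interval are illustrative (in a real AM this code runs inside the AM container, which carries the required security tokens):

```java
import java.util.List;

import org.apache.hadoop.yarn.api.records.Container;
import org.apache.hadoop.yarn.api.records.FinalApplicationStatus;
import org.apache.hadoop.yarn.api.records.Priority;
import org.apache.hadoop.yarn.api.records.Resource;
import org.apache.hadoop.yarn.client.api.AMRMClient;
import org.apache.hadoop.yarn.client.api.AMRMClient.ContainerRequest;
import org.apache.hadoop.yarn.conf.YarnConfiguration;

public class AppMasterSync {
    public static void main(String[] args) throws Exception {
        AMRMClient<ContainerRequest> rm = AMRMClient.createAMRMClient();
        rm.init(new YarnConfiguration());
        rm.start();

        // Prologue: register with the RM (host, RPC port, tracking URL)
        rm.registerApplicationMaster("", 0, "");

        // Book-keeping: ask for one <1gb, 1 core> container anywhere
        rm.addContainerRequest(new ContainerRequest(
                Resource.newInstance(1024, 1), null, null, Priority.newInstance(0)));

        // Main API: allocate doubles as the heartbeat; poll until granted
        List<Container> granted;
        do {
            granted = rm.allocate(0.1f).getAllocatedContainers();
            Thread.sleep(1000);
        } while (granted.isEmpty());
        System.out.println("Got " + granted.get(0).getId());

        // Epilogue: unregister and report the final status
        rm.unregisterApplicationMaster(FinalApplicationStatus.SUCCEEDED, "done", null);
        rm.stop();
    }
}
```
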
YARN Application API – Resource Allocation
• AMRMClientAsync – asynchronous API for the ApplicationMaster
– Extension of AMRMClient that provides an asynchronous CallbackHandler
– Callbacks make it easier for the application developer to build a mental model of the interaction with the ResourceManager
– onContainersAllocated
– onContainersCompleted
– onNodesUpdated
– onError
– onShutdownRequest

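A sketch of the asynchronous variant; the handler methods match the bullets above, and getProgress is also required by the 2.x CallbackHandler interface:

```java
import java.util.List;

import org.apache.hadoop.yarn.api.records.Container;
import org.apache.hadoop.yarn.api.records.ContainerStatus;
import org.apache.hadoop.yarn.api.records.NodeReport;
import org.apache.hadoop.yarn.client.api.AMRMClient.ContainerRequest;
import org.apache.hadoop.yarn.client.api.async.AMRMClientAsync;
import org.apache.hadoop.yarn.conf.YarnConfiguration;

public class AppMasterAsync implements AMRMClientAsync.CallbackHandler {
    public void onContainersAllocated(List<Container> containers) {
        // launch work on each granted container via the NM-side client
    }
    public void onContainersCompleted(List<ContainerStatus> statuses) { }
    public void onNodesUpdated(List<NodeReport> updated) { }
    public void onShutdownRequest() { }
    public void onError(Throwable e) { }
    public float getProgress() { return 0.0f; }

    public static void main(String[] args) throws Exception {
        AMRMClientAsync<ContainerRequest> rm =
                AMRMClientAsync.createAMRMClientAsync(1000, new AppMasterAsync());
        rm.init(new YarnConfiguration());
        rm.start();  // heartbeats every 1000 ms; callbacks fire on events
    }
}
```
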
YARN Application API – Using Resources
[Diagram: AM 1 talks directly to the NodeManager hosting Container 1.1 – AMNMClient.startContainer to launch it and AMNMClient.getContainerStatus to monitor it – while the ResourceManager/Scheduler stays out of that path]

YARN Application API – Using Resources
• AMNMClient – synchronous API for the ApplicationMaster to launch / stop containers at the NodeManager
– Simple (trivial) APIs
– startContainer
– stopContainer
– getContainerStatus

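A sketch of the synchronous container calls. One caveat: in shipped Hadoop 2.x releases this class is named NMClient (the AMNMClient name here predates the final API); the container and launch context are assumed to come from the earlier allocation and ContainerLaunchContext sketches:

```java
import org.apache.hadoop.yarn.api.records.Container;
import org.apache.hadoop.yarn.api.records.ContainerLaunchContext;
import org.apache.hadoop.yarn.client.api.NMClient;
import org.apache.hadoop.yarn.conf.YarnConfiguration;

public class ContainerLauncher {
    public static void launch(Container container, ContainerLaunchContext ctx)
            throws Exception {
        NMClient nm = NMClient.createNMClient();
        nm.init(new YarnConfiguration());
        nm.start();

        nm.startContainer(container, ctx);                                // launch
        nm.getContainerStatus(container.getId(), container.getNodeId());  // monitor
        nm.stopContainer(container.getId(), container.getNodeId());       // stop
        nm.stop();
    }
}
```
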
YARN Application API – Using Resources
• AMNMClientAsync – asynchronous API for the ApplicationMaster to launch / stop containers at the NodeManager
– Simple (trivial) APIs
– startContainerAsync
– stopContainerAsync
– getContainerStatusAsync
– A CallbackHandler makes it easier for the application developer to build a mental model of the interaction with the NodeManager
– onContainerStarted
– onContainerStopped
– onStartContainerError
– onContainerStatusReceived

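The asynchronous variant, again under the shipped NMClientAsync name; a partial sketch where the handler methods are stubbed:

```java
import java.nio.ByteBuffer;
import java.util.Map;

import org.apache.hadoop.yarn.api.records.Container;
import org.apache.hadoop.yarn.api.records.ContainerId;
import org.apache.hadoop.yarn.api.records.ContainerLaunchContext;
import org.apache.hadoop.yarn.api.records.ContainerStatus;
import org.apache.hadoop.yarn.client.api.async.NMClientAsync;
import org.apache.hadoop.yarn.conf.YarnConfiguration;

public class AsyncContainerLauncher implements NMClientAsync.CallbackHandler {
    public void onContainerStarted(ContainerId id, Map<String, ByteBuffer> resp) { }
    public void onContainerStatusReceived(ContainerId id, ContainerStatus status) { }
    public void onContainerStopped(ContainerId id) { }
    public void onStartContainerError(ContainerId id, Throwable t) { }
    public void onGetContainerStatusError(ContainerId id, Throwable t) { }
    public void onStopContainerError(ContainerId id, Throwable t) { }

    public static void launch(Container container, ContainerLaunchContext ctx) {
        NMClientAsync nm = NMClientAsync.createNMClientAsync(new AsyncContainerLauncher());
        nm.init(new YarnConfiguration());
        nm.start();
        nm.startContainerAsync(container, ctx);  // result arrives via onContainerStarted
    }
}
```
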
Hadoop Summit 2014

THANK YOU!
Rommel Garcia, Solutions Engineer – Big Data
rgarcia@hortonworks.com


Editor's Notes

  1. Traditional batch vs. next-gen.
  2. The first Hadoop use case was to map the whole internet: a graph with millions of nodes and trillions of edges. Still back to the same problem of silos in order to manipulate data for interactive or online applications. Long story short: no support for alternative processing models. Iterative tasks can take 10x longer because of I/O barriers.
  3. In classic Hadoop, MapReduce was part of the JobTracker and TaskTracker, hence everything had to be built on MapReduce first. It has scalability limits around 4k nodes, iterative processes can take forever on MapReduce, and JobTracker failures kill jobs and everything in the queue.
  4. Typical open source stack; as we can see, all other applications like Pig, Hive and HBase sit on top of MapReduce.
  5. So while Hadoop 1.x had its uses, this is really about turning Hadoop into the next-generation platform. What does that mean? A platform should be able to do multiple things, i.e. more than just batch processing. It needs batch, interactive, online, and streaming capabilities to really become a next-gen platform. It SCALES! Yahoo plans to move to a 10k-node cluster.
  6. So what does this really do? It provides a distributed application framework. Hadoop now provides a platform where we can store all data in a reliable way, and then on the same platform process this data without having to move it: data locality.
  7. New additions to the family: Falcon for data lifecycle management; Tez, a new way of processing that avoids some of the I/O barriers MapReduce experienced; Knox for security and other enterprise features. But most importantly YARN, which, as you can see, everything now sits on top of.
  8. AKA a container spawned by an AM can itself be a client and ask for another application to start, which in turn can do the same thing.
  9. Now we have the concept of deploying applications into the Hadoop cluster; these applications run in containers of set resources.
  10. The RM takes the place of the JT and still has scheduling queues, such as the fair, capacity and hierarchical queues.
  11. Data locality – attempts to find a local host; if that fails, moves to the nearest rack. Fault tolerance – robust in terms of managing containers. Recovery – the MapReduce ApplicationMaster writes a checkpoint to HDFS; this way we can recover from an AM that dies, since the new attempt will read the checkpoint and continue. Intra-application priorities – maps have to be completed before reducers, so there is a complex process in the ApplicationMaster to balance mappers and reducers. Complex feedback from the RM – the AppMaster can now look ahead and find out how many resources it can get in the next 20 minutes. Migrate directly to YARN without changing a single line of code; just recompile.
  12. Haven't used Weave, but it's on the to-do list. Sadly my tasks keep growing and I can't do them in parallel.
  13. ApplicationAttemptId (a combination of attemptId and fail count); ApplicationSubmissionContext – submitted by the client.
  14. getAllQueues – metrics related to the queue, such as max capacity, current capacity and application count. getApplications – list of applications. getNodeReports – id, rack, host, number of containers. The ApplicationSubmissionContext needs a ContainerLaunchContext as well, plus resources, priority, queue, etc.