© Hortonworks Inc. 2013
YARN and Ambari
YARN service management using Ambari
Srimanth Gunturi
September 25th, 2013
Agenda
• YARN Overview
• Installing
• Monitoring
• Configuration
• Capacity Scheduler
• MapReduce2
• Future
Architecting the Future of Big Data
YARN Overview
• Yet Another Resource Negotiator (YARN)
• Purpose - Negotiating resources (Memory, CPU, Disk, etc.) on a cluster
• What was wrong with original MapReduce?
– Cluster not open to non-MapReduce paradigms
– Inefficient resource utilization (Map slots, Reduce slots)
– Upgrade rigidity
• MapReduce (Hadoop 1.0) -> YARN + MapReduce 2 (Hadoop 2.0)
YARN Overview - Applications
• MapReduce1 applications are compatible with MapReduce2
– Binary compatibility with org.apache.hadoop.mapred API: the same JARs can be used
– Source compatibility with org.apache.hadoop.mapreduce API: applications may need to be recompiled
YARN Overview - Architecture
• ResourceManager
• NodeManagers
• Containers
• Applications (ApplicationMasters)
Installing
Monitoring
Monitoring – NodeManager Summary
NodeManager Status
• Active – In communication with the ResourceManager (RM)
• Lost – Not communicating with the RM
• Unhealthy – Flagged by the custom health check script identified by the property yarn.nodemanager.health-checker.script.path
• Rebooted – Automatically restarted due to internal problems
• Decommissioned – The RM ignores communications from the host. The host is listed in the file identified by yarn.resourcemanager.nodes.exclude-path.
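The two properties named above are set in yarn-site.xml. A minimal Python sketch of rendering them in Hadoop's `<configuration>` XML layout follows; the file paths and the `render_site_xml` helper are hypothetical, chosen only for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical paths, for illustration only; substitute the locations
# that apply to your cluster.
NODE_STATE_PROPERTIES = {
    "yarn.nodemanager.health-checker.script.path": "/etc/hadoop/conf/health_check.sh",
    "yarn.resourcemanager.nodes.exclude-path": "/etc/hadoop/conf/yarn.exclude",
}


def render_site_xml(properties):
    """Render a dict of properties as Hadoop-style <configuration> XML."""
    root = ET.Element("configuration")
    for name, value in properties.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(render_site_xml(NODE_STATE_PROPERTIES))
```

Hosts listed in the exclude file would show up as Decommissioned; a host whose health script reports a problem would show up as Unhealthy.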
Monitoring – Container Summary
Containers
• Allocated – Containers that have been created with the requested resources.
• Pending – Containers whose resources will become available and that are pending creation.
• Reserved – Containers whose resources are not yet available.
Examples (10 GB cluster)
• Request three 5GB containers
– 2 allocated, 1 pending
• Request three 4GB containers
– 2 allocated, 1 reserved (2GB)
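The arithmetic behind the two examples can be sketched in Python. This is a deliberately simplified model, not the scheduler's actual algorithm: it assumes requests are served in order, that a request which does not fit reserves whatever memory is still free, and that a request finding no free memory at all is pending. The function name `place_requests` is hypothetical.

```python
def place_requests(cluster_gb, requests_gb):
    """Classify container requests as allocated, reserved, or pending.

    Simplified model of the slide's examples (an assumption, not the
    real scheduler): requests are served in order; a request that does
    not fit reserves the memory that is still free; a request that
    finds no free memory is pending.
    """
    free = cluster_gb
    result = {"allocated": 0, "reserved": 0, "pending": 0, "reserved_gb": 0}
    for size in requests_gb:
        if size <= free:
            free -= size
            result["allocated"] += 1
        elif free > 0:
            # Set the leftover memory aside for this container.
            result["reserved_gb"] += free
            free = 0
            result["reserved"] += 1
        else:
            result["pending"] += 1
    return result


if __name__ == "__main__":
    print(place_requests(10, [5, 5, 5]))
    print(place_requests(10, [4, 4, 4]))
```

Under these assumptions the first call yields 2 allocated and 1 pending, and the second yields 2 allocated and 1 reserved with 2 GB set aside.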
Monitoring – Applications Summary
Applications
• Submitted – Application requests made to YARN.
• Running – Applications whose ApplicationMasters have been created and are running.
• Pending – Application requests that are pending creation.
• Completed – Applications that have finished running. They may have succeeded, been killed, or failed.
• Killed – Applications that have been terminated by the user.
• Failed – Applications that have failed to run due to internal failures.
Monitoring – Memory Summary
Cluster Memory
• Used – Memory resources currently being used across the cluster.
• Reserved – Memory resources that are set aside for allocation.
• Total – Memory resources available across the entire cluster.
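As a rough illustration of how the three figures relate, the Python sketch below turns raw megabyte counts into percentages of the total. The dictionary keys borrow the field names used by the ResourceManager REST cluster metrics (allocatedMB, reservedMB, totalMB); mapping "Used" to allocatedMB is an assumption, and `memory_summary` is a hypothetical helper, not part of Ambari.

```python
def memory_summary(metrics):
    """Express used and reserved memory as percentages of the total.

    Assumption: "Used" corresponds to allocatedMB and "Reserved" to
    reservedMB in the supplied metrics dict.
    """
    total = metrics["totalMB"]

    def pct(mb):
        # Guard against an empty cluster reporting zero total memory.
        return round(100.0 * mb / total, 1) if total else 0.0

    return {
        "used_pct": pct(metrics["allocatedMB"]),
        "reserved_pct": pct(metrics["reservedMB"]),
        "total_mb": total,
    }


if __name__ == "__main__":
    # A 10 GB cluster with 8 GB in use and 2 GB reserved.
    sample = {"totalMB": 10240, "allocatedMB": 8192, "reservedMB": 2048}
    print(memory_summary(sample))
```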
Monitoring – Alerts
Service Alerts
- ResourceManager health
- % NodeManagers alive
Host Alerts
- NodeManager health
- NodeManager process check
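The "% NodeManagers alive" service alert can be pictured as a threshold check over the node counts. The sketch below is only an illustration in Python: the function name and the 90% / 75% thresholds are hypothetical and are not taken from Ambari's alert definitions.

```python
def nodemanagers_alive_alert(active, total, warn_pct=90.0, crit_pct=75.0):
    """Return (state, percent alive) for a NodeManager liveness check.

    The thresholds are hypothetical defaults chosen for illustration.
    """
    if total <= 0:
        return ("UNKNOWN", 0.0)
    alive_pct = 100.0 * active / total
    if alive_pct < crit_pct:
        return ("CRITICAL", alive_pct)
    if alive_pct < warn_pct:
        return ("WARNING", alive_pct)
    return ("OK", alive_pct)


if __name__ == "__main__":
    print(nodemanagers_alive_alert(active=8, total=10))
```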
Monitoring – Graphs
Configuration
Configuration – Capacity Scheduler
Queues
Root
o A
o B
o C
  o C1
  o C2
o default
• Hierarchical Queues
• Capacity Guarantees
– Capacity (%)
– Maximum-am-resource-percent (%)
• Elasticity
– Maximum-capacity (%)
• Access Control
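A queue hierarchy like the one above is expressed in capacity-scheduler.xml through `yarn.scheduler.capacity.<queue-path>.queues` and `.capacity` properties. The Python sketch below generates those property names for the example queues, assuming C1 and C2 are children of C. The capacity percentages are hypothetical, and the check that sibling capacities sum to 100 mirrors the scheduler's requirement at each level.

```python
# Hypothetical capacities, as a percentage of the parent queue.
QUEUES = {
    "root": {"A": 20, "B": 30, "C": 40, "default": 10},
    "root.C": {"C1": 50, "C2": 50},
}


def capacity_scheduler_properties(queues):
    """Build capacity-scheduler property names and values for a queue tree."""
    props = {}
    for parent, children in queues.items():
        total = sum(children.values())
        if total != 100:
            raise ValueError(
                "capacities under %s sum to %d, expected 100" % (parent, total)
            )
        prefix = "yarn.scheduler.capacity." + parent
        # Comma-separated list of child queue names for this parent.
        props[prefix + ".queues"] = ",".join(children)
        for name, capacity in children.items():
            props[prefix + "." + name + ".capacity"] = str(capacity)
    return props


if __name__ == "__main__":
    for key, value in capacity_scheduler_properties(QUEUES).items():
        print(key, "=", value)
```

Elasticity would be added the same way with a per-queue `maximum-capacity` property; it is left out here to keep the sketch short.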
MapReduce2
YARN-321: Generic application history service
Future
• Support more YARN applications
• Improve per application-type information
• Improve Capacity Scheduler configuration
• Better health checks

Ambari Meetup: YARN


Editor's Notes

  1. References: http://hortonworks.com/blog/apache-hadoop-yarn-background-and-an-overview/
  2. http://hortonworks.com/hadoop/yarn/ http://hortonworks.com/blog/running-existing-applications-on-hadoop-2-yarn/
  3. - MapReduce2 becomes bound to YARN. - MR2 is currently the only application exposed in Ambari
  4. Properties grouped by type. Properties end up in either core-site.xml, yarn-site.xml or capacity-scheduler.xml under /etc/hadoop/conf/
  5. Maximum-am-resource-percent. % of queue capacity allocated to AppMasters. Determines number of concurrent applications.