An Introduction to MapReduce 2 and YARN

Tom White, Cloudera
@tom_e_white
June 4, 2012
Chicago HUG
Tuesday, June 5, 2012
Road Trip




About me
• Apache Hadoop Committer, PMC Member, Apache Member
• Engineer at Cloudera working on core Hadoop
• Founder of Apache Whirr
• Author of “Hadoop: The Definitive Guide”
  • http://hadoopbook.com

First, what's MapReduce 1?


What's wrong with MR1?


Motivation 1


• Scaling >4000 nodes
• Fewer, larger clusters



Motivation 2


• HA of Job Tracker
• Large, complex state



Motivation 3

• Poor resource utilization
• Slots in MR1 are for either map or reduce


Yet Another Resource Negotiator




Node Manager is a generalized Task Tracker
• Task Tracker
  • fixed number of map or reduce slots
• Node Manager
  • containers with variable resource limits
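
The difference between typed slots and resource-sized containers can be illustrated with a toy model. This is only a sketch, not how the Job Tracker or the YARN scheduler is implemented: the `Task` class, the two `schedule_*` functions, the slot counts, and the memory figures are all hypothetical, and memory is assumed to be the only resource being tracked.

```python
from dataclasses import dataclass


@dataclass
class Task:
    kind: str        # "map" or "reduce"
    memory_mb: int   # memory the task asks for (hypothetical figure)


def schedule_mr1(tasks, map_slots, reduce_slots):
    """Slot model: every slot is typed, so a map task cannot use a reduce slot."""
    free = {"map": map_slots, "reduce": reduce_slots}
    running = []
    for task in tasks:
        if free[task.kind] > 0:
            free[task.kind] -= 1
            running.append(task)
    return running


def schedule_yarn(tasks, node_memory_mb):
    """Container model: a task of any kind fits while the node has memory left."""
    free_mb = node_memory_mb
    running = []
    for task in tasks:
        if task.memory_mb <= free_mb:
            free_mb -= task.memory_mb
            running.append(task)
    return running


# Hypothetical map-heavy phase: six map tasks waiting, no reduce tasks yet.
pending = [Task("map", 1024) for _ in range(6)]

mr1_running = schedule_mr1(pending, map_slots=4, reduce_slots=2)
yarn_running = schedule_yarn(pending, node_memory_mb=8192)

print(len(mr1_running), "tasks run under the slot model")
print(len(yarn_running), "tasks run under the container model")
```

In this toy setup the two reduce slots sit idle while map tasks queue, which is the utilization problem named under Motivation 3; the container model admits tasks until the node's memory is used up, regardless of task type.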

MR is user space
YARN is kernel


Bonus Apps

• Distributed shell
• MPI (MAPREDUCE-2911)
• Master-worker (MAPREDUCE-3315)
• Apache Giraph, Hama

Old API ≠ MR1
New API ≠ MR2


Runtime   Old API (o.a.h.mapred)   New API (o.a.h.mapreduce)
MR1       ✓                        ✓
MR2       ✓                        ✓

Try out MR2

• Apache Hadoop 2.0.0-alpha
  • hadoop.apache.org
• CDH4 and Cloudera Manager
  • cloudera.com
• Cloud - Apache Whirr

MR1
   <dependency>
       <groupId>org.apache.hadoop</groupId>
       <artifactId>hadoop-client</artifactId>
       <version>1.0.3</version>
   </dependency>


MR2
   <dependency>
       <groupId>org.apache.hadoop</groupId>
       <artifactId>hadoop-client</artifactId>
       <version>2.0.0-alpha</version>
   </dependency>

TODO

• Still alpha status
• Performance tuning
• Usability bug fixes
• RM recovery
• Security in MR2 not complete

Questions?



