MapReduce Online Tyson Condie UC Berkeley Joint work with Neil Conway, Peter Alvaro, and Joseph M. Hellerstein (UC Berkeley), Khaled Elmeleegy and Russell Sears (Yahoo! Research)
MapReduce Programming Model Think data-centric Apply a two step transformation to data sets Map step: Map(k1, v1) -> list(k2, v2) Apply map function to input records Divide output records into groups Reduce step: Reduce(k2, list(v2)) -> list(v3) Consolidate groups from the map step Apply reduce function to each group
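The two-step transformation above can be sketched in plain Python. This is a hypothetical, single-process wordcount for illustration only, not Hadoop code; the function names (`map_fn`, `reduce_fn`, `run_job`) are invented:

```python
from itertools import groupby
from operator import itemgetter

def map_fn(k1, v1):
    """Map(k1, v1) -> list(k2, v2): emit (word, 1) for each word."""
    return [(word, 1) for word in v1.split()]

def reduce_fn(k2, values):
    """Reduce(k2, list(v2)) -> list(v3): consolidate one group."""
    return [sum(values)]

def run_job(records):
    # Map step: apply the map function to every input record.
    pairs = []
    for k1, v1 in records:
        pairs.extend(map_fn(k1, v1))
    # Divide output records into groups by intermediate key k2.
    pairs.sort(key=itemgetter(0))
    # Reduce step: apply the reduce function to each group.
    result = {}
    for k2, group in groupby(pairs, key=itemgetter(0)):
        result[k2] = reduce_fn(k2, [v for _, v in group])
    return result

print(run_job([(0, "a b a"), (1, "b c")]))
# {'a': [2], 'b': [2], 'c': [1]}
```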
MapReduce System Model Shared-nothing architecture Tuned for massive data parallelism Many maps operate on portions of the input Many reduces, each assigned specific groups Runtimes range in minutes to hours Execute on 10s to 1000s of machines Failures common (fault tolerance crucial) Fault tolerance via operator restart since … Operators complete before producing any output Atomic data exchange between operators Simple, elegant, easy to remember
Life Beyond Batch MapReduce often used for analytics on streams of data that arrive continuously Click streams, network traffic, web crawl data, … Batch approach: buffer, load, process High latency Not scalable Online approach: run MR jobs continuously Analyze data as it arrives
Online Query Processing Two domains of interest (at massive scale): Online aggregation Interactive data analysis (watch answer evolve) Stream processing Continuous (real-time) analysis of data streams Blocking operators are a poor fit Final answers only No infinite streams Operators need to pipeline BUT we must retain fault tolerance AND Keep It Simple Stupid!
A Brave New MapReduce World Pipelined MapReduce Maps can operate on infinite data (Stream processing) Reduces can export early answers (Online aggregation) Hadoop Online Prototype (HOP) Hadoop with pipelining support Preserves Hadoop interfaces and APIs Pipelining fault tolerance model
Outline Hadoop MR Background Hadoop Online Prototype (HOP) Online Aggregation Stream Processing Performance (blocking vs. pipelining) Future Work
Hadoop Architecture Hadoop MapReduce Single master node (JobTracker), many worker nodes (TaskTrackers) Client submits a job to the JobTracker JobTracker splits each job into tasks (map/reduce) Assigns tasks to TaskTrackers on demand Hadoop Distributed File System (HDFS) Single name node, many data nodes Data is stored as fixed-size (e.g., 64MB) blocks HDFS typically holds map input and reduce output
Hadoop Job Execution Submit job map reduce schedule map reduce
Hadoop Job Execution Read  Input File map reduce HDFS Block 1 Block 2 map reduce
Hadoop Job Execution Finished Map Locations Local FS map reduce Local FS map reduce Map output: sorted by group id and key group id = hash(key) mod # reducers
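The map-side output rule above (group id = hash(key) mod # reducers, output sorted within each partition) can be sketched as follows. `stable_hash` is an invented stand-in for Hadoop's key hash — Python's built-in `hash` for strings is salted per process, so it would make the sketch non-repeatable:

```python
from operator import itemgetter

def stable_hash(key):
    # Deterministic stand-in for a real key hash (illustration only).
    return sum(key.encode())

def partition_and_sort(map_output, num_reducers):
    """Assign each record a group id = hash(key) mod # reducers,
    then sort each partition by key, as the map side does."""
    partitions = {r: [] for r in range(num_reducers)}
    for key, value in map_output:
        gid = stable_hash(key) % num_reducers
        partitions[gid].append((key, value))
    for run in partitions.values():
        run.sort(key=itemgetter(0))  # stable sort keeps value order per key
    return partitions

print(partition_and_sort([("b", 1), ("a", 1), ("a", 2)], 2))
# {0: [('b', 1)], 1: [('a', 1), ('a', 2)]}
```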
Hadoop Job Execution Local FS map reduce HTTP GET Local FS map reduce
Hadoop Job Execution reduce Write Final Answer HDFS reduce Input: sorted runs of records assigned the same group id Process: merge-sort runs, for each final group call reduce
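The reduce-side process above — merge-sort the sorted runs, then call reduce once per final group — can be sketched with `heapq.merge`; `reduce_side` is a hypothetical name:

```python
import heapq
from itertools import groupby
from operator import itemgetter

def reduce_side(sorted_runs, reduce_fn):
    """Merge-sort the sorted runs received from map tasks, then
    invoke the reduce function once for each final key group."""
    merged = heapq.merge(*sorted_runs, key=itemgetter(0))
    out = []
    for key, group in groupby(merged, key=itemgetter(0)):
        out.append((key, reduce_fn(key, [v for _, v in group])))
    return out

runs = [[("a", 1), ("c", 1)], [("a", 2), ("b", 1)]]
print(reduce_side(runs, lambda k, vs: sum(vs)))
# [('a', 3), ('b', 1), ('c', 1)]
```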
Hadoop Online Prototype (HOP) Pipelining between operators Data pushed from producers to consumers Producers schedule data transfer concurrently with operator computation Consumers notify producers early (ASAP) HOP API Pig, Hive, Jaql still work
JobTracker accepts a series of jobs
Pipelining Data Unit Initial design: pipeline eagerly (each record) Prevents map-side preaggregation (a.k.a. the combiner) Moves all the sorting work to the reduce step Map computation can block on network I/O Revised design: pipeline small sorted runs (spills) Task thread: apply (map/reduce) function, buffer output Spill thread: sort & combine buffer, spill to a file TaskTracker: sends spill files to consumers Simple adaptive algorithm Halt pipeline when 1. spill files back up OR 2. the combiner is effective Resume pipeline by first merging & combining accumulated spill files into a single file
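A minimal sketch of the revised design's two ingredients: a combiner pass over one sorted spill, and the halt test that gates the pipeline. The thresholds `max_pending` and `effective_ratio` are invented for illustration; the slide does not specify the actual policy constants:

```python
from itertools import groupby
from operator import itemgetter

def combine(sorted_run):
    """Map-side pre-aggregation (combiner): collapse each key group
    in one sorted spill to a single partial sum."""
    return [(k, sum(v for _, v in g))
            for k, g in groupby(sorted_run, key=itemgetter(0))]

def should_halt(pending_spills, combine_ratio,
                max_pending=3, effective_ratio=0.5):
    """Adaptive policy sketch: halt pipelining when spill files back
    up, or when the combiner is effective (combined size / input
    size is small), so spills can be merged & combined first."""
    return pending_spills > max_pending or combine_ratio < effective_ratio
```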
Pipelined Fault Tolerance (PFT) Simple PFT design: Reduce treats in-progress map output as tentative If map dies then throw away output If map succeeds then accept output Revised PFT design: Spill files have deterministic boundaries and are assigned a sequence number Correctness: Reduce tasks ensure spill files are idempotent Optimization: Map tasks avoid sending redundant spill files
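The reduce-side correctness rule above can be sketched as duplicate suppression keyed on (map task, spill sequence number): a spill re-sent after a map restart is applied at most once. The class and method names are hypothetical, not HOP's actual interfaces:

```python
class SpillReceiver:
    """Reduce-side sketch: accept each (map task, sequence number)
    spill at most once, making re-sent spills idempotent."""

    def __init__(self):
        self.seen = set()      # (map_id, seq_no) pairs already applied
        self.accepted = []     # spill payloads accepted so far

    def receive(self, map_id, seq_no, spill):
        if (map_id, seq_no) in self.seen:
            return False       # duplicate after a retry; ignore it
        self.seen.add((map_id, seq_no))
        self.accepted.append(spill)
        return True
```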
Benefits of Pipelining Online aggregation An early view of the result from a running computation Interactive data analysis (you say when to stop) Stream processing Tasks operate on infinite data streams Real-time data analysis Performance? Pipelining can… Improve CPU and I/O overlap Steady network traffic (fewer load spikes) Improve cluster utilization (reducers presort earlier)
Outline Hadoop MR Background Hadoop Online Prototype (HOP) Online Aggregation Implementation Example Approximation Query Stream Processing Performance (blocking vs. pipelining) Future Work
Implementation Read Input File map reduce Block 1 HDFS HDFS Block 2 map reduce Write Snapshot Answer
Intermediate results published to HDFS
Bar graph shows results for a single hour (1600) Taken less than 2 minutes into a ~2 hour job!
Approximation error: |estimate – actual| / actual Job progress assumes hours are uniformly sampled Sample fraction is closer to the sample distribution of each hour
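The error metric above, as a one-line function:

```python
def approximation_error(estimate, actual):
    """Relative approximation error: |estimate - actual| / actual."""
    return abs(estimate - actual) / actual

print(approximation_error(110, 100))  # 0.1
```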
Outline Hadoop MR Background Hadoop Online Prototype (HOP) Online Aggregation Stream Processing Implementation Use case: real-time monitoring system Performance (blocking vs. pipelining) Future Work
Implementation Map and reduce tasks run continuously Challenge: what if the number of tasks exceeds slot capacity? Current approach: wait for required slot capacity Map tasks stream spill files Input taken from arbitrary source (MR job, socket, etc.) Garbage collection handled by system Window management done at reducer Reduce function arguments Input data: the set of current input records OutputCollector: output records for the current window InputCollector: records for subsequent windows Return value says when to call next e.g., in X milliseconds, after receiving Y records, etc.
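Reducer-side window management can be sketched as a count-based window — one assumed variant of the "after receiving Y records" policy above. The class is hypothetical, not HOP's actual API:

```python
class WindowedReducer:
    """Sketch: buffer incoming records and invoke the reduce
    function once every `window_size` records."""

    def __init__(self, reduce_fn, window_size):
        self.reduce_fn = reduce_fn
        self.window_size = window_size
        self.buffer = []       # current window's input records
        self.outputs = []      # OutputCollector role: one result per window

    def feed(self, record):
        self.buffer.append(record)
        if len(self.buffer) >= self.window_size:
            self.outputs.append(self.reduce_fn(self.buffer))
            self.buffer = []   # subsequent records start the next window

w = WindowedReducer(sum, 3)
for x in [1, 2, 3, 4, 5, 6, 7]:
    w.feed(x)
print(w.outputs)  # [6, 15]  (7 is still buffered for the next window)
```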
Real-time Monitoring System Use MapReduce to monitor MapReduce Continuous jobs that monitor cluster health Same scheduler for user jobs and monitoring jobs Economy of Mechanism Agents monitor machines  Record statistics of interest (/proc, log files, etc.) Implemented as a continuous map task Aggregators group agent-local statistics High level (rack, datacenter) analysis and correlations Reduce windows: 1, 5, and 15 second load averages
Monitor /proc/vmstat for swapping Alert triggered after some threshold Alert reported around a second after passing threshold Faster than the (~5 second) TaskTracker reporting interval
Performance Open problem! A lot of performance related work still remains Focus on obvious cases first Why block? Effective combiner Reduce step is a bottleneck Why pipeline? Improve cluster utilization Smooth out network traffic
Blocking vs. Pipelining Final map finishes, sorts, and sends to reduce 3rd map finishes, sorts, and sends to reduce 2 maps sort and send output to reducer Simple wordcount on two (small) EC2 nodes Map machine: 2 map slots Reduce machine: 2 reduce slots Input: 2GB data, 512MB block size So job contains 4 maps and (a hard-coded) 2 reduces
Blocking vs. Pipelining Job completion when reduce finishes Reduce task performing final merge-sort No significant idle periods during the shuffle phase 4th map output received 3rd map output received 2nd map output received 1st map output received Reduce task 6.5 minute idle period ~ 15 minutes ~ 9 minutes Simple wordcount on two (small) EC2 nodes Map machine: 2 map slots Reduce machine: 2 reduce slots Input 2GB data, 512MB block size So job contains 4 maps and (a hard-coded) 2 reduces
Recall in blocking… Operators block Poor CPU and I/O overlap Reduce task idle periods Only the final answer is fetched So more data is fetched resulting in… Network traffic spikes Especially when a group of maps finish
CPU Utilization Map tasks loading 2GB of data Mapper CPU Pipelining reduce tasks start working (presorting) early Reduce task 6.5 minute idle period Reducer CPU Amazon CloudWatch
Recall in blocking… Operators block Poor CPU and I/O overlap Reduce task idle periods Only the final answer is fetched So more data is fetched at once resulting in… Network traffic spikes Especially when a group of maps finish
Network Traffic Spikes (map machine network out) First 2 maps finish and send output Last map finishes and sends output 3rd map finishes and sends output Amazon CloudWatch
A more realistic setup Slot capacity: 80 maps (4 per node) 60 reduces (3 per node) Evaluate using wordcount job Input: 100GB randomly generated words Block size: 512MB (240 maps) and 32MB (3120 maps) Hard-coded 60 reduce tasks
Adaptive vs. Fixed Block Size  Job completion time ~ 49 minutes ~ 36 minutes 512MB block size, 240 maps Maps scheduled in 3 waves (1st 80, 2nd 80, last 80) Large block size creates reduce idle periods Poor CPU and I/O overlap Pipelining minimizes idle periods
Adaptive vs. Fixed Block Size  Job completion time ~ 42 minutes ~ 35 minutes 32MB block size, 3120 maps Maps scheduled in 39 waves, maps finish fast!  Small block size improves CPU and I/O overlap BUT increases scheduling overhead so not a scalable solution Adaptive policy finds the right degree of pipelined parallelism Based on runtime dynamics (load, network capacity, etc.)
Future Work Blocking vs. Pipelining Thorough performance analysis at scale Hadoop optimizer Online Aggregation Statistically-robust estimation Random sampling of the input Better UI for approximate results Stream Processing Develop full-fledged stream processing framework Stream support for high-level query languages
Thank you! Questions? MapReduce Online paper: In NSDI 2010 Source code: http://code.google.com/p/hop/ Contact: tcondie@cs.berkeley.edu
  • 1. MapReduce Online Tyson Condie UC Berkeley Joint work with Neil Conway, Peter Alvaro, and Joseph M. Hellerstein (UC Berkeley) KhaledElmeleegy and Russell Sears (Yahoo! Research)
  • 2. MapReduce Programming Model Think data-centric Apply a two step transformation to data sets Map step: Map(k1, v1) -> list(k2, v2) Apply map function to input records Divide output records into groups Reduce step: Reduce(k2, list(v2)) -> list(v3) Consolidate groups from the map step Apply reduce function to each group
  • 3.
  • 4. Life Beyond Batch MapReduce often used for analytics on streams of data that arrive continuously Click streams, network traffic, web crawl data, … Batch approach: buffer, load, process High latency Not scalable Online approach: run MR jobs continuously Analyze data as it arrives
  • 5. Online Query Processing Two domains of interest (at massive scale): Online aggregation Interactive data analysis (watch answer evolve) Stream processing Continuous (real-time) analysis of data streams Blocking operators are a poor fit Final answers only No infinite streams Operators need to pipeline BUT we must retain fault tolerance AND Keep It Simple Stupid!
  • 6. A Brave New MapReduce World Pipelined MapReduce Maps can operate on infinite data (Stream processing) Reduces can export early answers (Online aggregation) Hadoop Online Prototype (HOP) Hadoop with pipelining support Preserves Hadoop interfaces and APIs Pipelining fault tolerance model
  • 7. Outline Hadoop MR Background Hadoop Online Prototype (HOP) Online Aggregation Stream Processing Performance (blocking vs. pipelining) Future Work
  • 8. Hadoop Architecture HadoopMapReduce Single master node (JobTracker), many worker nodes (TaskTrackers) Client submits a job to the JobTracker JobTracker splits each job into tasks (map/reduce) Assigns tasks to TaskTrackers on demand Hadoop Distributed File System (HDFS) Single name node, many data nodes Data is stored as fixed-size (e.g., 64MB) blocks HDFS typically holds map input and reduce output
  • 9. Hadoop Job Execution Submit job map reduce schedule map reduce
  • 10. Hadoop Job Execution Read Input File map reduce HDFS Block 1 Block 2 map reduce
  • 11. Hadoop Job Execution Finished Map Locations Local FS map reduce Local FS map reduce Map output: sorted by group id and key group id = hash(key) mod # reducers
  • 12. Hadoop Job Execution Local FS map reduce HTTP GET Local FS map reduce
  • 13. Hadoop Job Execution reduce Write Final Answer HDFS reduce Input: sorted runs of records assigned the same group id Process: merge-sort runs, for each final group call reduce
  • 14.
  • 15.
  • 16. Pipelining Data Unit Initial design: pipeline eagerly (each record) Prevents map side preaggregation (a.k.a., combiner) Moves all the sorting work to the reduce step Map computation can block on network I/O Revised design: pipeline small sorted runs (spills) Task thread: apply (map/reduce) function, buffer output Spill thread: sort & combine buffer, spill to a file TaskTracker: sends spill files to consumers Simple adaptive algorithm Halt pipeline when 1. spill files backup OR 2. effective combiner Resume pipeline by first merging & combining accumulated spill files into a single file
  • 17. Pipelined Fault Tolerance (PFT) Simple PFT design: Reduce treats in-progress map output as tentative If map dies then throw away output If map succeeds then accept output Revised PFT design: Spill files have deterministic boundaries and are assigned a sequence number Correctness: Reduce tasks ensure spill files are idempotent Optimization: Map tasks avoid sending redundant spill files
  • 18. Benefits of Pipelining Online aggregation An early view of the result from a running computation Interactive data analysis (you say when to stop) Stream processing Tasks operate on infinite data streams Real-time data analysis Performance? Pipelining can… Improve CPU and I/O overlap Steady network traffic (fewer load spikes) Improve cluster utilization (reducers presort earlier)
  • 19. Outline Hadoop MR Background Hadoop Online Prototype (HOP) Online Aggregation Implementation Example Approximation Query Stream Processing Performance (blocking vs. pipelining) Future Work
  • 20.
  • 21.
  • 22. Bar graph shows results for a single hour (1600) Taken less than 2 minutes into a ~2 hour job!
  • 23. Approximation error: |estimate – actual| / actual Job progress assumes hours are uniformly sampled Sample fraction closer to the sample distribution of each hour
  • 24. Outline Hadoop MR Background Hadoop Online Prototype (HOP) Online Aggregation Stream Processing Implementation Use case: real-time monitoring system Performance (blocking vs. pipelining) Future Work
  • 25. Implementation Map and reduce tasks run continuously Challenge: what if number of tasks exceed slot capacity? Current approach: wait for required slot capacity Map tasks stream spill files Input taken from arbitrary source (MR job, socket, etc.) Garbage collection handled by system Window management done at reducer Reduce function arguments Input data: the set of current input records OutputCollector: output records for the current window InputCollector: records for subsequent windows Return value says when to call next e.g., in X milliseconds, after receiving Y records, etc.
  • 26. Real-time Monitoring System Use MapReduce to monitor MapReduce Continuous jobs that monitor cluster health Same scheduler for user jobs and monitoring jobs Economy of Mechanism Agents monitor machines Record statistics of interest (/proc, log files, etc.) Implemented as a continuous map task Aggregators group agent-local statistics High level (rack, datacenter) analysis and correlations Reduce windows: 1, 5, and 15 second load averages
  • 27.
  • 28. Performance Open problem! A lot of performance related work still remains Focus on obvious cases first Why block? Effective combiner Reduce step is a bottleneck Why pipeline? Improve cluster utilization Smooth out network traffic
  • 29. Blocking vs. Pipelining Final map finishes, sorts and sends to reduce 3rd map finishes, sorts, and sends to reduce 2 maps sort and send output to reducer Simple wordcount on two (small) EC2 nodes Map machine: 2 map slots Reduce machine: 2 reduce slots Input 2GB data, 512MB block size So job contains 4 maps and (a hard-coded) 2 reduces
  • 30. Blocking vs. Pipelining Job completion when reduce finishes Reduce task performing final merge-sort No significant idle periods during the shuffle phase 4th map output received 3rd map output received 2nd map output received 1st map output received Reduce task 6.5 minute idle period ~ 15 minutes ~ 9 minutes Simple wordcount on two (small) EC2 nodes Map machine: 2 map slots Reduce machine: 2 reduce slots Input 2GB data, 512MB block size So job contains 4 maps and (a hard-coded) 2 reduces
  • 31. Recall in blocking… Operators block Poor CPU and I/O overlap Reduce task idle periods Only the final answer is fetched So more data is fetched resulting in… Network traffic spikes Especially when a group of maps finish
  • 32. CPU Utilization Map tasks loading 2GB of data Mapper CPU Pipelining reduce tasks start working (presorting) early Reduce task 6.5 minute idle period Reducer CPU Amazon Cloudwatch
  • 33. Recall in blocking… Operators block Poor CPU and I/O overlap Reduce task idle periods Only the final answer is fetched So more data is fetched at once resulting in… Network traffic spikes Especially when a group of maps finish
  • 34. Network Traffic Spikes(map machine network out) First 2 maps finish and send output Last map finishes and sends output 3rd map finishes and sends output Amazon Cloudwatch
  • 35.
  • 36. Adaptive vs. Fixed Block Size Job completion time ~ 49 minutes ~ 36 minutes 512MB block size, 240 maps Maps scheduled in 3 waves (1st 80, 2nd 80, last 80) Large block size creates reduce idle periods Poor CPU and I/O overlap Pipelining minimizes idle periods
  • 37. Adaptive vs. Fixed Block Size Job completion time ~ 42 minutes ~ 35 minutes 32MB block size, 3120 maps Maps scheduled in 39 waves, maps finish fast! Small block size improves CPU and I/O overlap BUT increases scheduling overhead so not a scalable solution Adaptive policy finds the right degree of pipelined parallelism Based on runtime dynamics (load, network capacity, etc.)
  • 38. Future Work Blocking vs. Pipelining Thorough performance analysis at scale Hadoop optimizer Online Aggregation Statically-robust estimation Random sampling of the input Better UI for approximate results Stream Processing Develop full-fledged stream processing framework Stream support for high-level query languages
  • 39. Thank you! Questions? MapReduce Online paper: In NSDI 2010 Source code: http://code.google.com/p/hop/ Contact: tcondie@cs.berkeley.edu

Speaker Notes

  1. TARGET: batch-oriented computations
  2. DEFINE COMBINER!!!
  3. Vertical line indicates alert
  4. Data transfer scheduled by reducer. Reduce partitions serviced in arbitrary order. Poor spatial locality!
  5. Data transfer scheduled by reducer. Reduce partitions serviced in arbitrary order. Poor spatial locality!
  6. Operators block. Poor CPU and I/O overlap. Reduce task idle periods
  7. Cluster capacity: 80 maps, 60 reduces
  8. First plateau due to 80 maps finishing at the same time. Subsequent map waves get spread out by the scheduler duty cycle.