Wei’s Notes on Map-Reduce Job Scheduling Feb 2011
[Map-Reduce] Workflow
1. Master splits a job into small chunks (SIMD model).
2. Master assigns chunks to slaves with available mapper slots, taking data locality into account.
3. Mapper collects the required data and passes it through the user-defined mapper function.
4. Mapper writes intermediate results to local disk and reports the location of the results to the Master.
5. Master records the status, picks slaves with available reducer slots, and pushes the location info to them for the reduce phase (*locality? Yes!).
6. Reducer copies data from the mappers via RPC, waits for all mappers to finish, sorts by intermediate key, then passes the data through the user-defined reducer function.
7. Reducer writes the final output to the DFS and reports to the Master.
[Map-Reduce] Data flow
Raw
Map(k1, v1) -> list(k2, v2)
Reduce(k2, list(v2)) -> list(v2)
*why not v3?
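The Map and Reduce signatures above can be sketched as a single-process word count. This is only an in-memory illustration: there is no master, no slots, no disk or RPC, and the names run_mapreduce, word_count_mapper and word_count_reducer are made up for this example.

```python
from collections import defaultdict


def run_mapreduce(records, mapper, reducer):
    """Run map, then group by intermediate key, then reduce, all in memory."""
    # Map phase: each (k1, v1) record yields a list of (k2, v2) pairs.
    intermediate = defaultdict(list)
    for k1, v1 in records:
        for k2, v2 in mapper(k1, v1):
            intermediate[k2].append(v2)

    # Reduce phase: keys are visited in sorted order, mirroring the
    # "sort by intermediate key" step; each call sees (k2, list(v2)).
    output = []
    for k2 in sorted(intermediate):
        for result in reducer(k2, intermediate[k2]):
            output.append((k2, result))
    return output


def word_count_mapper(doc_id, text):
    # (k1, v1) = (document id, document text) -> list of (word, 1)
    for word in text.split():
        yield word.lower(), 1


def word_count_reducer(word, counts):
    # (k2, list(v2)) -> list(v2): the output value has the same type as v2
    yield sum(counts)


docs = [("d1", "map reduce map"), ("d2", "reduce job")]
counts = run_mapreduce(docs, word_count_mapper, word_count_reducer)
print(counts)
```

In this example the reducer's output is an integer count, the same type as the intermediate value, which is the situation the Reduce signature describes.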
[Map-Reduce] Fault Tolerance Upon machine failure:
[Map-Reduce] To-Dos
- Splitting:
  - When: upon arrival or upon head-of-queue?
  - How is size M determined? (based on chunk size) “can be processed in parallel by different machines”
- Cost of re-execution: map & reduce
[Fair Scheduler] 3-phase allocation
1. Satisfy the pools whose min share >= demand.
2. Allocate resources to the other pools up to their min share.
3. Give the residual to the unfilled pools, starting with the least fulfilled.
Notes
- Resource allocation is pool based instead of job based.
- Pool: min share is user specified.
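A rough sketch of the three phases, under assumptions that are mine and not taken from the Hadoop source: slots are handed out one at a time in phase 3, "least fulfilled" is read as "smallest current allocation" (ties broken by pool name), and the min shares together fit within the cluster. The function name allocate and the pool names are hypothetical.

```python
def allocate(total_slots, pools):
    """pools maps pool name -> (min_share, demand); returns name -> slots."""
    alloc = {}

    # Phase 1: pools whose demand fits inside their min share get all they ask for.
    for name, (min_share, demand) in pools.items():
        if demand <= min_share:
            alloc[name] = demand

    # Phase 2: every other pool is brought up to its min share.
    for name, (min_share, demand) in pools.items():
        if name not in alloc:
            alloc[name] = min_share

    remaining = total_slots - sum(alloc.values())
    if remaining < 0:
        # Assumption: min shares are configured to fit in the cluster.
        raise ValueError("min shares exceed cluster capacity")

    # Phase 3: hand the residual, slot by slot, to the unfilled pool that
    # currently holds the fewest slots (assumed meaning of "least fulfilled").
    while remaining > 0:
        unfilled = [n for n in pools if alloc[n] < pools[n][1]]
        if not unfilled:
            break
        neediest = min(unfilled, key=lambda n: (alloc[n], n))
        alloc[neediest] += 1
        remaining -= 1
    return alloc


example_pools = {"A": (2, 1), "B": (3, 8), "C": (0, 4)}
result = allocate(10, example_pools)
print(result)
```

Pool A asks for less than its min share and is satisfied in phase 1; B and C split the residual, with C catching up first because it starts from zero.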
[Fair Scheduler] Reschedule
Policy: wait & kill
Algorithm:
1. Wait Tmin. If min share is not achieved, kill others.
2. Wait Tfair. If fair share is not achieved, kill more.
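The wait & kill policy can be sketched as a function that decides how many slots to reclaim for a starved pool. The PoolState fields, the function name slots_to_reclaim and the timeout values are all assumptions for illustration; choosing which tasks of other pools to kill is left out.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PoolState:
    running: int                       # slots the pool currently holds
    min_share: int
    fair_share: int
    below_min_since: Optional[float]   # time the pool dropped below min share
    below_fair_since: Optional[float]  # time the pool dropped below fair share


def slots_to_reclaim(pool, now, t_min, t_fair):
    """Number of slots to free by killing tasks of other pools."""
    deficit = 0
    # Waited Tmin and still below min share: reclaim up to the min share.
    if pool.below_min_since is not None and now - pool.below_min_since >= t_min:
        deficit = max(deficit, pool.min_share - pool.running)
    # Waited Tfair and still below fair share: reclaim up to the fair share.
    if pool.below_fair_since is not None and now - pool.below_fair_since >= t_fair:
        deficit = max(deficit, pool.fair_share - pool.running)
    return max(deficit, 0)


starved = PoolState(running=2, min_share=4, fair_share=7,
                    below_min_since=0.0, below_fair_since=0.0)
early = slots_to_reclaim(starved, now=30.0, t_min=60.0, t_fair=300.0)
after_t_min = slots_to_reclaim(starved, now=90.0, t_min=60.0, t_fair=300.0)
after_t_fair = slots_to_reclaim(starved, now=400.0, t_min=60.0, t_fair=300.0)
print(early, after_t_min, after_t_fair)
```

With Tmin shorter than Tfair, the pool first gets back to its min share and only later, if it is still short, up to its fair share.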
[Fair Scheduler] Issues & Solutions
- Data locality
  - Delay scheduling: addresses the sticky-slots issue
  - IO-rate biasing: addresses hotspot nodes
- Map/Reduce interdependency
  - Copy-Compute Splitting: overlaps the IO-intensive copy with the CPU-intensive reduce
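A small sketch of the delay scheduling idea: when a slot frees up, a job with no local task on that node is skipped for a while so that a later job with local data can run, and only after enough skips may it launch a non-local task. The Job class, assign_task and max_skips are hypothetical names, and the jobs list is assumed to be already sorted in fairness order.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Set, Tuple


@dataclass
class Job:
    name: str
    pending: Dict[str, Set[str]]  # task id -> nodes holding its input data
    skip_count: int = 0


def assign_task(free_node: str, jobs: List[Job],
                max_skips: int) -> Optional[Tuple[str, str, bool]]:
    """Return (job name, task id, is_local) for a free slot, or None."""
    for job in jobs:
        if not job.pending:
            continue
        # Prefer a task whose input lives on the node with the free slot.
        local = next((t for t, nodes in job.pending.items()
                      if free_node in nodes), None)
        if local is not None:
            del job.pending[local]
            job.skip_count = 0
            return job.name, local, True
        # No local task: launch non-locally only if the job has waited enough.
        if job.skip_count >= max_skips:
            task = next(iter(job.pending))
            del job.pending[task]
            return job.name, task, False
        job.skip_count += 1  # skip this job and try the next one
    return None


jobs = [Job("A", {"a1": {"n2"}}), Job("B", {"b1": {"n1"}})]
first = assign_task("n1", jobs, max_skips=1)
second = assign_task("n1", jobs, max_skips=1)
third = assign_task("n1", jobs, max_skips=1)
print(first, second, third)
```

Job A is skipped once in favour of B's local task, then allowed to run non-locally on the next free slot.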
[Fair Scheduler] Tradeoffs
- Batch response time: fairness vs. utilization tradeoff (throughput)
- Average response time
- Space usage with intermediate data
- User isolation: “ability to provide worst-case performance comparable to owning a small private cluster regardless of user workload”
[Fair Scheduler] To-Dos <done>
- Reschedule/Reassignment: FairScheduler keeps an UPDATE_INTERVAL; at each interval it checks all pools for tasks to preempt, sets the status of those tasks, and places them in the action queue. The next heartbeat picks up the changes in task status and carries out the kills.
- Relationship between batch response time and throughput: they measure the same thing.
- Relationship between average response time (ART) and user isolation: could be correlated, but not all the time. ART is not a quantitative measurement of user isolation.
[Quincy] Model the problem as a flow network
- Flow network: a directed graph in which each edge e is annotated with a non-negative integer capacity and a cost, and each node v is annotated with an integer “supply”, where the total supply of the graph equals zero.
- To construct: the simplest graph, with the only hard constraint being no starvation.
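The flow network definition can be written down as a small data structure that checks the stated constraints (non-negative capacities, supplies summing to zero) and prices a given flow. This is a toy with made-up node names (t1, t2, c1, U, S), not Quincy's actual graph, and it does not solve for the minimum-cost flow.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Edge:
    src: str
    dst: str
    capacity: int  # non-negative integer capacity
    cost: int      # cost per unit of flow


@dataclass
class FlowNetwork:
    supply: Dict[str, int]  # node -> integer supply (negative means demand)
    edges: List[Edge]

    def validate(self) -> None:
        if sum(self.supply.values()) != 0:
            raise ValueError("total supply must equal zero")
        for e in self.edges:
            if e.capacity < 0:
                raise ValueError("capacity must be non-negative")
            if e.src not in self.supply or e.dst not in self.supply:
                raise ValueError("edge refers to an unknown node")

    def flow_cost(self, flow: Dict[Edge, int]) -> int:
        """Check capacity and conservation for a flow, then return its cost."""
        balance = dict(self.supply)
        total = 0
        for edge, units in flow.items():
            if not 0 <= units <= edge.capacity:
                raise ValueError("flow violates edge capacity")
            balance[edge.src] -= units
            balance[edge.dst] += units
            total += units * edge.cost
        if any(balance.values()):
            raise ValueError("flow does not conserve supply")
        return total


# Two tasks (supply 1 each), one computer, an "unscheduled" node, one sink.
t1_c1 = Edge("t1", "c1", 1, 0)
t2_c1 = Edge("t2", "c1", 1, 0)
t1_u = Edge("t1", "U", 1, 5)
t2_u = Edge("t2", "U", 1, 5)
c1_s = Edge("c1", "S", 1, 0)
u_s = Edge("U", "S", 2, 0)
net = FlowNetwork({"t1": 1, "t2": 1, "c1": 0, "U": 0, "S": -2},
                  [t1_c1, t2_c1, t1_u, t2_u, c1_s, u_s])
net.validate()
cost = net.flow_cost({t1_c1: 1, c1_s: 1, t2_u: 1, u_s: 1})
print(cost)
```

Here one task runs on the single computer and the other is routed through the costly "unscheduled" path, so the flow costs 5.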
Quincy vs. Fair Scheduler
Readings
- MapReduce. Jeffrey Dean*
- Google: Cluster Computing and MR
- Job Scheduling for Multi-User. Matei Zaharia*
- Max-min fairness. Wikipedia + algo*
- Quincy. Michael Isard*
- An update on Google’s infrastructure
Topic
- Before: Existing systems use a predetermined, fixed allocation of resources/slots to queries/tasks. Intuitively, if resources can be dynamically allocated to tasks, they can be better utilized.
- After: Enable the scheduler to make resource-aware decisions (IO, CPU, memory) + bring the fair scheduler from pool level to job level.
Tips from Prof Tan
- Keep references for all the literature reviewed and note where each was published.
