Scheduling
Hadoop as a batch processing system 
•Hadoop was designed mainly for running large batch jobs such as web indexing and log mining. 
•Users submitted jobs to a queue, and the cluster ran them in order. 
•Soon, another use case became attractive: 
–Sharing a MapReduce cluster between multiple users.
The benefits of sharing 
•With all the data in one place, users can run queries that they may never have been able to execute otherwise, and 
•Costs go down because system utilization is higher than it would be with a separate Hadoop cluster for each group. 
•However, sharing requires support from the Hadoop job scheduler to 
–provide guaranteed capacity to production jobs and 
–give good response times to interactive jobs while allocating resources fairly between users.
Approaches to Sharing 
•FIFO: In FIFO scheduling, the JobTracker pulls jobs from a work queue, oldest job first. This scheduler has no concept of the priority or size of a job. 
•Fair: Assign resources to jobs such that, on average over time, each job gets an equal share of the available resources. As a result, short jobs get slots and finish while longer jobs are still running, instead of waiting behind them. This allows some interactivity among Hadoop jobs and makes the Hadoop cluster more responsive to the variety of job types submitted. 
•Capacity: In capacity scheduling, instead of pools, several queues are created, each with a configurable number of map and reduce slots. Each queue is also assigned a guaranteed capacity (the overall capacity of the cluster is the sum of the queues' capacities). Queues are monitored; if a queue is not consuming its allocated capacity, the excess can be temporarily allocated to other queues.
FIFO Scheduling 
Job Queue
Hadoop default scheduler (FIFO) 
–Problem: short jobs get stuck behind long ones 
•Separate clusters 
–Problem 1: poor utilization 
–Problem 2: costly data replication 
•Full replication across clusters nearly infeasible at Facebook/Yahoo! scale 
•Partial replication prevents cross-dataset queries
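The "short jobs get stuck behind long ones" problem can be illustrated with a small sketch. It is a toy model, not Hadoop code: it assumes all jobs arrive at time 0, that a job's duration is the time it would need with the whole cluster to itself, and that fair sharing splits the cluster equally among unfinished jobs. The function names are made up for this example.

```python
def fifo_finish_times(durations):
    """Jobs run one at a time, in submission order, on the whole cluster."""
    finish, clock = [], 0.0
    for d in durations:
        clock += d
        finish.append(clock)
    return finish


def fair_finish_times(durations):
    """All unfinished jobs share the cluster equally at every instant."""
    # Jobs finish in order of increasing duration under equal sharing.
    order = sorted(range(len(durations)), key=lambda i: durations[i])
    finish = [0.0] * len(durations)
    clock, done_work = 0.0, 0.0
    active = len(durations)
    for i in order:
        # Every active job has received done_work units so far and gets
        # only 1/active of the cluster, so the remaining work takes
        # (durations[i] - done_work) * active units of wall-clock time.
        clock += (durations[i] - done_work) * active
        done_work = durations[i]
        finish[i] = clock
        active -= 1
    return finish


# A long job (100 units) submitted just before a short one (1 unit).
print(fifo_finish_times([100, 1]))  # short job finishes at 101
print(fair_finish_times([100, 1]))  # short job finishes at 2
```

In this model the long job finishes at about the same time either way, while the short job no longer waits for it.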
Fair Scheduling 
Job Queue
Fair Scheduler Basics 
•Group jobs into “pools” 
•Assign each pool a guaranteed minimum share 
•Divide excess capacity evenly between pools
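The three rules above can be sketched as a small share calculation. This is a simplified illustration, not the Fair Scheduler's actual code: it ignores how many slots each pool can actually use, and the function name and pool names are invented for the example.

```python
def pool_shares(total_slots, min_shares):
    """Give each pool its minimum share, then split the rest evenly.

    min_shares maps a pool name to its guaranteed minimum number of
    slots (0 for pools without a guarantee).
    """
    if not min_shares:
        return {}
    guaranteed = sum(min_shares.values())
    if guaranteed > total_slots:
        raise ValueError("minimum shares exceed cluster capacity")
    excess = total_slots - guaranteed
    per_pool, remainder = divmod(excess, len(min_shares))
    shares = {}
    for i, (pool, minimum) in enumerate(sorted(min_shares.items())):
        # Slots that do not divide evenly go to the first pools in name order
        # (an arbitrary tie-break chosen for this sketch).
        shares[pool] = minimum + per_pool + (1 if i < remainder else 0)
    return shares


print(pool_shares(100, {"prod": 40, "reports": 30, "adhoc": 0}))
```

With 100 slots and guarantees of 40 and 30, the remaining 30 slots are split three ways, so the pool without a guarantee still receives 10 slots.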
Pools 
•Determined from a configurable job property 
–Default in 0.20: user.name (one pool per user) 
•Pools have properties: 
–Minimum map slots 
–Minimum reduce slots 
–Limit on # of running jobs
Example Pool Allocations 
[Figure: an entire cluster of 100 slots divided among the pools emp1, emp2, finance (min share = 40), and emp6 (min share = 30). Jobs shown: job 1 (30 slots), job 2 (15 slots), job 3 (15 slots), job 4 (40 slots).]
Scheduling Algorithm 
•Split each pool’s min share among its jobs 
•Split each pool’s total share among its jobs 
•When a slot needs to be assigned: 
–If there is any job below its min share, schedule it 
–Else schedule the job that we’ve been most unfair to (based on “deficit”)
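The slot-assignment rule above could be sketched as follows. This is a minimal illustration under stated assumptions, not the scheduler's implementation: the Job fields, the deficit formula (fair share minus running slots, accumulated over time), and the tie-break among jobs below their minimum share are choices made for this example.

```python
from dataclasses import dataclass


@dataclass
class Job:
    name: str
    min_share: float      # this job's slice of its pool's min share
    fair_share: float     # this job's slice of its pool's total share
    running: int = 0      # slots the job currently holds
    deficit: float = 0.0  # how much service the job has been short-changed


def update_deficits(jobs, elapsed):
    """Assumed deficit rule: grow by (fair share - running slots) * time."""
    for job in jobs:
        job.deficit += (job.fair_share - job.running) * elapsed


def pick_job(jobs):
    """Choose which job receives the next free slot."""
    below_min = [j for j in jobs if j.running < j.min_share]
    if below_min:
        # Assumption: among jobs below their min share, take the one
        # that is furthest below it.
        return max(below_min, key=lambda j: j.min_share - j.running)
    # Otherwise the job we have been most unfair to (largest deficit).
    return max(jobs, key=lambda j: j.deficit, default=None)


jobs = [Job("a", min_share=2, fair_share=5, running=3),
        Job("b", min_share=4, fair_share=5, running=1)]
print(pick_job(jobs).name)  # b is below its min share
```

Calling update_deficits on every scheduling tick and pick_job whenever a slot frees up gives the two-level behavior described above: guarantees first, then fairness.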
Scheduler Dashboard
Change priority 
Change pool 
FIFO mode (for testing)
Additional Features 
•Weights for unequal sharing: 
–Job weights based on priority (each level = 2x) 
–Job weights based on size 
–Pool weights 
•Limits for # of running jobs: 
–Per user 
–Per pool
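A short sketch of how weights might translate into unequal shares. The priority names follow Hadoop's JobPriority levels; using NORMAL as the baseline weight of 1.0 and splitting slots in proportion to weight are assumptions made for this illustration, and the function names are invented.

```python
# Lowest to highest priority; each step up doubles the weight.
PRIORITY_LEVELS = ["VERY_LOW", "LOW", "NORMAL", "HIGH", "VERY_HIGH"]


def priority_weight(priority):
    """Weight of a job, assuming NORMAL = 1.0 and each level = 2x."""
    steps = PRIORITY_LEVELS.index(priority) - PRIORITY_LEVELS.index("NORMAL")
    return 2.0 ** steps


def weighted_shares(total_slots, weights):
    """Split slots among jobs (or pools) in proportion to their weights."""
    total_weight = sum(weights.values())
    return {name: total_slots * w / total_weight
            for name, w in weights.items()}


weights = {"nightly": priority_weight("NORMAL"),
           "urgent": priority_weight("HIGH")}
print(weighted_shares(90, weights))
```

Here the HIGH-priority job has twice the weight of the NORMAL one, so it is entitled to twice as many slots.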
Installing the Fair Scheduler 
•Build it: 
–ant package 
•Place it on the classpath: 
–cp build/contrib/fairscheduler/*.jar lib
Configuration Files 
•Hadoop config (conf/mapred-site.xml) 
–Contains scheduler options, pointer to pools file 
•Pools file (pools.xml) 
–Contains min share allocations and limits on pools 
–Reloaded every 15 seconds at runtime
Minimal mapred-site.xml 
<property> 
<name>mapred.jobtracker.taskScheduler</name> 
<value>org.apache.hadoop.mapred.FairScheduler</value> 
</property> 
<property> 
<name>mapred.fairscheduler.allocation.file</name> 
<value>/path/to/pools.xml</value> 
</property>
Minimal pools.xml 
<?xml version="1.0"?> 
<allocations> 
</allocations>
Configuring a Pool 
<?xml version="1.0"?> 
<allocations> 
<pool name="emp4"> 
<minMaps>10</minMaps> 
<minReduces>5</minReduces> 
</pool> 
</allocations>
Setting Running Job Limits 
<?xml version="1.0"?> 
<allocations> 
<pool name="emp4"> 
<minMaps>10</minMaps> 
<minReduces>5</minReduces> 
<maxRunningJobs>3</maxRunningJobs> 
</pool> 
<user name="emp1"> 
<maxRunningJobs>1</maxRunningJobs> 
</user> 
</allocations>
Default Per-User Running Job Limit 
<?xml version="1.0"?> 
<allocations> 
<pool name="emp4"> 
<minMaps>10</minMaps> 
<minReduces>5</minReduces> 
<maxRunningJobs>3</maxRunningJobs> 
</pool> 
<user name="emp1"> 
<maxRunningJobs>1</maxRunningJobs> 
</user> 
<userMaxJobsDefault>10</userMaxJobsDefault> 
</allocations>
Other Parameters 
mapred.fairscheduler.assignmultiple: 
•Assign a map and a reduce on each heartbeat; improves ramp-up speed and throughput; recommendation: set to true
Other Parameters 
mapred.fairscheduler.poolnameproperty: 
•Which JobConf property determines the pool a job is in 
-Default: user.name (one pool per user) 
-You can make up your own, e.g. "pool.name", and set it in the JobConf with conf.set("pool.name", "mypool")
Useful Setting 
<property> 
<name>mapred.fairscheduler.poolnameproperty</name> 
<value>pool.name</value> 
</property> 
<property> 
<name>pool.name</name> 
<value>${user.name}</value> 
</property> 
Make pool.name default to user.name
Issues with Fair Scheduler 
–Fine-grained sharing at level of map & reduce tasks 
–Predictable response times and user isolation 
•Problem: data locality 
–For efficiency, must run tasks near their input data 
–Strictly following any job queuing policy hurts locality: job picked by policy may not have data on free nodes 
•Solution: delay scheduling 
–Relax queuing policy for limited time to achieve locality
The Problem 
[Figure: a master assigns tasks from Job 1 and Job 2, in scheduling order, to six slaves that hold the blocks of File 1 and File 2.] 
Problem: Fair decision hurts locality 
Especially bad for jobs with small input files
Solution: Delay Scheduling 
•Relax queuing policy to make jobs wait for a limited time if they cannot launch local tasks 
•Result: Very short wait time (1-5s) is enough to get nearly 100% locality
Delay Scheduling Example 
[Figure: the same master, Job 1, Job 2, and six slaves holding the blocks of File 1 and File 2; a job is told to wait instead of launching a non-local task.] 
Idea: Wait a short time to get data-local scheduling opportunities
Delay Scheduling Details 
•Scan jobs in order given by queuing policy, picking first that is permitted to launch a task 
•Jobs must wait before being permitted to launch non-local tasks 
–If wait < T1, only allow node-local tasks 
–If T1 < wait < T2, also allow rack-local 
–If wait > T2, also allow off-rack 
•Increase a job’s time waited when it is skipped
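The rules above can be sketched as a single assignment function. This is an illustrative sketch, not Hadoop's code: the PendingJob class, the rack_of mapping, and resetting the wait after a node-local launch are assumptions made for the example, and T1 and T2 are passed in as t1 and t2.

```python
from dataclasses import dataclass

NODE_LOCAL, RACK_LOCAL, OFF_RACK = 0, 1, 2


@dataclass
class PendingJob:
    name: str
    pending: dict        # task id -> set of nodes holding its input data
    waited: float = 0.0  # time this job has spent being skipped


def allowed_level(waited, t1, t2):
    """Worst locality level the job may currently launch at."""
    if waited < t1:
        return NODE_LOCAL
    if waited < t2:
        return RACK_LOCAL
    return OFF_RACK


def locality(task_nodes, node, rack_of):
    """Locality of running a task on node, given where its data lives."""
    if node in task_nodes:
        return NODE_LOCAL
    if rack_of[node] in {rack_of[n] for n in task_nodes}:
        return RACK_LOCAL
    return OFF_RACK


def assign(jobs_in_policy_order, node, rack_of, t1, t2, elapsed):
    """Return (job, task) to launch on the free node, or None."""
    for job in jobs_in_policy_order:
        if not job.pending:
            continue
        level = allowed_level(job.waited, t1, t2)
        best = None
        for task, nodes in job.pending.items():
            loc = locality(nodes, node, rack_of)
            if loc <= level and (best is None or loc < best[1]):
                best = (task, loc)
        if best is not None:
            del job.pending[best[0]]
            if best[1] == NODE_LOCAL:
                job.waited = 0.0  # assumption: reset after a local launch
            return job, best[0]
        job.waited += elapsed     # the job was skipped, so it waits longer
    return None
```

A job at the head of the queue with no data on the free node is skipped in favor of a later job that can run locally, and it only falls back to rack-local or off-rack tasks once its accumulated wait passes T1 or T2.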
Capacity Scheduler 
•Organizes jobs into queues 
•Queue shares are percentages of the cluster 
•FIFO scheduling within each queue 
•Supports preemption 
•Queues are monitored; if a queue is not consuming its allocated capacity, this excess capacity can be temporarily allocated to other queues.
End of session 
Day –1: Scheduling

Scheduling Hadoop for Batch and Interactive Workloads

  • 2. Hadoopas batch processing system •Hadoopwas designed mainly for running large batch jobs such as web indexing and log mining. •Users submitted jobs to a queue, and the cluster ran them in order. •Soon, another use case became attractive: –Sharing a MapReducecluster between multiple users.
  • 3. The benefits of sharing •With all the data in one place, users can run queries that they may never have been able to execute otherwise, and •Costs go down because system utilization is higher than building a separate Hadoopcluster for each group. •However, sharing requires support from the Hadoopjob scheduler to –provide guaranteed capacity to production jobs and –good response time to interactive jobs while allocating resources fairly between users.
  • 4. Approaches to Sharing
      • FIFO: In FIFO scheduling, the JobTracker pulls jobs from a work queue, oldest job first. This scheduler has no concept of the priority or size of a job.
      • Fair: Assign resources to jobs such that, on average over time, each job gets an equal share of the available resources. As a result, short jobs get access to the CPU and finish while longer jobs are still running. This allows some interactivity among Hadoop jobs and makes the cluster more responsive to the variety of job types submitted.
      • Capacity: In capacity scheduling, several queues are created instead of pools, each with a configurable number of map and reduce slots. Each queue is also assigned a guaranteed capacity (the overall capacity of the cluster is the sum of the queues' capacities). Queues are monitored; if a queue is not consuming its allocated capacity, the excess can be temporarily allocated to other queues.
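The difference between FIFO and fair sharing can be illustrated with a toy model. The sketch below is not Hadoop code: it assumes a single slot, jobs that are all submitted at time 0, and an idealized fair policy in which all unfinished jobs split the slot equally. The function names and the job lengths are made up for illustration.

```python
def fifo_finish_times(job_lengths):
    """Jobs run one at a time, in submission order, on a single slot."""
    finish, clock = {}, 0
    for name, length in job_lengths:
        clock += length
        finish[name] = clock
    return finish


def fair_finish_times(job_lengths):
    """All unfinished jobs share a single slot equally (idealized model)."""
    remaining = dict(job_lengths)  # assumes job names are unique
    finish, clock = {}, 0.0
    while remaining:
        active = len(remaining)
        shortest = min(remaining.values())
        # Each active job progresses at rate 1/active, so the shortest
        # remaining job needs shortest * active units of wall-clock time.
        clock += shortest * active
        for name in list(remaining):
            remaining[name] -= shortest
            if remaining[name] == 0:
                finish[name] = clock
                del remaining[name]
    return finish


jobs = [("long", 100), ("short", 10)]
print(fifo_finish_times(jobs))  # {'long': 100, 'short': 110}
print(fair_finish_times(jobs))  # {'short': 20.0, 'long': 110.0}
```

In this model the short job finishes at time 20 instead of 110 under fair sharing, while the long job is delayed only from 100 to 110.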
  • 8. Hadoop default scheduler (FIFO)
          – Problem: short jobs get stuck behind long ones
      • Separate clusters
          – Problem 1: poor utilization
          – Problem 2: costly data replication
              • Full replication across clusters nearly infeasible at Facebook/Yahoo! scale
              • Partial replication prevents cross-dataset queries
  • 11. Fair Scheduler Basics
      • Group jobs into “pools”
      • Assign each pool a guaranteed minimum share
      • Divide excess capacity evenly between pools
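A literal reading of these three bullets can be sketched as follows. This is an illustration only, not the Fair Scheduler's actual allocation code, which may treat idle pools and weights differently; the function name, the pool names and the numbers are hypothetical.

```python
def allocate_pool_shares(total_slots, min_shares):
    """Give every pool its minimum share, then split the rest evenly.

    min_shares maps a pool name to its guaranteed minimum number of slots.
    """
    guaranteed = sum(min_shares.values())
    if guaranteed > total_slots:
        raise ValueError("minimum shares exceed cluster capacity")
    excess = total_slots - guaranteed
    extra_per_pool = excess / len(min_shares)
    return {pool: minimum + extra_per_pool
            for pool, minimum in min_shares.items()}


# Hypothetical 100-slot cluster with three pools.
shares = allocate_pool_shares(100, {"prod": 40, "research": 30, "adhoc": 0})
print(shares)  # {'prod': 50.0, 'research': 40.0, 'adhoc': 10.0}
```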
  • 12. Pools
      • Determined from a configurable job property
          – Default in 0.20: user.name (one pool per user)
      • Pools have properties:
          – Minimum map slots
          – Minimum reduce slots
          – Limit on # of running jobs
  • 13. Example Pool Allocations
      [Figure: an entire cluster of 100 slots divided among the pools emp1, emp2, finance (min share = 40) and emp6 (min share = 30); jobs shown: job 1 (30 slots), job 2 (15 slots), job 3 (15 slots), job 4 (40 slots).]
  • 14. Scheduling Algorithm
      • Split each pool’s min share among its jobs
      • Split each pool’s total share among its jobs
      • When a slot needs to be assigned:
          – If there is any job below its min share, schedule it
          – Else schedule the job that we’ve been most unfair to (based on “deficit”)
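The slot-assignment rule above can be sketched in a few lines. The `Job` class and its fields are hypothetical and only model the quantities the slide names. The slide does not say which job wins when several are below their min share; picking the one furthest below it is an assumption of this sketch.

```python
from dataclasses import dataclass


@dataclass
class Job:
    name: str
    running: int     # tasks currently running
    min_share: int   # slots guaranteed to this job via its pool
    deficit: float   # accumulated gap between fair share and actual usage


def pick_job(jobs):
    """Choose the job that should receive the next free slot."""
    starved = [job for job in jobs if job.running < job.min_share]
    if starved:
        # Assumption: prefer the job furthest below its min share.
        return max(starved, key=lambda job: job.min_share - job.running)
    # Otherwise serve the job the scheduler has been most unfair to.
    return max(jobs, key=lambda job: job.deficit)


jobs = [
    Job("a", running=5, min_share=5, deficit=10.0),
    Job("b", running=1, min_share=4, deficit=0.0),
]
print(pick_job(jobs).name)  # b (below its min share)
```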
  • 16. Scheduler Dashboard
      [Screenshot annotations: Change priority; Change pool; FIFO mode (for testing)]
  • 17. Additional Features
      • Weights for unequal sharing:
          – Job weights based on priority (each level = 2x)
          – Job weights based on size
          – Pool weights
      • Limits for # of running jobs:
          – Per user
          – Per pool
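As a rough illustration of priority-based weights, the sketch below doubles a job's weight for each priority level and divides slots in proportion to weight. The five level names follow Hadoop's job priorities; treating NORMAL as weight 1.0 and the helper names are assumptions of this sketch, and size-based and pool weights are left out.

```python
PRIORITY_LEVELS = ["VERY_LOW", "LOW", "NORMAL", "HIGH", "VERY_HIGH"]


def priority_weight(priority):
    """Each priority level doubles the weight; NORMAL is assumed to be 1.0."""
    steps = PRIORITY_LEVELS.index(priority) - PRIORITY_LEVELS.index("NORMAL")
    return 2.0 ** steps


def weighted_shares(total_slots, priorities):
    """Split slots between jobs in proportion to their priority weights."""
    weights = {job: priority_weight(p) for job, p in priorities.items()}
    total_weight = sum(weights.values())
    return {job: total_slots * w / total_weight for job, w in weights.items()}


print(weighted_shares(30, {"report": "NORMAL", "alerting": "HIGH"}))
# {'report': 10.0, 'alerting': 20.0}
```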
  • 18. Installing the Fair Scheduler
      • Build it:
          – ant package
      • Place it on the classpath:
          – cp build/contrib/fairscheduler/*.jar lib
  • 19. Configuration Files
      • Hadoop config (conf/mapred-site.xml)
          – Contains scheduler options, pointer to pools file
      • Pools file (pools.xml)
          – Contains min share allocations and limits on pools
          – Reloaded every 15 seconds at runtime
  • 20. Minimal hadoop-site.xml
      <property>
        <name>mapred.jobtracker.taskScheduler</name>
        <value>org.apache.hadoop.mapred.FairScheduler</value>
      </property>
      <property>
        <name>mapred.fairscheduler.allocation.file</name>
        <value>/path/to/pools.xml</value>
      </property>
  • 21. Minimal pools.xml
      <?xml version="1.0"?>
      <allocations>
      </allocations>
  • 22. Configuring a Pool
      <?xml version="1.0"?>
      <allocations>
        <pool name="emp4">
          <minMaps>10</minMaps>
          <minReduces>5</minReduces>
        </pool>
      </allocations>
  • 23. Setting Running Job Limits
      <?xml version="1.0"?>
      <allocations>
        <pool name="emp4">
          <minMaps>10</minMaps>
          <minReduces>5</minReduces>
          <maxRunningJobs>3</maxRunningJobs>
        </pool>
        <user name="emp1">
          <maxRunningJobs>1</maxRunningJobs>
        </user>
      </allocations>
  • 24. Default Per-User Running Job Limit
      <?xml version="1.0"?>
      <allocations>
        <pool name="emp4">
          <minMaps>10</minMaps>
          <minReduces>5</minReduces>
          <maxRunningJobs>3</maxRunningJobs>
        </pool>
        <user name="emp1">
          <maxRunningJobs>1</maxRunningJobs>
        </user>
        <userMaxJobsDefault>10</userMaxJobsDefault>
      </allocations>
  • 25. Other Parameters
      mapred.fairscheduler.assignmultiple:
      • Assign a map and a reduce on each heartbeat; improves ramp-up speed and throughput; recommendation: set to true
  • 26. Other Parameters
      mapred.fairscheduler.poolnameproperty:
      • Which JobConf property sets what pool a job is in
          – Default: user.name (one pool per user)
          – Can make up your own, e.g. "pool.name", and pass it in the JobConf with conf.set("pool.name", "mypool")
  • 27. Useful Setting
      <property>
        <name>mapred.fairscheduler.poolnameproperty</name>
        <value>pool.name</value>
      </property>
      <property>
        <name>pool.name</name>
        <value>${user.name}</value>
      </property>
      Make pool.name default to user.name
  • 28. Issues with Fair Scheduler
          – Fine-grained sharing at level of map & reduce tasks
          – Predictable response times and user isolation
      • Problem: data locality
          – For efficiency, must run tasks near their input data
          – Strictly following any job queuing policy hurts locality: job picked by policy may not have data on free nodes
      • Solution: delay scheduling
          – Relax queuing policy for limited time to achieve locality
  • 29. The Problem
      [Figure: a master with a scheduling order of Job 1 and Job 2 assigns tasks to six slaves that store the blocks of File 1 and File 2.]
  • 30. The Problem
      [Figure: the same master and six slaves, with tasks of Job 1 and Job 2 placed on the slaves.]
      Problem: Fair decision hurts locality
      Especially bad for jobs with small input files
  • 31. Solution: Delay Scheduling
      • Relax queuing policy to make jobs wait for a limited time if they cannot launch local tasks
      • Result: Very short wait time (1-5s) is enough to get nearly 100% locality
  • 32. Delay Scheduling Example
      [Figure: the same master and six slaves; a job is told to wait ("Wait!") rather than launch a non-local task.]
      Idea: Wait a short time to get data-local scheduling opportunities
  • 33. Delay Scheduling Details
      • Scan jobs in order given by queuing policy, picking first that is permitted to launch a task
      • Jobs must wait before being permitted to launch non-local tasks
          – If wait < T1, only allow node-local tasks
          – If T1 < wait < T2, also allow rack-local
          – If wait > T2, also allow off-rack
      • Increase a job’s time waited when it is skipped
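The steps on this slide can be sketched as a small function that is called whenever a node has a free slot. Everything here is illustrative: the `Job` class, the `locality_by_node` map and the `skip_cost` parameter (how much waiting time one skip adds) are assumptions, and the slide leaves the exact behaviour at wait = T1 and wait = T2 open. Resetting the wait after a launch is also not modelled.

```python
from dataclasses import dataclass

NODE_LOCAL, RACK_LOCAL, OFF_RACK = 0, 1, 2


@dataclass
class Job:
    name: str
    locality_by_node: dict  # node -> best locality level available there
    waited: float = 0.0


def allowed_level(waited, t1, t2):
    """Worst locality level a job may accept after waiting this long."""
    if waited < t1:
        return NODE_LOCAL
    if waited < t2:  # assumption: wait == T1 already allows rack-local
        return RACK_LOCAL
    return OFF_RACK


def pick_job_for_node(jobs, node, t1, t2, skip_cost):
    """Scan jobs in queuing-policy order; return (job, level) or (None, None)."""
    for job in jobs:
        # Nodes not listed are assumed to offer only off-rack tasks.
        level = job.locality_by_node.get(node, OFF_RACK)
        if level <= allowed_level(job.waited, t1, t2):
            return job, level
        job.waited += skip_cost  # the job was skipped, so its wait grows
    return None, None


j1 = Job("j1", {"n2": NODE_LOCAL})
j2 = Job("j2", {"n1": NODE_LOCAL})
job, level = pick_job_for_node([j1, j2], "n1", t1=2, t2=4, skip_cost=1)
print(job.name, level, j1.waited)  # j2 0 1.0
```

Here j1 is first in the queuing order but has no local data on n1, so it is skipped and j2 launches a node-local task instead. Once j1 has been skipped often enough for its wait to reach T2, it is allowed to run off-rack.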
  • 34. Capacity Scheduler
      • Organizes jobs into queues
      • Queue shares as %’s of cluster
      • FIFO scheduling within each queue
      • Supports preemption
      • Queues are monitored; if a queue is not consuming its allocated capacity, this excess capacity can be temporarily allocated to other queues.
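The lending of unused capacity can be sketched as below. This is a simplified model, not the Capacity Scheduler's implementation: the function name is hypothetical, demand is given as a number of wanted slots per queue, and spare slots are handed out in dictionary order, which is an assumption. Preemption and FIFO ordering within a queue are not modelled.

```python
def allocate_queue_slots(total_slots, capacity_pct, demand):
    """Give each queue up to its guaranteed share, then lend out the rest.

    capacity_pct maps a queue to its share of the cluster in percent;
    demand maps a queue to the number of slots it currently wants.
    """
    guaranteed = {q: total_slots * pct // 100 for q, pct in capacity_pct.items()}
    allocation = {q: min(guaranteed[q], demand.get(q, 0)) for q in guaranteed}
    spare = sum(guaranteed.values()) - sum(allocation.values())
    for q in allocation:
        # Lend unused capacity to queues that still want more slots.
        extra = min(spare, demand.get(q, 0) - allocation[q])
        allocation[q] += extra
        spare -= extra
    return allocation


print(allocate_queue_slots(100, {"prod": 60, "dev": 40}, {"prod": 20, "dev": 90}))
# {'prod': 20, 'dev': 80}
```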
  • 35. End of session, Day 1: Scheduling