CLOUD SCHEDULERS
PALLAV JHA (10-1-5-023)
PRABHAKAR BARUA (10-1-5-017)
PRABODH HEND (10-1-5-053)
JUGAL ASSUDANI (10-1-5-068)
PREM CHANDRA (09-1-5-062)
SCHEDULING IN HADOOP
When a node has an empty task slot, Hadoop chooses a task for it from
one of three categories. First, if any task has failed, it is given highest
priority. This is done to detect when a task fails repeatedly due to a bug
and stop the job. Second, unscheduled tasks are considered. For
maps, tasks with data local to the node are chosen first. Finally, Hadoop
looks for a task to speculate on.
To select speculative tasks, Hadoop monitors task progress using a
progress score, which is a number from 0 to 1. For a map, the score is
the fraction of input data read. For a reduce task, the execution is
divided into three phases, each of which accounts for 1/3 of the score:
1. The copy phase, when the task is copying the outputs of all maps. Here the score is the percent of map outputs that have been copied.
2. The sort phase, when map outputs are sorted by key. Here the score is the percent of data merged.
3. The reduce phase, when a user-defined function is applied to the map outputs. Here the score is the percent of data passed through the reduce function.
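The scoring above can be sketched in a few lines of Python. Function and parameter names here are illustrative, not Hadoop's actual API:

```python
def map_progress(bytes_read, input_bytes):
    """Progress score of a map task: fraction of input data read."""
    return bytes_read / input_bytes

def reduce_progress(phase, phase_fraction):
    """Progress score of a reduce task.

    Each of the three phases (copy, sort, reduce) contributes 1/3 of
    the score; phase_fraction is how far the current phase has gotten.
    """
    phases = ["copy", "sort", "reduce"]
    completed = phases.index(phase)  # phases already finished
    return (completed + phase_fraction) / 3.0
```

For example, a reduce task halfway through its sort phase has score (1 + 0.5) / 3 = 0.5.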
ASSUMPTIONS IN HADOOP’S SCHEDULER
1. Nodes can perform work at roughly the same rate.
2. Tasks progress at a constant rate throughout time.
3. There is no cost to launching a speculative task on a
node that would otherwise have an idle slot.
4. A task's progress score is roughly equal to the fraction of its
total work that it has done. Specifically,
in a reduce task, the copy, sort and reduce phases
each take 1/3 of the total time.
5. Tasks tend to finish in waves, so a task with a low
progress score is likely a slow task.
6. Different tasks of the same category (map or reduce)
require roughly the same amount of work.
RDD
An RDD is a read-only, partitioned collection of records. RDDs can only be created through deterministic operations on either
(1) data in stable storage or
(2) other RDDs.
We call these operations transformations to differentiate them from other operations on RDDs. Examples of transformations include map, filter, and join.
RDDs do not need to be materialized at all times. Instead, an RDD has enough information about how it was derived from other datasets (its lineage) to compute its partitions from data in stable storage. This is a powerful property: in essence, a program cannot reference an RDD that it cannot reconstruct after a failure.
APPLICATION OF RDD: “LOG MINING”
val lines = spark.textFile("hdfs://...")
val errors = lines.filter(_.startsWith("ERROR"))
val messages = errors.map(_.split('\t')(2))
messages.cache()
messages.filter(_.contains("foo")).count
messages.filter(_.contains("bar")).count
RDD FAULT TOLERANCE
RDDs track the series of transformations used to
build them (their lineage) to recompute lost data.
E.g.:
messages = textFile(...).filter(_.contains("error"))
                        .map(_.split('\t')(2))
NAIVE FAIR SHARING ALGORITHM
1. when a heartbeat is received from node n:
2. if n has a free slot then
3. sort jobs in increasing order of number of running tasks
4. for j in jobs do
5. if j has unlaunched task t with data on n then
6. launch t on n
7. else if j has unlaunched task t then
8. launch t on n
9. end if
10. end for
11. end if
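The pseudocode above can be sketched in Python; the `Job` class and its fields are illustrative stand-ins for the job tracker's bookkeeping, not Hadoop's real data structures:

```python
from dataclasses import dataclass, field

@dataclass
class Job:
    running_tasks: int
    pending: list = field(default_factory=list)  # tasks with "locations" sets

def naive_fair_share(jobs, node):
    """When `node` reports a free slot, pick a (job, task) pair for it."""
    # Fair sharing: favour the job with the fewest running tasks.
    for job in sorted(jobs, key=lambda j: j.running_tasks):
        # Prefer a task with data on this node; otherwise take any task.
        local = next((t for t in job.pending if node in t["locations"]), None)
        task = local or (job.pending[0] if job.pending else None)
        if task is not None:
            return job, task  # "launch t on n"
    return None
```

Note how a head-of-line job gets the slot even when none of its data is local, which is exactly the locality problem discussed later.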
DELAY SCHEDULING
1. Initialize j.skipcount to 0 for all jobs j.
2. when a heartbeat is received from node n:
3. if n has a free slot then
4. sort jobs in increasing order of number of running tasks
5. for j in jobs do
6. if j has unlaunched task t with data on n then
7. launch t on n
8. set j.skipcount = 0
9. else if j has unlaunched task t then
10. if j.skipcount >= D then
11. launch t on n
12. else
13. set j.skipcount = j.skipcount + 1
14. end if
15. end if
16. end for
17. end if
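A minimal Python sketch of the same heartbeat handler, with `Job` again an illustrative stand-in; D is the maximum number of scheduling opportunities a job may skip while waiting for locality:

```python
from dataclasses import dataclass, field

@dataclass
class Job:
    running_tasks: int
    pending: list = field(default_factory=list)  # tasks with "locations" sets
    skipcount: int = 0

def delay_schedule(jobs, node, D):
    """One heartbeat from `node` with a free slot."""
    for job in sorted(jobs, key=lambda j: j.running_tasks):
        # Local task available: launch it and reset the skip counter.
        local = next((t for t in job.pending if node in t["locations"]), None)
        if local is not None:
            job.skipcount = 0
            return job, local
        if job.pending:
            # Non-local task: only launch it once the job has skipped D times.
            if job.skipcount >= D:
                return job, job.pending[0]
            job.skipcount += 1
    return None
```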
LATE (LONGEST APPROXIMATE TIME TO END) SCHEDULER
The LATE algorithm works as follows:
• If a task slot becomes available and there are fewer
than SpeculativeCap speculative tasks running:
– Ignore the request if the node's total progress
is below SlowNodeThreshold.
– Rank currently running, non-speculative tasks by estimated time left.
– Launch a copy of the highest-ranked task whose
progress rate is below SlowTaskThreshold.
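A sketch of the task-selection step, assuming the progress rate is estimated as progress/elapsed and time left as (1 − progress)/rate, as in the LATE paper; the task representation and names are illustrative:

```python
def pick_speculative_task(tasks, now, slow_task_threshold):
    """Pick the running, non-speculative task to speculate on.

    tasks: dicts with "progress" in [0, 1] and a "start" timestamp.
    """
    candidates = []
    for t in tasks:
        elapsed = now - t["start"]
        rate = t["progress"] / elapsed           # estimated progress rate
        time_left = (1.0 - t["progress"]) / rate  # estimated time to end
        candidates.append((time_left, rate, t))
    # Rank by estimated time left, longest first.
    candidates.sort(key=lambda c: -c[0])
    for time_left, rate, t in candidates:
        if rate < slow_task_threshold:
            return t  # launch a speculative copy of this task
    return None
```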
LIMITATIONS: FSS
Locality Problems with Fair Sharing
The main aspect of MapReduce that complicates
scheduling is the need to place tasks near their
input data. Locality increases throughput
because network bandwidth in a large cluster is
much lower than the total bandwidth of the
cluster's disks. Running on a node that contains the
data (node locality) is most efficient; when
this is not possible, running on the same rack
(rack locality) is still faster than running off-rack.
But fair share scheduling considers only node locality.
LIMITATIONS: FSS
Head-of-line scheduling:
The first locality problem occurs in small jobs (jobs that
have small input files and hence a small number of
data blocks to read). Whenever a job
reaches the head of the sorted list in the fair share algorithm (i.e.
has the fewest running tasks), one of its tasks is launched on
the next slot that becomes free, no matter which node this slot
is on. If the head-of-line job is small, it is unlikely to have
data on the node that is given to it. For example, a job with
data on 10% of nodes will only achieve 10% locality.
LIMITATIONS: FSS
Sticky Slots:
The problem is that there is a tendency for a job to be
assigned the same slot repeatedly.
Suppose that job j's fractional share of the cluster is f. Then
for any given block b, the probability that none of j's slots are
on a node with a copy of b is (1 − f)^(RL): there are R replicas of
b, each replica is on a node with L slots, and the probability
that a slot does not belong to j is 1 − f. Therefore, j is expected
to achieve at most 1 − (1 − f)^(RL) locality.
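The bound can be computed directly; the default R and L below are just example values, not fixed by the slide:

```python
def expected_locality(f, R=3, L=2):
    """Upper bound on locality under sticky slots: 1 - (1 - f)^(R*L).

    f: job's fractional share of the cluster
    R: replicas per block, L: task slots per node (illustrative defaults)
    """
    return 1.0 - (1.0 - f) ** (R * L)
```

For instance, a job holding 10% of a cluster with R = 3 and L = 2 achieves at most expected_locality(0.1) ≈ 0.47, i.e. about 47% locality.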
LIMITATIONS: DELAY SCHEDULING
Long Task Balancing:
To lower the chance that a node fills with long tasks, we can
spread long tasks throughout the cluster by changing the
locality test in the algorithm to prevent jobs with long tasks from
launching tasks on nodes that are running a higher-than-
average number of long tasks. Although we do not know
which jobs have long tasks in advance, we can treat new jobs
as long-task jobs, and mark them as short-task jobs if their
tasks finish quickly.
LIMITATIONS: DELAY SCHEDULING
Hotspots are only likely to occur if multiple jobs
need to read the same data file, and that file is
small enough that copies of its blocks are only
present on a small fraction of nodes. In this
case, no scheduling algorithm can achieve high
locality without excessive queueing delays.
LIMITATIONS: LATE SCHEDULER
If a commodity machine is running behind its peers, this
scheduler marks it as a straggler instead of trying to find
out why it is behaving this way. The consequences are
significant: because the scheduler does not distinguish a
temporary slowdown from a permanent fault, the node is
given no more tasks for the entire duration of the
computation.
PROPOSITION
We have tried to create a scheduler which may be able to circumvent the
limitations described earlier. In this scheduler, a task is enqueued into the
priority queues of the nodes where the data for that task is available.
Algorithm
1. Retrieve the list of local nodes from the arriving task.
2. Set n = REPLICATION.FACTOR.
3. Create n instances of the task, with different priority values, in
the priority queues of the n task trackers where the data is local
for that specific task.
4. Tasks are executed in accordance with their priority: a task
with a priority other than 1 must skip that many tasks, if and
only if those tasks have a higher priority and arrived later
than it did.
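Steps 1-3 can be sketched with one priority queue per task tracker; this illustrative sketch breaks priority ties by arrival order and omits the exact skip rule of step 4:

```python
import heapq
from itertools import count

_arrival = count()  # global arrival order, used for tie-breaking

def enqueue_task(task_id, local_nodes, queues, replication_factor):
    """Enqueue one instance of the task on each tracker holding a replica,
    with priority values 1..replication_factor (lower = higher priority)."""
    t = next(_arrival)
    for priority, node in enumerate(local_nodes[:replication_factor], start=1):
        heapq.heappush(queues[node], (priority, t, task_id))

def next_task(queues, node):
    """Pop the highest-priority (then earliest-arriving) task instance."""
    return heapq.heappop(queues[node])[2] if queues[node] else None
```

A fuller implementation would also remove the sibling instances of a task from the other trackers' queues once one instance starts running.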
PROPOSITIONS
TT1: T1-1 T4-2 T5-1
TT2: T2-1 T5-2
TT3: T1-2 T2-3
TT4: T4-1 T2-2
TT5: T1-3 T4-3 T5-3
PROPOSITIONS
TT1: T4-2 T5-1
TT2: T5-2
TT3: T2-3
TT4: T2-2
TT5: T4-3 T5-3
EXPERIMENT AND RESULTS
FCFS Simulation
50 Cloudlets; 8 VMs
Total time to Complete: 1165.96 ms
PQRST Simulation
50 Cloudlets; 8 VMs
Total time to Complete: 1120.10 ms
THANK YOU

Mais conteúdo relacionado

Mais procurados

Anatomy of classic map reduce in hadoop
Anatomy of classic map reduce in hadoop Anatomy of classic map reduce in hadoop
Anatomy of classic map reduce in hadoop Rajesh Ananda Kumar
 
Sreerag parallel programming
Sreerag   parallel programmingSreerag   parallel programming
Sreerag parallel programmingSreerag Gopinath
 
Topic 6: MapReduce Applications
Topic 6: MapReduce ApplicationsTopic 6: MapReduce Applications
Topic 6: MapReduce ApplicationsZubair Nabi
 
Hadoop secondary sort and a custom comparator
Hadoop secondary sort and a custom comparatorHadoop secondary sort and a custom comparator
Hadoop secondary sort and a custom comparatorSubhas Kumar Ghosh
 
Introduction to map reduce
Introduction to map reduceIntroduction to map reduce
Introduction to map reduceM Baddar
 
Ufuc Celebi – Stream & Batch Processing in one System
Ufuc Celebi – Stream & Batch Processing in one SystemUfuc Celebi – Stream & Batch Processing in one System
Ufuc Celebi – Stream & Batch Processing in one SystemFlink Forward
 
The Apache Cassandra ecosystem
The Apache Cassandra ecosystemThe Apache Cassandra ecosystem
The Apache Cassandra ecosystemAlex Thompson
 
Apache Flink Training: System Overview
Apache Flink Training: System OverviewApache Flink Training: System Overview
Apache Flink Training: System OverviewFlink Forward
 
Scalable Algorithm Design with MapReduce
Scalable Algorithm Design with MapReduceScalable Algorithm Design with MapReduce
Scalable Algorithm Design with MapReducePietro Michiardi
 
Apache Flink internals
Apache Flink internalsApache Flink internals
Apache Flink internalsKostas Tzoumas
 
Hadoop combiner and partitioner
Hadoop combiner and partitionerHadoop combiner and partitioner
Hadoop combiner and partitionerSubhas Kumar Ghosh
 
Inside LoLA - Experiences from building a state space tool for place transiti...
Inside LoLA - Experiences from building a state space tool for place transiti...Inside LoLA - Experiences from building a state space tool for place transiti...
Inside LoLA - Experiences from building a state space tool for place transiti...Universität Rostock
 
Maximilian Michels – Google Cloud Dataflow on Top of Apache Flink
Maximilian Michels – Google Cloud Dataflow on Top of Apache FlinkMaximilian Michels – Google Cloud Dataflow on Top of Apache Flink
Maximilian Michels – Google Cloud Dataflow on Top of Apache FlinkFlink Forward
 
Memory allocation
Memory allocationMemory allocation
Memory allocationsanya6900
 

Mais procurados (20)

Anatomy of classic map reduce in hadoop
Anatomy of classic map reduce in hadoop Anatomy of classic map reduce in hadoop
Anatomy of classic map reduce in hadoop
 
Sreerag parallel programming
Sreerag   parallel programmingSreerag   parallel programming
Sreerag parallel programming
 
Topic 6: MapReduce Applications
Topic 6: MapReduce ApplicationsTopic 6: MapReduce Applications
Topic 6: MapReduce Applications
 
Hadoop secondary sort and a custom comparator
Hadoop secondary sort and a custom comparatorHadoop secondary sort and a custom comparator
Hadoop secondary sort and a custom comparator
 
Hadoop Map Reduce Arch
Hadoop Map Reduce ArchHadoop Map Reduce Arch
Hadoop Map Reduce Arch
 
Introduction to map reduce
Introduction to map reduceIntroduction to map reduce
Introduction to map reduce
 
Ufuc Celebi – Stream & Batch Processing in one System
Ufuc Celebi – Stream & Batch Processing in one SystemUfuc Celebi – Stream & Batch Processing in one System
Ufuc Celebi – Stream & Batch Processing in one System
 
The Apache Cassandra ecosystem
The Apache Cassandra ecosystemThe Apache Cassandra ecosystem
The Apache Cassandra ecosystem
 
Compiler design
Compiler designCompiler design
Compiler design
 
04 pig data operations
04 pig data operations04 pig data operations
04 pig data operations
 
Heap Management
Heap ManagementHeap Management
Heap Management
 
Apache Flink Training: System Overview
Apache Flink Training: System OverviewApache Flink Training: System Overview
Apache Flink Training: System Overview
 
The google MapReduce
The google MapReduceThe google MapReduce
The google MapReduce
 
Interpreting the Data:Parallel Analysis with Sawzall
Interpreting the Data:Parallel Analysis with SawzallInterpreting the Data:Parallel Analysis with Sawzall
Interpreting the Data:Parallel Analysis with Sawzall
 
Scalable Algorithm Design with MapReduce
Scalable Algorithm Design with MapReduceScalable Algorithm Design with MapReduce
Scalable Algorithm Design with MapReduce
 
Apache Flink internals
Apache Flink internalsApache Flink internals
Apache Flink internals
 
Hadoop combiner and partitioner
Hadoop combiner and partitionerHadoop combiner and partitioner
Hadoop combiner and partitioner
 
Inside LoLA - Experiences from building a state space tool for place transiti...
Inside LoLA - Experiences from building a state space tool for place transiti...Inside LoLA - Experiences from building a state space tool for place transiti...
Inside LoLA - Experiences from building a state space tool for place transiti...
 
Maximilian Michels – Google Cloud Dataflow on Top of Apache Flink
Maximilian Michels – Google Cloud Dataflow on Top of Apache FlinkMaximilian Michels – Google Cloud Dataflow on Top of Apache Flink
Maximilian Michels – Google Cloud Dataflow on Top of Apache Flink
 
Memory allocation
Memory allocationMemory allocation
Memory allocation
 

Semelhante a Cloud schedulers and Scheduling in Hadoop

Hadoop fault tolerance
Hadoop  fault toleranceHadoop  fault tolerance
Hadoop fault tolerancePallav Jha
 
Managing Big data Module 3 (1st part)
Managing Big data Module 3 (1st part)Managing Big data Module 3 (1st part)
Managing Big data Module 3 (1st part)Soumee Maschatak
 
Hadoop & MapReduce
Hadoop & MapReduceHadoop & MapReduce
Hadoop & MapReduceNewvewm
 
Parallel Data Processing with MapReduce: A Survey
Parallel Data Processing with MapReduce: A SurveyParallel Data Processing with MapReduce: A Survey
Parallel Data Processing with MapReduce: A SurveyKyong-Ha Lee
 
mapreduce-advanced.pptx
mapreduce-advanced.pptxmapreduce-advanced.pptx
mapreduce-advanced.pptxShimoFcis
 
Apache Spark: What? Why? When?
Apache Spark: What? Why? When?Apache Spark: What? Why? When?
Apache Spark: What? Why? When?Massimo Schenone
 
Big data unit iv and v lecture notes qb model exam
Big data unit iv and v lecture notes   qb model examBig data unit iv and v lecture notes   qb model exam
Big data unit iv and v lecture notes qb model examIndhujeni
 
Hadoop training-in-hyderabad
Hadoop training-in-hyderabadHadoop training-in-hyderabad
Hadoop training-in-hyderabadsreehari orienit
 
Study Notes: Apache Spark
Study Notes: Apache SparkStudy Notes: Apache Spark
Study Notes: Apache SparkGao Yunzhong
 
(Berkeley CS186 guest lecture) Big Data Analytics Systems: What Goes Around C...
(Berkeley CS186 guest lecture) Big Data Analytics Systems: What Goes Around C...(Berkeley CS186 guest lecture) Big Data Analytics Systems: What Goes Around C...
(Berkeley CS186 guest lecture) Big Data Analytics Systems: What Goes Around C...Reynold Xin
 
Tuning and Debugging in Apache Spark
Tuning and Debugging in Apache SparkTuning and Debugging in Apache Spark
Tuning and Debugging in Apache SparkDatabricks
 
Architecting and productionising data science applications at scale
Architecting and productionising data science applications at scaleArchitecting and productionising data science applications at scale
Architecting and productionising data science applications at scalesamthemonad
 
Spark 4th Meetup Londond - Building a Product with Spark
Spark 4th Meetup Londond - Building a Product with SparkSpark 4th Meetup Londond - Building a Product with Spark
Spark 4th Meetup Londond - Building a Product with Sparksamthemonad
 
Juniper Innovation Contest
Juniper Innovation ContestJuniper Innovation Contest
Juniper Innovation ContestAMIT BORUDE
 
mapreduce.pptx
mapreduce.pptxmapreduce.pptx
mapreduce.pptxShimoFcis
 

Semelhante a Cloud schedulers and Scheduling in Hadoop (20)

Hadoop fault tolerance
Hadoop  fault toleranceHadoop  fault tolerance
Hadoop fault tolerance
 
Managing Big data Module 3 (1st part)
Managing Big data Module 3 (1st part)Managing Big data Module 3 (1st part)
Managing Big data Module 3 (1st part)
 
Hadoop & MapReduce
Hadoop & MapReduceHadoop & MapReduce
Hadoop & MapReduce
 
Parallel Data Processing with MapReduce: A Survey
Parallel Data Processing with MapReduce: A SurveyParallel Data Processing with MapReduce: A Survey
Parallel Data Processing with MapReduce: A Survey
 
mapreduce-advanced.pptx
mapreduce-advanced.pptxmapreduce-advanced.pptx
mapreduce-advanced.pptx
 
Apache Spark: What? Why? When?
Apache Spark: What? Why? When?Apache Spark: What? Why? When?
Apache Spark: What? Why? When?
 
Big data unit iv and v lecture notes qb model exam
Big data unit iv and v lecture notes   qb model examBig data unit iv and v lecture notes   qb model exam
Big data unit iv and v lecture notes qb model exam
 
Map reduce
Map reduceMap reduce
Map reduce
 
Hadoop training-in-hyderabad
Hadoop training-in-hyderabadHadoop training-in-hyderabad
Hadoop training-in-hyderabad
 
Study Notes: Apache Spark
Study Notes: Apache SparkStudy Notes: Apache Spark
Study Notes: Apache Spark
 
compiler design
compiler designcompiler design
compiler design
 
(Berkeley CS186 guest lecture) Big Data Analytics Systems: What Goes Around C...
(Berkeley CS186 guest lecture) Big Data Analytics Systems: What Goes Around C...(Berkeley CS186 guest lecture) Big Data Analytics Systems: What Goes Around C...
(Berkeley CS186 guest lecture) Big Data Analytics Systems: What Goes Around C...
 
Tuning and Debugging in Apache Spark
Tuning and Debugging in Apache SparkTuning and Debugging in Apache Spark
Tuning and Debugging in Apache Spark
 
Unit3 MapReduce
Unit3 MapReduceUnit3 MapReduce
Unit3 MapReduce
 
Architecting and productionising data science applications at scale
Architecting and productionising data science applications at scaleArchitecting and productionising data science applications at scale
Architecting and productionising data science applications at scale
 
MapReduce
MapReduceMapReduce
MapReduce
 
Hadoop
HadoopHadoop
Hadoop
 
Spark 4th Meetup Londond - Building a Product with Spark
Spark 4th Meetup Londond - Building a Product with SparkSpark 4th Meetup Londond - Building a Product with Spark
Spark 4th Meetup Londond - Building a Product with Spark
 
Juniper Innovation Contest
Juniper Innovation ContestJuniper Innovation Contest
Juniper Innovation Contest
 
mapreduce.pptx
mapreduce.pptxmapreduce.pptx
mapreduce.pptx
 

Último

Booking open Available Pune Call Girls Koregaon Park 6297143586 Call Hot Ind...
Booking open Available Pune Call Girls Koregaon Park  6297143586 Call Hot Ind...Booking open Available Pune Call Girls Koregaon Park  6297143586 Call Hot Ind...
Booking open Available Pune Call Girls Koregaon Park 6297143586 Call Hot Ind...Call Girls in Nagpur High Profile
 
UNIT-II FMM-Flow Through Circular Conduits
UNIT-II FMM-Flow Through Circular ConduitsUNIT-II FMM-Flow Through Circular Conduits
UNIT-II FMM-Flow Through Circular Conduitsrknatarajan
 
MANUFACTURING PROCESS-II UNIT-2 LATHE MACHINE
MANUFACTURING PROCESS-II UNIT-2 LATHE MACHINEMANUFACTURING PROCESS-II UNIT-2 LATHE MACHINE
MANUFACTURING PROCESS-II UNIT-2 LATHE MACHINESIVASHANKAR N
 
High Profile Call Girls Nagpur Isha Call 7001035870 Meet With Nagpur Escorts
High Profile Call Girls Nagpur Isha Call 7001035870 Meet With Nagpur EscortsHigh Profile Call Girls Nagpur Isha Call 7001035870 Meet With Nagpur Escorts
High Profile Call Girls Nagpur Isha Call 7001035870 Meet With Nagpur Escortsranjana rawat
 
Top Rated Pune Call Girls Budhwar Peth ⟟ 6297143586 ⟟ Call Me For Genuine Se...
Top Rated  Pune Call Girls Budhwar Peth ⟟ 6297143586 ⟟ Call Me For Genuine Se...Top Rated  Pune Call Girls Budhwar Peth ⟟ 6297143586 ⟟ Call Me For Genuine Se...
Top Rated Pune Call Girls Budhwar Peth ⟟ 6297143586 ⟟ Call Me For Genuine Se...Call Girls in Nagpur High Profile
 
UNIT-V FMM.HYDRAULIC TURBINE - Construction and working
UNIT-V FMM.HYDRAULIC TURBINE - Construction and workingUNIT-V FMM.HYDRAULIC TURBINE - Construction and working
UNIT-V FMM.HYDRAULIC TURBINE - Construction and workingrknatarajan
 
VIP Call Girls Service Kondapur Hyderabad Call +91-8250192130
VIP Call Girls Service Kondapur Hyderabad Call +91-8250192130VIP Call Girls Service Kondapur Hyderabad Call +91-8250192130
VIP Call Girls Service Kondapur Hyderabad Call +91-8250192130Suhani Kapoor
 
KubeKraft presentation @CloudNativeHooghly
KubeKraft presentation @CloudNativeHooghlyKubeKraft presentation @CloudNativeHooghly
KubeKraft presentation @CloudNativeHooghlysanyuktamishra911
 
result management system report for college project
result management system report for college projectresult management system report for college project
result management system report for college projectTonystark477637
 
Call for Papers - Educational Administration: Theory and Practice, E-ISSN: 21...
Call for Papers - Educational Administration: Theory and Practice, E-ISSN: 21...Call for Papers - Educational Administration: Theory and Practice, E-ISSN: 21...
Call for Papers - Educational Administration: Theory and Practice, E-ISSN: 21...Christo Ananth
 
VIP Call Girls Service Hitech City Hyderabad Call +91-8250192130
VIP Call Girls Service Hitech City Hyderabad Call +91-8250192130VIP Call Girls Service Hitech City Hyderabad Call +91-8250192130
VIP Call Girls Service Hitech City Hyderabad Call +91-8250192130Suhani Kapoor
 
Extrusion Processes and Their Limitations
Extrusion Processes and Their LimitationsExtrusion Processes and Their Limitations
Extrusion Processes and Their Limitations120cr0395
 
SPICE PARK APR2024 ( 6,793 SPICE Models )
SPICE PARK APR2024 ( 6,793 SPICE Models )SPICE PARK APR2024 ( 6,793 SPICE Models )
SPICE PARK APR2024 ( 6,793 SPICE Models )Tsuyoshi Horigome
 
Call for Papers - African Journal of Biological Sciences, E-ISSN: 2663-2187, ...
Call for Papers - African Journal of Biological Sciences, E-ISSN: 2663-2187, ...Call for Papers - African Journal of Biological Sciences, E-ISSN: 2663-2187, ...
Call for Papers - African Journal of Biological Sciences, E-ISSN: 2663-2187, ...Christo Ananth
 
HARDNESS, FRACTURE TOUGHNESS AND STRENGTH OF CERAMICS
HARDNESS, FRACTURE TOUGHNESS AND STRENGTH OF CERAMICSHARDNESS, FRACTURE TOUGHNESS AND STRENGTH OF CERAMICS
HARDNESS, FRACTURE TOUGHNESS AND STRENGTH OF CERAMICSRajkumarAkumalla
 
(RIA) Call Girls Bhosari ( 7001035870 ) HI-Fi Pune Escorts Service
(RIA) Call Girls Bhosari ( 7001035870 ) HI-Fi Pune Escorts Service(RIA) Call Girls Bhosari ( 7001035870 ) HI-Fi Pune Escorts Service
(RIA) Call Girls Bhosari ( 7001035870 ) HI-Fi Pune Escorts Serviceranjana rawat
 
(ANVI) Koregaon Park Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(ANVI) Koregaon Park Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...(ANVI) Koregaon Park Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(ANVI) Koregaon Park Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...ranjana rawat
 
The Most Attractive Pune Call Girls Budhwar Peth 8250192130 Will You Miss Thi...
The Most Attractive Pune Call Girls Budhwar Peth 8250192130 Will You Miss Thi...The Most Attractive Pune Call Girls Budhwar Peth 8250192130 Will You Miss Thi...
The Most Attractive Pune Call Girls Budhwar Peth 8250192130 Will You Miss Thi...ranjana rawat
 
APPLICATIONS-AC/DC DRIVES-OPERATING CHARACTERISTICS
APPLICATIONS-AC/DC DRIVES-OPERATING CHARACTERISTICSAPPLICATIONS-AC/DC DRIVES-OPERATING CHARACTERISTICS
APPLICATIONS-AC/DC DRIVES-OPERATING CHARACTERISTICSKurinjimalarL3
 
OSVC_Meta-Data based Simulation Automation to overcome Verification Challenge...
OSVC_Meta-Data based Simulation Automation to overcome Verification Challenge...OSVC_Meta-Data based Simulation Automation to overcome Verification Challenge...
OSVC_Meta-Data based Simulation Automation to overcome Verification Challenge...Soham Mondal
 

Último (20)

Booking open Available Pune Call Girls Koregaon Park 6297143586 Call Hot Ind...
Booking open Available Pune Call Girls Koregaon Park  6297143586 Call Hot Ind...Booking open Available Pune Call Girls Koregaon Park  6297143586 Call Hot Ind...
Booking open Available Pune Call Girls Koregaon Park 6297143586 Call Hot Ind...
 
UNIT-II FMM-Flow Through Circular Conduits
UNIT-II FMM-Flow Through Circular ConduitsUNIT-II FMM-Flow Through Circular Conduits
UNIT-II FMM-Flow Through Circular Conduits
 
MANUFACTURING PROCESS-II UNIT-2 LATHE MACHINE
MANUFACTURING PROCESS-II UNIT-2 LATHE MACHINEMANUFACTURING PROCESS-II UNIT-2 LATHE MACHINE
MANUFACTURING PROCESS-II UNIT-2 LATHE MACHINE
 
High Profile Call Girls Nagpur Isha Call 7001035870 Meet With Nagpur Escorts
High Profile Call Girls Nagpur Isha Call 7001035870 Meet With Nagpur EscortsHigh Profile Call Girls Nagpur Isha Call 7001035870 Meet With Nagpur Escorts
High Profile Call Girls Nagpur Isha Call 7001035870 Meet With Nagpur Escorts
 
Top Rated Pune Call Girls Budhwar Peth ⟟ 6297143586 ⟟ Call Me For Genuine Se...
Top Rated  Pune Call Girls Budhwar Peth ⟟ 6297143586 ⟟ Call Me For Genuine Se...Top Rated  Pune Call Girls Budhwar Peth ⟟ 6297143586 ⟟ Call Me For Genuine Se...
Top Rated Pune Call Girls Budhwar Peth ⟟ 6297143586 ⟟ Call Me For Genuine Se...
 
UNIT-V FMM.HYDRAULIC TURBINE - Construction and working
UNIT-V FMM.HYDRAULIC TURBINE - Construction and workingUNIT-V FMM.HYDRAULIC TURBINE - Construction and working
UNIT-V FMM.HYDRAULIC TURBINE - Construction and working
 
VIP Call Girls Service Kondapur Hyderabad Call +91-8250192130
VIP Call Girls Service Kondapur Hyderabad Call +91-8250192130VIP Call Girls Service Kondapur Hyderabad Call +91-8250192130
VIP Call Girls Service Kondapur Hyderabad Call +91-8250192130
 
KubeKraft presentation @CloudNativeHooghly
KubeKraft presentation @CloudNativeHooghlyKubeKraft presentation @CloudNativeHooghly
KubeKraft presentation @CloudNativeHooghly
 
result management system report for college project
result management system report for college projectresult management system report for college project
result management system report for college project
 
Call for Papers - Educational Administration: Theory and Practice, E-ISSN: 21...
Call for Papers - Educational Administration: Theory and Practice, E-ISSN: 21...Call for Papers - Educational Administration: Theory and Practice, E-ISSN: 21...
Call for Papers - Educational Administration: Theory and Practice, E-ISSN: 21...
 
VIP Call Girls Service Hitech City Hyderabad Call +91-8250192130
VIP Call Girls Service Hitech City Hyderabad Call +91-8250192130VIP Call Girls Service Hitech City Hyderabad Call +91-8250192130
VIP Call Girls Service Hitech City Hyderabad Call +91-8250192130
 
Extrusion Processes and Their Limitations
Extrusion Processes and Their LimitationsExtrusion Processes and Their Limitations
Extrusion Processes and Their Limitations
 
SPICE PARK APR2024 ( 6,793 SPICE Models )
SPICE PARK APR2024 ( 6,793 SPICE Models )SPICE PARK APR2024 ( 6,793 SPICE Models )
SPICE PARK APR2024 ( 6,793 SPICE Models )
 
Call for Papers - African Journal of Biological Sciences, E-ISSN: 2663-2187, ...
Call for Papers - African Journal of Biological Sciences, E-ISSN: 2663-2187, ...Call for Papers - African Journal of Biological Sciences, E-ISSN: 2663-2187, ...
Call for Papers - African Journal of Biological Sciences, E-ISSN: 2663-2187, ...
 
HARDNESS, FRACTURE TOUGHNESS AND STRENGTH OF CERAMICS
HARDNESS, FRACTURE TOUGHNESS AND STRENGTH OF CERAMICSHARDNESS, FRACTURE TOUGHNESS AND STRENGTH OF CERAMICS
HARDNESS, FRACTURE TOUGHNESS AND STRENGTH OF CERAMICS
 
(RIA) Call Girls Bhosari ( 7001035870 ) HI-Fi Pune Escorts Service
(RIA) Call Girls Bhosari ( 7001035870 ) HI-Fi Pune Escorts Service(RIA) Call Girls Bhosari ( 7001035870 ) HI-Fi Pune Escorts Service
(RIA) Call Girls Bhosari ( 7001035870 ) HI-Fi Pune Escorts Service
 
(ANVI) Koregaon Park Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(ANVI) Koregaon Park Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...(ANVI) Koregaon Park Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(ANVI) Koregaon Park Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
 
The Most Attractive Pune Call Girls Budhwar Peth 8250192130 Will You Miss Thi...
The Most Attractive Pune Call Girls Budhwar Peth 8250192130 Will You Miss Thi...The Most Attractive Pune Call Girls Budhwar Peth 8250192130 Will You Miss Thi...
The Most Attractive Pune Call Girls Budhwar Peth 8250192130 Will You Miss Thi...
 
APPLICATIONS-AC/DC DRIVES-OPERATING CHARACTERISTICS
APPLICATIONS-AC/DC DRIVES-OPERATING CHARACTERISTICSAPPLICATIONS-AC/DC DRIVES-OPERATING CHARACTERISTICS
APPLICATIONS-AC/DC DRIVES-OPERATING CHARACTERISTICS
 
OSVC_Meta-Data based Simulation Automation to overcome Verification Challenge...
OSVC_Meta-Data based Simulation Automation to overcome Verification Challenge...OSVC_Meta-Data based Simulation Automation to overcome Verification Challenge...
OSVC_Meta-Data based Simulation Automation to overcome Verification Challenge...
 

Cloud schedulers and Scheduling in Hadoop

  • 1. CLOUD SCHEDULERS P A L L A V J H A ( 1 0 - 1 - 5 - 0 2 3 ) P R A B H A K A R B A R U A ( 1 0 - 1 - 5 - 0 1 7 ) P R A B O D H H E N D ( 1 0 - 1 - 5 - 0 5 3 ) J U G A L A S S U D A N I ( 1 0 - 1 - 5 - 0 6 8 ) P R E M C H A N D R A ( 0 9 - 1 - 5 - 0 6 2 )
  • 2. SCHEDULING IN HADOOP When a node has an empty task slot, Hadoop chooses a task for it from one of three categories. First, if any task has failed, it is given highest priority. This is done to detect when a task fails repeatedly due to a bug and stop the job. Second, unscheduled tasks are considered. For maps, tasks with data local to the node are chosen first. Finally, Hadoop looks for a task to speculate on. To select speculative tasks, Hadoop monitors task progress using a progress score, which is a number from 0 to 1. For a map, the score is the fraction of input data read. For a reduce task, the execution is divided into three phases, each of which accounts for 1/3 of the score: 1.The copy phase, when the task is copying outputs of all maps. In this phase, the score is the percent of maps that output has been copied from. 2.The sort phase, when map outputs are sorted by key. Here the score is the percent of data merged. 3.The reduce phase, when a user-defined function is applied to the map outputs. Here the score is the percent of data passed through the reduce function
  • 3. ASSUMPTIONS IN HADOOP’S SCHEDULER 1. Nodes can perform work at roughly the same rate. 2. Tasks progress at a constant rate throughout time. 3. There is no cost to launching a speculative task on a node that would otherwise have an idle slot. 4. A task‟s progress score is roughly equal to the fraction of its total work that it has done. Specifically, in a reduce task, the copy, reduce and merge phases each take 1/3 of the total time. 5. Tasks tend to finish in waves, so a task with a low progress score is likely a slow task. 6. Different tasks of the same category (map or reduce) require roughly the same amount of work.
  • 4. RDD an RDD is a read-only, partitioned collection of records. RDDs can only be created through deterministic operations on either (1) data in stable storage or (2) other RDDs. We call these operations transformations to differentiate them from other operations on RDDs. Examples of transformations include map, filter, and join. RDDs do not need to be materialized at all times. Instead, an RDD has enough information about how it was derived from other datasets (its lineage) to compute its partitions from data in stable storage. This is a powerful property: in essence, a program cannot reference an RDD that it cannot reconstruct after a failure.
  • 6. APPLICATION OF RDD: "LOG MINING"
val lines = spark.textFile("hdfs://...")
val errors = lines.filter(_.startsWith("ERROR"))
val messages = errors.map(_.split('\t')(2))
messages.cache()
messages.filter(_.contains("foo")).count
messages.filter(_.contains("bar")).count
  • 7. RDD FAULT TOLERANCE RDDs track the series of transformations used to build them (their lineage) and use it to recompute lost data. E.g.:
messages = textFile(...).filter(_.contains("error")).map(_.split('\t')(2))
  • 8. NAIVE FAIR SHARING ALGORITHM
1. when a heartbeat is received from node n:
2.   if n has a free slot then
3.     sort jobs in increasing order of number of running tasks
4.     for j in jobs do
5.       if j has unlaunched task t with data on n then
6.         launch t on n
7.       else if j has unlaunched task t then
8.         launch t on n
9.       end if
10.    end for
11.  end if
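A minimal simulation of the naive fair sharing loop, assuming a hypothetical job representation (a running-task count, a per-node map of unlaunched local tasks, and a list of remaining unlaunched tasks):

```python
def assign_task(node, jobs):
    """Naive fair sharing: on a heartbeat from `node` with a free slot,
    scan jobs in increasing order of running tasks and launch a
    data-local task if one exists, otherwise any unlaunched task."""
    for job in sorted(jobs, key=lambda j: j["running"]):
        if job["local"].get(node):            # data-local task first
            task = job["local"][node].pop(0)
        elif job["pending"]:                  # otherwise any unlaunched task
            task = job["pending"].pop(0)
        else:
            continue                          # job has nothing to launch
        job["running"] += 1
        return task
    return None                               # no job had work for this slot
```

Note how the head-of-line job gets the slot regardless of locality, which is exactly the weakness discussed in the limitations slides.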
  • 9. DELAY SCHEDULING
1. Initialize j.skipcount to 0 for all jobs j.
2. when a heartbeat is received from node n:
3.   if n has a free slot then
4.     sort jobs in increasing order of number of running tasks
5.     for j in jobs do
6.       if j has unlaunched task t with data on n then
7.         launch t on n
8.         set j.skipcount = 0
9.       else if j has unlaunched task t then
10.        if j.skipcount >= D then
11.          launch t on n
12.        else
13.          set j.skipcount = j.skipcount + 1
14.        end if
15.      end if
16.    end for
17.  end if
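A minimal sketch of the delay scheduling loop, using a hypothetical job representation (running-task count, per-node local task lists, a pending list, and a skipcount field); D is the skip threshold:

```python
def assign_task_delay(node, jobs, D=3):
    """Delay scheduling: a job with no data on `node` is skipped up to
    D times before it is allowed to launch a non-local task."""
    for job in sorted(jobs, key=lambda j: j["running"]):
        if job["local"].get(node):
            task = job["local"][node].pop(0)   # data-local launch
            job["skipcount"] = 0               # reset skip counter
        elif job["pending"] and job["skipcount"] >= D:
            task = job["pending"].pop(0)       # waited long enough: go non-local
        elif job["pending"]:
            job["skipcount"] += 1              # skip this job for now
            continue
        else:
            continue
        job["running"] += 1
        return task
    return None
```

With D = 0 this degenerates to the naive algorithm; larger D trades a short wait for better locality.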
  • 10. LATE (LONGEST APPROXIMATE TIME TO END) SCHEDULER The LATE algorithm works as follows:
• If a task slot becomes available and there are fewer than SpeculativeCap speculative tasks running:
– Ignore the request if the node's total progress is below SlowNodeThreshold.
– Rank currently running, non-speculatively executed tasks by estimated time left.
– Launch a copy of the highest-ranked task with progress rate below SlowTaskThreshold.
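LATE's ranking heuristic estimates time left as (1 − progress score) / progress rate, where progress rate = score / elapsed time. A sketch of the selection step; the (name, score, elapsed) task tuples and parameter names are illustrative:

```python
def estimated_time_left(progress_score, elapsed):
    """LATE heuristic: time left = (1 - score) / (score / elapsed)."""
    rate = progress_score / elapsed
    return (1 - progress_score) / rate

def pick_speculative_task(tasks, running_speculative, speculative_cap,
                          slow_task_threshold):
    """Pick the running task with the longest estimated time left whose
    progress rate falls below slow_task_threshold.  Each task is a
    (name, progress_score, elapsed_seconds) tuple."""
    if running_speculative >= speculative_cap:
        return None                      # speculation budget exhausted
    candidates = [(name, estimated_time_left(score, t))
                  for name, score, t in tasks
                  if score / t < slow_task_threshold]
    if not candidates:
        return None
    return max(candidates, key=lambda c: c[1])[0]
```

This sketch omits the SlowNodeThreshold check on the requesting node, which would simply return None before ranking.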
  • 11. LIMITATIONS: FSS Locality problems with fair sharing: The main aspect of MapReduce that complicates scheduling is the need to place tasks near their input data. Locality increases throughput because network bandwidth in a large cluster is much lower than the total bandwidth of the cluster's disks. Running on a node that contains the data (node locality) is most efficient; when this is not possible, running on the same rack (rack locality) is faster than running off-rack. But fair share scheduling considers only node locality.
  • 12. LIMITATIONS: FSS Head-of-line scheduling: The first locality problem occurs in small jobs (jobs that have small input files and hence a small number of data blocks to read). The problem is that whenever a job reaches the head of the sorted list in the fair sharing algorithm (i.e. has the fewest running tasks), one of its tasks is launched on the next slot that becomes free, no matter which node this slot is on. If the head-of-line job is small, it is unlikely to have data on the node that is given to it. For example, a job with data on 10% of nodes will achieve only 10% locality.
  • 13. LIMITATIONS: FSS Sticky slots: The problem is that there is a tendency for a job to be assigned the same slot repeatedly. Suppose that job j's fractional share of the cluster is f. Then for any given block b, the probability that none of j's slots are on a node with a copy of b is (1 − f)^(RL): there are R replicas of b, each replica is on a node with L slots, and the probability that a slot does not belong to j is 1 − f. Therefore, j is expected to achieve at most 1 − (1 − f)^(RL) locality.
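The locality bound above is easy to evaluate numerically; a small sketch (the default values R = 3 replicas and L = 2 slots per node are illustrative, not taken from the slides):

```python
def expected_locality(f, R=3, L=2):
    """Upper bound on locality for a job with cluster share f:
    1 - (1 - f)^(R*L), with R replicas per block and L slots per node."""
    return 1 - (1 - f) ** (R * L)
```

For instance, a job holding a 10% share with R = 3 and L = 2 can expect at most 1 − 0.9^6 ≈ 47% locality, which illustrates how sticky slots starve small-share jobs of local tasks.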
  • 14. LIMITATIONS: DELAY SCHEDULING Long task balancing: To lower the chance that a node fills with long tasks, we can spread long tasks throughout the cluster by changing the locality test in the algorithm to prevent jobs with long tasks from launching tasks on nodes that are running a higher-than-average number of long tasks. Although we do not know which jobs have long tasks in advance, we can treat new jobs as long-task jobs, and mark them as short-task jobs if their tasks finish quickly.
  • 15. LIMITATIONS: DELAY SCHEDULING Hotspots: Hotspots are only likely to occur if multiple jobs need to read the same data file, and that file is small enough that copies of its blocks are present on only a small fraction of nodes. In this case, no scheduling algorithm can achieve high locality without excessive queueing delays.
  • 16. LIMITATIONS: LATE SCHEDULER If a commodity machine is running behind its peers, this scheduler marks it as a straggler instead of trying to find out why it is behaving this way. The complications associated with this are significant: the scheduler does not distinguish a temporary defect from a permanently crippling one, yet the node is given no more tasks for the entire duration of the computation.
  • 17. PROPOSITION We have tried to create a scheduler which may be able to circumvent the limitations described earlier. In this scheduler a task is enqueued into the priority queues of the nodes where the data for that task is available.
Algorithm
1. Retrieve the list of local nodes from the arriving task.
2. Set n = REPLICATION.FACTOR.
3. Create n instances of the task in the priority queues of the n task trackers where the data will be local for that specific task, each instance with a different priority value.
4. The tasks will be executed in accordance with their priority status; a task with a priority other than 1 will have to skip that many tasks, if and only if those tasks have a higher priority and arrived at a later time relative to that task.
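The enqueueing step above might be sketched with per-node heaps ordered by (priority, arrival order), so that a later-arriving instance with a higher priority overtakes one with a lower priority; the data shapes and function names are illustrative:

```python
import heapq
from itertools import count

_arrival = count()  # global tie-breaker recording arrival order

def enqueue_task(task, local_nodes, node_queues):
    """Create one instance of the task in the priority queue of each
    node holding a replica of its data, with priorities 1..n where
    n is the replication factor (the length of local_nodes)."""
    for priority, node in enumerate(local_nodes, start=1):
        heapq.heappush(node_queues[node], (priority, next(_arrival), task))

def next_task(node, node_queues):
    """Pop the highest-priority (lowest number) instance for this node."""
    if node_queues[node]:
        return heapq.heappop(node_queues[node])[2]
    return None
```

With two tasks replicated across two nodes, each node runs the instance for which it was the priority-1 choice, so both execute data-locally without head-of-line blocking.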
  • 18. PROPOSITIONS [Figure: the priority queues of task trackers TT1–TT5, each holding replicated task instances labelled Tj-p (task Tj with priority p): T1-1, T4-2, T5-1, T2-1, T5-2, T1-2, T2-3, T4-1, T2-2, T1-3, T4-3, T5-3.]
  • 20. EXPERIMENT AND RESULTS FCFS simulation (50 cloudlets, 8 VMs): total time to complete 1165.96 ms. PQRST simulation (50 cloudlets, 8 VMs): total time to complete 1120.10 ms.