CLOUD SCHEDULERS
PALLAV JHA (10-1-5-023)
PRABHAKAR BARUA (10-1-5-017)
PRABODH HEND (10-1-5-053)
JUGAL ASSUDANI (10-1-5-068)
PREM CHANDRA (09-1-5-062)
SCHEDULING IN HADOOP
When a node has an empty task slot, Hadoop chooses a task for it from
one of three categories. First, if any task has failed, it is given highest
priority. This is done to detect when a task fails repeatedly due to a bug
and stop the job. Second, unscheduled tasks are considered. For
maps, tasks with data local to the node are chosen first. Finally, Hadoop
looks for a task to speculate on.
To select speculative tasks, Hadoop monitors task progress using a
progress score, which is a number from 0 to 1. For a map, the score is
the fraction of input data read. For a reduce task, the execution is
divided into three phases, each of which accounts for 1/3 of the score:
1. The copy phase, when the task is copying outputs
of all maps. In this phase, the score is the fraction of
maps whose output has been copied.
2. The sort phase, when map outputs are sorted by key.
Here the score is the fraction of data merged.
3. The reduce phase, when a user-defined function is
applied to the map outputs. Here the score is the
fraction of data passed through the reduce function.
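The progress score described above can be sketched in a few lines. This is an illustrative sketch, not Hadoop's actual code: the function names `map_progress` and `reduce_progress` and their parameters are hypothetical, and each reduce phase is simply assumed to contribute one third of the score, as the slide states.

```python
# Hypothetical helpers illustrating the progress score; not Hadoop's API.
REDUCE_PHASES = ("copy", "sort", "reduce")


def map_progress(bytes_read: int, total_bytes: int) -> float:
    """Score of a map task: fraction of its input data read so far."""
    if total_bytes <= 0:
        return 0.0
    return min(bytes_read / total_bytes, 1.0)


def reduce_progress(phase: str, fraction_done: float) -> float:
    """Score of a reduce task: completed phases count 1/3 each,
    plus 1/3 of the fraction done within the current phase."""
    completed_phases = REDUCE_PHASES.index(phase)
    return (completed_phases + fraction_done) / len(REDUCE_PHASES)
```

For example, a reduce task halfway through its sort phase has finished one phase and half of another, so `reduce_progress("sort", 0.5)` gives a score of one half.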
ASSUMPTIONS IN HADOOP’S SCHEDULER
1. Nodes can perform work at roughly the same rate.
2. Tasks progress at a constant rate throughout time.
3. There is no cost to launching a speculative task on a
node that would otherwise have an idle slot.
4. A task's progress score is roughly equal to the fraction of its
total work that it has done. Specifically,
in a reduce task, the copy, sort and reduce phases
each take 1/3 of the total time.
5. Tasks tend to finish in waves, so a task with a low
progress score is likely a slow task.
6. Different tasks of the same category (map or reduce)
require roughly the same amount of work.
RDD
An RDD is a read-only, partitioned collection of records. RDDs can only
be created through deterministic operations on either
(1) data in stable storage or
(2) other RDDs.
We call these operations transformations to
differentiate them from other operations on RDDs. Examples of
transformations include map, filter, and join.
RDDs do not need to be materialized at all times. Instead, an RDD has
enough information about how it was derived from other datasets
(its lineage) to compute its partitions from data in stable storage.
This is a powerful property: in essence, a program cannot reference an
RDD that it cannot reconstruct after a failure.
APPLICATION OF RDD: “LOG MINING”
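// Load the log file and keep only the error lines.
val lines = spark.textFile("hdfs://...")
val errors = lines.filter(_.startsWith("ERROR"))
// Extract the third tab-separated field (the message) and keep it in memory.
val messages = errors.map(_.split('\t')(2))
messages.cache()
// Both queries reuse the cached messages.
messages.filter(_.contains("foo")).count
messages.filter(_.contains("bar")).count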
RDD FAULT TOLERANCE
RDDs track the series of transformations used to
build them (their lineage) to recompute lost data
E.g.:
messages = textFile(...).filter(_.contains("error"))
                        .map(_.split('\t')(2))
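To make the idea of lineage concrete, here is a toy sketch in plain Python. It is not Spark's implementation: the class `LineageRDD` and its methods are hypothetical, and the "stable storage" is just a function that returns the source records. The point it illustrates is that the dataset stores only how it was derived, so a lost result can always be rebuilt by replaying the transformations.

```python
class LineageRDD:
    """Toy dataset that records its lineage instead of its contents."""

    def __init__(self, load, transforms=()):
        self._load = load                    # re-reads data from stable storage
        self._transforms = tuple(transforms) # ordered (kind, function) pairs

    def filter(self, predicate):
        return LineageRDD(self._load, self._transforms + (("filter", predicate),))

    def map(self, fn):
        return LineageRDD(self._load, self._transforms + (("map", fn),))

    def compute(self):
        """Replay the lineage from the source; safe to call again after a loss."""
        records = iter(self._load())
        for kind, fn in self._transforms:
            records = filter(fn, records) if kind == "filter" else map(fn, records)
        return list(records)


# Hypothetical in-memory log standing in for a file in stable storage.
log = ["ERROR\tdisk\tfoo failed", "INFO\tnet\tok", "ERROR\tnet\tbar timeout"]
messages = (LineageRDD(lambda: log)
            .filter(lambda line: line.startswith("ERROR"))
            .map(lambda line: line.split("\t")[2]))
```

Calling `messages.compute()` twice yields the same list both times, which mimics recomputing a lost partition from its lineage.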
NAIVE FAIR SHARING ALGORITHM
when a heartbeat is received from node n:
  if n has a free slot then
    sort jobs in increasing order of number of running tasks
    for j in jobs do
      if j has unlaunched task t with data on n then
        launch t on n
      else if j has unlaunched task t then
        launch t on n
      end if
    end for
  end if
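The naive fair sharing pseudocode can be sketched as runnable Python. The `Job` class, its fields, and the function name are hypothetical, and the sketch assumes the node has exactly one free slot, so it returns after the first launch.

```python
from dataclasses import dataclass


@dataclass
class Job:
    name: str
    pending: dict      # task id -> set of node names holding its input data
    running: int = 0   # number of currently running tasks


def naive_fair_share(jobs, node):
    """Handle one heartbeat from `node`, assumed to have one free slot.

    Returns (job name, task id, is_local), or None if nothing can launch.
    """
    # Jobs with the fewest running tasks are considered first.
    for job in sorted(jobs, key=lambda j: j.running):
        if not job.pending:
            continue
        local = [t for t, nodes in job.pending.items() if node in nodes]
        # Prefer a task with data on this node, else take any unlaunched task.
        task = local[0] if local else next(iter(job.pending))
        del job.pending[task]
        job.running += 1
        return job.name, task, bool(local)
    return None
```

Note that the head-of-line job always launches a task on whichever node sent the heartbeat, whether or not its data is there.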
DELAY SCHEDULING
Initialize j.skipcount to 0 for all jobs j.
when a heartbeat is received from node n:
  if n has a free slot then
    sort jobs in increasing order of number of running tasks
    for j in jobs do
      if j has unlaunched task t with data on n then
        launch t on n
        set j.skipcount = 0
      else if j has unlaunched task t then
        if j.skipcount >= D then
          launch t on n
        else
          set j.skipcount = j.skipcount + 1
        end if
      end if
    end for
  end if
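A minimal Python sketch of the delay scheduling pseudocode follows. The `Job` class and the function name are hypothetical, `max_skips` plays the role of D, and the sketch assumes one free slot per heartbeat, so it returns after the first launch.

```python
from dataclasses import dataclass


@dataclass
class Job:
    name: str
    pending: dict       # task id -> set of node names holding its input data
    running: int = 0    # number of currently running tasks
    skipcount: int = 0  # heartbeats on which this job was passed over


def delay_schedule(jobs, node, max_skips):
    """Handle one heartbeat from `node`, assumed to have one free slot.

    Returns (job name, task id, is_local), or None if nothing launches.
    """
    for job in sorted(jobs, key=lambda j: j.running):
        if not job.pending:
            continue
        local = [t for t, nodes in job.pending.items() if node in nodes]
        if local:
            task = local[0]
            job.skipcount = 0
        elif job.skipcount >= max_skips:
            # The job has waited long enough: give up on locality.
            task = next(iter(job.pending))
        else:
            # Skip this job for now and let later jobs use the slot.
            job.skipcount += 1
            continue
        del job.pending[task]
        job.running += 1
        return job.name, task, bool(local)
    return None
```

Compared with naive fair sharing, the only change is that a job without local data gives up its turn up to `max_skips` times before accepting a non-local slot.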
LATE (LONGEST APPROXIMATE TIME TO END) SCHEDULER
The LATE algorithm works as follows:
• If a task slot becomes available and there are fewer
than SpeculativeCap speculative tasks running:
– Ignore the request if the node's total progress
is below SlowNodeThreshold.
– Rank currently running, non-speculatively executed tasks by estimated time
left.
– Launch a copy of the highest-ranked task with
progress rate below SlowTaskThreshold.
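The steps above can be sketched as follows. The slide does not say how time left is estimated; this sketch uses the estimate from the LATE paper, where progress rate is progress score divided by elapsed time and time left is (1 - progress score) / progress rate. The `Task` class, the function name, and the use of plain numeric thresholds (rather than percentiles) are assumptions made for illustration.

```python
from dataclasses import dataclass
from math import inf


@dataclass
class Task:
    name: str
    progress: float           # progress score in [0, 1]
    elapsed: float            # seconds the task has been running
    speculative: bool = False # True if this is itself a speculative copy

    @property
    def rate(self) -> float:
        return self.progress / self.elapsed if self.elapsed > 0 else 0.0

    @property
    def time_left(self) -> float:
        return (1 - self.progress) / self.rate if self.rate > 0 else inf


def pick_speculative(tasks, node_progress, speculative_cap,
                     slow_node_threshold, slow_task_threshold):
    """Return the name of the task to speculate on, or None."""
    if sum(t.speculative for t in tasks) >= speculative_cap:
        return None
    if node_progress < slow_node_threshold:
        return None  # do not speculate on a slow node
    # Rank non-speculative tasks by estimated time left, longest first.
    ranked = sorted((t for t in tasks if not t.speculative),
                    key=lambda t: t.time_left, reverse=True)
    for task in ranked:
        if task.rate < slow_task_threshold:
            return task.name
    return None
```

Ranking by time left rather than by raw progress score means the copy goes to the task expected to delay the job the most.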
LIMITATIONS: FSS
Locality Problems with Fair Sharing
The main aspect of MapReduce that complicates
scheduling is the need to place tasks near their
input data. Locality increases throughput
because network bandwidth in a large cluster is
much lower than the total bandwidth of the
cluster's disks. Running on a node that contains the
data (node locality) is most efficient, but when
this is not possible, running on the same rack
(rack locality) is faster than running off-rack.
In fair share scheduling, however, we only consider
node locality.
LIMITATIONS: FSS
Head-of-line scheduling:
The first locality problem occurs in small jobs (jobs that
have small input files and hence have a small number of
data blocks to read). The problem is that whenever a job
reaches the head of the sorted list in the fair sharing algorithm (i.e.
has the fewest running tasks), one of its tasks is launched on
the next slot that becomes free, no matter which node this slot
is on. If the head-of-line job is small, it is unlikely to have
data on the node that is given to it. For example, a job with
data on 10% of nodes will only achieve 10% locality.
LIMITATIONS: FSS
STICKY SLOTS
The problem is that there is a tendency for a job to be
assigned the same slot repeatedly.
Suppose that job j's fractional share of the cluster is f. Then
for any given block b, the probability that none of j's slots are
on a node with a copy of b is $(1 - f)^{RL}$: there are R replicas of
b, each replica is on a node with L slots, and the probability
that a slot does not belong to j is 1 - f. Therefore, j is expected
to achieve at most $1 - (1 - f)^{RL}$ locality.
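The bound above is easy to evaluate numerically. The function name and the sample values in the note below are illustrative assumptions, not figures from the slides.

```python
def sticky_slot_locality_bound(f: float, replicas: int, slots_per_node: int) -> float:
    """Upper bound on expected locality, 1 - (1 - f)^(R * L).

    f              -- job's fractional share of the cluster, in [0, 1]
    replicas       -- R, number of replicas of each block
    slots_per_node -- L, number of task slots on each node
    """
    if not 0.0 <= f <= 1.0:
        raise ValueError("f must be between 0 and 1")
    return 1.0 - (1.0 - f) ** (replicas * slots_per_node)
```

With assumed values of f = 0.1, R = 3 and L = 6, the bound comes out at roughly 0.85, and it drops quickly as the job's share f shrinks.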
LIMITATIONS: DELAY SCHEDULING
Long Task Balancing:
To lower the chance that a node fills with long tasks, we can
spread long tasks throughout the cluster by changing the
locality test in the delay scheduling algorithm to prevent jobs with long tasks from
launching tasks on nodes that are running a higher-than-average
number of long tasks. Although we do not know
which jobs have long tasks in advance, we can treat new jobs
as long-task jobs, and mark them as short-task jobs if their
tasks finish quickly.
LIMITATIONS: DELAY SCHEDULING
Hotspots are only likely to occur if multiple jobs
need to read the same data file, and that file is
small enough that copies of its blocks are only
present on a small fraction of nodes. In this
case, no scheduling algorithm can achieve high
locality without excessive queueing delays.
LIMITATIONS: LATE SCHEDULER
If a commodity machine is running behind its peers,
this scheduler marks it as a straggler instead of trying
to find out why it is behaving this way. This has
serious consequences: the scheduler does not check
whether the defect is temporary or permanently
crippling, and the node is given no more tasks for
the entire duration of the computation.
PROPOSITION
We have tried to create a scheduler that may be able to circumvent the
limitations described earlier. In this scheduler, each task is enqueued into the
priority queues of the nodes where the data for that task is available.
Algorithm
1. Retrieve the list of local nodes from the arriving task.
2. Set n = REPLICATION.FACTOR
3. Create n instances of the task, one in the priority queue of each
of the n task trackers where the data for that task is local,
each with a different priority value.
4. Tasks are executed in accordance with their priority.
A task instance whose priority is other than 1 has to
skip that many tasks, if and only if those
tasks have a higher priority and arrived later
than that instance.
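Steps 1 to 3 of the algorithm can be sketched as below. This covers the enqueue phase only; the skipping rule in step 4 is not implemented. The function name, the queue representation, the value 3 for the replication factor, and the choice to assign priorities 1..n in the order the local nodes are listed are all assumptions made for illustration.

```python
from collections import defaultdict

REPLICATION_FACTOR = 3  # assumed value; the slides only name the constant


def enqueue_task(queues, task_id, local_nodes, n=REPLICATION_FACTOR):
    """Place one instance of the task on each of its first n local nodes.

    queues      -- dict mapping task tracker name -> list of (task id, priority)
    local_nodes -- task trackers holding a replica of the task's data (step 1)
    Instances receive priorities 1..n (assumed to follow list order).
    """
    for priority, node in enumerate(local_nodes[:n], start=1):
        queues[node].append((task_id, priority))


# Hypothetical arrivals: two tasks whose replicas overlap on TT3.
queues = defaultdict(list)
enqueue_task(queues, "T1", ["TT1", "TT3", "TT5"])
enqueue_task(queues, "T2", ["TT2", "TT4", "TT3"])
```

After these two arrivals, the queue of `TT3` holds the second-priority instance of `T1` followed by the third-priority instance of `T2`, while `TT1` and `TT2` each hold a first-priority instance.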
PROPOSITIONS
TT1: T1-1 T4-2 T5-1
TT2: T2-1 T5-2
TT3: T1-2 T2-3
TT4: T4-1 T2-2
TT5: T1-3 T4-3 T5-3
PROPOSITIONS
TT1: T4-2 T5-1
TT2: T5-2
TT3: T2-3
TT4: T2-2
TT5: T4-3 T5-3
EXPERIMENT AND RESULTS
FCFS Simulation
50 Cloudlets; 8 VMs
Total time to Complete: 1165.96 ms
PQRST Simulation
50 Cloudlets; 8 VMs
Total time to Complete: 1120.10 ms
THANK YOU

Cloud schedulers and Scheduling in Hadoop

  • 1. CLOUD SCHEDULERS P A L L A V J H A ( 1 0 - 1 - 5 - 0 2 3 ) P R A B H A K A R B A R U A ( 1 0 - 1 - 5 - 0 1 7 ) P R A B O D H H E N D ( 1 0 - 1 - 5 - 0 5 3 ) J U G A L A S S U D A N I ( 1 0 - 1 - 5 - 0 6 8 ) P R E M C H A N D R A ( 0 9 - 1 - 5 - 0 6 2 )
  • 2. SCHEDULING IN HADOOP When a node has an empty task slot, Hadoop chooses a task for it from one of three categories. First, if any task has failed, it is given highest priority. This is done to detect when a task fails repeatedly due to a bug and stop the job. Second, unscheduled tasks are considered. For maps, tasks with data local to the node are chosen first. Finally, Hadoop looks for a task to speculate on. To select speculative tasks, Hadoop monitors task progress using a progress score, which is a number from 0 to 1. For a map, the score is the fraction of input data read. For a reduce task, the execution is divided into three phases, each of which accounts for 1/3 of the score: 1.The copy phase, when the task is copying outputs of all maps. In this phase, the score is the percent of maps that output has been copied from. 2.The sort phase, when map outputs are sorted by key. Here the score is the percent of data merged. 3.The reduce phase, when a user-defined function is applied to the map outputs. Here the score is the percent of data passed through the reduce function
  • 3. ASSUMPTIONS IN HADOOP’S SCHEDULER 1. Nodes can perform work at roughly the same rate. 2. Tasks progress at a constant rate throughout time. 3. There is no cost to launching a speculative task on a node that would otherwise have an idle slot. 4. A task‟s progress score is roughly equal to the fraction of its total work that it has done. Specifically, in a reduce task, the copy, reduce and merge phases each take 1/3 of the total time. 5. Tasks tend to finish in waves, so a task with a low progress score is likely a slow task. 6. Different tasks of the same category (map or reduce) require roughly the same amount of work.
  • 4. RDD an RDD is a read-only, partitioned collection of records. RDDs can only be created through deterministic operations on either (1) data in stable storage or (2) other RDDs. We call these operations transformations to differentiate them from other operations on RDDs. Examples of transformations include map, filter, and join. RDDs do not need to be materialized at all times. Instead, an RDD has enough information about how it was derived from other datasets (its lineage) to compute its partitions from data in stable storage. This is a powerful property: in essence, a program cannot reference an RDD that it cannot reconstruct after a failure.
  • 5. RDD RDDs do not need to be materialized at all times. Instead, an RDD has enough information about how it was derived from other datasets (its lineage) to compute its partitions from data in stable storage. This is a powerful property: in essence, a program cannot reference an RDD that it cannot reconstruct after a failure.
  • 6. APPLICATION OF RDD: “LOG MINING” val lines = spark.textFile(“hdfs://...”) val errors = lines.filter(_.startsWith(“ERROR”)) val messages = errors.map(_.split('t')(2)) messages.cache() message.filter(_.contains(“foo”)).count message.filter(_.contains(“bar”)).count
  • 7. RDD FAULT TOLERANCE RDDs track the series of transformations used to build them (their lineage) to recompute lost data E.g: messages = textFile(...).filter(_.contains(“error”)) .map(_.split(„t‟)(2)
  • 8. NAIVE FAIR SHARING ALGORITHM 1. when a heartbeat is received from node n: 2. if n has a free slot then 3. sort jobs in increasing order of number of running tasks 4. for j in jobs do 5. if j has unlaunched task t with data on n then 6. launch t on n 7. else if j has unlaunched task t then 8. launch t on n 9. end if 10. end for 11. end if
  • 9. DELAY SCHEDULING
      Initialize j.skipcount to 0 for all jobs j.
      when a heartbeat is received from node n:
        if n has a free slot then
          sort jobs in increasing order of number of running tasks
          for j in jobs do
            if j has unlaunched task t with data on n then
              launch t on n
              set j.skipcount = 0
            else if j has unlaunched task t then
              if j.skipcount >= D then
                launch t on n
              else
                set j.skipcount = j.skipcount + 1
              end if
            end if
          end for
        end if
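A minimal Python sketch of the delay scheduling pseudocode, under the same assumptions as before: the `Task` and `Job` classes are hypothetical, and each heartbeat is assumed to offer one free slot, so the function returns after the first launch.

```python
from dataclasses import dataclass, field


@dataclass
class Task:
    name: str
    nodes: frozenset          # nodes that hold a replica of the task's input


@dataclass
class Job:
    name: str
    pending: list = field(default_factory=list)   # unlaunched tasks
    running: int = 0                              # number of running tasks
    skipcount: int = 0                            # times the job was skipped


def delay_schedule(jobs, node, D):
    """Handle one heartbeat from `node`; D is the maximum number of skips."""
    for job in sorted(jobs, key=lambda j: j.running):
        if not job.pending:
            continue
        local = next((t for t in job.pending if node in t.nodes), None)
        if local is not None:
            task = local
            job.skipcount = 0
        elif job.skipcount >= D:
            task = job.pending[0]      # waited long enough: give up on locality
        else:
            job.skipcount += 1         # skip this job and try the next one
            continue
        job.pending.remove(task)
        job.running += 1
        return job.name, task.name, local is not None
    return None


small = Job("small", [Task("s1", frozenset({"n2"}))], running=0)
big = Job("big", [Task("b1", frozenset({"n1"}))], running=5)
jobs = [small, big]
first = delay_schedule(jobs, "n1", D=2)    # small is skipped, big runs locally
second = delay_schedule(jobs, "n2", D=2)   # small now gets a local slot
print(first, second)
```

Compared with the naive algorithm, the small job waits one heartbeat and then runs on the node that holds its data.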
  • 10. LATE (LONGEST APPROXIMATE TIME TO END) SCHEDULER The LATE algorithm works as follows:
      • If a task slot becomes available and there are fewer than SpeculativeCap speculative tasks running:
        – Ignore the request if the node’s total progress is below SlowNodeThreshold.
        – Rank currently running, non-speculatively executed tasks by estimated time left.
        – Launch a copy of the highest-ranked task with progress rate below SlowTaskThreshold.
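The three steps can be sketched in Python as below. The slide does not say how time left is estimated; this sketch assumes the heuristic from the LATE paper, where progress rate = progress score / elapsed time and time left = (1 - progress score) / progress rate. Treating the thresholds as plain numbers, excluding tasks with zero progress, and the `RunningTask` class itself are simplifying assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class RunningTask:
    name: str
    progress: float           # progress score in [0, 1]
    elapsed: float            # seconds the task has been running
    speculated: bool = False  # True if a speculative copy already exists


def progress_rate(task):
    return task.progress / task.elapsed


def time_left(task):
    return (1.0 - task.progress) / progress_rate(task)


def late_pick(tasks, node_progress, speculative_running,
              speculative_cap, slow_node_threshold, slow_task_threshold):
    """Return the task to speculate on for a free slot, or None."""
    if speculative_running >= speculative_cap:
        return None
    if node_progress < slow_node_threshold:
        return None                       # do not speculate on a slow node
    # Assumption: tasks with no measurable progress yet are left out.
    candidates = [t for t in tasks
                  if not t.speculated and t.progress > 0 and t.elapsed > 0]
    candidates.sort(key=time_left, reverse=True)   # longest time to end first
    for task in candidates:
        if progress_rate(task) < slow_task_threshold:
            return task
    return None


tasks = [
    RunningTask("a", progress=0.9, elapsed=90),    # about 10 s left
    RunningTask("b", progress=0.2, elapsed=100),   # about 400 s left
    RunningTask("c", progress=0.5, elapsed=50),    # about 50 s left
]
choice = late_pick(tasks, node_progress=0.8, speculative_running=0,
                   speculative_cap=2, slow_node_threshold=0.25,
                   slow_task_threshold=0.005)
print(choice.name)
```

Ranking by estimated time left instead of raw progress score is what lets the scheduler pick the task that will hurt the job's response time the most.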
  • 11. LIMITATIONS: FSS Locality problems with fair sharing: The main aspect of MapReduce that complicates scheduling is the need to place tasks near their input data. Locality increases throughput because network bandwidth in a large cluster is much lower than the total bandwidth of the cluster’s disks. Running on a node that contains the data (node locality) is most efficient, but when this is not possible, running on the same rack (rack locality) is faster than running off-rack. In fair share scheduling, however, we consider only node locality.
  • 12. LIMITATIONS: FSS Head-of-line scheduling: The first locality problem occurs in small jobs (jobs that have small input files and hence a small number of data blocks to read). Whenever a job reaches the head of the sorted list in the fair sharing algorithm (i.e. it has the fewest running tasks), one of its tasks is launched on the next slot that becomes free, no matter which node this slot is on. If the head-of-line job is small, it is unlikely to have data on the node that is given to it. For example, a job with data on 10% of nodes will only achieve 10% locality.
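The 10% figure can be checked with a short simulation. This is a rough sketch that assumes free slots appear on nodes chosen uniformly at random and that the head-of-line job always takes the slot; the function name and the cluster size are hypothetical.

```python
import random


def head_of_line_locality(num_nodes, data_nodes, trials, seed=0):
    """Fraction of launches that land on a node holding the job's data."""
    rng = random.Random(seed)
    local = 0
    for _ in range(trials):
        free_node = rng.randrange(num_nodes)   # node whose slot just freed up
        if free_node < data_nodes:             # nodes 0..data_nodes-1 hold the data
            local += 1
    return local / trials


# A job with data on 10 of 100 nodes.
locality = head_of_line_locality(num_nodes=100, data_nodes=10, trials=100_000)
print(round(locality, 3))
```

Under these assumptions the measured locality stays close to the fraction of nodes that hold the job's data.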
  • 13. LIMITATIONS: FSS Sticky slots: The problem is that there is a tendency for a job to be assigned the same slot repeatedly. Suppose that job j’s fractional share of the cluster is f. Then for any given block b, the probability that none of j’s slots are on a node with a copy of b is $(1-f)^{RL}$: there are R replicas of b, each replica is on a node with L slots, and the probability that a slot does not belong to j is $1-f$. Therefore, j is expected to achieve at most $1-(1-f)^{RL}$ locality.
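To get a feel for the bound, it can be evaluated for a few shares. The values R = 3 and L = 6 below are illustrative assumptions, not figures from the slides.

```python
def sticky_slot_locality_bound(f, R, L):
    """Upper bound on locality: 1 - (1 - f)^(R*L).

    f: the job's fractional share of the cluster
    R: number of replicas of each block
    L: number of slots per node
    """
    return 1.0 - (1.0 - f) ** (R * L)


# Illustrative values only: 3 replicas per block, 6 slots per node.
for share in (0.01, 0.05, 0.10, 0.25):
    bound = sticky_slot_locality_bound(share, R=3, L=6)
    print(f"share {share:.2f}: locality at most {bound:.3f}")
```

The bound drops quickly as the job's share shrinks, which is why small jobs are hit hardest by sticky slots.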
  • 14. LIMITATIONS: DELAY SCHEDULING Long task balancing: To lower the chance that a node fills with long tasks, we can spread long tasks throughout the cluster by changing the locality test in the delay scheduling algorithm so that jobs with long tasks cannot launch tasks on nodes that are running a higher-than-average number of long tasks. Although we do not know in advance which jobs have long tasks, we can treat new jobs as long-task jobs and mark them as short-task jobs if their tasks finish quickly.
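One way to read this modified locality test is sketched below. The function names, the `quick_limit` cutoff for "finishes quickly" and the per-node counts are hypothetical; the slides do not specify them.

```python
def passes_long_task_test(job_has_long_tasks, node, long_tasks_on):
    """Extra check applied on top of the usual locality test.

    long_tasks_on maps each node to the number of long tasks it is running.
    """
    if not job_has_long_tasks:
        return True
    average = sum(long_tasks_on.values()) / len(long_tasks_on)
    # Refuse nodes that run a higher-than-average number of long tasks.
    return long_tasks_on[node] <= average


def has_long_tasks(finished_durations, quick_limit):
    """New jobs count as long-task jobs until their tasks are seen to finish quickly."""
    if not finished_durations:
        return True
    return max(finished_durations) > quick_limit


long_tasks_on = {"n1": 3, "n2": 1, "n3": 2}   # average is 2 long tasks per node
print(passes_long_task_test(True, "n1", long_tasks_on))
print(passes_long_task_test(True, "n2", long_tasks_on))
```

A job with no finished tasks is treated as a long-task job, which matches the conservative default described above.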
  • 15. LIMITATIONS: DELAY SCHEDULING Hotspots: Hotspots are only likely to occur if multiple jobs need to read the same data file, and that file is small enough that copies of its blocks are only present on a small fraction of nodes. In this case, no scheduling algorithm can achieve high locality without excessive queueing delays.
  • 16. LIMITATIONS: LATE SCHEDULER If a commodity machine is running behind its peers, this scheduler marks it as a straggler instead of trying to find out why it is behaving this way. The consequences are serious: the scheduler does not check whether the slowdown is a temporary defect or a permanent, crippling one, and the node is given no more tasks for the entire duration of the computation.
  • 17. PROPOSITION We have tried to create a scheduler that may be able to circumvent the limitations described earlier. In this scheduler, each task is enqueued into the priority queues of the nodes where the data for that task is available. Algorithm:
      1. Retrieve the list of local nodes from the arriving task.
      2. Set n = REPLICATION.FACTOR.
      3. Create n instances of the task, one in the priority queue of each of the n task trackers where the data is local for that task, each with a different priority value.
      4. Tasks are executed according to their priority. A task instance with a priority other than 1 has to skip that many tasks, if and only if those tasks have a higher priority and arrived later than it.
  • 18. PROPOSITIONS [Figure: task instances T1-1, T1-2, T1-3, T2-1, T2-2, T2-3, T4-1, T4-2, T4-3, T5-1, T5-2, T5-3 queued across task trackers TT1, TT2, TT3, TT4 and TT5.]
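Steps 1 to 3 of the proposed algorithm can be sketched in Python as follows. Several details are assumptions, because the slides do not fix them: priorities are handed out as 1, 2, ..., n in the order of the task's local node list, a lower number means a higher priority, and ties are broken by arrival order. The skipping rule of step 4 is not modeled here, and the tracker names and placements are made up for the example.

```python
import heapq
from collections import defaultdict
from itertools import count

REPLICATION_FACTOR = 3      # assumed value of n
_arrival = count()          # global arrival counter used to break ties


def enqueue_task(queues, task, local_nodes, n=REPLICATION_FACTOR):
    """Put one instance of `task` in the queue of each of its n local trackers."""
    arrival = next(_arrival)
    for priority, tracker in enumerate(local_nodes[:n], start=1):
        # Each queue is a heap ordered by (priority, arrival order).
        heapq.heappush(queues[tracker], (priority, arrival, task))


queues = defaultdict(list)
enqueue_task(queues, "T1", ["TT1", "TT3", "TT4"])   # hypothetical placements
enqueue_task(queues, "T2", ["TT3", "TT1", "TT5"])

for tracker in sorted(queues):
    print(tracker, sorted(queues[tracker]))
```

Each task ends up with one instance per replica location, so whichever tracker reaches it first can run it on local data.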
  • 20. EXPERIMENT AND RESULTS
      FCFS simulation: 50 cloudlets, 8 VMs, total time to complete 1165.96 ms
      PQRST simulation: 50 cloudlets, 8 VMs, total time to complete 1120.10 ms