Acharya Institute of Technology, Bangalore
A technical Seminar on,

A Survey of Scheduling Methods
in Hadoop MapReduce Framework
Presented by,
Mahantesh C. Angadi
M.Tech (CNE) First Year
Mahantesh.mtcn.13@acharya.ac.in
Under the Guidance of,
Prof. Amogh P. Kulkarni
AIT, Bangalore
Dept. of ISE, AIT, Bangalore
Agenda
 Motivation
 Introduction
 What is BigData…?

 What is Hadoop…?
 What is HDFS and MapReduce…?
 Challenges in MapReduce
 Literature Survey on Scheduling in MapReduce
 Survey of scheduling methods in the surveyed papers
 Conclusion
 References.
Motivation
“Necessity” is the Mother of All the Inventions…!
 In the early 2000s, Google faced a serious challenge: to organize the world’s information.
 Google designed a new data processing infrastructure.
i. Google File System (GFS)
ii. MapReduce
 In 2004, Google published a paper describing its work to the community.
 Doug Cutting decided to use the technique Google described.
Introduction
 With the Internet now used for almost everything, large volumes of data are generated and need to be analysed.
 Web search engines and social networking sites capture and analyze every user action on their sites to improve site design, detect spam, and find advertising opportunities.
 This data is best processed using distributed computing and parallel processing mechanisms.
 Hadoop MapReduce is one of the most widely used techniques for handling BigData, so here we discuss its different scheduling methods.
What is BigData…?
 Today we live in the data age.
 Every day, we create 2.5 quintillion bytes of data; 90% of this data is unstructured.
 90% of the data in the world today has been created in the last two years alone.
 Cisco estimated that global Internet traffic would reach 4.8 zettabytes a year by the end of 2015.
 Ex. Social Networking Sites, Airlines, Healthcare Departments, Satellites
How is BigData Generated…?

What is Apache Hadoop…?
 Apache Hadoop is an open-source software framework.
 A platform to manage Big Data.
 It’s not only a tool; it’s a framework of tools.
 Most Important Hadoop subprojects:
i. HDFS: Hadoop Distributed File System
ii. MapReduce: A Programming Model

Architecture of Hadoop

Why only Hadoop…?
 It is schema-less, whereas an RDBMS is schema-based.
 Handles large volumes of unstructured data easily.
 Hadoop is designed to run on cheap commodity
hardware.
 Automatically handles data replication and node
failure.
 Moving Computation is cheaper than moving Data.

 Last but not least: it’s free…! (Open source)
What is Hadoop HDFS…?
 Inspired by Google File System.
 It’s a scalable, distributed, reliable file system written in Java for the Hadoop framework.
 An HDFS cluster primarily consists of:
i. NameNode
ii. DataNode
 Stores very large files in blocks across machines in
a large Cluster, deployed on low-cost hardware.
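The block-based storage described above can be sketched roughly as follows. This is a simplified illustration, not HDFS's actual placement policy: the helper names `split_into_blocks` and `place_blocks` are hypothetical, and plain round-robin placement stands in for the rack-aware logic a real NameNode applies.

```python
def split_into_blocks(file_size, block_size):
    """Return the sizes of the blocks a file of file_size bytes is cut into."""
    full, rest = divmod(file_size, block_size)
    return [block_size] * full + ([rest] if rest else [])


def place_blocks(num_blocks, datanodes, replication):
    """Assign each block to `replication` distinct DataNodes.

    Assumption: simple round-robin placement, used here only for illustration.
    """
    if replication > len(datanodes):
        raise ValueError("not enough DataNodes for the requested replication")
    placement = {}
    for block in range(num_blocks):
        placement[block] = [
            datanodes[(block + r) % len(datanodes)] for r in range(replication)
        ]
    return placement


if __name__ == "__main__":
    sizes = split_into_blocks(300, 128)   # illustrative sizes, e.g. in MB
    print(sizes)
    print(place_blocks(len(sizes), ["dn1", "dn2", "dn3", "dn4"], 3))
```

Because every block lives on several DataNodes, the loss of a single machine leaves at least one other copy of each block available.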
What is MapReduce…?
 A software framework for distributed processing of
large data sets on computer clusters.
 First developed by Google.
 Intended to facilitate and simplify the processing of
vast amounts of data in parallel on large clusters of
commodity hardware in a reliable, fault-tolerant
manner.

 Its runtime consists of a JobTracker (master) and TaskTrackers (workers).
A typical Hadoop cluster integrates
MapReduce and HDFS

Example: WordCount
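The WordCount job can be sketched in a few lines of plain Python. This is a single-process imitation of the map, shuffle and reduce steps, not Hadoop code; the function names are chosen here for illustration only.

```python
from collections import defaultdict


def map_phase(line):
    """Emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Group all emitted values by key, as the framework does between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(word, counts):
    """Sum the counts collected for one word."""
    return word, sum(counts)


def word_count(lines):
    """Run map over every line, shuffle by word, then reduce each group."""
    pairs = [pair for line in lines for pair in map_phase(line)]
    return dict(reduce_phase(w, c) for w, c in shuffle(pairs).items())


if __name__ == "__main__":
    print(word_count(["Deer Bear River", "Car Car River", "Deer Car Bear"]))
```

In a real cluster the map calls run in parallel on the nodes holding the input blocks, and the shuffle moves each word's pairs to the reducer responsible for it.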

Challenges of MapReduce
 Job Scheduling problems
As the number and variety of jobs executed across heterogeneous clusters increase, so does the complexity of scheduling them efficiently to meet the required performance objectives.
 Energy Efficiency Problems
Clusters usually contain hundreds or thousands of nodes, so the energy efficiency of MapReduce clusters needs attention.
Literature Survey
Hadoop MapReduce Scheduling methods can be categorized
based on their runtime behavior as follows.
 Adaptive (Dynamic) Algorithms
These methods use the previous, current and/or future (predicted) values of parameters to make scheduling decisions.
Ex. Fair, Capacity, Throughput schedulers, etc.
 Non-adaptive (Static) Algorithms
These methods do not take changes in the environment into consideration and schedule jobs/tasks as per a predefined policy/order.
Ex. FIFO (First In First Out).
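The difference between the two categories can be illustrated with a toy simulation. It assumes a model that is not in the slides: every task takes one time step, the cluster has a fixed number of slots, and all jobs are already queued at time zero. The "fair" variant is a simple round-robin share, not the actual Hadoop Fair Scheduler.

```python
def fifo_schedule(jobs, slots):
    """jobs: dict name -> number of unit-length tasks, in arrival order.

    Static policy: slots always go to the earliest unfinished job first.
    Returns the time step at which each job finishes.
    """
    remaining = dict(jobs)
    finish, t = {}, 0
    while remaining:
        t += 1
        free = slots
        for name in list(remaining):          # strict arrival order
            used = min(free, remaining[name])
            remaining[name] -= used
            free -= used
            if remaining[name] == 0:
                finish[name] = t
                del remaining[name]
            if free == 0:
                break
    return finish


def fair_schedule(jobs, slots):
    """Adaptive policy: each step, share slots round-robin over running jobs."""
    remaining = dict(jobs)
    finish, t = {}, 0
    while remaining:
        t += 1
        free = slots
        assigned = {name: 0 for name in remaining}
        # Hand out one slot at a time until slots or runnable tasks run out.
        while free > 0 and any(assigned[n] < remaining[n] for n in remaining):
            for name in remaining:
                if free > 0 and assigned[name] < remaining[name]:
                    assigned[name] += 1
                    free -= 1
        for name in list(remaining):
            remaining[name] -= assigned[name]
            if remaining[name] == 0:
                finish[name] = t
                del remaining[name]
    return finish


if __name__ == "__main__":
    jobs = {"big": 8, "small": 2}
    print(fifo_schedule(jobs, slots=2))
    print(fair_schedule(jobs, slots=2))
```

With these inputs the small job waits behind the big one under FIFO, while the sharing policy lets it finish early at the cost of a slightly later finish for the big job.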
Survey of Scheduling Methods in the
Surveyed Papers

[1]. Survey of Task Scheduling Methods for
MapReduce Framework in Hadoop.
 This paper surveys various earlier scheduling methods that have been proposed.
 These scheduling methods include:
 First In First Out (FIFO) scheduler
 Fair Scheduler
 Capacity Scheduler
 LATE scheduler
 Deadline constraint scheduler
 Etc.

[1]. Conclusion and future scope

 Performance can be improved by achieving data locality in the MapReduce framework.
 Finally, the authors conclude by discussing how scheduling methods can be applied to heterogeneous Hadoop clusters.
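Data locality, as mentioned above, means running a task on a node that already stores its input block. A minimal sketch of a locality-preferring assignment follows; the function `assign_tasks` and its greedy policy are assumptions made for illustration and do not reproduce any particular Hadoop scheduler.

```python
def assign_tasks(block_locations, free_nodes):
    """Greedily give each task a free node, preferring one that holds its data.

    block_locations: dict task -> set of nodes storing the task's input block.
    free_nodes: list of nodes with a free slot (one task per node here).
    Returns dict task -> (node, is_local).
    """
    free = list(free_nodes)
    assignment = {}
    for task, holders in block_locations.items():
        local = [node for node in free if node in holders]
        if local:
            node = local[0]        # data-local: no network transfer needed
        elif free:
            node = free[0]         # fallback: block must be read remotely
        else:
            break                  # no free slot left
        free.remove(node)
        assignment[task] = (node, node in holders)
    return assignment


if __name__ == "__main__":
    locations = {"t1": {"n1", "n2"}, "t2": {"n1"}, "t3": {"n3"}}
    print(assign_tasks(locations, ["n2", "n1", "n4"]))
```

Here `t3` has no free node holding its block, so it runs remotely; a scheduler that waits briefly for a local slot could avoid that transfer.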

[2]. Perform Wordcount MapReduce Job in Single Node
Apache Hadoop Cluster & Compress Data Using LZO
Algorithm.
 Applications like Yahoo, Facebook, and Twitter have huge volumes of data that must be stored and retrieved as clients access them.
 Storing this data requires a huge database, which increases physical storage needs and complicates the analysis required for business growth.
 The Lempel-Ziv-Oberhumer (LZO) algorithm is used to compress the redundant data.
 LZO was developed with speed as the priority.
[2]. Conclusion and future scope

 LZO compresses the file about 5 times faster than gzip.
 LZO decompresses about 2 times faster than gzip.
 The LZO file is slightly larger than the gzip file after compression.
 A file compressed with either LZO or gzip is much smaller than the original file.
 In future, this can be implemented on heterogeneous multi-node clusters.
[3]. S3: An Efficient Shared Scan Scheduler on MapReduce
Framework.
 To improve performance, multiple jobs operating on a common
data file can be processed as a batch to share the cost of
scanning the file.
 Jobs often do not arrive at the same time.
 S3 operates as follows:
 At any given time, the system may be processing a batch of sub-jobs.
 Other sub-jobs are waiting in the job queue.
 When a new job arrives, its sub-jobs can be aligned with the waiting jobs in the job queue.
 Once the current batch of sub-jobs completes, the next batch of sub-jobs is initiated for processing.
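The benefit of sharing a scan can be sketched as follows. This covers only the core idea of several jobs reading a common input in one pass; it does not model S3's sub-job alignment for jobs that arrive at different times. The job representation (a fold function starting from 0) is an assumption made for this example.

```python
def shared_scan(records, jobs):
    """Run every job in `jobs` over `records` in a single pass.

    jobs maps a job name to a function (accumulator, record) -> accumulator.
    Returns (results, number_of_record_reads).
    """
    state = {name: 0 for name in jobs}
    records_read = 0
    for record in records:
        records_read += 1              # one read serves every job in the batch
        for name, step in jobs.items():
            state[name] = step(state[name], record)
    return state, records_read


def separate_scans(records, jobs):
    """Baseline: each job scans the whole input on its own."""
    state = {}
    records_read = 0
    for name, step in jobs.items():
        acc = 0
        for record in records:
            records_read += 1
            acc = step(acc, record)
        state[name] = acc
    return state, records_read


if __name__ == "__main__":
    data = [3, 1, 4, 1, 5]
    jobs = {
        "sum": lambda acc, r: acc + r,
        "count": lambda acc, r: acc + 1,
        "max": lambda acc, r: max(acc, r),
    }
    print(shared_scan(data, jobs))
    print(separate_scans(data, jobs))
```

Both functions return the same results, but the shared version reads each record once instead of once per job, which is where the scan cost is saved.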

[3]. Conclusion and future scope

 S3 can exploit the sharing of data scan to improve
performance.
 Unlike existing batch-based schedulers, S3 allows jobs to be processed as they arrive, so an arriving job does not need to wait for a long time.
 Further scheduling policies, taking into account factors such as computational resources and job priorities, can be added to S3 to make it more flexible.

[4]. Two Sides of a Coin: Optimizing the Schedule of
MapReduce Jobs to Minimize their Makespan and Improve
Cluster Performance.

 This paper addresses a key challenge: increasing the utilization of MapReduce clusters.
 The goal is to automate the design of a job schedule that minimizes the overall completion time (makespan) of a set of MapReduce jobs.

 A novel abstraction framework and a heuristic called
BalancedPools are discussed.

[4]. Conclusion and future scope

 The authors ran simulations over a realistic workload and observed 15%-38% completion-time improvements.
 This shows that the order in which jobs are executed can have a significant impact on their overall completion time and on cluster resource utilization.
 A future step may include addressing the more general problem of minimizing the makespan of batch workloads.
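The effect of job order on makespan can be shown with a small two-stage model. The sketch assumes each job is one map stage followed by one reduce stage, with each stage using the whole cluster, and orders jobs with Johnson's rule, the classical ordering for a two-stage flow shop. The BalancedPools heuristic itself is not shown, and the job durations are made up for illustration.

```python
def makespan(order):
    """Completion time of the last job for a given order.

    order: list of (map_time, reduce_time) pairs. A job's reduce stage starts
    once its own map stage and the previous job's reduce stage are both done.
    """
    map_done = reduce_done = 0
    for map_time, reduce_time in order:
        map_done += map_time
        reduce_done = max(reduce_done, map_done) + reduce_time
    return reduce_done


def johnson_order(jobs):
    """Order jobs by Johnson's rule for a two-stage flow shop.

    Jobs whose map stage is the shorter one go first, by increasing map time;
    the rest go last, by decreasing reduce time.
    """
    first = sorted((j for j in jobs if j[0] <= j[1]), key=lambda j: j[0])
    last = sorted((j for j in jobs if j[0] > j[1]),
                  key=lambda j: j[1], reverse=True)
    return first + last


if __name__ == "__main__":
    jobs = [(4, 1), (1, 4), (3, 3)]        # hypothetical (map, reduce) times
    print(makespan(jobs))
    print(makespan(johnson_order(jobs)))
```

Starting with a job that has a short map stage lets the reduce stage begin early, so the two stages overlap more and the last job finishes sooner.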

[5]. ThroughputScheduler: Learning to Schedule on
Heterogeneous Hadoop Clusters.
 Presently available schedulers for Hadoop clusters assign tasks to nodes without regard to the capability of the nodes.
 This paper proposes a method that reduces the overall job completion time on a cluster of heterogeneous nodes by actively scheduling tasks on nodes based on optimally matching job requirements to node capabilities.
 Node capabilities are learned by running probe jobs on the cluster.
 A Bayesian active learning scheme is used to learn the resource requirements of jobs on-the-fly.
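Matching job requirements to node capabilities can be illustrated with a deliberately simple cost model. Everything in this sketch is an assumption for illustration: the two-resource model (CPU and I/O), the field names, and the additive time estimate are not taken from the paper, and the learning step that would estimate these numbers is omitted.

```python
def estimated_time(job, node):
    """Estimate task time as CPU work over CPU rate plus I/O work over I/O rate."""
    return job["cpu"] / node["cpu_rate"] + job["io"] / node["io_rate"]


def best_node(job, nodes):
    """Pick the node name with the lowest estimated time for this job."""
    return min(nodes, key=lambda name: estimated_time(job, nodes[name]))


if __name__ == "__main__":
    # Hypothetical node capabilities, e.g. as learned from probe jobs.
    nodes = {
        "fast_cpu": {"cpu_rate": 4.0, "io_rate": 1.0},
        "fast_disk": {"cpu_rate": 1.0, "io_rate": 4.0},
    }
    cpu_heavy = {"cpu": 8, "io": 1}
    io_heavy = {"cpu": 1, "io": 8}
    print(best_node(cpu_heavy, nodes))
    print(best_node(io_heavy, nodes))
```

A capability-blind scheduler would treat both nodes as equal; with the estimates above, each job is steered to the node where its dominant resource is fastest.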
[5]. Conclusion and future scope

 The framework learns both server capabilities and job task
parameters autonomously.
 ThroughputScheduler can reduce total job completion time
by almost 20% compared to the Hadoop Fair Scheduler
and 40% compared to FIFO Scheduler.

 ThroughputScheduler also reduces average mapping time
by 33% compared to either of these schedulers.

Conclusion
Processing data locally takes less time than moving the data across the network, so to improve job performance most of the algorithms work to improve data locality. To meet user expectations, scheduling algorithms must use prediction methods based on the volume of data to be processed and the underlying hardware. As future work, we can consider developing algorithms that schedule jobs efficiently on heterogeneous clusters.

References
[1]. J. Dean and S. Ghemawat, “MapReduce: Simplified Data Processing on
Large Clusters.” Proc. Sixth Symp. Operating System Design and
Implementation, San Francisco, CA, Dec. 6-8, USENIX, 2004.
[2]. Lei Shi, Xiaohui Li, Kian-Lee Tan, “S3: An Efficient Shared Scan Scheduler
on MapReduce Framework.”, School of Computing National University of
Singapore, comp.nus.edu.sg, 2012.
[3]. Dr. Umesh Bellur, Nidhi Tiwari, “Scheduling and Energy Efficiency
Improvement Techniques for Hadoop MapReduce: State of Art and Directions
for Future Research.”, Department of Computer Science and Engineering
Indian Institute of Technology, Mumbai.
[4]. Abhishek Verma, Ludmila Cherkasova, Roy H. Campbell, “Two Sides of a
Coin: Optimizing the Schedule of MapReduce Jobs to Minimize Their Makespan
and Improve Cluster Performance.”, HP Labs. Supported in part by Air Force
Research grant FA8750-11-2-0084.
[5]. Nandan Mirajkar, Sandeep Bhujbal, Aaradhana Deshmukh, “Perform
Wordcount MapReduce Job in Single Node Apache Hadoop Cluster and
Compress Data Using Lempel-Ziv-Oberhumer (LZO) Algorithm.”, Department
of Advanced Software and Computing Technologies IGNOU –I2IT, Centre of
Excellence for Advanced Education and Research Pune, India.
[6]. Shouvik Bardhan, Daniel A. Menascé, “The Anatomy of
MapReduce Jobs, Scheduling, and Performance Challenges”,
Proceedings of the 2013 Conference of the Computer Measurement
Group, San Diego, CA, November 5-8, 2013.
[7]. Shekhar Gupta, Christian Fritz, Bob Price, Roger Hoover, and
Johan de Kleer, “ThroughputScheduler: Learning to Schedule on
Heterogeneous Hadoop Clusters”, USENIX Association, 10th
International Conference on Autonomic Computing (ICAC 2013).
[8]. Nilam Kadale, U. A. Mande, “Survey of Task Scheduling Method
for MapReduce Framework in Hadoop.”, 2nd National Conference on
Innovative Paradigms in Engineering & Technology (NCIPET 2013).
[9]. Tom White, “Hadoop: The Definitive Guide.” 2nd edition, O’Reilly
Media, Sebastopol, CA 95472. October 2010.
[10]. J. Jeffrey Hanson, “An Introduction to the Hadoop Distributed
File System.” IBM DeveloperWorks, 2011.

Thank You All…!!!

Influencing policy (training slides from Fast Track Impact)Influencing policy (training slides from Fast Track Impact)
Influencing policy (training slides from Fast Track Impact)Mark Reed
 
Science 7 Quarter 4 Module 2: Natural Resources.pptx
Science 7 Quarter 4 Module 2: Natural Resources.pptxScience 7 Quarter 4 Module 2: Natural Resources.pptx
Science 7 Quarter 4 Module 2: Natural Resources.pptxMaryGraceBautista27
 
Field Attribute Index Feature in Odoo 17
Field Attribute Index Feature in Odoo 17Field Attribute Index Feature in Odoo 17
Field Attribute Index Feature in Odoo 17Celine George
 
AUDIENCE THEORY -CULTIVATION THEORY - GERBNER.pptx
AUDIENCE THEORY -CULTIVATION THEORY -  GERBNER.pptxAUDIENCE THEORY -CULTIVATION THEORY -  GERBNER.pptx
AUDIENCE THEORY -CULTIVATION THEORY - GERBNER.pptxiammrhaywood
 
AMERICAN LANGUAGE HUB_Level2_Student'sBook_Answerkey.pdf
AMERICAN LANGUAGE HUB_Level2_Student'sBook_Answerkey.pdfAMERICAN LANGUAGE HUB_Level2_Student'sBook_Answerkey.pdf
AMERICAN LANGUAGE HUB_Level2_Student'sBook_Answerkey.pdfphamnguyenenglishnb
 
How to do quick user assign in kanban in Odoo 17 ERP
How to do quick user assign in kanban in Odoo 17 ERPHow to do quick user assign in kanban in Odoo 17 ERP
How to do quick user assign in kanban in Odoo 17 ERPCeline George
 
Global Lehigh Strategic Initiatives (without descriptions)
Global Lehigh Strategic Initiatives (without descriptions)Global Lehigh Strategic Initiatives (without descriptions)
Global Lehigh Strategic Initiatives (without descriptions)cama23
 
ENGLISH6-Q4-W3.pptxqurter our high choom
ENGLISH6-Q4-W3.pptxqurter our high choomENGLISH6-Q4-W3.pptxqurter our high choom
ENGLISH6-Q4-W3.pptxqurter our high choomnelietumpap1
 
Judging the Relevance and worth of ideas part 2.pptx
Judging the Relevance  and worth of ideas part 2.pptxJudging the Relevance  and worth of ideas part 2.pptx
Judging the Relevance and worth of ideas part 2.pptxSherlyMaeNeri
 
Choosing the Right CBSE School A Comprehensive Guide for Parents
Choosing the Right CBSE School A Comprehensive Guide for ParentsChoosing the Right CBSE School A Comprehensive Guide for Parents
Choosing the Right CBSE School A Comprehensive Guide for Parentsnavabharathschool99
 

Último (20)

Like-prefer-love -hate+verb+ing & silent letters & citizenship text.pdf
Like-prefer-love -hate+verb+ing & silent letters & citizenship text.pdfLike-prefer-love -hate+verb+ing & silent letters & citizenship text.pdf
Like-prefer-love -hate+verb+ing & silent letters & citizenship text.pdf
 
INTRODUCTION TO CATHOLIC CHRISTOLOGY.pptx
INTRODUCTION TO CATHOLIC CHRISTOLOGY.pptxINTRODUCTION TO CATHOLIC CHRISTOLOGY.pptx
INTRODUCTION TO CATHOLIC CHRISTOLOGY.pptx
 
FILIPINO PSYCHology sikolohiyang pilipino
FILIPINO PSYCHology sikolohiyang pilipinoFILIPINO PSYCHology sikolohiyang pilipino
FILIPINO PSYCHology sikolohiyang pilipino
 
FINALS_OF_LEFT_ON_C'N_EL_DORADO_2024.pptx
FINALS_OF_LEFT_ON_C'N_EL_DORADO_2024.pptxFINALS_OF_LEFT_ON_C'N_EL_DORADO_2024.pptx
FINALS_OF_LEFT_ON_C'N_EL_DORADO_2024.pptx
 
Inclusivity Essentials_ Creating Accessible Websites for Nonprofits .pdf
Inclusivity Essentials_ Creating Accessible Websites for Nonprofits .pdfInclusivity Essentials_ Creating Accessible Websites for Nonprofits .pdf
Inclusivity Essentials_ Creating Accessible Websites for Nonprofits .pdf
 
ENGLISH 7_Q4_LESSON 2_ Employing a Variety of Strategies for Effective Interp...
ENGLISH 7_Q4_LESSON 2_ Employing a Variety of Strategies for Effective Interp...ENGLISH 7_Q4_LESSON 2_ Employing a Variety of Strategies for Effective Interp...
ENGLISH 7_Q4_LESSON 2_ Employing a Variety of Strategies for Effective Interp...
 
ACC 2024 Chronicles. Cardiology. Exam.pdf
ACC 2024 Chronicles. Cardiology. Exam.pdfACC 2024 Chronicles. Cardiology. Exam.pdf
ACC 2024 Chronicles. Cardiology. Exam.pdf
 
Earth Day Presentation wow hello nice great
Earth Day Presentation wow hello nice greatEarth Day Presentation wow hello nice great
Earth Day Presentation wow hello nice great
 
Proudly South Africa powerpoint Thorisha.pptx
Proudly South Africa powerpoint Thorisha.pptxProudly South Africa powerpoint Thorisha.pptx
Proudly South Africa powerpoint Thorisha.pptx
 
Influencing policy (training slides from Fast Track Impact)
Influencing policy (training slides from Fast Track Impact)Influencing policy (training slides from Fast Track Impact)
Influencing policy (training slides from Fast Track Impact)
 
Science 7 Quarter 4 Module 2: Natural Resources.pptx
Science 7 Quarter 4 Module 2: Natural Resources.pptxScience 7 Quarter 4 Module 2: Natural Resources.pptx
Science 7 Quarter 4 Module 2: Natural Resources.pptx
 
Field Attribute Index Feature in Odoo 17
Field Attribute Index Feature in Odoo 17Field Attribute Index Feature in Odoo 17
Field Attribute Index Feature in Odoo 17
 
AUDIENCE THEORY -CULTIVATION THEORY - GERBNER.pptx
AUDIENCE THEORY -CULTIVATION THEORY -  GERBNER.pptxAUDIENCE THEORY -CULTIVATION THEORY -  GERBNER.pptx
AUDIENCE THEORY -CULTIVATION THEORY - GERBNER.pptx
 
AMERICAN LANGUAGE HUB_Level2_Student'sBook_Answerkey.pdf
AMERICAN LANGUAGE HUB_Level2_Student'sBook_Answerkey.pdfAMERICAN LANGUAGE HUB_Level2_Student'sBook_Answerkey.pdf
AMERICAN LANGUAGE HUB_Level2_Student'sBook_Answerkey.pdf
 
YOUVE GOT EMAIL_FINALS_EL_DORADO_2024.pptx
YOUVE GOT EMAIL_FINALS_EL_DORADO_2024.pptxYOUVE GOT EMAIL_FINALS_EL_DORADO_2024.pptx
YOUVE GOT EMAIL_FINALS_EL_DORADO_2024.pptx
 
How to do quick user assign in kanban in Odoo 17 ERP
How to do quick user assign in kanban in Odoo 17 ERPHow to do quick user assign in kanban in Odoo 17 ERP
How to do quick user assign in kanban in Odoo 17 ERP
 
Global Lehigh Strategic Initiatives (without descriptions)
Global Lehigh Strategic Initiatives (without descriptions)Global Lehigh Strategic Initiatives (without descriptions)
Global Lehigh Strategic Initiatives (without descriptions)
 
ENGLISH6-Q4-W3.pptxqurter our high choom
ENGLISH6-Q4-W3.pptxqurter our high choomENGLISH6-Q4-W3.pptxqurter our high choom
ENGLISH6-Q4-W3.pptxqurter our high choom
 
Judging the Relevance and worth of ideas part 2.pptx
Judging the Relevance  and worth of ideas part 2.pptxJudging the Relevance  and worth of ideas part 2.pptx
Judging the Relevance and worth of ideas part 2.pptx
 
Choosing the Right CBSE School A Comprehensive Guide for Parents
Choosing the Right CBSE School A Comprehensive Guide for ParentsChoosing the Right CBSE School A Comprehensive Guide for Parents
Choosing the Right CBSE School A Comprehensive Guide for Parents
 

BIGDATA- Survey on Scheduling Methods in Hadoop MapReduce

  • 1. Acharya Institute of Technology, Bangalore
      A technical seminar on: A Survey of Scheduling Methods in Hadoop MapReduce Framework
      Presented by: Mahantesh C. Angadi, M.Tech (CNE), First Year, Mahantesh.mtcn.13@acharya.ac.in
      Under the guidance of: Prof. Amogh P. Kulkarni, AIT, Bangalore
      Dept. of ISE, AIT, Bangalore
  • 2. Agenda
      • Motivation
      • Introduction
      • What is Big Data?
      • What is Hadoop?
      • What are HDFS and MapReduce?
      • Challenges in MapReduce
      • Literature survey on scheduling in MapReduce
      • Survey of scheduling methods proposed in recent papers
      • Conclusion
      • References
  • 3. Motivation
      “Necessity is the mother of all inventions.”
      • In the 2000s, Google faced a serious challenge: organizing the world’s information.
      • Google designed a new data processing infrastructure:
          i. Google File System (GFS)
          ii. MapReduce
      • In 2004, Google published a paper describing its work to the community.
      • Doug Cutting decided to use the technique Google described.
  • 4. Introduction
      • With the Internet now used for nearly everything, a large amount of data is generated and needs to be analysed.
      • Web search engines and social networking sites capture and analyze every user action on their sites to improve site design, detect spam, and find advertising opportunities.
      • This data is best processed using distributed computing and parallel processing mechanisms.
      • Hadoop MapReduce is one of the most widely used techniques for handling Big Data, so here we discuss its different scheduling methods.
  • 5. What is Big Data?
      • Today we live in the data age.
      • Every day we create 2.5 quintillion bytes of data; 90% of this data is unstructured.
      • 90% of the data in the world today has been created in the last two years alone.
      • Cisco estimates that global Internet traffic will reach 4.8 zettabytes a year by the end of 2015.
      • Examples: social networking sites, airlines, healthcare departments, satellites.
  • 6. How Is Big Data Generated?
  • 7. What is Apache Hadoop?
      • Apache Hadoop is an open-source software framework.
      • It is a platform for managing Big Data.
      • It is not just a tool; it is a framework of tools.
      • Most important Hadoop subprojects:
          i. HDFS: Hadoop Distributed File System
          ii. MapReduce: a programming model
  • 8. Architecture of Hadoop
  • 9. Why only Hadoop?
      • It is schema-less, whereas an RDBMS is schema-based.
      • It handles large volumes of unstructured data easily.
      • Hadoop is designed to run on cheap commodity hardware.
      • It automatically handles data replication and node failure.
      • Moving computation is cheaper than moving data.
      • Last but not least: it is free (open source).
  • 10. What is Hadoop HDFS?
      • Inspired by the Google File System.
      • It is a scalable, distributed, reliable file system written in Java for the Hadoop framework.
      • An HDFS cluster primarily consists of:
          i. NameNode
          ii. DataNode
      • It stores very large files in blocks across the machines of a large cluster, deployed on low-cost hardware.
  • 11. What is MapReduce?
      • A software framework for distributed processing of large data sets on computer clusters.
      • First developed by Google.
      • Intended to facilitate and simplify the processing of vast amounts of data in parallel on large clusters of commodity hardware in a reliable, fault-tolerant manner.
      • It includes the JobTracker and TaskTracker.
  • 12. A typical Hadoop cluster integrates MapReduce and HDFS
  • 13. Example: WordCount
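
The WordCount slide survives here only as a title, so the following is a minimal sketch of the idea in plain Python rather than the Hadoop Java API. The function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) are illustrative assumptions, and everything runs in one process instead of across a cluster.

```python
from collections import defaultdict


def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Shuffle: group all emitted values by their key (the word)."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(word, counts):
    """Reduce: sum the counts collected for one word."""
    return word, sum(counts)


def word_count(lines):
    """Run map, shuffle, and reduce in sequence over the input lines."""
    pairs = [pair for line in lines for pair in map_phase(line)]
    return dict(reduce_phase(w, c) for w, c in shuffle(pairs).items())


if __name__ == "__main__":
    print(word_count(["Deer Bear River", "Car Car River", "Deer Car Bear"]))
```

In a real cluster the map calls would run on the nodes holding the input blocks and the shuffle would move data over the network; the sketch only shows the data flow.
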
  • 14. Challenges of MapReduce
      • Job scheduling problems: as the number and variety of jobs to be executed across heterogeneous clusters increase, so does the complexity of scheduling them efficiently to meet the required performance objectives.
      • Energy efficiency problems: clusters usually contain hundreds or thousands of nodes, so the energy efficiency of MapReduce clusters needs to be examined.
  • 15. Literature Survey
      Hadoop MapReduce scheduling methods can be categorized by their runtime behavior as follows.
      • Adaptive (dynamic) algorithms: these methods use the previous, current, and/or future values of parameters to make scheduling decisions. Examples: Fair scheduler, Capacity scheduler, ThroughputScheduler.
      • Non-adaptive (static) algorithms: these methods do not take changes in the environment into account and schedule jobs/tasks according to a predefined policy or order. Example: FIFO (First In First Out).
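
To make the contrast between a static FIFO policy and a sharing policy concrete, here is a toy simulation. It is a sketch under strong assumptions that are not in the slides: all jobs arrive at time 0, every task takes exactly one time step, and "fair" means handing out slots round-robin. The real Hadoop Fair Scheduler is considerably more elaborate, and the name `simulate` is hypothetical.

```python
def simulate(jobs, slots, policy):
    """Return {job: finish_step} for unit-length tasks on a fixed slot count.

    jobs   -- dict of job name -> number of tasks, in arrival order
    slots  -- task slots available in every time step (must be >= 1)
    policy -- "fifo" (earliest job takes all it can) or "fair" (round-robin)
    """
    if slots < 1:
        raise ValueError("slots must be at least 1")
    remaining = dict(jobs)
    finish = {}
    step = 0
    while remaining:
        step += 1
        free = slots
        if policy == "fifo":
            # Earlier jobs grab slots first; later jobs get only the leftovers.
            for name in remaining:
                run = min(free, remaining[name])
                remaining[name] -= run
                free -= run
        elif policy == "fair":
            # Hand out one slot at a time, cycling over the unfinished jobs.
            while free > 0 and any(remaining.values()):
                for name in remaining:
                    if free > 0 and remaining[name] > 0:
                        remaining[name] -= 1
                        free -= 1
        else:
            raise ValueError("unknown policy: " + policy)
        for name in [n for n, left in remaining.items() if left == 0]:
            finish[name] = step
            del remaining[name]
    return finish


if __name__ == "__main__":
    workload = {"big": 8, "small": 2}
    print("fifo:", simulate(workload, slots=2, policy="fifo"))
    print("fair:", simulate(workload, slots=2, policy="fair"))
```

With this workload the small job waits behind the big one under FIFO but finishes early under the round-robin policy, at the cost of a slightly later finish for the big job.
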
  • 16. Survey of Scheduling Methods on Proposed Papers
  • 17. [1]. Survey of Task Scheduling Methods for MapReduce Framework in Hadoop.
      • This paper surveys various previously proposed scheduling methods.
      • These include the First In First Out (FIFO) scheduler, Fair Scheduler, Capacity Scheduler, LATE scheduler, and deadline constraint scheduler, among others.
  • 18. [1]. Conclusion and future scope
      • Performance can be improved by achieving data locality in the MapReduce framework.
      • The authors conclude by considering how these scheduling methods apply to heterogeneous Hadoop clusters.
  • 19. [2]. Perform Wordcount MapReduce Job in Single Node Apache Hadoop Cluster & Compress Data Using LZO Algorithm.
      • Applications such as Yahoo, Facebook, and Twitter hold huge amounts of data that must be stored and retrieved on client request.
      • Storing this data requires a very large database, which increases physical storage needs and complicates the analysis needed for business growth.
      • The Lempel-Ziv-Oberhumer (LZO) algorithm is used to compress the redundant data.
      • The LZO algorithm was designed with speed as the priority.
  • 20. [2]. Conclusion and future scope
      • LZO compresses a file 5 times faster than gzip.
      • LZO decompresses a file 2 times faster than gzip.
      • After compression, the LZO file is slightly larger than the gzip file.
      • A file compressed with either LZO or gzip is much smaller than the original file.
      • In future, this can be implemented on heterogeneous multi-node clusters.
  • 21. [3]. S3: An Efficient Shared Scan Scheduler on MapReduce Framework.
      • To improve performance, multiple jobs operating on a common data file can be processed as a batch to share the cost of scanning the file.
      • Jobs often do not arrive at the same time.
      • S3 operates as follows:
          • At any given time, the system may be processing a batch of sub-jobs.
          • Other sub-jobs are waiting in the job queue.
          • When a new job arrives, its sub-jobs can be aligned with the waiting sub-jobs in the job queue.
          • Once the current batch of sub-jobs completes, the next batch of sub-jobs is started.
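
The batching steps on this slide can be illustrated with a small simulation. This is a sketch, not the S3 implementation: it assumes the input file is cut into equal segments, that one segment is scanned per step in a circular order, and that each job has one sub-job per segment. The function name `shared_scan` and its parameters are hypothetical.

```python
def shared_scan(num_segments, arrivals):
    """Simulate jobs sharing one circular scan over a segmented file.

    num_segments -- number of segments the input file is split into
    arrivals     -- dict of job name -> step at which the job arrives
    Returns (segment_reads, {job: finish_step}).
    """
    pending = {}   # job -> set of segments it still has to process
    finish = {}
    reads = 0
    step = 0
    while len(finish) < len(arrivals):
        # Admit every job that has arrived by now; it joins the next batch.
        for job, arrived_at in arrivals.items():
            if arrived_at <= step and job not in pending and job not in finish:
                pending[job] = set(range(num_segments))
        if pending:
            segment = step % num_segments
            reads += 1  # one physical read serves every waiting sub-job
            for job in list(pending):
                pending[job].discard(segment)
                if not pending[job]:
                    finish[job] = step + 1
                    del pending[job]
        step += 1
    return reads, finish


if __name__ == "__main__":
    reads, finish = shared_scan(4, {"A": 0, "B": 2})
    print(reads, finish)
```

In this example job B arrives halfway through job A's scan and shares the remaining two segment reads, so the file is read 6 segment-times in total instead of the 8 that two independent scans would need, and B never waits for A to finish.
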
  • 22. [3]. Conclusion and future scope
      • S3 can exploit shared data scans to improve performance.
      • Unlike existing batch-based schedulers, S3 allows jobs to be processed as they arrive, so an arriving job does not need to wait for a long time.
      • More scheduling policies, such as those based on computational resources and job priorities, can be added to S3 to make it more flexible.
  • 23. [4]. Two Sides of a Coin: Optimizing the Schedule of MapReduce Jobs to Minimize their Makespan and Improve Cluster Performance.
      • This paper addresses a key challenge: increasing the utilization of MapReduce clusters.
      • The goal is to automate the design of a job schedule that minimizes the completion time (makespan) of MapReduce jobs.
      • A novel abstraction framework and a heuristic called BalancedPools are discussed.
  • 24. [4]. Conclusion and future scope
      • Simulations over a realistic workload showed completion-time improvements of 15%-38%.
      • This shows that the order in which jobs are executed can have a significant impact on their overall completion time and on cluster resource utilization.
      • A future step may include addressing the more general problem of minimizing the deadline of batch workloads.
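
A small worked example shows why job order matters for the makespan. The slides do not say which ordering rule the paper uses, so this sketch substitutes Johnson's rule, a classic result for two-stage flow shops. The model is an assumption: each job is a `(map_time, reduce_time)` pair, the map stage and the reduce stage each run one job at a time, and a job's reduce stage starts only after its own map stage and the previous job's reduce stage have finished.

```python
def makespan(order):
    """Total completion time of jobs run in the given order.

    order -- list of (map_time, reduce_time) pairs
    """
    map_end = 0
    reduce_end = 0
    for map_time, reduce_time in order:
        map_end += map_time
        # Reduce waits for this job's map and for the previous job's reduce.
        reduce_end = max(reduce_end, map_end) + reduce_time
    return reduce_end


def johnson_order(jobs):
    """Order jobs by Johnson's rule for a two-stage pipeline."""
    # Jobs with a short map stage go first, shortest map first.
    first = sorted((j for j in jobs if j[0] <= j[1]), key=lambda j: j[0])
    # Jobs with a short reduce stage go last, longest reduce first.
    last = sorted((j for j in jobs if j[0] > j[1]), key=lambda j: j[1], reverse=True)
    return first + last


if __name__ == "__main__":
    jobs = [(20, 2), (2, 20)]
    print("as submitted:", makespan(jobs))
    print("reordered:   ", makespan(johnson_order(jobs)))
```

For these two jobs the submitted order finishes at time 42, while the reordered schedule finishes at time 24, because the long reduce of the second job overlaps with the long map of the first.
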
  • 25. [5]. ThroughputScheduler: Learning to Schedule on Heterogeneous Hadoop Clusters.
      • Currently available schedulers for Hadoop clusters assign tasks to nodes without regard to the capabilities of the nodes.
      • This paper proposes a method that reduces the overall job completion time on a cluster of heterogeneous nodes by actively scheduling tasks on nodes so that job requirements are optimally matched to node capabilities.
      • Node capabilities are learned by running probe jobs on the cluster.
      • A Bayesian active learning scheme is used to learn the resource requirements of jobs on the fly.
  • 26. [5]. Conclusion and future scope
      • The framework learns both server capabilities and job task parameters autonomously.
      • ThroughputScheduler can reduce total job completion time by almost 20% compared to the Hadoop Fair Scheduler and 40% compared to the FIFO scheduler.
      • ThroughputScheduler also reduces average mapping time by 33% compared to either of these schedulers.
  • 27. Conclusion
      Processing data locally takes less time than moving it across the network, so most of the algorithms try to improve data locality in order to improve job performance. To meet user expectations, scheduling algorithms must use prediction methods based on the volume of data to be processed and the underlying hardware. As future work, we can consider developing algorithms that schedule jobs efficiently on heterogeneous clusters.
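
The data-locality point can be sketched as a simple task-picking rule: when a node has a free slot, prefer a pending task whose input block already has a replica on that node. This is an illustrative sketch only, not the logic of any particular Hadoop scheduler; `pick_task`, its parameters, and the sample block map are hypothetical.

```python
def pick_task(free_node, pending_tasks, block_locations):
    """Choose a task for a node that has a free slot.

    free_node       -- name of the node asking for work
    pending_tasks   -- list of (task_id, block_id) pairs in queue order
    block_locations -- dict of block_id -> set of nodes holding a replica
    Returns (task_id, is_local), or (None, False) if nothing is pending.
    """
    # First choice: a task whose input block is stored on the free node.
    for task_id, block_id in pending_tasks:
        if free_node in block_locations.get(block_id, ()):
            return task_id, True
    # Fallback: run the oldest task remotely and pay the network transfer.
    if pending_tasks:
        return pending_tasks[0][0], False
    return None, False


if __name__ == "__main__":
    locations = {"b1": {"node1", "node2"}, "b2": {"node3"}}
    queue = [("t1", "b1"), ("t2", "b2")]
    print(pick_task("node3", queue, locations))
    print(pick_task("node4", queue, locations))
```

A production scheduler would also weigh rack locality and how long a task has already waited; the sketch keeps only the node-local preference.
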
  • 28. References
      [1]. J. Dean and S. Ghemawat, “MapReduce: Simplified Data Processing on Large Clusters.” Proc. Sixth Symp. Operating System Design and Implementation, San Francisco, CA, Dec. 6-8, USENIX, 2004.
      [2]. Lei Shi, Xiaohui Li, Kian-Lee Tan, “S3: An Efficient Shared Scan Scheduler on MapReduce Framework.” School of Computing, National University of Singapore, comp.nus.edu.sg, 2012.
      [3]. Dr. Umesh Bellur, Nidhi Tiwari, “Scheduling and Energy Efficiency Improvement Techniques for Hadoop MapReduce: State of Art and Directions for Future Research.” Department of Computer Science and Engineering, Indian Institute of Technology, Mumbai.
      [4]. Abhishek Verma, Ludmila Cherkasova, Roy H. Campbell, “Two Sides of a Coin: Optimizing the Schedule of MapReduce Jobs to Minimize Their Makespan and Improve Cluster Performance.” HP Labs. Supported in part by Air Force Research grant FA8750-11-2-0084.
      [5]. Nandan Mirajkar, Sandeep Bhujbal, Aaradhana Deshmukh, “Perform Wordcount MapReduce Job in Single Node Apache Hadoop Cluster and Compress Data Using Lempel-Ziv-Oberhumer (LZO) Algorithm.” Department of Advanced Software and Computing Technologies, IGNOU-I2IT Centre of Excellence for Advanced Education and Research, Pune, India.
  • 29. References continued
      [6]. Shouvik Bardhan, Daniel A. Menascé, “The Anatomy of MapReduce Jobs, Scheduling, and Performance Challenges.” Proceedings of the 2013 Conference of the Computer Measurement Group, San Diego, CA, November 5-8, 2013.
      [7]. Shekhar Gupta, Christian Fritz, Bob Price, Roger Hoover, and Johan de Kleer, “ThroughputScheduler: Learning to Schedule on Heterogeneous Hadoop Clusters.” USENIX Association, 10th International Conference on Autonomic Computing (ICAC 2013).
      [8]. Nilam Kadale, U. A. Mande, “Survey of Task Scheduling Method for MapReduce Framework in Hadoop.” 2nd National Conference on Innovative Paradigms in Engineering & Technology (NCIPET 2013).
      [9]. Tom White, “Hadoop: The Definitive Guide.” 2nd edition, O’Reilly, Sebastopol, CA, October 2010.
      [10]. J. Jeffrey Hanson, “An Introduction to the Hadoop Distributed File System.” IBM developerWorks, 2011.
  • 30. Thank you all!