SlideShare uma empresa Scribd logo
1 de 11
SHILPA KRISHNA
RESEARCH SCHOLAR
Input
Map Tasks
Reduce Tasks
Output
Map()
Map()
Map()
Reduce()
Reduce()
 The MapReduce is one of the main components of the Hadoop Ecosystem.
 MapReduce is designed to process a large amount of data in parallel by dividing
the work into some smaller and independent tasks.
 MapReduce programs take input as a list and convert to the output as a list also.
 The Map takes a set of keys and values as input. It may be in a structured
or unstructured form
 The Keys are the reference of input files and Values are the dataset
 The task is applied on every input value
 The Reducer takes the key-value pair which is created by the mapper as
input
 The key-value pairs are sorted by the key elements
 In the Reducer, we perform the sorting, aggregation or summation type
jobs
 The given inputs are processed
by the user-defined methods.
All different business logics
are working on the mapper
section. Mapper generates
intermediate data and Reducer
takes them as input. The data
are processed by user-defined
function in the Reducer
section. The final output is
stored in HDFS (Hadoop
Distributed File System).
 When Mapper output is collected it is partitioned which means that it will be
written to the output specified by the partitioner
 Partitioning is responsible for dividing up the intermediate key space and
assigning intermediate key-value pairs to reducers
 It assigns approximately the same number of keys to each reducer
 Combiners are an optimization in MapReduce that allow for local
aggregation before the shuffle and sort phase
 If a Combiner is used then the map key-value pairs are not immediately
written to the output. Instead they will be collected in lists, one list per
each key-value
 Let us take a real-world example to comprehend the power of
MapReduce
 Twitter receives around 500 million tweets per day which is nearly 3000
tweets per second
INPUT
TWITTER
DATA
MAPREDUCE
T
O
K
E
N
I
Z
E
F
I
L
T
E
R
C
O
U
N
T
A
G
G
R
E
G
A
T
E
C
O
U
N
T
E
R
S DATA
SOURCE
ADAPTER
HADOOP
RELATED
DATABASES
What is MapReduce ?

Mais conteúdo relacionado

Mais procurados

Mapreduce total order sorting technique
Mapreduce total order sorting techniqueMapreduce total order sorting technique
Mapreduce total order sorting technique
Uday Vakalapudi
 
Spatial Data Integrator - Software Presentation and Use Cases
Spatial Data Integrator - Software Presentation and Use CasesSpatial Data Integrator - Software Presentation and Use Cases
Spatial Data Integrator - Software Presentation and Use Cases
mathieuraj
 

Mais procurados (18)

Mapreduce total order sorting technique
Mapreduce total order sorting techniqueMapreduce total order sorting technique
Mapreduce total order sorting technique
 
Computing Scientometrics in Large-Scale Academic Search Engines with MapReduce
Computing Scientometrics in Large-Scale Academic Search Engines with MapReduceComputing Scientometrics in Large-Scale Academic Search Engines with MapReduce
Computing Scientometrics in Large-Scale Academic Search Engines with MapReduce
 
Managing Data Synchronization Between ArcSDE and POSTGIS using FME
Managing Data Synchronization Between ArcSDE and POSTGIS using FMEManaging Data Synchronization Between ArcSDE and POSTGIS using FME
Managing Data Synchronization Between ArcSDE and POSTGIS using FME
 
Leveraging Map Reduce With Hadoop for Weather Data Analytics
Leveraging Map Reduce With Hadoop for Weather Data Analytics Leveraging Map Reduce With Hadoop for Weather Data Analytics
Leveraging Map Reduce With Hadoop for Weather Data Analytics
 
Benchmarking tool for graph algorithms
Benchmarking tool for graph algorithmsBenchmarking tool for graph algorithms
Benchmarking tool for graph algorithms
 
Developing a Map Reduce Application
Developing a Map Reduce ApplicationDeveloping a Map Reduce Application
Developing a Map Reduce Application
 
Weather Data Analytics Using Hadoop
Weather Data Analytics Using HadoopWeather Data Analytics Using Hadoop
Weather Data Analytics Using Hadoop
 
Importing Data From Other Statistical Packages
Importing Data From Other Statistical PackagesImporting Data From Other Statistical Packages
Importing Data From Other Statistical Packages
 
MAP REDUCE SLIDESHARE
MAP REDUCE SLIDESHAREMAP REDUCE SLIDESHARE
MAP REDUCE SLIDESHARE
 
Map reduce advantages over parallel databases
Map reduce advantages over parallel databases Map reduce advantages over parallel databases
Map reduce advantages over parallel databases
 
Big Data and Hadoop with MapReduce Paradigms
Big Data and Hadoop with MapReduce ParadigmsBig Data and Hadoop with MapReduce Paradigms
Big Data and Hadoop with MapReduce Paradigms
 
Map algebra
Map algebraMap algebra
Map algebra
 
Spatial Data Integrator - Software Presentation and Use Cases
Spatial Data Integrator - Software Presentation and Use CasesSpatial Data Integrator - Software Presentation and Use Cases
Spatial Data Integrator - Software Presentation and Use Cases
 
WELCOME TO BIG DATA TRANING
WELCOME TO BIG DATA TRANINGWELCOME TO BIG DATA TRANING
WELCOME TO BIG DATA TRANING
 
Sawmill - Integrating R and Large Data Clouds
Sawmill - Integrating R and Large Data CloudsSawmill - Integrating R and Large Data Clouds
Sawmill - Integrating R and Large Data Clouds
 
Modeling the water food-energy nexus in the Arab world: The NASA land informa...
Modeling the water food-energy nexus in the Arab world: The NASA land informa...Modeling the water food-energy nexus in the Arab world: The NASA land informa...
Modeling the water food-energy nexus in the Arab world: The NASA land informa...
 
Zenith it-hadoop-training
Zenith it-hadoop-trainingZenith it-hadoop-training
Zenith it-hadoop-training
 
Informatica perf points
Informatica perf pointsInformatica perf points
Informatica perf points
 

Semelhante a What is MapReduce ?

Hadoop eco system with mapreduce hive and pig
Hadoop eco system with mapreduce hive and pigHadoop eco system with mapreduce hive and pig
Hadoop eco system with mapreduce hive and pig
KhanKhaja1
 
Generating Frequent Itemsets by RElim on Hadoop Clusters
Generating Frequent Itemsets by RElim on Hadoop ClustersGenerating Frequent Itemsets by RElim on Hadoop Clusters
Generating Frequent Itemsets by RElim on Hadoop Clusters
BRNSSPublicationHubI
 

Semelhante a What is MapReduce ? (20)

MapReduce.pptx
MapReduce.pptxMapReduce.pptx
MapReduce.pptx
 
Hadoop Map Reduce
Hadoop Map ReduceHadoop Map Reduce
Hadoop Map Reduce
 
Map Reduce
Map ReduceMap Reduce
Map Reduce
 
MapReduce-Notes.pdf
MapReduce-Notes.pdfMapReduce-Notes.pdf
MapReduce-Notes.pdf
 
MAP REDUCE IN DATA SCIENCE.pptx
MAP REDUCE IN DATA SCIENCE.pptxMAP REDUCE IN DATA SCIENCE.pptx
MAP REDUCE IN DATA SCIENCE.pptx
 
Map Reduce
Map ReduceMap Reduce
Map Reduce
 
Applying stratosphere for big data analytics
Applying stratosphere for big data analyticsApplying stratosphere for big data analytics
Applying stratosphere for big data analytics
 
Map reduce presentation
Map reduce presentationMap reduce presentation
Map reduce presentation
 
Introduction to MapReduce
Introduction to MapReduceIntroduction to MapReduce
Introduction to MapReduce
 
Session 19 - MapReduce
Session 19  - MapReduce Session 19  - MapReduce
Session 19 - MapReduce
 
Hadoop eco system with mapreduce hive and pig
Hadoop eco system with mapreduce hive and pigHadoop eco system with mapreduce hive and pig
Hadoop eco system with mapreduce hive and pig
 
2 mapreduce-model-principles
2 mapreduce-model-principles2 mapreduce-model-principles
2 mapreduce-model-principles
 
Lecture 2 part 3
Lecture 2 part 3Lecture 2 part 3
Lecture 2 part 3
 
Unit 2
Unit 2Unit 2
Unit 2
 
Introduction to the Map-Reduce framework.pdf
Introduction to the Map-Reduce framework.pdfIntroduction to the Map-Reduce framework.pdf
Introduction to the Map-Reduce framework.pdf
 
Stratosphere with big_data_analytics
Stratosphere with big_data_analyticsStratosphere with big_data_analytics
Stratosphere with big_data_analytics
 
Map reducefunnyslide
Map reducefunnyslideMap reducefunnyslide
Map reducefunnyslide
 
Meethadoop
MeethadoopMeethadoop
Meethadoop
 
Hadoop Architecture
Hadoop ArchitectureHadoop Architecture
Hadoop Architecture
 
Generating Frequent Itemsets by RElim on Hadoop Clusters
Generating Frequent Itemsets by RElim on Hadoop ClustersGenerating Frequent Itemsets by RElim on Hadoop Clusters
Generating Frequent Itemsets by RElim on Hadoop Clusters
 

Mais de ShilpaKrishna6

Mais de ShilpaKrishna6 (13)

WBAN(Wireless Body Area Network)
WBAN(Wireless Body Area Network)WBAN(Wireless Body Area Network)
WBAN(Wireless Body Area Network)
 
Evolution of big data
Evolution of big dataEvolution of big data
Evolution of big data
 
Big data business analytics | Introduction to Business Analytics
Big data business analytics | Introduction to Business AnalyticsBig data business analytics | Introduction to Business Analytics
Big data business analytics | Introduction to Business Analytics
 
What is big data ? | Big Data Applications
What is big data ? | Big Data ApplicationsWhat is big data ? | Big Data Applications
What is big data ? | Big Data Applications
 
Data science | What is Data science
Data science | What is Data scienceData science | What is Data science
Data science | What is Data science
 
Introduction to nosql | NoSQL databases
Introduction to nosql | NoSQL databasesIntroduction to nosql | NoSQL databases
Introduction to nosql | NoSQL databases
 
Internet of Things(IoT) Applications
Internet of Things(IoT) ApplicationsInternet of Things(IoT) Applications
Internet of Things(IoT) Applications
 
4 pillers of iot
4 pillers of iot4 pillers of iot
4 pillers of iot
 
Iot enabled technologies
Iot enabled technologiesIot enabled technologies
Iot enabled technologies
 
Iot logical design
Iot logical designIot logical design
Iot logical design
 
Physical design of io t
Physical design of io tPhysical design of io t
Physical design of io t
 
Introduction to iot(internet of things)
Introduction to iot(internet of things)Introduction to iot(internet of things)
Introduction to iot(internet of things)
 
Number system and its conversions
Number system and its conversionsNumber system and its conversions
Number system and its conversions
 

Último

The basics of sentences session 3pptx.pptx
The basics of sentences session 3pptx.pptxThe basics of sentences session 3pptx.pptx
The basics of sentences session 3pptx.pptx
heathfieldcps1
 

Último (20)

Kodo Millet PPT made by Ghanshyam bairwa college of Agriculture kumher bhara...
Kodo Millet  PPT made by Ghanshyam bairwa college of Agriculture kumher bhara...Kodo Millet  PPT made by Ghanshyam bairwa college of Agriculture kumher bhara...
Kodo Millet PPT made by Ghanshyam bairwa college of Agriculture kumher bhara...
 
General Principles of Intellectual Property: Concepts of Intellectual Proper...
General Principles of Intellectual Property: Concepts of Intellectual  Proper...General Principles of Intellectual Property: Concepts of Intellectual  Proper...
General Principles of Intellectual Property: Concepts of Intellectual Proper...
 
The basics of sentences session 3pptx.pptx
The basics of sentences session 3pptx.pptxThe basics of sentences session 3pptx.pptx
The basics of sentences session 3pptx.pptx
 
2024-NATIONAL-LEARNING-CAMP-AND-OTHER.pptx
2024-NATIONAL-LEARNING-CAMP-AND-OTHER.pptx2024-NATIONAL-LEARNING-CAMP-AND-OTHER.pptx
2024-NATIONAL-LEARNING-CAMP-AND-OTHER.pptx
 
Holdier Curriculum Vitae (April 2024).pdf
Holdier Curriculum Vitae (April 2024).pdfHoldier Curriculum Vitae (April 2024).pdf
Holdier Curriculum Vitae (April 2024).pdf
 
SOC 101 Demonstration of Learning Presentation
SOC 101 Demonstration of Learning PresentationSOC 101 Demonstration of Learning Presentation
SOC 101 Demonstration of Learning Presentation
 
Beyond_Borders_Understanding_Anime_and_Manga_Fandom_A_Comprehensive_Audience_...
Beyond_Borders_Understanding_Anime_and_Manga_Fandom_A_Comprehensive_Audience_...Beyond_Borders_Understanding_Anime_and_Manga_Fandom_A_Comprehensive_Audience_...
Beyond_Borders_Understanding_Anime_and_Manga_Fandom_A_Comprehensive_Audience_...
 
Key note speaker Neum_Admir Softic_ENG.pdf
Key note speaker Neum_Admir Softic_ENG.pdfKey note speaker Neum_Admir Softic_ENG.pdf
Key note speaker Neum_Admir Softic_ENG.pdf
 
Fostering Friendships - Enhancing Social Bonds in the Classroom
Fostering Friendships - Enhancing Social Bonds  in the ClassroomFostering Friendships - Enhancing Social Bonds  in the Classroom
Fostering Friendships - Enhancing Social Bonds in the Classroom
 
Food safety_Challenges food safety laboratories_.pdf
Food safety_Challenges food safety laboratories_.pdfFood safety_Challenges food safety laboratories_.pdf
Food safety_Challenges food safety laboratories_.pdf
 
Google Gemini An AI Revolution in Education.pptx
Google Gemini An AI Revolution in Education.pptxGoogle Gemini An AI Revolution in Education.pptx
Google Gemini An AI Revolution in Education.pptx
 
TỔNG ÔN TẬP THI VÀO LỚP 10 MÔN TIẾNG ANH NĂM HỌC 2023 - 2024 CÓ ĐÁP ÁN (NGỮ Â...
TỔNG ÔN TẬP THI VÀO LỚP 10 MÔN TIẾNG ANH NĂM HỌC 2023 - 2024 CÓ ĐÁP ÁN (NGỮ Â...TỔNG ÔN TẬP THI VÀO LỚP 10 MÔN TIẾNG ANH NĂM HỌC 2023 - 2024 CÓ ĐÁP ÁN (NGỮ Â...
TỔNG ÔN TẬP THI VÀO LỚP 10 MÔN TIẾNG ANH NĂM HỌC 2023 - 2024 CÓ ĐÁP ÁN (NGỮ Â...
 
This PowerPoint helps students to consider the concept of infinity.
This PowerPoint helps students to consider the concept of infinity.This PowerPoint helps students to consider the concept of infinity.
This PowerPoint helps students to consider the concept of infinity.
 
Making communications land - Are they received and understood as intended? we...
Making communications land - Are they received and understood as intended? we...Making communications land - Are they received and understood as intended? we...
Making communications land - Are they received and understood as intended? we...
 
Single or Multiple melodic lines structure
Single or Multiple melodic lines structureSingle or Multiple melodic lines structure
Single or Multiple melodic lines structure
 
Unit-V; Pricing (Pharma Marketing Management).pptx
Unit-V; Pricing (Pharma Marketing Management).pptxUnit-V; Pricing (Pharma Marketing Management).pptx
Unit-V; Pricing (Pharma Marketing Management).pptx
 
SKILL OF INTRODUCING THE LESSON MICRO SKILLS.pptx
SKILL OF INTRODUCING THE LESSON MICRO SKILLS.pptxSKILL OF INTRODUCING THE LESSON MICRO SKILLS.pptx
SKILL OF INTRODUCING THE LESSON MICRO SKILLS.pptx
 
Basic Civil Engineering first year Notes- Chapter 4 Building.pptx
Basic Civil Engineering first year Notes- Chapter 4 Building.pptxBasic Civil Engineering first year Notes- Chapter 4 Building.pptx
Basic Civil Engineering first year Notes- Chapter 4 Building.pptx
 
HMCS Vancouver Pre-Deployment Brief - May 2024 (Web Version).pptx
HMCS Vancouver Pre-Deployment Brief - May 2024 (Web Version).pptxHMCS Vancouver Pre-Deployment Brief - May 2024 (Web Version).pptx
HMCS Vancouver Pre-Deployment Brief - May 2024 (Web Version).pptx
 
Introduction to Nonprofit Accounting: The Basics
Introduction to Nonprofit Accounting: The BasicsIntroduction to Nonprofit Accounting: The Basics
Introduction to Nonprofit Accounting: The Basics
 

What is MapReduce ?

  • 3.  The MapReduce is one of the main components of the Hadoop Ecosystem.  MapReduce is designed to process a large amount of data in parallel by dividing the work into some smaller and independent tasks.  MapReduce programs take input as a list and convert to the output as a list also.
  • 4.  The Map takes a set of keys and values as input. It may be in a structured or unstructured form  The Keys are the reference of input files and Values are the dataset  The task is applied on every input value
  • 5.  The Reducer takes the key-value pair which is created by the mapper as input  The key-value pairs are sorted by the key elements  In the Reducer, we perform the sorting, aggregation or summation type jobs
  • 6.  The given inputs are processed by the user-defined methods. All different business logics are working on the mapper section. Mapper generates intermediate data and Reducer takes them as input. The data are processed by user-defined function in the Reducer section. The final output is stored in HDFS (Hadoop Distributed File System).
  • 7.  When Mapper output is collected it is partitioned which means that it will be written to the output specified by the partitioner  Partitioning is responsible for dividing up the intermediate key space and assigning intermediate key-value pairs to reducers  It assigns approximately the same number of keys to each reducer
  • 8.  Combiners are an optimization in MapReduce that allow for local aggregation before the shuffle and sort phase  If a Combiner is used then the map key-value pairs are not immediately written to the output. Instead they will be collected in lists, one list per each key-value
  • 9.  Let us take a real-world example to comprehend the power of MapReduce  Twitter receives around 500 million tweets per day which is nearly 3000 tweets per second