What is MapReduce?

SHILPA KRISHNA
RESEARCH SCHOLAR

[Figure: Input → Map tasks (three Map() calls) → Reduce tasks (two Reduce() calls) → Output]

• MapReduce is one of the main components of the Hadoop ecosystem.
• MapReduce is designed to process large amounts of data in parallel by dividing the work into smaller, independent tasks.
• MapReduce programs take a list of input records (key-value pairs) and produce a list of output records.
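
The "smaller, independent tasks" and "list in, list out" ideas can be sketched in a few lines of Python. This is a single-machine illustration, not Hadoop: the helper names (split_input, map_split, run_parallel_map) are hypothetical, and a thread pool stands in for the cluster nodes that would run each map task.

```python
from concurrent.futures import ThreadPoolExecutor


def split_input(lines: list[str], num_splits: int) -> list[list[str]]:
    """Divide the input list into at most num_splits contiguous, independent splits."""
    size = max(1, -(-len(lines) // num_splits))  # ceiling division, at least 1
    return [lines[i:i + size] for i in range(0, len(lines), size)]


def map_split(split: list[str]) -> list[tuple[str, int]]:
    # Each task reads only its own split, so no task depends on another.
    return [(word, 1) for line in split for word in line.split()]


def run_parallel_map(lines: list[str], num_splits: int = 3) -> list[tuple[str, int]]:
    """Hypothetical driver: run one map task per split and collect the results."""
    splits = split_input(lines, num_splits)
    with ThreadPoolExecutor(max_workers=num_splits) as pool:
        outputs = list(pool.map(map_split, splits))  # results keep split order
    # Concatenate the per-task lists: a list goes in, a list comes out.
    return [pair for output in outputs for pair in output]


print(run_parallel_map(["a b", "b c", "c d", "d e"]))
# [('a', 1), ('b', 1), ('b', 1), ('c', 1), ('c', 1), ('d', 1), ('d', 1), ('e', 1)]
```

Because CPython threads share one interpreter lock, this sketch shows the task decomposition rather than a real speed-up; in Hadoop each split would be handled by a separate map task, potentially on a different node.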

• The Map function takes a set of key-value pairs as input. The input data may be structured or unstructured.
• The keys are references into the input files, and the values are the data records.
• The map function is applied to every input value.
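
A Mapper for word count might look like the sketch below. It assumes line-oriented text input where each key is the byte offset of a line in the file (this is how Hadoop's plain-text input works) and each value is the line itself. The function names read_records and word_count_mapper are hypothetical.

```python
from typing import Iterator


def read_records(text: str) -> list[tuple[int, str]]:
    """Split text into (byte_offset, line) records, mimicking a line-oriented input format."""
    records = []
    offset = 0
    for line in text.splitlines(keepends=True):
        records.append((offset, line.rstrip("\n")))
        offset += len(line.encode("utf-8"))  # the next line starts after this one's bytes
    return records


def word_count_mapper(key: int, value: str) -> Iterator[tuple[str, int]]:
    # The key (byte offset) only locates the record; word count ignores it.
    for word in value.split():
        yield (word.lower(), 1)


records = read_records("Deer Bear River\nCar Car River\n")
print(records)  # [(0, 'Deer Bear River'), (16, 'Car Car River')]

# The map function is applied to every input record.
pairs = [pair for k, v in records for pair in word_count_mapper(k, v)]
print(pairs)
# [('deer', 1), ('bear', 1), ('river', 1), ('car', 1), ('car', 1), ('river', 1)]
```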

• The Reducer takes the key-value pairs produced by the Mapper as input.
• The key-value pairs are sorted by key before they reach the Reducer.
• In the Reducer, we perform sorting, aggregation, or summation-type jobs.
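
The sort-then-reduce step can be sketched with the standard library: sort the Mapper's pairs by key, group the values for each key, and hand each group to a summation Reducer. The names shuffle_and_sort and sum_reducer are hypothetical; in Hadoop the framework performs the sort and grouping, and only the reduce function is user-written.

```python
from itertools import groupby
from operator import itemgetter


def shuffle_and_sort(pairs):
    """Sort intermediate pairs by key and group their values: [(key, [v1, v2, ...]), ...]."""
    ordered = sorted(pairs, key=itemgetter(0))
    return [(key, [v for _, v in group]) for key, group in groupby(ordered, key=itemgetter(0))]


def sum_reducer(key, values):
    # Summation-type job: add up all counts emitted for this key.
    return (key, sum(values))


pairs = [("river", 1), ("car", 1), ("river", 1), ("car", 1), ("bear", 1)]
grouped = shuffle_and_sort(pairs)
print(grouped)  # [('bear', [1]), ('car', [1, 1]), ('river', [1, 1])]

result = [sum_reducer(k, vs) for k, vs in grouped]
print(result)  # [('bear', 1), ('car', 2), ('river', 2)]
```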

• The given inputs are processed by user-defined methods. The different pieces of business logic are implemented in the Mapper section. The Mapper generates intermediate data, and the Reducer takes that data as input. The data are then processed by a user-defined function in the Reducer section. The final output is stored in HDFS (Hadoop Distributed File System).
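
The whole flow (user-defined mapper, intermediate data, user-defined reducer, stored output) fits in one small function. This is a single-process sketch under stated assumptions: run_job, the max-temperature business logic, and the output file name part-00000 are all hypothetical, and a local file written as "key<TAB>value" lines stands in for HDFS.

```python
import os
import tempfile
from collections import defaultdict


def run_job(records, mapper, reducer, output_path):
    """Minimal single-process MapReduce flow: map -> group by key -> reduce -> write."""
    # Map phase: the user-defined mapper runs on every input record.
    intermediate = defaultdict(list)
    for key, value in records:
        for out_key, out_value in mapper(key, value):
            intermediate[out_key].append(out_value)
    # Reduce phase: the user-defined reducer receives each key with all of its values.
    results = [reducer(k, intermediate[k]) for k in sorted(intermediate)]
    # Output: one "key<TAB>value" line per result; a local file stands in for HDFS.
    with open(output_path, "w", encoding="utf-8") as f:
        for k, v in results:
            f.write(f"{k}\t{v}\n")
    return results


# Hypothetical business logic: maximum temperature per city from "city,temp" lines.
def max_temp_mapper(offset, line):
    city, temp = line.split(",")
    yield city, int(temp)


def max_temp_reducer(city, temps):
    return city, max(temps)


records = [(0, "delhi,31"), (9, "pune,28"), (17, "delhi,35")]
out_path = os.path.join(tempfile.mkdtemp(), "part-00000")
print(run_job(records, max_temp_mapper, max_temp_reducer, out_path))
# [('delhi', 35), ('pune', 28)]
```

Swapping in a different mapper and reducer changes the job without touching run_job, which mirrors how MapReduce separates the user's business logic from the framework.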

• When the Mapper output is collected, it is partitioned: each key-value pair is written to the output (reducer) chosen by the partitioner.
• Partitioning divides up the intermediate key space and assigns the intermediate key-value pairs to reducers.
• The default hash-based partitioner assigns approximately the same number of keys to each reducer.
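
Hadoop's default HashPartitioner computes (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks. The Python sketch below follows the same "hash modulo number of reducers" rule. MD5 is an assumed stand-in for Java's hashCode(), chosen because Python's built-in hash() for strings is randomized per process; the function names are hypothetical.

```python
import hashlib


def hash_partition(key: str, num_reducers: int) -> int:
    """Pick a reducer for a key: hash(key) mod number of reducers."""
    # MD5 stands in for Java's hashCode() (assumption); it is stable across runs.
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % num_reducers


def partition(pairs, num_reducers):
    """Route every intermediate (key, value) pair to one reducer's bucket."""
    buckets = [[] for _ in range(num_reducers)]
    for key, value in pairs:
        buckets[hash_partition(key, num_reducers)].append((key, value))
    return buckets


# 1000 distinct keys plus a repeated "user7" pair.
pairs = [(f"user{i}", 1) for i in range(1000)] + [("user7", 1)]
buckets = partition(pairs, 4)
print([len(b) for b in buckets])
```

Two properties matter: every occurrence of a key lands in the same bucket, so one reducer sees all of that key's values, and with a well-mixed hash the bucket sizes come out roughly equal.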

• Combiners are an optimization in MapReduce that allow local aggregation before the shuffle and sort phase.
• If a Combiner is used, the map key-value pairs are not immediately written to the output. Instead, they are collected in lists, one list per key, and the Combiner aggregates each list.
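
The effect of a Combiner on one mapper's output can be sketched as follows: buffer the pairs in one list per key, then sum each list locally so fewer pairs go into the shuffle. The names map_words, run_combiner, and sum_combiner are hypothetical. The sketch uses summation because it is associative and commutative, which is what makes partial sums safe to combine again in the Reducer.

```python
from collections import defaultdict


def map_words(line):
    """Mapper for word count: emit (word, 1) for every word in the line."""
    return [(word, 1) for word in line.split()]


def run_combiner(pairs, combiner):
    """Buffer one mapper's output in one list per key, then aggregate each list locally."""
    buffered = defaultdict(list)
    for key, value in pairs:
        buffered[key].append(value)
    return [combiner(key, values) for key, values in sorted(buffered.items())]


def sum_combiner(key, values):
    # Partial sum for this mapper only; the Reducer later sums the partial sums.
    return key, sum(values)


raw = map_words("to be or not to be")
combined = run_combiner(raw, sum_combiner)
print(len(raw), len(combined))  # 6 4
print(combined)  # [('be', 2), ('not', 1), ('or', 1), ('to', 2)]
```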

• Let us take a real-world example to see the power of MapReduce.
• Twitter receives around 500 million tweets per day, which is nearly 6,000 tweets per second on average.

[Figure: Twitter data processing with MapReduce. Input Twitter data → MapReduce (Tokenize → Filter → Count → Aggregate counters) → Data source adapter → Hadoop-related databases]
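
The pipeline stages in the Twitter diagram above (tokenize, filter, count, aggregate counters) can be sketched in plain Python on a few made-up tweets. The filtering rule (keep only hashtags), the regular expression, the function names, and the sample tweets are hypothetical, since the slide does not say what the filter keeps. Everything runs in one process, so the data source adapter and the Hadoop-related databases are left out.

```python
import re
from collections import Counter

TOKEN = re.compile(r"#?\w+")  # a word, optionally prefixed with '#'


def tokenize(tweet):
    return TOKEN.findall(tweet.lower())


def keep(token):
    # Hypothetical filter rule: keep only hashtags.
    return token.startswith("#")


def map_tweet(tweet):
    """Map step: tokenize, filter, and emit (token, 1) for each surviving token."""
    return [(tok, 1) for tok in tokenize(tweet) if keep(tok)]


def aggregate(all_pairs):
    """Reduce step: aggregate the per-token counters."""
    counters = Counter()
    for tag, n in all_pairs:
        counters[tag] += n
    return counters


tweets = ["Learning #Hadoop and #MapReduce", "#hadoop clusters at scale", "no tags here"]
pairs = [p for t in tweets for p in map_tweet(t)]
counts = aggregate(pairs)
print(counts.most_common())  # [('#hadoop', 2), ('#mapreduce', 1)]
```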
