Map Reduce basics

Abhishek Mukherjee
Utkarsh Srivastava
13th September
Not everything that can be counted counts, and not
everything that counts can be counted.
WELCOME TO BIG DATA TRAINING

What are we going to cover today?
• Uses of Big Data
• What is Hadoop?
• Short intro to the HDFS architecture
• What is Map Reduce?
• The components of the Map Reduce algorithm
• The "hello world" of Map Reduce, i.e. the Word Count algorithm
• Tips and tricks of Map Reduce
• Distribution of Twitter data to test Map Reduce jars

What is Big Data?
• Big data is an evolving term that describes any voluminous
amount of structured, semi-structured and
unstructured data that has the potential to be mined for
information.
• Lots of data (terabytes, petabytes or zettabytes)
• Systems and enterprises generate huge amounts of data, from
terabytes to even petabytes of information.
• An airline jet collects 10 terabytes of sensor data for every
30 minutes of flying time.

Map Reduce Algorithm
Key points
• Map Phase
• Combiner Phase (Optional)
• Sort Phase
• Shuffle Phase
• Partition Phase (Optional)
• Reducer Phase
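
The two optional phases can be illustrated with a minimal Python sketch. This is not Hadoop code: `combine` and `partition` are hypothetical helper names, and the CRC32-based hash is only an assumption chosen so that the example gives the same result on every run.

```python
import zlib
from collections import defaultdict


def combine(mapped_pairs):
    """Combiner: pre-aggregate one mapper's output locally, before the shuffle."""
    partial = defaultdict(int)
    for key, value in mapped_pairs:
        partial[key] += value
    return sorted(partial.items())


def partition(key, num_reducers):
    """Partitioner: pick the reducer for a key (same key, same reducer)."""
    # CRC32 is an assumption for this sketch; it is stable across runs.
    return zlib.crc32(key.encode("utf-8")) % num_reducers


mapper_output = [("Hello", 1), ("my", 1), ("Hello", 1)]
combined = combine(mapper_output)
print(combined)  # [('Hello', 2), ('my', 1)]
```

The combiner reduces the amount of data sent across the network, and the partitioner guarantees that all values for one key end up at the same reducer.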

Map Phase
Imagine this as the input file:
• Hello my name is abhishek Hello my name is utsav
• Hello my passion is cricket
This file has 2 lines. Each line has its own byte offset, which
serves as the key passed to the mapper; the value passed to the
mapper is the text of that line.
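
A minimal Python sketch of the map phase for this input, assuming the file is UTF-8 text with one newline per line. The names `read_with_offsets` and `word_count_mapper` are hypothetical and do not come from Hadoop.

```python
def read_with_offsets(data: bytes):
    """Yield (byte_offset, line) pairs, one per line of the input."""
    offset = 0
    for raw_line in data.splitlines(keepends=True):
        yield offset, raw_line.decode("utf-8").rstrip("\n")
        offset += len(raw_line)


def word_count_mapper(offset, line):
    """Mapper: ignore the offset key and emit (word, 1) for every word."""
    for word in line.split():
        yield word, 1


input_file = (
    b"Hello my name is abhishek Hello my name is utsav\n"
    b"Hello my passion is cricket\n"
)

records = list(read_with_offsets(input_file))
mapped = [
    pair
    for offset, line in records
    for pair in word_count_mapper(offset, line)
]
for word, count in mapped:
    print(word, count)
```

The first record has offset 0. The second record's offset is the length in bytes of the first line, including its newline.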

Operation on output of map phase
Map output:
Hello 1
my 1
name 1
is 1
abhishek 1
Hello 1
my 1
name 1
is 1
utsav 1
Hello 1
my 1
passion 1
is 1
cricket 1
After sort and shuffle, Key(tuple of values):
Hello(1,1,1)
my(1,1,1)
name(1,1)
is(1,1,1)
abhishek(1)
utsav(1)
passion(1)
cricket(1)

Explanation of sort and shuffle phase
• The key points are as follows:
• Sort the key-value pairs by key.
• Shuffle the mapped output so that values with the same key are
brought together into a tuple of values for that key.
• This output is fed to the reducer, which in turn reduces each
tuple to a single value for the list of values present in the
tuple.
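
The sort and shuffle steps can be sketched in a few lines of Python. This is a single-machine illustration only; `sort_and_shuffle` is a hypothetical name, and a real cluster performs this work across many nodes.

```python
from itertools import groupby
from operator import itemgetter


def sort_and_shuffle(mapped_pairs):
    """Sort pairs by key, then group the values of equal keys into one tuple."""
    ordered = sorted(mapped_pairs, key=itemgetter(0))
    return [
        (key, tuple(value for _, value in group))
        for key, group in groupby(ordered, key=itemgetter(0))
    ]


mapped = [("Hello", 1), ("my", 1), ("name", 1),
          ("Hello", 1), ("my", 1), ("Hello", 1)]
grouped = sort_and_shuffle(mapped)
print(grouped)
# [('Hello', (1, 1, 1)), ('my', (1, 1)), ('name', (1,))]
```

Sorting comes first because `groupby` only merges neighbouring items with the same key.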

Reducer phase
Reducer input, Key(tuple of values):
Hello(1,1,1)
my(1,1,1)
name(1,1)
is(1,1,1)
abhishek(1)
utsav(1)
passion(1)
cricket(1)
Reducer output, Key(single value):
abhishek(1)
cricket(1)
Hello(3)
is(3)
my(3)
name(2)
passion(1)
utsav(1)
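
A minimal Python sketch of the reducer for word count, assuming the grouped input has already been produced by sort and shuffle. `word_count_reducer` is a hypothetical name. Note that "name" occurs twice in the sample input, so its tuple holds two values.

```python
def word_count_reducer(key, values):
    """Reducer: collapse the tuple of values for one key into a single count."""
    return key, sum(values)


grouped = [
    ("Hello", (1, 1, 1)),
    ("abhishek", (1,)),
    ("cricket", (1,)),
    ("is", (1, 1, 1)),
    ("my", (1, 1, 1)),
    ("name", (1, 1)),
    ("passion", (1,)),
    ("utsav", (1,)),
]

reduced = [word_count_reducer(key, values) for key, values in grouped]
for word, total in reduced:
    print(f"{word}({total})")
```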
