Hadoop-BigData

Apache Hadoop
Store and process Huge Data

Hadoop
Apache Hadoop is an open-source software framework written in Java for distributed
storage and distributed processing of very large data sets on computer clusters built
from commodity hardware.

History
o Hadoop was created by Doug Cutting and Mike Cafarella in 2005.
o Its core components were inspired by Google's papers on MapReduce and the
Google File System.
(Pictured: Doug Cutting, Mike Cafarella)

Core of Hadoop
o HDFS (Hadoop Distributed File System): storage part
o MapReduce: processing part

HDFS- Hadoop Distributed File System
HDFS is a file system designed specifically for storing very large
data sets. It runs on a cluster of commodity hardware.

HDFS- Services
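Master daemons:
o Name Node
o Secondary Name Node
o Job Tracker
Slave daemons:
o Data Node
o Task Tracker
(Job Tracker and Task Tracker belong to the MapReduce layer; they run
alongside the HDFS daemons.)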

Function
[Diagram: Big Data divided into Input Split-1, Input Split-2, Input Split-3
and Input Split-4, shown with a Name Node and a Data Node]

The input splits are stored on commodity hardware (the Data Nodes), and the Name Node handles the metadata.
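
The division of labour described here can be illustrated with a small toy model. This is only a sketch in plain Python, not the HDFS API: the 8-byte split size, the node names, and the round-robin placement are all assumptions made for the example, and replication is left out. The point is that the data nodes hold the bytes while the name node holds only the map from split number to node.

```python
# Toy model only: plain Python, not the HDFS API.
SPLIT_SIZE = 8  # bytes; hypothetical tiny value so the example stays readable


def split_data(data, split_size=SPLIT_SIZE):
    """Cut the input into fixed-size pieces (the last one may be shorter)."""
    return [data[i:i + split_size] for i in range(0, len(data), split_size)]


def place_splits(splits, data_nodes):
    """Store each split on a data node and record its location as metadata."""
    storage = {node: {} for node in data_nodes}  # what the data nodes hold
    metadata = {}                                # what the name node holds
    for split_id, split in enumerate(splits):
        node = data_nodes[split_id % len(data_nodes)]  # round-robin (assumed)
        storage[node][split_id] = split
        metadata[split_id] = node
    return storage, metadata


def read_back(storage, metadata):
    """Reassemble the data by asking the metadata where each split lives."""
    return b"".join(storage[metadata[i]][i] for i in sorted(metadata))


data = b"a very large data set"
splits = split_data(data)
storage, metadata = place_splits(
    splits, ["datanode-1", "datanode-2", "datanode-3"]
)
print(metadata)
print(read_back(storage, metadata) == data)
```

Note that the metadata never contains the data itself, which is why the name node can stay small while the data nodes grow.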
[Diagram: Name Node, Big Data, and a Job Tracker with three Task Trackers,
each running a Map task]
Every three seconds, each Task Tracker reports back to the Job Tracker
that it is alive.
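
The heartbeat idea can be sketched as follows. This is a simplified model in plain Python, not Hadoop's implementation: the class and method names are hypothetical, the clock is simulated by passing in the time, and the 30-second expiry timeout is an assumption (the slides only give the three-second interval).

```python
HEARTBEAT_INTERVAL = 3  # seconds, as stated on the slide
EXPIRY_TIMEOUT = 30     # seconds; assumed value, not taken from the slides


class JobTracker:
    """Toy job tracker that remembers when each task tracker last reported."""

    def __init__(self, timeout=EXPIRY_TIMEOUT):
        self.timeout = timeout
        self.last_seen = {}  # task tracker id -> time of last heartbeat

    def heartbeat(self, tracker_id, now):
        self.last_seen[tracker_id] = now

    def alive_trackers(self, now):
        """Trackers whose last heartbeat is recent enough to count as alive."""
        return sorted(
            tracker
            for tracker, seen in self.last_seen.items()
            if now - seen <= self.timeout
        )


job_tracker = JobTracker()

# Both task trackers report every three seconds up to t = 9.
for t in range(0, 10, HEARTBEAT_INTERVAL):
    job_tracker.heartbeat("tasktracker-1", t)
    job_tracker.heartbeat("tasktracker-2", t)

# tasktracker-2 then goes silent; tasktracker-1 keeps reporting up to t = 57.
for t in range(12, 60, HEARTBEAT_INTERVAL):
    job_tracker.heartbeat("tasktracker-1", t)

alive_at_60 = job_tracker.alive_trackers(now=60)
print(alive_at_60)
```

A tracker that misses heartbeats for longer than the timeout drops out of the alive list, at which point its tasks could be handed to another tracker.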

File Formats
o Text Input Format (default)
o Key Value Text Input Format
o Sequence File Input Format
o Sequence File as Text Input Format
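
To show how the choice of format changes what the mapper receives, here is a rough sketch of the key-value text format. It is plain Python rather than Hadoop code, and the function name is hypothetical. It assumes the usual behaviour of splitting each line at the first tab, with the whole line becoming the key when no tab is present.

```python
def key_value_record(line, separator="\t"):
    """Split a line at the first separator into a (key, value) pair.

    If the separator is missing, the whole line is the key and the
    value is empty.
    """
    key, _, value = line.partition(separator)
    return key, value


with_tab = key_value_record("user42\tclicked\thome")
without_tab = key_value_record("no separator here")
print(with_tab)
print(without_tab)
```

Only the first tab is treated as the boundary, so any further tabs stay inside the value.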

The Record Reader converts the data into (key, value) pairs.
For the default Text Input Format: (Key, Value) = (Byte offset, Entire line)
[Diagram: Record Reader, Mapper, Reducer, Name Node, Program Logic]
Mappers and reducers only understand (key, value) pairs.
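
A minimal sketch of the (byte offset, entire line) conversion is shown below. It is plain Python standing in for the record reader, not Hadoop's class, and the function name and sample input are made up for the example.

```python
def text_record_reader(data):
    """Yield (byte offset, line) pairs from raw bytes.

    The key is the position of the first byte of the line; the value is
    the line text without its line terminator.
    """
    offset = 0
    for raw_line in data.splitlines(keepends=True):
        yield offset, raw_line.rstrip(b"\r\n").decode("utf-8")
        offset += len(raw_line)  # count the terminator too


records = list(text_record_reader(b"hello world\nhello hadoop\n"))
print(records)  # [(0, 'hello world'), (12, 'hello hadoop')]
```

The second key is 12 because the first line occupies 11 bytes plus one newline byte.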

Reducer
The Reducer combines the processed
data from the Data Nodes and reports
to the Name Node where the
output is stored.
Cont…
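
The mapper and reducer roles can be sketched end to end with a word count. This is an illustrative example in plain Python, not the Hadoop API: word count is not taken from the slides, and the function names, the in-memory shuffle step, and the sample records are assumptions.

```python
from collections import defaultdict


def mapper(offset, line):
    """Emit a (word, 1) pair for every word in the line; the offset is unused."""
    for word in line.split():
        yield word.lower(), 1


def shuffle(pairs):
    """Group all values by key, as happens between the map and reduce phases."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reducer(key, values):
    """Combine all values for one key into a single result."""
    return key, sum(values)


# (byte offset, line) records, as a record reader would supply them.
records = [(0, "Hadoop stores data"), (19, "Hadoop processes data")]

mapped = [pair for offset, line in records for pair in mapper(offset, line)]
word_counts = dict(
    reducer(key, values) for key, values in sorted(shuffle(mapped).items())
)
print(word_counts)
```

Each stage only ever sees (key, value) pairs, which is what lets the map and reduce steps run independently on different nodes.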
