2. CONTENTS
Big data
WHAT IS BIG DATA
VOLUME
VELOCITY
VARIETY
SOURCES OF BIG DATA
CHALLENGES WITH BIG DATA
TECHNOLOGIES TO MEET BIG DATA
Hadoop
HISTORY OF HADOOP
BEFORE HADOOP
ARCHITECTURE
COMPANIES WHICH USE HADOOP
BIG DATA JOB ROLES
3. NAME SYMBOL VALUE
KILOBYTE KB 10^3
MEGABYTE MB 10^6
GIGABYTE GB 10^9
TERABYTE TB 10^12
PETABYTE PB 10^15
EXABYTE EB 10^18
ZETTABYTE ZB 10^21
YOTTABYTE YB 10^24
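The decimal (SI) units in the table can be expressed as powers of ten; a minimal sketch (the `UNITS` table and `to_bytes` helper are illustrative names, not from the deck):

```python
# Decimal (SI) byte units from the table above, as powers of 10.
UNITS = {"KB": 3, "MB": 6, "GB": 9, "TB": 12,
         "PB": 15, "EB": 18, "ZB": 21, "YB": 24}

def to_bytes(value, unit):
    """Convert a value in the given decimal unit to bytes."""
    return value * 10 ** UNITS[unit]

print(to_bytes(2, "TB"))  # 2 TB = 2000000000000 bytes
```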
4. WHAT IS “BIG DATA”?
“BIG DATA” is data that is big in
Volume
Velocity and
Variety
“TODAY’S BIG MAY BE TOMORROW’S NORMAL”
3 V’S
6. VARIETY
Variety covers a wide range of data types
Structured data – RDBMS
Semi-structured data – HTML, XML
Unstructured data – audio, video, emails, photos,
PDFs, social media
7. SOURCES OF “BIG DATA”
Social media
Machine log data
Public web
Docs
Business apps
Data storage
8. CHALLENGES WITH “BIG DATA”
CAPTURE
STORAGE
CURATION
SEARCH
ANALYSIS
TRANSFER
VISUALIZATION
PRIVACY VIOLATIONS
9. WHAT KIND OF TECHNOLOGIES ARE NEEDED TO
MEET THE CHALLENGES POSED BY “BIG DATA”?
Cheap and abundant storage
Faster processors to help with quicker processing of
big data
Affordable open-source, distributed big data
platforms, such as Hadoop
Cloud computing and other flexible
resource-allocation arrangements
12. HISTORY OF HADOOP
Hadoop was created by DOUG CUTTING and MICHAEL
CAFARELLA in 2005
2002–2003 – Nutch, an open-source search engine
built on Lucene (in the era of engines like Sphinx)
Google published papers on its distributed file
system GFS (2003) and on MapReduce (2004)
Yahoo! then took the initiative and backed the
project, and Hadoop grew out of Nutch
Hadoop 0.1.0 was released in April 2006
As of this presentation, Hadoop 2.8 is the latest release
15. BEFORE HADOOP
Suppose you have 100 TB of data in a data
center
At some point you want to retrieve some 2 TB of that
data, and you wrote a program to do it – say, 100 KB
of code
To process the data, you had to move it to the
system running your program
i.e. wherever the program runs, the data must be
fetched to that system
“COMPUTATION HAPPENS WHERE THE PROCESSOR IS” –
so, before Hadoop, the data always followed the code
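The cost of shipping data to code can be sketched with back-of-the-envelope arithmetic; the 1 Gb/s link speed below is my assumption, the 2 TB and 100 KB figures come from the slide:

```python
# Assumption: a 1 Gb/s network link between storage and compute.
LINK_BPS = 1_000_000_000 / 8        # link speed in bytes per second

data_bytes = 2 * 10 ** 12           # 2 TB of data to retrieve (from the slide)
code_bytes = 100 * 10 ** 3          # 100 KB program (from the slide)

# Pre-Hadoop: move the data to the program.
data_transfer_s = data_bytes / LINK_BPS
# Hadoop's idea: move the program to the data.
code_transfer_s = code_bytes / LINK_BPS

print(f"ship data to code: {data_transfer_s / 3600:.1f} hours")  # ~4.4 hours
print(f"ship code to data: {code_transfer_s * 1000:.1f} ms")     # ~0.8 ms
```

The asymmetry is the whole motivation: moving 100 KB of code is millions of times cheaper than moving 2 TB of data.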
17. Contd…
HDFS (Hadoop Distributed File System) –
a distributed file system that stores data on
commodity machines, providing very high aggregate
bandwidth across the cluster (storage)
MAPREDUCE – a system for parallel processing of
large data sets (processing)
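The MapReduce model above can be sketched in plain Python — an in-memory illustration of the programming model (map emits key–value pairs, a shuffle groups them by key, reduce aggregates), not the actual Hadoop API; all function names here are mine:

```python
from collections import defaultdict

def map_phase(line):
    """Map: emit a (word, 1) pair for every word in a line."""
    return [(word, 1) for word in line.split()]

def reduce_phase(word, counts):
    """Reduce: sum all counts emitted for one word."""
    return word, sum(counts)

def word_count(lines):
    """Run map -> shuffle -> reduce over a list of lines."""
    grouped = defaultdict(list)          # shuffle: group values by key
    for line in lines:
        for word, one in map_phase(line):
            grouped[word].append(one)
    return dict(reduce_phase(w, c) for w, c in grouped.items())

print(word_count(["big data big", "data"]))  # {'big': 2, 'data': 2}
```

In real Hadoop, the map and reduce functions run in parallel across the cluster and the shuffle happens over the network; the logic per key is the same.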
18. Contd…
HDFS daemons – NameNode, Secondary NameNode,
DataNode
MapReduce daemons – JobTracker, TaskTracker
Master node runs: NameNode, Secondary NameNode,
JobTracker
Slave nodes run: DataNode, TaskTracker
20. PROCESS…
File -> NameNode -> division into blocks ->
each block replicated three times -> the NameNode
records the address of every replica of every block
If a hardware failure occurs on a DataNode, the
NameNode learns of it (via missed heartbeats) and
keeps the replica count constant
by replicating the affected blocks again to restore
three copies, and recording the new replicas’
addresses, so processing continues without error
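The write path above can be sketched as a toy placement function — a simplified model, not HDFS itself: real HDFS placement is rack-aware, while this sketch just round-robins replicas across nodes; the node names and helper are mine. The 128 MB block size and replication factor of 3 are common HDFS defaults:

```python
BLOCK_SIZE = 128 * 1024 * 1024      # 128 MB, a common HDFS default
REPLICATION = 3                     # three copies of every block

def place_blocks(file_size, data_nodes):
    """Return a NameNode-style map: block index -> list of replica nodes."""
    n_blocks = -(-file_size // BLOCK_SIZE)   # ceiling division
    block_map = {}
    for b in range(n_blocks):
        # Round-robin placement onto distinct nodes (real HDFS is rack-aware).
        block_map[b] = [data_nodes[(b + r) % len(data_nodes)]
                        for r in range(REPLICATION)]
    return block_map

nodes = ["dn1", "dn2", "dn3", "dn4"]
print(place_blocks(300 * 1024 * 1024, nodes))
# {0: ['dn1', 'dn2', 'dn3'], 1: ['dn2', 'dn3', 'dn4'], 2: ['dn3', 'dn4', 'dn1']}
```

If a node dies, the NameNode re-runs placement for the blocks it held, which is exactly the re-replication step the slide describes.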
23. “BIG DATA” JOB ROLES
CHIEF DATA OFFICER
BIG DATA SCIENTIST
BIG DATA ANALYST
BIG DATA VISUALIZER
BIG DATA MANAGER
BIG DATA SOLUTIONS ARCHITECT
BIG DATA ENGINEER
BIG DATA RESEARCHER
BIG DATA CONSULTANT