2. How fast data grows: 5 billion GB of data
generated from the beginning of time until 2003,
then every 2 days in 2011,
then every 10 minutes in 2013.
Live stats (2016)
3. Where does this data come from?
Activity: Listening to music, reading a book, searching, shopping, etc.
Conversation: Our conversations on social media are now digitally recorded.
Photo and Video: We upload and share hundreds of thousands of them on social media sites every second.
Sensor: We are increasingly surrounded by sensors that collect and share data.
The Internet of Things: We now have smart TVs that are able to collect and process data.
4. The basic idea behind the phrase
'Big Data' is that everything we
do is increasingly leaving a digital
trace (or data), which we (and
others) can use and analyse
5. Big Data:
a collection of large
datasets that cannot be
processed using traditional
computing techniques.
6. Big Data includes huge volume, high velocity,
and extensible variety of data.
Structured: databases, census records, economic data, phone numbers
Semi-structured: JSON, XML
Unstructured: Word documents, PDFs, plain text, media logs
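The three categories above differ mainly in how much parsing they need before analysis. A minimal Python sketch (the sample data is illustrative) contrasting structured, semi-structured, and unstructured access:

```python
import csv
import io
import json

# Structured: fixed columns, directly queryable (CSV standing in for a table).
census_csv = "region,population\nNorth,1200\nSouth,3400\n"
rows = list(csv.DictReader(io.StringIO(census_csv)))
total = sum(int(r["population"]) for r in rows)

# Semi-structured: self-describing keys, but no fixed schema (JSON).
record = json.loads('{"region": "North", "tags": ["urban", "coastal"]}')

# Unstructured: plain text; any structure must be inferred, e.g. by tokenising.
note = "Population in the North grew quickly last year."
words = note.lower().split()

print(total)             # 4600
print(record["region"])  # North
print(len(words))        # 8
```

The further right you move in the slide's table, the more work the analysis tool has to do before a query like "total population" is even possible.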
7. Benefits of Big Data
https://www.youtube.com/watch?v=HqsBensINkE
8. Big Data Technologies
Operational Big Data
This includes systems like MongoDB that
provide operational capabilities for real-
time, interactive workloads where data is
primarily captured and stored.
NoSQL Big Data systems are designed to
allow massive computations to be run
inexpensively and efficiently. This makes
operational big data workloads much
easier to manage, cheaper, and faster to
implement.
Analytical Big Data
This includes systems like Massively
Parallel Processing (MPP) database
systems and MapReduce that provide
analytical capabilities for retrospective
and complex analysis.
A system based on MapReduce can be
scaled up from a single server to
thousands of machines, both high-end
and commodity hardware.
10. Traditional Approach
In this approach, an enterprise has a single
computer to store and process big data. The
data is stored in an RDBMS, which processes
the required data and presents it to the users
for analysis. tutorialspoint.com
11. Google's Solution
Google solved this problem using an
algorithm called MapReduce. This
algorithm divides the task into small
parts and assigns those parts to
many computers connected over
the network, and collects the results
to form the final result dataset.
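The divide/assign/collect idea can be sketched on a single machine, with worker threads standing in for the computers connected over the network (a minimal illustration, not Google's actual implementation):

```python
from concurrent.futures import ThreadPoolExecutor

def partial_sum(chunk):
    # Each worker handles one small part of the overall task.
    return sum(chunk)

def distributed_sum(data, workers=4):
    # Divide the task into small parts, assign them to workers,
    # then collect the partial results into the final result.
    size = max(1, len(data) // workers)
    chunks = [data[i:i + size] for i in range(0, len(data), size)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return sum(pool.map(partial_sum, chunks))

print(distributed_sum(list(range(1, 101))))  # 5050
```

In a real cluster the "workers" are separate machines and the framework also has to handle failures and data movement, which is exactly what MapReduce implementations such as Hadoop provide.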
12. Hadoop
Hadoop runs applications using the
MapReduce algorithm, where the
data is processed in parallel on
different CPU nodes. In short, the
Hadoop framework makes it possible
to develop applications that run on
clusters of computers and perform
complete statistical analysis on
huge amounts of data.
15. MapReduce
Map
Converts the input data into another set of
data, where individual elements are broken
down into tuples (key/value pairs).
Reduce
Takes the Map output through a Shuffle
stage and a Reduce stage that produces a
new set of output, which is stored in HDFS.
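A minimal, single-machine sketch of the Map, Shuffle, and Reduce stages described above, using word count, the classic example (in a real Hadoop job the shuffle and the distribution across nodes are handled by the framework):

```python
from collections import defaultdict

def map_phase(text):
    # Map: break the input into (key, value) tuples, here (word, 1).
    return [(word, 1) for word in text.lower().split()]

def shuffle(pairs):
    # Shuffle: group all values belonging to the same key together.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    # Reduce: collapse each key's list of values into one output value.
    return {key: sum(values) for key, values in groups.items()}

data = "big data big ideas"
counts = reduce_phase(shuffle(map_phase(data)))
print(counts)  # {'big': 2, 'data': 1, 'ideas': 1}
```

Because each (key, value) pair is independent, the Map calls can run in parallel on different CPU nodes, which is what makes the model scale.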
18. HDFS
● Fault detection and recovery : HDFS
should have mechanisms for quick and
automatic fault detection and recovery.
● Huge datasets : HDFS should have
hundreds of nodes per cluster to manage
the applications having huge data sets.
● Hardware at data : A requested task can
be done efficiently when the computation
takes place near the data, which reduces
network traffic for huge datasets.
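The fault-tolerance idea behind HDFS, splitting a file into blocks and keeping several copies of each block on different nodes, can be sketched as follows (the block size, node names, and round-robin placement here are illustrative, not HDFS defaults):

```python
def split_into_blocks(data: bytes, block_size: int):
    # HDFS stores a file as fixed-size blocks (128 MB in practice; tiny here).
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]

def place_replicas(blocks, nodes, replication=3):
    # Each block is copied to `replication` distinct nodes, so losing
    # any single node never loses the only copy of a block.
    placement = {}
    for i, _ in enumerate(blocks):
        placement[i] = [nodes[(i + r) % len(nodes)] for r in range(replication)]
    return placement

blocks = split_into_blocks(b"0123456789abcdef", block_size=4)
placement = place_replicas(blocks, nodes=["node1", "node2", "node3", "node4"])
print(len(blocks))   # 4
print(placement[0])  # ['node1', 'node2', 'node3']
```

Fault detection and recovery then amount to noticing a dead node and re-copying its blocks from the surviving replicas until each block is back at its target replication factor.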
The rapid growth of communication technology, tools, and media is directly proportional to the amount of data humanity generates. From the formation of the Earth until 2003, when internet cafés were still quiet and the internet was still a foreign thing, humanity had produced about 5 billion GB of data. In the years that followed came Friendster, Facebook, and Twitter, and new devices began to appear, such as the iPod and GPRS-equipped Nokia phones, so people started going online.
Eight years later, BlackBerry boomed, along with WhatsApp and Twitter, and humanity could produce 5 billion GB in just 2 days, even though mobile data was still something people were reluctant to spend on. Android began to spread a few years after that, users multiplied, people grew accustomed to mobile data plans, and eventually 5 billion GB of data could be produced in just 10 minutes.
Simple activities like listening to music or reading a book are now generating data. Digital music players and eBooks collect data on our activities. Your smart phone collects data on how you use it and your web browser collects information on what you are searching for. Your credit card company collects data on where you shop and your shop collects data on what you buy. It is hard to imagine any activity that does not generate data.
Our conversations are now digitally recorded. It all started with emails but nowadays most of our conversations leave a digital trail. Just think of all the conversations we have on social media sites like Facebook or Twitter. Even many of our phone conversations are now digitally recorded.
Just think about all the pictures we take on our smartphones or digital cameras. We upload and share hundreds of thousands of them on social media sites every second. Ever more CCTV cameras capture video images, and we upload hundreds of hours of video to YouTube and other sites every minute.
We are increasingly surrounded by sensors that collect and share data. Take your smartphone: it contains a global positioning sensor to track exactly where you are every second of the day, and it includes an accelerometer to track the speed and direction at which you are travelling. We now have sensors in many devices and products.
We now have smart TVs that are able to collect and process data, we have smart watches, smart fridges, and smart alarms. The Internet of Things, or Internet of Everything connects these devices so that e.g. the traffic sensors on the road send data to your alarm clock which will wake you up earlier than planned because the blocked road means you have to leave earlier to make your 9am meeting…
Volume refers to the vast amounts of data generated every second. We are not talking Terabytes but Zettabytes or Brontobytes. If we take all the data generated in the world between the beginning of time and 2008, the same amount of data will soon be generated every minute. New big data tools use distributed systems so that we can store and analyse data across databases that are dotted around anywhere in the world.
Velocity refers to the speed at which new data is generated and the speed at which data moves around. Just think of social media messages going viral in seconds. Technology allows us now to analyse the data while it is being generated (sometimes referred to as in-memory analytics), without ever putting it into databases.
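Analysing data while it is being generated, rather than loading it into a database first, can be sketched as a running aggregate over a stream (the event stream below is simulated):

```python
def running_average(stream):
    # Update the aggregate per event, in memory, instead of
    # storing everything in a database and querying it later.
    count, total = 0, 0.0
    for value in stream:
        count += 1
        total += value
        yield total / count  # current average, available immediately

events = [10, 20, 30, 40]  # e.g. message sizes arriving in real time
averages = list(running_average(iter(events)))
print(averages)  # [10.0, 15.0, 20.0, 25.0]
```

The point is that each result is available as soon as its event arrives, which is what lets systems react to, say, a message going viral within seconds.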
Variety refers to the different types of data we can now use. In the past we only focused on structured data that neatly fitted into tables or relational databases, such as financial data. In fact, 80% of the world's data is unstructured (text, images, video, voice, etc.). With big data technology we can now analyse and bring together data of different types such as messages, social media conversations, photos, sensor data, video or voice recordings.
Limitation:
This approach works well where the volume of data is small enough to be accommodated by standard database servers, or up to the limit of the processor that is handling the data. But when it comes to huge amounts of data, processing them through a traditional database server becomes a tedious task.