Data 360 Conference: Introduction to Big Data, Hadoop and Big Data Analytics
1. Apache Hadoop Training Series: Hadoop
Introduction
10/23/14
avkash@bigdataperspective.com
https://www.linkedin.com/in/avkashchauhan
Let's Start and Define Big Data
Apache Hadoop Training Series: Hadoop Introduction 1
3. Apache Hadoop Training Series: Hadoop Introduction
Let's Start and Define Big Data
How Hadoop Fits in This Scenario
4. Apache Hadoop Training Series: Hadoop Introduction
http://www.packtpub.com/using-cloudera-impala/book
http://www.amazon.com/Simplifying-Windows-Azure-HDInsight-Service/dp/0735673802
http://blogs.msdn.com/b/microsoft_press/archive/2014/05/27/free-ebook-introducing-microsoft-azure-hdinsight.aspx
https://www.linkedin.com/in/avkashchauhan
5. Apache Hadoop Training Series: Hadoop Introduction
Hadoop is an open-source, Java-based, scalable, fault-tolerant platform for storing and processing large amounts of unstructured data, distributed across machines.
6. Apache Hadoop Training Series: Hadoop Introduction
Flexibility: A single repository for storing and analyzing any kind of data, not bound by a schema.
Scalability: A scale-out architecture divides the workload across multiple nodes using a flexible distributed file system.
Low Cost: Deployed on commodity hardware and an open-source platform.
Fault Tolerant: Continues working even if node(s) go down.
A system that moves computation to where the data is.
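The fault-tolerance and "move computation to the data" points above can be illustrated with a toy model. The sketch below is only an assumption-laden illustration, not how HDFS or YARN are implemented: the function names, the node names, and the round-robin placement are all hypothetical (real HDFS placement is rack-aware), and the replication factor of 3 simply mirrors the usual HDFS default.

```python
def place_blocks(num_blocks, nodes, replication=3):
    """Toy placement: put each block on `replication` distinct nodes, round-robin."""
    if replication > len(nodes):
        raise ValueError("replication factor cannot exceed the number of nodes")
    return {
        block: [nodes[(block + r) % len(nodes)] for r in range(replication)]
        for block in range(num_blocks)
    }


def schedule(placement, live_nodes):
    """For each block, run the task on a live node that already stores a replica."""
    plan = {}
    for block, replicas in placement.items():
        candidates = [node for node in replicas if node in live_nodes]
        if not candidates:
            raise RuntimeError(f"block {block} lost: no live replica")
        plan[block] = candidates[0]  # computation goes to where the data is
    return plan


# Hypothetical four-node cluster; node2 is assumed to be down.
nodes = ["node1", "node2", "node3", "node4"]
placement = place_blocks(6, nodes, replication=3)
plan = schedule(placement, live_nodes={"node1", "node3", "node4"})
```

Because every block has replicas on three different nodes, losing node2 still leaves a live copy of each block, and each task is assigned to a node that already holds its input.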
7. Apache Hadoop Training Series: Hadoop Introduction
Let's Start and Define Big Data
Hadoop Landscape
How Hadoop Fits in This Scenario
11. Apache Hadoop Training Series: Hadoop Introduction
Let's Start and Define Big Data
How Hadoop Fits in This Scenario
Hadoop Core Components
Hadoop Landscape
Data Storage
Data Processing
12. Apache Hadoop Training Series: Hadoop Introduction
HDFS
MapReduce/YARN
Hadoop Common
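To make the MapReduce component listed above more concrete, here is a minimal in-memory sketch of the map, shuffle, and reduce phases using a word count. It assumes nothing about the Hadoop API: the function names are hypothetical, and everything runs in one Python process, whereas Hadoop distributes these phases across the cluster.

```python
from collections import defaultdict


def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs):
    """Shuffle: group all emitted values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: collapse the values for one key into a single total."""
    return key, sum(values)


def run_job(lines):
    """Chain the three phases the way a MapReduce job does conceptually."""
    pairs = (pair for line in lines for pair in map_phase(line))
    grouped = shuffle(pairs)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


counts = run_job(["big data on hadoop", "hadoop stores big data"])
# counts == {"big": 2, "data": 2, "on": 1, "hadoop": 2, "stores": 1}
```

The map and reduce functions never share state, which is what lets a framework run many copies of them in parallel on different nodes.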
13. Apache Hadoop Training Series: Hadoop Introduction
Cloud
Cloudera Impala Hortonworks Tez
Impala uses C++-based in-memory processing of HDFS data through SQL-like statements to speed up data processing.
Use cases include user collaborative filtering, user recommendations, clustering, and classification.
14. Apache Hadoop Training Series: Hadoop Introduction
Let's Start and Define Big Data
How Hadoop Fits in This Scenario
Hadoop Landscape
Applying Hadoop to Save $$
Hadoop Core Components
15. Apache Hadoop Training Series: Hadoop Introduction
Let's Start and Define Big Data
How Hadoop Fits in This Scenario
Hadoop Landscape
Concept of Data Lake
Hadoop Core Components
Applying Hadoop to Save $$
18. Apache Hadoop Training Series: Hadoop Introduction
Let's Start and Define Big Data
How Hadoop Fits in This Scenario
Hadoop Landscape
Hadoop Core Components
Concept of Data Lake
Applying Hadoop to Save $$
Hadoop in Cloud
21. Apache Hadoop Training Series: Hadoop Introduction
Let's Start and Define Big Data
How Hadoop Fits in This Scenario
Hadoop Landscape
Hadoop Core Components
Big Data Analytics
Applying Hadoop to Save $$
Hadoop in Cloud
Concept of Data Lake
22. Apache Hadoop Training Series: Hadoop Introduction
EDW
OLAP
ODS
24. Apache Hadoop Training Series: Hadoop Introduction
Let's Start and Define Big Data
How Hadoop Fits in This Scenario
Hadoop Landscape
Hadoop Core Components
Big Data Analytics With Hadoop
Applying Hadoop to Save $$
Hadoop in Cloud
Concept of Data Lake
25. Apache Hadoop Training Series: Hadoop Introduction
Component | Amazon | HDInsight | Directives
Data Storage | S3 | Azure Blobs | Direct access from compute machines for very fast data delivery
Processing | EC2 | Azure Compute | Dedicated machines ready to run with a specific version of the Hadoop runtime
Processing Libraries | Java based, or any other language supported through Hadoop Streaming | .NET-based code | Users upload their processing code as binaries/libraries
Results | S3 | Azure Blobs | Once the job completes, results are stored back to the data storage used as the source
Visualization | Custom | Custom | Third-party applications can connect to the storage to perform visualization
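The table mentions that languages other than Java are supported through Hadoop Streaming. In that model, the mapper and reducer are ordinary programs that read lines from standard input and write tab-separated key/value lines to standard output, with the framework sorting by key in between. The sketch below imitates that contract locally in Python. The "city,temperature" record format, the sample values, and the function names are assumptions made for illustration, and `sorted` stands in for the framework's sort step.

```python
from itertools import groupby


def mapper(lines):
    """Turn 'city,temperature' records into 'city<TAB>temperature' lines."""
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank records
        city, temp = line.split(",")
        yield f"{city.strip()}\t{float(temp)}"


def reducer(sorted_lines):
    """Emit the maximum temperature per city; input must be sorted by key."""
    parsed = (line.rstrip("\n").split("\t") for line in sorted_lines)
    for city, group in groupby(parsed, key=lambda kv: kv[0]):
        yield f"{city}\t{max(float(value) for _, value in group)}"


# Local simulation with made-up records; sorted() plays the role of the shuffle.
records = ["Austin,31.5", "Oslo,12.0", "Austin,35.0", "Oslo,9.5"]
shuffled = sorted(mapper(records))
result = list(reducer(shuffled))
# result == ["Austin\t35.0", "Oslo\t12.0"]
```

In an actual streaming job, each function would be wrapped in its own script that iterates over `sys.stdin` and prints every output line, and the uploaded scripts would be the "processing binaries/libraries" the table refers to.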
28. Apache Hadoop Training Series: Hadoop Introduction
Let's Start and Define Big Data
How Hadoop Fits in This Scenario
Hadoop Landscape
Hadoop Core Components
Big Data Analytics With Hadoop
Applying Hadoop to Save $$
Hadoop in Cloud
Concept of Data Lake
30. Apache Hadoop Training Series: Hadoop Introduction
http://blogs.msdn.com/b/microsoft_press/archive/2014/05/27/free-ebook-introducing-microsoft-azure-hdinsight.aspx