1. Introduction: Cloud Computing and Big Data - Hadoop
Presented By:
Nagarjuna D.N
SAP CTL
AT&T, Bengaluru
Date: 14-07-2015
2. Overview
• Cloud Computing Evolution
• Why is Cloud Computing needed?
• Cloud Computing Models
• Cloud Solutions
• Cloud Job Opportunities
• Criteria for Big Data
• Big Data Challenges
• Technologies to process Big Data - Hadoop
• Hadoop History and Architecture
• Hadoop Ecosystem
• Hadoop Real-time Use Cases
• Hadoop Job Opportunities
• Hadoop and SAP HANA Integration
• Summary
3. Internet of Things (IoT) and Big Data
“One of the reasons behind both is Cloud Computing!”
4. Cloud Computing
(an evolution of the Internet; the infrastructure is hidden from the end user)
• Infrastructure is maintained somewhere with shared computing
resources (servers, storage, and networking), all delivered over the Internet.
• The Cloud delivers a hosting environment that is:
- immediate,
- flexible,
- scalable,
- secure,
- available,
and that saves corporations money, time, and resources.
5. Cloud Computing (Cont.)
• In addition, the platform provides on-demand services, i.e.,
always on: anywhere, anytime, and in any place.
• “Pay for what you use”: billed on a metered basis.
• It is based on utility computing and virtualization.
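The “pay-for-what-you-use” model can be sketched as a simple metered bill. The resource types and rates below are made-up placeholders for illustration, not any real provider's pricing:

```python
# Hypothetical illustration of "pay-for-what-you-use" metered billing.
# All rates below are invented examples, not a real provider's prices.

def metered_cost(cpu_hours, gb_stored, gb_transferred,
                 cpu_rate=0.05, storage_rate=0.02, transfer_rate=0.01):
    """Return the total bill: each resource is charged only for actual usage."""
    return (cpu_hours * cpu_rate
            + gb_stored * storage_rate
            + gb_transferred * transfer_rate)

# A server that used 100 CPU-hours, stored 50 GB, and transferred 20 GB:
# 100*0.05 + 50*0.02 + 20*0.01 = 6.2
print(metered_cost(100, 50, 20))
```

Idle resources cost nothing under this model, which is the contrast with buying and maintaining capital hardware up front.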
19. Enterprise Cloud Solutions
1. Test / Development / QA Platform
o Use cloud infrastructure servers as a test and development
platform
2. Disaster Recovery
o Keep images of servers on cloud infrastructure, ready to
go in case of a disaster
3. Cloud File Storage
o Back up or archive company data to cloud file storage
4. Load Balancing
o Use cloud infrastructure for overflow management during
peak usage times
20. Enterprise Cloud Solutions (Cont.)
5. Overhead Control
o Lower overhead costs and make bids more competitive
6. Distributed Network Control and Cost Reporting
o Create an individual private network (VPC) for each
subsidiary or contract
7. Rapid Deployment
o Turn up servers immediately to meet project timelines
8. Functional IT Labor Shift
o Refocus IT labor expense on revenue-producing activities
21. Preparing for Future Cloud IT Jobs
A sampling of IT skills likely to be in demand in the future:
o Functional application development and support
e.g., Oracle, SAP, SQL, linking hardware to software
o Leveraging data to make strategic business decisions
e.g., Business Intelligence: applying sales forecasts to inventory and
manufacturing decisions
o Mobile apps
Android, iPhone, Windows Mobile
o Wi-Fi engineers
The Universal Service Fund (USF) is expanding to include broadband
communications (LTE replaces GSM/CDMA)
o Optical engineers
Optical offers the highest bandwidth today (PON, CWDM, DWDM)
o Virtualization specialists
Economies of scale require virtualization (server, storage, client, …)
o IP engineers
o Network security specialists
o Web developers
o Social media developers
o Business Intelligence application development and support
24. “Big Data - the Big Thing”
• Big Data is much like a Rubik’s cube: it has many different solutions.
• Take five Rubik’s cubes, scramble them all the same way, and give one to
each of five different experts.
• Each expert will solve the cube in a matter of seconds.
• But if you watch closely, you will notice that even though the final
outcome is the same, the route taken to solve the cube is not.
• Every expert starts at a different place (color) and tries to solve it
with different methods.
• It is nearly impossible for two experts to take exactly the same route.
Beginning Big Data
26. Big Data: a General Definition
• Big Data is a collection of data sets that are large and complex in
nature.
• They comprise both structured and unstructured data that grows so
fast that it is no longer manageable by traditional relational
database systems (RDBMS).
27. Big Data, Technically
i. Volume
Petabytes or zettabytes.
ii. Velocity
Batch or real-time (stream) processing.
iii. Variety
Structured, semi-structured, and
unstructured.
It is estimated that 80% of the world’s data
is unstructured, and the rest is
semi-structured and structured.
iv. Veracity
The quality of the data being captured
can vary greatly.
Fig. Big Data, based on Doug Laney’s 3Vs model
28. Variety of Data
1. Structured data: data that is identifiable because it is organized in a
structure (a standard, defined format).
E.g.: databases, data warehouses, and electronic spreadsheets.
2. Semi-structured data: data that is neither raw data nor typed data in
a conventional database system.
E.g.: wiki pages, tweets, Facebook data, and instant messages.
3. Unstructured data: data that has no standard, defined structure.
E.g.: data files, audio files, video, graphics, and multimedia.
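As a toy illustration of the three varieties, the snippet below handles one tiny sample of each (the formats and values are chosen purely for the example):

```python
# Toy examples of the three data varieties: structured, semi-structured,
# and unstructured (illustrative sample data only).
import csv
import io
import json

# Structured: fixed schema, e.g. a CSV table with known columns.
structured = list(csv.DictReader(io.StringIO("id,name\n1,Alice\n2,Bob\n")))

# Semi-structured: self-describing, but with no rigid schema (e.g. JSON).
semi = json.loads('{"user": "bob", "tags": ["hadoop", "cloud"]}')

# Unstructured: free text; any structure must be inferred, e.g. by tokenizing.
unstructured = "Hadoop processes large volumes of raw text."
tokens = unstructured.lower().rstrip(".").split()

print(structured[0]["name"])  # Alice
print(semi["tags"][0])        # hadoop
print(len(tokens))            # 7
```

The structured rows can be queried by column name directly, the JSON needs its shape inspected at runtime, and the free text yields nothing until it is parsed; this gap is what makes mixed-variety data hard for a traditional RDBMS.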
29. Traditional Data vs. Big Data
Attribute          | Traditional Data           | Big Data
Volume             | Gigabytes to terabytes     | Petabytes to zettabytes
Organization       | Centralized                | Distributed
Structure          | Structured                 | Semi-structured & unstructured
Data model         | Strict, schema-based       | Flat schema
Data relationships | Complex interrelationships | Almost flat, with few relationships
30. Criteria of Big Data
1. 272 hours of video are uploaded to YouTube every minute, and
over 3 billion hours of video are watched every month.
2. Radio-frequency ID (RFID) systems generate up to 1,000 times
more data than conventional bar-code systems.
3. 340 million tweets are sent every day, amounting to about 7 TB of
data.
4. The social networking site Facebook processes over 10 TB of data
every day.
5. Over 5 billion people use cell phones to call, send SMS, email,
browse the Internet, and interact via social networking sites.
6. The international Square Kilometre Array project is designed to
receive around 700 TB of data per second.
31. Challenges with Big Data
1. Scaling is costly.
2. A strategy must be in place before you hit the limits of a single
computer.
3. Most enterprises respond to scalability needs only when they start
facing problems of poor response times and low throughput.
4. Adding hardware to an existing system is labour-intensive and
hence error-prone.
5. Mixed data types (structured and unstructured) make scaling even
harder.
41. Technology to process Big Data - Hadoop
(an open-source software framework written in Java)
• Open-source software: it is free to download, though more and
more commercial distributions of Hadoop are becoming available.
• Framework: everything you need to develop and run software
applications is provided: programs, connections, etc.
• Distributed storage: the Hadoop framework breaks big data into
blocks, which are stored on clusters of commodity hardware.
• Processing power: Hadoop concurrently processes large amounts
of data using multiple low-cost computers for fast results.
• Hadoop is a distributed file system (DFS), not a database; it is
designed for information in many forms.
• The open-source project was started by Doug Cutting, then an
employee of Yahoo!; “Hadoop” was the name of his son’s toy elephant.
• It is now an Apache Software Foundation project: Apache Hadoop.
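The block-splitting and replication of the distributed storage layer can be sketched in miniature. This is a simplified illustration, not Hadoop code: real HDFS in Hadoop 1.x uses 64 MB blocks and a default replication factor of 3 with rack-aware placement, whereas here the block size is scaled down to 64 bytes and placement is plain round-robin:

```python
# Minimal sketch of HDFS-style block splitting and replica placement.
# Numbers are scaled down for illustration: 64-byte blocks stand in for
# HDFS's 64 MB default; replication factor 3 matches the HDFS default.

def split_into_blocks(data: bytes, block_size: int):
    """Split a byte string into fixed-size blocks (the last may be shorter)."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]

def place_replicas(blocks, nodes, replication=3):
    """Assign each block to `replication` distinct nodes, round-robin style.

    Real HDFS placement is rack-aware; round-robin is used here only to
    show that every block lives on several machines at once.
    """
    placement = {}
    for b in range(len(blocks)):
        placement[b] = [nodes[(b + r) % len(nodes)] for r in range(replication)]
    return placement

data = b"x" * 200
blocks = split_into_blocks(data, block_size=64)
placement = place_replicas(blocks, ["node1", "node2", "node3", "node4"])
print(len(blocks))  # 4 blocks: 64 + 64 + 64 + 8 bytes
print(placement)
```

Because each block exists on three nodes, losing any single machine leaves every block still available, which is the basis of the fault tolerance discussed later in the deck.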
43. Hadoop Architecture
Hadoop core has two major components (daemons):
1. HDFS
a. NameNode
b. Secondary NameNode
c. DataNode
2. MapReduce Engine (distributed data processing framework)
a. JobTracker
b. TaskTracker
44. What components make up Hadoop?
• Hadoop Common – the libraries and utilities used by other Hadoop
modules.
• Hadoop Distributed File System (HDFS) – the Java-based
scalable system that stores data across multiple machines without
prior organization.
• MapReduce – a software programming model for processing large
sets of data in parallel.
• YARN – resource management framework for scheduling and
handling resource requests from distributed applications. (YARN is
an acronym for Yet Another Resource Negotiator.)
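The MapReduce programming model named above can be simulated in a few lines of plain Python. This is only a conceptual sketch of the map, shuffle, and reduce phases applied to word counting; it is not the Hadoop API (real jobs are typically written in Java against the MapReduce interfaces, and the framework performs the shuffle across the cluster):

```python
# Conceptual simulation of MapReduce word counting in plain Python.
# Not Hadoop code: it only mirrors the map -> shuffle -> reduce data flow.
from collections import defaultdict

def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    return [(word, 1) for word in line.split()]

def shuffle(pairs):
    """Shuffle: group all emitted values by key, as the framework does
    between the map and reduce phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    """Reduce: sum the counts collected for each word."""
    return {word: sum(counts) for word, counts in groups.items()}

lines = ["big data big hadoop", "hadoop big"]
mapped = [pair for line in lines for pair in map_phase(line)]
result = reduce_phase(shuffle(mapped))
print(result)  # {'big': 3, 'data': 1, 'hadoop': 2}
```

Because every line can be mapped independently and every key reduced independently, both phases parallelize naturally across the cluster's TaskTrackers.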
51. Benefits of Hadoop
• Scalable - new nodes can be added without needing to change
data formats.
• Cost-effective - Hadoop brings massively parallel computing to
commodity hardware.
• Flexible - Hadoop is schema-less and can absorb any type of data,
structured or not, from any number of sources.
• Fault-tolerant - when you lose a node, the system redirects work to
another location of the data and continues processing without
missing a beat.
• Programming languages - Java (the default) or Python.
• Last but not least - it’s free (open source)!
52. Hadoop is not Suitable for All Kinds of
Applications
Hadoop is not suitable for:
• real-time, stream-based processing, where data is
processed immediately upon its arrival.
• online access where low latency is required.
54. Real-Time Hadoop Use Cases
1. Risk modeling (how can banks
understand customers and markets?)
2. Customer churn analysis (why do
companies really lose customers?)
3. Ad targeting (how can companies
increase campaign efficiency?)
4. Point-of-sale transaction analysis (how do retailers
target promotions guaranteed to make you buy?)
5. Search quality
(what’s in your search?)
Public cloud
The cloud infrastructure is made available to the public on a commercial basis by a cloud service provider. This enables a consumer to develop and deploy a service in the cloud with very little financial outlay compared to the capital expenditure normally associated with other deployment options.
Private cloud
The cloud infrastructure is deployed, maintained, and operated for a specific organization. The operation may be in-house or handled by a third party on the premises.
Hybrid cloud
A hybrid cloud environment consists of some portion of computing resources on-site (on-premise) and off-site (public cloud). By integrating public cloud services, users can leverage cloud solutions for specific functions that are too costly to maintain on-premise, such as virtual-server disaster recovery, backups, and test/development environments.
Community cloud
A community cloud is formed when several organizations with similar requirements share common infrastructure. Costs are spread over fewer users than a public cloud, but over more than a single tenant.
Pricing models: on-demand, reserved, and bid (spot) IT infrastructure.
Reference: http://saphanatutorial.com/what-is-hadoop/
Default web UI ports of the Hadoop 1.x daemons:
NameNode - 50070
DataNode - 50075
Secondary NameNode - 50090
JobTracker - 50030
TaskTracker - 50060