Right Focus and On Target

2. Agenda
Analyze & Define
• Progression of Analytics
• The new phenomenon – Big Data
• Big Data Defined

Technology Discussion
• Big Data technology – Hadoop
• Big Data – big savings – Hadoop

Use Cases
• What can we solve with Big Data – examples
• What is next? Where are the opportunities?
3. Progression of Analytics – So Far…
• Structured, known data
• Traditional tooling – ETL, data marts, DW, RDBMS
• Growth – normal, incremental, with archiving
• Less cross-functional integration
• More tactical than strategic
• Sizes from GBs to TBs
• Data architects vs. functional roles
6. The New Phenomenon – Big Data
1. No to "fit-for-all", yes to "fit-for-purpose"
2. Proliferation of data sources – variety of data
3. Proliferation of the volume of data
4. The demand for speed (velocity) of data
5. The demand for high value and accuracy (veracity) of information
6. Massively parallel processing
7. Commodity servers vs. specialized servers

A DATA-DRIVEN BUSINESS is THE SMART BUSINESS
7. Big Data Definition

• A high volume of data, growing by more than 50% every year
• High-speed streaming and machine-generated data
• A variety of data sources, both inside the enterprise and in the external data around it
• Data collections requiring huge storage (typically 100 TB or more), where an RDBMS is inefficient

The five Vs: Volume, Velocity, Variety, Value, Veracity – data that is meaningful.
8. Big Data Definition

Big Data is the new art and science, using massively parallel processing (MPP) technology, of collecting, storing, processing, distributing, and analyzing data with any of the attributes of high volume, high velocity, and high variety, to extract high value and greater accuracy (veracity).

IBM says BIG DATA means:
1. Volume (terabytes → zettabytes)
2. Variety (structured → semi-structured → unstructured)
3. Velocity (batch → streaming data)
9. Big Data Technologies – Typical Stack

Big Data stack layers:
• Big Data infrastructure
• Data manipulation & management
• Data analysis & mining
• Predictive & prescriptive analysis
• Process automation & decision support systems
10. Big Data Technologies – SMAQ

The SMAQ stack (Storage, MapReduce, and Query):

Query – user-friendly analytics:
1. Pig (simple query language)
2. Hive (similar to SQL)
3. Cascading (workflow)
4. Mahout (machine learning)
5. ZooKeeper (coordination service)

MapReduce – data distribution & management across nodes in batch mode:
1. Hadoop MapReduce
2. Alternatives – BashReduce, Disco Project, Spark, GraphLab, Storm, HPCC (LexisNexis)

Storage – distributed, non-relational:
1. HBase (columnar DB)
2. HDFS – Hadoop Distributed File System
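To make the MapReduce layer of the SMAQ stack concrete, here is a minimal word-count sketch in plain Python. It imitates the three framework phases (map, shuffle/sort, reduce) that Hadoop runs across nodes; the function names and the sample input lines are illustrative, not part of any Hadoop API.

```python
from collections import defaultdict

def map_phase(lines):
    """Map: emit a (word, 1) pair for every word, like a streaming mapper."""
    for line in lines:
        for word in line.split():
            yield word.lower(), 1

def shuffle(pairs):
    """Shuffle/sort: group values by key, as the framework does between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    """Reduce: sum the counts collected for each word."""
    return {word: sum(counts) for word, counts in groups.items()}

lines = ["big data big savings", "big data"]
counts = reduce_phase(shuffle(map_phase(lines)))
print(counts["big"])  # 3
```

In real Hadoop the map and reduce functions run in parallel on different nodes and the shuffle moves data over the network; the logic per record is the same as above.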
11. Big Data – Big Savings – Economics

ROI on the Big Data approach (with Hadoop): TCO per 1 TB of storage
• Traditional RDBMS: $37,000
• Hadoop: only $2,000
Source: American Institute for Analytics
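The per-terabyte figures above make the savings easy to work out. A small sketch, using the slide's quoted costs and a hypothetical 100 TB data estate:

```python
# Per-terabyte TCO figures quoted on the slide (American Institute for Analytics)
RDBMS_COST_PER_TB = 37_000   # traditional RDBMS
HADOOP_COST_PER_TB = 2_000   # Hadoop on commodity servers

def storage_tco(terabytes, cost_per_tb):
    """Total cost of ownership for a given data volume."""
    return terabytes * cost_per_tb

data_tb = 100  # hypothetical cluster size, not from the slide
savings = storage_tco(data_tb, RDBMS_COST_PER_TB) - storage_tco(data_tb, HADOOP_COST_PER_TB)
print(f"Savings on {data_tb} TB: ${savings:,}")  # Savings on 100 TB: $3,500,000
```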
12. Where Is the Market on Big Data – Current State

• Infrastructure / framework / analytics software
• Horizontal solutions such as EDW
• Verticals: healthcare, retail, government / public sector, education & human capital, health sciences / genomics, telecommunications / services, energy & utilities, e-commerce / marketing, media & entertainment

[Chart: Big Data market in $B, 2010–2015. Source: IDC 2011]
14. Pure Big Data Implementation – Architecture

[Diagram: data sources – web logs, images & videos, social media, documents, structured data – feed through connectors / adapters into a Big Data / Hadoop platform, which drives reporting, OLAP, modeling, predictive, and prescriptive analytics.]

Barriers:
• Disruption to existing analytics?!
• Roadmap / methodology
• Certainty of costs
• Hadoop / Big Table can replace traditional EDWs!!
18. Big Data Opportunities

Some gaps & opportunities:
• Real-time analysis (perhaps using SAP HANA, etc.!!)
• User interface (UI) frameworks
• App development for Big Data on the cloud (multi-tenancy)
• Security & data governance
• Cross-application integration
• Industry standards
20. Our Big Data Strategy at a Glance

Business Focus
• Identify data needs
• Identify business issues
• Lay out data dependencies between functions
• Resolve competing priorities
• Clearly lay out the levels of data and cross-functional requirements

Stakeholder Focus
• Identify the stakeholders
• Align best practices with the project
• Plan out the objectives, scope, and timelines
• Identify the KPIs, reports, dashboards, and predictive & prescriptive analyses to be delivered

Technology Focus
• Find synergies in current technology
• Take stock of existing "technology assets" relevant to Big Data
• Assess your current capabilities and architecture
• Identify the resources and minimize "specialties" to exploit synergies with the existing resource pool
• Lay out a development methodology to streamline delivery

Process Focus
• Establish clear data flows
• Identify the Data Governance execution process – people, processes, mechanisms
• Design the process to be business-focused rather than IT-focused
• Clearly establish the measures to achieve – accuracy, repeatability, agility, and accountability (reconcilability)
21. Our Execution Approach – Agile Methodology

An agile approach to reduce risk:
• Close coordination between the customer and the developer
• Small incremental steps make testing easier and more manageable, and avoid surprises
• Early recovery from expectation mismatches
• Clarity on design understanding and regular communication with users
• Early warning about risks through regular status reports
• Full knowledge transfer
Progression of Analytics – 3 minutes
The new phenomenon – Big Data – 4 minutes
Big Data Defined – 3 minutes
Where is the technology – 5 minutes
What can we solve with Big Data – example case studies – 5 minutes
What is next? Where are the opportunities? – 10 minutes
Internal information – known questions and answers, known structures, structured data types, known volumes, mostly transactional data
Master data is very well defined. Storage: typical data warehouses and data marts using batch processing, traditional ETL, and relational databases
Data growth is incremental, with regular archival
Mostly reporting with a little mining – largely descriptive; predictive analysis is very light
Cross-functional integration of data is very limited, structured around customers, services & products, logistics, etc.
Functional and technical responsibilities are clearly demarcated – mostly data engineers / architects at the back end supporting business analysts / users
Most reports merely measure tactics – they support strategy rather than inducing one
Data sizes are in the gigabyte-to-terabyte range; the approach becomes inefficient and costly beyond a certain size
Narrow & focused business missions – not "fit-for-all" but "fit-for-purpose"
The need to discover more – facts, relationships, indicators, patterns, trends, and pointers that probably could not be discovered before, by cross-integrating data from various sources
The need to capture and store data, not just collect it
Proliferation of data sources – variety of data: multi-dimensional data, streaming data, geo-spatial data, social networking data, internal data (RDBMS), video & image data, text data (logs etc.), time-series data, genomics
Proliferation of the volume of data (now into petabytes and above): internet / intranet, social networks (Facebook & Twitter), mobile devices, smart home devices, smart systems (utilities etc.), media & entertainment
The demand for speed (velocity) of the data collected, understood, processed, and distributed: accessibility – where, when, who, and how; time value – real time or not; increased speeds of consumption and of data generation
Demand for high value and accuracy (veracity) of information
Advent of massively parallel processing technology – availability of Hadoop / MapReduce-style open-source and packaged technologies
Affordability of infrastructure – commodity servers vs. specialized servers
Hadoop enables a computing solution that is:
Scalable – new nodes can be added as needed, without changing data formats, how data is loaded, how jobs are written, or the applications on top.
Cost-effective – Hadoop brings massively parallel computing to commodity servers. The result is a sizeable decrease in the cost per terabyte of storage, which in turn makes it affordable to model all your data.
Flexible – Hadoop is schema-less and can absorb any type of data, structured or not, from any number of sources. Data from multiple sources can be joined and aggregated in arbitrary ways, enabling deeper analyses than any one system can provide.
Fault-tolerant – when you lose a node, the system redirects work to another location of the data and continues processing without missing a beat.
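The fault-tolerance point rests on block replication: HDFS keeps each block on several nodes (three by default), so a node failure leaves live copies that work can be redirected to. A minimal simulation of that placement logic, with hypothetical node and block names:

```python
import random

REPLICATION_FACTOR = 3  # HDFS default

def place_blocks(blocks, nodes, rf=REPLICATION_FACTOR):
    """Assign each block to rf distinct nodes, roughly as the NameNode does."""
    return {block: random.sample(nodes, rf) for block in blocks}

def surviving_copies(placement, failed_node):
    """After a node failure, list the replicas still reachable for each block."""
    return {block: [n for n in replica_nodes if n != failed_node]
            for block, replica_nodes in placement.items()}

nodes = ["node1", "node2", "node3", "node4", "node5"]
placement = place_blocks(["blk_0", "blk_1"], nodes)
after_failure = surviving_copies(placement, "node3")

# Every block keeps at least rf - 1 = 2 live replicas, so processing can
# continue from the surviving copies without data loss.
assert all(len(replicas) >= 2 for replicas in after_failure.values())
```

The real system also re-replicates under-replicated blocks in the background to restore the replication factor; this sketch only shows why a single failure is survivable.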
The word of the hour is "SMART"!!
Smart business – a targeted value proposition. Businesses are under pressure to maximize their investments (a focused approach, not a one-size-fits-all methodology).
Targeted value proposition: targeted advertisement, tailored menus, focused initiatives, individualized attention, non-impersonal messaging, efficient governance, greater accuracy.
Targeted advertisement, tailored menus, focused initiatives, individualized attention, non-impersonal messaging, efficient governance, greater accuracy.
Businesses want to gain competitive advantage by being able to take action based on timely, relevant, complete, and accurate information, rather than one-fit-for-all solutions.
Within the immense volume, variety, and velocity of data produced today lie new information, facts, relationships, indicators, and pointers that either could not practically be discovered in the past or simply did not exist before.
The market has just started picking up.
There is a large gap in vertical solutions.
The biggest gap is in Big Data services.
Hardware & software components already seem to be available.
• Adapting to real-time analysis (perhaps using HANA!!)
• Development of industry standards
• Development of a universal schema for metadata and cataloging
• Tools to support security & data governance
• Support for cloud-ification (multi-tenancy)
• Support for data lineage
• A framework for cross-application integration
• Support for testing
• An automated & configurable monitoring and management console
• User interface (UI) frameworks
Business Focus
• Identify data needs for strategic business functions
• Identify the business issues that need to be solved with Big Data
• Lay out data dependencies between functions
• Resolve competing priorities
• Clearly lay out the levels of data and cross-functional requirements

Technology Focus
• Identify the right technology to align with the current landscape, for synergies in technology
• Take stock of existing "technology assets" relevant to Big Data
• Assess your current capabilities and architecture to support your goals, and select the deployment strategy that best fits your Big Data questions
• Identify the resources and minimize "specialties" to exploit synergies with the existing resource pool
• Lay out a development methodology to streamline delivery

Stakeholder Focus
• Clearly identify the stakeholders at all levels of data consumption
• Present best practices and align them with the project
• Plan out the objectives, scope, and timelines
• Identify the KPIs, reports, dashboards, and predictive & prescriptive analyses to be delivered

Process Focus
• Establish clear data flows from collection to consumption of data
• Identify the Data Governance execution process – people, processes, mechanisms
• Design the process to be business-focused rather than IT-focused
• Clearly establish the measures to achieve – accuracy, repeatability, agility, and accountability (reconcilability)