De-Mystifying Big Data
Prasad Mavuduri
American Institute of Big Data Professionals
RIGHT FOCUS AND ON TARGET
Agenda
Analyze & Define
• Progression of Analytics
• The new phenomenon - Big Data
• Big Data Defined
Technology Discussion
• Big Data Technology – Hadoop
• Big Data – Big Savings – Hadoop
Use Cases
• What can we solve with Big Data – example
• What is next? Where are the opportunities?
Progression of Analytics
Structured – Known Data
Traditional – ETL, Data Marts, DW, RDBMS
Growth – Normal
Incremental – Archive
Less Cross-Functional Integration
More Tactical than Strategic
Sizes – GBs to TBs
Data Architects vs. Functional
So far…
A serious Matter
The new phenomenon - Big Data
Growing pains?!
Big Data?!
Is it just data?
The new phenomenon - Big Data
1. No to “fit-for-all” but Yes to “fit-for-purpose”
2. Proliferation of data sources – variety of data
3. Proliferation of volume of data
4. The demand for the speed (velocity) of data
5. Demand for high value & accuracy (veracity) of information
6. Massively parallel processing
7. Commodity servers vs. specialized servers
DATA-DRIVEN BUSINESS is THE SMART BUSINESS
Big Data Definition
• High volume of data, growing by more than 50% every year
• High-speed streaming, machine-generated data, etc.
• Different data sources: in-the-enterprise data and external data around the enterprise data
• Collected data taking up huge storage (typically 100 TB or more), where an RDBMS is inefficient
Value, Variety, Volume, Velocity
VERACITY
Meaningful
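To put the "more than 50% every year" growth figure in perspective, the arithmetic can be sketched as simple compounding. This is only an illustration: the 100 TB starting point and the exact 50% rate are taken from the slide as round numbers, and the function names are hypothetical.

```python
import math


def projected_volume_tb(start_tb: float, annual_growth: float, years: int) -> float:
    """Compound a starting data volume by a fixed annual growth rate."""
    return start_tb * (1.0 + annual_growth) ** years


def years_to_multiply(factor: float, annual_growth: float) -> float:
    """Years needed for the volume to grow by `factor` at a fixed annual rate."""
    return math.log(factor) / math.log(1.0 + annual_growth)


if __name__ == "__main__":
    # Assumed inputs: 100 TB today, growing 50% per year.
    for year in range(6):
        print(year, projected_volume_tb(100.0, 0.50, year))
    print("years to double:", years_to_multiply(2.0, 0.50))
```

At this rate the volume doubles in well under two years, which is why capacity planning sized for incremental growth stops working.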
Big Data Definition
VERACITY
Big Data is the new art and science, using Massively Parallel Processing (MPP) technology, of collection, storage, processing, distribution, and analysis of data with any of the attributes – high volume, high velocity, high variety – to extract high value and greater accuracy (veracity).
IBM says Big Data means:
1. Volume (Terabytes → Zettabytes)
2. Variety (Structured → Semi-structured → Unstructured)
3. Velocity (Batch → Streaming Data)
Big Data Technologies – Typical Stack
Big Data Infrastructure
Data Manipulation & Management
Data Analysis & Mining
Predictive & Prescriptive Analysis
Process Automation & Decision Support Systems
Big Data Stack
Big Data Technologies – SMAQ
Query: User-friendly Analytics
1. Pig (simple query language)
2. Hive (similar to SQL)
3. Cascading (workflow)
4. Mahout (machine learning)
5. ZooKeeper (coordination service)
MapReduce: Data Distribution & Management across nodes in Batch Mode
1. Hadoop MapReduce
2. Alternatives – BashReduce, Disco Project, Spark, GraphLab (C&M), Storm, HPCC (LexisNexis)
Storage: Distributed Non-Relational
1. HBase (columnar DB)
2. HDFS – Hadoop Distributed File System
SMAQ Stack
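The MapReduce layer of the stack can be illustrated with a minimal single-process sketch of the map, shuffle, and reduce steps, using word count as the example job. This is not the Hadoop API: all function names here are hypothetical, and a real cluster would run the map and reduce steps in parallel across nodes over data stored in HDFS.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(line: str) -> Iterator[Tuple[str, int]]:
    """Map: emit a (word, 1) pair for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def shuffle_phase(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle: group all emitted values by key."""
    grouped: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    """Reduce: collapse the values for one key into a single count."""
    return key, sum(values)


def word_count(lines: Iterable[str]) -> Dict[str, int]:
    """Run the three phases in sequence on an in-memory list of lines."""
    mapped = (pair for line in lines for pair in map_phase(line))
    grouped = shuffle_phase(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    print(word_count(["big data", "big savings"]))
```

Because each map call looks at one line and each reduce call looks at one key, both steps can be spread over many commodity servers without coordination beyond the shuffle.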
Big Data – Big Savings – Economics
ROI on Big Data Approach (with Hadoop)
TCO for 1 TB of data
$37,000 – Traditional RDBMS
$2,000 only – Hadoop
Source: American Institute for Analytics
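The per-terabyte figures on this slide can be turned into a rough comparison for a given data volume. The sketch below assumes that cost scales linearly with volume, which the slide does not claim; the function and constant names are hypothetical, and only the two dollar figures come from the slide.

```python
# Per-terabyte TCO figures quoted on the slide (USD).
RDBMS_TCO_PER_TB = 37_000
HADOOP_TCO_PER_TB = 2_000


def tco_comparison(terabytes: float) -> dict:
    """Compare total cost for a data volume, assuming linear scaling per TB."""
    rdbms = terabytes * RDBMS_TCO_PER_TB
    hadoop = terabytes * HADOOP_TCO_PER_TB
    return {
        "rdbms_usd": rdbms,
        "hadoop_usd": hadoop,
        "savings_usd": rdbms - hadoop,
        # Ratio of the per-TB figures, independent of the volume.
        "cost_ratio": RDBMS_TCO_PER_TB / HADOOP_TCO_PER_TB,
    }


if __name__ == "__main__":
    # 100 TB is the "typical" Big Data size mentioned earlier in the deck.
    print(tco_comparison(100))
```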
Where is the market on Big Data
Infrastructure / Framework / Analytics software
Horizontal Solutions like EDW etc.
Healthcare
Retail Industry
Government / Public Sector
Education & Human Capital
Health Sciences / Genomics
Telecommunications / Services
Energy & Utilities
E-Commerce / Marketing
Media & Entertainment
Source: IDC 2011
[Chart: Big Data Market in $B, 2010 to 2015, with the current state marked]
Integrated Big Data Implementation - Architecture
Coexistence of Big Data with existing EDW
[Diagram components: Web Logs; Images & Videos; Social Media; Documents; Structured Data; Big Data / Hadoop etc.; Existing EDW; Connectors / Adapters; Prescriptive; Predictive; Reporting; OLAP; Modeling]
Pure Big Data Implementation - Architecture
Pure Big Data
[Diagram components: Web Logs; Images & Videos; Social Media; Documents; Structured Data; Big Data / Hadoop etc.; Connectors / Adapters; Prescriptive; Predictive; Reporting; OLAP; Modeling]
Barriers
Disruption to existing analytics?!
Roadmap / Methodology
Certainty of costs
HADOOP / Big Table can replace traditional EDWs!
Big Data Landscape
Big Data Landscape
Applied BIG Data
BIG Data Opportunities
Some Gaps & opportunities
• Real-time analysis (maybe use SAP HANA etc.)
• User interface (UI) frameworks
• App development: Big Data on cloud (multi-tenancy)
• Security & data governance
• Cross-application integration
• Industry standards
AIBDP – Contribution to Big Data
Our Big Data Strategy at a glance
Business Focus
• Identify data needs
• Identify business issues
• Lay out data dependencies between functions
• Resolve competing priorities
• Clearly lay out the levels of data and cross-functional requirements
Stakeholder Focus
• Identify the stakeholders
• Align best practices with the project
• Plan out the objectives, scope, and timelines
• Identify the KPIs, reports, dashboards, and predictive & prescriptive analysis to be delivered
Technology Focus
• Synergies in current technology
• Take stock of existing “technology assets” towards Big Data
• Assess your current capabilities and architecture
• Identify the resources and minimize “specialties” to exploit synergies with the existing resource pool
• Lay out a development methodology to streamline delivery
Process Focus
• Establish clear data flows
• Identify the Data Governance execution process – People, Processes, Mechanisms
• Design the process to be more business focused than IT focused
• Clearly establish measures to achieve – accuracy, repeatability, agility, and accountability (reconcilability)
Our Execution Approach – AGILE methodology
Agile Approach to reduce risks
• Close coordination between the customer and the developer
• Small incremental steps make testing easier and more manageable, and avoid surprises
• Early recovery from expectation mismatch
• Clarity on design understanding and regular communication with the user
• Early warning about risks through regular status reports
• Full knowledge transfer
Thank You!
Please contact us for any enquiries at:
Prasad Mavuduri
prasad@aibdp.org
408 828 9909
Q & A


Editor's Notes

  1. Progression of Analytics: 3 minutes
     The new phenomenon - Big Data: 4 minutes
     Big Data Defined: 3 minutes
     2 minutes
     Where is the Technology: 5 minutes
     What can we solve with Big Data – example case studies: 5 minutes
     What is next? Where are the opportunities?: 10 minutes
  2. Internal information
     • Known questions and answers
     • Known structures, structured data types, known volumes, mostly transactional data
     • Master data is very well defined
     • Storage: typical data warehouses and data marts using batch processing, traditional ETL, and relational databases
     • Data growth is incremental, with regular archival
     • Just reporting and a little bit of mining: mostly descriptive, with very light predictive analysis
     • Cross-functional integration of data is very limited and very structured around customers, services & products, logistics, etc.
     • Functional and technical responsibilities are very clearly demarcated: mostly data engineers / architects at the back end supporting business analysts / users
     • Most of the reports are just a measurement of tactics, supporting the strategy more than inducing a strategy
     • Data sizes are in the gigabyte and terabyte range, and become inefficient and costly beyond a certain size limit
  3. Narrow & focused business missions: not “fit-for-all” but “fit-for-purpose”
     The need to discover more: facts, relationships, indicators, patterns, trends, and pointers that probably could not be discovered before, by cross-integrating data from various sources
     The need to capture and store data, not just collect it
     Proliferation of data sources (variety of data): multi-dimensional data, streaming data, geospatial data, social networking data, internal data (RDBMS), video & image data, text data (logs etc.), time series data, genomics
     Proliferation of volume of data (crossed into petabytes and above): Internet / intranet, social networks (FB & Twitter), mobile devices, smart home devices, smart systems (utilities etc.), media & entertainment
     The demand for the speed (velocity) with which data is collected, understood, processed, and distributed: accessibility (where, when, who, and how), time value (real time or not), increased speeds of consumption, increased speeds of data generation
     Demand for high value & accuracy (veracity) of information
     Advent of technology with massively parallel processing: availability of Hadoop / MapReduce kinds of open source & packaged technologies
     Affordability of infrastructure: commodity servers vs. specialized servers
     Hadoop enables a computing solution that is:
     • Scalable: New nodes can be added as needed, without needing to change data formats, how data is loaded, how jobs are written, or the applications on top.
     • Cost effective: Hadoop brings massively parallel computing to commodity servers. The result is a sizeable decrease in the cost per terabyte of storage, which in turn makes it affordable to model all your data.
     • Flexible: Hadoop is schema-less and can absorb any type of data, structured or not, from any number of sources. Data from multiple sources can be joined and aggregated in arbitrary ways, enabling deeper analyses than any one system can provide.
     • Fault tolerant: When you lose a node, the system redirects work to another location of the data and continues processing without missing a beat.
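The fault-tolerance point ("redirects work to another location of the data") can be sketched as a toy scheduler that picks a surviving replica for each block. This is an illustration only, not how HDFS or the Hadoop scheduler is implemented; the block names, node names, and the `assign_blocks` function are all hypothetical.

```python
from typing import Dict, List, Set


def assign_blocks(replicas: Dict[str, List[str]], live_nodes: Set[str]) -> Dict[str, str]:
    """For each data block, pick the first replica holder that is still alive."""
    assignment: Dict[str, str] = {}
    for block, holders in replicas.items():
        survivors = [node for node in holders if node in live_nodes]
        if not survivors:
            # Every copy of this block sat on a failed node.
            raise RuntimeError(f"all replicas of {block} are lost")
        assignment[block] = survivors[0]
    return assignment


if __name__ == "__main__":
    # Assumed layout: each block is stored on two different nodes.
    replicas = {
        "block-1": ["node-a", "node-b"],
        "block-2": ["node-b", "node-c"],
    }
    print(assign_blocks(replicas, {"node-a", "node-b", "node-c"}))
    # After node-a fails, work on block-1 moves to the copy on node-b.
    print(assign_blocks(replicas, {"node-b", "node-c"}))
```

The design choice being illustrated is that work follows the data: because every block exists in more than one place, losing a node changes where a task runs rather than whether it can run.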
  4. The word of the hour is “SMART”! Smart Business – targeted value proposition. Businesses are under pressure to maximize their investments (focused approach, not a one-fit-all methodology). Targeted value proposition: targeted advertisement, tailored menu, focused initiatives, individualized attention, non-impersonal messaging, efficient governance, greater accuracy.
  5. Targeted advertisement, tailored menu, focused initiatives, individualized attention, non-impersonal messaging, efficient governance, greater accuracy. Businesses want to gain competitive advantage by being able to take action based on timely, relevant, complete, and accurate information, rather than one-fit-for-all solutions. In the immense volume, variety, and velocity of data produced today is new information (facts, relationships, indicators, and pointers) that either could not practically be discovered in the past or simply did not exist before.
  7. The market has just started picking up. There is a large gap in vertical solutions. The biggest gap is in Big Data services. Hardware and software components seem to be available already.
  8. Adapting to real-time analysis (maybe use HANA)
     Development of industry standards
     Development of a universal schema for metadata and cataloging
     Tools to support security & data governance
     Support for cloud-ification (multi-tenancy)
     Support for data lineage
     Framework for cross-application integration
     Support for testing
     Automated & configurable monitoring and management console
     User interface (UI) frameworks
  9. Business Focus
     • Identify data needs for strategic business functions
     • Identify business issues that need to be solved by Big Data
     • Lay out data dependencies between functions
     • Resolve competing priorities
     • Clearly lay out the levels of data and cross-functional requirements
     Technology Focus
     • Identify the right technology to align with the current landscape for synergies in technology
     • Take stock of existing “technology assets” towards Big Data
     • Assess your current capabilities and architecture to support your goals, and select the deployment strategy that best fits your Big Data questions
     • Identify the resources and minimize “specialties” to exploit synergies with the existing resource pool
     • Lay out a development methodology to streamline delivery
     Stakeholder Focus
     • Clearly identify the stakeholders at all levels of data consumption
     • Present best practices and align them with the project
     • Plan out the objectives, scope, and timelines
     • Identify the KPIs, reports, dashboards, and predictive & prescriptive analysis to be delivered
     Process Focus
     • Establish clear data flows from collection of data to consumption of data
     • Identify the Data Governance execution process – People, Processes, Mechanisms
     • Design the process to be more business focused than IT focused
     • Clearly establish measures to achieve – accuracy, repeatability, agility, and accountability (reconcilability)