IBM Big Data & Analytics
Leveraging Big Data Requires Multiple Platform Capabilities
Manage Streaming Data: Stream Computing
Understand and Navigate Federated Big Data Sources: Federated Discovery and Navigation
Structure and Control Data: Data Warehousing
Manage and Store Huge Volume of any Data: Hadoop File System, MapReduce
Analyze Unstructured Data: Text Analytics Engine
Integrate and Govern all Data Sources: Integration, Data Quality, Security, ILM, MDM
• Financial and tax preparation software and services
• $4.15B revenue in 2012
A Big Data Journey:
Anticipating and Improving Customer Interactions
Project 1: Big Data Foundation
-Data Warehousing, Data Quality, Customer Data Hub
-Single view of the customer
Project 2: Analytics
-Customer behavior and segmentation analysis
-Reduced customer churn 10%
-$10M new revenue in 12 months
Project 3: Unstructured Data Analytics
-Social media analysis, Log Analysis, Text Analytics
-Augment customer profiles with new data sources
-Data warehouse cost optimization
-Data Exploration
Project 4: Real Time Analytics
-No latency analytics
-Real time behavior prediction
-Real time customer segmentation
IBM Big Data Platform
Cloud | Mobile | Security
Solutions
Analytics and Decision Management
Big Data Platform and Application Frameworks
Visualization & Discovery: Gather, extract and explore data using best-of-breed visualization
Applications & Development
Systems Management
Accelerators: Speed time to value with analytic and application accelerators
Hadoop System: Cost-effectively analyze petabytes of structured and unstructured information
Stream Computing: Analyze streaming data and large data bursts for real-time insights
Data Warehouse: Deliver deep insight with advanced in-database analytics and operational analytics
Contextual Discovery: Index and federated discovery for contextual collaborative insights
Information Integration & Governance: Govern data quality and manage information lifecycle
Big Data Infrastructure
An example of the big data platform in practice
Ingestion and Real-time Analytic Zone: Streams
Connectors
Landing and Analytics Sandbox Zone: Hadoop, MapReduce, Hive/HBase, Col Stores, Documents in a variety of formats
Warehousing Zone: Enterprise Warehouse, Data Marts
Analytics and Reporting Zone: BI & Reporting, Predictive Analytics, Visualization & Discovery
Metadata and Governance Zone: ETL, MDM, Data Governance
TECHNOLOGY
Example: Integrate big data sources with enterprise data
IBM Business Analytics: SPSS Modeler, Cognos RTM, Cognos Insight, Cognos BI, Cognos Consumer Insight
Real-time Analytics
Predictive
Export and Explore
Social Media Analysis
Reporting / Analysis
Dashboards
IBM Big Data Platform: InfoSphere BigInsights, PureData Systems
Data In-Motion
Data At-Rest
Other Sources
BigInsights Enterprise Edition
Legend: Open Source | IBM | Optional IBM and partner offerings
Analytics and discovery “Apps”: BigSheets, Web Crawler, Distrib file copy, DB export, Boardreader, DB import, Ad hoc query, Machine learning, Data processing, . . .
Accelerator for machine data analysis
Accelerator for social data analysis
Text processing engine and library
Infrastructure: Jaql, Hive, Pig, HBase, HCatalog, MapReduce, Adaptive MapReduce, HDFS, GPFS (EAP), ZooKeeper, Oozie, Indexing (Lucene), Text compression, Enhanced security, Flexible scheduler
Connectivity and Integration: Streams, Netezza, DB2, JDBC, Flume, Sqoop, Guardium, DataStage, Data Explorer, Cognos BI, R
Integrated installer
Administrative and development tools
Web console
• Monitor cluster health, jobs, etc.
• Add / remove nodes
• Start / stop services
• Inspect job status
• Inspect workflow status
• Deploy applications
• Launch apps / jobs
• Work with distrib file system
• Work with spreadsheet interface
• Support REST-based API
• . . .
Eclipse tools
• Text analytics
• MapReduce programming
• Jaql, Hive, Pig development
• BigSheets plug-in development
• Oozie workflow generation
Stream Computing Represents a Paradigm Shift
Traditional Computing
Historical fact finding
Find and analyze information stored on disk
Batch paradigm, pull model
Query-driven: submits queries to static data
Stream Computing
Current fact finding
Analyze data in motion, before it is stored
Low latency paradigm, push model
Data-driven: bring data to the analytics
Real-time Analytics
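The pull versus push contrast on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not InfoSphere Streams code: the event shape, the `batch_pull` function, and the `PushPipeline` class are all hypothetical names invented for the example.

```python
from typing import Callable, List

Reading = dict  # hypothetical event shape: {"sensor": str, "value": float}


def batch_pull(stored: List[Reading], threshold: float) -> List[Reading]:
    """Traditional model: data is already stored; a query scans it on demand."""
    return [r for r in stored if r["value"] > threshold]


class PushPipeline:
    """Streaming model: each event is analyzed as it arrives, before storage."""

    def __init__(self, threshold: float, on_alert: Callable[[Reading], None]):
        self.threshold = threshold
        self.on_alert = on_alert

    def push(self, reading: Reading) -> None:
        # The analytic runs once per event; the pipeline itself keeps no history.
        if reading["value"] > self.threshold:
            self.on_alert(reading)


readings = [
    {"sensor": "a", "value": 1.0},
    {"sensor": "b", "value": 9.5},
    {"sensor": "a", "value": 7.2},
]

# Pull: store everything first, then submit a query to the static data.
batch_alerts = batch_pull(readings, threshold=5.0)

# Push: alerts fire per event as the data flows through.
stream_alerts: List[Reading] = []
pipeline = PushPipeline(threshold=5.0, on_alert=stream_alerts.append)
for reading in readings:
    pipeline.push(reading)
```

Both paths flag the same readings here; the difference is when the work happens. The pull version needs the full data set on hand before it can answer, while the push version can raise each alert the moment its event arrives.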
Big Data in real-time with InfoSphere Streams
Modify
Filter / Sample
Classify
Fuse
Annotate
Score
Windowed Aggregates
Analyze
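To make a few of these operator types concrete, here is a minimal Python sketch that chains a filter, an annotate step, and a windowed aggregate using generators. It is not written in the Streams programming language and does not use any Streams API; the function names, the event shape, and the choice of a count-based tumbling window are assumptions made for illustration.

```python
from typing import Dict, Iterable, Iterator, List


def filter_op(events: Iterable[Dict], min_value: float) -> Iterator[Dict]:
    """Filter: drop events whose value is below a threshold."""
    for e in events:
        if e["value"] >= min_value:
            yield e


def annotate_op(events: Iterable[Dict], units: str) -> Iterator[Dict]:
    """Annotate: attach extra metadata to each event."""
    for e in events:
        yield {**e, "units": units}


def tumbling_window_avg(events: Iterable[Dict], size: int) -> Iterator[float]:
    """Windowed aggregate: emit the mean of every `size` consecutive events."""
    window: List[float] = []
    for e in events:
        window.append(e["value"])
        if len(window) == size:
            yield sum(window) / size
            window.clear()  # tumbling: windows do not overlap


source = [{"value": v} for v in [1.0, 4.0, 6.0, 0.5, 8.0, 2.0]]
pipeline = tumbling_window_avg(annotate_op(filter_op(source, 1.0), "kW"), size=2)
averages = list(pipeline)  # [2.5, 7.0]; the trailing partial window is not emitted
```

Because each stage is a generator, events flow through one at a time and only the current window is held in memory, which is the property that lets this style of processing run on unbounded input.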
Analytic Accelerators Designed for Velocity (and Variety)
Mining in Microseconds (included with Streams)
Image & Video (Open Source)
Simple & Advanced Text (included with Streams)
Text (IBM Research) (Open Source UIMA), e.g. (listen, verb), (radio, noun)
Acoustic (IBM Research) (Open Source)
Geospatial (IBM Research)
Predictive (IBM Research)
Advanced Mathematical Models (IBM Research)
Statistics (included with Streams)
Putting it all together … end-to-end big data solution
Netezza Appliance
InfoSphere BigInsights
IBM Cognos
IBM SPSS
Streaming Data Sources
Discover
Model
Visualize & Publish
Score
Measure
InfoSphere Streams
InfoSphere Warehouse
Cognos Business Intelligence optimized for Big SQL
• Big SQL lets the Cognos BI server delegate many types of analytical computations to BigInsights MapReduce processing, rather than computing them locally at a performance cost as it would with Hive
• Faster response times due to increased opportunity for query processing to occur closer to the data
• Not hindered by the latency and other limitations of querying Hadoop via Hive
Cognos BI Server: Explore & Analyze, Report & Act
SQL Interface via JDBC
InfoSphere BigInsights: Hive, Application (Map-Reduce), Storage (HBase, HDFS)
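The benefit of moving query processing closer to the data can be illustrated with a small Python sketch. It uses the standard library's in-memory SQLite database purely as a stand-in SQL engine; it does not connect to Big SQL, Cognos, or Hive, and the `sales` table and its contents are made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # stand-in engine, not Big SQL
conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [("east", 10.0), ("west", 5.0), ("east", 2.5), ("west", 1.5)],
)

# Local computation: pull every row to the client, then aggregate there.
rows = conn.execute("SELECT region, amount FROM sales").fetchall()
local_totals = {}
for region, amount in rows:
    local_totals[region] = local_totals.get(region, 0.0) + amount

# Delegated computation: the engine aggregates and returns one row per region.
pushed = conn.execute(
    "SELECT region, SUM(amount) FROM sales GROUP BY region"
).fetchall()
pushed_totals = dict(pushed)

# Rows that crossed the client boundary in each approach.
rows_transferred = (len(rows), len(pushed))
```

Both approaches give the same totals, but the delegated query returns one row per group instead of every detail row. On a data set of realistic size, that reduction in transferred data is where the faster response times described on the slide would come from.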
Performance: Cognos BI + DB2 BLU
Cognos BI + DB2 BLU + Power
38x Average Acceleration of database queries for reporting (note 2)
Faster cube load*
Faster DB Query*
Cognos BI: Dynamic Query, Compatible Query, Dynamic Cubes
DB2 with BLU (columns C1 to C8)
2. Based on internal tests.
Meeting Big Data Challenges: Fast and Easy!
IBM PureData Systems
System for Transactions
For apps like E-commerce…
Database cluster services optimized for transactional throughput and scalability
System for Analytics
For apps like Customer Analysis…
Data warehouse services optimized for high-speed, peta-scale analytics and simplicity
System for Operational Analytics
For apps like Real-time Fraud Detection…
Operational data warehouse services optimized to balance high performance analytics and real-time operational throughput
System for Hadoop
For Exploratory Analysis & Queryable Archive
Hadoop data services optimized for big data analytics and online archive with appliance simplicity
Every Industry can Leverage Big Data and Analytics.
Insurance
• 360° View of Domain or Subject
• Catastrophe Modeling
• Fraud & Abuse
Banking
• Optimizing Offers and Cross-sell
• Customer Service and Call Center Efficiency
Telco
• Pro-active Call Center
• Network Analytics
• Location Based Services
Energy & Utilities
• Smart Meter Analytics
• Distribution Load Forecasting/Scheduling
• Condition Based Maintenance
Media & Entertainment
• Business process transformation
• Audience & Marketing Optimization
Retail
• Actionable Customer Insight
• Merchandise Optimization
• Dynamic Pricing
Travel & Transport
• Customer Analytics & Loyalty Marketing
• Predictive Maintenance Analytics
Consumer Products
• Shelf Availability
• Promotional Spend Optimization
• Merchandising Compliance
Government
• Civilian Services
• Defense & Intelligence
• Tax & Treasury Services
Healthcare
• Measure & Act on Population Health Outcomes
• Engage Consumers in their Healthcare
Automotive
• Advanced Condition Monitoring
• Data Warehouse Optimization
Life Sciences
• Increase visibility into drug safety and effectiveness
Chemical & Petroleum
• Operational Surveillance, Analysis & Optimization
• Data Warehouse Consolidation, Integration & Augmentation
Aerospace & Defense
• Uniform Information Access Platform
• Data Warehouse Optimization
Electronics
• Customer/Channel Analytics
• Advanced Condition Monitoring
A Catalyst for ISV and Partner Innovation
Traditional Approach | Transformational Outcomes
Customer segmentation based on loyalty data | Capturing information from all interactions to improve customer lifetime value
Historical analysis of subscriber data | 2 million events analyzed per minute, delivering real-time insight to mobile operators
Managing rising cost of care | Combining data from hundreds of hospitals to improve results across the healthcare continuum
Anti-corruption and bribery compliance program | Use Big Data analytics to prioritize and isolate areas of risk or rogue activity
Manual supply chain integration | Provide visibility, analysis and reporting across the entire supply chain (planning -> execution)
Treat-first, seek-payment-later and write off bad debt | Measure and predict patient payment behavior, reduce risk from bad debt and boost collection rates
Random parking meter patrols & search for open spots | Analyzing parking systems to maximize revenue & improve the parking experience in cities
Get started!
Think Big | Pick your Spot | Execute and Deliver Value
Identify and prioritize business use cases
New insights and new possibilities
New revenue opportunities
Process and performance improvement
Evolve your existing analytics capabilities
Build or acquire new skills required
Measure and communicate success
Ensure that the business is engaged
Agree on the key measures for success