2. Agenda
Introduction to Big Data
The state of Big Data adoption
Big Data – A holistic approach
The 5 high value Big Data use cases
Technical details of key Big Data components
The future of Big Data and Cloud
Demos
Resources
4. What is Big Data?
Big data are datasets that grow so large
that they become awkward to work with
using on-hand database management tools.
Difficulties include capture, storage, search,
sharing, analytics, and visualizing.
Source: Wikipedia
5. Big Data Characteristics
Information is growing at a phenomenal rate: 44x as much data and content over the coming decade
2009: 800,000 petabytes
2020: 35 zettabytes (= 4 trillion 8GB iPods)
Source: IDC, The Digital Universe Decade – Are You Ready?, May 2010
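The figures on this slide can be checked with a few lines of arithmetic. The sketch below assumes decimal (SI) units, which is an assumption on our part and not something the slide states.

```python
# Assumption: decimal (SI) units, i.e. 1 GB = 10**9 bytes.
GB = 10**9
PB = 10**15
ZB = 10**21

data_2009 = 800_000 * PB   # 800,000 petabytes = 0.8 zettabytes
data_2020 = 35 * ZB        # 35 zettabytes

growth = data_2020 / data_2009   # how many times larger 2020 is than 2009
ipods = data_2020 / (8 * GB)     # number of 8GB iPods needed to hold it all

print(f"growth factor: {growth:.2f}x")
print(f"8GB iPods: {ipods:.3e}")
```

Under these assumptions the growth factor comes out just under 44 and the iPod count a little over 4 trillion, consistent with the rounded numbers on the slide.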
6. Big Data Characteristics
• About 80% of the world’s data is unstructured
• It may be data we’ve been collecting before, but could not process
7. Types of Big Data
• Data in movement - streams
  – Twitter / Facebook comments
  – Stock market data
  – Sensors: vital signs of a newborn
• Data at rest - oceans
  – Collection of what has streamed
  – Web logs, emails, social media
  – Unstructured documents: forms, claims
  – Structured data from disparate systems
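The difference between data in movement and data at rest can be illustrated with a small sketch. The sensor readings and the function name `running_mean` are made up for illustration; a real streaming platform would do far more than this.

```python
from statistics import mean
from typing import Iterable, Iterator


def running_mean(stream: Iterable[float]) -> Iterator[float]:
    """Data in movement: update the answer as each reading arrives."""
    count, total = 0, 0.0
    for reading in stream:
        count += 1
        total += reading
        yield total / count


# Hypothetical heart-rate readings from a sensor (made-up values).
readings = [120.0, 124.0, 118.0, 130.0]

# In movement: one result per incoming reading, no stored history required.
live = list(running_mean(iter(readings)))

# At rest: the same readings, already collected, analyzed in one batch.
batch = mean(readings)

print(live)
print(batch)
```

The streaming version produces an up-to-date answer after every reading, while the batch version needs the whole collection before it can answer; both agree once all readings have been seen.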
8. Traditional vs. big data business approaches
Traditional Approach: Structured & Repeatable Analysis
• Business Users: Determine what question to ask
• IT: Structures the data to answer that question
• Examples: Monthly sales reports, Profitability analysis, Customer surveys
Big Data Approach: Iterative & Exploratory Analysis
• IT: Delivers a platform to enable creative discovery
• Business: Explores what questions could be asked
• Examples: Brand sentiment, Product strategy, Maximum asset utilization
9. Applications for Big Data Analytics
• Homeland Security
• Finance
• Smarter Healthcare
• Multi-channel sales
• Telecom
• Manufacturing
• Traffic Control
• Trading Analytics
• Fraud and Risk
• Log Analysis
• Search Quality
• Retail: Churn, NBO
12. Use of Big Data globally and in the financial sector
Multiple responses accepted
13. Big Data: In-Demand, Well-Paying Skill
Skills are in demand; pays well
“If you can claim to be a data scientist and have the chops to back that up, you can pretty much write your own ticket even in this tough job market.”
Source: Gigaom http://gigaom.com/cloud/big-data-skills-bring-big-dough/
15. KTH Swedish Royal Institute of Technology: Reducing Traffic Congestion
• Deployed real-time Smarter Traffic system to
predict and improve traffic flow.
• Analyzes streaming real-time data gathered from
cameras at entry/exit to city, GPS data from taxis
and trucks, and weather information.
• Predicts best time and method to travel such as
when to leave to catch a flight at the airport
Results
• Enables ability to analyze and predict traffic
faster and more accurately than ever before
• Provides new insight into mechanisms that affect
a complex traffic system
• Smarter, more efficient, and more
environmentally friendly traffic
16. University of Southern California Innovation Lab Monitors Political Debates
Benefits
• Real-time display of public sentiment as candidates respond to questions
• Debate winner prediction based on public opinion instead of solely political analysts
17. Big Data – A holistic approach
Big Data is Not Only Hadoop!
Examples where Hadoop is not entirely applicable:
– Cyber security, Stock market, Traffic control, Sensor
information, monitoring trends in Social Media
– What if your company has many silos of information,
difficult to move to HDFS?
– What about governance? Can we trust the source of
this data?
19. Big data holistic approach: A platform
The IBM Big Data Platform
Layers: Solutions, Big Data Platform, Analytics and Decision Management, Big Data Infrastructure
Components: Data Warehouse
• Data Warehouse: Delivers deep insight with advanced in-database analytics & operational analytics
20. Big data holistic approach: A platform
Layers: Solutions, Big Data Platform, Analytics and Decision Management, Big Data Infrastructure
Components: Stream Computing, Data Warehouse
• Stream Computing: Analyze streaming data and large data bursts for real-time insights
21. Big data holistic approach: A platform
The IBM Big Data Platform
Layers: Solutions, Big Data Platform, Analytics and Decision Management, Big Data Infrastructure
Components: Hadoop System, Stream Computing, Data Warehouse
• Hadoop System: Cost-effectively analyze petabytes of unstructured and structured data
22. Big data holistic approach: A platform
Layers: Solutions, Big Data Platform, Analytics and Decision Management, Big Data Infrastructure
Components: Information Integration & Governance, Hadoop System, Stream Computing, Data Warehouse
• Information Integration & Governance: Govern data quality and manage the information lifecycle
23. Big data holistic approach: A platform
Layers: Solutions, Big Data Platform, Analytics and Decision Management, Big Data Infrastructure
Components: Accelerators, Information Integration & Governance, Hadoop System, Stream Computing, Data Warehouse
• Accelerators: Speed time to value with analytic and application accelerators
24. Big data holistic approach: A platform
The IBM Big Data Platform
Layers: Solutions, Big Data Platform, Analytics and Decision Management, Big Data Infrastructure
Components: Accelerators, Information Integration & Governance, Hadoop System, Stream Computing, Data Warehouse, Systems Management, Application Development, Visualization & Discovery
• Visualization & Discovery: Discover, understand, search, and navigate federated sources of big data
25. Big data holistic approach: A platform
• Process any type of data
  – Structured, unstructured, in-motion, at-rest, in-place
• Built-for-purpose engines
  – Designed to handle different requirements
• Manage and govern data in the ecosystem
• Enterprise data integration
• Grow and evolve on current infrastructure
• The whole is greater than the sum of parts
• Integrated components
• Out of the box, standards-based services
• Start small (value is additive)
Layers: Solutions, Big Data Platform, Analytics and Decision Management, Big Data Infrastructure
Components: Accelerators, Information Integration & Governance, Hadoop System, Stream Computing, Data Warehouse, Systems Management, Application Development, Visualization & Discovery
26. An example of the big data platform in practice
• Metadata and Governance Zone: ETL, MDM, Data Governance
• Warehousing Zone: Enterprise Warehouse, Data Marts
• Ingestion and Real-time Analytic Zone: Streams, Connectors
• Analytics and Reporting Zone: BI & Reporting, Predictive Analytics, Visualization & Discovery
• Landing and Analytics Sandbox Zone: Hive/HBase Col Stores, Documents in variety of formats, MapReduce, Hadoop
28. The 5 High Value Big Data Use Cases
• Big Data Exploration: Find, visualize, understand all big data to improve business knowledge
• Enhanced 360º View of the Customer: Achieve a true unified view, incorporating internal and external sources
• Security/Intelligence Extension: Lower risk, detect fraud and monitor cyber security in real-time
• Data Warehouse Augmentation: Integrate big data and data warehouse capabilities to increase operational efficiency
• Operations Analysis: Analyze a variety of machine data for improved business results
29. Big Data Exploration: Illustrated
Find, visualize and understand all big data to improve business knowledge
• Greater efficiencies in business processes
• New insights from combining and analyzing data types in new ways
• Develop new business models with resulting increased market presence and revenue
Data sources: CM, RM, DM; RDBMS; Feeds; Web 2.0; Email; Web; CRM, ERP; File Systems
Diagram labels: UI / User, App Builder, Connector Framework, Data Explorer, Hadoop, Streams, Warehouse, Integration & Governance
30. Big Data Exploration: Example in Practice
Airplane Manufacturer (blinded for confidentiality)
• Exploring 4 TB to drive point business solutions (supplier portal, call center, etc.)
• Single point of data fusion for all employees to use
• Reduced costs & improved operational performance for the business
Is Big Data Exploration Right for You?
• How do you enable employees to navigate and explore enterprise and external content? Can you present this in a single user interface?
• How do you identify areas of data risk before they become a problem?
• What is the starting point for your big data initiatives?
• How do you separate the “noise” from useful content?
• How do you perform data exploration on large and complex data?
• How do you find insights in new or unstructured data types (e.g. social media and email)?
Big Data Platform Component Starting Point: Data Explorer
31. Enhanced 360º View of the Customer: Illustrated
SOURCE SYSTEMS
System   Name              Address            Address
CRM      J Robertson       35 West 15th       Pittsburgh, PA 15213
ERP      Janet Robertson   35 West 15th St.   Pittsburgh, PA 15213
Legacy   Jan Robertson     36 West 15th St.   Pittsburgh, PA 15213
Master Data Management: 360 View of Party Identity
First: Janet
Last: Robertson
Address: 35 West 15th St
City: Pittsburgh
State/Zip: PA / 15213
Gender: F
Age: 48
DOB: 1/4/64
Unified View of Party’s Information: Hadoop, Streams, Warehouse
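The idea behind the slide, matching slightly different records from several source systems and merging them into one view, can be sketched as follows. This is a minimal illustration and not the algorithm of any master data management product: the matching rules (same last name and ZIP, compatible first names, street similarity of at least 0.9), the field names, and the helper functions are all assumptions made for this example.

```python
from collections import Counter
from difflib import SequenceMatcher

# The three source-system records shown on the slide.
RECORDS = [
    {"source": "CRM", "name": "J Robertson", "street": "35 West 15th",
     "city_state_zip": "Pittsburgh, PA 15213"},
    {"source": "ERP", "name": "Janet Robertson", "street": "35 West 15th St.",
     "city_state_zip": "Pittsburgh, PA 15213"},
    {"source": "Legacy", "name": "Jan Robertson", "street": "36 West 15th St.",
     "city_state_zip": "Pittsburgh, PA 15213"},
]


def normalize_street(street: str) -> str:
    """Lowercase, drop periods and a trailing 'st' so variants compare equal."""
    s = street.lower().replace(".", "").strip()
    return s[:-3] if s.endswith(" st") else s


def same_party(a: dict, b: dict, threshold: float = 0.9) -> bool:
    """Hypothetical matching rule: names, ZIP and street must all agree."""
    first_a, last_a = a["name"].lower().split()
    first_b, last_b = b["name"].lower().split()
    names_agree = last_a == last_b and (
        first_a.startswith(first_b) or first_b.startswith(first_a)
    )
    zips_agree = a["city_state_zip"].split()[-1] == b["city_state_zip"].split()[-1]
    street_score = SequenceMatcher(
        None, normalize_street(a["street"]), normalize_street(b["street"])
    ).ratio()
    return names_agree and zips_agree and street_score >= threshold


def merge(records: list) -> dict:
    """Build one unified record: longest first name, majority street."""
    first = max((r["name"].split()[0] for r in records), key=len)
    last = records[0]["name"].split()[-1]
    winner = Counter(normalize_street(r["street"]) for r in records).most_common(1)[0][0]
    street = max(
        (r["street"] for r in records if normalize_street(r["street"]) == winner),
        key=len,
    )
    return {"first": first, "last": last, "street": street,
            "city_state_zip": records[0]["city_state_zip"]}


# Use the ERP record as the reference and collect everything that matches it.
matched = [r for r in RECORDS if same_party(RECORDS[1], r)]
unified = merge(matched)
print(unified)
```

The majority vote on the normalized street is what lets two agreeing sources outvote the legacy record's differing house number. Attributes such as gender, age, and date of birth on the slide would have to come from additional sources that are not modeled here.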
32. Security/Intelligence Extension: Illustrated
Traditional Security Operations and Technology
• Logs
• Events
• Alerts
• Configuration information
• System audit trails
• External threat intelligence feeds
• Network flows and anomalies
• Identity context
Big Data Analytics
• Web page text
• Video/audio surveillance
• E-mail and social activity
• Business process data
• Customer transactions
New Considerations
• Collection, Storage and Processing
  – Collection and integration
  – Size and speed
  – Enrichment and correlation
• Analytics and Workflow
  – Visualization
  – Unstructured analysis
  – Learning and prediction
  – Customization
  – Sharing and export
33. “Reconstructing Events” – Integrating Multimedia from Diverse Sources
• Correlate multimedia content across a wide diversity of sources and dynamic topology of cameras
• Exploit partial overlaps in field of view, re-identification of objects/people and contextual information
• Obtain real-time operational picture across diverse content
Sources:
• 100K security cameras (static cameras, slowly changing topology)
• 10M mobile photos/day (limited knowledge about locations)
• 50M social media photos/video (uncertain geo-temporal context)
• Moving vehicles (patrol cars), overhead drones, broadcast, retail, 311, etc.
Diagram labels: Overhead, Social Media, Mobile Cameras, Security Cameras
34. Security/Intelligence Extension: Customer Example
Captured and analyzed 42TB of daily traffic in real-time for tracking persons of interest to take suitable action and reduce risk.
Would the Security / Intelligence Extension benefit you?
• What are your plans to enrich your security or intel system with unused or underleveraged data sources (video, audio, smart devices, network, Telco, social media)?
• How will you address the need for sub-second detection, identification, and resolution of physical or cyber threats?
• How do you intend to follow activities of criminals, terrorists, or persons in a blacklist?
• How do you plan to enhance your surveillance system with real-time data from video, acoustic, thermal or other security sensors?
• Do you want to correlate lots of technical or human intel data and sources looking for associations or patterns (big data forensics)?
• How are you going to deal with unstructured data (email, social, etc.) in your Security Information & Event Management (SIEM) solution to improve cyber threat detection & remediation?
Big Data Platform Component Starting Point: Streams, Hadoop
37. Operations Analysis: Customer Example
• Intelligent Infrastructure Management: log analytics, energy bill forecasting, energy consumption optimization, anomalous energy usage detection, presence-aware energy management
• Optimized building energy consumption with centralized monitoring; automated preventive and corrective maintenance
• Utilized InfoSphere Streams, InfoSphere BigInsights, IBM Cognos
Would Operations Analysis benefit you?
• Do you deal with large volumes of machine data?
• How do you access and search that data?
• How do you perform root cause analysis?
• How do you perform complex real-time analysis to correlate across different data sets?
• How do you monitor and visualize streaming data in real time and generate alerts?
Big Data Platform Component Starting Point: Hadoop, Streams
38. Data Warehouse Augmentation: Needs
Integrate big data and data warehouse capabilities to increase operational efficiency
Need to leverage variety of data
• Structured, unstructured, and streaming data sources required for deep analysis
• Low latency requirements (hours, not weeks or months)
• Required query access to data
Extend warehouse infrastructure
• Optimized storage, maintenance and licensing costs by migrating rarely used data to Hadoop
• Reduced storage costs through smart processing of streaming data
• Improved warehouse performance by determining what data to feed into it
39. Filter and summarize big data for the warehouse
Hadoop
Data Warehouse Augmentation: Illustrated
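A minimal sketch of the "filter and summarize" pattern, in plain Python and not tied to any Hadoop API: raw events are filtered, aggregated per key, and only the compact summary rows would be loaded into the warehouse. The event fields, values, and the `summarize` function are hypothetical.

```python
from collections import defaultdict

# Hypothetical raw web-log events (made-up values) as they might land in Hadoop.
raw_events = [
    {"day": "2013-05-01", "page": "/home", "status": 200, "bytes": 512},
    {"day": "2013-05-01", "page": "/home", "status": 500, "bytes": 0},
    {"day": "2013-05-01", "page": "/cart", "status": 200, "bytes": 2048},
    {"day": "2013-05-02", "page": "/home", "status": 200, "bytes": 1024},
]


def summarize(events):
    """Keep successful requests only, then aggregate per (day, page)."""
    totals = defaultdict(lambda: {"hits": 0, "bytes": 0})
    for event in events:
        if event["status"] != 200:   # filter step: drop failed requests
            continue
        key = (event["day"], event["page"])
        totals[key]["hits"] += 1     # summarize step: count and sum
        totals[key]["bytes"] += event["bytes"]
    return [
        {"day": day, "page": page, **agg}
        for (day, page), agg in sorted(totals.items())
    ]


# Only these compact rows would be fed into the warehouse.
warehouse_rows = summarize(raw_events)
for row in warehouse_rows:
    print(row)
```

The warehouse then stores one row per day and page instead of one row per request, which is the kind of reduction the slide refers to.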
40. Hadoop as a query-ready archive for a data warehouse
Hadoop
Data Warehouse Augmentation: Illustrated
42. Open Source Hadoop
Layers shown: Visualization & Discovery, Connectors, Workload Optimization, Runtime, Advanced Engines, File System, Data Store, Development Tools, Systems Management
Open source components shown: Flume, MapReduce, HDFS, HBase, Eclipse Plug-ins, Jaql, Pig, ZooKeeper, Lucene, Oozie, Hive, Mahout, Whirr, Sqoop, Hue, HCatalog, R
43. Added Value on Top of Open Source Hadoop
IBM InfoSphere BigInsights v2.1 Enterprise Edition
Legend: IBM, Open Source
Layers shown: Visualization & Discovery, Integration, Workload Optimization, Runtime, Advanced Analytic Engines, File System, Data Store, Applications & Development, Administration
Components shown: Streams, Netezza, Flume, DB2, DataStage, MapReduce, HDFS, HBase, Text Processing Engine & Extractor Library, BigSheets, JDBC, Text Analytics, Index, Splittable Text Compression, Enhanced Security, Flexible Scheduler, Jaql, Pig, ZooKeeper, Lucene, Oozie, Adaptive MapReduce, Hive, Integrated Installer, Admin Console, Sqoop, Adaptive Algorithms, Dashboard & Visualization, Apps, Workflow, Monitoring, Management, Security, Audit & History, Lineage, R, Guardium, Platform Computing, Cognos, GPFS, High Availability, Big SQL, HCatalog, Whirr, Mahout, Hue
44. InfoSphere BigInsights Added Value
InfoSphere BigInsights
Diagram labels (stack): Visualization & Exploration, Development Tools, Accelerators, Connectors, Workload Optimization (MapReduce/SQL), Administration & Security, IBM tested & supported open source components
Diagram labels (callouts): Open source based components, Workload Management, Security, Development Environment, Analytics/Extractors, Analytics, Extraction engine (System T), Extractors and APIs, SQL API
45. InfoSphere BigInsights Added Value: Accelerators
Social Data Analytics Accelerator Architecture
Input: Social Media Data
Online flow: Data-in-motion analysis (Stream Computing and Analytics)
• Data Ingest and Prep
• Extract Buzz, Intent, Sentiment
• Entity Analytics: Profile Resolution
• Real time analytics. Pre-defined views and charts
• Dashboard
Offline flow: Data-at-rest analysis (BigInsights System and Analytics)
• Extract Buzz, Intent, Sentiment and Consumer Profiles
• Entity Analytics and Integration
• Comprehensive Social Media Customer Profiles
• Pre-defined Workbooks and Dashboards
Optional: Indexed Search
• Index using Push API
• Data Explorer
• Ad hoc access
46. InfoSphere BigInsights Added Value: BigSheets
BigSheets Visualization and
Exploration
• Web-based analysis and visualization
for Users
• Familiar spreadsheet-like interface
• Define and manage long running data
collection jobs
47. InfoSphere BigInsights Added Value: BigSheets
No programming knowledge needed!
How it works
• Model “big data” collected from various sources as collections
• Filter and enrich content with built-in functions
• Combine data in different collections
• Visualize results through spreadsheets, charts
• Export data into common formats (if desired)
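BigSheets itself requires no programming, but for readers who think in code, the same five steps can be mirrored in a short plain-Python sketch. Nothing here uses a BigSheets API: the collections, the toy `sentiment` function, and the field names are all invented for illustration.

```python
import csv
import io
from collections import Counter

# Step 1: model data from different sources as collections (lists of rows).
posts = [
    {"user": "ann", "text": "love the new phone"},
    {"user": "bob", "text": "battery died again"},
    {"user": "cy", "text": "ok I guess"},
]
customers = [
    {"user": "ann", "region": "EU"},
    {"user": "bob", "region": "US"},
]


def sentiment(text: str) -> str:
    """Toy stand-in for a built-in enrichment function."""
    if "love" in text:
        return "positive"
    if "died" in text:
        return "negative"
    return "neutral"


# Step 2: enrich each row, then filter out rows without a clear opinion.
enriched = [dict(p, sentiment=sentiment(p["text"])) for p in posts]
opinionated = [p for p in enriched if p["sentiment"] != "neutral"]

# Step 3: combine the two collections on the shared "user" column.
region_by_user = {c["user"]: c["region"] for c in customers}
combined = [dict(p, region=region_by_user.get(p["user"], "unknown"))
            for p in opinionated]

# Step 4: visualize, here as a very simple text bar chart.
counts = Counter(row["sentiment"] for row in combined)
for label, n in sorted(counts.items()):
    print(f"{label:<8} {'#' * n}")

# Step 5: export to a common format (CSV).
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=["user", "region", "sentiment", "text"])
writer.writeheader()
writer.writerows(combined)
exported = buffer.getvalue()
print(exported)
```

Each step corresponds to one bullet of the slide, with a spreadsheet-style tool replacing the hand-written list and dictionary operations.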
48. InfoSphere BigInsights Added Value: Dev Tools
Development Environment
• Eclipse based dev environment
• Developer tools and a set of analytic
extractors for fast adoption and reduction
in coding and debugging time
• Plug-ins for Text Analytics, MapReduce programming, Jaql development, Hive query development, and more
57. BigInsights – Added Value: Security
Security
• LDAP authentication
• Support for PAM & Flat File configuration
• Administrators restrict access to authorized users
• HTTPS support for the InfoSphere BigInsights console, and reverse proxy
• Role-based access
Diagram labels (InfoSphere BigInsights): Administration & Security, Workload Optimization, Connectors, Advanced Engines, Visualization & Exploration, Development Tools, Open source Hadoop components
58. How Streams Works
Continuous ingestion, continuous analysis
Achieve scale:
• By partitioning applications into software components
• By distributing across stream-connected hardware hosts
Infrastructure provides services for:
• Scheduling analytics across hardware hosts
• Establishing streaming connectivity
Components shown: Transform, Filter / Sample, Classify, Correlate, Annotate
Where appropriate: Elements can be fused together for lower communication latency
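The component model described here can be imitated in a few lines of Python generators. This is only a conceptual sketch, not InfoSphere Streams code: the operator functions, the `fuse` helper, and the taxi readings are hypothetical, and "fusion" is approximated by composing two per-tuple functions so they run as a single step with no hand-off between them.

```python
from typing import Callable, Iterable, Iterator


def source(readings: Iterable[dict]) -> Iterator[dict]:
    """Stand-in for continuous ingestion: emit tuples one at a time."""
    yield from readings


def filter_op(stream: Iterable[dict], predicate: Callable[[dict], bool]) -> Iterator[dict]:
    """Filter component: pass through only tuples that satisfy the predicate."""
    return (t for t in stream if predicate(t))


def transform_op(stream: Iterable[dict], fn: Callable[[dict], dict]) -> Iterator[dict]:
    """Transform component: apply a per-tuple function."""
    return (fn(t) for t in stream)


def fuse(*fns: Callable[[dict], dict]) -> Callable[[dict], dict]:
    """Combine several per-tuple functions into one, avoiding hand-offs."""
    def fused(t: dict) -> dict:
        for fn in fns:
            t = fn(t)
        return t
    return fused


def classify(t: dict) -> dict:
    return {**t, "traffic": "congested" if t["kph"] < 20 else "flowing"}


def annotate(t: dict) -> dict:
    return {**t, "source": "gps"}


# Hypothetical GPS readings from taxis (made-up values); -1 marks a bad reading.
readings = [
    {"taxi": "A", "kph": 12},
    {"taxi": "B", "kph": -1},
    {"taxi": "C", "kph": 64},
]

valid = filter_op(source(readings), lambda t: t["kph"] >= 0)
results = list(transform_op(valid, fuse(classify, annotate)))
for r in results:
    print(r)
```

Each generator handles one tuple at a time, so analysis proceeds as data arrives. In a distributed setting the unfused components could be placed on different hosts, while fused ones share a process and avoid network hops.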
60. The Future of Big Data and Cloud
• SQL for Hadoop support improvements – towards full ANSI support
  – Hive
  – Impala (Cloudera)
  – Big SQL (IBM)
  – Stinger (Hortonworks)
  – Drill (MapR)
  – HAWQ (Pivotal)
  – SQL-H (Teradata)
• Improvements in Multimedia Analytics
• Growth in usage and adoption of the R programming language
• Cloud
  – Bare metal support helping with Hadoop workloads
  – Private network
  – Full support with APIs
61. Big SQL overview
Big SQL fully integrates with SQL applications and BI tooling with benefits including:
• Existing queries run with no or few modifications
• Existing JDBC and ODBC compliant tools can be leveraged
• Applications do not have to compensate for constraints of Hive QL, which may otherwise result in:
  – more statements
  – potentially moving more data over the network to the application
Diagram labels: Application, SQL Language, JDBC / ODBC Driver, JDBC / ODBC Server, BigSQL Engine, BigInsights, Data Sources (Hive Tables, HBase Tables, CSV Files)
Try it out!
Big SQL 3.0 Technology Preview: bigsql.imdemocloud.com
63. BigInsights on the Cloud - Making Learning Hadoop Easy and Fun
M2M Demos (using Streams)
• The Connected Car Demo
  – http://ausgsa.ibm.com/projects/c/connected_car/index.html
  – http://m2m.demos.ibm.com/
YouTube IBM Big Data Channel
  – http://www.youtube.com/user/ibmbigdata
Big Data University (bigdatauniversity.com)
65. Big Data University (bigdatauniversity.com)
• Flexible on-line delivery allows learning @your place and @your pace
• Free courses, free study materials
• Cloud-based sandbox for exercises – zero setup with Robust Course Management System and Content Distribution infrastructure
• 169,000 registered students
• Free IBM Hadoop, BigInsights Publications
66. BigInsights on the Cloud - Making Learning Hadoop Easy and Fun
Quick Start Editions available (Free, non-production, no time bomb):
– IBM InfoSphere BigInsights (IBM’s Hadoop Distribution): ibm.co/QuickStart
– IBM InfoSphere Streams: ibm.co/streamsqs
Big Data University (bigdatauniversity.com)
67. My contact information
Contact Info:
Twitter: @raulchong
Facebook: facebook.com/raul.f.chong
LinkedIn: linkedin.com/pub/raul-f-chong/8/aa2/b63