Raul F. Chong
Senior Big Data and Cloud Program Manager
Big Data University Community Leader
@raulchong
A holistic approach to Big Data
© 2013 BigDataUniversity.com
Agenda
• Introduction to Big Data
• The state of Big Data adoption
• Big Data – A holistic approach
• The 5 high value Big Data use cases
• Technical details of key Big Data components
• The future of Big Data and Cloud
• Demos
• Resources
What is Big Data?
Big data are datasets that grow so large
that they become awkward to work with
using on-hand database management tools.
Difficulties include capture, storage, search,
sharing, analytics, and visualizing.
Source: Wikipedia
Big Data Characteristics
Information is growing at a phenomenal rate
44x as much data and content over the coming decade
• 2009: 800,000 petabytes
• 2020: 35 zettabytes (= 4 trillion 8GB iPods)
Source: IDC, The Digital Universe Decade – Are You Ready?, May 2010
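The growth figures on this slide can be checked with a few lines of arithmetic. This is a minimal sketch that assumes decimal (SI) units, i.e. 1 petabyte = 10^15 bytes, 1 zettabyte = 10^21 bytes and 1 GB = 10^9 bytes; the variable names are illustrative only.

```python
# Unit sizes in bytes (assumption: decimal SI units, not binary units).
GIGABYTE = 10**9
PETABYTE = 10**15
ZETTABYTE = 10**21

data_2009 = 800_000 * PETABYTE   # 800,000 petabytes
data_2020 = 35 * ZETTABYTE       # 35 zettabytes

# Growth factor between 2009 and 2020.
growth = data_2020 / data_2009
print(growth)                    # 43.75, which the slide rounds to 44x

# Number of 8 GB devices needed to hold the 2020 volume.
ipods = data_2020 / (8 * GIGABYTE)
print(ipods)                     # 4.375e12, roughly 4 trillion
```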
Big Data Characteristics
• About 80% of the world’s data is unstructured
• It may be data we’ve been collecting before, but could not
process
Types of Big Data
• Data in movement - streams
• Twitter / Facebook comments
• Stock market data
• Sensors: Vital signs of a newborn
• Data at rest - oceans
• Collection of what has streamed
• Web logs, emails, social media
• Unstructured documents: forms, claims
• Structured data from disparate systems
Traditional vs. big data business approaches
Traditional Approach: Structured & Repeatable Analysis
• Business Users: Determine what question to ask
• IT: Structures the data to answer that question
• Examples: Monthly sales reports, Profitability analysis, Customer surveys
Big Data Approach: Iterative & Exploratory Analysis
• IT: Delivers a platform to enable creative discovery
• Business: Explores what questions could be asked
• Examples: Brand sentiment, Product strategy, Maximum asset utilization
Applications for Big Data Analytics
• Smarter Healthcare
• Multi-channel sales
• Finance
• Log Analysis
• Homeland Security
• Traffic Control
• Telecom
• Search Quality
• Manufacturing
• Trading Analytics
• Fraud and Risk
• Retail: Churn, NBO
Big Data Adoption Phases
Use of Big Data globally and in the financial sector
Multiple responses accepted
Big Data: In-Demand, Well-Paying Skill
• Skills are in demand
• Pays well
“If you can claim to be a data
scientist and have the chops to back
that up, you can pretty much write
your own ticket even in this tough
job market.”
Source: Gigaom http://gigaom.com/cloud/big-data-skills-bring-big-dough/
KTH Swedish Royal Institute of Technology: Reducing Traffic Congestion
• Deployed real-time Smarter Traffic system to
predict and improve traffic flow.
• Analyzes streaming real-time data gathered from
cameras at entry/exit to city, GPS data from taxis
and trucks, and weather information.
• Predicts best time and method to travel such as
when to leave to catch a flight at the airport
Results
• Enables ability to analyze and predict traffic
faster and more accurately than ever before
• Provides new insight into mechanisms that affect
a complex traffic system
• Smarter, more efficient, and more
environmentally friendly traffic
University of Southern California Innovation Lab Monitors Political Debates
Benefits
• Real-time display of public sentiment as candidates respond to questions
• Debate winner prediction based on public opinion instead of solely political analysts
Big Data – A holistic approach
Big Data is Not Only Hadoop!
• Examples where Hadoop is not entirely applicable:
– Cyber security, stock market, traffic control, sensor information, monitoring trends in social media
– What if your company has many silos of information that are difficult to move to HDFS?
– What about governance? Can we trust the source of this data?
The IBM Big Data Platform
Big data holistic approach: A platform
Layers: Solutions, Analytics and Decision Management, Big Data Platform, Big Data Infrastructure
Platform components:
• Data Warehouse: Delivers deep insight with advanced in-database analytics & operational analytics
• Stream Computing: Analyze streaming data and large data bursts for real-time insights
• Hadoop System: Cost-effectively analyze petabytes of unstructured and structured data
• Information Integration & Governance: Govern data quality and manage the information lifecycle
• Accelerators: Speed time to value with analytic and application accelerators
• Visualization & Discovery: Discover, understand, search, and navigate federated sources of big data
• Systems Management
• Application Development
Platform characteristics:
• Process any type of data
– Structured, unstructured, in-motion, at-rest, in-place
• Built-for-purpose engines
– Designed to handle different requirements
• Manage and govern data in the ecosystem
• Enterprise data integration
• Grow and evolve on current infrastructure
• The whole is greater than the sum of parts
• Integrated components
• Out of the box, standards-based services
• Start small (value is additive)
An example of the big data platform in practice
• Metadata and Governance Zone: ETL, MDM, Data Governance
• Warehousing Zone: Enterprise Warehouse, Data Marts
• Ingestion and Real-time Analytic Zone: Streams, Connectors
• Analytics and Reporting Zone: BI & Reporting, Predictive Analytics, Visualization & Discovery
• Landing and Analytics Sandbox Zone: Hive/HBase column stores, documents in a variety of formats, MapReduce, Hadoop
The 5 High Value Big Data Use Cases
• Big Data Exploration: Find, visualize, understand all big data to improve business knowledge
• Enhanced 360º View of the Customer: Achieve a true unified view, incorporating internal and external sources
• Security/Intelligence Extension: Lower risk, detect fraud and monitor cyber security in real-time
• Operations Analysis: Analyze a variety of machine data for improved business results
• Data Warehouse Augmentation: Integrate big data and data warehouse capabilities to increase operational efficiency
Big Data Exploration: Illustrated
Find, visualize and understand all big data to improve business knowledge
• Greater efficiencies in business processes
• New insights from combining and analyzing data types in new ways
• Develop new business models with resulting increased market presence and revenue
Data sources: CM, RM, DM; RDBMS; Feeds; Web 2.0; Email; Web; CRM, ERP; File Systems
Diagram elements: UI / User, App Builder, Connector Framework, Data Explorer, Streams, Hadoop, Warehouse, Integration & Governance
Big Data Exploration: Example in Practice
Airplane Manufacturer (blinded for confidentiality)
• Exploring 4 TB to drive point business solutions (supplier portal, call center, etc.)
• Single point of data fusion for all employees to use
• Reduced costs & improved operational performance for the business
Is Big Data Exploration Right for You?
• How do you enable employees to navigate and explore enterprise and external content? Can you present this in a single user interface?
• How do you identify areas of data risk before they become a problem?
• What is the starting point for your big data initiatives?
• How do you separate the “noise” from useful content?
• How do you perform data exploration on large and complex data?
• How do you find insights in new or unstructured data types (e.g. social media and email)?
Big Data Platform Component Starting Point: Data Explorer
Enhanced 360º View of the Customer: Illustrated
SOURCE SYSTEMS
• CRM: Name: J Robertson; Address: 35 West 15th; Address: Pittsburgh, PA 15213
• ERP: Name: Janet Robertson; Address: 35 West 15th St.; Address: Pittsburgh, PA 15213
• Legacy: Name: Jan Robertson; Address: 36 West 15th St.; Address: Pittsburgh, PA 15213
Master Data Management
360 View of Party Identity
• First: Janet
• Last: Robertson
• Address: 35 West 15th St
• City: Pittsburgh
• State/Zip: PA / 15213
• Gender: F
• Age: 48
• DOB: 1/4/64
Unified View of Party’s Information
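To make the matching step concrete, here is a minimal sketch of how the three source records above could be recognized as the same party. It is not how a Master Data Management product works internally: the field names, the `same_party` helper, the averaging rule and the 0.8 threshold are all assumptions chosen for illustration, and the string comparison uses Python's standard `difflib`.

```python
from difflib import SequenceMatcher
from itertools import combinations

# The three source-system records from the slide (field names are hypothetical).
RECORDS = [
    {"source": "CRM", "name": "J Robertson",
     "street": "35 West 15th", "city_state_zip": "Pittsburgh, PA 15213"},
    {"source": "ERP", "name": "Janet Robertson",
     "street": "35 West 15th St.", "city_state_zip": "Pittsburgh, PA 15213"},
    {"source": "Legacy", "name": "Jan Robertson",
     "street": "36 West 15th St.", "city_state_zip": "Pittsburgh, PA 15213"},
]


def normalize(text):
    """Lowercase, drop periods and commas, collapse whitespace."""
    cleaned = text.lower().replace(".", "").replace(",", "")
    return " ".join(cleaned.split())


def similarity(a, b):
    """Similarity in [0, 1] between two normalized strings."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def same_party(r1, r2, threshold=0.8):
    """Illustrative rule: same city/state/zip and similar name + street."""
    if normalize(r1["city_state_zip"]) != normalize(r2["city_state_zip"]):
        return False
    score = (similarity(r1["name"], r2["name"])
             + similarity(r1["street"], r2["street"])) / 2
    return score >= threshold


# Compare every pair of source records.
matches = [(a["source"], b["source"])
           for a, b in combinations(RECORDS, 2) if same_party(a, b)]
print(matches)
```

A real deployment would add survivorship rules to decide which value wins in the unified record (for example, "35 West 15th St" over "36 West 15th St."); that step is left out here.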
Security/Intelligence Extension: Illustrated
Platform components: Hadoop, Streams, Warehouse
Traditional Security Operations and Technology:
• Logs, Events, Alerts
• Configuration information
• System audit trails
• External threat intelligence feeds
• Network flows and anomalies
• Identity context
Big Data Analytics:
• Web page text
• Video/audio surveillance
• E-mail and social activity
• Business process data
• Customer transactions
New Considerations
• Collection, Storage and Processing: Collection and integration; Size and speed; Enrichment and correlation
• Analytics and Workflow: Visualization; Unstructured analysis; Learning and prediction; Customization; Sharing and export
“Reconstructing Events”: Integrating Multimedia from Diverse Sources
• Correlate multimedia content across a wide diversity of sources and dynamic topology of cameras
• Exploit partial overlaps in field of view, re-identification of objects/people and contextual information
• Obtain real-time operational picture across diverse content
Sources (Security Cameras, Mobile Cameras, Social Media, Overhead):
• 100K security cameras (static cameras, slowly changing topology)
• 10M mobile photos/day (limited knowledge about locations)
• 50M social media photos/video (uncertain geo-temporal context)
• Moving vehicles (patrol cars), overhead drones, broadcast, retail, 311, etc.
Security/Intelligence Extension: Customer Example
Captured and analyzed 42TB of daily traffic in real-time for tracking persons of interest to take suitable action and reduce risk.
Would the Security / Intelligence Extension benefit you?
• What are your plans to enrich your security or intel system with unused or underleveraged data sources (video, audio, smart devices, network, Telco, social media)?
• How will you address the need for sub-second detection, identification, and resolution of physical or cyber threats?
• How do you intend to follow activities of criminals, terrorists, or persons in a blacklist?
• How do you plan to enhance your surveillance system with real-time data from video, acoustic, thermal or other security sensors?
• Do you want to correlate lots of technical or human intel data and sources looking for associations or patterns (big data forensics)?
• How are you going to deal with unstructured data (email, social, etc.) in your Security Information & Event Management (SIEM) solution to improve cyber threat detection & remediation?
Big Data Platform Component Starting Point: Streams, Hadoop
Operations Analysis: Illustrated
• Input: Raw Logs and Machine Data
• Capabilities: Indexing, Search; Statistical Modeling; Root Cause Analysis; Federated Navigation & Discovery; Real-time Analysis
• Only store what is needed
• Machine Data Accelerator
Operations analysis is a Business Imperative
Cost of System Down Time
– 49% of Fortune 500 companies > 80 hrs down time/year [1]
• Cost of down time: $90,000/hr to $6.48 million/hr
• 80 hours * $6.48M = approx $500M per year
– System downtime costs North American businesses $26.5 billion a year in lost revenue [2]
[1] http://www.information-management.com/infodirect/2009_133/downtime_cost-10015855-1.html
[2] http://www.itchannelplanet.com/business_news/article.php/3916786/IT-System-Downtime-Costs-265-Billion-A-Year-Study-Finds.htm
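The annual downtime cost quoted above follows directly from the hourly figures. A minimal sketch of that arithmetic, using only the numbers on the slide (the function and constant names are illustrative):

```python
# Figures taken from the slide.
HOURS_DOWN_PER_YEAR = 80
COST_PER_HOUR_LOW = 90_000        # dollars per hour, low end
COST_PER_HOUR_HIGH = 6_480_000    # dollars per hour, high end


def annual_downtime_cost(hours, cost_per_hour):
    """Yearly cost of downtime in dollars."""
    return hours * cost_per_hour


low = annual_downtime_cost(HOURS_DOWN_PER_YEAR, COST_PER_HOUR_LOW)
high = annual_downtime_cost(HOURS_DOWN_PER_YEAR, COST_PER_HOUR_HIGH)
print(low)    # 7200000   -> $7.2M per year at the low end
print(high)   # 518400000 -> $518.4M, the slide's "approx $500M"
```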
Operations Analysis: Customer Example
• Intelligent Infrastructure Management: log analytics, energy bill forecasting, energy consumption optimization, anomalous energy usage detection, presence-aware energy management
• Optimized building energy consumption with centralized monitoring; automated preventive and corrective maintenance
• Utilized InfoSphere Streams, InfoSphere BigInsights, IBM Cognos
Would Operations Analysis benefit you?
• Do you deal with large volumes of machine data?
• How do you access and search that data?
• How do you perform root cause analysis?
• How do you perform complex real-time analysis to correlate across different data sets?
• How do you monitor and visualize streaming data in real time and generate alerts?
Big Data Platform Component Starting Point: Hadoop, Streams
Data Warehouse Augmentation: Needs
Integrate big data and data warehouse capabilities to increase operational efficiency
Need to leverage variety of data
• Structured, unstructured, and streaming data sources required for deep analysis
• Low latency requirements (hours, not weeks or months)
• Required query access to data
Extend warehouse infrastructure
• Optimized storage, maintenance and licensing costs by migrating rarely used data to Hadoop
• Reduced storage costs through smart processing of streaming data
• Improved warehouse performance by determining what data to feed into it
Data Warehouse Augmentation: Illustrated
• Filter and summarize big data for the warehouse (Hadoop)
• Hadoop as a query-ready archive for a data warehouse
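The "filter and summarize" pattern can be sketched in a few lines: raw events are filtered down to the ones the warehouse cares about and aggregated before loading, so only a small summary reaches the warehouse. This is an illustration in plain Python, not Hadoop code; the event fields and the `summarize_for_warehouse` function are assumptions.

```python
from collections import defaultdict

# Hypothetical raw events as they might land in the big data store.
raw_events = [
    {"store": "A", "type": "sale", "amount": 20.0},
    {"store": "A", "type": "heartbeat", "amount": 0.0},
    {"store": "B", "type": "sale", "amount": 5.0},
    {"store": "A", "type": "sale", "amount": 10.0},
]


def summarize_for_warehouse(events):
    """Keep only sales and aggregate them per store."""
    totals = defaultdict(lambda: {"sales": 0, "revenue": 0.0})
    for event in events:
        if event["type"] != "sale":   # filter: drop noise such as heartbeats
            continue
        row = totals[event["store"]]
        row["sales"] += 1             # summarize: count and sum per store
        row["revenue"] += event["amount"]
    return [{"store": store, **row} for store, row in sorted(totals.items())]


summary = summarize_for_warehouse(raw_events)
print(summary)
```

Four raw events become two summary rows here; the raw detail stays in the cheaper store, where it remains available as a query-ready archive.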
Open Source Hadoop
• Runtime: MapReduce
• File System: HDFS
• Data Store: HBase
• Connectors: Flume
• Development Tools: Eclipse Plug-ins
• Other open source components shown: Jaql, Pig, ZooKeeper, Lucene, Oozie, Hive, Mahout, Whirr, Sqoop, Hue, HCatalog, R
• Other diagram areas: Visualization & Discovery, Workload Optimization, Advanced Engines, Systems Management
IBM InfoSphere BigInsights v2.1 Enterprise Edition
Added Value on Top of Open Source Hadoop
• Diagram areas: Visualization & Discovery, Applications & Development, Advanced Analytic Engines, Workload Optimization, Runtime, File System, Data Store, Integration, Administration, Security (legend: IBM / Open Source)
• Open source components: MapReduce, HDFS, HBase, Hive, Pig, Jaql, ZooKeeper, Lucene, Oozie, Sqoop, Flume, HCatalog, Mahout, Whirr, Hue, R
• Other components shown (IBM additions and integrations): BigSheets, Big SQL, Text Analytics (Text Processing Engine & Extractor Library), Adaptive MapReduce, Adaptive Algorithms, GPFS, Index, Splittable Text Compression, Flexible Scheduler, Enhanced Security, Integrated Installer, Admin Console, High Availability, JDBC, Dashboard & Visualization, Apps, Workflow, Monitoring, Management, Security, Audit & History, Lineage, Streams, Netezza, DB2, DataStage, Guardium, Platform Computing, Cognos
InfoSphere BigInsights Added Value
Diagram layers:
• Analytics: Accelerators; Analytics/Extractors: Extraction engine (System T), Extractors and APIs
• Visualization & Exploration
• Development Tools: Development Environment, SQL API
• Connectors
• Workload Optimization (MapReduce/SQL): Workload Management
• Administration & Security: Security
• IBM tested & supported open source components (open source based components)
InfoSphere BigInsights Added Value: Accelerators
Social Data Analytics Accelerator Architecture
• Input: Social Media Data
• Online flow (data-in-motion analysis), Stream Computing and Analytics: data ingest and prep; extract buzz, intent, sentiment; entity analytics: profile resolution; real-time analytics with pre-defined views and charts; dashboard
• Offline flow (data-at-rest analysis), BigInsights System and Analytics: extract buzz, intent, sentiment and consumer profiles; entity analytics and integration; comprehensive social media customer profiles; pre-defined workbooks and dashboards
• Optional: indexed search in Data Explorer (index using Push API) for ad hoc access
InfoSphere BigInsights Added Value: BigSheets
BigSheets Visualization and
Exploration
• Web-based analysis and visualization
for Users
• Familiar spreadsheet-like interface
• Define and manage long running data
collection jobs
InfoSphere BigInsights Added Value: BigSheets
No programming knowledge needed!
How it works
• Model “big data” collected from various sources as collections
• Filter and enrich content with built-in functions
• Combine data in different collections
• Visualize results through spreadsheets, charts
• Export data into common formats (if desired)
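BigSheets itself needs no programming, but it can help to see what these spreadsheet steps amount to. The sketch below mimics the filter, enrich, combine and export steps on tiny in-memory collections using only the Python standard library. None of these function names come from BigSheets; they and the sample data are assumptions for illustration.

```python
import csv
import io


def filter_rows(collection, predicate):
    """Keep only the rows for which predicate(row) is true."""
    return [row for row in collection if predicate(row)]


def add_column(collection, name, fn):
    """Enrich each row with a computed column."""
    return [{**row, name: fn(row)} for row in collection]


def combine(left, right, key):
    """Join two collections on a shared key (rows without a match are dropped)."""
    index = {row[key]: row for row in right}
    return [{**row, **index[row[key]]} for row in left if row[key] in index]


def export_csv(collection):
    """Export a non-empty collection as CSV text."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(collection[0]), lineterminator="\n")
    writer.writeheader()
    writer.writerows(collection)
    return buffer.getvalue()


# Two hypothetical collections gathered from different sources.
posts = [
    {"user": "ann", "text": "Love the new phone"},
    {"user": "bob", "text": "Traffic is slow today"},
]
profiles = [{"user": "ann", "city": "Toronto"}, {"user": "bob", "city": "Lima"}]

about_phones = filter_rows(posts, lambda r: "phone" in r["text"].lower())
enriched = add_column(about_phones, "words", lambda r: len(r["text"].split()))
combined = combine(enriched, profiles, "user")
report = export_csv(combined)
print(report)
```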
InfoSphere BigInsights Added Value: Dev Tools
Development Environment
• Eclipse based dev environment
• Developer tools and a set of analytic
extractors for fast adoption and reduction
in coding and debugging time
• Plugins for Text Analytics, MapReduce programming, Jaql development, Hive query development, and more
InfoSphere BigInsights Added Value: Dev Tools
How it works
• Built-in Apps make it easy to run Big Data applications & tasks:
– Import and Export Data from a Database or files
– Import and Export Web and Social Data
– Perform Text Analytics on specified content
– Query HBase Content
– Query content stored in BigInsights using Big SQL
– Execute Pig or Jaql applications
• Extensible! Build your own applications and make them easy to execute from an appealing Application launcher
© 2013 IBM Corporation
InfoSphere BigInsights Added Value: Text Analytics
Advanced Text Analytics Engine
Automatically identify and understand key
information in text
Football World Cup 2010, one team
distinguished themselves well, losing to
the eventual champions 1-0 in the Final.
Early in the second half, Netherlands’
striker, Arjen Robben, had a breakaway,
but the keeper for Spain, Iker Casillas
made the save. Winger Andres Iniesta
scored for Spain for the win.
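As a rough illustration of what "identify key information in text" means for the passage above, the following sketch pulls out players, their roles, the score and the countries with a few hand-written rules. It uses plain Python regular expressions, not the BigInsights text analytics engine or AQL; the patterns and the tiny country dictionary are assumptions tailored to this one passage.

```python
import re

# The sample passage from the slide (straight apostrophe used for simplicity).
TEXT = (
    "Football World Cup 2010, one team distinguished themselves well, losing to "
    "the eventual champions 1-0 in the Final. Early in the second half, Netherlands' "
    "striker, Arjen Robben, had a breakaway, but the keeper for Spain, Iker Casillas "
    "made the save. Winger Andres Iniesta scored for Spain for the win."
)

# Tiny hand-made dictionary (assumption; a real extractor would use a full one).
COUNTRIES = {"Netherlands", "Spain"}

# Rule: a role word, optionally "for <Country>", optionally a comma, then "First Last".
PLAYER = re.compile(
    r"\b((?i:striker|keeper|winger))"
    r"(?: for [A-Z][a-z]+)?,? "
    r"([A-Z][a-z]+ [A-Z][a-z]+)"
)
# Rule: a score such as "1-0".
SCORE = re.compile(r"\b(\d+)-(\d+)\b")


def extract(text):
    """Return players with roles, the first score found, and known countries."""
    players = [(name, role.lower()) for role, name in PLAYER.findall(text)]
    score_match = SCORE.search(text)
    score = tuple(int(n) for n in score_match.groups()) if score_match else None
    countries = sorted(set(re.findall(r"[A-Z][a-z]+", text)) & COUNTRIES)
    return {"players": players, "score": score, "countries": countries}


result = extract(TEXT)
print(result)
```

Rules like these are brittle; a declarative rule language plus an optimizer exists precisely so that such rules can be maintained and run at scale.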
Sentiments for movie Ra.One :-(
Architecture Diagram
• AQL: Rule language with familiar SQL-like syntax. Specify annotator semantics declaratively.
• Text Analytics Optimizer: Choose an efficient execution plan that implements the semantics. Output: Compiled Operator Graph (.aog).
• Text Analytics Runtime: Highly scalable, embeddable Java runtime. Takes an Input Document Stream and produces an Annotated Document Stream.
InfoSphere BigInsights – Added Value: Connectors
Connectors
• Databases
• DB2, Netezza, Oracle, Teradata
Integrations
• InfoSphere DataStage
(data collection and integration)
• InfoSphere Streams
(real-time streams processing)
• InfoSphere Guardium
(security and monitoring)
• Cognos Business Intelligence
(Business Intelligence capabilities)
• IBM Platform Computing
(cluster/grid infrastructure and management)
and more…
BigInsights – Added Value: Workload optimization
Hadoop System Scheduler
• Identifies small and large jobs from prior experience
• Sequences work to reduce overhead
Adaptive MapReduce
• Drop-in replacement for Hadoop batch scheduler
• Dramatic performance gains for latency-sensitive application workloads
• Agile scheduling, dynamically adjust priorities at run-time
BigInsights – Added Value: Web Console
Web Console
• Start / stop services
• Run / monitor jobs (applications)
• Explore / modify file system
• Built in Apps simplify common tasks
BigInsights – Added Value: Security
Security
• LDAP authentication
• Support for PAM & Flat File configuration
• Administrators restrict access to authorized
users
• HTTPS support for the InfoSphere
BigInsights console, and reverse proxy.
• Role based access
How Streams Works
• Continuous ingestion
• Continuous analysis
Analytic operators: Filter / Sample, Transform, Annotate, Correlate, Classify
Achieve scale:
• By partitioning applications into software components
• By distributing across stream-connected hardware hosts
Infrastructure provides services for:
• Scheduling analytics across hardware hosts
• Establishing streaming connectivity
Where appropriate:
• Elements can be fused together for lower communication latency
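The operator idea can be sketched with Python generators: each operator consumes a stream of tuples and yields a new stream, and operators are chained into a pipeline that processes one tuple at a time. This is only an analogy in plain Python, not InfoSphere Streams code; the operator names, the traffic-sensor fields and the 50 km/h limit are assumptions.

```python
def ingest(readings):
    """Source operator: emits readings one at a time as they arrive."""
    for reading in readings:
        yield reading


def filter_valid(stream):
    """Filter operator: drop readings with an impossible (negative) speed."""
    for reading in stream:
        if reading["speed_kmh"] >= 0:
            yield reading


def transform(stream):
    """Transform operator: add the speed in metres per second."""
    for reading in stream:
        yield {**reading, "speed_ms": reading["speed_kmh"] / 3.6}


def classify(stream, limit_kmh=50):
    """Classify operator: label each reading against a speed limit."""
    for reading in stream:
        label = "speeding" if reading["speed_kmh"] > limit_kmh else "ok"
        yield {**reading, "label": label}


def pipeline(readings):
    """Chain the operators; nothing runs until results are pulled."""
    return classify(transform(filter_valid(ingest(readings))))


# Hypothetical sensor readings.
sample = [
    {"sensor": "cam-1", "speed_kmh": 42.0},
    {"sensor": "cam-2", "speed_kmh": -1.0},
    {"sensor": "taxi-7", "speed_kmh": 72.0},
]

results = list(pipeline(sample))
for r in results:
    print(r["sensor"], r["label"])
```

Here all operators run in one process, which loosely resembles fused elements; a streams runtime would additionally place operators on different hosts and connect them over the network.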
The Future of Big Data and Cloud
• SQL for Hadoop support improvements, moving towards full ANSI support
– Hive
– Impala (Cloudera)
– Big SQL (IBM)
– Stinger (Hortonworks)
– Drill (MapR)
– HAWQ (Pivotal)
– SQL-H (Teradata)
• Improvements in Multimedia Analytics
• Growth in usage and adoption of R programming language
• Cloud
– Bare metal support helping with Hadoop workloads
– Private network
– Full support with APIs
Big SQL overview
Big SQL fully integrates with SQL
applications and BI tooling with
benefits including:
• Existing queries run with no or
few modifications
• Existing JDBC and ODBC
compliant tools can be
leveraged
• Applications do not have to
compensate for constraints of
Hive QL which may result in:
• more statements
• potentially moving more
data over the network to
the application
Big SQL architecture (as illustrated):
• Application: SQL Language, JDBC / ODBC Driver
• BigInsights: JDBC / ODBC Server, Big SQL Engine
• Data Sources: Hive Tables, HBase Tables, CSV Files
Try it out!
Big SQL 3.0 Technology Preview: bigsql.imdemocloud.com
BigInsights on the Cloud - Making Learning Hadoop Easy and Fun
M2M Demos (using Streams)
• The Connected Car Demo
– http://ausgsa.ibm.com/projects/c/connected_car/index.html
– http://m2m.demos.ibm.com/
• YouTube IBM Big Data Channel
– http://www.youtube.com/user/ibmbigdata
Big Data University (bigdatauniversity.com)
Big Data University (bigdatauniversity.com)
• Flexible on-line delivery allows learning @your place and @your pace
• Free courses, free study materials.
• Cloud-based sandbox for exercises: zero setup, with robust Course Management System and Content Distribution infrastructure
• 169,000 registered students.
• Free IBM Hadoop, BigInsights Publications
BigInsights on the Cloud - Making Learning Hadoop Easy and Fun
Quick Start Editions available (free, non-production, no time bomb):
– IBM InfoSphere BigInsights (IBM’s Hadoop Distribution): ibm.co/QuickStart
– IBM InfoSphere Streams: ibm.co/streamsqs
Big Data University (bigdatauniversity.com)
My contact information
Twitter: @raulchong
Facebook: facebook.com/raul.f.chong
LinkedIn: linkedin.com/pub/raul-f-chong/8/aa2/b63
Thank You!

Stephen Cantrell, kdb+ Developer at Kx Systems  “Kdb+: How Wall Street Tech c...Stephen Cantrell, kdb+ Developer at Kx Systems  “Kdb+: How Wall Street Tech c...
Stephen Cantrell, kdb+ Developer at Kx Systems “Kdb+: How Wall Street Tech c...
 
How a Media Data Platform Drives Real-time Insights & Analytics using Apache ...
How a Media Data Platform Drives Real-time Insights & Analytics using Apache ...How a Media Data Platform Drives Real-time Insights & Analytics using Apache ...
How a Media Data Platform Drives Real-time Insights & Analytics using Apache ...
 

02 a holistic approach to big data

  • 1. Raul F. Chong Senior Big Data and Cloud Program Manager Big Data University Community Leader @raulchong A holistic approach to Big Data © 2013 BigDataUniversity.com
  • 2. Agenda  Introduction to Big Data  The state of Big Data adoption  Big Data – A holistic approach  The 5 high value Big Data use cases  Technical details of key Big Data components  The future of Big Data and Cloud  Demos  Resources
  • 3. Agenda  Introduction to Big Data  The state of Big Data adoption  Big Data – A holistic approach  The 5 high value Big Data use cases  Technical details of key Big Data components  The future of Big Data and Cloud  Demos  Resources
  • 4. What is Big Data? Big data are datasets that grow so large that they become awkward to work with using on-hand database management tools. Difficulties include capture, storage, search, sharing, analytics, and visualizing. Source: Wikipedia
  • 5. Big Data Characteristics Information is growing at a phenomenal rate as much data and content over coming decade 2009 800,000 petabytes 2020 35 zettabytes = 4 Trillion 8GB iPods 44x Source: IDC, The Digital Universe Decade – Are You Ready?, May 2010
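The figures on slide 5 can be checked with a few lines of arithmetic. This is a minimal sketch that assumes decimal (SI) units, which the slide does not state; the variable names are illustrative only.

```python
# Assumption: decimal (SI) units. The slide does not say which convention IDC used.
PETABYTE = 10**15
ZETTABYTE = 10**21
GIGABYTE = 10**9

data_2009 = 800_000 * PETABYTE   # "800,000 petabytes" in 2009
data_2020 = 35 * ZETTABYTE       # "35 zettabytes" projected for 2020

# Growth factor between the two years (the slide rounds this to 44x).
growth_factor = data_2020 / data_2009

# Number of 8 GB devices needed to hold the 2020 volume (the slide says ~4 trillion).
ipods_8gb = data_2020 / (8 * GIGABYTE)

print(f"growth 2009 -> 2020: {growth_factor:.2f}x")
print(f"8 GB iPods for the 2020 volume: {ipods_8gb:.3e}")
```

Under these assumptions the growth factor comes out just under 44 and the device count a little above 4 trillion, consistent with the rounded values on the slide.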
  • 6. Big Data Characteristics • About 80% of the world’s data is unstructured • It may be data we’ve been collecting before, but could not process
  • 7. Types of Big Data • Data in movement - streams • Twitter / Facebook comments • Stock market data • Sensors: Vital signs of a newly-born • Data at rest - oceans • Collection of what has streamed • Web logs, emails, social media • Unstructured documents: forms, claims • Structured data from disparate systems
  • 8. IT Structures the data to answer that question IT Delivers a platform to enable creative discovery Business Explores what questions could be asked Business Users Determine what question to ask Monthly sales reports Profitability analysis Customer surveys Brand sentiment Product strategy Maximum asset utilization Big Data Approach Iterative & Exploratory Analysis Traditional Approach Structured & Repeatable Analysis Traditional vs. big data business approaches
  • 9. Applications for Big Data Analytics Homeland Security Finance Smarter Healthcare Multi-channel sales Telecom Manufacturing Traffic Control Trading Analytics Fraud and Risk Log Analysis Search Quality Retail: Churn, NBO
  • 10. Agenda  The state of Big Data adoption  Big Data – A holistic approach  The 5 high value Big Data use cases  Technical details of key Big Data components  The future of Big Data and Cloud  Demos  Resources
  • 12. Use of Big Data globally and in the financial sector Multiple responses accepted
  • 13. Big Data: In Demand Well Paying Skill Skills are in Demand Pays well “If you can claim to be a data scientist and have the chops to back that up, you can pretty much write your own ticket even in this tough job market.” Source: Gigaom http://gigaom.com/cloud/big-data-skills-bring-big-dough/
  • 14. Agenda  The state of Big Data adoption  Big Data – A holistic approach  The 5 high value Big Data use cases  Technical details of key Big Data components  The future of Big Data and Cloud  Demos  Resources
  • 15. KTH Swedish Royal Institute of Technology Reducing Traffic Congestion • Deployed real-time Smarter Traffic system to predict and improve traffic flow. • Analyzes streaming real-time data gathered from cameras at entry/exit to city, GPS data from taxis and trucks, and weather information. • Predicts best time and method to travel such as when to leave to catch a flight at the airport Results • Enables ability to analyze and predict traffic faster and more accurately than ever before • Provides new insight into mechanisms that affect a complex traffic system • Smarter, more efficient, and more environmentally friendly traffic
  • 16. Benefits  Real-time display of public sentiment as candidates respond to questions  Debate winner prediction based on public opinion instead of solely political analysts University of Southern California Innovation Lab Monitors Political Debates
  • 17. Big Data – A holistic approach Big Data is Not Only Hadoop!  Examples where Hadoop is not entirely applicable: – Cyber security, Stock market, Traffic control, Sensor information, monitoring trends in Social Media – What if your company has many silos of information, difficult to move to HDFS? – What about governance? Can we trust the source of this data?
  • 18. Solutions Big Data Platform Analytics and Decision Management Big Data Infrastructure Big data holistic approach: A platform
  • 19. Solutions Big Data Platform Analytics and Decision Management Big Data Infrastructure The IBM Big Data Platform Delivers deep insight with advanced in-database analytics & operational analytics Data Warehouse Data Warehouse Big data holistic approach: A platform
  • 20. Solutions Big Data Platform Analytics and Decision Management Big Data Infrastructure Stream Computing Data Warehouse Analyze streaming data and large data bursts for real-time insights Stream Computing Big data holistic approach: A platform
  • 21. Solutions Big Data Platform Analytics and Decision Management Big Data Infrastructure The IBM Big Data Platform Hadoop System Stream Computing Data Warehouse Cost-effectively analyze Petabytes of unstructured and structured data Hadoop System Big data holistic approach: A platform
  • 22. Solutions Big Data Platform Analytics and Decision Management Big Data Infrastructure Information Integration & Governance Hadoop System Stream Computing Data Warehouse Govern data quality and manage the information lifecycle Information Integration & Governance Big data holistic approach: A platform
  • 23. Solutions Big Data Platform Analytics and Decision Management Big Data Infrastructure Accelerators Information Integration & Governance Hadoop System Stream Computing Data Warehouse Speed time to value with analytic and application accelerators Accelerators Big data holistic approach: A platform
  • 24. Solutions Big Data Platform Analytics and Decision Management Big Data Infrastructure Accelerators Information Integration & Governance Hadoop System Stream Computing Data Warehouse Systems Management Application Development Visualization & Discovery The IBM Big Data Platform Discover, understand, search, and navigate federated sources of big data Visualization & Discovery Big data holistic approach: A platform
  • 25.  Process any type of data – Structured, unstructured, in-motion, at-rest, in-place  Built-for-purpose engines – Designed to handle different requirements  Manage and govern data in the ecosystem  Enterprise data integration  Grow and evolve on current infrastructure  The whole is greater than the sum of parts  Integrated components  Out of the box, standards-based services  Start small (value is additive) Solutions Big Data Platform Analytics and Decision Management Big Data Infrastructure Accelerators Information Integration & Governance Hadoop System Stream Computing Data Warehouse Systems Management Application Development Visualization & Discovery Big data holistic approach: A platform
  • 26. ETL, MDM, Data Governance Metadata and Governance Zone Warehousing Zone Enterprise Warehouse Data Marts Ingestion and Real-time Analytic Zone Streams Connectors BI & Reporting Predictive Analytics Analytics and Reporting Zone Visualization & Discovery Landing and Analytics Sandbox Zone Hive/HBase Col Stores Documents in variety of formats MapReduce Hadoop An example of the big data platform in practice
  • 27. Agenda  The state of Big Data adoption  Big Data – A holistic approach  The 5 high value Big Data use cases  Technical details of key Big Data components  The future of Big Data and Cloud  Demos  Resources
  • 28. Big Data Exploration Find, visualize, understand all big data to improve business knowledge Enhanced 360º View of the Customer Achieve a true unified view, incorporating internal and external sources Security/Intelligence Extension Lower risk, detect fraud and monitor cyber security in real-time Data Warehouse Augmentation Integrate big data and data warehouse capabilities to increase operational efficiency Operations Analysis Analyze a variety of machine data for improved business results The 5 High Value Big Data Use Cases
  • 29. Find, visualize and understand all big data to improve business knowledge • Greater efficiencies in business processes • New insights from combining and analyzing data types in new ways • Develop new business models with resulting increased market presence and revenue CM, RM, DM RDBMS Feeds Web 2.0 Email Web CRM, ERP File Systems Connector Framework App Builder Hadoop Integration & Governance UI / User Streams Big Data Exploration: Illustrated Warehouse Data Explorer
  • 30. Big Data Exploration: Example in Practice • Exploring 4 TB to drive point business solutions (supplier portal, call center, etc.) • Single-point of data fusion for all employees to use • Reduced costs & improved operational performance for the business  How do you enable employees to navigate and explore enterprise and external content? Can you present this in a single user interface?  How do you identify areas of data risk before they become a problem?  What is the starting point for your big data initiatives? Is Big Data Exploration Right for You?  How do you separate the “noise” from useful content?  How do you perform data exploration on large and complex data?  How do you find insights in new or unstructured data types (e.g. social media and email)? Airplane Manufacturer Blinded for confidentiality Big Data Platform Component Starting Point: Data Explorer
  • 31. Enhanced 360º View of the Customer: Illustrated CRM J Robertson Pittsburgh, PA 15213 35 West 15th Name: Address: Address: ERP Janet Robertson Pittsburgh, PA 15213 35 West 15th St. Name: Address: Address: Legacy Jan Robertson Pittsburgh, PA 15213 36 West 15th St. Name: Address: Address: SOURCE SYSTEMS Janet 35 West 15th St Pittsburgh Robertson PA / 15213 F 48 1/4/64 First: Last: Address: City: State/Zip: Gender: Age: DOB: 360 View of Party Identity Master Data Management Unified View of Party’s Information Hadoop Streams Warehouse
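The consolidation shown on slide 31 can be sketched in a few lines. This is a toy illustration only, not how any master data management product works: the blocking key (last name plus ZIP code), the survivorship rules, and all field and function names are assumptions made for the example. The three source records are transcribed from the slide.

```python
from collections import Counter

# Source records from the slide; the field names are hypothetical.
records = [
    {"source": "CRM", "name": "J Robertson",
     "street": "35 West 15th", "city_state_zip": "Pittsburgh, PA 15213"},
    {"source": "ERP", "name": "Janet Robertson",
     "street": "35 West 15th St.", "city_state_zip": "Pittsburgh, PA 15213"},
    {"source": "Legacy", "name": "Jan Robertson",
     "street": "36 West 15th St.", "city_state_zip": "Pittsburgh, PA 15213"},
]

def match_key(record):
    """Hypothetical blocking key: lower-cased last name plus ZIP code."""
    last_name = record["name"].split()[-1].lower()
    zip_code = record["city_state_zip"].split()[-1]
    return (last_name, zip_code)

def merge(group):
    """Toy survivorship rules: longest first name, majority house number."""
    first = max((r["name"].split()[0] for r in group), key=len)
    last = group[0]["name"].split()[-1]
    house_number = Counter(
        r["street"].split()[0] for r in group
    ).most_common(1)[0][0]
    # Among streets with the majority house number, keep the longest spelling.
    street = max(
        (r["street"] for r in group if r["street"].split()[0] == house_number),
        key=len,
    )
    return {
        "first": first,
        "last": last,
        "street": street.rstrip("."),
        "city_state_zip": group[0]["city_state_zip"],
        "sources": [r["source"] for r in group],
    }

# Group records that share a key, then merge each group into one unified view.
groups = {}
for record in records:
    groups.setdefault(match_key(record), []).append(record)

unified = [merge(group) for group in groups.values()]
print(unified[0])
```

Real matching has to cope with misspellings, moved addresses, and conflicting sources, so production systems use probabilistic matching and configurable survivorship rather than the fixed rules above.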
  • 32. Logs Events Alerts Configuration information System audit trails External threat intelligence feeds Network flows and anomalies Identity context Web page text Video/audio surveillance E-mail and social activity Business process data Customer transactions Traditional Security Operations and Technology Big Data Analytics New Considerations Collection, Storage and Processing Collection and integration Size and speed Enrichment and correlation Analytics and Workflow Visualization Unstructured analysis Learning and prediction Customization Sharing and export Security/Intelligence Extension: Illustrated
  • 33. “Reconstructing Events” – Integrating Multimedia from Diverse Sources • Correlate multimedia content across a wide diversity of sources and dynamic topology of cameras • Exploit partial overlaps in field of view, re-identification of objects/people and contextual information • Obtain real-time operational picture across diverse content • 100K security cameras (static cameras, slowly changing topology) • 10M mobile photos/day (limited knowledge about locations) • 50M social media photos/video (uncertain geo-temporal context) • Moving vehicles (patrol cars), overhead drones, broadcast, retail, 311, etc. Overhead Social Media Mobile Cameras Security Cameras
  • 34. Security/Intelligence Extension: Customer Example  What are your plans to enrich your security or intel system with unused or underleveraged data sources (video, audio, smart devices, network, Telco, social media)?  How will you address the need for sub-second detection, identification, resolution of physical or cyber threats?  How do you intend to follow activities of criminals, terrorists, or persons in a blacklist?  How do you plan to enhance your surveillance system with real-time data from video, acoustic, thermal or other security sensors?  Do you want to correlate lots of technical or human intel data and sources looking for associations or patterns (big data forensics)?  How are you going to deal with unstructured data (email, social, etc.) in your Security Information & Event Management (SIEM) solution to improve cyber threat detection & remediation? Would the Security / Intelligence Extension benefit you? Captured and analyzed 42TB of daily traffic in real-time for tracking persons of interest to take suitable action and reduce risk. Big Data Platform Component Starting Point: Streams, Hadoop
  • 35. Raw Logs and Machine Data Indexing, Search Statistical Modeling Root Cause Analysis Federated Navigation & Discovery Real-time Analysis Only store what is needed Operations Analysis: Illustrated Machine Data Accelerator
  • 36. Operations analysis is a Business Imperative Cost of System Down Time – 49% of Fortune 500 companies > 80 hrs down time/year [1] • Cost of down time: $90,000/hr to $6.48 million/hr • 80 hours * $6.48M = approx $500M per year – System downtime costs North American businesses $26.5 billion a year in lost revenue [2] Sources: [1] http://www.information-management.com/infodirect/2009_133/downtime_cost-10015855-1.html [2] http://www.itchannelplanet.com/business_news/article.php/3916786/IT-System-Downtime-Costs-265-Billion-A-Year-Study-Finds.htm
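The downtime estimate on slide 36 is a straightforward multiplication. The sketch below reproduces it for both ends of the quoted hourly range; the variable names are illustrative, and the 80-hour figure is the slide's lower bound ("> 80 hrs"), so the results are floors rather than forecasts.

```python
# Figures taken from the slide; names are illustrative only.
hours_down_per_year = 80          # slide: "> 80 hrs down time/year"
cost_per_hour_low = 90_000        # slide: "$90,000/hr"
cost_per_hour_high = 6_480_000    # slide: "$6.48 million/hr"

annual_cost_low = hours_down_per_year * cost_per_hour_low
annual_cost_high = hours_down_per_year * cost_per_hour_high

print(f"low estimate:  ${annual_cost_low:,} per year")
print(f"high estimate: ${annual_cost_high:,} per year")
```

The high estimate works out to $518.4 million, which the slide rounds to roughly $500M; the low estimate is $7.2 million.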
  • 37. Operations Analysis: Customer Example • Intelligent Infrastructure Management: log analytics, energy bill forecasting, energy consumption optimization, anomalous energy usage detection, presence-aware energy management • Optimized building energy consumption with centralized monitoring; Automated preventive and corrective maintenance • Utilized InfoSphere Streams, InfoSphere BigInsights, IBM Cognos  Do you deal with large volumes of machine data?  How do you access and search that data?  How do you perform root cause analysis?  How do you perform complex real-time analysis to correlate across different data sets?  How do you monitor and visualize streaming data in real time and generate alerts? Would Operations Analysis benefit you? Big Data Platform Component Starting Point: Hadoop, Streams
  • 38. Integrate big data and data warehouse capabilities to increase operational efficiency Data Warehouse Augmentation: Needs Need to leverage variety of data Extend warehouse infrastructure • Optimized storage, maintenance and licensing costs by migrating rarely used data to Hadoop • Reduced storage costs through smart processing of streaming data • Improved warehouse performance by determining what data to feed into it • Structured, unstructured, and streaming data sources required for deep analysis • Low latency requirements (hours—not weeks or months) • Required query access to data
  • 39. Filter and summarize big data for the warehouse Hadoop Data Warehouse Augmentation: Illustrated
  • 40. Hadoop as a query-ready archive for a data warehouse Hadoop Data Warehouse Augmentation: Illustrated
  • 41. Agenda  The state of Big Data adoption  Big Data – A holistic approach  The 5 high value Big Data use cases  Technical details of key Big Data components  The future of Big Data and Cloud  Demos  Resources
  • 42. Open Source Hadoop Visualization & Discovery Connectors Workload Optimization Flume Runtime Advanced Engines File System MapReduce HDFS Data Store HBase Development Tools Eclipse Plug-ins Systems Management Jaql Pig ZooKeeper Lucene Oozie Hive Open Source Mahout Whirr Sqoop Hue H Catalog R
  • 44. InfoSphere BigInsights Added Value InfoSphere BigInsights Administration & Security Workload Optimization (MapReduce/SQL) Connectors Development Tools IBM tested & supported open source components Accelerators Open source based components Workload Management Security Development Environment Analytics/Extractors Analytics Extraction engine (System T) Visualization & Exploration Extractors and APIs SQL API
  • 45. InfoSphere BigInsights Added Value: Accelerators Data Ingest and Prep Extract Buzz, Intent , Sentiment Entity Analytics: Profile Resolution Real time analytics. Pre-defined views and charts Dashboard Stream Computing and Analytics BigInsights System and Analytics Online flow: Data-in-motion analysis Offline flow: Data-at-rest analysis Pre-defined Workbooks and Dashboards Social Media Data Extract Buzz, Intent , Sentiment And Consumer Profiles Entity Analytics and Integration Comprehensive Social Media Customer Profiles Social Media Optional: Indexed Search Index using Push API Data Explorer Ad hoc access Social Data Analytics Accelerator Architecture
  • 46. InfoSphere BigInsights Added Value: BigSheets InfoSphere BigInsights Administration & Security Workload Optimization (MapReduce/SQL) Connectors Development Tools IBM tested & supported open source components Accelerators Open source based components Workload Management Security Development Environment Analytics/Extractors Analytics Extraction engine (System T) Visualization & Exploration Extractors and APIs SQL API BigSheets Visualization and Exploration • Web-based analysis and visualization for Users • Familiar spreadsheet-like interface • Define and manage long running data collection jobs
  • 47. InfoSphere BigInsights Added Value: BigSheets No programming knowledge needed! How it works  Model “big data” collected from various sources as collections  Filter and enrich content with built-in functions  Combine data in different collections  Visualize results through spreadsheets, charts  Export data into common formats (if desired)
  • 48. InfoSphere BigInsights Added Value: Dev Tools InfoSphere BigInsights Administration & Security Workload Optimization (MapReduce/SQL) Connectors Development Tools IBM tested & supported open source components Accelerators Open source based components Workload Management Security Development Environment Analytics/Extractors Analytics Extraction engine (System T) Visualization & Exploration Extractors and APIs SQL API Development Environment • Eclipse based dev environment • Developer tools and a set of analytic extractors for fast adoption and reduction in coding and debugging time • Plugin for Text Analytics, MapReduce programming, Jaql development, Hive query development, …. and more
  • 49. InfoSphere BigInsights Added Value: Dev Tools How it works • Built-in Apps make it easy to run Big Data applications & tasks:  Import and Export Data from a Database or files  Import and Export Web and Social Data  Perform Text Analytics on specified content  Query HBase Content  Query content stored in BigInsights using Big SQL.  Execute Pig or JAQL applications • EXTENSIBLE! Build your own applications and make them easy to execute from an appealing Application launcher © 2013 IBM Corporation
  • 50. InfoSphere BigInsights Added Value: Dev Tools
  • 51. InfoSphere BigInsights Added Value: Text Analytics Advanced Text Analytics Engine Automatically identify and understand key information in text Football World Cup 2010, one team distinguished themselves well, losing to the eventual champions 1-0 in the Final. Early in the second half, Netherlands’ striker, Arjen Robben, had a breakaway, but the keeper for Spain, Iker Casillas made the save. Winger Andres Iniesta scored for Spain for the win. InfoSphere BigInsights Administration & Security Workload Optimization Connectors Advanced Engines Visualization & Exploration Development Tools Open source Hadoop components
  • 53. Text Analytics Architecture Diagram • AQL: Rule language with familiar SQL-like syntax; specify annotator semantics declaratively • Text Analytics Optimizer: Choose an efficient execution plan that implements the semantics • Compiled Operator Graph (.aog) • Text Analytics Runtime: Highly scalable, embeddable Java runtime • Input Document Stream • Annotated Document Stream
  • 54. InfoSphere BigInsights – Added Value: Connectors Connectors • Databases • DB2, Netezza, Oracle, Teradata Integrations • InfoSphere Data Stage (data collection and integration) • InfoSphere Streams (real-time streams processing) • InfoSphere Guardium (security and monitoring) • Cognos Business Intelligence (Business Intelligence capabilities) • IBM Platform Computing (cluster/grid infrastructure and management) and more… InfoSphere BigInsights Administration & Security Workload Optimization Connectors Advanced Engines Visualization & Exploration Development Tools Open source Hadoop components
  • 55. BigInsights – Added Value: Workload optimization Task Map Adaptive Map Reduce Hadoop System Scheduler • Identifies small and large jobs from prior experience • Sequences work to reduce overhead Adaptive MapReduce • Drop-in replacement for Hadoop batch scheduler • Dramatic performance gains for latency-sensitive application workloads • Agile scheduling, dynamically adjust priorities at run-time InfoSphere BigInsights Administration & Security Workload Optimization (MapReduce/SQL) Connectors Development Tools IBM tested & supported open source components Accelerators Open source based components Workload Management Security Development Environment Analytics/Extractors Analytics Analytics Extraction Engine Visualization & Exploration Extractors and APIs SQL API
  • 56. BigInsights – Added Value: Web Console Web Console • Start / stop services • Run / monitor jobs (applications) • Explore / modify file system • Built in Apps simplify common tasks InfoSphere BigInsights Administration & Security Workload Optimization Connectors Advanced Engines Visualization & Exploration Development Tools Open source Hadoop components
  • 57. BigInsights – Added Value: Security Security • LDAP authentication • Support for PAM & Flat File configuration • Administrators restrict access to authorized users • HTTPS support for the InfoSphere BigInsights console, and reverse proxy. • Role based access InfoSphere BigInsights Administration & Security Workload Optimization Connectors Advanced Engines Visualization & Exploration Development Tools Open source Hadoop components
  • 58. Achieve scale: By partitioning applications into software components By distributing across stream-connected hardware hosts Infrastructure provides services for Scheduling analytics across hardware hosts, Establishing streaming connectivity Transform Filter / Sample Classify Correlate Annotate Where appropriate: Elements can be fused together for lower communication latency  Continuous ingestion  Continuous analysis How Streams Works
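The operator chain on slide 58 (filter, transform, classify, annotate over a continuous feed) can be illustrated with plain Python generators. This is a conceptual sketch only: it does not use the InfoSphere Streams API or its SPL language, and the sensor records, the 160 bpm threshold, and all function names are assumptions made up for the example.

```python
def ingest(readings):
    """Continuous ingestion: yield one reading at a time (here from a list)."""
    yield from readings

def filter_valid(stream):
    """Filter operator: drop readings that carry no measurement."""
    for reading in stream:
        if reading["beats_10s"] is not None:
            yield reading

def transform(stream):
    """Transform operator: convert beats per 10 seconds to beats per minute."""
    for reading in stream:
        yield {**reading, "bpm": reading["beats_10s"] * 6}

def classify(stream, alert_above=160):
    """Classify operator: label each reading (threshold is hypothetical)."""
    for reading in stream:
        label = "alert" if reading["bpm"] > alert_above else "normal"
        yield {**reading, "label": label}

def annotate(stream, site):
    """Annotate operator: attach context that downstream consumers need."""
    for reading in stream:
        yield {**reading, "site": site}

# Made-up sample data standing in for a live sensor feed.
sample = [
    {"sensor": "hr-1", "beats_10s": 22},
    {"sensor": "hr-1", "beats_10s": None},
    {"sensor": "hr-1", "beats_10s": 30},
]

# Chaining generators in one process is loosely analogous to "fusing"
# operators: each tuple flows through all stages without a network hop.
pipeline = annotate(classify(transform(filter_valid(ingest(sample)))), site="ward-3")
results = list(pipeline)
for result in results:
    print(result)
```

Each stage handles one tuple at a time and never needs the whole dataset, which is the property that lets a streaming platform place stages on different hosts or fuse them together.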
  • 59. Agenda  The state of Big Data adoption  Big Data – A holistic approach  The 5 high value Big Data use cases  Technical details of key Big Data components  The future of Big Data and Cloud  Demos  Resources
  • 60. The Future of Big Data and Cloud  SQL for Hadoop support improvements – towards full ANSI support  Hive  Impala (Cloudera)  Big SQL (IBM)  Stinger (Hortonworks)  Drill (MapR)  HAWQ (Pivotal)  SQL-H (Teradata)  Improvements in Multimedia Analytics  Growth in usage and adoption of R programming language  Cloud  Bare metal support helping with Hadoop workloads  Private network  Full support with APIs
  • 61. Big SQL overview Big SQL fully integrates with SQL applications and BI tooling with benefits including: • Existing queries run with no or few modifications • Existing JDBC and ODBC compliant tools can be leveraged • Applications do not have to compensate for constraints of Hive QL which may result in: • more statements • potentially moving more data over the network to the application Data Sources Hive Tables HBase Tables CSV Files BigSQL Engine BigInsights Application SQL Language JDBC / ODBC Driver JDBC / ODBC Server Try it out! Big SQL 3.0 Technology Preview: bigsql.imdemocloud.com
  • 62. Agenda  The state of Big Data adoption  Big Data – A holistic approach  The 5 high value Big Data use cases  Technical details of key Big Data components  The future of Big Data and Cloud  Demos  Resources
  • 63. BigInsights on the Cloud - Making Learning Hadoop Easy and Fun M2M Demos (using Streams) • The Connected Car Demo – http://ausgsa.ibm.com/projects/c/connected_car/index.html – http://m2m.demos.ibm.com/  YouTube IBM Big Data Channel – http://www.youtube.com/user/ibmbigdata Big Data University (bigdatauniversity.com)
  • 64. Agenda  The state of Big Data adoption  Big Data – A holistic approach  The 5 high value Big Data use cases  Technical details of key Big Data components  The future of Big Data and Cloud  Demos  Resources
  • 65.  Flexible on-line delivery allows learning @your place and @your pace  Free courses, free study materials.  Cloud-based sandbox for exercises – zero setup with Robust Course Management System and Content Distribution infrastructure  169,000 registered students.  Free IBM Hadoop, BigInsights Publications Big Data University (bigdatauniversity.com)
  • 66. BigInsights on the Cloud - Making Learning Hadoop Easy and Fun Quick Start Editions available (Free, non-production, no time bomb): – IBM InfoSphere BigInsights (IBM’s Hadoop Distribution) ibm.co/QuickStart – IBM InfoSphere Streams ibm.co/streamsqs Big Data University (bigdatauniversity.com)
  • 67. My contact information Contact Info: Twitter: @raulchong Facebook: facebook.com/raul.f.chong LinkedIn: linkedin.com/pub/raul-f-chong/8/aa2/b63
  • 68. Thank You! © 2013 BigDataUniversity.com