Big Data = Big Decisions


Bob Zurek | SVP Products | Epsilon | www.epsilon.com

Consider the following:
• New model for data
• Accessible over TCP/IP and variety of languages
• Initially difficult to understand
• Capable of processing thousands of ops/sec
• Very different from old model
• Threatening as much was invested in old model
• Changing course seems ridiculous

Source: Eben Hewitt

IBM IMS

“IMS is IBM's premier transaction and hierarchical database
management system, virtually unsurpassed in database and
transaction processing availability and speed” – IBM 2013

“Mission-critical processing that requires unparalleled
performance is best served by a hierarchical model. Analytics
and business intelligence are best served by a relational
model. Most Fortune 100 companies use both.”

Source: IBM

Data evolution

A New Model Is Invented

A Disruptive Model

A Threatening Model

A Competitive Model

Source: Eben Hewitt

The relational model & SQL

A HUGE industry success

innovation
complexity

confusion
a new model
disruption
fierce competition

Sound familiar?

Big data – a growing torrent
$600 to buy a disk drive that can
store all of the world’s music

5 billion mobile phones
in use in 2010

30 billion pieces of content shared
on Facebook every month

40% projected growth in global data
generated per year vs. 5% growth in
global IT spending

235 terabytes of data collected by the
U.S. Library of Congress by April 2011

15 out of 17
sectors in the United States have more data
stored per company than the U.S. Library of Congress

Source: McKinsey

Industry buzz

What is
big data,
exactly?

Big data confusion?

What do business executives
think “big data” is?

A greater scope of information 18%
New kinds of data and analysis 16%
Real-time information 15%
Data influx from new technologies 13%
Non-traditional forms of media 13%
Large volumes of data 10%
The latest buzzword 8%
Social media data 7%

Source: IBM

Big data is…

Large pools of data
that can be captured,
communicated,
aggregated, stored,
and analyzed

Source: McKinsey

Another way of looking at it

Source: TDWI

Is it time to look
for an alternative?

It’s not that simple,
is it?

How are we solving (historically)?
• Vertical scaling = throw hardware at it
• Optimize the application = SQL, indexes, access
• Employ caching layers = Memcached, Coherence
• Denormalization = reduce joins
• Sharding/Shared Nothing = split the data up
• Innovation = columnar
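
To make the "split the data up" bullet concrete, here is a minimal sketch of hash-based sharding. It is an illustration only, not taken from the slides: the names `shard_for` and `ShardedStore`, the four-shard count, and the customer IDs are all hypothetical, and each "shard" is just an in-memory dict standing in for a separate database server.

```python
import hashlib


def shard_for(key: str, num_shards: int) -> int:
    """Map a record key to a shard index using a stable hash."""
    digest = hashlib.md5(key.encode("utf-8")).hexdigest()
    return int(digest, 16) % num_shards


class ShardedStore:
    """Hypothetical shared-nothing store: each shard holds its own keys."""

    def __init__(self, num_shards: int) -> None:
        # One dict per shard; in practice each would be a separate server.
        self.shards = [dict() for _ in range(num_shards)]

    def put(self, key: str, value: dict) -> None:
        self.shards[shard_for(key, len(self.shards))][key] = value

    def get(self, key: str):
        # Only the owning shard is consulted; no cross-shard lookup.
        return self.shards[shard_for(key, len(self.shards))].get(key)


store = ShardedStore(num_shards=4)
for customer_id in ("c-1001", "c-1002", "c-1003"):
    store.put(customer_id, {"id": customer_id})
```

A lookup such as `store.get("c-1002")` touches one shard only. The trade-off is that changing `num_shards` remaps most keys, which is one reason manual sharding becomes painful as data grows.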

What’s driving
change and
innovation?

Big data innovation incubated
A search engine project at Yahoo
Doug Cutting = Nutch
Google = GFS and GMR

eBay erected a Hadoop cluster
spanning 530 servers –
now five times the size!

“Hadoop is an amazing
technology stack. We now
depend on it to run eBay.”
Bob Page,
Vice President of Analytics, eBay

Source: http://www.wired.com/wiredenterprise/2011/10/how-yahoo-spawned-hadoop/

It can get complex
and confusing

“It replaced our need
for ETL”

“It is great for batch
processing in parallel”

“A beautiful platform
for all of our problems”

What it’s not good for

• High volume transactional data
• Structured data with low latency

“Note that Hadoop is not an Extract-Transform-Load
(ETL) tool. It is a platform that supports
running ETL processes in parallel. The data
integration vendors do not compete with
Hadoop; rather, Hadoop is another channel
for use of their data transformation modules.”
Teradata/Cloudera Presentation

What it’s really good for

• Index building
• Pattern recognition
• Sentiment analysis
• Machine generated data
• Log processing
• Web scale = Google, Twitter,
YouTube
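
Log processing is a good fit because it decomposes into independent map and reduce steps. The sketch below imitates that flow in plain Python on a single machine. It does not use the Hadoop API; the log format, sample lines, and function names are assumptions made for illustration.

```python
from collections import defaultdict

# Hypothetical access-log lines: client, method, path, status code.
LOG_LINES = [
    "10.0.0.1 GET /index.html 200",
    "10.0.0.2 GET /missing 404",
    "10.0.0.1 POST /login 200",
    "10.0.0.3 GET /index.html 500",
]


def map_phase(line):
    """Emit a (status_code, 1) pair for one log line."""
    status = line.split()[-1]
    yield status, 1


def shuffle(pairs):
    """Group emitted values by key, as the framework would between phases."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Sum the counts for one status code."""
    return key, sum(values)


def run_job(lines):
    pairs = (pair for line in lines for pair in map_phase(line))
    return dict(reduce_phase(k, vs) for k, vs in shuffle(pairs).items())


status_counts = run_job(LOG_LINES)
```

Because each call to `map_phase` depends on a single line, the map work can be spread across many machines; only the grouped reduce step needs to see all values for a key.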

Use Cases

Fraud Detection
Spot fraud anomalies

Mobile Data
Process mobile data

Online Travel Reservations
Travel booking

IT Security
Analyze machine generated data

Image Processing
Detecting patterns in sat imagery

E-Commerce
Large marketplaces

HealthCare
Semantic analysis for relevance

Energy Discovery
Sort and process seismic data

Energy Savings
Suggest ways customers save money

Infrastructure Management
Collecting device logs

Many shades of grey and
lots of great innovations

Relational is still in play
Some innovations worth a look
Dynamically Scaling OLTP = “No Need To Shard”

The NoSQL generation

• Document Storage Model
• Allows MTV to store hierarchical data
• Flexible schema to model structure/data by brand
• Needed to have ability to query nested content
• No need for a shared disk storage

• Released by NSA to open source
• Apache Accumulo
• Based on Google Big Table
• Built on top of Hadoop
• Fine-grained access control
• Cell level security
• Server side programming

Why NoSQL?

• Schemaless model = Easy to add fields
• Document oriented = JSON format (think objects)
• Built from the ground up to be distributed
• Auto sharding
• Distributed querying capabilities
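
The first two bullets can be illustrated with plain JSON documents. This is a rough sketch that uses only Python's standard library rather than any particular NoSQL product; the document contents and the helper names `get_path` and `find` are hypothetical.

```python
import json

# Two documents in the same collection with different shapes (no fixed schema).
raw = [
    '{"brand": "brand-a", "show": {"title": "Show One", "seasons": 3}}',
    '{"brand": "brand-b", "show": {"title": "Show Two"}, "tags": ["live"]}',
]
documents = [json.loads(r) for r in raw]


def get_path(doc, path):
    """Follow a dotted path such as 'show.title'; return None if absent."""
    current = doc
    for part in path.split("."):
        if not isinstance(current, dict) or part not in current:
            return None
        current = current[part]
    return current


def find(docs, path, value):
    """Return the documents whose nested field equals the given value."""
    return [d for d in docs if get_path(d, path) == value]


# Adding a field to one document needs no migration of the others.
documents[0]["rating"] = "PG"

matches = find(documents, "show.title", "Show Two")
```

Here `matches` holds the single `brand-b` document, and the second document simply has no `rating` field. A relational table would need a schema change to achieve the same result.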

NoSQL Use Case

1. Click/Event into Hadoop

2. Data Analyzed via Map Reduce jobs;
generates 100M profiles based on
campaigns running

3. Selected profiles loaded into Couch

4. Ad targeting logic queries Couch with
sub-second latency to optimize
decision and real-time ad placement

Source: Couchbase
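
The four steps above can be sketched end to end in a few lines. This is a toy model under stated assumptions, not the Couchbase or Hadoop API: a Python list stands in for the event data in Hadoop, a function stands in for the Map Reduce job, and a dict stands in for the key-value store. The event fields, the `house-ad` default, and all names are hypothetical.

```python
from collections import Counter, defaultdict

# Step 1: click/event data as it might land in the batch layer (hypothetical).
click_events = [
    {"user": "u1", "campaign": "spring-sale"},
    {"user": "u1", "campaign": "spring-sale"},
    {"user": "u1", "campaign": "travel"},
    {"user": "u2", "campaign": "travel"},
]


def build_profiles(events):
    """Step 2: batch analysis that reduces events to one profile per user."""
    counts = defaultdict(Counter)
    for event in events:
        counts[event["user"]][event["campaign"]] += 1
    return {
        user: {
            "top_campaign": campaigns.most_common(1)[0][0],
            "clicks": sum(campaigns.values()),
        }
        for user, campaigns in counts.items()
    }


# Step 3: load the selected profiles into the key-value store (a dict here).
profile_store = {}
profile_store.update(build_profiles(click_events))


def choose_ad(user_id, default="house-ad"):
    """Step 4: a single key lookup at serving time, with a fallback."""
    profile = profile_store.get(user_id)
    return profile["top_campaign"] if profile else default
```

The split matters: the expensive aggregation runs offline in batch, while the serving path is a single key lookup, which is what makes sub-second decisions feasible.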

Hadoop Augmentation
• Side-by-Side will be commonplace
• ETL solutions support Hadoop
• Relational Databases
• Provide ETL interfaces to Hadoop
• Execute map/reduce jobs inside DBMS
• NoSQL supports ETL

Example Hybrid DBMS Systems
Oracle Endeca Server
• Hybrid Search/Analytic Database
• Supports structured, unstructured, semi-structured
• No schema required. Records stacked.
• Columnar

Trends
• SQL On Hadoop – Hadapt, Cloudera Impala, EMC
• Unified Support of Structured, Unstructured, Semi-Structured
• Embedding Search
• Expanded ETL/ELT Support
• Big Data In Motion Takes Hold
• Added Data Mining and Analytic Functions In NoSQL
• Embedding R Language = gain in popularity
• Data Scientists instrumental in business success

Bob Zurek | bzurek@epsilon.com
