NoSQL stores are often limited in the types of queries they can support, due to the distributed nature of the data. In this session we will learn patterns for overcoming this limitation and combining multiple query semantics in NoSQL-based engines.
We will specifically demonstrate a combination of key/value, SQL-like, document-model, and graph-based queries, as well as more advanced topics such as handling partial updates and querying through projection. We will also demonstrate how to create a mashup of those APIs, i.e. write fast through a key/value API and execute complex queries on that same data through SQL.
See more at: http://nosql2014.dataversity.net/sessionPop.cfm?confid=81&proposalid=6335
Complex Analytics with NoSQL Data Store in Real Time
1. Complex Analytics with NoSQL Data Store in Real Time
Nested Queries, Projection, Transactions and more
Nati Shalom
@natishalom
slideshare.net/giganati
2. What We're Here to Discuss
Making Sense of the Exploding Data World
What that World Could Look Like if Disk Is No Longer the Bottleneck
Live Demo
4. Capacity and Performance Drive New Data Management Technologies
[Chart: Data Volume (GB, TB, PB) plotted against Data Velocity (Yr, Mo, Day, Hr, Min, Sec, MS, μS). Workloads positioned along these axes: Data Mining, Machine Learning, Business Intelligence, Data Warehouse, High Throughput OLTP, Exploratory Analytics, Operational Intelligence, OLTP, Streaming.]
9. Key/Value
• Query: by key, returning the whole value
• Semantics:
– Mostly read
– No aggregation
– No projection
– No partial update
• Performance: millions of ops/sec
• Consistency: atomic per operation
• Scaling: mostly scale-out
• Availability: limited (varies quite substantially between implementations)
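The key/value semantics above can be made concrete with a minimal sketch. This is a hypothetical in-memory store (not any specific product's API): each operation is atomic, but there is no server-side aggregation, projection, or partial update, so clients must read, modify, and rewrite whole values.

```python
import threading

class KVStore:
    """Hypothetical in-memory key/value store illustrating the
    semantics on the slide: atomic whole-value reads and writes,
    with no aggregation, projection, or partial update."""

    def __init__(self):
        self._data = {}
        self._lock = threading.Lock()  # gives each operation atomicity

    def put(self, key, value):
        with self._lock:
            self._data[key] = value  # always replaces the whole value

    def get(self, key):
        with self._lock:
            return self._data.get(key)

store = KVStore()
store.put("user:1", {"name": "Ada", "visits": 3})

# To bump "visits" the client must read, modify, and rewrite the
# entire value -- the store itself offers no partial update:
user = store.get("user:1")
user["visits"] += 1
store.put("user:1", user)

print(store.get("user:1"))  # {'name': 'Ada', 'visits': 4}
```

The read-modify-write round trip is exactly the limitation that projection and partial-update support (discussed later in the session) are meant to remove.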
10. Stream Processing (Storm)
• Semantics:
– Event-driven data processing
– Used for continuous updates; no need for a costly "SELECT FOR UPDATE"
• Performance: tens of millions of updates/sec
[Diagram: spouts feeding tuples into bolts.]
11. Common Assumption
Disk is the bottleneck
[Chart (Source: GigaOM Research): from 2000 to 2020, CPU/system performance improves by 100X to 10,000X, while HDD latency (seek & rotate) shows little improvement.]
12. Capacity and Performance Drive New Data Management Technologies
(Source: IDC, 2013)
[Chart positioning technology families by capacity and performance: RDBMS, Big Data (Hadoop), NoSQL, and In-Memory / Stream Processing.]
14. A Typical App Looks Like This..
[Diagram: a front end feeds both a real-time path (Storm) and a batch path into analytics, illustrating the data-flow complexity.]
15. What if Disk Was No Longer the Bottleneck?
Flash closes the CPU-to-storage gap.
16. Our Application Could Look Like This..
[Diagram: a front end backed by a single high-speed data store (using Flash/NVM), serving multiple semantics/APIs over the same common data: Key/Value, SQL, Document, Graph, Map/Reduce, Transactional, StreamBase. Disk becomes the new tape.]
18. We Can Use a High Speed Data Bus for Integrating All of Our Data Sources
[Diagram: a high-speed data bus (with built-in caching) connects the front end with real-time (Storm) and batch analytics, providing real-time transactional data access, direct access, real-time streaming, and synchronization with Hadoop, MySQL, and MongoDB.]
20. Designed for Transactional and Analytics Scenarios..
• Homeland Security
• Real-Time Search
• Social
• eCommerce
• User Tracking & Engagement
• Financial Services
21. Many APIs – Same Data
Key/Value | SQL | Document | Graph | Map/Reduce | Transactional
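The "many APIs, same data" mashup from the abstract can be sketched as follows. This is a hypothetical illustration (the names `MultiApiStore`, `put`, and `query` are not from any real product): a fast key/value write path and a SQL query path over the same records, here backed by SQLite with its JSON1 extension purely to keep the sketch runnable.

```python
import json
import sqlite3

class MultiApiStore:
    """Hypothetical store exposing two APIs over the same data:
    a key/value write path and a SQL query path."""

    def __init__(self):
        self._db = sqlite3.connect(":memory:")
        self._db.execute("CREATE TABLE kv (key TEXT PRIMARY KEY, doc TEXT)")

    # --- key/value API: fast, whole-value writes ---
    def put(self, key, doc):
        self._db.execute(
            "INSERT OR REPLACE INTO kv VALUES (?, ?)", (key, json.dumps(doc)))

    # --- SQL API: complex queries over the same data ---
    def query(self, sql, args=()):
        return self._db.execute(sql, args).fetchall()

store = MultiApiStore()
store.put("order:1", {"customer": "acme", "total": 120})
store.put("order:2", {"customer": "acme", "total": 80})
store.put("order:3", {"customer": "globex", "total": 50})

# Aggregate through SQL what was written through the key/value API
# (json_extract needs SQLite built with the JSON1 extension):
rows = store.query(
    "SELECT json_extract(doc, '$.customer') AS c, "
    "SUM(json_extract(doc, '$.total')) "
    "FROM kv GROUP BY c ORDER BY c")
print(rows)  # [('acme', 200), ('globex', 50)]
```

The design point is that the aggregation runs where the data lives; the client never has to fetch every key and sum values itself, as it would with a pure key/value API.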
33. Nati Shalom
Check out the slides at http://www.slideshare.net/giganati
Editor's Notes
Some of the emerging NewSQL and NoSQL disk-based databases might have had the ability to deal with the more demanding data volume and variety, but…
But disk-based databases have always been I/O bound – in other words, keeping up with the new velocity demands of data is much harder. Disks have always gotten in the way of database velocity or throughput. The closer to real-time that transaction throughput or analytics must be, the harder it is for disk-based approaches to keep up.
Storm constructs a processing graph that feeds data from an input source through processing nodes.
The processing graph is called a "topology".
The input data sources are called "spouts", and the processing nodes are called "bolts".
The data model consists of tuples.
Tuples flow from spouts to bolts, which execute user code.
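The spout/bolt/tuple model described above can be sketched in-process. This is not Storm's actual API; a real topology distributes these components across a cluster, while this hypothetical sketch only mirrors the shapes: a spout emits tuples, which flow through bolts that execute user code.

```python
from collections import Counter

def sentence_spout():
    """Input source ("spout"): emits a stream of tuples."""
    for sentence in ["the quick fox", "the lazy dog"]:
        yield (sentence,)

def split_bolt(stream):
    """Processing node ("bolt"): splits sentence tuples into word tuples."""
    for (sentence,) in stream:
        for word in sentence.split():
            yield (word,)

def count_bolt(stream):
    """Terminal bolt: continuously updates word counts as tuples
    arrive -- state lives with the bolt, so there is no costly
    "SELECT FOR UPDATE" round trip to a database."""
    counts = Counter()
    for (word,) in stream:
        counts[word] += 1
    return counts

# Topology: spout -> split bolt -> count bolt
counts = count_bolt(split_bolt(sentence_spout()))
print(counts["the"])  # 2
```

The chained generators stand in for the tuple streams that Storm routes between components; the word-count topology itself is Storm's canonical introductory example.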