NoSQL stores are often limited in the types of queries they can support, due to the distributed nature of the data. In this session we will learn patterns for overcoming this limitation and combining multiple query semantics in NoSQL-based engines.
We will specifically demonstrate a combination of key/value, SQL-like, document-model, and graph-based queries, as well as more advanced topics such as handling partial updates and querying through projection. We will also demonstrate how to create a mashup of those APIs, i.e. write fast through a key/value API and execute complex queries on that same data through SQL.
See more at: http://nosql2014.dataversity.net/sessionPop.cfm?confid=81&proposalid=6335
Complex Analytics with NoSQL Data Store in Real Time
1. Complex Analytics with NoSQL Data Store in Real Time
Nested Queries, Projection, Transactions and more
Nati Shalom
@natishalom
slideshare.net/giganati
2. What We're Here to Discuss
Making Sense of the Exploding Data World
What that World Could Look Like if Disk Is No Longer the Bottleneck
Live Demo
4. Capacity and Performance Drive New Data Management Technologies
[Chart: Data Volume (GB, TB, PB) plotted against Data Velocity (Yr, Mo, Day, Hr, Min, Sec, MS, μS). Workloads positioned along these axes: Data Mining, Machine Learning, Business Intelligence, Data Warehouse, High Throughput OLTP, Exploratory Analytics, Operational Intelligence, OLTP, Streaming.]
9. Key/Value
• Query: by key, returning the whole value
• Semantics:
– Mostly read
– No aggregation
– No projection
– No partial update
• Performance: millions of ops/sec
• Consistency: atomic per operation
• Scaling: mostly scale-out
• Availability: limited (varies quite substantially between implementations)
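The key/value semantics above can be made concrete with a minimal sketch. This is a hypothetical in-memory store (not any specific product's API): each operation is atomic, but there is no server-side aggregation, projection, or partial update, so clients must read, modify, and rewrite whole values.

```python
import threading

class KVStore:
    """Hypothetical in-memory key/value store illustrating the
    semantics on the slide: atomic whole-value reads and writes,
    with no aggregation, projection, or partial update."""

    def __init__(self):
        self._data = {}
        self._lock = threading.Lock()  # gives each operation atomicity

    def put(self, key, value):
        with self._lock:
            self._data[key] = value  # always replaces the whole value

    def get(self, key):
        with self._lock:
            return self._data.get(key)

store = KVStore()
store.put("user:1", {"name": "Ada", "visits": 3})

# To bump "visits" the client must read, modify, and rewrite the
# entire value -- the store itself offers no partial update:
user = store.get("user:1")
user["visits"] += 1
store.put("user:1", user)

print(store.get("user:1"))  # {'name': 'Ada', 'visits': 4}
```

The read-modify-write round trip is exactly the limitation that projection and partial-update support (discussed later in the session) are meant to remove.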
10. Stream Processing (Storm)
• Semantics:
– Event-driven data processing
– Used for continuous updates; no need for a costly "SELECT FOR UPDATE"
• Performance: tens of millions of updates/sec
[Diagram: spouts feeding tuples into bolts.]
11. Common Assumption
Disk is the bottleneck
[Chart (Source: GigaOM Research): from 2000 to 2020, CPU/system performance improves by 100X to 10,000X, while HDD latency (seek & rotate) shows little improvement.]
12. Capacity and Performance Drive New Data Management Technologies
(Source: IDC, 2013)
[Chart positioning technology families by capacity and performance: RDBMS, Big Data (Hadoop), NoSQL, and In-Memory / Stream Processing.]
14. A Typical App Looks Like This..
[Diagram: a front end feeds both a real-time path (Storm) and a batch path into analytics, illustrating the data-flow complexity.]
15. What if Disk Was No Longer the Bottleneck?
Flash closes the CPU-to-storage gap.
16. Our Application Could Look Like This..
[Diagram: a front end backed by a single high-speed data store (using Flash/NVM), serving multiple semantics/APIs over the same common data: Key/Value, SQL, Document, Graph, Map/Reduce, Transactional, StreamBase. Disk becomes the new tape.]
18. We Can Use a High Speed Data Bus for Integrating All of Our Data Sources
[Diagram: a high-speed data bus (with built-in caching) connects the front end with real-time (Storm) and batch analytics, providing real-time transactional data access, direct access, real-time streaming, and synchronization with Hadoop, MySQL, and MongoDB.]
20. Designed for Transactional and Analytics Scenarios..
• Homeland Security
• Real-Time Search
• Social
• eCommerce
• User Tracking & Engagement
• Financial Services
21. Many APIs – Same Data
Key/Value | SQL | Document | Graph | Map/Reduce | Transactional
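The "many APIs, same data" mashup from the abstract can be sketched as follows. This is a hypothetical illustration (the names `MultiApiStore`, `put`, and `query` are not from any real product): a fast key/value write path and a SQL query path over the same records, here backed by SQLite with its JSON1 extension purely to keep the sketch runnable.

```python
import json
import sqlite3

class MultiApiStore:
    """Hypothetical store exposing two APIs over the same data:
    a key/value write path and a SQL query path."""

    def __init__(self):
        self._db = sqlite3.connect(":memory:")
        self._db.execute("CREATE TABLE kv (key TEXT PRIMARY KEY, doc TEXT)")

    # --- key/value API: fast, whole-value writes ---
    def put(self, key, doc):
        self._db.execute(
            "INSERT OR REPLACE INTO kv VALUES (?, ?)", (key, json.dumps(doc)))

    # --- SQL API: complex queries over the same data ---
    def query(self, sql, args=()):
        return self._db.execute(sql, args).fetchall()

store = MultiApiStore()
store.put("order:1", {"customer": "acme", "total": 120})
store.put("order:2", {"customer": "acme", "total": 80})
store.put("order:3", {"customer": "globex", "total": 50})

# Aggregate through SQL what was written through the key/value API
# (json_extract needs SQLite built with the JSON1 extension):
rows = store.query(
    "SELECT json_extract(doc, '$.customer') AS c, "
    "SUM(json_extract(doc, '$.total')) "
    "FROM kv GROUP BY c ORDER BY c")
print(rows)  # [('acme', 200), ('globex', 50)]
```

The design point is that the aggregation runs where the data lives; the client never has to fetch every key and sum values itself, as it would with a pure key/value API.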
33. Nati Shalom
Check out the slides at http://www.slideshare.net/giganati
Editor's Notes
Some of the emerging NewSQL and NoSQL disk-based databases might have had the ability to deal with the more demanding data volume and variety, but…
But disk-based databases have always been I/O bound – in other words, keeping up with the new velocity demands of data is much harder. Disks have always gotten in the way of database velocity or throughput. The closer to real-time that transaction throughput or analytics must be, the harder it is for disk-based approaches to keep up.
Storm constructs a processing graph that feeds data from an input source through processing nodes.
The processing graph is called a "topology".
The input data sources are called "spouts", and the processing nodes are called "bolts".
The data model consists of tuples.
Tuples flow from spouts to bolts, which execute user code.
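The spout/bolt/tuple model described above can be sketched in-process. This is not Storm's actual API; a real topology distributes these components across a cluster, while this hypothetical sketch only mirrors the shapes: a spout emits tuples, which flow through bolts that execute user code.

```python
from collections import Counter

def sentence_spout():
    """Input source ("spout"): emits a stream of tuples."""
    for sentence in ["the quick fox", "the lazy dog"]:
        yield (sentence,)

def split_bolt(stream):
    """Processing node ("bolt"): splits sentence tuples into word tuples."""
    for (sentence,) in stream:
        for word in sentence.split():
            yield (word,)

def count_bolt(stream):
    """Terminal bolt: continuously updates word counts as tuples
    arrive -- state lives with the bolt, so there is no costly
    "SELECT FOR UPDATE" round trip to a database."""
    counts = Counter()
    for (word,) in stream:
        counts[word] += 1
    return counts

# Topology: spout -> split bolt -> count bolt
counts = count_bolt(split_bolt(sentence_spout()))
print(counts["the"])  # 2
```

The chained generators stand in for the tuple streams that Storm routes between components; the word-count topology itself is Storm's canonical introductory example.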