NOSQL Overview
Tobias Lindaaker
Software Developer @ Neo Technology
twitter: @thobe / @neo4j / #neo4j
email: tobias@neotechnology.com
web: http://neo4j.org/
web: http://thobe.org/
CON6449

Agenda
๏Key/Value Stores
๏Document Databases
๏NewSQL Databases
๏Graph Databases
๏Column Oriented Databases
๏Caches
๏Message Queues
๏Hadoop

Two main categories
๏Aggregate oriented
๏Graph
Distinction defined by Martin Fowler
Source: NoSQL Distilled

Sparse data - Relational mismatch
[Table: one row per id (1337, 2468, 3145, 3579, 4468, 7878), one column per attribute (α β γ δ ε ζ η θ ι κ λ μ π τ)]
entity key value
1337 a lorem ipsum
1337 b lorem ipsum
3145 b lorem ipsum
3578 a lorem ipsum
3579 f lorem ipsum
3579 j lorem ipsum
4468 c lorem ipsum
4468 f lorem ipsum
7878 g lorem ipsum
7878 f lorem ipsum

Sparse data - Relational mismatch
Data Table
id data
1337 {"foo":"bar", ...}
2468 {"foo":"bar", ...}
3145 {"foo":"bar", ...}
3579 {"foo":"bar", ...}
4468 {"foo":"bar", ...}
7878 {"foo":"bar", ...}
Search Tables
id foo
1337 bar
2468 baz
3145 quux
3579 quux
4468 waldo
7878 fred
id bar
1337 foo
2468 baz
3145 quux
3579 quux
4468 waldo
7878 fred
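
A minimal sketch of the data table plus search table idea, assuming SQLite as the relational store. The table names (`data`, `search_foo`) and the helpers `save` and `find_by_foo` are made up for illustration; only the layout (one opaque JSON blob per id, one narrow table per searchable field) comes from the slide.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE data (id INTEGER PRIMARY KEY, doc TEXT NOT NULL);
    CREATE TABLE search_foo (id INTEGER PRIMARY KEY, foo TEXT);
    CREATE INDEX search_foo_idx ON search_foo (foo);
""")

def save(entity_id, fields):
    """Store the whole entity as one JSON blob and index only the 'foo' field."""
    conn.execute("INSERT INTO data VALUES (?, ?)", (entity_id, json.dumps(fields)))
    if "foo" in fields:
        conn.execute("INSERT INTO search_foo VALUES (?, ?)", (entity_id, fields["foo"]))

def find_by_foo(value):
    """Find ids in the search table, then load the blobs from the data table."""
    rows = conn.execute(
        "SELECT d.doc FROM search_foo s JOIN data d ON d.id = s.id "
        "WHERE s.foo = ? ORDER BY d.id", (value,))
    return [json.loads(doc) for (doc,) in rows]

# Sparse entities: each one carries only the fields it actually has.
save(1337, {"foo": "bar", "extra": 1})
save(3145, {"foo": "quux"})
save(3579, {"foo": "quux", "note": "sparse field"})
```

Every new searchable field needs another search table that must be kept in sync with the blob, which is the mismatch the slide points at.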

Trend: Exponential data growth
[Chart: growth by year, 2005 to 2012]

Trend: Data becomes more connected
[Chart: connectedness over time]

Nothing is new - everything changes
Then
๏Navigational databases: IDS (CODASYL), IMS (IBM)
๏Multivalued databases: PICK/BASIC
๏Key/Value databases: MUMPS/M
๏COPYBOOK: COBOL
๏Object databases: Objectivity, db4o
๏XML databases
Now
๏Graph databases: Neo4j
๏Column databases: Cassandra
๏Key/Value databases: Couchbase
๏Document databases: MongoDB, Redis
Still recent enough to not have “new” counterparts...

Key/Value stores
๏Amazon SimpleDB
๏memcached
๏Oracle NoSQL Database
๏Redis

Key/Value stores
[Figure: nodes labelled A to G]

Sample use case: Content sharing

Document Databases
๏Lotus Notes
๏MongoDB
๏Riak
๏Redis
๏CouchDB

Document Databases
‣ id: 99CC
‣ fname: John
‣ lname: Smith
‣ clock:
  ‣ type: Fob watch
  ‣ make: Gallifreyan
  ‣ diameter: 2”
‣ id: 1337
‣ fname: Martha
‣ lname: Jones
‣ occupation: MD
‣ id: 2468
‣ fname: Rose
‣ lname: Tyler
‣ in_love_with: 99CC

Document Databases
post
title: ___
text: ___
tags: [...]
comments
text: ___
text: ___
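
A rough sketch of the post aggregate above as a single document, assuming JSON as the storage format. The field values, the `store` dictionary (a stand-in for a document database) and the `add_comment` helper are invented for illustration; only the shape (title, text, tags, embedded comments) is from the slide.

```python
import json

# The whole post, including its comments, is one aggregate.
post = {
    "title": "NOSQL Overview",   # hypothetical value
    "text": "...",
    "tags": ["nosql", "graph"],  # hypothetical values
    "comments": [
        {"text": "First comment"},
        {"text": "Second comment"},
    ],
}

def add_comment(document, text):
    """Return a copy of the post with one more embedded comment."""
    updated = dict(document)
    updated["comments"] = document["comments"] + [{"text": text}]
    return updated

store = {}  # stand-in for a document database, keyed by document id
store["post-1"] = json.dumps(add_comment(post, "Third comment"))

# Reading the post back returns the comments with it: no join needed.
loaded = json.loads(store["post-1"])
```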

The rise of REST for databases
๏It’s actually all about Hypermedia:
•When one aggregate root references another
•Not necessarily on the same host
•Hyperlinks provide the desired decoupling, and can reference documents qualified by host
๏A further driver: HTTP makes client drivers easy to develop

NewSQL defined
๏Relational databases with (primarily) a SQL interface that adopt the scaling benefits of NoSQL databases
๏Automatic/Transparent sharding of data
๏Distributed, Fault Tolerant, Highly Available

NewSQL databases
๏Google Spanner
๏VoltDB
๏TokuDB (MySQL engine)
๏Clustrix
๏RethinkDB

Example Graph Databases
๏Neo4j
๏InfiniteGraph (by Objectivity)
๏AllegroGraph (by Franz Inc.)
๏HyperGraphDB
๏InfoGrid
๏DEX
๏VertexDB
๏FlockDB

[Figure: example graph, built up over several slides. Relationship labels: from, stole, plays, companion, enemy, married, in. Node labels shown: A Good Man Goes to War, Bad Wolf]

Querying Graph Databases (Neo4j)
Graph pattern: A, LOVES, B
ASCII art: A -[:LOVES]-> B
START A=node:person(name="A")
MATCH A -[:LOVES]-> B
RETURN B as lover
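
To make the pattern concrete, here is a small sketch of what matching `A -[:LOVES]-> B` amounts to, written in plain Python rather than Neo4j's API. The triple list and the `match` helper are assumptions for illustration; the LOVES edge mirrors the `in_love_with` field from the document example earlier, and the KNOWS edges are invented.

```python
# Hypothetical in-memory graph: (start, relationship type, end) triples.
relationships = [
    ("Rose", "LOVES", "John"),
    ("Martha", "KNOWS", "John"),
    ("Rose", "KNOWS", "Martha"),
]

def match(start, rel_type, edges):
    """Return every B such that start -[:rel_type]-> B."""
    return [end for (src, rel, end) in edges if src == start and rel == rel_type]

# Bind A to a starting node, follow LOVES relationships, return B.
lovers = match("Rose", "LOVES", relationships)
```

A graph database answers this by following stored relationships from the start node instead of scanning every edge as this sketch does.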

Column Oriented Databases
๏Cassandra
๏BigTable (internal at Google)
๏HBase (part of Hadoop)
๏Hypertable

Column DB - Classic example
Twitter clone

Column Databases
๏Use as underlying storage for a higher level data storage model
๏E.g. a graph database model implemented on top of Cassandra
•Notable example: Aurelius Titan

Caches - Improving Reads
๏Read from cache first, only read from DB on cache miss
๏Preferably cache aggregates, possibly after passing through app-level processing
๏memcached: mainly a cache, though there have been attempts to reposition it as a NOSQL DB
•other cache products have tried the same
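
The read path described above can be sketched as follows. This is an illustration only: the class name `ReadThroughCache`, the `load_from_db` callback and the dictionary standing in for the database are all assumptions, not any particular product's API.

```python
class ReadThroughCache:
    """Serve reads from memory; go to the database only on a cache miss."""

    def __init__(self, load_from_db):
        self._load_from_db = load_from_db  # called only when the key is not cached
        self._entries = {}
        self.misses = 0

    def get(self, key):
        if key in self._entries:
            return self._entries[key]      # cache hit: no database access
        self.misses += 1
        value = self._load_from_db(key)    # cache miss: read from the database
        self._entries[key] = value
        return value

    def invalidate(self, key):
        """Drop a cached entry, e.g. after the underlying data was written."""
        self._entries.pop(key, None)

# Hypothetical "database" holding a whole aggregate per key.
database = {"user:1337": {"fname": "Martha", "lname": "Jones"}}
cache = ReadThroughCache(lambda key: database[key])

first = cache.get("user:1337")   # miss, loads from the database
second = cache.get("user:1337")  # hit, served from the cache
```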

Message Queues - Improving Writes
๏Write to Queue, process work from Queue in batches
•Alleviates transactional overhead by grouping writes
•Still guarantees writes if the Queue has durability guarantees
•Needs tx synchronization with DB (2PC)
๏Writes not immediately visible, delayed through queue
•Write-to-cache can be used to get around this,
if a cache is used
๏Amazon SQS
๏RabbitMQ
๏ZeroMQ
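
A minimal sketch of grouping writes through a queue, assuming SQLite as the database and Python's in-memory `queue.Queue` as the queue. The in-memory queue is NOT durable, so it does not give the write guarantee the slide mentions; a real deployment would use a durable queue. The names `write`, `drain` and `count` are hypothetical.

```python
import queue
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (id INTEGER PRIMARY KEY, payload TEXT)")
pending = queue.Queue()  # in-memory stand-in for a durable message queue

def write(payload):
    """Accept a write by enqueueing it; nothing reaches the database yet."""
    pending.put(payload)

def drain(batch_size=100):
    """Move up to batch_size queued writes into the database in one transaction."""
    batch = []
    while len(batch) < batch_size:
        try:
            batch.append(pending.get_nowait())
        except queue.Empty:
            break
    if batch:
        with conn:  # a single commit for the whole batch
            conn.executemany("INSERT INTO events (payload) VALUES (?)",
                             [(p,) for p in batch])
    return len(batch)

def count():
    return conn.execute("SELECT COUNT(*) FROM events").fetchone()[0]

for i in range(250):
    write(f"event-{i}")
visible_before = count()  # queued writes are not yet visible to readers
drained = [drain(), drain(), drain(), drain()]
```

The gap between `write` and `drain` is the delayed visibility noted above.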

Hadoop - Big Data processing
[Figure labels: Oracle, Neo4j, Cassandra; Map, Reduce]

Hadoop - Data Analysis/Processing
๏Batch process large amounts of data, typically offline or semi-online, not for interactive querying
๏Ingest data from your DB, process and generate report
•Ex. Read Neo4j graph, generate centrality analysis report
๏Ingest data from event stream, process and generate data for DB
•Ex. Read access logs, create Neo4j data for security analysis
๏Ingest data from one DB, process and generate data for another
•Ex. Read MySQL transaction logs, create Neo4j data for query acceleration
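
To illustrate the map and reduce steps on the access log example, here is a single-process sketch in plain Python. It does not use the Hadoop API; the log format (IP address as the first field) and the function names are assumptions made for the example.

```python
from collections import defaultdict

def map_phase(line):
    """Emit (ip, 1) per access-log line; assumes the IP is the first field."""
    ip = line.split()[0]
    yield ip, 1

def shuffle(pairs):
    """Group emitted values by key, as the framework would between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key, values):
    """Collapse all values for one key into a single result."""
    return key, sum(values)

# Hypothetical access log lines.
log_lines = [
    "10.0.0.1 GET /index.html",
    "10.0.0.2 GET /login",
    "10.0.0.1 POST /login",
]
mapped = [pair for line in log_lines for pair in map_phase(line)]
report = dict(reduce_phase(k, v) for k, v in shuffle(mapped).items())
```

In a real job the map calls and the per-key reduce calls run in parallel across machines, and the resulting counts could then be loaded into a database.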

Building Databases is hard
๏The current NOSQL wave took off in 2009
๏... many much older databases still have issues...
๏Most likely there will be issues
๏https://github.com/aphyr/jepsen (by Kyle Kingsbury / @aphyr)
•... most distributed databases fail in the event of partitions
๏Test, Test, Test, and Test
•Test the database heavily before you put it in production
•Test for your use cases - generic benchmarks are useless
•Test with real load
•Test continuously

Serious Database Vendors take Data Seriously
๏Make sure to test their product under “real” load
๏Make sure to test their product in the event of failures
๏But you still need to Test!
๏Report issues to the vendor
๏Data loss is too embarrassing - will be fixed!
๏Performance is important - you’ll be heard!

Polyglot Persistence: combining multiple databases

Polyglot Persistence - Multiple DBs
๏Real world examples:
•RDBMS as system of record, Neo4j for accelerating (join) queries
•Neo4j for storing metadata and structure, Cassandra for storing event logs, S3 for storing BLOB data

It is all about modelling
Simplify the world enough
‣to reason about
‣to store and process

Model mis-match
Real World Model

Complex problem? - right tool for each job!
Image credits: Unknown :’(

Key/Value stores
๏Examples:
•Amazon SimpleDB, memcached, Oracle NoSQL, Redis
๏Use when Data is opaque
๏Scalability is important
๏Scale simply with the addition of more servers
•rebalance equally simply
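
One common way to get this kind of cheap rebalancing is consistent hashing. The slide does not name a technique, so treat this as an assumed illustration; the `HashRing` class, its `replicas` parameter and the server names are made up for the sketch.

```python
import bisect
import hashlib

def _position(name):
    """Map a string to a point on the ring."""
    return int(hashlib.sha256(name.encode("utf-8")).hexdigest(), 16)

class HashRing:
    def __init__(self, servers, replicas=100):
        self._replicas = replicas  # virtual points per server, to even out load
        self._points = []          # sorted list of (position, server)
        for server in servers:
            self.add(server)

    def add(self, server):
        for i in range(self._replicas):
            bisect.insort(self._points, (_position(f"{server}#{i}"), server))

    def server_for(self, key):
        """Walk clockwise from the key's position to the next server point."""
        index = bisect.bisect(self._points, (_position(key), ""))
        return self._points[index % len(self._points)][1]

keys = [f"key-{i}" for i in range(1000)]
ring = HashRing(["A", "B", "C"])
before = {k: ring.server_for(k) for k in keys}
ring.add("D")
after = {k: ring.server_for(k) for k in keys}
moved = [k for k in keys if before[k] != after[k]]
```

Adding server D only moves keys onto D; keys never shuffle between the existing servers, which is what keeps rebalancing simple.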

Document Databases
๏Examples:
•MongoDB, Riak
๏Use when data is collections of similar entities
•But semi structured (sparse) rather than tabular
•When fields in entries have multiple values

Column Family Databases
๏Examples:
•Cassandra
๏Use when scalability is the main issue
•Both scaling size and scaling load
‣In particular scaling write load
๏Linear scalability (as you add servers) both in read and write
๏Low level - will require you to duplicate data to support queries
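
A sketch of what duplicating data to support queries can look like, using the Twitter clone example and plain dictionaries in place of column families. The user names, the two "tables" and `post_tweet` are hypothetical; this is not Cassandra's API.

```python
from collections import defaultdict

followers = {"alice": ["bob", "carol"], "bob": ["carol"]}  # hypothetical users

# One structure per query, each keyed for exactly that lookup.
tweets_by_author = defaultdict(list)  # query: tweets written by X
timeline_by_user = defaultdict(list)  # query: tweets X should see

def post_tweet(author, text):
    """Write the tweet once per query it must serve (fan-out on write)."""
    tweet = {"author": author, "text": text}
    tweets_by_author[author].append(tweet)
    for follower in followers.get(author, []):
        timeline_by_user[follower].append(dict(tweet))  # duplicated copy

post_tweet("alice", "hello")
post_tweet("bob", "hi there")
```

Reads become a single key lookup; the price is extra writes and keeping the copies consistent.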

Graph Databases
๏Examples:
•Neo4j, DEX, InfiniteGraph
๏Use when (deep) traversals are important
๏For complex domains
๏When how entities relate is an important aspect of the domain
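
A small sketch of a depth-limited traversal, to show the kind of query meant by "deep traversals". The graph, the KNOWS relationship type and the `traverse` function are invented for illustration and are not Neo4j's traversal API.

```python
from collections import deque

# Hypothetical graph: node -> list of (relationship type, neighbour).
graph = {
    "a": [("KNOWS", "b"), ("KNOWS", "c")],
    "b": [("KNOWS", "d")],
    "c": [("KNOWS", "d")],
    "d": [("KNOWS", "e")],
    "e": [],
}

def traverse(start, rel_type, max_depth):
    """Breadth-first: return {node: depth} for nodes within max_depth hops."""
    depths = {start: 0}
    frontier = deque([start])
    while frontier:
        node = frontier.popleft()
        if depths[node] == max_depth:
            continue  # do not expand beyond the depth limit
        for rel, neighbour in graph.get(node, []):
            if rel == rel_type and neighbour not in depths:
                depths[neighbour] = depths[node] + 1
                frontier.append(neighbour)
    return depths
```

Each extra hop here is another neighbour lookup, whereas in a relational schema it would typically be another self-join.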

When not to use a NOSQL Database
๏RDBMSes have been the de facto standard for years, and still have better tools for some tasks
•Especially for reporting
๏When maintaining a system that works already
๏Sometimes when data is uniform / structured
๏When aggregations over (subsets of) the entire dataset are key
๏But please don’t use a Relational database for persisting objects

http://neotechnology.com
Questions?
