6. Sparse data - Relational mismatch
[Figure: a wide table with columns α β γ δ ε ζ η θ ι κ λ μ, and a table with columns id, π, τ for ids 1337, 2468, 3145, 3579, 4468, 7878; cell values not shown]
entity | key | value
1337 | a | lorem ipsum
1337 | b | lorem ipsum
3145 | b | lorem ipsum
3578 | a | lorem ipsum
3579 | f | lorem ipsum
3579 | j | lorem ipsum
4468 | c | lorem ipsum
4468 | f | lorem ipsum
7878 | g | lorem ipsum
7878 | f | lorem ipsum
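The entity/key/value layout above stores only the attributes an entity actually has. As a minimal Python sketch (assuming rows arrive as `(entity, key, value)` tuples, taken from a subset of the table above; `pivot` and `to_wide` are hypothetical helpers), the EAV rows can be folded back into one sparse record per entity, and rendered as a wide row only when needed, with missing attributes becoming `None` (NULL):

```python
from collections import defaultdict

# (entity, key, value) rows: a subset of the EAV table above
eav_rows = [
    (1337, "a", "lorem ipsum"),
    (1337, "b", "lorem ipsum"),
    (3145, "b", "lorem ipsum"),
    (3579, "f", "lorem ipsum"),
]


def pivot(rows):
    """Group EAV rows into one dict per entity; absent keys are simply missing."""
    records = defaultdict(dict)
    for entity, key, value in rows:
        records[entity][key] = value
    return dict(records)


def to_wide(records, columns):
    """Render the sparse records as a wide table, filling every gap with None (NULL)."""
    return {entity: [attrs.get(c) for c in columns] for entity, attrs in records.items()}


records = pivot(eav_rows)
print(records[1337])  # {'a': 'lorem ipsum', 'b': 'lorem ipsum'}
print(to_wide(records, ["a", "b", "f"])[3145])  # [None, 'lorem ipsum', None]
```

The wide form is where the mismatch shows: most cells are NULL, while the EAV form stays compact but needs a pivot to read an entity back.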
7. Sparse data - Relational mismatch
Data Table
id | data
1337 | {"foo":"bar", ...}
2468 | {"foo":"bar", ...}
3145 | {"foo":"bar", ...}
3579 | {"foo":"bar", ...}
4468 | {"foo":"bar", ...}
7878 | {"foo":"bar", ...}
Search Tables
id | foo
1337 | bar
2468 | baz
3145 | quux
3579 | quux
4468 | waldo
7878 | fred

id | bar
1337 | foo
2468 | baz
3145 | quux
3579 | quux
4468 | waldo
7878 | fred
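A minimal sketch of the data-table-plus-search-tables pattern, using Python's built-in `sqlite3`. The schema is an assumption: `data` holds one JSON document per id, and a hypothetical `search_foo` table indexes the `foo` field extracted from each document. Writes update both tables in one transaction; reads find ids through the search table and then load the documents from the data table.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
# Data table: one opaque JSON document per id
conn.execute("CREATE TABLE data (id INTEGER PRIMARY KEY, body TEXT NOT NULL)")
# Search table (hypothetical name): one indexed field extracted from the documents
conn.execute("CREATE TABLE search_foo (id INTEGER NOT NULL, foo TEXT NOT NULL)")
conn.execute("CREATE INDEX idx_search_foo ON search_foo (foo)")


def put(doc_id, doc):
    """Store the document and keep the search table in sync, in one transaction."""
    with conn:
        conn.execute("INSERT OR REPLACE INTO data (id, body) VALUES (?, ?)", (doc_id, json.dumps(doc)))
        conn.execute("DELETE FROM search_foo WHERE id = ?", (doc_id,))
        if "foo" in doc:
            conn.execute("INSERT INTO search_foo (id, foo) VALUES (?, ?)", (doc_id, doc["foo"]))


def find_by_foo(value):
    """Look up ids in the search table, then load the full documents from the data table."""
    rows = conn.execute(
        "SELECT d.id, d.body FROM search_foo s JOIN data d ON d.id = s.id "
        "WHERE s.foo = ? ORDER BY d.id",
        (value,),
    )
    return [(doc_id, json.loads(body)) for doc_id, body in rows]


put(3145, {"foo": "quux", "bar": "quux"})
put(3579, {"foo": "quux"})
put(4468, {"foo": "waldo"})
print([doc_id for doc_id, _ in find_by_foo("quux")])  # [3145, 3579]
```

Each additional searchable field would get its own search table, which is why the slide shows more than one.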
26. The rise of REST for databases
๏It’s actually all about Hypermedia:
•When one aggregate root references another
•Not necessarily on the same host
•Hyperlinks provide the desired decoupling,
and can reference documents qualified by host
๏HTTP, and the ease of developing client drivers for it, is a further driver
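To illustrate the decoupling, a sketch in which each aggregate root carries hyperlinks to other aggregate roots as absolute URLs, so the host is part of the reference. The hosts, paths, and the in-memory `HOSTS` map are hypothetical stand-ins for HTTP servers; a real client would issue an HTTP GET inside `fetch`.

```python
from urllib.parse import urlsplit

# Hypothetical in-memory stand-ins for two database servers: host -> path -> document
HOSTS = {
    "orders.example.com": {
        "/orders/1": {"total": 42, "links": {"customer": "http://customers.example.com/customers/7"}},
    },
    "customers.example.com": {
        "/customers/7": {"name": "Ada", "links": {}},
    },
}


def fetch(url):
    """Resolve an absolute URL to a document (a real client would do an HTTP GET here)."""
    parts = urlsplit(url)
    return HOSTS[parts.netloc][parts.path]


def follow(doc, rel):
    """Follow a named hyperlink from one aggregate root to another, possibly on another host."""
    return fetch(doc["links"][rel])


order = fetch("http://orders.example.com/orders/1")
print(follow(order, "customer")["name"])  # Ada
```

The order document never needs to know how or where customers are stored; moving customers to another host only changes the URLs in the links.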
28. NewSQL defined
๏Relational databases with (primarily) a SQL interface that adopt the scaling benefits of NoSQL databases
๏Automatic/Transparent sharding of data
๏Distributed, Fault Tolerant, Highly Available
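Transparent sharding means the client addresses keys, not shards. One way to sketch this is a router that assigns keys to shards by key range; the split points and shard names below are hypothetical, and real NewSQL systems also split and move ranges automatically, which this sketch does not show.

```python
import bisect


class RangeRouter:
    """Routes each key to the shard owning its key range; clients never name a shard."""

    def __init__(self, split_points, shard_names):
        if len(shard_names) != len(split_points) + 1:
            raise ValueError("need exactly one more shard than split points")
        self.split_points = sorted(split_points)  # split_points[i] is the first key of shard i + 1
        self.shard_names = list(shard_names)
        self.shards = {name: {} for name in shard_names}

    def shard_for(self, key):
        return self.shard_names[bisect.bisect_right(self.split_points, key)]

    def put(self, key, value):
        self.shards[self.shard_for(key)][key] = value

    def get(self, key):
        return self.shards[self.shard_for(key)].get(key)


router = RangeRouter(split_points=[3000, 6000], shard_names=["s0", "s1", "s2"])
for key in (1337, 2468, 3145, 3579, 4468, 7878):
    router.put(key, f"row {key}")
print(router.shard_for(1337), router.shard_for(3145), router.shard_for(7878))  # s0 s1 s2
```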
50. Column Databases
๏Use as underlying storage for a higher-level data storage model
๏E.g. a graph database model implemented on top of Cassandra
•Notable example: Aurelius Titan
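A rough sketch of how a graph model can sit on top of a column-family store: one row per vertex and one column per outgoing edge, so fetching a vertex's neighbours is a single row read. A plain dict of dicts stands in for Cassandra here, and the column naming scheme is an assumption, not Titan's actual storage format.

```python
from collections import defaultdict

# Stand-in for a column family: row key -> {column name -> column value}
# Hypothetical layout: one row per vertex, one column per outgoing edge
edges_cf = defaultdict(dict)


def add_edge(src, label, dst):
    # The column name encodes edge label and target, so one row read yields all edges
    edges_cf[src][(label, dst)] = ""


def neighbours(vertex, label):
    """Single row read, filtered by label (a real store would slice a column range)."""
    return sorted(dst for (lbl, dst) in edges_cf.get(vertex, {}) if lbl == label)


add_edge("alice", "knows", "bob")
add_edge("alice", "knows", "carol")
add_edge("alice", "likes", "jazz")
add_edge("bob", "knows", "dave")
print(neighbours("alice", "knows"))  # ['bob', 'carol']
```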
52. Caches - Improving Reads
๏Read from cache first, only read from DB on cache miss
๏Preferably cache aggregates, possibly after they pass through app-level processing
๏memcached - mainly a cache, has tried to reposition itself as a NOSQL DB
•as have other cache products
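The read path described here is the cache-aside pattern. A minimal sketch, assuming a plain dict as the cache (a stand-in for memcached or similar, with no eviction or expiry) and a hypothetical `order_summary` function as the app-level processing that builds the cached aggregate:

```python
class CacheAside:
    """Check the cache first; go to the database only on a cache miss."""

    def __init__(self, db, load_aggregate):
        self.cache = {}  # stand-in for memcached: unbounded, no TTL
        self.db = db
        self.load_aggregate = load_aggregate
        self.db_reads = 0  # counts cache misses that hit the DB

    def get(self, key):
        if key in self.cache:
            return self.cache[key]
        self.db_reads += 1
        value = self.load_aggregate(self.db, key)  # app-level processing before caching
        self.cache[key] = value
        return value

    def invalidate(self, key):
        # Call after writing to the DB so the next read reloads the aggregate
        self.cache.pop(key, None)


db = {"order:1": {"lines": [("apple", 2), ("pear", 3)]}}


def order_summary(db, key):
    # Hypothetical aggregate: the order plus a derived item count
    order = db[key]
    return {"lines": order["lines"], "items": sum(qty for _, qty in order["lines"])}


reads = CacheAside(db, order_summary)
reads.get("order:1")
reads.get("order:1")
print(reads.db_reads)  # 1
```

Caching the processed aggregate, rather than raw rows, means a hit skips both the DB read and the processing.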
54. Message Queues - Improving Writes
๏Write to Queue, process work from Queue in batches
•Alleviates transactional overhead by grouping writes
•Still guarantees writes if the Queue has durability guarantees
•Needs transaction synchronization (2PC) between the queue and the DB
๏Writes are not immediately visible; they are delayed by the queue
•Write-to-cache can be used to get around this,
if a cache is used
๏Amazon SQS
๏RabbitMQ
๏ZeroMQ
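A sketch of batching writes through a queue, using Python's in-process `queue.Queue` as a stand-in for a broker such as Amazon SQS or RabbitMQ, and `sqlite3` as the database. Unlike a real broker it is not durable, and the queue/DB synchronization (2PC) is not shown; the point is one transaction per batch, and that queued writes stay invisible until drained. `enqueue_write` and `drain` are hypothetical names.

```python
import queue
import sqlite3

work = queue.Queue()  # stand-in for SQS/RabbitMQ: in-process and NOT durable
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (id INTEGER PRIMARY KEY, payload TEXT)")


def enqueue_write(payload):
    # The producer returns immediately; the row is not in the DB yet
    work.put(payload)


def drain(batch_size=100):
    """Take up to batch_size writes off the queue and apply them in ONE transaction."""
    batch = []
    while len(batch) < batch_size:
        try:
            batch.append(work.get_nowait())
        except queue.Empty:
            break
    if batch:
        with conn:  # one commit per batch instead of one per write
            conn.executemany("INSERT INTO events (payload) VALUES (?)", [(p,) for p in batch])
    return len(batch)


for i in range(250):
    enqueue_write(f"event {i}")
print(conn.execute("SELECT COUNT(*) FROM events").fetchone()[0])  # 0
print([drain() for _ in range(3)])  # [100, 100, 50]
print(conn.execute("SELECT COUNT(*) FROM events").fetchone()[0])  # 250
```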
59. Hadoop - Data Analysis/Processing
๏Batch processing of large amounts of data, typically offline or semi-online; not for interactive querying
๏Ingest data from your DB, process and generate report
•Ex. Read Neo4j graph, generate centrality analysis report
๏Ingest data from event stream, process and generate data for DB
•Ex. Read access logs, create Neo4j data for security analysis
๏Ingest data from one DB, process and generate data for another
•Ex. Read MySQL transaction logs,
create Neo4j data for query acceleration
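The map/reduce style behind such batch jobs, sketched in plain Python rather than with Hadoop's API. The log format and the "denied requests per IP and path" report are hypothetical, loosely following the access-log example above:

```python
from collections import defaultdict
from itertools import chain

# Hypothetical access-log lines: "<client-ip> <method> <path> <status>"
logs = [
    "10.0.0.1 GET /login 200",
    "10.0.0.1 GET /admin 403",
    "10.0.0.2 GET /admin 403",
    "10.0.0.1 GET /admin 403",
]


def map_line(line):
    """Map phase: emit ((ip, path), 1) for every denied request."""
    ip, _method, path, status = line.split()
    if status == "403":
        yield (ip, path), 1


def reduce_counts(pairs):
    """Shuffle and reduce phase: sum the counts per key."""
    totals = defaultdict(int)
    for key, count in pairs:
        totals[key] += count
    return dict(totals)


denied = reduce_counts(chain.from_iterable(map_line(line) for line in logs))
# Each entry could then be loaded as a relationship between an IP node and a path node
print(denied)  # {('10.0.0.1', '/admin'): 2, ('10.0.0.2', '/admin'): 1}
```

In Hadoop the map calls run in parallel across the input splits and the framework does the grouping by key; the per-key summing stays the same.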
61. Building Databases is hard
๏The current NOSQL wave took off in 2009
๏... many much older databases still have issues...
๏Most likely there will be issues
๏https://github.com/aphyr/jepsen (by Kyle Kingsbury / @aphyr)
•... most distributed databases fail in the event of network partitions
๏Test, test, test, and test
•Test the database heavily before you put it in production
•Test for your use cases - generic benchmarks are useless
•Test with real load
•Test continuously
62. Serious Database Vendors Take Data Seriously
๏Make sure to test their product under “real” load
๏Make sure to test their product in the event of failures
๏But you still need to Test!
๏Report issues to the vendor
๏Data loss is too embarrassing - will be fixed!
๏Performance is important - you’ll be heard!
64. Polyglot Persistence - Multiple DBs
๏Real world examples:
•RDBMS as system of record,
Neo4j for accelerating (join) queries
•Neo4j for storing metadata and structure,
Cassandra for storing event logs,
S3 for storing BLOB data
68. Complex problem? - right tool for each job!
Image credits: Unknown :’(
69. Key/Value stores
๏Examples:
•Amazon SimpleDB, memcached, Oracle NoSQL, Redis
๏Use when data is opaque
๏Scalability is important
๏Scale simply with the addition of more servers
•rebalance equally simply
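One common technique behind "add servers, rebalance simply" is consistent hashing (used in Dynamo-style stores; not necessarily by every product listed above). A minimal sketch with virtual nodes; the server names and the virtual-node count are arbitrary assumptions. Adding a server moves only the keys it takes over, whereas `hash(key) % N` would remap most keys.

```python
import bisect
import hashlib


def stable_hash(value):
    # Python's built-in hash() is randomized per process for str, so use MD5 for stable positions
    return int.from_bytes(hashlib.md5(value.encode()).digest()[:8], "big")


class HashRing:
    """A key belongs to the first server point after its hash, wrapping around the ring."""

    def __init__(self, servers, vnodes=64):
        self.vnodes = vnodes
        self.ring = []  # sorted (point, server) pairs; several points per server ("virtual nodes")
        for server in servers:
            self.add(server)

    def add(self, server):
        for i in range(self.vnodes):
            bisect.insort(self.ring, (stable_hash(f"{server}#{i}"), server))

    def server_for(self, key):
        points = [point for point, _ in self.ring]  # rebuilt per call: fine for a sketch
        idx = bisect.bisect_right(points, stable_hash(key)) % len(self.ring)
        return self.ring[idx][1]


keys = [f"user:{n}" for n in range(1000)]
ring = HashRing(["kv1", "kv2", "kv3"])
before = {k: ring.server_for(k) for k in keys}
ring.add("kv4")  # scale out by one server
after = {k: ring.server_for(k) for k in keys}
moved = [k for k in keys if before[k] != after[k]]
print(all(after[k] == "kv4" for k in moved))  # True: only keys taken over by kv4 move
```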
71. Column Family Databases
๏Examples:
•Cassandra
๏Use when scalability is the main issue
•Both scaling size and scaling load
‣In particular scaling write load
๏Linear scalability (as you add servers) both in read and write
๏Low level - will require you to duplicate data to support queries
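Duplicating data to support queries usually means one table per query. A sketch with dicts standing in for two hypothetical column families, `posts_by_author` and `posts_by_tag`: each post is written to every table a query needs, which turns reads into single-row lookups at the cost of extra writes and of keeping the copies consistent.

```python
from collections import defaultdict

# Stand-ins for two column families, each shaped for ONE query (hypothetical names)
posts_by_author = defaultdict(dict)  # row key: author -> {post_id: title}
posts_by_tag = defaultdict(dict)     # row key: tag    -> {post_id: title}


def save_post(post_id, author, title, tags):
    """Write the same post into every table a query needs; there are no joins to fall back on."""
    posts_by_author[author][post_id] = title
    for tag in tags:
        posts_by_tag[tag][post_id] = title


save_post(1, "ada", "Sharding notes", ["nosql", "scaling"])
save_post(2, "bob", "Graph traversals", ["nosql", "graphs"])
print(sorted(posts_by_tag["nosql"]))  # [1, 2]
print(posts_by_author["ada"])         # {1: 'Sharding notes'}
```

Renaming a post now means updating every copy, which is the price paid for linear read scaling.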
72. Graph Databases
๏Examples:
•Neo4j, DEX, InfiniteGraph
๏Use when (deep) traversals are important
๏For complex domains
๏When how entities relate is an important aspect of the domain
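A deep traversal, such as "everyone within two hops", is a breadth-first walk over relationships; in a relational schema each extra hop typically means another join. A minimal sketch over a hypothetical in-memory adjacency list (a graph database would follow stored relationships instead):

```python
from collections import deque

# Hypothetical social graph as adjacency lists
graph = {
    "ada": ["bob", "cy"],
    "bob": ["dee"],
    "cy": ["dee", "eve"],
    "dee": ["fay"],
    "eve": [],
    "fay": [],
}


def within_depth(start, max_depth):
    """Breadth-first traversal: every node reachable in 1..max_depth hops, with its distance."""
    seen = {start: 0}
    frontier = deque([start])
    while frontier:
        node = frontier.popleft()
        if seen[node] == max_depth:
            continue  # do not expand beyond the depth limit
        for nxt in graph.get(node, []):
            if nxt not in seen:
                seen[nxt] = seen[node] + 1
                frontier.append(nxt)
    del seen[start]
    return seen


print(within_depth("ada", 2))  # {'bob': 1, 'cy': 1, 'dee': 2, 'eve': 2}
```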
73. When not to use a NOSQL Database
๏RDBMSes have been the de-facto standard for years, and still have
better tools for some tasks
•Especially for reporting
๏When maintaining a system that works already
๏Sometimes when data is uniform / structured
๏When aggregations over (subsets of) the entire dataset are key
๏But please don’t use a relational database for persisting objects
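For contrast, an ad-hoc aggregation over a subset of the dataset is a single declarative query in SQL, whereas a key/value or column-family store would typically need it precomputed or run as a batch job. A small sketch with `sqlite3` and made-up sales data:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, product TEXT, amount INTEGER)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [("north", "a", 10), ("north", "b", 5), ("south", "a", 7), ("south", "a", 3)],
)
# One declarative query aggregates over a subset (product 'a') of the whole dataset
rows = conn.execute(
    "SELECT region, SUM(amount) FROM sales WHERE product = 'a' GROUP BY region ORDER BY region"
).fetchall()
print(rows)  # [('north', 10), ('south', 10)]
```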