2. What is Riak?
• Document-oriented database
• Written in Erlang
• Based on Dynamo[1] and CAP Theorem[2]
• Highly fault tolerant
• HTTP and Protocol Buffers interfaces
• Write MapReduce in Erlang or JavaScript
1. http://goo.gl/r8Np
2. http://www.julianbrowne.com/article/viewer/brewers-cap-theorem
3. Same, Same but different
• Riak solves similar problems to MongoDB
• Semi-structured data modeled as “documents”
• Storage of non-document data in the database
• High write-availability
• Riak is intrinsically multi-node scalable
• Mongo, in comparison, is a single-system design (+ sharding)
• Riak achieves availability via quorum writes
• Mongo uses performant in-place writes
• Riak uses “masterless” replication
4. N/R/W – Dynamo
N = Number of replicas to store
R = Number of replicas needed to read
W = Number of replicas needed to write
• These principles first appeared in an Amazon
research paper known as Dynamo
5. Number of replicas
• 160-bit integer key
space. Each node that
joins is assigned part
of that space for
consistent hashing
• Hashing means any
node can service any
request, making the
cluster masterless and
eventually consistent
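A minimal sketch of the idea on this slide, not Riak's actual implementation. It assumes SHA-1 as the source of the 160-bit key space, equal-sized partitions handed out round-robin, and replicas placed on the next distinct nodes around the ring. The names `key_position`, `build_ring` and `preference_list` are hypothetical.

```python
import hashlib
from bisect import bisect_right

RING_SIZE = 2 ** 160  # size of the 160-bit key space (SHA-1 output, assumed here)


def key_position(bucket: str, key: str) -> int:
    """Hash a bucket/key pair to an integer position on the ring."""
    digest = hashlib.sha1(f"{bucket}/{key}".encode("utf-8")).digest()
    return int.from_bytes(digest, "big")


def build_ring(nodes, partitions=64):
    """Split the ring into equal partitions and assign them round-robin."""
    step = RING_SIZE // partitions
    return [(i * step, nodes[i % len(nodes)]) for i in range(partitions)]


def preference_list(ring, position, n=3):
    """Return the n distinct nodes responsible for a ring position."""
    starts = [start for start, _ in ring]
    idx = bisect_right(starts, position) - 1  # partition containing the position
    owners = []
    for offset in range(len(ring)):
        node = ring[(idx + offset) % len(ring)][1]
        if node not in owners:
            owners.append(node)
        if len(owners) == n:
            break
    return owners


ring = build_ring(["n0", "n1", "n2", "n3"])
owners = preference_list(ring, key_position("users", "alice"), n=3)
```

Because every node can compute the same positions from the same ring, any node can work out who owns a key without asking a master.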
6. Reads
• R = number of replies
before Riak gives
the client a
successful reply.
• Tries to access all
nodes, but as soon
as R is
satisfied a response
is given
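The read behaviour can be sketched as a loop that stops as soon as enough replicas have answered. This is an illustrative model only: `quorum_read` is a hypothetical helper, and replies are represented as `(node, value)` pairs in arrival order, with `None` standing for a node that failed to answer.

```python
def quorum_read(replies, r):
    """Return the first r successful replies, in arrival order.

    replies: iterable of (node, value) pairs; value is None when the
    node did not answer. Raises RuntimeError if fewer than r nodes reply.
    """
    successes = []
    for node, value in replies:
        if value is not None:
            successes.append((node, value))
        if len(successes) >= r:
            return successes  # quorum reached, no need to wait for the rest
    raise RuntimeError(f"only {len(successes)} of {r} required replies received")


replies = [("n0", "v1"), ("n1", None), ("n2", "v1"), ("n3", "v1")]
result = quorum_read(replies, r=2)
```

In this example the read succeeds after hearing from n0 and n2, even though n1 never answered.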
7. Writes
• Same as reads; W
is the number
of nodes that must
reply successfully
before the write
is considered
consistent by
the client
8. Extreme example
• Given N=10, R=W=2 we
could have 8 nodes
down and the cluster
would still be fully
available to all clients
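The arithmetic behind this example can be written down directly. The sketch below assumes the simple model used in these slides, where a request succeeds as long as R (for reads) or W (for writes) of the N replicas are reachable; `tolerated_failures` is a hypothetical name.

```python
def tolerated_failures(n: int, r: int, w: int) -> int:
    """Number of replicas that can be down while reads and writes both succeed."""
    if not (1 <= r <= n and 1 <= w <= n):
        raise ValueError("R and W must be between 1 and N")
    # Both quorums must still be reachable, so the larger one is the limit.
    return n - max(r, w)


extreme = tolerated_failures(n=10, r=2, w=2)  # the extreme example: 8
typical = tolerated_failures(n=3, r=2, w=2)   # a more common setting: 1
```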
9. What does this all mean?
• N/R/W specified at request time, so each
client can specify its own tolerance for
outages dynamically
• Despite any outages within the cluster, the whole
cluster can still appear available based on N/R/W
• Given N=3 and R=W=2, we can have 3-2=1 node
down/unreachable/laggy in the cluster
• Stupidly high availability complete with eventual
consistency controlled by dynamic clients
10. Brewer’s CAP Theorem
• Consistency
• Availability
• Partition Tolerance
• You can't have all things, all the time…
• …but you can have some of each, all the time!
• Riak is about choosing your own levels of
each according to your use case
11. Consistency
• Start with document
version zero
• Things get redistributed
and n0 and n2 are
sitting in NYC and n1
and n3 are in London
• What if stuff changes??
12. Consistency
• Uh oh: inconsistency
• Both parts of the cluster
are still fully available
• NYC serves v1 whilst
London serves v0
• The network resumes
and Riak determines
the latest version by
using vector clocks
13. Consistency
• What if both sides of
the Atlantic changed?
• Riak is unable to
determine which is the
right document, both
are returned to the
client with an indication
of the inconsistency
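The two outcomes described in these Consistency slides (one side is newer, or both sides changed) can be illustrated with a small vector clock comparison. This is a sketch of the general technique, not Riak's internal representation: clocks are modeled as plain dicts mapping an actor name to a counter, and the actor names are made up for the example.

```python
def descends(a: dict, b: dict) -> bool:
    """True if clock a has seen every update recorded in clock b."""
    return all(a.get(actor, 0) >= count for actor, count in b.items())


def compare(a: dict, b: dict) -> str:
    """Classify two versions as equal, one newer, or in conflict."""
    a_ge_b, b_ge_a = descends(a, b), descends(b, a)
    if a_ge_b and b_ge_a:
        return "equal"
    if a_ge_b:
        return "a_newer"
    if b_ge_a:
        return "b_newer"
    return "conflict"  # concurrent edits: both versions go back to the client


v0 = {"n0": 1}                    # original document
nyc = {"n0": 1, "nyc": 1}         # updated in NYC during the partition
london = {"n0": 1, "london": 1}   # updated in London during the partition

one_sided = compare(nyc, v0)       # "a_newer": NYC's version wins
two_sided = compare(nyc, london)   # "conflict": the client must resolve it
```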
14. Riak Search
• Distributed, fault-tolerant full-text searching
• Lucene syntax for queries
• No need for index sharding
• Linear scaling
• Double the number of nodes to get double the
search capacity (awesome!)
• Search via:
• Fields, wildcards, fuzzy text or token proximity