2. /usr/bin/whoami
• Russell Smith
• Work for UKD1, a consultancy for web-related-tech
• Help with application design, infrastructure, capacity planning, etc
• Mainly for the video-games industry & web-startups
• Twitter: @ukd1
3. What is Riak?
• Pronounced ‘ree-ack’
• A scalable, high-availability, distributed, key-value store
• Modelled on Amazon’s description of Dynamo, like Cassandra
• Commercially supported / developed by Basho
• Written in Erlang
• Open source - Apache License (2.0)
4. What isn’t Riak?
• Schema enforced - store what you want
• Relational database - No joins or constraint enforcement as there are no global locks
• A competitor to in-memory, column-based databases
5. What versions are available?
• Riak
• Riak Search (Riak + distributed full-text indexing / search)
• Riak Enterprise - commercially licensed - supports extra features for
enterprise use (SNMP, data-centre awareness, etc)
• Luwak (Riak + app for storing large files; it’s bundled by default)
6. Riak’s take on CAP
• Exposed to the end user - allowing tuning of N, R & W
• N - # of replicas of each value, set per bucket (default of 3)
• R - # of replicas that must respond for a successful read (per request)
• W - # of replicas that must respond for a successful write (a number, all, quorum
or default for the bucket)
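The N / R / W arithmetic above can be sketched in a few lines of Python. This is only an illustration of the usual Dynamo-style quorum reasoning, not Riak code; the function names are made up for the example.

```python
def quorum(n):
    """Smallest majority of n replicas."""
    return n // 2 + 1


def resolve(setting, n):
    """Turn an R or W setting (a number, "all" or "quorum") into a replica count."""
    if setting == "all":
        return n
    if setting == "quorum":
        return quorum(n)
    count = int(setting)
    if not 1 <= count <= n:
        raise ValueError(f"value must be between 1 and {n}, got {count}")
    return count


def quorums_overlap(n, r, w):
    """True when any R replicas and any W replicas must share a member (R + W > N)."""
    return resolve(r, n) + resolve(w, n) > n


if __name__ == "__main__":
    n = 3  # the default bucket setting mentioned above
    for r, w in [(1, 1), ("quorum", "quorum"), (1, "all")]:
        print(f"N={n} R={r} W={w} overlap={quorums_overlap(n, r, w)}")
```

Low R and W favour latency and availability; settings where R + W > N favour reading your latest write.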
8. What can you store?
• Values against keys
• Keys are organised into buckets
• Practical value limit of 64 MB
• For large files; Luwak (built in > 0.13) splits them into smaller blocks
9. Querying
• Two main interfaces; HTTP & Protocol buffers
• HTTP API is mainly REST - GET, PUT, DELETE
• Riak stores the key, value & metadata about the key;
• Content Type, Charset, Encoding & link data
• Also: any custom metadata
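A rough sketch of what the REST calls look like, using only the Python standard library. The node address (127.0.0.1:8098), the /riak/<bucket>/<key> URL layout and the example bucket and key are assumptions for illustration; check them against your own install. The code only builds the requests, nothing is sent.

```python
import json
import urllib.parse
import urllib.request

BASE_URL = "http://127.0.0.1:8098/riak"  # assumed local node and URL prefix


def object_url(bucket, key):
    """URL of one object: <base>/<bucket>/<key>, with both parts escaped."""
    return "/".join([
        BASE_URL,
        urllib.parse.quote(bucket, safe=""),
        urllib.parse.quote(key, safe=""),
    ])


def build_put(bucket, key, value):
    """PUT a JSON document; the content type is stored as object metadata."""
    return urllib.request.Request(
        object_url(bucket, key),
        data=json.dumps(value).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


def build_get(bucket, key):
    return urllib.request.Request(object_url(bucket, key), method="GET")


def build_delete(bucket, key):
    return urllib.request.Request(object_url(bucket, key), method="DELETE")


if __name__ == "__main__":
    req = build_put("people", "russell", {"twitter": "@ukd1"})
    print(req.get_method(), req.full_url)
```

Passing one of these requests to urllib.request.urlopen() would actually send it to the node.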
10. Links
• Used to store one-way relationships between objects;
• Stored in object meta-data
• Link-walking uses MapReduce
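As a sketch of how links look over HTTP: a link travels as a Link header value on the object, and a link-walk is a GET on the object's URL followed by bucket,tag,keep steps. The /riak prefix, the riaktag attribute and the example buckets, keys and tags are written from memory of the HTTP API of the time, so treat them as assumptions.

```python
def link_header(bucket, key, tag, prefix="/riak"):
    """Value of a Link header pointing one-way at another object."""
    return f'<{prefix}/{bucket}/{key}>; riaktag="{tag}"'


def link_walk_path(bucket, key, steps, prefix="/riak"):
    """Path for a link-walk; each step is (bucket, tag, keep), "_" meaning "any"."""
    walked = "/".join(",".join(step) for step in steps)
    return f"{prefix}/{bucket}/{key}/{walked}"


if __name__ == "__main__":
    print("Link:", link_header("people", "alice", "friend"))
    print("GET", link_walk_path("people", "bob", [("people", "friend", "1")]))
```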
11. MapReduce
• Designed to be used for web-page-speed requests
• Built in
• Map / Reduce functions are written in JavaScript or Erlang
• Can do re-reduce
• Streaming MapReduce
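Re-reduce means a reduce function may be fed its own earlier output, so it has to accept and return the same shape of data. A small Python model of that idea follows; it is not Riak's engine, real functions would be JavaScript or Erlang, and the batch size here is arbitrary.

```python
def map_word_count(document):
    """Map phase: one document in, a list of results out."""
    return [len(document.split())]


def reduce_sum(values):
    """Reduce phase: list in, list out, so it is safe to re-run on its own output."""
    return [sum(values)]


def run_job(documents, batch_size=2):
    mapped = [result for doc in documents for result in map_word_count(doc)]
    partials = []
    # Reduce each batch as it "arrives", as a distributed system might.
    for start in range(0, len(mapped), batch_size):
        partials.extend(reduce_sum(mapped[start:start + batch_size]))
    # Re-reduce: the partial results go through the same function again.
    return reduce_sum(partials)


if __name__ == "__main__":
    print(run_job(["a b c", "d e", "f"]))
```

The answer should not depend on how the inputs were batched; that is the property a re-reducible function has to keep.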
12. Vector clocks
• Each value is tagged with a vector clock
• Riak can determine if values;
• Are direct descendants of a single object
• Share a common parent
• Unrelated
• In Riak each object has a vector clock
• Cassandra uses timestamps - problems can occur with out-of-sync clocks
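The three outcomes above can be illustrated with a toy vector clock: a dict mapping an actor to a counter. This is a simplified model, not Riak's internal representation; in particular, telling "common parent" from "unrelated" by looking for a shared actor is a heuristic chosen for the example.

```python
def descends(a, b):
    """True if clock a has seen every update that clock b has seen."""
    return all(a.get(actor, 0) >= count for actor, count in b.items())


def relation(a, b):
    """Classify two vector clocks."""
    if a == b:
        return "equal"
    if descends(a, b):
        return "a descends from b"
    if descends(b, a):
        return "b descends from a"
    # Neither descends from the other: the values are concurrent.
    shared = any(a.get(actor, 0) > 0 and b.get(actor, 0) > 0 for actor in a)
    return "share a common parent" if shared else "unrelated"


if __name__ == "__main__":
    print(relation({"x": 2}, {"x": 1}))
    print(relation({"x": 2, "y": 1}, {"x": 2, "z": 1}))
    print(relation({"x": 1}, {"y": 1}))
```

No wall-clock time is involved, which is why clock drift between nodes does not affect the comparison.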
13. Siblings
• Siblings are different versions of the same document which Riak has
not merged
• Occur only if allow_mult is enabled on a bucket AND one of;
• Concurrent write with the same vector clock value
• Stale vector clock
• No vector clock passed
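A toy model of those rules, assuming dict-based vector clocks: the write() function, its arguments and the actor names are all invented for the example and do not mirror Riak's internals.

```python
def descends(a, b):
    """True if clock a has seen every update that clock b has seen."""
    return all(a.get(actor, 0) >= count for actor, count in b.items())


def write(stored_clock, stored_values, client_clock, new_value, actor, allow_mult=True):
    """Apply one write and return (new_clock, values); more than one value means siblings."""
    client_clock = client_clock or {}  # no vector clock passed
    if descends(client_clock, stored_clock):
        values = [new_value]                  # client saw the current version
    elif allow_mult:
        values = stored_values + [new_value]  # stale or missing clock: keep both
    else:
        values = [new_value]                  # allow_mult off: last write wins
    actors = set(stored_clock) | set(client_clock)
    merged = {a: max(stored_clock.get(a, 0), client_clock.get(a, 0)) for a in actors}
    merged[actor] = merged.get(actor, 0) + 1
    return merged, values


if __name__ == "__main__":
    clock, values = write({}, [], None, "v1", "a")
    read_clock = dict(clock)                      # two clients read the same clock
    clock, values = write(clock, values, read_clock, "v2", "a")
    clock, values = write(clock, values, read_clock, "v3", "b")
    print(clock, values)                          # the second writer was stale
```

Once siblings exist, the application reads them all, merges them, and writes the result back with the latest vector clock.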
14. Pre & Post Commit Hooks
• Pre-commit hooks can;
• Allow the object to be written
• Modify the object
• Fail the update
• They are per-bucket (stored in the properties)
• Written in JavaScript (pre-hooks) or Erlang (pre/post-hooks)
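The hook lifecycle can be modelled in Python to show the control flow. Real hooks are JavaScript or Erlang functions named in the bucket properties; the put() function, the HookFailure exception and the "precommit" / "postcommit" keys below are hypothetical stand-ins for this sketch.

```python
class HookFailure(Exception):
    """Raised by a pre-commit hook to fail the update."""


def require_name(obj):
    if "name" not in obj:
        raise HookFailure("object has no name")
    return obj  # allow the object to be written


def lowercase_name(obj):
    return dict(obj, name=obj["name"].lower())  # modify the object


def put(store, bucket_props, key, obj):
    # Pre-commit hooks run in order before the write; any of them can reject it.
    for hook in bucket_props.get("precommit", []):
        obj = hook(obj)
    store[key] = obj
    # Post-commit hooks run after the write; their results are ignored.
    for hook in bucket_props.get("postcommit", []):
        hook(obj)
    return obj


if __name__ == "__main__":
    audit_log = []
    props = {
        "precommit": [require_name, lowercase_name],
        "postcommit": [lambda o: audit_log.append(o["name"])],
    }
    store = {}
    put(store, props, "k1", {"name": "Russell"})
    print(store, audit_log)
```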
15. Admin
• Super simple;
• riak-admin join <node-in-cluster>
• riak-admin leave
• Backup tools are provided....
16. Backup / restore
• riak-admin backup|restore <node> <cookie> <output_file> [[node|all]]
• Alternative for Bitcask is a filesystem backup, as it uses append-only files
• riak-admin backup is storage-engine agnostic
• riak-admin backup only backs up KV data; not search indexes (Riak Search)
17. Storage engines
• Ships with two default storage engines;
• Bitcask - default, best when keyspace < RAM
• InnoDB - suggested when keyspace > RAM
• Also available - Google’s LevelDB. It’s BSD licensed & recently
integrated, good for large sets.
18. Riak-Search
• Full-text search engine built on top of Riak
• Realtime
• Uses Lucene Analyzers, custom ones may be written in Erlang / Java
• Supports term / field searches, boolean operators, grouping, lexical
range queries and end of word wildcards
• Will be part of Riak as default from 1.0
19. Riak > Cassandra
• Extremely simple to add or remove nodes from a cluster
• No pre-setup of datamodel
• REST & Protobuf API access
• Commercial support from the original developers, Basho
20. Riak = Cassandra
• No single point of failure
• Linearly scalable
• High availability
• Eventually consistent
• You can choose your own consistency requirements
21. Riak < Cassandra
• CQL; an SQL-ish language
• Range / cover queries are built in (no need to write MapReduce functions)
• ‘Enterprise’ features (dc / rack awareness) are free & in the open-source build
• Wide support / training from 3rd party commercial parties; DataStax / Acunu / Impetus / Onzra
http://wiki.apache.org/cassandra/ThirdPartySupport
• Cassandra is seemingly more popular & has a bigger community
• Fixed partition count vs Cassandra's MD5-based RandomPartitioner; with Riak you can't change the number of partitions later if you need to - plan carefully!
http://wiki.basho.com/Cluster-Capacity-Planning.html
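To see why the partition count matters, here is a simplified Python model of a hash ring: keys are hashed with SHA-1 onto a 160-bit ring that is cut into a fixed number of equal partitions. The key encoding (bucket/key as a UTF-8 string), the ring size of 64 and the way replicas are picked are assumptions for this sketch and differ in detail from what Riak does internally.

```python
import hashlib

RING_TOP = 2 ** 160  # size of the SHA-1 output space


def partition_for(bucket, key, ring_size=64):
    """Index of the partition that owns the hash of bucket/key."""
    digest = hashlib.sha1(f"{bucket}/{key}".encode("utf-8")).digest()
    position = int.from_bytes(digest, "big")
    return position * ring_size // RING_TOP


def preference_list(bucket, key, n=3, ring_size=64):
    """The n consecutive partitions that would hold the replicas (simplified)."""
    first = partition_for(bucket, key, ring_size)
    return [(first + i) % ring_size for i in range(n)]


if __name__ == "__main__":
    print(partition_for("people", "russell"))
    print(preference_list("people", "russell"))
    # A different ring size puts the same key in a different partition layout.
    print(partition_for("people", "russell", ring_size=128))
```

Every key's placement depends on the ring size, so it has to be chosen with the cluster's eventual size in mind.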