How does Riak compare to Cassandra? [Cassandra London User Group July 2011]

Riak
How does Riak compare to Cassandra?

/usr/bin/whoami

• Russell Smith

• Work for UKD1, a consultancy for web-related technology

• Help with application design, infrastructure, capacity planning, etc.

• Mainly for the video-games industry & web start-ups

• Twitter: @ukd1

What is Riak?
• Pronounced ‘ree-ack’

• A scalable, highly available, distributed key-value store

• Modelled on Amazon’s description of Dynamo, like Cassandra

• Developed and commercially supported by Basho

• Written in Erlang

• Open source - Apache License (2.0)

What isn’t Riak?

• Schema-enforced: store what you want

• A relational database: no joins or constraint enforcement, as there are no global locks

• A competitor to in-memory, column-based databases

What versions are available?

• Riak

• Riak Search (Riak + distributed full-text indexing / search)

• Riak Enterprise - commercially licensed - supports extra features for enterprise use (SNMP, data-centre awareness, etc.)

• Luwak (Riak + app for storing large files; it’s bundled by default)

Riak’s take on CAP

• Exposed to the end user, allowing tuning of N, R & W

• N - number of replicas (nodes each value is stored on), set per bucket (default of 3)

• R - number of replicas that must respond for a successful read (per request)

• W - number of replicas that must acknowledge a successful write (a number, all, quorum or the bucket default)
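A common rule of thumb from the Dynamo design helps when choosing these values. If R + W > N, every set of R replicas that is read must overlap the W replicas that acknowledged the latest write. Riak's "quorum" setting means a majority, floor(N / 2) + 1. The sketch below simply does that arithmetic. Riak can use fallback nodes during failures (hinted handoff), so treat the overlap as a guideline rather than a hard guarantee.

```python
def quorum(n: int) -> int:
    """Majority of N replicas: floor(N / 2) + 1."""
    return n // 2 + 1


def quorums_overlap(n: int, r: int, w: int) -> bool:
    """True if any R replicas read must include one of the W written (R + W > N)."""
    if not (1 <= r <= n and 1 <= w <= n):
        raise ValueError("R and W must each be between 1 and N")
    return r + w > n


if __name__ == "__main__":
    n = 3  # the default per-bucket N from the slide
    for r, w in [(quorum(n), quorum(n)), (1, 1), (1, n), (n, 1)]:
        print(f"N={n} R={r} W={w} overlap={quorums_overlap(n, r, w)}")
```

With N=3, R=W=2 overlaps, while R=W=1 trades that overlap for lower latency and higher availability.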

Client libraries

• PHP, Python, Ruby, Java, Erlang, JavaScript, .NET

• Community client libraries:

• C, Clojure, Go, Griffon, Groovy, Haskell, Perl, Scala, Smalltalk

What can you store?

• Values against keys

• Keys are organised into buckets

• Practical value limit of 64 MB

• For large files, Luwak (built in after 0.13) splits them into smaller blocks
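The idea behind splitting large files can be sketched as follows. A file is cut into blocks, each block is stored under its own key, and an ordered manifest records how to put the file back together. The block size, the SHA-256 block keys and the flat manifest are all assumptions made for illustration. This is not Luwak's actual storage format.

```python
import hashlib

# Hypothetical block size for illustration; not Luwak's actual default.
BLOCK_SIZE = 1024 * 1024


def split_into_blocks(data: bytes, block_size: int = BLOCK_SIZE) -> dict:
    """Split data into blocks keyed by a content hash, plus an ordered manifest."""
    blocks = {}
    manifest = []
    for offset in range(0, len(data), block_size):
        chunk = data[offset:offset + block_size]
        block_key = hashlib.sha256(chunk).hexdigest()
        blocks[block_key] = chunk  # identical chunks are stored only once
        manifest.append(block_key)
    return {"manifest": manifest, "blocks": blocks}


def reassemble(stored: dict) -> bytes:
    """Concatenate the blocks in manifest order."""
    return b"".join(stored["blocks"][k] for k in stored["manifest"])


if __name__ == "__main__":
    data = b"a" * 10 + b"b" * 5
    stored = split_into_blocks(data, block_size=4)
    print(len(stored["manifest"]), len(stored["blocks"]), reassemble(stored) == data)
```

Because every block stays well under the practical value limit, no single stored value has to be large.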

Querying

• Two main interfaces: HTTP & Protocol Buffers

• The HTTP API is mainly REST: GET, PUT, DELETE

• Riak stores the key, value & metadata about the key:

• Content type, charset, encoding & link data

• Also: any custom metadata
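The following sketch shows how a value, its content type, custom metadata and links map onto a single HTTP request. It builds the request but does not send it. It assumes a local node on the default HTTP port (8098), the `/riak/<bucket>/<key>` URL style, and the `X-Riak-Meta-*` and `Link` (with `riaktag`) header conventions of the HTTP interface from this era. Check your version's HTTP API documentation before relying on them. The bucket, key and metadata names are hypothetical.

```python
import json
import urllib.request
from urllib.parse import quote

# Assumption: a local Riak node listening on the default HTTP port.
RIAK_URL = "http://127.0.0.1:8098"


def object_url(bucket: str, key: str) -> str:
    """URL for a bucket/key in the /riak/<bucket>/<key> style HTTP interface."""
    return f"{RIAK_URL}/riak/{quote(bucket, safe='')}/{quote(key, safe='')}"


def build_put(bucket, key, value, meta=None, links=()):
    """Build (but do not send) a PUT that stores a JSON value with metadata.

    meta:  custom metadata, sent as X-Riak-Meta-<Name> headers
    links: (target_bucket, target_key, tag) tuples, sent in a Link header
    """
    headers = {"Content-Type": "application/json; charset=utf-8"}
    for name, val in (meta or {}).items():
        headers[f"X-Riak-Meta-{name}"] = str(val)
    if links:
        headers["Link"] = ", ".join(
            f'</riak/{quote(b, safe="")}/{quote(k, safe="")}>; riaktag="{tag}"'
            for b, k, tag in links
        )
    return urllib.request.Request(
        object_url(bucket, key),
        data=json.dumps(value).encode("utf-8"),
        headers=headers,
        method="PUT",
    )


if __name__ == "__main__":
    req = build_put("games", "g1", {"title": "Example"},
                    meta={"Author": "ukd1"},
                    links=[("players", "p1", "owner")])
    print(req.get_method(), req.full_url)
    for name, val in req.header_items():
        print(f"{name}: {val}")
    # urllib.request.urlopen(req)  # uncomment to send it to a running node
```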

Links

• Used to store one-way relationships between objects

• Stored in object metadata

• Link-walking is implemented with MapReduce
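Link-walking follows one hop at a time. Each hop filters an object's outgoing links by bucket and tag, much like the bucket/tag pairs a Riak link-walk request takes, and Riak runs each hop as a map phase. The in-memory sketch below uses hypothetical data and only models the traversal, not Riak's API.

```python
from collections import namedtuple

Link = namedtuple("Link", ["bucket", "key", "tag"])

# Hypothetical data: (bucket, key) -> outgoing links stored in that object's metadata.
LINKS = {
    ("people", "russell"): [Link("people", "alice", "friend"),
                            Link("companies", "ukd1", "employer")],
    ("people", "alice"): [Link("people", "bob", "friend")],
    ("people", "bob"): [],
    ("companies", "ukd1"): [],
}


def walk(start, steps):
    """Follow links one hop per step.

    Each step is a (bucket, tag) filter; None matches anything. Links are
    one-way, so only the outgoing links of the current objects are followed.
    """
    frontier = [start]
    for bucket, tag in steps:
        next_frontier = []
        for obj in frontier:
            for link in LINKS.get(obj, []):
                if bucket in (None, link.bucket) and tag in (None, link.tag):
                    next_frontier.append((link.bucket, link.key))
        frontier = next_frontier
    return frontier


if __name__ == "__main__":
    # Friends of friends of "russell".
    print(walk(("people", "russell"), [("people", "friend"), ("people", "friend")]))
```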

MapReduce

• Designed for queries fast enough to run within a web-page request

• Built in

• Map / Reduce functions are written in JavaScript or Erlang

• Can do re-reduce

• Streaming MapReduce
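Riak's own phase functions are written in JavaScript or Erlang. This Python sketch only models their shape. Map emits a list, and reduce takes a list and returns a list. Riak may call a reduce function again on its own earlier output (re-reduce), so the output must be a valid input too. The word-count job and the per-node split are hypothetical.

```python
from collections import Counter


def map_word_counts(doc: str) -> list:
    """Map phase: emit a one-element list holding {word: count} for one document."""
    return [dict(Counter(doc.lower().split()))]


def reduce_word_counts(values: list) -> list:
    """Reduce phase: merge a list of {word: count} dicts into one.

    The output has the same shape as the input, so the function can safely be
    called again on its own earlier results (re-reduce).
    """
    total = Counter()
    for counts in values:
        total.update(counts)  # Counter.update adds counts rather than replacing them
    return [dict(total)]


def run_job(docs_per_node: list) -> dict:
    """Simulate nodes mapping and pre-reducing locally, then one final re-reduce."""
    partials = []
    for docs in docs_per_node:
        mapped = [item for doc in docs for item in map_word_counts(doc)]
        partials.extend(reduce_word_counts(mapped))
    return reduce_word_counts(partials)[0]


if __name__ == "__main__":
    print(run_job([["riak and cassandra", "riak"], ["cassandra"]]))
```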

Vector clocks
• Each value is tagged with a vector clock

• Riak can determine whether values:
• Are direct descendants of a single object

• Share a common parent

• Are unrelated

• In Riak each object has a vector clock

• Cassandra uses timestamps (the highest timestamp wins); problems can occur when clocks are out of sync
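A vector clock can be modelled as a map from actor to counter. Clock A descends from B if A has seen every event B has seen. If neither descends from the other, the values are concurrent. The sketch then treats concurrent clocks with some shared history as "sharing a common parent" and the rest as "unrelated", which is a simplification. Actor names are hypothetical, and Riak's real clocks also carry timestamps used for pruning.

```python
def increment(clock: dict, actor: str) -> dict:
    """Return a copy of the clock with the actor's counter bumped by one."""
    updated = dict(clock)
    updated[actor] = updated.get(actor, 0) + 1
    return updated


def descends(a: dict, b: dict) -> bool:
    """True if clock a has seen every event recorded in clock b."""
    return all(a.get(actor, 0) >= count for actor, count in b.items())


def relationship(a: dict, b: dict) -> str:
    """Classify two clocks roughly along the lines of the slide."""
    if descends(a, b) and descends(b, a):
        return "equal"
    if descends(a, b):
        return "a descends from b"
    if descends(b, a):
        return "b descends from a"
    # Concurrent: neither has seen all of the other's history.
    shared = any(min(a.get(k, 0), b.get(k, 0)) > 0 for k in a)
    return "concurrent, with shared history" if shared else "unrelated"


if __name__ == "__main__":
    parent = increment({}, "x")
    child = increment(parent, "x")
    sibling = increment(parent, "y")
    print(relationship(child, parent))
    print(relationship(child, sibling))
```

Unlike a single timestamp, this comparison never silently picks a winner between concurrent updates, and skewed wall clocks do not affect it.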

Siblings
• Siblings are different versions of the same object which Riak has not merged

• Occurs only if allow_mult is enabled on a bucket AND one of these happens:

• Concurrent writes with the same vector clock value

• Stale vector clock

• No vector clock passed
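The following toy model shows how siblings arise for one key. Each write's clock is the client's context clock bumped for a writer id; the writer ids are hypothetical. Existing versions that the new clock descends from are replaced, and all others are kept as siblings. With allow_mult off, the newest write simply wins. The model ignores Riak details such as server-side actor ids and clock pruning.

```python
def descends(a: dict, b: dict) -> bool:
    """True if clock a has seen every event recorded in clock b."""
    return all(a.get(k, 0) >= v for k, v in b.items())


class Bucket:
    def __init__(self, allow_mult: bool):
        self.allow_mult = allow_mult
        self.objects = {}  # key -> list of (clock, value) siblings

    def get(self, key):
        """Return (merged clock usable as context, list of sibling values)."""
        siblings = self.objects.get(key, [])
        merged = {}
        for clock, _ in siblings:
            for actor, n in clock.items():
                merged[actor] = max(merged.get(actor, 0), n)
        return merged, [value for _, value in siblings]

    def put(self, key, value, context=None, actor="client"):
        """Write with the vector clock read earlier (context); None means no clock."""
        clock = dict(context or {})
        clock[actor] = clock.get(actor, 0) + 1
        if not self.allow_mult:
            self.objects[key] = [(clock, value)]  # last write wins, no siblings
            return
        kept = [(c, v) for c, v in self.objects.get(key, []) if not descends(clock, c)]
        self.objects[key] = kept + [(clock, value)]


if __name__ == "__main__":
    b = Bucket(allow_mult=True)
    b.put("k", "v1", actor="A")
    ctx, _ = b.get("k")
    b.put("k", "x", ctx, actor="B")  # two writers using the same context clock...
    b.put("k", "y", ctx, actor="C")
    print(b.get("k")[1])             # ...leave two siblings
    b.put("k", "z", None, actor="D")  # no vector clock passed: another sibling
    print(b.get("k")[1])
```

Writing back with the merged clock returned by `get` resolves the siblings into one value.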

Pre & Post Commit Hooks

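• Pre-commit hooks can:

• Allow the object to be written

• Modify the object

• Fail the update

• Post-commit hooks run after the write and cannot change or fail it

• They are per-bucket (stored in the bucket properties)

• Written in JavaScript (pre-commit only) or Erlang (pre- or post-commit)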
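As we understand it, Riak's JavaScript pre-commit hooks follow a simple contract. A hook returns the object, possibly modified, to let the write proceed, or returns something like `{"fail": reason}` to reject it. The Python sketch below models a chain of such hooks. The hook functions and object layout are hypothetical and this is not Riak's hook API.

```python
def require_email(obj: dict) -> dict:
    """Hypothetical validation hook: fail the write unless an email is present."""
    if "email" not in obj["value"]:
        return {"fail": "email is required"}
    return obj


def add_default_role(obj: dict) -> dict:
    """Hypothetical hook that modifies the object before it is written."""
    value = dict(obj["value"])
    value.setdefault("role", "member")
    return {**obj, "value": value}


def run_precommit(obj: dict, hooks: list):
    """Run hooks in order; stop at the first one that fails.

    Returns (object to store, None) on success or (None, reason) on failure.
    """
    for hook in hooks:
        result = hook(obj)
        if "fail" in result:
            return None, result["fail"]
        obj = result  # later hooks see earlier modifications
    return obj, None


if __name__ == "__main__":
    hooks = [require_email, add_default_role]
    print(run_precommit({"key": "u1", "value": {"email": "a@example.com"}}, hooks))
    print(run_precommit({"key": "u2", "value": {}}, hooks))
```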

Admin

• Super simple:

• riak-admin join <node-in-cluster>

• riak-admin leave

• Backup tools are provided

Backup / restore

• riak-admin backup|restore <node> <cookie> <output_file> [node|all]

• The alternative for Bitcask is a filesystem-level backup, as it uses append-only files

• riak-admin backup is storage-engine agnostic

• riak-admin only backs up KV data, not search indexes (Riak Search)

Storage engines

• Ships with two storage engines:

• Bitcask - default, best when the keyspace fits in RAM

• InnoDB - suggested when the keyspace is larger than RAM

• Also available: Google’s LevelDB. It’s BSD licensed, recently integrated, and good for large data sets.
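Two points above follow from Bitcask's basic design. It keeps every key in an in-memory "keydir", which is why it suits keyspaces that fit in RAM. It only ever appends to its data files, which is why copying the files works as a backup. The toy store below shows just those two ideas. Real Bitcask also stores CRCs and timestamps, rotates files, writes hint files and merges old data, and none of that is modelled here.

```python
import os
import struct
import tempfile


class TinyBitcask:
    """Toy Bitcask-style store: one append-only data file plus an in-memory keydir."""

    HEADER = struct.Struct(">II")  # key length, value length (big-endian uint32)

    def __init__(self, path: str):
        self.path = path
        self.keydir = {}  # key -> (value offset, value length); every key is held in RAM
        open(path, "ab").close()  # create the file if it does not exist
        self._load()

    def _load(self):
        """Rebuild the keydir by scanning the log; later records win."""
        with open(self.path, "rb") as f:
            while True:
                header = f.read(self.HEADER.size)
                if len(header) < self.HEADER.size:
                    break
                key_len, value_len = self.HEADER.unpack(header)
                key = f.read(key_len)
                self.keydir[key] = (f.tell(), value_len)
                f.seek(value_len, os.SEEK_CUR)  # skip the value itself

    def put(self, key: bytes, value: bytes):
        """Append a record; existing bytes are never rewritten."""
        with open(self.path, "ab") as f:
            start = f.seek(0, os.SEEK_END)
            f.write(self.HEADER.pack(len(key), len(value)) + key + value)
        self.keydir[key] = (start + self.HEADER.size + len(key), len(value))

    def get(self, key: bytes):
        """One keydir lookup plus one disk read; None if the key is unknown."""
        entry = self.keydir.get(key)
        if entry is None:
            return None
        offset, length = entry
        with open(self.path, "rb") as f:
            f.seek(offset)
            return f.read(length)


if __name__ == "__main__":
    path = os.path.join(tempfile.mkdtemp(), "data.log")
    db = TinyBitcask(path)
    db.put(b"player:1", b"level 3")
    db.put(b"player:1", b"level 4")  # the old record stays in the file
    print(TinyBitcask(path).get(b"player:1"))  # prints b'level 4'
```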

Riak-Search
• Full-text search engine built on top of Riak

• Realtime

• Uses Lucene analyzers; custom ones may be written in Erlang / Java

• Supports term / field searches, boolean operators, grouping, lexical range queries and end-of-word wildcards

• Will be part of Riak by default from 1.0
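The query types listed above can all be built on an inverted index that maps each (field, term) pair to the documents containing it. The sketch below is a toy in-memory version for illustration only. It uses a trivial lowercase tokenizer in place of real analyzers, Python set operations for the boolean operators, a prefix scan for end-of-word wildcards, and string comparison for lexical ranges. It is not how Riak Search stores its index.

```python
import re
from collections import defaultdict


def analyze(text: str) -> list:
    """Stand-in analyzer: lowercase runs of letters and digits."""
    return re.findall(r"[a-z0-9]+", text.lower())


class TinyIndex:
    """Toy inverted index: (field, term) -> set of document ids."""

    def __init__(self):
        self.postings = defaultdict(set)

    def add(self, doc_id: str, fields: dict):
        for field, text in fields.items():
            for term in analyze(text):
                self.postings[(field, term)].add(doc_id)

    def term(self, field: str, term: str) -> set:
        """Exact term match, or end-of-word wildcard if the term ends in '*'."""
        if term.endswith("*"):
            prefix = term[:-1]
            return self._collect(field, lambda t: t.startswith(prefix))
        return set(self.postings.get((field, term), set()))

    def lexical_range(self, field: str, low: str, high: str) -> set:
        """Documents with any term t in the field where low <= t <= high."""
        return self._collect(field, lambda t: low <= t <= high)

    def _collect(self, field, predicate) -> set:
        hits = set()
        for (f, t), ids in self.postings.items():
            if f == field and predicate(t):
                hits |= ids
        return hits


if __name__ == "__main__":
    idx = TinyIndex()
    idx.add("d1", {"title": "Riak compared to Cassandra"})
    idx.add("d2", {"title": "Cassandra data modelling"})
    print(idx.term("title", "riak") & idx.term("title", "cassandra"))  # boolean AND
    print(idx.term("title", "cass*"))                                   # wildcard
```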

Riak > Cassandra

• Extremely simple to add or remove nodes from a cluster

• No up-front data-model setup

• REST & Protocol Buffers API access

• Commercial support from the original developers, Basho

Riak = Cassandra

• No single point of failure

• Linearly scalable

• High availability

• Eventually consistent

• You can choose your own consistency requirements

Riak < Cassandra
• CQL, an SQL-like query language

• Range / cover queries are built in (no need to write MapReduce functions)

• ‘Enterprise’ features (dc / rack awareness) are free & in the open-source build

• Wide support / training from third-party companies: DataStax / Acunu / Impetus / Onzra
http://wiki.apache.org/cassandra/ThirdPartySupport

• Cassandra is seemingly more popular & has a bigger community

• Riak's fixed partition count vs the MD5 tokens of Cassandra's RandomPartitioner: Riak's partition count can't be reconfigured later if you need to, so plan carefully with Riak!
http://wiki.basho.com/Cluster-Capacity-Planning.html
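The planning concern comes from how the ring works. As we understand Riak's design, keys are hashed with SHA-1 onto a 2^160 ring that is divided into a fixed number of equal partitions when the cluster is created. Nodes then claim whole partitions, and the N replicas of a key go to consecutive partitions. Adding nodes only moves partition ownership, while changing the partition count would change which slice every key lives in. The sketch below hashes a simple "bucket/key" string and uses round-robin claiming. Both are simplifications, not Riak's exact scheme.

```python
import hashlib

RING_TOP = 2 ** 160  # size of the SHA-1 output space


def partition_index(bucket: str, key: str, num_partitions: int) -> int:
    """Map a bucket/key onto one of num_partitions equal slices of the ring."""
    if num_partitions <= 0 or num_partitions & (num_partitions - 1):
        raise ValueError("num_partitions should be a power of two")
    digest = hashlib.sha1(f"{bucket}/{key}".encode("utf-8")).digest()
    point = int.from_bytes(digest, "big")
    return point // (RING_TOP // num_partitions)


def preference_list(bucket, key, num_partitions, n=3):
    """The N consecutive partitions (wrapping around the ring) that hold the replicas."""
    first = partition_index(bucket, key, num_partitions)
    return [(first + i) % num_partitions for i in range(n)]


def claim(num_partitions, nodes):
    """Naive round-robin assignment of partitions to nodes."""
    return {p: nodes[p % len(nodes)] for p in range(num_partitions)}


if __name__ == "__main__":
    parts = 64
    replicas = preference_list("games", "g1", parts)
    for nodes in (["n1", "n2", "n3"], ["n1", "n2", "n3", "n4"]):
        owners = claim(parts, nodes)
        # The key's partitions stay the same; only the owning nodes change.
        print(replicas, [owners[p] for p in replicas])
```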

Further reading

• Basho’s slide deck: http://wiki.basho.com/Slide-Decks.html

• Commit hooks: http://wiki.basho.com/Pre--and-Post-Commit-Hooks.html

• Riak / Cassandra: http://wiki.basho.com/Riak-Compared-to-Cassandra.html
