2. What is Cassandra?
● A distributed, columnar database
● Data model inspired by Google BigTable (2006)
● Distribution model inspired by Amazon Dynamo (2007)
● Open Sourced by Facebook in 2008
● Monolithic Kernel written in Java
● Used by Digg, Facebook, Twitter, Reddit, Rackspace,
CloudKick and others
3. Etymology
● In Greek mythology, Cassandra (also known as Alexandra) was
the daughter of King Priam and Queen Hecuba of Troy
● Her beauty caused Apollo to grant her the gift of prophecy
● When she did not return his love, Apollo placed a curse on her
so that no one would ever believe her predictions
4. Why Cassandra?
● Minimal Administration
● No Single Point of Failure
● Scale Horizontally
● Writes are durable
● Optimized for writes
● Consistency is flexible, can be updated
online
● Schema is flexible, can be updated online
● Handles failure gracefully
● Replication is easy, Rack and DC aware
9. Cassandra is good at
Reading data from a row in
the order it is stored, i.e. by
Column Name!
Understand the queries your
application requires before
building the data model
10. Consistent Hashing
Load Balancing in a changing world ...
● Evenly map keys to nodes
● Minimize key movement when
nodes join or leave
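The two goals above can be illustrated with a minimal sketch of a hash ring. The names `token_for` and `HashRing` are hypothetical, one token per node is assumed, and MD5 with a 32-bit ring is an illustrative choice, not Cassandra's actual partitioner.

```python
import bisect
import hashlib


def token_for(key: str, ring_size: int = 2**32) -> int:
    """Map a string to a ring position using MD5 (illustrative choice)."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % ring_size


class HashRing:
    """Toy ring: each node owns the token range that ends at its own token."""

    def __init__(self, nodes=()):
        self._tokens = []  # sorted node tokens
        self._owner = {}   # node token -> node name
        for node in nodes:
            self.add(node)

    def add(self, node: str) -> None:
        token = token_for("node:" + node)
        bisect.insort(self._tokens, token)
        self._owner[token] = node

    def remove(self, node: str) -> None:
        token = token_for("node:" + node)
        self._tokens.remove(token)
        del self._owner[token]

    def owner(self, key: str) -> str:
        # First node token >= key token, wrapping to the start of the ring.
        i = bisect.bisect_left(self._tokens, token_for(key))
        return self._owner[self._tokens[i % len(self._tokens)]]


keys = [f"key-{i}" for i in range(1000)]
ring = HashRing(["node1", "node2", "node3"])
before = {k: ring.owner(k) for k in keys}

ring.add("node4")
after = {k: ring.owner(k) for k in keys}

# Keys whose owner changed when node4 joined.
moved = [k for k in keys if before[k] != after[k]]
print(f"{len(moved)} of {len(keys)} keys moved, all of them to node4")
```

Only keys that fall into the new node's range change owner; no key moves between the existing nodes. With a single token per node the split is not guaranteed to be even, which is one motivation for giving each node many tokens.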
12. Keys and Tokens?
[Diagram: token range 0 to 99; ‘fop’ at token 10, ‘foo’ at token 90]
MD5 hashing for ‘fop’ is 89de73aaae8c956fb7c9379be7978e5b
MD5 hashing for ‘foo’ is d3b07384d113edec49eaa6238ad5ff00
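A small sketch of turning a key into a token with MD5. The helper names `md5_hex` and `toy_token` are hypothetical, and the modulo reduction to a 100-slot ring is an assumption made only to get small numbers; it is not how Cassandra derives tokens and will not reproduce the illustrative token values on the slides. Note that the digest listed above for ‘foo’ appears to be the hash of the key followed by a newline (what `echo foo | md5sum` would produce); the bare key hashes to a different value.

```python
import hashlib


def md5_hex(data: bytes) -> str:
    """Return the MD5 digest of the given bytes as a hex string."""
    return hashlib.md5(data).hexdigest()


def toy_token(key: str, ring_size: int = 100) -> int:
    """Reduce the 128-bit digest to a token in [0, ring_size) (toy reduction)."""
    return int(md5_hex(key.encode("utf-8")), 16) % ring_size


print(md5_hex(b"foo"))    # digest of the bare key
print(md5_hex(b"foo\n"))  # digest of the key plus a trailing newline
print(toy_token("foo"), toy_token("fop"))
```

Keys that differ by one character ('foo' and 'fop') produce unrelated digests, which is what spreads neighbouring keys around the ring.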
15. Token Ranges With Virtual Nodes in 1.2
[Diagram: ring of token ranges shared among Node 1, Node 2 and Node 3]
● Easier to enlarge or
shrink the cluster
● The cluster can grow in
steps of 1 node
● Node recovery is much
faster
16. Replication Strategy
Node 1 (token 0): owns range 76-0
Node 2 (token 25): owns range 1-25
Node 3 (token 50): owns range 26-50
Node 4 (token 75): owns range 51-75
‘foo’: token 90
Selects Replication Factor number of nodes
for a row.
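A minimal sketch of replica selection for the four-node ring on this slide. It assumes the simplest placement, walking the ring clockwise from the node that owns the token (in the spirit of SimpleStrategy); NetworkTopologyStrategy additionally takes racks and data centers into account. The function name `replicas_for` is hypothetical.

```python
import bisect


def replicas_for(token, ring, replication_factor):
    """Return replication_factor node names, clockwise from the token's owner.

    ring is a list of (node_token, node_name); a node owns the range that
    ends at its own token.
    """
    ring = sorted(ring)
    tokens = [t for t, _ in ring]
    # First node token >= the key token, wrapping past the highest token.
    start = bisect.bisect_left(tokens, token) % len(ring)
    count = min(replication_factor, len(ring))
    return [ring[(start + i) % len(ring)][1] for i in range(count)]


ring = [(0, "Node 1"), (25, "Node 2"), (50, "Node 3"), (75, "Node 4")]
print(replicas_for(90, ring, 3))  # ['Node 1', 'Node 2', 'Node 3']
```

Token 90 falls in the range 76-0, so Node 1 is the first replica and the next two nodes on the ring hold the other copies.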
23. The Client and the Coordinator
[Diagram: Client and a ring of Node 1, Node 2, Node 3 and Node 4; key ‘foo’, token 90]
24. Multi DC Client and Coordinator
[Diagram: Client and a ring of Node 1, Node 2, Node 3 and Node 4, plus Node 10 and Node 20; key ‘foo’, token 90]
25. Gossip
Nodes share information with a
small number of neighbours,
who share information with
another small number of
neighbours …
● Used for intra-cluster
communication
● Routes client requests
● Detects node failure
● Initial peers are listed as seeds in the
config file.
27. Consistency
● CAP theorem
○ Trade consistency for availability
○ Consistency is a choice
* It doesn't matter if you are good at something, as long as you are consistent.
[Diagram: during a Partition, choose Consistency OR Availability]
28. Consistency Level
● Specified for each request
● Number of nodes to wait for
WRITE
Level           | Description
----------------+------------------------------------
ZERO            | Cross fingers
ANY             | 1st to respond (HH: hinted handoff)
ONE, TWO, THREE | nth to respond
QUORUM          | N/2+1 replicas
ALL             | All replicas
READ
Level           | Description
----------------+------------------------------------
ZERO            | N/A
ANY             | N/A
ONE, TWO, THREE | nth to respond
QUORUM*         | N/2+1 replicas
ALL             | All replicas
* QUORUM, LOCAL_QUORUM, EACH_QUORUM
29. Write ‘foo’ at Quorum with Hinted Handoff
[Diagram: Client writes ‘foo’ (token 90) to the ring of Node 1, Node 2, Node 3 and Node 4; Node 3 is down; Node 4 holds ‘foo’ for Node 3]
30. Read ‘foo’ at Quorum
[Diagram: Client reads ‘foo’ (token 90) from the ring of Node 1, Node 2, Node 3 and Node 4; Node 3 is down; Node 4 holds ‘foo’ for Node 3]
31. Column Timestamps
● Are used to resolve differences
● Stored for each Column Value
● 64-bit Integers
Column    | Node 1                    | Node 2                    | Node 3
----------+---------------------------+---------------------------+-----------------------
Vegetable | ‘cucumber’ (timestamp 10) | ‘cucumber’ (timestamp 10) | <missing>
Fruit     | ‘Apple’ (timestamp 10)    | ‘banana’ (timestamp 15)   | ‘Apple’ (timestamp 10)
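A minimal sketch of how the per-column timestamps on this slide could resolve differences between replicas: the value with the highest timestamp wins. The function name `reconcile` is hypothetical, `None` stands for a replica with no value, and tie-breaking between equal timestamps is not modelled.

```python
def reconcile(replica_values):
    """Return the (value, timestamp) pair with the highest timestamp.

    replica_values holds one entry per replica; None marks a missing value.
    """
    present = [v for v in replica_values if v is not None]
    if not present:
        return None
    return max(present, key=lambda pair: pair[1])


# One entry per node: Node 1, Node 2, Node 3.
vegetable = [("cucumber", 10), ("cucumber", 10), None]
fruit = [("Apple", 10), ("banana", 15), ("Apple", 10)]

print(reconcile(vegetable))  # ('cucumber', 10)
print(reconcile(fruit))      # ('banana', 15)
```

For Fruit, the newer write on Node 2 wins even though two of the three replicas still hold the older value.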
32. Strong Consistency
W + R > N
#Write Nodes + #Read Nodes > Replication Factor
● QUORUM Read + QUORUM Write
● ALL Read + ONE Write
● ONE Read + ALL Write
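The rule W + R > N can be checked with a few lines of arithmetic. This is a sketch under stated assumptions: the helper names are hypothetical, QUORUM is taken as N/2+1 with integer division, and only the levels ONE, TWO, THREE, QUORUM and ALL are modelled.

```python
def quorum(replication_factor: int) -> int:
    """N/2 + 1 using integer division."""
    return replication_factor // 2 + 1


def replicas_required(level: str, replication_factor: int) -> int:
    """Number of replicas that must respond for a given consistency level."""
    fixed = {"ONE": 1, "TWO": 2, "THREE": 3}
    if level in fixed:
        return fixed[level]
    if level == "QUORUM":
        return quorum(replication_factor)
    if level == "ALL":
        return replication_factor
    raise ValueError(f"unsupported consistency level: {level}")


def is_strongly_consistent(write_level: str, read_level: str, replication_factor: int) -> bool:
    """True when W + R > N, so every read overlaps the latest write."""
    w = replicas_required(write_level, replication_factor)
    r = replicas_required(read_level, replication_factor)
    return w + r > replication_factor


for write_level, read_level in [("QUORUM", "QUORUM"), ("ONE", "ALL"), ("ALL", "ONE"), ("ONE", "ONE")]:
    print(write_level, read_level, is_strongly_consistent(write_level, read_level, 3))
```

With a replication factor of 3, QUORUM is 2, so a QUORUM write plus a QUORUM read touches 4 replicas in total and at least one replica sees both. ONE plus ONE touches only 2 and may miss the latest write.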
34. Write Path
● Append to Commit Log File
● Merge Columns into Memtable
● Asynchronously flush Memtable to a
new file (never update existing files)
● Data is stored in immutable files called
SSTables (Sorted String Tables)
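The steps above can be sketched as a toy store. Everything here is an illustrative assumption: the class `ToyStore`, the JSON file formats and the row-count flush threshold are hypothetical, and the flush runs synchronously for simplicity, whereas the slide describes it as asynchronous.

```python
import json
import os
import tempfile


class ToyStore:
    """Toy write path: commit log append, memtable merge, flush to a new file."""

    def __init__(self, directory, flush_threshold=2):
        self.directory = directory
        self.flush_threshold = flush_threshold  # rows in memtable before a flush
        self.commit_log = os.path.join(directory, "commitlog.txt")
        self.memtable = {}   # row key -> {column: (value, timestamp)}
        self.sstables = []   # paths of flushed files, oldest first

    def write(self, row_key, column, value, timestamp):
        # 1. Append to the commit log so the write survives a restart.
        with open(self.commit_log, "a", encoding="utf-8") as log:
            log.write(json.dumps([row_key, column, value, timestamp]) + "\n")
        # 2. Merge the column into the memtable; the newest timestamp wins.
        row = self.memtable.setdefault(row_key, {})
        current = row.get(column)
        if current is None or timestamp >= current[1]:
            row[column] = (value, timestamp)
        # 3. Flush once the memtable is large enough.
        if len(self.memtable) >= self.flush_threshold:
            self.flush()

    def flush(self):
        # Always write a new file; existing files are never updated.
        path = os.path.join(self.directory, f"sstable-{len(self.sstables) + 1}.json")
        with open(path, "w", encoding="utf-8") as out:
            json.dump(self.memtable, out, sort_keys=True)
        self.sstables.append(path)
        self.memtable = {}


with tempfile.TemporaryDirectory() as tmp:
    store = ToyStore(tmp)
    store.write("foo", "fruit", "apple", 10)
    store.write("bar", "vegetable", "cucumber", 15)  # second row triggers a flush
    flushed = list(store.sstables)
    remaining = dict(store.memtable)
    with open(store.commit_log, encoding="utf-8") as log:
        logged_writes = len(log.readlines())

print(flushed, remaining, logged_writes)
```

Because writes only append to the log and update memory, no random disk I/O is needed on the write path, which fits the "optimized for writes" claim earlier in the deck.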
36. Read Path
Memory (one set per SSTable): Bloom Filter (cache), Index/Key Cache
Disk: SSTable-1.Data.db with SSTable-1-Index.db
foo:
  fruit (ts:10): apple
  vegetable (ts:15): cucumber
  ….
Disk: SSTable-2.Data.db with SSTable-2-Index.db
foo:
  fruit (ts:10): apple
  vegetable (ts:10): Pepper
  ….
37. Compactions
Compaction merges truth from multiple
SSTables into one SSTable with the same
truth
(Manual and continuous background process)
Column    | SSTable 1                 | SSTable 2                 | New
----------+---------------------------+---------------------------+--------------------------
Vegetable | ‘cucumber’ (timestamp 10) | ‘cucumber’ (timestamp 10) | ‘cucumber’ (timestamp 10)
Fruit     | ‘Apple’ (timestamp 10)    | <tombstone> (timestamp 15) | <tombstone> (timestamp 15)
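A minimal sketch of the merge shown in the table, assuming each SSTable is a mapping from column name to a (value, timestamp) pair and that the newest timestamp wins per column. The names `compact` and `TOMBSTONE` are hypothetical. As in the table, the tombstone is carried into the new SSTable; when tombstones are eventually purged is not modelled here.

```python
TOMBSTONE = object()  # marker for a deleted column


def compact(*sstables):
    """Merge several SSTables into one, keeping the newest entry per column."""
    merged = {}
    for table in sstables:
        for column, (value, timestamp) in table.items():
            if column not in merged or timestamp > merged[column][1]:
                merged[column] = (value, timestamp)
    # SSTables are sorted, so emit the columns in name order.
    return dict(sorted(merged.items()))


sstable_1 = {"Vegetable": ("cucumber", 10), "Fruit": ("Apple", 10)}
sstable_2 = {"Vegetable": ("cucumber", 10), "Fruit": (TOMBSTONE, 15)}

new = compact(sstable_1, sstable_2)
for column, (value, timestamp) in new.items():
    shown = "<tombstone>" if value is TOMBSTONE else value
    print(column, shown, timestamp)
```

The inputs are left untouched and a new mapping is returned, mirroring the rule that existing SSTable files are never updated in place.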
39. Managing Cassandra
● Single configuration file
/etc/cassandra/cassandra.yaml
● Single control command
/usr/bin/nodetool
● Monitoring done by DataStax OpsCenter
42. Client (API) Choices
● Thrift, original and still fully supported API:
○ Java: Thrift, Hector, Astyanax, DataStax Driver, Kundera…
○ Python: Pycassa, Telephus, …
○ Ruby: Fauna
○ PHP: PHP Client Library
○ C#
○ Node.js
○ Go
○ Simba ODBC
○ C++: LibQtCassandra
○ ORM
○ ….
● CQL3: a table-oriented, schema-driven data model, similar to SQL
43. CQL3 Create KeySpace
● Using CQL3 via cqlsh command tool ($CASSANDRA_HOME/bin/cqlsh):
● Create a new Keyspace with a replication factor of 3 and NetworkTopologyStrategy
CREATE KEYSPACE
kenshoo_cass_fans
WITH replication =
{'class':'NetworkTopologyStrategy',
'us_east_dc':3};
44. CQL3 Working with Tables
● CQL3 Example
● Table is a sparse collection of well known ordered columns
CREATE TABLE User
(
user_name text,
password text,
real_name text,
PRIMARY KEY (user_name)
);
---------------------------------------------------------
INSERT INTO User
(user_name, password, real_name)
VALUES
('nader','sekr8t','MR NADER');
---------------------------------------------------------
SELECT * FROM User WHERE user_name = 'nader';
 user_name | password | real_name
-----------+----------+-----------
     nader |   sekr8t |  MR NADER