An introduction to Cassandra as well as an example of accessing Cassandra from Clojure.
Includes an introduction to Cassandra's cluster architecture and data model. The code for the examples is available at: https://github.com/nickmbailey/clojure-cassandra-demo
2. Who am I?
• OpsCenter Architect
• Monitoring/management tool for Cassandra
• Organizer of Austin Cassandra Users
• http://www.meetup.com/Austin-Cassandra-Users/
• Third Thursday each month. Come join!
• Working with Cassandra for 4 years
7. Cassandra - Cluster Architecture
• All nodes participate in a cluster
• Shared nothing
• Add or remove as needed
• More capacity? Add a server
8. Cassandra - Data Distribution
[Figure: token ring labeled 0, 25, 50, 75]
• Each node owns 1 or more “tokens”
• Each piece of data has a “partition key”
• Partition key is hashed to determine token
• Hashes:
• Murmur3 (default)
• MD5
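The token lookup described above can be sketched in plain Clojure. This is only an illustration: the four node names, the `toy-token` function, and the 0 to 99 token range are assumptions made up for this example, and Clojure's built-in `hash` stands in for Cassandra's real partitioner.

```clojure
;; Hypothetical four-node ring: each node owns the range ending at its token.
(def ring (sorted-map 0 "node-a" 25 "node-b" 50 "node-c" 75 "node-d"))

(defn toy-token
  "Maps a partition key to a token in [0, 100).
  A stand-in for Murmur3, not Cassandra's actual hash."
  [partition-key]
  (mod (hash partition-key) 100))

(defn owner
  "Returns the node owning `token`: the first node whose token is >= `token`,
  wrapping around to the lowest token when none is found."
  [ring token]
  (let [[_ node] (or (first (subseq ring >= token))
                     (first ring))]
    node))

;; Usage: hash the partition key, then walk the ring.
(owner ring (toy-token "nickmbailey"))
```

Keeping the ring in a sorted map makes the lookup a single `subseq` call, and adding a node is just an `assoc` of a new token.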
9. Cassandra - Replication
• Client writes to any node
• Node coordinates with replicas
• Data replicated in parallel
• Replication factor (RF): how many copies of your data?
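One way to picture replica placement is to walk the ring clockwise from the data's token and take the next RF distinct nodes. The sketch below assumes that simple placement rule and a hypothetical four-node ring; real clusters may use a different replication strategy.

```clojure
;; Hypothetical four-node ring keyed by token.
(def ring (sorted-map 0 "node-a" 25 "node-b" 50 "node-c" 75 "node-d"))

(defn replicas
  "Returns the `rf` distinct nodes found by walking the ring clockwise
  from `token`, wrapping around at the end of the ring."
  [ring token rf]
  (->> (concat (subseq ring >= token) ring)
       (map val)
       distinct
       (take rf)))

(replicas ring 60 3) ;; => ("node-d" "node-a" "node-b")
```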
10. Cassandra - Failure Modes
• Consistency level
• How many nodes?
• ONE/QUORUM/ALL
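The relationship between consistency level and failure tolerance comes down to simple arithmetic: a quorum is a majority of the replicas. A minimal sketch, where the function names and the keyword spelling of the levels are assumptions for illustration:

```clojure
(defn quorum
  "Majority of replicas for a given replication factor."
  [replication-factor]
  (inc (quot replication-factor 2)))

(defn required-acks
  "How many replicas must respond for the request to succeed."
  [consistency-level replication-factor]
  (case consistency-level
    :one    1
    :quorum (quorum replication-factor)
    :all    replication-factor))

(defn tolerated-failures
  "How many replicas can be down while requests still succeed."
  [consistency-level replication-factor]
  (- replication-factor
     (required-acks consistency-level replication-factor)))

(tolerated-failures :quorum 3) ;; => 1
(tolerated-failures :all 3)    ;; => 0
```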
11. Cassandra - Geographically Distributed
• Client writes local
• Data syncs across WAN
• Replication Factor per DC
• Consistency Level
• LOCAL_QUORUM
[Figure: two datacenters, Datacenter East and Datacenter West]
15. Data Types
cqlsh:clojure_cassandra_demo> help types
CQL types recognized by this version of cqlsh:
ascii
bigint
blob
boolean
counter
decimal
double
float
inet
int
list
map
set
text
timestamp
timeuuid
uuid
varchar
varint
18. Approaching Data Modeling
• Model your queries, not your data
• Generally, optimize for reads
• Denormalize!
• Iterate!
19. Basic Last.fm Clone
• See songs that user X has listened to recently
• See user X’s favorite songs in a specific month
• See who has recently listened to artist Y
• See artist Y’s most popular songs in a specific week
20. Basic Last.fm Clone
• See songs that user X has listened to recently
• One of the most common patterns/data models
• Time series
• Immutable (good fit for Clojure!)
21. Basic Last.fm Clone
• See songs that user X has listened to recently
SELECT song, artist, played_at
FROM user_history
WHERE username = 'nickmbailey'
ORDER BY played_at DESC;
• Partition key = ‘username’
• Clustering key = ‘played_at’
22. Basic Last.fm Clone
• See songs that user X has listened to recently
CREATE TABLE user_history (
    username text,
    played_at timestamp,
    album text,
    artist text,
    song text,
    PRIMARY KEY (username, played_at)
) WITH CLUSTERING ORDER BY (played_at DESC);
23. Basic Last.fm Clone
• See songs that user X has listened to recently
• This table has a “bad” partition key: each user's entire history lives in one ever-growing partition
CREATE TABLE user_history (
    username text,
    played_at timestamp,
    album text,
    artist text,
    song text,
    PRIMARY KEY (username, played_at)
) WITH CLUSTERING ORDER BY (played_at DESC);
24. Basic Last.fm Clone
• See songs that user X has listened to recently
• Much better partition key: one partition per user per month
CREATE TABLE user_history (
    username text,
    year_and_month text,
    played_at timestamp,
    album text,
    artist text,
    song text,
    PRIMARY KEY ((username, year_and_month), played_at)
) WITH CLUSTERING ORDER BY (played_at DESC);
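With a composite partition key, the client has to compute the `year_and_month` bucket for every write and read. The sketch below shows one way to derive it with `java.time` (Java 8+). The function names and the choice to bucket in UTC are assumptions for this example; the "yyyy-MM" format matches the sample values such as `2014-06`.

```clojure
(import '(java.time Instant YearMonth ZoneOffset)
        '(java.time.format DateTimeFormatter))

;; Formatter for the bucket column; UTC is an assumption of this sketch.
(def ^DateTimeFormatter bucket-format
  (.withZone (DateTimeFormatter/ofPattern "yyyy-MM") ZoneOffset/UTC))

(defn year-and-month
  "Returns the year_and_month bucket (for example \"2014-06\") for a play time."
  [^Instant played-at]
  (.format bucket-format played-at))

(defn recent-buckets
  "Returns the `n` most recent buckets, newest first, starting at `start`."
  [^YearMonth start n]
  (map #(str (.minusMonths start %)) (range n)))

(year-and-month (Instant/parse "2014-06-30T22:13:54Z")) ;; => "2014-06"
(recent-buckets (YearMonth/of 2014 7) 3) ;; => ("2014-07" "2014-06" "2014-05")
```

Reading "recent" history then means querying the newest bucket first and falling back to older buckets until enough rows have been collected.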
25. Basic Last.fm Clone
• See songs that user X has listened to recently
cqlsh:clojure_cassandra_demo> select * from user_history limit 5;
username | year_and_month | played_at | album | artist | song
-------------+----------------+--------------------------+--------------------------+--------------------------+-------------------------
nickmbailey | 2014-06 | 2014-06-30 17:13:54-0500 | Once More 'Round The Sun | Mastodon | Halloween
nickmbailey | 2014-06 | 2014-06-30 17:08:53-0500 | Once More 'Round The Sun | Mastodon | Ember City
b_hastings | 2014-06 | 2014-06-30 12:57:12-0500 | Buena Vista Social Club | Buena Vista Social Club | Chan Chan
zack_smith | 2014-07 | 2014-07-30 12:49:35-0500 | Awake Remix | Tycho | Awake (Com Truise Remix)
zack_smith | 2014-03 | 2014-03-30 12:44:50-0500 | Awake Remix | Tycho | Awake
• Partition key: unordered
• Clustering key: ordered
26. Basic Last.fm Clone
• See user X’s favorite songs in a specific month
SELECT song, artist, play_count
FROM user_history
WHERE username = 'nickmbailey' AND month = 'July'
ORDER BY play_count DESC;
• Partition key = ‘username’, ‘month’
• Clustering key = ‘play_count’?
• Counters are a special case
27. Counters
• A counter cannot be part of the PRIMARY KEY
• No ordering based on counter value
• All non-counter columns must be part of the PRIMARY KEY
• Limitations due to the storage format
28. Basic Last.fm Clone
• See user X’s favorite songs in a specific month
CREATE TABLE user_song_counts (
    username text,
    year_and_month text,
    artist text,
    song text,
    play_count counter,
    PRIMARY KEY ((username, year_and_month), artist, song)
);
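Counter columns are changed with an UPDATE that increments the current value rather than with an INSERT. A sketch of the statement for this table, kept as a Clojure string so it can be handed to whichever client is in use. The var name, the `play->values` helper, and the shape of the play map are assumptions for illustration.

```clojure
;; Parameterized CQL for recording one play; every primary key column
;; must appear in the WHERE clause.
(def increment-play-count
  (str "UPDATE user_song_counts "
       "SET play_count = play_count + 1 "
       "WHERE username = ? AND year_and_month = ? "
       "AND artist = ? AND song = ?;"))

(defn play->values
  "Orders the fields of a play map to match the ? placeholders above."
  [{:keys [username year-and-month artist song]}]
  [username year-and-month artist song])

(play->values {:username "nickmbailey"
               :year-and-month "2014-07"
               :artist "Beck"
               :song "Blue Moon"})
;; => ["nickmbailey" "2014-07" "Beck" "Blue Moon"]
```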
29. Basic Last.fm Clone
• See user X’s favorite songs in a specific month
• Results unordered
• Client will have to do the sorting
cqlsh:clojure_cassandra_demo> select * from user_song_counts where username = 'nickmbailey' and year_and_month = '2014-07';
username | year_and_month | artist | song | count
-------------+----------------+----------+-----------------------------------+-------
nickmbailey | 2014-07 | Amos Lee | Tricksters, Hucksters, And Scamps | 10
nickmbailey | 2014-07 | Beck | Blackbird Chain | 1
nickmbailey | 2014-07 | Beck | Blue Moon | 4
nickmbailey | 2014-07 | Cherub | <3 | 12
nickmbailey | 2014-07 | Cherub | Chocolate Strawberries | 6
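Because the rows come back in clustering order (artist, then song) rather than by count, the "favorite songs" ranking happens on the client. A minimal sketch, assuming each row arrives as a map with `:artist`, `:song`, and `:play_count` keys; the `top-songs` name is made up for this example, and the data is copied from the output above.

```clojure
;; Rows as a client might return them, in clustering order.
(def rows
  [{:artist "Amos Lee" :song "Tricksters, Hucksters, And Scamps" :play_count 10}
   {:artist "Beck"     :song "Blackbird Chain"                   :play_count 1}
   {:artist "Beck"     :song "Blue Moon"                         :play_count 4}
   {:artist "Cherub"   :song "<3"                                :play_count 12}
   {:artist "Cherub"   :song "Chocolate Strawberries"            :play_count 6}])

(defn top-songs
  "Returns the `n` most played rows, highest count first."
  [rows n]
  (->> rows
       (sort-by :play_count >)
       (take n)))

(map :song (top-songs rows 2))
;; => ("<3" "Tricksters, Hucksters, And Scamps")
```

This stays cheap as long as a single partition (one user, one month) holds a modest number of songs.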
30. Basic Last.fm Clone
• See who has recently listened to artist Y
CREATE TABLE artist_history (
    artist text,
    year_and_week text,
    played_at timestamp,
    album text,
    song text,
    username text,
    PRIMARY KEY ((artist, year_and_week), played_at)
) WITH CLUSTERING ORDER BY (played_at DESC);
31. Basic Last.fm Clone
• See artist Y’s most popular songs in a specific week
CREATE TABLE artist_song_counts (
    artist text,
    year_and_week text,
    album text,
    song text,
    play_count counter,
    PRIMARY KEY ((artist, year_and_week), album, song)
);
40. Session Object
• A Session is associated with a keyspace
• Allows interacting with multiple keyspaces
(def cluster (alia/cluster {:contact-points ["localhost"]}))
;; Session not bound to any keyspace
(def session (alia/connect cluster))
;; Or: session bound to a specific keyspace
(def session (alia/connect cluster :my_keyspace))
41. Querying
• Multiple ways to query
• alia/execute
• Synchronous, block on result
• alia/execute-async
• Returns a Lamina result-channel (basically, a promise)
• Optional success/error callbacks
• alia/execute-chan
• Returns a core.async channel
• We won’t dive into core.async now
42. Prepared Statements
• Statements can be prepared server side
• Better performance for common queries
(def prepared-statement
  (alia/prepare
    session
    "select * from users where user_name=?;"))
43. What else?
• See github and docs
• https://github.com/mpenet/alia
• http://mpenet.github.io/alia/qbits.alia.html