2. Shameless self-promotion!
@doanduyhai
Duy Hai DOAN
Cassandra technical advocate
• talks, meetups, confs
• open-source devs (Achilles, …)
• Europe technical point of contact
☞ duy_hai.doan@datastax.com
• production troubleshooting
3. Datastax!
• Founded in April 2010
• We drive Apache Cassandra™
• 400+ customers (25 of the Fortune 100), 200+ employees
• Home to Cassandra chair & most committers (≈80%)
• Headquartered in the San Francisco Bay Area
• EU headquarters in London, offices in France and Germany
4. Agenda!
Architecture
• Cluster, Replication, Consistency
Data model
• Last Write Win (LWW), CQL basics, From SQL to CQL
Dev Center Demo
DSE overview
CQL In Depth (time permitting)
5. Cassandra history!
NoSQL database
• created at Facebook
• open-sourced since 2008
• current version = 2.1
• column-oriented ☞ distributed table
12. Cassandra architecture!
Cluster layer
• Amazon Dynamo paper
• masterless architecture
Data-store layer
• Google Bigtable paper
• columns / column families
13. Cassandra architecture!
Each node stacks the same layers, top to bottom:
• API (CQL & RPC)
• CLUSTER (DYNAMO)
• DATA STORE (BIG TABLES)
• DISKS
A client request can enter through the API layer of any node (Node1, Node2, …).
14. Data distribution!
Random partitioner: token = hash(#partition)
Hash range: ]-X, X], with X = 2^64/2 = 2^63
(ring diagram: the token range is spread over 8 nodes, n1 … n8)
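The token computation can be sketched in a few lines of Python (illustrative only: real Cassandra uses the Murmur3 partitioner; md5 stands in here, and the function name is made up):

```python
import hashlib

X = 2**63  # the slide's "huge number": tokens live in ]-X, X]

def token(partition_key: str) -> int:
    """Hash a partition key to a 64-bit signed token (md5 stand-in for Murmur3)."""
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big", signed=True)

# The same key always lands on the same token, hence on the same node.
assert token("jdoe") == token("jdoe")
assert -X <= token("jdoe") < X
```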
15. Token Ranges!
A: ]0, X/8]
B: ] X/8, 2X/8]
C: ] 2X/8, 3X/8]
D: ] 3X/8, 4X/8]
E: ] 4X/8, 5X/8]
F: ] 5X/8, 6X/8]
G: ] 6X/8, 7X/8]
H: ] 7X/8, X]
(ring diagram: ranges A through H assigned clockwise to nodes n1 through n8)
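As a sketch of how a token maps to one of the eight ranges above (Python, using bisect over the upper bounds; the slide only splits the positive half ]0, X], which is kept here for simplicity):

```python
from bisect import bisect_left

X = 2**63
LABELS = "ABCDEFGH"
UPPER_BOUNDS = [(i + 1) * X // 8 for i in range(8)]  # X/8, 2X/8, ..., X

def owning_range(token: int) -> str:
    """Label of the range ]lower, upper] containing the token."""
    assert 0 < token <= X
    return LABELS[bisect_left(UPPER_BOUNDS, token)]

assert owning_range(1) == "A"
assert owning_range(X // 8) == "A"      # upper bound is inclusive: ]0, X/8]
assert owning_range(X // 8 + 1) == "B"
assert owning_range(X) == "H"
```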
17. Failure tolerance!
Replication Factor (RF) = 3
(ring diagram: each token range lives on 3 consecutive nodes; equivalently, each node stores 3 ranges, e.g. {B, A, H}, {C, B, A} and {D, C, B} on neighbouring nodes)
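The placement rule pictured above (each range replicated on RF consecutive nodes clockwise) can be sketched in Python; this SimpleStrategy-like walk is illustrative, real deployments also use rack- and datacenter-aware strategies:

```python
NODES = [f"n{i}" for i in range(1, 9)]  # the 8-node ring from the slides
RF = 3                                  # Replication Factor

def replicas(primary_index: int, nodes=NODES, rf=RF):
    """The primary node plus the next rf - 1 nodes clockwise on the ring."""
    return [nodes[(primary_index + i) % len(nodes)] for i in range(rf)]

assert replicas(0) == ["n1", "n2", "n3"]  # range A lives on n1, n2, n3
assert replicas(7) == ["n8", "n1", "n2"]  # the ring wraps around
```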
18. Coordinator node!
Incoming requests (read/write)
The coordinator node handles the request
Every node can be a coordinator ☞ masterless
(ring diagram: the client request reaches the coordinator, which forwards it to the 3 replicas)
19. Consistency!
Tunable at runtime
• ONE
• QUORUM (strict majority w.r.t. RF)
• ALL
Applies to both reads & writes
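QUORUM is simply a strict majority with respect to RF; a one-line sketch:

```python
def quorum(rf: int) -> int:
    """Strict majority of replicas w.r.t. the Replication Factor."""
    return rf // 2 + 1

assert quorum(1) == 1
assert quorum(3) == 2
assert quorum(5) == 3
```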
20. Write consistency!
Write ONE
• write request sent to all replicas in parallel (//)
(ring diagram: the coordinator sends the write to replicas 1, 2 and 3 in parallel)
21. Write consistency!
Write ONE
• write request sent to all replicas in parallel
• wait for ONE ack (the fastest, e.g. 5 μs) before returning to the client
22. Write consistency!
Write ONE
• write request sent to all replicas in parallel
• wait for ONE ack before returning to the client
• the other acks (e.g. after 10 μs and 120 μs) arrive later, asynchronously
23. Write consistency!
Write QUORUM
• write request sent to all replicas in parallel
• wait for QUORUM acks before returning to the client
• the other acks arrive later, asynchronously
24. Read consistency!
Read ONE
• read from one node among all replicas
25. Read consistency!
Read ONE
• read from one node among all replicas
• the coordinator contacts the fastest replica (based on latency statistics)
26. Read consistency!
Read QUORUM
• read data from the fastest replica
27. Read consistency!
Read QUORUM
• read data from the fastest replica
• AND request digests from other replicas to reach QUORUM
28. Read consistency!
Read QUORUM
• read data from the fastest replica
• AND request digests from other replicas to reach QUORUM
• return the most up-to-date data to the client
29. Read consistency!
Read QUORUM
• read data from the fastest replica
• AND request digests from other replicas to reach QUORUM
• return the most up-to-date data to the client
• repair stale replicas if a digest mismatches (read repair)
39. Consistency summary!
ONE read + ONE write
☞ still available for reads/writes even with N-1 replicas down
QUORUM read + QUORUM write
☞ still available for reads/writes even with one replica down
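The summary rests on the overlap rule: a read is guaranteed to see the latest write whenever the read and write replica sets must intersect, i.e. R + W > RF. A sketch (function names are mine):

```python
def quorum(rf: int) -> int:
    """Strict majority of replicas w.r.t. the Replication Factor."""
    return rf // 2 + 1

def overlapping(read_count: int, write_count: int, rf: int) -> bool:
    """True when any read replica set must intersect any write replica set."""
    return read_count + write_count > rf

RF = 3
assert not overlapping(1, 1, RF)                # ONE + ONE: stale reads possible
assert overlapping(quorum(RF), quorum(RF), RF)  # QUORUM + QUORUM: consistent
assert overlapping(1, RF, RF)                   # ONE read + ALL write also works
```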
47. Last Write Win (LWW)!
INSERT INTO users(login, name, age) VALUES('jdoe', 'John DOE', 33);

login (#partition) | age | name
jdoe               | 33  | John DOE
48. Last Write Win (LWW)!
INSERT INTO users(login, name, age) VALUES('jdoe', 'John DOE', 33);
Every column value gets an auto-generated timestamp (μs), here t1:

login (#partition) | age (t1) | name (t1)
jdoe               | 33       | John DOE
49. Last Write Win (LWW)!
UPDATE users SET age = 34 WHERE login = 'jdoe';

SSTable1: jdoe → age (t1) = 33, name (t1) = John DOE
SSTable2: jdoe → age (t2) = 34
50. Last Write Win (LWW)!
DELETE age FROM users WHERE login = 'jdoe';

SSTable1: jdoe → age (t1) = 33, name (t1) = John DOE
SSTable2: jdoe → age (t2) = 34
SSTable3: jdoe → age (t3) = ✕ (tombstone)
51. Last Write Win (LWW)!
SELECT age FROM users WHERE login = 'jdoe';
Three versions of the age column coexist. Which one wins?

SSTable1: jdoe → age (t1) = 33, name (t1) = John DOE ?
SSTable2: jdoe → age (t2) = 34 ?
SSTable3: jdoe → age (t3) = ✕ (tombstone) ?
52. Last Write Win (LWW)!
SELECT age FROM users WHERE login = 'jdoe';
Last write wins: the cell with the highest timestamp (t3) is kept.

SSTable1: jdoe → age (t1) = 33, name (t1) = John DOE ✕
SSTable2: jdoe → age (t2) = 34 ✕
SSTable3: jdoe → age (t3) = tombstone ✓
53. Compaction!
SSTable1: jdoe → age (t1) = 33, name (t1) = John DOE
SSTable2: jdoe → age (t2) = 34
SSTable3: jdoe → age (t3) = tombstone

New SSTable: jdoe → age (t3) = tombstone, name (t1) = John DOE
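Compaction, as shown above, is essentially a timestamp-based merge: per cell, keep the value with the highest timestamp, tombstones included. A minimal sketch (the data layout is simplified to one dict per SSTable):

```python
TOMBSTONE = "✕"  # marker for a deleted cell

# (partition, column) -> (timestamp, value), one dict per SSTable
sstable1 = {("jdoe", "age"): (1, 33), ("jdoe", "name"): (1, "John DOE")}
sstable2 = {("jdoe", "age"): (2, 34)}
sstable3 = {("jdoe", "age"): (3, TOMBSTONE)}

def compact(*sstables):
    """Merge SSTables, keeping per cell the entry with the highest timestamp."""
    merged = {}
    for table in sstables:
        for cell, (ts, value) in table.items():
            if cell not in merged or ts > merged[cell][0]:
                merged[cell] = (ts, value)
    return merged

new_sstable = compact(sstable1, sstable2, sstable3)
assert new_sstable[("jdoe", "age")] == (3, TOMBSTONE)    # the tombstone (t3) wins
assert new_sstable[("jdoe", "name")] == (1, "John DOE")  # untouched cell survives
```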
54. CRUD operations!
INSERT INTO users(login, name, age) VALUES('jdoe', 'John DOE', 33);
UPDATE users SET age = 34 WHERE login = 'jdoe';
DELETE age FROM users WHERE login = 'jdoe';
SELECT age FROM users WHERE login = 'jdoe';
57. Queries!
Get messages by user and message_id (date)
SELECT * FROM mailbox WHERE login = 'jdoe'
AND message_id = '2014-09-25 16:00:00';
Get messages by user and date interval
SELECT * FROM mailbox WHERE login = 'jdoe'
AND message_id <= '2014-09-25 16:00:00'
AND message_id >= '2014-09-20 16:00:00';
58. Queries!
Get messages by message_id only (#partition not provided)
SELECT * FROM mailbox WHERE message_id = '2014-09-25 16:00:00';
Get messages by date interval (#partition not provided)
SELECT * FROM mailbox
WHERE message_id <= '2014-09-25 16:00:00'
AND message_id >= '2014-09-20 16:00:00';
59. Queries!
Get messages by user range (range query on #partition)
SELECT * FROM mailbox WHERE login >= 'hsue' AND login <= 'jdoe';
Get messages by user pattern (non-exact match on #partition)
SELECT * FROM mailbox WHERE login LIKE '%doe%';
60. WHERE clause restrictions!
All queries (INSERT/UPDATE/DELETE/SELECT) must provide the #partition
Only exact match (=) on the #partition; range queries (<, ≤, >, ≥) are not allowed
• ☞ they would require a full cluster scan
On clustering columns, both exact match and range queries (<, ≤, >, ≥) are allowed
The WHERE clause is only possible on columns defined in the PRIMARY KEY
61. WHERE clause restrictions!
What if I want to perform an "arbitrary" WHERE clause?
• search-form scenario, dynamic search fields
62. WHERE clause restrictions!
What if I want to perform an "arbitrary" WHERE clause?
• search-form scenario, dynamic search fields
☞ Apache Solr (Lucene) integration (DSE)
SELECT * FROM users WHERE solr_query = 'age:[33 TO *] AND sex:male';
SELECT * FROM users WHERE solr_query = 'lastname:*schwei?er';
63. Collections & maps!
CREATE TABLE users (
login text,
name text,
age int,
friends set<text>,
hobbies list<text>,
languages map<int, text>,
…
PRIMARY KEY(login));
64. User Defined Type (UDT)!
Instead of
CREATE TABLE users (
login text,
…
street_number int,
street_name text,
postcode int,
country text,
…
PRIMARY KEY(login));
65. User Defined Type (UDT)!
CREATE TYPE address (
street_number int,
street_name text,
postcode int,
country text);
CREATE TABLE users (
login text,
…
location frozen<address>,
…
PRIMARY KEY(login));
69. From SQL to CQL!
Remember…
CQL is not SQL
70. From SQL to CQL!
Remember…
there is no join
(do you want to scale?)
71. From SQL to CQL!
Remember…
there is no integrity constraint
(do you want to read-before-write?)
72. From SQL to CQL!
Paradigm change
• space is cheap (somehow …), latency is precious
• embrace immutability
• think query first
• denormalize !!!
73. From SQL to CQL!
Normalized (User 1-to-n Comment relationship)
CREATE TABLE comments (
article_id uuid,
comment_id timeuuid,
author_id text, // typical join id
content text,
PRIMARY KEY((article_id), comment_id));
74. From SQL to CQL!
De-normalized (User 1-to-n Comment relationship)
CREATE TABLE comments (
article_id uuid,
comment_id timeuuid,
author person, // person is UDT
content text,
PRIMARY KEY((article_id), comment_id));
75. Data modeling best practices!
Start by queries
• identify core functional read paths
• 1 read scenario ≈ 1 SELECT
76. Data modeling best practices!
Start by queries
• identify core functional read paths
• 1 read scenario ≈ 1 SELECT
Denormalize
• wisely, only duplicate necessary & immutable data
• functional/technical trade-off
77. Data modeling best practices!
Person UDT
- firstname/lastname
- date of birth
- gender
- mood
- location
78. Data modeling best practices!
John DOE, male
birthdate: 21/02/1981
subscribed since 03/06/2011
☉ San Mateo, CA
"Impossible is not John DOE"
Full detail read from
User table on click
83. Training Day | December 3rd
Beginner Track
• Introduction to Cassandra
• Introduction to Spark, Shark, Scala and
Cassandra
Advanced Track
• Data Modeling
• Performance Tuning
Conference Day | December 4th
Cassandra Summit Europe 2014 will be the single
largest gathering of Cassandra users in Europe.
Learn how the world's most successful companies are
transforming their businesses and growing faster than
ever using Apache Cassandra.
http://bit.ly/cassandrasummit2014
84. CQL In Depth!
Simple Table!
Clustered Table!
Bucketing!
99. Query With Clustered Table!
Select by operator and city, for all dates
SELECT * FROM daily_3g_quality_per_city
WHERE operator = 'verizon' AND city = 'Austin';
Select by operator and city range, for all dates
SELECT * FROM daily_3g_quality_per_city
WHERE operator = 'verizon' AND city >= 'Austin' AND city <= 'New York';
100. Query With Clustered Table!
Select by operator, city and date
SELECT * FROM daily_3g_quality_per_city
WHERE operator = 'verizon' AND city = 'Austin' AND date = 20140910;
Select by operator, city and date range
SELECT * FROM daily_3g_quality_per_city
WHERE operator = 'verizon' AND city = 'Austin'
AND date >= 20140910 AND date <= 20140913;
101. Query With Clustered Table!
Select by operator, city and a tuple of dates
SELECT * FROM daily_3g_quality_per_city
WHERE operator = 'verizon' AND city = 'Austin'
AND date IN (20140910, 20140913);
102. Query With Clustered Table!
Select by operator and date without city (skipping the clustering column city)
SELECT * FROM daily_3g_quality_per_city
WHERE operator = 'verizon' AND date = 20140910;
The storage layout shows why this cannot work:
Map<operator, SortedMap<city, SortedMap<date, SortedMap<column_label, value>>>>
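The nested sorted-map view also explains the clustering-column rules. A Python caricature (with made-up sample values) shows that fixing operator and city descends straight to a date-sorted slice, while skipping city forces a scan of every city:

```python
# Map<operator, SortedMap<city, SortedMap<date, SortedMap<label, value>>>>
data = {
    "verizon": {
        "Austin":   {20140910: {"quality": 0.91}, 20140913: {"quality": 0.88}},
        "New York": {20140910: {"quality": 0.95}},
    }
}

# operator + city fixed: direct descent, then a cheap range over sorted dates
austin = data["verizon"]["Austin"]
in_range = [d for d in austin if 20140910 <= d <= 20140913]
assert in_range == [20140910, 20140913]

# operator + date but no city: every city must be scanned
hits = {city: days[20140910] for city, days in data["verizon"].items()
        if 20140910 in days}
assert set(hits) == {"Austin", "New York"}
```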
108. Bucketing!
But how can I select raw data between 14:45 and 15:10?
14:45 → 15:00 lives in bucket sensor_id:2014091014
15:00 → 15:10 lives in bucket sensor_id:2014091015

sensor_id:2014091014 → date1 date2 date3 date4 … / blob1 blob2 blob3 blob4 …
sensor_id:2014091015 → date11 date12 date13 date14 … / blob11 blob12 blob13 blob14 …
109. Bucketing!
Solution
• use an IN clause on the bucket component of the partition key
• with a range condition on the date column
☞ the date column should be a monotonic function (increasing/decreasing)
SELECT * FROM sensor_data WHERE sensor_id = xxx
AND date_bucket IN (2014091014, 2014091015)
AND date >= '2014-09-10 14:45:00.000'
AND date <= '2014-09-10 15:10:00.000';
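Computing the bucket list for an arbitrary interval is mechanical; a sketch assuming hourly yyyyMMddHH buckets as on the slides (the helper name is mine):

```python
from datetime import datetime, timedelta

def hour_buckets(start: datetime, end: datetime) -> list:
    """All yyyyMMddHH bucket values covering the closed interval [start, end]."""
    buckets = []
    cursor = start.replace(minute=0, second=0, microsecond=0)
    while cursor <= end:
        buckets.append(cursor.strftime("%Y%m%d%H"))
        cursor += timedelta(hours=1)
    return buckets

assert hour_buckets(datetime(2014, 9, 10, 14, 45),
                    datetime(2014, 9, 10, 15, 10)) == ["2014091014", "2014091015"]
```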
110. Bucketing Caveats!
The IN clause on the #partition is no silver bullet!
• use sparingly
• keep cardinality low (≤ 5)
(ring diagram: a single coordinator must fetch partitions sensor_id:2014091014 and sensor_id:2014091015 from different nodes)
111. Bucketing Caveats!
The IN clause on the #partition is no silver bullet!
• use sparingly
• keep cardinality low (≤ 5)
• prefer parallel (//) async queries
• trade-off: ease of query vs performance
(diagram: an async client queries each bucket partition directly, in parallel)
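The parallel-async alternative can be sketched as follows; a thread pool over an in-memory dict stands in for the driver's per-query async API (with the DataStax Python driver this would be session.execute_async), and all names and data are illustrative:

```python
from concurrent.futures import ThreadPoolExecutor

# Stand-in data store: one dict per bucket partition, rows ordered by date
PARTITIONS = {
    "sensor1:2014091014": {"14:45": "blob1", "14:50": "blob2"},
    "sensor1:2014091015": {"15:00": "blob3", "15:10": "blob4"},
}

def fetch(partition: str) -> dict:
    """Placeholder for one async query hitting a single partition."""
    return PARTITIONS[partition]

# One query per bucket, fired in parallel, merged client-side:
buckets = ["sensor1:2014091014", "sensor1:2014091015"]
with ThreadPoolExecutor() as pool:
    results = list(pool.map(fetch, buckets))

merged = {date: blob for part in results for date, blob in part.items()}
assert list(merged) == ["14:45", "14:50", "15:00", "15:10"]
```

Each single-partition query reaches exactly the right replica set, instead of funnelling several partitions through one coordinator.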