Functional data models are great, but how can you squeeze out more performance and make them awesome? Let's talk through some example models, go through the tuning steps, and understand the tradeoffs. Many times, just a simple understanding of the underlying internals can make all the difference. I've helped some of the biggest companies in the world do this, and I can help you. Do you feel the need for Cassandra 2.0 speed?
Cassandra EU - Data model on fire
1. #CASSANDRAEU
Data Model on Fire
Patrick McFadin | Chief Evangelist DataStax
@PatrickMcFadin
Friday, October 18, 13
2. Data Model is King
•With 2.0 we now have more choices
•Sometimes the data model is only the first part
•Understanding the underlying engine helps
•You aren’t done until you tune
Load test baby!
4. The race is on
T0 (Process 1):
SELECT firstName, lastName
FROM users
WHERE username = 'pmcfadin';
(0 rows)
T1 (Process 2):
SELECT firstName, lastName
FROM users
WHERE username = 'pmcfadin';
(0 rows)
T2 (Process 1): Got nothing! Good to go!
INSERT INTO users (username, firstname,
lastname, email, password, created_date)
VALUES ('pmcfadin','Patrick','McFadin',
['patrick@datastax.com'],
'ba27e03fd95e507daf2937c937d499ab',
'2011-06-20 13:50:00');
T3 (Process 2): This one wins
INSERT INTO users (username, firstname,
lastname, email, password, created_date)
VALUES ('pmcfadin','Paul','McFadin',
['paul@oracle.com'],
'ea24e13ad95a209ded8912e937d499de',
'2011-06-20 13:51:00');
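The interleaving above can be replayed with a toy in-memory table. This is only a sketch: the `users` dict and the `select_user` and `insert_user` helpers are hypothetical stand-ins, not a Cassandra client, and arrival order stands in for the write timestamps that decide the winner in a real cluster.

```python
# Toy in-memory stand-in for the users table (assumption: not a real client).
users = {}

def select_user(username):
    """Return the stored row, or None when nothing exists (the '(0 rows)' case)."""
    return users.get(username)

def insert_user(username, row):
    """Plain INSERT: an upsert that silently overwrites any existing row."""
    users[username] = row

# T0 and T1: both processes check before either one has written.
p1_sees = select_user("pmcfadin")
p2_sees = select_user("pmcfadin")

# T2: process 1 got nothing, so it writes.
if p1_sees is None:
    insert_user("pmcfadin", {"firstname": "Patrick", "email": "patrick@datastax.com"})

# T3: process 2 also got nothing earlier, so it writes too and stomps the row.
if p2_sees is None:
    insert_user("pmcfadin", {"firstname": "Paul", "email": "paul@oracle.com"})

print(users["pmcfadin"]["firstname"])  # prints "Paul"
```

Both checks passed, yet only the later write survives.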
5. Solution LWT
Process 1
T0:
INSERT INTO users (username, firstname,
lastname, email, password, created_date)
VALUES ('pmcfadin','Patrick','McFadin',
['patrick@datastax.com'],
'ba27e03fd95e507daf2937c937d499ab',
'2011-06-20 13:50:00')
IF NOT EXISTS;
T1:
 [applied]
-----------
      True
•Check performed for record
•Paxos ensures exclusive access
•applied = true: Success
6. Solution LWT
Process 2
T2:
INSERT INTO users (username, firstname,
lastname, email, password, created_date)
VALUES ('pmcfadin','Paul','McFadin',
['paul@oracle.com'],
'ea24e13ad95a209ded8912e937d499de',
'2011-06-20 13:51:00')
IF NOT EXISTS;
T3:
 [applied] | username | created_date             | firstname | lastname
-----------+----------+--------------------------+-----------+----------
     False | pmcfadin | 2011-06-20 13:50:00-0700 |   Patrick |  McFadin
•applied = false: Rejected
•No record stomping!
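The IF NOT EXISTS behaviour can be sketched as an atomic check-and-insert. This is a minimal illustration under stated assumptions: `UsersTable` and `insert_if_not_exists` are hypothetical names, and a local `threading.Lock` stands in for the exclusive access that Paxos provides across replicas.

```python
import threading

class UsersTable:
    """Toy table whose insert mimics INSERT ... IF NOT EXISTS (hypothetical helper)."""

    def __init__(self):
        self._rows = {}
        # Assumption: a local lock stands in for Paxos-coordinated exclusive access.
        self._lock = threading.Lock()

    def insert_if_not_exists(self, username, row):
        """Return (applied, existing_row), loosely mirroring the [applied] result."""
        with self._lock:
            existing = self._rows.get(username)
            if existing is not None:
                # Rejected: hand back a copy of the row that is already there.
                return False, dict(existing)
            self._rows[username] = dict(row)
            return True, None

table = UsersTable()
first = table.insert_if_not_exists(
    "pmcfadin", {"firstname": "Patrick", "lastname": "McFadin"})
second = table.insert_if_not_exists(
    "pmcfadin", {"firstname": "Paul", "lastname": "McFadin"})

print(first)   # (True, None)
print(second)  # (False, {'firstname': 'Patrick', 'lastname': 'McFadin'})
```

The check and the write happen as one step, so the second caller sees the existing row instead of overwriting it.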
7. LWT Fine Print
•Lightweight transactions (LWT) solve edge conditions
•They have a latency cost
• Be aware
• Load test
• Consider in your data model
•Now go shut down that ZooKeeper mess you have!
9. Form Versioning Pt 1
•From “Next top data model”
•Great idea, but edge conditions
CREATE TABLE working_version (
  username varchar,
  form_id int,
  version_number int,
  locked_by varchar,
  form_attributes map<varchar,varchar>,
  PRIMARY KEY ((username, form_id), version_number)
) WITH CLUSTERING ORDER BY (version_number DESC);
•Each user has a form
•Each form needs versioning
•Need an exclusive lock on the form
10. Form Versioning Pt 1
1. Insert first version
INSERT INTO working_version
(username, form_id, version_number, locked_by, form_attributes)
VALUES ('pmcfadin',1138,1,'',
{'FirstName<text>':'First Name: ',
'LastName<text>':'Last Name: ',
'EmailAddress<text>':'Email Address: ',
'Newsletter<radio>':'Y,N'});
2. Lock for one user (Danger Zone: a plain UPDATE, so nothing stops a second user from taking the lock)
UPDATE working_version
SET locked_by = 'pmcfadin'
WHERE username = 'pmcfadin'
AND form_id = 1138
AND version_number = 1;
3. Insert new version. Release lock
INSERT INTO working_version
(username, form_id, version_number, locked_by, form_attributes)
VALUES ('pmcfadin',1138,2,null,
{'FirstName<text>':'First Name: ',
'LastName<text>':'Last Name: ',
'EmailAddress<text>':'Email Address: ',
'Newsletter<checkbox>':'Y'});
11. Form Versioning Pt 2
1. Insert first version
INSERT INTO working_version
(username, form_id, version_number, locked_by, form_attributes)
VALUES ('pmcfadin',1138,1,'pmcfadin',
{'FirstName<text>':'First Name: ',
'LastName<text>':'Last Name: ',
'EmailAddress<text>':'Email Address: ',
'Newsletter<radio>':'Y,N'})
IF NOT EXISTS;
Exclusive lock
UPDATE working_version
SET form_attributes['EmailAddress<text>'] = 'Primary Email Address: '
WHERE username = 'pmcfadin'
AND form_id = 1138
AND version_number = 1
IF locked_by = 'pmcfadin';
Accepted
UPDATE working_version
SET form_attributes['EmailAddress<text>'] = 'Email Adx: '
WHERE username = 'pmcfadin'
AND form_id = 1138
AND version_number = 1
IF locked_by = 'dude';
Rejected
(sorry dude)
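The accepted and rejected updates above follow a compare-then-set pattern, sketched here on a plain dict. `update_if_locked_by` is a hypothetical helper, and the sketch is single-threaded: it shows only the condition check, not the coordination Cassandra performs to make check and write atomic.

```python
def update_if_locked_by(row, expected_owner, key, value):
    """Apply the map update only when locked_by matches; return the applied flag."""
    if row.get("locked_by") != expected_owner:
        return False  # condition failed: the row is left untouched
    row["form_attributes"][key] = value
    return True

# Version 1 of the form, already locked by pmcfadin (values taken from the slide).
row = {
    "locked_by": "pmcfadin",
    "form_attributes": {"EmailAddress<text>": "Email Address: "},
}

accepted = update_if_locked_by(
    row, "pmcfadin", "EmailAddress<text>", "Primary Email Address: ")
rejected = update_if_locked_by(
    row, "dude", "EmailAddress<text>", "Email Adx: ")

print(accepted, rejected)  # True False
print(row["form_attributes"]["EmailAddress<text>"])  # Primary Email Address: 
```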
12. Form Versioning Pt 2
•Old way: Edge cases with problems
• Use external locking?
• Take your chances?
•New way: Managed expectations (LWT)
• Exclusive by existence check
• Continued with IF clause
• Downside: More latency
17. Cassandra 2.0 Fire
•Great changes in both 1.2 and 2.0 for perf
•Three big changes in 2.0 I like
Single pass compaction
Hints to reduce SSTable reads
Faster index reads from off-heap
19. Why is this important?
•Reducing SSTable reads means fewer seeks
•Disk seeks can add up fast
•5 seeks on SATA = 60ms of just disk!
 Avg Access Time* | Rotation Speed
------------------+----------------
             12ms | 7200 RPM
              7ms | 10k RPM
              5ms | 15k RPM
            .04ms | SSD
Shared storage == Great sadness
* Source: www.tomshardware.com
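The seek arithmetic can be worked through for each row of the table. A small sketch: the access times are the ones quoted on the slide, while `seek_cost_ms` and the assumption that every SSTable read costs one full average access are illustrative simplifications.

```python
# Average access times from the slide, in milliseconds.
ACCESS_TIME_MS = {
    "7200 RPM": 12.0,
    "10k RPM": 7.0,
    "15k RPM": 5.0,
    "SSD": 0.04,
}

def seek_cost_ms(seeks, media):
    """Disk time spent if every seek costs one average access (simplifying assumption)."""
    return seeks * ACCESS_TIME_MS[media]

for media in ACCESS_TIME_MS:
    print(f"{media}: {seek_cost_ms(5, media):.2f} ms for 5 seeks")
```

Five seeks on the 7200 RPM drive come to 60 ms, matching the figure on the slide, while the same five seeks on SSD stay well under a millisecond.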
20. Quick Diversion
•cfhistograms is your friend
•Histograms of statistics per table
•Collected...
• per read
• per write
• SSTable flush
• Compaction
nodetool cfhistograms <keyspace> <table>
28. Histograms + Data Model
•Your data model is the key to success
•How do you ensure that?
Test
Measure
Repeat
29. Real World Example
•Real Customer
•Needed very tight SLA on reads
Problem
•Read response highly variable
•Loading data increases latency
32. Partition Size
•Tuning is an option based on size in bytes
•All about the reads
•index_interval
• How many samples taken
• Lower for faster access but more memory usage
•column_index_size_in_kb
• Add column indexes to a row when the data reaches this size
• Partial row reads? Maybe smaller.
33. Tuning results
•Spent a lot of time tuning disk
•Played with
• index_interval (Lowered)
• concurrent_reads (Increased)
• column_index_size_in_kb (Lowered)
220 Million Ops/Day
10000 Transactions/Sec Peak
9ms at 95th percentile. Measured at the application!
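Measuring a percentile at the application means collecting per-request latencies on the client side and ranking them. A minimal sketch using the nearest-rank method: the `percentile` helper and the sample latencies are made up for illustration and are not the customer's data.

```python
import math

def percentile(samples, pct):
    """Nearest-rank percentile: smallest sample with at least pct% of values at or below it."""
    if not samples:
        raise ValueError("need at least one sample")
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]

# Hypothetical read latencies in milliseconds, as timed by the application.
latencies_ms = [3, 4, 4, 5, 5, 6, 6, 7, 8, 40]

print("median:", percentile(latencies_ms, 50))  # median: 5
print("p95:", percentile(latencies_ms, 95))     # p95: 40
```

A single slow read barely moves the median but dominates the 95th percentile, which is why a tight read SLA is usually stated at a high percentile.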
35. Disk + Data Model
•Understand the internals
• Size of partition
• Compaction
•Learn how to measure
•Load test
36. #CASSANDRAEU
Thank you! Time for questions...
*More? My data modeling talks:
The Data Model is Dead, Long Live the Data Model
Become a Super Modeler
The World's Next Top Data Model