Cassandra13: Data Modeling Techniques Using CQL

#CASSANDRA13
Patrick McFadin | Solution Architect, DataStax
The World's Next Top Data Model
Monday, June 24, 13

The saga continues!
★ Data model is dead, long live the data model.
★ Bridging from Relational to Cassandra
★ Become a Super Modeler
★ Core data modeling techniques using CQL

Because I love talking about this
Just to recap...

Why does this matter?
* Cassandra lives closer to your users or applications
* Not a hammer for all use case nails
* Proper use case, proper model...
* Get it wrong and...

When to use Cassandra*
* Need to be in more than one datacenter (active-active)
* Scaling from 0 to, uh, well... we’re not really sure.
* Need as close to 100% uptime as possible.
* Getting these from any other solution would just be mega $
and...
*Nutshell version. These are all ORs, not ANDs.

You get the data model right!

So let’s do that
* Four real world examples
* The use case, what they were avoiding, and the model that accomplishes it
* You may think this is you, but it isn’t. I hear these all the time.
* All examples are in CQL3

But wait, you say:
CQL doesn’t do dynamic wide rows!

Yes, it can!
* CQL does wide rows the same way you did them in Thrift
* No really
* Read this blog post
http://www.datastax.com/dev/blog/does-cql-support-dynamic-columns-wide-rows
...or just trust me and I’ll show you how
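To make the mapping concrete, here is a minimal Python sketch (an in-memory model, not Cassandra's storage code) of roughly how CQL3 rows that share a partition key collapse into one wide storage row, with each cell named by (clustering value, column name), which is what a Thrift-era dynamic column looked like. The `readings` table, its columns, and the sample values are hypothetical.

```python
from collections import defaultdict

# Hypothetical CQL table, for illustration only:
#   CREATE TABLE readings (sensor_id text, reading_time int, value float,
#                          PRIMARY KEY (sensor_id, reading_time));
cql_rows = [
    {"sensor_id": "s1", "reading_time": 3, "value": 20.5},
    {"sensor_id": "s1", "reading_time": 1, "value": 19.0},
    {"sensor_id": "s2", "reading_time": 2, "value": 7.25},
]

def to_storage_rows(rows, partition_key, clustering_key):
    """Group CQL rows into wide storage rows, one per partition key value.
    Each cell name is (clustering value, CQL column name), so every new
    clustering value adds new 'dynamic' columns to the same storage row."""
    storage = defaultdict(dict)
    for row in rows:
        ck = row[clustering_key]
        for name, value in row.items():
            if name not in (partition_key, clustering_key):
                storage[row[partition_key]][(ck, name)] = value
    # Within a partition, cells are kept sorted by cell name.
    return {pk: dict(sorted(cells.items())) for pk, cells in storage.items()}

wide = to_storage_rows(cql_rows, "sensor_id", "reading_time")
print(wide["s1"])  # {(1, 'value'): 19.0, (3, 'value'): 20.5}
```

Three CQL rows become two storage rows; partition `s1` holds two cells that sort by `reading_time` regardless of insert order.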

Customers giving you money is a good reason for uptime
Shopping Cart Data Model

Shopping cart use case
* Store shopping cart data reliably
* Minimize (or eliminate) downtime: multi-DC
* Scale for the “Cyber Monday” problem
* Every minute off-line is lost $$
* Online shoppers want speed!
The bad

Shopping cart data model
* Each customer can have one or more shopping carts
* De-normalize data for fast access
* Shopping cart == One partition (Row Level Isolation)
* Each item a new column

Shopping cart data model
CREATE TABLE user (
  username varchar,
  firstname varchar,
  lastname varchar,
  shopping_carts set<varchar>,
  PRIMARY KEY (username)
);
CREATE TABLE shopping_cart (
  username varchar,
  cart_name text,
  item_id int,
  item_name varchar,
  description varchar,
  price float,
  item_detail map<varchar,varchar>, -- Item details. Flexible for whatever
  PRIMARY KEY ((username,cart_name),item_id) -- (username,cart_name): partition row key for one user's cart
);
-- Both inserts share the values that create the partition row key ('pmcfadin','Gadgets I want'),
-- so together they form one partition (storage row) of data.
INSERT INTO shopping_cart
(username,cart_name,item_id,item_name,description,price,item_detail)
VALUES ('pmcfadin','Gadgets I want',8675309,'Garmin 910XT','Multisport training watch',349.99,
{'Related':'Timex sports watch',
'Volume Discount':'10'});
INSERT INTO shopping_cart
(username,cart_name,item_id,item_name,description,price,item_detail)
VALUES ('pmcfadin','Gadgets I want',9748575,'Polaris Foot Pod','Bluetooth Smart foot pod',64.00,
{'Related':'Timex foot pod',
'Volume Discount':'25'});
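Reading a cart back is then one single-partition query, for example `SELECT * FROM shopping_cart WHERE username = 'pmcfadin' AND cart_name = 'Gadgets I want';`, with items returned in `item_id` order. The Python sketch below models that read in memory using the two items inserted above; the `get_cart` and `cart_total` helpers are hypothetical, and prices are held as `Decimal` so the sum is exact.

```python
from decimal import Decimal

# In-memory stand-in for shopping_cart: the partition key (username, cart_name)
# maps to that cart's items, keyed by the clustering column item_id.
shopping_cart = {
    ("pmcfadin", "Gadgets I want"): {
        9748575: {"item_name": "Polaris Foot Pod", "price": Decimal("64.00")},
        8675309: {"item_name": "Garmin 910XT", "price": Decimal("349.99")},
    },
}

def get_cart(username, cart_name):
    """Single-partition read: every item of one cart, ordered by item_id."""
    items = shopping_cart.get((username, cart_name), {})
    return sorted(items.items())

def cart_total(username, cart_name):
    """Hypothetical helper: sum the prices of all items in one cart."""
    return sum((item["price"] for _, item in get_cart(username, cart_name)), Decimal("0"))

print(cart_total("pmcfadin", "Gadgets I want"))  # 413.99
```

Because the whole cart lives in one partition, the read touches a single replica set, and Cassandra's row-level isolation applies to writes within that partition.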

Watching users, making decisions. Freaky, but cool.
User Activity Tracking

User activity use case
* React to user input in real time
* Support for multiple application pods
* Scale for speed
* Losing interactions is costly
* Waiting for batch (Hadoop) is too long
The bad

User activity data model
* Interaction points stored per user in a short-term table (expires via TTL)
* Long-term interactions stored in a similar table with a date partition
* Process long-term data later using batch
* Reverse time series to get the last N items
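Why a reverse time series makes "last N items" cheap can be shown with a minimal in-memory Python sketch, assuming plain integer timestamps instead of timeuuids: with `CLUSTERING ORDER BY (interaction_time DESC)` the partition is kept newest first, so the read is simply the first N entries (in CQL, a `LIMIT N` on one user's partition). The `ReverseTimeSeries` class is hypothetical.

```python
import bisect

class ReverseTimeSeries:
    """One user_activity partition with CLUSTERING ORDER BY (... DESC):
    entries stay sorted newest first, so "last N" is just the first N."""

    def __init__(self):
        self._keys = []    # negated timestamps; ascending order == newest first
        self._values = []

    def insert(self, ts, activity):
        i = bisect.bisect_left(self._keys, -ts)
        if i < len(self._keys) and self._keys[i] == -ts:
            self._values[i] = activity   # same clustering key: a CQL INSERT overwrites
        else:
            self._keys.insert(i, -ts)
            self._values.insert(i, activity)

    def last_n(self, n):
        """Equivalent in spirit to SELECT ... WHERE username = ? LIMIT n."""
        return [(-k, v) for k, v in zip(self._keys[:n], self._values[:n])]

feed = ReverseTimeSeries()
for ts, code in [(100, "login"), (300, "search"), (200, "view")]:
    feed.insert(ts, code)
print(feed.last_n(2))  # [(300, 'search'), (200, 'view')]
```

Sorting happens on write, so reads never sort or scan past the N rows they return.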

User activity data model
CREATE TABLE user_activity (
  username varchar,
  interaction_time timeuuid,
  activity_code varchar,
  detail varchar,
  PRIMARY KEY (username, interaction_time)
) WITH CLUSTERING ORDER BY (interaction_time DESC); -- Reverse order based on timestamp
CREATE TABLE user_activity_history (
  username varchar,
  interaction_date varchar,
  interaction_time timeuuid,
  activity_code varchar,
  detail varchar,
  PRIMARY KEY ((username,interaction_date),interaction_time)
);
INSERT INTO user_activity
(username,interaction_time,activity_code,detail)
VALUES ('pmcfadin',0D1454E0-D202-11E2-8B8B-0800200C9A66,'100','Normal login')
USING TTL 2592000; -- Expire after 30 days
INSERT INTO user_activity_history
(username,interaction_date,interaction_time,activity_code,detail)
VALUES ('pmcfadin','20130605',0D1454E0-D202-11E2-8B8B-0800200C9A66,'100','Normal login');
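The history table's `interaction_date` bucket can be derived from the same timeuuid that becomes `interaction_time`, and `USING TTL 2592000` is simply 30 days expressed in seconds. Below is a stdlib-only Python sketch, assuming UTC day buckets in the `YYYYMMDD` format used above; the helper names are hypothetical (client drivers typically ship their own timeuuid utilities).

```python
import uuid
from datetime import datetime, timedelta, timezone

# Version-1 UUID timestamps count 100-ns intervals since 1582-10-15 (UTC).
UUID_EPOCH = datetime(1582, 10, 15, tzinfo=timezone.utc)
TTL_30_DAYS = int(timedelta(days=30).total_seconds())  # 2592000

def timeuuid_from_datetime(dt, clock_seq=0, node=0):
    """Build a version-1 UUID carrying dt's timestamp (hypothetical helper)."""
    delta = dt - UUID_EPOCH
    ticks = (delta.days * 86_400 + delta.seconds) * 10_000_000 + delta.microseconds * 10
    return uuid.UUID(fields=(
        ticks & 0xFFFFFFFF,                  # time_low
        (ticks >> 32) & 0xFFFF,              # time_mid
        ((ticks >> 48) & 0x0FFF) | 0x1000,   # time_hi with version 1
        ((clock_seq >> 8) & 0x3F) | 0x80,    # clock_seq_hi with RFC 4122 variant
        clock_seq & 0xFF,                    # clock_seq_low
        node,                                # 48-bit node id
    ))

def datetime_from_timeuuid(u):
    """Recover the UTC timestamp embedded in a version-1 UUID."""
    return UUID_EPOCH + timedelta(microseconds=u.time // 10)

def interaction_date(u):
    """Day bucket for user_activity_history, e.g. '20130605'."""
    return datetime_from_timeuuid(u).strftime("%Y%m%d")

t = timeuuid_from_datetime(datetime(2013, 6, 5, 14, 30, tzinfo=timezone.utc))
print(interaction_date(t), TTL_30_DAYS)  # 20130605 2592000
```

Deriving the bucket from the timeuuid keeps both tables consistent: the same interaction lands under the same instant in the 30-day table and in its day partition in the history table.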

Machines generate logs at a furious pace. Be ready.
Log collection/aggregation

Log collection use case
* Collect log data at high speed
* Cassandra near where logs are generated: multi-datacenter
* Dice data for various uses: dashboards, lookups, etc.
* The scale needed for an RDBMS is cost-prohibitive
* Batch analysis of logs too late for some use cases
The bad

Log collection data model
* Use Flume to collect and fan out data to various tables
* Tables for lookup based on source and time
* Tables for dashboard with aggregation and summation
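The fan-out step (the job the deck assigns to Flume) can be sketched in a few lines of Python, assuming a hypothetical parsed event shape `{source, time, event, raw}`: each event is appended to the `log_lookup` partition for its `(source, date_to_minute)` and bumps the matching per-minute counter. The `YYYYMMDDHHMM` bucket format, the event names, and the `fan_out` function are assumptions for illustration, not Flume configuration.

```python
from collections import Counter, defaultdict
from datetime import datetime

# In-memory stand-ins for the three tables, keyed by (source, date_to_minute).
log_lookup = defaultdict(list)   # raw log entries, one list per partition
login_success = Counter()        # successful_logins counters
login_failure = Counter()        # failed_logins counters

def date_to_minute(ts: datetime) -> str:
    """Minute bucket, e.g. '201306241001' (the YYYYMMDDHHMM format is an assumption)."""
    return ts.strftime("%Y%m%d%H%M")

def fan_out(event: dict) -> None:
    """Write one parsed log event to every table that needs it."""
    key = (event["source"], date_to_minute(event["time"]))
    log_lookup[key].append(event["raw"])
    if event["event"] == "login_ok":
        # In CQL: UPDATE login_success SET successful_logins = successful_logins + 1 WHERE ...
        login_success[key] += 1
    elif event["event"] == "login_fail":
        login_failure[key] += 1

# Hypothetical events from one source, spread over two minutes.
for minute, kind in [(1, "login_ok"), (1, "login_fail"), (1, "login_ok"), (2, "login_ok")]:
    fan_out({"source": "web01", "time": datetime(2013, 6, 24, 10, minute),
             "event": kind, "raw": kind.encode()})
print(login_success[("web01", "201306241001")])  # 2
```

Aggregation happens at write time, so the dashboard reads precomputed counters instead of scanning raw logs.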

Log collection data model
CREATE TABLE log_lookup (
  source varchar,
  date_to_minute varchar,
  timestamp timeuuid,
  raw_log blob,
  PRIMARY KEY ((source,date_to_minute),timestamp)
);
CREATE TABLE login_success (
  source varchar,
  date_to_minute varchar,
  successful_logins counter,
  PRIMARY KEY (source,date_to_minute)
) WITH CLUSTERING ORDER BY (date_to_minute DESC);
CREATE TABLE login_failure (
  source varchar,
  date_to_minute varchar,
  failed_logins counter,
  PRIMARY KEY (source,date_to_minute)
) WITH CLUSTERING ORDER BY (date_to_minute DESC);
Consider storing raw logs as GZIP
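Compressing before the write is a client-side step; Cassandra just stores the bytes in `raw_log`. A minimal Python sketch using the standard `gzip` module, with hypothetical helper names and a made-up log line. For very short lines the fixed gzip header and trailer (about 18 bytes) can outweigh the savings, so this pays off mainly for larger entries.

```python
import gzip

def encode_raw_log(line: str) -> bytes:
    """Compress one raw log entry before writing it to the raw_log blob column."""
    return gzip.compress(line.encode("utf-8"))

def decode_raw_log(blob: bytes) -> str:
    """Inverse of encode_raw_log, applied after reading from log_lookup."""
    return gzip.decompress(blob).decode("utf-8")

# Hypothetical access-log line.
line = '10.0.0.7 - - [24/Jun/2013:10:01:12] "POST /login HTTP/1.1" 401 512'
blob = encode_raw_log(line)
print(decode_raw_log(blob) == line)  # True
```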

Log dashboard
[Chart: Successful Logins and Failed Logins per minute, 10:01 to 10:19 (y-axis 0 to 100)]
SELECT date_to_minute,successful_logins
FROM login_success
LIMIT 20;
SELECT date_to_minute,failed_logins
FROM login_failure
LIMIT 20;
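Because both counter tables cluster `date_to_minute` in descending order within each `source` partition, reading one source's partition with `LIMIT 20` returns its 20 most recent minutes, newest first. A chart then typically needs the two series in chronological order, aligned by minute. A Python sketch of that merge, using hypothetical query results and a hypothetical `dashboard_series` helper:

```python
def dashboard_series(success_rows, failure_rows):
    """Merge two newest-first (date_to_minute, count) result sets into
    chronological (minute, successes, failures) rows; missing minutes count as 0.
    YYYYMMDDHHMM strings sort lexicographically in time order."""
    success = dict(success_rows)
    failure = dict(failure_rows)
    minutes = sorted(set(success) | set(failure))
    return [(m, success.get(m, 0), failure.get(m, 0)) for m in minutes]

# Hypothetical results of the two SELECTs above, newest first.
success_rows = [("201306241003", 61), ("201306241002", 75), ("201306241001", 58)]
failure_rows = [("201306241003", 4), ("201306241001", 9)]
for row in dashboard_series(success_rows, failure_rows):
    print(row)
```

A minute with no failures never gets a counter row, which is why the merge fills gaps with 0.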

Because mistaks mistakes happen
User Form Versioning

Form versioning use case
* Store every possible version efficiently
* Scale to any number of users
* Commit/Rollback functionality on a form
* In an RDBMS, many relations that need complicated joins
* Needs to be in the cloud and in a local data center
The bad

Form version data model
* Each user has a form
* Each form needs versioning
* Separate table to store live version
* Exclusive lock on a form

Form version data model
CREATE TABLE working_version (
  username varchar,
  form_id int,
  version_number int,
  locked_by varchar,
  form_attributes map<varchar,varchar>,
  PRIMARY KEY ((username, form_id), version_number)
) WITH CLUSTERING ORDER BY (version_number DESC);
-- 1. Insert first version
INSERT INTO working_version
(username, form_id, version_number, locked_by, form_attributes)
VALUES ('pmcfadin',1138,1,'',
{'FirstName<text>':'First Name: ',
'LastName<text>':'Last Name: ',
'EmailAddress<text>':'Email Address: ',
'Newsletter<radio>':'Y,N'});
-- 2. Lock for one user
UPDATE working_version
SET locked_by = 'pmcfadin'
WHERE username = 'pmcfadin'
AND form_id = 1138
AND version_number = 1;
-- 3. Insert new version. Release lock
INSERT INTO working_version
(username, form_id, version_number, locked_by, form_attributes)
VALUES ('pmcfadin',1138,2,null,
{'FirstName<text>':'First Name: ',
'LastName<text>':'Last Name: ',
'EmailAddress<text>':'Email Address: ',
'Newsletter<checkbox>':'Y'});
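The three steps above amount to an append-only version log guarded by a single-holder lock. Below is a minimal in-memory Python sketch with hypothetical class and method names. Here `lock` behaves as a compare-and-set, which a plain CQL `UPDATE` does not provide on its own: writes are last-write-wins, so two users could both "take" the lock. In Cassandra 2.0 and later, a lightweight transaction such as `UPDATE ... IF locked_by = ''` provides that guarantee. Rollback in this sketch re-commits an older version's attributes as a new version, so history is never rewritten.

```python
class FormVersions:
    """One working_version partition (username, form_id): versions keyed by
    version_number; the highest number is the row a DESC-clustered read returns first."""

    def __init__(self, attributes):
        self.versions = {1: {"locked_by": "", "attributes": dict(attributes)}}

    def latest(self):
        n = max(self.versions)
        return n, self.versions[n]

    def lock(self, user):
        """Compare-and-set: succeeds only if the latest version is unlocked."""
        _, row = self.latest()
        if row["locked_by"]:
            return False
        row["locked_by"] = user
        return True

    def commit(self, user, attributes):
        """Insert the next version, unlocked, if `user` holds the lock."""
        n, row = self.latest()
        if row["locked_by"] != user:
            raise PermissionError(f"{user} does not hold the lock")
        self.versions[n + 1] = {"locked_by": "", "attributes": dict(attributes)}
        return n + 1

    def rollback(self, user, to_version):
        """Re-commit an older version's attributes as a new version."""
        return self.commit(user, self.versions[to_version]["attributes"])

form = FormVersions({"Newsletter<radio>": "Y,N"})
form.lock("pmcfadin")
refused = not form.lock("other_user")   # hypothetical second editor is refused
v2 = form.commit("pmcfadin", {"Newsletter<checkbox>": "Y"})
form.lock("pmcfadin")
v3 = form.rollback("pmcfadin", 1)
print(refused, v2, v3, form.latest()[1]["attributes"])  # True 2 3 {'Newsletter<radio>': 'Y,N'}
```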

That’s it!
“Mind what you have learned. Save you it can.”
- Yoda. Master Data Modeler

Your data model is next!
* Try out a few things
* See what works
* If all else fails, engage an expert in the community
* Want more? Follow me on Twitter: @PatrickMcFadin