Sure you can do some time series modeling. Maybe some user profiles. What's going to make you a super modeler? Let's take a look at some great techniques taken from real world applications where we exploit the Cassandra big table model to it's fullest advantage. We'll cover some of the new features in CQL 3 as well as some tried and true methods. In particular, we will look at fast indexing techniques to get data faster at scale. You'll be jet setting through your data like a true super modeler in no time.
Speaker: Patrick McFadin, Principal Solutions Architect at DataStax
Cassandra Community Webinar | Become a Super Modeler
1. Become a super modeler
Patrick McFadin @PatrickMcFadin
Senior Solutions Architect
DataStax
2. ... the saga continues.
This is the second part of a data modeling series
Part 1:The data model is dead, long live the data model!
• Relational -> Cassandra topics
• Basic entity modeling
• one-to-many
• many-to-many
•Transaction like modeling
3. Becoming a super modeler
• Data model is the key to happiness
• Successful deployments depend on it
• Not just a Cassandra problem...
3
4. Time series - Basic
CREATE TABLE temperature (
weatherstation_id text,
event_time timestamp,
temperature text,
PRIMARY KEY (weatherstation_id,event_time)
);
• Weather station collects regular temperature
• Each weather station is a row
• Each event is a new column in a wide row
5. Time series - Super!
• Every second? Row would be too big
• Order by access pattern
• Partition the rows by day
- One weather station by day
5
CREATE TABLE temperature_by_day (
weatherstation_id text,
date text,
event_time timestamp,
temperature text,
PRIMARY KEY ((weatherstation_id,date),event_time)
) WITH CLUSTERING ORDER BY (event_time DESC);
Compound row key
Reverse sort: Last event, first on row
6. User model - basic
• Plain ole entity table
• One primary key
• Booooring
6
CREATE TABLE users (
username text PRIMARY KEY,
first_name text,
last_name text,
address1 text,
city text,
postal_code text,
last_login timestamp
);
7. Cassandra feature - Collections
• Collections give you three types:
- Set
- List
- Map
• Each allow for dynamic updates
• Fully supported in CQL 3
• Requires serialization so don’t go crazy
7
CREATE TABLE collections_example (
! id int PRIMARY KEY,
! set_example set<text>,
! list_example list<text>,
! map_example map<int,text>
);
8. Cassandra Collections - Set
• Set is sorted by CQL type comparator
8
INSERT INTO collections_example (id, set_example)
VALUES(1, {'1-one', '2-two'});
set_example set<text>
Collection name Collection type CQLType
9. Cassandra Collections - Set Operations
9
UPDATE collections_example
SET set_example = set_example + {'3-three'} WHERE id = 1;
UPDATE collections_example
SET set_example = set_example + {'0-zero'} WHERE id = 1;
UPDATE collections_example
SET set_example = set_example - {'3-three'} WHERE id = 1;
• Adding an element to the set
• After adding this element, it will sort to the beginning.
• Removing an element from the set
10. Cassandra Collections - List
• Ordered by insertion
10
list_example list<text>
Collection name Collection type CQLType
INSERT INTO collections_example (id, list_example)
VALUES(1, ['1-one', '2-two']);
11. Cassandra Collections - List Operations
• Adding an element to the end of a list
11
UPDATE collections_example
SET list_example = list_example + ['3-three'] WHERE id = 1;
UPDATE collections_example
SET list_example = ['0-zero'] + list_example WHERE id = 1;
• Adding an element to the beginning of a list
UPDATE collections_example
SET list_example = list_example - ['3-three'] WHERE id = 1;
• Deleting an element from a list
12. Cassandra Collections - Map
• Key and value
• Key is sorted by CQL type comparator
12
INSERT INTO collections_example (id, map_example)
VALUES(1, { 1 : 'one', 2 : 'two' });
map_example map<int,text>
Collection name Collection type Value CQLTypeKey CQLType
13. Cassandra Collections - Map Operations
• Add an element to the map
13
UPDATE collections_example
SET map_example[3] = 'three' WHERE id = 1;
UPDATE collections_example
SET map_example[3] = 'tres' WHERE id = 1;
DELETE map_example[3]
FROM collections_example WHERE id = 1;
• Update an existing element in the map
• Delete an element in the map
14. User model - Super!
•Take boring user table and kick it up
• Great for static + some dynamic
•Takes advantage of row level isolation
14
CREATE TABLE user_with_location (
! username text PRIMARY KEY,
! first_name text,
! last_name text,
! address1 text,
! city text,
! postal_code text,
! last_login timestamp,
! location_by_date map<timeuuid,text>
);
15. Super user profile - Operations
• Adding new login locations to the map
15
UPDATE user_with_location
SET last_login = now(), location_by_date = {now() : '123.123.123.1'}
WHERE username='PatrickMcFadin';
UPDATE user_with_location
USING TTL 2592000 // 30 Days
SET last_login = now(), location_by_date = {now() : '123.123.123.1'}
WHERE username='PatrickMcFadin';
• Adding new login locations to the map +TTL!
16. Indexing
• Indexing expresses application intent
• Fast access to specific queries
• Secondary indexes != relational indexes
• Use information you have. No pre-reads.
16
Goals:
1. Create row key for speed
2. Use wide rows for efficiency
17. Keyword index
• Use a word as a key
• Columns are the occurrence
• Ex: Index of tag words about videos
17
CREATE TABLE tag_index (
tag varchar,
videoid uuid,
timestamp timestamp,
PRIMARY KEY (tag, videoid)
);
VideoId1 .. VideoIdNtag
Fast
Efficient
18. Partial word index
• Where row size will be large
•Take one part for key, rest for columns name
18
CREATE TABLE email_index (
domain varchar,
user varchar,
username varchar,
PRIMARY KEY (domain, user)
);
INSERT INTO email_index (domain, user, username)
VALUES ('@relational.com','tcodd', 'tcodd');
User: tcodd Email: tcodd@relational.com
19. Partial word index - Super!
• Create partitions + partial indexes FTW
19
CREATE TABLE product_index (
store int,
part_number0_3 int,
part_number4_9 int,
count int,
PRIMARY KEY ((store,part_number0_3), part_number4_9)
);
INSERT INTO product_index (store,part_number0_3,part_number4_9,count)
VALUES (8675309,7079,48575,3);
SELECT count
FROM product_index
WHERE store = 8675309
AND part_number0_3 = 7079
AND part_number4_9 = 48575;
Compound row key!
Fast and efficient!
• Store #8675309 has 3 of part# 7079748575
20. Bit map index
• Multiple parts to a key
• Create a truth table of the different combinations
• Inserts == the number of combinations
- 3 fields? 7 options (Not going to use null choice)
- 4 fields? 15 options
20
21. Bit map index
• Find a car in a lot by variable combinations
21
Make Model Color Combination
x Color
x Model
x x Model+Color
x Make
x x Make+Color
x x Make+Model
x x x Make+Model+Color
22. Bit map index -Table create
• Make a table with three different key combos
22
CREATE TABLE car_location_index (
make varchar,
model varchar,
color varchar,
vehical_id int,
lot_id int,
PRIMARY KEY ((make,model,color),vehical_id)
);
Compound row key with three different options
23. Bit map index - Adding records
• Pre-optimize for 7 possible questions on insert
23
INSERT INTO car_location_index (make,model,color,vehical_id,lot_id)
VALUES ('Ford','Mustang','Blue',1234,8675309);
INSERT INTO car_location_index (make,model,color,vehical_id,lot_id)
VALUES ('Ford','Mustang','',1234,8675309);
INSERT INTO car_location_index (make,model,color,vehical_id,lot_id)
VALUES ('Ford','','Blue',1234,8675309);
INSERT INTO car_location_index (make,model,color,vehical_id,lot_id)
VALUES ('Ford','','',1234,8675309);
INSERT INTO car_location_index (make,model,color,vehical_id,lot_id)
VALUES ('','Mustang','Blue',1234,8675309);
INSERT INTO car_location_index (make,model,color,vehical_id,lot_id)
VALUES ('','Mustang','',1234,8675309);
INSERT INTO car_location_index (make,model,color,vehical_id,lot_id)
VALUES ('','','Blue',1234,8675309);
24. Bit map index - Selecting records
• Different combinations now possible
24
SELECT vehical_id,lot_id
FROM car_location_index
WHERE make = 'Ford'
AND model = ''
AND color = 'Blue';
vehical_id | lot_id
------------+---------
1234 | 8675309
SELECT vehical_id,lot_id
FROM car_location_index
WHERE make = ''
AND model = ''
AND color = 'Blue';
vehical_id | lot_id
------------+---------
1234 | 8675309
8765 | 5551212
25. Feeling super yet?
• Use these skills. Save you they will.
• Don’t settle for boring data models
• Stay tuned for more!
25
• Final will be at the Cassandra Summit: June 11th
The worlds next top data model
26. Be there!!!
26
Sony, eBay, Netflix, Intuit, Spotify... the list goes on. Don’t miss it.
Here is my discount code! Use it: PMcVIP
27. Bonus!
• DataStax Java Driver Preso - June 12th
• Download today!
27
https://github.com/datastax/java-driver