A talk I gave at Cassandra Summit 2012 on building an application with Cassandra as the data store. It is meant as an introduction to starting the development process.
Cassandra Summit 2012 - Building a Cassandra Based App From Scratch
1. Building a Cassandra-based application from scratch
Patrick McFadin
Cassandra Summit 2012
#cassandra12
2. This is me
• Chief Architect at Hobsons
– Hobsons is an education services company. More here: www.hobsons.com
• Cassandra user since 0.7
• Follow me here: @PatrickMcFadin
3. Goals
• Take a new concept
• What’s the data model?!?!
• Some sample code
• You get homework! (If you want)
4. Here’s the plan
• Conceptualize a new application
• Identify the entity tables
• Identify query tables
• Code. Rinse. Repeat.
• Deploy
• …
• Profit!
* I’ll be using the term “tables”, which is equivalent to column families
5. www.killrvideos.com
Start with a concept: Video Sharing Website
[Wireframe of the video page: Video Title, Username, Description, Rating, Tags (Foo, Bar), Comments, Recommended, Upload New!, Ads by Google, and a cat picture saying "Meow"]
*Cat drawing by goodrob13 on Flickr
6. Break down the features
• Post a video*
• View a video
• Add a comment
• Rate a video
• Tag a video
* Not talking about transcoding! Check out zencoder.com, it’s pretty sweet.
8. Users
Row key: Username | Columns: password, FirstName, LastName
• Similar to an RDBMS table. Fairly fixed columns
• Username is unique
• Use secondary indexes on firstname and lastname for lookup
• Adding columns with Cassandra is super easy
CREATE TABLE users (
username varchar PRIMARY KEY,
firstname varchar,
lastname varchar,
password varchar
);
9. Users: The set code
// stringSerializer is assumed to be a static field, e.g. StringSerializer.get()
static void setUser(User user, Keyspace keyspace) {
    // Create a mutator that allows you to talk to Cassandra
    Mutator<String> mutator = HFactory.createMutator(keyspace, stringSerializer);
    try {
        // Use the mutator to insert data into our table
        mutator.addInsertion(user.getUsername(), "users",
            HFactory.createStringColumn("firstname", user.getFirstname()));
        mutator.addInsertion(user.getUsername(), "users",
            HFactory.createStringColumn("lastname", user.getLastname()));
        mutator.addInsertion(user.getUsername(), "users",
            HFactory.createStringColumn("password", user.getPassword()));
        // Once the mutator is ready, execute on Cassandra
        mutator.execute();
    } catch (HectorException he) {
        he.printStackTrace();
    }
}
You can implement the get…
10. Videos
Row key: VideoId <UUID> | Columns: VideoName, UserName, Description, Tags
• Use a UUID as a row key for uniqueness
• Allows for same video names
• Tags should be stored in some sort of delimited format
• Index on username may not be the best plan
CREATE TABLE videos (
videoid uuid PRIMARY KEY,
videoname varchar,
username varchar,
description varchar,
tags varchar
);
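The "delimited format" for tags can be sketched in plain Java. This is only a minimal sketch, not code from the talk: the class name TagCodec is hypothetical, the comma delimiter is an assumption, and it assumes a tag never contains a comma itself.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical helper: packs tags into one varchar column value and back.
final class TagCodec {
    private static final String DELIMITER = ","; // assumed delimiter

    // Join tags into a single string, trimming whitespace and dropping blanks.
    static String encode(List<String> tags) {
        return tags.stream()
                .map(String::trim)
                .filter(t -> !t.isEmpty())
                .collect(Collectors.joining(DELIMITER));
    }

    // Split the stored column value back into individual tags.
    static List<String> decode(String stored) {
        if (stored == null || stored.isEmpty()) {
            return List.of();
        }
        return Arrays.asList(stored.split(DELIMITER));
    }

    public static void main(String[] args) {
        String stored = encode(List.of("Foo", " Bar "));
        System.out.println(stored);         // Foo,Bar
        System.out.println(decode(stored)); // [Foo, Bar]
    }
}
```

The encoded string is what would go into the tags column; the application decodes it again after reading the row.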
11. Videos: The get code
static Video getVideoByUUID(UUID videoId, Keyspace keyspace) {
    Video video = new Video();
    // Create a slice query. We'll be getting specific column names
    SliceQuery<UUID, String, String> sliceQuery =
        HFactory.createSliceQuery(keyspace, uuidSerializer, stringSerializer, stringSerializer);
    sliceQuery.setColumnFamily("videos");
    sliceQuery.setKey(videoId);
    sliceQuery.setColumnNames("videoname", "username", "description", "tags");
    // Execute the query and get the list of columns
    ColumnSlice<String, String> result = sliceQuery.execute().get();
    // Get each column by name and add them to our video object
    video.setVideoName(result.getColumnByName("videoname").getValue());
    video.setUsername(result.getColumnByName("username").getValue());
    video.setDescription(result.getColumnByName("description").getValue());
    video.setTags(result.getColumnByName("tags").getValue().split(","));
    return video;
}
You can implement the set…
12. Comments
Row key: VideoId <UUID> | Columns (time order): Username:<timestamp> .. Username:<timestamp>
• Videos have many comments
• Use Composite Columns to store user and time
• Value of each column is the text of the comment
• Order is as inserted
• Use getSlice() to pull some or all of the comments
CREATE TABLE comments (
videoid uuid PRIMARY KEY,
comment varchar
);
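To get a feel for how one wide row of comments behaves, here is a rough in-memory sketch in plain Java. It does not talk to Cassandra. The names CommentRow and ColumnName are hypothetical, and it assumes the timestamp is the leading part of the composite column name so that the columns sort by time.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// In-memory stand-in for one wide row of the comments table (one row per video).
final class CommentRow {
    // Hypothetical composite column name; timestamp leads so columns sort by time.
    static final class ColumnName implements Comparable<ColumnName> {
        final long timestamp;
        final String username;

        ColumnName(long timestamp, String username) {
            this.timestamp = timestamp;
            this.username = username;
        }

        @Override
        public int compareTo(ColumnName other) {
            int byTime = Long.compare(timestamp, other.timestamp);
            return byTime != 0 ? byTime : username.compareTo(other.username);
        }
    }

    // Column name -> comment text, kept sorted like columns within a row.
    private final TreeMap<ColumnName, String> columns = new TreeMap<>();

    void addComment(String username, long timestamp, String text) {
        columns.put(new ColumnName(timestamp, username), text);
    }

    // Rough analogue of a slice: the first `limit` comments in time order.
    List<String> firstComments(int limit) {
        List<String> out = new ArrayList<>();
        for (Map.Entry<ColumnName, String> e : columns.entrySet()) {
            if (out.size() >= limit) {
                break;
            }
            out.add(e.getKey().username + ": " + e.getValue());
        }
        return out;
    }

    public static void main(String[] args) {
        CommentRow row = new CommentRow();
        row.addComment("bob", 2000L, "Second");
        row.addComment("alice", 1000L, "First");
        System.out.println(row.firstComments(10));
    }
}
```

Passing a large limit pulls all of the comments, a small one pulls just the first page.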
13. Rating a video
Row key: VideoId <UUID> | Columns: rating_count <counter>, rating_total <counter>
• Use counter for single call update
• rating_count is how many ratings were given
• rating_total is the sum of all ratings
• Ex: rating_count = 5, rating_total = 23, avg rating = 23/5 = 4.6
CREATE TABLE video_rating (
videoid uuid PRIMARY KEY,
rating_count counter,
rating_total counter
);*
* Only valid in CQL 3+
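The two-counter arithmetic can be sketched in plain Java. This is an in-memory illustration only: the class name VideoRating is hypothetical and AtomicLong stands in for a Cassandra counter column.

```java
import java.util.concurrent.atomic.AtomicLong;

// In-memory stand-in for one row of video_rating.
final class VideoRating {
    private final AtomicLong ratingCount = new AtomicLong(); // how many ratings
    private final AtomicLong ratingTotal = new AtomicLong(); // sum of all ratings

    // Record one rating: both counters are incremented, nothing is read first.
    void rate(int stars) {
        ratingCount.incrementAndGet();
        ratingTotal.addAndGet(stars);
    }

    // The average is derived at read time; 0.0 when nobody has rated yet.
    double average() {
        long count = ratingCount.get();
        return count == 0 ? 0.0 : (double) ratingTotal.get() / count;
    }

    public static void main(String[] args) {
        VideoRating rating = new VideoRating();
        for (int stars : new int[] {5, 5, 4, 5, 4}) {
            rating.rate(stars);
        }
        System.out.println(rating.average()); // 23 / 5 = 4.6
    }
}
```

Storing the count and the total rather than the average means a new rating is a pair of increments, with no read-modify-write of the stored average.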
14. Video Event
Row key: VideoId:Username | Columns (time order): start_<timestamp>, stop_<timestamp>, start_<timestamp> | Values: video_<timestamp>
• Track viewing events
• Combine Video ID and Username for a unique row
• Stop time can be used to pick up where they left off
• Great for usage analytics later
CREATE TABLE video_event (
videoid_username varchar PRIMARY KEY,
event varchar
);
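Here is a rough plain-Java sketch of the "pick up where they left off" idea. It runs in memory, not against Cassandra, and rests on several assumptions: the class name ViewingEvents is hypothetical, column names are zero-padded so that string order matches time order, and each column value is taken to be the playback position in seconds.

```java
import java.util.Map;
import java.util.OptionalLong;
import java.util.TreeMap;
import java.util.UUID;

// In-memory stand-in for one row of video_event.
final class ViewingEvents {
    // Column name -> position within the video (seconds), sorted like columns in a row.
    private final TreeMap<String, Long> columns = new TreeMap<>();

    // Row key as on the slide: video ID and username combined.
    static String rowKey(UUID videoId, String username) {
        return videoId + ":" + username;
    }

    void start(long wallClockMillis, long positionSeconds) {
        columns.put(String.format("start_%013d", wallClockMillis), positionSeconds);
    }

    void stop(long wallClockMillis, long positionSeconds) {
        columns.put(String.format("stop_%013d", wallClockMillis), positionSeconds);
    }

    // Position recorded by the most recent stop event, if there is one.
    OptionalLong resumePosition() {
        // '~' sorts after every digit, so this finds the last "stop_" column.
        Map.Entry<String, Long> last = columns.floorEntry("stop_~");
        if (last == null || !last.getKey().startsWith("stop_")) {
            return OptionalLong.empty();
        }
        return OptionalLong.of(last.getValue());
    }

    public static void main(String[] args) {
        ViewingEvents events = new ViewingEvents();
        events.start(1000L, 0L);
        events.stop(61000L, 60L);
        System.out.println(rowKey(UUID.randomUUID(), "pmcfadin"));
        System.out.println(events.resumePosition());
    }
}
```

The same sorted columns can later be scanned in full for usage analytics.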
16. Lookup Video By Username
Row key: Username | Columns: VideoId:<timestamp> .. VideoId:<timestamp>
• Username is unique
• One column for each new video uploaded
• Column slice for time span. From x to y
• VideoId is added at the same time a Video record is added
CREATE TABLE username_video_index (
username varchar PRIMARY KEY,
videoid_timestamp varchar
);
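The "from x to y" column slice can be illustrated with a sorted map in plain Java. This is an in-memory sketch, not Hector code: the class name UsernameVideoIndex is hypothetical, and it assumes the upload timestamp is what the columns are sorted on. Two uploads by the same user in the same millisecond would collide in this simplified version.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.UUID;

// In-memory stand-in for username_video_index.
final class UsernameVideoIndex {
    // username (row key) -> upload time in millis (column name) -> video ID
    private final Map<String, TreeMap<Long, UUID>> rows = new HashMap<>();

    // Called whenever a video record is written, so the index stays in step.
    void addVideo(String username, long uploadedAtMillis, UUID videoId) {
        rows.computeIfAbsent(username, k -> new TreeMap<>()).put(uploadedAtMillis, videoId);
    }

    // Rough analogue of a column slice: videos uploaded between from and to, inclusive.
    List<UUID> videosBetween(String username, long from, long to) {
        TreeMap<Long, UUID> row = rows.get(username);
        if (row == null || from > to) {
            return List.of();
        }
        return new ArrayList<>(row.subMap(from, true, to, true).values());
    }

    public static void main(String[] args) {
        UsernameVideoIndex index = new UsernameVideoIndex();
        index.addVideo("pmcfadin", 1000L, UUID.randomUUID());
        index.addVideo("pmcfadin", 5000L, UUID.randomUUID());
        System.out.println(index.videosBetween("pmcfadin", 0L, 2000L).size());
    }
}
```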
17. Videos by Tag
Row key: tag | Columns: VideoId .. VideoId
• Tag is unique regardless of video
• Great for “List videos with X tag”
• Tags have to be updated in Video and Tag at the same time
• Index integrity is maintained in app logic
CREATE TABLE tag_index (
tag varchar PRIMARY KEY,
videoid varchar
);
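Keeping the index in step with the video record is the application's job. One way that logic could look, sketched in plain Java with in-memory maps instead of Cassandra tables (the class name TagIndexWriter and its methods are hypothetical):

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.UUID;

// In-memory stand-in for the tags stored on videos plus the tag_index table.
final class TagIndexWriter {
    private final Map<UUID, Set<String>> videoTags = new HashMap<>(); // videos.tags
    private final Map<String, Set<UUID>> tagIndex = new HashMap<>();  // tag_index

    // Replace a video's tags and update both "tables" in the same call.
    void setTags(UUID videoId, Set<String> newTags) {
        Set<String> oldTags = videoTags.getOrDefault(videoId, Set.of());
        // Remove index entries for tags the video no longer carries.
        for (String tag : oldTags) {
            if (!newTags.contains(tag)) {
                Set<UUID> ids = tagIndex.get(tag);
                if (ids != null) {
                    ids.remove(videoId);
                    if (ids.isEmpty()) {
                        tagIndex.remove(tag);
                    }
                }
            }
        }
        // Add index entries for every current tag.
        for (String tag : newTags) {
            tagIndex.computeIfAbsent(tag, k -> new HashSet<>()).add(videoId);
        }
        videoTags.put(videoId, new HashSet<>(newTags));
    }

    // "List videos with X tag"
    Set<UUID> videosWithTag(String tag) {
        return Collections.unmodifiableSet(tagIndex.getOrDefault(tag, Set.of()));
    }

    public static void main(String[] args) {
        TagIndexWriter writer = new TagIndexWriter();
        UUID video = UUID.randomUUID();
        writer.setTags(video, Set.of("cats", "funny"));
        System.out.println(writer.videosWithTag("cats").contains(video)); // true
    }
}
```

Against a real cluster the two writes are separate mutations, so the application also has to decide what to do if one of them fails.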
18. Deployment strategies
• Measure your risk
– Replication factor?
– Multi-datacenter?
– Cost?
• Performance
– Today != tomorrow. Scale when needed
– Have an expansion plan ready
19. Wrap up
• Similar data model process to RDBMS… to start
• Query -> Index table
• Don’t be afraid to write to multiple tables at once
• Bonus points: Hadoop and Solr!
20. Go play!
• Go to: http://github.com/pmcfadin
• Look for projects with cassandra12
• Clone or fork my examples
• Implement stubbed methods
• Send me your solutions: pmcfadin@gmail.com
• Follow me for updates: @PatrickMcFadin