This document describes Finn.no's use of Cassandra for event counting and statistics. It traces the evolution from a «stone age» RDBMS-based system, through an older Hadoop/Cassandra system, to the current system. The current system stores raw event data in Cassandra under the RandomPartitioner and runs aggregation jobs that write aggregated statistics back into Cassandra. Compared with the older systems it offers near-real-time statistics and cleaner data models.
9. Usecase: Event counting and statistics

«Stone age» system:
● Counter updates in web-app
● Storing in RDBMS

Old system:
● Raw event data in C* (ByteOrderedPartitioner)
● Hadoop jobs to roll up event data
● Aggregated data in C* (RandomPartitioner + SuperColumns)

Current system:
● CQL3 for raw event data
● Composite columns for aggregated data
10. Usecase: Event counting and statistics. «Stone age» system
«Stone age» architecture:
● Synchronous counter updates
● Updating counters inside a web-app
● Using Finn's main relational database as storage for counters
11. Usecase: Event counting and statistics. «Stone age» system
Web server    Web server    Web server    . . .
  ++count       ++count       ++count
      ↘            ↓            ↙
              RDBMS
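The «stone age» pattern can be sketched as follows: every page view does a synchronous counter update and commit against the shared RDBMS. This is an illustrative reconstruction, with sqlite3 standing in for Finn's main relational database and `ad_counters`/`count_page_view` as hypothetical names.

```python
# «Stone age» counting sketch: synchronous increment + commit per request.
# sqlite3 stands in for the main RDBMS; table and function names are assumed.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE ad_counters (finnkode INTEGER PRIMARY KEY, views INTEGER NOT NULL)"
)

def count_page_view(finnkode):
    conn.execute("INSERT OR IGNORE INTO ad_counters VALUES (?, 0)", (finnkode,))
    conn.execute(
        "UPDATE ad_counters SET views = views + 1 WHERE finnkode = ?", (finnkode,)
    )
    conn.commit()  # one commit-log write per page view -- the peak-hour problem

for _ in range(3):
    count_page_view(3706119)
print(conn.execute(
    "SELECT views FROM ad_counters WHERE finnkode = 3706119").fetchone()[0])  # -> 3
```

One synchronous commit per request is exactly what drove the high commit-log write times during peak hours.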
12. Usecase: Event counting and statistics. «Stone age» system
Pros:
● Real time numbers
Cons:
● High DB commit-log write times during peak hours. Overall Finn performance degradation.
● No interval-based statistics like daily counters, just totals
14. Usecase: Event counting and statistics. Old system
Old architecture:
● Asynchronous event logging via Scribe
● Saving event data in a raw unnormalized format
to C*
● Hadoop jobs to sum up event counters over time
periods
● Serving aggregated statistics from C*
15. Usecase: Event counting and statistics. Old architecture
Pros:
● Less load on main RDBMS
● Interval-based statistics
● Ability to re-aggregate data and get new insights
● Better Command-Query separation
Cons:
● Not real-time, although jobs run every minute
22. Usecase: Event counting and statistics. Old system. Data flow
webserver1.finn.no, webserver2.finn.no, webserver3.finn.no, …
    Tomcat → Scribe daemon (async event logging)
        ↓
cassandra1.finn.no, …
    Cassandra (ByteOrderedPartitioner): raw event data
        ↓
    Hadoop aggregation job
        ↓
    Cassandra (RandomPartitioner): aggregated data
23. Usecase: Event counting and statistics. Old system. Event logging
Event bean (Thrift IDL):
struct Event {
    /** Event domain. Typically AD, CV, Oppdrag */
    1: required string type;
    /** Event name. Typically PageView, EmailSent */
    2: optional string subCategory;
    /** Arbitrary key-value map with extra info like finnkode or userid */
    3: required map<string, string> values;
}
● Event bean → binary → base64 + timestamp → scribe message
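The encoding chain above can be sketched in a few lines. This is an illustrative reconstruction: JSON stands in for the real Thrift binary encoding, and `to_scribe_message` plus the "timestamp, space, payload" message shape are assumptions, not the production format.

```python
# Sketch of: Event bean -> binary -> base64 + timestamp -> scribe message.
# JSON replaces Thrift binary serialization here; names are hypothetical.
import base64
import json
import time

def to_scribe_message(event_type, sub_category, values, ts=None):
    event = {"type": event_type, "subCategory": sub_category, "values": values}
    binary = json.dumps(event, sort_keys=True).encode("utf-8")  # "binary" step
    ts = int(time.time()) if ts is None else ts
    return "%d %s" % (ts, base64.b64encode(binary).decode("ascii"))

msg = to_scribe_message("AD", "PageView", {"ad.id": "3706119"}, ts=1369164000)
print(msg.split(" ", 1)[0])  # -> 1369164000
```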
24. Usecase: Event counting and statistics. Old system. Data flow
(same data-flow diagram as in slide 22)
25. Usecase: Event counting and statistics. Old system. Raw event data
● Approx. 1500k events per second to log at peak times
● ByteOrderedPartitioner for storing raw data. TTL = 3 months
(picture from datastax.com)

Timestamp       Eventbean
1369164000      0x1f0562bda6...
1369164001      0x364dd9a5a6...
1369164002      0x4d96508da6...
1369164003      0x64dec775a6...

Sequential rowkeys = Hadoop friendly:
get_range_slices(1369164000, 1369164003) to get data for Hadoop splits
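Because the rowkeys are epoch seconds stored in byte order, a time range can be cut into contiguous key ranges, one per Hadoop split, each then fetched with a `get_range_slices(start, end)` call. A minimal sketch of that split calculation (the split size is an assumed parameter):

```python
# Cut a timestamp range into contiguous (start, end) key ranges, one per
# Hadoop split. Works only because ByteOrderedPartitioner keeps keys sorted.
def time_range_splits(start_ts, end_ts, split_seconds):
    splits = []
    t = start_ts
    while t <= end_ts:
        splits.append((t, min(t + split_seconds - 1, end_ts)))
        t += split_seconds
    return splits

print(time_range_splits(1369164000, 1369164003, 2))
# -> [(1369164000, 1369164001), (1369164002, 1369164003)]
```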
26. Usecase: Event counting and statistics. Old system. Data flow
(same data-flow diagram as in slide 22)
27. Usecase: Event counting and statistics. Old system. Aggregation
Hadoop jobs:
● Sum up events for each finnkode grouped by subCategory.

map(events):
    for event in events:
        finnkode = event.getValues().get("ad.id")
        subcategoryCount = (event.subcategory, 1)
        emit(finnkode, subcategoryCount)

reduce(finnkode, subcategoryCountList):
    subcategoryTotals = {}
    for subcategory, count in subcategoryCountList:
        subcategoryTotals[subcategory] = subcategoryTotals.get(subcategory, 0) + count
    for subcategory, count in subcategoryTotals.items():
        incrementCassandraCounter(finnkode, subcategory, count, HOUR_somehour)
        incrementCassandraCounter(finnkode, subcategory, count, DAY_someday)
        incrementCassandraCounter(finnkode, subcategory, count, TOTAL)
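The map/reduce logic above can be run locally as a plain-Python sketch. The sample events are made up, and `collections.Counter` stands in for the Cassandra counter increments, which are out of scope here.

```python
# Local simulation of the aggregation job: group events by finnkode (map),
# then sum counts per subCategory (reduce). Sample data is illustrative.
from collections import Counter, defaultdict

events = [
    {"subCategory": "PageView", "values": {"ad.id": "3706119"}},
    {"subCategory": "PageView", "values": {"ad.id": "3706119"}},
    {"subCategory": "EmailSent", "values": {"ad.id": "3706052"}},
]

# map phase: emit (finnkode, (subCategory, 1)) pairs, grouped by key
grouped = defaultdict(list)
for event in events:
    grouped[event["values"]["ad.id"]].append((event["subCategory"], 1))

# reduce phase: sum counts per subCategory for each finnkode
totals = {}
for finnkode, pairs in grouped.items():
    totals[finnkode] = Counter()
    for sub, n in pairs:
        totals[finnkode][sub] += n

print(totals["3706119"]["PageView"])  # -> 2
```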
28. Usecase: Event counting and statistics. Old system. Data flow
(same data-flow diagram as in slide 22)
29. Usecase: Event counting and statistics. Old system. Aggregated data
SuperColumns for interval-based counters:

           HOUR_2013_05_30_18    DAY_2013_05_30       TOTAL
           PageView   Email      PageView   Email     PageView   Email
3706119    10         2          20         4         50         5
3706052    23         3          102        4         234        10

Min. time resolution: QUARTER_HOUR
No TTL on counter columns, so rows got really wide and slow
http://www.makingitscale.com/2012/scaling-cassandra-counter-columns.html
30. Usecase: Event counting and statistics. Current system. Aggregated data
Migration from SuperColumns to Composite columns:

SuperColumns:
           HOUR_2013_05_30_18    DAY_2013_05_30       TOTAL
           PageView   Email      PageView   Email     PageView   Email
3706119    10         2          20         4         50         5
3706052    23         3          102        4         234        10

CompositeColumns:
           HOUR_2013_05_30_18:PageView   HOUR_2013_05_30_18:Email   DAY_2013_05_30:PageView   DAY_2013_05_30:Email   TOTAL:PageView   TOTAL:Email
3706119    10                            2                          20                        4                      50               5
3706052    23                            3                          102                       4                      234              10

+ clean-up jobs to remove old QUARTER_HOUR and HOUR columns
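The composite column names can be derived from an event timestamp. A minimal sketch, assuming the `INTERVAL_...:subCategory` naming shown in the tables above; the exact quarter-hour name format is a guess:

```python
# Build the composite counter-column names touched by one event:
# one column per (interval bucket, subCategory) pair.
# QUARTER_HOUR and HOUR columns are later deleted by clean-up jobs,
# since counter columns cannot carry a TTL.
from datetime import datetime

def counter_columns(ts, sub_category):
    t = datetime.utcfromtimestamp(ts)
    return [
        "QUARTER_HOUR_%s_%02d:%s"
        % (t.strftime("%Y_%m_%d_%H"), (t.minute // 15) * 15, sub_category),
        "HOUR_%s:%s" % (t.strftime("%Y_%m_%d_%H"), sub_category),
        "DAY_%s:%s" % (t.strftime("%Y_%m_%d"), sub_category),
        "TOTAL:%s" % sub_category,
    ]

print(counter_columns(1369936800, "PageView"))  # 2013-05-30 18:00 UTC
```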
32. Usecase: Event counting and statistics. Old system. Disadvantages
Old raw event data cluster disadvantages:
● ByteOrderedPartitioner: just one node at work at a time, taking all the load
● Skinny rows: 5 billion rows for 3 months of data on each node
● Extremely unstable instances failing with OOM errors
● Hadoop jobs fail or hang at the init stage despite QUORUM consistency
● Outdated statistics across Finn services
33. Usecase: Event counting and statistics. Current system. Raw data
Current system for raw data:
● Same cluster as aggregated data, i.e.
RandomPartitioner + CQL3
34. Usecase: Event counting and statistics. Current system. Raw data
● CQL = SQL without JOINs, GROUP BYs and other unimportant stuff
● Abstraction over physical C* storage
● CQL table transposes rows into Composite columns

CQL table:

Timebucket     Timestamp      Eventbean
1369164000     1369164000     0x1f0562bda6...
1369164000     1369164001     0x364dd9a5a6...
1369164000     1369164002     0x4d96508da6...
1369164000     1369164003     0x64dec775a6...

Underlying ColumnFamily:

Timebucket     1369164000:Eventbean   1369164001:Eventbean   1369164002:Eventbean   1369164003:Eventbean
1369164000     0x1f0562bda6...        0x364dd9a5a6...        0x4d96508da6...        0x64dec775a6...

Timebucket = Partition key (row key)
Timestamp = Clustering key
Partition + clustering = PRIMARY KEY
The other columns carry the data for each PK pair
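The transposition can be sketched generically: all clustering-key values plus the data-column name are joined into the physical column name inside one wide row per partition key. `to_physical_columns` and the `:`-joined name format are illustrative simplifications of Cassandra's composite comparator.

```python
# Sketch of the CQL3 transposition: logical rows become composite columns
# ("clusteringvalue:datacolumn") inside one wide row per partition key.
def to_physical_columns(partition_key, rows, clustering_keys, data_columns):
    wide_row = {}
    for row in rows:
        prefix = ":".join(str(row[k]) for k in clustering_keys)
        for col in data_columns:
            wide_row["%s:%s" % (prefix, col)] = row[col]
    return {partition_key: wide_row}

physical = to_physical_columns(
    "1369164000",
    [{"collected_ts": 1369164000, "Eventbean": "0x1f0562bda6..."},
     {"collected_ts": 1369164001, "Eventbean": "0x364dd9a5a6..."}],
    clustering_keys=["collected_ts"],
    data_columns=["Eventbean"],
)
print(physical["1369164000"]["1369164000:Eventbean"])  # -> 0x1f0562bda6...
```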
35. Usecase: Event counting and statistics. Current system. Raw data
CREATE TABLE events (
    realtb_sharded   text,        ← Partition key
    type             text,        ← Clustering key
    collected_ts     timeuuid,    ← Clustering key
    key_values_json  text,        ← Data column
    real_ts          timestamp,   ← Data column
    real_tb          bigint,      ← Secondary index
    collected_tb     bigint,      ← Secondary index
    PRIMARY KEY (realtb_sharded, type, collected_ts)
);
“real” timestamp – Event occurred at the client
“collected” timestamp – Event reached C*
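The partition key `realtb_sharded` combines the "real" timebucket with a shard suffix so that one bucket's events spread over several partitions. The slides do not show the scheme, so this sketch assumes one-minute buckets, 4 shards, and a `bucket_shard` string format:

```python
# Hypothetical construction of the realtb_sharded partition key:
# minute timebucket of the "real" timestamp plus a random shard suffix.
# Bucket size, shard count and format are assumptions, not Finn's scheme.
import random

SHARDS = 4

def realtb_sharded(real_ts, shard=None):
    bucket = real_ts - real_ts % 60                     # minute timebucket
    shard = random.randrange(SHARDS) if shard is None else shard
    return "%d_%d" % (bucket, shard)

print(realtb_sharded(1369164037, shard=2))  # -> 1369164000_2
```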
36. Usecase: Event counting and statistics. Current system. Raw data
Hadoop data reading:
1. Get a list of InputSplits
   HDFS: a file block is replicated across several machines
         FileInputSplit: ("file, start, length", IP addresses)
   C*:   rows with the same Partition key are replicated across several machines
         EventsInputSplit: (Partition key, IP addresses)
2. InputSplit → Map tasks on those IP addresses
3. Map task reads data based on:
   HDFS: "file, start, length"
   C*:   Partition key
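The split analogy above can be sketched as a small data structure: a C* split pairs a partition key with the replica IPs, just as an HDFS split pairs a block range with the machines holding it. The IPs and keys below are made up.

```python
# Sketch of EventsInputSplit: (Partition key, replica IP addresses),
# so Hadoop can schedule each map task on a node holding the data.
from collections import namedtuple

EventsInputSplit = namedtuple("EventsInputSplit", "partition_key replicas")

def make_splits(partition_keys, replicas_for):
    # replicas_for: partition_key -> replica IPs (derived from token ranges)
    return [EventsInputSplit(pk, replicas_for[pk]) for pk in partition_keys]

splits = make_splits(
    ["1369164000_0", "1369164000_1"],
    {"1369164000_0": ["10.0.0.1", "10.0.0.2"],
     "1369164000_1": ["10.0.0.2", "10.0.0.3"]},
)
print(splits[0].replicas)  # -> ['10.0.0.1', '10.0.0.2']
```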
37. Usecase: Event counting and statistics. Current system. Raw data
Process all data collected 18:00 – 19:00 30.05.2013:

1. Get InputSplits:
   for minute in (18:00-19:00).getMinutes:
       SELECT realtb_sharded FROM events WHERE collected_tb = minute
   token(realtb_sharded) → IP addresses

2. Map task:
   SELECT * FROM events WHERE realtb_sharded = '17:58'
   SELECT * FROM events WHERE realtb_sharded = '17:59'
   SELECT * FROM events WHERE realtb_sharded = '18:03'

(events table schema as in slide 35)
38. Usecase: Event counting and statistics. Current system. Raw data
Process data of type "AD" collected 18:00 – 19:00 30.05.2013:

1. Get InputSplits:
   for minute in (18:00-19:00).getMinutes:
       SELECT realtb_sharded FROM events
        WHERE collected_tb = minute AND type = 'AD'
   token(realtb_sharded) → IP addresses

2. Map task:
   SELECT * FROM events
    WHERE realtb_sharded = '17:58' AND type = 'AD'
      AND collected_ts > minTimeuuid('18:00')
      AND collected_ts < maxTimeuuid('19:00')

(events table schema as in slide 35)
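`minTimeuuid` makes a timestamp comparable against the `timeuuid` clustering column. A sketch of the idea in plain Python: build the smallest version-1 UUID for a given wall-clock second. (Cassandra's comparator uses specific minimum clock-sequence/node bytes; zeros are used here for illustration.)

```python
# Build a minimal version-1 (time-based) UUID for an epoch timestamp,
# mimicking what CQL's minTimeuuid() produces for range bounds.
import uuid

# 100-ns ticks between the Gregorian epoch (1582-10-15) and the Unix epoch
GREGORIAN_OFFSET = 0x01B21DD213814000

def min_timeuuid(epoch_seconds):
    ticks = epoch_seconds * 10_000_000 + GREGORIAN_OFFSET
    time_low = ticks & 0xFFFFFFFF
    time_mid = (ticks >> 32) & 0xFFFF
    time_hi_version = ((ticks >> 48) & 0x0FFF) | 0x1000  # set version 1
    # 0x80 sets the RFC 4122 variant; clock_seq/node are zeroed for a minimum
    return uuid.UUID(fields=(time_low, time_mid, time_hi_version, 0x80, 0x00, 0x0))

u = min_timeuuid(1369936800)  # 2013-05-30 18:00 UTC
print(u.version)  # -> 1
```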
39. Usecase: Event counting and statistics. Current system. Raw data
Getting InputSplits:
Get Partition keys for data of type AD collected during timebucket 29.05.2013 21:18 – 21:19

Hadoop Map task:
Get rows for a Partition key from the split, limiting by type and collected timestamp
40. Usecase: Event counting and statistics. NextGen
Event counting and statistics NextGen:

Ad-hoc analytics:
– Apache Hive integration (we already have Pig)
– Hive ODBC driver for Tableau integration

Aggregation jobs:
– A higher-level library such as Cascading or Apache Crunch instead of raw M/R code
– Hadoop 2