Thursday, September 24
1:50 PM - 2:30 PM
Ballroom G
Believing Cassandra: Our Big-Data Journey To Enlightenment under the Cassandra Paradigm
It turns out that much can be learned about Cassandra in a year's time, given a high enough pain tolerance on the part of an organization's founders. Join us as we at Timeli.io step you through exactly what happened when we walked into an in-production time series implementation that somehow could not return data in a time series format to its existing customers. We will then discuss how we re-worked that same implementation to be fully functional, and how we started on the road to finding the keys to Cassandra's legendary performance capabilities in a Zookeeper/AMQP/SQL/Tomcat stack. The path for this journey left no block unstumbled, so if your organization still has unbruised toes, this talk could well save you pain.
2. Company Overview
Company
§ Founded in 2013
§ Based in Boulder, CO & Sunnyvale, CA
Product/Business
§ Predictive asset analytics solutions
§ Operational applications for connected equipment
Technology Platform
§ Time series data and analytics platform
§ Proprietary time series data processing layer
§ Leverages “best of breed” open source software
Industry Verticals
§ Oil & Gas
§ Manufacturing
§ Utilities – Electric, Gas & Water
3. Who are we to talk?
² Time Series data ingestion engine, platform, predictive analytics
² Validation, Estimation, Regularization
² Aggregations (i.e. coarse graining)
² Based on Utilities software started in Europe in 2009
² Added Cassandra to stack in 2011
Timeli.io
I started in late 2013 and discovered quickly something they had missed:
Cassandra can be hard to do right
5. But first … a minor cultural digression
Cassandra:
² Sister to Helen of Troy
² More beautiful, more sought after, wiser
² Even the gods themselves
² Promised a wild night to Apollo for power of prophecy
² Reneged
² Apollo left her with prophecy, but made it so nobody believed her
Moral: Cassandra accurately predicted the Fall of Troy.
6. Just like Cassandra of legend … real-life Cassandra is difficult to “believe”
² Selects designed beforehand
² Denormalization
² Many arcane configuration options
² Hard to find expertise
² Based on “tables” but not tabular
² CQL looks like SQL. It’s not SQL.
“No indexed columns present in by-columns clause with Equal operator”
“ORDER BY is only supported when the partition key is restricted by an EQ or an IN”
“PRIMARY KEY column ‘timestamp’ cannot be restricted”
“Cannot execute this query as it might involve data filtering and thus may have unpredictable performance.”
7. What did this mean for Timeli?
Example: Timeli ingests data, writes to raw, writes to processed, then coarse grains 1 or more series into “aggregations.”
8. Early Warning
Multiple very competent RDBMS/Java/JPA architects built a time series app where the following could not be done on aggregations, the primary product:
SELECT * FROM aggregations WHERE meter_id = 4bbedd76-4e9e-11e5-885d-feff819cdc9f
AND timestamp > '2013-01-01' AND timestamp < '2013-03-01';
“It’s a security feature! You have to know when your data exists to get your data!”
Cassandra isn’t crazy
9. New Beginnings
Out of all of this, Timeli was born
What did we change?
1. Partitioner
2. Primary Keys and Row Keys
3. Performance/Missing data in Collection types
4. Batching for “Performance”
5. Double Precision vs. BigDecimal
6. QueryBuilder vs Prepared Statements
7. Row Limits
10. 1. The Partitioner
What is a partitioner in Cassandra?
[Diagram: data keys (B …, S …, S …, S …, Z …, T …) placed around the Cassandra ring]
Three Types:
² Byte Ordered Partitioner
² Random Partitioner
² Murmur3 Partitioner
12. 1. The Partitioner
[Diagram: the same data keys in a different order around the Cassandra ring]
² Murmur3 is also a random partitioner, but faster
13. 1. The Partitioner
² Our partition keys were of the form {UUID}|{string key}
² UUID 1s are uniformly distributed but keys are not
² ByteOrderedPartitioner left big gaps, so ownership across nodes was uneven:
> nodetool status ts
Datacenter: us-central1
=======================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
-- Address Load Tokens Owns (effective) Host ID Rack
UN 10.82.79.110 4.27 GB 256 44.9% 29d0a723-fc1f-4f73-a864-97dc6df045f5 b
UN 10.105.185.1 2.51 GB 256 26.4% 1d236bd9-5fb1-4423-83bc-168bac924db4 b
UN 10.234.92.2 2.73 GB 256 28.7% 29e1358a-bef2-495e-80bc-3de4c4499790 b
15. 2. Primary Keys and Row Keys
Aggregation Table
A coarse graining of a time series into measures on buckets of larger size than original time resolution
[Chart: original series plotted over T1 to T24 (y-axis 0 to 10), shown alongside its 8-hour mean, 8-hour max and 8-hour min, with bucket boundaries labeled T1, T8, T9, T16, T24]
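A rough sketch of what coarse graining means in code, assuming fixed-size buckets over an evenly spaced series. The class name `CoarseGrain`, the `Bucket` holder and the sample values are invented for illustration and are not Timeli's implementation.

```java
import java.util.ArrayList;
import java.util.List;

final class CoarseGrain {
    /** Summary measures for one bucket (hypothetical holder type). */
    static final class Bucket {
        final int firstIndex;
        final long count;
        final double sum, mean, max, min;

        Bucket(int firstIndex, long count, double sum, double max, double min) {
            this.firstIndex = firstIndex;
            this.count = count;
            this.sum = sum;
            this.mean = sum / count;
            this.max = max;
            this.min = min;
        }

        @Override
        public String toString() {
            return "Bucket[first=" + firstIndex + ", count=" + count + ", sum=" + sum
                    + ", mean=" + mean + ", max=" + max + ", min=" + min + "]";
        }
    }

    /** Splits values into consecutive buckets of bucketSize points and summarizes each. */
    static List<Bucket> aggregate(double[] values, int bucketSize) {
        if (bucketSize <= 0) {
            throw new IllegalArgumentException("bucketSize must be positive");
        }
        List<Bucket> out = new ArrayList<>();
        for (int start = 0; start < values.length; start += bucketSize) {
            int end = Math.min(start + bucketSize, values.length);
            double sum = 0;
            double max = Double.NEGATIVE_INFINITY;
            double min = Double.POSITIVE_INFINITY;
            for (int i = start; i < end; i++) {
                sum += values[i];
                max = Math.max(max, values[i]);
                min = Math.min(min, values[i]);
            }
            out.add(new Bucket(start, end - start, sum, max, min));
        }
        return out;
    }

    public static void main(String[] args) {
        // 24 hourly readings (made-up values) coarse-grained into three 8-hour buckets.
        double[] hourly = new double[24];
        for (int i = 0; i < hourly.length; i++) {
            hourly[i] = i % 10;
        }
        aggregate(hourly, 8).forEach(System.out::println);
    }
}
```

Each `Bucket` corresponds to one aggregation row: a start position plus count, sum, mean, max and min for that window.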
16. 2. Primary Keys and Row Keys
Original Persistence Model
Column          Type
Aggregation_ID  UUID
Index           Long
Period          DateTime
Count           Long
Sum             Double
Average         Double
Max             Double
Min             Double
Measurements    Map<DateTime, Double>
² Aggregation_ID: UUID identifier associated with aggregation metadata
² Period: DateTime of start of aggregation
² Index: Offset from DateTime of fixed aggregation bucket
² Count, Sum, Average, Max, Min: values of aggregation on the bucket
² Measurements: map of all measurements included in the system
PRIMARY KEY (Aggregation_ID, Index)
17. 2. Primary Keys and Row Keys
Original Persistence Model, Storage Representation
Aggregation_ID | Index 1: Period, Count, etc. | Index 2: Period, Count, etc. | Index 3: Period, Count, etc. | … | Index N: Period, Count, etc.
² SELECT * FROM aggregations WHERE Aggregation_ID = bdb8330e-6f02-457f-8eb7-553b4db86287  ✔
18. 2. Primary Keys and Row Keys
Original Persistence Model, Storage Representation
² SELECT * FROM aggregations WHERE Aggregation_ID = bdb8330e-6f02-457f-8eb7-553b4db86287 AND Index = 1  ✔
19. 2. Primary Keys and Row Keys
Original Persistence Model, Storage Representation
² SELECT * FROM aggregations WHERE Aggregation_ID = bdb8330e-6f02-457f-8eb7-553b4db86287 AND Index >= 1 AND Index < 3  ✔
20. 2. Primary Keys and Row Keys
Original Persistence Model, Storage Representation
² SELECT * FROM aggregations WHERE Aggregation_ID = bdb8330e-6f02-457f-8eb7-553b4db86287 AND Index = 1 AND Period > '2015-01-01' AND Period < '2015-02-01'  ✖
(Period is not part of the primary key, so it cannot be range-restricted)
21. 2. Primary Keys and Row Keys
Fixed Persistence Model
Column          Type
Aggregation_ID  UUID
StartDate       Timestamp
Count           Long
Sum             Double
Average         Double
Max             Double
Min             Double
Measurements    Map<Timestamp, Double>
PRIMARY KEY (Aggregation_ID, StartDate)
² Index column not required
² Primary key provides both the row key and a clustering column
Aggregation_ID | 2015-01-01: Count, etc. | 2015-01-02: Count, etc. | 2015-01-03: Count, etc. | … | 2015-01-31: Count, etc. | … | 2015-12-31: Count, etc.
² SELECT * FROM aggregations WHERE Aggregation_ID = bdb8330e-6f02-457f-8eb7-553b4db86287 AND StartDate > '2015-01-01' AND StartDate < '2015-02-01'  ✔
22. 2. Primary Keys and Row Keys
Moral: Consider which queries you need to make and design around them
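One way to picture why a clustering column makes ranged reads possible is an in-memory analogy: a map from row key to a sorted map of clustering values. This is only a mental model, not how Cassandra's storage engine is implemented, and the class `WideRowModel` and its methods are hypothetical names.

```java
import java.time.Instant;
import java.util.HashMap;
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;
import java.util.UUID;

final class WideRowModel {
    // Row (partition) key -> entries kept sorted by the clustering value (StartDate).
    private final Map<UUID, NavigableMap<Instant, Long>> partitions = new HashMap<>();

    void put(UUID aggregationId, Instant startDate, long count) {
        partitions.computeIfAbsent(aggregationId, id -> new TreeMap<>())
                  .put(startDate, count);
    }

    /** Equivalent in spirit to: StartDate > from AND StartDate < to (both exclusive). */
    NavigableMap<Instant, Long> range(UUID aggregationId, Instant from, Instant to) {
        NavigableMap<Instant, Long> rows = partitions.get(aggregationId);
        if (rows == null) {
            return new TreeMap<>();
        }
        // Because entries are sorted by StartDate, this is a contiguous slice.
        return rows.subMap(from, false, to, false);
    }

    public static void main(String[] args) {
        WideRowModel model = new WideRowModel();
        UUID id = UUID.fromString("bdb8330e-6f02-457f-8eb7-553b4db86287");
        model.put(id, Instant.parse("2014-12-31T00:00:00Z"), 10L);
        model.put(id, Instant.parse("2015-01-15T00:00:00Z"), 20L);
        model.put(id, Instant.parse("2015-02-01T00:00:00Z"), 30L);
        System.out.println(model.range(id,
                Instant.parse("2015-01-01T00:00:00Z"),
                Instant.parse("2015-02-01T00:00:00Z")));
    }
}
```

In this picture the row key only supports exact lookup, while the sorted inner map supports slices, which is why the date belongs in the clustering position.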
23. 3. Performance/Missing data in Collection types
Collections in C*
Rationale
² C*: supposed to denormalize data
² Measurements arriving to be included in aggregation
² How to be sure they’re included?
² Keep copy
Column          Type
Aggregation_ID  UUID
StartDate       Timestamp
Count           Long
Sum             Double
Average         Double
Max             Double
Min             Double
Measurements    Map<Timestamp, Double>
24. 3. Performance/Missing data in Collection types
Collections in C*
Downsides
² Lots of storage space – do we really need the values?
² Before 2.1, performance implications (serialization)
² All values returned on every read
² 64K limit on collection entries! Size wraps (modulus) => missing data
25. 3. Performance/Missing data in Collection types
Collections in C*
Column          Type
Aggregation_ID  UUID
StartDate       Timestamp
Count           Long
Sum             Double
Average         Double
Max             Double
Min             Double
Measurements    Blob
Solution
² Know start date
² Know all measurement timestamps in processed data
² Keep a bit for each
Bitwise Verifier
One-minute expected timestamps, 6-minute aggregations. 2 still missing below:
Timestamp          Bit
2015-01-01T00:00   1
2015-01-01T00:01   0
2015-01-01T00:02   1
2015-01-01T00:03   1
2015-01-01T00:04   0
2015-01-01T00:05   1
2015-01-01T00:06   1
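The bit-per-timestamp idea can be sketched with `java.util.BitSet`. This is a minimal sketch assuming evenly spaced expected timestamps; the class `BitwiseVerifier` and its method names are invented for this example, and the byte layout is simply whatever `BitSet.toByteArray()` produces, not necessarily the format Timeli stored.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;

final class BitwiseVerifier {
    private final Instant start;
    private final Duration resolution;
    private final int expected;
    private final BitSet seen;

    BitwiseVerifier(Instant start, Duration resolution, int expected) {
        this.start = start;
        this.resolution = resolution;
        this.expected = expected;
        this.seen = new BitSet(expected);
    }

    /** Sets the bit for a timestamp; returns false if it is off-grid or outside the bucket. */
    boolean mark(Instant ts) {
        long offset = Duration.between(start, ts).toMillis();
        long step = resolution.toMillis();
        if (offset < 0 || offset % step != 0 || offset / step >= expected) {
            return false;
        }
        seen.set((int) (offset / step));
        return true;
    }

    /** Expected timestamps whose bit is still 0. */
    List<Instant> missing() {
        List<Instant> out = new ArrayList<>();
        for (int i = seen.nextClearBit(0); i < expected; i = seen.nextClearBit(i + 1)) {
            out.add(start.plus(resolution.multipliedBy(i)));
        }
        return out;
    }

    /** Compact form that could be written to a blob column. */
    byte[] toBytes() {
        return seen.toByteArray();
    }

    public static void main(String[] args) {
        Instant start = Instant.parse("2015-01-01T00:00:00Z");
        BitwiseVerifier v = new BitwiseVerifier(start, Duration.ofMinutes(1), 7);
        for (int minute : new int[] {0, 2, 3, 5, 6}) {
            v.mark(start.plus(Duration.ofMinutes(minute)));
        }
        System.out.println("missing: " + v.missing());
    }
}
```

Seven one-minute slots fit in a single byte here, compared with seven timestamp/double pairs in the map version.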
26. 3. Performance/Missing data in Collection types
Moral: limits in Cassandra are important, not always enforced, and have consequences
27. 4. Batching for “Performance”
Traditional Master/Slave model
[Diagram: Application Server writes data to a Master with two Slaves in a single data center; link labels ~200ms, ~1-10ms, ~1-10ms]
² App server writes to remote DB
² Across network
² Latency! Many writes => N x 200ms
² Solution: batch multiple commands to save network round trips
28. 4. Batching for “Performance”
Peers model with atomicity
[Diagram: Application Server writes data to Peer A, one of three peers (A, B, C) in a single data center; link labels ~200ms, ~1-10ms, ~1-10ms]
² Batches are atomic
² CAP: can either lock DB across all nodes or perform on just one and publish
² Cassandra chooses latter (fast writes)
² => Batches with large numbers of writes all execute on A
² => 1/3 the processing power
30. 5. Double Precision vs. BigDecimal
What do the following have in common?
² double a = Math.round(1.14 * 75); // 85.5 is represented as 85.4999..., so this rounds to 85
² double b = 10.0 / 3; // = 3.3333333333333335
² for (float f = 10f; f != 0; f -= 0.1) { System.out.println(f); } // f may never equal exactly 0
² double x = .37; // .370000004 or .36999999998 or …
Java has some quirks with floating point representations
31. 5. Double Precision vs. BigDecimal
The model so far
Column          Type
Aggregation_ID  UUID
StartDate       Timestamp
Count           Long
Sum             Double
Average         Double
Max             Double
Min             Double
Measurements    Blob
² Cassandra written in Java
² Java has floating point errors
² Our aggregated values are leaking!
Column          Type
Aggregation_ID  UUID
StartDate       Timestamp
Count           Long
Measures        Map<String, BigDecimal>
Measurements    Blob
For good measure …
² Wrapped our measures in a Map for flexibility (add new measures on fly)
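A small sketch of the difference the switch makes when summing readings. The class `PreciseSum` and the sample readings are made up for illustration; the drift shown is a property of binary floating point in general, which Java's `double` shares.

```java
import java.math.BigDecimal;

final class PreciseSum {
    /** Sums decimal strings using binary doubles; rounding error can accumulate. */
    static double sumAsDouble(String[] readings) {
        double total = 0.0;
        for (String r : readings) {
            total += Double.parseDouble(r);
        }
        return total;
    }

    /** Sums the same strings using BigDecimal, which keeps decimal digits exactly. */
    static BigDecimal sumAsDecimal(String[] readings) {
        BigDecimal total = BigDecimal.ZERO;
        for (String r : readings) {
            // Construct from the String, not from a double, to avoid importing the error.
            total = total.add(new BigDecimal(r));
        }
        return total;
    }

    public static void main(String[] args) {
        String[] readings = new String[10];
        java.util.Arrays.fill(readings, "0.1");
        System.out.println("double:     " + sumAsDouble(readings));
        System.out.println("BigDecimal: " + sumAsDecimal(readings));
    }
}
```

The trade-off is that `BigDecimal` arithmetic is slower and takes more space than `double`, so this is worth it mainly where aggregated values must match exactly.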
32. 5. Double Precision vs. BigDecimal
Moral: Law of Leaky Abstractions (a Java app is a Java app)
Bonus moral: use C* collections for good, not evil
33. 6. QueryBuilder vs Prepared Statements
CQL Driver in Java allows various types of statements
1. Regular Statement
2. Prepared Statement
Regular Statement:
² Convenient
² Readable
² QueryBuilder to help build
² Tempting!
34. 6. QueryBuilder vs Prepared Statements
Query Schematic (Regular Statement)
[Diagram: App Server sends the full query below to the Cassandra Cluster, which returns a ResultSet]
QueryBuilder.select().all()
    .from("table")
    .where(QueryBuilder.eq("partition_key", 5))
35. 6. QueryBuilder vs Prepared Statements
Problem: Regular Statements are a lot of bytes!
Bound Statements
² Register with C* cluster
² Text of statement sent once with placeholders
² Subsequent requests are a key and params
² Avoids transfer costs
36. 6. QueryBuilder vs Prepared Statements
Query Schematic (Bound Statement)
[Diagram: App Server sends “select * from table where partition_key = ?” plus the parameter 5 to the Cassandra Cluster, which returns a ResultSet]
37. 6. QueryBuilder vs Prepared Statements
Moral: Caching is your friend. Cache queries on C*, particularly ones being done many times.
38. 7. Row Limits
The model so far: “Wide Rows”
² Unique ID for partition
² StartDate clustering key allows ranged queries
² Count of measurements included
² Map of measures with precise storage
² Binary representation of measurements included
Column          Type
Aggregation_ID  UUID
StartDate       Timestamp
Count           Long
Measures        Map<String, BigDecimal>
Measurements    Blob
39. 7. Row Limits
Practical storage limits
² Cassandra row limit => 2 billion items per row
² Best results (eBay): “a few hundred million per row” (~500 mil)
How much time does this represent?
Time resolution   500 million timestamps
1 day             ~1.37E6 years
1 hour            57,077 years
1 minute          951 years
1 second          15.85 years
1 millisecond     5.78 days
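The figures above are just items per row multiplied by the time resolution. A quick sketch of that arithmetic, assuming 365-day years; `RowSpan` and its method names are made up for this example.

```java
import java.time.Duration;

final class RowSpan {
    /** Time covered by itemsPerRow timestamps spaced one resolution apart. */
    static Duration span(long itemsPerRow, Duration resolution) {
        return resolution.multipliedBy(itemsPerRow);
    }

    static double days(Duration d) {
        return d.getSeconds() / 86_400.0;
    }

    /** Assumes 365-day years, which is enough for an order-of-magnitude check. */
    static double years(Duration d) {
        return days(d) / 365.0;
    }

    public static void main(String[] args) {
        long items = 500_000_000L;
        System.out.printf("1 day:         %.0f years%n", years(span(items, Duration.ofDays(1))));
        System.out.printf("1 hour:        %.0f years%n", years(span(items, Duration.ofHours(1))));
        System.out.printf("1 minute:      %.0f years%n", years(span(items, Duration.ofMinutes(1))));
        System.out.printf("1 second:      %.2f years%n", years(span(items, Duration.ofSeconds(1))));
        System.out.printf("1 millisecond: %.2f days%n", days(span(items, Duration.ofMillis(1))));
    }
}
```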
40. 7. Row Limits
No business case has yet used aggregations on less than 1 min
For aggregations we’re probably fine
But we collect raw/processed measurements as well
At millisecond resolution, <6 days not ok
Can constrain row size using compound PK
² Have resolution on channel, Rc (milliseconds)
² Have number of items in row, K (e.g. 500 million)
² Get a baseline on epoch (Jan 1, 1970 12:00AM)
² => The Batch index can be calculated
long batchInd = Math.floorDiv(date.getMillis(), K * Rc); // K and Rc as long; counts whole rows elapsed since the epoch
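A self-contained sketch of the batch index calculation, assuming each row holds K items spaced Rc milliseconds apart, so one row spans K * Rc milliseconds. The class and parameter names are illustrative, and `java.time.Instant` stands in for whatever date type the application uses.

```java
import java.time.Instant;

final class BatchIndex {
    /**
     * Index of the row a timestamp falls into, counted from the epoch.
     * itemsPerRow is K and resolutionMillis is Rc in the slide's notation.
     */
    static long batchIndex(long epochMillis, long itemsPerRow, long resolutionMillis) {
        // multiplyExact throws on overflow instead of silently wrapping.
        long rowSpanMillis = Math.multiplyExact(itemsPerRow, resolutionMillis);
        // floorDiv keeps pre-1970 timestamps in consistently sized rows too.
        return Math.floorDiv(epochMillis, rowSpanMillis);
    }

    public static void main(String[] args) {
        long k = 500_000_000L; // items per row
        long rc = 1L;          // one measurement per millisecond
        Instant ts = Instant.parse("2015-01-01T00:00:00Z");
        System.out.println("batch index: " + batchIndex(ts.toEpochMilli(), k, rc));
    }
}
```

Writers and readers both derive the index from the timestamp, so a ranged read only needs to visit the indexes its start and end dates fall between.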
43. Wrapup
Initial model
Column          Type
Aggregation_ID  UUID
Index           Long
Period          Timestamp
Count           Long
Sum             Double
Average         Double
Max             Double
Min             Double
Measurements    Map<Timestamp, Double>
Evolution
1. Couldn’t do ranged queries in time
2. Ran out of space in measurement map
3. Columnar approach to measures => less flexibility
4. Rows not very wide
Final model
Column          Type
Aggregation_ID  UUID
BatchIndex      Int
StartDate       Timestamp
Count           Long
Measures        Map<String, BigDecimal>
Measurements    Blob
44. Wrapup
Lessons Learned
1. Read the manual. Partitioners are important. Other configuration options as well.
2. Consider which queries you need to make and design around them.
3. Limits in Cassandra are important, not always enforced, and have consequences.
Exceeding collection limits will lose you data.
4. Don’t batch for speed, only for atomicity.
5. C* is a Java app and subject to floating point errors.
6. C* collections are useful for avoiding multitable queries without joins.
7. Cache queries on C* using Prepared/Bound statements, particularly ones being done many times.
8. Pay attention to row limits