4. WHAT I’M TALKING ABOUT
How we started using Cassandra
How we use it to power the X Factor and Britain’s Got Talent apps
Counting - harder than you might think
What we learnt along the way
5. THE CHALLENGE
10-12 million people watching these shows
TV tells them to buzz/clap/score....
....servers melt
Design goal: handle 10K interactions/s
6. ROLL BACK 1 YEAR
We’d won BGT 2011 - our first big talent show
Existing MySQL/Django/Python stack
Back-of-envelope calculations.... oh dear
Needed something quickly that could cope with the anticipated load
7. OUR FIRST CASSANDRA SCHEMA
create column family vote_log
    with comment = 'Log of votes'
    and comparator = 'UTF8Type'
    and key_validation_class = 'UUIDType'
    and default_validation_class = 'UTF8Type'
    and column_metadata = [
        {column_name: 'ipaddr',  validation_class: 'AsciiType'},
        {column_name: 'poll',    validation_class: 'LongType'},
        {column_name: 'choice',  validation_class: 'LongType'},
        {column_name: 'idtoken', validation_class: 'UTF8Type'},
        {column_name: 'count',   validation_class: 'LongType'}];
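A minimal sketch of writing to this column family from Python with the pycassa client (the keyspace name, server address, and column values here are illustrative assumptions, not our production code):

    import uuid
    import pycassa

    # Connect to the keyspace holding vote_log ('tellybug' is a placeholder)
    pool = pycassa.ConnectionPool('tellybug', server_list=['127.0.0.1:9160'])
    vote_log = pycassa.ColumnFamily(pool, 'vote_log')

    # One row per vote, keyed by a random UUID, so writes spread evenly
    # around the ring under the random partitioner
    vote_log.insert(uuid.uuid4(), {
        'ipaddr': '192.0.2.1',
        'poll': 42,
        'choice': 3,
        'idtoken': 'device-abc123',
        'count': 1,
    })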
8. WHAT WE LEARNT
Cassandra scales beautifully for writes
Cassandra has no single point of failure
....but it’s not hard to make it fail
Ad-hoc questions and reporting were going to be much slower
9. OPERATIONS
BGT 2011 was a write-only DB
Ignored failures
One cluster, one AZ
Backup to MySQL
10. X FACTOR 2011
Over 1 million app downloads
Over 260 million boos/claps
11. IMPLEMENTING X FACTOR WITH CASSANDRA
Counting
Social network
No longer write-only
12. WHAT ARE MY FRIENDS DOING?
Scale makes this hard
10K changes/s
Which ones are relevant to which users?
When new users (and their social graph) can arrive at any time
13. SOLUTION
New Column Family - user activity
Maps user to their interactions
Write problem nicely randomised and thus ideal for Cassandra
Read problem!
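Splaying writes - pushing each update to all of a user's friends - is the usual answer here. A minimal sketch of that fan-out-on-write idea, assuming a user_activity column family with TimeUUID-ordered columns (the names and layout are illustrative, not our actual schema):

    import uuid
    import pycassa

    pool = pycassa.ConnectionPool('tellybug', server_list=['127.0.0.1:9160'])
    user_activity = pycassa.ColumnFamily(pool, 'user_activity')

    def record_interaction(actor_id, friend_ids, event):
        # Fan out on write: append the event to every friend's activity row.
        # Row keys are user ids, so writes scatter randomly across the ring;
        # reading a user's feed is then a single-row slice.
        for friend in friend_ids:
            user_activity.insert(friend, {uuid.uuid1(): event})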
14. COUNTING - HARDER THAN IT LOOKS
Everyone can count
But we need to count really fast
And distribute the results to all the clients
15. DISTRIBUTED COUNTING
“Memcache does counters”
“OK, how about sharding?”
“Well, I hear Cassandra 0.8 has counters”
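A memcached counter really is one call; a minimal sketch with pylibmc (the server address and key name are illustrative):

    import pylibmc

    mc = pylibmc.Client(['127.0.0.1'], binary=True)

    # incr/decr are atomic on the memcached server, so many web
    # processes can safely bump the same counter concurrently
    mc.set('poll:42:choice:3', 0)
    mc.incr('poll:42:choice:3')
    total = mc.get('poll:42:choice:3')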
20. SINGLE BOX LIMITS
We have a single value
Everything needs to read and write that value - from multiple servers
EC2 limits: a single memcached server runs out of network I/O
What then?
21. CASSANDRA HAS COUNTERS
New (at the time) feature in Cassandra 0.8
Special column type - CounterColumnType as the validator
Distributed 64-bit counter, with eventual consistency
CL.ONE writes recommended to avoid implicit reads impacting performance
Reads tot up the values from the replicas to give the total
Simple functionality: incr()/decr(), get()
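A sketch of that API via pycassa (the column family, which must be created with CounterColumnType as its validator, and the key scheme are illustrative assumptions):

    import pycassa

    pool = pycassa.ConnectionPool('tellybug', server_list=['127.0.0.1:9160'])
    counters = pycassa.ColumnFamily(pool, 'counters')
    # CL.ONE writes, per the recommendation above
    counters.write_consistency_level = pycassa.ConsistencyLevel.ONE

    # add() increments a counter column; a negative value decrements
    counters.add('poll:42', 'choice:3', 1)

    # A read tots up the per-replica shards into one value
    value = counters.get('poll:42')['choice:3']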
26. CAN CASSANDRA COUNT?
Yes, but....
Performance can be an issue
Switch off replicate_on_write, tune RF & cluster size
Not scalable for a single counter
Scales as a function of RF up to 4 nodes
Above that ... you’re out of luck
Best we achieved was ~10K/s increments to a single counter value on EC2 m1.large instances
What do you do if an operation fails?
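For reference, replicate_on_write is a column-family attribute. A sketch of switching it off through pycassa's SystemManager - assuming, as with other CfDef fields, that it is accepted as a keyword argument (the keyspace and column family names are placeholders):

    from pycassa.system_manager import SystemManager

    sys_mgr = SystemManager('127.0.0.1:9160')
    # With replicate_on_write off, increments skip the write-time fan-out
    # to replicas - a throughput vs. durability trade-off
    sys_mgr.alter_column_family('tellybug', 'counters', replicate_on_write=False)
    sys_mgr.close()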
27. COUNTING AT SCALE WITH CASSANDRA
Write throughput to a single counter is limited
We were inside the performance limit, so writes could go to Cassandra
No way to scale within Cassandra (yet)
Reads have a serious performance overhead
We used sharded counters in memcached, with the source of truth in Cassandra
Few reads from Cassandra = much more predictable performance
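A minimal sketch of the sharded-counter pattern on the memcached side (the shard count, key scheme, and error handling are illustrative assumptions):

    import random
    import pylibmc

    mc = pylibmc.Client(['127.0.0.1'], binary=True)
    NUM_SHARDS = 16  # more shards = more write throughput, costlier reads

    def incr_counter(name):
        # Spread increments across shards so no single key gets hot
        shard = '%s:%d' % (name, random.randrange(NUM_SHARDS))
        try:
            mc.incr(shard)
        except pylibmc.NotFound:
            if not mc.add(shard, 1):
                mc.incr(shard)  # lost the add() race; the key now exists

    def read_counter(name):
        # Reads are more expensive: fetch and sum every shard
        keys = ['%s:%d' % (name, i) for i in range(NUM_SHARDS)]
        return sum(mc.get_multi(keys).values())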
28. OPERATIONS
Cassandra GUIs & management consoles were still in their infancy
Hard to figure out what was going wrong when performance suffered
Analytics (and backup) still via dump to MySQL
Flexible, well understood
Single cluster, single AZ
29. WHERE WE WERE AFTER X FACTOR
Cassandra as a source of truth in production
Mainly write load
Memcached layer on top
Simple operations
No backups :(
30. BEYOND X FACTOR
Dancing on Ice - harder counting
Britain’s Got Talent 2012 - more social
Backups
Data integrity
31. DATA CONSISTENCY
There’s no referential integrity
So is the data in the database self-consistent?
Or do you have a bug somewhere?
How do you validate the data?
Truth + 1
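A sketch of the kind of validation pass this implies: re-derive the totals from the raw vote_log and compare them with the live counters (the column family names and key scheme are assumptions; as the notes below admit, this get_range() scan eventually hits a wall at scale):

    from collections import defaultdict
    import pycassa

    pool = pycassa.ConnectionPool('tellybug', server_list=['127.0.0.1:9160'])
    vote_log = pycassa.ColumnFamily(pool, 'vote_log')
    counters = pycassa.ColumnFamily(pool, 'counters')

    # Re-count the source-of-truth log...
    totals = defaultdict(int)
    for key, cols in vote_log.get_range():
        totals[(cols['poll'], cols['choice'])] += cols['count']

    # ...and flag any counter that disagrees
    for (poll, choice), expected in totals.items():
        actual = counters.get('poll:%d' % poll).get('choice:%d' % choice, 0)
        if actual != expected:
            print('Mismatch poll=%d choice=%d: counter=%d, log=%d'
                  % (poll, choice, actual, expected))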
32. BACKUPS
Backing up a cluster isn’t easy
Restoring can be harder...
33. CONCLUSION
Cassandra saved our bacon :)
Scales to insane write loads
Reads are easier to scale in memcached
Beware of limitations on “hot” values
Migrating functionality gradually let us learn the operational aspects
There are lots of interesting failure scenarios at scale
35. ANY QUESTIONS?
We’re hiring - if you want to work on wicked scaling problems and reach millions of users, get in touch!
malcolm@tellybug.com
@malcolmbox
Editor’s Notes
Who I am. Background in mobile. Not a Big Data expert.
Apps that make TV more entertaining. Big shows, big audiences. Simple interaction - so we get lots of it. Small number of “results”.
X Factor - over 1M installs, 260 million boos/claps.
No way to scale MySQL for a single counter write. Hybrid memcache/MySQL for values. Where to write the audit trail/log of what had happened? Step forward Acunu/Cassandra.
Random partitioner, UUID keys. Analytics by MySQL. A write-only database.
E.g. too many connections from the web tier.
Counters - moving production counts from MySQL to Cassandra. Social network - a challenge if you don’t own the graph.
Splaying writes is the normal solution - push everyone’s updates to all their friends. But what about friends who aren’t there yet?
Cassandra as source of truth and destination for writes. Memcache as the place to read from - holds social graphs, activity etc., updated in parallel with Cassandra writes. A lot of logic to deal with cache misses and horizontal scaling of the cache.
BGT used a memcache-based counter with write-behind to MySQL.
Bug in older versions of memcached and pylibmc - now fixed.
Redis - same sort of issues. Fundamental limitation of a single value living on a single box.
Looked ideal for our needs - move counts out of memcache & MySQL.
Now multiple levels of inconsistency: Cassandra, the central memcache value, and the sharded counter values on each webserver box. What is “the truth”?
We saw crashes on too many connections, truncate behaviour etc.
We have millions of records in the DB - and then counts etc. Are the two consistent? If not, why not? We’ve seen various issues including missing reads, counter values not consistent, etc.
Netflix, Rackspace... everyone writes a tool. It took us a couple of weeks to be able to back up and restore our cluster successfully - and another week to figure out whether the data was the same.
Bursty loads - we need to scale both ways. Monitoring - we struggle generally with monitoring/alerting/graphing. Backup & restore to smaller clusters - see Priam from Netflix. Analytics - we’ve hit the wall on the get_range() approach.