Mais de Edward Capriolo (16)
M6d cassandra summit
- 1. Increasing Your Prospects: Cassandra in Online Advertising
Let 'em know: #cassandra12
© 2012 Media6Degrees. All Rights Reserved. Proprietary and Confidential
- 2. A little about what we do
- 4. A High Level look at RTB
1. Browsers visit Publishers and create impressions.
2. Publishers sell impressions via Exchanges.
3. Exchanges serve as auction houses for the impressions.
4. M6d bids on the impression. If we win, we display an ad.
- 5. Key Cassandra features
• Horizontal scalability
  ● More nodes, more storage
  ● More nodes, more throughput
• Cassandra is a high availability solution
  ● Almost all changes can be made at run time
  ● Rolling updates
  ● Survives node failures
• One configuration file
- 6. Key storage model features
• Type validation gives us creature comforts
  ● Helps prevent insertion of bad data
    – Columns named 'age' should be a number
  ● Makes data easier to read and write for end users
  ● Encourages/enforces storage in a terse format
    – Store 478 as 478, not “478”
• Rows do not need to have fixed columns
• Writes do not read
• Optimal for set/get/slice operations
- 7. Things I have learned on the presentation circuit
• Gratuitous use of Meme Generator (tx Nathan)
• Gratuitous buzzwords for maximum tweet-ability
  ● Big Data
  ● Real Time analytics
  ● Cloud
  ● Web scale
• Make prolific statements that contradict current software trends (tx Dean)
• Attempted Prolific Statement: Transactions and locking are highly overrated
- 8. Signal De-duplication and frequency capping
• Solution must be “web-scale”
  ● Billions of users
  ● One to thousands of events per user
• Solution must record events
• Do not store the same event N times a minute
  ● Control data growth
    – Spiders, nagios, pathological cases
    – Small statistical difference in signal
  ● An action 10 times a day vs 1 time a minute
- 9. What this would look like
- 10. '?' Solution with transactions and locking
● Likely need scalable redundant lock layer
● Built-in locks are not free
● Lots of code
● Lots of sockets
● Likely need to read to write
● Results in more nodes or caching layer for disk I/O
- 11. Remember with Cassandra...
• Rows have one to many columns
• A column is composed of { name, value, timestamp }
  ● If two columns have the same name, the highest timestamp wins
• Memtables absorb overwrites
• Writes are fast
  ● Sorted structure in memory
  ● Commit log to disk
• Log-structured storage prunes old values and deletes
• No reads on write path
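The timestamp rule above can be sketched in a few lines of Python. This is an illustration only: the `Column` class and the `reconcile` function are hypothetical names, not Cassandra's API, and tie handling for equal timestamps is simplified.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Column:
    name: str
    value: bytes
    timestamp: int  # supplied by the client, e.g. microseconds since epoch


def reconcile(a: Column, b: Column) -> Column:
    """Pick the surviving version of one column: the highest timestamp wins."""
    if a.name != b.name:
        raise ValueError("can only reconcile versions of the same column")
    # Simplification: on a tie we keep the first argument.
    return a if a.timestamp >= b.timestamp else b


old = Column("last_seen", b"2012-08-01", timestamp=100)
new = Column("last_seen", b"2012-08-08", timestamp=200)
winner = reconcile(old, new)
print(winner.value)
```

Because the winner is decided purely from the two versions in hand, a writer never has to read the existing value first.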
- 12. Cassandr'ified solution
- 13. Consistent Hashing distributes data
● Random Partitioner: row keys are MD5-hashed to locate the node
  – Results in even distribution of rows across nodes
  – Limits/removes hot spots
● Big Data is not so big when you have N nodes attacking it
* Wife asked me if diagram above was a flag. Pledge your allegiance to the United Nodes of Big Data
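A minimal sketch of the idea, assuming a simplified ring: the `Ring` class, the node names, and the evenly spaced tokens over a 2**128 space are assumptions made for illustration and do not reproduce Cassandra's actual token range or replica placement.

```python
import bisect
import hashlib


def token(row_key: str) -> int:
    """Map a row key to a ring position using MD5 (read as a 128-bit integer)."""
    return int.from_bytes(hashlib.md5(row_key.encode("utf-8")).digest(), "big")


class Ring:
    def __init__(self, node_names):
        # Hypothetical setup: node tokens spaced evenly over the token space.
        n = len(node_names)
        self.tokens = [(i * 2**128) // n for i in range(n)]
        self.nodes = list(node_names)

    def node_for(self, row_key: str) -> str:
        # The first node whose token is >= the key's token owns the row;
        # keys past the last token wrap around to the first node.
        i = bisect.bisect_left(self.tokens, token(row_key))
        return self.nodes[i % len(self.nodes)]


ring = Ring(["node-a", "node-b", "node-c", "node-d"])
counts = {}
for user_id in range(100_000):
    owner = ring.node_for(f"user:{user_id}")
    counts[owner] = counts.get(owner, 0) + 1
print(counts)
```

Even though the keys are sequential user ids, the MD5 step scatters them, so each of the four nodes ends up with roughly a quarter of the rows.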
- 14. Memtables absorb overwrites
● Memtables give de-duplication for free
  – Large memtable has larger chance of absorbing a write
● This solves our original requirement:
  – Do not store the same event N times per interval
● Worst case: data is written to disk N times and compacted away
● Automatically de-duplicate on read with last-update-wins rule
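The absorbing effect can be illustrated with a toy model. The `Memtable` class below is hypothetical and much simpler than the real structure, and mapping each event to a column name such as `visited:example.com` is an assumption about the data model, not something the slides specify.

```python
class Memtable:
    """Toy in-memory table with one slot per (row key, column name)."""

    def __init__(self):
        self.cells = {}
        self.writes_received = 0

    def write(self, row_key, column_name, value, timestamp):
        # Only memory is touched here; nothing is read back from disk.
        self.writes_received += 1
        key = (row_key, column_name)
        current = self.cells.get(key)
        if current is None or timestamp >= current[1]:
            self.cells[key] = (value, timestamp)

    def flush(self):
        """Return the cells that would be written to disk, sorted by key."""
        return sorted(self.cells.items())


memtable = Memtable()
# The same event fires 1,000 times before the memtable is flushed.
for ts in range(1000):
    memtable.write("user:42", "visited:example.com", b"1", timestamp=ts)

flushed = memtable.flush()
print(memtable.writes_received, "writes ->", len(flushed), "cell flushed")
```

If a flush happens between two of those writes, both copies reach disk, which is the worst case the slide mentions; compaction and the last-update-wins read path then collapse them.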
- 15. Cassandra & stream processing as an alternative to ETL
● ETL (Extract, Transform, Load) is a useful paradigm
● Batch processes can be obtuse
  – Processes with long startup
  – Little support for appends, inserts, updates
  – Throughput issues for small files
● Difficult for small windows of time
● Overhead from MapReduce
● Sample scenario: breakdown of state, city, and count
- 16. City, State, count(1) in ETL system
● Several phases / copies
● Storing the entire log to build/rebuild aggregation
● Difficult to do on small intervals
● Needs scheduling, needs log push system
- 17. City, State, count(1) stream system
● Could use Cassandra's counter feature directly
● Added Apache Kafka layer
● Decouples producers and consumers
● Allows message replay
● Allows backlog and recovery from failures (never happens btw)
● Near real time
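The shape of the streaming version can be sketched without any external services. In this illustration an in-memory `deque` stands in for the Kafka topic and a `Counter` stands in for a Cassandra counter column family; both stand-ins, the `consume` function, and the sample events are assumptions, not the production code.

```python
from collections import Counter, deque

# Stand-in for a Kafka topic: a queue of (city, state) events.
topic = deque([
    ("Albany", "NY"), ("Buffalo", "NY"), ("Albany", "NY"),
    ("Austin", "TX"), ("Albany", "NY"),
])

# Stand-in for counter columns: (state, city) -> running count.
counts = Counter()


def consume(topic, counts):
    """Drain the queue, incrementing one counter per event as it arrives."""
    while topic:
        city, state = topic.popleft()
        counts[(state, city)] += 1


consume(topic, counts)
for (state, city), n in sorted(counts.items()):
    print(state, city, n)
```

Each event updates the aggregate as soon as it is consumed, so there is no stored log to re-scan and no batch job to schedule.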
- 18. An application to search logs
● In 2008 this article sold me on MapReduce
● Take logs from all servers
● Put them into Hadoop
● Generate Lucene indexes
● Load into sharded SOLR cluster on interval
- 19. Pseudo diagram of solution
● Process to get files from servers into Hadoop
● MapReduce process to build indexes
● Embedded SOLR on Hadoop Datanodes
* Go here for real story: http://www.slideshare.net/schubertzhang/case-study-how-rackspace-query-terabytes-of-data-2400928
- 20. But now it's the future!
● Every component or layer of an architecture is another thing to document and manage
● DataStax has built SOLR into Cassandra
● Applications can write to solr/cassandra directly
● Applications can read solr/cassandra directly
- 21. Ah ha! moment
● Determined the Rackspace log application could be done with simple pieces
● Someone called it Taco Bell Programming
'The more I write code and design systems, the more I understand that many times, you can achieve the desired functionality simply with clever reconfigurations of the basic Unix tool set. After all, functionality is an asset, but code is a liability.'
● Cassandra is my main taco ingredient
- 22. Prolific statement: Design stuff with fewer arrows
● More layers/components
● Batch driven
● Fewer layers/components
● Low latency
- 23. Solr has wide adoption
● Clients for many programming languages
● Many hip jQuery Ajax widgets and stuff
● Open source Reuters Ajax Solr demo worked seamlessly with cassandra/solr
● Implemented a Rackspace-like solution with little code
- 24. Game Changer: Compression
Operation                             Time (ns)    Time (ms)  Notes
Main memory reference                       100               20x L2 cache, 200x L1 cache
Compress 1K bytes with Zippy              3,000
Send 1K bytes over 1 Gbps network        10,000    0.01
Read 4K randomly from SSD*              150,000    0.15
Read 1 MB sequentially from memory      250,000    0.25
Round trip within same datacenter       500,000    0.5
Read 1 MB sequentially from SSD*      1,000,000    1          4x memory
Disk seek                            10,000,000    10         20x datacenter round trip
Read 1 MB sequentially from disk     20,000,000    20         80x memory, 20x SSD
Source: https://gist.github.com/2841832
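A quick back-of-the-envelope calculation with the figures listed above shows why spending CPU on compression is a good trade against disk access. The constants are copied from the list; the 1 MB compression estimate assumes the cost scales linearly with size, which is a simplification.

```python
# Figures copied from the latency list, in nanoseconds.
COMPRESS_1K_ZIPPY_NS = 3_000
DISK_SEEK_NS = 10_000_000
READ_1MB_DISK_NS = 20_000_000
READ_1MB_MEMORY_NS = 250_000

# How many 1 KB compressions fit in the time of a single disk seek?
compressions_per_seek = DISK_SEEK_NS // COMPRESS_1K_ZIPPY_NS
print(compressions_per_seek)  # 3333

# Sequential 1 MB read: disk versus memory.
disk_vs_memory = READ_1MB_DISK_NS // READ_1MB_MEMORY_NS
print(disk_vs_memory)  # 80

# Rough cost of compressing 1 MB, assuming linear scaling from the 1 KB figure.
compress_1mb_ns = 1024 * COMPRESS_1K_ZIPPY_NS
print(compress_1mb_ns)  # 3072000
```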
- 25. Why compression helps
● Compressed data is smaller on disk
● If we compress data, more of it fits in RAM and is cached
● Rotational disks:
  ● Rotational disks have very slow seeks
  ● RAM not used by processes will cache disk
● Solid State Disks do seek faster than rotational
  ● But they are more expensive than rotational
- 26. Enabling Compression
● Rolling update to Cassandra
● update column family my_stuff with compression_options={sstable_compression:SnappyCompressor, chunk_length_kb:64};
● bin/nodetool -h cdbla120 -p 8585 rebuildsstables my_stuff
● 68 GB of data shrinks to 36 GB
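The arithmetic behind that result, as a small worked example. The 68 GB and 36 GB figures come from the slide; the 24 GB of page cache is a made-up number used only to show how the cached fraction changes.

```python
before_gb = 68
after_gb = 36

ratio = before_gb / after_gb      # logical bytes per stored byte
saved = 1 - after_gb / before_gb  # fraction of disk space reclaimed
print(f"compression ratio {ratio:.2f}x, {saved:.0%} smaller")

# Hypothetical node with 24 GB of RAM left over for the OS page cache.
page_cache_gb = 24
cached_before = min(1.0, page_cache_gb / before_gb)
cached_after = min(1.0, page_cache_gb / after_gb)
print(f"cached before: {cached_before:.0%}")
print(f"cached after:  {cached_after:.0%}")
```

Under that assumption the share of the data set that fits in cache goes from roughly a third to roughly two thirds, without adding any memory.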
- 27. Compression in action
● Disk activity reduced drastically as more/all data fit in cache
● Better performance
● Disks that spin less should last longer
- 28. Compression lessons
● Creates extra CPU usage (but not really much)
● Creates more young gen garbage (some)
● Anecdotal experimentation with chunk_length_kb
● 64KB is good for sparse, less frequent tables
● 16KB had same compression ratio and made less garbage
● Found 4KB to be less effective than 16KB
● This is easy to experiment with
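The effect of chunk size on compression ratio can be explored offline with a small sketch. Note the assumptions: `zlib` from the Python standard library stands in for Snappy, the sample rows are made up, and the `compressed_size` helper is hypothetical, so the numbers will not match what Cassandra produces on real SSTables.

```python
import zlib


def compressed_size(data: bytes, chunk_kb: int) -> int:
    """Compress data in independent chunks of chunk_kb KB and sum the sizes."""
    chunk = chunk_kb * 1024
    return sum(len(zlib.compress(data[i:i + chunk]))
               for i in range(0, len(data), chunk))


# Made-up sample data: repetitive log-like rows, about 1 MB in total.
row = b"user=%d|event=pageview|site=example.com|status=200\n"
data = b"".join(row % i for i in range(20_000))

results = {kb: compressed_size(data, kb) for kb in (4, 16, 64)}
for kb, size in results.items():
    print(f"chunk_length_kb={kb:>2}: {len(data) / size:.2f}x")
```

Each chunk is compressed on its own, which mirrors why very small chunks tend to compress less well: the compressor sees less repeated context per chunk.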
- 29. We have reached the point of the presentation where we...
- 30. Hate on everything not Cassandra
- 31. Cassandra's uptime story
● Main cluster in continuous operation since 8/6/11
● Doubled physical nodes in the cluster
● Upgraded Cassandra twice: 0.7.7 -> 0.8.6 -> 1.0.7
● Rolling reboots: kernel update, 1 for leap second
● No maintenance windows
● Let's compare Cassandra with other things I use/used
- 32. Cassandra vs MySQL master/slave...
              MySQL                         Cassandra
Replication   Single thread, binlogs,       Per operation
              manual recovery
Scaling       Add more nodes, initial       Bootstrap new Cassandra
              sync, set up replication,     node, re-balance off-peak
              configure applications
Consistency   Applications that care        Per operation
              read master, or
              application checks
              status of replication
Backup        mysqldump/LVM snapshot        sstabletojson | snapshot
Restore       Re-insert everything/         Copy files into place
              restore snapshot
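"Per operation" consistency means each read or write chooses how many replicas must acknowledge it. A minimal sketch of the usual overlap rule (a read is guaranteed to see the latest acknowledged write when read replicas plus write replicas exceed the replication factor); the `overlaps` function name and the replication factor of 3 are assumptions for illustration.

```python
def overlaps(replication_factor: int, write_acks: int, read_acks: int) -> bool:
    """True when every read set must intersect every write set (R + W > N)."""
    return read_acks + write_acks > replication_factor


N = 3                # hypothetical replication factor
QUORUM = N // 2 + 1  # majority of replicas

print(overlaps(N, write_acks=QUORUM, read_acks=QUORUM))  # True
print(overlaps(N, write_acks=1, read_acks=1))            # False
```

An application can therefore pay for stronger guarantees only on the operations that need them, instead of routing everything through a master.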
- 33. So with mysql...
● Replication breaking often
  ● Requiring manual intervention for many fixes
● Blocking writes for 30 minutes to add a column to a table
● Scale up to big iron then...
  ● Restart takes 30 minutes to fsck all disks
● Applications needing to be coded with state aware logic
  ● Which node should I query?
  ● Is replication behind?
  ● Is there some merge table trickery going on?
- 34. Cassandra vs Memcache
              Memcache                      Cassandra
Replication   None (client managed)         Per operation
Scaling       None (client managed)         Grow or shrink without bad reads
Consistency   Yes (and really no)           Per operation
Backup        No persistence                sstabletojson | snapshot
Restore       No persistence                Cache warming
- 35. So memcache is...
●
Not persistent
●
Not clear on sharding
●
Not clear on failure modes
●
Actual experiences with memcache
●
Memcache client was not sharding requests evenly. 60 % were going to
node 1..
●
We lost rack with 40% of the memcache nodes
– Site went to crawl as DB's were overloaded
– took 1 hour to warm up again
- 36. Cassandra vs DRBD
              DRBD                          Cassandra
Replication   1 or 2 nodes per block        Per operation
Scaling       No scaling. Just more         Grow or shrink dynamically
              availability.
Consistency   Sync modes change failure     Per operation
              consistency, deadtime
              between flip-flops
Backup        Like a disk                   sstabletojson | snapshot
Restore       Like a disk                   Like a disk
- 37. So DRBD is...
●
A 30 second to 1 minute fail over/outage
●
An alert that might wake you up
●
but hopefully allows you to sleep again
●
Handcuffed to linux-ha/keepalived etc
●
Making it an involved setup
●
Making it involved to troubleshoot
●
Might need a crossover cable or dedicated network
●
cpu/network intensive with very active disks
●
Can successfully fail over a data file in an inconsistent state
- 38. Cassandra vs HDFS
              Hadoop                        Cassandra
Replication   Per file                      Per operation
Scaling       Add nodes                     Add nodes
Consistency   Very, to the point that       Per operation
              getting data in becomes
              difficult
Backup        Distcp                        sstabletojson | snapshot
Restore       Distcp                        Like a disk
- 39. So HDFS...
●
Comes up with about 4 or 5 reasons a year for master node/full cluster restart
  ● Grow NameNode heap
  ● Enable jobtracker setting to stop 100,000 task jobs
  ● Enabled/updated trash feature (off by default)
  ● Forced to do a fail over by hardware fault
  ● Random DRBD/Kernel brain fart
  ● Need to update a JVM/kernel eventually
● Now finally new versions have HA NameNode
● Running jobs lose progress and will not automatically restart