M6d cassandra summit

Increasing Your Prospects: Cassandra in Online Advertising
Let 'em know: #cassandra12

© 2012 Media6Degrees. All Rights Reserved. Proprietary and Confidential

A little about what we do


Impressions look like…


A High Level look at RTB

1. Browsers visit Publishers and create impressions.
2. Publishers sell impressions via Exchanges.
3. Exchanges serve as auction houses for the impressions.
4. M6d bids on impressions. If we win, we display an ad.


Key Cassandra features
• Horizontal scalability
  ● More nodes more storage
  ● More nodes more throughput
• Cassandra is a high availability solution
  ● Almost all changes can be made at run time
  ● Rolling updates
  ● Survives node failures
• One configuration file


Key storage model features
• Type validation gives us creature comforts
  ● Helps prevent insertion of bad data
    – Columns named 'age' should be a number
  ● Makes data easier to read and write for end users
  ● Encourages/enforces storage in a terse format
    – Store 478 as 478 not “478”
• Rows do not need to have fixed columns
• Writes do not read
• Optimal for set/get/slice operations


Things I have learned on the presentation circuit
• Gratuitous use of Meme Generator (tx Nathan)
• Gratuitous buzzwords for maximum tweet-ability
  ● Big Data
  ● Real Time analytics
  ● Cloud
  ● Web scale
• Make prolific statements that contradict current software trends (tx Dean)

• Attempted Prolific Statement: Transactions and locking are highly overrated


Signal De-duplication and frequency capping
• Solution must be “web-scale”
  ● billions of users
  ● one->thousands of events per user
• Solution must record events
• Do not store the same event N times a minute
  ● Control data growth
    – Spiders, nagios, pathological cases
    – Small statistical difference in signal
  ● An action 10 times a day vs 1 time a minute


What this would look like


'?' Solution with transactions and locking

● Likely need scalable redundant lock layer
● Built in locks are not free
● Lots of code
● Lots of sockets
● Likely need to read to write
● Results in more nodes or caching layer for disk io


Remember with Cassandra...
• Rows have one to many columns
• Column is composed of { name, value, timestamp }
  ● If two columns have the same name, the higher timestamp wins
• Memtables absorb overwrites
• Writes are fast
  ● Sorted structure in memory
  ● Commit log to disk
• Log-structured storage prunes old values and deletes
• No reads on write path



Cassandr'ified solution


Consistent Hashing distributes data

● With the Random Partitioner, row keys are MD5-hashed to locate the node
  – Results in even distribution of rows across nodes
  – Limits/Removes hot spots
● Big Data is not so big when you have N nodes attack it
* Wife asked me if diagram above was a flag. Pledge your allegiance to the United Nodes of Big Data
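
To make the MD5 placement idea concrete, here is a minimal sketch of a hash ring. It is a simplification and not Cassandra's actual code: the ring size, the modulo reduction of the digest, the evenly spaced node tokens, and the names `token_for`, `evenly_spaced_tokens`, and `node_for` are all assumptions made for illustration.

```python
import hashlib
from bisect import bisect_left
from collections import Counter

# Assumed ring size for this sketch (a 127-bit token space).
RING_SIZE = 2 ** 127


def token_for(row_key):
    """Hash a row key with MD5 and reduce it to a position on the ring."""
    digest = hashlib.md5(row_key.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % RING_SIZE


def evenly_spaced_tokens(node_count):
    """Give each node one token, spaced evenly around the ring."""
    return [i * RING_SIZE // node_count for i in range(node_count)]


def node_for(row_key, node_tokens):
    """Owner is the first node whose token is >= the key's token, wrapping around."""
    index = bisect_left(node_tokens, token_for(row_key))
    return index % len(node_tokens)


# Hypothetical row keys: 20,000 user ids spread over a 4-node ring.
tokens = evenly_spaced_tokens(4)
load = Counter(node_for("user-%d" % i, tokens) for i in range(20000))
for node, rows in sorted(load.items()):
    print("node", node, "owns", rows, "rows")
```

Because MD5 output is close to uniform, sequential keys such as `user-1`, `user-2` land on unrelated nodes, which is what limits hot spots.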


Memtables absorb overwrites

● Memtables give de-duplication for free
  – Large memtable has larger chance of absorbing a write
● This solves our original requirement:
  – Do not store the same event N-times per interval
● Worst-case data written to disk N-times and compacted away
● Automatically de-duplicate on read with last-update-wins rule
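
The behaviour described above can be imitated with a toy model. This is only a sketch under stated assumptions: the `Memtable` class, `merge_on_read`, and the row key and column name (`user-42`, `visited:example.com`) are hypothetical and are not Cassandra's internals or M6d's schema.

```python
class Memtable:
    """Toy in-memory table: row key -> {column name: (value, timestamp)}."""

    def __init__(self):
        self.rows = {}

    def write(self, row_key, column, value, timestamp):
        # No disk read happens here: only the in-memory copy is compared,
        # and the column with the highest timestamp is kept.
        columns = self.rows.setdefault(row_key, {})
        current = columns.get(column)
        if current is None or timestamp >= current[1]:
            columns[column] = (value, timestamp)

    def flush(self):
        """Return the columns that would go to disk, then start empty again."""
        flushed = [
            (row_key, column, value, timestamp)
            for row_key, columns in sorted(self.rows.items())
            for column, (value, timestamp) in sorted(columns.items())
        ]
        self.rows = {}
        return flushed


def merge_on_read(*flushed_tables):
    """Last-update-wins: keep only the newest copy of each (row, column)."""
    newest = {}
    for table in flushed_tables:
        for row_key, column, value, timestamp in table:
            key = (row_key, column)
            if key not in newest or timestamp > newest[key][1]:
                newest[key] = (value, timestamp)
    return newest


memtable = Memtable()
# The same event arrives 60 times before the memtable is flushed.
for second in range(60):
    memtable.write("user-42", "visited:example.com", "1", timestamp=second)
first_flush = memtable.flush()

# Worst case: the event shows up again after the flush, so a second copy hits disk.
memtable.write("user-42", "visited:example.com", "1", timestamp=60)
second_flush = memtable.flush()

merged = merge_on_read(first_flush, second_flush)
print(len(first_flush), len(second_flush), len(merged))
```

Sixty writes of the same column collapse into one flushed column, and the stray second copy is reconciled away when the two flushed tables are merged.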

Cassandra & stream processing as an alternative to ETL
● ETL (Extract, Transform, Load) is a useful paradigm
● Batch process can be obtuse
  – Processes with long startup
  – Little support for appends, inserts, updates
  – Throughput issues for small files
● Difficult for small windows of time
● Overhead from MapReduce
● Sample scenario breakdown of state, city, and count


City, State, count(1) in ETL system

● Several phases / copies
● Storing the entire log to build/rebuild aggregation
● Difficult to do on small intervals
● Needs scheduling, needs log push system


City, State, count(1) stream system

● Could use Cassandra's counter feature directly
● Added Apache Kafka layer
● Decouples producers and consumers
● Allows message replay
● Allows backlog and recover from failures (never happens btw)
● Near real time
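
As a rough illustration of the stream approach, the sketch below keeps a running count per (state, city) as messages arrive. It does not use Kafka or Cassandra: a plain Python list stands in for the message log, a `Counter` stands in for Cassandra counters, and the `consume` function, the batch size, and the sample messages are assumptions made for this example.

```python
from collections import Counter


def consume(log, offset, counts, batch_size=2):
    """Apply up to batch_size messages starting at offset; return the next offset."""
    batch = log[offset:offset + batch_size]
    for message in batch:
        # Equivalent of incrementing one counter per (state, city) pair.
        counts[(message["state"], message["city"])] += 1
    return offset + len(batch)


# Stand-in for the message log. Producers append, consumers read by offset.
log = [
    {"state": "NY", "city": "New York"},
    {"state": "NY", "city": "Albany"},
    {"state": "CA", "city": "San Diego"},
    {"state": "NY", "city": "New York"},
    {"state": "CA", "city": "San Diego"},
]

counts = Counter()
offset = 0
while offset < len(log):
    offset = consume(log, offset, counts)

for (state, city), total in sorted(counts.items()):
    print(state, city, total)
```

Because the consumer tracks its own offset, a backlog simply means the loop has more messages to work through, and a replay means starting again from an earlier offset with fresh counts.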


An application to search logs
● In 2008 this article sold me on map reduce
● Take logs from all servers
● Put them into hadoop
● Generate lucene indexes
● Load into sharded SOLR cluster on interval


Pseudo diagram of solution

● Process to get files from servers into hadoop
● MapReduce process to build indexes
● Embedded SOLR on Hadoop Datanodes

* Go here for real story: http://www.slideshare.net/schubertzhang/case-study-how-rackspace-query-terabytes-of-data-2400928


But now it's the future!
● Every component or layer of an architecture is another thing to document and manage
● DataStax has built SOLR into Cassandra
● Applications can write to solr/cassandra directly
● Applications can read solr/cassandra directly


Ah ha! moment

● Determined the rackspace log application could be done with simple pieces
● Someone called it Taco Bell Programming
'The more I write code and design systems, the more I understand that many times, you can achieve the desired functionality simply with clever reconfigurations of the basic Unix tool set. After all, functionality is an asset, but code is a liability.'
● Cassandra is my main taco ingredient


Prolific statement: Design stuff with less arrows
● More layers/components
● Batch driven

● Less layers/components
● Low latency


Solr has wide adoption
● Clients for many programming languages
● Many hip JQuery Ajax widgets and stuff
● Open source Reuters Ajax Solr demo worked seamlessly with cassandra/solr
● Implemented a Rackspace-like solution with little code


Game Changer: Compression
Operation                              Time (ns)   Time (ms)   Notes
Main memory reference                        100               20x L2 cache, 200x L1 cache
Compress 1K bytes with Zippy               3,000
Send 1K bytes over 1 Gbps network         10,000   0.01
Read 4K randomly from SSD*               150,000   0.15
Read 1 MB sequentially from memory       250,000   0.25
Round trip within same datacenter        500,000   0.5
Read 1 MB sequentially from SSD*       1,000,000   1           4X memory
Disk seek                             10,000,000   10          20x datacenter roundtrip
Read 1 MB sequentially from disk      20,000,000   20          80x memory, 20X SSD

Source: https://gist.github.com/2841832
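
A quick back-of-the-envelope calculation with the figures listed above shows why compression is cheap next to disk I/O. The source lists only a compression cost, not a decompression cost, so this sketch treats the Zippy figure as a rough proxy for CPU work per kilobyte. The variable names are illustrative.

```python
# Figures taken from the latency list above (nanoseconds).
COMPRESS_1K_NS = 3_000
DISK_SEEK_NS = 10_000_000
READ_1MB_DISK_NS = 20_000_000

# How many kilobytes could be compressed in the time of one disk seek?
kb_compressed_per_seek = DISK_SEEK_NS / COMPRESS_1K_NS

# CPU time to compress 1 MB, versus reading 1 MB sequentially from disk.
compress_1mb_ns = 1024 * COMPRESS_1K_NS
disk_read_vs_compress = READ_1MB_DISK_NS / compress_1mb_ns

print("KB compressed per disk seek: %.0f" % kb_compressed_per_seek)
print("Compress 1 MB: %.2f ms" % (compress_1mb_ns / 1_000_000))
print("Disk read of 1 MB costs %.1fx the compression time" % disk_read_vs_compress)
```

On these numbers one disk seek costs about as much as compressing a few megabytes, so trading a little CPU for fewer or smaller disk reads is usually a good deal.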

Why compression helps
● Compressed data is smaller on disk
● If we compress data more fits in RAM and is cached

● Rotational disks:
  ● Rotational disks have very slow seeks
  ● RAM not used by processes will cache disk

● Solid State Disks do seek faster than rotational
  ● But they are more expensive than rotational


Enabling Compression
● Rolling update to Cassandra
● update column family my_stuff with compression_options={sstable_compression:SnappyCompressor, chunk_length_kb:64};
● bin/nodetool -h cdbla120 -p 8585 rebuildsstables my_stuff

● 68 GB of data shrinks to 36 GB


Compression in action
● Disk activity reduced drastically as more/all data fit in cache

● Better performance
● Disks that spin less should last longer


Compression lessons
● Creates extra CPU usage (but not really much)
● Creates more young gen garbage (some)
● Anecdotal experimentation with chunk_length_kb
  ● 64KB is good for sparse less frequent tables
  ● 16KB had same compression ratio and made less garbage
  ● Found 4KB to be less effective than 16KB
● This is easy to experiment with
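
One cheap way to get a feel for chunk sizes before touching a cluster is to compress sample data in fixed-size chunks and compare the ratios. The sketch below is an offline approximation only: it uses `zlib` from the Python standard library as a stand-in for Snappy, the sample rows are made up, and `compressed_ratio` is a hypothetical helper, so the absolute numbers will not match what Cassandra reports.

```python
import zlib


def compressed_ratio(data, chunk_length_kb):
    """Compress data in independent chunks; return compressed size / original size."""
    chunk_size = chunk_length_kb * 1024
    compressed_bytes = 0
    for start in range(0, len(data), chunk_size):
        chunk = data[start:start + chunk_size]
        compressed_bytes += len(zlib.compress(chunk))
    return compressed_bytes / len(data)


# Made-up, repetitive sample rows standing in for real column data.
sample = b"".join(
    b"user-%06d|visited|example.com|2012-08-08\n" % i for i in range(20000)
)

ratios = {kb: compressed_ratio(sample, kb) for kb in (4, 16, 64)}
for kb, ratio in sorted(ratios.items()):
    print("chunk_length_kb=%d ratio=%.3f" % (kb, ratio))
```

Each chunk is compressed on its own, which mirrors the idea that a read only needs to decompress the chunk it touches. Smaller chunks mean less work per read but give the compressor less context.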


We have reached the point of the presentation where we...


Hate on everything not Cassandra


Cassandra's uptime story
● Main cluster in continuous operation since 8/6/11
● Doubled physical nodes in the cluster
● Upgraded Cassandra twice 0.7.7->0.8.6->1.0.7
● Rolling reboot kernel update, 1 for leap second
● No maintenance windows
● Let's compare Cassandra with other things I use/used


Cassandra vs MySQL master/slave...

              MySQL                                   Cassandra
Replication   Single thread, binlogs,                 Per operation
              manual recovery
Scaling       Add more nodes, initial sync,           Bootstrap new Cassandra node,
              setup replication,                      re-balance off-peak
              configure applications
Consistency   Applications that care read master,     Per operation
              or application check status
              of replication
Backup        Mysqldump/LVM snapshot                  Sstabletojson | snapshot
Restore       Re-insert everything/Restore snapshot   Copy files into place


So with mysql...
● Replication breaking often
  ● requiring manual intervention for many fixes
● Blocking writes for 30 minutes to add a column to a table
● Scale up to big iron then...
  ● Restart takes 30 minutes to fsck all disks
● Applications needing to be coded with state aware logic
  ● Which node should I query?
  ● Is replication behind?
  ● Is there some merge table trickery going on?


Cassandra vs Memcache

              Memcache                 Cassandra
Replication   None (client managed)    Per operation
Scaling       None (client managed)    Grow or shrink without bad reads
Consistency   Yes (and really no)      Per operation
Backup        No persistence           sstabletojson|snapshot
Restore       No persistence           Cache warming


So memcache is...
● Not persistent
● Not clear on sharding
● Not clear on failure modes
● Actual experiences with memcache
  ● Memcache client was not sharding requests evenly. 60% were going to node 1.
  ● We lost a rack with 40% of the memcache nodes
    – Site slowed to a crawl as DBs were overloaded
    – Took 1 hour to warm up again


Cassandra vs DRBD

              DRBD                              Cassandra
Replication   1 or 2 nodes per block            Per operation
Scaling       No scaling. Just more             Grow or shrink dynamically
              availability.
Consistency   Sync modes change failure         Per operation
              consistency, deadtime between
              flip-flops
Backup        Like a disk                       sstabletojson|snapshot
Restore       Like a disk                       Like a disk


So DRBD is...
● A 30 second to 1 minute fail over/outage
● An alert that might wake you up
  ● but hopefully allows you to sleep again
● Handcuffed to linux-ha/keepalived etc
  ● Making it an involved setup
  ● Making it involved to troubleshoot
● Might need a crossover cable or dedicated network
● cpu/network intensive with very active disks
● Can successfully fail over a data file in an inconsistent state


Cassandra vs HDFS

              Hadoop                            Cassandra
Replication   Per file                          Per operation
Scaling       Add nodes                         Add nodes
Consistency   Very, to the point getting        Per operation
              data in becomes difficult
Backup        Distcp                            sstabletojson|snapshot
Restore       Distcp                            Like a disk


So HDFS...
● Comes up with about 4 or 5 reasons a year for master node/full cluster restart
  ● Grow NameNode heap
  ● Enable jobtracker setting to stop 100,000 task jobs
  ● Enabled/updated trash feature (off by default)
  ● Forced to do a fail over by hardware fault
  ● Random DRBD/Kernel brain fart
  ● Need to update a JVM/kernel eventually
● Now finally new versions have HA NameNode
● Running jobs lose progress and will not automatically restart


Questions?

