Increasing Your Prospects: Cassandra in Online Advertising
Let 'em know: #cassandra12




© 2012 Media6Degrees. All Rights Reserved. Proprietary and Confidential
A little about what we do




Impressions look like…




A High Level look at RTB




  1. Browsers visit Publishers and create impressions.
  2. Publishers sell impressions via Exchanges.
  3. Exchanges serve as auction houses for the impressions.
  4. M6d bids on the impression. If we win, we display an ad.

Key Cassandra features
  • Horizontal scalability
      ● More nodes, more storage
      ● More nodes, more throughput
  • Cassandra is a high-availability solution
      ● Almost all changes can be made at run time
      ● Rolling updates
      ● Survives node failures
  • One configuration file




Key storage model features
  • Type validation gives us creature comforts
      ● Helps prevent insertion of bad data
          – Columns named 'age' should be a number
      ● Makes data easier to read and write for end users
      ● Encourages/enforces storage in a terse format
          – Store 478 as 478, not “478”
  • Rows do not need to have fixed columns
  • Writes do not read
  • Optimal for set/get/slice operations




Things I have learned on the presentation circuit
  • Gratuitous use of Meme Generator (tx Nathan)
  • Gratuitous buzzwords for maximum tweet-ability
      ● Big Data
      ● Real Time analytics
      ● Cloud
      ● Web scale
  • Make prolific statements that contradict current software trends (tx Dean)

  • Attempted Prolific Statement: Transactions and locking are highly overrated



Signal de-duplication and frequency capping
  • Solution must be “web-scale”
      ● Billions of users
      ● One to thousands of events per user
  • Solution must record events
  • Do not store the same event N times a minute
      ● Control data growth
          – Spiders, Nagios, pathological cases
          – Small statistical difference in signal
              ● An action 10 times a day vs. 1 time a minute




What this would look like




'?' Solution with transactions and locking

  ● Likely need a scalable, redundant lock layer
      ● Built-in locks are not free
  ● Lots of code
  ● Lots of sockets
  ● Likely need to read to write
      ● Results in more nodes or a caching layer for disk I/O




Remember with Cassandra...
  • Rows have one to many columns
  • A column is composed of { name, value, timestamp }
      ● If two columns have the same name, the higher timestamp wins
  • Memtables absorb overwrites
  • Writes are fast
      ● Sorted structure in memory
      ● Commit log to disk
  • Log-structured storage prunes old values and deletes
  • No reads on the write path
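
The timestamp rule on this slide can be sketched in a few lines of Python. This is a minimal illustration, not Cassandra's code: the `Column` class, `reconcile`, and `merge_row` are hypothetical names, and tie-breaking on equal timestamps is simplified here (the first-seen version is kept).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Column:
    """Hypothetical stand-in for a column: { name, value, timestamp }."""
    name: str
    value: bytes
    timestamp: int  # supplied by the writer, e.g. microseconds since epoch


def reconcile(a: Column, b: Column) -> Column:
    """Pick the winner between two versions of the same column.

    Assumption: on equal timestamps the first argument is kept.
    """
    if a.name != b.name:
        raise ValueError("can only reconcile versions of the same column")
    return a if a.timestamp >= b.timestamp else b


def merge_row(*fragments):
    """Merge several fragments of one row into a single name -> Column view.

    Each fragment is an iterable of Column objects, for example one from
    memory and several from disk. No fragment is read at write time; the
    merge happens only when the row is read or compacted.
    """
    merged = {}
    for fragment in fragments:
        for column in fragment:
            current = merged.get(column.name)
            merged[column.name] = column if current is None else reconcile(current, column)
    return merged
```

Because the winner depends only on the timestamps, the order in which fragments are merged does not change the result, which is why a write never has to read the previous value first.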






Cassandr'ified solution




Consistent Hashing distributes data

  ● With the Random Partitioner, row keys are MD5-hashed to locate a node
      – Results in even distribution of rows across nodes
      – Limits/removes hot spots
  ● Big Data is not so big when you have N nodes attacking it
  * Wife asked me if diagram above was a flag. Pledge your allegiance to the United Nodes of Big Data
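
A rough sketch of the idea, assuming a ring with one evenly spaced token per node. The `Ring` class and `token_for` are hypothetical names; the real partitioner's token range and replica placement differ in detail, so treat this as an illustration of MD5-based placement only.

```python
import bisect
import hashlib


def token_for(row_key: bytes) -> int:
    """Map a row key to a position on the ring using its MD5 digest."""
    return int.from_bytes(hashlib.md5(row_key).digest(), "big")


class Ring:
    """Toy consistent-hash ring (assumption: one token per node, evenly spaced)."""

    def __init__(self, node_names):
        self.nodes = list(node_names)
        step = 2 ** 128 // len(self.nodes)  # MD5 digests are 128 bits wide
        self.tokens = [index * step for index in range(len(self.nodes))]

    def node_for(self, row_key: bytes) -> str:
        """Return the first node whose token is >= the key's token, wrapping around."""
        token = token_for(row_key)
        index = bisect.bisect_left(self.tokens, token) % len(self.tokens)
        return self.nodes[index]
```

Since MD5 output is close to uniform, sequential keys such as `user-1`, `user-2`, ... scatter across the nodes instead of piling up on one of them.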



Memtables absorb overwrites

  ● Memtables give de-duplication for free
      – A larger memtable has a larger chance of absorbing a write
  ● This solves our original requirement:
      – Do not store the same event N times per interval
  ● Worst case: data is written to disk N times and compacted away
  ● Automatically de-duplicated on read with the last-update-wins rule
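
To make the "absorbing" behaviour concrete, here is a toy model of a memtable. It is a sketch under simplifying assumptions: the `Memtable` class is hypothetical, there is no commit log, and a flush is triggered by hand rather than by size.

```python
class Memtable:
    """Toy in-memory write buffer keyed by (row key, column name)."""

    def __init__(self):
        self.cells = {}
        self.writes_seen = 0

    def write(self, row_key, column_name, value, timestamp):
        """Record a write without reading anything from disk."""
        self.writes_seen += 1
        key = (row_key, column_name)
        current = self.cells.get(key)
        # Last update wins: keep only the newest version of each column.
        if current is None or timestamp >= current[1]:
            self.cells[key] = (value, timestamp)

    def flush(self):
        """Return the sorted cells that would be written to disk, then reset."""
        flushed = sorted(self.cells.items())
        self.cells = {}
        self.writes_seen = 0
        return flushed
```

If the same event for the same user arrives 60 times before a flush, the memtable counts 60 writes but holds a single cell, so only one value goes to disk for that interval.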
Cassandra & stream processing as an alternative to ETL
  ● ETL (Extract, Transform, Load) is a useful paradigm
  ● Batch processes can be obtuse
      – Processes with long startup
      – Little support for appends, inserts, updates
      – Throughput issues for small files
  ● Difficult for small windows of time
  ● Overhead from MapReduce
  ● Sample scenario: breakdown of state, city, and count




City, State, count(1) in ETL system

  ● Several phases / copies
  ● Storing the entire log to build/rebuild aggregation
  ● Difficult to do on small intervals
  ● Needs scheduling, needs log push system




City, State, count(1) stream system

  ● Could use Cassandra's counter feature directly
  ● Added Apache Kafka layer
      ● Decouples producers and consumers
      ● Allows message replay
      ● Allows backlog and recovery from failures (never happens btw)
      ● Near real time
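
The streaming version of the count can be sketched without any batch phases. This is only an illustration in plain Python: the event dictionaries and `count_by_city_state` are hypothetical, an ordinary iterable stands in for the message stream, and an in-memory `Counter` stands in for the counters; no Kafka or Cassandra client calls are shown.

```python
from collections import Counter


def count_by_city_state(events, counts=None):
    """Increment one counter per (state, city) as each event arrives.

    `events` is any iterable of dicts with "state" and "city" keys.
    Passing an existing `counts` lets later batches (or a replayed
    backlog) keep adding to the same totals.
    """
    counts = Counter() if counts is None else counts
    for event in events:
        counts[(event["state"], event["city"])] += 1
    return counts
```

Each event touches exactly one counter, so the aggregate is always current and there is no need to keep the whole log around just to rebuild it.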



An application to search logs
  ● In 2008 this article sold me on MapReduce
  ● Take logs from all servers
  ● Put them into Hadoop
  ● Generate Lucene indexes
  ● Load into a sharded Solr cluster on an interval




Pseudo diagram of solution

  ● Process to get files from servers into Hadoop
  ● MapReduce process to build indexes
  ● Embedded Solr on Hadoop DataNodes

* Go here for the real story: http://www.slideshare.net/schubertzhang/case-study-how-rackspace-query-terabytes-of-data-2400928




But now it's the future!
  ● Every component or layer of an architecture is another thing to document and manage
  ● DataStax has built Solr into Cassandra
  ● Applications can write to Solr/Cassandra directly
  ● Applications can read from Solr/Cassandra directly




Ah ha! moment

  ● Determined the Rackspace log application could be done with simple pieces
  ● Someone called it Taco Bell Programming
    'The more I write code and design systems, the more I understand that
    many times, you can achieve the desired functionality simply with clever
    reconfigurations of the basic Unix tool set. After all, functionality is
    an asset, but code is a liability.'
  ● Cassandra is my main taco ingredient




Prolific statement: Design stuff with fewer arrows
  ● More layers/components
  ● Batch driven

  ● Fewer layers/components
  ● Low latency




Solr has wide adoption
  ● Clients for many programming languages
  ● Many hip jQuery Ajax widgets and stuff
  ● Open source Reuters Ajax Solr demo worked seamlessly with Cassandra/Solr
  ● Implemented a Rackspace-like solution with little code




Game Changer: Compression

  Operation                              Time (ns)    Time (ms)   Notes
  Main memory reference                        100                20x L2 cache, 200x L1 cache
  Compress 1K bytes with Zippy               3,000
  Send 1K bytes over 1 Gbps network         10,000     0.01
  Read 4K randomly from SSD*               150,000     0.15
  Read 1 MB sequentially from memory       250,000     0.25
  Round trip within same datacenter        500,000     0.5
  Read 1 MB sequentially from SSD*       1,000,000     1          4x memory
  Disk seek                             10,000,000    10          20x datacenter round trip
  Read 1 MB sequentially from disk      20,000,000    20          80x memory, 20x SSD

  Source: https://gist.github.com/2841832
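
A little arithmetic on these figures shows why compression is cheap relative to disk. The constants below are copied from the list above; the only assumption added is that 1 MB is treated as 1024 blocks of 1K, and the variable names are illustrative.

```python
# Figures from the latency list above, in nanoseconds.
COMPRESS_1K_NS = 3_000
DISK_SEEK_NS = 10_000_000
READ_1MB_DISK_NS = 20_000_000
READ_1MB_MEMORY_NS = 250_000

# How many 1K blocks could be compressed in the time of one disk seek.
compressions_per_seek = DISK_SEEK_NS // COMPRESS_1K_NS

# CPU time to compress 1 MB, assuming 1 MB = 1024 blocks of 1K.
compress_1mb_ns = 1024 * COMPRESS_1K_NS

# How much slower a sequential 1 MB read is from disk than from memory.
disk_vs_memory = READ_1MB_DISK_NS // READ_1MB_MEMORY_NS

print(compressions_per_seek)  # 3333
print(compress_1mb_ns)        # 3072000 ns, about 3 ms
print(disk_vs_memory)         # 80
```

On these numbers, compressing a whole megabyte costs roughly 3 ms of CPU, well under the 20 ms it takes to read that megabyte sequentially from a rotational disk, and a single avoided seek pays for thousands of 1K compressions.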
Why compression helps
  ● Compressed data is smaller on disk
  ● If we compress data, more of it fits in RAM and is cached

  ● Rotational disks:
      ● Rotational disks have very slow seeks
      ● RAM not used by processes is used to cache disk

  ● Solid state disks do seek faster than rotational disks
      ● But they are more expensive than rotational disks




Enabling Compression
  ● Rolling update to Cassandra
  ● update column family my_stuff with compression_options={sstable_compression:SnappyCompressor, chunk_length_kb:64};
  ● bin/nodetool -h cdbla120 -p 8585 rebuildsstables my_stuff

  ● 68 GB of data shrinks to 36 GB


Compression in action
  ● Disk activity reduced drastically as more/all data fit in cache

  ● Better performance
  ● Disks that spin less should last longer




Compression lessons
  ● Creates extra CPU usage (but not really much)
  ● Creates more young gen garbage (some)
  ● Anecdotal experimentation with chunk_length_kb
      ● 64 KB is good for sparse, less frequently used tables
      ● 16 KB had the same compression ratio and made less garbage
      ● Found 4 KB to be less effective than 16 KB
  ● This is easy to experiment with
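
A similar experiment can be approximated offline before touching a cluster. The sketch below compresses sample data in independent chunks and reports the ratio for a given chunk length. Note the substitution: it uses `zlib` from the Python standard library as a stand-in for Snappy, so absolute ratios will differ from what the cluster produces, and `chunked_compressed_size` and `compression_ratio` are hypothetical helper names.

```python
import zlib


def chunked_compressed_size(data: bytes, chunk_length_kb: int) -> int:
    """Total size after compressing `data` in independent fixed-size chunks.

    Assumption: zlib stands in for Snappy, and per-chunk bookkeeping
    overhead on disk is ignored.
    """
    chunk_size = chunk_length_kb * 1024
    total = 0
    for start in range(0, len(data), chunk_size):
        total += len(zlib.compress(data[start:start + chunk_size]))
    return total


def compression_ratio(data: bytes, chunk_length_kb: int) -> float:
    """Compressed size divided by original size (lower is better)."""
    return chunked_compressed_size(data, chunk_length_kb) / len(data)
```

Running `compression_ratio(sample, kb)` for `kb` in `(4, 16, 64)` on a representative sample of your own rows gives a first feel for the trade-off; garbage and CPU cost still have to be measured on a real node.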




We have reached the point of the presentation where we...




Hate on everything not Cassandra




Cassandra's uptime story
  ● Main cluster in continuous operation since 8/6/11
  ● Doubled physical nodes in the cluster
  ● Upgraded Cassandra twice: 0.7.7 -> 0.8.6 -> 1.0.7
  ● Rolling reboots for a kernel update, and one for the leap second
  ● No maintenance windows
  ● Let's compare Cassandra with other things I use/used




Cassandra vs MySQL master/slave...

                 MySQL                                  Cassandra
  Replication    Single thread, binlogs,                Per operation
                 manual recovery
  Scaling        Add more nodes, initial sync,          Bootstrap new Cassandra node,
                 set up replication,                    re-balance off-peak
                 configure applications
  Consistency    Applications that care read the        Per operation
                 master, or the application checks
                 the status of replication
  Backup         mysqldump / LVM snapshot               sstabletojson | snapshot
  Restore        Re-insert everything /                 Copy files into place
                 restore snapshot




So with MySQL...
  ● Replication breaking often
      ● Requiring manual intervention for many fixes
  ● Blocking writes for 30 minutes to add a column to a table
  ● Scale up to big iron then...
      ● Restart takes 30 minutes to fsck all disks
  ● Applications needing to be coded with state-aware logic
      ● Which node should I query?
      ● Is replication behind?
      ● Is there some merge table trickery going on?




Cassandra vs Memcache

                 Memcache                 Cassandra
  Replication    None (client managed)    Per operation
  Scaling        None (client managed)    Grow or shrink without bad reads
  Consistency    Yes (and really no)      Per operation
  Backup         No persistence           sstabletojson | snapshot
  Restore        No persistence           Cache warming




So memcache is...
  ● Not persistent
  ● Not clear on sharding
  ● Not clear on failure modes
  ● Actual experiences with memcache
      ● Memcache client was not sharding requests evenly. 60% were going to node 1.
      ● We lost a rack with 40% of the memcache nodes
          – Site slowed to a crawl as the DBs were overloaded
          – Took 1 hour to warm up again




Cassandra vs DRBD

                 DRBD                             Cassandra
  Replication    1 or 2 nodes per block           Per operation
  Scaling        No scaling. Just more            Grow or shrink dynamically
                 availability.
  Consistency    Sync modes change failure        Per operation
                 consistency, dead time
                 between flip-flops
  Backup         Like a disk                      sstabletojson | snapshot
  Restore        Like a disk                      Like a disk




So DRBD is...
  ● A 30 second to 1 minute failover/outage
  ● An alert that might wake you up
      ● But hopefully allows you to sleep again
  ● Handcuffed to linux-ha/keepalived etc.
      ● Making it an involved setup
      ● Making it involved to troubleshoot
  ● Might need a crossover cable or dedicated network
  ● CPU/network intensive with very active disks
  ● Can successfully fail over a data file in an inconsistent state




Cassandra vs HDFS

                 Hadoop                           Cassandra
  Replication    Per file                         Per operation
  Scaling        Add nodes                        Add nodes
  Consistency    Very, to the point that          Per operation
                 getting data in becomes
                 difficult
  Backup         distcp                           sstabletojson | snapshot
  Restore        distcp                           Like a disk




So HDFS...
  ● Comes up with about 4 or 5 reasons a year for a master node/full cluster restart
      ● Grow NameNode heap
      ● Enable JobTracker setting to stop 100,000-task jobs
      ● Enabled/updated trash feature (off by default)
      ● Forced to do a failover by hardware fault
      ● Random DRBD/kernel brain fart
      ● Need to update a JVM/kernel eventually
  ● Now, finally, new versions have HA NameNode
  ● Running jobs lose progress and will not automatically restart




Questions?




© 2012 Media6Degrees. All Rights Reserved. Proprietary and Confidential

Mais conteúdo relacionado

Destaque

Working progress preliminary task
Working progress preliminary taskWorking progress preliminary task
Working progress preliminary task
aq101824
 
The civil war, lincoln, lee
The civil war, lincoln, leeThe civil war, lincoln, lee
The civil war, lincoln, lee
ms_faris
 
Slides open stack emily_updated_2
Slides open stack emily_updated_2Slides open stack emily_updated_2
Slides open stack emily_updated_2
OpenCity Community
 
Exposicion redes sociales, buscadores, correos y paginas
Exposicion redes sociales, buscadores, correos y paginasExposicion redes sociales, buscadores, correos y paginas
Exposicion redes sociales, buscadores, correos y paginas
Saida Lopez
 
Alive Day - series of features
Alive Day - series of featuresAlive Day - series of features
Alive Day - series of features
Ann Knabe
 

Destaque (18)

Parking
ParkingParking
Parking
 
Working progress preliminary task
Working progress preliminary taskWorking progress preliminary task
Working progress preliminary task
 
The civil war, lincoln, lee
The civil war, lincoln, leeThe civil war, lincoln, lee
The civil war, lincoln, lee
 
Slides open stack emily_updated_2
Slides open stack emily_updated_2Slides open stack emily_updated_2
Slides open stack emily_updated_2
 
Kanji from the Start - Unit 1 p12 spelling test
Kanji from the Start - Unit 1 p12 spelling testKanji from the Start - Unit 1 p12 spelling test
Kanji from the Start - Unit 1 p12 spelling test
 
General Quiz (Finals) | Elixir '12
General Quiz (Finals) | Elixir '12General Quiz (Finals) | Elixir '12
General Quiz (Finals) | Elixir '12
 
Pt 4
Pt 4Pt 4
Pt 4
 
長野市放課後子ども総合プラン有料化の方針
長野市放課後子ども総合プラン有料化の方針長野市放課後子ども総合プラン有料化の方針
長野市放課後子ども総合プラン有料化の方針
 
C 4
C 4C 4
C 4
 
Exposicion redes sociales, buscadores, correos y paginas
Exposicion redes sociales, buscadores, correos y paginasExposicion redes sociales, buscadores, correos y paginas
Exposicion redes sociales, buscadores, correos y paginas
 
Jewelk
JewelkJewelk
Jewelk
 
Offshore Operations Maintenance[2]
Offshore Operations Maintenance[2]Offshore Operations Maintenance[2]
Offshore Operations Maintenance[2]
 
A guide to selling and buying a business 1.0
A guide to selling and buying a business 1.0A guide to selling and buying a business 1.0
A guide to selling and buying a business 1.0
 
Panorama economy 12 aprile 2012
Panorama economy 12 aprile 2012 Panorama economy 12 aprile 2012
Panorama economy 12 aprile 2012
 
Новогодний счастливый купон
Новогодний счастливый купонНовогодний счастливый купон
Новогодний счастливый купон
 
Kuronen: Oppilas- ja opiskelijahuolto osaksi lasten ja nuorten hyvinvointisuu...
Kuronen: Oppilas- ja opiskelijahuolto osaksi lasten ja nuorten hyvinvointisuu...Kuronen: Oppilas- ja opiskelijahuolto osaksi lasten ja nuorten hyvinvointisuu...
Kuronen: Oppilas- ja opiskelijahuolto osaksi lasten ja nuorten hyvinvointisuu...
 
Alive Day - series of features
Alive Day - series of featuresAlive Day - series of features
Alive Day - series of features
 
Diff systemverilogc
Diff systemverilogcDiff systemverilogc
Diff systemverilogc
 

Semelhante a M6d cassandra summit

Master agile development and testing
Master agile development and testingMaster agile development and testing
Master agile development and testing
vmglover
 
LSM 2011 AdaLabs presentation slides: How to make my business opensource & vi...
LSM 2011 AdaLabs presentation slides: How to make my business opensource & vi...LSM 2011 AdaLabs presentation slides: How to make my business opensource & vi...
LSM 2011 AdaLabs presentation slides: How to make my business opensource & vi...
AdaLabs
 
Santo Leto - MySQL Connect 2012 - Getting Started with Mysql Cluster
Santo Leto - MySQL Connect 2012 - Getting Started with Mysql ClusterSanto Leto - MySQL Connect 2012 - Getting Started with Mysql Cluster
Santo Leto - MySQL Connect 2012 - Getting Started with Mysql Cluster
Santo Leto
 
Integrating Big Data Technologies
Integrating Big Data TechnologiesIntegrating Big Data Technologies
Integrating Big Data Technologies
DATAVERSITY
 
MongoDB at NoSQL Now! 2012: Benefits and Challenges of Using MongoDB in the E...
MongoDB at NoSQL Now! 2012: Benefits and Challenges of Using MongoDB in the E...MongoDB at NoSQL Now! 2012: Benefits and Challenges of Using MongoDB in the E...
MongoDB at NoSQL Now! 2012: Benefits and Challenges of Using MongoDB in the E...
MongoDB
 
Enabling Edge-Cloud Duality of Time Series Data
Enabling Edge-Cloud Duality of Time Series DataEnabling Edge-Cloud Duality of Time Series Data
Enabling Edge-Cloud Duality of Time Series Data
InfluxData
 

M6d cassandra summit

  • 1. Increasing Your Prospects: Cassandra in Online Advertising Let 'em know: #cassandra12 © 2012 Media6Degrees. All Rights Reserved. Proprietary and Confidential
  • 2. A little about what we do
  • 3. Impressions look like…
  • 4. A High Level look at RTB
      1. Browsers visit Publishers and create impressions.
      2. Publishers sell impressions via Exchanges.
      3. Exchanges serve as auction houses for the impressions.
      4. M6d bids on the impression. If we win, we display an ad.
  • 5. Key Cassandra features
      • Horizontal scalability
          ● More nodes, more storage
          ● More nodes, more throughput
      • Cassandra is a high availability solution
          ● Almost all changes can be made at run time
          ● Rolling updates
          ● Survives node failures
      • One configuration file
  • 6. Key storage model features
      • Type validation gives us creature comforts
          ● Helps prevent insertion of bad data
              – Columns named 'age' should be a number
          ● Makes data easier to read and write for end users
          ● Encourages/enforces storage in a terse format
              – Store 478 as 478, not “478”
      • Rows do not need to have fixed columns
      • Writes do not read
      • Optimal for set/get/slice operations
  • 7. Things I have learned on the presentation circuit
      • Gratuitous use of Meme Generator (tx Nathan)
      • Gratuitous buzzwords for maximum tweet-ability
          ● Big Data
          ● Real Time analytics
          ● Cloud
          ● Web scale
      • Make prolific statements that contradict current software trends (tx Dean)
      • Attempted Prolific Statement: Transactions and locking are highly overrated
  • 8. Signal De-duplication and frequency capping
      • Solution must be “web-scale”
          ● Billions of users
          ● One to thousands of events per user
      • Solution must record events
      • Do not store the same event N times a minute
          ● Control data growth
              – Spiders, nagios, pathological cases
              – Small statistical difference in signal
          ● An action 10 times a day vs 1 time a minute
  • 9. What this would look like
  • 10. '?' Solution with transactions and locking
      ● Likely need scalable redundant lock layer
      ● Built in locks are not free
      ● Lots of code
      ● Lots of sockets
      ● Likely need to read to write
      ● Results in more nodes or caching layer for disk io
  • 11. Remember with Cassandra...
      • Rows have one to many columns
      • A column is composed of { name, value, timestamp }
          ● If two columns have the same name, the higher timestamp wins
      • Memtables absorb overwrites
      • Writes are fast
          ● Sorted structure in memory
          ● Commit log to disk
      • Log-structured storage prunes old values and deletes
      • No reads on write path
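The name/timestamp rule on the slide above can be sketched in a few lines. This is a minimal illustration rather than Cassandra's own code: the `Column` class and `reconcile` function are hypothetical names, and breaking ties in favor of the first argument is a simplifying assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Column:
    """One column as described on the slide: { name, value, timestamp }."""
    name: str
    value: str
    timestamp: int  # assumed to be supplied by the writer


def reconcile(a: Column, b: Column) -> Column:
    """Pick the surviving column when two share a name: higher timestamp wins.

    Ties keep `a`; that tie-break is an assumption made for this sketch.
    """
    if a.name != b.name:
        raise ValueError("only columns with the same name are reconciled")
    return a if a.timestamp >= b.timestamp else b


old = Column("last_seen", "10:00", timestamp=100)
new = Column("last_seen", "10:05", timestamp=200)
winner = reconcile(old, new)
```

Because the rule only compares timestamps, the result does not depend on the order in which the two versions arrive, which is what lets writes skip the read.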
  • 12. Cassandr'ified solution
  • 13. Consistent Hashing distributes data
      ● RandomPartitioner: row keys are MD5-hashed to locate the node
          – Results in even distribution of rows across nodes
          – Limits/removes hot spots
      ● Big Data is not so big when you have N nodes attacking it
      * Wife asked me if the diagram above was a flag. Pledge your allegiance to the United Nodes of Big Data
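A rough sketch of how MD5 hashing can spread row keys over a ring of nodes. It is a simplified model, not Cassandra's implementation: the helper names, the evenly spaced node tokens, and the `user-N` keys are all assumptions made for illustration, and replication is ignored.

```python
import bisect
import hashlib
from collections import Counter


def token_for(key: str) -> int:
    """MD5 the row key and take the absolute value of the signed 128-bit result."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return abs(int.from_bytes(digest, "big", signed=True))


def evenly_spaced_tokens(node_count: int) -> list:
    """Give each node an evenly spaced position on a ring of size 2**127."""
    return [i * (2 ** 127) // node_count for i in range(node_count)]


def node_for(key: str, tokens: list) -> int:
    """Index of the first node whose token is >= the key's token,
    wrapping around to node 0 past the last token."""
    return bisect.bisect_left(tokens, token_for(key)) % len(tokens)


tokens = evenly_spaced_tokens(4)
rows_per_node = Counter(node_for(f"user-{i}", tokens) for i in range(10_000))
```

With 10,000 keys on 4 nodes, each node should end up with roughly a quarter of the rows even though the keys themselves are sequential, which is the "limits hot spots" point from the slide.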
  • 14. Memtables absorb overwrites
      ● Memtables give de-duplication for free
          – A larger memtable has a larger chance of absorbing a write
      ● This solves our original requirement:
          – Do not store the same event N times per interval
      ● Worst case: data is written to disk N times and compacted away
      ● Automatically de-duplicated on read with the last-update-wins rule
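The de-duplication effect can be illustrated with a toy write buffer. This is a sketch under stated assumptions, not Cassandra internals: the `Memtable` class is hypothetical, and using the event identifier as the column name is an assumed schema choice for the example.

```python
class Memtable:
    """Toy in-memory write buffer keyed by (row key, column name)."""

    def __init__(self):
        self.columns = {}     # (row_key, column_name) -> (value, timestamp)
        self.writes_seen = 0

    def write(self, row_key, column_name, value, timestamp):
        """Overwrite in place; only the in-memory entry is compared, no disk read."""
        self.writes_seen += 1
        key = (row_key, column_name)
        current = self.columns.get(key)
        if current is None or timestamp >= current[1]:
            self.columns[key] = (value, timestamp)

    def flush(self):
        """Return the columns that would go to disk, sorted, and reset the buffer."""
        flushed = sorted(self.columns.items())
        self.columns = {}
        return flushed


memtable = Memtable()
# A spider fires the same event 600 times; the event id is the column name.
for ts in range(600):
    memtable.write("user-42", "visited:/pricing", "", ts)
memtable.write("user-42", "visited:/signup", "", 600)
flushed = memtable.flush()
```

In this sketch 601 writes reach the buffer but only two columns are flushed, one per distinct event, each carrying its latest timestamp.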
  • 15. Cassandra & stream processing as an alternative to ETL
      ● ETL (Extract, Transform, Load) is a useful paradigm
      ● Batch processes can be obtuse
          – Processes with long startup
          – Little support for appends, inserts, updates
          – Throughput issues for small files
      ● Difficult for small windows of time
      ● Overhead from MapReduce
      ● Sample scenario: breakdown of state, city, and count
  • 16. City, State, count(1) in ETL system
      ● Several phases / copies
      ● Storing the entire log to build/rebuild aggregation
      ● Difficult to do on small intervals
      ● Needs scheduling, needs log push system
  • 17. City, State, count(1) stream system
      ● Could use Cassandra's counter feature directly
      ● Added Apache Kafka layer
      ● Decouples producers and consumers
      ● Allows message replay
      ● Allows backlog and recovery from failures (never happens btw)
      ● Near real time
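The streaming version of the (state, city) count can be sketched as a consumer that applies one increment per message. Everything here is an in-memory stand-in: the list plays the role of the message queue, the `Counter` plays the role of counter columns, and the `consume` function and message fields are hypothetical names, not a Kafka or Cassandra client API.

```python
from collections import Counter


def consume(messages, counters):
    """Apply one increment per message, as a counter column update would."""
    for message in messages:
        counters[(message["state"], message["city"])] += 1
    return counters


# Stand-in for messages read from a queue; a real consumer would poll a topic.
stream = [
    {"state": "NY", "city": "New York"},
    {"state": "NY", "city": "Albany"},
    {"state": "NY", "city": "New York"},
    {"state": "CA", "city": "San Diego"},
]
counts = consume(stream, Counter())
# Catching up on a backlog just means feeding the consumer more messages.
counts = consume([{"state": "CA", "city": "San Diego"}], counts)
```

The counts are usable after every message, so there is no batch boundary to wait for and no full log to re-read in order to refresh the aggregate.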
  • 18. An application to search logs
      ● In 2008 this article sold me on map reduce
      ● Take logs from all servers
      ● Put them into hadoop
      ● Generate lucene indexes
      ● Load into sharded SOLR cluster on an interval
  • 19. Pseudo diagram of solution
      ● Process to get files from servers into hadoop
      ● MapReduce process to build indexes
      ● Embedded SOLR on Hadoop Datanodes
      * Go here for the real story: http://www.slideshare.net/schubertzhang/case-study-how-rackspace-query-terabytes-of-data-2400928
  • 20. But now it's the future!
      ● Every component or layer of an architecture is another thing to document and manage
      ● DataStax has built SOLR into Cassandra
      ● Applications can write to solr/cassandra directly
      ● Applications can read solr/cassandra directly
  • 21. Ah ha! moment
      ● Determined the Rackspace log application could be done with simple pieces
      ● Someone called it Taco Bell Programming
          'The more I write code and design systems, the more I understand that many times, you can achieve the desired functionality simply with clever reconfigurations of the basic Unix tool set. After all, functionality is an asset, but code is a liability.'
      ● Cassandra is my main taco ingredient
  • 22. Prolific statement: Design stuff with fewer arrows
      ● More layers/components
      ● Batch driven
      ● Fewer layers/components
      ● Low latency
  • 23. Solr has wide adoption
      ● Clients for many programming languages
      ● Many hip JQuery Ajax widgets and stuff
      ● Open source Reuters Ajax Solr demo worked seamlessly with cassandra/solr
      ● Implemented a Rackspace-like solution with little code
  • 24. Game Changer: Compression
      ● Main memory reference: 100 ns (20x L2 cache, 200x L1 cache)
      ● Compress 1K bytes with Zippy: 3,000 ns
      ● Send 1K bytes over 1 Gbps network: 10,000 ns (0.01 ms)
      ● Read 4K randomly from SSD*: 150,000 ns (0.15 ms)
      ● Read 1 MB sequentially from memory: 250,000 ns (0.25 ms)
      ● Round trip within same datacenter: 500,000 ns (0.5 ms)
      ● Read 1 MB sequentially from SSD*: 1,000,000 ns (1 ms; 4x memory)
      ● Disk seek: 10,000,000 ns (10 ms; 20x datacenter round trip)
      ● Read 1 MB sequentially from disk: 20,000,000 ns (20 ms; 80x memory, 20x SSD)
      Source: https://gist.github.com/2841832
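A quick back-of-the-envelope calculation shows why these numbers make compression attractive. The latencies are copied from the slide above; the dictionary keys and the `how_many` helper are hypothetical names, and the slide lists only compression time, so decompression cost is not modeled here.

```python
# Latencies in nanoseconds, copied from the slide above.
LATENCY_NS = {
    "compress_1k_zippy": 3_000,
    "read_4k_random_ssd": 150_000,
    "disk_seek": 10_000_000,
    "read_1mb_seq_disk": 20_000_000,
}


def how_many(cheap: str, expensive: str) -> float:
    """How many `cheap` operations fit in the time of one `expensive` one."""
    return LATENCY_NS[expensive] / LATENCY_NS[cheap]


compressions_per_seek = how_many("compress_1k_zippy", "disk_seek")
compressions_per_ssd_read = how_many("compress_1k_zippy", "read_4k_random_ssd")
```

By these figures, one disk seek costs about as much as compressing roughly 3,300 blocks of 1K, and one random 4K SSD read about as much as 50, so spending CPU to avoid disk access is usually a good trade.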
  • 25. Why compression helps
      ● Compressed data is smaller on disk
      ● If we compress data, more fits in RAM and is cached
      ● Rotational disks:
          ● Rotational disks have very slow seeks
          ● RAM not used by processes will cache disk
      ● Solid State Disks do seek faster than rotational
          ● But they are more expensive than rotational
  • 26. Enabling Compression
      ● Rolling update to Cassandra
      ● update column family my_stuff with compression_options={sstable_compression:SnappyCompressor, chunk_length_kb:64};
      ● bin/nodetool -h cdbla120 -p 8585 rebuildsstables my_stuff
      ● 68 GB of data shrinks to 36 GB
  • 27. Compression in action
      ● Disk activity reduced drastically as more/all data fit in cache
      ● Better performance
      ● Disks that spin less should last longer
  • 28. Compression lessons
      ● Creates extra CPU usage (but not really much)
      ● Creates more young gen garbage (some)
      ● Anecdotal experimentation with chunk_length_kb
          ● 64KB is good for sparse, less frequently used tables
          ● 16KB had the same compression ratio and made less garbage
          ● Found 4KB to be less effective than 16KB
      ● This is easy to experiment with
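The chunk-size experiment can be imitated offline with a small script. This is only a sketch: `zlib` is swapped in for Snappy because it ships with Python, the sample rows are made up, and the `chunked_compressed_size` helper is a hypothetical name, so the resulting ratios are not comparable to the figures on the slides.

```python
import zlib


def chunked_compressed_size(data: bytes, chunk_kb: int) -> int:
    """Compress `data` in independent chunks of `chunk_kb` KB and total the sizes."""
    chunk = chunk_kb * 1024
    return sum(
        len(zlib.compress(data[i:i + chunk]))
        for i in range(0, len(data), chunk)
    )


# Made-up sample: repetitive, log-like rows (about 270 KB in total).
sample = b"".join(
    b"user-%06d|state=NY|city=New York|event=view\n" % i for i in range(6000)
)
sizes = {kb: chunked_compressed_size(sample, kb) for kb in (4, 16, 64)}
ratios = {kb: size / len(sample) for kb, size in sizes.items()}
```

Printing `ratios` for your own data gives a first impression of how much each chunk length saves; the garbage-collection side of the trade-off mentioned on the slide still has to be measured on a running node.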
  • 29. We have reached the point of the presentation where we...
  • 30. Hate on everything not Cassandra
  • 31. Cassandra's uptime story
      ● Main cluster in continuous operation since 8/6/11
      ● Doubled physical nodes in the cluster
      ● Upgraded Cassandra twice: 0.7.7 -> 0.8.6 -> 1.0.7
      ● Rolling reboots for a kernel update, 1 for the leap second
      ● No maintenance windows
      ● Let's compare Cassandra with other things I use/used
  • 32. Cassandra vs MySQL master/slave...
                   | MySQL                                                                             | Cassandra
      Replication  | Single thread, binlogs, manual recovery                                           | Per operation
      Scaling      | Add more nodes, initial sync, set up replication, configure applications          | Bootstrap new Cassandra node, re-balance off-peak
      Consistency  | Applications that care read master, or application checks status of replication  | Per operation
      Backup       | Mysqldump/LVM snapshot                                                            | sstabletojson | snapshot
      Restore      | Re-insert everything/Restore snapshot                                             | Copy files into place
  • 33. So with MySQL...
      ● Replication breaking often
          ● Requiring manual intervention for many fixes
      ● Blocking writes for 30 minutes to add a column to a table
      ● Scale up to big iron then...
          ● Restart takes 30 minutes to fsck all disks
      ● Applications needing to be coded with state-aware logic
          ● Which node should I query?
          ● Is replication behind?
          ● Is there some merge table trickery going on?
  • 34. Cassandra vs Memcache
                   | Memcache               | Cassandra
      Replication  | None (client managed)  | Per operation
      Scaling      | None (client managed)  | Grow or shrink without bad reads
      Consistency  | Yes (and really no)    | Per operation
      Backup       | No persistence         | sstabletojson | snapshot
      Restore      | No persistence         | Cache warming
  • 35. So memcache is...
      ● Not persistent
      ● Not clear on sharding
      ● Not clear on failure modes
      ● Actual experiences with memcache
          ● Memcache client was not sharding requests evenly: 60% were going to node 1
          ● We lost a rack with 40% of the memcache nodes
              – Site slowed to a crawl as the DBs were overloaded
              – Took 1 hour to warm up again
  • 36. Cassandra vs DRBD
                   | DRBD                                                                  | Cassandra
      Replication  | 1 or 2 nodes per block                                                | Per operation
      Scaling      | No scaling. Just more availability.                                   | Grow or shrink dynamically
      Consistency  | Sync modes change failure consistency, dead time between flip-flops   | Per operation
      Backup       | Like a disk                                                           | sstabletojson | snapshot
      Restore      | Like a disk                                                           | Like a disk
  • 37. So DRBD is...
      ● A 30 second to 1 minute fail over/outage
      ● An alert that might wake you up
          ● But hopefully allows you to sleep again
      ● Handcuffed to linux-ha/keepalived etc.
          ● Making it an involved setup
          ● Making it involved to troubleshoot
      ● Might need a crossover cable or dedicated network
      ● CPU/network intensive with very active disks
      ● Can successfully fail over a data file in an inconsistent state
  • 38. Cassandra vs HDFS
                   | Hadoop                                                   | Cassandra
      Replication  | Per file                                                 | Per operation
      Scaling      | Add nodes                                                | Add nodes
      Consistency  | Very, to the point getting data in becomes difficult     | Per operation
      Backup       | Distcp                                                   | sstabletojson | snapshot
      Restore      | Distcp                                                   | Like a disk
  • 39. So HDFS...
      ● Comes up with about 4 or 5 reasons a year for a master node/full cluster restart
          ● Grow NameNode heap
          ● Enable jobtracker setting to stop 100,000 task jobs
          ● Enabled/updated trash feature (off by default)
          ● Forced to do a fail over by hardware fault
          ● Random DRBD/kernel brain fart
          ● Need to update a JVM/kernel eventually
      ● Now finally new versions have HA NameNode
      ● Running jobs lose progress and will not automatically restart
  • 40. Questions?