Cassandra Summit 1.0
    Performance Tuning


      Brandon Williams

           Riptano, Inc.
    brandon@riptano.com
 brandonwilliams@apache.org
             @faltering
        driftx on freenode

       August 10, 2010




  Brandon Williams   Cassandra Summit 1.0
Tuning Writes
                                        Tuning Reads


Making writes faster


      Use a separate IO device for the commit log.
          Hard to accomplish in the cloud
          Rackspace: one IO device, but it’s persistent (RAID array
          underneath)
          EC2: EBS is slow, local disk is ephemeral (not persistent)
              You could put the commitlog on the ephemeral drive anyway,
              at the price of durability
              But then, why have a commitlog at all?
              Maybe you can disable it in 0.7/0.8
          Real servers: one RAID array, bad RAID options
          Will anyone ever offer SSDs?
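
A quick way to check whether the commit log and the data files share a device is to compare the device IDs that stat() reports for the two directories. This is a minimal sketch, not part of the talk; the function name is made up here. A different device ID can still mean two partitions on the same physical disk, so treat a "different" result as a hint only.

```python
import os


def same_device(path_a, path_b):
    """Return True if both paths report the same device ID via stat().

    Caveat: two partitions on one physical disk have different device IDs,
    so False does not prove the paths sit on separate spindles.
    """
    return os.stat(path_a).st_dev == os.stat(path_b).st_dev
```

Call it with your own commit log and data directories, for example same_device("/path/to/commitlog", "/path/to/data"); both paths here are placeholders.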




What else?




     concurrent writers (concurrent readers for reads)
        increase if you have lots of cores
     memtable flush writers
        increase if you have lots of IO




What are all these options?




      memtable throughput in mb
      memtable operations in millions
      memtable flush after mins
      bigger memtables improve writes?
          no, but they can improve reads
          what?




Compaction: the slayer of reads



      a necessary evil
      IO contention hell
      you can reduce compaction priority in 0.6.4 or later
          -Dcassandra.compaction.priority=1
          constantly outstripping it means you need more nodes
          reducing the priority affects CPU usage, not IO
      avoid reading from slow hosts
          dynamic snitch
               accrual failure detector
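
The idea behind avoiding slow hosts can be sketched as ranking replicas by recently observed read latency and asking the fastest ones first. This is only an illustration of the idea, assuming a plain mean over recent samples; it is not the scoring the dynamic snitch actually uses, and the host names and numbers are made up.

```python
from statistics import mean


def rank_replicas(latencies_ms):
    """Order hosts from fastest to slowest by mean recent latency.

    latencies_ms maps a host name to a list of recent read latencies.
    """
    return sorted(latencies_ms, key=lambda host: mean(latencies_ms[host]))


# Hypothetical samples: host "b" is busy compacting and answers slowly.
samples = {
    "a": [5, 7],
    "b": [50, 60],
    "c": [1, 2],
}
preferred = rank_replicas(samples)
```

With these samples the compacting host ends up last in the list, so reads would go to it only when the faster replicas are not enough.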




Compaction (cont’d)




      bigger memtables absorb more overwrites
          fewer SSTables make for more efficient compaction
      if your workload is write-once, then read-only, you *could* turn compaction off
          merge-on-read and bloom filters save you
          someday, you’ll want to repair
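
Why bigger memtables absorb more overwrites can be shown with a toy model: a memtable is a dict that is flushed once it holds a given number of distinct keys. This is a sketch under that assumption only, not Cassandra's flush logic; the function name and the workload are made up.

```python
def flush_stats(writes, limit):
    """Return (number of flushes, total entries written to disk).

    Toy model: the memtable flushes once it holds `limit` distinct keys.
    """
    memtable = {}
    flushes = 0
    flushed_entries = 0
    for key, value in writes:
        memtable[key] = value          # an overwrite replaces the entry in memory
        if len(memtable) >= limit:     # memtable full: flush it as one SSTable
            flushes += 1
            flushed_entries += len(memtable)
            memtable = {}
    if memtable:                       # whatever is left is flushed at the end
        flushes += 1
        flushed_entries += len(memtable)
    return flushes, flushed_entries


# 1,000 writes cycling over 100 hot keys
writes = [(i % 100, i) for i in range(1000)]
small = flush_stats(writes, limit=10)    # (100, 1000)
large = flush_stats(writes, limit=200)   # (1, 100)
```

In this model the small memtable produces 100 SSTables holding every write, while the large one produces a single SSTable with one entry per key, which leaves far less for compaction to merge.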




Know your read pattern




      how much data is in the working set?
      disk is slow: you want that in memory
          sometimes you can’t afford the cost
      how many reads are repeats?
      doing lots of random IO within a row?
          column index size in kb




Caches


     on a cold hit, each row requires two seeks
     one to find the row’s position in the index
         key cache eliminates this
     another to read the row
         row cache eliminates this, too
     columns in the row are contiguous afterwards
         make fat rows
         but not too fat, since the row is the unit of distribution
     the OS file cache
         use a good OS




Caching Strategies
      key cache
          excellent bang for your buck
          half your seeks are gone
          a lot of keys fit in a relatively small amount of memory
      row cache
          all seeks are gone
          but more heap usage = more GC pressure
          trying to use 32GB of row cache will wreck you
          estimating the correct size can be difficult
               use the average row size in cfstats as a starting point
               in 0.7, each SSTable has a persistent row size histogram
               the penalty for being wrong can be catastrophic: OOM
               can’t be done programmatically in Java, or Cassandra would
               do it for you
               this is why you can’t set an absolute amount in bytes
          if you enable it on very fat rows, it can be bad
               keep your indexes in a different column family
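
Starting from the average row size reported by cfstats, a first guess at the row cache size is simple arithmetic. The sketch below assumes you pick a memory budget yourself and reserve part of it as headroom; the headroom value is a placeholder, not a recommendation, and the function name is made up.

```python
def rows_for_budget(budget_bytes, avg_row_size_bytes, headroom=0.5):
    """Rough number of rows to cache within a heap budget.

    headroom is the fraction of the budget actually spent on rows; the rest
    is left for object overhead and rows larger than average (an assumption).
    """
    usable = int(budget_bytes * headroom)
    return usable // avg_row_size_bytes


# Hypothetical numbers: 512 MiB budget, 2 KiB average row size from cfstats
rows = rows_for_budget(512 * 1024 * 1024, 2048)   # 131072
```

Since the average hides very fat rows, treat the result as a starting point and watch heap usage after applying it.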

Caching Strategies (cont’d)



      OS file cache: it’s free
          no size estimation needed
          mmap is great
               unless it makes you swap
               switch to mmap index only
               why do you have swap enabled, anyway?
      Absolute numbers vs percentages
          percentages can be an OOM time bomb
          harder to calculate how much memory the cache will use
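
The time bomb is easy to see with arithmetic: a percentage stays fixed while the data set grows, so the memory behind it grows too. A small sketch with made-up numbers (10% setting, 1 KiB average rows):

```python
def row_cache_bytes(total_rows, cached_percent, avg_row_size_bytes):
    """Bytes held by a cache sized as a percentage of the rows stored."""
    return (total_rows * cached_percent // 100) * avg_row_size_bytes


# Same 10% setting as the data set grows by 10x and 100x
growth = [
    row_cache_bytes(total_rows, 10, 1024)
    for total_rows in (1_000_000, 10_000_000, 100_000_000)
]
# growth == [102_400_000, 1_024_000_000, 10_240_000_000]
```

The setting never changed, yet the cache went from roughly 100 MB to roughly 10 GB.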




Caching Strategies (cont’d)


      lookup order:
          row cache
          key cache
          disk (file cache?)
      sizing your caches:
          large key cache
          smaller row cache for very hot rows
          leave the rest to the OS
      don’t make your heap larger than needed
      monitor hit rates via JMX
          actually, monitor everything you can
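
The lookup order and the two-seek cost of a cold read can be put together in a toy model that counts disk seeks per read. It assumes a single SSTable and ignores the OS file cache; the names and the workload are made up for illustration.

```python
def seeks_for_read(key, row_cache, key_cache):
    """Disk seeks needed for one read in this toy model."""
    if key in row_cache:
        return 0   # row served from memory
    if key in key_cache:
        return 1   # position known: one seek to read the row
    return 2       # one seek in the index, one to read the row


def total_seeks(reads, row_cache, key_cache):
    """Sum of seeks over a sequence of reads."""
    return sum(seeks_for_read(key, row_cache, key_cache) for key in reads)


reads = ["hot", "hot", "warm", "cold"]
with_caches = total_seeks(reads, row_cache={"hot"}, key_cache={"hot", "warm"})  # 3
no_caches = total_seeks(reads, row_cache=set(), key_cache=set())                # 8
```

A small row cache for the hottest rows plus a large key cache already removes most of the seeks in this example.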



Test, Measure, Tweak, Repeat




      use stress.py as a baseline
          make sure you have multiprocessing
      move to real world data
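
Assuming "multiprocessing" here refers to the Python standard-library module of that name (it is missing from older interpreters), a quick check before running a baseline could look like this. The function name is made up.

```python
import importlib.util


def has_multiprocessing():
    """True if the running interpreter can import the multiprocessing module."""
    return importlib.util.find_spec("multiprocessing") is not None


if has_multiprocessing():
    import multiprocessing
    # Number of CPUs the load generator could spread work across
    workers = multiprocessing.cpu_count()
else:
    workers = 1
```

Run the check with the same interpreter you use to launch the stress tool, since that is the one that matters.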




Settings you don’t need to touch




      commitlog rotation threshold in mb
      SlicedBufferSizeInKB
      FlushIndexBufferSizeInMB




The End




  Questions?




               Brandon Williams   Cassandra Summit 1.0

Mais conteúdo relacionado

Destaque

Cassandra Summit 2014: Lesser Known Features of Cassandra 2.1
Cassandra Summit 2014: Lesser Known Features of Cassandra 2.1Cassandra Summit 2014: Lesser Known Features of Cassandra 2.1
Cassandra Summit 2014: Lesser Known Features of Cassandra 2.1DataStax Academy
 
Large partition in Cassandra
Large partition in CassandraLarge partition in Cassandra
Large partition in CassandraShogo Hoshii
 
Myths of Big Partitions (Robert Stupp, DataStax) | Cassandra Summit 2016
Myths of Big Partitions (Robert Stupp, DataStax) | Cassandra Summit 2016Myths of Big Partitions (Robert Stupp, DataStax) | Cassandra Summit 2016
Myths of Big Partitions (Robert Stupp, DataStax) | Cassandra Summit 2016DataStax
 
Cassandra By Example: Data Modelling with CQL3
Cassandra By Example: Data Modelling with CQL3Cassandra By Example: Data Modelling with CQL3
Cassandra By Example: Data Modelling with CQL3Eric Evans
 
Distributed Counters in Cassandra (Cassandra Summit 2010)
Distributed Counters in Cassandra (Cassandra Summit 2010)Distributed Counters in Cassandra (Cassandra Summit 2010)
Distributed Counters in Cassandra (Cassandra Summit 2010)kakugawa
 
ETL With Cassandra Streaming Bulk Loading
ETL With Cassandra Streaming Bulk LoadingETL With Cassandra Streaming Bulk Loading
ETL With Cassandra Streaming Bulk Loadingalex_araujo
 
Escalabilidade Linear com o Banco de Dados NoSQL Apache Cassandra.
Escalabilidade Linear com o Banco de Dados NoSQL Apache Cassandra.Escalabilidade Linear com o Banco de Dados NoSQL Apache Cassandra.
Escalabilidade Linear com o Banco de Dados NoSQL Apache Cassandra.Ambiente Livre
 
Apache Cassandra for Timeseries- and Graph-Data
Apache Cassandra for Timeseries- and Graph-DataApache Cassandra for Timeseries- and Graph-Data
Apache Cassandra for Timeseries- and Graph-DataGuido Schmutz
 
Cassandra Tuning - Above and Beyond (Matija Gobec, SmartCat) | Cassandra Summ...
Cassandra Tuning - Above and Beyond (Matija Gobec, SmartCat) | Cassandra Summ...Cassandra Tuning - Above and Beyond (Matija Gobec, SmartCat) | Cassandra Summ...
Cassandra Tuning - Above and Beyond (Matija Gobec, SmartCat) | Cassandra Summ...DataStax
 
Building a Fast, Resilient Time Series Store with Cassandra (Alex Petrov, Dat...
Building a Fast, Resilient Time Series Store with Cassandra (Alex Petrov, Dat...Building a Fast, Resilient Time Series Store with Cassandra (Alex Petrov, Dat...
Building a Fast, Resilient Time Series Store with Cassandra (Alex Petrov, Dat...DataStax
 
Cassandra and Spark
Cassandra and Spark Cassandra and Spark
Cassandra and Spark datastaxjp
 
Bucket your partitions wisely - Cassandra summit 2016
Bucket your partitions wisely - Cassandra summit 2016Bucket your partitions wisely - Cassandra summit 2016
Bucket your partitions wisely - Cassandra summit 2016Markus Höfer
 
Managing Cassandra at Scale by Al Tobey
Managing Cassandra at Scale by Al TobeyManaging Cassandra at Scale by Al Tobey
Managing Cassandra at Scale by Al TobeyDataStax Academy
 
Cassandra Troubleshooting 3.0
Cassandra Troubleshooting 3.0Cassandra Troubleshooting 3.0
Cassandra Troubleshooting 3.0J.B. Langston
 
Lessons Learned on Java Tuning for Our Cassandra Clusters (Carlos Monroy, Kne...
Lessons Learned on Java Tuning for Our Cassandra Clusters (Carlos Monroy, Kne...Lessons Learned on Java Tuning for Our Cassandra Clusters (Carlos Monroy, Kne...
Lessons Learned on Java Tuning for Our Cassandra Clusters (Carlos Monroy, Kne...DataStax
 
How Cassandra Deletes Data (Alain Rodriguez, The Last Pickle) | Cassandra Sum...
How Cassandra Deletes Data (Alain Rodriguez, The Last Pickle) | Cassandra Sum...How Cassandra Deletes Data (Alain Rodriguez, The Last Pickle) | Cassandra Sum...
How Cassandra Deletes Data (Alain Rodriguez, The Last Pickle) | Cassandra Sum...DataStax
 
On heap cache vs off-heap cache
On heap cache vs off-heap cacheOn heap cache vs off-heap cache
On heap cache vs off-heap cachergrebski
 
data science toolkit 101: set up Python, Spark, & Jupyter
data science toolkit 101: set up Python, Spark, & Jupyterdata science toolkit 101: set up Python, Spark, & Jupyter
data science toolkit 101: set up Python, Spark, & JupyterRaj Singh
 

Destaque (20)

Cassandra Summit 2014: Lesser Known Features of Cassandra 2.1
Cassandra Summit 2014: Lesser Known Features of Cassandra 2.1Cassandra Summit 2014: Lesser Known Features of Cassandra 2.1
Cassandra Summit 2014: Lesser Known Features of Cassandra 2.1
 
Large partition in Cassandra
Large partition in CassandraLarge partition in Cassandra
Large partition in Cassandra
 
Myths of Big Partitions (Robert Stupp, DataStax) | Cassandra Summit 2016





Cassandra Summit 2010 Performance Tuning

  • 1. Cassandra Summit 1.0: Performance Tuning
      Brandon Williams, Riptano, Inc.
      brandon@riptano.com, brandonwilliams@apache.org
      @faltering, driftx on freenode
      August 10, 2010
  • 2-9. Making writes faster
      Use a separate IO device for the commit log.
          Hard to accomplish in the cloud
          Rackspace: one IO device, but it’s persistent (RAID array underneath)
          EC2: EBS is slow, local disk is not persistent
              You could put the commitlog on the ephemeral drive anyway, at the price of durability
              But then, why have a commitlog at all?
                  Maybe you can disable it in 0.7/0.8
          Realservers: one RAID array, bad RAID options
          Will anyone ever offer SSDs?
  • 10-11. What else?
      concurrent writers (concurrent readers for reads)
          increase if you have lots of cores
      memtable flush writers
          increase if you have lots of IO
  • 12-14. What are all these options?
      memtable throughput in mb
      memtable operations in millions
      memtable flush after mins
      bigger memtables improve writes?
          no, but they can improve reads
              what?
  • 15-23. Compaction: the slayer of reads
      a necessary evil
      IO contention hell
      you can reduce compaction priority in 0.6.4 or later
          -Dcassandra.compaction.priority=1
          constantly outstripping it means you need more nodes
          reducing the priority affects CPU usage, not IO
      avoid reading from slow hosts
          dynamic snitch
          accrual failure detector
  • 24-28. Compaction (cont’d)
      bigger memtables absorb more overwrites
          fewer SSTables make for more efficient compaction
      if you are write once then read-only, you *could* turn it off
          merge-on-read and bloom filters save you
          someday, you’ll want to repair
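The point that bigger memtables absorb more overwrites can be illustrated with a toy model. This is only a sketch, not Cassandra's flush logic: the memtable is a plain dict, a flush is triggered purely by an operation count (loosely analogous to the operations threshold from the earlier slide), and `flush_sizes`, `ops_limit`, and the workload are hypothetical names chosen for illustration.

```python
from typing import Dict, Iterable, List


def flush_sizes(writes: Iterable[str], ops_limit: int) -> List[int]:
    """Return the number of distinct keys in each flushed "SSTable".

    Toy model: an overwrite replaces the entry already in the memtable,
    and the memtable is flushed after `ops_limit` write operations.
    """
    memtable: Dict[str, int] = {}
    ops = 0
    flushed: List[int] = []
    for seq, key in enumerate(writes):
        memtable[key] = seq          # an overwrite costs no extra entry
        ops += 1
        if ops >= ops_limit:
            flushed.append(len(memtable))
            memtable, ops = {}, 0
    if memtable:                     # flush whatever is left at the end
        flushed.append(len(memtable))
    return flushed


# Hypothetical workload: 1,000 writes cycling over 100 hot keys.
workload = [f"key{i % 100}" for i in range(1000)]

small = flush_sizes(workload, ops_limit=10)    # many tiny flushes
large = flush_sizes(workload, ops_limit=500)   # few large flushes

print(len(small), sum(small))
print(len(large), sum(large))
```

Under these assumptions the larger threshold produces far fewer flushed tables and far fewer entries overall, because repeated writes to the same key collapse in memory before they reach disk. That leaves less for compaction to merge later.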
  • 29-34. Know your read pattern
      how much data is in the working set?
          disk is slow: you want that in memory
          sometimes you can’t afford the cost
      how many reads are repeats?
      doing lots of random IO within a row?
          column index size in kb
  • 35-44. Caches
      on a cold hit, each row requires two seeks
          one to find the row’s position in the index
              key cache eliminates this
          another to read the row
              row cache eliminates this, too
          columns in the row are contiguous afterwards
              make fat rows
              but not too fat, since the row is the unit of distribution
      the OS file cache
          use a good OS
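The two-seek model on this slide can be turned into a rough back-of-the-envelope estimate. The sketch below assumes that a row cache hit costs no seeks, that a key cache hit saves only the index seek, and that the OS file cache is ignored entirely. The function name and the convention that the key cache hit rate is measured over reads that missed the row cache are illustrative assumptions.

```python
def expected_seeks(row_hit_rate: float, key_hit_rate: float) -> float:
    """Expected disk seeks per read under the two-seek model.

    row_hit_rate: fraction of reads served by the row cache (0 seeks).
    key_hit_rate: fraction of the remaining reads whose index position
                  is found in the key cache (index seek skipped).
    """
    for rate in (row_hit_rate, key_hit_rate):
        if not 0.0 <= rate <= 1.0:
            raise ValueError("hit rates must be between 0 and 1")
    row_miss = 1.0 - row_hit_rate
    index_seek = 1.0 - key_hit_rate   # needed only on a key cache miss
    data_seek = 1.0                   # always needed on a row cache miss
    return row_miss * (index_seek + data_seek)


print(expected_seeks(0.0, 0.0))   # cold: both seeks
print(expected_seeks(0.0, 1.0))   # key cache only: index seek gone
print(expected_seeks(1.0, 0.0))   # row cache hit: no seeks
```

In this model a perfect key cache removes half of the seeks, while the row cache removes both.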
  • 45-51. Caching Strategies
      key cache
          excellent bang for your buck
          half your seeks are gone
          a lot of keys fit in a relatively small amount of memory
      row cache
          all seeks are gone
          but more heap usage = more GC pressure
              trying to use 32GB of row cache will wreck you
          estimating the correct size can be difficult
              use the average row size in cfstats as a starting point
              in 0.7, each SSTable has a persistent row size histogram
              the penalty for being wrong can be catastrophic: OOM
              can’t be done programmatically in Java, or Cassandra would do it for you
                  this is why you can’t set an absolute amount in bytes
          if you enable it on very fat rows, it can be bad
              keep your indexes in a different column family
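Since the row cache cannot be sized in bytes, one way to get a starting point is to convert a heap budget into a row count using the average row size reported by cfstats. The sketch below is a rough estimate only. `rows_that_fit` is a hypothetical helper, and `overhead_factor` is a placeholder for in-heap overhead beyond the reported row size. Its default of 2.0 is an assumption, not a measured value.

```python
def rows_that_fit(cache_budget_bytes: int,
                  avg_row_bytes: int,
                  overhead_factor: float = 2.0) -> int:
    """Estimate how many rows fit in a given heap budget.

    avg_row_bytes:   average row size, e.g. as read from cfstats.
    overhead_factor: assumed multiplier for per-row heap overhead
                     (placeholder value; measure it for your own data).
    """
    if cache_budget_bytes <= 0 or avg_row_bytes <= 0 or overhead_factor <= 0:
        raise ValueError("all arguments must be positive")
    return int(cache_budget_bytes / (avg_row_bytes * overhead_factor))


# Hypothetical numbers: 512 MiB of heap set aside, 4 KiB average rows.
budget = 512 * 1024 ** 2
print(rows_that_fit(budget, avg_row_bytes=4096))
```

Because the penalty for overestimating is an OOM, it seems safer to start from a conservative count and raise it while watching heap usage and GC behaviour.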
  • 52-58. Caching Strategies (cont’d)
      OS file cache: it’s free
          no size estimation needed
          mmap is great unless it makes you swap
              switch to mmap index only
              why do you have swap enabled, anyway?
      Absolute numbers vs percentages
          percentages can be an OOM time bomb
          harder to calculate how much memory the cache will use
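A small calculation shows why a percentage-based cache size can behave like a time bomb: the cache grows with the data set even though the setting never changes. This is a sketch under simple assumptions: every cached row costs exactly the average row size, and `cache_bytes_for_percentage` and the row counts are made up for illustration.

```python
def cache_bytes_for_percentage(total_rows: int,
                               percent: int,
                               avg_row_bytes: int) -> int:
    """Approximate memory used when caching `percent` of all rows."""
    cached_rows = total_rows * percent // 100
    return cached_rows * avg_row_bytes


# Same 10% setting, same 1 KiB rows, but the data set keeps growing.
growth = [
    cache_bytes_for_percentage(rows, percent=10, avg_row_bytes=1024)
    for rows in (1_000_000, 10_000_000, 100_000_000)
]
print(growth)
```

With an absolute row count the cache stays the same size as the data grows. With a percentage it scales with the number of rows and can eventually exceed the heap.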
  • 59-62. Caching Strategies (cont’d)
      lookup order:
          row cache
          key cache
          disk (file cache?)
      sizing your caches:
          large key cache
          smaller row cache for very hot rows
          leave the rest to the OS
              don’t make your heap larger than needed
      monitor hit rates via JMX
          actually, monitor everything you can
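The lookup order and the advice to monitor hit rates can be sketched together with two small LRU caches in front of a dict standing in for disk. This is a toy model rather than Cassandra's implementation: the `LRUCache` class, the `read` function, and the seek counting are assumptions for illustration. In a real cluster the hit rates come from JMX, not from counters like these.

```python
from collections import OrderedDict


class LRUCache:
    """Tiny LRU cache that tracks its own hit rate."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self.data: OrderedDict = OrderedDict()
        self.hits = 0
        self.requests = 0

    def get(self, key):
        self.requests += 1
        if key in self.data:
            self.data.move_to_end(key)      # mark as most recently used
            self.hits += 1
            return self.data[key]
        return None

    def put(self, key, value) -> None:
        if self.capacity <= 0:
            return
        self.data[key] = value
        self.data.move_to_end(key)
        if len(self.data) > self.capacity:
            self.data.popitem(last=False)   # evict least recently used

    @property
    def hit_rate(self) -> float:
        return self.hits / self.requests if self.requests else 0.0


def read(key, row_cache, key_cache, disk):
    """Look up a row: row cache, then key cache, then disk.

    Returns (row, seeks) under the two-seek model from the Caches slide.
    """
    row = row_cache.get(key)
    if row is not None:
        return row, 0                       # no disk access at all
    seeks = 1                               # the data seek is always paid
    if key_cache.get(key) is None:
        seeks += 1                          # index seek on key cache miss
        key_cache.put(key, key)             # remember the row position (toy)
    row = disk[key]
    row_cache.put(key, row)
    return row, seeks


disk = {f"k{i}": f"row{i}" for i in range(4)}
row_cache = LRUCache(capacity=1)            # small: only the hottest row
key_cache = LRUCache(capacity=4)            # large: every key fits

seeks = [read(k, row_cache, key_cache, disk)[1] for k in ("k0", "k0", "k1", "k0")]
print(seeks, row_cache.hit_rate, key_cache.hit_rate)
```

The last read of `k0` misses the tiny row cache but still saves the index seek through the key cache, which matches the suggested sizing of a large key cache with a smaller row cache for very hot rows.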
  • 63-65. Test, Measure, Tweak, Repeat
      use stress.py as a baseline
          make sure you have multiprocessing
      move to real world data
  • 66. Settings you don’t need to touch
      commitlog rotation threshold in mb
      SlicedBufferSizeInKB
      FlushIndexBufferSizeInMB
  • 67. The End
      Questions?