Cassandra Summit 2010 Performance Tuning
1. Cassandra Summit 1.0
Performance Tuning
Brandon Williams
Riptano, Inc.
brandon@riptano.com
brandonwilliams@apache.org
@faltering
driftx on freenode
August 10, 2010
Brandon Williams Cassandra Summit 1.0
9. Tuning Writes
Tuning Reads
Making writes faster
Use a separate IO device for the commit log.
Hard to accomplish in the cloud
Rackspace: one IO device, but it’s persistent (RAID array underneath)
EC2: EBS is slow, local disk is ephemeral
You could put the commitlog on the ephemeral drive anyway, at the price of durability
But then, why have a commitlog at all?
Maybe you can disable it in 0.7/0.8
Real servers: one RAID array, bad RAID options
Will anyone ever offer SSDs?
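One way to check whether the commit log already sits on its own device is to compare the device IDs of the two directories. A minimal Python sketch: the helper name `on_same_device` is hypothetical, and any paths passed to it are placeholders rather than defaults from the talk. `st_dev` identifies a filesystem, so two partitions of the same physical disk still look separate.

```python
import os


def on_same_device(path_a, path_b):
    """Return True if both paths live on the same filesystem device.

    Compares st_dev from os.stat(). Two partitions of one physical disk
    report different st_dev values, so a False result does not prove that
    the underlying spindles are separate.
    """
    return os.stat(path_a).st_dev == os.stat(path_b).st_dev
```

Call it with the commit log directory and the data directory; a True result means writes to both are competing for the same device.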
11. Tuning Writes
Tuning Reads
What else?
concurrent writers (concurrent readers for reads)
increase if you have lots of cores
memtable_flush_writers
increase if you have lots of IO
14. Tuning Writes
Tuning Reads
What are all these options?
memtable_throughput_in_mb
memtable_operations_in_millions
memtable_flush_after_mins
bigger memtables improve writes?
no, but they can improve reads
what?
23. Tuning Writes
Tuning Reads
Compaction: the slayer of reads
a necessary evil
IO contention hell
you can reduce compaction priority in 0.6.4 or later
-Dcassandra.compaction.priority=1
constantly outstripping compaction means you need more nodes
reducing the priority affects CPU usage, not IO
avoid reading from slow hosts
dynamic snitch
accrual failure detector
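The idea of steering reads away from slow hosts can be illustrated with a toy latency ranker. This is a sketch only and not Cassandra's dynamic snitch: the class name `LatencyRanker`, the smoothing factor `alpha`, and the moving-average scoring are all assumptions made for illustration.

```python
class LatencyRanker:
    """Rank replicas by an exponentially weighted moving average of latency."""

    def __init__(self, alpha=0.5):
        self.alpha = alpha  # weight given to the newest sample (assumed value)
        self.scores = {}    # host -> smoothed latency in milliseconds

    def record(self, host, latency_ms):
        """Fold a new latency sample into the host's smoothed score."""
        previous = self.scores.get(host)
        if previous is None:
            self.scores[host] = float(latency_ms)
        else:
            self.scores[host] = (
                self.alpha * latency_ms + (1 - self.alpha) * previous
            )

    def ranked(self, replicas):
        """Return replicas ordered fastest first; unseen hosts sort first."""
        return sorted(replicas, key=lambda host: self.scores.get(host, 0.0))
```

A host that starts compacting and slows down accumulates a worse score and drops down the list, so reads go to its peers until it recovers.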
28. Tuning Writes
Tuning Reads
Compaction (cont’d)
bigger memtables absorb more overwrites
fewer SSTables make for more efficient compaction
if you are write once then read-only, you *could* turn it off
merge-on-read and bloom filters save you
someday, you’ll want to repair
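Why bigger memtables absorb more overwrites can be shown with a toy model. A minimal sketch, assuming a memtable that flushes after a fixed number of write operations and keeps only the latest value per key; the function name and the `ops_per_flush` parameter are hypothetical, and real flush triggers (size, operation count, age) are more involved.

```python
def simulate_flushes(keys, ops_per_flush):
    """Replay a stream of written keys through a toy memtable.

    Every write counts toward the flush threshold, but the memtable is a
    dict, so rewriting a key still in memory adds no extra entry to the
    flushed SSTable. Returns (sstables_written, entries_written).
    """
    memtable = {}
    ops = 0
    sstables = 0
    entries = 0
    for key in keys:
        memtable[key] = True  # an overwrite replaces the in-memory entry
        ops += 1
        if ops >= ops_per_flush:
            sstables += 1
            entries += len(memtable)
            memtable = {}
            ops = 0
    if memtable:  # flush whatever is left at the end
        sstables += 1
        entries += len(memtable)
    return sstables, entries


# 1000 writes cycling over 20 hot keys
writes = [i % 20 for i in range(1000)]
small = simulate_flushes(writes, ops_per_flush=10)   # (100, 1000)
large = simulate_flushes(writes, ops_per_flush=100)  # (10, 200)
```

With the larger threshold the same workload produces a tenth of the SSTables and a fifth of the entries, which is less work for compaction and fewer files to merge on read.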
34. Tuning Writes
Tuning Reads
Know your read pattern
how much data is in the working set?
disk is slow: you want that in memory
sometimes you can’t afford the cost
how many reads are repeats?
doing lots of random IO within a row?
column_index_size_in_kb
44. Tuning Writes
Tuning Reads
Caches
on a cold hit, each row requires two seeks
one to find the row’s position in the index
key cache eliminates this
another to read the row
row cache eliminates this, too
columns in the row are contiguous afterwards
make fat rows
but not too fat, since the row is the unit of distribution
the OS file cache
use a good OS
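The seek accounting above can be written as simple arithmetic. A sketch under stated assumptions: the OS file cache is cold, the row cache is consulted before the key cache, and `key_cache_hit_rate` is measured only over reads that missed the row cache. The function name is hypothetical.

```python
def expected_seeks(row_cache_hit_rate, key_cache_hit_rate):
    """Expected disk seeks per read with a cold OS file cache.

    Row cache hit:                    0 seeks
    Row cache miss, key cache hit:    1 seek  (read the row)
    Row cache miss, key cache miss:   2 seeks (index lookup, then the row)
    """
    row_miss = 1.0 - row_cache_hit_rate
    seeks_after_row_miss = (
        key_cache_hit_rate * 1 + (1.0 - key_cache_hit_rate) * 2
    )
    return row_miss * seeks_after_row_miss
```

For example, `expected_seeks(0.0, 1.0)` is 1.0: a perfect key cache alone removes half the seeks.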
51. Tuning Writes
Tuning Reads
Caching Strategies
key cache
excellent bang for your buck
half your seeks are gone
a lot of keys fit in a relatively small amount of memory
row cache
all seeks are gone
but more heap usage = more GC pressure
trying to use 32GB of row cache will wreck you
estimating the correct size can be difficult
use the average row size in cfstats as a starting point
in 0.7, each SSTable has a persistent row size histogram
the penalty for being wrong can be catastrophic: OOM
can’t be done programmatically in Java, or Cassandra would do it for you
this is why you can’t set an absolute amount in bytes
if you enable it on very fat rows, it can be bad
keep your indexes in a different column family
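A back-of-the-envelope row cache estimate, starting from the average row size reported by cfstats, might look like the sketch below. The function names are hypothetical, and `overhead_factor` is an assumed fudge factor for in-heap object overhead that the talk does not quantify; treat the result as a starting point, not a guarantee against OOM.

```python
def row_cache_estimate_bytes(rows_cached, avg_row_bytes, overhead_factor=1.0):
    """Rough heap cost of caching rows_cached rows of average size."""
    return int(rows_cached * avg_row_bytes * overhead_factor)


def fits_in_budget(rows_cached, avg_row_bytes, budget_bytes,
                   overhead_factor=1.0):
    """True if the rough estimate stays within the heap you can spare."""
    estimate = row_cache_estimate_bytes(
        rows_cached, avg_row_bytes, overhead_factor
    )
    return estimate <= budget_bytes
```

Caching 100,000 rows averaging 2 KB is roughly 205 MB before overhead; the same row count at 64 KB per row is about 6.5 GB, which is how fat rows turn a reasonable setting into a problem.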
58. Tuning Writes
Tuning Reads
Caching Strategies (cont’d)
OS file cache: it’s free
no size estimation needed
mmap is great
unless it makes you swap
switch to mmap_index_only
why do you have swap enabled, anyway?
Absolute numbers vs percentages
percentages can be an OOM time bomb
harder to calculate how much memory the cache will use
62. Tuning Writes
Tuning Reads
Caching Strategies (cont’d)
lookup order:
row cache
key cache
disk (file cache?)
sizing your caches:
large key cache
smaller row cache for very hot rows
leave the rest to the OS
don’t make your heap larger than needed
monitor hit rates via JMX
actually, monitor everything you can
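The lookup order and the hit rates worth watching can be mimicked with a toy read path. A sketch only: `ReadPath`, `hot_keys`, and the unbounded dict and set caches are assumptions for illustration and do not reflect Cassandra's cache implementation or its JMX attribute names.

```python
class ReadPath:
    """Toy read path: row cache first, then key cache, then disk."""

    def __init__(self, disk, hot_keys):
        self.disk = disk          # key -> row; stands in for SSTables
        self.hot_keys = hot_keys  # keys allowed into the small row cache
        self.row_cache = {}
        self.key_cache = set()
        self.stats = {"reads": 0, "row_cache_hits": 0,
                      "key_cache_hits": 0, "index_lookups": 0}

    def read(self, key):
        self.stats["reads"] += 1
        if key in self.row_cache:           # no disk access at all
            self.stats["row_cache_hits"] += 1
            return self.row_cache[key]
        if key in self.key_cache:           # position known: skip the index
            self.stats["key_cache_hits"] += 1
        else:                               # cold: consult the index first
            self.stats["index_lookups"] += 1
            self.key_cache.add(key)
        row = self.disk[key]                # read the row itself
        if key in self.hot_keys:
            self.row_cache[key] = row
        return row

    def hit_rate(self, counter):
        """Fraction of all reads that incremented the given counter."""
        reads = self.stats["reads"]
        return self.stats[counter] / reads if reads else 0.0
```

Reading keys a, a, b, b with only a marked hot yields one row cache hit, one key cache hit, and two index lookups, which is the kind of breakdown the real hit-rate metrics let you track.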
65. Tuning Writes
Tuning Reads
Test, Measure, Tweak, Repeat
use stress.py as a baseline
make sure you have multiprocessing
move to real world data
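Once you move past stress.py to your own workload, the measuring step can be as small as the loop below. A minimal sketch: `measure` and its `operation` callback are hypothetical stand-ins for whatever client call you are timing, and it runs single-threaded, so it captures latency shape rather than peak cluster throughput.

```python
import time


def measure(operation, iterations):
    """Time `operation(i)` for i in range(iterations); iterations >= 1.

    Returns throughput plus median and 99th percentile latency in seconds.
    """
    latencies = []
    start = time.perf_counter()
    for i in range(iterations):
        t0 = time.perf_counter()
        operation(i)
        latencies.append(time.perf_counter() - t0)
    elapsed = time.perf_counter() - start
    latencies.sort()
    p99_index = min(len(latencies) - 1, int(len(latencies) * 0.99))
    return {
        "ops_per_sec": iterations / elapsed if elapsed > 0 else float("inf"),
        "median_s": latencies[len(latencies) // 2],
        "p99_s": latencies[p99_index],
    }
```

Run it before and after each single change, with the same data, so that every tweak is judged against a baseline.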
66. Tuning Writes
Tuning Reads
Settings you don’t need to touch
commitlog_rotation_threshold_in_mb
SlicedBufferSizeInKB
FlushIndexBufferSizeInMB
67. Tuning Writes
Tuning Reads
The End
Questions?