Cassandra Summit 2010 Performance Tuning

Cassandra Summit 1.0
Performance Tuning

Brandon Williams

Riptano, Inc.
brandon@riptano.com
brandonwilliams@apache.org
@faltering
driftx on freenode

August 10, 2010


Tuning Writes
Tuning Reads

Making writes faster

Use a separate IO device for the commit log.
Hard to accomplish in the cloud
  Rackspace: one IO device, but it’s persistent (RAID array underneath)
  EC2: EBS is slow, local disk is not persistent
    You could put the commitlog on the ephemeral drive anyway, at the price of durability
    But then, why have a commitlog at all?
    Maybe you can disable it in 0.7/0.8
Real servers: one RAID array, bad RAID options
  Will anyone ever offer SSDs?
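
A quick way to check whether the commit log really sits on its own device is to compare the device IDs of the two directories. This is a minimal sketch; the two paths are hypothetical placeholders, and a differing st_dev only proves a different filesystem or partition, not necessarily a different physical disk.

```python
import os


def on_separate_devices(path_a, path_b):
    """Return True if the two paths report different device IDs (st_dev)."""
    return os.stat(path_a).st_dev != os.stat(path_b).st_dev


if __name__ == "__main__":
    # Hypothetical locations: substitute the directories from your own config.
    commitlog_dir = "/var/lib/cassandra/commitlog"
    data_dir = "/var/lib/cassandra/data"
    if os.path.isdir(commitlog_dir) and os.path.isdir(data_dir):
        print(on_separate_devices(commitlog_dir, data_dir))
```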


What else?

concurrent writers (concurrent readers for reads)
  increase if you have lots of cores
memtable flush writers
  increase if you have lots of IO


What are all these options?

memtable throughput in mb
memtable operations in millions
memtable flush after mins
bigger memtables improve writes?
  no, but they can improve reads
  what?
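
The three memtable settings act as independent limits: a memtable is flushed as soon as any one of them is reached. The sketch below models that rule only; the function name and the default values are illustrative assumptions, not Cassandra code.

```python
def should_flush(bytes_written, operations, age_minutes,
                 throughput_mb=64, operations_millions=0.3, flush_after_mins=60):
    """Return True once any one of the three memtable limits is reached.

    The default limits are placeholder values chosen for illustration.
    """
    return (
        bytes_written >= throughput_mb * 1024 * 1024
        or operations >= operations_millions * 1_000_000
        or age_minutes >= flush_after_mins
    )
```

Raising one limit has no effect if another one is always hit first, so the settings are best tuned together.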


Compaction: the slayer of reads

a necessary evil
IO contention hell
  you can reduce compaction priority in 0.6.4 or later
    -Dcassandra.compaction.priority=1
  constantly outstripping it means you need more nodes
  reducing the priority affects CPU usage, not IO
avoid reading from slow hosts
  dynamic snitch
  accrual failure detector


Compaction (cont’d)

bigger memtables absorb more overwrites
fewer sstables make for more efficient compaction
if you are write once then read-only, you *could* turn it off
  merge-on-read and bloom filters save you
  someday, you’ll want to repair
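
Why bigger memtables absorb more overwrites can be seen in a toy model. This is a sketch under simplifying assumptions (a cyclic overwrite workload, a memtable that flushes after a fixed number of operations); the function and its parameters are hypothetical and do not mirror Cassandra internals.

```python
def simulate(total_writes, distinct_keys, memtable_ops_limit):
    """Count flushes (SSTables) and flushed entries for a cyclic overwrite workload."""
    memtable = {}
    ops_in_memtable = 0
    sstables = 0
    entries_flushed = 0
    for i in range(total_writes):
        # A later write to the same key replaces the earlier one in memory.
        memtable[i % distinct_keys] = i
        ops_in_memtable += 1
        if ops_in_memtable >= memtable_ops_limit:
            sstables += 1
            entries_flushed += len(memtable)
            memtable = {}
            ops_in_memtable = 0
    if memtable:  # flush whatever is left at the end
        sstables += 1
        entries_flushed += len(memtable)
    return sstables, entries_flushed


if __name__ == "__main__":
    for limit in (50, 1000):
        print(limit, simulate(10000, 100, limit))
```

With these inputs the 50-operation memtable produces 200 SSTables holding 10000 entries in total, while the 1000-operation memtable produces 10 SSTables holding 1000 entries. Compaction then has far less to merge.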


Know your read pattern

how much data is in the working set?
  disk is slow: you want that in memory
  sometimes you can’t afford the cost
how many reads are repeats?
doing lots of random IO within a row?
  column index size in kb
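
The column index size trades index length against how much of a row must be scanned to find one column. A rough sketch of that arithmetic follows; the helper name is hypothetical, and the 64 KB default is assumed here for illustration.

```python
import math


def column_index_entries(row_size_kb, column_index_size_kb=64):
    """Approximate number of index entries for a row of the given serialized size."""
    return math.ceil(row_size_kb / column_index_size_kb)


if __name__ == "__main__":
    # A 1 MB row: a smaller index block means more entries, but less data
    # to scan for a single column lookup.
    for block_kb in (64, 4):
        print(block_kb, column_index_entries(1024, block_kb))
```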


Caches

on a cold hit, each row requires two seeks
  one to find the row’s position in the index
    key cache eliminates this
  another to read the row
    row cache eliminates this, too
columns in the row are contiguous afterwards
  make fat rows
  but not too fat, since the row is the unit of distribution
the OS file cache
  use a good OS


Caching Strategies

key cache
  excellent bang for your buck
  half your seeks are gone
  a lot of keys fit in a relatively small amount of memory
row cache
  all seeks are gone
  but more heap usage = more GC pressure
  trying to use 32GB of row cache will wreck you
  estimating the correct size can be difficult
    use the average row size in cfstats as a starting point
    in 0.7, each SSTable has a persistent row size histogram
    the penalty for being wrong can be catastrophic: OOM
    can’t be done programmatically in Java, or Cassandra would do it for you
    this is why you can’t set an absolute amount in bytes
  if you enable it on very fat rows, it can be bad
    keep your indexes in a different column family
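
Starting from the average row size reported by cfstats, a first estimate of row cache heap usage is simple multiplication. This is a rough sketch: the function is hypothetical, and the overhead factor is an assumed fudge value for JVM object overhead, not a measured number.

```python
def estimate_row_cache_bytes(rows_cached, avg_row_size_bytes, overhead_factor=2.0):
    """Rough heap estimate: rows * average row size * assumed JVM overhead."""
    return int(rows_cached * avg_row_size_bytes * overhead_factor)


if __name__ == "__main__":
    # Hypothetical numbers: 100,000 cached rows averaging 2 KB each.
    estimate = estimate_row_cache_bytes(100_000, 2048)
    print(estimate / (1024 * 1024), "MB")
```

Because rows vary in size, treat the result as a lower bound and leave generous headroom.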

Caching Strategies (cont’d)

OS file cache: it’s free
  no size estimation needed
mmap is great
  unless it makes you swap
  switch to mmap index only
  why do you have swap enabled, anyway?
Absolute numbers vs percentages
  percentages can be an OOM time bomb
  harder to calculate how much memory the cache will use
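
The time bomb comes from growth: a percentage setting admits more rows as the column family gets bigger, while an absolute number stays fixed. A small sketch with a hypothetical helper and made-up row counts:

```python
def rows_cached_for_percentage(total_rows, percent):
    """Rows a percentage-based cache setting admits for a column family of this size."""
    return total_rows * percent // 100


if __name__ == "__main__":
    # The same 10% setting, before and after the data set grows 100x.
    for total_rows in (1_000_000, 100_000_000):
        print(total_rows, rows_cached_for_percentage(total_rows, 10))
```

Here 10% means 100,000 rows at one million rows and 10,000,000 rows at one hundred million, with no change to the configuration.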


lookup order:
  row cache
  key cache
  disk (file cache?)
sizing your caches:
  large key cache
  smaller row cache for very hot rows
  leave the rest to the OS
  don’t make your heap larger than needed
monitor hit rates via JMX
  actually, monitor everything you can
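
Once hit rates are available from JMX, the lookup order gives a back-of-envelope figure for seeks per read. This sketch assumes the worst case for the OS file cache (every disk access is a real seek) and a single SSTable per read; both functions are hypothetical helpers.

```python
def seeks_for_read(row_cache_hit, key_cache_hit):
    """Seeks for one read, following the lookup order row cache, key cache, disk."""
    if row_cache_hit:
        return 0  # row served from memory
    if key_cache_hit:
        return 1  # index seek skipped, one seek left to read the row
    return 2      # one seek for the index, one for the row


def expected_seeks(row_hit_rate, key_hit_rate):
    """Average seeks per read given the two cache hit rates (each 0.0 to 1.0)."""
    row_miss = 1.0 - row_hit_rate
    return row_miss * (key_hit_rate * 1 + (1.0 - key_hit_rate) * 2)


if __name__ == "__main__":
    print(expected_seeks(row_hit_rate=0.5, key_hit_rate=0.5))
```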


Test, Measure, Tweak, Repeat

use stress.py as a baseline
  make sure you have multiprocessing
move to real world data
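
Assuming "multiprocessing" refers to Python's multiprocessing module (part of the standard library since Python 2.6), a quick check on the load-generating machine could look like this. The helper name is hypothetical.

```python
def multiprocessing_available():
    """Return True if the multiprocessing module can be imported."""
    try:
        import multiprocessing  # noqa: F401  (imported only to test availability)
    except ImportError:
        return False
    return True


if __name__ == "__main__":
    print(multiprocessing_available())
```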


Settings you don’t need to touch

commitlog rotation threshold in mb
SlicedBufferSizeInKB
FlushIndexBufferSizeInMB


The End

Questions?

