Cassandra compaction

What is Compaction?
Kazutaka Tomita (INTHEFOREST Co., Ltd.)

Who is this guy?
Kazutaka Tomita (@railute)
• INTHEFOREST Co., Ltd. CEO/CTO
• Consulting for Apache Cassandra and Apache Spark systems
• Cassandra support in Japan
• Organizer of Cassandra Summit JPN
Specialty
• RDBMS (Oracle, SQL Server, MySQL, PostgreSQL)
• Apache Cassandra
• Apache Spark
• Apache Hadoop with YARN
• Other NoSQL databases
• NLP and Text mining for Japanese

Agenda
• Overview of compaction
• Doing compaction

Overview of Compaction
Three points of Cassandra’s compaction:
• Why is the compaction done?
• When is the compaction done?
• What types of compaction are there?

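Why is the compaction done?
The most important thing:
The SSTable is immutable.
Updates and deletes are therefore written as new data (deletes as tombstones) instead of changing existing SSTables.
So we must purge duplicate, overwritten, or deleted data and tombstones.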

Writing System for Apache Cassandra (for your reference)
[Diagram: write path. The coordinator node checks whether the target node is alive: if YES, it sends the message to the other node; if No, it writes a hint. A node that receives a message from the coordinator node performs the writing operation: 1st to the commit log, 2nd to the memtable (memory). Memtables are flushed to SSTables (disk), sorted by token, and SSTables are later merged by compaction.]
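The local part of the write path above can be sketched as a toy model. This is only an illustration under stated assumptions: the `Node` class, the `memtable_limit` parameter, and the use of the key itself in place of a token are all hypothetical and are not Cassandra's real classes or settings.

```python
class Node:
    """Toy model of one node's local write path (not Cassandra's real code)."""

    def __init__(self, memtable_limit=2):
        self.commit_log = []   # append-only log kept for durability
        self.memtable = {}     # in-memory and mutable
        self.sstables = []     # "on disk": each SSTable is an immutable tuple
        self.memtable_limit = memtable_limit  # hypothetical flush trigger

    def write(self, key, value):
        self.commit_log.append((key, value))  # 1st: commit log
        self.memtable[key] = value            # 2nd: memtable
        if len(self.memtable) >= self.memtable_limit:
            self.flush()

    def flush(self):
        # Sort by key (standing in for the token) and freeze the result.
        self.sstables.append(tuple(sorted(self.memtable.items())))
        self.memtable = {}
        self.commit_log.clear()  # flushed data no longer needs the log


node = Node()
for key, value in [("b", 1), ("a", 2), ("a", 3), ("c", 4)]:
    node.write(key, value)

for sstable in node.sstables:
    print(sstable)
```

Key "a" ends up in both SSTables with different values. Because flushed SSTables are never modified, only a later merge can remove the older version.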

When is the compaction done?
1. Manually
2. Running in the background

1. Manually
1. nodetool compact
Forces a major compaction on one or more tables.
With size-tiered compaction, a major compaction combines each of the
pools of repaired and unrepaired SSTables into one repaired and one
unrepaired SSTable.
2. nodetool scrub
Rebuild SSTables for one or more Cassandra tables.
3. nodetool cleanup
Cleans up keyspaces and partition keys no longer belonging to a node.
Use this command to remove unwanted data after adding a new node
to the cluster. Cassandra does not automatically remove data from
nodes that lose part of their partition range to a newly added node.
4. nodetool upgradesstables
Rewrites SSTables for tables that are not running the current version
of Cassandra.
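The four commands above could be scripted roughly as follows. This is a minimal sketch: the keyspace `my_keyspace` and table `users` are hypothetical names, and the helper functions `nodetool_argv` and `run` are illustrative, not part of any Cassandra tooling. By default nothing is executed; the command line is only printed.

```python
import subprocess

# The subcommands named on this slide.
MANUAL_COMMANDS = {"compact", "scrub", "cleanup", "upgradesstables"}


def nodetool_argv(command, keyspace, *tables):
    """Build the argument list for: nodetool <command> <keyspace> [tables...]."""
    if command not in MANUAL_COMMANDS:
        raise ValueError(f"unsupported command: {command}")
    return ["nodetool", command, keyspace, *tables]


def run(argv, dry_run=True):
    """Print the command; execute it only when dry_run is False."""
    print(" ".join(argv))
    if dry_run:
        return None
    return subprocess.run(argv, check=True)


# Hypothetical keyspace and table names, for illustration only.
argv = nodetool_argv("compact", "my_keyspace", "users")
run(argv)
run(nodetool_argv("cleanup", "my_keyspace"))
```

Keeping the argument list separate from execution makes it easy to review what would run on a production node before actually running it.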

2. Running in the background
1. When the daemon starts
2. After flushing memtables
3. After streaming
4. When auto compaction is enabled by nodetool
5. When the compaction threshold is set by nodetool

What types of compaction are there?
1. Minor
2. Major
3. Single-sstable compactions
4. Anti compaction

1. Minor
This compaction runs automatically in the background.
• when the daemon starts
• after flushing memtables
• after streaming

2. Major
This compaction is only performed by size-tiered compaction.
cf.)
org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy#getMaximalTask
With the other strategies, “nodetool compact” can be called, but a major compaction is not executed.
* Note: a minor compaction is executed instead.
cf.)
org.apache.cassandra.db.compaction.DateTieredCompactionStrategy#getMaximalTask
org.apache.cassandra.db.compaction.LeveledCompactionStrategy#getMaximalTask

3. Single-sstable compactions
This compaction is executed on each SSTable, one at a time. It is run by:
nodetool upgradesstables
nodetool scrub
nodetool cleanup

4. Anti compaction
This compaction is for incremental repairs.
After an incremental repair is executed, an anticompaction is called.
* Cassandra 2.1 and later
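The idea of anticompaction can be sketched as splitting one SSTable into a repaired part and an unrepaired part. This is a simplified illustration: the function name `anticompact`, the integer tokens, and the `(start, end]` range convention are assumptions made for the example, not Cassandra's actual implementation.

```python
def anticompact(sstable, repaired_ranges):
    """Split rows into (repaired, unrepaired) by token range.

    sstable: list of (token, row) pairs.
    repaired_ranges: list of (start, end) pairs, treated as start < token <= end.
    """
    def is_repaired(token):
        return any(start < token <= end for start, end in repaired_ranges)

    repaired = [(t, row) for t, row in sstable if is_repaired(t)]
    unrepaired = [(t, row) for t, row in sstable if not is_repaired(t)]
    return repaired, unrepaired


# Hypothetical tokens and rows.
sstable = [(5, "a"), (15, "b"), (25, "c")]
repaired, unrepaired = anticompact(sstable, [(10, 20)])
print(repaired)
print(unrepaired)
```

Only the row whose token falls inside the repaired range moves to the repaired set; the rest stays unrepaired and is picked up by a later repair.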

Compaction Strategy
1. SizeTieredCompactionStrategy
For write-intensive workloads
2. LeveledCompactionStrategy
For read-intensive workloads
3. DateTieredCompactionStrategy
For time series data and expiring (TTL) data
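A strategy is chosen per table through the table's compaction option in CQL. The sketch below only builds the statement text; the keyspace `my_keyspace`, the table `events`, and the helper `alter_compaction_cql` are hypothetical names used for illustration.

```python
# The three strategy class names listed on this slide.
STRATEGIES = {
    "SizeTieredCompactionStrategy",
    "LeveledCompactionStrategy",
    "DateTieredCompactionStrategy",
}


def alter_compaction_cql(keyspace, table, strategy):
    """Return a CQL statement that switches a table's compaction strategy."""
    if strategy not in STRATEGIES:
        raise ValueError(f"unknown strategy: {strategy}")
    return (
        f"ALTER TABLE {keyspace}.{table} "
        f"WITH compaction = {{'class': '{strategy}'}};"
    )


# Hypothetical keyspace and table names.
statement = alter_compaction_cql("my_keyspace", "events", "LeveledCompactionStrategy")
print(statement)
```

The resulting statement can be pasted into cqlsh or sent through a driver of your choice.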

Size Tiered Compaction Strategy
When enough SSTables of similar size accumulate, they are merged.
(The default threshold is 4 SSTables.)
[Diagram: groups of similar-sized SSTables are merged into progressively fewer, larger SSTables.]
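The "similar size" rule can be sketched as bucketing SSTables by size and picking buckets that reach the threshold of 4. The names `bucket_low`, `bucket_high`, and `min_threshold` follow the size-tiered option names, but the loop itself is a simplification written for this example, and the sizes are made up.

```python
def bucket_by_size(sizes, bucket_low=0.5, bucket_high=1.5):
    """Group sizes so each one lies within [low, high] x its bucket's average."""
    buckets = []
    for size in sorted(sizes):
        for bucket in buckets:
            average = sum(bucket) / len(bucket)
            if average * bucket_low <= size <= average * bucket_high:
                bucket.append(size)
                break
        else:
            # No existing bucket is close enough in size: start a new one.
            buckets.append([size])
    return buckets


def compaction_candidates(sizes, min_threshold=4):
    """Return the buckets that hold at least min_threshold SSTables."""
    return [b for b in bucket_by_size(sizes) if len(b) >= min_threshold]


# Hypothetical SSTable sizes in MB.
sizes = [10, 11, 9, 12, 100, 400]
print(bucket_by_size(sizes))
print(compaction_candidates(sizes))
```

Here the four small SSTables form one bucket and become a merge candidate, while the 100 MB and 400 MB SSTables wait until enough peers of similar size exist.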

Leveled Compaction Strategy
[Diagram: SSTables arranged in levels: Level 0, Level 1, Level 2. Callout: "Data that is not read much."]

Date Tiered Compaction Strategy
Default: 1 hour
The basic idea of DTCS is to group SSTables in windows based on how old the data is in the SSTable.
[Diagram: SSTables grouped into time windows counted back from now, with 4 SSTables per window.]
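Grouping SSTables by data age can be sketched as below. This is a deliberate simplification: it uses fixed one-hour windows counted back from `now`, whereas DTCS itself uses windows that grow with age. The function `group_by_window`, the SSTable names, and the timestamps are hypothetical.

```python
from collections import defaultdict


def group_by_window(sstable_timestamps, now, window_seconds=3600):
    """Map window index (0 = newest) to the SSTables whose data falls in it.

    sstable_timestamps: dict of SSTable name -> data timestamp in seconds.
    """
    windows = defaultdict(list)
    for name, timestamp in sstable_timestamps.items():
        age = now - timestamp
        windows[age // window_seconds].append(name)
    return dict(windows)


# Hypothetical SSTables and timestamps (seconds).
now = 10_000
sstables = {"a": 9_900, "b": 9_000, "c": 6_000, "d": 2_000}
print(group_by_window(sstables, now))
```

SSTables "a" and "b" share the newest window and could be compacted together, while "c" and "d" each sit alone in older windows.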

Merge SSTables by Compaction
Compaction merges the versions of a row held in several SSTables and keeps only the latest data.
Before compaction:
Name: John
Address: Osaka (older), Address: Tokyo (newer)
Tel: xxx-xxx
ages: 20
After compaction:
Name: John
Address: Tokyo
ages: 20
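The John example can be sketched as a last-write-wins merge. The timestamps are made up, and the sketch assumes that Tel disappears because it was deleted (a tombstone); the slide does not say so explicitly. Real Cassandra also keeps tombstones until gc_grace_seconds has passed, which this toy version ignores.

```python
TOMBSTONE = object()  # marker for a deleted column (illustrative only)


def merge_rows(*versions):
    """Merge row versions; each maps column -> (timestamp, value).

    The value with the highest timestamp wins; tombstoned columns are dropped.
    """
    merged = {}
    for version in versions:
        for column, (timestamp, value) in version.items():
            if column not in merged or timestamp > merged[column][0]:
                merged[column] = (timestamp, value)
    return {
        column: value
        for column, (timestamp, value) in merged.items()
        if value is not TOMBSTONE
    }


# Hypothetical contents of an older and a newer SSTable.
old = {"Name": (1, "John"), "Address": (1, "Osaka"), "Tel": (1, "xxx-xxx"), "ages": (1, 20)}
new = {"Address": (2, "Tokyo"), "Tel": (2, TOMBSTONE)}

result = merge_rows(old, new)
print(result)
```

The order in which the versions are passed does not matter, because the timestamp decides which value survives.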
