Compaction in Apache Cassandra is the process of merging SSTables to reclaim disk space used by deleted or overwritten data. It occurs automatically in the background after memtables are flushed to disk or manually via nodetool. There are minor, major, and single-SSTable compactions. The compaction strategy, such as size-tiered, leveled, or date-tiered, determines how SSTables are merged.
2. Who is this guy?
Kazutaka Tomita (@railute)
• INTHEFOREST Co., Ltd. CEO/CTO
• Consulting for Apache Cassandra and Apache Spark Systems
• Support for Cassandra in Japan
• An organizer of Cassandra Summit JPN
Specialty
• RDBMS (Oracle, SQL Server, MySQL, PostgreSQL)
• Apache Cassandra
• Apache Spark
• Apache Hadoop with YARN
• Other NoSQL databases
• NLP and Text mining for Japanese
4. Overview of Compaction.
Three questions about Cassandra’s compaction:
• Why is the compaction done?
• When is the compaction done?
• What type is the compaction?
5. Why is the compaction done?
The most important thing:
The SSTable is immutable.
Because an SSTable is never modified in place, updates and deletes are written to new SSTables. So we must purge duplicate, overwritten, or deleted data and tombstones.
6. Writing System for Apache Cassandra
(For your reference)
(Figure: write path diagram. Labels: Coordinator node; "Node is alive?" (Yes: send messages to other node; No: write hint); writing operation; receive messages from coordinator node; Commit Log (disk); memtable (memory); Flush; SSTable (disk); Compaction; sort by token.)
7. When is the compaction done?
1. Manually
2. Running in the background
8. When is the compaction done?
1. Manually
1. nodetool compact
Forces a major compaction on one or more tables.
With size-tiered compaction, a major compaction combines each of the
pools of repaired and unrepaired SSTables into one repaired and one
unrepaired SSTable.
2. nodetool scrub
Rebuilds SSTables for one or more Cassandra tables.
3. nodetool cleanup
Cleans up keyspaces and partition keys no longer belonging to a node.
Use this command to remove unwanted data after adding a new node
to the cluster. Cassandra does not automatically remove data from
nodes that lose part of their partition range to a newly added node.
4. nodetool upgradesstables
Rewrites SSTables that were not written by the current version
of Cassandra, upgrading them to the current format.
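The four manual commands above share the same basic shape (a subcommand followed by a keyspace and, optionally, table names). As a minimal sketch, the helper below only builds the argument list; the function name, the keyspace `my_keyspace`, and the table `users` are hypothetical and used for illustration only.

```python
import shlex

# Subcommands named on this slide; each accepts a keyspace and optional tables.
MANUAL_COMPACTION_COMMANDS = ("compact", "scrub", "cleanup", "upgradesstables")


def build_nodetool_command(subcommand, keyspace, *tables):
    """Return the argument list for one manual compaction command."""
    if subcommand not in MANUAL_COMPACTION_COMMANDS:
        raise ValueError(f"not a manual compaction command: {subcommand}")
    return ["nodetool", subcommand, keyspace, *tables]


# Hypothetical keyspace and table names, for illustration only.
cmd = build_nodetool_command("compact", "my_keyspace", "users")
print(shlex.join(cmd))  # nodetool compact my_keyspace users
```

On a node where nodetool is installed, the list could be passed to `subprocess.run(cmd, check=True)`; the sketch stops short of running it so that it stays safe to try anywhere.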
9. When is the compaction done?
2. Running in the background
1. When the daemon starts
2. After flushing memtables
3. After streaming
4. When auto compaction is enabled by nodetool
5. When the compaction threshold is set by nodetool
10. What type is the compaction?
1. Minor
2. Major
3. Single-sstable compactions
4. Anti compaction
11. What type is the compaction?
1. Minor
This compaction runs automatically in the background.
• When the daemon starts
• After flushing memtables
• After streaming
12. What type is the compaction?
2. Major
A major compaction is performed only by the size-tiered compaction strategy.
cf.)
org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy#getMaximalTask
With the other strategies, “nodetool compact” can still be called, but a major compaction is not executed.
*Note: a minor compaction is executed instead.
cf.)
org.apache.cassandra.db.compaction.DateTieredCompactionStrategy#getMaximalTask
org.apache.cassandra.db.compaction.LeveledCompactionStrategy#getMaximalTask
13. What type is the compaction?
3. Single-sstable compactions
This compaction is executed on each SSTable, one at a time.
nodetool upgradesstables
nodetool scrub
nodetool cleanup
14. What type is the compaction?
4. Anti compaction
This compaction is for incremental repairs.
After an incremental repair is executed, an anticompaction is called.
*Since version 2.1
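Anticompaction separates the data that an incremental repair has covered from the data it has not. The following is a minimal sketch of that split, assuming a toy model in which an SSTable is a list of (token, value) pairs and the repaired token ranges are (start, end] tuples; the function name and data layout are hypothetical and do not reflect Cassandra's internal classes.

```python
def anticompact(sstable_rows, repaired_ranges):
    """Split the rows of one SSTable into (repaired, unrepaired) lists.

    sstable_rows: list of (token, value) pairs (toy model of an SSTable)
    repaired_ranges: list of (start, end] token ranges that were repaired
    """
    repaired, unrepaired = [], []
    for token, value in sstable_rows:
        in_repaired = any(start < token <= end for start, end in repaired_ranges)
        (repaired if in_repaired else unrepaired).append((token, value))
    return repaired, unrepaired


rows = [(5, "a"), (15, "b"), (25, "c")]
repaired, unrepaired = anticompact(rows, [(10, 20)])
print(repaired)    # [(15, 'b')]
print(unrepaired)  # [(5, 'a'), (25, 'c')]
```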
16. Size Tiered Compaction Strategy
When enough SSTables of similar size have accumulated, they are merged.
(The default threshold is 4.)
(Figure: SSTables of similar size are merged step by step into fewer, larger SSTables.)
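The size-tiered idea can be sketched as grouping SSTables into buckets of similar size and compacting any bucket that reaches the threshold of 4. This is a simplified illustration, not Cassandra's implementation: the function names are hypothetical, and the 0.5x to 1.5x "similar size" band around the bucket average is an assumption modeled on the strategy's bucket_low and bucket_high options.

```python
def size_tiered_buckets(sizes, bucket_low=0.5, bucket_high=1.5):
    """Group SSTable sizes into buckets whose members are of similar size."""
    buckets = []  # each bucket is a list of sizes
    for size in sorted(sizes):
        for bucket in buckets:
            avg = sum(bucket) / len(bucket)
            # "Similar" means within [bucket_low, bucket_high] times the average.
            if bucket_low * avg <= size <= bucket_high * avg:
                bucket.append(size)
                break
        else:
            buckets.append([size])  # no similar bucket found, start a new one
    return buckets


def pick_compaction_candidates(sizes, min_threshold=4):
    """Return the buckets that hold at least min_threshold SSTables."""
    return [b for b in size_tiered_buckets(sizes) if len(b) >= min_threshold]


sizes_mb = [10, 11, 9, 10, 100, 105, 400]  # made-up SSTable sizes in MB
print(pick_compaction_candidates(sizes_mb))  # [[9, 10, 10, 11]]
```

Only the four small SSTables are merged here; the two around 100 MB wait until two more SSTables of similar size appear.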
17. Leveled Compaction Strategy
(Figure: SSTables arranged in levels, labeled Level 0, Level 1, and Level 2. Annotation: data that is not read very often.)
18. DateTieredCompactionStrategy
The basic idea of DTCS is to group SSTables into windows based on how old the data in each SSTable is.
Default window size: 1 hour
(Figure: SSTables placed on a timeline ending at "now" and grouped into windows; a window holding 4 SSTables is merged into a single SSTable.)
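The windowing idea can be sketched as follows. This is a deliberately simplified model, assuming fixed one-hour windows and a threshold of 4 SSTables per window; it omits the way DTCS combines older windows into progressively larger ones. The function names and the SSTable names are hypothetical.

```python
from collections import defaultdict


def group_by_window(sstable_timestamps, base_seconds=3600):
    """Map each window start time to the SSTables whose data falls in it.

    sstable_timestamps: {sstable_name: data timestamp in seconds}
    """
    windows = defaultdict(list)
    for name, ts in sstable_timestamps.items():
        window_start = ts - ts % base_seconds  # align to the window boundary
        windows[window_start].append(name)
    return dict(windows)


def windows_to_compact(sstable_timestamps, base_seconds=3600, min_threshold=4):
    """Return only the windows that hold at least min_threshold SSTables."""
    grouped = group_by_window(sstable_timestamps, base_seconds)
    return {
        start: sorted(names)
        for start, names in grouped.items()
        if len(names) >= min_threshold
    }


# Made-up SSTables: four in the first hour, one in the second.
tables = {"a": 100, "b": 200, "c": 300, "d": 3500, "e": 3700}
print(windows_to_compact(tables))  # {0: ['a', 'b', 'c', 'd']}
```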
19. Merge SSTable by Compaction
When enough SSTables of similar size have accumulated, they are merged.
(The default threshold is 4.)
Before compaction (values spread across SSTables):
Name: John
Address: Osaka | Address: Tokyo
Tel: xxx-xxx
ages: 20
After compaction:
Name: John
Address: Tokyo
ages: 20
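The merge in this example can be sketched as a last-write-wins merge over immutable inputs. The sketch makes several assumptions that the slide does not state: "John" is treated as the row key, each column value carries a write timestamp, the newer SSTable holds the Tokyo address, and Tel disappears because the newer SSTable holds a tombstone for it. It also drops tombstones immediately, whereas Cassandra keeps them until gc_grace_seconds has passed.

```python
TOMBSTONE = object()  # marker for a deleted column (hypothetical representation)


def merge_sstables(*sstables):
    """Merge SSTables, keeping the newest value per column and dropping deletes.

    Each SSTable is modeled as {row_key: {column: (write_timestamp, value)}}.
    The inputs are never modified; a new merged structure is returned.
    """
    merged = {}
    for sstable in sstables:
        for row_key, columns in sstable.items():
            row = merged.setdefault(row_key, {})
            for column, (ts, value) in columns.items():
                # Last write wins: keep the cell with the highest timestamp.
                if column not in row or ts > row[column][0]:
                    row[column] = (ts, value)

    result = {}
    for row_key, row in merged.items():
        live = {c: v for c, (ts, v) in row.items() if v is not TOMBSTONE}
        if live:  # rows with no live columns are purged entirely
            result[row_key] = live
    return result


older = {"John": {"Address": (1, "Osaka"), "Tel": (1, "xxx-xxx"), "ages": (1, 20)}}
newer = {"John": {"Address": (2, "Tokyo"), "Tel": (2, TOMBSTONE)}}
print(merge_sstables(older, newer))  # {'John': {'Address': 'Tokyo', 'ages': 20}}
```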