SlideShare a Scribd company logo
1 of 26
안미진
RocksDB Compaction
Embedded Key-Value Store for Flash and RAM
Contents
1. RocksDB Architecture
2. Level Style Compaction
3. Universal Style Compaction
4. RocksDB Compaction
Overview
RocksDB Architecture
Active
Memtable
Read-Only
Memtable
Memory
Log
Log
SSTSSTSST
SSTSSTSST
Persistent Storage
Write Request
Read Request LSM Files
CompactionFlush
Switch Switch
RocksDB Architecture
Active
Memtable
Read-Only
Memtable
Memory
Log
Log
SSTSSTSST
SSTSSTSST
Persistent Storage
Write Request
Read Request LSM Files
CompactionFlush
Switch Switch
RocksDB Architecture
Active
Memtable
Read-Only
Memtable
Memory
Log
Log
SSTSSTSST
SSTSSTSST
Persistent Storage
Write Request
Read Request LSM Files
CompactionFlush
Switch Switch
RocksDB Architecture
Active
Memtable
Read-Only
Memtable
Memory
Log
Log
SSTSSTSST
SSTSSTSST
Persistent Storage
Write Request
LSM Files
CompactionFlush
Switch Switch
Read Request
RocksDB Architecture
Active
Memtable
Read-Only
Memtable
Memory
Log
Log
SSTSSTSST
SSTSSTSST
Persistent Storage
Write Request
LSM Files
CompactionFlush
Switch Switch
Read Request
RocksDB Architecture
Active
Memtable
(4MB)
Immutable
Memtable
Memory
Disk
Write
Level 0
(4 SSTfile)
Level 1
(10MB)
Level 2
(100MB)
. . .
. . . . . .
Info Log
MANIFEST
CURRENT
Compaction
Log
SSTfile
(2MB)
RocksDB Compaction
Multi-threaded compactions
• Background Multi-thread
→ periodically do the “compaction”
→ parallel compactions on different parts of the database
can occur simultaneously
• Merge SSTfiles to a bigger SSTfile
• Remove multiple copies of the same key
– Duplicate or overwritten keys
• Process deletions of keys
• Supports two different styles of compaction
– Tunable compaction to trade-off
Level Style Compaction
• level0_file_num_compaction_trigger
- Number of files to trigger level0 compaction
- Default : 1
Ex) candidate files size < the next file’s size (1% smaller)
→ include next file into this candidate set
• Level0_file_
- The minimum number of files in a single compaction
- Default : 2
• max_merge_width
- The maximum number of files in a single compaction
- Default : UINT_MAX
Compaction options
1. Level Style Compaction
• RocksDB default compaction style
• Stores data in multiple levels in the database
• More recent data → L0
The oldest data → Lmax
• Files in L0
- overlapping keys, sorted by flush time
Files in L1 and higher
- non-overlapping keys, sorted by key
• Each level is 10 times larger than the previous one
Inherited from LevelDB
Level Style Compaction
Compaction process
cache
log
level1
level2
level3
level0
① Pick one file from level N
② Compact it with all its overlapping
files from level N+1
③ Replace them with new files in
level N+1
Level 0 → Level 1 Compaction
• Level 0 → overlapping keys
• Compaction includes all files from L1
• All files from L1 are compacted with L0
• L0 → L1 compaction completion
L1 → L2 compaction start
• Single thread compaction → not good throughput
• Solution : Making the size of L0 similar to size of L1
Tricky Compaction
Level Style Compaction
Level Style Compaction
· Level score =
𝑐𝑢𝑟𝑟𝑒𝑛𝑡 𝑙𝑒𝑣𝑒𝑙 𝑠𝑖𝑧𝑒
max level size
· max file size
= target_file_size_base * target_file_size_multiplier
(Default=2MB) (Default=1)
· Overlapping range search
: Binary Search
Level Style
Flowchart
• Read : 128KB / Write : 512KB
Level Style Compaction
2. Universal Style Compaction
• For write-heavy workloads
→ Level Style Compaction may be bottlenecked on
disk throughput
• Stores all files in L0
• All files are arranged in time order
• Temporarily increase size amplification by a factor of
two
• Intended to decrease write amplification
• But, increase space amplification
Universal Style Compaction
① Pick up a few files that are chronologically adjacent to one
another
② Merge them
③ Replace them with a new file in level 0
Compaction process
Universal Style Compaction
Universal Style Compaction
Universal Style Compaction
Flowchart
Universal Style Compaction
• Read : 128KB / Write : 512KB
Universal Style Compaction
• size_ratio
- Percentage flexibility while comparing file size
- Default : 1
Ex) candidate set size < size of next file (1% smaller)
→ include next file in candidate set
• min_merge_width
- The minimum number of files in a single compaction
- Default : 2
• max_merge_width
- The maximum number of files in a single compaction
- Default : UINT_MAX
Compaction options
Universal Style Compaction
• max_size_amplification_percent
- The amount of additional storage needed to store a
single byte of data in the database
- Controls the amount of space amplification in the
database
- Does not determine when calls to Put & Delete are
stalled
- Determines when compaction is done
- Default : 200
Compaction options
Universal Style Compaction
• stop_style
- The algorithm used to stop picking files into a single
compaction run
- kCompactionStopStyleSimilarSize
→ Pick files of similar size
- kCompactionStopStyleTotalSize
→ total size of picked files > next files
- Default : kCompactionStopStyleTotalSize
Compaction options

More Related Content

What's hot

HBase and HDFS: Understanding FileSystem Usage in HBase
HBase and HDFS: Understanding FileSystem Usage in HBaseHBase and HDFS: Understanding FileSystem Usage in HBase
HBase and HDFS: Understanding FileSystem Usage in HBase
enissoz
 
Introduction to Storm
Introduction to Storm Introduction to Storm
Introduction to Storm
Chandler Huang
 

What's hot (20)

Migrating from InnoDB and HBase to MyRocks at Facebook
Migrating from InnoDB and HBase to MyRocks at FacebookMigrating from InnoDB and HBase to MyRocks at Facebook
Migrating from InnoDB and HBase to MyRocks at Facebook
 
Seastore: Next Generation Backing Store for Ceph
Seastore: Next Generation Backing Store for CephSeastore: Next Generation Backing Store for Ceph
Seastore: Next Generation Backing Store for Ceph
 
Cassandra Introduction & Features
Cassandra Introduction & FeaturesCassandra Introduction & Features
Cassandra Introduction & Features
 
MyRocks introduction and production deployment
MyRocks introduction and production deploymentMyRocks introduction and production deployment
MyRocks introduction and production deployment
 
The Parquet Format and Performance Optimization Opportunities
The Parquet Format and Performance Optimization OpportunitiesThe Parquet Format and Performance Optimization Opportunities
The Parquet Format and Performance Optimization Opportunities
 
Apache Spark Architecture
Apache Spark ArchitectureApache Spark Architecture
Apache Spark Architecture
 
HBase and HDFS: Understanding FileSystem Usage in HBase
HBase and HDFS: Understanding FileSystem Usage in HBaseHBase and HDFS: Understanding FileSystem Usage in HBase
HBase and HDFS: Understanding FileSystem Usage in HBase
 
Introduction and Overview of Apache Kafka, TriHUG July 23, 2013
Introduction and Overview of Apache Kafka, TriHUG July 23, 2013Introduction and Overview of Apache Kafka, TriHUG July 23, 2013
Introduction and Overview of Apache Kafka, TriHUG July 23, 2013
 
Rds data lake @ Robinhood
Rds data lake @ Robinhood Rds data lake @ Robinhood
Rds data lake @ Robinhood
 
Introduction to MongoDB
Introduction to MongoDBIntroduction to MongoDB
Introduction to MongoDB
 
Spark + Parquet In Depth: Spark Summit East Talk by Emily Curtin and Robbie S...
Spark + Parquet In Depth: Spark Summit East Talk by Emily Curtin and Robbie S...Spark + Parquet In Depth: Spark Summit East Talk by Emily Curtin and Robbie S...
Spark + Parquet In Depth: Spark Summit East Talk by Emily Curtin and Robbie S...
 
Introduction to Storm
Introduction to Storm Introduction to Storm
Introduction to Storm
 
YugaByte DB Internals - Storage Engine and Transactions
YugaByte DB Internals - Storage Engine and Transactions YugaByte DB Internals - Storage Engine and Transactions
YugaByte DB Internals - Storage Engine and Transactions
 
PostgreSQL on EXT4, XFS, BTRFS and ZFS
PostgreSQL on EXT4, XFS, BTRFS and ZFSPostgreSQL on EXT4, XFS, BTRFS and ZFS
PostgreSQL on EXT4, XFS, BTRFS and ZFS
 
Presto Summit 2018 - 09 - Netflix Iceberg
Presto Summit 2018  - 09 - Netflix IcebergPresto Summit 2018  - 09 - Netflix Iceberg
Presto Summit 2018 - 09 - Netflix Iceberg
 
Ceph and RocksDB
Ceph and RocksDBCeph and RocksDB
Ceph and RocksDB
 
Crimson: Ceph for the Age of NVMe and Persistent Memory
Crimson: Ceph for the Age of NVMe and Persistent MemoryCrimson: Ceph for the Age of NVMe and Persistent Memory
Crimson: Ceph for the Age of NVMe and Persistent Memory
 
Iceberg: A modern table format for big data (Strata NY 2018)
Iceberg: A modern table format for big data (Strata NY 2018)Iceberg: A modern table format for big data (Strata NY 2018)
Iceberg: A modern table format for big data (Strata NY 2018)
 
Performance tuning in BlueStore & RocksDB - Li Xiaoyan
Performance tuning in BlueStore & RocksDB - Li XiaoyanPerformance tuning in BlueStore & RocksDB - Li Xiaoyan
Performance tuning in BlueStore & RocksDB - Li Xiaoyan
 
Tame the small files problem and optimize data layout for streaming ingestion...
Tame the small files problem and optimize data layout for streaming ingestion...Tame the small files problem and optimize data layout for streaming ingestion...
Tame the small files problem and optimize data layout for streaming ingestion...
 

Similar to RocksDB compaction

Computer Memory Hierarchy Computer Architecture
Computer Memory Hierarchy Computer ArchitectureComputer Memory Hierarchy Computer Architecture
Computer Memory Hierarchy Computer Architecture
Haris456
 
Storage talk
Storage talkStorage talk
Storage talk
christkv
 

Similar to RocksDB compaction (20)

Power of the Log: LSM & Append Only Data Structures
Power of the Log: LSM & Append Only Data StructuresPower of the Log: LSM & Append Only Data Structures
Power of the Log: LSM & Append Only Data Structures
 
The Power of the Log
The Power of the LogThe Power of the Log
The Power of the Log
 
Geek Sync | Guide to Understanding and Monitoring Tempdb
Geek Sync | Guide to Understanding and Monitoring TempdbGeek Sync | Guide to Understanding and Monitoring Tempdb
Geek Sync | Guide to Understanding and Monitoring Tempdb
 
Shignled disk
Shignled diskShignled disk
Shignled disk
 
Scylla Summit 2018: In-Memory Scylla - When Fast Storage is Not Fast Enough
Scylla Summit 2018: In-Memory Scylla - When Fast Storage is Not Fast EnoughScylla Summit 2018: In-Memory Scylla - When Fast Storage is Not Fast Enough
Scylla Summit 2018: In-Memory Scylla - When Fast Storage is Not Fast Enough
 
Extlect03
Extlect03Extlect03
Extlect03
 
Scaling ScyllaDB Storage Engine with State-of-Art Compaction
Scaling ScyllaDB Storage Engine with State-of-Art CompactionScaling ScyllaDB Storage Engine with State-of-Art Compaction
Scaling ScyllaDB Storage Engine with State-of-Art Compaction
 
Why you should care about data layout in the file system with Cheng Lian and ...
Why you should care about data layout in the file system with Cheng Lian and ...Why you should care about data layout in the file system with Cheng Lian and ...
Why you should care about data layout in the file system with Cheng Lian and ...
 
Deep Dive - Maximising EC2 & EBS Performance
Deep Dive - Maximising EC2 & EBS PerformanceDeep Dive - Maximising EC2 & EBS Performance
Deep Dive - Maximising EC2 & EBS Performance
 
Some key value stores using log-structure
Some key value stores using log-structureSome key value stores using log-structure
Some key value stores using log-structure
 
Getting Under the Hood of Kafka Streams: Optimizing Storage Engines to Tune U...
Getting Under the Hood of Kafka Streams: Optimizing Storage Engines to Tune U...Getting Under the Hood of Kafka Streams: Optimizing Storage Engines to Tune U...
Getting Under the Hood of Kafka Streams: Optimizing Storage Engines to Tune U...
 
Computer Memory Hierarchy Computer Architecture
Computer Memory Hierarchy Computer ArchitectureComputer Memory Hierarchy Computer Architecture
Computer Memory Hierarchy Computer Architecture
 
04 cache memory
04 cache memory04 cache memory
04 cache memory
 
Webinar: Understanding Storage for Performance and Data Safety
Webinar: Understanding Storage for Performance and Data SafetyWebinar: Understanding Storage for Performance and Data Safety
Webinar: Understanding Storage for Performance and Data Safety
 
Storage talk
Storage talkStorage talk
Storage talk
 
Cassandra TK 2014 - Large Nodes
Cassandra TK 2014 - Large NodesCassandra TK 2014 - Large Nodes
Cassandra TK 2014 - Large Nodes
 
Caches microP
Caches microPCaches microP
Caches microP
 
cache memory introduction, level, function
cache memory introduction, level, functioncache memory introduction, level, function
cache memory introduction, level, function
 
MongoDB Replication fundamentals - Desert Code Camp - October 2014
MongoDB Replication fundamentals - Desert Code Camp - October 2014MongoDB Replication fundamentals - Desert Code Camp - October 2014
MongoDB Replication fundamentals - Desert Code Camp - October 2014
 
Memory Hierarchy PPT of Computer Organization
Memory Hierarchy PPT of Computer OrganizationMemory Hierarchy PPT of Computer Organization
Memory Hierarchy PPT of Computer Organization
 

More from MIJIN AN (7)

InnoDB Flushing and Checkpoints
InnoDB Flushing and CheckpointsInnoDB Flushing and Checkpoints
InnoDB Flushing and Checkpoints
 
Secondary Index Search in InnoDB
Secondary Index Search in InnoDBSecondary Index Search in InnoDB
Secondary Index Search in InnoDB
 
MySQL Space Management
MySQL Space ManagementMySQL Space Management
MySQL Space Management
 
MySQL Buffer Management
MySQL Buffer ManagementMySQL Buffer Management
MySQL Buffer Management
 
Group play service for Tizen
Group play service for TizenGroup play service for Tizen
Group play service for Tizen
 
MySQL with FaCE
MySQL with FaCEMySQL with FaCE
MySQL with FaCE
 
MySQL Hash Table
MySQL Hash TableMySQL Hash Table
MySQL Hash Table
 

Recently uploaded

+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...
+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...
+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...
Health
 
TECUNIQUE: Success Stories: IT Service provider
TECUNIQUE: Success Stories: IT Service providerTECUNIQUE: Success Stories: IT Service provider
TECUNIQUE: Success Stories: IT Service provider
mohitmore19
 
AI Mastery 201: Elevating Your Workflow with Advanced LLM Techniques
AI Mastery 201: Elevating Your Workflow with Advanced LLM TechniquesAI Mastery 201: Elevating Your Workflow with Advanced LLM Techniques
AI Mastery 201: Elevating Your Workflow with Advanced LLM Techniques
VictorSzoltysek
 

Recently uploaded (20)

+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...
+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...
+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...
 
Reassessing the Bedrock of Clinical Function Models: An Examination of Large ...
Reassessing the Bedrock of Clinical Function Models: An Examination of Large ...Reassessing the Bedrock of Clinical Function Models: An Examination of Large ...
Reassessing the Bedrock of Clinical Function Models: An Examination of Large ...
 
Azure_Native_Qumulo_High_Performance_Compute_Benchmarks.pdf
Azure_Native_Qumulo_High_Performance_Compute_Benchmarks.pdfAzure_Native_Qumulo_High_Performance_Compute_Benchmarks.pdf
Azure_Native_Qumulo_High_Performance_Compute_Benchmarks.pdf
 
Diamond Application Development Crafting Solutions with Precision
Diamond Application Development Crafting Solutions with PrecisionDiamond Application Development Crafting Solutions with Precision
Diamond Application Development Crafting Solutions with Precision
 
call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️
call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️
call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️
 
Introducing Microsoft’s new Enterprise Work Management (EWM) Solution
Introducing Microsoft’s new Enterprise Work Management (EWM) SolutionIntroducing Microsoft’s new Enterprise Work Management (EWM) Solution
Introducing Microsoft’s new Enterprise Work Management (EWM) Solution
 
Software Quality Assurance Interview Questions
Software Quality Assurance Interview QuestionsSoftware Quality Assurance Interview Questions
Software Quality Assurance Interview Questions
 
Optimizing AI for immediate response in Smart CCTV
Optimizing AI for immediate response in Smart CCTVOptimizing AI for immediate response in Smart CCTV
Optimizing AI for immediate response in Smart CCTV
 
TECUNIQUE: Success Stories: IT Service provider
TECUNIQUE: Success Stories: IT Service providerTECUNIQUE: Success Stories: IT Service provider
TECUNIQUE: Success Stories: IT Service provider
 
How to Choose the Right Laravel Development Partner in New York City_compress...
How to Choose the Right Laravel Development Partner in New York City_compress...How to Choose the Right Laravel Development Partner in New York City_compress...
How to Choose the Right Laravel Development Partner in New York City_compress...
 
Direct Style Effect Systems - The Print[A] Example - A Comprehension Aid
Direct Style Effect Systems -The Print[A] Example- A Comprehension AidDirect Style Effect Systems -The Print[A] Example- A Comprehension Aid
Direct Style Effect Systems - The Print[A] Example - A Comprehension Aid
 
The Guide to Integrating Generative AI into Unified Continuous Testing Platfo...
The Guide to Integrating Generative AI into Unified Continuous Testing Platfo...The Guide to Integrating Generative AI into Unified Continuous Testing Platfo...
The Guide to Integrating Generative AI into Unified Continuous Testing Platfo...
 
AI Mastery 201: Elevating Your Workflow with Advanced LLM Techniques
AI Mastery 201: Elevating Your Workflow with Advanced LLM TechniquesAI Mastery 201: Elevating Your Workflow with Advanced LLM Techniques
AI Mastery 201: Elevating Your Workflow with Advanced LLM Techniques
 
AI & Machine Learning Presentation Template
AI & Machine Learning Presentation TemplateAI & Machine Learning Presentation Template
AI & Machine Learning Presentation Template
 
The Ultimate Test Automation Guide_ Best Practices and Tips.pdf
The Ultimate Test Automation Guide_ Best Practices and Tips.pdfThe Ultimate Test Automation Guide_ Best Practices and Tips.pdf
The Ultimate Test Automation Guide_ Best Practices and Tips.pdf
 
Tech Tuesday-Harness the Power of Effective Resource Planning with OnePlan’s ...
Tech Tuesday-Harness the Power of Effective Resource Planning with OnePlan’s ...Tech Tuesday-Harness the Power of Effective Resource Planning with OnePlan’s ...
Tech Tuesday-Harness the Power of Effective Resource Planning with OnePlan’s ...
 
5 Signs You Need a Fashion PLM Software.pdf
5 Signs You Need a Fashion PLM Software.pdf5 Signs You Need a Fashion PLM Software.pdf
5 Signs You Need a Fashion PLM Software.pdf
 
Define the academic and professional writing..pdf
Define the academic and professional writing..pdfDefine the academic and professional writing..pdf
Define the academic and professional writing..pdf
 
Microsoft AI Transformation Partner Playbook.pdf
Microsoft AI Transformation Partner Playbook.pdfMicrosoft AI Transformation Partner Playbook.pdf
Microsoft AI Transformation Partner Playbook.pdf
 
10 Trends Likely to Shape Enterprise Technology in 2024
10 Trends Likely to Shape Enterprise Technology in 202410 Trends Likely to Shape Enterprise Technology in 2024
10 Trends Likely to Shape Enterprise Technology in 2024
 

RocksDB compaction

Editor's Notes

  1. MANIFEST files will be formatted as a log all changes cause a state change (add or delete) will be appended to the log. A MANIFEST file lists the set of sorted tables that make up each level Informational messages are printed to files named LOG and LOG.old. CURRENT is a latest manifest file name of the text file