
Scaling ScyllaDB Storage Engine with State-of-Art Compaction



Log Structured Merge (LSM) tree storage engines are known for very fast writes. ScyllaDB uses this LSM-tree structure to write immutable Sorted String Tables (SSTables) to disk. These fast writes come with a tradeoff in read and space amplification. While compaction can help mitigate this, the RUM conjecture states that only two of the three amplification factors can be optimized at the expense of the third. Learn how ScyllaDB leverages the RUM conjecture and control theory to deliver state-of-the-art LSM-tree compaction for its users.




  1. Squeezing the Most Out of the Storage Engine with State of the Art Compaction. Raphael S. Carvalho, Software Engineer
  2. Raphael Carvalho ■ Syslinux, suite of bootloaders ■ OSv, an operating system for the cloud ■ Seastar, the framework powering ScyllaDB ■ ScyllaDB, the best database in the world
  3. “In order to make good use of the computer resources, one must organize files intelligently, making the retrieval process efficient.” The Ubiquitous B-Tree paper, 1979
  4. Storage Engines ■ Short & precise definition from the aforementioned paper: “allow users to store, update, and recall”
  5. Storage Engines ■ Two approaches for handling updates ■ In-place structure (ex: B+-tree)
  6. [diagram: the structure holds (k1,v1)(k2,v2)]
  7. [diagram: an update (k1,v3) arrives for key k1]
  8. [diagram: the entry is overwritten in place, leaving (k1,v3)(k2,v2)]
  9. Storage Engines ■ Two approaches for handling updates ■ Out-of-place structure (ex: LSM-tree)
  10. [diagram: the structure holds (k1,v1)(k2,v2)]
  11. [diagram: an update (k1,v3) arrives for key k1]
  12-14. [diagram: (k1,v3) is appended as a new entry; the stale (k1,v1) stays behind until a later merge discards it]
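
To make the contrast between the two approaches concrete, here is a minimal, hypothetical Python sketch (illustration only, not ScyllaDB code): the in-place store of slides 5-8 overwrites the existing slot, while the out-of-place store of slides 9-14 appends an immutable entry and resolves reads newest-first.

# Hypothetical sketch of the two update approaches; not ScyllaDB code.

class InPlaceStore:
    """B+-tree-like behavior: an update overwrites the old value."""
    def __init__(self):
        self.slots = {}          # key -> value, mutated in place

    def put(self, key, value):
        self.slots[key] = value  # (k1,v1) becomes (k1,v3) directly

    def get(self, key):
        return self.slots.get(key)

class OutOfPlaceStore:
    """LSM-tree-like behavior: updates are appended, never overwritten."""
    def __init__(self):
        self.log = []            # immutable (key, value) entries, append-only

    def put(self, key, value):
        self.log.append((key, value))  # (k1,v1) stays; (k1,v3) is added

    def get(self, key):
        # Newest entry wins; stale versions linger until a merge removes them.
        for k, v in reversed(self.log):
            if k == key:
                return v
        return None

for store in (InPlaceStore(), OutOfPlaceStore()):
    store.put("k1", "v1"); store.put("k2", "v2"); store.put("k1", "v3")
    print(type(store).__name__, store.get("k1"))  # both print v3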
  15. Storage Engines ■ Out-of-place update isn’t new ■ The 1976 paper “Differential files” shows its applicability in the real world ■ “shown to be an efficient method for storing a large and changing database”
  16. Storage Engines ■ A good analogy is presented in the paper
  17. Storage Engines ■ The Log-Structured Merge-Tree (LSM-Tree) paper is then published in 1996
  18. Storage Engines: The LSM-Tree [diagram: writes land in the in-memory component C0; merge sort moves data through components C1, C2, ..., Ck on disk]
  19. Storage Engines: The LSM-Tree ■ C1 is T times bigger than C0; C(k) is T times bigger than C(k-1) [diagram: C0 in memory; C1 ... Ck on disk, linked by merge sort]
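
To see what the size ratio T implies, a tiny illustration (T = 10 and a 64 MB C0 are assumed values, not from the deck):

# Illustrative only: component capacities when each level is T times
# bigger than the previous one.
T = 10
c0_mb = 64
for level in range(5):
    print(f"C{level}: {c0_mb * T**level} MB")
# C0: 64 MB, C1: 640 MB, ..., C4: 640000 MB; capacity grows geometrically,
# so a handful of components already covers hundreds of gigabytes.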
  20. Storage Engines ■ Immutability of LSM-tree components (ex: SSTables) simplifies ■ Concurrency control ■ Recovery
  21. Query on LSM Tree [diagram: a query for k1 finds (k1, v2) in memory and (k1, v1) on disk]
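
The read path implied by the diagram checks the newest data first and stops at the first hit. A minimal sketch (hypothetical structures, not ScyllaDB's internals):

# Hypothetical LSM read path: newest component first, first match wins.
memtable = {"k1": "v2"}                    # in-memory component (newest)
sstables = [{"k1": "v1"}, {"k9": "v9"}]    # on-disk components, newest first

def query(key):
    if key in memtable:
        return memtable[key]
    for sst in sstables:                   # newest to oldest
        if key in sst:
            return sst[key]
    return None

print(query("k1"))  # "v2": the in-memory version shadows the older (k1, v1)

This is also why reads are the price of fast writes: without compaction, a lookup may have to probe every component before concluding a key is absent.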
  22. LSM-tree compaction policy ■ A compaction policy (or strategy) defines the shape of the LSM tree ■ Any policy is composed of 4 primitives (sketched below) ■ Trigger (when to compact) ■ File picking policy (which data to compact) ■ Granularity (how much data at once) ■ Layout (how data is laid out)
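
The four primitives can be pictured as the interface every concrete policy fills in. A hypothetical sketch of that decomposition (not ScyllaDB's actual class hierarchy):

# Hypothetical decomposition of a compaction policy into the 4 primitives.
from abc import ABC, abstractmethod

class CompactionPolicy(ABC):
    @abstractmethod
    def trigger(self, tree) -> bool:
        """When to compact (e.g. too many SSTables in one level)."""

    @abstractmethod
    def pick_files(self, tree) -> list:
        """Which data to compact (e.g. all similar-sized runs)."""

    @abstractmethod
    def granularity(self, files) -> int:
        """How much data to process at once (bytes per compaction step)."""

    @abstractmethod
    def layout(self, output) -> None:
        """How the merged output is laid out (levels, run sizes)."""

Leveled, tiered, and the hybrid layout discussed later differ only in how they answer these four questions.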
  23. Pure Leveled in Original LSM Design ■ Only 1 component per level! [diagram: C0 in memory; C1, C2, ..., Ck on disk, linked by merge sort]
  24-26. Flexible Leveled in Modern LSM Design [diagram: memory component plus on-disk levels L0 and L1; new runs land in L0 and are merged into L1]
  27. Partitioning Optimization for Leveled ■ Partitions the LSM-tree components into (usually fixed-size) fragments ■ A subset of a level can be merged into the next one (partial merge, sketched after the next diagram) ■ Bounds: ■ compaction operation time ■ temporary disk space during compaction lifetime
  28-29. Partitioning Optimization for Leveled [diagram: L1 and L2 are each split into SSTs across the key range; one L1 SST is merged with only the overlapping L2 SSTs]
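
Partitioning is what makes the partial merge possible: one L1 fragment is merged with only the L2 fragments whose key ranges overlap it. A hypothetical sketch, modeling each SST fragment as a (first_key, last_key) pair:

# Hypothetical partial-merge picking for partitioned leveled compaction.
l1 = [("a", "f"), ("g", "m"), ("n", "z")]
l2 = [("a", "c"), ("d", "h"), ("i", "p"), ("q", "z")]

def overlapping(frag, level):
    lo, hi = frag
    return [(a, b) for (a, b) in level if not (b < lo or a > hi)]

victim = l1[0]                       # compact a single L1 fragment...
inputs = [victim] + overlapping(victim, l2)
print(inputs)                        # ...against the 2 overlapping L2 SSTs
# Only this small subset is rewritten, which bounds both the duration of
# the compaction and the temporary disk space it needs.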
  30. Leveled Policy - Cost Analysis ■ Let T be the size ratio between adjacent levels ■ Let L be the number of levels for a given LSM tree ■ Write amplification: O(T * L) ■ Space amplification: O((T + 1) / T) = ~1.1
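
To make these bounds concrete, a worked example with illustrative values (T = 10 and L = 4 are assumptions, not numbers from the deck):

\[
\text{write amplification} \approx T \cdot L = 10 \cdot 4 = 40,
\qquad
\text{space amplification} \approx \frac{T + 1}{T} = \frac{11}{10} = 1.1 .
\]

An entry may be rewritten up to T times within each level before it moves down, across all L levels; conversely, the levels above the largest one hold at most 1/T of its data, so duplicate versions add only about 10% of space.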
  31. Stepped-Merge Algorithm ■ The 1997 paper “Incremental organization for data recording and warehousing” presents a new approach to LSM-tree layout ■ “Our goal is to design a technique that supports both insertion and queries with reasonable efficiency, and without the delays of periodic batch processing.” ■ Gives birth to the tiered compaction policy
  32-34. Tiered Compaction Policy [diagram: similar-sized SSTs accumulate in L0; once enough gather, they are merged into a single larger SST in L1]
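
A minimal sketch of the tiered trigger and file-picking primitives, in the size-tiered flavor ScyllaDB inherits from Cassandra (the bucketing bounds and the threshold of 4 are illustrative assumptions):

# Hypothetical size-tiered trigger: compact a tier once it accumulates
# enough similar-sized SSTables. Sizes are in MB.
MIN_THRESHOLD = 4

def pick_bucket(sstable_sizes, low=0.5, high=1.5):
    """Group SSTables whose size is within [low, high] x the bucket average."""
    buckets = []
    for size in sorted(sstable_sizes):
        for bucket in buckets:
            avg = sum(bucket) / len(bucket)
            if low * avg <= size <= high * avg:
                bucket.append(size)
                break
        else:
            buckets.append([size])
    # Trigger: any bucket with >= MIN_THRESHOLD members gets compacted.
    return next((b for b in buckets if len(b) >= MIN_THRESHOLD), None)

print(pick_bucket([64, 70, 66, 61, 640]))  # the four ~64 MB SSTs merge into
# one larger SST; the 640 MB SST waits until peers of its size appear.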
  35. Tiered Policy - Cost Analysis ■ Let T be the size ratio between adjacent levels ■ Let L be the number of levels for a given LSM tree ■ Write amplification: O(L) ■ Space amplification: O(T * L)
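
With the same illustrative values (T = 10, L = 4), the costs invert relative to leveled:

\[
\text{write amplification} \approx L = 4,
\qquad
\text{space amplification} \approx T \cdot L = 40 \ \text{(worst case)} .
\]

Each entry is written only once per level, but up to T overlapping runs can coexist in each of the L levels, so under heavy overwrites stale copies of a key pile up until a merge finally removes them.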
  36. Now the ScyllaDB journey begins. The database inherited all the LSM-tree improvements described so far… but they weren’t enough.
  37-38. Tiered - Temporary Space Problem! [diagram: merging L0's SSTs into one L1 SST leaves inputs and output coexisting until the merge finishes: 100% temporary space overhead]
  39-41. Partitioning Optimization for Tiered [diagram: the input SSTs are compacted fragment by fragment, and exhausted input fragments are deleted while the merge is still running]
  42. Tiered Policy - Partitioning Optimization ■ Bounds temporary space overhead significantly ■ Allows disk space usage from 50% to 80% and beyond ■ Available in ScyllaDB as Incremental Compaction Strategy (ICS)
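
The mechanism behind ICS, in a hypothetical back-of-the-envelope model (fragment size and run sizes are made up): the merge writes its output in fixed-size fragments, and each input fragment is deleted as soon as it has been fully consumed, so peak temporary space stays near one fragment instead of 100% of the input.

# Hypothetical model of incremental compaction's temporary space usage.
FRAG = 1.0   # fragment size, arbitrary units

def peak_temp_space(total_input, n_runs):
    consumed = [0.0] * n_runs        # progress into each input run
    live_inputs = float(total_input) # input bytes still on disk
    written = 0.0                    # output bytes written so far
    peak_extra = 0.0
    while written < total_input:
        written += FRAG              # seal one output fragment
        for i in range(n_runs):      # consumption spreads across the runs
            consumed[i] += FRAG / n_runs
            while consumed[i] >= FRAG:   # a whole input fragment is spent:
                consumed[i] -= FRAG      # delete it immediately
                live_inputs -= FRAG
        peak_extra = max(peak_extra, written + live_inputs - total_input)
    return peak_extra

print(peak_temp_space(total_input=8, n_runs=2))   # 1.0
# Peak overhead is ~1 fragment. A non-incremental merge of the same runs
# would need 8 extra units (100%), since no input could be deleted before
# the entire output existed.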
  43-45. LSM tree - Efficiency Space [diagram: a spectrum from space-optimized to write-optimized; pure leveled sits at the space-optimized end, pure tiered at the write-optimized end]
  46. But the world is not only black and white. There are shades of gray in between…
  47. Hybrid LSM-tree data layout ■ Largest level is space optimized ■ Other levels are write optimized ■ Addresses the O(K) space amplification of tiered in overwrite workloads, where K = number of components per level
  48-49. Hybrid LSM-tree data layout [diagram: L0 and L1 hold multiple SSTs (write-optimized levels); the largest level, L2, holds a single run (space-optimized level)]
  50-51. Hybrid LSM - Efficiency Space [diagram: hybrid sits between pure tiered and pure leveled on the space/write spectrum]
  52. Hybrid LSM-tree data layout ■ Reduces space amplification in overwrite-intensive workloads ■ = less space amplification ■ = increased storage density per node ■ = more money in your pocket ■ Available as the space amplification goal (SAG) option of Incremental Compaction Strategy
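
One plausible way SAG could steer the hybrid layout (hypothetical logic; the deck only states that ICS exposes SAG as a configuration option): estimate space amplification as total data size divided by the size of the largest run, and force a cross-tier merge into the largest, space-optimized level whenever the estimate exceeds the goal.

# Hypothetical SAG-style trigger: keep estimated space amplification
# (total bytes / bytes in the largest run) below a configured goal.
SAG = 1.5   # assumed goal: tolerate at most 50% duplicated space

def needs_cross_tier_merge(run_sizes):
    largest = max(run_sizes)
    est_space_amp = sum(run_sizes) / largest
    return est_space_amp > SAG

runs = [1000, 200, 180, 190]          # largest run plus three smaller tiers
print(needs_cross_tier_merge(runs))   # True: 1570/1000 = 1.57 > 1.5, so the
# smaller runs are merged into the space-optimized largest level, while the
# levels above it keep their cheap, write-optimized tiered behavior.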
  53-55. LSM-tree & tombstones [diagram: KEY A is written and flushed to L1; a tombstone for KEY A later lands in L0 and shadows the older value]
  56. Suboptimal LSM-tree tombstone handling [diagram: KEY A, its tombstone, and garbage collection across L0 and L1]
  57. Efficient LSM-tree tombstone handling [diagram: KEY A, its tombstone, and garbage collection across L0 and L1]
  58. Efficient LSM-tree tombstone handling ■ Piggybacks on incremental compaction to bound temporary disk space ■ Triggers (avoiding write amplification issues): ■ File staleness ■ Tombstone density threshold ■ Available in Incremental Compaction Strategy (ICS) by default
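
A sketch of the two triggers named above (the thresholds are assumptions for illustration; the deck does not give the values ICS actually uses):

# Hypothetical tombstone-GC trigger combining the two conditions from the
# slide: file staleness and tombstone density.
import time

MAX_STALENESS_S = 7 * 24 * 3600   # assumed: files older than a week are stale
MAX_TOMBSTONE_RATIO = 0.2         # assumed: >20% droppable tombstones

def should_gc(sstable):
    stale = time.time() - sstable["created_at"] > MAX_STALENESS_S
    dense = sstable["tombstones"] / sstable["rows"] > MAX_TOMBSTONE_RATIO
    # Compacting only when one of the conditions fires avoids the write
    # amplification of constantly rewriting files with little garbage.
    return stale or dense

old_sst = {"created_at": time.time() - 8 * 24 * 3600,
           "tombstones": 10, "rows": 1000}
print(should_gc(old_sst))   # True: the file is stale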
  59. Thank You. Stay in Touch: Raphael Carvalho, raphaelsc@scylladb.com, @raphael_scarv, raphaelsc
