In this talk, Felipe Mendes, Solutions Architect at ScyllaDB, discusses the top data modeling mistakes. He covers:
- Working with large collections
- Issues with low cardinality indexes/views
- Hot partitions
- Large partitions
- Dealing with Tombstones
5. Data Modeling Guidelines
NoSQL data modeling is very simple – but SURPRISINGLY EASY to
get wrong. Typical hiccups and roadblocks include:
■ Hotspots
■ High Imbalances
■ Large partitions, rows, cells, collections…
■ Tombstones
6. Data Modeling Guidelines
General guidelines typically involve:
■ Follow a query-driven design approach
■ Proper PRIMARY KEY selection
■ Seemingly* even data distribution
■ Avoiding bad access patterns
■ And…
12. Why are imbalances bad?
Imbalances introduce:
■ Over / under-utilized resources
■ Slower replicas
■ Higher latencies
■ Inability to read data any longer
■ Worse problems in extreme cases… 🥲
15. Large Partitions
How large is… Acceptable?
■ How latency sensitive are you? 😅
■ What's the average payload size?
■ Are large partitions a natural aspect?
■ How do you read from these partitions?
16. Large Partitions
Payload Size    Row Count    Partition Size
1 KB            10K          ~10 MB
2 KB            10K          ~20 MB
4 KB            10K          ~40 MB
How does payload affect the partition size? Pages are cut off at every 1 MB =
more client-server round-trips!
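The arithmetic behind the table can be sketched in a few lines. This is a rough estimate only: it assumes partition size is simply payload size times row count (ignoring storage overhead and compression) and treats the 1 MB page cut-off as exact. The function names are illustrative, not part of any ScyllaDB API.

```python
import math

PAGE_SIZE_BYTES = 1024 * 1024  # 1 MB page cut-off mentioned on the slide


def partition_size_bytes(payload_bytes: int, row_count: int) -> int:
    """Approximate partition size as payload size times row count."""
    return payload_bytes * row_count


def pages_for_full_read(payload_bytes: int, row_count: int) -> int:
    """Number of 1 MB pages (client-server round-trips) to read the whole partition."""
    return math.ceil(partition_size_bytes(payload_bytes, row_count) / PAGE_SIZE_BYTES)


for payload_kb in (1, 2, 4):
    size_mb = partition_size_bytes(payload_kb * 1024, 10_000) / PAGE_SIZE_BYTES
    pages = pages_for_full_read(payload_kb * 1024, 10_000)
    print(f"{payload_kb} KB x 10K rows = {size_mb:.1f} MB, about {pages} pages")
```

Doubling the payload doubles both the partition size and the number of round-trips needed to read it in full.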
19. Large Collections
Collections are meant for storing/denormalizing a relatively small
amount of data
■ Can be frozen or non-frozen
○ Frozen collections can only be updated as a whole
○ Non-frozen can be appended to (danger!)
○ (Re) Initializing a collection requires a tombstone 💀
■ Can be nested 🥲
■ Collections are stored as a single cell!
■ EXTREMELY EASY to nuke performance
20. Large Collections
Unlike partitions, collections are much stricter in size AND number of items
CREATE TABLE IF NOT EXISTS {table} (
    sensor_id uuid PRIMARY KEY,
    events map<timestamp, FROZEN<map<text, int>>>
);
■ Step 1: Create a nested map
■ Step 2: (Try to) append 1 million entries
■ Step 3: Enjoy! 💣 (and see scylladb/13686)
22. Large Collections
p99 > 1s? Why? ScyllaDB Engineer Michał Chojnowski explains:
■ Collection cells are stored in memory as sorted vectors
■ Adding elements requires a merge of two collections (old & new)
■ Adding an element has a cost proportional to the size of the
entire collection
■ Trees (instead of vectors) would improve the performance,
BUT…
■ Trees would make small collections less efficient!
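The cost described above can be illustrated with a toy model. This is not ScyllaDB's implementation: it only assumes, as the slide states, that a collection is a sorted vector and that every append is a merge of the old collection with the new element. Both function names are hypothetical.

```python
def merge_sorted(old, new):
    """Merge two key-sorted lists of (key, value) pairs; `new` wins on equal keys."""
    merged, i, j = [], 0, 0
    while i < len(old) and j < len(new):
        if old[i][0] < new[j][0]:
            merged.append(old[i])
            i += 1
        elif old[i][0] > new[j][0]:
            merged.append(new[j])
            j += 1
        else:
            merged.append(new[j])
            i += 1
            j += 1
    merged.extend(old[i:])
    merged.extend(new[j:])
    return merged


def total_elements_copied(n_appends):
    """Append one entry at a time and count elements written across all merges."""
    collection, copied = [], 0
    for key in range(n_appends):
        collection = merge_sorted(collection, [(key, key)])
        copied += len(collection)  # each merge rewrites the whole collection
    return copied


print(total_elements_copied(1_000))  # 500500, i.e. n * (n + 1) / 2
```

In this model, appending n entries one by one copies on the order of n² elements in total, which is why a collection with a million entries gets expensive quickly.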
23. Large Collections
Solution to the previous data model:
CREATE TABLE IF NOT EXISTS {table} (
    sensor_id uuid,
    record_time timestamp,
    events FROZEN<map<text, int>>,
    PRIMARY KEY(sensor_id, record_time)
);
■ Move the timestamp to a clustering key
27. Hot Partitions
Identifying hot partitions:
■ Find the affected nodes / shards on monitoring
■ Use nodetool toppartitions to sample and print hit-rate
■ When unsure (or too many tables to sample): Trace
■ Isolate and remediate the problem
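The idea of sampling requests and ranking partitions by hit count can be sketched as follows. This is only an illustration of the concept, not how nodetool toppartitions is implemented; the function name and the sampled keys are made up.

```python
from collections import Counter


def top_partitions(sampled_keys, n=3):
    """Rank partition keys by how often they appear in a sample of requests."""
    return Counter(sampled_keys).most_common(n)


# Hypothetical sample of partition keys seen during a short sampling window.
sample = ["sensor-7", "sensor-2", "sensor-7", "sensor-9", "sensor-7", "sensor-2"]
print(top_partitions(sample, n=2))  # [('sensor-7', 3), ('sensor-2', 2)]
```

A key that dominates the sample is a candidate hot partition worth isolating.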
28. Low Cardinality Indexes & Views
Everyone eventually tries this 😢
■ Indexes or Views are essentially like regular tables
■ Same data modeling practices apply
■ DON'TS:
○ Filter all inactive users
○ Retrieve tenants by state or country
○ Implement a type-ahead use case with views or indexes
○ Try to implement an ad-hoc querying use case on top of MVs
■ But if you really… really… really… REALLY need this, then…
29. Low Cardinality Indexes & Views
Still avoid the index/view … 🥲 Do efficient full table scans!
■ Partitions are hashed to a token
■ Tokens are placed in the token ring
■ Break down and scan the token ring!
SELECT * FROM {table} WHERE
TOKEN(pk) > ? AND
TOKEN(pk) <= ? AND
is_active = TRUE
ALLOW FILTERING
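Breaking down the token ring could look like the sketch below. It assumes the default Murmur3 partitioner with signed 64-bit tokens; the helper name `split_token_ring` is hypothetical. Each (start, end) pair is meant to be bound to the two `?` placeholders of the query above, so every sub-range is exclusive at the start and inclusive at the end.

```python
MIN_TOKEN = -(2 ** 63)     # assumed lower bound of the Murmur3 token space
MAX_TOKEN = 2 ** 63 - 1    # assumed upper bound of the Murmur3 token space


def split_token_ring(num_ranges):
    """Return contiguous (start, end] sub-ranges that together cover the whole ring."""
    if num_ranges < 1:
        raise ValueError("num_ranges must be at least 1")
    span = MAX_TOKEN - MIN_TOKEN
    bounds = [MIN_TOKEN + (span * i) // num_ranges for i in range(num_ranges + 1)]
    return list(zip(bounds[:-1], bounds[1:]))


for start, end in split_token_ring(4):
    # Each pair would be bound as: TOKEN(pk) > start AND TOKEN(pk) <= end
    print(start, end)
```

The sub-ranges are independent, so they can be scanned in parallel by several workers; how many ranges to use depends on cluster size and is left open here.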
30. Low Cardinality Indexes & Views
Data Nature        Average Nº of possible items    Large partition Nº
Boolean            2                               100%
Country / State    195                             ~10
Status             10                              2~3
Winning the lottery: Hotspots, large partitions, imbalances all in one!
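A back-of-the-envelope sketch shows why low cardinality leads to all three problems at once. The workload numbers are hypothetical, and MD5 is only a stand-in for the real partitioner; the point is that the number of distinct values caps how many partitions (and therefore shards) can share the load.

```python
import hashlib


def average_rows_per_partition(total_rows: int, distinct_values: int) -> float:
    """Rows landing in each index/view partition if values were evenly spread."""
    return total_rows / distinct_values


def shards_hit(values, num_shards: int) -> set:
    """Map each distinct value to one shard; MD5 stands in for the real partitioner."""
    return {
        int.from_bytes(hashlib.md5(v.encode()).digest()[:8], "big") % num_shards
        for v in values
    }


# Hypothetical workload: 10 million users indexed by a boolean is_active column.
print(average_rows_per_partition(10_000_000, 2))    # 5000000.0
print(len(shards_hit(["true", "false"], 64)) <= 2)  # True
```

With only two possible values, at most two partitions hold all the rows, no matter how many shards the cluster has.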
31. Tombstones ☠️
Users are often surprised that deletes are actually WRITES!
■ Deleting a record is actually "writing a tombstone"
■ Tombstones (deletes) can be classified as:
○ Cell-level tombstone (DELETE val FROM table WHERE key=?)
○ Range tombstone (DELETE FROM table WHERE key=? AND ts <= ?)
○ Row tombstone (DELETE FROM table WHERE key=? AND ts=?)
○ Partition tombstone (DELETE FROM table WHERE key=?)
■ Large tombstone runs slow down the read path
■ DELETE-heavy use cases require attention
36. Tombstones ☠️
Why do tombstones slow down the read path? Revisit the ScyllaDB write
path:
Reads need to scan through all SSTables!
Effectively increasing read latencies…
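A toy model makes the read-path cost concrete. It assumes each SSTable is a plain dictionary of key to (write timestamp, value), with a marker object standing in for a tombstone; real storage engines add indexes, filters, and caches that this sketch ignores. All names are illustrative.

```python
TOMBSTONE = object()  # marker written by a delete


def read_key(key, sstables):
    """Check every SSTable for `key` and keep the entry with the newest timestamp."""
    newest = None
    for table in sstables:  # every SSTable has to be consulted
        entry = table.get(key)
        if entry is not None and (newest is None or entry[0] > newest[0]):
            newest = entry
    if newest is None or newest[1] is TOMBSTONE:
        return None  # never written, or shadowed by a newer tombstone
    return newest[1]


sstables = [
    {"sensor-1": (1, "v1")},
    {"sensor-1": (2, "v2")},
    {"sensor-1": (3, TOMBSTONE)},  # the delete is just another, newer write
]
print(read_key("sensor-1", sstables))      # None
print(read_key("sensor-1", sstables[:2]))  # v2
```

The read returns nothing, yet it still had to visit the older data and the tombstone to find that out; this wasted work is what grows with long tombstone runs.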
40. Large Partitions / Cells / Collections
Easily found via system.large_* tables:
■ SELECT * FROM system.large_partitions;
■ SELECT * FROM system.large_rows;
■ SELECT * FROM system.large_cells;
■ Large Partition Hunting guide
41. Hot Partitions
Multiple ways to address:
■ When in doubt: nodetool toppartitions <keyspace> <table> <sample in ms>
■ Per Partition Rate Limit:
ALTER TABLE t WITH per_partition_rate_limit = {
'max_reads_per_second': 100,
'max_writes_per_second': 200
};
■ Client-side settings review
42. Hot Shards
■ Use the Monitoring:
○ Are the affected shards on the coordinator side?
○ Are the affected shards on the replica side?
■ Request shedding: --max-concurrent-requests-per-shard <n>
■ Tracing:
○ User-defined
○ Probabilistic
○ Slow Query Logging
○ Lightweight Slow Query Logging Mode
○ Woot!
43. Tombstones ☠️ Eviction
■ Be sure to select the right Compaction Strategy
■ Review DELETE patterns
■ Repair-based Tombstone Garbage Collection
■ New in 5.2: Empty Replica Pages
44. Keep in touch!
Felipe Mendes
Solution Architect
ScyllaDB
felipemendes@scylladb.com
LinkedIn