Top NoSQL Data Modeling Mistakes

Top Data Modeling Mistakes
Felipe Mendes
Real-world examples of wrong data modeling

Felipe Mendes
● Solution Architect at ScyllaDB
● Published Author
● Linux and Open Source enthusiast

Agenda
● Data Modeling Guidelines
● Shooting yourself in the foot
○ Large partitions & collections 🤌
○ Hot partitions 🤌
○ Low cardinality indexes & views 🤌
○ TOMBSTONES! ☠️
● Mistakes Diagnosis & Prevention

Data Modeling Guidelines
NoSQL data modeling is very simple – but SURPRISINGLY EASY to
get wrong. Typical hiccups and roadblocks include:
■ Hotspots
■ High Imbalances
■ Large partitions, rows, cells, collections…
■ Tombstones

Data Modeling Guidelines
General guidelines typically involve:
■ Follow a query-driven design approach
■ Proper PRIMARY KEY selection
■ Seemingly* even data distribution
■ Avoiding bad access patterns
■ And…

PLEASE … INSTALL THE MONITORING!

Imbalance Types
■ Driver load balancing settings
[Figure: queries sent, from the client-side perspective. One node receives much less traffic.]

Imbalance Types
■ Uneven Access Patterns
[Figure: how queries get balanced among replicas. One node receives much less traffic.]

Imbalance Types
■ The ideal world
Perfect distribution!

Imbalance Types
■ Data Distribution

Why are imbalances bad?
Imbalances introduce:
■ Over / under-utilized resources
■ Slower replicas
■ Higher latencies
■ Inability to read data any longer
■ Worse problems under the extreme… 🥲

Shooting yourself in the foot
A primer on how to break things & ruin your day

Large Partitions
How large is… LARGE?

Large Partitions
How large is… Acceptable?
■ How latency sensitive are you? 😅
■ What's the average payload size?
■ Are large partitions a natural aspect?
■ How do you read from these partitions?

Large Partitions
How does payload size affect the partition size?

Payload Size | Row Count | Partition Size
1 KB         | 10K       | ~10 MB
2 KB         | 10K       | ~20 MB
4 KB         | 10K       | ~40 MB

Pages are cut off at every 1 MB, so larger partitions mean more client-server round-trips!
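The arithmetic behind the table can be sketched as follows. This is a rough estimate only: it assumes the partition size is simply payload size times row count (ignoring per-row storage overhead) and that each 1 MB page costs one round-trip. The function names are made up for illustration.

```python
import math

KB = 1024
MB = 1024 * KB
PAGE_SIZE_BYTES = 1 * MB  # page cut-off mentioned on the slide


def partition_size_bytes(payload_bytes, row_count):
    """Rough partition size: payload times rows, ignoring storage overhead."""
    return payload_bytes * row_count


def pages_to_read(partition_bytes, page_bytes=PAGE_SIZE_BYTES):
    """Pages (client-server round-trips) needed to read the whole partition."""
    return max(1, math.ceil(partition_bytes / page_bytes))


for payload_kb in (1, 2, 4):
    size = partition_size_bytes(payload_kb * KB, 10_000)
    print(f"{payload_kb} KB x 10K rows = {size / MB:.1f} MB, "
          f"{pages_to_read(size)} pages")
```

Doubling the payload doubles the number of pages a full-partition read has to fetch, which is why the average payload size matters when deciding what "large" means.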

Large Partitions
Write path revisited:

Large Partitions
SSTable components require memory allocation:

Large Collections
Collections are meant for storing/denormalizing a relatively small
amount of data
■ Can be frozen or non-frozen
○ Frozen collections can only be updated as a whole
○ Non-frozen can be appended to (danger!)
○ (Re) Initializing a collection requires a tombstone 💀
■ Can be nested 🥲
■ Collections are stored as a single cell!
■ EXTREMELY EASY to nuke performance

Large Collections
Unlike partitions, collections are much stricter in both size AND nº of items
CREATE TABLE IF NOT EXISTS {table} (
    sensor_id uuid PRIMARY KEY,
    events map<timestamp, FROZEN<map<text, int>>>
);
■ Step 1: Create a nested map
■ Step 2: (Try to) append 1 million entries
■ Step 3: Enjoy! 💣 (and see scylladb/13686)

Large Collections
As more and more items get appended, latency climbs to heaven 🌥🥲

Large Collections
p99 > 1s? Why? ScyllaDB Engineer Michał Chojnowski explains:
■ Collection cells are stored in memory as sorted vectors
■ Adding elements requires a merge of two collections (old & new)
■ Adding an element has a cost proportional to the size of the
entire collection
■ Trees (instead of vectors) would improve the performance,
BUT…
■ Trees would make small collections less efficient!
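The cost described above can be illustrated with a toy model. This is not ScyllaDB's implementation, only a sketch that assumes a collection is a sorted Python list and that every append merges the old list with a one-element list into a fresh copy. The function names are hypothetical.

```python
def merge_sorted(old, new):
    """Two-way merge of two sorted lists into a fresh sorted list."""
    merged = []
    i = j = 0
    while i < len(old) and j < len(new):
        if old[i] <= new[j]:
            merged.append(old[i])
            i += 1
        else:
            merged.append(new[j])
            j += 1
    merged.extend(old[i:])
    merged.extend(new[j:])
    return merged


def append_one_by_one(n):
    """Append n distinct keys one at a time.

    Returns the final collection and the total number of elements copied
    across all merges, which is the work done by the appends.
    """
    collection = []
    copied = 0
    for key in range(n):
        collection = merge_sorted(collection, [key])
        copied += len(collection)  # every merge rewrites the whole collection
    return collection, copied


for n in (1_000, 2_000):
    _, copied = append_one_by_one(n)
    print(f"{n} appends copied {copied} elements in total")
```

In this model, doubling the number of appended items roughly quadruples the total work, since each append pays for the entire collection built so far.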

Large Collections
Solution to the previous data model:
CREATE TABLE IF NOT EXISTS {table} (
    sensor_id uuid,
    record_time timestamp,
    events FROZEN<map<text, int>>,
    PRIMARY KEY(sensor_id, record_time)
);
■ Move the timestamp to a clustering key

Hot Partitions
Typical hotspot sources:
■ Data modeling inefficiency (low cardinality)
■ Uneven application access patterns
■ Spam / Bots
■ Retry storms

Hot Partitions
ScyllaDB's shard-per-core architecture makes identifying hot shards
fairly easy:

Hot Partitions
Tip: The number of affected shards equals the replication factor!

Hot Partitions
Identifying hot partitions:
■ Find the affected nodes / shards on monitoring
■ Use nodetool toppartitions to sample and print hit-rate
■ When unsure (or too many tables to sample): Trace
■ Isolate and remediate the problem

Low Cardinality Indexes & Views
Everyone eventually tries this 😢
■ Indexes or Views are essentially like regular tables
■ Same data modeling practices apply
■ DON'TS:
○ Filter all inactive users
○ Retrieve tenants by state or country
○ Implement a type-ahead use case with views or indexes
○ Try to implement an ad-hoc querying use case on top of MVs
■ But if you really… really… really… REALLY need this, then…

Still avoid the index/view … 🥲 Do efficient full table scans!
■ Partitions are hashed to a token
■ Tokens are placed in the token ring
■ Break-down and scan the token ring!
SELECT * FROM {table} WHERE
    TOKEN(pk) > ? AND
    TOKEN(pk) <= ? AND
    is_active = TRUE
ALLOW FILTERING;
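Breaking down the token ring can be sketched as below. The sketch assumes the Murmur3 partitioner's token range of -2^63 to 2^63 - 1; the function name and the number of sub-ranges are made up for illustration. Each (start, end) pair is meant to be bound to the two ? placeholders of the query above, so every sub-range can be scanned independently and in parallel.

```python
MIN_TOKEN = -(2**63)      # assumption: Murmur3 partitioner token range
MAX_TOKEN = 2**63 - 1


def split_token_ring(num_ranges):
    """Split the ring into contiguous (start, end] sub-ranges.

    Matches the predicate TOKEN(pk) > start AND TOKEN(pk) <= end.
    """
    if num_ranges < 1:
        raise ValueError("num_ranges must be at least 1")
    span = MAX_TOKEN - MIN_TOKEN
    bounds = [MIN_TOKEN + (span * i) // num_ranges
              for i in range(num_ranges + 1)]
    return list(zip(bounds[:-1], bounds[1:]))


for start, end in split_token_ring(4):
    print(f"TOKEN(pk) > {start} AND TOKEN(pk) <= {end}")
```

How many sub-ranges to use and how many to scan concurrently depends on the cluster size; the value 4 here is only a placeholder.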

Data Nature     | Average Nº of possible items | Large partition nº
Boolean         | 2                            | 100%
Country / State | 195                          | ~10
Status          | 10                           | 2~3

Winning the lottery: Hotspots, large partitions, imbalances all in one!
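A back-of-the-envelope estimate shows why low cardinality hurts. The numbers below (10 million rows, 1 KB per row) are hypothetical, and the sketch assumes rows spread evenly across the distinct values, which real data rarely does; skew makes the largest partitions even bigger.

```python
MB = 1024 * 1024


def avg_rows_per_partition(total_rows, distinct_values):
    """Average rows per index/view partition, assuming an even spread."""
    return total_rows / distinct_values


def avg_partition_mb(total_rows, distinct_values, payload_bytes):
    """Average partition size in MB, ignoring storage overhead."""
    rows = avg_rows_per_partition(total_rows, distinct_values)
    return rows * payload_bytes / MB


TOTAL_ROWS = 10_000_000   # hypothetical table size
PAYLOAD_BYTES = 1024      # hypothetical 1 KB per row

for name, cardinality in (("Boolean", 2), ("Status", 10), ("Country / State", 195)):
    size_mb = avg_partition_mb(TOTAL_ROWS, cardinality, PAYLOAD_BYTES)
    print(f"{name}: ~{size_mb:,.0f} MB per partition")
```

Under these assumptions, an index or view keyed on a boolean ends up with two partitions holding half the table each.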

Tombstones ☠️
Users are often surprised that deletes are actually WRITES!
■ Deleting a record is actually "writing a tombstone"
■ Tombstones (deletes) can be classified as:
○ Cell-level tombstone (DELETE val FROM table WHERE key=?)
○ Range tombstone (DELETE FROM table WHERE key=? AND ts <= ?)
○ Row tombstone (DELETE FROM table WHERE key=? AND ts=?)
○ Partition tombstone (DELETE FROM table WHERE key=?)
■ Large tombstone runs slow down the read path
■ DELETE-heavy use cases require attention
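The idea that a delete is a write can be illustrated with a toy log-structured store. This is a simplified model, not ScyllaDB's storage engine: the class name, the logical clock, and the dict-based "SSTables" are all assumptions made for the example.

```python
TOMBSTONE = object()  # marker written in place of a value on delete


class ToyLSM:
    """Toy store: a memtable plus immutable, flushed 'SSTables'."""

    def __init__(self):
        self.sstables = []   # oldest first; each maps key -> (timestamp, value)
        self.memtable = {}
        self.clock = 0       # logical write timestamp

    def _write(self, key, value):
        self.clock += 1
        self.memtable[key] = (self.clock, value)

    def insert(self, key, value):
        self._write(key, value)

    def delete(self, key):
        # A delete does not remove anything: it writes a tombstone.
        self._write(key, TOMBSTONE)

    def flush(self):
        if self.memtable:
            self.sstables.append(self.memtable)
            self.memtable = {}

    def read(self, key):
        # The read checks every source and keeps the newest entry,
        # so it has to look at tombstones as well as live data.
        newest = None
        for source in self.sstables + [self.memtable]:
            entry = source.get(key)
            if entry is not None and (newest is None or entry[0] > newest[0]):
                newest = entry
        if newest is None or newest[1] is TOMBSTONE:
            return None
        return newest[1]


db = ToyLSM()
db.insert("sensor-1", "reading")
db.flush()
db.delete("sensor-1")
db.flush()
print(db.read("sensor-1"), len(db.sstables))
```

After the delete, both the original value and the tombstone still sit on "disk", and reads have to process both until something (compaction, in a real system) removes them.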

Tombstones ☠️
Why do tombstones slow down the read path? Let's revisit the ScyllaDB write
path once more:

Tombstones ☠️
Reads need to scan through all SSTables,
effectively increasing read latencies…

Tombstones ☠️
[Figure: write path with compaction]

Tombstones ☠️
Latency may become unacceptable:
Or data may become entirely unreadable:
6 seconds!

Mistakes Diagnosis & Prevention

Large Partitions / Cells / Collections
Easily found via system.large_* tables:
■ SELECT * FROM system.large_partitions;
■ SELECT * FROM system.large_rows;
■ SELECT * FROM system.large_cells;
■ Large Partition Hunting guide

Hot Partitions
Multiple ways to address:
■ When in doubt: nodetool toppartitions <keyspace> <table> <sample in ms>
■ Per Partition Rate Limit:
ALTER TABLE t WITH per_partition_rate_limit = {
    'max_reads_per_second': 100,
    'max_writes_per_second': 200
};
■ Client-side settings review

Hot Shards
■ Use the Monitoring:
○ Are the affected shards on the coordinator side?
○ Are the affected shards on the replica side?
■ Request shedding: --max-concurrent-requests-per-shard <n>
■ Tracing:
○ User-defined
○ Probabilistic
○ Slow Query Logging
○ Lightweight Slow Query Logging Mode
○ Woot!

Tombstones ☠️ Eviction
■ Be sure to select the right Compaction Strategy
■ Review DELETE patterns
■ Repair-based Tombstone Garbage Collection
■ New in 5.2: Empty Replica Pages

Keep in touch!
Felipe Mendes
Solution Architect
ScyllaDB
felipemendes@scylladb.com
LinkedIn
