Deep Dive Into Kafka Tiered Storage With Satish Duggana | Current 2022
KIP-405 introduced tiered storage in Apache Kafka. The design separates compute from storage, so brokers can focus largely on serving producer and consumer requests instead of managing storage beyond their local disks. The important caveat is that tiered storage must still keep the same consistency semantics and data lineage as local storage.
This talk dives into the internals of tiered storage and how those semantics are achieved, covering scenarios such as bootstrapping new brokers, brokers with hard failures, and out-of-sync brokers becoming leaders.
We will also cover how the segment lifecycle is managed, both under retention policies and when a topic or partition is deleted, without leaking any segments in tiered storage.
2. Introduction
Goals
● Scalability
● Efficiency
○ Operational
○ Cost
● Elasticity
Non-Goals
● It does not support compacted topics
● It does not support the JBOD feature
● Tiered storage does not replace ETL pipelines
3. Features
● Provides tiering in the storage layer beyond local drives
○ Memory/PageCache
○ Local drive
○ Remote storage support (including cloud stores like S3/GCS/Azure)
■ Same consistency and ordering semantics as local storage
● Improves efficiency
○ operational
○ cost
● Isolation between reads of the latest data and reads of old data
● Easy tuning and provisioning of clusters
● No changes required from clients
4. Local and Remote Log Segments
[Figure: Local Log Segments 6, 7, 8, 9 and the active segment; Remote Log Segments 1 to 7]
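To make the two tiers concrete, here is a minimal sketch of how a read for a given offset could be routed to a local or a remote segment. It is an illustration, not Kafka's code: `Segment` and `locate` are hypothetical names, segment k is assumed to hold offsets k*100 to k*100+99, and local segments are preferred when a segment has been copied to remote storage but is still retained on the local disk.

```python
from bisect import bisect_right
from dataclasses import dataclass


@dataclass(frozen=True)
class Segment:
    base_offset: int  # first offset stored in the segment
    end_offset: int   # last offset stored in the segment (inclusive)


def locate(offset, local_segments, remote_segments):
    """Return (tier, segment) for the segment holding `offset`.

    Both lists must be sorted by base_offset. Local segments are checked
    first, so a segment already copied to remote storage but still
    retained locally is served from the local disk.
    """
    for tier, segments in (("local", local_segments), ("remote", remote_segments)):
        bases = [s.base_offset for s in segments]
        i = bisect_right(bases, offset) - 1  # last segment starting at or before offset
        if i >= 0 and offset <= segments[i].end_offset:
            return tier, segments[i]
    raise LookupError(f"offset {offset} is not in any local or remote segment")


# Hypothetical layout: segments 6..9 plus an active segment (10) are local,
# segments 1..7 have been copied to remote storage.
local = [Segment(k * 100, k * 100 + 99) for k in range(6, 11)]
remote = [Segment(k * 100, k * 100 + 99) for k in range(1, 8)]

print(locate(150, local, remote))  # ('remote', Segment(base_offset=100, end_offset=199))
```

Tail reads (for example offset 650) resolve to local segments, while older offsets fall through to remote storage, which is the isolation between recent and old reads listed under Features.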
6. Follower Fetch
● There are two main states
○ Fetch
■ Fetch messages from the leader and append them to its log
○ Truncating
■ Truncate existing data so that its log segments follow the same log lineage as the leader
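The two follower states can be sketched as a tiny state machine. This is a simplified model, not the broker's replica fetcher: `FollowerReplica`, `on_fetch_response` and `on_divergence` are hypothetical names, offsets are assumed dense and starting at 0, and the truncation offset is assumed to be supplied by some lineage check.

```python
from enum import Enum, auto


class State(Enum):
    FETCHING = auto()    # appending records fetched from the leader
    TRUNCATING = auto()  # trimming the log back to the leader's lineage


class FollowerReplica:
    """Hypothetical follower; offsets are dense and start at 0, so index == offset."""

    def __init__(self):
        self.log = []
        self.state = State.FETCHING

    @property
    def log_end_offset(self):
        return len(self.log)

    def on_fetch_response(self, base_offset, records):
        # Fetch state: append records that continue the local log.
        if self.state is not State.FETCHING:
            raise RuntimeError("cannot append while truncating")
        if base_offset != self.log_end_offset:
            raise ValueError("fetch response does not continue the local log")
        self.log.extend(records)

    def on_divergence(self, truncation_offset):
        # Truncating state: drop every record at or above the divergence point
        # so the local log follows the leader's lineage, then resume fetching.
        self.state = State.TRUNCATING
        del self.log[truncation_offset:]
        self.state = State.FETCHING


replica = FollowerReplica()
replica.on_fetch_response(0, ["a", "b", "c"])
replica.on_divergence(1)             # leader's lineage diverges at offset 1
replica.on_fetch_response(1, ["x"])
print(replica.log)                   # ['a', 'x']
```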
11. Follower Fetch
1. Leader copies log segments with their auxiliary state (leader epoch cache and producer-id snapshots) to remote storage.
2. Leader publishes remote log segment metadata about the copied remote log segment.
3. Follower tries to fetch messages from the leader.
4. Follower waits until it has consumed the required remote log segment metadata.
5. Follower fetches the respective remote log segment metadata to build the auxiliary state.
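The five steps can be simulated in memory to show why the follower has to wait for the metadata before it can rebuild auxiliary state. Everything here is hypothetical: `RemoteTier` stands in for both the remote store and the metadata topic, the producer-id snapshot is an empty placeholder, and step 3 (the follower's fetch to the leader) is assumed to be what triggers `build_aux_state`.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RemoteSegmentMetadata:
    segment_id: str
    start_offset: int
    end_offset: int


class RemoteTier:
    """Hypothetical stand-in for remote storage plus the metadata topic."""

    def __init__(self):
        self.objects = {}       # segment_id -> (records, auxiliary state)
        self.metadata_log = []  # published RemoteSegmentMetadata, in order


def leader_tier_segment(tier, segment_id, start, end, records, epochs):
    # 1. Copy the segment with its auxiliary state to remote storage.
    aux = {"leader_epochs": list(epochs), "producer_snapshot": {}}
    tier.objects[segment_id] = (list(records), aux)
    # 2. Publish remote log segment metadata about the copied segment.
    tier.metadata_log.append(RemoteSegmentMetadata(segment_id, start, end))


class FollowerMetadataView:
    def __init__(self, tier):
        self.tier = tier
        self.position = 0  # next metadata event to consume
        self.known = []

    def poll(self, max_events=1):
        batch = self.tier.metadata_log[self.position:self.position + max_events]
        self.known.extend(batch)
        self.position += len(batch)

    def covering(self, offset):
        return next((m for m in self.known
                     if m.start_offset <= offset <= m.end_offset), None)

    def build_aux_state(self, target_offset, max_polls=10):
        # 3. (not modeled) the follower's fetch to the leader triggers this call.
        for _ in range(max_polls):
            # 4. Wait until the metadata for the segment before target_offset is consumed.
            meta = self.covering(target_offset - 1)
            if meta is not None:
                # 5. Use that metadata to read the auxiliary state from remote storage.
                _, aux = self.tier.objects[meta.segment_id]
                return aux
            self.poll()
        raise TimeoutError("required remote log segment metadata not consumed yet")


tier = RemoteTier()
leader_tier_segment(tier, "seg-0", 0, 99, range(100), [(0, 0)])
leader_tier_segment(tier, "seg-1", 100, 199, range(100, 200), [(0, 0), (1, 150)])
follower = FollowerMetadataView(tier)
aux = follower.build_aux_state(200)
print(aux["leader_epochs"], follower.position)  # [(0, 0), (1, 150)] 2
```

The follower only succeeds after consuming both metadata events, which mirrors step 4: the segment data can already be in remote storage while the follower's view of the metadata is still behind.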
12. Follower Fetch
● Which offset should the follower fetch from?
○ Last tiered offset
○ Earliest local offset
[Figure: Local Log Segments and Remote Log Segments with segment offsets 0, 1000, 2001, 9000 and 12000 and an active segment, marking the Earliest Local Offset and the Last Tiered Offset]
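The trade-off between the two starting points can be expressed as how many records a new follower must copy from the leader. This is a back-of-the-envelope sketch with hypothetical numbers; `fetch_start_options` is an illustrative name, and "fetch from the last tiered offset" is assumed here to mean starting at the first offset after it.

```python
def fetch_start_options(local_log_start_offset, remote_end_offsets, log_end_offset):
    """Compare two candidate starting points for a new follower.

    Returns {choice: (start_offset, records_to_copy_from_leader)}.
    """
    last_tiered_offset = max(remote_end_offsets)
    # Assumption: starting "from the last tiered offset" means fetching the
    # first offset after it; everything up to it is already in remote storage.
    after_tiered = last_tiered_offset + 1
    return {
        "earliest_local_offset": (local_log_start_offset,
                                  log_end_offset - local_log_start_offset),
        "last_tiered_offset": (after_tiered, log_end_offset - after_tiered),
    }


# Hypothetical partition: the local log starts at 2001, remote segments end
# at 999, 2000 and 8999, and the log end offset is 12500.
options = fetch_start_options(2001, [999, 2000, 8999], 12500)
print(options)
# {'earliest_local_offset': (2001, 10499), 'last_tiered_offset': (9000, 3500)}
```

In this sketch, starting at the earliest local offset mirrors the leader's local log but copies more data; starting after the last tiered offset copies less, and the overlapping range then exists on this replica only in remote storage.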
13. Follower Fetch
● Maintain the same log lineage
● Leader epochs
○ A monotonically increasing number that represents leader transitions
○ Stamped by the leader on each message batch appended to the log segment
○ A leader epoch sequence file maps each epoch to its start offset
■ Maintained by each replica
■ Enables maintaining the log lineage across the replicas
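A leader epoch sequence is enough to find where a follower's log stops matching the leader's, similar in spirit to the OffsetForLeaderEpoch exchange Kafka replicas use. This is a simplified sketch: `LeaderEpochCache`, `end_offset_for` and `truncation_offset` are hypothetical names, and the "no common epoch" case simply drops the whole log.

```python
from bisect import bisect_right


class LeaderEpochCache:
    """Per-replica sequence of (leader_epoch, start_offset), sorted by epoch."""

    def __init__(self, entries=()):
        self.entries = list(entries)

    def assign(self, epoch, start_offset):
        # Epochs only increase; ignore stale or repeated assignments.
        if not self.entries or epoch > self.entries[-1][0]:
            self.entries.append((epoch, start_offset))

    def end_offset_for(self, epoch, log_end_offset):
        """Largest known epoch <= `epoch` and the offset where it ends."""
        epochs = [e for e, _ in self.entries]
        i = bisect_right(epochs, epoch) - 1
        if i < 0:
            return -1, -1
        if i == len(self.entries) - 1:
            return self.entries[i][0], log_end_offset  # still the latest epoch
        return self.entries[i][0], self.entries[i + 1][1]


def truncation_offset(leader, leader_leo, follower, follower_leo):
    """Offset the follower should truncate to so it shares the leader's lineage."""
    follower_epoch = follower.entries[-1][0]
    epoch, leader_end = leader.end_offset_for(follower_epoch, leader_leo)
    if epoch < 0:
        return 0  # no common epoch: simplification, drop the whole log
    _, follower_end = follower.end_offset_for(epoch, follower_leo)
    return min(leader_end, follower_end)


# Hypothetical history: A led epoch 0 up to offset 150. B, holding only
# offsets 0..119, led epoch 1 and wrote 120..179 that A never received.
# A then became leader again in epoch 2 starting at offset 150.
leader = LeaderEpochCache([(0, 0), (2, 150)])
follower = LeaderEpochCache([(0, 0), (1, 120)])
print(truncation_offset(leader, 300, follower, 180))  # 120
```

B keeps offsets 0..119, which both replicas wrote under epoch 0, and discards its divergent epoch 1 records before fetching again.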
20. Follower Fetch - Empty follower
1. Fetch from offset 0
a. Receives OMTS (OffsetMovedToTieredStorage)
2. Fetch the ELO (Earliest Local Offset)
a. Receives ELO (leader epoch, offset)
3. Fetch remote segment info and build the local leader epoch sequence up to the ELO
a. Receives leader epoch sequence, producer-id snapshot
4. Fetch from the ELO to the HW (high watermark)
[Figure: Leader A and Follower B]
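The empty-follower sequence can be simulated end to end. This is an in-memory sketch under stated assumptions: `OffsetMovedToTieredStorage`, `Leader`, `EmptyFollower` and their methods are hypothetical, the leader's full epoch sequence stands in for the auxiliary state read from tiered segments, and the producer-id snapshot is omitted.

```python
from dataclasses import dataclass, field


class OffsetMovedToTieredStorage(Exception):
    """Plays the role of the OMTS fetch error in this sketch."""


@dataclass
class Leader:
    local_records: dict   # offset -> (leader_epoch, value), local segments only
    epoch_entries: list   # full leader epoch sequence [(epoch, start_offset)]
    high_watermark: int

    def fetch(self, offset, max_records):
        if offset < min(self.local_records):
            raise OffsetMovedToTieredStorage(offset)
        end = min(offset + max_records, self.high_watermark)
        return [(o, *self.local_records[o]) for o in range(offset, end)]

    def earliest_local_offset(self):
        elo = min(self.local_records)
        return self.local_records[elo][0], elo  # (leader epoch, offset)

    def remote_epoch_entries(self, up_to):
        # Stands in for reading the tiered segments' auxiliary state.
        return [(e, s) for e, s in self.epoch_entries if s <= up_to]


@dataclass
class EmptyFollower:
    records: dict = field(default_factory=dict)
    epoch_entries: list = field(default_factory=list)
    log_start_offset: int = 0

    def bootstrap(self, leader, batch=2):
        offset = 0                                    # 1. fetch from offset 0
        while offset < leader.high_watermark:
            try:
                fetched = leader.fetch(offset, batch)
            except OffsetMovedToTieredStorage:
                _, elo = leader.earliest_local_offset()                # 2. fetch ELO
                self.epoch_entries = leader.remote_epoch_entries(elo)  # 3. epochs up to ELO
                self.log_start_offset = offset = elo
                continue
            for o, epoch, value in fetched:           # 4. fetch from ELO to HW
                if not self.epoch_entries or epoch > self.epoch_entries[-1][0]:
                    self.epoch_entries.append((epoch, o))
                self.records[o] = (epoch, value)
                offset = o + 1


# Hypothetical leader A: offsets 0..4 exist only in remote storage.
leader_a = Leader(
    local_records={5: (1, "f"), 6: (1, "g"), 7: (2, "h"), 8: (2, "i")},
    epoch_entries=[(0, 0), (1, 3), (2, 7)],
    high_watermark=9,
)
follower_b = EmptyFollower()
follower_b.bootstrap(leader_a)
print(follower_b.log_start_offset, follower_b.epoch_entries)  # 5 [(0, 0), (1, 3), (2, 7)]
```

Follower B ends with the same epoch sequence as leader A even though it never copied offsets 0..4, which is what keeps the lineage check working for data that lives only in remote storage.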
37. Follower Fetch - Out of sync follower
● Follower trying to catch up with the leader
● Segments are copied to remote storage
○ Locally available
■ Fetch from the leader as it would without tiered storage
○ Locally not available
■ Truncate the data on the follower
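The out-of-sync decision reduces to comparing the follower's next fetch offset with the leader's earliest local offset. This is a minimal sketch: `ReplicaLog` and `catch_up` are hypothetical names, and it assumes the follower drops its entire local log and restarts at the leader's earliest local offset, rebuilding auxiliary state from remote storage as an empty follower does.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ReplicaLog:
    log_start_offset: int
    log_end_offset: int  # next offset the follower will fetch


def catch_up(follower, leader_local_log_start_offset):
    """Return (action, new follower log) for an out-of-sync follower."""
    if follower.log_end_offset >= leader_local_log_start_offset:
        # Locally available on the leader: fetch as without tiered storage.
        return "fetch", follower
    # Locally not available: the leader would reply with OMTS. Truncate the
    # follower's log and restart at the leader's earliest local offset,
    # rebuilding auxiliary state from remote storage like an empty follower.
    elo = leader_local_log_start_offset
    return "truncate", ReplicaLog(log_start_offset=elo, log_end_offset=elo)


print(catch_up(ReplicaLog(0, 500), 400))  # ('fetch', ReplicaLog(log_start_offset=0, log_end_offset=500))
print(catch_up(ReplicaLog(0, 300), 400))  # ('truncate', ReplicaLog(log_start_offset=400, log_end_offset=400))
```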