2. [Diagram: two Kafka partitions (each a b c d e) feed two Flink Kafka Consumers and a Flink Map Operator. Consumer offsets = 0, 0. Counter = 0. Zookeeper: offset partition 0: 0, offset partition 1: 0. Flink Checkpoint Coordinator: nothing pending or completed.]
This toy example reads from a Kafka topic with two partitions, each containing the messages “a”, “b”, “c”, …
The offset is set to 0 for both partitions, and a counter is initialized to 0.
3. [Diagram: two Kafka partitions (each a b c d e) feed two Flink Kafka Consumers and a Flink Map Operator. Record shown: a. Consumer offsets = 1, 0. Counter = 0. Zookeeper: offset partition 0: 0, offset partition 1: 0. Flink Checkpoint Coordinator: nothing pending or completed.]
The Kafka consumer starts reading messages from partition 0. Message “a” is in flight, and the offset for the first
consumer has been set to 1.
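The offset bookkeeping on this slide can be sketched in a few lines. This is a toy model, not the Flink or Kafka API: `ToySource`, `poll`, and `PARTITIONS` are hypothetical names, and the partitions are assumed to hold exactly the five messages drawn in the diagram.

```python
from dataclasses import dataclass

# Toy stand-in for the Kafka topic: two partitions with the messages from the slides.
PARTITIONS = [list("abcde"), list("abcde")]


@dataclass
class ToySource:
    """Hypothetical model of one consumer subtask; not the Flink API."""
    partition: int
    offset: int = 0  # offset of the next record to read

    def poll(self):
        """Return the next record and advance the offset, or None at the end."""
        log = PARTITIONS[self.partition]
        if self.offset >= len(log):
            return None
        record = log[self.offset]
        self.offset += 1
        return record


sources = [ToySource(0), ToySource(1)]
in_flight = sources[0].poll()            # the first consumer reads "a"
offsets = [s.offset for s in sources]    # consumer offsets after the read
print(in_flight, offsets)                # a [1, 0]
```

The offset always points at the next record to read, which is why it is already 1 while “a” is still in flight.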
4. [Diagram: two Kafka partitions (each a b c d e) feed two Flink Kafka Consumers and a Flink Map Operator. Records shown: a, a, b. Consumer offsets = 2, 1. Counter = 1. Zookeeper: offset partition 0: 0, offset partition 1: 0. Flink Checkpoint Coordinator: nothing pending or completed. Label: Trigger Checkpoint at source.]
Message “a” arrives at the map operator, and the counter is set to 1. Both consumers read their next records (“b” from
partition 0 and “a” from partition 1), and the offsets are set to 2 and 1 accordingly. In parallel, the checkpoint
coordinator decides to trigger a checkpoint at the source …
5. [Diagram: two Kafka partitions (each a b c d e) feed two Flink Kafka Consumers and a Flink Map Operator. Records shown: a, a, b, c. Consumer offsets = 3, 1. Counter = 2. Zookeeper: offset partition 0: 0, offset partition 1: 0. Flink Checkpoint Coordinator holds: offsets = 2, 1.]
The sources have created a snapshot of their state (“offsets = 2, 1”), which is now stored in the checkpoint coordinator.
Each source emitted a checkpoint barrier behind the last record covered by its snapshot: after “b” on partition 0 and
after “a” on partition 1.
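A minimal sketch of this step, assuming a toy source that snapshots its offset and then appends a barrier marker to its output. `ToySource`, `trigger_checkpoint`, and `BARRIER` are hypothetical names for illustration and do not correspond to Flink classes.

```python
BARRIER = object()  # hypothetical marker standing in for a checkpoint barrier


class ToySource:
    """Toy consumer subtask that records everything it sends downstream."""

    def __init__(self, log):
        self.log = log
        self.offset = 0
        self.out = []  # records and barriers emitted downstream, in order

    def emit_next(self):
        self.out.append(self.log[self.offset])
        self.offset += 1

    def trigger_checkpoint(self):
        """Snapshot the offset, then emit a barrier behind the records it covers."""
        snapshot = self.offset
        self.out.append(BARRIER)
        return snapshot


p0, p1 = ToySource("abcde"), ToySource("abcde")
p0.emit_next()  # partition 0 emits "a"
p0.emit_next()  # partition 0 emits "b"
p1.emit_next()  # partition 1 emits "a"

# The coordinator triggers the checkpoint and stores the source snapshots.
pending = {"offsets": [p0.trigger_checkpoint(), p1.trigger_checkpoint()]}

p0.emit_next()  # "c" is read after the barrier, so it is not part of the snapshot
print(pending)                   # {'offsets': [2, 1]}
print([p0.offset, p1.offset])    # [3, 1]
```

Because the barrier travels in the same stream as the records, every record ahead of it is covered by the snapshot and every record behind it is not.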
6. [Diagram: two Kafka partitions (each a b c d e) feed two Flink Kafka Consumers and a Flink Map Operator. Records shown: a, b, c, b. Consumer offsets = 3, 2. Counter = 3. Zookeeper: offset partition 0: 0, offset partition 1: 0. Flink Checkpoint Coordinator holds: offsets = 2, 1, counter = 3.]
The map operator has received checkpoint barriers from both sources. It checkpoints its state (counter = 3) to the
coordinator. Meanwhile, the consumers continue reading data from the Kafka partitions.
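The reason the operator waits for barriers from both sources can be illustrated with a sketch of barrier alignment. This is a simplified model handling a single checkpoint; `ToyCountingOperator` and the event encoding are assumptions for illustration, not Flink's implementation.

```python
BARRIER = "|"  # hypothetical barrier marker; real records here are single letters


class ToyCountingOperator:
    """Counts records from several inputs and snapshots once all barriers arrive."""

    def __init__(self, num_inputs):
        self.counter = 0
        self.num_inputs = num_inputs
        self.blocked = set()   # inputs whose barrier has already arrived
        self.buffered = []     # records held back from blocked inputs
        self.snapshots = []    # counter values captured at each alignment

    def on_event(self, channel, item):
        if item == BARRIER:
            self.blocked.add(channel)
            if len(self.blocked) == self.num_inputs:
                # All barriers are in: the counter reflects exactly the
                # records that precede the barriers on every input.
                self.snapshots.append(self.counter)
                self.blocked.clear()
                released, self.buffered = self.buffered, []
                self.counter += len(released)  # now process held-back records
        elif channel in self.blocked:
            self.buffered.append(item)  # post-barrier record: wait for alignment
        else:
            self.counter += 1


op = ToyCountingOperator(num_inputs=2)
events = [(0, "a"), (1, "a"), (0, "b"), (0, BARRIER),
          (0, "c"), (1, BARRIER), (1, "b")]
for channel, item in events:
    op.on_event(channel, item)

print(op.snapshots, op.counter)  # [3] 5
```

Record “c” arrives from partition 0 before the barrier from partition 1, so it is held back. Counting it immediately would produce a snapshot of 4, which would not match the source offsets 2 and 1.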
7. [Diagram: two Kafka partitions (each a b c d e) feed two Flink Kafka Consumers and a Flink Map Operator. Records shown: a, c, b. Consumer offsets = 3, 2. Counter = 4. Zookeeper: offset partition 0: 0, offset partition 1: 0. Flink Checkpoint Coordinator holds: offsets = 2, 1, counter = 3. Label: Notify checkpoint complete.]
The checkpoint coordinator informs the Kafka consumer that the checkpoint has been completed. The consumer then commits
the checkpoint’s offsets to Zookeeper. Note that Flink does not rely on the Kafka offsets in Zookeeper to restore from failures.
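The ordering here (complete the checkpoint first, commit offsets second) can be sketched as a notification callback. `ToyCoordinator`, `commit_offsets`, and the `zookeeper` dictionary are hypothetical stand-ins, not Flink or Zookeeper APIs.

```python
# Hypothetical stand-in for the offset store in Zookeeper.
zookeeper = {"partition 0": 0, "partition 1": 0}


class ToyCoordinator:
    """Collects task snapshots and notifies listeners once all have arrived."""

    def __init__(self, expected_tasks):
        self.expected_tasks = set(expected_tasks)
        self.pending = {}
        self.completed = None
        self.listeners = []

    def acknowledge(self, task, state):
        self.pending[task] = state
        if set(self.pending) == self.expected_tasks:
            self.completed, self.pending = self.pending, {}
            for notify in self.listeners:
                notify(self.completed)


def commit_offsets(checkpoint):
    # Informational only: recovery reads the checkpoint, not this store.
    for i, offset in enumerate(checkpoint["source"]):
        zookeeper[f"partition {i}"] = offset


coordinator = ToyCoordinator({"source", "map"})
coordinator.listeners.append(commit_offsets)

coordinator.acknowledge("source", [2, 1])
before = dict(zookeeper)           # nothing committed while the checkpoint is pending
coordinator.acknowledge("map", 3)  # last acknowledgement completes the checkpoint

print(before)     # {'partition 0': 0, 'partition 1': 0}
print(zookeeper)  # {'partition 0': 2, 'partition 1': 1}
```

Committing only after completion means the external offsets never run ahead of a state that Flink could actually restore.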
8. [Diagram: two Kafka partitions (each a b c d e) feed two Flink Kafka Consumers and a Flink Map Operator. Records shown: a, c, b. Consumer offsets = 3, 2. Counter = 4. Zookeeper: offset partition 0: 2, offset partition 1: 1. Flink Checkpoint Coordinator holds: offsets = 2, 1, counter = 3. Label: Checkpoint in Zookeeper.]
The checkpoint’s offsets (2 and 1) are now persisted in Zookeeper. External tools such as the Kafka Offset Checker can
therefore see the lag of the consumer group.
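As a rough illustration of what such a tool would report, lag is commonly computed per partition as the log end offset minus the committed offset. The log end offsets below are an assumption: they treat each partition as holding exactly the five messages drawn in the diagram.

```python
log_end_offsets = [5, 5]    # assumption: each partition holds exactly "a".."e"
committed_offsets = [2, 1]  # offsets committed to Zookeeper on this slide

# Lag per partition: records written to the log but not yet covered by a commit.
lag = [end - committed
       for end, committed in zip(log_end_offsets, committed_offsets)]
print(lag)  # [3, 4]
```

Since offsets are committed only when a checkpoint completes, the reported lag trails the consumers’ actual read positions (3 and 2 on this slide).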
9. [Diagram: two Kafka partitions (each a b c d e) feed two Flink Kafka Consumers and a Flink Map Operator. Records shown: c, b, d. Consumer offsets = 4, 2. Counter = 5. Zookeeper: offset partition 0: 2, offset partition 1: 1. Flink Checkpoint Coordinator holds: offsets = 2, 1, counter = 3.]
Processing advances further.
10. [Diagram: two Kafka partitions (each a b c d e) feed two Flink Kafka Consumers and a Flink Map Operator. Records shown: c, b, d. Consumer offsets = 4, 2. Counter = 5. Zookeeper: offset partition 0: 2, offset partition 1: 1. Flink Checkpoint Coordinator holds: offsets = 2, 1, counter = 3. Label: Failure.]
A failure occurs (for example, a worker failure).
11. [Diagram: two Kafka partitions (each a b c d e) feed two Flink Kafka Consumers and a Flink Map Operator. Consumer offsets = 2, 1. Counter = 3. Zookeeper: offset partition 0: 2, offset partition 1: 1. Flink Checkpoint Coordinator holds: offsets = 2, 1, counter = 3. Label: Reset all operators to last completed checkpoint.]
The checkpoint coordinator restores the state of all operators participating in the checkpoint. The Kafka
sources restart from offsets 2 and 1, and the counter’s value is reset to 3.
12. [Diagram: two Kafka partitions (each a b c d e) feed two Flink Kafka Consumers and a Flink Map Operator. Record shown: c. Consumer offsets = 3, 1. Counter = 3. Zookeeper: offset partition 0: 2, offset partition 1: 1. Flink Checkpoint Coordinator holds: offsets = 2, 1, counter = 3. Label: Continue processing …]
The system continues processing. The counter’s value remains consistent across the worker failure.
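The consistency claim can be checked with a small end-to-end simulation. This sketch makes several simplifying assumptions: records are processed synchronously in round-robin order with nothing in flight, a checkpoint is taken atomically every three records, and each partition holds exactly five messages. The function `run` and its parameters are hypothetical.

```python
def run(partitions, fail_after=None, checkpoint_every=3):
    """Count all records; optionally crash once after `fail_after` records.

    Returns the final counter and the checkpoint restored from (or None).
    """
    offsets = [0] * len(partitions)
    counter = 0
    checkpoint = {"offsets": list(offsets), "counter": counter}
    processed = 0      # records handled so far, including replayed ones
    restored = None
    while any(o < len(p) for o, p in zip(offsets, partitions)):
        for i, log in enumerate(partitions):
            if offsets[i] >= len(log):
                continue
            offsets[i] += 1
            counter += 1
            processed += 1
            if counter % checkpoint_every == 0:
                # Offsets and counter are captured together, as one checkpoint.
                checkpoint = {"offsets": list(offsets), "counter": counter}
            if fail_after == processed and restored is None:
                # Failure: reset both sources and the counter to the checkpoint.
                restored = checkpoint
                offsets = list(checkpoint["offsets"])
                counter = checkpoint["counter"]
                break
    return counter, restored


topic = [list("abcde"), list("abcde")]
clean_total, _ = run(topic)
total, restored = run(topic, fail_after=5)

print(restored)            # {'offsets': [2, 1], 'counter': 3}
print(clean_total, total)  # 10 10
```

Records read after the checkpoint are replayed after the restore, but since the counter is rolled back along with the offsets, each record is counted once in the final result.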