Gwen Shapira, Confluent, Engineering Leader
It is easy to find information on how Kafka's exactly-once semantics work. It isn't as easy to understand what it all means for you: what is and what is not guaranteed? Which kinds of use cases are a good fit, and which are unlikely to work as expected? In this talk, we will separate hype from reality and explore what Kafka's exactly-once semantics means to developers using Kafka.
https://www.meetup.com/KafkaBayArea/events/276013048/
Copyright 2021, Confluent, Inc. All rights reserved. This document may not be reproduced in any manner without the express written permission of Confluent, Inc.
Exactly-once semantics is two features:
● Idempotent producer
● Transactions:
  ○ Atomic multi-partition write
  ○ Read committed
Setting the stage
[Diagram: a stream-processing application consumes records A B C D from the input topic, processes them, produces A' to the output topic, and commits offset 1 to the offsets topic.]
Writing Exactly Once to Kafka - What can go wrong?
● Duplicate messages caused by producer retry
● Re-processing due to application crash
● Re-processing due to zombie application instance
[Diagrams: a retried produce leaves A in the output topic twice; after a crash (or with a zombie instance), a second consumer re-reads A B C D from the last committed offset and re-processes it.]
What does it solve? Duplicates caused by retries
[Diagram: the producer writes A' to the output topic but gets no response, retries, and the output topic ends up with A' twice.]
How does it solve it?
[Diagram: every produce request carries a producer id and a sequence number (p.id = 1, seq = 2). The first attempt is written but the response is lost. The retry carries the same p.id = 1, seq = 2; the broker has already seen that sequence, logs a duplicate warning, and does not append A' again.]
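The dedup mechanism above can be sketched as a toy model (plain Python, not the real broker code; the Broker class is illustrative):

```python
# Toy model of broker-side idempotent-producer dedup: the broker remembers
# the highest sequence number appended per producer id, and drops any
# append whose (p.id, seq) it has already written.
class Broker:
    def __init__(self):
        self.log = []
        self.last_seq = {}  # producer id -> highest sequence appended

    def append(self, pid, seq, record):
        if self.last_seq.get(pid, -1) >= seq:
            return "duplicate"          # retry of something already written
        self.log.append(record)
        self.last_seq[pid] = seq
        return "ok"

broker = Broker()
broker.append(pid=1, seq=2, record="A'")            # first attempt: written
# response is lost, so the producer retries the same batch:
status = broker.append(pid=1, seq=2, record="A'")
print(status, broker.log)                           # duplicate ["A'"]
```

The retry is dropped, so the log still contains A' exactly once.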
How to use it?
enable.idempotence=true
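As a sketch of the surrounding producer configuration (the broker address is a placeholder): idempotence requires acks=all, retries enabled, and at most five in-flight requests per connection, and recent Java clients validate and default these for you.

```properties
# producer.properties (sketch)
bootstrap.servers=localhost:9092
enable.idempotence=true
# idempotence requires the following; recent clients default to
# compatible values:
acks=all
retries=2147483647
max.in.flight.requests.per.connection=5
```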
What doesn’t it solve?
● Duplicates caused by calling producer.send() twice with the same record
● Duplicates from external sources
● Duplicates caused by two different producers
● Duplicates between application restarts
When to avoid it?
● If you don’t care about reliability.
● If you use very short-lived producers (a new producer per record, or close to it).
99.95% of the time, the idempotent producer is safe and recommended.
What does it solve? Duplicates due to crashes
[Diagram sequence: the application (1) consumes A B C D from the input topic and (2) produces A' B' C' to the output topic, but crashes before (3) committing its offset. The restarted instance resumes from the last committed offset, consumes A again, and writes a duplicate A' to the output topic.]
How does it solve it? Atomic multi-partition write
We want two writes “at the same time”. Or at least: either we wrote both the output record and the offset, or we pretend neither happened.
[Diagram sequence: (1) the producer registers its intent in the transaction log: “I’m going to write to the output and offsets topics”. (2) It writes A' to the output topic and the new offset to the offsets topic. (3) It tells the coordinator “Writing was successful. I’m about to commit!”. (4) Commit markers (C) are written to the output and offsets topics. (5) The transaction log records “Committed and we are done. TTYL”.]
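The commit protocol above can be sketched as a toy model (plain Python; the Coordinator class and topic names are illustrative, not Kafka's real implementation):

```python
# Toy model of an atomic multi-partition write: a coordinator logs which
# partitions joined the transaction, then on commit appends a commit marker
# to every one of them, so all the writes become visible together.
class Coordinator:
    def __init__(self):
        self.txn_log = []

    def begin(self, txn_id, partitions):
        # "I'm going to write to the output and offsets topics"
        self.txn_log.append(("begin", txn_id, partitions))

    def commit(self, txn_id, topics):
        # find the partitions registered for this transaction
        parts = next(p for op, t, p in self.txn_log
                     if op == "begin" and t == txn_id)
        for name in parts:
            topics[name].append("COMMIT")   # one marker per partition
        self.txn_log.append(("commit", txn_id))

topics = {"output": [], "offsets": []}
coord = Coordinator()
coord.begin("app-1", ["output", "offsets"])
topics["output"].append("A'")               # the actual writes
topics["offsets"].append(1)
coord.commit("app-1", topics)               # markers to both partitions
print(topics)  # {'output': ["A'", 'COMMIT'], 'offsets': [1, 'COMMIT']}
```

If the producer crashes before step (4), no markers are written and the pending records are eventually aborted, which is what "pretend neither happened" means in practice.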
Transactions - Consumer view
[Diagram sequence: a read_committed consumer reads the output topic. With only A' in the topic (transaction still open), the console shows nothing. After a commit marker (C) follows A', the console shows A'. Later records D' and E' from an open transaction stay hidden, and once that transaction is aborted (marker A) they are never delivered; a subsequent committed record V appears, so the console shows A', V.]
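The read_committed behavior can be sketched as a toy filter (plain Python; it assumes a single producer's records and markers arrive in order, which is a simplification of real fetches):

```python
# Toy model of read_committed consumption: records are held back until
# their transaction's control marker arrives; COMMIT releases them to the
# application, ABORT discards them.
def read_committed(log):
    pending, delivered = [], []
    for entry in log:
        if entry == "COMMIT":
            delivered.extend(pending)
            pending = []
        elif entry == "ABORT":
            pending = []
        else:
            pending.append(entry)
    return delivered

log = ["A'", "COMMIT", "D'", "E'", "ABORT", "V", "COMMIT"]
print(read_committed(log))  # ["A'", 'V']
```

Note that records at the tail with no marker yet (an open transaction) are simply not delivered, which is why a read_committed consumer can lag behind the log's end offset.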
What does it solve? Duplicates due to Zombies
[Diagram sequence: an application instance consumes A B C from the input topic, then stalls before producing. It is presumed dead, and a second instance takes over, consumes A B C, and produces A' B' C' to the output topic. The original “zombie” instance then wakes up and produces its own A' B' C', so the output topic contains the results twice.]
How does it solve it? Zombie fencing
How do we know that we have a zombie? We give a unique transactional.id to every application instance.
How do we know who is the zombie? Apps register their transactional.id when they start and get an epoch. The newest epoch wins.
[Diagram: the replacement instance registers the same transactional.id and gets epoch 1, while the zombie still holds epoch 0. When the zombie tries to produce A' with epoch 0, the broker rejects it: “Dude, you are dead.”]
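Epoch-based fencing can be sketched as a toy model (plain Python; the Coordinator class and the transactional.id value are illustrative):

```python
# Toy model of zombie fencing by epoch: registering a transactional.id
# bumps its epoch, and writes tagged with an older epoch are rejected.
class Coordinator:
    def __init__(self):
        self.epochs = {}

    def register(self, txn_id):
        self.epochs[txn_id] = self.epochs.get(txn_id, -1) + 1
        return self.epochs[txn_id]   # producer tags its writes with this

    def produce(self, txn_id, epoch, record, log):
        if epoch < self.epochs[txn_id]:
            return "fenced"           # "Dude, you are dead."
        log.append(record)
        return "ok"

coord = Coordinator()
log = []
zombie_epoch = coord.register("app-1")   # epoch 0: original instance
new_epoch = coord.register("app-1")      # epoch 1: replacement instance
coord.produce("app-1", new_epoch, "A'", log)            # ok
status = coord.produce("app-1", zombie_epoch, "A'", log)
print(status, log)  # fenced ["A'"]
```

The zombie's write never reaches the log, so only the replacement instance's output survives.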
How to use it? (The sane way)
Use Kafka Streams with processing.guarantee = exactly_once
(If you have 2.5+ brokers: exactly_once_beta)
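In configuration terms this is one line in your Streams properties; a minimal sketch (the application id and broker address are placeholders):

```properties
# Kafka Streams configuration (sketch)
application.id=my-streams-app
bootstrap.servers=localhost:9092
processing.guarantee=exactly_once
# with 2.5+ brokers:
# processing.guarantee=exactly_once_beta
```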
Picking a good transactional.id is non-trivial
What is “same instance of app”? In Kafka Streams it is “consuming from the same partition”.
Task = consumer + processor + producer
exactly_once uses the task id as the transactional.id
But…
● A producer per task is heavy
● You need to initialize a new producer on every rebalance
exactly_once_beta
● Does not use transactional.id for fencing
● Uses consumer group information instead: the group ID and the consumer group generation (epoch)
● Fencing happens during the offset commit, which includes the consumer group information
● Made possible by KIP-447
How to use it? (Hard and likely wrong way)
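(The original slides showed code here.) As a hedged sketch of what driving the transactional API directly looks like, using the confluent-kafka Python client against a running broker; the broker address, topic names, group id, and process() are placeholders, and real code must also distinguish abortable from fatal errors:

```python
from confluent_kafka import Consumer, Producer, TopicPartition

def process(value):
    return value  # placeholder for your processing step

consumer = Consumer({
    "bootstrap.servers": "localhost:9092",
    "group.id": "copy-app",                  # placeholder name
    "isolation.level": "read_committed",
    "enable.auto.commit": False,             # offsets go through the transaction
})
consumer.subscribe(["input"])

producer = Producer({
    "bootstrap.servers": "localhost:9092",
    "transactional.id": "copy-app-instance-1",  # unique per app instance
})
producer.init_transactions()                 # register the id, get an epoch

while True:
    msg = consumer.poll(1.0)
    if msg is None or msg.error():
        continue
    producer.begin_transaction()
    try:
        producer.produce("output", process(msg.value()))
        # commit the input offset as part of the same transaction
        offsets = [TopicPartition(msg.topic(), msg.partition(), msg.offset() + 1)]
        producer.send_offsets_to_transaction(
            offsets, consumer.consumer_group_metadata())
        producer.commit_transaction()
    except Exception:
        producer.abort_transaction()         # neither output nor offset happen
```

Getting this loop right across rebalances, error types, and multiple partitions is exactly why the Kafka Streams route is the sane way.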
What does it solve?
The main use case is accurate aggregation in stream processing applications.
Easy to use in any Kafka Streams application.
What doesn’t it solve?
● Side-effects during processing
● Reading from Kafka and writing to a database
● Reading from a database and writing to Kafka
● Replicating from one Kafka to another (unless replicating all topics)
● Publish-subscribe pattern (or rather - this depends a lot on the consumer)
When to avoid it?
If it doesn’t fit into a Kafka Streams app, it is probably not a good idea.
Don’t keep creating new transactional.ids.
Performance Notes
● Overhead on the producer is fixed per transaction:
  ○ Register the transactional.id once in the producer’s lifetime
  ○ Register each partition in the transaction once per partition per transaction
  ○ One extra commit marker per partition
  ○ A synchronous commit
● Consumer:
  ○ Reads the extra commit markers
  ○ read_committed waits for transaction commits, so large transactions increase end-to-end latency
Larger transactions mean higher throughput (due to lower per-record overhead), but higher end-to-end latency.
Two things to remember
● Use the idempotent producer, but not with FaaS.
● Use Kafka Streams with processing.guarantee = exactly_once (or exactly_once_beta if you have 2.5+ brokers).
Small Plug
The talk is based on a new chapter. Early release is available via O’Reilly Safari.
Thanks to Ron Dagostino, Justine Olshan, Lucas Bradstreet, Mike Bin, Bob Barrett, Boyang Chen, Guozhang Wang and Jason Gustafson for all the help.
Good resources
https://github.com/apache/kafka/blob/trunk/tools/src/main/java/org/apache/kafka/tools/TransactionalMessageCopier.java
https://www.confluent.io/blog/enabling-exactly-once-kafka-streams/
https://www.confluent.io/blog/transactions-apache-kafka/
https://www.confluent.io/blog/simplified-robust-exactly-one-semantics-in-kafka-2-5/
https://cwiki.apache.org/confluence/display/KAFKA/KIP-129%3A+Streams+Exactly-Once+Semantics