Apache Kafka's rise in popularity as a streaming platform has prompted a revisit of its traditional at-least-once message delivery semantics.
In this talk, we present the recent additions to Kafka that achieve exactly-once semantics (EoS), including support for idempotence and transactions in the Kafka clients. The main focus is the specific semantics that Kafka distributed transactions enable and the underlying mechanics that allow them to scale efficiently.
Agenda
• Why exactly-once?
• An overview of messaging semantics
• Why are duplicates introduced?
• What is exactly-once semantics?
• Exactly-once semantics in Kafka: is it practical?
• Next Steps
An overview of messaging semantics
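• At-most-once: a message is delivered zero or one times; it may be lost but is never duplicated.
• At-least-once: a message is never lost but may be delivered more than once.
• Exactly-once: each message is delivered, and its effect applied, exactly once.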
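For a consumer, the difference between the first two modes often comes down to when it commits its offset relative to processing a message. The Java sketch below is a toy model, not Kafka client code: the class and method names are hypothetical, and it simulates a single crash at offset 1 between the "process" and "commit" steps. Committing first loses the message (at-most-once); processing first replays it (at-least-once). Exactly-once needs the two steps to take effect atomically.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Toy model, not Kafka client code: a consumer works through one partition and
// crashes once, at offset 1, between its "process" and "commit offset" steps.
// After the crash it resumes from its last committed offset.
class DeliverySemanticsDemo {

    /** Returns the messages whose side effects were applied, in order. */
    static List<String> consume(List<String> partition, boolean commitFirst) {
        List<String> applied = new ArrayList<>();
        int committed = 0;      // offset to resume from after a crash
        boolean crashed = false;
        int offset = 0;
        while (offset < partition.size()) {
            boolean crashNow = !crashed && offset == 1;
            if (commitFirst) {
                committed = offset + 1;             // 1. commit the offset
                if (crashNow) {                     //    crash before processing:
                    crashed = true;
                    offset = committed;             //    the message is never processed
                    continue;
                }
                applied.add(partition.get(offset)); // 2. process
            } else {
                applied.add(partition.get(offset)); // 1. process
                if (crashNow) {                     //    crash before committing:
                    crashed = true;
                    offset = committed;             //    the message is processed again
                    continue;
                }
                committed = offset + 1;             // 2. commit the offset
            }
            offset++;
        }
        return applied;
    }

    public static void main(String[] args) {
        List<String> partition = Arrays.asList("m0", "m1", "m2");
        System.out.println("at-most-once:  " + consume(partition, true));
        System.out.println("at-least-once: " + consume(partition, false));
    }
}
```

Running main prints [m0, m2] for the commit-first consumer and [m0, m1, m1, m2] for the process-first consumer.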
Why exactly-once?
• Stream processing is becoming the norm; it’s more natural.
• Apache Kafka is the most popular streaming platform.
• Mission-critical applications require stronger guarantees.
In other words: make stream processing easy, simple, and reliable enough for everyone.
Why are duplicates introduced?
Various failures must be handled correctly:
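• A broker can fail.
• A producer-to-broker RPC can fail; for example, the write succeeds but the acknowledgement is lost, so the producer retries.
• The producer or consumer client can fail.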
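The RPC failure case is the one that most often produces duplicates: the write lands in the log, the acknowledgement is lost, and a producer that retries on a missing ack writes the record a second time. This is a toy model with hypothetical class names (NaiveBroker, RetryingProducer) and a fake fault injector, not Kafka code.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model, not Kafka code: a broker that appends every request it receives.
class NaiveBroker {
    final List<String> log = new ArrayList<>();
    private int dropAcksRemaining; // hypothetical fault injection: lose this many acks

    NaiveBroker(int dropAcks) {
        this.dropAcksRemaining = dropAcks;
    }

    /** Appends the record; returns false if the ack is "lost" on the way back. */
    boolean append(String record) {
        log.add(record);           // the write itself always succeeds
        if (dropAcksRemaining > 0) {
            dropAcksRemaining--;
            return false;          // ... but the producer never hears about it
        }
        return true;
    }
}

// A producer that retries whenever it does not receive an acknowledgement.
class RetryingProducer {
    static void send(NaiveBroker broker, String record, int maxRetries) {
        for (int attempt = 0; attempt <= maxRetries; attempt++) {
            if (broker.append(record)) {
                return;            // acknowledged
            }
            // No ack: the producer cannot tell "write failed" from "ack lost",
            // so it retries, and the retry duplicates a write that had landed.
        }
    }

    public static void main(String[] args) {
        NaiveBroker broker = new NaiveBroker(1);
        send(broker, "payment-42", 3);
        System.out.println(broker.log); // [payment-42, payment-42]
    }
}
```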
TL;DR – What we have today
• At-least-once, in-order delivery per partition.
• Producer retries can introduce duplicates and headaches.
The age-old engineering question
Before we make this work, are we sure we should?
Exactly-once semantics in Kafka, explained
Apache Kafka’s guarantees are stronger in three ways:
• Idempotent producer: exactly-once, in-order delivery per partition.
• Transactions: atomic writes across partitions.
• Exactly-once stream processing across read-process-write tasks.
Part 1/3: Idempotent Producer
Exactly-once, in-order delivery per partition
Idempotent Producer Semantics
A single successful producer.send results in exactly one copy of the message in the log, under all circumstances.
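A simplified sketch of how sequence numbers make retries safe: every send carries a producer id and a per-partition sequence number, and the broker refuses to append a sequence it has already written. The class IdempotentPartition and its three-way result are hypothetical; real Kafka tracks sequences per batch and keeps a small cache of recent batches per producer, which this toy omits.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Simplified sketch, not Kafka's broker code: the partition remembers the last
// sequence number it wrote for each producer id and uses it to drop retries.
class IdempotentPartition {
    enum Result { APPENDED, DUPLICATE, OUT_OF_ORDER }

    final List<String> log = new ArrayList<>();
    private final Map<Long, Integer> lastSeq = new HashMap<>(); // producerId -> last sequence

    Result append(long producerId, int seq, String record) {
        int last = lastSeq.getOrDefault(producerId, -1);
        if (seq <= last) {
            return Result.DUPLICATE;    // already in the log: acknowledge, but don't write again
        }
        if (seq != last + 1) {
            return Result.OUT_OF_ORDER; // a gap means an earlier write is missing
        }
        log.add(record);
        lastSeq.put(producerId, seq);
        return Result.APPENDED;
    }

    public static void main(String[] args) {
        IdempotentPartition p = new IdempotentPartition();
        p.append(7L, 0, "payment-42"); // original send; imagine its ack is lost
        p.append(7L, 0, "payment-42"); // the retry reuses sequence number 0
        System.out.println(p.log);     // [payment-42]
    }
}
```

Because the retry carries the same sequence number as the original, the second append is recognized as a duplicate and the log keeps exactly one copy.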
TL;DR: idempotent producer
• Works transparently: only one config change is needed.
• Producer IDs and sequence numbers are stored in the log.
• Resilient to broker failures, producer retries, etc.
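As a sketch of the "one config change", assuming a Java producer configured through java.util.Properties: idempotence is switched on with enable.idempotence=true, and it requires acks=all, which is set explicitly here. The string keys are Kafka producer config names; the broker address is a placeholder, and in a real application the Properties would be passed to a KafkaProducer constructor, which this stdlib-only snippet does not do.

```java
import java.util.Properties;

// Minimal sketch of producer settings for idempotence. Plain string keys keep
// the snippet compilable without the Kafka client library on the classpath.
class IdempotentProducerConfig {
    static Properties build(String bootstrapServers) {
        Properties props = new Properties();
        props.put("bootstrap.servers", bootstrapServers);
        props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        // The one change: turn on idempotence (broker-side dedup by producer id + sequence).
        props.put("enable.idempotence", "true");
        // Idempotence requires acks=all; setting it explicitly documents the intent.
        props.put("acks", "all");
        return props;
    }
}
```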
Part 2/3: Transactions
Atomic writes across multiple partitions.
Transactions semantics
• Atomic writes across multiple partitions.
• All messages in a transaction are made visible together, or none are.
• Consumers must be configured (isolation.level=read_committed) to skip uncommitted messages.
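One way to picture "visible together, or none" is the toy model below (hypothetical names, not Kafka's implementation): records are appended to each partition as they are sent, and when the transaction ends an outcome marker is written to every partition it touched. A read_committed reader surfaces only records whose transaction has a COMMIT marker. Kafka does write commit and abort markers into the partitions, but this sketch ignores offset ordering and everything else the real log does.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Toy model, not Kafka's implementation. Records are appended to a partition as
// soon as they are sent; ending the transaction writes a COMMIT or ABORT marker
// into every partition that holds data from it.
class TxnPartitions {
    enum Kind { DATA, COMMIT, ABORT }

    static final class Entry {
        final Kind kind;
        final String txnId;
        final String value; // null for marker entries

        Entry(Kind kind, String txnId, String value) {
            this.kind = kind;
            this.txnId = txnId;
            this.value = value;
        }
    }

    final Map<Integer, List<Entry>> partitions = new TreeMap<>();

    void send(String txnId, int partition, String value) {
        partitions.computeIfAbsent(partition, p -> new ArrayList<>())
                  .add(new Entry(Kind.DATA, txnId, value));
    }

    /** Writes the outcome marker to every partition the transaction touched. */
    void end(String txnId, boolean commit) {
        for (List<Entry> log : partitions.values()) {
            boolean touched = false;
            for (Entry e : log) {
                if (e.kind == Kind.DATA && e.txnId.equals(txnId)) touched = true;
            }
            if (touched) log.add(new Entry(commit ? Kind.COMMIT : Kind.ABORT, txnId, null));
        }
    }

    /** read_committed view of one partition: data whose transaction committed. */
    List<String> readCommitted(int partition) {
        List<Entry> log = partitions.getOrDefault(partition, new ArrayList<>());
        List<String> visible = new ArrayList<>();
        for (Entry e : log) {
            if (e.kind != Kind.DATA) continue;
            for (Entry m : log) {
                if (m.kind == Kind.COMMIT && m.txnId.equals(e.txnId)) {
                    visible.add(e.value);
                    break;
                }
            }
        }
        return visible;
    }
}
```

For example, a transaction that sends "debit A" to partition 0 and "credit B" to partition 1 shows nothing to readCommitted until end(txn, true) is called, after which both partitions expose their record; a transaction ended with end(txn, false) never becomes visible in either partition.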
Producer config for transactions
• transactional.id = 'some string'
• Typically derived from the partition identifier in a partitioned, stateful app.
• Enables transaction recovery across producer sessions.
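Recovery across sessions works because the transactional.id is stable: when a new producer session calls initTransactions with the same id, the coordinator bumps an epoch for that id and rejects writes from the older session (often called zombie fencing). The sketch below models only that epoch check. ToyTxnCoordinator and the transactionalIdFor naming scheme are hypothetical illustrations; real Kafka also completes any transaction the previous session left open at this point.

```java
import java.util.HashMap;
import java.util.Map;

// Toy model, not Kafka's transaction coordinator: each transactional.id maps to
// an epoch, and only the newest session for an id may write.
class ToyTxnCoordinator {
    private final Map<String, Integer> epochs = new HashMap<>();

    /** Called by a producer session at startup; returns that session's epoch. */
    int initTransactions(String transactionalId) {
        int next = epochs.getOrDefault(transactionalId, -1) + 1;
        epochs.put(transactionalId, next);
        return next;
    }

    /** Writes carrying an older epoch come from a fenced ("zombie") session. */
    boolean canWrite(String transactionalId, int epoch) {
        return epochs.getOrDefault(transactionalId, -1) == epoch;
    }

    /** Hypothetical naming scheme: one transactional.id per input partition. */
    static String transactionalIdFor(String appId, String topic, int partition) {
        return appId + "-" + topic + "-" + partition;
    }
}
```

Deriving the id from the input partition means that whichever instance takes over that partition after a failure reuses the same id, and therefore fences the previous owner.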
The transaction API
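import org.apache.kafka.common.KafkaException;
import org.apache.kafka.common.errors.AuthorizationException;
import org.apache.kafka.common.errors.OutOfOrderSequenceException;
import org.apache.kafka.common.errors.ProducerFencedException;
producer.initTransactions(); // once per producer session
try {
    producer.beginTransaction();
    producer.send(record0);
    producer.send(record1);
    producer.commitTransaction();
} catch (ProducerFencedException | OutOfOrderSequenceException | AuthorizationException e) {
    // Fatal errors: the producer cannot recover, so close it instead of aborting.
    producer.close();
} catch (KafkaException e) {
    // Any other error: abort so that none of this transaction's writes become visible.
    producer.abortTransaction();
}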
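The API above imposes an order on calls: initTransactions once, then begin, send, and either commit or abort. The toy state machine below (hypothetical, not KafkaProducer's internals) enforces that order and shows what each outcome means for visibility. For brevity it buffers records until commit, which is not how Kafka works: Kafka appends transactional records to the log immediately and decides their visibility later with markers.

```java
import java.util.ArrayList;
import java.util.List;

// Toy state machine, not KafkaProducer: it enforces the call order of the
// transaction API. Unlike Kafka, it buffers records until commit.
class ToyTransactionalProducer {
    enum State { UNINITIALIZED, READY, IN_TRANSACTION }

    private State state = State.UNINITIALIZED;
    private final List<String> pending = new ArrayList<>();
    final List<String> visible = new ArrayList<>(); // what read_committed readers would see

    private void require(State expected, String call) {
        if (state != expected) {
            throw new IllegalStateException("Cannot call " + call + " in state " + state);
        }
    }

    void initTransactions() {
        require(State.UNINITIALIZED, "initTransactions");
        state = State.READY;
    }

    void beginTransaction() {
        require(State.READY, "beginTransaction");
        state = State.IN_TRANSACTION;
    }

    void send(String record) {
        require(State.IN_TRANSACTION, "send");
        pending.add(record);
    }

    void commitTransaction() {
        require(State.IN_TRANSACTION, "commitTransaction");
        visible.addAll(pending); // all records become visible together ...
        pending.clear();
        state = State.READY;
    }

    void abortTransaction() {
        require(State.IN_TRANSACTION, "abortTransaction");
        pending.clear();         // ... or none of them do
        state = State.READY;
    }
}
```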
What do you get with isolation levels?
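• read_committed: consumers read only up to the first offset that belongs to a still-open transaction, and skip messages from aborted transactions.
• read_uncommitted: consumers read everything, including messages from open and aborted transactions.
• In both modes, messages are read in offset order.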
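The read_committed rule can be sketched against a single partition. In this toy model (hypothetical names, not Kafka's fetch code), the last stable offset is the offset of the first record that belongs to a still-open transaction; a read_committed reader stops there and drops records from aborted transactions, while read_uncommitted returns every data record in offset order.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Toy single-partition log, not Kafka's fetch path. Data records carry the id of
// their transaction; marker records end a transaction with COMMIT or ABORT.
class IsolationDemo {
    enum Kind { DATA, COMMIT, ABORT }

    static final class Rec {
        final Kind kind;
        final String txn;
        final String value;

        Rec(Kind kind, String txn, String value) {
            this.kind = kind;
            this.txn = txn;
            this.value = value;
        }
    }

    static Rec data(String txn, String value) { return new Rec(Kind.DATA, txn, value); }
    static Rec marker(Kind kind, String txn) { return new Rec(kind, txn, null); }

    /** read_uncommitted: every data record, in offset order. */
    static List<String> readUncommitted(List<Rec> log) {
        List<String> out = new ArrayList<>();
        for (Rec r : log) if (r.kind == Kind.DATA) out.add(r.value);
        return out;
    }

    /** Last stable offset: first data record whose transaction is still open. */
    static int lastStableOffset(List<Rec> log) {
        Set<String> finished = new HashSet<>();
        for (Rec r : log) if (r.kind != Kind.DATA) finished.add(r.txn);
        for (int i = 0; i < log.size(); i++) {
            Rec r = log.get(i);
            if (r.kind == Kind.DATA && !finished.contains(r.txn)) return i;
        }
        return log.size();
    }

    /** read_committed: stop at the LSO and skip records of aborted transactions. */
    static List<String> readCommitted(List<Rec> log) {
        Set<String> aborted = new HashSet<>();
        for (Rec r : log) if (r.kind == Kind.ABORT) aborted.add(r.txn);
        List<String> out = new ArrayList<>();
        int lso = lastStableOffset(log);
        for (int i = 0; i < lso; i++) {
            Rec r = log.get(i);
            if (r.kind == Kind.DATA && !aborted.contains(r.txn)) out.add(r.value);
        }
        return out;
    }
}
```

With committed t1 at offset 0, aborted t2 at offset 1, an open t3 at offset 4, and a committed t4 at offset 5, read_committed returns only t1's record: t4's record stays hidden until t3 finishes, because reads proceed in offset order.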
TL;DR: Transactions
• Atomic, multi-partition writes.
• Use the new producer APIs for transactions.
• Consumers can filter out uncommitted or aborted transactional messages.
Part 3/3: Stream Processing
Stream processing with exactly-once semantics
End-to-end exactly-once semantics
• The read-process-write operation is atomic.
• As a result, Kafka Streams tasks produce correct results even when failures happen.
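A toy version of the read-process-write loop, assuming each input record is handled in its own transaction: the output and the new input offset are committed in one step, as happens in Kafka when the consumed offsets are sent as part of the producer's transaction (sendOffsetsToTransaction in the Java client). The class below is hypothetical, and a crash simply discards the uncommitted output. After the crash the task re-reads from the committed offset, yet each input contributes exactly one committed output.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

// Toy model, not Kafka Streams: output records and the input offset are
// committed together, so a crash discards both or neither.
class ReadProcessWriteTask {
    final List<String> input;
    final List<String> committedOutput = new ArrayList<>();
    int committedOffset = 0;

    ReadProcessWriteTask(List<String> input) {
        this.input = input;
    }

    /** Processes all input, simulating one crash the first time offset crashAt is handled. */
    void run(int crashAt) {
        boolean crashed = false;
        while (committedOffset < input.size()) {
            int offset = committedOffset;               // always resume from the committed offset
            List<String> txnOutput = new ArrayList<>(); // writes not yet committed
            txnOutput.add(input.get(offset).toUpperCase(Locale.ROOT)); // "process"
            if (!crashed && offset == crashAt) {
                crashed = true; // crash: the transaction never commits, so
                continue;       // txnOutput is discarded and the offset stays put
            }
            committedOutput.addAll(txnOutput); // commit the output ...
            committedOffset = offset + 1;      // ... together with the new input offset
        }
    }

    public static void main(String[] args) {
        ReadProcessWriteTask task = new ReadProcessWriteTask(Arrays.asList("a", "b", "c"));
        task.run(1);
        System.out.println(task.committedOutput + " offset=" + task.committedOffset); // [A, B, C] offset=3
    }
}
```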
Performance boost for Apache Kafka 0.11!
• Up to +20% producer throughput
• Up to +50% consumer throughput
• Up to 20% lower disk utilization
• Details: https://bit.ly/kafka-eos-perf
What about the idempotent producer and transactions?
• Transactions: 3-5% overhead for 100 ms transactions with 1 KB messages.
• Longer transactions and better batching result in better performance.
• 20% overhead relative to at-most-once delivery without ordering guarantees.
• The idempotent producer alone has negligible overhead.
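The claim that longer transactions perform better follows from a simple amortization argument. Assume, as a simplification, that each transaction pays a roughly fixed commit cost C (coordinator round trips and marker writes) regardless of how much data it carries, and spends time T producing data. The symbols are ours, not taken from the benchmarks:

```latex
\text{relative overhead} \;=\; \frac{C}{C + T} \;\approx\; \frac{C}{T} \qquad (C \ll T)
```

Under this model, doubling the transaction duration T roughly halves the relative overhead. The model explains the direction of the trend only; it says nothing about the exact measured values.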
Putting it together
We talked through:
• The idempotent producer
• Transactions, which add atomic writes across partitions
• Their impact on stream processing
When is it available?
Available in Kafka 0.11, released June 2017.
Where we’ve come
• 2007: High-throughput messaging broker
• 2008: Highly available replicated log
• 2012: Top-level Apache project
• 2016: Streams API, Connect API
• 2017: Exactly-once semantics
What’s next for you
• Try it: Download Confluent Open Source
• Join the community: slackpass.io/confluentcommunity
• Let us know what you think: @Confluent