I will present the recent additions to Kafka (0.11.0) that achieve exactly-once semantics within its Streams API for stream processing use cases. This is achieved by leveraging the underlying idempotent and transactional client features. The main focus will be the specific semantics that Kafka distributed transactions enable in Streams, and the underlying mechanics that let Streams scale efficiently.
2. Outline
• Stream processing with Kafka
• Exactly-once for stream processing
• How Kafka Streams enabled exactly-once
3. Stream Processing with Kafka
[diagram: your app reads the Ads Clicks and Ads Displays topics, processes them against local state, and writes the Billing Updates and Fraud Suspects topics]
4. Stream Processing with Kafka
[diagram: the same pipeline, now showing the produce acks and the offset commit back to Kafka]
5. Stream Processing: Do it Yourself
while (isRunning) {
  // read some messages from Kafka
  inputMessages = consumer.poll(100);
  // do some processing…
  // send output messages back to Kafka, wait for the ack
  producer.send(outputMessages).get();
  // commit offsets for the processed messages
  consumer.commitSync();
}
6. DIY Stream Processing is Hard
• Ordering
• Partitioning & Scalability
• Fault tolerance
• State Management
• Time, Window & Out-of-order Data
• Re-processing
8. Exactly-Once
An application property for stream processing,
.. that for each received record,
.. its processing results will be reflected exactly once,
.. even under failures
9. Error Scenario #1: Duplicate Writes
[diagram: the Streams app consumes topics A and B and produces to topics C and D; the ack for a produce never arrives]
10. Error Scenario #1: Duplicate Writes
[diagram: the app retries the send after the missing ack, duplicating the write]
producer config: retries = N (default = 0)
11. Error Scenario #2: Re-process
[diagram: the Streams app acks its produces to topics C and D, updates its state, and commits offsets for topics A and B]
12. Error Scenario #2: Re-process
[diagram: after a failure, the restarted app resumes from the last committed offsets with a stale state and re-processes records]
18. Exactly-Once, the Kafka Way! (0.11+)
• Building blocks to achieve exactly-once:
• Idempotence: de-duped sends, in order, per partition
• Transactions: atomic multiple sends across topic partitions
• Kafka Streams: exactly-once enabled with a single knob
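These building blocks surface as plain client and Streams configs. A minimal sketch, assuming the 0.11-era Java clients; the `transactional.id` value below is a made-up example, not from the talk:

```java
import java.util.Properties;

public class ExactlyOnceConfigs {
    public static void main(String[] args) {
        // Producer side: idempotence lets brokers de-dupe retried sends,
        // in order, per partition.
        Properties producerProps = new Properties();
        producerProps.put("enable.idempotence", "true");
        // Transactions: atomic multiple sends across topic partitions.
        // "my-app-txn-1" is an illustrative id.
        producerProps.put("transactional.id", "my-app-txn-1");

        // Streams side: the single knob that turns all of this on.
        Properties streamsProps = new Properties();
        streamsProps.put("processing.guarantee", "exactly_once");

        System.out.println(streamsProps.getProperty("processing.guarantee"));
    }
}
```

These are config fragments only; a broker and a topology are still needed around them.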
23. Kafka Streams DSL
public static void main(String[] args) {
  // specify the processing topology by first reading in a stream from a topic
  KStreamBuilder builder = new KStreamBuilder();
  KStream<String, String> words = builder.stream("topic1");
  // count the words in this stream as an aggregated table
  KTable<String, Long> counts = words.groupBy(..).count("Counts");
  // write the result table to a new topic
  counts.to("topic2");
  // create a streams client and start running it
  KafkaStreams streams = new KafkaStreams(builder, config);
  streams.start();
}
30. State Store
KStream<..> stream1 = builder.stream("topic1");
KStream<..> stream2 = builder.stream("topic2");
KStream<..> joined = stream1.leftJoin(stream2, ...);
KTable<..> aggregated = joined.groupBy(…).count("store");
aggregated.to("topic3");
38. Exactly-Once with Kafka
[diagram: acked produces to sink topics C and D, offset commits for source topics A and B, and local state updates]
• Acked produce to sink topics
• Offset commit for source topics
• State update on processor
39. Exactly-Once with Kafka
• Acked produce to sink topics
• Offset commit for source topics
• State update on processor
40. Exactly-Once with Kafka
• Acked produce to sink topics
• Offset commit for source topics
• State update on processor
All or Nothing
41. Exactly-Once with Kafka Streams (0.11+)
• Acked produce to sink topics
• Offset commit for source topics
• State update on processor
42. Exactly-Once with Kafka Streams (0.11+)
• Acked produce to sink topics
• A batch of records sent to the offset topic
• State update on processor
43. Exactly-Once with Kafka Streams (0.11+)
• Acked produce to sink topics
• A batch of records sent to the offset topic
• A batch of records sent to changelog topics
44. Exactly-Once with Kafka Streams (0.11+)
• A batch of records sent to sink topics
• A batch of records sent to the offset topic
• A batch of records sent to changelog topics
45. Exactly-Once with Kafka Streams (0.11+)
• A batch of records sent to sink topics
• A batch of records sent to the offset topic
• A batch of records sent to changelog topics
All or Nothing
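Under the hood, this all-or-nothing step rides on the transactional producer API. Below is a sketch of one consume-process-produce cycle; it assumes a running broker, and the sink topic, group id, and class name are illustrative, not from the talk. In Streams, the changelog writes join the same transaction:

```java
import java.util.HashMap;
import java.util.Map;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.clients.consumer.OffsetAndMetadata;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.TopicPartition;

public class TransactionalLoop {
    public static void run(KafkaConsumer<String, String> consumer,
                           KafkaProducer<String, String> producer) {
        // Requires a transactional.id on the producer config.
        producer.initTransactions();
        while (true) {
            ConsumerRecords<String, String> records = consumer.poll(100);
            producer.beginTransaction();
            Map<TopicPartition, OffsetAndMetadata> offsets = new HashMap<>();
            for (ConsumerRecord<String, String> record : records) {
                // Processing results go to the sink topic ("sink-topic" is made up).
                producer.send(new ProducerRecord<>("sink-topic", record.key(), record.value()));
                // Track the next offset to consume for each source partition.
                offsets.put(new TopicPartition(record.topic(), record.partition()),
                            new OffsetAndMetadata(record.offset() + 1));
            }
            // Offsets ride in the same transaction as the output records...
            producer.sendOffsetsToTransaction(offsets, "my-group");
            // ...so sink writes and offset commits become visible atomically.
            producer.commitTransaction();
        }
    }
}
```

On failure, the transaction is aborted and downstream read-committed consumers never see the partial writes.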
71. Exactly-Once does NOT mean..
• the Two Generals problem can now be solved
• .. or the FLP result is proved wrong
• .. or TCP at the transport level is “perfect”
• .. or you can get distributed consensus in any setting
Thank you, and hello everyone, my name’s Guozhang. I’m very excited to be here and talk about …
Here is a quick spoiler alert of my talk.
I’ll start by giving some context about stream processing and what it looks like with Kafka. Then I will explain what exactly-once actually means for stream processing.
And finally, I will talk about what we have added in the latest release, 0.11, and how these building blocks are leveraged by Kafka’s stream processing API to help developers achieve exactly-once with Kafka.
So what does stream processing with Kafka look like? Here’s a concrete example: suppose you are building a real-time ads billing application with two input streams of data: one stream is a Kafka topic representing the events when an ad is displayed to some user, and the other stream is another Kafka topic representing the events when a user clicks on certain displayed ads.
These two streams of data are unbounded, since the front-end web servers will keep appending more and more events to these two Kafka topics.
The billing application’s goal is to calculate the cost of each clicked ad for the corresponding client, plus alert on potential fraud users, who may click on lots of ads in a very short period of time. The output of this application is also two data streams.
These may then be consumed by an offline data analytics system to generate dashboards, reports, and so on. So it forms the big picture of a pipeline.
And then it moves on to the next available event, and this loop repeats.
In practice, though, developers would not commit on each single record, for performance reasons; instead they only do that for a batch of records.
So how do we write this fetch-process-produce loop in code?
It seems OK at first look, but only until you deploy your code to production.
Data may arrive out of order, and computation and state may need to be partitioned for distributed processing.
And more importantly, your app could fail at any time: a bad config, a human error in deployment, a bug in your code.
You then need to worry about all these lower-level details about consistency and high-availability.
And all these issues would soon add up and cost you much more time than coding a first draft of your streaming applications.
Today, I would like to focus on a single slice of this iceberg, which is, how would you achieve correctness of your application along with fault tolerance.
It is not a network transport-level guarantee, but really an app-level property: given a stream processing application..
You lose that committed offset forever; hence, even though this record has completed processing, Kafka would not know that any more.
Bottom line: because of Kafka’s at-least-once semantics, we can have duplicated writes and duplicated processing.
This approach looks nice at first glance, but once you start doing it, you realize that since each application’s output could be another application’s input, and since each application can potentially generate duplicated writes to its output topics,
you would end up doing deduplication at each stage of your streaming computation pipeline. And it soon becomes a maintenance headache.
So, is there a better solution than this?
Strengthen Kafka itself and provide the building blocks to help developers write streaming applications that achieve exactly-once.
It supports event-at-a-time stateful processing and handles out-of-order data arrival.
Instead of an external store, Kafka Streams adds a local state store associated with the stateful processors to maintain the running state.
In terms of distributed processing, the Kafka Streams API has a tight integration with Kafka’s topic partitions.
For a typical Kafka streaming application
Idempotence is also enabled so that, within a single lifetime of the producer, its resent duplicated data can be detected and rejected by the brokers.
In practice, your streaming pipeline may not be completely within the closed world of Kafka.
You might send a request to some REST proxy during processing, or simply need to pipe your end results into another data system:
HDFS, S3, Elasticsearch, or a JDBC sink.
Distributed systems are all about trade-offs.
Effective availability: the empirically measured percentage of successful requests over some period, often measured in “9s”.
Algorithmic availability: a liveness property of an algorithm where every request to a non-failing node must eventually return a valid response.
The CAP theorem is only concerned with algorithmic availability. An algorithmic availability of 100% does not guarantee an effective availability of 100%: the algorithmic availability from the CAP theorem only applies if both the implementation and the execution of the algorithm are without error. In practice, most outages of an AP system are not due to network issues, which the algorithm can handle, but rather to implementation defects, user errors, misconfiguration, resource limits, and misbehaving clients.
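Since effective availability is usually quoted in “9s”, the count of nines is just a function of the measured success ratio. A small sketch; the class and method names are mine:

```java
public class Nines {
    // Number of leading nines in the success ratio, e.g. 999/1000 -> 3 nines.
    static int nines(long successes, long total) {
        double failureRate = 1.0 - (double) successes / total;
        // Guard the perfect case, where -log10(0) would be infinite.
        if (failureRate <= 0) return Integer.MAX_VALUE;
        // Small epsilon absorbs floating-point error at exact boundaries.
        return (int) Math.floor(-Math.log10(failureRate) + 1e-9);
    }

    public static void main(String[] args) {
        System.out.println(nines(999, 1000));     // 99.9%  -> 3
        System.out.println(nines(99999, 100000)); // 99.999% -> 5
    }
}
```

This is the empirical measure; it says nothing about whether the algorithm behind the service is available in the CAP sense.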