SlideShare a Scribd company logo
1 of 74
Exactly-once Stream Processing
with Kafka Streams
Guozhang Wang
Kafka Summit SF, Aug. 28, 2017
Outline
• Stream processing with Kafka
• Exactly-once for stream processing
• How Kafka Streams enabled exactly-once
2
3
Stream Processing with Kafka
Process
State
Ads Clicks
Ads Displays
Billing Updates
Fraud Suspects
Your App
4
Stream Processing with Kafka
Process
State
Ads Clicks
Ads Displays
Billing Updates
Fraud Suspects
ack
ack
commit
Your App
5
Stream Processing: Do it Yourself
while (isRunning) {
// read some messages from Kafka
inputMessages = consumer.poll();
// do some processing…
// send output messages back to Kafka, wait for ack
producer.send(outputMessages).get();
// commit offsets for processed messages
consumer.commit(..);
}
6
• Ordering
• Partitioning &
Scalability
• Fault tolerance
• State Management
• Time, Window &
Out-of-order Data
• Re-processing
DIY Stream Processing is Hard
7
API,
coding
“Full stack”
evaluation
Operations, debugging,
…
8
Exactly-Once
An application property for stream processing,
.. that for each received record,
.. its process results will be reflected exactly once,
.. even under failures
9
Error Scenario #1: Duplicate
Writes
Process
State
Kafka Topic A
Kafka Topic B
Kafka Topic C
Kafka Topic D
ack
Streams App
10
Error Scenario #1: Duplicate
Writes
Process
State
Kafka Topic A
Kafka Topic B
Kafka Topic C
Kafka Topic D
ack
producer config: retries = N (default = 0)
Streams App
11
Error Scenario #2: Re-process
Kafka Topic A
Kafka Topic B
Kafka Topic C
Kafka Topic D
commit
ack
ack
State
Process
Streams App
12
Error Scenario #2: Re-process
State
Process
Kafka Topic A
Kafka Topic B
Kafka Topic C
Kafka Topic D
State
Streams App
13
Process
State
Kafka Topic A
Kafka Topic B
Kafka Topic C
Kafka Topic D
Life before 0.11: At-least-once +
Dedup
14
Process
State
Kafka Topic A
Kafka Topic B
Kafka Topic C
Kafka Topic D
ack
ack
Life before 0.11: At-least-once +
Dedup
15
Process
State
Kafka Topic A
Kafka Topic B
Kafka Topic C
Kafka Topic D
ack
ack
commit
Life before 0.11: At-least-once +
Dedup
16
2
2
3
3
4
4
Life before 0.11: At-least-once +
Dedup
17
Exactly-once, the Kafka Way!(0.11+)
18
• Building blocks to achieve exactly-once
• Idempotence: de-duped sends in order per partition
• Transactions: atomic multiple-sends across topic partitions
• Kafka Streams: enable exactly-once in a single
knob
Exactly-once, the Kafka Way!(0.11+)
Kafka Streams (0.10+)
• New client library beyond producer and
consumer
• Powerful yet easy-to-use
• Event time, stateful processing
• Out-of-order handling
• Highly scalable, distributed, fault tolerant
19
20
Anywhere, anytime
Ok. Ok. Ok. Ok.
21
Anywhere, anytime
<dependency>
<groupId>org.apache.kafka</groupId>
<artifactId>kafka-streams</artifactId>
<version>0.11.0.0</version>
</dependency>
22
Simple is Beautiful
Kafka Streams DSL
23
public static void main(String[] args) {
// specify the processing topology by first reading in a stream from a topic
KStream<String, String> words = builder.stream(”topic1”);
// count the words in this stream as an aggregated table
KTable<String, Long> counts = words.groupBy(..).count(”Counts”);
// write the result table to a new topic
counts.to(”topic2”);
// create a streams client and start running it
KafkaStreams streams = new KafkaStreams(builder, config);
streams.start();
}
Kafka Streams DSL
24
public static void main(String[] args) {
// specify the processing topology by first reading in a stream from a topic
KStream<String, String> words = builder.stream(”topic1”);
// count the words in this stream as an aggregated table
KTable<String, Long> counts = words.groupBy(..).count(”Counts”);
// write the result table to a new topic
counts.to(”topic2”);
// create a streams client and start running it
KafkaStreams streams = new KafkaStreams(builder, config);
streams.start();
}
Kafka Streams DSL
25
public static void main(String[] args) {
// specify the processing topology by first reading in a stream from a topic
KStream<String, String> words = builder.stream(”topic1”);
// count the words in this stream as an aggregated table
KTable<String, Long> counts = words.groupBy(..).count(”Counts”);
// write the result table to a new topic
counts.to(”topic2”);
// create a streams client and start running it
KafkaStreams streams = new KafkaStreams(builder, config);
streams.start();
}
Kafka Streams DSL
26
public static void main(String[] args) {
// specify the processing topology by first reading in a stream from a topic
KStream<String, String> words = builder.stream(”topic1”);
// count the words in this stream as an aggregated table
KTable<String, Long> counts = words.groupBy(..).count(”Counts”);
// write the result table to a new topic
counts.to(”topic2”);
// create a streams client and start running it
KafkaStreams streams = new KafkaStreams(builder, config);
streams.start();
}
Processor Topology
27
State
KStream<..> stream1 = builder.stream(”topic1”);
KStream<..> stream2 = builder.stream(”topic2”);
KStream<..> joined = stream1.leftJoin(stream2, ...);
KTable<..> aggregated = joined.groupBy(…).count(”store”);
aggregated.to(”topic3”);
Processor
28
State
KStream<..> stream1 = builder.stream(”topic1”);
KStream<..> stream2 = builder.stream(”topic2”);
KStream<..> joined = stream1.leftJoin(stream2, ...);
KTable<..> aggregated = joined.groupBy(…).count(”store”);
aggregated.to(”topic3”);
Stream
29
State
KStream<..> stream1 = builder.stream(”topic1”);
KStream<..> stream2 = builder.stream(”topic2”);
KStream<..> joined = stream1.leftJoin(stream2, ...);
KTable<..> aggregated = joined.groupBy(…).count(”store”);
aggregated.to(”topic3”);
State Store
30
State
KStream<..> stream1 = builder.stream(”topic1”);
KStream<..> stream2 = builder.stream(”topic2”);
KStream<..> joined = stream1.leftJoin(stream2, ...);
KTable<..> aggregated = joined.groupBy(…).count(”store”);
aggregated.to(”topic3”);
Kafka Streams DSL
31
public static void main(String[] args) {
// specify the processing topology by first reading in a stream from a topic
KStream<String, String> words = builder.stream(”topic1”);
// count the words in this stream as an aggregated table
KTable<String, Long> counts = words.groupBy(..).count(”Counts”);
// write the result table to a new topic
counts.to(”topic2”);
// create a streams client and start running it
KafkaStreams streams = new KafkaStreams(builder, config);
streams.start();
}
Processor
Topology
32Kafka Streams Kafka
State
33
Kafka Topic B Kafka Topic A
P1
P2
P1
P2
Processing in Kafka
Streams
34
Kafka Topic B Kafka Topic A
Processor Topology
P1
P2
P1
P2
Processing in Kafka
Streams
35
Kafka Topic AKafka Topic B
Processing in Kafka
Streams
MyApp.2MyApp.1
Kafka Topic B
Task2Task1
36
Kafka Topic A
State State
Processing in Kafka
Streams
MyApp.2MyApp.1
Kafka Topic B
Task2Task1
37
Kafka Topic A
State State
Processing in Kafka
Streams
Kafka Changelog Topic
38
Process
State
Kafka Topic C
Kafka Topic D
ack
ack
Kafka Topic A
Kafka Topic B
commit
Exactly-once with Kafka
• Acked produce to sink topics
• Offset commit for source
topics
• State update on processor
39
• Acked produce to sink topics
• Offset commit for source
topics
• State update on processor
Exactly-Once with Kafka
40
• Acked produce to sink topics
• Offset commit for source
topics
• State update on processor
All or Nothing
Exactly-Once with Kafka
41
Exactly-Once with Kafka Streams
(0.11+)
• Acked produce to sink topics
• Offset commit for source
topics
• State update on processor
42
Exactly-Once with Kafka Streams
(0.11+)
• Acked produce to sink topics
• A batch of records sent to the offset topic
• State update on processor
43
• Acked produce to sink topics
• A batch of records sent to the offset topic
Exactly-Once with Kafka Streams
(0.11+)
• A batch of records sent to changelog
topics
44
Exactly-Once with Kafka Streams
(0.11+)
• A batch of records sent to sink topics
• A batch of records sent to the offset topic
• A batch of records sent to changelog
topics
45
Exactly-Once with Kafka Streams
(0.11+)
All or Nothing
• A batch of records sent to sink topics
• A batch of records sent to the offset topic
• A batch of records sent to changelog
topics
46Kafka Streams
Kafka Input Topics
Kafka Changelog Topic
Kafka Output Topic
try {
producer.beginTxn();
} catch (KafkaException e) {
}
Kafka Streams Exactly-Once
State
47Kafka Streams
Kafka Input Topics
Kafka Changelog Topic
Kafka Output Topic
Kafka Streams Exactly-Once
State
try {
producer.beginTxn();
recs = consumer.poll();
for (Record rec <- recs) {
// process ..
}
} catch (KafkaException e) {
}
Kafka Streams Exactly-
Once
48Kafka Streams
Kafka Input Topics
Kafka Changelog Topic
Kafka Output Topic
try {
producer.beginTxn();
recs = consumer.poll();
for (Record rec <- recs) {
// process ..
}
} catch (KafkaException e) {
}
State
Kafka Streams Exactly-
Once
49Kafka Streams
Kafka Input Topics
State
Kafka Changelog Topic
Kafka Output Topic
try {
producer.beginTxn();
recs = consumer.poll();
for (Record rec <- recs) {
// process ..
producer.send(“output”, ..);
producer.send(“changelog”, ..);
}
} catch (KafkaException e) {
}
Kafka Streams Exactly-
Once
50Kafka Streams
Kafka Input Topics
State
Kafka Changelog Topic
Kafka Output Topic
try {
producer.beginTxn();
recs = consumer.poll();
for (Record rec <- recs) {
// process ..
producer.send(“output”, ..);
producer.send(“changelog”, ..);
producer.sendOffsets(“input”, ..);
}
} catch (KafkaException e) {
}
Kafka Streams Exactly-
Once
51Kafka Streams
Kafka Input Topics
State
Kafka Changelog Topic
Kafka Output Topic
try {
producer.beginTxn();
recs = consumer.poll();
for (Record rec <- recs) {
// process ..
producer.send(“output”, ..);
producer.send(“changelog”, ..);
producer.sendOffsets(“input”, ..);
}
producer.commitTxn();
} catch (KafkaException e) {
producer.abortTxn();
}
52
StateProces
s
StateProces
s
StateProces
s
Exactly-Once with Failures
Kafka
Kafka Streams
Kafka Changelog
Kafka
53
StateProces
s
StateProces
s
StateProces
s
Exactly-Once with Failures
Kafka
Kafka Streams
Kafka Changelog
Kafka
54
StateProces
s
StateProces
s
Proces
s
Kafka
Kafka Streams
Kafka Changelog
Kafka
Exactly-Once with Failures
State
55
StateProces
s
StateProces
s
StateProces
s
Kafka
Kafka Streams
Kafka Changelog
Kafka
Exactly-Once with Failures
56
StateProces
s
StateProces
s
StateProces
s
Kafka
Kafka Streams
Kafka Changelog
Kafka
Exactly-Once with Failures
57
Exactly-Once with Failures
config: processing.mode = exactly-once
(default = at-least-once)
[KIP-98, KIP-
129]
58
API,
coding
“Full stack”
evaluation
Operations, debugging,
…
Simple is Beautiful
59
Life is Good with Exactly-Once, but..
60
What if not all my data is in Kafka?
61
62
Connectors
• 60+ since first
release (0.9+)
• 20+ from
& partners
(exactly-once coming)
63
Connect
End-to-End Exactly-Once
Connect
Connect
Connect
Connect
Connect
Connect
Streams
Streams
Streams
Streams
Streams
Exactly-Once Zone
Wild Wild West
Connect
Take-aways
• Exactly-once: important property for stream
processing
64
Take-aways
• Exactly-once: important property for stream
processing
• Kafka Streams: exactly-once processing made
easy
65
Take-aways
66
THANKS!
Guozhang Wang | guozhang@confluent.io | @guozhangwang
Additional Resources: http://www.confluent.io/resources
• Exactly-once: important property for stream
processing
• Kafka Streams: exactly-once processing made
easy
67
68
69
BACKUP SLIDES
Exactly-Once Performance
• 3 Brokers, 3 Streams on AWS m3.xlarge
• max.inflight.request = 1, commit.interval = 100ms
70
record size (bytes) at-least-once exactly-once
100 100% 68%
500 100% 75%
1024 100% 82%
config: processing.mode = exactly-once (default = at-least-once)
71
Exactly-Once does NOT mean..
• Two Generals problem can now be solved
• .. or FLP result is proved wrong
• .. or TCP at transport level is “perfect”
• .. or you can get distributed consensus in any
settings
72
Producer
Kafka Topic Cack
Idempotent Producer
pid = 1
pid = 1
seq = 28
pid = 1
seq = 28
73
Producer
Kafka Topic Cack (dup)
Idempotent Producer
pid = 1
pid = 1
seq = 28
pid = 1
seq = 28
config: enable.idempotence = true
74
We are Hiring!

More Related Content

What's hot

A visual introduction to Apache Kafka
A visual introduction to Apache KafkaA visual introduction to Apache Kafka
A visual introduction to Apache KafkaPaul Brebner
 
Fundamentals of Apache Kafka
Fundamentals of Apache KafkaFundamentals of Apache Kafka
Fundamentals of Apache KafkaChhavi Parasher
 
Architecture Patterns for Event Streaming (Nick Dearden, Confluent) London 20...
Architecture Patterns for Event Streaming (Nick Dearden, Confluent) London 20...Architecture Patterns for Event Streaming (Nick Dearden, Confluent) London 20...
Architecture Patterns for Event Streaming (Nick Dearden, Confluent) London 20...confluent
 
APACHE KAFKA / Kafka Connect / Kafka Streams
APACHE KAFKA / Kafka Connect / Kafka StreamsAPACHE KAFKA / Kafka Connect / Kafka Streams
APACHE KAFKA / Kafka Connect / Kafka StreamsKetan Gote
 
ksqlDB - Stream Processing simplified!
ksqlDB - Stream Processing simplified!ksqlDB - Stream Processing simplified!
ksqlDB - Stream Processing simplified!Guido Schmutz
 
Apache Kafka Introduction
Apache Kafka IntroductionApache Kafka Introduction
Apache Kafka IntroductionAmita Mirajkar
 
How Uber scaled its Real Time Infrastructure to Trillion events per day
How Uber scaled its Real Time Infrastructure to Trillion events per dayHow Uber scaled its Real Time Infrastructure to Trillion events per day
How Uber scaled its Real Time Infrastructure to Trillion events per dayDataWorks Summit
 
Spring Boot+Kafka: the New Enterprise Platform
Spring Boot+Kafka: the New Enterprise PlatformSpring Boot+Kafka: the New Enterprise Platform
Spring Boot+Kafka: the New Enterprise PlatformVMware Tanzu
 
Advanced Streaming Analytics with Apache Flink and Apache Kafka, Stephan Ewen
Advanced Streaming Analytics with Apache Flink and Apache Kafka, Stephan EwenAdvanced Streaming Analytics with Apache Flink and Apache Kafka, Stephan Ewen
Advanced Streaming Analytics with Apache Flink and Apache Kafka, Stephan Ewenconfluent
 
Introduction to Kafka Cruise Control
Introduction to Kafka Cruise ControlIntroduction to Kafka Cruise Control
Introduction to Kafka Cruise ControlJiangjie Qin
 
Introduction to Apache Kafka
Introduction to Apache KafkaIntroduction to Apache Kafka
Introduction to Apache KafkaJeff Holoman
 
Event Sourcing & CQRS, Kafka, Rabbit MQ
Event Sourcing & CQRS, Kafka, Rabbit MQEvent Sourcing & CQRS, Kafka, Rabbit MQ
Event Sourcing & CQRS, Kafka, Rabbit MQAraf Karsh Hamid
 
Performance Tuning RocksDB for Kafka Streams’ State Stores
Performance Tuning RocksDB for Kafka Streams’ State StoresPerformance Tuning RocksDB for Kafka Streams’ State Stores
Performance Tuning RocksDB for Kafka Streams’ State Storesconfluent
 
Messaging queue - Kafka
Messaging queue - KafkaMessaging queue - Kafka
Messaging queue - KafkaMayank Bansal
 
Scalable Monitoring Using Prometheus with Apache Spark Clusters with Diane F...
 Scalable Monitoring Using Prometheus with Apache Spark Clusters with Diane F... Scalable Monitoring Using Prometheus with Apache Spark Clusters with Diane F...
Scalable Monitoring Using Prometheus with Apache Spark Clusters with Diane F...Databricks
 
Managing multiple event types in a single topic with Schema Registry | Bill B...
Managing multiple event types in a single topic with Schema Registry | Bill B...Managing multiple event types in a single topic with Schema Registry | Bill B...
Managing multiple event types in a single topic with Schema Registry | Bill B...HostedbyConfluent
 
Confluent Workshop Series: ksqlDB로 스트리밍 앱 빌드
Confluent Workshop Series: ksqlDB로 스트리밍 앱 빌드Confluent Workshop Series: ksqlDB로 스트리밍 앱 빌드
Confluent Workshop Series: ksqlDB로 스트리밍 앱 빌드confluent
 

What's hot (20)

A visual introduction to Apache Kafka
A visual introduction to Apache KafkaA visual introduction to Apache Kafka
A visual introduction to Apache Kafka
 
Fundamentals of Apache Kafka
Fundamentals of Apache KafkaFundamentals of Apache Kafka
Fundamentals of Apache Kafka
 
Architecture Patterns for Event Streaming (Nick Dearden, Confluent) London 20...
Architecture Patterns for Event Streaming (Nick Dearden, Confluent) London 20...Architecture Patterns for Event Streaming (Nick Dearden, Confluent) London 20...
Architecture Patterns for Event Streaming (Nick Dearden, Confluent) London 20...
 
APACHE KAFKA / Kafka Connect / Kafka Streams
APACHE KAFKA / Kafka Connect / Kafka StreamsAPACHE KAFKA / Kafka Connect / Kafka Streams
APACHE KAFKA / Kafka Connect / Kafka Streams
 
ksqlDB - Stream Processing simplified!
ksqlDB - Stream Processing simplified!ksqlDB - Stream Processing simplified!
ksqlDB - Stream Processing simplified!
 
Apache Kafka Introduction
Apache Kafka IntroductionApache Kafka Introduction
Apache Kafka Introduction
 
How Uber scaled its Real Time Infrastructure to Trillion events per day
How Uber scaled its Real Time Infrastructure to Trillion events per dayHow Uber scaled its Real Time Infrastructure to Trillion events per day
How Uber scaled its Real Time Infrastructure to Trillion events per day
 
Spring Boot+Kafka: the New Enterprise Platform
Spring Boot+Kafka: the New Enterprise PlatformSpring Boot+Kafka: the New Enterprise Platform
Spring Boot+Kafka: the New Enterprise Platform
 
Advanced Streaming Analytics with Apache Flink and Apache Kafka, Stephan Ewen
Advanced Streaming Analytics with Apache Flink and Apache Kafka, Stephan EwenAdvanced Streaming Analytics with Apache Flink and Apache Kafka, Stephan Ewen
Advanced Streaming Analytics with Apache Flink and Apache Kafka, Stephan Ewen
 
Introduction to Kafka Cruise Control
Introduction to Kafka Cruise ControlIntroduction to Kafka Cruise Control
Introduction to Kafka Cruise Control
 
Introduction to Apache Kafka
Introduction to Apache KafkaIntroduction to Apache Kafka
Introduction to Apache Kafka
 
Apache Kafka at LinkedIn
Apache Kafka at LinkedInApache Kafka at LinkedIn
Apache Kafka at LinkedIn
 
Event Sourcing & CQRS, Kafka, Rabbit MQ
Event Sourcing & CQRS, Kafka, Rabbit MQEvent Sourcing & CQRS, Kafka, Rabbit MQ
Event Sourcing & CQRS, Kafka, Rabbit MQ
 
Performance Tuning RocksDB for Kafka Streams’ State Stores
Performance Tuning RocksDB for Kafka Streams’ State StoresPerformance Tuning RocksDB for Kafka Streams’ State Stores
Performance Tuning RocksDB for Kafka Streams’ State Stores
 
Apache Kafka
Apache KafkaApache Kafka
Apache Kafka
 
Messaging queue - Kafka
Messaging queue - KafkaMessaging queue - Kafka
Messaging queue - Kafka
 
Scalable Monitoring Using Prometheus with Apache Spark Clusters with Diane F...
 Scalable Monitoring Using Prometheus with Apache Spark Clusters with Diane F... Scalable Monitoring Using Prometheus with Apache Spark Clusters with Diane F...
Scalable Monitoring Using Prometheus with Apache Spark Clusters with Diane F...
 
Managing multiple event types in a single topic with Schema Registry | Bill B...
Managing multiple event types in a single topic with Schema Registry | Bill B...Managing multiple event types in a single topic with Schema Registry | Bill B...
Managing multiple event types in a single topic with Schema Registry | Bill B...
 
Confluent Workshop Series: ksqlDB로 스트리밍 앱 빌드
Confluent Workshop Series: ksqlDB로 스트리밍 앱 빌드Confluent Workshop Series: ksqlDB로 스트리밍 앱 빌드
Confluent Workshop Series: ksqlDB로 스트리밍 앱 빌드
 
Kafka presentation
Kafka presentationKafka presentation
Kafka presentation
 

Similar to Exactly-once Stream Processing with Kafka Streams

Kafka Summit SF 2017 - Exactly-once Stream Processing with Kafka Streams
Kafka Summit SF 2017 - Exactly-once Stream Processing with Kafka StreamsKafka Summit SF 2017 - Exactly-once Stream Processing with Kafka Streams
Kafka Summit SF 2017 - Exactly-once Stream Processing with Kafka Streamsconfluent
 
Introduction to Kafka Streams
Introduction to Kafka StreamsIntroduction to Kafka Streams
Introduction to Kafka StreamsGuozhang Wang
 
Exactly-once Data Processing with Kafka Streams - July 27, 2017
Exactly-once Data Processing with Kafka Streams - July 27, 2017Exactly-once Data Processing with Kafka Streams - July 27, 2017
Exactly-once Data Processing with Kafka Streams - July 27, 2017confluent
 
Apache Kafka, and the Rise of Stream Processing
Apache Kafka, and the Rise of Stream ProcessingApache Kafka, and the Rise of Stream Processing
Apache Kafka, and the Rise of Stream ProcessingGuozhang Wang
 
Consistency and Completeness: Rethinking Distributed Stream Processing in Apa...
Consistency and Completeness: Rethinking Distributed Stream Processing in Apa...Consistency and Completeness: Rethinking Distributed Stream Processing in Apa...
Consistency and Completeness: Rethinking Distributed Stream Processing in Apa...Guozhang Wang
 
I can't believe it's not a queue: Kafka and Spring
I can't believe it's not a queue: Kafka and SpringI can't believe it's not a queue: Kafka and Spring
I can't believe it's not a queue: Kafka and SpringJoe Kutner
 
Streams Don't Fail Me Now - Robustness Features in Kafka Streams
Streams Don't Fail Me Now - Robustness Features in Kafka StreamsStreams Don't Fail Me Now - Robustness Features in Kafka Streams
Streams Don't Fail Me Now - Robustness Features in Kafka StreamsHostedbyConfluent
 
Deploying Kafka Streams Applications with Docker and Kubernetes
Deploying Kafka Streams Applications with Docker and KubernetesDeploying Kafka Streams Applications with Docker and Kubernetes
Deploying Kafka Streams Applications with Docker and Kubernetesconfluent
 
Kafka Streams: the easiest way to start with stream processing
Kafka Streams: the easiest way to start with stream processingKafka Streams: the easiest way to start with stream processing
Kafka Streams: the easiest way to start with stream processingYaroslav Tkachenko
 
Testing Kafka components with Kafka for JUnit
Testing Kafka components with Kafka for JUnitTesting Kafka components with Kafka for JUnit
Testing Kafka components with Kafka for JUnitMarkus Günther
 
Kafka Connect and Streams (Concepts, Architecture, Features)
Kafka Connect and Streams (Concepts, Architecture, Features)Kafka Connect and Streams (Concepts, Architecture, Features)
Kafka Connect and Streams (Concepts, Architecture, Features)Kai Wähner
 
Streaming Design Patterns Using Alpakka Kafka Connector (Sean Glover, Lightbe...
Streaming Design Patterns Using Alpakka Kafka Connector (Sean Glover, Lightbe...Streaming Design Patterns Using Alpakka Kafka Connector (Sean Glover, Lightbe...
Streaming Design Patterns Using Alpakka Kafka Connector (Sean Glover, Lightbe...confluent
 
Running Kafka On Kubernetes With Strimzi For Real-Time Streaming Applications
Running Kafka On Kubernetes With Strimzi For Real-Time Streaming ApplicationsRunning Kafka On Kubernetes With Strimzi For Real-Time Streaming Applications
Running Kafka On Kubernetes With Strimzi For Real-Time Streaming ApplicationsLightbend
 
Kafka Summit SF 2017 - Kafka Stream Processing for Everyone with KSQL
Kafka Summit SF 2017 - Kafka Stream Processing for Everyone with KSQLKafka Summit SF 2017 - Kafka Stream Processing for Everyone with KSQL
Kafka Summit SF 2017 - Kafka Stream Processing for Everyone with KSQLconfluent
 
Kick your database_to_the_curb_reston_08_27_19
Kick your database_to_the_curb_reston_08_27_19Kick your database_to_the_curb_reston_08_27_19
Kick your database_to_the_curb_reston_08_27_19confluent
 
From a Kafkaesque Story to The Promised Land at LivePerson
From a Kafkaesque Story to The Promised Land at LivePersonFrom a Kafkaesque Story to The Promised Land at LivePerson
From a Kafkaesque Story to The Promised Land at LivePersonLivePerson
 
Building an Interactive Query Service in Kafka Streams With Bill Bejeck | Cur...
Building an Interactive Query Service in Kafka Streams With Bill Bejeck | Cur...Building an Interactive Query Service in Kafka Streams With Bill Bejeck | Cur...
Building an Interactive Query Service in Kafka Streams With Bill Bejeck | Cur...HostedbyConfluent
 
From a kafkaesque story to The Promised Land
From a kafkaesque story to The Promised LandFrom a kafkaesque story to The Promised Land
From a kafkaesque story to The Promised LandRan Silberman
 
Containerizing Distributed Pipes
Containerizing Distributed PipesContainerizing Distributed Pipes
Containerizing Distributed Pipesinside-BigData.com
 

Similar to Exactly-once Stream Processing with Kafka Streams (20)

Kafka Summit SF 2017 - Exactly-once Stream Processing with Kafka Streams
Kafka Summit SF 2017 - Exactly-once Stream Processing with Kafka StreamsKafka Summit SF 2017 - Exactly-once Stream Processing with Kafka Streams
Kafka Summit SF 2017 - Exactly-once Stream Processing with Kafka Streams
 
Introduction to Kafka Streams
Introduction to Kafka StreamsIntroduction to Kafka Streams
Introduction to Kafka Streams
 
Exactly-once Data Processing with Kafka Streams - July 27, 2017
Exactly-once Data Processing with Kafka Streams - July 27, 2017Exactly-once Data Processing with Kafka Streams - July 27, 2017
Exactly-once Data Processing with Kafka Streams - July 27, 2017
 
Stream Processing made simple with Kafka
Stream Processing made simple with KafkaStream Processing made simple with Kafka
Stream Processing made simple with Kafka
 
Apache Kafka, and the Rise of Stream Processing
Apache Kafka, and the Rise of Stream ProcessingApache Kafka, and the Rise of Stream Processing
Apache Kafka, and the Rise of Stream Processing
 
Consistency and Completeness: Rethinking Distributed Stream Processing in Apa...
Consistency and Completeness: Rethinking Distributed Stream Processing in Apa...Consistency and Completeness: Rethinking Distributed Stream Processing in Apa...
Consistency and Completeness: Rethinking Distributed Stream Processing in Apa...
 
I can't believe it's not a queue: Kafka and Spring
I can't believe it's not a queue: Kafka and SpringI can't believe it's not a queue: Kafka and Spring
I can't believe it's not a queue: Kafka and Spring
 
Streams Don't Fail Me Now - Robustness Features in Kafka Streams
Streams Don't Fail Me Now - Robustness Features in Kafka StreamsStreams Don't Fail Me Now - Robustness Features in Kafka Streams
Streams Don't Fail Me Now - Robustness Features in Kafka Streams
 
Deploying Kafka Streams Applications with Docker and Kubernetes
Deploying Kafka Streams Applications with Docker and KubernetesDeploying Kafka Streams Applications with Docker and Kubernetes
Deploying Kafka Streams Applications with Docker and Kubernetes
 
Kafka Streams: the easiest way to start with stream processing
Kafka Streams: the easiest way to start with stream processingKafka Streams: the easiest way to start with stream processing
Kafka Streams: the easiest way to start with stream processing
 
Testing Kafka components with Kafka for JUnit
Testing Kafka components with Kafka for JUnitTesting Kafka components with Kafka for JUnit
Testing Kafka components with Kafka for JUnit
 
Kafka Connect and Streams (Concepts, Architecture, Features)
Kafka Connect and Streams (Concepts, Architecture, Features)Kafka Connect and Streams (Concepts, Architecture, Features)
Kafka Connect and Streams (Concepts, Architecture, Features)
 
Streaming Design Patterns Using Alpakka Kafka Connector (Sean Glover, Lightbe...
Streaming Design Patterns Using Alpakka Kafka Connector (Sean Glover, Lightbe...Streaming Design Patterns Using Alpakka Kafka Connector (Sean Glover, Lightbe...
Streaming Design Patterns Using Alpakka Kafka Connector (Sean Glover, Lightbe...
 
Running Kafka On Kubernetes With Strimzi For Real-Time Streaming Applications
Running Kafka On Kubernetes With Strimzi For Real-Time Streaming ApplicationsRunning Kafka On Kubernetes With Strimzi For Real-Time Streaming Applications
Running Kafka On Kubernetes With Strimzi For Real-Time Streaming Applications
 
Kafka Summit SF 2017 - Kafka Stream Processing for Everyone with KSQL
Kafka Summit SF 2017 - Kafka Stream Processing for Everyone with KSQLKafka Summit SF 2017 - Kafka Stream Processing for Everyone with KSQL
Kafka Summit SF 2017 - Kafka Stream Processing for Everyone with KSQL
 
Kick your database_to_the_curb_reston_08_27_19
Kick your database_to_the_curb_reston_08_27_19Kick your database_to_the_curb_reston_08_27_19
Kick your database_to_the_curb_reston_08_27_19
 
From a Kafkaesque Story to The Promised Land at LivePerson
From a Kafkaesque Story to The Promised Land at LivePersonFrom a Kafkaesque Story to The Promised Land at LivePerson
From a Kafkaesque Story to The Promised Land at LivePerson
 
Building an Interactive Query Service in Kafka Streams With Bill Bejeck | Cur...
Building an Interactive Query Service in Kafka Streams With Bill Bejeck | Cur...Building an Interactive Query Service in Kafka Streams With Bill Bejeck | Cur...
Building an Interactive Query Service in Kafka Streams With Bill Bejeck | Cur...
 
From a kafkaesque story to The Promised Land
From a kafkaesque story to The Promised LandFrom a kafkaesque story to The Promised Land
From a kafkaesque story to The Promised Land
 
Containerizing Distributed Pipes
Containerizing Distributed PipesContainerizing Distributed Pipes
Containerizing Distributed Pipes
 

More from Guozhang Wang

Consensus in Apache Kafka: From Theory to Production.pdf
Consensus in Apache Kafka: From Theory to Production.pdfConsensus in Apache Kafka: From Theory to Production.pdf
Consensus in Apache Kafka: From Theory to Production.pdfGuozhang Wang
 
Exactly-Once Made Easy: Transactional Messaging Improvement for Usability and...
Exactly-Once Made Easy: Transactional Messaging Improvement for Usability and...Exactly-Once Made Easy: Transactional Messaging Improvement for Usability and...
Exactly-Once Made Easy: Transactional Messaging Improvement for Usability and...Guozhang Wang
 
Introduction to the Incremental Cooperative Protocol of Kafka
Introduction to the Incremental Cooperative Protocol of KafkaIntroduction to the Incremental Cooperative Protocol of Kafka
Introduction to the Incremental Cooperative Protocol of KafkaGuozhang Wang
 
Performance Analysis and Optimizations for Kafka Streams Applications
Performance Analysis and Optimizations for Kafka Streams ApplicationsPerformance Analysis and Optimizations for Kafka Streams Applications
Performance Analysis and Optimizations for Kafka Streams ApplicationsGuozhang Wang
 
Apache Kafka from 0.7 to 1.0, History and Lesson Learned
Apache Kafka from 0.7 to 1.0, History and Lesson LearnedApache Kafka from 0.7 to 1.0, History and Lesson Learned
Apache Kafka from 0.7 to 1.0, History and Lesson LearnedGuozhang Wang
 
Building Realtim Data Pipelines with Kafka Connect and Spark Streaming
Building Realtim Data Pipelines with Kafka Connect and Spark StreamingBuilding Realtim Data Pipelines with Kafka Connect and Spark Streaming
Building Realtim Data Pipelines with Kafka Connect and Spark StreamingGuozhang Wang
 
Building Stream Infrastructure across Multiple Data Centers with Apache Kafka
Building Stream Infrastructure across Multiple Data Centers with Apache KafkaBuilding Stream Infrastructure across Multiple Data Centers with Apache Kafka
Building Stream Infrastructure across Multiple Data Centers with Apache KafkaGuozhang Wang
 
Building a Replicated Logging System with Apache Kafka
Building a Replicated Logging System with Apache KafkaBuilding a Replicated Logging System with Apache Kafka
Building a Replicated Logging System with Apache KafkaGuozhang Wang
 
Apache Kafka at LinkedIn
Apache Kafka at LinkedInApache Kafka at LinkedIn
Apache Kafka at LinkedInGuozhang Wang
 
Behavioral Simulations in MapReduce
Behavioral Simulations in MapReduceBehavioral Simulations in MapReduce
Behavioral Simulations in MapReduceGuozhang Wang
 
Automatic Scaling Iterative Computations
Automatic Scaling Iterative ComputationsAutomatic Scaling Iterative Computations
Automatic Scaling Iterative ComputationsGuozhang Wang
 

More from Guozhang Wang (11)

Consensus in Apache Kafka: From Theory to Production.pdf
Consensus in Apache Kafka: From Theory to Production.pdfConsensus in Apache Kafka: From Theory to Production.pdf
Consensus in Apache Kafka: From Theory to Production.pdf
 
Exactly-Once Made Easy: Transactional Messaging Improvement for Usability and...
Exactly-Once Made Easy: Transactional Messaging Improvement for Usability and...Exactly-Once Made Easy: Transactional Messaging Improvement for Usability and...
Exactly-Once Made Easy: Transactional Messaging Improvement for Usability and...
 
Introduction to the Incremental Cooperative Protocol of Kafka
Introduction to the Incremental Cooperative Protocol of KafkaIntroduction to the Incremental Cooperative Protocol of Kafka
Introduction to the Incremental Cooperative Protocol of Kafka
 
Performance Analysis and Optimizations for Kafka Streams Applications
Performance Analysis and Optimizations for Kafka Streams ApplicationsPerformance Analysis and Optimizations for Kafka Streams Applications
Performance Analysis and Optimizations for Kafka Streams Applications
 
Apache Kafka from 0.7 to 1.0, History and Lesson Learned
Apache Kafka from 0.7 to 1.0, History and Lesson LearnedApache Kafka from 0.7 to 1.0, History and Lesson Learned
Apache Kafka from 0.7 to 1.0, History and Lesson Learned
 
Building Realtim Data Pipelines with Kafka Connect and Spark Streaming
Building Realtim Data Pipelines with Kafka Connect and Spark StreamingBuilding Realtim Data Pipelines with Kafka Connect and Spark Streaming
Building Realtim Data Pipelines with Kafka Connect and Spark Streaming
 
Building Stream Infrastructure across Multiple Data Centers with Apache Kafka
Building Stream Infrastructure across Multiple Data Centers with Apache KafkaBuilding Stream Infrastructure across Multiple Data Centers with Apache Kafka
Building Stream Infrastructure across Multiple Data Centers with Apache Kafka
 
Building a Replicated Logging System with Apache Kafka
Building a Replicated Logging System with Apache KafkaBuilding a Replicated Logging System with Apache Kafka
Building a Replicated Logging System with Apache Kafka
 
Apache Kafka at LinkedIn
Apache Kafka at LinkedInApache Kafka at LinkedIn
Apache Kafka at LinkedIn
 
Behavioral Simulations in MapReduce
Behavioral Simulations in MapReduceBehavioral Simulations in MapReduce
Behavioral Simulations in MapReduce
 
Automatic Scaling Iterative Computations
Automatic Scaling Iterative ComputationsAutomatic Scaling Iterative Computations
Automatic Scaling Iterative Computations
 

Recently uploaded

Glass Ceramics: Processing and Properties
Glass Ceramics: Processing and PropertiesGlass Ceramics: Processing and Properties
Glass Ceramics: Processing and PropertiesPrabhanshu Chaturvedi
 
VIP Call Girls Ankleshwar 7001035870 Whatsapp Number, 24/07 Booking
VIP Call Girls Ankleshwar 7001035870 Whatsapp Number, 24/07 BookingVIP Call Girls Ankleshwar 7001035870 Whatsapp Number, 24/07 Booking
VIP Call Girls Ankleshwar 7001035870 Whatsapp Number, 24/07 Bookingdharasingh5698
 
AKTU Computer Networks notes --- Unit 3.pdf
AKTU Computer Networks notes ---  Unit 3.pdfAKTU Computer Networks notes ---  Unit 3.pdf
AKTU Computer Networks notes --- Unit 3.pdfankushspencer015
 
BSides Seattle 2024 - Stopping Ethan Hunt From Taking Your Data.pptx
BSides Seattle 2024 - Stopping Ethan Hunt From Taking Your Data.pptxBSides Seattle 2024 - Stopping Ethan Hunt From Taking Your Data.pptx
BSides Seattle 2024 - Stopping Ethan Hunt From Taking Your Data.pptxfenichawla
 
CCS335 _ Neural Networks and Deep Learning Laboratory_Lab Complete Record
CCS335 _ Neural Networks and Deep Learning Laboratory_Lab Complete RecordCCS335 _ Neural Networks and Deep Learning Laboratory_Lab Complete Record
CCS335 _ Neural Networks and Deep Learning Laboratory_Lab Complete RecordAsst.prof M.Gokilavani
 
Russian Call Girls in Nagpur Grishma Call 7001035870 Meet With Nagpur Escorts
Russian Call Girls in Nagpur Grishma Call 7001035870 Meet With Nagpur EscortsRussian Call Girls in Nagpur Grishma Call 7001035870 Meet With Nagpur Escorts
Russian Call Girls in Nagpur Grishma Call 7001035870 Meet With Nagpur EscortsCall Girls in Nagpur High Profile
 
Porous Ceramics seminar and technical writing
Porous Ceramics seminar and technical writingPorous Ceramics seminar and technical writing
Porous Ceramics seminar and technical writingrakeshbaidya232001
 
(ANVI) Koregaon Park Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(ANVI) Koregaon Park Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...(ANVI) Koregaon Park Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(ANVI) Koregaon Park Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...ranjana rawat
 
(ANJALI) Dange Chowk Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(ANJALI) Dange Chowk Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...(ANJALI) Dange Chowk Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(ANJALI) Dange Chowk Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...ranjana rawat
 
Introduction to Multiple Access Protocol.pptx
Introduction to Multiple Access Protocol.pptxIntroduction to Multiple Access Protocol.pptx
Introduction to Multiple Access Protocol.pptxupamatechverse
 
Call Girls in Nagpur Suman Call 7001035870 Meet With Nagpur Escorts
Call Girls in Nagpur Suman Call 7001035870 Meet With Nagpur EscortsCall Girls in Nagpur Suman Call 7001035870 Meet With Nagpur Escorts
Call Girls in Nagpur Suman Call 7001035870 Meet With Nagpur EscortsCall Girls in Nagpur High Profile
 
Processing & Properties of Floor and Wall Tiles.pptx
Processing & Properties of Floor and Wall Tiles.pptxProcessing & Properties of Floor and Wall Tiles.pptx
Processing & Properties of Floor and Wall Tiles.pptxpranjaldaimarysona
 
High Profile Call Girls Nagpur Isha Call 7001035870 Meet With Nagpur Escorts
High Profile Call Girls Nagpur Isha Call 7001035870 Meet With Nagpur EscortsHigh Profile Call Girls Nagpur Isha Call 7001035870 Meet With Nagpur Escorts
High Profile Call Girls Nagpur Isha Call 7001035870 Meet With Nagpur Escortsranjana rawat
 
UNIT-II FMM-Flow Through Circular Conduits
UNIT-II FMM-Flow Through Circular ConduitsUNIT-II FMM-Flow Through Circular Conduits
UNIT-II FMM-Flow Through Circular Conduitsrknatarajan
 
Introduction to IEEE STANDARDS and its different types.pptx
Introduction to IEEE STANDARDS and its different types.pptxIntroduction to IEEE STANDARDS and its different types.pptx
Introduction to IEEE STANDARDS and its different types.pptxupamatechverse
 
UNIT-V FMM.HYDRAULIC TURBINE - Construction and working
UNIT-V FMM.HYDRAULIC TURBINE - Construction and workingUNIT-V FMM.HYDRAULIC TURBINE - Construction and working
UNIT-V FMM.HYDRAULIC TURBINE - Construction and workingrknatarajan
 
Call Girls Pimpri Chinchwad Call Me 7737669865 Budget Friendly No Advance Boo...
Call Girls Pimpri Chinchwad Call Me 7737669865 Budget Friendly No Advance Boo...Call Girls Pimpri Chinchwad Call Me 7737669865 Budget Friendly No Advance Boo...
Call Girls Pimpri Chinchwad Call Me 7737669865 Budget Friendly No Advance Boo...roncy bisnoi
 
Java Programming :Event Handling(Types of Events)
Java Programming :Event Handling(Types of Events)Java Programming :Event Handling(Types of Events)
Java Programming :Event Handling(Types of Events)simmis5
 

Recently uploaded (20)

(INDIRA) Call Girl Aurangabad Call Now 8617697112 Aurangabad Escorts 24x7
(INDIRA) Call Girl Aurangabad Call Now 8617697112 Aurangabad Escorts 24x7(INDIRA) Call Girl Aurangabad Call Now 8617697112 Aurangabad Escorts 24x7
(INDIRA) Call Girl Aurangabad Call Now 8617697112 Aurangabad Escorts 24x7
 
Glass Ceramics: Processing and Properties
Glass Ceramics: Processing and PropertiesGlass Ceramics: Processing and Properties
Glass Ceramics: Processing and Properties
 
VIP Call Girls Ankleshwar 7001035870 Whatsapp Number, 24/07 Booking
VIP Call Girls Ankleshwar 7001035870 Whatsapp Number, 24/07 BookingVIP Call Girls Ankleshwar 7001035870 Whatsapp Number, 24/07 Booking
VIP Call Girls Ankleshwar 7001035870 Whatsapp Number, 24/07 Booking
 
AKTU Computer Networks notes --- Unit 3.pdf
AKTU Computer Networks notes ---  Unit 3.pdfAKTU Computer Networks notes ---  Unit 3.pdf
AKTU Computer Networks notes --- Unit 3.pdf
 
BSides Seattle 2024 - Stopping Ethan Hunt From Taking Your Data.pptx
BSides Seattle 2024 - Stopping Ethan Hunt From Taking Your Data.pptxBSides Seattle 2024 - Stopping Ethan Hunt From Taking Your Data.pptx
BSides Seattle 2024 - Stopping Ethan Hunt From Taking Your Data.pptx
 
CCS335 _ Neural Networks and Deep Learning Laboratory_Lab Complete Record
CCS335 _ Neural Networks and Deep Learning Laboratory_Lab Complete RecordCCS335 _ Neural Networks and Deep Learning Laboratory_Lab Complete Record
CCS335 _ Neural Networks and Deep Learning Laboratory_Lab Complete Record
 
Russian Call Girls in Nagpur Grishma Call 7001035870 Meet With Nagpur Escorts
Russian Call Girls in Nagpur Grishma Call 7001035870 Meet With Nagpur EscortsRussian Call Girls in Nagpur Grishma Call 7001035870 Meet With Nagpur Escorts
Russian Call Girls in Nagpur Grishma Call 7001035870 Meet With Nagpur Escorts
 
Porous Ceramics seminar and technical writing
Porous Ceramics seminar and technical writingPorous Ceramics seminar and technical writing
Porous Ceramics seminar and technical writing
 
(ANVI) Koregaon Park Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(ANVI) Koregaon Park Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...(ANVI) Koregaon Park Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(ANVI) Koregaon Park Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
 
(ANJALI) Dange Chowk Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(ANJALI) Dange Chowk Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...(ANJALI) Dange Chowk Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(ANJALI) Dange Chowk Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
 
Introduction to Multiple Access Protocol.pptx
Introduction to Multiple Access Protocol.pptxIntroduction to Multiple Access Protocol.pptx
Introduction to Multiple Access Protocol.pptx
 
Call Girls in Nagpur Suman Call 7001035870 Meet With Nagpur Escorts
Call Girls in Nagpur Suman Call 7001035870 Meet With Nagpur EscortsCall Girls in Nagpur Suman Call 7001035870 Meet With Nagpur Escorts
Call Girls in Nagpur Suman Call 7001035870 Meet With Nagpur Escorts
 
Roadmap to Membership of RICS - Pathways and Routes
Roadmap to Membership of RICS - Pathways and RoutesRoadmap to Membership of RICS - Pathways and Routes
Roadmap to Membership of RICS - Pathways and Routes
 
Processing & Properties of Floor and Wall Tiles.pptx
Processing & Properties of Floor and Wall Tiles.pptxProcessing & Properties of Floor and Wall Tiles.pptx
Processing & Properties of Floor and Wall Tiles.pptx
 
High Profile Call Girls Nagpur Isha Call 7001035870 Meet With Nagpur Escorts
High Profile Call Girls Nagpur Isha Call 7001035870 Meet With Nagpur EscortsHigh Profile Call Girls Nagpur Isha Call 7001035870 Meet With Nagpur Escorts
High Profile Call Girls Nagpur Isha Call 7001035870 Meet With Nagpur Escorts
 
UNIT-II FMM-Flow Through Circular Conduits
UNIT-II FMM-Flow Through Circular ConduitsUNIT-II FMM-Flow Through Circular Conduits
UNIT-II FMM-Flow Through Circular Conduits
 
Introduction to IEEE STANDARDS and its different types.pptx
Introduction to IEEE STANDARDS and its different types.pptxIntroduction to IEEE STANDARDS and its different types.pptx
Introduction to IEEE STANDARDS and its different types.pptx
 
UNIT-V FMM.HYDRAULIC TURBINE - Construction and working
UNIT-V FMM.HYDRAULIC TURBINE - Construction and workingUNIT-V FMM.HYDRAULIC TURBINE - Construction and working
UNIT-V FMM.HYDRAULIC TURBINE - Construction and working
 
Call Girls Pimpri Chinchwad Call Me 7737669865 Budget Friendly No Advance Boo...
Call Girls Pimpri Chinchwad Call Me 7737669865 Budget Friendly No Advance Boo...Call Girls Pimpri Chinchwad Call Me 7737669865 Budget Friendly No Advance Boo...
Call Girls Pimpri Chinchwad Call Me 7737669865 Budget Friendly No Advance Boo...
 
Java Programming :Event Handling(Types of Events)
Java Programming :Event Handling(Types of Events)Java Programming :Event Handling(Types of Events)
Java Programming :Event Handling(Types of Events)
 

Exactly-once Stream Processing with Kafka Streams

Editor's Notes

  1. Thank you, and hello everyone, my name’s Guozhang. I’m very excited to be here and talk about …
  2. Here is a quick spoiler alert of my talk. I’ll start by giving some context about stream processing, and does it look like with Kafka. And then I will explain what actually does exactly-once mean for stream processing. And finally, I will talk about what we have added in the latest release of 0.11, and how these building blocks are leveraged by Kafka’s stream processing API to help developers achieve exactly-once with Kafka.
  3. So how does stream processing with Kafka look like? Here’s a concrete example: suppose you are building a real-time ads billing application with two input streams of data: one stream is a Kafka topic representing the events when an ad is displayed to some users, and the other streams is another Kafka topic representing the events when a user clicks on certain displayed ads. These two streams of data are unbounded, since the front-end web servers will keep appending more and more events to these two Kafka topics. The billing application’s goal, is to calculate the cost of each clicked ads for the corresponding client, plus alert on potential fraud user, who may click on lots of ads in a very short period of time. The output of this application, is also two data streams.
  4. May then be consumed by an offline data analytics system generate the dashboards and reports and so on. So it forms the big picture of a pipeline. And then it moves on to the next available event, and this loop repeat again. In practice, though, developers would not commit on each single record for performance, instead they will only do that for a bunch of records.
  5. So how to write this fetch-process-produce loop in code?
  6. It seems OK at the first look, but until you deploy your code to production Data may come out of ordering, computation and state may need to be partitioned for distributed partitioning And more importantly, your app could fail at any time: bad config, human error deploying, a bug in your code. You then need to worry about all these lower-level details about consistency and high-availability.
  7. And all these issues would soon add up and cost you much more time than coding a first draft of your streaming applications. Today, I would like to focus on a single slice of this iceberg, which is, how would you achieve correctness of your application along with fault tolerance.
  8. It is not a network transport level guarantee, but really an app-level property, that given a stream processing application..
  9. You lose that committed offset forever, hence even this record has been completed processing, Kafka would not know any more.
  10. Bottom line: because of Kafka’s at-least-once semantics, we can have duplicated writes and processing.
  11. This way looks nice at the first glance, but once you start doing it, you will realize the fact that since each application’s output could be other applications input, and since each application can potentially generate duplicated writes to its output topics.
  12. You would ended up doing reduplication at each stage of your streaming computation pipeline. And it soon becomes a maintenance headache. So, is there a better solution than this?
  13. Strengthen Kafka itself and provide the building blocks to help developers to write their streaming applications that achieves exactly once.
  14. Supports event-at-a-time stateful processing, handles out-of-order data arriving
  15. Instead of external store, Kafka Streams would add a local state store associated with the stateful processors to maintain the running state.
  16. In terms of distributed processing, Kafka Streams API have a tight integration with the Kafka’s topic partitions.
  17. For a typical streaming application of Kafka
  18. Idempotence is also enabled so that within a single life time of the producer, its resent duplicated data could be detected and rejected by brokers.
  19. In practice, your streaming pipeline may not be completely within the closed world of Kafka.
  20. Send a request to some REST proxy during the processing, or simply you need to pipe your end result into another data system
  21. HDFS, S3, Elastic Search and JDBC Sink
  22. distributed systems are all about trade-offs
  23. Effective availability: the empirically measured percentage of successful requests over some period, often measured in “9s”. Algorithmic availability: a liveness property of an algorithm where every request to a non-failing node must eventually return a valid response. The CAP theorem is only concerned with algorithmic availability. An algorithmic availability of 100% does not guarantee an effective availability of 100%. The algorithmic availability from the CAP theorem only applies if both the implementation and the execution of the algorithm is without error. In practice, most outages to an AP system are not due to network issues, which the algorithm can handle, but rather to implementation defects, user errors, misconfiguration, resource limits, and misbehaving clients.