SlideShare uma empresa Scribd logo
1 de 36
Baixar para ler offline
Principles in
Data Stream Processing
Nick Dearden | Matthias J. Sax
Outline
2
• What do we mean by Stream Processing anyway ?
• Concepts
• Events & State
• Time
• (In)Completeness
• Future directions
Events vs State (Example)
3
Events vs State – Streams vs Tables
4
𝑠𝑡𝑎𝑡𝑒 𝑛𝑜𝑤 = )
!"#
$%&
𝑠𝑡𝑟𝑒𝑎𝑚 𝑡 d𝑡
𝑠𝑡𝑟𝑒𝑎𝑚 𝑡 =
d𝑠𝑡𝑎𝑡𝑒(𝑡)
d𝑡
* M. Kleppman, Designing Data-Intensive Applications
Batch vs Stream
Event vs State
Over
Batch vs Stream
Unifying Batch and Stream Processing?
7
No! Approach this from a different direction.
• Batch vs Stream is not just about “processing latency”, but fundamentally about semantics!
• What they have in common is that they both transform data
Instead of focusing on “batch” vs “incremental” data transformation, stream processing is about “data in motion”
in contrast to “data at rest”
• And the goal is to UNIFY data in motion and data at rest with a unique abstraction
We think of this as integrating state and events
• Streams and Tables in a unified processing model and programming interface / query language
Events and ”the Log”
8
Immutability as core concept
Use a log to store events
• Logs store immutable data
• Logs preserve order
Think stream-table join:, an input event is enriched to produce an output event
• After the output event is produced, it’s immutable;
• Thus, if the table is updated, the enriched event is not updated
Events and ”the Log”
9
Company City
Confluent PA
Oracle RWC
Events and ”the Log”
10
CFLT 10 Laptops
Company City
Confluent PA
Oracle RWC
Events and ”the Log”
11
CFLT 10 Laptops
Company City
Confluent PA
Oracle RWC
Events and ”the Log”
12
CFLT 10 Laptops
Company City
Confluent PA
Oracle RWC
CFLT 10 Laptops PA
Events and ”the Log”
13
CFLT 3 Router
CFLT 10 Laptops
Company City
Confluent PA
Oracle RWC
CFLT 3 Router
CFLT 10 Laptops PA PA
Events and ”the Log”
14
CFLT 3 Router
CFLT 10 Laptops
Company City
Confluent PA
Oracle RWC
Company City
Confluent MV
Oracle RWC
CFLT 3 Router
CFLT 10 Laptops PA PA
Events and ”the Log”
15
CFLT 3 Router
CFLT 10 Laptops CFLT 20 Mice
Company City
Confluent PA
Oracle RWC
Company City
Confluent MV
Oracle RWC
CFLT 3 Router
CFLT 10 Laptops CFLT 20 Mice
PA MV
PA
Everything is Time-based
Stream Processing and Time Semantics
17
Time is a first-class citizen
Stream processing engine needs to reason about time (not just a data dimension).
Batch processing ignores the time-dimension
User’s responsibility to reason about time – the processing engine does not understand time semantics.
Event Time
When did something happen (e.g. “click on ad”).
Events and State
Tables need a time dimension, too!
Event Time vs Processing Time
18
Event Time
Every event carries a timestamp embedded in the record. Captures when the event happened.
• Stream processing semantics are defined on event time.
Processing Time
When we process the data (last week, today, tomorrow).
• Must not impact the result.
Tables and Time Semantics
19
Classic DB systems don’t track time nor “versions”
KEY VALUE
A 10
B 42
C 23
KEY VALUE
A 20
B 42
C 23
KEY VALUE
A 20
B 73
C 23
Tables and Time Semantics (cont.)
20
Manually track time and versions
Or use newer SQL features like “system versioned temporal tables”.
KEY VALUE ValidFrom ValidTo
A 10 1 17
B 42 3 23
C 23 10 MAX_VALUE
A 20 17 40
B 73 23 MAX_VALUE
A <null> 40 MAX_VALUE
Versioned Tables
21
KEY VALUE
A 10
KEY VALUE
A 10
B 42
KEY VALUE
A 10
B 42
C 23
KEY VALUE
A 20
B 42
C 23
KEY VALUE
A 20
B 73
C 23
KEY VALUE
A 20
B 42
C 23
Event Time
t=1 t=3 t=10 t=17 t=23 t=40
A sequence of table versions over time:
t=5 t=20
Events and State in Time
22
CFLT 3 Router
CFLT 10 Laptops CFLT 20 Mice
Company City
Confluent PA
Oracle RWC
Company City
Confluent MV
Oracle RWC
CFLT 3 Router
CFLT 10 Laptops CFLT 20 Mice
PA MV
PA
t=10 t=15 t=30
Incomplete Information
Deterministic Processing
24
Log Order (aka offset order)
• messages are stored in append order (per partition)
• all consumers read message in the same order
• log can also be re-read multiple times, and the order of messages is always the same
Kafka’s strict ordering guarantees
allow to build consistent and deterministic application!
Log Order and Event Time
25
However
• stream processing semantics are defined on (logical) event time order
• infinite input and out-of-order data (event time) make the concept of completeness fuzzy
Consistency != Completeness
When is the Puzzle completed?
26
Limit “scope” via (time) windows
• Define puzzle “boundaries”
• How many puzzle pieces do we have?
When is which Puzzle (window) completed?
27
Event Time
Grace Period
28
Consistency != Completeness
Only you know when you are complete (or rather, how much out-of-order-ness you are prepared to tolerate).
Configurable grace period.
Eventual Completeness ?
Future Developments
Future of Stream Processing
30
Time-versioned tables:
CREATE TABLE versionedTable <schema> VERSION RETENTION TIME 25
SECONDS;
Future of Stream Processing
31
Time-versioned tables:
CREATE TABLE versionedTable <schema> VERSION RETENTION TIME 25
SECONDS;
KEY VALUE
A 10
KEY VALUE
A 10
B 42
KEY VALUE
A 10
B 42
C 23
KEY VALUE
A 20
B 42
C 23
KEY VALUE
A 20
B 73
C 23
KEY VALUE
A 20
B 42
C 23
Event Time
t=1 t=3 t=10 t=17 t=23 t=40
Future of Stream Processing
32
Time-versioned tables:
CREATE TABLE versionedTable <schema> VERSION RETENTION TIME 25
SECONDS;
KEY VALUE
A 10
KEY VALUE
A 10
B 42
KEY VALUE
A 10
B 42
C 23
KEY VALUE
A 20
B 42
C 23
KEY VALUE
A 20
B 73
C 23
KEY VALUE
A 20
B 42
C 23
Event Time
retention = 25
t=1 t=3 t=10 t=17 t=23 t=40
Future of Stream Processing
33
Time-versioned tables:
CREATE TABLE versionedTable <schema> VERSION RETENTION TIME 25
SECONDS;
KEY VALUE
A 10
KEY VALUE
A 10
B 42
KEY VALUE
A 10
B 42
C 23
KEY VALUE
A 20
B 42
C 23
KEY VALUE
A 20
B 73
C 23
KEY VALUE
A 20
B 42
C 23
Event Time
retention = 25
t=1 t=3 t=10 t=17 t=23 t=40
Future of Stream Processing (cont.)
34
Easier Reasoning about Completeness
• ksqlDB and Kafka Streams are distributed systems
• Vector clocks
DM32 Ad-hoc group on SQL extensions for streaming data
• Confluent, Oracle, IBM, Microsoft, Google, Alibaba
Learn More, Join In!
35
SIGMOD paper
https://www.confluent.io/resources/white-paper/distributed-
stream-processing-in-kafka
Apache Kafka Improvement Proposals (KIPs)
https://cwiki.apache.org/confluence/display/KAFKA/Kafka+Improveme
nt+Proposals
Confluent blog
https://confluent.io/blog
Streaming Audio with Tim Berglund
https://developer.confluent.io/podcast/
Thanks! We are hiring!

Mais conteúdo relacionado

Mais procurados

High cardinality time series search: A new level of scale - Data Day Texas 2016
High cardinality time series search: A new level of scale - Data Day Texas 2016High cardinality time series search: A new level of scale - Data Day Texas 2016
High cardinality time series search: A new level of scale - Data Day Texas 2016Eric Sammer
 
Use ksqlDB to migrate core-banking processing from batch to streaming | Mark ...
Use ksqlDB to migrate core-banking processing from batch to streaming | Mark ...Use ksqlDB to migrate core-banking processing from batch to streaming | Mark ...
Use ksqlDB to migrate core-banking processing from batch to streaming | Mark ...HostedbyConfluent
 
Flink Forward Berlin 2017: Fabian Hueske - Using Stream and Batch Processing ...
Flink Forward Berlin 2017: Fabian Hueske - Using Stream and Batch Processing ...Flink Forward Berlin 2017: Fabian Hueske - Using Stream and Batch Processing ...
Flink Forward Berlin 2017: Fabian Hueske - Using Stream and Batch Processing ...Flink Forward
 
Stateful Distributed Stream Processing
Stateful Distributed Stream ProcessingStateful Distributed Stream Processing
Stateful Distributed Stream ProcessingGyula Fóra
 
Temporal-Joins in Kafka Streams and ksqlDB | Matthias Sax, Confluent
Temporal-Joins in Kafka Streams and ksqlDB | Matthias Sax, ConfluentTemporal-Joins in Kafka Streams and ksqlDB | Matthias Sax, Confluent
Temporal-Joins in Kafka Streams and ksqlDB | Matthias Sax, ConfluentHostedbyConfluent
 
Intro to open source observability with grafana, prometheus, loki, and tempo(...
Intro to open source observability with grafana, prometheus, loki, and tempo(...Intro to open source observability with grafana, prometheus, loki, and tempo(...
Intro to open source observability with grafana, prometheus, loki, and tempo(...LibbySchulze
 
Kafka Summit NYC 2017 - Scalable Real-Time Complex Event Processing @ Uber
Kafka Summit NYC 2017 - Scalable Real-Time Complex Event Processing @ UberKafka Summit NYC 2017 - Scalable Real-Time Complex Event Processing @ Uber
Kafka Summit NYC 2017 - Scalable Real-Time Complex Event Processing @ Uberconfluent
 
Streaming Data from Cassandra into Kafka
Streaming Data from Cassandra into KafkaStreaming Data from Cassandra into Kafka
Streaming Data from Cassandra into KafkaAbrar Sheikh
 
Stateful stream processing with Apache Flink
Stateful stream processing with Apache FlinkStateful stream processing with Apache Flink
Stateful stream processing with Apache FlinkKnoldus Inc.
 
Case Study: Stream Processing on AWS using Kappa Architecture
Case Study: Stream Processing on AWS using Kappa ArchitectureCase Study: Stream Processing on AWS using Kappa Architecture
Case Study: Stream Processing on AWS using Kappa ArchitectureJoey Bolduc-Gilbert
 
Stream Processing with Apache Flink (Flink.tw Meetup 2016/07/19)
Stream Processing with Apache Flink (Flink.tw Meetup 2016/07/19)Stream Processing with Apache Flink (Flink.tw Meetup 2016/07/19)
Stream Processing with Apache Flink (Flink.tw Meetup 2016/07/19)Apache Flink Taiwan User Group
 
Apache Spark Streaming - www.know bigdata.com
Apache Spark Streaming - www.know bigdata.comApache Spark Streaming - www.know bigdata.com
Apache Spark Streaming - www.know bigdata.comknowbigdata
 
Flink Forward Berlin 2017: Dongwon Kim - Predictive Maintenance with Apache F...
Flink Forward Berlin 2017: Dongwon Kim - Predictive Maintenance with Apache F...Flink Forward Berlin 2017: Dongwon Kim - Predictive Maintenance with Apache F...
Flink Forward Berlin 2017: Dongwon Kim - Predictive Maintenance with Apache F...Flink Forward
 
Flink Forward Berlin 2017: Pramod Bhatotia, Do Le Quoc - StreamApprox: Approx...
Flink Forward Berlin 2017: Pramod Bhatotia, Do Le Quoc - StreamApprox: Approx...Flink Forward Berlin 2017: Pramod Bhatotia, Do Le Quoc - StreamApprox: Approx...
Flink Forward Berlin 2017: Pramod Bhatotia, Do Le Quoc - StreamApprox: Approx...Flink Forward
 
Frossie Economou & Angelo Fausti [Vera C. Rubin Observatory] | How InfluxDB H...
Frossie Economou & Angelo Fausti [Vera C. Rubin Observatory] | How InfluxDB H...Frossie Economou & Angelo Fausti [Vera C. Rubin Observatory] | How InfluxDB H...
Frossie Economou & Angelo Fausti [Vera C. Rubin Observatory] | How InfluxDB H...InfluxData
 
Fabian Hueske - Stream Analytics with SQL on Apache Flink
Fabian Hueske - Stream Analytics with SQL on Apache FlinkFabian Hueske - Stream Analytics with SQL on Apache Flink
Fabian Hueske - Stream Analytics with SQL on Apache FlinkVerverica
 
Gelly-Stream: Single-Pass Graph Streaming Analytics with Apache Flink
Gelly-Stream: Single-Pass Graph Streaming Analytics with Apache FlinkGelly-Stream: Single-Pass Graph Streaming Analytics with Apache Flink
Gelly-Stream: Single-Pass Graph Streaming Analytics with Apache FlinkVasia Kalavri
 
Introduction to the Processor API
Introduction to the Processor APIIntroduction to the Processor API
Introduction to the Processor APIconfluent
 
Advanced Streaming Analytics with Apache Flink and Apache Kafka, Stephan Ewen
Advanced Streaming Analytics with Apache Flink and Apache Kafka, Stephan EwenAdvanced Streaming Analytics with Apache Flink and Apache Kafka, Stephan Ewen
Advanced Streaming Analytics with Apache Flink and Apache Kafka, Stephan Ewenconfluent
 
Flink Streaming Hadoop Summit San Jose
Flink Streaming Hadoop Summit San JoseFlink Streaming Hadoop Summit San Jose
Flink Streaming Hadoop Summit San JoseKostas Tzoumas
 

Mais procurados (20)

High cardinality time series search: A new level of scale - Data Day Texas 2016
High cardinality time series search: A new level of scale - Data Day Texas 2016High cardinality time series search: A new level of scale - Data Day Texas 2016
High cardinality time series search: A new level of scale - Data Day Texas 2016
 
Use ksqlDB to migrate core-banking processing from batch to streaming | Mark ...
Use ksqlDB to migrate core-banking processing from batch to streaming | Mark ...Use ksqlDB to migrate core-banking processing from batch to streaming | Mark ...
Use ksqlDB to migrate core-banking processing from batch to streaming | Mark ...
 
Flink Forward Berlin 2017: Fabian Hueske - Using Stream and Batch Processing ...
Flink Forward Berlin 2017: Fabian Hueske - Using Stream and Batch Processing ...Flink Forward Berlin 2017: Fabian Hueske - Using Stream and Batch Processing ...
Flink Forward Berlin 2017: Fabian Hueske - Using Stream and Batch Processing ...
 
Stateful Distributed Stream Processing
Stateful Distributed Stream ProcessingStateful Distributed Stream Processing
Stateful Distributed Stream Processing
 
Temporal-Joins in Kafka Streams and ksqlDB | Matthias Sax, Confluent
Temporal-Joins in Kafka Streams and ksqlDB | Matthias Sax, ConfluentTemporal-Joins in Kafka Streams and ksqlDB | Matthias Sax, Confluent
Temporal-Joins in Kafka Streams and ksqlDB | Matthias Sax, Confluent
 
Intro to open source observability with grafana, prometheus, loki, and tempo(...
Intro to open source observability with grafana, prometheus, loki, and tempo(...Intro to open source observability with grafana, prometheus, loki, and tempo(...
Intro to open source observability with grafana, prometheus, loki, and tempo(...
 
Kafka Summit NYC 2017 - Scalable Real-Time Complex Event Processing @ Uber
Kafka Summit NYC 2017 - Scalable Real-Time Complex Event Processing @ UberKafka Summit NYC 2017 - Scalable Real-Time Complex Event Processing @ Uber
Kafka Summit NYC 2017 - Scalable Real-Time Complex Event Processing @ Uber
 
Streaming Data from Cassandra into Kafka
Streaming Data from Cassandra into KafkaStreaming Data from Cassandra into Kafka
Streaming Data from Cassandra into Kafka
 
Stateful stream processing with Apache Flink
Stateful stream processing with Apache FlinkStateful stream processing with Apache Flink
Stateful stream processing with Apache Flink
 
Case Study: Stream Processing on AWS using Kappa Architecture
Case Study: Stream Processing on AWS using Kappa ArchitectureCase Study: Stream Processing on AWS using Kappa Architecture
Case Study: Stream Processing on AWS using Kappa Architecture
 
Stream Processing with Apache Flink (Flink.tw Meetup 2016/07/19)
Stream Processing with Apache Flink (Flink.tw Meetup 2016/07/19)Stream Processing with Apache Flink (Flink.tw Meetup 2016/07/19)
Stream Processing with Apache Flink (Flink.tw Meetup 2016/07/19)
 
Apache Spark Streaming - www.know bigdata.com
Apache Spark Streaming - www.know bigdata.comApache Spark Streaming - www.know bigdata.com
Apache Spark Streaming - www.know bigdata.com
 
Flink Forward Berlin 2017: Dongwon Kim - Predictive Maintenance with Apache F...
Flink Forward Berlin 2017: Dongwon Kim - Predictive Maintenance with Apache F...Flink Forward Berlin 2017: Dongwon Kim - Predictive Maintenance with Apache F...
Flink Forward Berlin 2017: Dongwon Kim - Predictive Maintenance with Apache F...
 
Flink Forward Berlin 2017: Pramod Bhatotia, Do Le Quoc - StreamApprox: Approx...
Flink Forward Berlin 2017: Pramod Bhatotia, Do Le Quoc - StreamApprox: Approx...Flink Forward Berlin 2017: Pramod Bhatotia, Do Le Quoc - StreamApprox: Approx...
Flink Forward Berlin 2017: Pramod Bhatotia, Do Le Quoc - StreamApprox: Approx...
 
Frossie Economou & Angelo Fausti [Vera C. Rubin Observatory] | How InfluxDB H...
Frossie Economou & Angelo Fausti [Vera C. Rubin Observatory] | How InfluxDB H...Frossie Economou & Angelo Fausti [Vera C. Rubin Observatory] | How InfluxDB H...
Frossie Economou & Angelo Fausti [Vera C. Rubin Observatory] | How InfluxDB H...
 
Fabian Hueske - Stream Analytics with SQL on Apache Flink
Fabian Hueske - Stream Analytics with SQL on Apache FlinkFabian Hueske - Stream Analytics with SQL on Apache Flink
Fabian Hueske - Stream Analytics with SQL on Apache Flink
 
Gelly-Stream: Single-Pass Graph Streaming Analytics with Apache Flink
Gelly-Stream: Single-Pass Graph Streaming Analytics with Apache FlinkGelly-Stream: Single-Pass Graph Streaming Analytics with Apache Flink
Gelly-Stream: Single-Pass Graph Streaming Analytics with Apache Flink
 
Introduction to the Processor API
Introduction to the Processor APIIntroduction to the Processor API
Introduction to the Processor API
 
Advanced Streaming Analytics with Apache Flink and Apache Kafka, Stephan Ewen
Advanced Streaming Analytics with Apache Flink and Apache Kafka, Stephan EwenAdvanced Streaming Analytics with Apache Flink and Apache Kafka, Stephan Ewen
Advanced Streaming Analytics with Apache Flink and Apache Kafka, Stephan Ewen
 
Flink Streaming Hadoop Summit San Jose
Flink Streaming Hadoop Summit San JoseFlink Streaming Hadoop Summit San Jose
Flink Streaming Hadoop Summit San Jose
 

Semelhante a Principles in Data Stream Processing | Matthias J Sax, Confluent

Flink Forward SF 2017: Timo Walther - Table & SQL API – unified APIs for bat...
Flink Forward SF 2017: Timo Walther -  Table & SQL API – unified APIs for bat...Flink Forward SF 2017: Timo Walther -  Table & SQL API – unified APIs for bat...
Flink Forward SF 2017: Timo Walther - Table & SQL API – unified APIs for bat...Flink Forward
 
Streaming SQL Foundations: Why I ❤ Streams+Tables
Streaming SQL Foundations: Why I ❤ Streams+TablesStreaming SQL Foundations: Why I ❤ Streams+Tables
Streaming SQL Foundations: Why I ❤ Streams+TablesC4Media
 
Introduction to ksqlDB and stream processing (Vish Srinivasan - Confluent)
Introduction to ksqlDB and stream processing (Vish Srinivasan  - Confluent)Introduction to ksqlDB and stream processing (Vish Srinivasan  - Confluent)
Introduction to ksqlDB and stream processing (Vish Srinivasan - Confluent)KafkaZone
 
Tzu-Li (Gordon) Tai - Stateful Stream Processing with Apache Flink
Tzu-Li (Gordon) Tai - Stateful Stream Processing with Apache FlinkTzu-Li (Gordon) Tai - Stateful Stream Processing with Apache Flink
Tzu-Li (Gordon) Tai - Stateful Stream Processing with Apache FlinkVerverica
 
Foundations of streaming SQL: stream & table theory
Foundations of streaming SQL: stream & table theoryFoundations of streaming SQL: stream & table theory
Foundations of streaming SQL: stream & table theoryDataWorks Summit
 
Scheduling ppt @ doms
Scheduling ppt @ doms Scheduling ppt @ doms
Scheduling ppt @ doms Babasab Patil
 
Scheduling ppt @ DOMS
Scheduling ppt @ DOMSScheduling ppt @ DOMS
Scheduling ppt @ DOMSBabasab Patil
 
Flink Forward Berlin 2018: Paris Carbone - "Stream Loops on Flink: Reinventin...
Flink Forward Berlin 2018: Paris Carbone - "Stream Loops on Flink: Reinventin...Flink Forward Berlin 2018: Paris Carbone - "Stream Loops on Flink: Reinventin...
Flink Forward Berlin 2018: Paris Carbone - "Stream Loops on Flink: Reinventin...Flink Forward
 
Stream Loops on Flink - Reinventing the wheel for the streaming era
Stream Loops on Flink - Reinventing the wheel for the streaming eraStream Loops on Flink - Reinventing the wheel for the streaming era
Stream Loops on Flink - Reinventing the wheel for the streaming eraParis Carbone
 
Introducing BinarySortedMultiMap - A new Flink state primitive to boost your ...
Introducing BinarySortedMultiMap - A new Flink state primitive to boost your ...Introducing BinarySortedMultiMap - A new Flink state primitive to boost your ...
Introducing BinarySortedMultiMap - A new Flink state primitive to boost your ...Flink Forward
 
Build a minial DBMS from scratch by Rust
Build a minial DBMS from scratch by RustBuild a minial DBMS from scratch by Rust
Build a minial DBMS from scratch by Rust安齊 劉
 
Flink 0.10 @ Bay Area Meetup (October 2015)
Flink 0.10 @ Bay Area Meetup (October 2015)Flink 0.10 @ Bay Area Meetup (October 2015)
Flink 0.10 @ Bay Area Meetup (October 2015)Stephan Ewen
 
Apache Incubator Samza: Stream Processing at LinkedIn
Apache Incubator Samza: Stream Processing at LinkedInApache Incubator Samza: Stream Processing at LinkedIn
Apache Incubator Samza: Stream Processing at LinkedInChris Riccomini
 
Keynote: Building and Operating A Serverless Streaming Runtime for Apache Bea...
Keynote: Building and Operating A Serverless Streaming Runtime for Apache Bea...Keynote: Building and Operating A Serverless Streaming Runtime for Apache Bea...
Keynote: Building and Operating A Serverless Streaming Runtime for Apache Bea...Flink Forward
 
Building Applications with Streams and Snapshots
Building Applications with Streams and SnapshotsBuilding Applications with Streams and Snapshots
Building Applications with Streams and SnapshotsJ On The Beach
 

Semelhante a Principles in Data Stream Processing | Matthias J Sax, Confluent (20)

Flink Forward SF 2017: Timo Walther - Table & SQL API – unified APIs for bat...
Flink Forward SF 2017: Timo Walther -  Table & SQL API – unified APIs for bat...Flink Forward SF 2017: Timo Walther -  Table & SQL API – unified APIs for bat...
Flink Forward SF 2017: Timo Walther - Table & SQL API – unified APIs for bat...
 
Streaming SQL Foundations: Why I ❤ Streams+Tables
Streaming SQL Foundations: Why I ❤ Streams+TablesStreaming SQL Foundations: Why I ❤ Streams+Tables
Streaming SQL Foundations: Why I ❤ Streams+Tables
 
Introduction to ksqlDB and stream processing (Vish Srinivasan - Confluent)
Introduction to ksqlDB and stream processing (Vish Srinivasan  - Confluent)Introduction to ksqlDB and stream processing (Vish Srinivasan  - Confluent)
Introduction to ksqlDB and stream processing (Vish Srinivasan - Confluent)
 
Tzu-Li (Gordon) Tai - Stateful Stream Processing with Apache Flink
Tzu-Li (Gordon) Tai - Stateful Stream Processing with Apache FlinkTzu-Li (Gordon) Tai - Stateful Stream Processing with Apache Flink
Tzu-Li (Gordon) Tai - Stateful Stream Processing with Apache Flink
 
Scheduling
SchedulingScheduling
Scheduling
 
Scheduling
SchedulingScheduling
Scheduling
 
Scheduling
SchedulingScheduling
Scheduling
 
Foundations of streaming SQL: stream & table theory
Foundations of streaming SQL: stream & table theoryFoundations of streaming SQL: stream & table theory
Foundations of streaming SQL: stream & table theory
 
Scheduling ppt @ doms
Scheduling ppt @ doms Scheduling ppt @ doms
Scheduling ppt @ doms
 
Scheduling ppt @ DOMS
Scheduling ppt @ DOMSScheduling ppt @ DOMS
Scheduling ppt @ DOMS
 
A Brief History of Stream Processing
A Brief History of Stream ProcessingA Brief History of Stream Processing
A Brief History of Stream Processing
 
scheduling_1.ppt
scheduling_1.pptscheduling_1.ppt
scheduling_1.ppt
 
Flink Forward Berlin 2018: Paris Carbone - "Stream Loops on Flink: Reinventin...
Flink Forward Berlin 2018: Paris Carbone - "Stream Loops on Flink: Reinventin...Flink Forward Berlin 2018: Paris Carbone - "Stream Loops on Flink: Reinventin...
Flink Forward Berlin 2018: Paris Carbone - "Stream Loops on Flink: Reinventin...
 
Stream Loops on Flink - Reinventing the wheel for the streaming era
Stream Loops on Flink - Reinventing the wheel for the streaming eraStream Loops on Flink - Reinventing the wheel for the streaming era
Stream Loops on Flink - Reinventing the wheel for the streaming era
 
Introducing BinarySortedMultiMap - A new Flink state primitive to boost your ...
Introducing BinarySortedMultiMap - A new Flink state primitive to boost your ...Introducing BinarySortedMultiMap - A new Flink state primitive to boost your ...
Introducing BinarySortedMultiMap - A new Flink state primitive to boost your ...
 
Build a minial DBMS from scratch by Rust
Build a minial DBMS from scratch by RustBuild a minial DBMS from scratch by Rust
Build a minial DBMS from scratch by Rust
 
Flink 0.10 @ Bay Area Meetup (October 2015)
Flink 0.10 @ Bay Area Meetup (October 2015)Flink 0.10 @ Bay Area Meetup (October 2015)
Flink 0.10 @ Bay Area Meetup (October 2015)
 
Apache Incubator Samza: Stream Processing at LinkedIn
Apache Incubator Samza: Stream Processing at LinkedInApache Incubator Samza: Stream Processing at LinkedIn
Apache Incubator Samza: Stream Processing at LinkedIn
 
Keynote: Building and Operating A Serverless Streaming Runtime for Apache Bea...
Keynote: Building and Operating A Serverless Streaming Runtime for Apache Bea...Keynote: Building and Operating A Serverless Streaming Runtime for Apache Bea...
Keynote: Building and Operating A Serverless Streaming Runtime for Apache Bea...
 
Building Applications with Streams and Snapshots
Building Applications with Streams and SnapshotsBuilding Applications with Streams and Snapshots
Building Applications with Streams and Snapshots
 

Mais de HostedbyConfluent

Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...HostedbyConfluent
 
Renaming a Kafka Topic | Kafka Summit London
Renaming a Kafka Topic | Kafka Summit LondonRenaming a Kafka Topic | Kafka Summit London
Renaming a Kafka Topic | Kafka Summit LondonHostedbyConfluent
 
Evolution of NRT Data Ingestion Pipeline at Trendyol
Evolution of NRT Data Ingestion Pipeline at TrendyolEvolution of NRT Data Ingestion Pipeline at Trendyol
Evolution of NRT Data Ingestion Pipeline at TrendyolHostedbyConfluent
 
Ensuring Kafka Service Resilience: A Dive into Health-Checking Techniques
Ensuring Kafka Service Resilience: A Dive into Health-Checking TechniquesEnsuring Kafka Service Resilience: A Dive into Health-Checking Techniques
Ensuring Kafka Service Resilience: A Dive into Health-Checking TechniquesHostedbyConfluent
 
Exactly-once Stream Processing with Arroyo and Kafka
Exactly-once Stream Processing with Arroyo and KafkaExactly-once Stream Processing with Arroyo and Kafka
Exactly-once Stream Processing with Arroyo and KafkaHostedbyConfluent
 
Fish Plays Pokemon | Kafka Summit London
Fish Plays Pokemon | Kafka Summit LondonFish Plays Pokemon | Kafka Summit London
Fish Plays Pokemon | Kafka Summit LondonHostedbyConfluent
 
Tiered Storage 101 | Kafla Summit London
Tiered Storage 101 | Kafla Summit LondonTiered Storage 101 | Kafla Summit London
Tiered Storage 101 | Kafla Summit LondonHostedbyConfluent
 
Building a Self-Service Stream Processing Portal: How And Why
Building a Self-Service Stream Processing Portal: How And WhyBuilding a Self-Service Stream Processing Portal: How And Why
Building a Self-Service Stream Processing Portal: How And WhyHostedbyConfluent
 
From the Trenches: Improving Kafka Connect Source Connector Ingestion from 7 ...
From the Trenches: Improving Kafka Connect Source Connector Ingestion from 7 ...From the Trenches: Improving Kafka Connect Source Connector Ingestion from 7 ...
From the Trenches: Improving Kafka Connect Source Connector Ingestion from 7 ...HostedbyConfluent
 
Future with Zero Down-Time: End-to-end Resiliency with Chaos Engineering and ...
Future with Zero Down-Time: End-to-end Resiliency with Chaos Engineering and ...Future with Zero Down-Time: End-to-end Resiliency with Chaos Engineering and ...
Future with Zero Down-Time: End-to-end Resiliency with Chaos Engineering and ...HostedbyConfluent
 
Navigating Private Network Connectivity Options for Kafka Clusters
Navigating Private Network Connectivity Options for Kafka ClustersNavigating Private Network Connectivity Options for Kafka Clusters
Navigating Private Network Connectivity Options for Kafka ClustersHostedbyConfluent
 
Apache Flink: Building a Company-wide Self-service Streaming Data Platform
Apache Flink: Building a Company-wide Self-service Streaming Data PlatformApache Flink: Building a Company-wide Self-service Streaming Data Platform
Apache Flink: Building a Company-wide Self-service Streaming Data PlatformHostedbyConfluent
 
Explaining How Real-Time GenAI Works in a Noisy Pub
Explaining How Real-Time GenAI Works in a Noisy PubExplaining How Real-Time GenAI Works in a Noisy Pub
Explaining How Real-Time GenAI Works in a Noisy PubHostedbyConfluent
 
TL;DR Kafka Metrics | Kafka Summit London
TL;DR Kafka Metrics | Kafka Summit LondonTL;DR Kafka Metrics | Kafka Summit London
TL;DR Kafka Metrics | Kafka Summit LondonHostedbyConfluent
 
A Window Into Your Kafka Streams Tasks | KSL
A Window Into Your Kafka Streams Tasks | KSLA Window Into Your Kafka Streams Tasks | KSL
A Window Into Your Kafka Streams Tasks | KSLHostedbyConfluent
 
Mastering Kafka Producer Configs: A Guide to Optimizing Performance
Mastering Kafka Producer Configs: A Guide to Optimizing PerformanceMastering Kafka Producer Configs: A Guide to Optimizing Performance
Mastering Kafka Producer Configs: A Guide to Optimizing PerformanceHostedbyConfluent
 
Data Contracts Management: Schema Registry and Beyond
Data Contracts Management: Schema Registry and BeyondData Contracts Management: Schema Registry and Beyond
Data Contracts Management: Schema Registry and BeyondHostedbyConfluent
 
Code-First Approach: Crafting Efficient Flink Apps
Code-First Approach: Crafting Efficient Flink AppsCode-First Approach: Crafting Efficient Flink Apps
Code-First Approach: Crafting Efficient Flink AppsHostedbyConfluent
 
Debezium vs. the World: An Overview of the CDC Ecosystem
Debezium vs. the World: An Overview of the CDC EcosystemDebezium vs. the World: An Overview of the CDC Ecosystem
Debezium vs. the World: An Overview of the CDC EcosystemHostedbyConfluent
 
Beyond Tiered Storage: Serverless Kafka with No Local Disks
Beyond Tiered Storage: Serverless Kafka with No Local DisksBeyond Tiered Storage: Serverless Kafka with No Local Disks
Beyond Tiered Storage: Serverless Kafka with No Local DisksHostedbyConfluent
 

Mais de HostedbyConfluent (20)

Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
 
Renaming a Kafka Topic | Kafka Summit London
Renaming a Kafka Topic | Kafka Summit LondonRenaming a Kafka Topic | Kafka Summit London
Renaming a Kafka Topic | Kafka Summit London
 
Evolution of NRT Data Ingestion Pipeline at Trendyol
Evolution of NRT Data Ingestion Pipeline at TrendyolEvolution of NRT Data Ingestion Pipeline at Trendyol
Evolution of NRT Data Ingestion Pipeline at Trendyol
 
Ensuring Kafka Service Resilience: A Dive into Health-Checking Techniques
Ensuring Kafka Service Resilience: A Dive into Health-Checking TechniquesEnsuring Kafka Service Resilience: A Dive into Health-Checking Techniques
Ensuring Kafka Service Resilience: A Dive into Health-Checking Techniques
 
Exactly-once Stream Processing with Arroyo and Kafka
Exactly-once Stream Processing with Arroyo and KafkaExactly-once Stream Processing with Arroyo and Kafka
Exactly-once Stream Processing with Arroyo and Kafka
 
Fish Plays Pokemon | Kafka Summit London
Fish Plays Pokemon | Kafka Summit LondonFish Plays Pokemon | Kafka Summit London
Fish Plays Pokemon | Kafka Summit London
 
Tiered Storage 101 | Kafla Summit London
Tiered Storage 101 | Kafla Summit LondonTiered Storage 101 | Kafla Summit London
Tiered Storage 101 | Kafla Summit London
 
Building a Self-Service Stream Processing Portal: How And Why
Building a Self-Service Stream Processing Portal: How And WhyBuilding a Self-Service Stream Processing Portal: How And Why
Building a Self-Service Stream Processing Portal: How And Why
 
From the Trenches: Improving Kafka Connect Source Connector Ingestion from 7 ...
From the Trenches: Improving Kafka Connect Source Connector Ingestion from 7 ...From the Trenches: Improving Kafka Connect Source Connector Ingestion from 7 ...
From the Trenches: Improving Kafka Connect Source Connector Ingestion from 7 ...
 
Future with Zero Down-Time: End-to-end Resiliency with Chaos Engineering and ...
Future with Zero Down-Time: End-to-end Resiliency with Chaos Engineering and ...Future with Zero Down-Time: End-to-end Resiliency with Chaos Engineering and ...
Future with Zero Down-Time: End-to-end Resiliency with Chaos Engineering and ...
 
Navigating Private Network Connectivity Options for Kafka Clusters
Navigating Private Network Connectivity Options for Kafka ClustersNavigating Private Network Connectivity Options for Kafka Clusters
Navigating Private Network Connectivity Options for Kafka Clusters
 
Apache Flink: Building a Company-wide Self-service Streaming Data Platform
Apache Flink: Building a Company-wide Self-service Streaming Data PlatformApache Flink: Building a Company-wide Self-service Streaming Data Platform
Apache Flink: Building a Company-wide Self-service Streaming Data Platform
 
Explaining How Real-Time GenAI Works in a Noisy Pub
Explaining How Real-Time GenAI Works in a Noisy PubExplaining How Real-Time GenAI Works in a Noisy Pub
Explaining How Real-Time GenAI Works in a Noisy Pub
 
TL;DR Kafka Metrics | Kafka Summit London
TL;DR Kafka Metrics | Kafka Summit LondonTL;DR Kafka Metrics | Kafka Summit London
TL;DR Kafka Metrics | Kafka Summit London
 
A Window Into Your Kafka Streams Tasks | KSL
A Window Into Your Kafka Streams Tasks | KSLA Window Into Your Kafka Streams Tasks | KSL
A Window Into Your Kafka Streams Tasks | KSL
 
Mastering Kafka Producer Configs: A Guide to Optimizing Performance
Mastering Kafka Producer Configs: A Guide to Optimizing PerformanceMastering Kafka Producer Configs: A Guide to Optimizing Performance
Mastering Kafka Producer Configs: A Guide to Optimizing Performance
 
Data Contracts Management: Schema Registry and Beyond
Data Contracts Management: Schema Registry and BeyondData Contracts Management: Schema Registry and Beyond
Data Contracts Management: Schema Registry and Beyond
 
Code-First Approach: Crafting Efficient Flink Apps
Code-First Approach: Crafting Efficient Flink AppsCode-First Approach: Crafting Efficient Flink Apps
Code-First Approach: Crafting Efficient Flink Apps
 
Debezium vs. the World: An Overview of the CDC Ecosystem
Debezium vs. the World: An Overview of the CDC EcosystemDebezium vs. the World: An Overview of the CDC Ecosystem
Debezium vs. the World: An Overview of the CDC Ecosystem
 
Beyond Tiered Storage: Serverless Kafka with No Local Disks
Beyond Tiered Storage: Serverless Kafka with No Local DisksBeyond Tiered Storage: Serverless Kafka with No Local Disks
Beyond Tiered Storage: Serverless Kafka with No Local Disks
 

Último

The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxMalak Abu Hammad
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Igalia
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityPrincipled Technologies
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Drew Madelung
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking MenDelhi Call girls
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfEnterprise Knowledge
 
Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsJoaquim Jorge
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonetsnaman860154
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slidespraypatel2
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsMaria Levchenko
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024The Digital Insurer
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonAnna Loughnan Colquhoun
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024The Digital Insurer
 
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEarley Information Science
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘RTylerCroy
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptxHampshireHUG
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountPuma Security, LLC
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsEnterprise Knowledge
 
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxFactors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxKatpro Technologies
 
A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024Results
 

Último (20)

The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptx
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivity
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
 
Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and Myths
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonets
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slides
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed texts
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt Robison
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
 
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path Mount
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI Solutions
 
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxFactors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
 
A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024
 

Principles in Data Stream Processing | Matthias J Sax, Confluent

  • 1. Principles in Data Stream Processing Nick Dearden | Matthias J. Sax
  • 2. Outline 2 • What do we mean by Stream Processing anyway ? • Concepts • Events & State • Time • (In)Completeness • Future directions
  • 3. Events vs State (Example) 3
  • 4. Events vs State – Streams vs Tables 4 𝑠𝑡𝑎𝑡𝑒 𝑛𝑜𝑤 = ) !"# $%& 𝑠𝑡𝑟𝑒𝑎𝑚 𝑡 d𝑡 𝑠𝑡𝑟𝑒𝑎𝑚 𝑡 = d𝑠𝑡𝑎𝑡𝑒(𝑡) d𝑡 * M. Kleppman, Designing Data-Intensive Applications
  • 7. Unifying Batch and Stream Processing? 7 No! Approach this from a different direction. • Batch vs Stream is not just about “processing latency”, but fundamentally about semantics! • What they have in common is that they both transform data Instead of focusing on “batch” vs “incremental” data transformation, stream processing is about “data in motion” in contrast to “data at rest” • And the goal is to UNIFY data in motion and data at rest with a unique abstraction We think of this as integrating state and events • Streams and Tables in a unified processing model and programming interface / query language
  • 8. Events and ”the Log” 8 Immutability as core concept Use a log to store events • Logs store immutable data • Logs preserve order Think stream-table join:, an input event is enriched to produce an output event • After the output event is produced, it’s immutable; • Thus, if the table is updated, the enriched event is not updated
  • 9. Events and ”the Log” 9 Company City Confluent PA Oracle RWC
  • 10. Events and ”the Log” 10 CFLT 10 Laptops Company City Confluent PA Oracle RWC
  • 11. Events and ”the Log” 11 CFLT 10 Laptops Company City Confluent PA Oracle RWC
  • 12. Events and ”the Log” 12 CFLT 10 Laptops Company City Confluent PA Oracle RWC CFLT 10 Laptops PA
  • 13. Events and ”the Log” 13 CFLT 3 Router CFLT 10 Laptops Company City Confluent PA Oracle RWC CFLT 3 Router CFLT 10 Laptops PA PA
  • 14. Events and ”the Log” 14 CFLT 3 Router CFLT 10 Laptops Company City Confluent PA Oracle RWC Company City Confluent MV Oracle RWC CFLT 3 Router CFLT 10 Laptops PA PA
  • 15. Events and ”the Log” 15 CFLT 3 Router CFLT 10 Laptops CFLT 20 Mice Company City Confluent PA Oracle RWC Company City Confluent MV Oracle RWC CFLT 3 Router CFLT 10 Laptops CFLT 20 Mice PA MV PA
  • 17. Stream Processing and Time Semantics 17 Time is a first-class citizen Stream processing engine needs to reason about time (not just a data dimension). Batch processing ignores the time-dimension User’s responsibility to reason about time – the processing engine does not understand time semantics. Event Time When did something happen (e.g. “click on ad”). Events and State Tables need a time dimension, too!
  • 18. Event Time vs Processing Time 18 Event Time Every event carries a timestamp embedded in the record. Captures when the event happened. • Stream processing semantics are defined on event time. Processing Time When we process the data (last week, today, tomorrow). • Must not impact the result.
  • 19. Tables and Time Semantics 19 Classic DB systems don’t track time nor “versions” KEY VALUE A 10 B 42 C 23 KEY VALUE A 20 B 42 C 23 KEY VALUE A 20 B 73 C 23
  • 20. Tables and Time Semantics (cont.) 20 Manually track time and versions Or use newer SQL features like “system versioned temporal tables”. KEY VALUE ValidFrom ValidTo A 10 1 17 B 42 3 23 C 23 10 MAX_VALUE A 20 17 40 B 73 23 MAX_VALUE A <null> 40 MAX_VALUE
  • 21. Versioned Tables 21 KEY VALUE A 10 KEY VALUE A 10 B 42 KEY VALUE A 10 B 42 C 23 KEY VALUE A 20 B 42 C 23 KEY VALUE A 20 B 73 C 23 KEY VALUE A 20 B 42 C 23 Event Time t=1 t=3 t=10 t=17 t=23 t=40 A sequence of table versions over time:
  • 22. t=5 t=20 Events and State in Time 22 CFLT 3 Router CFLT 10 Laptops CFLT 20 Mice Company City Confluent PA Oracle RWC Company City Confluent MV Oracle RWC CFLT 3 Router CFLT 10 Laptops CFLT 20 Mice PA MV PA t=10 t=15 t=30
  • 24. Deterministic Processing 24 Log Order (aka offset order) • messages are stored in append order (per partition) • all consumers read message in the same order • log can also be re-read multiple times, and the order of messages is always the same Kafka’s strict ordering guarantees allow to build consistent and deterministic application!
  • 25. Log Order and Event Time 25 However • stream processing semantics are defined on (logical) event time order • infinite input and out-of-order data (event time) make the concept of completeness fuzzy Consistency != Completeness
  • 26. When is the Puzzle completed? 26 Limit “scope” via (time) windows • Define puzzle “boundaries” • How many puzzle pieces do we have?
  • 27. When is which Puzzle (window) completed? 27 Event Time
  • 28. Grace Period 28 Consistency != Completeness Only you know when you are complete (or rather, how much out-of-order-ness you are prepared to tolerate). Configurable grace period. Eventual Completeness ?
  • 30. Future of Stream Processing 30 Time-versioned tables: CREATE TABLE versionedTable <schema> VERSION RETENTION TIME 25 SECONDS;
  • 31. Future of Stream Processing 31 Time-versioned tables: CREATE TABLE versionedTable <schema> VERSION RETENTION TIME 25 SECONDS; KEY VALUE A 10 KEY VALUE A 10 B 42 KEY VALUE A 10 B 42 C 23 KEY VALUE A 20 B 42 C 23 KEY VALUE A 20 B 73 C 23 KEY VALUE A 20 B 42 C 23 Event Time t=1 t=3 t=10 t=17 t=23 t=40
  • 32. Future of Stream Processing 32 Time-versioned tables: CREATE TABLE versionedTable <schema> VERSION RETENTION TIME 25 SECONDS; KEY VALUE A 10 KEY VALUE A 10 B 42 KEY VALUE A 10 B 42 C 23 KEY VALUE A 20 B 42 C 23 KEY VALUE A 20 B 73 C 23 KEY VALUE A 20 B 42 C 23 Event Time retention = 25 t=1 t=3 t=10 t=17 t=23 t=40
  • 33. Future of Stream Processing 33 Time-versioned tables: CREATE TABLE versionedTable <schema> VERSION RETENTION TIME 25 SECONDS; KEY VALUE A 10 KEY VALUE A 10 B 42 KEY VALUE A 10 B 42 C 23 KEY VALUE A 20 B 42 C 23 KEY VALUE A 20 B 73 C 23 KEY VALUE A 20 B 42 C 23 Event Time retention = 25 t=1 t=3 t=10 t=17 t=23 t=40
  • 34. Future of Stream Processing (cont.) 34 Easier Reasoning about Completeness • ksqlDB and Kafka Streams are distributed systems • Vector clocks DM32 Ad-hoc group on SQL extensions for streaming data • Confluent, Oracle, IBM, Microsoft, Google, Alibaba
  • 35. Learn More, Join In! 35 SIGMOD paper https://www.confluent.io/resources/white-paper/distributed- stream-processing-in-kafka Apache Kafka Improvement Proposals (KIPs) https://cwiki.apache.org/confluence/display/KAFKA/Kafka+Improveme nt+Proposals Confluent blog https://confluent.io/blog Streaming Audio with Tim Berglund https://developer.confluent.io/podcast/
  • 36. Thanks! We are hiring!