© 2019 Ververica
Konstantin Knauf, @snntrable, Solutions Architect
99 WAYS TO ENRICH STREAMING DATA WITH APACHE FLINK
Agenda
• Introduction
• Per-Record Reference Data Lookup
• Reference Data Pre-Loading
• Reference Data Change Stream
• Summary
Introduction
Running Example
• Sensor Reference Data: low update frequency, one record per key
• Sensor Measurements: high frequency, many events per key over time
• Apache Flink looks up the reference data by sensorID and emits Enriched Measurements
Per-Record Reference Data Lookup
Per-Record Synchronous Lookup
[Diagram: Sensor Reference Data]
Implementation in Flink:
● RichFlatMapFunction
● Database Client instantiated in open()
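The lifecycle described above can be sketched without any Flink dependency. This is a minimal illustration, not the speaker's code: `ReferenceClient` is a hypothetical stand-in for a blocking database client, and `SyncEnricher` only mirrors the shape of a rich function (create the client once in `open()`, then one blocking lookup per record).

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for a blocking database client; not a Flink or JDBC API.
class ReferenceClient implements AutoCloseable {
    private final Map<String, String> table = new HashMap<>();
    int queries = 0;

    ReferenceClient() {
        table.put("sensor-1", "hall A");
        table.put("sensor-2", "hall B");
    }

    // A real client would pay one network round trip per call.
    String lookupLocation(String sensorId) {
        queries++;
        return table.getOrDefault(sensorId, "unknown");
    }

    @Override
    public void close() {
        table.clear();
    }
}

// Mirrors the lifecycle of a rich function: open() once, then one call per record.
class SyncEnricher {
    private ReferenceClient client;

    void open() {
        client = new ReferenceClient();
    }

    String enrich(String sensorId, double value) {
        String location = client.lookupLocation(sensorId); // blocks the processing thread
        return sensorId + "," + value + "," + location;
    }

    int queriesIssued() {
        return client.queries;
    }

    void close() {
        client.close();
    }
}
```

Every record costs exactly one query, which is why the processing thread spends most of its time waiting on the database.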
Per-Record Synchronous Lookup
+
• very simple
• always uses up-to-date reference data
-
• high latency
• high load on database
• low CPU utilization on TaskManagers
• low throughput
• overall not suitable for high-frequency streams
Per-Record Asynchronous Lookup
[Diagram: Sensor Reference Data]
Implementation in Flink:
● AsyncDataStream#unorderedWait
● https://ci.apache.org/projects/flink/flink-docs-release-1.8/dev/stream/operators/asyncio.html
Per-Record Asynchronous Lookup
Code
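The code from this slide did not survive extraction. As a rough, Flink-independent sketch of the idea, the example below issues several lookups concurrently with `CompletableFuture` instead of waiting for each one in turn. The in-memory `TABLE` and the `AsyncEnricher` class are assumptions made for illustration. In a Flink job the equivalent logic would sit in an async function passed to `AsyncDataStream`.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

class AsyncEnricher {
    // Hypothetical reference table standing in for the external database.
    private static final Map<String, String> TABLE =
            Map.of("sensor-1", "hall A", "sensor-2", "hall B");

    // Starts all lookups before waiting for any of them, then collects the results.
    static List<String> enrichAll(List<String> sensorIds) {
        ExecutorService pool = Executors.newFixedThreadPool(4);
        try {
            List<CompletableFuture<String>> pending = new ArrayList<>();
            for (String id : sensorIds) {
                pending.add(CompletableFuture.supplyAsync(
                        () -> id + "," + TABLE.getOrDefault(id, "unknown"), pool));
            }
            List<String> enriched = new ArrayList<>();
            for (CompletableFuture<String> lookup : pending) {
                enriched.add(lookup.join()); // waits only if this lookup is still running
            }
            return enriched;
        } finally {
            pool.shutdown();
        }
    }
}
```

This sketch collects results in input order. `unorderedWait`, by contrast, emits results as they complete, trading record order for lower latency.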
Per-Record Asynchronous Lookup
+
• still pretty simple
• always uses up-to-date reference data
• higher throughput than “Per-Record Synchronous Lookups”
-
• high latency
• high load on database
• overall not suitable for high-frequency streams
Per-Record (A)Synchronous Lookup with In-Memory Cache
[Diagram: Sensor Reference Data, Cache]
Implementation in Flink:
● RichFlatMapFunction
● Simple HashMap or cache implementation of your favorite library
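A small least-recently-used cache in front of the lookup could look like the sketch below. It uses the JDK's `LinkedHashMap` in access order; the `database` function and the `CachedLookup` class are hypothetical placeholders, and a caching library would normally add expiry on top of this.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;

class CachedLookup {
    private final Function<String, String> database; // hypothetical per-key lookup
    private final Map<String, String> cache;
    int misses = 0;

    CachedLookup(Function<String, String> database, final int maxEntries) {
        this.database = database;
        // accessOrder = true makes the map evict the least recently used entry first.
        this.cache = new LinkedHashMap<String, String>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, String> eldest) {
                return size() > maxEntries;
            }
        };
    }

    String get(String sensorId) {
        String cached = cache.get(sensorId);
        if (cached != null) {
            return cached; // cache hit: no database round trip
        }
        misses++;
        String fresh = database.apply(sensorId); // cache miss: query the database
        cache.put(sensorId, fresh);
        return fresh;
    }
}
```

Without an expiry policy a cached entry is never refreshed, which is the staleness risk this approach carries.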
Per-Record (A)Synchronous Lookup with In-Memory Cache
+
• lower load on database
• higher throughput if cache hit rate is high
• low latency if cache is hit
-
• high tail latencies
• high load on database during warm-up phase, possibly hard to predict
• events might be enriched by stale data
• cache size limited by available memory
Reference Data Pre-Loading
Pre-Loading of Reference Data
[Diagram: Sensor Reference Data]
Implementation in Flink:
● RichFlatMapFunction
● HashMap populated in open()
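As a minimal sketch of pre-loading, assuming a hypothetical `fullTableScan` supplier that returns the whole reference table, the class below copies the table into a `HashMap` once and serves every later lookup from memory. It only imitates the `open()` lifecycle and is not Flink code.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Supplier;

class PreloadedEnricher {
    private final Supplier<Map<String, String>> fullTableScan; // hypothetical bulk read
    private Map<String, String> reference;

    PreloadedEnricher(Supplier<Map<String, String>> fullTableScan) {
        this.fullTableScan = fullTableScan;
    }

    // Called once before the first record: copy the whole table into memory.
    void open() {
        reference = new HashMap<>(fullTableScan.get());
    }

    // Per record: a local map lookup, with no database access.
    String enrich(String sensorId, double value) {
        return sensorId + "," + value + "," + reference.getOrDefault(sensorId, "unknown");
    }
}
```

Because the snapshot is taken only in `open()`, later changes in the database are not visible until the job restarts.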
Pre-Loading of Reference Data
+
• usually very simple (depending on the database client)
• high throughput
• low latency
-
• reference data size limited by memory of single TaskManager
• events are enriched with stale data, no updates to reference data
Partitioned Pre-Loading of Reference Data
[Diagram: Sensor Reference Data]
Implementation in Flink:
● DataStream#partitionCustom
● RichFlatMapFunction
Partitioned Pre-Loading of Reference Data
Code
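The code from this slide is missing in the transcript. The core idea can be sketched in plain Java: the same partitioning function decides both which parallel instance receives an event and which reference rows that instance loads. The class and method names below are assumptions for illustration. In Flink the routing function would be supplied to `DataStream#partitionCustom` as a `Partitioner`, and each instance would learn its own index from the runtime context.

```java
import java.util.HashMap;
import java.util.Map;

class PartitionedPreload {
    // The same function must route events and select the reference rows to load.
    static int partitionFor(String sensorId, int numPartitions) {
        return Math.floorMod(sensorId.hashCode(), numPartitions);
    }

    // What one parallel instance would load at startup: only its own share of the table.
    static Map<String, String> loadShare(Map<String, String> fullTable,
                                         int subtaskIndex,
                                         int numPartitions) {
        Map<String, String> share = new HashMap<>();
        for (Map.Entry<String, String> row : fullTable.entrySet()) {
            if (partitionFor(row.getKey(), numPartitions) == subtaskIndex) {
                share.put(row.getKey(), row.getValue());
            }
        }
        return share;
    }
}
```

Each instance holds roughly 1/n of the table, so the total reference data can grow with the number of TaskManagers.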
Partitioned Pre-Loading of Reference Data
+
• high throughput
• low latency
• reference data size limited by memory of all TaskManagers
-
• events are enriched with stale data, no updates to reference data
• requires custom partitioning of DataStream
Periodic (Partitioned) Pre-Loading of Reference Data
[Diagram: Sensor Reference Data]
Implementation in Flink:
● CoProcessFunction
● Processing or Event Time Timers for Reloading
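A periodic reload can be sketched as follows, again without Flink. `onTimer` here is a plain method that a test calls with an explicit timestamp; in a Flink job a registered processing-time or event-time timer would trigger the equivalent callback. `PeriodicReloader` and `fullTableScan` are hypothetical names.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Supplier;

class PeriodicReloader {
    private final Supplier<Map<String, String>> fullTableScan; // hypothetical bulk read
    private final long intervalMs;
    private Map<String, String> reference = new HashMap<>();
    private long nextReloadAt;
    int reloads = 0;

    PeriodicReloader(Supplier<Map<String, String>> fullTableScan, long intervalMs) {
        this.fullTableScan = fullTableScan;
        this.intervalMs = intervalMs;
    }

    // Initial load before the first record is processed.
    void open(long now) {
        reload(now);
    }

    // Stand-in for a timer callback: reloads once the refresh interval has elapsed.
    void onTimer(long now) {
        if (now >= nextReloadAt) {
            reload(now);
        }
    }

    private void reload(long now) {
        reference = new HashMap<>(fullTableScan.get()); // swap in a fresh snapshot
        nextReloadAt = now + intervalMs;                 // schedule the next refresh
        reloads++;
    }

    String lookup(String sensorId) {
        return reference.getOrDefault(sensorId, "unknown");
    }
}
```

Staleness is bounded by `intervalMs`, at the price of a full table read on every refresh.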
Periodic (Partitioned) Pre-Loading of Reference Data
+
• high throughput
• low latency
• staleness of reference data limited by refresh interval
• reference data size limited by memory of all TaskManagers
-
• events are enriched with stale reference data
• load spikes/high tail latencies during refresh of reference data
Per-Record Lookup with Initial Cache Pre-Loading
[Diagram: Sensor Reference Data, Cache]
Implementation in Flink:
● RichFlatMapFunction
● Cache Preloading in open()
Per-Record Lookup with Initial Cache Pre-Loading
+
• high throughput
• low latency
• staleness of reference data limited by cache timeout
• reference data size limited by memory of all TaskManagers
-
• events are enriched with stale reference data
• load spikes/high tail latencies depending on cache miss rate
Half-Time Assessment
• multiple solutions for enrichment via external database
• So far, always a trade-off between
– database load
– staleness of reference data
– reference data size
– latency & throughput of event stream
Reference Data Change Stream
High-Level Architecture
• Sensor Reference Data: low update frequency, one record per key
• Sensor Measurements: high frequency, many events per key over time
• Apache Flink looks up the reference data by sensorID and emits Enriched Measurements
More Streamy High-Level Architecture
• database updates are captured and written into message queue
• local stream join instead of external lookup
[Diagram: Sensor Reference Data -> Sensor Reference Data Updates -> Apache Flink; Sensor Measurements -> Apache Flink -> Enriched Measurements]
High-Level Streaming/Event-Driven Architecture
• ground truth is moved to message broker
• former sensor reference database and Apache Flink consume the same stream of reference data updates
• local stream join instead of external lookup
[Diagram: Sensor Management System -> Sensor Reference Data Updates -> Sensor Reference Data and Apache Flink; Sensor Measurements -> Apache Flink -> Enriched Measurements]
Simple Streaming Enrichment
Implementation in Flink:
● KeyedCoProcessFunction
● Key by sensorId
● ValueState<SensorReferenceData>
Simple Streaming Enrichment
Code
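The slide's code is not part of the transcript. The sketch below imitates the two-input logic in plain Java under stated assumptions: a `HashMap` keyed by sensor ID stands in for Flink's keyed `ValueState`, and the two methods stand in for the two inputs of a `KeyedCoProcessFunction`. What to do with measurements that arrive before any reference record is a design choice; this sketch simply emits nothing for them.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

class StreamingEnricher {
    // Stand-in for keyed state: the latest reference record per sensor ID.
    private final Map<String, String> latestReference = new HashMap<>();

    // Input 1: a reference data update overwrites the stored record for its key.
    void onReferenceUpdate(String sensorId, String location) {
        latestReference.put(sensorId, location);
    }

    // Input 2: a measurement is joined with whatever reference record is stored right now.
    Optional<String> onMeasurement(String sensorId, double value) {
        String location = latestReference.get(sensorId);
        if (location == null) {
            return Optional.empty(); // no reference data seen yet for this key
        }
        return Optional.of(sensorId + "," + value + "," + location);
    }
}
```

Because the join uses whichever update arrived last, a measurement can be enriched with reference data that is newer than the measurement itself.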
Simple Streaming Enrichment
+
• high throughput
• low latency
• always uses up-to-date reference data
• reference data size not limited by memory (RocksDB)
-
• might necessitate change of high-level architecture or a conversation with DBAs
• events might be enriched by reference data from the future
Simple Event Time Join
Implementation in Flink:
● CoProcessFunction
● Key by sensorId
● MapState<Long, SensorReferenceData>
● Look up based on event time
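The event-time lookup can be illustrated with a sorted map of reference versions per key. This is a plain-Java sketch, not the talk's implementation. It uses a `TreeMap` because `floorEntry` makes the "latest version at or before the event timestamp" rule a one-liner. Flink's `MapState` has no such method, so a real function would have to search the stored timestamps itself.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;
import java.util.TreeMap;

class EventTimeJoin {
    // Per sensor ID: reference versions ordered by the timestamp from which they are valid.
    private final Map<String, TreeMap<Long, String>> versions = new HashMap<>();

    void onReferenceUpdate(String sensorId, long validFrom, String location) {
        versions.computeIfAbsent(sensorId, k -> new TreeMap<>()).put(validFrom, location);
    }

    // Returns the version that was valid at the event's timestamp, if one is known.
    Optional<String> referenceAt(String sensorId, long eventTime) {
        TreeMap<Long, String> history = versions.get(sensorId);
        if (history == null) {
            return Optional.empty();
        }
        Map.Entry<Long, String> version = history.floorEntry(eventTime);
        if (version == null) {
            return Optional.empty(); // the event is older than every known version
        }
        return Optional.of(version.getValue());
    }
}
```

Old versions are never removed in this sketch; a long-running job would need a cleanup rule, for example one driven by the watermark.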
Simple Event Time Join
+
• high throughput
• low latency
• always uses latest available reference data for each record
• reference data size not limited by memory (RocksDB)
-
• might necessitate change of high-level architecture or a conversation with DBAs
Temporal Table Join
Implementation in Flink:
● KeyedCoProcessFunction
● On each watermark: joins all events up to the watermark with the correct reference data
● https://ci.apache.org/projects/flink/flink-docs-release-1.8/dev/table/streaming/temporal_tables.html
● org.apache.flink.table.runtime.join.TemporalRowtimeJoin
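The buffering behaviour described above can be approximated in plain Java for a single key. This is a simplified sketch of the idea and does not reproduce Flink's `TemporalRowtimeJoin`; all names are illustrative. Measurements are held back until a watermark passes their timestamp, by which point all reference updates up to that time are assumed to have arrived.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;

class WatermarkJoin {
    // Reference history for one key, ordered by valid-from timestamp.
    private final TreeMap<Long, String> versions = new TreeMap<>();
    // Measurements waiting for the watermark, ordered by event time.
    private final TreeMap<Long, List<Double>> buffered = new TreeMap<>();

    void onReferenceUpdate(long validFrom, String location) {
        versions.put(validFrom, location);
    }

    void onMeasurement(long eventTime, double value) {
        buffered.computeIfAbsent(eventTime, t -> new ArrayList<>()).add(value);
    }

    // Emits every buffered measurement with eventTime <= watermark, in timestamp order.
    List<String> onWatermark(long watermark) {
        List<String> out = new ArrayList<>();
        NavigableMap<Long, List<Double>> ready = buffered.headMap(watermark, true);
        for (Map.Entry<Long, List<Double>> entry : ready.entrySet()) {
            Map.Entry<Long, String> version = versions.floorEntry(entry.getKey());
            String location = (version == null) ? "unknown" : version.getValue();
            for (double value : entry.getValue()) {
                out.add(entry.getKey() + "," + value + "," + location);
            }
        }
        ready.clear(); // the view is backed by the buffer, so this drops the emitted entries
        return out;
    }
}
```

Holding records until the watermark arrives is what adds latency compared with the simple event time join.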
Temporal Table Join
+
• high throughput
• always uses latest available reference data for each record & reference data is complete
• reference data size not limited by memory (RocksDB)
-
• might necessitate change of high-level architecture or a conversation with DBAs
• higher latency
TL;DL
• Flink provides numerous ways to enrich streaming data with slowly changing reference data.
• The highest performance and cleanest semantics result from a stream processing architecture and streaming enrichment methods like Temporal Table Joins.
• More Resources
– https://www.ververica.com/resources/flink-forward-san-francisco-2019/how-to-join-two-data-streams
– https://github.com/knaufk/enrichments-with-flink
• Join the community!
– Subscribe to mailing lists
– Join Flink Forward Europe 2019 in October
Questions?
www.ververica.com  @VervericaData  konstantin@ververica.com

Mais conteúdo relacionado

Mais procurados

“Alexa, be quiet!”: End-to-end near-real time model building and evaluation i...
“Alexa, be quiet!”: End-to-end near-real time model building and evaluation i...“Alexa, be quiet!”: End-to-end near-real time model building and evaluation i...
“Alexa, be quiet!”: End-to-end near-real time model building and evaluation i...Flink Forward
 
Real-time Analytics with Trino and Apache Pinot
Real-time Analytics with Trino and Apache PinotReal-time Analytics with Trino and Apache Pinot
Real-time Analytics with Trino and Apache PinotXiang Fu
 
Apache BookKeeper: A High Performance and Low Latency Storage Service
Apache BookKeeper: A High Performance and Low Latency Storage ServiceApache BookKeeper: A High Performance and Low Latency Storage Service
Apache BookKeeper: A High Performance and Low Latency Storage ServiceSijie Guo
 
A Thorough Comparison of Delta Lake, Iceberg and Hudi
A Thorough Comparison of Delta Lake, Iceberg and HudiA Thorough Comparison of Delta Lake, Iceberg and Hudi
A Thorough Comparison of Delta Lake, Iceberg and HudiDatabricks
 
Practical learnings from running thousands of Flink jobs
Practical learnings from running thousands of Flink jobsPractical learnings from running thousands of Flink jobs
Practical learnings from running thousands of Flink jobsFlink Forward
 
Optimizing Delta/Parquet Data Lakes for Apache Spark
Optimizing Delta/Parquet Data Lakes for Apache SparkOptimizing Delta/Parquet Data Lakes for Apache Spark
Optimizing Delta/Parquet Data Lakes for Apache SparkDatabricks
 
Massive Data Processing in Adobe Using Delta Lake
Massive Data Processing in Adobe Using Delta LakeMassive Data Processing in Adobe Using Delta Lake
Massive Data Processing in Adobe Using Delta LakeDatabricks
 
Introducing BinarySortedMultiMap - A new Flink state primitive to boost your ...
Introducing BinarySortedMultiMap - A new Flink state primitive to boost your ...Introducing BinarySortedMultiMap - A new Flink state primitive to boost your ...
Introducing BinarySortedMultiMap - A new Flink state primitive to boost your ...Flink Forward
 
Considerations for Data Access in the Lakehouse
Considerations for Data Access in the LakehouseConsiderations for Data Access in the Lakehouse
Considerations for Data Access in the LakehouseDatabricks
 
Making Apache Spark Better with Delta Lake
Making Apache Spark Better with Delta LakeMaking Apache Spark Better with Delta Lake
Making Apache Spark Better with Delta LakeDatabricks
 
Designing ETL Pipelines with Structured Streaming and Delta Lake—How to Archi...
Designing ETL Pipelines with Structured Streaming and Delta Lake—How to Archi...Designing ETL Pipelines with Structured Streaming and Delta Lake—How to Archi...
Designing ETL Pipelines with Structured Streaming and Delta Lake—How to Archi...Databricks
 
Architect’s Open-Source Guide for a Data Mesh Architecture
Architect’s Open-Source Guide for a Data Mesh ArchitectureArchitect’s Open-Source Guide for a Data Mesh Architecture
Architect’s Open-Source Guide for a Data Mesh ArchitectureDatabricks
 
Building large scale transactional data lake using apache hudi
Building large scale transactional data lake using apache hudiBuilding large scale transactional data lake using apache hudi
Building large scale transactional data lake using apache hudiBill Liu
 
Using Apache Arrow, Calcite, and Parquet to Build a Relational Cache
Using Apache Arrow, Calcite, and Parquet to Build a Relational CacheUsing Apache Arrow, Calcite, and Parquet to Build a Relational Cache
Using Apache Arrow, Calcite, and Parquet to Build a Relational CacheDremio Corporation
 
Webinar: Deep Dive on Apache Flink State - Seth Wiesman
Webinar: Deep Dive on Apache Flink State - Seth WiesmanWebinar: Deep Dive on Apache Flink State - Seth Wiesman
Webinar: Deep Dive on Apache Flink State - Seth WiesmanVerverica
 
Exactly-Once Financial Data Processing at Scale with Flink and Pinot
Exactly-Once Financial Data Processing at Scale with Flink and PinotExactly-Once Financial Data Processing at Scale with Flink and Pinot
Exactly-Once Financial Data Processing at Scale with Flink and PinotFlink Forward
 
Apache Flink in the Cloud-Native Era
Apache Flink in the Cloud-Native EraApache Flink in the Cloud-Native Era
Apache Flink in the Cloud-Native EraFlink Forward
 
Building Reliable Lakehouses with Apache Flink and Delta Lake
Building Reliable Lakehouses with Apache Flink and Delta LakeBuilding Reliable Lakehouses with Apache Flink and Delta Lake
Building Reliable Lakehouses with Apache Flink and Delta LakeFlink Forward
 
Dynamically Scaling Data Streams across Multiple Kafka Clusters with Zero Fli...
Dynamically Scaling Data Streams across Multiple Kafka Clusters with Zero Fli...Dynamically Scaling Data Streams across Multiple Kafka Clusters with Zero Fli...
Dynamically Scaling Data Streams across Multiple Kafka Clusters with Zero Fli...Flink Forward
 
Where is my bottleneck? Performance troubleshooting in Flink
Where is my bottleneck? Performance troubleshooting in FlinkWhere is my bottleneck? Performance troubleshooting in Flink
Where is my bottleneck? Performance troubleshooting in FlinkFlink Forward
 

Mais procurados (20)

“Alexa, be quiet!”: End-to-end near-real time model building and evaluation i...
“Alexa, be quiet!”: End-to-end near-real time model building and evaluation i...“Alexa, be quiet!”: End-to-end near-real time model building and evaluation i...
“Alexa, be quiet!”: End-to-end near-real time model building and evaluation i...
 
Real-time Analytics with Trino and Apache Pinot
Real-time Analytics with Trino and Apache PinotReal-time Analytics with Trino and Apache Pinot
Real-time Analytics with Trino and Apache Pinot
 
Apache BookKeeper: A High Performance and Low Latency Storage Service
Apache BookKeeper: A High Performance and Low Latency Storage ServiceApache BookKeeper: A High Performance and Low Latency Storage Service
Apache BookKeeper: A High Performance and Low Latency Storage Service
 
A Thorough Comparison of Delta Lake, Iceberg and Hudi
A Thorough Comparison of Delta Lake, Iceberg and HudiA Thorough Comparison of Delta Lake, Iceberg and Hudi
A Thorough Comparison of Delta Lake, Iceberg and Hudi
 
Practical learnings from running thousands of Flink jobs
Practical learnings from running thousands of Flink jobsPractical learnings from running thousands of Flink jobs
Practical learnings from running thousands of Flink jobs
 
Optimizing Delta/Parquet Data Lakes for Apache Spark
Optimizing Delta/Parquet Data Lakes for Apache SparkOptimizing Delta/Parquet Data Lakes for Apache Spark
Optimizing Delta/Parquet Data Lakes for Apache Spark
 
Massive Data Processing in Adobe Using Delta Lake
Massive Data Processing in Adobe Using Delta LakeMassive Data Processing in Adobe Using Delta Lake
Massive Data Processing in Adobe Using Delta Lake
 
Introducing BinarySortedMultiMap - A new Flink state primitive to boost your ...
Introducing BinarySortedMultiMap - A new Flink state primitive to boost your ...Introducing BinarySortedMultiMap - A new Flink state primitive to boost your ...
Introducing BinarySortedMultiMap - A new Flink state primitive to boost your ...
 
Considerations for Data Access in the Lakehouse
Considerations for Data Access in the LakehouseConsiderations for Data Access in the Lakehouse
Considerations for Data Access in the Lakehouse
 
Making Apache Spark Better with Delta Lake
Making Apache Spark Better with Delta LakeMaking Apache Spark Better with Delta Lake
Making Apache Spark Better with Delta Lake
 
Designing ETL Pipelines with Structured Streaming and Delta Lake—How to Archi...
Designing ETL Pipelines with Structured Streaming and Delta Lake—How to Archi...Designing ETL Pipelines with Structured Streaming and Delta Lake—How to Archi...
Designing ETL Pipelines with Structured Streaming and Delta Lake—How to Archi...
 
Architect’s Open-Source Guide for a Data Mesh Architecture
Architect’s Open-Source Guide for a Data Mesh ArchitectureArchitect’s Open-Source Guide for a Data Mesh Architecture
Architect’s Open-Source Guide for a Data Mesh Architecture
 
Building large scale transactional data lake using apache hudi
Building large scale transactional data lake using apache hudiBuilding large scale transactional data lake using apache hudi
Building large scale transactional data lake using apache hudi
 
Using Apache Arrow, Calcite, and Parquet to Build a Relational Cache
Using Apache Arrow, Calcite, and Parquet to Build a Relational CacheUsing Apache Arrow, Calcite, and Parquet to Build a Relational Cache
Using Apache Arrow, Calcite, and Parquet to Build a Relational Cache
 
Webinar: Deep Dive on Apache Flink State - Seth Wiesman
Webinar: Deep Dive on Apache Flink State - Seth WiesmanWebinar: Deep Dive on Apache Flink State - Seth Wiesman
Webinar: Deep Dive on Apache Flink State - Seth Wiesman
 
Exactly-Once Financial Data Processing at Scale with Flink and Pinot
Exactly-Once Financial Data Processing at Scale with Flink and PinotExactly-Once Financial Data Processing at Scale with Flink and Pinot
Exactly-Once Financial Data Processing at Scale with Flink and Pinot
 
Apache Flink in the Cloud-Native Era
Apache Flink in the Cloud-Native EraApache Flink in the Cloud-Native Era
Apache Flink in the Cloud-Native Era
 
Building Reliable Lakehouses with Apache Flink and Delta Lake
Building Reliable Lakehouses with Apache Flink and Delta LakeBuilding Reliable Lakehouses with Apache Flink and Delta Lake
Building Reliable Lakehouses with Apache Flink and Delta Lake
 
Dynamically Scaling Data Streams across Multiple Kafka Clusters with Zero Fli...
Dynamically Scaling Data Streams across Multiple Kafka Clusters with Zero Fli...Dynamically Scaling Data Streams across Multiple Kafka Clusters with Zero Fli...
Dynamically Scaling Data Streams across Multiple Kafka Clusters with Zero Fli...
 
Where is my bottleneck? Performance troubleshooting in Flink
Where is my bottleneck? Performance troubleshooting in FlinkWhere is my bottleneck? Performance troubleshooting in Flink
Where is my bottleneck? Performance troubleshooting in Flink
 

Semelhante a Webinar: 99 Ways to Enrich Streaming Data with Apache Flink - Konstantin Knauf

The Past, Present, and Future of Apache Flink
The Past, Present, and Future of Apache FlinkThe Past, Present, and Future of Apache Flink
The Past, Present, and Future of Apache FlinkAljoscha Krettek
 
Flink Forward Berlin 2018: Aljoscha Krettek & Till Rohrmann - Keynote: "A Yea...
Flink Forward Berlin 2018: Aljoscha Krettek & Till Rohrmann - Keynote: "A Yea...Flink Forward Berlin 2018: Aljoscha Krettek & Till Rohrmann - Keynote: "A Yea...
Flink Forward Berlin 2018: Aljoscha Krettek & Till Rohrmann - Keynote: "A Yea...Flink Forward
 
Spring-Boot-PQS with Apache Ignite Caching @ HbaseCon PhoenixCon Dataworks su...
Spring-Boot-PQS with Apache Ignite Caching @ HbaseCon PhoenixCon Dataworks su...Spring-Boot-PQS with Apache Ignite Caching @ HbaseCon PhoenixCon Dataworks su...
Spring-Boot-PQS with Apache Ignite Caching @ HbaseCon PhoenixCon Dataworks su...Anirudha Jadhav
 
The Past, Present, and Future of Apache Flink®
The Past, Present, and Future of Apache Flink®The Past, Present, and Future of Apache Flink®
The Past, Present, and Future of Apache Flink®Aljoscha Krettek
 
Introduction to Apache NiFi dws19 DWS - DC 2019
Introduction to Apache NiFi   dws19 DWS - DC 2019Introduction to Apache NiFi   dws19 DWS - DC 2019
Introduction to Apache NiFi dws19 DWS - DC 2019Timothy Spann
 
Stream Processing @ Lyft
Stream Processing @ LyftStream Processing @ Lyft
Stream Processing @ LyftJamie Grier
 
Machine Data 101
Machine Data 101Machine Data 101
Machine Data 101Splunk
 
What's New in NGINX Plus R7?
What's New in NGINX Plus R7?What's New in NGINX Plus R7?
What's New in NGINX Plus R7?NGINX, Inc.
 
Re-define network visibility for capacity planning & forecasting with Grafana
Re-define network visibility for capacity planning & forecasting with GrafanaRe-define network visibility for capacity planning & forecasting with Grafana
Re-define network visibility for capacity planning & forecasting with GrafanaBangladesh Network Operators Group
 
Event Streaming Architecture for Industry 4.0 - Abdelkrim Hadjidj & Jan Kuni...
Event Streaming Architecture for Industry 4.0 -  Abdelkrim Hadjidj & Jan Kuni...Event Streaming Architecture for Industry 4.0 -  Abdelkrim Hadjidj & Jan Kuni...
Event Streaming Architecture for Industry 4.0 - Abdelkrim Hadjidj & Jan Kuni...Flink Forward
 
Flink Forward San Francisco 2018: Robert Metzger & Patrick Lucas - "dA Platfo...
Flink Forward San Francisco 2018: Robert Metzger & Patrick Lucas - "dA Platfo...Flink Forward San Francisco 2018: Robert Metzger & Patrick Lucas - "dA Platfo...
Flink Forward San Francisco 2018: Robert Metzger & Patrick Lucas - "dA Platfo...Flink Forward
 
NGINX Kubernetes Ingress Controller: Getting Started – EMEA
NGINX Kubernetes Ingress Controller: Getting Started – EMEANGINX Kubernetes Ingress Controller: Getting Started – EMEA
NGINX Kubernetes Ingress Controller: Getting Started – EMEAAine Long
 
Big Data Day LA 2016/ Big Data Track - Building scalable enterprise data flow...
Big Data Day LA 2016/ Big Data Track - Building scalable enterprise data flow...Big Data Day LA 2016/ Big Data Track - Building scalable enterprise data flow...
Big Data Day LA 2016/ Big Data Track - Building scalable enterprise data flow...Data Con LA
 
Introduction to Apache NiFi 1.10
Introduction to Apache NiFi 1.10Introduction to Apache NiFi 1.10
Introduction to Apache NiFi 1.10Timothy Spann
 
Big Data LDN 2018: STREAM PROCESSING TAKES ON EVERYTHING
Big Data LDN 2018: STREAM PROCESSING TAKES ON EVERYTHINGBig Data LDN 2018: STREAM PROCESSING TAKES ON EVERYTHING
Big Data LDN 2018: STREAM PROCESSING TAKES ON EVERYTHINGMatt Stubbs
 
The Never Landing Stream with HTAP and Streaming
The Never Landing Stream with HTAP and StreamingThe Never Landing Stream with HTAP and Streaming
The Never Landing Stream with HTAP and StreamingTimothy Spann
 
Project Controls Expo, 13th Nov 2013 - "Loading Cost and Activity data into P...
Project Controls Expo, 13th Nov 2013 - "Loading Cost and Activity data into P...Project Controls Expo, 13th Nov 2013 - "Loading Cost and Activity data into P...
Project Controls Expo, 13th Nov 2013 - "Loading Cost and Activity data into P...Project Controls Expo
 

Semelhante a Webinar: 99 Ways to Enrich Streaming Data with Apache Flink - Konstantin Knauf (20)

The Past, Present, and Future of Apache Flink
The Past, Present, and Future of Apache FlinkThe Past, Present, and Future of Apache Flink
The Past, Present, and Future of Apache Flink
 
Flink Forward Berlin 2018: Aljoscha Krettek & Till Rohrmann - Keynote: "A Yea...
Flink Forward Berlin 2018: Aljoscha Krettek & Till Rohrmann - Keynote: "A Yea...Flink Forward Berlin 2018: Aljoscha Krettek & Till Rohrmann - Keynote: "A Yea...
Flink Forward Berlin 2018: Aljoscha Krettek & Till Rohrmann - Keynote: "A Yea...
 
Spring-Boot-PQS with Apache Ignite Caching @ HbaseCon PhoenixCon Dataworks su...
Spring-Boot-PQS with Apache Ignite Caching @ HbaseCon PhoenixCon Dataworks su...Spring-Boot-PQS with Apache Ignite Caching @ HbaseCon PhoenixCon Dataworks su...
Spring-Boot-PQS with Apache Ignite Caching @ HbaseCon PhoenixCon Dataworks su...
 
The Past, Present, and Future of Apache Flink®
The Past, Present, and Future of Apache Flink®The Past, Present, and Future of Apache Flink®
The Past, Present, and Future of Apache Flink®
 
Introduction to Apache NiFi dws19 DWS - DC 2019
Introduction to Apache NiFi   dws19 DWS - DC 2019Introduction to Apache NiFi   dws19 DWS - DC 2019
Introduction to Apache NiFi dws19 DWS - DC 2019
 
Stream Processing @ Lyft
Stream Processing @ LyftStream Processing @ Lyft
Stream Processing @ Lyft
 
Machine Data 101
Machine Data 101Machine Data 101
Machine Data 101
 
What's New in NGINX Plus R7?
What's New in NGINX Plus R7?What's New in NGINX Plus R7?
What's New in NGINX Plus R7?
 
Re-define network visibility for capacity planning & forecasting with Grafana
Re-define network visibility for capacity planning & forecasting with GrafanaRe-define network visibility for capacity planning & forecasting with Grafana
Re-define network visibility for capacity planning & forecasting with Grafana
 
Event Streaming Architecture for Industry 4.0 - Abdelkrim Hadjidj & Jan Kuni...
Event Streaming Architecture for Industry 4.0 -  Abdelkrim Hadjidj & Jan Kuni...Event Streaming Architecture for Industry 4.0 -  Abdelkrim Hadjidj & Jan Kuni...
Event Streaming Architecture for Industry 4.0 - Abdelkrim Hadjidj & Jan Kuni...
 
Flink Forward San Francisco 2018: Robert Metzger & Patrick Lucas - "dA Platfo...
Flink Forward San Francisco 2018: Robert Metzger & Patrick Lucas - "dA Platfo...Flink Forward San Francisco 2018: Robert Metzger & Patrick Lucas - "dA Platfo...
Flink Forward San Francisco 2018: Robert Metzger & Patrick Lucas - "dA Platfo...
 
dA Platform Overview
dA Platform OverviewdA Platform Overview
dA Platform Overview
 
NGINX Kubernetes Ingress Controller: Getting Started – EMEA
NGINX Kubernetes Ingress Controller: Getting Started – EMEANGINX Kubernetes Ingress Controller: Getting Started – EMEA
NGINX Kubernetes Ingress Controller: Getting Started – EMEA
 
Big Data Day LA 2016/ Big Data Track - Building scalable enterprise data flow...
Big Data Day LA 2016/ Big Data Track - Building scalable enterprise data flow...Big Data Day LA 2016/ Big Data Track - Building scalable enterprise data flow...
Big Data Day LA 2016/ Big Data Track - Building scalable enterprise data flow...
 
Introduction to Apache NiFi 1.10
Introduction to Apache NiFi 1.10Introduction to Apache NiFi 1.10
Introduction to Apache NiFi 1.10
 
Big Data LDN 2018: STREAM PROCESSING TAKES ON EVERYTHING
Big Data LDN 2018: STREAM PROCESSING TAKES ON EVERYTHINGBig Data LDN 2018: STREAM PROCESSING TAKES ON EVERYTHING
Big Data LDN 2018: STREAM PROCESSING TAKES ON EVERYTHING
 
The Never Landing Stream with HTAP and Streaming
The Never Landing Stream with HTAP and StreamingThe Never Landing Stream with HTAP and Streaming
The Never Landing Stream with HTAP and Streaming
 
Spark+flume seattle
Spark+flume seattleSpark+flume seattle
Spark+flume seattle
 
Project Controls Expo, 13th Nov 2013 - "Loading Cost and Activity data into P...
Project Controls Expo, 13th Nov 2013 - "Loading Cost and Activity data into P...Project Controls Expo, 13th Nov 2013 - "Loading Cost and Activity data into P...
Project Controls Expo, 13th Nov 2013 - "Loading Cost and Activity data into P...
 
implementing the right website monitoring strategy
 implementing the right website monitoring strategy implementing the right website monitoring strategy
implementing the right website monitoring strategy
 

Mais de Ververica

2020-05-06 Apache Flink Meetup London: The Easiest Way to Get Operational wit...
2020-05-06 Apache Flink Meetup London: The Easiest Way to Get Operational wit...2020-05-06 Apache Flink Meetup London: The Easiest Way to Get Operational wit...
2020-05-06 Apache Flink Meetup London: The Easiest Way to Get Operational wit...Ververica
 
Webinar: How to contribute to Apache Flink - Robert Metzger
Webinar:  How to contribute to Apache Flink - Robert MetzgerWebinar:  How to contribute to Apache Flink - Robert Metzger
Webinar: How to contribute to Apache Flink - Robert MetzgerVerverica
 
Webinar: Detecting row patterns with Flink SQL - Dawid Wysakowicz
Webinar:  Detecting row patterns with Flink SQL - Dawid WysakowiczWebinar:  Detecting row patterns with Flink SQL - Dawid Wysakowicz
Webinar: Detecting row patterns with Flink SQL - Dawid WysakowiczVerverica
 
Deploying Flink on Kubernetes - David Anderson
 Deploying Flink on Kubernetes - David Anderson Deploying Flink on Kubernetes - David Anderson
Deploying Flink on Kubernetes - David AndersonVerverica
 
Webinar: Flink SQL in Action - Fabian Hueske
 Webinar: Flink SQL in Action - Fabian Hueske Webinar: Flink SQL in Action - Fabian Hueske
Webinar: Flink SQL in Action - Fabian HueskeVerverica
 
2018-04 Kafka Summit London: Stephan Ewen - "Apache Flink and Apache Kafka fo...
2018-04 Kafka Summit London: Stephan Ewen - "Apache Flink and Apache Kafka fo...2018-04 Kafka Summit London: Stephan Ewen - "Apache Flink and Apache Kafka fo...
2018-04 Kafka Summit London: Stephan Ewen - "Apache Flink and Apache Kafka fo...Ververica
 
2018-01 Seattle Apache Flink Meetup at OfferUp, Opening Remarks and Talk 2
2018-01 Seattle Apache Flink Meetup at OfferUp, Opening Remarks and Talk 22018-01 Seattle Apache Flink Meetup at OfferUp, Opening Remarks and Talk 2
2018-01 Seattle Apache Flink Meetup at OfferUp, Opening Remarks and Talk 2Ververica
 
Stephan Ewen - Experiences running Flink at Very Large Scale
Stephan Ewen -  Experiences running Flink at Very Large ScaleStephan Ewen -  Experiences running Flink at Very Large Scale
Stephan Ewen - Experiences running Flink at Very Large ScaleVerverica
 
Fabian Hueske - Stream Analytics with SQL on Apache Flink
Fabian Hueske - Stream Analytics with SQL on Apache FlinkFabian Hueske - Stream Analytics with SQL on Apache Flink
Fabian Hueske - Stream Analytics with SQL on Apache FlinkVerverica
 
Tzu-Li (Gordon) Tai - Stateful Stream Processing with Apache Flink
Tzu-Li (Gordon) Tai - Stateful Stream Processing with Apache FlinkTzu-Li (Gordon) Tai - Stateful Stream Processing with Apache Flink
Tzu-Li (Gordon) Tai - Stateful Stream Processing with Apache FlinkVerverica
 
Kostas Kloudas - Complex Event Processing with Flink: the state of FlinkCEP
Kostas Kloudas - Complex Event Processing with Flink: the state of FlinkCEP Kostas Kloudas - Complex Event Processing with Flink: the state of FlinkCEP
Kostas Kloudas - Complex Event Processing with Flink: the state of FlinkCEP Ververica
 
Aljoscha Krettek - Portable stateful big data processing in Apache Beam
Aljoscha Krettek - Portable stateful big data processing in Apache BeamAljoscha Krettek - Portable stateful big data processing in Apache Beam
Aljoscha Krettek - Portable stateful big data processing in Apache BeamVerverica
 
Aljoscha Krettek - Apache Flink® and IoT: How Stateful Event-Time Processing ...
Aljoscha Krettek - Apache Flink® and IoT: How Stateful Event-Time Processing ...Aljoscha Krettek - Apache Flink® and IoT: How Stateful Event-Time Processing ...
Aljoscha Krettek - Apache Flink® and IoT: How Stateful Event-Time Processing ...Ververica
 
Webinar: 99 Ways to Enrich Streaming Data with Apache Flink - Konstantin Knauf

  • 1. © 2019 Ververica Konstantin Knauf, @snntrable, Solutions Architect 99 WAYS TO ENRICH STREAMING DATA WITH APACHE FLINK
  • 2. Agenda: Introduction; Per-Record Reference Data Lookup; Reference Data Pre-Loading; Reference Data Change Stream; Summary
  • 4. Running Example: Sensor Measurements (high frequency, many events per key over time) are enriched in Apache Flink with Sensor Reference Data (low update frequency, one record per key), looked up by sensorID, to produce Enriched Measurements.
  • 5. Per-Record Reference Data Lookup
  • 6. Per-Record Synchronous Lookup. Implementation in Flink: RichFlatMapFunction; database client instantiated in open().
  • 7. Per-Record Synchronous Lookup. Pros: very simple; always uses up-to-date reference data. Cons: high latency; high load on database; low CPU utilization on TaskManagers; low throughput; overall not suitable for high-frequency streams.
  • 8. Per-Record Asynchronous Lookup. Implementation in Flink: AsyncDataStream#unorderedWait; see https://ci.apache.org/projects/flink/flink-docs-release-1.8/dev/stream/operators/asyncio.html
  • 9. Per-Record Asynchronous Lookup: Code (slide image; the code is not captured in the transcript)
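Since the code on this slide did not survive in the transcript, the idea can be modelled without Flink. The following is a minimal plain-Java sketch, not the talk's code: `AsyncLookupSketch`, `enrichAll`, and the `asyncClient` function are hypothetical names standing in for an asynchronous database client. It shows the core of the pattern: one non-blocking request per record, with results emitted in completion order, which is roughly what `AsyncDataStream#unorderedWait` arranges around a user-supplied async function.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.function.Function;

/** Flink-free model of an unordered asynchronous lookup (all names are illustrative). */
final class AsyncLookupSketch {

    /**
     * Starts one non-blocking lookup per sensor ID and collects "id:reference"
     * strings in completion order, so the output order may differ from the input order.
     */
    static List<String> enrichAll(List<String> sensorIds,
                                  Function<String, CompletableFuture<String>> asyncClient) {
        ConcurrentLinkedQueue<String> emitted = new ConcurrentLinkedQueue<>();
        CompletableFuture<?>[] pending = sensorIds.stream()
                // No thread blocks here: the callback runs when the lookup completes.
                .map(id -> asyncClient.apply(id).thenAccept(ref -> emitted.add(id + ":" + ref)))
                .toArray(CompletableFuture[]::new);
        // Wait for all in-flight requests only at the very end of this demo.
        CompletableFuture.allOf(pending).join();
        return new ArrayList<>(emitted);
    }
}
```

Because many requests are in flight at once, throughput is no longer bounded by one round trip per record, but every record still costs a database call, which matches the cons listed on the next slide.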
  • 10. Per-Record Asynchronous Lookup. Pros: still pretty simple; always uses up-to-date reference data; higher throughput than "Per-Record Synchronous Lookup". Cons: high latency; high load on database; overall not suitable for high-frequency streams.
  • 11. Per-Record (A)Synchronous Lookup with In-Memory Cache. Implementation in Flink: RichFlatMapFunction; simple HashMap or cache implementation from your favorite library.
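As an illustration of the caching idea, here is a minimal plain-Java sketch of a bounded least-recently-used cache in front of a lookup function. It is an assumption-laden model rather than the talk's implementation: `CachedLookup`, `lookup`, and `databaseCalls` are hypothetical names, the `database` function stands in for a real client, and entries never expire (a production cache would usually add a timeout to bound staleness).

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;

/** Flink-free model of a per-record lookup with a bounded in-memory LRU cache. */
final class CachedLookup {
    private final Function<String, String> database; // stand-in for the real database client
    private final Map<String, String> cache;
    private int databaseCalls = 0;

    CachedLookup(Function<String, String> database, final int maxEntries) {
        this.database = database;
        // accessOrder = true makes iteration order least-recently-used first.
        this.cache = new LinkedHashMap<String, String>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, String> eldest) {
                return size() > maxEntries; // evict once the bound is exceeded
            }
        };
    }

    /** Returns the cached reference if present, otherwise queries the database once. */
    String lookup(String sensorId) {
        String reference = cache.get(sensorId);
        if (reference == null) {
            reference = database.apply(sensorId);
            databaseCalls++;
            cache.put(sensorId, reference);
        }
        return reference;
    }

    int databaseCalls() {
        return databaseCalls;
    }
}
```

With a capacity of two and the access sequence s1, s1, s2, s3, s1, the database is queried four times: the second s1 is a hit, but inserting s3 evicts s1, so the final s1 is a miss again. That is the "cache size limited by available memory" trade-off in miniature.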
  • 12. Per-Record (A)Synchronous Lookup with In-Memory Cache. Pros: lower load on database; higher throughput if cache hit rate is high; low latency if cache is hit. Cons: high tail latencies; high load on database during warm-up phase, possibly hard to predict; events might be enriched with stale data; cache size limited by available memory.
  • 13. Reference Data Pre-Loading
  • 14. Pre-Loading of Reference Data. Implementation in Flink: RichFlatMapFunction; HashMap populated in open().
  • 15. Pre-Loading of Reference Data. Pros: usually very simple (depending on the database client); high throughput; low latency. Cons: reference data size limited by memory of a single TaskManager; events are enriched with stale data, no updates to reference data.
  • 16. Partitioned Pre-Loading of Reference Data. Implementation in Flink: DataStream#partitionCustom; RichFlatMapFunction.
  • 17. Partitioned Pre-Loading of Reference Data: Code (slide image; the code is not captured in the transcript)
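The slide's code is not part of the transcript, so the following plain-Java sketch only models the idea under stated assumptions: the same deterministic function routes each measurement to a partition and decides which reference rows a given parallel instance pre-loads. `PartitionedPreloadSketch`, `partition`, and `loadSlice` are hypothetical names. In Flink, a function like `partition` would presumably be supplied to `DataStream#partitionCustom`, and each parallel instance would load its slice in `open()` using its own subtask index.

```java
import java.util.HashMap;
import java.util.Map;

/** Flink-free model of partitioned pre-loading (all names are illustrative). */
final class PartitionedPreloadSketch {

    /** Maps a sensor ID to a partition; floorMod keeps the result non-negative. */
    static int partition(String sensorId, int numPartitions) {
        return Math.floorMod(sensorId.hashCode(), numPartitions);
    }

    /**
     * Returns only the reference rows owned by one parallel instance.
     * A real job would push this filter into the database query instead of
     * scanning the full data set in memory.
     */
    static Map<String, String> loadSlice(Map<String, String> allReferenceData,
                                         int subtaskIndex, int numPartitions) {
        Map<String, String> slice = new HashMap<>();
        for (Map.Entry<String, String> row : allReferenceData.entrySet()) {
            if (partition(row.getKey(), numPartitions) == subtaskIndex) {
                slice.put(row.getKey(), row.getValue());
            }
        }
        return slice;
    }
}
```

Because routing and loading share one function, every measurement arrives at the instance that holds its reference row, and the total reference data is spread over all instances rather than replicated in each.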
  • 18. Partitioned Pre-Loading of Reference Data. Pros: high throughput; low latency. Cons: reference data size limited by memory of all TaskManagers; events are enriched with stale data, no updates to reference data; requires custom partitioning of DataStream.
  • 19. Periodic (Partitioned) Pre-Loading of Reference Data. Implementation in Flink: CoProcessFunction; processing-time or event-time timers for reloading.
  • 20. Periodic (Partitioned) Pre-Loading of Reference Data. Pros: high throughput; low latency; staleness of reference data limited by refresh interval. Cons: reference data size limited by memory of all TaskManagers; events are enriched with stale reference data; load spikes/high tail latencies during refresh of reference data.
  • 21. Per-Record Lookup with Initial Cache Pre-Loading. Implementation in Flink: RichFlatMapFunction; cache pre-loading in open().
  • 22. Per-Record Lookup with Initial Cache Pre-Loading. Pros: high throughput; low latency; staleness of reference data limited by cache timeout. Cons: reference data size limited by memory of all TaskManagers; events are enriched with stale reference data; load spikes/high tail latencies depending on cache miss rate.
  • 23. Half-Time Assessment: there are multiple solutions for enrichment via an external database. So far, there is always a trade-off between database load, staleness of reference data, reference data size, and latency & throughput of the event stream.
  • 24. Reference Data Change Stream
  • 25. High-Level Architecture: Sensor Measurements (high frequency, many events per key over time) are enriched in Apache Flink with Sensor Reference Data (low update frequency, one record per key), looked up by sensorID, to produce Enriched Measurements.
  • 26. More Streamy High-Level Architecture: database updates are captured and written into a message queue (Sensor Reference Data Updates); Apache Flink performs a local stream join with Sensor Measurements instead of an external lookup and produces Enriched Measurements.
  • 27. High-Level Streaming/Event-Driven Architecture: ground truth is moved to the message broker; the former sensor reference database and Apache Flink consume the same stream of reference data updates; local stream join instead of external lookup. Diagram components: Sensor Management System, Sensor Reference Data Updates, Sensor Reference Data, Sensor Measurements, Apache Flink, Enriched Measurements.
  • 28. Simple Streaming Enrichment. Implementation in Flink: KeyedCoProcessFunction; key by sensorId; ValueState<SensorReferenceData>.
  • 29. Simple Streaming Enrichment: Code (slide image; the code is not captured in the transcript)
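As the slide's code is missing from the transcript, here is a minimal plain-Java model of the two-input logic, not the talk's implementation. `StreamingEnrichmentSketch`, `onReferenceUpdate`, and `onMeasurement` are hypothetical names; the `HashMap` plays the role of the per-key `ValueState`, and returning an empty result for a sensor without reference data is an assumption made for this sketch.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

/** Flink-free model of a keyed two-input enrichment (all names are illustrative). */
final class StreamingEnrichmentSketch {
    // Stand-in for keyed ValueState: the latest reference record per sensor ID.
    private final Map<String, String> referenceBySensor = new HashMap<>();

    /** Input 2: a reference data update overwrites the stored value for its key. */
    void onReferenceUpdate(String sensorId, String reference) {
        referenceBySensor.put(sensorId, reference);
    }

    /** Input 1: a measurement is joined with whatever reference is stored right now. */
    Optional<String> onMeasurement(String sensorId, double value) {
        String reference = referenceBySensor.get(sensorId);
        if (reference == null) {
            return Optional.empty(); // assumption: no reference yet, nothing is emitted
        }
        return Optional.of(sensorId + "," + value + "," + reference);
    }
}
```

The join uses arrival order only: if an update arrives before an older measurement is processed, that measurement is enriched with the newer value, which is the "reference data from the future" drawback listed on the following slide.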
  • 30. Simple Streaming Enrichment. Pros: high throughput; low latency; always uses up-to-date reference data; reference data size not limited by memory (RocksDB). Cons: might necessitate change of high-level architecture or a conversation with DBAs; events might be enriched by reference data from the future.
  • 31. Simple Event Time Join. Look up based on event time. Implementation in Flink: CoProcessFunction; key by sensorId; MapState<Long, SensorReferenceData>.
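The event-time lookup can be modelled in plain Java as a versioned map per sensor. This is a sketch under assumptions, not the talk's code: `EventTimeJoinSketch`, `onReferenceUpdate`, and `referenceAt` are hypothetical names, and a sorted `TreeMap` stands in for the `MapState<Long, SensorReferenceData>` keyed by the timestamp from which each version is valid.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;
import java.util.TreeMap;

/** Flink-free model of an event-time lookup over versioned reference data. */
final class EventTimeJoinSketch {
    // Per sensor: reference versions ordered by the timestamp they become valid.
    private final Map<String, TreeMap<Long, String>> versions = new HashMap<>();

    void onReferenceUpdate(String sensorId, long validFrom, String reference) {
        versions.computeIfAbsent(sensorId, k -> new TreeMap<>()).put(validFrom, reference);
    }

    /** Returns the newest version whose timestamp is not after the event time. */
    Optional<String> referenceAt(String sensorId, long eventTime) {
        TreeMap<Long, String> history = versions.get(sensorId);
        if (history == null) {
            return Optional.empty();
        }
        Map.Entry<Long, String> version = history.floorEntry(eventTime);
        if (version == null) {
            return Optional.empty(); // the event predates every known version
        }
        return Optional.of(version.getValue());
    }
}
```

A measurement with event time 150 is matched with the version valid from 100 even if a version valid from 200 has already arrived, so enrichment no longer depends on arrival order. Old versions are never removed in this sketch; a real job would need to clean them up.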
  • 32. Simple Event Time Join. Pros: high throughput; low latency; always uses latest available reference data for each record; reference data size not limited by memory (RocksDB). Cons: might necessitate change of high-level architecture or a conversation with DBAs.
  • 33. Temporal Table Join. On each watermark: joins all events up to the watermark with the correct reference data. Implementation in Flink: KeyedCoProcessFunction; https://ci.apache.org/projects/flink/flink-docs-release-1.8/dev/table/streaming/temporal_tables.html; org.apache.flink.table.runtime.join.TemporalRowtimeJoin.
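The watermark-driven behaviour described on this slide can be modelled in plain Java as follows. This is only an illustrative sketch and makes no claim about how `TemporalRowtimeJoin` is actually implemented: `WatermarkJoinSketch` and its methods are hypothetical names, a single sensor is assumed for brevity, and dropping measurements that have no matching reference version is an assumption of this sketch.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Flink-free model of a watermark-triggered join for a single sensor. */
final class WatermarkJoinSketch {
    private final TreeMap<Long, String> versions = new TreeMap<>();
    private final TreeMap<Long, List<Double>> buffered = new TreeMap<>();

    void onReferenceUpdate(long validFrom, String reference) {
        versions.put(validFrom, reference);
    }

    /** Measurements are only buffered; nothing is emitted until a watermark arrives. */
    void onMeasurement(long eventTime, double value) {
        buffered.computeIfAbsent(eventTime, t -> new ArrayList<>()).add(value);
    }

    /** Joins and emits every buffered measurement with event time <= watermark. */
    List<String> onWatermark(long watermark) {
        List<String> out = new ArrayList<>();
        Map<Long, List<Double>> ready = buffered.headMap(watermark, true);
        for (Map.Entry<Long, List<Double>> entry : ready.entrySet()) {
            Map.Entry<Long, String> version = versions.floorEntry(entry.getKey());
            if (version == null) {
                continue; // assumption: no valid reference version, so the event is dropped
            }
            for (Double value : entry.getValue()) {
                out.add(entry.getKey() + "," + value + "," + version.getValue());
            }
        }
        ready.clear(); // the view is backed by the buffer, so this removes emitted events
        return out;
    }
}
```

Waiting for the watermark is what makes the reference data complete for each emitted event: an update valid from 200 that arrives after a measurement at 250 was buffered is still applied, at the price of the higher latency noted on the next slide.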
  • 34. Temporal Table Join. Pros: high throughput; always uses latest available reference data for each record & reference data is complete; reference data size not limited by memory (RocksDB). Cons: might necessitate change of high-level architecture or a conversation with DBAs; higher latency.
  • 35. TL;DL: Flink provides numerous ways to enrich streaming data with slowly changing reference data. The highest performance and cleanest semantics result from a stream processing architecture and streaming enrichment methods like Temporal Table Joins. More resources: https://www.ververica.com/resources/flink-forward-san-francisco-2019/how-to-join-two-data-streams and https://github.com/knaufk/enrichments-with-flink. Join the community: subscribe to the mailing lists; join Flink Forward Europe 2019 in October.
  • 37. www.ververica.com, @VervericaData, konstantin@ververica.com