SlideShare uma empresa Scribd logo
1 de 31
Stream Processing Advanced
with Flink and Pulsar
Till Rohrmann, Engineering Lead at Ververica, @stsffap
Addison Higham, Chief Architect at StreamNative, @addisonjh
Why is Stream Processing Important?
Spectrum of data streaming applications
offline real time
Data warehousing
OLAP / BI / reporting
Machine learning & model
training
Continuous
ETL
Unified offline /
real-time analytics
Continuous
monitoring
(position, risk, …)
Real-time behaviour
modeling (pricing,
recommenders, …)
Real-time alerts
(fraud, security, …)
Real-time ML
model
training/evaluation
Distributed OLTP
applications
Choice of data architecture defines which uses cases you can cover!
Modern Streaming Data Architecture
Stream
Processing
Stream
Processing
Stream
Processing
Stream Storage
Long Term Storage
Results / Views
Triggered
Applications
Event Producers
Reference Streaming Data Architecture: Flink + Pulsar
Stream Storage
Long Term Storage
Results / Views
Stateful
Functions
Event Producers
Apache Flink: Analytics and Applications on Streaming Data
Flink Runtime
Stateful Computations over Data Streams
Stateful Stream
Processing
Streams, State, Time
Streaming Analytics
SQL and Tables
Event-driven
Applications
Stateful Functions
Apache Flink in Numbers
● Contributors: 887
● Most active Apache ML
● Most active sources by visits and commits (#2)
● Github stars: 16.4k
● Commits: >27k
● Releases: 1.13 latest major release
● Maven downloads per month: ~ 170k
● LOC: > 1.8 million
● Latency: < 1s
● Throughput: 4 billion events/s, 7 TB/s
● Input size: 100 TB for batch job
Some Apache Flink Users
One of The Most Advanced Stream Processors
● First class support for state
○ Asynchronous barrier checkpointing algorithm to create globally consistent checkpoints
● Event-time support
○ Correctness under delayed events
● End-to-end exactly once processing guarantees
○ Correctness under faults
● Resource elastic
○ Flink applications’ resources can be adjusted to the actual need
● Unified batch-streaming APIs
○ A single query to process historical as well as live data
● Stream: Sequence of data which is made available over time
● All computation processes chunks of data over time producing results over
time → Stream processing
○ E.g. reading data from disks is done in streaming fashion
● Events can be of various forms
● Decisive difference: Is my stream bounded or not?
Everything Is a Stream
SQL / Table API: Unified Batch & Stream Processing
SQL Query
Batch query
execution
SELECT
room,
TUMBLE_END(rowtime, INTERVAL ‘1’ HOUR),
AVG(temperature)
FROM
sensors
GROUP BY
TUMBLE(rowtime, INTERVAL ‘1’ HOUR),
room
Stream-Table Duality
User lastLogin
Alice 2021-01-01
Bob 2021-01-02
User lastLogin
Alice 2021-01-14
Bob 2021-02-01
Eve 2021-01-21
Alice, 2021-01-01
Bob, 2021-01-02
Alice, 2021-01-14
Eve, 2021-01-21
Bob, 2021-02-01
SQL / Table API: Running The Same Query On Streams
SQL Query
Incremental
query execution
SELECT
room,
TUMBLE_END(rowtime, INTERVAL ‘1’ HOUR),
AVG(temperature)
FROM
sensors
GROUP BY
TUMBLE(rowtime, INTERVAL ‘1’ HOUR),
room
Interpret stream as
table
Data Has to Come From Somewhere
● Flink is a powerful compute engine for unified batch and streaming
processing that is pushing boundaries of data processing
● For Flink really to shine, a storage system that supports efficient stream and
batch ingestion is required
Pulsar offers exactly that!
● Apache Pulsar is evolving to meet the needs of a system like Flink, offering
both a low-latency, high bandwidth streaming API as well as a flexible
architecture to support batch access
Why is Pulsar a Great Fit For Streaming And Batch?
Streaming
● Like other streaming systems,
Pulsar can be used like a
distributed scale-out log, providing
consumer controlled offsets and
high throughput through parallel
partitions of topics
Batch
● In addition, Pulsar’s segment-
based, multi-layer architecture
allows for applications to access
historical data more directly via
the underlying storage
Producer Consumer
Broker 2
Broker 1 Broker 3
Topic1-Part1 Topic1-Part2 Topic1-Part3
Segment 2
Segment 1 Segment 3 Segment X
Bookie
1
Bookie
2
Bookie
3
Segment 1
Segment X
Segment 1
Segment X
Segment 1
Segment X
Offload
Segments
+
Flink + Pulsar
Adoption at Scale
While Flink and Pulsar are both widely used independently, together they offer a
strong technology for unified batch and stream storage and compute
Many large organizations are adopting Flink + Pulsar for complex workloads that
can take advantage of the strengths of both systems
State of The Pulsar-Flink Connector Today
● Pulsar-Flink connector supports Flink 1.11, 1.12 and soon 1.13
● Exactly once source and sink via producer deduplication
● Full support for Flink Schema with integration to Pulsar Schema Registry
● Full Flink SQL support for batch and streaming modes
○ Also support for an “upsert” mode which can reason about inserts, updates, and deletes
● Support for KeyShared subscriptions for higher source parallelism than
number of topic-partitions
● https://github.com/streamnative/pulsar-flink
+
Demo
The Road Ahead
Many in-progress features are being developed
● Pulsar-Flink connector being upstreamed into Flink
○ Targeted for Flink 1.14
● Pulsar Transaction Support
○ Pulsar transaction’s allow for stronger exactly-once processing guarantees
● Native Pulsar Watermarking
○ Pulsar is able to broker watermarks between producers and consumers
● Parallel Batch Source
○ Query multiple segments in parallel for higher throughput
● Unified Batch + Streaming Source
○ Using Pulsar’s batch mode for catch up and then switching over to streaming ingestion
Job 1
Flink Support For Pulsar Transactions
Pulsar transactions are GA in Pulsar 2.8.0
Support in Flink is nearing completion, which can allows for end-to-end exactly-
once processing guarantees!
Learn more about transactions at the talk Exactly-Once Made Easy: Transactional Messaging in
Apache Pulsar at 11:30 AM
Job 2
Job 3
tx
The Importance of Watermarks
High-quality watermarks are crucial for correct and
stable stream processing jobs.
In order for results to be correct, we need to take into
account “Event Time” rather than solely relying on
“Processing Time”
This is especially important when replaying or
processing older streaming data
“Watermarks” advance the event-time clock, if the clock
does not “tick” accurately, the results will not be correct
Ideal
Reality
Skew
Event Time
Native Pulsar Watermarking
Pulsar is adding support for watermarks
● Producers have a new API for injecting
watermarks into the stream, with
multiple producers potentially producing
● Watermarks are broadcast across all
partitions, which allows the broker to
dispatch more accurate watermarks to
consumers, in both realtime and
historical scenarios
● The API is designed to be simple to
integrate with Flink, removing the
complexity of accurate watermark
generation from the developer
// Pulsar
for (int = 0; i < NUM_MESSAGES; i++) {
producer.newMessage()
.value("hello with event time! " + i)
.eventTime(
System.currentTimeMillis()).sendAsync();
}
Producer
.newWatermark()
.eventTime(
System.currentTimeMillis()).sendAsync();
// Flink
PulsarSource<String> pulsarSource = new
FlinkPulsarSource(topic, ...);
pulsarSource.assignTimestampsAndWatermarks(
PulsarWatermarkStrategy.forPulsarWatermarks());
Shared Event-Time Domain
End-to-End Watermarks
Job 1
Job 2
Job 3
Producer
W|24
W|17 W|12
W|12
W|8
W|4
Pulsar + Flink and StreamNative + Ververica
Pulsar + Flink community collaboration
● Contributing Pulsar-Flink connector to Flink repository
● Evolve connector to support more advanced Pulsar and Flink features
● Build best in the class open source stream processing platform
StreamNative + Ververica Cloud partnership
● Help customers to unlock the full potential of stream processing
Thanks a lot for your attention!
Questions?

Mais conteúdo relacionado

Mais procurados

A Thorough Comparison of Delta Lake, Iceberg and Hudi
A Thorough Comparison of Delta Lake, Iceberg and HudiA Thorough Comparison of Delta Lake, Iceberg and Hudi
A Thorough Comparison of Delta Lake, Iceberg and HudiDatabricks
 
Kafka for Real-Time Replication between Edge and Hybrid Cloud
Kafka for Real-Time Replication between Edge and Hybrid CloudKafka for Real-Time Replication between Edge and Hybrid Cloud
Kafka for Real-Time Replication between Edge and Hybrid CloudKai Wähner
 
Flink Forward San Francisco 2019: Moving from Lambda and Kappa Architectures ...
Flink Forward San Francisco 2019: Moving from Lambda and Kappa Architectures ...Flink Forward San Francisco 2019: Moving from Lambda and Kappa Architectures ...
Flink Forward San Francisco 2019: Moving from Lambda and Kappa Architectures ...Flink Forward
 
Demystifying flink memory allocation and tuning - Roshan Naik, Uber
Demystifying flink memory allocation and tuning - Roshan Naik, UberDemystifying flink memory allocation and tuning - Roshan Naik, Uber
Demystifying flink memory allocation and tuning - Roshan Naik, UberFlink Forward
 
Producer Performance Tuning for Apache Kafka
Producer Performance Tuning for Apache KafkaProducer Performance Tuning for Apache Kafka
Producer Performance Tuning for Apache KafkaJiangjie Qin
 
Handle Large Messages In Apache Kafka
Handle Large Messages In Apache KafkaHandle Large Messages In Apache Kafka
Handle Large Messages In Apache KafkaJiangjie Qin
 
Towards Flink 2.0: Unified Batch & Stream Processing - Aljoscha Krettek, Verv...
Towards Flink 2.0: Unified Batch & Stream Processing - Aljoscha Krettek, Verv...Towards Flink 2.0: Unified Batch & Stream Processing - Aljoscha Krettek, Verv...
Towards Flink 2.0: Unified Batch & Stream Processing - Aljoscha Krettek, Verv...Flink Forward
 
Introduction to Apache Flink - Fast and reliable big data processing
Introduction to Apache Flink - Fast and reliable big data processingIntroduction to Apache Flink - Fast and reliable big data processing
Introduction to Apache Flink - Fast and reliable big data processingTill Rohrmann
 
HTTP Analytics for 6M requests per second using ClickHouse, by Alexander Boc...
HTTP Analytics for 6M requests per second using ClickHouse, by  Alexander Boc...HTTP Analytics for 6M requests per second using ClickHouse, by  Alexander Boc...
HTTP Analytics for 6M requests per second using ClickHouse, by Alexander Boc...Altinity Ltd
 
Stephan Ewen - Experiences running Flink at Very Large Scale
Stephan Ewen -  Experiences running Flink at Very Large ScaleStephan Ewen -  Experiences running Flink at Very Large Scale
Stephan Ewen - Experiences running Flink at Very Large ScaleVerverica
 
Where is my bottleneck? Performance troubleshooting in Flink
Where is my bottleneck? Performance troubleshooting in FlinkWhere is my bottleneck? Performance troubleshooting in Flink
Where is my bottleneck? Performance troubleshooting in FlinkFlink Forward
 
How Uber scaled its Real Time Infrastructure to Trillion events per day
How Uber scaled its Real Time Infrastructure to Trillion events per dayHow Uber scaled its Real Time Infrastructure to Trillion events per day
How Uber scaled its Real Time Infrastructure to Trillion events per dayDataWorks Summit
 
Apache Iceberg - A Table Format for Hige Analytic Datasets
Apache Iceberg - A Table Format for Hige Analytic DatasetsApache Iceberg - A Table Format for Hige Analytic Datasets
Apache Iceberg - A Table Format for Hige Analytic DatasetsAlluxio, Inc.
 
Introducing the Apache Flink Kubernetes Operator
Introducing the Apache Flink Kubernetes OperatorIntroducing the Apache Flink Kubernetes Operator
Introducing the Apache Flink Kubernetes OperatorFlink Forward
 
Iceberg: A modern table format for big data (Strata NY 2018)
Iceberg: A modern table format for big data (Strata NY 2018)Iceberg: A modern table format for big data (Strata NY 2018)
Iceberg: A modern table format for big data (Strata NY 2018)Ryan Blue
 
Getting Started with Confluent Schema Registry
Getting Started with Confluent Schema RegistryGetting Started with Confluent Schema Registry
Getting Started with Confluent Schema Registryconfluent
 
Click-Through Example for Flink’s KafkaConsumer Checkpointing
Click-Through Example for Flink’s KafkaConsumer CheckpointingClick-Through Example for Flink’s KafkaConsumer Checkpointing
Click-Through Example for Flink’s KafkaConsumer CheckpointingRobert Metzger
 
Introducing KRaft: Kafka Without Zookeeper With Colin McCabe | Current 2022
Introducing KRaft: Kafka Without Zookeeper With Colin McCabe | Current 2022Introducing KRaft: Kafka Without Zookeeper With Colin McCabe | Current 2022
Introducing KRaft: Kafka Without Zookeeper With Colin McCabe | Current 2022HostedbyConfluent
 

Mais procurados (20)

A Thorough Comparison of Delta Lake, Iceberg and Hudi
A Thorough Comparison of Delta Lake, Iceberg and HudiA Thorough Comparison of Delta Lake, Iceberg and Hudi
A Thorough Comparison of Delta Lake, Iceberg and Hudi
 
Kafka for Real-Time Replication between Edge and Hybrid Cloud
Kafka for Real-Time Replication between Edge and Hybrid CloudKafka for Real-Time Replication between Edge and Hybrid Cloud
Kafka for Real-Time Replication between Edge and Hybrid Cloud
 
Flink Forward San Francisco 2019: Moving from Lambda and Kappa Architectures ...
Flink Forward San Francisco 2019: Moving from Lambda and Kappa Architectures ...Flink Forward San Francisco 2019: Moving from Lambda and Kappa Architectures ...
Flink Forward San Francisco 2019: Moving from Lambda and Kappa Architectures ...
 
Demystifying flink memory allocation and tuning - Roshan Naik, Uber
Demystifying flink memory allocation and tuning - Roshan Naik, UberDemystifying flink memory allocation and tuning - Roshan Naik, Uber
Demystifying flink memory allocation and tuning - Roshan Naik, Uber
 
Producer Performance Tuning for Apache Kafka
Producer Performance Tuning for Apache KafkaProducer Performance Tuning for Apache Kafka
Producer Performance Tuning for Apache Kafka
 
Handle Large Messages In Apache Kafka
Handle Large Messages In Apache KafkaHandle Large Messages In Apache Kafka
Handle Large Messages In Apache Kafka
 
Kafka presentation
Kafka presentationKafka presentation
Kafka presentation
 
Towards Flink 2.0: Unified Batch & Stream Processing - Aljoscha Krettek, Verv...
Towards Flink 2.0: Unified Batch & Stream Processing - Aljoscha Krettek, Verv...Towards Flink 2.0: Unified Batch & Stream Processing - Aljoscha Krettek, Verv...
Towards Flink 2.0: Unified Batch & Stream Processing - Aljoscha Krettek, Verv...
 
Introduction to Apache Flink - Fast and reliable big data processing
Introduction to Apache Flink - Fast and reliable big data processingIntroduction to Apache Flink - Fast and reliable big data processing
Introduction to Apache Flink - Fast and reliable big data processing
 
HTTP Analytics for 6M requests per second using ClickHouse, by Alexander Boc...
HTTP Analytics for 6M requests per second using ClickHouse, by  Alexander Boc...HTTP Analytics for 6M requests per second using ClickHouse, by  Alexander Boc...
HTTP Analytics for 6M requests per second using ClickHouse, by Alexander Boc...
 
Stephan Ewen - Experiences running Flink at Very Large Scale
Stephan Ewen -  Experiences running Flink at Very Large ScaleStephan Ewen -  Experiences running Flink at Very Large Scale
Stephan Ewen - Experiences running Flink at Very Large Scale
 
Where is my bottleneck? Performance troubleshooting in Flink
Where is my bottleneck? Performance troubleshooting in FlinkWhere is my bottleneck? Performance troubleshooting in Flink
Where is my bottleneck? Performance troubleshooting in Flink
 
How Uber scaled its Real Time Infrastructure to Trillion events per day
How Uber scaled its Real Time Infrastructure to Trillion events per dayHow Uber scaled its Real Time Infrastructure to Trillion events per day
How Uber scaled its Real Time Infrastructure to Trillion events per day
 
Apache Iceberg - A Table Format for Hige Analytic Datasets
Apache Iceberg - A Table Format for Hige Analytic DatasetsApache Iceberg - A Table Format for Hige Analytic Datasets
Apache Iceberg - A Table Format for Hige Analytic Datasets
 
kafka
kafkakafka
kafka
 
Introducing the Apache Flink Kubernetes Operator
Introducing the Apache Flink Kubernetes OperatorIntroducing the Apache Flink Kubernetes Operator
Introducing the Apache Flink Kubernetes Operator
 
Iceberg: A modern table format for big data (Strata NY 2018)
Iceberg: A modern table format for big data (Strata NY 2018)Iceberg: A modern table format for big data (Strata NY 2018)
Iceberg: A modern table format for big data (Strata NY 2018)
 
Getting Started with Confluent Schema Registry
Getting Started with Confluent Schema RegistryGetting Started with Confluent Schema Registry
Getting Started with Confluent Schema Registry
 
Click-Through Example for Flink’s KafkaConsumer Checkpointing
Click-Through Example for Flink’s KafkaConsumer CheckpointingClick-Through Example for Flink’s KafkaConsumer Checkpointing
Click-Through Example for Flink’s KafkaConsumer Checkpointing
 
Introducing KRaft: Kafka Without Zookeeper With Colin McCabe | Current 2022
Introducing KRaft: Kafka Without Zookeeper With Colin McCabe | Current 2022Introducing KRaft: Kafka Without Zookeeper With Colin McCabe | Current 2022
Introducing KRaft: Kafka Without Zookeeper With Colin McCabe | Current 2022
 

Semelhante a Advanced Stream Processing with Flink and Pulsar - Pulsar Summit NA 2021 Keynote

Cloud lunch and learn real-time streaming in azure
Cloud lunch and learn real-time streaming in azureCloud lunch and learn real-time streaming in azure
Cloud lunch and learn real-time streaming in azureTimothy Spann
 
Flink forward-2017-netflix keystones-paas
Flink forward-2017-netflix keystones-paasFlink forward-2017-netflix keystones-paas
Flink forward-2017-netflix keystones-paasMonal Daxini
 
K. Tzoumas & S. Ewen – Flink Forward Keynote
K. Tzoumas & S. Ewen – Flink Forward KeynoteK. Tzoumas & S. Ewen – Flink Forward Keynote
K. Tzoumas & S. Ewen – Flink Forward KeynoteFlink Forward
 
Data Stream Processing with Apache Flink
Data Stream Processing with Apache FlinkData Stream Processing with Apache Flink
Data Stream Processing with Apache FlinkFabian Hueske
 
Flink Forward San Francisco 2018: - Jinkui Shi and Radu Tudoran "Flink real-t...
Flink Forward San Francisco 2018: - Jinkui Shi and Radu Tudoran "Flink real-t...Flink Forward San Francisco 2018: - Jinkui Shi and Radu Tudoran "Flink real-t...
Flink Forward San Francisco 2018: - Jinkui Shi and Radu Tudoran "Flink real-t...Flink Forward
 
Flink Forward SF 2017: Srikanth Satya & Tom Kaitchuck - Pravega: Storage Rei...
Flink Forward SF 2017: Srikanth Satya & Tom Kaitchuck -  Pravega: Storage Rei...Flink Forward SF 2017: Srikanth Satya & Tom Kaitchuck -  Pravega: Storage Rei...
Flink Forward SF 2017: Srikanth Satya & Tom Kaitchuck - Pravega: Storage Rei...Flink Forward
 
From Batch to Streaming with Apache Apex Dataworks Summit 2017
From Batch to Streaming with Apache Apex Dataworks Summit 2017From Batch to Streaming with Apache Apex Dataworks Summit 2017
From Batch to Streaming with Apache Apex Dataworks Summit 2017Apache Apex
 
From Batch to Streaming ET(L) with Apache Apex at Berlin Buzzwords 2017
From Batch to Streaming ET(L) with Apache Apex at Berlin Buzzwords 2017From Batch to Streaming ET(L) with Apache Apex at Berlin Buzzwords 2017
From Batch to Streaming ET(L) with Apache Apex at Berlin Buzzwords 2017Thomas Weise
 
Serverless London 2019 FaaS composition using Kafka and CloudEvents
Serverless London 2019   FaaS composition using Kafka and CloudEventsServerless London 2019   FaaS composition using Kafka and CloudEvents
Serverless London 2019 FaaS composition using Kafka and CloudEventsNeil Avery
 
NoLambda: Combining Streaming, Ad-Hoc, Machine Learning and Batch Analysis
NoLambda: Combining Streaming, Ad-Hoc, Machine Learning and Batch AnalysisNoLambda: Combining Streaming, Ad-Hoc, Machine Learning and Batch Analysis
NoLambda: Combining Streaming, Ad-Hoc, Machine Learning and Batch AnalysisHelena Edelson
 
[WSO2Con EU 2018] The Rise of Streaming SQL
[WSO2Con EU 2018] The Rise of Streaming SQL[WSO2Con EU 2018] The Rise of Streaming SQL
[WSO2Con EU 2018] The Rise of Streaming SQLWSO2
 
Santander Stream Processing with Apache Flink
Santander Stream Processing with Apache FlinkSantander Stream Processing with Apache Flink
Santander Stream Processing with Apache Flinkconfluent
 
Introduction to Stream Processing
Introduction to Stream ProcessingIntroduction to Stream Processing
Introduction to Stream ProcessingGuido Schmutz
 
The Netflix Way to deal with Big Data Problems
The Netflix Way to deal with Big Data ProblemsThe Netflix Way to deal with Big Data Problems
The Netflix Way to deal with Big Data ProblemsMonal Daxini
 
DBCC 2021 - FLiP Stack for Cloud Data Lakes
DBCC 2021 - FLiP Stack for Cloud Data LakesDBCC 2021 - FLiP Stack for Cloud Data Lakes
DBCC 2021 - FLiP Stack for Cloud Data LakesTimothy Spann
 
Event-Driven Applications Done Right - Pulsar Summit SF 2022
Event-Driven Applications Done Right - Pulsar Summit SF 2022Event-Driven Applications Done Right - Pulsar Summit SF 2022
Event-Driven Applications Done Right - Pulsar Summit SF 2022StreamNative
 
Music city data Hail Hydrate! from stream to lake
Music city data Hail Hydrate! from stream to lakeMusic city data Hail Hydrate! from stream to lake
Music city data Hail Hydrate! from stream to lakeTimothy Spann
 
Apache Beam: A unified model for batch and stream processing data
Apache Beam: A unified model for batch and stream processing dataApache Beam: A unified model for batch and stream processing data
Apache Beam: A unified model for batch and stream processing dataDataWorks Summit/Hadoop Summit
 
Seattle spark-meetup-032317
Seattle spark-meetup-032317Seattle spark-meetup-032317
Seattle spark-meetup-032317Nan Zhu
 

Semelhante a Advanced Stream Processing with Flink and Pulsar - Pulsar Summit NA 2021 Keynote (20)

Cloud lunch and learn real-time streaming in azure
Cloud lunch and learn real-time streaming in azureCloud lunch and learn real-time streaming in azure
Cloud lunch and learn real-time streaming in azure
 
Flink forward-2017-netflix keystones-paas
Flink forward-2017-netflix keystones-paasFlink forward-2017-netflix keystones-paas
Flink forward-2017-netflix keystones-paas
 
K. Tzoumas & S. Ewen – Flink Forward Keynote
K. Tzoumas & S. Ewen – Flink Forward KeynoteK. Tzoumas & S. Ewen – Flink Forward Keynote
K. Tzoumas & S. Ewen – Flink Forward Keynote
 
Data Stream Processing with Apache Flink
Data Stream Processing with Apache FlinkData Stream Processing with Apache Flink
Data Stream Processing with Apache Flink
 
Flink Forward San Francisco 2018: - Jinkui Shi and Radu Tudoran "Flink real-t...
Flink Forward San Francisco 2018: - Jinkui Shi and Radu Tudoran "Flink real-t...Flink Forward San Francisco 2018: - Jinkui Shi and Radu Tudoran "Flink real-t...
Flink Forward San Francisco 2018: - Jinkui Shi and Radu Tudoran "Flink real-t...
 
Flink Forward SF 2017: Srikanth Satya & Tom Kaitchuck - Pravega: Storage Rei...
Flink Forward SF 2017: Srikanth Satya & Tom Kaitchuck -  Pravega: Storage Rei...Flink Forward SF 2017: Srikanth Satya & Tom Kaitchuck -  Pravega: Storage Rei...
Flink Forward SF 2017: Srikanth Satya & Tom Kaitchuck - Pravega: Storage Rei...
 
From Batch to Streaming with Apache Apex Dataworks Summit 2017
From Batch to Streaming with Apache Apex Dataworks Summit 2017From Batch to Streaming with Apache Apex Dataworks Summit 2017
From Batch to Streaming with Apache Apex Dataworks Summit 2017
 
From Batch to Streaming ET(L) with Apache Apex at Berlin Buzzwords 2017
From Batch to Streaming ET(L) with Apache Apex at Berlin Buzzwords 2017From Batch to Streaming ET(L) with Apache Apex at Berlin Buzzwords 2017
From Batch to Streaming ET(L) with Apache Apex at Berlin Buzzwords 2017
 
Serverless London 2019 FaaS composition using Kafka and CloudEvents
Serverless London 2019   FaaS composition using Kafka and CloudEventsServerless London 2019   FaaS composition using Kafka and CloudEvents
Serverless London 2019 FaaS composition using Kafka and CloudEvents
 
NoLambda: Combining Streaming, Ad-Hoc, Machine Learning and Batch Analysis
NoLambda: Combining Streaming, Ad-Hoc, Machine Learning and Batch AnalysisNoLambda: Combining Streaming, Ad-Hoc, Machine Learning and Batch Analysis
NoLambda: Combining Streaming, Ad-Hoc, Machine Learning and Batch Analysis
 
[WSO2Con EU 2018] The Rise of Streaming SQL
[WSO2Con EU 2018] The Rise of Streaming SQL[WSO2Con EU 2018] The Rise of Streaming SQL
[WSO2Con EU 2018] The Rise of Streaming SQL
 
Santander Stream Processing with Apache Flink
Santander Stream Processing with Apache FlinkSantander Stream Processing with Apache Flink
Santander Stream Processing with Apache Flink
 
Introduction to Stream Processing
Introduction to Stream ProcessingIntroduction to Stream Processing
Introduction to Stream Processing
 
FLiP Into Trino
FLiP Into TrinoFLiP Into Trino
FLiP Into Trino
 
The Netflix Way to deal with Big Data Problems
The Netflix Way to deal with Big Data ProblemsThe Netflix Way to deal with Big Data Problems
The Netflix Way to deal with Big Data Problems
 
DBCC 2021 - FLiP Stack for Cloud Data Lakes
DBCC 2021 - FLiP Stack for Cloud Data LakesDBCC 2021 - FLiP Stack for Cloud Data Lakes
DBCC 2021 - FLiP Stack for Cloud Data Lakes
 
Event-Driven Applications Done Right - Pulsar Summit SF 2022
Event-Driven Applications Done Right - Pulsar Summit SF 2022Event-Driven Applications Done Right - Pulsar Summit SF 2022
Event-Driven Applications Done Right - Pulsar Summit SF 2022
 
Music city data Hail Hydrate! from stream to lake
Music city data Hail Hydrate! from stream to lakeMusic city data Hail Hydrate! from stream to lake
Music city data Hail Hydrate! from stream to lake
 
Apache Beam: A unified model for batch and stream processing data
Apache Beam: A unified model for batch and stream processing dataApache Beam: A unified model for batch and stream processing data
Apache Beam: A unified model for batch and stream processing data
 
Seattle spark-meetup-032317
Seattle spark-meetup-032317Seattle spark-meetup-032317
Seattle spark-meetup-032317
 

Mais de StreamNative

Is Using KoP (Kafka-on-Pulsar) a Good Idea? - Pulsar Summit SF 2022
Is Using KoP (Kafka-on-Pulsar) a Good Idea? - Pulsar Summit SF 2022Is Using KoP (Kafka-on-Pulsar) a Good Idea? - Pulsar Summit SF 2022
Is Using KoP (Kafka-on-Pulsar) a Good Idea? - Pulsar Summit SF 2022StreamNative
 
Building an Asynchronous Application Framework with Python and Pulsar - Pulsa...
Building an Asynchronous Application Framework with Python and Pulsar - Pulsa...Building an Asynchronous Application Framework with Python and Pulsar - Pulsa...
Building an Asynchronous Application Framework with Python and Pulsar - Pulsa...StreamNative
 
Blue-green deploys with Pulsar & Envoy in an event-driven microservice ecosys...
Blue-green deploys with Pulsar & Envoy in an event-driven microservice ecosys...Blue-green deploys with Pulsar & Envoy in an event-driven microservice ecosys...
Blue-green deploys with Pulsar & Envoy in an event-driven microservice ecosys...StreamNative
 
Distributed Database Design Decisions to Support High Performance Event Strea...
Distributed Database Design Decisions to Support High Performance Event Strea...Distributed Database Design Decisions to Support High Performance Event Strea...
Distributed Database Design Decisions to Support High Performance Event Strea...StreamNative
 
Simplify Pulsar Functions Development with SQL - Pulsar Summit SF 2022
Simplify Pulsar Functions Development with SQL - Pulsar Summit SF 2022Simplify Pulsar Functions Development with SQL - Pulsar Summit SF 2022
Simplify Pulsar Functions Development with SQL - Pulsar Summit SF 2022StreamNative
 
Towards a ZooKeeper-less Pulsar, etcd, etcd, etcd. - Pulsar Summit SF 2022
Towards a ZooKeeper-less Pulsar, etcd, etcd, etcd. - Pulsar Summit SF 2022Towards a ZooKeeper-less Pulsar, etcd, etcd, etcd. - Pulsar Summit SF 2022
Towards a ZooKeeper-less Pulsar, etcd, etcd, etcd. - Pulsar Summit SF 2022StreamNative
 
Validating Apache Pulsar’s Behavior under Failure Conditions - Pulsar Summit ...
Validating Apache Pulsar’s Behavior under Failure Conditions - Pulsar Summit ...Validating Apache Pulsar’s Behavior under Failure Conditions - Pulsar Summit ...
Validating Apache Pulsar’s Behavior under Failure Conditions - Pulsar Summit ...StreamNative
 
Cross the Streams! Creating Streaming Data Pipelines with Apache Flink + Apac...
Cross the Streams! Creating Streaming Data Pipelines with Apache Flink + Apac...Cross the Streams! Creating Streaming Data Pipelines with Apache Flink + Apac...
Cross the Streams! Creating Streaming Data Pipelines with Apache Flink + Apac...StreamNative
 
Message Redelivery: An Unexpected Journey - Pulsar Summit SF 2022
Message Redelivery: An Unexpected Journey - Pulsar Summit SF 2022Message Redelivery: An Unexpected Journey - Pulsar Summit SF 2022
Message Redelivery: An Unexpected Journey - Pulsar Summit SF 2022StreamNative
 
Unlocking the Power of Lakehouse Architectures with Apache Pulsar and Apache ...
Unlocking the Power of Lakehouse Architectures with Apache Pulsar and Apache ...Unlocking the Power of Lakehouse Architectures with Apache Pulsar and Apache ...
Unlocking the Power of Lakehouse Architectures with Apache Pulsar and Apache ...StreamNative
 
Understanding Broker Load Balancing - Pulsar Summit SF 2022
Understanding Broker Load Balancing - Pulsar Summit SF 2022Understanding Broker Load Balancing - Pulsar Summit SF 2022
Understanding Broker Load Balancing - Pulsar Summit SF 2022StreamNative
 
Building an Asynchronous Application Framework with Python and Pulsar - Pulsa...
Building an Asynchronous Application Framework with Python and Pulsar - Pulsa...Building an Asynchronous Application Framework with Python and Pulsar - Pulsa...
Building an Asynchronous Application Framework with Python and Pulsar - Pulsa...StreamNative
 
Pulsar's Journey in Yahoo!: On-prem, Cloud and Hybrid - Pulsar Summit SF 2022
Pulsar's Journey in Yahoo!: On-prem, Cloud and Hybrid - Pulsar Summit SF 2022Pulsar's Journey in Yahoo!: On-prem, Cloud and Hybrid - Pulsar Summit SF 2022
Pulsar's Journey in Yahoo!: On-prem, Cloud and Hybrid - Pulsar Summit SF 2022StreamNative
 
Pulsar @ Scale. 200M RPM and 1K instances - Pulsar Summit SF 2022
Pulsar @ Scale. 200M RPM and 1K instances - Pulsar Summit SF 2022Pulsar @ Scale. 200M RPM and 1K instances - Pulsar Summit SF 2022
Pulsar @ Scale. 200M RPM and 1K instances - Pulsar Summit SF 2022StreamNative
 
Data Democracy: Journey to User-Facing Analytics - Pulsar Summit SF 2022
Data Democracy: Journey to User-Facing Analytics - Pulsar Summit SF 2022Data Democracy: Journey to User-Facing Analytics - Pulsar Summit SF 2022
Data Democracy: Journey to User-Facing Analytics - Pulsar Summit SF 2022StreamNative
 
Beam + Pulsar: Powerful Stream Processing at Scale - Pulsar Summit SF 2022
Beam + Pulsar: Powerful Stream Processing at Scale - Pulsar Summit SF 2022Beam + Pulsar: Powerful Stream Processing at Scale - Pulsar Summit SF 2022
Beam + Pulsar: Powerful Stream Processing at Scale - Pulsar Summit SF 2022StreamNative
 
Welcome and Opening Remarks - Pulsar Summit SF 2022
Welcome and Opening Remarks - Pulsar Summit SF 2022Welcome and Opening Remarks - Pulsar Summit SF 2022
Welcome and Opening Remarks - Pulsar Summit SF 2022StreamNative
 
Log System As Backbone – How We Built the World’s Most Advanced Vector Databa...
Log System As Backbone – How We Built the World’s Most Advanced Vector Databa...Log System As Backbone – How We Built the World’s Most Advanced Vector Databa...
Log System As Backbone – How We Built the World’s Most Advanced Vector Databa...StreamNative
 
MoP(MQTT on Pulsar) - a Powerful Tool for Apache Pulsar in IoT - Pulsar Summi...
MoP(MQTT on Pulsar) - a Powerful Tool for Apache Pulsar in IoT - Pulsar Summi...MoP(MQTT on Pulsar) - a Powerful Tool for Apache Pulsar in IoT - Pulsar Summi...
MoP(MQTT on Pulsar) - a Powerful Tool for Apache Pulsar in IoT - Pulsar Summi...StreamNative
 
Improvements Made in KoP 2.9.0 - Pulsar Summit Asia 2021
Improvements Made in KoP 2.9.0  - Pulsar Summit Asia 2021Improvements Made in KoP 2.9.0  - Pulsar Summit Asia 2021
Improvements Made in KoP 2.9.0 - Pulsar Summit Asia 2021StreamNative
 

Mais de StreamNative (20)

Is Using KoP (Kafka-on-Pulsar) a Good Idea? - Pulsar Summit SF 2022
Is Using KoP (Kafka-on-Pulsar) a Good Idea? - Pulsar Summit SF 2022Is Using KoP (Kafka-on-Pulsar) a Good Idea? - Pulsar Summit SF 2022
Is Using KoP (Kafka-on-Pulsar) a Good Idea? - Pulsar Summit SF 2022
 
Building an Asynchronous Application Framework with Python and Pulsar - Pulsa...
Building an Asynchronous Application Framework with Python and Pulsar - Pulsa...Building an Asynchronous Application Framework with Python and Pulsar - Pulsa...
Building an Asynchronous Application Framework with Python and Pulsar - Pulsa...
 
Blue-green deploys with Pulsar & Envoy in an event-driven microservice ecosys...
Blue-green deploys with Pulsar & Envoy in an event-driven microservice ecosys...Blue-green deploys with Pulsar & Envoy in an event-driven microservice ecosys...
Blue-green deploys with Pulsar & Envoy in an event-driven microservice ecosys...
 
Distributed Database Design Decisions to Support High Performance Event Strea...
Distributed Database Design Decisions to Support High Performance Event Strea...Distributed Database Design Decisions to Support High Performance Event Strea...
Distributed Database Design Decisions to Support High Performance Event Strea...
 
Simplify Pulsar Functions Development with SQL - Pulsar Summit SF 2022
Simplify Pulsar Functions Development with SQL - Pulsar Summit SF 2022Simplify Pulsar Functions Development with SQL - Pulsar Summit SF 2022
Simplify Pulsar Functions Development with SQL - Pulsar Summit SF 2022
 
Towards a ZooKeeper-less Pulsar, etcd, etcd, etcd. - Pulsar Summit SF 2022
Towards a ZooKeeper-less Pulsar, etcd, etcd, etcd. - Pulsar Summit SF 2022Towards a ZooKeeper-less Pulsar, etcd, etcd, etcd. - Pulsar Summit SF 2022
Towards a ZooKeeper-less Pulsar, etcd, etcd, etcd. - Pulsar Summit SF 2022
 
Validating Apache Pulsar’s Behavior under Failure Conditions - Pulsar Summit ...
Validating Apache Pulsar’s Behavior under Failure Conditions - Pulsar Summit ...Validating Apache Pulsar’s Behavior under Failure Conditions - Pulsar Summit ...
Validating Apache Pulsar’s Behavior under Failure Conditions - Pulsar Summit ...
 
Cross the Streams! Creating Streaming Data Pipelines with Apache Flink + Apac...
Cross the Streams! Creating Streaming Data Pipelines with Apache Flink + Apac...Cross the Streams! Creating Streaming Data Pipelines with Apache Flink + Apac...
Cross the Streams! Creating Streaming Data Pipelines with Apache Flink + Apac...
 
Message Redelivery: An Unexpected Journey - Pulsar Summit SF 2022
Message Redelivery: An Unexpected Journey - Pulsar Summit SF 2022Message Redelivery: An Unexpected Journey - Pulsar Summit SF 2022
Message Redelivery: An Unexpected Journey - Pulsar Summit SF 2022
 
Unlocking the Power of Lakehouse Architectures with Apache Pulsar and Apache ...
Unlocking the Power of Lakehouse Architectures with Apache Pulsar and Apache ...Unlocking the Power of Lakehouse Architectures with Apache Pulsar and Apache ...
Unlocking the Power of Lakehouse Architectures with Apache Pulsar and Apache ...
 
Understanding Broker Load Balancing - Pulsar Summit SF 2022
Understanding Broker Load Balancing - Pulsar Summit SF 2022Understanding Broker Load Balancing - Pulsar Summit SF 2022
Understanding Broker Load Balancing - Pulsar Summit SF 2022
 
Building an Asynchronous Application Framework with Python and Pulsar - Pulsa...
Building an Asynchronous Application Framework with Python and Pulsar - Pulsa...Building an Asynchronous Application Framework with Python and Pulsar - Pulsa...
Building an Asynchronous Application Framework with Python and Pulsar - Pulsa...
 
Pulsar's Journey in Yahoo!: On-prem, Cloud and Hybrid - Pulsar Summit SF 2022
Pulsar's Journey in Yahoo!: On-prem, Cloud and Hybrid - Pulsar Summit SF 2022Pulsar's Journey in Yahoo!: On-prem, Cloud and Hybrid - Pulsar Summit SF 2022
Pulsar's Journey in Yahoo!: On-prem, Cloud and Hybrid - Pulsar Summit SF 2022
 
Pulsar @ Scale. 200M RPM and 1K instances - Pulsar Summit SF 2022
Pulsar @ Scale. 200M RPM and 1K instances - Pulsar Summit SF 2022Pulsar @ Scale. 200M RPM and 1K instances - Pulsar Summit SF 2022
Pulsar @ Scale. 200M RPM and 1K instances - Pulsar Summit SF 2022
 
Data Democracy: Journey to User-Facing Analytics - Pulsar Summit SF 2022
Data Democracy: Journey to User-Facing Analytics - Pulsar Summit SF 2022Data Democracy: Journey to User-Facing Analytics - Pulsar Summit SF 2022
Data Democracy: Journey to User-Facing Analytics - Pulsar Summit SF 2022
 
Beam + Pulsar: Powerful Stream Processing at Scale - Pulsar Summit SF 2022
Beam + Pulsar: Powerful Stream Processing at Scale - Pulsar Summit SF 2022Beam + Pulsar: Powerful Stream Processing at Scale - Pulsar Summit SF 2022
Beam + Pulsar: Powerful Stream Processing at Scale - Pulsar Summit SF 2022
 
Welcome and Opening Remarks - Pulsar Summit SF 2022
Welcome and Opening Remarks - Pulsar Summit SF 2022Welcome and Opening Remarks - Pulsar Summit SF 2022
Welcome and Opening Remarks - Pulsar Summit SF 2022
 
Log System As Backbone – How We Built the World’s Most Advanced Vector Databa...
Log System As Backbone – How We Built the World’s Most Advanced Vector Databa...Log System As Backbone – How We Built the World’s Most Advanced Vector Databa...
Log System As Backbone – How We Built the World’s Most Advanced Vector Databa...
 
MoP(MQTT on Pulsar) - a Powerful Tool for Apache Pulsar in IoT - Pulsar Summi...
MoP(MQTT on Pulsar) - a Powerful Tool for Apache Pulsar in IoT - Pulsar Summi...MoP(MQTT on Pulsar) - a Powerful Tool for Apache Pulsar in IoT - Pulsar Summi...
MoP(MQTT on Pulsar) - a Powerful Tool for Apache Pulsar in IoT - Pulsar Summi...
 
Improvements Made in KoP 2.9.0 - Pulsar Summit Asia 2021
Improvements Made in KoP 2.9.0  - Pulsar Summit Asia 2021Improvements Made in KoP 2.9.0  - Pulsar Summit Asia 2021
Improvements Made in KoP 2.9.0 - Pulsar Summit Asia 2021
 

Último

Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024The Digital Insurer
 
Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024The Digital Insurer
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonAnna Loughnan Colquhoun
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationRadu Cotescu
 
Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businesspanagenda
 
A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?Igalia
 
AWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAndrey Devyatkin
 
Top 10 Most Downloaded Games on Play Store in 2024
Top 10 Most Downloaded Games on Play Store in 2024Top 10 Most Downloaded Games on Play Store in 2024
Top 10 Most Downloaded Games on Play Store in 2024SynarionITSolutions
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfsudhanshuwaghmare1
 
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...DianaGray10
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationSafe Software
 
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingRepurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingEdi Saputra
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...apidays
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerThousandEyes
 
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProduct Anonymous
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Drew Madelung
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...Martijn de Jong
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FMESafe Software
 
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, AdobeApidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobeapidays
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024Rafal Los
 

Último (20)

Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
 
Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt Robison
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organization
 
Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire business
 
A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?
 
AWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of Terraform
 
Top 10 Most Downloaded Games on Play Store in 2024
Top 10 Most Downloaded Games on Play Store in 2024Top 10 Most Downloaded Games on Play Store in 2024
Top 10 Most Downloaded Games on Play Store in 2024
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdf
 
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
 
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingRepurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
 
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, AdobeApidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024
 

Advanced Stream Processing with Flink and Pulsar - Pulsar Summit NA 2021 Keynote

  • 1. Stream Processing Advanced with Flink and Pulsar Till Rohrmann, Engineering Lead at Ververica, @stsffap Addison Higham, Chief Architect at StreamNative, @addisonjh
  • 2. Why is Stream Processing Important?
  • 3.
  • 4.
  • 5. Spectrum of data streaming applications offline real time Data warehousing OLAP / BI / reporting Machine learning & model training Continuous ETL Unified offline / real-time analytics Continuous monitoring (position, risk, …) Real-time behaviour modeling (pricing, recommenders, …) Real-time alerts (fraud, security, …) Real-time ML model training/evaluation Distributed OLTP applications Choice of data architecture defines which uses cases you can cover!
  • 6. Modern Streaming Data Architecture Stream Processing Stream Processing Stream Processing Stream Storage Long Term Storage Results / Views Triggered Applications Event Producers
  • 7. Reference Streaming Data Architecture: Flink + Pulsar Stream Storage Long Term Storage Results / Views Stateful Functions Event Producers
  • 8.
  • 9. Apache Flink: Analytics and Applications on Streaming Data Flink Runtime Stateful Computations over Data Streams Stateful Stream Processing Streams, State, Time Streaming Analytics SQL and Tables Event-driven Applications Stateful Functions
  • 10. Apache Flink in Numbers ● Contributors: 887 ● Most active Apache ML ● Most active sources by visits and commits (#2) ● Github stars: 16.4k ● Commits: >27k ● Releases: 1.13 latest major release ● Maven downloads per month: ~ 170k ● LOC: > 1.8 million ● Latency: < 1s ● Throughput: 4 billion events/s, 7 TB/s ● Input size: 100 TB for batch job
  • 12. One of The Most Advanced Stream Processors ● First class support for state ○ Asynchronous barrier checkpointing algorithm to create globally consistent checkpoints ● Event-time support ○ Correctness under delayed events ● End-to-end exactly once processing guarantees ○ Correctness under faults ● Resource elastic ○ Flink applications’ resources can be adjusted to the actual need ● Unified batch-streaming APIs ○ A single query to process historical as well as live data
  • 13. ● Stream: Sequence of data which is made available over time ● All computation processes chunks of data over time producing results over time → Stream processing ○ E.g. reading data from disks is done in streaming fashion ● Events can be of various forms ● Decisive difference: Is my stream bounded or not? Everything Is a Stream
  • 14. SQL / Table API: Unified Batch & Stream Processing SQL Query Batch query execution SELECT room, TUMBLE_END(rowtime, INTERVAL ‘1’ HOUR), AVG(temperature) FROM sensors GROUP BY TUMBLE(rowtime, INTERVAL ‘1’ HOUR), room
  • 15. Stream-Table Duality User lastLogin Alice 2021-01-01 Bob 2021-01-02 User lastLogin Alice 2021-01-14 Bob 2021-02-01 Eve 2021-01-21 Alice, 2021-01-01 Bob, 2021-01-02 Alice, 2021-01-14 Eve, 2021-01-21 Bob, 2021-02-01
  • 16. SQL / Table API: Running The Same Query On Streams SQL Query Incremental query execution SELECT room, TUMBLE_END(rowtime, INTERVAL ‘1’ HOUR), AVG(temperature) FROM sensors GROUP BY TUMBLE(rowtime, INTERVAL ‘1’ HOUR), room Interpret stream as table
  • 17.
  • 18. Data Has to Come From Somewhere ● Flink is a powerful compute engine for unified batch and streaming processing that is pushing boundaries of data processing ● For Flink really to shine, a storage system that supports efficient stream and batch ingestion is required Pulsar offers exactly that! ● Apache Pulsar is evolving to meet the needs of a system like Flink, offering both a low-latency, high bandwidth streaming API as well as a flexible architecture to support batch access
  • 19. Why is Pulsar a Great Fit For Streaming And Batch? Streaming ● Like other streaming systems, Pulsar can be used like a distributed scale-out log, providing consumer controlled offsets and high throughput through parallel partitions of topics Batch ● In addition, Pulsar’s segment- based, multi-layer architecture allows for applications to access historical data more directly via the underlying storage Producer Consumer Broker 2 Broker 1 Broker 3 Topic1-Part1 Topic1-Part2 Topic1-Part3 Segment 2 Segment 1 Segment 3 Segment X Bookie 1 Bookie 2 Bookie 3 Segment 1 Segment X Segment 1 Segment X Segment 1 Segment X Offload Segments
  • 20. +
  • 22. Adoption at Scale While Flink and Pulsar are both widely used independently, together they offer a strong technology for unified batch and stream storage and compute Many large organizations are adopting Flink + Pulsar for complex workloads that can take advantage of the strengths of both systems
  • 23. State of The Pulsar-Flink Connector Today ● Pulsar-Flink connector supports Flink 1.11, 1.12 and soon 1.13 ● Exactly once source and sink via producer deduplication ● Full support for Flink Schema with integration to Pulsar Schema Registry ● Full Flink SQL support for batch and streaming modes ○ Also support for an “upsert” mode which can reason about inserts, updates, and deletes ● Support for KeyShared subscriptions for higher source parallelism than number of topic-partitions ● https://github.com/streamnative/pulsar-flink
  • 25. The Road Ahead Many in-progress features are being developed ● Pulsar-Flink connector being upstreamed into Flink ○ Targeted for Flink 1.14 ● Pulsar Transaction Support ○ Pulsar transaction’s allow for stronger exactly-once processing guarantees ● Native Pulsar Watermarking ○ Pulsar is able to broker watermarks between producers and consumers ● Parallel Batch Source ○ Query multiple segments in parallel for higher throughput ● Unified Batch + Streaming Source ○ Using Pulsar’s batch mode for catch up and then switching over to streaming ingestion
  • 26. Job 1 Flink Support For Pulsar Transactions Pulsar transactions are GA in Pulsar 2.8.0 Support in Flink is nearing completion, which can allows for end-to-end exactly- once processing guarantees! Learn more about transactions at the talk Exactly-Once Made Easy: Transactional Messaging in Apache Pulsar at 11:30 AM Job 2 Job 3 tx
  • 27. The Importance of Watermarks High-quality watermarks are crucial for correct and stable stream processing jobs. In order for results to be correct, we need to take into account “Event Time” rather than solely relying on “Processing Time” This is especially important when replaying or processing older streaming data “Watermarks” advance the event-time clock, if the clock does not “tick” accurately, the results will not be correct Ideal Reality Skew Event Time
  • 28. Native Pulsar Watermarking Pulsar is adding support for watermarks ● Producers have a new API for injecting watermarks into the stream, with multiple producers potentially producing ● Watermarks are broadcast across all partitions, which allows the broker to dispatch more accurate watermarks to consumers, in both realtime and historical scenarios ● The API is designed to be simple to integrate with Flink, removing the complexity of accurate watermark generation from the developer // Pulsar for (int = 0; i < NUM_MESSAGES; i++) { producer.newMessage() .value("hello with event time! " + i) .eventTime( System.currentTimeMillis()).sendAsync(); } Producer .newWatermark() .eventTime( System.currentTimeMillis()).sendAsync(); // Flink PulsarSource<String> pulsarSource = new FlinkPulsarSource(topic, ...); pulsarSource.assignTimestampsAndWatermarks( PulsarWatermarkStrategy.forPulsarWatermarks());
  • 29. Shared Event-Time Domain End-to-End Watermarks Job 1 Job 2 Job 3 Producer W|24 W|17 W|12 W|12 W|8 W|4
  • 30. Pulsar + Flink and StreamNative + Ververica Pulsar + Flink community collaboration ● Contributing Pulsar-Flink connector to Flink repository ● Evolve connector to support more advanced Pulsar and Flink features ● Build best in the class open source stream processing platform StreamNative + Ververica Cloud partnership ● Help customers to unlock the full potential of stream processing
  • 31. Thanks a lot for your attention! Questions?

Notas do Editor

  1. Faster results → faster business decisions → competitive advantage → more money
  2. Generalization of batch processing Processing of historical data Processing of live data Processing both historical data + live data
  3. Brings together the worlds of data analytics and event-driven applications Building powerful applications which benefit from real time analytics (e.g. fraud detection, match-making, etc.)
  4. Real time search and recommendation models (e.g., Alibaba) Build a real-time session behavior profile of users (e.g., Netflix) Real time trade settlement dashboard (e.g., UBS) Real time revenue accounting (various AdTechs) Machine Learning-based anomaly/fraud detection (e.g., ING, Microsoft) Real-time data refinement and data pipelines (many) Realtime Analytics Platforms (e.g., Alibaba, Uber, Lyft, Yelp!, Tencent) Materializing Views (dashboards, data marts) ETL - batch and continuous Machine Learning Training (Alibaba, new ML library) Sub-second latency Peak throughput: 4B Event/s Throughput size: 7 TB/s Largest Batch Job Input size: 100TB Flink cluster size: 30K
  5. If everything is a stream, can I treat it with a single API? TableAPI, SQL and DataStream can handle bounded (batch) and unbounded streams Write the program once → Run it on live and historic data Lower development and maintenance costs Having a single engine for batch and stream processing Do batch processing when catching up with the stream One engine to run batch & streaming workloads Scheduler which can schedule bounded and unbounded tasks Exploiting bounded streams property for runtime optimizations More batching of results Out of order processing Stage wise execution possible
  6. In this section, I will give a brief overview of watermarks and discuss correctness and skew a bit
  7. Watermarks are assertions about the progression of event time, according to the data producer. Pulsar is brokering the assertions from producer to consumer. The quality of the watermarks (and hence the job output) is determined by the producer; system internals (partitioning, I/O performance, etc) are effectively factored out.