- STEPHAN EWEN, CO-FOUNDER & CTO, APACHE FLINK PMC
APACHE FLINK AND APACHE KAFKA FOR
STATEFUL STREAMING APPLICATIONS
2
Original creators of
Apache Flink®
dA Platform
Stream Processing for the
Enterprise
What is Apache Flink?
3
Batch Processing
process static and
historic data
Data Stream
Processing
realtime results
from data streams
Event-driven
Applications
data-driven actions
and services
Stateful Computations Over Data Streams
Apache Flink in a Nutshell
4
Queries
Applications
Devices
etc.
Database
Stream
File / Object
Storage
Stateful computations over streams
real-time and historic
fast, scalable, fault tolerant, in-memory,
event time, large state, exactly-once
Historic
Data
Streams
Application
Everything Streams
5
Apache Flink handles everything as streams internally.
Continuous streaming and applications use "unbounded streams".
Batch processing and finite applications use "bounded streams".
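One rough way to picture this, using only the Scala standard library rather than Flink's API: the same logic is written once against an `Iterator`, which can be backed by a finite collection (a bounded stream) or by an endless generator (an unbounded stream). The names `Reading`, `overThreshold`, and the sample data are invented for illustration.

```scala
object EverythingStreamsSketch {
  // Hypothetical event type, not taken from the talk.
  final case class Reading(sensor: String, value: Double)

  // One piece of logic, written against a stream of events.
  def overThreshold(events: Iterator[Reading], limit: Double): Iterator[Reading] =
    events.filter(_.value > limit)

  // Bounded stream: a finite, already-recorded data set.
  val bounded: List[Reading] =
    List(Reading("a", 1.0), Reading("b", 7.5), Reading("a", 9.0))

  // Unbounded stream: an endless generator whose values cycle through 0.0 to 9.0.
  def unbounded: Iterator[Reading] =
    Iterator.from(0).map(i => Reading("s" + (i % 3), (i % 10).toDouble))

  def main(args: Array[String]): Unit = {
    // The bounded input can be consumed completely ...
    println(overThreshold(bounded.iterator, 5.0).toList)
    // ... the unbounded one only incrementally, here the first three matches.
    println(overThreshold(unbounded, 5.0).take(3).toList)
  }
}
```

The point of the sketch is only that `overThreshold` does not need to know whether its input ever ends.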
Layered abstractions
6
Process Function (events, state, time)
DataStream API (streams, windows)
Stream SQL / Tables (dynamic tables)
Stream- & Batch Data Processing
High-level Analytics API
Stateful Event-Driven Applications
val stats = stream
  .keyBy("sensor")
  .timeWindow(Time.seconds(5))
  .reduce((a, b) => a.add(b))

def processElement(event: MyEvent, ctx: Context, out: Collector[Result]) = {
  // work with event and state
  (event, state.value) match { … }
  out.collect(…)  // emit events
  state.update(…) // modify state
  // schedule a timer callback
  ctx.timerService.registerEventTimeTimer(event.timestamp + 500)
}
Navigate simple to complex use cases
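The interplay of events, state, and timers at the lowest layer can be mimicked without Flink. The sketch below is a toy stand-in for a keyed process function: it keeps per-key state, registers an event-time timer for every event, and fires due timers when event time advances. `ToyProcessFunction`, `onWatermark`, and the idle-detection logic are illustrative assumptions, not Flink API.

```scala
import scala.collection.mutable

object ProcessFunctionSketch {
  final case class Event(key: String, timestamp: Long)

  // Toy stand-in for a keyed process function: per-key state plus event-time timers.
  final class ToyProcessFunction(timeoutMs: Long) {
    private val lastSeen = mutable.Map.empty[String, Long]   // keyed state
    private val timers   = mutable.Set.empty[(Long, String)] // (fire time, key)
    val output: mutable.Buffer[String] = mutable.Buffer.empty

    def processElement(event: Event): Unit = {
      lastSeen(event.key) = event.timestamp                  // modify state
      timers += ((event.timestamp + timeoutMs, event.key))   // schedule a timer callback
    }

    // Advancing event time fires every timer at or before the watermark.
    def onWatermark(watermark: Long): Unit = {
      val due = timers.filter(_._1 <= watermark).toList.sorted
      due.foreach { case (time, key) =>
        timers -= ((time, key))
        // Only report the key if no newer event arrived since the timer was set.
        if (lastSeen.get(key).exists(_ + timeoutMs == time)) {
          output += s"$key idle since ${time - timeoutMs}"   // emit an event
          lastSeen -= key
        }
      }
    }
  }
}
```

With a 500 ms timeout, events for key `a` at 1000 and 1200 followed by a watermark of 1700 would produce a single "idle" result for the later event, because the earlier timer finds newer state and stays silent.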
DataStream API
7
Source
Transformation
Windowed Transformation
Sink
val lines: DataStream[String] = env.addSource(new FlinkKafkaConsumer011(…))
val events: DataStream[Event] = lines.map((line) => parse(line))
val stats: DataStream[Statistic] = events
  .keyBy("sensor")
  .timeWindow(Time.seconds(5))
  .aggregate(new MyAggregationFunction())
stats.addSink(new RollingSink(path))
Streaming Dataflow: Source → Transform → Window (state read/write) → Sink
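To make concrete what the keyed, windowed aggregation in this pipeline computes, here is a plain-Scala sketch (no Flink involved) that assigns each event to a 5-second tumbling window by its timestamp and sums values per sensor and window. It works on a finite input and assumes non-negative timestamps; `Event`, `Statistic`, and `windowedSums` are hypothetical names.

```scala
object TumblingWindowSketch {
  final case class Event(sensor: String, timestampMs: Long, value: Long)
  final case class Statistic(sensor: String, windowStartMs: Long, sum: Long)

  // Start of the tumbling window that contains the timestamp (non-negative timestamps assumed).
  def windowStart(timestampMs: Long, sizeMs: Long): Long =
    timestampMs - (timestampMs % sizeMs)

  // Similar in spirit to keyBy(sensor) + tumbling window + sum, on a finite input.
  def windowedSums(events: Seq[Event], sizeMs: Long): Seq[Statistic] =
    events
      .groupBy(e => (e.sensor, windowStart(e.timestampMs, sizeMs)))
      .map { case ((sensor, start), group) => Statistic(sensor, start, group.map(_.value).sum) }
      .toSeq
      .sortBy(s => (s.windowStartMs, s.sensor))
}
```

A real streaming engine produces the same per-window results incrementally and emits each window once event time passes its end, instead of grouping a complete collection.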
Low Level: Process Function
8
High Level: SQL (ANSI)
9
SELECT
  campaign,
  TUMBLE_START(clickTime, INTERVAL '1' HOUR),
  COUNT(ip) AS clickCnt
FROM adClicks
WHERE clickTime > '2017-01-01'
GROUP BY campaign, TUMBLE(clickTime, INTERVAL '1' HOUR)
Query (timeline labels: past, start of the stream, now, future)
10
How Large (or Small)
can Flink get?
11
Blink is Alibaba's
Flink-based System
12
Keystone Routing Pipeline at Netflix
(as presented at Flink Forward San Francisco, 2018)
Small Flink
▪ Can run in a single process
▪ Some users run it on IoT gateways
▪ Also runs with zero dependencies in the IDE
13
14
Checkpoints instead
of Transactions
Event Sourcing + Memory Image
15
event log
persists events
(temporarily)
event /
command
Process
main memory
update local
variables/structures
periodically snapshot
the memory
Event Sourcing + Memory Image
16
Recovery: Restore snapshot and replay events
since snapshot
event log
persists events
(temporarily)
Process
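The recovery rule on this slide (restore the snapshot, then replay the events logged since it was taken) can be sketched in a few lines of plain Scala. The account-balance state, the `Deposit` event, and all function names are assumptions chosen for illustration.

```scala
object MemoryImageSketch {
  // Hypothetical event applied to an in-memory table of account balances.
  final case class Deposit(account: String, amount: Long)

  // A snapshot stores the state together with the log position it reflects.
  final case class Snapshot(state: Map[String, Long], nextOffset: Int)

  def applyEvent(state: Map[String, Long], e: Deposit): Map[String, Long] =
    state.updated(e.account, state.getOrElse(e.account, 0L) + e.amount)

  // Normal processing: fold the events in [from, until) into the state.
  def process(state: Map[String, Long], log: Vector[Deposit], from: Int, until: Int): Map[String, Long] =
    log.slice(from, until).foldLeft(state)(applyEvent)

  // Recovery: restore the snapshot, then replay the events logged since it was taken.
  def recover(snapshot: Snapshot, log: Vector[Deposit]): Map[String, Long] =
    process(snapshot.state, log, snapshot.nextOffset, log.length)
}
```

Because the snapshot records the log offset alongside the state, recovery yields the same result as processing the whole log from the start, while the log only needs to retain events newer than the latest snapshot.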
Consistent Distributed Snapshots
17
Checkpoints for Recovery
18
Re-load state
Reset positions
in input streams
Rolling back computation
Re-processing
Why Checkpoints?
▪ No barriers / boundaries → low latency
▪ No intermediate stream/state replication necessary
  • High throughput
  • Shuffles are very cheap! No load on brokers.
▪ Handles very large state well (TBs)
▪ Supports fast batch processing
▪ Supports flexible types of state and timers
19
Incremental Snapshots
20
Localized State Recovery (Flink 1.5)
21
Piggybacks on internal multi-version
data structures:
• LSM Tree (RocksDB)
• MV Hashtable (Fs / Mem State Backend)
Setup:
• 500 MB state per node
• Checkpoints to S3
• Soft failure (Flink fails, machine survives)
Checkpoints for Program Evolution
22
Restore to different
programs
Bugfixes, Upgrades, A/B testing, etc
State Archiving Through Savepoints
23
time
Replay from Savepoints to Drill Down
24
time
Incident of Interest
"Debug Job"
(modified version of original Job)
Filter
(events of interest only)
Extra sink for
trace output
Pause / Resume style execution
25
time
Bursty Event Stream (events only at end-of-day)
Pause / Resume style execution
26
time
Bursty Event Stream (events only at end-of-day)
Checkpoint / Savepoint
Store
27
Flink and Kafka
Integration
Flink Kafka Reader
▪ Supports versions 0.8 – 0.11/1.0
▪ Exactly-once semantics
  • Flink checkpoints manage offsets
  • Can optionally participate in consumer group offset committing
▪ Topic and partition discovery
▪ Multiple topics at the same time
▪ Per-partition watermarking
28
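Per-partition watermarking can be illustrated with a small plain-Scala sketch: each partition tracks its own watermark, and the source as a whole reports the minimum. The bounded-out-of-orderness rule used here is one common watermark strategy and an assumption of this sketch, as are the function names.

```scala
object PartitionWatermarkSketch {
  // Per-partition watermark: highest timestamp seen so far minus an allowed out-of-orderness bound.
  def partitionWatermark(timestamps: Seq[Long], maxOutOfOrdernessMs: Long): Long =
    if (timestamps.isEmpty) Long.MinValue else timestamps.max - maxOutOfOrdernessMs

  // The source's overall watermark is the minimum across its partitions,
  // so a slow partition holds event time back instead of having its events treated as late.
  def sourceWatermark(partitions: Map[Int, Seq[Long]], maxOutOfOrdernessMs: Long): Long =
    if (partitions.isEmpty) Long.MinValue
    else partitions.values.map(partitionWatermark(_, maxOutOfOrdernessMs)).min
}
```

Tracking the watermark per partition matters when one consumer reads several partitions at different speeds: a single watermark over the merged stream would follow the fastest partition and could declare events from the slower ones late.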
Flink Kafka Writer
▪ Supports versions 0.8 – 0.11/1.0
▪ Exactly-once via Kafka Transactions (0.11+)
  • Details in a later section
▪ Supports partitioners, timestamps, and the usual
29
Transaction Coordination
▪ Similar to a distributed 2-phase/3-phase commit
▪ Coordinated by asynchronous checkpoints
  ==> non-blocking, no voting delays
▪ Basic algorithm:
  • Between checkpoints: Produce into transaction or Write Ahead Log
  • On operator snapshot: Flush local transaction (vote-to-commit)
  • On checkpoint complete: Commit transactions (commit)
  • On recovery: Check and commit any pending transactions
30
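The four steps of the basic algorithm can be sketched as a toy sink in plain Scala. This is a simplified model under stated assumptions: one checkpoint in flight at a time, an in-memory `ToyTransactionalLog` standing in for Kafka, and method names that only loosely echo Flink's hooks. It is not Flink's or Kafka's actual API.

```scala
import scala.collection.mutable

object TwoPhaseCommitSketch {
  // Stand-in for an external transactional system such as a Kafka topic.
  final class ToyTransactionalLog {
    val committed: mutable.Buffer[String] = mutable.Buffer.empty
    private val open = mutable.Map.empty[Long, Vector[String]]
    def write(txn: Long, record: String): Unit =
      open(txn) = open.getOrElse(txn, Vector.empty) :+ record
    // Committing is idempotent: a second commit of the same transaction is a no-op.
    def commit(txn: Long): Unit =
      open.remove(txn).foreach(records => committed ++= records)
  }

  final class ToySink(log: ToyTransactionalLog) {
    private var currentTxn = 1L
    private var pending = Vector.empty[Long] // pre-committed, awaiting checkpoint completion

    // Between checkpoints: produce into the current transaction.
    def invoke(record: String): Unit = log.write(currentTxn, record)

    // On operator snapshot: close the current transaction (vote to commit) and open the next.
    // The returned list is what would be stored in the checkpoint.
    def snapshot(): Vector[Long] = {
      pending = pending :+ currentTxn
      currentTxn += 1
      pending
    }

    // On checkpoint complete: commit everything that was pre-committed.
    def notifyCheckpointComplete(): Unit = {
      pending.foreach(log.commit)
      pending = Vector.empty
    }
  }

  // On recovery: commit any transactions recorded as pending in the restored checkpoint.
  def recover(log: ToyTransactionalLog, pendingFromCheckpoint: Vector[Long]): Unit =
    pendingFromCheckpoint.foreach(log.commit)
}
```

If the job fails after `snapshot()` but before `notifyCheckpointComplete()`, calling `recover` with the transaction list from the checkpoint publishes exactly the records written before that checkpoint. Records written into the next, still-open transaction are never committed in this model, which corresponds to aborting them.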
Exactly-once via Transactions
31
chk-1 chk-2
TXN-1
✔chk-1 ✔chk-2
TXN-2
✘
TXN-3
✔ global ✔ global
Transaction fails after local snapshot
32
chk-1 chk-2
TXN-1
✔chk-1
TXN-2
✘
TXN-3
✔ global
Transaction fails before commit…
33
chk-1 chk-2
TXN-1
✔chk-1
TXN-2
✘
TXN-3
✔ global ✔ global
… commit on recovery
34
chk-2
TXN-2 TXN-3
✔ global
recover
TXN handle
chk-3
35
Some interesting
Flink and Kafka
Use Cases
Flink and Kafka
36
Application
Sensor
APIs
Application
Application
Application
Data Stream Parsing & Routing
37
Steven Wu / Netflix - "Scaling Flink in Cloud"
Machine Learning Pipelines
38
Dave Torok & Sameer Wadkar (Comcast)
"Embedding Flink Throughout an Operationalized Streaming ML Lifecycle"
Machine Learning Pipelines
39
Xingzhong Xu / Uber
"Scaling Uber’s Realtime Optimization with Apache Flink"
Real-time and Historic Data
40
Kafka
HDFS, S3, GCS,
SAN, NAS, NFS, ECS,
Swift, Ceph, …
The anatomy of a data stream
Combining S3 and Kafka/Kinesis data
41
Gregory Fee / Lyft - "Bootstrapping State In Apache Flink"
Event Sourcing CQRS Applications
42
Aris Koliopoulos
"Drivetribe's Kappa Architecture
with Apache Flink"
Sophisticated Time Semantics
43
Erik de Nooij / ING
"StreamING models, how ING adds models at runtime to catch fraudsters"
Low-latency event-time joins/aggregations
Large and complex state
44
45
Outlook
Going for more languages
46
Flink 1.5
▪ Big change to process model
  • Better support for Framework and Library modes
  • Dynamic resource acquisition
  • All communication with clients is REST
▪ Special network protocol to speed up checkpoint alignments
▪ Lower shuffle latency with same throughput
▪ Faster recovery of large state
▪ Managed broadcast state
▪ Interactive SQL Client (beta)
▪ … much more …
47
48
Thank you!
Questions?

2018-04 Kafka Summit London: Stephan Ewen - "Apache Flink and Apache Kafka for Stateful Streaming Applications"


Editor's Notes

  1. Apache Flink is the only system that handles the full breadth of stream processing: from exploration of bounded data, through streaming analytics, to streaming data applications.