SlideShare uma empresa Scribd logo
1 de 30
Piotr Nowojski
piotr@data-artisans.com
@PiotrNowojski
“Hit me, baby, just one
time”
Building End-to-End Exactly-Once
Applications with Flink
Overview of delivering
guarantees
2
No guarantees and failures
3
At-Least-Once and failures
4
Exactly-Once and failures
5
How to achieve Exactly-Once
6
Exactly-Once on a single node
▪ Simple transactions
▪ Write data in transaction
▪ Include read offsets in transaction
▪ On success - commit transaction
▪ On failure - abort transaction and restart from
last committed read offset
7
Exactly-Once on a single node
▪ What about application state?
▪ For example running average
▪ Include in the transaction, by adding as a
separate file/table just before commit
8
Exactly-Once in a distributed system
▪ Persistent communication channels
9
Exactly-Once in a distributed system
▪ Persistent communication channels
▪ Allow to re-process messages from last
committed read offsets
▪ High costs of communication
▪ Processes operate independent of each
other
10
Exactly-Once in Flink
1
1
Exactly-Once two phase commit
▪ Can we do better? Yes! Two phase commit
protocol
12
Input data Process 1 Process 2 Output data
Exactly-Once two phase commit
13
Input data Process 1 Process 2 Output data
State
backend
Job
manager
Exactly-Once two phase commit
14
Input data Process 1 Process 2 Output data
State
backend
Job
manager
Start external
transaction
Exactly-Once two phase commit
15
Input data Process 1 Process 2 Output data
State
backend
Job
manager
Inject checkpoint
barrier (1)
Pre-commit
(checkpoint starts)
Exactly-Once two phase commit
16
Input data Process 1 Process 2 Output data
State
backend
Job
manager
Inject checkpoint
barrier (1)
Snapshot offsets &
state (2)
Pass checkpoint
barrier (2)
Pre-commit
Exactly-Once two phase commit
17
Input data Process 1 Process 2 Output data
State
backend
Job
manager
Inject checkpoint
barrier (1)
Snapshot offsets &
state (2)
Pass checkpoint
barrier (2)
Snapshot state (3)
Pre-commit external
transaction (3)
Pre-commit
Exactly-Once two phase commit
▪ Without side effects (only internal state)
•Example: calculating running average
•State managed by Flink
▪ With side effects
•Operator must manage external state on it’s own
18
Exactly-Once two phase commit
19
Input data Process 1 Process 2 Output data
State
backend
Job
manager
Inject checkpoint
barrier (1)
Snapshot offsets &
state (2)
Pass checkpoint
barrier (2)
Snapshot state (3)
Pre-commit external
transaction (3)
Pre-commit
Exactly-Once two phase commit
20
Input data Process 1 Process 2 Output data
State
backend
Job
manager
Notify checkpoint
completed (1)
Commit
Exactly-Once two phase commit
21
Input data Process 1 Process 2 Output data
State
backend
Job
manager
Notify checkpoint
completed (1) Commit external
transaction (2)
Commit
Exactly-Once two phase commit
▪ Once all processes successfully complete pre-
commit, commit is issued
▪ If at least one pre-commit fails others are aborted
▪ After a successful pre-commit, commit MUST be
guaranteed to eventually succeed
▪ This guarantees that all processes agree on the final
outcome
22
Two phase commit example
implementation
▪ Begin transaction - create a temporary file
▪ Write element - write data to that file
▪ Pre-commit - flush the file
▪ And begin new transaction for subsequent
writes
▪ Commit - move the file to a target directory
▪ Can increase latency
23
Exactly-Once two phase commit
▪ In case of system crash, we can restore
state of the application to the latest
snapshot
▪ We could/should abort previous “pending”
transactions
•Delete temporary files
24
TwoPhaseCommitSinkFunction
▪ Flink’s snapshots/checkpoints are very
similar to two phase commit
▪ There is a TwoPhaseCommitSinkFunction
which maps checkpoint calls to
beginTransaction, preCommit, commit and
abort calls
25
Exactly-Once producers in Flink
▪ BucketingSink
▪ Pravega connector
▪ Kafka 0.11 connector
▪ Kafka 0.11 introduced transactions
▪ Implemented on top of the
TwoPhaseCommitSinkFunction
▪ Low overhead
26
Summarizon
▪ Many ways to achieve Exactly-Once
▪ Flink checkpointing is similar to the two
phase commit protocol
▪ No materialization of the data in transit
▪ Lower latency
▪ Higher throughput
27
Questions?
2
8
29
Thank you!
@PiotrNowojski
@ApacheFlink
@dataArtisans
We are hiring!
data-artisans.com/careers

Mais conteúdo relacionado

Mais procurados

Changelog Stream Processing with Apache Flink
Changelog Stream Processing with Apache FlinkChangelog Stream Processing with Apache Flink
Changelog Stream Processing with Apache FlinkFlink Forward
 
Stephan Ewen - Experiences running Flink at Very Large Scale
Stephan Ewen -  Experiences running Flink at Very Large ScaleStephan Ewen -  Experiences running Flink at Very Large Scale
Stephan Ewen - Experiences running Flink at Very Large ScaleVerverica
 
Practical learnings from running thousands of Flink jobs
Practical learnings from running thousands of Flink jobsPractical learnings from running thousands of Flink jobs
Practical learnings from running thousands of Flink jobsFlink Forward
 
Simon Laws – Apache Flink Cluster Deployment on Docker and Docker-Compose
Simon Laws – Apache Flink Cluster Deployment on Docker and Docker-ComposeSimon Laws – Apache Flink Cluster Deployment on Docker and Docker-Compose
Simon Laws – Apache Flink Cluster Deployment on Docker and Docker-ComposeFlink Forward
 
Apache Flink: API, runtime, and project roadmap
Apache Flink: API, runtime, and project roadmapApache Flink: API, runtime, and project roadmap
Apache Flink: API, runtime, and project roadmapKostas Tzoumas
 
Introducing BinarySortedMultiMap - A new Flink state primitive to boost your ...
Introducing BinarySortedMultiMap - A new Flink state primitive to boost your ...Introducing BinarySortedMultiMap - A new Flink state primitive to boost your ...
Introducing BinarySortedMultiMap - A new Flink state primitive to boost your ...Flink Forward
 
“Alexa, be quiet!”: End-to-end near-real time model building and evaluation i...
“Alexa, be quiet!”: End-to-end near-real time model building and evaluation i...“Alexa, be quiet!”: End-to-end near-real time model building and evaluation i...
“Alexa, be quiet!”: End-to-end near-real time model building and evaluation i...Flink Forward
 
Kafka Streams State Stores Being Persistent
Kafka Streams State Stores Being PersistentKafka Streams State Stores Being Persistent
Kafka Streams State Stores Being Persistentconfluent
 
Where is my bottleneck? Performance troubleshooting in Flink
Where is my bottleneck? Performance troubleshooting in FlinkWhere is my bottleneck? Performance troubleshooting in Flink
Where is my bottleneck? Performance troubleshooting in FlinkFlink Forward
 
Redpanda and ClickHouse
Redpanda and ClickHouseRedpanda and ClickHouse
Redpanda and ClickHouseAltinity Ltd
 
Building Better Data Pipelines using Apache Airflow
Building Better Data Pipelines using Apache AirflowBuilding Better Data Pipelines using Apache Airflow
Building Better Data Pipelines using Apache AirflowSid Anand
 
Designing ETL Pipelines with Structured Streaming and Delta Lake—How to Archi...
Designing ETL Pipelines with Structured Streaming and Delta Lake—How to Archi...Designing ETL Pipelines with Structured Streaming and Delta Lake—How to Archi...
Designing ETL Pipelines with Structured Streaming and Delta Lake—How to Archi...Databricks
 
The Parquet Format and Performance Optimization Opportunities
The Parquet Format and Performance Optimization OpportunitiesThe Parquet Format and Performance Optimization Opportunities
The Parquet Format and Performance Optimization OpportunitiesDatabricks
 
Fast Insight from Fast Data: Integrating ClickHouse and Apache Kafka
Fast Insight from Fast Data: Integrating ClickHouse and Apache KafkaFast Insight from Fast Data: Integrating ClickHouse and Apache Kafka
Fast Insight from Fast Data: Integrating ClickHouse and Apache KafkaAltinity Ltd
 
Virtual Flink Forward 2020: Autoscaling Flink at Netflix - Timothy Farkas
Virtual Flink Forward 2020: Autoscaling Flink at Netflix - Timothy FarkasVirtual Flink Forward 2020: Autoscaling Flink at Netflix - Timothy Farkas
Virtual Flink Forward 2020: Autoscaling Flink at Netflix - Timothy FarkasFlink Forward
 
Flink powered stream processing platform at Pinterest
Flink powered stream processing platform at PinterestFlink powered stream processing platform at Pinterest
Flink powered stream processing platform at PinterestFlink Forward
 
Streaming Event Time Partitioning with Apache Flink and Apache Iceberg - Juli...
Streaming Event Time Partitioning with Apache Flink and Apache Iceberg - Juli...Streaming Event Time Partitioning with Apache Flink and Apache Iceberg - Juli...
Streaming Event Time Partitioning with Apache Flink and Apache Iceberg - Juli...Flink Forward
 
Using the New Apache Flink Kubernetes Operator in a Production Deployment
Using the New Apache Flink Kubernetes Operator in a Production DeploymentUsing the New Apache Flink Kubernetes Operator in a Production Deployment
Using the New Apache Flink Kubernetes Operator in a Production DeploymentFlink Forward
 

Mais procurados (20)

Changelog Stream Processing with Apache Flink
Changelog Stream Processing with Apache FlinkChangelog Stream Processing with Apache Flink
Changelog Stream Processing with Apache Flink
 
Stephan Ewen - Experiences running Flink at Very Large Scale
Stephan Ewen -  Experiences running Flink at Very Large ScaleStephan Ewen -  Experiences running Flink at Very Large Scale
Stephan Ewen - Experiences running Flink at Very Large Scale
 
Practical learnings from running thousands of Flink jobs
Practical learnings from running thousands of Flink jobsPractical learnings from running thousands of Flink jobs
Practical learnings from running thousands of Flink jobs
 
Simon Laws – Apache Flink Cluster Deployment on Docker and Docker-Compose
Simon Laws – Apache Flink Cluster Deployment on Docker and Docker-ComposeSimon Laws – Apache Flink Cluster Deployment on Docker and Docker-Compose
Simon Laws – Apache Flink Cluster Deployment on Docker and Docker-Compose
 
Apache Flink: API, runtime, and project roadmap
Apache Flink: API, runtime, and project roadmapApache Flink: API, runtime, and project roadmap
Apache Flink: API, runtime, and project roadmap
 
Introducing BinarySortedMultiMap - A new Flink state primitive to boost your ...
Introducing BinarySortedMultiMap - A new Flink state primitive to boost your ...Introducing BinarySortedMultiMap - A new Flink state primitive to boost your ...
Introducing BinarySortedMultiMap - A new Flink state primitive to boost your ...
 
“Alexa, be quiet!”: End-to-end near-real time model building and evaluation i...
“Alexa, be quiet!”: End-to-end near-real time model building and evaluation i...“Alexa, be quiet!”: End-to-end near-real time model building and evaluation i...
“Alexa, be quiet!”: End-to-end near-real time model building and evaluation i...
 
The Stream Processor as a Database Apache Flink
The Stream Processor as a Database Apache FlinkThe Stream Processor as a Database Apache Flink
The Stream Processor as a Database Apache Flink
 
Kafka Streams State Stores Being Persistent
Kafka Streams State Stores Being PersistentKafka Streams State Stores Being Persistent
Kafka Streams State Stores Being Persistent
 
Where is my bottleneck? Performance troubleshooting in Flink
Where is my bottleneck? Performance troubleshooting in FlinkWhere is my bottleneck? Performance troubleshooting in Flink
Where is my bottleneck? Performance troubleshooting in Flink
 
Redpanda and ClickHouse
Redpanda and ClickHouseRedpanda and ClickHouse
Redpanda and ClickHouse
 
Building Better Data Pipelines using Apache Airflow
Building Better Data Pipelines using Apache AirflowBuilding Better Data Pipelines using Apache Airflow
Building Better Data Pipelines using Apache Airflow
 
Designing ETL Pipelines with Structured Streaming and Delta Lake—How to Archi...
Designing ETL Pipelines with Structured Streaming and Delta Lake—How to Archi...Designing ETL Pipelines with Structured Streaming and Delta Lake—How to Archi...
Designing ETL Pipelines with Structured Streaming and Delta Lake—How to Archi...
 
The Parquet Format and Performance Optimization Opportunities
The Parquet Format and Performance Optimization OpportunitiesThe Parquet Format and Performance Optimization Opportunities
The Parquet Format and Performance Optimization Opportunities
 
Fast Insight from Fast Data: Integrating ClickHouse and Apache Kafka
Fast Insight from Fast Data: Integrating ClickHouse and Apache KafkaFast Insight from Fast Data: Integrating ClickHouse and Apache Kafka
Fast Insight from Fast Data: Integrating ClickHouse and Apache Kafka
 
Virtual Flink Forward 2020: Autoscaling Flink at Netflix - Timothy Farkas
Virtual Flink Forward 2020: Autoscaling Flink at Netflix - Timothy FarkasVirtual Flink Forward 2020: Autoscaling Flink at Netflix - Timothy Farkas
Virtual Flink Forward 2020: Autoscaling Flink at Netflix - Timothy Farkas
 
Apache Airflow
Apache AirflowApache Airflow
Apache Airflow
 
Flink powered stream processing platform at Pinterest
Flink powered stream processing platform at PinterestFlink powered stream processing platform at Pinterest
Flink powered stream processing platform at Pinterest
 
Streaming Event Time Partitioning with Apache Flink and Apache Iceberg - Juli...
Streaming Event Time Partitioning with Apache Flink and Apache Iceberg - Juli...Streaming Event Time Partitioning with Apache Flink and Apache Iceberg - Juli...
Streaming Event Time Partitioning with Apache Flink and Apache Iceberg - Juli...
 
Using the New Apache Flink Kubernetes Operator in a Production Deployment
Using the New Apache Flink Kubernetes Operator in a Production DeploymentUsing the New Apache Flink Kubernetes Operator in a Production Deployment
Using the New Apache Flink Kubernetes Operator in a Production Deployment
 

Semelhante a Flink Forward Berlin 2017: Piotr Nowojski - "Hit me, baby, just one time" - Building End-to-End Exactly-Once Applications with Flink

Distributed Coordination
Distributed CoordinationDistributed Coordination
Distributed Coordinationsiva krishna
 
Transaction Processing Monitors (TPM)
Transaction Processing Monitors (TPM)Transaction Processing Monitors (TPM)
Transaction Processing Monitors (TPM)Peter R. Egli
 
Ch17 OS
Ch17 OSCh17 OS
Ch17 OSC.U
 
3 distributed transactions-cocurrency-query
3 distributed transactions-cocurrency-query3 distributed transactions-cocurrency-query
3 distributed transactions-cocurrency-queryM Rezaur Rahman
 
Distributed datababase Transaction and concurrency control
Distributed datababase Transaction and concurrency controlDistributed datababase Transaction and concurrency control
Distributed datababase Transaction and concurrency controlbalamurugan.k Kalibalamurugan
 
Transactions (Distributed computing)
Transactions (Distributed computing)Transactions (Distributed computing)
Transactions (Distributed computing)Sri Prasanna
 
Chapter 18 - Distributed Coordination
Chapter 18 - Distributed CoordinationChapter 18 - Distributed Coordination
Chapter 18 - Distributed CoordinationWayne Jones Jnr
 
Tmedia 1+1 Solution Technical Training from TelcoBridges
Tmedia 1+1 Solution Technical Training from TelcoBridgesTmedia 1+1 Solution Technical Training from TelcoBridges
Tmedia 1+1 Solution Technical Training from TelcoBridgesAdminatTelcoBridges
 
Lecture1414_20592_Lecture1419_Transactions.ppt (2).pdf
Lecture1414_20592_Lecture1419_Transactions.ppt (2).pdfLecture1414_20592_Lecture1419_Transactions.ppt (2).pdf
Lecture1414_20592_Lecture1419_Transactions.ppt (2).pdfbadboy624277
 
Transactions and Concurrency Control
Transactions and Concurrency ControlTransactions and Concurrency Control
Transactions and Concurrency ControlDilum Bandara
 
Stream processing with Apache Flink - Maximilian Michels Data Artisans
Stream processing with Apache Flink - Maximilian Michels Data ArtisansStream processing with Apache Flink - Maximilian Michels Data Artisans
Stream processing with Apache Flink - Maximilian Michels Data ArtisansEvention
 
Dartabase Transaction.pptx
Dartabase Transaction.pptxDartabase Transaction.pptx
Dartabase Transaction.pptxBibus Poudel
 

Semelhante a Flink Forward Berlin 2017: Piotr Nowojski - "Hit me, baby, just one time" - Building End-to-End Exactly-Once Applications with Flink (20)

Distributed Coordination
Distributed CoordinationDistributed Coordination
Distributed Coordination
 
Transaction Processing Monitors (TPM)
Transaction Processing Monitors (TPM)Transaction Processing Monitors (TPM)
Transaction Processing Monitors (TPM)
 
Ch17 OS
Ch17 OSCh17 OS
Ch17 OS
 
OSCh17
OSCh17OSCh17
OSCh17
 
OS_Ch17
OS_Ch17OS_Ch17
OS_Ch17
 
3 distributed transactions-cocurrency-query
3 distributed transactions-cocurrency-query3 distributed transactions-cocurrency-query
3 distributed transactions-cocurrency-query
 
Distributed datababase Transaction and concurrency control
Distributed datababase Transaction and concurrency controlDistributed datababase Transaction and concurrency control
Distributed datababase Transaction and concurrency control
 
운영체제론 Ch16
운영체제론 Ch16운영체제론 Ch16
운영체제론 Ch16
 
Bab8 transaction
Bab8 transactionBab8 transaction
Bab8 transaction
 
Lec13s transaction
Lec13s transactionLec13s transaction
Lec13s transaction
 
Transactions (Distributed computing)
Transactions (Distributed computing)Transactions (Distributed computing)
Transactions (Distributed computing)
 
Lec08
Lec08Lec08
Lec08
 
Chapter 18 - Distributed Coordination
Chapter 18 - Distributed CoordinationChapter 18 - Distributed Coordination
Chapter 18 - Distributed Coordination
 
Tmedia 1+1 Solution Technical Training from TelcoBridges
Tmedia 1+1 Solution Technical Training from TelcoBridgesTmedia 1+1 Solution Technical Training from TelcoBridges
Tmedia 1+1 Solution Technical Training from TelcoBridges
 
Lecture1414_20592_Lecture1419_Transactions.ppt (2).pdf
Lecture1414_20592_Lecture1419_Transactions.ppt (2).pdfLecture1414_20592_Lecture1419_Transactions.ppt (2).pdf
Lecture1414_20592_Lecture1419_Transactions.ppt (2).pdf
 
Transactions and Concurrency Control
Transactions and Concurrency ControlTransactions and Concurrency Control
Transactions and Concurrency Control
 
Stream processing with Apache Flink - Maximilian Michels Data Artisans
Stream processing with Apache Flink - Maximilian Michels Data ArtisansStream processing with Apache Flink - Maximilian Michels Data Artisans
Stream processing with Apache Flink - Maximilian Michels Data Artisans
 
Big Data Warsaw
Big Data WarsawBig Data Warsaw
Big Data Warsaw
 
Distributed Transaction
Distributed TransactionDistributed Transaction
Distributed Transaction
 
Dartabase Transaction.pptx
Dartabase Transaction.pptxDartabase Transaction.pptx
Dartabase Transaction.pptx
 

Mais de Flink Forward

Building a fully managed stream processing platform on Flink at scale for Lin...
Building a fully managed stream processing platform on Flink at scale for Lin...Building a fully managed stream processing platform on Flink at scale for Lin...
Building a fully managed stream processing platform on Flink at scale for Lin...Flink Forward
 
Introducing the Apache Flink Kubernetes Operator
Introducing the Apache Flink Kubernetes OperatorIntroducing the Apache Flink Kubernetes Operator
Introducing the Apache Flink Kubernetes OperatorFlink Forward
 
Dynamically Scaling Data Streams across Multiple Kafka Clusters with Zero Fli...
Dynamically Scaling Data Streams across Multiple Kafka Clusters with Zero Fli...Dynamically Scaling Data Streams across Multiple Kafka Clusters with Zero Fli...
Dynamically Scaling Data Streams across Multiple Kafka Clusters with Zero Fli...Flink Forward
 
Tuning Apache Kafka Connectors for Flink.pptx
Tuning Apache Kafka Connectors for Flink.pptxTuning Apache Kafka Connectors for Flink.pptx
Tuning Apache Kafka Connectors for Flink.pptxFlink Forward
 
The Current State of Table API in 2022
The Current State of Table API in 2022The Current State of Table API in 2022
The Current State of Table API in 2022Flink Forward
 
Flink SQL on Pulsar made easy
Flink SQL on Pulsar made easyFlink SQL on Pulsar made easy
Flink SQL on Pulsar made easyFlink Forward
 
Dynamic Rule-based Real-time Market Data Alerts
Dynamic Rule-based Real-time Market Data AlertsDynamic Rule-based Real-time Market Data Alerts
Dynamic Rule-based Real-time Market Data AlertsFlink Forward
 
Processing Semantically-Ordered Streams in Financial Services
Processing Semantically-Ordered Streams in Financial ServicesProcessing Semantically-Ordered Streams in Financial Services
Processing Semantically-Ordered Streams in Financial ServicesFlink Forward
 
Tame the small files problem and optimize data layout for streaming ingestion...
Tame the small files problem and optimize data layout for streaming ingestion...Tame the small files problem and optimize data layout for streaming ingestion...
Tame the small files problem and optimize data layout for streaming ingestion...Flink Forward
 
Batch Processing at Scale with Flink & Iceberg
Batch Processing at Scale with Flink & IcebergBatch Processing at Scale with Flink & Iceberg
Batch Processing at Scale with Flink & IcebergFlink Forward
 
Welcome to the Flink Community!
Welcome to the Flink Community!Welcome to the Flink Community!
Welcome to the Flink Community!Flink Forward
 
Extending Flink SQL for stream processing use cases
Extending Flink SQL for stream processing use casesExtending Flink SQL for stream processing use cases
Extending Flink SQL for stream processing use casesFlink Forward
 
The top 3 challenges running multi-tenant Flink at scale
The top 3 challenges running multi-tenant Flink at scaleThe top 3 challenges running multi-tenant Flink at scale
The top 3 challenges running multi-tenant Flink at scaleFlink Forward
 
Using Queryable State for Fun and Profit
Using Queryable State for Fun and ProfitUsing Queryable State for Fun and Profit
Using Queryable State for Fun and ProfitFlink Forward
 
Large Scale Real Time Fraudulent Web Behavior Detection
Large Scale Real Time Fraudulent Web Behavior DetectionLarge Scale Real Time Fraudulent Web Behavior Detection
Large Scale Real Time Fraudulent Web Behavior DetectionFlink Forward
 
Squirreling Away $640 Billion: How Stripe Leverages Flink for Change Data Cap...
Squirreling Away $640 Billion: How Stripe Leverages Flink for Change Data Cap...Squirreling Away $640 Billion: How Stripe Leverages Flink for Change Data Cap...
Squirreling Away $640 Billion: How Stripe Leverages Flink for Change Data Cap...Flink Forward
 
Building Reliable Lakehouses with Apache Flink and Delta Lake
Building Reliable Lakehouses with Apache Flink and Delta LakeBuilding Reliable Lakehouses with Apache Flink and Delta Lake
Building Reliable Lakehouses with Apache Flink and Delta LakeFlink Forward
 

Mais de Flink Forward (17)

Building a fully managed stream processing platform on Flink at scale for Lin...
Building a fully managed stream processing platform on Flink at scale for Lin...Building a fully managed stream processing platform on Flink at scale for Lin...
Building a fully managed stream processing platform on Flink at scale for Lin...
 
Introducing the Apache Flink Kubernetes Operator
Introducing the Apache Flink Kubernetes OperatorIntroducing the Apache Flink Kubernetes Operator
Introducing the Apache Flink Kubernetes Operator
 
Dynamically Scaling Data Streams across Multiple Kafka Clusters with Zero Fli...
Dynamically Scaling Data Streams across Multiple Kafka Clusters with Zero Fli...Dynamically Scaling Data Streams across Multiple Kafka Clusters with Zero Fli...
Dynamically Scaling Data Streams across Multiple Kafka Clusters with Zero Fli...
 
Tuning Apache Kafka Connectors for Flink.pptx
Tuning Apache Kafka Connectors for Flink.pptxTuning Apache Kafka Connectors for Flink.pptx
Tuning Apache Kafka Connectors for Flink.pptx
 
The Current State of Table API in 2022
The Current State of Table API in 2022The Current State of Table API in 2022
The Current State of Table API in 2022
 
Flink SQL on Pulsar made easy
Flink SQL on Pulsar made easyFlink SQL on Pulsar made easy
Flink SQL on Pulsar made easy
 
Dynamic Rule-based Real-time Market Data Alerts
Dynamic Rule-based Real-time Market Data AlertsDynamic Rule-based Real-time Market Data Alerts
Dynamic Rule-based Real-time Market Data Alerts
 
Processing Semantically-Ordered Streams in Financial Services
Processing Semantically-Ordered Streams in Financial ServicesProcessing Semantically-Ordered Streams in Financial Services
Processing Semantically-Ordered Streams in Financial Services
 
Tame the small files problem and optimize data layout for streaming ingestion...
Tame the small files problem and optimize data layout for streaming ingestion...Tame the small files problem and optimize data layout for streaming ingestion...
Tame the small files problem and optimize data layout for streaming ingestion...
 
Batch Processing at Scale with Flink & Iceberg
Batch Processing at Scale with Flink & IcebergBatch Processing at Scale with Flink & Iceberg
Batch Processing at Scale with Flink & Iceberg
 
Welcome to the Flink Community!
Welcome to the Flink Community!Welcome to the Flink Community!
Welcome to the Flink Community!
 
Extending Flink SQL for stream processing use cases
Extending Flink SQL for stream processing use casesExtending Flink SQL for stream processing use cases
Extending Flink SQL for stream processing use cases
 
The top 3 challenges running multi-tenant Flink at scale
The top 3 challenges running multi-tenant Flink at scaleThe top 3 challenges running multi-tenant Flink at scale
The top 3 challenges running multi-tenant Flink at scale
 
Using Queryable State for Fun and Profit
Using Queryable State for Fun and ProfitUsing Queryable State for Fun and Profit
Using Queryable State for Fun and Profit
 
Large Scale Real Time Fraudulent Web Behavior Detection
Large Scale Real Time Fraudulent Web Behavior DetectionLarge Scale Real Time Fraudulent Web Behavior Detection
Large Scale Real Time Fraudulent Web Behavior Detection
 
Squirreling Away $640 Billion: How Stripe Leverages Flink for Change Data Cap...
Squirreling Away $640 Billion: How Stripe Leverages Flink for Change Data Cap...Squirreling Away $640 Billion: How Stripe Leverages Flink for Change Data Cap...
Squirreling Away $640 Billion: How Stripe Leverages Flink for Change Data Cap...
 
Building Reliable Lakehouses with Apache Flink and Delta Lake
Building Reliable Lakehouses with Apache Flink and Delta LakeBuilding Reliable Lakehouses with Apache Flink and Delta Lake
Building Reliable Lakehouses with Apache Flink and Delta Lake
 

Último

English-8-Q4-W3-Synthesizing-Essential-Information-From-Various-Sources-1.pdf
English-8-Q4-W3-Synthesizing-Essential-Information-From-Various-Sources-1.pdfEnglish-8-Q4-W3-Synthesizing-Essential-Information-From-Various-Sources-1.pdf
English-8-Q4-W3-Synthesizing-Essential-Information-From-Various-Sources-1.pdfblazblazml
 
6 Tips for Interpretable Topic Models _ by Nicha Ruchirawat _ Towards Data Sc...
6 Tips for Interpretable Topic Models _ by Nicha Ruchirawat _ Towards Data Sc...6 Tips for Interpretable Topic Models _ by Nicha Ruchirawat _ Towards Data Sc...
6 Tips for Interpretable Topic Models _ by Nicha Ruchirawat _ Towards Data Sc...Dr Arash Najmaei ( Phd., MBA, BSc)
 
Non Text Magic Studio Magic Design for Presentations L&P.pdf
Non Text Magic Studio Magic Design for Presentations L&P.pdfNon Text Magic Studio Magic Design for Presentations L&P.pdf
Non Text Magic Studio Magic Design for Presentations L&P.pdfPratikPatil591646
 
Decoding Movie Sentiments: Analyzing Reviews with Data Analysis model
Decoding Movie Sentiments: Analyzing Reviews with Data Analysis modelDecoding Movie Sentiments: Analyzing Reviews with Data Analysis model
Decoding Movie Sentiments: Analyzing Reviews with Data Analysis modelBoston Institute of Analytics
 
Digital Indonesia Report 2024 by We Are Social .pdf
Digital Indonesia Report 2024 by We Are Social .pdfDigital Indonesia Report 2024 by We Are Social .pdf
Digital Indonesia Report 2024 by We Are Social .pdfNicoChristianSunaryo
 
Digital Marketing Plan, how digital marketing works
Digital Marketing Plan, how digital marketing worksDigital Marketing Plan, how digital marketing works
Digital Marketing Plan, how digital marketing worksdeepakthakur548787
 
Presentation of project of business person who are success
Presentation of project of business person who are successPresentation of project of business person who are success
Presentation of project of business person who are successPratikSingh115843
 
why-transparency-and-traceability-are-essential-for-sustainable-supply-chains...
why-transparency-and-traceability-are-essential-for-sustainable-supply-chains...why-transparency-and-traceability-are-essential-for-sustainable-supply-chains...
why-transparency-and-traceability-are-essential-for-sustainable-supply-chains...Jack Cole
 
Statistics For Management by Richard I. Levin 8ed.pdf
Statistics For Management by Richard I. Levin 8ed.pdfStatistics For Management by Richard I. Levin 8ed.pdf
Statistics For Management by Richard I. Levin 8ed.pdfnikeshsingh56
 
Data Analysis Project Presentation: Unveiling Your Ideal Customer, Bank Custo...
Data Analysis Project Presentation: Unveiling Your Ideal Customer, Bank Custo...Data Analysis Project Presentation: Unveiling Your Ideal Customer, Bank Custo...
Data Analysis Project Presentation: Unveiling Your Ideal Customer, Bank Custo...Boston Institute of Analytics
 
IBEF report on the Insurance market in India
IBEF report on the Insurance market in IndiaIBEF report on the Insurance market in India
IBEF report on the Insurance market in IndiaManalVerma4
 
Bank Loan Approval Analysis: A Comprehensive Data Analysis Project
Bank Loan Approval Analysis: A Comprehensive Data Analysis ProjectBank Loan Approval Analysis: A Comprehensive Data Analysis Project
Bank Loan Approval Analysis: A Comprehensive Data Analysis ProjectBoston Institute of Analytics
 
Role of Consumer Insights in business transformation
Role of Consumer Insights in business transformationRole of Consumer Insights in business transformation
Role of Consumer Insights in business transformationAnnie Melnic
 
DATA ANALYSIS using various data sets like shoping data set etc
DATA ANALYSIS using various data sets like shoping data set etcDATA ANALYSIS using various data sets like shoping data set etc
DATA ANALYSIS using various data sets like shoping data set etclalithasri22
 

Último (17)

English-8-Q4-W3-Synthesizing-Essential-Information-From-Various-Sources-1.pdf
English-8-Q4-W3-Synthesizing-Essential-Information-From-Various-Sources-1.pdfEnglish-8-Q4-W3-Synthesizing-Essential-Information-From-Various-Sources-1.pdf
English-8-Q4-W3-Synthesizing-Essential-Information-From-Various-Sources-1.pdf
 
6 Tips for Interpretable Topic Models _ by Nicha Ruchirawat _ Towards Data Sc...
6 Tips for Interpretable Topic Models _ by Nicha Ruchirawat _ Towards Data Sc...6 Tips for Interpretable Topic Models _ by Nicha Ruchirawat _ Towards Data Sc...
6 Tips for Interpretable Topic Models _ by Nicha Ruchirawat _ Towards Data Sc...
 
Non Text Magic Studio Magic Design for Presentations L&P.pdf
Non Text Magic Studio Magic Design for Presentations L&P.pdfNon Text Magic Studio Magic Design for Presentations L&P.pdf
Non Text Magic Studio Magic Design for Presentations L&P.pdf
 
Decoding Movie Sentiments: Analyzing Reviews with Data Analysis model
Decoding Movie Sentiments: Analyzing Reviews with Data Analysis modelDecoding Movie Sentiments: Analyzing Reviews with Data Analysis model
Decoding Movie Sentiments: Analyzing Reviews with Data Analysis model
 
Digital Indonesia Report 2024 by We Are Social .pdf
Digital Indonesia Report 2024 by We Are Social .pdfDigital Indonesia Report 2024 by We Are Social .pdf
Digital Indonesia Report 2024 by We Are Social .pdf
 
Digital Marketing Plan, how digital marketing works
Digital Marketing Plan, how digital marketing worksDigital Marketing Plan, how digital marketing works
Digital Marketing Plan, how digital marketing works
 
Presentation of project of business person who are success
Presentation of project of business person who are successPresentation of project of business person who are success
Presentation of project of business person who are success
 
why-transparency-and-traceability-are-essential-for-sustainable-supply-chains...
why-transparency-and-traceability-are-essential-for-sustainable-supply-chains...why-transparency-and-traceability-are-essential-for-sustainable-supply-chains...
why-transparency-and-traceability-are-essential-for-sustainable-supply-chains...
 
Statistics For Management by Richard I. Levin 8ed.pdf
Statistics For Management by Richard I. Levin 8ed.pdfStatistics For Management by Richard I. Levin 8ed.pdf
Statistics For Management by Richard I. Levin 8ed.pdf
 
Data Analysis Project Presentation: Unveiling Your Ideal Customer, Bank Custo...
Data Analysis Project Presentation: Unveiling Your Ideal Customer, Bank Custo...Data Analysis Project Presentation: Unveiling Your Ideal Customer, Bank Custo...
Data Analysis Project Presentation: Unveiling Your Ideal Customer, Bank Custo...
 
IBEF report on the Insurance market in India
IBEF report on the Insurance market in IndiaIBEF report on the Insurance market in India
IBEF report on the Insurance market in India
 
Bank Loan Approval Analysis: A Comprehensive Data Analysis Project
Bank Loan Approval Analysis: A Comprehensive Data Analysis ProjectBank Loan Approval Analysis: A Comprehensive Data Analysis Project
Bank Loan Approval Analysis: A Comprehensive Data Analysis Project
 
Insurance Churn Prediction Data Analysis Project
Insurance Churn Prediction Data Analysis ProjectInsurance Churn Prediction Data Analysis Project
Insurance Churn Prediction Data Analysis Project
 
2023 Survey Shows Dip in High School E-Cigarette Use
2023 Survey Shows Dip in High School E-Cigarette Use2023 Survey Shows Dip in High School E-Cigarette Use
2023 Survey Shows Dip in High School E-Cigarette Use
 
Role of Consumer Insights in business transformation
Role of Consumer Insights in business transformationRole of Consumer Insights in business transformation
Role of Consumer Insights in business transformation
 
Data Analysis Project: Stroke Prediction
Data Analysis Project: Stroke PredictionData Analysis Project: Stroke Prediction
Data Analysis Project: Stroke Prediction
 
DATA ANALYSIS using various data sets like shoping data set etc
DATA ANALYSIS using various data sets like shoping data set etcDATA ANALYSIS using various data sets like shoping data set etc
DATA ANALYSIS using various data sets like shoping data set etc
 

Flink Forward Berlin 2017: Piotr Nowojski - "Hit me, baby, just one time" - Building End-to-End Exactly-Once Applications with Flink

  • 1. Piotr Nowojski piotr@data-artisans.com @PiotrNowojski “Hit me, baby, just one time” Building End-to-End Exactly-Once Applications with Flink
  • 3. No guarantees and failures 3
  • 6. How to achieve Exactly-Once 6
  • 7. Exactly-Once on a single node ▪ Simple transactions ▪ Write data in transaction ▪ Include read offsets in transaction ▪ On success - commit transaction ▪ On failure - abort transaction and restart from last committed read offset 7
  • 8. Exactly-Once on a single node ▪ What about application state? ▪ For example running average ▪ Include in the transaction, by adding as a separate file/table just before commit 8
  • 9. Exactly-Once in a distributed system ▪ Persistent communication channels 9
  • 10. Exactly-Once in a distributed system ▪ Persistent communication channels ▪ Allow to re-process messages from last committed read offsets ▪ High costs of communication ▪ Processes operate independent of each other 10
  • 12. Exactly-Once two phase commit ▪ Can we do better? Yes! Two phase commit protocol 12 Input data Process 1 Process 2 Output data
  • 13. Exactly-Once two phase commit 13 Input data Process 1 Process 2 Output data State backend Job manager
  • 14. Exactly-Once two phase commit 14 Input data Process 1 Process 2 Output data State backend Job manager Start external transaction
  • 15. Exactly-Once two phase commit 15 Input data Process 1 Process 2 Output data State backend Job manager Inject checkpoint barrier (1) Pre-commit (checkpoint starts)
  • 16. Exactly-Once two phase commit 16 Input data Process 1 Process 2 Output data State backend Job manager Inject checkpoint barrier (1) Snapshot offsets & state (2) Pass checkpoint barrier (2) Pre-commit
  • 17. Exactly-Once two phase commit 17 Input data Process 1 Process 2 Output data State backend Job manager Inject checkpoint barrier (1) Snapshot offsets & state (2) Pass checkpoint barrier (2) Snapshot state (3) Pre-commit external transaction (3) Pre-commit
  • 18. Exactly-Once two phase commit ▪ Without side effects (only internal state) •Example: calculating running average •State managed by Flink ▪ With side effects •Operator must manage external state on it’s own 18
  • 19. Exactly-Once two phase commit 19 Input data Process 1 Process 2 Output data State backend Job manager Inject checkpoint barrier (1) Snapshot offsets & state (2) Pass checkpoint barrier (2) Snapshot state (3) Pre-commit external transaction (3) Pre-commit
  • 20. Exactly-Once two phase commit 20 Input data Process 1 Process 2 Output data State backend Job manager Notify checkpoint completed (1) Commit
  • 21. Exactly-Once two phase commit 21 Input data Process 1 Process 2 Output data State backend Job manager Notify checkpoint completed (1) Commit external transaction (2) Commit
  • 22. Exactly-Once two phase commit ▪ Once all processes successfully complete pre- commit, commit is issued ▪ If at least one pre-commit fails others are aborted ▪ After a successful pre-commit, commit MUST be guaranteed to eventually succeed ▪ This guarantees that all processes agree on the final outcome 22
  • 23. Two phase commit example implementation ▪ Begin transaction - create a temporary file ▪ Write element - write data to that file ▪ Pre-commit - flush the file ▪ And begin new transaction for subsequent writes ▪ Commit - move the file to a target directory ▪ Can increase latency 23
  • 24. Exactly-Once two phase commit ▪ In case of system crash, we can restore state of the application to the latest snapshot ▪ We could/should abort previous “pending” transactions •Delete temporary files 24
  • 25. TwoPhaseCommitSinkFunction ▪ Flink’s snapshots/checkpoints are very similar to two phase commit ▪ There is a TwoPhaseCommitSinkFunction which maps checkpoint calls to beginTransaction, preCommit, commit and abort calls 25
  • 26. Exactly-Once producers in Flink ▪ BucketingSink ▪ Pravega connector ▪ Kafka 0.11 connector ▪ Kafka 0.11 introduced transactions ▪ Implemented on top of the TwoPhaseCommitSinkFunction ▪ Low overhead 26
  • 27. Summarizon ▪ Many ways to achieve Exactly-Once ▪ Flink checkpointing is similar to the two phase commit protocol ▪ No materialization of the data in transit ▪ Lower latency ▪ Higher throughput 27