SlideShare a Scribd company logo
1 of 34
Download to read offline
Image © Nicolas Buffler https://flic.kr/p/jpWcWD (CC BY 2.0)
Debezium Snapshots Revisited!
Gunnar Morling
Senior Staff Software Engineer, Decodable
@gunnarmorling
#DebeziumSnapshotting @gunnarmorling
Agenda
#DebeziumSnapshotting @gunnarmorling
● Software engineer at Decodable
● Former project lead of Debezium
● kcctl 🧸, JfrUnit, ModiTect,
MapStruct
● Spec Lead for Bean Validation 2.0
● Java Champion
Gunnar Morling
#DebeziumSnapshotting @gunnarmorling
Recap – Debezium
Log-Based Change Data Capture
#DebeziumSnapshotting @gunnarmorling
Snapshotting
Why Is It Needed?
● Need to backfill data, but don’t
have all TX logs
● Solution: scan data once before
streaming
● Emit READ event for each record
#DebeziumSnapshotting @gunnarmorling
Snapshotting
Classic Approach – General Idea
● Capture current
position in transaction
log
● Scan all relevant tables
● Start streaming
#DebeziumSnapshotting @gunnarmorling
Snapshotting
Key Configuration Options
● snapshot.mode (initial, never, schema_only_recovery)
● snapshot.select.statement.overrides
● snapshot.max.threads
#DebeziumSnapshotting @gunnarmorling
Snapshotting
Limitations of Classic Approach
● Can’t update filter list
#DebeziumSnapshotting @gunnarmorling
Snapshotting
Limitations of Classic Approach
● Can’t update filter list
● Can’t pause & resume long-running snapshots
#DebeziumSnapshotting @gunnarmorling
Snapshotting
Limitations of Classic Approach
● Can’t update filter list
● Can’t pause & resume long-running snapshots
● Can’t stream changes until snapshot completed
#DebeziumSnapshotting @gunnarmorling
Snapshotting
Limitations of Classic Approach
● Can’t update filter list
● Can’t pause & resume long-running snapshots
● Can’t stream changes until snapshot completed
● Can’t re-snapshot selected tables
Incremental
Snapshots
© Karen Blaha https://flic.kr/p/aeuPys (CC BY-SA 2.0)
#DebeziumSnapshotting @gunnarmorling
Incremental Snapshotting
The Paper
● “DBLog: A Watermark Based
Change-Data-Capture
Framework”, by Andreas Andreakis
and Ioannis Papapanagiotou
● Key idea: interleave snapshot events
and events from TX log
https://arxiv.org/pdf/2010.12597v1.pdf
#DebeziumSnapshotting @gunnarmorling
Incremental Snapshotting
General Idea
#DebeziumSnapshotting @gunnarmorling
Incremental Snapshotting
Windowing via Watermarks
#DebeziumSnapshotting @gunnarmorling
Incremental Snapshotting
Buffer Processing
#DebeziumSnapshotting @gunnarmorling
Incremental Snapshotting
Buffer Processing
#DebeziumSnapshotting @gunnarmorling
Incremental Snapshotting
Semantics
● No guarantee for snapshot (read) events for all records
● May receive update or delete without prior insert/read
● May receive read and update/delete
● What is guaranteed: complete data set after snapshot
Demo
© Wall Boat https://flic.kr/p/Y6zkmX (Public Domain)
#DebeziumSnapshotting @gunnarmorling
Incremental Snapshotting
Connector Offsets
#DebeziumSnapshotting @gunnarmorling
Incremental Snapshotting
MySQL Read-Only Snapshots
● Write access to DB may be not desirable
#DebeziumSnapshotting @gunnarmorling
Incremental Snapshotting
Signalling Channels
● Database table
● Kafka topic
● JMX
● Custom
id 924e3ff8-2245-43ca-ba77-2af9af02fa07
type log, {execute|pause|resume|stop}-snapshot
value { "data-collections": ["schema1.table1", "schema2.table2"],
"type":"incremental",
"additional-condition":"color=blue" }
#DebeziumSnapshotting @gunnarmorling
Incremental Snapshotting
Notifications
#Debezium + #ApacheFlink | @gunnarmorling
Comparison
#DebeziumSnapshotting @gunnarmorling
Incremental Snapshotting
Benefits
● Can update filter list ✅
#DebeziumSnapshotting @gunnarmorling
Incremental Snapshotting
Benefits
● Can update filter list ✅
● Long-running snapshots can be paused/resumed ✅
#DebeziumSnapshotting @gunnarmorling
Incremental Snapshotting
Benefits
● Can update filter list ✅
● Long-running snapshots can be paused/resumed ✅
● Can stream changes before snapshot completed ✅
#DebeziumSnapshotting @gunnarmorling
Incremental Snapshotting
Benefits
● Can update filter list ✅
● Long-running snapshots can be paused/resumed ✅
● Can stream changes before snapshot completed ✅
● Can re-snapshot selected tables ✅
#DebeziumSnapshotting @gunnarmorling
● Incremental Snapshots in Debezium
https://debezium.io/blog/2021/10/07/incremental-snapshots/
● Read-only Incremental Snapshots for MySQL
https://debezium.io/blog/2022/04/07/read-only-incremental-snapshots/
● Flink CDC
https://ververica.github.io/flink-cdc-connectors/
Resources
#DebeziumSnapshotting @gunnarmorling
● Debezium & Kafka Connect – Ask the Experts
With Chris Cranford (Red Hat) and Chris Egerton (Aiven)
Sep 27, 2:30 PM
● Change Stream Processing with Debezium and Apache Flink
With Robert Metzger (Decodable)
Sep 27, 5:30 PM, Dremio Office
https://www.meetup.com/sf-big-analytics/events/294068331/
Upcoming
#DebeziumSnapshotting @gunnarmorling
Q & A
gunnar@decodable.co
@gunnarmorling
📧
Thank You!
Debezium Snapshots Revisited!

More Related Content

Similar to Debezium Snapshots Revisited!

pgDay Asia 2016 - Swapping Pacemaker-Corosync for repmgr (1)
pgDay Asia 2016 - Swapping Pacemaker-Corosync for repmgr (1)pgDay Asia 2016 - Swapping Pacemaker-Corosync for repmgr (1)
pgDay Asia 2016 - Swapping Pacemaker-Corosync for repmgr (1)
Wei Shan Ang
 
GTC Taiwan 2017 在 Google Cloud 當中使用 GPU 進行效能最佳化
GTC Taiwan 2017 在 Google Cloud 當中使用 GPU 進行效能最佳化GTC Taiwan 2017 在 Google Cloud 當中使用 GPU 進行效能最佳化
GTC Taiwan 2017 在 Google Cloud 當中使用 GPU 進行效能最佳化
NVIDIA Taiwan
 

Similar to Debezium Snapshots Revisited! (20)

Help , My Datacenter is on fire
Help , My Datacenter is on fireHelp , My Datacenter is on fire
Help , My Datacenter is on fire
 
What's coming in Airflow 2.0? - NYC Apache Airflow Meetup
What's coming in Airflow 2.0? - NYC Apache Airflow MeetupWhat's coming in Airflow 2.0? - NYC Apache Airflow Meetup
What's coming in Airflow 2.0? - NYC Apache Airflow Meetup
 
The pathway to Chromium on Wayland (Web Engines Hackfest 2018)
The pathway to Chromium on Wayland (Web Engines Hackfest 2018)The pathway to Chromium on Wayland (Web Engines Hackfest 2018)
The pathway to Chromium on Wayland (Web Engines Hackfest 2018)
 
How to plan and define your CI-CD pipeline
How to plan and define your CI-CD pipelineHow to plan and define your CI-CD pipeline
How to plan and define your CI-CD pipeline
 
Glusterfs geo-replication
Glusterfs geo-replicationGlusterfs geo-replication
Glusterfs geo-replication
 
Backday Xebia : Découvrez Spring Boot sur un cas pratique
Backday Xebia : Découvrez Spring Boot sur un cas pratiqueBackday Xebia : Découvrez Spring Boot sur un cas pratique
Backday Xebia : Découvrez Spring Boot sur un cas pratique
 
Swapping Pacemaker Corosync with repmgr
Swapping Pacemaker Corosync with repmgrSwapping Pacemaker Corosync with repmgr
Swapping Pacemaker Corosync with repmgr
 
pgDay Asia 2016 - Swapping Pacemaker-Corosync for repmgr (1)
pgDay Asia 2016 - Swapping Pacemaker-Corosync for repmgr (1)pgDay Asia 2016 - Swapping Pacemaker-Corosync for repmgr (1)
pgDay Asia 2016 - Swapping Pacemaker-Corosync for repmgr (1)
 
Speedrunning the Open Street Map osm2pgsql Loader
Speedrunning the Open Street Map osm2pgsql LoaderSpeedrunning the Open Street Map osm2pgsql Loader
Speedrunning the Open Street Map osm2pgsql Loader
 
Integrating multiple CDN providers at Etsy - Velocity Europe (London) 2013
Integrating multiple CDN providers at Etsy - Velocity Europe (London) 2013Integrating multiple CDN providers at Etsy - Velocity Europe (London) 2013
Integrating multiple CDN providers at Etsy - Velocity Europe (London) 2013
 
The Accidental DBA
The Accidental DBAThe Accidental DBA
The Accidental DBA
 
PGConf.ASIA 2019 - The Future of TDEforPG - Taiki Kondo
PGConf.ASIA 2019 - The Future of TDEforPG - Taiki KondoPGConf.ASIA 2019 - The Future of TDEforPG - Taiki Kondo
PGConf.ASIA 2019 - The Future of TDEforPG - Taiki Kondo
 
XPDS16: Live Migration of vGPU - Xiao Zheng, Intel Asia-Pacific Research & De...
XPDS16: Live Migration of vGPU - Xiao Zheng, Intel Asia-Pacific Research & De...XPDS16: Live Migration of vGPU - Xiao Zheng, Intel Asia-Pacific Research & De...
XPDS16: Live Migration of vGPU - Xiao Zheng, Intel Asia-Pacific Research & De...
 
Geo-Replication and Disaster Recovery : Glusterfs
Geo-Replication and Disaster Recovery : GlusterfsGeo-Replication and Disaster Recovery : Glusterfs
Geo-Replication and Disaster Recovery : Glusterfs
 
HKG15-409: ARM Hibernation enablement on SoCs - a case study
HKG15-409: ARM Hibernation enablement on SoCs - a case studyHKG15-409: ARM Hibernation enablement on SoCs - a case study
HKG15-409: ARM Hibernation enablement on SoCs - a case study
 
Design choices of golang for high scalability
Design choices of golang for high scalabilityDesign choices of golang for high scalability
Design choices of golang for high scalability
 
PostgreSQL WAL for DBAs
PostgreSQL WAL for DBAs PostgreSQL WAL for DBAs
PostgreSQL WAL for DBAs
 
GTC Taiwan 2017 在 Google Cloud 當中使用 GPU 進行效能最佳化
GTC Taiwan 2017 在 Google Cloud 當中使用 GPU 進行效能最佳化GTC Taiwan 2017 在 Google Cloud 當中使用 GPU 進行效能最佳化
GTC Taiwan 2017 在 Google Cloud 當中使用 GPU 進行效能最佳化
 
Troubleshooting MySQL from a MySQL Developer Perspective
Troubleshooting MySQL from a MySQL Developer PerspectiveTroubleshooting MySQL from a MySQL Developer Perspective
Troubleshooting MySQL from a MySQL Developer Perspective
 
Unleash sp3d model to enterprise level
Unleash sp3d model to enterprise levelUnleash sp3d model to enterprise level
Unleash sp3d model to enterprise level
 

More from HostedbyConfluent

Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
HostedbyConfluent
 
Evolution of NRT Data Ingestion Pipeline at Trendyol
Evolution of NRT Data Ingestion Pipeline at TrendyolEvolution of NRT Data Ingestion Pipeline at Trendyol
Evolution of NRT Data Ingestion Pipeline at Trendyol
HostedbyConfluent
 

More from HostedbyConfluent (20)

Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
 
Renaming a Kafka Topic | Kafka Summit London
Renaming a Kafka Topic | Kafka Summit LondonRenaming a Kafka Topic | Kafka Summit London
Renaming a Kafka Topic | Kafka Summit London
 
Evolution of NRT Data Ingestion Pipeline at Trendyol
Evolution of NRT Data Ingestion Pipeline at TrendyolEvolution of NRT Data Ingestion Pipeline at Trendyol
Evolution of NRT Data Ingestion Pipeline at Trendyol
 
Ensuring Kafka Service Resilience: A Dive into Health-Checking Techniques
Ensuring Kafka Service Resilience: A Dive into Health-Checking TechniquesEnsuring Kafka Service Resilience: A Dive into Health-Checking Techniques
Ensuring Kafka Service Resilience: A Dive into Health-Checking Techniques
 
Exactly-once Stream Processing with Arroyo and Kafka
Exactly-once Stream Processing with Arroyo and KafkaExactly-once Stream Processing with Arroyo and Kafka
Exactly-once Stream Processing with Arroyo and Kafka
 
Fish Plays Pokemon | Kafka Summit London
Fish Plays Pokemon | Kafka Summit LondonFish Plays Pokemon | Kafka Summit London
Fish Plays Pokemon | Kafka Summit London
 
Tiered Storage 101 | Kafla Summit London
Tiered Storage 101 | Kafla Summit LondonTiered Storage 101 | Kafla Summit London
Tiered Storage 101 | Kafla Summit London
 
Building a Self-Service Stream Processing Portal: How And Why
Building a Self-Service Stream Processing Portal: How And WhyBuilding a Self-Service Stream Processing Portal: How And Why
Building a Self-Service Stream Processing Portal: How And Why
 
From the Trenches: Improving Kafka Connect Source Connector Ingestion from 7 ...
From the Trenches: Improving Kafka Connect Source Connector Ingestion from 7 ...From the Trenches: Improving Kafka Connect Source Connector Ingestion from 7 ...
From the Trenches: Improving Kafka Connect Source Connector Ingestion from 7 ...
 
Future with Zero Down-Time: End-to-end Resiliency with Chaos Engineering and ...
Future with Zero Down-Time: End-to-end Resiliency with Chaos Engineering and ...Future with Zero Down-Time: End-to-end Resiliency with Chaos Engineering and ...
Future with Zero Down-Time: End-to-end Resiliency with Chaos Engineering and ...
 
Navigating Private Network Connectivity Options for Kafka Clusters
Navigating Private Network Connectivity Options for Kafka ClustersNavigating Private Network Connectivity Options for Kafka Clusters
Navigating Private Network Connectivity Options for Kafka Clusters
 
Apache Flink: Building a Company-wide Self-service Streaming Data Platform
Apache Flink: Building a Company-wide Self-service Streaming Data PlatformApache Flink: Building a Company-wide Self-service Streaming Data Platform
Apache Flink: Building a Company-wide Self-service Streaming Data Platform
 
Explaining How Real-Time GenAI Works in a Noisy Pub
Explaining How Real-Time GenAI Works in a Noisy PubExplaining How Real-Time GenAI Works in a Noisy Pub
Explaining How Real-Time GenAI Works in a Noisy Pub
 
TL;DR Kafka Metrics | Kafka Summit London
TL;DR Kafka Metrics | Kafka Summit LondonTL;DR Kafka Metrics | Kafka Summit London
TL;DR Kafka Metrics | Kafka Summit London
 
A Window Into Your Kafka Streams Tasks | KSL
A Window Into Your Kafka Streams Tasks | KSLA Window Into Your Kafka Streams Tasks | KSL
A Window Into Your Kafka Streams Tasks | KSL
 
Mastering Kafka Producer Configs: A Guide to Optimizing Performance
Mastering Kafka Producer Configs: A Guide to Optimizing PerformanceMastering Kafka Producer Configs: A Guide to Optimizing Performance
Mastering Kafka Producer Configs: A Guide to Optimizing Performance
 
Data Contracts Management: Schema Registry and Beyond
Data Contracts Management: Schema Registry and BeyondData Contracts Management: Schema Registry and Beyond
Data Contracts Management: Schema Registry and Beyond
 
Code-First Approach: Crafting Efficient Flink Apps
Code-First Approach: Crafting Efficient Flink AppsCode-First Approach: Crafting Efficient Flink Apps
Code-First Approach: Crafting Efficient Flink Apps
 
Debezium vs. the World: An Overview of the CDC Ecosystem
Debezium vs. the World: An Overview of the CDC EcosystemDebezium vs. the World: An Overview of the CDC Ecosystem
Debezium vs. the World: An Overview of the CDC Ecosystem
 
Beyond Tiered Storage: Serverless Kafka with No Local Disks
Beyond Tiered Storage: Serverless Kafka with No Local DisksBeyond Tiered Storage: Serverless Kafka with No Local Disks
Beyond Tiered Storage: Serverless Kafka with No Local Disks
 

Recently uploaded

Breaking Down the Flutterwave Scandal What You Need to Know.pdf
Breaking Down the Flutterwave Scandal What You Need to Know.pdfBreaking Down the Flutterwave Scandal What You Need to Know.pdf
Breaking Down the Flutterwave Scandal What You Need to Know.pdf
UK Journal
 

Recently uploaded (20)

Google I/O Extended 2024 Warsaw
Google I/O Extended 2024 WarsawGoogle I/O Extended 2024 Warsaw
Google I/O Extended 2024 Warsaw
 
What's New in Teams Calling, Meetings and Devices April 2024
What's New in Teams Calling, Meetings and Devices April 2024What's New in Teams Calling, Meetings and Devices April 2024
What's New in Teams Calling, Meetings and Devices April 2024
 
Oauth 2.0 Introduction and Flows with MuleSoft
Oauth 2.0 Introduction and Flows with MuleSoftOauth 2.0 Introduction and Flows with MuleSoft
Oauth 2.0 Introduction and Flows with MuleSoft
 
Measures in SQL (a talk at SF Distributed Systems meetup, 2024-05-22)
Measures in SQL (a talk at SF Distributed Systems meetup, 2024-05-22)Measures in SQL (a talk at SF Distributed Systems meetup, 2024-05-22)
Measures in SQL (a talk at SF Distributed Systems meetup, 2024-05-22)
 
WebAssembly is Key to Better LLM Performance
WebAssembly is Key to Better LLM PerformanceWebAssembly is Key to Better LLM Performance
WebAssembly is Key to Better LLM Performance
 
ECS 2024 Teams Premium - Pretty Secure
ECS 2024   Teams Premium - Pretty SecureECS 2024   Teams Premium - Pretty Secure
ECS 2024 Teams Premium - Pretty Secure
 
Extensible Python: Robustness through Addition - PyCon 2024
Extensible Python: Robustness through Addition - PyCon 2024Extensible Python: Robustness through Addition - PyCon 2024
Extensible Python: Robustness through Addition - PyCon 2024
 
Overview of Hyperledger Foundation
Overview of Hyperledger FoundationOverview of Hyperledger Foundation
Overview of Hyperledger Foundation
 
Introduction to FDO and How It works Applications _ Richard at FIDO Alliance.pdf
Introduction to FDO and How It works Applications _ Richard at FIDO Alliance.pdfIntroduction to FDO and How It works Applications _ Richard at FIDO Alliance.pdf
Introduction to FDO and How It works Applications _ Richard at FIDO Alliance.pdf
 
FDO for Camera, Sensor and Networking Device – Commercial Solutions from VinC...
FDO for Camera, Sensor and Networking Device – Commercial Solutions from VinC...FDO for Camera, Sensor and Networking Device – Commercial Solutions from VinC...
FDO for Camera, Sensor and Networking Device – Commercial Solutions from VinC...
 
Free and Effective: Making Flows Publicly Accessible, Yumi Ibrahimzade
Free and Effective: Making Flows Publicly Accessible, Yumi IbrahimzadeFree and Effective: Making Flows Publicly Accessible, Yumi Ibrahimzade
Free and Effective: Making Flows Publicly Accessible, Yumi Ibrahimzade
 
Linux Foundation Edge _ Overview of FDO Software Components _ Randy at Intel.pdf
Linux Foundation Edge _ Overview of FDO Software Components _ Randy at Intel.pdfLinux Foundation Edge _ Overview of FDO Software Components _ Randy at Intel.pdf
Linux Foundation Edge _ Overview of FDO Software Components _ Randy at Intel.pdf
 
1111 ChatGPT Prompts PDF Free Download - Prompts for ChatGPT
1111 ChatGPT Prompts PDF Free Download - Prompts for ChatGPT1111 ChatGPT Prompts PDF Free Download - Prompts for ChatGPT
1111 ChatGPT Prompts PDF Free Download - Prompts for ChatGPT
 
Optimizing NoSQL Performance Through Observability
Optimizing NoSQL Performance Through ObservabilityOptimizing NoSQL Performance Through Observability
Optimizing NoSQL Performance Through Observability
 
AI presentation and introduction - Retrieval Augmented Generation RAG 101
AI presentation and introduction - Retrieval Augmented Generation RAG 101AI presentation and introduction - Retrieval Augmented Generation RAG 101
AI presentation and introduction - Retrieval Augmented Generation RAG 101
 
Speed Wins: From Kafka to APIs in Minutes
Speed Wins: From Kafka to APIs in MinutesSpeed Wins: From Kafka to APIs in Minutes
Speed Wins: From Kafka to APIs in Minutes
 
IESVE for Early Stage Design and Planning
IESVE for Early Stage Design and PlanningIESVE for Early Stage Design and Planning
IESVE for Early Stage Design and Planning
 
The Value of Certifying Products for FDO _ Paul at FIDO Alliance.pdf
The Value of Certifying Products for FDO _ Paul at FIDO Alliance.pdfThe Value of Certifying Products for FDO _ Paul at FIDO Alliance.pdf
The Value of Certifying Products for FDO _ Paul at FIDO Alliance.pdf
 
TopCryptoSupers 12thReport OrionX May2024
TopCryptoSupers 12thReport OrionX May2024TopCryptoSupers 12thReport OrionX May2024
TopCryptoSupers 12thReport OrionX May2024
 
Breaking Down the Flutterwave Scandal What You Need to Know.pdf
Breaking Down the Flutterwave Scandal What You Need to Know.pdfBreaking Down the Flutterwave Scandal What You Need to Know.pdf
Breaking Down the Flutterwave Scandal What You Need to Know.pdf
 

Debezium Snapshots Revisited!