SlideShare uma empresa Scribd logo
1 de 21
Baixar para ler offline
Delta Change Data
Feed
Joseph Torres
Rahul Mahadev
Itai Weiss
Agenda
▪ Data capture challenges
▪ Change Data Feed on the
Lakehouse
▪ Capture Changes
▪ Process Changes
▪ Demo
Who are we? - Jose Torres
● Software Engineer - Databricks
● Committer - Delta Lake, Apache
Spark
● Database and cooking enthusiast
Who are we? - Rahul Mahadev
● Software Engineer– Databricks
● Delta Lake Committer
● MS Computer Science, University of
Illinois Urbana-Champaign
Who are we? - Itai Weiss
• Lead Solution Architect– Databricks
• Working with Apache Spark! since
v1.6
• Consulted for numerous firms across
Financial, Insurance, Tech, Pharma,
Manufacturing, Retail and
Transportation
Delta Lake
An open, reliable, performant,
and secure data storage and
management layer for your
data
■ Fresh & reliable data with a single source of truth
■ Data warehouse performance with data lake
economics
■ Advanced security and standards to meet
compliance needs
■ Open format
Current Challenges
Delta table v-1
Spark Insert
Spark Updates
Delta table v
Spark Delete
Spark Union
■ Big data increases the complexity of
• Lots of data
• Changes infrequently
• Just want to process the newest changes
Full Outer Join
Inserts
Updates
Deletes
■ Read only the changed data
■ Avoid full table scans
■ Reduce compute and memory
Solve data challenges with CDF
Improve ETL
pipelines
Unify batch and
streaming
BI on your
data lake
Meet regulatory
needs
For All your use cases
Delta table with CDF
Spark Union
Delta Change Data Feed
Where Delta Change Data Feed Applies
Raw Ingestion
and History
BRONZE
Filtered, Cleaned,
Augmented
SILVER
Business-level
Aggregates
GOLD
CD
F
External feeds,
Other CDC output,
Extracts
How Does Delta Change Data Feed Work?
PK B
A1 B1
A2 B2
A3 B3
Original Table (v1)
PK B
A2 Z2
A3 B3
A4 B4
Change data
(Merged as v2)
PK B
Change
Type
Time
Versio
n
A2 B2 Preimage
12:00:0
0
2
A2 Z2 Postimage
12:00:0
0
2
A3 B3 Delete
12:00:0
0
2
A4 B4 Insert
12:00:0
0
2
Change Data Feed Output
A1 record did not receive an update or delete.
So it will not be output by CDF.
Consuming the Delta Change Data Feed
Stream-based Consumption
● Delta Change Feed is processed as
each source commit completes
● Rapid source commits can result
in multiple Delta versions being
included in a single micro-batch
Batch Consumption
● Batches are constructed based in
time-bound windows which may
contain multiple Delta versions
A2 B2 Preimage 12:00:00 2
A2 Z2 Postimage 12:00:00 2
A3 B3 Delete 12:00:00 2
A4 B4 Insert 12:00:00 2
A5 B5 Insert
12:08:0
0
3 A6 B6 Insert 12:09:00 4
A6 B6 Preimage 12:10:05 5
A6 Z6 Postimage 12:10:05 5
A5 B5 Insert 12:08:00 3
A6 B6 Insert 12:09:00 4
12:00
A6 B6 Preimage 12:10:05 5
A6 Z6
Postimag
e
12:10:05 5
12:10:00 12:20
12:08 12:10:10
Stream - micro batches
Batch - every 10 mins
12:00
Storing the Delta Change Data Feed
Changes to the table
are stored as ordered,
atomic units called
commits
Add 1.parquet
Add 2.parquet
Remove 1.parquet
Remove 2.parquet
Add 3.parquet
000000.json
000001.json
…
Storing the Delta Change Data Feed
● When Change Data
Feed is enabled,
commits will reference
an additional set of
files containing the
change data events
● These files contains
the updated and
deleted records
Add 1.parquet
Add 2.parquet
Remove 1.parquet
Remove 2.parquet
Add 3.parquet
Cdf 1.parquet
000000.json
000001.json
…
Storing the Delta Change Data Feed
● Silver & Gold Tables
○ Improve Delta performance by processing only data changes, and
simplify ETL/ELT operations
● Materialized Views
○ Create up-to-date, aggregated views of information based on the data
changes
● Transmit Changes
○ Send only the data changes to downstream systems
● Audit Trail Table
○ Capture and store all the data changes over time, including inserts,
updates, and deletes
When to Use Delta Change Data Feed
● Delta changes include updates and/or deletes ● Delta changes are append only
● Small fraction of records updated in each
batch
● Most records in the table updated in each
batch
● Data received from external sources is in CDC
format
● Data received comprises destructive loads
● Send data changes to downstream application
● Find and ingest data outside of the Lakehouse
Getting started with Delta Change Data Feed
spark.conf.set('spark.databri
cks.delta.properties.defaults
.enableChangeDataFeed', True)
ALTER TABLE ...
SET TBLPROPERTIES
(delta.enableChangeDataFeed =
true);
Enable CDF on TABLE… … or Establish CDF on CLUSTER
in SQL, Python, or Scala in Python, or Scala
Using Delta Change Data Feed
INSERT INTO TABLE ...
USING delta ...
as
SELECT … FROM
table_changes(...)
SELECT … FROM
table_changes('tableName',
startingVersion
[,endingVersion])
or
SELECT … FROM
table_changes('tableName',
'startingTimestamp'
[,'endingTimestamp'])
Query changes… … and store them
in SQL, Python, or Scala
Getting started with Delta Change Data Feed
Let’s look at some notebooks
Feedback
Your feedback is important to us.
Don’t forget to rate and review the sessions.
Change Data Feed in Delta

Mais conteúdo relacionado

Mais procurados

Architect’s Open-Source Guide for a Data Mesh Architecture
Architect’s Open-Source Guide for a Data Mesh ArchitectureArchitect’s Open-Source Guide for a Data Mesh Architecture
Architect’s Open-Source Guide for a Data Mesh ArchitectureDatabricks
 
Modernizing to a Cloud Data Architecture
Modernizing to a Cloud Data ArchitectureModernizing to a Cloud Data Architecture
Modernizing to a Cloud Data ArchitectureDatabricks
 
Delta lake and the delta architecture
Delta lake and the delta architectureDelta lake and the delta architecture
Delta lake and the delta architectureAdam Doyle
 
Making Apache Spark Better with Delta Lake
Making Apache Spark Better with Delta LakeMaking Apache Spark Better with Delta Lake
Making Apache Spark Better with Delta LakeDatabricks
 
Large Scale Lakehouse Implementation Using Structured Streaming
Large Scale Lakehouse Implementation Using Structured StreamingLarge Scale Lakehouse Implementation Using Structured Streaming
Large Scale Lakehouse Implementation Using Structured StreamingDatabricks
 
Evening out the uneven: dealing with skew in Flink
Evening out the uneven: dealing with skew in FlinkEvening out the uneven: dealing with skew in Flink
Evening out the uneven: dealing with skew in FlinkFlink Forward
 
Databricks Fundamentals
Databricks FundamentalsDatabricks Fundamentals
Databricks FundamentalsDalibor Wijas
 
Massive Data Processing in Adobe Using Delta Lake
Massive Data Processing in Adobe Using Delta LakeMassive Data Processing in Adobe Using Delta Lake
Massive Data Processing in Adobe Using Delta LakeDatabricks
 
Apache Flink internals
Apache Flink internalsApache Flink internals
Apache Flink internalsKostas Tzoumas
 
Building an open data platform with apache iceberg
Building an open data platform with apache icebergBuilding an open data platform with apache iceberg
Building an open data platform with apache icebergAlluxio, Inc.
 
Tame the small files problem and optimize data layout for streaming ingestion...
Tame the small files problem and optimize data layout for streaming ingestion...Tame the small files problem and optimize data layout for streaming ingestion...
Tame the small files problem and optimize data layout for streaming ingestion...Flink Forward
 
Delta Lake OSS: Create reliable and performant Data Lake by Quentin Ambard
Delta Lake OSS: Create reliable and performant Data Lake by Quentin AmbardDelta Lake OSS: Create reliable and performant Data Lake by Quentin Ambard
Delta Lake OSS: Create reliable and performant Data Lake by Quentin AmbardParis Data Engineers !
 
Unified Big Data Processing with Apache Spark (QCON 2014)
Unified Big Data Processing with Apache Spark (QCON 2014)Unified Big Data Processing with Apache Spark (QCON 2014)
Unified Big Data Processing with Apache Spark (QCON 2014)Databricks
 
Databricks Delta Lake and Its Benefits
Databricks Delta Lake and Its BenefitsDatabricks Delta Lake and Its Benefits
Databricks Delta Lake and Its BenefitsDatabricks
 
The Parquet Format and Performance Optimization Opportunities
The Parquet Format and Performance Optimization OpportunitiesThe Parquet Format and Performance Optimization Opportunities
The Parquet Format and Performance Optimization OpportunitiesDatabricks
 
ksqlDB - Stream Processing simplified!
ksqlDB - Stream Processing simplified!ksqlDB - Stream Processing simplified!
ksqlDB - Stream Processing simplified!Guido Schmutz
 
Iceberg: a fast table format for S3
Iceberg: a fast table format for S3Iceberg: a fast table format for S3
Iceberg: a fast table format for S3DataWorks Summit
 
How to build a streaming Lakehouse with Flink, Kafka, and Hudi
How to build a streaming Lakehouse with Flink, Kafka, and HudiHow to build a streaming Lakehouse with Flink, Kafka, and Hudi
How to build a streaming Lakehouse with Flink, Kafka, and HudiFlink Forward
 
Data platform modernization with Databricks.pptx
Data platform modernization with Databricks.pptxData platform modernization with Databricks.pptx
Data platform modernization with Databricks.pptxCalvinSim10
 
Streaming Event Time Partitioning with Apache Flink and Apache Iceberg - Juli...
Streaming Event Time Partitioning with Apache Flink and Apache Iceberg - Juli...Streaming Event Time Partitioning with Apache Flink and Apache Iceberg - Juli...
Streaming Event Time Partitioning with Apache Flink and Apache Iceberg - Juli...Flink Forward
 

Mais procurados (20)

Architect’s Open-Source Guide for a Data Mesh Architecture
Architect’s Open-Source Guide for a Data Mesh ArchitectureArchitect’s Open-Source Guide for a Data Mesh Architecture
Architect’s Open-Source Guide for a Data Mesh Architecture
 
Modernizing to a Cloud Data Architecture
Modernizing to a Cloud Data ArchitectureModernizing to a Cloud Data Architecture
Modernizing to a Cloud Data Architecture
 
Delta lake and the delta architecture
Delta lake and the delta architectureDelta lake and the delta architecture
Delta lake and the delta architecture
 
Making Apache Spark Better with Delta Lake
Making Apache Spark Better with Delta LakeMaking Apache Spark Better with Delta Lake
Making Apache Spark Better with Delta Lake
 
Large Scale Lakehouse Implementation Using Structured Streaming
Large Scale Lakehouse Implementation Using Structured StreamingLarge Scale Lakehouse Implementation Using Structured Streaming
Large Scale Lakehouse Implementation Using Structured Streaming
 
Evening out the uneven: dealing with skew in Flink
Evening out the uneven: dealing with skew in FlinkEvening out the uneven: dealing with skew in Flink
Evening out the uneven: dealing with skew in Flink
 
Databricks Fundamentals
Databricks FundamentalsDatabricks Fundamentals
Databricks Fundamentals
 
Massive Data Processing in Adobe Using Delta Lake
Massive Data Processing in Adobe Using Delta LakeMassive Data Processing in Adobe Using Delta Lake
Massive Data Processing in Adobe Using Delta Lake
 
Apache Flink internals
Apache Flink internalsApache Flink internals
Apache Flink internals
 
Building an open data platform with apache iceberg
Building an open data platform with apache icebergBuilding an open data platform with apache iceberg
Building an open data platform with apache iceberg
 
Tame the small files problem and optimize data layout for streaming ingestion...
Tame the small files problem and optimize data layout for streaming ingestion...Tame the small files problem and optimize data layout for streaming ingestion...
Tame the small files problem and optimize data layout for streaming ingestion...
 
Delta Lake OSS: Create reliable and performant Data Lake by Quentin Ambard
Delta Lake OSS: Create reliable and performant Data Lake by Quentin AmbardDelta Lake OSS: Create reliable and performant Data Lake by Quentin Ambard
Delta Lake OSS: Create reliable and performant Data Lake by Quentin Ambard
 
Unified Big Data Processing with Apache Spark (QCON 2014)
Unified Big Data Processing with Apache Spark (QCON 2014)Unified Big Data Processing with Apache Spark (QCON 2014)
Unified Big Data Processing with Apache Spark (QCON 2014)
 
Databricks Delta Lake and Its Benefits
Databricks Delta Lake and Its BenefitsDatabricks Delta Lake and Its Benefits
Databricks Delta Lake and Its Benefits
 
The Parquet Format and Performance Optimization Opportunities
The Parquet Format and Performance Optimization OpportunitiesThe Parquet Format and Performance Optimization Opportunities
The Parquet Format and Performance Optimization Opportunities
 
ksqlDB - Stream Processing simplified!
ksqlDB - Stream Processing simplified!ksqlDB - Stream Processing simplified!
ksqlDB - Stream Processing simplified!
 
Iceberg: a fast table format for S3
Iceberg: a fast table format for S3Iceberg: a fast table format for S3
Iceberg: a fast table format for S3
 
How to build a streaming Lakehouse with Flink, Kafka, and Hudi
How to build a streaming Lakehouse with Flink, Kafka, and HudiHow to build a streaming Lakehouse with Flink, Kafka, and Hudi
How to build a streaming Lakehouse with Flink, Kafka, and Hudi
 
Data platform modernization with Databricks.pptx
Data platform modernization with Databricks.pptxData platform modernization with Databricks.pptx
Data platform modernization with Databricks.pptx
 
Streaming Event Time Partitioning with Apache Flink and Apache Iceberg - Juli...
Streaming Event Time Partitioning with Apache Flink and Apache Iceberg - Juli...Streaming Event Time Partitioning with Apache Flink and Apache Iceberg - Juli...
Streaming Event Time Partitioning with Apache Flink and Apache Iceberg - Juli...
 

Semelhante a Change Data Feed in Delta

Pulsar in the Lakehouse: Overview of Apache Pulsar and Delta Lake Connector -...
Pulsar in the Lakehouse: Overview of Apache Pulsar and Delta Lake Connector -...Pulsar in the Lakehouse: Overview of Apache Pulsar and Delta Lake Connector -...
Pulsar in the Lakehouse: Overview of Apache Pulsar and Delta Lake Connector -...StreamNative
 
Flink Forward Berlin 2017: Fabian Hueske - Using Stream and Batch Processing ...
Flink Forward Berlin 2017: Fabian Hueske - Using Stream and Batch Processing ...Flink Forward Berlin 2017: Fabian Hueske - Using Stream and Batch Processing ...
Flink Forward Berlin 2017: Fabian Hueske - Using Stream and Batch Processing ...Flink Forward
 
SQLServerDays2012_SSIS_CDC
SQLServerDays2012_SSIS_CDCSQLServerDays2012_SSIS_CDC
SQLServerDays2012_SSIS_CDCKoenVerbeeck
 
Kettleetltool 090522005630-phpapp01
Kettleetltool 090522005630-phpapp01Kettleetltool 090522005630-phpapp01
Kettleetltool 090522005630-phpapp01jade_22
 
Buckle Up! With Valerie Burchby and Xinran Waibe | Current 2022
Buckle Up! With Valerie Burchby and Xinran Waibe | Current 2022Buckle Up! With Valerie Burchby and Xinran Waibe | Current 2022
Buckle Up! With Valerie Burchby and Xinran Waibe | Current 2022HostedbyConfluent
 
Technical Deck Delta Live Tables.pdf
Technical Deck Delta Live Tables.pdfTechnical Deck Delta Live Tables.pdf
Technical Deck Delta Live Tables.pdfIlham31574
 
Part 3 - Data Warehousing Lecture at BW Cooperative State University (DHBW)
Part 3 - Data Warehousing Lecture at BW Cooperative State University (DHBW)Part 3 - Data Warehousing Lecture at BW Cooperative State University (DHBW)
Part 3 - Data Warehousing Lecture at BW Cooperative State University (DHBW)Andreas Buckenhofer
 
Capture the Streams of Database Changes
Capture the Streams of Database ChangesCapture the Streams of Database Changes
Capture the Streams of Database Changesconfluent
 
Delivering rapid-fire Analytics with Snowflake and Tableau
Delivering rapid-fire Analytics with Snowflake and TableauDelivering rapid-fire Analytics with Snowflake and Tableau
Delivering rapid-fire Analytics with Snowflake and TableauHarald Erb
 
Containerized Stream Engine to Build Modern Delta Lake
Containerized Stream Engine to Build Modern Delta LakeContainerized Stream Engine to Build Modern Delta Lake
Containerized Stream Engine to Build Modern Delta LakeDatabricks
 
Reshape Data Lake (as of 2020.07)
Reshape Data Lake (as of 2020.07)Reshape Data Lake (as of 2020.07)
Reshape Data Lake (as of 2020.07)Eric Sun
 
Flink SQL: The Challenges to Build a Streaming SQL Engine
Flink SQL: The Challenges to Build a Streaming SQL EngineFlink SQL: The Challenges to Build a Streaming SQL Engine
Flink SQL: The Challenges to Build a Streaming SQL EngineHostedbyConfluent
 
The Future of Data Science and Machine Learning at Scale: A Look at MLflow, D...
The Future of Data Science and Machine Learning at Scale: A Look at MLflow, D...The Future of Data Science and Machine Learning at Scale: A Look at MLflow, D...
The Future of Data Science and Machine Learning at Scale: A Look at MLflow, D...Databricks
 
CCI2017 - Considerations for Migrating Databases to Azure - Gianluca Sartori
CCI2017 - Considerations for Migrating Databases to Azure - Gianluca SartoriCCI2017 - Considerations for Migrating Databases to Azure - Gianluca Sartori
CCI2017 - Considerations for Migrating Databases to Azure - Gianluca Sartoriwalk2talk srl
 
Using Databricks as an Analysis Platform
Using Databricks as an Analysis PlatformUsing Databricks as an Analysis Platform
Using Databricks as an Analysis PlatformDatabricks
 
Sap bw lo extraction
Sap bw lo extractionSap bw lo extraction
Sap bw lo extractionObaid shaikh
 
Checklist for Upgrades and Migrations
Checklist for Upgrades and MigrationsChecklist for Upgrades and Migrations
Checklist for Upgrades and MigrationsMarkus Flechtner
 
Simplifying Disaster Recovery with Delta Lake
Simplifying Disaster Recovery with Delta LakeSimplifying Disaster Recovery with Delta Lake
Simplifying Disaster Recovery with Delta LakeDatabricks
 
Speeding Time to Insight with a Modern ELT Approach
Speeding Time to Insight with a Modern ELT ApproachSpeeding Time to Insight with a Modern ELT Approach
Speeding Time to Insight with a Modern ELT ApproachDatabricks
 

Semelhante a Change Data Feed in Delta (20)

Pulsar in the Lakehouse: Overview of Apache Pulsar and Delta Lake Connector -...
Pulsar in the Lakehouse: Overview of Apache Pulsar and Delta Lake Connector -...Pulsar in the Lakehouse: Overview of Apache Pulsar and Delta Lake Connector -...
Pulsar in the Lakehouse: Overview of Apache Pulsar and Delta Lake Connector -...
 
Flink Forward Berlin 2017: Fabian Hueske - Using Stream and Batch Processing ...
Flink Forward Berlin 2017: Fabian Hueske - Using Stream and Batch Processing ...Flink Forward Berlin 2017: Fabian Hueske - Using Stream and Batch Processing ...
Flink Forward Berlin 2017: Fabian Hueske - Using Stream and Batch Processing ...
 
SQLServerDays2012_SSIS_CDC
SQLServerDays2012_SSIS_CDCSQLServerDays2012_SSIS_CDC
SQLServerDays2012_SSIS_CDC
 
Kettleetltool 090522005630-phpapp01
Kettleetltool 090522005630-phpapp01Kettleetltool 090522005630-phpapp01
Kettleetltool 090522005630-phpapp01
 
Buckle Up! With Valerie Burchby and Xinran Waibe | Current 2022
Buckle Up! With Valerie Burchby and Xinran Waibe | Current 2022Buckle Up! With Valerie Burchby and Xinran Waibe | Current 2022
Buckle Up! With Valerie Burchby and Xinran Waibe | Current 2022
 
Technical Deck Delta Live Tables.pdf
Technical Deck Delta Live Tables.pdfTechnical Deck Delta Live Tables.pdf
Technical Deck Delta Live Tables.pdf
 
Part 3 - Data Warehousing Lecture at BW Cooperative State University (DHBW)
Part 3 - Data Warehousing Lecture at BW Cooperative State University (DHBW)Part 3 - Data Warehousing Lecture at BW Cooperative State University (DHBW)
Part 3 - Data Warehousing Lecture at BW Cooperative State University (DHBW)
 
Capture the Streams of Database Changes
Capture the Streams of Database ChangesCapture the Streams of Database Changes
Capture the Streams of Database Changes
 
From Data Warehouse to Lakehouse
From Data Warehouse to LakehouseFrom Data Warehouse to Lakehouse
From Data Warehouse to Lakehouse
 
Delivering rapid-fire Analytics with Snowflake and Tableau
Delivering rapid-fire Analytics with Snowflake and TableauDelivering rapid-fire Analytics with Snowflake and Tableau
Delivering rapid-fire Analytics with Snowflake and Tableau
 
Containerized Stream Engine to Build Modern Delta Lake
Containerized Stream Engine to Build Modern Delta LakeContainerized Stream Engine to Build Modern Delta Lake
Containerized Stream Engine to Build Modern Delta Lake
 
Reshape Data Lake (as of 2020.07)
Reshape Data Lake (as of 2020.07)Reshape Data Lake (as of 2020.07)
Reshape Data Lake (as of 2020.07)
 
Flink SQL: The Challenges to Build a Streaming SQL Engine
Flink SQL: The Challenges to Build a Streaming SQL EngineFlink SQL: The Challenges to Build a Streaming SQL Engine
Flink SQL: The Challenges to Build a Streaming SQL Engine
 
The Future of Data Science and Machine Learning at Scale: A Look at MLflow, D...
The Future of Data Science and Machine Learning at Scale: A Look at MLflow, D...The Future of Data Science and Machine Learning at Scale: A Look at MLflow, D...
The Future of Data Science and Machine Learning at Scale: A Look at MLflow, D...
 
CCI2017 - Considerations for Migrating Databases to Azure - Gianluca Sartori
CCI2017 - Considerations for Migrating Databases to Azure - Gianluca SartoriCCI2017 - Considerations for Migrating Databases to Azure - Gianluca Sartori
CCI2017 - Considerations for Migrating Databases to Azure - Gianluca Sartori
 
Using Databricks as an Analysis Platform
Using Databricks as an Analysis PlatformUsing Databricks as an Analysis Platform
Using Databricks as an Analysis Platform
 
Sap bw lo extraction
Sap bw lo extractionSap bw lo extraction
Sap bw lo extraction
 
Checklist for Upgrades and Migrations
Checklist for Upgrades and MigrationsChecklist for Upgrades and Migrations
Checklist for Upgrades and Migrations
 
Simplifying Disaster Recovery with Delta Lake
Simplifying Disaster Recovery with Delta LakeSimplifying Disaster Recovery with Delta Lake
Simplifying Disaster Recovery with Delta Lake
 
Speeding Time to Insight with a Modern ELT Approach
Speeding Time to Insight with a Modern ELT ApproachSpeeding Time to Insight with a Modern ELT Approach
Speeding Time to Insight with a Modern ELT Approach
 

Mais de Databricks

DW Migration Webinar-March 2022.pptx
DW Migration Webinar-March 2022.pptxDW Migration Webinar-March 2022.pptx
DW Migration Webinar-March 2022.pptxDatabricks
 
Data Lakehouse Symposium | Day 1 | Part 1
Data Lakehouse Symposium | Day 1 | Part 1Data Lakehouse Symposium | Day 1 | Part 1
Data Lakehouse Symposium | Day 1 | Part 1Databricks
 
Data Lakehouse Symposium | Day 1 | Part 2
Data Lakehouse Symposium | Day 1 | Part 2Data Lakehouse Symposium | Day 1 | Part 2
Data Lakehouse Symposium | Day 1 | Part 2Databricks
 
Data Lakehouse Symposium | Day 2
Data Lakehouse Symposium | Day 2Data Lakehouse Symposium | Day 2
Data Lakehouse Symposium | Day 2Databricks
 
Data Lakehouse Symposium | Day 4
Data Lakehouse Symposium | Day 4Data Lakehouse Symposium | Day 4
Data Lakehouse Symposium | Day 4Databricks
 
5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop
5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop
5 Critical Steps to Clean Your Data Swamp When Migrating Off of HadoopDatabricks
 
Democratizing Data Quality Through a Centralized Platform
Democratizing Data Quality Through a Centralized PlatformDemocratizing Data Quality Through a Centralized Platform
Democratizing Data Quality Through a Centralized PlatformDatabricks
 
Learn to Use Databricks for Data Science
Learn to Use Databricks for Data ScienceLearn to Use Databricks for Data Science
Learn to Use Databricks for Data ScienceDatabricks
 
Why APM Is Not the Same As ML Monitoring
Why APM Is Not the Same As ML MonitoringWhy APM Is Not the Same As ML Monitoring
Why APM Is Not the Same As ML MonitoringDatabricks
 
The Function, the Context, and the Data—Enabling ML Ops at Stitch Fix
The Function, the Context, and the Data—Enabling ML Ops at Stitch FixThe Function, the Context, and the Data—Enabling ML Ops at Stitch Fix
The Function, the Context, and the Data—Enabling ML Ops at Stitch FixDatabricks
 
Stage Level Scheduling Improving Big Data and AI Integration
Stage Level Scheduling Improving Big Data and AI IntegrationStage Level Scheduling Improving Big Data and AI Integration
Stage Level Scheduling Improving Big Data and AI IntegrationDatabricks
 
Simplify Data Conversion from Spark to TensorFlow and PyTorch
Simplify Data Conversion from Spark to TensorFlow and PyTorchSimplify Data Conversion from Spark to TensorFlow and PyTorch
Simplify Data Conversion from Spark to TensorFlow and PyTorchDatabricks
 
Scaling your Data Pipelines with Apache Spark on Kubernetes
Scaling your Data Pipelines with Apache Spark on KubernetesScaling your Data Pipelines with Apache Spark on Kubernetes
Scaling your Data Pipelines with Apache Spark on KubernetesDatabricks
 
Scaling and Unifying SciKit Learn and Apache Spark Pipelines
Scaling and Unifying SciKit Learn and Apache Spark PipelinesScaling and Unifying SciKit Learn and Apache Spark Pipelines
Scaling and Unifying SciKit Learn and Apache Spark PipelinesDatabricks
 
Sawtooth Windows for Feature Aggregations
Sawtooth Windows for Feature AggregationsSawtooth Windows for Feature Aggregations
Sawtooth Windows for Feature AggregationsDatabricks
 
Redis + Apache Spark = Swiss Army Knife Meets Kitchen Sink
Redis + Apache Spark = Swiss Army Knife Meets Kitchen SinkRedis + Apache Spark = Swiss Army Knife Meets Kitchen Sink
Redis + Apache Spark = Swiss Army Knife Meets Kitchen SinkDatabricks
 
Re-imagine Data Monitoring with whylogs and Spark
Re-imagine Data Monitoring with whylogs and SparkRe-imagine Data Monitoring with whylogs and Spark
Re-imagine Data Monitoring with whylogs and SparkDatabricks
 
Raven: End-to-end Optimization of ML Prediction Queries
Raven: End-to-end Optimization of ML Prediction QueriesRaven: End-to-end Optimization of ML Prediction Queries
Raven: End-to-end Optimization of ML Prediction QueriesDatabricks
 
Processing Large Datasets for ADAS Applications using Apache Spark
Processing Large Datasets for ADAS Applications using Apache SparkProcessing Large Datasets for ADAS Applications using Apache Spark
Processing Large Datasets for ADAS Applications using Apache SparkDatabricks
 
Machine Learning CI/CD for Email Attack Detection
Machine Learning CI/CD for Email Attack DetectionMachine Learning CI/CD for Email Attack Detection
Machine Learning CI/CD for Email Attack DetectionDatabricks
 

Mais de Databricks (20)

DW Migration Webinar-March 2022.pptx
DW Migration Webinar-March 2022.pptxDW Migration Webinar-March 2022.pptx
DW Migration Webinar-March 2022.pptx
 
Data Lakehouse Symposium | Day 1 | Part 1
Data Lakehouse Symposium | Day 1 | Part 1Data Lakehouse Symposium | Day 1 | Part 1
Data Lakehouse Symposium | Day 1 | Part 1
 
Data Lakehouse Symposium | Day 1 | Part 2
Data Lakehouse Symposium | Day 1 | Part 2Data Lakehouse Symposium | Day 1 | Part 2
Data Lakehouse Symposium | Day 1 | Part 2
 
Data Lakehouse Symposium | Day 2
Data Lakehouse Symposium | Day 2Data Lakehouse Symposium | Day 2
Data Lakehouse Symposium | Day 2
 
Data Lakehouse Symposium | Day 4
Data Lakehouse Symposium | Day 4Data Lakehouse Symposium | Day 4
Data Lakehouse Symposium | Day 4
 
5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop
5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop
5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop
 
Democratizing Data Quality Through a Centralized Platform
Democratizing Data Quality Through a Centralized PlatformDemocratizing Data Quality Through a Centralized Platform
Democratizing Data Quality Through a Centralized Platform
 
Learn to Use Databricks for Data Science
Learn to Use Databricks for Data ScienceLearn to Use Databricks for Data Science
Learn to Use Databricks for Data Science
 
Why APM Is Not the Same As ML Monitoring
Why APM Is Not the Same As ML MonitoringWhy APM Is Not the Same As ML Monitoring
Why APM Is Not the Same As ML Monitoring
 
The Function, the Context, and the Data—Enabling ML Ops at Stitch Fix
The Function, the Context, and the Data—Enabling ML Ops at Stitch FixThe Function, the Context, and the Data—Enabling ML Ops at Stitch Fix
The Function, the Context, and the Data—Enabling ML Ops at Stitch Fix
 
Stage Level Scheduling Improving Big Data and AI Integration
Stage Level Scheduling Improving Big Data and AI IntegrationStage Level Scheduling Improving Big Data and AI Integration
Stage Level Scheduling Improving Big Data and AI Integration
 
Simplify Data Conversion from Spark to TensorFlow and PyTorch
Simplify Data Conversion from Spark to TensorFlow and PyTorchSimplify Data Conversion from Spark to TensorFlow and PyTorch
Simplify Data Conversion from Spark to TensorFlow and PyTorch
 
Scaling your Data Pipelines with Apache Spark on Kubernetes
Scaling your Data Pipelines with Apache Spark on KubernetesScaling your Data Pipelines with Apache Spark on Kubernetes
Scaling your Data Pipelines with Apache Spark on Kubernetes
 
Scaling and Unifying SciKit Learn and Apache Spark Pipelines
Scaling and Unifying SciKit Learn and Apache Spark PipelinesScaling and Unifying SciKit Learn and Apache Spark Pipelines
Scaling and Unifying SciKit Learn and Apache Spark Pipelines
 
Sawtooth Windows for Feature Aggregations
Sawtooth Windows for Feature AggregationsSawtooth Windows for Feature Aggregations
Sawtooth Windows for Feature Aggregations
 
Redis + Apache Spark = Swiss Army Knife Meets Kitchen Sink
Redis + Apache Spark = Swiss Army Knife Meets Kitchen SinkRedis + Apache Spark = Swiss Army Knife Meets Kitchen Sink
Redis + Apache Spark = Swiss Army Knife Meets Kitchen Sink
 
Re-imagine Data Monitoring with whylogs and Spark
Re-imagine Data Monitoring with whylogs and SparkRe-imagine Data Monitoring with whylogs and Spark
Re-imagine Data Monitoring with whylogs and Spark
 
Raven: End-to-end Optimization of ML Prediction Queries
Raven: End-to-end Optimization of ML Prediction QueriesRaven: End-to-end Optimization of ML Prediction Queries
Raven: End-to-end Optimization of ML Prediction Queries
 
Processing Large Datasets for ADAS Applications using Apache Spark
Processing Large Datasets for ADAS Applications using Apache SparkProcessing Large Datasets for ADAS Applications using Apache Spark
Processing Large Datasets for ADAS Applications using Apache Spark
 
Machine Learning CI/CD for Email Attack Detection
Machine Learning CI/CD for Email Attack DetectionMachine Learning CI/CD for Email Attack Detection
Machine Learning CI/CD for Email Attack Detection
 

Último

Just Call Vip call girls Mysore Escorts ☎️9352988975 Two shot with one girl (...
Just Call Vip call girls Mysore Escorts ☎️9352988975 Two shot with one girl (...Just Call Vip call girls Mysore Escorts ☎️9352988975 Two shot with one girl (...
Just Call Vip call girls Mysore Escorts ☎️9352988975 Two shot with one girl (...gajnagarg
 
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...Valters Lauzums
 
Call Girls In Attibele ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Attibele ☎ 7737669865 🥵 Book Your One night StandCall Girls In Attibele ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Attibele ☎ 7737669865 🥵 Book Your One night Standamitlee9823
 
Call Girls In Doddaballapur Road ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Doddaballapur Road ☎ 7737669865 🥵 Book Your One night StandCall Girls In Doddaballapur Road ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Doddaballapur Road ☎ 7737669865 🥵 Book Your One night Standamitlee9823
 
Call Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service Bangalore
Call Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service BangaloreCall Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service Bangalore
Call Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service Bangaloreamitlee9823
 
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...amitlee9823
 
Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...
Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...
Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...amitlee9823
 
➥🔝 7737669865 🔝▻ Mathura Call-girls in Women Seeking Men 🔝Mathura🔝 Escorts...
➥🔝 7737669865 🔝▻ Mathura Call-girls in Women Seeking Men  🔝Mathura🔝   Escorts...➥🔝 7737669865 🔝▻ Mathura Call-girls in Women Seeking Men  🔝Mathura🔝   Escorts...
➥🔝 7737669865 🔝▻ Mathura Call-girls in Women Seeking Men 🔝Mathura🔝 Escorts...amitlee9823
 
➥🔝 7737669865 🔝▻ mahisagar Call-girls in Women Seeking Men 🔝mahisagar🔝 Esc...
➥🔝 7737669865 🔝▻ mahisagar Call-girls in Women Seeking Men  🔝mahisagar🔝   Esc...➥🔝 7737669865 🔝▻ mahisagar Call-girls in Women Seeking Men  🔝mahisagar🔝   Esc...
➥🔝 7737669865 🔝▻ mahisagar Call-girls in Women Seeking Men 🔝mahisagar🔝 Esc...amitlee9823
 
Call Girls Bommasandra Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Bommasandra Just Call 👗 7737669865 👗 Top Class Call Girl Service B...Call Girls Bommasandra Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Bommasandra Just Call 👗 7737669865 👗 Top Class Call Girl Service B...amitlee9823
 
VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...
VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...
VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...SUHANI PANDEY
 
Just Call Vip call girls kakinada Escorts ☎️9352988975 Two shot with one girl...
Just Call Vip call girls kakinada Escorts ☎️9352988975 Two shot with one girl...Just Call Vip call girls kakinada Escorts ☎️9352988975 Two shot with one girl...
Just Call Vip call girls kakinada Escorts ☎️9352988975 Two shot with one girl...gajnagarg
 
Just Call Vip call girls Bellary Escorts ☎️9352988975 Two shot with one girl ...
Just Call Vip call girls Bellary Escorts ☎️9352988975 Two shot with one girl ...Just Call Vip call girls Bellary Escorts ☎️9352988975 Two shot with one girl ...
Just Call Vip call girls Bellary Escorts ☎️9352988975 Two shot with one girl ...gajnagarg
 
SAC 25 Final National, Regional & Local Angel Group Investing Insights 2024 0...
SAC 25 Final National, Regional & Local Angel Group Investing Insights 2024 0...SAC 25 Final National, Regional & Local Angel Group Investing Insights 2024 0...
SAC 25 Final National, Regional & Local Angel Group Investing Insights 2024 0...Elaine Werffeli
 
DATA SUMMIT 24 Building Real-Time Pipelines With FLaNK
DATA SUMMIT 24  Building Real-Time Pipelines With FLaNKDATA SUMMIT 24  Building Real-Time Pipelines With FLaNK
DATA SUMMIT 24 Building Real-Time Pipelines With FLaNKTimothy Spann
 

Último (20)

Just Call Vip call girls Mysore Escorts ☎️9352988975 Two shot with one girl (...
Just Call Vip call girls Mysore Escorts ☎️9352988975 Two shot with one girl (...Just Call Vip call girls Mysore Escorts ☎️9352988975 Two shot with one girl (...
Just Call Vip call girls Mysore Escorts ☎️9352988975 Two shot with one girl (...
 
CHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICECHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
 
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...
 
Call Girls In Attibele ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Attibele ☎ 7737669865 🥵 Book Your One night StandCall Girls In Attibele ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Attibele ☎ 7737669865 🥵 Book Your One night Stand
 
Call Girls In Doddaballapur Road ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Doddaballapur Road ☎ 7737669865 🥵 Book Your One night StandCall Girls In Doddaballapur Road ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Doddaballapur Road ☎ 7737669865 🥵 Book Your One night Stand
 
Anomaly detection and data imputation within time series
Anomaly detection and data imputation within time seriesAnomaly detection and data imputation within time series
Anomaly detection and data imputation within time series
 
Call Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service Bangalore
Call Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service BangaloreCall Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service Bangalore
Call Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service Bangalore
 
Abortion pills in Jeddah | +966572737505 | Get Cytotec
Abortion pills in Jeddah | +966572737505 | Get CytotecAbortion pills in Jeddah | +966572737505 | Get Cytotec
Abortion pills in Jeddah | +966572737505 | Get Cytotec
 
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
 
Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...
Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...
Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...
 
➥🔝 7737669865 🔝▻ Mathura Call-girls in Women Seeking Men 🔝Mathura🔝 Escorts...
➥🔝 7737669865 🔝▻ Mathura Call-girls in Women Seeking Men  🔝Mathura🔝   Escorts...➥🔝 7737669865 🔝▻ Mathura Call-girls in Women Seeking Men  🔝Mathura🔝   Escorts...
➥🔝 7737669865 🔝▻ Mathura Call-girls in Women Seeking Men 🔝Mathura🔝 Escorts...
 
➥🔝 7737669865 🔝▻ mahisagar Call-girls in Women Seeking Men 🔝mahisagar🔝 Esc...
➥🔝 7737669865 🔝▻ mahisagar Call-girls in Women Seeking Men  🔝mahisagar🔝   Esc...➥🔝 7737669865 🔝▻ mahisagar Call-girls in Women Seeking Men  🔝mahisagar🔝   Esc...
➥🔝 7737669865 🔝▻ mahisagar Call-girls in Women Seeking Men 🔝mahisagar🔝 Esc...
 
CHEAP Call Girls in Rabindra Nagar (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Rabindra Nagar  (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICECHEAP Call Girls in Rabindra Nagar  (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Rabindra Nagar (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
 
Call Girls Bommasandra Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Bommasandra Just Call 👗 7737669865 👗 Top Class Call Girl Service B...Call Girls Bommasandra Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Bommasandra Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
 
VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...
VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...
VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...
 
Just Call Vip call girls kakinada Escorts ☎️9352988975 Two shot with one girl...
Just Call Vip call girls kakinada Escorts ☎️9352988975 Two shot with one girl...Just Call Vip call girls kakinada Escorts ☎️9352988975 Two shot with one girl...
Just Call Vip call girls kakinada Escorts ☎️9352988975 Two shot with one girl...
 
Just Call Vip call girls Bellary Escorts ☎️9352988975 Two shot with one girl ...
Just Call Vip call girls Bellary Escorts ☎️9352988975 Two shot with one girl ...Just Call Vip call girls Bellary Escorts ☎️9352988975 Two shot with one girl ...
Just Call Vip call girls Bellary Escorts ☎️9352988975 Two shot with one girl ...
 
Predicting Loan Approval: A Data Science Project
Predicting Loan Approval: A Data Science ProjectPredicting Loan Approval: A Data Science Project
Predicting Loan Approval: A Data Science Project
 
SAC 25 Final National, Regional & Local Angel Group Investing Insights 2024 0...
SAC 25 Final National, Regional & Local Angel Group Investing Insights 2024 0...SAC 25 Final National, Regional & Local Angel Group Investing Insights 2024 0...
SAC 25 Final National, Regional & Local Angel Group Investing Insights 2024 0...
 
DATA SUMMIT 24 Building Real-Time Pipelines With FLaNK
DATA SUMMIT 24  Building Real-Time Pipelines With FLaNKDATA SUMMIT 24  Building Real-Time Pipelines With FLaNK
DATA SUMMIT 24 Building Real-Time Pipelines With FLaNK
 

Change Data Feed in Delta

  • 1. Delta Change Data Feed Joseph Torres Rahul Mahadev Itai Weiss
  • 2. Agenda ▪ Data capture challenges ▪ Change Data Feed on the Lakehouse ▪ Capture Changes ▪ Process Changes ▪ Demo
  • 3. Who are we? - Jose Torres ● Software Engineer - Databricks ● Committer - Delta Lake, Apache Spark ● Database and cooking enthusiast
  • 4. Who are we? - Rahul Mahadev ● Software Engineer– Databricks ● Delta Lake Committer ● MS Computer Science, University of Illinois Urbana-Champaign
  • 5. Who are we? - Itai Weiss • Lead Solution Architect– Databricks • Working with Apache Spark! since v1.6 • Consulted for numerous firms across Financial, Insurance, Tech, Pharma, Manufacturing, Retail and Transportation
  • 6. Delta Lake An open, reliable, performant, and secure data storage and management layer for your data ■ Fresh & reliable data with a single source of truth ■ Data warehouse performance with data lake economics ■ Advanced security and standards to meet compliance needs ■ Open format
  • 7. Current Challenges Delta table v-1 Spark Insert Spark Updates Delta table v Spark Delete Spark Union ■ Big data increases the complexity of • Lots of data • Changes infrequently • Just want to process the newest changes Full Outer Join Inserts Updates Deletes
  • 8. ■ Read only the changed data ■ Avoid full table scans ■ Reduce compute and memory Solve data challenges with CDF Improve ETL pipelines Unify batch and streaming BI on your data lake Meet regulatory needs For All your use cases Delta table with CDF Spark Union Delta Change Data Feed
  • 9. Where Delta Change Data Feed Applies Raw Ingestion and History BRONZE Filtered, Cleaned, Augmented SILVER Business-level Aggregates GOLD CD F External feeds, Other CDC output, Extracts
  • 10. How Does Delta Change Data Feed Work? PK B A1 B1 A2 B2 A3 B3 Original Table (v1) PK B A2 Z2 A3 B3 A4 B4 Change data (Merged as v2) PK B Change Type Time Versio n A2 B2 Preimage 12:00:0 0 2 A2 Z2 Postimage 12:00:0 0 2 A3 B3 Delete 12:00:0 0 2 A4 B4 Insert 12:00:0 0 2 Change Data Feed Output A1 record did not receive an update or delete. So it will not be output by CDF.
  • 11. Consuming the Delta Change Data Feed Stream-based Consumption ● Delta Change Feed is processed as each source commit completes ● Rapid source commits can result in multiple Delta versions being included in a single micro-batch Batch Consumption ● Batches are constructed based in time-bound windows which may contain multiple Delta versions A2 B2 Preimage 12:00:00 2 A2 Z2 Postimage 12:00:00 2 A3 B3 Delete 12:00:00 2 A4 B4 Insert 12:00:00 2 A5 B5 Insert 12:08:0 0 3 A6 B6 Insert 12:09:00 4 A6 B6 Preimage 12:10:05 5 A6 Z6 Postimage 12:10:05 5 A5 B5 Insert 12:08:00 3 A6 B6 Insert 12:09:00 4 12:00 A6 B6 Preimage 12:10:05 5 A6 Z6 Postimag e 12:10:05 5 12:10:00 12:20 12:08 12:10:10 Stream - micro batches Batch - every 10 mins 12:00
  • 12. Storing the Delta Change Data Feed Changes to the table are stored as ordered, atomic units called commits Add 1.parquet Add 2.parquet Remove 1.parquet Remove 2.parquet Add 3.parquet 000000.json 000001.json …
  • 13. Storing the Delta Change Data Feed ● When Change Data Feed is enabled, commits will reference an additional set of files containing the change data events ● These files contains the updated and deleted records Add 1.parquet Add 2.parquet Remove 1.parquet Remove 2.parquet Add 3.parquet Cdf 1.parquet 000000.json 000001.json …
  • 14. Storing the Delta Change Data Feed ● Silver & Gold Tables ○ Improve Delta performance by processing only data changes, and simplify ETL/ELT operations ● Materialized Views ○ Create up-to-date, aggregated views of information based on the data changes ● Transmit Changes ○ Send only the data changes to downstream systems ● Audit Trail Table ○ Capture and store all the data changes over time, including inserts, updates, and deletes
  • 15. When to Use Delta Change Data Feed ● Delta changes include updates and/or deletes ● Delta changes are append only ● Small fraction of records updated in each batch ● Most records in the table updated in each batch ● Data received from external sources is in CDC format ● Data received comprises destructive loads ● Send data changes to downstream application ● Find and ingest data outside of the Lakehouse
  • 16. Getting started with Delta Change Data Feed spark.conf.set('spark.databri cks.delta.properties.defaults .enableChangeDataFeed', True) ALTER TABLE ... SET TBLPROPERTIES (delta.enableChangeDataFeed = true); Enable CDF on TABLE… … or Establish CDF on CLUSTER in SQL, Python, or Scala in Python, or Scala
  • 17. Using Delta Change Data Feed INSERT INTO TABLE ... USING delta ... as SELECT … FROM table_changes(...) SELECT … FROM table_changes('tableName', startingVersion [,endingVersion]) or SELECT … FROM table_changes('tableName', 'startingTimestamp' [,'endingTimestamp']) Query changes… … and store them in SQL, Python, or Scala
  • 18. Getting started with Delta Change Data Feed
  • 19. Let’s look at some notebooks
  • 20. Feedback Your feedback is important to us. Don’t forget to rate and review the sessions.