SlideShare uma empresa Scribd logo
1 de 116
Baixar para ler offline
ETL and Event Sourcing
Integration Architecture: Best Practice and Case Study
Marc Siegel - Panorama Education - Wed Feb 6 2019
ETL pipelines from external systems
ETL and Event Sourcing
Prerequisite knowledge
Familiarity with traditional ETL architectures:
Software systems that Extract data from external systems,
Transform them, and Load the resulting data sets into internal
systems, most often relational databases
Dissatisfaction with traditional ETL architectures / curiosity to
learn about and consider an alternative architecture
ETL and Event Sourcing
What you’ll learn
How Event Sourcing can be applied to ETL
How Determinism can be a property of a system
Value of treating the Past as First Class
What is ETL?
ETL
In a nutshell
ETL
In a nutshell
External
System
ETL
Traditional ETL Process
Extract
In a nutshell
External
System
ETL
Traditional ETL Process
Extract Transform
In a nutshell
External
System
ETL
Traditional ETL Process
Extract Transform Load
In a nutshell
External
System
ETL
Traditional ETL Process
Extract Transform Load
Internal
Database
In a nutshell
External
System
ETL
Traditional ETL Process
Extract Transform Load
Internal
Database
In a nutshell
External
System
Q: What is the System of Record?
What is the Source of Truth?
ETL
In a nutshell
External
System
System of Record
The authoritative data source for a given
data element or piece of information (1)
ETL
Internal
Database
In a nutshell
Source of Truth
A trusted data source that gives a complete
picture of the data object as a whole (2)
ETL
Traditional ETL Process
Extract Transform Load
Internal
Database
In a nutshell
External
System
ETL Challenges
Operational
Domain Modelling
Selective Attention
ETL Challenges
Operational
Domain Modelling
Selective Attention
Must rerun long ETL job to test edge case
Missing Interests:
● Decoupling
ETL Challenges
Operational
Domain Modelling
Selective Attention
Must rerun long ETL job to test edge case
Running ETL job can overwrite history
Missing Interests:
● Decoupling
● Determinism
Interests and Positions
ETL ELT Event Sourcing
Decoupling
Determinism
Modeling State Explicitly
Past as First Class
Low Cost
ETL Challenges
ETL Challenges
Operational
Domain Modelling
Selective Attention
Must create one true schema to load into
Missing Interests:
● Decoupling (of each interpretation)
ETL Challenges
Operational
Domain Modelling
Selective Attention
Must create one true schema to load into
Tend toward lowest common denominator
OR superset of all external model features
Missing Interests:
● Decoupling (of each interpretation)
● Modeling State Explicitly
Interests and Positions
ETL ELT Event Sourcing
Decoupling
Determinism
Modeling State Explicitly
Past as First Class
Low Cost
ETL Challenges
ETL Challenges
Operational
Domain Modelling
Selective Attention
From Psychology: the act of focusing on a
particular object while ignoring irrelevant
information
→ Can’t re-interpret past extracts
Missing Interests:
● Past as First Class
ETL Problems
Awareness Tests
YouTube:
● Basketball
● Monkey
Business
How many passes did the team in white make?
Interests and Positions
ETL ELT Event Sourcing
Decoupling
Determinism
Modeling State Explicitly
Past as First Class
Low Cost
ETL Challenges
ETL Advantage
Not just problems. Positive trade-offs of ETL?
● Low Costs: Training, framing, explaining
○ Training: Low cost to train new engineers in ETL concepts
○ Framing: No requirement for explicit domain modeling
○ Explaining: Intuitive to explain to non-engineers
Interests and Positions
ETL ELT Event Sourcing
Decoupling
Determinism
Modeling State Explicitly
Past as First Class
Low Cost
ETL Challenges
What is ELT?
ETL
Traditional ETL Process
Extract Transform Load
Internal
Database
In a nutshell
External
System
ETL and ELT
Traditional ETL Process
Extract Transform Load
Internal
Database
External
System
ETL and ELT
EL Process
Extract
Traditional ETL Process
Extract Transform Load
Internal
Database
Load
External
System
ETL and ELT
EL Process
Extract
Data Lake
or Blob or
File Store
Traditional ETL Process
Extract Transform Load
Internal
Database
Load
External
System
ETL and ELT
EL Process
Extract
Data Lake
or Blob or
File Store
T Process
Do anything here! Many vendors
offering various solutions.
Traditional ETL Process
Extract Transform Load
Internal
Database
Load
External
System
ETL and ELT
EL Process
Extract
Data Lake
or Blob or
File Store
T Process(es)
Do anything here! Many vendors
offering various solutions.
Traditional ETL Process
Extract Transform Load
Internal
Database
Load
External
System
Interests and Positions
ETL ELT Event Sourcing
Decoupling
Determinism
Modeling State Explicitly
Past as First Class
Low Cost
ETL and ELT
Interests and Positions
ETL ELT Event Sourcing
Decoupling
Determinism
Modeling State Explicitly
Past as First Class
Low Cost
ETL and ELT
ETL and ELT
EL Process
Extract
Data Lake
or Blob or
File Store
T Process(es)
Do anything here! Many vendors
offering various solutions.
Traditional ETL Process
Extract Transform Load
Internal
Database
Load
External
System
Interests and Positions
ETL ELT Event Sourcing
Decoupling
Determinism
Modeling State Explicitly
Past as First Class
Low Cost
ETL and ELT
Interests and Positions
ETL ELT Event Sourcing
Decoupling
Determinism
Modeling State Explicitly
Past as First Class
Low Cost
ETL and ELT
Interests and Positions
ETL ELT Event Sourcing
Decoupling
Determinism
Modeling State Explicitly
Past as First Class
Low Cost
ETL and ELT
What is Event Sourcing?
ETL
Traditional ETL Process
Extract Transform Load
Internal
Database
In a nutshell
External
System
ETL and ELT
EL Process
Extract
Data Lake
or Blob or
File Store
T Process(es)
Do anything here! Many vendors
offering various solutions.
Traditional ETL Process
Extract Transform Load
Internal
Database
Load
External
System
ETL and Event Sourcing
EL Process
Ex
Traditional ETL Process
Extract Transform Load
Internal
Database
Lo
External
System
ETL and Event Sourcing
EL Process
Ex
Traditional ETL Process
Extract Transform Load
Internal
Database
Lo
External
System
Immutable &
Sequential
Store
ETL and Event Sourcing
EL Process
Ex
Traditional ETL Process
Extract Transform Load
Internal
Database
Lo
External
System
Immutable &
Sequential
Store
TeTL Process
ETL and Event Sourcing
EL Process
Ex
Traditional ETL Process
Extract Transform Load
Internal
Database
Lo
External
System
Immutable &
Sequential
Store
TeTL Process
Domain
Events
Tr
ETL and Event Sourcing
EL Process
Ex
Traditional ETL Process
Extract Transform Load
Internal
Database
Lo
External
System
Immutable &
Sequential
Store
TeTL Process
Domain
Events
Tr Tr Lo
ETL and Event Sourcing
EL Process
Ex
Traditional ETL Process
Extract Transform Load
Internal
Database
Lo
External
System
Immutable &
Sequential
Store
Read
Model
TeTL Process
Domain
Events
Tr Tr Lo
ETL and Event Sourcing
EL Process
Ex
Traditional ETL Process
Extract Transform Load
Internal
Database
Lo
External
System
Immutable &
Sequential
Store
Read
Model(s)
TeTL Process(es)
Domain
Events
Tr Tr Lo
ETL and Event Sourcing
EL Process
Ex
Traditional ETL Process
Extract Transform Load
Internal
Database
Lo
External
System
Immutable &
Sequential
Store
Read
Model(s)
TeTL Process(es)
Domain
Events
Tr Tr Lo
1) Decouple extractions 2) Source of Truth: the extracts 3) Deterministic transform: to events + to model
regular expression mnemonic: from /(ETL)/ to /E{1}T*L*/ ← Extract once, Transform & Load
Infinitely
Interests and Positions
ETL ELT Event Sourcing
Decoupling
Determinism
Modeling State Explicitly
Past as First Class
Low Cost
ETL, ELT, and Event Sourcing
Interests and Positions
ETL ELT Event Sourcing
Decoupling
Determinism
Modeling State Explicitly
Past as First Class
Low Cost
ETL, ELT, and Event Sourcing
Interests and Positions
ETL ELT Event Sourcing
Decoupling
Determinism
Modeling State Explicitly
Past as First Class
Low Cost
ETL, ELT, and Event Sourcing
ETL and Event Sourcing
EL Process
Ex
Traditional ETL Process
Extract Transform Load
Internal
Database
Lo
External
System
Immutable &
Sequential
Store
Read
Model(s)
TeTL Process(es)
Domain
Events
Tr Tr Lo
1) Decouple extractions 2) Source of Truth: the extracts 3) Deterministic transform: to events + to model
regular expression mnemonic: from /(ETL)/ to /E{1}T*L*/ ← Extract once, Transform & Load
Infinitely
Interests and Positions
ETL ELT Event Sourcing
Decoupling
Determinism
Modeling State Explicitly
Past as First Class
Low Cost
ETL, ELT, and Event Sourcing
Event Sourcing Challenge
Not just advantages. Negative trade-offs of ES?
● High Costs: Training, framing, explaining
○ Training: Higher cost to train new engineers in ES concepts
○ Framing: Requirement for (lots of) explicit domain modeling
○ Explaining: Not necessarily intuitive to explain to non-engineers
Interests and Positions
ETL ELT Event Sourcing
Decoupling
Determinism
Modeling State Explicitly
Past as First Class
Low Cost
ETL, ELT, and Event Sourcing
How does Event Sourcing work?
Event Sourcing Basics
GradeCreated
student_id: 123
course_id: abc
grade: B+
GradeUpdated
student_id: 123
course_id: abc
grade: C
GradeUpdated
student_id: 123
course_id: abc
grade: A-
Events
Event Sourcing Basics
Events
State transitions are an important part of our problem space and
should be modeled within our domain.
Event Sourcing Basics
Events
State transitions are an important part of our problem space and
should be modeled within our domain.
Event Sourcing says all state is transient and you only store facts.
Event Sourcing Basics
Events
State transitions are an important part of our problem space and
should be modeled within our domain.
Event Sourcing says all state is transient and you only store facts.
Event: something that happened in the past; a fact; a state
transition.
Event Sourcing Basics
GradeCreated
student_id: 123
course_id: abc
grade: B+
GradeUpdated
student_id: 123
course_id: abc
grade: C
GradeUpdated
student_id: 123
course_id: abc
grade: A-
Events
Event Sourcing Basics
GradeCreated
student_id: 123
course_id: abc
grade: B+
GradeUpdated
student_id: 123
course_id: abc
grade: C
GradeUpdated
student_id: 123
course_id: abc
grade: A-
Events
Read Models
student_id course_id grade
123 abc B+
Event Sourcing Basics
GradeCreated
student_id: 123
course_id: abc
grade: B+
GradeUpdated
student_id: 123
course_id: abc
grade: C
GradeUpdated
student_id: 123
course_id: abc
grade: A-
Events
Read Models
student_id course_id grade
123 abc C
Event Sourcing Basics
GradeCreated
student_id: 123
course_id: abc
grade: B+
GradeUpdated
student_id: 123
course_id: abc
grade: C
GradeUpdated
student_id: 123
course_id: abc
grade: A-
Events
Read Models
student_id course_id grade
123 abc A-
Event Sourcing Basics
Read Models
Event Sourcing takes the term Read Model from CQRS.
Event Sourcing Basics
Read Models
Event Sourcing takes the term Read Model from CQRS.
A Read Model is an interpretation of a sequence of events, that is
optimized for answering a given set of queries (reads).
Event Sourcing Basics
Read Models
Event Sourcing takes the term Read Model from CQRS.
A Read Model is an interpretation of a sequence of events, that is
optimized for answering a given set of queries (reads).
Read Models: are independent representations of state that we
deterministically regenerate from events using projections.
Event Sourcing Basics
GradeCreated
student_id: 123
course_id: abc
grade: B+
GradeUpdated
student_id: 123
course_id: abc
grade: C
GradeUpdated
student_id: 123
course_id: abc
grade: A-
Events
Projections
def f(state, event)
state.where(
student_id: event.student_id,
course_id: event.course_id
).update(grade: event.grade)
end
student_id course_id grade
123 abc A-
Event Sourcing Basics
Projections
When we talk about Event Sourcing, current state is a left-fold of
previous behaviors.
Event Sourcing Basics
Projections
When we talk about Event Sourcing, current state is a left-fold of
previous behaviors.
We play back a stream of events, applying a function
f ( staten
, eventn
) -> staten+1
Event Sourcing Basics
Projections
When we talk about Event Sourcing, current state is a left-fold of
previous behaviors.
We play back a stream of events, applying a function
f ( staten
, eventn
) -> staten+1
Projection: a function through which we apply events in sequence
to deterministically derive the state of our application
Event Sourcing Basics
GradeCreated
student_id: 123
course_id: abc
grade: B+
GradeUpdated
student_id: 123
course_id: abc
grade: C
GradeUpdated
student_id: 123
course_id: abc
grade: A-
Events
Projections
def f(state, event)
state.where(
student_id: event.student_id,
course_id: event.course_id
).update(grade: event.grade)
end
student_id course_id grade
123 abc A-
Read Models
Event Sourcing Basics
Review
Event: something that happened in the past; a fact; a state
transition.
Projection: a function through which we apply events in sequence
to deterministically derive the state of our application
Read Models: are independent representations of state that we
deterministically regenerate from events using projections.
Event Sourcing Basics
GradeCreated
student_id: 123
course_id: abc
grade: B+
GradeUpdated
student_id: 123
course_id: abc
grade: C
GradeUpdated
student_id: 123
course_id: abc
grade: A-
Events
Projections
def f(state, event)
state.where(
student_id: event.student_id,
course_id: event.course_id
).update(grade: event.grade)
end
student_id course_id grade
123 abc A-
Read Models
Applying Event Sourcing to ETL
Applying Event Sourcing to ETL
Q: How to we get from ETL to explicitly modeled Domain Events?
Applying Event Sourcing to ETL
Q: How to we get from ETL to explicitly modeled Domain Events?
Immutable &
Sequential
Store
Read
Model(s)
TeTL Process(es)
Domain
Events
Tr Tr Lo
Applying Event Sourcing to ETL
Q: How to we get from ETL to explicitly modeled Domain Events?
A: Build an Observational Event Sourced system
Immutable &
Sequential
Store
Read
Model(s)
TeTL Process(es)
Domain
Events
Tr Tr Lo
Observations
student_id course_id grade
123 abc A-
Applying Event Sourcing to ETL
Domain Events
GradeUpdated
student_id: 123
course_id: abc
grade: A-
Read Models
Applying Event Sourcing to ETL
Observational
When capturing observations of external systems using Event
Sourcing, the events in our domain are the observations we capture.
Applying Event Sourcing to ETL
Observational
When capturing observations of external systems using Event
Sourcing, the events in our domain are the observations we capture.
Transforming a sequence of observations into explicitly modeled
domain events is the first projection.
Applying Event Sourcing to ETL
Observational
When capturing observations of external systems using Event
Sourcing, the events in our domain are the observations we capture.
Transforming a sequence of observations into explicitly modeled
domain events is the first projection.
Observational: an Event Sourced system where the event history is
of captured observations, and all state is derived from them.
Observations
student_id course_id grade
123 abc A-
Applying Event Sourcing to ETL
Domain Events
GradeUpdated
student_id: 123
course_id: abc
grade: A-
Read Models
Observations
student_id course_id grade
123 abc A-
Applying Event Sourcing to ETL
Domain Events
GradeUpdated
student_id: 123
course_id: abc
grade: A-
Read Models
Immutable &
Sequential
Store
Observations
student_id course_id grade
123 abc A-
Applying Event Sourcing to ETL
Domain Events
GradeUpdated
student_id: 123
course_id: abc
grade: A-
Read Models
Immutable &
Sequential
Store
TeTL Process(es)
Domain
Events
Tr
Observations
student_id course_id grade
123 abc A-
Applying Event Sourcing to ETL
Domain Events
GradeUpdated
student_id: 123
course_id: abc
grade: A-
Read Models
Immutable &
Sequential
Store
Read
Model(s)
TeTL Process(es)
Domain
Events
Tr Tr Lo
Case study: Event Sourcing ETL
Case study: Event Sourcing ETL
GradeUpdated
student_id: 1
date: Oct 11
course: Biology
grade: B-
GradeUpdated
student_id: 1
date: Oct 12
course: Biology
grade: B+
projection
observation events domain events
Case study: Event Sourcing ETL
GradeUpdated
student_id: 1
date: Oct 11
course: Biology
grade: B-
GradeUpdated
student_id: 1
date: Oct 12
course: Biology
grade: B+
projection
InProgressGrades
domain events
read models
Case study: Event Sourcing ETL
queried
InProgressGrades
read models
Case study: Event Sourcing ETL
Past as First Class
First
Later interpretation
Case study: Event Sourcing ETL
Past as First Class
First
Later interpretation
Case study: Event Sourcing ETL
Past as First Class
First
Later interpretation
Case study: Event Sourcing ETL
Determinism
Case study: Event Sourcing ETL
Determinism
● Read Models regenerated nightly from source of truth
○ Given the same history, we regenerate the same Read Models
Case study: Event Sourcing ETL
Determinism
● Read Models regenerated nightly from source of truth
○ Given the same history, we regenerate the same Read Models
● On-demand Read Model Comparison tool
○ Ensure no Read Model changes across larger code refactors
Case study: Event Sourcing ETL
Determinism
Read Model Comparison - Before and After Regeneration
Read Model DB Same DB, but later.Regenerations Run
Clone Read Model Clone Read Model Again
batch_BEFORE batch_AFTER
Case study: Event Sourcing ETL
Determinism
Read Model Comparison - Before and After Regeneration
Read Model DB Same DB, but later.Regenerations Run
Case study: Event Sourcing ETL
Determinism
Read Model Comparison - Before and After Regeneration
Read Model DB Same DB, but later.Regenerations Run
Case study: Event Sourcing ETL
Trade-off: Investment in Training
Case study: Event Sourcing ETL
Trade-off: Investment in Training
● 5 x 1 hr training videos + 1 hr discussions = 10 hrs
Case study: Event Sourcing ETL
Trade-off: Investment in Training
● 5 x 1 hr training videos + 1 hr discussions = 10 hrs
● Gentle ramp up w/ pairing and joint designs (weeks)
Case study: Event Sourcing ETL
Trade-off: Investment in Training
● 5 x 1 hr training videos + 1 hr discussions = 10 hrs
● Gentle ramp up w/ pairing and joint designs (weeks)
● Set expectation that architecture will feel different
Lessons Learned
At the two year mark
● Lessons learned: Thinnest extractions possible
● Lessons learned: Extracted files as Source of Truth
● Lessons learned: Many iterations on transformations
● Lessons learned: Why TL must be fast and run often
Lessons Learned
At the two year mark
Lessons learned: Thinnest extractions possible
My first version of converting [one type of] XML to CSV was
silently dropping rows, and would have lost all that data if not
for the ability to replace from original extract.
Lessons Learned
At the two year mark
Lessons learned: Extracted files as Source of Truth
Real world example of changing incorrect foreign key reference
(which had been nearly all overlapping previously).
Lessons Learned
At the two year mark
Lessons learned: Many iterations on interpretations
Very natural to handle the changes, big and small, that appear in
the format and content of the data we have extracted. Also, new
features sometimes mean new or changed interpretations.
Lessons Learned
At the two year mark
Lessons learned: Why TL must be fast and run often
Consider the “nightly restores from backups” to prove that you
can actually restore from backups. This practice exists in our
application rather than our tools. If regeneration ever gets too
slow to complete overnight, we could lose this.
Summary and Review
What we covered
How Event Sourcing can be applied to ETL
How Determinism can be a property of a system
Value of treating the Past as First Class
Learn More
Resources
● DDD, CQRS, and Event Sourcing videos by Greg Young
● CQRS documentation site by Edument AB
● Domain Driven Design book by Eric Evans
Keep in touch!
● twitter: @ms_ati
● email: msiegel@panoramaed.com

Mais conteúdo relacionado

Mais procurados

Data Lake - Multitenancy Best Practices
Data Lake - Multitenancy Best PracticesData Lake - Multitenancy Best Practices
Data Lake - Multitenancy Best PracticesCitiusTech
 
Building Reliable Data Lakes at Scale with Delta Lake
Building Reliable Data Lakes at Scale with Delta LakeBuilding Reliable Data Lakes at Scale with Delta Lake
Building Reliable Data Lakes at Scale with Delta LakeDatabricks
 
Change Data Feed in Delta
Change Data Feed in DeltaChange Data Feed in Delta
Change Data Feed in DeltaDatabricks
 
Azure Data Factory V2; The Data Flows
Azure Data Factory V2; The Data FlowsAzure Data Factory V2; The Data Flows
Azure Data Factory V2; The Data FlowsThomas Sykes
 
Databricks Delta Lake and Its Benefits
Databricks Delta Lake and Its BenefitsDatabricks Delta Lake and Its Benefits
Databricks Delta Lake and Its BenefitsDatabricks
 
Time to Talk about Data Mesh
Time to Talk about Data MeshTime to Talk about Data Mesh
Time to Talk about Data MeshLibbySchulze
 
Data Lakehouse, Data Mesh, and Data Fabric (r1)
Data Lakehouse, Data Mesh, and Data Fabric (r1)Data Lakehouse, Data Mesh, and Data Fabric (r1)
Data Lakehouse, Data Mesh, and Data Fabric (r1)James Serra
 
Speeding Time to Insight with a Modern ELT Approach
Speeding Time to Insight with a Modern ELT ApproachSpeeding Time to Insight with a Modern ELT Approach
Speeding Time to Insight with a Modern ELT ApproachDatabricks
 
Five Things to Consider About Data Mesh and Data Governance
Five Things to Consider About Data Mesh and Data GovernanceFive Things to Consider About Data Mesh and Data Governance
Five Things to Consider About Data Mesh and Data GovernanceDATAVERSITY
 
Big Data Security in Apache Projects by Gidon Gershinsky
Big Data Security in Apache Projects by Gidon GershinskyBig Data Security in Apache Projects by Gidon Gershinsky
Big Data Security in Apache Projects by Gidon GershinskyGidonGershinsky
 
Event Driven Architecture
Event Driven ArchitectureEvent Driven Architecture
Event Driven ArchitectureStefan Norberg
 
Differentiate Big Data vs Data Warehouse use cases for a cloud solution
Differentiate Big Data vs Data Warehouse use cases for a cloud solutionDifferentiate Big Data vs Data Warehouse use cases for a cloud solution
Differentiate Big Data vs Data Warehouse use cases for a cloud solutionJames Serra
 
Building an Effective Data Warehouse Architecture
Building an Effective Data Warehouse ArchitectureBuilding an Effective Data Warehouse Architecture
Building an Effective Data Warehouse ArchitectureJames Serra
 
Data Catalog for Better Data Discovery and Governance
Data Catalog for Better Data Discovery and GovernanceData Catalog for Better Data Discovery and Governance
Data Catalog for Better Data Discovery and GovernanceDenodo
 
Apache Iceberg - A Table Format for Hige Analytic Datasets
Apache Iceberg - A Table Format for Hige Analytic DatasetsApache Iceberg - A Table Format for Hige Analytic Datasets
Apache Iceberg - A Table Format for Hige Analytic DatasetsAlluxio, Inc.
 
Technical Deck Delta Live Tables.pdf
Technical Deck Delta Live Tables.pdfTechnical Deck Delta Live Tables.pdf
Technical Deck Delta Live Tables.pdfIlham31574
 
Azure Data Factory
Azure Data FactoryAzure Data Factory
Azure Data FactoryHARIHARAN R
 
Massive Data Processing in Adobe Using Delta Lake
Massive Data Processing in Adobe Using Delta LakeMassive Data Processing in Adobe Using Delta Lake
Massive Data Processing in Adobe Using Delta LakeDatabricks
 

Mais procurados (20)

Data Lake - Multitenancy Best Practices
Data Lake - Multitenancy Best PracticesData Lake - Multitenancy Best Practices
Data Lake - Multitenancy Best Practices
 
Building Reliable Data Lakes at Scale with Delta Lake
Building Reliable Data Lakes at Scale with Delta LakeBuilding Reliable Data Lakes at Scale with Delta Lake
Building Reliable Data Lakes at Scale with Delta Lake
 
Change Data Feed in Delta
Change Data Feed in DeltaChange Data Feed in Delta
Change Data Feed in Delta
 
data warehouse vs data lake
data warehouse vs data lakedata warehouse vs data lake
data warehouse vs data lake
 
Azure Data Factory V2; The Data Flows
Azure Data Factory V2; The Data FlowsAzure Data Factory V2; The Data Flows
Azure Data Factory V2; The Data Flows
 
Databricks Delta Lake and Its Benefits
Databricks Delta Lake and Its BenefitsDatabricks Delta Lake and Its Benefits
Databricks Delta Lake and Its Benefits
 
Time to Talk about Data Mesh
Time to Talk about Data MeshTime to Talk about Data Mesh
Time to Talk about Data Mesh
 
Data Lakehouse, Data Mesh, and Data Fabric (r1)
Data Lakehouse, Data Mesh, and Data Fabric (r1)Data Lakehouse, Data Mesh, and Data Fabric (r1)
Data Lakehouse, Data Mesh, and Data Fabric (r1)
 
Unified Stream and Batch Processing with Apache Flink
Unified Stream and Batch Processing with Apache FlinkUnified Stream and Batch Processing with Apache Flink
Unified Stream and Batch Processing with Apache Flink
 
Speeding Time to Insight with a Modern ELT Approach
Speeding Time to Insight with a Modern ELT ApproachSpeeding Time to Insight with a Modern ELT Approach
Speeding Time to Insight with a Modern ELT Approach
 
Five Things to Consider About Data Mesh and Data Governance
Five Things to Consider About Data Mesh and Data GovernanceFive Things to Consider About Data Mesh and Data Governance
Five Things to Consider About Data Mesh and Data Governance
 
Big Data Security in Apache Projects by Gidon Gershinsky
Big Data Security in Apache Projects by Gidon GershinskyBig Data Security in Apache Projects by Gidon Gershinsky
Big Data Security in Apache Projects by Gidon Gershinsky
 
Event Driven Architecture
Event Driven ArchitectureEvent Driven Architecture
Event Driven Architecture
 
Differentiate Big Data vs Data Warehouse use cases for a cloud solution
Differentiate Big Data vs Data Warehouse use cases for a cloud solutionDifferentiate Big Data vs Data Warehouse use cases for a cloud solution
Differentiate Big Data vs Data Warehouse use cases for a cloud solution
 
Building an Effective Data Warehouse Architecture
Building an Effective Data Warehouse ArchitectureBuilding an Effective Data Warehouse Architecture
Building an Effective Data Warehouse Architecture
 
Data Catalog for Better Data Discovery and Governance
Data Catalog for Better Data Discovery and GovernanceData Catalog for Better Data Discovery and Governance
Data Catalog for Better Data Discovery and Governance
 
Apache Iceberg - A Table Format for Hige Analytic Datasets
Apache Iceberg - A Table Format for Hige Analytic DatasetsApache Iceberg - A Table Format for Hige Analytic Datasets
Apache Iceberg - A Table Format for Hige Analytic Datasets
 
Technical Deck Delta Live Tables.pdf
Technical Deck Delta Live Tables.pdfTechnical Deck Delta Live Tables.pdf
Technical Deck Delta Live Tables.pdf
 
Azure Data Factory
Azure Data FactoryAzure Data Factory
Azure Data Factory
 
Massive Data Processing in Adobe Using Delta Lake
Massive Data Processing in Adobe Using Delta LakeMassive Data Processing in Adobe Using Delta Lake
Massive Data Processing in Adobe Using Delta Lake
 

Semelhante a ETL and Event Sourcing

Proven ETL Developer Interview Questions to Assess and Hire ETL Developers
Proven ETL Developer Interview Questions to Assess and Hire ETL DevelopersProven ETL Developer Interview Questions to Assess and Hire ETL Developers
Proven ETL Developer Interview Questions to Assess and Hire ETL DevelopersInterview Mocha
 
Etl testing contents
Etl testing contentsEtl testing contents
Etl testing contentsManoj Jagtap
 
DMDW 7. Student Presentation - Pentaho Data Integration (Kettle)
DMDW 7. Student Presentation - Pentaho Data Integration (Kettle)DMDW 7. Student Presentation - Pentaho Data Integration (Kettle)
DMDW 7. Student Presentation - Pentaho Data Integration (Kettle)Johannes Hoppe
 
Spark Summit EU talk by Bas Geerdink
Spark Summit EU talk by Bas GeerdinkSpark Summit EU talk by Bas Geerdink
Spark Summit EU talk by Bas GeerdinkSpark Summit
 
What is an ETL plan that Ralph Kimball identifies from the 34 Subsyste.docx
What is an ETL plan that Ralph Kimball identifies from the 34 Subsyste.docxWhat is an ETL plan that Ralph Kimball identifies from the 34 Subsyste.docx
What is an ETL plan that Ralph Kimball identifies from the 34 Subsyste.docxtodd471
 
NEOOUG 2010 Oracle Data Integrator Presentation
NEOOUG 2010 Oracle Data Integrator PresentationNEOOUG 2010 Oracle Data Integrator Presentation
NEOOUG 2010 Oracle Data Integrator Presentationaskankit
 
Data Warehouse - What you know about etl process is wrong
Data Warehouse - What you know about etl process is wrongData Warehouse - What you know about etl process is wrong
Data Warehouse - What you know about etl process is wrongMassimo Cenci
 
etl testing training in hyderabad.......
etl testing training in hyderabad.......etl testing training in hyderabad.......
etl testing training in hyderabad.......sowmyavibhin
 
Etl testing training institute in hyderabad
Etl testing training institute  in hyderabadEtl testing training institute  in hyderabad
Etl testing training institute in hyderabadswathi3zen
 
Big data analytics beyond beer and diapers
Big data analytics   beyond beer and diapersBig data analytics   beyond beer and diapers
Big data analytics beyond beer and diapersKai Zhao
 
“Extract, Load, Transform,” is another type of data integration process
“Extract, Load, Transform,” is another type of data integration process“Extract, Load, Transform,” is another type of data integration process
“Extract, Load, Transform,” is another type of data integration processRashidRiaz18
 
Cs511 data-extraction
Cs511 data-extractionCs511 data-extraction
Cs511 data-extractionBorseshweta
 
4_etl_testing_tutorial_till_chapter3-merged-compressed.pdf
4_etl_testing_tutorial_till_chapter3-merged-compressed.pdf4_etl_testing_tutorial_till_chapter3-merged-compressed.pdf
4_etl_testing_tutorial_till_chapter3-merged-compressed.pdfabhaybansal43
 
oracle_soultion_oracledataintegrator_goldengate_2021
oracle_soultion_oracledataintegrator_goldengate_2021oracle_soultion_oracledataintegrator_goldengate_2021
oracle_soultion_oracledataintegrator_goldengate_2021ssuser8ccb5a
 
CERN_DIS_ODI_OGG_final_oracle_golde.pptx
CERN_DIS_ODI_OGG_final_oracle_golde.pptxCERN_DIS_ODI_OGG_final_oracle_golde.pptx
CERN_DIS_ODI_OGG_final_oracle_golde.pptxcamyla81
 

Semelhante a ETL and Event Sourcing (20)

Proven ETL Developer Interview Questions to Assess and Hire ETL Developers
Proven ETL Developer Interview Questions to Assess and Hire ETL DevelopersProven ETL Developer Interview Questions to Assess and Hire ETL Developers
Proven ETL Developer Interview Questions to Assess and Hire ETL Developers
 
Etl testing contents
Etl testing contentsEtl testing contents
Etl testing contents
 
DMDW 7. Student Presentation - Pentaho Data Integration (Kettle)
DMDW 7. Student Presentation - Pentaho Data Integration (Kettle)DMDW 7. Student Presentation - Pentaho Data Integration (Kettle)
DMDW 7. Student Presentation - Pentaho Data Integration (Kettle)
 
Introduction to ETL and Data Integration
Introduction to ETL and Data IntegrationIntroduction to ETL and Data Integration
Introduction to ETL and Data Integration
 
Entity Framework 4
Entity Framework 4Entity Framework 4
Entity Framework 4
 
Spark Summit EU talk by Bas Geerdink
Spark Summit EU talk by Bas GeerdinkSpark Summit EU talk by Bas Geerdink
Spark Summit EU talk by Bas Geerdink
 
What is an ETL plan that Ralph Kimball identifies from the 34 Subsyste.docx
What is an ETL plan that Ralph Kimball identifies from the 34 Subsyste.docxWhat is an ETL plan that Ralph Kimball identifies from the 34 Subsyste.docx
What is an ETL plan that Ralph Kimball identifies from the 34 Subsyste.docx
 
Etl techniques
Etl techniquesEtl techniques
Etl techniques
 
NEOOUG 2010 Oracle Data Integrator Presentation
NEOOUG 2010 Oracle Data Integrator PresentationNEOOUG 2010 Oracle Data Integrator Presentation
NEOOUG 2010 Oracle Data Integrator Presentation
 
Data Warehouse - What you know about etl process is wrong
Data Warehouse - What you know about etl process is wrongData Warehouse - What you know about etl process is wrong
Data Warehouse - What you know about etl process is wrong
 
etl testing training in hyderabad.......
etl testing training in hyderabad.......etl testing training in hyderabad.......
etl testing training in hyderabad.......
 
Etl testing training institute in hyderabad
Etl testing training institute  in hyderabadEtl testing training institute  in hyderabad
Etl testing training institute in hyderabad
 
What is ETL?
What is ETL?What is ETL?
What is ETL?
 
Big data analytics beyond beer and diapers
Big data analytics   beyond beer and diapersBig data analytics   beyond beer and diapers
Big data analytics beyond beer and diapers
 
“Extract, Load, Transform,” is another type of data integration process
“Extract, Load, Transform,” is another type of data integration process“Extract, Load, Transform,” is another type of data integration process
“Extract, Load, Transform,” is another type of data integration process
 
LPR - Week 1
LPR - Week 1LPR - Week 1
LPR - Week 1
 
Cs511 data-extraction
Cs511 data-extractionCs511 data-extraction
Cs511 data-extraction
 
4_etl_testing_tutorial_till_chapter3-merged-compressed.pdf
4_etl_testing_tutorial_till_chapter3-merged-compressed.pdf4_etl_testing_tutorial_till_chapter3-merged-compressed.pdf
4_etl_testing_tutorial_till_chapter3-merged-compressed.pdf
 
oracle_soultion_oracledataintegrator_goldengate_2021
oracle_soultion_oracledataintegrator_goldengate_2021oracle_soultion_oracledataintegrator_goldengate_2021
oracle_soultion_oracledataintegrator_goldengate_2021
 
CERN_DIS_ODI_OGG_final_oracle_golde.pptx
CERN_DIS_ODI_OGG_final_oracle_golde.pptxCERN_DIS_ODI_OGG_final_oracle_golde.pptx
CERN_DIS_ODI_OGG_final_oracle_golde.pptx
 

Último

Java Programming :Event Handling(Types of Events)
Java Programming :Event Handling(Types of Events)Java Programming :Event Handling(Types of Events)
Java Programming :Event Handling(Types of Events)simmis5
 
Call Girls Service Nagpur Tanvi Call 7001035870 Meet With Nagpur Escorts
Call Girls Service Nagpur Tanvi Call 7001035870 Meet With Nagpur EscortsCall Girls Service Nagpur Tanvi Call 7001035870 Meet With Nagpur Escorts
Call Girls Service Nagpur Tanvi Call 7001035870 Meet With Nagpur EscortsCall Girls in Nagpur High Profile
 
Call for Papers - Educational Administration: Theory and Practice, E-ISSN: 21...
Call for Papers - Educational Administration: Theory and Practice, E-ISSN: 21...Call for Papers - Educational Administration: Theory and Practice, E-ISSN: 21...
Call for Papers - Educational Administration: Theory and Practice, E-ISSN: 21...Christo Ananth
 
(PRIYA) Rajgurunagar Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(PRIYA) Rajgurunagar Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...(PRIYA) Rajgurunagar Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(PRIYA) Rajgurunagar Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...ranjana rawat
 
KubeKraft presentation @CloudNativeHooghly
KubeKraft presentation @CloudNativeHooghlyKubeKraft presentation @CloudNativeHooghly
KubeKraft presentation @CloudNativeHooghlysanyuktamishra911
 
Top Rated Pune Call Girls Budhwar Peth ⟟ 6297143586 ⟟ Call Me For Genuine Se...
Top Rated  Pune Call Girls Budhwar Peth ⟟ 6297143586 ⟟ Call Me For Genuine Se...Top Rated  Pune Call Girls Budhwar Peth ⟟ 6297143586 ⟟ Call Me For Genuine Se...
Top Rated Pune Call Girls Budhwar Peth ⟟ 6297143586 ⟟ Call Me For Genuine Se...Call Girls in Nagpur High Profile
 
MANUFACTURING PROCESS-II UNIT-5 NC MACHINE TOOLS
MANUFACTURING PROCESS-II UNIT-5 NC MACHINE TOOLSMANUFACTURING PROCESS-II UNIT-5 NC MACHINE TOOLS
MANUFACTURING PROCESS-II UNIT-5 NC MACHINE TOOLSSIVASHANKAR N
 
(SHREYA) Chakan Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Esc...
(SHREYA) Chakan Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Esc...(SHREYA) Chakan Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Esc...
(SHREYA) Chakan Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Esc...ranjana rawat
 
OSVC_Meta-Data based Simulation Automation to overcome Verification Challenge...
OSVC_Meta-Data based Simulation Automation to overcome Verification Challenge...OSVC_Meta-Data based Simulation Automation to overcome Verification Challenge...
OSVC_Meta-Data based Simulation Automation to overcome Verification Challenge...Soham Mondal
 
High Profile Call Girls Nagpur Meera Call 7001035870 Meet With Nagpur Escorts
High Profile Call Girls Nagpur Meera Call 7001035870 Meet With Nagpur EscortsHigh Profile Call Girls Nagpur Meera Call 7001035870 Meet With Nagpur Escorts
High Profile Call Girls Nagpur Meera Call 7001035870 Meet With Nagpur EscortsCall Girls in Nagpur High Profile
 
Introduction to IEEE STANDARDS and its different types.pptx
Introduction to IEEE STANDARDS and its different types.pptxIntroduction to IEEE STANDARDS and its different types.pptx
Introduction to IEEE STANDARDS and its different types.pptxupamatechverse
 
Call Girls in Nagpur Suman Call 7001035870 Meet With Nagpur Escorts
Call Girls in Nagpur Suman Call 7001035870 Meet With Nagpur EscortsCall Girls in Nagpur Suman Call 7001035870 Meet With Nagpur Escorts
Call Girls in Nagpur Suman Call 7001035870 Meet With Nagpur EscortsCall Girls in Nagpur High Profile
 
Processing & Properties of Floor and Wall Tiles.pptx
Processing & Properties of Floor and Wall Tiles.pptxProcessing & Properties of Floor and Wall Tiles.pptx
Processing & Properties of Floor and Wall Tiles.pptxpranjaldaimarysona
 
VIP Call Girls Service Hitech City Hyderabad Call +91-8250192130
VIP Call Girls Service Hitech City Hyderabad Call +91-8250192130VIP Call Girls Service Hitech City Hyderabad Call +91-8250192130
VIP Call Girls Service Hitech City Hyderabad Call +91-8250192130Suhani Kapoor
 
Introduction and different types of Ethernet.pptx
Introduction and different types of Ethernet.pptxIntroduction and different types of Ethernet.pptx
Introduction and different types of Ethernet.pptxupamatechverse
 
result management system report for college project
result management system report for college projectresult management system report for college project
result management system report for college projectTonystark477637
 
Porous Ceramics seminar and technical writing
Porous Ceramics seminar and technical writingPorous Ceramics seminar and technical writing
Porous Ceramics seminar and technical writingrakeshbaidya232001
 
(ANJALI) Dange Chowk Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(ANJALI) Dange Chowk Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...(ANJALI) Dange Chowk Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(ANJALI) Dange Chowk Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...ranjana rawat
 

Último (20)

Java Programming :Event Handling(Types of Events)
Java Programming :Event Handling(Types of Events)Java Programming :Event Handling(Types of Events)
Java Programming :Event Handling(Types of Events)
 
Call Girls Service Nagpur Tanvi Call 7001035870 Meet With Nagpur Escorts
Call Girls Service Nagpur Tanvi Call 7001035870 Meet With Nagpur EscortsCall Girls Service Nagpur Tanvi Call 7001035870 Meet With Nagpur Escorts
Call Girls Service Nagpur Tanvi Call 7001035870 Meet With Nagpur Escorts
 
Call for Papers - Educational Administration: Theory and Practice, E-ISSN: 21...
Call for Papers - Educational Administration: Theory and Practice, E-ISSN: 21...Call for Papers - Educational Administration: Theory and Practice, E-ISSN: 21...
Call for Papers - Educational Administration: Theory and Practice, E-ISSN: 21...
 
(PRIYA) Rajgurunagar Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(PRIYA) Rajgurunagar Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...(PRIYA) Rajgurunagar Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(PRIYA) Rajgurunagar Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
 
KubeKraft presentation @CloudNativeHooghly
KubeKraft presentation @CloudNativeHooghlyKubeKraft presentation @CloudNativeHooghly
KubeKraft presentation @CloudNativeHooghly
 
Top Rated Pune Call Girls Budhwar Peth ⟟ 6297143586 ⟟ Call Me For Genuine Se...
Top Rated  Pune Call Girls Budhwar Peth ⟟ 6297143586 ⟟ Call Me For Genuine Se...Top Rated  Pune Call Girls Budhwar Peth ⟟ 6297143586 ⟟ Call Me For Genuine Se...
Top Rated Pune Call Girls Budhwar Peth ⟟ 6297143586 ⟟ Call Me For Genuine Se...
 
MANUFACTURING PROCESS-II UNIT-5 NC MACHINE TOOLS
MANUFACTURING PROCESS-II UNIT-5 NC MACHINE TOOLSMANUFACTURING PROCESS-II UNIT-5 NC MACHINE TOOLS
MANUFACTURING PROCESS-II UNIT-5 NC MACHINE TOOLS
 
(SHREYA) Chakan Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Esc...
(SHREYA) Chakan Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Esc...(SHREYA) Chakan Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Esc...
(SHREYA) Chakan Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Esc...
 
OSVC_Meta-Data based Simulation Automation to overcome Verification Challenge...
OSVC_Meta-Data based Simulation Automation to overcome Verification Challenge...OSVC_Meta-Data based Simulation Automation to overcome Verification Challenge...
OSVC_Meta-Data based Simulation Automation to overcome Verification Challenge...
 
Roadmap to Membership of RICS - Pathways and Routes
Roadmap to Membership of RICS - Pathways and RoutesRoadmap to Membership of RICS - Pathways and Routes
Roadmap to Membership of RICS - Pathways and Routes
 
High Profile Call Girls Nagpur Meera Call 7001035870 Meet With Nagpur Escorts
High Profile Call Girls Nagpur Meera Call 7001035870 Meet With Nagpur EscortsHigh Profile Call Girls Nagpur Meera Call 7001035870 Meet With Nagpur Escorts
High Profile Call Girls Nagpur Meera Call 7001035870 Meet With Nagpur Escorts
 
Introduction to IEEE STANDARDS and its different types.pptx
Introduction to IEEE STANDARDS and its different types.pptxIntroduction to IEEE STANDARDS and its different types.pptx
Introduction to IEEE STANDARDS and its different types.pptx
 
Call Girls in Nagpur Suman Call 7001035870 Meet With Nagpur Escorts
Call Girls in Nagpur Suman Call 7001035870 Meet With Nagpur EscortsCall Girls in Nagpur Suman Call 7001035870 Meet With Nagpur Escorts
Call Girls in Nagpur Suman Call 7001035870 Meet With Nagpur Escorts
 
Processing & Properties of Floor and Wall Tiles.pptx
Processing & Properties of Floor and Wall Tiles.pptxProcessing & Properties of Floor and Wall Tiles.pptx
Processing & Properties of Floor and Wall Tiles.pptx
 
VIP Call Girls Service Hitech City Hyderabad Call +91-8250192130
VIP Call Girls Service Hitech City Hyderabad Call +91-8250192130VIP Call Girls Service Hitech City Hyderabad Call +91-8250192130
VIP Call Girls Service Hitech City Hyderabad Call +91-8250192130
 
Water Industry Process Automation & Control Monthly - April 2024
Water Industry Process Automation & Control Monthly - April 2024Water Industry Process Automation & Control Monthly - April 2024
Water Industry Process Automation & Control Monthly - April 2024
 
Introduction and different types of Ethernet.pptx
Introduction and different types of Ethernet.pptxIntroduction and different types of Ethernet.pptx
Introduction and different types of Ethernet.pptx
 
result management system report for college project
result management system report for college projectresult management system report for college project
result management system report for college project
 
Porous Ceramics seminar and technical writing
Porous Ceramics seminar and technical writingPorous Ceramics seminar and technical writing
Porous Ceramics seminar and technical writing
 
(ANJALI) Dange Chowk Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(ANJALI) Dange Chowk Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...(ANJALI) Dange Chowk Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(ANJALI) Dange Chowk Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
 

ETL and Event Sourcing

  • 1. ETL and Event Sourcing Integration Architecture: Best Practice and Case Study Marc Siegel - Panorama Education - Wed Feb 6 2019
  • 2. ETL pipelines from external systems
  • 3. ETL and Event Sourcing Prerequisite knowledge Familiarity with traditional ETL architectures: Software systems that Extract data from external systems, Transform them, and Load the resulting data sets into internal systems, most often relational databases Dissatisfaction with traditional ETL architectures / curiosity to learn about and consider an alternative architecture
  • 4. ETL and Event Sourcing What you’ll learn How Event Sourcing can be applied to ETL How Determinism can be a property of a system Value of treating the Past as First Class
  • 8. ETL Traditional ETL Process Extract In a nutshell External System
  • 9. ETL Traditional ETL Process Extract Transform In a nutshell External System
  • 10. ETL Traditional ETL Process Extract Transform Load In a nutshell External System
  • 11. ETL Traditional ETL Process Extract Transform Load Internal Database In a nutshell External System
  • 12. ETL Traditional ETL Process Extract Transform Load Internal Database In a nutshell External System Q: What is the System of Record? What is the Source of Truth?
  • 13. ETL In a nutshell External System System of Record The authoritative data source for a given data element or piece of information (1)
  • 14. ETL Internal Database In a nutshell Source of Truth A trusted data source that gives a complete picture of the data object as a whole (2)
  • 15. ETL Traditional ETL Process Extract Transform Load Internal Database In a nutshell External System
  • 16.
  • 18. ETL Challenges Operational Domain Modelling Selective Attention Must rerun long ETL job to test edge case Missing Interests: ● Decoupling
  • 19. ETL Challenges Operational Domain Modelling Selective Attention Must rerun long ETL job to test edge case Running ETL job can overwrite history Missing Interests: ● Decoupling ● Determinism
  • 20. Interests and Positions ETL ELT Event Sourcing Decoupling Determinism Modeling State Explicitly Past as First Class Low Cost ETL Challenges
  • 21. ETL Challenges Operational Domain Modelling Selective Attention Must create one true schema to load into Missing Interests: ● Decoupling (of each interpretation)
  • 22. ETL Challenges Operational Domain Modelling Selective Attention Must create one true schema to load into Tend toward lowest common denominator OR superset of all external model features Missing Interests: ● Decoupling (of each interpretation) ● Modeling State Explicitly
  • 23. Interests and Positions ETL ELT Event Sourcing Decoupling Determinism Modeling State Explicitly Past as First Class Low Cost ETL Challenges
  • 24. ETL Challenges Operational Domain Modelling Selective Attention From Psychology: the act of focusing on a particular object while ignoring irrelevant information → Can’t re-interpret past extracts Missing Interests: ● Past as First Class
  • 25. ETL Problems Awareness Tests YouTube: ● Basketball ● Monkey Business How many passes did the team in white make?
  • 26. Interests and Positions ETL ELT Event Sourcing Decoupling Determinism Modeling State Explicitly Past as First Class Low Cost ETL Challenges
  • 27. ETL Advantage Not just problems. Positive trade-offs of ETL? ● Low Costs: Training, framing, explaining ○ Training: Low cost to train new engineers in ETL concepts ○ Framing: No requirement for explicit domain modeling ○ Explaining: Intuitive to explain to non-engineers
  • 28. Interests and Positions ETL ELT Event Sourcing Decoupling Determinism Modeling State Explicitly Past as First Class Low Cost ETL Challenges
  • 29.
  • 31. ETL Traditional ETL Process Extract Transform Load Internal Database In a nutshell External System
  • 32. ETL and ELT Traditional ETL Process Extract Transform Load Internal Database External System
  • 33. ETL and ELT EL Process Extract Traditional ETL Process Extract Transform Load Internal Database Load External System
  • 34. ETL and ELT EL Process Extract Data Lake or Blob or File Store Traditional ETL Process Extract Transform Load Internal Database Load External System
  • 35. ETL and ELT EL Process Extract Data Lake or Blob or File Store T Process Do anything here! Many vendors offering various solutions. Traditional ETL Process Extract Transform Load Internal Database Load External System
  • 36. ETL and ELT EL Process Extract Data Lake or Blob or File Store T Process(es) Do anything here! Many vendors offering various solutions. Traditional ETL Process Extract Transform Load Internal Database Load External System
  • 37. Interests and Positions ETL ELT Event Sourcing Decoupling Determinism Modeling State Explicitly Past as First Class Low Cost ETL and ELT
  • 38. Interests and Positions ETL ELT Event Sourcing Decoupling Determinism Modeling State Explicitly Past as First Class Low Cost ETL and ELT
  • 39. ETL and ELT EL Process Extract Data Lake or Blob or File Store T Process(es) Do anything here! Many vendors offering various solutions. Traditional ETL Process Extract Transform Load Internal Database Load External System
  • 40. Interests and Positions ETL ELT Event Sourcing Decoupling Determinism Modeling State Explicitly Past as First Class Low Cost ETL and ELT
  • 41. Interests and Positions ETL ELT Event Sourcing Decoupling Determinism Modeling State Explicitly Past as First Class Low Cost ETL and ELT
  • 42. Interests and Positions ETL ELT Event Sourcing Decoupling Determinism Modeling State Explicitly Past as First Class Low Cost ETL and ELT
  • 43. What is Event Sourcing?
  • 44. ETL Traditional ETL Process Extract Transform Load Internal Database In a nutshell External System
  • 45. ETL and ELT EL Process Extract Data Lake or Blob or File Store T Process(es) Do anything here! Many vendors offering various solutions. Traditional ETL Process Extract Transform Load Internal Database Load External System
  • 46. ETL and Event Sourcing EL Process Ex Traditional ETL Process Extract Transform Load Internal Database Lo External System
  • 47. ETL and Event Sourcing EL Process Ex Traditional ETL Process Extract Transform Load Internal Database Lo External System Immutable & Sequential Store
  • 48. ETL and Event Sourcing EL Process Ex Traditional ETL Process Extract Transform Load Internal Database Lo External System Immutable & Sequential Store TeTL Process
  • 49. ETL and Event Sourcing EL Process Ex Traditional ETL Process Extract Transform Load Internal Database Lo External System Immutable & Sequential Store TeTL Process Domain Events Tr
  • 50. ETL and Event Sourcing EL Process Ex Traditional ETL Process Extract Transform Load Internal Database Lo External System Immutable & Sequential Store TeTL Process Domain Events Tr Tr Lo
  • 51. ETL and Event Sourcing EL Process Ex Traditional ETL Process Extract Transform Load Internal Database Lo External System Immutable & Sequential Store Read Model TeTL Process Domain Events Tr Tr Lo
  • 52. ETL and Event Sourcing EL Process Ex Traditional ETL Process Extract Transform Load Internal Database Lo External System Immutable & Sequential Store Read Model(s) TeTL Process(es) Domain Events Tr Tr Lo
  • 53. ETL and Event Sourcing EL Process Ex Traditional ETL Process Extract Transform Load Internal Database Lo External System Immutable & Sequential Store Read Model(s) TeTL Process(es) Domain Events Tr Tr Lo 1) Decouple extractions 2) Source of Truth: the extracts 3) Deterministic transform: to events + to model regular expression mnemonic: from /(ETL)/ to /E{1}T*L*/ ← Extract once, Transform & Load Infinitely
  • 54. Interests and Positions ETL ELT Event Sourcing Decoupling Determinism Modeling State Explicitly Past as First Class Low Cost ETL, ELT, and Event Sourcing
  • 55. Interests and Positions ETL ELT Event Sourcing Decoupling Determinism Modeling State Explicitly Past as First Class Low Cost ETL, ELT, and Event Sourcing
  • 56. Interests and Positions ETL ELT Event Sourcing Decoupling Determinism Modeling State Explicitly Past as First Class Low Cost ETL, ELT, and Event Sourcing
  • 57. ETL and Event Sourcing EL Process Ex Traditional ETL Process Extract Transform Load Internal Database Lo External System Immutable & Sequential Store Read Model(s) TeTL Process(es) Domain Events Tr Tr Lo 1) Decouple extractions 2) Source of Truth: the extracts 3) Deterministic transform: to events + to model regular expression mnemonic: from /(ETL)/ to /E{1}T*L*/ ← Extract once, Transform & Load Infinitely
  • 58. Interests and Positions ETL ELT Event Sourcing Decoupling Determinism Modeling State Explicitly Past as First Class Low Cost ETL, ELT, and Event Sourcing
  • 59. Event Sourcing Challenge Not just advantages. Negative trade-offs of ES? ● High Costs: Training, framing, explaining ○ Training: Higher cost to train new engineers in ES concepts ○ Framing: Requirement for (lots of) explicit domain modeling ○ Explaining: Not necessarily intuitive to explain to non-engineers
  • 60. Interests and Positions ETL ELT Event Sourcing Decoupling Determinism Modeling State Explicitly Past as First Class Low Cost ETL, ELT, and Event Sourcing
  • 61.
  • 62. How does Event Sourcing work?
  • 63. Event Sourcing Basics GradeCreated student_id: 123 course_id: abc grade: B+ GradeUpdated student_id: 123 course_id: abc grade: C GradeUpdated student_id: 123 course_id: abc grade: A- Events
  • 64. Event Sourcing Basics Events State transitions are an important part of our problem space and should be modeled within our domain.
  • 65. Event Sourcing Basics Events State transitions are an important part of our problem space and should be modeled within our domain. Event Sourcing says all state is transient and you only store facts.
  • 66. Event Sourcing Basics Events State transitions are an important part of our problem space and should be modeled within our domain. Event Sourcing says all state is transient and you only store facts. Event: something that happened in the past; a fact; a state transition.
  • 67. Event Sourcing Basics GradeCreated student_id: 123 course_id: abc grade: B+ GradeUpdated student_id: 123 course_id: abc grade: C GradeUpdated student_id: 123 course_id: abc grade: A- Events
  • 68. Event Sourcing Basics GradeCreated student_id: 123 course_id: abc grade: B+ GradeUpdated student_id: 123 course_id: abc grade: C GradeUpdated student_id: 123 course_id: abc grade: A- Events Read Models student_id course_id grade 123 abc B+
  • 69. Event Sourcing Basics GradeCreated student_id: 123 course_id: abc grade: B+ GradeUpdated student_id: 123 course_id: abc grade: C GradeUpdated student_id: 123 course_id: abc grade: A- Events Read Models student_id course_id grade 123 abc C
  • 70. Event Sourcing Basics GradeCreated student_id: 123 course_id: abc grade: B+ GradeUpdated student_id: 123 course_id: abc grade: C GradeUpdated student_id: 123 course_id: abc grade: A- Events Read Models student_id course_id grade 123 abc A-
  • 71. Event Sourcing Basics Read Models Event Sourcing takes the term Read Model from CQRS.
  • 72. Event Sourcing Basics Read Models Event Sourcing takes the term Read Model from CQRS. A Read Model is an interpretation of a sequence of events, that is optimized for answering a given set of queries (reads).
  • 73. Event Sourcing Basics Read Models Event Sourcing takes the term Read Model from CQRS. A Read Model is an interpretation of a sequence of events, that is optimized for answering a given set of queries (reads). Read Models: are independent representations of state that we deterministically regenerate from events using projections.
  • 74. Event Sourcing Basics GradeCreated student_id: 123 course_id: abc grade: B+ GradeUpdated student_id: 123 course_id: abc grade: C GradeUpdated student_id: 123 course_id: abc grade: A- Events Projections def f(state, event) state.where( student_id: event.student_id, course_id: event.course_id ).update(grade: event.grade) end student_id course_id grade 123 abc A-
  • 75. Event Sourcing Basics Projections When we talk about Event Sourcing, current state is a left-fold of previous behaviors.
  • 76. Event Sourcing Basics Projections When we talk about Event Sourcing, current state is a left-fold of previous behaviors. We play back a stream of events, applying a function f ( staten , eventn ) -> staten+1
  • 77. Event Sourcing Basics Projections When we talk about Event Sourcing, current state is a left-fold of previous behaviors. We play back a stream of events, applying a function f ( staten , eventn ) -> staten+1 Projection: a function through which we apply events in sequence to deterministically derive the state of our application
  • 78. Event Sourcing Basics GradeCreated student_id: 123 course_id: abc grade: B+ GradeUpdated student_id: 123 course_id: abc grade: C GradeUpdated student_id: 123 course_id: abc grade: A- Events Projections def f(state, event) state.where( student_id: event.student_id, course_id: event.course_id ).update(grade: event.grade) end student_id course_id grade 123 abc A- Read Models
  • 79. Event Sourcing Basics Review Event: something that happened in the past; a fact; a state transition. Projection: a function through which we apply events in sequence to deterministically derive the state of our application Read Models: are independent representations of state that we deterministically regenerate from events using projections.
  • 80. Event Sourcing Basics GradeCreated student_id: 123 course_id: abc grade: B+ GradeUpdated student_id: 123 course_id: abc grade: C GradeUpdated student_id: 123 course_id: abc grade: A- Events Projections def f(state, event) state.where( student_id: event.student_id, course_id: event.course_id ).update(grade: event.grade) end student_id course_id grade 123 abc A- Read Models
  • 82. Applying Event Sourcing to ETL Q: How to we get from ETL to explicitly modeled Domain Events?
  • 83. Applying Event Sourcing to ETL Q: How to we get from ETL to explicitly modeled Domain Events? Immutable & Sequential Store Read Model(s) TeTL Process(es) Domain Events Tr Tr Lo
  • 84. Applying Event Sourcing to ETL Q: How to we get from ETL to explicitly modeled Domain Events? A: Build an Observational Event Sourced system Immutable & Sequential Store Read Model(s) TeTL Process(es) Domain Events Tr Tr Lo
  • 85. Observations student_id course_id grade 123 abc A- Applying Event Sourcing to ETL Domain Events GradeUpdated student_id: 123 course_id: abc grade: A- Read Models
  • 86. Applying Event Sourcing to ETL Observational When capturing observations of external systems using Event Sourcing, the events in our domain are the observations we capture.
  • 87. Applying Event Sourcing to ETL Observational When capturing observations of external systems using Event Sourcing, the events in our domain are the observations we capture. Transforming a sequence of observations into explicitly modeled domain events is the first projection.
  • 88. Applying Event Sourcing to ETL Observational When capturing observations of external systems using Event Sourcing, the events in our domain are the observations we capture. Transforming a sequence of observations into explicitly modeled domain events is the first projection. Observational: an Event Sourced system where the event history is of captured observations, and all state is derived from them.
  • 89. Observations student_id course_id grade 123 abc A- Applying Event Sourcing to ETL Domain Events GradeUpdated student_id: 123 course_id: abc grade: A- Read Models
  • 90. Observations student_id course_id grade 123 abc A- Applying Event Sourcing to ETL Domain Events GradeUpdated student_id: 123 course_id: abc grade: A- Read Models Immutable & Sequential Store
  • 91. Observations student_id course_id grade 123 abc A- Applying Event Sourcing to ETL Domain Events GradeUpdated student_id: 123 course_id: abc grade: A- Read Models Immutable & Sequential Store TeTL Process(es) Domain Events Tr
  • 92. Observations student_id course_id grade 123 abc A- Applying Event Sourcing to ETL Domain Events GradeUpdated student_id: 123 course_id: abc grade: A- Read Models Immutable & Sequential Store Read Model(s) TeTL Process(es) Domain Events Tr Tr Lo
  • 93. Case study: Event Sourcing ETL
  • 94. Case study: Event Sourcing ETL GradeUpdated student_id: 1 date: Oct 11 course: Biology grade: B- GradeUpdated student_id: 1 date: Oct 12 course: Biology grade: B+ projection observation events domain events
  • 95. Case study: Event Sourcing ETL GradeUpdated student_id: 1 date: Oct 11 course: Biology grade: B- GradeUpdated student_id: 1 date: Oct 12 course: Biology grade: B+ projection InProgressGrades domain events read models
  • 96. Case study: Event Sourcing ETL queried InProgressGrades read models
  • 97. Case study: Event Sourcing ETL Past as First Class First Later interpretation
  • 98. Case study: Event Sourcing ETL Past as First Class First Later interpretation
  • 99. Case study: Event Sourcing ETL Past as First Class First Later interpretation
  • 100. Case study: Event Sourcing ETL Determinism
  • 101. Case study: Event Sourcing ETL Determinism ● Read Models regenerated nightly from source of truth ○ Given the same history, we regenerate the same Read Models
  • 102. Case study: Event Sourcing ETL Determinism ● Read Models regenerated nightly from source of truth ○ Given the same history, we regenerate the same Read Models ● On-demand Read Model Comparison tool ○ Ensure no Read Model changes across larger code refactors
  • 103. Case study: Event Sourcing ETL Determinism Read Model Comparison - Before and After Regeneration Read Model DB Same DB, but later.Regenerations Run Clone Read Model Clone Read Model Again batch_BEFORE batch_AFTER
  • 104. Case study: Event Sourcing ETL Determinism Read Model Comparison - Before and After Regeneration Read Model DB Same DB, but later.Regenerations Run
  • 105. Case study: Event Sourcing ETL Determinism Read Model Comparison - Before and After Regeneration Read Model DB Same DB, but later.Regenerations Run
  • 106. Case study: Event Sourcing ETL Trade-off: Investment in Training
  • 107. Case study: Event Sourcing ETL Trade-off: Investment in Training ● 5 x 1 hr training videos + 1 hr discussions = 10 hrs
  • 108. Case study: Event Sourcing ETL Trade-off: Investment in Training ● 5 x 1 hr training videos + 1 hr discussions = 10 hrs ● Gentle ramp up w/ pairing and joint designs (weeks)
  • 109. Case study: Event Sourcing ETL Trade-off: Investment in Training ● 5 x 1 hr training videos + 1 hr discussions = 10 hrs ● Gentle ramp up w/ pairing and joint designs (weeks) ● Set expectation that architecture will feel different
  • 110. Lessons Learned At the two year mark ● Lessons learned: Thinnest extractions possible ● Lessons learned: Extracted files as Source of Truth ● Lessons learned: Many iterations on transformations ● Lessons learned: Why TL must be fast and run often
  • 111. Lessons Learned At the two year mark Lessons learned: Thinnest extractions possible My first version of converting [one type of] XML to CSV was silently dropping rows, and would have lost all that data if not for the ability to replace from original extract.
  • 112. Lessons Learned At the two year mark Lessons learned: Extracted files as Source of Truth Real world example of changing incorrect foreign key reference (which had been nearly all overlapping previously).
  • 113. Lessons Learned At the two year mark Lessons learned: Many iterations on interpretations Very natural to handle the changes, big and small, that appear in the format and content of the data we have extracted. Also, new features sometimes mean new or changed interpretations.
  • 114. Lessons Learned At the two year mark Lessons learned: Why TL must be fast and run often Consider the “nightly restores from backups” to prove that you can actually restore from backups. This practice exists in our application rather than our tools. If regeneration ever gets too slow to complete overnight, we could lose this.
  • 115. Summary and Review What we covered How Event Sourcing can be applied to ETL How Determinism can be a property of a system Value of treating the Past as First Class
  • 116. Learn More Resources ● DDD, CQRS, and Event Sourcing videos by Greg Young ● CQRS documentation site by Edument AB ● Domain Driven Design book by Eric Evans Keep in touch! ● twitter: @ms_ati ● email: msiegel@panoramaed.com