CI/CD for a Data Platform
How to enable consistent data pipelines
Your Host
| Koen Rottiers
| Senior Consultant @ Codit
| 9 years in IT, with a track record in networking and infrastructure
| Combining people, business and technology
@KoenRottiers
Agenda
| A Data Platform?
| What is Azure Data Factory?
| The Data Lake architecture
| Why do CI/CD for a Data Platform?
| Azure Data Factory Git integration
A Data Platform?
Data Platform overview
| Ingestion from different sources
| Centralized data store
| Data flows through the platform
| Output of curated data
| Multiple inputs and outputs
What is Azure Data Factory?
Azure Data Factory
| Orchestrator
| Connectors to different data sources
| Cloud and on-premises
| Mapping data flows
| Wrangling data flows
| External compute integration
| Databricks
| Azure ML
| Azure Functions
| ...
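Data Factory's role as orchestrator shows up in the JSON it stores for each pipeline: every activity declares what it waits on through `dependsOn`. The following is a minimal Python sketch, not a deployable definition. The pipeline name, activity names and notebook path are hypothetical, and required properties such as the copy source and sink are left out. The helper derives a run order from the dependencies with the standard library's `graphlib`, which only mirrors what Data Factory resolves at runtime.

```python
from graphlib import TopologicalSorter

# Hypothetical, trimmed-down pipeline in the JSON shape Data Factory keeps in Git.
# Required properties (copy source/sink, datasets, linked services) are omitted.
PIPELINE = {
    "name": "pl_ingest_sales",
    "properties": {
        "activities": [
            {"name": "CopyToRaw", "type": "Copy", "dependsOn": []},
            {
                "name": "TransformToCurated",
                "type": "DatabricksNotebook",
                "dependsOn": [
                    {"activity": "CopyToRaw", "dependencyConditions": ["Succeeded"]}
                ],
                "typeProperties": {"notebookPath": "/Shared/transform_sales"},
            },
            {
                "name": "CopyToOutput",
                "type": "Copy",
                "dependsOn": [
                    {"activity": "TransformToCurated", "dependencyConditions": ["Succeeded"]}
                ],
            },
        ]
    },
}


def run_order(pipeline: dict) -> list:
    """Return activity names in an order that respects every dependsOn edge."""
    # graphlib expects a mapping of node -> set of predecessor nodes.
    graph = {
        activity["name"]: {dep["activity"] for dep in activity.get("dependsOn", [])}
        for activity in pipeline["properties"]["activities"]
    }
    # static_order() raises graphlib.CycleError if the dependencies form a loop.
    return list(TopologicalSorter(graph).static_order())


if __name__ == "__main__":
    print(run_order(PIPELINE))
```

Because the dependencies here form a single chain, the order is fully determined; independent activities would be free to run in parallel.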
Place in the data platform
The Data Lake architecture
High-Level Architecture
[Architecture diagram. Labels: On-Premises (DB, DB, File Server), ExpressRoute, vNet Integrated, Other Azure Resources, External Connections/Sources/Destinations, Transformation, Azure DevOps Project for DataLake infra and code]
Resource names per environment ({env}):
| Rg-bru-{env}-datalake-001
| App-bru-{env}-{action}-datalake-001
| la-bru-{env}-{action}-datalake-001
| Kb-bru-{env}-datalake-001
| mi-bru-{env}-datalake-001
| df-bru-{env}-datalake-001
| Stabru{env}landingdatalake001
| Stabru{env}rawdatalake001
| Stabru{env}curateddatalake001
| Stabru{env}outputdatalake001
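The resource names on the architecture slide follow one pattern with an `{env}` placeholder, plus an `{action}` placeholder on the App and la resources. A minimal Python sketch that expands the pattern per environment could look like this. The environment codes `dev`, `tst` and `prd` and the action `ingest` are assumptions, not values from the slides; the templates are copied as they appear on the slide. The sketch also checks the Azure rule that storage account names are 3 to 24 lowercase letters and digits, and lowercases the `Stabru...` names for that reason.

```python
import re

# Name templates as they appear on the architecture slide, keyed by prefix.
TEMPLATES = {
    "rg": "Rg-bru-{env}-datalake-001",
    "app": "App-bru-{env}-{action}-datalake-001",
    "la": "la-bru-{env}-{action}-datalake-001",
    "kb": "Kb-bru-{env}-datalake-001",
    "mi": "mi-bru-{env}-datalake-001",
    "df": "df-bru-{env}-datalake-001",
    "sta_landing": "Stabru{env}landingdatalake001",
    "sta_raw": "Stabru{env}rawdatalake001",
    "sta_curated": "Stabru{env}curateddatalake001",
    "sta_output": "Stabru{env}outputdatalake001",
}

# Azure storage account names: 3 to 24 characters, lowercase letters and digits only.
STORAGE_NAME = re.compile(r"[a-z0-9]{3,24}")


def expand(env: str, action: str = "ingest") -> dict:
    """Fill in every template for one environment ("ingest" is a hypothetical action)."""
    # str.format ignores keyword arguments a template does not use.
    names = {key: t.format(env=env, action=action) for key, t in TEMPLATES.items()}
    for key in names:
        if key.startswith("sta_"):
            # Storage account names must be lowercase, so normalise them here.
            names[key] = names[key].lower()
    return names


def invalid_storage_names(names: dict) -> list:
    """Return the storage account names that break the Azure naming rule."""
    return [
        name
        for key, name in names.items()
        if key.startswith("sta_") and not STORAGE_NAME.fullmatch(name)
    ]


if __name__ == "__main__":
    for env in ("dev", "tst", "prd"):  # hypothetical environment codes
        names = expand(env)
        print(env, names["df"], "invalid:", invalid_storage_names(names))
```

With a three-letter environment code only the raw account name stays within 24 characters (23); the landing, curated and output names come out at 26 or 27 characters, so in practice they would need a shorter pattern. Running a check like this in the infra pipeline catches that before a deployment fails.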
Self-Hosted integration runtimes
[Network diagram. Labels: On-Premises (DB, File Server), ExpressRoute, Hub Network, vNet Peering, Azure Networks, Self-Hosted Runtime, Azure Integration Runtime, DB, df-bru-{env}-datalake-001]
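In Data Factory, a linked service chooses its integration runtime through the `connectVia` property; when it is omitted, the default Azure integration runtime is used. The Python sketch below builds two trimmed linked-service definitions: an on-premises SQL Server routed through a self-hosted runtime, and a data lake endpoint left on the Azure runtime. The runtime name, the Key Vault linked service, the secret name and the storage URL are all hypothetical, and storing the connection string in Key Vault is an assumption rather than something the slides state.

```python
import json

# Hypothetical names used only for this sketch.
SELF_HOSTED_IR = "ir-selfhosted-onprem"
KEY_VAULT_LINKED_SERVICE = "ls_keyvault"


def linked_service(name: str, ls_type: str, type_properties: dict, on_premises: bool) -> dict:
    """Build a trimmed Data Factory linked-service definition."""
    properties = {"type": ls_type, "typeProperties": type_properties}
    if on_premises:
        # On-premises sources are reached through the self-hosted integration runtime.
        properties["connectVia"] = {
            "referenceName": SELF_HOSTED_IR,
            "type": "IntegrationRuntimeReference",
        }
    # Without connectVia, Data Factory uses its default Azure integration runtime.
    return {"name": name, "properties": properties}


onprem_sql = linked_service(
    "ls_onprem_sql",
    "SqlServer",
    {
        # Reference a Key Vault secret so no connection string lands in Git.
        "connectionString": {
            "type": "AzureKeyVaultSecret",
            "store": {"referenceName": KEY_VAULT_LINKED_SERVICE, "type": "LinkedServiceReference"},
            "secretName": "onprem-sql-connection",
        }
    },
    on_premises=True,
)

raw_zone = linked_service(
    "ls_raw_zone",
    "AzureBlobFS",
    {"url": "https://stabrudevrawdatalake001.dfs.core.windows.net/"},
    on_premises=False,
)

if __name__ == "__main__":
    print(json.dumps(onprem_sql, indent=2))
```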
Why do CI/CD for a Data Platform?
Data Platform Roles and Responsibilities
- Data platform owner: This person owns the overall data platform and is responsible for it.
- Data platform operator: This role is responsible for the day-to-day operational tasks of the platform.
- Data pipeline owner: Several pipelines run on the platform, each with its own purpose and therefore its own owner. This is typically someone from the BI team or the business.
- Data pipeline developer: This person develops new pipelines and adjusts existing ones.
- Data source owner: Every data source integrated with the data platform needs an owner who determines access rights, the access method, and so on. This person is responsible for the data residing in the source system and is usually the owner of the application that uses the data source.
Key Advantages
| Consistent deployment of data pipelines
| Full testing of data flows in the Data Lake
| Better collaboration
| Feature development tracking
| Pipeline quality reviews
| More fine-grained data security
| Tracking data movements
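Pipeline quality reviews can be partly automated by running a check on every pull request. The Python sketch below assumes the Git integration writes pipeline JSON files into a `pipeline/` folder under the repository root, and applies three example rules (a description, a folder, at least one activity). The rules are hypothetical team conventions, not Data Factory requirements.

```python
import json
import sys
from pathlib import Path


def lint_pipeline(definition: dict) -> list:
    """Return review findings for one pipeline definition (example rules only)."""
    name = definition.get("name", "<unnamed>")
    props = definition.get("properties", {})
    findings = []
    if not props.get("description"):
        findings.append(f"{name}: pipeline has no description")
    if not props.get("folder", {}).get("name"):
        findings.append(f"{name}: pipeline is not in a folder")
    if not props.get("activities"):
        findings.append(f"{name}: pipeline has no activities")
    return findings


def lint_repo(root: Path) -> list:
    """Lint every pipeline JSON file in the repository's pipeline/ folder."""
    folder = root / "pipeline"
    if not folder.is_dir():
        return []
    findings = []
    for path in sorted(folder.glob("*.json")):
        findings.extend(lint_pipeline(json.loads(path.read_text(encoding="utf-8"))))
    return findings


if __name__ == "__main__":
    problems = lint_repo(Path(sys.argv[1]) if len(sys.argv) > 1 else Path("."))
    for problem in problems:
        print(problem)
    if problems:
        # A non-zero exit code fails the build step that runs this script.
        sys.exit(1)
```

A build validation policy on the collaboration branch could run this script, so a reviewer only looks at pipelines that already pass the basic conventions.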
Azure Data Factory Git integration
Data Factory Git Integration
Repos and branches
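With Git integration enabled, the factory resource carries a `repoConfiguration` block that points at the Azure DevOps repository and its collaboration branch. In the classic publish flow, Data Factory writes the generated ARM templates to a separate publish branch, `adf_publish` by default. The Python sketch below shows that block as a dict (the organisation, project and repository names are hypothetical) together with an assumed branch policy: only `feature/` and `bugfix/` branches merge into the collaboration branch, and nothing is merged into the publish branch by hand.

```python
# Hypothetical Azure DevOps organisation, project and repository names.
REPO_CONFIGURATION = {
    "type": "FactoryVSTSConfiguration",
    "accountName": "contoso",
    "projectName": "datalake",
    "repositoryName": "datalake-pipelines",
    "collaborationBranch": "main",
    "rootFolder": "/",
}

PUBLISH_BRANCH = "adf_publish"  # Data Factory's default publish branch
WORK_PREFIXES = ("feature/", "bugfix/")  # assumed branch naming convention


def merge_allowed(source: str, target: str, config: dict = REPO_CONFIGURATION) -> bool:
    """Assumed policy: only work branches merge into the collaboration branch."""
    if target == PUBLISH_BRANCH:
        # The publish branch is generated by Data Factory, never merged into by hand.
        return False
    if target == config["collaborationBranch"]:
        return source.startswith(WORK_PREFIXES)
    return False


if __name__ == "__main__":
    for source, target in [("feature/sales-ingest", "main"), ("main", "adf_publish")]:
        print(f"{source} -> {target}: {merge_allowed(source, target)}")
```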
What does it look like?
Azure DevOps – Infra Git Repository
Azure DevOps – Pipelines Git Repository
Azure DevOps – Pipelines
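In the usual Data Factory CI/CD flow, publishing produces `ARMTemplateForFactory.json` and `ARMTemplateParametersForFactory.json`, and the release pipeline deploys the same template to every environment with environment-specific parameter values such as `factoryName`. The Python sketch below generates one ARM deployment parameter file per environment, reusing the `df-bru-{env}-datalake-001` naming from the architecture slide. The environment codes and the linked-service parameter name are hypothetical; the real parameter names depend on what the exported template contains.

```python
import json
from pathlib import Path

SCHEMA = "https://schema.management.azure.com/schemas/2019-04-01/deploymentParameters.json#"
ENVIRONMENTS = ("dev", "tst", "prd")  # hypothetical environment codes


def parameters_for(env: str) -> dict:
    """Deployment parameter overrides for one environment."""
    return {
        "$schema": SCHEMA,
        "contentVersion": "1.0.0.0",
        "parameters": {
            "factoryName": {"value": f"df-bru-{env}-datalake-001"},
            # Hypothetical name: real parameter names come from the exported template.
            "ls_raw_zone_properties_typeProperties_url": {
                "value": f"https://stabru{env}rawdatalake001.dfs.core.windows.net/"
            },
        },
    }


def write_parameter_files(out_dir: Path) -> list:
    """Write parameters.<env>.json for every environment and return the paths."""
    out_dir.mkdir(parents=True, exist_ok=True)
    paths = []
    for env in ENVIRONMENTS:
        path = out_dir / f"parameters.{env}.json"
        path.write_text(json.dumps(parameters_for(env), indent=2), encoding="utf-8")
        paths.append(path)
    return paths


if __name__ == "__main__":
    for path in write_parameter_files(Path("params")):
        print(path)
```

Each release stage would then deploy the same template with its own parameter file, which is what keeps the environments consistent.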
Azure Data Factory – Git Integration
So why?
| Let data engineers and data scientists focus on delivering value and insights to the business
| Enable an agile process in data engineering
| Consistency across environments
| Track feature development and bug fixes
| Be able to audit your data streams
Do you want a demo?
Feel free to reach out to us.
Q&A

Speaker Notes

  1. Recently, a third option has become available: you can fully integrate your data factory into your vNet.