SlideShare uma empresa Scribd logo
1 de 12
Apache Airflow
Liangjun Jiang
06/07/2019
Outlines
What’s Apache airflow &
what they can do
Install Apache airflow
Working Process Sample DAG with Azure
What’s Apache Airflow
A platform to programmatically author, scheduler and
monitor workflows, has gained its popularity in the
world of data engineering and machine learning. Its
use cases can be
• Define and monitor Cron jobs
• Automate certain DevOps operation
• Move data periodically
• Machine learning pipeline
Who is using
• Airbnb
• Lyft
• Bloomberg
• Jets.com
• Sam’s Club
• a long list ...
Why I think it is useful for us?
• Need a way to manage data movement in and out
• Reliable and consistent way to schedule and monitor moving data
tasks
• Data Ingestion automation for hubble
• Machine learning pipelines automation
• For the benefit of other business units, or small groups by developing
and deploying as an part of infrastructure
Install Airflow
• 1. `pip install airflow`
• 2. User *unofficial* airflow Docker Image
docker pull puckel/docker-airflow
• 3. Use Helm/stable/airflow
helm install --namespace "airflow" --name "airflow" stable/airflow
Install Airflow (cont’d)
I create an fork of puckel/docker-airflow and helm/stable/airlfow with the following
features:
1. Role based authentication
2. Third party python packages to support Azure, GCP and Kubernetes Operators
3. Mount DAGs with Azure File for App Service and AKS deployment
4. Ingress Controller for TLS termination for AKS deployment
5. Sample Airflow Connection with Azure Blob Storage, Cosmos Db, Azure
Databricks
6. Sample DAGs with Azure Cosmos db , blob storage and Cosmos, etc
See this link for the details
Working Process
• Use your local Docker image for DAG
development
docker-compose -f docker-compose-
CeleryExecutor.yml up –d
• Pull request for production DAG
• This dags folder is git managed
Sample DAG
with Azure
Sample DAG
with Azure
Sample DAG with Azure – web UI
• http://localhost:8080
Final Thoughts
• Need to work with cloud team to deploy this as secure and trusted
app service

Mais conteúdo relacionado

Mais procurados

Intro to Airflow: Goodbye Cron, Welcome scheduled workflow management
Intro to Airflow: Goodbye Cron, Welcome scheduled workflow managementIntro to Airflow: Goodbye Cron, Welcome scheduled workflow management
Intro to Airflow: Goodbye Cron, Welcome scheduled workflow managementBurasakorn Sabyeying
 
Introduction to Apache Airflow
Introduction to Apache AirflowIntroduction to Apache Airflow
Introduction to Apache Airflowmutt_data
 
Apache Airflow in Production
Apache Airflow in ProductionApache Airflow in Production
Apache Airflow in ProductionRobert Sanders
 
Apache Airflow Architecture
Apache Airflow ArchitectureApache Airflow Architecture
Apache Airflow ArchitectureGerard Toonstra
 
Airflow tutorials hands_on
Airflow tutorials hands_onAirflow tutorials hands_on
Airflow tutorials hands_onpko89403
 
Airflow - a data flow engine
Airflow - a data flow engineAirflow - a data flow engine
Airflow - a data flow engineWalter Liu
 
Airflow presentation
Airflow presentationAirflow presentation
Airflow presentationIlias Okacha
 
Building an analytics workflow using Apache Airflow
Building an analytics workflow using Apache AirflowBuilding an analytics workflow using Apache Airflow
Building an analytics workflow using Apache AirflowYohei Onishi
 
Building a Data Pipeline using Apache Airflow (on AWS / GCP)
Building a Data Pipeline using Apache Airflow (on AWS / GCP)Building a Data Pipeline using Apache Airflow (on AWS / GCP)
Building a Data Pipeline using Apache Airflow (on AWS / GCP)Yohei Onishi
 
Orchestrating workflows Apache Airflow on GCP & AWS
Orchestrating workflows Apache Airflow on GCP & AWSOrchestrating workflows Apache Airflow on GCP & AWS
Orchestrating workflows Apache Airflow on GCP & AWSDerrick Qin
 
From airflow to google cloud composer
From airflow to google cloud composerFrom airflow to google cloud composer
From airflow to google cloud composerBruce Kuo
 
Introduction to Apache Airflow - Data Day Seattle 2016
Introduction to Apache Airflow - Data Day Seattle 2016Introduction to Apache Airflow - Data Day Seattle 2016
Introduction to Apache Airflow - Data Day Seattle 2016Sid Anand
 
Grafana introduction
Grafana introductionGrafana introduction
Grafana introductionRico Chen
 
Monitoring With Prometheus
Monitoring With PrometheusMonitoring With Prometheus
Monitoring With PrometheusKnoldus Inc.
 

Mais procurados (20)

Intro to Airflow: Goodbye Cron, Welcome scheduled workflow management
Intro to Airflow: Goodbye Cron, Welcome scheduled workflow managementIntro to Airflow: Goodbye Cron, Welcome scheduled workflow management
Intro to Airflow: Goodbye Cron, Welcome scheduled workflow management
 
Introduction to Apache Airflow
Introduction to Apache AirflowIntroduction to Apache Airflow
Introduction to Apache Airflow
 
Airflow presentation
Airflow presentationAirflow presentation
Airflow presentation
 
Apache Airflow
Apache AirflowApache Airflow
Apache Airflow
 
Apache Airflow in Production
Apache Airflow in ProductionApache Airflow in Production
Apache Airflow in Production
 
Airflow introduction
Airflow introductionAirflow introduction
Airflow introduction
 
Apache Airflow Architecture
Apache Airflow ArchitectureApache Airflow Architecture
Apache Airflow Architecture
 
Airflow tutorials hands_on
Airflow tutorials hands_onAirflow tutorials hands_on
Airflow tutorials hands_on
 
Airflow - a data flow engine
Airflow - a data flow engineAirflow - a data flow engine
Airflow - a data flow engine
 
Airflow presentation
Airflow presentationAirflow presentation
Airflow presentation
 
Apache Airflow
Apache AirflowApache Airflow
Apache Airflow
 
Building an analytics workflow using Apache Airflow
Building an analytics workflow using Apache AirflowBuilding an analytics workflow using Apache Airflow
Building an analytics workflow using Apache Airflow
 
Building a Data Pipeline using Apache Airflow (on AWS / GCP)
Building a Data Pipeline using Apache Airflow (on AWS / GCP)Building a Data Pipeline using Apache Airflow (on AWS / GCP)
Building a Data Pipeline using Apache Airflow (on AWS / GCP)
 
Orchestrating workflows Apache Airflow on GCP & AWS
Orchestrating workflows Apache Airflow on GCP & AWSOrchestrating workflows Apache Airflow on GCP & AWS
Orchestrating workflows Apache Airflow on GCP & AWS
 
Airflow and supervisor
Airflow and supervisorAirflow and supervisor
Airflow and supervisor
 
Grafana
GrafanaGrafana
Grafana
 
From airflow to google cloud composer
From airflow to google cloud composerFrom airflow to google cloud composer
From airflow to google cloud composer
 
Introduction to Apache Airflow - Data Day Seattle 2016
Introduction to Apache Airflow - Data Day Seattle 2016Introduction to Apache Airflow - Data Day Seattle 2016
Introduction to Apache Airflow - Data Day Seattle 2016
 
Grafana introduction
Grafana introductionGrafana introduction
Grafana introduction
 
Monitoring With Prometheus
Monitoring With PrometheusMonitoring With Prometheus
Monitoring With Prometheus
 

Semelhante a Apache Airflow Introduction

What's New in Docker - February 2017
What's New in Docker - February 2017What's New in Docker - February 2017
What's New in Docker - February 2017Patrick Chanezon
 
WebSphere and Docker
WebSphere and DockerWebSphere and Docker
WebSphere and DockerDavid Currie
 
CloudStack Tooling Ecosystem – Kiran Chavala, ShapeBlue
CloudStack Tooling Ecosystem – Kiran Chavala, ShapeBlueCloudStack Tooling Ecosystem – Kiran Chavala, ShapeBlue
CloudStack Tooling Ecosystem – Kiran Chavala, ShapeBlueShapeBlue
 
20150425 experimenting with openstack sahara on docker
20150425 experimenting with openstack sahara on docker20150425 experimenting with openstack sahara on docker
20150425 experimenting with openstack sahara on dockerWei Ting Chen
 
Clocker - How to Train your Docker Cloud
Clocker - How to Train your Docker CloudClocker - How to Train your Docker Cloud
Clocker - How to Train your Docker CloudAndrew Kennedy
 
Implementing FaaS on Kubernetes using Kubeless
Implementing FaaS on Kubernetes using KubelessImplementing FaaS on Kubernetes using Kubeless
Implementing FaaS on Kubernetes using KubelessAhmed Misbah
 
Continuous Deployment @ AWS Re:Invent
Continuous Deployment @ AWS Re:InventContinuous Deployment @ AWS Re:Invent
Continuous Deployment @ AWS Re:InventJohn Schneider
 
Continuous Integration and Deployment Best Practices on AWS (ARC307) | AWS re...
Continuous Integration and Deployment Best Practices on AWS (ARC307) | AWS re...Continuous Integration and Deployment Best Practices on AWS (ARC307) | AWS re...
Continuous Integration and Deployment Best Practices on AWS (ARC307) | AWS re...Amazon Web Services
 
Zure Azure PaaS Zero to Hero - DevOps training day
Zure Azure PaaS Zero to Hero - DevOps training dayZure Azure PaaS Zero to Hero - DevOps training day
Zure Azure PaaS Zero to Hero - DevOps training dayOkko Oulasvirta
 
Continuously deploy a containerized app to “Azure App Service”
Continuously deploy a containerized app to “Azure App Service”Continuously deploy a containerized app to “Azure App Service”
Continuously deploy a containerized app to “Azure App Service”Seven Peaks Speaks
 
Alfresco Development Framework Basic
Alfresco Development Framework BasicAlfresco Development Framework Basic
Alfresco Development Framework BasicMario Romano
 
20171122 aws usergrp_coretech-spn-cicd-aws-v01
20171122 aws usergrp_coretech-spn-cicd-aws-v0120171122 aws usergrp_coretech-spn-cicd-aws-v01
20171122 aws usergrp_coretech-spn-cicd-aws-v01Scott Miao
 
BSidesDFW2022-PurpleTeam_Cloud_Identity.pptx
BSidesDFW2022-PurpleTeam_Cloud_Identity.pptxBSidesDFW2022-PurpleTeam_Cloud_Identity.pptx
BSidesDFW2022-PurpleTeam_Cloud_Identity.pptxJasonOstrom1
 
Webinar- Tea for the Tillerman
Webinar- Tea for the TillermanWebinar- Tea for the Tillerman
Webinar- Tea for the TillermanCumulus Networks
 
Running Airflow Workflows as ETL Processes on Hadoop
Running Airflow Workflows as ETL Processes on HadoopRunning Airflow Workflows as ETL Processes on Hadoop
Running Airflow Workflows as ETL Processes on Hadoopclairvoyantllc
 
Azure from scratch part 3 By Girish Kalamati
Azure from scratch part 3 By Girish KalamatiAzure from scratch part 3 By Girish Kalamati
Azure from scratch part 3 By Girish KalamatiGirish Kalamati
 
Sebastien goasguen cloud stack and docker
Sebastien goasguen   cloud stack and dockerSebastien goasguen   cloud stack and docker
Sebastien goasguen cloud stack and dockerShapeBlue
 

Semelhante a Apache Airflow Introduction (20)

Power of Azure Devops
Power of Azure DevopsPower of Azure Devops
Power of Azure Devops
 
What's New in Docker - February 2017
What's New in Docker - February 2017What's New in Docker - February 2017
What's New in Docker - February 2017
 
WebSphere and Docker
WebSphere and DockerWebSphere and Docker
WebSphere and Docker
 
CloudStack Tooling Ecosystem – Kiran Chavala, ShapeBlue
CloudStack Tooling Ecosystem – Kiran Chavala, ShapeBlueCloudStack Tooling Ecosystem – Kiran Chavala, ShapeBlue
CloudStack Tooling Ecosystem – Kiran Chavala, ShapeBlue
 
20150425 experimenting with openstack sahara on docker
20150425 experimenting with openstack sahara on docker20150425 experimenting with openstack sahara on docker
20150425 experimenting with openstack sahara on docker
 
Clocker - How to Train your Docker Cloud
Clocker - How to Train your Docker CloudClocker - How to Train your Docker Cloud
Clocker - How to Train your Docker Cloud
 
Implementing FaaS on Kubernetes using Kubeless
Implementing FaaS on Kubernetes using KubelessImplementing FaaS on Kubernetes using Kubeless
Implementing FaaS on Kubernetes using Kubeless
 
Continuous Deployment @ AWS Re:Invent
Continuous Deployment @ AWS Re:InventContinuous Deployment @ AWS Re:Invent
Continuous Deployment @ AWS Re:Invent
 
Continuous Integration and Deployment Best Practices on AWS (ARC307) | AWS re...
Continuous Integration and Deployment Best Practices on AWS (ARC307) | AWS re...Continuous Integration and Deployment Best Practices on AWS (ARC307) | AWS re...
Continuous Integration and Deployment Best Practices on AWS (ARC307) | AWS re...
 
Zure Azure PaaS Zero to Hero - DevOps training day
Zure Azure PaaS Zero to Hero - DevOps training dayZure Azure PaaS Zero to Hero - DevOps training day
Zure Azure PaaS Zero to Hero - DevOps training day
 
Continuously deploy a containerized app to “Azure App Service”
Continuously deploy a containerized app to “Azure App Service”Continuously deploy a containerized app to “Azure App Service”
Continuously deploy a containerized app to “Azure App Service”
 
Alfresco Development Framework Basic
Alfresco Development Framework BasicAlfresco Development Framework Basic
Alfresco Development Framework Basic
 
56k.cloud training
56k.cloud training56k.cloud training
56k.cloud training
 
20171122 aws usergrp_coretech-spn-cicd-aws-v01
20171122 aws usergrp_coretech-spn-cicd-aws-v0120171122 aws usergrp_coretech-spn-cicd-aws-v01
20171122 aws usergrp_coretech-spn-cicd-aws-v01
 
BSidesDFW2022-PurpleTeam_Cloud_Identity.pptx
BSidesDFW2022-PurpleTeam_Cloud_Identity.pptxBSidesDFW2022-PurpleTeam_Cloud_Identity.pptx
BSidesDFW2022-PurpleTeam_Cloud_Identity.pptx
 
Webinar- Tea for the Tillerman
Webinar- Tea for the TillermanWebinar- Tea for the Tillerman
Webinar- Tea for the Tillerman
 
Running Airflow Workflows as ETL Processes on Hadoop
Running Airflow Workflows as ETL Processes on HadoopRunning Airflow Workflows as ETL Processes on Hadoop
Running Airflow Workflows as ETL Processes on Hadoop
 
KONG-APIGateway.pptx
KONG-APIGateway.pptxKONG-APIGateway.pptx
KONG-APIGateway.pptx
 
Azure from scratch part 3 By Girish Kalamati
Azure from scratch part 3 By Girish KalamatiAzure from scratch part 3 By Girish Kalamati
Azure from scratch part 3 By Girish Kalamati
 
Sebastien goasguen cloud stack and docker
Sebastien goasguen   cloud stack and dockerSebastien goasguen   cloud stack and docker
Sebastien goasguen cloud stack and docker
 

Mais de Liangjun Jiang

MLflow with Databricks
MLflow with DatabricksMLflow with Databricks
MLflow with DatabricksLiangjun Jiang
 
Apache Flink - a Gentle Start
Apache Flink - a Gentle StartApache Flink - a Gentle Start
Apache Flink - a Gentle StartLiangjun Jiang
 
Mlflow with databricks
Mlflow with databricksMlflow with databricks
Mlflow with databricksLiangjun Jiang
 
Alibaba Technology in 2018
Alibaba Technology in 2018Alibaba Technology in 2018
Alibaba Technology in 2018Liangjun Jiang
 
Use Git-flow Manage Your Git Workflow
Use Git-flow Manage Your Git WorkflowUse Git-flow Manage Your Git Workflow
Use Git-flow Manage Your Git WorkflowLiangjun Jiang
 
What new-in-android-development-tools-googleio2016
What new-in-android-development-tools-googleio2016What new-in-android-development-tools-googleio2016
What new-in-android-development-tools-googleio2016Liangjun Jiang
 

Mais de Liangjun Jiang (6)

MLflow with Databricks
MLflow with DatabricksMLflow with Databricks
MLflow with Databricks
 
Apache Flink - a Gentle Start
Apache Flink - a Gentle StartApache Flink - a Gentle Start
Apache Flink - a Gentle Start
 
Mlflow with databricks
Mlflow with databricksMlflow with databricks
Mlflow with databricks
 
Alibaba Technology in 2018
Alibaba Technology in 2018Alibaba Technology in 2018
Alibaba Technology in 2018
 
Use Git-flow Manage Your Git Workflow
Use Git-flow Manage Your Git WorkflowUse Git-flow Manage Your Git Workflow
Use Git-flow Manage Your Git Workflow
 
What new-in-android-development-tools-googleio2016
What new-in-android-development-tools-googleio2016What new-in-android-development-tools-googleio2016
What new-in-android-development-tools-googleio2016
 

Último

How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerThousandEyes
 
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...gurkirankumar98700
 
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure serviceWhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure servicePooja Nehwal
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slidevu2urc
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsMaria Levchenko
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...apidays
 
Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Allon Mureinik
 
Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101Paola De la Torre
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)Gabriella Davis
 
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxFactors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxKatpro Technologies
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountPuma Security, LLC
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Scriptwesley chun
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...Neo4j
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...Martijn de Jong
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonetsnaman860154
 
Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024The Digital Insurer
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slidespraypatel2
 
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdfhans926745
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc
 

Último (20)

How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
 
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure serviceWhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slide
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed texts
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
 
Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)
 
Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)
 
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxFactors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path Mount
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Script
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonets
 
Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slides
 
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
 

Apache Airflow Introduction

  • 2. Outlines What’s Apache airflow & what they can do Install Apache airflow Working Process Sample DAG with Azure
  • 3. What’s Apache Airflow A platform to programmatically author, scheduler and monitor workflows, has gained its popularity in the world of data engineering and machine learning. Its use cases can be • Define and monitor Cron jobs • Automate certain DevOps operation • Move data periodically • Machine learning pipeline
  • 4. Who is using • Airbnb • Lyft • Bloomberg • Jets.com • Sam’s Club • a long list ...
  • 5. Why I think it is useful for us? • Need a way to manage data movement in and out • Reliable and consistent way to schedule and monitor moving data tasks • Data Ingestion automation for hubble • Machine learning pipelines automation • For the benefit of other business units, or small groups by developing and deploying as an part of infrastructure
  • 6. Install Airflow • 1. `pip install airflow` • 2. User *unofficial* airflow Docker Image docker pull puckel/docker-airflow • 3. Use Helm/stable/airflow helm install --namespace "airflow" --name "airflow" stable/airflow
  • 7. Install Airflow (cont’d) I create an fork of puckel/docker-airflow and helm/stable/airlfow with the following features: 1. Role based authentication 2. Third party python packages to support Azure, GCP and Kubernetes Operators 3. Mount DAGs with Azure File for App Service and AKS deployment 4. Ingress Controller for TLS termination for AKS deployment 5. Sample Airflow Connection with Azure Blob Storage, Cosmos Db, Azure Databricks 6. Sample DAGs with Azure Cosmos db , blob storage and Cosmos, etc See this link for the details
  • 8. Working Process • Use your local Docker image for DAG development docker-compose -f docker-compose- CeleryExecutor.yml up –d • Pull request for production DAG • This dags folder is git managed
  • 11. Sample DAG with Azure – web UI • http://localhost:8080
  • 12. Final Thoughts • Need to work with cloud team to deploy this as secure and trusted app service

Notas do Editor

  1. Image source: https://eng.lyft.com/running-apache-airflow-at-lyft-6e53bb8fccff