Testing in Airflow
Chandu Kavar
Data Engineer
Grab, Ex-ThoughtWorker
@chandukavar
Traditional Web App Testing
We mostly write:
● Unit tests
● Integration tests
● Functional tests
(Diagram: Client, Server, Database)
For complex, large-scale applications we also add stress testing, snapshot testing, etc.
What can we test in Airflow?
● Typos, cycles, missing mandatory parameters, etc. in the DAGs
● Number of tasks and the nature of each task in a DAG
● Upstream and downstream dependencies of each task
● Custom operators, sensors, etc.
● Communication between tasks, such as XComs
● The end-to-end pipeline
Five categories of tests we can write:
1. DAG validation tests
2. DAG/pipeline definition tests
3. Unit tests for custom operators, sensors, etc.
4. Integration tests
5. End-to-end pipeline tests
Sample DAG
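from datetime import datetime

from airflow import DAG
from airflow.operators.dummy_operator import DummyOperator
from airflow.operators.python_operator import PythonOperator
# Hypothetical import path; adjust to wherever MultiplyBy5Operator is defined
from multiplyby5_plugin import MultiplyBy5Operator

def print_hello():
    # Defined before the PythonOperator that references it
    return 'Hello'

dag = DAG('hello_world',
          description='Hello world example',
          schedule_interval='0 12 * * *',
          start_date=datetime(2017, 3, 20))

dummy_operator = DummyOperator(task_id='dummy_task', retries=3, dag=dag)
hello_operator = PythonOperator(task_id='hello_task', python_callable=print_hello, dag=dag)
multiplyby5_operator = MultiplyBy5Operator(my_operator_param=20, task_id='multiplyby5_task', dag=dag)

dummy_operator >> hello_operator
dummy_operator >> multiplyby5_operator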
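The `>>` lines are what the dependency tests later inspect. In Airflow, `a >> b` calls `a.set_downstream(b)` and returns `b`. As a rough illustration only, the hypothetical `StandInTask` class below (not Airflow's BaseOperator) overloads `>>` the same way and records the ids on both sides.

```python
class StandInTask:
    """Hypothetical stand-in for an Airflow operator, used only to show
    what `a >> b` records. It is not Airflow's BaseOperator."""

    def __init__(self, task_id):
        self.task_id = task_id
        self.upstream_task_ids = set()
        self.downstream_task_ids = set()

    def __rshift__(self, other):
        # a >> b: b runs after a, so b is downstream of a and a is upstream of b
        self.downstream_task_ids.add(other.task_id)
        other.upstream_task_ids.add(self.task_id)
        # Returning `other` lets chains like a >> b >> c work
        return other


# Mirror the hello_world wiring with stand-ins
dummy = StandInTask('dummy_task')
hello = StandInTask('hello_task')
multiply = StandInTask('multiplyby5_task')
dummy >> hello
dummy >> multiply
```

Airflow likewise keeps upstream and downstream ids per task (exposed as `upstream_task_ids` / `downstream_task_ids`), and `upstream_list` / `downstream_list` return the corresponding task objects.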
DAG Validation Tests
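import unittest

from airflow.models import DagBag

class TestDagIntegrity(unittest.TestCase):

    def setUp(self):
        # Parse every DAG file in the configured DAGs folder
        self.dagbag = DagBag()

    def test_import_dags(self):
        # Syntax errors, missing imports, etc. are collected in import_errors
        self.assertFalse(
            len(self.dagbag.import_errors),
            'DAG import failures. Errors: {}'.format(
                self.dagbag.import_errors
            )
        )

    def test_queue_present(self):
        # .items() works on Python 3 (.iteritems() was Python 2 only)
        for dag_id, dag in self.dagbag.dags.items():
            queue = dag.default_args.get('queue', None)
            msg = 'Queue not set for DAG {id}'.format(id=dag_id)
            self.assertNotEqual(None, queue, msg)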
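Airflow also flags a DAG that contains a cycle when it loads it, so a cyclic DAG fails the import check above. To make that check concrete, here is a sketch of what cycle detection does, written against a plain `{task_id: [downstream task_ids]}` dict rather than Airflow objects. It uses Kahn's algorithm; the function name `topological_order` is ours, and Airflow's own implementation may differ.

```python
from collections import deque


def topological_order(downstream):
    """Return task ids in a valid run order, or raise ValueError on a cycle.

    `downstream` maps each task id to the ids directly downstream of it.
    Kahn's algorithm: repeatedly take tasks with no remaining upstream.
    """
    # Collect every task id, including ones that only appear as targets
    nodes = set(downstream)
    for targets in downstream.values():
        nodes.update(targets)

    # Count how many upstream edges point at each task
    in_degree = {node: 0 for node in nodes}
    for targets in downstream.values():
        for target in targets:
            in_degree[target] += 1

    ready = deque(sorted(n for n in nodes if in_degree[n] == 0))
    order = []
    while ready:
        node = ready.popleft()
        order.append(node)
        for target in downstream.get(node, []):
            in_degree[target] -= 1
            if in_degree[target] == 0:
                ready.append(target)

    if len(order) != len(nodes):
        # Tasks that never reach in-degree 0 sit on (or behind) a cycle
        stuck = sorted(n for n in nodes if in_degree[n] > 0)
        raise ValueError('Cycle detected among tasks: {}'.format(stuck))
    return order
```

For the hello_world wiring, `topological_order({'dummy_task': ['hello_task', 'multiplyby5_task']})` yields `dummy_task` first, while `{'a': ['b'], 'b': ['a']}` raises `ValueError`.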
DAG Definition Tests
import unittest

from airflow.models import DagBag

class TestHelloWorldDAG(unittest.TestCase):
    """Check HelloWorldDAG expectations"""

    def setUp(self):
        self.dagbag = DagBag()
        self.dag = self.dagbag.get_dag('hello_world')

    def test_task_count(self):
        """Check the task count of the hello_world DAG"""
        self.assertEqual(len(self.dag.tasks), 3)

    def test_contain_tasks(self):
        """Check which tasks the hello_world DAG contains"""
        task_ids = [task.task_id for task in self.dag.tasks]
        # Compare without order: the order of dag.tasks is not something to rely on
        self.assertCountEqual(task_ids, ['dummy_task', 'multiplyby5_task', 'hello_task'])

    def test_dependencies_of_dummy_task(self):
        """Check the task dependencies of dummy_task in the hello_world DAG"""
        dummy_task = self.dag.get_task('dummy_task')
        upstream_task_ids = [task.task_id for task in dummy_task.upstream_list]
        self.assertListEqual(upstream_task_ids, [])
        downstream_task_ids = [task.task_id for task in dummy_task.downstream_list]
        # Downstream order is not guaranteed either, so compare without order
        self.assertCountEqual(downstream_task_ids, ['hello_task', 'multiplyby5_task'])
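Checking dependencies task by task gets long as a DAG grows. One option, sketched here as an assumption rather than the author's method, is a hypothetical `dependency_map` helper that turns the whole DAG into a `{task_id: sorted downstream ids}` dict so a single assertion covers every edge. It is duck-typed on `dag.tasks`, `task.task_id` and `task.downstream_list`, the attributes the tests above already use, so it is shown here with `SimpleNamespace` stand-ins instead of real Airflow objects.

```python
from types import SimpleNamespace


def dependency_map(dag):
    """Map each task_id to the sorted task_ids directly downstream of it.

    Hypothetical helper; relies only on dag.tasks, task.task_id and
    task.downstream_list.
    """
    return {
        task.task_id: sorted(t.task_id for t in task.downstream_list)
        for task in dag.tasks
    }


# Stand-in objects mirroring the hello_world DAG (not real Airflow classes)
hello = SimpleNamespace(task_id='hello_task', downstream_list=[])
multiply = SimpleNamespace(task_id='multiplyby5_task', downstream_list=[])
dummy = SimpleNamespace(task_id='dummy_task', downstream_list=[multiply, hello])
fake_dag = SimpleNamespace(tasks=[dummy, hello, multiply])

expected = {
    'dummy_task': ['hello_task', 'multiplyby5_task'],
    'hello_task': [],
    'multiplyby5_task': [],
}
```

Inside a test case this becomes one check, e.g. `self.assertEqual(dependency_map(dag), expected)` with `dag = self.dagbag.get_dag('hello_world')`. Sorting the downstream ids keeps the comparison independent of the order Airflow returns them in.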
Custom Operator
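from airflow.models import BaseOperator
from airflow.plugins_manager import AirflowPlugin
from airflow.utils.decorators import apply_defaults

class MultiplyBy5Operator(BaseOperator):
    @apply_defaults
    def __init__(self, my_operator_param, *args, **kwargs):
        self.operator_param = my_operator_param
        super(MultiplyBy5Operator, self).__init__(*args, **kwargs)

    def execute(self, context):
        # BaseOperator provides a logger as self.log
        self.log.info('operator_param: %s', self.operator_param)
        return self.operator_param * 5

class MultiplyBy5Plugin(AirflowPlugin):
    name = "multiplyby5_plugin"
    # Register the operator with the plugin (Airflow 1.x plugin API)
    operators = [MultiplyBy5Operator]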
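Since `execute` here only multiplies its parameter, one design option (our suggestion, not something the slides prescribe) is to keep that arithmetic in a plain function the operator calls, so it can be unit-tested without building a DAG or TaskInstance. `multiply_by_5` is a hypothetical name.

```python
import unittest


def multiply_by_5(value):
    """Hypothetical pure helper; MultiplyBy5Operator.execute could
    return multiply_by_5(self.operator_param)."""
    return value * 5


class TestMultiplyBy5(unittest.TestCase):
    def test_multiplies_param(self):
        # Same expectation as the operator test: 10 * 5 == 50
        self.assertEqual(multiply_by_5(10), 50)

    def test_sample_dag_param(self):
        # hello_world passes my_operator_param=20
        self.assertEqual(multiply_by_5(20), 100)
```

The operator test still matters for wiring (parameters, templating, return value), but the logic test runs without any Airflow setup.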
Unit tests - Custom Operator
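import unittest
from datetime import datetime
from airflow import DAG
from airflow.models import TaskInstance
# Hypothetical import path; adjust to wherever MultiplyBy5Operator is defined
from multiplyby5_plugin import MultiplyBy5Operator

class TestMultiplyBy5Operator(unittest.TestCase):
    def test_execute(self):
        dag = DAG(dag_id='anydag', start_date=datetime.now())
        task = MultiplyBy5Operator(my_operator_param=10, dag=dag, task_id='anytask')
        ti = TaskInstance(task=task, execution_date=datetime.now())
        result = task.execute(ti.get_template_context())
        self.assertEqual(result, 50)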
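Communication between tasks via XComs can be unit-tested the same way by faking the TaskInstance in the context. The sketch below assumes a hypothetical task placed downstream of `multiplyby5_task` whose logic, `add_one_to_upstream`, reads that task's return value with `context['ti'].xcom_pull(task_ids=...)`. A `MagicMock` stands in for the TaskInstance, so no metadata database is involved; the function and test names are illustrative only.

```python
import unittest
from unittest.mock import MagicMock


def add_one_to_upstream(context):
    """Hypothetical task logic: read multiplyby5_task's return value from XCom."""
    upstream_value = context['ti'].xcom_pull(task_ids='multiplyby5_task')
    return upstream_value + 1


class TestAddOneToUpstream(unittest.TestCase):
    def test_reads_upstream_xcom(self):
        # Stand-in TaskInstance; 100 is what multiplyby5_task returns for param 20
        fake_ti = MagicMock()
        fake_ti.xcom_pull.return_value = 100
        self.assertEqual(add_one_to_upstream({'ti': fake_ti}), 101)
        # Verify the task asked for the right upstream task's XCom
        fake_ti.xcom_pull.assert_called_once_with(task_ids='multiplyby5_task')
```

This checks the consuming side in isolation; whether the value actually flows between real tasks is what integration and end-to-end tests cover.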
https://github.com/chandulal/airflow-testing
Don’t forget to give a star!
Blog post on Medium:
https://medium.com/@chandukavar
Thanks!
