Hadoop Workflow Scheduler / Automation Engine
Azkaban & Oozie 
Praveen Thirukonda 
Senior Associate 
Data & Analytics 
Orange County, CA 
09/11/2014
© 2014 KPMG LLP, a Delaware limited liability partnership and the U.S. member firm of the KPMG network of independent member firms 
affiliated with KPMG International Cooperative (“KPMG International”), a Swiss entity. All rights reserved. INTERNAL USE ONLY. Not for 
distribution to clients unless the technical and policy review requirements of Tax Services Manual section 23.7 are satisfied. 
What is a workflow? 
- A workflow is a Directed Acyclic Graph 
(DAG) of “jobs” where each job has one 
or more inputs and outputs. 
- A workflow scheduler helps us manage the coordination among the various jobs.
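To make the DAG idea concrete, here is a minimal Python sketch (Python 3.9+, standard-library graphlib) that derives one valid run order for a small workflow and rejects a graph that contains a cycle. The job names are hypothetical, and neither Azkaban nor Oozie is involved: it only illustrates the graph model a workflow scheduler works with.

```python
from graphlib import TopologicalSorter, CycleError

# Hypothetical workflow: each job maps to the set of jobs whose output it consumes.
workflow = {
    "extract_orders": set(),
    "extract_customers": set(),
    "join": {"extract_orders", "extract_customers"},
    "aggregate": {"join"},
    "export_report": {"aggregate"},
}

def execution_order(graph):
    """Return one valid run order, or raise CycleError if the graph is not a DAG."""
    return list(TopologicalSorter(graph).static_order())

order = execution_order(workflow)
print(order)  # every job appears after all of its inputs

# A cycle (a waits for b, b waits for a) can never be scheduled, hence "acyclic".
try:
    execution_order({"a": {"b"}, "b": {"a"}})
except CycleError as err:
    print("not a DAG, cycle:", err.args[1])
```

When several jobs have no ordering constraint between them (here the two extracts), any relative order is valid, so the exact list printed may vary.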
When do we need a workflow scheduler? 
- In a data pipeline, batch jobs need to be scheduled to run periodically.
- They also typically have intricate dependency chains, for example on various data extraction processes or on previous steps.
- Larger processes might have 50 or 60 
steps, of which some might run in 
parallel and others must wait for the 
output of earlier steps.
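A rough sketch of how a scheduler might work out which steps can run in parallel, assuming the same representation as a plain dependency map (job name to the set of upstream jobs). It groups a hypothetical pipeline into stages, where every job in a stage depends only on jobs from earlier stages, using the standard-library graphlib (Python 3.9+).

```python
from graphlib import TopologicalSorter

def parallel_stages(graph):
    """Group jobs into stages; each job depends only on jobs in earlier stages."""
    sorter = TopologicalSorter(graph)
    sorter.prepare()
    stages = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # all jobs whose dependencies are done
        stages.append(ready)
        sorter.done(*ready)                 # assume the whole stage succeeded
    return stages

# Hypothetical pipeline: the two extracts can run in parallel, later steps wait.
pipeline = {
    "extract_crm": set(),
    "extract_erp": set(),
    "clean_crm": {"extract_crm"},
    "clean_erp": {"extract_erp"},
    "merge": {"clean_crm", "clean_erp"},
    "publish": {"merge"},
}

for number, stage in enumerate(parallel_stages(pipeline), start=1):
    print(f"stage {number}: {', '.join(stage)}")
```

Stage-by-stage grouping is a simplification: a real scheduler would typically start each job as soon as its own dependencies finish instead of waiting for the whole stage.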
Azkaban
What is Azkaban? 
- “cron on steroids” 
- A workflow scheduler can be seen as the cron and make Unix utilities combined, with a friendly UI on top.
What is Azkaban? 
- Azkaban was developed at LinkedIn to solve the problem of Hadoop job dependencies.
- Azkaban resolves the run order from the declared job dependencies and provides an easy-to-use web interface to maintain and track your workflows.
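In Azkaban's properties-based job format, each job is a .job file (for example type=command and command=...), and a dependencies=... line lists the jobs it waits for; Azkaban builds the flow from these files itself. The Python sketch below only mimics that resolution step for three hypothetical job files, using a deliberately minimal properties parser (no escapes or line continuations). It is an illustration, not Azkaban's own code.

```python
from graphlib import TopologicalSorter

# Contents of three hypothetical .job files from one Azkaban project.
JOB_FILES = {
    "extract.job": "type=command\ncommand=echo extracting\n",
    "transform.job": "type=command\ncommand=echo transforming\ndependencies=extract\n",
    "load.job": "# final step\ntype=command\ncommand=echo loading\ndependencies=transform\n",
}

def parse_properties(text):
    """Minimal key=value reader: skips blank and comment lines, no escapes."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith(("#", "!")):
            continue
        key, _, value = line.partition("=")
        props[key.strip()] = value.strip()
    return props

def flow_order(job_files):
    """Build the dependency graph from the job files and return a run order."""
    graph = {}
    for filename, text in job_files.items():
        name = filename.removesuffix(".job")
        deps = parse_properties(text).get("dependencies", "")
        graph[name] = {d.strip() for d in deps.split(",") if d.strip()}
    return list(TopologicalSorter(graph).static_order())

print(flow_order(JOB_FILES))  # ['extract', 'transform', 'load']
```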
An image is worth a thousand words.
Apache Oozie
What is Apache Oozie? 
- Similar to Azkaban. 
- Whereas Azkaban defines a workflow as a series of properties (.job) files, Oozie defines it in an XML file.
- Oozie supports workflow submission through a Java API and the command line, in addition to the browser interface and REST API.
- Oozie is part of the Hortonworks distribution running on our cluster.
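For comparison, an Oozie workflow is an XML document in which the start node and each action's ok/error transitions define the graph. The sketch below embeds a small, hypothetical workflow.xml (two fs actions with placeholder HDFS paths; schema version 0.5 assumed) and follows the ok transitions from start to end using Python's xml.etree.ElementTree. It only handles a linear chain of actions (no fork, join, or decision nodes) and is a reading aid, not how Oozie itself executes workflows.

```python
import xml.etree.ElementTree as ET

# Hypothetical workflow definition; paths and names are placeholders.
WORKFLOW_XML = """<workflow-app name="daily-etl" xmlns="uri:oozie:workflow:0.5">
    <start to="prepare-dir"/>
    <action name="prepare-dir">
        <fs><mkdir path="${nameNode}/tmp/daily-etl"/></fs>
        <ok to="cleanup"/>
        <error to="fail"/>
    </action>
    <action name="cleanup">
        <fs><delete path="${nameNode}/tmp/daily-etl"/></fs>
        <ok to="end"/>
        <error to="fail"/>
    </action>
    <kill name="fail">
        <message>Failed at [${wf:lastErrorNode()}]</message>
    </kill>
    <end name="end"/>
</workflow-app>"""

NS = {"wf": "uri:oozie:workflow:0.5"}

def happy_path(xml_text):
    """Follow ok-transitions from the start node until a non-action node is reached."""
    root = ET.fromstring(xml_text)
    actions = {a.get("name"): a for a in root.findall("wf:action", NS)}
    node = root.find("wf:start", NS).get("to")
    path = []
    while node in actions:
        if node in path:
            raise ValueError(f"transition cycle at {node!r}")
        path.append(node)
        node = actions[node].find("wf:ok", NS).get("to")
    path.append(node)  # the end (or other control) node
    return path

print(happy_path(WORKFLOW_XML))  # ['prepare-dir', 'cleanup', 'end']
```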
Advantages of using a workflow scheduler 
- Lets you easily manage dependencies among the various tasks.
- Scheduling of workflows.
- Monitoring of workflow progress through a web interface.
- Email alerts on failure and success.
- Retrying of failed jobs.
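The retry and alerting behavior can be sketched generically in Python. Both schedulers configure this through their own settings; the function below is a hypothetical illustration, with notify standing in for an email alert and a fixed delay between attempts.

```python
import time

def run_with_retries(job, name, max_retries=2, delay_seconds=0.0, notify=print):
    """Run job(); on exception retry up to max_retries times, alerting on the outcome.

    notify is a hypothetical hook standing in for an email alert.
    """
    attempts = 0
    while True:
        attempts += 1
        try:
            result = job()
        except Exception as exc:
            if attempts > max_retries:
                notify(f"FAILED: {name} after {attempts} attempts ({exc})")
                raise
            time.sleep(delay_seconds)  # fixed wait before the next attempt
        else:
            notify(f"SUCCEEDED: {name} after {attempts} attempt(s)")
            return result

def make_flaky(failures):
    """Hypothetical job that raises for its first `failures` calls, then succeeds."""
    calls = {"n": 0}
    def job():
        calls["n"] += 1
        if calls["n"] <= failures:
            raise RuntimeError("transient error")
        return "done"
    return job

messages = []
print(run_with_retries(make_flaky(2), "load_table", notify=messages.append))
print(messages)  # ['SUCCEEDED: load_table after 3 attempt(s)']
```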
Application of a workflow scheduler 
- A real-life example: how and where might you use a workflow scheduler in your big data system architecture?
Thank you 
Presentation by Praveen Thirukonda
© 2014 KPMG LLP, a Delaware limited liability partnership and 
the U.S. member firm of the KPMG network of independent 
member firms affiliated with KPMG International Cooperative 
(“KPMG International”), a Swiss entity. All rights reserved. 
The KPMG name, logo and “cutting through complexity” are 
registered trademarks or trademarks of KPMG International.

Mais conteúdo relacionado

Mais procurados

Thomas Schmainda - Tracking Boeing Satellites With Nagios - Nagios World Conf...
Thomas Schmainda - Tracking Boeing Satellites With Nagios - Nagios World Conf...Thomas Schmainda - Tracking Boeing Satellites With Nagios - Nagios World Conf...
Thomas Schmainda - Tracking Boeing Satellites With Nagios - Nagios World Conf...Nagios
 
Enterprise Kafka: Kafka as a Service
Enterprise Kafka: Kafka as a ServiceEnterprise Kafka: Kafka as a Service
Enterprise Kafka: Kafka as a ServiceTodd Palino
 
SouthBay SRE Meetup Jan 2016
SouthBay SRE Meetup Jan 2016SouthBay SRE Meetup Jan 2016
SouthBay SRE Meetup Jan 2016Michael Kehoe
 
PaaS application in Heroku
PaaS application in HerokuPaaS application in Heroku
PaaS application in HerokuDileepa Jayakody
 
Building a Streaming Data Pipeline for Trains Delays Processing
Building a Streaming Data Pipeline for Trains Delays ProcessingBuilding a Streaming Data Pipeline for Trains Delays Processing
Building a Streaming Data Pipeline for Trains Delays ProcessingDatabricks
 
Introduction to Graph QL
Introduction to Graph QLIntroduction to Graph QL
Introduction to Graph QLDeepak More
 
Clearing Airflow Obstructions
Clearing Airflow ObstructionsClearing Airflow Obstructions
Clearing Airflow ObstructionsTatiana Al-Chueyr
 
Meteor MIT Tech Talk 9/18/14: Designing a New Platform For Modern Apps
Meteor MIT Tech Talk 9/18/14: Designing a New Platform For Modern AppsMeteor MIT Tech Talk 9/18/14: Designing a New Platform For Modern Apps
Meteor MIT Tech Talk 9/18/14: Designing a New Platform For Modern AppsSashko Stubailo
 
Building Notebook-based AI Pipelines with Elyra and Kubeflow
Building Notebook-based AI Pipelines with Elyra and KubeflowBuilding Notebook-based AI Pipelines with Elyra and Kubeflow
Building Notebook-based AI Pipelines with Elyra and KubeflowDatabricks
 
GraphQL Munich Meetup #1 - How We Use GraphQL At Commercetools
GraphQL Munich Meetup #1 - How We Use GraphQL At CommercetoolsGraphQL Munich Meetup #1 - How We Use GraphQL At Commercetools
GraphQL Munich Meetup #1 - How We Use GraphQL At CommercetoolsNicola Molinari
 
GraphQL across the stack: How everything fits together
GraphQL across the stack: How everything fits togetherGraphQL across the stack: How everything fits together
GraphQL across the stack: How everything fits togetherSashko Stubailo
 
Adding GraphQL to your existing architecture
Adding GraphQL to your existing architectureAdding GraphQL to your existing architecture
Adding GraphQL to your existing architectureSashko Stubailo
 
GraphQL over REST at Reactathon 2018
GraphQL over REST at Reactathon 2018GraphQL over REST at Reactathon 2018
GraphQL over REST at Reactathon 2018Sashko Stubailo
 
Performance Monitoring with Icinga2, Graphite und Grafana
Performance Monitoring with Icinga2, Graphite und GrafanaPerformance Monitoring with Icinga2, Graphite und Grafana
Performance Monitoring with Icinga2, Graphite und GrafanaIcinga
 
Why UI Developers Love GraphQL - Sashko Stubailo, Apollo/Meteor
Why UI Developers Love GraphQL - Sashko Stubailo, Apollo/MeteorWhy UI Developers Love GraphQL - Sashko Stubailo, Apollo/Meteor
Why UI Developers Love GraphQL - Sashko Stubailo, Apollo/MeteorJon Wong
 

Mais procurados (20)

Thomas Schmainda - Tracking Boeing Satellites With Nagios - Nagios World Conf...
Thomas Schmainda - Tracking Boeing Satellites With Nagios - Nagios World Conf...Thomas Schmainda - Tracking Boeing Satellites With Nagios - Nagios World Conf...
Thomas Schmainda - Tracking Boeing Satellites With Nagios - Nagios World Conf...
 
Enterprise Kafka: Kafka as a Service
Enterprise Kafka: Kafka as a ServiceEnterprise Kafka: Kafka as a Service
Enterprise Kafka: Kafka as a Service
 
SouthBay SRE Meetup Jan 2016
SouthBay SRE Meetup Jan 2016SouthBay SRE Meetup Jan 2016
SouthBay SRE Meetup Jan 2016
 
PaaS application in Heroku
PaaS application in HerokuPaaS application in Heroku
PaaS application in Heroku
 
Building a Streaming Data Pipeline for Trains Delays Processing
Building a Streaming Data Pipeline for Trains Delays ProcessingBuilding a Streaming Data Pipeline for Trains Delays Processing
Building a Streaming Data Pipeline for Trains Delays Processing
 
Introduction to Graph QL
Introduction to Graph QLIntroduction to Graph QL
Introduction to Graph QL
 
Graph ql and enterprise
Graph ql and enterpriseGraph ql and enterprise
Graph ql and enterprise
 
Clearing Airflow Obstructions
Clearing Airflow ObstructionsClearing Airflow Obstructions
Clearing Airflow Obstructions
 
Meteor MIT Tech Talk 9/18/14: Designing a New Platform For Modern Apps
Meteor MIT Tech Talk 9/18/14: Designing a New Platform For Modern AppsMeteor MIT Tech Talk 9/18/14: Designing a New Platform For Modern Apps
Meteor MIT Tech Talk 9/18/14: Designing a New Platform For Modern Apps
 
Building Notebook-based AI Pipelines with Elyra and Kubeflow
Building Notebook-based AI Pipelines with Elyra and KubeflowBuilding Notebook-based AI Pipelines with Elyra and Kubeflow
Building Notebook-based AI Pipelines with Elyra and Kubeflow
 
GraphQL Munich Meetup #1 - How We Use GraphQL At Commercetools
GraphQL Munich Meetup #1 - How We Use GraphQL At CommercetoolsGraphQL Munich Meetup #1 - How We Use GraphQL At Commercetools
GraphQL Munich Meetup #1 - How We Use GraphQL At Commercetools
 
Master thesis
Master thesisMaster thesis
Master thesis
 
GraphQL across the stack: How everything fits together
GraphQL across the stack: How everything fits togetherGraphQL across the stack: How everything fits together
GraphQL across the stack: How everything fits together
 
GraphQL Europe Recap
GraphQL Europe RecapGraphQL Europe Recap
GraphQL Europe Recap
 
Designing ACM solutions AMIS25
Designing  ACM solutions   AMIS25Designing  ACM solutions   AMIS25
Designing ACM solutions AMIS25
 
Scribe Online CDK & Connector Development
Scribe Online CDK & Connector DevelopmentScribe Online CDK & Connector Development
Scribe Online CDK & Connector Development
 
Adding GraphQL to your existing architecture
Adding GraphQL to your existing architectureAdding GraphQL to your existing architecture
Adding GraphQL to your existing architecture
 
GraphQL over REST at Reactathon 2018
GraphQL over REST at Reactathon 2018GraphQL over REST at Reactathon 2018
GraphQL over REST at Reactathon 2018
 
Performance Monitoring with Icinga2, Graphite und Grafana
Performance Monitoring with Icinga2, Graphite und GrafanaPerformance Monitoring with Icinga2, Graphite und Grafana
Performance Monitoring with Icinga2, Graphite und Grafana
 
Why UI Developers Love GraphQL - Sashko Stubailo, Apollo/Meteor
Why UI Developers Love GraphQL - Sashko Stubailo, Apollo/MeteorWhy UI Developers Love GraphQL - Sashko Stubailo, Apollo/Meteor
Why UI Developers Love GraphQL - Sashko Stubailo, Apollo/Meteor
 

Destaque

Interactive workflow management using Azkaban
Interactive workflow management using AzkabanInteractive workflow management using Azkaban
Interactive workflow management using Azkabandatamantra
 
Building a Self-Service Hadoop Platform at Linkedin with Azkaban
Building a Self-Service Hadoop Platform at Linkedin with AzkabanBuilding a Self-Service Hadoop Platform at Linkedin with Azkaban
Building a Self-Service Hadoop Platform at Linkedin with AzkabanDataWorks Summit
 
Real time ETL processing using Spark streaming
Real time ETL processing using Spark streamingReal time ETL processing using Spark streaming
Real time ETL processing using Spark streamingdatamantra
 
Hadoop presentation
Hadoop presentationHadoop presentation
Hadoop presentationVlad Orlov
 
Hadoop ecosystem framework n hadoop in live environment
Hadoop ecosystem framework  n hadoop in live environmentHadoop ecosystem framework  n hadoop in live environment
Hadoop ecosystem framework n hadoop in live environmentDelhi/NCR HUG
 
Jak upiec Katedrę gorzowską
Jak upiec Katedrę gorzowskąJak upiec Katedrę gorzowską
Jak upiec Katedrę gorzowskąkolorynietoperza
 
24 CIPR IoD Public Relations Director of the Year
24 CIPR IoD Public Relations Director of the Year24 CIPR IoD Public Relations Director of the Year
24 CIPR IoD Public Relations Director of the YearBridget Aherne
 
TOPIC 18 : ACIDS AND BASES
TOPIC 18 : ACIDS AND BASES TOPIC 18 : ACIDS AND BASES
TOPIC 18 : ACIDS AND BASES ALIAH RUBAEE
 
Compare and Contrast
Compare and ContrastCompare and Contrast
Compare and ContrastLeah Allard
 
Compare & Contrast
Compare & ContrastCompare & Contrast
Compare & ContrastLeah Allard
 
THEORY of KNOWLEDGE
THEORY  of KNOWLEDGE THEORY  of KNOWLEDGE
THEORY of KNOWLEDGE ALIAH RUBAEE
 
Young Marketers Elite 2 - Elite 5 Challenge - Tu Oanh
Young Marketers Elite 2 - Elite 5 Challenge - Tu OanhYoung Marketers Elite 2 - Elite 5 Challenge - Tu Oanh
Young Marketers Elite 2 - Elite 5 Challenge - Tu OanhPhung Tu Oanh
 

Destaque (20)

Interactive workflow management using Azkaban
Interactive workflow management using AzkabanInteractive workflow management using Azkaban
Interactive workflow management using Azkaban
 
Building a Self-Service Hadoop Platform at Linkedin with Azkaban
Building a Self-Service Hadoop Platform at Linkedin with AzkabanBuilding a Self-Service Hadoop Platform at Linkedin with Azkaban
Building a Self-Service Hadoop Platform at Linkedin with Azkaban
 
Azkaban
AzkabanAzkaban
Azkaban
 
Real time ETL processing using Spark streaming
Real time ETL processing using Spark streamingReal time ETL processing using Spark streaming
Real time ETL processing using Spark streaming
 
Hadoop presentation
Hadoop presentationHadoop presentation
Hadoop presentation
 
Hadoop ecosystem framework n hadoop in live environment
Hadoop ecosystem framework  n hadoop in live environmentHadoop ecosystem framework  n hadoop in live environment
Hadoop ecosystem framework n hadoop in live environment
 
Frank Prezzy
Frank PrezzyFrank Prezzy
Frank Prezzy
 
Święta Teresa z Avila
Święta Teresa z AvilaŚwięta Teresa z Avila
Święta Teresa z Avila
 
Jak upiec Katedrę gorzowską
Jak upiec Katedrę gorzowskąJak upiec Katedrę gorzowską
Jak upiec Katedrę gorzowską
 
24 CIPR IoD Public Relations Director of the Year
24 CIPR IoD Public Relations Director of the Year24 CIPR IoD Public Relations Director of the Year
24 CIPR IoD Public Relations Director of the Year
 
Frank111
Frank111Frank111
Frank111
 
TOPIC 18 : ACIDS AND BASES
TOPIC 18 : ACIDS AND BASES TOPIC 18 : ACIDS AND BASES
TOPIC 18 : ACIDS AND BASES
 
Compare and Contrast
Compare and ContrastCompare and Contrast
Compare and Contrast
 
Bilingualism pp
Bilingualism ppBilingualism pp
Bilingualism pp
 
Święto niepodległości
Święto niepodległościŚwięto niepodległości
Święto niepodległości
 
Compare & Contrast
Compare & ContrastCompare & Contrast
Compare & Contrast
 
THEORY of KNOWLEDGE
THEORY  of KNOWLEDGE THEORY  of KNOWLEDGE
THEORY of KNOWLEDGE
 
Pieśń o wodzu miłym
Pieśń o wodzu miłymPieśń o wodzu miłym
Pieśń o wodzu miłym
 
0580 s11 qp_22
0580 s11 qp_220580 s11 qp_22
0580 s11 qp_22
 
Young Marketers Elite 2 - Elite 5 Challenge - Tu Oanh
Young Marketers Elite 2 - Elite 5 Challenge - Tu OanhYoung Marketers Elite 2 - Elite 5 Challenge - Tu Oanh
Young Marketers Elite 2 - Elite 5 Challenge - Tu Oanh
 

Semelhante a Azkaban - WorkFlow Scheduler/Automation Engine

Con8208 achieve a quicker and compliant financial close
Con8208 achieve a quicker and compliant financial closeCon8208 achieve a quicker and compliant financial close
Con8208 achieve a quicker and compliant financial closeOracle
 
KPMG SAP Licensing
KPMG SAP LicensingKPMG SAP Licensing
KPMG SAP LicensingTom Rogiers
 
A Finance Leader’s Guide to Business Continuity
A Finance Leader’s Guide to Business ContinuityA Finance Leader’s Guide to Business Continuity
A Finance Leader’s Guide to Business ContinuityWorkday, Inc.
 
Reducing Cost and Complexity Managing Mission-Critical SAP® and Non-SAP Jobs
Reducing Cost and Complexity Managing Mission-Critical SAP® and Non-SAP JobsReducing Cost and Complexity Managing Mission-Critical SAP® and Non-SAP Jobs
Reducing Cost and Complexity Managing Mission-Critical SAP® and Non-SAP JobsCA Technologies
 
L1_RISE_with_SAP_NNN_V3.4.pptx
L1_RISE_with_SAP_NNN_V3.4.pptxL1_RISE_with_SAP_NNN_V3.4.pptx
L1_RISE_with_SAP_NNN_V3.4.pptxGuruprasad Bellary
 
Collaborative Demand Planning: A Requirement for Successful Integrated Busine...
Collaborative Demand Planning: A Requirement for Successful Integrated Busine...Collaborative Demand Planning: A Requirement for Successful Integrated Busine...
Collaborative Demand Planning: A Requirement for Successful Integrated Busine...Steelwedge
 
Revenue management kpmg_oracle
Revenue management kpmg_oracleRevenue management kpmg_oracle
Revenue management kpmg_oracleMiguel Felicio
 
Cloud Platform Enterprise Agreement (CPEA) in Detail
Cloud Platform Enterprise Agreement (CPEA) in DetailCloud Platform Enterprise Agreement (CPEA) in Detail
Cloud Platform Enterprise Agreement (CPEA) in DetailSAP Cloud Platform
 
Automating Your Transactions on the Ariba Network
Automating Your Transactions on the Ariba NetworkAutomating Your Transactions on the Ariba Network
Automating Your Transactions on the Ariba NetworkSAP Ariba
 
SAP_S4HANA_Compliance_and_Security_Webinar.pdf
SAP_S4HANA_Compliance_and_Security_Webinar.pdfSAP_S4HANA_Compliance_and_Security_Webinar.pdf
SAP_S4HANA_Compliance_and_Security_Webinar.pdfanandkumar558548
 
Oracle Primavera Roadmap 2015
Oracle Primavera Roadmap 2015Oracle Primavera Roadmap 2015
Oracle Primavera Roadmap 2015p6academy
 
SAS 70 in a Post-Sarbanes, SaaS World: Quest Session 52070
SAS 70 in a Post-Sarbanes, SaaS World: Quest Session 52070SAS 70 in a Post-Sarbanes, SaaS World: Quest Session 52070
SAS 70 in a Post-Sarbanes, SaaS World: Quest Session 52070retheauditors
 
7. Andy Campbell - Make the Most of the Cloud
7. Andy Campbell -  Make the Most of the Cloud7. Andy Campbell -  Make the Most of the Cloud
7. Andy Campbell - Make the Most of the CloudCedar Consulting
 
SAP BRIM - New Innovations Q2 2014
SAP BRIM -  New Innovations Q2 2014SAP BRIM -  New Innovations Q2 2014
SAP BRIM - New Innovations Q2 2014SAP
 
Business Data Lake Best Practices
Business Data Lake Best PracticesBusiness Data Lake Best Practices
Business Data Lake Best PracticesCapgemini
 
Transforming Partner Consulting Business to Capture Profit in the Cloud
Transforming  Partner Consulting Business to Capture Profit in the CloudTransforming  Partner Consulting Business to Capture Profit in the Cloud
Transforming Partner Consulting Business to Capture Profit in the CloudSarkis Kerkezian, PMP
 
Workplace-of-the-Future
Workplace-of-the-FutureWorkplace-of-the-Future
Workplace-of-the-FutureOscar Kessen
 
Powering Business Transformation with Oracle Exadata: a Capgemini Case Study
Powering Business Transformation with Oracle Exadata: a Capgemini Case StudyPowering Business Transformation with Oracle Exadata: a Capgemini Case Study
Powering Business Transformation with Oracle Exadata: a Capgemini Case StudyCapgemini
 

Semelhante a Azkaban - WorkFlow Scheduler/Automation Engine (20)

Con8208 achieve a quicker and compliant financial close
Con8208 achieve a quicker and compliant financial closeCon8208 achieve a quicker and compliant financial close
Con8208 achieve a quicker and compliant financial close
 
KPMG SAP Licensing
KPMG SAP LicensingKPMG SAP Licensing
KPMG SAP Licensing
 
A Finance Leader’s Guide to Business Continuity
A Finance Leader’s Guide to Business ContinuityA Finance Leader’s Guide to Business Continuity
A Finance Leader’s Guide to Business Continuity
 
Reducing Cost and Complexity Managing Mission-Critical SAP® and Non-SAP Jobs
Reducing Cost and Complexity Managing Mission-Critical SAP® and Non-SAP JobsReducing Cost and Complexity Managing Mission-Critical SAP® and Non-SAP Jobs
Reducing Cost and Complexity Managing Mission-Critical SAP® and Non-SAP Jobs
 
L1_RISE_with_SAP_NNN_V3.4.pptx
L1_RISE_with_SAP_NNN_V3.4.pptxL1_RISE_with_SAP_NNN_V3.4.pptx
L1_RISE_with_SAP_NNN_V3.4.pptx
 
Collaborative Demand Planning: A Requirement for Successful Integrated Busine...
Collaborative Demand Planning: A Requirement for Successful Integrated Busine...Collaborative Demand Planning: A Requirement for Successful Integrated Busine...
Collaborative Demand Planning: A Requirement for Successful Integrated Busine...
 
Revenue management kpmg_oracle
Revenue management kpmg_oracleRevenue management kpmg_oracle
Revenue management kpmg_oracle
 
Cloud Platform Enterprise Agreement (CPEA) in Detail
Cloud Platform Enterprise Agreement (CPEA) in DetailCloud Platform Enterprise Agreement (CPEA) in Detail
Cloud Platform Enterprise Agreement (CPEA) in Detail
 
Automating Your Transactions on the Ariba Network
Automating Your Transactions on the Ariba NetworkAutomating Your Transactions on the Ariba Network
Automating Your Transactions on the Ariba Network
 
SAP_S4HANA_Compliance_and_Security_Webinar.pdf
SAP_S4HANA_Compliance_and_Security_Webinar.pdfSAP_S4HANA_Compliance_and_Security_Webinar.pdf
SAP_S4HANA_Compliance_and_Security_Webinar.pdf
 
AVT_Offerings&Credentials
AVT_Offerings&CredentialsAVT_Offerings&Credentials
AVT_Offerings&Credentials
 
Oracle Primavera Roadmap 2015
Oracle Primavera Roadmap 2015Oracle Primavera Roadmap 2015
Oracle Primavera Roadmap 2015
 
SAS 70 in a Post-Sarbanes, SaaS World: Quest Session 52070
SAS 70 in a Post-Sarbanes, SaaS World: Quest Session 52070SAS 70 in a Post-Sarbanes, SaaS World: Quest Session 52070
SAS 70 in a Post-Sarbanes, SaaS World: Quest Session 52070
 
7. Andy Campbell - Make the Most of the Cloud
7. Andy Campbell -  Make the Most of the Cloud7. Andy Campbell -  Make the Most of the Cloud
7. Andy Campbell - Make the Most of the Cloud
 
SAP BRIM - New Innovations Q2 2014
SAP BRIM -  New Innovations Q2 2014SAP BRIM -  New Innovations Q2 2014
SAP BRIM - New Innovations Q2 2014
 
Business Data Lake Best Practices
Business Data Lake Best PracticesBusiness Data Lake Best Practices
Business Data Lake Best Practices
 
Sap investor symposioum
Sap investor symposioumSap investor symposioum
Sap investor symposioum
 
Transforming Partner Consulting Business to Capture Profit in the Cloud
Transforming  Partner Consulting Business to Capture Profit in the CloudTransforming  Partner Consulting Business to Capture Profit in the Cloud
Transforming Partner Consulting Business to Capture Profit in the Cloud
 
Workplace-of-the-Future
Workplace-of-the-FutureWorkplace-of-the-Future
Workplace-of-the-Future
 
Powering Business Transformation with Oracle Exadata: a Capgemini Case Study
Powering Business Transformation with Oracle Exadata: a Capgemini Case StudyPowering Business Transformation with Oracle Exadata: a Capgemini Case Study
Powering Business Transformation with Oracle Exadata: a Capgemini Case Study
 

Último

Best VIP Call Girls Noida Sector 39 Call Me: 8448380779
Best VIP Call Girls Noida Sector 39 Call Me: 8448380779Best VIP Call Girls Noida Sector 39 Call Me: 8448380779
Best VIP Call Girls Noida Sector 39 Call Me: 8448380779Delhi Call girls
 
Carero dropshipping via API with DroFx.pptx
Carero dropshipping via API with DroFx.pptxCarero dropshipping via API with DroFx.pptx
Carero dropshipping via API with DroFx.pptxolyaivanovalion
 
Introduction-to-Machine-Learning (1).pptx
Introduction-to-Machine-Learning (1).pptxIntroduction-to-Machine-Learning (1).pptx
Introduction-to-Machine-Learning (1).pptxfirstjob4
 
Vip Model Call Girls (Delhi) Karol Bagh 9711199171✔️Body to body massage wit...
Vip Model  Call Girls (Delhi) Karol Bagh 9711199171✔️Body to body massage wit...Vip Model  Call Girls (Delhi) Karol Bagh 9711199171✔️Body to body massage wit...
Vip Model Call Girls (Delhi) Karol Bagh 9711199171✔️Body to body massage wit...shivangimorya083
 
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...amitlee9823
 
(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Service
(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Service(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Service
(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Serviceranjana rawat
 
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Callshivangimorya083
 
Generative AI on Enterprise Cloud with NiFi and Milvus
Generative AI on Enterprise Cloud with NiFi and MilvusGenerative AI on Enterprise Cloud with NiFi and Milvus
Generative AI on Enterprise Cloud with NiFi and MilvusTimothy Spann
 
BabyOno dropshipping via API with DroFx.pptx
BabyOno dropshipping via API with DroFx.pptxBabyOno dropshipping via API with DroFx.pptx
BabyOno dropshipping via API with DroFx.pptxolyaivanovalion
 
Zuja dropshipping via API with DroFx.pptx
Zuja dropshipping via API with DroFx.pptxZuja dropshipping via API with DroFx.pptx
Zuja dropshipping via API with DroFx.pptxolyaivanovalion
 
Invezz.com - Grow your wealth with trading signals
Invezz.com - Grow your wealth with trading signalsInvezz.com - Grow your wealth with trading signals
Invezz.com - Grow your wealth with trading signalsInvezz1
 
BigBuy dropshipping via API with DroFx.pptx
BigBuy dropshipping via API with DroFx.pptxBigBuy dropshipping via API with DroFx.pptx
BigBuy dropshipping via API with DroFx.pptxolyaivanovalion
 
Capstone Project on IBM Data Analytics Program
Capstone Project on IBM Data Analytics ProgramCapstone Project on IBM Data Analytics Program
Capstone Project on IBM Data Analytics ProgramMoniSankarHazra
 
{Pooja: 9892124323 } Call Girl in Mumbai | Jas Kaur Rate 4500 Free Hotel Del...
{Pooja:  9892124323 } Call Girl in Mumbai | Jas Kaur Rate 4500 Free Hotel Del...{Pooja:  9892124323 } Call Girl in Mumbai | Jas Kaur Rate 4500 Free Hotel Del...
{Pooja: 9892124323 } Call Girl in Mumbai | Jas Kaur Rate 4500 Free Hotel Del...Pooja Nehwal
 
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130Suhani Kapoor
 
Data-Analysis for Chicago Crime Data 2023
Data-Analysis for Chicago Crime Data  2023Data-Analysis for Chicago Crime Data  2023
Data-Analysis for Chicago Crime Data 2023ymrp368
 
BPAC WITH UFSBI GENERAL PRESENTATION 18_05_2017-1.pptx
BPAC WITH UFSBI GENERAL PRESENTATION 18_05_2017-1.pptxBPAC WITH UFSBI GENERAL PRESENTATION 18_05_2017-1.pptx
BPAC WITH UFSBI GENERAL PRESENTATION 18_05_2017-1.pptxMohammedJunaid861692
 
Best VIP Call Girls Noida Sector 22 Call Me: 8448380779
Best VIP Call Girls Noida Sector 22 Call Me: 8448380779Best VIP Call Girls Noida Sector 22 Call Me: 8448380779
Best VIP Call Girls Noida Sector 22 Call Me: 8448380779Delhi Call girls
 

Último (20)

Best VIP Call Girls Noida Sector 39 Call Me: 8448380779
Best VIP Call Girls Noida Sector 39 Call Me: 8448380779Best VIP Call Girls Noida Sector 39 Call Me: 8448380779
Best VIP Call Girls Noida Sector 39 Call Me: 8448380779
 
Carero dropshipping via API with DroFx.pptx
Carero dropshipping via API with DroFx.pptxCarero dropshipping via API with DroFx.pptx
Carero dropshipping via API with DroFx.pptx
 
Introduction-to-Machine-Learning (1).pptx
Introduction-to-Machine-Learning (1).pptxIntroduction-to-Machine-Learning (1).pptx
Introduction-to-Machine-Learning (1).pptx
 
Vip Model Call Girls (Delhi) Karol Bagh 9711199171✔️Body to body massage wit...
Vip Model  Call Girls (Delhi) Karol Bagh 9711199171✔️Body to body massage wit...Vip Model  Call Girls (Delhi) Karol Bagh 9711199171✔️Body to body massage wit...
Vip Model Call Girls (Delhi) Karol Bagh 9711199171✔️Body to body massage wit...
 
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
 
(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Service
(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Service(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Service

Azkaban - WorkFlow Scheduler/Automation Engine

  • 1. Hadoop WorkFlow Scheduler / Automation Engine: Azkaban & Oozie. Praveen Thirukonda, Senior Associate, Data & Analytics, Orange County, CA, 09/11/2014
  • 2. © 2014 KPMG LLP, a Delaware limited liability partnership and the U.S. member firm of the KPMG network of independent member firms affiliated with KPMG International Cooperative (“KPMG International”), a Swiss entity. All rights reserved. INTERNAL USE ONLY. Not for distribution to clients unless the technical and policy review requirements of Tax Services Manual section 23.7 are satisfied. What is a workflow? - A workflow is a Directed Acyclic Graph (DAG) of “jobs” where each job has one or more inputs and outputs. - A workflow scheduler helps us manage the coordination among the various jobs.
  • 3. When do we need a workflow scheduler? - In a data pipeline, batch jobs need to be scheduled to run periodically. - They also typically have intricate dependency chains, for example dependencies on various data extraction processes or on previous steps. - Larger processes might have 50 or 60 steps, some of which can run in parallel while others must wait for the output of earlier steps.
  • 5. What is Azkaban? - “cron on steroids” - A workflow scheduler can be seen as a combination of the Unix cron and make utilities with a friendly UI on top.
  • 6. What is Azkaban? - Azkaban was implemented at LinkedIn to solve the problem of Hadoop job dependencies. - Azkaban resolves the ordering through job dependencies and provides an easy-to-use web user interface to maintain and track your workflows.
  • 7. An image is worth a thousand words.
  • 9. What is Apache Oozie? - Similar to Azkaban. - Whereas Azkaban uses a series of properties files, Oozie uses an XML file. - In addition to a browser interface and REST API, Oozie supports workflow submission through a Java API and the command line. - Oozie is part of the Hortonworks environment on our cluster.
  • 10. Advantages of using a workflow scheduler - Lets you easily manage dependencies among the various tasks. - Scheduling of workflows. - Monitoring the progress of your workflow through a web interface. - Email alerts on failure and success. - Retrying of failed jobs.
  • 11. Application of a workflow scheduler - A real-life example: how and where might you use a workflow scheduler in your Big Data system architecture?
  • 12. Thank you Presentation by Praveen Thirukonda
  • 13. © 2014 KPMG LLP, a Delaware limited liability partnership and the U.S. member firm of the KPMG network of independent member firms affiliated with KPMG International Cooperative (“KPMG International”), a Swiss entity. All rights reserved. The KPMG name, logo and “cutting through complexity” are registered trademarks or trademarks of KPMG International.
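The first two slides define a workflow as a DAG of jobs in which some steps can run in parallel while others must wait for earlier output. As a minimal sketch of how a scheduler can derive that ordering, the example below uses Python's standard-library `graphlib` (Python 3.9+). The job names and the `parallel_stages` helper are hypothetical and are not part of Azkaban or Oozie.

```python
from graphlib import TopologicalSorter

# Hypothetical workflow: each job maps to the set of jobs it depends on.
WORKFLOW = {
    "extract_orders": set(),
    "extract_customers": set(),
    "clean_orders": {"extract_orders"},
    "join": {"clean_orders", "extract_customers"},
    "report": {"join"},
}


def parallel_stages(dag):
    """Group jobs into stages: jobs within one stage have no dependencies on
    each other (so they could run in parallel), and every stage only starts
    after all earlier stages are done."""
    sorter = TopologicalSorter(dag)
    sorter.prepare()  # raises graphlib.CycleError if the graph has a cycle
    stages = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # sorted only for stable output
        stages.append(ready)
        sorter.done(*ready)  # mark the stage finished, unlocking dependents
    return stages


if __name__ == "__main__":
    for i, stage in enumerate(parallel_stages(WORKFLOW), start=1):
        print(f"stage {i}: {', '.join(stage)}")
```

Because `prepare()` rejects cyclic input, a job definition that accidentally loops back on itself fails before anything runs, which is exactly the "acyclic" requirement in the definition.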

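The Oozie slide contrasts Azkaban's properties files with Oozie's XML. As a minimal sketch, assuming Azkaban's classic command job type, each job is a `<name>.job` file containing `type=command`, a `command=` line, and an optional comma-separated `dependencies=` list, and the files are zipped and uploaded as a project. The job names, shell scripts, and the `job_properties` and `write_flow` helpers are hypothetical.

```python
import zipfile
from pathlib import Path

# Hypothetical three-step flow: job name -> (shell command, upstream jobs).
JOBS = {
    "extract": ("sh extract.sh", []),
    "clean": ("sh clean.sh", ["extract"]),
    "load": ("sh load.sh", ["clean"]),
}


def job_properties(command, dependencies):
    """Render one command-type job as key=value properties lines."""
    lines = ["type=command", f"command={command}"]
    if dependencies:
        # Upstream jobs are listed as a comma-separated value.
        lines.append("dependencies=" + ",".join(dependencies))
    return "\n".join(lines) + "\n"


def write_flow(jobs, out_dir):
    """Write one <name>.job entry per job into a zip ready for upload."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    zip_path = out / "flow.zip"
    with zipfile.ZipFile(zip_path, "w") as zf:
        for name, (command, deps) in jobs.items():
            zf.writestr(f"{name}.job", job_properties(command, deps))
    return zip_path


if __name__ == "__main__":
    print(job_properties(*JOBS["load"]))
```

Oozie would express the same graph in a single `workflow.xml`, with `<action>` nodes chained by their transitions and `<fork>`/`<join>` nodes for the parallel branches.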
Editor's Notes

  1. Raw data was collected from the car, then cleaned and preprocessed on EC2 machines. Based on the amount of data, EMR instances were spun up, the data was copied to them, and MapReduce jobs and dynamically generated Hive scripts were run. Sqoop then copied the final processed output to a PostgreSQL database, after which the EMR instances were shut down and cleanup operations were performed.
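The note above describes a concrete pipeline. The sketch below models it as a linear DAG and adds two of the advantages listed earlier: retrying failed jobs and alerting on failure and success. Task names, `run_workflow`, and its `max_retries` and `alert` parameters are hypothetical; the tasks are placeholders rather than real EC2, EMR, Hive, or Sqoop calls, and a real scheduler such as Azkaban or Oozie would also persist state and send email.

```python
from graphlib import TopologicalSorter

# Hypothetical task names for the pipeline in the note; each maps to its upstream tasks.
PIPELINE = {
    "preprocess_on_ec2": set(),
    "start_emr": {"preprocess_on_ec2"},
    "copy_to_emr": {"start_emr"},
    "run_mapreduce": {"copy_to_emr"},
    "run_hive_scripts": {"run_mapreduce"},
    "sqoop_to_postgres": {"run_hive_scripts"},
    "shutdown_and_cleanup": {"sqoop_to_postgres"},
}


def run_workflow(dag, tasks, max_retries=2, alert=print):
    """Run tasks in dependency order, retrying failures and alerting on the outcome.

    Each task gets 1 + max_retries attempts. If every attempt fails, `alert`
    is called and the exception propagates, so downstream tasks never start.
    """
    completed = []
    for name in TopologicalSorter(dag).static_order():
        for attempt in range(1 + max_retries):
            try:
                tasks[name]()
            except Exception as exc:
                if attempt == max_retries:
                    alert(f"{name} failed after {attempt + 1} attempts: {exc}")
                    raise
            else:
                completed.append(name)
                break
    alert(f"workflow succeeded: {len(completed)} tasks")
    return completed


if __name__ == "__main__":
    # Placeholder tasks; real ones would drive EC2, EMR, Hive, and Sqoop.
    noop_tasks = {name: (lambda: None) for name in PIPELINE}
    run_workflow(PIPELINE, noop_tasks)
```

One design caveat: in this sketch a permanent failure also skips `shutdown_and_cleanup`, so a production version would likely run cleanup unconditionally to avoid leaving EMR instances running.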