Apache Falcon
DevOps
Sanjeev Tripurari
Tech Lead Operations@inmobi
Falcon is a feed processing and feed management system that makes it easier for
end consumers to onboard their feed processing and feed management onto Hadoop
clusters.
http://falcon.apache.org
What’s on GRID
/user/sanjeev
/user/mohit
/user/Iliyas
/projects/meetup
/projects/support
/data/stream/click
/data/stream/beacon
Basic Components
Falcon
• Prism
• Server
• Client
ActiveMQ
Oozie
Hadoop
What’s in It for DevOps
Cluster
NameNode, JT, Oozie, ActiveMQ, Colo
Feed
Data, DataPath, Lifetime, Retention, Owner, Replication
Process
Job, Queue, Priority, Parallelism, Input, Output, Workflow
Basic Environment Setup
Prism
UK: Server, Oozie, ActiveMQ, HDFS - MR1/2
US: Server, Oozie, ActiveMQ, HDFS - MR1/2
Logical Setup
prism
UK: uk-clusterAlpha, uk-clusterBeta
US: us-clusterGamma
Falcon Entity Operation
Command:
falcon entity -submit -type [cluster/feed/process] -file cluster-definition.xml
falcon entity -list -type [cluster/feed/process]
Cluster
• Submit
• Delete
falcon entity -type [feed/process] -name [processname/feedname] -[OPTIONS]
Feed/Process OPTIONS
• schedule
• status
• list
• touch
• dependency
• definition
• update
• delete
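The entity commands above share one shape, so a small helper can assemble them. This is a minimal Python sketch: the function name `entity_command` is hypothetical, it uses only flags that appear on this slide, and it assumes the option flag follows `-type` and `-name`. It builds the argument list and does not run anything.

```python
from typing import List, Optional

ENTITY_TYPES = {"cluster", "feed", "process"}
# Options from the slide that act on one named entity.
NAMED_OPTIONS = {"schedule", "status", "touch", "dependency",
                 "definition", "update", "delete"}


def entity_command(entity_type: str, option: str,
                   name: Optional[str] = None,
                   file: Optional[str] = None) -> List[str]:
    """Build (but do not run) a `falcon entity` argument list."""
    if entity_type not in ENTITY_TYPES:
        raise ValueError(f"unknown entity type: {entity_type}")
    args = ["falcon", "entity", "-type", entity_type]
    if option == "submit":
        if file is None:
            raise ValueError("submit needs a definition file")
        return args + ["-submit", "-file", file]
    if option == "list":
        return args + ["-list"]
    if option not in NAMED_OPTIONS:
        raise ValueError(f"unknown option: {option}")
    if name is None:
        raise ValueError(f"{option} needs an entity name")
    args += ["-name", name, f"-{option}"]
    # Assumption: update is normally given a new definition file as well.
    if file is not None:
        args += ["-file", file]
    return args


print(" ".join(entity_command("feed", "schedule", name="uk-inputfeed")))
```

The returned list can be passed to `subprocess.run` on a host where the Falcon client is installed.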
Falcon Cluster
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<cluster name="uk-clusterAlpha" description="" colo="uk" xmlns="uri:falcon:cluster:0.1">
  <interfaces>
    <interface type="readonly" endpoint="hftp://nn.cluster.my.com:50070" version="0.20.2-cdh3u3"/>
    <interface type="write" endpoint="hdfs://nn.cluster.my.com:8020" version="0.20.2-cdh3u3"/>
    <interface type="execute" endpoint="jt.cluster.my.com:8021" version="0.20.2-cdh3u3"/>
    <interface type="workflow" endpoint="http://oozie.cluster.my.com:11000/oozie/" version="3.1.6"/>
    <interface type="messaging" endpoint="tcp://amq.cluster.my.com:61616?daemon=true" version="5.4.3"/>
  </interfaces>
  <locations>
    <location name="staging" path="/store/falcon/staging"/>
    <location name="temp" path="/tmp"/>
    <location name="working" path="/store/falcon/working"/>
  </locations>
  <properties>
    <property name="colo.name" value="uk"/>
  </properties>
</cluster>
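Before submitting a cluster definition it can help to confirm that the file parses and that each interface type is present. A minimal Python sketch using only the standard library follows. The embedded XML is a trimmed, illustrative copy of the definition above, the function names are hypothetical, and the list of expected types is simply the five shown on this slide.

```python
import xml.etree.ElementTree as ET

NS = {"c": "uri:falcon:cluster:0.1"}
# The five interface types used in the slide's cluster definition.
EXPECTED_TYPES = {"readonly", "write", "execute", "workflow", "messaging"}

# Trimmed sample for illustration only.
CLUSTER_XML = """<cluster name="uk-clusterAlpha" description="" colo="uk" xmlns="uri:falcon:cluster:0.1">
  <interfaces>
    <interface type="write" endpoint="hdfs://nn.cluster.my.com:8020" version="0.20.2-cdh3u3"/>
    <interface type="workflow" endpoint="http://oozie.cluster.my.com:11000/oozie/" version="3.1.6"/>
  </interfaces>
</cluster>"""


def cluster_interfaces(xml_text):
    """Map interface type to endpoint; raises ParseError on malformed XML."""
    root = ET.fromstring(xml_text)
    return {i.get("type"): i.get("endpoint")
            for i in root.findall("c:interfaces/c:interface", NS)}


def missing_interfaces(xml_text):
    """Return the expected interface types that the definition lacks."""
    return sorted(EXPECTED_TYPES - set(cluster_interfaces(xml_text)))


print(cluster_interfaces(CLUSTER_XML))
print(missing_interfaces(CLUSTER_XML))
```

Typographic quotes around an attribute value, a common side effect of pasting XML through slides, make `ET.fromstring` raise a parse error, so this check catches them early.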
Falcon Feed
<feed description="input feed" name="uk-inputfeed" xmlns="uri:falcon:feed:0.1" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
  <groups>input</groups>
  <frequency>hours(1)</frequency>
  <late-arrival cut-off="hours(6)" />
  <clusters>
    <cluster name="uk-clusterAlpha" type="source">
      <validity start="2015-02-20T18:00Z" end="2015-02-23T00:00Z"/>
      <retention limit="hours(24)" action="delete" />
    </cluster>
  </clusters>
  <locations>
    <location type="data" path="/user/sanjeev/falcon/input/${YEAR}/${MONTH}/${DAY}/${HOUR}" />
  </locations>
  <ACL owner="sanjeev" group="users" permission="0x755" />
  <schema location="/none" provider="none" />
</feed>
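The `${YEAR}/${MONTH}/${DAY}/${HOUR}` placeholders in the feed location resolve to one directory per hourly instance. The following Python sketch shows that expansion for a given instance time. The function name `expand_feed_path` is hypothetical and the sketch handles only the four placeholders used in this feed.

```python
from datetime import datetime


def expand_feed_path(template: str, instance: datetime) -> str:
    """Substitute the date placeholders with zero-padded instance fields."""
    values = {
        "${YEAR}": f"{instance.year:04d}",
        "${MONTH}": f"{instance.month:02d}",
        "${DAY}": f"{instance.day:02d}",
        "${HOUR}": f"{instance.hour:02d}",
    }
    for placeholder, value in values.items():
        template = template.replace(placeholder, value)
    return template


template = "/user/sanjeev/falcon/input/${YEAR}/${MONTH}/${DAY}/${HOUR}"
print(expand_feed_path(template, datetime(2015, 2, 20, 18)))
# prints /user/sanjeev/falcon/input/2015/02/20/18
```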
Falcon Feed
<feed description="output feed" name="uk-outputfeed" xmlns="uri:falcon:feed:0.1" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
  <groups>output</groups>
  <frequency>hours(1)</frequency>
  <late-arrival cut-off="hours(6)" />
  <clusters>
    <cluster name="uk-clusterAlpha" type="source">
      <validity start="2015-02-20T18:00Z" end="2015-02-23T00:00Z"/>
      <retention limit="hours(24)" action="delete" />
    </cluster>
  </clusters>
  <locations>
    <location type="data" path="/user/sanjeev/falcon/output/${YEAR}/${MONTH}/${DAY}/${HOUR}" />
  </locations>
  <ACL owner="sanjeev" group="users" permission="0x755" />
  <schema location="/none" provider="none" />
</feed>
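The feed's `frequency`, `validity`, and `retention` settings together determine how many instances exist and which ones are old enough to delete. A rough Python sketch of that arithmetic is below. It assumes the validity end is exclusive and that an instance becomes eligible once it is older than the retention limit; Falcon's own retention job runs on its own schedule and may differ in detail. All function names are hypothetical.

```python
import re
from datetime import datetime, timedelta


def parse_duration(expr: str) -> timedelta:
    """Parse expressions such as hours(1); months are not handled here."""
    match = re.fullmatch(r"(minutes|hours|days)\((\d+)\)", expr.strip())
    if match is None:
        raise ValueError(f"unsupported duration: {expr}")
    return timedelta(**{match.group(1): int(match.group(2))})


def instances(start: datetime, end: datetime, frequency: str):
    """Yield instance times from start up to, but not including, end."""
    step = parse_duration(frequency)
    current = start
    while current < end:
        yield current
        current += step


def expired(instance_times, now: datetime, retention_limit: str):
    """Return the instances older than the retention limit at time now."""
    cutoff = now - parse_duration(retention_limit)
    return [t for t in instance_times if t < cutoff]


all_instances = list(instances(datetime(2015, 2, 20, 18),
                               datetime(2015, 2, 23, 0), "hours(1)"))
old = expired(all_instances, datetime(2015, 2, 22, 0), "hours(24)")
print(len(all_instances), len(old))  # prints 54 6
```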
Falcon Process
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<process name="falcon-sanjeev-process" xmlns="uri:falcon:process:0.1" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
  <clusters>
    <cluster name="uk-clusterAlpha">
      <validity start="2015-02-20T18:00Z" end="2015-02-23T00:00Z"/>
    </cluster>
  </clusters>
  <parallel>1</parallel>
  <frequency>hours(1)</frequency>
  <timezone>UTC</timezone>
  <inputs>
    <input end="today(18,0)" start="today(18,0)" feed="uk-inputfeed" name="input" />
  </inputs>
  <outputs>
    <output instance="now(0,0)" feed="uk-outputfeed" name="output" />
  </outputs>
  <properties>
    <property name="fileTime" value="${formatTime(dateOffset(instanceTime(), 1, 'DAY'), 'yyyy-MMM-dd')}"/>
    <property name="user" value="${user()}"/>
    <property name="baseTime" value="${today(0,0)}"/>
  </properties>
  <workflow engine="oozie" path="/user/sanjeev/falcon/workflow" />
  <retry policy="periodic" delay="minutes(10)" attempts="3" />
</process>
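The `today(18,0)` and `now(0,0)` expressions in the inputs and outputs are resolved relative to each process instance. The Python sketch below shows the arithmetic under one reading of Falcon's expression language: `now(h, m)` offsets the instance time, while `today(h, m)` offsets the start of the instance's day. Treat that reading as an assumption and check it against the Falcon version in use.

```python
from datetime import datetime, timedelta


def now(instance: datetime, hours: int, minutes: int) -> datetime:
    """Offset relative to the instance time itself."""
    return instance + timedelta(hours=hours, minutes=minutes)


def today(instance: datetime, hours: int, minutes: int) -> datetime:
    """Offset relative to midnight of the instance's day."""
    midnight = instance.replace(hour=0, minute=0, second=0, microsecond=0)
    return midnight + timedelta(hours=hours, minutes=minutes)


instance = datetime(2015, 2, 21, 5, 0)
print(today(instance, 18, 0))  # prints 2015-02-21 18:00:00
print(now(instance, 0, 0))     # prints 2015-02-21 05:00:00
```

Under this reading, every hourly instance on a given day consumes the same 18:00 input instance and writes to the output instance for its own hour.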
Oozie Workflow
<workflow-app xmlns="uri:oozie:workflow:0.3" name="fs-workflow">
  <start to="fs-cmds"/>
  <action name="fs-cmds">
    <fs>
      <mkdir path='${output}'/>
    </fs>
    <ok to="end"/>
    <error to="fail"/>
  </action>
  <kill name="fail">
    <message>Workflow failed, error message[${wf:errorMessage(wf:lastErrorNode())}]</message>
  </kill>
  <end name="end"/>
</workflow-app>
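A quick sanity check before pushing a workflow to HDFS is to confirm that every `to="..."` transition points at a node that exists. The Python sketch below does this for a copy of the workflow above with the kill message shortened. The function name `dangling_transitions` is hypothetical, and the check does not replace Oozie's own validation.

```python
import xml.etree.ElementTree as ET

# Copy of the slide's workflow with the kill message shortened.
WORKFLOW_XML = """<workflow-app xmlns="uri:oozie:workflow:0.3" name="fs-workflow">
  <start to="fs-cmds"/>
  <action name="fs-cmds">
    <fs><mkdir path="${output}"/></fs>
    <ok to="end"/>
    <error to="fail"/>
  </action>
  <kill name="fail"><message>Workflow failed</message></kill>
  <end name="end"/>
</workflow-app>"""


def dangling_transitions(xml_text):
    """Return transition targets that match no top-level node name."""
    root = ET.fromstring(xml_text)
    # Top-level children (action, kill, end) carry the node names.
    names = {node.get("name") for node in root if node.get("name")}
    # Any element at any depth may carry a "to" transition.
    targets = {el.get("to") for el in root.iter() if el.get("to")}
    return sorted(targets - names)


print(dangling_transitions(WORKFLOW_XML))  # prints []
```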
What’s on HDFS
Input Feed: /user/sanjeev/falcon/input/2015/02/20/00
Input Feed: /user/sanjeev/falcon/input/2015/02/20/18
Output Feed: /user/sanjeev/falcon/output/
Workflow: /user/sanjeev/falcon/workflow/workflow.xml
falcon entity -type cluster -submit -file uk-clusterAlpha.xml
falcon entity -type feed -submit -file uk-inputfeed.xml
falcon entity -type feed -submit -file uk-outputfeed.xml
falcon entity -type process -submitAndSchedule -file falcon-sanjeev-process.xml
Typical Production Process and Workflow
(process) process-click-convert
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<process name="process-click-convert" xmlns="uri:falcon:process:0.1">
  <clusters>
    <cluster name="uk-clusterAlpha">
      <validity start="2015-01-15T00:00Z" end="2100-01-01T00:00Z"/>
    </cluster>
    <cluster name="us-clusterGamma">
      <validity start="2015-01-15T00:30Z" end="2100-01-01T00:00Z"/>
    </cluster>
  </clusters>
  <parallel>2</parallel>
  <order>FIFO</order>
  <frequency>minutes(30)</frequency>
  <timezone>UTC</timezone>
  <inputs>
    <input name="Input" feed="feed-click-stream" start="now(0,-30)" end="now(0,-1)"/>
  </inputs>
  <outputs>
    <output name="Output" feed="feed-click-convert" instance="now(0,-30)"/>
  </outputs>
  <properties>
    <property name="queueName" value="stream"/>
    <property name="jobPriority" value="NORMAL"/>
  </properties>
  <workflow path="/projects/support/click/conversion" lib="/projects/support/lib"/>
</process>
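A process refers to clusters and feeds by name, so those entities need to be submitted first. The Python sketch below pulls the referenced names out of a process definition. The embedded XML is a trimmed copy of the process above and the function name `process_dependencies` is hypothetical.

```python
import xml.etree.ElementTree as ET

NS = {"p": "uri:falcon:process:0.1"}

# Trimmed copy of the process definition, for illustration only.
PROCESS_XML = """<process name="process-click-convert" xmlns="uri:falcon:process:0.1">
  <clusters>
    <cluster name="uk-clusterAlpha"/>
    <cluster name="us-clusterGamma"/>
  </clusters>
  <inputs>
    <input name="Input" feed="feed-click-stream" start="now(0,-30)" end="now(0,-1)"/>
  </inputs>
  <outputs>
    <output name="Output" feed="feed-click-convert" instance="now(0,-30)"/>
  </outputs>
</process>"""


def process_dependencies(xml_text):
    """Return the cluster and feed names a process definition refers to."""
    root = ET.fromstring(xml_text)
    clusters = [c.get("name") for c in root.findall("p:clusters/p:cluster", NS)]
    endpoints = (root.findall("p:inputs/p:input", NS)
                 + root.findall("p:outputs/p:output", NS))
    feeds = [e.get("feed") for e in endpoints]
    return {"cluster": clusters, "feed": feeds}


print(process_dependencies(PROCESS_XML))
```

Comparing this result with the output of the `-dependency` option is one way to confirm that an entity was registered as intended.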
(workflow) /projects/support/click/conversion/workflow.xml
<workflow-app xmlns='uri:oozie:workflow:0.3' name='click-conversion'>
  <start to='click-convert' />
  <action name='click-convert'>
    <java>
      <job-tracker>${jobTracker}</job-tracker>
      <name-node>${nameNode}</name-node>
      <prepare>
        <delete path="${Output}"/>
        <delete path="${wf:conf('Output.stats')}"/>
        <delete path="${wf:conf('Output.tmp')}"/>
      </prepare>
      <configuration>
        <property>
          <name>mapred.job.queue.name</name>
          <value>${queueName}</value>
        </property>
        <property>
          <name>mapred.job.priority</name>
          <value>${jobPriority}</value>
        </property>
      </configuration>
      <main-class>com.my.cluster.io.Driver</main-class>
      <arg>-inputpath</arg><arg>${Input}</arg>
      <arg>-outputpath</arg><arg>${Output}</arg>
      <arg>-statspath</arg><arg>${wf:conf("Output.stats")}</arg>
      <arg>-stagingpath</arg><arg>${wf:conf("Output.tmp")}</arg>
    </java>
    <ok to="end" />
    <error to="fail" />
  </action>
  <kill name="fail">
    <message>Workflow failed, error message[${wf:errorMessage(wf:lastErrorNode())}]</message>
  </kill>
  <end name='end' />
</workflow-app>
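The workflow takes `${queueName}`, `${jobPriority}`, `${Input}`, and `${Output}` from the process definition. The Python sketch below lists the plain `${name}` variables in a workflow and reports any that the process does not supply. Treating `jobTracker` and `nameNode` as values filled in from the cluster definition is an assumption here, and the function name `plain_variables` is hypothetical.

```python
import re


def plain_variables(workflow_text):
    """Collect ${name} variables; EL calls such as ${wf:conf(...)} are skipped."""
    return set(re.findall(r"\$\{([A-Za-z_][A-Za-z0-9_]*)\}", workflow_text))


# Lines excerpted from the workflow above.
WORKFLOW_SNIPPET = """
<job-tracker>${jobTracker}</job-tracker>
<name-node>${nameNode}</name-node>
<delete path="${Output}"/>
<delete path="${wf:conf('Output.stats')}"/>
<value>${queueName}</value>
<value>${jobPriority}</value>
<arg>${Input}</arg>
"""

# Names supplied by the process: input/output names plus property names.
SUPPLIED = {"Input", "Output", "queueName", "jobPriority"}
# Assumption: these two are provided from the cluster definition.
ASSUMED_BUILTINS = {"jobTracker", "nameNode"}

unresolved = plain_variables(WORKFLOW_SNIPPET) - SUPPLIED - ASSUMED_BUILTINS
print(sorted(unresolved))  # prints []
```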
Falcon Instance Operation
Command:
falcon instance -type [feed/process] -[status/list]
falcon instance -type [feed/process] -name [processname/feedname] {-start YYYY-MM-DDTHH:MMZ -end YYYY-MM-DDTHH:MMZ} -[OPTIONS]
Feed/Process OPTIONS
• status
• list
• logs
• kill
• rerun
• suspend
• resume
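Instance commands take a time window in the `YYYY-MM-DDTHH:MMZ` form shown above. The Python sketch below formats the window and assembles the argument list. The names `falcon_time` and `instance_command` are hypothetical, only flags and options from this slide are used, and nothing is executed.

```python
from datetime import datetime
from typing import List

INSTANCE_OPTIONS = {"status", "list", "logs", "kill", "rerun", "suspend", "resume"}


def falcon_time(moment: datetime) -> str:
    """Format a UTC time as YYYY-MM-DDTHH:MMZ."""
    return moment.strftime("%Y-%m-%dT%H:%MZ")


def instance_command(entity_type: str, name: str, option: str,
                     start: datetime, end: datetime) -> List[str]:
    """Build (but do not run) a `falcon instance` argument list."""
    if entity_type not in {"feed", "process"}:
        raise ValueError(f"unknown entity type: {entity_type}")
    if option not in INSTANCE_OPTIONS:
        raise ValueError(f"unknown option: {option}")
    return ["falcon", "instance", "-type", entity_type, "-name", name,
            "-start", falcon_time(start), "-end", falcon_time(end),
            f"-{option}"]


print(" ".join(instance_command("process", "falcon-sanjeev-process", "status",
                                datetime(2015, 2, 20, 18), datetime(2015, 2, 23))))
```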
Monitoring
• Falcon CLI
• Oozie CLI
• ActiveMQ
• falcon entity -type process -name falcon-sanjeev-process -dependency
(cluster) uk-clusterAlpha
(feed) uk-inputfeed - [Input]
(feed) uk-outputfeed - [Output]
• falcon instance -type process -name falcon-sanjeev-process -start 2015-02-20T18:00Z -end 2015-02-23T00:00Z -status
Consolidated Status: SUCCEEDED
Instances:
Instance Cluster SourceCluster Status Start End Details Log
-----------------------------------------------------------------------------------------------
2015-02-20T18:00Z uk-clusterAlpha - SUCCEEDED 2015-02-20T18:00Z 2015-02-20T18:01Z - http://oozie.cluster.my.com:11000/oozie/?job=0229074-150205100814135-oozie-oozi-W
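For routine monitoring, the status listing can be scanned for instances that did not succeed. The Python sketch below parses rows shaped like the sample above. The column order is taken from this slide only and may differ between Falcon versions, the `KILLED` row is made up for illustration, and the function names are hypothetical.

```python
# Sample modelled on the slide's output; the second row is invented.
SAMPLE = """Instances:
Instance Cluster SourceCluster Status Start End Details Log
-----------------------------------------------
2015-02-20T18:00Z uk-clusterAlpha - SUCCEEDED 2015-02-20T18:00Z 2015-02-20T18:01Z - -
2015-02-20T19:00Z uk-clusterAlpha - KILLED 2015-02-20T19:00Z 2015-02-20T19:02Z - -
"""


def parse_instances(output):
    """Extract instance, cluster, and status from whitespace-separated rows."""
    rows = []
    for line in output.splitlines():
        parts = line.split()
        # Instance rows start with a timestamp such as 2015-02-20T18:00Z.
        if len(parts) >= 4 and parts[0][:4].isdigit() and parts[0].endswith("Z"):
            rows.append({"instance": parts[0],
                         "cluster": parts[1],
                         "status": parts[3]})
    return rows


def not_succeeded(rows):
    """Return the instance times whose status is anything but SUCCEEDED."""
    return [row["instance"] for row in rows if row["status"] != "SUCCEEDED"]


print(not_succeeded(parse_instances(SAMPLE)))  # prints ['2015-02-20T19:00Z']
```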
Monitoring Dashboard
• https://github.com/ajayyadav/falcon-dashboard
OnBoarding Pipeline
• Group all processes by frequency: minutely, hourly, daily, weekly, monthly
• Group related feeds
• Verify that all process jars and workflows are pushed to the cluster
• Verify ownership of all feed and process directories
• Verify that owners have job-scheduling access roles on the relevant cluster
• Validate the feeds
• Submit and schedule the feeds, so that retention and replication are in place
• Dry-run the process schedule
• Submit and schedule the process
• Document the feed SLA, HDFS usage, and retention period for monitoring
• Document the process SLA, to observe delays
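The first onboarding step, grouping processes by how often they run, can be derived from each process's `frequency` element. The Python sketch below is deliberately rough: the bucketing rule (for example, treating a multiple of seven days as weekly) is an assumption for illustration, and the function names are hypothetical.

```python
import re
from collections import defaultdict


def frequency_group(expr: str) -> str:
    """Map a frequency such as minutes(30) to a coarse onboarding group."""
    match = re.fullmatch(r"(minutes|hours|days|months)\((\d+)\)", expr.strip())
    if match is None:
        raise ValueError(f"unsupported frequency: {expr}")
    unit, count = match.group(1), int(match.group(2))
    if unit == "minutes":
        return "Minutely"
    if unit == "hours":
        return "Hourly"
    if unit == "days":
        # Assumption: a whole number of weeks counts as weekly.
        return "Weekly" if count % 7 == 0 else "Daily"
    return "Monthly"


def group_by_frequency(frequencies):
    """Group process names by the bucket of their frequency expression."""
    groups = defaultdict(list)
    for name, expr in sorted(frequencies.items()):
        groups[frequency_group(expr)].append(name)
    return dict(groups)


print(group_by_frequency({
    "process-click-convert": "minutes(30)",
    "falcon-sanjeev-process": "hours(1)",
}))
```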
Challenges
• Tightly integrated with Oozie
• Monitoring and onboarding need to be streamlined
• Real-time changes to schedule times and queues
Advantages
• Development is very aggressive
• Industry adoption is quick
• Once onboarded, focus is needed only on the set of critical processes
• Easy shutdown and upgrade, as all running jobs are managed by Oozie
• DevOps can easily set up and manage data
Thank You

Mais conteúdo relacionado

Mais procurados

Hive ACID Apache BigData 2016
Hive ACID Apache BigData 2016Hive ACID Apache BigData 2016
Hive ACID Apache BigData 2016alanfgates
 
Oracle 21c: New Features and Enhancements of Data Pump & TTS
Oracle 21c: New Features and Enhancements of Data Pump & TTSOracle 21c: New Features and Enhancements of Data Pump & TTS
Oracle 21c: New Features and Enhancements of Data Pump & TTSChristian Gohmann
 
Real-time Big Data Analytics Engine using Impala
Real-time Big Data Analytics Engine using ImpalaReal-time Big Data Analytics Engine using Impala
Real-time Big Data Analytics Engine using ImpalaJason Shih
 
SQL 2012 AlwaysOn Availability Groups for SharePoint 2013 - SharePoint Connec...
SQL 2012 AlwaysOn Availability Groups for SharePoint 2013 - SharePoint Connec...SQL 2012 AlwaysOn Availability Groups for SharePoint 2013 - SharePoint Connec...
SQL 2012 AlwaysOn Availability Groups for SharePoint 2013 - SharePoint Connec...Michael Noel
 
Making MySQL highly available using Oracle Grid Infrastructure
Making MySQL highly available using Oracle Grid InfrastructureMaking MySQL highly available using Oracle Grid Infrastructure
Making MySQL highly available using Oracle Grid InfrastructureIlmar Kerm
 
Apache Falcon : 22 Sept 2014 for Hadoop User Group France (@Criteo)
Apache Falcon : 22 Sept 2014 for Hadoop User Group France (@Criteo)Apache Falcon : 22 Sept 2014 for Hadoop User Group France (@Criteo)
Apache Falcon : 22 Sept 2014 for Hadoop User Group France (@Criteo)Cedric CARBONE
 
Sql Server 2008 New Programmability Features
Sql Server 2008 New Programmability FeaturesSql Server 2008 New Programmability Features
Sql Server 2008 New Programmability Featuressqlserver.co.il
 
Changes in WebLogic 12.1.3 Every Administrator Must Know
Changes in WebLogic 12.1.3 Every Administrator Must KnowChanges in WebLogic 12.1.3 Every Administrator Must Know
Changes in WebLogic 12.1.3 Every Administrator Must KnowBruno Borges
 
Why Upgrade to Oracle Database 12c?
Why Upgrade to Oracle Database 12c?Why Upgrade to Oracle Database 12c?
Why Upgrade to Oracle Database 12c?DLT Solutions
 
ORDS - Oracle REST Data Services
ORDS - Oracle REST Data ServicesORDS - Oracle REST Data Services
ORDS - Oracle REST Data ServicesJustin Michael Raj
 
Editioning use in ebs
Editioning use in  ebsEditioning use in  ebs
Editioning use in ebspasalapudi123
 
SQL Server Alwayson for SharePoint HA/DR Step by Step Guide
SQL Server Alwayson for SharePoint HA/DR Step by Step GuideSQL Server Alwayson for SharePoint HA/DR Step by Step Guide
SQL Server Alwayson for SharePoint HA/DR Step by Step GuideLars Platzdasch
 
A Second Look at Oracle RAC 12c
A Second Look at Oracle RAC 12cA Second Look at Oracle RAC 12c
A Second Look at Oracle RAC 12cLeighton Nelson
 
PDB Provisioning with Oracle Multitenant Self Service Application
PDB Provisioning with Oracle Multitenant Self Service ApplicationPDB Provisioning with Oracle Multitenant Self Service Application
PDB Provisioning with Oracle Multitenant Self Service ApplicationLeighton Nelson
 
Data Pipeline Management Framework on Oozie
Data Pipeline Management Framework on OozieData Pipeline Management Framework on Oozie
Data Pipeline Management Framework on OozieShareThis
 
Oracle Database 12c Release 2 - New Features On Oracle Database Exadata Expr...
Oracle Database 12c Release 2 - New Features On Oracle Database Exadata  Expr...Oracle Database 12c Release 2 - New Features On Oracle Database Exadata  Expr...
Oracle Database 12c Release 2 - New Features On Oracle Database Exadata Expr...Alex Zaballa
 
Cloudera Impala Source Code Explanation and Analysis
Cloudera Impala Source Code Explanation and AnalysisCloudera Impala Source Code Explanation and Analysis
Cloudera Impala Source Code Explanation and AnalysisYue Chen
 
Expose your data as an api is with oracle rest data services -spoug Madrid
Expose your data as an api is with oracle rest data services -spoug MadridExpose your data as an api is with oracle rest data services -spoug Madrid
Expose your data as an api is with oracle rest data services -spoug MadridVinay Kumar
 
SPSSac2014 - SharePoint Infrastructure Tips and Tricks for On-Premises and Hy...
SPSSac2014 - SharePoint Infrastructure Tips and Tricks for On-Premises and Hy...SPSSac2014 - SharePoint Infrastructure Tips and Tricks for On-Premises and Hy...
SPSSac2014 - SharePoint Infrastructure Tips and Tricks for On-Premises and Hy...Michael Noel
 
SQL AlwaysON for SharePoint HA/DR on Azure Global Azure Bootcamp 2017 Eisenac...
SQL AlwaysON for SharePoint HA/DR on Azure Global Azure Bootcamp 2017 Eisenac...SQL AlwaysON for SharePoint HA/DR on Azure Global Azure Bootcamp 2017 Eisenac...
SQL AlwaysON for SharePoint HA/DR on Azure Global Azure Bootcamp 2017 Eisenac...Lars Platzdasch
 

Mais procurados (20)

Hive ACID Apache BigData 2016
Hive ACID Apache BigData 2016Hive ACID Apache BigData 2016
Hive ACID Apache BigData 2016
 
Oracle 21c: New Features and Enhancements of Data Pump & TTS
Oracle 21c: New Features and Enhancements of Data Pump & TTSOracle 21c: New Features and Enhancements of Data Pump & TTS
Oracle 21c: New Features and Enhancements of Data Pump & TTS
 
Real-time Big Data Analytics Engine using Impala
Real-time Big Data Analytics Engine using ImpalaReal-time Big Data Analytics Engine using Impala
Real-time Big Data Analytics Engine using Impala
 
SQL 2012 AlwaysOn Availability Groups for SharePoint 2013 - SharePoint Connec...
SQL 2012 AlwaysOn Availability Groups for SharePoint 2013 - SharePoint Connec...SQL 2012 AlwaysOn Availability Groups for SharePoint 2013 - SharePoint Connec...
SQL 2012 AlwaysOn Availability Groups for SharePoint 2013 - SharePoint Connec...
 
Making MySQL highly available using Oracle Grid Infrastructure
Making MySQL highly available using Oracle Grid InfrastructureMaking MySQL highly available using Oracle Grid Infrastructure
Making MySQL highly available using Oracle Grid Infrastructure
 
Apache Falcon : 22 Sept 2014 for Hadoop User Group France (@Criteo)
Apache Falcon : 22 Sept 2014 for Hadoop User Group France (@Criteo)Apache Falcon : 22 Sept 2014 for Hadoop User Group France (@Criteo)
Apache Falcon : 22 Sept 2014 for Hadoop User Group France (@Criteo)
 
Sql Server 2008 New Programmability Features
Sql Server 2008 New Programmability FeaturesSql Server 2008 New Programmability Features
Sql Server 2008 New Programmability Features
 
Changes in WebLogic 12.1.3 Every Administrator Must Know
Changes in WebLogic 12.1.3 Every Administrator Must KnowChanges in WebLogic 12.1.3 Every Administrator Must Know
Changes in WebLogic 12.1.3 Every Administrator Must Know
 
Why Upgrade to Oracle Database 12c?
Why Upgrade to Oracle Database 12c?Why Upgrade to Oracle Database 12c?
Why Upgrade to Oracle Database 12c?
 
ORDS - Oracle REST Data Services
ORDS - Oracle REST Data ServicesORDS - Oracle REST Data Services
ORDS - Oracle REST Data Services
 
Editioning use in ebs
Editioning use in  ebsEditioning use in  ebs
Editioning use in ebs
 
SQL Server Alwayson for SharePoint HA/DR Step by Step Guide
SQL Server Alwayson for SharePoint HA/DR Step by Step GuideSQL Server Alwayson for SharePoint HA/DR Step by Step Guide
SQL Server Alwayson for SharePoint HA/DR Step by Step Guide
 
A Second Look at Oracle RAC 12c
A Second Look at Oracle RAC 12cA Second Look at Oracle RAC 12c
A Second Look at Oracle RAC 12c
 
PDB Provisioning with Oracle Multitenant Self Service Application
PDB Provisioning with Oracle Multitenant Self Service ApplicationPDB Provisioning with Oracle Multitenant Self Service Application
PDB Provisioning with Oracle Multitenant Self Service Application
 
Data Pipeline Management Framework on Oozie
Data Pipeline Management Framework on OozieData Pipeline Management Framework on Oozie
Data Pipeline Management Framework on Oozie
 
Oracle Database 12c Release 2 - New Features On Oracle Database Exadata Expr...
Oracle Database 12c Release 2 - New Features On Oracle Database Exadata  Expr...Oracle Database 12c Release 2 - New Features On Oracle Database Exadata  Expr...
Oracle Database 12c Release 2 - New Features On Oracle Database Exadata Expr...
 
Cloudera Impala Source Code Explanation and Analysis
Cloudera Impala Source Code Explanation and AnalysisCloudera Impala Source Code Explanation and Analysis
Cloudera Impala Source Code Explanation and Analysis
 
Expose your data as an api is with oracle rest data services -spoug Madrid
Expose your data as an api is with oracle rest data services -spoug MadridExpose your data as an api is with oracle rest data services -spoug Madrid
Expose your data as an api is with oracle rest data services -spoug Madrid
 
SPSSac2014 - SharePoint Infrastructure Tips and Tricks for On-Premises and Hy...
SPSSac2014 - SharePoint Infrastructure Tips and Tricks for On-Premises and Hy...SPSSac2014 - SharePoint Infrastructure Tips and Tricks for On-Premises and Hy...
SPSSac2014 - SharePoint Infrastructure Tips and Tricks for On-Premises and Hy...
 
SQL AlwaysON for SharePoint HA/DR on Azure Global Azure Bootcamp 2017 Eisenac...
SQL AlwaysON for SharePoint HA/DR on Azure Global Azure Bootcamp 2017 Eisenac...SQL AlwaysON for SharePoint HA/DR on Azure Global Azure Bootcamp 2017 Eisenac...
SQL AlwaysON for SharePoint HA/DR on Azure Global Azure Bootcamp 2017 Eisenac...
 

Semelhante a Apache Falcon - Sanjeev Tripurari

UCS Management APIs A Technical Deep Dive
UCS Management APIs A Technical Deep DiveUCS Management APIs A Technical Deep Dive
UCS Management APIs A Technical Deep DiveCisco DevNet
 
Introduction to es bs mule
Introduction to es bs   muleIntroduction to es bs   mule
Introduction to es bs muleAchyuta Lakshmi
 
Breaking SAP portal (HackerHalted)
Breaking SAP portal (HackerHalted)Breaking SAP portal (HackerHalted)
Breaking SAP portal (HackerHalted)ERPScan
 
Presto anatomy
Presto anatomyPresto anatomy
Presto anatomyDongmin Yu
 
The Four Principles of Atlassian Performance Tuning
The Four Principles of Atlassian Performance TuningThe Four Principles of Atlassian Performance Tuning
The Four Principles of Atlassian Performance TuningAtlassian
 
SecZone 2011: Scrubbing SAP clean with SOAP
SecZone 2011: Scrubbing SAP clean with SOAPSecZone 2011: Scrubbing SAP clean with SOAP
SecZone 2011: Scrubbing SAP clean with SOAPChris John Riley
 
[245] presto 내부구조 파헤치기
[245] presto 내부구조 파헤치기[245] presto 내부구조 파헤치기
[245] presto 내부구조 파헤치기NAVER D2
 
Breaking SAP portal (DeepSec)
Breaking SAP portal (DeepSec)Breaking SAP portal (DeepSec)
Breaking SAP portal (DeepSec)ERPScan
 
High-Performance Hibernate - JDK.io 2018
High-Performance Hibernate - JDK.io 2018High-Performance Hibernate - JDK.io 2018
High-Performance Hibernate - JDK.io 2018Vlad Mihalcea
 
Easy Enterprise Integration Patterns with Apache Camel, ActiveMQ and ServiceMix
Easy Enterprise Integration Patterns with Apache Camel, ActiveMQ and ServiceMixEasy Enterprise Integration Patterns with Apache Camel, ActiveMQ and ServiceMix
Easy Enterprise Integration Patterns with Apache Camel, ActiveMQ and ServiceMixelliando dias
 
Breaking SAP portal (HashDays)
Breaking SAP portal (HashDays)Breaking SAP portal (HashDays)
Breaking SAP portal (HashDays)ERPScan
 
SAP (in)security: Scrubbing SAP clean with SOAP
SAP (in)security: Scrubbing SAP clean with SOAPSAP (in)security: Scrubbing SAP clean with SOAP
SAP (in)security: Scrubbing SAP clean with SOAPChris John Riley
 
Apache Falcon _ Hadoop User Group France 22-sept-2014
Apache Falcon _ Hadoop User Group France 22-sept-2014Apache Falcon _ Hadoop User Group France 22-sept-2014
Apache Falcon _ Hadoop User Group France 22-sept-2014Modern Data Stack France
 
Apache Falcon at Hadoop Summit Europe 2014
Apache Falcon at Hadoop Summit Europe 2014Apache Falcon at Hadoop Summit Europe 2014
Apache Falcon at Hadoop Summit Europe 2014Seetharam Venkatesh
 
vJUG - The JavaFX Ecosystem
vJUG - The JavaFX EcosystemvJUG - The JavaFX Ecosystem
vJUG - The JavaFX EcosystemAndres Almiray
 

Semelhante a Apache Falcon - Sanjeev Tripurari (20)

UCS Management APIs A Technical Deep Dive
UCS Management APIs A Technical Deep DiveUCS Management APIs A Technical Deep Dive
UCS Management APIs A Technical Deep Dive
 
Immutant
ImmutantImmutant
Immutant
 
Introduction to es bs mule
Introduction to es bs   muleIntroduction to es bs   mule
Introduction to es bs mule
 
Breaking SAP portal (HackerHalted)
Breaking SAP portal (HackerHalted)Breaking SAP portal (HackerHalted)
Breaking SAP portal (HackerHalted)
 
Presto anatomy
Presto anatomyPresto anatomy
Presto anatomy
 
2014-10-30 Taverna 3 status
2014-10-30 Taverna 3 status2014-10-30 Taverna 3 status
2014-10-30 Taverna 3 status
 
The Four Principles of Atlassian Performance Tuning
The Four Principles of Atlassian Performance TuningThe Four Principles of Atlassian Performance Tuning
The Four Principles of Atlassian Performance Tuning
 
SecZone 2011: Scrubbing SAP clean with SOAP
SecZone 2011: Scrubbing SAP clean with SOAPSecZone 2011: Scrubbing SAP clean with SOAP
SecZone 2011: Scrubbing SAP clean with SOAP
 
Performance tests with Gatling
Performance tests with GatlingPerformance tests with Gatling
Performance tests with Gatling
 
[245] presto 내부구조 파헤치기
[245] presto 내부구조 파헤치기[245] presto 내부구조 파헤치기
[245] presto 내부구조 파헤치기
 
Breaking SAP portal (DeepSec)
Breaking SAP portal (DeepSec)Breaking SAP portal (DeepSec)
Breaking SAP portal (DeepSec)
 
Load testing with Blitz
Load testing with BlitzLoad testing with Blitz
Load testing with Blitz
 
Hadoop Oozie
Hadoop OozieHadoop Oozie
Hadoop Oozie
 
High-Performance Hibernate - JDK.io 2018
High-Performance Hibernate - JDK.io 2018High-Performance Hibernate - JDK.io 2018
High-Performance Hibernate - JDK.io 2018
 
Easy Enterprise Integration Patterns with Apache Camel, ActiveMQ and ServiceMix
Easy Enterprise Integration Patterns with Apache Camel, ActiveMQ and ServiceMixEasy Enterprise Integration Patterns with Apache Camel, ActiveMQ and ServiceMix
Easy Enterprise Integration Patterns with Apache Camel, ActiveMQ and ServiceMix
 
Breaking SAP portal (HashDays)
Breaking SAP portal (HashDays)Breaking SAP portal (HashDays)
Breaking SAP portal (HashDays)
 
SAP (in)security: Scrubbing SAP clean with SOAP
SAP (in)security: Scrubbing SAP clean with SOAPSAP (in)security: Scrubbing SAP clean with SOAP
SAP (in)security: Scrubbing SAP clean with SOAP
 
Apache Falcon _ Hadoop User Group France 22-sept-2014
Apache Falcon _ Hadoop User Group France 22-sept-2014Apache Falcon _ Hadoop User Group France 22-sept-2014
Apache Falcon _ Hadoop User Group France 22-sept-2014
 
Apache Falcon at Hadoop Summit Europe 2014
Apache Falcon at Hadoop Summit Europe 2014Apache Falcon at Hadoop Summit Europe 2014
Apache Falcon at Hadoop Summit Europe 2014
 
vJUG - The JavaFX Ecosystem
vJUG - The JavaFX EcosystemvJUG - The JavaFX Ecosystem
vJUG - The JavaFX Ecosystem
 

Último

New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada
 
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024BookNet Canada
 
Take control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteTake control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteDianaGray10
 
Advanced Computer Architecture – An Introduction
Advanced Computer Architecture – An IntroductionAdvanced Computer Architecture – An Introduction
Advanced Computer Architecture – An IntroductionDilum Bandara
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada
 
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptxThe Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptxLoriGlavin3
 
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationSlibray Presentation
 
What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024Stephanie Beckett
 
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsRizwan Syed
 
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxA Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxLoriGlavin3
 
DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenHervé Boutemy
 
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Mark Simos
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek SchlawackFwdays
 
DSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningDSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningLars Bell
 
From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .Alan Dix
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii SoldatenkoFwdays
 
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc
 
unit 4 immunoblotting technique complete.pptx
unit 4 immunoblotting technique complete.pptxunit 4 immunoblotting technique complete.pptx
unit 4 immunoblotting technique complete.pptxBkGupta21
 
SALESFORCE EDUCATION CLOUD | FEXLE SERVICES
SALESFORCE EDUCATION CLOUD | FEXLE SERVICESSALESFORCE EDUCATION CLOUD | FEXLE SERVICES
SALESFORCE EDUCATION CLOUD | FEXLE SERVICESmohitsingh558521
 
The State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxThe State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxLoriGlavin3
 

Último (20)

New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
 
Take control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteTake control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test Suite
 
Advanced Computer Architecture – An Introduction

Apache Falcon - Sanjeev Tripurari

  • 2. Falcon is a feed processing and feed management system aimed at making it easier for end consumers to onboard their feed processing and feed management on Hadoop clusters. http://falcon.apache.org
  • 4. Basic Components: Falcon (Prism, Server, Client), ActiveMQ, Oozie, Hadoop
  • 5. What’s in it for DevOps
    Cluster: NameNode, JT, Oozie, ActiveMQ, Colo
    Feed: Data, DataPath, Lifetime, Retention, Owner, Replication
    Process: Job, Queue, Priority, Parallelism, Input, Output, Workflow
  • 6. Basic Environment Setup (diagram): two colos, UK and US, with one Prism; each colo runs its own Server, Oozie, ActiveMQ and HDFS - MR1/2
  • 8. Falcon Entity Operation
    Command:
      falcon entity -submit -type [cluster/feed/process] -file cluster-definition.xml
      falcon entity -list -type [cluster/feed/process]
    Cluster operations: submit, delete
      falcon entity -type [feed/process] -name [processname/feedname] -[OPTION]
    Feed/Process OPTIONS: schedule, status, list, touch, dependency, definition, update, delete
  • 9. Falcon Cluster
    <?xml version="1.0" encoding="UTF-8" standalone="yes"?>
    <cluster name="uk-clusterAlpha" description="" colo="uk" xmlns="uri:falcon:cluster:0.1">
      <interfaces>
        <interface type="readonly" endpoint="hftp://nn.cluster.my.com:50070" version="0.20.2-cdh3u3"/>
        <interface type="write" endpoint="hdfs://nn.cluster.my.com:8020" version="0.20.2-cdh3u3"/>
        <interface type="execute" endpoint="jt.cluster.my.com:8021" version="0.20.2-cdh3u3"/>
        <interface type="workflow" endpoint="http://oozie.cluster.my.com:11000/oozie/" version="3.1.6"/>
        <interface type="messaging" endpoint="tcp://amq.cluster.my.com:61616?daemon=true" version="5.4.3"/>
      </interfaces>
      <locations>
        <location name="staging" path="/store/falcon/staging"/>
        <location name="temp" path="/tmp"/>
        <location name="working" path="/store/falcon/working"/>
      </locations>
      <properties>
        <property name="colo.name" value="uk"/>
      </properties>
    </cluster>
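Before a cluster entity is submitted, it can be useful to confirm that the definition parses and declares the interface types shown on the slide. The following is a minimal sketch in Python and is not part of Falcon: the helper name `missing_interfaces` and the decision to treat all five types as required are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

# Namespace copied from the cluster definition on the slide.
NS = {"f": "uri:falcon:cluster:0.1"}

# Interface types used on the slide; treating all of them as required is an assumption.
EXPECTED = {"readonly", "write", "execute", "workflow", "messaging"}

# Trimmed copy of the slide's cluster entity (interfaces only).
CLUSTER_XML = """<cluster name="uk-clusterAlpha" description="" colo="uk" xmlns="uri:falcon:cluster:0.1">
  <interfaces>
    <interface type="readonly" endpoint="hftp://nn.cluster.my.com:50070" version="0.20.2-cdh3u3"/>
    <interface type="write" endpoint="hdfs://nn.cluster.my.com:8020" version="0.20.2-cdh3u3"/>
    <interface type="execute" endpoint="jt.cluster.my.com:8021" version="0.20.2-cdh3u3"/>
    <interface type="workflow" endpoint="http://oozie.cluster.my.com:11000/oozie/" version="3.1.6"/>
    <interface type="messaging" endpoint="tcp://amq.cluster.my.com:61616?daemon=true" version="5.4.3"/>
  </interfaces>
</cluster>"""


def missing_interfaces(cluster_xml: str) -> set:
    """Return the expected interface types that the definition does not declare."""
    root = ET.fromstring(cluster_xml)  # raises ParseError if the XML is malformed
    declared = {i.get("type") for i in root.findall("f:interfaces/f:interface", NS)}
    return EXPECTED - declared


print(missing_interfaces(CLUSTER_XML))
```

A malformed file, for example one that still contains the curly quotes a slide editor tends to introduce, fails at the parsing step, which is often the first thing worth catching.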
  • 10. Falcon Feed
    <feed description="input feed" name="uk-inputfeed" xmlns="uri:falcon:feed:0.1" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
      <groups>input</groups>
      <frequency>hours(1)</frequency>
      <late-arrival cut-off="hours(6)" />
      <clusters>
        <cluster name="uk-clusterAlpha" type="source">
          <validity start="2015-02-20T18:00Z" end="2015-02-23T00:00Z"/>
          <retention limit="hours(24)" action="delete" />
        </cluster>
      </clusters>
      <locations>
        <location type="data" path="/user/sanjeev/falcon/input/${YEAR}/${MONTH}/${DAY}/${HOUR}" />
      </locations>
      <ACL owner="sanjeev" group="users" permission="0x755" />
      <schema location="/none" provider="none" />
    </feed>
  • 11. Falcon Feed
    <feed description="output feed" name="uk-outputfeed" xmlns="uri:falcon:feed:0.1" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
      <groups>output</groups>
      <frequency>hours(1)</frequency>
      <late-arrival cut-off="hours(6)" />
      <clusters>
        <cluster name="uk-clusterAlpha" type="source">
          <validity start="2015-02-20T18:00Z" end="2015-02-23T00:00Z"/>
          <retention limit="hours(24)" action="delete" />
        </cluster>
      </clusters>
      <locations>
        <location type="data" path="/user/sanjeev/falcon/output/${YEAR}/${MONTH}/${DAY}/${HOUR}" />
      </locations>
      <ACL owner="sanjeev" group="users" permission="0x755" />
      <schema location="/none" provider="none" />
    </feed>
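The `${YEAR}/${MONTH}/${DAY}/${HOUR}` placeholders in the feed location decide which HDFS directory belongs to which feed instance. As a rough illustration only (this is plain Python string substitution, not Falcon's own resolver, and `instance_path` is a hypothetical helper), the mapping from an instance time to a directory can be sketched like this:

```python
from datetime import datetime
from string import Template

# Location template copied from the uk-inputfeed definition.
FEED_PATH = "/user/sanjeev/falcon/input/${YEAR}/${MONTH}/${DAY}/${HOUR}"


def instance_path(template: str, instance_time: str) -> str:
    """Fill the date placeholders of a feed location for one instance time."""
    t = datetime.strptime(instance_time, "%Y-%m-%dT%H:%MZ")
    return Template(template).substitute(
        YEAR=f"{t.year:04d}",
        MONTH=f"{t.month:02d}",
        DAY=f"{t.day:02d}",
        HOUR=f"{t.hour:02d}",
    )


print(instance_path(FEED_PATH, "2015-02-20T18:00Z"))
# /user/sanjeev/falcon/input/2015/02/20/18
```

The printed directory is the same hourly input directory that the "What's on HDFS" slide lists for 18:00 on 2015-02-20.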
  • 12. Falcon Process
    <?xml version="1.0" encoding="UTF-8" standalone="yes"?>
    <process name="falcon-sanjeev-process" xmlns="uri:falcon:process:0.1" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
      <clusters>
        <cluster name="uk-clusterAlpha">
          <validity start="2015-02-20T18:00Z" end="2015-02-23T00:00Z"/>
        </cluster>
      </clusters>
      <parallel>1</parallel>
      <frequency>hours(1)</frequency>
      <timezone>UTC</timezone>
      <inputs>
        <input end="today(18,0)" start="today(18,0)" feed="uk-inputfeed" name="input" />
      </inputs>
      <outputs>
        <output instance="now(0,0)" feed="uk-outputfeed" name="output" />
      </outputs>
      <properties>
        <property name="fileTime" value="${formatTime(dateOffset(instanceTime(), 1, 'DAY'), 'yyyy-MMM-dd')}"/>
        <property name="user" value="${user()}"/>
        <property name="baseTime" value="${today(0,0)}"/>
      </properties>
      <workflow engine="oozie" path="/user/sanjeev/falcon/workflow" />
      <retry policy="periodic" delay="minutes(10)" attempts="3" />
    </process>
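The `validity` window together with `frequency` determines how many process instances get scheduled. The sketch below lists the nominal instance times for the definition above; it assumes the validity end is exclusive, and `instance_times` is a hypothetical helper rather than anything Falcon provides.

```python
from datetime import datetime, timedelta

FMT = "%Y-%m-%dT%H:%MZ"  # timestamp format used in the validity attributes


def instance_times(start: str, end: str, every: timedelta) -> list:
    """List nominal instance times from start up to, but not including, end."""
    t = datetime.strptime(start, FMT)
    stop = datetime.strptime(end, FMT)
    times = []
    while t < stop:  # assumption: the validity end is exclusive
        times.append(t.strftime(FMT))
        t += every
    return times


# Values taken from falcon-sanjeev-process: hours(1) between the validity bounds.
times = instance_times("2015-02-20T18:00Z", "2015-02-23T00:00Z", timedelta(hours=1))
print(len(times), times[0], times[-1])
# 54 2015-02-20T18:00Z 2015-02-22T23:00Z
```

With `parallel` set to 1, these instances would run one at a time, so a count like this gives a quick feel for how long a backlog could take to clear.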
  • 13. Oozie Workflow
    <workflow-app xmlns="uri:oozie:workflow:0.3" name="fs-workflow">
      <start to="fs-cmds"/>
      <action name="fs-cmds">
        <fs>
          <mkdir path='${output}'/>
        </fs>
        <ok to="end"/>
        <error to="fail"/>
      </action>
      <kill name="fail">
        <message>Workflow failed, error message[${wf:errorMessage(wf:lastErrorNode())}]</message>
      </kill>
      <end name="end"/>
    </workflow-app>
  • 14. What’s on HDFS
    Input Feed: /user/sanjeev/falcon/input/2015/02/20/00
    Input Feed: /user/sanjeev/falcon/input/2015/02/20/18
    Output Feed: /user/sanjeev/falcon/output/
    Workflow: /user/sanjeev/falcon/workflow/workflow.xml
    falcon entity -type cluster -submit -file uk-clusterAlpha.xml
    falcon entity -type feed -submit -file uk-inputfeed.xml
    falcon entity -type feed -submit -file uk-outputfeed.xml
    falcon entity -type process -submitAndSchedule -file falcon-sanjeev-process.xml
  • 15. Typical Production Process and Workflow
    (process) process-click-convert
    <?xml version="1.0" encoding="UTF-8" standalone="yes"?>
    <process name="process-click-convert" xmlns="uri:falcon:process:0.1">
      <clusters>
        <cluster name="uk-clusterAlpha">
          <validity start="2015-01-15T00:00Z" end="2100-01-01T00:00Z"/>
        </cluster>
        <cluster name="us-clusterGamma">
          <validity start="2015-01-15T00:30Z" end="2100-01-01T00:00Z"/>
        </cluster>
      </clusters>
      <parallel>2</parallel>
      <order>FIFO</order>
      <frequency>minutes(30)</frequency>
      <timezone>UTC</timezone>
      <inputs>
        <input name="Input" feed="feed-click-stream" start="now(0,-30)" end="now(0,-1)"/>
      </inputs>
      <outputs>
        <output name="Output" feed="feed-click-convert" instance="now(0,-30)"/>
      </outputs>
      <properties>
        <property name="queueName" value="stream"/>
        <property name="jobPriority" value="NORMAL"/>
      </properties>
      <workflow path="/projects/support/click/conversion" lib="/projects/support/lib"/>
    </process>
    (workflow) /projects/support/click/conversion/workflow.xml
    <workflow-app xmlns='uri:oozie:workflow:0.3' name='click-conversion'>
      <start to='click-convert' />
      <action name='click-convert'>
        <java>
          <job-tracker>${jobTracker}</job-tracker>
          <name-node>${nameNode}</name-node>
          <prepare>
            <delete path="${Output}"/>
            <delete path="${wf:conf('Output.stats')}"/>
            <delete path="${wf:conf('Output.tmp')}"/>
          </prepare>
          <configuration>
            <property>
              <name>mapred.job.queue.name</name>
              <value>${queueName}</value>
            </property>
            <property>
              <name>mapred.job.priority</name>
              <value>${jobPriority}</value>
            </property>
          </configuration>
          <main-class>com.my.cluster.io.Driver</main-class>
          <arg>-inputpath</arg><arg>${Input}</arg>
          <arg>-outputpath</arg><arg>${Output}</arg>
          <arg>-statspath</arg><arg>${wf:conf("Output.stats")}</arg>
          <arg>-stagingpath</arg><arg>${wf:conf("Output.tmp")}</arg>
        </java>
        <ok to="end" />
        <error to="fail" />
      </action>
      <kill name="fail">
        <message>Workflow failed, error message[${wf:errorMessage(wf:lastErrorNode())}]</message>
      </kill>
      <end name='end' />
    </workflow-app>
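The `now(0,-30)` and `now(0,-1)` expressions in process-click-convert define the input window relative to each instance. The sketch below works one instance through by hand, assuming that `now(hours, minutes)` means the nominal instance time shifted by that many hours and minutes; the Python function `now` is only a stand-in for the expression, not Falcon code, and the instance time is an arbitrary example.

```python
from datetime import datetime, timedelta

FMT = "%Y-%m-%dT%H:%MZ"


def now(instance_time: datetime, hours: int, minutes: int) -> datetime:
    """Stand-in for now(h, m): the instance time shifted by h hours and m minutes (assumed semantics)."""
    return instance_time + timedelta(hours=hours, minutes=minutes)


# One example instance of the 30-minute process.
instance = datetime.strptime("2015-01-15T01:00Z", FMT)

input_start = now(instance, 0, -30)   # <input start="now(0,-30)">
input_end = now(instance, 0, -1)      # <input end="now(0,-1)">
output_time = now(instance, 0, -30)   # <output instance="now(0,-30)">

print(input_start.strftime(FMT), input_end.strftime(FMT), output_time.strftime(FMT))
# 2015-01-15T00:30Z 2015-01-15T00:59Z 2015-01-15T00:30Z
```

Under that reading, the 01:00 instance consumes the click-stream data from the preceding half hour and writes the converted output for the 00:30 slot.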
  • 16. Falcon Instance Operation
    Command:
      falcon instance -type [feed/process] -[status/list]
      falcon instance -type [feed/process] -name [processname/feedname] {-start YYYY-MM-DDTHH:MMZ -end YYYY-MM-DDTHH:MMZ} -[OPTION]
    Feed/Process OPTIONS: status, list, logs, kill, rerun, suspend, resume
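When instance operations are scripted, for example from a monitoring job, it helps to build the command line in one place. The following sketch only assembles the argument list from the flags and options named on this slide; `instance_command` is a hypothetical helper, and nothing here runs Falcon or checks what a particular Falcon version accepts.

```python
import shlex

# Options listed on the slide for feed/process instances.
INSTANCE_OPTIONS = {"status", "list", "logs", "kill", "rerun", "suspend", "resume"}


def instance_command(entity_type, name, option, start=None, end=None):
    """Build the argv for a 'falcon instance' call without executing it."""
    if entity_type not in {"feed", "process"}:
        raise ValueError(f"unsupported entity type: {entity_type}")
    if option not in INSTANCE_OPTIONS:
        raise ValueError(f"unsupported option: {option}")
    argv = ["falcon", "instance", "-type", entity_type, "-name", name]
    if start:
        argv += ["-start", start]
    if end:
        argv += ["-end", end]
    argv.append(f"-{option}")
    return argv


cmd = instance_command(
    "process", "falcon-sanjeev-process", "status",
    start="2015-02-20T18:00Z", end="2015-02-23T00:00Z",
)
print(shlex.join(cmd))
```

The resulting list could be handed to `subprocess.run` on a host where the Falcon client is installed; keeping it as a list avoids shell quoting problems.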
  • 17. Monitoring
    • Falcon CLI
    • Oozie CLI
    • ActiveMQ
    • falcon entity -type process -name falcon-sanjeev-process -dependency
        (cluster) uk-clusterAlpha
        (feed) uk-inputfeed - [Input]
        (feed) uk-outputfeed - [Output]
    • falcon instance -type process -name falcon-sanjeev-process -start 2015-02-20T18:00Z -end 2015-02-23T00:00Z -status
        Consolidated Status: SUCCEEDED
        Instances:
        Instance           Cluster          SourceCluster  Status     Start              End                Details  Log
        -----------------------------------------------------------------------------------------------
        2015-02-20T18:00Z  uk-clusterAlpha  -              SUCCEEDED  2015-02-20T18:00Z  2015-02-20T18:01Z  -        http://oozie.cluster.my.com:11000/oozie/?job=0229074-150205100814135-oozie-oozi-W
  • 19. Onboarding Pipeline
    • Group all processes by frequency: minutely, hourly, daily, weekly, monthly
    • Group related feeds
    • Verify that all process jars and workflows have been pushed to the cluster
    • Verify ownership of all feed and process directories
    • Verify that owners have job-scheduling access roles on the particular cluster
    • Validate the feeds
    • Submit and schedule the feeds, so that retention and replication are in place
    • Dry-run the process schedule
    • Submit and schedule the process
    • Document the feed SLA, HDFS usage and retention period for monitoring
    • Document the process SLA, to observe delays
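Part of the checking that precedes "submit and schedule the process" can be automated: a process should only reference clusters and feeds that have already been submitted. The sketch below is one possible pre-flight check, not a Falcon feature; `unresolved_references` is a hypothetical helper and the embedded XML is a trimmed copy of the falcon-sanjeev-process definition.

```python
import xml.etree.ElementTree as ET

# Namespace copied from the process definition.
NS = {"p": "uri:falcon:process:0.1"}

PROCESS_XML = """<process name="falcon-sanjeev-process" xmlns="uri:falcon:process:0.1">
  <clusters>
    <cluster name="uk-clusterAlpha">
      <validity start="2015-02-20T18:00Z" end="2015-02-23T00:00Z"/>
    </cluster>
  </clusters>
  <inputs>
    <input end="today(18,0)" start="today(18,0)" feed="uk-inputfeed" name="input"/>
  </inputs>
  <outputs>
    <output instance="now(0,0)" feed="uk-outputfeed" name="output"/>
  </outputs>
</process>"""


def unresolved_references(process_xml, submitted_clusters, submitted_feeds):
    """Return (clusters, feeds) that the process uses but that are not yet submitted."""
    root = ET.fromstring(process_xml)
    clusters = {c.get("name") for c in root.findall("p:clusters/p:cluster", NS)}
    feeds = {i.get("feed") for i in root.findall("p:inputs/p:input", NS)}
    feeds |= {o.get("feed") for o in root.findall("p:outputs/p:output", NS)}
    return (
        sorted(clusters - set(submitted_clusters)),
        sorted(feeds - set(submitted_feeds)),
    )


# Example: the output feed has not been submitted yet.
print(unresolved_references(PROCESS_XML, {"uk-clusterAlpha"}, {"uk-inputfeed"}))
# ([], ['uk-outputfeed'])
```

The sets of submitted names would in practice come from wherever the team tracks its entities; they are passed in by hand here to keep the example self-contained.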
  • 20. Challenges
    • Tightly integrated with Oozie
    • Monitoring and onboarding need to be streamlined
    • Real-time changes to schedule times and queues
    Advantages
    • Development is very aggressive
    • The industry is adopting it quickly
    • Once onboarded, focus is only needed on a set of critical processes
    • Easy shutdown and upgrade, as all the running jobs are managed by Oozie
    • DevOps can easily set up and manage data