Apache Falcon
Data Management Platform for Hadoop
● Ajay Yadava
● Committer, Apache Falcon
● Lead - Apache Falcon @ Inmobi
What is Apache Falcon?
Falcon is a data processing and management solution for Hadoop
designed for data motion, coordination of data pipelines, lifecycle
management, and data discovery. Falcon enables end consumers to
quickly onboard their data and its associated processing and
management tasks on Hadoop clusters.
Core Services
● Process Relays
● Late Data Management
● Retention
● Replication
● Acquisition
● Operability
● Holistic declaration of intent
● Anonymization
● Lineage
● SLA
Life of Byte
Data Relays
Data Retention as a service
Late Data Management
Data Replication as a service
Data Acquisition as a service
Holistic Declaration of Intent
Operability – Dashboard
Overview
Entity Dependency Graph
[Diagram: Cluster, Feed, and Process entities connected by three "depends" edges]
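The dependency graph implies an order in which entities have to be submitted. The sketch below is only an illustration of that idea, not Falcon code. It assumes that feeds depend on clusters and that processes depend on both feeds and clusters; the entity names are borrowed from the specifications in this deck, and `submission_order` is a hypothetical helper.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Each entity maps to the set of entities it depends on.
# Assumption: feed -> cluster, process -> feed and cluster.
depends_on = {
    "cluster:prod-cluster": set(),
    "feed:repl-feed": {"cluster:prod-cluster"},
    "process:clicks-hourly": {"cluster:prod-cluster", "feed:repl-feed"},
}


def submission_order(graph):
    """Return the entities ordered so that every dependency comes first."""
    return list(TopologicalSorter(graph).static_order())


print(submission_order(depends_on))
```

Deleting entities would follow the reverse order, since an entity that others still depend on cannot safely be removed first.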
Cluster specification
<cluster colo="SF-datacenter" description="" name="prod-cluster" xmlns="uri:falcon:cluster:0.1">
  <interfaces>
    <interface type="readonly" endpoint="hftp://nn:50070" version="1.1.2"/>
    <interface type="write" endpoint="hdfs://nn:8020" version="1.1.2"/>
    <interface type="execute" endpoint="rm:8050" version="1.1.2"/>
    <interface type="workflow" endpoint="http://oozie:41000/oozie/" version="4.0.0"/>
    <interface type="registry" endpoint="http://oozie:41000/oozie/" version="4.0.0"/>
    <interface type="messaging" endpoint="tcp://:61616?daemon=true" version="5.4.3"/>
  </interfaces>
  <locations>
    <location name="staging" path="/projects/falcon/staging"/> <!--mandatory-->
    <location name="temp" path="/projects/falcon/tmp"/> <!--optional-->
    <location name="working" path="/projects/falcon/working"/> <!--optional-->
  </locations>
</cluster>
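● readonly: used by distcp for replication
● write: used for writing to HDFS
● execute: used to submit processes as MapReduce jobs
● workflow: used to submit Oozie jobs
● registry: Hive metastore, used to register/deregister partitions and get data availability events
● messaging: used for alerts
● locations: HDFS directories used by Falcon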
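A cluster specification like the one above can be sanity-checked before it is submitted. The following is a minimal sketch using only the Python standard library; it is not part of Falcon. The function name `check_cluster` and the default lists of required interfaces and locations are assumptions made for illustration (only the `staging` location is marked mandatory on the slide), and the XML is a shortened copy of the slide's example.

```python
import xml.etree.ElementTree as ET

# Namespace taken from the cluster specification on the slide.
NS = {"c": "uri:falcon:cluster:0.1"}

CLUSTER_XML = """<cluster colo="SF-datacenter" description="" name="prod-cluster" xmlns="uri:falcon:cluster:0.1">
  <interfaces>
    <interface type="readonly" endpoint="hftp://nn:50070" version="1.1.2"/>
    <interface type="write" endpoint="hdfs://nn:8020" version="1.1.2"/>
    <interface type="workflow" endpoint="http://oozie:41000/oozie/" version="4.0.0"/>
  </interfaces>
  <locations>
    <location name="staging" path="/projects/falcon/staging"/>
  </locations>
</cluster>"""


def check_cluster(xml_text,
                  required_interfaces=("readonly", "write", "workflow"),
                  required_locations=("staging",)):
    """Return (interfaces, locations, missing) for a cluster specification.

    The required_* defaults are illustrative assumptions, not Falcon's rules.
    """
    root = ET.fromstring(xml_text)
    interfaces = {i.get("type"): i.get("endpoint")
                  for i in root.findall("c:interfaces/c:interface", NS)}
    locations = {loc.get("name"): loc.get("path")
                 for loc in root.findall("c:locations/c:location", NS)}
    missing = [t for t in required_interfaces if t not in interfaces]
    missing += [n for n in required_locations if n not in locations]
    return interfaces, locations, missing


interfaces, locations, missing = check_cluster(CLUSTER_XML)
print(interfaces["write"], locations["staging"], missing)
```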
Feed specification
<feed description="enhanced clicks replication feed" name="repl-feed" xmlns="uri:falcon:feed:0.1">
  <frequency>minutes(5)</frequency>
  <late-arrival cut-off="hours(1)"/>
  <sla slaLow="hours(2)" slaHigh="hours(3)"/>
  <clusters>
    <cluster name="primary" type="source">
      <validity start="2013-01-01T00:00Z" end="2030-01-01T00:00Z"/>
      <retention limit="days(2)" action="delete"/>
    </cluster>
    <cluster name="secondary" type="target">
      <validity start="2013-11-15T00:00Z" end="2030-01-01T00:00Z"/>
      <retention limit="days(2)" action="delete"/>
      <locations>
        <location type="data" path="/data/clicks/repl-enhanced/${YEAR}/${MONTH}/${DAY}/${HOUR}/${MINUTE}"/>
      </locations>
    </cluster>
  </clusters>
  <locations>
    <location type="data" path="/data/clicks/enhanced/${YEAR}/${MONTH}/${DAY}/${HOUR}/${MINUTE}"/>
  </locations>
  <ACL owner="testuser-ut-user" group="group" permission="0x644"/>
</feed>
Frequency
Location
SLA Monitoring
Data Retention
Data Replication
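To make the frequency, location, and retention settings concrete, here is a rough sketch of how a feed's path template could be expanded for each instance and how a retention limit might be applied. It is an illustration under stated assumptions, not Falcon's implementation: the helper names are hypothetical, and the exact eviction semantics in Falcon may differ from the simple "older than the limit" rule used here.

```python
from datetime import datetime, timedelta, timezone

# Path template copied from the feed specification above.
TEMPLATE = "/data/clicks/enhanced/${YEAR}/${MONTH}/${DAY}/${HOUR}/${MINUTE}"


def instance_path(template, instant):
    """Substitute the ${YEAR}..${MINUTE} variables for one feed instance."""
    values = {
        "YEAR": f"{instant.year:04d}",
        "MONTH": f"{instant.month:02d}",
        "DAY": f"{instant.day:02d}",
        "HOUR": f"{instant.hour:02d}",
        "MINUTE": f"{instant.minute:02d}",
    }
    for name, value in values.items():
        template = template.replace("${" + name + "}", value)
    return template


def instance_times(start, end, frequency):
    """Yield instance times from start (inclusive) to end (exclusive)."""
    current = start
    while current < end:
        yield current
        current += frequency


def is_expired(instant, now, limit):
    """Assumed rule: an instance is expired once it is older than the limit."""
    return instant < now - limit


start = datetime(2013, 1, 1, 0, 0, tzinfo=timezone.utc)
for t in instance_times(start, start + timedelta(minutes=20), timedelta(minutes=5)):
    print(instance_path(TEMPLATE, t))
```

With `frequency` set to `minutes(5)`, each five-minute instance maps to its own directory, which is what allows retention and replication to operate per instance.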
Process specification
<process name="clicks-hourly" xmlns="uri:falcon:process:0.1">
  <clusters>
    <cluster name="corp">
      <validity start="2011-11-02T00:00Z" end="2011-12-30T00:00Z"/>
    </cluster>
  </clusters>
  <parallel>1</parallel>
  <order>LIFO</order>
  <frequency>hours(1)</frequency>
  <inputs>
    <input name="click" feed="clicks-enhanced" start="yesterday(0,0)" end="latest(0)" partition="*/US"/>
  </inputs>
  <outputs>
    <output name="clicksummary" feed="click-hourly" instance="today(0,0)"/>
  </outputs>
  <workflow name="test" version="1.0.0" engine="oozie" path="/user/guest/workflow" lib="/user/guest/workflowlib"/>
  <retry policy="periodic" delay="hours(10)" attempts="3"/>
  <late-process policy="exp-backoff" delay="hours(1)">
    <late-input input="click" workflow-path="hdfs://clicks/late/workflow"/>
  </late-process>
</process>
● Where should the process run?
● How should the process run?
● What to consume?
● What to produce?
● Late Data Handling
● Retry
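The `retry` and `late-process` elements name two scheduling policies, `periodic` and `exp-backoff`. The sketch below shows one plausible reading of them: a periodic policy waits the same delay before every attempt, while exponential backoff multiplies the delay each time. The doubling factor and the `attempts` argument for the late-data case are assumptions for illustration; the slide does not state how Falcon computes these delays.

```python
from datetime import timedelta


def periodic_delays(delay, attempts):
    """Same wait before every attempt, e.g. retry policy="periodic"."""
    return [delay] * attempts


def exp_backoff_delays(delay, attempts, factor=2):
    """Wait grows by `factor` each time (assumed factor of 2)."""
    return [delay * factor ** n for n in range(attempts)]


# Values taken from the process specification: delay="hours(10)" attempts="3".
print(periodic_delays(timedelta(hours=10), 3))

# delay="hours(1)" from late-process; the attempt count here is hypothetical.
print(exp_backoff_delays(timedelta(hours=1), 3))
```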
Architecture
Falcon Unit
A pipeline validation framework
Motivation for Falcon Unit
● User errors caught only at deploy time.
● Input/Output feeds and paths not getting resolved.
● Errors in specification.
● Integration Tests require environment setup/tearDown.
● Messy deployment scripts.
● Debugging was cumbersome.
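One class of error in this list, input or output feeds that do not resolve, can be illustrated with a small static check. This is not Falcon Unit and does not use its API; it is a hedged sketch with a hypothetical helper, `unresolved_feeds`, that compares the feeds referenced by a process specification against a set of known feed names. The XML is a trimmed copy of the process specification shown earlier.

```python
import xml.etree.ElementTree as ET

# Namespace taken from the process specification in this deck.
PROCESS_NS = {"p": "uri:falcon:process:0.1"}

PROCESS_XML = """<process name="clicks-hourly" xmlns="uri:falcon:process:0.1">
  <inputs>
    <input name="click" feed="clicks-enhanced" start="yesterday(0,0)" end="latest(0)"/>
  </inputs>
  <outputs>
    <output name="clicksummary" feed="click-hourly" instance="today(0,0)"/>
  </outputs>
</process>"""


def unresolved_feeds(process_xml, known_feeds):
    """Return the sorted names of referenced feeds that are not known."""
    root = ET.fromstring(process_xml)
    referenced = [e.get("feed") for e in root.findall("p:inputs/p:input", PROCESS_NS)]
    referenced += [e.get("feed") for e in root.findall("p:outputs/p:output", PROCESS_NS)]
    return sorted(name for name in set(referenced) if name not in known_feeds)


# Only the input feed has been defined, so the output feed is reported.
print(unresolved_feeds(PROCESS_XML, {"clicks-enhanced"}))
```

A check like this only covers references between specifications; running the pipeline against a local or real cluster is still needed to catch path and workflow errors.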
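Falcon Unit
[Diagram: a test suite drives Falcon Unit, which runs against either an in-process execution environment or an actual cluster]
In-process execution environment
● Local Oozie
● Local File System
● Local Job Runner
● Local Message Queue
Actual cluster
● Oozie
● HDFS
● YARN
● ActiveMQ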
Example
Process Submission:
submit(EntityType.Process, <Path to Daily clicks Agg XML>); → Local Mode
submit(EntityType.Process, <Path to Daily clicks Agg XML>); → Cluster Mode
Process Scheduling:
scheduleProcess("daily_clicks_agg", startTime, numInstances, clusterName);
Process Verification:
getInstanceStatus(EntityType.Process, "daily_clicks_agg", scheduleTime);
Deployment
Embedded Mode
Distributed Mode
Monitoring
SLA Monitoring
● Alerts based on data availability
● Dashboard
● Pluggable alerting system
  ● Email
  ● JMS notifications
Pipeline view
Triage
● Better authentication and authorization
● Even better UI
● Even better monitoring
● Process SLAs
● Streaming support
● A more powerful scheduler
● Pipeline recovery
Community
Questions?
● Apache Falcon
  ● falcon.apache.org
  ● dev@falcon.apache.org / user@falcon.apache.org
● Ajay Yadava
  ● ajayyadava@apache.org
