@serrazon
A system to process and distribute data
Contents
● Where did NiFi come from?
● The NiFi way
● Flows
● Messaging
● Architecture
● Demo
https://nifi.apache.org
Where did NiFi come from?
History
● NSA Technology Transfer Program - NiagaraFiles
● FBP - Flow-Based Programming
● Hortonworks maintains NiFi on Apache
The NiFi way
Abstractions
NiFi Term | FBP Term | Description
FlowFile | Information Packet | Unit of data moving from one system to another. Tracked through its key/value attributes.
Processor | Black Box | Performs data routing, transformation, or mediation between systems. Has access to FlowFile attributes, can work on zero or more FlowFiles, and can commit or roll back its work.
Connection | Bounded Buffer | Links processors. Acts as a queue and allows different processes to work at different rates. Allows dynamic prioritization and can have upper bounds on load, which enables back pressure.
Flow Controller | Scheduler | Maintains the knowledge of how processes connect and manages the worker threads. Acts as a broker between processors.
Process Group | Subnet | Set of processes and their connections. Has input and output ports to communicate with other process groups or processors. Allows composition of other components.
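The abstractions above can be illustrated with a minimal Python sketch. This is not NiFi's API: the class and function names (`FlowFile`, `Connection`, `uppercase_processor`, `max_queued`) are hypothetical, and the model is single-threaded. It only shows how a bounded connection lets a downstream queue apply back pressure to the processor feeding it.

```python
import queue
from dataclasses import dataclass, field


@dataclass
class FlowFile:
    """Unit of data: a content payload plus key/value attributes."""
    content: bytes
    attributes: dict = field(default_factory=dict)


class Connection:
    """Bounded queue between two processors; a full queue signals back pressure."""

    def __init__(self, max_queued: int):
        self._queue = queue.Queue(maxsize=max_queued)

    def is_full(self) -> bool:
        return self._queue.full()

    def offer(self, flow_file: FlowFile) -> bool:
        """Try to enqueue; return False instead of blocking when the bound is reached."""
        try:
            self._queue.put_nowait(flow_file)
            return True
        except queue.Full:
            return False

    def poll(self):
        """Return the next FlowFile, or None when the queue is empty."""
        try:
            return self._queue.get_nowait()
        except queue.Empty:
            return None


def uppercase_processor(source: Connection, destination: Connection) -> int:
    """Toy processor: transform FlowFiles until the input is empty or the
    destination is full (back pressure). Returns the number processed."""
    processed = 0
    while not destination.is_full():
        flow_file = source.poll()
        if flow_file is None:
            break
        attributes = dict(flow_file.attributes, transformed="true")
        destination.offer(FlowFile(flow_file.content.upper(), attributes))
        processed += 1
    return processed
```

With three FlowFiles queued and a destination bounded at two, the processor stops after two and leaves the third in the source connection until the downstream consumer catches up.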
Messaging
[Diagram: A (Producer) -> Message channel -> B (Consumer)]
Data flowing in a message from A (producer) through a channel to B (consumer)
Data going from Producers to Consumers
● Formats and/or schemas
● Protocols
● Priorities - The most important first
● Batch vs Streams
● Data level security - authorization
● I need just a part of the message
● Before I get the data, please clean it and prepare it first.
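Three of the requirements above (priorities, needing only part of a message, and cleaning data before delivery) can be sketched in plain Python. The names `PriorityChannel` and `prepare` are hypothetical, as is the convention that a lower number means higher priority; none of this is taken from NiFi or from the original slides.

```python
import heapq
import itertools
import json


class PriorityChannel:
    """Channel that delivers the most important message first (lowest number wins)."""

    def __init__(self):
        self._heap = []
        self._order = itertools.count()  # tie-breaker keeps FIFO order within a priority

    def publish(self, priority: int, payload: str) -> None:
        heapq.heappush(self._heap, (priority, next(self._order), payload))

    def consume(self) -> str:
        _, _, payload = heapq.heappop(self._heap)
        return payload


def prepare(payload: str, wanted_fields: list) -> dict:
    """Clean a JSON message (strip whitespace from string values) and keep
    only the fields the consumer asked for."""
    record = json.loads(payload)
    cleaned = {k: v.strip() if isinstance(v, str) else v for k, v in record.items()}
    return {k: cleaned[k] for k in wanted_fields if k in cleaned}


channel = PriorityChannel()
channel.publish(5, '{"level": " info ", "msg": "heartbeat", "host": "a"}')
channel.publish(1, '{"level": " error ", "msg": "disk full", "host": "b"}')

# The error was published second but is consumed first, already trimmed and reduced.
first = prepare(channel.consume(), ["level", "msg"])
```

Here the consumer never sees the `host` field or the stray whitespace: the preparation step runs between the channel and the consumer.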
Today's Messaging Scenario
[Diagram: Acquire Data -> Process / Analyze Data -> Store Data, connected by dataflows]
Massive amounts of data are produced by several types of producers and go over
the wire through several types of channels.
Challenge: acquire, process, and store the data online, quickly, and securely.
The Messaging Problem at Large Scale
What does NiFi offer?
● No coding, no deployment - visual operation and control, on the fly
● No log searching - tracks everything that happens - data lineage (provenance)
● Configure and change how data is distributed - prioritization
● Regulate the speed of data consumption - data buffering - back pressure
● Control latency vs throughput
● Secure control layer / data layer - authentication / authorization
● Multiple instances - clustering
● Extensibility
It was designed to tackle global enterprise dataflow challenges
Apache NiFi
What is NiFi for?
● Simple data transfer between systems - reliable and secure
● Feeding data into analytics layers
● Data magic / data preparation
○ Conversion between formats
○ Extraction / parsing
○ Routing decisions
And what is NiFi NOT for?
● Distributed computation
● Complex event processing
Types of use cases
● IoT remote sensor data capture
● Enterprise integrations (between systems on an intranet or the internet)
● Big Data ingestion
● Simple event processing (handling discrete points)
More information on use cases is available online...
So, why NiFi?
Wider coverage than other solutions on the market.
Covers a wider range of dataflow scenarios. Allows composition of processes.
On-the-fly changes - wow!
Keeps track of everything
Meets high security and compliance requirements
Apache NiFi - Architecture
[Diagram: an OS/host runs a JVM containing the Web Server and the Flow Controller, which manages Processor 1 and Processor 2. The FlowFile Repository, Content Repository, and Provenance Repository are kept on local storage.]
Demo
● Get log data from system A
● Publish dataflow to a telemetry queue
● Subscribe to the queue for processing on system B
● Show data provenance
● Show queuing at relationship level
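The demo steps can be approximated without NiFi in a few lines of Python. This is only an in-memory stand-in, not the flow shown in the talk: the telemetry queue is a `deque`, and the function names, component names, and event types (`RECEIVE`, `SEND`) are assumptions chosen for illustration.

```python
import time
import uuid
from collections import deque

provenance = []  # append-only list of lineage events (stand-in for a provenance store)


def record(event_type: str, flow_id: str, component: str) -> None:
    """Append one lineage event so a message's path can be reconstructed later."""
    provenance.append({"time": time.time(), "type": event_type,
                       "flow_id": flow_id, "component": component})


def get_logs(lines, telemetry_queue: deque) -> None:
    """System A: turn each log line into a tracked message and publish it."""
    for line in lines:
        flow_id = str(uuid.uuid4())
        record("RECEIVE", flow_id, "system-a")
        telemetry_queue.append({"id": flow_id, "content": line})
        record("SEND", flow_id, "telemetry-queue")


def process_logs(telemetry_queue: deque) -> list:
    """System B: drain the queue and keep only the error lines."""
    errors = []
    while telemetry_queue:
        message = telemetry_queue.popleft()
        record("RECEIVE", message["id"], "system-b")
        if "ERROR" in message["content"]:
            errors.append(message["content"])
    return errors


def lineage(flow_id: str) -> list:
    """Return the ordered (event type, component) pairs seen for one message."""
    return [(e["type"], e["component"]) for e in provenance if e["flow_id"] == flow_id]
```

Calling `get_logs` and then checking `len(telemetry_queue)` shows the queue depth before system B subscribes; calling `lineage` with a message id afterwards answers the provenance question of where that message has been.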