SlideShare uma empresa Scribd logo
1 de 39
Baixar para ler offline
© Hortonworks Inc. 2011–2019. All rights reserved;1
Special Crash Course WiFi
SSID: CrashCourse
PW: DataWorks19
1. Download NiFi 1.8.0
1. Browse to https://nifi.apache.org/download.html
2.wget http://ftp.riken.jp/net/apache/nifi/1.8.0/nifi-1.8.0-bin.tar.gz
2. Untar file
1.tar -xvzf nifi-1.8.0-bin.tar.gz
3. Change into $NIFI_HOME dir
1.cd nifi-1.8.0
4. Start NiFi
1../bin/nifi.sh start
2. tail -f logs/nifi-app.log
© Hortonworks Inc. 2011–2019. All rights reserved;2
Apache NiFi Crash Course
Andy LoPresto | @yolopey
Sr. Member of Technical Staff at Hortonworks, Apache NiFi PMC & Committer
06 February 2019 Dataworks Summit Melbourne
Special Crash Course WiFi
SSID: CrashCourse
PW: DataWorks19
© Hortonworks Inc. 2011–2019. All rights reserved;3
Acknowledgement of Country
I acknowledge the Traditional Owners of the land on which we
are meeting. I pay my respects to their Elders, past and
present, and the Aboriginal Elders of other communities who
may be here today.
© Hortonworks Inc. 2011–2019. All rights reserved;4
Gauging Audience Familiarity With NiFi
“What’s a NeeFee?”
No experience with dataflow
No experience with NiFi
“I can pick this up pretty quickly”
Some experience with dataflow
Some experience with NiFi
“I refactored the Ambari
integration endpoint to allow
for mutual authentication
TLS during my coffee break”
Forgotten more about NiFi
than most of us will ever
know
© Hortonworks Inc. 2011–2019. All rights reserved;5
Agenda
• Introduction
• What is dataflow?
• What is NiFi?
• What’s next?
• All slides provided online, so no need to transcribe
© Hortonworks Inc. 2011–2019. All rights reserved;6
Goals for today
• Not to demonstrate what I know about NiFi
• Teach you something new
• Encourage exploration on topics that interest you
• Empower you to build flows on your own
© Hortonworks Inc. 2011–2019. All rights reserved;7
What is dataflow?
© Hortonworks Inc. 2011–2019. All rights reserved;8
What is dataflow?
• Moving some content from A to B
• Content could be any bytes
• Logs
• HTTP
• XML
• CSV
• Images
• Video
• Telemetry
Producers A.K.A
Things
Anything
AND
Everything
Internet!
Consumers
• User
• Storage
• System
• …More Things
© Hortonworks Inc. 2011–2019. All rights reserved;9
Connecting Data Points Is Easy
• Simple enough to write a process
• Bash/Ruby/Python
• SQL proc
• etc.
Log files
SQL
Big Data
© Hortonworks Inc. 2011–2019. All rights reserved;10
Big Data Is About Scale…
• …and this doesn’t scale
• Example use case:
• AOL Data Processing
• AWS -> HDFS
• 20 TB ingested/day
• Lev Brailovskiy, “Data Ingestion
and Distribution with Apache
NiFi”, Slide 27, 02/2017
• https://www.slideshare.net/
LevBrailovskiy/data-ingestion-and-
distribution-with-apache-nifi
© Hortonworks Inc. 2011–2019. All rights reserved;11
Moving data effectively is hard
Standards: http://xkcd.com/927/
© Hortonworks Inc. 2011–2019. All rights reserved;12
Moving data effectively is really hard
“Data Pipeline” https://xkcd.com/2054/
© Hortonworks Inc. 2011–2019. All rights reserved;13
• Standards
• Formats
• Protocols
• Veracity
• Validity
• Schemas
• Partitioning/
Bundling
Data
Dataflow Challenges In 3 Categories
Infrastructure
• “Exactly Once”
Delivery
• Ensuring
Security
• Overcoming
Security
• Credential
Management
• Network
People
• Compliance
• “That [person|
team|group]”
• Consumers
Change
• Requirements
Change
• “Exactly Once”
Delivery
© Hortonworks Inc. 2011–2019. All rights reserved;14
Raise your hand if you want to maintain Python scripts for the rest of your life
Let’s Connect Lots of As to Bs to As to Cs to Bs to Δs to Cs to ϕs
© Hortonworks Inc. 2011–2019. All rights reserved;15
What is Apache NiFi?
© Hortonworks Inc. 2011–2019. All rights reserved;16
NiFi is based on Flow Based Programming (FBP)
FBP Term NiFi Term Description
Information
Packet
FlowFile Each object moving through the system.
Black Box FlowFile
Processor
Performs the work, doing some combination of data routing, transformation,
or mediation between systems.
Bounded
Buffer
Connection The linkage between processors, acting as queues and allowing various
processes to interact at differing rates.
Scheduler Flow
Controller
Maintains the knowledge of how processes are connected, and manages the
threads and allocations thereof which all processes use.
Subnet Process
Group
A set of processes and their connections, which can receive and send data via
ports. A process group allows creation of entirely new component simply by
composition of its components.
© Hortonworks Inc. 2011–2019. All rights reserved;17
• Guaranteed delivery
• Data buffering
• Backpressure
• Pressure release
• Prioritized queuing
• Flow specific QoS
• Latency vs. throughput
• Loss tolerance
Key Features
Apache NiFi
• Data provenance
• Supports push and pull models
• Recovery/recording 

a rolling log of fine-grained history
• Visual command and control
• Flow templates
• Pluggable, multi-tenant security
• Designed for extension
• Clustering
© Hortonworks Inc. 2011–2019. All rights reserved;18
Flowfiles Are Like HTTP Data
HTTP Data FlowFile
HTTP/1.1 200 OK
Date: Sun, 10 Oct 2010 23:26:07 GMT
Server: Apache/2.2.8 (CentOS) OpenSSL/0.9.8g
Last-Modified: Sun, 26 Sep 2010 22:04:35 GMT
ETag: "45b6-834-49130cc1182c0"
Accept-Ranges: bytes
Content-Length: 13
Connection: close
Content-Type: text/html
Hello world!
Standard FlowFile Attributes
Key: 'entryDate’ Value: 'Fri Jun 17 17:15:04 EDT 2016'
Key: 'lineageStartDate’ Value: 'Fri Jun 17 17:15:04 EDT 2016'
Key: 'fileSize’ Value: '23609'
FlowFile Attribute Map Content
Key: 'filename’ Value: '15650246997242'
Key: 'path’ Value: './’
Binary Content *
Header
Content
© Hortonworks Inc. 2011–2019. All rights reserved;19
User Interface
Less of this…
© Hortonworks Inc. 2011–2019. All rights reserved;19
User Interface
Less of this…… more of this
© Hortonworks Inc. 2011–2019. All rights reserved;20
Deeper Ecosystem Integration: 286+ Processors, 61 Controller
Services
Hash
Extract
Merge
Duplicate
Scan
GeoEnrich
Replace
ConvertSplit
Translate
Route Content
Route Context
Route Text
Control Rate
Distribute Load
Generate Table Fetch
Jolt Transform JSON
Prioritized Delivery
Encrypt
Tail
Evaluate
Execute
All Apache project logos are trademarks of the ASF and the respective projects.
Fetch
HTTP
Syslog
Email
HTML
Image
HL7
FTP
UDP
XML
SFTP
AMQP
WebSocket
Parse Records Convert Records
© Hortonworks Inc. 2011–2019. All rights reserved;21
Extension / Integration Points
NiFi Term Description
Flow File
Processor
Push/Pull behavior. Custom UI
Reporting
Task
Used to push data from NiFi to some external service (metrics, provenance,
etc.)
Controller
Service
Used to enable reusable components / shared services throughout the flow
REST API Allows clients to connect to pull information, change behavior, etc.
© Hortonworks Inc. 2011–2019. All rights reserved;22
Architecture
OS/Host
JVM
Flow Controller
Web Server
Processor 1 Extension N
FlowFile

Repository
Content

Repository
Provenance

Repository
Local Storage
Standalone
Cluster
© Hortonworks Inc. 2011–2019. All rights reserved;23
NiFi Architecture – Repositories - Pass by reference
FlowFile Content Provenance
F1! C1
C1 P1! F1
Excerpt of demo flow… What’s happening inside the repositories…
BEFORE
AFTER
F2! C1
C1 P3! F2 – Clone (F1)
F1! C1 P2! F1 – Route
P1! F1 – Create
© Hortonworks Inc. 2011–2019. All rights reserved;24
NiFi Architecture – Repositories – Copy on Write
FlowFile Content Provenance
F1! C1
C1 P1! F1 - CREATE
Excerpt of demo flow… What’s happening inside the repositories…
BEFORE
AFTER
F1! C1
F1.1! C2
C2 (encrypted)
C1 (plaintext)
P2! F1.1 - MODIFY
P1! F1 - CREATE
© Hortonworks Inc. 2011–2019. All rights reserved;25
People want to know about their data
Colin the Chicken | Portlandia | IFC
© Hortonworks Inc. 2011–2019. All rights reserved;25
People want to know about their data
Colin the Chicken | Portlandia | IFC
© Hortonworks Inc. 2011–2019. All rights reserved;26
Data Provenance
▪ Constrained
▪ High-latency
▪ Localized context
▪ Hybrid – cloud/on-premises
▪ Low-latency
▪ Global context
Origin – attribution
Replay – recovery
Evolution of topologies
Long retention
Types of Lineage
• Event
• Configuration
© Hortonworks Inc. 2011–2019. All rights reserved;27
• Previously, data had to be divided
into individual flowfiles to perform
work
• CSV output with 50k lines would
need to be split, operated on, re-
merged
• 1 + 50k + 50k + 1 flowfiles = 100k
flowfiles
Record Parsing
© Hortonworks Inc. 2011–2019. All rights reserved;28
• Now flowfile content can contain many “record”
elements
• Read and write with *Reader and *Writer Controller
Services
• Perform lookups, routing, conversion, SQL queries,
validation, and more…
• 1 + 1 flowfiles = 2 flowfiles
Record Parsing
© Hortonworks Inc. 2011–2019. All rights reserved;29
• Every provenance event
record is encrypted with
AES G/CM before being
persisted to disk
• Decrypted on
deserialization for
retrieval/query
• Random access via
offset seek
• Handles key migration &
rotation
Encrypted Provenance Repository
© Hortonworks Inc. 2011–2019. All rights reserved;30
What’s Next?
© Hortonworks Inc. 2011–2019. All rights reserved;31
• NiFi 1.8.0 — 26 Oct 2018 (212+ Jiras)
• Jetty, DB improvements
• Auto load-balancing queues
• TLS Toolkit w/ external CA
• Record processor improvements
• MiNiFi C++ 0.5.0 — 6 June 2018
• MiNiFi Java 0.5.0 — 7 July 2018
• NiFi Registry 0.3.0 — 25 Sept 2018
New Announcements
© Hortonworks Inc. 2011–2019. All rights reserved;32
• Previously, flows were exported via
XML templates
• Didn’t contain sensitive values
• Couldn’t be updated in-place
• No tracking system
• NiFi Registry brings asset management
as first-class citizen to NiFi
• Flows can be versioned
• Flows can be promoted between
environments
Introducing Apache NiFi Registry 0.3.0
NiFi Registry for Dataflows
© Hortonworks Inc. 2011–2019. All rights reserved;33
Community Health
© Hortonworks Inc. 2011–2019. All rights reserved;34
Apache NiFi site

https://nifi.apache.org
Subproject MiNiFi site
https://nifi.apache.org/minifi/
Subscribe to and collaborate at

dev@nifi.apache.org
users@nifi.apache.org
Submit Ideas or Issues

https://issues.apache.org/jira/browse/NIFI
Follow us on Twitter
@apachenifi
Learn more and join us
© Hortonworks Inc. 2011–2019. All rights reserved;35
More NiFi Today…
Title Time Room
The First Mile – Edge and IoT Data Collection with Apache NiFi and
MiNiFi
1100 - 1140 Room 103
Apache NiFi Crash Course 1400 - 1600 Room 109
Dataflow Management From Edge to Core with Apache NiFi 1650 - 1730 Room 112
Using Spark Streaming and NiFi for the Next Generation of ETL in
the Enterprise
1650 - 1730 Room 103
© Hortonworks Inc. 2011–2019. All rights reserved;36
https://hortonworks.com/tutorial/
analyze-transit-patterns-with-apache-nifi/
© Hortonworks Inc. 2011–2019. All rights reserved;37
Thank you
alopresto@hortonworks.com | alopresto@apache.org | @yolopey
github.com/alopresto/slides

Mais conteúdo relacionado

Mais procurados

Real-time Twitter Sentiment Analysis and Image Recognition with Apache NiFi
Real-time Twitter Sentiment Analysis and Image Recognition with Apache NiFiReal-time Twitter Sentiment Analysis and Image Recognition with Apache NiFi
Real-time Twitter Sentiment Analysis and Image Recognition with Apache NiFiTimothy Spann
 
NiFi Developer Guide
NiFi Developer GuideNiFi Developer Guide
NiFi Developer GuideDeon Huang
 
Data ingestion and distribution with apache NiFi
Data ingestion and distribution with apache NiFiData ingestion and distribution with apache NiFi
Data ingestion and distribution with apache NiFiLev Brailovskiy
 
Using Spark Streaming and NiFi for the Next Generation of ETL in the Enterprise
Using Spark Streaming and NiFi for the Next Generation of ETL in the EnterpriseUsing Spark Streaming and NiFi for the Next Generation of ETL in the Enterprise
Using Spark Streaming and NiFi for the Next Generation of ETL in the EnterpriseDataWorks Summit
 
Apache NiFi Meetup - Princeton NJ 2016
Apache NiFi Meetup - Princeton NJ 2016Apache NiFi Meetup - Princeton NJ 2016
Apache NiFi Meetup - Princeton NJ 2016Timothy Spann
 
Best Practices for ETL with Apache NiFi on Kubernetes - Albert Lewandowski, G...
Best Practices for ETL with Apache NiFi on Kubernetes - Albert Lewandowski, G...Best Practices for ETL with Apache NiFi on Kubernetes - Albert Lewandowski, G...
Best Practices for ETL with Apache NiFi on Kubernetes - Albert Lewandowski, G...GetInData
 
Real time stock processing with apache nifi, apache flink and apache kafka
Real time stock processing with apache nifi, apache flink and apache kafkaReal time stock processing with apache nifi, apache flink and apache kafka
Real time stock processing with apache nifi, apache flink and apache kafkaTimothy Spann
 
Unlock Value from Big Data with Apache NiFi and Streaming CDC
Unlock Value from Big Data with Apache NiFi and Streaming CDCUnlock Value from Big Data with Apache NiFi and Streaming CDC
Unlock Value from Big Data with Apache NiFi and Streaming CDCHortonworks
 
A Thorough Comparison of Delta Lake, Iceberg and Hudi
A Thorough Comparison of Delta Lake, Iceberg and HudiA Thorough Comparison of Delta Lake, Iceberg and Hudi
A Thorough Comparison of Delta Lake, Iceberg and HudiDatabricks
 
Running Apache NiFi with Apache Spark : Integration Options
Running Apache NiFi with Apache Spark : Integration OptionsRunning Apache NiFi with Apache Spark : Integration Options
Running Apache NiFi with Apache Spark : Integration OptionsTimothy Spann
 
Flink Forward San Francisco 2019: Moving from Lambda and Kappa Architectures ...
Flink Forward San Francisco 2019: Moving from Lambda and Kappa Architectures ...Flink Forward San Francisco 2019: Moving from Lambda and Kappa Architectures ...
Flink Forward San Francisco 2019: Moving from Lambda and Kappa Architectures ...Flink Forward
 
Connecting the Drops with Apache NiFi & Apache MiNiFi
Connecting the Drops with Apache NiFi & Apache MiNiFiConnecting the Drops with Apache NiFi & Apache MiNiFi
Connecting the Drops with Apache NiFi & Apache MiNiFiDataWorks Summit
 
Apache NiFi User Guide
Apache NiFi User GuideApache NiFi User Guide
Apache NiFi User GuideDeon Huang
 
Apache NiFi Meetup - Introduction to NiFi Registry
Apache NiFi Meetup - Introduction to NiFi RegistryApache NiFi Meetup - Introduction to NiFi Registry
Apache NiFi Meetup - Introduction to NiFi RegistryBryan Bende
 
Apache Tez: Accelerating Hadoop Query Processing
Apache Tez: Accelerating Hadoop Query Processing Apache Tez: Accelerating Hadoop Query Processing
Apache Tez: Accelerating Hadoop Query Processing DataWorks Summit
 
Real-Time Data Flows with Apache NiFi
Real-Time Data Flows with Apache NiFiReal-Time Data Flows with Apache NiFi
Real-Time Data Flows with Apache NiFiManish Gupta
 
Making Data Timelier and More Reliable with Lakehouse Technology
Making Data Timelier and More Reliable with Lakehouse TechnologyMaking Data Timelier and More Reliable with Lakehouse Technology
Making Data Timelier and More Reliable with Lakehouse TechnologyMatei Zaharia
 
Drone Data Flowing Through Apache NiFi
Drone Data Flowing Through Apache NiFiDrone Data Flowing Through Apache NiFi
Drone Data Flowing Through Apache NiFiTimothy Spann
 

Mais procurados (20)

Real-time Twitter Sentiment Analysis and Image Recognition with Apache NiFi
Real-time Twitter Sentiment Analysis and Image Recognition with Apache NiFiReal-time Twitter Sentiment Analysis and Image Recognition with Apache NiFi
Real-time Twitter Sentiment Analysis and Image Recognition with Apache NiFi
 
Integrating Apache Spark and NiFi for Data Lakes
Integrating Apache Spark and NiFi for Data LakesIntegrating Apache Spark and NiFi for Data Lakes
Integrating Apache Spark and NiFi for Data Lakes
 
NiFi Developer Guide
NiFi Developer GuideNiFi Developer Guide
NiFi Developer Guide
 
Data ingestion and distribution with apache NiFi
Data ingestion and distribution with apache NiFiData ingestion and distribution with apache NiFi
Data ingestion and distribution with apache NiFi
 
Using Spark Streaming and NiFi for the Next Generation of ETL in the Enterprise
Using Spark Streaming and NiFi for the Next Generation of ETL in the EnterpriseUsing Spark Streaming and NiFi for the Next Generation of ETL in the Enterprise
Using Spark Streaming and NiFi for the Next Generation of ETL in the Enterprise
 
Apache NiFi Meetup - Princeton NJ 2016
Apache NiFi Meetup - Princeton NJ 2016Apache NiFi Meetup - Princeton NJ 2016
Apache NiFi Meetup - Princeton NJ 2016
 
Best Practices for ETL with Apache NiFi on Kubernetes - Albert Lewandowski, G...
Best Practices for ETL with Apache NiFi on Kubernetes - Albert Lewandowski, G...Best Practices for ETL with Apache NiFi on Kubernetes - Albert Lewandowski, G...
Best Practices for ETL with Apache NiFi on Kubernetes - Albert Lewandowski, G...
 
Real time stock processing with apache nifi, apache flink and apache kafka
Real time stock processing with apache nifi, apache flink and apache kafkaReal time stock processing with apache nifi, apache flink and apache kafka
Real time stock processing with apache nifi, apache flink and apache kafka
 
Hadoop Summit Tokyo Apache NiFi Crash Course
Hadoop Summit Tokyo Apache NiFi Crash CourseHadoop Summit Tokyo Apache NiFi Crash Course
Hadoop Summit Tokyo Apache NiFi Crash Course
 
Unlock Value from Big Data with Apache NiFi and Streaming CDC
Unlock Value from Big Data with Apache NiFi and Streaming CDCUnlock Value from Big Data with Apache NiFi and Streaming CDC
Unlock Value from Big Data with Apache NiFi and Streaming CDC
 
A Thorough Comparison of Delta Lake, Iceberg and Hudi
A Thorough Comparison of Delta Lake, Iceberg and HudiA Thorough Comparison of Delta Lake, Iceberg and Hudi
A Thorough Comparison of Delta Lake, Iceberg and Hudi
 
Running Apache NiFi with Apache Spark : Integration Options
Running Apache NiFi with Apache Spark : Integration OptionsRunning Apache NiFi with Apache Spark : Integration Options
Running Apache NiFi with Apache Spark : Integration Options
 
Flink Forward San Francisco 2019: Moving from Lambda and Kappa Architectures ...
Flink Forward San Francisco 2019: Moving from Lambda and Kappa Architectures ...Flink Forward San Francisco 2019: Moving from Lambda and Kappa Architectures ...
Flink Forward San Francisco 2019: Moving from Lambda and Kappa Architectures ...
 
Connecting the Drops with Apache NiFi & Apache MiNiFi
Connecting the Drops with Apache NiFi & Apache MiNiFiConnecting the Drops with Apache NiFi & Apache MiNiFi
Connecting the Drops with Apache NiFi & Apache MiNiFi
 
Apache NiFi User Guide
Apache NiFi User GuideApache NiFi User Guide
Apache NiFi User Guide
 
Apache NiFi Meetup - Introduction to NiFi Registry
Apache NiFi Meetup - Introduction to NiFi RegistryApache NiFi Meetup - Introduction to NiFi Registry
Apache NiFi Meetup - Introduction to NiFi Registry
 
Apache Tez: Accelerating Hadoop Query Processing
Apache Tez: Accelerating Hadoop Query Processing Apache Tez: Accelerating Hadoop Query Processing
Apache Tez: Accelerating Hadoop Query Processing
 
Real-Time Data Flows with Apache NiFi
Real-Time Data Flows with Apache NiFiReal-Time Data Flows with Apache NiFi
Real-Time Data Flows with Apache NiFi
 
Making Data Timelier and More Reliable with Lakehouse Technology
Making Data Timelier and More Reliable with Lakehouse TechnologyMaking Data Timelier and More Reliable with Lakehouse Technology
Making Data Timelier and More Reliable with Lakehouse Technology
 
Drone Data Flowing Through Apache NiFi
Drone Data Flowing Through Apache NiFiDrone Data Flowing Through Apache NiFi
Drone Data Flowing Through Apache NiFi
 

Semelhante a Apache Nifi Crash Course

Dataflow Management From Edge to Core with Apache NiFi
Dataflow Management From Edge to Core with Apache NiFiDataflow Management From Edge to Core with Apache NiFi
Dataflow Management From Edge to Core with Apache NiFiDataWorks Summit
 
State of the Apache NiFi Ecosystem & Community
State of the Apache NiFi Ecosystem & CommunityState of the Apache NiFi Ecosystem & Community
State of the Apache NiFi Ecosystem & CommunityAccumulo Summit
 
The First Mile -- Edge and IoT Data Collection with Apache NiFi and MiNiFi
The First Mile -- Edge and IoT Data Collection with Apache NiFi and MiNiFiThe First Mile -- Edge and IoT Data Collection with Apache NiFi and MiNiFi
The First Mile -- Edge and IoT Data Collection with Apache NiFi and MiNiFiDataWorks Summit
 
The First Mile - Edge and IoT Data Collection With Apache Nifi and MiniFi
The First Mile - Edge and IoT Data Collection With Apache Nifi and MiniFiThe First Mile - Edge and IoT Data Collection With Apache Nifi and MiniFi
The First Mile - Edge and IoT Data Collection With Apache Nifi and MiniFiDataWorks Summit
 
Dataflow Management From Edge to Core with Apache NiFi
Dataflow Management From Edge to Core with Apache NiFiDataflow Management From Edge to Core with Apache NiFi
Dataflow Management From Edge to Core with Apache NiFiDataWorks Summit
 
The First Mile – Edge and IoT Data Collection with Apache NiFi and MiNiFi
The First Mile – Edge and IoT Data Collection with Apache NiFi and MiNiFiThe First Mile – Edge and IoT Data Collection with Apache NiFi and MiNiFi
The First Mile – Edge and IoT Data Collection with Apache NiFi and MiNiFiDataWorks Summit
 
Using Spark Streaming and NiFi for the next generation of ETL in the enterprise
Using Spark Streaming and NiFi for the next generation of ETL in the enterpriseUsing Spark Streaming and NiFi for the next generation of ETL in the enterprise
Using Spark Streaming and NiFi for the next generation of ETL in the enterpriseDataWorks Summit
 
Curing the Kafka blindness—Streams Messaging Manager
Curing the Kafka blindness—Streams Messaging ManagerCuring the Kafka blindness—Streams Messaging Manager
Curing the Kafka blindness—Streams Messaging ManagerDataWorks Summit
 
Data at Scales and the Values of Starting Small with Apache NiFi & MiNiFi
Data at Scales and the Values of Starting Small with Apache NiFi & MiNiFiData at Scales and the Values of Starting Small with Apache NiFi & MiNiFi
Data at Scales and the Values of Starting Small with Apache NiFi & MiNiFiAldrin Piri
 
Apache NiFi - Flow Based Programming Meetup
Apache NiFi - Flow Based Programming MeetupApache NiFi - Flow Based Programming Meetup
Apache NiFi - Flow Based Programming MeetupJoseph Witt
 
Data Con LA 2018 - Streaming and IoT by Pat Alwell
Data Con LA 2018 - Streaming and IoT by Pat AlwellData Con LA 2018 - Streaming and IoT by Pat Alwell
Data Con LA 2018 - Streaming and IoT by Pat AlwellData Con LA
 
Apache NiFi Crash Course - San Jose Hadoop Summit
Apache NiFi Crash Course - San Jose Hadoop SummitApache NiFi Crash Course - San Jose Hadoop Summit
Apache NiFi Crash Course - San Jose Hadoop SummitAldrin Piri
 
The Avant-garde of Apache NiFi
The Avant-garde of Apache NiFiThe Avant-garde of Apache NiFi
The Avant-garde of Apache NiFiJoe Percivall
 
Big Data Day LA 2016/ Big Data Track - Building scalable enterprise data flow...
Big Data Day LA 2016/ Big Data Track - Building scalable enterprise data flow...Big Data Day LA 2016/ Big Data Track - Building scalable enterprise data flow...
Big Data Day LA 2016/ Big Data Track - Building scalable enterprise data flow...Data Con LA
 
Building Data Pipelines for Solr with Apache NiFi
Building Data Pipelines for Solr with Apache NiFiBuilding Data Pipelines for Solr with Apache NiFi
Building Data Pipelines for Solr with Apache NiFiBryan Bende
 
Enterprise IIoT Edge Processing with Apache NiFi
Enterprise IIoT Edge Processing with Apache NiFiEnterprise IIoT Edge Processing with Apache NiFi
Enterprise IIoT Edge Processing with Apache NiFiTimothy Spann
 
NJ Hadoop Meetup - Apache NiFi Deep Dive
NJ Hadoop Meetup - Apache NiFi Deep DiveNJ Hadoop Meetup - Apache NiFi Deep Dive
NJ Hadoop Meetup - Apache NiFi Deep DiveBryan Bende
 
Hortonworks Data in Motion Webinar Series - Part 1
Hortonworks Data in Motion Webinar Series - Part 1Hortonworks Data in Motion Webinar Series - Part 1
Hortonworks Data in Motion Webinar Series - Part 1Hortonworks
 

Semelhante a Apache Nifi Crash Course (20)

Apache Nifi Crash Course
Apache Nifi Crash CourseApache Nifi Crash Course
Apache Nifi Crash Course
 
Dataflow Management From Edge to Core with Apache NiFi
Dataflow Management From Edge to Core with Apache NiFiDataflow Management From Edge to Core with Apache NiFi
Dataflow Management From Edge to Core with Apache NiFi
 
State of the Apache NiFi Ecosystem & Community
State of the Apache NiFi Ecosystem & CommunityState of the Apache NiFi Ecosystem & Community
State of the Apache NiFi Ecosystem & Community
 
The First Mile -- Edge and IoT Data Collection with Apache NiFi and MiNiFi
The First Mile -- Edge and IoT Data Collection with Apache NiFi and MiNiFiThe First Mile -- Edge and IoT Data Collection with Apache NiFi and MiNiFi
The First Mile -- Edge and IoT Data Collection with Apache NiFi and MiNiFi
 
The First Mile - Edge and IoT Data Collection With Apache Nifi and MiniFi
The First Mile - Edge and IoT Data Collection With Apache Nifi and MiniFiThe First Mile - Edge and IoT Data Collection With Apache Nifi and MiniFi
The First Mile - Edge and IoT Data Collection With Apache Nifi and MiniFi
 
Dataflow Management From Edge to Core with Apache NiFi
Dataflow Management From Edge to Core with Apache NiFiDataflow Management From Edge to Core with Apache NiFi
Dataflow Management From Edge to Core with Apache NiFi
 
The First Mile – Edge and IoT Data Collection with Apache NiFi and MiNiFi
The First Mile – Edge and IoT Data Collection with Apache NiFi and MiNiFiThe First Mile – Edge and IoT Data Collection with Apache NiFi and MiNiFi
The First Mile – Edge and IoT Data Collection with Apache NiFi and MiNiFi
 
Using Spark Streaming and NiFi for the next generation of ETL in the enterprise
Using Spark Streaming and NiFi for the next generation of ETL in the enterpriseUsing Spark Streaming and NiFi for the next generation of ETL in the enterprise
Using Spark Streaming and NiFi for the next generation of ETL in the enterprise
 
Curing the Kafka blindness—Streams Messaging Manager
Curing the Kafka blindness—Streams Messaging ManagerCuring the Kafka blindness—Streams Messaging Manager
Curing the Kafka blindness—Streams Messaging Manager
 
Data at Scales and the Values of Starting Small with Apache NiFi & MiNiFi
Data at Scales and the Values of Starting Small with Apache NiFi & MiNiFiData at Scales and the Values of Starting Small with Apache NiFi & MiNiFi
Data at Scales and the Values of Starting Small with Apache NiFi & MiNiFi
 
Apache NiFi - Flow Based Programming Meetup
Apache NiFi - Flow Based Programming MeetupApache NiFi - Flow Based Programming Meetup
Apache NiFi - Flow Based Programming Meetup
 
Data Con LA 2018 - Streaming and IoT by Pat Alwell
Data Con LA 2018 - Streaming and IoT by Pat AlwellData Con LA 2018 - Streaming and IoT by Pat Alwell
Data Con LA 2018 - Streaming and IoT by Pat Alwell
 
Apache NiFi Crash Course - San Jose Hadoop Summit
Apache NiFi Crash Course - San Jose Hadoop SummitApache NiFi Crash Course - San Jose Hadoop Summit
Apache NiFi Crash Course - San Jose Hadoop Summit
 
The Avant-garde of Apache NiFi
The Avant-garde of Apache NiFiThe Avant-garde of Apache NiFi
The Avant-garde of Apache NiFi
 
The Avant-garde of Apache NiFi
The Avant-garde of Apache NiFiThe Avant-garde of Apache NiFi
The Avant-garde of Apache NiFi
 
Big Data Day LA 2016/ Big Data Track - Building scalable enterprise data flow...
Big Data Day LA 2016/ Big Data Track - Building scalable enterprise data flow...Big Data Day LA 2016/ Big Data Track - Building scalable enterprise data flow...
Big Data Day LA 2016/ Big Data Track - Building scalable enterprise data flow...
 
Building Data Pipelines for Solr with Apache NiFi
Building Data Pipelines for Solr with Apache NiFiBuilding Data Pipelines for Solr with Apache NiFi
Building Data Pipelines for Solr with Apache NiFi
 
Enterprise IIoT Edge Processing with Apache NiFi
Enterprise IIoT Edge Processing with Apache NiFiEnterprise IIoT Edge Processing with Apache NiFi
Enterprise IIoT Edge Processing with Apache NiFi
 
NJ Hadoop Meetup - Apache NiFi Deep Dive
NJ Hadoop Meetup - Apache NiFi Deep DiveNJ Hadoop Meetup - Apache NiFi Deep Dive
NJ Hadoop Meetup - Apache NiFi Deep Dive
 
Hortonworks Data in Motion Webinar Series - Part 1
Hortonworks Data in Motion Webinar Series - Part 1Hortonworks Data in Motion Webinar Series - Part 1
Hortonworks Data in Motion Webinar Series - Part 1
 

Mais de DataWorks Summit

Floating on a RAFT: HBase Durability with Apache Ratis
Floating on a RAFT: HBase Durability with Apache RatisFloating on a RAFT: HBase Durability with Apache Ratis
Floating on a RAFT: HBase Durability with Apache RatisDataWorks Summit
 
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFi
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFiTracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFi
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFiDataWorks Summit
 
HBase Tales From the Trenches - Short stories about most common HBase operati...
HBase Tales From the Trenches - Short stories about most common HBase operati...HBase Tales From the Trenches - Short stories about most common HBase operati...
HBase Tales From the Trenches - Short stories about most common HBase operati...DataWorks Summit
 
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...DataWorks Summit
 
Managing the Dewey Decimal System
Managing the Dewey Decimal SystemManaging the Dewey Decimal System
Managing the Dewey Decimal SystemDataWorks Summit
 
Practical NoSQL: Accumulo's dirlist Example
Practical NoSQL: Accumulo's dirlist ExamplePractical NoSQL: Accumulo's dirlist Example
Practical NoSQL: Accumulo's dirlist ExampleDataWorks Summit
 
HBase Global Indexing to support large-scale data ingestion at Uber
HBase Global Indexing to support large-scale data ingestion at UberHBase Global Indexing to support large-scale data ingestion at Uber
HBase Global Indexing to support large-scale data ingestion at UberDataWorks Summit
 
Scaling Cloud-Scale Translytics Workloads with Omid and Phoenix
Scaling Cloud-Scale Translytics Workloads with Omid and PhoenixScaling Cloud-Scale Translytics Workloads with Omid and Phoenix
Scaling Cloud-Scale Translytics Workloads with Omid and PhoenixDataWorks Summit
 
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFi
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFiBuilding the High Speed Cybersecurity Data Pipeline Using Apache NiFi
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFiDataWorks Summit
 
Supporting Apache HBase : Troubleshooting and Supportability Improvements
Supporting Apache HBase : Troubleshooting and Supportability ImprovementsSupporting Apache HBase : Troubleshooting and Supportability Improvements
Supporting Apache HBase : Troubleshooting and Supportability ImprovementsDataWorks Summit
 
Security Framework for Multitenant Architecture
Security Framework for Multitenant ArchitectureSecurity Framework for Multitenant Architecture
Security Framework for Multitenant ArchitectureDataWorks Summit
 
Presto: Optimizing Performance of SQL-on-Anything Engine
Presto: Optimizing Performance of SQL-on-Anything EnginePresto: Optimizing Performance of SQL-on-Anything Engine
Presto: Optimizing Performance of SQL-on-Anything EngineDataWorks Summit
 
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...DataWorks Summit
 
Extending Twitter's Data Platform to Google Cloud
Extending Twitter's Data Platform to Google CloudExtending Twitter's Data Platform to Google Cloud
Extending Twitter's Data Platform to Google CloudDataWorks Summit
 
Event-Driven Messaging and Actions using Apache Flink and Apache NiFi
Event-Driven Messaging and Actions using Apache Flink and Apache NiFiEvent-Driven Messaging and Actions using Apache Flink and Apache NiFi
Event-Driven Messaging and Actions using Apache Flink and Apache NiFiDataWorks Summit
 
Securing Data in Hybrid on-premise and Cloud Environments using Apache Ranger
Securing Data in Hybrid on-premise and Cloud Environments using Apache RangerSecuring Data in Hybrid on-premise and Cloud Environments using Apache Ranger
Securing Data in Hybrid on-premise and Cloud Environments using Apache RangerDataWorks Summit
 
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...DataWorks Summit
 
Computer Vision: Coming to a Store Near You
Computer Vision: Coming to a Store Near YouComputer Vision: Coming to a Store Near You
Computer Vision: Coming to a Store Near YouDataWorks Summit
 
Big Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
Big Data Genomics: Clustering Billions of DNA Sequences with Apache SparkBig Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
Big Data Genomics: Clustering Billions of DNA Sequences with Apache SparkDataWorks Summit
 

Mais de DataWorks Summit (20)

Data Science Crash Course
Data Science Crash CourseData Science Crash Course
Data Science Crash Course
 
Floating on a RAFT: HBase Durability with Apache Ratis
Floating on a RAFT: HBase Durability with Apache RatisFloating on a RAFT: HBase Durability with Apache Ratis
Floating on a RAFT: HBase Durability with Apache Ratis
 
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFi
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFiTracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFi
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFi
 
HBase Tales From the Trenches - Short stories about most common HBase operati...
HBase Tales From the Trenches - Short stories about most common HBase operati...HBase Tales From the Trenches - Short stories about most common HBase operati...
HBase Tales From the Trenches - Short stories about most common HBase operati...
 
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...
 
Managing the Dewey Decimal System
Managing the Dewey Decimal SystemManaging the Dewey Decimal System
Managing the Dewey Decimal System
 
Practical NoSQL: Accumulo's dirlist Example
Practical NoSQL: Accumulo's dirlist ExamplePractical NoSQL: Accumulo's dirlist Example
Practical NoSQL: Accumulo's dirlist Example
 
HBase Global Indexing to support large-scale data ingestion at Uber
HBase Global Indexing to support large-scale data ingestion at UberHBase Global Indexing to support large-scale data ingestion at Uber
HBase Global Indexing to support large-scale data ingestion at Uber
 
Scaling Cloud-Scale Translytics Workloads with Omid and Phoenix
Scaling Cloud-Scale Translytics Workloads with Omid and PhoenixScaling Cloud-Scale Translytics Workloads with Omid and Phoenix
Scaling Cloud-Scale Translytics Workloads with Omid and Phoenix
 
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFi
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFiBuilding the High Speed Cybersecurity Data Pipeline Using Apache NiFi
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFi
 
Supporting Apache HBase : Troubleshooting and Supportability Improvements
Supporting Apache HBase : Troubleshooting and Supportability ImprovementsSupporting Apache HBase : Troubleshooting and Supportability Improvements
Supporting Apache HBase : Troubleshooting and Supportability Improvements
 
Security Framework for Multitenant Architecture
Security Framework for Multitenant ArchitectureSecurity Framework for Multitenant Architecture
Security Framework for Multitenant Architecture
 
Presto: Optimizing Performance of SQL-on-Anything Engine
Presto: Optimizing Performance of SQL-on-Anything EnginePresto: Optimizing Performance of SQL-on-Anything Engine
Presto: Optimizing Performance of SQL-on-Anything Engine
 
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
 
Extending Twitter's Data Platform to Google Cloud
Extending Twitter's Data Platform to Google CloudExtending Twitter's Data Platform to Google Cloud
Extending Twitter's Data Platform to Google Cloud
 
Event-Driven Messaging and Actions using Apache Flink and Apache NiFi
Event-Driven Messaging and Actions using Apache Flink and Apache NiFiEvent-Driven Messaging and Actions using Apache Flink and Apache NiFi
Event-Driven Messaging and Actions using Apache Flink and Apache NiFi
 
Securing Data in Hybrid on-premise and Cloud Environments using Apache Ranger
Securing Data in Hybrid on-premise and Cloud Environments using Apache RangerSecuring Data in Hybrid on-premise and Cloud Environments using Apache Ranger
Securing Data in Hybrid on-premise and Cloud Environments using Apache Ranger
 
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...
 
Computer Vision: Coming to a Store Near You
Computer Vision: Coming to a Store Near YouComputer Vision: Coming to a Store Near You
Computer Vision: Coming to a Store Near You
 
Big Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
Big Data Genomics: Clustering Billions of DNA Sequences with Apache SparkBig Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
Big Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
 

Último

Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...shyamraj55
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdfhans926745
 
AI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsAI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsMemoori
 
Azure Monitor & Application Insight to monitor Infrastructure & Application
Azure Monitor & Application Insight to monitor Infrastructure & ApplicationAzure Monitor & Application Insight to monitor Infrastructure & Application
Azure Monitor & Application Insight to monitor Infrastructure & ApplicationAndikSusilo4
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationRidwan Fadjar
 
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 3652toLead Limited
 
How to Remove Document Management Hurdles with X-Docs?
How to Remove Document Management Hurdles with X-Docs?How to Remove Document Management Hurdles with X-Docs?
How to Remove Document Management Hurdles with X-Docs?XfilesPro
 
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024BookNet Canada
 
SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024Scott Keck-Warren
 
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024BookNet Canada
 
Understanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitectureUnderstanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitecturePixlogix Infotech
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonetsnaman860154
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsMaria Levchenko
 
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | DelhiFULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhisoniya singh
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationMichael W. Hawkins
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationSafe Software
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxMalak Abu Hammad
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationRadu Cotescu
 
Unblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesUnblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesSinan KOZAK
 
Pigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping ElbowsPigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping ElbowsPigging Solutions
 

Último (20)

Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf
 
AI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsAI as an Interface for Commercial Buildings
AI as an Interface for Commercial Buildings
 
Azure Monitor & Application Insight to monitor Infrastructure & Application
Azure Monitor & Application Insight to monitor Infrastructure & ApplicationAzure Monitor & Application Insight to monitor Infrastructure & Application
Azure Monitor & Application Insight to monitor Infrastructure & Application
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 Presentation
 
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
 
How to Remove Document Management Hurdles with X-Docs?
How to Remove Document Management Hurdles with X-Docs?How to Remove Document Management Hurdles with X-Docs?
How to Remove Document Management Hurdles with X-Docs?
 
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
 
SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024
 
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
 
Understanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitectureUnderstanding the Laravel MVC Architecture
Understanding the Laravel MVC Architecture
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonets
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed texts
 
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | DelhiFULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day Presentation
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptx
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organization
 
Unblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesUnblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen Frames
 
Pigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping ElbowsPigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping Elbows
 

Apache Nifi Crash Course

  • 1. © Hortonworks Inc. 2011–2019. All rights reserved;1 Special Crash Course WiFi SSID: CrashCourse PW: DataWorks19 1. Download NiFi 1.8.0 1. Browse to https://nifi.apache.org/download.html 2.wget http://ftp.riken.jp/net/apache/nifi/1.8.0/nifi-1.8.0-bin.tar.gz 2. Untar file 1.tar -xvzf nifi-1.8.0-bin.tar.gz 3. Change into $NIFI_HOME dir 1.cd nifi-1.8.0 4. Start NiFi 1../bin/nifi.sh start 2. tail -f logs/nifi-app.log
  • 2. © Hortonworks Inc. 2011–2019. All rights reserved;2 Apache NiFi Crash Course Andy LoPresto | @yolopey Sr. Member of Technical Staff at Hortonworks, Apache NiFi PMC & Committer 06 February 2019 Dataworks Summit Melbourne Special Crash Course WiFi SSID: CrashCourse PW: DataWorks19
  • 3. © Hortonworks Inc. 2011–2019. All rights reserved;3 Acknowledgement of Country I acknowledge the Traditional Owners of the land on which we are meeting. I pay my respects to their Elders, past and present, and the Aboriginal Elders of other communities who may be here today.
  • 4. © Hortonworks Inc. 2011–2019. All rights reserved;4 Gauging Audience Familiarity With NiFi “What’s a NeeFee?” No experience with dataflow No experience with NiFi “I can pick this up pretty quickly” Some experience with dataflow Some experience with NiFi “I refactored the Ambari integration endpoint to allow for mutual authentication TLS during my coffee break” Forgotten more about NiFi than most of us will ever know
  • 5. © Hortonworks Inc. 2011–2019. All rights reserved;5 Agenda • Introduction • What is dataflow? • What is NiFi? • What’s next? • All slides provided online, so no need to transcribe
  • 6. © Hortonworks Inc. 2011–2019. All rights reserved;6 Goals for today • Not to demonstrate what I know about NiFi • Teach you something new • Encourage exploration on topics that interest you • Empower you to build flows on your own
  • 7. © Hortonworks Inc. 2011–2019. All rights reserved;7 What is dataflow?
  • 8. © Hortonworks Inc. 2011–2019. All rights reserved;8 What is dataflow? • Moving some content from A to B • Content could be any bytes • Logs • HTTP • XML • CSV • Images • Video • Telemetry Producers A.K.A Things Anything AND Everything Internet! Consumers • User • Storage • System • …More Things
  • 9. © Hortonworks Inc. 2011–2019. All rights reserved;9 Connecting Data Points Is Easy • Simple enough to write a process • Bash/Ruby/Python • SQL proc • etc. Log files SQL Big Data
  • 10. © Hortonworks Inc. 2011–2019. All rights reserved;10 Big Data Is About Scale… • …and this doesn’t scale • Example use case: • AOL Data Processing • AWS -> HDFS • 20 TB ingested/day • Lev Brailovskiy, “Data Ingestion and Distribution with Apache NiFi”, Slide 27, 02/2017 • https://www.slideshare.net/ LevBrailovskiy/data-ingestion-and- distribution-with-apache-nifi
  • 11. © Hortonworks Inc. 2011–2019. All rights reserved;11 Moving data effectively is hard Standards: http://xkcd.com/927/
  • 12. © Hortonworks Inc. 2011–2019. All rights reserved;12 Moving data effectively is really hard “Data Pipeline” https://xkcd.com/2054/
  • 13. © Hortonworks Inc. 2011–2019. All rights reserved;13 • Standards • Formats • Protocols • Veracity • Validity • Schemas • Partitioning/ Bundling Data Dataflow Challenges In 3 Categories Infrastructure • “Exactly Once” Delivery • Ensuring Security • Overcoming Security • Credential Management • Network People • Compliance • “That [person| team|group]” • Consumers Change • Requirements Change • “Exactly Once” Delivery
  • 14. © Hortonworks Inc. 2011–2019. All rights reserved;14 Raise your hand if you want to maintain Python scripts for the rest of your life Let’s Connect Lots of As to Bs to As to Cs to Bs to Δs to Cs to ϕs
  • 15. © Hortonworks Inc. 2011–2019. All rights reserved;15 What is Apache NiFi?
  • 16. © Hortonworks Inc. 2011–2019. All rights reserved;16 NiFi is based on Flow Based Programming (FBP) FBP Term NiFi Term Description Information Packet FlowFile Each object moving through the system. Black Box FlowFile Processor Performs the work, doing some combination of data routing, transformation, or mediation between systems. Bounded Buffer Connection The linkage between processors, acting as queues and allowing various processes to interact at differing rates. Scheduler Flow Controller Maintains the knowledge of how processes are connected, and manages the threads and allocations thereof which all processes use. Subnet Process Group A set of processes and their connections, which can receive and send data via ports. A process group allows creation of entirely new component simply by composition of its components.
  • 17. © Hortonworks Inc. 2011–2019. All rights reserved;17 • Guaranteed delivery • Data buffering • Backpressure • Pressure release • Prioritized queuing • Flow specific QoS • Latency vs. throughput • Loss tolerance Key Features Apache NiFi • Data provenance • Supports push and pull models • Recovery/recording 
 a rolling log of fine-grained history • Visual command and control • Flow templates • Pluggable, multi-tenant security • Designed for extension • Clustering
  • 18. © Hortonworks Inc. 2011–2019. All rights reserved;18 Flowfiles Are Like HTTP Data HTTP Data FlowFile HTTP/1.1 200 OK Date: Sun, 10 Oct 2010 23:26:07 GMT Server: Apache/2.2.8 (CentOS) OpenSSL/0.9.8g Last-Modified: Sun, 26 Sep 2010 22:04:35 GMT ETag: "45b6-834-49130cc1182c0" Accept-Ranges: bytes Content-Length: 13 Connection: close Content-Type: text/html Hello world! Standard FlowFile Attributes Key: 'entryDate’ Value: 'Fri Jun 17 17:15:04 EDT 2016' Key: 'lineageStartDate’ Value: 'Fri Jun 17 17:15:04 EDT 2016' Key: 'fileSize’ Value: '23609' FlowFile Attribute Map Content Key: 'filename’ Value: '15650246997242' Key: 'path’ Value: './’ Binary Content * Header Content
  • 19. © Hortonworks Inc. 2011–2019. All rights reserved;19 User Interface Less of this…
  • 20. © Hortonworks Inc. 2011–2019. All rights reserved;19 User Interface Less of this…… more of this
  • 21. © Hortonworks Inc. 2011–2019. All rights reserved;20 Deeper Ecosystem Integration: 286+ Processors, 61 Controller Services Hash Extract Merge Duplicate Scan GeoEnrich Replace ConvertSplit Translate Route Content Route Context Route Text Control Rate Distribute Load Generate Table Fetch Jolt Transform JSON Prioritized Delivery Encrypt Tail Evaluate Execute All Apache project logos are trademarks of the ASF and the respective projects. Fetch HTTP Syslog Email HTML Image HL7 FTP UDP XML SFTP AMQP WebSocket Parse Records Convert Records
  • 22. © Hortonworks Inc. 2011–2019. All rights reserved;21 Extension / Integration Points NiFi Term Description Flow File Processor Push/Pull behavior. Custom UI Reporting Task Used to push data from NiFi to some external service (metrics, provenance, etc.) Controller Service Used to enable reusable components / shared services throughout the flow REST API Allows clients to connect to pull information, change behavior, etc.
  • 23. © Hortonworks Inc. 2011–2019. All rights reserved;22 Architecture OS/Host JVM Flow Controller Web Server Processor 1 Extension N FlowFile
 Repository Content
 Repository Provenance
 Repository Local Storage Standalone Cluster
  • 24. © Hortonworks Inc. 2011–2019. All rights reserved;23 NiFi Architecture – Repositories - Pass by reference FlowFile Content Provenance F1! C1 C1 P1! F1 Excerpt of demo flow… What’s happening inside the repositories… BEFORE AFTER F2! C1 C1 P3! F2 – Clone (F1) F1! C1 P2! F1 – Route P1! F1 – Create
  • 25. © Hortonworks Inc. 2011–2019. All rights reserved;24 NiFi Architecture – Repositories – Copy on Write FlowFile Content Provenance F1! C1 C1 P1! F1 - CREATE Excerpt of demo flow… What’s happening inside the repositories… BEFORE AFTER F1! C1 F1.1! C2 C2 (encrypted) C1 (plaintext) P2! F1.1 - MODIFY P1! F1 - CREATE
  • 26. © Hortonworks Inc. 2011–2019. All rights reserved;25 People want to know about their data Colin the Chicken | Portlandia | IFC
  • 27. © Hortonworks Inc. 2011–2019. All rights reserved;25 People want to know about their data Colin the Chicken | Portlandia | IFC
  • 28. © Hortonworks Inc. 2011–2019. All rights reserved;26 Data Provenance ▪ Constrained ▪ High-latency ▪ Localized context ▪ Hybrid – cloud/on-premises ▪ Low-latency ▪ Global context Origin – attribution Replay – recovery Evolution of topologies Long retention Types of Lineage • Event • Configuration
  • 29. © Hortonworks Inc. 2011–2019. All rights reserved;27 • Previously, data had to be divided into individual flowfiles to perform work • CSV output with 50k lines would need to be split, operated on, re- merged • 1 + 50k + 50k + 1 flowfiles = 100k flowfiles Record Parsing
  • 30. © Hortonworks Inc. 2011–2019. All rights reserved;28 • Now flowfile content can contain many “record” elements • Read and write with *Reader and *Writer Controller Services • Perform lookups, routing, conversion, SQL queries, validation, and more… • 1 + 1 flowfiles = 2 flowfiles Record Parsing
  • 31. © Hortonworks Inc. 2011–2019. All rights reserved;29 • Every provenance event record is encrypted with AES G/CM before being persisted to disk • Decrypted on deserialization for retrieval/query • Random access via offset seek • Handles key migration & rotation Encrypted Provenance Repository
  • 32. © Hortonworks Inc. 2011–2019. All rights reserved;30 What’s Next?
  • 33. © Hortonworks Inc. 2011–2019. All rights reserved;31 • NiFi 1.8.0 — 26 Oct 2018 (212+ Jiras) • Jetty, DB improvements • Auto load-balancing queues • TLS Toolkit w/ external CA • Record processor improvements • MiNiFi C++ 0.5.0 — 6 June 2018 • MiNiFi Java 0.5.0 — 7 July 2018 • NiFi Registry 0.3.0 — 25 Sept 2018 New Announcements
  • 34. © Hortonworks Inc. 2011–2019. All rights reserved;32 • Previously, flows were exported via XML templates • Didn’t contain sensitive values • Couldn’t be updated in-place • No tracking system • NiFi Registry brings asset management as first-class citizen to NiFi • Flows can be versioned • Flows can be promoted between environments Introducing Apache NiFi Registry 0.3.0 NiFi Registry for Dataflows
  • 35. © Hortonworks Inc. 2011–2019. All rights reserved;33 Community Health
  • 36. © Hortonworks Inc. 2011–2019. All rights reserved;34 Apache NiFi site
 https://nifi.apache.org Subproject MiNiFi site https://nifi.apache.org/minifi/ Subscribe to and collaborate at
 dev@nifi.apache.org users@nifi.apache.org Submit Ideas or Issues
 https://issues.apache.org/jira/browse/NIFI Follow us on Twitter @apachenifi Learn more and join us
  • 37. © Hortonworks Inc. 2011–2019. All rights reserved;35 More NiFi Today… Title Time Room The First Mile – Edge and IoT Data Collection with Apache NiFi and MiNiFi 1100 - 1140 Room 103 Apache NiFi Crash Course 1400 - 1600 Room 109 Dataflow Management From Edge to Core with Apache NiFi 1650 - 1730 Room 112 Using Spark Streaming and NiFi for the Next Generation of ETL in the Enterprise 1650 - 1730 Room 103
  • 38. © Hortonworks Inc. 2011–2019. All rights reserved;36 https://hortonworks.com/tutorial/ analyze-transit-patterns-with-apache-nifi/
  • 39. © Hortonworks Inc. 2011–2019. All rights reserved;37 Thank you alopresto@hortonworks.com | alopresto@apache.org | @yolopey github.com/alopresto/slides