© Hortonworks Inc. 2011–2019. All rights reserved;1
The First Mile – Edge and IoT Data Collection with Apache NiFi and MiNiFi
Andy LoPresto | @yolopey
Sr. Member of Technical Staff at Hortonworks, Apache NiFi PMC & Committer
06 February 2019, DataWorks Summit Melbourne
Acknowledgement of Country
I acknowledge the Traditional Owners of the land on which we are meeting. I pay my respects to their Elders, past and present, and the Aboriginal Elders of other communities who may be here today.
Gauging Audience Familiarity With NiFi
“What’s a NeeFee?”
• No experience with dataflow
• No experience with NiFi
“I can pick this up pretty quickly”
• Some experience with dataflow
• Some experience with NiFi
“I refactored the Ambari integration endpoint to allow for mutual authentication TLS during my coffee break”
• Forgotten more about NiFi than most of us will ever know
Agenda
• What is dataflow and what are the challenges?
• Apache NiFi
• IoT Challenges
• Apache MiNiFi
• Exploration
• Community
• All slides provided online, so no need to transcribe
What is dataflow?
• Moving some content from A to B
• Content could be any bytes
• Logs
• HTTP
• XML
• CSV
• Images
• Video
• Telemetry
Producers (A.K.A. Things): Anything AND Everything
Internet!
Consumers
• User
• Storage
• System
• …More Things
Moving data effectively is hard
“Data Pipeline” https://xkcd.com/2054/
Dataflow Challenges In 3 Categories
Data
• Standards
• Formats
• Protocols
• Veracity
• Validity
• Schemas
• Partitioning/Bundling
Infrastructure
• “Exactly Once” Delivery
• Ensuring Security
• Overcoming Security
• Credential Management
• Network
People
• Compliance
• “That [person|team|group]”
• Consumers Change
• Requirements Change
• “Exactly Once” Delivery
Raise your hand if you want to maintain Python scripts for the rest of your life
Let’s Connect Lots of As to Bs to As to Cs to Bs to Δs to Cs to ϕs
What is Apache NiFi?
Apache NiFi
Key Features
• Guaranteed delivery
• Data buffering
• Backpressure
• Pressure release
• Prioritized queuing
• Flow specific QoS
• Latency vs. throughput
• Loss tolerance
• Data provenance
• Supports push and pull models
• Recovery/recording a rolling log of fine-grained history
• Visual command and control
• Flow templates
• Pluggable, multi-tenant security
• Designed for extension
• Clustering
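The buffering, backpressure, and prioritized-queuing features listed on this slide can be illustrated with a toy bounded queue. This is a minimal Python sketch and not NiFi code: the class name `BoundedConnection`, the object-count threshold, and the "lower number means higher priority" convention are all assumptions made up for illustration.

```python
import heapq
import itertools


class BoundedConnection:
    """Toy stand-in for a queue between two steps (hypothetical, not a NiFi API)."""

    def __init__(self, max_items):
        self.max_items = max_items       # assumed backpressure threshold (object count)
        self._heap = []                  # entries are (priority, sequence, item)
        self._seq = itertools.count()    # tie-breaker keeps FIFO order within a priority

    def is_full(self):
        return len(self._heap) >= self.max_items

    def offer(self, item, priority=10):
        """Enqueue unless backpressure applies; lower number = higher priority."""
        if self.is_full():
            return False                 # upstream should pause and retry later
        heapq.heappush(self._heap, (priority, next(self._seq), item))
        return True

    def poll(self):
        """Dequeue the highest-priority item, or None when the queue is empty."""
        if not self._heap:
            return None
        return heapq.heappop(self._heap)[2]


conn = BoundedConnection(max_items=2)
accepted = [
    conn.offer("debug line", priority=9),
    conn.offer("alarm", priority=1),
    conn.offer("overflow", priority=5),  # refused: the queue is already full
]
drained = [conn.poll(), conn.poll(), conn.poll()]
```

The design choice in the sketch is that a full queue refuses new work instead of dropping old work, so the producer slows down rather than the data being lost.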
FlowFiles Are Like HTTP Data
HTTP Data
Header
HTTP/1.1 200 OK
Date: Sun, 10 Oct 2010 23:26:07 GMT
Server: Apache/2.2.8 (CentOS) OpenSSL/0.9.8g
Last-Modified: Sun, 26 Sep 2010 22:04:35 GMT
ETag: "45b6-834-49130cc1182c0"
Accept-Ranges: bytes
Content-Length: 13
Connection: close
Content-Type: text/html
Content
Hello world!
FlowFile
Header
Standard FlowFile Attributes
Key: 'entryDate' Value: 'Fri Jun 17 17:15:04 EDT 2016'
Key: 'lineageStartDate' Value: 'Fri Jun 17 17:15:04 EDT 2016'
Key: 'fileSize' Value: '23609'
FlowFile Attribute Map Content
Key: 'filename' Value: '15650246997242'
Key: 'path' Value: './'
Content
Binary Content *
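The header/content analogy on this slide can be sketched as a small data structure. The following Python is an illustration only: `FlowFileSketch` and `with_attribute` are hypothetical names, not NiFi classes, and the sketch assumes attributes are plain string key/value pairs while content is an opaque byte string.

```python
from dataclasses import dataclass, field


@dataclass
class FlowFileSketch:
    """Hypothetical model: attributes act like HTTP headers, content like the body."""
    attributes: dict = field(default_factory=dict)
    content: bytes = b""

    def with_attribute(self, key, value):
        # Return a copy with one more attribute; the content bytes are not touched.
        updated = dict(self.attributes)
        updated[key] = value
        return FlowFileSketch(updated, self.content)


ff = FlowFileSketch({"filename": "15650246997242", "path": "./"}, b"Hello world!")
tagged = ff.with_attribute("priority", "1")
```

Routing and prioritization decisions can then be made on the attribute map alone, without reading or parsing the content.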
User Interface
Less of this…… more of this
Deeper Ecosystem Integration: 286+ Processors, 61 Controller Services
Hash
Extract
Merge
Duplicate
Scan
GeoEnrich
Replace
Convert
Split
Translate
Route Content
Route Context
Route Text
Control Rate
Distribute Load
Generate Table Fetch
Jolt Transform JSON
Prioritized Delivery
Encrypt
Tail
Evaluate
Execute
All Apache project logos are trademarks of the ASF and the respective projects.
Fetch
HTTP
Syslog
Email
HTML
Image
HL7
FTP
UDP
XML
SFTP
AMQP
WebSocket
Parse Records
Convert Records
What are the IoT challenges?
IoT Challenges
• Limited computing capability
• Limited power/network
• Restricted software library/platform availability
• No UI
• Physically inaccessible
• Not frequently updated
• Competing standards/protocols
• Scalability
• Privacy & Security
@_lennart
Recent Examples
• When the Mirai attack has its own Wikipedia page, that’s not good
• Hackers stole high-roller database from casino via aquarium thermometer connected to internet (04/2018)
NiFi Solves Everything*
• Runs on JVM
• Provides UI for flow design & monitoring
• Security built-in
• TLS, authentication/authorization, encrypted data
• Handles practically any format/protocol
NiFi for IoT
• NiFi supports AMQP, MQTT, UDP, TCP, HTTP(S), CEF, JMS, (S)FTP, AWS IoT
• With a little pruning, NiFi can run on a Raspberry Pi
So Why Do We Need A Different Solution?
• NiFi is designed to “own the box”
• NiFi 0.7.x started up in about 10-15 minutes on a Raspberry Pi 3 (RP3) (593 MB)
• NiFi 1.x started up in about 30 minutes on RP3 (760 MB)
• 33 new processors
• Rewrite for multi-tenant authorization
• Complete UI overhaul
Enter Apache MiNiFi
Apache NiFi Subproject: MiNiFi
• Get the key parts of NiFi close to where data begins and provide bidirectional communication
• NiFi lives in the data center: give it an enterprise server or a cluster of them
• MiNiFi lives as close as possible to where data is born and is a guest on that device or system
• IoT
• Connected car
• Legacy hardware
Why build MiNiFi?
• NiFi is big
• 1.8.0 release is 1.2 GB compressed
• Can be modified to run in restricted environments, but requires manual surgery
• Provides UI, provenance query, etc.
• Runs on dedicated machines/clusters: “owns the box”
• MiNiFi lives at the edge
• No UI
• 0.5.0 Java release is 67 MB, C++ release is 6.1 MB (0.2.0 fits on a floppy disk)
• “Good guest”
Flavors of MiNiFi
• MiNiFi Java (v0.5.0)
• Modified version of NiFi
• No UI
• YAML configuration
• Reduced processor count
• 63+ by default, more available with additional NARs
• MiNiFi C++ (v0.5.0)
• Written from scratch
• 33 processors by default
• Bi-directional site-to-site & provenance data
NiFi vs MiNiFi Java Processes
MiNiFi: NiFi Framework, Components
NiFi: NiFi Framework, User Interface, Components
How Does MiNiFi Interact With NiFi?
• NiFi
• Design flows
• Aggregate data from many sources
• Perform routing/analysis/SEP
• MiNiFi
• Receive flows
• Collect data
• Send for processing
Let’s Add Dimensionality
• We’ve been imagining EDGE to CORE as a bi-directional linear system
• Let’s expand that to the real world
What does MiNiFi provide?
• Data tagging/provenance
• Governance from edge (geopolitical restrictions)
• Security (encryption, certificate-based authentication)
• Low latency (immediate reactions & decision-making)
Connected Car Reference Platform Box
Tuner + DSRC Card
Connectivity Card
MiNiFi on a Connected Car
[Diagram: data sources (CAN Bus gateway with MCUs, Ethernet / Ethernet AVB, Local Interconnect Network, a yet-to-be-established protocol) feed three stages. Collection: Listen CAN, Listen Ethernet, Listen LIN, Listen <>. Comprehension: Parse CAN, Parse Ethernet, Parse LIN, Parse <>. Processing / Synthesis: Route, Transmit, Execute, Prioritize, Filter.]
MiNiFi Exfil
• Site-to-Site
• NiFi protocol
• Two implementations
• Raw socket
• HTTP(S)
• Secured with mutual authentication TLS
• HTTP(S), (S)FTP, JMS, Syslog, File, Email, Process
Edge Data Exploration
Scenario
• IoT Device generating log messages
• Need to encrypt data on device
• Need to prioritize some data for unreliable network connectivity
• Transmit data to central node
• Decrypt data and analyze
• Make determinations and modify live flow
NiFi As Test Harness/Environment
• Simulate the log generation
• Schedule is customizable
• Script can write to dynamic location
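The log simulation on this slide could be approximated by a short script like the one below. It is a sketch under assumptions: the talk does not show the actual script, so the function name `write_log_lines`, the line format (`<epoch-millis> message <n>`), and the output path are all hypothetical.

```python
import os
import tempfile
import time


def write_log_lines(path, count, interval_seconds=0.0, clock=time.time):
    """Append `count` lines of the assumed form '<epoch-millis> message <n>' to `path`."""
    os.makedirs(os.path.dirname(path) or ".", exist_ok=True)
    with open(path, "a", encoding="utf-8") as log:
        for n in range(count):
            millis = int(clock() * 1000)
            log.write(f"{millis} message {n}\n")
            log.flush()                   # make each line visible to a tailing reader
            if interval_seconds:
                time.sleep(interval_seconds)   # the "schedule" knob


# The output location is a parameter, so each run can target a different file.
log_path = os.path.join(tempfile.mkdtemp(), "device", "app.log")
write_log_lines(log_path, count=5)
with open(log_path, encoding="utf-8") as f:
    lines = f.read().splitlines()
```

Raising `count` or lowering `interval_seconds` increases the write frequency, which is useful when checking how the flow behaves under load.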
Build the MiNiFi Flow
• Tails a log file
• Logs the raw contents (can be multiple lines in time window)
• Splits into individual lines
• Filters the content
• Using parity of the timestamp
• Prioritizes
• Encrypts using AES/GCM
• Exfils to remote NiFi
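The split, filter, and prioritize steps of this flow can be sketched in plain Python. This is not the MiNiFi flow itself: the function names are hypothetical, each line is assumed to start with an integer timestamp, and the rule "odd timestamps are high priority" is an assumption chosen for illustration. The AES/GCM encryption and the exfil step are left out because they need a crypto library and a remote endpoint.

```python
def split_lines(raw):
    """Split a tailed chunk (possibly several lines) into individual non-empty lines."""
    return [line for line in raw.splitlines() if line.strip()]


def tag_by_parity(line):
    """Attach a 'parity' attribute based on the leading integer timestamp (assumed format)."""
    timestamp = int(line.split()[0])
    return {
        "timestamp": timestamp,
        "parity": "odd" if timestamp % 2 else "even",
        "line": line,
    }


def prioritize(records, high="odd"):
    """Order records so the `high` parity goes first, newest first within each group."""
    return sorted(records, key=lambda r: (r["parity"] != high, -r["timestamp"]))


chunk = "1000 a\n1001 b\n\n1002 c\n1003 d\n"
records = [tag_by_parity(line) for line in split_lines(chunk)]
ordered = [r["line"] for r in prioritize(records)]
```

Keeping the parity as an attribute on each record, rather than baking it into the sort, means the prioritization rule can be switched later without re-parsing the lines.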
Export from NiFi to MiNiFi
• Save as template from NiFi
• Run $ ./bin/config.sh transform template.xml config.yml
• MiNiFi flow ready to run*
*Still need to set up TLS & encrypted properties
Setting Up Crypto
• NiFi TLS Toolkit makes certificates & keystores simple (and secure)
• Copy encrypted property value from flow.xml.gz to config.yml (flow repo)
If We Really Have TLS, Why Encrypt?
• All data transmitted over TLS is encrypted
• On NiFi, automatically decrypted
• Attributes visible
• Content still encrypted because of EncryptContent processor
• Can serve as secure route for follow-on systems
Process Data In NiFi
• Receive the data over S2S
• Log the incoming messages
• Decrypt content
• Log again
Does It Work?
Prioritization?
• Increase the write frequency
• Check that newer records (within tail window) with higher priority arrive first
Next Steps
• Window Aggregator
• If >60% odd in window, switch prioritization
• Encrypt with different keys for different tags & send to different follow-on systems
• Spotty network? Tell MiNiFi to cache low priority and send in batches
• MiNiFi rollover & pruning of monitored log
• Exfil MiNiFi provenance data to NiFi
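The window aggregator idea ("if >60% odd in window, switch prioritization") can be sketched as a fixed-size sliding window over recent timestamps. This is an illustration only: `WindowAggregator`, the count-based window, and the default threshold parameter are assumptions, and the talk does not say whether the window is measured in records or in time.

```python
from collections import deque


class WindowAggregator:
    """Track the parity of the last `size` timestamps (hypothetical helper)."""

    def __init__(self, size, threshold=0.6):
        self.window = deque(maxlen=size)   # oldest entries fall out automatically
        self.threshold = threshold

    def observe(self, timestamp):
        self.window.append(timestamp % 2 == 1)

    def odd_fraction(self):
        return sum(self.window) / len(self.window) if self.window else 0.0

    def should_switch(self):
        # Strictly greater than the threshold, matching ">60% odd in window".
        return self.odd_fraction() > self.threshold


agg = WindowAggregator(size=5)
for ts in (1, 2, 3, 4, 5):
    agg.observe(ts)
before = agg.should_switch()   # 3 of 5 odd is exactly 60%, not above it
agg.observe(7)
agg.observe(9)                 # window now holds 3, 4, 5, 7, 9
after = agg.should_switch()    # 4 of 5 odd is 80%
```

A central NiFi node could run a check like this on received data and push an updated prioritization back to the edge, which is the "make determinations and modify live flow" step of the scenario.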
Community
Community Example
• Jeremy Dyer
• Alexa + MiNiFi + Dyer 2.0
http://www.opensourcedad.com/apache/minifi-cpp/2016/12/18/poop-scale.html
What’s Next?
New Announcements
• NiFi 1.8.0, 26 Oct 2018 (212+ Jiras)
• Jetty, DB improvements
• Auto load-balancing queues
• TLS Toolkit w/ external CA
• Record processor improvements
• MiNiFi C++ 0.5.0, 6 June 2018
• MiNiFi Java 0.5.0, 7 July 2018
• NiFi Registry 0.3.0, 25 Sept 2018
Introducing Apache NiFi Registry
NiFi Registry for Dataflows
Introducing Apache NiFi Registry 0.3.0
• Previously, flows were exported via XML templates
• Didn’t contain sensitive values
• Couldn’t be updated in-place
• No tracking system
• NiFi Registry brings asset management as a first-class citizen to NiFi
• Flows can be versioned
• Flows can be promoted between environments
Community Health
Learn more and join us
Apache NiFi site: https://nifi.apache.org
Subproject MiNiFi site: https://nifi.apache.org/minifi/
Subscribe to and collaborate at: dev@nifi.apache.org, users@nifi.apache.org
Submit Ideas or Issues: https://issues.apache.org/jira/browse/NIFI
Follow us on Twitter: @apachenifi
More NiFi Today
Title | Time | Room
The First Mile – Edge and IoT Data Collection with Apache NiFi and MiNiFi | 1100 - 1140 | Room 103
Apache NiFi Crash Course | 1400 - 1600 | Room 109
Dataflow Management From Edge to Core with Apache NiFi | 1650 - 1730 | Room 112
Using Spark Streaming and NiFi for the Next Generation of ETL in the Enterprise | 1650 - 1730 | Room 103
Thank you
alopresto@hortonworks.com | alopresto@apache.org | @yolopey
github.com/alopresto/slides

 
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
 
MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024
 
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdfRising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
 
CNIC Information System with Pakdata Cf In Pakistan
CNIC Information System with Pakdata Cf In PakistanCNIC Information System with Pakdata Cf In Pakistan
CNIC Information System with Pakdata Cf In Pakistan
 

The First Mile – Edge and IoT Data Collection with Apache NiFi and MiNiFi

  • 1. The First Mile – Edge and IoT Data Collection with Apache NiFi and MiNiFi • Andy LoPresto | @yolopey • Sr. Member of Technical Staff at Hortonworks, Apache NiFi PMC & Committer • 06 February 2019, Dataworks Summit Melbourne • © Hortonworks Inc. 2011–2019. All rights reserved
  • 2. Acknowledgement of Country • I acknowledge the Traditional Owners of the land on which we are meeting. I pay my respects to their Elders, past and present, and the Aboriginal Elders of other communities who may be here today.
  • 3. Gauging Audience Familiarity With NiFi • “What’s a NeeFee?”: No experience with dataflow; no experience with NiFi • “I can pick this up pretty quickly”: Some experience with dataflow; some experience with NiFi • “I refactored the Ambari integration endpoint to allow for mutual authentication TLS during my coffee break”: Forgotten more about NiFi than most of us will ever know
  • 4. Agenda • What is dataflow and what are the challenges? • Apache NiFi • IoT Challenges • Apache MiNiFi • Exploration • Community • All slides provided online, so no need to transcribe
  • 5. What is dataflow?
  • 6. What is dataflow? • Moving some content from A to B • Content could be any bytes: Logs, HTTP, XML, CSV, Images, Video, Telemetry • Producers (A.K.A. Things): Anything AND Everything • Internet! • Consumers: User, Storage, System, …More Things
  • 7. Moving data effectively is hard • “Data Pipeline” https://xkcd.com/2054/
  • 8. Dataflow Challenges In 3 Categories • Data: Standards, Formats, Protocols, Veracity, Validity, Schemas, Partitioning/Bundling • Infrastructure: “Exactly Once” Delivery, Ensuring Security, Overcoming Security, Credential Management, Network • People: Compliance, “That [person|team|group]”, Consumers Change, Requirements Change, “Exactly Once” Delivery
  • 9. Let’s Connect Lots of As to Bs to As to Cs to Bs to Δs to Cs to ϕs • Raise your hand if you want to maintain Python scripts for the rest of your life
  • 10. What is Apache NiFi?
  • 11. Apache NiFi Key Features • Guaranteed delivery • Data buffering • Backpressure • Pressure release • Prioritized queuing • Flow specific QoS • Latency vs. throughput • Loss tolerance • Data provenance • Supports push and pull models • Recovery/recording a rolling log of fine-grained history • Visual command and control • Flow templates • Pluggable, multi-tenant security • Designed for extension • Clustering
  • 12. FlowFiles Are Like HTTP Data • HTTP Data, header: HTTP/1.1 200 OK; Date: Sun, 10 Oct 2010 23:26:07 GMT; Server: Apache/2.2.8 (CentOS) OpenSSL/0.9.8g; Last-Modified: Sun, 26 Sep 2010 22:04:35 GMT; ETag: "45b6-834-49130cc1182c0"; Accept-Ranges: bytes; Content-Length: 13; Connection: close; Content-Type: text/html • HTTP Data, content: Hello world! • FlowFile, header (Standard FlowFile Attributes): Key: 'entryDate', Value: 'Fri Jun 17 17:15:04 EDT 2016'; Key: 'lineageStartDate', Value: 'Fri Jun 17 17:15:04 EDT 2016'; Key: 'fileSize', Value: '23609' • FlowFile, header (FlowFile Attribute Map Content): Key: 'filename', Value: '15650246997242'; Key: 'path', Value: './' • FlowFile, content: Binary Content *
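The analogy on slide 12 can be sketched in a few lines of Python: a FlowFile pairs a map of string attributes (like HTTP headers) with opaque content bytes (like an HTTP body). This is only an illustration of the idea. The names `FlowFileSketch`, `from_http_response`, and the attribute key `http.status.line` are hypothetical and are not part of NiFi's actual (Java) API.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass(frozen=True)
class FlowFileSketch:
    """Hypothetical stand-in for a FlowFile: attributes plus opaque content."""
    attributes: Dict[str, str] = field(default_factory=dict)
    content: bytes = b""


def from_http_response(raw: bytes) -> FlowFileSketch:
    """Split a raw HTTP response into header attributes and body bytes."""
    head, _, body = raw.partition(b"\r\n\r\n")
    lines = head.decode("iso-8859-1").split("\r\n")
    attributes = {"http.status.line": lines[0]}  # hypothetical attribute name
    for line in lines[1:]:
        key, _, value = line.partition(":")
        attributes[key.strip().lower()] = value.strip()
    # 'fileSize' mirrors the standard attribute name shown on the slide.
    attributes["fileSize"] = str(len(body))
    return FlowFileSketch(attributes=attributes, content=body)


raw = (b"HTTP/1.1 200 OK\r\n"
       b"Content-Type: text/html\r\n"
       b"Connection: close\r\n"
       b"\r\n"
       b"Hello world!\n")
flowfile = from_http_response(raw)
print(flowfile.attributes["content-type"])
print(flowfile.content)
```

Keeping the attributes separate from the content is what lets a flow route or prioritize on metadata without reading the payload.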
  • 13. User Interface • Less of this… more of this
  • 14. Deeper Ecosystem Integration: 286+ Processors, 61 Controller Services • Hash, Extract, Merge, Duplicate, Scan, GeoEnrich, Replace, Convert, Split, Translate, Route Content, Route Context, Route Text, Control Rate, Distribute Load, Generate Table Fetch, Jolt Transform JSON, Prioritized Delivery, Encrypt, Tail, Evaluate, Execute, Fetch, HTTP, Syslog, Email, HTML, Image, HL7, FTP, UDP, XML, SFTP, AMQP, WebSocket, Parse Records, Convert Records • All Apache project logos are trademarks of the ASF and the respective projects.
  • 15. What are the IoT challenges?
  • 16. IoT Challenges • Limited computing capability • Limited power/network • Restricted software library/platform availability • No UI • Physically inaccessible • Not frequently updated • Competing standards/protocols • Scalability • Privacy & Security • @_lennart
  • 17. Recent Examples • When the Mirai attack has its own Wikipedia page, that’s not good • Hackers stole high-roller database from casino via aquarium thermometer connected to internet (04/2018)
  • 18. NiFi Solves Everything* • Runs on JVM • Provides UI for flow design & monitoring • Security built-in: TLS, authentication/authorization, encrypted data • Handles practically any format/protocol
  • 19. NiFi for IoT • NiFi supports AMQP, MQTT, UDP, TCP, HTTP(S), CEF, JMS, (S)FTP, AWSIoT • With a little pruning, NiFi can run on a Raspberry Pi
  • 20. So Why Do We Need A Different Solution? • NiFi is designed to “own the box” • NiFi 0.7.x started up in about 10-15 minutes on RP3 (593 MB) • NiFi 1.x started up in about 30 minutes on RP3 (760 MB): 33 new processors; rewrite for multi-tenant authorization; complete UI overhaul
  • 21. Enter Apache MiNiFi
  • 22. Apache NiFi Subproject: MiNiFi • Get the key parts of NiFi close to where data begins and provide bidirectional communication • NiFi lives in the data center: give it an enterprise server or a cluster of them • MiNiFi lives as close as possible to where data is born and is a guest on that device or system: IoT; connected car; legacy hardware
  • 23. Why build MiNiFi? • NiFi is big: 1.8.0 release is 1.2 GB compressed; can be modified to run in restricted environments, but requires manual surgery; provides UI, provenance query, etc.; runs on dedicated machines/clusters (“owns the box”) • MiNiFi lives at the edge: no UI; 0.5.0 Java release is 67 MB, C++ release is 6.1 MB (0.2.0 fits on a floppy disk); “good guest”
  • 24. Flavors of MiNiFi • MiNiFi Java (v0.5.0): modified version of NiFi; no UI; YAML configuration; reduced processor count (63+ by default, more available with additional NARs) • MiNiFi C++ (v0.5.0): written from scratch; 33 processors by default; bi-directional site-to-site & provenance data
  • 25. NiFi vs MiNiFi Java Processes • MiNiFi: NiFi Framework, Components • NiFi: NiFi Framework, User Interface, Components
  • 26. How Does MiNiFi Interact With NiFi? • NiFi: design flows; aggregate data from many sources; perform routing/analysis/SEP • MiNiFi: receive flows; collect data; send for processing
  • 27. Let’s Add Dimensionality • We’ve been imagining EDGE to CORE as a bi-directional linear system • Let’s expand that to the real world
  • 28. What does MiNiFi provide? • Data tagging/provenance • Governance from edge (geopolitical restrictions) • Security (encryption, certificate-based authentication) • Low latency (immediate reactions & decision-making) • Diagram labels: Connected Car Reference Platform; Box; Tuner + DSRC Card; Connectivity Card
  • 29. MiNiFi on a Connected Car • Stage labels: Comprehension; Collection; Processing / Synthesis • Vehicle networks: CAN Bus; Ethernet / Ethernet AVB; Local Interconnect Network; yet to be established protocol • Hardware labels: Gateway; MCU; MCU; MCU • Processor labels: Listen CAN; Listen Ethernet; Listen LIN; Listen <>; Parse CAN; Parse Ethernet; Parse LIN; Parse <>; Route; Transmit; Execute; Prioritize; Filter
  • 30. MiNiFi on a Connected Car
  • 31. MiNiFi Exfil • Site-to-Site: NiFi protocol; two implementations (raw socket, HTTP(S)); secured with mutual authentication TLS • HTTP(S), (S)FTP, JMS, Syslog, File, Email, Process
  • 32. Edge Data Exploration
  • 33. Scenario • IoT Device generating log messages • Need to encrypt data on device • Need to prioritize some data for unreliable network connectivity • Transmit data to central node • Decrypt data and analyze • Make determinations and modify live flow
  • 34. NiFi As Test Harness/Environment • Simulate the log generation • Schedule is customizable • Script can write to dynamic location
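A minimal Python sketch of the kind of log-simulation script slide 34 describes. The line format (`<epoch-seconds> <message>`), the function name `append_log_line`, and the file name `device.log` are assumptions made for illustration; the slides do not show the actual script.

```python
import os
import tempfile
import time
from typing import Optional


def append_log_line(path: str, message: str, now: Optional[float] = None) -> str:
    """Append one '<epoch-seconds> <message>' line to path and return it."""
    timestamp = int(time.time() if now is None else now)
    line = f"{timestamp} {message}"
    with open(path, "a", encoding="utf-8") as handle:
        handle.write(line + "\n")
    return line


# The target location is a parameter, so a scheduler can vary it per run.
log_path = os.path.join(tempfile.mkdtemp(), "device.log")
append_log_line(log_path, "sensor reading 42", now=1549411200)
append_log_line(log_path, "sensor reading 43", now=1549411201)
with open(log_path, encoding="utf-8") as handle:
    print(handle.read(), end="")
```

Passing `now` explicitly keeps the output reproducible; a scheduled run would leave it unset and use the current time.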
  • 35. Build the MiNiFi Flow • Tails a log file • Logs the raw contents (can be multiple lines in time window) • Splits into individual lines • Filters the content using parity of the timestamp • Prioritizes • Encrypts using AES/GCM • Exfils to remote NiFi
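The split, parity, and prioritize steps on slide 35 can be sketched outside MiNiFi in plain Python. Several details here are assumptions, not taken from the deck: the line format (`<epoch-seconds> <message>`), treating odd timestamps as the high-priority tag, and ordering newest first within a tag. The AES/GCM encryption and site-to-site transmission steps are left out because they need more than the standard library.

```python
import heapq
from typing import Iterable, List, Tuple


def split_lines(raw: str) -> List[str]:
    """Split a tailed chunk (possibly several lines) into individual lines."""
    return [line for line in raw.splitlines() if line.strip()]


def timestamp_of(line: str) -> int:
    """Assumed format: '<epoch-seconds> <message>'."""
    return int(line.split(" ", 1)[0])


def prioritize(lines: Iterable[str]) -> List[str]:
    """Order lines for sending: odd timestamps first, newest first within a tag.

    Treating 'odd' as the high-priority tag is an assumption for this sketch.
    """
    heap: List[Tuple[int, int, str]] = []
    for line in lines:
        ts = timestamp_of(line)
        rank = 0 if ts % 2 == 1 else 1  # rank 0 sorts ahead of rank 1
        heapq.heappush(heap, (rank, -ts, line))
    return [heapq.heappop(heap)[2] for _ in range(len(heap))]


chunk = "1000 boot\n1001 sensor high\n1002 heartbeat\n1003 sensor low\n"
for line in prioritize(split_lines(chunk)):
    print(line)
```

In the real flow these steps are processors connected by queues with prioritizers, so the ordering is applied continuously rather than to one batch.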
  • 36. Export from NiFi to MiNiFi • Save as template from NiFi • Run $ ./bin/config.sh transform template.xml config.yml • MiNiFi flow ready to run* • *Still need to set up TLS & encrypted properties
  • 37. Setting Up Crypto • NiFi TLS Toolkit makes certificates & keystores simple (and secure) • Copy encrypted property value from flow.xml.gz to config.yml (flow repo)
  • 38. If We Really Have TLS, Why Encrypt? • All data transmitted over TLS is encrypted • On NiFi, automatically decrypted • Attributes visible • Content still encrypted because of EncryptContent processor • Can serve as secure route for follow-on systems
  • 39. Process Data In NiFi • Receive the data over S2S • Log the incoming messages • Decrypt content • Log again
  • 40. Does It Work?
  • 41. Prioritization? • Increase the write frequency • Check that newer records (within tail window) with higher priority arrive first
  • 42. Next Steps • Window Aggregator: if >60% odd in window, switch prioritization • Encrypt with different keys for different tags & send to different follow-on systems • Spotty network? Tell MiNiFi to cache low priority and send in batches • MiNiFi rollover & pruning of monitored log • Exfil MiNiFi provenance data to NiFi
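The window aggregator idea on slide 42 could look roughly like the Python sketch below. It assumes a count-based sliding window (the slide does not say whether the window is measured in records or in time), and the class name `ParityWindow` and the window size are hypothetical. Only the 60% threshold comes from the slide.

```python
from collections import deque


class ParityWindow:
    """Sliding window over recent timestamps; reports when odd ones dominate."""

    def __init__(self, size: int = 10, threshold: float = 0.6) -> None:
        self.threshold = threshold
        self.recent = deque(maxlen=size)  # oldest entries fall off automatically

    def observe(self, timestamp: int) -> bool:
        """Record one timestamp; return True if prioritization should switch."""
        self.recent.append(timestamp % 2 == 1)
        odd_fraction = sum(self.recent) / len(self.recent)
        return odd_fraction > self.threshold


window = ParityWindow(size=5)
decisions = [window.observe(ts) for ts in (2, 3, 5, 7, 9, 11)]
print(decisions)
```

In the scenario from the talk, a True result would be the signal for the central NiFi to push a modified flow back to the MiNiFi agent.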
  • 43. Community
  • 44. Community Example • Jeremy Dyer • Alexa + MiNiFi + Dyer 2.0 • http://www.opensourcedad.com/apache/minifi-cpp/2016/12/18/poop-scale.html
  • 45. What’s Next?
  • 46. New Announcements • NiFi 1.8.0 (26 Oct 2018, 212+ Jiras): Jetty, DB improvements; auto load-balancing queues; TLS Toolkit w/ external CA; record processor improvements • MiNiFi C++ 0.5.0 (6 June 2018) • MiNiFi Java 0.5.0 (7 July 2018) • NiFi Registry 0.3.0 (25 Sept 2018) • Introducing Apache NiFi Registry
  • 47. NiFi Registry for Dataflows • Previously, flows were exported via XML templates: didn’t contain sensitive values; couldn’t be updated in-place; no tracking system • NiFi Registry brings asset management as a first-class citizen to NiFi: flows can be versioned; flows can be promoted between environments • Introducing Apache NiFi Registry 0.3.0
  • 48. Community Health
  • 49. Learn more and join us • Apache NiFi site: https://nifi.apache.org • Subproject MiNiFi site: https://nifi.apache.org/minifi/ • Subscribe to and collaborate at dev@nifi.apache.org, users@nifi.apache.org • Submit Ideas or Issues: https://issues.apache.org/jira/browse/NIFI • Follow us on Twitter: @apachenifi
  • 50. More NiFi Today (Title | Time | Room) • The First Mile – Edge and IoT Data Collection with Apache NiFi and MiNiFi | 1100 - 1140 | Room 103 • Apache NiFi Crash Course | 1400 - 1600 | Room 109 • Dataflow Management From Edge to Core with Apache NiFi | 1650 - 1730 | Room 112 • Using Spark Streaming and NiFi for the Next Generation of ETL in the Enterprise | 1650 - 1730 | Room 103
  • 51. Thank you • alopresto@hortonworks.com | alopresto@apache.org | @yolopey • github.com/alopresto/slides