1. Taking DataFlow Management to the Edge with Apache NiFi/MiNiFi
Bryan Bende – Software Engineer @Hortonworks
Future of Data NY – December 5th 2016
2. Agenda
• Problem Definition
• Introduction to Apache NiFi
• Introduction to Apache MiNiFi
• Demo!!
• Q&A
© Hortonworks Inc. 2011 – 2016. All Rights Reserved
3. About Me
• Software Engineer @ Hortonworks
• Apache NiFi PMC & Committer
• Working with NiFi since 2011
• Recent focus on integrations with Hadoop ecosystem
• bbende@hortonworks.com / Twitter @bbende / bryanbende.com
• Bethpage Class of 2001!
4. The Problem
5. It starts out so simple…
[Diagram: Team 1 and Team 2 in conversation]
"Hey! We have some important data to send you!"
"Cool! Your data is really important to us!"
This should be easy right?...
6. But what about formats & protocols?
[Diagram: Team 1 and Team 2 in conversation]
"We can publish Avro records to a Kafka topic, does that work?"
"Oh, well we have a REST service that accepts JSON…"
7. And what about security & authentication?
[Diagram: Team 1 and Team 2 in conversation]
"Hmm what about security? We can authenticate via Kerberos"
"Sorry, we only support 2-Way TLS with certificates"
8. And what about all these devices at the edge?
[Diagram: Team 2 and a set of edge devices]
"We also need to grab data from all these devices, how are we going to do that?"
9. And What About…
• Organizational Politics (my data)
• Brittle Connectivity
• Firewalls/Security Domains
• Partnerships bring new data / need different formats
• Data has to be masked for compliance purposes
• Where is this data even from?
• Data is in that other system – I need it over here
• Bandwidth between those sites is limited
• My Big Data system needs it in this other better/faster/stronger format
• What schema is that from?
• It needs to be enriched first!
• No not that reference set – this one!
• I didn't even know that system existed
10. Ok so let's fix this
• Enterprise Architecture – Standardize on
  - …format
  - …a schema (one that can evolve)
  - …a protocol
  - …an ontology
But now…
• Standard schema becomes complex
• Hard to agree on common changes
• Some teams stuck on older versions
• Productivity starts slowing…
11. Something to ponder – the disconnect is healthy
• Having Corporate Standards is a good thing.
• Innovation is a good thing.
Innovation often does not follow the Corporate Standard
12. What is Dataflow Management?
13. Dataflow Management
The systematic process by which data is acquired from all producers and delivered to all consumers
14. Dataflow Management Considerations
• Promote Loosely Coupled Systems
  - Types of coupling: Format, Schema, Protocol, Priority, Size, Interest, …
• Promote Highly Cohesive Systems
  - Producers should focus on production (not the intricacies of consumption)
  - Consumers should focus on storage or processing (not the details of production)
• Provide Provenance
  - The who/what/when/where/why of data
  - Inter and Intra Process Latency
  - Enable enterprise version control for data
15. Dataflow Management Considerations
• Empower Understanding and Interaction
  - Ability to see the flow, safely and quickly iterate and experiment
  - Breaking production is bad – so too is not being able to evolve fast enough
• Secure
  - Bridge between security domains
  - Data Plane (transport)
  - Control Plane (C&C, Monitoring)
• Self Service
  - Centralized teams – hard to scale – slow turnaround times
  - Centralized systems – multi-tenant management works
16. The role of messaging systems
• Reduce variables: Fix protocol, Data Size, Provide Buffering
• Historically not very fast or replayable: Apache Kafka solved that
• Strong solution within a controlled domain
• But numerous challenges remain
  - Topics do not separate key concerns between producer and consumer pairs such as
    § Authorization
    § Format
    § Schema
    § Interest
    § Prioritization
  - Flow control
17. Introduction to Apache NiFi
18. The NSA Years
• Created in 2006
• Improved over eight years
• Simple initial vision – Visio for real-time dataflow management
• Key Lessons Learned
  - What scale means – down, up, and out
  - The fearsome force known as Compliance Requirements
  - The power of provenance!
  - Operational best-practices and anti-patterns
• NSA donated the codebase to the ASF in late 2014
19. NiFi Key Features
• Guaranteed delivery
• Data buffering
  - Backpressure
  - Pressure release
• Prioritized queuing
• Flow specific QoS
  - Latency vs. throughput
  - Loss tolerance
• Data provenance
• Recovery/recording a rolling log of fine-grained history
• Visual command and control
• Flow templates
• Pluggable/multi-role security
• Designed for extension
• Clustering
20. NiFi Core Concepts
FBP Term | NiFi Term | Description
Information Packet | FlowFile | Each object moving through the system.
Black Box | FlowFile Processor | Performs the work, doing some combination of data routing, transformation, or mediation between systems.
Bounded Buffer | Connection | The linkage between processors, acting as queues and allowing various processes to interact at differing rates.
Scheduler | Flow Controller | Maintains the knowledge of how processes are connected, and manages the threads and allocations thereof which all processes use.
Subnet | Process Group | A set of processes and their connections, which can receive and send data via ports. A process group allows creation of an entirely new component simply by composition of its components.
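The terms in the table above can be pictured with a small toy model. This is only an illustrative sketch in Python, not NiFi's Java API: the class names mirror the NiFi terms, while `UppercaseProcessor`, `on_trigger`, and the `transformed` attribute are hypothetical names chosen for the example.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class FlowFile:
    """One object moving through the flow: content plus key/value attributes."""
    content: bytes
    attributes: dict = field(default_factory=dict)


class Connection:
    """Queue linking two processors so they can run at differing rates."""

    def __init__(self):
        self.queue = deque()

    def put(self, flowfile):
        self.queue.append(flowfile)

    def poll(self):
        return self.queue.popleft() if self.queue else None


class UppercaseProcessor:
    """Hypothetical processor: transforms content and tags an attribute."""

    def on_trigger(self, incoming, outgoing):
        flowfile = incoming.poll()
        if flowfile is None:
            return False  # nothing queued on this run
        outgoing.put(FlowFile(flowfile.content.upper(),
                              {**flowfile.attributes, "transformed": "true"}))
        return True


# Wire source -> processor -> sink and run the processor once.
source_to_proc, proc_to_sink = Connection(), Connection()
source_to_proc.put(FlowFile(b"temp=21.5", {"filename": "reading-1"}))
UppercaseProcessor().on_trigger(source_to_proc, proc_to_sink)
result = proc_to_sink.poll()
```

In this picture the Flow Controller would be whatever decides when `on_trigger` runs, and a Process Group would be a set of such processors and connections exposed through ports.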
21. Visual Command & Control
• Drag & drop processors to build a flow
• Start, stop, & configure components in real-time
• View errors & corresponding messages
• View statistics & health of the dataflow
• Create shareable templates of common flows
22. Provenance/Lineage
• Tracks data at each point as it flows through the system
• Records, indexes, and makes events available for display
• Handles fan-in/fan-out, i.e. merging and splitting data
• View attributes and content at given points in time
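One way to picture lineage across fan-out and fan-in is as a graph of events that each point back to their parents. The sketch below is a simplified assumption, not NiFi's provenance repository: `ProvenanceLog`, `record`, and `lineage` are hypothetical names, and the event labels are used here only as plain strings.

```python
import itertools
from dataclasses import dataclass

_ids = itertools.count(1)


@dataclass(frozen=True)
class ProvenanceEvent:
    event_type: str       # label for the event, e.g. "RECEIVE", "FORK", "JOIN"
    flowfile_id: int
    parent_ids: tuple = ()


class ProvenanceLog:
    """Hypothetical append-only log of events with parent links."""

    def __init__(self):
        self.events = []

    def record(self, event_type, parent_ids=()):
        flowfile_id = next(_ids)
        self.events.append(
            ProvenanceEvent(event_type, flowfile_id, tuple(parent_ids)))
        return flowfile_id

    def lineage(self, flowfile_id):
        """Return the ids of every ancestor reachable from flowfile_id."""
        parents = {e.flowfile_id: e.parent_ids for e in self.events}
        seen, stack = set(), [flowfile_id]
        while stack:
            for parent in parents.get(stack.pop(), ()):
                if parent not in seen:
                    seen.add(parent)
                    stack.append(parent)
        return seen


log = ProvenanceLog()
original = log.record("RECEIVE")
part_a = log.record("FORK", [original])         # fan-out: split in two
part_b = log.record("FORK", [original])
merged = log.record("JOIN", [part_a, part_b])   # fan-in: merge back together
ancestors = log.lineage(merged)
```

Walking the parent links from the merged item reaches both split parts and the original, which is the kind of question a lineage view answers.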
23. Prioritization
• Configure a prioritizer per connection
• Determine what is important for your data – time based, arrival order, importance of a data set
• Funnel many connections down to a single connection to prioritize across data sets
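A per-connection prioritizer can be thought of as a sort key applied to the queue. The following is a minimal sketch under that assumption; `PrioritizedConnection`, `by_priority`, and the `priority` attribute (lower value served first, default 10) are hypothetical choices for illustration rather than NiFi's prioritizer interface.

```python
import heapq
import itertools


class PrioritizedConnection:
    """Queue that hands out items in prioritizer order, FIFO on ties."""

    def __init__(self, prioritizer):
        self._prioritizer = prioritizer     # maps attributes -> sort key
        self._heap = []
        self._arrival = itertools.count()   # tie-breaker keeps arrival order

    def put(self, attributes):
        key = self._prioritizer(attributes)
        heapq.heappush(self._heap, (key, next(self._arrival), attributes))

    def poll(self):
        return heapq.heappop(self._heap)[2] if self._heap else None


def by_priority(attributes):
    # Assumed convention: lower number means more important.
    return int(attributes.get("priority", 10))


# Three data sets funneled into one connection.
connection = PrioritizedConnection(by_priority)
connection.put({"source": "sensor-a", "priority": "5"})
connection.put({"source": "sensor-b", "priority": "1"})
connection.put({"source": "sensor-c", "priority": "5"})
drained = [connection.poll()["source"] for _ in range(3)]
```

Swapping `by_priority` for a key based on arrival time or age would give the time-based and arrival-order behaviours mentioned on the slide.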
24. Back-Pressure
• Configure back-pressure per connection
• Based on number of FlowFiles or total size of FlowFiles
• Upstream processor no longer scheduled to run until below threshold
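The interaction between the two thresholds and upstream scheduling can be sketched as follows. This is a simplified model, not NiFi's implementation: `BackPressureConnection`, `run_upstream`, and the threshold values are hypothetical, and the check is made before every item here purely to keep the example small.

```python
from collections import deque


class BackPressureConnection:
    """Queue with an object-count threshold and a total-size threshold."""

    def __init__(self, max_count, max_bytes):
        self.max_count = max_count
        self.max_bytes = max_bytes
        self.queue = deque()
        self.queued_bytes = 0

    def is_full(self):
        return (len(self.queue) >= self.max_count
                or self.queued_bytes >= self.max_bytes)

    def put(self, content):
        self.queue.append(content)
        self.queued_bytes += len(content)

    def poll(self):
        content = self.queue.popleft()
        self.queued_bytes -= len(content)
        return content


def run_upstream(connection, pending):
    """Scheduler step: the upstream side only runs while below threshold."""
    produced = 0
    while pending and not connection.is_full():
        connection.put(pending.pop(0))
        produced += 1
    return produced


connection = BackPressureConnection(max_count=3, max_bytes=1024)
pending = [b"a" * 10 for _ in range(5)]
first_run = run_upstream(connection, pending)    # stops at the count threshold
connection.poll()                                 # downstream consumes one item
second_run = run_upstream(connection, pending)    # room for exactly one more
```

Once the downstream side drains the queue below either threshold, the upstream side becomes eligible to run again, so pressure propagates backwards one connection at a time.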
25. Latency vs. Throughput
• Choose between lower latency, or higher throughput on each processor
• Higher throughput allows framework to batch together all operations for the selected amount of time for improved performance
• Processor developer determines whether to support this by using @SupportsBatching annotation
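The trade-off can be illustrated by counting how many batches are needed for the same work under different batching windows. This is a toy model with assumed numbers, not how the framework schedules sessions: `count_batches`, the arrival times, and the 25 ms window are all hypothetical.

```python
def count_batches(arrival_times_ms, batch_window_ms):
    """Count batches when operations inside one time window share a batch.

    A window of 0 models the lowest-latency setting: one batch per operation.
    """
    batches = 0
    window_start = None
    for arrival in arrival_times_ms:
        starts_new_batch = (
            batch_window_ms == 0
            or window_start is None
            or arrival - window_start >= batch_window_ms
        )
        if starts_new_batch:
            batches += 1
            window_start = arrival
    return batches


arrivals = [0, 5, 10, 30, 35, 60]                # hypothetical arrival times (ms)
low_latency = count_batches(arrivals, 0)          # every operation on its own
high_throughput = count_batches(arrivals, 25)     # operations grouped per window
```

Fewer batches means less per-batch overhead, at the cost of each item possibly waiting up to one window before its batch completes.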
26. Security
• Control Plane
  - Pluggable authentication
    • 2-Way TLS/SSL, LDAP, Kerberos
  - Pluggable authorization with multi-tenancy
    • NiFi Policy Based Authorizer
    • Apache Ranger Authorizer
  - Audit trail of all user actions
• Data Plane
  - Optional 2-Way TLS/SSL between cluster nodes
  - Optional 2-Way TLS/SSL on Site-To-Site connections (NiFi-to-NiFi)
  - Encryption/Decryption of data through processors
  - Provenance for audit trail of data
27. Extensibility
• Built from the ground up with extensions in mind
• Service-loader pattern for…
  - Processors
  - Controller Services
  - Reporting Tasks
• Extensions packaged as NiFi Archives (NARs)
  - Deploy to the NiFi lib directory and restart
  - Provides ClassLoader isolation
  - Same model as standard components
28. Architecture - Standalone
[Diagram: an OS/Host running a JVM that contains the Web Server, Flow Controller, and Processor 1 … Extension N, with the FlowFile Repository, Content Repository, and Provenance Repository on Local Storage]
• FlowFile Repository
  - Write Ahead Log
  - State of every FlowFile
  - Pointers to content repository (pass-by-reference)
• Content Repository
  - FlowFile content
  - Copy-on-write
• Provenance Repository
  - Write Ahead Log + Lucene Indexes
  - Store & search lineage events
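Pass-by-reference and copy-on-write can be sketched with two small structures: a store of immutable content claims and FlowFile records that only hold a pointer into it. This is an illustrative model under those assumptions; `ContentRepository`, `FlowFileRecord`, `clone`, and `modify` are hypothetical names, not NiFi classes.

```python
import itertools


class ContentRepository:
    """Holds immutable content claims; a write always creates a new claim."""

    def __init__(self):
        self._claims = {}
        self._ids = itertools.count(1)

    def write(self, data):
        claim_id = next(self._ids)
        self._claims[claim_id] = bytes(data)
        return claim_id

    def read(self, claim_id):
        return self._claims[claim_id]


class FlowFileRecord:
    """FlowFile repository entry: attributes plus a pointer to content."""

    def __init__(self, attributes, claim_id):
        self.attributes = dict(attributes)
        self.claim_id = claim_id


def clone(record):
    # Pass-by-reference: the clone shares the same content claim.
    return FlowFileRecord(record.attributes, record.claim_id)


def modify(repo, record, transform):
    # Copy-on-write: new content goes to a new claim, the old one is untouched.
    new_claim = repo.write(transform(repo.read(record.claim_id)))
    return FlowFileRecord(record.attributes, new_claim)


repo = ContentRepository()
original = FlowFileRecord({"filename": "a.log"}, repo.write(b"hello"))
copy = clone(original)
changed = modify(repo, copy, bytes.upper)
```

Because cloning copies only the pointer, routing the same content down several paths is cheap, and because old claims are never rewritten, earlier content remains available for inspection.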
29. Architecture - Cluster
[Diagram: three OS/Host nodes, each running a JVM with the Web Server, Flow Controller, and Processor 1 … Extension N, and each with its own FlowFile, Content, and Provenance Repositories on Local Storage; ZooKeeper alongside the nodes]
• Same dataflow on each node, data partitioned across cluster
• Access the UI from any node
• ZooKeeper for auto-election of Cluster Coordinator & Primary Node
• Cluster Coordinator receives heartbeats from other nodes, manages joining/disconnecting
• Primary Node for scheduling processors on a single node
30. Site-To-Site
• Direct communication between two NiFi instances
• Push to Input Port on receiver, or Pull from Output Port on source
• Communicate between clusters, standalone instances, or both
• Handles load balancing and reliable delivery
• Secure connections using certificates (optional)
• Communicate over TCP or HTTP
31. Site-To-Site Push Model
• Source connects Remote Process Group to Input Port on destination
• Site-To-Site takes care of load balancing across the nodes in the cluster
[Diagram: a Standalone NiFi with a Remote Process Group (RPG) pushing to the Input Port on NiFi Cluster Node 1, Node 2, and Node 3]
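To make the load-balancing idea in the push model concrete, the sketch below sends each item to whichever node currently reports the fewest queued items. The least-loaded policy, the node names, and the functions `pick_node` and `push` are assumptions made for illustration; the slides do not describe how Site-To-Site actually chooses a node.

```python
def pick_node(queued_by_node):
    """Choose the node with the fewest queued items (name order breaks ties)."""
    return min(sorted(queued_by_node), key=queued_by_node.get)


def push(flowfiles, queued_by_node):
    """Assign each item to the Input Port of the least-loaded node."""
    assignments = {}
    for flowfile in flowfiles:
        node = pick_node(queued_by_node)
        queued_by_node[node] += 1
        assignments[flowfile] = node
    return assignments


# Hypothetical starting queue depths reported by a three-node cluster.
cluster = {"node-1": 2, "node-2": 0, "node-3": 1}
assignments = push(["f1", "f2", "f3", "f4"], cluster)
```

The sender only needs the address of the cluster and the current node list; the per-item choice stays on the sending side, which is why a standalone instance can spread data across a whole cluster.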
32. Site-To-Site Pull Model
• Destination connects Remote Process Group to Output Port on the source
• If source was a cluster, each node would pull from each node in cluster
[Diagram: NiFi Cluster Node 1, Node 2, and Node 3, each with a Remote Process Group (RPG), pulling from the Output Port on a Standalone NiFi]
33. Introduction to Apache MiNiFi
34. Apache MiNiFi
• Sub-project of Apache NiFi
• Created to more effectively collect data at the edge
• Smaller footprint, run where the JVM can't
• Design & Deploy vs. Command & Control
35. MiNiFi Distributions
Different focuses depending on requirements
• Java
  - <40MB binary distribution
  - Requires Java 1.8
  - More feature complete
  - Targeted for any systems that can run a JVM (e.g. servers, Raspberry Pi)
• C++
  - 600KB code size and static data ~50KB
  - Dynamic heap of ~1MB based on use-case
  - Targeted for resource constrained environments (e.g. edge IoT devices)
• Both use same config format and use NiFi terminology
36. MiNiFi Java
[Diagram: MiNiFi = NiFi Framework + Components; NiFi = NiFi Framework + User Interface + Components]
37. MiNiFi Java
• Uses same NAR structure as NiFi
• Use any NAR from NiFi with MiNiFi Java
• NiFi standard processors are bundled by default
  - TailLog
  - UpdateAttribute
  - Route on content and attributes
  - PutEmail
  - ….
38. MiNiFi C++
• Initial set of processors
  - TailFile
  - GetFile
  - GenerateFlowFile
  - LogAttribute
  - ListenSyslog
• Site to Site Client implementation in C++ for talking to NiFi instances
39. Design & Deploy
Same approach for Java & C++…
  1. Design a flow in NiFi UI
  2. Export template to XML file
  3. Run MiNiFi Toolkit to convert NiFi template to MiNiFi YAML
  4. Deploy config.yml to MiNiFi instances
Initially targeting flows like…
  1. GetFile/TailFile
  2. Routing Decision
  3. Site-To-Site Back to core NiFi
40. Simple config.yml
Tail a rolling file -> Site to Site
41. MiNiFi Command and Control
• Design Flow at a centralized place, deploy on the edge
• Version control of flows
  - Align with NiFi SDLC work
• Agent status monitoring
• Bi-directional command and control
Currently a feature proposal, initial version being architected
https://cwiki.apache.org/confluence/display/MINIFI/MiNiFi+Command+and+Control
42. Demo!
43. Demo Scenario
[Diagram components: two Raspberry Pi devices, each with a Temp/Humidity Sensor and MiNiFi Java; site-to-site; NiFi; Solr; Banana]
44. Questions?
45. Learn more and join us!
Apache NiFi site
http://nifi.apache.org
Subproject MiNiFi site
http://nifi.apache.org/minifi/
Subscribe to and collaborate at
dev@nifi.apache.org
users@nifi.apache.org
Submit Ideas or Issues
https://issues.apache.org/jira/browse/NIFI
https://issues.apache.org/jira/browse/MINIFI
Follow us on Twitter
@apachenifi
46. Thank you!