These are the slides from the July 11th Meetup in Toronto for the Flow-Based Programming meetup group at Lighthouse, covering Enterprise Dataflow with Apache NiFi.
5. The data is over here but I want it over there…
Basics of Connecting Systems
For every connection, both ends must agree on:
1. Protocol
2. Format
3. Schema
4. Priority
5. Size of event
6. Frequency of event
7. Access authorization
8. Relevance
[Diagram: producer P1 connected to consumer C1]
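The agreement checklist above can be sketched as a small contract object that both sides of a connection validate before any data moves. This is an illustrative model, not a NiFi API; the `ConnectionContract` name and its fields are assumptions drawn from the list above.

```python
from dataclasses import dataclass, asdict

@dataclass(frozen=True)
class ConnectionContract:
    """Illustrative contract a producer and consumer must agree on."""
    protocol: str            # e.g. "https", "sftp"
    format: str              # e.g. "json", "avro"
    schema: str              # schema name/version identifier
    priority: int            # relative delivery priority
    max_event_bytes: int     # size of event
    events_per_second: int   # frequency of event
    authorized_roles: tuple  # who may access the data
    relevance: str           # why the consumer wants this data at all

def agree(producer: ConnectionContract, consumer: ConnectionContract) -> list:
    """Return the fields on which the two ends disagree (empty = compatible)."""
    p, c = asdict(producer), asdict(consumer)
    return [k for k in p if p[k] != c[k]]
```

If `agree()` returns a non-empty list, the P1 → C1 connection cannot be made safely until the mismatched fields are reconciled.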
7. It started so simple
• Just needed to scan a directory for new data
• Send it over the link.
But…
• Bandwidth was low, latency was high, and comms were unreliable
• Some data was more useful than other data
• The rules for that could change often
• Lightweight in-line analysis could be used to determine relative value
• The value of the data decayed rapidly
• The data’s raw form was highly inefficient for transport
• and large portions of the data could simply be removed in many cases
• How to document, maintain, and fine-tune the configuration?
• Infrastructure was highly limited
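The "simple" starting point above fits in a few lines: scan a directory, ship anything new. A minimal sketch (the `send` callable is a placeholder for whatever pushes bytes over the link); everything after the "But…" is precisely what this loop fails to handle.

```python
import os

def scan_and_send(directory: str, seen: set, send) -> list:
    """One pass of the naive approach: find files we haven't shipped
    yet and push each one over the link, with no prioritization, no
    compression, and no awareness of bandwidth or data value."""
    shipped = []
    for name in sorted(os.listdir(directory)):
        path = os.path.join(directory, name)
        if os.path.isfile(path) and name not in seen:
            with open(path, "rb") as f:
                send(name, f.read())  # blocks; an unreliable link stalls everything
            seen.add(name)
            shipped.append(name)
    return shipped
```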
8. Challenges at the Edge
• Small footprint
• Low power
• Expensive bandwidth
• High latency
• Access to data exceeds bandwidth (if you're doing it right)
• Needs recoverability
• Needs to be secured for both the data plane and control plane
[Diagram: GATHER → PRIORITIZE → DELIVER; track data from the edge through the datacenter]
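The gather/prioritize/deliver loop can be sketched with a value-ordered queue: when access to data exceeds bandwidth, only the highest-value events fit the link, and the rest are dropped rather than delivered stale. The scoring function here is an illustrative stand-in for a real prioritizer, not NiFi's.

```python
import heapq

def deliver(events, budget_bytes, value):
    """Send the most valuable events that fit a bandwidth budget.
    `events` is a list of (name, size_bytes); `value(name)` scores each
    one. Returns (delivered, dropped)."""
    # heapq is a min-heap, so negate the score to pop highest value first.
    heap = [(-value(name), size, name) for name, size in events]
    heapq.heapify(heap)
    delivered, dropped = [], []
    while heap:
        _neg_score, size, name = heapq.heappop(heap)
        if size <= budget_bytes:
            budget_bytes -= size
            delivered.append(name)
        else:
            dropped.append(name)  # value decays rapidly, so don't queue it
    return delivered, dropped
```

With a 450-byte budget, a high-value alert and mid-value telemetry go through while a bulky low-value log is dropped at the edge instead of clogging the link.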
9. Simplistic View of Enterprise Data Flow
[Diagram: "The Data Flow Thing" connecting Acquire Data, Process and Analyze Data, and Store Data]
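The three boxes in the simplistic view compose naturally as pipeline stages. A minimal sketch, assuming toy sources and sinks; the stage functions are placeholders, not NiFi processors.

```python
def acquire():
    """Acquire Data: yield raw events from some source."""
    yield from [" sensor-a:42 ", "sensor-b:17 "]

def process(events):
    """Process and Analyze Data: clean and parse each raw event."""
    for raw in events:
        name, _, value = raw.strip().partition(":")
        yield {"source": name, "value": int(value)}

def store(records, sink):
    """Store Data: append finished records to a sink."""
    for rec in records:
        sink.append(rec)
    return sink
```

Chaining them (`store(process(acquire()), [])`) is the whole flow; everything an enterprise dataflow tool adds lives in the arrows between the stages.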