Data Integration with Apache NiFi
Marco Garcia
CTO, Founder – Cetax, TutorPro
mgarcia@cetax.com.br
https://www.linkedin.com/in/mgarciacetax/
About the Speaker
With more than 20 years of experience in IT, 18 of them exclusively with Business Intelligence, Data Warehousing, and Big Data, Marco Garcia is certified by Kimball University in the USA, where he was taught in person by Ralph Kimball, one of the leading authorities on Data Warehousing.
1st Hortonworks Certified Instructor in LATAM
Data Architect and Instructor at Cetax Consultoria.
Data Flows
Where do we find Data Flow?
• Remote sensor delivery (Internet of Things - IoT)
• Intra-site / inter-site / global distribution (Enterprise)
• Ingest for feeding analytics (Big Data)
• Data processing (Simple Event Processing)
Simplistic View of Enterprise Data Flow
The Data Flow Thing: Acquire Data, Store Data, Process and Analyze Data
Basics of Connecting Systems
For every connection, producer (P1) and consumer (C1) must agree on:
1. Protocol
2. Format
3. Schema
4. Priority
5. Size of event
6. Frequency of event
7. Authorization access
8. Relevance
IoT is Driving New Requirements
IoAT Data Grows Faster Than We Consume It
Much of the new data exists in-flight between systems and devices as part of the Internet of Anything.
The Opportunity
Unlock transformational business value from full-fidelity data and analytics across all data.
Traditional data sources: geolocation, server logs, files & emails, ERP, CRM, SCM
Internet of Anything: sensors and machines, clickstream, web & social
Internet of Anything is Driving New Requirements
Need trusted insights in real time, with full fidelity, from data at the very edge to the data lake
• Data generated by sensors, machines, geo-location devices, logs, clickstreams, social feeds, etc.
• Modern applications need access to both data-in-motion and data-at-rest
• IoAT data flows are multi-directional and point-to-point
  - Very different from existing ETL, data movement, and streaming technologies, which are generally one-directional
• The perimeter is outside the data center and can be very jagged
  - This "Jagged Edge" creates new opportunities for security, data protection, data governance, and provenance
Meeting IoAT Edge Requirements
GATHER / DELIVER / PRIORITIZE: track from the edge through to the datacenter
• Small footprints: operate with very little power
• Limited bandwidth: can create high latency
• Data availability exceeds transmission bandwidth: recoverability is required
• Data must be secured throughout its journey, on both the data plane and control plane
The Need for Data Provenance
For Operators
• Traceability, lineage
• Recovery and replay
For Compliance
• Audit trail
• Remediation
For Business
• Value sources
• Value IT investment
The Need for Fine-grained Security and Compliance
It's not enough to say you have encrypted communications
• Enterprise authorization services – entitlements change often
• People and systems with different roles require different access levels
• Tagged/classified data
Real-time Data Flow
It’s not just how quickly you
move data – it’s about how
quickly you can change behavior
and seize new opportunities
HDF Powered by Apache NiFi Addresses Modern Data Flow Challenges
Collect: Bring Together (logs, files, feeds, sensors)
• Aggregate all IoAT data from sensors, geo-location devices, machines, logs, files, and feeds via a highly secure, lightweight agent
Conduct: Mediate the Data Flow (deliver, secure, govern, audit)
• Mediate point-to-point and bi-directional data flows, delivering data reliably to real-time applications and storage platforms such as HDP
Curate: Gain Insights (parse, filter, transform, fork, clone)
• Parse, filter, join, transform, fork, and clone data in motion to empower analytics and perishable insights
Apache NiFi Manages Data-in-Motion
Sources: constrained, high-latency, localized context
Regional Infrastructure
Core Infrastructure: hybrid (cloud / on-premises), low-latency, global context
Apache NiFi, Apache MiNiFi, Apache Kafka, Apache Storm are trademarks of the Apache Software Foundation
NiFi Developed by the National Security Agency
Developed by the NSA over the last 8 years.
"NSA's innovators work on some of the most challenging national security problems imaginable."
"Commercial enterprises could use it to quickly control, manage, and analyze the flow of information from geographically dispersed sites – creating comprehensive situational awareness."
-- Linda L. Burger, Director of the NSA Technology Transfer Program
A Brief History
2006: NiagaraFiles (NiFi) is first developed at the National Security Agency (NSA)
November 2014: NiFi is donated to the Apache Software Foundation (ASF) through NSA's Technology Transfer Program and enters ASF's incubator
July 2015: NiFi reaches ASF top-level project status
Designed In Response to Real World Demands
HDF Powered by Apache NiFi
• Visual User Interface: drag and drop for efficient, agile operations
• Immediate Feedback: start, stop, tune, replay dataflows in real time
• Adaptive to Volume and Bandwidth: any data, big or small
• Provenance Metadata: governance, compliance & data evaluation
• Secure Data Acquisition & Transport: fine-grained encryption for controlled data sharing
Apache NiFi
• Powerful and reliable system to process and
distribute data.
• Directed graphs of data routing and
transformation.
• Web-based User Interface for creating,
monitoring, & controlling data flows
• Highly configurable - modify data flow at runtime,
dynamically prioritize data
• Data Provenance tracks data through entire
system
• Easily extensible through development of custom
components
[1] https://nifi.apache.org/
NiFi Use Cases
Ingest Logs for Cyber Security:
Integrated and secure log collection for real-time
data analytics and threat detection
Feed Data to Streaming Analytics:
Accelerate big data ROI by streaming data into
analytics systems such as Apache Storm or Apache
Spark Streaming
Data Warehouse Offload:
Convert source data to streaming data and use
HDF for data movement before delivering it for
ETL processing. Enable ETL processing to be
offloaded to Hadoop without having to change
source systems.
Move Data Internally:
Optimize resource utilization by
moving data between data centers or
between on-premises infrastructure
and cloud infrastructure
Capture IoT Data:
Transport disparate and often remote
IoT data in real time, despite any
limitations in device footprint, power
or connectivity—avoiding data loss
Big Data Ingest:
Easily and efficiently ingest data into Hadoop
NiFi Architecture
Apache NiFi: The three key concepts
• Manage the flow of information
• Data Provenance
• Secure the control plane and
data plane
Apache NiFi – Key Features
• Guaranteed delivery
• Data buffering
  - Backpressure
  - Pressure release
• Prioritized queuing
• Flow-specific QoS
  - Latency vs. throughput
  - Loss tolerance
• Data provenance
• Recovery/recording of a rolling log of fine-grained history
• Visual command and control
• Flow templates
• Multi-tenant authorization
• Designed for extension
• Clustering
Flow-Based Programming (FBP)
FBP Term | NiFi Term | Description
Information Packet | FlowFile | Each object moving through the system.
Black Box | FlowFile Processor | Performs the work, doing some combination of data routing, transformation, or mediation between systems.
Bounded Buffer | Connection | The linkage between processors, acting as queues and allowing various processes to interact at differing rates.
Scheduler | Flow Controller | Maintains the knowledge of how processes are connected, and manages the threads and allocations thereof which all processes use.
Subnet | Process Group | A set of processes and their connections, which can receive and send data via ports. A process group allows creation of an entirely new component simply by composition of its components.
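To make the FBP-to-NiFi mapping concrete, below is a minimal, hypothetical processor written against the public nifi-api (the class name, attribute name, and relationship wiring are illustrative assumptions, not something from this deck). It plays the "Black Box" role: it pulls a FlowFile from an incoming Connection, modifies it, and transfers it to a Relationship that the flow designer wires to the next Connection.

```java
// A minimal, hypothetical processor built on the public nifi-api (class and
// attribute names are illustrative). It plays the FBP "Black Box" role: pull a
// FlowFile (the "information packet") from an incoming Connection, modify it,
// and transfer it to a Relationship that is wired to the next Connection.
import java.util.Set;

import org.apache.nifi.annotation.documentation.CapabilityDescription;
import org.apache.nifi.annotation.documentation.Tags;
import org.apache.nifi.flowfile.FlowFile;
import org.apache.nifi.processor.AbstractProcessor;
import org.apache.nifi.processor.ProcessContext;
import org.apache.nifi.processor.ProcessSession;
import org.apache.nifi.processor.Relationship;
import org.apache.nifi.processor.exception.ProcessException;

@Tags({"example", "tagging"})
@CapabilityDescription("Tags each FlowFile with a static attribute (illustrative only).")
public class TagFlowFileProcessor extends AbstractProcessor {

    static final Relationship REL_SUCCESS = new Relationship.Builder()
            .name("success")
            .description("FlowFiles that were tagged successfully")
            .build();

    @Override
    public Set<Relationship> getRelationships() {
        return Set.of(REL_SUCCESS);
    }

    @Override
    public void onTrigger(ProcessContext context, ProcessSession session) throws ProcessException {
        // Take the next FlowFile queued on an incoming Connection, if any.
        FlowFile flowFile = session.get();
        if (flowFile == null) {
            return;
        }
        // FlowFiles are immutable; attribute changes return a new reference.
        flowFile = session.putAttribute(flowFile, "tagged.by", "TagFlowFileProcessor");
        // Hand the FlowFile to whichever Connection is bound to "success".
        session.transfer(flowFile, REL_SUCCESS);
    }
}
```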
NiFi Architecture
Primary Components
NiFi executes within a JVM on a host operating system. The primary components of NiFi within the JVM are as follows:
Web Server
• The purpose of the web server is to host NiFi’s HTTP-based command and control API.
Flow Controller
• The flow controller is the brains of the operation.
• It provides threads for extensions to run on and manages their schedule of when they’ll receive resources to
execute.
Extensions
• There are various types of extensions for NiFi which will be described in other documents.
• But the key point here is that extensions operate/execute within the JVM.
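Since the web server exposes an HTTP-based command and control API, anything the UI does can also be scripted. Below is a minimal client sketch, assuming an unsecured NiFi 1.x instance at http://localhost:8080; the endpoint path is taken from the NiFi 1.x REST API, so verify it against the documentation for your version. A secured instance would additionally need TLS and an access token.

```java
// Minimal sketch of a client for NiFi's HTTP command-and-control API, assuming an
// unsecured NiFi 1.x instance at http://localhost:8080. The endpoint path
// (/nifi-api/flow/process-groups/root) is from the NiFi 1.x REST API; verify it
// against your version's documentation.
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class NifiFlowStatus {
    public static void main(String[] args) throws Exception {
        HttpClient client = HttpClient.newHttpClient();
        HttpRequest request = HttpRequest.newBuilder()
                .uri(URI.create("http://localhost:8080/nifi-api/flow/process-groups/root"))
                .GET()
                .build();
        // The response is a JSON document describing the root process group:
        // its processors, connections, queued FlowFile counts, and run status.
        HttpResponse<String> response = client.send(request, HttpResponse.BodyHandlers.ofString());
        System.out.println("HTTP " + response.statusCode());
        System.out.println(response.body());
    }
}
```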
Primary Components (cont.)
FlowFile Repository
• The FlowFile Repository is where NiFi keeps track of the state of what it knows about a given FlowFile that is
presently active in the flow.
• The default approach is a persistent Write-Ahead Log that lives on a specified disk partition.
Content Repository
• The Content Repository is where the actual content bytes of a given FlowFile live.
• The default approach stores blocks of data in the file system.
• More than one file system storage location can be specified so as to get different physical partitions engaged
to reduce contention on any single volume.
Provenance Repository
• The Provenance Repository is where all provenance event data is stored.
• The repository construct is pluggable with the default implementation being to use one or more physical
disk volumes.
• Within each location event data is indexed and searchable.
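As a rough illustration of how these repositories map onto disk, the sketch below reads the repository directories out of nifi.properties. The property keys are the standard names found in a NiFi 1.x install, and the file path is an assumption; additional content repository locations can be configured with further nifi.content.repository.directory.<name> keys.

```java
// Illustrative sketch: where the three repositories live on disk.
// Property keys are the standard ones in conf/nifi.properties of a NiFi 1.x
// install (verify against your version); the path to the file is an assumption.
import java.io.FileReader;
import java.util.List;
import java.util.Properties;

public class ShowRepositoryLocations {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        try (FileReader reader = new FileReader("/opt/nifi/conf/nifi.properties")) {
            props.load(reader);
        }
        // FlowFile repo: write-ahead log of FlowFile state. Content repo: the bytes.
        // Provenance repo: indexed event history. Each can sit on its own partition.
        List<String> keys = List.of(
                "nifi.flowfile.repository.directory",
                "nifi.content.repository.directory.default",
                "nifi.provenance.repository.directory.default");
        for (String key : keys) {
            System.out.println(key + " = " + props.getProperty(key));
        }
    }
}
```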
NiFi Cluster
Starting with the NiFi 1.x/HDF-2.x release, a Zero-Master Clustering paradigm is employed.
NiFi Cluster Coordinator:
• A Cluster Coordinator is the node in a NiFi cluster that is responsible for managing the nodes in the cluster.
• Determines which nodes are allowed in the cluster.
• Provides the most up-to-date flow to newly joining nodes.
Nodes:
• Each cluster is made up of one or more nodes. The nodes do the actual data processing.
Primary Node:
• Every cluster has one Primary Node. On this node, it is possible to run "Isolated Processors" (see below).
ZooKeeper Server:
• Used to automatically elect the Primary Node and Cluster Coordinator.
We will learn about NiFi clustering in detail in the following lessons.
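A quick way to check cluster state from outside the UI is the REST API's cluster summary. The sketch below assumes an unsecured node at http://localhost:8080 and the /nifi-api/flow/cluster/summary endpoint of the NiFi 1.x REST API; confirm both against your installation.

```java
// Minimal cluster-status check, assuming an unsecured NiFi 1.x node at
// http://localhost:8080. The /nifi-api/flow/cluster/summary endpoint returns a
// small JSON summary of the cluster; verify the path against your version's
// REST API documentation.
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class NifiClusterSummary {
    public static void main(String[] args) throws Exception {
        HttpClient client = HttpClient.newHttpClient();
        HttpRequest request = HttpRequest.newBuilder()
                .uri(URI.create("http://localhost:8080/nifi-api/flow/cluster/summary"))
                .GET()
                .build();
        HttpResponse<String> response = client.send(request, HttpResponse.BodyHandlers.ofString());
        // Typical fields include connected/total node counts and whether clustering is enabled.
        System.out.println(response.body());
    }
}
```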
NiFi - User Interface
• Drag and drop processors to build a flow
• Start, stop, and configure components in real time
• View errors and corresponding error messages
• View statistics and health of data flow
• Create templates of common processors & connections
NiFi - Provenance
• Tracks data at each point as it flows
through the system
• Records, indexes, and makes
events available for display
• Handles fan-in/fan-out, i.e. merging
and splitting data
• View attributes and content at given
points in time
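Most provenance events (ROUTE, CLONE, DROP, and so on) are emitted automatically by the framework, but a processor that exchanges data with an external system can record the transit URI explicitly. The sketch below is a hypothetical processor that creates a FlowFile and records a RECEIVE event; the class name, URI, and payload are illustrative.

```java
// Hypothetical processor sketch (names, URI, and payload are illustrative) showing
// the provenance side of a flow: when a processor brings data in from an external
// system, it can record a RECEIVE event with the transit URI. The Provenance
// Repository indexes that event so the FlowFile's lineage can later be searched,
// displayed, and replayed.
import java.nio.charset.StandardCharsets;
import java.util.Set;

import org.apache.nifi.flowfile.FlowFile;
import org.apache.nifi.processor.AbstractProcessor;
import org.apache.nifi.processor.ProcessContext;
import org.apache.nifi.processor.ProcessSession;
import org.apache.nifi.processor.Relationship;
import org.apache.nifi.processor.io.OutputStreamCallback;

public class ReceiveOrderFeed extends AbstractProcessor {

    static final Relationship REL_SUCCESS = new Relationship.Builder()
            .name("success")
            .description("Records received from the external feed")
            .build();

    @Override
    public Set<Relationship> getRelationships() {
        return Set.of(REL_SUCCESS);
    }

    @Override
    public void onTrigger(ProcessContext context, ProcessSession session) {
        // In a real processor this payload would come from the external system.
        final byte[] payload = "{\"order\": 42}".getBytes(StandardCharsets.UTF_8);

        FlowFile flowFile = session.create();
        flowFile = session.write(flowFile, (OutputStreamCallback) out -> out.write(payload));

        // Explicit provenance event: this FlowFile's content came from that URI.
        session.getProvenanceReporter().receive(flowFile, "https://example.com/orders/feed");
        session.transfer(flowFile, REL_SUCCESS);
    }
}
```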
NiFi - Queue Prioritization
• Configure a prioritizer per connection
• Determine what is important for your
data – time based, arrival order,
importance of a data set
• Funnel many connections down to a
single connection to prioritize across
data sets
• Develop your own prioritizer if needed
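The "develop your own prioritizer" bullet boils down to implementing org.apache.nifi.flowfile.FlowFilePrioritizer, which is simply a Comparator over FlowFiles. The sketch below is a hypothetical prioritizer that orders a connection's queue by a numeric "priority" attribute, lowest value first, with arrival order as the tie-breaker; to use it, it still has to be packaged in a NAR and selected on the connection.

```java
// Hypothetical prioritizer: orders a queue by a numeric "priority" attribute
// (lowest value first), falling back to arrival order. The attribute name is an
// assumption for illustration.
import org.apache.nifi.flowfile.FlowFile;
import org.apache.nifi.flowfile.FlowFilePrioritizer;

public class NumericPriorityAttributePrioritizer implements FlowFilePrioritizer {

    private static long priorityOf(FlowFile flowFile) {
        final String value = flowFile.getAttribute("priority");
        try {
            return value == null ? Long.MAX_VALUE : Long.parseLong(value);
        } catch (NumberFormatException e) {
            return Long.MAX_VALUE; // unparseable priorities sort last
        }
    }

    @Override
    public int compare(FlowFile a, FlowFile b) {
        int byPriority = Long.compare(priorityOf(a), priorityOf(b));
        if (byPriority != 0) {
            return byPriority;
        }
        // Tie-break on lineage start date so the ordering stays consistent.
        return Long.compare(a.getLineageStartDate(), b.getLineageStartDate());
    }
}
```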
NiFi - Extensibility
Built from the ground up with extensions in mind
Service-loader pattern for…
• Processors
• Controller Services
• Reporting Tasks
• Prioritizers
Extensions packaged as NiFi Archives (NARs)
• Deploy to the NiFi lib directory and restart
• Provides ClassLoader isolation
• Same model as standard components
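As a sketch of a second extension point named above, here is a hypothetical Controller Service built on the public nifi-api. The interface and class names are made up; the comments describe the service-loader registration and NAR deployment this slide refers to.

```java
// Hypothetical Controller Service (names are made up). The API interface is what
// processors reference via a property; the implementation extends
// AbstractControllerService from the public nifi-api. For NiFi to discover it,
// the implementation class is listed in
// META-INF/services/org.apache.nifi.controller.ControllerService inside the NAR,
// and the NAR is deployed to NiFi's lib directory.
import org.apache.nifi.annotation.documentation.CapabilityDescription;
import org.apache.nifi.annotation.documentation.Tags;
import org.apache.nifi.controller.AbstractControllerService;
import org.apache.nifi.controller.ControllerService;

// Shared service API that processors can depend on.
interface GreetingService extends ControllerService {
    String greet(String name);
}

@Tags({"example", "greeting"})
@CapabilityDescription("Returns a greeting; exists only to illustrate the extension model.")
public class StandardGreetingService extends AbstractControllerService implements GreetingService {

    @Override
    public String greet(String name) {
        return "Hello, " + name + "!";
    }
}
```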
NiFi - Security
Administration
Central management and consistent
security
• Automatic NiFi Cluster Coordinator and Primary Node election with ZooKeeper
• Multiple entry points
Authentication
Authenticate users and systems
• 2-Way SSL support out of the box; LDAP Integration; Kerberos Integration
Authorization
Provision access to data
• Multitenant Authorization
• File-based authority provider – Global and Component level Access policies
• Ranger Based Authority Provider
Audit
Maintain a record of data access
• Detailed logging of all user actions
• Detailed logging of key system behaviors
• Data Provenance enables unparalleled tracking from the edge through the Lake
Data Protection
Protect data at rest and in motion
• Support a variety of SSL/encrypted protocols
• Tag and utilize tags on data for fine grained access controls
• Encrypt/decrypt content using pre-shared key mechanisms
• Encrypted Passwords in Configuration Files
Initial Admin: manually designate the initial admin user granted access to the UI
Legacy Authorized Users: convert previously configured users and roles to the multi-tenant model
Cluster Node Identities: secure identities for each node
Thank you!
Visit us:
www.cetax.com.br
We are hiring!
Mais conteúdo relacionado

Mais procurados

4 Essential Steps for Managing Sensitive Data
4 Essential Steps for Managing Sensitive Data4 Essential Steps for Managing Sensitive Data
4 Essential Steps for Managing Sensitive DataHortonworks
 
Bridging the gap: achieving fast data synchronization from SAP HANA by levera...
Bridging the gap: achieving fast data synchronization from SAP HANA by levera...Bridging the gap: achieving fast data synchronization from SAP HANA by levera...
Bridging the gap: achieving fast data synchronization from SAP HANA by levera...DataWorks Summit
 
Acquisition of Seismic, Hydroacoustic, and Infrasonic Data with Apache NiFi a...
Acquisition of Seismic, Hydroacoustic, and Infrasonic Data with Apache NiFi a...Acquisition of Seismic, Hydroacoustic, and Infrasonic Data with Apache NiFi a...
Acquisition of Seismic, Hydroacoustic, and Infrasonic Data with Apache NiFi a...DataWorks Summit
 
Apache Hadoop YARN: State of the Union
Apache Hadoop YARN: State of the UnionApache Hadoop YARN: State of the Union
Apache Hadoop YARN: State of the UnionDataWorks Summit
 
Breaking the Silos: Storage for Analytics & AI
Breaking the Silos: Storage for Analytics & AIBreaking the Silos: Storage for Analytics & AI
Breaking the Silos: Storage for Analytics & AIDataWorks Summit
 
Using Spark Streaming and NiFi for the Next Generation of ETL in the Enterprise
Using Spark Streaming and NiFi for the Next Generation of ETL in the EnterpriseUsing Spark Streaming and NiFi for the Next Generation of ETL in the Enterprise
Using Spark Streaming and NiFi for the Next Generation of ETL in the EnterpriseDataWorks Summit
 
Worldwide Scalable and Resilient Messaging Services by CQRS and Event Sourcin...
Worldwide Scalable and Resilient Messaging Services by CQRS and Event Sourcin...Worldwide Scalable and Resilient Messaging Services by CQRS and Event Sourcin...
Worldwide Scalable and Resilient Messaging Services by CQRS and Event Sourcin...DataWorks Summit
 
Intelligently Collecting Data at the Edge – Intro to Apache MiNiFi
Intelligently Collecting Data at the Edge – Intro to Apache MiNiFiIntelligently Collecting Data at the Edge – Intro to Apache MiNiFi
Intelligently Collecting Data at the Edge – Intro to Apache MiNiFiDataWorks Summit
 
Introduction to Apache NiFi 1.10
Introduction to Apache NiFi 1.10Introduction to Apache NiFi 1.10
Introduction to Apache NiFi 1.10Timothy Spann
 
Journey to the Data Lake: How Progressive Paved a Faster, Smoother Path to In...
Journey to the Data Lake: How Progressive Paved a Faster, Smoother Path to In...Journey to the Data Lake: How Progressive Paved a Faster, Smoother Path to In...
Journey to the Data Lake: How Progressive Paved a Faster, Smoother Path to In...DataWorks Summit
 
Security Updates: More Seamless Access Controls with Apache Spark and Apache ...
Security Updates: More Seamless Access Controls with Apache Spark and Apache ...Security Updates: More Seamless Access Controls with Apache Spark and Apache ...
Security Updates: More Seamless Access Controls with Apache Spark and Apache ...DataWorks Summit
 
Operating a secure big data platform in a multi-cloud environment
Operating a secure big data platform in a multi-cloud environmentOperating a secure big data platform in a multi-cloud environment
Operating a secure big data platform in a multi-cloud environmentDataWorks Summit
 
How to Achieve a Self-Service and Secure Multitenant Data Lake in a Large Com...
How to Achieve a Self-Service and Secure Multitenant Data Lake in a Large Com...How to Achieve a Self-Service and Secure Multitenant Data Lake in a Large Com...
How to Achieve a Self-Service and Secure Multitenant Data Lake in a Large Com...DataWorks Summit
 
Hitachi solution-profile-advanced-project-version-management-in-schlumberger-...
Hitachi solution-profile-advanced-project-version-management-in-schlumberger-...Hitachi solution-profile-advanced-project-version-management-in-schlumberger-...
Hitachi solution-profile-advanced-project-version-management-in-schlumberger-...Hitachi Vantara
 
Securing Enterprise Healthcare Big Data by the Combination of Knox/F5, Ranger...
Securing Enterprise Healthcare Big Data by the Combination of Knox/F5, Ranger...Securing Enterprise Healthcare Big Data by the Combination of Knox/F5, Ranger...
Securing Enterprise Healthcare Big Data by the Combination of Knox/F5, Ranger...DataWorks Summit
 
Integrating and Analyzing Data from Multiple Manufacturing Sites using Apache...
Integrating and Analyzing Data from Multiple Manufacturing Sites using Apache...Integrating and Analyzing Data from Multiple Manufacturing Sites using Apache...
Integrating and Analyzing Data from Multiple Manufacturing Sites using Apache...DataWorks Summit
 
Ultralight Data Movement for IoT with SDC Edge
Ultralight Data Movement for IoT with SDC EdgeUltralight Data Movement for IoT with SDC Edge
Ultralight Data Movement for IoT with SDC EdgeDataWorks Summit
 
Securing Data in Hybrid on-premise and Cloud Environments Using Apache Ranger
Securing Data in Hybrid on-premise and Cloud Environments Using Apache RangerSecuring Data in Hybrid on-premise and Cloud Environments Using Apache Ranger
Securing Data in Hybrid on-premise and Cloud Environments Using Apache RangerDataWorks Summit
 

Mais procurados (20)

4 Essential Steps for Managing Sensitive Data
4 Essential Steps for Managing Sensitive Data4 Essential Steps for Managing Sensitive Data
4 Essential Steps for Managing Sensitive Data
 
Bridging the gap: achieving fast data synchronization from SAP HANA by levera...
Bridging the gap: achieving fast data synchronization from SAP HANA by levera...Bridging the gap: achieving fast data synchronization from SAP HANA by levera...
Bridging the gap: achieving fast data synchronization from SAP HANA by levera...
 
Acquisition of Seismic, Hydroacoustic, and Infrasonic Data with Apache NiFi a...
Acquisition of Seismic, Hydroacoustic, and Infrasonic Data with Apache NiFi a...Acquisition of Seismic, Hydroacoustic, and Infrasonic Data with Apache NiFi a...
Acquisition of Seismic, Hydroacoustic, and Infrasonic Data with Apache NiFi a...
 
Apache Hadoop YARN: State of the Union
Apache Hadoop YARN: State of the UnionApache Hadoop YARN: State of the Union
Apache Hadoop YARN: State of the Union
 
Data platform evolution
Data platform evolutionData platform evolution
Data platform evolution
 
Breaking the Silos: Storage for Analytics & AI
Breaking the Silos: Storage for Analytics & AIBreaking the Silos: Storage for Analytics & AI
Breaking the Silos: Storage for Analytics & AI
 
Using Spark Streaming and NiFi for the Next Generation of ETL in the Enterprise
Using Spark Streaming and NiFi for the Next Generation of ETL in the EnterpriseUsing Spark Streaming and NiFi for the Next Generation of ETL in the Enterprise
Using Spark Streaming and NiFi for the Next Generation of ETL in the Enterprise
 
Worldwide Scalable and Resilient Messaging Services by CQRS and Event Sourcin...
Worldwide Scalable and Resilient Messaging Services by CQRS and Event Sourcin...Worldwide Scalable and Resilient Messaging Services by CQRS and Event Sourcin...
Worldwide Scalable and Resilient Messaging Services by CQRS and Event Sourcin...
 
Intelligently Collecting Data at the Edge – Intro to Apache MiNiFi
Intelligently Collecting Data at the Edge – Intro to Apache MiNiFiIntelligently Collecting Data at the Edge – Intro to Apache MiNiFi
Intelligently Collecting Data at the Edge – Intro to Apache MiNiFi
 
Introduction to Apache NiFi 1.10
Introduction to Apache NiFi 1.10Introduction to Apache NiFi 1.10
Introduction to Apache NiFi 1.10
 
Journey to the Data Lake: How Progressive Paved a Faster, Smoother Path to In...
Journey to the Data Lake: How Progressive Paved a Faster, Smoother Path to In...Journey to the Data Lake: How Progressive Paved a Faster, Smoother Path to In...
Journey to the Data Lake: How Progressive Paved a Faster, Smoother Path to In...
 
Security Updates: More Seamless Access Controls with Apache Spark and Apache ...
Security Updates: More Seamless Access Controls with Apache Spark and Apache ...Security Updates: More Seamless Access Controls with Apache Spark and Apache ...
Security Updates: More Seamless Access Controls with Apache Spark and Apache ...
 
Operating a secure big data platform in a multi-cloud environment
Operating a secure big data platform in a multi-cloud environmentOperating a secure big data platform in a multi-cloud environment
Operating a secure big data platform in a multi-cloud environment
 
How to Achieve a Self-Service and Secure Multitenant Data Lake in a Large Com...
How to Achieve a Self-Service and Secure Multitenant Data Lake in a Large Com...How to Achieve a Self-Service and Secure Multitenant Data Lake in a Large Com...
How to Achieve a Self-Service and Secure Multitenant Data Lake in a Large Com...
 
Hitachi solution-profile-advanced-project-version-management-in-schlumberger-...
Hitachi solution-profile-advanced-project-version-management-in-schlumberger-...Hitachi solution-profile-advanced-project-version-management-in-schlumberger-...
Hitachi solution-profile-advanced-project-version-management-in-schlumberger-...
 
Securing Enterprise Healthcare Big Data by the Combination of Knox/F5, Ranger...
Securing Enterprise Healthcare Big Data by the Combination of Knox/F5, Ranger...Securing Enterprise Healthcare Big Data by the Combination of Knox/F5, Ranger...
Securing Enterprise Healthcare Big Data by the Combination of Knox/F5, Ranger...
 
Integrating and Analyzing Data from Multiple Manufacturing Sites using Apache...
Integrating and Analyzing Data from Multiple Manufacturing Sites using Apache...Integrating and Analyzing Data from Multiple Manufacturing Sites using Apache...
Integrating and Analyzing Data from Multiple Manufacturing Sites using Apache...
 
Ultralight Data Movement for IoT with SDC Edge
Ultralight Data Movement for IoT with SDC EdgeUltralight Data Movement for IoT with SDC Edge
Ultralight Data Movement for IoT with SDC Edge
 
Securing Data in Hybrid on-premise and Cloud Environments Using Apache Ranger
Securing Data in Hybrid on-premise and Cloud Environments Using Apache RangerSecuring Data in Hybrid on-premise and Cloud Environments Using Apache Ranger
Securing Data in Hybrid on-premise and Cloud Environments Using Apache Ranger
 
Is Your Data Secure
Is Your Data SecureIs Your Data Secure
Is Your Data Secure
 

Semelhante a Integração de Dados com Apache NIFI - Marco Garcia Cetax

Big Data Day LA 2016/ Big Data Track - Building scalable enterprise data flow...
Big Data Day LA 2016/ Big Data Track - Building scalable enterprise data flow...Big Data Day LA 2016/ Big Data Track - Building scalable enterprise data flow...
Big Data Day LA 2016/ Big Data Track - Building scalable enterprise data flow...Data Con LA
 
Apache NiFi - Flow Based Programming Meetup
Apache NiFi - Flow Based Programming MeetupApache NiFi - Flow Based Programming Meetup
Apache NiFi - Flow Based Programming MeetupJoseph Witt
 
HDF: Hortonworks DataFlow: Technical Workshop
HDF: Hortonworks DataFlow: Technical WorkshopHDF: Hortonworks DataFlow: Technical Workshop
HDF: Hortonworks DataFlow: Technical WorkshopHortonworks
 
Real-Time Data Flows with Apache NiFi
Real-Time Data Flows with Apache NiFiReal-Time Data Flows with Apache NiFi
Real-Time Data Flows with Apache NiFiManish Gupta
 
Lightning Fast Analytics with Hive LLAP and Druid
Lightning Fast Analytics with Hive LLAP and DruidLightning Fast Analytics with Hive LLAP and Druid
Lightning Fast Analytics with Hive LLAP and DruidDataWorks Summit
 
Apache NiFi Toronto Meetup
Apache NiFi Toronto MeetupApache NiFi Toronto Meetup
Apache NiFi Toronto MeetupHortonworks
 
BigData Techcon - Beyond Messaging with Apache NiFi
BigData Techcon - Beyond Messaging with Apache NiFiBigData Techcon - Beyond Messaging with Apache NiFi
BigData Techcon - Beyond Messaging with Apache NiFiAldrin Piri
 
Beyond Messaging Enterprise Dataflow powered by Apache NiFi
Beyond Messaging Enterprise Dataflow powered by Apache NiFiBeyond Messaging Enterprise Dataflow powered by Apache NiFi
Beyond Messaging Enterprise Dataflow powered by Apache NiFiIsheeta Sanghi
 
Automate your data flows with Apache NIFI
Automate your data flows with Apache NIFIAutomate your data flows with Apache NIFI
Automate your data flows with Apache NIFIAdam Doyle
 
Wasp2 - IoT and Streaming Platform
Wasp2 - IoT and Streaming PlatformWasp2 - IoT and Streaming Platform
Wasp2 - IoT and Streaming PlatformPaolo Platter
 
Yaron Haviv, Iguaz.io - OpenStack and BigData - OpenStack Israel 2015
Yaron Haviv, Iguaz.io - OpenStack and BigData - OpenStack Israel 2015Yaron Haviv, Iguaz.io - OpenStack and BigData - OpenStack Israel 2015
Yaron Haviv, Iguaz.io - OpenStack and BigData - OpenStack Israel 2015Cloud Native Day Tel Aviv
 
Future of Data New Jersey - HDF 3.0 Deep Dive
Future of Data New Jersey - HDF 3.0 Deep DiveFuture of Data New Jersey - HDF 3.0 Deep Dive
Future of Data New Jersey - HDF 3.0 Deep DiveAldrin Piri
 
Using the Power of Big SQL 3.0 to Build a Big Data-Ready Hybrid Warehouse
Using the Power of Big SQL 3.0 to Build a Big Data-Ready Hybrid WarehouseUsing the Power of Big SQL 3.0 to Build a Big Data-Ready Hybrid Warehouse
Using the Power of Big SQL 3.0 to Build a Big Data-Ready Hybrid WarehouseRizaldy Ignacio
 
Data lake-itweekend-sharif university-vahid amiry
Data lake-itweekend-sharif university-vahid amiryData lake-itweekend-sharif university-vahid amiry
Data lake-itweekend-sharif university-vahid amirydatastack
 
xGem Data Stream Processing
xGem Data Stream ProcessingxGem Data Stream Processing
xGem Data Stream ProcessingJorge Hirtz
 
Using Apache NiFi with Apache Pulsar for Fast Data On-Ramp
Using Apache NiFi with Apache Pulsar for Fast Data On-RampUsing Apache NiFi with Apache Pulsar for Fast Data On-Ramp
Using Apache NiFi with Apache Pulsar for Fast Data On-RampTimothy Spann
 
How the Development Bank of Singapore solves on-prem compute capacity challen...
How the Development Bank of Singapore solves on-prem compute capacity challen...How the Development Bank of Singapore solves on-prem compute capacity challen...
How the Development Bank of Singapore solves on-prem compute capacity challen...Alluxio, Inc.
 
Music city data Hail Hydrate! from stream to lake
Music city data Hail Hydrate! from stream to lakeMusic city data Hail Hydrate! from stream to lake
Music city data Hail Hydrate! from stream to lakeTimothy Spann
 

Semelhante a Integração de Dados com Apache NIFI - Marco Garcia Cetax (20)

Big Data Day LA 2016/ Big Data Track - Building scalable enterprise data flow...
Big Data Day LA 2016/ Big Data Track - Building scalable enterprise data flow...Big Data Day LA 2016/ Big Data Track - Building scalable enterprise data flow...
Big Data Day LA 2016/ Big Data Track - Building scalable enterprise data flow...
 
Apache NiFi - Flow Based Programming Meetup
Apache NiFi - Flow Based Programming MeetupApache NiFi - Flow Based Programming Meetup
Apache NiFi - Flow Based Programming Meetup
 
HDF: Hortonworks DataFlow: Technical Workshop
HDF: Hortonworks DataFlow: Technical WorkshopHDF: Hortonworks DataFlow: Technical Workshop
HDF: Hortonworks DataFlow: Technical Workshop
 
Real-Time Data Flows with Apache NiFi
Real-Time Data Flows with Apache NiFiReal-Time Data Flows with Apache NiFi
Real-Time Data Flows with Apache NiFi
 
Nifi
NifiNifi
Nifi
 
Lightning Fast Analytics with Hive LLAP and Druid
Lightning Fast Analytics with Hive LLAP and DruidLightning Fast Analytics with Hive LLAP and Druid
Lightning Fast Analytics with Hive LLAP and Druid
 
Apache NiFi Toronto Meetup
Apache NiFi Toronto MeetupApache NiFi Toronto Meetup
Apache NiFi Toronto Meetup
 
BigData Techcon - Beyond Messaging with Apache NiFi
BigData Techcon - Beyond Messaging with Apache NiFiBigData Techcon - Beyond Messaging with Apache NiFi
BigData Techcon - Beyond Messaging with Apache NiFi
 
Beyond Messaging Enterprise Dataflow powered by Apache NiFi
Beyond Messaging Enterprise Dataflow powered by Apache NiFiBeyond Messaging Enterprise Dataflow powered by Apache NiFi
Beyond Messaging Enterprise Dataflow powered by Apache NiFi
 
Automate your data flows with Apache NIFI
Automate your data flows with Apache NIFIAutomate your data flows with Apache NIFI
Automate your data flows with Apache NIFI
 
IBM Aspera overview
IBM Aspera overview IBM Aspera overview
IBM Aspera overview
 
Wasp2 - IoT and Streaming Platform
Wasp2 - IoT and Streaming PlatformWasp2 - IoT and Streaming Platform
Wasp2 - IoT and Streaming Platform
 
Yaron Haviv, Iguaz.io - OpenStack and BigData - OpenStack Israel 2015
Yaron Haviv, Iguaz.io - OpenStack and BigData - OpenStack Israel 2015Yaron Haviv, Iguaz.io - OpenStack and BigData - OpenStack Israel 2015
Yaron Haviv, Iguaz.io - OpenStack and BigData - OpenStack Israel 2015
 
Future of Data New Jersey - HDF 3.0 Deep Dive
Future of Data New Jersey - HDF 3.0 Deep DiveFuture of Data New Jersey - HDF 3.0 Deep Dive
Future of Data New Jersey - HDF 3.0 Deep Dive
 
Using the Power of Big SQL 3.0 to Build a Big Data-Ready Hybrid Warehouse
Using the Power of Big SQL 3.0 to Build a Big Data-Ready Hybrid WarehouseUsing the Power of Big SQL 3.0 to Build a Big Data-Ready Hybrid Warehouse
Using the Power of Big SQL 3.0 to Build a Big Data-Ready Hybrid Warehouse
 
Data lake-itweekend-sharif university-vahid amiry
Data lake-itweekend-sharif university-vahid amiryData lake-itweekend-sharif university-vahid amiry
Data lake-itweekend-sharif university-vahid amiry
 
xGem Data Stream Processing
xGem Data Stream ProcessingxGem Data Stream Processing
xGem Data Stream Processing
 
Using Apache NiFi with Apache Pulsar for Fast Data On-Ramp
Using Apache NiFi with Apache Pulsar for Fast Data On-RampUsing Apache NiFi with Apache Pulsar for Fast Data On-Ramp
Using Apache NiFi with Apache Pulsar for Fast Data On-Ramp
 
How the Development Bank of Singapore solves on-prem compute capacity challen...
How the Development Bank of Singapore solves on-prem compute capacity challen...How the Development Bank of Singapore solves on-prem compute capacity challen...
How the Development Bank of Singapore solves on-prem compute capacity challen...
 
Music city data Hail Hydrate! from stream to lake
Music city data Hail Hydrate! from stream to lakeMusic city data Hail Hydrate! from stream to lake
Music city data Hail Hydrate! from stream to lake
 

Mais de Marco Garcia

Webinar Carreiras de Dados
Webinar Carreiras de DadosWebinar Carreiras de Dados
Webinar Carreiras de DadosMarco Garcia
 
Cases Big Data Aplicados a logística
Cases Big Data Aplicados a logísticaCases Big Data Aplicados a logística
Cases Big Data Aplicados a logísticaMarco Garcia
 
Trabalhos Big Data e Algoritmos - Mercado Financeiro
Trabalhos Big Data e Algoritmos - Mercado FinanceiroTrabalhos Big Data e Algoritmos - Mercado Financeiro
Trabalhos Big Data e Algoritmos - Mercado FinanceiroMarco Garcia
 
Webinar carreiras dados
Webinar carreiras dadosWebinar carreiras dados
Webinar carreiras dadosMarco Garcia
 
CASES Cetax de Inteligência em Saúde - Dados e Algorítmos
CASES Cetax de Inteligência em Saúde - Dados e AlgorítmosCASES Cetax de Inteligência em Saúde - Dados e Algorítmos
CASES Cetax de Inteligência em Saúde - Dados e AlgorítmosMarco Garcia
 
Using Data To Tranform Your Business - Marketing Business
Using Data To Tranform Your Business - Marketing BusinessUsing Data To Tranform Your Business - Marketing Business
Using Data To Tranform Your Business - Marketing BusinessMarco Garcia
 
Workshop BigData, Hadoop e Data Science - Cetax x Deal
Workshop BigData, Hadoop e Data Science - Cetax x DealWorkshop BigData, Hadoop e Data Science - Cetax x Deal
Workshop BigData, Hadoop e Data Science - Cetax x DealMarco Garcia
 
Construindo Data Lakes - Visão Prática com Hadoop e BigData
Construindo Data Lakes - Visão Prática com Hadoop e BigDataConstruindo Data Lakes - Visão Prática com Hadoop e BigData
Construindo Data Lakes - Visão Prática com Hadoop e BigDataMarco Garcia
 
Carreiras em Business Intelligence e Big Data
Carreiras em Business Intelligence e Big DataCarreiras em Business Intelligence e Big Data
Carreiras em Business Intelligence e Big DataMarco Garcia
 
Big Data - Artigo, Conceito, o Que é
Big Data - Artigo, Conceito, o Que é Big Data - Artigo, Conceito, o Que é
Big Data - Artigo, Conceito, o Que é Marco Garcia
 
Palestra Business Intelligence
Palestra Business IntelligencePalestra Business Intelligence
Palestra Business IntelligenceMarco Garcia
 
O que é Business Intelligence (BI)
O que é Business Intelligence (BI)O que é Business Intelligence (BI)
O que é Business Intelligence (BI)Marco Garcia
 
Curso de Business Intelligence e Data Warehouse - Conceitos e Fundamentos
Curso de Business Intelligence e Data Warehouse - Conceitos e FundamentosCurso de Business Intelligence e Data Warehouse - Conceitos e Fundamentos
Curso de Business Intelligence e Data Warehouse - Conceitos e FundamentosMarco Garcia
 
Cursos de Data Warehouse
Cursos de Data WarehouseCursos de Data Warehouse
Cursos de Data WarehouseMarco Garcia
 
Business Intelligence - Palestra
Business Intelligence - PalestraBusiness Intelligence - Palestra
Business Intelligence - PalestraMarco Garcia
 
Modelagem Dimensional
Modelagem DimensionalModelagem Dimensional
Modelagem DimensionalMarco Garcia
 

Mais de Marco Garcia (17)

Webinar Carreiras de Dados
Webinar Carreiras de DadosWebinar Carreiras de Dados
Webinar Carreiras de Dados
 
Cases Big Data Aplicados a logística
Cases Big Data Aplicados a logísticaCases Big Data Aplicados a logística
Cases Big Data Aplicados a logística
 
Trabalhos Big Data e Algoritmos - Mercado Financeiro
Trabalhos Big Data e Algoritmos - Mercado FinanceiroTrabalhos Big Data e Algoritmos - Mercado Financeiro
Trabalhos Big Data e Algoritmos - Mercado Financeiro
 
Webinar carreiras dados
Webinar carreiras dadosWebinar carreiras dados
Webinar carreiras dados
 
CASES Cetax de Inteligência em Saúde - Dados e Algorítmos
CASES Cetax de Inteligência em Saúde - Dados e AlgorítmosCASES Cetax de Inteligência em Saúde - Dados e Algorítmos
CASES Cetax de Inteligência em Saúde - Dados e Algorítmos
 
Using Data To Tranform Your Business - Marketing Business
Using Data To Tranform Your Business - Marketing BusinessUsing Data To Tranform Your Business - Marketing Business
Using Data To Tranform Your Business - Marketing Business
 
Live - BigData
Live - BigDataLive - BigData
Live - BigData
 
Workshop BigData, Hadoop e Data Science - Cetax x Deal
Workshop BigData, Hadoop e Data Science - Cetax x DealWorkshop BigData, Hadoop e Data Science - Cetax x Deal
Workshop BigData, Hadoop e Data Science - Cetax x Deal
 
Construindo Data Lakes - Visão Prática com Hadoop e BigData
Construindo Data Lakes - Visão Prática com Hadoop e BigDataConstruindo Data Lakes - Visão Prática com Hadoop e BigData
Construindo Data Lakes - Visão Prática com Hadoop e BigData
 
Carreiras em Business Intelligence e Big Data
Carreiras em Business Intelligence e Big DataCarreiras em Business Intelligence e Big Data
Carreiras em Business Intelligence e Big Data
 
Big Data - Artigo, Conceito, o Que é
Big Data - Artigo, Conceito, o Que é Big Data - Artigo, Conceito, o Que é
Big Data - Artigo, Conceito, o Que é
 
Palestra Business Intelligence
Palestra Business IntelligencePalestra Business Intelligence
Palestra Business Intelligence
 
O que é Business Intelligence (BI)
O que é Business Intelligence (BI)O que é Business Intelligence (BI)
O que é Business Intelligence (BI)
 
Curso de Business Intelligence e Data Warehouse - Conceitos e Fundamentos
Curso de Business Intelligence e Data Warehouse - Conceitos e FundamentosCurso de Business Intelligence e Data Warehouse - Conceitos e Fundamentos
Curso de Business Intelligence e Data Warehouse - Conceitos e Fundamentos
 
Cursos de Data Warehouse
Cursos de Data WarehouseCursos de Data Warehouse
Cursos de Data Warehouse
 
Business Intelligence - Palestra
Business Intelligence - PalestraBusiness Intelligence - Palestra
Business Intelligence - Palestra
 
Modelagem Dimensional
Modelagem DimensionalModelagem Dimensional
Modelagem Dimensional
 

Último

如何办理英国诺森比亚大学毕业证(NU毕业证书)成绩单原件一模一样
如何办理英国诺森比亚大学毕业证(NU毕业证书)成绩单原件一模一样如何办理英国诺森比亚大学毕业证(NU毕业证书)成绩单原件一模一样
如何办理英国诺森比亚大学毕业证(NU毕业证书)成绩单原件一模一样wsppdmt
 
Top profile Call Girls In Begusarai [ 7014168258 ] Call Me For Genuine Models...
Top profile Call Girls In Begusarai [ 7014168258 ] Call Me For Genuine Models...Top profile Call Girls In Begusarai [ 7014168258 ] Call Me For Genuine Models...
Top profile Call Girls In Begusarai [ 7014168258 ] Call Me For Genuine Models...nirzagarg
 
Top profile Call Girls In Latur [ 7014168258 ] Call Me For Genuine Models We ...
Top profile Call Girls In Latur [ 7014168258 ] Call Me For Genuine Models We ...Top profile Call Girls In Latur [ 7014168258 ] Call Me For Genuine Models We ...
Top profile Call Girls In Latur [ 7014168258 ] Call Me For Genuine Models We ...gajnagarg
 
Jual obat aborsi Bandung ( 085657271886 ) Cytote pil telat bulan penggugur ka...
Jual obat aborsi Bandung ( 085657271886 ) Cytote pil telat bulan penggugur ka...Jual obat aborsi Bandung ( 085657271886 ) Cytote pil telat bulan penggugur ka...
Jual obat aborsi Bandung ( 085657271886 ) Cytote pil telat bulan penggugur ka...Klinik kandungan
 
Gartner's Data Analytics Maturity Model.pptx
Gartner's Data Analytics Maturity Model.pptxGartner's Data Analytics Maturity Model.pptx
Gartner's Data Analytics Maturity Model.pptxchadhar227
 
Vadodara 💋 Call Girl 7737669865 Call Girls in Vadodara Escort service book now
Vadodara 💋 Call Girl 7737669865 Call Girls in Vadodara Escort service book nowVadodara 💋 Call Girl 7737669865 Call Girls in Vadodara Escort service book now
Vadodara 💋 Call Girl 7737669865 Call Girls in Vadodara Escort service book nowgargpaaro
 
Gulbai Tekra * Cheap Call Girls In Ahmedabad Phone No 8005736733 Elite Escort...
Gulbai Tekra * Cheap Call Girls In Ahmedabad Phone No 8005736733 Elite Escort...Gulbai Tekra * Cheap Call Girls In Ahmedabad Phone No 8005736733 Elite Escort...
Gulbai Tekra * Cheap Call Girls In Ahmedabad Phone No 8005736733 Elite Escort...gragchanchal546
 
Dubai Call Girls Peeing O525547819 Call Girls Dubai
Dubai Call Girls Peeing O525547819 Call Girls DubaiDubai Call Girls Peeing O525547819 Call Girls Dubai
Dubai Call Girls Peeing O525547819 Call Girls Dubaikojalkojal131
 
Ranking and Scoring Exercises for Research
Ranking and Scoring Exercises for ResearchRanking and Scoring Exercises for Research
Ranking and Scoring Exercises for ResearchRajesh Mondal
 
Top profile Call Girls In Tumkur [ 7014168258 ] Call Me For Genuine Models We...
Top profile Call Girls In Tumkur [ 7014168258 ] Call Me For Genuine Models We...Top profile Call Girls In Tumkur [ 7014168258 ] Call Me For Genuine Models We...
Top profile Call Girls In Tumkur [ 7014168258 ] Call Me For Genuine Models We...nirzagarg
 
Lecture_2_Deep_Learning_Overview-newone1
Lecture_2_Deep_Learning_Overview-newone1Lecture_2_Deep_Learning_Overview-newone1
Lecture_2_Deep_Learning_Overview-newone1ranjankumarbehera14
 
Top profile Call Girls In bhavnagar [ 7014168258 ] Call Me For Genuine Models...
Top profile Call Girls In bhavnagar [ 7014168258 ] Call Me For Genuine Models...Top profile Call Girls In bhavnagar [ 7014168258 ] Call Me For Genuine Models...
Top profile Call Girls In bhavnagar [ 7014168258 ] Call Me For Genuine Models...gajnagarg
 
Gomti Nagar & best call girls in Lucknow | 9548273370 Independent Escorts & D...
Gomti Nagar & best call girls in Lucknow | 9548273370 Independent Escorts & D...Gomti Nagar & best call girls in Lucknow | 9548273370 Independent Escorts & D...
Gomti Nagar & best call girls in Lucknow | 9548273370 Independent Escorts & D...HyderabadDolls
 
Discover Why Less is More in B2B Research
Discover Why Less is More in B2B ResearchDiscover Why Less is More in B2B Research
Discover Why Less is More in B2B Researchmichael115558
 
Predicting HDB Resale Prices - Conducting Linear Regression Analysis With Orange
Predicting HDB Resale Prices - Conducting Linear Regression Analysis With OrangePredicting HDB Resale Prices - Conducting Linear Regression Analysis With Orange
Predicting HDB Resale Prices - Conducting Linear Regression Analysis With OrangeThinkInnovation
 
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...Valters Lauzums
 
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...ZurliaSoop
 
Top profile Call Girls In Vadodara [ 7014168258 ] Call Me For Genuine Models ...
Top profile Call Girls In Vadodara [ 7014168258 ] Call Me For Genuine Models ...Top profile Call Girls In Vadodara [ 7014168258 ] Call Me For Genuine Models ...
Top profile Call Girls In Vadodara [ 7014168258 ] Call Me For Genuine Models ...gajnagarg
 
+97470301568>>weed for sale in qatar ,weed for sale in dubai,weed for sale in...
+97470301568>>weed for sale in qatar ,weed for sale in dubai,weed for sale in...+97470301568>>weed for sale in qatar ,weed for sale in dubai,weed for sale in...
+97470301568>>weed for sale in qatar ,weed for sale in dubai,weed for sale in...Health
 
Top profile Call Girls In dimapur [ 7014168258 ] Call Me For Genuine Models W...
Top profile Call Girls In dimapur [ 7014168258 ] Call Me For Genuine Models W...Top profile Call Girls In dimapur [ 7014168258 ] Call Me For Genuine Models W...
Top profile Call Girls In dimapur [ 7014168258 ] Call Me For Genuine Models W...gajnagarg
 

Último (20)

如何办理英国诺森比亚大学毕业证(NU毕业证书)成绩单原件一模一样
如何办理英国诺森比亚大学毕业证(NU毕业证书)成绩单原件一模一样如何办理英国诺森比亚大学毕业证(NU毕业证书)成绩单原件一模一样
如何办理英国诺森比亚大学毕业证(NU毕业证书)成绩单原件一模一样
 
Top profile Call Girls In Begusarai [ 7014168258 ] Call Me For Genuine Models...
Top profile Call Girls In Begusarai [ 7014168258 ] Call Me For Genuine Models...Top profile Call Girls In Begusarai [ 7014168258 ] Call Me For Genuine Models...
Top profile Call Girls In Begusarai [ 7014168258 ] Call Me For Genuine Models...
 
Top profile Call Girls In Latur [ 7014168258 ] Call Me For Genuine Models We ...
Top profile Call Girls In Latur [ 7014168258 ] Call Me For Genuine Models We ...Top profile Call Girls In Latur [ 7014168258 ] Call Me For Genuine Models We ...
Top profile Call Girls In Latur [ 7014168258 ] Call Me For Genuine Models We ...
 
Jual obat aborsi Bandung ( 085657271886 ) Cytote pil telat bulan penggugur ka...
Jual obat aborsi Bandung ( 085657271886 ) Cytote pil telat bulan penggugur ka...Jual obat aborsi Bandung ( 085657271886 ) Cytote pil telat bulan penggugur ka...
Jual obat aborsi Bandung ( 085657271886 ) Cytote pil telat bulan penggugur ka...
 
Gartner's Data Analytics Maturity Model.pptx
Gartner's Data Analytics Maturity Model.pptxGartner's Data Analytics Maturity Model.pptx
Gartner's Data Analytics Maturity Model.pptx
 
Vadodara 💋 Call Girl 7737669865 Call Girls in Vadodara Escort service book now
Vadodara 💋 Call Girl 7737669865 Call Girls in Vadodara Escort service book nowVadodara 💋 Call Girl 7737669865 Call Girls in Vadodara Escort service book now
Vadodara 💋 Call Girl 7737669865 Call Girls in Vadodara Escort service book now
 
Gulbai Tekra * Cheap Call Girls In Ahmedabad Phone No 8005736733 Elite Escort...
Gulbai Tekra * Cheap Call Girls In Ahmedabad Phone No 8005736733 Elite Escort...Gulbai Tekra * Cheap Call Girls In Ahmedabad Phone No 8005736733 Elite Escort...
Gulbai Tekra * Cheap Call Girls In Ahmedabad Phone No 8005736733 Elite Escort...
 
Dubai Call Girls Peeing O525547819 Call Girls Dubai
Dubai Call Girls Peeing O525547819 Call Girls DubaiDubai Call Girls Peeing O525547819 Call Girls Dubai
Dubai Call Girls Peeing O525547819 Call Girls Dubai
 
Ranking and Scoring Exercises for Research
Ranking and Scoring Exercises for ResearchRanking and Scoring Exercises for Research
Ranking and Scoring Exercises for Research
 
Top profile Call Girls In Tumkur [ 7014168258 ] Call Me For Genuine Models We...
Top profile Call Girls In Tumkur [ 7014168258 ] Call Me For Genuine Models We...Top profile Call Girls In Tumkur [ 7014168258 ] Call Me For Genuine Models We...
Top profile Call Girls In Tumkur [ 7014168258 ] Call Me For Genuine Models We...
 
Lecture_2_Deep_Learning_Overview-newone1
Lecture_2_Deep_Learning_Overview-newone1Lecture_2_Deep_Learning_Overview-newone1
Lecture_2_Deep_Learning_Overview-newone1
 
Top profile Call Girls In bhavnagar [ 7014168258 ] Call Me For Genuine Models...
Top profile Call Girls In bhavnagar [ 7014168258 ] Call Me For Genuine Models...Top profile Call Girls In bhavnagar [ 7014168258 ] Call Me For Genuine Models...
Top profile Call Girls In bhavnagar [ 7014168258 ] Call Me For Genuine Models...
 
Gomti Nagar & best call girls in Lucknow | 9548273370 Independent Escorts & D...
Gomti Nagar & best call girls in Lucknow | 9548273370 Independent Escorts & D...Gomti Nagar & best call girls in Lucknow | 9548273370 Independent Escorts & D...
Gomti Nagar & best call girls in Lucknow | 9548273370 Independent Escorts & D...
 
Discover Why Less is More in B2B Research
Discover Why Less is More in B2B ResearchDiscover Why Less is More in B2B Research
Discover Why Less is More in B2B Research
 
Predicting HDB Resale Prices - Conducting Linear Regression Analysis With Orange
Predicting HDB Resale Prices - Conducting Linear Regression Analysis With OrangePredicting HDB Resale Prices - Conducting Linear Regression Analysis With Orange
Predicting HDB Resale Prices - Conducting Linear Regression Analysis With Orange
 
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...
 
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
 
Top profile Call Girls In Vadodara [ 7014168258 ] Call Me For Genuine Models ...
Top profile Call Girls In Vadodara [ 7014168258 ] Call Me For Genuine Models ...Top profile Call Girls In Vadodara [ 7014168258 ] Call Me For Genuine Models ...
Top profile Call Girls In Vadodara [ 7014168258 ] Call Me For Genuine Models ...
 
+97470301568>>weed for sale in qatar ,weed for sale in dubai,weed for sale in...
+97470301568>>weed for sale in qatar ,weed for sale in dubai,weed for sale in...+97470301568>>weed for sale in qatar ,weed for sale in dubai,weed for sale in...
+97470301568>>weed for sale in qatar ,weed for sale in dubai,weed for sale in...
 
Top profile Call Girls In dimapur [ 7014168258 ] Call Me For Genuine Models W...
Top profile Call Girls In dimapur [ 7014168258 ] Call Me For Genuine Models W...Top profile Call Girls In dimapur [ 7014168258 ] Call Me For Genuine Models W...
Top profile Call Girls In dimapur [ 7014168258 ] Call Me For Genuine Models W...
 

Integração de Dados com Apache NIFI - Marco Garcia Cetax

  • 1. Integração de Dados com Apache Nifi Marco Garcia CTO, Founder – Cetax, TutorPro mgarcia@cetax.com.br https://www.linkedin.com/in/mgarciacetax/
  • 2. Com mais de 20 anos de experiência em TI, sendo 18 exclusivamente com Business Intelligence , Data Warehouse e Big Data, Marco Garcia é certificado pelo Kimball University, nos EUA, onde obteve aula pessoalmente com Ralph Kimball – um dos principais gurus do Data Warehouse. 1º Instrutor Certificado Hortonworks LATAM Arquiteto de Dados e Instrutor na Cetax Consultoria. 02 Apresentação
  • 4. • Remote sensor delivery (Internet of Things - IoT) • Intra-site / Inter-site / global distribution (Enterprise) • Ingest for feeding analytics (Big Data) • Data Processing (Simple Event Processing) Where do we find Data Flow?
  • 5. SimplisticViewofEnterpriseDataFlow The Data Flow Thing Process and Analyze Data Acquire Data Store Data
  • 6. Basics of Connecting Systems For every connection, these must agree: 1. Protocol 2. Format 3. Schema 4. Priority 5. Size of event 6. Frequency of event 7. Authorization access 8. Relevance P1 Producer C1 Consumer
  • 7. IoT is Driving New Requirements
  • 8. IoATDataGrowsFasterThanWeConsumeIt Much of the new data exists in-flight between systems and devices as part of the Internet of AnythingNEW TRADITIONAL The Opportunity Unlock transformational business value from a full fidelity of data and analytics for all data. Geolocation Server logs Files & emails ERP, CRM, SCM Traditional Data Sources Internet of Anything Sensors and machines Clickstream Web & social
  • 9. Internet of Anything is Driving New Requirements Need trusted insights from data at the very edge to the data lake in real- time with full-fidelity Data generated by sensors, machines, geo-location devices, logs, clickstreams, social feeds, etc. Modern applications need access to both data-in-motion and data-at-rest IoAT data flows are multi-directional and point-to-point Very different than existing ETL, data movement, and streaming technologies which are generally one direction The perimeter is outside the data center and can be very jagged This “Jagged Edge” creates new opportunity for security, data protection, data governance and provenance
  • 10. Meeting IoAT Edge Requirements GATHE R DELIVER PRIORITIZE Track from the edge Through to the datacenter Small Footprints operate with very little power Limited Bandwidth can create high latency Data Availability exceeds transmission bandwidth recoverability Data Must Be Secured throughout its journey both the data plane and control plane
  • 11. The Need for Data Provenance For Operators • Traceability, lineage • Recovery and replay For Compliance • Audit trail • Remediation For Business • Value sources • Value IT investment BEGIN END LINEAGE
  • 12. The Need for Fine-grained Security and Compliance It’s not enough to say you have encrypted communications • Enterprise authorization services –entitlements change often • People and systems with different roles require difference access levels • Tagged/classified data
  • 13. Real-time Data Flow It’s not just how quickly you move data – it’s about how quickly you can change behavior and seize new opportunities
  • 14. HDF Powered by Apache NiFi Addresses Modern Data Flow Challenges Aggregate all IoAT data from sensors, geo-location devices, machines, logs, files, and feeds via a highly secure lightweight agent Collect: Bring Together• Logs • Files • Feeds • Sensors Mediate point-to-point and bi-directional data flows, delivering data reliably to real-time applications and storage platforms such as HDP Conduct: Mediate the Data Flow• Deliver • Secure • Govern • Audit Parse, filter, join, transform, fork, and clone data in motion to empower analytics and perishable insights Curate: Gain Insights• Parse • Filter • Transform • Fork • Clone
  • 15. ApacheNifiManagesData-in-Motion Core InfrastructureSources  Constrained  High-latency  Localized context  Hybrid – cloud / on-premises  Low-latency  Global context Regional Infrastructure Apache NiFi, Apache MiNiFi, Apache Kafka, Apache Storm are trademarks of the Apache Software Foundation
  • 16. Developed by the NSA over the last 8 years. "NSA's innovators work on some of the most challenging national security problems imaginable," "Commercial enterprises could use it to quickly control, manage, and analyze the flow of information from geographically dispersed sites – creating comprehensive situational awareness" -- Linda L. Burger, Director of the NSA NiFi Developed by the National Security Agency
  • 17. November 2014 NiFi is donated to the Apache Software Foundation (ASF) through NSA’s Technology Transfer Program and enters ASF’s incubator. 2006 NiagaraFiles (NiFi) was first incepted at the National Security Agency (NSA) ABriefHistory July 2015 NiFi reaches ASF top-level project status
  • 18. Designed In Response to Real World Demands Visual User Interface Drag and drop for efficient, agile operations Immediate Feedback Start, stop, tune, replay dataflows in real-time Adaptive to Volume and Bandwidth Any data, big or small Provenance Metadata Governance, compliance & data evaluation Secure Data Acquisition & Transport Fine grained encryption for controlled data sharing HDF Powered by Apache NiFi
  • 19. Apache NiFi • Powerful and reliable system to process and distribute data. • Directed graphs of data routing and transformation. • Web-based User Interface for creating, monitoring, & controlling data flows • Highly configurable - modify data flow at runtime, dynamically prioritize data • Data Provenance tracks data through entire system • Easily extensible through development of custom components [1] https://nifi.apache.org/
  • 20. Nifi Use Cases Ingest Logs for Cyber Security: Integrated and secure log collection for real-time data analytics and threat detection Feed Data to Streaming Analytics: Accelerate big data ROI by streaming data into analytics systems such as Apache Storm or Apache Spark Streaming Data Warehouse Offload: Convert source data to streaming data and use HDF for data movement before delivering it for ETL processing. Enable ETL processing to be offloaded to Hadoop without having to change source systems. Move Data Internally: Optimize resource utilization by moving data between data centers or between on-premises infrastructure and cloud infrastructure Capture IoT Data: Transport disparate and often remote IoT data in real time, despite any limitations in device footprint, power or connectivity—avoiding data loss Big Data Ingest Easily and efficiently ingest data into Hadoop
  • 22. Apache NiFi: The three key concepts • Manage the flow of information • Data Provenance • Secure the control plane and data plane
  • 23. Apache NiFi – Key Features • Guaranteed delivery • Data buffering - Backpressure - Pressure release • Prioritized queuing • Flow specific QoS - Latency vs. throughput - Loss tolerance • Data provenance • Recovery/recording a rolling log of fine- grained history • Visual command and control • Flow templates • Multi-tenant Authorization • Designed for extension • Clustering
  • 24. FlowBasedProgramming(FBP) FBP Term NiFi Term Description Information Packet FlowFile Each object moving through the system. Black Box FlowFile Processor Performs the work, doing some combination of data routing, transformation, or mediation between systems. Bounded Buffer Connection The linkage between processors, acting as queues and allowing various processes to interact at differing rates. Scheduler Flow Controller Maintains the knowledge of how processes are connected, and manages the threads and allocations thereof which all processes use. Subnet Process Group A set of processes and their connections, which can receive and send data via ports. A process group allows creation of entirely new component simply by composition of its components.
  • 27. PrimaryComponents NiFi executes within a JVM living within a host operating system. The primary components of NiFi then living within the JVM are as follows: Web Server • The purpose of the web server is to host NiFi’s HTTP-based command and control API. Flow Controller • The flow controller is the brains of the operation. • It provides threads for extensions to run on and manages their schedule of when they’ll receive resources to execute. Extensions • There are various types of extensions for NiFi which will be described in other documents. • But the key point here is that extensions operate/execute within the JVM.
  • 28. PrimaryComponents(Cont..) FlowFile Repository • The FlowFile Repository is where NiFi keeps track of the state of what it knows about a given FlowFile that is presently active in the flow. • The default approach is a persistent Write-Ahead Log that lives on a specified disk partition. Content Repository • The Content Repository is where the actual content bytes of a given FlowFile live. • The default approach stores blocks of data in the file system. • More than one file system storage location can be specified so as to get different physical partitions engaged to reduce contention on any single volume. Provenance Repository • The Provenance Repository is where all provenance event data is stored. • The repository construct is pluggable with the default implementation being to use one or more physical disk volumes. • Within each location event data is indexed and searchable.
  • 29. NiFi Cluster – Starting with the NiFi 1.x/HDF 2.x release, a Zero-Master Clustering paradigm is employed. NiFi Cluster Coordinator: • The Cluster Coordinator is the node in a NiFi cluster that is responsible for managing the nodes in the cluster. • Determines which nodes are allowed in the cluster. • Provides the most up-to-date flow to newly joining nodes. Nodes: • Each cluster is made up of one or more nodes. The nodes do the actual data processing. Primary Node: • Every cluster has one Primary Node. On this node, it is possible to run "Isolated Processors" (see below). ZooKeeper Server: • Used to automatically elect the Primary Node and the Cluster Coordinator. We will learn about NiFi clustering in detail in the following lessons.
  • 30. NiFi - User Interface • Drag and drop processors to build a flow • Start, stop, and configure components in real time • View errors and corresponding error messages • View statistics and health of data flow • Create templates of common processor & connections
  • 31. NiFi - Provenance • Tracks data at each point as it flows through the system • Records, indexes, and makes events available for display • Handles fan-in/fan-out, i.e. merging and splitting data • View attributes and content at given points in time
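NiFi records most provenance events (CREATE, ROUTE, DROP, etc.) automatically, but processors that act as a flow's entry or exit point can report events explicitly through the session's ProvenanceReporter. A hedged sketch; the processor name and transit URI below are invented for illustration:

```java
import java.nio.charset.StandardCharsets;
import java.util.Set;

import org.apache.nifi.flowfile.FlowFile;
import org.apache.nifi.processor.AbstractProcessor;
import org.apache.nifi.processor.ProcessContext;
import org.apache.nifi.processor.ProcessSession;
import org.apache.nifi.processor.Relationship;

// Hypothetical source processor that creates a FlowFile and records a
// RECEIVE provenance event tied to the external system it came from.
public class ProvenanceAwareSource extends AbstractProcessor {

    static final Relationship REL_SUCCESS = new Relationship.Builder()
            .name("success").description("Newly created FlowFiles").build();

    @Override
    public Set<Relationship> getRelationships() {
        return Set.of(REL_SUCCESS);
    }

    @Override
    public void onTrigger(ProcessContext context, ProcessSession session) {
        FlowFile flowFile = session.create();
        flowFile = session.write(flowFile,
                out -> out.write("hello".getBytes(StandardCharsets.UTF_8)));
        // Explicit provenance event; the transit URI identifies the external source.
        session.getProvenanceReporter().receive(flowFile, "https://example.com/sensor-feed");
        session.transfer(flowFile, REL_SUCCESS);
    }
}
```

Events like this are what the Data Provenance page indexes, so a given FlowFile can later be traced back to the external endpoint it was received from.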
  • 32. NiFi - Queue Prioritization • Configure a prioritizer per connection • Determine what is important for your data – time based, arrival order, importance of a data set • Funnel many connections down to a single connection to prioritize across data sets • Develop your own prioritizer if needed
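Developing your own prioritizer, as mentioned above, means implementing the FlowFilePrioritizer extension point, which is essentially a Comparator over FlowFiles. A minimal sketch; the smallest-first policy is only an example of a custom scheme:

```java
import org.apache.nifi.flowfile.FlowFile;
import org.apache.nifi.flowfile.FlowFilePrioritizer;

// Hypothetical prioritizer: smaller FlowFiles jump the queue.
// Once deployed, it can be selected on any connection's settings.
public class SmallestFirstPrioritizer implements FlowFilePrioritizer {

    @Override
    public int compare(FlowFile a, FlowFile b) {
        if (a == null && b == null) return 0;
        if (a == null) return 1;    // nulls sort last
        if (b == null) return -1;
        return Long.compare(a.getSize(), b.getSize());
    }
}
```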
  • 33. NiFi - Extensibility Built from the ground up with extensions in mind Service-loader pattern for… • Processors • Controller Services • Reporting Tasks • Prioritizers Extensions packaged as NiFi Archives (NARs) • Deploy NiFi lib directory and restart • Provides ClassLoader isolation • Same model as standard components
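Besides processors, reporting tasks and prioritizers, Controller Services are another extension point: shared, configurable services that processors look up by interface. A hedged sketch; the interface and class names are invented for illustration, and in a real NAR the implementation would also be registered through a META-INF/services entry so NiFi's service-loader can discover it:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

import org.apache.nifi.controller.AbstractControllerService;
import org.apache.nifi.controller.ControllerService;

// Hypothetical service interface that processors would reference through
// a PropertyDescriptor identifying a MyLookupService instance.
interface MyLookupService extends ControllerService {
    String lookup(String key);
}

// Simple in-memory implementation; real services usually expose
// PropertyDescriptors for configuration and lifecycle hooks such as @OnEnabled.
public class InMemoryLookupService extends AbstractControllerService implements MyLookupService {

    private final Map<String, String> table = new ConcurrentHashMap<>();

    @Override
    public String lookup(String key) {
        return table.getOrDefault(key, "unknown");
    }
}
```

Because each NAR gets its own ClassLoader, a service like this can bring its own dependencies without conflicting with other bundles.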
  • 34. NiFi – Security • Administration – Central management and consistent security: automatic NiFi Cluster Coordinator and Primary Node election with ZooKeeper; multiple entry points. • Authentication – Authenticate users and systems: 2-way SSL support out of the box; LDAP integration; Kerberos integration. • Authorization – Provision access to data: multi-tenant authorization; file-based authority provider with global and component-level access policies; Ranger-based authority provider. • Audit – Maintain a record of data access: detailed logging of all user actions; detailed logging of key system behaviors; data provenance enables unparalleled tracking from the edge through to the lake. • Data Protection – Protect data at rest and in motion: support for a variety of SSL/encrypted protocols; tag data and use tags for fine-grained access controls; encrypt/decrypt content using pre-shared key mechanisms; encrypted passwords in configuration files. • Initial Admin – Manually designate the initial admin user granted access to the UI. • Legacy Authorized Users – Converts previously configured users and roles to the multi-tenant model. • Cluster Node Identities – Secure identities for each node.
  • 35. Thank you! Visit us: www.cetax.com.br We are hiring!

Editor's Notes

  1. Where do we find Data Flow? Every piece of moving metal has sensors nowadays, transferring data in and out of it. Enterprise data flows from chains or data endpoints toward a central data center or a data hub before reaching the central data warehouse. Social media information (tweets, posts, comments, likes) and clickstream data for analytics. Simple messaging and processing of data as it arrives.
  2. Simplistic View of Enterprise Data Flow - The diagram above shows a simplistic view of enterprise data flow: how a data flow solution helps acquire, process, analyze and store data.
  3. Basics of Connecting Systems When we look at basics of connecting systems these must agree: Protocol Format Schema Priority Size of event Frequency of event Authorization access Relevance
  4. IoAT Data Grows Faster Than We Consume It - The emergence and explosion of Internet of Anything data has put tremendous pressure on existing platforms. The data from these new paradigm sources has created several key challenges: Exponential Growth. As of 2013 there was an estimated 2.8 ZB (zettabytes) of data across the cybersphere, and that is expected to grow to 44 ZB by 2020, with 85% of this data growth coming from new types of data, including connected devices. Varied Nature. The incoming data can have little or no structure, or structure that changes too frequently for reliable schema creation at time of ingest. Value at High Volumes. The incoming data can have little or no value as individual records or small groups of records, but at high volumes and over longer historical perspectives it can be inspected for patterns and used for advanced analytic applications. This new data paradigm opens up the opportunity for both an architectural and a business transformation that applies to virtually every industry. Abbreviations: ERP (Enterprise Resource Planning), CRM (Customer Relationship Management), SCM (Supply Chain Management).
  5. Internet of Anything is Driving New Requirements - As more and more data is generated from the Internet of Anything (IoAT), including sensors, geo-location devices, server logs, clicks, machines, social feeds, and any other data source at the edge, securely ingesting and processing data from the “jagged edge” becomes a real technical challenge. Customers and developers have had no choice but to create custom, disjointed and loosely integrated solutions to solve the problem of analyzing data and providing insights. Traditional data plus multiple streams from a variety of sources created the need for those custom solutions, driving up cost and complexity.
  6. The IoAT data edges created specific data flow requirements that Hortonworks DataFlow satisfies: Edges with small footprints operate with very little power Limited bandwidth and high latency are commonplace Data availability often exceeds transmission bandwidth Data must be secured throughout its journey
  7. Who all Need Data Provenance and why? - For Operators- Traceability, lineage, Recovery and replay - For Compliance - Audit trail, Remediation - For Business - Value sources, Value IT investment
  8. The Need for Fine-grained Security and Compliance - LDAP Integration coming up as pluggable authentication - User roles and control with different access levels. - Tagging the data with priority or classification
  9. Real-time Data Flow - Leverege IoT platform makes extensive use of HDP already; they basically host the platform for customers like “Special Forces” - They’re looking at NiFi to replace the Ingestors and Translators portion of their architecture - NiFi would then flow the data into Kafka for downstream data delivery to real-time and historical analytic applications - NiFi gives them the ability to add new data feeds (with corresponding NiFi processors) in a matter of hours (rather than days/weeks); they use a JSON spec file that contains the info needed to plumb in the new NiFi processor - NiFi data provenance capabilities are a big value (knowing where data comes from and tracking where/how it flows is a key operational capability) - NiFi’s logging and tracing capabilities make it easy to debug dataflows, and NiFi’s ability to replay flows is invaluable as well (e.g. they were able to replay a week’s worth of inbound data in an hour) - They like the ability to fork a flow to plug in a new processor (agility is a key attribute) - Leverege is not dealing with large volumes (e.g. only thousands of messages per minute) so they have no input into scalability/sizing yet - NiFi is currently running on 2 servers. PRESCIENT EDGE NOTES: - “Traveler Safety” is a key application they provide - They built their own “data curation” toolset (comprised of lots of Python scripts) for getting data from a range of sources - 355 independent data sources, with many sources being aggregators of other data sources, so they deal with a total of ~3,500 sources in aggregate - Sources are mostly IP endpoints, from Twitter feeds to closed-caption video feeds (which they’re interested in scraping through the video file for travel security-related breaking news items) - Existing tools lacked data provenance, so they looked at NiFi and got very excited at its capabilities - They wrapped their existing toolset of Python scripts as NiFi processors, which makes them available within the NiFi tool with consistent provenance capabilities - NiFi provides the “data curation” and “fork in the road” capability they need to deal with data before storing it in SAP HANA (and potentially other data systems including HDP) - SAP HANA provides a COTS solution for geo-coding, language translation from 37 languages, and visualization abilities through SAP tools for their “Traveler Safety” app - They’re using SAP tools since it helped them accelerate time to solution (i.e. they don’t have a lot of time and resources to build analytic apps and visualizations from raw open source tools) - Their application is able to dynamically draw threat zones and, with NiFi, they are able to tie back to the specific data sources that were involved in flagging the threat. WARGAMING.NET NOTES: - Using Hadoop (CDH in pre-prod, and Oracle BDA with 12 nodes / 700 TB in prod) - Lots of data in relational DBMSs - Logs in MySQL, managing schemas, changing databases, etc. - Funnel all data into Oracle BDA on a daily batch for Impala and Hive, and then into an Oracle database for downstream aggregated reports and Tableau - Looking to use NiFi to front-end a data flow that forks into Kafka and HDFS (they use Avro to format the HDFS data) - Using Kafka for the enterprise analytical events/messaging bus; while NiFi may do some similar things, they’re committed to Kafka as the standard messaging protocol - They also aggregate game stats (how many kills, shots fired, etc.) and store those logs into S3 using Amazon Kinesis; they then pull down from there for analytic needs with Hadoop - They essentially see NiFi as the data collector and pipeline conductor that ultimately forks the data flow into a Kafka stream and an HDFS stream - The thing they like about NiFi is that it enables them to hand a runbook and the NiFi tool to the Ops team, who can operate the dataflows, start/stop processors when needed, etc. without a Java developer having to be involved every time something goes wrong or generates warnings/errors. Fewer beepers for developers == good.
  10. HDF Powered by Apache NiFi Addresses Modern Data Flow Challenges - HDF provides 3 key capabilities: the ability to collect data from different types of data sources via a highly secure lightweight agent, the ability to mediate the data flow to/from the data source and the “collector”, and the ability to trace, parse and transform data in motion to enable analytics and derive insights within an operationally relevant time window. Systems fail: networks fail, disks fail, software crashes, people make mistakes. Data access exceeds capacity to consume: sometimes a given data source can outpace some part of the processing or delivery chain - it only takes one weak link to have an issue. Boundary conditions are mere suggestions: you will invariably get data that is too big, too small, too fast, too slow, corrupt, wrong, or in the wrong format. What is noise one day becomes signal the next: priorities of an organization change - rapidly. Enabling new flows and changing existing ones must be fast. Systems evolve at different rates: the protocols and formats used by a given system can change anytime and often irrespective of the systems around them. Dataflow exists to connect what is essentially a massively distributed system of components that are loosely or not-at-all designed to work together. Compliance and security: laws, regulations, and policies change. Business to business agreements change. System to system and system to user interactions must be secure, trusted, accountable. Continuous improvement occurs in production: it is often not possible to come even close to replicating production environments in the lab.
  11. Hortonworks: Powering the Future of Data
  12. NiFi Developed by the National Security Agency - Hortonworks DataFlow is based on technology originally created by the NSA, which encountered big data collection and processing issues at a scale and stage beyond most enterprise implementations today. DataFlow was designed inherently to meet the timely decision-making needs of collecting and analyzing data from a wide range of disparate data sources - securely, efficiently, and over a geographically dispersed and possibly fragmented network, the likes of which are becoming commonplace in many industries today. Deployed at scale for almost a decade before being contributed to the open source community, Hortonworks DataFlow has proven to be an excellent and effective tool that integrates the most common current and future needs of big data acquisition and ingestion for accurately informed, on-time decision making.
  13. A Brief History Of NiFi 2006 - NiagaraFiles (NiFi) was first incepted at the National Security Agency (NSA) November 2014 - NiFi is donated to the Apache Software Foundation (ASF) through NSA’s Technology Transfer Program and enters ASF’s incubator. July 2015 - NiFi reaches ASF top-level project status
  14. HDF Designed In Response to Real World Demands - HDF provides a number of benefits to customers, developers, and data stewards, including: Use of standard open source software, with Hortonworks DataFlow powered by Apache NiFi and Hortonworks Data Platform powered by Apache Hadoop. An easy, web-based, seamless “off the shelf” experience that allows for simple drag-and-drop design, control, feedback, and monitoring of all data sources. A highly configurable solution that optimizes for high throughput and low bandwidth on all data. Fine-grained provenance metadata supporting compliance and governance. Secure end-to-end data routing, including encryption and compression (SSL, SSH, HTTPS, encrypted content). Pluggable role-based authentication/authorization.
  15. Apache NiFi Powerful and reliable system to process and distribute data. Directed graphs of data routing and transformation. Web-based User Interface for creating, monitoring, & controlling data flows Highly configurable - modify data flow at runtime, dynamically prioritize data Data Provenance tracks data through entire system Easily extensible through development of custom components
  16. HDF Use Cases They optimize their Splunk investment by pre-filtering data before sending to Splunk for storage. They ingest logs for cyber security and threat detection. They feed data to streaming analytics engines like Apache Spark or Apache Storm (both of which ship with Hortonworks Data Platform). They move their own data internally between data centers on premises or to the cloud. And of course, they capture data from the Internet of Things. HDF was originally designed to be robust, so that it could continue to move data despite varying device footprints or fluctuating power or connectivity levels. The data keeps flowing, without being lost in transit. Predictive Analytics - Ensure the highest value data is captured and available for analysis Fraud Detection - Move sales transaction data in real time to analyze on demand Accelerated Data Collection - An integrated, data collection platform with full transparency into provenance and flow of data IoT Optimization - Secure, Prioritize, Enrich and Trace data at the edge Big Data Ingest - Easily and efficiently ingest data into Hadoop You can find more Details on use cases below: http://hortonworks.com/hdf/use-cases/
  17. The 3 central themes of NiFi: Really solid flow control/management of bidirectional data flow. Fine-grained details of data and its life cycle, with a UI that solves some enterprise problems of data governance. Rock-solid security on data and control.
  18. Apache NiFi – Key Features Guaranteed Delivery A core philosophy of NiFi has been that even at very high scale, guaranteed delivery is a must. This is achieved through effective use of a purpose-built persistent write-ahead log and content repository. Together they are designed in such a way as to allow for very high transaction rates, effective load-spreading, copy-on-write, and play to the strengths of traditional disk read/writes. Data Buffering w/ Back Pressure and Pressure Release NiFi supports buffering of all queued data as well as the ability to provide back pressure as those queues reach specified limits or to age off data as it reaches a specified age (its value has perished). Prioritized Queuing NiFi allows the setting of one or more prioritization schemes for how data is retrieved from a queue. The default is oldest first, but there are times when data should be pulled newest first, largest first, or some other custom scheme. Flow Specific QoS (latency v throughput, loss tolerance, etc.) There are points of a dataflow where the data is absolutely critical and it is loss intolerant. There are also times when it must be processed and delivered within seconds to be of any value. NiFi enables the fine-grained flow specific configuration of these concerns. Data Provenance NiFi automatically records, indexes, and makes available provenance data as objects flow through the system even across fan-in, fan-out, transformations, and more. This information becomes extremely critical in supporting compliance, troubleshooting, optimization, and other scenarios. Recovery / Recording a rolling buffer of fine-grained history NiFi’s content repository is designed to act as a rolling buffer of history. Data is removed only as it ages off the content repository or as space is needed. This combined with the data provenance capability makes for an incredibly useful basis to enable click-to-content, download of content, and replay, all at a specific point in an object’s lifecycle which can even span generations. Visual Command and Control Dataflows can become quite complex. Being able to visualize those flows and express them visually can help greatly to reduce that complexity and to identify areas that need to be simplified. NiFi enables not only the visual establishment of dataflows but it does so in real-time. Rather than being design and deploy it is much more like molding clay. If you make a change to the dataflow that change immediately takes effect. Changes are fine-grained and isolated to the affected components. You don’t need to stop an entire flow or set of flows just to make some specific modification. Flow Templates Dataflows tend to be highly pattern oriented and while there are often many different ways to solve a problem, it helps greatly to be able to share those best practices. Templates allow subject matter experts to build and publish their flow designs and for others to benefit and collaborate on them. Security System to system A dataflow is only as good as it is secure. NiFi at every point in a dataflow offers secure exchange through the use of protocols with encryption such as 2-way SSL. In addition NiFi enables the flow to encrypt and decrypt content and use shared-keys or other mechanisms on either side of the sender/recipient equation. User to system NiFi enables 2-Way SSL authentication and provides pluggable authorization so that it can properly control a user’s access and at particular levels (read-only, dataflow manager, admin). 
If a user enters a sensitive property like a password into the flow, it is immediately encrypted server side and never again exposed on the client side even in its encrypted form. Designed for Extension NiFi is at its core built for extension and as such it is a platform on which dataflow processes can execute and interact in a predictable and repeatable manner. Points of extension: Processors, Controller Services, Reporting Tasks, Prioritizers, Custom User Interfaces. Classloader Isolation For any component-based system, dependency nightmares can quickly occur. NiFi addresses this by providing a custom class loader model, ensuring that each extension bundle is exposed to a very limited set of dependencies. As a result, extensions can be built with little concern for whether they might conflict with another extension. The concept of these extension bundles is called NiFi Archives and will be discussed in greater detail in the developer’s guide. Clustering (scale-out) NiFi is designed to scale out through the use of clustering many nodes together as described above. If a single node is provisioned and configured to handle hundreds of MB/s, then a modest cluster could be configured to handle GB/s. This then brings about interesting challenges of load balancing and fail-over between NiFi and the systems from which it gets data. Use of asynchronous queuing based protocols like messaging services, Kafka, etc., can help. Use of NiFi’s site-to-site feature is also very effective, as it is a protocol that allows NiFi and a client (which could be another NiFi cluster) to talk to each other, share information about loading, and exchange data on specific authorized ports.
  19. Flow Based Programming (FBP) Introducing Flow Based Programming fundamentals, why they matter, and how NiFi adopts them FlowFile Unit of data moving through the system Content + Attributes (key/value pairs) Processor Performs the work, can access FlowFiles Connection Links between processors Queues that can be dynamically prioritized Process Group Set of processors and their connections Receive data via input ports, send data via output ports
  20. NiFi Architecture Introducing the architecture of NiFi. NiFi executes within a JVM living within a host operating system. The primary components of NiFi then living within the JVM are described in following slides.
  21. NiFi Architecture Introducing the architecture of NiFi. NiFi executes within a JVM living within a host operating system. The primary components of NiFi then living within the JVM are described in following slides.
  22. Primary Components NiFi executes within a JVM living within a host operating system. The primary components of NiFi then living within the JVM are as follows: Web Server The purpose of the web server is to host NiFi’s HTTP-based command and control API. Flow Controller The flow controller is the brains of the operation. It provides threads for extensions to run on and manages their schedule of when they’ll receive resources to execute. Extensions There are various types of extensions for NiFi which will be described in other documents. But the key point here is that extensions operate/execute within the JVM. Examples of extensions: custom processors, NiFi plugins for applications to talk to, ports, and controller services.
  23. Primary Components FlowFile Repository The FlowFile Repository is where NiFi keeps track of the state of what it knows about a given FlowFile that is presently active in the flow. The default approach is a persistent Write-Ahead Log that lives on a specified disk partition. Content Repository The Content Repository is where the actual content bytes of a given FlowFile live. The default approach stores blocks of data in the file system. More than one file system storage location can be specified so as to get different physical partitions engaged to reduce contention on any single volume. Provenance Repository The Provenance Repository is where all provenance event data is stored. The repository construct is pluggable with the default implementation being to use one or more physical disk volumes. Within each location event data is indexed and searchable.
  24. NiFi Cluster Components - NiFi is also able to operate within a cluster; the components are: NiFi Cluster Coordinator: the Cluster Coordinator is the node in a NiFi cluster that is responsible for managing the nodes in the cluster. It determines which nodes are allowed in the cluster and provides the most up-to-date flow to newly joining nodes. NiFi Nodes: these nodes do the actual data processing. Primary Node: the first node that joined the cluster; it can run Isolated Processors. We will learn about NiFi clustering in detail in the following lessons.
  25. NiFi User Interface The NiFi User Interface (UI) provides mechanisms for creating automated dataflows, as well as visualizing, editing, monitoring, and administering those dataflows. The UI can be broken down into several segments, each responsible for different functionality of the application. This section provides screenshots of the application and highlights the different segments of the UI. When the application is started, the user is able to navigate to the User Interface by going to the default address of http://<hostname>:8080/nifi in a web browser. There are no permissions configured by default, so anyone is able to view and modify the dataflow.
  26. Data Provenance While monitoring a dataflow, users often need a way to determine what happened to a particular data object (FlowFile). NiFi’s Data Provenance page provides that information. Because NiFi records and indexes data provenance details as objects flow through the system, users may perform searches, conduct troubleshooting and evaluate things like dataflow compliance and optimization in real time. By default, NiFi updates this information every five minutes, but that is configurable.
  27. NiFi - Queue Prioritization Configure a prioritizer per connection Determine what is important for your data – time based, arrival order, importance of a data set Funnel many connections down to a single connection to prioritize across data sets Develop your own prioritizer if needed
  28. NiFi – Extensibility Built from the ground up with extensions in mind Extensions packaged as NiFi Archives (NARs) Deploy NiFi lib directory and restart Provides ClassLoader isolation Same model as standard components Service-loader pattern for… Processors Controller Services Reporting Tasks Prioritizers
  29. NiFi Security NiFi provides several different configuration options for security purposes. The most important properties are those under the "security properties" heading in the nifi.properties file. NiFi supports user authentication via client certificates or via username/password. Username/password authentication is performed by a Login Identity Provider.