O slideshow foi denunciado.
Utilizamos seu perfil e dados de atividades no LinkedIn para personalizar e exibir anúncios mais relevantes. Altere suas preferências de anúncios quando desejar.
Page1 © Hortonworks Inc. 2011 – 2015. All Rights Reserved
Data Distribution Patterns with Apache NiFi
March 2016
Bryan Ben...
Page2 © Hortonworks Inc. 2011 – 2015. All Rights Reserved
Overview
• Frequently Asked Question:
 “How do I distribute dat...
Page3 © Hortonworks Inc. 2011 – 2015. All Rights Reserved
Pushing to Listeners
NCM
Node 1
ListenHTTP
Node 2
ListenHTTP
Loa...
Page4 © Hortonworks Inc. 2011 – 2015. All Rights Reserved
Pulling on All Nodes
NCM
Node 1
GetKafka
Node 2
GetKafka
Kafka T...
Page5 © Hortonworks Inc. 2011 – 2015. All Rights Reserved
Pulling With List & Fetch Operations
NCM
Node 1 (Primary)
ListHD...
Page6 © Hortonworks Inc. 2011 – 2015. All Rights Reserved
Site-To-Site Push
NCM
Node 1
Input Port
Node 2
Input Port
Standa...
Page7 © Hortonworks Inc. 2011 – 2015. All Rights Reserved
Site-To-Site Pull
NCM
Node 1
RPG
Node 2
RPG
Standalone NiFi
Outp...
Page8 © Hortonworks Inc. 2011 – 2015. All Rights Reserved
Summary
Several mechanism for distributing data across a cluster...
Próximos SlideShares
Carregando em…5
×

Data Distribution Patterns with Apache NiFi

5.391 visualizações

Publicada em

An overview of how to distribute data across a NiFi cluster. Includes pushing to listening processors, pulling on each node, pulling on primary node with redistribution, and site-to-site.

Publicada em: Software

Data Distribution Patterns with Apache NiFi

  1. 1. Page1 © Hortonworks Inc. 2011 – 2015. All Rights Reserved Data Distribution Patterns with Apache NiFi March 2016 Bryan Bende – Member of Technical Staff
  2. 2. Page2 © Hortonworks Inc. 2011 – 2015. All Rights Reserved Overview • Frequently Asked Question:  “How do I distribute data across my NiFi cluster?” • Answer:  “It depends…” • Typically based on whether the data is being pushed or pulled
  3. 3. Page3 © Hortonworks Inc. 2011 – 2015. All Rights Reserved Pushing to Listeners NCM Node 1 ListenHTTP Node 2 ListenHTTP Load Balancer Data Producer Data Producer Data Producer • Processors listening for incoming data on each node in the cluster • Load balancer sitting in front of the cluster pointing at listeners • Data producers push data to load balancer which distributes data across the cluster • Same approach works for ListenSyslog, ListenUDP, and HandleHttpRequest
  4. 4. Page4 © Hortonworks Inc. 2011 – 2015. All Rights Reserved Pulling on All Nodes NCM Node 1 GetKafka Node 2 GetKafka Kafka Topic • Works when data source ensures each request gets a unique piece of data • Kafka sees each GetKafka processor as the same client • Each GetKafka processor gets different data
  5. 5. Page5 © Hortonworks Inc. 2011 – 2015. All Rights Reserved Pulling With List & Fetch Operations NCM Node 1 (Primary) ListHDFS HDFS FetchHDFS RPG Input Port Node 2 ListHDFS FetchHDFS RPG Input Port • Perform “list” operation on primary node • Send results to Remote Process Group pointing back to same cluster • Redistributes results to all nodes to perform “fetch” in parallel • Same approach for ListFile + FetchFile and ListSFTP + FetchSFTP
  6. 6. Page6 © Hortonworks Inc. 2011 – 2015. All Rights Reserved Site-To-Site Push NCM Node 1 Input Port Node 2 Input Port Standalone NiFi RPG • Site-To-Site makes a direct connection between NiFi instances • Either side can be a cluster or standalone instance • In a push scenario, the source connects a Remote Process Group to an Input Port on the destination • If pushing to a cluster, Site-To-Site takes care of load balancing across the nodes in the cluster
  7. 7. Page7 © Hortonworks Inc. 2011 – 2015. All Rights Reserved Site-To-Site Pull NCM Node 1 RPG Node 2 RPG Standalone NiFi Output Port • In a pull scenario, the destination connects a Remote Process Group to an Output Port on the source • Each node will pull different data from the source • If the source was a cluster, each node would pull from each node in the source cluster
  8. 8. Page8 © Hortonworks Inc. 2011 – 2015. All Rights Reserved Summary Several mechanism for distributing data across a cluster… • Push to Listeners with a load balancer in front • Pull from a source that provides a queue • Pull with a List + Fetch approach • Push with Site-to-Site • Pull with Site-to-Site Contact Info • bbende@hortonworks.com • Twitter - @bbende

×