Hortonworks  DataFlow
Enterprise  Data  Flow  powered  by  Apache  NiFi
Mats  Johansson
Solutions Engineer - EMEA
©  Hortonworks  Inc.  2011  – 2015.  All  Rights  Reserved
Disclaimer
This  document  may  contain  product  features  and  technology  directions  that  are  under  
development,  may  be  under  development  in  the  future  or  may  ultimately  not  be  
developed.
Project  capabilities  are  based  on  information  that  is  publicly  available  within  the  Apache  
Software  Foundation  project  websites  ("Apache").    Progress  of  the  project  capabilities  
can  be  tracked  from  inception  to  release  through  Apache,  however,  technical  feasibility,  
market  demand,  user  feedback  and  the  overarching  Apache  Software  Foundation  
community development process can all affect timing and final delivery.
This  document’s  description  of  these  features  and  technology  directions  does  not  
represent  a  contractual  commitment,  promise  or  obligation  from  Hortonworks  to  deliver  
these  features  in  any  generally  available  product.
Product  features  and  technology  directions  are  subject  to  change,  and  must  not  be  
included  in  contracts,  purchase  orders,  or  sales  agreements  of  any  kind.
Since  this  document  contains  an  outline  of  general  product  development  plans,  
customers  should  not  rely  upon  it  when  making  purchasing  decisions.
IoAT Data  Grows  Faster  Than  We  Consume  It
Much of the new data exists in-flight, between systems and devices, as part of the Internet of Anything
NEW
TRADITIONAL
The  Opportunity
Unlock  transformational  business  value
from  a  full  fidelity  of  data  and  analytics
for  all  data.
Geolocation
Server  logs
Files &  emails
ERP,  CRM,  SCM
Traditional  Data  Sources
Internet  of  Anything
Sensors
and machines
Clickstream
Social  media
Internet  of  Anything  is  Driving  New  Requirements
Need trusted insights from data at the very edge to the data lake in real time with full fidelity
– Data generated by sensors, machines, geolocation devices, logs, clickstreams, social feeds, etc.
Modern applications need access to both data-in-motion and data-at-rest
IoAT data flows are multi-directional and point-to-point
– Very different from existing ETL, data movement, and streaming technologies, which are generally one-directional
The perimeter is outside the data center and can be very jagged
– This “Jagged Edge” creates new opportunity for security, data protection, data governance and provenance
Architectural  Limitations  Today
• Traditional data movement software has been built for the world of standardized data and one-way flows
• Tools built for newer types of data tend to be custom, difficult to manage, and architecturally disjoint
• Businesses cannot easily collect, conduct, and curate secure multi-directional and point-to-point IoAT data flows
• IoAT data flows are not optimized, use costly/limited bandwidth, and cannot dynamically prioritize the most valuable data
• Difficult to gain actionable insights from the combination of data-in-motion and data-at-rest
The  IoAT Data  Flow
Hortonworks  Data  Platform
powered  by  Apache  Hadoop
Hortonworks  Data  Platform
powered  by  Apache  Hadoop
Enrich
Context
Store  Data  
and  Metadata
Internet
of  Anything
Hortonworks  DataFlow  
powered  by  Apache  NiFi
Perishable  
Insights
Historical
Insights
Introducing  Hortonworks  DataFlow
Hortonworks  DataFlow  and  the  Hortonworks  Data  Platform  
deliver  the  industry’s  most  complete  solution  for  management  of  Big  Data.
Simplistic  View  of  IoAT &  Data  Flow
The  Data  Flow  Thing
Process  and  
Analyze  Data
Acquire  Data
Store  Data
Global  interactions  with  customers,  business  partners,  and  things
spanning  different  volume,  velocity,  bandwidth,  and  latency  needs
Realistic  View  of  IoAT and  Data  Flow
Meeting  IoAT Edge  Requirements
GATHER
DELIVER
PRIORITIZE
Track from the edge through to the datacenter
Small  Footprints
operate  with  very  little  power
Limited  Bandwidth
can  create  high  latency
Data  Availability
exceeds  transmission  bandwidth
Data  Must  Be  Secured
throughout  its  journey
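When data availability exceeds transmission bandwidth, an edge agent has to decide what to send first. The following is a minimal Python sketch of that idea, not NiFi code: `Reading`, `select_for_transmission`, and the byte budget are hypothetical names introduced only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Reading:
    """Hypothetical edge reading; a lower priority number means more valuable."""
    name: str
    priority: int
    size_bytes: int


def select_for_transmission(readings, budget_bytes):
    """Pick readings in priority order until the byte budget is used up.

    Returns (sent, deferred). Readings that do not fit stay deferred so
    they can be retried when bandwidth becomes available.
    """
    sent, deferred = [], []
    remaining = budget_bytes
    # sorted() is stable, so equal priorities keep their arrival order.
    for reading in sorted(readings, key=lambda r: r.priority):
        if reading.size_bytes <= remaining:
            sent.append(reading)
            remaining -= reading.size_bytes
        else:
            deferred.append(reading)
    return sent, deferred
```

Deferring rather than dropping is one possible design choice; a loss-tolerant flow could discard the deferred readings instead.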
Dataflow  requirements  within  the  Data  Center
Understanding
Ability to observe precisely how systems exchange data in real time and historically
Agility
Ability  to  interact  with  and  alter  live  flows  and  iterate  on  new  ones
Dynamic  Access  Controls
The  entitlements  of  users  and  systems  and  sensitivity  of  data  can  change  frequently
Cross  Cutting  Concerns
Address  common  needs  once  like  enrichment,  filtering,  transformation
Enable  architecture  transition
Legacy  vs modern  is  an  ‘always’  event.    Format,  schema,  protocol  conversion  is  routine
Apache  NiFi:  Collect,  Conduct,  Curate
Collect: Bring Together
Aggregate all IoAT data from sensors, geolocation devices, machines, logs, files, and feeds via a highly secure lightweight agent
• Logs
• Files
• Feeds
• Sensors
Conduct: Mediate the Data Flow
Mediate point-to-point and bi-directional data flows, delivering data reliably to real-time applications and storage platforms such as HDP
• Deliver
• Secure
• Govern
• Audit
Curate: Gain Insights
Parse, filter, join, transform, fork, and clone data in motion to empower analytics and perishable insights
• Parse
• Filter
• Transform
• Fork
• Clone
A Brief History of Apache NiFi
2006
NiagaraFiles (NiFi) was first conceived by Joe Witt at the National Security Agency (NSA).
November 2014
NiFi is donated to the Apache Software Foundation (ASF) through NSA’s Technology Transfer Program and enters ASF’s incubator.
July 2015
NiFi reaches ASF top-level project status.
Apache  NiFi:  Three  key  concepts
• Manage  the  flow  of  information
• Data  Provenance
• Secure  the  control  plane  and  data  plane
Apache  NiFi  – Key  Features
• Guaranteed delivery
• Data buffering
- Backpressure
- Pressure release
• Prioritized queuing
• Flow specific QoS
- Latency vs. throughput
- Loss tolerance
• Data provenance
• Recovery/recording a rolling log of fine-grained history
• Visual command and control
• Flow templates
• Pluggable/multi-role security
• Designed for extension
• Clustering
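Backpressure, listed above under data buffering, can be illustrated with a bounded queue that refuses new items once a threshold is reached, so the upstream step has to hold its data. This is a minimal Python sketch under that assumption, not NiFi's implementation; `Connection`, `backpressure_threshold`, and `run_producer` are hypothetical names.

```python
from collections import deque


class Connection:
    """Hypothetical bounded queue between two processing steps."""

    def __init__(self, backpressure_threshold):
        self.backpressure_threshold = backpressure_threshold
        self._queue = deque()

    def is_full(self):
        return len(self._queue) >= self.backpressure_threshold

    def offer(self, item):
        """Enqueue item unless backpressure is engaged; report success."""
        if self.is_full():
            return False
        self._queue.append(item)
        return True

    def poll(self):
        """Dequeue the oldest item, or None when the queue is empty."""
        return self._queue.popleft() if self._queue else None

    def __len__(self):
        return len(self._queue)


def run_producer(connection, items):
    """Push items until backpressure engages; return the items held back."""
    pending = deque(items)
    while pending and connection.offer(pending[0]):
        pending.popleft()
    return list(pending)
```

Once the consumer polls an item, the producer can be run again with the held-back items, which is how the two sides end up interacting at differing rates.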
Common  Apache  NiFi Use  Cases
Predictive  Analytics
Ensure  the  highest  value  data  is  captured  and  available  for  analysis
Compliance
Gain  full  transparency  into  provenance  and  flow  of  data  
IoT Optimization
Secure,  Prioritize,  Enrich  and  Trace  data  at  the  edge
Fraud  Detection
Move  sales  transaction  data  in  real  time  to  analyze  on  demand  
Big  Data  Ingest
Easily  and  efficiently  ingest  data  into  Hadoop
Value  Resources
Gain  visibility  into  how  data  sources  are  used  to  determine  value
Flow  Based  Programming  (FBP)
FBP Term | NiFi Term | Description
Information Packet | FlowFile | Each object moving through the system.
Black Box | FlowFile Processor | Performs the work, doing some combination of data routing, transformation, or mediation between systems.
Bounded Buffer | Connection | The linkage between processors, acting as queues and allowing various processes to interact at differing rates.
Scheduler | Flow Controller | Maintains the knowledge of how processes are connected, and manages the threads and allocations thereof which all processes use.
Subnet | Process Group | A set of processes and their connections, which can receive and send data via ports. A process group allows creation of an entirely new component simply by composition of its components.
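The FBP terms above can be mapped onto a toy model. The following Python sketch is an illustration only and does not use NiFi's Java API: the `FlowFile` fields, `UppercaseProcessor`, and `run_once` are hypothetical, and connections are modelled as plain queues.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class FlowFile:
    """An object moving through the flow: content plus key/value attributes."""
    content: bytes
    attributes: dict = field(default_factory=dict)


class UppercaseProcessor:
    """Hypothetical processor that transforms content and tags the result."""

    def process(self, flowfile):
        return FlowFile(
            content=flowfile.content.upper(),
            attributes={**flowfile.attributes, "transformed": "true"},
        )


def run_once(source, processor, destination):
    """Move every queued FlowFile from source to destination via processor."""
    while source:
        destination.append(processor.process(source.popleft()))


# Connections are modelled as plain queues between processors.
incoming, outgoing = deque(), deque()
incoming.append(FlowFile(b"sensor reading", {"filename": "r1.txt"}))
run_once(incoming, UppercaseProcessor(), outgoing)
```

In this toy model `run_once` stands in for the scheduler role; a real flow controller would also manage threads and scheduling across many processors.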
Hortonworks DataFlow
Powered by Apache NiFi
Visual User Interface
HTML 5, drag and drop, for agile execution
High Throughput, Low Bandwidth
for any data, big or small
Provenance Metadata
for governance and compliance
Secure End-to-End Data Routing
with encryption and compression
Basics  of  Connecting  Systems
For  every  connection,  
these  must  agree:
1. Protocol
2. Format
3. Schema
4. Priority
5. Size  of  event
6. Frequency  of  event
7. Authorization  access
8. Relevance
P1
Producer
C1
Consumer
Using  Messaging
Only  a  subset  agree  
using  messaging
1. Protocol
2. Format
3. Schema
4. Priority
5. Size  of  event
6. Frequency  of  event
7. Authorization  access
8. Relevance
P1
CN
C1
Messaging
More  issues  to  consider:
• How  do  you  know  what  the  data  flow  looks  like?  
• How  is  it  managed?
• How  is  it  working  – today,  yesterday?
Using  an  Enterprise  Service  Bus  (ESB)
Still,  only  a  subset  agree  
using  an  ESB:
1. Protocol
2. Format
3. Schema
4. Priority
5. Size  of  event
6. Frequency  of  event
7. Authorization  access
8. Relevance
P1
Broker
CN
C1
Messaging
Even  more  issues  to  consider:
• Remote procedure calls (RPC) and throughput issues are introduced
• Design  and  deploy  management  – slow  setup,  not  interactive
• You  can  scale  out,  but  not  up  or  down
• You  still  don’t  know  what  the  data  flow  looks  like
Architecture
[Diagram: standalone NiFi node. An OS/Host runs a JVM containing the Web Server and the Flow Controller (Processor 1 ... Extension N), with the FlowFile Repository, Content Repository, and Provenance Repository on Local Storage.]
[Diagram: NiFi cluster. Master: the NiFi Cluster Manager (NCM), a JVM on its own OS/Host running a Web Server and the Request Replicator. Slaves: NiFi Nodes, each with the same layout as the standalone node.]
High Availability: Control plane vs Data plane…
Define  A  Hortonworks  DataFlow
• Easy  to  use  drag  and  drop  UI
• Flexible  to  define  the  Data  Flow
HDF  – Powered  by  Apache  NiFi
Add  processor  for  data  intake
1 Drag  and  drop  processor  icon  from  the  top  menu
Choose  the  specific  processor
2 Choose  one  of  the  processors  – currently  90  available  – designed  for  extension
Example:  Pick  Twitter  Processor
Configure  the  processor
3 Select processor and choose option to Configure
4 Adjust parameters as required
Another  processor  for  data  output
5 Drag  and  drop  processor  icon  from  the  top  menu
6 Example:  choose  PutHDFS processor
Configure  second  processor
7 Configure  2nd processor
Connect  processors,  configure  connection
8
Click  Start  to  begin  processing
9
See  processors  update  with  real  time  changes
10 As data flows, the GUI updates in real time.
Dynamically  adjust  and  tune  data  flow  as  needed
11 Dynamically adjust and tune dataflow as needed, in real time. Can also replicate data for testing and comparison.
Understand  the  data  path  with  Data  Provenance
14 Select  Data  Provenance
Trace  lineage  of  a  particular  piece  of  data
15 Icon for Data Lineage
Every  change  to  data  is  tracked:  processing,  views
16 Provenance event is tracked
Updates  as  changes  happen
17 Updates  as  data  flows
Easily  access  and  trace  changes  to  dataflow
Audit  trail  of  Hortonworks  DataFlow User  Actions
NiFi is complementary to Hadoop
Operations
Deployment flexibility from devices to data center. Delivers data flow QoS across dimensions such as: loss tolerant vs. guaranteed delivery, low latency vs. high throughput, and priority-based queuing.
Governance
Starting at the source, captures fine-grained metadata regarding all data received, forked, joined, cloned, modified, sent, and ultimately dropped as data reaches its configured end state, delivering comprehensive governance (aka provenance, chain of custody).
Security
Secures the data movement from beginning to end. Allows for fine-grained data authorization policies to be enforced at the flow level.
Operations
• Reporting tasks (push)
• Statistics / status (pull)
• Dynamic flow changes
- Push new business rules via REST API (closed loop)
- Pull updates periodically from web services
• Site-to-site
- Stay at the ‘flow level’ not suddenly doing file transfer protocols
• Extensible
• Optimized user experience – log hunts should be the exception
Scale down, up, and out – in containers and on virtual machines
The  Need  for  Data  Provenance
For  Operators
• Traceability,  lineage
• Recovery  and  replay
For  Compliance
• Audit  trail
For  Business
• Value  sources  
• Value  IT  investment
BEGIN
END
LINEAGE
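Traceability and lineage, as listed above for operators, amount to following recorded events from a piece of data back to where it began. The following is a minimal Python sketch of that idea, assuming each event records the item it was derived from; `ProvenanceEvent`, its fields, and the event labels are hypothetical and do not reflect NiFi's provenance repository format.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ProvenanceEvent:
    """Hypothetical record of one thing that happened to a piece of data."""
    event_type: str           # illustrative labels: "received", "cloned", "sent"
    item_id: str
    parent_id: Optional[str]  # item this one was derived from, if any


def trace_lineage(events, item_id):
    """Walk parent links from item_id back to its origin.

    Returns the chain of item ids, oldest first.
    """
    parents = {e.item_id: e.parent_id for e in events if e.parent_id is not None}
    chain = [item_id]
    # Stop at the origin, or if a malformed log would make the walk loop.
    while chain[-1] in parents and parents[chain[-1]] not in chain:
        chain.append(parents[chain[-1]])
    return list(reversed(chain))
```

Because the log keeps the whole chain, the same records that answer an audit question can also support recovery and replay from any earlier item.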
Extending Data Governance from the Edge to Hadoop
Data Governance Requirements
Transparent
Governance standards and protocols must be clearly defined and available to all
Reproducible
Recreate the relevant data landscape at a given point in time
Auditable
Trace all relevant events and assets with appropriate historical lineage
Consistent
Compliance practices must be consistent
Holistic Data Governance
Must snap into existing data governance frameworks and openly exchange metadata
[Diagram labels: Internet of Anything; Traditional Data Systems (ERP, CRM, SCM); ETL / DQ; MDM; Archive; Hadoop Data Platform; Business Analytics; Visualization & Dashboards]
The Need for Fine-grained Security and Compliance
It’s not enough to say you have encrypted communications
• Enterprise authorization services – entitlements change often
• People and systems with different roles require different access levels
• Tagged/classified data
Security
Administration
Central management and consistent security
• NiFi Cluster Manager
Authentication
Authenticate users and systems
• 2-Way SSL support out of the box; additional types coming
Authorization
Provision access to data
• Pluggable authorization designed to fit any Identity and Access Management (IAM) scheme
• File-based authority provider out of the box
• Multi-role
Audit
Maintain a record of data access
• Detailed logging of all user actions
• Detailed logging of key system behaviors
• Data Provenance enables unparalleled tracking from the edge through the Lake
Data Protection
Protect data at rest and in motion
• Support a variety of SSL/encrypted protocols
• Tag and utilize tags on data for fine-grained access controls
• Encrypt/decrypt content using pre-shared key mechanisms
Role | Description
Administrator | Configure system threads, user accounts, and flow audit history
Data Flow Manager | Manipulate the dataflow
Read Only | View the dataflow only
Provenance | Query the provenance repository and download content
NiFi | Allows a NiFi node or cluster to communicate via Site-to-Site
Proxy | Allows a proxy to make requests on behalf of users
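A multi-role authorization check over the user roles listed above could look like the following Python sketch. It is an illustration under stated assumptions, not NiFi's authority provider: the action names are hypothetical, and letting Data Flow Manager also view the flow is an assumption, not something the slide states.

```python
from enum import Enum


class Role(Enum):
    ADMINISTRATOR = "Administrator"
    DATA_FLOW_MANAGER = "Data Flow Manager"
    READ_ONLY = "Read Only"
    PROVENANCE = "Provenance"


# Hypothetical action names; the mapping paraphrases the role descriptions.
PERMISSIONS = {
    Role.ADMINISTRATOR: {"configure_system"},
    Role.DATA_FLOW_MANAGER: {"view_flow", "modify_flow"},  # viewing is assumed
    Role.READ_ONLY: {"view_flow"},
    Role.PROVENANCE: {"query_provenance", "download_content"},
}


def is_allowed(user_roles, action):
    """A user may perform an action if any of their roles grants it."""
    return any(action in PERMISSIONS.get(role, set()) for role in user_roles)
```

Granting by union of roles keeps each role narrow, so a user who needs both read access and provenance queries simply holds both roles.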
Operations:  Planned
Planned  Apache  NiFi Enhancements
Status | Enhancement
IN PROGRESS | Enhanced configuration management of flows
STARTED | Extension and template registry
TARGETED TO NIFI 0.4.0 RELEASE | First-class Avro support
STARTED | Interactive queue management
STARTED | Multi-tenant data flow
FUTURE | Pluggable authentication
FUTURE | Reference-able process groups
FUTURE | Variable registry
https://cwiki.apache.org/confluence/display/NIFI/NiFi+Feature+Proposals
Try It Yourself
Download NiFi and the HDP Sandbox from hortonworks.com/sandbox
Tweet:  #hadooproadshow
Thank  you!
Mats  Johansson
mjohansson@hortonworks.com
@matsjo66
https://se.linkedin.com/in/matsjo66

Mais conteúdo relacionado

Mais procurados

Beyond Messaging Enterprise Dataflow powered by Apache NiFi
Beyond Messaging Enterprise Dataflow powered by Apache NiFiBeyond Messaging Enterprise Dataflow powered by Apache NiFi
Beyond Messaging Enterprise Dataflow powered by Apache NiFiIsheeta Sanghi
 
Log Analytics Optimization
Log Analytics OptimizationLog Analytics Optimization
Log Analytics OptimizationHortonworks
 
Integrating NiFi and Flink
Integrating NiFi and FlinkIntegrating NiFi and Flink
Integrating NiFi and FlinkBryan Bende
 
Data ingestion and distribution with apache NiFi
Data ingestion and distribution with apache NiFiData ingestion and distribution with apache NiFi
Data ingestion and distribution with apache NiFiLev Brailovskiy
 
Using Spark Streaming and NiFi for the next generation of ETL in the enterprise
Using Spark Streaming and NiFi for the next generation of ETL in the enterpriseUsing Spark Streaming and NiFi for the next generation of ETL in the enterprise
Using Spark Streaming and NiFi for the next generation of ETL in the enterpriseDataWorks Summit
 
Difference between apache spark and apache nifi
Difference between apache spark and apache nifiDifference between apache spark and apache nifi
Difference between apache spark and apache nifiGaneshJoshi47
 
Hortonworks - How Hadoop makes the successful Retailer.
Hortonworks - How Hadoop makes the successful Retailer. Hortonworks - How Hadoop makes the successful Retailer.
Hortonworks - How Hadoop makes the successful Retailer. Mats Johansson
 
Fast SQL on Hadoop, really?
Fast SQL on Hadoop, really?Fast SQL on Hadoop, really?
Fast SQL on Hadoop, really?DataWorks Summit
 
Webinar Series Part 5 New Features of HDF 5
Webinar Series Part 5 New Features of HDF 5Webinar Series Part 5 New Features of HDF 5
Webinar Series Part 5 New Features of HDF 5Hortonworks
 
Hortonworks Data In Motion Series Part 4
Hortonworks Data In Motion Series Part 4Hortonworks Data In Motion Series Part 4
Hortonworks Data In Motion Series Part 4Hortonworks
 
Running Apache NiFi with Apache Spark : Integration Options
Running Apache NiFi with Apache Spark : Integration OptionsRunning Apache NiFi with Apache Spark : Integration Options
Running Apache NiFi with Apache Spark : Integration OptionsTimothy Spann
 
Harnessing Data-in-Motion with HDF 2.0, introduction to Apache NIFI/MINIFI
Harnessing Data-in-Motion with HDF 2.0, introduction to Apache NIFI/MINIFIHarnessing Data-in-Motion with HDF 2.0, introduction to Apache NIFI/MINIFI
Harnessing Data-in-Motion with HDF 2.0, introduction to Apache NIFI/MINIFIHaimo Liu
 
Apache NiFi Crash Course - San Jose Hadoop Summit
Apache NiFi Crash Course - San Jose Hadoop SummitApache NiFi Crash Course - San Jose Hadoop Summit
Apache NiFi Crash Course - San Jose Hadoop SummitAldrin Piri
 
Log Analytics Optimization
Log Analytics OptimizationLog Analytics Optimization
Log Analytics OptimizationIsheeta Sanghi
 
What is New in Apache Hive 3.0?
What is New in Apache Hive 3.0?What is New in Apache Hive 3.0?
What is New in Apache Hive 3.0?DataWorks Summit
 
What's New in Apache Hive 3.0?
What's New in Apache Hive 3.0?What's New in Apache Hive 3.0?
What's New in Apache Hive 3.0?DataWorks Summit
 
Achieving a 360-degree view of manufacturing via open source industrial data ...
Achieving a 360-degree view of manufacturing via open source industrial data ...Achieving a 360-degree view of manufacturing via open source industrial data ...
Achieving a 360-degree view of manufacturing via open source industrial data ...DataWorks Summit
 

Mais procurados (20)

Beyond Messaging Enterprise Dataflow powered by Apache NiFi
Beyond Messaging Enterprise Dataflow powered by Apache NiFiBeyond Messaging Enterprise Dataflow powered by Apache NiFi
Beyond Messaging Enterprise Dataflow powered by Apache NiFi
 
Nifi workshop
Nifi workshopNifi workshop
Nifi workshop
 
Log Analytics Optimization
Log Analytics OptimizationLog Analytics Optimization
Log Analytics Optimization
 
Integrating NiFi and Flink
Integrating NiFi and FlinkIntegrating NiFi and Flink
Integrating NiFi and Flink
 
Data ingestion and distribution with apache NiFi
Data ingestion and distribution with apache NiFiData ingestion and distribution with apache NiFi
Data ingestion and distribution with apache NiFi
 
Using Spark Streaming and NiFi for the next generation of ETL in the enterprise
Using Spark Streaming and NiFi for the next generation of ETL in the enterpriseUsing Spark Streaming and NiFi for the next generation of ETL in the enterprise
Using Spark Streaming and NiFi for the next generation of ETL in the enterprise
 
Difference between apache spark and apache nifi
Difference between apache spark and apache nifiDifference between apache spark and apache nifi
Difference between apache spark and apache nifi
 
Hortonworks - How Hadoop makes the successful Retailer.
Hortonworks - How Hadoop makes the successful Retailer. Hortonworks - How Hadoop makes the successful Retailer.
Hortonworks - How Hadoop makes the successful Retailer.
 
Fast SQL on Hadoop, really?
Fast SQL on Hadoop, really?Fast SQL on Hadoop, really?
Fast SQL on Hadoop, really?
 
Hadoop Summit Tokyo Apache NiFi Crash Course
Hadoop Summit Tokyo Apache NiFi Crash CourseHadoop Summit Tokyo Apache NiFi Crash Course
Hadoop Summit Tokyo Apache NiFi Crash Course
 
Webinar Series Part 5 New Features of HDF 5
Webinar Series Part 5 New Features of HDF 5Webinar Series Part 5 New Features of HDF 5
Webinar Series Part 5 New Features of HDF 5
 
Hortonworks Data In Motion Series Part 4
Hortonworks Data In Motion Series Part 4Hortonworks Data In Motion Series Part 4
Hortonworks Data In Motion Series Part 4
 
Running Apache NiFi with Apache Spark : Integration Options
Running Apache NiFi with Apache Spark : Integration OptionsRunning Apache NiFi with Apache Spark : Integration Options
Running Apache NiFi with Apache Spark : Integration Options
 
Harnessing Data-in-Motion with HDF 2.0, introduction to Apache NIFI/MINIFI
Harnessing Data-in-Motion with HDF 2.0, introduction to Apache NIFI/MINIFIHarnessing Data-in-Motion with HDF 2.0, introduction to Apache NIFI/MINIFI
Harnessing Data-in-Motion with HDF 2.0, introduction to Apache NIFI/MINIFI
 
Apache NiFi Crash Course - San Jose Hadoop Summit
Apache NiFi Crash Course - San Jose Hadoop SummitApache NiFi Crash Course - San Jose Hadoop Summit
Apache NiFi Crash Course - San Jose Hadoop Summit
 
Log Analytics Optimization
Log Analytics OptimizationLog Analytics Optimization
Log Analytics Optimization
 
What is New in Apache Hive 3.0?
What is New in Apache Hive 3.0?What is New in Apache Hive 3.0?
What is New in Apache Hive 3.0?
 
Falcon Meetup
Falcon Meetup Falcon Meetup
Falcon Meetup
 
What's New in Apache Hive 3.0?
What's New in Apache Hive 3.0?What's New in Apache Hive 3.0?
What's New in Apache Hive 3.0?
 
Achieving a 360-degree view of manufacturing via open source industrial data ...
Achieving a 360-degree view of manufacturing via open source industrial data ...Achieving a 360-degree view of manufacturing via open source industrial data ...
Achieving a 360-degree view of manufacturing via open source industrial data ...
 

Destaque

Apache NiFi- MiNiFi meetup Slides
Apache NiFi- MiNiFi meetup SlidesApache NiFi- MiNiFi meetup Slides
Apache NiFi- MiNiFi meetup SlidesIsheeta Sanghi
 
Apache NiFi in the Hadoop Ecosystem
Apache NiFi in the Hadoop EcosystemApache NiFi in the Hadoop Ecosystem
Apache NiFi in the Hadoop EcosystemBryan Bende
 
HDF: Hortonworks DataFlow: Technical Workshop
HDF: Hortonworks DataFlow: Technical WorkshopHDF: Hortonworks DataFlow: Technical Workshop
HDF: Hortonworks DataFlow: Technical WorkshopHortonworks
 
Hortonworks Data in Motion Webinar Series Part 7 Apache Kafka Nifi Better Tog...
Hortonworks Data in Motion Webinar Series Part 7 Apache Kafka Nifi Better Tog...Hortonworks Data in Motion Webinar Series Part 7 Apache Kafka Nifi Better Tog...
Hortonworks Data in Motion Webinar Series Part 7 Apache Kafka Nifi Better Tog...Hortonworks
 
Apache NiFi Toronto Meetup
Apache NiFi Toronto MeetupApache NiFi Toronto Meetup
Apache NiFi Toronto MeetupHortonworks
 
Design a Dataflow in 7 minutes with Apache NiFi/HDF
Design a Dataflow in 7 minutes with Apache NiFi/HDFDesign a Dataflow in 7 minutes with Apache NiFi/HDF
Design a Dataflow in 7 minutes with Apache NiFi/HDFHortonworks
 
Extending the Yahoo Streaming Benchmark + MapR Benchmarks
Extending the Yahoo Streaming Benchmark + MapR BenchmarksExtending the Yahoo Streaming Benchmark + MapR Benchmarks
Extending the Yahoo Streaming Benchmark + MapR BenchmarksJamie Grier
 
Integrating Apache NiFi and Apache Flink
Integrating Apache NiFi and Apache FlinkIntegrating Apache NiFi and Apache Flink
Integrating Apache NiFi and Apache FlinkHortonworks
 
From stream to recommendation using apache beam with cloud pubsub and cloud d...
From stream to recommendation using apache beam with cloud pubsub and cloud d...From stream to recommendation using apache beam with cloud pubsub and cloud d...
From stream to recommendation using apache beam with cloud pubsub and cloud d...Neville Li
 
Reference architecture for Internet of Things
Reference architecture for Internet of ThingsReference architecture for Internet of Things
Reference architecture for Internet of ThingsSujee Maniyam
 
Apache Atlas: Tracking dataset lineage across Hadoop components
Apache Atlas: Tracking dataset lineage across Hadoop componentsApache Atlas: Tracking dataset lineage across Hadoop components
Apache Atlas: Tracking dataset lineage across Hadoop componentsDataWorks Summit/Hadoop Summit
 
Apache Beam and Google Cloud Dataflow - IDG - final
Apache Beam and Google Cloud Dataflow - IDG - finalApache Beam and Google Cloud Dataflow - IDG - final
Apache Beam and Google Cloud Dataflow - IDG - finalSub Szabolcs Feczak
 
Apache Beam: A unified model for batch and stream processing data
Apache Beam: A unified model for batch and stream processing dataApache Beam: A unified model for batch and stream processing data
Apache Beam: A unified model for batch and stream processing dataDataWorks Summit/Hadoop Summit
 
Real-Time Data Flows with Apache NiFi
Real-Time Data Flows with Apache NiFiReal-Time Data Flows with Apache NiFi
Real-Time Data Flows with Apache NiFiManish Gupta
 
Using Hadoop as a platform for Master Data Management
Using Hadoop as a platform for Master Data ManagementUsing Hadoop as a platform for Master Data Management
Hortonworks DataFlow & Apache NiFi @Oslo Hadoop Big Data

  • 1. Hortonworks DataFlow: Enterprise Data Flow powered by Apache NiFi. Mats Johansson, Solutions Engineer - EMEA. © Hortonworks Inc. 2011 – 2015. All Rights Reserved
  • 2. Disclaimer: This document may contain product features and technology directions that are under development, may be under development in the future, or may ultimately not be developed. Project capabilities are based on information that is publicly available within the Apache Software Foundation project websites ("Apache"). Progress of the project capabilities can be tracked from inception to release through Apache; however, technical feasibility, market demand, user feedback, and the overarching Apache Software Foundation community development process can all affect timing and final delivery. This document's description of these features and technology directions does not represent a contractual commitment, promise, or obligation from Hortonworks to deliver these features in any generally available product. Product features and technology directions are subject to change and must not be included in contracts, purchase orders, or sales agreements of any kind. Since this document contains an outline of general product development plans, customers should not rely upon it when making purchasing decisions.
  • 3. IoAT Data Grows Faster Than We Consume It
      - Much of the new data exists in-flight, between systems and devices, as part of the Internet of Anything.
      - The Opportunity: unlock transformational business value from a full fidelity of data and analytics for all data.
      - Chart labels: NEW, TRADITIONAL.
      - Traditional Data Sources: ERP, CRM, SCM.
      - Internet of Anything: geolocation, server logs, files & emails, sensors and machines, clickstream, social media.
  • 4. Internet of Anything Is Driving New Requirements
      - Need trusted insights from data at the very edge to the data lake in real time with full fidelity: data generated by sensors, machines, geolocation devices, logs, clickstreams, social feeds, etc.
      - Modern applications need access to both data-in-motion and data-at-rest.
      - IoAT data flows are multi-directional and point-to-point: very different from existing ETL, data movement, and streaming technologies, which are generally one direction.
      - The perimeter is outside the data center and can be very jagged: this "Jagged Edge" creates new opportunity for security, data protection, data governance, and provenance.
  • 5. Architectural Limitations Today
      - Traditional data movement software has been built for the world of standardized data and one-way flows.
      - Tools built for newer types of data tend to be custom, difficult to manage, and architecturally disjoint.
      - Businesses cannot easily collect, conduct, and curate secure multi-directional and point-to-point IoAT data flows.
      - IoAT data flows are not optimized, use costly or limited bandwidth, and cannot dynamically prioritize the most valuable data.
      - It is difficult to gain actionable insights from the combination of data-in-motion and data-at-rest.
  • 6. Introducing Hortonworks DataFlow
      - Hortonworks DataFlow and the Hortonworks Data Platform deliver the industry's most complete solution for management of Big Data.
      - Diagram (The IoAT Data Flow), labels: Internet of Anything; Hortonworks DataFlow powered by Apache NiFi; Hortonworks Data Platform powered by Apache Hadoop; Perishable Insights; Historical Insights; Enrich Context; Store Data and Metadata.
  • 7. Simplistic View of IoAT & Data Flow. Diagram labels: Acquire Data; The Data Flow Thing; Process and Analyze Data; Store Data.
  • 8. Realistic View of IoAT and Data Flow: global interactions with customers, business partners, and things, spanning different volume, velocity, bandwidth, and latency needs.
  • 9. Meeting IoAT Edge Requirements
      - GATHER, DELIVER, PRIORITIZE: track from the edge through to the datacenter.
      - Small footprints: operate with very little power.
      - Limited bandwidth: can create high latency.
      - Data availability: exceeds transmission bandwidth.
      - Data must be secured throughout its journey.
  • 10. Dataflow Requirements Within the Data Center
      - Understanding: ability to observe precisely how systems exchange data, in real time and historically.
      - Agility: ability to interact with and alter live flows and iterate on new ones.
      - Dynamic access controls: the entitlements of users and systems and the sensitivity of data can change frequently.
      - Cross-cutting concerns: address common needs once, such as enrichment, filtering, and transformation.
      - Enable architecture transition: legacy vs. modern is an "always" event. Format, schema, and protocol conversion is routine.
  • 11. Apache NiFi: Collect, Conduct, Curate
      - Collect: Bring Together (logs, files, feeds, sensors). Aggregate all IoAT data from sensors, geolocation devices, machines, logs, files, and feeds via a highly secure lightweight agent.
      - Conduct: Mediate the Data Flow (deliver, secure, govern, audit). Mediate point-to-point and bi-directional data flows, delivering data reliably to real-time applications and storage platforms such as HDP.
      - Curate: Gain Insights (parse, filter, transform, fork, clone). Parse, filter, join, transform, fork, and clone data in motion to empower analytics and perishable insights.
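The "Curate" operations on slide 11 can be pictured with a small, self-contained sketch. This is plain Python and does not use any NiFi API; the function names (`parse`, `keep`, `transform`, `fork`) and the sensor records are hypothetical and chosen only for illustration.

```python
import json


def parse(lines):
    """Parse: turn raw JSON lines into dictionaries."""
    for line in lines:
        yield json.loads(line)


def keep(records, predicate):
    """Filter: pass along only the records that satisfy the predicate."""
    return (record for record in records if predicate(record))


def transform(records):
    """Transform: add a Fahrenheit reading next to the Celsius one."""
    for record in records:
        yield {**record, "temp_f": record["temp_c"] * 9 / 5 + 32}


def fork(records, predicate):
    """Fork: split one stream into two lists based on the predicate."""
    matched, rest = [], []
    for record in records:
        (matched if predicate(record) else rest).append(record)
    return matched, rest


# Hypothetical sensor readings; the third has no usable value and is filtered out.
raw = [
    '{"sensor": "a", "temp_c": 20}',
    '{"sensor": "b", "temp_c": 100}',
    '{"sensor": "c", "temp_c": null}',
]
readings = transform(keep(parse(raw), lambda r: r["temp_c"] is not None))
hot, normal = fork(readings, lambda r: r["temp_c"] >= 50)
```

In a real flow each of these steps would be a configured processor rather than a function call; the sketch only shows how the operations compose.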
  • 12. A Brief History of Apache NiFi
      - 2006: NiagaraFiles (NiFi) was first conceived by Joe Witt at the National Security Agency (NSA).
      - November 2014: NiFi is donated to the Apache Software Foundation (ASF) through the NSA's Technology Transfer Program and enters the ASF incubator.
      - July 2015: NiFi reaches ASF top-level project status.
  • 13. Apache NiFi: Three Key Concepts
      - Manage the flow of information
      - Data provenance
      - Secure the control plane and data plane
  • 14. Apache NiFi: Key Features
      - Guaranteed delivery
      - Data buffering: backpressure, pressure release
      - Prioritized queuing
      - Flow-specific QoS: latency vs. throughput, loss tolerance
      - Data provenance
      - Recovery/recording: a rolling log of fine-grained history
      - Visual command and control
      - Flow templates
      - Pluggable/multi-role security
      - Designed for extension
      - Clustering
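Two of the features on slide 14, backpressure and prioritized queuing, can be illustrated with a toy bounded queue. This is a minimal sketch in plain Python, not NiFi's implementation; `BoundedPriorityQueue`, `max_items`, `offer`, and `poll` are hypothetical names.

```python
import heapq
import itertools


class BoundedPriorityQueue:
    """Toy connection queue: highest priority first, refuses input when full."""

    def __init__(self, max_items):
        self.max_items = max_items        # backpressure threshold (hypothetical name)
        self._heap = []
        self._order = itertools.count()   # tie-breaker keeps FIFO order within a priority

    def offer(self, item, priority):
        """Return False (signal backpressure) instead of accepting when full."""
        if len(self._heap) >= self.max_items:
            return False
        # heapq is a min-heap, so negate the priority to pop the highest first.
        heapq.heappush(self._heap, (-priority, next(self._order), item))
        return True

    def poll(self):
        """Remove and return the highest-priority item, or None if empty."""
        if not self._heap:
            return None
        return heapq.heappop(self._heap)[2]


queue = BoundedPriorityQueue(max_items=2)
accepted = [
    queue.offer("routine log", 1),
    queue.offer("alarm", 9),
    queue.offer("debug trace", 0),   # rejected: the queue already holds two items
]
drained = [queue.poll(), queue.poll(), queue.poll()]
```

The producer sees `False` once the threshold is reached and has to slow down or hold the data, which is the essence of backpressure; the consumer always receives the most important item first.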
  • 15. Common Apache NiFi Use Cases
      - Predictive analytics: ensure the highest-value data is captured and available for analysis.
      - Compliance: gain full transparency into the provenance and flow of data.
      - IoT optimization: secure, prioritize, enrich, and trace data at the edge.
      - Fraud detection: move sales transaction data in real time to analyze on demand.
      - Big Data ingest: easily and efficiently ingest data into Hadoop.
      - Value resources: gain visibility into how data sources are used to determine value.
  • 16. Flow Based Programming (FBP)
      FBP Term           | NiFi Term          | Description
      Information Packet | FlowFile           | Each object moving through the system.
      Black Box          | FlowFile Processor | Performs the work, doing some combination of data routing, transformation, or mediation between systems.
      Bounded Buffer     | Connection         | The linkage between processors, acting as queues and allowing various processes to interact at differing rates.
      Scheduler          | Flow Controller    | Maintains the knowledge of how processes are connected, and manages the threads and allocations thereof which all processes use.
      Subnet             | Process Group      | A set of processes and their connections, which can receive and send data via ports. A process group allows creation of an entirely new component simply by composition of its components.
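The mapping on slide 16 can be made concrete with a few lines of plain Python. This is a sketch of the flow-based programming idea only; the classes and functions below (`FlowFile`, `Connection`, `uppercase_processor`, `run_once`) are hypothetical stand-ins and not NiFi's Java API.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class FlowFile:
    """Information packet: content plus key/value attributes."""
    content: bytes
    attributes: dict = field(default_factory=dict)


class Connection:
    """Buffer between processors, modeled here as a simple FIFO queue."""

    def __init__(self):
        self._queue = deque()

    def put(self, flowfile):
        self._queue.append(flowfile)

    def take(self):
        return self._queue.popleft() if self._queue else None


def uppercase_processor(flowfile):
    """Black box: transforms the content and records that it did so."""
    new_attributes = dict(flowfile.attributes, transformed="true")
    return FlowFile(flowfile.content.upper(), new_attributes)


def run_once(source, processor, sink):
    """Scheduler step: move one FlowFile from source to sink via the processor."""
    flowfile = source.take()
    if flowfile is None:
        return False
    sink.put(processor(flowfile))
    return True


inbound, outbound = Connection(), Connection()
inbound.put(FlowFile(b"hello", {"filename": "a.txt"}))
while run_once(inbound, uppercase_processor, outbound):
    pass
result = outbound.take()
```

A process group would correspond to wrapping several such processors and connections behind a single `put`/`take` interface.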
  • 17. Hortonworks DataFlow, powered by Apache NiFi
      - Visual user interface: HTML5, drag and drop, for agile execution.
      - High throughput, low bandwidth: for any data, big or small.
      - Provenance metadata: for governance and compliance.
      - Secure end-to-end data routing: with encryption and compression.
  • 18. Basics of Connecting Systems. For every connection between a producer (P1) and a consumer (C1), these must agree:
      1. Protocol
      2. Format
      3. Schema
      4. Priority
      5. Size of event
      6. Frequency of event
      7. Authorization access
      8. Relevance
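The eight points of agreement on slide 18 can be read as a checklist that a producer and a consumer either satisfy or not. The sketch below is a hypothetical illustration in plain Python; the property keys and the example values are assumptions made up for this example.

```python
# The eight properties from the slide, as dictionary keys (names are hypothetical).
PROPERTIES = (
    "protocol",
    "format",
    "schema",
    "priority",
    "event_size",
    "event_frequency",
    "authorization",
    "relevance",
)


def mismatches(producer, consumer):
    """Return the properties on which the two sides do not agree."""
    return [name for name in PROPERTIES if producer.get(name) != consumer.get(name)]


# Made-up example: the two sides disagree on format and event frequency.
producer = {
    "protocol": "https", "format": "json", "schema": "v2", "priority": "high",
    "event_size": "small", "event_frequency": "per-second",
    "authorization": "team-a", "relevance": "all",
}
consumer = dict(producer, format="avro", event_frequency="per-minute")

gaps = mismatches(producer, consumer)
```

Every entry in `gaps` is something that has to be mediated between the two systems, for example by converting the format or batching events.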
  • 19. Using Messaging. Only a subset agree when using messaging (diagram: producer P1, Messaging, consumers C1 through CN):
      1. Protocol
      2. Format
      3. Schema
      4. Priority
      5. Size of event
      6. Frequency of event
      7. Authorization access
      8. Relevance
      More issues to consider:
      - How do you know what the data flow looks like?
      - How is it managed?
      - How is it working, today and yesterday?
  • 20. Using an Enterprise Service Bus (ESB). Still, only a subset agree when using an ESB (diagram: producer P1, Broker, Messaging, consumers C1 through CN):
      1. Protocol
      2. Format
      3. Schema
      4. Priority
      5. Size of event
      6. Frequency of event
      7. Authorization access
      8. Relevance
      Even more issues to consider:
      - Remote procedure calls (RPC) and throughput issues are introduced.
      - Design and deploy management: slow setup, not interactive.
      - You can scale out, but not up or down.
      - You still don't know what the data flow looks like.
  • 21. Architecture
      - Diagram, NiFi node (shown for a standalone instance and repeated for each cluster node): OS/Host; JVM; Web Server; Flow Controller; Processor 1 through Extension N; FlowFile Repository; Content Repository; Provenance Repository; Local Storage.
      - Diagram, cluster: Master: NiFi Cluster Manager (NCM) on its own OS/Host and JVM, with a Web Server and a Request Replicator. Slaves: NiFi Nodes.
      - High availability: control plane vs. data plane…
  • 22. Define a Hortonworks DataFlow: easy-to-use drag-and-drop UI; flexible to define the data flow.
  • 23. HDF: Powered by Apache NiFi
  • 24. Add processor for data intake. Step 1: Drag and drop the processor icon from the top menu.
  • 25. Choose the specific processor. Step 2: Choose one of the processors (currently 90 available; designed for extension).
  • 26. Example: Pick Twitter Processor
  • 27. Configure the processor. Step 3: Select the processor and choose the option to Configure. Step 4: Adjust parameters as required.
  • 28. Another processor for data output. Step 5: Drag and drop the processor icon from the top menu. Step 6: Example: choose the PutHDFS processor.
  • 29. Configure second processor. Step 7: Configure the 2nd processor.
  • 30. Step 8: Connect processors, configure connection.
  • 31. Step 9: Click Start to begin processing.
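The walkthrough so far (an intake processor, an output processor, a connection between them, then Start) can be mimicked with a toy in-memory model. This is only a conceptual sketch: the `Flow` class and the sample records are hypothetical and do not reflect how NiFi itself is implemented.

```python
from collections import deque


class Flow:
    """Toy model: an intake step and an output step joined by a queue."""

    def __init__(self, intake, output):
        self.intake = intake        # callable returning an iterable of records
        self.output = output        # callable that accepts one record
        self.connection = deque()   # the queue between the two processors
        self.running = False

    def start(self):
        self.running = True

    def run_once(self):
        """Move one batch through the flow; do nothing unless started."""
        if not self.running:
            return 0
        for record in self.intake():
            self.connection.append(record)
        delivered = 0
        while self.connection:
            self.output(self.connection.popleft())
            delivered += 1
        return delivered


written = []  # stands in for the output system
flow = Flow(intake=lambda: ["tweet-1", "tweet-2"], output=written.append)
flow.start()
print(flow.run_once(), written)  # 2 ['tweet-1', 'tweet-2']
```

Keeping the connection as a separate queue is what lets the two ends run at different speeds, which the later slides on queuing and prioritization build on.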
  • 32. See processors update with real-time changes. Step 10: As data flows, the GUI updates in real time.
  • 33. Dynamically adjust and tune data flow as needed. Step 11: Dynamically adjust and tune the dataflow as needed, in real time. Can also replicate data for testing and comparison.
  • 34. Understand the data path with Data Provenance. Step 14: Select Data Provenance.
  • 35. Trace lineage of a particular piece of data. Step 15: Icon for Data Lineage.
  • 36. Every change to data is tracked: processing, views. Step 16: Provenance event is tracked.
  • 37. Updates as changes happen. Step 17: Updates as data flows.
  • 38. Easily access and trace changes to dataflow
  • 39. Audit trail of Hortonworks DataFlow user actions
  • 40. NiFi is complementary to Hadoop. Operations: Deployment flexibility from devices to data center. Delivers data flow QoS across dimensions such as loss-tolerant vs. guaranteed delivery, low latency vs. high throughput, and priority-based queuing. Governance: Starting at the source, captures fine-grained metadata regarding all data received, forked, joined, cloned, modified, sent, and ultimately dropped as data reaches its configured end state, delivering comprehensive governance (also known as provenance or chain of custody). Security: Secures the data movement from beginning to end. Allows fine-grained data authorization policies to be enforced at the flow level.
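Priority-based queuing, one of the QoS dimensions named on this slide, can be sketched with a small heap-backed queue. The `PriorityConnection` class and the sample items below are hypothetical; this illustrates the general idea rather than NiFi's own prioritizers.

```python
import heapq
from itertools import count


class PriorityConnection:
    """Queue that releases the lowest priority number first."""

    def __init__(self):
        self._heap = []
        self._order = count()  # tie-breaker: FIFO within the same priority

    def put(self, priority, item):
        heapq.heappush(self._heap, (priority, next(self._order), item))

    def get(self):
        return heapq.heappop(self._heap)[2]

    def __len__(self):
        return len(self._heap)


q = PriorityConnection()
q.put(5, "bulk log batch")
q.put(1, "alarm event")
q.put(5, "second log batch")
drained = [q.get() for _ in range(len(q))]
print(drained)  # ['alarm event', 'bulk log batch', 'second log batch']
```

The insertion counter keeps items of equal priority in arrival order, so the urgent event jumps ahead without reordering the bulk traffic.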
  • 41. Operations. Reporting tasks (push). Statistics / status (pull). Dynamic flow changes: push new business rules via REST API (closed loop); pull updates periodically from web services. Site-to-site: stay at the 'flow level' instead of suddenly dropping down to file transfer protocols. Extensible. Optimized user experience: log hunts should be the exception. Scale down, up, and out, in containers and on virtual machines.
  • 42. The Need for Data Provenance. For operators: traceability and lineage; recovery and replay. For compliance: audit trail. For business: value sources; value IT investment. (Diagram: lineage running from BEGIN to END.)
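To make traceability and lineage concrete, the sketch below walks a list of provenance-style events backwards from one piece of data to its sources. The `ProvenanceEvent` record, its fields, and the sample events are assumptions made for illustration; they are not NiFi's provenance schema or API.

```python
from collections import deque
from dataclasses import dataclass


@dataclass(frozen=True)
class ProvenanceEvent:
    """Hypothetical event: what happened, to which item, derived from what."""
    event_type: str
    item_id: str
    parent_ids: tuple = ()


def lineage(events, item_id):
    """Return the item and all of its ancestors, nearest ancestors first."""
    parents = {}
    for event in events:
        parents.setdefault(event.item_id, []).extend(event.parent_ids)
    seen = {item_id}
    order = [item_id]
    todo = deque([item_id])
    while todo:  # breadth-first walk up the parent links
        current = todo.popleft()
        for parent in parents.get(current, []):
            if parent not in seen:
                seen.add(parent)
                order.append(parent)
                todo.append(parent)
    return order


events = [
    ProvenanceEvent("received", "a"),
    ProvenanceEvent("received", "b"),
    ProvenanceEvent("joined", "c", ("a", "b")),
    ProvenanceEvent("cloned", "d", ("c",)),
    ProvenanceEvent("sent", "d"),
]
print(lineage(events, "d"))  # ['d', 'c', 'a', 'b']
```

Recovery and replay rely on the same recorded links: once the ancestors are known, an operator can go back to an earlier item and send it through the flow again.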
  • 43. Extending Data Governance from the Edge to Hadoop. Data governance requirements. Transparent: governance standards and protocols must be clearly defined and available to all. Reproducible: recreate the relevant data landscape at a given point in time. Auditable: trace all relevant events and assets with appropriate historical lineage. Consistent: compliance practices must be consistent. Must snap into existing data governance frameworks and openly exchange metadata. (Diagram labels: Internet of Anything; Traditional Data Systems (SCM, CRM, ERP); ETL / DQ, MDM, Archive; Hadoop Data Platform; Holistic Data Governance; Business Analytics; Visualization & Dashboards.)
  • 44. The Need for Fine-grained Security and Compliance. It's not enough to say you have encrypted communications. Enterprise authorization services: entitlements change often. People and systems with different roles require different access levels. Tagged/classified data.
  • 45. Security. Administration (central management and consistent security): NiFi Cluster Manager. Authentication (authenticate users and systems): 2-way SSL support out of the box; additional types coming. Authorization (provision access to data): pluggable authorization designed to fit any Identity and Access Management (IAM) scheme; file-based authority provider out of the box; multi-role. Audit (maintain a record of data access): detailed logging of all user actions; detailed logging of key system behaviors; Data Provenance enables unparalleled tracking from the edge through the Lake. Data Protection (protect data at rest and in motion): support a variety of SSL/encrypted protocols; tag and utilize tags on data for fine-grained access controls; encrypt/decrypt content using pre-shared key mechanisms. Roles. Administrator: configure system threads, user accounts, and flow audit history. Data Flow Manager: manipulate the dataflow. Read Only: view the dataflow only. NiFi: configure system threads, user accounts, and flow audit history. Proxy: manipulate the dataflow. Provenance: query the provenance repository and download content.
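The multi-role authorization idea on this slide can be sketched as a lookup from roles to permitted actions. The role names come from the slide. The action names, the `ROLE_PERMISSIONS` table, and the `is_allowed` helper are hypothetical. This is not NiFi's authority provider, and it assumes roles grant nothing beyond what the slide states.

```python
# Role names from the slide; action names are made up for this sketch.
ROLE_PERMISSIONS = {
    "Administrator": {"configure_system"},
    "Data Flow Manager": {"modify_flow"},
    "Read Only": {"view_flow"},
    "Provenance": {"query_provenance", "download_content"},
}


def is_allowed(user_roles, action):
    """A user may act if any one of their roles grants the action."""
    return any(action in ROLE_PERMISSIONS.get(role, set())
               for role in user_roles)


analyst_roles = {"Read Only", "Provenance"}
print(is_allowed(analyst_roles, "query_provenance"))  # True
print(is_allowed(analyst_roles, "modify_flow"))       # False
```

Because a user can hold several roles at once, access is the union of what each role grants, which is why the check uses `any` across roles.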
  • 47. Operations: Planned
  • 50. Planned Apache NiFi Enhancements. In progress: enhanced configuration management of flows. Started: extension and template registry. Targeted to NiFi 0.4.0 release: first-class Avro support. Started: interactive queue management. Started: multi-tenant data flow. Future: pluggable authentication. Future: reference-able process groups. Future: variable registry. See https://cwiki.apache.org/confluence/display/NIFI/NiFi+Feature+Proposals
  • 51. Try It Yourself: download NiFi and the HDP Sandbox from hortonworks.com/sandbox. Tweet: #hadooproadshow
  • 52. Thank you! Mats Johansson, mjohansson@hortonworks.com, @matsjo66, https://se.linkedin.com/in/matsjo66