1 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
Create a live dataflow in minutes
How would that change your business?
Add processor for data intake. Time: 1 minute
1 Drag and drop processor from top menu
Choose the specific processor
2 Choose one of the processors – currently 170+ available
Example: Pick Twitter Processor
Configure the processor. Time: 2 minutes
3 Select processor and choose option to Configure
4 Adjust parameters as required
Another processor for data output. Time: 1 minute
5 Drag and drop processor from top menu
6 Filter for and select a “Put” processor
Configure second processor. Time: 1 minute
7 Configure 2nd processor
Connect processors, configure connection. Time: 2 minutes
8 Configure connection
Note: The sample flow shown here uses PutFile rather than the PutHDFS processor from the previous example. The same concepts apply.
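As a rough analogy for the two-processor flow built so far, the Python sketch below wires an intake step to an output step that writes one file per record. It is not NiFi code: `source`, `put_file`, and `run_flow` are hypothetical names chosen for illustration, and the file-per-record behaviour is an assumption about what a file-writing output step does.

```python
import tempfile
from pathlib import Path


def source(messages):
    """Stand-in for an intake processor: yields one named record at a time."""
    for number, text in enumerate(messages):
        yield f"record-{number}.txt", text


def put_file(records, directory):
    """Stand-in for a file-writing output processor: one file per record."""
    directory = Path(directory)
    written = []
    for name, text in records:
        path = directory / name
        path.write_text(text, encoding="utf-8")
        written.append(path.name)
    return written


def run_flow(messages):
    """Connect the two steps and read the files back to check the result."""
    with tempfile.TemporaryDirectory() as out_dir:
        names = put_file(source(messages), out_dir)
        contents = [(Path(out_dir) / n).read_text(encoding="utf-8") for n in names]
    return names, contents


if __name__ == "__main__":
    print(run_flow(["first", "second"]))
```

The generator plays the role of the connection here: the output step pulls records one at a time, so neither side needs to know how the other is implemented.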
Click Start to Begin Processing. Time total: 7 minutes
9 Click start “play” to begin processing
(will run continuously until you select stop)
See Processors Update with Real Time Changes
10 As data flows, the GUI updates in real time.
11 If the destination is stopped or unable to receive, the queue builds up.
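The queue build-up described in step 11 can be illustrated with a small Python sketch. This is a toy model, not NiFi code: the `Connection` class, its `threshold` parameter, and the `offer`/`poll` methods are hypothetical names. It assumes a connection holds items until the destination takes them and refuses new items once a configured limit is reached.

```python
from collections import deque


class Connection:
    """Toy model of a queue between two processors (hypothetical, not the NiFi API)."""

    def __init__(self, threshold):
        self.threshold = threshold  # assumed limit on the number of queued items
        self.queue = deque()

    def offer(self, item):
        """Enqueue an item; return False when the queue is already full."""
        if len(self.queue) >= self.threshold:
            return False
        self.queue.append(item)
        return True

    def poll(self):
        """Hand the oldest item to the destination, or None if the queue is empty."""
        return self.queue.popleft() if self.queue else None


def run(ticks, destination_running, threshold=5):
    """Source emits one item per tick; destination takes one per tick when running."""
    conn = Connection(threshold)
    rejected = 0
    for tick in range(ticks):
        if not conn.offer(f"item-{tick}"):
            rejected += 1  # the source is held back while the queue is full
        if destination_running:
            conn.poll()
    return len(conn.queue), rejected


if __name__ == "__main__":
    print(run(10, destination_running=True))   # queue stays empty
    print(run(10, destination_running=False))  # queue fills, then the source is held back
```

With the destination running, the queue drains as fast as it fills. With the destination stopped, the queue grows to the assumed threshold and further items are refused.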
Dynamically adjust and tune data flow as needed
12 Dynamically configure, start, stop, tune, reroute, change, or pause dataflows as needed.
Powerful Tools to Quickly Replicate, Group, Repurpose, Tune and Test in Real-Time
13 Group multiple processors together to create a process group
14 Create a new template
Provenance Means Real-Time Traceability of:
Data Flow
Data Content
Data Context
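One way to picture these three kinds of traceability is as an append-only log in which each event records what happened and where (flow), the data at that point (content), and its metadata (context). The Python sketch below is a conceptual illustration only: `ProvenanceEvent`, `ProvenanceLog`, and the event-type strings are hypothetical and do not represent how HDF stores provenance.

```python
from dataclasses import dataclass, field


@dataclass
class ProvenanceEvent:
    """One entry in an append-only event log (hypothetical structure)."""
    item_id: str      # which piece of data the event is about
    event_type: str   # flow: what happened (illustrative labels)
    component: str    # flow: which step handled the data
    content: str      # content: the data as it was at that point
    attributes: dict = field(default_factory=dict)  # context: metadata


class ProvenanceLog:
    """Collects events in arrival order and answers lineage queries."""

    def __init__(self):
        self._events = []

    def record(self, item_id, event_type, component, content, **attributes):
        self._events.append(
            ProvenanceEvent(item_id, event_type, component, content, attributes)
        )

    def lineage(self, item_id):
        """Return every event for one piece of data, oldest first."""
        return [e for e in self._events if e.item_id == item_id]


def demo():
    log = ProvenanceLog()
    log.record("a1", "RECEIVE", "source", "hello", origin="example-feed")
    log.record("b2", "RECEIVE", "source", "other")
    log.record("a1", "CONTENT_MODIFIED", "uppercase", "HELLO", origin="example-feed")
    log.record("a1", "SEND", "sink", "HELLO", origin="example-feed")
    return log.lineage("a1")


if __name__ == "__main__":
    for event in demo():
        print(event.event_type, event.component, event.content, event.attributes)
```

Because events are only ever appended, the lineage of a single item is simply the events that carry its identifier, in the order they were recorded.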
Watch Real Time Flow of Data: Data Provenance
15 Select Data Provenance
Trace Lineage of a Particular Piece of Data
16 Icon for Data Lineage
Every Change to Data is Tracked in Real-Time: processing, views
17 Every event is traceable
Real-Time Updates of Dataflow: Traceable Context & Content
18 Know immediately both context and content
Easily access and trace changes to dataflow
Audit trail of Hortonworks DataFlow User Actions
Questions?
Hortonworks Community Connection:
Data Ingestion and Streaming
https://community.hortonworks.com/

Design a Dataflow in 7 minutes with Apache NiFi/HDF


Editor's Notes

  1. HDF supports over 90 different processors to accelerate ingesting and processing data. There are ready-made, off-the-shelf processors for data collection and data processing. For example (in alphabetical order, not necessarily by popularity): EncryptContent, ExecuteFlumeSink, ExecuteFlumeSource, ExecuteSQL, ExtractHL7, GetFTP, GetHTTP, PutKafka, MergeContent, MonitorActivity, PutEmail, PutHDFS, SplitJson, TransformXML.
  2. There are many different processors, some of which are designed to simplify collection of big data from popular data sources. Twitter is one of them. Others include:
  3. This is a unique capability of dataflow: the ability to see processors update in real time. It lets data developers and data scientists quickly verify hypotheses and make timely decisions within the relevant time window.
  4. Once the data flow is established, it can be dynamically manipulated, replicated and transformed. This removes the need to develop code in a test environment and then port it to a production environment. Being able to test immediately within the production environment accelerates the time to insight. All of this is tracked, so when you reach the point of “what did I try before?” or “what happened last time?”, the answer is readily accessible via the GUI.
  5. HDF provides very fine-grained, high-fidelity reporting about the origins of data, how it was used, who used it, etc.