HDF 3.1: An Introduction to New Features
1. 1 © Hortonworks Inc. 2011–2018. All rights reserved.
© Hortonworks, Inc. 2011-2018. All rights reserved. | Hortonworks confidential and proprietary information.
Hortonworks Data Flow 3.1
Timothy Spann, Solutions Engineer
Hortonworks @PaaSDev
2.
Disclaimer
• This document may contain product features and technology directions that are under
development, may be under development in the future or may ultimately not be
developed.
• Technical feasibility, market demand, user feedback, and the Apache Software
Foundation community development process can all affect timing and final delivery.
• This document’s description of these features and technology directions does not
represent a contractual commitment, promise or obligation from Hortonworks to deliver
these features in any generally available product.
• Product features and technology directions are subject to change, and must not be
included in contracts, purchase orders, or sales agreements of any kind.
• Since this document contains an outline of general product development plans,
customers should not rely upon it when making a purchase decision.
3.
Global Data Management With Hortonworks
MULTIPLE CLUSTERS AND SOURCES
MULTIHYBRID
DATAPLANE SERVICE (DPS): MANAGE, GOVERN, SECURE
• Data Lifecycle Manager
• Data Steward Studio*
• ISV Services
EXTENSIBLE SERVICES
• IBM DSX*
• Cloudbreak*
• Data Analytics Studio*
CONNECTED DATA PLATFORMS
• Hortonworks Data Platform (HDP®): data-at-rest
• Hortonworks DataFlow (HDF™): data-in-motion
MODERN DATA USE CASES
• EDW Optimization
• Cyber Security
• Data Science
• Advanced Analytics
• Partner Solutions
• IoT/Streaming Analytics
HORTONWORKS CONNECTION
• Enterprise Support
• Premier Support
• Educational Services
• Professional Services
• Community Connection
HORTONWORKS PLATFORM SERVICES
• Operational Services
• SmartSense™
*not yet available, coming soon
4.
HDF Data-In-Motion Platform – with HDF 3.1 GA Release
5.
HDF 3.1 New and Enhanced Features
Ease of Use
Core Enhancements
Cross-Product Integration
Flow Management
Stream Processing
• NiFi-Atlas, -SmartSense, and -Knox integration (HDF on HDP scenario only)
• NiFi-Ranger: Group-based policy support for NiFi resources
• New SAM operations module
• SAM “Test Mode”
• Kafka 1.0 Support
• Schema Registry
• Schema Version Lifecycle Mgmt.
• SAM extensibility improvements
• Ambari and Ranger support for Kafka 1.0
• Improved Ambari experience: Automate adding NiFi nodes to an existing cluster
• Apache NiFi Registry (new)
• Flow migration and version control
• MiNiFi C++, Java, Android/iOS libraries GA
• Containerized deployment (Docker)
6.
Improved Operational Efficiency
MiNiFi C++ Agent
MiNiFi C++ offers many configuration options; depending on the use case, they can help with:
• Minimizing memory footprint
• Lowering CPU consumption
• Reducing size on disk
https://community.hortonworks.com/articles/167193/building-and-running-minifi-cpp-in-orangepi-zero.html
7.
Integrated Provisioning and Security
Kafka 1.0 Support
To enhance data governance and lineage, users can now manage access control policies using resource-based or tag-based security in Ranger for Kafka 1.0 clusters.
Users can now install, configure, manage, upgrade,
monitor, and secure Kafka 1.0 clusters with Ambari.
New processors in NiFi and Streaming Analytics
Manager support Kafka 1.0 features including message
headers and transactions.
8.
When HDF is co-located with HDP…
Integrations with Atlas, Knox and SmartSense
9.
220+ Processors for Deeper Ecosystem Integration
• Hash, Extract, Merge, Duplicate, Scan, GeoEnrich, Replace, Convert, Split, Translate, Route Content, Route Context, Route Text, Control Rate, Distribute Load, Generate Table Fetch, Jolt Transform JSON, Prioritized Delivery, Encrypt, Tail, Evaluate, Execute
• Fetch, HTTP, Syslog, Email, HTML, Image, HL7, FTP, UDP, XML, SFTP, AMQP, WebSocket
All Apache project logos are trademarks of the ASF and the respective projects.
10.
HDF 3.1 for Big Data Engineers
Multiple users, frameworks, languages, data sources & clusters
BIG DATA ENGINEER
• Experience in ETL
• Coding skills in Scala, Python, Java
• Experience with Apache Hadoop
• Knowledge of tools such as Hive, Flume or Pig
• Knowledge of SQL
• Expert in ETL (Eating, Ties and Laziness)
• Social Media Maven
• Deep SME in Buzzwords
• No Coding Skills
• Interest in Pig and Falcon
CAT AI
• Will Drive Your Car
• Will Fix Your Code
• Will Not Be Discussed Today
• Will Not Finish This Talk For Me, This Time
11.
Collect: Bring Together
Aggregate all data from sensors, drones, logs, geo-location devices, machines and social feeds.
Conduct: Mediate the Data Flow
Mediate point-to-point and bi-directional data flows, delivering data reliably to Apache HBase, Apache Hive, HDFS, Slack and email.
Curate: Gain Insights
Parse, filter, join, transform, fork, query, sort, dissect, and enrich with weather, location, Apache OpenNLP and Apache MXNet.
12.
DataFlow Registry
• NiFi Flow Registry
• Standalone application/service (URL)
• Standard API with pluggable components
• Design and deploy mechanism for flow migration (SDLC) use cases
Diagram: NiFi (Dev), NiFi (QA), NiFi (PROD) and multiple MiNiFi agents register and deploy flows through the Flow Registry API, which is backed by persistence and other services.
13.
Powerful Pattern with Kafka Headers: Pass Schema Key in Kafka Header
• Truck Geo Sensor and Truck Speed Sensor publish CSV events to the Kafka topic raw-all_truck_events_csv, with schema metadata from the Centralized Schema Repository (SR) stored in a Kafka header
• Data movement and processing by NiFi using the new record-based processing
Kafka Event with Header Published by the Sensor Producing App
• Kafka header: key schema.name, holding the metadata needed to look up the schema in the HWX SR
• Kafka payload: CSV binary event
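The header pattern can be sketched without a broker. This is a minimal illustration, assuming a record modelled as a plain dict whose headers are Kafka-style (key, bytes) pairs, and a hypothetical in-memory dict standing in for the Schema Registry; the schema name `truck_events` and its field list are invented for the example. The consumer reads `schema.name` from the header, looks up the field names, and only then parses the CSV payload.

```python
import csv
import io

# Hypothetical stand-in for the Schema Registry: schema name -> ordered field names.
SCHEMA_REGISTRY = {
    "truck_events": ["event_time", "truck_id", "driver_id", "speed"],
}


def header_value(headers, key):
    """Return the decoded value of the first header matching `key`, or None."""
    for k, v in headers:
        if k == key:
            return v.decode("utf-8")
    return None


def decode_record(record, registry=SCHEMA_REGISTRY):
    """Use the schema.name header to find field names, then parse the CSV payload."""
    schema_name = header_value(record["headers"], "schema.name")
    if schema_name is None:
        raise ValueError("record has no schema.name header")
    fields = registry[schema_name]  # raises KeyError for an unknown schema
    row = next(csv.reader(io.StringIO(record["value"].decode("utf-8"))))
    if len(row) != len(fields):
        raise ValueError(f"expected {len(fields)} fields, got {len(row)}")
    return dict(zip(fields, row))


record = {
    "topic": "raw-all_truck_events_csv",
    "headers": [("schema.name", b"truck_events")],
    "value": b"2018-02-01 10:00:00,10,42,65",
}
print(decode_record(record))
```

Keeping the schema key in the header leaves the CSV payload untouched, so producers do not need to embed schema identifiers in the data itself.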
14.
NiFi and Kafka 1.0 – Use Case for Kafka Message Headers
15.
Grafana & Kafka 1.0 Integration: Monitoring
• Topic-level KPIs
• Broker-level KPIs
16.
Apache Spark Integration
17.
Apache Spark Integration
18.
New: Integrated Registry Service
• Integrated Flow Registry Service
• Shareable between NiFi environments for Dev/UAT/Prod promotion
• API or GUI driven
• Can be integrated with enterprise version control, e.g., GitLab
• ‘Buckets’ of flows for security and access control
SDLC
19.
New: Integrated Variable Registry Service
• Integrated Variable Registry
• Sets of key:value pairs available on every Process Group
• Referenced with NiFi Expression Language
• Dynamically changeable at runtime
• Use within Versioned Flows to set environment variables
• GUI or API driven
SDLC
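The lookup behaviour can be sketched as follows. This is a simplified, hypothetical model, assuming that a variable defined on a child Process Group overrides one of the same name on an ancestor, and handling only plain `${name}` references; real NiFi Expression Language also supports functions such as `${name:toUpper()}`, which this sketch ignores. `ProcessGroup`, `evaluate`, and the variable names are invented for illustration.

```python
import re


class ProcessGroup:
    """Hypothetical model of a NiFi Process Group that carries variables."""

    def __init__(self, name, variables=None, parent=None):
        self.name = name
        self.variables = dict(variables or {})
        self.parent = parent

    def lookup(self, key):
        """Walk from this group up to the root; the nearest definition wins."""
        group = self
        while group is not None:
            if key in group.variables:
                return group.variables[key]
            group = group.parent
        return None


# Matches simple ${name} references only (no EL functions).
_REF = re.compile(r"\$\{([A-Za-z0-9_.\-]+)\}")


def evaluate(expression, group):
    """Substitute ${name} references; unknown names become empty strings in this sketch."""
    def replace(match):
        value = group.lookup(match.group(1))
        return "" if value is None else value
    return _REF.sub(replace, expression)


root = ProcessGroup("root", {"kafka.brokers": "dev-kafka:6667", "env": "dev"})
ingest = ProcessGroup("ingest", {"env": "prod"}, parent=root)
print(evaluate("${env}-${kafka.brokers}", ingest))  # prod-dev-kafka:6667
```

Because the versioned flow only contains `${env}`, the same flow definition can be promoted from Dev to Prod while each environment supplies its own value.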
20.
Regression test with Golden Datasets
• Wrap atomic functions in harnesses for regression testing
• Integrate via the REST API to automate testing through Jenkins, etc.
• Automate triggering tests when new versions are pushed to the Flow Registry
SDLC
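A golden-dataset check can be as simple as comparing the records a harnessed flow produced with a stored expected set. This is a minimal sketch, assuming the test harness writes the flow's output as newline-delimited JSON; the function names are hypothetical. The comparison ignores record order and key order but still counts duplicates.

```python
import json
from collections import Counter


def load_records(text):
    """Parse newline-delimited JSON, skipping blank lines."""
    return [json.loads(line) for line in text.splitlines() if line.strip()]


def canonical(records):
    """Serialize each record with sorted keys so field order does not matter."""
    return [json.dumps(r, sort_keys=True) for r in records]


def compare_to_golden(actual_text, golden_text):
    """Compare flow output to a golden dataset, ignoring record order.

    Returns (missing, unexpected): golden records that were not produced,
    and produced records that are not in the golden set. Duplicates count.
    """
    actual = Counter(canonical(load_records(actual_text)))
    golden = Counter(canonical(load_records(golden_text)))
    missing = sorted((golden - actual).elements())
    unexpected = sorted((actual - golden).elements())
    return missing, unexpected


golden = '{"id": 1, "speed": 65}\n{"id": 2, "speed": 70}\n'
actual = '{"speed": 70, "id": 2}\n{"id": 1, "speed": 66}\n'
missing, unexpected = compare_to_golden(actual, golden)
print("missing:", missing)
print("unexpected:", unexpected)
```

A CI job could fail the build whenever either list is non-empty.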
21.
Build & Test Composite DataFlows
• Nest Versioned Process Groups to test composite functions
• Wrap in test harnesses to validate functionality
• Flow Versioning provides visibility as components of composites are updated
SDLC
22.
New: Design & Deploy complementing Command & Control
• SDLC Dev: Place Process Groups under Version Control
• Make changes and commit to a new version
• Roll versions back or forward
SDLC
23.
New: Design & Deploy complementing Command & Control
• Get notifications of local changes or new versions available in the repository
• Revert or commit local changes via the GUI or REST API
• Use the REST API to integrate with Jenkins, etc.
SDLC
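A CI job can detect a newly committed flow version by polling the registry. This is a sketch under stated assumptions: the REST path `/nifi-registry-api/buckets/{bucket}/flows/{flow}/versions/latest` and the `snapshotMetadata.version` field of the response are assumed from the NiFi Registry REST API and should be checked against your Registry version. No authentication is handled, and the bucket/flow IDs are placeholders. The fetch function is injectable so the logic can be tested without a running Registry.

```python
import json
import urllib.request

# Assumed NiFi Registry REST path for the latest snapshot of a versioned flow;
# verify it against the REST API documentation for your Registry version.
LATEST_PATH = "/nifi-registry-api/buckets/{bucket}/flows/{flow}/versions/latest"


def latest_version_url(base_url, bucket_id, flow_id):
    """Build the URL of the latest flow snapshot in a bucket."""
    return base_url.rstrip("/") + LATEST_PATH.format(bucket=bucket_id, flow=flow_id)


def http_get_json(url, timeout=10):
    """GET a URL and decode the JSON body (no authentication handling)."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.load(resp)


def new_version_available(base_url, bucket_id, flow_id, last_seen, fetch=http_get_json):
    """Return the latest version number if it is newer than last_seen, else None."""
    snapshot = fetch(latest_version_url(base_url, bucket_id, flow_id))
    version = snapshot["snapshotMetadata"]["version"]
    return version if version > last_seen else None
```

For example, a scheduled Jenkins step could call `new_version_available("http://registry-host:18080", bucket_id, flow_id, last_seen)` and start the regression tests when it returns a number. `registry-host`, the IDs, and how `last_seen` is persisted between runs are placeholders.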
29.
Lifecycle Action 1: Fork Schema Version to a Branch Called Dev
Schema Registry
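The idea of forking a schema version into a branch can be illustrated with a small in-memory model. This is a hypothetical sketch, not the Schema Registry API: it assumes a default branch named `MASTER`, version numbers that are unique across all branches of a schema, and a fork that copies the history up to the chosen version. `Schema`, `add_version`, and `fork` are invented names.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class SchemaVersion:
    version: int
    text: str


@dataclass
class Schema:
    """Hypothetical model: one schema whose versions are grouped into branches."""
    name: str
    branches: dict = field(default_factory=lambda: {"MASTER": []})

    def add_version(self, text, branch="MASTER"):
        """Append a new version to a branch; numbers are unique across branches."""
        existing = [v.version for versions in self.branches.values() for v in versions]
        new = SchemaVersion(max(existing, default=0) + 1, text)
        self.branches[branch].append(new)
        return new.version

    def fork(self, version, new_branch, source="MASTER"):
        """Create new_branch from `version` on `source`; later changes stay isolated."""
        if new_branch in self.branches:
            raise ValueError(f"branch {new_branch!r} already exists")
        history = [v for v in self.branches[source] if v.version <= version]
        if not history or history[-1].version != version:
            raise KeyError(f"version {version} is not on branch {source!r}")
        self.branches[new_branch] = list(history)

    def history(self, branch):
        """Return the version numbers on a branch, oldest first."""
        return [v.version for v in self.branches[branch]]


schema = Schema("truck_events")
v1 = schema.add_version("schema text v1")
schema.add_version("schema text v2")
schema.fork(v1, "Dev")
schema.add_version("schema text v3 (dev change)", branch="Dev")
print(schema.history("MASTER"), schema.history("Dev"))  # [1, 2] [1, 3]
```

Work committed on `Dev` never appears on `MASTER`, which is the isolation the fork action is meant to provide.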
30.
Atlas Integration
More Data Set Coverage
Diagram components: AtlasNiFiFlowLineage (ReportingTask), NiFi Flow, NiFi Data Provenance, Kafka topic
1. Static flow lineage from the NiFi flow definition
2. Add DataSet entities from NiFi Data Provenance events
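Step 2 can be sketched as grouping provenance events into per-component inputs and outputs. This is a simplified, hypothetical illustration: it assumes events carry `eventType`, `componentId`, and `transitUri` fields, treats RECEIVE/FETCH as inputs and SEND as outputs, and uses invented component IDs and URIs. The actual reporting task builds typed Atlas entities rather than raw URI sets.

```python
from collections import defaultdict

# Simplified, hypothetical mapping from provenance event types to lineage direction.
INPUT_EVENT_TYPES = {"RECEIVE", "FETCH"}
OUTPUT_EVENT_TYPES = {"SEND"}


def dataset_lineage(events):
    """Group dataset URIs into inputs and outputs per NiFi component ID."""
    lineage = defaultdict(lambda: {"inputs": set(), "outputs": set()})
    for event in events:
        event_type = event["eventType"]
        if event_type in INPUT_EVENT_TYPES:
            lineage[event["componentId"]]["inputs"].add(event["transitUri"])
        elif event_type in OUTPUT_EVENT_TYPES:
            lineage[event["componentId"]]["outputs"].add(event["transitUri"])
        # Other event types (e.g. ATTRIBUTES_MODIFIED) carry no external dataset.
    return dict(lineage)


events = [
    {"eventType": "RECEIVE", "componentId": "consume-sensors",
     "transitUri": "PLAINTEXT://kafka-host:6667/sensor-data"},
    {"eventType": "RECEIVE", "componentId": "get-tweets",
     "transitUri": "https://twitter-endpoint/tweets"},
    {"eventType": "SEND", "componentId": "put-hive",
     "transitUri": "thrift://hive-host:9083/default.sensor_data"},
    {"eventType": "ATTRIBUTES_MODIFIED", "componentId": "update-attrs",
     "transitUri": None},
]
for component, datasets in dataset_lineage(events).items():
    print(component, sorted(datasets["inputs"]), sorted(datasets["outputs"]))
```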
31.
Atlas Integration
Figure: Atlas lineage graph showing sensor-data, tweets, default.sensor_data, path0, path1 and path2
37.
Resources
https://community.hortonworks.com/articles/161761/new-features-in-apache-nifi-15-apache-nifi-registr.html
https://community.hortonworks.com/articles/171787/hdf-31-executing-apache-spark-via-executesparkinte.html
https://community.hortonworks.com/articles/171960/using-apache-mxnet-on-an-apache-nifi-15-instance-w.html
https://community.hortonworks.com/articles/171893/hdf-31-executing-apache-spark-via-executesparkinte-1.html
38.
Contact
https://github.com/tspannhw/ApacheBigData101/tree/master
https://community.hortonworks.com/users/9304/tspann.html
https://dzone.com/users/297029/bunkertor.html
https://www.meetup.com/futureofdata-princeton/
https://twitter.com/PaaSDev
https://community.hortonworks.com/articles/155435/using-the-new-mxnet-model-server.html
39.
Hortonworks Community Connection
Read access for everyone, join to participate and be recognized
• Full Q&A Platform (like Stack Overflow)
• Knowledge Base Articles
• Code Samples and Repositories
40.
Community Engagement
Participate now at: community.hortonworks.com
4,000+
Registered Users
10,000+
Answers
15,000+
Technical Assets
One Website!