This document provides information about Apache Ranger and Apache Atlas partner ecosystems and integration partnerships. It discusses Hortonworks' partner certification programs for SEC Ready and GOV Ready, and showcases partner technologies that have been integrated and certified with Apache Ranger and Apache Atlas, including from Talend, Arcadia Data, and Protegrity. The document also provides timelines and release information for Apache Ranger and Apache Atlas community development and integration with Hortonworks Data Platform (HDP) releases.
9. Integration Goals
• Support lineage of Talend Studio jobs on Apache Atlas / Hortonworks HDP
• Similar (or improved) functionality to what we offer for other lineage providers
• Lineage for Talend Big Data jobs on both Spark and Hadoop
• Authentication with the lineage backend
• Die-on-error: a lineage failure does not affect job execution
10. Design
Goal: support a similar generic lineage model.
Solution:
• Send the transformation graph representation with each node as a HashMap of properties
• Translate the graph into the given model in an integration layer; for the Atlas case it uses the Atlas REST API via the atlas-client JAR
• Leave the specific lineage provider functionality open for advanced functionality (future roadmap items)
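The generic node-as-property-map idea above can be sketched as follows; the class, field, and method names are illustrative assumptions, not Talend's actual integration code:

```java
import java.util.*;

// Hypothetical sketch: each node of the transformation graph is a plain
// map of properties, and edges are (source, target) name pairs. An
// integration layer would translate this generic form into the Atlas model.
class LineageGraph {
    final List<Map<String, Object>> nodes = new ArrayList<>();
    final List<String[]> edges = new ArrayList<>();

    // Register a component as a free-form property map.
    Map<String, Object> addNode(String name, String type) {
        Map<String, Object> node = new HashMap<>();
        node.put("name", name);
        node.put("type", type);
        nodes.add(node);
        return node;
    }

    // Record a directed edge between two components by name.
    void addEdge(String from, String to) {
        edges.add(new String[]{from, to});
    }
}
```

Because nodes are just maps, any lineage provider can pick out the properties it understands and ignore the rest.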
11. Technical Details – Talend Model for Atlas
Note that the Lineage view only shows entities that are in the “DataSet – Process – DataSet” form. So we had to represent every component as a DataSet (tComponent) and create artificial components (tArtificialComponent) as a Process so we can show them in the Lineage view.
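The workaround above can be sketched like this; only the tComponent/tArtificialComponent type names come from the slide, while the wiring logic and method names are hypothetical:

```java
import java.util.*;

// Sketch: every real component becomes a DataSet, and an artificial
// Process is inserted between each adjacent pair so the resulting chain
// matches Atlas's "DataSet - Process - DataSet" lineage form.
class AtlasChainBuilder {
    // Returns the entity chain, alternating DataSet and Process entries.
    static List<String> toLineageChain(List<String> components) {
        List<String> chain = new ArrayList<>();
        for (int i = 0; i < components.size(); i++) {
            if (i > 0) {
                // artificial Process bridging the two surrounding DataSets
                chain.add("tArtificialComponent:"
                        + components.get(i - 1) + "->" + components.get(i));
            }
            chain.add("tComponent:" + components.get(i)); // component as DataSet
        }
        return chain;
    }
}
```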
12. Technical Details – Open Issues
• The entity connection constraint is our biggest issue.
• Breaking changes on the API (atlas-client 0.8, but compatible with 0.7 through a redirect).
• Inherited properties are shown even if not assigned. This is not an issue in itself, but due to our reuse of DataSet we get cases like this: a DataSet has an owner, but an owner does not make sense for a Talend transform.
• The Atlas model is flexible but strict at the same time: data is constrained to evolve with the metadata, so if we pass new arguments that are not defined in the metadata model they are ignored.
15. Arcadia Data. Proprietary and Confidential
Securing Visual Analytics for Big Data with Apache Ranger
Shant Hovsepian – CTO & Co-Founder
@superdupershant
June 14, 2017
16. Arcadia Visualization Engine
The First Native Visual Analytics Platform for Big Data
[Diagram: drag-and-drop visual analytics & dashboards and custom data applications on top of the Arcadia Analytic Platform (Smart Acceleration™), running on-premises or in hybrid cloud over the big data OS / data platform (distributed execution, data storage, metadata, security).]
IN-CLUSTER ANALYTICS ENGINE: scales linearly with the cluster for speed and easier management.
WEB-BASED INTERFACE: drag & drop interface for visual analytics & app workflow.
18. What is Apache Ranger?
• Centralized authorization and auditing across Hadoop components
• Access authorization based on resources
• Policy-based behavior such as column masking
• Extensible architecture
19. The Value of a Robust Policy Engine
• It’s complicated code to get right
• I am lazy; I don’t want to implement it myself
• Zero-knowledge proofs
20. Native Security Integration
[Diagram: Arcadia analytics platform running directly on HDFS.]
SINGLE COPY OF DATA TO SECURE
• Reduces footprint of data copies with the same or summarized information
• Single policy definition for access control
• Easier compliance
ENTERPRISE GRADE
• Kerberos, LDAPS/AD, PAM and SAML
• Single sign-on for business users
• Role-based access control with delegation
INTEGRATED ROLE-BASED ACCESS
• Use role definitions from Ranger for access at the BI tier
• No risk of mismatched policies between the data management tier and the BI tier
21. Configuration
• Tight integration with Ranger + Ambari makes installation and configuration very easy!
22. Arcadia Data OLAP Engine
• In order to accelerate data access and reporting we have an on-cluster engine
• Cubes are pre-computed and stored in memory and in HDFS via HCatalog
• We had to make sure all Hive catalog accesses were first authorized through Ranger
• A simple implementation just requires an Authorizer class with isAccessAllowed()
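The authorization hook described above might look roughly like this; isAccessAllowed() is the method named in the slide, while the surrounding interface and the in-memory policy store are illustrative stand-ins for the Ranger plugin:

```java
import java.util.*;

// Before any Hive catalog access, the engine asks an Authorizer whether
// the user may perform the action on the resource.
interface Authorizer {
    boolean isAccessAllowed(String user, String resource, String action);
}

// Toy in-memory policy store standing in for Ranger's policy engine;
// a real deployment would delegate this check to the Ranger plugin.
class SimpleAuthorizer implements Authorizer {
    private final Set<String> grants = new HashSet<>();

    void grant(String user, String resource, String action) {
        grants.add(user + "|" + resource + "|" + action);
    }

    @Override
    public boolean isAccessAllowed(String user, String resource, String action) {
        return grants.contains(user + "|" + resource + "|" + action);
    }
}
```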
23. Arcadia Data Visualization Server (BETA)
• While table-level privileges like SELECT/INSERT make sense for tables, visuals tend to have a richer set of verbs
• Need to define custom “resources” in Ranger
• Define custom “privileges”: Edit / Clone / Export / Interact
• A little tricky to do if you are not Java-based
• Wildcard support is awesome!
• See yesterday’s talk on Ranger + HAWQ for more details (Extending Apache Ranger Authorization Beyond Hadoop)
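A toy sketch of custom visual resources and privileges with wildcard matching, assuming Ranger-style trailing-`*` prefix wildcards are what is meant; the class and its methods are hypothetical, not Ranger's actual service-definition API:

```java
import java.util.*;

// Custom "resources" are dashboard paths and custom "privileges" are the
// richer verbs named in the slide; wildcards let one rule cover a subtree.
class VisualPolicy {
    enum Privilege { EDIT, CLONE, EXPORT, INTERACT }

    // resource pattern -> privileges granted on matching resources
    private final Map<String, EnumSet<Privilege>> rules = new LinkedHashMap<>();

    void allow(String resourcePattern, Privilege... privs) {
        rules.computeIfAbsent(resourcePattern, k -> EnumSet.noneOf(Privilege.class))
             .addAll(Arrays.asList(privs));
    }

    boolean isAllowed(String resource, Privilege priv) {
        for (Map.Entry<String, EnumSet<Privilege>> e : rules.entrySet()) {
            String p = e.getKey();
            // a trailing "*" matches any suffix, e.g. "sales/*" covers "sales/q3"
            boolean match = p.endsWith("*")
                    ? resource.startsWith(p.substring(0, p.length() - 1))
                    : resource.equals(p);
            if (match && e.getValue().contains(priv)) return true;
        }
        return false;
    }
}
```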
29. Protegrity Big Data Protector and Apache Ranger
Ranger Integration
By Sunil Sabat
Copyright – Protegrity Inc.
30. WHAT DO WE DO?
• Deliver centralized policy enforcement across the enterprise
• Apply security as close to the data as possible
• Protect the entire data flow – at rest, in transit, in use
31. HOW WE DO IT
Example across spending, healthcare, and financial data:
IDENTIFIED DATA – identity is known:
• SSN (023-45-1288)
• Name (Jane Doe)
• Email (joe@yahoo.com)
DE-IDENTIFIED DATA – identity is not known to unauthorized users:
• SSN (153-51-4363)
• Name (Hfhe Jes)
• Email (fhj@jjwvw.chw)
ASSOCIATED DATA – identity is known to authorized users.
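As a toy illustration of de-identification that preserves format (a protected SSN still looks like an SSN, as in the example above), consider a keyed digit substitution; this is a shape demo only and has none of the security properties of Protegrity's actual vaultless tokenization:

```java
// Toy format-preserving protection: digits are shifted by a key, so the
// output keeps the original layout yet differs from the clear value.
// Authorized users holding the key can reverse it; this offers no real
// security and only illustrates the protect/unprotect round trip.
class ToyTokenizer {
    private final int key;

    ToyTokenizer(int key) { this.key = key; }

    String protect(String value) {
        StringBuilder out = new StringBuilder();
        for (char c : value.toCharArray()) {
            out.append(Character.isDigit(c)
                    ? (char) ('0' + (c - '0' + key) % 10)  // shift digit forward
                    : c);                                  // keep separators
        }
        return out.toString();
    }

    String unprotect(String token) {
        StringBuilder out = new StringBuilder();
        for (char c : token.toCharArray()) {
            out.append(Character.isDigit(c)
                    ? (char) ('0' + (c - '0' + 10 - key % 10) % 10)  // shift back
                    : c);
        }
        return out.toString();
    }
}
```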
32. ACROSS THE ENTERPRISE
ESA: central management, policy-enforced technology, consistent protection.
One record in its different protection forms:
• In the clear: Joe Smith / 12/25/1966 / 076-39-2778
• De-identified: 1/02/1966 · Masked: xxxx2278 · Tokenized: ysieondusbak
33. Protegrity’s Big Data Protector for Hadoop
[Diagram: the Hadoop stack – Hive, Pig, MapReduce, Spark (Java and Scala), other components, YARN, HDFS, OS file system – running across a Hadoop cluster of name, data, and edge nodes, with policy and audit flowing between each node and the ESA.]
Protegrity Big Data Protector for Hadoop delivers protection at every node and is delivered with our own cluster management capability.
All nodes are managed by the Enterprise Security Administrator, which delivers policy and accepts audit logs.
The Protegrity Data Security Policy contains information about how data is de-identified and who is authorized to have access to that data.
Policy is enforced at different levels of protection in Hadoop: coarse-grained encryption and fine-grained encryption.
34. Perfect data security and governance
• Combine the best of two products – Apache Ranger and Protegrity ESA (Enterprise Security Administrator)
• Apache Ranger controls access and authorization
• Protegrity protects data at a fine-grained level using tokenization
• Modern data lakes benefit from both products
• The data lake is protected according to enterprise security policy while Hadoop access and authorization is in the hands of Ranger
35. Process Flow
• Protegrity coexists with Apache Ranger policies
• Ranger controls column access policy
• Ranger KMS coexists along with Protegrity KMS
• Protegrity protects column data based on ESA policy
• Ranger logs along with ESA logs give comprehensive security audit (access and data protection) logs for forensic analysis, fraud alerts and other benefits
• Ranger custom masking function can be a Protegrity UDF
36. Protegrity and Ranger Integration
Protegrity coexists with Apache Ranger policies:
• Ranger controls column access policy
• Ranger KMS coexists along with Protegrity KMS
• Protegrity protects column data based on ESA policy
• Ranger logs along with ESA logs give comprehensive security audit (access and data protection) logs for forensic analysis, fraud alerts and other benefits
• Ranger custom masking function can be a Protegrity UDF
Future Exploration:
• Embed access policy in Ranger with Protegrity Data Element protection policy for better alerting and management
• Inherit access policies from Ranger into ESA policy design
• Single KMS – best
37. Use Cases
• Data protection is provided by Protegrity across the enterprise while Hadoop authorization and access is controlled by Ranger
• Enhance Apache Ranger column masking using custom functions in the form of Protegrity UDFs
• The result is Ranger in control of data access and protection
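The Protegrity-UDF masking idea above can be sketched as follows; a real implementation would extend Hive's UDF class and call into Protegrity's protection API, both of which are stubbed here as stated assumptions so the shape is visible without those dependencies:

```java
// Sketch: Ranger's custom masking expression can invoke a UDF, and that
// UDF can delegate to Protegrity's protection engine. A real Hive UDF
// would extend org.apache.hadoop.hive.ql.exec.UDF; protegrityProtect()
// stands in for the actual Protegrity call.
class ProtectUdfSketch {
    // Stand-in for a call into Protegrity's tokenization engine;
    // reversing the digits is illustrative only, not real protection.
    static String protegrityProtect(String clear) {
        return new StringBuilder(clear).reverse().toString();
    }

    // Shape of a Hive UDF's evaluate(); Ranger's masking policy would be
    // configured with an expression invoking this function on the column.
    public String evaluate(String column) {
        return column == null ? null : protegrityProtect(column);
    }
}
```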
38. Clear Data in Hive table
Original data is present in table “clear_table”:

select * from clear_table;
+-------------------+--+
| clear_table.ccn   |
+-------------------+--+
| 5539455602750205  |
| 5464987835837424  |
| 6226540862865375  |
| 6226600538383292  |
| 376235139103947   |
+-------------------+--+
41. Summary of Demo

Original Data       Protected Data      Unprotected Data
5539455602750200    8295281832577430    5539455602750200
5464987835837420    8437400318738670    5464987835837420
6226540862865370    9683356798323010    6226540862865370
6226600538383290    9885536985189730    6226600538383290
376235139103947     222096775455034     376235139103947