TriHUG October: Apache Ranger
- 2. Page 2 © Hortonworks Inc. 2015
About me
• Biren Saini
• Senior Solutions Engineer at Hortonworks
• Governance SME Lead at Hortonworks
• 15 years of technology experience overall
- 3. Page 3 © Hortonworks Inc. 2015
Agenda
• Hadoop Security Overview
• Apache Ranger
– Introduction
– Architecture
– Sample Flow
– Best Practices
– Ranger Stacks
– Demo
- 4. Page 4 © Hortonworks Inc. 2015
Overview of Security in Hadoop
- 5. Page 5 © Hortonworks Inc. 2015
5 Pillars of Security
• Authentication
• Authorization
• Audit
• Encryption
• Centralized Administration
- 6. Page 6 © Hortonworks Inc. 2015
Security Tools in the Hadoop World
• Kerberos (authentication)
• Apache Knox (authentication)
• AD/LDAP (authentication)
• Apache Ranger (authorization, audit, KMS)
• HDFS TDE (data encryption)
• Wire Encryption (data protection)
- 7. Page 7 © Hortonworks Inc. 2015
Typical Flow – SQL Access through Beeline client
Diagram components: Beeline Client, HiveServer 2, HDFS (A, B, C)
- 8. Page 8 © Hortonworks Inc. 2015
Typical Flow – Authenticate through Kerberos
Diagram components: Beeline Client, Active Directory, KDC, HiveServer 2, HDFS (A, B, C)
• Log in to Hive using the AD password
• Client gets a service ticket for Hive
• Hive gets a Namenode (NN) service ticket
• Hive creates a MapReduce job using the NN service ticket (ST)
- 9. Page 9 © Hortonworks Inc. 2015
Typical Flow – Add Authorization through Apache Ranger
Diagram components: Beeline Client, Active Directory, KDC, Ranger, HiveServer 2, HDFS (A, B, C)
• Log in to Hive using the AD password
• Ranger imports users/groups from LDAP
• Hive gets a Namenode (NN) service ticket
• Ranger: column-level access control, auditing
• Ranger: file-level access control
- 10. Page 10 © Hortonworks Inc. 2015
Typical Flow – Firewall, Route through Knox Gateway
Diagram components: Beeline Client, Apache Knox, Active Directory, KDC, Ranger, HiveServer 2, HDFS (A, B, C)
• Original request with user id/password
• Knox gets a service ticket for Hive
• Knox runs as proxy user using the Hive service ticket (ST)
• Use the Hive ST, submit query
• Hive gets a Namenode (NN) service ticket
• Hive creates a MapReduce job using the NN ST
• Client gets the query result
- 11. Page 11 © Hortonworks Inc. 2015
Typical Flow – Add Wire and File Encryption
Diagram components: Beeline Client, Apache Knox, Active Directory, KDC, Ranger, HiveServer 2, HDFS (A, B, C)
• Original request with user id/password
• Knox gets a service ticket for Hive
• Knox runs as proxy user using the Hive service ticket (ST)
• Use the Hive ST, submit query
• Hive gets a Namenode (NN) service ticket
• Hive creates a MapReduce job using the NN ST
• Client gets the query result
• Connections in the diagram are labeled SSL, with one labeled SASL
- 12. Page 12 © Hortonworks Inc. 2015
Apache Ranger
- 13. Page 13 © Hortonworks Inc. 2015
Apache Ranger
• Provides centralized policy definition for authorizing & auditing access
to resources in a consistent manner.
• Supported components as of v0.5:
– HDFS
– HBase
– Hive
– YARN
– Knox
– Storm
– Solr
– Kafka
- 14. Page 14 © Hortonworks Inc. 2015
Set Up Authorization Policies
[Screenshot with callouts: "file level access control, flexible definition" and "Control permissions"]
- 15. Page 15 © Hortonworks Inc. 2015
Monitor through Auditing
- 16. Page 16 © Hortonworks Inc. 2015
Apache Ranger authZ Architecture
• Secured components, each with an Agent: HDFS, HBase, Hive, YARN, Knox, Storm, Solr, Kafka
• Ranger services: Administration Portal (Ranger UI), REST APIs, Policy Server, Audit Server, User Sync Server, KMS
• LDAP/AD: user/group sync
• Stores and destinations shown: DB, Solr, HDFS, Log4j
- 17. Page 17 © Hortonworks Inc. 2015
Sample Simplified Workflow - HDFS
Diagram components: Administration Portal, Policy Server, Audit Server, Ranger Policy Store, Audit Store; in the Hadoop cluster, the Namenode with its Agent
• No policy defined.
• Unauthorized user attempts to access the data
• User access is denied
- 18. Page 18 © Hortonworks Inc. 2015
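Sample Simplified Workflow - HDFS
Diagram components: Administration Portal, Policy Server, Audit Server, Ranger Policy Store, Audit Store; in the Hadoop cluster, the Namenode with its Agent
• 1a: Admin sets policies for HDFS files/folders
• Steps 1b, 1c and 1d are marked in the diagram without captions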
- 19. Page 19 © Hortonworks Inc. 2015
Sample Simplified Workflow - HDFS
Diagram components: Administration Portal, Policy Server, Audit Server, Ranger Policy Store, Audit Store, User Application; in the Hadoop cluster, the Namenode with its Agent
• 1a: Admin sets policies for HDFS files/folders
• 2a: Data scientist runs a MapReduce job
• 2a: Analysts access HDFS data through an application
• 2a: IT users access HDFS through the CLI
• 2b: Namenode uses the Agent for authorization
• Namenode provides resource access to the user/client
• Steps 1b, 1c, 1d, 2c and 2d are marked in the diagram without captions
- 20. Page 20 © Hortonworks Inc. 2015
Sample Simplified Workflow - HDFS
Diagram components: Administration Portal, Policy Server, Audit Server, Ranger Policy Store, Audit Store, User Application; in the Hadoop cluster, the Namenode with its Agent
• 1a: Admin sets policies for HDFS files/folders
• 2a: Data scientist runs a MapReduce job
• 2a: Analysts access HDFS data through an application
• 2a: IT users access HDFS through the CLI
• 2b: Namenode uses the Agent for authorization
• Namenode provides resource access to the user/client
• 3a: Admin requests the audit report
• Steps 1b, 1c, 1d, 2c, 2d, 3b and 3c are marked in the diagram without captions
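The workflow on the preceding slides can be illustrated with a toy model. This is only a sketch, not Ranger code: the class names `Policy`, `PolicyServer` and `Agent`, the example user and the paths are all hypothetical. It shows an agent that keeps a local copy of the policies, denies access when no policy matches (as on the first workflow slide), and records every decision for later audit.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Policy:
    """Hypothetical allow rule: a directory, a user, and permitted access types."""
    path: str
    user: str
    access_types: frozenset


class PolicyServer:
    """Toy stand-in for the policy store behind the Administration Portal."""

    def __init__(self):
        self.policies = []

    def add_policy(self, policy):
        # Step 1a on the slides: the admin sets a policy.
        self.policies.append(policy)


class Agent:
    """Toy stand-in for the agent that the Namenode consults (step 2b)."""

    def __init__(self, server):
        self.server = server
        self.cache = []        # local copy of the policies
        self.audit_store = []  # one record per decision

    def refresh(self):
        # Copy the current policies so decisions need no remote call.
        self.cache = list(self.server.policies)

    def is_access_allowed(self, user, path, access_type):
        allowed = any(
            p.user == user
            and access_type in p.access_types
            and (path == p.path or path.startswith(p.path.rstrip("/") + "/"))
            for p in self.cache
        )
        # Both allowed and denied requests are recorded.
        self.audit_store.append((user, path, access_type, allowed))
        return allowed


server = PolicyServer()
agent = Agent(server)

agent.refresh()
before = agent.is_access_allowed("analyst", "/data/sales/2015.csv", "read")

server.add_policy(Policy("/data/sales", "analyst", frozenset({"read"})))
agent.refresh()
after = agent.is_access_allowed("analyst", "/data/sales/2015.csv", "read")

print(before, after)  # False True
for record in agent.audit_store:
    print(record)
```

The audit records printed at the end correspond loosely to the report the admin requests in step 3a.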
- 21. Page 21 © Hortonworks Inc. 2015
Ranger UserSync Best Practices
• Ensure LDAPS is used to integrate with Ranger
• Create an OU only for Hadoop users, for performance
• Only run usersync when necessary
– How many users are being added, and how often
– How many users are changing roles
– Too much syncing can degrade LDAP performance
- 22. Page 22 © Hortonworks Inc. 2015
Ranger Audit Best Practices
• HDFS
– Long-term storage that can be used to understand user event trends and predict anomalies
• RDBMS
– When SQL is preferred by auditors
– MySQL, Oracle, Postgres, SQL Server
• Solr
– Quick reporting metrics to understand user event trends
• Log4j Appenders
- 23. Page 23 © Hortonworks Inc. 2015
Ranger Stacks
• Apache Ranger v0.5 supports a stack model that makes it easier to onboard new components without requiring code changes in Apache Ranger.
Ranger Side Changes: Define Service-type
• Create a JSON file with the following details:
– Resources
– Access types
– Config to connect
• Load the JSON into Ranger.
Secured Components Side Changes: Develop Ranger Authorization Plugin
• Include the plugin library in the secured component.
• During initialization of the service: initialize the RangerBasePlugin and RangerDefaultAuditHandler classes.
• To authorize access to a resource: call RangerBasePlugin.isAccessAllowed() with a RangerAccessRequest.
• To support resource lookup: implement RangerBaseService.lookupResource() and RangerBaseService.validateConfig().
https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=53741207
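As a rough illustration of the service-type JSON described on this slide, the sketch below builds one in Python. The component name `mycomponent` and all values are hypothetical. The field names (`resources`, `accessTypes`, `configs`) are assumed to mirror the service definitions bundled with Ranger and should be checked against the wiki page linked above before use.

```python
import json

# Hypothetical service-type definition for an imaginary component.
service_def = {
    "name": "mycomponent",
    # Resources: what policies can be written against.
    "resources": [
        {"name": "path", "level": 1, "mandatory": True, "label": "Path"},
    ],
    # Access types: the operations a policy can allow.
    "accessTypes": [
        {"name": "read", "label": "Read"},
        {"name": "write", "label": "Write"},
    ],
    # Config to connect: what Ranger needs to reach the component.
    "configs": [
        {"name": "username", "type": "string", "mandatory": True},
        {"name": "password", "type": "password", "mandatory": True},
    ],
}

service_def_json = json.dumps(service_def, indent=2)
print(service_def_json)
```

Loading the resulting JSON into Ranger is a separate step and is not shown here.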
- 24. Page 24 © Hortonworks Inc. 2015
Summary & Misc. points
• All functions are available through the REST API
• Ranger integrates with AD/LDAP for Ranger login as well as user sync.
• Support for High Availability (HA)
• Support for Transparent Data Encryption with KMS implementation
• Tighter integration with Apache Ambari
• Stack-based implementation of plugins
• Ranger also has the KMS for HDFS TDE.
• Some features in development:
– Spark support
– Time based authorization
– Geo Location based authorization
- 25. Page 25 © Hortonworks Inc. 2015
Demo - HDFS
• Actors: Admin (using the Ranger UI), Sam, Tom
• Resource: /demo/data/trihug (directory already exists; permissions hdfs:hdfs rwx --- ---)
• Step 1: WRITE access denied, READ access denied
• Step 2: Admin grants access: READ for Sam, WRITE for Tom; the Ranger plugin gets the update
• Step 3: WRITE access allowed and READ access allowed under the new grants; one WRITE access is still denied
• Diagram labels: Elevated Privileges, Restricted Privileges
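The grants in this demo (READ for Sam, WRITE for Tom on /demo/data/trihug) could be expressed as a policy document along the following lines. This is a sketch under assumptions: the helper `hdfs_policy`, the service name `cluster_hadoop` and the policy name are hypothetical, and the JSON field names are assumed to follow the shape used by Ranger's REST API for policies, which should be verified against the Ranger documentation for the version in use.

```python
import json


def hdfs_policy(service, name, path, grants):
    """Build a policy document; grants maps a user to a list of access types."""
    return {
        "service": service,
        "name": name,
        "isEnabled": True,
        "resources": {"path": {"values": [path], "isRecursive": True}},
        "policyItems": [
            {
                "users": [user],
                "accesses": [{"type": t, "isAllowed": True} for t in types],
            }
            for user, types in grants.items()
        ],
    }


policy = hdfs_policy(
    service="cluster_hadoop",   # hypothetical service name
    name="trihug-demo",         # hypothetical policy name
    path="/demo/data/trihug",
    grants={"Sam": ["read"], "Tom": ["write"]},
)
print(json.dumps(policy, indent=2))
```

The document is only printed here; submitting it to a Ranger server is left out because the endpoint and authentication depend on the installation.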
- 26. Page 26 © Hortonworks Inc. 2015
Demo - Hive
• Actors: Admin (using the Ranger UI), Sam, Tom
• Resources: Hive tables tickers and eod, created by the “hive” user in the trihug schema (DB already exists)
• Step 1: WRITE access denied, READ access denied
• Step 2: Admin grants access: READ for Sam (SOME COLUMNS), ALL for Tom; the Ranger plugin gets the update
• Step 3: WRITE access allowed, GRANT access allowed; READ access to SOME COLUMNS allowed, READ access to ALL COLUMNS denied, WRITE access denied
• Diagram labels: Elevated Privileges, Restricted Privileges
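The column-level behaviour in this demo (READ on some columns allowed, READ on all columns denied) can be sketched as a toy check. This is not how the Ranger Hive plugin is implemented. The functions `denied_columns` and `can_select` are hypothetical, as are the column names, since the slide does not list the columns of the tickers table.

```python
def denied_columns(grants, user, table, requested):
    """Return the requested columns that the user may not read."""
    allowed = grants.get((user, table), frozenset())
    if "*" in allowed:  # "*" stands for every column in this toy model
        return []
    return [c for c in requested if c not in allowed]


def can_select(grants, user, table, requested):
    """A query is allowed only if none of its columns is denied."""
    return not denied_columns(grants, user, table, requested)


# Hypothetical grants: Sam may read two columns, Tom may read all of them.
grants = {
    ("Sam", "trihug.tickers"): frozenset({"symbol", "name"}),
    ("Tom", "trihug.tickers"): frozenset({"*"}),
}
all_columns = ["symbol", "name", "exchange"]

print(can_select(grants, "Sam", "trihug.tickers", ["symbol", "name"]))  # True
print(can_select(grants, "Sam", "trihug.tickers", all_columns))         # False
print(can_select(grants, "Tom", "trihug.tickers", all_columns))         # True
```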
- 27. Page 27 © Hortonworks Inc. 2015
Demo time..
- 28. Page 28 © Hortonworks Inc. 2015
Thank you.
Questions?