As data stewards and security teams provide broader access to their organization’s data lake environments, having a centralized way to manage fine-grained access policies becomes increasingly important. Alluxio can use Apache Ranger’s centralized access policies in two ways: 1) directly controlling access to virtual paths in the Alluxio virtual file system or 2) enforcing existing access policies for the HDFS under stores. This presentation discusses how the Alluxio virtual filesystem can be integrated with Apache Ranger.
1. Alluxio and Apache Ranger
Best Practices
Greg Palmer | Lead Solutions Engineer | Alluxio | greg.palmer@alluxio.com
Product School - May 26th, 2022
2. AGENDA:
● Why Centralized Access Policies?
● What is Apache Ranger?
● What is Alluxio?
● Alluxio & Ranger - Best Practices
● Live Demonstration
● Closing Remarks and Questions
3. Why Centralized Access Policies?
From stand-alone systems to distributed systems to centralized data lakes
[Diagram: Stand-alone Systems. A single-tier application (App 1 & UI) on DB 1, with one PERMS box.]
4. Why Centralized Access Policies?
From stand-alone systems to distributed systems to centralized data lakes
[Diagram: Stand-alone Systems (single-tier App 1 & UI, DB 1, one PERMS box) next to Distributed Systems (multi-tier clients, App Server 1 and App Server 2, DB 1, DB 2, DW 1, DW 2, and several PERMS boxes).]
5. Why Centralized Access Policies?
From stand-alone systems to distributed systems to centralized data lakes
[Diagram: Stand-alone Systems (single-tier App 1 & UI, DB 1, one PERMS box), Distributed Systems (multi-tier clients, App Server 1 and App Server 2, DB 1, DB 2, DW 1, DW 2, several PERMS boxes), and Centralized Data Lakes (DBs, DWs, and Streaming Data feeding Python ML, Spark ML, Analytics, and Dashboards, with several more PERMS boxes).]
6. What is Apache Ranger?
Apache Ranger™ is a framework to enable, monitor and manage comprehensive data security across the Hadoop platform
● Ranger is bundled with HDP and Cloudera Hadoop Platforms
● Ranger is bundled with Privacera
● Ranger can be deployed stand-alone from the OSS source code
10. What is Alluxio?
Alluxio is an orchestration platform that brings your data closer to compute across clusters, regions, clouds, and countries
11. Alluxio & Ranger - Best Practices
Integration Architecture
[Diagram: Alluxio users (ML workloads, analytics workloads) send read/write requests to the Alluxio hosts, which run the Alluxio master daemons with the Ranger plugin and the worker daemons in front of the under file systems (HDFS, S3, Ceph, etc., with their own PERMS). The Ranger admin hosts run the Ranger Policy Manager with its policy store, an audit store (ES), and Ranger User Sync connected to enterprise directory services. Ranger admins work against the Ranger admin hosts.]
12. Alluxio & Ranger - Best Practices
What: Enforce existing Ranger HDFS access policies
• When: you only have one HDFS under file system
• Why: easy to set up, no new Ranger services/policies required
[Diagram: Alluxio users (ML workloads, analytics workloads), the Alluxio cache, the Ranger Policy Manager, and a single HDFS UFS.]
14. Alluxio & Ranger - Best Practices
What: New Ranger policies for Alluxio file system permissions
• When: no HDFS under file system or heterogeneous UFSs
• Why: supports true virtual file system and unified namespace
[Diagram: Alluxio users (ML workloads, analytics workloads), the Alluxio cache and unified namespace, the Ranger Policy Manager, and multiple UFSs: HDFS, S3-compatible, Google GCS, Azure ADLS, and on-prem.]
15. Alluxio & Ranger - Best Practices
What: New Ranger policies for Alluxio file system permissions
• How:
• Configure the ./conf/alluxio-site.properties file:
alluxio.security.authorization.plugins.enabled=true
alluxio.master.mount.table.root.option.alluxio.underfs.security.authorization.plugin.name=<plugin name>
alluxio.master.mount.table.root.option.alluxio.underfs.security.authorization.plugin.paths=/opt/alluxio/conf
• Configure the ./conf/ranger-hdfs-security.xml file:
<property>
  <name>ranger.plugin.hdfs.service.name</name>
  <value>new-ranger-hdfs-service-name</value>
</property>
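The two configuration steps above can be sanity-checked with a small script before restarting Alluxio. The following is only a sketch: the helper names (`parse_properties`, `check_site`, `ranger_service_name`) are hypothetical, the sample text simply repeats the settings shown on the slide, and wrapping the `<property>` element in a `<configuration>` root is an assumption based on the usual Hadoop-style XML layout.

```python
import xml.etree.ElementTree as ET

# Property names copied from the slide; PREFIX is only a local shorthand.
PREFIX = ("alluxio.master.mount.table.root.option."
          "alluxio.underfs.security.authorization.plugin.")
REQUIRED_KEYS = [
    "alluxio.security.authorization.plugins.enabled",
    PREFIX + "name",
    PREFIX + "paths",
]

def parse_properties(text):
    """Parse simple key=value lines, skipping blank lines and # comments."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        props[key.strip()] = value.strip()
    return props

def check_site(props):
    """Return a list of human-readable problems found in the settings."""
    problems = []
    for key in REQUIRED_KEYS:
        value = props.get(key, "")
        if not value:
            problems.append("missing: " + key)
        elif value.startswith("<"):
            # A value such as "<plugin name>" is still a template placeholder.
            problems.append("placeholder not filled in: " + key)
    if props.get(REQUIRED_KEYS[0], "") not in ("", "true"):
        problems.append("plugins are not enabled")
    return problems

def ranger_service_name(xml_text):
    """Return the value of ranger.plugin.hdfs.service.name, or None."""
    root = ET.fromstring(xml_text)
    for prop in root.iter("property"):
        if prop.findtext("name") == "ranger.plugin.hdfs.service.name":
            return prop.findtext("value")
    return None

# Sample inputs that mirror the slide (the <configuration> root is assumed).
SITE_PROPERTIES = """\
alluxio.security.authorization.plugins.enabled=true
{0}name=<plugin name>
{0}paths=/opt/alluxio/conf
""".format(PREFIX)

SECURITY_XML = """\
<configuration>
  <property>
    <name>ranger.plugin.hdfs.service.name</name>
    <value>new-ranger-hdfs-service-name</value>
  </property>
</configuration>
"""

for problem in check_site(parse_properties(SITE_PROPERTIES)):
    print(problem)
print("Ranger service:", ranger_service_name(SECURITY_XML))
```

Run against the slide's template values, the check flags the plugin name, which still holds the `<plugin name>` placeholder. The service name it reports is the one that has to exist in Ranger.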
16. Alluxio & Ranger - Best Practices
What: New Ranger policies for Alluxio file system permissions
• How:
• Define new service in Ranger service manager:
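The slide defines the service in the Ranger web UI. As a scripted alternative, the same service could be created through Ranger's public REST API (`POST /service/public/v2/api/service`). The sketch below builds the request body only. It assumes the service is of type `hdfs` (consistent with the `ranger.plugin.hdfs.service.name` setting), and the host name, port, credentials, and file system URI are hypothetical placeholders. Which value `fs.default.name` should carry in an Alluxio deployment is not stated in the slides, so treat it as an assumption.

```python
import base64
import json
import urllib.request

def service_endpoint(ranger_url):
    """Public v2 endpoint for creating services on a Ranger admin server."""
    return ranger_url.rstrip("/") + "/service/public/v2/api/service"

def build_service_payload(service_name, fs_uri, username, password):
    """Build the JSON body for a new HDFS-type Ranger service.

    All argument values are supplied by the caller; nothing here is taken
    from a real deployment.
    """
    return {
        "name": service_name,
        "type": "hdfs",
        "isEnabled": True,
        "configs": {
            "username": username,
            "password": password,
            "fs.default.name": fs_uri,  # assumed value, see lead-in
            "hadoop.security.authentication": "simple",
        },
    }

def post_json(url, payload, user, password):
    """POST a JSON payload with HTTP basic auth (needs a live Ranger server)."""
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json",
                 "Accept": "application/json"},
        method="POST",
    )
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    request.add_header("Authorization", "Basic " + token)
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))

# Hypothetical values for illustration only.
payload = build_service_payload(
    "new-ranger-hdfs-service-name", "hdfs://namenode:8020", "admin", "changeme")
print(service_endpoint("http://ranger-admin:6080"))
print(json.dumps(payload, indent=2))
# To submit it against a running Ranger admin server:
# post_json(service_endpoint("http://ranger-admin:6080"), payload, "admin", "changeme")
```

The service name must match the value configured in `ranger-hdfs-security.xml`, otherwise the plugin will not find its policies.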
17. Alluxio & Ranger - Best Practices
What: New Ranger policies for Alluxio file system permissions
• How:
• Define new access policies in Ranger:
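Access policies can also be described as JSON and submitted to Ranger's public REST API (`POST /service/public/v2/api/policy`) instead of being entered in the UI. The following sketch builds one path-based policy in the shape used by Ranger's HDFS service definition (a `path` resource with `read`, `write`, and `execute` access types). The function name, policy name, path, and user are hypothetical examples, not values from the presentation.

```python
import json

# Access types defined by Ranger's HDFS service definition.
HDFS_ACCESS_TYPES = {"read", "write", "execute"}

def build_path_policy(service_name, policy_name, path, users,
                      access_types, recursive=True):
    """Build the JSON body for a single allow-policy on one path."""
    unknown = set(access_types) - HDFS_ACCESS_TYPES
    if unknown:
        raise ValueError("unsupported access types: " + ", ".join(sorted(unknown)))
    return {
        "service": service_name,
        "name": policy_name,
        "isEnabled": True,
        "resources": {
            "path": {
                "values": [path],
                "isRecursive": recursive,
                "isExcludes": False,
            }
        },
        "policyItems": [
            {
                "users": list(users),
                "groups": [],
                "accesses": [{"type": t, "isAllowed": True} for t in access_types],
                "delegateAdmin": False,
            }
        ],
    }

# Hypothetical policy: let one user read and traverse /mydir/mysubdir.
policy = build_path_policy(
    "new-ranger-hdfs-service-name",
    "mydir-read-only",
    "/mydir/mysubdir",
    ["analyst1"],
    ["read", "execute"],
)
print(json.dumps(policy, indent=2))
```

Keeping policies as data like this makes it easier to review them and to recreate them in another Ranger instance.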
18. Alluxio & Ranger - Best Practices
What: New Ranger policies for Alluxio file system permissions
• How: What about Alluxio file paths that have no Ranger policies?
• Alluxio will fall back on POSIX-style file and directory permissions
• Permissions can be viewed with:
$ alluxio fs ls -R /mydir/mysubdir/
• Permissions can be changed with:
$ alluxio fs chmod 640 <path>
• Consider configuring the Alluxio default umask with:
alluxio.security.authorization.permission.umask=077
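To see what the suggested umask and the `640` mode mean in practice, the arithmetic can be worked through in a few lines. This sketch assumes the umask is applied in the usual POSIX way, starting from 666 for new files and 777 for new directories. The helper names are hypothetical.

```python
import stat

FILE_BASE = 0o666  # assumed starting permission for new files
DIR_BASE = 0o777   # assumed starting permission for new directories

def apply_umask(base_mode, umask):
    """Clear every permission bit that is set in the umask."""
    return base_mode & ~umask & 0o777

def mode_string(mode, is_dir=False):
    """Render permission bits in ls style, e.g. -rw-r-----."""
    file_type = stat.S_IFDIR if is_dir else stat.S_IFREG
    return stat.filemode(file_type | mode)

umask = 0o077
new_file = apply_umask(FILE_BASE, umask)
new_dir = apply_umask(DIR_BASE, umask)

print(oct(new_file), mode_string(new_file))              # 0o600 -rw-------
print(oct(new_dir), mode_string(new_dir, is_dir=True))   # 0o700 drwx------
print(mode_string(0o640))                                # -rw-r-----
```

With a umask of 077, newly created files and directories are accessible only to their owner, so any path without a Ranger policy starts out closed to everyone else. A mode of 640 then opens read access to the owning group while still excluding other users.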
20. Alluxio and Apache Ranger Best Practices
Explore Alluxio & Apache Ranger on your laptop or desktop computer:
https://github.com/gregpalmr/alluxio-ranger-sandbox