Taking Hadoop to Enterprise Security Standards
- 3. How many of you need or have
access control in Hadoop?
- 6. ©2014 LinkedIn Corporation. All Rights Reserved.
Hadoop – Status Quo
Multiple Query Execution Engines
Custom Code Execution
Auditing
- 7.
User ID Email Address IP address Billing address
Security Customer Service Data Scientist
Adding and removing group membership can take up to a few hours
HDFS file permissions are very coarse (enforced only at the file level)
HDFS File Permissions
- 10.
Extensible Authorization
Fine-Grained Control
Fast Changes to Authorization Rules
What do we need?
- 11.
Our Solution: Access Control via Encryption
Apache Kafka
HDFS
Key Server
Parquet
ETL
Encrypted Events
- 12.
User A’s Job
User B’s Job
User C’s Job
Producer
Job
ETL User
Parquet File
User | Columns
A | 5
B | 2, 5
Key Server
Access Control via Encryption
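The key-server lookup on this slide can be sketched roughly as follows. This is only an illustration under assumptions: the names `COLUMN_ACL`, `COLUMN_KEYS`, and `request_key` are hypothetical, and the slides do not describe the real key server's interface or how keys are stored.

```python
import secrets

# Hypothetical ACL mirroring the slide's table: user -> column ids they may read.
COLUMN_ACL = {"A": {5}, "B": {2, 5}}

# Hypothetical per-column keys, held only by the key server.
COLUMN_KEYS = {col: secrets.token_bytes(32) for col in (2, 5)}


def request_key(user: str, column: int) -> bytes:
    """Return the key for a column if the ACL allows it, else raise PermissionError."""
    if column not in COLUMN_ACL.get(user, set()):
        raise PermissionError(f"user {user!r} may not read column {column}")
    return COLUMN_KEYS[column]
```

In this sketch, User C has no ACL entry, so every key request from User C's job is refused, and the encrypted columns in the Parquet file stay unreadable to that job.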
- 13.
Columnar Storage
Page 0
Page 1
Page 2
Column a Column b
Rowgroup
Parquet Format
Brief Overview of Parquet
- 14.
*Yet to be integrated into open source Parquet
Field Mode | Page Mode | Hybrid Mode
Page
Column
Encryption Support in Parquet*
- 15.
Examples
Emails: analysts need them to join with other tables but may not require
access to individual email addresses
N Values (Page)
Encrypt one value at a time
karthik@gmail.com -> xxxxxxx
harsh@gmail.com -> yyyyyyy
harsh@gmail.com -> yyyyyyy
arvind@gmail.com -> zzzzzzz
Field Mode
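The join-friendly property of field mode (equal values map to equal protected values) can be illustrated with a small sketch. It is an assumption-laden stand-in: it uses a keyed hash (HMAC-SHA256) rather than reversible encryption, the slides do not name the cipher actually used, and `field_token` and the key are hypothetical.

```python
import hashlib
import hmac


def field_token(key: bytes, value: str) -> str:
    """Keyed, deterministic token for one value (stand-in for per-value encryption)."""
    return hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()


key = b"hypothetical-column-key"
emails = [
    "karthik@gmail.com",
    "harsh@gmail.com",
    "harsh@gmail.com",
    "arvind@gmail.com",
]

# Each value is protected independently, so duplicates stay recognizable.
tokens = [field_token(key, e) for e in emails]
```

Because the two `harsh@gmail.com` rows produce the same token, an analyst without the key can still join or group on the column, at the cost of revealing which rows share a value.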
- 17.
Page Mode
No information is leaked except entropy of the data
Better performance than other modes
N Values (Page) -> Encode -> Compress -> Encrypt
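The encode, compress, encrypt order of page mode can be sketched as below. Everything here is an assumption made for illustration: JSON stands in for Parquet's page encodings, and the "cipher" is a toy keystream built from HMAC-SHA256 so that the example runs with the standard library alone. It is not the scheme from the talk and is not meant for production use.

```python
import hashlib
import hmac
import json
import secrets
import zlib


def _keystream(key: bytes, nonce: bytes, length: int) -> bytes:
    """Toy keystream: HMAC-SHA256 over nonce + block counter. Illustration only."""
    out = bytearray()
    counter = 0
    while len(out) < length:
        block = nonce + counter.to_bytes(8, "big")
        out += hmac.new(key, block, hashlib.sha256).digest()
        counter += 1
    return bytes(out[:length])


def _xor(data: bytes, stream: bytes) -> bytes:
    return bytes(a ^ b for a, b in zip(data, stream))


def write_page(key: bytes, values: list) -> bytes:
    """Encode the whole page, compress it, then encrypt the compressed bytes."""
    encoded = json.dumps(values).encode("utf-8")  # stand-in for Parquet encoding
    compressed = zlib.compress(encoded)
    nonce = secrets.token_bytes(16)               # fresh nonce per page
    return nonce + _xor(compressed, _keystream(key, nonce, len(compressed)))


def read_page(key: bytes, blob: bytes) -> list:
    """Reverse the pipeline: decrypt, decompress, decode."""
    nonce, body = blob[:16], blob[16:]
    compressed = _xor(body, _keystream(key, nonce, len(body)))
    return json.loads(zlib.decompress(compressed))
```

Compressing before encrypting keeps compression effective, since encrypted bytes are not compressible. The length of the output still depends on how well the page compressed, which is one way to read the slide's remark that only the entropy of the data leaks.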
- 18.
Hybrid Mode
Finer-grained control of information
Higher overhead due to double encryption/decryption
N Values (Page) -> Encrypt each value -> Encrypt
- 20.
Key Versioning
Each key is versioned and specific to a source (file or event name)
Reduces the exposure in case of key leakage
Time-based access control
– By default, all users can access only the last 30 days of data
– Users can be granted access to data from a specific time period
Authentication of producers can be done separately
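The key versioning and time-based rules above could look roughly like this. It is a minimal sketch under assumptions: deriving keys from a master secret with HMAC, the `GRANTS` structure, the user and source names, and the function names are all hypothetical; the slides only state that keys are per source and versioned, and that the default window is 30 days.

```python
import hashlib
import hmac
from datetime import date, timedelta

DEFAULT_WINDOW = timedelta(days=30)  # default access window from the slide

# Hypothetical explicit grants: user -> list of (source, start, end), inclusive.
GRANTS = {"alice": [("PageViewEvent", date(2014, 1, 1), date(2014, 1, 31))]}


def derive_key(master: bytes, source: str, version: int) -> bytes:
    """Derive a distinct key per (source, version), so one leaked key exposes little."""
    label = f"{source}:{version}".encode("utf-8")
    return hmac.new(master, label, hashlib.sha256).digest()


def may_access(user: str, source: str, data_date: date, today: date) -> bool:
    """Allow the last 30 days for everyone, older data only with an explicit grant."""
    if today - DEFAULT_WINDOW <= data_date <= today:
        return True
    return any(
        src == source and start <= data_date <= end
        for src, start, end in GRANTS.get(user, [])
    )
```

With this shape, revoking access to old data is a change to the grant table rather than a rewrite of files, which matches the deck's requirement for fast changes to authorization rules.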
- 21.
Better Auditing
Coverage
Retention
Enforcement
Key Server Features
Multifactor Authentication