© Cloudera, Inc. All rights reserved.
Charles Lamb
HDFS Transparent Encryption
SFHUG
Overview
• Developed as open source (HDFS-6134)
• Data read from and written to certain directories is transparently encrypted
• No changes to user code
• Encryption/decryption always done by client
• HDFS never handles unencrypted data or unencrypted keys
• Helps applications be regulation-compliant (HIPAA, PCI DSS, FISMA, etc.)
Background
• Encryption can happen at any of several levels:
• Application: most secure and flexible, but hardest to do
• Adding encryption to legacy applications may be difficult
• Database: most DBMSs have this, but may incur performance penalties
• Secondary indices cannot generally be encrypted
• Filesystem: high performance, transparent, but may not be flexible enough
• Multi-tenancy vs per-user encryption policies
• Disk: high performance but only really protects against physical theft
• HDFS encryption is somewhere between Filesystem and Database level
Design Goals
• Performance and scalability
• Transparent to applications, including legacy apps
• End-to-end
• Data should be encrypted on the network and ‘at-rest’
• Compartmentalization
• Key management independent of HDFS management
• Includes preventing HDFS admins and root users from accessing sensitive data
• Compatibility with HDFS access methods: WebHDFS, HttpFS, FUSE, NFS, hftp, har, etc.
Architectural Concepts
• Key Management Server
• Encryption Zones
• Keys
Key Management Server (KMS)
• KMS sits between client and key server
• E.g. Cloudera Navigator Key Trustee
• Provides a unified API and scalability
• REST API
• Does not actually store keys (backend does that), but does cache them
• ACLs on per-key basis
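The caching role described above can be pictured with a minimal sketch. The class names `BackendKeyStore` and `CachingKMS` are hypothetical, and the cache here never expires, which is a simplification rather than a description of the real KMS.

```python
class BackendKeyStore:
    """Hypothetical stand-in for the backing key server that actually stores keys."""

    def __init__(self, keys):
        self._keys = dict(keys)
        self.lookups = 0  # counts how often the backend is consulted

    def get(self, name):
        self.lookups += 1
        return self._keys[name]


class CachingKMS:
    """Hypothetical KMS front end: owns no keys, only a cache of backend answers."""

    def __init__(self, backend):
        self._backend = backend
        self._cache = {}

    def get_key(self, name):
        # Only the first request for a key reaches the backend.
        if name not in self._cache:
            self._cache[name] = self._backend.get(name)
        return self._cache[name]


backend = BackendKeyStore({"ez-key-1": b"\x00" * 16})
kms = CachingKMS(backend)
first = kms.get_key("ez-key-1")
second = kms.get_key("ez-key-1")
```

After both calls, `backend.lookups` is 1: the second request is served from the cache.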
Encryption Zones
• An HDFS directory in which the contents (including subdirs) are encrypted on write and decrypted on read.
• An EZ begins life as an empty directory
• Renames in/out of an EZ are prohibited
• Encryption is transparent to application with no code changes
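The rename restriction above can be sketched as a path check. This is an illustrative model only: `zone_of` and `rename_allowed` are hypothetical helpers, zones are assumed not to nest, and renaming the zone root directory itself is deliberately not modeled.

```python
from pathlib import PurePosixPath


def zone_of(path, zones):
    """Return the encryption zone root that strictly contains `path`, or None."""
    p = PurePosixPath(path)
    for zone in zones:
        if PurePosixPath(zone) in p.parents:
            return zone
    return None


def rename_allowed(src, dst, zones):
    """Allow a rename only when both paths are in the same zone, or both are outside any zone."""
    return zone_of(src, zones) == zone_of(dst, zones)


zones = ["/secure"]
inside = rename_allowed("/secure/a", "/secure/sub/b", zones)   # stays within the zone
moving_out = rename_allowed("/secure/a", "/tmp/a", zones)      # would leave the zone
moving_in = rename_allowed("/tmp/a", "/secure/a", zones)       # would enter the zone
```

Here `inside` is true, while `moving_out` and `moving_in` are both false.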
Keys
• Every Encryption Zone has a key (“EZ Key”)
• Every file in an Encryption Zone has a unique key (“Data Encryption Key” or “DEK”)
• The HDFS NameNode stores the name of the EZ Key in an Xattr of the EZ Dir
• The actual EZ Key is stored in the Key Server
• The NameNode stores the DEK in an Xattr of the file, but only in encrypted form
• Encrypted Data Encryption Key, or “EDEK”
• The NameNode never touches decrypted data or decrypted keys
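The key relationships above can be traced in a small model. Everything here is hypothetical (`ToyKMS`, `ToyNameNode`, `client_write`, `client_read`), and `toy_cipher` is an insecure stand-in built from SHA-256 so the sketch needs only the standard library; it is not AES-CTR and not what HDFS uses. Reusing one nonce for both the EDEK and the file data is also a simplification.

```python
import hashlib
import secrets


def toy_cipher(key, nonce, data):
    """Toy stand-in for a stream cipher: XOR data with a SHA-256-derived keystream.

    NOT secure. Applying it twice with the same key and nonce restores the input.
    """
    stream = bytearray()
    counter = 0
    while len(stream) < len(data):
        stream += hashlib.sha256(key + nonce + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(a ^ b for a, b in zip(data, stream))


class ToyKMS:
    """Hypothetical KMS: the only party that ever holds EZ keys."""

    def __init__(self):
        self._ez_keys = {}

    def create_key(self, name):
        self._ez_keys[name] = secrets.token_bytes(16)

    def generate_edek(self, ez_key_name):
        # A fresh DEK is created and returned only in encrypted form (the EDEK).
        dek = secrets.token_bytes(16)
        nonce = secrets.token_bytes(16)
        return nonce, toy_cipher(self._ez_keys[ez_key_name], nonce, dek)

    def decrypt_edek(self, ez_key_name, nonce, edek):
        return toy_cipher(self._ez_keys[ez_key_name], nonce, edek)


class ToyNameNode:
    """Hypothetical NameNode: holds the EZ key *name*, per-file EDEKs, and ciphertext."""

    def __init__(self, kms, ez_key_name):
        self._kms = kms
        self.ez_key_name = ez_key_name  # models the xattr on the EZ directory
        self.file_xattrs = {}           # path -> (nonce, EDEK)
        self.blocks = {}                # path -> encrypted bytes

    def create_file(self, path):
        self.file_xattrs[path] = self._kms.generate_edek(self.ez_key_name)
        return self.file_xattrs[path]


def client_write(nn, kms, path, plaintext):
    nonce, edek = nn.create_file(path)
    dek = kms.decrypt_edek(nn.ez_key_name, nonce, edek)  # the DEK exists only on the client
    nn.blocks[path] = toy_cipher(dek, nonce, plaintext)


def client_read(nn, kms, path):
    nonce, edek = nn.file_xattrs[path]
    dek = kms.decrypt_edek(nn.ez_key_name, nonce, edek)
    return toy_cipher(dek, nonce, nn.blocks[path])


kms = ToyKMS()
kms.create_key("ez-key-1")
nn = ToyNameNode(kms, "ez-key-1")
client_write(nn, kms, "/zone/a", b"secret data")
round_trip = client_read(nn, kms, "/zone/a")
```

In this model the `ToyNameNode` object never holds a DEK or plaintext: it sees only the key name, the EDEK, and the encrypted bytes.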
EZ Keys, Data Encryption Keys, and Encrypted Data Encryption Keys
Key Handling
Design
• End-to-end encryption
• Encryption occurs on the client and decrypted data is never touched by HDFS
• Protects against network sniffing, evil HDFS admins, and hard drive theft
• HDFS never touches key material (DEKs or EZ keys)
• Compromising an HDFS daemon is not a viable attack vector
• HDFS handles encrypted keys (EDEKs), but never their decrypted form (DEKs)
• Key permissions are handled by the KMS ACLs
• Each file is encrypted with a unique DEK
HDFS Encryption Configuration
• hadoop key create <keyname>
• hdfs dfs -mkdir <path>
• hdfs crypto -createZone -keyName <keyname> -path <path>
KMS Per-User ACL Configuration
• White lists (check for inclusion) and black lists (check for exclusion)
• etc/hadoop/kms-acls.xml
• hadoop.kms.acl.CREATE
• hadoop.kms.blacklist.CREATE
• … DELETE, ROLLOVER, GET, GET_KEYS, GET_METADATA, GENERATE_EEK, DECRYPT_EEK
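The white list and black list checks above combine roughly as follows. This is a sketch under stated assumptions: `is_permitted` is a hypothetical helper, ACLs are modeled as plain sets of user names with `*` meaning everyone, and group entries and default values for unconfigured operations are not modeled.

```python
def is_permitted(user, operation, acls, blacklists):
    """Allow `operation` only if the user passes the white list and is absent from the black list."""
    allowed = acls.get(operation, set())
    denied = blacklists.get(operation, set())
    in_white_list = "*" in allowed or user in allowed
    return in_white_list and user not in denied


# Hypothetical contents standing in for hadoop.kms.acl.* and hadoop.kms.blacklist.*
acls = {"CREATE": {"alice", "bob"}, "DECRYPT_EEK": {"*"}}
blacklists = {"CREATE": {"bob"}, "DECRYPT_EEK": {"hdfs"}}

alice_can_create = is_permitted("alice", "CREATE", acls, blacklists)
bob_can_create = is_permitted("bob", "CREATE", acls, blacklists)
hdfs_can_decrypt = is_permitted("hdfs", "DECRYPT_EEK", acls, blacklists)
```

The last case shows the pattern the deck is after: the `hdfs` user is excluded from `DECRYPT_EEK` even though the white list admits everyone.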
KMS Per-Key ACL Configuration
• etc/hadoop/kms-acls.xml
• key.acl.<keyname>.<operation>
• MANAGEMENT – createKey, deleteKey, rolloverNewVersion
• GENERATE_EEK – generateEncryptedKey, warmUpEncryptedKeys
• DECRYPT_EEK – decryptEncryptedKey
• READ – getKeyVersion, getKeyVersions, getMetadata, getKeysMetadata, getCurrentKey
• ALL – all of the above
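The grouping of calls under each per-key operation can be written down as a lookup table. The operation and call names come from the list above; `call_allowed` and the shape of `key_acls` are hypothetical, chosen only to illustrate how a grant on one operation covers several calls.

```python
KEY_OPS = {
    "MANAGEMENT": {"createKey", "deleteKey", "rolloverNewVersion"},
    "GENERATE_EEK": {"generateEncryptedKey", "warmUpEncryptedKeys"},
    "DECRYPT_EEK": {"decryptEncryptedKey"},
    "READ": {"getKeyVersion", "getKeyVersions", "getMetadata",
             "getKeysMetadata", "getCurrentKey"},
}
# ALL is the union of every other operation's calls.
KEY_OPS["ALL"] = set().union(*KEY_OPS.values())


def call_allowed(user, key, call, key_acls):
    """key_acls maps (keyname, operation) to the set of users granted that operation."""
    return any(
        k == key and call in KEY_OPS[op] and user in users
        for (k, op), users in key_acls.items()
    )


# Hypothetical grants for a key named "ez-key-1"
key_acls = {
    ("ez-key-1", "DECRYPT_EEK"): {"alice"},
    ("ez-key-1", "ALL"): {"keyadmin"},
}
alice_decrypts = call_allowed("alice", "ez-key-1", "decryptEncryptedKey", key_acls)
alice_deletes = call_allowed("alice", "ez-key-1", "deleteKey", key_acls)
admin_deletes = call_allowed("keyadmin", "ez-key-1", "deleteKey", key_acls)
```

With these grants, `alice` can decrypt EDEKs for the key but cannot delete it, while `keyadmin` can do both.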
Performance
• AES-CTR with 128-bit or 256-bit keys (256-bit requires the unlimited strength JCE to be installed)
• AES-NI available
• Negligible overhead on writes and 7.5% impact on reads for datasets larger than memory
DistCp
• Encryption Zone to Encryption Zone
• use -update -skipcrccheck
• Admins use special /.reserved/raw path prefix
• /.reserved/raw is only available to the HDFS superuser and exposes the encrypted contents
Exceptions
• Hive: may not be able to run a query that combines data from more than one encryption zone
HDFS Encryption - Summary
• Good performance (4-10% hit)
• No mods to existing applications
• Prevents attacks at the filesystem and below
• OS and filesystem only see encrypted bytes
• Data is encrypted all the way to the client
• Secure ‘at rest’ and in transit
• Key management is independent of HDFS
• Key admin != HDFS admin
• Can prevent HDFS admin from accessing secure data
Questions
