How to Protect Big Data in a Containerized Environment
Thomas Phelan
Chief Architect, BlueData
@tapbluedata
Outline
▪ Securing a Big Data Environment
▪ Data Protection
▪ Transparent Data Encryption
▪ Transparent Data Encryption in a Containerized Environment
▪ Takeaways
In the Beginning …
▪ Hadoop was used to process public web data
- No compelling need for security
• No user or service authentication
• No data security
Then Hadoop Became Popular
Security is important.
Layers of Security in Hadoop
▪ Access
▪ Authentication
▪ Authorization
▪ Data Protection
▪ Auditing
▪ Policy (protect from human error)
Hadoop Security: Data Protection
Reference: https://www.cloudera.com/documentation/enterprise/5-6-x/topics/sg_edh_overview.html
Focus on Data Security
▪ Confidentiality
- Confidentiality is lost when data is accessed by someone not authorized to do so
▪ Integrity
- Integrity is lost when data is modified in unexpected ways
▪ Availability
- Availability is lost when data is erased or becomes inaccessible
Reference: https://www.us-cert.gov/sites/default/files/publications/infosecuritybasics.pdf
Hadoop Distributed File System (HDFS)
▪ Data Security Features
- Access Control
- Data Encryption
- Data Replication
Access Control
▪ Simple
- Identity determined by host operating system
▪ Kerberos
- Identity determined by Kerberos credentials
- One realm for both compute and storage
- Required for HDFS Transparent Data Encryption
Data Encryption
▪ Transforming data from cleartext into ciphertext
Data Replication
▪ 3-way replication
- Can survive the loss of any 2 replicas
▪ Erasure Coding
- Can survive more than 2 failures, depending on the parity configuration
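The trade-off between the two schemes can be sketched with a little arithmetic. This is a minimal illustration, not HDFS code: the `Scheme` class is hypothetical, and the Reed-Solomon layout with 6 data and 3 parity blocks is an assumed example configuration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Scheme:
    """A redundancy layout: data units plus redundant (copy or parity) units."""
    name: str
    data_units: int
    redundant_units: int

    def tolerated_failures(self) -> int:
        # Up to `redundant_units` units may be lost while the data stays recoverable.
        return self.redundant_units

    def storage_overhead(self) -> float:
        # Extra raw storage relative to the size of the data itself.
        return self.redundant_units / self.data_units


# 3-way replication: one original block plus two extra copies.
replication_3x = Scheme("3-way replication", data_units=1, redundant_units=2)
# Assumed example erasure-coding layout: 6 data blocks plus 3 parity blocks.
rs_6_3 = Scheme("Reed-Solomon (6 data, 3 parity)", data_units=6, redundant_units=3)

for s in (replication_3x, rs_6_3):
    print(f"{s.name}: survives {s.tolerated_failures()} failures, "
          f"{s.storage_overhead():.0%} storage overhead")
```

Under these assumptions the erasure-coded layout tolerates one more failure than 3-way replication while using less extra storage; the cost, not modeled here, is the reconstruction work after a failure.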
HDFS with End-to-End Encryption
▪ Confidentiality
- Data Access
▪ Integrity
- Data Access + Data Encryption
▪ Availability
- Data Access + Data Replication
Data Encryption
▪ How to transform the data?
[Figure: a block of cleartext bits is transformed into unreadable ciphertext]
Data Encryption – At Rest
▪ Data is encrypted while on persistent media (disk)
Data Encryption – In Transit
▪ Data is encrypted while traveling over the network
The Whole Process
[Figure: the whole process; only the label "Ciphertext" survives]
HDFS Transparent Data Encryption (TDE)
▪ End-to-end encryption
- Data is encrypted/decrypted at the client
• Data is protected at rest and in transit
▪ Transparent
- No application level code changes required
HDFS TDE – Design
▪ Goals:
- Only an authorized client/user can access cleartext
- HDFS never stores cleartext or unencrypted data encryption keys
HDFS TDE – Terminology
▪ Encryption Zone
- A directory whose file contents will be encrypted upon write and decrypted upon read
- An EZKEY is generated for each zone
HDFS TDE – Terminology
▪ EZKEY – encryption zone key
▪ DEK – data encryption key
▪ EDEK – encrypted data encryption key
HDFS TDE - Data Encryption
▪ The same key is used to encrypt and decrypt data
▪ The size of the ciphertext is exactly the same as the size of the original cleartext
- EZKEY + DEK => EDEK
- EDEK + EZKEY => DEK
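The properties on this slide can be sketched in a few lines. This is a toy model using only the Python standard library: `keystream_xor` is a hypothetical stand-in built from SHA-256, not the cipher HDFS actually uses, and it must not be used to protect real data. It only illustrates that one key both encrypts and decrypts, that the ciphertext has the same length as the cleartext, and that the DEK is itself wrapped with the EZKEY.

```python
import hashlib
import os


def keystream_xor(key: bytes, iv: bytes, data: bytes) -> bytes:
    """Toy counter-mode stream cipher: XOR data with SHA-256(key, iv, counter) blocks.

    Illustration only; this is NOT AES and not suitable for production use.
    Applying it twice with the same key and IV returns the original data.
    """
    out = bytearray()
    for offset in range(0, len(data), 32):
        counter = (offset // 32).to_bytes(8, "big")
        block = hashlib.sha256(key + iv + counter).digest()
        chunk = data[offset:offset + 32]
        out.extend(b ^ k for b, k in zip(chunk, block))
    return bytes(out)


ezkey = os.urandom(16)    # encryption zone key
dek = os.urandom(16)      # data encryption key for one file
edek_iv = os.urandom(16)  # hypothetical IV used when wrapping the DEK
file_iv = os.urandom(16)  # hypothetical IV used for the file contents

# EZKEY + DEK => EDEK
edek = keystream_xor(ezkey, edek_iv, dek)
# EDEK + EZKEY => DEK
recovered_dek = keystream_xor(ezkey, edek_iv, edek)

cleartext = b"some file contents"
ciphertext = keystream_xor(dek, file_iv, cleartext)
decrypted = keystream_xor(recovered_dek, file_iv, ciphertext)

print(len(cleartext), len(ciphertext), decrypted == cleartext)
```

Because the cipher is a stream cipher with no padding, the ciphertext length always equals the cleartext length, which is the size property the slide states.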
HDFS TDE - Services
▪ HDFS NameNode (NN)
▪ Kerberos Key Distribution Center (KDC)
▪ Hadoop Key Management Server (KMS)
- Key Trustee Server
HDFS TDE – Security Concepts
▪ Division of Labor
- KMS creates the EZKEY & DEK
- KMS encrypts/decrypts the DEK/EDEK using the EZKEY
- HDFS NN communicates with the KMS to create EZKEYs & EDEKs to store in the extended attributes in the encryption zone
- HDFS client communicates with the KMS to get the DEK using the EZKEY name and EDEK
HDFS TDE – Security Concepts
▪ The name of the EZKEY is stored in the HDFS extended attributes of the directory associated with the encryption zone
▪ The EDEK is stored in the HDFS extended attributes of the file in the encryption zone
$ hadoop key …
$ hdfs crypto …
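A hedged sketch of how an encryption zone is commonly set up with these two tools. The key name `ezkey1` is hypothetical, the directory `/encrypted_dir` is taken from the example slides, and the script only prints the commands rather than running them, since they need a configured cluster, a KMS, and HDFS administrator privileges.

```python
import shlex
from typing import List


def create_key_cmd(key_name: str) -> List[str]:
    # `hadoop key create <keyname>` asks the configured key provider for a new key.
    return ["hadoop", "key", "create", key_name]


def create_zone_cmd(key_name: str, path: str) -> List[str]:
    # `hdfs crypto -createZone` turns an existing, empty directory into an encryption zone.
    return ["hdfs", "crypto", "-createZone", "-keyName", key_name, "-path", path]


def list_zones_cmd() -> List[str]:
    # `hdfs crypto -listZones` lists the encryption zones and their key names.
    return ["hdfs", "crypto", "-listZones"]


commands = [
    ["hdfs", "dfs", "-mkdir", "-p", "/encrypted_dir"],  # the zone directory must exist first
    create_key_cmd("ezkey1"),                           # hypothetical key name
    create_zone_cmd("ezkey1", "/encrypted_dir"),
    list_zones_cmd(),
]

for argv in commands:
    print("$ " + shlex.join(argv))
```

Keeping each command as an argument list makes it easy to pass the same values to `subprocess.run` later without shell quoting problems.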
HDFS Examples
▪ Simplified for the sake of clarity:
- Kerberos actions not shown
- NameNode EDEK cache not shown
HDFS – Create Encryption Zone
[Figure: the directory /encrypted_dir carries the xattr EZKEYNAME; the key store holds EZKEYNAME = KEY. Only one step label survives: 3. Create EZKEY]
HDFS – Create Encrypted File
1. Create file
2. Create EDEK
3. Create EDEK
4. Store EDEK
5. Return Success
[Figure: /encrypted_dir/file with xattr: EDEK, holding encrypted data]
HDFS TDE – File Write Work Flow
3. Request DEK from EDEK & EZKEYNAME
4. Decrypt DEK from EDEK
5. Return DEK
[Figure: the client reads unencrypted data and writes encrypted data to /encrypted_dir/file (xattr: EDEK); steps 1 and 2 are not preserved]
HDFS TDE – File Read Work Flow
3. Request DEK from EDEK & EZKEYNAME
4. Decrypt DEK from EDEK
5. Return DEK
[Figure: the client reads encrypted data from /encrypted_dir/file (xattr: EDEK) and writes unencrypted data; steps 1 and 2 are not preserved]
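The zone creation, file creation, write, and read flows can be tied together in a small simulation. Everything here is a hypothetical model: `ToyKMS`, `ToyHDFS`, `toy_cipher`, and the key name `ezkey1` are invented for illustration, the cipher is a SHA-256 keystream rather than a real one, NameNode metadata and block storage are collapsed into one object, and, as on the slides, Kerberos and the NameNode EDEK cache are left out.

```python
import hashlib
import os
from typing import Dict, Tuple


def toy_cipher(key: bytes, iv: bytes, data: bytes) -> bytes:
    """Toy symmetric stream cipher (SHA-256 keystream XOR); NOT AES, illustration only."""
    out = bytearray()
    for i in range(0, len(data), 32):
        block = hashlib.sha256(key + iv + (i // 32).to_bytes(8, "big")).digest()
        out.extend(b ^ k for b, k in zip(data[i:i + 32], block))
    return bytes(out)


class ToyKMS:
    """Holds the EZKEYs; they never leave this object."""

    def __init__(self) -> None:
        self._ezkeys: Dict[str, bytes] = {}

    def create_ezkey(self, name: str) -> None:
        self._ezkeys[name] = os.urandom(16)

    def generate_edek(self, ezkey_name: str) -> Tuple[bytes, bytes]:
        # Create a fresh DEK and hand back only its encrypted form (plus the IV).
        dek, iv = os.urandom(16), os.urandom(16)
        return iv, toy_cipher(self._ezkeys[ezkey_name], iv, dek)

    def decrypt_edek(self, ezkey_name: str, iv: bytes, edek: bytes) -> bytes:
        return toy_cipher(self._ezkeys[ezkey_name], iv, edek)


class ToyHDFS:
    """Stores xattrs (EZKEY name per zone, EDEK per file) and ciphertext only."""

    def __init__(self, kms: ToyKMS) -> None:
        self.kms = kms
        self.zone_xattr: Dict[str, str] = {}
        self.file_xattr: Dict[str, Tuple[bytes, bytes]] = {}
        self.blocks: Dict[str, bytes] = {}

    def create_zone(self, directory: str, ezkey_name: str) -> None:
        self.kms.create_ezkey(ezkey_name)
        self.zone_xattr[directory] = ezkey_name

    def create_file(self, directory: str, name: str) -> str:
        path = f"{directory}/{name}"
        self.file_xattr[path] = self.kms.generate_edek(self.zone_xattr[directory])
        return path


def fetch_dek(hdfs: ToyHDFS, kms: ToyKMS, path: str) -> Tuple[bytes, bytes]:
    # Request DEK from EDEK & EZKEYNAME; the KMS decrypts and returns the DEK.
    directory = path.rsplit("/", 1)[0]
    iv, edek = hdfs.file_xattr[path]
    return iv, kms.decrypt_edek(hdfs.zone_xattr[directory], iv, edek)


def client_write(hdfs: ToyHDFS, kms: ToyKMS, path: str, cleartext: bytes) -> None:
    iv, dek = fetch_dek(hdfs, kms, path)
    hdfs.blocks[path] = toy_cipher(dek, iv, cleartext)  # only ciphertext is stored


def client_read(hdfs: ToyHDFS, kms: ToyKMS, path: str) -> bytes:
    iv, dek = fetch_dek(hdfs, kms, path)
    return toy_cipher(dek, iv, hdfs.blocks[path])


kms = ToyKMS()
hdfs = ToyHDFS(kms)
hdfs.create_zone("/encrypted_dir", "ezkey1")
path = hdfs.create_file("/encrypted_dir", "file")
client_write(hdfs, kms, path, b"cleartext record")
print(client_read(hdfs, kms, path))
```

The point of the split is visible in the object layout: `ToyHDFS` never holds an EZKEY or an unencrypted DEK, which mirrors the design goal that HDFS never stores cleartext or unencrypted data encryption keys.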
Bring in the Containers (i.e. Docker)
▪ Issues with containers are the same for any virtualization platform
- Multiple compute clusters
- Multiple HDFS file systems
- Multiple Kerberos realms
- Cross-realm trust configuration
Containers as Virtual Machines
▪ Note – this is not about using containers to run Big Data tasks:
[Figure]
Containers as Virtual Machines
▪ This is about running Hadoop / Big Data clusters in containers:
[Figure: cluster]
Containers as Virtual Machines
▪ A true containerized Big Data environment:
[Figure]
KDC Cross-Realm Trust
▪ Different KDC realms for corporate, data, and compute
▪ Must interact correctly in order for the Big Data cluster to function
[Figure: three realms. CORP.ENTERPRISE.COM: End Users; COMPUTE.ENTERPRISE.COM: Hadoop/Spark Service Principals; DATALAKE.ENTERPRISE.COM: HDFS Service Principals]
KDC Cross-Realm Trust
▪ Different KDC realms for corporate, data, and compute
- One-way trust
• Compute realm trusts the corporate realm
• Data realm trusts the corporate realm
• Data realm trusts the compute realm
[Figure: three realms, each with its own KDC: CORP.ENTERPRISE.COM, COMPUTE.ENTERPRISE.COM, and DATALAKE.ENTERPRISE.COM. Other labels: Hadoop Cluster, rm@COMPUTE.ENTERPRISE.COM, user@CORP.ENTERPRISE.COM, HDFS: hdfs://remotedata/, Hadoop Key Management Service]
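The three one-way trusts can be modeled as a small directed relation. This is a sketch of the trust logic only, not of Kerberos itself: the function names are hypothetical, only direct trusts are considered, and the principal `hdfs@DATALAKE.ENTERPRISE.COM` is an assumed example that does not appear on the slides.

```python
from typing import Set, Tuple

# (trusting realm, trusted realm): one-way, as listed on the slide.
TRUSTS: Set[Tuple[str, str]] = {
    ("COMPUTE.ENTERPRISE.COM", "CORP.ENTERPRISE.COM"),
    ("DATALAKE.ENTERPRISE.COM", "CORP.ENTERPRISE.COM"),
    ("DATALAKE.ENTERPRISE.COM", "COMPUTE.ENTERPRISE.COM"),
}


def realm_of(principal: str) -> str:
    # A principal has the form name@REALM.
    return principal.rsplit("@", 1)[1]


def can_authenticate(principal: str, service_realm: str) -> bool:
    """True if services in `service_realm` accept `principal`.

    A service accepts principals from its own realm and from any realm
    that its realm directly trusts.
    """
    client_realm = realm_of(principal)
    return client_realm == service_realm or (service_realm, client_realm) in TRUSTS


checks = [
    ("user@CORP.ENTERPRISE.COM", "COMPUTE.ENTERPRISE.COM"),
    ("user@CORP.ENTERPRISE.COM", "DATALAKE.ENTERPRISE.COM"),
    ("rm@COMPUTE.ENTERPRISE.COM", "DATALAKE.ENTERPRISE.COM"),
    ("hdfs@DATALAKE.ENTERPRISE.COM", "COMPUTE.ENTERPRISE.COM"),  # assumed principal
]
for principal, realm in checks:
    print(f"{principal} -> {realm}: {can_authenticate(principal, realm)}")
```

Because the trusts are one-way, end users and compute services can reach the data realm, while a principal from the data realm is not accepted by the compute realm in this model.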
KDC Cross-Realm Trust
Key Management Service
▪ Must be enterprise quality
- Key Trustee Server
• Java KeyStore KMS
• Cloudera Navigator Key Trustee Server
Containers as Virtual Machines
▪ A true containerized Big Data environment:
[Figure: three DataLake instances, each shown with the realms CORP.ENTERPRISE.COM (End Users), COMPUTE.ENTERPRISE.COM (Hadoop/Spark Service Principals), and DATALAKE.ENTERPRISE.COM (HDFS Service Principals)]
Key Takeaways
▪ Hadoop has many security layers
- HDFS Transparent Data Encryption (TDE) is best of breed
- Security is hard (complex)
- Virtualization / containerization only makes it potentially harder
- Compute and storage separation with virtualization / containerization can make it even harder still
Key Takeaways
▪ Be careful with a build vs. buy decision for containerized Big Data
- Recommendation: buy one already built
- There are turnkey solutions (e.g. BlueData EPIC)
Reference: www.bluedata.com/blog/2017/08/hadoop-spark-docker-ten-things-to-know
www.bluedata.com
BlueData Booth #1508
in Strata Expo Hall
@tapbluedata

 
Direct Style Effect Systems - The Print[A] Example - A Comprehension Aid
Direct Style Effect Systems -The Print[A] Example- A Comprehension AidDirect Style Effect Systems -The Print[A] Example- A Comprehension Aid
Direct Style Effect Systems - The Print[A] Example - A Comprehension AidPhilip Schwarz
 
%in tembisa+277-882-255-28 abortion pills for sale in tembisa
%in tembisa+277-882-255-28 abortion pills for sale in tembisa%in tembisa+277-882-255-28 abortion pills for sale in tembisa
%in tembisa+277-882-255-28 abortion pills for sale in tembisamasabamasaba
 
VTU technical seminar 8Th Sem on Scikit-learn
VTU technical seminar 8Th Sem on Scikit-learnVTU technical seminar 8Th Sem on Scikit-learn
VTU technical seminar 8Th Sem on Scikit-learnAmarnathKambale
 
%+27788225528 love spells in Huntington Beach Psychic Readings, Attraction sp...
%+27788225528 love spells in Huntington Beach Psychic Readings, Attraction sp...%+27788225528 love spells in Huntington Beach Psychic Readings, Attraction sp...
%+27788225528 love spells in Huntington Beach Psychic Readings, Attraction sp...masabamasaba
 
AI & Machine Learning Presentation Template
AI & Machine Learning Presentation TemplateAI & Machine Learning Presentation Template
AI & Machine Learning Presentation TemplatePresentation.STUDIO
 
Love witchcraft +27768521739 Binding love spell in Sandy Springs, GA |psychic...
Love witchcraft +27768521739 Binding love spell in Sandy Springs, GA |psychic...Love witchcraft +27768521739 Binding love spell in Sandy Springs, GA |psychic...
Love witchcraft +27768521739 Binding love spell in Sandy Springs, GA |psychic...chiefasafspells
 

Último (20)

WSO2Con2024 - From Code To Cloud: Fast Track Your Cloud Native Journey with C...
WSO2Con2024 - From Code To Cloud: Fast Track Your Cloud Native Journey with C...WSO2Con2024 - From Code To Cloud: Fast Track Your Cloud Native Journey with C...
WSO2Con2024 - From Code To Cloud: Fast Track Your Cloud Native Journey with C...
 
%in Bahrain+277-882-255-28 abortion pills for sale in Bahrain
%in Bahrain+277-882-255-28 abortion pills for sale in Bahrain%in Bahrain+277-882-255-28 abortion pills for sale in Bahrain
%in Bahrain+277-882-255-28 abortion pills for sale in Bahrain
 
%in Hazyview+277-882-255-28 abortion pills for sale in Hazyview
%in Hazyview+277-882-255-28 abortion pills for sale in Hazyview%in Hazyview+277-882-255-28 abortion pills for sale in Hazyview
%in Hazyview+277-882-255-28 abortion pills for sale in Hazyview
 
%in Stilfontein+277-882-255-28 abortion pills for sale in Stilfontein
%in Stilfontein+277-882-255-28 abortion pills for sale in Stilfontein%in Stilfontein+277-882-255-28 abortion pills for sale in Stilfontein
%in Stilfontein+277-882-255-28 abortion pills for sale in Stilfontein
 
WSO2Con2024 - Enabling Transactional System's Exponential Growth With Simplicity
WSO2Con2024 - Enabling Transactional System's Exponential Growth With SimplicityWSO2Con2024 - Enabling Transactional System's Exponential Growth With Simplicity
WSO2Con2024 - Enabling Transactional System's Exponential Growth With Simplicity
 
%in Benoni+277-882-255-28 abortion pills for sale in Benoni
%in Benoni+277-882-255-28 abortion pills for sale in Benoni%in Benoni+277-882-255-28 abortion pills for sale in Benoni
%in Benoni+277-882-255-28 abortion pills for sale in Benoni
 
tonesoftg
tonesoftgtonesoftg
tonesoftg
 
+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...
+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...
+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...
 
WSO2CON 2024 - Freedom First—Unleashing Developer Potential with Open Source
WSO2CON 2024 - Freedom First—Unleashing Developer Potential with Open SourceWSO2CON 2024 - Freedom First—Unleashing Developer Potential with Open Source
WSO2CON 2024 - Freedom First—Unleashing Developer Potential with Open Source
 
Architecture decision records - How not to get lost in the past
Architecture decision records - How not to get lost in the pastArchitecture decision records - How not to get lost in the past
Architecture decision records - How not to get lost in the past
 
WSO2CON 2024 - Does Open Source Still Matter?
WSO2CON 2024 - Does Open Source Still Matter?WSO2CON 2024 - Does Open Source Still Matter?
WSO2CON 2024 - Does Open Source Still Matter?
 
WSO2CON 2024 - Building the API First Enterprise – Running an API Program, fr...
WSO2CON 2024 - Building the API First Enterprise – Running an API Program, fr...WSO2CON 2024 - Building the API First Enterprise – Running an API Program, fr...
WSO2CON 2024 - Building the API First Enterprise – Running an API Program, fr...
 
Devoxx UK 2024 - Going serverless with Quarkus, GraalVM native images and AWS...
Devoxx UK 2024 - Going serverless with Quarkus, GraalVM native images and AWS...Devoxx UK 2024 - Going serverless with Quarkus, GraalVM native images and AWS...
Devoxx UK 2024 - Going serverless with Quarkus, GraalVM native images and AWS...
 
Direct Style Effect Systems - The Print[A] Example - A Comprehension Aid
Direct Style Effect Systems -The Print[A] Example- A Comprehension AidDirect Style Effect Systems -The Print[A] Example- A Comprehension Aid
Direct Style Effect Systems - The Print[A] Example - A Comprehension Aid
 
Abortion Pill Prices Tembisa [(+27832195400*)] 🏥 Women's Abortion Clinic in T...
Abortion Pill Prices Tembisa [(+27832195400*)] 🏥 Women's Abortion Clinic in T...Abortion Pill Prices Tembisa [(+27832195400*)] 🏥 Women's Abortion Clinic in T...
Abortion Pill Prices Tembisa [(+27832195400*)] 🏥 Women's Abortion Clinic in T...
 
%in tembisa+277-882-255-28 abortion pills for sale in tembisa
%in tembisa+277-882-255-28 abortion pills for sale in tembisa%in tembisa+277-882-255-28 abortion pills for sale in tembisa
%in tembisa+277-882-255-28 abortion pills for sale in tembisa
 
VTU technical seminar 8Th Sem on Scikit-learn
VTU technical seminar 8Th Sem on Scikit-learnVTU technical seminar 8Th Sem on Scikit-learn
VTU technical seminar 8Th Sem on Scikit-learn
 
%+27788225528 love spells in Huntington Beach Psychic Readings, Attraction sp...
%+27788225528 love spells in Huntington Beach Psychic Readings, Attraction sp...%+27788225528 love spells in Huntington Beach Psychic Readings, Attraction sp...
%+27788225528 love spells in Huntington Beach Psychic Readings, Attraction sp...
 
AI & Machine Learning Presentation Template
AI & Machine Learning Presentation TemplateAI & Machine Learning Presentation Template
AI & Machine Learning Presentation Template
 
Love witchcraft +27768521739 Binding love spell in Sandy Springs, GA |psychic...
Love witchcraft +27768521739 Binding love spell in Sandy Springs, GA |psychic...Love witchcraft +27768521739 Binding love spell in Sandy Springs, GA |psychic...
Love witchcraft +27768521739 Binding love spell in Sandy Springs, GA |psychic...
 

How to Protect Big Data in a Containerized Environment

  • 1. How to Protect Big Data in a Containerized Environment
      - Thomas Phelan, Chief Architect, BlueData (@tapbluedata)
  • 2. Outline
      - Securing a Big Data Environment
      - Data Protection
      - Transparent Data Encryption
      - Transparent Data Encryption in a Containerized Environment
      - Takeaways
  • 3. In the Beginning …
      - Hadoop was used to process public web data
          - No compelling need for security
              - No user or service authentication
              - No data security
  • 4. Then Hadoop Became Popular
      - Security is important.
  • 5. Layers of Security in Hadoop
      - Access
      - Authentication
      - Authorization
      - Data Protection
      - Auditing
      - Policy (protect from human error)
  • 6. Hadoop Security: Data Protection
      - Reference: https://www.cloudera.com/documentation/enterprise/5-6-x/topics/sg_edh_overview.html
  • 7. Focus on Data Security
      - Confidentiality
          - Confidentiality is lost when data is accessed by someone not authorized to do so
      - Integrity
          - Integrity is lost when data is modified in unexpected ways
      - Availability
          - Availability is lost when data is erased or becomes inaccessible
      - Reference: https://www.us-cert.gov/sites/default/files/publications/infosecuritybasics.pdf
  • 8. Hadoop Distributed File System (HDFS)
      - Data Security Features
          - Access Control
          - Data Encryption
          - Data Replication
  • 9. Access Control
      - Simple
          - Identity determined by host operating system
      - Kerberos
          - Identity determined by Kerberos credentials
          - One realm for both compute and storage
          - Required for HDFS Transparent Data Encryption
  • 11. Data Replication
      - 3 way replication
          - Can survive any 2 failures
      - Erasure Coding
          - Can survive more than 2 failures depending on parity bit configuration
  • 12. HDFS with End-to-End Encryption
      - Confidentiality
          - Data Access
      - Integrity
          - Data Access + Data Encryption
      - Availability
          - Data Access + Data Replication
  • 13. Data Encryption
      - How to transform the data?
      - (diagram: a block of cleartext bits is transformed into ciphertext)
  • 14. Data Encryption – At Rest
      - Data is encrypted while on persistent media (disk)
  • 15. Data Encryption – In Transit
      - Data is encrypted while traveling over the network
  • 17. HDFS Transparent Data Encryption (TDE)
      - End-to-end encryption
          - Data is encrypted/decrypted at the client
              - Data is protected at rest and in transit
      - Transparent
          - No application level code changes required
  • 18. HDFS TDE – Design
      - Goals:
          - Only an authorized client/user can access cleartext
          - HDFS never stores cleartext or unencrypted data encryption keys
  • 19. HDFS TDE – Terminology
      - Encryption Zone
          - A directory whose file contents will be encrypted upon write and decrypted upon read
          - An EZKEY is generated for each zone
  • 20. HDFS TDE – Terminology
      - EZKEY – encryption zone key
      - DEK – data encryption key
      - EDEK – encrypted data encryption key
  • 21. HDFS TDE – Data Encryption
      - The same key is used to encrypt and decrypt data
      - The size of the ciphertext is exactly the same as the size of the original cleartext
          - EZKEY + DEK => EDEK
          - EDEK + EZKEY => DEK
  • 22. HDFS TDE – Services
      - HDFS NameNode (NN)
      - Kerberos Key Distribution Center (KDC)
      - Hadoop Key Management Server (KMS)
          - Key Trustee Server
  • 23. HDFS TDE – Security Concepts
      - Division of Labor
          - KMS creates the EZKEY & DEK
          - KMS encrypts/decrypts the DEK/EDEK using the EZKEY
          - HDFS NN communicates with the KMS to create EZKEYs & EDEKs to store in the extended attributes in the encryption zone
          - HDFS client communicates with the KMS to get the DEK using the EZKEY and EDEK
  • 24. HDFS TDE – Security Concepts
      - The name of the EZKEY is stored in the HDFS extended attributes of the directory associated with the encryption zone
      - The EDEK is stored in the HDFS extended attributes of the file in the encryption zone
      - $ hadoop key …
      - $ hdfs crypto …
  • 25. HDFS Examples
      - Simplified for the sake of clarity:
          - Kerberos actions not shown
          - NameNode EDEK cache not shown
  • 26. HDFS – Create Encryption Zone
      - (diagram labels) 3. Create EZKEY; EZKEYNAME = KEY; /encrypted_dir has xattr: EZKEYNAME
  • 27. HDFS – Create Encrypted File
      - (diagram labels) 1. Create file; 2. Create EDEK; 3. Create EDEK; 4. Store EDEK; 5. Return Success
      - /encrypted_dir/file has xattr: EDEK; /encrypted_dir/file holds encrypted data
  • 28. HDFS TDE – File Write Work Flow
      - (diagram labels) 3. Request DEK from EDEK & EZKEYNAME; 4. Decrypt DEK from EDEK; 5. Return DEK
      - /encrypted_dir/file has xattr: EDEK; read unencrypted data, write encrypted data to /encrypted_dir/file
  • 29. HDFS TDE – File Read Work Flow
      - (diagram labels) 3. Request DEK from EDEK & EZKEYNAME; 4. Decrypt DEK from EDEK; 5. Return DEK
      - /encrypted_dir/file has xattr: EDEK; read encrypted data from /encrypted_dir/file, write unencrypted data
  • 30. Bring in the Containers (i.e. Docker)
      - Issues with containers are the same for any virtualization platform
          - Multiple compute clusters
          - Multiple HDFS file systems
          - Multiple Kerberos realms
          - Cross-realm trust configuration
  • 31. Containers as Virtual Machines
      - Note – this is not about using containers to run Big Data tasks (diagram)
  • 32. Containers as Virtual Machines
      - This is about running Hadoop / Big Data clusters in containers (diagram: cluster)
  • 33. Containers as Virtual Machines
      - A true containerized Big Data environment (diagram)
  • 34. KDC Cross-Realm Trust
      - Different KDC realms for corporate, data, and compute
      - Must interact correctly in order for the Big Data cluster to function
      - (diagram labels) CORP.ENTERPRISE.COM: End Users; COMPUTE.ENTERPRISE.COM: Hadoop/Spark Service Principals; DATALAKE.ENTERPRISE.COM: HDFS Service Principals
  • 35. KDC Cross-Realm Trust
      - Different KDC realms for corporate, data, and compute
          - One-way trust
              - Compute realm trusts the corporate realm
              - Data realm trusts the corporate realm
              - Data realm trusts the compute realm
  • 36. KDC Cross-Realm Trust
      - (diagram labels) CORP.ENTERPRISE.COM Realm, KDC: CORP.ENTERPRISE.COM, user@CORP.ENTERPRISE.COM
      - (diagram labels) COMPUTE.ENTERPRISE.COM Realm, KDC: COMPUTE.ENTERPRISE.COM, Hadoop Cluster, rm@COMPUTE.ENTERPRISE.COM
      - (diagram labels) DATALAKE.ENTERPRISE.COM Realm, KDC: DATALAKE.ENTERPRISE.COM, HDFS: hdfs://remotedata/
      - (diagram labels) Hadoop Key Management Service
  • 37. Key Management Service
      - Must be enterprise quality
          - Key Trustee Server
              - Java KeyStore KMS
              - Cloudera Navigator Key Trustee Server
  • 38. Containers as Virtual Machines
      - A true containerized Big Data environment
      - (diagram labels) DataLake, shown three times; the realm set below is also shown three times
          - CORP.ENTERPRISE.COM: End Users
          - COMPUTE.ENTERPRISE.COM: Hadoop/Spark Service Principals
          - DATALAKE.ENTERPRISE.COM: HDFS Service Principals
  • 39. Key Takeaways
      - Hadoop has many security layers
          - HDFS Transparent Data Encryption (TDE) is best of breed
          - Security is hard (complex)
          - Virtualization / containerization only makes it potentially harder
          - Compute and storage separation with virtualization / containerization can make it even harder still
  • 40. Key Takeaways
      - Be careful with a build vs. buy decision for containerized Big Data
          - Recommendation: buy one already built
          - There are turnkey solutions (e.g. BlueData EPIC)
      - Reference: www.bluedata.com/blog/2017/08/hadoop-spark-docker-ten-things-to-know
  • 41. www.bluedata.com
      - BlueData Booth #1508 in Strata Expo Hall
      - @tapbluedata
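
The key relationships on slides 19 to 29 (EZKEY + DEK => EDEK, EDEK + EZKEY => DEK, and a client that encrypts and decrypts file data with the DEK) can be sketched in a few lines of Python. This is a toy model under stated assumptions, not how HDFS or the Hadoop KMS is implemented: `keystream_xor` is a hypothetical SHA-256 counter-mode XOR that stands in for a real cipher such as AES-CTR, `ToyKMS` and the key name `zone_key` are invented for illustration, and the extended attributes are modeled as plain dictionaries. Kerberos and the NameNode EDEK cache are left out, as in the slides.

```python
import hashlib
import secrets


def keystream_xor(key, nonce, data):
    """Toy symmetric cipher: XOR data with a SHA-256 counter keystream.

    Illustrative stand-in for a real cipher (for example AES-CTR).
    The same call both encrypts and decrypts, and the output has the
    same length as the input. Do not use this for real data.
    """
    out = bytearray()
    for offset in range(0, len(data), 32):
        counter = (offset // 32).to_bytes(8, "big")
        pad = hashlib.sha256(key + nonce + counter).digest()
        chunk = data[offset:offset + 32]
        out.extend(b ^ p for b, p in zip(chunk, pad))
    return bytes(out)


class ToyKMS:
    """Hypothetical key server: keeps EZKEYs private, wraps and unwraps DEKs."""

    def __init__(self):
        self._ezkeys = {}  # EZKEY name -> EZKEY bytes, never leaves the KMS

    def create_ezkey(self, name):
        self._ezkeys[name] = secrets.token_bytes(32)

    def generate_edek(self, ezkey_name):
        """EZKEY + DEK => EDEK. Returns (nonce, EDEK), not the DEK."""
        dek = secrets.token_bytes(32)
        nonce = secrets.token_bytes(16)
        return nonce, keystream_xor(self._ezkeys[ezkey_name], nonce, dek)

    def decrypt_edek(self, ezkey_name, nonce, edek):
        """EDEK + EZKEY => DEK."""
        return keystream_xor(self._ezkeys[ezkey_name], nonce, edek)


# Create encryption zone: the directory xattr stores only the EZKEY name.
kms = ToyKMS()
kms.create_ezkey("zone_key")
dir_xattr = {"/encrypted_dir": "zone_key"}

# Create encrypted file: the file xattr stores the EDEK, never the DEK.
file_xattr = {"/encrypted_dir/file": kms.generate_edek(dir_xattr["/encrypted_dir"])}

# Write: the client asks the KMS for the DEK, then encrypts locally.
nonce, edek = file_xattr["/encrypted_dir/file"]
write_dek = kms.decrypt_edek(dir_xattr["/encrypted_dir"], nonce, edek)
cleartext = b"public web data that later became sensitive"
ciphertext = keystream_xor(write_dek, nonce, cleartext)

# Read: a later client repeats the DEK request and decrypts locally.
read_dek = kms.decrypt_edek(dir_xattr["/encrypted_dir"], nonce, edek)
recovered = keystream_xor(read_dek, nonce, ciphertext)
```

In this sketch the storage side holds only `ciphertext`, the EZKEY name, and the EDEK, which mirrors the design goal on slide 18 that HDFS never stores cleartext or unencrypted data encryption keys.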
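
The one-way trusts listed on slide 35 can also be written down as a small directed graph, which makes it easy to check which principals a service in a given realm would accept. This is only a sketch of the trust relationships, not Kerberos configuration: the function names are invented for illustration, only direct (non-transitive) trust is modeled, and the principal `hdfs@DATALAKE.ENTERPRISE.COM` is a hypothetical example that does not appear in the slides.

```python
# Directed trust edges from slide 35: (trusting realm, trusted realm).
TRUSTS = {
    ("COMPUTE.ENTERPRISE.COM", "CORP.ENTERPRISE.COM"),
    ("DATALAKE.ENTERPRISE.COM", "CORP.ENTERPRISE.COM"),
    ("DATALAKE.ENTERPRISE.COM", "COMPUTE.ENTERPRISE.COM"),
}


def realm_of(principal):
    """Return the realm part of a principal such as user@REALM."""
    return principal.rsplit("@", 1)[1]


def accepts(service_realm, principal):
    """True if a service in service_realm would accept the principal.

    A principal is accepted when it belongs to the service's own realm
    or to a realm that the service's realm directly trusts.
    """
    client_realm = realm_of(principal)
    return client_realm == service_realm or (service_realm, client_realm) in TRUSTS


# End users and compute services can reach the data lake ...
user_to_data = accepts("DATALAKE.ENTERPRISE.COM", "user@CORP.ENTERPRISE.COM")
rm_to_data = accepts("DATALAKE.ENTERPRISE.COM", "rm@COMPUTE.ENTERPRISE.COM")
# ... but the trust is one-way, so the reverse directions are rejected.
rm_to_corp = accepts("CORP.ENTERPRISE.COM", "rm@COMPUTE.ENTERPRISE.COM")
hdfs_to_compute = accepts("COMPUTE.ENTERPRISE.COM", "hdfs@DATALAKE.ENTERPRISE.COM")
```

Writing the edges as (trusting, trusted) pairs keeps the direction explicit, which is the part of a cross-realm setup that is easiest to get backwards.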

Editor's Notes

  1. Jason to briefly cover agenda
  2. Jason to briefly cover agenda