Securing the Hadoop Ecosystem
Hadoop Security and Compliance Challenges
• History
• Security was not a priority for early Hadoop adopters such as Yahoo! and Facebook; it is now
• Data concentration
• Quantity and diversity of data create compliance challenges
• Flexibility of the Hadoop architecture
• Many paths for data in, out, processing
• Access data at different granularities, from fields to files
• ELT: sensitive data “discovery” occurs after data arrives
Cloudera has led in investments in security
Authentication
• First Hadoop distribution to offer strong authentication throughout
Encryption
• First Hadoop distribution to support encryption on wire
Audit
• Only Hadoop distribution to support audit histories for all data objects & access paths
• Single point for log capture, audit
Authorization
• Founded the Apache Sentry project along with Oracle and Lab41 to manage fine-grained permissions
Automation
• Cloudera Manager automates security configurations & LDAP/AD integration
Case Study: Finance and Banking
Fraud and Purchasing Behavior Analysis
• Identify patterns in financially sensitive, PCI, and PII data
• Before: Unable to build applications on Hadoop; forced to use other systems, to greatly limit Hadoop access, or to forgo analysis due to privacy concerns
• Now: Provide broad analysis capabilities with Impala to a large population, secured by Sentry
Enterprise Security in Hadoop overview
Four Functional Areas
• Perimeter
• Data
• Access
• Visibility
[Diagram: users, applications, and operators around the Hadoop cluster, with the four functional areas labeled.]
Defining the Functional Areas
Perimeter
• Guarding access to the cluster itself
• Technical concepts: Authentication, Network isolation
Data
• Protecting data in the cluster from unauthorized visibility
• Technical concepts: Encryption, Tokenization, Data masking
Access
• Defining what users and applications can do with data
• Technical concepts: Permissions, Authorization
Visibility
• Reporting on where data came from and how it’s being used
• Technical concepts: Auditing, Lineage
Enabling Enterprise Security
• Perimeter: Kerberos | AD/LDAP
• Data: Native | Certified Partners
• Access: Sentry
• Visibility: Cloudera Navigator
Perimeter: Authentication in Hadoop
Kerberos
• Provably strong authentication between all Hadoop services and (optionally) to end-points
• Cloudera Manager hides complexity
LDAP/AD
• Username / password
• Option for Hue, Hive Metastore, Impala connectors, Cloudera Manager admin logins
SAML
• For Single Sign-On (SSO) for listed options
• Kerberos clients no longer required on most user end-points
Authentication Options and Coverage
[Diagram: authentication coverage across the cluster. Services shown: HDFS (DN, NN), YARN (RM, AM), Impala (ID, SS), MapReduce (JT, TT), and other services (Oozie, Search, etc.). Also shown: clients, a 3rd-party gateway, and applications (Pig, Hive, Hue, etc.). Coverage labels: “End-to-End” Kerberos; “Core” Kerberos; “Edge” AD/LDAP/SAML.]
IT Integration: Kerberos
• Users don’t want Yet Another Credential
• Corp IT doesn’t want to provision and maintain thousands of service principals and keytabs
• Solution: local KDC + one-way trust
• Run MIT Kerberos KDC in the cluster
• Put all service principals here
• Set up one-way trust of central corporate realm by local KDC
• Normal user credentials can be used to access Hadoop
• Recommended: Use Cloudera Manager
• To properly tune inter-related configuration knobs
• To manage principals/keytabs creation and distribution
• To preserve service monitoring with Kerberos security enabled
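As a rough illustration of the bookkeeping that the local-KDC approach keeps out of corporate IT, the sketch below enumerates one service principal per service and host in the cluster realm. The realm name follows the HADOOP.EXAMPLE.COM example used in this deck; the service and host lists and the helper name `service_principals` are hypothetical, and in practice Cloudera Manager would create and distribute the principals and keytabs.

```python
from itertools import product

def service_principals(services, hosts, realm):
    """Return Kerberos principal names of the form service/host@REALM."""
    return [f"{svc}/{host}@{realm}" for svc, host in product(services, hosts)]

# Hypothetical inventory, for illustration only.
services = ["hdfs", "yarn"]
hosts = ["host1", "host2"]
principals = service_principals(services, hosts, "HADOOP.EXAMPLE.COM")

for principal in principals:
    print(principal)
```

Even this toy inventory yields one principal per service and host, which is why the count grows into the thousands on a large cluster.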
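IT Integration: Kerberos + LDAP
[Diagram: the Hadoop cluster contains a local KDC (MIT Kerberos) holding service principals such as hdfs/host1@HADOOP.EXAMPLE.COM, yarn/host2@HADOOP.EXAMPLE.COM, … A cross-realm trust connects it to the central Active Directory, which holds user principals such as user@EXAMPLE.COM … The NN and JT use LDAP group mapping.]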
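One way to read the diagram: a user authenticates against the corporate realm, the cluster accepts the ticket because of the cross-realm trust, and the NameNode or JobTracker then resolves the user's groups through LDAP. The sketch below imitates those two lookups with plain dictionaries. The trusted-realm set, the directory contents, and the function names are hypothetical; real clusters configure this through Hadoop's `auth_to_local` rules and LDAP group mapping rather than application code.

```python
# Local cluster realm plus the trusted corporate realm (names from the diagram).
TRUSTED_REALMS = {"HADOOP.EXAMPLE.COM", "EXAMPLE.COM"}

# Stand-in for an LDAP directory: short user name -> groups (hypothetical data).
DIRECTORY = {"alice": ["dep1_analyst"], "bob": ["dep1_admin"]}

def short_name(principal):
    """Map 'user@REALM' or 'service/host@REALM' to a short name, or raise."""
    name, _, realm = principal.partition("@")
    if realm not in TRUSTED_REALMS:
        raise PermissionError(f"untrusted realm: {realm!r}")
    return name.split("/", 1)[0]

def groups_for(principal):
    """Resolve the groups of an authenticated principal via the directory."""
    return DIRECTORY.get(short_name(principal), [])

print(groups_for("alice@EXAMPLE.COM"))
```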
Network Access Management
• Use Hue to front-end both Hadoop and Oozie to control access through a web browser
• HTTP proxy servers:
• Oozie: MR jobs, Pig jobs, Hive jobs
• HttpFS: hadoop fs is front-ended over HTTP
• HBase REST server: HBase reads
• Secure configuration with Oozie, Hue and HttpFS front-ends co-located to act as network bridge
• Hue supports AD/LDAP based authentication instead of Kerberos for client simplicity
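HttpFS exposes the WebHDFS REST API, so a client outside the cluster needs only HTTP access to the gateway host. The sketch below builds a directory-listing URL without contacting any server. The hostname and user are hypothetical, 14000 is the usual HttpFS default port, and `user.name` is the pseudo-authentication parameter used on unsecured test clusters; a Kerberized deployment would authenticate with SPNEGO instead.

```python
from urllib.parse import quote, urlencode

def httpfs_url(host, path, op, port=14000, **params):
    """Build a WebHDFS-style REST URL served by an HttpFS gateway."""
    query = urlencode({"op": op, **params})
    return f"http://{host}:{port}/webhdfs/v1{quote(path)}?{query}"

# Hypothetical gateway host and user, for illustration only.
url = httpfs_url("httpfs.example.com", "/user/alice", "LISTSTATUS",
                 **{"user.name": "alice"})
print(url)
```

Because every request passes through one HTTP endpoint, the gateway host is the only machine that needs network access to the cluster's internal ports.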
Data: Protection in Hadoop
Data in Motion: “Network Encryption”
• SASL: Network RPC
• SSL: MapReduce shuffle
• SSL: Web-based user and administration tools
• SSL: JDBC
• HDFS data transfer protocol
Data at Rest: “Data Encryption”
• Certified partner solutions
• Field-level encryption
• Data masking or tokenization
• OS-level file system encryption
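To make the difference between masking and tokenization concrete, here is a minimal sketch using only the Python standard library. It does not represent how any certified partner product works: the key, the function names, and the sample card number are made up for illustration, and the keyed hash is a simple stand-in for tokenization (production systems often use a token vault or format-preserving encryption). Masking hides most of a value for display, while a deterministic token maps equal inputs to equal outputs so joins and counts still work on the protected column.

```python
import hashlib
import hmac

def mask_card(number, visible=4):
    """Replace all but the last `visible` digits with '*'."""
    digits = number.replace(" ", "").replace("-", "")
    return "*" * (len(digits) - visible) + digits[-visible:]

def tokenize(value, key):
    """Deterministic keyed token: HMAC-SHA256, hex-encoded, truncated to 16 chars."""
    digest = hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()
    return digest[:16]

key = b"demo-key-not-for-production"  # hypothetical; real keys belong in a key manager
card = "4111 1111 1111 1111"          # widely used test number, not real data

print(mask_card(card))                              # ************1111
print(tokenize(card, key) == tokenize(card, key))   # True: same input, same token
```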
Prior State of Authorization
Two Sub-Optimal Choices for SQL on Hadoop
• Insecure Advisory Authorization
• Users could grant themselves permissions
• Intended to prevent accidental deletion of data
• Problem: Did not guard against malicious users
• Problem: Only worked with Hive
• HDFS Impersonation
• Data was only protected at the file level by HDFS permissions
• Problem: File-level not granular enough
• Problem: Lacked flexibility; not role-based
Sentry: Key Capabilities
Fine-Grained Authorization
• Specify security for SERVERS, DATABASES, TABLES, VIEWS, and search indices
Role-Based Authorization
• SELECT privilege on views & tables
• INSERT privilege on tables
• TRANSFORM privilege on servers
• ALL privilege on the server, databases, tables & views
• ALL privilege is needed to create/modify schema
Multitenant Administration
• Separate policies for each database/schema
• Can be maintained by separate admins
Sentry Architecture
[Diagram: Sentry architecture. Labels recovered: binding layer (Impala, Hive, Search); policy engine; policy provider (file, database); authorization provider in HiveServer2 and Impala (parsing, evaluation, validation); interfaces; policy storage on local FS/HDFS; access paths: Search, Query, MR, SQL.]
Query Execution Flow
• Parse: Validate SQL grammar
• Build: Construct statement tree
• Check: Validate statement objects
• First check: Authorization (Sentry)
• Plan: Forward to execution planner
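The ordering matters: authorization is the first check on the built statement, so an unauthorized query is rejected before it reaches the planner. The toy pipeline below mirrors the four steps. The one-statement grammar, the `ALLOWED` privilege table, and all function names are hypothetical stand-ins, not Impala or Hive internals.

```python
class AuthorizationError(Exception):
    """Raised when the user lacks the privilege for a statement."""

# Hypothetical privilege table: (user, table) -> set of allowed actions.
ALLOWED = {("analyst", "db1.sales"): {"select"}}

def parse(sql):
    """Parse: validate a tiny 'SELECT * FROM <table>' grammar."""
    tokens = sql.strip().rstrip(";").split()
    if len(tokens) != 4 or [t.lower() for t in tokens[:3]] != ["select", "*", "from"]:
        raise SyntaxError(f"unsupported statement: {sql!r}")
    return tokens

def build(tokens):
    """Build: construct a minimal statement tree."""
    return {"action": "select", "table": tokens[3].lower()}

def check(stmt, user):
    """Check: authorization comes first, before any other validation."""
    if stmt["action"] not in ALLOWED.get((user, stmt["table"]), set()):
        raise AuthorizationError(f"{user} may not {stmt['action']} {stmt['table']}")
    return stmt

def plan(stmt):
    """Plan: hand the authorized statement to the execution planner."""
    return f"SCAN {stmt['table']}"

def run(sql, user):
    return plan(check(build(parse(sql)), user))

print(run("SELECT * FROM db1.sales", "analyst"))  # SCAN db1.sales
```

Calling `run` with a user that has no entry in `ALLOWED` raises `AuthorizationError` inside `check`, so `plan` is never invoked.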
Multitenant Security
Global
[groups]
admin_group = admin_role
dep1_admin = uri_role
[roles]
admin_role = server=server1
uri_role = hdfs:///ha-nn-uri/data
[databases]
db1 = hdfs://ha-nn-uri/user/hive/sentry/db1.ini
Per Database
[groups]
dep1_admin = db1_admin_role
dep1_analyst = db1_read_role
[roles]
db1_admin_role = server=server1->db=db1
db1_read_role = server=server1->db=db1->table=*->action=select
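To show how such a file could resolve a request, the sketch below parses the per-database example with Python's `configparser` and tests whether a group's roles grant an action on a table. The matching rule used here (each granted component must equal the requested component or be `*`, and a shorter grant covers everything beneath it) is a simplified assumption for illustration, not Sentry's actual evaluation logic. The helpers `parse_privilege`, `implies`, and `allowed`, and the table name `orders`, are hypothetical.

```python
import configparser

POLICY = """
[groups]
dep1_admin = db1_admin_role
dep1_analyst = db1_read_role
[roles]
db1_admin_role = server=server1->db=db1
db1_read_role = server=server1->db=db1->table=*->action=select
"""

def parse_privilege(text):
    """Split 'server=s->db=d->...' into an ordered list of (key, value) pairs."""
    return [tuple(part.split("=", 1)) for part in text.split("->")]

def implies(granted, requested):
    """True if every granted component matches the request or is the wildcard '*'."""
    req = dict(requested)
    return all(value == "*" or req.get(key) == value for key, value in granted)

def allowed(policy_text, group, request):
    """Check whether any role mapped to `group` grants `request`."""
    # Split only on the first '=' so privilege strings keep their own '=' signs.
    cfg = configparser.ConfigParser(delimiters=("=",), interpolation=None)
    cfg.read_string(policy_text)
    roles = [r.strip() for r in cfg["groups"].get(group, "").split(",") if r.strip()]
    return any(
        implies(parse_privilege(cfg["roles"][role]), parse_privilege(request))
        for role in roles
    )

read = "server=server1->db=db1->table=orders->action=select"
write = "server=server1->db=db1->table=orders->action=insert"
print(allowed(POLICY, "dep1_analyst", read))   # True
print(allowed(POLICY, "dep1_analyst", write))  # False
print(allowed(POLICY, "dep1_admin", write))    # True
```

The analyst group can only read, while the admin group's database-level grant covers every table and action inside db1.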
Apache Ecosystem and Sentry
Inline support in Cloudera Impala
Extensibility plug-in for Apache HiveServer2
Inline support in Cloudera Search
Complementary security with HDFS ACLs
Access: Authorization in Hadoop
File ACL
Admin RBAC
Data RBAC
• Permission at file-level granularity
• HDFS POSIX-style permissions: u/g/o
• Access Control Lists (ACL)
• HBase, Oozie, MapReduce
• Permissions on tables, views, indices
• Sentry for HiveServer2, Impala, Search
App and Workflow
• Cloudera Manager, Hue
Visibility: Cloudera Navigator
Audit & Access Control
• Maintain full audit history
• Ensuring appropriate permissions and reporting on data access for compliance
Discovery & Exploration
• Finding out what data is available and what it looks like
Lineage
• Tracing data back to its original source
Lifecycle Management
• Migration of data based on policies
[Diagram: Cloudera’s Enterprise Data Hub. Storage for any type of data (unified, elastic, resilient, secure): filesystem (HDFS), online NoSQL (HBase). Workload management: YARN. Engines: batch processing (MapReduce), analytic SQL (Impala), search engine (Solr), machine learning (Spark), stream processing (Spark Streaming), 3rd-party apps. Data management: Cloudera Navigator. System management: Cloudera Manager. Security: Sentry.]
Why Navigator?
1. Lots of Data Landing in Cloudera Enterprise
• Huge quantities
• Many different sources – structured and unstructured
• Varying levels of sensitivity
2. Many Users Working with the Data
• Administrators and compliance officers
• Analysts and data scientists
• Business users
3. Need to Effectively Control and Consume Data
• Get visibility and control over the environment
• Discover and explore data
Leading Investment to Address the Challenges
• Authentication: First Hadoop distribution to offer strong authentication throughout
• Encryption: First Hadoop distribution to support encryption on wire
• Audit: Only Hadoop distribution to support audit histories for all data objects and access paths; single point for log capture, audit
• Authorization: Founded the Apache Sentry project along with Oracle and Lab41 to manage fine-grained permissions
• Automation: Cloudera Manager automates security configurations & LDAP/AD integration
Cloudera 5: Enabling the Enterprise Data Hub
[Figure: capability checklist. Open Source, Scalable, Flexible, Cost-Effective: ✔. Managed, Open Architecture, Secure and Governed: shown with both ✖ and ✔ marks.]
Hadoop and Data Access Security

Mais conteúdo relacionado

Mais procurados

Meetup: Streaming Data Pipeline Development
Meetup:  Streaming Data Pipeline DevelopmentMeetup:  Streaming Data Pipeline Development
Meetup: Streaming Data Pipeline Development
Timothy Spann
 

Mais procurados (20)

Les BD NoSQL
Les BD NoSQLLes BD NoSQL
Les BD NoSQL
 
BigData_Chp2: Hadoop & Map-Reduce
BigData_Chp2: Hadoop & Map-ReduceBigData_Chp2: Hadoop & Map-Reduce
BigData_Chp2: Hadoop & Map-Reduce
 
Introduction to Apache Sqoop
Introduction to Apache SqoopIntroduction to Apache Sqoop
Introduction to Apache Sqoop
 
BigData_TP2: Design Patterns dans Hadoop
BigData_TP2: Design Patterns dans HadoopBigData_TP2: Design Patterns dans Hadoop
BigData_TP2: Design Patterns dans Hadoop
 
Big Data, Hadoop & Spark
Big Data, Hadoop & SparkBig Data, Hadoop & Spark
Big Data, Hadoop & Spark
 
Apache Iceberg - A Table Format for Hige Analytic Datasets
Apache Iceberg - A Table Format for Hige Analytic DatasetsApache Iceberg - A Table Format for Hige Analytic Datasets
Apache Iceberg - A Table Format for Hige Analytic Datasets
 
Alphorm.com Formation Big Data & Hadoop : Le Guide Complet
Alphorm.com Formation Big Data & Hadoop : Le Guide CompletAlphorm.com Formation Big Data & Hadoop : Le Guide Complet
Alphorm.com Formation Big Data & Hadoop : Le Guide Complet
 
Introduction au BIG DATA
Introduction au BIG DATAIntroduction au BIG DATA
Introduction au BIG DATA
 
Meetup: Streaming Data Pipeline Development
Meetup:  Streaming Data Pipeline DevelopmentMeetup:  Streaming Data Pipeline Development
Meetup: Streaming Data Pipeline Development
 
Cours HBase et Base de Données Orientées Colonnes (HBase, Column Oriented Dat...
Cours HBase et Base de Données Orientées Colonnes (HBase, Column Oriented Dat...Cours HBase et Base de Données Orientées Colonnes (HBase, Column Oriented Dat...
Cours HBase et Base de Données Orientées Colonnes (HBase, Column Oriented Dat...
 
Introduction à HDFS
Introduction à HDFSIntroduction à HDFS
Introduction à HDFS
 
BigData_TP3 : Spark
BigData_TP3 : SparkBigData_TP3 : Spark
BigData_TP3 : Spark
 
Quand utiliser MongoDB … Et quand vous en passer…
Quand utiliser MongoDB	… Et quand vous en passer…Quand utiliser MongoDB	… Et quand vous en passer…
Quand utiliser MongoDB … Et quand vous en passer…
 
Building Streaming Data Pipelines with Google Cloud Dataflow and Confluent Cl...
Building Streaming Data Pipelines with Google Cloud Dataflow and Confluent Cl...Building Streaming Data Pipelines with Google Cloud Dataflow and Confluent Cl...
Building Streaming Data Pipelines with Google Cloud Dataflow and Confluent Cl...
 
introduction à MongoDB
introduction à MongoDBintroduction à MongoDB
introduction à MongoDB
 
Une introduction à Hive
Une introduction à HiveUne introduction à Hive
Une introduction à Hive
 
Thinking Big - Big data: principes et architecture
Thinking Big - Big data: principes et architecture Thinking Big - Big data: principes et architecture
Thinking Big - Big data: principes et architecture
 
Une Introduction à Hadoop
Une Introduction à HadoopUne Introduction à Hadoop
Une Introduction à Hadoop
 
Introduction to sqoop
Introduction to sqoopIntroduction to sqoop
Introduction to sqoop
 
Cours Big Data Chap2
Cours Big Data Chap2Cours Big Data Chap2
Cours Big Data Chap2
 

Destaque

Improvements in Hadoop Security
Improvements in Hadoop SecurityImprovements in Hadoop Security
Improvements in Hadoop Security
DataWorks Summit
 

Destaque (20)

Simplify and Secure your Hadoop Environment with Hortonworks and Centrify
Simplify and Secure your Hadoop Environment with Hortonworks and CentrifySimplify and Secure your Hadoop Environment with Hortonworks and Centrify
Simplify and Secure your Hadoop Environment with Hortonworks and Centrify
 
Performance Update: When Apache ORC Met Apache Spark
Performance Update: When Apache ORC Met Apache SparkPerformance Update: When Apache ORC Met Apache Spark
Performance Update: When Apache ORC Met Apache Spark
 
Apache Knox Gateway "Single Sign On" expands the reach of the Enterprise Users
Apache Knox Gateway "Single Sign On" expands the reach of the Enterprise UsersApache Knox Gateway "Single Sign On" expands the reach of the Enterprise Users
Apache Knox Gateway "Single Sign On" expands the reach of the Enterprise Users
 
Big Data and Security - Where are we now? (2015)
Big Data and Security - Where are we now? (2015)Big Data and Security - Where are we now? (2015)
Big Data and Security - Where are we now? (2015)
 
Securing Hadoop's REST APIs with Apache Knox Gateway Hadoop Summit June 6th, ...
Securing Hadoop's REST APIs with Apache Knox Gateway Hadoop Summit June 6th, ...Securing Hadoop's REST APIs with Apache Knox Gateway Hadoop Summit June 6th, ...
Securing Hadoop's REST APIs with Apache Knox Gateway Hadoop Summit June 6th, ...
 
Built-In Security for the Cloud
Built-In Security for the CloudBuilt-In Security for the Cloud
Built-In Security for the Cloud
 
Information security in big data -privacy and data mining
Information security in big data -privacy and data miningInformation security in big data -privacy and data mining
Information security in big data -privacy and data mining
 
Troubleshooting Kerberos in Hadoop: Taming the Beast
Troubleshooting Kerberos in Hadoop: Taming the BeastTroubleshooting Kerberos in Hadoop: Taming the Beast
Troubleshooting Kerberos in Hadoop: Taming the Beast
 
Improvements in Hadoop Security
Improvements in Hadoop SecurityImprovements in Hadoop Security
Improvements in Hadoop Security
 
Hadoop
HadoopHadoop
Hadoop
 
An Approach for Multi-Tenancy Through Apache Knox
An Approach for Multi-Tenancy Through Apache KnoxAn Approach for Multi-Tenancy Through Apache Knox
An Approach for Multi-Tenancy Through Apache Knox
 
Apache Knox setup and hive and hdfs Access using KNOX
Apache Knox setup and hive and hdfs Access using KNOXApache Knox setup and hive and hdfs Access using KNOX
Apache Knox setup and hive and hdfs Access using KNOX
 
Hadoop & Security - Past, Present, Future
Hadoop & Security - Past, Present, FutureHadoop & Security - Past, Present, Future
Hadoop & Security - Past, Present, Future
 
Hdp security overview
Hdp security overview Hdp security overview
Hdp security overview
 
Treat your enterprise data lake indigestion: Enterprise ready security and go...
Treat your enterprise data lake indigestion: Enterprise ready security and go...Treat your enterprise data lake indigestion: Enterprise ready security and go...
Treat your enterprise data lake indigestion: Enterprise ready security and go...
 
Hadoop Security Today & Tomorrow with Apache Knox
Hadoop Security Today & Tomorrow with Apache KnoxHadoop Security Today & Tomorrow with Apache Knox
Hadoop Security Today & Tomorrow with Apache Knox
 
Hadoop Internals (2.3.0 or later)
Hadoop Internals (2.3.0 or later)Hadoop Internals (2.3.0 or later)
Hadoop Internals (2.3.0 or later)
 
OAuth - Open API Authentication
OAuth - Open API AuthenticationOAuth - Open API Authentication
OAuth - Open API Authentication
 
Hadoop Security Architecture
Hadoop Security ArchitectureHadoop Security Architecture
Hadoop Security Architecture
 
HADOOP TECHNOLOGY ppt
HADOOP  TECHNOLOGY pptHADOOP  TECHNOLOGY ppt
HADOOP TECHNOLOGY ppt
 

Semelhante a Hadoop and Data Access Security

The Future of Data Management - the Enterprise Data Hub
The Future of Data Management - the Enterprise Data HubThe Future of Data Management - the Enterprise Data Hub
The Future of Data Management - the Enterprise Data Hub
DataWorks Summit
 

Semelhante a Hadoop and Data Access Security (20)

Comprehensive Security for the Enterprise II: Guarding the Perimeter and Cont...
Comprehensive Security for the Enterprise II: Guarding the Perimeter and Cont...Comprehensive Security for the Enterprise II: Guarding the Perimeter and Cont...
Comprehensive Security for the Enterprise II: Guarding the Perimeter and Cont...
 
Hadoop security @ Philly Hadoop Meetup May 2015
Hadoop security @ Philly Hadoop Meetup May 2015Hadoop security @ Philly Hadoop Meetup May 2015
Hadoop security @ Philly Hadoop Meetup May 2015
 
Fighting cyber fraud with hadoop
Fighting cyber fraud with hadoopFighting cyber fraud with hadoop
Fighting cyber fraud with hadoop
 
大数据数据治理及数据安全
大数据数据治理及数据安全大数据数据治理及数据安全
大数据数据治理及数据安全
 
Securing the Hadoop Ecosystem
Securing the Hadoop EcosystemSecuring the Hadoop Ecosystem
Securing the Hadoop Ecosystem
 
Combat Cyber Threats with Cloudera Impala & Apache Hadoop
Combat Cyber Threats with Cloudera Impala & Apache HadoopCombat Cyber Threats with Cloudera Impala & Apache Hadoop
Combat Cyber Threats with Cloudera Impala & Apache Hadoop
 
Cloudera GoDataFest Security and Governance
Cloudera GoDataFest Security and GovernanceCloudera GoDataFest Security and Governance
Cloudera GoDataFest Security and Governance
 
Hadoop Operations: How to Secure and Control Cluster Access
Hadoop Operations: How to Secure and Control Cluster AccessHadoop Operations: How to Secure and Control Cluster Access
Hadoop Operations: How to Secure and Control Cluster Access
 
The Future of Data Management - the Enterprise Data Hub
The Future of Data Management - the Enterprise Data HubThe Future of Data Management - the Enterprise Data Hub
The Future of Data Management - the Enterprise Data Hub
 
The Future of Hadoop Security - Hadoop Summit 2014
The Future of Hadoop Security - Hadoop Summit 2014The Future of Hadoop Security - Hadoop Summit 2014
The Future of Hadoop Security - Hadoop Summit 2014
 
Project Rhino: Enhancing Data Protection for Hadoop
Project Rhino: Enhancing Data Protection for HadoopProject Rhino: Enhancing Data Protection for Hadoop
Project Rhino: Enhancing Data Protection for Hadoop
 
2014 sept 4_hadoop_security
2014 sept 4_hadoop_security2014 sept 4_hadoop_security
2014 sept 4_hadoop_security
 
Securing Hadoop in an Enterprise Context
Securing Hadoop in an Enterprise ContextSecuring Hadoop in an Enterprise Context
Securing Hadoop in an Enterprise Context
 
Bringing Trus and Visibility to Apache Hadoop
Bringing Trus and Visibility to Apache HadoopBringing Trus and Visibility to Apache Hadoop
Bringing Trus and Visibility to Apache Hadoop
 
Big Data Warehousing Meetup: Securing the Hadoop Ecosystem by Cloudera
Big Data Warehousing Meetup: Securing the Hadoop Ecosystem by ClouderaBig Data Warehousing Meetup: Securing the Hadoop Ecosystem by Cloudera
Big Data Warehousing Meetup: Securing the Hadoop Ecosystem by Cloudera
 
Intel boubker el mouttahid
Intel boubker el mouttahidIntel boubker el mouttahid
Intel boubker el mouttahid
 
BigData Security - A Point of View
BigData Security - A Point of ViewBigData Security - A Point of View
BigData Security - A Point of View
 
Improvements in Hadoop Security
Improvements in Hadoop SecurityImprovements in Hadoop Security
Improvements in Hadoop Security
 
TriHUG October: Apache Ranger
TriHUG October: Apache RangerTriHUG October: Apache Ranger
TriHUG October: Apache Ranger
 
大数据数据安全
大数据数据安全大数据数据安全
大数据数据安全
 

Mais de Cloudera, Inc.

Mais de Cloudera, Inc. (20)

Partner Briefing_January 25 (FINAL).pptx
Partner Briefing_January 25 (FINAL).pptxPartner Briefing_January 25 (FINAL).pptx
Partner Briefing_January 25 (FINAL).pptx
 
Cloudera Data Impact Awards 2021 - Finalists
Cloudera Data Impact Awards 2021 - Finalists Cloudera Data Impact Awards 2021 - Finalists
Cloudera Data Impact Awards 2021 - Finalists
 
2020 Cloudera Data Impact Awards Finalists
2020 Cloudera Data Impact Awards Finalists2020 Cloudera Data Impact Awards Finalists
2020 Cloudera Data Impact Awards Finalists
 
Edc event vienna presentation 1 oct 2019
Edc event vienna presentation 1 oct 2019Edc event vienna presentation 1 oct 2019
Edc event vienna presentation 1 oct 2019
 
Machine Learning with Limited Labeled Data 4/3/19
Machine Learning with Limited Labeled Data 4/3/19Machine Learning with Limited Labeled Data 4/3/19
Machine Learning with Limited Labeled Data 4/3/19
 
Data Driven With the Cloudera Modern Data Warehouse 3.19.19
Data Driven With the Cloudera Modern Data Warehouse 3.19.19Data Driven With the Cloudera Modern Data Warehouse 3.19.19
Data Driven With the Cloudera Modern Data Warehouse 3.19.19
 
Introducing Cloudera DataFlow (CDF) 2.13.19
Introducing Cloudera DataFlow (CDF) 2.13.19Introducing Cloudera DataFlow (CDF) 2.13.19
Introducing Cloudera DataFlow (CDF) 2.13.19
 
Introducing Cloudera Data Science Workbench for HDP 2.12.19
Introducing Cloudera Data Science Workbench for HDP 2.12.19Introducing Cloudera Data Science Workbench for HDP 2.12.19
Introducing Cloudera Data Science Workbench for HDP 2.12.19
 
Shortening the Sales Cycle with a Modern Data Warehouse 1.30.19
Shortening the Sales Cycle with a Modern Data Warehouse 1.30.19Shortening the Sales Cycle with a Modern Data Warehouse 1.30.19
Shortening the Sales Cycle with a Modern Data Warehouse 1.30.19
 
Leveraging the cloud for analytics and machine learning 1.29.19
Leveraging the cloud for analytics and machine learning 1.29.19Leveraging the cloud for analytics and machine learning 1.29.19
Leveraging the cloud for analytics and machine learning 1.29.19
 
Modernizing the Legacy Data Warehouse – What, Why, and How 1.23.19
Modernizing the Legacy Data Warehouse – What, Why, and How 1.23.19Modernizing the Legacy Data Warehouse – What, Why, and How 1.23.19
Modernizing the Legacy Data Warehouse – What, Why, and How 1.23.19
 
Leveraging the Cloud for Big Data Analytics 12.11.18
Leveraging the Cloud for Big Data Analytics 12.11.18Leveraging the Cloud for Big Data Analytics 12.11.18
Leveraging the Cloud for Big Data Analytics 12.11.18
 
Modern Data Warehouse Fundamentals Part 3
Modern Data Warehouse Fundamentals Part 3Modern Data Warehouse Fundamentals Part 3
Modern Data Warehouse Fundamentals Part 3
 
Modern Data Warehouse Fundamentals Part 2
Modern Data Warehouse Fundamentals Part 2Modern Data Warehouse Fundamentals Part 2
Modern Data Warehouse Fundamentals Part 2
 
Modern Data Warehouse Fundamentals Part 1
Modern Data Warehouse Fundamentals Part 1Modern Data Warehouse Fundamentals Part 1
Modern Data Warehouse Fundamentals Part 1
 
Extending Cloudera SDX beyond the Platform
Extending Cloudera SDX beyond the PlatformExtending Cloudera SDX beyond the Platform
Extending Cloudera SDX beyond the Platform
 
Federated Learning: ML with Privacy on the Edge 11.15.18
Federated Learning: ML with Privacy on the Edge 11.15.18Federated Learning: ML with Privacy on the Edge 11.15.18
Federated Learning: ML with Privacy on the Edge 11.15.18
 
Analyst Webinar: Doing a 180 on Customer 360
Analyst Webinar: Doing a 180 on Customer 360Analyst Webinar: Doing a 180 on Customer 360
Analyst Webinar: Doing a 180 on Customer 360
 
Build a modern platform for anti-money laundering 9.19.18
Build a modern platform for anti-money laundering 9.19.18Build a modern platform for anti-money laundering 9.19.18
Build a modern platform for anti-money laundering 9.19.18
 
Introducing the data science sandbox as a service 8.30.18
Introducing the data science sandbox as a service 8.30.18Introducing the data science sandbox as a service 8.30.18
Introducing the data science sandbox as a service 8.30.18
 

Último

TECUNIQUE: Success Stories: IT Service provider
TECUNIQUE: Success Stories: IT Service providerTECUNIQUE: Success Stories: IT Service provider
TECUNIQUE: Success Stories: IT Service provider
mohitmore19
 
+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...
+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...
+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...
Health
 
%+27788225528 love spells in Colorado Springs Psychic Readings, Attraction sp...
%+27788225528 love spells in Colorado Springs Psychic Readings, Attraction sp...%+27788225528 love spells in Colorado Springs Psychic Readings, Attraction sp...
%+27788225528 love spells in Colorado Springs Psychic Readings, Attraction sp...
masabamasaba
 

Último (20)

TECUNIQUE: Success Stories: IT Service provider
TECUNIQUE: Success Stories: IT Service providerTECUNIQUE: Success Stories: IT Service provider
TECUNIQUE: Success Stories: IT Service provider
 
Chinsurah Escorts ☎️8617697112 Starting From 5K to 15K High Profile Escorts ...
Chinsurah Escorts ☎️8617697112  Starting From 5K to 15K High Profile Escorts ...Chinsurah Escorts ☎️8617697112  Starting From 5K to 15K High Profile Escorts ...
Chinsurah Escorts ☎️8617697112 Starting From 5K to 15K High Profile Escorts ...
 
%in kaalfontein+277-882-255-28 abortion pills for sale in kaalfontein
%in kaalfontein+277-882-255-28 abortion pills for sale in kaalfontein%in kaalfontein+277-882-255-28 abortion pills for sale in kaalfontein
%in kaalfontein+277-882-255-28 abortion pills for sale in kaalfontein
 
VTU technical seminar 8Th Sem on Scikit-learn
VTU technical seminar 8Th Sem on Scikit-learnVTU technical seminar 8Th Sem on Scikit-learn
VTU technical seminar 8Th Sem on Scikit-learn
 
%in Durban+277-882-255-28 abortion pills for sale in Durban
%in Durban+277-882-255-28 abortion pills for sale in Durban%in Durban+277-882-255-28 abortion pills for sale in Durban
%in Durban+277-882-255-28 abortion pills for sale in Durban
 
%in Midrand+277-882-255-28 abortion pills for sale in midrand
%in Midrand+277-882-255-28 abortion pills for sale in midrand%in Midrand+277-882-255-28 abortion pills for sale in midrand
%in Midrand+277-882-255-28 abortion pills for sale in midrand
 
%in Hazyview+277-882-255-28 abortion pills for sale in Hazyview
%in Hazyview+277-882-255-28 abortion pills for sale in Hazyview%in Hazyview+277-882-255-28 abortion pills for sale in Hazyview
%in Hazyview+277-882-255-28 abortion pills for sale in Hazyview
 
8257 interfacing 2 in microprocessor for btech students
8257 interfacing 2 in microprocessor for btech students8257 interfacing 2 in microprocessor for btech students
8257 interfacing 2 in microprocessor for btech students
 
call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️
call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️
call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️
 
Generic or specific? Making sensible software design decisions
Generic or specific? Making sensible software design decisionsGeneric or specific? Making sensible software design decisions
Generic or specific? Making sensible software design decisions
 

Hadoop and Data Access Security

  • 2. Hadoop Security and Compliance Challenges
      • History
          • Security was not a priority in early Hadoop adopters like Yahoo! and Facebook; it is now!
      • Data concentration
          • Quantity and diversity of data create compliance challenges
      • Flexibility of the Hadoop architecture
          • Many paths for data in, out, processing
          • Access data at different granularities, from fields to files
          • ELT: sensitive data “discovery” occurs after data arrives
  • 3. Cloudera has led in investments in security
      • Authentication
          • First Hadoop distribution to offer strong authentication throughout
      • Encryption
          • First Hadoop distribution to support encryption on the wire
      • Audit
          • Only Hadoop distribution to support audit histories for all data objects & access paths
          • Single point for log capture, audit
      • Authorization
          • Founded the Apache Sentry project along with Oracle and Lab41 to manage fine-grained permissions
      • Automation
          • Cloudera Manager automates security configurations & LDAP/AD integration
  • 4. Case Study: Finance and Banking
      • Fraud and Purchasing Behavior Analysis
      • Identify patterns in financially sensitive, PCI and PII data
      • Before: Unable to build applications on Hadoop; forced to use other systems, to greatly limit Hadoop access, or to forgo analysis due to privacy concerns
      • Now: Provide broad analysis capabilities with Impala to a large population, secured by Sentry
  • 5. Enterprise Security in Hadoop overview
      • Four Functional Areas: Perimeter, Data, Access, Visibility
      • Diagram labels: Hadoop Cluster; Users, Applications, Operators
  • 6. Defining the Functional Areas
      • Perimeter: Guarding access to the cluster itself
          • Technical Concepts: Authentication, Network isolation
      • Data: Protecting data in the cluster from unauthorized visibility
          • Technical Concepts: Encryption, Tokenization, Data masking
      • Access: Defining what users and applications can do with data
          • Technical Concepts: Permissions, Authorization
      • Visibility: Reporting on where data came from and how it’s being used
          • Technical Concepts: Auditing, Lineage
  • 7. Enabling Enterprise Security
      • Perimeter (Authentication, Network isolation): Kerberos | AD/LDAP
      • Data (Encryption, Tokenization, Data masking): Native | Certified Partners
      • Access (Permissions, Authorization): Sentry
      • Visibility (Auditing, Lineage): Cloudera Navigator
  • 9. Perimeter: Authentication in Hadoop
      • Kerberos
          • Provably strong authentication between all Hadoop services and (optionally) to end-points
          • Cloudera Manager hides complexity
      • LDAP/AD
          • Username / password
          • Option for Hue, Hive Metastore, Impala connectors, Cloudera Manager admin logins
      • SAML
          • For Single Sign-On (SSO) for listed options
          • Kerberos clients no longer required on most user end-points
  • 10. Authentication Options and Coverage
      • Diagram components: HDFS (DN, NN); YARN (RM, AM); Impala (ID, SS); MapReduce (JT, TT); … Services … (Oozie, Search, etc.); 3rd Party Gateway; Clients; … Applications … (Pig, Hive, Hue, etc.)
      • Coverage levels: “End-to-End” Kerberos; “Core” Kerberos; “Edge” AD/LDAP/SAML
  • 11. IT Integration: Kerberos
      • Users don’t want Yet Another Credential
      • Corp IT doesn’t want to provision and maintain thousands of service principals and keytabs
      • Solution: local KDC + one-way trust
          • Run MIT Kerberos KDC in the cluster
          • Put all service principals here
          • Set up one-way trust of central corporate realm by local KDC
          • Normal user credentials can be used to access Hadoop
      • Recommended: Use Cloudera Manager
          • To properly tune inter-related configuration knobs
          • To manage principals/keytabs creation and distribution
          • To preserve service monitoring with Kerberos security enabled
  • 12. IT Integration: Kerberos + LDAP
      • Hadoop Cluster: Local KDC (MIT Kerberos)
          • hdfs/host1@HADOOP.EXAMPLE.COM
          • yarn/host2@HADOOP.EXAMPLE.COM
          • …
      • Central Active Directory
          • user@EXAMPLE.COM
          • …
      • Cross-realm trust
      • NN, JT: LDAP group mapping
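The split between service principals held in the cluster-local realm and user principals coming from the trusted corporate realm can be illustrated with a small sketch. The realm names are taken from slide 12; the names `parse_principal` and `classify` are hypothetical, and in a real cluster this decision is made by Kerberos and Hadoop configuration rather than by application code.

```python
from typing import NamedTuple, Optional

LOCAL_REALM = "HADOOP.EXAMPLE.COM"   # cluster-local MIT KDC realm (from the slide)
TRUSTED_REALMS = {"EXAMPLE.COM"}     # corporate realm trusted by the local KDC


class Principal(NamedTuple):
    primary: str
    instance: Optional[str]
    realm: str


def parse_principal(text: str) -> Principal:
    """Split 'primary[/instance]@REALM' into its three parts."""
    name, sep, realm = text.rpartition("@")
    if not sep or not name or not realm:
        raise ValueError(f"not a fully qualified principal: {text!r}")
    primary, _, instance = name.partition("/")
    return Principal(primary, instance or None, realm)


def classify(text: str) -> str:
    """Return 'service', 'user', or 'rejected' for a principal string."""
    p = parse_principal(text)
    if p.realm == LOCAL_REALM and p.instance:
        return "service"    # e.g. hdfs/host1 lives in the local KDC
    if p.realm in TRUSTED_REALMS and p.instance is None:
        return "user"       # corporate credential accepted via the trust
    return "rejected"       # unknown realm or unexpected shape


if __name__ == "__main__":
    for name in ("hdfs/host1@HADOOP.EXAMPLE.COM", "user@EXAMPLE.COM"):
        print(name, "->", classify(name))
```

The point of the design on the slide is that only the first category has to be provisioned inside the cluster; user accounts stay in the central directory.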
  • 13. Network Access Management
      • Use Hue to front-end both Hadoop and Oozie to control access through a web browser
      • HTTP proxy servers:
          • Oozie: MR jobs, Pig jobs, Hive jobs
          • HttpFS: hadoop fs is front-ended over HTTP
          • HBase REST server: HBase reads
      • Secure configuration with Oozie, Hue and HttpFS front-ends co-located to act as network bridge
      • Hue supports AD/LDAP based authentication instead of Kerberos for client simplicity
  • 15. Data: Protection in Hadoop
      • Data in Motion: “Network Encryption”
          • SASL: Network RPC
          • SSL: MapReduce shuffle
          • SSL: Web-based user and administration tools
          • SSL: JDBC
          • HDFS data transfer protocol
      • Data at Rest: “Data Encryption”
          • Certified partner solutions
          • Field-level encryption
          • Data masking or tokenization
          • OS-level file system encryption
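As a rough illustration of the "data masking or tokenization" item, the sketch below masks all but the last characters of a field and derives a deterministic token with a keyed HMAC. This is an assumption-laden stand-in: the function names `mask` and `tokenize` and the sample key are hypothetical, and a keyed hash is one-way, whereas commercial tokenization products often keep a reversible mapping, which is not modeled here.

```python
import hashlib
import hmac


def mask(value: str, visible: int = 4, fill: str = "*") -> str:
    """Hide everything except the last `visible` characters."""
    if visible <= 0:
        return fill * len(value)
    hidden = max(len(value) - visible, 0)
    return fill * hidden + value[hidden:]


def tokenize(value: str, key: bytes) -> str:
    """Deterministic keyed token: the same value and key give the same token."""
    return hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()


if __name__ == "__main__":
    key = b"demo-key-not-for-production"   # hypothetical key for the example
    card = "4111111111111111"              # well-known test card number
    print(mask(card))
    print(tokenize(card, key))
```

Because the token is deterministic, joins and group-bys on the tokenized column still work, which is usually the reason to prefer tokenization over plain masking for analysis.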
  • 17. Prior State of Authorization: Two Sub-Optimal Choices for SQL on Hadoop
      • Insecure Advisory Authorization
          • Users could grant themselves permissions
          • Intended to prevent accidental deletion of data
          • Problem: Did not guard against malicious users
          • Problem: Only worked with Hive
      • HDFS Impersonation
          • Data was only protected at the file level by HDFS permissions
          • Problem: File-level not granular enough
          • Problem: Lacked flexibility; not role-based
  • 18. Sentry: Key Capabilities
      • Fine-Grained Authorization
          • Specify security for SERVERS, DATABASES, TABLES, VIEWS, and search indices
      • Role-Based Authorization
          • SELECT privilege on views & tables
          • INSERT privilege on tables
          • TRANSFORM privilege on servers
          • ALL privilege on the server, databases, tables & views
          • ALL privilege is needed to create/modify schema
      • Multitenant Administration
          • Separate policies for each database/schema
          • Can be maintained by separate admins
  • 19. Sentry Architecture
      • Diagram labels: Binding Layer (Impala, Hive, Search); Authorization Provider; Policy Engine (Evaluation, Validation); Policy Provider (Parsing; File, Database); Interface; Impala, HiveServer2, Search; Local FS/HDFS
  • 20. Query Execution Flow
      • Flow: SQL → Parse → Build → Check (Sentry) → Plan → MR
      • Parse: Validate SQL grammar
      • Build: Construct statement tree
      • Check: Validate statement objects
          • First check: Authorization
      • Plan: Forward to execution planner
  • 21. Multitenant Security
      • Global
          [groups]
          admin_group = admin_role
          dep1_admin = uri_role
          [roles]
          admin_role = server=server1
          uri_role = hdfs:///ha-nn-uri/data
          [databases]
          db1 = hdfs://ha-nn-uri/user/hive/sentry/db1.ini
      • Per Database
          [groups]
          dep1_admin = db1_admin_role
          dep1_analyst = db1_read_role
          [roles]
          db1_admin_role = server=server1->db=db1
          db1_read_role = server=server1->db=db1->table=*->action=select
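The per-database policy on slide 21 maps groups to roles and roles to privileges. The sketch below reads that policy text with Python's standard `configparser` and answers whether a group may perform a request. It is a simplified model under stated assumptions, not Sentry's actual evaluation logic: the helpers `load_policy`, `implies`, and `is_authorized` are hypothetical, a privilege is treated as covering everything beneath it in the hierarchy, `*` matches any value, and the table name `orders` is invented for the example.

```python
import configparser

# Per-database policy text copied from slide 21.
PER_DB_POLICY = """
[groups]
dep1_admin = db1_admin_role
dep1_analyst = db1_read_role

[roles]
db1_admin_role = server=server1->db=db1
db1_read_role = server=server1->db=db1->table=*->action=select
"""


def load_policy(text):
    """Return (group -> roles, role -> privileges) from policy text."""
    parser = configparser.ConfigParser(delimiters=("=",), interpolation=None)
    parser.optionxform = str  # keep option names exactly as written
    parser.read_string(text)
    groups = {g: [r.strip() for r in v.split(",")] for g, v in parser["groups"].items()}
    roles = {r: [p.strip() for p in v.split(",")] for r, v in parser["roles"].items()}
    return groups, roles


def parse_privilege(text):
    """'server=s->db=d' becomes [('server', 's'), ('db', 'd')]."""
    return [tuple(part.split("=", 1)) for part in text.split("->")]


def implies(granted, requested):
    """Simplified rule: a grant covers the request if it is a matching prefix."""
    g, r = parse_privilege(granted), parse_privilege(requested)
    if len(g) > len(r):
        return False
    return all(gk == rk and gv in ("*", rv) for (gk, gv), (rk, rv) in zip(g, r))


def is_authorized(group, requested, groups, roles):
    return any(
        implies(priv, requested)
        for role in groups.get(group, [])
        for priv in roles.get(role, [])
    )


if __name__ == "__main__":
    groups, roles = load_policy(PER_DB_POLICY)
    read = "server=server1->db=db1->table=orders->action=select"
    write = "server=server1->db=db1->table=orders->action=insert"
    for group in ("dep1_analyst", "dep1_admin"):
        print(group, is_authorized(group, read, groups, roles),
              is_authorized(group, write, groups, roles))
```

Under this model the analyst group can read any table in `db1` but not insert, while the admin group, whose privilege stops at the database level, is allowed both.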
  • 22. Apache Ecosystem and Sentry
      • Inline support in Cloudera Impala
      • Extensibility plug-in for Apache HiveServer2
      • Inline support in Cloudera Search
      • Complementary security with HDFS ACLs
  • 23. Access: Authorization in Hadoop
      • File ACL
          • Permission at file-level granularity
          • HDFS POSIX-style permissions: u/g/o
          • Access Control Lists (ACL)
          • HBase, Oozie, MapReduce
      • Data RBAC
          • Permissions on tables, views, indices
          • Sentry for HiveServer2, Impala, Search
      • Admin RBAC
          • App and Workflow
          • Cloudera Manager, Hue
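The "POSIX-style permissions: u/g/o" item on slide 23 can be made concrete with a small check over mode bits. This is a minimal sketch of the owner/group/other model only: the `FileStatus` type and `permitted` function are hypothetical names, and the superuser bypass, extended ACL entries, and directory traversal checks that HDFS also applies are deliberately left out.

```python
from dataclasses import dataclass
from typing import Set

READ, WRITE, EXECUTE = 4, 2, 1


@dataclass(frozen=True)
class FileStatus:
    owner: str
    group: str
    mode: int  # octal permission bits, e.g. 0o640


def permitted(status: FileStatus, user: str, user_groups: Set[str], wanted: int) -> bool:
    """Pick exactly one class (owner, then group, then other) and test its bits."""
    if user == status.owner:
        bits = (status.mode >> 6) & 7
    elif status.group in user_groups:
        bits = (status.mode >> 3) & 7
    else:
        bits = status.mode & 7
    return (bits & wanted) == wanted


if __name__ == "__main__":
    f = FileStatus(owner="alice", group="analysts", mode=0o640)
    print(permitted(f, "alice", {"analysts"}, READ | WRITE))
    print(permitted(f, "bob", {"analysts"}, WRITE))
    print(permitted(f, "carol", {"marketing"}, READ))
```

The coarse three-class model is what slide 17 calls "not granular enough" for SQL workloads, which is the gap the table- and view-level permissions under Data RBAC are meant to fill.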
  • 25. Visibility: Cloudera Navigator
      • Audit & Access Control
          • Maintain full audit history
          • Ensuring appropriate permissions and reporting on data access for compliance
      • Discovery & Exploration
          • Finding out what data is available and what it looks like
      • Lineage
          • Tracing data back to its original source
      • Lifecycle Management
          • Migration of data based on policies
      • Diagram: Cloudera’s Enterprise Data Hub
          • Batch processing: MapReduce; Analytic SQL: Impala; Search engine: Solr; Machine learning: Spark; Stream processing: Spark Streaming; 3rd party apps
          • Workload management: YARN
          • Storage for any type of data (unified, elastic, resilient, secure): Filesystem: HDFS; Online NoSQL: HBase
          • Data management: Cloudera Navigator; System management: Cloudera Manager; Sentry
  • 26. Why Navigator?
      • 1. Lots of Data Landing in Cloudera Enterprise
          • Huge quantities
          • Many different sources, structured and unstructured
          • Varying levels of sensitivity
      • 2. Many Users Working with the Data
          • Administrators and compliance officers
          • Analysts and data scientists
          • Business users
      • 3. Need to Effectively Control and Consume Data
          • Get visibility and control over the environment
          • Discover and explore data
  • 30. Leading Investment to Address the Challenges
      • Authentication
          • First Hadoop distribution to offer strong authentication throughout
      • Encryption
          • First Hadoop distribution to support encryption on the wire
      • Audit
          • Only Hadoop distribution to support audit histories for all data objects and access paths
          • Single point for log capture, audit
      • Authorization
          • Founded the Apache Sentry project along with Oracle and Lab41 to manage fine-grained permissions
      • Automation
          • Cloudera Manager automates security configurations & LDAP/AD integration
  • 31. Cloudera 5: Enabling the Enterprise Data Hub
      • Open Source, Scalable, Flexible, Cost-Effective ✔
      • Managed ✖ ✔
      • Open Architecture ✖ ✔
      • Secure and Governed ✖ ✔
      • Diagram: Cloudera’s Enterprise Data Hub
          • Batch processing: MapReduce; Analytic SQL: Impala; Search engine: Solr; Machine learning: Spark; Stream processing: Spark Streaming; 3rd party apps
          • Workload management: YARN
          • Storage for any type of data (unified, elastic, resilient, secure): Filesystem: HDFS; Online NoSQL: HBase
          • Data management: Cloudera Navigator; System management: Cloudera Manager; Sentry