SlideShare uma empresa Scribd logo
1 de 36
Securing Hadoop Data Lake 
Page 1 © Hortonworks Inc. 2014 
Hortonworks. We do Hadoop.
Agenda 
• Security Approach within Hadoop 
• Security Pillars 
• Workshops 
• Questions 
Page 2 © Hortonworks Inc. 2014
A Modern Data Architecture 
DATA SYSTEM APPLICATIONS 
RDBMS EDW MPP 
REPOSITORIES 
SOURCES 
Existing Sources 
(CRM, ERP, Clickstream, Logs) 
Page 3 © Hortonworks Inc. 2014 
Emerging Sources 
(Sensor, Sentiment, Geo, Unstructured) 
DEV & DATA 
TOOLS 
BUILD & TEST 
OPERATIONAL 
TOOLS 
MANAGE & 
MONITOR 
Business 
Analytics 
Custom Applications 
Packaged 
Applications 
Governance 
& Integration 
ENTERPRISE HADOOP 
Security 
Operations 
Data Access 
Data Management
Core Capabilities of Enterprise Hadoop 
Load data and 
manage according 
to policy 
Page 4 © Hortonworks Inc. 2014 
ENTERPRISE MGMT & SECURITY 
Deploy and 
effectively 
manage the 
platform 
PRESENTATION & APPLICATION 
DATA ACCESS SECURITY 
Access your data simultaneously in multiple ways 
(batch, interactive, real-time) Provide layered 
Store and process all of your Corporate Data Assets 
approach to 
security through 
Authentication, 
Authorization, 
Accounting, and Data 
Protection 
DATA MANAGEMENT 
GOVERNANCE & 
INTEGRATION 
OPERATIONS 
Enable both existing and new application to 
provide value to the organization 
Empower existing operations and 
security tools to manage Hadoop 
Provide deployment choice across physical, virtual, cloud 
DEPLOYMENT OPTIONS
Security needs are changing 
Administration 
Centrally management & 
consistent security 
Authentication 
Authenticate users and systems 
Authorization 
Provision access to data 
Audit 
Maintain a record of data access 
Data Protection 
Protect data at rest and in motion 
Page 5 © Hortonworks Inc. 2014 
Security needs are changing 
• YARN unlocks the data lake 
• Multi-tenant: Multiple applications for data 
access 
• Changing and complex compliance environment 
• Data classification 
Summer 2014 
65% of clusters host 
multiple workloads 
Fall 2013 
Largely silo’d deployments 
with single workload clusters
Security today in Hadoop with HDP 
Authentication 
Who am I/prove it? 
Page 6 © Hortonworks Inc. 2014 
Authorization 
Restrict access to 
explicit data 
Audit 
Understand who 
did what 
Data Protection 
Encrypt data at 
rest & in motion 
• Kerberos in native 
Apache Hadoop 
• HTTP/REST API 
Secured with 
Apache Knox 
Gateway 
• Wire encryption 
in Hadoop 
• Orchestrated 
encryption with 
partner tools 
• HDFS, Hive and 
Hbase (Storm 
and Knox in 2.2) 
• Fine grain 
access control 
• Centralized 
audit reporting 
• Policy and 
access history 
HDP 2.1 
Ranger 
Centralized Security Administration
Typical Flow – Hive Access through Beeline client 
Page 7 © Hortonworks Inc. 2014 
HDFS 
HiveServer 2 
A B C 
Beeline 
Client
Typical Flow – Authenticate through Kerberos 
Page 8 © Hortonworks Inc. 2014 
HDFS 
HiveServer 2 
A B C 
KDC 
Use Hive ST, 
submit query 
Hive gets 
Namenode 
(NN) service 
ticket 
Hive creates 
map reduce 
using NN ST 
Client gets 
service ticket for 
Hive 
Beeline 
Client
Typical Flow – Add Authorization through Ranger(XA 
Secure) 
Page 9 © Hortonworks Inc. 2014 
HDFS 
HiveServer 2 
A B C 
KDC 
Use Hive ST, 
submit query 
Hive gets 
Namenode 
(NN) service 
ticket 
Ranger 
Hive creates 
map reduce 
using NN ST 
Client gets 
service ticket for 
Hive 
Beeline 
Client
Page 10 © Hortonworks Inc. 2014 
HDFS 
Typical Flow – Firewall, Route through Knox 
Gateway 
HiveServer 2 
A B C 
KDC 
Use Hive ST, 
submit query 
Hive gets 
Namenode 
(NN) service 
ticket 
Ranger 
Hive creates 
map reduce 
using NN ST 
Knox runs as proxy 
user using Hive ST 
Knox gets 
service ticket for 
Hive 
Original 
request w/user 
id/password 
Client gets 
query result 
Beeline 
Client
SSL 
Page 11 © Hortonworks Inc. 2014 
HDFS 
Typical Flow – Add Wire and File Encryption 
SSL SSL 
HiveServer 2 
A B C 
KDC 
Use Hive ST, 
submit query 
Hive gets 
Namenode 
(NN) service 
ticket 
Ranger 
Hive creates 
map reduce 
using NN ST 
Knox runs as proxy 
user using Hive ST 
Knox gets 
service ticket for 
Hive 
Original 
request w/user 
id/password 
Client gets 
query result 
Beeline 
Client 
SSL SASL
Security Features 
Page 12 © Hortonworks Inc. 2014 
HDP Security 
Authentication 
Kerberos Support ✔ 
Perimeter Security – For services and rest API ✔ 
Authorizations 
Fine grained access control HDFS, Hbase and Hive, Storm 
and Knox (next release) 
Role base access control ✔ 
Column level ✔ 
Permission Support Create, Drop, Index, lock, user 
Auditing 
Resource access auditing Extensive Auditing 
Policy auditing ✔
Security Features 
Page 13 © Hortonworks Inc. 2014 
HDP Security 
Data Protection 
Wire Encryption ✔ 
Volume Encryption ✔ 
File/Column Encryption HDFS TDE & Partners 
Reporting 
Global view of policies and audit data ✔ 
Manage 
User/ Group mapping ✔ 
Global policy manager, Web UI ✔ 
Delegated administration ✔
Authorization and Auditing 
Apache Ranger 
Page 14 © Hortonworks Inc. 2014
Authorization and Audit 
Authorization 
Fine grain access control 
• HDFS – Folder, File 
• Hive – Database, Table, Column 
• HBase – Table, Column Family, Column 
Audit 
Extensive user access auditing in 
HDFS, Hive and HBase 
• IP Address 
• Resource type/ resource 
• Timestamp 
• Access granted or denied 
Page 15 © Hortonworks Inc. 2014 
Flexibility 
in defining 
policies 
Control 
access into 
system
Central Security Administration 
HDP Advanced Security 
• Delivers a ‘single pane of glass’ for 
the security administrator 
• Centralizes administration of 
security policy 
• Ensures consistent coverage across 
the entire Hadoop stack 
Page 16 © Hortonworks Inc. 2014
Setup Authorization Policies 
Page 17 © Hortonworks Inc. 2014 
file level 
access 
control, 
flexible 
definition 
Control 
permissions
Monitor through Auditing 
18 
Page 18 © Hortonworks Inc. 2014
Authorization and Auditing w/ Ranger 
Hadoop distributed 
file system (HDFS) 
Page 19 © Hortonworks Inc. 2014 
Ranger Administration Portal 
HBase 
Hive Server2 
Ranger Policy 
Server 
Ranger Audit 
Server 
Plugin 
Hadoop Components Enterprise 
Users 
Plugin 
Plugin 
Legacy 
Tools 
Integration API 
RDBMS 
HDFS 
Knox 
TBD 
Plugin 
Plugin 
Plugin* 
Storm 
* - Future Integration 
New features
Simplified Workflow - HDFS 
Users access HDFS data 
through application Name Node 
Page 20 © Hortonworks Inc. 2014 
XA Policy 
Manager 
XA Agent 
Admin sets policies for HDFS 
files/folder 
User 
Application 
Data scientist runs a 
map reduce job 
IT users access 
HDFS through 
CLI 
Namenode uses 
XA Agent for 
Authorization 
Audit 
Database Audit logs pushed to DB 
Namenode provides 
resource access to 
user/client 
1 
2 
2 
2 
3 
4 
5
Ranger Investments for HDP 2.2 
• New Components Coverage 
• Storm Authorization & Auditing 
• Knox Authorization & Auditing 
• Deeper Integration with HDP 
• Windows Support 
• Integration with Hive Auth API, support grant/revoke commands 
• Support grant/revoke commands in Hbase 
• Enterprise Readiness 
• Rest APIs for policy manager 
• Store Audit logs locally in HDFS 
• Support Oracle DB 
• Ambari support, as part of Ambari 2.0 release 
Page 21 © Hortonworks Inc. 2014
REST API Security through Knox 
Securely share Hadoop Cluster 
Page 22 © Hortonworks Inc. 2014
Hadoop REST API with Knox 
Service Direct URL Knox URL 
WebHDFS http://namenode-host:50070/webhdfs https://knox-host:8443/webhdfs 
WebHCat http://webhcat-host:50111/templeton https://knox-host:8443/templeton 
Oozie http://ooziehost:11000/oozie https://knox-host:8443/oozie 
HBase http://hbasehost:60080 https://knox-host:8443/hbase 
Hive http://hivehost:10001/cliservice https://knox-host:8443/hive 
YARN http://yarn-host:yarn-port/ws https://knox-host:8443/resourcemanager 
Masters could 
be on many 
different hosts 
Page 23 © Hortonworks Inc. 2014 
One hosts, 
one port 
Consistent 
paths 
SSL config 
at one host
Why Knox? 
Simplified Access 
• Kerberos encapsulation 
• Extends API reach 
• Single access point 
• Multi-cluster support 
• Single SSL certificate 
Page 24 © Hortonworks Inc. 2014 
Centralized Control 
• Central REST API auditing 
• Service-level authorization 
• Alternative to SSH “edge node” 
Enterprise Integration 
• LDAP integration 
• Active Directory integration 
• SSO integration 
• Apache Shiro extensibility 
• Custom extensibility 
Enhanced Security 
• Protect network details 
• SSL for non-SSL services 
• WebApp vulnerability filter
Hadoop REST API Security: Drill-Down 
REST 
Client 
Page 25 © Hortonworks Inc. 2014 
Enterprise 
Identity 
Provider 
LDAP/AD 
Knox Gateway 
GW 
GW 
Firewall 
Firewall 
DMZ 
LB 
Edge 
Node/Hado 
op CLIs RPC 
HTTP 
HTTP HTTP 
LDAP 
Hadoop Cluster 1 
Masters 
Slaves 
NN 
RM 
Web 
HCat 
Oozie 
DN NM 
HBase 
HS2 
Hadoop Cluster 2 
Masters 
Slaves 
NN 
RM 
Web 
HCat 
Oozie 
DN NM 
HBase 
HS2
What’s New in Knox with HDP 2.2 
• Use Ambari for Install/start/stop/configuration 
• Knox support for HDFS HA 
• Support for YARN REST API 
• Support for SSL to Hadoop Cluster Services (WebHDFS, HBase, 
Hive & Oozie) 
• Knox Management REST API 
• Integration with Ranger (fka XA Secure) to for Knox Service Level 
Authorization 
Page 26 © Hortonworks Inc. 2014
Workshop: Enabling Security 
Page 27 © Hortonworks Inc. 2014
Let’s Begin 
• We will use HDP Sandbox with FreeIPA Software Installed 
• FreeIPA is an integrated security information management solution combining Linux (Fedora), 389 
Directory Server, MIT Kerberos, NTP, DNS, Dogtag (Certificate System). It consists of a web 
interface and command-line administration tools 
• In the workshop we use FreeIPA for User Identity Management 
• Note: Steps outlined in the workshop are applicable for other identity management solutions such 
as Active Directory 
Page 28 © Hortonworks Inc. 2014
Authentication 
1. Create end users and groups in FreeIPA 
– End Users will query HDP via Hue, Beeline & JDBC/ODBC clients 
2. Enable Kerberos for the HDP Cluster 
– Hadoop now authenticates all access to the cluster 
3. Integrate Hue with FreeIPA 
– Users are validated against FreeIPA 
4. Configure Linux to use FreeIPA as central store of posix data using nslcd 
– Enables Hadoop to determine user groups without requiring a local linux user account 
We have now set Authentication 
– A user can open a shell, authenticate using kinit and submit hadoop commands or alternatively log into HUE to 
access Hadoop. 
Page 29 © Hortonworks Inc. 2014
Enable Perimeter Security 
1. KNOX Is Available on Sandbox 
– Enables Perimeter Security. Enables single point of cluster access using Hadoop REST APIs, JDBC and ODBC 
calls 
2. Configure KNOX to authenticate against FreeIPA 
3. Configure WebHDFS & Hiveserver2 to support JDBC/ODBC access over HTTP 
4. Use Excel to access Hive via KNOX 
– Note, Knox eliminates the need to secure Kerberos ticket on the client machine for user authentication 
We have now set Perimeter Security 
– Users can now access the cluster via the Gateway services 
Page 30 © Hortonworks Inc. 2014
Authorization & Audit 
1. Install Apache Ranger 
– Comprehensive authorization and audit tool for Hadoop 
2. Sync users between Apache Ranger and FreeIPA 
– Note, end users are only required to be maintained in one enterprise identity management system 
3. Configure HDFS & Hive to use Apache Ranger 
– In this workshop we will only show steps as it relates to hive authorization. Similar capabilities are available for 
other HDP components. 
4. Define HDFS & Hive Access Policy For Users 
– User “hive” is a special user and must be assigned universal access 
5. Log into Hue as the end user and note the authorization policies being enforced 
– Review Audit Information 
We have now set Authorization & Audit 
– All user access to a Hive is governed & audited by policies maintained in Apache Ranger. 
Page 31 © Hortonworks Inc. 2014
Encryption 
1. Wire Level Encryption 
– Follow instruction here http://docs.hortonworks.com/HDPDocuments/HDP2/HDP- 
2.0.6.0/bk_reference/content/ch_wire6.html 
2. Volume Level Encryption 
– Leverage LUKS. Sample script provided 
3. Column level encryption & data masking 
– Collaborate with our key security partners 
Page 32 © Hortonworks Inc. 2014
Resources 
Page 33 © Hortonworks Inc. 2014
Security Page 
Page 34 © Hortonworks Inc. 2014
Hortonworks Security Investment Plans 
HDP + XA 
Comprehensive Security for Enterprise Hadoop 
Comprehensive Security 
Meet all security requirements across Authentication, 
Authorization, Audit & Data Protection for all HDP 
components 
Page 35 © Hortonworks Inc. 2014 
…all IN Hadoop 
Goals: 
Investment themes 
Central Administration 
Provide one location for administering security policies and 
audit reporting for entire platform 
Consistent Integration 
Integrate with other security & identity management systems, 
for compliance with IT policies 
Previous Phases 
 Kerberos Authentication 
 HDFS, Hive & Hbase authorization 
 Wire Encryption for data in motion 
 Knox for perimeter security 
 Basic Audit in HDFS & MR 
 SQL Style Hive Authorization 
 ACLs for HDFS 
XA Secure Phase 
• Centralized Security Admin for HDFS, Hive & 
HBase 
• Centralized Audit Reporting 
• Delegated Policy Administration 
Future Phases 
• Encryption in HDFS, Hive & Hbase 
• Centralized security administration of entire 
Hadoop platform 
• Centralized auditing of entire platform 
• Expand Authentication & SSO integration choices 
• Tag based global policies (e.g. Policy for PII)
Q&A 
Page 36 © Hortonworks Inc. 2014

Mais conteúdo relacionado

Mais procurados

Hadoop Tutorial For Beginners | Apache Hadoop Tutorial For Beginners | Hadoop...
Hadoop Tutorial For Beginners | Apache Hadoop Tutorial For Beginners | Hadoop...Hadoop Tutorial For Beginners | Apache Hadoop Tutorial For Beginners | Hadoop...
Hadoop Tutorial For Beginners | Apache Hadoop Tutorial For Beginners | Hadoop...
Simplilearn
 
Hadoop Security Today and Tomorrow
Hadoop Security Today and TomorrowHadoop Security Today and Tomorrow
Hadoop Security Today and Tomorrow
DataWorks Summit
 
Reduce Storage Costs by 5x Using The New HDFS Tiered Storage Feature
Reduce Storage Costs by 5x Using The New HDFS Tiered Storage Feature Reduce Storage Costs by 5x Using The New HDFS Tiered Storage Feature
Reduce Storage Costs by 5x Using The New HDFS Tiered Storage Feature
DataWorks Summit
 
An overview of Neo4j Internals
An overview of Neo4j InternalsAn overview of Neo4j Internals
An overview of Neo4j Internals
Tobias Lindaaker
 

Mais procurados (20)

Hadoop Tutorial For Beginners | Apache Hadoop Tutorial For Beginners | Hadoop...
Hadoop Tutorial For Beginners | Apache Hadoop Tutorial For Beginners | Hadoop...Hadoop Tutorial For Beginners | Apache Hadoop Tutorial For Beginners | Hadoop...
Hadoop Tutorial For Beginners | Apache Hadoop Tutorial For Beginners | Hadoop...
 
Apache Spark on K8S and HDFS Security with Ilan Flonenko
Apache Spark on K8S and HDFS Security with Ilan FlonenkoApache Spark on K8S and HDFS Security with Ilan Flonenko
Apache Spark on K8S and HDFS Security with Ilan Flonenko
 
Hive ppt (1)
Hive ppt (1)Hive ppt (1)
Hive ppt (1)
 
Apache Hive Tutorial
Apache Hive TutorialApache Hive Tutorial
Apache Hive Tutorial
 
Cloudera training: secure your Cloudera cluster
Cloudera training: secure your Cloudera clusterCloudera training: secure your Cloudera cluster
Cloudera training: secure your Cloudera cluster
 
Apache Sentry for Hadoop security
Apache Sentry for Hadoop securityApache Sentry for Hadoop security
Apache Sentry for Hadoop security
 
Introduction to sqoop
Introduction to sqoopIntroduction to sqoop
Introduction to sqoop
 
Hadoop Security Today and Tomorrow
Hadoop Security Today and TomorrowHadoop Security Today and Tomorrow
Hadoop Security Today and Tomorrow
 
Reduce Storage Costs by 5x Using The New HDFS Tiered Storage Feature
Reduce Storage Costs by 5x Using The New HDFS Tiered Storage Feature Reduce Storage Costs by 5x Using The New HDFS Tiered Storage Feature
Reduce Storage Costs by 5x Using The New HDFS Tiered Storage Feature
 
Une introduction à HBase
Une introduction à HBaseUne introduction à HBase
Une introduction à HBase
 
Introduction à la big data v3
Introduction à la big data v3 Introduction à la big data v3
Introduction à la big data v3
 
What's New in Apache Hive
What's New in Apache HiveWhat's New in Apache Hive
What's New in Apache Hive
 
Hadoop File system (HDFS)
Hadoop File system (HDFS)Hadoop File system (HDFS)
Hadoop File system (HDFS)
 
Introduction to Hadoop and Hadoop component
Introduction to Hadoop and Hadoop component Introduction to Hadoop and Hadoop component
Introduction to Hadoop and Hadoop component
 
Hadoop Security Today & Tomorrow with Apache Knox
Hadoop Security Today & Tomorrow with Apache KnoxHadoop Security Today & Tomorrow with Apache Knox
Hadoop Security Today & Tomorrow with Apache Knox
 
Building Data Pipelines for Solr with Apache NiFi
Building Data Pipelines for Solr with Apache NiFiBuilding Data Pipelines for Solr with Apache NiFi
Building Data Pipelines for Solr with Apache NiFi
 
Hadoop introduction , Why and What is Hadoop ?
Hadoop introduction , Why and What is  Hadoop ?Hadoop introduction , Why and What is  Hadoop ?
Hadoop introduction , Why and What is Hadoop ?
 
What is Hadoop | Introduction to Hadoop | Hadoop Tutorial | Hadoop Training |...
What is Hadoop | Introduction to Hadoop | Hadoop Tutorial | Hadoop Training |...What is Hadoop | Introduction to Hadoop | Hadoop Tutorial | Hadoop Training |...
What is Hadoop | Introduction to Hadoop | Hadoop Tutorial | Hadoop Training |...
 
File Format Benchmark - Avro, JSON, ORC and Parquet
File Format Benchmark - Avro, JSON, ORC and ParquetFile Format Benchmark - Avro, JSON, ORC and Parquet
File Format Benchmark - Avro, JSON, ORC and Parquet
 
An overview of Neo4j Internals
An overview of Neo4j InternalsAn overview of Neo4j Internals
An overview of Neo4j Internals
 

Destaque

Improvements in Hadoop Security
Improvements in Hadoop SecurityImprovements in Hadoop Security
Improvements in Hadoop Security
DataWorks Summit
 

Destaque (20)

Securing Hadoop's REST APIs with Apache Knox Gateway Hadoop Summit June 6th, ...
Securing Hadoop's REST APIs with Apache Knox Gateway Hadoop Summit June 6th, ...Securing Hadoop's REST APIs with Apache Knox Gateway Hadoop Summit June 6th, ...
Securing Hadoop's REST APIs with Apache Knox Gateway Hadoop Summit June 6th, ...
 
Apache Knox setup and hive and hdfs Access using KNOX
Apache Knox setup and hive and hdfs Access using KNOXApache Knox setup and hive and hdfs Access using KNOX
Apache Knox setup and hive and hdfs Access using KNOX
 
Treat your enterprise data lake indigestion: Enterprise ready security and go...
Treat your enterprise data lake indigestion: Enterprise ready security and go...Treat your enterprise data lake indigestion: Enterprise ready security and go...
Treat your enterprise data lake indigestion: Enterprise ready security and go...
 
Improvements in Hadoop Security
Improvements in Hadoop SecurityImprovements in Hadoop Security
Improvements in Hadoop Security
 
Built-In Security for the Cloud
Built-In Security for the CloudBuilt-In Security for the Cloud
Built-In Security for the Cloud
 
Apache Knox Gateway "Single Sign On" expands the reach of the Enterprise Users
Apache Knox Gateway "Single Sign On" expands the reach of the Enterprise UsersApache Knox Gateway "Single Sign On" expands the reach of the Enterprise Users
Apache Knox Gateway "Single Sign On" expands the reach of the Enterprise Users
 
Information security in big data -privacy and data mining
Information security in big data -privacy and data miningInformation security in big data -privacy and data mining
Information security in big data -privacy and data mining
 
Big Data Security with Hadoop
Big Data Security with HadoopBig Data Security with Hadoop
Big Data Security with Hadoop
 
Big Data and Security - Where are we now? (2015)
Big Data and Security - Where are we now? (2015)Big Data and Security - Where are we now? (2015)
Big Data and Security - Where are we now? (2015)
 
Troubleshooting Kerberos in Hadoop: Taming the Beast
Troubleshooting Kerberos in Hadoop: Taming the BeastTroubleshooting Kerberos in Hadoop: Taming the Beast
Troubleshooting Kerberos in Hadoop: Taming the Beast
 
An Approach for Multi-Tenancy Through Apache Knox
An Approach for Multi-Tenancy Through Apache KnoxAn Approach for Multi-Tenancy Through Apache Knox
An Approach for Multi-Tenancy Through Apache Knox
 
Hadoop
HadoopHadoop
Hadoop
 
Hadoop & Security - Past, Present, Future
Hadoop & Security - Past, Present, FutureHadoop & Security - Past, Present, Future
Hadoop & Security - Past, Present, Future
 
Hadoop Internals (2.3.0 or later)
Hadoop Internals (2.3.0 or later)Hadoop Internals (2.3.0 or later)
Hadoop Internals (2.3.0 or later)
 
Hadoop and Data Access Security
Hadoop and Data Access SecurityHadoop and Data Access Security
Hadoop and Data Access Security
 
OAuth - Open API Authentication
OAuth - Open API AuthenticationOAuth - Open API Authentication
OAuth - Open API Authentication
 
Hadoop Security Architecture
Hadoop Security ArchitectureHadoop Security Architecture
Hadoop Security Architecture
 
HADOOP TECHNOLOGY ppt
HADOOP  TECHNOLOGY pptHADOOP  TECHNOLOGY ppt
HADOOP TECHNOLOGY ppt
 
Cours Big Data Chap1
Cours Big Data Chap1Cours Big Data Chap1
Cours Big Data Chap1
 
Hadoop security
Hadoop securityHadoop security
Hadoop security
 

Semelhante a Hdp security overview

Semelhante a Hdp security overview (20)

2014 sept 4_hadoop_security
2014 sept 4_hadoop_security2014 sept 4_hadoop_security
2014 sept 4_hadoop_security
 
TriHUG October: Apache Ranger
TriHUG October: Apache RangerTriHUG October: Apache Ranger
TriHUG October: Apache Ranger
 
August 2014 HUG : Comprehensive Security for Hadoop
August 2014 HUG : Comprehensive Security for HadoopAugust 2014 HUG : Comprehensive Security for Hadoop
August 2014 HUG : Comprehensive Security for Hadoop
 
Discover Enterprise Security Features in Hortonworks Data Platform 2.1: Apach...
Discover Enterprise Security Features in Hortonworks Data Platform 2.1: Apach...Discover Enterprise Security Features in Hortonworks Data Platform 2.1: Apach...
Discover Enterprise Security Features in Hortonworks Data Platform 2.1: Apach...
 
Curb your insecurity with HDP - Tips for a Secure Cluster
Curb your insecurity with HDP - Tips for a Secure ClusterCurb your insecurity with HDP - Tips for a Secure Cluster
Curb your insecurity with HDP - Tips for a Secure Cluster
 
Discover HDP 2.1: Apache Hadoop 2.4.0, YARN & HDFS
Discover HDP 2.1: Apache Hadoop 2.4.0, YARN & HDFSDiscover HDP 2.1: Apache Hadoop 2.4.0, YARN & HDFS
Discover HDP 2.1: Apache Hadoop 2.4.0, YARN & HDFS
 
Apache Argus - How do I secure my entire Hadoop cluster? Olivier Renault @ Ho...
Apache Argus - How do I secure my entire Hadoop cluster? Olivier Renault @ Ho...Apache Argus - How do I secure my entire Hadoop cluster? Olivier Renault @ Ho...
Apache Argus - How do I secure my entire Hadoop cluster? Olivier Renault @ Ho...
 
HDP Advanced Security: Comprehensive Security for Enterprise Hadoop
HDP Advanced Security: Comprehensive Security for Enterprise HadoopHDP Advanced Security: Comprehensive Security for Enterprise Hadoop
HDP Advanced Security: Comprehensive Security for Enterprise Hadoop
 
Curb Your Insecurity - Tips for a Secure Cluster (with Spark too)!!
Curb Your Insecurity - Tips for a Secure Cluster (with Spark too)!!Curb Your Insecurity - Tips for a Secure Cluster (with Spark too)!!
Curb Your Insecurity - Tips for a Secure Cluster (with Spark too)!!
 
Curb your insecurity with HDP
Curb your insecurity with HDPCurb your insecurity with HDP
Curb your insecurity with HDP
 
Fortifying Multi-Cluster Hybrid Cloud Data Lakes using Apache Knox
Fortifying Multi-Cluster Hybrid Cloud Data Lakes using Apache KnoxFortifying Multi-Cluster Hybrid Cloud Data Lakes using Apache Knox
Fortifying Multi-Cluster Hybrid Cloud Data Lakes using Apache Knox
 
Discover HDP 2.1: Apache Falcon for Data Governance in Hadoop
Discover HDP 2.1: Apache Falcon for Data Governance in HadoopDiscover HDP 2.1: Apache Falcon for Data Governance in Hadoop
Discover HDP 2.1: Apache Falcon for Data Governance in Hadoop
 
Bridle your Flying Islands and Castles in the Sky: Built-in Governance and Se...
Bridle your Flying Islands and Castles in the Sky: Built-in Governance and Se...Bridle your Flying Islands and Castles in the Sky: Built-in Governance and Se...
Bridle your Flying Islands and Castles in the Sky: Built-in Governance and Se...
 
Realtime analytics + hadoop 2.0
Realtime analytics + hadoop 2.0Realtime analytics + hadoop 2.0
Realtime analytics + hadoop 2.0
 
Realtime Analytics in Hadoop
Realtime Analytics in HadoopRealtime Analytics in Hadoop
Realtime Analytics in Hadoop
 
Improvements in Hadoop Security
Improvements in Hadoop SecurityImprovements in Hadoop Security
Improvements in Hadoop Security
 
Introduction to the Hadoop EcoSystem
Introduction to the Hadoop EcoSystemIntroduction to the Hadoop EcoSystem
Introduction to the Hadoop EcoSystem
 
Discover hdp 2.2: Data storage innovations in Hadoop Distributed Filesystem (...
Discover hdp 2.2: Data storage innovations in Hadoop Distributed Filesystem (...Discover hdp 2.2: Data storage innovations in Hadoop Distributed Filesystem (...
Discover hdp 2.2: Data storage innovations in Hadoop Distributed Filesystem (...
 
Discover hdp 2.2 hdfs - final
Discover hdp 2.2   hdfs - finalDiscover hdp 2.2   hdfs - final
Discover hdp 2.2 hdfs - final
 
Discover HDP 2.1: Apache Solr for Hadoop Search
Discover HDP 2.1: Apache Solr for Hadoop SearchDiscover HDP 2.1: Apache Solr for Hadoop Search
Discover HDP 2.1: Apache Solr for Hadoop Search
 

Mais de Hortonworks

Mais de Hortonworks (20)

Hortonworks DataFlow (HDF) 3.3 - Taking Stream Processing to the Next Level
Hortonworks DataFlow (HDF) 3.3 - Taking Stream Processing to the Next LevelHortonworks DataFlow (HDF) 3.3 - Taking Stream Processing to the Next Level
Hortonworks DataFlow (HDF) 3.3 - Taking Stream Processing to the Next Level
 
IoT Predictions for 2019 and Beyond: Data at the Heart of Your IoT Strategy
IoT Predictions for 2019 and Beyond: Data at the Heart of Your IoT StrategyIoT Predictions for 2019 and Beyond: Data at the Heart of Your IoT Strategy
IoT Predictions for 2019 and Beyond: Data at the Heart of Your IoT Strategy
 
Getting the Most Out of Your Data in the Cloud with Cloudbreak
Getting the Most Out of Your Data in the Cloud with CloudbreakGetting the Most Out of Your Data in the Cloud with Cloudbreak
Getting the Most Out of Your Data in the Cloud with Cloudbreak
 
Johns Hopkins - Using Hadoop to Secure Access Log Events
Johns Hopkins - Using Hadoop to Secure Access Log EventsJohns Hopkins - Using Hadoop to Secure Access Log Events
Johns Hopkins - Using Hadoop to Secure Access Log Events
 
Catch a Hacker in Real-Time: Live Visuals of Bots and Bad Guys
Catch a Hacker in Real-Time: Live Visuals of Bots and Bad GuysCatch a Hacker in Real-Time: Live Visuals of Bots and Bad Guys
Catch a Hacker in Real-Time: Live Visuals of Bots and Bad Guys
 
HDF 3.2 - What's New
HDF 3.2 - What's NewHDF 3.2 - What's New
HDF 3.2 - What's New
 
Curing Kafka Blindness with Hortonworks Streams Messaging Manager
Curing Kafka Blindness with Hortonworks Streams Messaging ManagerCuring Kafka Blindness with Hortonworks Streams Messaging Manager
Curing Kafka Blindness with Hortonworks Streams Messaging Manager
 
Interpretation Tool for Genomic Sequencing Data in Clinical Environments
Interpretation Tool for Genomic Sequencing Data in Clinical EnvironmentsInterpretation Tool for Genomic Sequencing Data in Clinical Environments
Interpretation Tool for Genomic Sequencing Data in Clinical Environments
 
IBM+Hortonworks = Transformation of the Big Data Landscape
IBM+Hortonworks = Transformation of the Big Data LandscapeIBM+Hortonworks = Transformation of the Big Data Landscape
IBM+Hortonworks = Transformation of the Big Data Landscape
 
Premier Inside-Out: Apache Druid
Premier Inside-Out: Apache DruidPremier Inside-Out: Apache Druid
Premier Inside-Out: Apache Druid
 
Accelerating Data Science and Real Time Analytics at Scale
Accelerating Data Science and Real Time Analytics at ScaleAccelerating Data Science and Real Time Analytics at Scale
Accelerating Data Science and Real Time Analytics at Scale
 
TIME SERIES: APPLYING ADVANCED ANALYTICS TO INDUSTRIAL PROCESS DATA
TIME SERIES: APPLYING ADVANCED ANALYTICS TO INDUSTRIAL PROCESS DATATIME SERIES: APPLYING ADVANCED ANALYTICS TO INDUSTRIAL PROCESS DATA
TIME SERIES: APPLYING ADVANCED ANALYTICS TO INDUSTRIAL PROCESS DATA
 
Blockchain with Machine Learning Powered by Big Data: Trimble Transportation ...
Blockchain with Machine Learning Powered by Big Data: Trimble Transportation ...Blockchain with Machine Learning Powered by Big Data: Trimble Transportation ...
Blockchain with Machine Learning Powered by Big Data: Trimble Transportation ...
 
Delivering Real-Time Streaming Data for Healthcare Customers: Clearsense
Delivering Real-Time Streaming Data for Healthcare Customers: ClearsenseDelivering Real-Time Streaming Data for Healthcare Customers: Clearsense
Delivering Real-Time Streaming Data for Healthcare Customers: Clearsense
 
Making Enterprise Big Data Small with Ease
Making Enterprise Big Data Small with EaseMaking Enterprise Big Data Small with Ease
Making Enterprise Big Data Small with Ease
 
Webinewbie to Webinerd in 30 Days - Webinar World Presentation
Webinewbie to Webinerd in 30 Days - Webinar World PresentationWebinewbie to Webinerd in 30 Days - Webinar World Presentation
Webinewbie to Webinerd in 30 Days - Webinar World Presentation
 
Driving Digital Transformation Through Global Data Management
Driving Digital Transformation Through Global Data ManagementDriving Digital Transformation Through Global Data Management
Driving Digital Transformation Through Global Data Management
 
HDF 3.1 pt. 2: A Technical Deep-Dive on New Streaming Features
HDF 3.1 pt. 2: A Technical Deep-Dive on New Streaming FeaturesHDF 3.1 pt. 2: A Technical Deep-Dive on New Streaming Features
HDF 3.1 pt. 2: A Technical Deep-Dive on New Streaming Features
 
Hortonworks DataFlow (HDF) 3.1 - Redefining Data-In-Motion with Modern Data A...
Hortonworks DataFlow (HDF) 3.1 - Redefining Data-In-Motion with Modern Data A...Hortonworks DataFlow (HDF) 3.1 - Redefining Data-In-Motion with Modern Data A...
Hortonworks DataFlow (HDF) 3.1 - Redefining Data-In-Motion with Modern Data A...
 
Unlock Value from Big Data with Apache NiFi and Streaming CDC
Unlock Value from Big Data with Apache NiFi and Streaming CDCUnlock Value from Big Data with Apache NiFi and Streaming CDC
Unlock Value from Big Data with Apache NiFi and Streaming CDC
 

Último

Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire business
panagenda
 
Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and Myths
Joaquim Jorge
 

Último (20)

Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a Fresher
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)
 
Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire business
 
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingRepurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024
 
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
 
GenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdfGenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdf
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024
 
Real Time Object Detection Using Open CV
Real Time Object Detection Using Open CVReal Time Object Detection Using Open CV
Real Time Object Detection Using Open CV
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
 
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
 
Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024
 
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
 
Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024
 
AWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of Terraform
 
Manulife - Insurer Innovation Award 2024
Manulife - Insurer Innovation Award 2024Manulife - Insurer Innovation Award 2024
Manulife - Insurer Innovation Award 2024
 
Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and Myths
 

Hdp security overview

  • 1. Securing Hadoop Data Lake Page 1 © Hortonworks Inc. 2014 Hortonworks. We do Hadoop.
  • 2. Agenda • Security Approach within Hadoop • Security Pillars • Workshops • Questions Page 2 © Hortonworks Inc. 2014
  • 3. A Modern Data Architecture DATA SYSTEM APPLICATIONS RDBMS EDW MPP REPOSITORIES SOURCES Existing Sources (CRM, ERP, Clickstream, Logs) Page 3 © Hortonworks Inc. 2014 Emerging Sources (Sensor, Sentiment, Geo, Unstructured) DEV & DATA TOOLS BUILD & TEST OPERATIONAL TOOLS MANAGE & MONITOR Business Analytics Custom Applications Packaged Applications Governance & Integration ENTERPRISE HADOOP Security Operations Data Access Data Management
  • 4. Core Capabilities of Enterprise Hadoop Load data and manage according to policy Page 4 © Hortonworks Inc. 2014 ENTERPRISE MGMT & SECURITY Deploy and effectively manage the platform PRESENTATION & APPLICATION DATA ACCESS SECURITY Access your data simultaneously in multiple ways (batch, interactive, real-time) Provide layered Store and process all of your Corporate Data Assets approach to security through Authentication, Authorization, Accounting, and Data Protection DATA MANAGEMENT GOVERNANCE & INTEGRATION OPERATIONS Enable both existing and new application to provide value to the organization Empower existing operations and security tools to manage Hadoop Provide deployment choice across physical, virtual, cloud DEPLOYMENT OPTIONS
  • 5. Security needs are changing Administration Centrally management & consistent security Authentication Authenticate users and systems Authorization Provision access to data Audit Maintain a record of data access Data Protection Protect data at rest and in motion Page 5 © Hortonworks Inc. 2014 Security needs are changing • YARN unlocks the data lake • Multi-tenant: Multiple applications for data access • Changing and complex compliance environment • Data classification Summer 2014 65% of clusters host multiple workloads Fall 2013 Largely silo’d deployments with single workload clusters
  • 6. Security today in Hadoop with HDP Authentication Who am I/prove it? Page 6 © Hortonworks Inc. 2014 Authorization Restrict access to explicit data Audit Understand who did what Data Protection Encrypt data at rest & in motion • Kerberos in native Apache Hadoop • HTTP/REST API Secured with Apache Knox Gateway • Wire encryption in Hadoop • Orchestrated encryption with partner tools • HDFS, Hive and Hbase (Storm and Knox in 2.2) • Fine grain access control • Centralized audit reporting • Policy and access history HDP 2.1 Ranger Centralized Security Administration
  • 7. Typical Flow – Hive Access through Beeline client Page 7 © Hortonworks Inc. 2014 HDFS HiveServer 2 A B C Beeline Client
  • 8. Typical Flow – Authenticate through Kerberos Page 8 © Hortonworks Inc. 2014 HDFS HiveServer 2 A B C KDC Use Hive ST, submit query Hive gets Namenode (NN) service ticket Hive creates map reduce using NN ST Client gets service ticket for Hive Beeline Client
  • 9. Typical Flow – Add Authorization through Ranger(XA Secure) Page 9 © Hortonworks Inc. 2014 HDFS HiveServer 2 A B C KDC Use Hive ST, submit query Hive gets Namenode (NN) service ticket Ranger Hive creates map reduce using NN ST Client gets service ticket for Hive Beeline Client
  • 10. Page 10 © Hortonworks Inc. 2014 HDFS Typical Flow – Firewall, Route through Knox Gateway HiveServer 2 A B C KDC Use Hive ST, submit query Hive gets Namenode (NN) service ticket Ranger Hive creates map reduce using NN ST Knox runs as proxy user using Hive ST Knox gets service ticket for Hive Original request w/user id/password Client gets query result Beeline Client
  • 11. SSL Page 11 © Hortonworks Inc. 2014 HDFS Typical Flow – Add Wire and File Encryption SSL SSL HiveServer 2 A B C KDC Use Hive ST, submit query Hive gets Namenode (NN) service ticket Ranger Hive creates map reduce using NN ST Knox runs as proxy user using Hive ST Knox gets service ticket for Hive Original request w/user id/password Client gets query result Beeline Client SSL SASL
  • 12. Security Features Page 12 © Hortonworks Inc. 2014 HDP Security Authentication Kerberos Support ✔ Perimeter Security – For services and rest API ✔ Authorizations Fine grained access control HDFS, Hbase and Hive, Storm and Knox (next release) Role base access control ✔ Column level ✔ Permission Support Create, Drop, Index, lock, user Auditing Resource access auditing Extensive Auditing Policy auditing ✔
  • 13. Security Features Page 13 © Hortonworks Inc. 2014 HDP Security Data Protection Wire Encryption ✔ Volume Encryption ✔ File/Column Encryption HDFS TDE & Partners Reporting Global view of policies and audit data ✔ Manage User/ Group mapping ✔ Global policy manager, Web UI ✔ Delegated administration ✔
  • 14. Authorization and Auditing Apache Ranger Page 14 © Hortonworks Inc. 2014
  • 15. Authorization and Audit Authorization Fine grain access control • HDFS – Folder, File • Hive – Database, Table, Column • HBase – Table, Column Family, Column Audit Extensive user access auditing in HDFS, Hive and HBase • IP Address • Resource type/ resource • Timestamp • Access granted or denied Page 15 © Hortonworks Inc. 2014 Flexibility in defining policies Control access into system
  • 16. Central Security Administration HDP Advanced Security • Delivers a ‘single pane of glass’ for the security administrator • Centralizes administration of security policy • Ensures consistent coverage across the entire Hadoop stack Page 16 © Hortonworks Inc. 2014
  • 17. Setup Authorization Policies Page 17 © Hortonworks Inc. 2014 file level access control, flexible definition Control permissions
  • 18. Monitor through Auditing 18 Page 18 © Hortonworks Inc. 2014
  • 19. Authorization and Auditing w/ Ranger Hadoop distributed file system (HDFS) Page 19 © Hortonworks Inc. 2014 Ranger Administration Portal HBase Hive Server2 Ranger Policy Server Ranger Audit Server Plugin Hadoop Components Enterprise Users Plugin Plugin Legacy Tools Integration API RDBMS HDFS Knox TBD Plugin Plugin Plugin* Storm * - Future Integration New features
  • 20. Simplified Workflow - HDFS Users access HDFS data through application Name Node Page 20 © Hortonworks Inc. 2014 XA Policy Manager XA Agent Admin sets policies for HDFS files/folder User Application Data scientist runs a map reduce job IT users access HDFS through CLI Namenode uses XA Agent for Authorization Audit Database Audit logs pushed to DB Namenode provides resource access to user/client 1 2 2 2 3 4 5
  • 21. Ranger Investments for HDP 2.2 • New Components Coverage • Storm Authorization & Auditing • Knox Authorization & Auditing • Deeper Integration with HDP • Windows Support • Integration with Hive Auth API, support grant/revoke commands • Support grant/revoke commands in Hbase • Enterprise Readiness • Rest APIs for policy manager • Store Audit logs locally in HDFS • Support Oracle DB • Ambari support, as part of Ambari 2.0 release Page 21 © Hortonworks Inc. 2014
  • 22. REST API Security through Knox Securely share Hadoop Cluster Page 22 © Hortonworks Inc. 2014
  • 23. Hadoop REST API with Knox Service Direct URL Knox URL WebHDFS http://namenode-host:50070/webhdfs https://knox-host:8443/webhdfs WebHCat http://webhcat-host:50111/templeton https://knox-host:8443/templeton Oozie http://ooziehost:11000/oozie https://knox-host:8443/oozie HBase http://hbasehost:60080 https://knox-host:8443/hbase Hive http://hivehost:10001/cliservice https://knox-host:8443/hive YARN http://yarn-host:yarn-port/ws https://knox-host:8443/resourcemanager Masters could be on many different hosts Page 23 © Hortonworks Inc. 2014 One hosts, one port Consistent paths SSL config at one host
  • 24. Why Knox? Simplified Access • Kerberos encapsulation • Extends API reach • Single access point • Multi-cluster support • Single SSL certificate Page 24 © Hortonworks Inc. 2014 Centralized Control • Central REST API auditing • Service-level authorization • Alternative to SSH “edge node” Enterprise Integration • LDAP integration • Active Directory integration • SSO integration • Apache Shiro extensibility • Custom extensibility Enhanced Security • Protect network details • SSL for non-SSL services • WebApp vulnerability filter
  • 25. Hadoop REST API Security: Drill-Down REST Client Page 25 © Hortonworks Inc. 2014 Enterprise Identity Provider LDAP/AD Knox Gateway GW GW Firewall Firewall DMZ LB Edge Node/Hado op CLIs RPC HTTP HTTP HTTP LDAP Hadoop Cluster 1 Masters Slaves NN RM Web HCat Oozie DN NM HBase HS2 Hadoop Cluster 2 Masters Slaves NN RM Web HCat Oozie DN NM HBase HS2
  • 26. What’s New in Knox with HDP 2.2 • Use Ambari for Install/start/stop/configuration • Knox support for HDFS HA • Support for YARN REST API • Support for SSL to Hadoop Cluster Services (WebHDFS, HBase, Hive & Oozie) • Knox Management REST API • Integration with Ranger (fka XA Secure) to for Knox Service Level Authorization Page 26 © Hortonworks Inc. 2014
  • 27. Workshop: Enabling Security Page 27 © Hortonworks Inc. 2014
  • 28. Let’s Begin • We will use HDP Sandbox with FreeIPA Software Installed • FreeIPA is an integrated security information management solution combining Linux (Fedora), 389 Directory Server, MIT Kerberos, NTP, DNS, Dogtag (Certificate System). It consists of a web interface and command-line administration tools • In the workshop we use FreeIPA for User Identity Management • Note: Steps outlined in the workshop are applicable for other identity management solutions such as Active Directory Page 28 © Hortonworks Inc. 2014
  • 29. Authentication 1. Create end users and groups in FreeIPA – End Users will query HDP via Hue, Beeline & JDBC/ODBC clients 2. Enable Kerberos for the HDP Cluster – Hadoop now authenticates all access to the cluster 3. Integrate Hue with FreeIPA – Users are validated against FreeIPA 4. Configure Linux to use FreeIPA as central store of posix data using nslcd – Enables Hadoop to determine user groups without requiring a local linux user account We have now set Authentication – A user can open a shell, authenticate using kinit and submit hadoop commands or alternatively log into HUE to access Hadoop. Page 29 © Hortonworks Inc. 2014
  • 30. Enable Perimeter Security 1. KNOX Is Available on Sandbox – Enables Perimeter Security. Enables single point of cluster access using Hadoop REST APIs, JDBC and ODBC calls 2. Configure KNOX to authenticate against FreeIPA 3. Configure WebHDFS & Hiveserver2 to support JDBC/ODBC access over HTTP 4. Use Excel to access Hive via KNOX – Note, Knox eliminates the need to secure Kerberos ticket on the client machine for user authentication We have now set Perimeter Security – Users can now access the cluster via the Gateway services Page 30 © Hortonworks Inc. 2014
  • 31. Authorization & Audit 1. Install Apache Ranger – Comprehensive authorization and audit tool for Hadoop 2. Sync users between Apache Ranger and FreeIPA – Note, end users are only required to be maintained in one enterprise identity management system 3. Configure HDFS & Hive to use Apache Ranger – In this workshop we will only show steps as it relates to hive authorization. Similar capabilities are available for other HDP components. 4. Define HDFS & Hive Access Policy For Users – User “hive” is a special user and must be assigned universal access 5. Log into Hue as the end user and note the authorization policies being enforced – Review Audit Information We have now set Authorization & Audit – All user access to a Hive is governed & audited by policies maintained in Apache Ranger. Page 31 © Hortonworks Inc. 2014
  • 32. Encryption 1. Wire Level Encryption – Follow instruction here http://docs.hortonworks.com/HDPDocuments/HDP2/HDP- 2.0.6.0/bk_reference/content/ch_wire6.html 2. Volume Level Encryption – Leverage LUKS. Sample script provided 3. Column level encryption & data masking – Collaborate with our key security partners Page 32 © Hortonworks Inc. 2014
  • 33. Resources Page 33 © Hortonworks Inc. 2014
  • 34. Security Page Page 34 © Hortonworks Inc. 2014
  • 35. Hortonworks Security Investment Plans HDP + XA Comprehensive Security for Enterprise Hadoop Comprehensive Security Meet all security requirements across Authentication, Authorization, Audit & Data Protection for all HDP components Page 35 © Hortonworks Inc. 2014 …all IN Hadoop Goals: Investment themes Central Administration Provide one location for administering security policies and audit reporting for entire platform Consistent Integration Integrate with other security & identity management systems, for compliance with IT policies Previous Phases  Kerberos Authentication  HDFS, Hive & Hbase authorization  Wire Encryption for data in motion  Knox for perimeter security  Basic Audit in HDFS & MR  SQL Style Hive Authorization  ACLs for HDFS XA Secure Phase • Centralized Security Admin for HDFS, Hive & HBase • Centralized Audit Reporting • Delegated Policy Administration Future Phases • Encryption in HDFS, Hive & Hbase • Centralized security administration of entire Hadoop platform • Centralized auditing of entire platform • Expand Authentication & SSO integration choices • Tag based global policies (e.g. Policy for PII)
  • 36. Q&A Page 36 © Hortonworks Inc. 2014