Curb Your Insecurity with HDP
Tips for a Secure Cluster (with Spark too)
Hadoop Summit – San Jose
June 29th, 2016
2 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
Pardeep Kumar
Sr. Systems Architect, NA Prof. Services
4+ years in Hadoop
Helping Fortune500 customers succeed in
their Hadoop journey
Set up, implement, migrate and secure some
of the largest clusters in North America
Security & Migration SME, HCC Guru
Loves Hadoop, Cricket and Kerberos ;)
pardeep.kumar@hortonworks.com
@hadooptutor
linkedin.com/in/pardeepkumarmishra
Ancil McBarnett
Sr. Solutions Engineer, NorthEast
Helping organizations design, implement,
operate and consume Hadoop and Big Data
Solutions. Specialize in Security and Hive
Tuning. HCC Guru.
Loves Cricket, and DJ Bravo Champion :D
amcbarnett@hortonworks.com
@mcbkingdom
linkedin.com/in/mcbkingdom
Hadoop Security in 4 Steps
Comprehensive Approach to Security
In order to protect any data system you must implement the following:
• Administration: Central management and consistent security. How do I set policy across the entire cluster?
• Authentication: Authenticate users and systems. Who am I/prove it?
• Authorization: Provision access to data. What can I do?
• Audit: Maintain a record of data access. What did I do?
• Data Protection: Protect data at rest and in motion. How can I encrypt at rest and over the wire?
HDP Security: Comprehensive, Complete, Extensible
Perimeter Level Security
• Network Security (e.g. firewalls)
• Apache Knox (i.e. gateways)
Authentication
• LDAP/AD, Kerberos
Data Protection
• Encrypts data in motion and data at rest: HDFS TDE with Ranger KMS; refer to partner encryption solutions for broader needs
Authorization & Audit
• Consistent authorization controls across all Apache components within HDP: Apache Ranger
Authentication with Kerberos
Kerberos is a necessary evil, just do it!!
Security Without Kerberos
Configure Kerberos – Ambari Wizard
Security With Kerberos
Apache Ranger
Apache Ranger
HDFS File Security
Hive Database and Table Security
Authorization and Audit
Authorization
Fine-grained access control
• HDFS – Folder, File
• Hive – Database, Table, Column
• HBase – Table, Column Family, Column
• Storm, Knox and more
Audit
Extensive user access auditing in HDFS, Hive and HBase
• IP Address
• Resource type/resource
• Timestamp
• Access granted or denied
Control access into system
Flexibility in defining policies
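The audit fields listed on this slide can be pictured as a simple record. The sketch below is a hypothetical illustration in Python, not Ranger's actual audit schema or API: the class `AuditRecord`, the function `check_access`, the `user` field and the policy table are all invented for this example. It shows a folder-level HDFS rule deciding access and the audit entry that results.

```python
from dataclasses import dataclass
from datetime import datetime, timezone

# Hypothetical policy table: HDFS folder -> users allowed to read under it.
POLICIES = {
    "/data/finance": {"john"},
    "/data/public": {"john", "jane"},
}


@dataclass(frozen=True)
class AuditRecord:
    """One audit entry carrying the fields named on the slide (plus the user)."""
    user: str
    ip_address: str
    resource_type: str
    resource: str
    timestamp: datetime
    access_granted: bool


def check_access(user: str, ip_address: str, path: str) -> AuditRecord:
    """Decide read access by folder and record the decision, granted or denied."""
    granted = any(
        (path == folder or path.startswith(folder + "/")) and user in users
        for folder, users in POLICIES.items()
    )
    return AuditRecord(
        user=user,
        ip_address=ip_address,
        resource_type="hdfs",
        resource=path,
        timestamp=datetime.now(timezone.utc),
        access_granted=granted,
    )


allowed = check_access("john", "10.0.0.5", "/data/finance/q1.csv")
denied = check_access("jane", "10.0.0.6", "/data/finance/q1.csv")
print(allowed.access_granted, denied.access_granted)
```

Note that the denied request still produces a record, which is the point of auditing both outcomes.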
REST API Security with Apache Knox
Hadoop REST APIs
Useful for connecting to Hadoop from outside the cluster
When more client language flexibility is required
– e.g. Java binding not an option
Challenges
– Client must have knowledge of cluster topology
– Ports must be opened to clients outside the cluster (in some cases, on every host)
Service    API
WebHDFS    Supports HDFS user operations including reading files, writing to files, making directories, changing permissions and renaming.
WebHCat    Job control for MapReduce, Pig and Hive jobs, and HCatalog DDL commands.
Hive       Hive REST API operations
HBase      HBase REST API operations
Oozie      Job submission and management, and Oozie administration.
Authentication—API Security with Knox
Incubated and led by Hortonworks, Apache Knox extends the reach of Hadoop REST APIs without Kerberos complexities
Single, simple point of access for a cluster
• Kerberos encapsulation
• Single Hadoop access point
• REST API hierarchy
• Consolidated API calls
• Multi-cluster support
Central controls ensure consistency across one or more clusters
• Eliminates SSH “edge node”
• Central API management
• Central audit control
• Service level authorization
Integrated with existing systems to simplify identity maintenance
• SSO integration: Siteminder and OAM
• LDAP and AD integration
Hadoop REST API with Knox
Service   Direct URL                             Knox URL
WebHDFS   http://namenode-host:50070/webhdfs     https://knox-host:8443/webhdfs
WebHCat   http://webhcat-host:50111/templeton    https://knox-host:8443/templeton
Oozie     http://ooziehost:11000/oozie           https://knox-host:8443/oozie
HBase     http://hbasehost:60080                 https://knox-host:8443/hbase
Hive      http://hivehost:10001/cliservice       https://knox-host:8443/hive
YARN      http://yarn-host:yarn-port/ws          https://knox-host:8443/resourcemanager
Direct URLs: masters could be on many different hosts
Knox URLs: one host, one port; consistent paths; SSL config at one host
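As a rough illustration of the Knox URL column, the following Python sketch maps a service name to a single Knox endpoint. The host `knox-host`, port 8443 and the path suffixes come from the slide; the function name `knox_url` is made up for this example, and a real Knox deployment usually also places a gateway path and topology name in the URL, which the simplified slide omits.

```python
# Path suffixes per service, taken from the "Knox URL" column of the slide.
KNOX_PATHS = {
    "WebHDFS": "webhdfs",
    "WebHCat": "templeton",
    "Oozie": "oozie",
    "HBase": "hbase",
    "Hive": "hive",
    "YARN": "resourcemanager",
}


def knox_url(service: str, host: str = "knox-host", port: int = 8443) -> str:
    """Return the single-host, single-port Knox URL for a service."""
    try:
        path = KNOX_PATHS[service]
    except KeyError:
        raise ValueError(f"unknown service: {service}") from None
    return f"https://{host}:{port}/{path}"


for name in KNOX_PATHS:
    print(name, knox_url(name))
```

A client written this way needs only one host name and one port, regardless of where the individual master services run.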
Hadoop REST API Security: Drill-Down
[Diagram: a REST client, an enterprise identity provider (LDAP/AD), firewalls, a DMZ with a load balancer (LB) and Knox Gateway instances (GW), an edge node with Hadoop CLIs, and two Hadoop clusters, each with masters (RM, NN, WebHCat, Oozie, HS2, HBase) and slaves (DN, NM). Connections are labelled HTTP, LDAP and RPC.]
Data Protection
Data Protection
HDP allows you to apply data protection policy at different layers across the Hadoop stack
Layer                What?                              How?
Storage and Access   Encrypt data while it is at rest   HDFS Transparent Data Encryption, partners, HBase encryption, OS-level encryption
Transmission         Encrypt data as it moves           SSL, SASL, RPC
Points of Communication
[Diagram: numbered points of communication between a client, the nodes and the Hadoop cluster: WebHDFS, DataTransferProtocol, RPC, JDBC/ODBC and M/R Shuffle.]
Data Protection - HDFS Encryption
[Diagram: HDFS clients, the NameNode and encrypted files inside an HDFS encryption zone, alongside a Key Management System (KMS), the KeyProvider API (partner integration point) and Stateless Key Management. Stack layers shown: Data Access, Data Management, Security Partners, YARN, HDFS.]
• Leverage native HDFS Transparent Data Encryption or commercial solutions such as Protegrity
• Hortonworks is collaborating with partners to deliver enterprise-scale key management and more choices to customers
• Open source KMS with Ranger
• Or partner with commercial KMS solutions, e.g. Voltage KMS
- Partner joint engineering resources
- Voltage Stateless Key Management integrated with KeyProvider API
Only HDP offers open source and commercial choices for key management
Open Source Key Management
Demo Transparent Data Encryption
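A Transparent Data Encryption demo typically involves creating a key and then an encryption zone. The sketch below only builds the command lines as Python lists and does not run them. The key name `demo_key` and the path `/secure/demo` are placeholders invented for this example, and it assumes the `hadoop key` and `hdfs crypto` command-line tools are available with a KMS already configured; it is not a record of what was shown on stage.

```python
import shlex
from typing import List


def tde_demo_commands(key_name: str, zone_path: str) -> List[List[str]]:
    """Build (but do not run) the commands for a basic encryption-zone setup."""
    return [
        # Create an encryption key in the configured KMS.
        ["hadoop", "key", "create", key_name],
        # The zone directory must exist and be empty before it becomes a zone.
        ["hdfs", "dfs", "-mkdir", "-p", zone_path],
        # Turn the directory into an encryption zone backed by that key.
        ["hdfs", "crypto", "-createZone", "-keyName", key_name, "-path", zone_path],
        # List zones to confirm the result.
        ["hdfs", "crypto", "-listZones"],
    ]


for cmd in tde_demo_commands("demo_key", "/secure/demo"):
    print(shlex.join(cmd))
```

Files written under the zone path are then encrypted and decrypted transparently for clients that are permitted to use the key.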
Securing Spark Deployments
Spark - Authentication
Spark leverages Kerberos on YARN
[Diagram: numbered flow (steps 1 to 7) between John Doe, the KDC, the Spark AM, the NameNode (NN) and the Hadoop cluster, with these labels:]
• kinit
• Get Service Ticket (ST) for Spark
• Use Spark ST, submit Spark Job
• Spark gets Namenode (NN) service ticket
• YARN launches Spark Executors using John Doe’s identity
• Executor reads from HDFS using John Doe’s delegation token
Spark – Authorization
[Diagram: John Doe, the KDC, Ranger, a YARN cluster and HDFS, with elements labelled A, B and C and these labels:]
• Client gets service ticket for Spark
• Use Spark ST, submit Spark Job
• Get Namenode (NN) service ticket
• Executors read from HDFS
• Ranger: Can John launch this job? Can John read this file?
Spark – Channel Encryption - Example
Channels shown: Shuffle Data, Control/RPC, Shuffle BlockTransfer, Read/Write Data, FS (Broadcast, File Download)
• spark.authenticate = true. Leverage YARN to distribute keys
• spark.authenticate.enableSaslEncryption = true
• Depends on data source: for HDFS, RPC (RC4 | 3DES), or SSL for WebHDFS
• NM > Ex leverages YARN based SSL
• spark.ssl.enabled = true
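The three properties named on this slide can be passed to `spark-submit` as `--conf` flags. The following Python sketch builds such an argument list without launching anything. The helper name `encryption_conf_args` and the application file `app.py` are placeholders for illustration, and whether each setting takes effect depends on the Spark version and deployment mode.

```python
# Properties exactly as named on the slide.
ENCRYPTION_CONF = {
    "spark.authenticate": "true",
    "spark.authenticate.enableSaslEncryption": "true",
    "spark.ssl.enabled": "true",
}


def encryption_conf_args(conf: dict) -> list:
    """Flatten a property dict into ["--conf", "key=value", ...]."""
    args = []
    for key, value in conf.items():
        args += ["--conf", f"{key}={value}"]
    return args


command = (
    ["spark-submit", "--master", "yarn"]
    + encryption_conf_args(ENCRYPTION_CONF)
    + ["app.py"]  # placeholder application
)
print(" ".join(command))
```

The same key/value pairs could equally be placed in `spark-defaults.conf` so that every job on the cluster picks them up.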
Gotchas with Spark Security
• Client -> Spark Thrift Server -> Spark Executors – No identity propagation on 2nd hop
– Forces STS to run as Hive user to read all data
– Reduces security
– Use SparkSQL via shell or programmatic API
– https://issues.apache.org/jira/browse/SPARK-5159
• SparkSQL – Granular security unavailable
– Ranger integration will solve this problem (refer to the talk in Room 210A on security in Spark and Hive)
– Brings row/column-level security and masking features to SparkSQL
• Spark + HBase with Kerberos
– Issue fixed in Spark 1.4 (SPARK-6918)
• Spark Streaming + Kafka + Kerberos + SSL
– Issues fixed in HDP 2.4.x
• Spark jobs > 72 hours
– Kerberos token not renewed, fixed in Spark 1.5+
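For the long-running job case, Spark on YARN accepts a principal and keytab at submit time so that it can log in again and obtain fresh tokens. A hedged Python sketch of building such a command follows. The principal `john@EXAMPLE.COM`, the keytab path and `streaming_app.py` are invented placeholders, and the command is only printed, not executed.

```python
import shlex
from typing import List


def long_running_submit(principal: str, keytab: str, app: str) -> List[str]:
    """Build a spark-submit command that supplies a Kerberos principal and keytab."""
    return [
        "spark-submit",
        "--master", "yarn",
        "--principal", principal,  # Kerberos identity to log in as
        "--keytab", keytab,        # keytab file used to log in again over time
        app,
    ]


cmd = long_running_submit(
    "john@EXAMPLE.COM",                   # placeholder principal
    "/etc/security/keytabs/john.keytab",  # placeholder keytab path
    "streaming_app.py",                   # placeholder application
)
print(shlex.join(cmd))
```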
Questions??


  • 1. Curb Your Insecurity with HDP Tips for a Secure Cluster (with Spark too) Hadoop Summit – San Jose June 29th, 2016
  • 2. 2 © Hortonworks Inc. 2011 – 2016. All Rights Reserved Pardeep Kumar Sr. Systems Architect, NA Prof. Services 4+ years in Hadoop Helping Fortune500 customers succeed in their Hadoop journey Setup, implement, migrate and secure some of the largest clusters in North America Security, & Migration SME, HCC Guru Loves Hadoop, Cricket and Kerberos ;) pardeep.kumar@hortonworks.com @hadooptutor linkedin.com/in/pardeepkumarmishra Ancil McBarnett Sr. Solutions Engineer, NorthEast Helping organizations design, implement, operate and consume Hadoop and Big Data Solutions. Specialize in Security and Hive Tuning. HCC Guru. Loves Cricket, and DJ Bravo Champion :D amcbarnett@hortonworks.com @mcbkingdom linkedin.com/in/mcbkingdom
  • 3. Hadoop Security in 4 Steps
  • 4. Comprehensive Approach to Security. In order to protect any data system you must implement the following:
      – Administration: central management and consistent security (How do I set policy across the entire cluster?)
      – Authentication: authenticate users and systems (Who am I/prove it?)
      – Authorization: provision access to data (What can I do?)
      – Audit: maintain a record of data access (What did I do?)
      – Data Protection: protect data at rest and in motion (How can I encrypt at rest and over the wire?)
  • 5. HDP Security: Comprehensive, Complete, Extensible
      – Perimeter Level Security: network security (e.g. firewalls); Apache Knox (e.g. gateways)
      – Authentication: LDAP/AD, Kerberos
      – Authorization & Audit: consistent authorization controls across all Apache components within HDP (Apache Ranger)
      – Data Protection: encrypts data in motion and data at rest with HDFS TDE and Ranger KMS; refer to partner encryption solutions for broader needs
  • 6. Authentication with Kerberos. Kerberos is a necessary evil, just do it!
  • 7. Security Without Kerberos
  • 8. Configure Kerberos – Ambari Wizard
  • 9. Security With Kerberos
  • 10. Apache Ranger
  • 11. Apache Ranger
  • 12. HDFS File Security
  • 13. Hive Database and Table Security
  • 14. Authorization and Audit
      – Authorization: fine-grained access control. Control access into the system, with flexibility in defining policies.
          • HDFS – Folder, File
          • Hive – Database, Table, Column
          • HBase – Table, Column Family, Column
          • Storm, Knox and more
      – Audit: extensive user access auditing in HDFS, Hive and HBase
          • IP address
          • Resource type/resource
          • Timestamp
          • Access granted or denied
  • 15. REST API Security with Apache Knox
  • 16. Hadoop REST APIs
      – Useful for connecting to Hadoop from outside the cluster
      – Useful when more client language flexibility is required, i.e. when a Java binding is not an option
      – Challenges: the client must have knowledge of the cluster topology; ports must be opened outside the cluster (in some cases on every host)
      – Services and their APIs:
          • WebHDFS: supports HDFS user operations including reading files, writing to files, making directories, changing permissions and renaming
          • WebHCat: job control for MapReduce, Pig and Hive jobs, and HCatalog DDL commands
          • Hive: Hive REST API operations
          • HBase: HBase REST API operations
          • Oozie: job submission and management, and Oozie administration
  • 17. Authentication: API Security with Knox. Incubated and led by Hortonworks, Apache Knox extends the reach of the Hadoop REST APIs without Kerberos complexities.
      – Single, simple point of access for a cluster: Kerberos encapsulation; single Hadoop access point; REST API hierarchy; consolidated API calls; multi-cluster support
      – Central controls ensure consistency across one or more clusters: eliminates the SSH "edge node"; central API management; central audit control; service-level authorization
      – Integrated with existing systems to simplify identity maintenance: SSO integration (Siteminder and OAM); LDAP and AD integration
  • 18. Hadoop REST API with Knox (Service: Direct URL → Knox URL)
      – WebHDFS: http://namenode-host:50070/webhdfs → https://knox-host:8443/webhdfs
      – WebHCat: http://webhcat-host:50111/templeton → https://knox-host:8443/templeton
      – Oozie: http://ooziehost:11000/oozie → https://knox-host:8443/oozie
      – HBase: http://hbasehost:60080 → https://knox-host:8443/hbase
      – Hive: http://hivehost:10001/cliservice → https://knox-host:8443/hive
      – YARN: http://yarn-host:yarn-port/ws → https://knox-host:8443/resourcemanager
      – Direct access: masters could be on many different hosts. With Knox: one host, one port, consistent paths, SSL config at one host.
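The URL mapping on slide 18 can be sketched as a small lookup. This is only an illustration: the function name `knox_url` and the `prefix` parameter are hypothetical, and the paths are the simplified ones shown on the slide. Real Knox deployments usually add a gateway path and topology name in front of the service path, which the optional `prefix` argument stands in for.

```python
# Service paths behind Knox, copied from the slide's table (simplified form).
KNOX_PATHS = {
    "webhdfs": "webhdfs",
    "webhcat": "templeton",
    "oozie": "oozie",
    "hbase": "hbase",
    "hive": "hive",
    "yarn": "resourcemanager",
}


def knox_url(service, knox_host="knox-host", port=8443, prefix=""):
    """Build the single-host, single-port Knox URL for a Hadoop service.

    `prefix` is a hypothetical hook for a gateway/topology path such as
    "gateway/default"; the slide omits it.
    """
    path = KNOX_PATHS[service.lower()]
    parts = [p for p in (prefix.strip("/"), path) if p]
    return "https://{}:{}/{}".format(knox_host, port, "/".join(parts))


if __name__ == "__main__":
    for name in KNOX_PATHS:
        print(name, "->", knox_url(name))
```

The client only needs to know the Knox host and port; which master actually serves each API stays hidden behind the gateway.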
  • 19. Hadoop REST API Security: Drill-Down [diagram]. Components shown: REST client; enterprise identity provider (LDAP/AD); load balancer (LB) and Knox Gateway instances (GW) in a DMZ between two firewalls; edge node/Hadoop CLIs (RPC); Hadoop Cluster 1 and Hadoop Cluster 2, each with masters and slaves running RM, NN, WebHCat, Oozie, HS2, HBase, DN and NM. Connections are labelled HTTP, LDAP and RPC.
  • 20. Data Protection
  • 21. Data Protection. HDP allows you to apply data protection policy at different layers across the Hadoop stack (Layer: What? How?)
      – Storage and Access: encrypt data while it is at rest. HDFS Transparent Data Encryption, partners, HBase encryption, OS-level encryption
      – Transmission: encrypt data as it moves. SSL, SASL, RPC
  • 22. Points of Communication [diagram]. Channels shown between the client, the Hadoop cluster and its nodes: WebHDFS, DataTransferProtocol, RPC, JDBC/ODBC and M/R Shuffle.
  • 23. Data Protection - HDFS Encryption [diagram: HDFS clients, NameNode, HDFS encryption zone containing encrypted files, Key Management System (KMS), KeyProvider API as the partner integration point, Voltage Stateless Key Management]
      – Leverage native HDFS Transparent Data Encryption or commercial solutions such as Protegrity
      – Hortonworks is collaborating with partners to deliver enterprise-scale key management and more choices to customers
      – Open source key management: open source KMS with Ranger
      – Or partner with commercial KMS solutions, e.g. Voltage KMS: partner joint engineering resources; Voltage Stateless Key Management integrated with the KeyProvider API
      – Only HDP offers open source and commercial choices for key management
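As a rough illustration of what setting up an encryption zone for HDFS Transparent Data Encryption involves, the sketch below assembles the usual command sequence: create a key in the KMS, create an empty directory, and turn that directory into an encryption zone. The key name and path are placeholders, the helper `encryption_zone_commands` is hypothetical, and the commands are only built here, not executed. Running them requires a Kerberos-authenticated user with the right KMS and HDFS privileges.

```python
def encryption_zone_commands(key_name, zone_path):
    """Return the CLI invocations (as argv lists) for a new encryption zone.

    The commands are standard Hadoop CLI calls; the key name and zone path
    passed in are illustrative placeholders chosen by the caller.
    """
    return [
        # 1. Create the encryption key in the configured KMS (e.g. Ranger KMS).
        ["hadoop", "key", "create", key_name],
        # 2. The zone must be created on an existing, empty directory.
        ["hdfs", "dfs", "-mkdir", "-p", zone_path],
        # 3. Mark the directory as an encryption zone tied to the key.
        ["hdfs", "crypto", "-createZone", "-keyName", key_name, "-path", zone_path],
    ]


if __name__ == "__main__":
    for argv in encryption_zone_commands("demo_key", "/data/secure"):
        print(" ".join(argv))
```

Files written under the zone are then encrypted and decrypted transparently by the HDFS client, subject to the key access policies defined in the KMS.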
  • 24. Demo: Transparent Data Encryption
  • 25. Securing Spark Deployments
  • 26. Spark - Authentication [diagram: John Doe, KDC, Hadoop cluster with Spark AM, NN and executors]. Spark leverages Kerberos on YARN. Steps shown (numbered 1 to 7 on the slide):
      – John Doe runs kinit against the KDC
      – Get a service ticket (ST) for Spark
      – Use the Spark ST to submit the Spark job
      – Spark gets a NameNode (NN) service ticket
      – YARN launches the Spark executors using John Doe's identity
      – Executors read from HDFS using John Doe's delegation token
  • 27. Spark – Authorization [diagram: John Doe, KDC, Ranger, YARN cluster, HDFS with files A, B, C]. The client gets a service ticket for Spark, uses the Spark ST to submit the Spark job, and gets a NameNode (NN) service ticket; executors then read from HDFS. Ranger answers two questions: Can John launch this job? Can John read this file?
  • 28. Spark – Channel Encryption - Example
      – Control/RPC: spark.authenticate = true. Leverage YARN to distribute keys
      – Shuffle BlockTransfer: spark.authenticate.enableSaslEncryption = true
      – Shuffle Data: NM > Ex leverages YARN-based SSL
      – Read/Write Data: depends on the data source. For HDFS, RPC (RC4 | 3DES) or SSL for WebHDFS
      – FS – Broadcast, File Download: spark.ssl.enabled = true
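The three Spark properties named on slide 28 could be passed to a job as `--conf` arguments. The sketch below only renders them; the helper `to_submit_args` is hypothetical, and a working SSL setup also needs keystore and truststore properties that the slide does not list.

```python
# Property names and values exactly as shown on the slide.
ENCRYPTION_CONF = {
    "spark.authenticate": "true",
    "spark.authenticate.enableSaslEncryption": "true",
    "spark.ssl.enabled": "true",
}


def to_submit_args(conf):
    """Render a dict of Spark properties as spark-submit --conf arguments."""
    args = []
    for key in sorted(conf):
        args.extend(["--conf", "{}={}".format(key, conf[key])])
    return args


if __name__ == "__main__":
    print("spark-submit " + " ".join(to_submit_args(ENCRYPTION_CONF)) + " <application>")
```

The same key/value pairs can instead be placed in `spark-defaults.conf` so that every job on the cluster picks them up.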
  • 29. Gotchas with Spark Security
      • Client > Spark Thrift Server > Spark Executors
          – No identity propagation on the 2nd hop
          – Forces STS to run as the Hive user to read all data, which reduces security
          – Workaround: use SparkSQL via the shell or the programmatic API
          – https://issues.apache.org/jira/browse/SPARK-5159
      • SparkSQL
          – Granular security unavailable
          – Ranger integration will solve this problem (refer to the talk in Room 210A for Security in Spark and Hive)
          – Brings row/column-level security and masking features to SparkSQL
      • Spark + HBase with Kerberos
          – Issue fixed in Spark 1.4 (SPARK-6918)
      • Spark Streaming + Kafka + Kerberos + SSL
          – Issues fixed in HDP 2.4.x
      • Spark jobs > 72 hours
          – Kerberos token not renewed; fixed in Spark 1.5+
  • 30. Questions?