© Hortonworks Inc. 2014
Hadoop Security Today & Tomorrow
Amsterdam - April 3rd, 2014
Vinay Shukla
Twitter: @NeoMythos
Agenda
• What is Hadoop Security?
– 4 Security Pillars & Rings of Defense
• What security elements exist today?
– Authentication
– Authorization
– Audit
– Data Protection
• What is on the security roadmap?
– Coming soon
– Longer term projects
• Securing Hadoop with Apache Knox Gateway
– Knox overview
– Demo
• How to get involved
Two Reasons for Security in Hadoop
1. Hadoop contains sensitive data
–As Hadoop adoption grows, so do the types of data organizations store in it. Often the data is proprietary or personal, and it must be protected.
–In this context, Hadoop is governed by the same security requirements as any other data center platform.
2. Hadoop is subject to compliance requirements
–Organizations often must comply with regulations such as HIPAA, PCI DSS and FISMA that require protection of personal information.
–Hadoop must also adhere to other corporate security policies.
What is Apache Hadoop Security?
Security in Apache Hadoop is defined by four key pillars: authentication, authorization, accountability, and data protection.
Security: Rings of Defense
Perimeter Level Security
• Network Security (e.g. Firewalls)
• Apache Knox (e.g. Gateways)
Data Protection
• Core Hadoop
• Partners
Authentication
• Kerberos
OS Security
Authorization
• MR ACLs
• HDFS Permissions
• HDFS ACLs
• Hive ATZ-NG
• HBase ACLs
• Accumulo Label Security
Authentication in Hadoop Today…
Authentication
Who am I/prove it?
Control access to
cluster.
Authorization
Restrict access
to explicit data
Audit
Understand who
did what
Data Protection
Encrypt data at
rest & motion
Kerberos in native
Apache Hadoop
Perimeter
Security with
Apache Knox
Gateway
Kerberos Authentication in Hadoop
For more than 20 years, Kerberos has been the de-facto
standard for strong authentication.
…no other option exists.
The design and implementation of Kerberos security in native Apache
Hadoop was delivered by Hortonworker Owen O’Malley in 2010.
What does Kerberos Do?
– Establishes identity for clients, hosts and services
– Prevents impersonation/passwords are never sent over the wire
– Integrates w/ enterprise identity management tools such as LDAP & Active Directory
– More granular auditing of data access/job execution
• Single Hadoop
access point
• REST API hierarchy
• Consolidated API
calls
• Multi-cluster
support
• Eliminates SSH
“edge node”
• Central API
management
• Central audit control
• Simple Service
level Authorization
• SSO Integration –
Siteminder, API
Key*, OAuth* &
SAML*
• LDAP & AD
integration
Perimeter Security with Apache Knox
Integrated with
existing systems to
simplify identity
maintenance
Incubated and led by Hortonworks,
Apache Knox provides a simple and open
framework for Hadoop perimeter security.
Single, simple point
of access for a
cluster
Central controls
ensure consistency
across one or more
clusters
Authentication & Audit in Hadoop today…
Authorization
Restrict access
to explicit data
Audit
Understand who
did what
Data Protection
Encrypt data at
rest & motion
Kerberos in native
Apache Hadoop
Perimeter
Security with
Apache Knox
Gateway
Native in Apache Hadoop
• MapReduce Access Control Lists
• HDFS Permissions
• Process Execution audit trail
Cell level access control in
Apache Accumulo
Authentication
Who am I/prove it?
Control access to
cluster.
Authorization: Who can do what in Hadoop?
• Access Control Services exist for each of the Hadoop
components
–HDFS has file Permissions
–YARN, MapReduce and HBase have Access Control Lists (ACLs)
–Accumulo provides more granular label/cell-level security
• Improvements to these services are being led by the Hortonworks team:
–HDFS improvements – extended ACLs, which are more flexible because multiple policies can apply to the same file or directory
–Hive improvements – a Hortonworks initiative called Hive ATZ-NG; better integration allows familiar SQL/database syntax (GRANT/REVOKE) and lets more clients (including partner integrations) be secured.
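To make the difference between plain file permissions and extended ACLs concrete, here is a minimal sketch of a POSIX-style ACL check, the model that HDFS ACLs follow. It is an illustration, not the HDFS implementation: the `Acl` class, the `is_allowed` function and the sample users and groups are hypothetical, and the superuser bypass and default ACLs are left out.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable

READ, WRITE, EXECUTE = 4, 2, 1  # classic rwx permission bits


@dataclass
class Acl:
    """Hypothetical, simplified model of a file's permissions plus extended ACL."""
    owner: str
    group: str
    owner_perm: int
    group_perm: int
    other_perm: int
    named_users: Dict[str, int] = field(default_factory=dict)   # user -> bits
    named_groups: Dict[str, int] = field(default_factory=dict)  # group -> bits
    mask: int = 7  # caps named-user entries and every group entry


def is_allowed(acl: Acl, user: str, groups: Iterable[str], wanted: int) -> bool:
    """Evaluate entries in POSIX ACL order: owner, named user, groups, other."""
    groups = set(groups)
    if user == acl.owner:
        return (acl.owner_perm & wanted) == wanted
    if user in acl.named_users:
        return (acl.named_users[user] & acl.mask & wanted) == wanted
    matching = []
    if acl.group in groups:
        matching.append(acl.group_perm)
    matching.extend(p for g, p in acl.named_groups.items() if g in groups)
    if matching:
        # Any one matching group entry may grant the access; the mask caps each.
        return any((p & acl.mask & wanted) == wanted for p in matching)
    return (acl.other_perm & wanted) == wanted


acl = Acl(owner="hive", group="hadoop", owner_perm=7, group_perm=5, other_perm=0,
          named_users={"alice": 6}, named_groups={"analysts": 4})
print(is_allowed(acl, "alice", [], WRITE))          # named-user entry applies
print(is_allowed(acl, "bob", ["analysts"], WRITE))  # named group is read-only
```

With plain permissions only the owner, one group and "other" can be distinguished; the named entries are what allow several policies on the same file or directory.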
Data Protection in Hadoop today…
Authorization
Restrict access
to explicit data
Audit
Understand who
did what
Data Protection
Encrypt data at
rest & motion
Kerberos in native
Apache Hadoop
Perimeter
Security with
Apache Knox
Gateway
Native in Apache Hadoop
• MapReduce Access Control Lists
• HDFS Permissions
• Process Execution audit trail
Cell level access control in
Apache Accumulo
Wire encryption
in native Apache
Hadoop
Orchestrated
encryption with
3rd party tools
Authentication
Who am I/prove it?
Control access to
cluster.
Data Protection in Hadoop
must be applied at three different
layers in Apache Hadoop
Storage: encrypt data while it is at rest
Direct data flows “into” and “out of” 3rd party encryption tools and/or
rely upon hardware-specific techniques (e.g. drive-level encryption).
Transmission: encrypt data as it is in motion
Native Apache Hadoop 2.0 provides wire encryption.
Upon Access: apply restrictions when accessed
Direct data flows “into” and “out of” 3rd party encryption tools.
Data Protection
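As an illustration of the "Transmission" layer, the sketch below renders the kind of Hadoop configuration properties that switch on wire encryption. The property names (`hadoop.rpc.protection`, `dfs.encrypt.data.transfer`, `mapreduce.shuffle.ssl.enabled`) are taken from Apache Hadoop 2.x documentation as best recalled and should be verified against the version in use; the `render_site_xml` helper is hypothetical, and SSL keystore setup is not covered.

```python
from xml.sax.saxutils import escape

# Assumed Hadoop 2.x property names, grouped by the config file they belong in.
WIRE_ENCRYPTION = {
    "core-site.xml": {"hadoop.rpc.protection": "privacy"},        # encrypt RPC
    "hdfs-site.xml": {"dfs.encrypt.data.transfer": "true"},       # encrypt block transfer
    "mapred-site.xml": {"mapreduce.shuffle.ssl.enabled": "true"}, # encrypt shuffle
}


def render_site_xml(props):
    """Render a dict of properties as a Hadoop <configuration> fragment."""
    lines = ["<configuration>"]
    for name, value in props.items():
        lines += [
            "  <property>",
            f"    <name>{escape(name)}</name>",
            f"    <value>{escape(value)}</value>",
            "  </property>",
        ]
    lines.append("</configuration>")
    return "\n".join(lines)


for filename, props in WIRE_ENCRYPTION.items():
    print(f"# {filename}")
    print(render_site_xml(props))
```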
Data Protection – Details - Today
• Encryption of Data at Rest
–Option 1: OS or Hardware Level Encryption (Out of the Box)
–Option 2: Custom Development
–Option 3: Certified Partners
–Work underway for encryption in Hive, HDFS and HBase as core
platform capabilities.
• Encryption of Data on the Wire
–All wire protocols can be encrypted by HDP platform (2.x). Wire-level
encryption enhancements led by HWX Team.
• Column Level Encryption
–No current out of the box support in Hadoop.
–Certified Partners provide these capabilities.
What can be done today?
Authorization
Restrict access
to explicit data
Audit
Understand who
did what
Data Protection
Encrypt data at
rest & motion
Kerberos in
native Apache
Hadoop
Perimeter
Security with
Apache Knox
Gateway
Native in Apache Hadoop
• MapReduce Access Control Lists
• HDFS Permissions
• Process Execution audit trail
Cell level access control in
Apache Accumulo
Service level Authorization with
Knox
Access Audit with Knox
Wire encryption
in native Apache
Hadoop
Wire Encryption
with Knox
Orchestrated
encryption with
3rd party tools
Authentication
Who am I/prove it?
Control access to
cluster.
Hadoop Security
Hortonworks is Delivering Secure Hadoop for the Enterprise
Security for Hadoop must be addressed within
every layer of the stack and integrated into existing frameworks
For a full description of what is available in Enterprise Hadoop
today across Authentication, Authorization, Accountability and
Data Protection please visit our security labs page
HDP 2.1 stack layers: Governance & Integration, Security, Operations, Data Access, Data Management
New: Apache Knox
Perimeter security for Hadoop
• A common place to perform authentication across Hadoop and all related projects
• Integrated with LDAP and AD
• Currently supports: WebHDFS, WebHCat, Oozie, Hive & HBase
• Broad community effort, incubated with Microsoft, with a broad set of developers involved
Security Investments
Security Phase 3:
• Audit event correlation and Audit viewer
• Data Encryption in HDFS, Hive & HBase
• Knox for HDFS HA, Ambari & Falcon
• Support Token-Based AuthN beyond Kerb
Security Phase 2:
• ACLs for HDFS
• Knox: Hadoop REST API Security
• SQL-style Hive AuthZ (GRANT, REVOKE)
• SSL support for Hive Server 2
• SSL for DN/NN UI & WebHDFS
• PAM support for Hive
Phase 1
• Strong AuthN with Kerberos
• HBase, Hive, HDFS basic AuthZ
• Encryption with SSL for NN, JT, etc.
• Wire encryption with Shuffle, HDFS, JDBC
Hadoop Security: Phase 2
HDP 2.1 Features
Release Theme: REST API Security, Improve AuthZ, Wire Encryption
Specific Features:
• Hadoop REST API Security with Apache Knox
  – Eliminates SSH edge node
  – Single Hadoop access point
  – LDAP, AD based Authentication
  – Service-level Authorization
  – Audit support for REST access
• SQL style Hive Authorization with fine grain access
• HDFS Access Control Lists
• SSL support in HiveServer2
• SSL support in NN/DN UI & WebHDFS
• Pluggable Authentication Module (PAM) in Hive
Included Components: Apache Knox, Hive, HDFS
Why Knox?
From fb.com/hadoopmemes
Apache Knox Gateway
• REST/HTTP API security for
Hadoop
• Eliminates SSH edge node
• Single REST API access point
• Centralized Authentication,
Authorization, and Audit for
Hadoop REST/HTTP services
• LDAP/AD Authentication,
Service Authorization, Audit etc.
Knox Eliminates
• Client’s requirements for intimate knowledge of cluster topology
Knox Deployment with Hadoop Cluster
[Deployment diagram. Labels: Application Tier; DMZ with Web Tier, load balancer (LB) and Knox; Hadoop CLIs; Master Nodes (NN, SNN) in Rack 1; Slave Nodes (DN) in Rack 2 through Rack N; a switch per rack.]
Hadoop REST API Security: Drill-Down
[Diagram. Labels: REST Client; Enterprise Identity Provider (LDAP/AD); firewalls around a DMZ containing a load balancer (LB) and Knox Gateway instances (GW); Edge Node/Hadoop CLIs; protocols HTTP, LDAP and RPC; Hadoop Cluster 1 and Hadoop Cluster 2, each with Masters (RM, NN, WebHCat, Oozie, HS2, HBase) and Slaves (DN, NM).]
What is Knox? Client > Knox > Hadoop Cluster
Request flow through the Knox Gateway, from REST client to Hadoop service:
• Protocol Listener: listens for requests on the appropriate protocols (e.g. HTTP/HTTPS)
• Service Selector: selects the appropriate service filter chain based on request URL mapping rules
• Service Specific Filter Chain
  – AuthN Filter: challenges the client for credentials and authenticates, or validates an SSO token, against the Enterprise Identity Provider or Enterprise/Cloud SSO Provider
  – Identity Asserter Filter: enforces propagation of the authenticated identity to Hadoop by modifying the request
  – Rewrite Filter: translates URLs in the request and response between external and internal URLs based on service-specific rules
  – Dispatch: streams the request and response to and from the Hadoop service based on rewritten URLs
Service filter chains are composed and configured at deployment time by service-specific plugins.
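The filter chain idea can be sketched in a few lines of Python. This is a toy model of the flow described on the slide, not Knox code: the `USERS` table stands in for LDAP/AD, the internal host `namenode.example:50070` is made up, and using a `doAs` request parameter to carry the asserted identity is an assumption made for illustration.

```python
import re
from dataclasses import dataclass, field
from typing import Dict, Optional, Tuple

USERS = {"guest": "guest-password"}                     # hypothetical identity store
INTERNAL = {"webhdfs": "http://namenode.example:50070"}  # hypothetical internal host


@dataclass
class Request:
    path: str
    credentials: Optional[Tuple[str, str]] = None
    params: Dict[str, str] = field(default_factory=dict)
    user: Optional[str] = None
    url: Optional[str] = None


def authn_filter(req):
    """Reject the request unless the supplied credentials are known."""
    if req.credentials is None:
        raise PermissionError("401 Unauthorized")
    name, password = req.credentials
    if name not in USERS or USERS[name] != password:
        raise PermissionError("401 Unauthorized")
    req.user = name
    return req


def identity_asserter_filter(req):
    """Propagate the authenticated identity by modifying the request."""
    req.params["doAs"] = req.user
    return req


def rewrite_filter(req):
    """Translate the external gateway URL into the internal service URL."""
    m = re.match(r"^/gateway/[^/]+/([^/]+)(/.*)?$", req.path)
    service, rest = m.group(1), m.group(2) or ""
    req.url = f"{INTERNAL[service]}/{service}{rest}"
    return req


def dispatch(req):
    """Stand-in for streaming the request to the Hadoop service."""
    return {"url": req.url, "params": dict(req.params)}


CHAINS = {"webhdfs": [authn_filter, identity_asserter_filter, rewrite_filter, dispatch]}


def select_chain(path):
    """Service selector: pick a chain from the service name in the URL."""
    m = re.match(r"^/gateway/[^/]+/([^/]+)", path)
    if m is None or m.group(1) not in CHAINS:
        raise LookupError(f"no service mapped for {path}")
    return CHAINS[m.group(1)]


def handle(req):
    result = req
    for step in select_chain(req.path):
        result = step(result)
    return result


print(handle(Request("/gateway/sandbox/webhdfs/v1/user/guest",
                     ("guest", "guest-password"))))
```

Because each service gets its own chain, adding a service in this model means adding one entry to `CHAINS`, which mirrors the slide's point that chains are composed per service at deployment time.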
Knox Gateway in action
Submit MR job via Knox
HDFS & MR Operations with Knox
• Create a few directories
curl -iku guest:guest-password -X PUT 'https://localhost:8443/gateway/sandbox/webhdfs/v1/user/guest/test?op=MKDIRS&permission=777'
curl -iku guest:guest-password -X PUT "https://localhost:8443/gateway/sandbox/webhdfs/v1/user/guest/test/input?op=MKDIRS&permission=777"
curl -iku guest:guest-password -X PUT "https://localhost:8443/gateway/sandbox/webhdfs/v1/user/guest/test/lib?op=MKDIRS&permission=777"
• Upload files
curl -iku guest:guest-password -L -T samples/hadoop-examples.jar -X PUT "https://localhost:8443/gateway/sandbox/webhdfs/v1/user/guest/test/lib/hadoop-examples.jar?op=CREATE"
curl -iku guest:guest-password -L -T README -X PUT "https://localhost:8443/gateway/sandbox/webhdfs/v1/user/guest/test/input/README?op=CREATE"
• Run MR job
curl -iku guest:guest-password -X POST -d arg=/user/guest/test/input -d arg=/user/guest/test/output -d jar=/user/guest/test/lib/hadoop-examples.jar -d class=org.apache.hadoop.examples.WordCount https://localhost:8443/gateway/sandbox/templeton/v1/mapreduce/jar
• Query the jobs for a user
curl -iku guest:guest-password https://localhost:8443/gateway/sandbox/templeton/v1/queue
• Query the status of a given job
curl -iku guest:guest-password "https://localhost:8443/gateway/sandbox/templeton/v1/queue/<job_id>"
• Read the output file
curl -iku guest:guest-password -L -X GET "https://localhost:8443/gateway/sandbox/webhdfs/v1/user/guest/test/output/part-r-00000?op=OPEN"
• Remove a directory
curl -iku guest:guest-password -X DELETE "https://localhost:8443/gateway/sandbox/webhdfs/v1/user/guest/test?op=DELETE&recursive=true"
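All of the commands above share one URL pattern: gateway address, the `gateway` context, a topology name (`sandbox`), then the service path. A small sketch of that pattern follows; the helper names `webhdfs_url` and `templeton_url` are hypothetical, and the defaults simply mirror the demo values on this slide.

```python
from urllib.parse import quote, urlencode


def gateway_base(host="localhost", port=8443, topology="sandbox"):
    """Base URL of a Knox topology, using the demo values as defaults."""
    return f"https://{host}:{port}/gateway/{topology}"


def webhdfs_url(hdfs_path, op, **params):
    """Build a WebHDFS URL through the gateway for the given operation."""
    query = urlencode({"op": op, **params})
    return f"{gateway_base()}/webhdfs/v1{quote(hdfs_path)}?{query}"


def templeton_url(resource):
    """Build a WebHCat (Templeton) URL through the gateway."""
    return f"{gateway_base()}/templeton/v1/{resource.lstrip('/')}"


print(webhdfs_url("/user/guest/test", "MKDIRS", permission="777"))
print(webhdfs_url("/user/guest/test", "DELETE", recursive="true"))
print(templeton_url("queue"))
```

The client only ever sees the gateway address and topology name, never the NameNode or WebHCat hosts, which is the "no intimate knowledge of cluster topology" point made earlier.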
How to get Involved
Resources
Security Labs: http://hortonworks.com/labs/security/
Security Blogs: http://hortonworks.com/blog/category/innovation/security/
Apache Knox Tutorial: http://hortonworks.com/hadoop-tutorial/securing-hadoop-infrastructure-apache-knox/
Need help? http://hortonworks.com/community/forums/forum/security/ or vshukla@hortonworks.com
Thank you!

Hadoop Security Today & Tomorrow with Apache Knox

  • 1. © Hortonworks Inc. 2014 Hadoop Security Today & Tomorrow Amsterdam - April3rd, 2014 Vinay Shukla Twitter: @NeoMythos
  • 2. © Hortonworks Inc. 2014 Agenda • What is Hadoop Security? – 4 Security Pillars & Rings of Defense • What security elements exists today? – Authentication – Authorization – Audit – Data Protection • What is on the security roadmap? – Coming soon – Longer term projects • Securing Hadoop with Apache Knox Gateway – Knox overview – Demo • How to get involved
  • 3. © Hortonworks Inc. 2014 Two Reasons for Security in Hadoop Hadoop Contains Sensitive Data –As Hadoop adoption grows so too has the types of data organizations look to store. Often the data is proprietary or personal and it must be protected. –In this context, Hadoop is governed by the same security requirements as any data center platform. Hadoop is subject to Compliance adherence –Organizations are often subject to comply with regulations such as HIPPA, PCI DSS, FISAM that require protection of personal information. –Adherence to other Corporate security policies. 1 2
  • 4. What is Apache Hadoop Security? Security in Apache Hadoop is defined by four key pillars: authentication, authorization, accountability, and data protection.
  • 5. Security: Rings of Defense. Perimeter Level Security • Network Security (e.g. firewalls) • Apache Knox (i.e. a gateway). Data Protection • Core Hadoop • Partners. Authentication • Kerberos. OS Security. Authorization • MR ACLs • HDFS Permissions • HDFS ACLs • Hive ATZ-NG • HBase ACLs • Accumulo Label Security
  • 6. Authentication in Hadoop Today… Authentication (Who am I? Prove it. Control access to the cluster): Kerberos in native Apache Hadoop; perimeter security with Apache Knox Gateway. Authorization: restrict access to explicit data. Audit: understand who did what. Data Protection: encrypt data at rest & in motion.
  • 7. Kerberos Authentication in Hadoop. For more than 20 years, Kerberos has been the de facto standard for strong authentication. …no other option exists. The design and implementation of Kerberos security in native Apache Hadoop was delivered by Hortonworker Owen O’Malley in 2010. What does Kerberos do? – Establishes identity for clients, hosts and services – Prevents impersonation; passwords are never sent over the wire – Integrates with enterprise identity management tools such as LDAP & Active Directory – More granular auditing of data access/job execution
  • 8. Perimeter Security with Apache Knox. Incubated and led by Hortonworks, Apache Knox provides a simple and open framework for Hadoop perimeter security. Single, simple point of access for a cluster: • Single Hadoop access point • REST API hierarchy • Consolidated API calls • Multi-cluster support • Eliminates SSH “edge node”. Central controls ensure consistency across one or more clusters: • Central API management • Central audit control • Simple service-level authorization. Integrated with existing systems to simplify identity maintenance: • SSO integration – Siteminder, API Key*, OAuth* & SAML* • LDAP & AD integration
  • 9. Authentication & Audit in Hadoop today… Authentication (Who am I? Prove it. Control access to the cluster): Kerberos in native Apache Hadoop; perimeter security with Apache Knox Gateway. Authorization (restrict access to explicit data) & Audit (understand who did what): native in Apache Hadoop • MapReduce Access Control Lists • HDFS Permissions • Process execution audit trail; cell-level access control in Apache Accumulo. Data Protection: encrypt data at rest & in motion.
  • 10. Authorization: Who can do what in Hadoop? • Access control services exist for each of the Hadoop components – HDFS has file permissions – YARN, MapReduce and HBase have Access Control Lists (ACLs) – Accumulo provides more granular label/cell-level security • Improvements to these services are being led by the Hortonworks team: – HDFS improvements – extended ACLs, more flexible via multiple policies on the same file or directory – Hive improvements – a Hortonworks initiative called Hive ATZ-NG; better integration allows familiar SQL/database syntax (GRANT/REVOKE) and allows more clients (including partner integrations) to be secure.
  • 11. Data Protection in Hadoop today… Authentication (Who am I? Prove it. Control access to the cluster): Kerberos in native Apache Hadoop; perimeter security with Apache Knox Gateway. Authorization (restrict access to explicit data) & Audit (understand who did what): native in Apache Hadoop • MapReduce Access Control Lists • HDFS Permissions • Process execution audit trail; cell-level access control in Apache Accumulo. Data Protection (encrypt data at rest & in motion): wire encryption in native Apache Hadoop; orchestrated encryption with 3rd party tools.
  • 12. Data Protection. Data protection in Hadoop must be applied at three different layers. Storage: encrypt data while it is at rest. Direct data flows “into” and “out of” 3rd party encryption tools and/or rely upon hardware-specific techniques (e.g. drive-level encryption). Transmission: encrypt data as it is in motion. Native Apache Hadoop 2.0 provides wire encryption. Upon access: apply restrictions when accessed. Direct data flows “into” and “out of” 3rd party encryption tools.
  • 13. Data Protection – Details – Today • Encryption of data at rest – Option 1: OS or hardware level encryption (out of the box) – Option 2: Custom development – Option 3: Certified partners – Work underway for encryption in Hive, HDFS and HBase as core platform capabilities. • Encryption of data on the wire – All wire protocols can be encrypted by the HDP platform (2.x). Wire-level encryption enhancements led by the HWX team. • Column-level encryption – No current out-of-the-box support in Hadoop. – Certified partners provide these capabilities.
  • 14. What can be done today? Authentication (Who am I? Prove it. Control access to the cluster): Kerberos in native Apache Hadoop; perimeter security with Apache Knox Gateway. Authorization (restrict access to explicit data) & Audit (understand who did what): native in Apache Hadoop • MapReduce Access Control Lists • HDFS Permissions • Process execution audit trail; cell-level access control in Apache Accumulo; service-level authorization with Knox; access audit with Knox. Data Protection (encrypt data at rest & in motion): wire encryption in native Apache Hadoop; wire encryption with Knox; orchestrated encryption with 3rd party tools.
  • 15. Hadoop Security: Hortonworks is Delivering Secure Hadoop for the Enterprise. Security for Hadoop must be addressed within every layer of the stack and integrated into existing frameworks. For a full description of what is available in Enterprise Hadoop today across Authentication, Authorization, Accountability and Data Protection, please visit our security labs page. [Diagram: HDP 2.1 stack with Governance & Integration, Security, Operations, Data Access and Data Management.] New: Apache Knox, perimeter security for Hadoop • A common place to perform authentication across Hadoop and all related projects • Integrated with LDAP and AD • Currently supports: WebHDFS, WebHCAT, Oozie, Hive & HBase • Broad community effort, incubated with Microsoft, broad set of developers involved. Security Investments. Security Phase 3: • Audit event correlation and audit viewer • Data encryption in HDFS, Hive & HBase • Knox for HDFS HA, Ambari & Falcon • Support token-based AuthN beyond Kerberos. Security Phase 2: • ACLs for HDFS • Knox: Hadoop REST API security • SQL-style Hive AuthZ (GRANT, REVOKE) • SSL support for HiveServer2 • SSL for DN/NN UI & WebHDFS • PAM support for Hive. Phase 1: • Strong AuthN with Kerberos • HBase, Hive, HDFS basic AuthZ • Encryption with SSL for NN, JT, etc. • Wire encryption with Shuffle, HDFS, JDBC
  • 16. Hadoop Security: Phase 2 (HDP 2.1 Features). Release theme: REST API Security, Improve AuthZ, Wire Encryption. Specific features: • Hadoop REST API security with Apache Knox • Eliminates SSH edge node • Single Hadoop access point • LDAP, AD based authentication • Service-level authorization • Audit support for REST access • SQL-style Hive authorization with fine-grained access • HDFS Access Control Lists • SSL support in HiveServer2 • SSL support in NN/DN UI & WebHDFS • Pluggable Authentication Module (PAM) in Hive. Included components: Apache Knox, Hive, HDFS
  • 17. Why Knox? (Image from fb.com/hadoopmemes.) Apache Knox Gateway • REST/HTTP API security for Hadoop • Eliminates SSH edge node • Single REST API access point • Centralized authentication, authorization, and audit for Hadoop REST/HTTP services • LDAP/AD authentication, service authorization, audit etc. Knox eliminates • The client’s need for intimate knowledge of cluster topology
  • 18. Knox Deployment with Hadoop Cluster. [Diagram labels: Application Tier; DMZ; Web Tier with load balancer (LB) and Knox; Hadoop CLIs; switches; master nodes on rack 1 (NN, SNN); slave nodes on racks 2 to N (DN).]
  • 19. Hadoop REST API Security: Drill-Down. [Diagram labels: REST client; firewalls; DMZ with load balancer (LB) and Knox Gateway instances (GW); enterprise identity provider (LDAP/AD); edge node/Hadoop CLIs; Hadoop Cluster 1 and Hadoop Cluster 2, each showing masters and slaves with RM, NN, WebHCat, Oozie, DN, NM, HS2 and HBase. Connection labels: HTTP, LDAP, RPC.]
  • 20. What is Knox? Client > Knox > Hadoop Cluster. Inside the Knox Gateway, between the REST client and the Hadoop service: Protocol Listener: listens for requests on the appropriate protocols (e.g. HTTP/HTTPS). Service Selector: selects the appropriate service filter chain based on request URL mapping rules. Service-specific filter chain, including: AuthN Filter: challenges the client for credentials and authenticates, or validates an SSO token (against the enterprise identity provider or an enterprise/cloud SSO provider); Identity Asserter Filter: enforces propagation of the authenticated identity to Hadoop by modifying the request; Rewrite Filter: translates URLs in request and response between external and internal URLs based on service-specific rules; Dispatch: streams request and response to and from the Hadoop service based on rewritten URLs. Service filter chains are composed and configured at deployment time by service-specific plugins.
  • 21. Knox Gateway in action: submit MR job via Knox
  • 22. HDFS & MR Operations with Knox
      • Create a few directories
        curl -iku guest:guest-password -X PUT 'https://localhost:8443/gateway/sandbox/webhdfs/v1/user/guest/test?op=MKDIRS&permission=777'
        curl -iku guest:guest-password -X PUT "https://localhost:8443/gateway/sandbox/webhdfs/v1/user/guest/test/input?op=MKDIRS&permission=777"
        curl -iku guest:guest-password -X PUT "https://localhost:8443/gateway/sandbox/webhdfs/v1/user/guest/test/lib?op=MKDIRS&permission=777"
      • Upload files
        curl -iku guest:guest-password -L -T samples/hadoop-examples.jar -X PUT "https://localhost:8443/gateway/sandbox/webhdfs/v1/user/guest/test/lib/hadoop-examples.jar?op=CREATE"
        curl -iku guest:guest-password -L -T README -X PUT "https://localhost:8443/gateway/sandbox/webhdfs/v1/user/guest/test/input/README?op=CREATE"
      • Run MR job
        curl -iku guest:guest-password -X POST -d arg=/user/guest/test/input -d arg=/user/guest/test/output -d jar=/user/guest/test/lib/hadoop-examples.jar -d class=org.apache.hadoop.examples.WordCount "https://localhost:8443/gateway/sandbox/templeton/v1/mapreduce/jar"
      • Query the jobs for a user
        curl -iku guest:guest-password "https://localhost:8443/gateway/sandbox/templeton/v1/queue"
      • Query the status of a given job
        curl -iku guest:guest-password "https://localhost:8443/gateway/sandbox/templeton/v1/queue/<job_id>"
      • Read the output file
        curl -iku guest:guest-password -L -X GET "https://localhost:8443/gateway/sandbox/webhdfs/v1/user/guest/test/output/part-r-00000?op=OPEN"
      • Remove a directory
        curl -iku guest:guest-password -X DELETE "https://localhost:8443/gateway/sandbox/webhdfs/v1/user/guest/test?op=DELETE&recursive=true"
  • 23. How to get involved. Security Labs: http://hortonworks.com/labs/security/ Security Blogs: http://hortonworks.com/blog/category/innovation/security/ Apache Knox Tutorial: http://hortonworks.com/hadoop-tutorial/securing-hadoop-infrastructure-apache-knox/ Need help? http://hortonworks.com/community/forums/forum/security/ or vshukla@hortonworks.com
  • 24. Thank you! Amsterdam, April 3rd, 2014. Vinay Shukla, Twitter: @NeoMythos
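
The WebHDFS calls on slide 22 all share one URL layout: gateway base, topology name, service path, then the operation as a query string. The following is a minimal shell sketch of that layout, not anything shipped with Knox. The helper names (knox_webhdfs_url, knox_mkdirs) and the KNOX_* variables are hypothetical; the host, port, topology and guest credentials are the sandbox values shown on the slide.

```shell
#!/bin/sh
# Hypothetical helpers; only the URL layout and the curl flags come from slide 22.
KNOX_BASE="${KNOX_BASE:-https://localhost:8443/gateway}"   # gateway address used on the slide
KNOX_TOPOLOGY="${KNOX_TOPOLOGY:-sandbox}"                   # topology name used on the slide

# Build a WebHDFS URL behind Knox:
#   <base>/<topology>/webhdfs/v1<path>?op=<OP>[&<extra query parameters>]
knox_webhdfs_url() {
  path="$1"; op="$2"; extra="${3:-}"
  url="${KNOX_BASE}/${KNOX_TOPOLOGY}/webhdfs/v1${path}?op=${op}"
  if [ -n "$extra" ]; then
    url="${url}&${extra}"
  fi
  printf '%s\n' "$url"
}

# Create a directory through the gateway (needs a running Knox instance).
# -k skips TLS certificate verification; that is only reasonable against a
# sandbox with a self-signed certificate, as in the slide's demo.
knox_mkdirs() {
  curl -iku "${KNOX_USER:-guest}:${KNOX_PASSWORD:-guest-password}" -X PUT \
    "$(knox_webhdfs_url "$1" MKDIRS "permission=${2:-777}")"
}
```

With the defaults above, `knox_webhdfs_url /user/guest/test MKDIRS permission=777` prints the same URL as the first command on slide 22. Building the URL in one place keeps the topology name and gateway address out of every individual call.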

Editor's Notes

  1. Background: a Hortonworks-led initiative. Useful for connecting to Hadoop from outside the cluster, and when more client language flexibility is required (i.e. a Java binding is not an option). Not intended for RPC calls. Call it a REST API gateway for Hadoop. Don’t call it a firewall: firewalls are at the network layer. Don’t call it perimeter security: perimeter security is getting discredited as an incomplete security solution.
  2. Note that the arrows to the Hadoop cluster are simplifications. In reality there will be multiple arrows, one per port open between Knox and the Hadoop services it supports (WebHDFS, WebHCAT, HiveServer2, HBase, Oozie), with more in the future.
  3. Functions as an HTTP reverse proxy. Rewrites URLs to protect the internal network topology. Knox Gateway embeds a Jetty container. Reads/writes HTTP.
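
The URL rewriting mentioned in note 3 can be pictured as a prefix swap in both directions: inbound requests lose the external gateway prefix, and internal URLs appearing in responses get it back, so clients never see internal host names. The sketch below only illustrates that idea in shell. It is not Knox's implementation, whose rewrite rules are service-specific; the function names and the internal address namenode.internal:50070 are assumptions made for the example.

```shell
#!/bin/sh
# Illustrative only: a plain prefix swap standing in for Knox's
# service-specific rewrite rules.
EXTERNAL_PREFIX="https://localhost:8443/gateway/sandbox"   # external form, as on slide 22
INTERNAL_PREFIX="http://namenode.internal:50070"           # hypothetical internal NameNode address

# Inbound: external gateway URL -> internal service URL.
rewrite_inbound() {
  case "$1" in
    "$EXTERNAL_PREFIX"/*)
      rest=${1#"$EXTERNAL_PREFIX"}          # strip the external prefix
      printf '%s\n' "${INTERNAL_PREFIX}${rest}"
      ;;
    *)
      return 1                              # not a URL this gateway serves
      ;;
  esac
}

# Outbound: internal URL found in a response -> external gateway URL.
rewrite_outbound() {
  case "$1" in
    "$INTERNAL_PREFIX"/*)
      rest=${1#"$INTERNAL_PREFIX"}          # strip the internal prefix
      printf '%s\n' "${EXTERNAL_PREFIX}${rest}"
      ;;
    *)
      printf '%s\n' "$1"                    # leave unrelated URLs untouched
      ;;
  esac
}
```

For example, `rewrite_inbound 'https://localhost:8443/gateway/sandbox/webhdfs/v1/user/guest?op=LISTSTATUS'` prints `http://namenode.internal:50070/webhdfs/v1/user/guest?op=LISTSTATUS`, and `rewrite_outbound` maps that URL back to its external form.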