HADOOP SECURITY FEATURES
That make your risk officer happy
By Anurag Shrivastava, ING Commercial Bank, Amsterdam
@shri2201
Security for Hadoop
Source: http://blogs.gartner.com/merv-adrian/2014/01/21/security-for-hadoop-dont-look-now/
Hadoop Security Features 2
Hadoop in Enterprise
Data Lake: an important information asset for the enterprise
• Data from systems of record and logs is stored in Hadoop
• Significant cost savings for the enterprise
• Diverse types of users
Picture Source: http://arunkottolli.blogspot.nl/2014/03/understanding-data-in-big-data.html
Operational Security in Enterprise
• User Access Management
• Security Event Monitoring
• Application State Monitoring
• Security Testing
• Patch Management
• Data Protection
• Backup and restore
User Access Management
Requirements
• Privileged, group and generic accounts
• Separation of technical and business users
• Separation of environments (DTAP)
• Separation of admins and other users
• Separation of users in different business roles
• Application of the four-eyes principle when entering or changing data
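The four-eyes requirement above can be sketched in a few lines of Python. This is only an illustration: the `ChangeRequest` class and its methods are hypothetical names, not part of Hadoop or any vendor tool.

```python
from dataclasses import dataclass, field


@dataclass
class ChangeRequest:
    """A pending data change that needs a second person's approval (hypothetical model)."""
    author: str
    description: str
    approvers: set = field(default_factory=set)

    def approve(self, user: str) -> None:
        # The author may never approve their own change.
        if user == self.author:
            raise PermissionError("author cannot approve own change")
        self.approvers.add(user)

    def can_apply(self) -> bool:
        # At least one approver other than the author is required.
        return len(self.approvers) >= 1
```

A change entered by one user stays blocked until a different user approves it.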
Security Event Monitoring
• Definition of application-specific events
  • All login attempts, failed or successful
  • Unauthorized attempts to access a table or file
• Operational performance of the application
  • NameNode performance
  • CPU, disk
• Integration with the Master Control Room
• Alerting the asset manager
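As an illustration of monitoring unauthorized access attempts, the sketch below filters denied operations out of HDFS audit log lines. The key=value line layout shown in the test data is an assumption about the audit log format and should be checked against the actual cluster; the function names are hypothetical.

```python
import re

# Assumed layout: space-separated key=value pairs such as
# "allowed=false ugi=bob ip=/10.0.0.7 cmd=open src=/data/file dst=null perm=null".
AUDIT_FIELD = re.compile(r"(\w+)=(\S+)")


def parse_audit_line(line: str) -> dict:
    """Collect every key=value pair found on one audit log line."""
    return dict(AUDIT_FIELD.findall(line))


def denied_events(lines):
    """Yield (user, command, path) for each entry marked allowed=false."""
    for line in lines:
        fields = parse_audit_line(line)
        if fields.get("allowed") == "false":
            yield fields.get("ugi"), fields.get("cmd"), fields.get("src")
```

The resulting tuples could then be forwarded to whatever alerting channel the control room uses.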
Data Protection (1/2)
• Confidentiality
  • Protect information from unauthorized disclosure
• Integrity
  • Ensure the accuracy, completeness and timeliness of information and prevent data tampering
• Availability
  • Ensure that information and services are available when required
Picture Source: http://www.attix5.co.uk/thought-leadership/why-data-protection-software-essential-good-nights-sleep
Data Protection (2/2)
• Confidentiality
  • Logon
  • Access Control
  • Malicious code protection
  • Security Event Monitoring
  • Encryption
• Integrity
  • Message authentication code
  • Data Lineage
Picture Source: http://www.attix5.co.uk/thought-leadership/why-data-protection-software-essential-good-nights-sleep
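The message authentication code listed under Integrity can be illustrated with Python's standard `hmac` module. This is a minimal sketch, assuming HMAC-SHA256 and a shared secret key; the slides do not say which MAC algorithm is meant, and the helper names `sign` and `verify` are hypothetical.

```python
import hashlib
import hmac


def sign(key: bytes, message: bytes) -> str:
    """Return a hex HMAC-SHA256 tag for the message."""
    return hmac.new(key, message, hashlib.sha256).hexdigest()


def verify(key: bytes, message: bytes, tag: str) -> bool:
    """Check the tag in constant time; any change to the message invalidates it."""
    return hmac.compare_digest(sign(key, message), tag)
```

A record stored together with its tag can later be checked for tampering by anyone who holds the key.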
Security under spotlight in Data Lake
• All kinds of enterprise data: structured, semi-structured and unstructured
• Many groups of users: data scientists, analysts, engineers, marketers, managers
• Long-term retention of data
• Different types of workloads
• The value of data grows as data from different sources is combined in the Data Lake
Picture source: http://beyondplm.com/2014/05/05/plm-downstream-usage-and-future-information-rivers/
Data Lake Risks
• The Data Lake is an attractive target for inside and outside attackers
• A security compromise in the Data Lake can have a major or catastrophic business impact
The IT risk assessment gives a Hadoop implementation the highest risk rating for the Data Lake use case.
Lab-Like Security Is Not Enough
[Diagram labels: Play Area; Big Data Predictive Analytics Lab; Production System]
Predictive Analytics Lab
[Diagram labels: Stepping Stone (Citrix); 18 x Hadoop Nodes; GIT, Libraries, Build Tools; Monitoring Services; Data Files in Batches; Dedicated VLAN; Shared Services; SMTP Relay; Internet via Corporate Infrastructure]
Firewall rules guard the perimeter security of the Hadoop cluster.
Lab-like security works for a small group of people.
Limitations of Hadoop
• No “Data at Rest” Encryption
• A Kerberos-Centric Approach
• Limited Authorization Capabilities
• Complexity of the Security Model and Configuration
Unfortunately, this is not sufficient for a Data Lake that ingests all the data and caters to thousands of users.
Hadoop Security
Hadoop Security Solutions from Major Vendors
• Hortonworks acquires XASecure to bring ACLs to Hadoop
  • Apache Ranger
  • Apache Knox
  • Apache Falcon
• Cloudera is working on Project Rhino
  • Apache Sentry
HDP-Apache Ranger
Apache Ranger
Apache Ranger currently supports authorization, auditing and security administration for a limited number of HDP components:
• Hive
• HBase
• Storm
• Knox
• HDFS
Apache Ranger Goals
1. Centralized security administration: manage all security-related tasks in a central UI or through REST APIs.
2. Fine-grained authorization: permit a specific action or operation on a Hadoop component or tool, managed through a central administration tool.
3. A standardized authorization method across all Hadoop components.
4. Enhanced support for different authorization methods, such as role-based access control and attribute-based access control.
5. Centralized auditing of user access and security-related administrative actions within all the components of Hadoop.
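Goals 2 and 5 (fine-grained authorization and centralized auditing) can be pictured with the toy model below. It is a sketch only: `Policy` and `is_allowed` are hypothetical names for illustration and do not reflect Apache Ranger's actual policy model or API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Policy:
    """Hypothetical policy: which roles may perform which actions under a resource prefix."""
    resource_prefix: str
    roles: frozenset
    actions: frozenset


def is_allowed(policies, user_roles, resource, action, audit_log):
    """Decide access from the central policy list and record every decision."""
    allowed = any(
        resource.startswith(p.resource_prefix)
        and action in p.actions
        and bool(p.roles & set(user_roles))
        for p in policies
    )
    # Both grants and denials are logged so that auditing stays in one place.
    audit_log.append((resource, action, tuple(sorted(user_roles)), allowed))
    return allowed
```

Keeping the decision and the audit record in one function mirrors the idea of a single place where policies are evaluated and logged.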
Apache Knox and Hadoop Services
Hadoop Services Covered
• WebHDFS (HDFS)
• Templeton (HCatalog)
• Stargate (HBase)
• Oozie
• Hive/JDBC
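To show what going through the gateway means for a client, the sketch below builds a WebHDFS URL that points at Knox instead of the NameNode. The URL layout (`/gateway/<cluster>/webhdfs/v1/...`), the port 8443, and the host and cluster names are assumptions for illustration and must be checked against the actual Knox topology; `knox_webhdfs_url` is a hypothetical helper.

```python
from urllib.parse import quote, urlencode


def knox_webhdfs_url(host, cluster, hdfs_path, op, port=8443, gateway_path="gateway"):
    """Build a WebHDFS request URL routed through a Knox gateway (assumed layout)."""
    path = quote(hdfs_path.lstrip("/"))  # keep "/" separators, escape other characters
    query = urlencode({"op": op})
    return f"https://{host}:{port}/{gateway_path}/{cluster}/webhdfs/v1/{path}?{query}"
```

With this pattern the client only needs to reach one HTTPS endpoint, and the cluster's internal hosts stay behind the gateway.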
Apache Falcon
• Visualize data pipeline lineage
• Track data pipeline audit logs
• End-to-end monitoring of data pipelines
• Policies for data replication and retention
Apache Sentry and Project Rhino
Goals of Project Rhino
• Provide encryption with hardware-enhanced performance
• Support enterprise-grade authentication and single sign-on for Hadoop services
• Provide role-based access control in Hadoop with cell-level granularity in HBase
• Ensure consistent auditing across essential Apache Hadoop components
Apache Sentry and Project Rhino
Making Risk Officer Happy
• Hadoop security has more to offer
  • Role-based access
  • Audit logging
  • Data encryption
• User Access Management
• Security Event Monitoring
• Application State Monitoring
• Security Testing
• Patch Management
• Data Protection
• Backup and restore
Overlapping efforts of vendors, lack of complete coverage for all products, and varying commitment to open source would slow down the adoption of Hadoop.
THANK YOU
Anurag Shrivastava
@shri2201

Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project Setup
 
Search Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdfSearch Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdf
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platforms
 
DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache Maven
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding Club
 
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
 
Powerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time ClashPowerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time Clash
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio Web
 

Hadoop Security Features that make your risk officer happy

  • 1. HADOOP SECURITY FEATURES That make your risk officer happy By Anurag Shrivastava, ING Commercial Bank, Amsterdam @shri2201
  • 2. Security for Hadoop Source: http://blogs.gartner.com/merv-adrian/2014/01/21/security-for-hadoop-dont-look-now/ Hadoop Security Features 2
  • 3. Hadoop in Enterprise • Data Lake – an important information asset for the enterprise • Data from systems of record and logs is stored in Hadoop • Significant cost savings for the enterprise • Diverse types of users Picture Source: http://arunkottolli.blogspot.nl/2014/03/understanding-data-in-big-data.html
  • 4. Operational Security in Enterprise • User Access Management • Security Event Monitoring • Application State Monitoring • Security Testing • Patch Management • Data Protection • Backup and restore
  • 5. User Access Management Requirements • Privileged, group and generic accounts • Separation of technical and business users • Separation of environments (DTAP) • Separation of admins and other users • Separation of users in different business roles • Application of the four-eyes principle when entering or changing data
  • 6. Security Event Monitoring • Definition of application-specific events • All login attempts, failed or successful • Unauthorized attempts to access a table or file • Operational performance of the application • Name node performance • CPU, Disk • Integration with Master Control Room • Alerting the asset manager
  • 7. Data Protection (1/2) • Confidentiality • Protect information from unauthorized disclosure • Integrity • Ensure the accuracy, completeness and timeliness of information and prevent data tampering • Availability • Ensure that information and services are available when required Picture Source: http://www.attix5.co.uk/thought-leadership/why-data-protection-software-essential-good-nights-sleep
  • 8. Data Protection (2/2) • Confidentiality • Logon • Access Control • Malicious code protection • Security Event Monitoring • Encryption • Integrity • Message authentication code • Data Lineage Picture Source: http://www.attix5.co.uk/thought-leadership/why-data-protection-software-essential-good-nights-sleep
  • 9. Security under spotlight in Data Lake • All kinds of enterprise data – structured, semi-structured and unstructured • Many groups of users – Data Scientists, Analysts, Engineers, Marketers, Managers • Long term retention of data • Different types of workloads • Value of data grows as the data from different sources are combined in Data Lake Picture source: http://beyondplm.com/2014/05/05/plm-downstream-usage-and-future-information-rivers/
  • 10. Data Lake Risks • Data Lake is an attractive target for inside and outside attackers • A security compromise in the Data Lake can have major or catastrophic business impact • IT risk assessment gives the Hadoop implementation the highest risk rating for the Data Lake use case.
  • 11. Lab Like Security is not Enough Play Area Big Data Predictive Analytics Lab Production System
  • 12. Predictive Analytics Lab • Stepping Stone (Citrix) • 18 x Hadoop Nodes • GIT, Libraries, Build Tools • Monitoring Services • Data Files in Batches • Dedicated VLAN • Shared Services • SMTP Relay • Internet via Corporate Infrastructure • Firewall rules guard the perimeter security of the Hadoop cluster • Lab-like security works for a small group of people
  • 13. Limitations of Hadoop • No “Data at Rest” Encryption • A Kerberos-Centric Approach • Limited Authorization Capabilities • Complexity of the Security Model and Configuration • Unfortunately this is not sufficient for a Data Lake that ingests all the data and caters to thousands of users.
  • 14. Hadoop Security Solutions from Major Vendors • Hortonworks acquires XASecure to bring ACLs to Hadoop: Apache Ranger, Apache Knox, Apache Falcon • Cloudera is working on Project Rhino: Project Rhino, Apache Sentry
  • 16. Apache Ranger • Apache Ranger currently supports authorization, auditing and security administration of a limited number of HDP components: Hive, HBase, Storm, Knox, HDFS
  • 17. Apache Ranger Goals 1. Centralized security administration to manage all security-related tasks in a central UI or using REST APIs. 2. Fine-grained authorization to perform a specific action and/or operation with a Hadoop component/tool, managed through a central administration tool. 3. Standardize the authorization method across all Hadoop components. 4. Enhanced support for different authorization methods: role-based access control, attribute-based access control etc. 5. Centralize auditing of user access and (security-related) administrative actions within all the components of Hadoop.
  • 18. Apache Knox and Hadoop Services • Hadoop Services Covered • WebHDFS (HDFS) • Templeton (HCatalog) • Stargate (HBase) • Oozie • Hive/JDBC
  • 19. Apache Falcon • Visualize Data Pipeline Lineage • Track Data Pipeline audit logs • End to End Monitoring of Data Pipeline • Policies for Data Replication and Retention
  • 20. Apache Sentry and Project Rhino
  • 21. Goals of Project Rhino • Provide encryption with hardware-enhanced performance • Support enterprise-grade authentication and single sign-on for Hadoop services • Provide role-based access control in Hadoop with cell-level granularity in HBase • Ensure consistent auditing across essential Apache Hadoop components
  • 22. Apache Sentry and Project Rhino
  • 23. Making Risk Officer Happy • Hadoop security has more to offer • Role based access • Audit logging • Data encryption • User Access Management • Security Event Monitoring • Application State Monitoring • Security Testing • Patch Management • Data Protection • Backup and restore • Overlapping efforts of vendors, lack of complete coverage for all products and varying commitment to open source would slow down the adoption of Hadoop.
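
The slides name role-based access and audit logging as the features that matter most for user access management and security event monitoring. A minimal sketch of how the two fit together is given below. It is not the API of Apache Ranger or Apache Sentry: the names Policy, PolicyStore and is_allowed, and the resource string format, are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass(frozen=True)
class Policy:
    """Hypothetical policy: which roles may perform which actions on a resource."""
    resource: str          # e.g. a Hive table or an HDFS path (format is assumed)
    roles: frozenset       # roles covered by this policy
    actions: frozenset     # actions those roles may perform


@dataclass
class PolicyStore:
    """One central place for policies and the audit trail (illustrative only)."""
    policies: list = field(default_factory=list)
    audit_log: list = field(default_factory=list)

    def is_allowed(self, user, user_roles, resource, action):
        # Allow only if some policy for this resource names both the requested
        # action and at least one of the user's roles; otherwise deny.
        allowed = any(
            p.resource == resource
            and action in p.actions
            and p.roles & set(user_roles)
            for p in self.policies
        )
        # Record every decision, allowed or denied, so that unauthorized
        # attempts are available to security event monitoring.
        self.audit_log.append({
            "time": datetime.now(timezone.utc).isoformat(),
            "user": user,
            "resource": resource,
            "action": action,
            "allowed": allowed,
        })
        return allowed
```

Keeping the decision and the audit record in one function means a denied request cannot go unlogged, which is the property a risk officer would ask about first.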

Editor's Notes

  1. Ask a question about the biggest data security breaches. Target: 40 million debit/credit card numbers stolen. Sony Online: 102 million records. Home Depot: 56 million payment cards. Hadoop Security was completely ineffective. APT is real.
  2. We are a bunch of people who get very excited about the technology when we hear about Hadoop. However, when it comes to security, it seems that nobody is bothered about it except the risk officer. This creates some tension between IT, business and risk. Technology has not kept up with marketing.
  3. All the sweet marketing and enterprise sales guys sell Hadoop as the right system for the enterprise. Hadoop becomes an important information system asset in the enterprise. Enterprises find Hadoop attractive because of its lower cost. Hadoop analytics is not limited to web logs alone but also covers data stored in systems of record. Hadoop caters to a diverse group of business and technical users. I see a paradox here: a system for the enterprise where CIOs do not bother about security.
  4. State monitoring is about monitoring the application settings. Security testing involves static and dynamic code scans. Patch management requires that patch history is maintained, that systems are tested after patching, and that a decision is made on which patch is appropriate for the system. Backup and restore covers backup frequency, logging of restore activity, detection of incomplete backups, and safe storage of backups as per the CIA rating.
  5. Typical requirements of user access management are explained. Role based access.
  6. You can use several techniques to convince your risk officer about data protection. However, as you bring all the data into the data lake, you have to take all the measures.
  7. A very important Hadoop use case (Data Lake) puts the Hadoop security story to a hard test. Multitenancy. A beautiful house without door locks.
  8. Multi-tenancy, workload segregation. User separation. A sanitized Hadoop cluster does not work.
  9. Peripheral security with a stepping stone has its limitations. We had to implement two-factor authentication and put the Hadoop team in a sanitized area. Hadoop provides an all-or-nothing model for security. We relied heavily upon file system security.
  10. (1) No “Data at Rest” Encryption. Currently, data is not encrypted at rest on HDFS. Organizations with strict security requirements for encrypting their data in Hadoop clusters are forced to use third-party tools for HDFS disk-level encryption, or security-enhanced Hadoop distributions (like Intel’s distribution from earlier this year). (2) A Kerberos-Centric Approach. Hadoop security relies on Kerberos for authentication. For organizations using other approaches not involving Kerberos, this means setting up a separate authentication system in the enterprise. (3) Limited Authorization Capabilities. Although Hadoop can be configured to perform authorization based on user and group permissions and Access Control Lists (ACLs), this may not be enough for every organization. Many organizations use flexible and dynamic access control policies based on XACML and Attribute-Based Access Control. Although it is certainly possible to perform this level of authorization filtering using Accumulo, Hadoop’s authorization credentials are limited. (4) Complexity of the Security Model and Configuration. There are a number of data flows involved in Hadoop authentication: Kerberos RPC authentication for applications and Hadoop services, HTTP SPNEGO authentication for web consoles, and the use of delegation tokens, block tokens, and job tokens. For network encryption, there are also three encryption mechanisms that must be configured: Quality of Protection for SASL mechanisms, SSL for web consoles, and HDFS Data Transfer Encryption. All of these settings need to be configured separately, and it is easy to make mistakes. As the Wall Street Journal reported, Bank of New York Mellon Corp.’s Hadoop system bogged down after too many employees accessed it. Ms. Crisp is hedging her bets by maintaining the bank’s commercial database and data warehouse software.
  11. How Hadoop leaders have responded to these challenges, in addition to several proprietary initiatives which are not covered here.
  12. HDP 2.2 brings a major change in Hadoop security. The acquisition of XASecure has been significant in terms of user access management. Role-based access for several components. Logging. Single console. Not a single point of failure.
  13. Apache Ranger is very promising from the user access management and security event monitoring perspectives, but not all the Hadoop components are covered. Most security is geared toward the consumers of data.
  14. Goals 4 and 5 are very promising features.
  15. The following Hadoop services have integrations with the Knox Gateway: WebHDFS (HDFS), Templeton (HCatalog), Stargate (HBase), Oozie, Hive/JDBC.
  16. Sentry: unified authorization and RBAC. Overlap with Ranger. Secure authorization. Limited coverage: Hive and Impala. Pluggable interfaces, binding with Pig. Cloudera CDH 4.3.
  17. The open source commitment of Cloudera is a big question mark. DG Secure: alternative for HDP. Key distribution and management is included. Snapshots, logs etc. can be encrypted. Crypto codecs. Integration with the PKI infrastructure in a large enterprise is a challenge.
  18. Compared to the previous year, Hadoop security has a lot more to offer, but it is still far from being a complete system suited for Data Lake use cases. You have to mix and match the components, which is hard. Ranger is strong in user access management and security monitoring. Rhino is strong in data protection. Hadoop is ready for the enterprise, but we are still working on readiness.
  19. You can’t make the risk officer very happy. All kinds of reasons are given for not building the security: performance, architecture, "you did not need it before". Time to improve it.
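
Note 10 points out that authentication, SASL quality of protection, SSL for web consoles and HDFS data transfer encryption are all configured separately, and that mistakes are easy to make. One way to catch such mistakes is to compare the cluster's site files against a baseline, as sketched below. The property names are standard Hadoop settings, but the baseline values are an assumption made for illustration, not a recommendation from the talk, and the helper names load_properties and find_gaps are hypothetical. Values are compared literally, so a comma-separated hadoop.rpc.protection list would be reported as a gap.

```python
import xml.etree.ElementTree as ET

# Assumed hardened baseline (illustrative values, adjust to your own policy).
EXPECTED = {
    "hadoop.security.authentication": "kerberos",  # Kerberos instead of "simple"
    "hadoop.security.authorization": "true",       # service-level authorization
    "hadoop.rpc.protection": "privacy",            # SASL quality of protection
    "dfs.encrypt.data.transfer": "true",           # HDFS data transfer encryption
    "dfs.http.policy": "HTTPS_ONLY",               # SSL for HDFS web endpoints
}


def load_properties(*xml_texts):
    """Merge <property> name/value pairs from the text of Hadoop *-site.xml files."""
    props = {}
    for text in xml_texts:
        root = ET.fromstring(text)
        for prop in root.iter("property"):
            name = prop.findtext("name")
            value = prop.findtext("value")
            if name is not None and value is not None:
                props[name.strip()] = value.strip()
    return props


def find_gaps(props, expected=EXPECTED):
    """Return {name: (actual, wanted)} for settings that are missing or differ."""
    gaps = {}
    for name, wanted in expected.items():
        actual = props.get(name)
        if actual is None or actual.lower() != wanted.lower():
            gaps[name] = (actual, wanted)
    return gaps
```

In use, the contents of core-site.xml and hdfs-site.xml would be read from disk and passed to load_properties, and any entries returned by find_gaps reviewed before the cluster is opened to more users.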