
Hadoop security

This slide deck talks about Hadoop Security within the Hortonworks HDP and PHD 3.0 Platform


  1. Hadoop Security with HDP/PHD (© Hortonworks Inc. 2011 – 2014. All Rights Reserved)
  2. Disclaimer: This document may contain product features and technology directions that are under development or may be under development in the future. Technical feasibility, market demand, user feedback, and the Apache Software Foundation community development process can all affect timing and final delivery. This document's description of these features and technology directions does not represent a contractual commitment from Hortonworks to deliver these features in any generally available product. Product features and technology directions are subject to change, and must not be included in contracts, purchase orders, or sales agreements of any kind.
  3. Agenda • Hadoop Security • Kerberos • Authorization and Auditing with Ranger • Gateway Security with Knox • Encryption
  4. Security today in Hadoop with HDP/PHD. Authentication (who am I / prove it?): Kerberos; API security with Apache Knox. Authorization (what can I do?): fine-grained access control with Apache Ranger. Audit (what did I do?): centralized audit reporting with Apache Ranger. Data protection (can data be encrypted at rest and over the wire?): wire encryption in Hadoop; native and partner encryption. All under centralized security administration as an enterprise service.
  5. Security needs are changing. Administration: central and consistent security management. Authentication: authenticate users and systems. Authorization: provision access to data. Audit: maintain a record of data access. Data protection: protect data at rest and in motion. Why the change: YARN unlocks the data lake; clusters are multi-tenant, with multiple applications accessing different kinds of data; the compliance environment is changing and complex. Fall 2013: largely siloed deployments with single-workload clusters. 2014: 65% of clusters host multiple workloads.
  6. Typical flow: Hive access through the Beeline client (Beeline client to HiveServer2 to HDFS).
  7. Typical flow: authenticate through Kerberos. The client requests a TGT from the KDC, receives it, and decrypts it with the password hash; it then presents the TGT to obtain a Hive service ticket. Beeline uses the Hive service ticket to submit the query; Hive gets a NameNode (NN) service ticket and creates MapReduce jobs using the NN service ticket.
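The Kerberos flow above can be exercised from the command line. This is a minimal sketch; the realm, host name, and user principal are illustrative assumptions, not values from the deck:

```shell
# Illustrative realm and host name (assumptions, not from the deck)
REALM="EXAMPLE.COM"
HS2_HOST="hiveserver.example.com"
# HiveServer2 principal; the client expands _HOST to the server's FQDN
JDBC_URL="jdbc:hive2://${HS2_HOST}:10000/default;principal=hive/_HOST@${REALM}"

# 1. Obtain a TGT from the KDC (prompts for the user's password):
#    kinit alice@${REALM}
# 2. Inspect the ticket cache:
#    klist
# 3. Beeline uses the TGT to fetch a Hive service ticket and submit the query:
#    beeline -u "${JDBC_URL}" -e "SHOW TABLES;"
echo "${JDBC_URL}"
```

Without a valid TGT, the beeline connection fails at the SASL/GSSAPI handshake, which is the Kerberos exchange the diagram describes.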
  8. Typical flow: add authorization through Ranger (XA Secure). The client gets a service ticket for Hive and uses it to submit the query; Hive gets a NameNode (NN) service ticket and creates MapReduce jobs using the NN service ticket; Ranger authorizes the request at HiveServer2.
  9. Typical flow: firewall, route through the Knox Gateway. The client sends the original request with user id/password to Apache Knox; Knox gets a service ticket for Hive and runs as a proxy user using the Hive service ticket; Hive gets a NameNode (NN) service ticket and creates MapReduce jobs using it; the client receives the query result back through Knox.
  10. Typical flow: add wire and file encryption. Same flow as above, with SSL between the Beeline client and Knox, SSL from Knox to HiveServer2 and the other services, and SASL on the Hive-to-HDFS hop.
  11. Security features in PHD/HDP. Authentication: Kerberos support ✔; perimeter security for services and REST APIs ✔. Authorization: fine-grained access control for HDFS, HBase, Hive, Storm and Knox; role-based access control ✔; column-level control ✔; permission support (create, drop, index, lock, user). Auditing: extensive resource access auditing; policy auditing ✔.
  12. HDP/PHD security with Ranger (continued). Data protection: wire encryption ✔; volume encryption (TDE); file/column encryption (HDFS TDE and partners). Reporting: global view of policies and audit data ✔. Administration: manage user/group mapping ✔; global policy manager with web UI ✔; delegated administration ✔.
  13. Partner integration. Security integrations: Ranger plugins centralize authorization and audit of third-party software in the Ranger UI; a custom Log4j appender can stream audit events to INFA infrastructure; partner APIs can be routed through Knox after validating compatibility; SSO capability is provided to end users.
  14. Authentication with Kerberos
  15. Kerberos in the field. Kerberos is no longer "too complex"; adoption is growing, and Ambari helps automate and manage Kerberos integration with the cluster. Use Active Directory or a combined MIT Kerberos/Active Directory setup: Active Directory is seen most commonly in the field, and many start with a separate MIT KDC and later grow into the AD KDC. Knox should be considered for API/perimeter security: it removes the need for Kerberos for end users, enables integration with different authentication standards, and gives a single location to manage security for REST APIs and HTTP-based services. Tip: place Knox in the DMZ.
  16. Authorization and Auditing with Apache Ranger
  17. Authorization and audit. Authorization: fine-grained access control over HDFS (folder, file), Hive (database, table, column), HBase (table, column family, column), Storm, Knox and more; controls access into the system with flexibility in defining policies. Audit: extensive user access auditing in HDFS, Hive and HBase, recording IP address, resource type/resource, timestamp, and whether access was granted or denied.
  18. Central security administration. Apache Ranger delivers a "single pane of glass" for the security administrator, centralizes administration of security policy, and ensures consistent coverage across the entire Hadoop stack.
  19. Set up authorization policies: file-level access control with flexible policy definition; control permissions.
  20. Monitor through auditing
  21. Apache Ranger flow
  22. Authorization and auditing with Ranger: architecture. Enterprise users work in the Ranger Administration Portal, which is backed by the Ranger Policy Server and Ranger Audit Server (RDBMS) and exposes an integration API for legacy tools and data governance. Ranger plugins run inside the Hadoop components themselves: HDFS, HBase, HiveServer2, and (added in HDP 2.2) Knox and Storm, with more planned for 2015.
  23. Installation steps • Install PHD 3.0 • Install Apache Ranger (https://tinyurl.com/mlgs3jy): install the Policy Manager, the User Sync, and the Ranger plugins • Start the Policy Manager: service ranger-admin start • Verify at http://<host>:6080/ (default login admin/admin)
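The start-and-verify step can be scripted; the host name below is an illustrative assumption, while port 6080 and the admin/admin default come from the slide:

```shell
# Start the Ranger policy manager, then check that the admin UI answers.
# Host name is illustrative; 6080 is the default Ranger admin port.
RANGER_URL="http://ranger-host.example.com:6080"
# service ranger-admin start
# curl -s -o /dev/null -w '%{http_code}\n' "${RANGER_URL}"   # expect 200
echo "${RANGER_URL}"
```

The default admin/admin credentials should be changed immediately after the first login.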
  24. Ranger plugins: HDFS, Hive, Knox, Storm, HBase. Steps to enable a plugin: 1. Start the Policy Manager. 2. Create the plugin repository in the Policy Manager. 3. Install the plugin: edit install.properties, then execute ./enable-<plugin>.sh. 4. Restart the plugin's service (e.g. HDFS, Hive, etc.).
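For the HDFS plugin, steps 3 and 4 might look like the sketch below. The install path and repository name are assumptions, and the exact install.properties keys can vary between plugin versions:

```shell
# Hypothetical plugin directory and repository name (not from the deck)
PLUGIN_DIR="/usr/hdp/current/ranger-hdfs-plugin"
REPO_NAME="hadoopdev"
# Point the plugin at the policy manager and name its repository:
# sed -i "s|^POLICY_MGR_URL=.*|POLICY_MGR_URL=http://ranger-host:6080|" "${PLUGIN_DIR}/install.properties"
# sed -i "s|^REPOSITORY_NAME=.*|REPOSITORY_NAME=${REPO_NAME}|" "${PLUGIN_DIR}/install.properties"
# Execute the enable script, then restart the NameNode so the plugin loads:
# (cd "${PLUGIN_DIR}" && ./enable-hdfs-plugin.sh)
echo "${REPO_NAME}"
```

The repository name in install.properties must match the repository created in the Policy Manager (step 2), or the plugin cannot download its policies.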
  25. Ranger console tabs: Repository Manager, Policy Manager, Users/Groups, Analytics, Audit.
  26. Repository Manager: add a new repository, edit a repository, delete a repository.
  27. Demo
  28. REST API security through Knox: securely share the Hadoop cluster
  29. Share the data lake with everyone, securely. Simplifies access: extends Hadoop's REST/HTTP services while encapsulating Kerberos within the cluster. Enhances security: exposes Hadoop's REST/HTTP services without revealing network details, providing SSL out of the box. Centralized control: enforces REST API security centrally, routing requests to multiple Hadoop clusters. Enterprise integration: supports LDAP, Active Directory, SSO, SAML and other authentication systems.
  30. Apache Knox can be used with both unsecured Hadoop clusters and Kerberos-secured clusters. In an enterprise solution that employs Kerberos-secured clusters, the Apache Knox Gateway provides an enterprise security solution that: integrates well with enterprise identity management solutions; protects the details of the Hadoop cluster deployment (hosts and ports are hidden from end users); reduces the number of services with which a client needs to interact.
  31. Extend Hadoop API reach with Knox. Business users and the application tier (apps A through N) reach the Hadoop cluster through a load balancer and Knox via REST/HTTP and JDBC/ODBC; Hadoop admins and operators use SSH through a bastion node; data ingest and ETL (Falcon, Oozie, Sqoop, Flume) go over RPC calls.
  32. Typical flow, revisited: add wire and file encryption. The client authenticates to Knox with user id/password, Knox proxies to HiveServer2 using a Hive service ticket, and Hive reaches HDFS with a NameNode service ticket; SSL protects the client-to-Knox and Knox-to-service hops, and SASL protects the Hive-to-HDFS hop.
  33. Why Knox? Simplified access: Kerberos encapsulation, extends API reach, single access point, multi-cluster support, single SSL certificate. Centralized control: central REST API auditing, service-level authorization, alternative to an SSH "edge node". Enterprise integration: LDAP, Active Directory, SSO, Apache Shiro extensibility, custom extensibility. Enhanced security: protects network details, SSL for non-SSL services, web application vulnerability filter.
  34. Hadoop REST APIs with Knox. WebHDFS: http://namenode-host:50070/webhdfs direct, https://knox-host:8443/webhdfs via Knox. WebHCat: http://webhcat-host:50111/templeton direct, https://knox-host:8443/templeton via Knox. Oozie: http://oozie-host:11000/oozie direct, https://knox-host:8443/oozie via Knox. HBase: http://hbase-host:60080 direct, https://knox-host:8443/hbase via Knox. Hive: http://hive-host:10001/cliservice direct, https://knox-host:8443/hive via Knox. YARN: http://yarn-host:yarn-port/ws direct, https://knox-host:8443/resourcemanager via Knox. The masters can be on many different hosts; with Knox there is one host, one port, consistent paths, and SSL configured at one host.
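As a concrete example, here is the same WebHDFS directory listing issued directly and through Knox. Host names follow the slide's table; note that a full Knox deployment typically also includes a gateway/topology path segment, which the table omits for brevity:

```shell
# WebHDFS LISTSTATUS on /tmp, direct vs. through Knox (hosts as in the table)
DIRECT="http://namenode-host:50070/webhdfs/v1/tmp?op=LISTSTATUS"
KNOX="https://knox-host:8443/webhdfs/v1/tmp?op=LISTSTATUS"
# Direct: on a secured cluster this needs a Kerberos ticket (SPNEGO):
# curl --negotiate -u : "${DIRECT}"
# Through Knox: HTTP Basic auth against LDAP/AD, SSL terminated at the gateway:
# curl -k -u alice:password "${KNOX}"
echo "${KNOX}"
```

The client-side difference is exactly the deck's point: Kerberos stays behind the gateway, and the caller only needs HTTPS and enterprise credentials.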
  35. Hadoop REST API security: drill-down. A REST client passes through the firewall into the DMZ, where a load balancer fronts the Knox Gateway instances; Knox authenticates users over LDAP against the enterprise identity provider (LDAP/AD), then forwards HTTP through the inner firewall to the masters (RM, NN, WebHCat, Oozie, HS2, HBase) of one or more Hadoop clusters. Edge-node Hadoop CLIs still use RPC directly.
  36. Knox features in PHD • Use Ambari for install/start/stop/configuration • Knox support for HDFS HA • Support for the YARN REST API • Support for SSL to Hadoop cluster services (WebHDFS, HBase, Hive and Oozie) • Integration with Ranger for Knox service-level authorization • Knox management REST API
  37. Installation • Installed via Ambari (it can also be done manually) • Start the embedded LDAP • There are good examples with Groovy scripts in the Apache documentation: https://knox.apache.org/books/knox-0-4-0/knox-0-4-0.html
  38. Data protection: wire and data-at-rest encryption
  39. Data protection. HDP allows you to apply data protection policy at different layers across the Hadoop stack. Storage and access layer: encrypt data while it is at rest (partner products, HDFS TDE tech preview, HBase encryption, OS-level encryption). Transmission layer: encrypt data as it moves (supported from HDP 2.1).
  40. HDFS Transparent Data Encryption (TDE) in 2.2 • Encryption at a higher level than the OS, while remaining native and transparent to Hadoop • End-to-end: data is encrypted and decrypted by the clients, using the usual HDFS functions • No need to change user application code • Data encryption keys are not stored on HDFS itself • Data is never stored unencrypted • Data is effectively encrypted at rest, and since it is decrypted on the client side it is also encrypted on the wire while being transmitted • Encryption and decryption are transparent to the HDFS client: users can read and write files in an encryption zone as long as they have permission to access it • Depends on installing a Key Management Server (KMS)
  42. HDFS Transparent Data Encryption (TDE): steps • Install and run the KMS on top of HDP 2.2 • Change HDFS parameters via Ambari • Create an encryption key: hadoop key create key1 -size 256; hadoop key list -metadata • Create an encryption zone using the key: hdfs dfs -mkdir /zone1; hdfs crypto -createZone -keyName key1 /zone1; hdfs crypto -listZones • See http://hortonworks.com/kb/hdfs-transparent-data-encryption/
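Once the zone exists, using it requires no special commands; this sketch shows the transparent read/write path (the file name is illustrative):

```shell
# Writing into and reading from an encryption zone uses normal HDFS commands;
# the client fetches the encrypted data key from the KMS and decrypts locally.
ZONE="/zone1"
# hdfs dfs -put report.csv ${ZONE}/
# hdfs dfs -cat ${ZONE}/report.csv   # plaintext: decryption happens client-side
# A user without access to key1 in the KMS gets an error even if HDFS
# permissions on ${ZONE} would otherwise allow the read.
echo "${ZONE}"
```

This illustrates the slide's point that access needs both HDFS permission on the zone and KMS access to the zone's key.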
  43. Thank You
