Hadoop Security and Compliance - StampedeCon 2016

  1. Hadoop Security and Compliance
     Derek Sun, Big Data Architect
     StampedeCon 2016, July 27th, 2016
  2. Today’s Discussion
     • Hadoop Security
        • Securing the infrastructure
        • Authentication
        • Authorization
        • Data Security
        • Auditing
     • Governance & Compliance
        • Governance & Metadata management
        • Masking & Redaction
        • Compliant procedures
     ©2016 MasterCard. Proprietary and Confidential
  3. Hadoop Security – Securing the infrastructure
     • On-premises deployment & network segmentation
     • Firewalls, e.g., corporate & host firewalls
     • Intrusion detection & prevention, such as Open Network Insights (ONI)
     • Up-to-date OS security patches, e.g., applied twice a year
     • Hadoop services separation
        1. Core services on master nodes, e.g., YARN, ZK, NameNode …
        2. Worker services on data nodes, e.g., NodeManager, DataNode, HBase RegionServer …
        3. Edge nodes for authorized regular users, e.g., SSH, NFS gateway, Sqoop, HUE, Oozie, Thrift/REST servers, user apps …
  4. Hadoop Security – Authentication
     • Kerberos
        1. What is Kerberos?
        2. User & service principals
           • Every user and service that participates in the Kerberos authentication protocol requires a principal to uniquely identify itself
        3. Realms and cross-realm trust, e.g., a one-way trust from an MIT realm to a Windows AD domain
           • A Kerberos realm is an authentication administrative domain. All principals are assigned to a specific Kerberos realm
        4. MIT KDC vs AD KDC
           I. Locally vs centrally managed principals
           II. Flat vs more structured, e.g., use a sub-OU for each cluster
           III. Maintained by Hadoop admins vs corporate IT
           IV. Potential single point of failure vs a set of AD domain controllers
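To make the principal and realm terminology concrete, here is a minimal Python sketch that splits a principal of the usual `primary/instance@REALM` shape into its parts. The principal names and realms below are made-up examples, and `parse_principal` is a hypothetical helper, not part of any Kerberos library.

```python
from typing import NamedTuple, Optional


class Principal(NamedTuple):
    primary: str             # user name or service name, e.g. "hdfs"
    instance: Optional[str]  # for service principals, usually the host name
    realm: str               # the authentication administrative domain


def parse_principal(text: str) -> Principal:
    """Split 'primary[/instance]@REALM' into its three parts."""
    name, sep, realm = text.rpartition("@")
    if not sep or not name or not realm:
        raise ValueError(f"not a fully qualified principal: {text!r}")
    primary, _, instance = name.partition("/")
    return Principal(primary, instance or None, realm)


# Hypothetical user principal in a corporate AD realm.
user = parse_principal("alice@CORP.EXAMPLE.COM")
# Hypothetical service principal in a cluster-local MIT realm.
service = parse_principal("hdfs/nn1.hadoop.example.com@HADOOP.EXAMPLE.COM")
```

The two sample principals live in different realms, which is the situation a cross-realm trust is meant to handle.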
  5. Hadoop Security – Authorization
     • File-system based: local & HDFS, e.g., putting administrators in dfs.permissions.superusergroup in hdfs-site.xml or through the UIs (CM/Ambari)
     • Service based
        1. Set up ACLs in hadoop-policy.xml for HDFS, MR1, YARN, ZK, Oozie …
        2. Set up permissions at different levels for HBase, e.g., table, column or cell level …
     • Role based: Apache Sentry, Apache Ranger, Apache Knox
        • Make sure to bypass certain service accounts; e.g., for Hive, bypass the Hive, Impala, HDFS, and HUE users
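Service-level ACLs in hadoop-policy.xml are written as a comma-separated user list, a space, then a comma-separated group list, with `*` meaning everyone. The sketch below evaluates such a string in Python. `acl_allows` is a hypothetical helper for illustration only, and the user and group names are invented.

```python
from typing import Set


def acl_allows(acl: str, user: str, groups: Set[str]) -> bool:
    """Evaluate a Hadoop-style ACL string: 'user1,user2 group1,group2'."""
    if acl.strip() == "*":
        return True  # wildcard: everyone is allowed
    # Users come before the first space, groups after it; either may be empty.
    users_part, _, groups_part = acl.partition(" ")
    allowed_users = {u for u in users_part.split(",") if u}
    allowed_groups = {g for g in groups_part.strip().split(",") if g}
    return user in allowed_users or bool(groups & allowed_groups)


# Hypothetical ACL values, as they might appear in a <value> element.
admins_only = " hadoop-admins"        # leading space: no users, one group
named_users = "alice,bob etl-jobs"    # two users plus one group

carol_ok = acl_allows(admins_only, "carol", {"hadoop-admins"})
dave_ok = acl_allows(named_users, "dave", {"analysts"})
```

Granting access through groups rather than named users tends to keep these files stable as people join and leave teams.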
  6. Hadoop Security – Data Security
     • Encryption At-Rest
        • Native HDFS encryption: Encryption Zone, EZK (encryption zone key), DEK (data encryption key) & EDEK (encrypted DEK), Key Management Server (KMS) & Key Storage Server
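The relationship between EZK, DEK and EDEK is a form of envelope encryption, which the following Python sketch illustrates. It is an assumption-laden toy: `toy_cipher` is an insecure SHA-256 keystream standing in for AES, and `ToyKMS` with its method names is hypothetical rather than the real KMS API. The only point is who holds which key.

```python
import hashlib
import os
from typing import Dict, Tuple


def toy_cipher(key: bytes, nonce: bytes, data: bytes) -> bytes:
    """XOR data with a SHA-256-derived keystream. Toy stand-in for AES; NOT secure."""
    stream = bytearray()
    counter = 0
    while len(stream) < len(data):
        stream += hashlib.sha256(key + nonce + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(a ^ b for a, b in zip(data, stream))


class ToyKMS:
    """Keeps encryption zone keys (EZKs) private; only wraps and unwraps DEKs."""

    def __init__(self) -> None:
        self._zone_keys: Dict[str, bytes] = {}

    def create_zone_key(self, name: str) -> None:
        self._zone_keys[name] = os.urandom(32)

    def generate_edek(self, name: str) -> Tuple[bytes, bytes]:
        """Create a fresh DEK and return it only in encrypted form (the EDEK)."""
        dek, nonce = os.urandom(32), os.urandom(16)
        return nonce, toy_cipher(self._zone_keys[name], nonce, dek)

    def decrypt_edek(self, name: str, nonce: bytes, edek: bytes) -> bytes:
        """Unwrap an EDEK for an authorized caller."""
        return toy_cipher(self._zone_keys[name], nonce, edek)


kms = ToyKMS()
kms.create_zone_key("finance_zone")                       # hypothetical zone key name
key_nonce, edek = kms.generate_edek("finance_zone")       # EDEK is stored with the file metadata
dek = kms.decrypt_edek("finance_zone", key_nonce, edek)   # a client asks the KMS to unwrap it

message = b"cardholder record, 30+ bytes long"
file_nonce = os.urandom(16)
ciphertext = toy_cipher(dek, file_nonce, message)         # what lands on disk
plaintext = toy_cipher(dek, file_nonce, ciphertext)       # XOR cipher: same call decrypts
```

Because the EZK never leaves the key server, rotating or revoking it does not require touching the file data itself, only the wrapped keys.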
  7. Hadoop Security – Data Security
     • Encryption At-Rest (continued)
        • MapReduce intermediate data
        • Services, e.g., Hive (/user/hive), SOLR (/user/solr), HUE (/user/hue), YARN History (/user/history), Spark/Impala shuffle & disk spill data, Kafka log/data folders, etc.
        • File system & full disk, e.g., audit log folders
        • Use a Hardware Security Module (HSM) as a more secure key store
  8. Hadoop Security – Data Security
     • Encryption Over-The-Wire (In-Motion)
        1. Use TLS/SSL to encrypt the communication channel for different protocols, e.g., TCP/IP, RPC, HTTP
        2. A TLS/SSL certificate can be issued for a specific identity, e.g., per host, service, port, etc.
        3. Keep scalability and maintenance effort in mind when choosing a certificate strategy, e.g., based on the type of application: JKS for Java apps, and PEM for others
        4. Use automated deployment tools to deploy SSL certificates, e.g., Chef, StackIQ, etc.
        5. Set up the company's own Certificate Authority (CA)
        6. Use CA-signed certificates for all non-dev environments; self-signed certificates can be used in the dev environment
        7. Disable clear-text services once TLS/SSL is enabled, e.g., disable the HTTP service on port 11000 once Oozie HTTPS is enabled on port 11443
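For a non-Java client, the PEM and company-CA points above could look roughly like this Python sketch using the standard `ssl` module. `make_client_context` and the idea of passing an internal CA bundle path are illustrative assumptions, not something the deck prescribes.

```python
import ssl
from typing import Optional


def make_client_context(ca_file: Optional[str] = None) -> ssl.SSLContext:
    """Build a client-side TLS context that verifies the server certificate."""
    # The default context already requires a valid certificate and checks
    # that the certificate matches the host name being contacted.
    ctx = ssl.create_default_context(ssl.Purpose.SERVER_AUTH)
    # Refuse legacy protocol versions.
    ctx.minimum_version = ssl.TLSVersion.TLSv1_2
    if ca_file is not None:
        # Trust the company's own CA; ca_file is a hypothetical PEM bundle path.
        ctx.load_verify_locations(cafile=ca_file)
    return ctx


ctx = make_client_context()
```

A caller would then pass `ctx` to whatever HTTPS or socket client it uses, so certificates signed by the internal CA verify while self-signed ones from a dev cluster are rejected.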
  9. Hadoop Security – Auditing
     • Passive auditing: does not generate alerts; its main purpose is to record certain events to meet business requirements, e.g., setOwner, setPermission on HDFS folders
     • Active auditing: more aggressive and normally generates alerts, e.g., automatically sends email alerts to InfoSec upon access denial events and configuration changes
     • Enable and configure audit logs for each service, e.g., HDFS, YARN, Hive, Impala, etc.
     • Use vendor-supported products, such as Cloudera Navigator, to centrally configure, collect, and monitor audit policies/events and aggregate logs
     • Segregation of duties: set up ACLs properly, enable strong AES-256 encryption on the audit logs, and allow only auditors and their log-collecting processes, such as Splunk, to access the audit log folders
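The passive versus active distinction can be sketched as a small classifier over HDFS audit log entries, which are lines of tab-separated `key=value` fields. This Python sketch is illustrative only: the sample lines are invented, and `parse_audit_line`, `classify` and the three labels are hypothetical names.

```python
from typing import Dict

# Events recorded for passive auditing, taken from the examples above.
WATCHED_COMMANDS = {"setOwner", "setPermission"}


def parse_audit_line(line: str) -> Dict[str, str]:
    """Turn the tab-separated key=value part of an audit line into a dict."""
    _, _, payload = line.partition("FSNamesystem.audit: ")
    fields: Dict[str, str] = {}
    for item in payload.strip().split("\t"):
        key, sep, value = item.partition("=")
        if sep:
            fields[key] = value
    return fields


def classify(event: Dict[str, str]) -> str:
    """Return 'alert' (active), 'record' (passive) or 'ignore'."""
    if event.get("allowed") == "false":
        return "alert"   # access denial: notify InfoSec
    if event.get("cmd") in WATCHED_COMMANDS:
        return "record"  # keep for the audit trail, no alert
    return "ignore"


PREFIX = "2016-07-27 10:15:02,123 INFO FSNamesystem.audit: "
denied = PREFIX + "\t".join(
    ["allowed=false", "ugi=alice (auth:KERBEROS)", "ip=/10.0.0.12",
     "cmd=open", "src=/data/cards", "dst=null", "perm=null"])
chown = PREFIX + "\t".join(
    ["allowed=true", "ugi=hdfs (auth:KERBEROS)", "ip=/10.0.0.5",
     "cmd=setOwner", "src=/data/cards", "dst=null", "perm=etl:etl:rwxr-x---"])

denied_action = classify(parse_audit_line(denied))
chown_action = classify(parse_audit_line(chown))
```

In practice this filtering would more likely live in the log collector or the vendor tool than in a standalone script.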
  10. Governance & Compliance – Governance & Metadata Management
     • Metadata extraction and management, e.g., HDFS folders, files & permissions, YARN job metadata, Oozie workflows, Hive queries, etc.
     • Searchable, with lineage information attached
     • Data classification based on business perspectives
     • Centralized auditing
     • Use tools to satisfy the majority of the requirements, e.g., Cloudera Navigator & Apache Atlas
  11. Governance & Compliance – Masking & Redaction
     • HDFS log and query redaction, e.g.,
        Hostname: \b(([A-Za-z]|[A-Za-z][A-Za-z0-9-]*[A-Za-z0-9])\.)+([A-Za-z0-9]|[A-Za-z0-9][A-Za-z0-9-]*[A-Za-z0-9])\b
        Replace: HOSTNAME.REDACTED
     • Audit Server log masking & redaction for all supported Hadoop services, e.g.,
        (4[0-9]{12}(?:[0-9]{3})?)|(5[1-5][0-9]{14})|(3[47][0-9]{13})|(3(?:0[0-5]|[68][0-9])[0-9]{11})|(6(?:011|5[0-9]{2})[0-9]{12})|((?:2131|1800|35\d{3})\d{11})
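The two patterns on this slide can be tried out with Python's `re` module, as in the sketch below. It assumes the backslashes lost in the transcript were `\b`, `\.` and `\d`. The replacement text `CARD.REDACTED` and the `redact` helper are hypothetical, since the slide only gives a replacement for host names.

```python
import re

# Host name pattern from the slide (backslashes restored by assumption).
HOSTNAME = re.compile(
    r"\b(([A-Za-z]|[A-Za-z][A-Za-z0-9-]*[A-Za-z0-9])\.)+"
    r"([A-Za-z0-9]|[A-Za-z0-9][A-Za-z0-9-]*[A-Za-z0-9])\b"
)

# Payment card number pattern from the slide, split across lines for readability.
CARD = re.compile(
    r"(4[0-9]{12}(?:[0-9]{3})?)|(5[1-5][0-9]{14})|(3[47][0-9]{13})"
    r"|(3(?:0[0-5]|[68][0-9])[0-9]{11})|(6(?:011|5[0-9]{2})[0-9]{12})"
    r"|((?:2131|1800|35\d{3})\d{11})"
)


def redact(text: str) -> str:
    """Mask host names first, then anything shaped like a card number."""
    text = HOSTNAME.sub("HOSTNAME.REDACTED", text)
    return CARD.sub("CARD.REDACTED", text)  # replacement string is an assumption


# Invented log line with a well-known test card number.
sample = "GET from dn01.hadoop.example.com card 4111111111111111"
masked = redact(sample)
```

Note that the card pattern has no word-boundary anchors, so it also matches card-shaped runs inside longer digit strings; whether that is desirable depends on the logs being masked.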
  12. Governance & Compliance – Compliant Procedures
     • Penetration test & application scan
        • All medium & high security vulnerability findings have to be remediated before the certification deadline
     • User roles and groups validation, e.g., audit policies on data access based on groups & roles
     • Application log, audit log & change reports, e.g., provide role-based authorization audit logs once a week to an internal auditor
     • Patch management
     • Data retention, encryption and key rotation policies
     • Other business requirements
  13. Takeaways
     • Securing your Hadoop environments can be lengthy and evolving
     • Homegrown processes are needed to satisfy business requirements
     • Security is applied end to end in the process
     • Big Data is still maturing
     • Don’t confuse compliance with security
  14. Contact Us
     • Craig Hibbeler, +1 (636) 439 8186
     • Derek Sun, +1 (636) 722 5512