BigData Security - A Point of View

  1. BigData Security - A Point of View, by Karan Alang
  2. This document gives an overview of the key security challenges in Big Data (Apache Hadoop) systems and showcases the solutions the Hortonworks distribution uses to address them.
  3. Contents: BigData and Security - Key Asks and Challenges; BigData Security - What It Encompasses; BigData Security - Approaches; Reference Architecture & Hadoop Security; Security - HDP Solution Components; Hadoop Security - Popular Hadoop Distributions.
  4. BigData and Security. "I can do so much with Big Data": cheaper, faster decisions; real-time analytics; competitive advantage. With the many benefits of BigData and organizations' growing dependence on it, BigData presents new data security challenges: increased threat due to increased device and machine data; distributed platforms sometimes lack perimeter security; interactions between distributed nodes are not secured; threat of exposing sensitive data through unstructured data; lack of auditing and logging features; and many more …
  5. Inside the BigData Ecosystem - Security Challenges. Distributed environment: lack of perimeter security to streamline and monitor external requests to the BigData system. Need to protect data against untrusted processing tracks. Most NoSQL databases do not provide comprehensive security features, including for unstructured data. A distributed environment involves inter-node communication and client communication with resource managers and nodes, none of which is secured. Lack of audit and logging features to monitor and handle security threats. Handling/masking of sensitive data, e.g. PII data.
  6. BigData Security - What It Encompasses. Administration: How do I set policies across the entire environment? Authentication: Who am I? Authorization: What can I do? Audit: What, who, when? Data Protection: protection/encryption of data at rest and in motion, PII data encryption, masking.
  7. BigData Security - Approaches. Perimeter Security (Walled Garden): secure the cluster by tightly controlling access through firewalls or API gateways; simple to set up; if the gateway is breached, there is no protection. Data Centric Security: data is secured using encryption (with Hadoop KMS or a third-party KMS), masking (e.g. PII data masking), and role-based access to data. Cluster Security: security at every step within the cluster; multi-level security with authentication (LDAP, Kerberos, etc.) and authorization; Kerberos for authentication; SSL/TLS for encrypting data in motion; audit (who, what, when?).
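
The masking technique listed under data-centric security can be illustrated with a minimal Python sketch. The field names `ssn` and `email` and the masking rules (keep the last four digits, keep the first letter of the mailbox) are assumptions made for illustration; the slides do not prescribe a particular masking scheme.

```python
def mask_ssn(ssn: str) -> str:
    """Replace every digit except the last four with '*', keeping separators."""
    total_digits = sum(ch.isdigit() for ch in ssn)
    digits_seen = 0
    out = []
    for ch in ssn:
        if ch.isdigit():
            digits_seen += 1
            # Only the final four digits stay readable.
            out.append(ch if digits_seen > total_digits - 4 else "*")
        else:
            out.append(ch)
    return "".join(out)


def mask_email(email: str) -> str:
    """Keep the first character of the local part and the whole domain."""
    local, sep, domain = email.partition("@")
    if not sep or not local:
        return email  # not an address we know how to mask; leave untouched
    return local[0] + "*" * (len(local) - 1) + "@" + domain


def mask_record(record: dict) -> dict:
    """Return a copy of the record with the (hypothetical) PII fields masked."""
    masked = dict(record)
    if "ssn" in masked:
        masked["ssn"] = mask_ssn(masked["ssn"])
    if "email" in masked:
        masked["email"] = mask_email(masked["email"])
    return masked


example = mask_record({"name": "Alice", "ssn": "123-45-6789", "email": "alice@example.com"})
```

Masking of this kind is typically applied at read time, so the stored data stays intact while users without the right role only ever see the masked form.
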
  8. Big Data Security - Reference Architecture (diagram). Labels: Data Sources; External Sources; firewall; Data Integration Hub; Big Data Platform with Data Ingestion, Data Storage & Processing, and Data Access; Data Visualization; DW; Data Intelligence (Predictive Models, Time Series Analysis, Regression Analysis, Recommendation Engine); Security & Governance; Perimeter Security; Data Centric Security (Authorization, Data Protection, Audit); Authentication using Kerberos; TLS/SSL (Data in Motion); Users (Data Scientist, App Developer, Admin).
  9. BigData - Security at each step (columns: Data Ingestion | Data Storage | Data Processing | Data Access & Visualization)
     Authentication (LDAP/Kerberos): y | - | y | y
     Authorization: y | y | y | y
     Audit: y | y | y | y
     Data Protection (Encryption, Masking): y | y | - | y
  10. Apache Hadoop - Cluster Security. Systems like Apache Hadoop use distributed computing, inter-node communication, replication, and other cluster services, which exposes the cluster at multiple levels. Diagram (representative; does not show all components, e.g. YARN): Application, Master NameNode, three Data Nodes, SSL/TLS, Data Encryption, Identity & Auth (Kerberos), Logging & Monitoring.
  11. Hadoop Security - Solution components used by Hortonworks. Apache Ranger: a framework to enable, monitor, and manage comprehensive data security across the Hadoop platform; a centralized platform for security policies, wire encryption, and fine-grained access control. Apache Knox: enables perimeter security; Kerberos encapsulation; a single access point for all REST interactions with Apache Hadoop clusters; centralized authentication, authorization, and audit for Hadoop REST/HTTP services; integrates with existing systems to simplify identity maintenance (SSO, LDAP, AD); removes the need for clients to know the cluster topology. Kerberos: an authentication protocol that uses 'tickets' to authenticate requests; it enables authentication of all external requests to the BigData ecosystem as well as requests internal to it (including inter-node communications and communications between the resource manager and nodes). SSL/TLS: the standard security technology for establishing an encrypted link between server and client; it ensures that data transferred between client and server remains private and unaltered.
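
Knox's role as a single access point can be made concrete by looking at how a client addresses a service through the gateway instead of contacting cluster nodes directly. The sketch below only builds a WebHDFS URL in the gateway form described in the Knox documentation (`https://{host}:{port}/{gateway-path}/{topology}/webhdfs/v1/...`); the host name, the topology name, and the default values `8443` and `gateway` are assumptions for illustration and depend on the actual deployment.

```python
from urllib.parse import quote, urlencode


def knox_webhdfs_url(gateway_host: str, topology: str, hdfs_path: str, op: str,
                     gateway_port: int = 8443, gateway_path: str = "gateway") -> str:
    """Build a WebHDFS URL that goes through a Knox gateway.

    The client needs only the gateway address and topology name,
    not the NameNode or DataNode hosts behind it.
    """
    clean_path = quote(hdfs_path.lstrip("/"))   # percent-encode, keep '/' separators
    query = urlencode({"op": op})
    return (f"https://{gateway_host}:{gateway_port}/{gateway_path}/"
            f"{topology}/webhdfs/v1/{clean_path}?{query}")


# Hypothetical gateway host and topology name.
url = knox_webhdfs_url("knox.example.com", "default", "/tmp/data", "LISTSTATUS")
```

Because every REST call is addressed to the gateway, authentication and auditing can be enforced in one place, which is the point the slide makes about perimeter security.
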
  12. BigData Security Solution Components. Apache Ranger (formerly XA Secure, which Hortonworks acquired in 2014) is a framework to enable, monitor, and manage comprehensive data security across the Hadoop platform. It provides: a centralized platform for security policy administration; fine-grained access control; centralized audit reporting; wire encryption; HDFS encryption with Ranger KMS. It supports fine-grained authorization and auditing for the following Apache projects: Apache Hadoop, Apache Hive, Apache HBase, Apache Storm, Apache Knox, Apache Solr, Apache Kafka, YARN.
  13. BigData Security Solution Components. Ranger plugins run within the specific applications and therefore do not adversely affect system performance. Source - Hortonworks
  14. BigData Security Solution Components. Ranger UI - enabling user/user-group level authorization for components such as HDFS, YARN, HBase, Hive, Kafka, and others.
  15. BigData Security Solution Components. Using Apache Ranger to enable authorization for HDFS at the user/user-group level.
  16. BigData Security Solution Components. Apache Ranger can be used to enable auditing of user actions/access.
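
The idea of user/user-group level authorization on HDFS paths can be sketched as a small default-deny policy check. This is a simplified toy model written for illustration, not Ranger's policy engine or API; the class name `PathPolicy`, its fields, and the sample users, groups, and paths are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class PathPolicy:
    """Toy stand-in for a path-based policy: who may do what under a path."""
    path: str
    recursive: bool
    users: set = field(default_factory=set)
    groups: set = field(default_factory=set)
    accesses: set = field(default_factory=set)

    def covers(self, requested_path: str) -> bool:
        base = self.path.rstrip("/")
        if requested_path.rstrip("/") == base:
            return True
        # Recursive policies also cover everything below the base directory.
        return self.recursive and requested_path.startswith(base + "/")


def is_allowed(policies, user, user_groups, path, access) -> bool:
    """Default deny: allow only if some policy grants this access to the user or a group."""
    for policy in policies:
        if not policy.covers(path) or access not in policy.accesses:
            continue
        if user in policy.users or policy.groups & set(user_groups):
            return True
    return False


policies = [
    PathPolicy("/data/finance", recursive=True,
               users={"alice"}, groups={"analysts"}, accesses={"read"}),
]
alice_can_read = is_allowed(policies, "alice", [], "/data/finance/q1.csv", "read")
bob_can_write = is_allowed(policies, "bob", ["analysts"], "/data/finance/q1.csv", "write")
```

Keeping the policies in one list mirrors the centralized administration the slides describe: changing a grant means editing one policy rather than touching permissions on each node.
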
  17. Apache Knox - Perimeter Security. Source - Hortonworks
  18. Kafka and Security. Kafka security was introduced in Kafka 0.9 and included in Confluent Platform 2.0. Key features: client authentication using Kerberos or TLS client certificates, so Kafka brokers know who is making requests; Unix-like permissions to control which users can access which data; encrypted network communication, allowing messages to be sent securely across networks; authentication required for communication between brokers and ZooKeeper; Apache Ranger support for Kafka authorization and audit. Source - Hortonworks
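
On the client side, the Kerberos authentication and network encryption described above are switched on through client configuration properties. The sketch below renders a small properties file using standard Kafka client keys (`security.protocol`, `sasl.kerberos.service.name`, `ssl.truststore.location`, `ssl.truststore.password`); the truststore path, the password, and the helper name are placeholders assumed for illustration, and a real deployment also needs a JAAS/keytab setup that is not shown here.

```python
def kafka_client_properties(truststore_path: str, truststore_password: str,
                            service_name: str = "kafka") -> str:
    """Render client properties for Kerberos (SASL) authentication over TLS."""
    props = {
        # SASL_SSL = authenticate with SASL (Kerberos) and encrypt with TLS.
        "security.protocol": "SASL_SSL",
        # Must match the principal name the brokers run as.
        "sasl.kerberos.service.name": service_name,
        # Truststore holding the CA certificate that signed the broker certificates.
        "ssl.truststore.location": truststore_path,
        "ssl.truststore.password": truststore_password,
    }
    return "\n".join(f"{key}={value}" for key, value in props.items())


# Hypothetical path and password, for illustration only.
properties_text = kafka_client_properties("/etc/security/client.truststore.jks", "changeit")
```
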
  19. HBase & Spark - Security. Apache HBase: leverage Apache Ranger, Apache Knox, and Kerberos for HBase security; authentication; authorization (ACLs); encryption for data at rest; wire encryption. Apache Spark: Kerberos - token-based authentication; Spark communication encryption settings; leverage YARN SSL for YARN NodeManager (NM) to executor communication; other settings include spark.authenticate=true (enable RPC authentication), spark.authenticate.enableSaslEncryption=true (wire encryption, shuffle), and spark.ssl.enabled=true; available from HDP 2.5 onwards; fine-grained column-level access control using Hive LLAP (Live Long and Process).
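
The three Spark settings named on this slide can be passed to a job as `--conf key=value` pairs on the `spark-submit` command line. The sketch below only assembles the argument list and does not launch anything; the application file `job.py` and the helper name are hypothetical, and a real cluster may need further settings (keystores, secrets) that the slide does not cover.

```python
SPARK_SECURITY_CONF = {
    "spark.authenticate": "true",                        # authenticate Spark's internal connections
    "spark.authenticate.enableSaslEncryption": "true",   # SASL encryption on authenticated channels
    "spark.ssl.enabled": "true",                         # enable SSL for supported endpoints
}


def spark_submit_args(app_path: str, conf: dict) -> list:
    """Build a spark-submit argument list with one --conf pair per setting."""
    args = ["spark-submit"]
    for key, value in conf.items():
        args += ["--conf", f"{key}={value}"]
    args.append(app_path)
    return args


args = spark_submit_args("job.py", SPARK_SECURITY_CONF)
```

The same keys can instead be placed in `spark-defaults.conf`, which avoids repeating them for every job.
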
  20. Spark Column Security with LLAP. Steps: (1) SparkSQL gets the data locations, known as "splits", from HiveServer2 and plans the query. (2) HiveServer2 authorizes access using Ranger; per-user policies such as row filtering are applied. (3) Spark gets a modified query plan based on the dynamic security policy. (4) Spark reads data from LLAP; filtering/masking is done by LLAP. Benefits: fine-grained column-level access control for Spark; dynamic per-user policies that do not require views; standard Ranger policies and tools control access and masking. Diagram components: Ranger Server (dynamic policies), HiveServer2 (authorization), Spark Client + LLAP Context, LLAP (data read, filter pushdown). Source - Hortonworks
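
The effect of step 4, where the read layer applies a user's row filter and column masks before any data reaches Spark, can be imitated with a toy in-memory model. Nothing below is Hive, Ranger, or LLAP API; the `UserPolicy` class, the sample rows, and the policy for the hypothetical EU analyst are assumptions chosen to show the idea.

```python
from typing import Callable, Dict, List

Row = Dict[str, object]


class UserPolicy:
    """Toy per-user policy: a row predicate plus per-column masking functions."""

    def __init__(self, row_filter: Callable[[Row], bool],
                 column_masks: Dict[str, Callable[[object], object]]):
        self.row_filter = row_filter
        self.column_masks = column_masks


def read_with_policy(rows: List[Row], policy: UserPolicy) -> List[Row]:
    """Return only permitted rows, with masked columns rewritten on the way out."""
    result = []
    for row in rows:
        if not policy.row_filter(row):
            continue  # row filtering: the user never sees this row
        masked = {}
        for column, value in row.items():
            mask = policy.column_masks.get(column)
            masked[column] = mask(value) if mask else value
        result.append(masked)
    return result


table = [
    {"name": "Ana", "region": "EU", "salary": 5000},
    {"name": "Bo", "region": "US", "salary": 6000},
]
# Hypothetical policy: an EU analyst sees EU rows only, with salary hidden.
eu_analyst = UserPolicy(row_filter=lambda r: r["region"] == "EU",
                        column_masks={"salary": lambda _value: None})
visible = read_with_policy(table, eu_analyst)
```

Because the filtering happens in the read layer, two users running the same query get different results without anyone maintaining per-user views, which matches the benefit the slide lists.
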
  21. BigData Security - Popular Hadoop Distributions (columns: Distribution | Description | Perimeter Security | Data Centric Security)
     Cloudera | Provides comprehensive security for CDH, leveraging Cloudera Manager, Apache Sentry, and Kerberos. | Cloudera Manager provides authentication and network isolation. It can be integrated with Kerberos/LDAP/AD. | Leverages Apache Sentry to provide fine-grained, cell-level security. Auditing is provided through Cloudera Navigator. Security for data in transit is provided through TLS and other mechanisms, centrally deployed through Cloudera Manager. Transparent data-at-rest protection is provided through the combination of HDFS encryption, Navigator Encrypt, and Navigator Key Trustee.
     MapR | Built-in support for authentication, authorization, impersonation, encryption, and auditing. TLS/SSL for data in motion. | Has a native authentication mechanism, but can also be integrated with Kerberos. | Supports Hadoop Access Control Lists (ACLs) for regulating user privileges to the job queue and cluster. The Secure Sockets Layer/Transport Layer Security (SSL/TLS) protocol secures several channels of HTTP traffic.
     Hortonworks | Provides comprehensive security for the Apache Hadoop stack: perimeter security, data-centric security, and cluster security. | Leverages the Apache Knox gateway for perimeter security. | Uses Apache Ranger for authorization (including fine-grained, cell-level), audit, and data protection through support for HDFS Transparent Encryption. Ranger supports security for multiple solutions, including Apache Kafka, HBase, and YARN.