
Big Data: Data Governance and Data Security

  1. Data Governance and Protection in Hadoop: Introduction of Cloudera Navigator
     Jianwei Li, jarred@cloudera.com
     © 2014 Cloudera, Inc. All rights reserved.
  2. Agenda
     • Hadoop Security Pillars
     • Metadata Management and Data Audit
     • Data Security at Rest and in Transit
  3. Hadoop Ecosystem
     • Operations: Cloudera Manager, Cloudera Director
     • Data management: Cloudera Navigator, Encrypt and Key Trustee, Optimizer
     • Integrate: structured (Sqoop), unstructured (Kafka, Flume)
     • Process, analyze, serve: batch (Spark, Hive, Pig, MapReduce), stream (Spark), SQL (Impala), search (Solr), SDK (Kite)
     • Unified services: resource management (YARN), security (Sentry, RecordService)
     • Store: filesystem (HDFS), relational (Kudu), NoSQL (HBase)
  4. The Benefits of Hadoop...
     One place for unlimited data
     • All types
     • More sources
     • Faster, larger ingestion
     Unified, multi-framework data access
     • More users
     • More tools
     • Faster changes
  5. …Can Create Information Security Challenges
     Business Manager
     • Run high-value workloads in the cluster
     • Quickly adopt new innovations
     Information Security
     • Follow established policies and procedures
     • Maintain compliance
     IT/Operations
     • Integrate with existing IT investments
     • Minimize end-user support
     • Automate configuration
  6. Hadoop Security Pillars: Authentication, Authorization, Audit, and Compliance
     • Perimeter: guarding access to the cluster itself. Technical concepts: authentication, network isolation. Tooling: Cloudera Manager
     • Access: defining what users and applications can do with data. Technical concepts: permissions, authorization. Tooling: Apache Sentry & RecordService
     • Visibility: reporting on where data came from and how it's being used. Technical concepts: auditing, lineage. Tooling: Cloudera Navigator
     • Data: protecting data in the cluster from unauthorized visibility. Technical concepts: encryption, tokenization, data masking. Tooling: Navigator Encrypt & Key Trustee | Partners
  7. Agenda
     • Hadoop Security Pillars
     • Metadata Management and Data Audit
     • Data Security at Rest and in Transit
  8. Data Management Challenges
     Compliance Officers
     • Who’s accessing what data?
     • What are they doing with the data?
     • Is sensitive data governed and protected?
     • Can I meet compliance needs?
     Data Stewards/Curators
     • How can I manage data from ingest to purge?
     • How do I classify data efficiently?
     • How can data be made available to end users?
     Business Users
     • How do I find what’s relevant?
     • Can I trust what I find?
     • How can I explore data on my own?
     Database Admins
     • How is data being used today?
     • How can I optimize for future workloads?
     • How can I take advantage of Hadoop risk-free and fast?
  9. Cloudera Navigator: The only integrated data management and governance platform for Hadoop
     • Metadata Management
     • Audit
     • Policy-Based Data Management
     • Data Analytics
  10. Navigator Metadata Architecture
  11. Metadata Extraction
      • HDFS: extracts HDFS metadata at the next scheduled extraction run after an HDFS checkpoint.
      • Hive: extracts database and table metadata from the Hive Metastore Server.
      • Impala: extracts database and table metadata from the Hive Metastore Server, and query metadata from the Impala Daemon lineage logs.
      • MapReduce: extracts job metadata from the JobTracker.
  12. Metadata Extraction
      • Oozie: extracts Oozie workflows from the Oozie Server.
      • Pig: extracts Pig script runs from the JobTracker or Job History Server.
      • Spark: extracts Spark job metadata from YARN logs.
      • Sqoop 1: extracts database and table metadata from the Hive Metastore Server, and job runs from the JobTracker or Job History Server.
      • YARN: extracts job metadata from the ResourceManager.
  13. Metadata Indexing
      • Metadata is indexed in Solr for searching:
        – Technical metadata key-value pairs, for example "fileSystemPath:/tmp/hbase-staging"
        – Custom metadata key-value pairs, for example "description:Banking*"
        – Hive extended attribute key-value pairs, set with: ALTER TABLE table_name SET TBLPROPERTIES ('key1'='value1');
      • Example search query: (sourceType:hive OR sourceType:hdfs) AND (type:table OR type:directory)
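The example query ORs together values of the same field and ANDs different fields. A minimal sketch of assembling such a query string in Java follows, assuming a hypothetical helper class `NavigatorQuery` that takes an ordered map of field names to allowed values. It only builds the string (values are not escaped) and does not call Navigator's search API.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical helper: values of one field are OR-ed, different fields are AND-ed.
final class NavigatorQuery {
    private NavigatorQuery() {}

    static String build(Map<String, List<String>> facets) {
        return facets.entrySet().stream()
                .map(e -> e.getValue().stream()
                        .map(v -> e.getKey() + ":" + v)                 // field:value term
                        .collect(Collectors.joining(" OR ", "(", ")"))) // one group per field
                .collect(Collectors.joining(" AND "));
    }

    public static void main(String[] args) {
        Map<String, List<String>> facets = new LinkedHashMap<>(); // keeps field order stable
        facets.put("sourceType", List.of("hive", "hdfs"));
        facets.put("type", List.of("table", "directory"));
        // Prints: (sourceType:hive OR sourceType:hdfs) AND (type:table OR type:directory)
        System.out.println(build(facets));
    }
}
```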
  14. Self-Service Data Discovery & Analytics for Business Users
      Effortlessly find and trust the data that matters most
      • Search across a unified metadata repository
      • Gain context and visibility into data sets
      • Find similar, relevant data
  15. Technical & Business Metadata
  16. Modifying Metadata
      • Through an HDFS file: add metadata to /user/test/file1.txt by creating the hidden file /user/test/.file1.txt.navigator containing:
          {
            "name" : "aName",
            "description" : "a description",
            "properties" : { "prop1" : "value1", "prop2" : "value2" },
            "tags" : [ "tag1" ]
          }
      • Through the REST API:
          curl http://Navigator_Metadata_Server_host:port/api/v8/entities/ -u username:password -X POST -H "Content-Type: application/json" -d '{properties}'
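Both routes can be sketched in Java. The class below derives the hidden metadata-file path for an HDFS file and builds, but does not send, the equivalent of the curl call using `java.net.http` (Java 11+). The class and method names, the host `navigator.example.com:7187`, the credentials, and the JSON body are placeholders; the endpoint path is copied from the slide, and the exact request schema should be checked against the Navigator API documentation for your release.

```java
import java.net.URI;
import java.net.http.HttpRequest;
import java.nio.charset.StandardCharsets;
import java.util.Base64;

final class NavigatorMetadata {
    private NavigatorMetadata() {}

    // Sidecar naming from the slide: same directory, leading '.', ".navigator" suffix.
    static String sidecarPath(String hdfsFile) {
        int slash = hdfsFile.lastIndexOf('/');
        String dir = hdfsFile.substring(0, slash + 1);
        String name = hdfsFile.substring(slash + 1);
        return dir + "." + name + ".navigator";
    }

    // Mirrors: curl <base>/api/v8/entities/ -u user:pass -X POST -H "Content-Type: application/json" -d '<json>'
    static HttpRequest entityPost(String baseUrl, String user, String password, String json) {
        String token = Base64.getEncoder()
                .encodeToString((user + ":" + password).getBytes(StandardCharsets.UTF_8));
        return HttpRequest.newBuilder(URI.create(baseUrl + "/api/v8/entities/"))
                .header("Content-Type", "application/json")
                .header("Authorization", "Basic " + token) // what curl -u sends
                .POST(HttpRequest.BodyPublishers.ofString(json))
                .build();
    }

    public static void main(String[] args) {
        System.out.println(sidecarPath("/user/test/file1.txt")); // /user/test/.file1.txt.navigator
        HttpRequest req = entityPost("http://navigator.example.com:7187", "admin", "secret",
                "{\"name\":\"aName\",\"tags\":[\"tag1\"]}");
        System.out.println(req.method() + " " + req.uri());
        // To actually send it: java.net.http.HttpClient.newHttpClient()
        //     .send(req, java.net.http.HttpResponse.BodyHandlers.ofString());
    }
}
```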
  17. Navigator Analytics
      • Metadata: the number of files by creation and access time, size, block size, and replication count.
      • Audit
        – Activity tab: by directory, which files have been accessed with the open operation and how many times.
        – Top Users tab: the top-n commands, and the top-n users together with the top-n commands those users performed.
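The Top Users view is essentially a group-and-count over audit events. A minimal sketch, assuming a hypothetical, simplified `AuditEvent` record (user, command, path) rather than Navigator's real audit schema; it needs Java 16+ for records.

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

final class TopUsers {
    // Hypothetical, simplified audit event; real Navigator audit events carry many more fields.
    record AuditEvent(String user, String command, String path) {}

    private TopUsers() {}

    // The n users with the most audit events, most active first; ties are broken by user name.
    static List<String> topUsers(List<AuditEvent> events, int n) {
        Map<String, Long> perUser = events.stream()
                .collect(Collectors.groupingBy(AuditEvent::user, Collectors.counting()));
        return perUser.entrySet().stream()
                .sorted(Map.Entry.<String, Long>comparingByValue().reversed()
                        .thenComparing(Map.Entry.<String, Long>comparingByKey()))
                .limit(n)
                .map(Map.Entry::getKey)
                .collect(Collectors.toList());
    }

    // How many times one user issued each command.
    static Map<String, Long> commandCounts(List<AuditEvent> events, String user) {
        return events.stream()
                .filter(e -> e.user().equals(user))
                .collect(Collectors.groupingBy(AuditEvent::command, Collectors.counting()));
    }

    public static void main(String[] args) {
        List<AuditEvent> events = List.of(
                new AuditEvent("alice", "open", "/data/a"),
                new AuditEvent("bob", "open", "/data/a"),
                new AuditEvent("alice", "delete", "/data/b"),
                new AuditEvent("alice", "open", "/data/c"));
        for (String user : topUsers(events, 2)) {
            System.out.println(user + " " + commandCounts(events, user));
        }
    }
}
```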
  18. Navigator Audit Architecture
  19. Compliance-Ready Governance & Protection for Compliance Officers
      Track, understand, and protect access to sensitive data
      • Search centralized audits for the entire ecosystem
      • See how data is used and how it changes, with intuitive lineage
      • Protect all data with high-performance encryption and key management
      • Integrate with leading partner tools
  20. Policy-Based Data Management
      • Automate data stewardship and curation activities with the policy engine:
        – Data archive
        – Data delete
        – Metadata management, for example automatic naming with a timestamp:
            entity.get(FSEntityProperties.ORIGINAL_NAME, Object.class) + " - " + new SimpleDateFormat("yyyy-MM-dd").format(entity.get(FSEntityProperties.CREATED, Instant.class).toDate())
      • Ensure business continuity through built-in backup & disaster recovery
      • Integrate with leading partner tools
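The policy expression above relies on Navigator's entity API (`entity.get(...)`, `FSEntityProperties`) and on an `Instant` type that has a `toDate()` method, so it only runs inside Navigator's policy engine. The sketch below reproduces just the naming rule with `java.time` so it can run anywhere. The class and method names are hypothetical, and it formats the date in UTC, whereas `SimpleDateFormat` uses the JVM's default time zone, so results can differ for files created near midnight.

```java
import java.time.Instant;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;

final class PolicyNaming {
    // An Instant has no calendar fields, so the formatter needs an explicit zone (UTC here).
    private static final DateTimeFormatter DAY =
            DateTimeFormatter.ofPattern("yyyy-MM-dd").withZone(ZoneOffset.UTC);

    private PolicyNaming() {}

    // Same shape as the policy expression: "<original name> - <creation date>".
    static String timestampedName(String originalName, Instant created) {
        return originalName + " - " + DAY.format(created);
    }

    public static void main(String[] args) {
        // Prints: sales.csv - 2016-03-05
        System.out.println(timestampedName("sales.csv", Instant.parse("2016-03-05T10:15:30Z")));
    }
}
```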
  21. Lineage
      • Lineage provides provenance information that shows where data came from and how it has been transformed within the EDH (Enterprise Data Hub)
      • Cloudera Navigator provides column-level lineage within the Cloudera EDH
      • Integrates with certified third-party lineage solutions, such as Informatica, for enterprise-wide lineage information
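Answering "where did this come from?" amounts to walking a lineage graph upstream. The sketch below assumes a hypothetical adjacency map from each entity to the entities it was derived from, with made-up entity names. It is a plain breadth-first search, not Navigator's lineage engine; column-level lineage would use columns as the nodes instead of tables and files.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

final class LineageSketch {
    private LineageSketch() {}

    // All entities the given entity was (transitively) derived from, nearest first.
    // The "seen" set makes the walk terminate even if the graph contains cycles.
    static Set<String> upstream(Map<String, List<String>> parents, String entity) {
        Set<String> seen = new LinkedHashSet<>();
        Deque<String> queue = new ArrayDeque<>(parents.getOrDefault(entity, List.of()));
        while (!queue.isEmpty()) {
            String next = queue.poll();
            if (seen.add(next)) {
                queue.addAll(parents.getOrDefault(next, List.of()));
            }
        }
        return seen;
    }

    public static void main(String[] args) {
        Map<String, List<String>> parents = Map.of(
                "report.revenue", List.of("hive.sales_agg"),
                "hive.sales_agg", List.of("hdfs:/data/sales", "hive.currency"),
                "hive.currency", List.of("hdfs:/data/fx"));
        System.out.println(upstream(parents, "report.revenue"));
    }
}
```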
  22. Lineage
  23. End-to-End Data Management: Cloudera Navigator + Partners
      Lineage, Auditing, Metadata Augmentation, Consumption
  24. Agenda
      • Hadoop Security Pillars
      • Metadata Management and Data Audit
      • Data Security at Rest and in Transit
  25. Background
      • Our customers increasingly want to use HDFS to store sensitive data
      • Customers are often mandated to protect data at rest, for example for:
        – National security
        – Company-confidential information
      • Encryption of data at rest helps mitigate certain security threats:
        – Rogue administrators (insider threat)
        – Lost or stolen hard drives
  26. Over-the-Wire Encryption
      • Uses certificates and TLS to encrypt, and optionally authenticate, network communication
      • Customers can use commercial certificate authorities, corporate CAs, or self-signed certificates
      • Active Directory Certificate Services is commonly used by customers
      • Secures Hadoop data processing components as well as Cloudera Manager agents and management services
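For a client to accept certificates issued by a corporate CA, that CA's certificate must be in the client's truststore. Hadoop services take this from their own TLS configuration (for example `ssl-client.xml`, or the TLS settings in Cloudera Manager). The generic JSSE sketch below only illustrates the underlying idea: it builds an `SSLContext` that trusts a supplied set of CA certificates. The class and method names are hypothetical, and loading the certificates themselves is left out.

```java
import java.io.IOException;
import java.security.GeneralSecurityException;
import java.security.KeyStore;
import java.security.cert.Certificate;
import java.util.Map;
import javax.net.ssl.SSLContext;
import javax.net.ssl.TrustManagerFactory;

final class CorporateCaTls {
    private CorporateCaTls() {}

    // Builds a client-side SSLContext that trusts only the given CA certificates (alias -> cert).
    static SSLContext trusting(Map<String, Certificate> caCerts) {
        try {
            KeyStore trustStore = KeyStore.getInstance(KeyStore.getDefaultType());
            trustStore.load(null, null); // empty, in-memory truststore
            for (Map.Entry<String, Certificate> ca : caCerts.entrySet()) {
                trustStore.setCertificateEntry(ca.getKey(), ca.getValue());
            }
            TrustManagerFactory tmf =
                    TrustManagerFactory.getInstance(TrustManagerFactory.getDefaultAlgorithm());
            tmf.init(trustStore);
            SSLContext ctx = SSLContext.getInstance("TLS");
            ctx.init(null, tmf.getTrustManagers(), null); // no client certificate, default SecureRandom
            return ctx;
        } catch (GeneralSecurityException | IOException e) {
            throw new IllegalStateException("could not build TLS context", e);
        }
    }

    public static void main(String[] args) {
        // In practice the map would hold the corporate or commercial CA certificate(s).
        SSLContext ctx = trusting(Map.of());
        System.out.println(ctx.getProtocol());
    }
}
```

The resulting context can be handed to `HttpClient.newBuilder().sslContext(ctx)` or used through `ctx.getSocketFactory()`.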
  27. Data-at-Rest Encryption
      • Protects data on disk from unauthorized exposure
      • Protects data from both online attacks (while the system is running) and offline attacks, such as theft of physical drives
      • HDFS transparent encryption at rest is an open source technology available in Apache Hadoop
      • Navigator Encrypt is a proprietary technology that protects data outside HDFS:
        – Backend databases, log directories, temp directories, landing zones
      • Navigator Key Trustee Server is a proprietary key management server that can integrate with an enterprise HSM
  28. HDFS Encrypt + Navigator Encrypt + Key Trustee
  31. Navigator Key Trustee Architecture
  32. Key Management Service (KMS)
      • When encrypting any data, it is important to store the encryption keys securely, away from the encrypted data
      • KMS is the key management service that HDFS encryption uses to store and retrieve encryption keys
      • KMS is open source and provides a standard interface for pluggable key providers
      • The default key provider for KMS is the Java KeyStore
      • The Java KeyStore is not recommended for production key management; it is meant for development and testing
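A Java KeyStore-backed provider keeps secret keys in a password-protected keystore file. As a rough illustration (not Hadoop's own provider code), the sketch below stores an AES key under an alias in an in-memory JCEKS keystore, serializes the keystore, and reads the key back. The alias `myKey`, the password, the 128-bit key size, and the class name are assumptions.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.security.GeneralSecurityException;
import java.security.KeyStore;
import java.util.Arrays;
import javax.crypto.KeyGenerator;
import javax.crypto.SecretKey;

final class KeyStoreSketch {
    private KeyStoreSketch() {}

    static SecretKey newAesKey() {
        try {
            KeyGenerator gen = KeyGenerator.getInstance("AES");
            gen.init(128);
            return gen.generateKey();
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    // Puts one secret key under an alias and returns the serialized keystore (the "file" contents).
    static byte[] store(String alias, SecretKey key, char[] password) {
        try {
            KeyStore ks = KeyStore.getInstance("JCEKS");
            ks.load(null, password); // new, empty keystore
            ks.setEntry(alias, new KeyStore.SecretKeyEntry(key), new KeyStore.PasswordProtection(password));
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            ks.store(out, password);
            return out.toByteArray();
        } catch (GeneralSecurityException | IOException e) {
            throw new IllegalStateException(e);
        }
    }

    // Reloads the serialized keystore and retrieves the key by alias.
    static SecretKey load(byte[] keystoreBytes, String alias, char[] password) {
        try {
            KeyStore ks = KeyStore.getInstance("JCEKS");
            ks.load(new ByteArrayInputStream(keystoreBytes), password);
            return (SecretKey) ks.getKey(alias, password);
        } catch (GeneralSecurityException | IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        char[] password = "changeit".toCharArray();
        SecretKey key = newAesKey();
        byte[] file = store("myKey", key, password);
        SecretKey back = load(file, "myKey", password);
        System.out.println("same key after reload: " + Arrays.equals(key.getEncoded(), back.getEncoded()));
    }
}
```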
  33. Key Management Service (KMS)
      ● Encryption occurs on the requesting client.
        ○ Data is encrypted before it lands on disk.
        ○ The KMS encrypts and decrypts key material (data encryption keys).
        ○ The KMS does not encrypt content.
        ○ The KMS does not store keys itself; it delegates storage to the backing key provider.
  34. KMS Proxy Deployment Considerations
  35. KMS Proxy Deployment Considerations
  36. Navigator Key Trustee
      • Provides secure, centralized, and scalable key storage and administration
      • Is not open source; it is licensed with Cloudera Navigator
      • Is the recommended option for production deployments
      • Provides hooks to integrate with hardware security modules for physical tamper-resistance requirements (FIPS 140-2 Level 3)
      • Also provides centralized key management for Navigator Encrypt
  37. Key HSM
      • Customers may choose to use hardware security modules (HSMs) to improve the security of their keystore.
      • Key HSM is a universal HSM driver.
      • It acts as a translator between the target HSM platform and Key Trustee.
  38. Hardware Security Module (HSM)
      • A number of vendors provide HSMs.
      • They exist as appliances and as attachable physical hardware.
      • If an HSM is configured with Key Trustee, it is used as the root of trust.
      • Data inside the Key Trustee keystore is encrypted by this root of trust.
      • The HSM "master" keys are generated in the HSM and never leave the HSM.
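The root-of-trust idea is that every key the keystore persists is stored only in encrypted (wrapped) form under a master key. In a real HSM the master key is generated inside the device and wrapping happens there; in the sketch below an in-process AES key merely stands in for it, and AES Key Wrap (RFC 3394, JCE name `AESWrap`) is an illustrative choice, not necessarily the scheme Key Trustee uses. Class and method names are hypothetical.

```java
import java.security.GeneralSecurityException;
import java.util.Arrays;
import javax.crypto.Cipher;
import javax.crypto.KeyGenerator;
import javax.crypto.SecretKey;

final class RootOfTrustSketch {
    private RootOfTrustSketch() {}

    static SecretKey newAesKey() {
        try {
            KeyGenerator gen = KeyGenerator.getInstance("AES");
            gen.init(128);
            return gen.generateKey();
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    // Wraps (encrypts) a keystore key under the root-of-trust key.
    static byte[] wrap(SecretKey rootKey, SecretKey keystoreKey) {
        try {
            Cipher cipher = Cipher.getInstance("AESWrap");
            cipher.init(Cipher.WRAP_MODE, rootKey);
            return cipher.wrap(keystoreKey);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    // Recovers the keystore key; fails the integrity check with any other root key.
    static SecretKey unwrap(SecretKey rootKey, byte[] wrapped) {
        try {
            Cipher cipher = Cipher.getInstance("AESWrap");
            cipher.init(Cipher.UNWRAP_MODE, rootKey);
            return (SecretKey) cipher.unwrap(wrapped, "AES", Cipher.SECRET_KEY);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException("unwrap failed", e);
        }
    }

    public static void main(String[] args) {
        SecretKey root = newAesKey();       // stand-in for the HSM master key
        SecretKey storedKey = newAesKey();  // a key held in the keystore
        byte[] atRest = wrap(root, storedKey); // what the keystore would persist
        System.out.println(atRest.length + " wrapped bytes; round trip ok = "
                + Arrays.equals(storedKey.getEncoded(), unwrap(root, atRest).getEncoded()));
    }
}
```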
  39. HDFS Encryption Workflow
  40. HDFS Encryption, Involved Parties
      Client, HDFS, KMS, Key Trustee, zHSM, and HSM (optional). HDFS performs file authorization; the KMS performs key authorization.
  41. Keys Used in Encryption at Rest
      HDFS Encryption
      • Encryption Zone Key (EZ key): much like a mount key, this key is associated with an encryption zone in HDFS.
      • Encrypted Data Encryption Key (EDEK): an encrypted copy of a data encryption key.
      • Data Encryption Key (DEK): the actual key used to encrypt data stored within a file, zone, or block device. This key concept is used in both Navigator Encrypt and HDFS Transparent Data Encryption (TDE).
  42. Keys Used in Encryption at Rest
      (1) When an encryption zone (EZ) is created, the administrator specifies an encryption zone key (EZ key) that is already stored in the backing keystore. The EZ key encrypts the data encryption keys (DEKs) that are in turn used to encrypt each file. Each DEK is encrypted with the EZ key to form an encrypted data encryption key (EDEK), which is stored on the NameNode as an extended attribute of the file.
      (2) To encrypt a file, the client retrieves a new EDEK from the NameNode and asks the KMS to decrypt it with the corresponding EZ key. This yields a DEK.
      (3) The client uses the DEK to encrypt its data.
      (4) To decrypt a file, the file's EDEK must again be decrypted with the EZ key (by the KMS) to obtain the DEK. The client then reads the encrypted data and decrypts it with the DEK.
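These four steps are a form of envelope encryption. A minimal Java sketch of the same flow follows, under illustrative assumptions: 128-bit AES keys; the EDEK is produced here with AES-GCM under the EZ key (Hadoop's KMS has its own EDEK scheme, not necessarily this one); file data is encrypted with AES/CTR/NoPadding, the cipher suite HDFS transparent encryption uses; and the per-file IV is simply kept next to the EDEK. The class and method names are hypothetical. The methods marked "KMS side" would run in the KMS, which never sees file contents; `ctr` runs on the client.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.SecureRandom;
import java.util.Arrays;
import javax.crypto.Cipher;
import javax.crypto.KeyGenerator;
import javax.crypto.SecretKey;
import javax.crypto.spec.GCMParameterSpec;
import javax.crypto.spec.IvParameterSpec;
import javax.crypto.spec.SecretKeySpec;

final class EnvelopeSketch {
    private static final SecureRandom RANDOM = new SecureRandom();
    private static final int NONCE_LEN = 12;

    private EnvelopeSketch() {}

    static SecretKey newAesKey() {
        try {
            KeyGenerator gen = KeyGenerator.getInstance("AES");
            gen.init(128);
            return gen.generateKey();
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    // KMS side: create a fresh DEK and return only its encrypted form, EDEK = nonce || AES-GCM(EZ key, DEK).
    static byte[] newEdek(SecretKey ezKey) {
        try {
            byte[] dek = newAesKey().getEncoded();
            byte[] nonce = new byte[NONCE_LEN];
            RANDOM.nextBytes(nonce);
            Cipher gcm = Cipher.getInstance("AES/GCM/NoPadding");
            gcm.init(Cipher.ENCRYPT_MODE, ezKey, new GCMParameterSpec(128, nonce));
            byte[] sealed = gcm.doFinal(dek);
            byte[] edek = Arrays.copyOf(nonce, NONCE_LEN + sealed.length);
            System.arraycopy(sealed, 0, edek, NONCE_LEN, sealed.length);
            return edek;
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    // KMS side: decrypt an EDEK with the EZ key and return the DEK to an authorized client.
    static SecretKey decryptEdek(SecretKey ezKey, byte[] edek) {
        try {
            Cipher gcm = Cipher.getInstance("AES/GCM/NoPadding");
            gcm.init(Cipher.DECRYPT_MODE, ezKey, new GCMParameterSpec(128, edek, 0, NONCE_LEN));
            return new SecretKeySpec(gcm.doFinal(edek, NONCE_LEN, edek.length - NONCE_LEN), "AES");
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    // Client side: AES-CTR over the file bytes with the DEK and the per-file IV.
    static byte[] ctr(int mode, SecretKey dek, byte[] iv, byte[] data) {
        try {
            Cipher cipher = Cipher.getInstance("AES/CTR/NoPadding");
            cipher.init(mode, dek, new IvParameterSpec(iv));
            return cipher.doFinal(data);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        SecretKey ezKey = newAesKey();   // stands in for the EZ key held by the keystore
        byte[] edek = newEdek(ezKey);    // (1) stored with the file's metadata
        byte[] iv = new byte[16];
        RANDOM.nextBytes(iv);            // per-file IV, kept alongside the EDEK

        byte[] plaintext = "hello, encryption zone".getBytes(StandardCharsets.UTF_8);
        // (2)+(3) write: the KMS decrypts the EDEK, the client encrypts locally
        byte[] stored = ctr(Cipher.ENCRYPT_MODE, decryptEdek(ezKey, edek), iv, plaintext);
        // (4) read: the EDEK is decrypted again, the client decrypts locally
        byte[] read = ctr(Cipher.DECRYPT_MODE, decryptEdek(ezKey, edek), iv, stored);
        System.out.println(new String(read, StandardCharsets.UTF_8));
    }
}
```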
  43. HDFS Encryption, Writing a File (parties: Client, HDFS, KMS; the KMS connects to Key Trustee)
      1. Client → HDFS: create file
      2. HDFS → KMS: generate key
      3. KMS → HDFS: encrypted key
      4. HDFS: store encrypted key
      5. HDFS → Client: file handle & encrypted key
      6. Client → KMS: decrypt encrypted key
      7. KMS → Client: decrypted key
      8. Client: encrypt & write data
  44. HDFS Encryption, Reading a File (parties: Client, HDFS, KMS; the KMS connects to Key Trustee)
      1. Client → HDFS: open file (after passing the read permission check)
      2. HDFS → Client: file handle & encrypted key
      3. Client → KMS: decrypt encrypted key
      4. KMS → Client: decrypted key
      5. Client: read & decrypt data
  45. HDFS Encryption Implementation and Usage
  46. Enabling HDFS Encryption on a Cluster
      • A recent version of libcrypto.so is needed on HDFS and MapReduce client hosts
      • To check, run:
          hadoop checknative
        The output should include a line such as:
          openssl: true /usr/lib64/libcrypto.so
      • To install: yum install openssl openssl-devel
        – The openssl package installs the library; openssl-devel creates the libcrypto.so symlink (you can also create it manually)
      • OpenSSL provides AES-NI integration for Intel hardware
  47. Enabling HDFS Encryption on a Cluster Using Cloudera Manager
      1) Add the KMS service: add the Java KeyStore KMS service on a host.
      2) Enable Java KeyStore KMS for the HDFS service:
         • In the HDFS service, open the Configuration tab
         • Scope > HDFS (Service-Wide)
         • Category > All
         • KMS Service property: turn on the radio button
         • Click Save Changes
         • Restart the cluster
         • Deploy the client configuration
  48. Creating Encryption Zones
      • Use the hadoop key and hdfs crypto command-line tools to create encryption keys and set up new encryption zones.
          # Create an encryption key for your zone, as the application user that will be using the key
          $ hadoop key create myKey
          # Create a new empty directory and make it an encryption zone
          $ hadoop fs -mkdir /zone
          $ hdfs crypto -createZone -keyName myKey -path /zone
          # List the encryption zones
          $ hdfs crypto -listZones
  49. Adding Files to an Encryption Zone
      • Remember that zones start empty: you cannot create a zone on a directory that already contains data. Copy data in instead:
          hadoop distcp /user/dir /user/enczone
      • By default, distcp compares checksums provided by the filesystem to verify that data was copied to the destination successfully.
      • When copying between an unencrypted and an encrypted location, the filesystem checksums will not match, since the underlying block data is different.
      • Use the -skipcrccheck and -update flags to skip checksum verification.
      • Also use the distcp flags that preserve all attributes (-prbugpcaxt), for example:
          hadoop distcp -update -skipcrccheck -prbugpcaxt /user/dir /user/enczone
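The checksum mismatch can be seen in miniature. In the sketch below, CRC32 over a whole byte array stands in for the filesystem checksum (HDFS's real file checksum is computed differently, from per-chunk CRCs), and AES-CTR with a fixed all-zero key and IV stands in for the bytes stored inside the encryption zone. Both are simplifications for illustration, and the class name is hypothetical. The logical content is the same, yet the stored bytes, and therefore their checksums, differ.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.util.Arrays;
import java.util.zip.CRC32;
import javax.crypto.Cipher;
import javax.crypto.spec.IvParameterSpec;
import javax.crypto.spec.SecretKeySpec;

final class ChecksumMismatchSketch {
    private ChecksumMismatchSketch() {}

    static long crc32(byte[] data) {
        CRC32 crc = new CRC32();
        crc.update(data);
        return crc.getValue();
    }

    // AES-CTR stands in for what is physically stored inside an encryption zone.
    // In CTR mode, applying the same key and IV again decrypts.
    static byte[] aesCtr(byte[] key16, byte[] iv16, byte[] data) {
        try {
            Cipher cipher = Cipher.getInstance("AES/CTR/NoPadding");
            cipher.init(Cipher.ENCRYPT_MODE, new SecretKeySpec(key16, "AES"), new IvParameterSpec(iv16));
            return cipher.doFinal(data);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        byte[] source = "same logical file contents".getBytes(StandardCharsets.UTF_8);
        // Fixed all-zero key and IV only to keep the demo deterministic; never do this for real data.
        byte[] insideZone = aesCtr(new byte[16], new byte[16], source);
        System.out.printf("source CRC32       = %08x%n", crc32(source));
        System.out.printf("stored-bytes CRC32 = %08x%n", crc32(insideZone));
        System.out.println("same content after decrypt: "
                + Arrays.equals(aesCtr(new byte[16], new byte[16], insideZone), source));
    }
}
```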
  50. Cloudera Navigator: The only integrated data management and governance platform for Hadoop
      • Self-Service Discovery & Analytics (Search, Define, Analyze, Profile): effortlessly find and trust the data that matters most
      • Compliance-Ready Governance & Protection (Audit, Track, Encrypt, Manage Keys): track, understand, and protect access to sensitive data
      • Active Data Optimization (Report, Optimize, Migrate, Maintain Models): configure Hadoop to boost user productivity
      • Hadoop-Scale Data Lifecycle Management (Classify, Steward, Backup, Retain): maximize cluster performance at Hadoop scale with ease
      • Unified Governance Foundation: Unified Auditing, Comprehensive Lineage, Unified Metadata, Universal Policies
  51. MasterCard and Cloudera: The First PCI-Certified Hadoop Platform
      • Challenge: All applications, databases, or file systems that have the potential to handle personal account-related data must undergo full PCI certification.
      • Solution: MasterCard’s Cloudera environment fully conforms to the PCI-DSS v2.0 security standards, so it can host PCI datasets and potentially integrate with other internal systems.
      “Data privacy and protection is a top priority for MasterCard. As we maximize the most advanced technologies from partners and vendors, they must meet the rigorous security standards we’ve set. With Cloudera’s commitment to the same standards, we now have additional options in how we manage our data center.”
      Gary VonderHaar, Chief Technology Officer, Architecture, MasterCard
  52. jarred@cloudera.com