Big Data Governance and Data Security (大数据数据治理及数据安全)
26 Jul 2017
Jianwei Li
Sr SE & Solution Architect at Cloudera
Slide 1: Data Governance and Protection in Hadoop
An Introduction to Cloudera Navigator
Jianwei Li (jarred@cloudera.com)
© 2014 Cloudera, Inc. All rights reserved.
Slide 2: Agenda
• Hadoop Security Pillars
• Metadata Management and Data Audit
• Data Security at Rest and in Transit
Slide 3: Hadoop Ecosystem
• Operations: Cloudera Manager, Cloudera Director
• Data management: Cloudera Navigator, Navigator Encrypt and Key Trustee, Navigator Optimizer
• Integrate: Sqoop (structured); Kafka, Flume (unstructured)
• Process, analyze, serve: Batch (Spark, Hive, Pig, MapReduce); Stream (Spark); SQL (Impala); Search (Solr); SDK (Kite)
• Unified services: Resource management (YARN); Security (Sentry, RecordService)
• Store: Filesystem (HDFS); Relational (Kudu); NoSQL (HBase)
Slide 4: The Benefits of Hadoop...
One place for unlimited data
• All types
• More sources
• Faster, larger ingestion
Unified, multi-framework data access
• More users
• More tools
• Faster changes
Slide 5: ...Can Create Information Security Challenges
Business Manager
• Run high-value workloads in the cluster
• Quickly adopt new innovations
Information Security
• Follow established policies and procedures
• Maintain compliance
IT/Operations
• Integrate with existing IT investments
• Minimize end-user support
• Automate configuration
Slide 6: Hadoop Security Pillars
Authentication, Authorization, Audit, and Compliance
• Perimeter: guarding access to the cluster itself. Technical concepts: authentication, network isolation. (Cloudera Manager)
• Access: defining what users and applications can do with data. Technical concepts: permissions, authorization. (Apache Sentry & RecordService)
• Visibility: reporting on where data came from and how it's being used. Technical concepts: auditing, lineage. (Cloudera Navigator)
• Data: protecting data in the cluster from unauthorized visibility. Technical concepts: encryption, tokenization, data masking. (Navigator Encrypt & Key Trustee | Partners)
Slide 7: Agenda
• Hadoop Security Pillars
• Metadata Management and Data Audit
• Data Security at Rest and in Transit
Slide 8: Data Management Challenges
Compliance Officers
• Who’s accessing what data?
• What are they doing with the data?
• Is sensitive data governed and protected?
• Can I meet compliance needs?
Data Stewards/Curators
• How can I manage data from ingest to purge?
• How do I classify data efficiently?
• How can data be made available to end users?
Business Users
• How do I find what’s relevant?
• Can I trust what I find?
• How can I explore data on my own?
Database Admins
• How is data being used today?
• How can I optimize for future workloads?
• How can I take advantage of Hadoop quickly and without risk?
Slide 9: Cloudera Navigator
The only integrated data management and governance platform for Hadoop
• Metadata management
• Audit
• Policy-based data management
• Data analytics
Slide 10: Navigator Metadata Architecture
Slide 11: Metadata Extraction
• HDFS: extracts HDFS metadata at the next scheduled extraction run after an HDFS checkpoint.
• Hive: extracts database and table metadata from the Hive Metastore Server.
• Impala: extracts database and table metadata from the Hive Metastore Server, and query metadata from the Impala Daemon lineage logs.
• MapReduce: extracts job metadata from the JobTracker.
Slide 12: Metadata Extraction
• Oozie: extracts Oozie workflows from the Oozie Server.
• Pig: extracts Pig script runs from the JobTracker or Job History Server.
• Spark: extracts Spark job metadata from YARN logs.
• Sqoop 1: extracts database and table metadata from the Hive Metastore Server, and job runs from the JobTracker or Job History Server.
• YARN: extracts job metadata from the ResourceManager.
Slide 13: Metadata Indexing
• Metadata is indexed in Solr for searching:
  – Technical metadata key-value pairs, for example: fileSystemPath:/tmp/hbase-staging
  – Custom metadata key-value pairs, for example: description:Banking*
  – Hive extended attribute key-value pairs, set with:
    ALTER TABLE table_name SET TBLPROPERTIES ('key1'='value1');
• Example combined search query:
  (sourceType:hive OR sourceType:hdfs) AND (type:table OR type:directory)
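The combined query at the end of this slide groups alternatives with OR inside parentheses and joins the groups with AND. As a hedged illustration, the hypothetical MetadataQuery helper below assembles such a string from Java collections. It only builds text and does not call Navigator. It also assumes the values need no Solr escaping; paths or values containing spaces or other special characters may need quoting or escaping in practice.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical helper: builds a Solr-style search string like the slide's example.
// Assumes values contain no characters that would need Solr escaping.
final class MetadataQuery {

    // OR together all values for one field and wrap the group in parentheses.
    static String anyOf(String field, List<String> values) {
        return values.stream()
                .map(v -> field + ":" + v)
                .collect(Collectors.joining(" OR ", "(", ")"));
    }

    // AND together one OR-group per field, in the map's iteration order.
    static String allOf(Map<String, List<String>> groups) {
        return groups.entrySet().stream()
                .map(e -> anyOf(e.getKey(), e.getValue()))
                .collect(Collectors.joining(" AND "));
    }

    public static void main(String[] args) {
        Map<String, List<String>> groups = new LinkedHashMap<>();
        groups.put("sourceType", List.of("hive", "hdfs"));
        groups.put("type", List.of("table", "directory"));
        System.out.println(allOf(groups));
    }
}
```

Running main prints the slide's query, (sourceType:hive OR sourceType:hdfs) AND (type:table OR type:directory). A LinkedHashMap is used so that the groups appear in the order they were added.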
Slide 14: Self-Service Data Discovery & Analytics for Business Users
Effortlessly find and trust the data that matters most
• Search across a unified metadata repository
• Gain context and visibility into data sets
• Find similar, relevant data
Slide 15: Technical & Business Metadata
Slide 16: Modifying Metadata
• HDFS file: place a metadata file next to the data file
  – Data file: /user/test/file1.txt
  – Metadata file: /user/test/.file1.txt.navigator
    {
      "name" : "aName",
      "description" : "a description",
      "properties" : { "prop1" : "value1", "prop2" : "value2" },
      "tags" : [ "tag1" ]
    }
• REST API:
  curl http://Navigator_Metadata_Server_host:port/api/v8/entities/ -u username:password -X POST -H "Content-Type: application/json" -d '{properties}'
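The slide pairs a data file with a hidden metadata file named after it (/user/test/file1.txt becomes /user/test/.file1.txt.navigator). The sketch below derives that path and serializes the metadata shape shown on the slide. NavigatorSidecar is a hypothetical helper, not part of any Cloudera library. It hand-rolls minimal JSON escaping to stay dependency-free, and a real tool would more likely use a JSON library.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical helper for the ".<file>.navigator" metadata-file convention on the slide.
final class NavigatorSidecar {

    // /user/test/file1.txt -> /user/test/.file1.txt.navigator
    static String sidecarPath(String hdfsPath) {
        int slash = hdfsPath.lastIndexOf('/');
        String dir = hdfsPath.substring(0, slash + 1);   // keeps the trailing '/'
        String name = hdfsPath.substring(slash + 1);
        return dir + "." + name + ".navigator";
    }

    // Minimal JSON string quoting: escapes quotes, backslashes, and control characters.
    static String quote(String s) {
        StringBuilder sb = new StringBuilder("\"");
        for (char c : s.toCharArray()) {
            switch (c) {
                case '"': sb.append("\\\""); break;
                case '\\': sb.append("\\\\"); break;
                case '\n': sb.append("\\n"); break;
                default:
                    if (c < 0x20) sb.append(String.format("\\u%04x", (int) c));
                    else sb.append(c);
            }
        }
        return sb.append('"').toString();
    }

    // Serializes name, description, properties, and tags in the slide's layout.
    static String toJson(String name, String description,
                         Map<String, String> properties, List<String> tags) {
        String props = properties.entrySet().stream()
                .map(e -> quote(e.getKey()) + " : " + quote(e.getValue()))
                .collect(Collectors.joining(", ", "{ ", " }"));
        String tagList = tags.stream().map(NavigatorSidecar::quote)
                .collect(Collectors.joining(", ", "[ ", " ]"));
        return "{ \"name\" : " + quote(name)
                + ", \"description\" : " + quote(description)
                + ", \"properties\" : " + props
                + ", \"tags\" : " + tagList + " }";
    }

    public static void main(String[] args) {
        Map<String, String> props = new LinkedHashMap<>();
        props.put("prop1", "value1");
        props.put("prop2", "value2");
        System.out.println(sidecarPath("/user/test/file1.txt"));
        System.out.println(toJson("aName", "a description", props, List.of("tag1")));
    }
}
```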
Slide 17: Navigator Analytics
• Metadata: the number of files by creation and access times, size, block size, and replication count.
• Audit:
  – Activity tab: by directory, which files have been accessed with the open operation, and how many times.
  – Top Users tab: the top-n commands, and the top-n users together with the top-n commands each of those users performed.
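Both Audit views reduce to simple aggregations over audit events. In the sketch below, the AuditEvent record (user, command, path) is a hypothetical stand-in rather than Navigator's audit schema. It also assumes that "open" is the command name recorded for file reads, and it breaks ranking ties alphabetically only to make the output deterministic. Records require Java 16 or later.

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical audit event; Navigator's real audit records carry more fields.
record AuditEvent(String user, String command, String path) {}

final class AuditAnalytics {

    // Parent directory of a path, e.g. /data/sales/a.csv -> /data/sales
    static String parentDir(String path) {
        int slash = path.lastIndexOf('/');
        return slash <= 0 ? "/" : path.substring(0, slash);
    }

    // Activity view: for each directory, how many times each file was opened.
    static Map<String, Map<String, Long>> openCountsByDirectory(List<AuditEvent> events) {
        return events.stream()
                .filter(e -> e.command().equals("open"))
                .collect(Collectors.groupingBy(e -> parentDir(e.path()),
                        Collectors.groupingBy(AuditEvent::path, Collectors.counting())));
    }

    // Top Users view: the n most active users overall.
    static List<String> topUsers(List<AuditEvent> events, int n) {
        return topN(events.stream().map(AuditEvent::user), n);
    }

    // The n most frequent commands performed by one user.
    static List<String> topCommands(List<AuditEvent> events, String user, int n) {
        return topN(events.stream().filter(e -> e.user().equals(user)).map(AuditEvent::command), n);
    }

    // Count occurrences, sort by count descending (ties alphabetically), keep the first n.
    private static List<String> topN(Stream<String> keys, int n) {
        Map<String, Long> counts = keys.collect(Collectors.groupingBy(k -> k, Collectors.counting()));
        return counts.entrySet().stream()
                .sorted(Map.Entry.<String, Long>comparingByValue().reversed()
                        .thenComparing(Map.Entry.<String, Long>comparingByKey()))
                .limit(n)
                .map(Map.Entry::getKey)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<AuditEvent> events = List.of(
                new AuditEvent("alice", "open", "/data/sales/a.csv"),
                new AuditEvent("bob", "open", "/data/sales/b.csv"),
                new AuditEvent("alice", "delete", "/tmp/x"));
        System.out.println(topUsers(events, 1));
        System.out.println(openCountsByDirectory(events));
    }
}
```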
Slide 18: Navigator Audit Architecture
Slide 19: Compliance-Ready Governance & Protection for Compliance Officers
Track, understand, and protect access to sensitive data
• Search centralized audits for the entire ecosystem
• See how data is used and changing with intuitive lineage
• Protect all data with high-performance encryption and key management
• Integrate with leading partner tools
Slide 20: Policy-Based Data Management
• Automate data stewardship and curation activities with the policy engine:
  – Data archive
  – Data delete
  – Metadata management, for example automatic naming with a timestamp:
    entity.get(FSEntityProperties.ORIGINAL_NAME, Object.class) + " - " + new SimpleDateFormat("yyyy-MM-dd").format(entity.get(FSEntityProperties.CREATED, Instant.class).toDate())
• Ensure business continuity through built-in backup & disaster recovery
• Integrate with leading partner tools
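Outside the policy engine, the slide's naming rule can be reproduced in plain Java. This sketch substitutes java.time for the slide's SimpleDateFormat/Instant pair. It takes the original name and creation time as ordinary arguments, because FSEntityProperties and the entity object only exist inside Navigator's policy context. The resulting date depends on the time zone: a SimpleDateFormat built as on the slide uses the JVM's default zone, whereas this sketch takes the zone as an explicit argument.

```java
import java.time.Instant;
import java.time.ZoneId;
import java.time.format.DateTimeFormatter;

// Plain-Java sketch of the slide's naming rule: "<original name> - <yyyy-MM-dd of creation>".
// originalName and created stand in for FSEntityProperties.ORIGINAL_NAME and
// FSEntityProperties.CREATED, which are only available inside Navigator policies.
final class TimestampName {

    private static final DateTimeFormatter DAY = DateTimeFormatter.ofPattern("yyyy-MM-dd");

    static String rename(String originalName, Instant created, ZoneId zone) {
        // An Instant has no calendar date until a zone is applied.
        return originalName + " - " + DAY.withZone(zone).format(created);
    }

    public static void main(String[] args) {
        Instant created = Instant.parse("2017-07-26T10:15:30Z");
        System.out.println(rename("sales.csv", created, ZoneId.of("UTC")));
    }
}
```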
Slide 21: Lineage
• Lineage provides provenance information that shows where data came from and how it has been transformed within the EDH (enterprise data hub)
• Cloudera Navigator provides column-level lineage within the Cloudera EDH
• Integrates with certified third-party lineage solutions, such as Informatica, for enterprise-wide lineage information
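Provenance questions such as "where did this column come from?" amount to walking a lineage graph backwards. The sketch below uses a hypothetical in-memory model, not Navigator's lineage API. Each edge records that a target entity was derived from a source entity, and a breadth-first search collects every upstream ancestor. Entity names such as hive.sales.total are invented for illustration and could equally be column-level entities.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical lineage graph: addEdge(source, target) means "target was derived from source".
final class LineageGraph {

    private final Map<String, Set<String>> parents = new HashMap<>();

    void addEdge(String source, String target) {
        parents.computeIfAbsent(target, k -> new LinkedHashSet<>()).add(source);
    }

    // Breadth-first walk over parent links; returns all upstream entities in discovery order.
    Set<String> upstream(String entity) {
        Set<String> seen = new LinkedHashSet<>();
        Deque<String> queue = new ArrayDeque<>(List.of(entity));
        while (!queue.isEmpty()) {
            String current = queue.poll();
            for (String parent : parents.getOrDefault(current, Set.of())) {
                if (seen.add(parent)) {   // visit each ancestor once, even if it is shared
                    queue.add(parent);
                }
            }
        }
        return seen;
    }

    public static void main(String[] args) {
        LineageGraph g = new LineageGraph();
        g.addEdge("hdfs:/landing/orders.csv", "hive.staging.orders.amount");
        g.addEdge("hive.staging.orders.amount", "hive.sales.total");
        g.addEdge("hive.staging.fx.rate", "hive.sales.total");
        System.out.println(g.upstream("hive.sales.total"));
    }
}
```

Tracking a visited set keeps the walk linear in the size of the graph and stops it from looping if the recorded lineage ever contains a cycle.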
Slide 22: Lineage
Slide 23: End-to-End Data Management
Cloudera Navigator + Partners: Lineage, Auditing, Metadata, Augmentation, Consumption
Slide 24: Agenda
• Hadoop Security Pillars
• Metadata Management and Data Audit
• Data Security at Rest and in Transit
Slide 25: Background
• Customers increasingly want to use HDFS to store sensitive data
• Customers are often mandated to protect data at rest, for example:
  – National security data
  – Company-confidential data
• Encryption of data at rest helps mitigate certain security threats:
  – Rogue administrators (insider threat)
  – Lost or stolen hard drives
Slide 26: Over-the-Wire Encryption
• Uses certificates and TLS to encrypt, and optionally authenticate, network communication
• Customers can use commercial certificate authorities, corporate CAs, or self-signed certificates
• Active Directory Certificate Services is commonly used by customers
• Secures Hadoop data-processing components as well as Cloudera Manager agents and management services
Slide 27: Data-at-Rest Encryption
• Protects data on disk from unauthorized exposure
• Protects data both from online attacks while the system is running and from offline attacks, such as theft of physical drives
• HDFS transparent encryption at rest is an open source technology available in Apache Hadoop
• Navigator Encrypt is a proprietary technology that protects data outside HDFS:
  – Backend databases, log directories, temp directories, landing zones
• Navigator Key Trustee Server is a proprietary key management server that can integrate with an enterprise HSM
Slide 28: HDFS Encryption + Navigator Encrypt + Key Trustee
Slide 31: Navigator Key Trustee Architecture
Slide 32: Key Management Service (KMS)
• When encrypting any data, it is important to store the encryption keys securely and separately from the encrypted data
• KMS is the key management service through which HDFS encryption stores and retrieves encryption keys
• KMS is open source and provides a standard interface for pluggable key providers
• The default key provider for KMS is the Java KeyStore
• The Java KeyStore is not recommended for production key management; it is meant for development and testing
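Part of the reason a Java KeyStore counts as a development-grade key provider is visible in what it is: a password-protected file of secret keys, so anyone who obtains the file and its passwords obtains every key. The JDK-only sketch below stores an AES key in an in-memory JCEKS keystore and reads it back. The JCEKS type is chosen here because it can hold secret-key entries. Treat any resemblance to the KMS's internal file layout as an assumption, and treat the alias and passwords as placeholders.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.security.GeneralSecurityException;
import java.security.KeyStore;
import javax.crypto.KeyGenerator;
import javax.crypto.SecretKey;

// Sketch: a JCEKS keystore holding an AES key, serialized to bytes and reloaded.
// On a real KMS host the bytes would be a file on disk; alias and passwords are placeholders.
final class JceksKeyDemo {

    static SecretKey newAesKey() {
        try {
            KeyGenerator gen = KeyGenerator.getInstance("AES");
            gen.init(128);
            return gen.generateKey();
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    static byte[] save(String alias, SecretKey key, char[] storePass, char[] keyPass) {
        try {
            KeyStore ks = KeyStore.getInstance("JCEKS");
            ks.load(null, storePass);                                // create an empty keystore
            ks.setEntry(alias, new KeyStore.SecretKeyEntry(key),
                    new KeyStore.PasswordProtection(keyPass));       // per-entry password
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            ks.store(out, storePass);                                // integrity-protected by the store password
            return out.toByteArray();
        } catch (GeneralSecurityException | IOException e) {
            throw new IllegalStateException("keystore write failed", e);
        }
    }

    static SecretKey load(byte[] stored, String alias, char[] storePass, char[] keyPass) {
        try {
            KeyStore ks = KeyStore.getInstance("JCEKS");
            ks.load(new ByteArrayInputStream(stored), storePass);   // fails on a wrong store password
            return (SecretKey) ks.getKey(alias, keyPass);
        } catch (GeneralSecurityException | IOException e) {
            throw new IllegalStateException("keystore read failed", e);
        }
    }

    public static void main(String[] args) {
        SecretKey key = newAesKey();
        byte[] blob = save("myKey", key, "storepw".toCharArray(), "keypw".toCharArray());
        SecretKey back = load(blob, "myKey", "storepw".toCharArray(), "keypw".toCharArray());
        System.out.println(java.util.Arrays.equals(key.getEncoded(), back.getEncoded()));
    }
}
```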
Slide 33: Key Management Service (KMS)
• Encryption occurs on the requesting client:
  – Data is encrypted before it lands on disk.
  – The KMS encrypts and decrypts key material (data encryption keys), not file content.
  – The KMS does not store keys itself; they are kept by the backing key provider.
Slide 34: KMS Proxy Deployment Considerations
Slide 35: KMS Proxy Deployment Considerations
Slide 36: Navigator Key Trustee
• Provides secure, centralized, and scalable key storage and administration
• Is not open source; it is licensed with Cloudera Navigator
• Is the recommended option for production deployments
• Provides hooks to integrate with hardware security modules for physically tamper-proof requirements (FIPS 140-2 Level 3)
• Also provides centralized key management for Navigator Encrypt
Slide 37: Key HSM
• Customers may choose to use hardware security modules (HSMs) to improve the security of their key store.
• Key HSM is a universal HSM driver.
• It acts as a translator between the target HSM platform and Key Trustee.
Slide 38: Hardware Security Module (HSM)
• A number of vendors provide HSMs.
• They are available as appliances and as attachable physical hardware.
• If an HSM is configured with Key Trustee, it is used as the root of trust.
• Data inside the Key Trustee keystore is encrypted by this root of trust.
• The HSM "master" keys are generated in the HSM and never leave it.
Slide 39: HDFS Encryption Workflow
Slide 40: HDFS Encryption, Involved Parties
• Parties: Client, HDFS, KMS, Key Trustee, zHSM, and an optional HSM
• Two authorization checks: file authorization (HDFS) and key authorization (KMS)
Slide 41: Keys Used in Encryption at Rest
HDFS encryption uses three kinds of key:
• Encryption Zone Key (EZ key): much like a mount key, this key is associated with an encryption zone in HDFS.
• Encrypted Data Encryption Key (EDEK): an encrypted copy of a data encryption key.
• Data Encryption Key (DEK): the actual key used to encrypt data stored within a file, zone, or block device. This key concept is used in both Navigator Encrypt and HDFS Transparent Data Encryption (TDE).
Slide 42: Keys Used in Encryption at Rest
(1) When an encryption zone (EZ) is created, the administrator specifies an encryption zone key (EZ key) that is already stored in the backing keystore. The EZ key encrypts the data encryption keys (DEKs) that are in turn used to encrypt each file. Each DEK is encrypted with the EZ key to form an encrypted data encryption key (EDEK), which is stored on the NameNode as an extended attribute of the file.
(2) To encrypt a file, the client retrieves a new EDEK from the NameNode and then asks the KMS to decrypt it with the corresponding EZ key. This yields a DEK.
(3) The client uses the DEK to encrypt its data.
(4) To decrypt a file, the client again needs the file's EDEK decrypted with the EZ key to obtain the DEK, as in step (2). The client then reads the encrypted data and decrypts it with the DEK.
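The four steps on this slide can be traced end to end with JDK crypto alone. This is a toy model under stated assumptions, not HDFS or KMS code. The files map stands in for the NameNode's extended attributes plus the stored blocks, and AESWrap stands in for however the KMS actually encrypts DEKs. File contents use AES/CTR/NoPadding, the mode HDFS uses by default. The sketch shows that each file gets its own DEK, that only the EDEK is persisted beside the data, and that reading requires the EZ key. It uses a nested record, so it needs Java 16 or later.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.SecureRandom;
import java.util.HashMap;
import java.util.Map;
import javax.crypto.Cipher;
import javax.crypto.KeyGenerator;
import javax.crypto.SecretKey;
import javax.crypto.spec.IvParameterSpec;

// Toy model of the EZ key -> DEK -> EDEK scheme; storage and key wrapping are simplified stand-ins.
final class EncryptionZoneSketch {

    // What is persisted per file: the EDEK (as an xattr would hold it), the IV, and the ciphertext.
    record StoredFile(byte[] edek, byte[] iv, byte[] ciphertext) {}

    private final SecretKey ezKey;                          // held by the key store, never stored with files
    final Map<String, StoredFile> files = new HashMap<>();
    private final SecureRandom random = new SecureRandom();

    EncryptionZoneSketch(SecretKey ezKey) { this.ezKey = ezKey; }

    static SecretKey newAesKey() {
        try {
            KeyGenerator gen = KeyGenerator.getInstance("AES");
            gen.init(128);
            return gen.generateKey();
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    // Steps (1)-(3): fresh DEK per file, wrap it into an EDEK, encrypt the data with the DEK.
    void write(String path, String contents) {
        try {
            SecretKey dek = newAesKey();
            Cipher wrap = Cipher.getInstance("AESWrap");
            wrap.init(Cipher.WRAP_MODE, ezKey);
            byte[] edek = wrap.wrap(dek);
            byte[] iv = new byte[16];
            random.nextBytes(iv);
            byte[] ct = ctr(Cipher.ENCRYPT_MODE, dek, iv, contents.getBytes(StandardCharsets.UTF_8));
            files.put(path, new StoredFile(edek, iv, ct));  // the plaintext DEK is not kept
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    // Step (4): unwrap the EDEK with the EZ key to recover the DEK, then decrypt.
    String read(String path) {
        try {
            StoredFile f = files.get(path);
            Cipher unwrap = Cipher.getInstance("AESWrap");
            unwrap.init(Cipher.UNWRAP_MODE, ezKey);
            SecretKey dek = (SecretKey) unwrap.unwrap(f.edek(), "AES", Cipher.SECRET_KEY);
            return new String(ctr(Cipher.DECRYPT_MODE, dek, f.iv(), f.ciphertext()), StandardCharsets.UTF_8);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    private static byte[] ctr(int mode, SecretKey key, byte[] iv, byte[] data)
            throws GeneralSecurityException {
        Cipher c = Cipher.getInstance("AES/CTR/NoPadding");
        c.init(mode, key, new IvParameterSpec(iv));
        return c.doFinal(data);
    }

    public static void main(String[] args) {
        EncryptionZoneSketch zone = new EncryptionZoneSketch(newAesKey());
        zone.write("/zone/file1.txt", "sensitive record");
        System.out.println(zone.read("/zone/file1.txt"));
    }
}
```

Because only the EDEK is persisted beside the data, a copy of the stored files is useless without access to the EZ key. In the real system, that access is gated by the KMS.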
Slide 43: HDFS Encryption, Writing a File
Parties: Client, HDFS, KMS (which calls out to Key Trustee)
1. Client → HDFS: create file
2. HDFS → KMS: generate key
3. KMS → HDFS: encrypted key
4. HDFS: store encrypted key
5. HDFS → Client: file handle & encrypted key
6. Client → KMS: decrypt encrypted key
7. KMS → Client: decrypted key
8. Client: encrypt & write data
Slide 44: HDFS Encryption, Reading a File
Parties: Client, HDFS, KMS (which calls out to Key Trustee)
1. Client → HDFS: open file (after passing the read permission check)
2. HDFS → Client: file handle & encrypted key
3. Client → KMS: decrypt encrypted key
4. KMS → Client: decrypted key
5. Client: read & decrypt data
Slide 45: HDFS Encryption Implementation and Usage
Slide 46: Enabling HDFS Encryption on a Cluster
• A recent version of libcrypto.so is needed on HDFS and MapReduce client hosts.
• To check, run:
  hadoop checknative
  The output should include a line such as: openssl: true /usr/lib64/libcrypto.so
• To install: yum install openssl openssl-devel
• The openssl package installs the library; openssl-devel creates the libcrypto.so symlink (you can also create the symlink manually).
• OpenSSL provides AES-NI integration on Intel hardware.
Slide 47: Enabling HDFS Encryption on a Cluster Using Cloudera Manager
1) Add the KMS service: add the Java KeyStore KMS service on a host.
2) Enable Java KeyStore KMS for the HDFS service:
   • HDFS service > Configuration tab
   • Scope > HDFS (Service-Wide)
   • Category > All
   • KMS Service property: select the Java KeyStore KMS radio button
   • Click Save Changes
3) Restart the cluster.
4) Deploy the client configuration.
Slide 48: Creating Encryption Zones
• Use the hadoop key and hdfs crypto command-line tools to create encryption keys and set up new encryption zones.
  # Create an encryption key for your zone as the application user that will be using the key
  $ hadoop key create myKey
  # Create a new empty directory and make it an encryption zone (requires HDFS superuser)
  $ hadoop fs -mkdir /zone
  $ hdfs crypto -createZone -keyName myKey -path /zone
  # List the encryption zones (requires HDFS superuser)
  $ hdfs crypto -listZones
Slide 49: Adding Files to an Encryption Zone
• Encryption zones start empty: you cannot create a zone on a directory that already contains data. Copy data in instead, for example:
  hadoop distcp /user/dir /user/enczone
• By default, distcp compares checksums provided by the file system to verify that data was copied successfully to the destination.
• When copying between an unencrypted and an encrypted location, the file system checksums will not match, because the underlying block data is different.
• Use the -skipcrccheck and -update flags to avoid verifying checksums.
• Also use the distcp flags that preserve all attributes (-prbugpcaxt).
Slide 50: Cloudera Navigator
The only integrated data management and governance platform for Hadoop
Unified governance foundation: unified auditing, comprehensive lineage, unified metadata, universal policies
• Self-Service Discovery & Analytics (search, define, analyze, profile): effortlessly find and trust the data that matters most
• Compliance-Ready Governance & Protection (audit, track, encrypt, manage keys): track, understand, and protect access to sensitive data
• Active Data Optimization (report, optimize, migrate, maintain models): configure Hadoop to boost user productivity
• Hadoop-Scale Data Lifecycle Management (classify, steward, backup, retain): maximize cluster performance at Hadoop scale with ease
Slide 51: MasterCard
Cloudera: The First PCI-Certified Hadoop Platform
Challenge: All applications, databases, or file systems that have the potential to handle personal account-related data must undergo full PCI certification.
Solution: MasterCard’s Cloudera environment fully conforms to the PCI-DSS v2.0 security standards, so it can host PCI datasets and potentially integrate with other internal systems.
“Data privacy and protection is a top priority for MasterCard. As we maximize the most advanced technologies from partners and vendors, they must meet the rigorous security standards we’ve set. With Cloudera’s commitment to the same standards, we now have additional options in how we manage our data center.”
Gary VonderHaar, Chief Technology Officer, Architecture, MasterCard
jarred@cloudera.com