1© Cloudera, Inc. All rights reserved.
Bringing Trust and Visibility to
Apache Hadoop
Mark Donsky, Product Management, Cloudera
Chang She, Software Engineering, Cloudera
The benefits of Hadoop...
One place for unlimited data
• All types
• More sources
• Faster, larger ingestion
Unified, multi-framework data access
• More users
• More tools
• Faster changes
…Cause trust, visibility, and governance challenges
Business Users
• How do I find what’s relevant?
• Can I trust what I find?
• How can I explore data on my own?
Information Security
• Who’s accessing what data?
• What are they doing with the data?
• Is sensitive data governed and protected?
• Can I meet compliance needs?
Database Admins
• How is data being used today?
• How can I optimize for future workloads?
• How can I take advantage of Hadoop risk-free and fast?
Building blocks of governance in Hadoop
• Audit Logs
• Lineage
• Data Policies
• Technical Metadata
• Business Metadata
Metadata
Enterprise metadata
The foundation for governance
Metadata enables you to put context and meaning to data
Operational
• Job Run-Time Stats
• Report Run Information
• Hardware Usage
• Scheduler Stats
Technical
• Database Schema
• File Definition
• ETL Job Design
• BI Report Definition
• Data Model
Business
• Business Glossary
• Enterprise Taxonomy
• Ontology
Enables
• Data Lineage
• Impact Analysis
• Topology Understanding
• Data Governance
• Compliance Audits
Enterprise metadata
The foundation for governance
Metadata enables you to put context and meaning to data to answer the important questions
Unified Metadata Repository: Business, Technical, Operational
• What data or information exists?
• Where is data being used?
• What is the data’s business definition?
• Who is responsible for the data?
• How is it inter-related to other data?
• Who is using the data?
• Why do we need this data?
• Can we trust this data?
• When was this data last updated?

• Who are the high-value customers?
• How do we define that?
• How is high value calculated?
• Where is customer data stored and used?
• Is the data reliable and accurate?
Technical metadata – what’s available?
Hive
• Query Text
• Table name
• Column name
• Data Type
• Owner
• Partitions
Pig
• Script name
• Owner
• Creation date
• Last modified date
HDFS
• Permissions
• Owner
• Group
• Creation date
• Last modified date
MR/YARN
• JobID
• Mapper Class
• Reducer Class
• Inputs
• Outputs
Technical metadata – where can I find it?
Component    Metadata
HDFS         fsimage (ls -lRa /)
Hive         Hive Metastore Server (database metadata tables)
MapReduce    JobTracker
YARN         Job History Server
Oozie        Oozie Server
Pig          JobTracker, Job History Server
Technical metadata – Hive metastore
Collection of structured tables containing technical metadata about Hive databases, tables, views, and columns
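To make "structured tables" concrete, here is a minimal sketch that queries a reduced stand-in for the metastore schema. The table and column names (DBS, TBLS, SDS) follow the commonly seen metastore layout, but the schema here is cut down to a few columns and loaded into an in-memory SQLite database as an assumption for illustration; a real metastore usually lives in MySQL, PostgreSQL, or similar, and its schema should be checked against your Hive version. The sample row is illustrative.

```python
import sqlite3

# Reduced, illustrative stand-in for three metastore tables.
# Assumption: real deployments have many more columns and tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE DBS  (DB_ID INTEGER PRIMARY KEY, NAME TEXT);
CREATE TABLE SDS  (SD_ID INTEGER PRIMARY KEY, LOCATION TEXT);
CREATE TABLE TBLS (TBL_ID INTEGER PRIMARY KEY, DB_ID INTEGER, SD_ID INTEGER,
                   TBL_NAME TEXT, OWNER TEXT, TBL_TYPE TEXT);
INSERT INTO DBS  VALUES (1, 'default');
INSERT INTO SDS  VALUES (10, '/user/hive/warehouse/salesdata');
INSERT INTO TBLS VALUES (100, 1, 10, 'salesdata', 'admin', 'MANAGED_TABLE');
""")


def list_tables(connection):
    """Return (database, table, owner, type, location) for every table."""
    query = """
        SELECT d.NAME, t.TBL_NAME, t.OWNER, t.TBL_TYPE, s.LOCATION
        FROM TBLS t
        JOIN DBS d ON d.DB_ID = t.DB_ID
        JOIN SDS s ON s.SD_ID = t.SD_ID
        ORDER BY d.NAME, t.TBL_NAME
    """
    return connection.execute(query).fetchall()


rows = list_tables(conn)
for row in rows:
    print(row)
```

The join from TBLS to SDS is what ties a table name to its HDFS location, which is the same link that lineage tracking relies on.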
Technical metadata – HCatalog
• HCatalog uses the Hive Metastore to provide a management layer
• Abstracts the file location and storage format
• Makes formats available to Pig, Hive, MapReduce, etc.
• Also accessible via REST API
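As a rough illustration of the REST access mentioned above, the sketch below only builds the URL for listing the tables in a database through WebHCat (the HCatalog REST server, formerly Templeton). The host name and user are hypothetical, and port 50111 is assumed to be the default; verify both the port and the resource path against your own deployment before relying on them.

```python
from urllib.parse import quote, urlencode


def webhcat_tables_url(host, database, user, port=50111):
    """Build the WebHCat URL that lists the tables in one database.

    Assumptions: WebHCat listens on `port` and exposes DDL resources
    under /templeton/v1/ddl; `host` and `user` are placeholders.
    """
    path = "/templeton/v1/ddl/database/{}/table".format(quote(database, safe=""))
    return "http://{}:{}{}?{}".format(host, port, path, urlencode({"user.name": user}))


url = webhcat_tables_url("webhcat.example.com", "default", "training")
print(url)
```

An HTTP GET on a URL of this shape should return a JSON document; no request is sent here so the sketch stays runnable without a cluster.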
Business metadata – can we do this in Hadoop?
• Custom metadata is vital for trust and visibility
  • Find all files associated with a particular clinical trial
  • Locate all statements for high-profile customers
  • Where is my sensitive data?
  • Where is the protected health information?
• No - Hadoop doesn’t support business metadata
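The kind of lookup these questions call for can be sketched with a small tag index kept outside Hadoop. Everything below is hypothetical: the class name `TagIndex`, the tag keys, and the file paths are illustrations of the idea, not part of Hadoop or of any Cloudera product.

```python
from collections import defaultdict


class TagIndex:
    """In-memory map from (key, value) business tags to entity paths."""

    def __init__(self):
        self._by_tag = defaultdict(set)

    def tag(self, path, **properties):
        """Attach key=value business metadata to a path."""
        for key, value in properties.items():
            self._by_tag[(key, value)].add(path)

    def find(self, **properties):
        """Return sorted paths that carry every requested key=value tag."""
        sets = [self._by_tag.get(item, set()) for item in properties.items()]
        if not sets:
            return []
        return sorted(set.intersection(*sets))


index = TagIndex()
index.tag("/data/trials/ct-001/results.csv", trial="CT-001", sensitivity="PHI")
index.tag("/data/trials/ct-001/sites.csv", trial="CT-001")
index.tag("/data/statements/2014-04.csv", sensitivity="PII")

trial_files = index.find(trial="CT-001")   # all files for one clinical trial
phi_files = index.find(sensitivity="PHI")  # where is the protected health information?
print(trial_files, phi_files)
```

A real system would also need persistence and a way to keep tags attached when files move, which is part of why this is hard to do with Hadoop alone.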
Hadoop Auditing
Hadoop audit logs – what do they look like?
• Logs all file system access requests
• Impala, HBase and other components use a similar format
• Implemented in log4j at the INFO level
• HDFS property: log4j.logger.org.apache.hadoop.hdfs.server.namenode.FSNamesystem.audit

HDFS Audit Log
{ "allowed": true,
"serviceName": "HDFS-1",
"username": "training",
"src": "/user",
"eventTime": 1398544478141,
"ipAddress": "10.20.187.39",
"operation": "getfileinfo",
"dest": null,
"permissions": null,
"impersonator": null,
"delegationTokenId": null
}

Hive Audit Log
{ "serviceName": "HIVE-1",
"username": "admin",
"impersonator": null,
"ipAddress": "10.20.187.39",
"operation": "QUERY",
"eventTime": 1398402718797,
"operationText": "select count(*) from salesdata",
"allowed": true,
"databaseName": "default",
"tableName": "salesdata",
"resourcePath": "/user/hive/warehouse/salesdata",
"objectType": "TABLE"
}
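Because each audit event in the samples above is a JSON object, it can be parsed with standard tooling. The sketch below is a minimal example that assumes one JSON event per line and that `eventTime` is milliseconds since the Unix epoch (the sample values are consistent with that, but the slides do not state it). The events are abbreviated copies of the two samples.

```python
import json
from collections import Counter
from datetime import datetime, timezone

# Abbreviated copies of the two sample events, one JSON object per line.
SAMPLE_EVENTS = [
    '{"allowed": true, "serviceName": "HDFS-1", "username": "training", '
    '"src": "/user", "eventTime": 1398544478141, "operation": "getfileinfo"}',
    '{"allowed": true, "serviceName": "HIVE-1", "username": "admin", '
    '"eventTime": 1398402718797, "operation": "QUERY", "tableName": "salesdata"}',
]


def parse_event(line):
    """Parse one audit line and add a timezone-aware timestamp."""
    event = json.loads(line)
    # Assumption: eventTime is epoch milliseconds.
    event["timestamp"] = datetime.fromtimestamp(event["eventTime"] / 1000, tz=timezone.utc)
    return event


events = [parse_event(line) for line in SAMPLE_EVENTS]

# Answer "who is doing what?" by counting operations per user.
ops_by_user = Counter((e["username"], e["operation"]) for e in events)
for (user, operation), count in sorted(ops_by_user.items()):
    print(user, operation, count)
```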
Hadoop audit logs – where can I find them?
Component            Default Location (CDH)
HDFS Audit Logs      /var/log/hadoop-hdfs/audit
Hive Audit Logs      /var/log/hive/audit
Impala Audit Logs    /var/log/impalad/audit
HBase Audit Logs     /var/log/hbase/audit
• Log files are automatically rotated when a size limit is reached
• Location and size limit are configurable
Hadoop audit logs – limitations
• Consolidation
• Persistence
• Filtering
• Integration
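To show what consolidation and filtering would involve if done by hand, here is a small sketch that merges per-service event streams by time and then filters the result. It assumes events are already parsed into dictionaries with the field names used in the sample audit logs and that each stream is sorted by `eventTime`; the helper names are hypothetical.

```python
import heapq


def consolidate(*streams):
    """Merge per-service event streams (each sorted by eventTime) into one."""
    return heapq.merge(*streams, key=lambda event: event["eventTime"])


def filter_events(events, username=None, allowed=None):
    """Keep events matching the given username and/or allowed flag."""
    kept = []
    for event in events:
        if username is not None and event["username"] != username:
            continue
        if allowed is not None and event["allowed"] != allowed:
            continue
        kept.append(event)
    return kept


hdfs_events = [
    {"serviceName": "HDFS-1", "username": "training", "operation": "getfileinfo",
     "eventTime": 1398544478141, "allowed": True},
]
hive_events = [
    {"serviceName": "HIVE-1", "username": "admin", "operation": "QUERY",
     "eventTime": 1398402718797, "allowed": True},
]

timeline = list(consolidate(hdfs_events, hive_events))
admin_only = filter_events(timeline, username="admin")
denied = filter_events(timeline, allowed=False)
print([e["serviceName"] for e in timeline])
```

Persistence and integration with external systems are not covered by this sketch; rotated log files would still have to be collected and stored somewhere durable.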
Lineage
Lineage – how to track lineage
• Not easily: unless you use a tool like Cloudera Navigator, you have to track lineage manually
• But lineage is embedded in Hadoop technical metadata
  • Job configurations provide inputs/outputs
  • Hive metastore provides the location of the HDFS directory where data resides
  • Hive/Impala queries can be interpreted to provide fine-grained column-level lineage between query inputs and outputs
  • Some relationships (e.g., directory–file) are implicit
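The idea that job inputs and outputs already encode lineage can be sketched as a small graph walk. The job records and paths below are made up for illustration, and the record shape (`inputs` and `outputs` lists) is an assumption rather than the format of any real job configuration.

```python
from collections import defaultdict


def build_upstream(jobs):
    """Map each output dataset to the set of datasets it was derived from."""
    upstream = defaultdict(set)
    for job in jobs:
        for output in job["outputs"]:
            upstream[output].update(job["inputs"])
    return upstream


def ancestors(upstream, dataset):
    """All datasets that directly or transitively feed `dataset`."""
    seen, stack = set(), [dataset]
    while stack:
        for parent in upstream.get(stack.pop(), ()):
            if parent not in seen:  # also guards against cycles
                seen.add(parent)
                stack.append(parent)
    return seen


# Hypothetical job records: each lists the paths it reads and writes.
jobs = [
    {"id": "job_1", "inputs": ["/raw/sales"], "outputs": ["/staging/sales_clean"]},
    {"id": "job_2", "inputs": ["/staging/sales_clean", "/raw/customers"],
     "outputs": ["/user/hive/warehouse/salesdata"]},
]

upstream = build_upstream(jobs)
sources = ancestors(upstream, "/user/hive/warehouse/salesdata")
print(sorted(sources))
```

This covers dataset-level lineage only; column-level lineage would require parsing the Hive or Impala query text, which is beyond this sketch.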
Data Policies
Data policies – Hadoop limitations
• Information is of limited use unless it is actionable
• There is a treasure trove of actionable information in the metadata that the various Hadoop services emit
  • Archival of unused data
  • Encryption of sensitive data
  • Remediation of incorrect permissions
• Triggers should be configurable based on user-defined criteria
• Hadoop does not offer a sufficient policy engine or action framework
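A policy engine of the kind described here pairs a user-defined trigger with an action. The sketch below is one possible shape under stated assumptions: the `Policy` class, the entity fields (`last_access_ms`, `tags`, `encrypted`, `mode`), the thresholds, and the fixed clock value are all hypothetical, and the actions are returned as labels rather than executed.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

DAY_MS = 24 * 60 * 60 * 1000
NOW_MS = 1_400_000_000_000  # fixed "current time" so the example is repeatable


@dataclass
class Policy:
    name: str
    matches: Callable[[Dict], bool]  # user-defined trigger criteria
    action: str                      # label of the action to take


POLICIES = [
    # Archive data that has not been accessed for a year.
    Policy("archive-unused", lambda e: NOW_MS - e["last_access_ms"] > 365 * DAY_MS, "archive"),
    # Encrypt anything tagged sensitive that is not yet encrypted.
    Policy("encrypt-sensitive", lambda e: "sensitive" in e["tags"] and not e["encrypted"], "encrypt"),
    # Flag world-writable paths (write bit set in the last octal digit).
    Policy("fix-permissions", lambda e: e["mode"][-1] in "2367", "restrict-permissions"),
]


def evaluate(policies: List[Policy], entities: List[Dict]) -> List[Tuple[str, str, str]]:
    """Return (policy name, action, path) for every entity a policy matches."""
    return [(p.name, p.action, e["path"]) for e in entities for p in policies if p.matches(e)]


ENTITIES = [
    {"path": "/data/old/logs", "last_access_ms": NOW_MS - 400 * DAY_MS,
     "tags": set(), "encrypted": False, "mode": "750"},
    {"path": "/data/phi/records", "last_access_ms": NOW_MS - DAY_MS,
     "tags": {"sensitive"}, "encrypted": False, "mode": "777"},
]

pending = evaluate(POLICIES, ENTITIES)
for item in pending:
    print(item)
```

Keeping the trigger as a plain callable is what makes the criteria user-definable; a production engine would also need scheduling, auditing of the actions it takes, and safe rollback.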
Building blocks of trust and visibility in Hadoop
• Audit Logs
• Lineage
• Data Policies
• Technical Metadata
• Business Metadata
Cloudera Navigator
Overview & Demo
Cloudera Navigator
The only integrated data management and governance platform for Hadoop
Governance & Foundational Layer: Business Metadata, Technical Metadata, Lineage, Policies, Audit Logs

Self-Service Discovery & Analytics
Data Scientists & BI Users
Effortlessly find and trust the data that matters most
• Search
• Data definitions
• Analytics
• Profiling

Usage-Driven Model Optimization
Hadoop Administrators & DBAs
Configure Hadoop to boost user productivity
• Migration
• Optimization
• Reporting
• Model maintenance

Compliance-Ready Governance & Protection
Information Security
Track, understand and protect access to sensitive data
• Auditing
• Lineage
• Encryption
• Key management

Active Data Management & Information Lifecycle Management
Data Stewards & Curators
Maximize cluster performance at Hadoop scale with ease
• Classification
• Stewardship
• Backup
• Retention
Trust and visibility is an ecosystem
More than 1,600 partners ensure compatibility with existing investments, lower skill barriers, and help maximize value from your data.
• Data Systems
• System Integration
• Infrastructure
• Operational Tools
• Applications
Enterprise Data Hub
• Security and Administration
• Unlimited Storage
• Process, Discover, Model, Serve
Learn more!
Please stop by our booth at P13
• See a demo of Cloudera Enterprise, including our governance solution that has been used by nearly 200 production customers for over two years!
• Find out what makes Cloudera Enterprise the only PCI-certified Hadoop distribution
• Learn about our 1600+ partner ecosystem
Thank You!
@markdonsky
@changhiskhan

Mais conteúdo relacionado

Mais procurados

Hortonworks Oracle Big Data Integration
Hortonworks Oracle Big Data Integration Hortonworks Oracle Big Data Integration
Hortonworks Oracle Big Data Integration Hortonworks
 
Voltage Security, Protecting Sensitive Data in Hadoop
Voltage Security, Protecting Sensitive Data in HadoopVoltage Security, Protecting Sensitive Data in Hadoop
Voltage Security, Protecting Sensitive Data in HadoopHPE Security - Data Security
 
Extend Governance in Hadoop with Atlas Ecosystem: Waterline, Attivo & Trifacta
Extend Governance in Hadoop with Atlas Ecosystem: Waterline, Attivo & TrifactaExtend Governance in Hadoop with Atlas Ecosystem: Waterline, Attivo & Trifacta
Extend Governance in Hadoop with Atlas Ecosystem: Waterline, Attivo & TrifactaDataWorks Summit/Hadoop Summit
 
Implementing a Data Lake with Enterprise Grade Data Governance
Implementing a Data Lake with Enterprise Grade Data GovernanceImplementing a Data Lake with Enterprise Grade Data Governance
Implementing a Data Lake with Enterprise Grade Data GovernanceHortonworks
 
Security and Governance on Hadoop with Apache Atlas and Apache Ranger by Srik...
Security and Governance on Hadoop with Apache Atlas and Apache Ranger by Srik...Security and Governance on Hadoop with Apache Atlas and Apache Ranger by Srik...
Security and Governance on Hadoop with Apache Atlas and Apache Ranger by Srik...Artem Ervits
 
HDP Advanced Security: Comprehensive Security for Enterprise Hadoop
HDP Advanced Security: Comprehensive Security for Enterprise HadoopHDP Advanced Security: Comprehensive Security for Enterprise Hadoop
HDP Advanced Security: Comprehensive Security for Enterprise HadoopHortonworks
 
Data Governance - Atlas 7.12.2015
Data Governance - Atlas 7.12.2015Data Governance - Atlas 7.12.2015
Data Governance - Atlas 7.12.2015Hortonworks
 
Enterprise Apache Hadoop: State of the Union
Enterprise Apache Hadoop: State of the UnionEnterprise Apache Hadoop: State of the Union
Enterprise Apache Hadoop: State of the UnionHortonworks
 
Hortonworks and Clarity Solution Group
Hortonworks and Clarity Solution Group Hortonworks and Clarity Solution Group
Hortonworks and Clarity Solution Group Hortonworks
 
Data Discovery & Lineage in Enterprise Hadoop
Data Discovery & Lineage in Enterprise HadoopData Discovery & Lineage in Enterprise Hadoop
Data Discovery & Lineage in Enterprise HadoopDataWorks Summit
 
Create a Smarter Data Lake with HP Haven and Apache Hadoop
Create a Smarter Data Lake with HP Haven and Apache HadoopCreate a Smarter Data Lake with HP Haven and Apache Hadoop
Create a Smarter Data Lake with HP Haven and Apache HadoopHortonworks
 
Enterprise Hadoop with Hortonworks and Nimble Storage
Enterprise Hadoop with Hortonworks and Nimble StorageEnterprise Hadoop with Hortonworks and Nimble Storage
Enterprise Hadoop with Hortonworks and Nimble StorageHortonworks
 
Discover Red Hat and Apache Hadoop for the Modern Data Architecture - Part 3
Discover Red Hat and Apache Hadoop for the Modern Data Architecture - Part 3Discover Red Hat and Apache Hadoop for the Modern Data Architecture - Part 3
Discover Red Hat and Apache Hadoop for the Modern Data Architecture - Part 3Hortonworks
 
Hortonworks and Red Hat Webinar - Part 2
Hortonworks and Red Hat Webinar - Part 2Hortonworks and Red Hat Webinar - Part 2
Hortonworks and Red Hat Webinar - Part 2Hortonworks
 
A Tale of Two Regulations: Cross-Border Data Protection For Big Data Under GD...
A Tale of Two Regulations: Cross-Border Data Protection For Big Data Under GD...A Tale of Two Regulations: Cross-Border Data Protection For Big Data Under GD...
A Tale of Two Regulations: Cross-Border Data Protection For Big Data Under GD...DataWorks Summit/Hadoop Summit
 
Apache Atlas. Data Governance for Hadoop. Strata London 2015
Apache Atlas. Data Governance for Hadoop. Strata London 2015Apache Atlas. Data Governance for Hadoop. Strata London 2015
Apache Atlas. Data Governance for Hadoop. Strata London 2015Sean Roberts
 
Top Three Big Data Governance Issues and How Apache ATLAS resolves it for the...
Top Three Big Data Governance Issues and How Apache ATLAS resolves it for the...Top Three Big Data Governance Issues and How Apache ATLAS resolves it for the...
Top Three Big Data Governance Issues and How Apache ATLAS resolves it for the...DataWorks Summit/Hadoop Summit
 
Don't Let Security Be The 'Elephant in the Room'
Don't Let Security Be The 'Elephant in the Room'Don't Let Security Be The 'Elephant in the Room'
Don't Let Security Be The 'Elephant in the Room'Hortonworks
 
The Next Generation of Big Data Analytics
The Next Generation of Big Data AnalyticsThe Next Generation of Big Data Analytics
The Next Generation of Big Data AnalyticsHortonworks
 

Mais procurados (20)

Hortonworks Oracle Big Data Integration
Hortonworks Oracle Big Data Integration Hortonworks Oracle Big Data Integration
Hortonworks Oracle Big Data Integration
 
Voltage Security, Protecting Sensitive Data in Hadoop
Voltage Security, Protecting Sensitive Data in HadoopVoltage Security, Protecting Sensitive Data in Hadoop
Voltage Security, Protecting Sensitive Data in Hadoop
 
Extend Governance in Hadoop with Atlas Ecosystem: Waterline, Attivo & Trifacta
Extend Governance in Hadoop with Atlas Ecosystem: Waterline, Attivo & TrifactaExtend Governance in Hadoop with Atlas Ecosystem: Waterline, Attivo & Trifacta
Extend Governance in Hadoop with Atlas Ecosystem: Waterline, Attivo & Trifacta
 
Enterprise Data Classification and Provenance
Enterprise Data Classification and ProvenanceEnterprise Data Classification and Provenance
Enterprise Data Classification and Provenance
 
Implementing a Data Lake with Enterprise Grade Data Governance
Implementing a Data Lake with Enterprise Grade Data GovernanceImplementing a Data Lake with Enterprise Grade Data Governance
Implementing a Data Lake with Enterprise Grade Data Governance
 
Security and Governance on Hadoop with Apache Atlas and Apache Ranger by Srik...
Security and Governance on Hadoop with Apache Atlas and Apache Ranger by Srik...Security and Governance on Hadoop with Apache Atlas and Apache Ranger by Srik...
Security and Governance on Hadoop with Apache Atlas and Apache Ranger by Srik...
 
HDP Advanced Security: Comprehensive Security for Enterprise Hadoop
HDP Advanced Security: Comprehensive Security for Enterprise HadoopHDP Advanced Security: Comprehensive Security for Enterprise Hadoop
HDP Advanced Security: Comprehensive Security for Enterprise Hadoop
 
Data Governance - Atlas 7.12.2015
Data Governance - Atlas 7.12.2015Data Governance - Atlas 7.12.2015
Data Governance - Atlas 7.12.2015
 
Enterprise Apache Hadoop: State of the Union
Enterprise Apache Hadoop: State of the UnionEnterprise Apache Hadoop: State of the Union
Enterprise Apache Hadoop: State of the Union
 
Hortonworks and Clarity Solution Group
Hortonworks and Clarity Solution Group Hortonworks and Clarity Solution Group
Hortonworks and Clarity Solution Group
 
Data Discovery & Lineage in Enterprise Hadoop
Data Discovery & Lineage in Enterprise HadoopData Discovery & Lineage in Enterprise Hadoop
Data Discovery & Lineage in Enterprise Hadoop
 
Create a Smarter Data Lake with HP Haven and Apache Hadoop
Create a Smarter Data Lake with HP Haven and Apache HadoopCreate a Smarter Data Lake with HP Haven and Apache Hadoop
Create a Smarter Data Lake with HP Haven and Apache Hadoop
 
Enterprise Hadoop with Hortonworks and Nimble Storage
Enterprise Hadoop with Hortonworks and Nimble StorageEnterprise Hadoop with Hortonworks and Nimble Storage
Enterprise Hadoop with Hortonworks and Nimble Storage
 
Discover Red Hat and Apache Hadoop for the Modern Data Architecture - Part 3
Discover Red Hat and Apache Hadoop for the Modern Data Architecture - Part 3Discover Red Hat and Apache Hadoop for the Modern Data Architecture - Part 3
Discover Red Hat and Apache Hadoop for the Modern Data Architecture - Part 3
 
Hortonworks and Red Hat Webinar - Part 2
Hortonworks and Red Hat Webinar - Part 2Hortonworks and Red Hat Webinar - Part 2
Hortonworks and Red Hat Webinar - Part 2
 
A Tale of Two Regulations: Cross-Border Data Protection For Big Data Under GD...
A Tale of Two Regulations: Cross-Border Data Protection For Big Data Under GD...A Tale of Two Regulations: Cross-Border Data Protection For Big Data Under GD...
A Tale of Two Regulations: Cross-Border Data Protection For Big Data Under GD...
 
Apache Atlas. Data Governance for Hadoop. Strata London 2015
Apache Atlas. Data Governance for Hadoop. Strata London 2015Apache Atlas. Data Governance for Hadoop. Strata London 2015
Apache Atlas. Data Governance for Hadoop. Strata London 2015
 
Top Three Big Data Governance Issues and How Apache ATLAS resolves it for the...
Top Three Big Data Governance Issues and How Apache ATLAS resolves it for the...Top Three Big Data Governance Issues and How Apache ATLAS resolves it for the...
Top Three Big Data Governance Issues and How Apache ATLAS resolves it for the...
 
Don't Let Security Be The 'Elephant in the Room'
Don't Let Security Be The 'Elephant in the Room'Don't Let Security Be The 'Elephant in the Room'
Don't Let Security Be The 'Elephant in the Room'
 
The Next Generation of Big Data Analytics
The Next Generation of Big Data AnalyticsThe Next Generation of Big Data Analytics
The Next Generation of Big Data Analytics
 

Destaque

Maintaining the Flex in Flexibility in a contentious South African Labour Market
Maintaining the Flex in Flexibility in a contentious South African Labour MarketMaintaining the Flex in Flexibility in a contentious South African Labour Market
Maintaining the Flex in Flexibility in a contentious South African Labour MarketLwazi Leroy Sibisi
 
Innovation in Renovation - Airedale International Air Conditioning, presented...
Innovation in Renovation - Airedale International Air Conditioning, presented...Innovation in Renovation - Airedale International Air Conditioning, presented...
Innovation in Renovation - Airedale International Air Conditioning, presented...Airedale International Air Conditioning Ltd
 
Calstart fuel cell bus review short v3
Calstart fuel cell bus review short v3Calstart fuel cell bus review short v3
Calstart fuel cell bus review short v3CALSTART
 
Moving beyond Big Data, BAE Systems Detica
Moving beyond Big Data, BAE Systems Detica Moving beyond Big Data, BAE Systems Detica
Moving beyond Big Data, BAE Systems Detica Internet World
 
APM Conference Manchester: What have military aircraft done for the Northwest...
APM Conference Manchester: What have military aircraft done for the Northwest...APM Conference Manchester: What have military aircraft done for the Northwest...
APM Conference Manchester: What have military aircraft done for the Northwest...Association for Project Management
 
BAE Systems IFF Program Overview
BAE Systems IFF Program OverviewBAE Systems IFF Program Overview
BAE Systems IFF Program OverviewWilliam Banfi
 
Fral fdnf44 spec sheet
Fral fdnf44 spec sheetFral fdnf44 spec sheet
Fral fdnf44 spec sheetmoisturecare
 
Darwin Melgar CV November 2016
Darwin Melgar CV November 2016Darwin Melgar CV November 2016
Darwin Melgar CV November 2016Darwin Melgar
 
Maintenance Secretary
Maintenance SecretaryMaintenance Secretary
Maintenance SecretaryDarwin Melgar
 
Cultural Alignment Post Merger Linked In
Cultural Alignment Post Merger Linked InCultural Alignment Post Merger Linked In
Cultural Alignment Post Merger Linked Incindyhardy
 
05 controller erc & ak rc 10x optyma controller
05 controller erc & ak rc 10x   optyma controller05 controller erc & ak rc 10x   optyma controller
05 controller erc & ak rc 10x optyma controllermaldini all
 

Destaque (14)

Optimizing the image analyst's workflow for the United States Air Force
Optimizing the image analyst's workflow for the United States Air ForceOptimizing the image analyst's workflow for the United States Air Force
Optimizing the image analyst's workflow for the United States Air Force
 
Maintaining the Flex in Flexibility in a contentious South African Labour Market
Maintaining the Flex in Flexibility in a contentious South African Labour MarketMaintaining the Flex in Flexibility in a contentious South African Labour Market
Maintaining the Flex in Flexibility in a contentious South African Labour Market
 
Innovation in Renovation - Airedale International Air Conditioning, presented...
Innovation in Renovation - Airedale International Air Conditioning, presented...Innovation in Renovation - Airedale International Air Conditioning, presented...
Innovation in Renovation - Airedale International Air Conditioning, presented...
 
Calstart fuel cell bus review short v3
Calstart fuel cell bus review short v3Calstart fuel cell bus review short v3
Calstart fuel cell bus review short v3
 
Moving beyond Big Data, BAE Systems Detica
Moving beyond Big Data, BAE Systems Detica Moving beyond Big Data, BAE Systems Detica
Moving beyond Big Data, BAE Systems Detica
 
APM Conference Manchester: What have military aircraft done for the Northwest...
APM Conference Manchester: What have military aircraft done for the Northwest...APM Conference Manchester: What have military aircraft done for the Northwest...
APM Conference Manchester: What have military aircraft done for the Northwest...
 
BAE Systems IFF Program Overview
BAE Systems IFF Program OverviewBAE Systems IFF Program Overview
BAE Systems IFF Program Overview
 
Fral fdnf44 spec sheet
Fral fdnf44 spec sheetFral fdnf44 spec sheet
Fral fdnf44 spec sheet
 
Darwin Melgar CV November 2016
Darwin Melgar CV November 2016Darwin Melgar CV November 2016
Darwin Melgar CV November 2016
 
International Peacekeeping challenges in the DRC: plus ça change, plus c´est ...
International Peacekeeping challenges in the DRC: plus ça change, plus c´est ...International Peacekeeping challenges in the DRC: plus ça change, plus c´est ...
International Peacekeeping challenges in the DRC: plus ça change, plus c´est ...
 
Maintenance Secretary
Maintenance SecretaryMaintenance Secretary
Maintenance Secretary
 
Cytochrome c Oxidase Jan06
Cytochrome c Oxidase Jan06Cytochrome c Oxidase Jan06
Cytochrome c Oxidase Jan06
 
Cultural Alignment Post Merger Linked In
Cultural Alignment Post Merger Linked InCultural Alignment Post Merger Linked In
Cultural Alignment Post Merger Linked In
 
05 controller erc & ak rc 10x optyma controller
05 controller erc & ak rc 10x   optyma controller05 controller erc & ak rc 10x   optyma controller
05 controller erc & ak rc 10x optyma controller
 

Semelhante a Bringing Trus and Visibility to Apache Hadoop

大数据数据治理及数据安全
大数据数据治理及数据安全大数据数据治理及数据安全
大数据数据治理及数据安全Jianwei Li
 
Optimized Data Management with Cloudera 5.7: Understanding data value with Cl...
Optimized Data Management with Cloudera 5.7: Understanding data value with Cl...Optimized Data Management with Cloudera 5.7: Understanding data value with Cl...
Optimized Data Management with Cloudera 5.7: Understanding data value with Cl...Cloudera, Inc.
 
Seeking Cybersecurity--Strategies to Protect the Data
Seeking Cybersecurity--Strategies to Protect the DataSeeking Cybersecurity--Strategies to Protect the Data
Seeking Cybersecurity--Strategies to Protect the DataCloudera, Inc.
 
Hadoop security @ Philly Hadoop Meetup May 2015
Hadoop security @ Philly Hadoop Meetup May 2015Hadoop security @ Philly Hadoop Meetup May 2015
Hadoop security @ Philly Hadoop Meetup May 2015Shravan (Sean) Pabba
 
Comprehensive Security for the Enterprise II: Guarding the Perimeter and Cont...
Comprehensive Security for the Enterprise II: Guarding the Perimeter and Cont...Comprehensive Security for the Enterprise II: Guarding the Perimeter and Cont...
Comprehensive Security for the Enterprise II: Guarding the Perimeter and Cont...Cloudera, Inc.
 
Hadoop and Data Access Security
Hadoop and Data Access SecurityHadoop and Data Access Security
Hadoop and Data Access SecurityCloudera, Inc.
 
Cloudera Big Data Integration Speedpitch at TDWI Munich June 2017
Cloudera Big Data Integration Speedpitch at TDWI Munich June 2017Cloudera Big Data Integration Speedpitch at TDWI Munich June 2017
Cloudera Big Data Integration Speedpitch at TDWI Munich June 2017Stefan Lipp
 
Fighting cyber fraud with hadoop
Fighting cyber fraud with hadoopFighting cyber fraud with hadoop
Fighting cyber fraud with hadoopNiel Dunnage
 
Cloudera GoDataFest Security and Governance
Cloudera GoDataFest Security and GovernanceCloudera GoDataFest Security and Governance
Cloudera GoDataFest Security and GovernanceGoDataDriven
 
Govern This! Data Discovery and the application of data governance with new s...
Govern This! Data Discovery and the application of data governance with new s...Govern This! Data Discovery and the application of data governance with new s...
Govern This! Data Discovery and the application of data governance with new s...Cloudera, Inc.
 
Complement Your Existing Data Warehouse with Big Data & Hadoop
Complement Your Existing Data Warehouse with Big Data & HadoopComplement Your Existing Data Warehouse with Big Data & Hadoop
Complement Your Existing Data Warehouse with Big Data & HadoopDatameer
 
Simplifying Real-Time Architectures for IoT with Apache Kudu
Simplifying Real-Time Architectures for IoT with Apache KuduSimplifying Real-Time Architectures for IoT with Apache Kudu
Simplifying Real-Time Architectures for IoT with Apache KuduCloudera, Inc.
 
Hadoop Essentials -- The What, Why and How to Meet Agency Objectives
Hadoop Essentials -- The What, Why and How to Meet Agency ObjectivesHadoop Essentials -- The What, Why and How to Meet Agency Objectives
Hadoop Essentials -- The What, Why and How to Meet Agency ObjectivesCloudera, Inc.
 
Teradata - Presentation at Hortonworks Booth - Strata 2014
Teradata - Presentation at Hortonworks Booth - Strata 2014Teradata - Presentation at Hortonworks Booth - Strata 2014
Teradata - Presentation at Hortonworks Booth - Strata 2014Hortonworks
 
Hadoop and Manufacturing
Hadoop and ManufacturingHadoop and Manufacturing
Hadoop and ManufacturingCloudera, Inc.
 
Multi-Tenant Operations with Cloudera 5.7 & BT
Multi-Tenant Operations with Cloudera 5.7 & BTMulti-Tenant Operations with Cloudera 5.7 & BT
Multi-Tenant Operations with Cloudera 5.7 & BTCloudera, Inc.
 
Webinar: What's new in CDAP 3.5?
Webinar: What's new in CDAP 3.5?Webinar: What's new in CDAP 3.5?
Webinar: What's new in CDAP 3.5?Cask Data
 
Enterprise Hadoop is Here to Stay: Plan Your Evolution Strategy
Enterprise Hadoop is Here to Stay: Plan Your Evolution StrategyEnterprise Hadoop is Here to Stay: Plan Your Evolution Strategy
Enterprise Hadoop is Here to Stay: Plan Your Evolution StrategyInside Analysis
 
Comprehensive Security for the Enterprise IV: Visibility Through a Single End...
Comprehensive Security for the Enterprise IV: Visibility Through a Single End...Comprehensive Security for the Enterprise IV: Visibility Through a Single End...
Comprehensive Security for the Enterprise IV: Visibility Through a Single End...Cloudera, Inc.
 
The Future of Data Management: The Enterprise Data Hub
The Future of Data Management: The Enterprise Data HubThe Future of Data Management: The Enterprise Data Hub
The Future of Data Management: The Enterprise Data HubCloudera, Inc.
 

Semelhante a Bringing Trus and Visibility to Apache Hadoop (20)

大数据数据治理及数据安全
大数据数据治理及数据安全大数据数据治理及数据安全
大数据数据治理及数据安全
 
Optimized Data Management with Cloudera 5.7: Understanding data value with Cl...
Optimized Data Management with Cloudera 5.7: Understanding data value with Cl...Optimized Data Management with Cloudera 5.7: Understanding data value with Cl...
Optimized Data Management with Cloudera 5.7: Understanding data value with Cl...
 
Seeking Cybersecurity--Strategies to Protect the Data
Seeking Cybersecurity--Strategies to Protect the DataSeeking Cybersecurity--Strategies to Protect the Data
Seeking Cybersecurity--Strategies to Protect the Data
 
Hadoop security @ Philly Hadoop Meetup May 2015
Hadoop security @ Philly Hadoop Meetup May 2015Hadoop security @ Philly Hadoop Meetup May 2015
Hadoop security @ Philly Hadoop Meetup May 2015
 
Comprehensive Security for the Enterprise II: Guarding the Perimeter and Cont...
Comprehensive Security for the Enterprise II: Guarding the Perimeter and Cont...Comprehensive Security for the Enterprise II: Guarding the Perimeter and Cont...
Comprehensive Security for the Enterprise II: Guarding the Perimeter and Cont...
 
Hadoop and Data Access Security
Hadoop and Data Access SecurityHadoop and Data Access Security
Hadoop and Data Access Security
 
Cloudera Big Data Integration Speedpitch at TDWI Munich June 2017
Cloudera Big Data Integration Speedpitch at TDWI Munich June 2017Cloudera Big Data Integration Speedpitch at TDWI Munich June 2017
Cloudera Big Data Integration Speedpitch at TDWI Munich June 2017
 
Fighting cyber fraud with hadoop
Fighting cyber fraud with hadoopFighting cyber fraud with hadoop
Fighting cyber fraud with hadoop
 
Cloudera GoDataFest Security and Governance
Cloudera GoDataFest Security and GovernanceCloudera GoDataFest Security and Governance
Cloudera GoDataFest Security and Governance
 
Govern This! Data Discovery and the application of data governance with new s...
Govern This! Data Discovery and the application of data governance with new s...Govern This! Data Discovery and the application of data governance with new s...
Govern This! Data Discovery and the application of data governance with new s...
 
Complement Your Existing Data Warehouse with Big Data & Hadoop
Complement Your Existing Data Warehouse with Big Data & HadoopComplement Your Existing Data Warehouse with Big Data & Hadoop
Complement Your Existing Data Warehouse with Big Data & Hadoop
 
Simplifying Real-Time Architectures for IoT with Apache Kudu
Simplifying Real-Time Architectures for IoT with Apache KuduSimplifying Real-Time Architectures for IoT with Apache Kudu
Simplifying Real-Time Architectures for IoT with Apache Kudu
 
Hadoop Essentials -- The What, Why and How to Meet Agency Objectives
Hadoop Essentials -- The What, Why and How to Meet Agency ObjectivesHadoop Essentials -- The What, Why and How to Meet Agency Objectives
Hadoop Essentials -- The What, Why and How to Meet Agency Objectives
 
Teradata - Presentation at Hortonworks Booth - Strata 2014
Teradata - Presentation at Hortonworks Booth - Strata 2014Teradata - Presentation at Hortonworks Booth - Strata 2014
Teradata - Presentation at Hortonworks Booth - Strata 2014
 
Hadoop and Manufacturing
Hadoop and ManufacturingHadoop and Manufacturing
Hadoop and Manufacturing
 
Multi-Tenant Operations with Cloudera 5.7 & BT
Multi-Tenant Operations with Cloudera 5.7 & BTMulti-Tenant Operations with Cloudera 5.7 & BT
Multi-Tenant Operations with Cloudera 5.7 & BT
 
Webinar: What's new in CDAP 3.5?
Webinar: What's new in CDAP 3.5?Webinar: What's new in CDAP 3.5?
Webinar: What's new in CDAP 3.5?
 
Enterprise Hadoop is Here to Stay: Plan Your Evolution Strategy
Enterprise Hadoop is Here to Stay: Plan Your Evolution StrategyEnterprise Hadoop is Here to Stay: Plan Your Evolution Strategy
Enterprise Hadoop is Here to Stay: Plan Your Evolution Strategy
 
Comprehensive Security for the Enterprise IV: Visibility Through a Single End...
Comprehensive Security for the Enterprise IV: Visibility Through a Single End...Comprehensive Security for the Enterprise IV: Visibility Through a Single End...
Comprehensive Security for the Enterprise IV: Visibility Through a Single End...
 
The Future of Data Management: The Enterprise Data Hub
The Future of Data Management: The Enterprise Data HubThe Future of Data Management: The Enterprise Data Hub
The Future of Data Management: The Enterprise Data Hub
 
Bringing Trust and Visibility to Apache Hadoop

  • 1. © Cloudera, Inc. All rights reserved. Bringing Trust and Visibility to Apache Hadoop. Mark Donsky, Product Management, Cloudera; Chang She, Software Engineering, Cloudera
  • 2. The benefits of Hadoop... One place for unlimited data: • All types • More sources • Faster, larger ingestion. Unified, multi-framework data access: • More users • More tools • Faster changes
  • 3. …Cause trust, visibility, and governance challenges. Business Users: How do I find what’s relevant? Can I trust what I find? How can I explore data on my own? Information Security: Who’s accessing what data? What are they doing with the data? Is sensitive data governed and protected? Can I meet compliance needs? Database Admins: How is data being used today? How can I optimize for future workloads? How can I take advantage of Hadoop risk-free and fast?
  • 4. Building blocks of governance in Hadoop: Audit Logs, Lineage, Data Policies, Technical Metadata, Business Metadata
  • 5. Metadata
  • 6. Enterprise metadata: the foundation for governance. Metadata enables you to put context and meaning to data. Operational: Job Run-Time Stats, Report Run Information, Hardware Usage, Scheduler Stats. Technical: Database Schema, File Definition, ETL Job Design, BI Report Definition, Data Model. Business: Business Glossary, Enterprise Taxonomy, Ontology. Uses: Data Lineage, Impact Analysis, Topology Understanding, Data Governance, Compliance Audits
  • 7. Enterprise metadata: the foundation for governance. Metadata enables you to put context and meaning to data to answer the important questions. Unified Metadata Repository: Business, Technical, Operational. What data or information exists? Where is data being used? What is the data’s business definition? Who is responsible for the data? How is it inter-related to other data? Who is using the data? Why do we need this data? Can we trust this data? When was this data last updated? Who are the high-value customers? How do we define that? How is high value calculated? Where is customer data stored and used? Is the data reliable and accurate?
  • 8. Technical metadata – what’s available? Hive: Query Text, Table name, Column name, Data Type, Owner, Partitions. Pig: Script name, Owner, Creation date, Last modified date. HDFS: Permissions, Owner, Group, Creation date, Last modified date. MR/YARN: JobID, Mapper Class, Reducer Class, Inputs, Outputs
  • 9. Technical metadata – where can I find it? (Component: Metadata) HDFS: fsimage (ls -lRa /); Hive: Hive Metastore Server (database metadata tables); MapReduce: JobTracker; YARN: Job History Server; Oozie: Oozie Server; Pig: JobTracker, Job History Server
  • 10. Technical metadata – Hive metastore. Collection of structured tables containing technical metadata about Hive databases, tables, views, and columns
  • 11. Technical metadata – HCatalog • HCatalog uses the Hive Metastore to provide a management layer • Abstracts the file location and storage format • Makes formats available to Pig, Hive, MapReduce, etc. • Also accessible via REST API
  • 12. Business metadata – can we do this in Hadoop? • Custom metadata is vital for trust and visibility • Find all files associated with a particular clinical trial • Locate all statements for high-profile customers • Where is my sensitive data? • Where is the protected health information? • No - Hadoop doesn’t support business metadata
  • 13. Hadoop Auditing
  • 14. Hadoop audit logs – what do they look like? • Logs all file system access requests • Impala, HBase and other components use a similar format • Implemented in log4j at the INFO level. HDFS property: log4j.logger.org.apache.hadoop.hdfs.server.namenode.FSNamesystem.audit. HDFS Audit Log: { "allowed": true, "serviceName": "HDFS-1", "username": "training", "src": "/user", "eventTime": 1398544478141, "ipAddress": "10.20.187.39", "operation": "getfileinfo", "dest": null, "permissions": null, "impersonator": null, "delegationTokenId": null } Hive Audit Log: { "serviceName": "HIVE-1", "username": "admin", "impersonator": null, "ipAddress": "10.20.187.39", "operation": "QUERY", "eventTime": 1398402718797, "operationText": "select count(*) from salesdata", "allowed": true, "databaseName": "default", "tableName": "salesdata", "resourcePath": "/user/hive/warehouse/salesdata", "objectType": "TABLE" }
  • 15. Hadoop audit logs – where can I find them? (Component: Default Location in CDH) HDFS Audit Logs: /var/log/hadoop-hdfs/audit; Hive Audit Logs: /var/log/hive/audit; Impala Audit Logs: /var/log/impalad/audit; HBase Audit Logs: /var/log/hbase/audit • Log files are automatically rotated when a size limit is reached • Location and size limit are configurable
  • 16. Hadoop audit logs – limitations • Consolidation • Persistence • Filtering • Integration
  • 17. Lineage
  • 18. Lineage – how to track lineage • This is not easy: without a tool like Cloudera Navigator, lineage has had to be tracked manually • But…lineage is embedded in Hadoop technical metadata • Job configurations provide inputs/outputs • Hive metastore provides location of HDFS directory where data resides • Hive/Impala queries can be interpreted to provide fine-grained column-level lineage between query input-output • Some relationships (e.g., directory–file) are implicit
  • 19. Data Policies
  • 20. Data policies – Hadoop limitations • Information is of limited use unless it is actionable • There is a treasure trove of actionable information in the metadata that the various Hadoop services emit • Archival of unused data • Encryption of sensitive data • Remediation of incorrect permissions • Triggers should be configurable based on user-defined criteria • Hadoop does not offer a sufficient policy engine or action framework
  • 21. Building blocks of trust and visibility in Hadoop: Audit Logs, Lineage, Data Policies, Technical Metadata, Business Metadata
  • 22. Cloudera Navigator Overview & Demo
  • 23. Cloudera Navigator: the only integrated data management and governance platform for Hadoop. Governance & Foundational Layer: Business Metadata, Technical Metadata, Lineage, Policies, Audit Logs. Self-Service Discovery & Analytics (Data Scientists & BI Users): Effortlessly find and trust the data that matters most. Search, Data definitions, Analytics, Profiling. Usage-Driven Model Optimization (Hadoop Administrators & DBAs): Configure Hadoop to boost user productivity. Migration, Optimization, Reporting, Model maintenance. Compliance-Ready Governance & Protection (Information Security): Track, understand and protect access to sensitive data. Auditing, Lineage, Encryption, Key management. Active Data Management & Information Lifecycle Management (Data Stewards & Curators): Maximize cluster performance at Hadoop scale with ease. Classification, Stewardship, Backup, Retention
  • 24. Trust and visibility is an ecosystem. More than 1,600 partners ensure compatibility with existing investments, lower skill barriers, and help maximize value from your data. Diagram labels: Data Systems, Enterprise Data Hub, Security and Administration, Unlimited Storage, Process, Discover, Model, Serve, System Integration, Infrastructure, Operational Tools, Applications
  • 25. Learn more! Please stop by our booth at P13 • See a demo of Cloudera Enterprise, including our governance solution that’s used by nearly 200 production customers for over two years! • Find out what makes Cloudera Enterprise the only PCI-certified Hadoop distribution • Learn about our 1600+ partner ecosystem
  • 26. Thank You! @markdonsky @changhiskhan
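
Slide 16 lists consolidation and filtering as limitations of the raw audit logs. As a rough illustration of what that work involves, the sketch below parses the two sample events from slide 14 and filters them by user or service. It assumes one JSON object per line, which the slides do not confirm; the function names `parse_events` and `filter_events` and the added `eventTimeUtc` field are hypothetical, and only the JSON field names come from the slide.

```python
import json
from datetime import datetime, timezone

# The two sample events from slide 14, one JSON object per line
# (the one-per-line layout is an assumption made for this sketch).
SAMPLE_LOG = """\
{"allowed": true, "serviceName": "HDFS-1", "username": "training", "src": "/user", "eventTime": 1398544478141, "ipAddress": "10.20.187.39", "operation": "getfileinfo", "dest": null, "permissions": null, "impersonator": null, "delegationTokenId": null}
{"serviceName": "HIVE-1", "username": "admin", "impersonator": null, "ipAddress": "10.20.187.39", "operation": "QUERY", "eventTime": 1398402718797, "operationText": "select count(*) from salesdata", "allowed": true, "databaseName": "default", "tableName": "salesdata", "resourcePath": "/user/hive/warehouse/salesdata", "objectType": "TABLE"}
"""


def parse_events(text):
    """Parse JSON-lines audit text into a list of dicts.

    Adds a hypothetical 'eventTimeUtc' key, treating 'eventTime' as
    milliseconds since the Unix epoch.
    """
    events = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines between events
        event = json.loads(line)
        event["eventTimeUtc"] = datetime.fromtimestamp(
            event["eventTime"] / 1000, tz=timezone.utc
        )
        events.append(event)
    return events


def filter_events(events, username=None, service_prefix=None):
    """Return events matching the given user and/or service-name prefix."""
    selected = []
    for event in events:
        if username is not None and event.get("username") != username:
            continue
        if service_prefix is not None and not event.get(
            "serviceName", ""
        ).startswith(service_prefix):
            continue
        selected.append(event)
    return selected


if __name__ == "__main__":
    all_events = parse_events(SAMPLE_LOG)
    for event in filter_events(all_events, service_prefix="HIVE"):
        print(event["username"], event["operation"], event["tableName"])
```

Merging events from several services into one list like this is the consolidation step; persistence and integration with other systems would still need separate tooling.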

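Slide 18 notes that job configurations provide inputs and outputs, which is enough to derive dataset-level lineage. The following is a minimal sketch of that idea, not how Cloudera Navigator does it: the job records, job IDs, and all paths except `/user/hive/warehouse/salesdata` are made up for illustration, and `build_upstream_index` and `trace_upstream` are hypothetical helper names.

```python
from collections import defaultdict

# Hypothetical job records; in a real cluster the inputs and outputs
# would be read from job configurations, as slide 18 suggests.
JOBS = [
    {
        "job_id": "job_001",
        "inputs": ["/data/raw/sales"],
        "outputs": ["/data/clean/sales"],
    },
    {
        "job_id": "job_002",
        "inputs": ["/data/clean/sales", "/data/ref/customers"],
        "outputs": ["/user/hive/warehouse/salesdata"],
    },
]


def build_upstream_index(jobs):
    """Map each output dataset to the set of datasets it was derived from."""
    upstream = defaultdict(set)
    for job in jobs:
        for output in job["outputs"]:
            upstream[output].update(job["inputs"])
    return upstream


def trace_upstream(dataset, upstream):
    """Return every dataset that directly or indirectly feeds `dataset`."""
    seen = set()
    stack = [dataset]
    while stack:
        current = stack.pop()
        for parent in upstream.get(current, ()):
            if parent not in seen:  # guard against revisiting and cycles
                seen.add(parent)
                stack.append(parent)
    return seen


if __name__ == "__main__":
    index = build_upstream_index(JOBS)
    for source in sorted(trace_upstream("/user/hive/warehouse/salesdata", index)):
        print(source)
```

This only covers dataset-to-dataset relationships; the column-level lineage mentioned on the slide would require interpreting the Hive or Impala query text itself.
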
Editor's Notes

  1. Cloudera partners more broadly and deeply across the Hadoop ecosystem than any other vendor. With over 1200 partners and counting, our partnerships offer: compatibility with your existing tools and skills (160+ certified on Cloudera 5, including all 12 of the 12 Gartner Business Intelligence Magic Quadrant leaders); flexible deployment options (on-premises; public, private, or hybrid cloud; appliances and engineered systems); and partnerships you can trust (deep engineering relationships and a comprehensive certification program).