Apache Atlas:
Why Big Data Management
Requires Hierarchical Taxonomies
2 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
Disclaimer
This document may contain product features and technology directions that are under development,
may be under development in the future or may ultimately not be developed.
Project capabilities are based on information that is publicly available within the Apache Software
Foundation project websites ("Apache"). Progress of the project capabilities can be tracked from
inception to release through Apache; however, technical feasibility, market demand, user feedback and
the overarching Apache Software Foundation community development process can all affect timing
and final delivery.
This document’s description of these features and technology directions does not represent a
contractual commitment, promise or obligation from Hortonworks to deliver these features in any
generally available product.
Product features and technology directions are subject to change, and must not be included in
contracts, purchase orders, or sales agreements of any kind.
Since this document contains an outline of general product development plans, customers should not
rely upon it when making purchasing decisions.
Speakers
Andrew Ahn
Governance Director
Product Management
Agenda
• Atlas Overview
• Near term roadmap
• Taxonomy Benefits
• Questions
Apache Atlas Overview
DGI* Community becomes Apache Atlas
• Dec 2014: DGI group kickoff
• Feb 2015: Prototype built
• May 2015: Apache Atlas incubation
• July 2015: HDP 2.3 foundation GA release
First kickoff to GA in 7 months
Global Financial Company
Faster & safer: co-development driven by customer use cases
* DGI: Data Governance Initiative
STRUCTURED
UNSTRUCTURED
Vision - Enterprise Data Governance Across Platforms
TRADITIONAL
RDBMS
METADATA
MPP
APPLIANCES
Project 1
Project 5
Project 4
Project 3
Metadata
Project 6
DATA
LAKE
Atlas: Metadata Truth in Hadoop
Data Management
along the entire data lifecycle with integrated
provenance and lineage capability
Modeling with Metadata
enables comprehensive data lineage through a
hybrid approach with enhanced tagging and
attribute capabilities
Interoperable Solutions
across the Hadoop ecosystem, through a common
metadata store
Apache Atlas: Metadata Services
• Cross-component dataset lineage. Centralized location for all metadata inside HDP
• Single interface point for metadata exchange with platforms outside of HDP
• Business taxonomy-based classification: conceptual, logical and technical
Diagram components: Apache Atlas, Hive, Ranger, Falcon, Sqoop, Storm, Kafka, Spark, NiFi
Big Data Management Through Metadata
Management Scalability
Many traditional tools and patterns do not scale when applied to multi-tenant data lakes.
Many enterprises have siloed data and metadata stores that collide in the data lake. This is
compounded by the ability to have very large windows (years). Can traditional EDW tools
manage 100 million entities effectively with room to grow?
Metadata Tools
Scalable, decoupled, decentralized management driven through metadata is the only viable solution.
This allows quick integration with automation and other metamodels.
Tags for Management, Discovery and Security
Proper metadata is the foundation for business taxonomy, stewardship, attribute based
security and self-service.
Apache Atlas High Level Architecture
Diagram components: Type System, Repository, Search DSL, Bridge (Hive, Storm, Falcon, others), REST API, Graph DB, Search, Kafka, Sqoop, Connectors, Messaging Framework
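To make the Search DSL and REST API components a little more concrete, here is a minimal sketch of how a client might form a DSL search request. It only builds a URL and sends nothing. The host name, port 21000, the `/api/atlas/v2/search/dsl` path and the `hive_table` type name are assumptions taken from later Apache Atlas releases rather than from this deck, so check them against the Atlas version actually deployed.

```python
from urllib.parse import urlencode


def build_dsl_search_url(base_url: str, dsl_query: str, limit: int = 25) -> str:
    """Return a URL for a DSL search request (endpoint path is an assumption)."""
    params = urlencode({"query": dsl_query, "limit": limit})
    return f"{base_url.rstrip('/')}/api/atlas/v2/search/dsl?{params}"


# Hypothetical host and query, for illustration only.
url = build_dsl_search_url(
    "http://atlas.example.com:21000",
    'hive_table where name = "drivers"',
)
print(url)
```

Keeping URL construction separate from the HTTP call makes it easy to swap in whichever endpoint the installed release exposes.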
Taxonomy Benefits:
• Discovery – Business catalog of conceptual, logical and physical assets
• Security – Dynamic metadata-based access control
Near Term Roadmap:
Summer 2016
Expanded Native Connector: Dataset Lineage
Diagram components: Sqoop, Teradata Connector, Apache Kafka, Custom Activity Reporter, Metadata Repository, RDBMS
Business Catalog
Breadcrumbs for
taxonomy context path
Contents at
taxonomy context
Technical and Logical Metadata Exchange
Diagram components: Knowledge Store; Atlas REST API; Structured; Unstructured; Files: XML / JSON; 3rd Party Vendors; Custom Reporter
Diagram labels: Non-Hadoop Taxonomy; Data Lineage; Technical Metadata
Governance Ready Certification Program
Discovery
Tagging
Prep /
Cleanse
ETL
Governance
BPM
Self Service
Visualization
Curated: Selected group of vendor partners to provide rich, complementary and complete features
Choice: Customers choose the features that they want to deploy, à la carte, versus vendor lock-in
Agile: Low switching costs, faster deployment and innovation
Standard: Common SLA & common open metadata store
Flexibility: Interoperability of products through Atlas metadata
HDP at core to provide stability and interoperability
Business Taxonomy Inheritance
Logical business taxonomy (parent): Human Resources, tagged PII
Data assets (children): Drivers (Dimension) and Timesheets (Facts), each inheriting the PII tag
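The inheritance shown on this slide can be sketched in a few lines. This is a minimal illustration assuming a simple parent-pointer tree; `TaxonomyNode` and `effective_tags` are hypothetical names, not Atlas classes or APIs.

```python
from dataclasses import dataclass, field
from typing import Optional, Set


@dataclass
class TaxonomyNode:
    """One term in a business taxonomy (hypothetical model, not an Atlas class)."""
    name: str
    parent: Optional["TaxonomyNode"] = None
    own_tags: Set[str] = field(default_factory=set)

    def effective_tags(self) -> Set[str]:
        """Tags set directly on this node plus those inherited from ancestors."""
        inherited = self.parent.effective_tags() if self.parent else set()
        return self.own_tags | inherited


# The parent carries the PII tag; the children declare no tags of their own.
hr = TaxonomyNode("Human Resources", own_tags={"PII"})
drivers = TaxonomyNode("Drivers", parent=hr)
timesheets = TaxonomyNode("Timesheets", parent=hr)

print(sorted(drivers.effective_tags()))
print(sorted(timesheets.effective_tags()))
```

Because tags are resolved by walking up to the parent at lookup time, tagging the parent once is enough: a tag added to Human Resources later becomes visible on both children without touching them.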
Dynamic Access Policy
Apache Ranger + Atlas Integration
Dynamic Access Policy Driven by metadata
Tag-based Access Policy Requirements
• Basic tag policy – PII example. Access and entitlements must be tag-based (ABAC) and scalable in implementation.
• Geo-based policy – Policy based on IP address; proxy IP substitution may be required. Rule enforcement must be geo-aware.
• Time-based policy – Timer for data access, decoupled from deletion of data.
• Prohibitions – Prevention of combinations of Hive tables that may pose a risk together.
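The four requirements above can be read as conditions checked on every access request. The sketch below is one hedged way to express them; `TagPolicy`, `Request`, `is_allowed`, the `HEALTH` tag and the example groups are all hypothetical and do not reflect Ranger's actual policy model. The requester's country is assumed to have been resolved from the IP address elsewhere.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import FrozenSet, List, Optional


@dataclass(frozen=True)
class TagPolicy:
    """Hypothetical tag-based rule; not Ranger's real policy schema."""
    tag: str
    allowed_groups: FrozenSet[str]
    allowed_countries: Optional[FrozenSet[str]] = None  # geo-based condition
    expires_at: Optional[datetime] = None                # time-based condition


@dataclass(frozen=True)
class Request:
    user_groups: FrozenSet[str]
    country: str                  # assumed to be derived from the client IP
    asset_tags: FrozenSet[str]    # tags on every table the query touches
    when: datetime


# Prohibition: tags that must never appear together in one query (illustrative).
PROHIBITED_COMBINATIONS = [frozenset({"PII", "HEALTH"})]


def _grants(policy: TagPolicy, tag: str, request: Request) -> bool:
    """True if this policy grants the request access to assets carrying `tag`."""
    if policy.tag != tag:
        return False
    if not (policy.allowed_groups & request.user_groups):
        return False
    if policy.allowed_countries is not None and request.country not in policy.allowed_countries:
        return False
    if policy.expires_at is not None and request.when >= policy.expires_at:
        return False
    return True


def is_allowed(request: Request, policies: List[TagPolicy]) -> bool:
    # Prohibitions win over any grant.
    if any(combo <= request.asset_tags for combo in PROHIBITED_COMBINATIONS):
        return False
    # Every tag on the requested assets needs at least one granting policy.
    # Untagged assets are out of scope for this sketch and pass through.
    return all(any(_grants(p, tag, request) for p in policies) for tag in request.asset_tags)


pii_policy = TagPolicy("PII", frozenset({"hr-analysts"}), frozenset({"US"}), datetime(2016, 12, 31))
request = Request(frozenset({"hr-analysts"}), "US", frozenset({"PII"}), datetime(2016, 6, 1))
print(is_allowed(request, [pii_policy]))
```

Note that the time-based condition only ends access; the data itself is untouched, which matches the requirement that access expiry be decoupled from deletion.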
How does Atlas work with Ranger at scale?
Atlas provides: Metadata
• Business classification (taxonomy): Company > HR > Driver
• Hierarchy with inheritance of attributes to child objects: the sensitive “PII” tag of department HR will be inherited by group HR > Driver
• Atlas will notify Ranger of changes via a Kafka topic
Diagram components: Apache Atlas, Hive, Ranger, Falcon, Kafka, Storm
Diagram callout: Atlas provides the metadata tag to create policies
Ranger provides: Access & Entitlements
• Ranger will cache the tag-to-asset mapping for performance
• Ranger will have policies based on tags instead of roles
• Example: PII = <group>. A single policy like this can cover many assets.
Use cases drive design – high reliability
Atlas
• Metastore: tags, assets, entities
• Notification framework: Kafka topics
Ranger
• Atlas client: subscribes to the topic and gets metadata updates
• PDP
• Resource cache
Diagram callouts: notification of metadata updates; message durability; optimized for speed; event-driven updates
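The event-driven flow on this slide (Atlas publishes metadata updates, the Atlas client on the Ranger side subscribes and refreshes a local cache) can be sketched without any messaging infrastructure. In the example below a standard-library `queue.Queue` stands in for the Kafka topic, and the JSON event shape (`op`, `asset`, `tag`) is invented for illustration; it is not the real Atlas notification format.

```python
import json
import queue
from typing import Dict, Set


class TagCache:
    """In-memory asset-to-tags map, standing in for the resource cache."""

    def __init__(self) -> None:
        self._tags_by_asset: Dict[str, Set[str]] = {}

    def apply(self, event: dict) -> None:
        """Update the cache from one notification (hypothetical event shape)."""
        asset = event["asset"]
        if event["op"] == "TAG_ADD":
            self._tags_by_asset.setdefault(asset, set()).add(event["tag"])
        elif event["op"] == "TAG_REMOVE":
            self._tags_by_asset.get(asset, set()).discard(event["tag"])

    def tags_for(self, asset: str) -> Set[str]:
        """Return a copy so callers cannot mutate the cache by accident."""
        return set(self._tags_by_asset.get(asset, set()))


def drain(topic: "queue.Queue[str]", cache: TagCache) -> int:
    """Apply every pending notification to the cache; return how many were applied."""
    applied = 0
    while True:
        try:
            message = topic.get_nowait()
        except queue.Empty:
            return applied
        cache.apply(json.loads(message))
        applied += 1


# The queue plays the role of the topic; a real deployment would use a Kafka consumer.
topic: "queue.Queue[str]" = queue.Queue()
topic.put(json.dumps({"op": "TAG_ADD", "asset": "hive:hr.drivers", "tag": "PII"}))
topic.put(json.dumps({"op": "TAG_ADD", "asset": "hive:hr.timesheets", "tag": "PII"}))
topic.put(json.dumps({"op": "TAG_REMOVE", "asset": "hive:hr.timesheets", "tag": "PII"}))

cache = TagCache()
print(drain(topic, cache))
print(cache.tags_for("hive:hr.drivers"))
```

Policy decisions then read only from the local cache, which is what keeps lookups fast; the durable topic is what allows a restarted subscriber to catch up on updates it missed.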
Preview Demo
• Security
• Discovery & Lineage
Availability:
- Tech Preview VMs: May 2016
- GA Release: Summer 2016
Questions?
Reference
Online Resources
VM (public preview download): https://s3.amazonaws.com/demo-drops.hortonworks.com/HDP-Atlas-Ranger-TP.ova
Tutorial: https://github.com/hortonworks/tutorials/tree/atlas-ranger-tp/tutorials/hortonworks/atlas-ranger-preview
Blog: http://hwxjojo.wpengine.com/blog/the-next-generation-of-hadoop-based-security-data-governance/ (currently returning an error)
Learn More: http://hortonworks.com/solutions/atlas-ranger-integration/
Tag Based Security Video:
https://drive.google.com/file/d/0B0wjjMSH77srLXFZN3lmWHVJWVU/view?usp=sharing
Thank You
HDF: Dataflow Governance Solution
Dataflow Security Use case Requirements
• Accelerated Data Collection: An integrated, data source agnostic collection platform
• Increased Security and Unprecedented Chain of Custody: Secure from source to storage with high fidelity data provenance
• The Internet of Any Thing (IoAT): A Proven Platform for the Internet of Things
http://hortonworks.com/hdf/
Enterprise Grade Governance Dataflow Solution
Event level versus dataset level
HDF – NiFi: Operation Control
• Maximum fidelity, event level
• Guaranteed Delivery
• Data Buffering
• Prioritized Queueing
• Flow-specific QoS
• Visual Command & Control
HDP – Atlas: Governance Management
• Medium / low fidelity, dataset level
• HDP Taxonomy
• Centralized Metadata Repository
• Downstream HDP Impacts
• Cross-component lineage
• 3rd Party integration
Other diagram labels: Filtered Metadata; Months Lineage; Years Lineage; Reference Taxonomy (Tags)
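One way to picture the difference between event-level and dataset-level fidelity is to collapse many per-event records into a handful of dataset-to-dataset edges. The sketch below does exactly that; `ProvenanceEvent` and the dataset names are hypothetical and do not mirror the NiFi provenance or Atlas lineage schemas.

```python
from collections import Counter
from typing import Iterable, NamedTuple


class ProvenanceEvent(NamedTuple):
    """One event-level record (hypothetical shape for illustration)."""
    event_id: str
    source_dataset: str
    target_dataset: str


def to_dataset_lineage(events: Iterable[ProvenanceEvent]) -> Counter:
    """Collapse individual events into dataset-to-dataset edges with event counts."""
    return Counter((e.source_dataset, e.target_dataset) for e in events)


events = [
    ProvenanceEvent("e1", "sensor_feed", "raw_events"),
    ProvenanceEvent("e2", "sensor_feed", "raw_events"),
    ProvenanceEvent("e3", "raw_events", "hive:iot.readings"),
]

edges = to_dataset_lineage(events)
for (source, target), count in sorted(edges.items()):
    print(f"{source} -> {target} ({count} events)")
```

The rolled-up edges are far smaller than the event stream they summarize, which is one plausible reason dataset-level lineage can be kept for much longer than event-level detail.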
Expanded visibility throughout the eco-system
Diagram components: HDF, NiFi, ETL, Kafka, Security Appliance, HDP, Hive, Hive Hook (Native), Atlas Metadata Repository
Diagram flows: Data, Metadata
Diagram callouts: Centralized repository for multiple NiFi deployments; end-to-end data lineage

Mais conteúdo relacionado

Mais procurados

Making Data Timelier and More Reliable with Lakehouse Technology
Making Data Timelier and More Reliable with Lakehouse TechnologyMaking Data Timelier and More Reliable with Lakehouse Technology
Making Data Timelier and More Reliable with Lakehouse TechnologyMatei Zaharia
 
Data Lakehouse Symposium | Day 4
Data Lakehouse Symposium | Day 4Data Lakehouse Symposium | Day 4
Data Lakehouse Symposium | Day 4Databricks
 
Data Discovery at Databricks with Amundsen
Data Discovery at Databricks with AmundsenData Discovery at Databricks with Amundsen
Data Discovery at Databricks with AmundsenDatabricks
 
Data Mesh Part 4 Monolith to Mesh
Data Mesh Part 4 Monolith to MeshData Mesh Part 4 Monolith to Mesh
Data Mesh Part 4 Monolith to MeshJeffrey T. Pollock
 
Data lineage and observability with Marquez - subsurface 2020
Data lineage and observability with Marquez - subsurface 2020Data lineage and observability with Marquez - subsurface 2020
Data lineage and observability with Marquez - subsurface 2020Julien Le Dem
 
Free Training: How to Build a Lakehouse
Free Training: How to Build a LakehouseFree Training: How to Build a Lakehouse
Free Training: How to Build a LakehouseDatabricks
 
Snowflake essentials
Snowflake essentialsSnowflake essentials
Snowflake essentialsqureshihamid
 
Snowflake: The Good, the Bad, and the Ugly
Snowflake: The Good, the Bad, and the UglySnowflake: The Good, the Bad, and the Ugly
Snowflake: The Good, the Bad, and the UglyTyler Wishnoff
 
Tableau 7.0 prsentation
Tableau 7.0 prsentationTableau 7.0 prsentation
Tableau 7.0 prsentationinam_slides
 
SAP HANA Data integration using Informatica
SAP HANA Data integration using InformaticaSAP HANA Data integration using Informatica
SAP HANA Data integration using InformaticaOracle
 
Databricks Platform.pptx
Databricks Platform.pptxDatabricks Platform.pptx
Databricks Platform.pptxAlex Ivy
 
Apache NiFi- MiNiFi meetup Slides
Apache NiFi- MiNiFi meetup SlidesApache NiFi- MiNiFi meetup Slides
Apache NiFi- MiNiFi meetup SlidesIsheeta Sanghi
 
Top Three Big Data Governance Issues and How Apache ATLAS resolves it for the...
Top Three Big Data Governance Issues and How Apache ATLAS resolves it for the...Top Three Big Data Governance Issues and How Apache ATLAS resolves it for the...
Top Three Big Data Governance Issues and How Apache ATLAS resolves it for the...DataWorks Summit/Hadoop Summit
 
Data Mesh in Practice - How Europe's Leading Online Platform for Fashion Goes...
Data Mesh in Practice - How Europe's Leading Online Platform for Fashion Goes...Data Mesh in Practice - How Europe's Leading Online Platform for Fashion Goes...
Data Mesh in Practice - How Europe's Leading Online Platform for Fashion Goes...Dr. Arif Wider
 
Introducing the Snowflake Computing Cloud Data Warehouse
Introducing the Snowflake Computing Cloud Data WarehouseIntroducing the Snowflake Computing Cloud Data Warehouse
Introducing the Snowflake Computing Cloud Data WarehouseSnowflake Computing
 

Mais procurados (20)

Making Data Timelier and More Reliable with Lakehouse Technology
Making Data Timelier and More Reliable with Lakehouse TechnologyMaking Data Timelier and More Reliable with Lakehouse Technology
Making Data Timelier and More Reliable with Lakehouse Technology
 
Snowflake Overview
Snowflake OverviewSnowflake Overview
Snowflake Overview
 
Data Lakehouse Symposium | Day 4
Data Lakehouse Symposium | Day 4Data Lakehouse Symposium | Day 4
Data Lakehouse Symposium | Day 4
 
Data Discovery at Databricks with Amundsen
Data Discovery at Databricks with AmundsenData Discovery at Databricks with Amundsen
Data Discovery at Databricks with Amundsen
 
Hive Does ACID
Hive Does ACIDHive Does ACID
Hive Does ACID
 
Data Mesh Part 4 Monolith to Mesh
Data Mesh Part 4 Monolith to MeshData Mesh Part 4 Monolith to Mesh
Data Mesh Part 4 Monolith to Mesh
 
Data lineage and observability with Marquez - subsurface 2020
Data lineage and observability with Marquez - subsurface 2020Data lineage and observability with Marquez - subsurface 2020
Data lineage and observability with Marquez - subsurface 2020
 
Free Training: How to Build a Lakehouse
Free Training: How to Build a LakehouseFree Training: How to Build a Lakehouse
Free Training: How to Build a Lakehouse
 
Snowflake essentials
Snowflake essentialsSnowflake essentials
Snowflake essentials
 
Snowflake: The Good, the Bad, and the Ugly
Snowflake: The Good, the Bad, and the UglySnowflake: The Good, the Bad, and the Ugly
Snowflake: The Good, the Bad, and the Ugly
 
Tableau 7.0 prsentation
Tableau 7.0 prsentationTableau 7.0 prsentation
Tableau 7.0 prsentation
 
Apache Nifi Crash Course
Apache Nifi Crash CourseApache Nifi Crash Course
Apache Nifi Crash Course
 
SAP HANA Data integration using Informatica
SAP HANA Data integration using InformaticaSAP HANA Data integration using Informatica
SAP HANA Data integration using Informatica
 
Databricks Platform.pptx
Databricks Platform.pptxDatabricks Platform.pptx
Databricks Platform.pptx
 
Apache NiFi- MiNiFi meetup Slides
Apache NiFi- MiNiFi meetup SlidesApache NiFi- MiNiFi meetup Slides
Apache NiFi- MiNiFi meetup Slides
 
Top Three Big Data Governance Issues and How Apache ATLAS resolves it for the...
Top Three Big Data Governance Issues and How Apache ATLAS resolves it for the...Top Three Big Data Governance Issues and How Apache ATLAS resolves it for the...
Top Three Big Data Governance Issues and How Apache ATLAS resolves it for the...
 
Data Mesh in Practice - How Europe's Leading Online Platform for Fashion Goes...
Data Mesh in Practice - How Europe's Leading Online Platform for Fashion Goes...Data Mesh in Practice - How Europe's Leading Online Platform for Fashion Goes...
Data Mesh in Practice - How Europe's Leading Online Platform for Fashion Goes...
 
Data Mesh
Data MeshData Mesh
Data Mesh
 
Introducing the Snowflake Computing Cloud Data Warehouse
Introducing the Snowflake Computing Cloud Data WarehouseIntroducing the Snowflake Computing Cloud Data Warehouse
Introducing the Snowflake Computing Cloud Data Warehouse
 
Apache Ranger
Apache RangerApache Ranger
Apache Ranger
 

Destaque

Data Governance - Atlas 7.12.2015
Data Governance - Atlas 7.12.2015Data Governance - Atlas 7.12.2015
Data Governance - Atlas 7.12.2015Hortonworks
 
Security and Data Governance using Apache Ranger and Apache Atlas
Security and Data Governance using Apache Ranger and Apache AtlasSecurity and Data Governance using Apache Ranger and Apache Atlas
Security and Data Governance using Apache Ranger and Apache AtlasDataWorks Summit/Hadoop Summit
 
Manage tracability with Apache Atlas, a flexible metadata repository
Manage tracability with Apache Atlas, a flexible metadata repositoryManage tracability with Apache Atlas, a flexible metadata repository
Manage tracability with Apache Atlas, a flexible metadata repositorySynaltic Group
 
Security and Governance on Hadoop with Apache Atlas and Apache Ranger by Srik...
Security and Governance on Hadoop with Apache Atlas and Apache Ranger by Srik...Security and Governance on Hadoop with Apache Atlas and Apache Ranger by Srik...
Security and Governance on Hadoop with Apache Atlas and Apache Ranger by Srik...Artem Ervits
 
Implementing a Data Lake with Enterprise Grade Data Governance
Implementing a Data Lake with Enterprise Grade Data GovernanceImplementing a Data Lake with Enterprise Grade Data Governance
Implementing a Data Lake with Enterprise Grade Data GovernanceHortonworks
 
Extend Governance in Hadoop with Atlas Ecosystem: Waterline, Attivo & Trifacta
Extend Governance in Hadoop with Atlas Ecosystem: Waterline, Attivo & TrifactaExtend Governance in Hadoop with Atlas Ecosystem: Waterline, Attivo & Trifacta
Extend Governance in Hadoop with Atlas Ecosystem: Waterline, Attivo & TrifactaDataWorks Summit/Hadoop Summit
 
Apache Atlas. Data Governance for Hadoop. Strata London 2015
Apache Atlas. Data Governance for Hadoop. Strata London 2015Apache Atlas. Data Governance for Hadoop. Strata London 2015
Apache Atlas. Data Governance for Hadoop. Strata London 2015Sean Roberts
 
Atlas and ranger epam meetup
Atlas and ranger epam meetupAtlas and ranger epam meetup
Atlas and ranger epam meetupAlex Zeltov
 
Implementing the Business Catalog in the Modern Enterprise: Bridging Traditio...
Implementing the Business Catalog in the Modern Enterprise: Bridging Traditio...Implementing the Business Catalog in the Modern Enterprise: Bridging Traditio...
Implementing the Business Catalog in the Modern Enterprise: Bridging Traditio...DataWorks Summit/Hadoop Summit
 
Is your Enterprise Data lake Metadata Driven AND Secure?
Is your Enterprise Data lake Metadata Driven AND Secure?Is your Enterprise Data lake Metadata Driven AND Secure?
Is your Enterprise Data lake Metadata Driven AND Secure?DataWorks Summit/Hadoop Summit
 
Apache Falcon - Simplifying Managing Data Jobs on Hadoop
Apache Falcon - Simplifying Managing Data Jobs on HadoopApache Falcon - Simplifying Managing Data Jobs on Hadoop
Apache Falcon - Simplifying Managing Data Jobs on HadoopDataWorks Summit
 
Data Discovery & Lineage in Enterprise Hadoop
Data Discovery & Lineage in Enterprise HadoopData Discovery & Lineage in Enterprise Hadoop
Data Discovery & Lineage in Enterprise HadoopDataWorks Summit
 
Dynamic Column Masking and Row-Level Filtering in HDP
Dynamic Column Masking and Row-Level Filtering in HDPDynamic Column Masking and Row-Level Filtering in HDP
Dynamic Column Masking and Row-Level Filtering in HDPHortonworks
 
Apache Ambari Meetup - AMS & Grafana
Apache Ambari Meetup - AMS & GrafanaApache Ambari Meetup - AMS & Grafana
Apache Ambari Meetup - AMS & GrafanaPrajwal Rao
 
Data Discovery on Hadoop - Realizing the Full Potential of your Data
Data Discovery on Hadoop - Realizing the Full Potential of your DataData Discovery on Hadoop - Realizing the Full Potential of your Data
Data Discovery on Hadoop - Realizing the Full Potential of your DataDataWorks Summit
 
Apache Falcon at Hadoop Summit 2013
Apache Falcon at Hadoop Summit 2013Apache Falcon at Hadoop Summit 2013
Apache Falcon at Hadoop Summit 2013Seetharam Venkatesh
 
Apache Falcon at Hadoop Summit Europe 2014
Apache Falcon at Hadoop Summit Europe 2014Apache Falcon at Hadoop Summit Europe 2014
Apache Falcon at Hadoop Summit Europe 2014Seetharam Venkatesh
 
빅데이터 네트워크 분석 노드엑셀 따라잡기 보도자료
빅데이터 네트워크 분석 노드엑셀 따라잡기 보도자료빅데이터 네트워크 분석 노드엑셀 따라잡기 보도자료
빅데이터 네트워크 분석 노드엑셀 따라잡기 보도자료Han Woo PARK
 
[G6]hadoop이중화왜하는거지
[G6]hadoop이중화왜하는거지[G6]hadoop이중화왜하는거지
[G6]hadoop이중화왜하는거지NAVER D2
 

Destaque (20)

Data Governance - Atlas 7.12.2015
Data Governance - Atlas 7.12.2015Data Governance - Atlas 7.12.2015
Data Governance - Atlas 7.12.2015
 
Security and Data Governance using Apache Ranger and Apache Atlas
Security and Data Governance using Apache Ranger and Apache AtlasSecurity and Data Governance using Apache Ranger and Apache Atlas
Security and Data Governance using Apache Ranger and Apache Atlas
 
Manage tracability with Apache Atlas, a flexible metadata repository
Manage tracability with Apache Atlas, a flexible metadata repositoryManage tracability with Apache Atlas, a flexible metadata repository
Manage tracability with Apache Atlas, a flexible metadata repository
 
Security and Governance on Hadoop with Apache Atlas and Apache Ranger by Srik...
Security and Governance on Hadoop with Apache Atlas and Apache Ranger by Srik...Security and Governance on Hadoop with Apache Atlas and Apache Ranger by Srik...
Security and Governance on Hadoop with Apache Atlas and Apache Ranger by Srik...
 
Implementing a Data Lake with Enterprise Grade Data Governance
Implementing a Data Lake with Enterprise Grade Data GovernanceImplementing a Data Lake with Enterprise Grade Data Governance
Implementing a Data Lake with Enterprise Grade Data Governance
 
Extend Governance in Hadoop with Atlas Ecosystem: Waterline, Attivo & Trifacta
Extend Governance in Hadoop with Atlas Ecosystem: Waterline, Attivo & TrifactaExtend Governance in Hadoop with Atlas Ecosystem: Waterline, Attivo & Trifacta
Extend Governance in Hadoop with Atlas Ecosystem: Waterline, Attivo & Trifacta
 
Apache Atlas. Data Governance for Hadoop. Strata London 2015
Apache Atlas. Data Governance for Hadoop. Strata London 2015Apache Atlas. Data Governance for Hadoop. Strata London 2015
Apache Atlas. Data Governance for Hadoop. Strata London 2015
 
Atlas and ranger epam meetup
Atlas and ranger epam meetupAtlas and ranger epam meetup
Atlas and ranger epam meetup
 
Implementing the Business Catalog in the Modern Enterprise: Bridging Traditio...
Implementing the Business Catalog in the Modern Enterprise: Bridging Traditio...Implementing the Business Catalog in the Modern Enterprise: Bridging Traditio...
Implementing the Business Catalog in the Modern Enterprise: Bridging Traditio...
 
Is your Enterprise Data lake Metadata Driven AND Secure?
Is your Enterprise Data lake Metadata Driven AND Secure?Is your Enterprise Data lake Metadata Driven AND Secure?
Is your Enterprise Data lake Metadata Driven AND Secure?
 
Apache Falcon - Simplifying Managing Data Jobs on Hadoop
Apache Falcon - Simplifying Managing Data Jobs on HadoopApache Falcon - Simplifying Managing Data Jobs on Hadoop
Apache Falcon - Simplifying Managing Data Jobs on Hadoop
 
Data Discovery & Lineage in Enterprise Hadoop
Data Discovery & Lineage in Enterprise HadoopData Discovery & Lineage in Enterprise Hadoop
Data Discovery & Lineage in Enterprise Hadoop
 
Dynamic Column Masking and Row-Level Filtering in HDP
Dynamic Column Masking and Row-Level Filtering in HDPDynamic Column Masking and Row-Level Filtering in HDP
Dynamic Column Masking and Row-Level Filtering in HDP
 
Apache Ambari Meetup - AMS & Grafana
Apache Ambari Meetup - AMS & GrafanaApache Ambari Meetup - AMS & Grafana
Apache Ambari Meetup - AMS & Grafana
 
2015 Automic Automation Heroes
2015 Automic Automation Heroes2015 Automic Automation Heroes
2015 Automic Automation Heroes
 
Data Discovery on Hadoop - Realizing the Full Potential of your Data
Data Discovery on Hadoop - Realizing the Full Potential of your DataData Discovery on Hadoop - Realizing the Full Potential of your Data
Data Discovery on Hadoop - Realizing the Full Potential of your Data
 
Apache Falcon at Hadoop Summit 2013
Apache Falcon at Hadoop Summit 2013Apache Falcon at Hadoop Summit 2013
Apache Falcon at Hadoop Summit 2013
 
Apache Falcon at Hadoop Summit Europe 2014
Apache Falcon at Hadoop Summit Europe 2014Apache Falcon at Hadoop Summit Europe 2014
Apache Falcon at Hadoop Summit Europe 2014
 
빅데이터 네트워크 분석 노드엑셀 따라잡기 보도자료
빅데이터 네트워크 분석 노드엑셀 따라잡기 보도자료빅데이터 네트워크 분석 노드엑셀 따라잡기 보도자료
빅데이터 네트워크 분석 노드엑셀 따라잡기 보도자료
 
[G6]hadoop이중화왜하는거지
[G6]hadoop이중화왜하는거지[G6]hadoop이중화왜하는거지
[G6]hadoop이중화왜하는거지
 

Semelhante a Apache Atlas: Why Big Data Management Requires Hierarchical Taxonomies

Classification based security in Hadoop
Classification based security in HadoopClassification based security in Hadoop
Classification based security in HadoopMadhan Neethiraj
 
What the #$* is a Business Catalog and why you need it
What the #$* is a Business Catalog and why you need it What the #$* is a Business Catalog and why you need it
What the #$* is a Business Catalog and why you need it DataWorks Summit/Hadoop Summit
 
Rescue your Big Data from Downtime with HP Operations Bridge and Apache Hadoop
Rescue your Big Data from Downtime with HP Operations Bridge and Apache HadoopRescue your Big Data from Downtime with HP Operations Bridge and Apache Hadoop
Rescue your Big Data from Downtime with HP Operations Bridge and Apache HadoopHortonworks
 
Hortonworks Oracle Big Data Integration
Hortonworks Oracle Big Data Integration Hortonworks Oracle Big Data Integration
Hortonworks Oracle Big Data Integration Hortonworks
 
Storm Demo Talk - Colorado Springs May 2015
Storm Demo Talk - Colorado Springs May 2015Storm Demo Talk - Colorado Springs May 2015
Storm Demo Talk - Colorado Springs May 2015Mac Moore
 
Real-Time Processing in Hadoop for IoT Use Cases - Phoenix HUG
Real-Time Processing in Hadoop for IoT Use Cases - Phoenix HUGReal-Time Processing in Hadoop for IoT Use Cases - Phoenix HUG
Real-Time Processing in Hadoop for IoT Use Cases - Phoenix HUGskumpf
 
Storm Demo Talk - Denver Apr 2015
Storm Demo Talk - Denver Apr 2015Storm Demo Talk - Denver Apr 2015
Storm Demo Talk - Denver Apr 2015Mac Moore
 
Internet of things Crash Course Workshop
Internet of things Crash Course WorkshopInternet of things Crash Course Workshop
Internet of things Crash Course WorkshopDataWorks Summit
 
Internet of Things Crash Course Workshop at Hadoop Summit
Internet of Things Crash Course Workshop at Hadoop SummitInternet of Things Crash Course Workshop at Hadoop Summit
Internet of Things Crash Course Workshop at Hadoop SummitDataWorks Summit
 
A Comprehensive Approach to Building your Big Data - with Cisco, Hortonworks ...
A Comprehensive Approach to Building your Big Data - with Cisco, Hortonworks ...A Comprehensive Approach to Building your Big Data - with Cisco, Hortonworks ...
A Comprehensive Approach to Building your Big Data - with Cisco, Hortonworks ...Hortonworks
 
Log Analytics Optimization
Log Analytics OptimizationLog Analytics Optimization
Log Analytics OptimizationHortonworks
 
Log Analytics Optimization
Log Analytics OptimizationLog Analytics Optimization
Log Analytics OptimizationIsheeta Sanghi
 
SoCal BigData Day
SoCal BigData DaySoCal BigData Day
SoCal BigData DayJohn Park
 
Hortonworks Data In Motion Webinar Series Pt. 2
Hortonworks Data In Motion Webinar Series Pt. 2Hortonworks Data In Motion Webinar Series Pt. 2
Hortonworks Data In Motion Webinar Series Pt. 2Hortonworks
 
Hive edw-dataworks summit-eu-april-2017
Hive edw-dataworks summit-eu-april-2017Hive edw-dataworks summit-eu-april-2017
Hive edw-dataworks summit-eu-april-2017alanfgates
 
Spark and Hadoop Perfect Togeher by Arun Murthy
Spark and Hadoop Perfect Togeher by Arun MurthySpark and Hadoop Perfect Togeher by Arun Murthy
Spark and Hadoop Perfect Togeher by Arun MurthySpark Summit
 
Spark Summit EMEA - Arun Murthy's Keynote
Spark Summit EMEA - Arun Murthy's KeynoteSpark Summit EMEA - Arun Murthy's Keynote
Spark Summit EMEA - Arun Murthy's KeynoteHortonworks
 
An Apache Hive Based Data Warehouse
An Apache Hive Based Data WarehouseAn Apache Hive Based Data Warehouse
An Apache Hive Based Data WarehouseDataWorks Summit
 

Semelhante a Apache Atlas: Why Big Data Management Requires Hierarchical Taxonomies (20)

Classification based security in Hadoop
Classification based security in HadoopClassification based security in Hadoop
Classification based security in Hadoop
 
What the #$* is a Business Catalog and why you need it
What the #$* is a Business Catalog and why you need it What the #$* is a Business Catalog and why you need it
What the #$* is a Business Catalog and why you need it
 
Enterprise Data Classification and Provenance
Enterprise Data Classification and ProvenanceEnterprise Data Classification and Provenance
Enterprise Data Classification and Provenance
 
Rescue your Big Data from Downtime with HP Operations Bridge and Apache Hadoop
Rescue your Big Data from Downtime with HP Operations Bridge and Apache HadoopRescue your Big Data from Downtime with HP Operations Bridge and Apache Hadoop
Rescue your Big Data from Downtime with HP Operations Bridge and Apache Hadoop
 
Hortonworks Oracle Big Data Integration
Hortonworks Oracle Big Data Integration Hortonworks Oracle Big Data Integration
Hortonworks Oracle Big Data Integration
 
Storm Demo Talk - Colorado Springs May 2015
Storm Demo Talk - Colorado Springs May 2015Storm Demo Talk - Colorado Springs May 2015
Storm Demo Talk - Colorado Springs May 2015
 
Real-Time Processing in Hadoop for IoT Use Cases - Phoenix HUG
Real-Time Processing in Hadoop for IoT Use Cases - Phoenix HUGReal-Time Processing in Hadoop for IoT Use Cases - Phoenix HUG
Real-Time Processing in Hadoop for IoT Use Cases - Phoenix HUG
 
Storm Demo Talk - Denver Apr 2015

Apache Atlas: Why Big Data Management Requires Hierarchical Taxonomies

  • 1. Apache Atlas: Why Big Data Management Requires Hierarchical Taxonomies
  • 2. Disclaimer: This document may contain product features and technology directions that are under development, may be under development in the future or may ultimately not be developed. Project capabilities are based on information that is publicly available within the Apache Software Foundation project websites ("Apache"). Progress of the project capabilities can be tracked from inception to release through Apache; however, technical feasibility, market demand, user feedback and the overarching Apache Software Foundation community development process can all affect timing and final delivery. This document’s description of these features and technology directions does not represent a contractual commitment, promise or obligation from Hortonworks to deliver these features in any generally available product. Product features and technology directions are subject to change, and must not be included in contracts, purchase orders, or sales agreements of any kind. Since this document contains an outline of general product development plans, customers should not rely upon it when making purchasing decisions.
  • 3. Speakers: Andrew Ahn, Governance Director, Product Management
  • 4. Agenda • Atlas Overview • Near term roadmap • Taxonomy Benefits • Questions
  • 5. Apache Atlas Overview
  • 6. DGI* Community becomes Apache Atlas. Timeline: Dec 2014, DGI group kickoff; Feb 2015, prototype built; May 2015, Apache Atlas incubation; July 2015, HDP 2.3 Foundation GA release. First kickoff to GA in 7 months. Global Financial Company. Faster & Safer: co-development driven by customer use cases. (* DGI: Data Governance Initiative)
  • 7. Vision - Enterprise Data Governance Across Platforms. [Diagram labels: Structured, Unstructured, Traditional RDBMS, Metadata, MPP Appliances, Project 1, Project 5, Project 4, Project 3, Project 6, Data Lake] Atlas: Metadata Truth in Hadoop. Data Management along the entire data lifecycle with integrated provenance and lineage capability. Modeling with Metadata enables comprehensive data lineage through a hybrid approach with enhanced tagging and attribute capabilities. Interoperable Solutions across the Hadoop ecosystem, through a common metadata store.
  • 8. Apache Atlas: Metadata Services • Cross-component dataset lineage. Centralized location for all metadata inside HDP • Single interface point for metadata exchange with platforms outside of HDP • Business taxonomy based classification: conceptual, logical and technical. [Diagram labels: Apache Atlas, Hive, Ranger, Falcon, Sqoop, Storm, Kafka, Spark, NiFi]
  • 9. Big Data Management Through Metadata Management. Scalability: Many traditional tools and patterns do not scale when applied to multi-tenant data lakes. Many enterprises have siloed data and metadata stores that collide in the data lake. This is compounded by the ability to have very large windows (years). Can traditional EDW tools manage 100 million entities effectively with room to grow? Metadata Tools: Scalable, decoupled, decentralized management driven through metadata is the only viable solution. This allows quick integration with automation and other metamodels. Tags for Management, Discovery and Security: Proper metadata is the foundation for business taxonomy, stewardship, attribute-based security and self-service.
  • 10. Apache Atlas High Level Architecture. [Diagram labels: Type System, Repository, Search DSL, Bridge, Hive, Storm, Falcon, Others, REST API, Graph DB, Search, Kafka, Sqoop, Connectors, Messaging Framework]
  • 11. Taxonomies Benefits: • Discovery – Business catalog of conceptual, logical and physical assets • Security – Dynamic metadata-based access control
  • 12. Near Term Roadmap: Summer 2016
  • 13. Expanded Native Connector: Dataset Lineage. [Diagram labels: Sqoop, Teradata Connector, Apache Kafka, Custom Activity Reporter, Metadata Repository, RDBMS]
  • 14. Business Catalog: breadcrumbs for taxonomy context path; contents at taxonomy context
  • 15. Technical and Logical Metadata Exchange. [Diagram labels: Knowledge Store, Atlas REST API, Structured, Unstructured, Files: XML / JSON, 3rd Party Vendors, Custom Reporter, Non-Hadoop, Taxonomy, Data Lineage, Technical Metadata]
  • 16. Governance Ready Certification Program. [Diagram labels: Discovery, Tagging, Prep / Cleanse, ETL, Governance, BPM, Self Service, Visualization] Curated: Selected group of vendor partners to provide rich, complementary and complete features. Choice: Customers choose the features that they want to deploy, a la carte versus vendor lock-in. Agile: Low switching costs, faster deployment and innovation. Standard: Common SLA & common open metadata store. Flexibility: Interoperability of products through Atlas metadata. HDP at core to provide stability and interoperability.
  • 17. Business Taxonomy Inheritance. [Diagram: in the logical business taxonomy, the parent term Human Resources is tagged PII; the child data assets Drivers (Dimension) and Timesheets (Facts) each carry the PII tag]
  • 18. Dynamic Access Policy: Apache Ranger + Atlas Integration
  • 19. Dynamic Access Policy: Driven by metadata
  • 20. Tag-based Access Policy Requirements • Basic tag policy – PII example. Access and entitlements must be tag-based ABAC and scalable in implementation. • Geo-based policy – Policy based on IP address; proxy IP substitution may be required. The rule enforcement must be geo-aware. • Time-based policy – Timer for data access, decoupled from deletion of data. • Prohibitions – Prevention of combinations of Hive tables that may pose a risk together.
  • 21. How does Atlas work with Ranger at scale? Atlas provides: Metadata • Business Classification (taxonomy): Company > HR > Driver • Hierarchy with inheritance of attributes to child objects: the sensitive “PII” tag of department HR will be inherited by group HR > Driver • Atlas will notify Ranger of changes via a Kafka topic. [Diagram labels: Apache Atlas, Hive, Ranger, Falcon, Kafka, Storm; Atlas provides the metadata tag to create policies] Ranger provides: Access & Entitlements • Ranger will cache tags and asset mapping for performance • Ranger will have a policy based on tags instead of roles • Example: PII = <group>. This can work for many assets.
  • 22. Use cases drive design – high reliability. [Diagram labels: Atlas: Metastore (Tags, Assets, Entities); Notification Framework: Kafka Topics; Ranger: Atlas Client (subscribes to topic, gets metadata updates), PDP, Resource Cache; Notification: metadata updates] Message durability. Optimized for speed. Event-driven updates.
  • 23. Preview Demo • Security • Discovery & Lineage
  • 24. Availability: - Tech Preview VMs: May 2016 - GA Release: Summer 2016
  • 25. Questions?
  • 26. Reference
  • 27. Online Resources. VM: https://s3.amazonaws.com/demo-drops.hortonworks.com/HDP-Atlas-Ranger-TP.ova (Download Public Preview VM). Tutorial: https://github.com/hortonworks/tutorials/tree/atlas-ranger-tp/tutorials/hortonworks/atlas-ranger-preview Blog: http://hwxjojo.wpengine.com/blog/the-next-generation-of-hadoop-based-security-data-governance/ (this is giving an error, right now). Learn More: http://hortonworks.com/solutions/atlas-ranger-integration/
  • 28. Tag Based Security Video: https://drive.google.com/file/d/0B0wjjMSH77srLXFZN3lmWHVJWVU/view?usp=sharing
  • 29. Thank You
  • 30. HDF: Dataflow Governance Solution
  • 31. HORTONWORKS CONFIDENTIAL & PROPRIETARY INFORMATION. Dataflow Security Use Case Requirements. Accelerated Data Collection: An integrated, data source agnostic collection platform. Increased Security and Unprecedented Chain of Custody: Secure from source to storage with high fidelity data provenance. The Internet of Any Thing (IoAT): A Proven Platform for the Internet of Things. http://hortonworks.com/hdf/
  • 32. Enterprise Grade Governance Dataflow Solution. Filtered Metadata • HDP Taxonomy • Centralized Metadata Repository • Downstream HDP Impacts • Cross-component lineage • 3rd Party integration • Guaranteed Delivery • Data Buffering • Prioritized Queueing • Flow-specific QoS • Visual Command & Control. Months Lineage; Years Lineage. Reference Taxonomy (Tags). Event level versus dataset level. HDF - NiFi: Operation Control, Maximum Fidelity, Event Level. HDP – Atlas: Governance Management, Medium / Low Fidelity, Dataset Level.
  • 33. Expanded visibility throughout the eco-system. [Diagram labels: HDF, ETL, Hive with Hive Hook (Native) (repeated), Security Appliance (repeated), Data, Metadata, NiFi (repeated), Kafka, HDP, Atlas Metadata Repository] Centralized repository for multiple NiFi deployments. End to end data lineage.
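
The tag inheritance and tag-based policy idea from slides 17 and 21 can be illustrated with a small sketch. This is not Atlas or Ranger code and uses none of their APIs; the class `TaxonomyNode`, the `TAG_POLICY` mapping, the `can_read` helper and the group names are all hypothetical, chosen only to mirror the Company > HR > Driver example, where a PII tag set on HR is inherited by the terms below it and a single tag policy then covers many assets.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Set


@dataclass
class TaxonomyNode:
    """One term in a hypothetical business taxonomy, e.g. Company > HR > Driver."""
    name: str
    parent: Optional["TaxonomyNode"] = None
    tags: Set[str] = field(default_factory=set)  # tags set directly on this term

    def effective_tags(self) -> Set[str]:
        """Return tags set on this term plus all tags inherited from ancestors."""
        inherited = self.parent.effective_tags() if self.parent else set()
        return inherited | self.tags

    def path(self) -> str:
        """Return the breadcrumb path from the root term down to this term."""
        prefix = self.parent.path() + " > " if self.parent else ""
        return prefix + self.name


# Hypothetical tag policy: tag -> groups allowed to read assets carrying that tag.
TAG_POLICY: Dict[str, Set[str]] = {"PII": {"hr_admins"}}


def can_read(user_groups: Set[str], term: TaxonomyNode) -> bool:
    """Allow access only if the user satisfies the policy of every governed tag."""
    for tag in term.effective_tags():
        allowed = TAG_POLICY.get(tag)
        if allowed is not None and not (user_groups & allowed):
            return False
    return True


# Build the example hierarchy: the PII tag is set once, on HR.
company = TaxonomyNode("Company")
hr = TaxonomyNode("HR", parent=company, tags={"PII"})
driver = TaxonomyNode("Driver", parent=hr)
timesheets = TaxonomyNode("Timesheets", parent=hr)

if __name__ == "__main__":
    print(driver.path(), sorted(driver.effective_tags()))
    print("analyst can read Driver:", can_read({"analysts"}, driver))
    print("hr_admin can read Driver:", can_read({"hr_admins"}, driver))
```

In this sketch the policy is written once against the tag rather than against each asset, so adding another child term under HR would need no policy change; the slides attribute that decoupling of tagging from policy authoring to the Atlas and Ranger integration.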

Editor's Notes

  1. TALK TRACK Data is powering successful clinical care and successful operations. [NEXT SLIDE]
  2. How fast? 7 months!
  3. 7
  4. Apache Atlas is the only open source project created to solve the governance challenge in the open. The founding members of the project include all the members of the Data Governance Initiative and others from the Hadoop community. The core functionality defined by the project includes the following: Data Classification – create an understanding of the data within Hadoop and provide a classification of this data to external and internal sources; Centralized Auditing – provide a framework to capture and report on access to and modifications of data within Hadoop; Search & Lineage – allow pre-defined and ad hoc exploration of data and metadata while maintaining a history of how a data source or explicit data was constructed; Security and Policy Engine – implement engines to protect and rationalize data access according to compliance policy.
  5. Show – clearly identify customer metadata. Change Add customer classification example – Aetna – make the use case story have continuity. Use DX procedures to diagnosis ** bring meta from external systems into hadoop – keep it together
  6. Show – clearly identify customer metadata. Change Add customer classification example – Aetna – make the use case story have continuity. Use DX procedures to diagnosis ** bring meta from external systems into hadoop – keep it together
  7. Which Vendors would you be interested in ?
  8. The point of Atlas is to leverage metadata to drive exchange, agility and scalability in the HDP governance solution. The paradigm shift is that in a true multi-tenant data lake with 10K+ objects, conventional management of entitlement and enforcement will not work and new patterns must be used. One group cannot both understand the data and manage policy efficiently; the domain is too large. These activities must be decoupled. The data stewards curate the data, as they are the SMEs (tagging), and the policy folks create a policy once based on tags (access rules). In our thinking, this is the ONLY scalable solution. We have it and CDH does not.
  9. Apache Atlas = low level service like YARN. It will be common to the whole HDP platform, providing core metadata services and enriching the whole HDP stack. We start with Hive in HDP 2.3 and will extend to Ranger and Falcon in M10 and continue with Kafka and Storm by the end of 2015. Yellow + Atlas = governance features.
  10. Show – clearly identify customer metadata. Change Add customer classification example – Aetna – make the use case story have continuity. Use DX procedures to diagnosis ** bring meta from external systems into hadoop – keep it together
  11. Show – clearly identify customer metadata. Change Add customer classification example – Aetna – make the use case story have continuity. Use DX procedures to diagnosis ** bring meta from external systems into hadoop – keep it together