The interest in Data Catalogs is growing as more business & technical users are looking to gain insight from data using a self-service approach. Architectural techniques for Data Provisioning and Metadata Cataloging have evolved to cater to these new audiences and ways of working. This webinar provides concrete methods of architecting your Self-service BI & Analytics environment to foster collaboration while at the same time maintaining Data Quality and reducing risk.
Data Catalogues - Architecting for Collaboration & Self-Service
1. Copyright Global Data Strategy, Ltd. 2019
Data Catalogues: Architecting for
Collaboration & Self-Service
Donna Burbank
Global Data Strategy, Ltd.
August 26th, 2019
Follow on Twitter @donnaburbank
Twitter Event hashtag: #DAStrategies
2. Global Data Strategy, Ltd. 2019
Donna Burbank
Donna is a recognised industry expert in
information management with over 20 years
of experience in data strategy, information
management, data modeling, metadata
management, and enterprise architecture.
Her background is multi-faceted across
consulting, product development, product
management, brand strategy, marketing,
and business leadership.
She is currently the Managing Director at
Global Data Strategy, Ltd., an international
information management consulting
company that specializes in the alignment of
business drivers with data-centric
technology. In past roles, she has served in
key brand strategy and product
management roles at CA Technologies and
Embarcadero Technologies for several of the
leading data management products in the
market.
As an active contributor to the data
management community, she is a long-time
DAMA International member, Past President
and Advisor to the DAMA Rocky Mountain
chapter, and was recently awarded the
Excellence in Data Management Award from
DAMA International in 2016.
Donna is also an analyst at the Boulder BI
Brain Trust (BBBT) where she provides advice
and gains insight on the latest BI and
Analytics software in the market. She was on
several review committees for the Object
Management Group for key information
management and process modeling
notations.
She has worked with dozens of Fortune 500
companies worldwide in the Americas,
Europe, Asia, and Africa and speaks regularly
at industry conferences. She has co-
authored two books: Data Modeling for the
Business and Data Modeling Made Simple
with ERwin Data Modeler and is a regular
contributor to industry publications. She can
be reached at
donna.burbank@globaldatastrategy.com
Donna is based in Boulder, Colorado, USA.
3. Global Data Strategy, Ltd. 2019
DATAVERSITY Data Architecture Strategies
• January 24 - on demand Emerging Trends in Data Architecture – What’s the Next Big Thing?
• February 18 - on demand Building a Data Strategy - Practical Steps for Aligning with Business Goals
• March 28 - on demand Data Modeling at the Environment Agency of England - Case Study
• April 25 - on demand Data Governance - Combining Data Management with Organizational Change
• May 23 - on demand Master Data Management - Aligning Data, Process, and Governance
• June 27 - on demand Enterprise Architecture vs. Data Architecture
• July 25 - on demand Metadata Management: Technical Architecture & Business Techniques
• August 22 - on demand Data Quality Best Practices (w/ guest Nigel Turner)
• Sept 26 Data Catalogues: Architecting for Collaboration & Self-Service
• October 24 Data Modeling Best Practices: Business and Technical Approaches
• December 3 Building a Future-State Data Architecture Plan: Where to Begin?
This Year’s Lineup
4. Global Data Strategy, Ltd. 2019
Today’s Topic
The interest in Data Catalogs is growing as more business & technical users are looking to gain insight
from data using a self-service approach.
Architectural techniques for Data Provisioning and Metadata Cataloging have evolved to cater to
these new audiences and ways of working.
This webinar provides concrete methods of architecting your Self-service BI & Analytics environment
to foster collaboration while at the same time maintaining Data Quality and reducing risk.
5. Global Data Strategy, Ltd. 2019
What is a Data Catalog?
A data catalog creates and maintains an inventory of data
assets through the discovery, description and organization of
distributed datasets. The data catalog provides context to
enable data stewards, data/business analysts, data engineers,
data scientists and other line of business (LOB) data consumers
to find and understand relevant datasets for the purpose of
extracting business value.
Modern machine-learning-augmented data catalogs automate
various tedious tasks involved in data cataloging, including
metadata discovery, ingestion, translation, enrichment and the
creation of semantic relationships between metadata.
• Gartner, 12 September 2019 - ID G00394570
6. Global Data Strategy, Ltd. 2019
Data Catalog or Metadata Catalog?
M.C.Escher from Wikimedia Commons
7. Global Data Strategy, Ltd. 2019
Data Catalog or Metadata Repository?
• There are functional differences between full metadata repositories and data catalogs.
• As with any tool category, functionality is a continuum, and the overlap depends on the vendor.
Consider your use cases carefully before purchasing any tool.
Metadata Repository
• Automated technical metadata discovery
• Search capability
• Data lineage
• Impact analysis
• Standards enforcement
• Business rule alignment
• Semantic Framework
Data Catalog
• Automated metadata discovery
• Intuitive user search
• Collaboration & User Ranking
• “Light touch” standards enforcement
Vendor Functionality Spectrum
Encyclopedia
9. Global Data Strategy, Ltd. 2019
Product Catalog
Easily Search & Discover
Key Items of Interest
Collaborate with other
Users
Understand Relevance &
Ranking
View Related Items
Organize by Subject Area
or Department
Easily Obtain / Purchase
Items of Interest
View Product Details and
Specifications
10. Global Data Strategy, Ltd. 2019
Product Data Management
Product Master Data
To align common:
• SKUs
• Product Name
• Description
• Price
• Etc.
PIM and/or Doc/Image Mgt.
To align common:
• Images
• Branding
• Etc.
Operational Data
To track:
• Customer Purchase Activity
Reference Data
To track:
• Common Departments, Regions,
Brands, etc.
NoSQL and/or Graph Database
To track:
• Recommendations
• Usage Ranking
• Etc.
Semantic Layer
Data models, taxonomies, hierarchies, etc. to
track:
• Product hierarchy
• Organizational structure
• Brand structure
• Etc.
11. Global Data Strategy, Ltd. 2019
Data Catalog
Discussion Forum
JoeD “This doesn’t include lapsed
customers – where do I find that?”
MaryK “Does anyone have an NPS
query I could use?”
Table: Customer
Description: The Customer Table provides a list of de-duplicated individuals who have
purchased one or more products within the past 18 months.
Columns
Name          Data Type   Description
First Name    Char(20)    Given name of customer
Last Name     Char(50)    Family name(s) of customer
Gender        Varchar(1)  Biological gender
Member Since  Date        Date joining loyalty program
Views
Customer_Demographics
Customer_Address
Related Dashboards
Customer Segmentation
Top Customers by Region
Usage Ranking
Business Areas
- Marketing
- Development
- Sales
Data Assets
- Tables
- Views
- Dashboards
Search: Customer
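The catalog page above can be read as a simple data structure. The following is a minimal sketch, not any vendor's data model: the class names, the `usage_count` field and its values, and the `search` function are assumptions made for illustration. Only the Customer table details come from the slide.

```python
from dataclasses import dataclass, field


@dataclass
class Column:
    name: str
    data_type: str
    description: str


@dataclass
class CatalogEntry:
    name: str
    asset_type: str  # "Table", "View" or "Dashboard"
    description: str
    columns: list = field(default_factory=list)
    related: list = field(default_factory=list)   # related views and dashboards
    comments: list = field(default_factory=list)  # (user, text) pairs from the forum
    usage_count: int = 0  # hypothetical input to the usage ranking


def search(entries, term):
    """Return entries whose name or description mentions term, most used first."""
    term = term.lower()
    hits = [e for e in entries
            if term in e.name.lower() or term in e.description.lower()]
    return sorted(hits, key=lambda e: e.usage_count, reverse=True)


customer = CatalogEntry(
    name="Customer",
    asset_type="Table",
    description=("De-duplicated individuals who have purchased one or more "
                 "products within the past 18 months."),
    columns=[
        Column("First Name", "Char(20)", "Given name of customer"),
        Column("Last Name", "Char(50)", "Family name(s) of customer"),
        Column("Gender", "Varchar(1)", "Biological gender"),
        Column("Member Since", "Date", "Date joining loyalty program"),
    ],
    related=["Customer_Demographics", "Customer_Address",
             "Customer Segmentation", "Top Customers by Region"],
    comments=[("MaryK", "Does anyone have an NPS query I could use?")],
    usage_count=42,  # made-up value
)

demographics = CatalogEntry(
    name="Customer_Demographics",
    asset_type="View",
    description="View over the Customer table.",  # placeholder description
    usage_count=7,  # made-up value
)

catalog = [demographics, customer]
for entry in search(catalog, "customer"):
    print(entry.asset_type, entry.name, entry.usage_count)
```

Keeping discussion comments and usage counts on the same record as the technical metadata is what lets a single search result show both the specification and how colleagues actually use the asset.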
12. Global Data Strategy, Ltd. 2019
Data Catalog
Easily Search & Discover
Key Items of Interest
Collaborate with other
Users
Understand Relevance &
Ranking
View Related Items
Organize by Subject Area
or Department
Easily Obtain / “Purchase”
Items of Interest
View Details and
Specifications
13. Global Data Strategy, Ltd. 2019
Metadata Repository
Metadata Storage, Integration
& Publication
Data Lineage & Impact Analysis
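Impact analysis over data lineage can be thought of as a walk through a dependency graph. A minimal sketch, assuming lineage is stored as a mapping from each asset to the assets that read from it; the `LINEAGE` edges, the source-system name and the function name are hypothetical and only reuse asset names from the Customer example.

```python
from collections import deque

# Hypothetical lineage edges: each asset maps to the assets that consume it.
LINEAGE = {
    "crm.customer_raw": ["Customer"],
    "Customer": ["Customer_Demographics", "Customer_Address"],
    "Customer_Demographics": ["Customer Segmentation"],
    "Customer_Address": ["Top Customers by Region"],
}


def impacted_assets(lineage, changed):
    """Breadth-first walk returning every asset downstream of `changed`."""
    seen = set()
    order = []
    queue = deque([changed])
    while queue:
        current = queue.popleft()
        for child in lineage.get(current, []):
            if child not in seen:
                seen.add(child)
                order.append(child)
                queue.append(child)
    return order


# Which views and dashboards are affected if the Customer table changes?
print(impacted_assets(LINEAGE, "Customer"))
```

Reversing the edges and running the same walk would give upstream lineage, i.e. where a dashboard's data originally came from.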
14. Global Data Strategy, Ltd. 2019
Machine Learning & Metadata Discovery
• Machine Learning offers ways to automate
tedious tasks that may have been done
manually before:
• e.g. Data Mapping
• SSN -> Field1_SSN
• SSN -> Soc_Num
• Etc.
• Machine Learning Pattern Matching
• NNN-NN-NNNN -> Field_X follows this
pattern, it must be a SSN
Source kdnuggets.com
• There is a place for both methods:
• Sometimes you want to define specific mapping rules
• Sometimes you want a pattern-matching, discovery-
style approach.
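The two methods above can be combined in a small sketch. Note the assumptions: a plain regular expression stands in here for the machine-learning pattern matcher, the 0.9 threshold is arbitrary, the function names are hypothetical, and the sample values are made up.

```python
import re

# Explicit mapping rules: known source field names mapped to a business term.
MAPPING_RULES = {"Field1_SSN": "SSN", "Soc_Num": "SSN"}

# NNN-NN-NNNN, the pattern from the slide.
SSN_PATTERN = re.compile(r"\d{3}-\d{2}-\d{4}")


def classify_by_rule(field_name):
    """Look the field name up in the explicit mapping rules."""
    return MAPPING_RULES.get(field_name)


def classify_by_pattern(sample_values, threshold=0.9):
    """Tag a column as SSN when enough non-empty sample values fit the pattern."""
    values = [v for v in sample_values if v]
    if not values:
        return None
    matches = sum(1 for v in values if SSN_PATTERN.fullmatch(v))
    return "SSN" if matches / len(values) >= threshold else None


def classify(field_name, sample_values):
    """Prefer an explicit rule; fall back to discovery from the data itself."""
    return classify_by_rule(field_name) or classify_by_pattern(sample_values)


print(classify("Soc_Num", []))                              # matched by rule
print(classify("Field_X", ["123-45-6789", "987-65-4321"]))  # matched by pattern
print(classify("Field_X", ["Denver", "Boulder"]))           # no match
```

Trying the explicit rule first keeps curated mappings authoritative, while the pattern fallback covers fields nobody has mapped yet.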
15. Global Data Strategy, Ltd. 2019
Collaboration to Support the Self-Service User
“If there are standardized
data sets, I’d love to use
them!”
e.g. Master Data, Data Warehouse
“Published documentation,
metadata, & standard
definitions are super-helpful!”
e.g. Glossaries, data models, etc.
“I want to integrate these data
sets with my own exploratory
data for analysis & modeling!”
e.g. Self-Service Data Prep & Analysis Tools
“How can I leverage what other
people have done, and see
what is most relevant?”
e.g. Data Cataloguing & Crowdsourcing
Today’s self-service data preparation & reporting user makes use of a wide variety of tools & technologies.
16. Global Data Strategy, Ltd. 2019
Integration with Data Governance is Key
• In order to use data catalogues effectively for business success, clear processes and procedures
need to be in place for the governance of and interaction between these different data
landscapes. Examples include:
• Data stewardship roles for curated data sets
• Automated feedback loops to encourage collaborative input for business definitions and rules
• Review cycles for standard data sets, reports, analytical models, etc.
• Publication and distribution mechanisms for shared data sets
• Processes for data promotion between data discovery and enterprise use
• Data lifecycle and workflow
17. Global Data Strategy, Ltd. 2019
Implement “Just Enough” Data Governance
• Know what to manage closely and what to leave alone
• The more the data is shared across & beyond the organization, the more formal governance needs to be
Master & Reference Data
• Common data elements used by multiple stakeholders across functional areas, applications, etc.
• Highly governed
• Highly published & shared
Examples
• Reference Data: Department Codes, Country Codes, etc.
• Master Data: Customer, Product, Student, Supplier, etc.
Core Enterprise Data
• Common data elements used by multiple stakeholders, departments, etc. (e.g. DW)
• Highly governed
• Highly published & shared
Examples
• Common Financial Metrics: for Financial & Regulatory Reporting
• Common Attributes: Core attributes reused across multiple areas (e.g. Customer name, Address, etc.)
Functional & Operational Data
• Lightly modeled & prepared data for limited sharing & reuse
• Collaboration-based governance
• May be future candidates for core data
Examples
• Operational Reporting
• Non-productionized analytical model data
• Ad hoc reporting & discovery
Exploratory Data
• Raw or lightly prepped data for exploratory analysis
• Mainly ad hoc, one-off analysis
• Light touch governance
Examples
• Raw data sets for exploratory analytics
• External & Open data sources
Exploratory analysis uses core data sets when applicable.
Derived variables of value can be fed into Core Enterprise, or even Master Data.
Publish / Promote
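The tiers on this slide and the promotion path between them could be sketched as follows. This is an illustration under stated assumptions, not a prescribed process: the linear tier ordering, the `promote` function, and the rule that entry into a highly governed tier needs data steward approval are all hypothetical.

```python
from enum import Enum


class Tier(Enum):
    EXPLORATORY = 1
    FUNCTIONAL = 2   # Functional & Operational Data
    CORE = 3         # Core Enterprise Data
    MASTER = 4       # Master & Reference Data


# Governance style per tier, as described on the slide.
GOVERNANCE = {
    Tier.EXPLORATORY: "light touch",
    Tier.FUNCTIONAL: "collaboration-based",
    Tier.CORE: "highly governed",
    Tier.MASTER: "highly governed",
}


def promote(tier, steward_approved=False):
    """Move a data set one tier up (assumed rule: governed tiers need sign-off)."""
    if tier is Tier.MASTER:
        raise ValueError("already at the most governed tier")
    target = Tier(tier.value + 1)
    if GOVERNANCE[target] == "highly governed" and not steward_approved:
        raise PermissionError(f"promotion to {target.name} needs steward approval")
    return target


tier = promote(Tier.EXPLORATORY)             # exploratory -> functional
tier = promote(tier, steward_approved=True)  # functional -> core
print(tier.name, "-", GOVERNANCE[tier])
```

Making approval depend on the target tier rather than on the data set itself reflects the "just enough" idea: formality increases only as data becomes more widely shared.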
18. Global Data Strategy, Ltd. 2019
Summary
• Data Catalogues provide an intuitive way to access and discover core enterprise data
• Collaboration and feedback loops are critical to success
• Integration with Data Governance is important to maintain the Data Catalogue effectively in
the long term
• Understand your use case before choosing a tool, e.g. rigorous standards and lineage, or a
looser, collaborative approach?
19. Global Data Strategy, Ltd. 2019
DATAVERSITY Data Architecture Strategies
• January 24 - on demand Emerging Trends in Data Architecture – What’s the Next Big Thing?
• February 18 - on demand Building a Data Strategy - Practical Steps for Aligning with Business Goals
• March 28 - on demand Data Modeling at the Environment Agency of England - Case Study
• April 25 - on demand Data Governance - Combining Data Management with Organizational Change
• May 23 - on demand Master Data Management - Aligning Data, Process, and Governance
• June 27 - on demand Enterprise Architecture vs. Data Architecture
• July 25 - on demand Metadata Management: Technical Architecture & Business Techniques
• August 22 - on demand Data Quality Best Practices (w/ guest Nigel Turner)
• Sept 26 – soon on demand Data Catalogues: Architecting for Collaboration & Self-Service
• October 24 Data Modeling Best Practices: Business and Technical Approaches
• December 3 Building a Future-State Data Architecture Plan: Where to Begin?
Join Us Next Month
20. Global Data Strategy, Ltd. 2019
About Global Data Strategy, Ltd
• Global Data Strategy is an international information management consulting company that
specializes in the alignment of business drivers with data-centric technology.
• Our passion is data, and helping organizations enrich their business opportunities through data and
information.
• Our core values center around providing solutions that are:
• Business-Driven: We put the needs of your business first, before we look at any technology solution.
• Clear & Relevant: We provide clear explanations using real-world examples.
• Customized & Right-Sized: Our implementations are based on the unique needs of your organization’s
size, corporate culture, and geography.
• High Quality & Technically Precise: We pride ourselves in excellence of execution, with years of
technical expertise in the industry.
Data-Driven Business Transformation
Business Strategy
Aligned With
Data Strategy
Visit www.globaldatastrategy.com for more information