This document discusses data quality in metadata. It addresses several topics related to metadata standards and ensuring metadata provides meaningful communication. Key points include:
- The importance of using standard definitions for metadata elements rather than local interpretations to ensure consistent understanding across trading partners.
- The value of "small data", uniquely identifying data points that distinguish individual works from each other, over broad keywords.
- Issues that can arise when metadata does not clearly specify market information like territories, which can result in works being distributed in unintended markets.
- The opportunity for ONIX 3.0 to improve how different markets and their attributes are specified in metadata.
- Recommendations for organizations to carefully map their business needs to the metadata standards they use.
3. Data Quality
Benchmarking
From IIBA BABOK Guide version 2.0
Purpose:
Benchmark studies are performed to compare the strengths
and weaknesses of an organization against its peers and
competitors.
4. Data Quality
Benchmarking
From IIBA BABOK Guide version 2.0
Description:
Benchmark studies are conducted to compare organizational
practices against the best in-class practices that exist within
competitor enterprises.
5. Data Quality
Data producers focus on the requests end users make
and on what those users support.
End users make sense of the data they receive.
One activity mirrors the other.
6. Data Quality
Identical metadata cannot be traded using multiple
trading partners’ definitions, but it can be traded to
multiple partners by using the standard’s definitions.
Any business decision to use a standard should
include the decision to use the definitions of the
standard.
13. Data Quality
Nothing stops companies from systematically
refusing to implement a portion of a standard.
Standards are particularly vulnerable to
requests from, or received wisdom about,
large players.
17. Data Quality
ONIX for Books
Acknowledgement Format Specification
A means to exchange information in support of metadata
trading, including two-way communication.
Amazon intends to start using this tool.
19. Data Quality
“Harry Potter” & “COVID” are now ineffective searches.
Author searches return comparable authors.
Entries need careful study to determine what is irrelevant.
Keyword processing pointlessly repeats entries already
indexed in the series, author, title, audience and subject fields.
20. Data Quality
Does it make any sense to supply to trading
partners keywords that are designed to manipulate
consumer search results on a specific retail site?
Can you manipulate Amazon’s search meaningfully
when Amazon holds the world’s most developed and
complex consumer data, of which keyword
creators can access only a small portion?
21. Data Quality
Retail search optimization may belong with the
site owner, and not with their data suppliers,
but I can confirm that
search degraded on BNC CataList when
keyword searching was added.
23. Data Quality
Data quality, when defined as
meaningful communication,
would include retailers being clear about
what they index to support their search.
25. Data Quality
SMALL DATA
as an aspirational definition
Small data is when a specific data point appears on fewer
than 10 percent of the books in an aggregated database
BUT
more than 90 percent of the books in that aggregate
carry more than one such small data point.
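The definition above can be sketched as a check over an aggregated database. This is a minimal illustration, not part of any standard: each book is represented simply as the set of its distinguishing data points, and the 10/90 thresholds are taken from the slide.

```python
from collections import Counter

def is_small_data_aggregate(books, point_cap=0.10, coverage_floor=0.90):
    """Check the slide's aspirational definition: a 'small' data point
    appears on fewer than point_cap of the books in the aggregate, and
    more than coverage_floor of the books carry more than one such point."""
    total = len(books)
    if total == 0:
        return False
    # How many books carry each data point
    counts = Counter(point for points in books for point in points)
    # A point is 'small data' if it appears on fewer than 10% of books
    small = {p for p, c in counts.items() if c / total < point_cap}
    # The aggregate qualifies if >90% of books carry more than one small point
    covered = sum(1 for points in books if len(points & small) > 1)
    return covered / total > coverage_floor
```

A broad keyword such as “fiction” fails the first test by appearing on too many books; a nuanced subject code passes it, and the aggregate qualifies only when almost every book carries several such codes.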
26. Data Quality
Keywords are a poor substitute for small data.
Thema, a structured, predictable
subject system of great nuance,
would provide far better support
for the use of small data.
28. Data Quality
Does Canada represent a distinct market?
Canadian metadata regularly supplies only partial
market information,
or none at all.
29. Data Quality
Some common and convenient metadata choices
that blur market differences
• Publisher Sales Rights statements used to cover distributor rights
• Not assigning a territory to distributors
• Supplying both CAD and USD pricing to exclusive distributors of
a single market
• Not applying a price territory
• Inconsistent territory statements that follow no logical
pattern across the product
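ONIX 3.0 addresses the last two bullets by letting each price carry its own territory. A hedged sketch of two territory-scoped prices inside a SupplyDetail (amounts, currency and country codes are illustrative; verify the codes against the official ONIX code lists):

```xml
<Price>
  <PriceType>01</PriceType>  <!-- List 58: 01 = RRP excluding tax -->
  <PriceAmount>24.99</PriceAmount>
  <CurrencyCode>CAD</CurrencyCode>
  <Territory>
    <CountriesIncluded>CA</CountriesIncluded>
  </Territory>
</Price>
<Price>
  <PriceType>01</PriceType>
  <PriceAmount>19.99</PriceAmount>
  <CurrencyCode>USD</CurrencyCode>
  <Territory>
    <CountriesIncluded>US</CountriesIncluded>
  </Territory>
</Price>
```

With an explicit price territory, an aggregator never has to guess which market a CAD or USD price was meant for.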
30. Data Quality
Automated processing of Canadian metadata by
major US wholesalers and distributors has resulted
in them assigning Canadian distribution rights to
themselves.
What Canadian companies don’t supply can’t be
processed by any data aggregator.
31. Data Quality
The transition to ONIX 3.0 is an
opportunity to solve this.
ONIX 3.0 provides for and expects
greatly improved market statements.
32. Data Quality
Each Block 6 “Product Supply” represents a unique
Market and should include a Market composite.
Its triple A support can include
• A market specific publishing status
• A market specific publication date
• A market specific embargo date
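Under those expectations, a Block 6 market statement might be sketched as follows. Element names follow ONIX 3.0; all codes, dates and names here are invented for illustration and should be checked against the ONIX specification and code lists:

```xml
<ProductSupply>
  <Market>
    <Territory>
      <CountriesIncluded>CA</CountriesIncluded>
    </Territory>
  </Market>
  <MarketPublishingDetail>
    <!-- Market-specific publishing status (List 68: 04 = Active) -->
    <MarketPublishingStatus>04</MarketPublishingStatus>
    <!-- Market-specific publication date (List 163: 01 = Publication date) -->
    <MarketDate>
      <MarketDateRole>01</MarketDateRole>
      <Date>20240901</Date>
    </MarketDate>
    <!-- Market-specific embargo date (List 163: 02 = Sales embargo date) -->
    <MarketDate>
      <MarketDateRole>02</MarketDateRole>
      <Date>20240815</Date>
    </MarketDate>
  </MarketPublishingDetail>
  <SupplyDetail>
    <Supplier>
      <!-- List 93: illustrative non-exclusive distributor role -->
      <SupplierRole>03</SupplierRole>
      <SupplierName>Example Distributor</SupplierName>
    </Supplier>
    <!-- Price and availability details omitted -->
  </SupplyDetail>
</ProductSupply>
```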
33. Data Quality
Of over 300,000 ONIX 3.0 records in BiblioShare
• Fewer than 10% carry the “expected” market
composite
• 59% of the Supplier Roles, a “mandatory” data
point, are coded as “undefined”
35. Data Quality
Do your own mapping.
• Know how your business needs are communicated in
any standard you use.
• Make sure staff doing data entry have access to the
full ONIX code lists including their definitions.
• Benchmark your needs against the standard as well
as what your trading partners support & request.
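One way to benchmark your own output against the standard is a small audit script. A minimal sketch, assuming a simplified, un-namespaced ONIX 3.0 fragment and a hand-copied slice of code List 93 (both are illustrative assumptions; use the full official code lists in practice):

```python
import xml.etree.ElementTree as ET

# Illustrative slice of ONIX code List 93 (Supplier role); copy the
# full list and its definitions from the official ONIX documentation.
SUPPLIER_ROLE = {
    "00": "Unspecified",
    "01": "Publisher to retailers",
    "02": "Publisher's exclusive distributor to retailers",
    "03": "Publisher's non-exclusive distributor to retailers",
}

def audit_supplier_roles(onix_xml):
    """Flag ProductSupply composites whose SupplierRole is missing,
    unknown, or coded '00' (unspecified)."""
    issues = []
    root = ET.fromstring(onix_xml)
    for i, supply in enumerate(root.iter("ProductSupply"), start=1):
        role = supply.findtext(".//SupplierRole")
        if role is None:
            issues.append(f"ProductSupply {i}: SupplierRole missing")
        elif role == "00":
            issues.append(f"ProductSupply {i}: SupplierRole is 00 (Unspecified)")
        elif role not in SUPPLIER_ROLE:
            issues.append(f"ProductSupply {i}: unknown SupplierRole {role}")
    return issues
```

The same pattern extends to any “mandatory” data point your trading partners rely on: map the element, look up its code list, and flag records that fall short before they leave your systems.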
36. Data Quality
• Add “small data” values that reflect your business
• Let your trading partners and standards organizations
know what you support.
• Let your trading partners and standards organizations
know what you need.
Expect your trading partners to do the same.