Brief and skeptical presentation about wikidata and its potential for use and abuse in the cultural heritage data ecosystem, presented at the PCC/LDAC forum on wikidata, November 12th, 2021.
Wikidata and Cultural Heritage
robert.sanderson@yale.edu, @azaroth42
4. Ecosystems - Systems
Wikidata is a single, centralized system:
• Technology dependent (wikibase, “unique” snak format)
• Ontology is only internal (Pxxx properties)
• Vocabulary is only internal (Q instances)
• Doesn’t try to be part of the web, only on the web
• Does provide identifiers for the entity in existing systems
Wikidata is the Facebook of Metadata
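As a rough illustration of the points above, the sketch below shows one statement in the shape of Wikidata's entity JSON and reduces its main snak to a plain pair. The sample is hand-written and trimmed, and `snak_to_pair` is a hypothetical helper, not part of any Wikidata library. Note that both the property (P31) and the value (Q5) are identifiers that only mean something inside Wikidata.

```python
# Hand-written, trimmed sample in the shape of Wikidata's entity JSON.
STATEMENT = {
    "type": "statement",
    "rank": "normal",
    "mainsnak": {
        "snaktype": "value",
        "property": "P31",  # internal ontology: "instance of"
        "datatype": "wikibase-item",
        "datavalue": {
            "type": "wikibase-entityid",
            "value": {"entity-type": "item", "id": "Q5"},  # internal vocabulary: "human"
        },
    },
}


def snak_to_pair(snak):
    """Reduce one snak to a (property, value) pair (hypothetical helper)."""
    if snak["snaktype"] != "value":
        # "novalue" and "somevalue" snaks carry no datavalue.
        return snak["property"], None
    value = snak["datavalue"]["value"]
    if snak["datavalue"]["type"] == "wikibase-entityid":
        value = value["id"]
    return snak["property"], value


pair = snak_to_pair(STATEMENT["mainsnak"])
print(pair)  # ('P31', 'Q5')
```

Nothing in the pair says "type" or "human"; a consumer has to go back to Wikidata to find out what either identifier means.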
5. Ecosystems - Community
Is there concern about wikidata establishing a “monopoly on open data”?
• You can’t compete with free; first to market advantage
• Concerted branding/differentiation effort from Wikimedia to establish “wikimedians” rather than “contributors”
• Is there oxygen left for discussions about perhaps better ways to do this?
6. Trust - Accuracy
Using wikidata is not an issue of control, it’s an issue of trust.
Accuracy – Does the data appropriately describe reality?
Trust – Will it continue to do so in the future?
Trust is hard given the data is open and constantly changing.
How will the reputation of your organization be affected
(positively or negatively) by using constantly changing data?
10. Usability – Ontology and Data
Some challenges, beyond the “unique” format:
• Properties are just identifiers (“P31”) not named (“type”)
• Records are flat … apart from qualifiers and references
    • Which solve two issues, but not all
• Constantly changing records and ontology
    • Without an effective synchronization mechanism
• Non-traditional conflation of ontology and vocabulary together as instance data
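To make the first two challenges concrete, here is a minimal sketch of what a developer ends up writing just to read one statement. It assumes a hand-written statement in the shape of Wikidata's entity JSON; the `LABELS` table, the placeholder item `Q999999999`, and the helper names are all hypothetical. P69 ("educated at") and P582 ("end time") are real property identifiers.

```python
# Hypothetical local label cache. Properties (P...) and items (Q...) share one
# lookup, because both ontology and vocabulary are just entities in Wikidata.
LABELS = {
    "P69": "educated at",
    "P582": "end time",
    "Q999999999": "Example College",  # placeholder item, not a real entity
}

# Hand-written, trimmed statement: flat main snak plus one qualifier.
STATEMENT = {
    "mainsnak": {
        "snaktype": "value",
        "property": "P69",
        "datavalue": {
            "type": "wikibase-entityid",
            "value": {"entity-type": "item", "id": "Q999999999"},
        },
    },
    "qualifiers": {
        "P582": [{
            "snaktype": "value",
            "property": "P582",
            "datavalue": {
                "type": "time",
                "value": {"time": "+1974-00-00T00:00:00Z", "precision": 9},
            },
        }],
    },
}


def snak_value(snak):
    """Return a simple value for a snak, or None when it has no datavalue."""
    if snak["snaktype"] != "value":
        return None
    datavalue = snak["datavalue"]
    if datavalue["type"] == "wikibase-entityid":
        return datavalue["value"]["id"]
    if datavalue["type"] == "time":
        return datavalue["value"]["time"]
    return datavalue["value"]


def name(identifier, labels):
    """Swap an identifier for its label when one is known."""
    if isinstance(identifier, str):
        return labels.get(identifier, identifier)
    return str(identifier)


def describe(statement, labels):
    """Render a statement, with its qualifiers in square brackets."""
    main = statement["mainsnak"]
    text = f"{name(main['property'], labels)}: {name(snak_value(main), labels)}"
    qualifiers = [
        f"{name(prop, labels)}: {name(snak_value(snak), labels)}"
        for prop, snaks in statement.get("qualifiers", {}).items()
        for snak in snaks
    ]
    if qualifiers:
        text += " [" + "; ".join(qualifiers) + "]"
    return text


print(describe(STATEMENT, {}))      # identifiers only
print(describe(STATEMENT, LABELS))  # with names looked up
```

Without the label table the output is `P69: Q999999999 [P582: +1974-00-00T00:00:00Z]`; every name a person would recognize has to be fetched separately and kept in step with a source that keeps changing.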
11. Usability – Use?
What can we use wikidata for?
• Source of external identifiers!
• Source of names in different languages
• More specific information, e.g. date not year
• Relationships to other entities
Wikidata is great for augmenting cultural knowledge
with details and relationships beyond traditional catalogs
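The four uses listed above can be sketched against a single entity record. This is a minimal example under stated assumptions: the record is hand-written in the shape of Wikidata's entity JSON, the VIAF value `EXAMPLE-0001` is a placeholder rather than a real identifier, and all function names are hypothetical.

```python
# Hand-written, trimmed record in the shape of Wikidata's entity JSON.
ENTITY = {
    "id": "Q42",
    "labels": {
        "en": {"language": "en", "value": "Douglas Adams"},
        "ru": {"language": "ru", "value": "Дуглас Адамс"},
    },
    "claims": {
        "P31": [{"mainsnak": {  # instance of
            "snaktype": "value", "property": "P31", "datatype": "wikibase-item",
            "datavalue": {"type": "wikibase-entityid",
                          "value": {"entity-type": "item", "id": "Q5"}}}}],
        "P569": [{"mainsnak": {  # date of birth
            "snaktype": "value", "property": "P569", "datatype": "time",
            "datavalue": {"type": "time",
                          "value": {"time": "+1952-03-11T00:00:00Z", "precision": 11}}}}],
        "P214": [{"mainsnak": {  # VIAF ID; the value is a placeholder
            "snaktype": "value", "property": "P214", "datatype": "external-id",
            "datavalue": {"type": "string", "value": "EXAMPLE-0001"}}}],
    },
}


def main_snaks(entity):
    """Yield every main snak that actually carries a value."""
    for statements in entity.get("claims", {}).values():
        for statement in statements:
            snak = statement["mainsnak"]
            if snak["snaktype"] == "value":
                yield snak


def external_ids(entity):
    """Identifiers for the same entity in other systems, keyed by property."""
    found = {}
    for snak in main_snaks(entity):
        if snak.get("datatype") == "external-id":
            found.setdefault(snak["property"], []).append(snak["datavalue"]["value"])
    return found


def names_by_language(entity):
    """Labels keyed by language code."""
    return {lang: label["value"] for lang, label in entity.get("labels", {}).items()}


def format_time(value):
    """Trim a time value to its stated precision (9 = year, 10 = month, 11 = day)."""
    raw = value["time"]  # assumes the leading sign is always present
    sign = "-" if raw.startswith("-") else ""
    year, month, day = raw[1:].split("T")[0].split("-")
    precision = value.get("precision", 11)
    if precision >= 11:
        return f"{sign}{year}-{month}-{day}"
    if precision == 10:
        return f"{sign}{year}-{month}"
    return f"{sign}{year}"  # coarser precisions are simply shown as the year


def dates(entity):
    """Dates keyed by property, trimmed to their precision."""
    found = {}
    for snak in main_snaks(entity):
        if snak["datavalue"]["type"] == "time":
            found.setdefault(snak["property"], []).append(format_time(snak["datavalue"]["value"]))
    return found


def relationships(entity):
    """(property, target item) pairs for statements that point at other entities."""
    return [(snak["property"], snak["datavalue"]["value"]["id"])
            for snak in main_snaks(entity)
            if snak["datavalue"]["type"] == "wikibase-entityid"]


print(external_ids(ENTITY))
print(names_by_language(ENTITY))
print(dates(ENTITY))
print(relationships(ENTITY))
```

The external identifiers are the piece that lets a local catalog record be matched against other systems without adopting wikidata's own model.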
Or is that the Meta of Metadata? I’m just as confused as you.
BBC example. Impossible to know what changes will come. Today, inaccuracy is mostly well-intentioned robots making poorly coded judgements. Matt Miller adding LC identifiers for places.
Duplicates, Permanent Duplicates, no separation between classes and instances. How can we expect data of sufficient quality for use, when there’s no distinction between these different entities?
And it did not end there. I don’t trust a system where the norms say I cannot describe myself. This is an innocuous case, but you can imagine many similar scenarios where having information available, or not available, would be very damaging.
Usability of the data is determined by the audience, and for data that audience is the software developer.