Technical usability of Wikidata’s linked data: evaluation of machine interoperability and data interpretability

QOD 2019 – 2nd Workshop on Quality of Open Data
June 2019
Nuno Freire, Antoine Isaac

CC BY-SA
Outline
● About Europeana
● Europeana and linked data
● Why investigate Wikidata and linked data for Europeana?
● Use cases from data applications in our study
● Study setup and system
● Results
● Conclusions, ongoing and future work

Europeana
The Platform for Europe’s Digital Cultural Heritage
● Aggregates and makes available data:
• From all EU countries
• From ~3,700 galleries, libraries, archives and museums
• Under a CC0 licence
• More than 58M objects
• In about 50 languages
We aggregate metadata:
• From all EU countries
• From ~3,700 galleries, libraries, archives and museums
• More than 58M objects
• In more than 40 languages
• With a high number of references to places, agents, concepts and time periods

Europeana
The Platform for Europe’s Digital Cultural Heritage
Data aggregation focused on metadata, with cultural objects as the main entity

Europeana Linked Data Strategy
Our lines of work
● The Europeana Data Model (EDM) offers a base for linking metadata
● We apply automatic enrichment to link object metadata to reference datasets
● We encourage data providers to contribute their own links to vocabularies
● We encourage alignment activities between domain vocabularies

Why Wikidata?
Wikidata complies with all of Europeana’s criteria for selecting a vocabulary:
● Properly documented and supported by a community ✔
● Technically available on the web according to the Linked Data best practices and recipes ✔
● Available under an open licence ✔
● Multilingual (Wikidata offers labels in about 124 languages, of which 48 match the languages that Europeana supports) ✔
● Applies the best practices and standards for the representation, structure and description of vocabularies ✔
● Well-connected internally and externally to other vocabularies (works well as a “pivot” vocabulary) ✔
Additionally…
● It gives fairly complete and accurate descriptive metadata about entities
Currently, Wikidata is a data source for enrichment of metadata in Europeana

Motivation for evaluating Wikidata and linked data
● Wikidata can be a data source for cultural heritage objects
• Increasing interest from cultural heritage institutions in sharing descriptions of their digital objects
● We are investigating linked data for innovating the process of aggregation of metadata:
• Aggregation of linked data has been the subject of a case study
• Schema.org is suitable for describing cultural heritage resources

Motivation for machine interoperability and data interpretability
● Linked data sources of cultural data are numerous but data is heterogeneous across them
● Effective and sustainable usage of these sources must be supported by automatic means
• A minimum level of compliance with the Semantic Web is necessary

Use cases of linked data consumption addressed in this study

Our study setup (1/3)

Our study setup (2/3)

Our study setup (3/3)

The linked data system

Results
● Wikidata’s RDF presents some difficulties for cross-domain applications
● Wikidata uses a very limited number of properties for general data processing
○ Most of the properties in use are labels for human users
● Wikidata has chosen to use properties from its own ontology instead of equivalent RDF, RDF-Schema, OWL or SKOS properties
○ Without human support, applications are unable to interpret Wikidata’s properties

The other namespaces in use in Wikidata’s RDF output
Occurrences in the 11,798 Wikidata resources of our sample
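A tally of this kind could be produced along the following lines. This is only a minimal sketch: the triples in `SAMPLE` are invented for illustration, the helper names `namespace_of` and `count_predicate_namespaces` are hypothetical, and counting is done per triple, which may differ from how the occurrences in the study were counted.

```python
from collections import Counter


def namespace_of(uri: str) -> str:
    """Return the namespace part of a URI: everything up to the last '#' or '/'."""
    cut = max(uri.rfind("#"), uri.rfind("/"))
    return uri[: cut + 1] if cut >= 0 else uri


def count_predicate_namespaces(triples):
    """Count how many triples use a predicate from each namespace."""
    return Counter(namespace_of(p) for _s, p, _o in triples)


# Toy (subject, predicate, object) triples, invented for illustration only.
SAMPLE = [
    ("http://www.wikidata.org/entity/Q1",
     "http://www.w3.org/1999/02/22-rdf-syntax-ns#type",
     "http://wikiba.se/ontology#Item"),
    ("http://www.wikidata.org/entity/Q1",
     "http://www.wikidata.org/prop/direct/P31",
     "http://www.wikidata.org/entity/Q2"),
    ("http://www.wikidata.org/entity/Q1",
     "http://www.wikidata.org/prop/direct/P279",
     "http://www.wikidata.org/entity/Q3"),
    ("http://www.wikidata.org/entity/Q1",
     "http://www.w3.org/2000/01/rdf-schema#label",
     "example"),
]

counts = count_predicate_namespaces(SAMPLE)
for namespace, n in counts.most_common():
    print(namespace, n)
```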

Results - general linked data processing
● Wikidata makes limited use of rdf:type
○ It is used just to state that the RDF resource is an Item from the Wikibase ontology (http://wikiba.se/ontology#Item)
○ For further types, the property wdt:P31 is used.
● Not all Wikidata RDF predicate URIs are resolvable
○ In the case of property wdt:P31, it is stated in the data as http://www.wikidata.org/prop/direct/P31, which is not resolvable. The corresponding resolvable URI is http://www.wikidata.org/entity/P31
○ These unresolvable URIs limit machine interpretation of the predicates
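The two points above can be illustrated with a small sketch. It assumes that a consumer is willing to rewrite the `prop/direct/` namespace to the `entity/` namespace itself, and to read wdt:P31 as a type statement; the function names and the toy triples are hypothetical and not part of the study's system.

```python
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
WDT = "http://www.wikidata.org/prop/direct/"  # namespace used in the data
WD = "http://www.wikidata.org/entity/"        # namespace that resolves
INSTANCE_OF = WDT + "P31"


def resolvable_property_uri(predicate_uri: str) -> str:
    """Map a prop/direct predicate URI to the corresponding entity URI.

    URIs outside the prop/direct namespace are returned unchanged.
    """
    if predicate_uri.startswith(WDT):
        return WD + predicate_uri[len(WDT):]
    return predicate_uri


def types_of(subject, triples):
    """Collect types stated with rdf:type or, by assumption, with wdt:P31."""
    return {o for s, p, o in triples
            if s == subject and p in (RDF_TYPE, INSTANCE_OF)}


# Toy triples, invented for illustration only.
TRIPLES = [
    (WD + "Q1", RDF_TYPE, "http://wikiba.se/ontology#Item"),
    (WD + "Q1", INSTANCE_OF, WD + "Q2"),
]

print(resolvable_property_uri(INSTANCE_OF))
print(sorted(types_of(WD + "Q1", TRIPLES)))
```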

Results - general linked data processing
● In order to proceed with the experiment, we manually added alignment statements to our knowledge base
● In fact, most of the alignments are already recorded in Wikidata, but they are expressed using predicates from Wikidata’s namespaces
○ … limiting the interpretation by machines
(The alignments are presented in a later slide)
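As an illustration of what such alignment statements might look like in code, the sketch below restates aligned Wikidata predicates with standard ones. The contents of `ASSUMED_ALIGNMENTS` are an assumption based on the usual meaning of these Wikidata properties; they are not the list of alignments used in the study, and the function name is hypothetical.

```python
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
RDFS = "http://www.w3.org/2000/01/rdf-schema#"
OWL = "http://www.w3.org/2002/07/owl#"
WDT = "http://www.wikidata.org/prop/direct/"

# Assumed alignments, for illustration only (not the study's own list).
ASSUMED_ALIGNMENTS = {
    WDT + "P31": RDF + "type",                  # instance of
    WDT + "P279": RDFS + "subClassOf",          # subclass of
    WDT + "P1647": RDFS + "subPropertyOf",      # subproperty of
    WDT + "P1709": OWL + "equivalentClass",     # equivalent class
    WDT + "P1628": OWL + "equivalentProperty",  # equivalent property
}


def with_standard_predicates(triples, alignments=ASSUMED_ALIGNMENTS):
    """Return the input triples plus one restated triple per aligned predicate."""
    result = list(triples)
    for s, p, o in triples:
        if p in alignments:
            result.append((s, alignments[p], o))
    return result


# Toy triples, invented for illustration only.
example = [
    ("http://www.wikidata.org/entity/Q1", WDT + "P279",
     "http://www.wikidata.org/entity/Q2"),
    ("http://www.wikidata.org/entity/Q1", WDT + "P18", "image.jpg"),
]
for triple in with_standard_predicates(example):
    print(triple)
```

The original triples are kept and the standard predicates are added alongside them, so nothing from the source data is lost.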

Results - acquiring Schema.org semantics from Wikidata
● Equivalence and specialisation relations between classes and properties are used to derive mappings (direct or inferred) between Wikidata and Schema.org
● We came across two obstacles.
○ For finding alignments, we again faced the non-resolvable URIs
○ For navigating Wikidata’s class and property hierarchy, we had to manually add alignment statements to our knowledge base
■ Wikidata data properties are used to state the class and property hierarchy
● Adding further alignments to our knowledge base was necessary
Alignments added for enabling automatic data processing and ontology reasoning

● For classes, we found 102 distinct ones, 57% of which had alignments to Schema.org
○ 49% are direct alignments and 7.9% are alignments inherited from superclasses
● For properties, we found 266 distinct ones, 44% of which had alignments to Schema.org
○ Only direct alignments were found for properties.

These results are a good indicator that many applications would be able to make use of the structured data.
The listing of the individual alignments found may be consulted online:
https://github.com/nfreire/data-aggregation-lab/blob/master/data-aggregation-casestudies/documentation/wikidata/SchemaOrg-ontology-alignments-listing.md

Conclusions
● Currently, a human operator must assist linked data applications in interpreting Wikidata’s RDF
○ This requires training staff on Wikidata’s data model and its representation in RDF
○ The use of predicates from Wikidata’s own ontology makes the data uninterpretable for crawlers that rely on the Semantic Web’s properties for general data processing
● Another difficulty is the use of namespaces that are not resolvable for Wikidata’s properties
● ...but Wikidata contains enough alignment data to RDF, RDFS, OWL, SKOS and Schema.org: machine interpretation of Wikidata is just a few steps away

Ongoing and future work
● At this time, we are analyzing the results of evaluating the domain-specific use case, which uses Wikidata data as input to Europeana's cultural heritage metadata. Early indications are that Wikidata's quality is high enough for this case
● In future work, we expect to evaluate the linked data published by Europeana's data providers in terms of machine processing

Thank you for your attention
nuno.freire@tecnico.ulisboa.pt
Image: Arrival of a Portuguese ship, Anonymous, 1660 - 1625, Rijksmuseum, Netherlands, Public Domain
Acknowledgments
Fundação para a Ciência e a Tecnologia (FCT): UID/CEC/50021/2013
European Commission contract number 30-CE-0885387/00-80.
