Wikidata is an outstanding data source with potential application in many scenarios. Wikidata provides its data openly in RDF. Our study aims to evaluate the usability of Wikidata as a data source for robots operating on the web of data, according to the specifications and practices of linked data, the Semantic Web and ontology reasoning. We evaluated it from the perspective of two use cases of data crawling robots, which are guided by our general motivation to acquire richer data for Europeana, a data aggregator from the Cultural Heritage domain. The first use case regards general data consumption applications based on RDF, RDF-Schema, OWL, SKOS and linked data. The second use case regards applications that explore semantics relying on Schema.org and SKOS. We conclude that a human operator must assist linked data applications to interpret Wikidata’s RDF because of the choices that were made at Wikidata in the definition of its expression in RDF. The semantics of the RDF output from Wikidata is “locked-in” by the usage of Wikidata’s own ontology, resulting in the need for human intervention. Wikidata is only a few steps away from high quality machine interpretation, however. It contains extensive alignment data to RDF, RDFS, OWL, SKOS and Schema.org, but a machine interpretation of those alignments can only be done if some essential Wikidata alignment properties are known.
Research paper presentation at QOD 2019 – 2nd Workshop on Quality of Open Data, June 2019
Technical usability of Wikidata’s linked data: evaluation of machine interoperability and data interpretability
1. QOD 2019 – 2nd Workshop on Quality of Open Data
June 2019
Technical usability of Wikidata’s linked data:
evaluation of machine interoperability and data interpretability
Nuno Freire, Antoine Isaac
2. Outline
CC BY-SA
● About Europeana
● Europeana and linked data
● Why investigate Wikidata and linked data for Europeana?
● Use cases from data applications in our study
● Study setup and system
● Results
● Conclusions, ongoing and future work
3. Europeana
The Platform for Europe’s Digital Cultural Heritage
● Aggregates and makes available metadata:
• From all EU countries
• From ~3,700 galleries, libraries, archives and museums
• Under a CC0 licence
• More than 58M objects
• In about 50 languages
• With a high number of references to places, agents, concepts and time periods
4. Europeana
The Platform for Europe’s Digital Cultural Heritage
Data aggregation focused on metadata
… with cultural objects as the main entity
5. Europeana Linked Data Strategy
Our lines of work
● The Europeana Data Model (EDM) offers a base for linking metadata
● We apply automatic enrichment to link object metadata to reference datasets
● We encourage data providers to contribute their own links to vocabularies
● We encourage alignment activities between domain vocabularies
6. Why Wikidata?
It complies with all of Europeana’s criteria for selecting a vocabulary:
● Properly documented and supported by a community
● Technically available on the web according to Linked Data best practices and recipes
● Available under an open licence
● Multilingual (Wikidata offers labels in about 124 languages, of which 48 match the languages that Europeana supports)
● Applies the best practices and standards for the representation, structure and description of vocabularies
● Well connected internally and externally to other vocabularies (works well as a “pivot” vocabulary)
Additionally…
● It gives fairly complete and accurate descriptive metadata about entities
Currently, Wikidata is a data source for the enrichment of metadata in Europeana
7. Motivation for evaluating Wikidata and linked data
● Wikidata can be a data source of cultural heritage objects
• Cultural heritage institutions show increasing interest in sharing descriptions of their digital objects
● We are investigating linked data to innovate the metadata aggregation process:
• Aggregation of linked data has been the subject of a case study
• Schema.org is suitable for describing cultural heritage resources
8. Motivation for machine interoperability and data interpretability
● Linked data sources of cultural data are numerous, but their data is heterogeneous
● Effective and sustainable usage of these sources must be supported by automatic means
• A minimum level of compliance with the Semantic Web is necessary
14. Results
● Wikidata’s RDF presents some difficulties for cross-domain applications
● Wikidata uses a very limited number of general data processing properties
○ Most of the properties in use are labels for human users
● Wikidata has chosen to use properties from its own ontology instead of the equivalent RDF, RDF-Schema, OWL or SKOS properties
○ Without human support, applications are unable to interpret Wikidata’s properties
15. The other namespaces in use in Wikidata’s RDF output
Occurrences in the 11,798 Wikidata resources of our sample
16. Results - general linked data processing
● Wikidata makes limited use of rdf:type
○ It is used only to state that the RDF resource is an Item from the Wikibase ontology (http://wikiba.se/ontology#Item)
○ For further types, the property wdt:P31 is used
● Not all Wikidata RDF predicate URIs are resolvable
○ The property wdt:P31 is stated in the data as http://www.wikidata.org/prop/direct/P31, which is not resolvable. The corresponding resolvable URI is http://www.wikidata.org/entity/P31
○ These unresolvable URIs limit machine interpretation of the predicates
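The URI mapping described on this slide can be sketched in a few lines of Python. This is only an illustration, under the assumption that every direct-property URI follows the pattern of the P31 example (the `/prop/direct/` prefix replaced by `/entity/`). The function name `resolvable_property_uri` is hypothetical and not part of any Wikidata tooling.

```python
# Assumption: direct-property URIs share this prefix, as in the P31 example.
DIRECT_PREFIX = "http://www.wikidata.org/prop/direct/"
ENTITY_PREFIX = "http://www.wikidata.org/entity/"


def resolvable_property_uri(predicate_uri: str) -> str:
    """Return the entity URI for a direct-property URI; leave other URIs unchanged."""
    if predicate_uri.startswith(DIRECT_PREFIX):
        return ENTITY_PREFIX + predicate_uri[len(DIRECT_PREFIX):]
    return predicate_uri
```

A crawler could apply such a rewrite before dereferencing a predicate, so that it requests the property description instead of the non-resolvable URI.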
17. Results - general linked data processing
● To proceed with the experiment, we manually added alignment statements to our knowledge base
● In fact, most of the alignments are already recorded in Wikidata, but they are expressed using predicates from Wikidata’s namespaces
○ … which limits their interpretation by machines
(The alignments are presented in a later slide)
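This transcript does not list the statements that were added, so the following Python sketch only illustrates the idea. It assumes a hand-written table that maps a few Wikidata predicates to general-purpose counterparts (P31 "instance of" to rdf:type, P279 "subclass of" to rdfs:subClassOf, P1647 "subproperty of" to rdfs:subPropertyOf). The names `MANUAL_ALIGNMENTS` and `apply_alignments` are hypothetical, and the alignments actually used in the study may differ.

```python
WDT = "http://www.wikidata.org/prop/direct/"
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
RDFS = "http://www.w3.org/2000/01/rdf-schema#"

# Assumed alignment table; an illustration, not the study's actual list.
MANUAL_ALIGNMENTS = {
    WDT + "P31": RDF + "type",
    WDT + "P279": RDFS + "subClassOf",
    WDT + "P1647": RDFS + "subPropertyOf",
}


def apply_alignments(triples):
    """Keep all triples and add a copy with the standard predicate where one is aligned."""
    result = list(triples)
    for subject, predicate, obj in triples:
        if predicate in MANUAL_ALIGNMENTS:
            result.append((subject, MANUAL_ALIGNMENTS[predicate], obj))
    return result
```

With such a table in the knowledge base, a generic application that only understands rdf:type and rdfs:subClassOf can process statements that Wikidata expresses with its own predicates.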
18. Results - acquiring Schema.org semantics from Wikidata
● Equivalence and specialisation relations between classes and properties are used to infer mappings (direct or inferred) between Wikidata and Schema.org
● We came across two obstacles:
○ For finding alignments, we again faced the non-resolvable URIs
○ For navigating Wikidata’s class and property hierarchy, we had to manually add alignment statements to our knowledge base
■ Wikidata properties are used to state the class and property hierarchy
● Adding further alignments to our knowledge base was necessary
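The inference described on this slide can be illustrated with a small Python sketch over an in-memory graph. It assumes that class equivalences and the class hierarchy have already been extracted into two dictionaries (in Wikidata these are typically stated with P1709 "equivalent class" and P279 "subclass of"). The identifiers `ex:Child`, `ex:Parent` and `ex:Unaligned` and the function `schema_org_alignment` are hypothetical and not taken from the study.

```python
from collections import deque

# Hypothetical toy data; not taken from the study's sample.
EQUIVALENT_CLASS = {"ex:Parent": "http://schema.org/CreativeWork"}
SUBCLASS_OF = {"ex:Child": ["ex:Parent"], "ex:Parent": [], "ex:Unaligned": []}


def schema_org_alignment(cls):
    """Return (schema_class, kind) with kind 'direct' or 'inherited', or None."""
    if cls in EQUIVALENT_CLASS:
        return EQUIVALENT_CLASS[cls], "direct"
    # Breadth-first walk up the hierarchy to find the nearest aligned superclass.
    seen = {cls}
    queue = deque(SUBCLASS_OF.get(cls, []))
    while queue:
        parent = queue.popleft()
        if parent in seen:
            continue  # guard against cycles in the hierarchy
        seen.add(parent)
        if parent in EQUIVALENT_CLASS:
            return EQUIVALENT_CLASS[parent], "inherited"
        queue.extend(SUBCLASS_OF.get(parent, []))
    return None
```

The same walk would apply to properties by swapping in property equivalences and the subproperty hierarchy.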
20. Results - acquiring Schema.org semantics from Wikidata
● For classes, we found 102 distinct ones, 57% of which had alignments to Schema.org
○ 49% are direct alignments and 7.9% are alignments inherited from superclasses
● For properties, we found 266 distinct ones, 44% of which had alignments to Schema.org
○ Only direct alignments were found for properties
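As a quick consistency check of these figures, the direct and inherited shares for classes should add up to the reported total, and the percentages can be turned into approximate counts. The counts below are estimates derived from the rounded percentages on the slide, not figures reported by the authors.

```python
classes_total, properties_total = 102, 266

direct_pct, inherited_pct = 49.0, 7.9
# Sum of the two shares; rounds to the reported 57%.
aligned_classes_pct = direct_pct + inherited_pct

# Approximate counts, since the slide only gives rounded percentages.
approx_aligned_classes = round(classes_total * 57 / 100)
approx_aligned_properties = round(properties_total * 44 / 100)
```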
21. Results - acquiring Schema.org semantics from Wikidata
These results are a good indicator that many applications would be able to make use of the structured data.
The listing of the individual alignments found may be consulted online:
https://github.com/nfreire/data-aggregation-lab/blob/master/data-aggregation-casestudies/documentation/wikidata/SchemaOrg-ontology-alignments-listing.md
22. Conclusions
● Currently, a human operator must assist linked data applications to interpret Wikidata’s RDF
○ This requires training human resources on Wikidata’s data model and its representation in RDF
○ The usage of predicates from Wikidata’s own ontology makes the data uninterpretable for data crawlers that rely on the Semantic Web’s general data processing properties
● Another difficulty is the use of non-resolvable namespaces for Wikidata’s properties
● ...but Wikidata contains enough alignment data to RDF, RDFS, OWL, SKOS and Schema.org:
Machine interpretation of Wikidata is just a few steps away
23. Ongoing and future work
● We are currently analyzing the results of evaluating the domain-specific use case, which uses Wikidata data as input into Europeana's cultural heritage metadata. Our first indications are that Wikidata provides high enough quality for this case
● In future work, we expect to evaluate the linked data published by Europeana's data providers in terms of machine processing
24. Thank you for your attention
nuno.freire@tecnico.ulisboa.pt
Netherlands, Public Domain
1660 - 1625, Rijksmuseum
Anonymous
Arrival of a Portuguese ship
Acknowledgments
Fundação para a Ciência e a Tecnologia (FCT): UID/CEC/50021/2013
European Commission contract number 30-CE-0885387/00-80.