This presentation was provided by Jim Hahn of The University of Pennsylvania, during the NISO event "Transforming Search: What the Information Community Can and Should Build." The virtual conference was held on August 26, 2020.
Hahn "Wikidata as a hub to library linked data re-use"
1. Wikidata as a hub to library
linked data re-use
Jim Hahn
Head of Metadata Research, University of Pennsylvania
2. Objectives
Attendees will:
● gain an appreciation of data re-use as applied to linked data in order to
advance discovery enhancement projects at their institution;
● understand how Wikidata is utilized for enriching and distributing linked data
on the web in order to make use of Wikidata as a persistent structured data
source;
● become aware of a Penn pilot project for Penn Faculty visibility on Wikidata
and associated graph networks.
3. Defining Data Re-use
From Carlson & Anderson (2007, p. 635) in the Journal of Computer-Mediated
Communication:
Investments in e-science technologies are motivated by a multiplicity of factors.
First is the urgency to manage the increasingly large quantities of complex data
produced by digital technologies and digitally enabled science. “Deluge,” “waves,”
and “knowledge overload” are some of the terms used to describe the situation
(Hey & Trefethen, 2003). Another related factor is the concern of funding
bodies to “repurpose” their investments in data to avoid what is, in turn,
termed “data tombs in mono-disciplinary silos” and to see a maximum
return on their investments.
4. Data Re-use
In “What are data? The many kinds of data and their implications for data re-use,”
Samuelle Carlson and Ben Anderson (2007) compile several case studies to
illustrate the “life-stages” of data, including:
● Data Collection;
● Data Formatting;
● Data Release;
● Data Re-Use.
Significant findings for data re-use suggest challenges ahead, namely that data
practices and assumptions vary across the disciplines studied.
5. Importance of Data Re-use
Wikidata practices and library data practices are not necessarily incompatible, but
they must be negotiated for shared understanding and, importantly, context.
Wikidata itself is a more contemporary project with fewer legacy data problems than
libraries.
Library data are wonderfully expressive, unique, and complex. Library data were
created as strings, and only very recently have entity-based classification and
cataloging processes started.
7. Why Wikidata
OCLC made use of the infrastructure (Wikibase) for Project Passage and is continuing
to expand its use of Wikibase for the Mellon-funded Entity Management Project.
There is an opportunity to re-use the Wikidata properties for ongoing linked data
research at the Libraries. Wikibase in particular offers several advantages, including
the following:
● Local control over linked data from several disparate linked data projects.
● Links between UPenn scholarship and the broader web of linked data.
Recent studies have pointed to the importance of Wikimedia content to
search engines and discovery on the web writ large (Vincent & Hecht, 2020).
8. Wikimedia content in recommenders...
● Recommender systems development to promote library content (Tsuji, 2019).
○ Linked data offers some advantages over big data for providing personalization/recommender
services (Campbell & Cowan, 2016); e.g., recommendations are based on structured
knowledge rather than on personal data mining.
9. How Libraries make use of Wikidata
Libraries have made a series of contributions to Wikidata:
● LD4 (Linked Data for Production) in particular has engaged a Wikidata Affinity
Group
● OCLC Entity Management Project
● Share-VDE
● Library of Congress NACO (Name Authority Cooperative Program)
10. Wikidata Examples in LD4
https://www.wikidata.org/wiki/Wikidata:WikiProject_Linked_Data_for_Production/Practical_Wikidata_for_Librarians
11. Wikidata Example in OCLC
https://www.oclc.org/en/worldcat/oclc-and-linked-data.html
18. Problem Statement
Penn Libraries would like to foster the discovery of Penn scholarship on the web.
Most search engines crawl Wikidata in order to incorporate structured data into
their search results.
Most search engines now create Knowledge Panels for authors, agents, and
works. Current inventory-keeping tools are not crawled by search engines, and this
is a problem area for supporting the visibility of Penn faculty.
20. Wikidata Processing at Penn
Begin by integrating school-level structured data into Wikidata:
https://www.wikidata.org/wiki/Q7896091
21. Wikidata Processing at Penn
Add department-level structured data, using the “part of” property to associate
each department with its school:
https://www.wikidata.org/wiki/Q89100047
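The department-to-school link can be read back out of Wikidata's public entity JSON. The following is a minimal sketch, assuming the department item carries a "part of" (P361) statement as described on this slide. The sample fragment is hand-written to mirror the shape of that JSON and is not a copy of the live item; the User-Agent string is a hypothetical placeholder.

```python
import json
import urllib.request

PART_OF = "P361"  # Wikidata property "part of"


def fetch_entity(qid):
    """Download the public JSON for one item (network call, not run below)."""
    url = f"https://www.wikidata.org/wiki/Special:EntityData/{qid}.json"
    # Hypothetical User-Agent; replace with real contact details before use.
    request = urllib.request.Request(
        url, headers={"User-Agent": "example-script/0.1 (hypothetical contact)"}
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)["entities"][qid]


def item_values(entity, prop):
    """Return the Q numbers that `entity` points to through property `prop`."""
    targets = []
    for statement in entity.get("claims", {}).get(prop, []):
        snak = statement["mainsnak"]
        # Skip "no value" / "unknown value" statements, which carry no datavalue.
        if snak.get("snaktype") == "value":
            targets.append(snak["datavalue"]["value"]["id"])
    return targets


# Illustrative fragment only: assumes the department links to the school item.
sample_department = {
    "id": "Q89100047",
    "claims": {
        "P361": [
            {
                "mainsnak": {
                    "snaktype": "value",
                    "property": "P361",
                    "datavalue": {
                        "type": "wikibase-entityid",
                        "value": {"entity-type": "item", "id": "Q7896091"},
                    },
                }
            }
        ]
    },
}

print(item_values(sample_department, PART_OF))
```

Calling `item_values(fetch_entity("Q89100047"), PART_OF)` would run the same extraction against the live item.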
22. Wikidata Processing at Penn
Add Wikidata items for faculty:
https://www.wikidata.org/wiki/Q6127558
25. Faculty Data Re-use in Wikidata
For faculty pages we:
● Add Faculty IDs if available: VIAF ID, ISNI, Library of Congress authority ID,
Share-VDE author ID, WorldCat Identities ID
● Associate Faculty with Department using "member of (P463)" property
● Associate Publications to faculty
For works not yet in Wikidata, we create work pages and add the "author" property
linked to the author's Q number.
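One way to picture the statements listed above is as a batch of tab-separated QuickStatements (V1) commands. This is a sketch under stated assumptions, not a description of Penn's actual tooling: the helper name `faculty_statements`, the identifier values, and the work Q number are hypothetical placeholders, and the faculty/department pairing is only illustrative. The property numbers used are the standard Wikidata ones for member of (P463), author (P50), VIAF ID (P214), ISNI (P213), and Library of Congress authority ID (P244).

```python
MEMBER_OF = "P463"  # "member of", as named on the slide
AUTHOR = "P50"      # "author"
IDENTIFIER_PROPS = {
    "viaf": "P214",          # VIAF ID
    "isni": "P213",          # ISNI
    "lc_authority": "P244",  # Library of Congress authority ID
}


def faculty_statements(faculty_qid, department_qid, identifiers, work_qids):
    """Build QuickStatements V1 lines for one faculty item (hypothetical helper)."""
    lines = []
    # "Add Faculty IDs if available": skip identifiers with no value.
    for name, value in identifiers.items():
        if value:
            # External identifiers are written as quoted strings.
            lines.append("\t".join([faculty_qid, IDENTIFIER_PROPS[name], f'"{value}"']))
    # Associate the faculty member with the department.
    lines.append("\t".join([faculty_qid, MEMBER_OF, department_qid]))
    # Associate publications: each work item gets an "author" statement.
    for work_qid in work_qids:
        lines.append("\t".join([work_qid, AUTHOR, faculty_qid]))
    return lines


batch = faculty_statements(
    "Q6127558",                                       # faculty item from the slides
    "Q89100047",                                      # department item from the slides
    {"viaf": "VIAF-PLACEHOLDER", "isni": None},       # placeholder values, not real IDs
    ["Q_WORK_PLACEHOLDER"],                           # placeholder, not a real Q number
)
print("\n".join(batch))
```

Generating the batch as plain text keeps a human review step before anything is written to Wikidata.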
26. Scholia page re-using structured data in Wikidata
https://scholia.toolforge.org/organization/Q89100047
27. Penn researcher profile re-using structured Wikidata
https://scholia.toolforge.org/author/Q6127558
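Profile pages like these are driven by queries over the structured data. As a rough illustration (not Scholia's own query), the sketch below builds a SPARQL query listing works whose "author" (P50) statement points at a given Q number, plus a request URL for the public Wikidata Query Service endpoint. The function names and the `limit` parameter are assumptions made for this example.

```python
from urllib.parse import urlencode

SPARQL_ENDPOINT = "https://query.wikidata.org/sparql"


def works_by_author_query(author_qid, limit=50):
    """Return a SPARQL query for works with an author (P50) statement for `author_qid`."""
    return (
        "SELECT ?work ?workLabel WHERE {\n"
        f"  ?work wdt:P50 wd:{author_qid} .\n"
        '  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }\n'
        "}\n"
        f"LIMIT {limit}"
    )


def query_url(query):
    """Build a GET URL that asks the endpoint for JSON results."""
    return SPARQL_ENDPOINT + "?" + urlencode({"query": query, "format": "json"})


query = works_by_author_query("Q6127558", limit=10)
print(query)
print(query_url(query))
```

The printed query can also be pasted directly into the Wikidata Query Service web interface.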
30. Program for Cooperative Cataloging (PCC) Pilot
Charge: The Wikidata Working Group will lead Penn participation in the PCC Pilot
Project for Identity Management in Wikidata. Penn's initial focus will be to apply
Online Books/Back Files serials toward the PCC Wikidata objectives.
Activities: For the PCC Pilot, we are making sure that serial issues in Penn Libraries
Deep Backfiles have Wikidata entries that clearly identify them and distinguish
them from other serials.
31. Resources
Campbell, D. G., & Cowan, S. R. (2016). The Paradox of Privacy: Revisiting a Core Library Value in an Age of Big Data and
Linked Data. Library Trends, 64(3), 492–511. https://doi.org/10.1353/lib.2016.0006
Carlson, S., & Anderson, B. (2007). What are data? The many kinds of data and their implications for data re-use. Journal of
Computer-Mediated Communication, 12, 635–651.
https://doi.org/10.1111/j.1083-6101.2007.00342.x
Possemato, T. (2018). From MARC to BIBFRAME in the SHARE-VDE project. ALA Annual Meeting.
https://www.loc.gov/bibframe/news/pdf/share-vde-alaal2018.pdf
Tsuji, K. (2019). Book Recommender System for Wikipedia Article Readers in a University Library.
8th International Congress on Advanced Applied Informatics (IIAI-AAI), 121–126.
https://doi.org/10.1109/IIAI-AAI.2019.00034
Vincent, N., & Hecht, B. (2020). A Deeper Investigation of the Importance of Wikipedia Links to the Success of Search
Engines. https://arxiv.org/abs/2004.10265
32. Suggested Reading
Experimentations with Wikidata/Wikibase. Hanging Together: The OCLC Research Blog.
https://hangingtogether.org/?p=8002
Vanderbot: a python script for writing to Wikidata: https://baskauf.blogspot.com/2020/02/vanderbot-python-script-for-writing-to.html