1. Wikidata Introductory Workshop
Swiss Archive of the Performing Arts
Beat Estermann, Bern, 30 June 2018
Unless otherwise noted, the content of this presentation is made available under the CC BY 4.0 license.
2. ▶ Short introduction to Wikidata
• What is its purpose?
• How does it work?
▶ Wikidata + GLAM
• Aim & vision
• Where do we stand?
• Zooming in on performing arts related data
▶ Wikidata + Wikipedia
• Discussion
▶ Let’s Practice!
• Querying Wikidata
• Editing Wikidata
On the Programme Today
If you want to build a ship, don’t drum up people together to collect wood and don’t
assign them tasks and work, but rather teach them to long for the endless immensity
of the sea. – Antoine de Saint-Exupéry
Course page on Wikidata:
https://tinyurl.com/wd-atelier2018
6. Purpose of Wikidata
▶ Centralized Interwiki-Links [Example: Bern]
▶ Centralized Data Management for Infoboxes [Example: Ferdinand de Saussure]
▶ Centralized Data Management for Lists [Example: Lista de pinturas de A. Norfini]
▶ Possibility of Querying the Data in a Standardized Format
[Example Queries / External Applications]
« The Sum of All Human Knowledge » as Linked Open Data
Multilingual
With Sourced Statements
Freely usable by anyone (CC Zero)
7. Structure of Wikidata – RDF Triples
Subject | Predicate | Object
Bern | is the capital of | Switzerland
Switzerland (instance) | capital (property) | Bern (instance)
Switzerland (instance) | is a (property) | Country (class)
Switzerland (instance) | GDP (property) | 518 billion $ (value)
    qualifier: point in time | 2015 (value)
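The triples on this slide can be written down as plain data. The following is a minimal sketch, not Wikidata's actual data model: the `Statement` class and the `about` helper are names invented for this illustration, and the values are the ones shown on the slide.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Statement:
    """One subject-predicate-object triple, optionally refined by qualifiers."""
    subject: str
    predicate: str
    obj: str
    qualifiers: tuple = ()  # pairs of (property, value)


# The four example triples from the slide.
statements = [
    Statement("Bern", "is the capital of", "Switzerland"),
    Statement("Switzerland", "capital", "Bern"),
    Statement("Switzerland", "is a", "Country"),
    Statement("Switzerland", "GDP", "518 billion $",
              qualifiers=(("point in time", "2015"),)),
]


def about(subject, items):
    """Return all statements that share the given subject."""
    return [s for s in items if s.subject == subject]


for s in about("Switzerland", statements):
    extra = "".join(f" [{p}: {v}]" for p, v in s.qualifiers)
    print(f"{s.subject} -> {s.predicate} -> {s.obj}{extra}")
```

The qualifier is attached to the statement rather than to the subject, which is why the GDP value can carry its own point in time.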
8. Structure of Wikidata – Linked Data
Subject | Predicate | Object
Bern (Q70) | is a (P31 - instance of) | municipality of Switzerland (Q70208)
Bern (Q70) | is the capital of (P1376 - is the capital of) | Switzerland (Q39)
Berlin (Q64) | is a (P31 - instance of) | municipality of Germany (Q262166)
Berlin (Q64) | is the capital of (P1376 - is the capital of) | Germany (Q183)
Switzerland (Q39) | is a (P31 - instance of) | country (Q6256)
Germany (Q183) | is a (P31 - instance of) | country (Q6256)
municipality of Switzerland (Q70208) | is a subclass of (P279 - subclass of) | municipality (Q15284)
municipality of Germany (Q262166) | is a subclass of (P279 - subclass of) | municipality (Q15284)

Subject | Predicate | Object
Bern (URI) | is the capital of (URI) | Switzerland (URI)
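A rough sketch of how the identifiers on this slide turn into URIs, and of how "instance of" and "subclass of" combine. The two prefixes are the ones Wikidata uses for entities and for direct ("truthy") property claims; the helper names `as_uris`, `superclasses` and `instances_of` are invented for this illustration, and the triple list contains only the eight statements shown on the slide.

```python
ENTITY = "http://www.wikidata.org/entity/"
DIRECT = "http://www.wikidata.org/prop/direct/"

# (subject, property, object) triples copied from the slide.
TRIPLES = [
    ("Q70", "P31", "Q70208"),      # Bern: instance of municipality of Switzerland
    ("Q70", "P1376", "Q39"),       # Bern: capital of Switzerland
    ("Q64", "P31", "Q262166"),     # Berlin: instance of municipality of Germany
    ("Q64", "P1376", "Q183"),      # Berlin: capital of Germany
    ("Q39", "P31", "Q6256"),       # Switzerland: instance of country
    ("Q183", "P31", "Q6256"),      # Germany: instance of country
    ("Q70208", "P279", "Q15284"),  # municipality of Switzerland: subclass of municipality
    ("Q262166", "P279", "Q15284"), # municipality of Germany: subclass of municipality
]


def as_uris(triple):
    """Expand the short IDs of one triple into full URIs."""
    s, p, o = triple
    return (ENTITY + s, DIRECT + p, ENTITY + o)


def superclasses(qid):
    """Follow P279 links upwards; returns the class itself and all its ancestors."""
    found, todo = set(), [qid]
    while todo:
        current = todo.pop()
        if current not in found:
            found.add(current)
            todo.extend(o for s, p, o in TRIPLES if s == current and p == "P279")
    return found


def instances_of(class_qid):
    """Items whose P31 class is class_qid or one of its subclasses."""
    return sorted(s for s, p, o in TRIPLES
                  if p == "P31" and class_qid in superclasses(o))


print(as_uris(TRIPLES[1]))
print(instances_of("Q15284"))
```

Neither Bern nor Berlin is stated to be a municipality (Q15284) directly; both are found only because the subclass links are followed, which is the point of linking the data.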
9. Structure of a Wikidata Entry
Subject | Predicate | Object
Douglas Adams | spouse | Jane Belson
    start / end time: 25 Nov. 1991 – 11 May
    Ref.
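Statements with qualifiers, like the one on this slide, can be retrieved through the Wikidata Query Service. The sketch below only builds the SPARQL text and does not send it anywhere; the function name `qualified_statement_query` is invented for this illustration. It assumes the commonly used identifiers Q42 (Douglas Adams), P26 (spouse), P580 (start time) and P582 (end time), and the `wd:`, `p:`, `ps:` and `pq:` prefixes that the query service predefines.

```python
def qualified_statement_query(item_qid, prop_pid, start_pid="P580", end_pid="P582"):
    """Build a SPARQL query listing the values of one property with start/end qualifiers."""
    return f"""
SELECT ?value ?valueLabel ?start ?end WHERE {{
  wd:{item_qid} p:{prop_pid} ?statement .
  ?statement ps:{prop_pid} ?value .
  OPTIONAL {{ ?statement pq:{start_pid} ?start . }}
  OPTIONAL {{ ?statement pq:{end_pid} ?end . }}
  SERVICE wikibase:label {{ bd:serviceParam wikibase:language "en" . }}
}}
""".strip()


query = qualified_statement_query("Q42", "P26")
print(query)
```

The printed text can be pasted into the query editor at query.wikidata.org. The `p:` / `ps:` / `pq:` path is needed because qualifiers hang off the statement node, not off the item itself.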
10. It’s Your Turn!
• Does the WD item of your place of residence / provenance have a statement
for the mayor? Is it up to date?
• Does the Wikipedia page contain this information?
• How about the Catalan Wikipedia?
Find the information on the Internet and add it to Wikidata!
Add it also to Wikipedia! (in your language and in Catalan!)
Image: US Department of Commerce, Bureau of the Census, Public
Information Office, around 1940. NARA. Public Domain.
11. Wikidata + GLAM
• Aims & vision
• Where do we stand?
• Zooming in on performing arts related data
12. ▶ The aim of this project is to coordinate, facilitate and promote
the ingestion of cultural heritage related data into
Wikidata, to facilitate the cleansing and enhancement of this
data and to promote its use across Wikipedia, its sister
projects and beyond.
▶ It is our vision to establish Wikidata as a central hub for data
integration, data enhancement, and data management in
the heritage domain.
Aim and Vision (WikiProject Cultural Heritage)
13. ▶ Establish Wikidata as a database that covers the entire world’s
cultural heritage.
▶ Establish Wikidata as a central hub that interlinks GLAM collections
around the world and provides links to bibliographic, genealogical,
scientific and other collections of information; create the ultimate
authority file.
▶ Foster truly multilingual and global collaboration among people
from various backgrounds.
▶ Leverage synergies between institutions, reduce duplicate work.
▶ Encourage debate in the community by highlighting and
interrogating differences in perspective.
▶ Provide a single source of data for some of the most popular web
sites and apps, including Wikipedia infoboxes and lists.
Vision (Blog posts: Stinson et al. 2016; Thornton / Cochrane 2016; Poulter 2017)
15. Current Trends in the Heritage Sector
Source: OpenGLAM Benchmark Survey
N = 1560
Wikidata
16. Core Aspects of Linked Data Publication
Source: eCH-0205 – Linked Open Data
17. ▶ http://make.opendata.ch/wiki/data:glam_ch
• Personnalités Vaudoises (BCUL)
• Swiss Photography Metadata (Büro für Fotografiegeschichte)
• Artist data from the SIKART Lexicon on art in Switzerland (SIK-ISEA)
• Metadata of the Historical Dictionary of Switzerland (HLS)
• PCP Inventory (Federal Office for Civil Protection)
• Inventory of Historical Monuments (Canton of Zurich)
• Inventory of Historical Monuments (City of Zurich)
• Inventory of classified Gardens and Parks (City of Zurich)
• Art in the Urban Space (City of Zurich)
• Swiss GLAM Inventory (OpenGLAM)
• Inventory of Research Libraries in Switzerland (Swissbib)
• ISplus Swiss (G)LAM Inventory (Swiss National Library)
• Schauspielhaus Zürich Repertoire of Theatre and other Productions, 1938–1968
• Swiss Theatre Metadata (Swiss Theatre Collection)
• Plazi TreatmentBank (repository of the world's species) (Plazi.org)
• Historical Statistics of Switzerland (University of Zurich)
Data Provision – Which Datasets are Useful?
21. ▶ Coping with the Bazaar:
• Sometimes changes to property definitions are too easily made by
volunteers
• There is a rigorous process for creating new properties, but not for
changing definitions of properties or creating new classes
• No master language; how to keep translations of definitions in sync?
• Sometimes different approaches are used to model the same thing.
▶ What are good design principles?
• Re-usability of properties across various domains
• Select high priority areas first, do not try to solve everything overnight for
the entire cultural heritage domain
• …
▶ Finding a balance between:
• The expressive power of an ontology
• Its practicability when it comes to large scale use by many people
• Its queryability (usability from the perspective of data users)
Challenges Related to Ontology Development (2/2)
22. ▶ Mapping Between Data Models
• Getting an overview of appropriate properties and classes can be a
time-consuming exercise.
• Creating new properties requires community agreement and may involve
lengthy discussions and compromises.
• There is still a lot of work to be done in the area of typologies and
thesauri [Example]
▶ Matching Items / Disambiguation
• There are tools like Mix’n’Match and OpenRefine to support this, but it
remains a major challenge, esp. with datasets which haven’t resolved this
issue internally.
▶ Incorrect / Incoherent Data on Wikidata
• Many data ingestion projects require cleaning up existing data first.
▶ Repeated Ingestion / Updates
• How to approach the historicization of data?
• How to set up processes to regularly update data?
Challenges Related to Data Ingestion
N.B.: We are not filling a void or starting from scratch, but contributing to an
existing ecosystem of data, data models, and community members!
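To make the matching / disambiguation challenge concrete, here is a deliberately naive sketch of label-based matching. It is not how Mix'n'Match or OpenRefine work internally; the function names, the sample names and the placeholder IDs are all invented for this illustration.

```python
import unicodedata
from collections import defaultdict


def normalize(name):
    """Lower-case, strip accents and collapse whitespace so spelling variants compare equal."""
    decomposed = unicodedata.normalize("NFKD", name)
    without_accents = "".join(c for c in decomposed if not unicodedata.combining(c))
    return " ".join(without_accents.lower().split())


def match(local_names, wikidata_labels):
    """Classify each local name as 'matched', 'ambiguous' or 'unmatched'.

    wikidata_labels maps an item ID to its label.
    """
    index = defaultdict(list)
    for qid, label in wikidata_labels.items():
        index[normalize(label)].append(qid)

    result = {}
    for name in local_names:
        candidates = index.get(normalize(name), [])
        if len(candidates) == 1:
            result[name] = ("matched", candidates)
        elif candidates:
            result[name] = ("ambiguous", candidates)
        else:
            result[name] = ("unmatched", [])
    return result


# Invented sample data; the IDs are placeholders, not real Wikidata items.
labels = {"Q_A": "Anna Müller", "Q_B": "Anna Muller", "Q_C": "Hans Meier"}
report = match(["anna  müller", "Hans Meier", "Peter Keller"], labels)
for name, (status, candidates) in report.items():
    print(name, status, candidates)
```

The "ambiguous" bucket is where the real work lies: a name alone rarely identifies a person, so further attributes such as dates or places are needed, and a dataset that has not resolved its own duplicates will produce many such cases.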
25. ▶ Establishing and Documenting Data Quality
• Getting rid of duplicates
• Dealing with incorrect and inconsistent data
• How to monitor data quality and data completeness?
▶ Building a Network of Trust
• Linking all statements to a reliable source
• In the future: “Signed Statements”
▶ Data Exchange Between Wikidata and Primary Databases
▶ Data synchronization: How to keep data mutually up to date?
▶ How to make it easier for GLAM employees to follow
changes/improvements to their data on Wikidata?
Challenges Related to Data Maintenance
26. ▶ Chicken and Egg Problem:
• Data usage drives data quality & completeness
• Data quality & completeness are prerequisites of data use
Challenges Related to Data Use
28. ▶ Realize an international performing arts database on the
basis of Wikidata
▶ Provide a powerful finding aid for performing arts related
content on Wikimedia Commons
▶ Promote Wikidata-powered performing arts related
information in the various language versions of Wikipedia
▶ Get heritage institutions to make their performing arts
related data and content available through Wikidata &
Wikimedia Commons
Vision
29. Status Quo – Wikidata
• First experiences with the data ingestion
process
• Case reports
• Guidelines
• Performing Arts Data Model partly
implemented
• Initiated an overview of existing data
sources
• Data cleansing & linking is a great
challenge
• Still very little performance data ingested
30. ▶ Project framework established for Performing Arts Productions, Corporate
Bodies, Venues and Events:
• WikiProject « Performing Arts »
• WikiProject « Cultural Venues »
• WikiProject « Cultural Events »
Status Quo – Wikidata
Achievements & Current Challenges
• First experiences with the data ingestion process
• Case reports
• Guidelines
• Performing Arts Data Model partly implemented
• Initiated an overview of existing data sources
• Data cleansing & linking is a great challenge
• Still very little performance data ingested
PLUS: synergies with existing projects in
the area of bibliographic records (works)
and authority control (persons)
31. ▶ There is plenty of relevant material, but it needs organizing and
curation
▶ Structured data on Wikimedia Commons is expected to be a great
enabler
Status Quo – Wikimedia Commons
32. ▶ Current examples show great potential for the inclusion of
Wikidata-powered content in the field of the performing arts.
▶ Current initiatives may benefit from improved coordination –
also across linguistic borders.
▶ Despite many examples of how structured data is used, the
data is usually not pulled from Wikidata.
▶ Large parts of the structured data in Wikipedia related to the
performing arts is not available on Wikidata.
Status Quo – Wikipedia
33. Example: List of Productions of « Les Galas
Karsenty » (French Wikipedia)
39. Example: List of Artistic Directors and Well-Known
Artists at Stadttheater Bern (German Wikipedia)
40. ▶ Numerous Wikipedians & Wikidataists (partly organized
through WikiProjects)
▶ Swiss Archive for the Performing Arts (data provider)
▶ Various Belgian institutions (data providers)
▶ Carnegie Hall (data provider)
▶ Various existing data/content providers not explicitly linked
to the project
▶ Bern University of Applied Sciences
(students’ projects in the area of data ingestion)
Status Quo – Contributors
41. ▶ Dataset held by the Zurich Municipal Archives, describing 699
productions at the Schauspielhaus
▶ Some elements of the ontology had already been present in Wikidata;
others had first to be implemented:
Newly Created Classes:
performing arts production (Q43099500)
dance production (Q43099869)
series of performances (Q43100730)
Newly Created Properties:
representation of (P4646)
location of first performance (P4647)
premiere type (P4634)
name of the character role (P4633)
scenographer (P4608)
Pilot Data Ingest –
Repertoire of Schauspielhaus Zürich, 1938-1968
42. ▶ theatrical production
(in the original language, with links to all the character roles):
Der Hauptmann von Köpenick (The Captain of Köpenick) (Q40289399)
▶ theatrical production
(in a translated version, with labels for the character roles)
Eine kleine Stadt (Our Town) (Q43689202)
▶ guest performance (series of performances)
(in the original language, with labels for the character roles):
L'école des femmes (The School for Wives) (Q43759980)
▶ premiere (single performance)
Der Hauptmann von Köpenick (The Captain of Köpenick) (Q39907209)
For further examples (e.g. for actors or character roles), see the project
page.
Example Items
43. ▶ Objective: Ingest of the Carnegie Hall Performance History Database
(extract of opera-related entries)
▶ In preparation:
• Ingest of operas and arias from The Opera Database
• Scraping and ingest of data about operatic roles from German
Wikipedia (English Wikipedia to follow)
Data Ingest – Operas, Arias, Operatic Roles
46. Querying & Editing Wikidata
Querying & Editing Wikidata
▶ Schauspielhaus Productions without a “based on” statement
▶ Swiss heritage institutions without a “director” statement
Querying Wikidata & Editing Wikipedia
▶ The performers with the most appearances in plays at
Schauspielhaus Zürich but without a Wikipedia article in German
Editing Wikidata
▶ Pilot Ingest Ehrenreich Collection
Exploring Ontologies & Editing Wikidata
The task descriptions can be found on the course page:
https://tinyurl.com/wd-intro2018
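The "items without a given statement" tasks share one query pattern. The sketch below builds such a query as text only; it is not taken from the course page, and the function name `missing_statement_query` is invented for this illustration. It uses the class performing arts production (Q43099500) named earlier in this deck and assumes P144 for "based on". Restricting the result to Schauspielhaus Zürich would need one more triple pattern, which depends on how the productions are modelled and is left out here.

```python
def missing_statement_query(class_qid, missing_pid, limit=100):
    """Build a SPARQL query for instances of a class that lack a given property."""
    return f"""
SELECT ?item ?itemLabel WHERE {{
  ?item wdt:P31/wdt:P279* wd:{class_qid} .
  FILTER NOT EXISTS {{ ?item wdt:{missing_pid} ?anything . }}
  SERVICE wikibase:label {{ bd:serviceParam wikibase:language "en,de,fr" . }}
}}
LIMIT {limit}
""".strip()


query = missing_statement_query("Q43099500", "P144")
print(query)
```

The path `wdt:P31/wdt:P279*` also catches items typed with a subclass, for example dance production (Q43099869). The same function covers the "director" task once the appropriate class and property IDs are supplied.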
47. Thank You for Your Attention!
I Hope You Will Enjoy Wikidata… ;-)
Contact
Beat Estermann
Bern University of Applied Sciences
beat.estermann@bfh.ch
+41 31 848 34 38