Presentation of the Performing Arts Project on Wikidata, Meeting of the Archival Working Group of the German Society for Theatre Research, Frankfurt a.M., 16 January 2018.
1. Performing Arts + Wikidata
Beat Estermann, Frankfurt am Main, 16 January 2018
▶ Bern University of Applied Sciences, E-Government Institute
Mosaic depicting the choregos and tragic actors from the tablinum in the House of the Tragic Poet. Roman Theatre Pompeii, Italy (Pinterest)
Unless otherwise noted, the content of these slides is provided under the CC BY 4.0 license.
3. ▶ The aim of this project is to coordinate, facilitate and promote
the ingestion of cultural heritage related data into Wikidata, to
facilitate the cleansing and enhancement of this data and to
promote its use across Wikipedia, its sister projects and
beyond.
▶ It is our vision to establish Wikidata as a central hub for data
integration, data enhancement, and data management in the
heritage domain.
Aim and Vision (WikiProject Cultural Heritage)
4. ▶ Establish Wikidata as a database that covers the entire world’s cultural
heritage.
▶ Establish Wikidata as a central hub that interlinks GLAM collections
around the world and provides links to bibliographic, genealogical,
scientific and other collections of information; create the ultimate
authority file.
▶ Foster truly multilingual and global collaboration among people from
various backgrounds.
▶ Leverage synergies between institutions, reduce duplicate work.
▶ Encourage debate in the community by highlighting and interrogating
differences in perspective.
▶ Provide a single source of data for some of the most popular web
sites and apps, including Wikipedia infoboxes and lists.
Vision (Blog posts: Stinson et al. 2016; Thornton / Cochrane 2016; Poulter 2017)
7. ▶ Realize an international performing arts database on the
basis of Wikidata
▶ Provide a powerful finding aid for performing arts related content
on Wikimedia Commons
▶ Promote Wikidata-powered performing arts related information in
the various language versions of Wikipedia
▶ Get heritage institutions to make their performing arts related
data and content available through Wikidata & Wikimedia
Commons
Vision
9. ▶ Project framework established for Performing Arts Productions, Corporate Bodies,
Venues and Events:
• WikiProject « Performing Arts »
• WikiProject « Cultural Venues »
• WikiProject « Cultural Events »
Status Quo – Wikidata
Achievements & Current Challenges
• First experiences with the data ingestion process
• Case reports
• Guidelines
• Performing Arts Data Model partly implemented
• Initiated an overview of existing data sources
• Data cleansing & linking is a great challenge
• Still very little performance data ingested
PLUS: synergies with existing projects in
the area of bibliographic records
(works) and authority control (persons)
10. ▶ There is plenty of relevant material, but it needs organizing and curation
▶ Structured data on Wikimedia Commons is expected to be a great enabler
Status Quo – Wikimedia Commons
11. ▶ Current examples show great potential for the inclusion of Wikidata-
powered content in the field of the performing arts.
▶ Current initiatives may benefit from improved coordination – also
across linguistic borders.
▶ Despite many examples of how structured data is used, the data is
usually not pulled from Wikidata.
▶ Large parts of the structured data in Wikipedia related to the
performing arts are not available on Wikidata.
Status Quo – Wikipedia
12. Example: List of Productions of « Les Galas Karsenty »
(French Wikipedia)
18. Example: List of Artistic Directors and Well-Known
Artists at Stadttheater Bern (German Wikipedia)
19. ▶ Numerous Wikipedians & Wikidataists (partly organized through
WikiProjects)
▶ Swiss Archive for the Performing Arts (data provider)
▶ Various Belgian institutions (data providers)
▶ Carnegie Hall (potential data provider)
▶ Various existing data/content providers not explicitly linked to the
project
▶ Bern University of Applied Sciences
(students’ projects in the area of data ingestion)
Status Quo – Contributors
20. ▶ Dataset held by the Zurich Municipal Archives, describing 699 productions
at the Schauspielhaus
▶ Some elements of the ontology were already present in Wikidata;
others first had to be implemented:
Newly Created Classes:
• performing arts production (Q43099500)
• dance production (Q43099869)
• series of performances (Q43100730)
Newly Created Properties:
• representation of (P4646)
• location of first performance (P4647)
• premiere type (P4634)
• name of the character role (P4633)
• scenographer (P4608)
Pilot Data Ingest – Repertoire of Schauspielhaus Zürich, 1938-1968
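As a rough illustration, items using the newly created classes could be listed through the Wikidata Query Service. The Python sketch below only builds the SPARQL query text and does not send it. It assumes that productions are linked to performing arts production (Q43099500) through instance of (P31), directly or via subclasses (P279); the slides do not state this explicitly. The function name and the limit parameter are hypothetical.

```python
def build_production_query(class_qid: str = "Q43099500", limit: int = 50) -> str:
    """Return a SPARQL query listing items that are instances of the given class.

    Assumption: productions use P31 (instance of), directly or through
    subclasses (P279) of the class; the slides do not spell this out.
    """
    if not (class_qid.startswith("Q") and class_qid[1:].isdigit()):
        raise ValueError(f"not a valid Q-number: {class_qid!r}")
    return (
        "SELECT ?production ?productionLabel WHERE {\n"
        f"  ?production wdt:P31/wdt:P279* wd:{class_qid} .\n"
        '  SERVICE wikibase:label { bd:serviceParam wikibase:language "de,en". }\n'
        "}\n"
        f"LIMIT {limit}"
    )


# Build the query for the class created in the pilot ingest and show it.
query = build_production_query()
print(query)
```

The resulting text can be pasted into the query service to check which items an ingest has actually produced.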
21. ▶ theatrical production
(in the original language, with links to all the character roles):
Der Hauptmann von Köpenick (The Captain of Köpenick) (Q40289399)
▶ theatrical production
(in a translated version, with labels for the character roles)
Eine kleine Stadt (Our Town) (Q43689202)
▶ guest performance (series of performances)
(in the original language, with labels for the character roles):
L'école des femmes (The School for Wives) (Q43759980)
▶ premiere (single performance)
Der Hauptmann von Köpenick (The Captain of Köpenick) (Q39907209)
For further examples (e.g. for actors or character roles), see the project page.
Example Items
22. ▶ Get a thorough understanding of the source and the target data
models and provide an initial mapping between the two.
• Becoming familiar with the data structures on Wikidata takes some time.
The documentation could be better and needs to be developed as part of
the data ingest if the ingest covers new ground.
▶ Create missing elements of the target data model
(i.e. create new classes and properties in Wikidata)
• Note that the creation of new properties requires community consensus.
This means that proposals need to be well argued and that the creation of
properties takes some time: allow at least 2-3 weeks from proposal to
creation. The property discussion should be approached with an open
mind and a readiness to rethink one’s initial modelling decisions.
Main Steps of the Data Ingestion Process and Related
Challenges (1/2)
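The initial mapping between the source and target data models can be kept as a simple lookup table, which also makes it easy to spot source fields that nobody has decided on yet. The following is a minimal sketch, not the project's actual mapping: the source field names are hypothetical (the columns of the source dataset are not given in these slides), while the property IDs are the ones listed for the pilot ingest.

```python
# Hypothetical source field names mapped to Wikidata properties.
# Only the property IDs come from the pilot ingest slide.
FIELD_TO_PROPERTY = {
    "premiere_venue": "P4647",  # location of first performance
    "premiere_type": "P4634",   # premiere type
    "stage_design": "P4608",    # scenographer
    "role_name": "P4633",       # name of the character role
}

# Source fields deliberately left out of the ingest (also hypothetical).
IGNORED_FIELDS = {"internal_id", "shelf_mark"}


def unmapped_fields(record: dict) -> list:
    """Return the source fields that are neither mapped nor explicitly ignored."""
    return sorted(
        field
        for field in record
        if field not in FIELD_TO_PROPERTY and field not in IGNORED_FIELDS
    )


# A made-up record: "director" has no mapping yet and should be reported.
sample = {
    "internal_id": "example-1",
    "premiere_venue": "example venue",
    "stage_design": "example person",
    "director": "example person",
}
print(unmapped_fields(sample))
```

Any field reported here either needs an existing property, a new property proposal, or an explicit decision to leave it out.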
23. ▶ Complement the source data set with the Q-numbers of
corresponding Wikidata items.
• Detection of and mapping to already existing items can partly be
automated with OpenRefine, but some work remains to be done
manually.
• Datasets are not always completely tidy; therefore some data cleansing
may be necessary for some data fields.
• Disambiguation of items (sorting out different items with the same name
as well as cases where the same item is listed under different names)
also happens at this stage and may require some extra research.
• Items which do not exist in Wikidata need to be newly created, possibly
making an iterative data ingest necessary.
▶ Apply the data mapping to the source data in order to produce
statements that can be fed into the QuickStatements tool, which
writes them to Wikidata.
Main Steps of the Data Ingestion Process and Related
Challenges (2/2)
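The last step, turning mapped records into input for QuickStatements, can be sketched as follows. This is an illustration under stated assumptions rather than the script used in the pilot: it emits tab-separated commands in the QuickStatements version 1 style, handles only item-valued statements and a German label, and the function name is hypothetical. The Wikidata Sandbox item (Q4115189) stands in for real subjects and values.

```python
import re

QID = re.compile(r"Q\d+")
PID = re.compile(r"P\d+")


def record_to_quickstatements(qid, statements, label_de=None):
    """Build QuickStatements V1 lines for one record.

    qid:        Q-number of the matched item, or None if it must be created
    statements: list of (property ID, value Q-number) pairs
    label_de:   optional German label, only used for newly created items
    """
    lines = []
    subject = qid
    if subject is None:
        # CREATE makes a new item; LAST refers to the item just created.
        lines.append("CREATE")
        subject = "LAST"
        if label_de is not None:
            if '"' in label_de or "\t" in label_de:
                raise ValueError("labels with quotes or tabs are not handled in this sketch")
            lines.append(f'LAST\tLde\t"{label_de}"')
    elif not QID.fullmatch(subject):
        raise ValueError(f"not a valid Q-number: {subject!r}")
    for prop, value in statements:
        if not PID.fullmatch(prop) or not QID.fullmatch(value):
            raise ValueError(f"unexpected statement: {prop!r} {value!r}")
        lines.append(f"{subject}\t{prop}\t{value}")
    return "\n".join(lines)


# An item already matched to a Q-number (placeholder: the sandbox item).
print(record_to_quickstatements("Q4115189", [("P4647", "Q4115189")]))
# An item missing in Wikidata, created as a performing arts production.
print(record_to_quickstatements(None, [("P31", "Q43099500")], label_de="Beispielproduktion"))
```

Records without a Q-number produce CREATE commands, which is where the iterative character of the ingest shows: items created in one pass become match targets in the next.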
24. ▶ Sorting out the existing ontology elements and correcting
problematic entries in Wikidata; there may be cases where
different approaches have been used to model the same thing
(e.g. theatre seasons).
▶ Clarification of the modelling of the FRBR Group 1 classes, both
with regard to literary/choreographic/musical works and with
regard to performance works; we need to be able to correctly
describe the relations between various instances of these
classes (adaptations, translations, etc.).
▶ Modelling of artefacts which are related to particular productions
or representations (photos, stage designs, etc.).
▶ Modelling of archival structures (ISAD(G)/RiC) in Wikidata.
▶ Ingesting further production databases, dealing with special
cases that were absent in the pilot ingest due to the relative
homogeneity of the dataset.
What is Next? (not necessarily in this order)
25. ▶ Implementation of further elements of the Swiss Performing Arts Data
Model on Wikidata; mapping between the two data models
▶ Implementation and promotion of data modelling examples on Wikidata
▶ Data ingestion (adding further data sets)
▶ Community building
Further tasks down the road:
• Ingest structured data from Wikipedia to Wikidata
• Monitor data quality and completeness on Wikidata
• Change existing templates to pull the data directly from Wikidata;
create additional templates where they are missing
• Enhance the metadata of performing arts related media objects on
Wikimedia Commons (linking them to a performance database
implemented on Wikidata)
Next Steps – Join us!
27. Wikidata, Wikipedia, Wikimedia Commons
• Complementary data (artists, works, etc.)
• Potential for crowdsourcing / community-sourcing certain aspects of data maintenance
• Application of the data model in an international context
• Visibility, exposure of performing arts related information
Swiss Performing Arts Platform
• Plenty of performance data from Switzerland
• Comprehensive data model for the performing arts domain
• Plenty of know-how and source material in the area of the performing arts
• Digital content (in the longer term, due to copyright issues)
Areas for Cooperation
• Mutual interlinking of data & content
• Community building and outreach to further data providers
• Establishing a widely accepted data modelling practice in the area of the performing arts
• Organization of editathons and similar events
29. ▶ Incorrect / incoherent data in Wikidata
▶ Lack of an explicit ontology; varying interpretations of how to use certain
properties; no explicit master language when it comes to defining properties and
classes in a multilingual environment
▶ Can Wikidata be used as the main database for heritage institutions? (as
opposed to mirroring data that is maintained in other databases)
▶ Monitoring data quality and data completeness
▶ Lack of ergonomic, customizable user interfaces for manual data entry
▶ Tools for data ingestion and monitoring need to be improved
▶ The evolving tools landscape constitutes a challenge when establishing
processes and working with guidelines
▶ How to improve guidelines, community structures, etc. in order to be able to
involve more GLAM personnel in Wikidata?
Challenges related to Wikidata
30. Thank You for Your Attention!
Contact
Beat Estermann
Bern University of Applied Sciences
beat.estermann@bfh.ch
+41 31 848 34 38