Europeana.eu aggregates metadata describing more than 50 million cultural heritage objects from libraries, museums, archives and audiovisual archives across Europe. Metadata quality matters because it affects user experience, information retrieval and data re-use in other contexts. One of the key goals of Europeana is to enable users to retrieve cultural heritage resources irrespective of their origin and of the language of the material's metadata. Multilingual metadata descriptions are therefore essential to successful cross-language retrieval. Quantitatively determining Europeana's cross-lingual reach is a prerequisite for enhancing the quality of metadata in various languages.
Evaluating Data Quality in Europeana: Metrics for Multilinguality
1. Evaluating Data Quality in Europeana: Metrics for Multilinguality
Péter Király (1), Juliane Stiller (2), Valentine Charles (3), Werner Bailer (4), Nuno Freire (5)
(1) Gesellschaft für wissenschaftliche Datenverarbeitung mbH Göttingen
(2) Berlin School of Library and Information Science, Humboldt-Universität zu Berlin
(3) Europeana Foundation, The Hague
(4) Joanneum Research Forschungsgesellschaft mbH, Graz
(5) INESC-ID, Lisbon
MTSR 2018 - Track on Cultural Collections and Applications, Limassol, Oct. 24, 2018
Nummertjes by Fabio (CC BY-NC 2.0)
2. Agenda
1. Europeana
2. Multilingual Information in Europeana’s Metadata
3. Multilinguality as a Facet of Quality Dimensions
4. Results
5. Demo
8. Multilinguality on Field Level
<#record> a ore:Proxy ;
    dc:subject "Ballet", "Opera"@en .
<#record> a ore:Proxy ; edm:europeanaProxy true ;
    dc:subject <http://data.europeana.eu/concept/base/264> .
<http://data.europeana.eu/concept/base/264> a skos:Concept ;
    skos:prefLabel "Ballett"@no, "बैले"@hi, "Ballett"@de, "Балет"@be, "Балет"@ru,
        "Balé"@pt, "Балет"@bg, "Baletas"@lt, "Balet"@hr, "Balets"@lv .
Europeana Dereferencing
Literal, literal with language tag
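The three kinds of field value shown in the example (plain literal, literal with language tag, URI reference) can be told apart mechanically. The following is a minimal Python sketch, not Europeana's implementation: the `classify_value` helper and its regular expressions are assumptions made for illustration, and they only cover simple double-quoted Turtle terms without escapes.

```python
import re

# Hypothetical patterns for the simple Turtle terms used on the slide.
TAGGED = re.compile(r'^"(?P<text>[^"]*)"@(?P<lang>[A-Za-z]+(?:-[A-Za-z0-9]+)*)$')
PLAIN = re.compile(r'^"(?P<text>[^"]*)"$')
URI = re.compile(r'^<(?P<uri>[^>]*)>$')


def classify_value(term):
    """Return (kind, value, language) for a single object term."""
    term = term.strip()
    match = TAGGED.match(term)
    if match:
        return ("tagged_literal", match.group("text"), match.group("lang"))
    match = PLAIN.match(term)
    if match:
        return ("literal", match.group("text"), None)
    match = URI.match(term)
    if match:
        return ("uri", match.group("uri"), None)
    return ("unknown", term, None)


subjects = ['"Ballet"', '"Opera"@en', '<http://data.europeana.eu/concept/base/264>']
for term in subjects:
    print(classify_value(term))
```

Only the tagged literal carries quantifiable language information; a URI contributes labels only after it has been dereferenced.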
9. Processes Contributing to Multilinguality
Data from Provider:
○ dc:subject "subject"@en
○ dc:creator <http://vocab.getty.edu/...>
○ dc:type <http://voc.example./…>
○ dc:subject <http://dbpedia.org/aSubjectID>
○ dc:subject "Subject"
Data added by Europeana (dereferencing step):
○ dc:creator: new labels in different languages
○ dc:subject: new labels in different languages
Quantifiable: "term"@language annotation
10. Quantify Multilinguality of Data to:
○ Establish a sense of the multilingual reach of Europeana, incl. distribution of languages
○ Identify the impact of different workflows / processes on multilinguality of data
○ Take measures to improve multilinguality in data
○ Devise strategies for underrepresented languages
11. What Could be Measured?
○ Number of (distinct) languages in the metadata
○ Number of language-tagged literals
○ Tagged literals per language
○ Existence of language information fields such as dc:language
○ Consistency and conformity of language information
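The first four measurements in this list can be sketched in a few lines of Python. This is an illustration under stated assumptions: the record layout (field name mapped to a list of value and language-tag pairs) and the `language_counts` helper are hypothetical and do not reflect the data model of the actual measurement framework.

```python
from collections import Counter

# Hypothetical record: field name -> list of (value, language tag or None).
record = {
    "dc:subject": [("Ballet", None), ("Opera", "en")],
    "skos:prefLabel": [("Ballett", "de"), ("Балет", "ru"), ("Balet", "hr")],
    "dc:language": [("de", None)],
}


def language_counts(rec):
    """Count language-tagged literals per language across all fields."""
    return Counter(
        lang
        for values in rec.values()
        for _, lang in values
        if lang is not None
    )


per_language = language_counts(record)          # tagged literals per language
tagged_literals = sum(per_language.values())    # number of tagged literals
distinct_languages = len(per_language)          # number of distinct languages
has_dc_language = bool(record.get("dc:language"))  # language field exists

print(distinct_languages, tagged_literals, dict(per_language), has_dc_language)
```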
13. Completeness
○ This dimension:
  ○ expresses the number (fraction) of fields present in a dataset
  ○ identifies non-empty values in a record or (sub-)collection
○ Multilingual completeness is captured by:
  ○ Presence of a value in dc:language
  ○ Share of fields with language tags to overall available fields
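As a rough sketch of the two completeness measures, the Python below computes the presence of dc:language and the share of tagged fields for one record. The record layout and the `multilingual_completeness` helper are hypothetical, and counting every non-empty field (including dc:language itself) in the denominator is an assumption, not a rule taken from the slides.

```python
# Hypothetical record: field name -> list of (value, language tag or None).
record = {
    "dc:title": [("Swan Lake", "en")],
    "dc:subject": [("Ballet", None)],
    "dc:description": [],            # empty field, ignored below
    "dc:language": [("en", None)],
}


def multilingual_completeness(rec):
    """Share of non-empty fields that hold at least one language-tagged value."""
    available = [name for name, values in rec.items() if values]
    if not available:
        return 0.0
    tagged = [
        name for name in available
        if any(lang is not None for _, lang in rec[name])
    ]
    return len(tagged) / len(available)


has_dc_language = bool(record.get("dc:language"))
share = multilingual_completeness(record)
print(has_dc_language, share)
```

Here one of the three non-empty fields carries a language tag, so the share is one third.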
14. Consistency
○ Describes the logical coherence of metadata
○ Assesses the variety of language values in the dc:language field: how many distinct values?
○ Contributes to features like language-based facets
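Counting distinct notations is straightforward once the dc:language values of a collection are gathered. A minimal Python sketch follows; the sample values are invented for illustration and are not Europeana data.

```python
from collections import Counter

# Hypothetical dc:language values collected from several records.
dc_language_values = ["en", "English", "ENG", "en", "de", "ger", "en-uk"]

# Each distinct string counts as its own notation, even when several
# notations denote the same language.
notations = Counter(dc_language_values)
distinct_notations = len(notations)

print(distinct_notations, notations.most_common())
```

A high number of distinct notations relative to the number of actual languages signals low consistency and makes a language facet harder to build.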
15. Conformity
○ Describes the conformity to a given standard such as ISO 639-2
○ Example: English is expressed as: English, ENG, en, en-uk, …
○ Share of values that comply (or do not comply) with the standard
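The conformity share can be sketched as a lookup against a list of valid codes. In the Python below, `ISO_639_2_SAMPLE` is only a small excerpt of ISO 639-2 codes chosen for illustration, the sample values are invented, and the strict, case-sensitive match is an assumption; a real check would load the full code list.

```python
# Small excerpt of ISO 639-2 three-letter codes (not the full standard).
ISO_639_2_SAMPLE = {"eng", "ger", "deu", "fre", "fra", "por", "rus"}

# Hypothetical dc:language values, mirroring the notations on the slide.
values = ["English", "ENG", "en", "en-uk", "eng", "ger"]


def conformity_share(vals, valid_codes):
    """Fraction of values that exactly match a code in valid_codes."""
    if not vals:
        return 0.0
    compliant = [v for v in vals if v in valid_codes]
    return len(compliant) / len(vals)


share = conformity_share(values, ISO_639_2_SAMPLE)
print(share)
```

Under this strict match only "eng" and "ger" comply, so two of the six values conform.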
16. Accessibility
○ Access to information and data across languages
○ Distribution of linguistic information in metadata
○ Quantifying the language tag
○ The more language tags, the higher the multilingual reach
17. Dimensions, Criteria & Measures
Dimension | Criteria | Measure
Completeness | Presence or absence of values in fields relating to the language of the object or the metadata | Share of multilingual fields to overall fields; Presence or absence of dc:language field
Consistency | Variance in language notation | Distinct language notations
Conformity | Compliance to ISO-639-2 | Share of values that comply
Accessibility | Accessibility across languages expressed through language tags | Number of distinct languages; Number of languages/Number of tagged literals; Tagged literals per language