1. The document discusses how to measure the degree of multilinguality of metadata in Europeana, a platform that provides access to over 54 million digital cultural heritage objects in more than 50 languages.
2. It presents a multilingual score for metadata based on the presence of language tags, the number of languages per field, and links to multilingual vocabularies.
3. The score is computed by processing Europeana metadata with tools such as Apache Spark; the results are exposed through APIs and visualizations that show the distribution of languages and identify areas for improvement.
1. Multilinguality of Metadata
Measuring the Multilingual Degree of Europeana’s Metadata
Juliane Stiller1, Péter Király2
1 Berlin School of Library and Information Science, Humboldt-Universität zu Berlin
2 Gesellschaft für wissenschaftliche Datenverarbeitung mbH Göttingen
ISI 2017, March 14, 2017
Languages by eltpics
2. Agenda
1. Multilinguality in Europeana
2. Multilingual Score for Metadata
3. Implementation
4. Discussion & Future Work
7. The Multilingual Problem
○ Mona Lisa 456 results
○ La Gioconda 365 results
○ La Joconde 71 results
http://www.europeana.eu/portal/en/record/90402/RP_F_00_351.html
9. Quantify the Multilinguality of Data to
○ Take measures to improve multilinguality in data
○ Establish a sense of the multilingual reach of Europeana
○ Determine the distribution of languages
○ Devise strategies for underrepresented languages
11. Multilingual saturation of metadata
Text without language annotation (dc.subject: Germany)
Text with language annotation (dc.subject: Germany@en)
Text with several language annotations (dc.subject: Germany@en, Deutschland@de)
Link to (multilingual) vocabulary (http://www.geonames.org/2921044/federal-republic-of-germany)
12. Calculation
Field value                                              Score
Missing field                                            NA
Text string without language tag (language not known)    0
Text string with language tag (language known)           1
Text string with 2-3 different language tags             2
Text string with 4-9 different language tags             2.3
Text string with more than 10 different language tags    2.6
Link to (multilingual) vocabulary                        3
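The scoring scheme above can be sketched as a small function. This is only an illustration, not the authors' implementation: the name `field_score`, the `value@lang` string notation (borrowed from the slides), and the rules for edge cases are assumptions. In particular, the slide does not say how exactly 10 language tags are scored (treated here as 2.6), nor how a field mixing a link and literals is scored (treated here as 3).

```python
from typing import Optional, Sequence


def field_score(values: Optional[Sequence[str]]) -> Optional[float]:
    """Score one metadata field; None stands for NA (missing field).

    Values use the slide notation: 'Germany', 'Germany@en', or a URI.
    """
    if not values:
        return None  # missing field -> NA
    # Assumption: any link to a vocabulary gives the field the top score.
    if any(v.startswith(("http://", "https://")) for v in values):
        return 3.0
    # Simplification: everything after the last '@' is read as a language tag.
    languages = {v.rsplit("@", 1)[1] for v in values if "@" in v}
    n = len(languages)
    if n == 0:
        return 0.0  # language not known
    if n == 1:
        return 1.0  # language known
    if n <= 3:
        return 2.0
    if n <= 9:
        return 2.3
    return 2.6  # assumption: 10 or more distinct language tags


print(field_score(["Germany@en", "Deutschland@de"]))
```

A real implementation would read language tags from the RDF literals rather than splitting strings on `@`.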
13. Example score
Text without language annotation (dc.subject: Germany): 0
Text with language annotation (dc.subject: Germany@en): 1
Text with several language annotations (dc.subject: Germany@en, Deutschland@de): 2
Link to (multilingual) vocabulary (http://www.geonames.org/2921044/federal-republic-of-germany): 3
14. Aggregation of property dc:subject
The Wittgenstein Archives at the University of Bergen: high saturation
National Library of Portugal: low saturation
http://144.76.218.178/europeana-qa/saturation.php?collectionId=all&field=proxy_dc_subject&type=average
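As a rough illustration of the per-collection aggregation (the URL above requests `type=average` for `proxy_dc_subject`), the sketch below averages field scores per collection. The function name, the input shape, and the choice to skip NA scores are assumptions; the slides do not state how missing fields enter the average.

```python
from collections import defaultdict
from statistics import mean


def average_by_collection(records):
    """records: iterable of (collection_id, score), where score may be None (NA)."""
    scores = defaultdict(list)
    for collection_id, score in records:
        if score is not None:  # assumption: NA values are left out of the average
            scores[collection_id].append(score)
    # Collections with only NA scores do not appear in the result.
    return {cid: mean(vals) for cid, vals in scores.items()}


# Hypothetical collection identifiers and scores, for illustration only.
sample = [("a", 3.0), ("a", 2.0), ("a", None), ("b", 0.0), ("b", 1.0)]
print(average_by_collection(sample))
```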
15. Good examples
Descriptive fields
dc:title
"Die Mauer muß weg!"@de
"Die Mauer muß weg! (The Wall must go!)"@en
dc:description
"Kommentiertes Fotorama mit Bildern von 1989-1990 in Berlin"@de
"Annotated images from 1989-1990 in Berlin"@en
Subject headings
Place/skos:prefLabel
"Brandenburger Tor"@de
"Brandenburg Gate"@en
"Grenzübergang Potsdamer Platz"@de
"Postdamer Platz border crossing"@en
"Reichstag"@de
"Reichstag building"@en
21. Extension I: Recalculation
The new metrics
★ Distinct languages per object
★ Language tags per object
★ Literals per language
★ Number of multilingual properties (a.k.a. fields)
★ Number of multilingual statements (a.k.a. field instances)
★ Average number of languages per property with language
★ Average number of languages per proxy
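A few of these metrics can be sketched for a single object as follows. The data layout (a dict mapping each property to a list of `(text, language)` pairs), the function name, and the reading of "property with language" as "property with at least one language-tagged literal" are assumptions made for illustration; the example values come from the "Good examples" slide.

```python
from collections import Counter


def language_metrics(obj):
    """obj maps property name -> list of (text, language or None) literals."""
    # All language tags found on the object's literals, with repeats.
    tags = [lang for values in obj.values() for _, lang in values if lang]
    # Number of distinct languages used within each property.
    per_property = [len({lang for _, lang in values if lang}) for values in obj.values()]
    tagged = [n for n in per_property if n > 0]
    return {
        "distinct_languages": len(set(tags)),
        "language_tags": len(tags),
        "literals_per_language": dict(Counter(tags)),
        # Assumption: averaged only over properties that carry a language tag.
        "avg_languages_per_tagged_property": sum(tagged) / len(tagged) if tagged else 0.0,
    }


EXAMPLE = {
    "dc:title": [
        ("Die Mauer muß weg!", "de"),
        ("Die Mauer muß weg! (The Wall must go!)", "en"),
    ],
    "dc:description": [
        ("Kommentiertes Fotorama mit Bildern von 1989-1990 in Berlin", "de"),
        ("Annotated images from 1989-1990 in Berlin", "en"),
    ],
}
print(language_metrics(EXAMPLE))
```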
22. Extension II: Record views
ex:providerProxy
    dc:subject "special relativity"@en ;
    dc:creator <http://vocab.getty.edu/ulan/500240971> ;
    dc:type <http://udcdata.info/001684> .

ex:europeanaProxy
    dc:subject <http://dbpedia.org/resource/Physics> .

# standard vocabulary
<http://vocab.getty.edu/ulan/500240971>
    skos:prefLabel "Einstein, Albert"@de .

# standard vocabulary
<http://dbpedia.org/resource/Physics>
    skos:prefLabel "Physics"@en .

# non-standard vocabulary
<http://udcdata.info/001684>
    skos:prefLabel "Books in general"@en .
23. Extension II: Record views
source             field       link      value                    ① ② ③ ④
ex:providerProxy   dc:subject  literal   "special relativity"@en  ① ② ③ ④
ex:providerProxy   dc:creator  standard  "Einstein, Albert"@de    ① ② ③ ④
ex:providerProxy   dc:type     non-std   "Books in general"@en      ②   ④
ex:europeanaProxy  dc:subject  standard  "Physics"@en                 ③ ④
① data provider's proxy and standard enrichments
② data provider's proxy and enrichments
③ all proxies and standard enrichments
④ all proxies and enrichments
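The four views can be read as two independent switches: whether to include all proxies or only the data provider's, and whether to include all enrichments or only those from standard vocabularies. The sketch below encodes that reading; the tuple layout and the `view` function are illustrative assumptions, while the statements are the ones from the table.

```python
# (source proxy, field, link type, resolved value)
STATEMENTS = [
    ("ex:providerProxy", "dc:subject", "literal", '"special relativity"@en'),
    ("ex:providerProxy", "dc:creator", "standard", '"Einstein, Albert"@de'),
    ("ex:providerProxy", "dc:type", "non-std", '"Books in general"@en'),
    ("ex:europeanaProxy", "dc:subject", "standard", '"Physics"@en'),
]


def view(statements, all_proxies, all_enrichments):
    """Select the statements visible in one record view."""
    return [
        s
        for s in statements
        if (all_proxies or s[0] == "ex:providerProxy")
        and (all_enrichments or s[2] != "non-std")
    ]


VIEWS = {
    1: view(STATEMENTS, all_proxies=False, all_enrichments=False),
    2: view(STATEMENTS, all_proxies=False, all_enrichments=True),
    3: view(STATEMENTS, all_proxies=True, all_enrichments=False),
    4: view(STATEMENTS, all_proxies=True, all_enrichments=True),
}
for number, selected in VIEWS.items():
    print(number, [value for _, _, _, value in selected])
```

Each view could then be scored separately, which makes the contribution of enrichments to the multilingual saturation visible.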
25. Appendix
Europeana data structure in 30 sec
provider proxy
Europeana proxy
Agent
Concept
Place
Timespan
descriptive fields
subject headings
semanticweb
Editor's Notes
Redo
Why does the link to a controlled vocabulary have the highest saturation? Because it gives us parallel variants in different languages that we can be sure are translations of each other.
NOTE (Péter): Antoine distinguished at least 2 categories:
1) link to a vocabulary that is dereferenceable by Europeana (such as Geonames, VIAF, GND etc.; I would call them standard vocabularies)
2) link to other vocabulary