
Multilinguality of Metadata. Measuring the Multilingual Degree of Europeana’s Metadata


Presentation at the 15th International Symposium of Information Science (ISI 2017, isi2017.ib.hu-berlin.de/), Berlin, March 14, 2017

Published in: Data & Analytics


  1. Multilinguality of Metadata: Measuring the Multilingual Degree of Europeana’s Metadata. Juliane Stiller (1), Péter Király (2). (1) Berlin School of Library and Information Science, Humboldt-Universität zu Berlin; (2) Gesellschaft für wissenschaftliche Datenverarbeitung mbH Göttingen. ISI 2017, March 14, 2017. Image: Languages by eltpics
  2. Agenda: (1) Multilinguality in Europeana, (2) Multilingual Score for Metadata, (3) Implementation, (4) Discussion & Future Work
  3. Platform for Cultural Heritage Material: www.europeana.eu
  4. Europeana - Facts ○ Books, newspapers, letters, paintings, photographs, radio shows, films, etc. ○ Text, images, video, audio, sounds, 3D ○ Over 54 million objects ○ > 50 languages. Source: http://statistics.europeana.eu/europeana
  5. Thumbnail, Metadata, Link to Provider
  6. Metadata Multilinguality: + 40 other languages....
  7. The Multilingual Problem ○ Mona Lisa: 456 results ○ La Gioconda: 365 results ○ La Joconde: 71 results. Source: http://www.europeana.eu/portal/en/record/90402/RP_F_00_351.html
  8. Metadata Enrichment
  9. Quantify the Multilinguality of Data to ○ Take measures to improve multilinguality in data ○ Establish a sense of the multilingual reach of Europeana ○ Distribution of languages ○ Devise strategies for underrepresented languages
  10. Multilingual Score for Metadata
  11. Multilingual saturation of metadata
      ○ Text without language annotation (dc.subject: Germany)
      ○ Text with language annotation (dc.subject: Germany@en)
      ○ Text with several language annotations (dc.subject: Germany@en, Deutschland@de)
      ○ Link to (multilingual) vocabulary (http://www.geonames.org/2921044/federal-republic-of-germany)
  12. Calculation (score per field instance)
      Missing field: NA
      Text string without language tag (language not known): 0
      Text string with language tag (language known): 1
      Text string with 2-3 different language tags: 2
      Text string with 4-9 different language tags: 2.3
      Text string with more than 10 different language tags: 2.6
      Link to (multilingual) vocabulary: 3
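
The scoring scheme on slide 12 can be sketched as a small function. This is a minimal illustration, not the authors' implementation: the function name `saturation_score` and its input shape (a list of language tags, with `None` for an untagged literal) are assumptions made for the example. The slide does not say how exactly 10 distinct tags are scored; the sketch assigns 2.6 to 10 or more, which is also an assumption.

```python
from typing import Optional, Sequence


def saturation_score(
    lang_tags: Optional[Sequence[Optional[str]]],
    links_to_vocabulary: bool = False,
) -> Optional[float]:
    """Score one field following the table on slide 12.

    lang_tags holds one entry per literal in the field; an entry of None
    means the literal has no language tag. Passing None (or an empty list)
    for lang_tags means the field is missing.
    """
    if links_to_vocabulary:
        return 3.0   # link to a (multilingual) vocabulary
    if not lang_tags:
        return None  # missing field: NA
    distinct = {tag for tag in lang_tags if tag}
    if len(distinct) == 0:
        return 0.0   # text without language tag
    if len(distinct) == 1:
        return 1.0   # text with one known language
    if len(distinct) <= 3:
        return 2.0   # 2-3 different language tags
    if len(distinct) <= 9:
        return 2.3   # 4-9 different language tags
    return 2.6       # assumption: 10 or more tags (slide says "more than 10")


# The dc.subject examples from the slides:
print(saturation_score([None]))                             # Germany
print(saturation_score(["en"]))                             # Germany@en
print(saturation_score(["en", "de"]))                       # Germany@en, Deutschland@de
print(saturation_score(None, links_to_vocabulary=True))     # geonames link
```

Returning `None` for a missing field keeps NA distinct from a score of 0, so missing fields can be left out of averages rather than dragging them down.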
  13. Example score
      Text without language annotation (dc.subject: Germany): 0
      Text with language annotation (dc.subject: Germany@en): 1
      Text with several language annotations (dc.subject: Germany@en, Deutschland@de): 2
      Link to (multilingual) vocabulary (http://www.geonames.org/2921044/federal-republic-of-germany): 3
  14. Aggregation of property dc:subject. The Wittgenstein Archives at the University of Bergen: high saturation. National Library Portugal: low saturation. Source: http://144.76.218.178/europeana-qa/saturation.php?collectionId=all&field=proxy_dc_subject&type=average
  15. Good examples
      Descriptive fields
        dc:title: "Die Mauer muß weg!"@de, "Die Mauer muß weg! (The Wall must go!)"@en
        dc:description: "Kommentiertes Fotorama mit Bildern von 1989-1990 in Berlin"@de, "Annotated images from 1989-1990 in Berlin"@en
      Subject headings
        Place/skos:prefLabel: "Brandenburger Tor"@de, "Brandenburg Gate"@en
        Place/skos:prefLabel: "Grenzübergang Potsdamer Platz"@de, "Postdamer Platz border crossing"@en
        Place/skos:prefLabel: "Reichstag"@de, "Reichstag building"@en
  16. Implementation. Source code: http://pkiraly.github.io/about/#source-codes Data source: http://hdl.handle.net/21.11101/0000-0001-781F-7 (Europeana snapshot, December 2015)
  17. Data processing workflow. Stages: ingestion, measuring, statistical analysis, web interface. Tools: ★ OAI-PMH ★ Europeana API ★ Hadoop ★ NoSQL ★ Spark ★ Hadoop ★ Java ★ Apache Solr ★ Spark ★ R ★ PHP ★ D3.js ★ highchart.js ★ NoSQL. Output formats: json; csv; json, png; html, svg
  18. Visualization
  19. APIs, abstraction, reusing
      "Place/skos:altLabel": {
        "instances": [
          {"TRANSLATION": 2.0},
          {"TRANSLATION": 2.0},
          {"TRANSLATION": 2.0},
          ...
          {"TRANSLATION": 2.40},
          {"STRING": 0.0}
        ],
        "score": {
          "sum": 20.40,
          "average": 1.85454545,
          "normalized": 0.649681
        }
      }
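
The `score` object on slide 19 can be reproduced with a few lines of arithmetic. The sum and average imply 11 instances (20.40 / 1.85454545), and the normalized value matches 1 - 1/(1 + average). That formula is inferred from the numbers on the slide, not stated there, so treat it as an assumption. The instance list below is also hypothetical: the slide elides the middle of the list, and nine values of 2.0 are used only because they are consistent with the reported sum.

```python
def summarize(scores):
    """Aggregate per-instance scores into sum, average and a 0..1 value.

    The normalization 1 - 1/(1 + average) is an assumption inferred from
    the figures on the slide.
    """
    total = sum(scores)
    average = total / len(scores)
    normalized = 1.0 - 1.0 / (1.0 + average)
    return {"sum": total, "average": average, "normalized": normalized}


# Hypothetical instance list: nine TRANSLATION scores of 2.0, one of 2.40,
# and one untagged STRING scored 0.0 (11 instances in total).
instances = [2.0] * 9 + [2.40, 0.0]
summary = summarize(instances)

for key, value in summary.items():
    print(f"{key}: {value:.6f}")
```

A normalization of this shape maps an average of 0 to 0 and approaches 1 as the average grows, which makes fields with different score ranges comparable on one scale.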
  20. Discussion & Future Work
  21. Extension I: recalculation. The new metrics ★ Distinct languages per object ★ Language tags per object ★ Literals per language ★ Number of multilingual properties (a.k.a. fields) ★ Number of multilingual statements (a.k.a. field instances) ★ Average number of languages per property with language ★ Average number of languages per proxy
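
Several of the metrics listed on slide 21 can be illustrated on a single toy record. This is a sketch under stated assumptions: the record layout (field name mapped to a list of `(literal, language tag)` pairs), the function name `language_metrics`, and the exact reading of each metric are all hypothetical, since the slide only names the metrics. The untagged "Berlin" literal is invented for the example.

```python
from collections import Counter

# Hypothetical record: field name -> list of (literal, language tag or None).
record = {
    "dc:title": [("Brandenburger Tor", "de"), ("Brandenburg Gate", "en")],
    "dc:description": [("Annotated images from 1989-1990 in Berlin", "en")],
    "dc:subject": [("Berlin", None)],  # literal without a language tag
}


def language_metrics(rec):
    """Compute a few per-object language metrics (one possible reading)."""
    # Language tags of all tagged literals across all fields.
    tags = [lang for literals in rec.values() for _, lang in literals if lang]
    # Distinct languages within each field, keeping only fields with a tag.
    langs_per_field = [
        {lang for _, lang in literals if lang} for literals in rec.values()
    ]
    tagged_fields = [langs for langs in langs_per_field if langs]
    avg = (
        sum(len(langs) for langs in tagged_fields) / len(tagged_fields)
        if tagged_fields
        else 0.0
    )
    return {
        "distinct_languages": len(set(tags)),
        "language_tags": len(tags),
        "literals_per_language": dict(Counter(tags)),
        "avg_languages_per_tagged_property": avg,
    }


metrics = language_metrics(record)
print(metrics)
```

Counting over a whole object rather than per field shows how many languages a record reaches overall, which a per-field score cannot express.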
  22. Extension II: record views
      ex:providerProxy
          dc:subject "special relativity"@en ;
          dc:creator <http://vocab.getty.edu/ulan/500240971> ;
          dc:type <http://udcdata.info/001684> .
      ex:europeanaProxy
          dc:subject <http://dbpedia.org/resource/Physics> .
      <http://vocab.getty.edu/ulan/500240971> skos:prefLabel "Einstein, Albert"@de .  (standard vocabulary)
      <http://dbpedia.org/resource/Physics> skos:prefLabel "Physics"@en .  (standard vocabulary)
      <http://udcdata.info/001684> skos:prefLabel "Books in general"@en .  (non-standard vocabulary)
  23. Extension II: record views
      source             field       link      value                      views
      ex:providerProxy   dc:subject  literal   "special relativity"@en    ① ② ③ ④
      ex:providerProxy   dc:creator  standard  "Einstein, Albert"@de      ① ② ③ ④
      ex:providerProxy   dc:type     non-std   "Books in general"@en      ② ④
      ex:europeanaProxy  dc:subject  standard  "Physics"@en               ③ ④
      ① data provider's proxy and standard enrichments
      ② data provider's proxy and enrichments
      ③ all proxies and standard enrichments
      ④ all proxies and enrichments
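
The four record views on slide 23 amount to two independent switches: whether statements from all proxies count, and whether enrichments from non-standard vocabularies count. The sketch below encodes that reading. The names `Statement`, `VIEWS` and `in_view` are hypothetical and chosen for the example; only the four statements and the view definitions come from the slide.

```python
from typing import NamedTuple


class Statement(NamedTuple):
    source: str  # proxy the statement belongs to
    field: str
    link: str    # "literal", "standard" or "non-std"
    value: str


STATEMENTS = [
    Statement("ex:providerProxy", "dc:subject", "literal", '"special relativity"@en'),
    Statement("ex:providerProxy", "dc:creator", "standard", '"Einstein, Albert"@de'),
    Statement("ex:providerProxy", "dc:type", "non-std", '"Books in general"@en'),
    Statement("ex:europeanaProxy", "dc:subject", "standard", '"Physics"@en'),
]

# view number -> (include all proxies?, include non-standard enrichments?)
VIEWS = {
    1: (False, False),  # provider proxy, standard enrichments
    2: (False, True),   # provider proxy, all enrichments
    3: (True, False),   # all proxies, standard enrichments
    4: (True, True),    # all proxies, all enrichments
}


def in_view(stmt: Statement, view: int) -> bool:
    """Return True if the statement is counted in the given record view."""
    all_proxies, all_enrichments = VIEWS[view]
    if not all_proxies and stmt.source != "ex:providerProxy":
        return False
    if not all_enrichments and stmt.link == "non-std":
        return False
    return True


for stmt in STATEMENTS:
    views = [v for v in VIEWS if in_view(stmt, v)]
    print(stmt.source, stmt.field, stmt.link, views)
```

Scoring the same record under each view separates what the data provider delivered from what enrichment added.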
  24. Questions ○ Contact: juliane.stiller@ibi.hu-berlin.de, peter.kiraly@gwdg.de ○ Metadata Quality Assurance Framework: http://144.76.218.178/europeana-qa ○ Europeana Data Quality Committee: http://pro.europeana.eu/page/data-quality-committee
  25. Appendix: Europeana data structure in 30 sec. Provider proxy, Europeana proxy, Agent, Concept, Place, Timespan; descriptive fields, subject headings; semantic web
