Measuring completeness as metadata
quality metric in Europeana
Péter Király
peter.kiraly@gwdg.de
Gesellschaft für wissenschaftliche
Datenverarbeitung mbH Göttingen, Germany
Digital Humanities 2017 (Montréal, Canada)
9th August, 2017
Measuring completeness. Glossary
bit.ly/mq-dh2017 - 2
★ Metadata (here): cultural heritage metadata (descriptions of books etc.)
★ Europeana: a metadata aggregator with 53M metadata records from 3500+ cultural heritage institutions, http://europeana.eu
★ Big Data (here): 10-100 million metadata records, 100 GB - 1.5 TB
★ EDM: Europeana Data Model, Europeana’s metadata schema
Measuring completeness. The problem
Measuring completeness. Generic title and bad thumbnail
affects search and identification
Measuring completeness. Non-normalized institution names
difference in whitespace (“\n ”)
★ The Royal Library: The National Library of Denmark and Copenhagen University Library (40,680)
★ The Royal Library: The National Library of Denmark and Copenhagen University Library (20,688)
difference in name (translations, extra attributes)
★ National Library of the Netherlands (1,291,139)
★ National Library of the Netherlands - Koninklijke Bibliotheek (554,068)
★ Bodleian Libraries, University of Oxford (354,441)
★ Bodleian Libraries, Oxford University (3,243)
★ Cinecittà Luce S.p.A. (372,412)
★ Cinecittà Luce (2,405)
★ LUCE (105)
affects “filter by institution” function & web widget
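The whitespace case above can be illustrated with a minimal Python sketch. It assumes that the two Royal Library facet values differ only by an embedded line break; which of the two entries carries the line break is a guess made for illustration, and `normalize_name` is a hypothetical helper, not part of the framework described in these slides.

```python
import re
from collections import Counter


def normalize_name(name: str) -> str:
    """Collapse every run of whitespace (spaces, tabs, line breaks) into one space."""
    return re.sub(r"\s+", " ", name).strip()


# Facet values with record counts taken from the slide.
# Assumption: the first value contains a line break, the second does not.
facet = [
    ("The Royal Library: The National Library of Denmark\n and Copenhagen University Library", 40680),
    ("The Royal Library: The National Library of Denmark and Copenhagen University Library", 20688),
]

# Sum the counts of all values that share one normalized form.
merged = Counter()
for name, count in facet:
    merged[normalize_name(name)] += count

for name, count in merged.items():
    print(f"{name} ({count:,})")
```

Whitespace normalization only merges the first group. Names that differ by translation or extra attributes (the Netherlands, Bodleian and Cinecittà Luce examples) would need a curated mapping or an authority file instead.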
Measuring completeness. Non-normalized values in “year” facet
good
★ 1666
★ 1914
bad
★ -1988
★ 13436
★ 97500000
★ 20140409
★ 1146345933
affects “filter by year” function
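A minimal Python sketch of how such facet values could be flagged. The accepted range (years 1 to 2017) and the helper name `is_plausible_year` are assumptions made for illustration; a collection with BCE dates would need a different lower bound.

```python
def is_plausible_year(value: str, min_year: int = 1, max_year: int = 2017) -> bool:
    """Return True if value parses as an integer within [min_year, max_year].

    The default bounds are illustrative assumptions, not Europeana rules.
    """
    try:
        year = int(value)
    except ValueError:
        return False
    return min_year <= year <= max_year


# The "good" and "bad" facet values listed on the slide.
facet_values = ["1666", "1914", "-1988", "13436", "97500000", "20140409", "1146345933"]

suspicious = [v for v in facet_values if not is_plausible_year(v)]
print(suspicious)
```

Values such as 20140409 look like a compact date (YYYYMMDD), so a follow-up step could try to parse them rather than only rejecting them.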
Measuring completeness. Multilinguality problem
★ Mona Lisa → 456 results
★ La Gioconda → 365 results
★ La Joconde → 71 results
http://www.europeana.eu/portal/en/record/90402/RP_F_00_351.html
affects search function
Measuring completeness. Empty fields
no useful information
more examples in: Report and Recommendations from the Task Force on Metadata Quality (2015)
Measuring completeness. The question
How can we determine which records should be improved
and which are good enough?
we would like to have metrics like this:
support of functional requirements
good
acceptable
bad
Measuring completeness. Why is data quality important?
“Fitness for purpose” (QA principle)
purpose: to access content
no metadata → no access to data → no data usage
more explanation:
Data on the Web Best Practices
W3C Working Draft, https://www.w3.org/TR/dwbp/
Measuring completeness. Hypothesis
by measuring structural elements we can approximate metadata record quality
≃ metadata smell
Measuring completeness. Purposes
★ improve the metadata
★ services: good data → reliable functions
★ better metadata schema & documentation
★ propagate “good practice”
Measuring completeness. Proposal I. - an organization
Europeana Data Quality Committee
★ analyzing/revising metadata schema
★ functional requirement analysis
○ defining “enabling” elements
★ problem catalog
★ multilinguality
Measuring completeness. Proposal II. - a tool proposal
“Metadata Quality Assurance Framework”
a generic tool for measuring metadata quality
★ adaptable to different metadata schemes
★ scalable (to Big Data)
★ understandable reports for data curators
★ open source
Measuring completeness. What to measure?
★ Structure and semantics
Completeness, cardinality, uniqueness, length, dictionary entry, data type conformance, multilinguality (see [bibliography])
★ Functional requirements
Requirements of the most important functions, discovery scenarios
★ Problem catalog
Known metadata problems
Measuring completeness. Completeness categories
★ simple completeness
ratio of filled fields
★ cardinality of fields
which fields are filled and how intensively
★ functionalities
field groups supporting functions
○ mandatory elements
○ descriptiveness
○ searchability
○ contextualization
○ identification
○ browsing
○ …
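The three categories can be sketched in Python as follows. This is only an illustration under stated assumptions: `SCHEMA_FIELDS` is a small hypothetical subset of fields, the `FUNCTION_GROUPS` assignment is invented for the example, and none of the function names come from the Metadata Quality Assurance Framework itself.

```python
from typing import Dict, List

Record = Dict[str, List[str]]

# Hypothetical subset of schema fields, chosen only for this example.
SCHEMA_FIELDS = ["dc:title", "dc:description", "dc:subject", "dc:creator", "dct:spatial"]

# Hypothetical field groups supporting two of the functions named on the slide.
FUNCTION_GROUPS = {
    "searchability": ["dc:title", "dc:description", "dc:subject"],
    "identification": ["dc:title", "dc:creator"],
}


def is_filled(record: Record, field: str) -> bool:
    """A field counts as filled if it has at least one non-blank value."""
    return any(value.strip() for value in record.get(field, []))


def simple_completeness(record: Record) -> float:
    """Ratio of filled fields to all fields of the schema."""
    return sum(is_filled(record, f) for f in SCHEMA_FIELDS) / len(SCHEMA_FIELDS)


def cardinality(record: Record) -> Dict[str, int]:
    """Number of non-blank values per field (how intensively a field is used)."""
    return {f: sum(1 for v in record.get(f, []) if v.strip()) for f in SCHEMA_FIELDS}


def functional_completeness(record: Record) -> Dict[str, float]:
    """Per function: ratio of filled fields within the group supporting it."""
    return {
        name: sum(is_filled(record, f) for f in fields) / len(fields)
        for name, fields in FUNCTION_GROUPS.items()
    }


record = {
    "dc:title": ["Mona Lisa"],
    "dc:subject": ["portrait", "painting"],
    "dc:description": [""],  # present but empty, so not counted as filled
}

print(simple_completeness(record))
print(cardinality(record))
print(functional_completeness(record))
```

Treating blank strings as unfilled ties this back to the empty-fields problem: a field that exists but carries no information should not raise the score.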
Measuring completeness. Measurement levels
levels: overall view, collection view, record view
measured: Completeness, Field cardinality, Uniqueness, Multilinguality, Language specification, Problem catalog, etc.
[diagram labels: links, measurements, aggregated statistics, metrics]
Measuring completeness. Completeness score calculation
[diagram: weighted cardinality and weighted functionality each lead to a completeness score; the two calculations are labeled Method I and Method II]
Pearson’s correlation coefficient is 0.52
weight: 2.5 × score
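For readers unfamiliar with the coefficient, here is a minimal Python sketch of how Pearson’s correlation between two lists of scores can be computed. The sample scores are invented for illustration and are not Europeana data; the slide reports only the resulting coefficient (0.52), not the underlying values.

```python
from math import sqrt
from typing import Sequence


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson's correlation coefficient of two equally long sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sums of products of deviations; the common 1/n factors cancel out.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)


# Invented per-record scores from two scoring methods (illustration only).
cardinality_scores = [0.20, 0.35, 0.30, 0.50]
functionality_scores = [0.40, 0.55, 0.70, 0.65]

r = pearson(cardinality_scores, functionality_scores)
print(round(r, 2))
```

A value near 1 would mean the two methods rank records almost identically; a moderate value such as the reported 0.52 suggests they capture related but not interchangeable aspects of completeness.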
Measuring completeness. Completeness score distribution
Distribution of completeness scores in one dataset.
functionality-based method
★ higher scores
★ more variance
cardinality-based method
★ lower scores
★ less variance
combined method
★ closer to the functionality-based method
Measuring completeness. Results
★ many records lack semantic enrichments (contextual entities)
○ 6% have agent, 28% place, 32% timespan, 40% concept entities
○ only a couple of data providers have 100% coverage
★ only mandatory elements appear in each record
★ there are unused fields
★ there are overused fields
○ suggestion: generic fields → specific fields
○ dc:description → dc:subject, dct:alternative, dct:tableOfContents
Measuring completeness. Visualization
Measuring completeness. Technical background
processing workflow: ingest → measure → statistical analysis → web interface
★ ingest: OAI-PMH, Europeana API, Hadoop, NoSQL
★ measure: Spark, Hadoop, Java, Apache Solr
★ statistical analysis: Spark, Scala, R
★ web interface: PHP, D3.js, highchart.js, NoSQL
data formats along the workflow: json, csv, html, svg, jpg
Measuring completeness. Further steps
(two columns on the slide: human, technical)
★ scores into recommendations
★ communication
★ expert evaluation
★ cooperation with other projects
★ ingestion process
★ W3C recommendations
○ Shapes Constraint Language
○ Data Quality Vocabulary
★ is usage in line with scores?
★ do scores change?
★ machine learning-based classification & clustering
Measuring completeness. Credits and links
This research is conducted in close collaboration with the Europeana Data
Quality Committee, thanks to all the members! Special thanks to Marco
Büchler & the eTRAP team and to the GWDG HPC experts!
★ Europeana Data Quality Committee // http://pro.europeana.eu/europeana-tech/data-quality-committee
★ demo site // http://144.76.218.178/europeana-qa/
★ source code (GPL v3.0) // http://pkiraly.github.io/about/#source-codes
★ Europeana data (CC0) // http://hdl.handle.net/21.11101/0000-0001-781F-7
★ [bibliography] // http://zotero.org/groups/metadata_assessment
★ contact // peter.kiraly@gwdg.de, @kiru
★ slides // http://bit.ly/mq-dh2017

Mais conteúdo relacionado

Semelhante a Measuring completeness as metadata quality metric in Europeana (DH 2017)

polystore_NYC_inrae_sysinfo2021-1.pdf
polystore_NYC_inrae_sysinfo2021-1.pdfpolystore_NYC_inrae_sysinfo2021-1.pdf
polystore_NYC_inrae_sysinfo2021-1.pdfRim Moussa
 
Data Science & Big Data - Theory.pdf
Data Science & Big Data - Theory.pdfData Science & Big Data - Theory.pdf
Data Science & Big Data - Theory.pdfRAKESHG79
 
Let your data shine... with OpenRefine
Let your data shine... with OpenRefineLet your data shine... with OpenRefine
Let your data shine... with OpenRefineOpen Knowledge Belgium
 
How Data Science can help energy companies map their infrastructure
How Data Science can help energy companies map their infrastructureHow Data Science can help energy companies map their infrastructure
How Data Science can help energy companies map their infrastructureAlex Combessie
 
DataGraft: Data-as-a-Service for Open Data
DataGraft: Data-as-a-Service for Open DataDataGraft: Data-as-a-Service for Open Data
DataGraft: Data-as-a-Service for Open Datadapaasproject
 
Measuring Completeness as Metadata Quality Metric in Europeana (CAS 2018)
Measuring Completeness as Metadata Quality Metric in Europeana (CAS 2018)Measuring Completeness as Metadata Quality Metric in Europeana (CAS 2018)
Measuring Completeness as Metadata Quality Metric in Europeana (CAS 2018)Péter Király
 
Tracking research data footprints - slides
Tracking research data footprints - slidesTracking research data footprints - slides
Tracking research data footprints - slidesARDC
 
How links can make your open data even greater
How links can make your open data even greaterHow links can make your open data even greater
How links can make your open data even greaterCristina Sarasua
 
Better Hackathon 2020 - Fraunhofer IAIS - Semantic geo-clustering with SANSA
Better Hackathon 2020 - Fraunhofer IAIS - Semantic geo-clustering with SANSABetter Hackathon 2020 - Fraunhofer IAIS - Semantic geo-clustering with SANSA
Better Hackathon 2020 - Fraunhofer IAIS - Semantic geo-clustering with SANSAPRBETTER
 
Big data analysis and modelling
Big data analysis and modellingBig data analysis and modelling
Big data analysis and modellingkeivan mahdavi
 
Produktdatenmanagement mit Neo4j
Produktdatenmanagement mit Neo4jProduktdatenmanagement mit Neo4j
Produktdatenmanagement mit Neo4jNeo4j
 
CLARIAH Toogdag 2018: A distributed network of digital heritage information
CLARIAH Toogdag 2018: A distributed network of digital heritage informationCLARIAH Toogdag 2018: A distributed network of digital heritage information
CLARIAH Toogdag 2018: A distributed network of digital heritage informationEnno Meijers
 
Open Chemistry, JupyterLab and data: Reproducible quantum chemistry
Open Chemistry, JupyterLab and data: Reproducible quantum chemistryOpen Chemistry, JupyterLab and data: Reproducible quantum chemistry
Open Chemistry, JupyterLab and data: Reproducible quantum chemistryMarcus Hanwell
 
Unlocking the value : metadata and linked data at the British Library / Alan ...
Unlocking the value : metadata and linked data at the British Library / Alan ...Unlocking the value : metadata and linked data at the British Library / Alan ...
Unlocking the value : metadata and linked data at the British Library / Alan ...CIGScotland
 
How to Leverage Big Data to Deliver Smart Logistics
How to Leverage Big Data to Deliver Smart LogisticsHow to Leverage Big Data to Deliver Smart Logistics
How to Leverage Big Data to Deliver Smart LogisticsAlibaba Cloud
 

Semelhante a Measuring completeness as metadata quality metric in Europeana (DH 2017) (20)

Bicod2017
Bicod2017Bicod2017
Bicod2017
 
BICOD-2017
BICOD-2017BICOD-2017
BICOD-2017
 
polystore_NYC_inrae_sysinfo2021-1.pdf
polystore_NYC_inrae_sysinfo2021-1.pdfpolystore_NYC_inrae_sysinfo2021-1.pdf
polystore_NYC_inrae_sysinfo2021-1.pdf
 
Data Science & Big Data - Theory.pdf
Data Science & Big Data - Theory.pdfData Science & Big Data - Theory.pdf
Data Science & Big Data - Theory.pdf
 
Sebastian Hellmann
Sebastian HellmannSebastian Hellmann
Sebastian Hellmann
 
Let your data shine... with OpenRefine
Let your data shine... with OpenRefineLet your data shine... with OpenRefine
Let your data shine... with OpenRefine
 
How Data Science can help energy companies map their infrastructure
How Data Science can help energy companies map their infrastructureHow Data Science can help energy companies map their infrastructure
How Data Science can help energy companies map their infrastructure
 
DataGraft: Data-as-a-Service for Open Data
DataGraft: Data-as-a-Service for Open DataDataGraft: Data-as-a-Service for Open Data
DataGraft: Data-as-a-Service for Open Data
 
Measuring Completeness as Metadata Quality Metric in Europeana (CAS 2018)
Measuring Completeness as Metadata Quality Metric in Europeana (CAS 2018)Measuring Completeness as Metadata Quality Metric in Europeana (CAS 2018)
Measuring Completeness as Metadata Quality Metric in Europeana (CAS 2018)
 
Tracking research data footprints - slides
Tracking research data footprints - slidesTracking research data footprints - slides
Tracking research data footprints - slides
 
How links can make your open data even greater
How links can make your open data even greaterHow links can make your open data even greater
How links can make your open data even greater
 
Better Hackathon 2020 - Fraunhofer IAIS - Semantic geo-clustering with SANSA
Better Hackathon 2020 - Fraunhofer IAIS - Semantic geo-clustering with SANSABetter Hackathon 2020 - Fraunhofer IAIS - Semantic geo-clustering with SANSA
Better Hackathon 2020 - Fraunhofer IAIS - Semantic geo-clustering with SANSA
 
Alexia Meyermann: Building a research infrastructure for educational studies ...
Alexia Meyermann: Building a research infrastructure for educational studies ...Alexia Meyermann: Building a research infrastructure for educational studies ...
Alexia Meyermann: Building a research infrastructure for educational studies ...
 
Big data analysis and modelling
Big data analysis and modellingBig data analysis and modelling
Big data analysis and modelling
 
Produktdatenmanagement mit Neo4j
Produktdatenmanagement mit Neo4jProduktdatenmanagement mit Neo4j
Produktdatenmanagement mit Neo4j
 
CLARIAH Toogdag 2018: A distributed network of digital heritage information
CLARIAH Toogdag 2018: A distributed network of digital heritage informationCLARIAH Toogdag 2018: A distributed network of digital heritage information
CLARIAH Toogdag 2018: A distributed network of digital heritage information
 
Open Chemistry, JupyterLab and data: Reproducible quantum chemistry
Open Chemistry, JupyterLab and data: Reproducible quantum chemistryOpen Chemistry, JupyterLab and data: Reproducible quantum chemistry
Open Chemistry, JupyterLab and data: Reproducible quantum chemistry
 
A Benchmark of PDF Information Extraction Tools using a Multi-Task and Multi-...
A Benchmark of PDF Information Extraction Tools using a Multi-Task and Multi-...A Benchmark of PDF Information Extraction Tools using a Multi-Task and Multi-...
A Benchmark of PDF Information Extraction Tools using a Multi-Task and Multi-...
 
Unlocking the value : metadata and linked data at the British Library / Alan ...
Unlocking the value : metadata and linked data at the British Library / Alan ...Unlocking the value : metadata and linked data at the British Library / Alan ...
Unlocking the value : metadata and linked data at the British Library / Alan ...
 
How to Leverage Big Data to Deliver Smart Logistics
How to Leverage Big Data to Deliver Smart LogisticsHow to Leverage Big Data to Deliver Smart Logistics
How to Leverage Big Data to Deliver Smart Logistics
 

Mais de Péter Király

Requirements of DARIAH community for a Dataverse repository (SSHOC 2020)
Requirements of DARIAH community for a Dataverse repository (SSHOC 2020)Requirements of DARIAH community for a Dataverse repository (SSHOC 2020)
Requirements of DARIAH community for a Dataverse repository (SSHOC 2020)Péter Király
 
Validating 126 million MARC records (DATeCH 2019)
Validating 126 million MARC records (DATeCH 2019)Validating 126 million MARC records (DATeCH 2019)
Validating 126 million MARC records (DATeCH 2019)Péter Király
 
Measuring Metadata Quality (doctoral defense 2019)
Measuring Metadata Quality (doctoral defense 2019)Measuring Metadata Quality (doctoral defense 2019)
Measuring Metadata Quality (doctoral defense 2019)Péter Király
 
Empirical evaluation of library catalogues (SWIB 2019)
Empirical evaluation of library catalogues (SWIB 2019)Empirical evaluation of library catalogues (SWIB 2019)
Empirical evaluation of library catalogues (SWIB 2019)Péter Király
 
GRO.data - Dataverse in Göttingen (Dataverse Europe 2020)
GRO.data - Dataverse in Göttingen (Dataverse Europe 2020)GRO.data - Dataverse in Göttingen (Dataverse Europe 2020)
GRO.data - Dataverse in Göttingen (Dataverse Europe 2020)Péter Király
 
Data element constraints for DDB (DDB 2021)
Data element constraints for DDB (DDB 2021)Data element constraints for DDB (DDB 2021)
Data element constraints for DDB (DDB 2021)Péter Király
 
Incubating Göttingen Cultural Analytics Alliance (SUB 2021)
Incubating Göttingen Cultural Analytics Alliance (SUB 2021)Incubating Göttingen Cultural Analytics Alliance (SUB 2021)
Incubating Göttingen Cultural Analytics Alliance (SUB 2021)Péter Király
 
Continuous quality assessment for MARC21 catalogues (MINI ELAG 2021)
Continuous quality assessment for MARC21 catalogues (MINI ELAG 2021)Continuous quality assessment for MARC21 catalogues (MINI ELAG 2021)
Continuous quality assessment for MARC21 catalogues (MINI ELAG 2021)Péter Király
 
Magyar irodalom idegen nyelven (BTK ITI 2021)
Magyar irodalom idegen nyelven (BTK ITI 2021)Magyar irodalom idegen nyelven (BTK ITI 2021)
Magyar irodalom idegen nyelven (BTK ITI 2021)Péter Király
 
Validating JSON, XML and CSV data with SHACL-like constraints (DINI-KIM 2022)
Validating JSON, XML and CSV data with SHACL-like constraints (DINI-KIM 2022)Validating JSON, XML and CSV data with SHACL-like constraints (DINI-KIM 2022)
Validating JSON, XML and CSV data with SHACL-like constraints (DINI-KIM 2022)Péter Király
 
FRBR a book history perspective (Bibliodata WG 2022)
FRBR a book history perspective (Bibliodata WG 2022)FRBR a book history perspective (Bibliodata WG 2022)
FRBR a book history perspective (Bibliodata WG 2022)Péter Király
 
GRO.data - Dataverse in Göttingen (Magdeburg Coffee Lecture, 2022)
GRO.data - Dataverse in Göttingen (Magdeburg Coffee Lecture, 2022)GRO.data - Dataverse in Göttingen (Magdeburg Coffee Lecture, 2022)
GRO.data - Dataverse in Göttingen (Magdeburg Coffee Lecture, 2022)Péter Király
 
Understanding, extracting and enhancing catalogue data (CE Book history works...
Understanding, extracting and enhancing catalogue data (CE Book history works...Understanding, extracting and enhancing catalogue data (CE Book history works...
Understanding, extracting and enhancing catalogue data (CE Book history works...Péter Király
 
Measuring library catalogs (ADOCHS 2017)
Measuring library catalogs (ADOCHS 2017)Measuring library catalogs (ADOCHS 2017)
Measuring library catalogs (ADOCHS 2017)Péter Király
 
Evaluating Data Quality in Europeana: Metrics for Multilinguality (MTSR 2018)
Evaluating Data Quality in Europeana: Metrics for Multilinguality (MTSR 2018)Evaluating Data Quality in Europeana: Metrics for Multilinguality (MTSR 2018)
Evaluating Data Quality in Europeana: Metrics for Multilinguality (MTSR 2018)Péter Király
 
Researching metadata quality (ORKG 2018)
Researching metadata quality (ORKG 2018)Researching metadata quality (ORKG 2018)
Researching metadata quality (ORKG 2018)Péter Király
 
Metadata quality in cultural heritage institutions (ReIRes-FAIR 2018)
Metadata quality in cultural heritage institutions (ReIRes-FAIR 2018)Metadata quality in cultural heritage institutions (ReIRes-FAIR 2018)
Metadata quality in cultural heritage institutions (ReIRes-FAIR 2018)Péter Király
 
Measuring MARC (ELAG 2018)
Measuring MARC (ELAG 2018)Measuring MARC (ELAG 2018)
Measuring MARC (ELAG 2018)Péter Király
 
SHACL shortly (ELAG 2018)
SHACL shortly (ELAG 2018)SHACL shortly (ELAG 2018)
SHACL shortly (ELAG 2018)Péter Király
 
Measuring Metadata Quality (ELAG, 2018)
Measuring Metadata Quality (ELAG, 2018)Measuring Metadata Quality (ELAG, 2018)
Measuring Metadata Quality (ELAG, 2018)Péter Király
 

Mais de Péter Király (20)

Requirements of DARIAH community for a Dataverse repository (SSHOC 2020)
Requirements of DARIAH community for a Dataverse repository (SSHOC 2020)Requirements of DARIAH community for a Dataverse repository (SSHOC 2020)
Requirements of DARIAH community for a Dataverse repository (SSHOC 2020)
 
Validating 126 million MARC records (DATeCH 2019)
Validating 126 million MARC records (DATeCH 2019)Validating 126 million MARC records (DATeCH 2019)
Validating 126 million MARC records (DATeCH 2019)
 
Measuring Metadata Quality (doctoral defense 2019)
Measuring Metadata Quality (doctoral defense 2019)Measuring Metadata Quality (doctoral defense 2019)
Measuring Metadata Quality (doctoral defense 2019)
 
Empirical evaluation of library catalogues (SWIB 2019)
Empirical evaluation of library catalogues (SWIB 2019)Empirical evaluation of library catalogues (SWIB 2019)
Empirical evaluation of library catalogues (SWIB 2019)
 
GRO.data - Dataverse in Göttingen (Dataverse Europe 2020)
GRO.data - Dataverse in Göttingen (Dataverse Europe 2020)GRO.data - Dataverse in Göttingen (Dataverse Europe 2020)
GRO.data - Dataverse in Göttingen (Dataverse Europe 2020)
 
Data element constraints for DDB (DDB 2021)
Data element constraints for DDB (DDB 2021)Data element constraints for DDB (DDB 2021)
Data element constraints for DDB (DDB 2021)
 
Incubating Göttingen Cultural Analytics Alliance (SUB 2021)
Incubating Göttingen Cultural Analytics Alliance (SUB 2021)Incubating Göttingen Cultural Analytics Alliance (SUB 2021)
Incubating Göttingen Cultural Analytics Alliance (SUB 2021)
 
Continuous quality assessment for MARC21 catalogues (MINI ELAG 2021)
Continuous quality assessment for MARC21 catalogues (MINI ELAG 2021)Continuous quality assessment for MARC21 catalogues (MINI ELAG 2021)
Continuous quality assessment for MARC21 catalogues (MINI ELAG 2021)
 
Magyar irodalom idegen nyelven (BTK ITI 2021)
Magyar irodalom idegen nyelven (BTK ITI 2021)Magyar irodalom idegen nyelven (BTK ITI 2021)
Magyar irodalom idegen nyelven (BTK ITI 2021)
 
Validating JSON, XML and CSV data with SHACL-like constraints (DINI-KIM 2022)
Validating JSON, XML and CSV data with SHACL-like constraints (DINI-KIM 2022)Validating JSON, XML and CSV data with SHACL-like constraints (DINI-KIM 2022)
Validating JSON, XML and CSV data with SHACL-like constraints (DINI-KIM 2022)
 
FRBR a book history perspective (Bibliodata WG 2022)
FRBR a book history perspective (Bibliodata WG 2022)FRBR a book history perspective (Bibliodata WG 2022)
FRBR a book history perspective (Bibliodata WG 2022)
 
GRO.data - Dataverse in Göttingen (Magdeburg Coffee Lecture, 2022)
GRO.data - Dataverse in Göttingen (Magdeburg Coffee Lecture, 2022)GRO.data - Dataverse in Göttingen (Magdeburg Coffee Lecture, 2022)
GRO.data - Dataverse in Göttingen (Magdeburg Coffee Lecture, 2022)
 
Understanding, extracting and enhancing catalogue data (CE Book history works...
Understanding, extracting and enhancing catalogue data (CE Book history works...Understanding, extracting and enhancing catalogue data (CE Book history works...
Understanding, extracting and enhancing catalogue data (CE Book history works...
 
Measuring library catalogs (ADOCHS 2017)
Measuring library catalogs (ADOCHS 2017)Measuring library catalogs (ADOCHS 2017)
Measuring library catalogs (ADOCHS 2017)
 
Evaluating Data Quality in Europeana: Metrics for Multilinguality (MTSR 2018)
Evaluating Data Quality in Europeana: Metrics for Multilinguality (MTSR 2018)Evaluating Data Quality in Europeana: Metrics for Multilinguality (MTSR 2018)
Evaluating Data Quality in Europeana: Metrics for Multilinguality (MTSR 2018)
 
Researching metadata quality (ORKG 2018)
Researching metadata quality (ORKG 2018)Researching metadata quality (ORKG 2018)
Researching metadata quality (ORKG 2018)
 
Metadata quality in cultural heritage institutions (ReIRes-FAIR 2018)
Metadata quality in cultural heritage institutions (ReIRes-FAIR 2018)Metadata quality in cultural heritage institutions (ReIRes-FAIR 2018)
Metadata quality in cultural heritage institutions (ReIRes-FAIR 2018)
 
Measuring MARC (ELAG 2018)
Measuring MARC (ELAG 2018)Measuring MARC (ELAG 2018)
Measuring MARC (ELAG 2018)
 
SHACL shortly (ELAG 2018)
SHACL shortly (ELAG 2018)SHACL shortly (ELAG 2018)
SHACL shortly (ELAG 2018)
 
Measuring Metadata Quality (ELAG, 2018)
Measuring Metadata Quality (ELAG, 2018)Measuring Metadata Quality (ELAG, 2018)
Measuring Metadata Quality (ELAG, 2018)
 

Último

Call Girls In Attibele ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Attibele ☎ 7737669865 🥵 Book Your One night StandCall Girls In Attibele ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Attibele ☎ 7737669865 🥵 Book Your One night Standamitlee9823
 
Vip Mumbai Call Girls Thane West Call On 9920725232 With Body to body massage...
Vip Mumbai Call Girls Thane West Call On 9920725232 With Body to body massage...Vip Mumbai Call Girls Thane West Call On 9920725232 With Body to body massage...
Vip Mumbai Call Girls Thane West Call On 9920725232 With Body to body massage...amitlee9823
 
Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...
Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...
Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...amitlee9823
 
Call Girls In Bellandur ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Bellandur ☎ 7737669865 🥵 Book Your One night StandCall Girls In Bellandur ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Bellandur ☎ 7737669865 🥵 Book Your One night Standamitlee9823
 
Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...
Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...
Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...Delhi Call girls
 
April 2024 - Crypto Market Report's Analysis
April 2024 - Crypto Market Report's AnalysisApril 2024 - Crypto Market Report's Analysis
April 2024 - Crypto Market Report's Analysismanisha194592
 
Accredited-Transport-Cooperatives-Jan-2021-Web.pdf
Accredited-Transport-Cooperatives-Jan-2021-Web.pdfAccredited-Transport-Cooperatives-Jan-2021-Web.pdf
Accredited-Transport-Cooperatives-Jan-2021-Web.pdfadriantubila
 
Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...
Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...
Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...amitlee9823
 
Vip Mumbai Call Girls Marol Naka Call On 9920725232 With Body to body massage...
Vip Mumbai Call Girls Marol Naka Call On 9920725232 With Body to body massage...Vip Mumbai Call Girls Marol Naka Call On 9920725232 With Body to body massage...
Vip Mumbai Call Girls Marol Naka Call On 9920725232 With Body to body massage...amitlee9823
 
Halmar dropshipping via API with DroFx
Halmar  dropshipping  via API with DroFxHalmar  dropshipping  via API with DroFx
Halmar dropshipping via API with DroFxolyaivanovalion
 
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al Barsha
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al BarshaAl Barsha Escorts $#$ O565212860 $#$ Escort Service In Al Barsha
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al BarshaAroojKhan71
 
Mature dropshipping via API with DroFx.pptx
Mature dropshipping via API with DroFx.pptxMature dropshipping via API with DroFx.pptx
Mature dropshipping via API with DroFx.pptxolyaivanovalion
 
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...amitlee9823
 
Invezz.com - Grow your wealth with trading signals
Invezz.com - Grow your wealth with trading signalsInvezz.com - Grow your wealth with trading signals
Invezz.com - Grow your wealth with trading signalsInvezz1
 
FESE Capital Markets Fact Sheet 2024 Q1.pdf
FESE Capital Markets Fact Sheet 2024 Q1.pdfFESE Capital Markets Fact Sheet 2024 Q1.pdf
FESE Capital Markets Fact Sheet 2024 Q1.pdfMarinCaroMartnezBerg
 
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...Valters Lauzums
 
BigBuy dropshipping via API with DroFx.pptx
BigBuy dropshipping via API with DroFx.pptxBigBuy dropshipping via API with DroFx.pptx
BigBuy dropshipping via API with DroFx.pptxolyaivanovalion
 

Último (20)

Call Girls In Attibele ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Attibele ☎ 7737669865 🥵 Book Your One night StandCall Girls In Attibele ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Attibele ☎ 7737669865 🥵 Book Your One night Stand
 
Vip Mumbai Call Girls Thane West Call On 9920725232 With Body to body massage...
Vip Mumbai Call Girls Thane West Call On 9920725232 With Body to body massage...Vip Mumbai Call Girls Thane West Call On 9920725232 With Body to body massage...
Vip Mumbai Call Girls Thane West Call On 9920725232 With Body to body massage...
 
Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...
Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...
Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...
 
Call Girls In Bellandur ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Bellandur ☎ 7737669865 🥵 Book Your One night StandCall Girls In Bellandur ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Bellandur ☎ 7737669865 🥵 Book Your One night Stand
 
Anomaly detection and data imputation within time series
Anomaly detection and data imputation within time seriesAnomaly detection and data imputation within time series
Anomaly detection and data imputation within time series
 
Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...
Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...
Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...
 

Measuring completeness as metadata quality metric in Europeana (DH 2017)

  • 1. Measuring completeness as metadata quality metric in Europeana
       Péter Király, peter.kiraly@gwdg.de
       Gesellschaft für wissenschaftliche Datenverarbeitung mbH Göttingen, Germany
       Digital Humanities 2017 (Montréal, Canada), 9th August, 2017
  • 2. Measuring completeness. Glossary (bit.ly/mq-dh2017)
       ★ Metadata: here, cultural heritage metadata (descriptions of books etc.)
       ★ Europeana: a metadata aggregator from 3500+ cultural heritage institutions with 53M metadata records, http://europeana.eu
       ★ Big Data: here, 10-100 million metadata records, 100 GB - 1.5 TB
       ★ EDM: Europeana Data Model, Europeana’s metadata schema
  • 4. Generic title and bad thumbnail: affects search and identification
  • 5. Non-normalized institution names: affects “filter by institution” function & web widget
       difference in whitespaces (“n “):
       ★ The Royal Library: The National Library of Denmark and Copenhagen University Library (40,680)
       ★ The Royal Library: The National Library of Denmark and Copenhagen University Library (20,688)
       difference in name (translations, extra attributes):
       ★ National Library of the Netherlands (1,291,139)
       ★ National Library of the Netherlands - Koninklijke Bibliotheek (554,068)
       ★ Bodleian Libraries, University of Oxford (354,441)
       ★ Bodleian Libraries, Oxford University (3,243)
       ★ Cinecittà Luce S.p.A. (372,412)
       ★ Cinecittà Luce (2,405)
       ★ LUCE (105)
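The whitespace case on this slide can be handled mechanically. The following is a minimal sketch, not the presenter's code: it assumes the two Royal Library labels differ only by a line break plus a space (the exact position of the stray whitespace is invented here for illustration), and `normalize_name` and `merge_facet_counts` are hypothetical helper names.

```python
import re
from collections import Counter


def normalize_name(name: str) -> str:
    """Collapse every run of whitespace (spaces, tabs, newlines) into one space."""
    return re.sub(r"\s+", " ", name).strip()


def merge_facet_counts(counts):
    """Sum facet counts whose labels are equal after whitespace normalization."""
    merged = Counter()
    for label, count in counts:
        merged[normalize_name(label)] += count
    return dict(merged)


# Counts are from the slide; the location of the extra "\n " is an assumption.
facet = [
    ("The Royal Library: The National Library of Denmark and Copenhagen University Library", 40680),
    ("The Royal Library: The National Library of Denmark\n and Copenhagen University Library", 20688),
]
merged = merge_facet_counts(facet)
print(merged)
```

Name variants such as translations or extra attributes ("Cinecittà Luce" vs. "LUCE") cannot be merged this way; they would need a curated mapping.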
  • 6. Non-normalized values in “year” facet: affects “filter by year” function
       good:
       ★ 1666
       ★ 1914
       bad:
       ★ -1988
       ★ 13436
       ★ 97500000
       ★ 20140409
       ★ 1146345933
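A simple range check is enough to separate the good and bad values listed on this slide. This is a sketch under stated assumptions: the bounds `min_year=1` and `max_year=2017` are chosen only so that the slide's examples fall on the intended side, and `is_plausible_year` is a hypothetical helper, not part of the tool described in the talk.

```python
def is_plausible_year(value: str, min_year: int = 1, max_year: int = 2017) -> bool:
    """Return True if value parses as an integer year within the assumed bounds."""
    try:
        year = int(value.strip())
    except ValueError:
        # Not an integer at all (e.g. free text), so not a usable facet value.
        return False
    return min_year <= year <= max_year


good = ["1666", "1914"]
bad = ["-1988", "13436", "97500000", "20140409", "1146345933"]

flagged = [v for v in good + bad if not is_plausible_year(v)]
print(flagged)
```

A real collection with BCE objects would need a wider lower bound, and values like 20140409 or 1146345933 look like a compact date and a Unix timestamp that could be converted rather than rejected.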
  • 7. Multilinguality problem: affects search function
       ★ Mona Lisa → 456 results
       ★ La Gioconda → 365 results
       ★ La Joconde → 71 results
       http://www.europeana.eu/portal/en/record/90402/RP_F_00_351.html
  • 8. Empty fields: no useful information
       more examples in Report and Recommendations from the Task Force on Metadata Quality (2015)
  • 9. The question: How can we determine which records should be improved and which are good enough?
       we would like to have metrics like this: support of functional requirements: good / acceptable / bad
  • 10. Why is data quality important?
       “Fitness for purpose” (QA principle)
       purpose: to access content
       no metadata → no access to data → no data usage
       more explanation: Data on the Web Best Practices, W3C Working Draft, https://www.w3.org/TR/dwbp/
  • 11. Hypothesis: by measuring structural elements we can approximate metadata record quality ≃ metadata smell
  • 12. Purposes
       ★ improve the metadata
       ★ services: good data → reliable functions
       ★ better metadata schema & documentation
       ★ propagate “good practice”
  • 13. Proposal I. - an organization: Europeana Data Quality Committee
       ★ analyzing/revising metadata schema
       ★ functional requirement analysis
         ○ defining “enabling” elements
       ★ problem catalog
       ★ multilinguality
  • 14. Proposal II. - a tool proposal: “Metadata Quality Assurance Framework”, a generic tool for measuring metadata quality
       ★ adaptable to different metadata schemes
       ★ scalable (to Big Data)
       ★ understandable reports for data curators
       ★ open source
  • 15. What to measure?
       ★ Structure and semantics: completeness, cardinality, uniqueness, length, dictionary entry, data type conformance, multilinguality (see [bibliography])
       ★ Functional requirements: requirements of the most important functions, discovery scenarios
       ★ Problem catalog: known metadata problems
  • 16. Completeness categories
       ★ simple completeness: ratio of filled fields
       ★ cardinality of fields: which fields are filled and how intensively
       ★ functionalities: field groups supporting functions
         ○ mandatory elements
         ○ descriptiveness
         ○ searchability
         ○ contextualization
         ○ identification
         ○ browsing
         ○ …
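The three categories on this slide can be illustrated on a single record. This is a minimal sketch, not the Metadata Quality Assurance Framework itself: the record, the four-field schema and the assignment of fields to the "searchability" and "identification" groups are all invented for illustration, and the function names are hypothetical.

```python
def is_filled(value) -> bool:
    """A field counts as filled if it holds at least one non-blank value."""
    if value is None:
        return False
    if isinstance(value, str):
        return bool(value.strip())
    if isinstance(value, (list, tuple, set)):
        return any(is_filled(v) for v in value)
    return True


def simple_completeness(record: dict, fields: list) -> float:
    """Simple completeness: ratio of filled fields to all fields considered."""
    filled = sum(1 for f in fields if is_filled(record.get(f)))
    return filled / len(fields)


def field_cardinality(record: dict, fields: list) -> dict:
    """Cardinality: how many non-blank values each field holds."""
    counts = {}
    for f in fields:
        value = record.get(f)
        if isinstance(value, (list, tuple, set)):
            counts[f] = sum(1 for v in value if is_filled(v))
        else:
            counts[f] = 1 if is_filled(value) else 0
    return counts


def functionality_completeness(record: dict, groups: dict) -> dict:
    """Functionalities: completeness computed per group of supporting fields."""
    return {name: simple_completeness(record, fields) for name, fields in groups.items()}


# Hypothetical schema and field groups, chosen only for this example.
schema = ["dc:title", "dc:description", "dc:subject", "dc:creator"]
groups = {
    "searchability": ["dc:title", "dc:description", "dc:subject"],
    "identification": ["dc:title", "dc:creator"],
}
record = {
    "dc:title": "Mona Lisa",
    "dc:description": "",
    "dc:subject": ["portrait", "painting"],
}

print(simple_completeness(record, schema))
print(field_cardinality(record, schema))
print(functionality_completeness(record, groups))
```

Treating an empty string as unfilled is a design choice that matches the "Empty fields" problem shown earlier in the deck; a stricter check could also reject placeholder values.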
  • 17. Measurement levels
       levels: overall view, collection view, record view
       measurements: Completeness, Field cardinality, Uniqueness, Multilinguality, Language specification, Problem catalog, etc.
       [diagram labels: links, measurements, aggregated statistics, metrics]
  • 18. Completeness score calculation
       Pearson’s correlation coefficient is 0.52
       [figure labels: Method I, Method II, Weighted cardinality, Weighted functionality, Completeness score, weight: 2.5 × score]
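For readers unfamiliar with the statistic quoted on this slide, here is a minimal sketch of Pearson's correlation coefficient between two series of per-record scores. It assumes the 0.52 figure compares the two scoring methods record by record, which the transcript does not state explicitly; the sample scores below are invented and are not Europeana data.

```python
import math


def pearson(xs, ys) -> float:
    """Pearson's r: covariance of xs and ys divided by the product of their spreads."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equally long series with at least two values")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / math.sqrt(var_x * var_y)


# Invented per-record scores from two hypothetical scoring methods.
method_1 = [0.20, 0.35, 0.30, 0.50, 0.45]
method_2 = [0.40, 0.70, 0.45, 0.80, 0.60]
print(round(pearson(method_1, method_2), 2))
```

A value near 1 would mean the two methods rank records almost identically; a moderate value such as the 0.52 reported on the slide suggests they agree only partly.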
  • 19. Completeness score distribution: distribution of completeness scores in one dataset
       functionality-based method
       ★ higher scores
       ★ more variant
       cardinality-based method
       ★ lower scores
       ★ less variant
       combined method
       ★ closer to functionality
  • 20. Results
       ★ many records lack semantic enrichments (contextual entities)
         ○ 6% have agent, 28% place, 32% timespan, 40% concept entities
         ○ only a couple of data providers have 100% coverage
       ★ only mandatory elements appear in each record
       ★ there are unused fields
       ★ there are overused fields
         ○ suggestion: generic fields → specific field
         ○ dc:description → dc:subject, dct:alternative, dct:tableOfContents
  • 22. Technical background
       processing workflow: ingest, measure, statistical analysis, web interface
       tools: OAI-PMH, Europeana API, Hadoop, NoSQL, Spark, Java, Apache Solr, Scala, R, PHP, D3.js, highchart.js
       [diagram labels for data formats: json, csv, html, svg, jpg]
  • 23. Further steps (grouped on the slide as “human” and “technical”)
       ★ scores into recommendations
       ★ communication
       ★ expert evaluation
       ★ cooperation with other projects
       ★ ingestion process
       ★ W3C recommendations
         ○ Shape Constraint Language
         ○ Data Quality Vocabulary
       ★ is usage in-line with scores?
       ★ do scores change?
       ★ machine learning-based classification & clustering
  • 24. Credits and links
       This research is conducted in close collaboration with the Europeana Data Quality Committee, thanks to all the members! Special thanks to Marco Büchler & the eTRAP team and to the GWDG HPC experts!
       ★ Europeana Data Quality Committee // http://pro.europeana.eu/europeana-tech/data-quality-committee
       ★ demo site // http://144.76.218.178/europeana-qa/
       ★ source code (GPL v3.0) // http://pkiraly.github.io/about/#source-codes
       ★ Europeana data (CC0) // http://hdl.handle.net/21.11101/0000-0001-781F-7
       ★ [bibliography] // http://zotero.org/groups/metadata_assessment
       ★ contact // peter.kiraly@gwdg.de, @kiru
       ★ slides // http://bit.ly/mq-dh2017