ECLAP2013
Metadata Quality assessment tool for Open Access
Cultural Heritage institutional repositories
Emanuele Bellini, Paolo Nesi
Problem
a) Low metadata quality reduces the ability to discover, find, identify, select, and obtain digital resources.
b) Identifying and fixing metadata quality issues is very time-consuming. With a large number of records, the task may be impossible to carry out.
Factors
Metadata created automatically, or by authors unfamiliar with commonly accepted cataloguing rules, indexing, or vocabulary control, can have quality problems. Mandatory elements may be missing or used incorrectly. Metadata terminology may be inconsistent, making it difficult to locate relevant information.
Objective of the work
Definition of:
a) A community-driven Metadata Quality Profile and related quality dimensions that can be assessed through automatic processes
b) A set of High Level and Low Level metrics to be used as a statistical tool for assessing and monitoring the IR implementation in terms of metadata quality, trustworthiness and standards compliance
c) A set of measurement tools to assess the defined metrics
d) A Metadata Quality assessment service prototype for automatic evaluation and reporting
Methodology
Goal Question Metric (GQM) Approach
1) Planning phase: Metadata Quality Profile definition
2) Definition phase: High Level Metrics (HLM) and Low Level Metrics (LLM) definition
3) Data collection phase: Measurement Plan definition (ISO/IEC 15939 Measurement Information Model)
4) Interpretation phase: the measurement results are examined in a post-mortem fashion. According to ISO/IEC 15939, this phase checks the results against threshold and target values to define the quality index of the repository
Prototype implementation
Open Archive metadata quality issues analysis
Research conducted over 1200 OA-IRs: more than 15 million records analysed
Fragmentary Landscape
More than 100 different metadata sets
Open Archive metadata quality issues analysis
[Figure: pie chart "Collected MIME types". Legend entries: .jpg, image / .jpeg, image/jpeg, image/jpg, Imatge/jpeg, jpeg, JPEG (Joint Photographic Experts Group), jpg, others. Slice shares: 0%, 2%, 69%, 1%, 2%, 15%, 2%, 7%, 2%.]
[Figure panels: Language field; Type field; Wrong values. Slide note: "contains wrong coding in more than 10% of cases".]
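The format values listed above include several spellings that appear to denote the same JPEG media type. A minimal sketch of how such values might be normalized before assessment follows. The alias table and the function name `normalize_format` are illustrative assumptions and are not taken from the authors' prototype; `image/jpeg` is the registered IANA media type.

```python
import re
from typing import Optional

# Hypothetical alias table built from the variants listed on the slide.
JPEG_ALIASES = {
    ".jpg", ".jpeg", "jpg", "jpeg", "image/jpg", "image/jpeg",
    "imatge/jpeg", "jpeg (joint photographic experts group)",
}


def normalize_format(raw: str) -> Optional[str]:
    """Return a canonical media type, or None if the value is not recognized."""
    # Lower-case, trim, and drop whitespace around "/" (e.g. "image / .jpeg").
    value = re.sub(r"\s*/\s*", "/", raw.strip().lower())
    # Collapse "image/.jpeg" to "image/jpeg".
    value = value.replace("/.", "/")
    if value in JPEG_ALIASES:
        return "image/jpeg"
    return None
```

Values that return `None` would be counted as wrong or unknown rather than silently corrected.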
Open Archive metadata quality issues analysis
•Lack of awareness of recommended open standards
•Difficulties in implementing standards in some cases, due to lack of expertise, immaturity of the standards, or poor support for the standards
•Unsuitable software tools and interfaces
•Poorly defined duties (which department will be in charge of the IR), publication workflow, rules, policies and responsibilities in the institutions that aim to set up an IR
•Lack of funds and/or human resources for managing IRs
Metadata quality requirements
IFLA FRBR model.
Descriptive metadata is used:
▪ to find materials that correspond to the user’s stated search or discovery criteria
▪ to identify a resource and to check that the document described in a record corresponds to the document sought by the user, or to distinguish between two resources that have the same title
▪ to select a resource that is appropriate to the user’s needs (e.g. to select a text in a different language or version)
▪ to obtain access to the resource described (e.g. to reliably access an online electronic document stored on a remote computer)
Metadata quality -> “fitness for use” in a particular typified task/context
Metadata Quality Frameworks
- NISO Framework of Guidance for Building Good Digital Collections
- ISO/IEC 9126
- ISO/IEC 25000 SQuaRE
- ISO/IEC 14598
- Bruce & Hillmann
- Stvilia et al.
- Moen et al.
Currently, many dimensions cannot be computed automatically
OA community driven Quality Profile
Filtering results
[Figure: bar chart, scale 0 to 8, one group of bars per field: DC:Contributor, DC:Coverage, DC:Creator, DC:Date, DC:Description, DC:Format, DC:Identifier, DC:Language, DC:Publisher, DC:Relation, DC:Rights, DC:Source, DC:Subject, DC:Title, DC:Type. Series: Not Filtered, Filtered.]
Questionnaire results
Respondents: Librarians 25.4%, Researchers 20.6%, ICT experts 15.9%, Archivists 15.9%, Professors 12.7%, Students 9.5%
Data Filtering
•Critical target: students can represent “noise”.
•Low level of knowledge: 17% of the respondents rated their knowledge of the DC schema below 5 (on a scale from 1 to 10).
•Never worked with metadata: for 6.3% of the respondents, their work does not include the definition and use of metadata.
•Never dealt with metadata quality: 11.1% of the respondents have never dealt with the quality of metadata.
Quality profile
Field importance ranges from 1 (the field can be omitted without affecting the use of the record) to 10 (absolutely mandatory; without the field the record is totally unusable).
Weights from 1 to 5.5 are considered not important:
a) The quality assessment of field f can be skipped if its average weight is 5.5 or less.
b) The quality assessment of field f can be skipped if the difference between its average weight and the level of confidence is 5.5 or less.
[Figure: bar chart, scale 0 to 10.5 in steps of 0.5, one bar per field: DC:Contributor, DC:Coverage, DC:Creator, DC:Date, DC:Description, DC:Format, DC:Identifier, DC:Language, DC:Publisher, DC:Relation, DC:Rights, DC:Source, DC:Subject, DC:Title, DC:Type. Callout labels: Coverage, Publisher, Relation, Source.]
Field selection results
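The two selection rules above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the slide does not define "level of confidence", so the sketch assumes it is the half-width of a normal-approximation 95% confidence interval on the mean weight. The function names and the sample votes used in any call are hypothetical.

```python
from math import sqrt
from statistics import mean, stdev

THRESHOLD = 5.5  # weights up to this value are treated as "not important"


def confidence_margin(votes: list, z: float = 1.96) -> float:
    """Assumed 'level of confidence': half-width of a normal-approximation CI."""
    if len(votes) < 2:
        return 0.0
    return z * stdev(votes) / sqrt(len(votes))


def needs_assessment(votes: list) -> bool:
    """Apply rules a) and b): skip the field when either value is <= 5.5."""
    avg = mean(votes)
    if avg <= THRESHOLD:                             # rule a)
        return False
    if avg - confidence_margin(votes) <= THRESHOLD:  # rule b)
        return False
    return True
```

Under this reading, a field with a high average but widely scattered votes is excluded by rule b) even though it passes rule a).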
Quality profiles
Each field has a different level of relevance in a record.
The relevance weights assigned to each field are the normalized averages of the weights assigned by the OA experts.
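A small sketch of one possible normalization follows. The slide does not say how the averages are normalized, so dividing by the sum (so that the weights add up to 1) is an assumption, and `normalize_weights` is a hypothetical name.

```python
def normalize_weights(avg_weights: dict) -> dict:
    """Scale average expert weights so that they sum to 1 (assumed scheme)."""
    total = sum(avg_weights.values())
    if total == 0:
        raise ValueError("at least one field must have a non-zero weight")
    return {field: weight / total for field, weight in avg_weights.items()}
```

For example, average weights of 9, 6 and 5 for three fields would become 0.45, 0.30 and 0.25.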
High Level Metrics (HLM)
Dimension: assessment applied to
Completeness: all fields in the metadata schema
Accuracy: subset of complete fields
Consistency: subset of accurate fields
Assessment results: from MQ Base level to MQ Higher level
High Level Metrics (HLM) examples
Completeness: whether a field is empty or not.
Accuracy:
- there are no typographical errors in the free-text fields,
- the values in the fields follow the format defined by the reference standard (e.g. the ISO 639-1 standard for DC:language).
Consistency: no logical errors (e.g. a resource appears as “published” before being “created”, the declared MIME type differs from the actual associated bitstream, the language of the document differs from the language expressed in the DC:language metadata field, the link to the digital object is broken, etc.).
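One check per dimension can be sketched as below. This is an illustration under stated assumptions: the function names are hypothetical, and `ISO_639_1_SAMPLE` is only a small subset of the ISO 639-1 code list, which a real check would load in full.

```python
from datetime import date
from typing import Optional

# Illustrative subset of ISO 639-1 two-letter codes (assumption: not the full list).
ISO_639_1_SAMPLE = {"en", "it", "fr", "de", "es", "pt", "ca"}


def is_complete(value: Optional[str]) -> bool:
    """Completeness: the field holds something other than whitespace."""
    return value is not None and value.strip() != ""


def language_is_accurate(value: str) -> bool:
    """Accuracy: DC:language uses a two-letter code from the reference list."""
    return value.strip() in ISO_639_1_SAMPLE


def dates_are_consistent(created: date, published: date) -> bool:
    """Consistency: a resource cannot be published before it is created."""
    return created <= published
```

A value such as "Italian" is complete but not accurate under this check, which is the distinction the two dimensions are meant to capture.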
Low Level Metrics (LLM)
Completeness of a Field:
\[ f(x) = \begin{cases} 0, & \text{if the field is empty} \\ 1, & \text{otherwise} \end{cases} \]
Completeness of a Record y:
\[ ComR(y) = \frac{\sum_{i=1}^{nField(y)} f(x_i(y)) \cdot w_i}{\sum_{j=1}^{nField(y)} w_j} \]
Value ranges from 0 to 1.
Average Completeness of a Repository:
\[ AvComR = \frac{\sum_{i=1}^{nRecords} ComR(y_i)}{nRecords} \]
Quality of Repository r:
\[ QR(r) = \frac{AvComR/\sigma_{Com}^2 + AvAccR/\sigma_{Acc}^2 + AvConR/\sigma_{Con}^2}{1/\sigma_{Com}^2 + 1/\sigma_{Acc}^2 + 1/\sigma_{Con}^2} \]
Value ranges from 0 to 1.
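The low-level metrics above can be sketched in a few lines. This is a minimal illustration under assumptions: the per-dimension terms in the QR(r) denominators are read as variances, since the symbol is not legible in the source, and the function and parameter names are hypothetical.

```python
def field_completeness(value) -> int:
    """f(x): 0 if the field is empty, 1 otherwise."""
    return 0 if value is None or str(value).strip() == "" else 1


def record_completeness(record: dict, weights: dict) -> float:
    """ComR(y): weighted share of non-empty fields, in [0, 1]."""
    total = sum(weights.values())
    filled = sum(field_completeness(record.get(name)) * w
                 for name, w in weights.items())
    return filled / total


def average(values: list) -> float:
    """AvComR and its analogues: arithmetic mean over all records."""
    return sum(values) / len(values)


def repository_quality(av_com: float, av_acc: float, av_con: float,
                       var_com: float, var_acc: float, var_con: float) -> float:
    """QR(r): mean of the three averages, each weighted by 1/variance (assumed)."""
    numerator = av_com / var_com + av_acc / var_acc + av_con / var_con
    denominator = 1 / var_com + 1 / var_acc + 1 / var_con
    return numerator / denominator
```

With equal variances the result reduces to the plain mean of the three averages; a dimension with a smaller variance pulls the result toward its own average.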
Measurement methods
Assessment results examples
Case study – University of Pisa in detail
Completeness
[Figure: bar chart, scale 0 to 500, one bar per field: title, creator, subject, description, publisher, contributor, date, type, format, identifier, source, language, relation, coverage, rights.]
Analysis: Only a few records have a value in the Contributor field, and no records have the Language field. This might mean that the repository system does not manage or require those fields a priori, while for the other fields the workflow seems reliable.
Case study – University of Pisa
Accuracy
[Figure: bar chart, scale 0 to 1.2, one bar per field: title, subject, description, date, type, format, identifier, language.]
Analysis: The Description and Title fields are the least accurate. This might be due to the type of field (free text). Since the measurement criteria defined for those fields are language detection and spell checking, the chart shows a high number of failures that might be due to typos, for instance.
Prototype description
Step 1: The process starts with OAI-PMH harvesting from the Open Access repository. The OAI-PMH harvester is implemented as an AXCP GRID rule. This process collects the metadata records and stores them in the database.
Step 2: The metadata processing rule extracts each field from the metadata table and populates a table of RDF-like triples, in which each row represents a field.
Step 3: The rules for completeness assessment can then be launched.
Step 4: Accuracy can be assessed for each field through dedicated evaluation rules, exploiting open source third-party applications such as JHOVE for file format evaluation and ASPELL for spell checking.
Step 5: This step addresses consistency estimation. It can be launched only on the fields that have passed the completeness and accuracy evaluations.
Step 6: The metric assessment calculates MQ for the repository.
The prototype is based on the AXMEDIS AXCP tool framework, an open source infrastructure that supports massive harvesting, metadata processing and evaluation, automatic periodic quality monitoring, and so forth, through the parallel execution of processes (called rules) allocated on one or more computers/nodes.
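Steps 1 and 2 can be sketched outside the AXCP framework as follows. This is a minimal Python illustration, not the AXCP rule used in the prototype: it assumes the repository exposes `oai_dc` records, the function names are hypothetical, and the triples are kept in memory instead of a database table.

```python
import urllib.parse
import urllib.request
import xml.etree.ElementTree as ET
from typing import Optional

OAI = "{http://www.openarchives.org/OAI/2.0/}"
DC = "{http://purl.org/dc/elements/1.1/}"


def records_to_triples(xml_text: str) -> list:
    """Flatten one ListRecords response into (record id, field, value) rows."""
    root = ET.fromstring(xml_text)
    triples = []
    for record in root.iter(OAI + "record"):
        identifier = record.findtext(f"{OAI}header/{OAI}identifier", default="")
        for element in record.iter():
            if element.tag.startswith(DC):
                field = "dc:" + element.tag[len(DC):]
                triples.append((identifier, field, (element.text or "").strip()))
    return triples


def resumption_token(xml_text: str) -> Optional[str]:
    """Return the token for the next page, or None on the last page."""
    token = ET.fromstring(xml_text).findtext(f".//{OAI}resumptionToken")
    return token.strip() if token and token.strip() else None


def harvest(base_url: str):
    """Yield triples page by page, following OAI-PMH resumption tokens."""
    params = {"verb": "ListRecords", "metadataPrefix": "oai_dc"}
    while True:
        url = base_url + "?" + urllib.parse.urlencode(params)
        with urllib.request.urlopen(url) as response:
            xml_text = response.read().decode("utf-8")
        yield from records_to_triples(xml_text)
        token = resumption_token(xml_text)
        if token is None:
            break
        # A resumptionToken request must not repeat metadataPrefix.
        params = {"verb": "ListRecords", "resumptionToken": token}
```

Keeping empty elements as rows with an empty value lets the completeness rules in Step 3 work directly on the triple table.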
[Figures: Axmedis GRID infrastructure; data collection workflow; data collection through Axmedis GRID.]
3rd Party software
GNU Aspell - http://aspell.net/
GNU Aspell is a Free and Open Source spell checker designed to eventually replace Ispell. It can be used either as a library or as an independent spell checker.
PEAR Language Detect - http://pear.php.net/package/Text_LanguageDetect
PEAR Language Detect is a free PHP package that recognizes the language of the input text. The precision of the results depends on the length of the input text.
JHOVE - JSTOR/Harvard Object Validation Environment - http://hul.harvard.edu/jhove/
JHOVE provides functions to perform format-specific identification, validation, and characterization of digital objects. Format identification is the process of determining the format to which a digital object conforms. Format validation is the process of determining the level of compliance of a digital object to the specification for its purported format.
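As a rough illustration of how a spell checker could feed an accuracy score for free-text fields, the sketch below calls the `aspell` command-line tool in `list` mode, which prints the words it does not recognize. The scoring formula (share of accepted words) and the function names are assumptions for illustration, not the measurement defined in the prototype, and the `aspell` binary with a matching dictionary must be installed for the default checker to run.

```python
import subprocess


def aspell_misspelled(text: str, lang: str = "en") -> list:
    """Return the words GNU Aspell does not recognize (needs the aspell binary)."""
    result = subprocess.run(
        ["aspell", "--lang=" + lang, "list"],
        input=text, capture_output=True, text=True, check=True,
    )
    return result.stdout.split()


def spelling_accuracy(text: str, checker=aspell_misspelled) -> float:
    """Assumed score: share of words accepted by the checker, in [0, 1]."""
    words = text.split()
    if not words:
        return 1.0
    misspelled = checker(text)
    return max(0.0, 1.0 - len(misspelled) / len(words))
```

Passing the checker as a parameter keeps the scoring logic independent of the tool, so a language-specific dictionary or a different checker can be swapped in.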
Comparison with other derived profiles
The Kiviat chart shows the differences between our Quality Profile and the Quality Profile derived from the CRUI guidelines.
[Figure: Kiviat chart, scale 0 to 1, axes: contributor, creator, date, description, format, identifier, language, rights, subject, title, type. Series: MQC, CRUI.]
Conclusions
a) Completeness seems to be well addressed by all the IRs analysed.
b) There are some issues in the Accuracy dimension. The major problems were detected in free-text fields such as Title and Description.
c) DC is not expressive enough to support the complexity of the resources and their descriptive needs.
d) We showed the validity of the QP model with respect to those derived from other guidelines (e.g. CRUI).
e) In some cases the values could be considered accurate, but their encoding format was not included in our measurement model. Shared measuring modalities should be defined.
Thank you very much!
  • 1. ECLAP2013 Metadata Quality assessment tool for Open Access Cultural Heritage institutional repositories Emanuele Bellini, Paolo Nesi
  • 2. Problem a) Low metadata quality affects the discover, find, identify, select, obtain - ability of digital resources. b) Identifying and fixing metadata quality issues is a really time consuming task. With a significant number of records this task might be impossible to execute. Factors The creation of metadata automatically or by authors who are not familiar with commonly accepted cataloguing rules, indexing, or vocabulary control can create quality problems. Mandatory elements may be missed or used incorrectly. Metadata content terminology may be inconsistent, making difficult to locate relevant information. Metadata Quality assessment tool for Open Access Cultural Heritage institutional repositories
  • 3. Metadata Quality assessment tool for Open Access Cultural Heritage institutional repositories Objective of the work Definition of a) A community driven Metadata Quality Profile and related quality dimensions able to be assessed through automatic processes b) A set of High Level and Low level metrics to be used as statistical tool for assessing and monitoring the IR implementation in terms of metadata quality, trustworthiness and standard compliance c) A set of measurement tools to asses the defined metrics d) A Metadata Quality assessment service prototype for automatic evaluation and report
  • 4. Methodology: Goal Question Metric (GQM) approach. 1) Planning phase: Metadata Quality profile definition 2) Definition phase: High Level Metrics (HLM) and Low Level Metrics (LLM) definition 3) Data collection phase: Measurement Plan definition (ISO/IEC 15939 Measurement Information Model) 4) Interpretation phase: examine the measurement results in a post-mortem fashion. According to ISO/IEC 15939, this phase includes checking the results against threshold and target values to define the quality index of the repository. Prototype implementation.
  • 5. Open Archive metadata quality issues analysis. Research conducted over 1200 OA-IRs; more than 15M records analysed. Fragmentary landscape: more than 100 different metadata sets.
  • 6. Open Archive metadata quality issues analysis. Wrong values in the Language field and the Type field. [Pie chart: values collected for the JPEG type, with slices labelled .jpg, image / .jpeg, image/jpeg, image/jpg, Imatge/jpeg, jpeg, JPEG (Joint Photographic Experts Group), jpg, others; slice values 0%, 2%, 69%, 1%, 2%, 15%, 2%, 7%, 2%.] The collected MIME types contain wrong coding in more than 10% of cases.
  • 7. Open Archive metadata quality issues analysis. • Lack of awareness of recommended open standards • Difficulties in implementing standards in some cases, due to lack of expertise, immaturity of the standards, or poor support for the standards • Unsuitable software tools and interfaces • Poorly defined duties (which department will be in charge of the IR), publication workflow, rules, policies and responsibilities in the institutions that aim to set up an IR • Lack of funds and/or human resources for managing IRs
  • 8. Metadata quality requirements (IFLA FRBR model). Descriptive metadata are used: ▪ to find materials that correspond to the user’s stated search or discovery criteria ▪ to identify a resource and to check that the document described in a record corresponds to the document sought by the user, or to distinguish between two resources that have the same title ▪ to select a resource that is appropriate to the user’s needs (e.g., to select a text in a different language or version) ▪ to obtain access to the resource described (e.g., to reliably access an online electronic document stored on a remote computer). Metadata quality -> “fitness for use” in a particular typified task/context
  • 9. Metadata Quality Frameworks: NISO Framework of Guidance for Building Good Digital Collections; ISO/IEC 9126; ISO 25000 SQuaRE; ISO/IEC 14598; Bruce & Hillmann; Stvilia et al.; Moen et al. Many of their dimensions cannot be computed automatically.
  • 10. OA community-driven Quality Profile: questionnaire results and data filtering. Respondents: Researchers 20.6%, Professors 12.7%, ICT experts 15.9%, Archivists 15.9%, Librarians 25.4%, Students 9.5%. Filtering criteria: Critical target: students can represent “noise”. Low level of knowledge: 17% of the respondents rated their knowledge of the DC schema below 5 (on a scale of 1 to 10). Never worked with metadata: for 6.3% of the respondents, their work does not include the definition and use of metadata. Never dealt with metadata quality: 11.1% of the respondents have never dealt with the quality of metadata. [Chart: filtering results, weight per DC field (axis 0 to 8), Not Filtered vs Filtered, for DC:Contributor, DC:Coverage, DC:Creator, DC:Date, DC:Description, DC:Format, DC:Identifier, DC:Language, DC:Publisher, DC:Relation, DC:Rights, DC:Source, DC:Subject, DC:Title, DC:Type.]
  • 11. Quality profile: field selection results. Field importance ranges from 1 (the field can be omitted without affecting the use of the record) to 10 (absolutely mandatory; the lack of the field makes the record totally unusable). The range from 1 to 5.5 is considered not important: a) the quality assessment of field f can be skipped if its average weight is 5.5 or less; b) the quality assessment of field f can be skipped if the difference between its average weight and the level of confidence is 5.5 or less. [Chart: average weight per DC field (axis 0 to 10.5) for DC:Contributor, DC:Coverage, DC:Creator, DC:Date, DC:Description, DC:Format, DC:Identifier, DC:Language, DC:Publisher, DC:Relation, DC:Rights, DC:Source, DC:Subject, DC:Title, DC:Type; callouts on Coverage, Publisher, Relation, Source.]
  • 12. Quality profiles. Each field has a different level of relevance in a record. The relevance weights assigned to each field are the normalized averages of the weights assigned by the OA experts.
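The field selection rule and the weight normalization described on slides 11 and 12 can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names, the sample weights, and the choice to normalize so that the weights sum to 1 are all assumptions, since the slides do not state how the averages are normalized.

```python
THRESHOLD = 5.5  # cut-off taken from slide 11


def select_fields(avg_weights, confidence=None, threshold=THRESHOLD):
    """Keep the fields whose average weight, minus the level of
    confidence when one is given, is above the threshold."""
    confidence = confidence or {}
    return {
        field: weight
        for field, weight in avg_weights.items()
        if weight - confidence.get(field, 0.0) > threshold
    }


def normalize(weights):
    """Scale the weights so that they sum to 1 (assumed normalization)."""
    total = sum(weights.values())
    return {field: weight / total for field, weight in weights.items()}


# Hypothetical average weights, for illustration only.
averages = {"title": 9.0, "creator": 7.0, "coverage": 4.0}
selected = select_fields(averages)
relevance = normalize(selected)
```

With these made-up numbers, `coverage` falls below the cut-off and the two remaining fields share the total relevance in proportion to their averages.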
  • 13. High Level Metrics (HLM). Dimensions, from MQ base level to MQ higher level: Completeness, Accuracy, Consistency. Assessment applied to, and assessment results: Completeness is assessed on all fields in the metadata schema and yields the subset of complete fields; Accuracy is assessed on the subset of complete fields and yields the subset of accurate fields; Consistency is assessed on the subset of accurate fields.
  • 14. High Level Metrics (HLM) examples. Completeness: whether a field is empty or not. Accuracy: there are no typographical errors in the free-text fields, and the values in the fields are in the format defined by the reference standard (e.g. the ISO 639-1 standard for DC:language). Consistency: there are no logical errors (e.g. a resource appears as “published” before being “created”, the declared MIME type differs from the actual associated bitstream, the language of the document differs from the language expressed in the metadata field DC:language, the link to the digital object is broken, etc.).
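The three dimensions on this slide can be illustrated with one small check each. The sketch below is only an example under stated assumptions: the function names are hypothetical, the language set is a tiny illustrative subset of ISO 639-1 rather than the full code list, and the consistency check covers just the "published before created" case.

```python
from datetime import date

# Illustrative subset only; a real check would load the full ISO 639-1 list.
ISO_639_1_SAMPLE = {"en", "it", "fr", "de", "es"}


def is_complete(value):
    """Completeness: the field carries a non-empty value."""
    return value is not None and value.strip() != ""


def is_accurate_language(value):
    """Accuracy: DC:language uses a two-letter ISO 639-1 code."""
    return value in ISO_639_1_SAMPLE


def is_consistent_dates(created, published):
    """Consistency: a resource cannot be published before it is created."""
    return created <= published
```

For instance, `is_accurate_language("English")` is false even though the field is complete, which is why accuracy is only meaningful on fields that already passed the completeness check.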
  • 15. Low Level Metrics (LLM).
    Completeness of a field:
    \[ f(x) = \begin{cases} 0, & \text{if the field is empty} \\ 1, & \text{otherwise} \end{cases} \]
    Completeness of a record y (value ranges from 0 to 1):
    \[ ComR(y) = \frac{\sum_{i=1}^{nField(y)} f(x_i(y)) \, w_i}{\sum_{j=1}^{nField(y)} w_j} \]
    Average completeness of a repository:
    \[ AvComR = \frac{\sum_{i=1}^{nRecords} ComR(y_i)}{nRecords} \]
    Quality of repository r (value ranges from 0 to 1):
    \[ QR(r) = \frac{AvComR/\sigma^2_{Com} + AvAccR/\sigma^2_{Acc} + AvConR/\sigma^2_{Con}}{1/\sigma^2_{Com} + 1/\sigma^2_{Acc} + 1/\sigma^2_{Con}} \]
    The squared terms indexed by Com, Acc and Con are written here as \(\sigma^2\); the symbol itself is not legible in the transcript.
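The low level metrics on this slide can be sketched in a few lines. This is a hedged illustration, not the prototype's code: the function names and sample data are hypothetical, and the squared per-dimension terms in the repository quality formula are read here as variances, which the transcript does not confirm.

```python
def field_completeness(value):
    """f(x): 0 if the field is empty, 1 otherwise."""
    return 0 if value is None or str(value).strip() == "" else 1


def record_completeness(record, weights):
    """ComR(y): weighted share of non-empty fields, between 0 and 1."""
    total = sum(weights.values())
    filled = sum(field_completeness(record.get(f)) * w for f, w in weights.items())
    return filled / total


def average_completeness(records, weights):
    """AvComR: mean of ComR over all records of the repository."""
    return sum(record_completeness(r, weights) for r in records) / len(records)


def repository_quality(averages, variances):
    """QR(r): average of the three dimension scores, each weighted by
    the inverse of its squared term (assumed to be a variance)."""
    numerator = sum(averages[d] / variances[d] for d in averages)
    denominator = sum(1 / variances[d] for d in averages)
    return numerator / denominator


# Hypothetical weights and records, for illustration only.
weights = {"title": 0.5, "creator": 0.25, "language": 0.25}
records = [
    {"title": "A", "creator": "", "language": "it"},
    {"title": "B", "creator": "X", "language": "en"},
]
av_com = average_completeness(records, weights)
```

Because the weights in the denominator are the same as those in the numerator, a record with every field filled always scores 1 regardless of how the weights are scaled.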
  • 16. Measurement methods
  • 17. Assessment results examples
  • 18. Case study: University of Pisa in detail. Completeness. [Chart: completeness per field (axis 0 to 500) for title, creator, subject, description, publisher, contributor, date, type, format, identifier, source, language, relation, coverage, rights.] Analysis: Only a few records have a value in the Contributor field, and no records have the Language field. This might mean that the repository system does not manage or require those fields a priori, while for the other fields the workflow seems reliable.
  • 19. Case study: University of Pisa. Accuracy. [Chart: accuracy per field (axis 0 to 1.2) for title, subject, description, date, type, format, identifier, language.] Analysis: The Description and Title fields are the least accurate. This might be due to the type of field (free text). Since the measurement criteria defined for those fields are language detection and spell checking, the chart shows a high number of failures that might be due, for instance, to typos.
  • 20. Prototype description
  • 21. Prototype description. Step 1: The process starts from the OAI-PMH harvesting of the Open Access repository. The OAI-PMH harvester is implemented as an AXCP GRID rule. This process collects the metadata records and stores them in the database. Step 2: The second step is performed by the metadata processing rule. This rule extracts each field from the metadata table and populates a table of RDF-like triples in which each row represents a field. Step 3: The rules for completeness assessment can then be launched. Step 4: The accuracy can be assessed for each field through dedicated evaluation rules, exploiting open source third-party applications such as JHOVE for file format evaluation and ASPELL for spell checking. Step 5: This step addresses the consistency estimation. It can be launched only on the fields that have passed the completeness and accuracy evaluations. Step 6: The metric assessment calculates the MQ for the repository. The prototype is based on the AXMEDIS AXCP tool framework, an open source infrastructure that, through parallel execution of processes (called rules) allocated on one or more computers/nodes, allows massive harvesting, metadata processing and evaluation, automatic periodic quality monitoring, and so forth.
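Step 2 of the workflow, turning a harvested record into one row per field, can be sketched with the Python standard library. This is not the AXCP rule used by the prototype: the function name, the record identifier, and the sample record are hypothetical, and only the `oai_dc` and Dublin Core namespace URIs are the standard ones.

```python
import xml.etree.ElementTree as ET

DC_NS = "http://purl.org/dc/elements/1.1/"

# Hypothetical oai_dc payload, as it might appear inside an OAI-PMH response.
SAMPLE = """<oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
    xmlns:dc="http://purl.org/dc/elements/1.1/">
  <dc:title>Sample thesis</dc:title>
  <dc:creator>Rossi, Mario</dc:creator>
  <dc:language></dc:language>
</oai_dc:dc>"""


def record_to_triples(record_id, xml_text):
    """Return one (record, field, value) row per Dublin Core element."""
    root = ET.fromstring(xml_text)
    prefix = "{" + DC_NS + "}"
    triples = []
    for element in root:
        if element.tag.startswith(prefix):
            field = "dc:" + element.tag[len(prefix):]
            triples.append((record_id, field, (element.text or "").strip()))
    return triples


triples = record_to_triples("oai:example:1", SAMPLE)
# Step 3 can then run over the rows, e.g. listing the empty fields.
empty_fields = [field for _, field, value in triples if value == ""]
```

Keeping one row per field means that the completeness, accuracy and consistency rules can each be run as an independent pass over the same table.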
  • 22. Axmedis GRID infrastructure: data collection workflow. Data collection through Axmedis GRID.
  • 23. 3rd party software. GNU ASPELL (http://aspell.net/): GNU Aspell is a Free and Open Source spell checker designed to eventually replace Ispell. It can be used either as a library or as an independent spell checker. PEAR Language Detect (http://pear.php.net/package/Text_LanguageDetect): a free PHP application able to recognize the language of the input text. The precision of the results depends on the length of the input text. JHOVE, the JSTOR/Harvard Object Validation Environment (http://hul.harvard.edu/jhove/): JHOVE provides functions to perform format-specific identification, validation, and characterization of digital objects. Format identification is the process of determining the format to which a digital object conforms. Format validation is the process of determining the level of compliance of a digital object to the specification for its purported format.
  • 24. Comparison with other derived profiles. The Kiviat chart shows the differences between our Quality Profile and the Quality Profile derived from the CRUI guidelines. [Kiviat chart: series MQC and CRUI, axis 0 to 1, over the fields contributor, creator, date, description, format, identifier, language, rights, subject, title, type.]
  • 25. Conclusions. a) Completeness seems to be well addressed by all the IRs analysed b) There are some issues in the Accuracy dimension; the major problems were detected in free-text fields such as Title and Description c) DC is not expressive enough to support the complexity of the resources and their descriptive needs d) We showed the validity of the QP model with respect to those derived from other guidelines (e.g. CRUI) e) There are some cases in which the values could be considered accurate but their encoding format was not included in our measurement model; shared measuring modalities should be defined
  • 26. Thank you very much!