Metadata quality in digital repositories
1. Metadata Quality Issues
in Learning Object Repositories
PhD Candidate: Nikos Palavitsinis
PhD Supervisors: Assoc. Prof. Salvador Sanchez-Alonso, Dr. Nikos Manouselis
4. Problem
• Generic Problem: Low-quality metadata in
digital repositories affects resource
discovery
• Specific Problem: How might we insert quality
assurance mechanisms into the digital
repository lifecycle to enhance metadata
quality?
Introduction/Problem
4
5. Background
• Relevant studies that look into quality issues:
– Study based on the Open Language Archives
Community (Hughes, 2004)
– Studies based on the National Science Digital
Library (Zeng et al., 2005; Bui & Park, 2006)
– Studies based on ARIADNE Federation repositories
(Najjar et al., 2004; Ochoa et al., 2011)
5
Introduction/Background
6. Aim of Digital Repositories
• Databases used for storing and/or enabling the
interoperability of Learning Objects (McGreal, 2007)
• Enable the efficient search & discovery of objects
(Richards et al., 2002)
• How can digital repositories fulfill their goals if
the quality of the metadata provided is poor?
– Is it that poor?
6
Digital Repositories & Federations/Aim of Digital Repositories
9. Metadata
• Metadata is structured information that describes,
explains, locates, or otherwise makes it easier to
retrieve, use, or manage an information resource
• …vital component of the learning object economy
(Currier et al., 2004)
9
Metadata & Education/Metadata
10. Metadata in Education
• In the field of Technology-Enhanced Learning, the
need to describe resources with information that
extends the scope of regular metadata was
identified early (Recker & Wiley, 2001)
• The most commonly used metadata schemas in
education are IEEE LOM & Dublin Core
• For users of Educational Repositories, problems in
metadata result in poor recall of resources and
inconsistent search results (Currier et al., 2004)
10
Metadata & Education/Metadata in Education
11. Quality
• Level of excellence; a property or attribute that
differentiates a thing or person
• Quality is the suitability of procedures, processes and
systems in relation to strategic objectives
• Metadata are of high importance to the success of
Learning Object Repositories (LORs)
(Heery & Anderson, 2005; Guy et al., 2004; Robertson, 2005)
11
Quality & Metadata/Quality
12. Quality in Metadata
• Poor quality metadata can mean that a resource is
essentially invisible within a repository or archive and
remains unused (Barton et al., 2003)
• Different settings and purposes require different
approaches to what constitutes quality in metadata
(Robertson, 2005)
– Quality cannot be discussed in a vacuum (Bruce & Hillman, 2004)
12
Quality & Metadata/Quality in Metadata
13. Metadata Creators
• In some cases, subject matter experts have proven
to be better at metadata creation than
information specialists (Greenberg et al., 2001; Park, 2009)
• Neither resource creators nor the information
specialists handle pedagogic aspects of metadata
well (Barton et al., 2003)
• Importance of having only trained professionals
providing metadata (Holden, 2003)
13
Quality & Metadata/Metadata Creators
14. Metadata experts VS Domain experts
14
• Metadata expert: "I have studied information management", "I know how to create & manage data sources", "I have been involved in EU projects for digital libraries"
• Domain expert: "I have a PhD in education", "I know how to create educational resources", "I have worked with teachers for over 20 years"
• The presenter: "I think I can use the expertise of both…"
15. Metadata Creation
• Metadata today is likely to be created by people
without metadata training, working largely in
isolation and without adequate documentation
• Metadata records are also created automatically,
often with poorly documented methodology and
little or no indication of provenance
• Unsurprisingly, the metadata resulting from these
processes varies strikingly in quality and often does
not play well together (Hillman et al., 2004)
15
Quality & Metadata/Metadata Creation
16. Metadata Quality Metrics (1/2)
• Completeness
– Number of element values provided by the annotator,
compared to the total possible number of values
(a minimal computation sketch follows this slide)
• Accuracy
– Metadata descriptions correspond to the actual resource
they describe
• Consistency
– Degree of conformance of the metadata provided
to the rules of the metadata application profile used
16
Quality & Metadata/Metadata Quality Metrics
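As a rough illustration of the completeness metric defined above, the following Python sketch counts how many elements of a hypothetical application profile carry at least one value; the element names and the example record are assumptions for illustration, not Organic.Edunet data.

```python
# Minimal sketch of a completeness check for one metadata record.
# PROFILE_ELEMENTS is a hypothetical, flattened application profile.

PROFILE_ELEMENTS = [
    "general.title", "general.description", "general.keyword",
    "technical.format", "technical.size", "educational.context",
    "rights.cost",
]

def completeness(record: dict) -> float:
    """Share of profile elements that carry at least one non-empty value."""
    filled = sum(1 for element in PROFILE_ELEMENTS if record.get(element))
    return filled / len(PROFILE_ELEMENTS)

record = {
    "general.title": "Pigs in winter conditions",
    "general.keyword": ["pig", "animal welfare"],
    "technical.format": "image/jpeg",
}
print(f"completeness = {completeness(record):.2f}")  # 3 of 7 elements -> 0.43
```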
17. Metadata Quality Metrics (2/2)
• Objectiveness
– Degree to which the metadata provided describe the
resource in an unbiased way
• Appropriateness
– Fitness for use of the metadata provided when considered
in terms of the envisaged services of the environment/tool
deployed
• Correctness
– Correct use of language in the metadata, syntactically
and/or grammatically
17
Quality & Metadata/Metadata Quality Metrics
18. Back to the problem
• How might we insert quality assurance
mechanisms into the digital repository lifecycle
to enhance metadata quality?
• A solution that capitalizes on the human
factor but also on automated methods of
examining metadata quality
Metadata Quality Assessment Certification Process/Introduction
18
22. Metadata Design Phase
• Description
– Metadata specification / application profiling of an existing
metadata schema that will be used in a specific context
• Quality Assurance Methods
– Metadata Understanding Session
– Preliminary Metadata Hands-on Annotation
• Actors
– Subject-matter experts & metadata experts
• Outcomes
– Initial input for metadata specification
– Paper-based metadata records
22
Metadata Quality Assessment Certification Process/Metadata Design Phase
23. Testing Phase
• Description
– The envisaged system/tool is implemented & the users are
working with the first implementation of the metadata standard
• Quality Assurance Methods
– Test implementation of the tool
– Hands-on annotation experiment
– Metadata Quality Review of test sample of resources
• Actors
– Subject-matter experts & metadata experts
• Outcomes
– Good & Bad Metadata Practices Guide
– Feedback for the development of the system/tool
23
Metadata Quality Assessment Certification Process/Testing Phase
24. Calibration Phase
• Description
– The envisaged system/tool is deployed in a controlled
environment and the subject matter experts continuously
upload resources to it
• Quality Assurance Methods
– Metadata Quality Peer Review Exercise
• Actors
– Subject-matter experts & metadata experts
• Outcomes
– Good & Bad Metadata Practices Guide updated
– Recommendations for metadata improvement
– Peer Review results related to the quality of metadata for the
resources examined
24
Metadata Quality Assessment Certification Process/Calibration Phase
25. Building Critical Mass Phase
• Description
– Tools have reached a high-maturity phase and the
metadata application profile has been finalized. The
repository accepts a large number of resources
• Quality Assurance Methods
– Analysis of Usage Data coming from the tool(s)
– Metadata Quality Certification Mark
• Actors
– Metadata experts
• Outcomes
– Minor changes to application profile
– Recommendations for metadata improvement
25
Metadata Quality Assessment Certification Process/Building Critical Mass Phase
26. Regular Operation Phase
• Description
– Metadata used in the tool(s) are finalized and content
providers are uploading resources regularly. This period
lasts for as long as the deployed services are online
• Quality Assurance Methods
– Regular Analysis of Usage Data coming from the tool(s)
– Online Peer Review Mechanism
– Quality Prizes/Awards for selected resources
• Actors
– Metadata experts & Content users/consumers
• Outcomes
– Recommendations for metadata improvement
26
Metadata Quality Assessment Certification Process/Regular Operation Phase
28. Case Study
• Metadata Quality Assessment Certification
Process applied in the Organic.Edunet
Federation of Learning Repositories
• Each phase is presented, focusing
on its application in the Organic.Edunet case
28
Metadata Quality Assessment Certification Process/Case Study
29. Metadata Design Phase
29
Metadata Quality Assessment Certification Process/Case Study/Metadata Design Phase
• Metadata Understanding Session
– Form assessing each element's ease of understanding,
usefulness and appropriateness for the application domain
– Also asking whether each element should
be mandatory, recommended or optional
Duration: 2 hours
Annotated Objects: 0
Actors involved: 20 metadata & subject-matter experts
31. Metadata Design Phase
31
• Preliminary Hands-on Annotation
– Subject matter experts annotate a sample of their
resources using the suggested metadata
application profile
– Session organized with the participation of all
content providers, with supervised annotation of
resources
Metadata Quality Assessment Certification Process/Case Study/Metadata Design Phase
33. Results
33

Question | Totally Disagree | Disagree | Neutral | Agree | Totally Agree
Is the element easy for you to understand? | 0% | 4% | 21% | 42% | 33%
Is this element useful for describing Organic.Edunet content resources? | 0% | 12% | 33% | 41% | 14%
Is the selection of the element's possible values clear and appropriate? | 0% | 4% | 37% | 50% | 9%

Question | Best rated elements | Rating
Is the element easy for you to understand? | General.Keyword, Technical.Format, Technical.Size | 9.2 / 10
Is this element useful for describing Organic.Edunet content resources? | General.Identifier, General.Description, Technical.Format | 8.8 / 10
Is the selection of the element's possible values clear and appropriate? | General.Description, Rights.Cost, Format.Size | 8.1 / 10
Metadata Quality Assessment Certification Process/Case Study/Metadata Design Phase
34. Results
34
Metadata Quality Assessment Certification Process/Case Study/Metadata Design Phase

Question | Worst rated elements | Rating
Is the element easy for you to understand? | Classification.Taxon, Relation.Resource, Educational.Semantic Density | 3.1 to 4.8 / 10
Is this element useful for describing Organic.Edunet content resources? | Classification.Taxon, Annotation.Entity, Annotation.Date | 2.3 to 3.1 / 10
Is the selection of the element's possible values clear and appropriate? | Classification.Taxon, Classification.Purpose, General.Identifier | 2.9 to 4 / 10

Should this element be mandatory, recommended or optional? | Mandatory | Recommended | Optional
Before | 19 | 26 | 12
After | 25 | 21 | 11
Change in overall number of elements | +31% | -19% | -8.3%
(the percentage changes follow from the before/after counts; see the sketch below)
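For clarity, the percentage changes in the last row can be reproduced from the before/after counts of mandatory, recommended and optional elements; the short Python sketch below shows the arithmetic.

```python
# Reproduces the percentage changes reported in the table above from the
# before/after counts of mandatory, recommended and optional elements.

counts = {"mandatory": (19, 25), "recommended": (26, 21), "optional": (12, 11)}

for category, (before, after) in counts.items():
    change = (after - before) / before * 100
    print(f"{category}: {change:+.1f}%")
# mandatory: +31.6%, recommended: -19.2%, optional: -8.3%
```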
35. Testing Phase
• Hands-on annotation experiment
– Core metadata quality criteria
– Related more to information management
practices and less to the content itself
– Issues not connected to the domain of
use of the resources
35
Metadata Quality Assessment Certification Process/Case Study/Testing Phase
Duration: 1 week
Annotated Objects: 500 objects (5%)
Actors involved: 4 metadata experts
Resources Reviewed: 15 per metadata expert (60 in total)
37. Results
37
Title | "Please use a more comprehensive title. For example, the CRC acronym can be refined as Cooperative Research Centre, just to provide the user with a way to understand what this learning resource is about."
Keyword | "More keywords needed. Just one keyword is not enough, and even so, the keyword text here is misleading. These keywords should be provided separately as "turkey" and "poultry" along with some others, and not as one "turkey poultry"."
Typical Age Range | "…why is it that simple pictures of pigs in the snow with no scientific details on them cannot be used for children that are less than 10 years old? Couldn't these pictures be used in the context of a primary class?"
Context | "Since the age range is from 15 years old to undefined, it only makes sense that the Educational context cannot be limited to higher education but should also consider high school. Be very careful because in this sense, these two elements should not conflict."
Metadata Quality Assessment Certification Process/Case Study/Testing Phase
38. Calibration Phase
• Metadata Quality Peer Review Exercise
– Peer reviewing metadata records using a pre-defined
quality grid assessing metadata quality metrics
• Completeness, accuracy, correctness of language, etc.,
based on Bruce & Hillman's model (a sketch of such a grid follows this slide)
38
Duration: 3 weeks
Annotated Objects: 1,000 objects (10%)
Actors involved: 20 subject matter experts
Resources Reviewed: 105 resources (5 per expert)
Metadata Quality Assessment Certification Process/Case Study/Calibration Phase
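A minimal sketch of what such a peer-review grid might look like in code, assuming each metric is scored on a 1-5 scale and the overall score is a plain average; only the metric names come from the slides, the scale and aggregation are assumptions.

```python
# Sketch of a peer-review quality grid: each metric is rated 1-5 by a reviewer
# and the overall score is the plain average (scale and aggregation assumed).

METRICS = ["completeness", "accuracy", "consistency",
           "objectiveness", "appropriateness", "correctness"]

def review_score(scores: dict) -> float:
    """Average rating across all metrics; assumes every metric has been rated."""
    return sum(scores[m] for m in METRICS) / len(METRICS)

example_review = {"completeness": 4, "accuracy": 5, "consistency": 3,
                  "objectiveness": 4, "appropriateness": 4, "correctness": 5}
print(f"overall quality score: {review_score(example_review):.2f} / 5")  # 4.17
```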
41. Building Critical Mass Phase
• Analysis of Usage Data coming from the tool(s)
– Expecting to verify findings from the experiment
in the "Metadata Design" Phase
• Necessary elements being used more
• Elements with values that are easy to understand
being used correctly, etc.
• Beginning of the intensive content population
41
Metadata Quality Assessment Certification Process/Case Study/Building Critical Mass Phase
Duration: 1 week
Annotated Objects: 6,600 objects (60%)
Actors involved: 2 metadata experts
Resources Analyzed: 6,600
42. Building Critical Mass Phase
42
Metadata Quality Assessment Certification Process/Case Study/Building Critical Mass Phase
• "1" shows that an element is completed, whereas "0"
shows the opposite
• In the case of elements with multiplicity >1, values
can be "2", "3", etc.
– It is interesting to look at the case of keywords, classification
terms and/or educational elements (see the counting sketch below)
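The 0/1/2… coding described above can be sketched as a simple counting routine; the records and element names below are illustrative, not actual usage-data exports from the tool(s).

```python
# Sketch of the 0/1/2... element coding: for each record, count how many values
# an element holds (0 = missing, 1 = single value, 2+ = repeated element).

def element_counts(record: dict, elements: list) -> dict:
    counts = {}
    for element in elements:
        value = record.get(element)
        if value is None:
            counts[element] = 0
        elif isinstance(value, list):
            counts[element] = len(value)
        else:
            counts[element] = 1
    return counts

records = [
    {"general.title": "Poultry farming", "general.keyword": ["turkey", "poultry"]},
    {"general.title": "Pigs in the snow"},
]
for record in records:
    print(element_counts(record, ["general.title", "general.keyword", "classification.taxon"]))
# {'general.title': 1, 'general.keyword': 2, 'classification.taxon': 0}
# {'general.title': 1, 'general.keyword': 0, 'classification.taxon': 0}
```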
45. Compare & Contrast
45
Metadata Quality Assessment Certification Process/Case Study/Building Critical Mass Phase

Question | Best rated elements | Rating
Is the element easy for you to understand? | General.Keyword, Technical.Format, Technical.Size | 9.2 / 10
Is the selection of the element's possible values clear and appropriate? | General.Description, Rights.Cost, Format.Size | 8.1 / 10
46. Building Critical Mass Phase
• Metadata Quality Certification Mark
– Introduced the concept of a "Quality Seal" for
each metadata record that a content provider
uploads to the Organic.Edunet Federation
– Recorded in the record's meta-metadata element (a hypothetical sketch follows this slide)
46
Metadata Quality Assessment Certification Process/Case Study/Building Critical Mass Phase
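Purely as an illustration, a quality seal of this kind might be attached to a record's meta-metadata section as sketched below; the element names, the vocabulary term and the helper function are hypothetical, not the actual Organic.Edunet implementation.

```python
# Hypothetical sketch: attaching a "Quality Seal" to the meta-metadata section
# of a LOM-like record. Field names and the seal value are illustrative only.

from datetime import date

def add_quality_seal(lom_record: dict, reviewer: str) -> dict:
    meta = lom_record.setdefault("metaMetadata", {})
    meta.setdefault("contribute", []).append({
        "role": "quality reviewer",        # assumed vocabulary term
        "entity": reviewer,
        "date": date.today().isoformat(),
    })
    meta["qualitySeal"] = "approved"       # assumed element name
    return lom_record

record = {"general": {"title": "Organic farming basics"}}
add_quality_seal(record, "metadata.expert@example.org")
print(record["metaMetadata"])
```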
47. Regular Operation Phase
• Regular Analysis of Usage Data coming from
the tool(s)
– Any improvement in the quality of the metadata?
– Measuring completeness only
– Analysis conducted in October 2010
47
Metadata Quality Assessment Certification Process/Case Study/Regular Operation Phase
Duration: 1 week
Annotated Objects: 11,000 objects (100%)
Actors involved: 2 metadata experts
Resources Analyzed: 11,000
54. Progress VS Publications (1/2)
Experiment | Phase | Date | Published
Application Profile Questionnaire & Hands-on annotation | Metadata Design | 1/2009 | JIAC 2009: Palavitsinis et al., "Interoperable metadata for a federation of learning repositories on organic agriculture and agroecology"
Metadata record review by metadata experts | Testing | 4/2009 | MTSR 2009: Palavitsinis et al., "Evaluation of a Metadata Application Profile for Learning Resources on Organic Agriculture"
Metadata record review by subject matter experts | Calibration | 6/2009 | ED-MEDIA 2011: Palavitsinis et al., "Metadata quality in learning repositories: Issues and considerations"
54
PhD Work
55. Progress VS Publications (2/2)
55
PhD Work
Experiment | Phase | Date | Published
Log files analysis from Annotation Tool | Metadata Design | 9/2009 | ICSD 2009: Palavitsinis et al., "Evaluating Metadata Application Profiles based on Usage Data"
Log files analysis from Annotation Tool | Testing | 10/2010 | ED-MEDIA 2011: Palavitsinis et al., "Metadata quality in learning repositories: Issues and considerations"
56. Early Publications
• Knowledge Organization Systems
– Online study of Knowledge Organization Systems
on agricultural and environmental sciences
• Palavitsinis & Manouselis, ITEE 2009
• Metadata Lifecycle
– “Towards a Digital Curation Framework for
Learning Repositories: Issues & Considerations”
• Palavitsinis et al., SE@M 2010
56
PhD Work
57. Real Users
• Organized a series of workshops involving
users annotating resources
– Organic.Edunet Summer School 2009
– Joint Technology Enhanced Learning Summer
School 2010
– American Farm School & Ellinogermaniki Agogi
workshops
– HSci Conference in Crete
• Working with users (i.e. subject-matter experts,
educators and metadata experts)
PhD Work/User Events
57
58. Stakeholder Consultation
• e-Conference: held during October 2010
(6/10-30/10)
• Experts on Quality for e-learning
• Two phases – four topics
• Provided input for a separate PhD chapter
PhD Work/e-Conference
58
59. Topics
• Each main topic had 4 refining questions
• Each main topic had 1 or 2 moderators
• The e-Conference had 2 administrators
• One keynote was recorded by Amee Evans Godwin of the
Institute for the Study of Knowledge Management in Education (ISKME)
PhD Work/e-Conference/Topics
59
61. Next Experiments
• Pilot Experiment in Agricultural Learning
Resources' Repository (completed)
– Organic.Edunet (Confolio)
• Validation Experiment in Scientific/Scholarly
Content Repository (ongoing)
– VOA3R case (in Calibration Phase)
• Validation Experiment in Cultural Content
Repository (ongoing)
– Natural Europe case (in Testing Phase)
61
Timetable
62. Timeline
• Timeline spanning 5/09 to 9/12, covering: Introductory Research, Literature Review (A), Adapted MeQuACeP, Pilot Experiment, Validation Experiments, Literature Review (B), and thesis writing
• Milestones marked on the slide: 5/09, 5/10, 10/10, 2/11, 12/11, 6/12, 9/12
62
Timetable
63. Next Steps
• 11/2011 – Journal paper on Metadata
Quality Assessment Certification Process (ready)
• 4/2012 – Journal paper on MeQuACeP
applied in other contexts (pending)
• 6-9/2012 – Writing of thesis
63
Next Steps