The document summarizes a presentation about evaluating information searching in digital cultural heritage collections beyond typical search evaluations. It discusses how the PATHS project evaluated multiple components that support exploration, including search, recommendations, visualizations, related items, hierarchies and user-created paths. The project used both system-oriented and user-oriented evaluations, like controlled user tests, to evaluate individual components and the integrated system. It identifies challenges in evaluating complex systems that go beyond simple ad hoc search tasks.
1. Evaluating Information Searching in
Digital Cultural Heritage: Thinking
Outside the (Search) Box?
Paul Clough
Information School, University of Sheffield, UK
Presented at the Evaluating Use and Impact Workshop 2016
2. Outline
• Evaluating search success
• Summary of the PATHS project
• Evaluation activities in PATHS
• Some issues and challenges
3. What makes a search system successful?
• Whether it retrieves ‘relevant’ documents
• How quickly it returns results
• How well it supports user interaction
• Whether the user is satisfied with the results
• How easily users can use the system
• Whether the system helps users resolve their information
needs, carry out their tasks or make decisions
• Whether the system impacts on the wider use environment
• …
Which of these are the most important?
Depends on who you ask, the users and their context
4. Focus (and contexts) of evaluation
Tefko Saracevic (1995)
5. Evaluating search
• Most typical evaluation in IR focuses on assessing the
quality of search results (system-oriented evaluation or IR)
– Evaluation is typically comparative (System A vs. System B)
– Most common evaluation criteria include relevancy, retrieval
effectiveness and retrieval efficiency
– Common evaluation measures include precision, recall, speed of
response
– Methods include standardised benchmarks (e.g. test collections) or
use of ad hoc (heuristic) testing
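As an illustration of the measures named above, the following is a minimal sketch of set-based precision and recall for a single query, used to compare two systems. The document IDs, the relevance judgements and the function name `precision_recall` are made up for the example and do not come from the PATHS evaluations.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for one query.

    retrieved: IDs returned by the system.
    relevant:  IDs judged relevant (e.g. from a test collection).
    """
    retrieved_set = set(retrieved)
    relevant_set = set(relevant)
    hits = len(retrieved_set & relevant_set)
    # Guard against empty sets to avoid division by zero.
    precision = hits / len(retrieved_set) if retrieved_set else 0.0
    recall = hits / len(relevant_set) if relevant_set else 0.0
    return precision, recall


# Hypothetical results for the same query from two systems.
system_a = ["d1", "d2", "d3", "d4"]
system_b = ["d2", "d6", "d7", "d8"]
relevant = ["d1", "d2", "d5"]

p_a, r_a = precision_recall(system_a, relevant)
p_b, r_b = precision_recall(system_b, relevant)
print(f"System A: precision={p_a:.2f} recall={r_a:.2f}")
print(f"System B: precision={p_b:.2f} recall={r_b:.2f}")
```

In a benchmark setting these per-query values would normally be averaged over many queries before comparing systems.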
6. Evaluating search
• But often we want to measure aspects of retrieval
performance beyond system effectiveness (user-oriented
evaluation or Interactive IR)
– User satisfaction with results, usability of the interface,
engagement, user performance with a task and effects of system
changes on user behaviour
– Common criteria include satisfaction, usability, utility, etc.
– Evaluation methods include lab-based controlled experiment,
naturalistic observation, predictive evaluation, etc.
– Measures often include characteristics of interaction (e.g. number
of queries issued), performance measures (e.g. number of relevant
documents saved) or subjective measures (e.g. usability,
engagement)
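The three kinds of measure listed above can be sketched from a single user session. This is only an illustrative example: the event log format, the `session_measures` function, the relevance judgements and the questionnaire ratings are all assumptions, not the instruments used in the talk.

```python
from statistics import mean


def session_measures(events, relevant_ids, usability_ratings):
    """Summarise one session with three kinds of measure.

    events: list of (kind, value) tuples, where kind is "query" or "save".
    relevant_ids: IDs judged relevant for the task.
    usability_ratings: questionnaire answers on a numeric scale.
    """
    queries = sum(1 for kind, _ in events if kind == "query")
    saved = {value for kind, value in events if kind == "save"}
    return {
        "queries_issued": queries,                         # interaction
        "saved_relevant": len(saved & set(relevant_ids)),  # performance
        "mean_usability": mean(usability_ratings),         # subjective
    }


# Hypothetical session log and questionnaire answers (1-5 scale).
events = [
    ("query", "roman coins"),
    ("save", "d1"),
    ("query", "roman coins britain"),
    ("save", "d4"),
    ("save", "d5"),
]
measures = session_measures(
    events, relevant_ids=["d1", "d5", "d9"], usability_ratings=[4, 5, 3]
)
print(measures)
```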
7. Landscape of (I)IR evaluation
Kelly, D. (2009). Methods for evaluating interactive information retrieval systems with users.
Foundations and Trends in Information Retrieval, 3(1-2), 1-224. DOI: 10.1561/1500000012.
8. Thinking outside the search box
• Evaluation typically focuses on the search box, but search-based
applications are typically rich in features to support
information searching and seeking
• Many search applications involve multiple components
– e.g. visualisations, recommendations, taxonomies, facets
• In practice evaluation will take place during system
development (formative and summative) using
combinations of system- and user-oriented methods
• But how do we evaluate components beyond the search
box from a system and user-oriented perspective?
Clough, P. (2015) Evaluation: Thinking Outside the (Search) Box, In Proceedings of the Forum for Information
Retrieval Evaluation (FIRE '14), Prasenjit Majumder, Mandar Mitra, Madhulika Agrawal, and Parth Mehta
(Eds.), ACM, New York, NY, USA, pp. 1-9.
http://ir.shef.ac.uk/cloughie/papers/Clough_FIRE2014.pdf
9. The PATHS project
• PATHS (Personalised Access To cultural Heritage Spaces) project
funded under EU FP7
• Multidisciplinary project involving academic and industrial partners from
various disciplines
– Cultural Heritage, Library and Information Science and Computer
Science
• Developed techniques to support expert and non-expert users with
navigating and using cultural heritage materials from Europeana
• Investigated use of trails/paths to facilitate narrative-like structures
through digital collections for use as guides and learning aids (like
exhibitions/guides in physical space)
“Cultural heritage involves rich and highly heterogeneous collections that are challenging to
archive and convey to the general public” Hardman, L., Aroyo, L., van Ossenbruggen, J. and
Hyvönen, E. (2009) Using AI to Access and Experience Cultural Heritage, IEEE Intelligent
Systems, 24(2), pp. 23-25.
10. The PATHS system
• More generally the PATHS system aims to support information seeking
and the ‘information journey’
– Recognising an information need
– Acquiring information
– Interpreting and validating this information
– Using the information
• PATHS also aims to support exploration (exploratory search) and help
users to make sense of concepts and items in a digital library collection
(sense-making)
– Providing functionalities to overview collections, aid interpretation of
information, use information to create paths
Clough, P. (2015) Supporting Exploration and Use of Digital Cultural Heritage Materials,
EuropeanaTech Insight, Issue 4. http://ir.shef.ac.uk/cloughie/publications.html
12. Interface components
• Interface components developed to support the activities in the
conceptual model and the specific requirements gathered from
user studies
– Standard search box and facets
– Thematic map-based visualisation (similar to Google Maps)
– Thesaurus based on data-driven concept hierarchy
– Links to related items (based on typed similarity)
– Item-level (non-personalised) recommendations (based on
mining Europeana logs)
– Features for creating, editing, publishing and following ‘paths’
(tree structures)
• Components used in desktop and mobile (iPad) interfaces
http://paths.sheffield.ac.uk/pathsui
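The slide describes ‘paths’ as tree structures over collection items. A minimal sketch of such a structure, and of ‘following’ a path in depth-first order, might look like the following. The class `PathNode`, its fields and the item IDs are hypothetical and are not taken from the PATHS implementation.

```python
from dataclasses import dataclass, field
from typing import Iterator, List


@dataclass
class PathNode:
    """One step in a path: a collection item plus an optional note."""
    item_id: str
    note: str = ""
    children: List["PathNode"] = field(default_factory=list)

    def add(self, child: "PathNode") -> "PathNode":
        """Attach a child step and return it so branches can be extended."""
        self.children.append(child)
        return child


def follow(node: PathNode) -> Iterator[str]:
    """Yield item IDs in depth-first order, as a reader following the path might."""
    yield node.item_id
    for child in node.children:
        yield from follow(child)


# Hypothetical path with one branch.
root = PathNode("item-1", note="Introduction")
branch = root.add(PathNode("item-2", note="First theme"))
branch.add(PathNode("item-3"))
root.add(PathNode("item-4", note="Second theme"))

order = list(follow(root))
print(order)
```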
14. Example ‘paths’
Goodale, P., Clough, P., Hall, M.,
Stevenson, M., Fernie, K., Griffiths, J., and
Agirre, E. (2013) Pathways to Discovery:
Supporting Exploration and Information Use
in Cultural Heritage Collections. In
Proceedings of Museums and the Web Asia
2013, Hong Kong, 9-12 December, 2013.
15. Evaluation activities in PATHS
• System development followed a classic user-centred approach
of requirements gathering, prototyping, evaluation [repeat]
• Evaluation of components (carried out by researchers to select the
best algorithms; had to learn from domains beyond search)
– Search box
– Recommender systems
– Visualisations
– Related/similar items
– Subject hierarchies and facets
• System architecture/infrastructure testing (carried out by software
developers)
• Evaluation of user interface designs (carried out by UI designers)
• Evaluation of the integrated prototype (carried out by ‘end users’)
– Controlled lab-based user testing
– Field trials
17. Issues and challenges
• Many issues and challenges face evaluation when we
think outside the search box, including
– Combining user- and system-oriented approaches (e.g. to
‘inform and predict’)
– Understanding the relationship between evaluation criteria (and
associated measures)
– Sharing evaluation practices between domains and disciplines
– Thinking beyond ad hoc search tasks
– Combining the evaluation results (e.g. does the whole = sum of
parts?)
– Evaluating whole-page relevance
• What constitutes success?
– It depends on the stakeholder, the user and their context
18. Otegi, A., Agirre, E. and Clough, P. (2014) Personalised PageRank for making recommendations in
digital cultural heritage collections, In Proceedings of the 2014 IEEE/ACM Joint Conference on
Digital Libraries (JCDL), 8-12 September 2014, pp. 49-52. [Recommendations]
Hall, M., Fernando, S., Clough, P., Soroa, A., Agirre, E., and Stevenson, M. (2014) Evaluating
hierarchical organisation structures for exploring digital libraries, Information Retrieval, Volume
17(4), pp. 351-379. [Automatic hierarchy induction]
Aletras, N., Stevenson, M. and Clough, P. (2013) Computing Similarity between Items in a Digital
Library of Cultural Heritage, Journal on Computing and Cultural Heritage, Volume 5(4), Article 16.
[Similarity of items]
Goodale, P., Clough, P., Hall, M., Stevenson, M., Fernie, K., Griffiths, J., and Agirre, E.
(2013) Pathways to Discovery: Supporting Exploration and Information Use in Cultural Heritage
Collections. In Proceedings of Museums and the Web Asia 2013, Hong Kong, 9-12 December,
2013. [Analysis of manually-created paths]
Agirre, E., Aletras, N., Clough, P., Fernando, S., Goodale, P., Hall, M., Soroa, A., and Stevenson,
M. (2013) PATHS: A System for Accessing Cultural Heritage Collections, In Proceedings of 51st
Annual Meeting of the Association for Computational Linguistics (ACL'13), Sofia, Bulgaria, August
4-9 2013. pp. 151-166. [Project overview]
Hall, M. and Clough, P. (2013) Exploring Large Digital Library Collections using a Map-based
Visualisation, In Proceedings of The International Conference on Theory and Practice of Digital
Libraries (TPDL 2013), pp. 216-227. [Collection visualisations]