Understanding Medical Image Search Behavior Through Log Analysis
1. Log Analysis to Understand Medical Professionals' Image Searching Behaviour
Theodora Tsikrika
Henning Müller
Charles E. Kahn
2. Overview
• Medical image retrieval
• Motivation of our work
• Methods
• Log file analysis
• Search strategies
• Frequent information needs
• Use as topics for a retrieval benchmark
• Conclusions
3. Medical image retrieval
• Medical professionals search for visual information (images, videos) frequently and increasingly
• Radiologists in particular often search for images
• Internet search increasingly replaces searching in reference books and discussions with colleagues
• Images are important for differential diagnosis and for finding explanations of unclear visual patterns
• Different types of image search systems
• Text-based search for images
4. Motivation
• Knowing the search tasks, goals and query formulations of user groups is important for information retrieval
• To build new IR systems or benchmark existing ones
• Several surveys have been performed
• Log file analyses have been done as well
• MedLine log files, but these are not really about images
• HONmedia search, less focused: its users are not radiologists but rather the general public and health professionals
• Image search on the Internet by radiologists has increased strongly
5. Log file analysis
• Session level, query level, term level
• Search logs have received much attention as a way to learn more about user behavior
• Bad example: the release of the AOL search log (privacy!)
• The amount of information differs (IP addresses, time stamps)
• The session level is interesting, as much can be learned about behavior, query modifications, even satisfaction
• Terms added, removed, changed?
• Query and term level often focus on
6. Methods
• ARRS GoldMiner made a log file available
• 25,000 consecutive searches by medical professionals
• The search system is very popular with radiologists
• Allows search terms plus selection of gender, age and modality
• Search term normalization
• Conversion to lower case, removal of special characters and quotes
• Manual work: “xray”, “x-ray” and “x ray” all map to “xray”
• Removal of identical consecutive queries
• No time stamps or IP addresses available
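The normalization steps above can be sketched in Python. This is a minimal illustration, not the authors' code: the regular expression for the x-ray variants, the decision to keep "/" (so that "PET/CT" survives as one term), and the names `normalize` and `preprocess` are all assumptions. The slide notes that variant merging was done by hand; a rule like this only covers variants someone has already identified, and it also strips accented letters.

```python
import re

# Assumed pattern for the x-ray variants named on the slide ("x-ray", "x ray");
# the authors merged such variants manually.
XRAY_VARIANTS = re.compile(r"\bx[- ]?ray\b")
QUOTES = re.compile(r"[\"'“”‘’]")
# Anything other than lower-case ASCII letters, digits, '/' and spaces.
SPECIAL = re.compile(r"[^a-z0-9/ ]+")


def normalize(query: str) -> str:
    """Lower-case a query, merge x-ray variants, strip quotes and special characters."""
    q = query.lower()
    # Merge variants before punctuation removal, since "x-ray" needs its hyphen here.
    q = XRAY_VARIANTS.sub("xray", q)
    q = QUOTES.sub("", q)       # quotes are dropped entirely
    q = SPECIAL.sub(" ", q)     # other special characters become spaces
    return " ".join(q.split())  # collapse runs of whitespace


def preprocess(raw_queries: list[str]) -> list[str]:
    """Normalize queries, dropping empty ones and identical consecutive repeats."""
    cleaned: list[str] = []
    for raw in raw_queries:
        q = normalize(raw)
        if not q:
            continue
        if cleaned and cleaned[-1] == q:
            continue  # identical to the previous kept query after normalization
        cleaned.append(q)
    return cleaned


print(preprocess(['"X-Ray" of the knee!', "x ray of the knee", "PET/CT lung"]))
```

Deduplication runs after normalization here, so queries that differ only in case or spelling variant also count as repeats.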
7. Results of the analysis
• 23,033 queries after preprocessing, of which 14,413 (63%) are unique
• Mean query length 2.24 words, 2.46 for unique queries
• Similar to web search, one term shorter than MedLine queries
• Imaging modalities mentioned in queries:
• MRI (586), CT (425), ultrasound (199), xray (139), PET (34), PET/CT (13), angiography (13), echo (11), radiography (10), tomography (6), fMRI (3), PET/MRI (1)
• This despite the option to filter by modality
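The query-level figures above (share of unique queries, mean query length, modality mentions) can be computed from a cleaned query list with a short sketch like the following. It assumes normalized, whitespace-tokenized queries, counts each modality at most once per query, and uses a hypothetical `MODALITIES` vocabulary built from the terms on the slide; whether the original analysis counted queries or term occurrences is not stated. As a check on the reported share: 14,413 / 23,033 ≈ 0.626, i.e. about 63%.

```python
from collections import Counter

# Hypothetical modality vocabulary built from the terms listed on the slide.
# Queries are assumed normalized, so "x-ray" already reads "xray", and
# "pet/ct" is a single token that does not also count toward "pet" or "ct".
MODALITIES = {
    "mri", "ct", "ultrasound", "xray", "pet", "pet/ct", "angiography",
    "echo", "radiography", "tomography", "fmri", "pet/mri",
}


def mean_length(queries) -> float:
    """Average number of whitespace-separated terms per query."""
    queries = list(queries)
    return sum(len(q.split()) for q in queries) / len(queries)


def query_stats(queries: list[str]) -> dict:
    """Total and unique query counts, unique share, and mean query lengths."""
    unique = set(queries)
    return {
        "queries": len(queries),
        "unique": len(unique),
        "unique_share": len(unique) / len(queries),
        "mean_length": mean_length(queries),
        "mean_length_unique": mean_length(unique),
    }


def modality_counts(queries: list[str]) -> Counter:
    """Number of queries mentioning each modality (at most once per query)."""
    counts: Counter = Counter()
    for q in queries:
        counts.update(set(q.split()) & MODALITIES)
    return counts


log = ["mri brain", "ct chest", "mri brain", "pet/ct lung"]
print(query_stats(log))
print(modality_counts(log).most_common())
```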
9. Query modification
• 5,713 consecutive query pairs share at least one term and are assumed to belong to a single session
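Without time stamps or IP addresses, the session heuristic above can only rely on term overlap between consecutive queries. A minimal sketch, assuming normalized, whitespace-tokenized queries; the function names and the labels "added", "removed", "changed" and "same_terms" are hypothetical, the first three following the questions on the log file analysis slide.

```python
from collections import Counter


def terms(query: str) -> set[str]:
    """Set of whitespace-separated terms in a normalized query."""
    return set(query.split())


def session_pairs(queries: list[str]) -> list[tuple[str, str]]:
    """Consecutive query pairs sharing at least one term (assumed same session)."""
    return [(a, b) for a, b in zip(queries, queries[1:]) if terms(a) & terms(b)]


def modification(prev: str, curr: str) -> tuple[str, list[str], list[str]]:
    """Classify how the term set changed from one query to the next."""
    added = sorted(terms(curr) - terms(prev))
    removed = sorted(terms(prev) - terms(curr))
    if added and removed:
        kind = "changed"
    elif added:
        kind = "added"
    elif removed:
        kind = "removed"
    else:
        kind = "same_terms"  # e.g. only the word order differs
    return kind, added, removed


log = ["mri brain", "mri brain tumor", "ct chest", "ct lung"]
pairs = session_pairs(log)
print(Counter(modification(a, b)[0] for a, b in pairs))
```

Term overlap is a weak proxy: two unrelated searches by different users can share a common term such as "ct", which is one reason time stamps and user identification matter for session analysis.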
10. Use of terms for topics in ImageCLEF
• ImageCLEF: an image retrieval benchmark
• Uses images and text as queries; 17 groups participated in 2012
• Took the most frequent searches with at least two terms
• A radiologist ranked these search terms by usefulness in radiology
• The most useful terms were checked for whether documents in PubMed Central fulfill the need
• Result: 30 topics that are useful, frequent and have relevant documents available
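The automatic part of the topic selection (frequent searches with at least two terms) can be sketched as below. The radiologist's ranking and the PubMed Central check were manual steps and are not modeled; `candidate_topics` and its `top_n` parameter are hypothetical, since the slides do not state how large the candidate pool given to the radiologist was.

```python
from collections import Counter


def candidate_topics(queries: list[str], min_terms: int = 2,
                     top_n: int = 50) -> list[tuple[str, int]]:
    """Most frequent normalized queries having at least `min_terms` terms."""
    counts = Counter(q for q in queries if len(q.split()) >= min_terms)
    # Ties keep first-seen order (Counter.most_common behavior in Python 3.7+).
    return counts.most_common(top_n)


log = ["mri brain", "ct", "mri brain", "ct chest", "ct chest", "ct chest", "pet"]
for query, freq in candidate_topics(log, top_n=2):
    print(freq, query)
```

Requiring at least two terms filters out one-word searches such as "ct", which are too vague to serve as benchmark topics with a clear information need.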
11. Conclusions
• Analysis of log files can help understand user behavior
• Helps build better systems based on user models and analyze current approaches, including their shortcomings
• Time stamps and user identification are important for query session analysis
• We used implicit knowledge instead (shared terms between consecutive queries)
• Users do not know all details of the systems
• They search for modalities both in the query text and through filters
• Depending on the results, users change their query terms
12. Questions?
• More information can be found at
• http://www.khresmoi.eu/
• http://medgift.hevs.ch/
• http://publications.hevs.ch/
• Contact:
• Henning.mueller@hevs.ch