4. What Context?
• Collection context
– One “main” IPTC category per image
• 96,351 out of 97,760 images in 100k Belga
Collection
• Note: noisy data, in spite of it being edited
content!
E.g., we found lifestyle Beckham images annotated
as SPO, and even typos in IPTC category
assignment!
• User context
– Classified 813 users into IPTC categories to
represent their main interest (based on Belga
input about the user’s organizations)
5. Filter on IPTC?
//image[@IPTC eq SPO][about(.,Beckham)]
• Bad for recall:
– Not all images have been assigned IPTC
categories
• Bad for precision:
– Noisy assignment of IPTC categories to
images
• At least 4 of the top 10 SPO Beckham results do
not show Beckham taking part in sporting activities
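A toy sketch of why a hard filter hurts recall. The image ids and category assignments are made up for illustration (not Belga data), and `filter_on_iptc` is a hypothetical helper, not the engine that evaluates the query above.

```python
from typing import Dict, List, Optional


def filter_on_iptc(results: List[str],
                   image_category: Dict[str, Optional[str]],
                   wanted: str) -> List[str]:
    """Keep only results whose main IPTC category equals `wanted`.

    Images with no category (missing key or None) are silently dropped,
    which is the recall problem noted on this slide.
    """
    return [img for img in results if image_category.get(img) == wanted]


# Hypothetical text-retrieval results for "Beckham".
results = ["img1", "img2", "img3"]
# img3 was never assigned a category; img2 is labelled lifestyle.
image_category = {"img1": "SPO", "img2": "LIF", "img3": None}

kept = filter_on_iptc(results, image_category, "SPO")
```

Here `img3` is lost even if it shows a match, simply because nobody categorised it; a mislabelled image would pass or fail the filter just as arbitrarily.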
6. Retrieval Model
• Re-rank results based on cluster
membership
$$\lambda\,\rho_d(q) + (1-\lambda) \sum_{c \in \mathrm{Clusters}} \rho_c(q)\,\rho_c(d)$$
with $\rho_d(q) = P(Q|D)$, $\rho_c(q) = P(Q|c)$, $\rho_c(d) = P(D|c)$
– Modify scores based on document’s context
Oren Kurland and Lillian Lee.
ACM Transactions on Information Systems (TOIS), 27(3),
2009.
• Novelty in Vitalas:
– Modify scores based on user’s context
• Cluster formation based on user clicks
• Cluster selection based on user context
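A minimal sketch of the interpolated score on this slide, assuming the three probabilities P(Q|D), P(Q|c) and P(D|c) have already been estimated elsewhere. The function name, argument layout and the numbers are illustrative assumptions, not the Vitalas implementation.

```python
from typing import Dict


def rerank_score(lam: float,
                 p_q_d: float,
                 doc_id: str,
                 p_q_c: Dict[str, float],
                 p_d_c: Dict[str, Dict[str, float]]) -> float:
    """Return lam * P(Q|D) + (1 - lam) * sum over clusters of P(Q|c) * P(D|c).

    p_q_c maps cluster id -> P(Q|c) for the selected clusters.
    p_d_c maps cluster id -> {doc id -> P(D|c)}; a document that is not
    in a cluster contributes 0 for that cluster.
    """
    cluster_part = sum(
        p_q_c[c] * p_d_c.get(c, {}).get(doc_id, 0.0) for c in p_q_c
    )
    return lam * p_q_d + (1.0 - lam) * cluster_part


# Made-up probabilities for one image and two clusters.
p_q_c = {"SPO": 0.4, "POL": 0.1}
p_d_c = {"SPO": {"img1": 0.5}, "POL": {}}

mixed = rerank_score(0.5, 0.3, "img1", p_q_c, p_d_c)
clusters_only = rerank_score(0.0, 0.3, "img1", p_q_c, p_d_c)
text_only = rerank_score(1.0, 0.3, "img1", p_q_c, p_d_c)
```

With λ = 1 the score reduces to the initial retrieval score, and with λ = 0 only the cluster evidence counts.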
7. Retrieval Model
• Cluster formation:
– IPTC-image categories; forms disjoint clusters
– IPTC-user categories of users who clicked the
image; gives overlapping clusters
• Cluster selection:
– {d∈c}: cluster contains document
– {u∈c}: cluster/@category corresponds to
user's interests
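The two cluster formation options and the two selection rules can be sketched as follows. All function names and the toy click log are assumptions made for illustration; the slide does not specify the data structures used.

```python
from collections import defaultdict
from typing import Dict, Iterable, Optional, Set, Tuple

Clusters = Dict[str, Set[str]]


def image_clusters(image_category: Dict[str, Optional[str]]) -> Clusters:
    """One cluster per IPTC code; each image has one main code, so clusters are disjoint."""
    clusters = defaultdict(set)
    for image, category in image_category.items():
        if category is not None:
            clusters[category].add(image)
    return dict(clusters)


def user_clusters(clicks: Iterable[Tuple[str, str]],
                  user_category: Dict[str, str]) -> Clusters:
    """One cluster per user IPTC code, holding images clicked by users of that code.

    The same image can be clicked by users with different interests,
    so these clusters may overlap.
    """
    clusters = defaultdict(set)
    for user, image in clicks:
        category = user_category.get(user)
        if category is not None:
            clusters[category].add(image)
    return dict(clusters)


def select_by_document(clusters: Clusters, image: str) -> Set[str]:
    """{d in c}: clusters that contain the document."""
    return {c for c, members in clusters.items() if image in members}


def select_by_user(clusters: Clusters, user: str,
                   user_category: Dict[str, str]) -> Set[str]:
    """{u in c}: the cluster whose category matches the user's interest."""
    category = user_category.get(user)
    return {category} if category in clusters else set()


# Toy data: img3 has no IPTC category but was clicked by a sports user.
image_category = {"img1": "SPO", "img2": "LIF", "img3": None}
user_category = {"u1": "SPO", "u2": "POL"}
clicks = [("u1", "img1"), ("u2", "img1"), ("u1", "img3")]

by_image = image_clusters(image_category)
by_user = user_clusters(clicks, user_category)
```

In this toy log `img1` lands in both the SPO and POL user clusters, and the uncategorised `img3` still ends up in a cluster through the click of a sports user.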
8. Results on Click Prediction
NDCG  D       image   image   image   image   user    user    user    user
λ             0.0     0.1     0.4     0.7     0.0     0.1     0.4     0.7
ACE   0.1724  0.1423  0.1741  0.1721  0.1721  0.2070  0.1978  0.1767  0.1747
EBF   0.5527  0.4744  0.5460  0.5497  0.5504  0.4882  0.5519  0.5509  0.5509
EDU   0.0145  0.0163  0.0145  0.0145  0.0145  0.0165  0.0167  0.0155  0.0146
HTH   0.1308  0.1347  0.1308  0.1308  0.1308  0.6342  0.3712  0.1934  0.1414
HUM   0.1849  0.1612  0.1798  0.1772  0.1849  0.2109  0.2043  0.1776  0.1760
LAB   0.1331  0.1543  0.1331  0.1331  0.1331  0.2164  0.2339  0.1817  0.1380
LIF   0.1245  0.0888  0.1234  0.1233  0.1232  0.1894  0.1555  0.1121  0.1253
POL   0.0723  0.0586  0.0704  0.0717  0.0721  0.1054  0.0990  0.0916  0.0769
SOI   0.2880  0.1806  0.2883  0.2880  0.2880  0.2964  0.2970  0.2968  0.3008
SPO   0.1811  0.1801  0.1809  0.1806  0.1807  0.2151  0.2005  0.1839  0.1820
Related literature on evaluation methodology: Carterette and Jones, NIPS
2007, and Carterette, Allan, and Sitaraman, SIGIR 2006.
14. SPO Observations
• Re-ranking pushes the sports-related
images to the top
– No more images about the fires
– When λ=0.0 the initial retrieval score is not
taken into account (initial text ranking
ignored)
• Minimal differences between collection-
based and user-based cluster formation
– Archivists consider as sports-related those
images that users with sports-related
interests click on
19. POL Observations
• Re-ranking for a politics context shows a
difference in interpretation between the
archivist and the user group
– Archivists focussed on the actual political
rallies etc.
– Users focussed on the forest fires
27. Conclusions thus far
• Adaptation also retrieves images not
assigned an IPTC category, by considering
clusters formed by the images clicked by
users with the same interests
• Alternative cluster formation approaches
can be investigated; e.g., using visual
features
• Method easily adapted for personalised
and/or collaborative search
28. Potential for Personalization
• Which queries have the potential to
benefit from context adaptation
(personalisation)?
• The ones for which different users click on
different results
– Can be studied looking at nDCG of one user
assuming another user’s clicks are ideal
Jaime Teevan, Susan T. Dumais and Eric Horvitz. Potential for
Personalization. ACM Transactions on Computer-Human Interaction (ToCHI)
special issue on Data Mining for Understanding User Needs, 17(1), March
2010.
• Novel in Vitalas: compare IPTC-defined
user groups (instead of individual users)
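One way to read "nDCG of one user assuming another user's clicks are ideal" is sketched below: build the ranking that would be ideal for group A, then score it with group B's clicks as binary relevance. The binary gains, the log2 discount and the function names are assumptions for illustration; the slide does not give the exact formulation.

```python
import math
from typing import List, Set


def dcg(gains: List[float]) -> float:
    """Discounted cumulative gain with a log2(rank + 1) discount."""
    return sum(g / math.log2(rank + 1) for rank, g in enumerate(gains, start=1))


def ndcg(ranking: List[str], relevant: Set[str]) -> float:
    """nDCG of `ranking` with binary gains: 1 if the item was clicked, else 0."""
    gains = [1.0 if d in relevant else 0.0 for d in ranking]
    best = dcg(sorted(gains, reverse=True))
    return dcg(gains) / best if best > 0 else 0.0


def cross_ndcg(results: List[str], clicks_a: Set[str], clicks_b: Set[str]) -> float:
    """Score the ranking that is ideal for A against B's clicks.

    Items clicked by A are moved to the top; the sort is stable, so the
    original order is kept within the clicked and unclicked parts.
    """
    ranking_for_a = sorted(results, key=lambda d: d not in clicks_a)
    return ndcg(ranking_for_a, clicks_b)


# Made-up result list and clicks for two user groups.
results = ["a", "b", "c", "d"]
same_interest = cross_ndcg(results, {"c"}, {"c"})
different_interest = cross_ndcg(results, {"c"}, {"d"})
```

A value close to 1 suggests the two groups want the same results for the query; a low value suggests the query could benefit from context adaptation.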
37. Dean: Temporal Effect
• Log files: “Dean” = “Hurricane Dean”
• Still, query is quite ambiguous:
– James Dean
– Agyness Dean (a model)
– a (university) dean
– Dean Delannoit
– Howard Dean
– Dean Martin
• Context adaptation for “Dean” requires
archivist
38. Future Work
• Address various normalization issues
– In context adaptation (due to NLLR
approximation)
– In “potential for personalization”/adaptation
• Explore temporal dimension
– Combinations of collection and user context?
• Explore cross-media cluster-based
retrieval
– Use visual features in cluster formation
39. See also
“CWI” Vitalas demonstrations:
http://www.ins.cwi.nl/projects/M4/vitalas/
Collection context instead of user context:
http://www.ins.cwi.nl/projects/M4/vitalas/context_adaptation.html
Detectors trained by query log:
http://olympus.ee.auth.gr/diou/civr2009/