2. Multimodal Information
Single-language, text-only retrieval has reached its limits.
Content-based Image Retrieval is computationally costly and still in its infancy.
Digital information is increasingly becoming multimodal
Example: Wikipedia
3. Modality
Dictionary: A tendency to conform to a general
pattern or belong to a particular group or
category.
Definition of Modality in Information Retrieval
It is unclear and fuzzy
1st Definition: Modality = Media
2nd Definition: Modality = Data Stream
4. MMRetrieval.net
A Product of Cooperation
Started June 2010
Avi Arampatzis, Lecturer, D.U.T.H.
Konstantinos Zagoris, Ph.D., D.U.T.H.
Savvas A. Chatzichristofis, Ph.D. candidate, D.U.T.H.
5. ImageCLEF 2010
Wikipedia Retrieval Task
ImageCLEF 2010 Wikipedia Collection
Consisting of 237,434 items
Images as the primary media
Noisy and incomplete user-supplied textual annotations
Wikipedia articles containing the images
Written in any combination of English, German, French, or any other unidentified language
6. Wikipedia Collection
<image id="244845" file="images/25/244845.jpg">
<name>Balloons Festival - Chateaux d'Oex.jpg</name>
<text xml:lang="en">
<description/>
<comment/>
<caption article="text/en/4/331622">Balloon
festival </caption>
</text>
<text xml:lang="de">
<description/>
<comment/>
<caption/>
</text>
<text xml:lang="fr">
<description/>
<comment/>
<caption/>
</text>
<comment>(Balloon festival in Chateaux d'Oex.
Category:Chateau d'Oex Category:Hot air balloons)
</comment>
<license>GFDL</license>
</image>
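A minimal sketch of how the textual fields of such a record could be pulled out as separate (field, language) modalities, using Python's standard `xml.etree.ElementTree`. The function name `extract_text_modalities` and the dictionary layout are illustrative assumptions, not the MMRetrieval.net implementation; the `fr` block is omitted from the sample for brevity.

```python
import xml.etree.ElementTree as ET

# Sample record from the collection excerpt (fr block omitted for brevity).
RECORD = """<image id="244845" file="images/25/244845.jpg">
<name>Balloons Festival - Chateaux d'Oex.jpg</name>
<text xml:lang="en">
<description/>
<comment/>
<caption article="text/en/4/331622">Balloon festival </caption>
</text>
<text xml:lang="de">
<description/>
<comment/>
<caption/>
</text>
<comment>(Balloon festival in Chateaux d'Oex.
Category:Chateau d'Oex Category:Hot air balloons)
</comment>
<license>GFDL</license>
</image>"""

# ElementTree reports xml:lang under the predefined XML namespace URI.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def extract_text_modalities(xml_string):
    """Map (field, language) -> non-empty text; language None = language-independent."""
    root = ET.fromstring(xml_string)
    modalities = {}
    # Top-level fields that are not tied to a language.
    for field in ("name", "comment"):
        value = (root.findtext(field) or "").strip()
        if value:
            modalities[(field, None)] = value
    # Per-language annotation blocks; empty fields are common and skipped.
    for text_el in root.findall("text"):
        lang = text_el.get(XML_LANG)
        for field in ("description", "comment", "caption"):
            value = (text_el.findtext(field) or "").strip()
            if value:
                modalities[(field, lang)] = value
    return modalities


for key, value in extract_text_modalities(RECORD).items():
    print(key, "->", value)
```

In this record only three textual modalities are non-empty, which illustrates how incomplete the user-supplied annotations can be.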
7. ImageCLEF 2010
Wikipedia Retrieval Task
70 test topics
consisting of a textual and a visual part
three title fields (one per language: English, German, French)
one or more example images
8. Wikipedia Topic
<topic>
<number>8</number>
<title xml:lang="en">tennis player on court</title>
<title xml:lang="de">tennisspieler auf dem platz</title>
<title xml:lang="fr">joueur de tennis sur le terrain</title>
<image>2197587684_94542c6fbd.jpg</image>
<image>777629689_443a25ba08.jpg</image>
</topic>
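A similarly hedged sketch for topics: the hypothetical helper `parse_topic` splits a topic into its textual part (one title per language) and its visual part (the example images), again using only `xml.etree.ElementTree`.

```python
import xml.etree.ElementTree as ET

TOPIC = """<topic>
<number>8</number>
<title xml:lang="en">tennis player on court</title>
<title xml:lang="de">tennisspieler auf dem platz</title>
<title xml:lang="fr">joueur de tennis sur le terrain</title>
<image>2197587684_94542c6fbd.jpg</image>
<image>777629689_443a25ba08.jpg</image>
</topic>"""

XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def parse_topic(xml_string):
    """Return (number, {lang: title}, [example image file names])."""
    root = ET.fromstring(xml_string)
    number = int(root.findtext("number"))
    # Textual part: one title per language.
    titles = {t.get(XML_LANG): (t.text or "").strip() for t in root.findall("title")}
    # Visual part: one or more example images.
    images = [(img.text or "").strip() for img in root.findall("image")]
    return number, titles, images


number, titles, images = parse_topic(TOPIC)
print(number, titles["en"], images)
```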
9. Extraction of Modalities
Visual modalities:
Joint Composite Descriptor (JCD)
Spatial Color Distribution (SpCD)
Textual modalities (fields): description, comment, caption, article, name
Languages: English, French, German
Lemur Toolkit V4.11 and Indri V2.11 with the tf.idf retrieval model
11. Fusion in Information Retrieval
combining evidence about relevance from
different sources of information
from several modalities
fusion consists of two components
score normalization
score combination
12. Score Normalization
relevance scores from different modalities are not directly comparable
popular text retrieval models (tf.idf) can be turned into probabilities of relevance via the score-distributional method
image descriptor scores do not fit this method
MinMax (maps scores linearly to [0,1])
Z-score (maps each score to the number of standard deviations it lies above or below the mean score)
non-linear Known-Item Aggregate Cumulative Density Function (KIACDF)
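For illustration, the two linear normalizations can be written as below. This is a minimal sketch: the use of the population standard deviation and the all-zeros fallback for constant score lists are assumptions, and KIACDF is not shown.

```python
from statistics import mean, pstdev


def minmax(scores):
    """Map scores linearly onto [0, 1]."""
    lo, hi = min(scores), max(scores)
    if hi == lo:  # assumption: a constant score list carries no ranking signal
        return [0.0] * len(scores)
    return [(s - lo) / (hi - lo) for s in scores]


def zscore(scores):
    """Map each score to its distance from the mean, in standard deviations."""
    mu = mean(scores)
    sigma = pstdev(scores)  # assumption: population standard deviation
    if sigma == 0:
        return [0.0] * len(scores)
    return [(s - mu) / sigma for s in scores]


print(minmax([2.0, 4.0, 6.0]))  # [0.0, 0.5, 1.0]
print(zscore([2.0, 4.0, 6.0]))
```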
16. Fusion Problems
appropriate weighting of modalities and score normalization/combination are not trivial problems
if results are assessed by visual similarity only, fusion is not a theoretically sound method
17. Content-based Image Retrieval Problems
Content-based Image Retrieval (CBIR) with global features is notoriously noisy for image queries of low generality, where generality is the fraction of relevant images in a collection
does not scale well to large databases in terms of efficiency
18. Two-Stage Image Retrieval
how it works: first use the secondary modality to rank the collection, then perform CBIR only on the top-K items
assumption: image is the primary modality and text the secondary one
hypothesis: CBIR can do better than text retrieval in small sets or in sets of high query generality
efficiency benefit: using a ‘cheaper’ secondary modality also improves efficiency by cutting down on costly CBIR operations
possible drawback: relevant images with empty or very noisy secondary modalities would be completely missed
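A minimal sketch of static-K two-stage retrieval, under stated assumptions: `two_stage` and `cbir_score` are hypothetical names, higher scores mean more relevant, and items below rank K are assumed to keep their text order. Only the K head items ever reach the costly CBIR function.

```python
def two_stage(text_ranking, cbir_score, k):
    """Static-K two-stage retrieval.

    text_ranking: item ids sorted by the secondary (text) modality, best first.
    cbir_score:   function item_id -> visual similarity to the query (higher is better).
    k:            number of top text results to re-rank with CBIR.
    """
    head, tail = text_ranking[:k], text_ranking[k:]
    # Costly CBIR is computed only for the K head items.
    reranked = sorted(head, key=cbir_score, reverse=True)
    # Assumption: items below rank K keep their text-based order.
    return reranked + tail


visual = {"a": 0.1, "b": 0.9, "c": 0.5, "d": 1.0}
print(two_stage(["a", "b", "c", "d"], visual.get, 3))  # ['b', 'c', 'a', 'd']
```

Item `d` would match best visually but is never scored because text retrieval ranked it below K, which mirrors the drawback listed above.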
19. Previous Work
Re-ranking by visual content has produced some of the best results before, mostly in different setups
All these approaches employed a static, predefined K for all queries
it is not clear whether this works
20. Our Two-Stage Method
dynamic K
calculated per query
optimizes a predefined effectiveness measure
without using external information or training data
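One way to pick K per query, sketched under assumptions not stated on the slide: each item's probability of relevance is already estimated (for example from a score-distributional model of the text scores, not shown), and the target measure is F1, approximated as 2·E[relevant in top K] / (K + E[total relevant]). The helper `choose_dynamic_k` is hypothetical and is not claimed to be the exact procedure of the presented method.

```python
def choose_dynamic_k(probs):
    """Pick K maximizing an approximate expected F1.

    probs: estimated probability of relevance of each item, in text-rank order.
    Returns 0 when no K gives a positive score (i.e. skip CBIR re-ranking).
    """
    total = sum(probs)  # expected number of relevant items in the ranking
    best_k, best_f1 = 0, 0.0
    cumulative = 0.0  # expected number of relevant items in the top k
    for k, p in enumerate(probs, start=1):
        cumulative += p
        # F1 = 2PR/(P+R) with P = r/k and R = r/total simplifies to 2r/(k+total).
        f1 = 2 * cumulative / (k + total)
        if f1 > best_f1:
            best_k, best_f1 = k, f1
    return best_k


print(choose_dynamic_k([0.9, 0.8, 0.6, 0.1, 0.05]))  # 3
```

Because K is derived from the query's own score distribution, no training data or external information is needed, in line with the slide.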
22. Best Fusion Method – Max of Sums
i: index running over the example images (i = 1, 2, …)
j: index running over the visual descriptors (j ∈ {1, 2})
DESC_ji: score against the i-th example image for the j-th descriptor
parameter w controls the relative contribution of the two media
$$ s = (1 - w)\,\max_{i} \sum_{j} \mathrm{MinMax}(\mathrm{DESC}_{ji}) + w\,\mathrm{MinMax}(\text{tf.idf}) $$
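The Max-of-Sums score can be sketched directly from these definitions. Assumptions: each (descriptor j, example image i) score list is MinMax-normalized across items independently, `desc_scores[j][i]` holds the scores of all items for that pair (matching DESC_ji), constant score lists normalize to zeros, and the function names are hypothetical.

```python
def minmax(scores):
    lo, hi = min(scores), max(scores)
    if hi == lo:  # assumption: constant list -> all zeros
        return [0.0] * len(scores)
    return [(s - lo) / (hi - lo) for s in scores]


def max_of_sums(text_scores, desc_scores, w):
    """Fused score per item: (1 - w) * max_i sum_j MinMax(DESC_ji) + w * MinMax(tf.idf).

    text_scores: tf.idf score of each of the n items.
    desc_scores: desc_scores[j][i] is the list of n item scores against
                 example image i under visual descriptor j.
    w:           weight of the text modality, in [0, 1].
    """
    text_norm = minmax(text_scores)
    # Normalize every (descriptor, example image) score list independently.
    norm = [[minmax(per_image) for per_image in per_desc] for per_desc in desc_scores]
    num_desc, num_images = len(norm), len(norm[0])
    fused = []
    for item in range(len(text_scores)):
        # Sum over descriptors j for each example image i, then keep the best image.
        visual = max(
            sum(norm[j][i][item] for j in range(num_desc)) for i in range(num_images)
        )
        fused.append((1 - w) * visual + w * text_norm[item])
    return fused


text = [1.0, 2.0, 3.0]
desc = [
    [[0.0, 10.0, 5.0], [4.0, 0.0, 8.0]],  # j = 1 (e.g. JCD): example images i = 1, 2
    [[2.0, 2.0, 4.0], [1.0, 3.0, 2.0]],   # j = 2 (e.g. SpCD)
]
print(max_of_sums(text, desc, 0.5))  # [0.25, 0.75, 1.25]
```

Taking the max over example images lets an item score well if it resembles any one of them, while the sum over descriptors rewards agreement between JCD and SpCD.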
24. Implementation
• developed in C# on the .NET Framework 4.0
• HTML, CSS and JavaScript (AJAX) technologies for the interface
• requires a fairly modern browser
25. Directions for Further Research
Multi-stage retrieval for multimodal databases based on a modality hierarchy.
Fuzzy fusion (replace w with a membership function m).
Create artificial modalities (not only from relevance scores).
Pseudo-relevance feedback: cross-media feedback.