MMRetrieval.net
A Multimodal Search Engine
Multimodal Information
 Single-language, text-only retrieval has reached a limit.
 Content-based Image Retrieval is computationally
costly and still in its infancy.
 Digital Information is increasingly becoming
multimodal
 Example: Wikipedia
Modality
 Dictionary: A tendency to conform to a general
pattern or belong to a particular group or
category.
 Definition of Modality in Information Retrieval
 It is unclear, fuzzy
 1st Definition: Modality = Media
 2nd Definition: Modality = Data Stream
MMRetrieval.net
 A Product of Cooperation
 Started June, 2010
 Avi Arampatzis, Lecturer D.U.T.H.
 Konstantinos Zagoris, Ph.D. D.U.T.H.
 Savvas A. Chatzichristofis, Ph.D. candidate D.U.T.H.
ImageCLEF 2010
Wikipedia Retrieval Task
 ImageCLEF 2010 Wikipedia Collection
 Consisting of 237434 items
 Images are the primary medium
 Noisy and Incomplete User Supplied Textual
Annotations
 Wikipedia Articles Containing the Images
 Written in any combination of English, German,
French, or any other unidentified language
Wikipedia Collection
<image id="244845" file="images/25/244845.jpg">
<name>Balloons Festival - Chateaux d'Oex.jpg</name>
<text xml:lang="en">
<description/>
<comment/>
<caption article="text/en/4/331622">Balloon
festival </caption>
</text>
<text xml:lang="de">
<description/>
<comment/>
<caption/>
</text>
<text xml:lang="fr">
<description/>
<comment/>
<caption/>
</text>
<comment>(Balloon festival in Chateaux d'Oex.
Category:Chateau d'Oex Category:Hot air balloons)
</comment>
<license>GFDL</license>
</image>
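A minimal sketch of how one such collection item could be read into per-language text fields, using Python's standard xml.etree.ElementTree. The function name parse_item and the layout of the returned dictionary are illustrative assumptions, not part of the ImageCLEF tooling or of MMRetrieval.net.

```python
import xml.etree.ElementTree as ET

# ElementTree exposes the predeclared xml: prefix under this namespace URI.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

SAMPLE = """<image id="244845" file="images/25/244845.jpg">
<name>Balloons Festival - Chateaux d'Oex.jpg</name>
<text xml:lang="en">
<description/>
<comment/>
<caption article="text/en/4/331622">Balloon festival </caption>
</text>
<text xml:lang="de">
<description/>
<comment/>
<caption/>
</text>
<text xml:lang="fr">
<description/>
<comment/>
<caption/>
</text>
<comment>(Balloon festival in Chateaux d'Oex.
Category:Chateau d'Oex Category:Hot air balloons)
</comment>
<license>GFDL</license>
</image>"""


def parse_item(xml_string):
    """Return the id, file, name, general comment and non-empty text fields."""
    image = ET.fromstring(xml_string)
    item = {
        "id": image.get("id"),
        "file": image.get("file"),
        "name": (image.findtext("name") or "").strip(),
        # find/findtext only look at direct children, so this is the
        # language-independent comment, not the per-language ones.
        "comment": (image.findtext("comment") or "").strip(),
        "text": {},
    }
    for text in image.findall("text"):
        fields = {}
        for tag in ("description", "comment", "caption"):
            value = (text.findtext(tag) or "").strip()
            if value:  # keep only fields that actually carry text
                fields[tag] = value
        caption = text.find("caption")
        if caption is not None and caption.get("article"):
            fields["article"] = caption.get("article")
        item["text"][text.get(XML_LANG)] = fields
    return item


item = parse_item(SAMPLE)
```

In this sample only the English block carries any text, which illustrates how sparse the annotations can be.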
ImageCLEF 2010
Wikipedia Retrieval Task
 70 test topics
 consisting of a textual and a visual part
 three title fields (one per language—English,
German, French)
 one or more example images
Wikipedia Topic
<topic>
<number>8</number>
<title xml:lang="en">tennis player on court</title>
<title xml:lang="de">tennisspieler auf dem platz</title>
<title xml:lang="fr">joueur de tennis sur le terrain</title>
<image>2197587684_94542c6fbd.jpg</image>
<image>777629689_443a25ba08.jpg</image>
</topic>
Extraction of Modalities
Image descriptors: Joint Composite Descriptor (JCD), Spatial Color Distribution (SpCD)
Text fields: description, comment, caption, article, name
Languages: English, French, German
Text indexing: Lemur Toolkit V4.11 and Indri V2.11 with
the tf.idf retrieval model
MMRetrieval.net Structure
Fusion in Information Retrieval
 combining evidence about relevance from
different sources of information
 from several modalities
 fusion consists of two components
 score normalization
 score combination
Score Normalization
 relevance scores from different modalities are not directly comparable
 scores of popular text retrieval models (tf.idf) can be turned into
probabilities of relevance via the score-distributional method
 image descriptor scores do not fit this method
 MinMax (maps scores linearly to [0,1])
 Zscore (maps each score to the number of standard deviations it
lies above or below the mean score)
 non-linear Known-Item Aggregate Cumulative Density
Function (KIACDF)
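The two linear normalizations can be sketched as follows. This is a minimal illustration, assuming the scores of one modality for one query arrive as a plain list; the function names and the handling of constant score lists are assumptions, and KIACDF is not shown because the slides do not specify it.

```python
from statistics import mean, pstdev


def min_max(scores):
    """Map scores linearly so the minimum becomes 0 and the maximum 1."""
    lo, hi = min(scores), max(scores)
    if hi == lo:  # assumption: a constant list carries no ranking evidence
        return [0.0 for _ in scores]
    return [(s - lo) / (hi - lo) for s in scores]


def z_score(scores):
    """Express each score in standard deviations above or below the mean."""
    mu = mean(scores)
    sigma = pstdev(scores)  # population standard deviation of this result list
    if sigma == 0:
        return [0.0 for _ in scores]
    return [(s - mu) / sigma for s in scores]


normalized = min_max([2.0, 4.0, 6.0])
standardized = z_score([2.0, 4.0, 6.0])
```

MinMax output is bounded, while Zscore output is unbounded and centered on zero, so the two behave differently when the normalized scores are later summed or multiplied.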
Score Combination
 CompSUM
 CompMULT
 CompMAX
 CompMED
 CompWSUM
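A hedged sketch of what these combination rules usually mean, assuming each modality yields a dictionary from item id to an already normalized score. Reading the names as sum, product, maximum, median and weighted sum is an interpretation of the slide, and giving a missing item a score of 0 in that modality is an assumption of this example.

```python
from math import prod
from statistics import median


def combine(runs, how="sum", weights=None):
    """Fuse per-modality score dictionaries into one dictionary of scores."""
    docs = set().union(*runs)  # every item seen in at least one modality
    fused = {}
    for doc in docs:
        scores = [run.get(doc, 0.0) for run in runs]  # assumption: missing = 0
        if how == "sum":
            fused[doc] = sum(scores)
        elif how == "mult":
            fused[doc] = prod(scores)
        elif how == "max":
            fused[doc] = max(scores)
        elif how == "med":
            fused[doc] = median(scores)
        elif how == "wsum":
            fused[doc] = sum(w * s for w, s in zip(weights, scores))
        else:
            raise ValueError(f"unknown combination rule: {how}")
    return fused


text_run = {"a": 0.5, "b": 1.0}
image_run = {"a": 1.0, "c": 0.5}
by_sum = combine([text_run, image_run], "sum")
by_wsum = combine([text_run, image_run], "wsum", weights=[0.25, 0.75])
```

With the missing-score assumption above, the product rule drops any item that is absent from one modality, which matters when textual annotations are often empty.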
Results
Participant MAP
1 xrce 0.2765
2 unt 0.2251
3 telecom 0.2227
4 i2rcviu 0.2126
5 dcu 0.2039
6 cheshire 0.2014
7 duth 0.1998
8 uned 0.1927
9 daedalus 0.1820
10 sztaki 0.1794
11 nus 0.1581
12 rgu 0.0617
13 uaic 0.0423
Participant P@10
1 xrce 0.6114
2 duth 0.5200
3 i2rcviu 0.4971
4 cheshire 0.4929
5 telecom 0.4914
6 sztaki 0.4857
7 daedalus 0.4471
8 unt 0.4314
9 dcu 0.4271
10 uned 0.4200
11 nus 0.3529
12 rgu 0.2271
13 uaic 0.1543
Participant P@20
1 xrce 0.5407
2 duth 0.4836
3 telecom 0.4407
4 cheshire 0.4364
5 sztaki 0.4329
6 i2rcviu 0.4321
7 daedalus 0.4029
8 unt 0.3986
9 dcu 0.3907
10 uned 0.3671
11 nus 0.3264
12 uaic 0.1529
13 rgu 0.1514
Corrected Results
Participant MAP
1 xrce 0.2765
2 duth 0.2561
3 unt 0.2251
4 telecom 0.2227
5 i2rcviu 0.2126
6 dcu 0.2039
7 cheshire 0.2014
8 uned 0.1927
9 daedalus 0.1820
10 sztaki 0.1794
11 nus 0.1581
12 rgu 0.0617
13 uaic 0.0423
Participant P@10
1 xrce 0.6114
2 duth 0.5257
3 i2rcviu 0.4971
4 cheshire 0.4929
5 telecom 0.4914
6 sztaki 0.4857
7 daedalus 0.4471
8 unt 0.4314
9 dcu 0.4271
10 uned 0.4200
11 nus 0.3529
12 rgu 0.2271
13 uaic 0.1543
Participant P@20
1 xrce 0.5407
2 duth 0.4900
3 telecom 0.4407
4 cheshire 0.4364
5 sztaki 0.4329
6 i2rcviu 0.4321
7 daedalus 0.4029
8 unt 0.3986
9 dcu 0.3907
10 uned 0.3671
11 nus 0.3264
12 uaic 0.1529
13 rgu 0.1514
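For readers unfamiliar with the measures in these tables, the standard definitions of P@k and (mean) average precision can be sketched as below. The function names and the toy ranking are illustrative; the official ImageCLEF figures come from the track's own evaluation, not from this code.

```python
def precision_at_k(ranking, relevant, k):
    """Fraction of the top-k ranked items that are relevant."""
    return sum(1 for doc in ranking[:k] if doc in relevant) / k


def average_precision(ranking, relevant):
    """Mean of the precision values at the ranks of the relevant items."""
    hits = 0
    total = 0.0
    for rank, doc in enumerate(ranking, start=1):
        if doc in relevant:
            hits += 1
            total += hits / rank
    return total / len(relevant) if relevant else 0.0


def mean_average_precision(topics):
    """topics: list of (ranking, relevant_set) pairs, one per query."""
    return sum(average_precision(r, rel) for r, rel in topics) / len(topics)


ranking = ["a", "b", "c", "d"]
relevant = {"a", "c"}
p2 = precision_at_k(ranking, relevant, 2)
ap = average_precision(ranking, relevant)
```

P@10 and P@20 reward good results at the very top of the list, while MAP also accounts for relevant items found further down.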
Fusion Problems
 appropriate weighting of modalities and score
normalization/combination are not trivial
problems
 if results are assessed by visual similarity only,
fusion is not a theoretically sound method
Content-based Image Retrieval
Problems
 Content-based Image Retrieval (CBIR) with global
features is notoriously noisy for image queries of
low generality (generality being the fraction of
relevant images in a collection).
 CBIR does not scale up well to large databases
efficiency-wise
Two-Stage Image Retrieval
 how it works: first use the secondary modality to rank the
collection, then perform CBIR only on the top-K items
 assumption: primary (image) and secondary (text) modalities
 hypothesis: CBIR can do better than text retrieval in small
sets or sets of high query generality
 efficiency benefit: using a 'cheaper' secondary modality also
improves efficiency by cutting down on costly CBIR
operations
 possible drawback: relevant images with empty or very
noisy secondary modalities would be completely missed
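The two-stage idea can be sketched in a few lines. This is an illustration under stated assumptions: text scores are given as a dictionary, the costly CBIR step is a callable named image_score (hypothetical), K is passed in by the caller, and items below the top K keep their text order at the end of the list; the slides do not say how the system treats that tail.

```python
def two_stage(text_scores, image_score, k):
    """Rank by text first, then re-rank only the top-k items by image score."""
    # Stage 1: cheap secondary modality ranks the whole collection.
    ranked = sorted(text_scores, key=text_scores.get, reverse=True)
    head, tail = ranked[:k], ranked[k:]
    # Stage 2: costly CBIR is evaluated on the k best text results only.
    reranked = sorted(head, key=image_score, reverse=True)
    return reranked + tail  # assumption: the tail keeps its text order


text_scores = {"a": 0.9, "b": 0.8, "c": 0.7, "d": 0.1}
visual = {"a": 0.2, "b": 0.9, "c": 0.5, "d": 1.0}
cbir_calls = []


def image_score(doc):
    cbir_calls.append(doc)  # count how often the costly step runs
    return visual[doc]


result = two_stage(text_scores, image_score, k=3)
```

Item d has the highest visual score but a weak text score, so it is never re-ranked: the drawback noted in the last bullet, shown on toy data.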
Previous Work
 Re-ranking the top results by visual content has been
seen before
 mostly in different setups
 All these approaches employed a static, predefined
K for all queries
 it is not clear whether this works
Our Two-Stage Method
 dynamic K
 calculated dynamically per query
 chosen to optimize a predefined effectiveness measure
 without using external information or training
data
Retrieval Results
[Figure: results for the query "cockpit of an airplane"; panels: Image Only, Text Only, Static K=25, Dynamic K]
Best Fusion Method – Max of Sums
 i is the index running over the example images (i = 1, 2, …)
 j is the index running over the visual descriptors (j ∈ {1, 2})
 DESC_ji is the score against the i-th example image
for the j-th descriptor
 parameter w controls the relative contribution of
the two media
$$ s = (1 - w) \max_i \sum_j \mathrm{MinMax}(DESC_{ji}) + w \, \mathrm{MinMax}(tf.idf) $$
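A small sketch of this score for a single item, reading the formula as (1 - w) times the maximum over example images of the summed descriptor scores, plus w times the text score. It assumes all inputs are already MinMax-normalized; the function name and the nested-list layout desc[j][i] are illustrative assumptions.

```python
def max_of_sums(desc, text, w):
    """Fused score of one item.

    desc[j][i]: normalized score of descriptor j against example image i
    text:       normalized tf.idf score of the item
    w:          weight of the text medium, between 0 and 1
    """
    n_examples = len(desc[0])
    # For each example image, sum over descriptors; then keep the best image.
    best = max(
        sum(desc[j][i] for j in range(len(desc))) for i in range(n_examples)
    )
    return (1 - w) * best + w * text


# Two descriptors (rows) scored against two example images (columns).
desc = [[0.5, 0.25],
        [0.25, 1.0]]
score = max_of_sums(desc, text=0.75, w=0.5)
```

Taking the maximum over example images means an item only has to resemble one of the query images well, while the sum lets both descriptors contribute for that image.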
Fusion vs Two-Stage
Implementation
• developed in the C#/.NET
Framework 4.0
• HTML, CSS and JavaScript (AJAX)
technologies for the interface
• requires a fairly modern browser
Directions for Further Research
 Multi-stage retrieval for multimodal databases
based on modality hierarchy.
 Fuzzy Fusion (replace w with membership
function m).
 Create artificial modalities (not only from
relevance scores)
 pseudo relevance feedback – cross media
feedback
Publications
 Multimedia Search with Noisy Modalities: Fusion and
Multistage Retrieval. Avi Arampatzis, Savvas A.
Chatzichristofis, and Konstantinos Zagoris. In: CLEF
(Notebook Papers/LABs/Workshops), 22-23
September, Padua, Italy, 2010.
 www.MMRetrieval.net: A Multimodal Search Engine.
Konstantinos Zagoris, Avi Arampatzis, and Savvas A.
Chatzichristofis. In: Proceedings of the 3rd
International Conference on SImilarity Search and
APplications, SISAP 2010, Istanbul, Turkey, September
18-19, 2010. © Association for Computing Machinery
(ACM).