Large Scale Online Learning of Image Similarity
Through Ranking
from G. Chechik, V. Sharma, U. Shalit, S. Bengio – JMLR 2010

by Lukas Tencer
Motivation
• Needed by applications that compare any kind of data:
    – image, video, web page, document
• Two levels of similarity:
    – Features (visual for images)
    – Semantic
• Large-scale learning: limited by computational cost, not by
  availability of data
• Which similarity does the user want to express, visual or semantic?
• The presented approach addresses semantic similarity once
  visual similarity is available
• Similarity learning normally requires pairwise distances, which are not
  always available
• Instead of pairwise distances, use relative similarity; two images are
  considered close:
    – if they are returned by the same query
    – if they have the same label
Example of query
• Especially a problem in QVE (Query by Visual Example)

• Query: (query image not preserved)
• Images retrieved for “mount royal park” vs. visually similar images
  (result images not preserved)
Motivation II
• Relationship to classification:
   – A similarity measure can be used as a metric for
     classification
   – A good classifier infers labels, which induce
     similarity across images
• Positive semidefinite constraint on the
  similarity matrix:
   – for small data sets it prevents overfitting
   – for large data sets with enough samples it can
     be dropped to reduce computational cost
Problem Statement
• Learn a pairwise similarity function S on the given data
  from relative similarities between pairs of images
• Given data P and relative similarities $r_{ij} = r(p_i, p_j)$
• We do not have access to all values of r; where a value
  is not available, it is set to 0
• Then $S(p_i, p_j)$ is defined by:
$S(p_i, p_i^+) > S(p_i, p_i^-), \quad \forall p_i, p_i^+, p_i^- \in P$ such that $r(p_i, p_i^+) > r(p_i, p_i^-)$

$S_W(p_i, p_j) = p_i^T W p_j$, where $W \in \mathbb{R}^{d \times d}$
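As a minimal sketch of the bilinear form above, the following computes $p_i^T W p_j$ on plain Python lists. The function name `bilinear_similarity` and the small example matrices are illustrative assumptions, not taken from the slides. The second matrix is deliberately non-symmetric to show that, without further constraints on W, the score need not be symmetric in its arguments.

```python
def bilinear_similarity(W, p_i, p_j):
    """Return S_W(p_i, p_j) = p_i^T W p_j for a list-of-lists W and list vectors."""
    # First form the row vector p_i^T W, then take its dot product with p_j.
    row = [sum(p_i[a] * W[a][b] for a in range(len(p_i))) for b in range(len(p_j))]
    return sum(row[b] * p_j[b] for b in range(len(p_j)))


identity = [[1.0, 0.0], [0.0, 1.0]]
skewed = [[1.0, 2.0], [0.0, 1.0]]  # hypothetical non-symmetric W
p_a, p_b = [1.0, 0.0], [0.0, 1.0]

dot_score = bilinear_similarity(identity, p_a, p_b)  # W = I gives the plain dot product
forward = bilinear_similarity(skewed, p_a, p_b)
backward = bilinear_similarity(skewed, p_b, p_a)
```

With W equal to the identity the score reduces to the ordinary dot product of the two feature vectors, which is a common starting point for this kind of model.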
Online Algorithm
• Passive-Aggressive (PA) family of online (iterative) learning
  algorithms
   – PA 1:
   $w_{t+1} = \arg\min_{w \in \mathbb{R}^n} \frac{1}{2} \|w - w_t\|^2$ such that $l(w; (x_t, y_t)) = 0$

   – Passive: if the loss is 0, w is left unchanged
   – Aggressive: if the loss is positive, the update enforces
     $l(w; (x_t, y_t)) = 0$ regardless of the required step size

   – PA 2: trade-off between proximity and the desired
     margin, a constrained optimization problem
Online Algorithm II
• So we are searching for S with a safety margin of 1, such
  that:
  $S_W(p_i, p_i^+) > S_W(p_i, p_i^-) + 1$
• The hinge loss function is defined as:
  $l_W(p_i, p_i^+, p_i^-) = \max\{0, 1 - S_W(p_i, p_i^+) + S_W(p_i, p_i^-)\}$

  $L_W = \sum_{(p_i, p_i^+, p_i^-) \in P} l_W(p_i, p_i^+, p_i^-)$
• Then the PA 2 constrained optimization problem is:
  $W^i = \arg\min_W \frac{1}{2} \|W - W^{i-1}\|_{Fro}^2 + C\xi$

  such that $l_W(p_i, p_i^+, p_i^-) \le \xi$ and $\xi \ge 0$
  where C is the parameter that controls the trade-off
  between margin enforcement and proximity of the solution
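The slide states the optimization problem but not how a single step is computed. The sketch below assumes the closed-form solution commonly used for this kind of passive-aggressive problem: W moves along $V = p_i (p_i^+ - p_i^-)^T$ with step size $\tau = \min\{C, l_W / \|V\|_{Fro}^2\}$. The function names and the toy vectors are illustrative assumptions.

```python
def similarity(W, p, q):
    """Bilinear score p^T W q on plain Python lists."""
    return sum(p[a] * W[a][b] * q[b] for a in range(len(p)) for b in range(len(q)))


def hinge_loss(W, p, p_pos, p_neg):
    """Triplet hinge loss with a safety margin of 1."""
    return max(0.0, 1.0 - similarity(W, p, p_pos) + similarity(W, p, p_neg))


def pa_step(W, p, p_pos, p_neg, C):
    """One passive-aggressive update; returns a new matrix and leaves W untouched."""
    loss = hinge_loss(W, p, p_pos, p_neg)
    if loss == 0.0:
        return W  # passive: the margin is already satisfied
    diff = [x - y for x, y in zip(p_pos, p_neg)]
    # V = p diff^T is rank one, so ||V||_Fro^2 = ||p||^2 * ||diff||^2.
    norm_sq = sum(x * x for x in p) * sum(x * x for x in diff)
    if norm_sq == 0.0:
        return W  # degenerate triplet, no direction to move in
    tau = min(C, loss / norm_sq)  # aggressive step, capped by C
    return [[W[a][b] + tau * p[a] * diff[b] for b in range(len(diff))]
            for a in range(len(p))]


W0 = [[1.0, 0.0], [0.0, 1.0]]
p, p_pos, p_neg = [1.0, 0.0], [0.0, 1.0], [1.0, 0.0]  # toy triplet

loss_before = hinge_loss(W0, p, p_pos, p_neg)
W1 = pa_step(W0, p, p_pos, p_neg, C=10.0)
loss_after = hinge_loss(W1, p, p_pos, p_neg)
```

With a large C the step is not capped and the triplet ends up exactly on the margin; a small C limits how far a single, possibly noisy, triplet can move W.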
Online Algorithm III




• A loss bound can be derived by rewriting the problem
  as a linear classification problem
Sampling strategy
• Uniformly sample pi from P
• Uniformly sample pi+ from the images in the same category as pi
• Uniformly sample pi- from the images that do not share a
  category with pi
   – pi- can be chosen at random from all images if the number of
     categories and the number of queries are very large
• If the relevance feedback r(pi,pj) is not just a binary function,
  the sampling of positive examples can be changed
  to prioritize samples with higher relevance
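The sampling steps above can be sketched as follows for the binary, category-based case. This is a sketch under assumptions: every category is assumed to contain at least two images, at least two categories are assumed to exist, and the positive is drawn from images other than pi itself (the slides do not say whether pi may be its own positive). The label list and function name are hypothetical.

```python
import random


def sample_triplet(labels, rng):
    """Return indices (i, pos, neg) for one training triplet.

    labels[j] is the category of image j.
    """
    i = rng.randrange(len(labels))  # uniform over all images
    # Positives share the category of i; i itself is excluded (assumption).
    positives = [j for j, lab in enumerate(labels) if lab == labels[i] and j != i]
    # Negatives belong to any other category.
    negatives = [j for j, lab in enumerate(labels) if lab != labels[i]]
    return i, rng.choice(positives), rng.choice(negatives)


labels = ["cat", "cat", "dog", "dog", "bird", "bird"]  # toy data
rng = random.Random(0)  # fixed seed for reproducibility
triplets = [sample_triplet(labels, rng) for _ in range(100)]
```

Building the positive and negative lists on every call is linear in the number of images; at web scale one would index images by category once and sample from that index instead.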
Image representation
• Bag-of-words approach (bag of local descriptors)
   – get regions of interest
   – calculate local descriptors
   – treat them independently
• Divide the image into overlapping square blocks
• Extract color and edge descriptors
   – Edge: uniform Local Binary Patterns, differences of intensities
     in a circular neighborhood
       • 2^8 possible sequences = 256-bin histogram
       • Non-uniform sequences can be merged → 59-bin histogram
   – Color: histograms from k-means clustering
       • Train a color codebook and map each block pixel to the closest value in the codebook
   – Concatenate the two descriptors at the end
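The 59-bin figure can be checked with a short sketch. It uses the usual definition of a uniform pattern, an 8-bit code with at most two 0/1 transitions when read circularly, and gives every uniform code its own bin while all non-uniform codes share one extra bin. The helper names are illustrative assumptions.

```python
def transitions(code, bits=8):
    """Count 0/1 changes when the bit pattern is read circularly."""
    count = 0
    for k in range(bits):
        current = (code >> k) & 1
        following = (code >> ((k + 1) % bits)) & 1
        count += current != following
    return count


# Uniform patterns: at most two circular transitions.
uniform_codes = [code for code in range(256) if transitions(code) <= 2]
bin_of = {code: index for index, code in enumerate(uniform_codes)}


def lbp_bin(code):
    """Histogram bin of an LBP code; non-uniform codes share the last bin."""
    return bin_of.get(code, len(uniform_codes))


num_bins = len(uniform_codes) + 1
```

Merging the non-uniform codes shrinks the edge histogram from 256 to `num_bins` entries per block.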
Image representation II
• Aim for a high-dimensional sparse vector representation
• Each local descriptor is therefore represented as a visual term, and
  the image is represented as a binary vector indicating
  presence/absence of each visual term
• Visual terms are weighted according to term frequency and
  inverse document frequency

• Parameters of the setup:
   –   20 bins for colors
   –   10000 visterm vocabulary size (approx. 70 nonzero values / img)
   –   Blocks of 64x64 pixels, overlapping every 32 pixels
   –   Blocks extracted at different scales, by downscaling images by
       a factor of 1.25 until fewer than 10 blocks remain
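To make the multi-scale block extraction concrete, the sketch below counts how many 64x64 blocks with a 32-pixel step fit at each scale. It rests on several assumptions that the slides do not settle: the downscaling factor is read as 1.25, scaled sizes are truncated to whole pixels, partial blocks at the border are dropped, and a scale with fewer than 10 blocks is itself discarded. The function name and the 320x240 example size are hypothetical.

```python
def blocks_per_scale(width, height, block=64, stride=32, factor=1.25, min_blocks=10):
    """Return [((w, h), block_count), ...] for successive downscaled versions."""
    scales = []
    w, h = float(width), float(height)
    while True:
        wi, hi = int(w), int(h)  # truncate to whole pixels (assumption)
        if wi < block or hi < block:
            break  # not even one full block fits
        across = (wi - block) // stride + 1
        down = (hi - block) // stride + 1
        count = across * down
        if count < min_blocks:
            break  # stop once fewer than min_blocks remain
        scales.append(((wi, hi), count))
        w, h = w / factor, h / factor
    return scales


scales = blocks_per_scale(320, 240)
block_counts = [count for _, count in scales]
```

For the 320x240 example this yields three usable scales; larger images contribute more scales and therefore more local descriptors per image.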
Experiments and evaluation
• Tested in 2 settings
   – Caltech256 dataset (30k images)
   – Web-scale experiment (2.7M images)
   – (other databases for image retrieval testing: MIRFLICKR-1M,
     Corel5k, Corel30k, UCID)
• Web-scale experiment:
   – Queries from Google Image Search and relevance feedback
   – Stop condition for training is the value of mean average precision (160M
     iterations), ~4000 min on a single CPU
   – Evaluation criteria: mAP and precision at top k
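The two evaluation criteria can be sketched as below. Each ranking is a list of 0/1 relevance flags in ranked order. One assumption to note: average precision is normalized by the number of relevant items that appear in the ranked list, which matches the usual definition only when every relevant item is retrieved. The function names and toy rankings are illustrative.

```python
def precision_at_k(relevant, k):
    """Fraction of the top-k results that are relevant."""
    return sum(relevant[:k]) / k


def average_precision(relevant):
    """Mean of precision values taken at the rank of each relevant result."""
    hits = 0
    total = 0.0
    for rank, is_relevant in enumerate(relevant, start=1):
        if is_relevant:
            hits += 1
            total += hits / rank
    return total / hits if hits else 0.0


def mean_average_precision(rankings):
    """Average of per-query average precision."""
    return sum(average_precision(r) for r in rankings) / len(rankings)


rankings = [[1, 0, 1, 0], [0, 1]]  # two toy queries
p_at_2 = precision_at_k(rankings[0], 2)
ap_first = average_precision(rankings[0])
map_score = mean_average_precision(rankings)
```

Precision at top k only looks at the head of the ranking, while mAP rewards placing every relevant image early, so the two measures can disagree on which model is better.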
Failure cases
Scalability
•   Comparison with Large Margin Nearest Neighbor (LMNN)
•   Scales linearly with number of images
Caltech 256 test
Discussion
• Metric learning can help capture semantic relationships once
  visual similarity is available
• Relevance feedback or a semantic similarity measure (class
  modeling) is required to capture semantic similarity
• Compared to raw visual similarity, precision at top k
  and mAP increase
   • recall is hard to measure for databases that are not fully
      annotated
• Online metric learning is an ongoing problem (Davis 2007) (Jain
  2008) (Chechik 2010) and, even though applied here to images, can
  be used in other fields to capture semantic similarity
   • Images: object semantics vs. visual features
   • Documents: topics vs. textual features (dtf, tf-idf)
   • SBIR: relative object mapping vs. sketch features
Thank you for your attention
              Available at: http://www.slideshare.net/lukastencer
