A location-aware embedding technique for
accurate landmark recognition
Federico Magliani, Navid Mahmoudian Bidgoli, Andrea Prati
ICDSC 2017 – Stanford, USA – 5-7 September 2017
Agenda
➢ Motivations
➢ Summary of contribution
➢ Related works
➢ Introduction to VLAD
➢ Proposed approach (locVLAD)
➢ Experimental results
➢ Conclusions and Future Works
Motivations
Landmark Recognition problem
➢ try to understand what is in front of you
➢ using client-server communication
➢ helping with geolocalization (GPS)
Motivations
➢ Challenges
○ high retrieval accuracy (precision)
○ fast search (quick response to a query)
○ low memory footprint (mobile friendly)
○ works well with big data (>100k items)
➢ Possible applications
○ augmented reality (tourism)
➢ Why mobile based?
○ everyone owns a mobile phone
○ a mobile phone has hardware powerful enough to run such applications
Motivations
“Changes in the image resolution, illumination conditions, viewpoint and the presence
of distractors such as trees or traffic signs (just to mention some) make the task of
matching features between a query image and the database rather difficult.”
➢ In order to mitigate these problems, the existing approaches rely on feature
description with a certain degree of invariance to scale, orientation and
illumination changes.
Summary of contribution
➢ A location-aware version of VLAD, called locVLAD, that outperforms the state of the art in the intra-dataset problem. It addresses a weakness of VLAD by reducing the noise contributed by features at the borders of the images
➢ The time for vocabulary creation is significantly reduced by using only a random ⅕ of the detected features
➢ A new balanced version of the public dataset ZuBuD is proposed and made available to the scientific community (ZuBuD+)
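The vocabulary speed-up in the second point amounts to random subsampling of the descriptors before clustering. A minimal sketch, assuming the descriptors are stacked in a NumPy array; the function name `sample_for_vocabulary`, the seed, and the synthetic data are illustrative and not taken from the slides:

```python
import numpy as np

def sample_for_vocabulary(descriptors, fraction=0.2, seed=0):
    """Keep a random fraction of the descriptors (rows) for codebook training."""
    rng = np.random.default_rng(seed)
    n = descriptors.shape[0]
    n_keep = max(1, int(round(n * fraction)))
    keep = rng.choice(n, size=n_keep, replace=False)  # sample without repetition
    return descriptors[keep]

# Synthetic stand-in for detected 128-d descriptors.
all_descriptors = np.random.default_rng(1).normal(size=(1000, 128))
subset = sample_for_vocabulary(all_descriptors)
print(subset.shape)
```

The reduced set would then be passed to K-means in place of the full set, so the clustering cost drops roughly in proportion to the number of samples.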
Related work
➢ Bag of Words (BoW): first method for solving the problem (different techniques: vocabulary tree, …)
➢ Fisher vector: embedding based on the Fisher kernel
➢ VLAD and its variants: simplified version of the Fisher vector
➢ Hamming embedding: embedding based on binarized descriptors
➢ CNN based: deep neural networks that end with classification layers
Proposed Pipeline
VLAD (Vector of Locally Aggregated Descriptors)
C = {c_1, ..., c_k}: codebook of k visual words (K-means clustering)
1. Every local descriptor x_j extracted from the image is assigned to the closest cluster center of the codebook: c_i = NN(x_j)
2. v_i = ∑_{x_j : NN(x_j) = c_i} (x_j - c_i) (sum of residuals)
3. The VLAD vector is the concatenation of the k d-dimensional vectors v_i (i = 1, …, k)
4. The VLAD vector is normalized to counteract the burstiness problem
Example: 16 centroids, features described with 128-d SIFT → D = 128 x 16 = 2048
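The four steps can be sketched in NumPy as follows. This is an illustrative sketch, not the authors' implementation: the random arrays stand in for K-means centroids and SIFT descriptors, and step 4 uses a plain L2 normalization only.

```python
import numpy as np

def vlad_encode(descriptors, codebook):
    """Aggregate local descriptors (n, d) against a codebook (k, d) into a k*d vector."""
    k, d = codebook.shape
    # Squared Euclidean distance between every descriptor and every centroid.
    dists = ((descriptors[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
    nearest = dists.argmin(axis=1)  # step 1: index i such that c_i = NN(x_j)
    vlad = np.zeros((k, d))
    for i in range(k):
        assigned = descriptors[nearest == i]
        if len(assigned):
            vlad[i] = (assigned - codebook[i]).sum(axis=0)  # step 2: sum of residuals
    vlad = vlad.reshape(-1)  # step 3: concatenation of v_1 ... v_k
    norm = np.linalg.norm(vlad)
    return vlad / norm if norm > 0 else vlad  # step 4: plain L2 normalization

rng = np.random.default_rng(0)
codebook = rng.normal(size=(16, 128))      # stand-in for 16 K-means centroids
descriptors = rng.normal(size=(500, 128))  # stand-in for 128-d SIFT descriptors
v = vlad_encode(descriptors, codebook)
print(v.shape)  # (2048,)
```

With 16 centroids and 128-d descriptors the output has 16 x 128 = 2048 components, matching the example on the slide.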
VLAD normalization
➢ Signed Square Rooting normalization: sign(x_i) sqrt(|x_i|) followed by L2 norm
➢ Residual normalization: independent L2 norm of each residual, followed by L2 norm
➢ Z-Score normalization: residual normalization followed by subtraction of the mean from every vector and division by the standard deviation
➢ Power normalization: sign(x_i) |x_i|^α (usually α = 0.2) followed by L2 norm
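Two of these normalizations are simple element-wise operations and can be sketched directly. This is a minimal NumPy sketch with illustrative function names. Signed square rooting is power normalization with α = 0.5.

```python
import numpy as np

def l2_normalize(v):
    """Scale v to unit Euclidean length (zero vectors are returned unchanged)."""
    norm = np.linalg.norm(v)
    return v / norm if norm > 0 else v

def power_normalize(v, alpha=0.2):
    """sign(x_i) * |x_i|**alpha, followed by L2 normalization."""
    return l2_normalize(np.sign(v) * np.abs(v) ** alpha)

def ssr_normalize(v):
    """Signed square rooting: sign(x_i) * sqrt(|x_i|), followed by L2 normalization."""
    return power_normalize(v, alpha=0.5)

v = np.array([4.0, -9.0, 0.0, 16.0])
print(ssr_normalize(v))    # proportional to [2, -3, 0, 4]
print(power_normalize(v))  # large components are damped more strongly
```

Both functions compress large components relative to small ones, which is how they reduce the weight of bursty visual words.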
Proposed approach: locVLAD
➢ This method improves the performance of VLAD vectors in the recognition problem.
➢ It does so by reducing the influence of features found at the borders of the image.
How does it work?
It builds a new global descriptor: the mean of the VLAD descriptor of the original query image (v̇) and the VLAD descriptor computed on a cropped version of the query image (v̇_cropped).
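The combination step can be sketched as an element-wise mean of the two descriptors. The function name `locvlad` and the optional L2 re-normalization of the result are assumptions made for illustration. The slides only state that the mean is taken.

```python
import numpy as np

def locvlad(v_full, v_cropped, renormalize=True):
    """Mean of the VLAD of the full query image and the VLAD of its cropped version."""
    v = 0.5 * (np.asarray(v_full, dtype=float) + np.asarray(v_cropped, dtype=float))
    if renormalize:  # assumption: re-normalization is not stated in the slides
        norm = np.linalg.norm(v)
        if norm > 0:
            v = v / norm
    return v

# Toy 3-d vectors standing in for two VLAD descriptors of the same query.
v_full = np.array([0.6, 0.8, 0.0])
v_cropped = np.array([0.0, 0.8, 0.6])
print(locvlad(v_full, v_cropped, renormalize=False))
```

Because the central features contribute to both descriptors and the border features to only one, the mean effectively halves the weight of the border features instead of discarding them.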
Proposed approach: locVLAD
The size of the cropped image is a parameter that depends on the dataset
➢ ZuBuD → 90% of the original query images
➢ Holidays → 70% of the original query images.
16424 features detected 367 features detected
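One way to approximate the cropped image without re-running the detector is to keep only the keypoints that fall inside a centered rectangle. This is a simplification for illustration. It assumes the percentage scales each side of the image, which the slides do not specify, and the name `central_crop_mask` is hypothetical.

```python
import numpy as np

def central_crop_mask(points, width, height, fraction):
    """Boolean mask of keypoints (x, y) that lie inside the centered crop.

    `fraction` scales each side of the image (assumption, see lead-in).
    """
    points = np.asarray(points, dtype=float)
    margin_x = width * (1.0 - fraction) / 2.0
    margin_y = height * (1.0 - fraction) / 2.0
    inside_x = (points[:, 0] >= margin_x) & (points[:, 0] <= width - margin_x)
    inside_y = (points[:, 1] >= margin_y) & (points[:, 1] <= height - margin_y)
    return inside_x & inside_y

# Four example keypoints on a 320x240 query image, crop at 90% per side.
points = np.array([[5, 5], [160, 120], [318, 200], [40, 30]])
mask = central_crop_mask(points, width=320, height=240, fraction=0.9)
print(mask)  # [False  True False  True]
```

The descriptors selected by the mask would then be encoded as the cropped VLAD descriptor.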
Why does it increase the performance?
Because the features that matter for recognition are usually located in the center of the image, while the features close to the border tend to be noisy.
Why not apply VLAD encoding directly to the cropped image?
Because useful information might be lost: there is no guarantee that the features at the borders are only noise.
Why not create a cropped vocabulary?
Experiments were conducted, but the results were poor.
Datasets
➢ INRIA Holidays (1491 images at 2448x3264: 500 classes, 500 queries)
➢ ZuBuD (1005 images at 640x480: 201 classes, 115 queries at 320x240)
➢ ZuBuD+ (1005 images at 640x480: 201 classes, 1005 queries at 320x240)
(Figure slides: Holidays, ZuBuD)
ZuBuD+
It is the balanced version of ZuBuD
➢ 1005 queries at 320x240 instead of 115.
➢ The new query images are randomly chosen database images, distinct from the other query images, transformed by
○ rotation (±90°) and resize, or
○ resize only
Download: http://implab.ce.unipr.it/?page_id=194
Evaluation Metrics
Different evaluation metrics are used to compare with the state-of-the-art approaches:
➢ Top1 → retrieval accuracy, evaluating only the first position of the ranking
➢ 5 x Recall in Top5 → average number of correct images among the top 5 results of the ranking
➢ mAP (mean Average Precision) → mean over all queries of the Average Precision score, which depends on the positions of the correct results in the ranking
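The three metrics can be sketched in plain Python. This is a sketch under stated assumptions: each ranking is a list of database image ids ordered by similarity, each query has a set of relevant ids, and "5 x Recall in Top5" is read as the average count of relevant images in the first five positions. The function names and toy data are illustrative.

```python
def top1(rankings, relevant):
    """Fraction of queries whose first result is relevant."""
    hits = sum(1 for r, rel in zip(rankings, relevant) if r[0] in rel)
    return hits / len(rankings)

def recall_in_top5_x5(rankings, relevant):
    """Average number of relevant images among the first five results."""
    counts = [sum(1 for item in r[:5] if item in rel)
              for r, rel in zip(rankings, relevant)]
    return sum(counts) / len(counts)

def average_precision(ranking, rel):
    """Mean of precision@position over the positions of the relevant results."""
    hits, precisions = 0, []
    for position, item in enumerate(ranking, start=1):
        if item in rel:
            hits += 1
            precisions.append(hits / position)
    return sum(precisions) / len(rel) if rel else 0.0

def mean_average_precision(rankings, relevant):
    aps = [average_precision(r, rel) for r, rel in zip(rankings, relevant)]
    return sum(aps) / len(aps)

# Two toy queries: the first has four relevant images, the second has one.
rankings = [["a1", "b1", "a2", "a3", "c1", "a4"], ["b1", "b2", "a1"]]
relevant = [{"a1", "a2", "a3", "a4"}, {"b2"}]
print(top1(rankings, relevant))               # 0.5
print(recall_in_top5_x5(rankings, relevant))  # 2.0
print(mean_average_precision(rankings, relevant))
```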
Results on ZuBuD (and ZuBuD+)
Method                       | Descriptor size | Top1     | 5 x Recall in Top5
Tree histogram (ZuBuD) [7]   | 10M             | 98.00 %  | -
Decision tree (ZuBuD) [9]    | n/a             | 91.00 %  | -
Sparse coding (ZuBuD) [22]   | 8k*64+1k*36     | -        | 4.538
VLAD (ZuBuD) [12]            | 4281*128        | 99.00 %  | 4.416
VLAD (ZuBuD+) [12]           | 4281*128        | 99.00 %  | 4.526
locVLAD (ZuBuD)              | 4281*128        | 100.00 % | 4.469
locVLAD (ZuBuD+)             | 4281*128        | 100.00 % | 4.543
It is worth noting that on ZuBuD the method based on sparse coding slightly outperforms the proposed one in 5 x Recall in Top5 (4.538 vs. 4.469).
This is due to the unbalanced query set and, probably, to the use of color information.
Results on Holidays
Method             | Descriptor size | mAP
Sparse coding [22] | 8k*64+1k*36     | 76.51 %
VLAD [12]          | 4281*128        | 74.43 %
locVLAD            | 4281*128        | 77.20 %
Sparse coding [4]  | 20k*128         | 79.00 %
VLAD [12]          | 20k*128         | 78.78 %
locVLAD            | 20k*128         | 80.89 %
Vocabulary creation
Conclusions
➢ The proposed locVLAD technique includes, to a certain degree, information on the location of the features, mitigating the negative effects of distractors found at the image borders.
➢ Experiments performed on two public datasets, ZuBuD and Holidays, demonstrate superior recognition accuracy w.r.t. the state of the art.
Future works
➢ Compression: reduce the dimension of the descriptors while keeping the same retrieval accuracy (mobile friendly).
➢ Indexing: create a system for evaluation in a large-scale domain (adding up to 1M distractors), moving from the Nearest Neighbor problem to the Approximate Nearest Neighbor problem. We are working with kd-trees and permutation-based methods.
➢ Sparse coding: new methods for the creation of the vocabulary and the assignment of the features to the VLAD vector.
Thank you for your attention!
questions?
http://implab.ce.unipr.it
Antenna efficency lecture course chapter 3.pdf
 
Event Management System Vb Net Project Report.pdf
Event Management System Vb Net  Project Report.pdfEvent Management System Vb Net  Project Report.pdf
Event Management System Vb Net Project Report.pdf
 
Democratizing Fuzzing at Scale by Abhishek Arya
Democratizing Fuzzing at Scale by Abhishek AryaDemocratizing Fuzzing at Scale by Abhishek Arya
Democratizing Fuzzing at Scale by Abhishek Arya
 
Water Industry Process Automation and Control Monthly - May 2024.pdf
Water Industry Process Automation and Control Monthly - May 2024.pdfWater Industry Process Automation and Control Monthly - May 2024.pdf
Water Industry Process Automation and Control Monthly - May 2024.pdf
 
Halogenation process of chemical process industries
Halogenation process of chemical process industriesHalogenation process of chemical process industries
Halogenation process of chemical process industries
 
Natalia Rutkowska - BIM School Course in Kraków
Natalia Rutkowska - BIM School Course in KrakówNatalia Rutkowska - BIM School Course in Kraków
Natalia Rutkowska - BIM School Course in Kraków
 
İTÜ CAD and Reverse Engineering Workshop
İTÜ CAD and Reverse Engineering WorkshopİTÜ CAD and Reverse Engineering Workshop
İTÜ CAD and Reverse Engineering Workshop
 

A location-aware embedding technique for accurate landmark recognition

  • 1. A location-aware embedding technique for accurate landmark recognition Federico Magliani, Navid Mahmoudian Bidgoli, Andrea Prati ICDSC 2017 – Stanford, USA – 5-7 September 2017
  • 2. Agenda ➢ Motivations ➢ Summary of contribution ➢ Related works ➢ Introduction to VLAD ➢ Proposed approach (locVLAD) ➢ Experimental results ➢ Conclusions and Future Works
  • 3. Motivations: the Landmark Recognition problem ➢ try to understand what is in front of you ➢ using client-server communication ➢ with help from geolocalization (GPS)
  • 4. Motivations ➢ Challenges ○ high retrieval accuracy (precision) ○ fast search (response to query) ○ low memory footprint (mobile friendly) ○ works well with big data (>100k images) ➢ Possible applications ○ augmented reality (tourism) ➢ Why mobile based? ○ everyone owns a mobile phone ○ a mobile phone has hardware powerful enough to run some of these applications
  • 5. Motivations “Changes in the image resolution, illumination conditions, viewpoint and the presence of distractors such as trees or traffic signs (just to mention some) make the task of matching features between a query image and the database rather difficult.” ➢ To mitigate these problems, the existing approaches rely on feature descriptions with a certain degree of invariance to scale, orientation and illumination changes.
  • 7. Summary of contribution ➢ A location-aware version of VLAD, called locVLAD, that outperforms the state of the art in the intra-dataset problem. It addresses a weakness of VLAD by reducing the noise from features at the borders of the images ➢ The time for vocabulary creation is significantly reduced by using only a random ⅕ of the detected features ➢ A new balanced version of the public dataset ZuBuD is proposed and made available to the scientific community (ZuBuD+)
  • 9. Related work ➢ Bag of Words (BoW): the first method for solving the problem (different techniques: vocabulary tree, …) ➢ Fisher vector: embedding based on the Fisher kernel ➢ VLAD and its variants: a simplified version of the Fisher vector ➢ Hamming embedding: embedding based on binarized descriptors ➢ CNN based: deep neural networks that end with classification layers
  • 12. VLAD (Vector of Locally Aggregated Descriptors). $C = \{c_1, \dots, c_k\}$ is a codebook of $k$ visual words (K-means clustering). 1. Every local descriptor $x_j$ extracted from the image is assigned to the closest cluster center of the codebook: $c_i = \mathrm{NN}(x_j)$. 2. The residuals are accumulated per visual word: $v_i = \sum_{x_j : \mathrm{NN}(x_j) = c_i} (x_j - c_i)$. 3. The VLAD vector is the concatenation of the $d$-dimensional vectors $v_i$ ($i = 1, \dots, k$). 4. VLAD normalization counters the burstiness problem. Example: 16 centroids with 128-dimensional SIFT features give $D = 128 \times 16 = 2048$.
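The assignment, residual and concatenation steps above can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the function names (`vlad_encode`, `l2_normalize`), the 2-dimensional toy descriptors and the 2-word codebook are hypothetical and not taken from the slides, and a real pipeline would use 128-dimensional SIFT descriptors with a K-means codebook.

```python
import math

def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((p - q) ** 2 for p, q in zip(a, b))

def vlad_encode(descriptors, codebook):
    """Return the unnormalized VLAD vector (length k * d) for one image."""
    k, d = len(codebook), len(codebook[0])
    residual_sums = [[0.0] * d for _ in range(k)]
    for x in descriptors:
        # Step 1: assign the descriptor to its nearest visual word.
        i = min(range(k), key=lambda j: squared_distance(x, codebook[j]))
        # Step 2: accumulate the residual x - c_i for that word.
        for t in range(d):
            residual_sums[i][t] += x[t] - codebook[i][t]
    # Step 3: concatenate v_1, ..., v_k into one vector.
    return [value for v in residual_sums for value in v]

def l2_normalize(vector):
    """Scale the vector to unit L2 norm (all-zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(value * value for value in vector))
    return [value / norm for value in vector] if norm > 0 else list(vector)

# Toy data (assumed for illustration): k = 2 visual words, d = 2 dimensions.
codebook = [[0.0, 0.0], [10.0, 10.0]]
descriptors = [[1.0, 0.0], [0.0, 2.0], [9.0, 11.0]]
vlad = vlad_encode(descriptors, codebook)
```

With this toy input the first two descriptors fall on the first visual word and the third on the second, so the resulting vector has length k × d = 4, mirroring the 128 × 16 = 2048 example on the slide.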
  • 13. VLAD normalization ➢ Signed Square Rooting normalization: $\mathrm{sign}(x_i)\sqrt{|x_i|}$ followed by L2 norm ➢ Residual normalization: independent L2 norm of each residual followed by L2 norm ➢ Z-Score normalization: residual normalization followed by subtraction of the mean from every vector and division by the standard deviation ➢ Power normalization: $\mathrm{sign}(x_i)\,|x_i|^{\alpha}$ (usually $\alpha = 0.2$) followed by L2 norm
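As a small sketch of the two component-wise variants listed above, the following Python treats signed square rooting as power normalization with α = 0.5. The function names and the sample vector are assumptions made for illustration. Residual and Z-Score normalization are left out because the slide does not give enough detail to implement them faithfully.

```python
import math

def l2_normalize(vector):
    """Scale the vector to unit L2 norm (all-zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(v * v for v in vector))
    return [v / norm for v in vector] if norm > 0 else list(vector)

def power_normalize(vector, alpha=0.2):
    """Apply sign(x_i) * |x_i|**alpha to each component, then L2-normalize."""
    powered = [math.copysign(abs(v) ** alpha, v) for v in vector]
    return l2_normalize(powered)

def signed_square_root_normalize(vector):
    """Signed square rooting is power normalization with alpha = 0.5."""
    return power_normalize(vector, alpha=0.5)

# Sample vector (assumed): one large positive and one large negative component.
x = [4.0, -9.0, 0.0]
ssr = signed_square_root_normalize(x)
power = power_normalize(x)
```

Both variants compress large components more than small ones, which is how they damp bursty visual words before the final L2 step.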
  • 15. Proposed approach: locVLAD ➢ This method improves the performance of VLAD vectors in the recognition problem. ➢ It does so by reducing the influence of features found at the borders of the image. How does it work? It consists of a new global descriptor: the mean of the VLAD descriptor of the original query image ($\dot{v}$) and the VLAD descriptor computed on a cropped query image ($\dot{v}_{\mathrm{cropped}}$).
  • 16. Proposed approach: locVLAD The size of the cropped image is a parameter that depends on the dataset used: ➢ ZuBuD → 90% of the original query image ➢ Holidays → 70% of the original query image. [Figure: example image pair labelled "424 features detected" and "367 features detected"]
  • 17. Proposed approach: locVLAD Why does it increase the performance? Because the features that matter for recognition are usually located in the center of the image, while the features close to the border are noisy. Why not apply VLAD encoding directly on the cropped image? Because useful information might be lost: there is no guarantee that the features at the borders are only noise. Why not create a cropped vocabulary? Experiments were conducted, but the results were poor.
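A minimal sketch of the locVLAD idea described above is to average the VLAD of all features with the VLAD of the features that survive a center crop. Several points here are assumptions and not confirmed by the slides: the crop percentage is applied to each side of the image, the cropped descriptor is approximated by filtering keypoints by position (no re-detection on the cropped image), the mean is taken on unnormalized vectors, and all function names are hypothetical.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((p - q) ** 2 for p, q in zip(a, b))

def vlad_encode(descriptors, codebook):
    """Unnormalized VLAD: per-word sums of residuals, concatenated."""
    k, d = len(codebook), len(codebook[0])
    sums = [[0.0] * d for _ in range(k)]
    for x in descriptors:
        i = min(range(k), key=lambda j: squared_distance(x, codebook[j]))
        for t in range(d):
            sums[i][t] += x[t] - codebook[i][t]
    return [value for v in sums for value in v]

def inside_center_crop(point, width, height, keep):
    """True if the point lies in the centered window covering `keep` of each side."""
    margin_x = width * (1.0 - keep) / 2.0
    margin_y = height * (1.0 - keep) / 2.0
    x, y = point
    return (margin_x <= x <= width - margin_x
            and margin_y <= y <= height - margin_y)

def loc_vlad(features, codebook, width, height, keep):
    """Mean of the full-image VLAD and the VLAD of the center-cropped image.

    `features` is a list of ((x, y), descriptor) pairs (assumed layout).
    """
    all_descriptors = [desc for _, desc in features]
    central = [desc for point, desc in features
               if inside_center_crop(point, width, height, keep)]
    v_full = vlad_encode(all_descriptors, codebook)
    v_cropped = vlad_encode(central, codebook)
    return [(a + b) / 2.0 for a, b in zip(v_full, v_cropped)]

# Toy 320x240 query with a 90% crop, echoing the ZuBuD setting.
codebook = [[0.0, 0.0], [10.0, 10.0]]
features = [
    ((160.0, 120.0), [1.0, 0.0]),   # near the image center
    ((5.0, 5.0), [0.0, 2.0]),       # near the border, dropped by the crop
    ((200.0, 100.0), [9.0, 11.0]),  # inside the crop window
]
descriptor = loc_vlad(features, codebook, width=320, height=240, keep=0.9)
```

In this toy case the border feature still contributes through the full-image term, but with half the weight of the central features. This matches the stated motivation of down-weighting border features without discarding them.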
  • 19. Datasets ➢ INRIA Holidays (1491 images at 2448x3264: 500 classes, 500 queries) ➢ ZuBuD (1005 images at 640x480: 201 classes, 115 queries at 320x240) ➢ ZuBuD+ (1005 images at 640x480: 201 classes, 1005 queries at 320x240)
  • 22. ZuBuD+ It is the balanced version of ZuBuD ➢ 1005 queries at 320x240 instead of 115 queries. ➢ The new query images are randomly chosen database images, different from the other query images, with one of two transformations applied: ○ rotation (±90°) and resize ○ resize only Download: http://implab.ce.unipr.it/?page_id=194
  • 23. Evaluation Metrics Different evaluation metrics are used to compare with the state-of-the-art approaches: ➢ Top1 → retrieval accuracy, evaluated only on the first position of the ranking ➢ 5 x Recall in Top5 → average number of times a correct image appears in the top 5 results of the ranking ➢ mAP (mean Average Precision) → mean of the Average Precision scores over all queries, where each score depends on the positions of the correct results in the ranking
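The three metrics above can be sketched as follows, assuming each query comes with a ranked list of database image ids and the set of ids that count as correct. The function names and the toy rankings are hypothetical. Dataset-specific conventions, such as how the official Holidays protocol treats the query image itself, are not modelled here.

```python
def top1(ranking, relevant):
    """1.0 if the first ranked image is correct, else 0.0."""
    return 1.0 if ranking and ranking[0] in relevant else 0.0

def hits_in_top5(ranking, relevant):
    """Number of correct images among the first five results (0 to 5)."""
    return sum(1 for item in ranking[:5] if item in relevant)

def average_precision(ranking, relevant):
    """Mean of precision@rank taken at each rank holding a correct image."""
    hits, precision_sum = 0, 0.0
    for rank, item in enumerate(ranking, start=1):
        if item in relevant:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / len(relevant) if relevant else 0.0

def mean(values):
    """Arithmetic mean of a non-empty sequence."""
    values = list(values)
    return sum(values) / len(values)

# Two toy queries (assumed data): (ranked result ids, set of correct ids).
queries = [
    (["a1", "b1", "a2", "a3"], {"a1", "a2", "a3"}),
    (["c9", "b1", "b2"], {"b1", "b2"}),
]
top1_score = mean(top1(r, rel) for r, rel in queries)
recall5_score = mean(hits_in_top5(r, rel) for r, rel in queries)
map_score = mean(average_precision(r, rel) for r, rel in queries)
```

Averaging `hits_in_top5` over the queries gives a value between 0 and 5, which is the scale of the "5 x Recall in Top5" figures reported for ZuBuD.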
  • 24. Results on ZuBuD (and ZuBuD+)
  • 25. Results on ZuBuD (and ZuBuD+)
      Method                      | Descriptor size | Top1     | 5 x Recall in Top5
      Tree histogram (ZuBuD) [7]  | 10M             | 98.00 %  | -
      Decision tree (ZuBuD) [9]   | n/a             | 91.00 %  | -
      Sparse coding (ZuBuD) [22]  | 8k*64+1k*36     | -        | 4.538
      VLAD (ZuBuD) [12]           | 4281*128        | 99.00 %  | 4.416
      VLAD (ZuBuD+) [12]          | 4281*128        | 99.00 %  | 4.526
      locVLAD (ZuBuD)             | 4281*128        | 100.00 % | 4.469
      locVLAD (ZuBuD+)            | 4281*128        | 100.00 % | 4.543
    It is worth noting that on ZuBuD the method based on sparse coding slightly outperforms the proposed one in 5 x Recall in Top5. This is due to an unbalanced query set and, probably, to the use of color information.
  • 27. Results on Holidays
      Method              | Descriptor size | mAP
      Sparse coding [22]  | 8k*64+1k*36     | 76.51 %
      VLAD [12]           | 4281*128        | 74.43 %
      locVLAD             | 4281*128        | 77.20 %
      Sparse coding [4]   | 20k*128         | 79.00 %
      VLAD [12]           | 20k*128         | 78.78 %
      locVLAD             | 20k*128         | 80.89 %
  • 30. Conclusions ➢ The proposed locVLAD technique includes, to a certain degree, information on the location of the features, mitigating the negative effects of distractors found at the image borders. ➢ Experiments performed on two public datasets, namely ZuBuD and Holidays, demonstrate superior recognition accuracy w.r.t. the state of the art.
  • 31. Future works ➢ Compression: try to reduce the dimension of the descriptors while keeping the same retrieval accuracy (mobile friendly). ➢ Indexing: create a system for evaluation in a large-scale domain (adding up to 1M distractors), moving from the Nearest Neighbor problem to the Approximate Nearest Neighbor problem. We are working with k-d trees and permutation-based methods. ➢ Sparse coding: new methods for the creation of the vocabulary and the assignment of the features to the VLAD vector.
  • 32. Thank you for your attention! Questions? http://implab.ce.unipr.it