Slides of Nathan Piasco's ICRA 2019 oral presentation on the paper "Learning Scene Geometry for Visual Localization in Challenging Conditions". Best Paper in Robot Vision finalist.
15. Our method: learning through missing modality
[Architecture diagram: an image triplet is fed to two CNN encoders producing image and depth descriptors (Desc.), which are concatenated; a CNN decoder reconstructs depth. Losses: L1 loss L_θG on the reconstructed depth; triplet losses L_θI on the image descriptor, L_θD on the depth descriptor, and L_θD;θI on the concatenated descriptor.]
We use a triplet loss to produce a strong image descriptor.
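The triplet objective above can be sketched as follows; the margin value and the toy descriptors are illustrative assumptions, not the paper's actual settings.

```python
import numpy as np

def triplet_loss(anchor, positive, negative, margin=0.1):
    """Hinge triplet loss: pull the positive (matching) descriptor closer
    to the anchor than the negative (non-matching) one, by at least
    `margin` (illustrative value)."""
    d_pos = np.linalg.norm(anchor - positive)  # distance to matching image
    d_neg = np.linalg.norm(anchor - negative)  # distance to non-matching image
    return max(0.0, d_pos - d_neg + margin)

# Toy L2-normalized descriptors.
a = np.array([1.0, 0.0])
p = np.array([0.9, np.sqrt(1 - 0.81)])   # close to the anchor
n = np.array([0.0, 1.0])                 # far from the anchor
assert triplet_loss(a, p, n) < triplet_loss(a, n, p)
```

Minimizing this loss over many triplets shapes the descriptor space so that images of the same place end up close together.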
16. Our method: learning through missing modality
The latent image representation is fed to a CNN decoder that reproduces the scene geometry. The decoder is trained simply by minimizing the L1 distance between the ground-truth and the reconstructed depth.
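The decoder objective is just a mean absolute error over depth values; a minimal sketch with toy depth maps (shapes and values are illustrative):

```python
import numpy as np

def l1_depth_loss(pred_depth, gt_depth):
    """Mean absolute error between reconstructed and ground-truth depth."""
    return np.mean(np.abs(pred_depth - gt_depth))

gt = np.array([[1.0, 2.0], [3.0, 4.0]])      # toy ground-truth depth map
pred = np.array([[1.5, 2.0], [2.5, 4.0]])    # toy reconstruction
assert l1_depth_loss(pred, gt) == 0.25       # (0.5 + 0 + 0.5 + 0) / 4
```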
17. Our method: learning through missing modality
We use another CNN to produce a strong depth-map descriptor.
18. Our method: learning through missing modality
The final descriptor is obtained by concatenating the image and depth-map descriptors.
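The fusion step can be sketched as plain concatenation; the final L2 normalization is an assumption (common for retrieval descriptors so that distances stay comparable), not stated on the slide.

```python
import numpy as np

def fuse_descriptors(img_desc, depth_desc):
    """Concatenate the image and depth-map descriptors, then
    L2-normalize the result (normalization is an illustrative choice)."""
    fused = np.concatenate([img_desc, depth_desc])
    return fused / np.linalg.norm(fused)

d = fuse_descriptors(np.array([3.0, 4.0]), np.array([0.0, 1.0]))
assert d.shape == (4,)
assert np.isclose(np.linalg.norm(d), 1.0)
```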
19. Training policy
Data required to train the descriptor part of our system: image triplets.
21. System deployment
[Diagram: an unknown-pose query goes through the NN encoder to a compact descriptor (the NN decoder can additionally infer a depth map); nearest-neighbor fast indexing matches it against reference descriptors precomputed from the reference {image, pose} database, returning the most similar image with its pose.]
The depth information is only needed during the training step!
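Deployment therefore reduces to nearest-neighbor search over the precomputed reference descriptors. A brute-force sketch (a real system would use a fast index; the pose labels here are hypothetical):

```python
import numpy as np

def localize(query_desc, ref_descs, ref_poses):
    """Return the pose of the reference image whose descriptor is
    closest (L2) to the query descriptor, plus the descriptor distance."""
    dists = np.linalg.norm(ref_descs - query_desc, axis=1)
    best = int(np.argmin(dists))
    return ref_poses[best], float(dists[best])

refs = np.array([[1.0, 0.0], [0.0, 1.0]])   # toy reference descriptors
poses = ["pose_A", "pose_B"]                # hypothetical pose labels
pose, dist = localize(np.array([0.9, 0.1]), refs, poses)
assert pose == "pose_A"
```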
25. Dataset for Localization in challenging conditions
Dataset: we train and test our proposal on the RobotCar¹ dataset.
Deep architectures: ResNet18 or AlexNet encoder combined with a NetVLAD or MAC descriptor.
Competitors: Hallucination network² & the same descriptor trained with RGB only.
Metric: we compare the distance between the position of the top-ranked returned database image and the query's ground-truth position.
¹ Will Maddern, Geoffrey Pascoe, Chris Linegar, and Paul Newman, 1 year, 1000 km: The Oxford RobotCar dataset, The International Journal of Robotics Research (IJRR), 2016.
² Judy Hoffman, Saurabh Gupta, and Trevor Darrell, Learning with Side Information through Modality Hallucination, In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 826–834, 2016.
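This metric is a recall at distance threshold: for a threshold D, Recall@D is the fraction of queries whose top-ranked database image lies within D meters of the query's ground-truth position. A minimal sketch (the error values are illustrative):

```python
import numpy as np

def recall_at_d(top1_errors_m, d_threshold_m):
    """Fraction of queries whose top-1 retrieved image lies within
    d_threshold_m meters of the ground-truth query position."""
    errors = np.asarray(top1_errors_m)
    return float(np.mean(errors <= d_threshold_m))

errors = [2.0, 8.0, 12.0, 60.0]           # toy top-1 position errors (m)
assert recall_at_d(errors, 10.0) == 0.5   # 2 of 4 queries within 10 m
assert recall_at_d(errors, 25.0) == 0.75  # 3 of 4 queries within 25 m
```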
26. Results: long-term localization
[Figure: example queries (top) and the nearest images retrieved from the database (bottom).]
[Plot: Recall@D (%) vs. D, the distance to the top-1 candidate (m), for all method/descriptor combinations: RGBtD (our), RGBtD (hall), and RGB, each with NetVLAD or MAC descriptors on ResNet (R) or AlexNet (A) encoders.]
Competitors: --- Only RGB [RGB]; -x- Hallucination network [RGBtD (hall)]; -o- Our proposal [RGBtD (our)].
27. Results: cross-season localization
[Figure: example queries (top) and the nearest images retrieved from the database (bottom).]
[Plot: Recall@D (%) vs. D, the distance to the top-1 candidate (m), for all method/descriptor combinations: RGBtD (our), RGBtD (hall), and RGB, each with NetVLAD or MAC descriptors on ResNet (R) or AlexNet (A) encoders.]
Competitors: --- Only RGB [RGB]; -x- Hallucination network [RGBtD (hall)]; -o- Our proposal [RGBtD (our)].
28. Failure case: Night to day
[Figure: example queries (top) and the nearest images retrieved from the database (bottom).]
[Plot: Recall@D (%) vs. D, the distance to the top-1 candidate (m), for all method/descriptor combinations: RGBtD (our), RGBtD (hall), and RGB, each with NetVLAD or MAC descriptors on ResNet (R) or AlexNet (A) encoders.]
Competitors: --- Only RGB [RGB]; -x- Hallucination network [RGBtD (hall)]; -o- Our proposal [RGBtD (our)].
29. Improving night to day localization
[Figure: night images that serve as input to the generative network, their ground-truth depth maps, and the generated depth maps.]
Our network is not able to generate proper depth maps from night images.
31. Recall - Training policy
Thanks to the design of our method, we can improve the generation performance of the decoder without affecting the descriptor networks.
We fine-tune our system with pairs of night images and depth maps.
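Because the losses are decoupled, the decoder can be fine-tuned in isolation while the descriptor weights stay frozen. A schematic sketch (the parameter split and update rule are illustrative, not the paper's actual training code):

```python
import numpy as np

# Hypothetical parameter groups: encoder/descriptor weights stay frozen,
# only the decoder (generator) weights receive gradient updates.
params = {
    "encoder": {"w": np.ones(3), "frozen": True},
    "decoder": {"w": np.ones(3), "frozen": False},
}

def sgd_step(params, grads, lr=0.01):
    """Apply one gradient step, skipping frozen parameter groups."""
    for name, p in params.items():
        if not p["frozen"]:
            p["w"] = p["w"] - lr * grads[name]
    return params

grads = {"encoder": np.full(3, 5.0), "decoder": np.full(3, 5.0)}
params = sgd_step(params, grads)
assert np.allclose(params["encoder"]["w"], 1.0)   # untouched: descriptors intact
assert np.allclose(params["decoder"]["w"], 0.95)  # updated: 1 - 0.01 * 5
```

Freezing the descriptor branches is what lets the night-image fine-tuning improve depth generation without degrading the retrieval performance already obtained.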
32. Improving night to day localization
[Figure: night images that serve as input to the generative network, their ground-truth depth maps, and the generated depth maps without and with fine-tuning.]
33. Results: Night to day (fine tuning)
[Plots: Recall@D (%) vs. D, the distance to the top-1 candidate (m), for night-vs-daytime queries, without fine-tuning (left) and with fine-tuning (right).]
By fine-tuning on a small amount of data, we nearly double localization performance.
36. Conclusion
Summary:
• a new global image descriptor trained with side depth modality,
• especially well suited to long-term and cross-season outdoor localization,
• promising results for challenging night-to-day image retrieval.
Future work:
• use other modalities,
• investigate less trivial descriptor fusion methods,
• test on other visual localization tasks (e.g. direct pose regression).
37. References
Relja Arandjelović, Petr Gronat, Akihiko Torii, Tomas Pajdla, and Josef Sivic, NetVLAD: CNN
architecture for weakly supervised place recognition, IEEE Transactions on Pattern Analysis and
Machine Intelligence (TPAMI), pages 5297–5307, 2017.
Judy Hoffman, Saurabh Gupta, and Trevor Darrell, Learning with Side Information through
Modality Hallucination, In IEEE Conference on Computer Vision and Pattern Recognition
(CVPR), pages 826–834, 2016.
Will Maddern, Geoffrey Pascoe, Chris Linegar, and Paul Newman, 1 year, 1000 km: The Oxford
RobotCar dataset, The International Journal of Robotics Research (IJRR), 2016.