https://imatge-upc.github.io/rvos-mots/
Video object segmentation can be understood as a sequence-to-sequence task that can benefit from curriculum learning strategies for better and faster training of deep neural networks. This work explores different variations of schedule sampling and frame skipping to significantly improve the performance of a recurrent architecture. Our results on the car class of the KITTI-MOTS challenge indicate that, surprisingly, an inverse schedule sampling is a better option than a classic forward one. They also show that progressively skipping frames during training is beneficial, but only when training with the ground-truth masks instead of the predicted ones.
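The abstract contrasts a classic forward schedule sampling with an inverse one. Below is a minimal sketch, assuming a linear schedule and assuming that "forward" means moving from ground-truth masks to the model's own predictions (as in classic scheduled sampling), while "inverse" moves the other way. The function names and the linear shape are hypothetical, not taken from the authors' code.

```python
import random


def gt_probability(step: int, total_steps: int, inverse: bool = False) -> float:
    """Probability of feeding the ground-truth mask (instead of the model's
    own prediction) as the previous-frame input at a given training step.

    Assumed forward schedule: linear decay from 1.0 to 0.0 (ground truth first).
    Assumed inverse schedule: linear growth from 0.0 to 1.0 (predictions first).
    """
    progress = min(max(step / total_steps, 0.0), 1.0)
    return progress if inverse else 1.0 - progress


def choose_previous_mask(gt_mask, pred_mask, step, total_steps,
                         inverse=False, rng=random):
    """Pick which mask the recurrent model receives for the next frame."""
    if rng.random() < gt_probability(step, total_steps, inverse):
        return gt_mask
    return pred_mask
```

With `inverse=True`, early training mostly sees the model's own (noisy) predictions and late training mostly sees ground-truth masks.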
4. INTRODUCTION
Curriculum Learning for Recurrent VOS - 4 of 144
Curriculum Learning:
Methodology inspired by the way humans learn: the training data is presented in a meaningful order, from simple to complex concepts.
Yoshua Bengio et al. “Curriculum Learning”, ICML 2009.
4 curriculums
THE DATASET
THE MODEL
11. INTRODUCTION
THE TASK
Semi-supervised or “one-shot” Video Object Segmentation: the object masks of the first frame are given to the model, and the masks of the following frames are estimated by the model.
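The one-shot setting can be sketched as a simple loop: the first-frame masks are given, and each later frame's masks are predicted from that frame plus the previous masks. This is a minimal sketch; the `model(frame, previous_masks)` signature is a hypothetical stand-in for a recurrent network such as RVOS.

```python
from typing import Callable, List, Sequence, TypeVar

Frame = TypeVar("Frame")
Mask = TypeVar("Mask")


def one_shot_vos(
    frames: Sequence[Frame],
    first_frame_masks: Mask,
    model: Callable[[Frame, Mask], Mask],
) -> List[Mask]:
    """Return one set of masks per frame: the first is given, the rest are estimated."""
    masks: List[Mask] = [first_frame_masks]  # given to the model
    for frame in frames[1:]:
        # hypothetical model: conditions on the current frame and the previous masks
        masks.append(model(frame, masks[-1]))  # estimated by the model
    return masks
```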
13. KITTI-MOTS DATASET
Andreas Geiger, Philip Lenz, and Raquel Urtasun. “Are we ready for Autonomous Driving? The KITTI Vision Benchmark Suite”, CVPR 2012.
Its video sequences present challenges:
19. THE MODEL
End-to-End Recurrent Network for video object segmentation: RVOS
Carles Ventura, Miriam Bellver, Andreu Girbau, Amaia Salvador, Ferran Marques and Xavier Giro-i-Nieto. “RVOS: End-to-End Recurrent Network for Video Object Segmentation”, CVPR 2019.
20. THE MODEL
A. Athar, S. Mahadevan, A. Ošep, L. Leal-Taixé, B. Leibe. “STEm-Seg: Spatio-Temporal Embeddings for Instance Segmentation in Videos”, ECCV 2020.
22. SETS OF EXPERIMENTS
All techniques tested on two sets of experiments:
- Resolution 287x950, batch size 2, clip length 3
- Resolution 256x448, batch size 4, clip length 5
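The two sets trade spatial resolution against the number of frames per iteration. A small sketch comparing them, assuming the resolutions are written as height x width; the `ExperimentSet` class and `SET_A`/`SET_B` names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ExperimentSet:
    height: int
    width: int
    batch_size: int
    clip_length: int

    def frames_per_iteration(self) -> int:
        # each batch element is a clip of `clip_length` frames
        return self.batch_size * self.clip_length

    def pixels_per_iteration(self) -> int:
        # total input pixels processed per training iteration
        return self.height * self.width * self.frames_per_iteration()


SET_A = ExperimentSet(287, 950, batch_size=2, clip_length=3)
SET_B = ExperimentSet(256, 448, batch_size=4, clip_length=5)
```

The first set processes 6 frames per iteration (1,635,900 pixels), the second 20 frames (2,293,760 pixels).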
23. METRICS
The results have been evaluated with the official metrics of the MOTS Challenge.
- sMOTSA (soft Multi-Object Tracking and Segmentation Accuracy) has been defined as the reference metric.
Paul Voigtlaender et al. “MOTS: Multi-Object Tracking and Segmentation”, CVPR 2019.
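In Voigtlaender et al., sMOTSA = (TP̃ - |FP| - |IDS|) / |M|, where TP̃ is the sum of mask IoUs over all true positives, FP are false positives, IDS are identity switches and M is the set of ground-truth masks; MOTSA uses the plain count |TP| instead of TP̃. A minimal sketch from precomputed counts (the matching step, which requires mask IoU > 0.5, is assumed to have been done already; the function names are hypothetical):

```python
from typing import Sequence


def smotsa(tp_ious: Sequence[float], num_fp: int, num_ids: int, num_gt: int) -> float:
    """Soft MOTSA: each true positive contributes its mask IoU instead of 1."""
    if num_gt <= 0:
        raise ValueError("num_gt must be positive")
    soft_tp = sum(tp_ious)
    return (soft_tp - num_fp - num_ids) / num_gt


def motsa(num_tp: int, num_fp: int, num_ids: int, num_gt: int) -> float:
    """Hard MOTSA: every true positive counts as 1."""
    if num_gt <= 0:
        raise ValueError("num_gt must be positive")
    return (num_tp - num_fp - num_ids) / num_gt
```

Since every matched IoU is at most 1, sMOTSA never exceeds MOTSA for the same matching.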
59. FRAME SKIPPING
Ideally: train on all N frames of the sequence.
But we have limitations (e.g. memory constraints).
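Because only a short clip fits in memory, frame skipping lets that clip cover a longer time span. A sketch of sampling one training clip, assuming `skip` counts the frames left out between consecutive clip frames (skip 0 gives consecutive frames); `sample_clip` is a hypothetical helper.

```python
import random
from typing import List, Optional


def sample_clip(num_frames: int, clip_length: int, skip: int,
                rng: Optional[random.Random] = None) -> List[int]:
    """Return the frame indices of one training clip drawn from an N-frame sequence."""
    rng = rng or random.Random()
    stride = skip + 1
    span = (clip_length - 1) * stride + 1  # frames covered by the clip
    if span > num_frames:
        raise ValueError("sequence too short for this clip length and skip")
    start = rng.randint(0, num_frames - span)  # inclusive upper bound
    return [start + k * stride for k in range(clip_length)]
```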
75. FRAME SKIPPING
Frame Skipping variants:
- From 0 to 9 skipped frames, during all training
- From 0 to 9 skipped frames, during the first half of training
- From 1 to 5 skipped frames, during all training
- From 1 to 5 skipped frames, during the first half of training
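One way to realize these progressive variants is a skip value that grows linearly with the training step. This sketch assumes a linear ramp and assumes that in the "first half" variants the skip reaches its final value at mid-training and then stays there; both assumptions, and the names `skip_at` and `VARIANTS`, are hypothetical.

```python
def skip_at(step: int, total_steps: int, start_skip: int, end_skip: int,
            ramp_fraction: float = 1.0) -> int:
    """Number of skipped frames at `step`, ramping from start_skip to end_skip.

    ramp_fraction=1.0 ramps over all training; 0.5 ramps over the first half.
    """
    ramp_steps = max(1, int(total_steps * ramp_fraction))
    progress = min(step / ramp_steps, 1.0)
    # floor, so the final skip value is reached exactly at the end of the ramp
    return int(start_skip + progress * (end_skip - start_skip))


# (start_skip, end_skip, ramp_fraction) for the four variants listed above
VARIANTS = {
    "0-9 all training": (0, 9, 1.0),
    "0-9 first half": (0, 9, 0.5),
    "1-5 all training": (1, 5, 1.0),
    "1-5 first half": (1, 5, 0.5),
}
```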
83. TEMPORAL AND SPATIAL RECURRENCES
KITTI-MOTS is a crowded dataset:
84. TEMPORAL AND SPATIAL RECURRENCES
time (frame sequence)
space (object sequence)
85. TEMPORAL AND SPATIAL RECURRENCES
TEMPORAL RECURRENCE: along time (frame sequence)
86. TEMPORAL AND SPATIAL RECURRENCES
SPATIAL RECURRENCE: along space (object sequence)
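The two recurrences can be pictured as two nested loops: the state of object i at frame t receives the state of the same object at frame t-1 (temporal) and of object i-1 at frame t (spatial). This toy sketch replaces the network's recurrent cell with a scalar tanh unit, and the weights and the `use_spatial` flag are hypothetical; it only illustrates the data flow, not RVOS itself.

```python
import math
from typing import List


def run_recurrences(features: List[List[float]], use_spatial: bool = True,
                    w_in: float = 1.0, w_time: float = 0.5,
                    w_space: float = 0.5) -> List[List[float]]:
    """features[t][i] is a toy scalar feature of object i at frame t."""
    if not features:
        return []
    num_objects = len(features[0])
    prev_frame = [0.0] * num_objects      # temporal state (frame t-1)
    outputs = []
    for frame_feats in features:          # temporal recurrence over frames
        current = []
        prev_object = 0.0                 # spatial state (object i-1)
        for i, x in enumerate(frame_feats):  # spatial recurrence over objects
            pre = w_in * x + w_time * prev_frame[i]
            if use_spatial:
                pre += w_space * prev_object
            h = math.tanh(pre)
            current.append(h)
            prev_object = h
        outputs.append(current)
        prev_frame = current
    return outputs
```

With `use_spatial=False` only the temporal recurrence remains, so each object's output is independent of the other objects in the same frame.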
87. TEMPORAL AND SPATIAL RECURRENCES
Proposed curriculum (Temporal and Spatial Recurrence):
- Spatio-temporal during all training
- Only temporal during all training
- Only temporal during the first half of training
- Only temporal during the second half of training
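The four variants reduce to a per-step decision on whether the spatial recurrence is active. A sketch, assuming that outside the "only temporal" half the full spatio-temporal recurrence is used; the variant keys are hypothetical labels.

```python
def use_spatial_recurrence(variant: str, step: int, total_steps: int) -> bool:
    """Whether the spatial (object) recurrence is enabled at this training step."""
    in_first_half = step < total_steps / 2
    if variant == "spatio_temporal":        # spatio-temporal during all training
        return True
    if variant == "temporal_only":          # only temporal during all training
        return False
    if variant == "temporal_first_half":    # only temporal in the first half
        return not in_first_half
    if variant == "temporal_second_half":   # only temporal in the second half
        return in_first_half
    raise ValueError(f"unknown variant: {variant}")
```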
92. TEMPORAL AND SPATIAL RECURRENCES
Qualitative results: Ground-truth compared with Only Spatio-Temporal, Only Temporal, Only Temporal first half and Only Temporal second half.
119. YouTube-VOS
Ning Xu et al. “YouTube-VOS: A Large-Scale Video Object Segmentation Benchmark”, ECCV 2018
121. YouTube-VOS
- Training parameters: resolution 256x448, batch size 4, clip length 5.
- Evaluated with the official metrics of the YouTube-VOS challenge.
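The official YouTube-VOS metrics report region similarity J and contour accuracy F, separately for seen and unseen categories. A sketch of J only (mask IoU), assuming masks are given as nested lists of booleans and following the common convention that two empty masks score 1.0; `region_similarity` is a hypothetical name, not the official evaluation code.

```python
from typing import Sequence


def region_similarity(pred: Sequence[Sequence[bool]],
                      gt: Sequence[Sequence[bool]]) -> float:
    """Jaccard index J = |pred AND gt| / |pred OR gt| for one binary mask pair."""
    if len(pred) != len(gt) or any(len(p) != len(g) for p, g in zip(pred, gt)):
        raise ValueError("masks must have the same shape")
    inter = union = 0
    for pred_row, gt_row in zip(pred, gt):
        for p, g in zip(pred_row, gt_row):
            inter += bool(p and g)
            union += bool(p or g)
    return 1.0 if union == 0 else inter / union
```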
129. CONCLUSIONS
SCHEDULE SAMPLING
FRAME SKIPPING
LOSS PENALIZATION BY OBJECT AREA
TEMPORAL AND SPATIAL RECURRENCES
140. FUTURE WORK
- Schedule Sampling
- Frame Skipping
- Loss penalization by object area
- Other curriculums
- Combination of the best curriculums