"Fast and Accurate RMNet: A New Neural Network for Embedded Vision," a Presentation from Intel

© 2019 IOTG Computer Vision (ICV), Intel
Fast and Accurate RMNet: A New
Neural Network for Embedded
Vision
Ilya Krylov
IOTG Computer Vision (ICV), Intel
May 2019

Agenda
▪ Introduction to the Person re-identification problem
▪ Metric learning approach
▪ Feature extractor and distance function selection
▪ RMNet backbone design
▪ Training: losses, data, sampling
▪ Person re-identification task results
▪ Other applications of RMNet backbone

Person Re-identification problem statement
The person re-identification (Re-ID) task is to find a given person (the probe) in a
gallery of pedestrian images.
Complexity: different cameras, lighting conditions, various poses and viewing
angles, inaccurate detector results, and so on.

Person Re-identification quality metrics
Re-ID output: similarity measure between two images.
Evaluating the quality metrics includes the following steps:
1. Compute the similarity measure between the probe image and each
pedestrian image in the gallery.
2. Measure the model quality by:
▪ Cumulative matching curve (CMC) at rank@1 – evaluates the ability
of a method to find the most appropriate gallery image; says
nothing about the robustness of the method
▪ Mean average precision (mAP) – evaluates the ability of a method to
find all appropriate gallery images; describes how well the method can
extract the internal data representation
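The two metrics can be illustrated with a minimal sketch. It assumes a precomputed probe-to-gallery distance matrix (smaller means more similar) and identity labels for both sets. The function names and the toy data are hypothetical, and the same-camera filtering used by benchmark protocols such as Market-1501 is left out for brevity.

```python
def average_precision(ranked_matches):
    """AP for one probe; ranked_matches[i] is True when the i-th closest
    gallery image shows the same identity as the probe."""
    hits, precision_sum = 0, 0.0
    for rank, is_match in enumerate(ranked_matches, start=1):
        if is_match:
            hits += 1
            precision_sum += hits / rank  # precision at this recall point
    return precision_sum / hits if hits else 0.0


def evaluate(distances, probe_ids, gallery_ids):
    """Return (rank@1, mAP) for distances[p][g] between probe p and gallery g."""
    rank1_hits, ap_sum = 0, 0.0
    for p, row in enumerate(distances):
        # Sort gallery indices from the closest to the farthest image.
        order = sorted(range(len(row)), key=lambda g: row[g])
        matches = [gallery_ids[g] == probe_ids[p] for g in order]
        rank1_hits += matches[0]
        ap_sum += average_precision(matches)
    n = len(distances)
    return rank1_hits / n, ap_sum / n


# Toy data (hypothetical): two probes, three gallery images.
distances = [[0.1, 0.5, 0.3],
             [0.2, 0.4, 0.9]]
rank1, mean_ap = evaluate(distances, ["a", "b"], ["a", "b", "a"])
print(rank1, mean_ap)
```

In this toy case the first probe retrieves both of its gallery images first, while the second probe finds its only match at rank 2, so rank@1 and mAP differ.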

Common approach: Metric learning
▪ Extract an internal representation from each image with a feature
extractor that maps similar images from image space to nearby points in
embedding space.
▪ Compare a pair of internal representations with a distance function that
computes the similarity between two points in embedding space.

Distance function
Strong solution:
▪ Parametric model: use a neural network as the distance function.
Fast solution:
▪ Standard non-parametric models: L1, L2, or cosine distance.
Problem: we want to split pedestrian images into groups of images of
the same person -> a distance matrix must be filled -> all pairwise
distances must be estimated, and the number of pairs grows quadratically
with the number of images.
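As a rough illustration of the fast, non-parametric option, the sketch below computes the cosine distance for every unordered pair of embeddings. The helper names and the toy 2-D embeddings are hypothetical. For N images there are N(N-1)/2 such pairs, which is where the quadratic cost comes from.

```python
import math


def l2_normalize(v):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def cosine_distance(a, b):
    """1 - cosine similarity; a and b are assumed to be L2-normalized."""
    return 1.0 - sum(x * y for x, y in zip(a, b))


def pairwise_distances(embeddings):
    """Return {(i, j): distance} for every unordered pair i < j."""
    unit = [l2_normalize(e) for e in embeddings]
    return {(i, j): cosine_distance(unit[i], unit[j])
            for i in range(len(unit))
            for j in range(i + 1, len(unit))}


# Toy 2-D embeddings (hypothetical); real embeddings would be much longer.
embeddings = [[1.0, 0.0], [0.0, 2.0], [3.0, 0.0], [1.0, 1.0]]
dist = pairwise_distances(embeddings)
print(len(dist))  # 4 images -> 4 * 3 / 2 = 6 pairs
```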

Feature extractor
Lightweight backbone
▪ ResNet50 is too heavy
Single branch solution
▪ Limit the number of auxiliary branches to save computation resources
Small embeddings
▪ Normalized embeddings, 256 floats
▪ Larger embeddings -> slower distance computation; with a quadratic
number of distance-function calls, total time grows accordingly
[Diagram: Backbone, Head]

General problems of publicly available backbones:
▪ Developed to solve classification task mostly
▪ No restriction on capacity: they aim for state-of-the-art quality at any cost
Getting a fast net:
▪ Train a task-specific lightweight network directly
▪ Start from state-of-the-art networks: quantization, pruning

▪ Select key requirement and grow net architecture by solving problems
▪ Use best-working practices
[Diagram columns: Properties, Key Requirement, Problems, Best-working Practices, Solutions]
[Diagram labels: deep network; gradient flow; heavy net; bottlenecks; depth-wise convolutions; initialization; pre-training; activation function; ResNet-like bottlenecks; 3x3 depth-wise internal conv; orthogonal; ELU; residual connections; regularization; dropout in each block; strong approximator; significant nonlinearity; large receptive field]

RMNet
Very deep but lightweight
▪ ResNet-like, 109 layers with at most 256 channels
▪ Residual block structure:
▪ Squeeze 1x1 -> 3x3 dw -> Expand 1x1
▪ Pre-training on the classification task
▪ Dropout after each residual block
▪ Exponential Linear Unit (ELU) activations
▪ Orthogonal weight initialization
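To see why a depth-wise 3x3 convolution keeps the block lightweight, the sketch below counts the weights in a squeeze 1x1 -> 3x3 -> expand 1x1 bottleneck, with and without a depth-wise inner convolution. The squeeze width of 64 channels is a hypothetical value chosen for illustration and is not taken from the slides. Biases are ignored.

```python
def conv_params(c_in, c_out, k, depthwise=False):
    """Weight count of a k x k convolution (bias omitted).
    A depth-wise convolution uses one k x k filter per channel."""
    if depthwise:
        assert c_in == c_out, "depth-wise conv keeps the channel count"
        return c_in * k * k
    return c_in * c_out * k * k


def bottleneck_params(channels, squeezed, depthwise):
    """Squeeze 1x1 -> 3x3 (optionally depth-wise) -> expand 1x1."""
    squeeze = conv_params(channels, squeezed, 1)
    inner = conv_params(squeezed, squeezed, 3, depthwise=depthwise)
    expand = conv_params(squeezed, channels, 1)
    return squeeze + inner + expand


CHANNELS = 256   # maximum channel count stated on the slide
SQUEEZED = 64    # hypothetical squeeze width, for illustration only

dw = bottleneck_params(CHANNELS, SQUEEZED, depthwise=True)
full = bottleneck_params(CHANNELS, SQUEEZED, depthwise=False)
print(dw, full)
```

Under these assumptions, the depth-wise variant needs roughly half the weights of the block with a regular 3x3 convolution.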

Person Re-identification head
No fully connected layers (to reduce computation time)
▪ Replace the default pooling stage (FC) with a Global Max Pooling (GMP) layer
Use extra parametrization
▪ Inverted bottleneck with extra nonlinearity: 256 -> 512 -> 256
▪ Calibration layer (to mix different target embeddings)
[Diagram: Re-ID head. Labels: HxWx256 input; Global Max Pooling (1x1x256); conv (1x1x512); ELU; conv (1x1x256); L2 Norm; Local Structure Loss; Calibration conv (1x1x256); L2 Norm; Global Structure Loss; Network output]
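The pooling and normalization stages of the head can be sketched in a few lines. This is only an illustration on a toy 2x2x2 feature map; the 1x1 convolutions, the ELU, and the calibration layer are omitted, and the function names are hypothetical.

```python
import math


def global_max_pool(feature_map):
    """feature_map[h][w][c] -> per-channel maximum over all spatial positions."""
    channels = len(feature_map[0][0])
    return [max(pixel[c] for row in feature_map for pixel in row)
            for c in range(channels)]


def l2_normalize(v, eps=1e-12):
    """Scale a vector to unit length; eps guards against division by zero."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / max(norm, eps) for x in v]


# Toy H=2, W=2, C=2 feature map (hypothetical values).
feature_map = [
    [[0.0, 1.0], [3.0, 2.0]],
    [[1.0, 4.0], [2.0, 0.0]],
]
pooled = global_max_pool(feature_map)
embedding = l2_normalize(pooled)
print(pooled, embedding)
```

Global max pooling collapses the spatial dimensions without any learned weights, which is the reason it is cheaper than a fully connected layer.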

Overall training scheme
Key components:
▪ Lightweight backbone: RMNet
▪ Strong head after backbone
▪ Hard sample mining procedure
▪ Multi-target training: AM-Softmax, Center, PushPlus, and Glob PushPlus
losses
[Diagram labels: Data, Sampling, Backbone, Head, Training]

Losses
▪ AM-Softmax – separates classes with a margin
▪ Center loss – pulls points of the same class closer to the class center
▪ PushPlus losses – push points of different classes farther apart, with a
margin greater than the inter-class distance
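The first two losses have well-known closed forms, sketched below for a single sample. The scale of 30 and the margin of 0.35 are common defaults from the AM-Softmax literature, not values stated in these slides, and the function names are hypothetical. The PushPlus losses are not shown because the slides do not give their formula.

```python
import math


def am_softmax_loss(cosines, target, scale=30.0, margin=0.35):
    """Additive-margin softmax loss for one sample.
    cosines[j] is the cosine between the normalized embedding and the
    normalized weight vector of class j; the margin is subtracted from
    the target-class cosine only."""
    logits = [scale * (c - margin if j == target else c)
              for j, c in enumerate(cosines)]
    peak = max(logits)  # subtract the maximum for numerical stability
    log_sum = peak + math.log(sum(math.exp(z - peak) for z in logits))
    return log_sum - logits[target]


def center_loss(embedding, center):
    """Half the squared Euclidean distance to the class center."""
    return 0.5 * sum((x - c) ** 2 for x, c in zip(embedding, center))


cosines = [0.9, 0.1]  # hypothetical cosines for a two-class problem
with_margin = am_softmax_loss(cosines, target=0)
without_margin = am_softmax_loss(cosines, target=0, margin=0.0)
print(with_margin > without_margin)
print(center_loss([1.0, 2.0], [0.0, 0.0]))
```

The margin makes the target class harder to satisfy, so the same sample yields a larger loss than it would under a plain softmax on cosines.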

Training
Hard sample mining procedure:
▪ Sample k augmented frames for each person from training data
▪ Estimate the weighted loss (AM-Softmax + Center + Glob PushPlus) for
each sample
▪ Train net on mini-batches taken from top 50% of hardest samples
▪ Strengthen the data augmentation and repeat
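The selection step of the mining procedure can be sketched as follows. The per-sample losses are made-up numbers standing in for the weighted loss, and `hardest_half` and `mini_batches` are hypothetical helper names; sampling, augmentation, and the training step itself are out of scope here.

```python
def hardest_half(samples, loss_fn):
    """Return the top 50% of samples, ranked by descending loss."""
    ranked = sorted(samples, key=loss_fn, reverse=True)
    return ranked[:max(1, len(ranked) // 2)]


def mini_batches(items, batch_size):
    """Yield consecutive mini-batches of at most batch_size items."""
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


# Hypothetical per-sample losses for six sampled frames.
losses = {"a": 0.2, "b": 1.5, "c": 0.9, "d": 0.1, "e": 2.0, "f": 0.4}
hard = hardest_half(list(losses), losses.get)
batches = list(mini_batches(hard, 2))
print(hard, batches)
```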

Data
Do you need a better CNN-based solution? -> You need more data!
Train data:
▪ Market1501 (~700 train IDs)
▪ Viper (632 IDs)
▪ MARS (~1200 train + test IDs)
▪ Internal data (~1500 IDs)
Test data:
▪ Internal data (~1300 IDs)
▪ Market1501 (~700 test IDs)
[Slide annotations: ~15K samples, ~1300 IDs; ~1.3M samples, ~2700 IDs; ~20K samples, ~200 IDs; imbalance]

Ablation study

Person Re-identification models
Model      Input resolution  GFlops  MParam  Market-1501 rank@1  Market-1501 mAP
Strong     128x384           0.594   0.820   0.9237              0.8253
Light      64x160            0.124   0.820   0.9166              0.8163
Very fast  48x96             0.028   0.028   0.7791              0.6180

Results
▪ FPS values were obtained using OpenVINO on Intel® Core™ i7. Values are approximate because
only backbone inference time is measured.
▪ RK stands for the re-ranking technique. Flip means that both the original and the flipped (mirrored)
images are used to compute embeddings.

Other RMNet-based models
An SSD head can be connected to the RMNet backbone to get fast and
good-enough object detectors.
▪ Person detector
▪ Person and face detector
▪ Person, vehicle, bike detection
▪ People detection and action recognition

Conclusion
▪ RMNet has been developed as a fast and accurate network for the person
re-identification task.
▪ It combines near state-of-the-art quality with high inference speed.
▪ The RMNet backbone can be easily used in other tasks such as object
detection.
▪ All presented models are available in Open Model Zoo.

Resources
▪ “Fast and Accurate Person Re-Identification with RMNet” paper
https://arxiv.org/pdf/1812.02465.pdf
▪ Open Model Zoo - contains RMNet-based and other models trained
by Intel
https://github.com/opencv/open_model_zoo
▪ OpenVINO
https://software.intel.com/en-us/openvino-toolkit