© 2019 IOTG Computer Vision (ICV), Intel
Fast and Accurate RMNet: A New
Neural Network for Embedded
Vision
Ilya Krylov
IOTG Computer Vision (ICV), Intel
May 2019
Agenda
▪ Introduction to the Person re-identification problem
▪ Metric learning approach
▪ Feature extractor and distance function selection
▪ RMNet backbone design
▪ Training: losses, data, sampling
▪ Person re-identification task results
▪ Other applications of RMNet backbone
Person Re-identification problem statement
The person re-identification (Re-ID) task is to find a given person (the probe) in a
gallery of pedestrian images.
Challenges: different cameras, lighting conditions, poses, and viewing angles,
inaccurate detector output, and so on.
Person Re-identification quality metrics
Re-ID output: a similarity measure between two images.
Evaluating the quality metrics involves the following steps:
1. Compute the similarity measure between the probe image and each
pedestrian image in the gallery.
2. Measure the model quality by:
▪ Cumulative matching curve (CMC) at rank@1: evaluates the ability
of a method to find the single best-matching gallery image; says
nothing about the robustness of the method
▪ Mean average precision (mAP): evaluates the ability of a method to
find all matching gallery images; describes how well the method
extracts the internal data representation
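A minimal sketch of the two metrics above, assuming each probe and gallery image carries an integer identity label and that a smaller distance means a better match. The function name `rank1_and_map` and the toy numbers are illustrative, and the camera-based filtering used by benchmark protocols is left out:

```python
import numpy as np

def rank1_and_map(dist, probe_ids, gallery_ids):
    """dist[i, j] is the distance between probe i and gallery image j."""
    rank1_hits, average_precisions = [], []
    for i in range(dist.shape[0]):
        order = np.argsort(dist[i])                    # closest gallery images first
        matches = gallery_ids[order] == probe_ids[i]   # True where the identity matches
        if not matches.any():
            continue                                   # probe has no correct match in the gallery
        rank1_hits.append(float(matches[0]))
        hit_positions = np.flatnonzero(matches)        # 0-based ranks of the correct matches
        precisions = np.arange(1, len(hit_positions) + 1) / (hit_positions + 1)
        average_precisions.append(precisions.mean())
    return float(np.mean(rank1_hits)), float(np.mean(average_precisions))

# Toy example: 2 probes, 4 gallery images (made-up distances).
dist = np.array([[0.1, 0.2, 0.3, 0.8],
                 [0.2, 0.5, 0.6, 0.4]])
probe_ids = np.array([0, 1])
gallery_ids = np.array([0, 1, 0, 0])
rank1, mean_ap = rank1_and_map(dist, probe_ids, gallery_ids)
```

In this toy case the first probe is matched correctly at rank 1 and the second is not, so rank@1 only records one hit and one miss, while mAP also reflects how far down the ranking the remaining correct images sit.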
Common approach: Metric learning
▪ Extract an internal representation from each
image with a feature extractor
that maps similar images from image
space to nearby points in embedding
space.
▪ Compare a pair of internal
representations with a distance function that
measures how similar two points
in embedding space are.
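The two components can be written down compactly. The symbols below ($f$ for the feature extractor, $d$ for the distance function, $\mathcal{X}$ for image space, $n$ for the embedding size) are assumed notation, since the slide's own symbols did not survive extraction:

```latex
f : \mathcal{X} \to \mathbb{R}^{n}, \qquad
d : \mathbb{R}^{n} \times \mathbb{R}^{n} \to \mathbb{R}_{\ge 0}, \qquad
d\bigl(f(x_i), f(x_j)\bigr) \text{ is small if } x_i, x_j \text{ show the same person and large otherwise.}
```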
Distance function
Strong solution:
▪ Parametric model: use a neural network as the distance function.
Fast solution:
▪ Standard non-parametric functions such as L1, L2, or cosine
distance.
Problem: we want to split pedestrian images into groups of images of
the same person -> this needs the full distance matrix -> all
pairwise distances must be estimated, and the number of pairs grows
quadratically with the number of images.
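A minimal sketch of the "fast solution", assuming cosine distance on L2-normalized embeddings of 256 floats. The helper names and the random embeddings are illustrative only:

```python
import numpy as np

def l2_normalize(x, eps=1e-12):
    """Scale each row to unit Euclidean length."""
    return x / np.maximum(np.linalg.norm(x, axis=1, keepdims=True), eps)

def pairwise_cosine_distance(embeddings):
    """Return the N x N matrix of cosine distances between rows."""
    unit = l2_normalize(embeddings)
    return 1.0 - unit @ unit.T

rng = np.random.default_rng(0)
embeddings = rng.normal(size=(5, 256))   # 5 images, 256 floats each (random stand-ins)
dist = pairwise_cosine_distance(embeddings)
num_pairs = 5 * (5 - 1) // 2             # unordered pairs that need comparing
```

With a non-parametric distance the whole matrix comes from one matrix product, whereas a neural-network distance function would need a forward pass for every pair.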
Feature extractor
Lightweight backbone
▪ ResNet50 is too heavy
Single branch solution
▪ Limit the number of auxiliary branches to save computational resources
Small embeddings
▪ Normalized embeddings, 256 floats
▪ Larger embeddings -> slower distance computation -> multiplied by the
quadratic number of distance-function calls -> longer total time
[Diagram: Backbone -> Head]
Lightweight backbone
General problems of publicly available backbones:
▪ Mostly developed to solve classification tasks
▪ No limit on capacity: designed to reach state-of-the-art results at any cost
Getting a fast net:
▪ Train a task-specific lightweight network directly
▪ Start from state-of-the-art networks: quantization, pruning
Lightweight backbone
▪ Select the key requirement and grow the network architecture by solving the problems it raises
▪ Use best-working practices
[Diagram: Properties -> Key Requirement -> Problems -> Best-working Practices -> Solutions]
Properties: strong approximator, significant nonlinearity, large receptive field
Key requirement: deep network
Problems: gradient flow, heavy net
Best-working practices: bottlenecks, depth-wise convolutions, initialization, pre-training, activation function, residual connections, regularization
Solutions: ResNet-like bottlenecks, 3x3 depth-wise internal conv, orthogonal initialization, ELU, dropout in each block
RMNet
Very deep but lightweight
▪ ResNet-like, 109 layers with at most 256
channels
▪ Residual block structure:
▪ Squeeze 1x1 -> 3x3 depth-wise (dw) -> Expand 1x1
▪ Pre-training on a classification task
▪ Dropout after each residual block
▪ Exponential Linear Unit (ELU) activations
▪ Orthogonal weight initialization
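A rough NumPy sketch of one such residual block at inference time. The squeeze ratio of 4, the placement of the ELU activations, the QR-based orthogonal initializer, and the omission of any normalization layers and biases are all assumptions made for illustration, not details taken from the slides:

```python
import numpy as np

def elu(x, alpha=1.0):
    """Exponential Linear Unit."""
    return np.where(x > 0, x, alpha * (np.exp(np.minimum(x, 0.0)) - 1.0))

def orthogonal(rows, cols, rng):
    """Matrix with orthonormal columns (or rows), taken from a QR decomposition."""
    a = rng.normal(size=(max(rows, cols), min(rows, cols)))
    q, _ = np.linalg.qr(a)               # q has orthonormal columns
    return q if rows >= cols else q.T

def conv1x1(x, w):
    """Pointwise convolution: x is (H, W, C_in), w is (C_in, C_out)."""
    return x @ w

def depthwise3x3(x, k):
    """3x3 depth-wise convolution with zero padding: k is (3, 3, C)."""
    h, w, _ = x.shape
    padded = np.pad(x, ((1, 1), (1, 1), (0, 0)))
    out = np.zeros_like(x)
    for dy in range(3):
        for dx in range(3):
            out += padded[dy:dy + h, dx:dx + w, :] * k[dy, dx, :]
    return out

def residual_block(x, w_squeeze, k_dw, w_expand):
    """Squeeze 1x1 -> 3x3 depth-wise -> expand 1x1, plus the skip connection."""
    y = elu(conv1x1(x, w_squeeze))
    y = elu(depthwise3x3(y, k_dw))
    y = conv1x1(y, w_expand)
    return elu(x + y)                    # dropout would follow here during training

rng = np.random.default_rng(0)
channels, squeezed = 256, 64             # squeeze ratio of 4 is an assumption
x = rng.normal(size=(8, 4, channels))
out = residual_block(
    x,
    orthogonal(channels, squeezed, rng),
    rng.normal(size=(3, 3, squeezed)) * 0.1,
    orthogonal(squeezed, channels, rng),
)
```

The depth-wise step applies one 3x3 filter per channel, so it needs 9 weights per channel instead of 9 times the channel count; this is where most of the savings over a regular 3x3 convolution come from.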
Person Re-identification head
No fully connected layers (to reduce computation time)
▪ Replace the default pooling stage (FC) with a Global Max Pooling (GMP) layer
Use extra parametrization
▪ Inverted bottleneck with extra nonlinearity: 256 -> 512 -> 256
▪ Calibration layer (to mix the different target embeddings)
[Diagram: Re-ID head. Blocks shown: HxWx256 input, Global Max Pooling (1x1x256), 1x1x512 conv, ELU, 1x1x256 conv, L2 Norm, Calibration (1x1x256 conv), L2 Norm, Local Structure Loss, Global Structure Loss, network output]
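A minimal sketch of the head's forward pass, assuming the order GMP -> 1x1 conv (512) -> ELU -> 1x1 conv (256) -> L2 norm -> calibration conv -> L2 norm. Whether the calibration layer consumes the normalized embedding, and the absence of biases, are assumptions; the weights are random stand-ins:

```python
import numpy as np

def elu(x, alpha=1.0):
    """Exponential Linear Unit."""
    return np.where(x > 0, x, alpha * (np.exp(np.minimum(x, 0.0)) - 1.0))

def l2_normalize(v, eps=1e-12):
    """Scale a vector to unit Euclidean length."""
    return v / max(np.linalg.norm(v), eps)

def reid_head(features, w_up, w_down, w_calib):
    """features: (H, W, 256) backbone output; returns two unit-length 256-d vectors."""
    pooled = features.max(axis=(0, 1))            # global max pooling -> (256,)
    hidden = elu(pooled @ w_up)                   # 1x1 conv 256 -> 512 on a 1x1 map
    local = l2_normalize(hidden @ w_down)         # 1x1 conv 512 -> 256, then L2 norm
    calibrated = l2_normalize(local @ w_calib)    # calibration 1x1 conv 256 -> 256, then L2 norm
    return local, calibrated

rng = np.random.default_rng(0)
features = rng.normal(size=(10, 4, 256))
local, embedding = reid_head(
    features,
    rng.normal(size=(256, 512)) * 0.05,
    rng.normal(size=(512, 256)) * 0.05,
    rng.normal(size=(256, 256)) * 0.05,
)
```

After global pooling the feature map is 1x1, so each 1x1 convolution reduces to a single matrix-vector product.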
Overall training scheme
Key components:
▪ Lightweight backbone: RMNet
▪ Strong head after backbone
▪ Hard sample mining procedure
▪ Multi-target training: AM-Softmax, Center, PushPlus, and Glob PushPlus
losses
[Diagram: training pipeline with Data, Sampling, Training, Backbone, and Head blocks]
Losses
▪ AM-Softmax: separates classes with a margin
▪ Center loss: pulls points of the same class closer to their class center
▪ PushPlus losses: push points of different classes farther apart, with a
margin greater than the inter-class distance
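A minimal NumPy sketch of the first two losses, written from their standard published formulations rather than from the training code behind these slides. The scale of 30 and margin of 0.35 are commonly used defaults, not values stated here, and the PushPlus losses are not sketched because the slides do not define them:

```python
import numpy as np

def am_softmax_loss(embeddings, class_weights, labels, scale=30.0, margin=0.35):
    """Additive-margin softmax on cosine similarities.

    embeddings: (B, D), class_weights: (C, D), labels: (B,) integer class ids.
    """
    e = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    w = class_weights / np.linalg.norm(class_weights, axis=1, keepdims=True)
    cos = e @ w.T                                       # (B, C) cosine similarities
    rows = np.arange(len(labels))
    logits = scale * cos
    logits[rows, labels] = scale * (cos[rows, labels] - margin)  # margin on the true class only
    logits -= logits.max(axis=1, keepdims=True)         # numerical stability
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return float(-log_prob[rows, labels].mean())

def center_loss(embeddings, centers, labels):
    """Half the mean squared distance between each embedding and its class center."""
    diff = embeddings - centers[labels]
    return float(0.5 * (diff ** 2).sum(axis=1).mean())

rng = np.random.default_rng(0)
emb = rng.normal(size=(6, 256))                         # random stand-in embeddings
labels = np.array([0, 0, 1, 1, 2, 2])
weights = rng.normal(size=(3, 256))
centers = np.stack([emb[labels == c].mean(axis=0) for c in range(3)])
loss_am = am_softmax_loss(emb, weights, labels)
loss_center = center_loss(emb, centers, labels)
```

Subtracting the margin from the true-class cosine makes the classification harder during training, which forces a gap between classes in the embedding space.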
Training
Hard sample mining procedure:
▪ Sample k augmented frames for each person from the training data
▪ Estimate the weighted loss (AM-Softmax + Center + Glob PushPlus) for
each sample
▪ Train the network on mini-batches drawn from the hardest 50% of samples
▪ Strengthen the data augmentation and repeat
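The selection step of the procedure above can be sketched as follows, assuming a per-sample loss has already been computed. The function names and the loss values are made up for illustration:

```python
import numpy as np

def select_hard_samples(per_sample_loss, keep_fraction=0.5):
    """Return indices of the highest-loss samples, hardest first."""
    keep = max(1, int(len(per_sample_loss) * keep_fraction))
    return np.argsort(per_sample_loss)[::-1][:keep]

def mini_batches(indices, batch_size, rng):
    """Shuffle the selected indices and yield them in mini-batches."""
    shuffled = rng.permutation(indices)
    for start in range(0, len(shuffled), batch_size):
        yield shuffled[start:start + batch_size]

rng = np.random.default_rng(0)
losses = np.array([0.2, 1.5, 0.1, 0.9, 2.3, 0.4, 0.05, 1.1])  # made-up per-sample losses
hard = select_hard_samples(losses)          # indices 4, 1, 7, 3
batches = list(mini_batches(hard, batch_size=2, rng=rng))
```

In a full training loop this selection would be rerun after each round, since the set of hardest samples changes as the network improves and the augmentation gets stronger.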
Data
Need a better CNN-based solution? -> You need more data!
Training data:
▪ Market1501 (~700 train IDs)
▪ Viper (632 IDs)
▪ MARS (~1200 train + test IDs)
▪ Internal data (~1500 IDs)
Test data:
▪ Internal data (~1300 IDs)
▪ Market1501 (~700 test IDs)
[Slide annotations: ~15K samples, ~1300 IDs; ~1.3M samples, ~2700 IDs; ~20k samples, ~200 IDs; imbalance]
Ablation study
[Figure: ablation study]
Person Re-identification models
Model      Input resolution  GFlops  MParam  Market-1501 rank@1  Market-1501 mAP
Strong     128x384           0.594   0.820   0.9237              0.8253
Light      64x160            0.124   0.820   0.9166              0.8163
Very fast  48x96             0.028   0.028   0.7791              0.6180
Results
• FPS values were obtained using OpenVINO on an Intel® Core™ i7. Values are approximate because
only the backbone inference time is measured.
• RK stands for the re-ranking technique. Flip means that both the original and the flipped (mirrored)
images are used to compute embeddings.
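One common way to use both the original and the mirrored image is to average the two embeddings and renormalize. The slides do not say how the two are combined, so the averaging below is an assumption, and `toy_extractor` is a stand-in for the real network:

```python
import numpy as np

def embed_with_flip(image, extractor):
    """Sum the embeddings of an image and its horizontal mirror, then renormalize."""
    original = extractor(image)
    flipped = extractor(image[:, ::-1, :])     # mirror along the width axis of (H, W, C)
    combined = original + flipped
    return combined / np.linalg.norm(combined)

# Stand-in extractor for illustration only: per-column means, flattened.
def toy_extractor(image):
    return image.mean(axis=0).ravel()

rng = np.random.default_rng(0)
image = rng.uniform(size=(160, 64, 3))         # assumes 64x160 means width x height
embedding = embed_with_flip(image, toy_extractor)
```

The combined embedding is the same whether the original or the mirrored image is passed in, at the cost of running the extractor twice per image.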
Other RMNet-based models
An SSD head can be attached
to the RMNet backbone to obtain
fast and reasonably accurate object
detectors.
▪ Person detector
▪ Person and face detector
▪ Person, vehicle, and bike
detector
▪ People detection and
action recognition
Conclusion
▪ RMNet was developed as a fast and accurate network for the person
re-identification task.
▪ It combines near state-of-the-art quality with high inference speed.
▪ The RMNet backbone can easily be reused in other tasks such as object
detection.
▪ All presented models are available in Open Model Zoo.
Resources
▪ “Fast and Accurate Person Re-Identification with RMNet” paper
https://arxiv.org/pdf/1812.02465.pdf
▪ Open Model Zoo - contains RMNet-based and other models trained
by Intel
https://github.com/opencv/open_model_zoo
▪ OpenVINO
https://software.intel.com/en-us/openvino-toolkit

  • 1. © 2019 IOTG Computer Vision (ICV), Intel Fast and Accurate RMNet: A New Neural Network for Embedded Vision Ilya Krylov IOTG Computer Vision (ICV), Intel May 2019
  • 2. © 2019 IOTG Computer Vision (ICV), Intel Agenda ▪ Introduction to the Person re-identification problem ▪ Metric learning approach ▪ Feature extractor and distance function selection ▪ RMNet backbone design ▪ Training: losses, data, sampling ▪ Person re-identification task results ▪ Other applications of RMNet backbone 2
  • 3. © 2019 IOTG Computer Vision (ICV), Intel Person Re-identification problem statement Person re-identification (Re-ID) task is to find a given person (probe) in a gallery of pedestrian images. Complexity: different cameras, lighting conditions, various poses, angles of view, not accurate results of detector and so on. 3
  • 4. © 2019 IOTG Computer Vision (ICV), Intel Person Re-identification quality metrics Re-ID output: similarity measure between two images. Quality metrics evaluation includes following steps: 1. Compute the similarity measure between the probe image and each pedestrian image in the gallery. 2. Measure the model quality by: ▪ Cumulative matching curve (CMC) at rank@1 – Evaluates the ability of a method to find the most appropriate gallery image, tells nothing about the robustness of method ▪ Mean average precision (mAP) - Evaluates the ability of a method to find all appropriate gallery images, describes how well method can extract the internal data representation 4
  • 5. Common approach: Metric learning
      ▪ Extract an internal representation from each image with a feature extractor that maps similar images from image space to nearby points in embedding space.
      ▪ Compare a pair of internal representations with a distance function that computes the similarity between two points in embedding space.
  • 6. Distance function
      Strong solution:
      ▪ Parametric model: use a neural network as the distance function.
      Fast solution:
      ▪ Standard non-parametric models such as L1, L2, or cosine distance.
      Problem: we want to split pedestrian images into groups of images of the same person -> this needs a distance matrix over all images -> all pairwise distances must be estimated, and the number of pairs grows quadratically with the number of images.
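The cost argument on slide 6 can be made concrete with a short sketch of the fast, non-parametric route: cosine distance over L2-normalized embeddings, evaluated for every unordered pair. All function names here are illustrative, and the sketch assumes non-zero embedding vectors.

```python
import math
from itertools import combinations


def l2_normalize(vector):
    """Scale a non-zero vector to unit length."""
    norm = math.sqrt(sum(x * x for x in vector))
    return [x / norm for x in vector]


def cosine_distance(a, b):
    """Cosine distance between two unit-length vectors."""
    return 1.0 - sum(x * y for x, y in zip(a, b))


def pairwise_distances(embeddings):
    """Distance for every unordered pair (i, j) with i < j."""
    unit = [l2_normalize(e) for e in embeddings]
    return {(i, j): cosine_distance(unit[i], unit[j])
            for i, j in combinations(range(len(unit)), 2)}


if __name__ == "__main__":
    distances = pairwise_distances([[1.0, 0.0], [0.0, 1.0], [2.0, 0.0]])
    print(len(distances), distances)
```

For N images this evaluates the distance function N(N-1)/2 times, which is why a cheap distance function matters once the gallery is large.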
  • 7. Feature extractor
      Lightweight backbone
      ▪ ResNet50 is too heavy
      Single-branch solution
      ▪ Limit the number of auxiliary branches to save computation resources
      Small embeddings
      ▪ Normalized embeddings, 256 floats
      ▪ Larger embeddings -> longer distance computation -> multiplied by a quadratic number of distance function calls -> longer total time
      [Diagram: Backbone, Head]
  • 8. Lightweight backbone
      General problems of publicly available backbones:
      ▪ Mostly developed to solve the classification task
      ▪ No restriction on capacity: built to show state-of-the-art results at any cost
      [Diagram: Getting a fast net. Labels: Train task-specific lightweight network directly; Deal with state-of-the-art networks; Quantization; Pruning]
  • 9. Lightweight backbone
      ▪ Select the key requirement and grow the net architecture by solving problems
      ▪ Use best-working practices
      [Diagram with columns: Properties, Key Requirement, Problems, Best-working Practices, Solutions. Labels: Strong approximator; Significant nonlinearity; Large receptive field; Deep network; Gradient flow; Heavy net; Regularization; Bottlenecks; Depth-wise convolutions; Initialization; Pre-training; Activation function; Residual connections; ResNet-like bottlenecks; 3x3 depth-wise internal conv; Orthogonal; ELU; Dropout in each block]
  • 10. RMNet
      Very deep but lightweight
      ▪ ResNet-like, 109 layers with at most 256 channels
      ▪ Residual block structure: Squeeze 1x1 -> 3x3 dw -> Expand 1x1
      ▪ Pre-training on the classification task
      ▪ Dropout after each residual block
      ▪ Exponential Linear Unit (ELU) activations
      ▪ Orthogonal weight initialization
  • 11. Person Re-identification head
      No fully connected layers (to reduce computation time)
      ▪ Replace the default pooling stage (FC) with a Global Max Pooling (GMP) layer
      Use extra parametrization
      ▪ Inverted bottleneck with extra nonlinearity: 256 -> 512 -> 256
      ▪ Calibration layer (to mix different target embeddings)
      [Diagram: Re-ID Head. Labels: HxWx256; Global Max Pooling; 1x1x256; 1x1x512 conv; ELU; 1x1x256 conv; L2 Norm; Local Structure Loss; 1x1x256 conv Calibration; L2 Norm; Global Structure Loss; Network output]
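As a rough illustration of two operations named on slide 11, the sketch below applies global max pooling to an H x W x C feature map and then L2-normalizes the pooled vector. It uses plain nested lists, omits the 1x1 convolutions, ELU, and calibration layer, and is not the RMNet implementation; the function names are hypothetical.

```python
import math


def global_max_pool(feature_map):
    """Reduce an H x W x C nested list to a C-element vector."""
    channels = len(feature_map[0][0])
    return [max(pixel[c] for row in feature_map for pixel in row)
            for c in range(channels)]


def l2_normalize(vector):
    """Scale a non-zero vector to unit length."""
    norm = math.sqrt(sum(x * x for x in vector))
    return [x / norm for x in vector]


if __name__ == "__main__":
    # A toy 2 x 2 feature map with 2 channels per position.
    fmap = [[[1.0, 5.0], [3.0, 2.0]],
            [[0.0, 7.0], [4.0, 1.0]]]
    pooled = global_max_pool(fmap)
    print(pooled, l2_normalize(pooled))
```

Pooling removes the spatial dimensions without any learned weights, which is the reason the slide gives for preferring it to a fully connected layer.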
  • 12. Overall training scheme
      Key components:
      ▪ Lightweight backbone: RMNet
      ▪ Strong head after the backbone
      ▪ Hard sample mining procedure
      ▪ Multi-target training: AM-Softmax, Center, PushPlus, and Glob PushPlus losses
      [Diagram labels: Data; Training; Sampling; Backbone; Head]
  • 13. Losses
      ▪ AM-Softmax: separates classes with a margin
      ▪ Center loss: pulls points of the same class closer to the class center
      ▪ PushPlus losses: push points of different classes farther apart, with a margin greater than the inter-class distance
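To show what "separates classes with a margin" means for the first loss on slide 13, here is a minimal single-sample sketch of AM-Softmax: the margin is subtracted from the target-class cosine before scaling and applying softmax cross-entropy. The default `scale` and `margin` values are assumptions for illustration and are not stated in the presentation.

```python
import math


def am_softmax_loss(cosines, target, scale=30.0, margin=0.35):
    """AM-Softmax loss for one sample.

    cosines[j] is the cosine between the normalized embedding and the
    normalized weight vector of class j; target is the true class index.
    scale and margin are illustrative defaults (assumed).
    """
    logits = [scale * (c - margin) if j == target else scale * c
              for j, c in enumerate(cosines)]
    # Numerically stable log-sum-exp.
    top = max(logits)
    log_sum = top + math.log(sum(math.exp(z - top) for z in logits))
    return log_sum - logits[target]


if __name__ == "__main__":
    cosines = [0.8, 0.1, -0.2]
    print(am_softmax_loss(cosines, target=0))
    print(am_softmax_loss(cosines, target=0, margin=0.0))
```

With `margin=0.0` this reduces to scaled softmax cross-entropy on cosines; a positive margin raises the loss for the same prediction, so the network has to separate classes by more than the decision boundary alone requires.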
  • 14. Training
      Hard sample mining procedure:
      ▪ Sample k augmented frames for each person from the training data
      ▪ Estimate the weighted loss (AM-Softmax + Center + Glob PushPlus) for each sample
      ▪ Train the net on mini-batches taken from the top 50% hardest samples
      ▪ Apply stronger data augmentation and repeat
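The selection step of the mining procedure on slide 14 can be sketched as follows, assuming a loss value has already been computed for every sampled frame. `select_hardest` is a hypothetical helper; augmentation and mini-batch construction are left out.

```python
def select_hardest(samples, losses, keep_fraction=0.5):
    """Keep the samples with the highest loss values.

    keep_fraction=0.5 corresponds to the "top 50%" on the slide.
    """
    order = sorted(range(len(samples)),
                   key=lambda i: losses[i], reverse=True)
    keep = max(1, int(len(samples) * keep_fraction))
    return [samples[i] for i in order[:keep]]


if __name__ == "__main__":
    frames = ["a", "b", "c", "d"]
    frame_losses = [0.1, 0.9, 0.5, 0.3]
    print(select_hardest(frames, frame_losses))
```

Mini-batches would then be drawn only from the returned subset before the next round of sampling with stronger augmentation.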
  • 15. Data
      Do you need a better CNN-based solution? -> You need more data!
      Train data:
      ▪ Market1501 (~700 train IDs)
      ▪ Viper (632 IDs)
      ▪ MARS (~1200 train + test IDs)
      ▪ Internal data (~1500 IDs)
      Test data:
      ▪ Internal data (~1300 IDs)
      ▪ Market1501 (~700 test IDs)
      [Figure annotations: ~15K samples, ~1300 IDs; ~1.3M samples, ~2700 IDs; ~20k samples, ~200 IDs; imbalance]
  • 16. Ablation study
  • 17. Person Re-identification models (quality measured on Market-1501)
      Model      Input resolution  GFlops  MParam  rank@1  mAP
      Strong     128x384           0.594   0.820   0.9237  0.8253
      Light      64x160            0.124   0.820   0.9166  0.8163
      Very fast  48x96             0.028   0.028   0.7791  0.6180
  • 18. Results
      ▪ FPS values were obtained using OpenVINO on Intel® Core™ i7. Values are approximate because only backbone inference time is measured.
      ▪ RK stands for the re-ranking technique. Flip means that both the original and the flipped (mirrored) image are used to compute embeddings.
  • 19. Other RMNet-based models
      An SSD head can be connected to the RMNet backbone to get fast and good-enough object detectors.
      ▪ Person detector
      ▪ Person and face detector
      ▪ Person, vehicle, and bike detection
      ▪ People detection and action recognition
  • 20. Conclusion
      ▪ RMNet has been developed as a fast and accurate network for the person re-identification task.
      ▪ It combines near state-of-the-art quality with superior performance.
      ▪ The RMNet backbone can easily be used in other tasks such as object detection.
      ▪ All presented models are available in Open Model Zoo.
  • 21. Resources
      ▪ "Fast and Accurate Person Re-Identification with RMNet" paper: https://arxiv.org/pdf/1812.02465.pdf
      ▪ Open Model Zoo, which contains RMNet-based and other models trained by Intel: https://github.com/opencv/open_model_zoo
      ▪ OpenVINO: https://software.intel.com/en-us/openvino-toolkit