SlideShare uma empresa Scribd logo
1 de 54
Baixar para ler offline
SSD: Single Shot MultiBox Detector
Wei Liu, Dragomir Anguelov, Dumitru Erhan, Christian Szegedy, Scott Reed,
Cheng-Yang Fu, Alexander C.Berg
[arXiv][demo][code] (Mar 2016)
Slides by Míriam Bellver
Computer Vision Reading Group, UPC
28th October, 2016
Outline
▷ Introduction
▷ Related Work
▷ The Single-Shot Detector
▷ Experimental Results
▷ Conclusions
1.
Introduction
SSD: Single Shot MultiBox Detector
Introduction
Object detection
Introduction
Current object detection systems
resample pixels for each BBOX
resample features for each BBOX
high quality
classifier
Object proposals
generation
Image
pixels
Introduction
Current object detection systems
Computationally too intensive and too slow for real-time
applications
Faster R-CNN 7 FPS
resample pixels for each BBOX
resample features for each BBOX
high quality
classifier
Object proposals
generation
Image
pixels
Introduction
Current object detection systems
Computationally too intensive and too slow for real-time
applications
Faster R-CNN 7 FPS
resample pixels for each BBOX
resample features for each BBOX
high quality
classifier
Object proposals
generation
Image
pixels
Introduction
SSD: First deep network based object detector
that does not resample pixels or features for
bounding box hypotheses and is as accurate as
approaches that do.
Improvement in speed vs accuracy trade-off
Introduction
Introduction
Contributions:
▷ A single-shot detector for multiple categories that is faster
than state of the art single shot detectors (YOLO) and as
accurate as Faster R-CNN
▷ Predicts category scores and boxes offset for a fixed set of
default BBs using small convolutional filters applied to
feature maps
▷ Predictions of different scales from feature maps of
different scales, and separate predictions by aspect ratio
▷ End-to-end training and high accuracy, improving speed vs
accuracy trade-off
2.
Related Work
SSD: Single Shot MultiBox Detector
Object Detection prior to CNNs
Two different traditional approaches:
▷ Sliding Window: e.g. Deformable Part Model (DPM)
▷ Object proposals: e.g. Selective Search
Felzenszwalb, P., McAllester, D., & Ramanan, D. (2008, June). A discriminatively trained, multiscale, deformable part model
Uijlings, J. R., van de Sande, K. E., Gevers, T., & Smeulders, A. W. (2013). Selective search for object recognition
Object detection with CNN’s
▷ R-CNN
Girshick, R. (2015). Fast r-cnn
Leveraging the object proposals
bottleneck
▷ SPP-net
▷ Fast R-CNN
He, K., Zhang, X., Ren, S., & Sun, J. (2014, September). Spatial pyramid pooling in deep convolutional networks for visual recognition
Girshick, R. (2015). Fast r-cnn.
Improving quality of proposals using
CNNs
Low-level features object proposals
Proposals generated directly from a DNN
E.g. : MultiBox, Faster R-CNN
Ren, S., He, K., Girshick, R., & Sun, J. (2015). Faster R-CNN: Towards real-time object detection with region proposal networks
Szegedy, C., Reed, S., Erhan, D., & Anguelov, D. (2014). Scalable, high-quality object detection.
Single-shot detectors
Instead of having two networks
Region Proposals Network + Classifier Network
In Single-shot architectures, bounding boxes and confidences
for multiple categories are predicted directly with a single
network
e.g. : Overfeat, YOLO
Redmon, J., Divvala, S., Girshick, R., & Farhadi, A. (2015). You only look once: Unified, real-time object detection
Sermanet, P., Eigen, D., Zhang, X., Mathieu, M., Fergus, R., & LeCun, Y. (2013). Overfeat: Integrated recognition, localization and detection using
convolutional networks.
Single-shot detectors
Main differences of SSD over YOLO and Overfeat:
▷ Small conv. filters to predict object categories and offsets in BBs
locations, using separate predictors for different aspect ratios, and
applying them on different feature maps to perform detection on multiple
scales
3.1
The Single Shot Detector (SSD)
Model
The Single Shot Detector (SSD)
base network fc6 and fc7
converted to
convolutional
collection of bounding boxes
and scores for each category
The Single Shot Detector (SSD)
base network fc6 and fc7
converted to
convolutional
Multi-scale feature maps for detection: observe how conv feature maps decrease in size and
allow predictions at multiple scales
collection of bounding boxes
and scores for each category
The Single Shot Detector (SSD)
Comparison to YOLO
The Single Shot Detector (SSD)
Convolutional predictors for detection: We apply on top of each conv
feature map a set of filters that predict detections for different aspect ratios
and class categories
The Single Shot Detector (SSD)
What is a detection ?
Described by four parameters (center
bounding box x and y, width and height)
Class category
For all categories we need for
a detection a total of #classes
+ 4 values
The Single Shot Detector (SSD)
Detector for SSD:
Each detector will output a single value, so we need
(classes + 4) detectors for a detection
The Single Shot Detector (SSD)
Detector for SSD:
Each detector will output a single value, so we need
(classes + 4) detectors for a detection
BUT there are different types of detections!
The Single Shot Detector (SSD)
Different “classes” of detections
aspect ratio 2:1
for cats
aspect ratio 1:2
for cats
aspect ratio 1:1
for cats
The Single Shot Detector (SSD)
Default boxes and aspect ratios
Similar to the anchors of Faster R-CNN, with the difference that SSD
applies them on several feature maps of different resolutions
The Single Shot Detector (SSD)
Detector for SSD:
Each detector will output a single value, so we need
(classes + 4) detectors for a detection
as we have #default boxes, we need
(classes + 4) x #default boxes detectors
The Single Shot Detector (SSD)
Convolutional predictors for detection: We apply on top of each conv
feature map a set of filters that predict detections for different aspect ratios
and class categories
The Single Shot Detector (SSD)
For each feature layer of m x n with p channels we apply kernels of 3 x 3 x p to
produce either a score for a category, or a shape offset relative to a default
bounding box coordinates
1024
3
3
Example for conv6:
shape
offset xo to
default box
of aspect
ratio: 1 for
category 1
1024
3
3
score for
category 1
for the
default box
of aspect
ratio: 1
up to classes+4 filters for each default box
considered at that conv feature map
The Single Shot Detector (SSD)
For each feature layer of m x n with p channels we apply kernels of 3 x 3 x p to
produce either a score for a category, or a shape offset relative to a default
bounding box coordinates
So, for each conv layer considered, there are
(classes + 4) x default boxes x m x n
outputs
3.2
The Single Shot Detector (SSD)
Training
The Single Shot Detector (SSD)
SSD requires that ground-truth data
is assigned to specific outputs in the
fixed set of detector outputs
The Single Shot Detector (SSD)
Matching strategy:
For each ground truth box we have to select from all the
default boxes the ones that best fit in terms of location, aspect
ratio and scale.
▷ We select the default box with best jaccard overlap. Then
every box has at least 1 correspondence.
▷ Default boxes with a jaccard overlap higher than 0.5 are
also selected
The Single Shot Detector (SSD)
Training objective:
Similar to MultiBox but handles multiple categories.
confidence loss
softmax loss
localization loss
Smooth L1 lossN: number of default matched BBs
x: is 1 if the default box is matched to a
determined ground truth box, and 0
otherwise
l: predicted bb parameters
g: ground truth bb parameters
c: class
is 1 by
cross-validation
The Single Shot Detector (SSD)
Choosing scales and aspect ratios for default boxes:
▷ Feature maps from different layers are used to handle scale variance
▷ Specific feature map locations learn to be responsive to specific areas
of the image and particular scales of objects
The Single Shot Detector (SSD)
Choosing scales and aspect ratios for default boxes:
▷ If m feature maps are used for prediction, the scale of the default
boxes for each feature map is:
0.1
0.95
The Single Shot Detector (SSD)
Choosing scales and aspect ratios for default boxes:
▷ At each scale, different aspect ratios are considered:
width and height of default bbox
The Single Shot Detector (SSD)
Hard negative mining:
Significant imbalance between positive and negative training examples
▷ Use negative samples with higher confidence score
▷ Then the ratio of positive-negative samples is 3:1
The Single Shot Detector (SSD)
Data augmentation:
Each training sample is randomly sampled by one of the
following options:
▷ Use the original image
▷ Sample a path with a minimum jaccard overlap with
objects
▷ Randomly sample a path
4.
Experimental Results
SSD: Single Shot MultiBox Detector
Experimental Results
Base network:
▷ VGG16 (with fc6 and fc7 converted to conv layers and pool5 from 2x2 to
3x3 using atrous algorithm, removed fc8 and dropout)
▷ It is fine-tuned using SGD
▷ Training and testing code is built on caffe
Database:
▷ Training: VOC2007 trainval and VOC2012 trainval (16551 images)
▷ Testing: VOC2007 test (4952 images)
Experimental Results
Mean Average Precision for PASCAL ‘07
Experimental Results
Model analysis
Experimental Results
Visualization for performance
Experimental Results
Sensitivity to different object characteristics with PASCAL
2007 :
Experimental Results
Database:
▷ Training: VOC2007 trainval and test and VOC2012
trainval (21503 images)
▷ Testing: VOC2012 test (10991 images)
Mean Average Precision for PASCAL ‘12
Experimental Results
MS COCO Database:
▷ A total of 300k images
Test-dev results:
Experimental Results
Inference time :
Non-maximum suppression has to be efficient because of the
large number of boxes generated
Visualizations
Visualizations
5.
Conclusions
SSD: Single Shot MultiBox Detector
Conclusions
▷ Single-shot object detector for multiple categories
▷ One key feature is to use multiple convolutional maps
to deal with different scales
▷ More default bounding boxes, the better results
obtained
▷ Comparable accuracy to state-of-the-art object
detectors, but much faster
▷ Future direction: use RNNs to detect and track objects
in video
Thank you for your
attention! Questions?

Mais conteúdo relacionado

Mais procurados

Faster R-CNN - PR012
Faster R-CNN - PR012Faster R-CNN - PR012
Faster R-CNN - PR012Jinwon Lee
 
Introduction to object detection
Introduction to object detectionIntroduction to object detection
Introduction to object detectionBrodmann17
 
Intro to Object Detection with SSD
Intro to Object Detection with SSDIntro to Object Detection with SSD
Intro to Object Detection with SSDThomas Delteil
 
Object Detection Using R-CNN Deep Learning Framework
Object Detection Using R-CNN Deep Learning FrameworkObject Detection Using R-CNN Deep Learning Framework
Object Detection Using R-CNN Deep Learning FrameworkNader Karimi
 
Deep learning for object detection
Deep learning for object detectionDeep learning for object detection
Deep learning for object detectionWenjing Chen
 
You only look once: Unified, real-time object detection (UPC Reading Group)
You only look once: Unified, real-time object detection (UPC Reading Group)You only look once: Unified, real-time object detection (UPC Reading Group)
You only look once: Unified, real-time object detection (UPC Reading Group)Universitat Politècnica de Catalunya
 
PR-207: YOLOv3: An Incremental Improvement
PR-207: YOLOv3: An Incremental ImprovementPR-207: YOLOv3: An Incremental Improvement
PR-207: YOLOv3: An Incremental ImprovementJinwon Lee
 
Convolutional Neural Network and Its Applications
Convolutional Neural Network and Its ApplicationsConvolutional Neural Network and Its Applications
Convolutional Neural Network and Its ApplicationsKasun Chinthaka Piyarathna
 
Convolutional Neural Network Models - Deep Learning
Convolutional Neural Network Models - Deep LearningConvolutional Neural Network Models - Deep Learning
Convolutional Neural Network Models - Deep LearningMohamed Loey
 
Yolo v2 ai_tech_20190421
Yolo v2 ai_tech_20190421Yolo v2 ai_tech_20190421
Yolo v2 ai_tech_20190421穗碧 陳
 
End-to-End Object Detection with Transformers
End-to-End Object Detection with TransformersEnd-to-End Object Detection with Transformers
End-to-End Object Detection with TransformersSeunghyun Hwang
 
Convolutional neural network
Convolutional neural networkConvolutional neural network
Convolutional neural networkMojammilHusain
 
Convolutional Neural Networks
Convolutional Neural NetworksConvolutional Neural Networks
Convolutional Neural NetworksAshray Bhandare
 
Object Detection & Tracking
Object Detection & TrackingObject Detection & Tracking
Object Detection & TrackingAkshay Gujarathi
 

Mais procurados (20)

Faster R-CNN - PR012
Faster R-CNN - PR012Faster R-CNN - PR012
Faster R-CNN - PR012
 
Introduction to object detection
Introduction to object detectionIntroduction to object detection
Introduction to object detection
 
Intro to Object Detection with SSD
Intro to Object Detection with SSDIntro to Object Detection with SSD
Intro to Object Detection with SSD
 
Object Detection Using R-CNN Deep Learning Framework
Object Detection Using R-CNN Deep Learning FrameworkObject Detection Using R-CNN Deep Learning Framework
Object Detection Using R-CNN Deep Learning Framework
 
YOLO
YOLOYOLO
YOLO
 
Deep learning for object detection
Deep learning for object detectionDeep learning for object detection
Deep learning for object detection
 
You only look once: Unified, real-time object detection (UPC Reading Group)
You only look once: Unified, real-time object detection (UPC Reading Group)You only look once: Unified, real-time object detection (UPC Reading Group)
You only look once: Unified, real-time object detection (UPC Reading Group)
 
PR-207: YOLOv3: An Incremental Improvement
PR-207: YOLOv3: An Incremental ImprovementPR-207: YOLOv3: An Incremental Improvement
PR-207: YOLOv3: An Incremental Improvement
 
You only look once
You only look onceYou only look once
You only look once
 
Cnn
CnnCnn
Cnn
 
Yolo
YoloYolo
Yolo
 
Convolutional Neural Network and Its Applications
Convolutional Neural Network and Its ApplicationsConvolutional Neural Network and Its Applications
Convolutional Neural Network and Its Applications
 
Convolutional Neural Network Models - Deep Learning
Convolutional Neural Network Models - Deep LearningConvolutional Neural Network Models - Deep Learning
Convolutional Neural Network Models - Deep Learning
 
Yolo v2 ai_tech_20190421
Yolo v2 ai_tech_20190421Yolo v2 ai_tech_20190421
Yolo v2 ai_tech_20190421
 
End-to-End Object Detection with Transformers
End-to-End Object Detection with TransformersEnd-to-End Object Detection with Transformers
End-to-End Object Detection with Transformers
 
Yol ov2
Yol ov2Yol ov2
Yol ov2
 
Convolutional neural network
Convolutional neural networkConvolutional neural network
Convolutional neural network
 
Yolo
YoloYolo
Yolo
 
Convolutional Neural Networks
Convolutional Neural NetworksConvolutional Neural Networks
Convolutional Neural Networks
 
Object Detection & Tracking
Object Detection & TrackingObject Detection & Tracking
Object Detection & Tracking
 

Destaque

Single Shot MultiBox Detector와 Recurrent Instance Segmentation
Single Shot MultiBox Detector와 Recurrent Instance SegmentationSingle Shot MultiBox Detector와 Recurrent Instance Segmentation
Single Shot MultiBox Detector와 Recurrent Instance Segmentation홍배 김
 
SSD: Single Shot MultiBox Detector (ECCV2016)
SSD: Single Shot MultiBox Detector (ECCV2016)SSD: Single Shot MultiBox Detector (ECCV2016)
SSD: Single Shot MultiBox Detector (ECCV2016)Takanori Ogata
 
Convolutional Pose Machines
Convolutional Pose MachinesConvolutional Pose Machines
Convolutional Pose MachinesTakanori Ogata
 
Deep Object Detectors #1 (~2016.6)
Deep Object Detectors #1 (~2016.6)Deep Object Detectors #1 (~2016.6)
Deep Object Detectors #1 (~2016.6)Ildoo Kim
 
Q Learning과 CNN을 이용한 Object Localization
Q Learning과 CNN을 이용한 Object LocalizationQ Learning과 CNN을 이용한 Object Localization
Q Learning과 CNN을 이용한 Object Localization홍배 김
 
인공지능, 기계학습 그리고 딥러닝
인공지능, 기계학습 그리고 딥러닝인공지능, 기계학습 그리고 딥러닝
인공지능, 기계학습 그리고 딥러닝Jinwon Lee
 
알파고 (바둑 인공지능)의 작동 원리
알파고 (바둑 인공지능)의 작동 원리알파고 (바둑 인공지능)의 작동 원리
알파고 (바둑 인공지능)의 작동 원리Shane (Seungwhan) Moon
 
기계학습 / 딥러닝이란 무엇인가
기계학습 / 딥러닝이란 무엇인가기계학습 / 딥러닝이란 무엇인가
기계학습 / 딥러닝이란 무엇인가Yongha Kim
 

Destaque (9)

Recurrent Instance Segmentation (UPC Reading Group)
Recurrent Instance Segmentation (UPC Reading Group)Recurrent Instance Segmentation (UPC Reading Group)
Recurrent Instance Segmentation (UPC Reading Group)
 
Single Shot MultiBox Detector와 Recurrent Instance Segmentation
Single Shot MultiBox Detector와 Recurrent Instance SegmentationSingle Shot MultiBox Detector와 Recurrent Instance Segmentation
Single Shot MultiBox Detector와 Recurrent Instance Segmentation
 
SSD: Single Shot MultiBox Detector (ECCV2016)
SSD: Single Shot MultiBox Detector (ECCV2016)SSD: Single Shot MultiBox Detector (ECCV2016)
SSD: Single Shot MultiBox Detector (ECCV2016)
 
Convolutional Pose Machines
Convolutional Pose MachinesConvolutional Pose Machines
Convolutional Pose Machines
 
Deep Object Detectors #1 (~2016.6)
Deep Object Detectors #1 (~2016.6)Deep Object Detectors #1 (~2016.6)
Deep Object Detectors #1 (~2016.6)
 
Q Learning과 CNN을 이용한 Object Localization
Q Learning과 CNN을 이용한 Object LocalizationQ Learning과 CNN을 이용한 Object Localization
Q Learning과 CNN을 이용한 Object Localization
 
인공지능, 기계학습 그리고 딥러닝
인공지능, 기계학습 그리고 딥러닝인공지능, 기계학습 그리고 딥러닝
인공지능, 기계학습 그리고 딥러닝
 
알파고 (바둑 인공지능)의 작동 원리
알파고 (바둑 인공지능)의 작동 원리알파고 (바둑 인공지능)의 작동 원리
알파고 (바둑 인공지능)의 작동 원리
 
기계학습 / 딥러닝이란 무엇인가
기계학습 / 딥러닝이란 무엇인가기계학습 / 딥러닝이란 무엇인가
기계학습 / 딥러닝이란 무엇인가
 

Semelhante a SSD: Fast and Accurate Object Detection in a Single Shot

#10 pydata warsaw object detection with dn ns
#10   pydata warsaw object detection with dn ns#10   pydata warsaw object detection with dn ns
#10 pydata warsaw object detection with dn nsAndrew Brozek
 
“Understanding DNN-Based Object Detectors,” a Presentation from Au-Zone Techn...
“Understanding DNN-Based Object Detectors,” a Presentation from Au-Zone Techn...“Understanding DNN-Based Object Detectors,” a Presentation from Au-Zone Techn...
“Understanding DNN-Based Object Detectors,” a Presentation from Au-Zone Techn...Edge AI and Vision Alliance
 
object detection paper review
object detection paper reviewobject detection paper review
object detection paper reviewYoonho Na
 
MLIP - Chapter 5 - Detection, Segmentation, Captioning
MLIP - Chapter 5 - Detection, Segmentation, CaptioningMLIP - Chapter 5 - Detection, Segmentation, Captioning
MLIP - Chapter 5 - Detection, Segmentation, CaptioningCharles Deledalle
 
Codetecon #KRK 3 - Object detection with Deep Learning
Codetecon #KRK 3 - Object detection with Deep LearningCodetecon #KRK 3 - Object detection with Deep Learning
Codetecon #KRK 3 - Object detection with Deep LearningMatthew Opala
 
ADAPTIVE FILTER FOR DENOISING 3D DATA CAPTURED BY DEPTH SENSORS
ADAPTIVE FILTER FOR DENOISING 3D DATA CAPTURED BY DEPTH SENSORSADAPTIVE FILTER FOR DENOISING 3D DATA CAPTURED BY DEPTH SENSORS
ADAPTIVE FILTER FOR DENOISING 3D DATA CAPTURED BY DEPTH SENSORSSoma Boubou
 
Trackster Pruning at the CMS High-Granularity Calorimeter
Trackster Pruning at the CMS High-Granularity CalorimeterTrackster Pruning at the CMS High-Granularity Calorimeter
Trackster Pruning at the CMS High-Granularity CalorimeterYousef Fadila
 
物件偵測與辨識技術
物件偵測與辨識技術物件偵測與辨識技術
物件偵測與辨識技術CHENHuiMei
 
“Understanding, Selecting and Optimizing Object Detectors for Edge Applicatio...
“Understanding, Selecting and Optimizing Object Detectors for Edge Applicatio...“Understanding, Selecting and Optimizing Object Detectors for Edge Applicatio...
“Understanding, Selecting and Optimizing Object Detectors for Edge Applicatio...Edge AI and Vision Alliance
 
Single shot multiboxdetectors
Single shot multiboxdetectorsSingle shot multiboxdetectors
Single shot multiboxdetectors지현 백
 
Shai Avidan's Support vector tracking and ensemble tracking
Shai Avidan's Support vector tracking and ensemble trackingShai Avidan's Support vector tracking and ensemble tracking
Shai Avidan's Support vector tracking and ensemble trackingwolf
 
Single shot multiboxdetectors
Single shot multiboxdetectorsSingle shot multiboxdetectors
Single shot multiboxdetectors지현 백
 
150807 Fast R-CNN
150807 Fast R-CNN150807 Fast R-CNN
150807 Fast R-CNNJunho Cho
 
Week5-Faster R-CNN.pptx
Week5-Faster R-CNN.pptxWeek5-Faster R-CNN.pptx
Week5-Faster R-CNN.pptxfahmi324663
 

Semelhante a SSD: Fast and Accurate Object Detection in a Single Shot (20)

Object Detection - Míriam Bellver - UPC Barcelona 2018
Object Detection - Míriam Bellver - UPC Barcelona 2018Object Detection - Míriam Bellver - UPC Barcelona 2018
Object Detection - Míriam Bellver - UPC Barcelona 2018
 
#10 pydata warsaw object detection with dn ns
#10   pydata warsaw object detection with dn ns#10   pydata warsaw object detection with dn ns
#10 pydata warsaw object detection with dn ns
 
“Understanding DNN-Based Object Detectors,” a Presentation from Au-Zone Techn...
“Understanding DNN-Based Object Detectors,” a Presentation from Au-Zone Techn...“Understanding DNN-Based Object Detectors,” a Presentation from Au-Zone Techn...
“Understanding DNN-Based Object Detectors,” a Presentation from Au-Zone Techn...
 
object-detection.pptx
object-detection.pptxobject-detection.pptx
object-detection.pptx
 
object detection paper review
object detection paper reviewobject detection paper review
object detection paper review
 
MLIP - Chapter 5 - Detection, Segmentation, Captioning
MLIP - Chapter 5 - Detection, Segmentation, CaptioningMLIP - Chapter 5 - Detection, Segmentation, Captioning
MLIP - Chapter 5 - Detection, Segmentation, Captioning
 
D3L4-objects.pdf
D3L4-objects.pdfD3L4-objects.pdf
D3L4-objects.pdf
 
Codetecon #KRK 3 - Object detection with Deep Learning
Codetecon #KRK 3 - Object detection with Deep LearningCodetecon #KRK 3 - Object detection with Deep Learning
Codetecon #KRK 3 - Object detection with Deep Learning
 
Deep Learning for Computer Vision: Object Detection (UPC 2016)
Deep Learning for Computer Vision: Object Detection (UPC 2016)Deep Learning for Computer Vision: Object Detection (UPC 2016)
Deep Learning for Computer Vision: Object Detection (UPC 2016)
 
ADAPTIVE FILTER FOR DENOISING 3D DATA CAPTURED BY DEPTH SENSORS
ADAPTIVE FILTER FOR DENOISING 3D DATA CAPTURED BY DEPTH SENSORSADAPTIVE FILTER FOR DENOISING 3D DATA CAPTURED BY DEPTH SENSORS
ADAPTIVE FILTER FOR DENOISING 3D DATA CAPTURED BY DEPTH SENSORS
 
Trackster Pruning at the CMS High-Granularity Calorimeter
Trackster Pruning at the CMS High-Granularity CalorimeterTrackster Pruning at the CMS High-Granularity Calorimeter
Trackster Pruning at the CMS High-Granularity Calorimeter
 
物件偵測與辨識技術
物件偵測與辨識技術物件偵測與辨識技術
物件偵測與辨識技術
 
Adaptive object detection using adjacency and zoom prediction
Adaptive object detection using adjacency and zoom predictionAdaptive object detection using adjacency and zoom prediction
Adaptive object detection using adjacency and zoom prediction
 
“Understanding, Selecting and Optimizing Object Detectors for Edge Applicatio...
“Understanding, Selecting and Optimizing Object Detectors for Edge Applicatio...“Understanding, Selecting and Optimizing Object Detectors for Edge Applicatio...
“Understanding, Selecting and Optimizing Object Detectors for Edge Applicatio...
 
Single shot multiboxdetectors
Single shot multiboxdetectorsSingle shot multiboxdetectors
Single shot multiboxdetectors
 
Depth estimation using deep learning
Depth estimation using deep learningDepth estimation using deep learning
Depth estimation using deep learning
 
Shai Avidan's Support vector tracking and ensemble tracking
Shai Avidan's Support vector tracking and ensemble trackingShai Avidan's Support vector tracking and ensemble tracking
Shai Avidan's Support vector tracking and ensemble tracking
 
Single shot multiboxdetectors
Single shot multiboxdetectorsSingle shot multiboxdetectors
Single shot multiboxdetectors
 
150807 Fast R-CNN
150807 Fast R-CNN150807 Fast R-CNN
150807 Fast R-CNN
 
Week5-Faster R-CNN.pptx
Week5-Faster R-CNN.pptxWeek5-Faster R-CNN.pptx
Week5-Faster R-CNN.pptx
 

Mais de Universitat Politècnica de Catalunya

The Transformer in Vision | Xavier Giro | Master in Computer Vision Barcelona...
The Transformer in Vision | Xavier Giro | Master in Computer Vision Barcelona...The Transformer in Vision | Xavier Giro | Master in Computer Vision Barcelona...
The Transformer in Vision | Xavier Giro | Master in Computer Vision Barcelona...Universitat Politècnica de Catalunya
 
Towards Sign Language Translation & Production | Xavier Giro-i-Nieto
Towards Sign Language Translation & Production | Xavier Giro-i-NietoTowards Sign Language Translation & Production | Xavier Giro-i-Nieto
Towards Sign Language Translation & Production | Xavier Giro-i-NietoUniversitat Politècnica de Catalunya
 
Learning Representations for Sign Language Videos - Xavier Giro - NIST TRECVI...
Learning Representations for Sign Language Videos - Xavier Giro - NIST TRECVI...Learning Representations for Sign Language Videos - Xavier Giro - NIST TRECVI...
Learning Representations for Sign Language Videos - Xavier Giro - NIST TRECVI...Universitat Politècnica de Catalunya
 
Generation of Synthetic Referring Expressions for Object Segmentation in Videos
Generation of Synthetic Referring Expressions for Object Segmentation in VideosGeneration of Synthetic Referring Expressions for Object Segmentation in Videos
Generation of Synthetic Referring Expressions for Object Segmentation in VideosUniversitat Politècnica de Catalunya
 
Learn2Sign : Sign language recognition and translation using human keypoint e...
Learn2Sign : Sign language recognition and translation using human keypoint e...Learn2Sign : Sign language recognition and translation using human keypoint e...
Learn2Sign : Sign language recognition and translation using human keypoint e...Universitat Politècnica de Catalunya
 
Convolutional Neural Networks - Xavier Giro - UPC TelecomBCN Barcelona 2020
Convolutional Neural Networks - Xavier Giro - UPC TelecomBCN Barcelona 2020Convolutional Neural Networks - Xavier Giro - UPC TelecomBCN Barcelona 2020
Convolutional Neural Networks - Xavier Giro - UPC TelecomBCN Barcelona 2020Universitat Politècnica de Catalunya
 
Self-Supervised Audio-Visual Learning - Xavier Giro - UPC TelecomBCN Barcelon...
Self-Supervised Audio-Visual Learning - Xavier Giro - UPC TelecomBCN Barcelon...Self-Supervised Audio-Visual Learning - Xavier Giro - UPC TelecomBCN Barcelon...
Self-Supervised Audio-Visual Learning - Xavier Giro - UPC TelecomBCN Barcelon...Universitat Politècnica de Catalunya
 
Attention for Deep Learning - Xavier Giro - UPC TelecomBCN Barcelona 2020
Attention for Deep Learning - Xavier Giro - UPC TelecomBCN Barcelona 2020Attention for Deep Learning - Xavier Giro - UPC TelecomBCN Barcelona 2020
Attention for Deep Learning - Xavier Giro - UPC TelecomBCN Barcelona 2020Universitat Politècnica de Catalunya
 
Generative Adversarial Networks GAN - Xavier Giro - UPC TelecomBCN Barcelona ...
Generative Adversarial Networks GAN - Xavier Giro - UPC TelecomBCN Barcelona ...Generative Adversarial Networks GAN - Xavier Giro - UPC TelecomBCN Barcelona ...
Generative Adversarial Networks GAN - Xavier Giro - UPC TelecomBCN Barcelona ...Universitat Politècnica de Catalunya
 
Q-Learning with a Neural Network - Xavier Giró - UPC Barcelona 2020
Q-Learning with a Neural Network - Xavier Giró - UPC Barcelona 2020Q-Learning with a Neural Network - Xavier Giró - UPC Barcelona 2020
Q-Learning with a Neural Network - Xavier Giró - UPC Barcelona 2020Universitat Politècnica de Catalunya
 
Language and Vision with Deep Learning - Xavier Giró - ACM ICMR 2020 (Tutorial)
Language and Vision with Deep Learning - Xavier Giró - ACM ICMR 2020 (Tutorial)Language and Vision with Deep Learning - Xavier Giró - ACM ICMR 2020 (Tutorial)
Language and Vision with Deep Learning - Xavier Giró - ACM ICMR 2020 (Tutorial)Universitat Politècnica de Catalunya
 
Image Segmentation with Deep Learning - Xavier Giro & Carles Ventura - ISSonD...
Image Segmentation with Deep Learning - Xavier Giro & Carles Ventura - ISSonD...Image Segmentation with Deep Learning - Xavier Giro & Carles Ventura - ISSonD...
Image Segmentation with Deep Learning - Xavier Giro & Carles Ventura - ISSonD...Universitat Politècnica de Catalunya
 

Mais de Universitat Politècnica de Catalunya (20)

Deep Generative Learning for All - The Gen AI Hype (Spring 2024)
Deep Generative Learning for All - The Gen AI Hype (Spring 2024)Deep Generative Learning for All - The Gen AI Hype (Spring 2024)
Deep Generative Learning for All - The Gen AI Hype (Spring 2024)
 
Deep Generative Learning for All
Deep Generative Learning for AllDeep Generative Learning for All
Deep Generative Learning for All
 
The Transformer in Vision | Xavier Giro | Master in Computer Vision Barcelona...
The Transformer in Vision | Xavier Giro | Master in Computer Vision Barcelona...The Transformer in Vision | Xavier Giro | Master in Computer Vision Barcelona...
The Transformer in Vision | Xavier Giro | Master in Computer Vision Barcelona...
 
Towards Sign Language Translation & Production | Xavier Giro-i-Nieto
Towards Sign Language Translation & Production | Xavier Giro-i-NietoTowards Sign Language Translation & Production | Xavier Giro-i-Nieto
Towards Sign Language Translation & Production | Xavier Giro-i-Nieto
 
The Transformer - Xavier Giró - UPC Barcelona 2021
The Transformer - Xavier Giró - UPC Barcelona 2021The Transformer - Xavier Giró - UPC Barcelona 2021
The Transformer - Xavier Giró - UPC Barcelona 2021
 
Learning Representations for Sign Language Videos - Xavier Giro - NIST TRECVI...
Learning Representations for Sign Language Videos - Xavier Giro - NIST TRECVI...Learning Representations for Sign Language Videos - Xavier Giro - NIST TRECVI...
Learning Representations for Sign Language Videos - Xavier Giro - NIST TRECVI...
 
Open challenges in sign language translation and production
Open challenges in sign language translation and productionOpen challenges in sign language translation and production
Open challenges in sign language translation and production
 
Generation of Synthetic Referring Expressions for Object Segmentation in Videos
Generation of Synthetic Referring Expressions for Object Segmentation in VideosGeneration of Synthetic Referring Expressions for Object Segmentation in Videos
Generation of Synthetic Referring Expressions for Object Segmentation in Videos
 
Discovery and Learning of Navigation Goals from Pixels in Minecraft
Discovery and Learning of Navigation Goals from Pixels in MinecraftDiscovery and Learning of Navigation Goals from Pixels in Minecraft
Discovery and Learning of Navigation Goals from Pixels in Minecraft
 
Learn2Sign : Sign language recognition and translation using human keypoint e...
Learn2Sign : Sign language recognition and translation using human keypoint e...Learn2Sign : Sign language recognition and translation using human keypoint e...
Learn2Sign : Sign language recognition and translation using human keypoint e...
 
Intepretability / Explainable AI for Deep Neural Networks
Intepretability / Explainable AI for Deep Neural NetworksIntepretability / Explainable AI for Deep Neural Networks
Intepretability / Explainable AI for Deep Neural Networks
 
Convolutional Neural Networks - Xavier Giro - UPC TelecomBCN Barcelona 2020
Convolutional Neural Networks - Xavier Giro - UPC TelecomBCN Barcelona 2020Convolutional Neural Networks - Xavier Giro - UPC TelecomBCN Barcelona 2020
Convolutional Neural Networks - Xavier Giro - UPC TelecomBCN Barcelona 2020
 
Self-Supervised Audio-Visual Learning - Xavier Giro - UPC TelecomBCN Barcelon...
Self-Supervised Audio-Visual Learning - Xavier Giro - UPC TelecomBCN Barcelon...Self-Supervised Audio-Visual Learning - Xavier Giro - UPC TelecomBCN Barcelon...
Self-Supervised Audio-Visual Learning - Xavier Giro - UPC TelecomBCN Barcelon...
 
Attention for Deep Learning - Xavier Giro - UPC TelecomBCN Barcelona 2020
Attention for Deep Learning - Xavier Giro - UPC TelecomBCN Barcelona 2020Attention for Deep Learning - Xavier Giro - UPC TelecomBCN Barcelona 2020
Attention for Deep Learning - Xavier Giro - UPC TelecomBCN Barcelona 2020
 
Generative Adversarial Networks GAN - Xavier Giro - UPC TelecomBCN Barcelona ...
Generative Adversarial Networks GAN - Xavier Giro - UPC TelecomBCN Barcelona ...Generative Adversarial Networks GAN - Xavier Giro - UPC TelecomBCN Barcelona ...
Generative Adversarial Networks GAN - Xavier Giro - UPC TelecomBCN Barcelona ...
 
Q-Learning with a Neural Network - Xavier Giró - UPC Barcelona 2020
Q-Learning with a Neural Network - Xavier Giró - UPC Barcelona 2020Q-Learning with a Neural Network - Xavier Giró - UPC Barcelona 2020
Q-Learning with a Neural Network - Xavier Giró - UPC Barcelona 2020
 
Language and Vision with Deep Learning - Xavier Giró - ACM ICMR 2020 (Tutorial)
Language and Vision with Deep Learning - Xavier Giró - ACM ICMR 2020 (Tutorial)Language and Vision with Deep Learning - Xavier Giró - ACM ICMR 2020 (Tutorial)
Language and Vision with Deep Learning - Xavier Giró - ACM ICMR 2020 (Tutorial)
 
Image Segmentation with Deep Learning - Xavier Giro & Carles Ventura - ISSonD...
Image Segmentation with Deep Learning - Xavier Giro & Carles Ventura - ISSonD...Image Segmentation with Deep Learning - Xavier Giro & Carles Ventura - ISSonD...
Image Segmentation with Deep Learning - Xavier Giro & Carles Ventura - ISSonD...
 
Curriculum Learning for Recurrent Video Object Segmentation
Curriculum Learning for Recurrent Video Object SegmentationCurriculum Learning for Recurrent Video Object Segmentation
Curriculum Learning for Recurrent Video Object Segmentation
 
Deep Self-supervised Learning for All - Xavier Giro - X-Europe 2020
Deep Self-supervised Learning for All - Xavier Giro - X-Europe 2020Deep Self-supervised Learning for All - Xavier Giro - X-Europe 2020
Deep Self-supervised Learning for All - Xavier Giro - X-Europe 2020
 

Último

100-Concepts-of-AI by Anupama Kate .pptx
100-Concepts-of-AI by Anupama Kate .pptx100-Concepts-of-AI by Anupama Kate .pptx
100-Concepts-of-AI by Anupama Kate .pptxAnupama Kate
 
Vip Model Call Girls (Delhi) Karol Bagh 9711199171✔️Body to body massage wit...
Vip Model  Call Girls (Delhi) Karol Bagh 9711199171✔️Body to body massage wit...Vip Model  Call Girls (Delhi) Karol Bagh 9711199171✔️Body to body massage wit...
Vip Model Call Girls (Delhi) Karol Bagh 9711199171✔️Body to body massage wit...shivangimorya083
 
Introduction-to-Machine-Learning (1).pptx
Introduction-to-Machine-Learning (1).pptxIntroduction-to-Machine-Learning (1).pptx
Introduction-to-Machine-Learning (1).pptxfirstjob4
 
BigBuy dropshipping via API with DroFx.pptx
BigBuy dropshipping via API with DroFx.pptxBigBuy dropshipping via API with DroFx.pptx
BigBuy dropshipping via API with DroFx.pptxolyaivanovalion
 
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...amitlee9823
 
Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip CallDelhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Callshivangimorya083
 
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...Valters Lauzums
 
Generative AI on Enterprise Cloud with NiFi and Milvus
Generative AI on Enterprise Cloud with NiFi and MilvusGenerative AI on Enterprise Cloud with NiFi and Milvus
Generative AI on Enterprise Cloud with NiFi and MilvusTimothy Spann
 
Data-Analysis for Chicago Crime Data 2023
Data-Analysis for Chicago Crime Data  2023Data-Analysis for Chicago Crime Data  2023
Data-Analysis for Chicago Crime Data 2023ymrp368
 
BPAC WITH UFSBI GENERAL PRESENTATION 18_05_2017-1.pptx
BPAC WITH UFSBI GENERAL PRESENTATION 18_05_2017-1.pptxBPAC WITH UFSBI GENERAL PRESENTATION 18_05_2017-1.pptx
BPAC WITH UFSBI GENERAL PRESENTATION 18_05_2017-1.pptxMohammedJunaid861692
 
Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...
Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...
Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...Delhi Call girls
 
Call me @ 9892124323 Cheap Rate Call Girls in Vashi with Real Photo 100% Secure
Call me @ 9892124323  Cheap Rate Call Girls in Vashi with Real Photo 100% SecureCall me @ 9892124323  Cheap Rate Call Girls in Vashi with Real Photo 100% Secure
Call me @ 9892124323 Cheap Rate Call Girls in Vashi with Real Photo 100% SecurePooja Nehwal
 
BDSM⚡Call Girls in Mandawali Delhi >༒8448380779 Escort Service
BDSM⚡Call Girls in Mandawali Delhi >༒8448380779 Escort ServiceBDSM⚡Call Girls in Mandawali Delhi >༒8448380779 Escort Service
BDSM⚡Call Girls in Mandawali Delhi >༒8448380779 Escort ServiceDelhi Call girls
 
Log Analysis using OSSEC sasoasasasas.pptx
Log Analysis using OSSEC sasoasasasas.pptxLog Analysis using OSSEC sasoasasasas.pptx
Log Analysis using OSSEC sasoasasasas.pptxJohnnyPlasten
 
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...amitlee9823
 
Schema on read is obsolete. Welcome metaprogramming..pdf
Schema on read is obsolete. Welcome metaprogramming..pdfSchema on read is obsolete. Welcome metaprogramming..pdf
Schema on read is obsolete. Welcome metaprogramming..pdfLars Albertsson
 

Último (20)

100-Concepts-of-AI by Anupama Kate .pptx
100-Concepts-of-AI by Anupama Kate .pptx100-Concepts-of-AI by Anupama Kate .pptx
100-Concepts-of-AI by Anupama Kate .pptx
 
Vip Model Call Girls (Delhi) Karol Bagh 9711199171✔️Body to body massage wit...
Vip Model  Call Girls (Delhi) Karol Bagh 9711199171✔️Body to body massage wit...Vip Model  Call Girls (Delhi) Karol Bagh 9711199171✔️Body to body massage wit...
Vip Model Call Girls (Delhi) Karol Bagh 9711199171✔️Body to body massage wit...
 
Introduction-to-Machine-Learning (1).pptx
Introduction-to-Machine-Learning (1).pptxIntroduction-to-Machine-Learning (1).pptx
Introduction-to-Machine-Learning (1).pptx
 
BigBuy dropshipping via API with DroFx.pptx
BigBuy dropshipping via API with DroFx.pptxBigBuy dropshipping via API with DroFx.pptx
BigBuy dropshipping via API with DroFx.pptx
 
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
 
CHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICECHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
 
Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip CallDelhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
 
Delhi 99530 vip 56974 Genuine Escort Service Call Girls in Kishangarh
Delhi 99530 vip 56974 Genuine Escort Service Call Girls in  KishangarhDelhi 99530 vip 56974 Genuine Escort Service Call Girls in  Kishangarh
Delhi 99530 vip 56974 Genuine Escort Service Call Girls in Kishangarh
 
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...
 
Call Girls In Shalimar Bagh ( Delhi) 9953330565 Escorts Service
Call Girls In Shalimar Bagh ( Delhi) 9953330565 Escorts ServiceCall Girls In Shalimar Bagh ( Delhi) 9953330565 Escorts Service
Call Girls In Shalimar Bagh ( Delhi) 9953330565 Escorts Service
 
Generative AI on Enterprise Cloud with NiFi and Milvus
Generative AI on Enterprise Cloud with NiFi and MilvusGenerative AI on Enterprise Cloud with NiFi and Milvus
Generative AI on Enterprise Cloud with NiFi and Milvus
 
Data-Analysis for Chicago Crime Data 2023
Data-Analysis for Chicago Crime Data  2023Data-Analysis for Chicago Crime Data  2023
Data-Analysis for Chicago Crime Data 2023
 
BPAC WITH UFSBI GENERAL PRESENTATION 18_05_2017-1.pptx
BPAC WITH UFSBI GENERAL PRESENTATION 18_05_2017-1.pptxBPAC WITH UFSBI GENERAL PRESENTATION 18_05_2017-1.pptx
BPAC WITH UFSBI GENERAL PRESENTATION 18_05_2017-1.pptx
 
Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...
Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...
Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...
 
Sampling (random) method and Non random.ppt
Sampling (random) method and Non random.pptSampling (random) method and Non random.ppt
Sampling (random) method and Non random.ppt
 
Call me @ 9892124323 Cheap Rate Call Girls in Vashi with Real Photo 100% Secure
Call me @ 9892124323  Cheap Rate Call Girls in Vashi with Real Photo 100% SecureCall me @ 9892124323  Cheap Rate Call Girls in Vashi with Real Photo 100% Secure
Call me @ 9892124323 Cheap Rate Call Girls in Vashi with Real Photo 100% Secure
 
BDSM⚡Call Girls in Mandawali Delhi >༒8448380779 Escort Service
BDSM⚡Call Girls in Mandawali Delhi >༒8448380779 Escort ServiceBDSM⚡Call Girls in Mandawali Delhi >༒8448380779 Escort Service
BDSM⚡Call Girls in Mandawali Delhi >༒8448380779 Escort Service
 
Log Analysis using OSSEC sasoasasasas.pptx
Log Analysis using OSSEC sasoasasasas.pptxLog Analysis using OSSEC sasoasasasas.pptx
Log Analysis using OSSEC sasoasasasas.pptx
 
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
 
Schema on read is obsolete. Welcome metaprogramming..pdf
Schema on read is obsolete. Welcome metaprogramming..pdfSchema on read is obsolete. Welcome metaprogramming..pdf
Schema on read is obsolete. Welcome metaprogramming..pdf
 

SSD: Fast and Accurate Object Detection in a Single Shot

  • 1. SSD: Single Shot MultiBox Detector Wei Liu, Dragomir Anguelov, Dumitru Erhan, Christian Szegedy, Scott Reed, Cheng-Yang Fu, Alexander C.Berg [arXiv][demo][code] (Mar 2016) Slides by Míriam Bellver Computer Vision Reading Group, UPC 28th October, 2016
  • 2. Outline ▷ Introduction ▷ Related Work ▷ The Single-Shot Detector ▷ Experimental Results ▷ Conclusions
  • 5. Introduction Current object detection systems resample pixels for each BBOX resample features for each BBOX high quality classifier Object proposals generation Image pixels
  • 6. Introduction Current object detection systems Computationally too intensive and too slow for real-time applications Faster R-CNN 7 FPS resample pixels for each BBOX resample features for each BBOX high quality classifier Object proposals generation Image pixels
  • 7. Introduction Current object detection systems Computationally too intensive and too slow for real-time applications Faster R-CNN 7 FPS resample pixels for each BBOX resample features for each BBOX high quality classifier Object proposals generation Image pixels
  • 8. Introduction SSD: First deep network based object detector that does not resample pixels or features for bounding box hypotheses and is as accurate as approaches that do. Improvement in speed vs accuracy trade-off
  • 10. Introduction Contributions: ▷ A single-shot detector for multiple categories that is faster than state of the art single shot detectors (YOLO) and as accurate as Faster R-CNN ▷ Predicts category scores and boxes offset for a fixed set of default BBs using small convolutional filters applied to feature maps ▷ Predictions of different scales from feature maps of different scales, and separate predictions by aspect ratio ▷ End-to-end training and high accuracy, improving speed vs accuracy trade-off
  • 11. 2. Related Work SSD: Single Shot MultiBox Detector
  • 12. Object Detection prior to CNNs Two different traditional approaches: ▷ Sliding Window: e.g. Deformable Part Model (DPM) ▷ Object proposals: e.g. Selective Search Felzenszwalb, P., McAllester, D., & Ramanan, D. (2008, June). A discriminatively trained, multiscale, deformable part model Uijlings, J. R., van de Sande, K. E., Gevers, T., & Smeulders, A. W. (2013). Selective search for object recognition
  • 13. Object detection with CNN’s ▷ R-CNN Girshick, R. (2015). Fast r-cnn
  • 14. Leveraging the object proposals bottleneck ▷ SPP-net ▷ Fast R-CNN He, K., Zhang, X., Ren, S., & Sun, J. (2014, September). Spatial pyramid pooling in deep convolutional networks for visual recognition Girshick, R. (2015). Fast r-cnn.
  • 15. Improving quality of proposals using CNNs Low-level features object proposals Proposals generated directly from a DNN E.g. : MultiBox, Faster R-CNN Ren, S., He, K., Girshick, R., & Sun, J. (2015). Faster R-CNN: Towards real-time object detection with region proposal networks Szegedy, C., Reed, S., Erhan, D., & Anguelov, D. (2014). Scalable, high-quality object detection.
  • 16. Single-shot detectors Instead of having two networks Region Proposals Network + Classifier Network In Single-shot architectures, bounding boxes and confidences for multiple categories are predicted directly with a single network e.g. : Overfeat, YOLO Redmon, J., Divvala, S., Girshick, R., & Farhadi, A. (2015). You only look once: Unified, real-time object detection Sermanet, P., Eigen, D., Zhang, X., Mathieu, M., Fergus, R., & LeCun, Y. (2013). Overfeat: Integrated recognition, localization and detection using convolutional networks.
  • 17. Single-shot detectors Main differences of SSD over YOLO and Overfeat: ▷ Small conv. filters to predict object categories and offsets in BBs locations, using separate predictors for different aspect ratios, and applying them on different feature maps to perform detection on multiple scales
  • 18. 3.1 The Single Shot Detector (SSD) Model
  • 19. The Single Shot Detector (SSD) base network fc6 and fc7 converted to convolutional collection of bounding boxes and scores for each category
  • 20. The Single Shot Detector (SSD) base network fc6 and fc7 converted to convolutional Multi-scale feature maps for detection: observe how conv feature maps decrease in size and allow predictions at multiple scales collection of bounding boxes and scores for each category
  • 21. The Single Shot Detector (SSD) Comparison to YOLO
  • 22. The Single Shot Detector (SSD) Convolutional predictors for detection: We apply on top of each conv feature map a set of filters that predict detections for different aspect ratios and class categories
  • 23. The Single Shot Detector (SSD) What is a detection ? Described by four parameters (center bounding box x and y, width and height) Class category For all categories we need for a detection a total of #classes + 4 values
  • 24. The Single Shot Detector (SSD) Detector for SSD: Each detector will output a single value, so we need (classes + 4) detectors for a detection
  • 25. The Single Shot Detector (SSD) Detector for SSD: Each detector will output a single value, so we need (classes + 4) detectors for a detection BUT there are different types of detections!
  • 26. The Single Shot Detector (SSD) Different “classes” of detections aspect ratio 2:1 for cats aspect ratio 1:2 for cats aspect ratio 1:1 for cats
  • 27. The Single Shot Detector (SSD) Default boxes and aspect ratios Similar to the anchors of Faster R-CNN, with the difference that SSD applies them on several feature maps of different resolutions
  • 28. The Single Shot Detector (SSD) Detector for SSD: Each detector will output a single value, so we need (classes + 4) detectors for a detection as we have #default boxes, we need (classes + 4) x #default boxes detectors
  • 29. The Single Shot Detector (SSD) Convolutional predictors for detection: We apply on top of each conv feature map a set of filters that predict detections for different aspect ratios and class categories
  • 30. The Single Shot Detector (SSD) For each feature layer of m x n with p channels we apply kernels of 3 x 3 x p to produce either a score for a category, or a shape offset relative to a default bounding box coordinates 1024 3 3 Example for conv6: shape offset xo to default box of aspect ratio: 1 for category 1 1024 3 3 score for category 1 for the default box of aspect ratio: 1 up to classes+4 filters for each default box considered at that conv feature map
  • 31. The Single Shot Detector (SSD) For each feature layer of m x n with p channels we apply kernels of 3 x 3 x p to produce either a score for a category, or a shape offset relative to a default bounding box coordinates So, for each conv layer considered, there are (classes + 4) x default boxes x m x n outputs
  • 32. 3.2 The Single Shot Detector (SSD) Training
  • 33. The Single Shot Detector (SSD) SSD requires that ground-truth data is assigned to specific outputs in the fixed set of detector outputs
  • 34. The Single Shot Detector (SSD) Matching strategy: For each ground truth box we have to select from all the default boxes the ones that best fit in terms of location, aspect ratio and scale. ▷ We select the default box with best jaccard overlap. Then every box has at least 1 correspondence. ▷ Default boxes with a jaccard overlap higher than 0.5 are also selected
  • 35. The Single Shot Detector (SSD) Training objective: Similar to MultiBox but handles multiple categories. confidence loss softmax loss localization loss Smooth L1 lossN: number of default matched BBs x: is 1 if the default box is matched to a determined ground truth box, and 0 otherwise l: predicted bb parameters g: ground truth bb parameters c: class is 1 by cross-validation
  • 36. The Single Shot Detector (SSD) Choosing scales and aspect ratios for default boxes: ▷ Feature maps from different layers are used to handle scale variance ▷ Specific feature map locations learn to be responsive to specific areas of the image and particular scales of objects
  • 37. The Single Shot Detector (SSD) Choosing scales and aspect ratios for default boxes: ▷ If m feature maps are used for prediction, the scale of the default boxes for each feature map is: 0.1 0.95
  • 38. The Single Shot Detector (SSD) Choosing scales and aspect ratios for default boxes: ▷ At each scale, different aspect ratios are considered: width and height of default bbox
  • 39. The Single Shot Detector (SSD) Hard negative mining: Significant imbalance between positive and negative training examples ▷ Use negative samples with higher confidence score ▷ Then the ratio of positive-negative samples is 3:1
  • 40. The Single Shot Detector (SSD) Data augmentation: Each training sample is randomly sampled by one of the following options: ▷ Use the original image ▷ Sample a path with a minimum jaccard overlap with objects ▷ Randomly sample a path
  • 41. 4. Experimental Results SSD: Single Shot MultiBox Detector
  • 42. Experimental Results Base network: ▷ VGG16 (with fc6 and fc7 converted to conv layers and pool5 from 2x2 to 3x3 using atrous algorithm, removed fc8 and dropout) ▷ It is fine-tuned using SGD ▷ Training and testing code is built on caffe Database: ▷ Training: VOC2007 trainval and VOC2012 trainval (16551 images) ▷ Testing: VOC2007 test (4952 images)
  • 43. Experimental Results Mean Average Precision for PASCAL ‘07
  • 46. Experimental Results Sensitivity to different object characteristics with PASCAL 2007 :
  • 47. Experimental Results Database: ▷ Training: VOC2007 trainval and test and VOC2012 trainval (21503 images) ▷ Testing: VOC2012 test (10991 images) Mean Average Precision for PASCAL ‘12
  • 48. Experimental Results MS COCO Database: ▷ A total of 300k images Test-dev results:
  • 49. Experimental Results Inference time : Non-maximum suppression has to be efficient because of the large number of boxes generated
  • 52. 5. Conclusions SSD: Single Shot MultiBox Detector
  • 53. Conclusions ▷ Single-shot object detector for multiple categories ▷ One key feature is to use multiple convolutional maps to deal with different scales ▷ More default bounding boxes, the better results obtained ▷ Comparable accuracy to state-of-the-art object detectors, but much faster ▷ Future direction: use RNNs to detect and track objects in video
  • 54. Thank you for your attention! Questions?