Object Detection Using R-CNN Deep Learning Framework

Nader Karimi Bavandpour (nader.karimi.b@gmail.com)
Summer School of Intelligent Learning
IPM, 2019

Table of Contents
● Machine Learning Key Point: Inductive Bias
● From Classification to Instance Segmentation
● Region Proposal
● R-CNN Framework

Machine Learning Key Point: Inductive Bias

Definition of Inductive Bias
The kind of necessary assumptions about the nature of the target function are subsumed in the phrase
inductive bias.
- Wikipedia
Every machine learning algorithm with any ability to generalize beyond the training data that it sees has
some type of inductive bias.
- StackOverflow

Examples of Inductive Bias
● Maximum Margin: Maximize the width of the boundary between two classes
● Nearest Neighbors: Most of the cases in a small neighborhood in feature space belong to the same
class
● Minimum Cross-Validation Error: Select the hypothesis with the lowest cross-validation error
○ Although cross-validation may seem to be free of bias,
the "no free lunch" theorems show that cross-validation must be biased.
● Locality of Receptive Field: Use convolutional layers instead of fully connected (fc) layers

From Classification to Instance Segmentation

Object Classification
● Image Category Recognition
● Input: Image
● Output: Class label
● Types:
○ Binary Classification
○ Multiclass Classification
○ Multi-label Classification

Object Localization
● Object Bounding Box Recognition
● Input: image
● Output: Box in the image (x, y, w, h)

Semantic Segmentation
● Pixel Category Recognition
● Input: Image
● Output: Category-aware pixel labels

Instance Segmentation
● Instance-Aware Pixel Category Recognition
● Input: Image
● Output: Instance-aware pixel labels

Intersection Over Union (IoU)
Important measurement for object localization
Used in both training and evaluation
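
IoU is the area of overlap between two boxes divided by the area of their union. A minimal sketch, assuming boxes in the (x, y, w, h) format used above with (x, y) taken to be the top-left corner (the slides do not state which corner is meant):

```python
def iou(box_a, box_b):
    """Intersection over Union of two axis-aligned boxes given as (x, y, w, h).

    Assumption: (x, y) is the top-left corner of the box.
    """
    ax, ay, aw, ah = box_a
    bx, by, bw, bh = box_b
    # Overlap extent along each axis (zero when the boxes do not overlap).
    inter_w = max(0.0, min(ax + aw, bx + bw) - max(ax, bx))
    inter_h = max(0.0, min(ay + ah, by + bh) - max(ay, by))
    inter = inter_w * inter_h
    union = aw * ah + bw * bh - inter
    return inter / union if union > 0 else 0.0


# Two 2x2 boxes overlapping in a 1x1 square: IoU = 1 / (4 + 4 - 1) = 1/7
print(iou((0, 0, 2, 2), (1, 1, 2, 2)))
```

Identical boxes give 1.0 and disjoint boxes give 0.0.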

Datasets: ImageNet Challenge
● 1000 Classes
● Each image has 1 class with at least one bounding box
● About 800 training images per class
● The algorithm produces 5 (class + bounding box) guesses
● A prediction is correct if at least one guess has the correct class and a bounding box
with at least 50% intersection over union with a ground-truth box
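
The correctness rule above can be sketched as follows. This is an illustration only, not the official evaluation code; the function names and the (x, y, w, h) top-left box format are assumptions.

```python
def iou(a, b):
    """IoU of two (x, y, w, h) boxes; (x, y) assumed to be the top-left corner."""
    inter_w = max(0.0, min(a[0] + a[2], b[0] + b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[1] + a[3], b[1] + b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = a[2] * a[3] + b[2] * b[3] - inter
    return inter / union if union > 0 else 0.0


def is_correct(guesses, true_label, true_boxes, threshold=0.5):
    """guesses: list of (label, box) pairs; only the first 5 are considered.

    Returns True if any guess has the right class and overlaps some
    ground-truth box with IoU >= threshold.
    """
    for label, box in guesses[:5]:
        if label == true_label and any(iou(box, gt) >= threshold for gt in true_boxes):
            return True
    return False


guesses = [("cat", (0, 0, 10, 10)), ("dog", (0, 0, 10, 10))]
print(is_correct(guesses, "dog", [(1, 1, 10, 10)]))  # True: IoU = 81/119
```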

Selective Search for Region Proposal
● A region proposal algorithm used in object detection
● Designed to be fast with a very high recall
● Computes a hierarchical grouping of similar regions, based on
color, texture, size and shape compatibility

● First takes an image as input
● Generates initial sub-segmentations
● Combines similar regions to form larger regions
○ based on color similarity, texture similarity, size
similarity, and shape compatibility
● Finally, these regions produce the Regions of
Interest (RoI)
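
The grouping steps above can be illustrated with a toy sketch. It is a heavy simplification, not the real selective search: each region is a hypothetical dict holding a bounding box, a pixel count and a mean gray level in [0, 1], and the similarity uses only a color term and a size term (texture and shape compatibility are left out).

```python
from itertools import combinations


def merge_boxes(a, b):
    """Smallest (x0, y0, x1, y1) box enclosing both inputs."""
    return (min(a[0], b[0]), min(a[1], b[1]), max(a[2], b[2]), max(a[3], b[3]))


def touching(a, b):
    """True if two (x0, y0, x1, y1) boxes overlap or share an edge."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def similarity(r, s, image_area):
    # Toy measure: similar mean color, plus a term that favors merging small regions first.
    color = 1.0 - abs(r["color"] - s["color"])
    size = 1.0 - (r["size"] + s["size"]) / image_area
    return color + size


def hierarchical_grouping(regions, image_area):
    """Greedily merge the most similar neighboring regions; every region ever seen is a proposal."""
    regions = list(regions)
    proposals = [r["box"] for r in regions]
    while len(regions) > 1:
        pairs = [(similarity(r, s, image_area), i, j)
                 for (i, r), (j, s) in combinations(enumerate(regions), 2)
                 if touching(r["box"], s["box"])]
        if not pairs:
            break  # no neighboring regions left to merge
        _, i, j = max(pairs)
        r, s = regions[i], regions[j]
        total = r["size"] + s["size"]
        merged = {
            "box": merge_boxes(r["box"], s["box"]),
            "size": total,
            "color": (r["color"] * r["size"] + s["color"] * s["size"]) / total,
        }
        regions = [t for k, t in enumerate(regions) if k not in (i, j)] + [merged]
        proposals.append(merged["box"])
    return proposals


initial = [
    {"box": (0, 0, 4, 4), "size": 16, "color": 0.20},
    {"box": (4, 0, 8, 4), "size": 16, "color": 0.25},
    {"box": (0, 4, 8, 8), "size": 32, "color": 0.90},
]
print(hierarchical_grouping(initial, image_area=64))
```

In this example the two similar top regions merge first, then the result merges with the bottom region, so the proposals cover several scales.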

R-CNN Family
● R-CNN: Selective search → Cropped Image → CNN
● Fast R-CNN: Selective search → Crop feature map of CNN
● Faster R-CNN: CNN → Region-Proposal Network → Crop feature map of CNN
● Mask R-CNN: Adds object mask prediction to Faster R-CNN

Problems with R-CNN
● Selective search extracts about 2,000 regions for each image
● CNN features are extracted for every region separately: with N images, the CNN must process
N × 2,000 regions
● The entire process of object detection using R-CNN has three models:
○ CNN for feature extraction
○ Linear SVM classifier for identifying objects
○ Regression model for tightening the bounding boxes

Fast R-CNN
● Still uses selective search to find the
Regions of Interest, which is slow
● Takes around 2 seconds per image to
detect objects, which is much faster
than R-CNN
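
Fast R-CNN crops each proposal from the shared CNN feature map instead of from the image; the cropping step is RoI pooling, which turns a region of any size into a fixed-size grid. A minimal sketch of RoI max pooling on a plain 2D list, with a 2×2 output chosen only for illustration and the RoI assumed to be already expressed in feature-map cells:

```python
def roi_max_pool(feature_map, roi, out_h=2, out_w=2):
    """Max-pool a region of a 2D feature map into an out_h x out_w grid.

    feature_map: list of rows; roi: (x0, y0, x1, y1) in feature-map cells,
    end-exclusive and assumed to lie inside the map.
    """
    x0, y0, x1, y1 = roi
    h, w = y1 - y0, x1 - x0
    pooled = []
    for i in range(out_h):
        # Row range of bin i; every bin covers at least one cell.
        ys = y0 + (i * h) // out_h
        ye = max(ys + 1, y0 + ((i + 1) * h) // out_h)
        row = []
        for j in range(out_w):
            xs = x0 + (j * w) // out_w
            xe = max(xs + 1, x0 + ((j + 1) * w) // out_w)
            row.append(max(feature_map[y][x]
                           for y in range(ys, ye) for x in range(xs, xe)))
        pooled.append(row)
    return pooled


fmap = [[4 * y + x for x in range(4)] for y in range(4)]  # values 0..15
print(roi_max_pool(fmap, (0, 0, 4, 4)))  # [[5, 7], [13, 15]]
```

Because the CNN runs once per image and only this cheap pooling runs per region, the N × 2,000 CNN passes of R-CNN are avoided.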

Faster R-CNN
● Region Proposal Network (RPN) for region proposal
○ Input: Image of any size
○ Output: A set of rectangular object proposals and objectness
scores
○ Related to attention mechanisms

Faster R-CNN
● Feature maps from the CNN are passed to the
Region Proposal Network (RPN)
● At each sliding-window position in the RPN,
k anchor boxes of different shapes are generated
● Anchor boxes are fixed-size bounding boxes
that are placed throughout the image and
have different shapes and sizes
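
Anchor generation can be sketched as below. The stride of 16 and the 3 scales × 3 aspect ratios (k = 9) are assumptions taken from commonly used Faster R-CNN settings, not values stated in these slides.

```python
import math


def generate_anchors(feat_h, feat_w, stride=16,
                     scales=(128, 256, 512), ratios=(0.5, 1.0, 2.0)):
    """Return anchors as (cx, cy, w, h) in image pixels.

    One set of k = len(scales) * len(ratios) anchors is centered on every
    feature-map cell; ratio is taken as height / width.
    """
    anchors = []
    for row in range(feat_h):
        for col in range(feat_w):
            # Center of this feature-map cell in image coordinates.
            cx = (col + 0.5) * stride
            cy = (row + 0.5) * stride
            for scale in scales:
                for ratio in ratios:
                    # Keep the area at scale**2 while changing the shape.
                    w = scale / math.sqrt(ratio)
                    h = scale * math.sqrt(ratio)
                    anchors.append((cx, cy, w, h))
    return anchors


anchors = generate_anchors(feat_h=2, feat_w=3)
print(len(anchors))  # 2 * 3 * 9 = 54
```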

Faster R-CNN
● For each anchor, the RPN predicts two things:
○ The first is the probability that the anchor contains an object (it does not consider which
class the object belongs to)
○ The second is the bounding box regression that adjusts the anchor to better fit the
object
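
The second output can be read as four offsets per anchor. A minimal sketch of applying them, assuming the usual parameterization in which center offsets are scaled by the anchor size and width and height are predicted in log scale; the function name and the (cx, cy, w, h) box format are illustrative choices:

```python
import math


def decode(anchor, deltas):
    """Apply predicted offsets (tx, ty, tw, th) to an anchor (cx, cy, w, h)."""
    cx_a, cy_a, w_a, h_a = anchor
    tx, ty, tw, th = deltas
    cx = cx_a + tx * w_a      # shift the center by a fraction of the anchor width
    cy = cy_a + ty * h_a      # shift the center by a fraction of the anchor height
    w = w_a * math.exp(tw)    # rescale the width
    h = h_a * math.exp(th)    # rescale the height
    return (cx, cy, w, h)


# Zero offsets leave the anchor unchanged.
print(decode((100, 100, 50, 80), (0.0, 0.0, 0.0, 0.0)))
```

Predicting offsets relative to the anchor, rather than absolute coordinates, keeps the regression targets small and comparable across anchors of different sizes.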

Mask R-CNN
● Extends Faster R-CNN by adding a
branch for predicting an object mask in
parallel with the existing branch for
bounding box recognition

Mask R-CNN
● Defines a multi-task loss on each sampled RoI
as:
L = L_cls + L_box + L_mask
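
A minimal sketch of how the three terms combine, assuming L_mask is the average per-pixel binary cross-entropy on the mask predicted for the ground-truth class. L_cls and L_box are passed in as plain numbers here, and the function names are illustrative.

```python
import math


def mask_loss(pred_probs, target_mask):
    """Average per-pixel binary cross-entropy.

    pred_probs: 2D list of mask probabilities (after the sigmoid);
    target_mask: 2D list of 0/1 ground-truth pixels, same shape.
    """
    eps = 1e-12  # keeps log() away from zero
    total, count = 0.0, 0
    for prob_row, target_row in zip(pred_probs, target_mask):
        for p, t in zip(prob_row, target_row):
            p = min(max(p, eps), 1.0 - eps)
            total += -(t * math.log(p) + (1 - t) * math.log(1.0 - p))
            count += 1
    return total / count


def multi_task_loss(l_cls, l_box, l_mask):
    """L = L_cls + L_box + L_mask for one sampled RoI."""
    return l_cls + l_box + l_mask


l_mask = mask_loss([[0.5, 0.5], [0.5, 0.5]], [[1, 0], [0, 1]])  # equals ln 2
print(multi_task_loss(0.2, 0.1, l_mask))
```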