This document summarizes research on using computer vision and machine learning techniques to perform object detection without full bounding box annotations.
It describes a new multiple instance learning (MIL) method, CR-MILBOOST, that learns from weakly labeled images (such as image search results) where only an image-level label is provided, not the exact object locations. It also discusses adapting pre-trained deep convolutional neural networks from classification to detection. This is done by fine-tuning their layers on detection data and by adapting the output layers of categories that have no detection data. The reported experiments show that CR-MILBOOST substantially improves detection performance (about 200% on two datasets), although the examples it selects remain more ambiguous than fully supervised training examples.
13. Smart cars
Mobileye
Vision systems currently in high-end BMW, GM, Volvo models
By 2010: 70% of car manufacturers
Slide content courtesy of Amnon Shashua
32. Reminder: What is MIL?
Supervised Learning
Each instance has an associated label
MIL: Weaker Supervision
Examples come in bags
Each bag has a label
Negative bag: all instances in the bag are negative
Positive bag: at least one instance in the bag is positive
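The bag-labeling rule above can be written as a short Python sketch. The function name `bag_label` and the example bags are hypothetical. In real MIL data the instance labels are hidden and only the resulting bag label is observed.

```python
from typing import List


def bag_label(instance_labels: List[int]) -> int:
    """Return the MIL label of a bag from its instance labels in {-1, +1}.

    Positive if at least one instance is positive; negative only if
    every instance is negative.
    """
    if not instance_labels:
        raise ValueError("a bag must contain at least one instance")
    return 1 if any(y == 1 for y in instance_labels) else -1


# Hypothetical bags: in MIL only the resulting bag label is observed,
# the per-instance labels below stay hidden from the learner.
bags = {
    "image_with_dog": [-1, -1, 1, -1],  # one candidate window covers the object
    "background_image": [-1, -1, -1],   # no window covers the object
}
for name, labels in bags.items():
    print(name, bag_label(labels))
```

The asymmetry is the point of MIL. A negative bag labels every instance in it, while a positive bag only says that some instance is positive, without saying which one.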
33. Supervised vs MIL (binary)
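Supervised Learning: $(x_i, y_i) \in \mathbb{R}^D \times \{-1, 1\}$
MI Learning: $(X_i, y_i) \in (\mathbb{R}^D)^K \times \{-1, 1\}$, where $X_i = \{x_{i1}, \dots, x_{iK}\}$
Supervised: $\varphi(x) > 0$ if $y = +1$, $\varphi(x) < 0$ if $y = -1$
MIL: $\max_j \varphi(x_{ij}) > 0$ if $y_i = +1$, $\max_j \varphi(x_{ij}) < 0$ if $y_i = -1$
Supervised: $\varphi^*(x) = \arg\min_{\varphi(x)} L(\varphi; x, y)$
MIL: $\varphi^*(x) = \arg\min_{\varphi(x)} L(\varphi; X, y)$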
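A minimal Python sketch of the MIL decision rule on this slide follows. It assumes a hypothetical linear instance scorer φ(x) = w·x + b with hand-picked weights. It only checks the bag-level constraints $\max_j \varphi(x_{ij}) \gtrless 0$; learning φ by minimizing $L(\varphi; X, y)$ is not shown.

```python
from typing import Callable, List, Sequence

Instance = Sequence[float]
Scorer = Callable[[Instance], float]


def bag_score(phi: Scorer, bag: List[Instance]) -> float:
    """Bag-level score max_j phi(x_ij)."""
    return max(phi(x) for x in bag)


def predict_bag(phi: Scorer, bag: List[Instance]) -> int:
    """+1 if some instance scores above zero, else -1."""
    return 1 if bag_score(phi, bag) > 0 else -1


def satisfies_mil(phi: Scorer, bags: List[List[Instance]], labels: List[int]) -> bool:
    """Check max_j phi(x_ij) > 0 for y_i = +1 and max_j phi(x_ij) < 0 for y_i = -1."""
    for bag, y in zip(bags, labels):
        s = bag_score(phi, bag)
        if (y == 1 and s <= 0) or (y == -1 and s >= 0):
            return False
    return True


def phi(x: Instance) -> float:
    """Hypothetical linear instance scorer phi(x) = w . x + b on R^2."""
    w, b = (1.0, -0.5), -0.2
    return sum(wi * xi for wi, xi in zip(w, x)) + b


positive_bag = [(0.1, 0.4), (0.9, 0.2)]  # second instance scores about 0.6 > 0
negative_bag = [(0.1, 0.4), (0.0, 0.0)]  # best instance scores -0.2 < 0
print(predict_bag(phi, positive_bag), predict_bag(phi, negative_bag))
print(satisfies_mil(phi, [positive_bag, negative_bag], [1, -1]))
```

Because of the max, a single high-scoring instance is enough to make a bag positive. A negative bag must keep all of its instances below zero.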
34. Related Methods
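How to estimate the latent labels of instances in positive bags
Gartner, ICML’02: $X_i = \frac{1}{K} \sum_j x_{ij}$
Xu, ICML’04: $\varphi(X_i) = \frac{1}{K} \sum_j \varphi(x_{ij})$
Andrews, NIPS’03: $\varphi(X_i) = \max_j \varphi(x_{ij})$
Bunescu, ICML’07: SVM constraints
Viola, NIPS’07: $p_i = 1 - \prod_j (1 - p_{ij})$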
Supervised MIL
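The bag-aggregation rules listed above can be compared side by side in a short Python sketch. The function names are hypothetical. Each function reproduces only the aggregation formula attributed to the cited work, not that work's training procedure. The Bunescu SVM constraints are omitted because the slide gives no formula for them.

```python
import math
from typing import List, Sequence


def mean_embedding(bag: List[Sequence[float]]) -> List[float]:
    """Gartner-style: represent bag X_i by the average of its instance vectors."""
    k = len(bag)
    return [sum(col) / k for col in zip(*bag)]


def mean_of_scores(scores: Sequence[float]) -> float:
    """Xu-style: bag score is the average instance score (1/K) sum_j phi(x_ij)."""
    return sum(scores) / len(scores)


def max_of_scores(scores: Sequence[float]) -> float:
    """Andrews-style: bag score is the best instance score max_j phi(x_ij)."""
    return max(scores)


def noisy_or(probs: Sequence[float]) -> float:
    """Viola-style noisy-OR: p_i = 1 - prod_j (1 - p_ij)."""
    return 1.0 - math.prod(1.0 - p for p in probs)


bag = [[1.0, 2.0], [3.0, 4.0]]
scores = [-0.5, -0.25, 0.75]
probs = [0.1, 0.2, 0.5]
print(mean_embedding(bag))        # [2.0, 3.0]
print(mean_of_scores(scores))     # 0.0
print(max_of_scores(scores))      # 0.75
print(round(noisy_or(probs), 6))  # 0.64
```

Averaging dilutes a single positive instance among many negatives. The max and noisy-OR rules let one confident instance dominate the bag score, which better matches the "at least one positive" definition.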
37. CR-MILBOOST
Two-step procedure
Estimate probabilities of the latent instance labels
Integrate the estimates into a new loss
Mitigates label estimation error by incorporating priors
38. CR-MILBOOST
Step 1
$Q = \{\varphi_1(x), \varphi_2(x), \dots, \varphi_q(x)\}$
$\eta_{ij} \equiv P(y_{ij} = y_i \mid Q) = \frac{1}{1 + e^{-y_i \sum_q \varphi_q(x_{ij})}}$
$\eta_i \equiv P(y_i \mid Q) = \max_j \eta_{ij}$
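Step 1 can be sketched in Python as follows. Q is taken to be a list of weak-learner functions φ_1 … φ_q, and the two example learners are hypothetical. The sketch only estimates η_ij and η_i. How CR-MILBOOST folds these estimates and the priors into its new loss (Step 2) is not specified on the slides and is not attempted here.

```python
import math
from typing import Callable, List, Sequence

Instance = Sequence[float]
WeakLearner = Callable[[Instance], float]


def sigmoid(z: float) -> float:
    """Numerically stable logistic function 1 / (1 + e^{-z})."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def instance_prob(learners: List[WeakLearner], x: Instance, y_bag: int) -> float:
    """eta_ij = P(y_ij = y_i | Q): logistic of y_i * sum_q phi_q(x_ij)."""
    return sigmoid(y_bag * sum(phi(x) for phi in learners))


def bag_prob(learners: List[WeakLearner], bag: List[Instance], y_bag: int) -> float:
    """eta_i = P(y_i | Q) = max_j eta_ij."""
    return max(instance_prob(learners, x, y_bag) for x in bag)


# Hypothetical weak learners phi_1, phi_2 on 2-D instances.
learners = [lambda x: x[0], lambda x: -x[1]]
bag = [(0.0, 0.0), (2.0, 0.0)]
print(bag_prob(learners, bag, +1))  # max(0.5, sigmoid(2)), about 0.88
print(bag_prob(learners, bag, -1))  # max(0.5, sigmoid(-2)) = 0.5
```

The stable form of the sigmoid avoids overflow in `math.exp` when the boosted score is large in magnitude.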
41. Experiments: Features
$h^*_{e,R}(x) = \frac{\sum_{m \in R} \xi_e(x, m)}{\sum_{d \in F,\, m \in R} \xi_d(x, m)}$
Weak learners $(e, R, t)$:
An edge orientation $e$
A sub-window $R$
A threshold $t$
Simple, efficient
Q = 4: number of stumps
$f(x) = \sum_k \alpha_k h_k(x)$
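The slides do not define the edge energy ξ_d(x, m). This Python sketch therefore assumes that ξ_d(x, m) is the gradient magnitude at pixel m when the pixel's unsigned gradient orientation falls into bin d, and 0 otherwise. It uses central differences, F is a set of `n_bins` equal orientation bins (8 is an arbitrary choice), and windows R are axis-aligned boxes. All names, the window layout, and the stump weights are hypothetical. Each weak learner is a stump on the normalized feature $h^*_{e,R}$, and the strong classifier is their weighted sum.

```python
import math
from typing import List, Tuple

Image = List[List[float]]
Window = Tuple[int, int, int, int]        # hypothetical (row0, col0, row1, col1), end-exclusive
Energy = List[List[List[float]]]          # xi[d][row][col]
Stump = Tuple[float, int, Window, float]  # (alpha_k, e, R, t)


def edge_energy(img: Image, n_bins: int = 8) -> Energy:
    """Assumed xi_d(x, m): gradient magnitude at pixel m if its orientation is in bin d, else 0.

    Central-difference gradients; unsigned orientation in [0, pi) split into
    n_bins equal bins; border pixels get no energy.
    """
    h, w = len(img), len(img[0])
    xi = [[[0.0] * w for _ in range(h)] for _ in range(n_bins)]
    for r in range(1, h - 1):
        for c in range(1, w - 1):
            gx = img[r][c + 1] - img[r][c - 1]
            gy = img[r + 1][c] - img[r - 1][c]
            mag = math.hypot(gx, gy)
            if mag == 0.0:
                continue
            theta = math.atan2(gy, gx) % math.pi
            d = min(int(theta / math.pi * n_bins), n_bins - 1)
            xi[d][r][c] = mag
    return xi


def orientation_feature(xi: Energy, e: int, window: Window) -> float:
    """h*_{e,R}(x): share of the edge energy inside R that has orientation e."""
    r0, c0, r1, c1 = window
    cells = [(r, c) for r in range(r0, r1) for c in range(c0, c1)]
    num = sum(xi[e][r][c] for r, c in cells)
    den = sum(xi[d][r][c] for d in range(len(xi)) for r, c in cells)
    return num / den if den > 0 else 0.0


def weak_learner(xi: Energy, e: int, window: Window, t: float) -> int:
    """Decision stump defined by (e, R, t)."""
    return 1 if orientation_feature(xi, e, window) > t else -1


def strong_classifier(xi: Energy, stumps: List[Stump]) -> float:
    """f(x) = sum_k alpha_k h_k(x)."""
    return sum(alpha * weak_learner(xi, e, win, t) for alpha, e, win, t in stumps)


# Toy 6x6 image with a vertical step edge: all gradient energy falls in bin 0.
img = [[0.0] * 3 + [1.0] * 3 for _ in range(6)]
xi = edge_energy(img)
full = (0, 0, 6, 6)
print(orientation_feature(xi, 0, full))  # 1.0
print(orientation_feature(xi, 4, full))  # 0.0
print(strong_classifier(xi, [(0.7, 0, full, 0.5), (0.3, 4, full, 0.5)]))
```

Normalizing by the total energy in R makes the feature insensitive to overall contrast. Each stump then only asks whether orientation e dominates that sub-window. With Q = 4 stumps, f(x) would be a sum of four such terms.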
52. Conclusion
New MIL method: CR-MILBOOST
Two-step procedure
Dramatic increase in performance (200%) on two datasets
Quality of selected examples still suffers from additional ambiguity compared with fully supervised examples
53. Adapting Deep CNNs from Classification to Detection
Joint work with Judy Hoffman, Eric Tzeng, Sergio Guadarrama and Trevor Darrell at UC Berkeley
54. Recall: classification is easier than detection
Classification label: Easy to label
Detection label: much more difficult and costly!
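[Figure: example images labeled "dog" and "apple", shown once with image-level labels and once with bounding-box labels]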
56. Deep Convolutional Neural Network
[Figure: training a classification CNN (layers 1-5, fc6, fc7, output layers fcA and fcB) on classification data from categories A and B. Example output scores: cat: 0.90, dog: 0.85, airplane: 0.05, person: 0.10]
57. [Figure: adapting the classification CNN to detection.
(a) Classification CNN: layers 1-5, fc6, fc7, fcA and fcB, trained on classification data from categories A and B.
(b) Hidden Layer Adaptation: an adapted detection CNN (det layers 1-5, det fc6, det fc7, det fcB, plus a background output) is trained on labeled warped regions from detection data for categories B.
(c) Output Layer Adaptation: fcA is adapted into det fcA, giving the final combined and fully adapted CNN, which scores categories A and B as well as background.]
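The slides do not spell out how fcA is adapted in step (c). The Python sketch below assumes one plausible nearest-neighbour transfer scheme, which is an assumption and not taken from the slides. Each category in A, which has classification data only, gets its classification output weights shifted by the average classification-to-detection weight change of its k nearest categories in B. Nearness is measured by Euclidean distance between classification weights. The function name, the toy weight vectors, and `k` are hypothetical, and the background output is left out.

```python
import math
from typing import Dict, List

Vector = List[float]


def adapt_output_layer(cls_w: Dict[str, Vector], det_w: Dict[str, Vector], k: int = 1) -> Dict[str, Vector]:
    """Hypothetical output-layer adaptation for categories without detection data.

    cls_w: classification output weights for every category (sets A and B).
    det_w: detection output weights for the categories in B.
    Each category in A receives its classification weights plus the mean
    (detection - classification) weight change of its k nearest B categories.
    """
    if not det_w:
        raise ValueError("need at least one category with detection weights")
    adapted = {cat: list(w) for cat, w in det_w.items()}
    for cat, w in cls_w.items():
        if cat in det_w:
            continue  # categories in B keep their learned detection weights
        nearest = sorted(det_w, key=lambda b, w=w: math.dist(w, cls_w[b]))[:k]
        delta = [
            sum(det_w[b][i] - cls_w[b][i] for b in nearest) / len(nearest)
            for i in range(len(w))
        ]
        adapted[cat] = [wi + di for wi, di in zip(w, delta)]
    return adapted


cls_w = {"dog": [1.0, 0.0], "person": [0.0, 1.0], "cat": [0.9, 0.1]}
det_w = {"dog": [0.8, -0.2], "person": [0.1, 0.7]}  # B: categories with detection data
out = adapt_output_layer(cls_w, det_w)
print([round(v, 3) for v in out["cat"]])  # [0.7, -0.1]
```

Here "cat" lies closest to "dog" in classification-weight space, so it inherits the change that fine-tuning on detection data applied to "dog". Larger `k` averages the change over more neighbours.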
63. Conclusion
Presented two new methods for training object detectors with minimal bounding box annotation
MIL-based method for learning from image search results
Adaptation of deep CNNs from the classification task to the detection task