1. Object Recognition with Deformable Models
Pedro F. Felzenszwalb
Department of Computer Science
University of Chicago
Joint work with: Dan Huttenlocher, Joshua Schwartz,
David McAllester, Deva Ramanan.
2. Example Problems
• Detecting rigid objects (PASCAL challenge)
• Detecting non-rigid objects (medical image analysis)
• Segmenting cells
3. Deformable Models
• Significant challenge:
- Handling variation in appearance within object classes
- Non-rigid objects, generic categories, etc.
• Deformable models approach:
- Consider each object as a deformed version of a template
- Compact representation
- Leads to interesting modeling and algorithmic problems
4. Overview
• Part I: Pictorial Structures
- Deformable part models
- Highly efficient matching algorithms
• Part II: Deformable Shapes
- Triangulated polygons
- Hierarchical models
• Part III: The PASCAL Challenge
- Recognizing 20 object categories in realistic scenes
- Discriminatively trained, multiscale, deformable part models
5. Part I: Pictorial Structures
• Introduced by Fischler and Elschlager in 1973
• Part-based models:
- Each part represents local visual properties
- “Springs” capture spatial relationships
Matching the model to an image involves a joint optimization of part locations: “stretch and fit”
6. Local Evidence + Global Decision
• Parts have a match quality at each image location
• Local evidence is noisy
- Parts are detected in the context of the whole model
[Figure: a part template, a test image, and the resulting match-quality map]
7. Matching Problem
• Model is represented by a graph G = (V, E)
- V = {v1, ..., vn} are the parts
- (vi, vj) ∈ E indicates a connection between parts
• mi(li) is a cost for placing part i at location li
• dij(li, lj) is a deformation cost
• Optimal configuration for the object is L = (l1, ..., ln) minimizing

E(L) = Σi=1..n mi(li) + Σ(vi,vj)∈E dij(li, lj)
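The energy above is just a sum of unary match costs and pairwise deformation costs. A minimal sketch of evaluating E(L) for a fixed configuration, with illustrative stand-ins for mi and dij (`match_cost`, `deform`, and the toy numbers are hypothetical, not the talk's implementation):

```python
def energy(L, match_cost, edges, deform):
    """E(L) = sum_i m_i(l_i) + sum_{(i,j) in E} d_ij(l_i, l_j)."""
    total = sum(match_cost[i][L[i]] for i in range(len(L)))
    total += sum(deform(i, j, L[i], L[j]) for (i, j) in edges)
    return total

# Toy model: 2 parts, 3 candidate locations, spring cost |l_i - l_j|.
match_cost = [[3, 1, 2], [0, 4, 1]]
edges = [(0, 1)]
deform = lambda i, j, li, lj: abs(li - lj)
print(energy((1, 0), match_cost, edges, deform))  # 1 + 0 + |1-0| = 2
```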
8. Matching Problem
E(L) = Σi=1..n mi(li) + Σ(vi,vj)∈E dij(li, lj)

• Assume n parts, k possible locations for each part
- There are k^n configurations L
• If the graph is a tree we can use dynamic programming
- O(nk^2) algorithm
• If dij(li, lj) = g(li - lj) we can use min-convolutions
- O(nk) algorithm
- As fast as matching each part separately!
9. Dynamic Programming on Trees
E(L) = Σi=1..n mi(li) + Σ(vi,vj)∈E dij(li, lj)

• For each l1 find the best l2:
- Best2(l1) = min over l2 of [m2(l2) + d12(l1, l2)]
• “Delete” v2 and solve the problem with the smaller model
• Keep removing leaves until a single part is left
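The leaves-first elimination above is ordinary dynamic programming on a tree. A hedged sketch (function name, dict layout, and the brute-force inner minimization are illustrative; this is the O(nk^2) version, without the min-convolution speedup):

```python
def match_tree(children, root, match_cost, deform, locations):
    """Min cost of a tree-structured model by dynamic programming.
    cost[l] = best cost of the subtree rooted at v, given v placed at l.
    Each edge is minimized by brute force, so overall O(n k^2)."""
    def solve(v):
        cost = {l: match_cost[v][l] for l in locations}
        for c in children.get(v, []):
            child_cost = solve(c)          # "delete" the leaf c first
            for l in locations:
                cost[l] += min(child_cost[lc] + deform(v, c, l, lc)
                               for lc in locations)
        return cost
    return min(solve(root).values())

# Toy 2-part model with spring cost |l1 - l2|.
children = {0: [1]}
match_cost = {0: {0: 3, 1: 1, 2: 2}, 1: {0: 0, 1: 4, 2: 1}}
best = match_tree(children, 0, match_cost,
                  lambda v, c, l, lc: abs(l - lc), [0, 1, 2])
print(best)  # 2: place part 0 at 1 (cost 1), part 1 at 0 (cost 0), spring 1
```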
10. Min-Convolution Speedup
Best2(l1) = min over l2 of [m2(l2) + d12(l1, l2)]

• Brute force: O(k^2), where k is the number of locations
• Suppose d12(l1, l2) = g(l1 - l2):
- Best2(l1) = min over l2 of [m2(l2) + g(l1 - l2)]
• Min-convolution: O(k) if g is convex
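For the special case g(x) = τ|x| the min-convolution reduces to a 1D distance transform computable with two sweeps; a sketch under that assumption (the general convex case, e.g. quadratic g, needs the lower-envelope algorithm, which is not shown here):

```python
def min_convolution_abs(m, tau=1.0):
    """Best(l1) = min over l2 of [ m[l2] + tau*|l1 - l2| ] in O(k),
    for the convex cost g(x) = tau*|x|: forward sweep covers l2 <= l1,
    backward sweep covers l2 >= l1 (a 1D distance-transform trick)."""
    best = list(m)
    for l in range(1, len(best)):            # forward pass
        best[l] = min(best[l], best[l - 1] + tau)
    for l in range(len(best) - 2, -1, -1):   # backward pass
        best[l] = min(best[l], best[l + 1] + tau)
    return best

print(min_convolution_abs([5, 0, 4, 8]))  # [1, 0, 1, 2]
```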
13. Human Tracking
Ramanan, Forsyth, Zisserman. Tracking People by Learning Their Appearance.
IEEE Transactions on Pattern Analysis and Machine Intelligence (PAMI), Jan 2007.
14. Part II: Deformable Shapes
• Shape is a fundamental cue for recognizing objects
• Many objects have no well-defined parts
- We can capture their outlines using deformable models
15. Triangulated Polygons
• Polygonal templates
• Delaunay triangulation gives a natural decomposition of an object
• Consider deforming each triangle “independently”
- A rabbit's ear can be bent by changing the shape of a single triangle
16. Structure of Triangulated Polygons
There are two graphs associated with a triangulated polygon.
If the polygon is simple (no holes):
- The dual graph is a tree
- The graphical structure of the triangulation is a 2-tree
17. Deformable Matching
Consider piecewise affine maps from model to image (taking triangles to triangles).
Find the globally optimal deformation using dynamic programming over the 2-tree.
[Figure: model matched to MRI data]
18. Hierarchical Shape Model
• Shape-tree of curve from a to b:
- Select midpoint c, store relative location c | a,b.
- Left child is a shape-tree of sub-curve from a to c.
- Right child is a shape-tree of sub-curve from c to b.
[Figure: curve from a to b with labeled points; shape-tree with root c | a,b, children e | a,c and d | c,b, and leaves f | a,e, g | e,c, h | c,d, i | d,b]
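The recursive construction above is easy to sketch. Here midpoints are picked by index and the "relative location c | a,b" is represented just by the midpoint index — an illustrative stand-in for the geometric encoding used in the talk:

```python
def shape_tree(a, b):
    """Shape-tree of the sub-curve between point indices a and b.
    Returns nested tuples (midpoint, left_subtree, right_subtree);
    None when no interior point remains to store."""
    if b - a < 2:
        return None
    c = (a + b) // 2              # select midpoint: store c | a,b
    return (c, shape_tree(a, c),  # left child: sub-curve a..c
               shape_tree(c, b))  # right child: sub-curve c..b

# A 5-point curve (indices 0..4) yields a two-level tree.
print(shape_tree(0, 4))  # (2, (1, None, None), (3, None, None))
```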
19. Deformations
• Independently perturb relative locations stored in a shape-tree
- Local and global properties are preserved
- Reconstructed curve is perceptually similar to original
20. Matching
[Figure: model curve with labeled points a–i matched to an image curve with points p, q, r, u, v, w]

Match(v, [p,q]) = w1
Match(u, [q,r]) = w2
Match(w, [p,r]) = w1 + w2 + dif((e | a,c), (q | p,r))

Similar to parsing with the CKY algorithm.
21. Recognizing Leaves
Nearest neighbor classification
15 species, 75 examples per species (25 training, 50 test)

Method          Accuracy (%)
Shape-tree      96.28
Inner distance  94.13
Shape context   88.12
22. Part III: PASCAL Challenge
• ~10,000 images, with ~25,000 target objects
- Objects from 20 categories (person, car, bicycle, cow, table...)
- Objects are annotated with labeled bounding boxes
24. Model Overview
[Figure: detection, root filter, part filters, deformation models]
The model has a root filter plus deformable parts.
25. Histogram of Gradient (HOG) Features
• Image is partitioned into 8x8 pixel blocks
• In each block we compute a histogram of gradient orientations
- Invariant to changes in lighting, small deformations, etc.
• We compute features at different resolutions (pyramid)
26. Filters
• Filters are rectangular templates defining weights for features
• Score is dot product of filter and subwindow of HOG pyramid
[Figure: filter W placed on a subwindow H of the HOG pyramid; the score at this location is W ⋅ H]
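The dot-product score is straightforward to sketch. Assuming the pyramid level is stored as a grid of per-cell feature vectors and the filter as a same-shaped grid of weight vectors (names and layout are hypothetical):

```python
def filter_score(filt, features, y0, x0):
    """Score of filter W on the subwindow of the HOG level whose
    top-left cell is (y0, x0): the dot product of the filter weights
    with the stacked feature vectors of the covered cells."""
    return sum(
        w * f
        for dy, row in enumerate(filt)              # filter rows
        for dx, cell in enumerate(row)              # weight vector per cell
        for w, f in zip(cell, features[y0 + dy][x0 + dx])
    )

# 1x1 filter with weights [1, 2] on a cell with features [3, 4].
print(filter_score([[[1, 2]]], [[[3, 4]]], 0, 0))  # 1*3 + 2*4 = 11
```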
27. Object Hypothesis
The score is the sum of the filter scores plus the deformation scores.
[Figure: image pyramid and HOG feature pyramid]
The multiscale model captures features at two resolutions.
28. Training
• Training data consists of images with labeled bounding boxes
• Need to learn the model structure, the filters, and the deformation costs
29. Connection With Linear Classifiers
• Score of the model is the sum of filter scores plus deformation scores
- A bounding box in the training data specifies that the score should be high for some placement in a range
• w is a model: a concatenation of filters and deformation parameters
• x is a detection window, z are filter placements: a concatenation of features and part displacements
34. Overall Results
• 9 systems competed in the 2007 challenge
• Out of 20 classes we get:
- First place in 10 classes
- Second place in 6 classes
• Some statistics:
- It takes ~2 seconds to evaluate a model in one image
- It takes ~3 hours to train a model
- MUCH faster than most systems
36. Summary
• Deformable models provide an elegant framework for object
detection and recognition
- Efficient algorithms for matching models to images
- Applications: pose estimation, medical image analysis,
object recognition, etc.
• We can learn models from partially labeled data
- Generalized standard ideas from machine learning
- Leads to state-of-the-art results in the PASCAL challenge
• Future work: hierarchical models, grammars, 3D objects