Semantic image segmentation is the process of assigning semantically relevant labels to all pixels in an image. Hierarchical Conditional Random Fields (HCRFs) are a popular and successful approach to this problem. One reason for their popularity is their ability to incorporate contextual information at different scales. However, existing HCRF models do not allow multiple labels to be assigned to individual nodes. At higher scales in the image, this results in an oversimplified model, since multiple classes can reasonably be expected to appear within a single region. This simplified model especially limits the impact that observations at larger scales may have on the CRF model. Neglecting the information at larger scales is undesirable, since class-label estimates based on these scales are more reliable than those at smaller, noisier scales.
The harmony potential: fusing local and global information for semantic image segmentation
1. Introduction
Graph cuts for image segmentation
The harmony potential
Experimental results
Discussion
The harmony potential:
fusing local and global information for semantic image segmentation
Andrew D. Bagdanov
bagdanov@cvc.uab.es
Departamento de Ciencias de la Computación
Universidad Autónoma de Barcelona
CVPR 2010 (to appear)
J. Gonfaus, X. Boix, J. van de Weijer, J. Serrat, J. González The harmony potential
Outline
1 Introduction
2 Graph cuts for image segmentation
3 The harmony potential
4 Experimental results
5 Discussion
Outline
1 Introduction
Semantic image segmentation
Semantic categories
Our main idea
2 Graph cuts for image segmentation
3 The harmony potential
4 Experimental results
5 Discussion
Giving semantics to pixels
[Figure: example with columns Image, Object, Class]
Semantic image segmentation is not object segmentation
Only for simple cases are they the same
Turning a hard problem into a harder one
[Figure: example with columns Image, Object, Class]
The object is to assign semantic labels to every pixel
Fine distinctions must be made
Make that a very hard one
Occlusions, varying viewpoint and size complicate things
Semantic categories
20 semantic categories for Pascal
aeroplane, bicycle, bird, boat, bottle, bus, car, cat, chair, cow,
diningtable, dog, horse, motorbike, person, potted plant, sheep,
sofa, train, and tv/monitor.
State of the art: Conditional Random Fields (CRFs)
One of the most successful approaches to image segmentation is
the Hierarchical CRF approach.
Using potential functions, information at different scales can be
incorporated into the segmentation.
We identify three levels of scale: local, mid-level and global [Zhu,
NIPS2008].
We show how these three levels of scale can be integrated in a
way that preserves their unique characteristics.
Existing techniques apply overly-simplified models of context that
do not generalize upward from local to global scales.
Global constraints on label combinations
Our principal idea is to use global classification to enhance
segmentation results.
Global image classification results tend to be less noisy than local ones.
We will use them to constrain the combinations of semantic labels
we are likely to encounter during segmentation.
We also show how the resulting optimization problem can be
made tractable by learning to efficiently subsample label
combinations at the global level.
Outline
1 Introduction
2 Graph cuts for image segmentation
Smoothness potentials
Potts potentials
Robust PN
3 The harmony potential
4 Experimental results
5 Discussion
Some terminology
We represent our segmentation problem as a graph: G = (V, E)
V is used for indexing random variables, and E is the set of
undirected edges representing compatibility relationships between
random variables.
X = {Xi} denotes the set of random variables or nodes, for i ∈ V.
An energy function will be defined over graphical configurations of
random variables.
By the Hammersley-Clifford theorem, the probability of a configuration
x = {xi} can be written as the negative exponential of an
energy function, $P(x) \propto \exp(-E(x))$ with $E(x) = \sum_{c \in C} \varphi_c(x_c)$,
where $\varphi_c$ is the potential function of clique c ∈ C.
Consistency potentials for labeling problems
The energy function of G can be written as:
$$E(x) = \sum_{i \in V} \phi(x_i) + \sum_{(i,j) \in E_L} \psi_L(x_i, x_j) + \sum_{(i,g) \in E_G} \psi_G(x_i, x_g).$$
The unary term $\phi(x_i)$ depends on a single probability $P(X_i = x_i \mid O_i)$,
where $O_i$ is the observation that affects $X_i$ in the model.
The smoothness potential ψL(xi, xj) determines the pairwise
relationship between two local nodes.
The consistency potential ψG(xi, xg) expresses the dependency
between local nodes and a global node.
And the Maximum a Posteriori (MAP) estimate of the optimal
labeling is:
$$x^* = \arg\min_x E(x).$$
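To make the energy and its MAP minimizer concrete, here is a minimal sketch on a toy graph with three local nodes and one global node. It is not the inference used in the talk: it enumerates every labeling by brute force, which is only feasible for tiny problems. The function names, the toy costs, and the assumption that every local node is connected to the global node are illustrative choices, not taken from the slides.

```python
from itertools import product


def energy(x_local, x_global, phi_local, phi_global, edges_local, psi_local, psi_global):
    """Sum of unary, local pairwise (smoothness) and local-global (consistency) terms."""
    total = sum(phi_local[i][label] for i, label in enumerate(x_local))
    total += phi_global[x_global]
    total += sum(psi_local(x_local[i], x_local[j]) for i, j in edges_local)
    # Assumption: every local node is linked to the single global node.
    total += sum(psi_global(label, x_global) for label in x_local)
    return total


def brute_force_map(labels, phi_local, phi_global, edges_local, psi_local, psi_global):
    """Return (energy, local labeling, global label) with minimal energy."""
    best = (float("inf"), None, None)
    for x_global in labels:
        for x_local in product(labels, repeat=len(phi_local)):
            e = energy(x_local, x_global, phi_local, phi_global,
                       edges_local, psi_local, psi_global)
            if e < best[0]:
                best = (e, x_local, x_global)
    return best


# Toy problem (all numbers are made up for illustration).
labels = ("cat", "dog")
phi_local = [
    {"cat": 0.2, "dog": 1.0},
    {"cat": 0.9, "dog": 0.8},  # weak local evidence for "dog"
    {"cat": 0.1, "dog": 1.5},
]
phi_global = {"cat": 0.0, "dog": 0.0}
edges_local = [(0, 1), (1, 2)]


def smoothness(a, b):
    return 0.0 if a == b else 0.5


def consistency(x_i, x_g):
    return 0.0 if x_i == x_g else 0.3


best_energy, best_local, best_global = brute_force_map(
    labels, phi_local, phi_global, edges_local, smoothness, consistency)
print(best_energy, best_local, best_global)
```

In this toy case the pairwise and consistency terms override the weak "dog" evidence at the middle node, so the minimizer labels all three nodes "cat".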
Representing semantic segmentations
Each node represents an image region
Each node takes a single label from the set of semantic categories
Smoothness: only local constraints
Adds a constraint on neighboring nodes
Usually enforces gradual (local) changes
Potts: $\psi_G(x_i, x_g) = \gamma_i^l \, T[x_i \neq x_g]$
New node enforces global consistency among local labels
Consistency with a single global label [Plath, ICML2009]
Robust $P^N$: consistency + “anything goes”
Extends Potts potential [Kohli, CVPR2008]
“Free label” at global node allows any local combination
Outline
1 Introduction
2 Graph cuts for image segmentation
3 The harmony potential
Motivation revisited
Blowing up the problem
4 Experimental results
5 Discussion
Different features for discriminations
The previously mentioned approaches all try to make global
distinctions using local information.
Either by voting of local observations (Potts).
Or by penalizing rampantly discordant local label assignments
(robust $P^N$).
None of these techniques try to exploit truly global information to
constrain local labels.
And none incorporate the notion of encoding combinations of
primitive node labels at the global level.
The harmony potential: symphony of semantics
Let $L = \{l_1, \ldots, l_M\}$ denote the set of semantic class labels from
which local nodes $X_i$ take their labels.
The global node $X_g$, instead, will take labels from P(L), the power
set of L.
In this way, we can represent any combination of primitive labels
from L at the global node.
The harmony potential is now defined as:
$$\psi_G(x_i, x_g) = \gamma_i^l \, T[x_i \notin x_g].$$
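The difference between a Potts consistency term and the harmony potential can be sketched in a few lines. This is an illustration under simplifying assumptions: the cost is a single constant `gamma` rather than a node- and label-dependent weight, and the function names and toy labels are hypothetical.

```python
def potts_potential(x_i, x_g, gamma=1.0):
    """Potts: the global node holds ONE label; any other local label is penalized."""
    return gamma if x_i != x_g else 0.0


def harmony_potential(x_i, x_g, gamma=1.0):
    """Harmony: the global node holds a SET of labels; only labels outside it are penalized."""
    return gamma if x_i not in x_g else 0.0


# Four superpixels of a (made-up) image showing a cow on grass under some sky.
local_labels = ["cow", "grass", "cow", "sky"]

# With Potts, a single global label "cow" penalizes both "grass" and "sky".
potts_cost = sum(potts_potential(x, "cow") for x in local_labels)

# With the harmony potential, the global label {"cow", "grass"} penalizes only "sky".
global_subset = frozenset({"cow", "grass"})
harmony_cost = sum(harmony_potential(x, global_subset) for x in local_labels)

print(potts_cost, harmony_cost)
```

Representing the global label as a `frozenset` mirrors the idea that the global node takes its value from the power set of L.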
The harmony potential: selective subsets
Only local labels that are not in the global subset are penalized.
Can represent more diverse combinations.
Potentials: the gory details
The unary potential of the local nodes is:
$$\phi_L(x_i) = -\mu_L K_i \omega_L(x_i) \log P(X_i = x_i \mid O_i),$$
where $\mu_L$ is the weighting factor of the local unary potential, $K_i$
normalizes over the number of pixels inside superpixel i, and
$\omega_L(x_i)$ is a learned per-class normalization.
$P(X_i = x_i \mid O_i)$ is the classification score given an observed
representation $O_i$ of the region, which is based on a bag-of-words
built from features of superpixel i and those superpixels adjacent
to it.
More potentials
The global unary potential is defined as:
$$\phi_G(x_g) = -\mu_G \omega_G(x_g) \log P(X_g = x_g \mid O_g),$$
where $\mu_G$ is the weighting factor of the global unary potential, and
$\omega_G(x_g)$ is again a per-class normalization like the one used in the
local unary potential.
The main difference comes in the computation of $P(X_g = x_g \mid O_g)$,
which is the posterior:
$$P(X_g = x_g \mid O_g) \propto P(O_g \mid X_g = x_g)\, P(X_g = x_g).$$
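As a rough numerical illustration of the local and global unary potentials, the sketch below turns classification scores into costs and normalizes a likelihood-times-prior product into a posterior over a handful of candidate global labels. All weights, probabilities, and function names are made-up assumptions; how the actual likelihood and prior are estimated is not specified here.

```python
import math


def local_unary(p_local, mu_l, k_i, omega_l):
    """Local unary cost: -mu_L * K_i * omega_L(x_i) * log P(X_i = x_i | O_i)."""
    return -mu_l * k_i * omega_l * math.log(p_local)


def global_unary(p_global, mu_g, omega_g):
    """Global unary cost: -mu_G * omega_G(x_g) * log P(X_g = x_g | O_g)."""
    return -mu_g * omega_g * math.log(p_global)


def posterior(likelihood, prior):
    """Normalize likelihood * prior over the candidate global labels that were supplied."""
    unnormalized = {label: likelihood[label] * prior[label] for label in likelihood}
    z = sum(unnormalized.values())
    return {label: value / z for label, value in unnormalized.items()}


# Two hypothetical candidate global labels (subsets of the class set).
cow_grass = frozenset({"cow", "grass"})
cow_only = frozenset({"cow"})

post = posterior(
    likelihood={cow_grass: 0.6, cow_only: 0.2},
    prior={cow_grass: 0.5, cow_only: 0.5},
)

cost_likely = global_unary(post[cow_grass], mu_g=1.0, omega_g=1.0)
cost_unlikely = global_unary(post[cow_only], mu_g=1.0, omega_g=1.0)
cost_local = local_unary(0.5, mu_l=1.0, k_i=2.0, omega_l=1.0)
print(post[cow_grass], cost_likely, cost_unlikely, cost_local)
```

A more probable label gets a lower cost, so minimizing the energy favors labels the classifiers consider likely.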
Holy crap that’s a lot of labels!
We have turned a barely tractable optimization problem into a
(seemingly) spectacularly intractable one.
To optimize the energy function, we must optimize over $2^{|L|}$
possible global node labels.
If we had an analytic form for $P(\ell = x_g^* \mid O)$ we might be able to do
something.
We don’t. Instead, we will use the probability that a certain label
$\ell$ ∈ P(L) appears in $x^*$, given all the observations O required by
the model.
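To get a feel for the size of the global label space, the short sketch below enumerates the power set of a three-label toy set and counts the subsets for the 20 Pascal categories listed earlier. The helper name `power_set` is illustrative.

```python
from itertools import combinations


def power_set(labels):
    """All subsets of `labels`, including the empty set and the full set."""
    labels = list(labels)
    return [frozenset(c) for r in range(len(labels) + 1)
            for c in combinations(labels, r)]


toy_global_labels = power_set(["cat", "dog", "sofa"])
print(len(toy_global_labels))  # 2**3 subsets

n_pascal_classes = 20
n_global_labels = 2 ** n_pascal_classes
print(n_global_labels)  # number of candidate global labels for 20 classes
```

With 20 classes there are already more than a million candidate global labels, which is why exhaustive search over them is unattractive.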
Ranked subsampling of P(L)
We can do this using the following posterior, for a candidate label subset $\ell$ ∈ P(L):
$$P(\ell \subseteq x_g^* \mid O) \propto P(\ell \subseteq x_g^*)\, P(O \mid \ell \subseteq x_g^*).$$
This allows us to effectively rank possible global node labels, and
thus to prioritize candidates in the search for the optimal label $x_g^*$.
$P(\ell \subseteq x_g^* \mid O)$ establishes an order on subsets of the (unknown)
optimal labeling of the global node $x_g^*$ that guides the
consideration of global labels.
We may not be able to exhaustively consider all labels in P(L), but
at least we consider the most likely candidates for $x_g^*$.
And image classification can give us an estimate of this posterior.
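The idea of ranking candidate global labels and keeping only the best few can be sketched as follows. This is not the authors' procedure: the score used here is a stand-in that treats per-class image classification probabilities as independent, and the sketch enumerates all subsets, which only works for a tiny label set. The probabilities and function names are hypothetical.

```python
from heapq import nlargest
from itertools import combinations


def subset_score(subset, class_probs):
    """Stand-in score: probability that exactly the classes in `subset` are present,
    assuming (for illustration only) that classes occur independently."""
    score = 1.0
    for label, p in class_probs.items():
        score *= p if label in subset else (1.0 - p)
    return score


def top_k_global_labels(class_probs, k):
    """Rank all label subsets by score and keep the k best (highest first)."""
    labels = sorted(class_probs)
    candidates = (frozenset(c) for r in range(len(labels) + 1)
                  for c in combinations(labels, r))
    return nlargest(k, candidates, key=lambda s: subset_score(s, class_probs))


# Made-up per-class outputs of an image-level classifier.
class_probs = {"cat": 0.9, "sofa": 0.7, "dog": 0.2, "car": 0.05}
ranked = top_k_global_labels(class_probs, k=3)
for subset in ranked:
    print(sorted(subset), round(subset_score(subset, class_probs), 4))
```

Only the retained subsets would then be offered to the global node as possible labels during inference, instead of the whole power set.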
Outline
1 Introduction
2 Graph cuts for image segmentation
3 The harmony potential
4 Experimental results
Datasets and implementation
Results: Pascal VOC 2009
Results: MSRC-21
5 Discussion
Datasets
We have evaluated the harmony potential approach on two
standard, publicly available datasets.
The Pascal VOC 2009 Segmentation Challenge dataset contains
2250 color images of 20 different semantic classes.
This set is split into 750 images for training, 750 images for
testing, and 750 for validation.
The Microsoft MSRC-21 dataset contains 591 color images of 21
object classes.
We do our own splits for cross-validation on MSRC-21.
Unsupervised segmentation
Images are first over-segmented with quick-shift to derive
super-pixels [Fulkerson, ICCV 2009].
This preserves object boundaries while simplifying the
representation.
Working at the super-pixel level reduces the number of nodes in
the CRF by $10^2$ to $10^5$ per image.
Local classification scores: P(Xi = xi|Oi)
We extract patches with 50% overlap on a regular grid at several
resolutions (12, 24, 36 and 48 pixels in diameter).
Patches are described with SIFT, color and, for MSRC-21, location
features.
A vocabulary is constructed using k-means to quantize to 1000
SIFT words and 400 color words.
An SVM classifier using an intersection kernel is built for each
semantic category.
A similar number of positive and negative examples are used:
a total of around 8,000 superpixel samples for MSRC-21, and
20,000 for VOC 2009, for each class.
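For reference, the histogram intersection kernel mentioned above can be written in a few lines. This is a generic sketch of the kernel on toy bag-of-words histograms, not the authors' implementation; the L1-normalized histograms and the function names are assumptions.

```python
def intersection_kernel(h1, h2):
    """Histogram intersection: sum of bin-wise minima of two equal-length histograms."""
    if len(h1) != len(h2):
        raise ValueError("histograms must have the same number of bins")
    return sum(min(a, b) for a, b in zip(h1, h2))


def gram_matrix(histograms):
    """Kernel value for every pair of histograms."""
    return [[intersection_kernel(a, b) for b in histograms] for a in histograms]


# Three toy visual-word histograms with three bins each (each sums to 1).
hists = [
    [0.5, 0.5, 0.0],
    [0.25, 0.25, 0.5],
    [0.0, 0.0, 1.0],
]
gram = gram_matrix(hists)
for row in gram:
    print(row)
```

A Gram matrix like this is what an SVM trainer that accepts precomputed kernels would take as input.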
Global classification scores: P(Xg = xg|Og)
For the Pascal 2009 dataset we use our entry to the 2009 VOC
Classification Challenge
[Khan, PAMI2010 (submitted)].
It uses a bag-of-words representation based on SIFT and color
SIFT, plus spatial pyramids and color attention
[Khan, ICCV 2009].
An SVM classifier with a $\chi^2$ kernel is trained for each semantic
category in the dataset.
SVM outputs are re-normalized to generate an estimate of the
global label: P(Xg = xg|Og).
MAP inference
The optimal MAP label configuration x∗ is inferred using
α-expansion graph cuts [Kolmogorov, PAMI2004].
The global node uses the 100 most probable label subsets
obtained from ranked subsampling.
No significant improvements were observed by considering more
than 100 label subsets.
The average time to do MAP inference for an image is 0.24 seconds
in MSRC-21 and 0.32 seconds in VOC 2009.
Cross-validation of CRF parameters
For MSRC-21 we learn the CRF parameters with 5-fold
cross-validation on the union of the training and validation sets.
If we only use the validation set of 59 images, we overfit to this
small set.
For VOC 2009, we used the available validation set to train CRF
parameters.
Since the background class always appears in combination with
other classes, we do not allow the harmony potential to apply any
penalization to the background class.
Qualitative results
Qualitative results (II)
Quantitative results
Class               BONN   BROOKES   Harmony potential
Background          83.9   79.6      80.5
Aeroplane           64.3   48.3      62.3
Bicycle             21.8    6.7      24.1
Bird                21.7   19.1      28.3
Boat                32.0   10.0      30.5
Bottle              40.2   16.6      32.7
Bus                 57.3   32.7      42.2
Car                 49.4   38.1      48.1
Cat                 38.8   25.3      22.8
Chair                5.2    5.5       9.1
Cow                 28.5    9.4      30.1
DiningTable         22.0   25.1       7.9
Dog                 19.6   13.3      21.5
Horse               33.6   12.3      41.9
Motorbike           45.5   35.5      49.6
Person              33.6   20.7      31.5
PottedPlant         27.3   13.4      26.1
Sheep               40.4   17.1      37.0
Sofa                18.1   18.4      20.1
Train               33.6   37.5      39.4
TV/Monitor          46.1   36.4      31.1
Average             36.3   24.8      34.1
Qualitative results
Outline
1 Introduction
2 Graph cuts for image segmentation
3 The harmony potential
4 Experimental results
5 Discussion
Computational considerations
The future
Reflections
A modest cluster proposal
4 Dell R610i 1U Rack Servers
Each with: 2x Intel Xeon E5502 Quad Core CPUs
Each with: 24GB RAM
Each with: 4x Broadcom 10Gb Ethernet adapters
Each with: 1x 160GB 7.2K RPM Disk
Two units with: PERC 6/i SAS RAID Controller
One unit with: 5x 300GB 10K RPM Disk
Organizing computations
Some (mostly meaningless) numbers
Days of Pascal challenge: 45
Seconds of computation: 3,888,000.00
Estimated GFLOPS: 307.2
Sustained CPU utilization: 80%
Total GFLOP: 955,514,880.00
Images: 15,000
Pixels (assuming 640 × 480): 4,608,000,000.00
GFLOP/Image: 63,700.99
GFLOP/Pixel: 0.21
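The figures above follow from one another by simple arithmetic, which the short script below reproduces. The variable names are illustrative; the input values are the ones quoted on the slide.

```python
days = 45
seconds = days * 24 * 60 * 60          # seconds of computation

peak_gflops = 307.2                    # estimated peak throughput, GFLOP per second
utilization = 0.80                     # sustained CPU utilization
total_gflop = peak_gflops * utilization * seconds

images = 15_000
pixels = images * 640 * 480            # assuming 640 x 480 images

gflop_per_image = total_gflop / images
gflop_per_pixel = total_gflop / pixels

print(f"seconds:      {seconds:,}")
print(f"total GFLOP:  {total_gflop:,.2f}")
print(f"pixels:       {pixels:,}")
print(f"GFLOP/image:  {gflop_per_image:,.2f}")
print(f"GFLOP/pixel:  {gflop_per_pixel:.2f}")
```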
Conclusions
The harmony potential works well for fusing global information into
local segmentations.
It works by modeling global observations as subsets of the local
label set.
Ranked sub-sampling, driven by the same posterior as used to
define the global potential function, renders the optimization
problem tractable.
The harmony potential gets state-of-the-art results on difficult,
publicly available datasets.
Most useful when multiple semantic classes co-occur frequently.
Prospectus
Semantic image segmentation has come a long way, but still has a
long way to go.
Segmentation will become a mainstream event in Pascal VOC 2010.
We have shown that combining global information with local can
be tractable and improves on the state of the art.
Currently, combining mid-level information is where the game is
being played.
Detection is probably the key.
We can also begin to think about what types of new applications
are enabled by such combinations.
Final words
Semantic image segmentation is hard.
Participating in a competition like the Pascal VOC is very hard.
But, it brings many technologies and people and groups and ideas
together.
Xavier Pep Fahad