Introduction
Graph cuts for image segmentation
The harmony potential
Experimental results
Discussion
The harmony potential:
fusing local and global information for semantic image segmentation
Andrew D. Bagdanov
bagdanov@cvc.uab.es
Departamento de Ciencias de la Computación
Universidad Autónoma de Barcelona
CVPR 2010 (to appear)
J. Gonfaus, X. Boix, J. van de Weijer, J. Serrat, J. González The harmony potential
Outline
1 Introduction
2 Graph cuts for image segmentation
3 The harmony potential
4 Experimental results
5 Discussion
Outline
1 Introduction
Semantic image segmentation
Semantic categories
Our main idea
2 Graph cuts for image segmentation
3 The harmony potential
4 Experimental results
5 Discussion
Giving semantics to pixels
[Figure: columns Image, Object, Class]
Semantic image segmentation is not object segmentation
Only for simple cases are they the same
Turning a hard problem into a harder one
[Figure: columns Image, Object, Class]
The object is to assign semantic labels to every pixel
Fine distinctions must be made
Make that a very hard one
Occlusions, varying viewpoint and size complicate things
Semantic categories
20 semantic categories for Pascal
aeroplane, bicycle, bird, boat, bottle, bus, car, cat, chair, cow,
diningtable, dog, horse, motorbike, person, potted plant, sheep,
sofa, train, and tv/monitor.
State of the art: Conditional Random Fields (CRFs)
One of the most successful approaches to image segmentation is
the Hierarchical CRF approach.
Using potential functions, information at different scales can be
incorporated into the segmentation.
We identify three levels of scale: local, mid-level and global [Zhu,
NIPS2008].
We show how these three levels of scale can be integrated in a
way that preserves their unique characteristics.
Existing techniques apply overly-simplified models of context that
do not generalize upward from local to global scales.
Global constraints on label combinations
Our principal idea is to use global classification to enhance
segmentation results.
Global image classification results tend to be less noisy than local ones.
We will use them to constrain the combinations of semantic labels
we are likely to encounter during segmentation.
We also show how the resulting optimization problem can be
made tractable by learning to efficiently subsample label
combinations at the global level.
Outline
1 Introduction
2 Graph cuts for image segmentation
Smoothness potentials
Potts potentials
Robust $P^N$
3 The harmony potential
4 Experimental results
5 Discussion
Some terminology
We represent our segmentation problem as a graph: $\mathcal{G} = (\mathcal{V}, \mathcal{E})$.
$\mathcal{V}$ is used for indexing random variables, and $\mathcal{E}$ is the set of
undirected edges representing compatibility relationships between
random variables.
$X = \{X_i\}$ denotes the set of random variables or nodes, for $i \in \mathcal{V}$.
An energy function will be defined over graphical configurations of
random variables.
By the Hammersley-Clifford theorem, the probability of a configuration
$x = \{x_i\}$ can be written as proportional to the exponential of the
negative of an energy function, $P(x) \propto \exp(-E(x))$, with
$E(x) = \sum_{c \in C} \varphi_c(x_c)$, where $\varphi_c$ is the potential
function of clique $c \in C$.
Consistency potentials for labeling problems
The energy function of $\mathcal{G}$ can be written as:
$$E(x) = \sum_{i \in \mathcal{V}} \phi(x_i) + \sum_{(i,j) \in \mathcal{E}_L} \psi_L(x_i, x_j) + \sum_{(i,g) \in \mathcal{E}_G} \psi_G(x_i, x_g).$$
The unary term $\phi(x_i)$ depends on a single probability $P(X_i = x_i \mid O_i)$,
where $O_i$ is the observation that affects $X_i$ in the model.
The smoothness potential $\psi_L(x_i, x_j)$ determines the pairwise
relationship between two local nodes.
The consistency potential $\psi_G(x_i, x_g)$ expresses the dependency
between local nodes and a global node.
And the Maximum a Posteriori (MAP) estimate of the optimal
labeling is:
$$x^* = \arg\min_x E(x).$$
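As a rough illustration of the energy above, here is a minimal Python sketch that evaluates E(x) on a toy graph and finds the MAP labeling by brute force. Everything in it is hypothetical: the three-node chain, the unary costs in `UNARY`, and the weights 0.5 and 0.4 are made up for the example; the global node takes a single label and its own unary term is left out for brevity. Brute force is only feasible at this toy size.

```python
from itertools import product

LABELS = ("cow", "grass")
LOCAL_NODES = (0, 1, 2)
LOCAL_EDGES = ((0, 1), (1, 2))  # E_L: pairs of neighbouring regions

# Hypothetical unary costs phi(x_i); lower means better supported by the data.
UNARY = {
    (0, "cow"): 0.2, (0, "grass"): 1.0,
    (1, "cow"): 0.9, (1, "grass"): 0.3,
    (2, "cow"): 0.6, (2, "grass"): 0.5,
}


def psi_L(x_i, x_j, weight=0.5):
    """Smoothness potential: pay `weight` when neighbours disagree."""
    return weight if x_i != x_j else 0.0


def psi_G(x_i, x_g, weight=0.4):
    """Consistency potential: pay `weight` when a local label differs from the global one."""
    return weight if x_i != x_g else 0.0


def energy(x_local, x_g):
    """E(x) = unary terms + smoothness terms + consistency terms."""
    unary = sum(UNARY[(i, x_local[i])] for i in LOCAL_NODES)
    smooth = sum(psi_L(x_local[i], x_local[j]) for i, j in LOCAL_EDGES)
    consistency = sum(psi_G(x_local[i], x_g) for i in LOCAL_NODES)
    return unary + smooth + consistency


def map_estimate():
    """Brute-force arg min of E(x) over all local and global labelings."""
    candidates = (
        (dict(zip(LOCAL_NODES, labels)), x_g)
        for labels in product(LABELS, repeat=len(LOCAL_NODES))
        for x_g in LABELS
    )
    return min(candidates, key=lambda c: energy(*c))


best_local, best_global = map_estimate()
print(best_local, best_global, energy(best_local, best_global))
```

In this toy setting the pairwise and consistency terms outweigh node 1's weak preference for "grass", so the minimum-energy labeling is uniform.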
Representing semantic segmentations
Each node represents an image region
Each node takes a single label from the set of semantic categories
Smoothness: only local constraints
Adds a constraint on neighboring nodes
Usually enforces gradual (local) changes
Potts: $\psi_G(x_i, x_g) = \gamma_i^l \, T[x_i \neq x_g]$
New node enforces global consistency among local labels
Consistency with a single global label [Plath, ICML2009]
Robust $P^N$: consistency + “anything goes”
Extends the Potts potential [Kohli, CVPR2008]
“Free label” at global node allows any local combination
Outline
1 Introduction
2 Graph cuts for image segmentation
3 The harmony potential
Motivation revisited
Blowing up the problem
4 Experimental results
5 Discussion
Different features for discriminations
The previously mentioned approaches all try to make global
distinctions using local information.
Either by voting of local observations (Potts),
or by penalizing rampantly discordant local label assignments
(robust $P^N$).
None of these techniques try to exploit truly global information to
constrain local labels.
And none incorporate the notion of encoding combinations of
primitive node labels at the global level.
The harmony potential: symphony of semantics
Let $L = \{l_1, \ldots, l_M\}$ denote the set of semantic class labels from
which local nodes $X_i$ take their labels.
The global node $X_g$, instead, will take labels from $\mathcal{P}(L)$, the power
set of $L$.
In this way, we can represent any combination of primitive labels
from $L$ at the global node.
The harmony potential is now defined as:
$$\psi_G(x_i, x_g) = \gamma_i^l \, T[x_i \notin x_g].$$
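To make the difference from a Potts-style consistency term concrete, here is a minimal Python sketch of both potentials. The function names and the constant weight `gamma` are illustrative assumptions (in the definition above the weight is a per-node, per-label quantity); the global label is a single label for Potts and a subset of labels for the harmony potential.

```python
def potts_potential(x_i, x_g, gamma):
    """Penalize a local label that differs from the single global label."""
    return gamma if x_i != x_g else 0.0


def harmony_potential(x_i, x_g, gamma):
    """Penalize a local label only if it is absent from the global subset x_g."""
    return gamma if x_i not in x_g else 0.0


# Hypothetical labeling of four superpixels in one image.
local_labels = ["horse", "person", "person", "car"]

# Harmony: the global node holds a subset, so "horse" and "person" coexist for free.
global_subset = frozenset({"horse", "person"})
harmony_cost = sum(harmony_potential(x, global_subset, gamma=1.0) for x in local_labels)

# Potts: the global node holds one label, so every other class is penalized.
potts_cost = sum(potts_potential(x, "person", gamma=1.0) for x in local_labels)

print(harmony_cost, potts_cost)
```

With the subset {horse, person} only the "car" superpixel is penalized, whereas the single-label version also penalizes "horse".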
The harmony potential: selective subsets
Only labels that do not agree with the subset are penalized.
Can represent more diverse combinations.
Potentials: the gory details
The unary potential of the local nodes is:
$$\phi_L(x_i) = -\mu_L K_i \omega_L(x_i) \log P(X_i = x_i \mid O_i),$$
where $\mu_L$ is the weighting factor of the local unary potential, $K_i$
normalizes over the number of pixels inside superpixel $i$, and
$\omega_L(x_i)$ is a learned per-class normalization.
$P(X_i = x_i \mid O_i)$ is the classification score given an observed
representation $O_i$ of the region, which is based on a bag-of-words
built from features of superpixel $i$ and those superpixels adjacent
to it.
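A minimal sketch of the local unary potential as a Python function may help in reading the formula. The parameter names mirror the symbols above; the default values and the small `eps` clamp that guards against log(0) are assumptions added for the example, not part of the slides.

```python
import math


def local_unary_potential(prob, mu_L=1.0, K_i=1.0, omega_L=1.0, eps=1e-12):
    """phi_L(x_i) = -mu_L * K_i * omega_L(x_i) * log P(X_i = x_i | O_i).

    prob    -- classification score P(X_i = x_i | O_i) for the candidate label
    mu_L    -- weight of the local unary term
    K_i     -- normalization for the number of pixels in superpixel i
    omega_L -- per-class normalization for the candidate label
    eps     -- assumed lower clamp so that log(0) never occurs
    """
    return -mu_L * K_i * omega_L * math.log(max(prob, eps))


# A confidently classified label costs little; an unlikely one costs a lot.
print(local_unary_potential(0.9), local_unary_potential(0.1))
```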
More potentials
The global unary potential is defined as:
$$\phi_G(x_g) = -\mu_G \omega_G(x_g) \log P(X_g = x_g \mid O_g),$$
where $\mu_G$ is the weighting factor of the global unary potential, and
$\omega_G(x_g)$ is again a per-class normalization like the one used in the
local unary potential.
The main difference comes in the computation of $P(X_g = x_g \mid O_g)$,
which is the posterior:
$$P(X_g = x_g \mid O_g) \propto P(O_g \mid X_g = x_g) \, P(X_g = x_g).$$
Holy crap that’s a lot of labels!
We have turned a barely tractable optimization problem into a
(seemingly) spectacularly intractable one.
To optimize the energy function, we must optimize over $2^{|L|}$
possible global node labels.
If we had an analytic form for $P(\ell = x_g^* \mid O)$ we might be able to do
something.
We don’t. Instead, we will use the probability that a certain label
$\ell \in \mathcal{P}(L)$ appears in $x^*$, given all the observations $O$ required by
the model.
Ranked subsampling of P(L)
We can do this using the following posterior:
$$P(\ell \subseteq x_g^* \mid O) \propto P(\ell \subseteq x_g^*) \, P(O \mid \ell \subseteq x_g^*).$$
This allows us to effectively rank possible global node labels, and
thus to prioritize candidates in the search for the optimal label $x_g^*$.
$P(\ell \subseteq x_g^* \mid O)$ establishes an order on subsets of the (unknown)
optimal labeling of the global node $x_g^*$ that guides the
consideration of global labels.
We may not be able to exhaustively consider all labels in $\mathcal{P}(L)$, but
at least we consider the most likely candidates for $x_g^*$.
And image classification can give us an estimate of this posterior.
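One way to picture ranked subsampling is the sketch below, which lists the k most probable label subsets without enumerating the whole power set. It is not the authors' procedure: it assumes, purely for illustration, that each class is present independently with a probability taken from an image classifier, so a subset's score is a product of per-class terms. The function name `top_k_label_subsets`, the `eps` clamp and the example probabilities are all hypothetical.

```python
import heapq
import math


def top_k_label_subsets(class_probs, k, eps=1e-9):
    """Return up to k label subsets in decreasing probability.

    Assumes independent per-class presence probabilities. The most probable
    subset holds every class with p > 0.5; each other subset is reached by
    flipping some classes, at a log-probability cost of |log-odds| per flip.
    Flip sets are explored cheapest-first with a heap.
    """
    labels = sorted(class_probs)
    log_odds = {}
    for label in labels:
        p = min(max(class_probs[label], eps), 1.0 - eps)
        log_odds[label] = math.log(p) - math.log(1.0 - p)

    best = frozenset(l for l in labels if log_odds[l] > 0)
    order = sorted(labels, key=lambda l: abs(log_odds[l]))  # cheapest flips first
    cost = [abs(log_odds[l]) for l in order]

    ranked = [best]
    heap = [(cost[0], 0, (0,))] if order else []
    while heap and len(ranked) < k:
        total, last, flips = heapq.heappop(heap)
        ranked.append(best.symmetric_difference(order[i] for i in flips))
        nxt = last + 1
        if nxt < len(order):
            # Either flip one more class, or swap the last flip for the next one.
            heapq.heappush(heap, (total + cost[nxt], nxt, flips + (nxt,)))
            heapq.heappush(
                heap, (total - cost[last] + cost[nxt], nxt, flips[:-1] + (nxt,))
            )
    return ranked[:k]


# Hypothetical classifier outputs for one image.
example_probs = {"person": 0.9, "horse": 0.6, "car": 0.2}
for subset in top_k_label_subsets(example_probs, 3):
    print(sorted(subset))
```

Because the heap expands only as many candidates as are requested, the cost grows with k rather than with the size of the power set, which is the point of ranking before searching.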
Outline
1 Introduction
2 Graph cuts for image segmentation
3 The harmony potential
4 Experimental results
Datasets and implementation
Results: Pascal VOC 2009
Results: MSRC-21
5 Discussion
Datasets
We have evaluated the harmony potential approach on two
standard, publicly available datasets.
The Pascal VOC 2009 Segmentation Challenge dataset contains
2250 color images of 20 different semantic classes.
This set is split into 750 images for training, 750 images for
testing, and 750 for validation.
The Microsoft MSRC-21 dataset contains 591 color images of 21
object classes.
We do our own splits for cross-validation on MSRC-21.
Unsupervised segmentation
Images are first over-segmented with quick-shift to derive
super-pixels [Fulkerson, ICCV 2009].
This preserves object boundaries while simplifying the
representation.
Working at the super-pixel level reduces the number of nodes in
the CRF by $10^2$ to $10^5$ per image.
Local classification scores: P(Xi = xi|Oi)
We extract patches with 50% overlap on a regular grid at several
resolutions (12, 24, 36 and 48 pixels in diameter).
Patches are described with SIFT, color and, for MSRC-21, location
features.
A vocabulary is constructed using k-means to quantize to 1000
SIFT words and 400 color words.
An SVM classifier using an intersection kernel is built for each
semantic category.
A similar number of positive and negative examples are used:
a total of around 8,000 superpixel samples per class for MSRC-21,
and 20,000 for VOC 2009.
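For readers unfamiliar with the intersection kernel used by the local SVMs, here is a minimal sketch of the standard histogram intersection kernel applied to two bag-of-words histograms. The function name and the toy three-bin histograms are hypothetical; the real histograms would have one bin per visual word.

```python
def intersection_kernel(h1, h2):
    """Histogram intersection kernel: K(h1, h2) = sum_k min(h1[k], h2[k])."""
    if len(h1) != len(h2):
        raise ValueError("histograms must have the same number of bins")
    return sum(min(a, b) for a, b in zip(h1, h2))


# Two toy normalized bag-of-words histograms over three visual words.
hist_a = [0.5, 0.3, 0.2]
hist_b = [0.2, 0.3, 0.5]
print(intersection_kernel(hist_a, hist_b))
```

For normalized histograms the kernel value lies between 0 (no shared mass) and 1 (identical histograms).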
Global classification scores: P(Xg = xg|Og)
For the Pascal 2009 dataset we use our entry to the 2009 VOC
Classification Challenge
[Khan, PAMI2010 (submitted)].
It uses a bag-of-words representation based on SIFT and color
SIFT, plus spatial pyramids and color attention
[Khan, ICCV 2009].
An SVM classifier with a $\chi^2$ kernel is trained for each semantic
category in the dataset.
SVM outputs are re-normalized to generate an estimate of the
global label: P(Xg = xg|Og).
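The slides do not say which form of the χ² kernel is used. As an illustration only, the sketch below implements one common variant, the exponential χ² kernel; the function name and the bandwidth parameter `gamma` are assumptions.

```python
import math


def chi2_kernel(h1, h2, gamma=1.0):
    """Exponential chi-squared kernel (one common variant):

    K(h1, h2) = exp(-gamma * sum_k (h1[k] - h2[k])**2 / (h1[k] + h2[k])),
    where bins with h1[k] + h2[k] == 0 are skipped.
    """
    if len(h1) != len(h2):
        raise ValueError("histograms must have the same number of bins")
    distance = sum(
        (a - b) ** 2 / (a + b) for a, b in zip(h1, h2) if a + b > 0
    )
    return math.exp(-gamma * distance)


# Identical histograms give the maximum value; disjoint ones give a small value.
print(chi2_kernel([0.5, 0.5], [0.5, 0.5]), chi2_kernel([1.0, 0.0], [0.0, 1.0]))
```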
MAP inference
The optimal MAP label configuration x∗ is inferred using
α-expansion graph cuts [Kolmogorov, PAMI2004].
The global node uses the 100 most probable label subsets
obtained from ranked subsampling.
No significant improvements were observed by considering more
than 100 label subsets.
The average time to do MAP inference for an image in MSRC-21
is 0.24 seconds and in VOC 2009 is 0.32 seconds.
Cross-validation of CRF parameters
For MSRC-21 we learn the CRF parameters with a 5-fold
cross-validation over the union of training and validation sets.
If we only use the validation set of 59 images, we overfit to this
small set.
For VOC 2009, we used the available validation set to train CRF
parameters.
Since the background class always appears in combination with
other classes, we do not allow the harmony potential to apply any
penalization to the background class.
Qualitative results
Qualitative results (II)
Quantitative results
Class          BONN   BROOKES   Harmony potential
Background     83.9   79.6      80.5
Aeroplane      64.3   48.3      62.3
Bicycle        21.8    6.7      24.1
Bird           21.7   19.1      28.3
Boat           32.0   10.0      30.5
Bottle         40.2   16.6      32.7
Bus            57.3   32.7      42.2
Car            49.4   38.1      48.1
Cat            38.8   25.3      22.8
Chair           5.2    5.5       9.1
Cow            28.5    9.4      30.1
DiningTable    22.0   25.1       7.9
Dog            19.6   13.3      21.5
Horse          33.6   12.3      41.9
Motorbike      45.5   35.5      49.6
Person         33.6   20.7      31.5
PottedPlant    27.3   13.4      26.1
Sheep          40.4   17.1      37.0
Sofa           18.1   18.4      20.1
Train          33.6   37.5      39.4
TV/Monitor     46.1   36.4      31.1
Average        36.3   24.8      34.1
Qualitative results
Outline
1 Introduction
2 Graph cuts for image segmentation
3 The harmony potential
4 Experimental results
5 Discussion
Computational considerations
The future
Reflections
A modest cluster proposal
4 Dell R610i 1U Rack Servers
Each with: 2x Intel Xeon E5502 Quad Core CPUs
Each with: 24GB RAM
Each with: 4x Broadcom 10Gb Ethernet adapters
Each with: 1x 160GB 7.2K RPM Disk
Two units with: PERC 6/i SAS RAID Controller
One unit with: 5x 300GB 10K RPM Disk
Organizing computations
Some (mostly meaningless) numbers
Days of Pascal challenge: 45
Seconds of computation: 3,888,000.00
Estimated GFLOPS: 307.2
Sustained CPU utilization: 80%
Total GFLOP: 955,514,880.00
Images: 15,000
Pixels (assuming 640 × 480): 4,608,000,000.00
GFLOP/Image: 63,700.99
GFLOP/Pixel: 0.21
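The figures on this slide follow from one another by simple arithmetic. The short sketch below recomputes them from the inputs given above (days, estimated GFLOPS, utilization, image count and image size); the variable names are illustrative.

```python
DAYS = 45
SECONDS_PER_DAY = 24 * 60 * 60
ESTIMATED_GFLOPS = 307.2      # estimated rate quoted on the slide
UTILIZATION = 0.80            # sustained CPU utilization
IMAGES = 15_000
PIXELS_PER_IMAGE = 640 * 480  # image size assumed on the slide

seconds = DAYS * SECONDS_PER_DAY
total_gflop = seconds * ESTIMATED_GFLOPS * UTILIZATION
total_pixels = IMAGES * PIXELS_PER_IMAGE
gflop_per_image = total_gflop / IMAGES
gflop_per_pixel = total_gflop / total_pixels

print(f"Seconds of computation: {seconds:,}")
print(f"Total GFLOP: {total_gflop:,.2f}")
print(f"Pixels: {total_pixels:,}")
print(f"GFLOP/Image: {gflop_per_image:,.2f}")
print(f"GFLOP/Pixel: {gflop_per_pixel:.2f}")
```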
Conclusions
The harmony potential works well for fusing global information into
local segmentations.
It works by modeling global observations as subsets of the local
label set.
Ranked sub-sampling, driven by the same posterior as used to
define the global potential function, renders the optimization
problem tractable.
The harmony potential achieves state-of-the-art results on difficult,
publicly available datasets.
Most useful when multiple semantic classes co-occur frequently.
Prospectus
Semantic image segmentation has come a long way, but still has a
long way to go.
Segmentation will become a mainstream event in Pascal VOC 2010.
We have shown that combining global information with local can
be tractable and improves on the state of the art.
Currently, combining mid-level information is where the game is
being played.
Detection is probably the key.
We can also begin to think about what types of new applications
are enabled by such combinations.
Final words
Semantic image segmentation is hard.
Participating in a competition like the Pascal VOC is very hard.
But, it brings many technologies and people and groups and ideas
together.
Xavier Pep Fahad
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
 
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
 

The harmony potential: fusing local and global information for semantic image segmentation

  • 1. The harmony potential: fusing local and global information for semantic image segmentation. Andrew D. Bagdanov (bagdanov@cvc.uab.es), Departamento de Ciencias de la Computación, Universidad Autónoma de Barcelona. CVPR 2010 (to appear). J. Gonfaus, X. Boix, J. van de Weijer, J. Serrat, J. González.
  • 2. Outline: 1 Introduction; 2 Graph cuts for image segmentation; 3 The harmony potential; 4 Experimental results; 5 Discussion.
  • 3. Outline: 1 Introduction (Semantic image segmentation; Semantic categories; Our main idea); 2 Graph cuts for image segmentation; 3 The harmony potential; 4 Experimental results; 5 Discussion.
  • 4. Giving semantics to pixels. (Figure panels: Image, Object, Class.) Semantic image segmentation is not object segmentation. Only for simple cases are they the same.
  • 5. Turning a hard problem into a harder one. (Figure panels: Image, Object, Class.) The goal is to assign a semantic label to every pixel. Fine distinctions must be made.
  • 6. Make that a very hard one. (Figure panels: Image, Object, Class.) The goal is to assign a semantic label to every pixel. Fine distinctions must be made. Occlusions and varying viewpoint and size complicate things.
  • 7. Semantic categories. The 20 semantic categories for Pascal: aeroplane, bicycle, bird, boat, bottle, bus, car, cat, chair, cow, diningtable, dog, horse, motorbike, person, potted plant, sheep, sofa, train, and tv/monitor.
  • 8. State of the art: Conditional Random Fields (CRFs). One of the most successful approaches to image segmentation is the hierarchical CRF. Using potential functions, information at different scales can be incorporated into the segmentation. We identify three levels of scale: local, mid-level and global [Zhu, NIPS2008]. We show how these three levels of scale can be integrated in a way that preserves their unique characteristics. Existing techniques apply overly simplified models of context that do not generalize upward from local to global scales.
  • 9. Global constraints on label combinations. Our principal idea is to use global classification to enhance segmentation results. Global image classification results tend to be less noisy than local ones. We use them to constrain the combinations of semantic labels we are likely to encounter during segmentation. We also show how the resulting optimization problem can be made tractable by learning to efficiently subsample label combinations at the global level.
  • 10. Outline: 1 Introduction; 2 Graph cuts for image segmentation (Smoothness potentials; Potts potentials; Robust $P^N$); 3 The harmony potential; 4 Experimental results; 5 Discussion.
  • 11. Some terminology. We represent our segmentation problem as a graph $G = (V, E)$. $V$ indexes the random variables, and $E$ is the set of undirected edges representing compatibility relationships between random variables. $X = \{X_i\}$ denotes the set of random variables or nodes, for $i \in V$. An energy function is defined over configurations of the random variables. By the Hammersley-Clifford theorem, the probability of a configuration $x = \{x_i\}$ can be written as the negative exponential of an energy function, $P(x) \propto \exp(-E(x))$, with $E(x) = \sum_{c \in C} \varphi_c(x_c)$, where $\varphi_c$ is the potential function of clique $c \in C$.
  • 12. Consistency potentials for labeling problems. The energy function of $G$ can be written as: $E(x) = \sum_{i \in V} \phi(x_i) + \sum_{(i,j) \in E_L} \psi_L(x_i, x_j) + \sum_{(i,g) \in E_G} \psi_G(x_i, x_g)$. The unary term $\phi(x_i)$ depends on a single probability $P(X_i = x_i \mid O_i)$, where $O_i$ is the observation that affects $X_i$ in the model. The smoothness potential $\psi_L(x_i, x_j)$ determines the pairwise relationship between two local nodes. The consistency potential $\psi_G(x_i, x_g)$ expresses the dependency between local nodes and a global node. The maximum a posteriori (MAP) estimate of the optimal labeling is: $x^* = \arg\min_x E(x)$.
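As a minimal sketch of the energy above, the following Python evaluates E(x) on a toy graph and finds the MAP labeling by exhaustive search. The function names, the toy costs, the omission of a unary term for the global node, and the assumption that every local node is linked to a single global node are all illustrative choices, not taken from the slides. Exhaustive search stands in for a real solver and is only feasible for tiny graphs.

```python
from itertools import product

def total_energy(x, x_g, unary, local_edges, psi_local, psi_global):
    """E(x) = sum_i phi(x_i) + sum_(i,j) psi_L(x_i, x_j) + sum_i psi_G(x_i, x_g)."""
    e = sum(unary[i][x_i] for i, x_i in enumerate(x))        # unary terms
    e += sum(psi_local(x[i], x[j]) for i, j in local_edges)   # smoothness terms
    e += sum(psi_global(x_i, x_g) for x_i in x)               # consistency terms
    return e

def brute_force_map(n_labels, unary, local_edges, psi_local, psi_global):
    """Return (energy, local labels, global label) minimizing E by enumeration."""
    best = None
    for x in product(range(n_labels), repeat=len(unary)):
        for x_g in range(n_labels):
            e = total_energy(x, x_g, unary, local_edges, psi_local, psi_global)
            if best is None or e < best[0]:
                best = (e, x, x_g)
    return best

# Hypothetical toy problem: 3 local nodes in a chain, 2 labels.
unary = [[0.0, 2.0], [1.0, 0.8], [2.0, 0.0]]       # unary[i][label] = cost
edges = [(0, 1), (1, 2)]                           # local (smoothness) edges
smooth = lambda a, b: 0.5 if a != b else 0.0       # pairwise smoothness cost
consistency = lambda a, g: 0.3 if a != g else 0.0  # local/global disagreement cost

energy, labels, global_label = brute_force_map(2, unary, edges, smooth, consistency)
```

In this toy setup the middle node follows its cheaper unary label, and the global node sides with the majority of the local labels.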
  • 13. Representing semantic segmentations. Each node represents an image region. Each node takes a single label from the set of semantic categories.
  • 14. Smoothness: only local constraints. Adds a constraint between neighboring nodes. Usually enforces gradual (local) changes.
  • 15. Potts: $\psi_G(x_i, x_g) = \gamma_i^{l}\, T[x_i \neq x_g]$. A new node enforces global consistency among local labels. Consistency with a single global label [Plath, ICML2009].
  • 16. Robust $P^N$: consistency + "anything goes". Extends the Potts potential [Kohli, CVPR2008]. A "free label" at the global node allows any combination of local labels.
  • 17. Outline: 1 Introduction; 2 Graph cuts for image segmentation; 3 The harmony potential (Motivation revisited; Blowing up the problem); 4 Experimental results; 5 Discussion.
  • 18. Different features for discriminations. The previously mentioned approaches all try to make global distinctions using local information: either by voting of local observations (Potts), or by penalizing rampantly discordant local label assignments (robust $P^N$). None of these techniques tries to exploit truly global information to constrain local labels. And none incorporates the notion of encoding combinations of primitive node labels at the global level.
  • 19. The harmony potential: symphony of semantics. Let $L = \{l_1, \ldots, l_M\}$ denote the set of semantic class labels from which local nodes $X_i$ take their labels. The global node $X_g$, instead, takes labels from $P(L)$, the power set of $L$. In this way, we can represent any combination of primitive labels from $L$ at the global node. The harmony potential is now defined as: $\psi_G(x_i, x_g) = \gamma_i^{l}\, T[x_i \notin x_g]$.
  • 20. The harmony potential: selective subsets. Only labels that do not agree with the subset are penalized. Can represent more diverse combinations.
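The difference between a single-label consistency term and the harmony potential can be sketched in a few lines of Python. This reads $T[\cdot]$ as an indicator function and collapses $\gamma_i^{l}$ to one scalar `gamma`; both are assumptions made for illustration, and the class names and function names are hypothetical.

```python
from itertools import combinations

def power_set(labels):
    """All subsets of `labels`, i.e. the label space P(L) of the global node."""
    items = list(labels)
    return [frozenset(c) for r in range(len(items) + 1)
            for c in combinations(items, r)]

def potts_consistency(x_i, x_g, gamma):
    """Potts-style consistency: penalize x_i unless it equals the single global label."""
    return gamma if x_i != x_g else 0.0

def harmony_potential(x_i, x_g, gamma):
    """Harmony potential: penalize x_i only if it is absent from the global subset x_g."""
    return gamma if x_i not in x_g else 0.0

L = ["cat", "dog", "sofa"]
global_labels = power_set(L)               # 2**3 = 8 candidate global labels
x_g = frozenset({"cat", "sofa"})           # global node says: cat and sofa are present
local = ["cat", "sofa", "sofa", "dog"]     # labels of four local nodes

harmony_cost = sum(harmony_potential(x, x_g, gamma=1.0) for x in local)
potts_cost = sum(potts_consistency(x, "cat", gamma=1.0) for x in local)
```

With the subset {cat, sofa} at the global node, only the "dog" node is penalized, whereas a single global label "cat" penalizes every node that is not "cat".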
  • 21. Potentials: the gory details. The unary potential of the local nodes is: $\phi_L(x_i) = -\mu_L K_i\, \omega_L(x_i) \log P(X_i = x_i \mid O_i)$, where $\mu_L$ is the weighting factor of the local unary potential, $K_i$ normalizes over the number of pixels inside superpixel $i$, and $\omega_L(x_i)$ is a learned per-class normalization. $P(X_i = x_i \mid O_i)$ is the classification score given an observed representation $O_i$ of the region, which is based on a bag-of-words built from features of superpixel $i$ and those superpixels adjacent to it.
  • 22. More potentials. The global unary potential is defined as: $\phi_G(x_g) = -\mu_G\, \omega_G(x_g) \log P(X_g = x_g \mid O_g)$, where $\mu_G$ is the weighting factor of the global unary potential, and $\omega_G(x_g)$ is again a per-class normalization like the one used in the local unary potential. The main difference comes in the computation of $P(X_g = x_g \mid O_g)$, which is the posterior: $P(X_g = x_g \mid O_g) \propto P(O_g \mid X_g = x_g)\, P(X_g = x_g)$.
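The two unary potentials are direct to transcribe. The sketch below assumes all weights and probabilities are supplied as plain floats; the function names and the toy likelihood and prior values are hypothetical, and the explicit normalization in `posterior` is just one way to turn the proportionality into probabilities.

```python
import math

def local_unary(p_local, mu_l, k_i, omega_l):
    """phi_L(x_i) = -mu_L * K_i * omega_L(x_i) * log P(X_i = x_i | O_i)."""
    return -mu_l * k_i * omega_l * math.log(p_local)

def global_unary(p_global, mu_g, omega_g):
    """phi_G(x_g) = -mu_G * omega_G(x_g) * log P(X_g = x_g | O_g)."""
    return -mu_g * omega_g * math.log(p_global)

def posterior(likelihoods, priors):
    """Normalize likelihood * prior over the candidate global labels."""
    joint = {k: likelihoods[k] * priors[k] for k in likelihoods}
    z = sum(joint.values())
    return {k: v / z for k, v in joint.items()}

# Hypothetical numbers for two candidate global labels.
cat_only = frozenset({"cat"})
cat_sofa = frozenset({"cat", "sofa"})
post = posterior({cat_only: 0.2, cat_sofa: 0.6}, {cat_only: 0.5, cat_sofa: 0.5})

cost_cat_only = global_unary(post[cat_only], mu_g=1.0, omega_g=1.0)
cost_cat_sofa = global_unary(post[cat_sofa], mu_g=1.0, omega_g=1.0)
```

A more probable label gets a lower energy, so the minimizer of E(x) prefers it unless the other potentials disagree.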
  • 23. Holy crap that's a lot of labels! We have turned a barely tractable optimization problem into a (seemingly) spectacularly intractable one. To optimize the energy function, we must optimize over $2^{|L|}$ possible global node labels. If we had an analytic form for $P(\ell = x_g^* \mid O)$ we might be able to do something. We don't. Instead, we will use the probability that a certain label $\ell \in P(L)$ appears in $x_g^*$, given all the observations $O$ required by the model.
  • 24. Ranked subsampling of $P(L)$. We can do this using the following posterior: $P(\ell \subseteq x_g^* \mid O) \propto P(\ell \subseteq x_g^*)\, P(O \mid \ell \subseteq x_g^*)$. This allows us to effectively rank possible global node labels, and thus to prioritize candidates in the search for the optimal label $x_g^*$. $P(\ell \subseteq x_g^* \mid O)$ establishes an order on subsets of the (unknown) optimal labeling of the global node $x_g^*$ that guides the consideration of global labels. We may not be able to exhaustively consider all labels in $P(L)$, but at least we consider the most likely candidates for $x_g^*$. And image classification can give us an estimate of this posterior.
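To give a feel for ranked subsampling: with $|L| = 20$ there are $2^{20} = 1{,}048{,}576$ candidate global labels, so enumerating them all inside the optimizer is unattractive. The sketch below is not the authors' estimator. It assumes, purely for illustration, that an image classifier gives an independent presence probability per class, so that the score of a subset is the product of its members' probabilities. Under that assumption a best-first search yields the top-k subsets without enumerating the power set. All names and numbers are hypothetical.

```python
import heapq

def top_k_subsets(presence_prob, k):
    """Return the k highest-scoring label subsets as (subset, score) pairs.

    Score of a subset = product of presence_prob over its labels
    (independence is an illustrative assumption). Because every factor is
    at most 1, extending a subset never raises its score, so best-first
    expansion from the empty set pops subsets in non-increasing score order.
    """
    labels = sorted(presence_prob, key=presence_prob.get, reverse=True)
    heap = [(-1.0, ())]                  # (negated score, increasing index tuple)
    ranked = []
    while heap and len(ranked) < k:
        neg_score, idxs = heapq.heappop(heap)
        ranked.append((frozenset(labels[i] for i in idxs), -neg_score))
        start = idxs[-1] + 1 if idxs else 0
        # Children append a strictly larger index, so each subset is generated once.
        for j in range(start, len(labels)):
            heapq.heappush(heap, (neg_score * presence_prob[labels[j]], idxs + (j,)))
    return ranked

probs = {"cat": 0.9, "sofa": 0.6, "dog": 0.1}    # hypothetical classifier outputs
candidates = top_k_subsets(probs, k=4)
```

The returned candidates would then serve as the restricted label set of the global node during inference.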
  • 25. Outline: 1 Introduction; 2 Graph cuts for image segmentation; 3 The harmony potential; 4 Experimental results (Datasets and implementation; Results: Pascal VOC 2009; Results: MSRC-21); 5 Discussion.
  • 26. Datasets. We have evaluated the harmony potential approach on two standard, publicly available datasets. The Pascal VOC 2009 Segmentation Challenge dataset contains 2250 color images of 20 different semantic classes. This set is split into 750 images for training, 750 for testing, and 750 for validation. The Microsoft MSRC-21 dataset contains 591 color images of 21 object classes. We create our own splits for cross-validation on MSRC-21.
  • 27. Unsupervised segmentation. Images are first over-segmented with quick shift to derive superpixels [Fulkerson, ICCV 2009]. This preserves object boundaries while simplifying the representation. Working at the superpixel level reduces the number of nodes in the CRF by $10^2$ to $10^5$ per image.
  • 28. Local classification scores: $P(X_i = x_i \mid O_i)$. We extract patches with 50% overlap on a regular grid at several resolutions (12, 24, 36 and 48 pixels in diameter). Patches are described with SIFT, color and, for MSRC-21, location features. A vocabulary is constructed using k-means to quantize to 1000 SIFT words and 400 color words. An SVM classifier using an intersection kernel is built for each semantic category. A similar number of positive and negative examples is used: a total of around 8,000 superpixel samples for MSRC-21, and 20,000 for VOC 2009, for each class.
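Two ingredients of the local classifier are simple enough to sketch: the overlapping patch grid and the histogram intersection kernel. The code below treats the patch "diameter" as the side of a square patch and uses a 96 x 96 image, both assumptions for illustration. Feature extraction, k-means quantization, and SVM training are left out, and the function names are hypothetical.

```python
def patch_grid(width, height, size):
    """Top-left corners of size x size patches on a regular grid with 50% overlap."""
    step = size // 2                        # stepping half a patch gives 50% overlap
    return [(x, y)
            for y in range(0, height - size + 1, step)
            for x in range(0, width - size + 1, step)]

def intersection_kernel(h1, h2):
    """Histogram intersection kernel: sum of bin-wise minima of two histograms."""
    return sum(min(a, b) for a, b in zip(h1, h2))

scales = [12, 24, 36, 48]                   # patch sizes from the slide
patches = {s: patch_grid(96, 96, s) for s in scales}

# Two toy bag-of-words histograms over a 3-word vocabulary.
similarity = intersection_kernel([0.5, 0.5, 0.0], [0.2, 0.3, 0.5])
```

The kernel value is largest when two histograms put their mass in the same bins, which is why it suits bag-of-words descriptors.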
  • 29. Global classification scores: $P(X_g = x_g \mid O_g)$. For the Pascal 2009 dataset we use our entry to the 2009 VOC Classification Challenge [Khan, PAMI2010 (submitted)]. It uses a bag-of-words representation based on SIFT and color SIFT, plus spatial pyramids and color attention [Khan, ICCV 2009]. An SVM classifier with a $\chi^2$ kernel is trained for each semantic category in the dataset. SVM outputs are re-normalized to generate an estimate of the global label: $P(X_g = x_g \mid O_g)$.
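The slide does not say which form of the $\chi^2$ kernel is used. As a hedged illustration, here is one common form, the exponential $\chi^2$ kernel, written in plain Python. The `gamma` bandwidth, the function names, and the choice of this variant (conventions also differ by a constant factor in the distance) are assumptions.

```python
import math

def chi2_distance(h1, h2):
    """Chi-squared distance between two non-negative histograms.

    Bins where both histograms are zero contribute nothing and are skipped
    to avoid dividing by zero.
    """
    return sum((a - b) ** 2 / (a + b) for a, b in zip(h1, h2) if a + b > 0)

def chi2_kernel(h1, h2, gamma=1.0):
    """Exponential chi-squared kernel: exp(-gamma * chi2_distance)."""
    return math.exp(-gamma * chi2_distance(h1, h2))

h_a = [0.5, 0.5, 0.0]
h_b = [0.4, 0.4, 0.2]
k_same = chi2_kernel(h_a, h_a)    # identical histograms
k_diff = chi2_kernel(h_a, h_b)    # differing histograms
```

Identical histograms give the maximum kernel value, and the value decays as the histograms diverge.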
  • 30. MAP inference. The optimal MAP label configuration $x^*$ is inferred using α-expansion graph cuts [Kolmogorov, PAMI2004]. The global node uses the 100 most probable label subsets obtained from ranked subsampling. No significant improvements were observed by considering more than 100 label subsets. The average time to do MAP inference for an image is 0.24 seconds in MSRC-21 and 0.32 seconds in VOC 2009.
  • 31. Cross-validation of CRF parameters. For MSRC-21 we learn the CRF parameters with 5-fold cross-validation on the union of the training and validation sets. If we only use the validation set of 59 images, we overfit to this small set. For VOC 2009, we used the available validation set to train CRF parameters. Since the background class always appears in combination with other classes, we do not allow the harmony potential to apply any penalization to the background class.
  • 32. Qualitative results. (Figure.)
  • 33. Qualitative results (II). (Figure.)
  • 34. Quantitative results (per-class scores and their average).
      Class          BONN    BROOKES   Harmony potential
      Background     83.9    79.6      80.5
      Aeroplane      64.3    48.3      62.3
      Bicycle        21.8     6.7      24.1
      Bird           21.7    19.1      28.3
      Boat           32.0    10.0      30.5
      Bottle         40.2    16.6      32.7
      Bus            57.3    32.7      42.2
      Car            49.4    38.1      48.1
      Cat            38.8    25.3      22.8
      Chair           5.2     5.5       9.1
      Cow            28.5     9.4      30.1
      DiningTable    22.0    25.1       7.9
      Dog            19.6    13.3      21.5
      Horse          33.6    12.3      41.9
      Motorbike      45.5    35.5      49.6
      Person         33.6    20.7      31.5
      PottedPlant    27.3    13.4      26.1
      Sheep          40.4    17.1      37.0
      Sofa           18.1    18.4      20.1
      Train          33.6    37.5      39.4
      TV/Monitor     46.1    36.4      31.1
      Average        36.3    24.8      34.1
  • 35. Qualitative results. (Figure.)
  • 36. Outline: 1 Introduction; 2 Graph cuts for image segmentation; 3 The harmony potential; 4 Experimental results; 5 Discussion (Computational considerations; The future; Reflections).
  • 37. A modest cluster proposal. 4 Dell R610i 1U rack servers, each with: 2x Intel Xeon E5502 Quad Core CPUs; 24GB RAM; 4x Broadcom 10Gb Ethernet adapters; 1x 160GB 7.2K RPM disk. Two units with: PERC 6/i SAS RAID controller. One unit with: 5x 300GB 10K RPM disks.
  • 38. Organizing computations. (Figure.)
  • 39. Some (mostly meaningless) numbers. Days of Pascal challenge: 45. Seconds of computation: 3,888,000. Estimated GFLOPS: 307.2. Sustained CPU utilization: 80%. Total GFLOP: 955,514,880. Images: 15,000. Pixels (assuming 640 × 480): 4,608,000,000. GFLOP/image: 63,700.99. GFLOP/pixel: 0.21.
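The figures on this slide follow from one another by simple arithmetic, which the short Python check below reproduces. The variable names are illustrative; the inputs (45 days, 307.2 GFLOPS, 80% utilization, 15,000 images at 640 × 480) are the slide's own.

```python
days = 45
seconds = days * 24 * 60 * 60                # 45 days of wall-clock time
peak_gflops = 307.2                          # estimated cluster throughput
utilization = 0.80                           # sustained CPU utilization

total_gflop = seconds * peak_gflops * utilization
images = 15_000
pixels = images * 640 * 480                  # assuming 640 x 480 images

gflop_per_image = total_gflop / images
gflop_per_pixel = total_gflop / pixels

print(f"seconds:      {seconds:,}")
print(f"total GFLOP:  {total_gflop:,.2f}")
print(f"pixels:       {pixels:,}")
print(f"GFLOP/image:  {gflop_per_image:,.2f}")
print(f"GFLOP/pixel:  {gflop_per_pixel:.2f}")
```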
  • 40. Conclusions. The harmony potential works well for fusing global information into local segmentations. It works by modeling global observations as subsets of the local label set. Ranked subsampling, driven by the same posterior as used to define the global potential function, renders the optimization problem tractable. The harmony potential gets state-of-the-art results on difficult, publicly available datasets. It is most useful when multiple semantic classes co-occur frequently.
  • 41. Prospectus. Semantic image segmentation has come a long way, but still has a long way to go. Segmentation will become a mainstream event in Pascal VOC 2010. We have shown that combining global information with local information can be tractable and improves on the state of the art. Currently, combining mid-level information is where the game is being played. Detection is probably the key. We can also begin to think about what types of new applications are enabled by such combinations.
  • 42. Final words. Semantic image segmentation is hard. Participating in a competition like the Pascal VOC is very hard. But it brings many technologies and people and groups and ideas together. (Photos: Xavier, Pep, Fahad.)