Mixture Models for Image Analysis
1. Mixture Models for Image Analysis
Aristidis Likas & Christophoros Nikou
IPAN Research Group
Department of Computer Science
University of Ioannina
2. Collaborators:
Nikolaos Galatsanos, Professor
Konstantinos Blekas, Assistant Professor
Dr. Costas Constantinopoulos, Researcher
George Sfikas, Ph.D. Candidate
Demetrios Gerogiannis, Ph.D. Candidate
3. Outline
• Mixture Models and EM (GMM, SMM)
• Bayesian GMMs
• Image segmentation using mixture models
– Incremental Bayesian GMMs
– Spatially varying GMMs (SVMMs) with MRF priors
– SVMMs and line processes
• Image registration using mixture models
4. Mixture Models
• Probability density estimation: estimate the density function
model f(x) that generated a given dataset X={x1,…, xN}
• Mixture Models
f(x) = Σ_{j=1}^{M} π_j φ_j(x; θ_j),   π_j ≥ 0,   Σ_{j=1}^{M} π_j = 1
– M pdf components φj(x; θj)
– mixing weights: π1, π2, …, πM (priors)
• Gaussian Mixture Model (GMM): φj = N(μj, Σj)
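A minimal Python sketch of evaluating this density (NumPy/SciPy assumed; the two-component parameters below are purely illustrative):

# Minimal GMM density sketch: f(x) = sum_j pi_j * N(x; mu_j, Sigma_j)
import numpy as np
from scipy.stats import multivariate_normal

def gmm_density(x, weights, means, covs):
    return sum(w * multivariate_normal.pdf(x, mean=m, cov=c)
               for w, m, c in zip(weights, means, covs))

# Illustrative two-component GMM in 2-D
weights = [0.3, 0.7]
means = [np.zeros(2), np.array([3.0, 3.0])]
covs = [np.eye(2), 2.0 * np.eye(2)]
print(gmm_density(np.array([1.0, 1.0]), weights, means, covs))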
6. GMM examples
GMMs can be used for density estimation (like histograms) or for clustering.
Cluster membership probability: z_j^n = P(j | x^n) = π_j φ_j(x^n; θ_j) / f(x^n)
7. Mixture Model training
• Given a dataset X={x1,…, xN} and a GMM f (x;Θ)
• Likelihood: p(X; Θ) = p(x_1, …, x_N; Θ) = ∏_{i=1}^{N} f(x_i; Θ)
• GMM training: log-likelihood maximization
  Θ* = arg max_Θ Σ_{i=1}^{N} ln f(x_i; Θ)
• Expectation-maximization (EM) algorithm
– Applicable when posterior P(Z|X) can be computed
8. EM for Mixture Models
• E-step: compute expectation of hidden
variables given the observations:
z_j^n = P(j | x^n) = π_j φ(x^n | θ_j) / Σ_{p=1}^{K} π_p φ(x^n | θ_p)
• M-step: maximize expected complete likelihood
Θ^(t+1) = arg max_Θ Q(Θ),   Q(Θ) = ⟨log p(X, Z; Θ)⟩_{P(Z|X)} = Σ_{n=1}^{N} Σ_{j=1}^{K} z_j^n [log π_j + log φ(x^n | θ_j)]
9. EM for GMM (M-step)
Mean:            μ_j^(t+1) = Σ_{n=1}^{N} z_j^n x^n / Σ_{n=1}^{N} z_j^n
Covariance:      Σ_j^(t+1) = Σ_{n=1}^{N} z_j^n (x^n − μ_j^(t+1))(x^n − μ_j^(t+1))^T / Σ_{n=1}^{N} z_j^n
Mixing weights:  π_j^(t+1) = Σ_{n=1}^{N} z_j^n / N
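A compact Python sketch of the E- and M-steps above (NumPy/SciPy assumed; initialization, regularization and convergence checks are kept minimal):

# EM for a GMM: E-step responsibilities, M-step means/covariances/weights
import numpy as np
from scipy.stats import multivariate_normal

def em_gmm(X, K, n_iter=50, seed=0):
    N, d = X.shape
    rng = np.random.default_rng(seed)
    pi = np.full(K, 1.0 / K)
    mu = X[rng.choice(N, K, replace=False)]
    Sigma = np.array([np.cov(X.T) + 1e-6 * np.eye(d) for _ in range(K)])
    for _ in range(n_iter):
        # E-step: z_j^n = pi_j phi(x^n | theta_j) / sum_p pi_p phi(x^n | theta_p)
        resp = np.column_stack([pi[j] * multivariate_normal.pdf(X, mu[j], Sigma[j])
                                for j in range(K)])
        resp /= resp.sum(axis=1, keepdims=True)
        # M-step: weighted means, covariances and mixing weights
        Nj = resp.sum(axis=0)
        mu = (resp.T @ X) / Nj[:, None]
        for j in range(K):
            Xc = X - mu[j]
            Sigma[j] = (resp[:, j, None] * Xc).T @ Xc / Nj[j] + 1e-6 * np.eye(d)
        pi = Nj / N
    return pi, mu, Sigma, resp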
10. Student's t-distribution
St(x; μ, Σ, ν) = [Γ((ν+d)/2) |Σ|^(−1/2)] / [(πν)^(d/2) Γ(ν/2)] · [1 + (x−μ)^T Σ^(−1) (x−μ)/ν]^(−(ν+d)/2)
Mean μ
Covariance matrix Σ
Degrees of freedom ν
Bell-shaped + heavy-tailed (depending on ν)
Tends to a Gaussian for large ν
12. The Student's t-distribution
u; ν ~ Gamma(ν/2, ν/2)
x | u; μ, Σ ~ N(μ, Σ/u)
Hierarchical distribution
x follows a Gaussian distribution whose covariance is scaled
by a factor following a Gamma distribution.
ML parameter estimation using the EM algorithm
(u is considered as hidden variable).
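A short NumPy sketch of this hierarchy (draw u from the Gamma, then x from the scaled Gaussian); for many draws the samples follow St(μ, Σ, ν):

# Sampling a Student's t via the Gamma-Gaussian hierarchy of this slide
import numpy as np

def sample_student_t(mu, Sigma, nu, size, seed=0):
    rng = np.random.default_rng(seed)
    d = len(mu)
    u = rng.gamma(shape=nu / 2.0, scale=2.0 / nu, size=size)    # u ~ Gamma(nu/2, rate=nu/2)
    z = rng.multivariate_normal(np.zeros(d), Sigma, size=size)  # x | u ~ N(mu, Sigma / u)
    return mu + z / np.sqrt(u)[:, None]

samples = sample_student_t(np.zeros(2), np.eye(2), nu=3.0, size=10000)
print(samples.mean(axis=0))   # close to mu (defined for nu > 1)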
14. SMM: Student's t Mixture Models
Each component j follows St(μj, Σj, νj) (robust mixture)
Parameter estimation using EM
hidden variables: uj and zj
E-step:
z_j^n = π_j St(x^n | θ_j) / Σ_{p=1}^{K} π_p St(x^n | θ_p)
u_j^n = (ν_j^(t) + d) / (ν_j^(t) + (x^n − μ_j^(t))^T (Σ_j^(t))^(−1) (x^n − μ_j^(t)))
15. SMM training
• M-step
Mean:               μ_j^(t+1) = Σ_{n=1}^{N} u_j^n z_j^n x^n / Σ_{n=1}^{N} u_j^n z_j^n
Covariance:         Σ_j^(t+1) = Σ_{n=1}^{N} u_j^n z_j^n (x^n − μ_j^(t+1))(x^n − μ_j^(t+1))^T / Σ_{n=1}^{N} z_j^n
Mixing proportion:  π_j^(t+1) = Σ_{n=1}^{N} z_j^n / N
16. EM for SMM
• M-step
Degrees of freedom: no closed-form update; ν_j^(t+1) is the root of
log(ν_j^(t+1)/2) − ψ(ν_j^(t+1)/2) + 1 + [Σ_{n=1}^{N} z_j^n(t) (log u_j^n(t) − u_j^n(t))] / [Σ_{n=1}^{N} z_j^n(t)] + ψ((ν_j^(t) + d)/2) − log((ν_j^(t) + d)/2) = 0
(ψ is the digamma function; the equation is solved numerically.)
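In practice this root is found numerically; a hedged Python sketch (SciPy assumed; z_j, u_j are the E-step quantities for component j, and the bracket is chosen heuristically):

# Numerical update of the degrees of freedom nu_j (root of the equation above)
import numpy as np
from scipy.special import digamma
from scipy.optimize import brentq

def update_nu(z_j, u_j, nu_old, d):
    c = np.sum(z_j * (np.log(u_j) - u_j)) / np.sum(z_j)
    const = 1.0 + c + digamma((nu_old + d) / 2.0) - np.log((nu_old + d) / 2.0)
    g = lambda nu: np.log(nu / 2.0) - digamma(nu / 2.0) + const
    return brentq(g, 1e-3, 1e3)   # heuristic bracket for the root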
17. Mixture model training issues
• EM local maxima (dependence on initialization)
• Covariance Singularities
• How to select the number of components
• SMM vs GMM
  – Better results for data with outliers (robustness)
  – Higher dependence on initialization (how to initialize νj?)
19. Bayesian GMM
f(x) = Σ_{j=1}^{M} π_j φ_j(x; μ_j, T_j),   Σ_{j=1}^{M} π_j = 1
Typical approach: priors on all GMM parameters
μ_j ~ N(m, S),   p(μ) = ∏_{j=1}^{M} p(μ_j)
T_j ~ Wishart(v, V),   p(T) = ∏_{j=1}^{M} p(T_j),   T_j = Σ_j^(−1) (precision matrices)
π = (π_1, …, π_M) ~ Dirichlet(a_1, …, a_M)
20. Bayesian GMM training
• Parameters Θ become (hidden) RVs: H={Z, Θ}
• Objective: Compute Posteriors P(Z|X), P(Θ|X) (intractable)
• Approximations
• Sampling (RJMCMC)
• MAP approach
• Variational approach
• MAP approximation
• mode of the posterior P(Θ|X) (MAP-EM)
  Θ_MAP = arg max_Θ { log P(X | Θ) + log P(Θ) }
• compute P(Z|X,ΘMAP)
21. Variational Inference (no parameters)
• Computes approximation q(H) of the true posterior P(H|X)
• For any pdf q(H): ln p(X) = F[q] + KL(q(H) || P(H|X))
• Variational bound (F) maximization:
  q* = arg max_q F[q] = arg max_q ∫ q(H) ln [p(X, H) / q(H)] dH
• Mean-field approximation: q(H) = ∏_k q(H_k)
  q(H_k) ∝ exp( ⟨ln p(X, H)⟩_{∏_{l≠k} q(H_l)} ),  normalized so that ∫ q(H_k) dH_k = 1
• System of equations
D. Tzikas, A. Likas, N. Galatsanos, IEEE Signal Processing Magazine, 2008
22. Variational Inference (with parameters)
• X data, H hidden RVs, Θ parameters
• For any pdf q(H; Θ): ln p(X; Θ) = F(q, Θ) + KL(q(H; Θ) || p(H | X; Θ))
• Maximization of the variational bound F:
  F(q, Θ) = ∫ q(H; Θ) ln [p(X, H; Θ) / q(H; Θ)] dH ≤ ln p(X; Θ)
• Variational EM
  • VE-step: q^(t+1) = arg max_q F(q, Θ^(t))
  • VM-step: Θ^(t+1) = arg max_Θ F(q^(t+1), Θ)
23. Bayesian GMM training
• Bayesian GMMs (no parameters)
• mean field variational approximation
• tackles the covariance singularity problem
• requires to specify the parameters of the priors
• Estimating the number of components:
• Start with a large number of components
• Let the training process prune redundant components (πj=0)
• A Dirichlet prior on πj prevents component pruning
24. Bayesian GMM without prior on π
• Mixing weights πj are parameters (remove Dirichlet prior)
• Training using Variational EM
Method (C-B)
• Start with a large number of components
• Perform variational maximization of the marginal likelihood
• Pruning of redundant components (πj=0)
• Only components that fit well to the data are finally retained
25. Bayesian GMM (C-B)
• C-B method: Results depend on
• the number of initial components
• initialization of components
• specification of the scale matrix V of the Wishart prior p(T)
26. Incremental Bayesian GMM
• Solution: incremental training using component splitting
• Local scale matrix V: based on the variance of the component to be split
• Modification of the Bayesian GMM is needed
• Divide the components as ‘fixed’ or ‘free’
• Prior on the weights of ‘fixed’ components (retained)
• No prior on the weights of ‘free’ components (may be
eliminated)
• Pruning restricted among ‘free’ components
C. Constantinopoulos & A. Likas, IEEE Trans. on Neural Networks, 2007
28. Incremental Bayesian GMM
• Start with k=1 component.
• At each step:
• select a component j
• split component j in two subcomponents
• set the scale matrix V analogous to Σj
• apply Variational EM considering the two subcomponents
as free and the rest components as fixed
• either the two components will be retained and adjusted
• or one of them will be eliminated and the other one will
recover the original component (before split)
• until all components have been tested for split unsuccessfully
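A high-level Python-style sketch of this split-and-test loop; init_single_component, split_component and variational_em are hypothetical helpers standing in for the actual variational machinery:

# Incremental Bayesian GMM training by component splitting (structural sketch only)
def incremental_bayesian_gmm(X):
    model = init_single_component(X)            # start with k = 1
    untested = {0}
    while untested:                             # until all components tested for split
        j = untested.pop()                      # select a component j
        cand = split_component(model, j)        # two subcomponents, scale matrix V ~ Sigma_j
        cand = variational_em(cand, X,
                              free=cand.new_ids,     # may be pruned
                              fixed=cand.other_ids)  # weights keep their prior
        if cand.num_retained == 2:              # split accepted: both subcomponents kept
            model = cand
            untested |= set(cand.new_ids)
        # otherwise the surviving subcomponent recovers the original component
    return model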
29. Mixture Models for Image Modeling
• Select a feature representation
• Compute a feature vector per pixel to form the training set
• Build a mixture model for the image using the training set
• Applications
• Image retrieval + relevance feedback
• Image segmentation
• Image registration
30. Mixture Models for Image Segmentation
• One cluster per mixture component.
• Assign pixels to clusters based on P(j|x)
• Take into account spatial smoothness:
neighbouring pixels are expected to have the same label
• Simple way: add pixel coordinates to the feature vector
• Bayesian way: impose MRF priors (SVMM)
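A sketch of the simple approach using scikit-learn's GaussianMixture (assumed available): each pixel's feature vector is its color plus its (row, col) coordinates, and pixels are assigned to the component with the largest P(j|x):

# Mixture-based segmentation: color + pixel coordinates, labels = argmax_j P(j | x)
import numpy as np
from sklearn.mixture import GaussianMixture

def segment_image(img, K):
    """img: H x W x C float array; returns an H x W label map."""
    H, W, C = img.shape
    rows, cols = np.mgrid[0:H, 0:W]
    feats = np.column_stack([img.reshape(-1, C),
                             rows.reshape(-1, 1), cols.reshape(-1, 1)])
    gmm = GaussianMixture(n_components=K, covariance_type='full').fit(feats)
    return gmm.predict(feats).reshape(H, W)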
33. Spatially Varying mixtures (1)
f(x^n | Π, Θ) = Σ_{j=1}^{K} π_j^n φ(x^n | θ_j),   n = 1, 2, …, N
x^n: image feature (e.g. pixel intensity)
π_j^n: contextual mixing proportions
φ(x^n | θ_j): Gaussian parameterized by θ_j = {μ_j, Σ_j}
z_j^n: data label (hidden variable)
34. Spatially Varying mixtures (2)
Insight into the contextual mixing proportions: π_j^n = p(z_j^n = 1 | x^n)
Smoothness is enforced in the image by imposing a prior p(Π) on the probability of the pixel labels (contextual mixing proportions):
L(Π, Θ | X) = Σ_{n=1}^{N} log f(x^n | Π, Θ) + log p(Π)
35. SV-GMM with Gibbs prior (1)
• A typical constraint is the Gibbs prior:
  p(Π) = (1/Z) exp(−U(Π)),   U(Π) = β Σ_{i=1}^{N} V_{N_i}(Π),
  V_{N_i}(Π) = Σ_{j=1}^{K} Σ_{m ∈ N_i} (π_j^i − π_j^m)^2,   β: smoothness weight
[K. Blekas, A. Likas, N. Galatsanos and I. Lagaris. IEEE Trans. Neur. Net., 2005]
37. SV-GMM with Gibbs prior (3)
• E-step: equivalent with GMM.
• M-step: the contextual mixing proportions are
solutions to a quadratic equation.
• Note that:
1) Parameter β of the Gibbs prior must be determined
beforehand.
2) The contextual mixing proportions are not constrained to be probability vectors:
   0 ≤ π_j^n ≤ 1 and Σ_{j=1}^{K} π_j^n = 1 are not guaranteed, n = 1, 2, …, N
38. SV-GMM with Gibbs prior (4)
To address these issues:
1) Class adaptive Gauss-Markov random field
prior.
2) Projection of probabilities to the hyper-plane
(another solution will be presented later on):
   Σ_{j=1}^{K} π_j^n = 1,   n = 1, 2, …, N
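A small NumPy sketch of this projection step (the plain Euclidean projection onto the hyperplane; handling of possible negative entries is a separate simplex projection not shown here):

# Project a pixel's mixing proportions onto the hyperplane sum_j pi_j = 1
import numpy as np

def project_to_hyperplane(pi):
    return pi + (1.0 - pi.sum()) / pi.size

print(project_to_hyperplane(np.array([0.5, 0.3, 0.4])))   # entries now sum to 1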
39. SV-GMM with Gauss-Markov prior (2)
• One variance σ_{j,d}^2 per cluster j = 1, 2, …, K and per direction d = 0°, 45°, 90°, 135°:
p(Π) ∝ ∏_{d=1}^{D} ∏_{j=1}^{K} (1/σ_{j,d})^N exp( − Σ_{n=1}^{N} Σ_{m ∈ N_n^d} (π_j^n − π_j^m)^2 / (2 σ_{j,d}^2) )
[C. Nikou, N. Galatsanos and A. Likas. IEEE Trans. Im. Proc., 2007]
41. MAP estimation
Posterior probabilities π_j^n are the non-negative solutions of the second-degree equation obtained from ∂Q/∂π_j^n = 0:
Σ_{d=1}^{D} (|N_n^d| / σ_{j,d}^2) (π_j^n)^2 − Σ_{d=1}^{D} (1/σ_{j,d}^2) (Σ_{m ∈ N_n^d} π_j^m) π_j^n − z_j^n = 0
There is always a non-negative solution.
Projection to the hyperplane: Σ_{j=1}^{K} π_j^n = 1,   n = 1, 2, …, N
44. RGB image segmentation (3)
Degraded image segmentation: SVFMM (β determined by trial and error) vs. CA-SVFMM.
45. RGB image segmentation (4)
Shading effect on the cupola and wall modeled with SVFMM with a GMRF prior.
Estimated βj (×10⁻³): Cupola 128, Sky 33, Wall 119
46. SV-GMM with DCM prior (1)
For pixel n, the class label is a random variable following a multinomial distribution:
p(z^n | ξ^n) = [M! / ∏_{j=1}^{K} z_j^n!] ∏_{j=1}^{K} (ξ_j^n)^{z_j^n},   ξ_j^n ≥ 0,   Σ_{j=1}^{K} ξ_j^n = 1,   n = 1, …, N,
parameterized by the probability vector ξ^n = (ξ_1^n, ξ_2^n, …, ξ_K^n)^T.
The whole image is parameterized by Ξ = (ξ^1, ξ^2, …, ξ^N).
47. SV-GMM with DCM prior (2)
p(z^n | ξ^n) = [M! / ∏_{j=1}^{K} z_j^n!] ∏_{j=1}^{K} (ξ_j^n)^{z_j^n},   ξ_j^n ≥ 0,   Σ_{j=1}^{K} ξ_j^n = 1,   n = 1, …, N
Generative model for the image
• Multinomial distribution: K possible outcomes.
• Class label j, (j=1…K) appears with probability ξj .
• M realizations of the process.
• The distribution of the counts of a certain class is
binomial.
48. SV-GMM with DCM prior (3)
• The Dirichlet distribution forms the conjugate
prior for the multinomial distribution.
– The posterior p(ξ | x) has the same functional form as the prior p(ξ):
  p(ξ | x) = p(x | ξ) p(ξ) / ∫ p(x | ξ) p(ξ) dξ
[C. Nikou, A. Likas and N. Galatsanos. IEEE Trans. Im. Proc., 2010]
49. SV-GMM with DCM prior (4)
• It is natural to impose a Dirichlet prior on the
parameters of the multinomial pdf:
p(ξ^n | a^n) = [Γ(Σ_{j=1}^{K} a_j^n) / ∏_{j=1}^{K} Γ(a_j^n)] ∏_{j=1}^{K} (ξ_j^n)^{a_j^n − 1},   a_j^n > 0,   n = 1, …, N,   j = 1, …, K,
parameterized by the vector a^n = (a_1^n, a_2^n, …, a_K^n)^T.
50. SV-GMM with DCM prior (5)
Marginalizing the parameters of the multinomial,
p(z^n | a^n) = ∫ p(z^n | ξ^n) p(ξ^n | a^n) dξ^n,   n = 1, 2, …, N,
yields the Dirichlet compound multinomial (DCM) distribution for the class labels:
p(z^n | a^n) = [M! / ∏_{j=1}^{K} z_j^n!] · [Γ(Σ_{j=1}^{K} a_j^n) / Γ(Σ_{j=1}^{K} a_j^n + M)] · ∏_{j=1}^{K} [Γ(a_j^n + z_j^n) / Γ(a_j^n)],   n = 1, …, N.
51. SV-GMM with DCM prior (6)
Image model: for a given pixel n, its class j is determined by M = 1 realization of the process:
z_j^n = 1,   z_m^n = 0 for m ≠ j,   m = 1, 2, …, K.
The DCM prior for the class label becomes:
p(z_j^n = 1 | a^n) = a_j^n / Σ_{m=1}^{K} a_m^n,   j = 1, …, K.
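Why the DCM reduces to this form for M = 1 (using the general DCM expression above and Γ(x+1) = x Γ(x)):
p(z_j^n = 1 | a^n) = [1! / (1! ∏_{m≠j} 0!)] · [Γ(Σ_{m=1}^{K} a_m^n) / Γ(Σ_{m=1}^{K} a_m^n + 1)] · [Γ(a_j^n + 1) / Γ(a_j^n)] = a_j^n / Σ_{m=1}^{K} a_m^n.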
52. SV-GMM with DCM prior (7)
The model becomes spatially varying by imposing a
GMRF prior on the parameters of the Dirichlet pdf.
p(A) ∝ ∏_{j=1}^{K} (1/σ_j)^N exp( − Σ_{n=1}^{N} Σ_{m ∈ N_n} (a_j^n − a_j^m)^2 / (2 σ_j^2) )
[C. Nikou, A. Likas and N. Galatsanos. IEEE Trans. Im. Proc., 2010]
54. MAP estimation
Posterior probabilities are obtained from the non-negative solutions of ∂Q/∂a_j^n = 0, a third-degree (cubic) polynomial equation in a_j^n,
(a_j^n)^3 + c_2 (a_j^n)^2 + c_1 a_j^n + c_0 = 0,
whose coefficients involve the neighborhood values Σ_{m ∈ N_n} a_j^m, the neighborhood size |N_n|, the GMRF variance σ_j^2, the posterior z_j^n, and the remaining parameters Σ_{m ≠ j} a_m^n.
There is always a non-negative solution.
No need for projection!
55. Natural image segmentation (1)
Berkeley image database (300 images).
Ground truth: human segmentations.
Features
MRF features
o 7x7 windows x 3 components.
o 147 dimensional vector.
o PCA on a single image.
o 8 principal components kept.
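A hedged Python sketch of this feature pipeline (scikit-learn assumed): 7x7 windows over the 3 color components give 147-dimensional vectors, reduced to 8 principal components with PCA fitted on the single image:

# 7x7 color patches (147-d) reduced to 8 PCA components, one feature vector per window
import numpy as np
from sklearn.decomposition import PCA

def mrf_features(img, win=7, n_pc=8):
    H, W, C = img.shape
    patches = np.array([img[i:i + win, j:j + win, :].ravel()     # 7*7*3 = 147-d
                        for i in range(H - win + 1)
                        for j in range(W - win + 1)])
    return PCA(n_components=n_pc).fit_transform(patches)         # PCA on this image only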
64. Segmentation and recovery (1)
Berkeley image database.
Additive white Gaussian noise
SNR between -4 dB and 12 dB
MRF features.
Evaluation indices
  PR (Probabilistic Rand index)
  VI (Variation of Information)
  GCE (Global Consistency Error)
  BDE (Boundary Displacement Error)
66. Line processes (1)
Image recovery: estimate a smooth function from noisy
observations.
• Observations: d
• Function to be estimated: u
min_u Σ_i (d_i − u_i)^2 + Σ_i (u_i − u_{i−1})^2
      (data fidelity term)   (smoothness term)
• Calculus of variations (Euler-Lagrange equations).
67. Line processes (2)
In the presence of many edges (piecewise smoothness) the standard solution is not satisfactory. A line process l is integrated:
min_{u,l} Σ_i (d_i − u_i)^2 + Σ_i (u_i − u_{i−1})^2 (1 − l_i) + α Σ_i l_i
l_i = 0: non-edge, the smoothness term is included.
l_i = 1: edge, the penalty term α is added instead.
• Many local minima (due to the simultaneous estimation of u and l); calculus of variations cannot be applied.
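A 1-D NumPy sketch of attacking this objective by alternating minimization instead of calculus of variations: with l fixed, u solves a linear system; with u fixed, each l_i is set by comparing the smoothness cost with the penalty α (the weights and iteration count below are illustrative):

# Weak-membrane sketch: alternate the u-step (linear system) and the l-step (threshold)
import numpy as np

def weak_membrane_1d(d, lam=10.0, alpha=1.0, n_iter=20):
    n = len(d)
    u = d.copy()
    for _ in range(n_iter):
        # l-step: l_i = 1 (edge) where the smoothness cost exceeds the penalty
        l = (lam * np.diff(u) ** 2 > alpha).astype(float)
        # u-step: minimize sum (d_i - u_i)^2 + lam * sum (1 - l_i)(u_{i+1} - u_i)^2
        A = np.eye(n)
        for i in range(n - 1):
            w = lam * (1.0 - l[i])
            A[i, i] += w; A[i + 1, i + 1] += w
            A[i, i + 1] -= w; A[i + 1, i] -= w
        u = np.linalg.solve(A, d)
    return u, l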
68. Line processes (3)
Milestones
[D. Geman and S. Geman 1984],
[A. Blake and A. Zisserman 1988],
[M. Black 1996].
Integration of a line process into an SV-GMM.
Continuous line process model on the
contextual mixing proportions.
Gamma distributed line process variables.
Line process parameters are automatically
estimated from the data (EM and Variational
EM).
70. GMM with continuous line process (1)
Student’s-t prior on the local differences of the
contextual mixing proportions:
~ St (0, , v jd ), d , n, j, k d (n)
n
j
k
j
2
jd
Distinct priors on each:
Image class,
Neighborhood direction (horizontal, vertical).
71. GMM with continuous line process (2)
Equivalently, at each pixel n:
π_j^n − π_j^k ~ N(0, σ_{jd}^2 / u_j^{nk}),
u_j^{nk} ~ Gamma(ν_{jd}/2, ν_{jd}/2),   ∀ d, n, j,   k ∈ γ_d(n).
Joint distribution:
p(Π; Σ, ν) ∝ ∏_{j=1}^{K} ∏_{n=1}^{N} ∏_{d=1}^{D} ∏_{k ∈ γ_d(n)} St(π_j^n − π_j^k; 0, σ_{jd}^2, ν_{jd})
73. GMM with continuous line process (4)
Description of edge structure.
Continuous generalization of a binary line
process.
u_j^{nk} large:  weak class variance σ_{jd}^2 / u_j^{nk} (smoothness enforced).
u_j^{nk} → 0:  uninformative prior (no smoothness), i.e. an edge.
u_j^{nk} captures the separation of class j from the remaining classes.
[G. Sfikas, C. Nikou and N. Galatsanos. IEEE CVPR, 2008]
77. Image registration
• Estimate the transformation TΘ mapping the
coordinates of an image I1 to a target image
I2:
I2(x, y, z) = I1(TΘ(x, y, z))
ΤΘ is described by a set of parameters Θ
78. Image similarity measure
E(Θ) = D( I1(TΘ(x, y, z)), I2(x, y, z) )
• Single modal images
– Quadratic error, Correlation, Fourier transform,
Sign changes.
• Multimodal images
– Inter-image uniformity, mutual information (MI),
normalized MI.
79. Fundamental hypothesis
• Correspondence between uniform regions in the two
images.
• Partitioning of the image to be registered.
– Not necessarily into joint regions.
• Projection of the partition onto the reference image.
• Evaluation of a distance between the overlapping regions.
– Minimum at correct alignment.
– Minimize the distance.
80. Distance between GMMs (1)
A straightforward approach would be:
G1(x | Π1, Θ1) = Σ_{m=1}^{M} π_m^1 φ_m^1(x)
G2(x | Π2, Θ2) = Σ_{n=1}^{N} π_n^2 φ_n^2(x)
E(G1, G2) = Σ_{m=1}^{M} Σ_{n=1}^{N} π_m^1 π_n^2 B(φ_m^1, φ_n^2)
where B(·,·) is the Bhattacharyya distance.
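The pairwise term B(·,·) has a standard closed form for two Gaussians; a NumPy sketch:

# Bhattacharyya distance between N(mu1, S1) and N(mu2, S2)
import numpy as np

def bhattacharyya_gaussians(mu1, S1, mu2, S2):
    S = 0.5 * (S1 + S2)
    dmu = mu2 - mu1
    term1 = 0.125 * dmu @ np.linalg.solve(S, dmu)
    term2 = 0.5 * np.log(np.linalg.det(S) /
                         np.sqrt(np.linalg.det(S1) * np.linalg.det(S2)))
    return term1 + term2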
81. Distance between GMMs (2)
Knowing the correspondences allows the definition of:
E(G1, G2) = Σ_{k=1}^{K} B( φ_k^1(TΘ), φ_k^2 )
φ_k^1(TΘ): estimated from the pixels of the transformed floating image overlapping with the k-th component of the reference image.
φ_k^2: the k-th component of the reference image.
82. Energy function (1)
• For a set of transformation parameters Θ:
– Segment the image to be registered into K
segments by GMM (or SMM).
– For each segment:
• Project the pixels onto the reference image.
• Compute the mean and covariance of the reference
image pixels under the projection mask.
– Evaluate the distance between the distributions.
83. Energy function (2)
• Find the transformation parameters Θ:
min_Θ Σ_{k=1}^{K} B( φ_k^1(TΘ), φ_k^2 )
• Optimization by the simplex method, Powell's method, or ICM.
[D. Gerogiannis, C. Nikou and A. Likas. Image and Vision Computing, 2009]
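A hedged sketch of the outer optimization loop (SciPy assumed): the energy function passed in stands for the segment / project / compare pipeline of slide 82, and the toy quadratic below is only a placeholder:

# Minimize the registration energy over the transformation parameters with Powell's method
import numpy as np
from scipy.optimize import minimize

def register(energy, theta0):
    """energy(theta) -> sum_k B(phi_k^1(T_theta), phi_k^2)."""
    return minimize(energy, theta0, method='Powell').x

# Toy placeholder energy with known minimizer [2, -1, 0.1]
theta_hat = register(lambda th: np.sum((th - np.array([2.0, -1.0, 0.1])) ** 2),
                     theta0=np.zeros(3))
print(theta_hat)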