Mixture Models for Image Analysis
1. Mixture Models for Image Analysis
Aristidis Likas & Christophoros Nikou
IPAN Research Group
Department of Computer Science
University of Ioannina
2. Collaborators:
Nikolaos Galatsanos, Professor
Konstantinos Blekas, Assistant Professor
Dr. Costas Constantinopoulos, Researcher
George Sfikas, Ph.D. Candidate
Demetrios Gerogiannis, Ph.D. Candidate
3. Outline
• Mixture Models and EM (GMM, SMM)
• Bayesian GMMs
• Image segmentation using mixture models
– Incremental Bayesian GMMs
– Spatially varying GMMs (SVMMs) with MRF priors
– SVMMs and line processes
• Image registration using mixture models
4. Mixture Models
• Probability density estimation: estimate the density function
model f(x) that generated a given dataset X={x1,…, xN}
• Mixture Models
f(x) = Σ_{j=1}^{M} π_j φ_j(x; θ_j),   π_j ≥ 0,   Σ_{j=1}^{M} π_j = 1
– M pdf components φj(x; θj)
– mixing weights: π1, π2, …, πM (priors)
• Gaussian Mixture Model (GMM): φj = N(μj, Σj)
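A minimal Python sketch of evaluating this density (NumPy/SciPy assumed; the two-component parameters below are purely illustrative):

# Minimal GMM density sketch: f(x) = sum_j pi_j * N(x; mu_j, Sigma_j)
import numpy as np
from scipy.stats import multivariate_normal

def gmm_density(x, weights, means, covs):
    return sum(w * multivariate_normal.pdf(x, mean=m, cov=c)
               for w, m, c in zip(weights, means, covs))

# Illustrative two-component GMM in 2-D
weights = [0.3, 0.7]
means = [np.zeros(2), np.array([3.0, 3.0])]
covs = [np.eye(2), 2.0 * np.eye(2)]
print(gmm_density(np.array([1.0, 1.0]), weights, means, covs))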
6. GMM examples
GMMs can be used for density estimation (like histograms) or for clustering.
Cluster membership probability: z_j^n = P(j | x^n) = π_j φ_j(x^n; θ_j) / f(x^n)
7. Mixture Model training
• Given a dataset X={x1,…, xN} and a GMM f (x;Θ)
• Likelihood: p(X; Θ) = p(x_1, …, x_N; Θ) = ∏_{i=1}^{N} f(x_i; Θ)
• GMM training: log-likelihood maximization
  Θ* = arg max_Θ Σ_{i=1}^{N} ln f(x_i; Θ)
• Expectation-maximization (EM) algorithm
– Applicable when posterior P(Z|X) can be computed
8. EM for Mixture Models
• E-step: compute expectation of hidden
variables given the observations:
z_j^n = P(j | x^n) = π_j φ(x^n | θ_j) / Σ_{p=1}^{K} π_p φ(x^n | θ_p)
• M-step: maximize expected complete likelihood
Θ^(t+1) = arg max_Θ Q(Θ),   Q(Θ) = ⟨log p(X, Z; Θ)⟩_{P(Z|X)} = Σ_{n=1}^{N} Σ_{j=1}^{K} z_j^n [log π_j + log φ(x^n | θ_j)]
9. EM for GMM (M-step)
Mean:            μ_j^(t+1) = Σ_{n=1}^{N} z_j^n x^n / Σ_{n=1}^{N} z_j^n
Covariance:      Σ_j^(t+1) = Σ_{n=1}^{N} z_j^n (x^n − μ_j^(t+1))(x^n − μ_j^(t+1))^T / Σ_{n=1}^{N} z_j^n
Mixing weights:  π_j^(t+1) = Σ_{n=1}^{N} z_j^n / N
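A compact Python sketch of the E- and M-steps above (NumPy/SciPy assumed; initialization, regularization and convergence checks are kept minimal):

# EM for a GMM: E-step responsibilities, M-step means/covariances/weights
import numpy as np
from scipy.stats import multivariate_normal

def em_gmm(X, K, n_iter=50, seed=0):
    N, d = X.shape
    rng = np.random.default_rng(seed)
    pi = np.full(K, 1.0 / K)
    mu = X[rng.choice(N, K, replace=False)]
    Sigma = np.array([np.cov(X.T) + 1e-6 * np.eye(d) for _ in range(K)])
    for _ in range(n_iter):
        # E-step: z_j^n = pi_j phi(x^n | theta_j) / sum_p pi_p phi(x^n | theta_p)
        resp = np.column_stack([pi[j] * multivariate_normal.pdf(X, mu[j], Sigma[j])
                                for j in range(K)])
        resp /= resp.sum(axis=1, keepdims=True)
        # M-step: weighted means, covariances and mixing weights
        Nj = resp.sum(axis=0)
        mu = (resp.T @ X) / Nj[:, None]
        for j in range(K):
            Xc = X - mu[j]
            Sigma[j] = (resp[:, j, None] * Xc).T @ Xc / Nj[j] + 1e-6 * np.eye(d)
        pi = Nj / N
    return pi, mu, Sigma, resp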
10. Student's t-distribution
St(x; μ, Σ, ν) = [Γ((ν+d)/2) |Σ|^(−1/2)] / [(πν)^(d/2) Γ(ν/2)] · [1 + (x−μ)^T Σ^(−1) (x−μ)/ν]^(−(ν+d)/2)
Mean μ
Covariance matrix Σ
Degrees of freedom ν
Bell-shaped + heavy-tailed (depending on ν)
Tends to a Gaussian for large ν
12. The Student's t-distribution
u; ν ~ Gamma(ν/2, ν/2)
x | u; μ, Σ ~ N(μ, Σ/u)
Hierarchical distribution
x follows a Gaussian distribution whose covariance is scaled
by a factor following a Gamma distribution.
ML parameter estimation using the EM algorithm
(u is considered as hidden variable).
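A short NumPy sketch of this hierarchy (draw u from the Gamma, then x from the scaled Gaussian); for many draws the samples follow St(μ, Σ, ν):

# Sampling a Student's t via the Gamma-Gaussian hierarchy of this slide
import numpy as np

def sample_student_t(mu, Sigma, nu, size, seed=0):
    rng = np.random.default_rng(seed)
    d = len(mu)
    u = rng.gamma(shape=nu / 2.0, scale=2.0 / nu, size=size)    # u ~ Gamma(nu/2, rate=nu/2)
    z = rng.multivariate_normal(np.zeros(d), Sigma, size=size)  # x | u ~ N(mu, Sigma / u)
    return mu + z / np.sqrt(u)[:, None]

samples = sample_student_t(np.zeros(2), np.eye(2), nu=3.0, size=10000)
print(samples.mean(axis=0))   # close to mu (defined for nu > 1)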
14. SMM: Student's t Mixture Models
Each component j follows St(μj, Σj, νj) (robust mixture)
Parameter estimation using EM
hidden variables: uj and zj
E-step:
z_j^n = π_j St(x^n | θ_j) / Σ_{p=1}^{K} π_p St(x^n | θ_p)
u_j^n = (ν_j^(t) + d) / (ν_j^(t) + (x^n − μ_j^(t))^T (Σ_j^(t))^(−1) (x^n − μ_j^(t)))
15. SMM training
• M-step
Mean:               μ_j^(t+1) = Σ_{n=1}^{N} u_j^n z_j^n x^n / Σ_{n=1}^{N} u_j^n z_j^n
Covariance:         Σ_j^(t+1) = Σ_{n=1}^{N} u_j^n z_j^n (x^n − μ_j^(t+1))(x^n − μ_j^(t+1))^T / Σ_{n=1}^{N} z_j^n
Mixing proportion:  π_j^(t+1) = Σ_{n=1}^{N} z_j^n / N
16. EM for SMM
• M-step
Degrees of freedom: no closed-form update; ν_j^(t+1) is the root of
log(ν_j^(t+1)/2) − ψ(ν_j^(t+1)/2) + 1 + [Σ_{n=1}^{N} z_j^n(t) (log u_j^n(t) − u_j^n(t))] / [Σ_{n=1}^{N} z_j^n(t)] + ψ((ν_j^(t) + d)/2) − log((ν_j^(t) + d)/2) = 0
(ψ is the digamma function; the equation is solved numerically.)
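In practice this root is found numerically; a hedged Python sketch (SciPy assumed; z_j, u_j are the E-step quantities for component j, and the bracket is chosen heuristically):

# Numerical update of the degrees of freedom nu_j (root of the equation above)
import numpy as np
from scipy.special import digamma
from scipy.optimize import brentq

def update_nu(z_j, u_j, nu_old, d):
    c = np.sum(z_j * (np.log(u_j) - u_j)) / np.sum(z_j)
    const = 1.0 + c + digamma((nu_old + d) / 2.0) - np.log((nu_old + d) / 2.0)
    g = lambda nu: np.log(nu / 2.0) - digamma(nu / 2.0) + const
    return brentq(g, 1e-3, 1e3)   # heuristic bracket for the root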
17. Mixture model training issues
• EM local maxima (dependence on initialization)
• Covariance Singularities
• How to select the number of components
• SMM vs GMM
  – Better results for data with outliers (robustness)
  – Higher dependence on initialization (how to initialize νj?)
19. Bayesian GMM
f(x) = Σ_{j=1}^{M} π_j φ_j(x; μ_j, T_j),   Σ_{j=1}^{M} π_j = 1
Typical approach: priors on all GMM parameters
μ_j ~ N(m, S),   p(μ) = ∏_{j=1}^{M} p(μ_j)
T_j ~ Wishart(v, V),   p(T) = ∏_{j=1}^{M} p(T_j),   T_j = Σ_j^(−1) (precision matrices)
π = (π_1, …, π_M) ~ Dirichlet(a_1, …, a_M)
20. Bayesian GMM training
• Parameters Θ become (hidden) RVs: H={Z, Θ}
• Objective: Compute Posteriors P(Z|X), P(Θ|X) (intractable)
• Approximations
• Sampling (RJMCMC)
• MAP approach
• Variational approach
• MAP approximation
• mode of the posterior P(Θ|X) (MAP-EM)
  Θ_MAP = arg max_Θ { log P(X | Θ) + log P(Θ) }
• compute P(Z|X,ΘMAP)
21. Variational Inference (no parameters)
• Computes approximation q(H) of the true posterior P(H|X)
• For any pdf q(H): ln p(X) = F[q] + KL(q(H) || P(H|X))
• Variational bound (F) maximization:
  q* = arg max_q F[q] = arg max_q ∫ q(H) ln [p(X, H) / q(H)] dH
• Mean-field approximation: q(H) = ∏_k q(H_k)
  q(H_k) ∝ exp( ⟨ln p(X, H)⟩_{∏_{l≠k} q(H_l)} ),  normalized so that ∫ q(H_k) dH_k = 1
• System of equations
D. Tzikas, A. Likas, N. Galatsanos, IEEE Signal Processing Magazine, 2008
22. Variational Inference (with parameters)
• X data, H hidden RVs, Θ parameters
• For any pdf q(H; Θ): ln p(X; Θ) = F(q, Θ) + KL(q(H; Θ) || p(H | X; Θ))
• Maximization of the variational bound F:
  F(q, Θ) = ∫ q(H; Θ) ln [p(X, H; Θ) / q(H; Θ)] dH ≤ ln p(X; Θ)
• Variational EM
  • VE-step: q^(t+1) = arg max_q F(q, Θ^(t))
  • VM-step: Θ^(t+1) = arg max_Θ F(q^(t+1), Θ)
23. Bayesian GMM training
• Bayesian GMMs (no parameters)
• mean field variational approximation
• tackles the covariance singularity problem
• requires to specify the parameters of the priors
• Estimating the number of components:
• Start with a large number of components
• Let the training process prune redundant components (πj=0)
• A Dirichlet prior on πj prevents component pruning
24. Bayesian GMM without prior on π
• Mixing weights πj are parameters (remove Dirichlet prior)
• Training using Variational EM
Method (C-B)
• Start with a large number of components
• Perform variational maximization of the marginal likelihood
• Pruning of redundant components (πj=0)
• Only components that fit well to the data are finally retained
25. Bayesian GMM (C-B)
• C-B method: Results depend on
• the number of initial components
• initialization of components
• specification of the scale matrix V of the Wishart prior p(T)
26. Incremental Bayesian GMM
• Solution: incremental training using component splitting
• Local scale matrix V: based on the variance of the component to be split
• Modification of the Bayesian GMM is needed
• Divide the components as ‘fixed’ or ‘free’
• Prior on the weights of ‘fixed’ components (retained)
• No prior on the weights of ‘free’ components (may be
eliminated)
• Pruning restricted among ‘free’ components
C. Constantinopoulos & A. Likas, IEEE Trans. on Neural Networks, 2007
28. Incremental Bayesian GMM
• Start with k=1 component.
• At each step:
• select a component j
• split component j in two subcomponents
• set the scale matrix V analogous to Σj
• apply Variational EM considering the two subcomponents
as free and the rest components as fixed
• either the two components will be retained and adjusted
• or one of them will be eliminated and the other one will
recover the original component (before split)
• until all components have been tested for split unsuccessfully
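A high-level Python-style sketch of this split-and-test loop; init_single_component, split_component and variational_em are hypothetical helpers standing in for the actual variational machinery:

# Incremental Bayesian GMM training by component splitting (structural sketch only)
def incremental_bayesian_gmm(X):
    model = init_single_component(X)            # start with k = 1
    untested = {0}
    while untested:                             # until all components tested for split
        j = untested.pop()                      # select a component j
        cand = split_component(model, j)        # two subcomponents, scale matrix V ~ Sigma_j
        cand = variational_em(cand, X,
                              free=cand.new_ids,     # may be pruned
                              fixed=cand.other_ids)  # weights keep their prior
        if cand.num_retained == 2:              # split accepted: both subcomponents kept
            model = cand
            untested |= set(cand.new_ids)
        # otherwise the surviving subcomponent recovers the original component
    return model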
29. Mixture Models for Image Modeling
• Select a feature representation
• Compute a feature vector per pixel to form the training set
• Build a mixture model for the image using the training set
• Applications
• Image retrieval + relevance feedback
• Image segmentation
• Image registration
30. Mixture Models for Image Segmentation
• One cluster per mixture component.
• Assign pixels to clusters based on P(j|x)
• Take into account spatial smoothness:
neighbouring pixels are expected to have the same label
• Simple way: add pixel coordinates to the feature vector
• Bayesian way: impose MRF priors (SVMM)
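A sketch of the simple approach using scikit-learn's GaussianMixture (assumed available): each pixel's feature vector is its color plus its (row, col) coordinates, and pixels are assigned to the component with the largest P(j|x):

# Mixture-based segmentation: color + pixel coordinates, labels = argmax_j P(j | x)
import numpy as np
from sklearn.mixture import GaussianMixture

def segment_image(img, K):
    """img: H x W x C float array; returns an H x W label map."""
    H, W, C = img.shape
    rows, cols = np.mgrid[0:H, 0:W]
    feats = np.column_stack([img.reshape(-1, C),
                             rows.reshape(-1, 1), cols.reshape(-1, 1)])
    gmm = GaussianMixture(n_components=K, covariance_type='full').fit(feats)
    return gmm.predict(feats).reshape(H, W)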
33. Spatially Varying mixtures (1)
f(x^n | Π, Θ) = Σ_{j=1}^{K} π_j^n φ(x^n | θ_j),   n = 1, 2, …, N
x^n: image feature (e.g. pixel intensity)
π_j^n: contextual mixing proportions
φ(x^n | θ_j): Gaussian parameterized by θ_j = {μ_j, Σ_j}
z_j^n: data label (hidden variable)
34. Spatially Varying mixtures (2)
Insight into the contextual mixing proportions: π_j^n = p(z_j^n = 1 | x^n)
Smoothness is enforced in the image by imposing a prior p(Π) on the probability of the pixel labels (contextual mixing proportions):
L(Π, Θ | X) = Σ_{n=1}^{N} log f(x^n | Π, Θ) + log p(Π)
35. SV-GMM with Gibbs prior (1)
• A typical constraint is the Gibbs prior:
  p(Π) = (1/Z) exp(−U(Π)),   U(Π) = β Σ_{i=1}^{N} V_{N_i}(Π),
  V_{N_i}(Π) = Σ_{j=1}^{K} Σ_{m ∈ N_i} (π_j^i − π_j^m)^2,   β: smoothness weight
[K. Blekas, A. Likas, N. Galatsanos and I. Lagaris. IEEE Trans. Neur. Net., 2005]
37. SV-GMM with Gibbs prior (3)
• E-step: equivalent with GMM.
• M-step: the contextual mixing proportions are
solutions to a quadratic equation.
• Note that:
1) Parameter β of the Gibbs prior must be determined
beforehand.
2) The contextual mixing proportions are not constrained to be probability vectors:
   0 ≤ π_j^n ≤ 1 and Σ_{j=1}^{K} π_j^n = 1 are not guaranteed, n = 1, 2, …, N
38. SV-GMM with Gibbs prior (4)
To address these issues:
1) Class adaptive Gauss-Markov random field
prior.
2) Projection of probabilities to the hyper-plane
(another solution will be presented later on):
   Σ_{j=1}^{K} π_j^n = 1,   n = 1, 2, …, N
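A small NumPy sketch of this projection step (the plain Euclidean projection onto the hyperplane; handling of possible negative entries is a separate simplex projection not shown here):

# Project a pixel's mixing proportions onto the hyperplane sum_j pi_j = 1
import numpy as np

def project_to_hyperplane(pi):
    return pi + (1.0 - pi.sum()) / pi.size

print(project_to_hyperplane(np.array([0.5, 0.3, 0.4])))   # entries now sum to 1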
39. SV-GMM with Gauss-Markov prior (2)
• One variance σ_{j,d}^2 per cluster j = 1, 2, …, K and per direction d = 0°, 45°, 90°, 135°:
p(Π) ∝ ∏_{d=1}^{D} ∏_{j=1}^{K} (1/σ_{j,d})^N exp( − Σ_{n=1}^{N} Σ_{m ∈ N_n^d} (π_j^n − π_j^m)^2 / (2 σ_{j,d}^2) )
[C. Nikou, N. Galatsanos and A. Likas. IEEE Trans. Im. Proc., 2007]
41. MAP estimation
Posterior probabilities π_j^n are the non-negative solutions of the second-degree equation obtained from ∂Q/∂π_j^n = 0:
Σ_{d=1}^{D} (|N_n^d| / σ_{j,d}^2) (π_j^n)^2 − Σ_{d=1}^{D} (1/σ_{j,d}^2) (Σ_{m ∈ N_n^d} π_j^m) π_j^n − z_j^n = 0
There is always a non-negative solution.
Projection to the hyperplane: Σ_{j=1}^{K} π_j^n = 1,   n = 1, 2, …, N
44. RGB image segmentation (3)
Degraded image segmentation: SVFMM (β determined by trial and error) vs. CA-SVFMM.
45. RGB image segmentation (4)
Shading effect on the cupola and wall modeled with SVFMM with a GMRF prior.
Estimated βj (×10⁻³): Cupola 128, Sky 33, Wall 119
46. SV-GMM with DCM prior (1)
For pixel n, the class label is a random variable following a multinomial distribution:
p(z^n | ξ^n) = [M! / ∏_{j=1}^{K} z_j^n!] ∏_{j=1}^{K} (ξ_j^n)^{z_j^n},   ξ_j^n ≥ 0,   Σ_{j=1}^{K} ξ_j^n = 1,   n = 1, …, N,
parameterized by the probability vector ξ^n = (ξ_1^n, ξ_2^n, …, ξ_K^n)^T.
The whole image is parameterized by Ξ = (ξ^1, ξ^2, …, ξ^N).
47. SV-GMM with DCM prior (2)
p(z^n | ξ^n) = [M! / ∏_{j=1}^{K} z_j^n!] ∏_{j=1}^{K} (ξ_j^n)^{z_j^n},   ξ_j^n ≥ 0,   Σ_{j=1}^{K} ξ_j^n = 1,   n = 1, …, N
Generative model for the image
• Multinomial distribution: K possible outcomes.
• Class label j, (j=1…K) appears with probability ξj .
• M realizations of the process.
• The distribution of the counts of a certain class is
binomial.
48. SV-GMM with DCM prior (3)
• The Dirichlet distribution forms the conjugate
prior for the multinomial distribution.
– The posterior p(ξ | x) has the same functional form as the prior p(ξ):
  p(ξ | x) = p(x | ξ) p(ξ) / ∫ p(x | ξ) p(ξ) dξ
[C. Nikou, A. Likas and N. Galatsanos. IEEE Trans. Im. Proc., 2010]
49. SV-GMM with DCM prior (4)
• It is natural to impose a Dirichlet prior on the
parameters of the multinomial pdf:
p(ξ^n | a^n) = [Γ(Σ_{j=1}^{K} a_j^n) / ∏_{j=1}^{K} Γ(a_j^n)] ∏_{j=1}^{K} (ξ_j^n)^{a_j^n − 1},   a_j^n > 0,   n = 1, …, N,   j = 1, …, K,
parameterized by the vector a^n = (a_1^n, a_2^n, …, a_K^n)^T.
50. SV-GMM with DCM prior (5)
Marginalizing the parameters of the multinomial,
p(z^n | a^n) = ∫ p(z^n | ξ^n) p(ξ^n | a^n) dξ^n,   n = 1, 2, …, N,
yields the Dirichlet compound multinomial (DCM) distribution for the class labels:
p(z^n | a^n) = [M! / ∏_{j=1}^{K} z_j^n!] · [Γ(Σ_{j=1}^{K} a_j^n) / Γ(Σ_{j=1}^{K} a_j^n + M)] · ∏_{j=1}^{K} [Γ(a_j^n + z_j^n) / Γ(a_j^n)],   n = 1, …, N.
51. SV-GMM with DCM prior (6)
Image model: for a given pixel n, its class j is determined by M = 1 realization of the process:
z_j^n = 1,   z_m^n = 0 for m ≠ j,   m = 1, 2, …, K.
The DCM prior for the class label becomes:
p(z_j^n = 1 | a^n) = a_j^n / Σ_{m=1}^{K} a_m^n,   j = 1, …, K.
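Why the DCM reduces to this form for M = 1 (using the general DCM expression above and Γ(x+1) = x Γ(x)):
p(z_j^n = 1 | a^n) = [1! / (1! ∏_{m≠j} 0!)] · [Γ(Σ_{m=1}^{K} a_m^n) / Γ(Σ_{m=1}^{K} a_m^n + 1)] · [Γ(a_j^n + 1) / Γ(a_j^n)] = a_j^n / Σ_{m=1}^{K} a_m^n.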
52. SV-GMM with DCM prior (7)
The model becomes spatially varying by imposing a
GMRF prior on the parameters of the Dirichlet pdf.
p(A) ∝ ∏_{j=1}^{K} (1/σ_j)^N exp( − Σ_{n=1}^{N} Σ_{m ∈ N_n} (a_j^n − a_j^m)^2 / (2 σ_j^2) )
[C. Nikou, A. Likas and N. Galatsanos. IEEE Trans. Im. Proc., 2010]
54. MAP estimation
Posterior probabilities are obtained from the non-negative solutions of ∂Q/∂a_j^n = 0, a third-degree (cubic) polynomial equation in a_j^n,
(a_j^n)^3 + c_2 (a_j^n)^2 + c_1 a_j^n + c_0 = 0,
whose coefficients involve the neighborhood values Σ_{m ∈ N_n} a_j^m, the neighborhood size |N_n|, the GMRF variance σ_j^2, the posterior z_j^n, and the remaining parameters Σ_{m ≠ j} a_m^n.
There is always a non-negative solution.
No need for projection!
55. Natural image segmentation (1)
Berkeley image database (300 images).
Ground truth: human segmentations.
Features
MRF features
o 7x7 windows x 3 components.
o 147 dimensional vector.
o PCA on a single image.
o 8 principal components kept.
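A hedged Python sketch of this feature pipeline (scikit-learn assumed): 7x7 windows over the 3 color components give 147-dimensional vectors, reduced to 8 principal components with PCA fitted on the single image:

# 7x7 color patches (147-d) reduced to 8 PCA components, one feature vector per window
import numpy as np
from sklearn.decomposition import PCA

def mrf_features(img, win=7, n_pc=8):
    H, W, C = img.shape
    patches = np.array([img[i:i + win, j:j + win, :].ravel()     # 7*7*3 = 147-d
                        for i in range(H - win + 1)
                        for j in range(W - win + 1)])
    return PCA(n_components=n_pc).fit_transform(patches)         # PCA on this image only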
64. Segmentation and recovery (1)
Berkeley image database.
Additive white Gaussian noise
SNR between -4 dB and 12 dB
MRF features.
Evaluation indices
  PR (Probabilistic Rand index)
  VI (Variation of Information)
  GCE (Global Consistency Error)
  BDE (Boundary Displacement Error)
66. Line processes (1)
Image recovery: estimate a smooth function from noisy
observations.
• Observations: d
• Function to be estimated: u
min_u Σ_i (d_i − u_i)^2 + Σ_i (u_i − u_{i−1})^2
      (data fidelity term)   (smoothness term)
• Calculus of variations (Euler-Lagrange equations).
67. Line processes (2)
In the presence of many edges (piecewise smoothness) the standard solution is not satisfactory. A line process l is integrated:
min_{u,l} Σ_i (d_i − u_i)^2 + Σ_i (u_i − u_{i−1})^2 (1 − l_i) + α Σ_i l_i
l_i = 0: non-edge, the smoothness term is included.
l_i = 1: edge, the penalty term α is added instead.
• Many local minima (due to the simultaneous estimation of u and l); calculus of variations cannot be applied.
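A 1-D NumPy sketch of attacking this objective by alternating minimization instead of calculus of variations: with l fixed, u solves a linear system; with u fixed, each l_i is set by comparing the smoothness cost with the penalty α (the weights and iteration count below are illustrative):

# Weak-membrane sketch: alternate the u-step (linear system) and the l-step (threshold)
import numpy as np

def weak_membrane_1d(d, lam=10.0, alpha=1.0, n_iter=20):
    n = len(d)
    u = d.copy()
    for _ in range(n_iter):
        # l-step: l_i = 1 (edge) where the smoothness cost exceeds the penalty
        l = (lam * np.diff(u) ** 2 > alpha).astype(float)
        # u-step: minimize sum (d_i - u_i)^2 + lam * sum (1 - l_i)(u_{i+1} - u_i)^2
        A = np.eye(n)
        for i in range(n - 1):
            w = lam * (1.0 - l[i])
            A[i, i] += w; A[i + 1, i + 1] += w
            A[i, i + 1] -= w; A[i + 1, i] -= w
        u = np.linalg.solve(A, d)
    return u, l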
68. Line processes (3)
Milestones
[D. Geman and S. Geman 1984],
[A. Blake and A. Zisserman 1988],
[M. Black 1996].
Integration of a line process into an SV-GMM.
Continuous line process model on the
contextual mixing proportions.
Gamma distributed line process variables.
Line process parameters are automatically
estimated from the data (EM and Variational
EM).
70. GMM with continuous line process (1)
Student’s-t prior on the local differences of the
contextual mixing proportions:
~ St (0, , v jd ), d , n, j, k d (n)
n
j
k
j
2
jd
Distinct priors on each:
Image class,
Neighborhood direction (horizontal, vertical).
71. GMM with continuous line process (2)
Equivalently, at each pixel n:
π_j^n − π_j^k ~ N(0, σ_{jd}^2 / u_j^{nk}),
u_j^{nk} ~ Gamma(ν_{jd}/2, ν_{jd}/2),   ∀ d, n, j,   k ∈ γ_d(n).
Joint distribution:
p(Π; Σ, ν) ∝ ∏_{j=1}^{K} ∏_{n=1}^{N} ∏_{d=1}^{D} ∏_{k ∈ γ_d(n)} St(π_j^n − π_j^k; 0, σ_{jd}^2, ν_{jd})
73. GMM with continuous line process (4)
Description of edge structure.
Continuous generalization of a binary line
process.
u_j^{nk} large:  weak class variance σ_{jd}^2 / u_j^{nk} (smoothness enforced).
u_j^{nk} → 0:  uninformative prior (no smoothness), i.e. an edge.
u_j^{nk} captures the separation of class j from the remaining classes.
[G. Sfikas, C. Nikou and N. Galatsanos. IEEE CVPR, 2008]
77. Image registration
• Estimate the transformation TΘ mapping the
coordinates of an image I1 to a target image
I2:
I2(x, y, z) = I1(TΘ(x, y, z))
ΤΘ is described by a set of parameters Θ
78. Image similarity measure
E(Θ) = D( I1(TΘ(x, y, z)), I2(x, y, z) )
• Single modal images
– Quadratic error, Correlation, Fourier transform,
Sign changes.
• Multimodal images
– Inter-image uniformity, mutual information (MI),
normalized MI.
79. Fundamental hypothesis
• Correspondence between uniform regions in the two
images.
• Partitioning of the image to be registered.
– Not necessarily into joint regions.
• Projection of the partition onto the reference image.
• Evaluation of a distance between the overlapping regions.
– Minimum at correct alignment.
– Minimize the distance.
80. Distance between GMMs (1)
A straightforward approach would be:
G1(x | Π1, Θ1) = Σ_{m=1}^{M} π_m^1 φ_m^1(x)
G2(x | Π2, Θ2) = Σ_{n=1}^{N} π_n^2 φ_n^2(x)
E(G1, G2) = Σ_{m=1}^{M} Σ_{n=1}^{N} π_m^1 π_n^2 B(φ_m^1, φ_n^2)
where B(·,·) is the Bhattacharyya distance.
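The pairwise term B(·,·) has a standard closed form for two Gaussians; a NumPy sketch:

# Bhattacharyya distance between N(mu1, S1) and N(mu2, S2)
import numpy as np

def bhattacharyya_gaussians(mu1, S1, mu2, S2):
    S = 0.5 * (S1 + S2)
    dmu = mu2 - mu1
    term1 = 0.125 * dmu @ np.linalg.solve(S, dmu)
    term2 = 0.5 * np.log(np.linalg.det(S) /
                         np.sqrt(np.linalg.det(S1) * np.linalg.det(S2)))
    return term1 + term2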
81. Distance between GMMs (2)
Knowing the correspondences allows the definition of:
E(G1, G2) = Σ_{k=1}^{K} B( φ_k^1(TΘ), φ_k^2 )
φ_k^1(TΘ): estimated from the pixels of the transformed floating image overlapping with the k-th component of the reference image.
φ_k^2: the k-th component of the reference image.
82. Energy function (1)
• For a set of transformation parameters Θ:
– Segment the image to be registered into K
segments by GMM (or SMM).
– For each segment:
• Project the pixels onto the reference image.
• Compute the mean and covariance of the reference
image pixels under the projection mask.
– Evaluate the distance between the distributions.
83. Energy function (2)
• Find the transformation parameters Θ:
min_Θ Σ_{k=1}^{K} B( φ_k^1(TΘ), φ_k^2 )
• Optimization by the simplex method, Powell's method, or ICM.
[D. Gerogiannis, C. Nikou and A. Likas. Image and Vision Computing, 2009]
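A hedged sketch of the outer optimization loop (SciPy assumed): the energy function passed in stands for the segment / project / compare pipeline of slide 82, and the toy quadratic below is only a placeholder:

# Minimize the registration energy over the transformation parameters with Powell's method
import numpy as np
from scipy.optimize import minimize

def register(energy, theta0):
    """energy(theta) -> sum_k B(phi_k^1(T_theta), phi_k^2)."""
    return minimize(energy, theta0, method='Powell').x

# Toy placeholder energy with known minimizer [2, -1, 0.1]
theta_hat = register(lambda th: np.sum((th - np.array([2.0, -1.0, 0.1])) ** 2),
                     theta0=np.zeros(3))
print(theta_hat)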