Using modern, fast MCMC methods to cluster a dataset. Data are modelled as clusters of multivariate Gaussians, where each cluster can have a different covariance matrix.
Selecting the number of clusters, and the covariance model, is a challenge. Often, the BIC is used as an approximation to the Bayes Factor. With MCMC, we can compute the Bayes Factor more accurately.
This method leads to improved performance on datasets with a large number of small clusters.
MCMC for clustering of multivariate-Normal data
1. MCMC for mixtures of Gaussians, and model selection
Aaron McDaid, aaronmcdaid@gmail.com
October 30, 2014
1 / 36
2. Six models
[Scatter plot: simulated data, axes V1 (−20 to 60) and V2 (0 to 100). Caption: (a) 1. vvv]
[Scatter plot: simulated data, axes V1 (−10 to 10) and V2 (−30 to 30). Caption: (b) 2. eee]
2 / 36
3. Six models
[Scatter plot: simulated data, axes V1 (−10 to 30) and V2 (−20 to 10). Caption: (a) 3. vvi]
[Scatter plot: simulated data, axes V1 and V2 (both −30 to 30). Caption: (b) 4. eei]
3 / 36
4. Six models
[Scatter plot: simulated data, axes V1 and V2 (both −20 to 20). Caption: (a) 5. vii]
[Scatter plot: simulated data, axes V1 (−100 to 200) and V2 (−200 to 200). Caption: (b) 6. eii]
4 / 36
5. Old Faithful N=272
[Scatter plot: eruptions (1.5 to 5.0) vs waiting (50 to 90). Caption: Old Faithful - Yellowstone National Park]
5 / 36
6. Old Faithful N=272
[Scatter plot: the same Old Faithful data, axes V1 (1.5 to 5.0) and V2 (50 to 90). Caption: Old Faithful - Yellowstone National Park]
6 / 36
8. Define the mclust model
Bayes Factor and BIC - connection between mclust and MCMC
Priors
Integration (analytical and numerical)
MCMC algorithm1
Selecting from the six models via MCMC
Evaluation (on synthetic data)
One application
1Mahlet G. Tadesse, Naijun Sha, and Marina Vannucci. "Bayesian Variable Selection in Clustering High-Dimensional Data". In: Journal of the American Statistical Association 100.470 (June 2005), pp. 602-617. issn: 0162-1459. doi: 10.1198/016214504000001565. url: http://www.stat.rice.edu/~marina/papers/jasa05.pdf.
7 / 36
9. Goals
Not a 'shootout' with mclust
See what MCMC can do
Calculate the Bayes Factor more precisely - is it better than BIC?
Push to larger numbers of clusters
8 / 36
10. Basic model
N data points in a p-dimensional space.
m ∈ {vvv, eee, vvi, eei, vii, eii}
K - number of clusters
Σ_k - covariance of cluster k
μ_k - mean of cluster k
∑_{k=1}^{K} π_k = 1
z_i : P(z_i = k) = π_k
x_i | z_i=k ~ Normal(μ_k, Σ_k)
Mixture models:
P(x_i | z_i=k) = N(x_i | μ_k, Σ_k)
P(x_i) = ∑_{k=1}^{K} π_k N(x_i | μ_k, Σ_k)
9 / 36
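The generative process on this slide can be sketched in a few lines of NumPy; the K=2 parameter values below are illustrative placeholders, not taken from the talk:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical parameters for a K=2, p=2 mixture (illustrative values only).
pi = np.array([0.3, 0.7])                      # mixing weights pi_k, sum to 1
mus = [np.zeros(2), np.array([5.0, 5.0])]      # cluster means mu_k
Sigmas = [np.eye(2), np.diag([2.0, 0.5])]      # cluster covariances Sigma_k

N = 1000
z = rng.choice(len(pi), size=N, p=pi)          # z_i with P(z_i = k) = pi_k
X = np.stack([rng.multivariate_normal(mus[k], Sigmas[k]) for k in z])
```

Each of the six covariance models (vvv, eee, ...) is just a constraint on the `Sigmas` entries; the vvv case above leaves them fully free.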
11. mclust
MLE (Maximum Likelihood Estimate)
R package mclust2
Given (K, m), use the Expectation-Maximization (EM) algorithm to estimate (π, μ, Σ).
P(X | μ̂, Σ̂, π̂, m, K)
Requires running EM for each possible combination of (K, m).
Hundreds of runs may be required: {(K = 2, m = VVI), (K = 3, m = EEI), ..., (K = 50, m = EEI), ...}
Then use BIC to select among the models.
2Chris Fraley and Adrian E. Raftery. MCLUST: Software for model-based cluster analysis. In: Journal of Classification …
13. mclust
Why do we need model selection?
vvv vvi vii
eee eei eii
Define θ = (π, μ, Σ).
P(X | θ = θ̂_{eee,K}, m=vvv, K) = P(X | θ = θ̂_{eee,K}, m=eee, K)
Cannot maximize P(X | θ, m, K) to choose among models - a more flexible model always fits at least as well.
Count the degrees-of-freedom f, in order to penalize the more complex model.
AIC = 2 log P(X | θ̂^MLE_{m,K}, m, K) − 2 f
BIC = 2 log P(X | θ̂^MLE_{m,K}, m, K) − log(N) f
11 / 36
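Both criteria are one-liners once the maximized log-likelihood and the degrees-of-freedom f are known; the numbers in the comparison below are made up for illustration:

```python
import numpy as np

def aic(loglik, f):
    # AIC = 2 log P(X | theta_MLE, m, K) - 2 f
    return 2.0 * loglik - 2.0 * f

def bic(loglik, f, N):
    # BIC = 2 log P(X | theta_MLE, m, K) - log(N) f
    return 2.0 * loglik - np.log(N) * f

# Illustrative: the more complex model fits slightly better but pays
# a larger penalty, so the simpler one wins on BIC.
bic_simple = bic(-500.0, f=10, N=272)
bic_complex = bic(-495.0, f=30, N=272)
```

With these conventions a larger AIC/BIC is better, matching the formulas on the slide.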
15-16. Bayes Factor
(BIC) Bayesian Information Criterion
BIC ≈ 2 log( P(X | m, K) ), where P(X = X_obs | m, K) is the marginal likelihood whose ratios form the Bayes Factor.
Informally, the average of P(X | θ, m, K) over all θ.
Can we compute this (weighted) average more accurately?
P(X | m=vvv, K) / P(X | m=eee, K) = ∫ P(X, θ | m=vvv, K) dθ / ∫ P(X, θ | m=eee, K) dθ
12 / 36
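The "average of P(X | θ, m, K) over all θ" reading of the marginal likelihood can be checked numerically in a deliberately tiny setting: one observation x, likelihood N(μ, 1), and a standard-Normal prior on μ (none of this example is from the talk):

```python
import numpy as np

rng = np.random.default_rng(1)
x = 0.5  # a single observation

# Monte Carlo: P(x) = E_prior[ P(x | mu) ] with mu ~ N(0, 1), x | mu ~ N(mu, 1)
mus = rng.normal(0.0, 1.0, size=200_000)
lik = np.exp(-0.5 * (x - mus) ** 2) / np.sqrt(2 * np.pi)
mc_estimate = lik.mean()

# Analytic check: marginally x ~ N(0, 2)
exact = np.exp(-0.25 * x ** 2) / np.sqrt(2 * np.pi * 2.0)
```

For the mixture models in the talk the same average is intractable by brute force, which is what motivates the MCMC machinery that follows.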
17. Full model
N data points in a p-dimensional space.
dependence / distribution:
m ~ Uniform({vvv, eee, vvi, eei, vii, eii})
K ~ Poisson(1), conditioned on K ≥ 1
π | K ~ Dirichlet(α₀)
z_i | π, K : P(z_i = k | π, K) = π_k
Σ_k | m, K ~ Wishart⁻¹(V₀, g₀)
μ_k | Σ_k, m, K ~ Normal(μ₀, (1/n₀) Σ_k)
x_i | z_i=k, μ_k, Σ, m, K ~ Normal(μ_k, Σ_k)
α₀ = (½, ½, ..., ½)
μ₀ = X̄
n₀ = 0.001
g₀ = (p+1) + n₀(p+1)/(1 − n₀)
V₀ = Cov(X) · (g₀ − p − 1)
13 / 36
18. Dirichlet
π ~ Dirichlet(1/K, 1/K, ..., 1/K)
∑_{k=1}^{K} π_k = 1
Dirichlet gives us random vectors π: Dirichlet(α₁, α₂, ..., α_K)
e.g. K = 4, π = (0.01, 0.09, 0.80, 0.10)
May lead to empty clusters in the prior, and therefore in the posterior too:
K | X ≥ K_TRUE
Solution3: K ~ Poisson(1) | K ≥ 1
3Agostino Nobile. Bayesian …
22. Integration
In general,
P(a|b) = ∑_c P(a, c | b)
P(a|b) = ∫ P(a, e | b) de
P(a|b) = ∑_c P(a | c, b) P(c | b)
P(a|b) = ∫ P(a | e, b) P(e | b) de
16 / 36
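These discrete identities can be verified numerically on a small random joint distribution (the 3×4 table here is arbitrary):

```python
import numpy as np

rng = np.random.default_rng(2)
# A random joint P(a, c | b) over a in {0,1,2}, c in {0,1,2,3}, for a fixed b.
joint = rng.random((3, 4))
joint /= joint.sum()

# Marginalize directly: P(a|b) = sum_c P(a, c | b)
p_a = joint.sum(axis=1)

# Equivalently: P(a|b) = sum_c P(a | c, b) P(c | b)
p_c = joint.sum(axis=0)
p_a_given_c = joint / p_c                   # column k holds P(a | c=k, b)
p_a_again = (p_a_given_c * p_c).sum(axis=1)
```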
23. Integration
P(m | X) = ∑_{K=1}^{∞} ∑_z ∫∫∫ P(π, μ, z, Σ, K, m | X) dπ dμ dΣ
P(K | X) = ∑_m ∑_z ∫∫∫ P(π, μ, z, Σ, K, m | X) dπ dμ dΣ
P(K, m | X) = ∑_z ∫∫∫ P(π, μ, z, Σ, K, m | X) dπ dμ dΣ
P(z | X) = ∑_{K=1}^{∞} ∑_m ∫∫∫ P(π, μ, z, Σ, K, m | X) dπ dμ dΣ
17 / 36
24. Integration
P(z, K, m | X) = ∫∫∫ P(π, μ, z, Σ, K, m | X) dπ dμ dΣ
P(z, K, m | X) = ∫∫∫ (1 / P(X)) P(π, μ, z, Σ, K, m, X) dπ dμ dΣ
P(z, K, m | X) = (1 / P(X)) ∫∫∫ P(π, μ, Σ | z, K, m, X) P(z, K, m, X) dπ dμ dΣ
P(z, K, m | X) = (P(z, K, m, X) / P(X)) ∫∫∫ P(π, μ, Σ | z, K, m, X) dπ dμ dΣ
(and the remaining integral equals 1)
18 / 36
26-30. We defined the model, with all our priors, earlier. How do we get our estimates?
RJMCMC4 would give many estimates of P(π, μ, z, Σ, K, m | X).
Want faster MCMC.
Solve P(X, z, K, m) analytically.
Use that to sample z, K, m | X.
Count popular (m), or (K), or (m, K) in the sample: P(m, K | X).
(Proven identical to RJMCMC - different MCMC algorithms (usually) don't change results, just speed.)
If desired, π, μ, Σ | X, z, K, m is easily generated.
4Peter J. Green. Reversible Jump Markov Chain Monte Carlo computation and Bayesian model determination. In: Biometrika 82.4 (Dec. 1995), pp. 711-732. doi: 10.1093/biomet/82.4.711. url: http://dx.doi.org/10.1093/biomet/82.4.711.
19 / 36
31. Analytical integration
P(X, z, K, m) = ∫∫∫ P(X, π, μ, z, Σ, K, m) dπ dμ dΣ
Factor the joint two ways:
P(π, μ, Σ | X, z, K, m) P(X, z, K, m) = P(X, z, K, m | π, μ, Σ) P(π, μ, Σ)
P(X, z, K, m) = P(X, z, K, m | π, μ, Σ) P(π, μ, Σ) / P(π, μ, Σ | X, z, K, m)
20 / 36
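The final identity (marginal = likelihood × prior / posterior, valid at every parameter value) can be checked in a toy conjugate setting; this 1-D Normal example is mine, not the talk's:

```python
import numpy as np

def normal_pdf(v, mean, var):
    return np.exp(-0.5 * (v - mean) ** 2 / var) / np.sqrt(2 * np.pi * var)

x = 0.5                           # one observation, x | mu ~ N(mu, 1)

# Conjugate posterior for the prior mu ~ N(0, 1): mu | x ~ N(x/2, 1/2)
post_mean, post_var = x / 2.0, 0.5

# The identity P(x) = P(x|mu) P(mu) / P(mu|x) holds at ANY choice of mu.
marginals = [normal_pdf(x, mu, 1.0) * normal_pdf(mu, 0.0, 1.0)
             / normal_pdf(mu, post_mean, post_var)
             for mu in (-1.0, 0.0, 2.0)]

exact = normal_pdf(x, 0.0, 2.0)   # analytically, x ~ N(0, 2)
```

In the talk's setting the same trick yields P(X, z, K, m) whenever the prior is conjugate enough that the posterior over (π, μ, Σ) is known in closed form.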
32-38. Numerical integration (MCMC)
Markov Chain Monte Carlo (MCMC)
Begin with an initial estimate (z₁, m₁, K₁)
At each iteration, propose to perturb (z_i, m_i, K_i) ⇒ (z*_i, m*_i, K*_i)
Similar to the current state, to enable a gradual 'climb' towards the good estimates.
Define
a_i = min[ 1, (P(X, z*_i, m*_i, K*_i) / P(X, z_i, m_i, K_i)) × (q(z_i, m_i, K_i | z*_i, m*_i, K*_i) / q(z*_i, m*_i, K*_i | z_i, m_i, K_i)) ]
(z_{i+1}, m_{i+1}, K_{i+1}) = (z*_i, m*_i, K*_i) with probability a_i.
(z_{i+1}, m_{i+1}, K_{i+1}) = (z_i, m_i, K_i) with probability 1 − a_i.
Resulting estimates will be drawn as z, m, K | X.
'Good' proposals don't affect the distribution, but they do improve speed.
21 / 36
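A minimal Metropolis-Hastings sketch over a toy discrete target, standing in for P(X, z, m, K); the five-state target is invented for illustration, and the random-walk proposal is symmetric so the q terms cancel:

```python
import numpy as np

rng = np.random.default_rng(3)

# Toy unnormalized target weights over 5 states (arbitrary numbers).
weights = np.array([1.0, 2.0, 4.0, 2.0, 1.0])

n_iter = 200_000
steps = rng.choice([-1, 1], size=n_iter)   # symmetric proposal
us = rng.random(n_iter)

state, counts = 2, np.zeros(5)
for step, u in zip(steps, us):
    proposal = min(4, max(0, state + step))            # reflect at boundaries
    a = min(1.0, weights[proposal] / weights[state])   # acceptance prob a_i
    if u < a:
        state = proposal
    counts[state] += 1

freq = counts / counts.sum()   # converges to weights / weights.sum()
```

Only ratios of the target appear, which is why having P(X, z, K, m) up to a constant is enough.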
39. The above is too slow. Still too much correlation, slowing the progress.
So I run six chains, one per model:
(z, K | X, m = vvv) : (z_i, m_i = vvv, K_i) ⇒ (z*_i, m*_i = vvv, K*_i)
(z, K | X, m = eee) : (z_i, m_i = eee, K_i) ⇒ (z*_i, m*_i = eee, K*_i)
(z, K | X, m = vvi) : (z_i, m_i = vvi, K_i) ⇒ (z*_i, m*_i = vvi, K*_i)
(z, K | X, m = eei) : (z_i, m_i = eei, K_i) ⇒ (z*_i, m*_i = eei, K*_i)
(z, K | X, m = vii) : (z_i, m_i = vii, K_i) ⇒ (z*_i, m*_i = vii, K*_i)
(z, K | X, m = eii) : (z_i, m_i = eii, K_i) ⇒ (z*_i, m*_i = eii, K*_i)
and combine the results.
Results should be combined in proportion to P(m | X).
22 / 36
41-42. (High level) description of complete algorithm:5
I run six chains (z, K | X, m). In parallel, independently of each other.
There is a variable M which is the 'current'/'best' model.
At iteration i, M is updated based on P(X, z_i, m, K_i) (and other quantities).
Can be proven that M will be distributed proportional to P(X | m=M).
To work well, need to train good pseudopriors in advance.
5Bradley P Carlin and Siddhartha Chib. Bayesian model choice via Markov chain Monte Carlo methods. In: Journal of the Royal Statistical Society, Series B (Methodological) 57.3 (1995), pp. 473-484.
24 / 36
44. Synthetic data
N = 400
K ∈ {5, 10, 20}
p ∈ {16, 4}
g₀ ∈ {p, p + 1, p + 2}
n₀ ∈ {0.001, 0.01, 0.1}
m ∈ {vvv, eee, vvi, eei, vii, eii}
324 kinds of dataset. 5 realizations of each. A total of 1620 datasets.
Ran mclust and the MCMC algorithm on each.
26 / 36
47. Synthetic data N=100 K=20
[Pairs plot of variables V1-V8 for a synthetic dataset, N=100, K=20; all axes approximately −40 to 40.]
29 / 36
48. Synthetic data N=100 K=20
[Scatter plot: V1 vs V2 (both −40 to 40), synthetic data.]
30 / 36
49. Synthetic data N=100 K=20
[Scatter plot: V1 vs V2 (both −40 to 40), synthetic data.]
31 / 36
53. N = 100, K = 20, p = 8, m = vii. 15 such datasets.

  mclust          MCMC
  K̂    m̂         K̂    m̂
  5    VVV       14   VII
  8    VII       16   VVV
  12   VII       17   VVV
  13   VII       19   VII
  17   VII       19   VII
  17   VII       19   VII
  18   VII       19   VII
  18   VII       19   VII
  18   VII       19   VII
  19   VII       19   VII
  19   VII       20   VII
  19   VII       20   VII
  20   VII       20   VII
  21   EII       20   VII
  32   EEE       20   VII
35 / 36
54. Concluding remarks
V** more difficult than E**
VVV more difficult than VVI and VII
Large K, small N: most difficult
MCMC excels here (at first, I expected differently)
Should repeat N = 100, K ∈ {10, 20} across more p, more n₀, et cetera.
36 / 36