IV Workshop Bayesian Nonparametrics, Roma, 12 Giugno 2004 1
Bayesian Inference on Mixtures
Christian P. Robert
Université Paris Dauphine
Joint work with
JEAN-MICHEL MARIN, KERRIE MENGERSEN AND JUDITH ROUSSEAU
IV Workshop Bayesian Nonparametrics, Roma, 12 Giugno 2004 2
What’s new?!
• Density approximation & consistency
• Scarcity phenomenon
• Label switching & Bayesian inference
• Nonconvergence of the Gibbs sampler & population Monte Carlo
• Comparison of RJMCMC with B&D
Intro/Inference/Algorithms/Beyond fixed k 3
1 Mixtures
Convex combination of “usual” densities (e.g., exponential family)
$$\sum_{i=1}^{k} p_i f(x\mid\theta_i)\,, \qquad \sum_{i=1}^{k} p_i = 1\,, \quad k > 1\,,$$
Intro/Inference/Algorithms/Beyond fixed k 4
[Figure: Normal mixture densities for K = 2, 5, 25, 50]
Intro/Inference/Algorithms/Beyond fixed k 5
Likelihood
$$L(\theta, p\mid x) = \prod_{i=1}^{n} \sum_{j=1}^{k} p_j f(x_i\mid\theta_j)$$
Computable in O(nk) time
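As a quick illustration (a minimal numpy sketch of my own, not part of the original slides), the O(nk) evaluation of this log-likelihood for a normal mixture could read:

import numpy as np
from scipy.stats import norm

def mixture_loglik(x, p, mu, sigma):
    # dens[i, j] = p_j * f(x_i | theta_j); summing over j and taking logs is O(nk)
    dens = p[None, :] * norm.pdf(x[:, None], loc=mu[None, :], scale=sigma[None, :])
    return np.sum(np.log(dens.sum(axis=1)))
# e.g. mixture_loglik(x, np.array([.3, .7]), np.array([0., 2.5]), np.array([1., 1.]))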
Intro:Misg/Inference/Algorithms/Beyond fixed k 6
Missing data representation
Demarginalisation
$$\sum_{i=1}^{k} p_i f(x\mid\theta_i) = \int f(x\mid\theta, z)\, f(z\mid p)\, dz$$
where
$$X\mid Z = z \sim f(x\mid\theta_z)\,, \qquad Z \sim \mathcal{M}_k(1; p_1, \ldots, p_k)$$
Missing “data” z1, . . . , zn that may or may not be meaningful
[Auxiliary variables]
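A minimal sketch (my own, for normal components) of this demarginalisation, simulating the latent allocations z and then the observations:

import numpy as np

def simulate_completed(n, p, mu, sigma, rng=np.random.default_rng(0)):
    # z_i ~ M_k(1; p_1, ..., p_k), then x_i | z_i ~ N(mu[z_i], sigma[z_i]^2)
    z = rng.choice(len(p), size=n, p=p)
    x = rng.normal(mu[z], sigma[z])
    return z, x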
Intro:Misg/Inference/Algorithms/Beyond fixed k 7
Nonparametric re-interpretation
Approximation of unknown distributions
E.g., Nadaraya–Watson kernel
$$\hat{k}_n(x\mid \mathbf{x}) = \frac{1}{n h_n} \sum_{i=1}^{n} \varphi(x; x_i, h_n)$$
Intro:Misg/Inference/Algorithms/Beyond fixed k 8
Bernstein polynomials
Bounded continuous densities on [0, 1] approximated by Beta mixtures
$$\sum_{(\alpha_k,\beta_k)\in\mathbb{N}_+^2} p_k\, \mathrm{Be}(\alpha_k, \beta_k)\,, \qquad \alpha_k, \beta_k \in \mathbb{N}^*$$
[Consistency]
Associated predictive is then
$$\hat{f}_n(x\mid \mathbf{x}) = \sum_{k=1}^{\infty} \sum_{j=1}^{k} \mathbb{E}^{\pi}[\omega_{kj}\mid \mathbf{x}]\, \mathrm{Be}(j, k + 1 - j)\, P(K = k\mid \mathbf{x})\,.$$
[Petrone and Wasserman, 2002]
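For intuition, a small sketch (my own; F and k are arbitrary choices) of the Bernstein approximation underlying this predictive, with weights ω_{kj} = F(j/k) − F((j−1)/k):

import numpy as np
from scipy.stats import beta

def bernstein_density(x, F, k):
    # order-k Bernstein (Beta-mixture) approximation of the density with cdf F on [0, 1]
    j = np.arange(1, k + 1)
    w = F(j / k) - F((j - 1) / k)                 # mixture weights omega_{kj}
    return beta.pdf(x[:, None], j[None, :], (k + 1 - j)[None, :]) @ w
# e.g. bernstein_density(np.linspace(0, 1, 101), beta(2, 5).cdf, k=20)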
Intro:Misg/Inference/Algorithms/Beyond fixed k 9
[Figure: Realisations from the Bernstein prior]
Intro:Misg/Inference/Algorithms/Beyond fixed k 10
[Figure: Realisations from a more general prior]
Intro:Constancy/Inference/Algorithms/Beyond fixed k 11
Density estimation
[CPR & Rousseau, 2000–04]
Reparameterisation of a Beta mixture
$$p_0\, \mathcal{U}(0, 1) + (1 - p_0) \sum_{k=1}^{K} p_k\, \mathrm{B}(\alpha_k\varepsilon_k, \alpha_k(1 - \varepsilon_k))\,, \qquad \sum_{k\geq 1} p_k = 1\,,$$
with density fψ
Can approximate most distributions g on [0, 1]
Assumptions
– g is piecewise continuous on {x ; g(x) < M} for all M’s
– $\int g(x) \log g(x)\, dx < \infty$
Intro:Constancy/Inference/Algorithms/Beyond fixed k 12
Prior distributions
– π(K) has a light tail
$$P(K \geq tn/\log n) \leq \exp(-rn)$$
– p0 ∼ Be(a0, b0), a0 < 1, b0 > 1
– pk ∝ ωk and ωk ∼ Be(1, k)
– location-scale “hole” prior
$$(\alpha_k, \varepsilon_k) \sim \left\{1 - \exp\left[-\left\{\beta_1(\alpha_k - 2)^{c_3} + \beta_2(\varepsilon_k - .5)^{c_4}\right\}\right]\right\} \exp\left\{-\tau_0\alpha_k^{c_0}/2 - \tau_1\big/\left\{\alpha_k^{2c_1}\varepsilon_k^{c_1}(1 - \varepsilon_k)^{c_1}\right\}\right\}\,,$$
Intro:Constancy/Inference/Algorithms/Beyond fixed k 13
Consistency results
Hellinger neighbourhood
$$A_\varepsilon(f_0) = \{f\,;\ d(f, f_0) \leq \varepsilon\}$$
Then, for all $\varepsilon > 0$,
$$\pi[A_\varepsilon(g)\mid x_{1:n}] \to 1 \quad \text{as } n \to \infty\,, \quad g\text{ a.s.}$$
and
$$\mathbb{E}^{\pi}[d(g, f_\psi)\mid x_{1:n}] \to 0\,, \quad g\text{ a.s.}$$
Extension to general parametric distributions by the cdf transform Fθ(x)
Intro/Inference/Algorithms/Beyond fixed k 14
2 [B] Inference
Difficulties:
• identifiability
• label switching
• loss function
• ordering constraints
• prior determination
Intro/Inference:Identifability/Algorithms/Beyond fixed k 15
Central (non)identifiability issue
$\sum_{j=1}^{k} p_j f(y\mid\theta_j)$ is invariant to relabelling of the components
Consequence
$((p_j, \theta_j))_{1\leq j\leq k}$ is only known up to a permutation $\tau \in \mathcal{S}_k$
Intro/Inference:Identifability/Algorithms/Beyond fixed k 16
Example 1. Two component normal mixture
p N (µ1, 1) + (1 − p) N (µ2, 1)
where p ≠ 0.5 is known
The parameters µ1 and µ2 are identifiable
Intro/Inference:Identifability/Algorithms/Beyond fixed k 17
Bimodal likelihood [500 observations and (µ1, µ2, p) = (0, 2.5, 0.7)]
Influence of p on the modes
[Figure: likelihood surfaces over (µ1, µ2) for p = 0.5, 0.6, 0.75, 0.85]
Intro/Inference:Com’ics/Algorithms/Beyond fixed k 19
Combinatorics
For a normal mixture,
pϕ(x; µ1, σ1) + (1 − p)ϕ(x; µ2, σ2)
under the pseudo-conjugate priors (i = 1, 2)
$$\mu_i\mid\sigma_i \sim \mathcal{N}(\zeta_i, \sigma_i^2/\lambda_i)\,, \quad \sigma_i^{-2} \sim \mathcal{G}a(\nu_i/2, s_i^2/2)\,, \quad p \sim \mathrm{Be}(\alpha, \beta)\,,$$
the posterior is
$$\pi(\theta, p\mid x) \propto \prod_{j=1}^{n} \left\{p\,\varphi(x_j; \mu_1, \sigma_1) + (1 - p)\,\varphi(x_j; \mu_2, \sigma_2)\right\} \pi(\theta, p)\,.$$
Computation: complexity O(2^n)
Intro/Inference:Com’ics/Algorithms/Beyond fixed k 20
Missing variables (2)
Auxiliary variables z = (z1, . . . , zn) ∈ Z associated with observations
x = (x1, . . . , xn)
For $(n_1, \ldots, n_k)$ with $n_1 + \cdots + n_k = n$,
$$\mathcal{Z}_j = \left\{ z : \sum_{i=1}^{n} \mathbb{I}_{z_i=1} = n_1, \ldots, \sum_{i=1}^{n} \mathbb{I}_{z_i=k} = n_k \right\}$$
which consists of all allocations with the given allocation vector $(n_1, \ldots, n_k)$ (with $j$ indexing the corresponding lexicographic order).
Intro/Inference:Com’ics/Algorithms/Beyond fixed k 21
Number of nonnegative integer solutions of this decomposition of $n$:
$$r = \binom{n + k - 1}{n}\,.$$
Partition
$$\mathcal{Z} = \bigcup_{i=1}^{r} \mathcal{Z}_i$$
[Number of partition sets of order $O(n^{k-1})$]
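For a feel of these orders of magnitude (my own quick check, with n and k chosen for illustration):

from math import comb

n, k = 82, 4                    # e.g. a galaxy-sized sample with 4 components
r = comb(n + k - 1, n)          # number of partition-size vectors (n_1, ..., n_k)
print(r, k**n)                  # 98770 size vectors, versus 4**82 ~ 2.3e49 allocations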
Intro/Inference:Com’ics/Algorithms/Beyond fixed k 22
Posterior decomposition
$$\pi(\theta, p\mid x) = \sum_{i=1}^{r} \sum_{z\in\mathcal{Z}_i} \omega(z)\, \pi(\theta, p\mid x, z)$$
with $\omega(z)$ the posterior probability of allocation $z$.
Corresponding representation of the posterior expectation of $(\theta, p)$:
$$\sum_{i=1}^{r} \sum_{z\in\mathcal{Z}_i} \omega(z)\, \mathbb{E}^{\pi}[\theta, p\mid x, z]$$
Intro/Inference:Com’ics/Algorithms/Beyond fixed k 23
Very sensible from an inferential point of view: the decomposition
1. considers each possible allocation z of the dataset,
2. allocates a posterior probability ω(z) to this allocation, and
3. constructs a posterior distribution for the parameters conditional on this allocation.
All possible allocations: complexity O(k^n)
Intro/Inference:Com’ics/Algorithms/Beyond fixed k 24
Posterior
For a given permutation/allocation $(k_t)$, writing $\ell$ for the number of observations allocated to the first component, the conditional posterior distribution is
$$\pi(\theta\mid(k_t)) = \mathcal{N}\!\left(\xi_1(k_t), \frac{\sigma_1^2}{n_1 + \ell}\right) \times \mathcal{IG}\!\left(\frac{\nu_1 + \ell}{2}, \frac{s_1(k_t)}{2}\right) \times \mathcal{N}\!\left(\xi_2(k_t), \frac{\sigma_2^2}{n_2 + n - \ell}\right) \times \mathcal{IG}\!\left(\frac{\nu_2 + n - \ell}{2}, \frac{s_2(k_t)}{2}\right) \times \mathrm{Be}(\alpha + \ell, \beta + n - \ell)$$
Intro/Inference:Com’ics/Algorithms/Beyond fixed k 25
where
$$\bar{x}_1(k_t) = \frac{1}{\ell}\sum_{t=1}^{\ell} x_{k_t}\,, \qquad \hat{s}_1(k_t) = \sum_{t=1}^{\ell}(x_{k_t} - \bar{x}_1(k_t))^2\,,$$
$$\bar{x}_2(k_t) = \frac{1}{n - \ell}\sum_{t=\ell+1}^{n} x_{k_t}\,, \qquad \hat{s}_2(k_t) = \sum_{t=\ell+1}^{n}(x_{k_t} - \bar{x}_2(k_t))^2$$
and
$$\xi_1(k_t) = \frac{n_1\xi_1 + \ell\,\bar{x}_1(k_t)}{n_1 + \ell}\,, \qquad \xi_2(k_t) = \frac{n_2\xi_2 + (n - \ell)\,\bar{x}_2(k_t)}{n_2 + n - \ell}\,,$$
$$s_1(k_t) = s_1^2 + \hat{s}_1(k_t) + \frac{n_1\ell}{n_1 + \ell}(\xi_1 - \bar{x}_1(k_t))^2\,,$$
$$s_2(k_t) = s_2^2 + \hat{s}_2(k_t) + \frac{n_2(n - \ell)}{n_2 + n - \ell}(\xi_2 - \bar{x}_2(k_t))^2\,,$$
the posterior updates of the hyperparameters.
Intro/Inference:Scarcity/Algorithms/Beyond fixed k 26
Scarcity
Frustrating barrier:
Almost all posterior probabilities ω (z) are zero
Example 2. Galaxy dataset with k = 4 components: the set of allocations with partition sizes (n1, n2, n3, n4) = (7, 34, 38, 3) gets posterior probability 0.59, the set with (n1, n2, n3, n4) = (7, 30, 27, 18) gets 0.32, and no other size group gets a probability above 0.01.
Intro/Inference:Scarcity/Algorithms/Beyond fixed k 27
Example 3. Normal mean mixture
For the same normal prior on both means,
$$\mu_1, \mu_2 \sim \mathcal{N}(0, 10)\,,$$
the posterior weight associated with a $z$ such that
$$\sum_{i=1}^{n} \mathbb{I}_{z_i=1} = \ell$$
is
$$\omega(z) \propto (\ell + 1/4)(n - \ell + 1/4)\, p^{\ell} (1 - p)^{n-\ell}\,.$$
Thus the posterior distribution of $z$ only depends on $\ell$, and the distribution of the partition size is close to a binomial $\mathcal{B}(n, p)$ distribution.
Intro/Inference:Scarcity/Algorithms/Beyond fixed k 28
For two different normal priors on the means,
$$\mu_1 \sim \mathcal{N}(0, 4)\,, \qquad \mu_2 \sim \mathcal{N}(2, 4)\,,$$
the posterior weight of $z$ is
$$\omega(z) \propto (\ell + 1/4)(n - \ell + 1/4)\, p^{\ell} (1 - p)^{n-\ell} \times \exp\left\{-\left[(\ell + 1/4)\,\hat{s}_1(z) + \ell\,\{\bar{x}_1(z)\}^2/4\right]\big/2\right\} \times \exp\left\{-\left[(n - \ell + 1/4)\,\hat{s}_2(z) + (n - \ell)\,\{\bar{x}_2(z) - 2\}^2/4\right]\big/2\right\}$$
where
$$\bar{x}_1(z) = \frac{1}{\ell}\sum_{i=1}^{n} \mathbb{I}_{z_i=1}\, x_i\,, \qquad \bar{x}_2(z) = \frac{1}{n - \ell}\sum_{i=1}^{n} \mathbb{I}_{z_i=2}\, x_i\,,$$
$$\hat{s}_1(z) = \sum_{i=1}^{n} \mathbb{I}_{z_i=1}\, (x_i - \bar{x}_1(z))^2\,, \qquad \hat{s}_2(z) = \sum_{i=1}^{n} \mathbb{I}_{z_i=2}\, (x_i - \bar{x}_2(z))^2\,.$$
Intro/Inference:Scarcity/Algorithms/Beyond fixed k 29
Computing the exact weight of every partition size $\ell$ is impossible; run a Monte Carlo experiment instead, drawing the $z$'s at random.
Example 4. A sample of 45 points simulated with p = 0.7, µ1 = 0 and µ2 = 2.5 leads to $\ell = 23$ as the most likely partition size, with a weight approximated by 0.962. For $\ell = 27$, the weight is approximated by $4.56 \times 10^{-11}$.
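This Monte Carlo experiment can be mimicked along the following lines (a sketch of my own: ω(z) is computed from first principles, integrating the means out analytically under the N(0, 4) and N(2, 4) priors of the previous slide, rather than through the displayed closed form):

import numpy as np
from scipy.stats import norm

def log_marg(xs, m0, tau2):
    # log marginal likelihood of xs when xs_i | mu ~ N(mu, 1) and mu ~ N(m0, tau2)
    l = len(xs)
    if l == 0:
        return 0.0
    xbar, ss = xs.mean(), np.sum((xs - xs.mean()) ** 2)
    return (-(l / 2) * np.log(2 * np.pi) - 0.5 * ss + 0.5 * np.log(2 * np.pi / l)
            + norm.logpdf(xbar, m0, np.sqrt(1 / l + tau2)))

def log_omega(z, x, p, priors=((0.0, 4.0), (2.0, 4.0))):
    # log posterior weight (up to a constant) of the allocation z in {1, 2}^n
    l = np.sum(z == 1)
    return (l * np.log(p) + (len(x) - l) * np.log(1 - p)
            + log_marg(x[z == 1], *priors[0]) + log_marg(x[z == 2], *priors[1]))

rng = np.random.default_rng(1)
n, p = 45, 0.7
x = np.where(rng.random(n) < p, rng.normal(0.0, 1, n), rng.normal(2.5, 1, n))
zs = rng.integers(1, 3, size=(10_000, n))        # allocations drawn at random
lw = np.array([log_omega(z, x, p) for z in zs])  # log-weights, up to a common constant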
Intro/Inference:Scarcity/Algorithms/Beyond fixed k 30
[Figure: histograms of the log-weights log ω(z) for allocations with ℓ = 23 and ℓ = 29]
Intro/Inference:Scarcity/Algorithms/Beyond fixed k 31
Ten highest log-weights ω (z) (up to an additive constant)
Intro/Inference:Scarcity/Algorithms/Beyond fixed k 32
Most likely allocation z for a simulated dataset of 45 observations
Intro/Inference:Scarcity/Algorithms/Beyond fixed k 33
Caution! We simulated 450,000 allocations, to be compared with a total of 2^45 possible allocations!
Intro/Inference: Priors/Algorithms/Beyond fixed k 34
Prior selection
Basic difficulty: if an exchangeable prior is used on
θ = (θ1, . . . , θk)
then all marginals on the θi's are identical
Posterior expectation of θ1 identical to posterior expectation of θ2!
Intro/Inference: Priors/Algorithms/Beyond fixed k 35
Identifiability constraints
Prior restriction by identifiability constraint on the mixture parameters, for instance
by ordering the means [or the variances or the weights]
Not so innocuous!
• truncation unrelated to the topology of the posterior distribution
• may induce a posterior expectation in a low probability region
• modifies the prior modelling
[Figure: panels for the order statistics θ(1), θ(10), θ(19) under the ordering constraint]
Intro/Inference: Priors/Algorithms/Beyond fixed k 36
• with many components, ordering in terms of one type of parameter is unrealistic
• poor estimation (posterior mean)
[Figure: output of Gibbs sampling, random walk, Langevin and tempered random walk samplers, panels for p, theta and tau]
• poor exploration (MCMC)
Intro/Inference: Priors/Algorithms/Beyond fixed k 37
Improper priors??
Independent improper priors,
$$\pi(\theta) = \prod_{i=1}^{k} \pi_i(\theta_i)\,,$$
cannot be used since, if
$$\int \pi_i(\theta_i)\, d\theta_i = \infty\,,$$
then, for every $n$,
$$\int \pi(\theta, p\mid x)\, d\theta\, dp = \infty$$
Still, some improper priors can be used when the impropriety is on a common
(location/scale) parameter
[CPR & Titterington, 1998]
Intro/Inference: Loss/Algorithms/Beyond fixed k 38
Loss functions
Once a sample can be produced from the unconstrained posterior distribution, an
ordering constraint can be imposed ex post
[Stephens, 1997]
Good for MCMC exploration
Intro/Inference: Loss/Algorithms/Beyond fixed k 39
Again, difficult assessment of the true effect of the ordering constraints...
order p1 p2 p3 θ1 θ2 θ3 σ1 σ2 σ3
p 0.231 0.311 0.458 0.321 -0.55 2.28 0.41 0.471 0.303
θ 0.297 0.246 0.457 -1.1 0.83 2.33 0.357 0.543 0.284
σ 0.375 0.331 0.294 1.59 0.083 0.379 0.266 0.34 0.579
true 0.22 0.43 0.35 1.1 2.4 -0.95 0.3 0.2 0.5
Intro/Inference: Loss/Algorithms/Beyond fixed k 40
Pivotal quantity
For a permutation τ ∈ Sk, corresponding permutation of the parameter
$$\tau(\theta, p) = \left((\theta_{\tau(1)}, \ldots, \theta_{\tau(k)}),\ (p_{\tau(1)}, \ldots, p_{\tau(k)})\right)$$
does not modify the value of the likelihood (& posterior under exchangeability).
Label switching phenomenon
Intro/Inference: Loss/Algorithms/Beyond fixed k 41
Reordering scheme:
Based on a simulated sample of size M,
(i) compute the pivot $(\theta, p)^{(i^*)}$ such that
$$i^* = \arg\max_{i=1,\ldots,M} \pi\left((\theta, p)^{(i)}\mid x\right)\,,$$
a Monte Carlo approximation of the MAP estimator of $(\theta, p)$.
(ii) For $i \in \{1, \ldots, M\}$:
1. Compute
$$\tau_i = \arg\min_{\tau\in\mathcal{S}_k} d\left(\tau\left((\theta, p)^{(i)}\right), (\theta, p)^{(i^*)}\right)$$
2. Set $(\theta, p)^{(i)} = \tau_i\left((\theta, p)^{(i)}\right)$.
Intro/Inference: Loss/Algorithms/Beyond fixed k 42
Step (ii) chooses the reordering closest to the MAP estimator
After reordering, the Monte Carlo posterior expectation is
$$\frac{1}{M}\sum_{j=1}^{M} (\theta_i)^{(j)}\,.$$
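A sketch of this reordering scheme (my own code; the sample is stored as an (M, k, d) array of per-component parameter vectors and d is taken as the Euclidean distance):

import numpy as np
from itertools import permutations

def relabel(sample, logpost):
    # reorder every draw by the permutation closest to the MAP draw (the pivot)
    pivot = sample[np.argmax(logpost)]
    k = sample.shape[1]
    perms = [list(t) for t in permutations(range(k))]
    out = np.empty_like(sample)
    for i, draw in enumerate(sample):
        dists = [np.sum((draw[t] - pivot) ** 2) for t in perms]
        out[i] = draw[perms[int(np.argmin(dists))]]
    return out
# posterior means of the relabelled components: relabel(sample, logpost).mean(axis=0)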
Intro/Inference: Loss/Algorithms/Beyond fixed k 43
Probabilistic alternative
[Jasra, Holmes & Stephens, 2004]
Also put a prior on permutations σ ∈ Sk
Defines a specific model M based on a preliminary estimate (e.g., by relabelling)
Computes
$$\hat{\theta}_j = \frac{1}{N}\sum_{t=1}^{N} \sum_{\sigma\in\mathcal{S}_k} \theta^{(t)}_{\sigma(j)}\, p(\sigma\mid\theta^{(t)}, M)$$
Intro/Inference/Algorithms/Beyond fixed k 44
3 Computations
Intro/Inference/Algorithms: Gibbs/Beyond fixed k 45
3.1 Gibbs sampling
Same idea as the EM algorithm: take advantage of the missing data representation
General Gibbs sampling for mixture models
0. Initialization: choose $p^{(0)}$ and $\theta^{(0)}$ arbitrarily
1. Step t. For $t = 1, \ldots$
1.1 Generate $z_i^{(t)}$ $(i = 1, \ldots, n)$ from
$$P\left(z_i^{(t)} = j \mid p_j^{(t-1)}, \theta_j^{(t-1)}, x_i\right) \propto p_j^{(t-1)} f\left(x_i \mid \theta_j^{(t-1)}\right)\,, \qquad j = 1, \ldots, k$$
1.2 Generate $p^{(t)}$ from $\pi(p\mid z^{(t)})$,
1.3 Generate $\theta^{(t)}$ from $\pi(\theta\mid z^{(t)}, x)$.
Intro/Inference/Algorithms: Gibbs/Beyond fixed k 46
Trapping states
Gibbs sampling may lead to trapping states, concentrated local modes that require
an enormous number of iterations to escape from, e.g., components with a small
number of allocated observations and very small variance
[Diebolt & CPR, 1990]
Also, most MCMC samplers fail to reproduce the permutation invariance of the
posterior distribution, that is, do not visit the k! replications of a given mode.
[Celeux, Hurn & CPR, 2000]
Intro/Inference/Algorithms: Gibbs/Beyond fixed k 47
Example 5. Mean normal mixture
0. Initialization. Choose $\mu_1^{(0)}$ and $\mu_2^{(0)}$,
1. Step t. For $t = 1, \ldots$
1.1 Generate $z_i^{(t)}$ $(i = 1, \ldots, n)$ from
$$P\left(z_i^{(t)} = 1\right) = 1 - P\left(z_i^{(t)} = 2\right) \propto p\, \exp\left\{-\frac{1}{2}\left(x_i - \mu_1^{(t-1)}\right)^2\right\}$$
1.2 Compute
$$n_j^{(t)} = \sum_{i=1}^{n} \mathbb{I}_{z_i^{(t)}=j} \qquad\text{and}\qquad (s_j^x)^{(t)} = \sum_{i=1}^{n} \mathbb{I}_{z_i^{(t)}=j}\, x_i$$
1.3 Generate $\mu_j^{(t)}$ $(j = 1, 2)$ from
$$\mathcal{N}\left(\frac{\lambda\delta + (s_j^x)^{(t)}}{\lambda + n_j^{(t)}},\ \frac{1}{\lambda + n_j^{(t)}}\right)\,.$$
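A direct transcription (a sketch of my own, under the assumption that the prior is µ_j ∼ N(δ, 1/λ) with p known, which matches the conditional used in step 1.3):

import numpy as np

def gibbs_mean_mixture(x, p=0.7, lam=0.1, delta=0.0, T=10_000, rng=np.random.default_rng(2)):
    mu = rng.normal(delta, 1 / np.sqrt(lam), 2)            # initial (mu1, mu2)
    out = np.empty((T, 2))
    for t in range(T):
        # 1.1: allocate each observation, P(z_i = 1) prop. to p exp(-(x_i - mu1)^2 / 2)
        w1 = p * np.exp(-0.5 * (x - mu[0]) ** 2)
        w2 = (1 - p) * np.exp(-0.5 * (x - mu[1]) ** 2)
        z = (rng.random(len(x)) < w2 / (w1 + w2)).astype(int)   # 0 = comp. 1, 1 = comp. 2
        # 1.2 and 1.3: sufficient statistics, then conditional draws of the means
        for j in (0, 1):
            nj, sj = np.sum(z == j), np.sum(x[z == j])
            mu[j] = rng.normal((lam * delta + sj) / (lam + nj), 1 / np.sqrt(lam + nj))
        out[t] = mu
    return out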
Intro/Inference/Algorithms: Gibbs/Beyond fixed k 48
[Figure: Gibbs output for (µ1, µ2)]
Intro/Inference/Algorithms: Gibbs/Beyond fixed k 49
But...
[Figure: Gibbs output for (µ1, µ2)]
Intro/Inference/Algorithms: HM/Beyond fixed k 50
3.2 Metropolis–Hastings
Missing data structure is not necessary for MCMC implementation: the mixture
likelihood is available in closed form and computable in O(kn) time:
Intro/Inference/Algorithms: HM/Beyond fixed k 51
Step t. For t = 1, . . .
1.1 Generate $(\theta, p)$ from $q\left(\theta, p\mid\theta^{(t-1)}, p^{(t-1)}\right)$,
1.2 Compute
$$r = \frac{f(x\mid\theta, p)\, \pi(\theta, p)\, q\left(\theta^{(t-1)}, p^{(t-1)}\mid\theta, p\right)}{f\left(x\mid\theta^{(t-1)}, p^{(t-1)}\right)\, \pi\left(\theta^{(t-1)}, p^{(t-1)}\right)\, q\left(\theta, p\mid\theta^{(t-1)}, p^{(t-1)}\right)}\,,$$
1.3 Generate $u \sim \mathcal{U}_{[0,1]}$:
if $u < r$, take $(\theta^{(t)}, p^{(t)}) = (\theta, p)$;
else $(\theta^{(t)}, p^{(t)}) = (\theta^{(t-1)}, p^{(t-1)})$.
Intro/Inference/Algorithms: HM/Beyond fixed k 52
Proposal
Use of random walk inefficient for constrained parameters like the weights and the
variances.
Reparameterisation:
For the weights p, overparameterise the model as
$$p_j = w_j \Big/ \sum_{l=1}^{k} w_l\,, \qquad w_j > 0$$
[Cappé, Rydén & CPR]
The $w_j$'s are not identifiable, but this is not a problem.
The proposed move on the $w_j$'s is
$$\log(w_j) = \log\left(w_j^{(t-1)}\right) + u_j\,, \qquad u_j \sim \mathcal{N}(0, \zeta^2)$$
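As a sketch (my own, with ζ arbitrary), the move on the unnormalised weights and the induced p:

import numpy as np

def propose_weights(w_prev, zeta=0.5, rng=np.random.default_rng(3)):
    # log-scale Gaussian random walk on the w_j's; the weights are p_j = w_j / sum_l w_l
    w = np.exp(np.log(w_prev) + rng.normal(0, zeta, size=w_prev.shape))
    return w, w / w.sum()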
Intro/Inference/Algorithms: HM/Beyond fixed k 53
Example 6. Mean normal mixture
Gaussian random walk proposal
$$\mu_1 \sim \mathcal{N}\left(\mu_1^{(t-1)}, \zeta^2\right) \qquad\text{and}\qquad \mu_2 \sim \mathcal{N}\left(\mu_2^{(t-1)}, \zeta^2\right)$$
associated with
Intro/Inference/Algorithms: HM/Beyond fixed k 54
0. Initialization. Choose $\mu_1^{(0)}$ and $\mu_2^{(0)}$
1. Step t. For $t = 1, \ldots$
1.1 Generate $\mu_j$ $(j = 1, 2)$ from $\mathcal{N}\left(\mu_j^{(t-1)}, \zeta^2\right)$,
1.2 Compute
$$r = \frac{f(x\mid\mu_1, \mu_2)\, \pi(\mu_1, \mu_2)}{f\left(x\mid\mu_1^{(t-1)}, \mu_2^{(t-1)}\right)\, \pi\left(\mu_1^{(t-1)}, \mu_2^{(t-1)}\right)}\,,$$
1.3 Generate $u \sim \mathcal{U}_{[0,1]}$:
if $u < r$, take $\left(\mu_1^{(t)}, \mu_2^{(t)}\right) = (\mu_1, \mu_2)$;
else $\left(\mu_1^{(t)}, \mu_2^{(t)}\right) = \left(\mu_1^{(t-1)}, \mu_2^{(t-1)}\right)$.
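The corresponding random-walk sampler, as a sketch (my own code; prior µ_j ∼ N(δ, 1/λ) as in the Gibbs example, scale ζ arbitrary):

import numpy as np
from scipy.stats import norm

def mh_mean_mixture(x, p=0.7, zeta=0.1, lam=0.1, delta=0.0, T=10_000,
                    rng=np.random.default_rng(4)):
    def log_target(mu):
        lik = np.sum(np.log(p * norm.pdf(x, mu[0], 1) + (1 - p) * norm.pdf(x, mu[1], 1)))
        return lik + np.sum(norm.logpdf(mu, delta, 1 / np.sqrt(lam)))
    mu = rng.normal(delta, 1 / np.sqrt(lam), 2)
    lp = log_target(mu)
    out = np.empty((T, 2))
    for t in range(T):
        prop = mu + rng.normal(0, zeta, 2)            # symmetric proposal, q cancels in r
        lp_prop = log_target(prop)
        if np.log(rng.random()) < lp_prop - lp:       # accept with probability min(1, r)
            mu, lp = prop, lp_prop
        out[t] = mu
    return out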
Intro/Inference/Algorithms: HM/Beyond fixed k 55
[Figure: Metropolis–Hastings output for (µ1, µ2)]
Intro/Inference/Algorithms: PMC/Beyond fixed k 56
3.3 Population Monte Carlo
Idea: apply dynamic importance sampling to simulate a sequence of iid samples
$$\mathbf{x}^{(t)} = \left(x_1^{(t)}, \ldots, x_n^{(t)}\right) \overset{\text{iid}}{\approx} \pi(x)$$
where $t$ is a simulation iteration index (at sample level)
Intro/Inference/Algorithms: PMC/Beyond fixed k 57
Dependent importance sampling
The importance distribution of the sample $\mathbf{x}^{(t)}$,
$$q_t\left(\mathbf{x}^{(t)}\mid\mathbf{x}^{(t-1)}\right)\,,$$
can depend on the previous sample $\mathbf{x}^{(t-1)}$ in any possible way, as long as the marginal distributions
$$q_{it}(x) = \int q_t\left(\mathbf{x}^{(t)}\right) d\mathbf{x}^{(t)}_{-i}$$
can be expressed to build the importance weights
$$\varrho_{it} = \frac{\pi\left(x_i^{(t)}\right)}{q_{it}\left(x_i^{(t)}\right)}$$
Intro/Inference/Algorithms: PMC/Beyond fixed k 58
Special case
$$q_t\left(\mathbf{x}^{(t)}\mid\mathbf{x}^{(t-1)}\right) = \prod_{i=1}^{n} q_{it}\left(x_i^{(t)}\mid\mathbf{x}^{(t-1)}\right)$$
[Independent proposals]
In that case,
$$\mathrm{var}\left(\hat{I}_t\right) = \frac{1}{n^2}\sum_{i=1}^{n} \mathrm{var}\left(\varrho_{it}\, h\left(x_i^{(t)}\right)\right)\,.$$
Intro/Inference/Algorithms: PMC/Beyond fixed k 59
Population Monte Carlo (PMC)
Use the previous sample $(\mathbf{x}^{(t)})$, marginally distributed from $\pi$:
$$\mathbb{E}\left[\varrho_{it}\, h\left(X_i^{(t)}\right)\right] = \mathbb{E}\left[\int \frac{\pi\left(x_i^{(t)}\right)}{q_{it}\left(x_i^{(t)}\right)}\, h\left(x_i^{(t)}\right)\, q_{it}\left(x_i^{(t)}\right)\, dx_i^{(t)}\right] = \mathbb{E}\left[\mathbb{E}^{\pi}[h(X)]\right]$$
to improve on the approximation of $\pi$
Intro/Inference/Algorithms: PMC/Beyond fixed k 60
Resampling
Over iterations (in $t$), the weights may degenerate: e.g., $\varrho_{1t} \approx 1$ while $\varrho_{2t}, \ldots, \varrho_{nt}$ are negligible.
Use instead Rubin's (1987) systematic resampling: at each iteration, resample the $x_i^{(t)}$'s according to their weights $\varrho_{it}$ and reset the weights to 1 (preserves “unbiasedness”, increases variance)
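A two-line sketch of such a resampling step (my own; multinomial resampling of an array of particles):

import numpy as np

def resample(particles, weights, rng=np.random.default_rng(5)):
    # draw M indices with replacement, proportionally to the weights, then reset to 1/M
    idx = rng.choice(len(particles), size=len(particles), p=weights / weights.sum())
    return particles[idx]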
Intro/Inference/Algorithms: PMC/Beyond fixed k 61
PMC for mixtures
Proposal distributions $q_{it}$ that simulate $\left(\theta^{(i)}(t), p^{(i)}(t)\right)$, with associated importance weight
$$\rho^{(i)}(t) = \frac{f\left(x\mid\theta^{(i)}(t), p^{(i)}(t)\right)\, \pi\left(\theta^{(i)}(t), p^{(i)}(t)\right)}{q_{it}\left(\theta^{(i)}(t), p^{(i)}(t)\right)}\,, \qquad i = 1, \ldots, M$$
Approximations of the form
$$\sum_{i=1}^{M} \frac{\rho^{(i)}(t)}{\sum_{l=1}^{M} \rho^{(l)}(t)}\, h\left(\theta^{(i)}(t), p^{(i)}(t)\right)$$
give (almost) unbiased estimators of $\mathbb{E}^{\pi}_x[h(\theta, p)]$,
Intro/Inference/Algorithms: PMC/Beyond fixed k 62
0. Initialization. Choose $\theta^{(1)}(0), \ldots, \theta^{(M)}(0)$ and $p^{(1)}(0), \ldots, p^{(M)}(0)$
1. Step t. For $t = 1, \ldots, T$
1.1 For $i = 1, \ldots, M$
1.1.1 Generate $\left(\theta^{(i)}(t), p^{(i)}(t)\right)$ from $q_{it}(\theta, p)$,
1.1.2 Compute
$$\rho^{(i)} = f\left(x\mid\theta^{(i)}(t), p^{(i)}(t)\right)\, \pi\left(\theta^{(i)}(t), p^{(i)}(t)\right) \Big/ q_{it}\left(\theta^{(i)}(t), p^{(i)}(t)\right)\,,$$
1.2 Compute $\omega^{(i)} = \rho^{(i)} \Big/ \sum_{l=1}^{M} \rho^{(l)}$,
1.3 Resample $M$ values with replacement from the $\left(\theta^{(i)}(t), p^{(i)}(t)\right)$'s using the weights $\omega^{(i)}$
Intro/Inference/Algorithms: PMC/Beyond fixed k 63
Example 7. Mean normal mixture
Implementation without the Gibbs augmentation step, using normal random walk proposals based on the previous sample of $(\mu_1, \mu_2)$'s, as in Metropolis–Hastings.
Selection of a “proper” scale: bypassed by the adaptivity of the PMC algorithm.
Use several proposals associated with a range of variances $v_k$, $k = 1, \ldots, K$.
At each step, the new variances can be selected proportionally to the performance of the scales $v_k$ over the previous iterations, for instance proportionally to their non-degeneracy rates.
Intro/Inference/Algorithms: PMC/Beyond fixed k 64
Step t. For $t = 1, \ldots, T$
1.1 For $i = 1, \ldots, M$
1.1.1 Generate $k$ from $\mathcal{M}(1; r_1, \ldots, r_K)$,
1.1.2 Generate $(\mu_j)^{(i)}(t)$ $(j = 1, 2)$ from $\mathcal{N}\left((\mu_j)^{(i)}(t-1),\ v_k\right)$,
1.1.3 Compute
$$\rho^{(i)} = \frac{f\left(x\mid(\mu_1)^{(i)}(t), (\mu_2)^{(i)}(t)\right)\, \pi\left((\mu_1)^{(i)}(t), (\mu_2)^{(i)}(t)\right)}{\sum_{l=1}^{K} \prod_{j=1}^{2} \varphi\left((\mu_j)^{(i)}(t);\ (\mu_j)^{(i)}(t-1),\ v_l\right)}\,,$$
1.2 Compute $\omega^{(i)} = \rho^{(i)} \Big/ \sum_{l=1}^{M} \rho^{(l)}$,
1.3 Resample the $\left((\mu_1)^{(i)}(t), (\mu_2)^{(i)}(t)\right)$'s using the weights $\omega^{(i)}$,
1.4 Update the $r_l$'s: $r_l$ is proportional to the number of $\left((\mu_1)^{(i)}(t), (\mu_2)^{(i)}(t)\right)$'s generated with variance $v_l$ that are resampled.
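A compact sketch of this adaptive PMC scheme (my own code: priors as in the earlier examples, the denominator of the weight uses the full mixture of kernels with the current r_l's, and the update of the r_l's is smoothed so that no scale ever gets probability zero; these last two points are implementation choices, not prescriptions from the slides):

import numpy as np
from scipy.stats import norm

def pmc_mean_mixture(x, p=0.7, lam=0.1, delta=0.0, M=1000,
                     scales=(0.01, 0.1, 1.0, 10.0), T=20, rng=np.random.default_rng(6)):
    def log_target(mu):                               # mu has shape (M, 2)
        lik = np.log(p * norm.pdf(x[None, :], mu[:, [0]], 1)
                     + (1 - p) * norm.pdf(x[None, :], mu[:, [1]], 1)).sum(axis=1)
        return lik + norm.logpdf(mu, delta, 1 / np.sqrt(lam)).sum(axis=1)
    v = np.asarray(scales)
    r = np.full(len(v), 1 / len(v))
    mu = rng.normal(delta, 1 / np.sqrt(lam), (M, 2))  # initial population
    for t in range(T):
        k = rng.choice(len(v), size=M, p=r)           # pick one scale per particle
        prop = mu + rng.normal(0, np.sqrt(v[k])[:, None], (M, 2))
        kern = np.zeros(M)                            # mixture-of-kernels proposal density
        for l, vl in enumerate(v):
            kern += r[l] * np.exp(norm.logpdf(prop, mu, np.sqrt(vl)).sum(axis=1))
        logw = log_target(prop) - np.log(kern)
        w = np.exp(logw - logw.max())
        w /= w.sum()
        idx = rng.choice(M, size=M, p=w)              # resample the population
        mu = prop[idx]
        counts = np.bincount(k[idx], minlength=len(v))
        r = (counts + 1) / (counts + 1).sum()         # non-degeneracy based update of r_l
    return mu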
Intro/Inference/Algorithms: PMC/Beyond fixed k 65
[Figure: PMC output for (µ1, µ2)]
Intro/Inference/Algorithms/Beyond fixed k 66
4 Unknown number of components
When the number $k$ of components is unknown, there are several models $M_k$, with corresponding parameter sets $\Theta_k$, in competition.
Intro/Inference/Algorithms/Beyond fixed k: RJ 67
Reversible jump MCMC
Reversibility constraint put on dimension-changing moves that bridge the sets Θk /
the models Mk
[Green, 1995]
Local reversibility for each pair (k1, k2) of possible values of k: supplement Θk1
and Θk2 with adequate artificial spaces in order to create a bijection between them:
Intro/Inference/Algorithms/Beyond fixed k: RJ 68
Basic steps
Choice of probabilities $\pi_{ij}$ $\left(\sum_j \pi_{ij} = 1\right)$ of jumping to model $M_{k_j}$ while in model $M_{k_i}$
$\theta^{(k_1)}$ is completed by a simulation $u_1 \sim g_1(u_1)$ into $(\theta^{(k_1)}, u_1)$, and $\theta^{(k_2)}$ by $u_2 \sim g_2(u_2)$ into $(\theta^{(k_2)}, u_2)$, with
$$\left(\theta^{(k_2)}, u_2\right) = T_{k_1\to k_2}\left(\theta^{(k_1)}, u_1\right)\,,$$
Intro/Inference/Algorithms/Beyond fixed k: RJ 69
Green reversible jump algorithm
0. At iteration $t$, if $x^{(t)} = (m, \theta^{(m)})$,
1. Select model $M_n$ with probability $\pi_{mn}$,
2. Generate $u_{mn} \sim \varphi_{mn}(u)$,
3. Set $\left(\theta^{(n)}, v_{nm}\right) = T_{m\to n}\left(\theta^{(m)}, u_{mn}\right)$,
4. Take $x^{(t+1)} = (n, \theta^{(n)})$ with probability
$$\min\left(\frac{\pi(n, \theta^{(n)})}{\pi(m, \theta^{(m)})}\, \frac{\pi_{nm}\,\varphi_{nm}(v_{nm})}{\pi_{mn}\,\varphi_{mn}(u_{mn})}\, \left|\frac{\partial T_{m\to n}(\theta^{(m)}, u_{mn})}{\partial(\theta^{(m)}, u_{mn})}\right|\,,\ 1\right)\,,$$
and take $x^{(t+1)} = x^{(t)}$ otherwise.
Intro/Inference/Algorithms/Beyond fixed k: RJ 70
Example 8. For a normal mixture
$$M_k : \quad \sum_{j=1}^{k} p_{jk}\, \mathcal{N}\left(\mu_{jk}, \sigma^2_{jk}\right)\,,$$
restriction to moves from $M_k$ to the neighbouring models $M_{k+1}$ and $M_{k-1}$.
[Richardson & Green, 1997]
Intro/Inference/Algorithms/Beyond fixed k: RJ 71
Birth and death steps
birth adds a new normal component generated from the prior
death removes one of the k components at random.
Birth acceptance probability
$$\min\left(\frac{\pi_{(k+1)k}}{\pi_{k(k+1)}}\, \frac{(k+1)!}{k!}\, \frac{\pi_{k+1}(\theta_{k+1})}{\pi_k(\theta_k)\, (k+1)\, \varphi_{k(k+1)}(u_{k(k+1)})}\,,\ 1\right) = \min\left(\frac{\pi_{(k+1)k}}{\pi_{k(k+1)}}\, \frac{\varpi(k+1)}{\varpi(k)}\, \frac{\ell_{k+1}(\theta_{k+1})\, (1 - p_{k+1})^{k-1}}{\ell_k(\theta_k)}\,,\ 1\right)\,,$$
where $\varpi(k)$ is the prior probability of model $M_k$ and $\ell_k$ the likelihood under $M_k$
Intro/Inference/Algorithms/Beyond fixed k: RJ 72
Proposal that can work well in some settings, but can also be inefficient (i.e. high
rejection rate), if the prior is vague.
Alternative: devise more local jumps between models,
(i). split
$$\begin{cases} p_{jk} = p_{j(k+1)} + p_{(j+1)(k+1)} \\ p_{jk}\,\mu_{jk} = p_{j(k+1)}\,\mu_{j(k+1)} + p_{(j+1)(k+1)}\,\mu_{(j+1)(k+1)} \\ p_{jk}\,\sigma^2_{jk} = p_{j(k+1)}\,\sigma^2_{j(k+1)} + p_{(j+1)(k+1)}\,\sigma^2_{(j+1)(k+1)} \end{cases}$$
(ii). merge (reverse)
Intro/Inference/Algorithms/Beyond fixed k: RJ 73
[Figure: histogram and raw plot of 100,000 values of k produced by RJMCMC]
Intro/Inference/Algorithms/Beyond fixed k: RJ 74
Normalised enzyme dataset
Intro/Inference/Algorithms/Beyond fixed k: b&d 75
Birth and Death processes
Use of an alternative methodology based on a Birth–&-Death (point) process
Idea: Create a Markov chain in continuous time, i.e. a Markov jump process,
moving between models Mk, by births (to increase the dimension), deaths (to
decrease the dimension), and other moves
[Preston, 1976; Ripley, 1977; Stephens, 1999]
Intro/Inference/Algorithms/Beyond fixed k: b&d 76
Time till next modification (jump) is exponentially distributed with rate depending on
current state
Remember: if $\xi_1, \ldots, \xi_v$ are independent exponentially distributed random variables, $\xi_i \sim \mathcal{E}xp(\lambda_i)$, then
$$\min_i \xi_i \sim \mathcal{E}xp\left(\sum_i \lambda_i\right)$$
Difference with MH-MCMC: Whenever a jump occurs, the corresponding move is
always accepted. Acceptance probabilities replaced with holding times.
Intro/Inference/Algorithms/Beyond fixed k: b&d 77
Balance condition
Sufficient to have detailed balance
$$L(\theta)\,\pi(\theta)\, q(\theta, \theta') = L(\theta')\,\pi(\theta')\, q(\theta', \theta) \qquad \text{for all } \theta, \theta'$$
for $\tilde\pi(\theta) \propto L(\theta)\pi(\theta)$ to be stationary.
Here $q(\theta, \theta')$ is the rate of moving from state $\theta$ to $\theta'$.
Possibility to add split/merge and fixed-k processes if balance condition satisfied.
[Cappé, Rydén & CPR, 2002]
Intro/Inference/Algorithms/Beyond fixed k: b&d 78
Case of mixtures
Representation as a (marked) point process
$$\Phi = \left\{p_j, (\mu_j, \sigma_j)\right\}_j$$
Birth rate $\lambda_0$ (constant) and proposal from the prior
Death rate $\delta_j(\Phi)$ for removal of component $j$
Overall death rate
$$\sum_{j=1}^{k} \delta_j(\Phi) = \delta(\Phi)$$
Balance condition
$$(k + 1)\, d\left(\Phi \cup \{p, (\mu, \sigma)\}\right)\, L\left(\Phi \cup \{p, (\mu, \sigma)\}\right) = \lambda_0\, L(\Phi)\, \frac{\pi(k)}{\pi(k + 1)}$$
with
$$d\left(\Phi \setminus \{p_j, (\mu_j, \sigma_j)\}\right) = \delta_j(\Phi)$$
Intro/Inference/Algorithms/Beyond fixed k: b&d 79
Stephens' original algorithm:
For $v = 0, 1, \ldots, V$
$t \leftarrow v$
Run till $t > v + 1$
1. Compute
$$\delta_j(\Phi) = \frac{L(\Phi \setminus \Phi_j)}{L(\Phi)}\,\lambda_0\,\lambda_1\,, \qquad \Phi_j = \{p_j, (\mu_j, \sigma_j)\}$$
2. $\delta(\Phi) \leftarrow \sum_{j=1}^{k} \delta_j(\Phi)$, $\quad \xi \leftarrow \lambda_0 + \delta(\Phi)$, $\quad u \sim \mathcal{U}([0, 1])$
3. $t \leftarrow t - \log(u)/\xi$
Intro/Inference/Algorithms/Beyond fixed k: b&d 80
4. With probability $\delta(\Phi)/\xi$:
remove component $j$ with probability $\delta_j(\Phi)/\delta(\Phi)$,
$k \leftarrow k - 1$,
$p_\ell \leftarrow p_\ell/(1 - p_j)$ $(\ell \neq j)$.
Otherwise,
add a component $j$ from the prior $\pi(\mu_j, \sigma_j)$,
$p_j \sim \mathrm{Be}(\gamma, k\gamma)$,
$p_\ell \leftarrow p_\ell\,(1 - p_j)$ $(\ell \neq j)$,
$k \leftarrow k + 1$.
5. Run $I$ steps of MCMC$(k, \beta, p)$
Intro/Inference/Algorithms/Beyond fixed k: b&d 81
Rescaling time
In discrete-time RJMCMC, let the time unit be $1/N$ and put
$$\beta_k = \lambda_k/N \qquad\text{and}\qquad \delta_k = 1 - \lambda_k/N$$
As $N \to \infty$, each birth proposal will be accepted and, with $k$ components, births occur according to a Poisson process with rate $\lambda_k$,
Intro/Inference/Algorithms/Beyond fixed k: b&d 82
while component $(w, \varphi)$ dies with rate
$$\lim_{N\to\infty} N\,\delta_{k+1} \times \frac{1}{k+1} \times \min\left(A^{-1}, 1\right) = \lim_{N\to\infty} N \times \frac{1}{k+1} \times \left(\text{likelihood ratio}\right)^{-1} \times \frac{\beta_k}{\delta_{k+1}} \times \frac{b(w, \varphi)}{(1 - w)^{k-1}} = \left(\text{likelihood ratio}\right)^{-1} \times \frac{\lambda_k}{k+1} \times \frac{b(w, \varphi)}{(1 - w)^{k-1}}\,.$$
Hence
“RJMCMC→BDMCMC”
Intro/Inference/Algorithms/Beyond fixed k: b&d 83
Even closer to RJMCMC
Exponential (random) sampling is not necessary, nor is continuous time!
Estimator of
$$I = \int g(\theta)\,\pi(\theta)\, d\theta$$
by
$$\hat{I} = \frac{1}{N}\sum_{i=1}^{N} g(\theta(\tau_i))$$
where $\{\theta(t)\}$ is the continuous-time MCMC process and $\tau_1, \ldots, \tau_N$ are the sampling instants.
Intro/Inference/Algorithms/Beyond fixed k: b&d 84
New notations:
1. $T_n$: time of the $n$-th jump of $\{\theta(t)\}$, with $T_0 = 0$
2. $\{\theta_n\}$: jump chain of states visited by $\{\theta(t)\}$
3. $\lambda(\theta)$: total rate of $\{\theta(t)\}$ leaving state $\theta$
Then the holding time $T_n - T_{n-1}$ of $\{\theta(t)\}$ in state $\theta_{n-1}$ is an exponential rv with rate $\lambda(\theta_{n-1})$
Intro/Inference/Algorithms/Beyond fixed k: b&d 85
Rao–Blackwellisation
If the sampling interval goes to 0, the limiting case is
$$\hat{I}_\infty = \frac{1}{T_N}\sum_{n=1}^{N} g(\theta_{n-1})\,(T_n - T_{n-1})$$
Rao–Blackwellisation argument: replace $\hat{I}_\infty$ with
$$\tilde{I} = \frac{1}{T_N}\sum_{n=1}^{N} \frac{g(\theta_{n-1})}{\lambda(\theta_{n-1})} = \frac{1}{T_N}\sum_{n=1}^{N} \mathbb{E}\left[T_n - T_{n-1}\mid\theta_{n-1}\right] g(\theta_{n-1})\,.$$
Conclusion: Only simulate jumps and store average holding times!
Completely remove continuous time feature
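A sketch of the resulting estimator (my own; g_values and rates are evaluated along the jump chain, and when the realised T_N is not stored the sum of expected holding times is used in its place, a self-normalised variant):

import numpy as np

def rb_estimate(g_values, rates, total_time=None):
    # Rao-Blackwellised average: each holding time replaced by its expectation 1/lambda
    g = np.asarray(g_values, dtype=float)
    expected_holds = 1.0 / np.asarray(rates, dtype=float)
    T = total_time if total_time is not None else expected_holds.sum()
    return np.sum(g * expected_holds) / T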
Intro/Inference/Algorithms/Beyond fixed k: b&d 86
Example 9. Galaxy dataset
Comparison of RJMCMC and CTMCMC in the Galaxy dataset
[Cappé et al., 2002]
Experiment:
• Same proposals (same C code)
• Moves proposed in equal proportions by both samplers (setting the probability $P^F$ of proposing a fixed-$k$ move in RJMCMC equal to the rate $\eta^F$ at which fixed-$k$ moves are proposed in CTMCMC, and likewise $P^B = \eta^B$ for the birth moves)
• Rao–Blackwellisation
• Number of jumps (number of visited configurations) in CTMCMC equal to the number of iterations of RJMCMC
Intro/Inference/Algorithms/Beyond fixed k: b&d 87
Results:
• If one algorithm performs poorly, so does the other. (For RJMCMC this shows up as small A's, i.e. birth proposals are rarely accepted, while for BDMCMC it shows up as large δ's, i.e. new components are indeed born but die again quickly.)
• No significant difference between the samplers with birth and death moves only
• CTMCMC slightly better than RJMCMC when split-and-combine moves are added
• Marginal advantage in accuracy from the split-and-combine addition
• With split-and-combine moves, the computation time of one step of continuous-time simulation is about 5 times that of one reversible jump step.
Intro/Inference/Algorithms/Beyond fixed k: b&d 88
Box plot for the estimated posterior on k obtained from 200 independent runs:
RJMCMC (top) and BDMCMC (bottom). The number of iterations varies from 5 000
(left), to 50 000 (middle) and 500 000 (right).
Intro/Inference/Algorithms/Beyond fixed k: b&d 89
Same for the estimated posterior on k obtained from 500 independent runs: Top
RJMCMC and bottom, CTMCMC. The number of iterations varies from 5 000 (left
plots) to 50 000 (right plots).
More Related Content

What's hot

Multiple estimators for Monte Carlo approximations
Multiple estimators for Monte Carlo approximationsMultiple estimators for Monte Carlo approximations
Multiple estimators for Monte Carlo approximationsChristian Robert
 
Monte Carlo in Montréal 2017
Monte Carlo in Montréal 2017Monte Carlo in Montréal 2017
Monte Carlo in Montréal 2017Christian Robert
 
random forests for ABC model choice and parameter estimation
random forests for ABC model choice and parameter estimationrandom forests for ABC model choice and parameter estimation
random forests for ABC model choice and parameter estimationChristian Robert
 
accurate ABC Oliver Ratmann
accurate ABC Oliver Ratmannaccurate ABC Oliver Ratmann
accurate ABC Oliver Ratmannolli0601
 
better together? statistical learning in models made of modules
better together? statistical learning in models made of modulesbetter together? statistical learning in models made of modules
better together? statistical learning in models made of modulesChristian Robert
 
ABC convergence under well- and mis-specified models
ABC convergence under well- and mis-specified modelsABC convergence under well- and mis-specified models
ABC convergence under well- and mis-specified modelsChristian Robert
 
An overview of Bayesian testing
An overview of Bayesian testingAn overview of Bayesian testing
An overview of Bayesian testingChristian Robert
 
Laplace's Demon: seminar #1
Laplace's Demon: seminar #1Laplace's Demon: seminar #1
Laplace's Demon: seminar #1Christian Robert
 
NCE, GANs & VAEs (and maybe BAC)
NCE, GANs & VAEs (and maybe BAC)NCE, GANs & VAEs (and maybe BAC)
NCE, GANs & VAEs (and maybe BAC)Christian Robert
 
CISEA 2019: ABC consistency and convergence
CISEA 2019: ABC consistency and convergenceCISEA 2019: ABC consistency and convergence
CISEA 2019: ABC consistency and convergenceChristian Robert
 
Coordinate sampler : A non-reversible Gibbs-like sampler
Coordinate sampler : A non-reversible Gibbs-like samplerCoordinate sampler : A non-reversible Gibbs-like sampler
Coordinate sampler : A non-reversible Gibbs-like samplerChristian Robert
 
Likelihood-free Design: a discussion
Likelihood-free Design: a discussionLikelihood-free Design: a discussion
Likelihood-free Design: a discussionChristian Robert
 
Can we estimate a constant?
Can we estimate a constant?Can we estimate a constant?
Can we estimate a constant?Christian Robert
 
Bayesian hybrid variable selection under generalized linear models
Bayesian hybrid variable selection under generalized linear modelsBayesian hybrid variable selection under generalized linear models
Bayesian hybrid variable selection under generalized linear modelsCaleb (Shiqiang) Jin
 

What's hot (20)

ABC-Gibbs
ABC-GibbsABC-Gibbs
ABC-Gibbs
 
Multiple estimators for Monte Carlo approximations
Multiple estimators for Monte Carlo approximationsMultiple estimators for Monte Carlo approximations
Multiple estimators for Monte Carlo approximations
 
Monte Carlo in Montréal 2017
Monte Carlo in Montréal 2017Monte Carlo in Montréal 2017
Monte Carlo in Montréal 2017
 
ABC-Gibbs
ABC-GibbsABC-Gibbs
ABC-Gibbs
 
random forests for ABC model choice and parameter estimation
random forests for ABC model choice and parameter estimationrandom forests for ABC model choice and parameter estimation
random forests for ABC model choice and parameter estimation
 
accurate ABC Oliver Ratmann
accurate ABC Oliver Ratmannaccurate ABC Oliver Ratmann
accurate ABC Oliver Ratmann
 
better together? statistical learning in models made of modules
better together? statistical learning in models made of modulesbetter together? statistical learning in models made of modules
better together? statistical learning in models made of modules
 
ABC convergence under well- and mis-specified models
ABC convergence under well- and mis-specified modelsABC convergence under well- and mis-specified models
ABC convergence under well- and mis-specified models
 
An overview of Bayesian testing
An overview of Bayesian testingAn overview of Bayesian testing
An overview of Bayesian testing
 
Laplace's Demon: seminar #1
Laplace's Demon: seminar #1Laplace's Demon: seminar #1
Laplace's Demon: seminar #1
 
NCE, GANs & VAEs (and maybe BAC)
NCE, GANs & VAEs (and maybe BAC)NCE, GANs & VAEs (and maybe BAC)
NCE, GANs & VAEs (and maybe BAC)
 
the ABC of ABC
the ABC of ABCthe ABC of ABC
the ABC of ABC
 
CISEA 2019: ABC consistency and convergence
CISEA 2019: ABC consistency and convergenceCISEA 2019: ABC consistency and convergence
CISEA 2019: ABC consistency and convergence
 
asymptotics of ABC
asymptotics of ABCasymptotics of ABC
asymptotics of ABC
 
Coordinate sampler : A non-reversible Gibbs-like sampler
Coordinate sampler : A non-reversible Gibbs-like samplerCoordinate sampler : A non-reversible Gibbs-like sampler
Coordinate sampler : A non-reversible Gibbs-like sampler
 
ABC-Gibbs
ABC-GibbsABC-Gibbs
ABC-Gibbs
 
Likelihood-free Design: a discussion
Likelihood-free Design: a discussionLikelihood-free Design: a discussion
Likelihood-free Design: a discussion
 
Intro to ABC
Intro to ABCIntro to ABC
Intro to ABC
 
Can we estimate a constant?
Can we estimate a constant?Can we estimate a constant?
Can we estimate a constant?
 
Bayesian hybrid variable selection under generalized linear models
Bayesian hybrid variable selection under generalized linear modelsBayesian hybrid variable selection under generalized linear models
Bayesian hybrid variable selection under generalized linear models
 

Viewers also liked

Provisional Painting
Provisional PaintingProvisional Painting
Provisional Paintingnateabels
 
Week 15 1970s-2000s art
Week 15 1970s-2000s artWeek 15 1970s-2000s art
Week 15 1970s-2000s artnateabels
 
Halloween + Art
Halloween + ArtHalloween + Art
Halloween + Artnateabels
 
Week6 mediterranean greek_part2
Week6 mediterranean greek_part2Week6 mediterranean greek_part2
Week6 mediterranean greek_part2nateabels
 
Week2 Visual Elements Part2
Week2 Visual Elements Part2Week2 Visual Elements Part2
Week2 Visual Elements Part2nateabels
 
Week6 roman middleages_part2
Week6 roman middleages_part2Week6 roman middleages_part2
Week6 roman middleages_part2nateabels
 
Week12 art between_the_wars
Week12 art between_the_warsWeek12 art between_the_wars
Week12 art between_the_warsnateabels
 
Drawing ii final options - animal allegory2
Drawing ii   final options - animal allegory2Drawing ii   final options - animal allegory2
Drawing ii final options - animal allegory2nateabels
 
Week2 Visual Elements Part1
Week2 Visual Elements Part1Week2 Visual Elements Part1
Week2 Visual Elements Part1nateabels
 
Week8 - Renaissance Part 1
Week8 - Renaissance Part 1Week8 - Renaissance Part 1
Week8 - Renaissance Part 1nateabels
 
Week3 Principles Of Design
Week3 Principles Of DesignWeek3 Principles Of Design
Week3 Principles Of Designnateabels
 
Week1 Art or Not
Week1 Art or NotWeek1 Art or Not
Week1 Art or Notnateabels
 

Viewers also liked (14)

Provisional Painting
Provisional PaintingProvisional Painting
Provisional Painting
 
Week 15 1970s-2000s art
Week 15 1970s-2000s artWeek 15 1970s-2000s art
Week 15 1970s-2000s art
 
Halloween + Art
Halloween + ArtHalloween + Art
Halloween + Art
 
Week6 mediterranean greek_part2
Week6 mediterranean greek_part2Week6 mediterranean greek_part2
Week6 mediterranean greek_part2
 
Week2 Visual Elements Part2
Week2 Visual Elements Part2Week2 Visual Elements Part2
Week2 Visual Elements Part2
 
TELEVIZIJA
TELEVIZIJATELEVIZIJA
TELEVIZIJA
 
Week6 roman middleages_part2
Week6 roman middleages_part2Week6 roman middleages_part2
Week6 roman middleages_part2
 
2d 3d Media
2d 3d Media2d 3d Media
2d 3d Media
 
Week12 art between_the_wars
Week12 art between_the_warsWeek12 art between_the_wars
Week12 art between_the_wars
 
Drawing ii final options - animal allegory2
Drawing ii   final options - animal allegory2Drawing ii   final options - animal allegory2
Drawing ii final options - animal allegory2
 
Week2 Visual Elements Part1
Week2 Visual Elements Part1Week2 Visual Elements Part1
Week2 Visual Elements Part1
 
Week8 - Renaissance Part 1
Week8 - Renaissance Part 1Week8 - Renaissance Part 1
Week8 - Renaissance Part 1
 
Week3 Principles Of Design
Week3 Principles Of DesignWeek3 Principles Of Design
Week3 Principles Of Design
 
Week1 Art or Not
Week1 Art or NotWeek1 Art or Not
Week1 Art or Not
 

Similar to Bayesian Nonparametrics Workshop

Diffusion Schrödinger bridges for score-based generative modeling
Diffusion Schrödinger bridges for score-based generative modelingDiffusion Schrödinger bridges for score-based generative modeling
Diffusion Schrödinger bridges for score-based generative modelingJeremyHeng10
 
Diffusion Schrödinger bridges for score-based generative modeling
Diffusion Schrödinger bridges for score-based generative modelingDiffusion Schrödinger bridges for score-based generative modeling
Diffusion Schrödinger bridges for score-based generative modelingJeremyHeng10
 
Rosser's theorem
Rosser's theoremRosser's theorem
Rosser's theoremWathna
 
SPDE presentation 2012
SPDE presentation 2012SPDE presentation 2012
SPDE presentation 2012Zheng Mengdi
 
Automatic Bayesian method for Numerical Integration
Automatic Bayesian method for Numerical Integration Automatic Bayesian method for Numerical Integration
Automatic Bayesian method for Numerical Integration Jagadeeswaran Rathinavel
 
Hierarchical matrices for approximating large covariance matries and computin...
Hierarchical matrices for approximating large covariance matries and computin...Hierarchical matrices for approximating large covariance matries and computin...
Hierarchical matrices for approximating large covariance matries and computin...Alexander Litvinenko
 
QMC Error SAMSI Tutorial Aug 2017
QMC Error SAMSI Tutorial Aug 2017QMC Error SAMSI Tutorial Aug 2017
QMC Error SAMSI Tutorial Aug 2017Fred J. Hickernell
 
Tucker tensor analysis of Matern functions in spatial statistics
Tucker tensor analysis of Matern functions in spatial statistics Tucker tensor analysis of Matern functions in spatial statistics
Tucker tensor analysis of Matern functions in spatial statistics Alexander Litvinenko
 
Litvinenko low-rank kriging +FFT poster
Litvinenko low-rank kriging +FFT  posterLitvinenko low-rank kriging +FFT  poster
Litvinenko low-rank kriging +FFT posterAlexander Litvinenko
 
Delayed acceptance for Metropolis-Hastings algorithms
Delayed acceptance for Metropolis-Hastings algorithmsDelayed acceptance for Metropolis-Hastings algorithms
Delayed acceptance for Metropolis-Hastings algorithmsChristian Robert
 
Robust Control of Uncertain Switched Linear Systems based on Stochastic Reach...
Robust Control of Uncertain Switched Linear Systems based on Stochastic Reach...Robust Control of Uncertain Switched Linear Systems based on Stochastic Reach...
Robust Control of Uncertain Switched Linear Systems based on Stochastic Reach...Leo Asselborn
 
Tensor Completion for PDEs with uncertain coefficients and Bayesian Update te...
Tensor Completion for PDEs with uncertain coefficients and Bayesian Update te...Tensor Completion for PDEs with uncertain coefficients and Bayesian Update te...
Tensor Completion for PDEs with uncertain coefficients and Bayesian Update te...Alexander Litvinenko
 
Testing for mixtures by seeking components
Testing for mixtures by seeking componentsTesting for mixtures by seeking components
Testing for mixtures by seeking componentsChristian Robert
 
Divergence clustering
Divergence clusteringDivergence clustering
Divergence clusteringFrank Nielsen
 
Hierarchical matrix techniques for maximum likelihood covariance estimation
Hierarchical matrix techniques for maximum likelihood covariance estimationHierarchical matrix techniques for maximum likelihood covariance estimation
Hierarchical matrix techniques for maximum likelihood covariance estimationAlexander Litvinenko
 
MVPA with SpaceNet: sparse structured priors
MVPA with SpaceNet: sparse structured priorsMVPA with SpaceNet: sparse structured priors
MVPA with SpaceNet: sparse structured priorsElvis DOHMATOB
 

Similar to Bayesian Nonparametrics Workshop (20)

Diffusion Schrödinger bridges for score-based generative modeling
Diffusion Schrödinger bridges for score-based generative modelingDiffusion Schrödinger bridges for score-based generative modeling
Diffusion Schrödinger bridges for score-based generative modeling
 
Diffusion Schrödinger bridges for score-based generative modeling
Diffusion Schrödinger bridges for score-based generative modelingDiffusion Schrödinger bridges for score-based generative modeling
Diffusion Schrödinger bridges for score-based generative modeling
 
Rosser's theorem
Rosser's theoremRosser's theorem
Rosser's theorem
 
SPDE presentation 2012
SPDE presentation 2012SPDE presentation 2012
SPDE presentation 2012
 
QMC: Operator Splitting Workshop, A New (More Intuitive?) Interpretation of I...
QMC: Operator Splitting Workshop, A New (More Intuitive?) Interpretation of I...QMC: Operator Splitting Workshop, A New (More Intuitive?) Interpretation of I...
QMC: Operator Splitting Workshop, A New (More Intuitive?) Interpretation of I...
 
Automatic Bayesian method for Numerical Integration
Automatic Bayesian method for Numerical Integration Automatic Bayesian method for Numerical Integration
Automatic Bayesian method for Numerical Integration
 
Hierarchical matrices for approximating large covariance matries and computin...
Hierarchical matrices for approximating large covariance matries and computin...Hierarchical matrices for approximating large covariance matries and computin...
Hierarchical matrices for approximating large covariance matries and computin...
 
Program on Quasi-Monte Carlo and High-Dimensional Sampling Methods for Applie...
Program on Quasi-Monte Carlo and High-Dimensional Sampling Methods for Applie...Program on Quasi-Monte Carlo and High-Dimensional Sampling Methods for Applie...
Program on Quasi-Monte Carlo and High-Dimensional Sampling Methods for Applie...
 
QMC Error SAMSI Tutorial Aug 2017
QMC Error SAMSI Tutorial Aug 2017QMC Error SAMSI Tutorial Aug 2017
QMC Error SAMSI Tutorial Aug 2017
 
Tucker tensor analysis of Matern functions in spatial statistics
Tucker tensor analysis of Matern functions in spatial statistics Tucker tensor analysis of Matern functions in spatial statistics
Tucker tensor analysis of Matern functions in spatial statistics
 
Litvinenko low-rank kriging +FFT poster
Litvinenko low-rank kriging +FFT  posterLitvinenko low-rank kriging +FFT  poster
Litvinenko low-rank kriging +FFT poster
 
Delayed acceptance for Metropolis-Hastings algorithms
Delayed acceptance for Metropolis-Hastings algorithmsDelayed acceptance for Metropolis-Hastings algorithms
Delayed acceptance for Metropolis-Hastings algorithms
 
Robust Control of Uncertain Switched Linear Systems based on Stochastic Reach...
Robust Control of Uncertain Switched Linear Systems based on Stochastic Reach...Robust Control of Uncertain Switched Linear Systems based on Stochastic Reach...
Robust Control of Uncertain Switched Linear Systems based on Stochastic Reach...
 
Tensor Completion for PDEs with uncertain coefficients and Bayesian Update te...
Tensor Completion for PDEs with uncertain coefficients and Bayesian Update te...Tensor Completion for PDEs with uncertain coefficients and Bayesian Update te...
Tensor Completion for PDEs with uncertain coefficients and Bayesian Update te...
 
Testing for mixtures by seeking components
Testing for mixtures by seeking componentsTesting for mixtures by seeking components
Testing for mixtures by seeking components
 
Divergence clustering
Divergence clusteringDivergence clustering
Divergence clustering
 
Jere Koskela slides
Jere Koskela slidesJere Koskela slides
Jere Koskela slides
 
Hierarchical matrix techniques for maximum likelihood covariance estimation
Hierarchical matrix techniques for maximum likelihood covariance estimationHierarchical matrix techniques for maximum likelihood covariance estimation
Hierarchical matrix techniques for maximum likelihood covariance estimation
 
Automatic bayesian cubature
Automatic bayesian cubatureAutomatic bayesian cubature
Automatic bayesian cubature
 
MVPA with SpaceNet: sparse structured priors
MVPA with SpaceNet: sparse structured priorsMVPA with SpaceNet: sparse structured priors
MVPA with SpaceNet: sparse structured priors
 

More from Christian Robert

Asymptotics of ABC, lecture, Collège de France
Asymptotics of ABC, lecture, Collège de FranceAsymptotics of ABC, lecture, Collège de France
Asymptotics of ABC, lecture, Collège de FranceChristian Robert
 
Workshop in honour of Don Poskitt and Gael Martin
Workshop in honour of Don Poskitt and Gael MartinWorkshop in honour of Don Poskitt and Gael Martin
Workshop in honour of Don Poskitt and Gael MartinChristian Robert
 
How many components in a mixture?
How many components in a mixture?How many components in a mixture?
How many components in a mixture?Christian Robert
 
Testing for mixtures at BNP 13
Testing for mixtures at BNP 13Testing for mixtures at BNP 13
Testing for mixtures at BNP 13Christian Robert
 
Inferring the number of components: dream or reality?
Inferring the number of components: dream or reality?Inferring the number of components: dream or reality?
Inferring the number of components: dream or reality?Christian Robert
 
discussion on Bayesian restricted likelihood
discussion on Bayesian restricted likelihooddiscussion on Bayesian restricted likelihood
discussion on Bayesian restricted likelihoodChristian Robert
 
a discussion of Chib, Shin, and Simoni (2017-8) Bayesian moment models
a discussion of Chib, Shin, and Simoni (2017-8) Bayesian moment modelsa discussion of Chib, Shin, and Simoni (2017-8) Bayesian moment models
a discussion of Chib, Shin, and Simoni (2017-8) Bayesian moment modelsChristian Robert
 
ABC based on Wasserstein distances
ABC based on Wasserstein distancesABC based on Wasserstein distances
ABC based on Wasserstein distancesChristian Robert
 
Poster for Bayesian Statistics in the Big Data Era conference
Poster for Bayesian Statistics in the Big Data Era conferencePoster for Bayesian Statistics in the Big Data Era conference
Poster for Bayesian Statistics in the Big Data Era conferenceChristian Robert
 
short course at CIRM, Bayesian Masterclass, October 2018
short course at CIRM, Bayesian Masterclass, October 2018short course at CIRM, Bayesian Masterclass, October 2018
short course at CIRM, Bayesian Masterclass, October 2018Christian Robert
 
ABC with Wasserstein distances
ABC with Wasserstein distancesABC with Wasserstein distances
ABC with Wasserstein distancesChristian Robert
 
prior selection for mixture estimation
prior selection for mixture estimationprior selection for mixture estimation
prior selection for mixture estimationChristian Robert
 
Coordinate sampler: A non-reversible Gibbs-like sampler
Coordinate sampler: A non-reversible Gibbs-like samplerCoordinate sampler: A non-reversible Gibbs-like sampler
Coordinate sampler: A non-reversible Gibbs-like samplerChristian Robert
 

More from Christian Robert (17)

Asymptotics of ABC, lecture, Collège de France
Asymptotics of ABC, lecture, Collège de FranceAsymptotics of ABC, lecture, Collège de France
Asymptotics of ABC, lecture, Collège de France
 
Workshop in honour of Don Poskitt and Gael Martin
Workshop in honour of Don Poskitt and Gael MartinWorkshop in honour of Don Poskitt and Gael Martin
Workshop in honour of Don Poskitt and Gael Martin
 
discussion of ICML23.pdf
discussion of ICML23.pdfdiscussion of ICML23.pdf
discussion of ICML23.pdf
 
How many components in a mixture?
How many components in a mixture?How many components in a mixture?
How many components in a mixture?
 
restore.pdf
restore.pdfrestore.pdf
restore.pdf
 
Testing for mixtures at BNP 13
Testing for mixtures at BNP 13Testing for mixtures at BNP 13
Testing for mixtures at BNP 13
 
Inferring the number of components: dream or reality?
Inferring the number of components: dream or reality?Inferring the number of components: dream or reality?
Inferring the number of components: dream or reality?
 
CDT 22 slides.pdf
CDT 22 slides.pdfCDT 22 slides.pdf
CDT 22 slides.pdf
 
discussion on Bayesian restricted likelihood
discussion on Bayesian restricted likelihooddiscussion on Bayesian restricted likelihood
discussion on Bayesian restricted likelihood
 
eugenics and statistics
eugenics and statisticseugenics and statistics
eugenics and statistics
 
a discussion of Chib, Shin, and Simoni (2017-8) Bayesian moment models
a discussion of Chib, Shin, and Simoni (2017-8) Bayesian moment modelsa discussion of Chib, Shin, and Simoni (2017-8) Bayesian moment models
a discussion of Chib, Shin, and Simoni (2017-8) Bayesian moment models
 
ABC based on Wasserstein distances
ABC based on Wasserstein distancesABC based on Wasserstein distances
ABC based on Wasserstein distances
 
Poster for Bayesian Statistics in the Big Data Era conference
Poster for Bayesian Statistics in the Big Data Era conferencePoster for Bayesian Statistics in the Big Data Era conference
Poster for Bayesian Statistics in the Big Data Era conference
 
short course at CIRM, Bayesian Masterclass, October 2018
short course at CIRM, Bayesian Masterclass, October 2018short course at CIRM, Bayesian Masterclass, October 2018
short course at CIRM, Bayesian Masterclass, October 2018
 
ABC with Wasserstein distances
ABC with Wasserstein distancesABC with Wasserstein distances
ABC with Wasserstein distances
 
prior selection for mixture estimation
prior selection for mixture estimationprior selection for mixture estimation
prior selection for mixture estimation
 
Coordinate sampler: A non-reversible Gibbs-like sampler
Coordinate sampler: A non-reversible Gibbs-like samplerCoordinate sampler: A non-reversible Gibbs-like sampler
Coordinate sampler: A non-reversible Gibbs-like sampler
 

Recently uploaded

DOG BITE management in pediatrics # for Pediatric pgs# topic presentation # f...
DOG BITE management in pediatrics # for Pediatric pgs# topic presentation # f...DOG BITE management in pediatrics # for Pediatric pgs# topic presentation # f...
DOG BITE management in pediatrics # for Pediatric pgs# topic presentation # f...HafsaHussainp
 
DNA isolation molecular biology practical.pptx
DNA isolation molecular biology practical.pptxDNA isolation molecular biology practical.pptx
DNA isolation molecular biology practical.pptxGiDMOh
 
Replisome-Cohesin Interfacing A Molecular Perspective.pdf
Replisome-Cohesin Interfacing A Molecular Perspective.pdfReplisome-Cohesin Interfacing A Molecular Perspective.pdf
Replisome-Cohesin Interfacing A Molecular Perspective.pdfAtiaGohar1
 
Quarter 4_Grade 8_Digestive System Structure and Functions
Quarter 4_Grade 8_Digestive System Structure and FunctionsQuarter 4_Grade 8_Digestive System Structure and Functions
Quarter 4_Grade 8_Digestive System Structure and FunctionsCharlene Llagas
 
KDIGO-2023-CKD-Guideline-Public-Review-Draft_5-July-2023.pdf
KDIGO-2023-CKD-Guideline-Public-Review-Draft_5-July-2023.pdfKDIGO-2023-CKD-Guideline-Public-Review-Draft_5-July-2023.pdf
KDIGO-2023-CKD-Guideline-Public-Review-Draft_5-July-2023.pdfGABYFIORELAMALPARTID1
 
Introduction of Human Body & Structure of cell.pptx
Introduction of Human Body & Structure of cell.pptxIntroduction of Human Body & Structure of cell.pptx
Introduction of Human Body & Structure of cell.pptxMedical College
 
6.2 Pests of Sesame_Identification_Binomics_Dr.UPR
6.2 Pests of Sesame_Identification_Binomics_Dr.UPR6.2 Pests of Sesame_Identification_Binomics_Dr.UPR
6.2 Pests of Sesame_Identification_Binomics_Dr.UPRPirithiRaju
 
WEEK 4 PHYSICAL SCIENCE QUARTER 3 FOR G11
WEEK 4 PHYSICAL SCIENCE QUARTER 3 FOR G11WEEK 4 PHYSICAL SCIENCE QUARTER 3 FOR G11
WEEK 4 PHYSICAL SCIENCE QUARTER 3 FOR G11GelineAvendao
 
Explainable AI for distinguishing future climate change scenarios
Explainable AI for distinguishing future climate change scenariosExplainable AI for distinguishing future climate change scenarios
Explainable AI for distinguishing future climate change scenariosZachary Labe
 
Combining Asynchronous Task Parallelism and Intel SGX for Secure Deep Learning
Combining Asynchronous Task Parallelism and Intel SGX for Secure Deep LearningCombining Asynchronous Task Parallelism and Intel SGX for Secure Deep Learning
Combining Asynchronous Task Parallelism and Intel SGX for Secure Deep Learningvschiavoni
 
final waves properties grade 7 - third quarter
final waves properties grade 7 - third quarterfinal waves properties grade 7 - third quarter
final waves properties grade 7 - third quarterHanHyoKim
 
Environmental Acoustics- Speech interference level, acoustics calibrator.pptx
Environmental Acoustics- Speech interference level, acoustics calibrator.pptxEnvironmental Acoustics- Speech interference level, acoustics calibrator.pptx
Environmental Acoustics- Speech interference level, acoustics calibrator.pptxpriyankatabhane
 
6.1 Pests of Groundnut_Binomics_Identification_Dr.UPR
6.1 Pests of Groundnut_Binomics_Identification_Dr.UPR6.1 Pests of Groundnut_Binomics_Identification_Dr.UPR
6.1 Pests of Groundnut_Binomics_Identification_Dr.UPRPirithiRaju
 
Pests of Sunflower_Binomics_Identification_Dr.UPR
Pests of Sunflower_Binomics_Identification_Dr.UPRPests of Sunflower_Binomics_Identification_Dr.UPR
Bayesian Nonparametrics Workshop

  • 12. Intro:Constancy/Inference/Algorithms/Beyond fixed k 12 Prior distributions – π(K) has a light tail: P(K ≥ tn/log n) ≤ exp(−rn) – p0 ∼ Be(a0, b0), a0 < 1, b0 > 1 – pk ∝ ωk and ωk ∼ Be(1, k) – location-scale “hole” prior: π(αk, εk) ∝ {1 − exp[−{β1(αk − 2)^{c3} + β2(εk − 0.5)^{c4}}]} × exp{−τ0 αk^{c0}/2 − τ1/[αk^{2c1} εk^{c1} (1 − εk)^{c1}]}
  • 13. Intro:Constancy/Inference/Algorithms/Beyond fixed k 13 Consistency results Hellinger neighbourhood A_ε(f0) = {f : d(f, f0) ≤ ε}. Then, for all ε > 0, π[A_ε(g)|x_{1:n}] → 1 as n → ∞, g a.s., and E^π[d(g, f_ψ)|x_{1:n}] → 0, g a.s. Extension to general parametric distributions by the cdf transform F_θ(x)
  • 14. Intro/Inference/Algorithms/Beyond fixed k 14 2 [B] Inference Difficulties: • identifiability • label switching • loss function • ordering constraints • prior determination
  • 15. Intro/Inference:Identifability/Algorithms/Beyond fixed k 15 Central (non)identifiability issue: Σ_{j=1}^k pj f(y|θj) is invariant under relabelling of the components. Consequence: ((pj, θj))_{1≤j≤k} is only known up to a permutation τ ∈ Sk
  • 16. Intro/Inference:Identifability/Algorithms/Beyond fixed k 16 Example 1. Two-component normal mixture p N(µ1, 1) + (1 − p) N(µ2, 1) where p ≠ 0.5 is known. The parameters µ1 and µ2 are then identifiable
  • 17. Intro/Inference:Identifability/Algorithms/Beyond fixed k 17 Bimodal likelihood [contour plot of the likelihood surface in (µ1, µ2); 500 observations and (µ1, µ2, p) = (0, 2.5, 0.7)]
  • 18. Intro/Inference:Identifability/Algorithms/Beyond fixed k 18 Influence of p on the modes [likelihood contours in (µ1, µ2) for p = 0.5, 0.6, 0.75, 0.85]
  • 19. Intro/Inference:Com’ics/Algorithms/Beyond fixed k 19 Combinatorics For a normal mixture p ϕ(x; µ1, σ1) + (1 − p) ϕ(x; µ2, σ2), under the pseudo-conjugate priors (i = 1, 2) µi|σi ∼ N(ζi, σi²/λi), σi^{−2} ∼ Ga(νi/2, si²/2), p ∼ Be(α, β), the posterior is π(θ, p|x) ∝ Π_{j=1}^n {p ϕ(xj; µ1, σ1) + (1 − p) ϕ(xj; µ2, σ2)} π(θ, p). Computation: complexity O(2^n)
  • 20. Intro/Inference:Com’ics/Algorithms/Beyond fixed k 20 Missing variables (2) Auxiliary variables z = (z1, . . . , zn) ∈ Z associated with the observations x = (x1, . . . , xn). For (n1, . . . , nk) with n1 + . . . + nk = n, Zj = {z : Σ_{i=1}^n I_{zi=1} = n1, . . . , Σ_{i=1}^n I_{zi=k} = nk}, the set of all allocations with the given allocation sizes (n1, . . . , nk) (j the corresponding lexicographic order).
  • 21. Intro/Inference:Com’ics/Algorithms/Beyond fixed k 21 Number of nonnegative integer solutions of this decomposition of n: the binomial coefficient r = C(n + k − 1, n). Partition Z = ∪_{i=1}^r Zi [number of partition sets of order O(n^{k−1})]
  • 22. Intro/Inference:Com’ics/Algorithms/Beyond fixed k 22 Posterior decomposition π(θ, p|x) = Σ_{i=1}^r Σ_{z∈Zi} ω(z) π(θ, p|x, z) with ω(z) the posterior probability of allocation z. Corresponding representation of the posterior expectation of (θ, p): Σ_{i=1}^r Σ_{z∈Zi} ω(z) E^π[θ, p|x, z]
  • 23. Intro/Inference:Com’ics/Algorithms/Beyond fixed k 23 Very sensible from an inferential point of view: 1. consider each possible allocation z of the dataset, 2. allocate a posterior probability ω(z) to this allocation, and 3. construct a posterior distribution for the parameters conditional on this allocation. All possible allocations: complexity O(k^n) (illustrated in the sketch below)
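To make the O(k^n) combinatorics and the scarcity phenomenon of the following slides concrete, here is a minimal Python sketch, not from the slides: it enumerates all 2^n allocations of a tiny two-component mean-mixture sample with known weight p, unit variances and a common N(0, τ²) prior on the means (all numerical settings are illustrative assumptions) and computes the exact posterior weights ω(z).

```python
import itertools
import numpy as np

def log_marginal(x, tau2=4.0):
    # log marginal of a sub-sample x given its component, for x_i ~ N(mu, 1), mu ~ N(0, tau2)
    x = np.asarray(x, dtype=float)
    m = x.size
    if m == 0:
        return 0.0
    s, ss = x.sum(), (x ** 2).sum()
    return (-0.5 * m * np.log(2 * np.pi) - 0.5 * np.log(1.0 + m * tau2)
            - 0.5 * (ss - tau2 * s ** 2 / (1.0 + m * tau2)))

def allocation_weights(x, p=0.7, tau2=4.0):
    # exact posterior weights omega(z) over all 2^n allocations (O(2^n) cost)
    x = np.asarray(x, dtype=float)
    zs = list(itertools.product((1, 2), repeat=x.size))
    logw = np.array([
        np.log(p) * z.count(1) + np.log(1 - p) * z.count(2)
        + log_marginal(x[np.array(z) == 1], tau2)
        + log_marginal(x[np.array(z) == 2], tau2)
        for z in zs
    ])
    w = np.exp(logw - logw.max())
    return zs, w / w.sum()

rng = np.random.default_rng(0)
n = 12                                    # 2^12 = 4096 allocations only
x = np.where(rng.random(n) < 0.7, rng.normal(0.0, 1.0, n), rng.normal(2.5, 1.0, n))
zs, w = allocation_weights(x)
print("allocations with weight above 1%:", int((w > 0.01).sum()))
print("total weight of the 10 largest:", float(np.sort(w)[-10:].sum()))
```

On such a toy run one typically finds that only a handful of allocations carry visible weight, which is the scarcity phenomenon discussed next.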
  • 24. Intro/Inference:Com’ics/Algorithms/Beyond fixed k 24 Posterior For a given permutation/allocation (kt), with ℓ observations allocated to the first component, the conditional posterior distribution is π(θ|(kt)) = N(ξ1(kt), σ1²/(n1 + ℓ)) × IG((ν1 + ℓ)/2, s1(kt)/2) × N(ξ2(kt), σ2²/(n2 + n − ℓ)) × IG((ν2 + n − ℓ)/2, s2(kt)/2) × Be(α + ℓ, β + n − ℓ)
  • 25. Intro/Inference:Com’ics/Algorithms/Beyond fixed k 25 where x̄1(kt) = (1/ℓ) Σ_{t=1}^{ℓ} x_{kt}, ŝ1(kt) = Σ_{t=1}^{ℓ} (x_{kt} − x̄1(kt))², x̄2(kt) = (1/(n − ℓ)) Σ_{t=ℓ+1}^{n} x_{kt}, ŝ2(kt) = Σ_{t=ℓ+1}^{n} (x_{kt} − x̄2(kt))², and ξ1(kt) = (n1 ξ1 + ℓ x̄1(kt))/(n1 + ℓ), ξ2(kt) = (n2 ξ2 + (n − ℓ) x̄2(kt))/(n2 + n − ℓ), s1(kt) = s1² + ŝ1(kt) + [n1 ℓ/(n1 + ℓ)] (ξ1 − x̄1(kt))², s2(kt) = s2² + ŝ2(kt) + [n2 (n − ℓ)/(n2 + n − ℓ)] (ξ2 − x̄2(kt))², the posterior updates of the hyperparameters
  • 26. Intro/Inference:Scarcity/Algorithms/Beyond fixed k 26 Scarcity Frustrating barrier: Almost all posterior probabilities ω (z) are zero Example 2. Galaxy dataset with k = 4 components, Set of allocations with the partition sizes (n1, n2, n3, n4) = (7, 34, 38, 3) with probability 0.59 and (n1, n2, n3, n4) = (7, 30, 27, 18) with probability 0.32, and no other size group getting a probability above 0.01.
  • 27. Intro/Inference:Scarcity/Algorithms/Beyond fixed k 27 Example 3. Normal mean mixture For the same normal prior on both means, µ1, µ2 ∼ N(0, 10), the posterior weight associated with a z such that Σ_{i=1}^n I_{zi=1} = l is ω(z) ∝ (l + 1/4)(n − l + 1/4) p^l (1 − p)^{n−l}. Thus the posterior distribution of z only depends on l, and the distribution of the partition size is close to a binomial B(n, p) distribution.
  • 28. Intro/Inference:Scarcity/Algorithms/Beyond fixed k 28 For two different normal priors on the means, µ1 ∼ N(0, 4), µ2 ∼ N(2, 4), the posterior weight of z is ω(z) ∝ (l + 1/4)(n − l + 1/4) p^l (1 − p)^{n−l} × exp{−[(l + 1/4) ŝ1(z) + l {x̄1(z)}²/4]/2} × exp{−[(n − l + 1/4) ŝ2(z) + (n − l) {x̄2(z) − 2}²/4]/2} where x̄1(z) = (1/l) Σ_{i=1}^n I_{zi=1} xi, x̄2(z) = (1/(n − l)) Σ_{i=1}^n I_{zi=2} xi, ŝ1(z) = Σ_{i=1}^n I_{zi=1} (xi − x̄1(z))², ŝ2(z) = Σ_{i=1}^n I_{zi=2} (xi − x̄2(z))².
  • 29. Intro/Inference:Scarcity/Algorithms/Beyond fixed k 29 Computation of the exact weight of all partition sizes l is impossible; instead, a Monte Carlo experiment drawing z's at random. Example 4. A sample of 45 points simulated with p = 0.7, µ1 = 0 and µ2 = 2.5 leads to l = 23 as the most likely partition size, with a weight approximated by 0.962. For l = 27, the weight is approximated by 4.56 × 10^{−11}.
  • 30. Intro/Inference:Scarcity/Algorithms/Beyond fixed k 30 [Histograms of the simulated log-weights log ω(kt) for partition sizes l = 23 and l = 29]
  • 31. Intro/Inference:Scarcity/Algorithms/Beyond fixed k 31 Ten highest log-weights ω(z) (up to an additive constant) [plotted against the partition size l]
  • 32. Intro/Inference:Scarcity/Algorithms/Beyond fixed k 32 Most likely allocation z for a simulated dataset of 45 observations [figure]
  • 33. Intro/Inference:Scarcity/Algorithms/Beyond fixed k 33 Caution! We simulated 450,000 permutations, to be compared with a total of 2^45 permutations!
  • 34. Intro/Inference: Priors/Algorithms/Beyond fixed k 34 Prior selection Basic difficulty: if exchangeable prior used on θ = (θ1, . . . , θk) all marginals on the θi’s are identical Posterior expectation of θ1 identical to posterior expectation of θ2!
  • 35. Intro/Inference: Priors/Algorithms/Beyond fixed k 35 Identifiability constraints Prior restriction by identifiability constraint on the mixture parameters, for instance by ordering the means [or the variances or the weights] Not so innocuous! • truncation unrelated to the topology of the posterior distribution • may induce a posterior expectation in a low probability region • modifies the prior modelling [marginal densities of the order statistics θ(1), θ(10), θ(19)]
  • 36. Intro/Inference: Priors/Algorithms/Beyond fixed k 36 • with many components, ordering in terms of one type of parameter is unrealistic • poor estimation (posterior mean) • poor exploration (MCMC) [histograms of the simulated p, θ and τ under Gibbs sampling, random walk, Langevin and tempered random walk proposals]
  • 37. Intro/Inference: Priors/Algorithms/Beyond fixed k 37 Improper priors?? Independent improper priors, π(θ) = Π_{i=1}^k πi(θi), cannot be used since, if ∫ πi(θi) dθi = ∞, then for every n, ∫ π(θ, p|x) dθ dp = ∞. Still, some improper priors can be used when the impropriety is on a common (location/scale) parameter [CPR & Titterington, 1998]
  • 38. Intro/Inference: Loss/Algorithms/Beyond fixed k 38 Loss functions Once a sample can be produced from the unconstrained posterior distribution, an ordering constraint can be imposed ex post [Stephens, 1997] Good for MCMC exploration
  • 39. Intro/Inference: Loss/Algorithms/Beyond fixed k 39 Again, difficult assessment of the true effect of the ordering constraints... Estimates obtained under the three orderings (on p, on θ, on σ) versus the true values:
    order | p1 p2 p3 | θ1 θ2 θ3 | σ1 σ2 σ3
    p     | 0.231 0.311 0.458 | 0.321 −0.55 2.28 | 0.41 0.471 0.303
    θ     | 0.297 0.246 0.457 | −1.1 0.83 2.33 | 0.357 0.543 0.284
    σ     | 0.375 0.331 0.294 | 1.59 0.083 0.379 | 0.266 0.34 0.579
    true  | 0.22 0.43 0.35 | 1.1 2.4 −0.95 | 0.3 0.2 0.5
    [figure]
  • 40. Intro/Inference: Loss/Algorithms/Beyond fixed k 40 Pivotal quantity For a permutation τ ∈ Sk, corresponding permutation of the parameter τ(θ, p) = (θτ(1), . . . , θτ(k)), (pτ(1), . . . , pτ(k)) does not modify the value of the likelihood (& posterior under exchangeability). Label switching phenomenon
  • 41. Intro/Inference: Loss/Algorithms/Beyond fixed k 41 Reordering scheme: based on a simulated sample of size M,
    (i) compute the pivot (θ, p)^{(i*)} such that i* = arg max_{i=1,...,M} π((θ, p)^{(i)}|x), a Monte Carlo approximation of the MAP estimator of (θ, p);
    (ii) for i ∈ {1, . . . , M}: 1. compute τi = arg min_{τ∈Sk} d(τ((θ, p)^{(i)}), (θ, p)^{(i*)}), 2. set (θ, p)^{(i)} = τi((θ, p)^{(i)}).
  • 42. Intro/Inference: Loss/Algorithms/Beyond fixed k 42 Step (ii) chooses the reordering closest to the MAP estimator. After reordering, the Monte Carlo posterior expectation of θi is Σ_{j=1}^M (θi)^{(j)} / M (a sketch of the scheme follows).
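A minimal Python sketch of this pivotal reordering, not from the slides: it assumes the simulated sample is stored as an (M, k, d) array of component parameters together with the log-posterior of each draw, uses the Euclidean distance between parameter vectors, and enumerates the k! permutations explicitly (so it is only meant for small k).

```python
import itertools
import numpy as np

def reorder_to_pivot(draws, log_post):
    # draws: (M, k, d) array of component parameters per draw
    # log_post: (M,) log-posterior values, used to locate the MAP pivot
    M, k, d = draws.shape
    pivot = draws[np.argmax(log_post)]                    # step (i): Monte Carlo MAP
    perms = list(itertools.permutations(range(k)))
    out = np.empty_like(draws)
    for i in range(M):
        # step (ii): pick the relabelling closest to the pivot
        dists = [np.sum((draws[i, p, :] - pivot) ** 2) for p in perms]
        out[i] = draws[i, perms[int(np.argmin(dists))], :]
    return out

# after reordering, componentwise posterior means are plain column averages:
# theta_hat = reorder_to_pivot(draws, log_post).mean(axis=0)
```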
  • 43. Intro/Inference: Loss/Algorithms/Beyond fixed k 43 Probabilistic alternative [Jasra, Holmes & Stephens, 2004] Also put a prior on the permutations σ ∈ Sk. Defines a specific model M based on a preliminary estimate (e.g., by relabelling). Computes θ̄j = (1/N) Σ_{t=1}^N Σ_{σ∈Sk} θ^{(t)}_{σ(j)} p(σ|θ^{(t)}, M)
  • 45. Intro/Inference/Algorithms: Gibbs/Beyond fixed k 45 3.1 Gibbs sampling Same idea as the EM algorithm: take advantage of the missing data representation. General Gibbs sampling for mixture models:
    0. Initialization: choose p^{(0)} and θ^{(0)} arbitrarily
    1. Step t, for t = 1, . . .
    1.1 Generate zi^{(t)} (i = 1, . . . , n) from P(zi^{(t)} = j | pj^{(t−1)}, θj^{(t−1)}, xi) ∝ pj^{(t−1)} f(xi | θj^{(t−1)}), j = 1, . . . , k
    1.2 Generate p^{(t)} from π(p | z^{(t)})
    1.3 Generate θ^{(t)} from π(θ | z^{(t)}, x)
  • 46. Intro/Inference/Algorithms: Gibbs/Beyond fixed k 46 Trapping states Gibbs sampling may lead to trapping states, concentrated local modes that require an enormous number of iterations to escape from, e.g., components with a small number of allocated observations and very small variance [Diebolt & CPR, 1990] Also, most MCMC samplers fail to reproduce the permutation invariance of the posterior distribution, that is, do not visit the k! replications of a given mode. [Celeux, Hurn & CPR, 2000]
  • 47. Intro/Inference/Algorithms: Gibbs/Beyond fixed k 47 Example 5. Mean normal mixture
    0. Initialization: choose µ1^{(0)} and µ2^{(0)}
    1. Step t, for t = 1, . . .
    1.1 Generate zi^{(t)} (i = 1, . . . , n) from P(zi^{(t)} = 1) = 1 − P(zi^{(t)} = 2) ∝ p exp{−(xi − µ1^{(t−1)})²/2}
    1.2 Compute nj^{(t)} = Σ_{i=1}^n I_{zi^{(t)}=j} and (sj^x)^{(t)} = Σ_{i=1}^n I_{zi^{(t)}=j} xi
    1.3 Generate µj^{(t)} (j = 1, 2) from N((λδ + (sj^x)^{(t)})/(λ + nj^{(t)}), 1/(λ + nj^{(t)}))
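A runnable Python sketch of this Gibbs sampler, not from the slides: the simulated data, the prior hyperparameters λ and δ, and the fixed weight p are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)

# simulated data from p N(mu1, 1) + (1 - p) N(mu2, 1)
p, n = 0.7, 500
z_true = (rng.random(n) >= p).astype(int)              # 0 -> component 1, 1 -> component 2
x = rng.normal(np.where(z_true == 0, 0.0, 2.5), 1.0)

lam, delta = 1.0, 0.0                                  # prior mu_j ~ N(delta, 1/lam)
mu = np.array([-1.0, 1.0])                             # arbitrary initialisation
T = 2000
chain = np.empty((T, 2))

for t in range(T):
    # 1.1 allocate each observation given the current means
    w1 = p * np.exp(-0.5 * (x - mu[0]) ** 2)
    w2 = (1 - p) * np.exp(-0.5 * (x - mu[1]) ** 2)
    z = (rng.random(n) >= w1 / (w1 + w2)).astype(int)
    # 1.2 sufficient statistics per component
    nj = np.array([(z == 0).sum(), (z == 1).sum()])
    sx = np.array([x[z == 0].sum(), x[z == 1].sum()])
    # 1.3 update the means from their conditional posteriors
    mu = rng.normal((lam * delta + sx) / (lam + nj), 1.0 / np.sqrt(lam + nj))
    chain[t] = mu

print("posterior means of (mu1, mu2):", chain[T // 2:].mean(axis=0))
```

Depending on the initialisation, such a run may remain in one of the two labellings of the dominant mode, which is the trapping behaviour described on the previous slide.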
  • 48. Intro/Inference/Algorithms: Gibbs/Beyond fixed k 48 [Gibbs output in the (µ1, µ2) plane]
  • 49. Intro/Inference/Algorithms: Gibbs/Beyond fixed k 49 But... [a second Gibbs output in the (µ1, µ2) plane]
  • 50. Intro/Inference/Algorithms: HM/Beyond fixed k 50 3.2 Metropolis–Hastings Missing data structure is not necessary for MCMC implementation: the mixture likelihood is available in closed form and computable in O(kn) time:
  • 51. Intro/Inference/Algorithms: HM/Beyond fixed k 51 Step t. For t = 1, . . .
    1.1 Generate (θ, p) from q(θ, p | θ^{(t−1)}, p^{(t−1)}),
    1.2 Compute r = [f(x|θ, p) π(θ, p) q(θ^{(t−1)}, p^{(t−1)} | θ, p)] / [f(x|θ^{(t−1)}, p^{(t−1)}) π(θ^{(t−1)}, p^{(t−1)}) q(θ, p | θ^{(t−1)}, p^{(t−1)})],
    1.3 Generate u ∼ U[0,1]; if u ≤ r then (θ^{(t)}, p^{(t)}) = (θ, p), else (θ^{(t)}, p^{(t)}) = (θ^{(t−1)}, p^{(t−1)}).
  • 52. Intro/Inference/Algorithms: HM/Beyond fixed k 52 Proposal Use of a random walk is inefficient for constrained parameters like the weights and the variances. Reparameterisation: for the weights p, overparameterise the model as pj = wj / Σ_{l=1}^k wl, wj > 0 [Capp´e, Ryd´en & CPR] The wj's are not identifiable, but this is not a problem. Proposed move on the wj's: log(wj) = log(wj^{(t−1)}) + uj, uj ∼ N(0, ζ²) (see the snippet below)
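A small Python illustration of this reparameterised move, not from the slides: the weights are handled through unnormalised wj's that receive an additive Gaussian step on the log scale and are renormalised afterwards; the step size ζ is an arbitrary choice.

```python
import numpy as np

rng = np.random.default_rng(2)

def propose_weights(w, zeta=0.3):
    # random-walk move on log(w_j); p_j = w_j / sum_l w_l stays in the simplex
    w_new = np.exp(np.log(w) + rng.normal(0.0, zeta, size=w.shape))
    return w_new, w_new / w_new.sum()

w = np.ones(3)                    # unnormalised weights (not identifiable)
w, p = propose_weights(w)
print(p, p.sum())                 # a valid weight vector summing to 1
```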
  • 53. Intro/Inference/Algorithms: HM/Beyond fixed k 53 Example 6. Mean normal mixture Gaussian random walk proposal: µ1 ∼ N(µ1^{(t−1)}, ζ²) and µ2 ∼ N(µ2^{(t−1)}, ζ²), associated with
  • 54. Intro/Inference/Algorithms: HM/Beyond fixed k 54
    0. Initialization: choose µ1^{(0)} and µ2^{(0)}
    1. Step t. For t = 1, . . .
    1.1 Generate µj (j = 1, 2) from N(µj^{(t−1)}, ζ²),
    1.2 Compute r = f(x|µ1, µ2) π(µ1, µ2) / [f(x|µ1^{(t−1)}, µ2^{(t−1)}) π(µ1^{(t−1)}, µ2^{(t−1)})],
    1.3 Generate u ∼ U[0,1]; if u ≤ r then (µ1^{(t)}, µ2^{(t)}) = (µ1, µ2), else (µ1^{(t)}, µ2^{(t)}) = (µ1^{(t−1)}, µ2^{(t−1)}).
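A runnable Python sketch of this random-walk sampler, not from the slides: it assumes flat priors on the two means (so the prior ratio cancels), a fixed known weight p and an arbitrary step size ζ, and evaluates the closed-form mixture likelihood on the log scale for stability.

```python
import numpy as np

rng = np.random.default_rng(3)

p, n = 0.7, 500
x = rng.normal(np.where(rng.random(n) < p, 0.0, 2.5), 1.0)

def log_lik(mu1, mu2):
    # log of prod_i [p phi(x_i; mu1, 1) + (1-p) phi(x_i; mu2, 1)], up to a constant
    a = np.log(p) - 0.5 * (x - mu1) ** 2
    b = np.log(1 - p) - 0.5 * (x - mu2) ** 2
    m = np.maximum(a, b)
    return np.sum(m + np.log(np.exp(a - m) + np.exp(b - m)))

zeta, T = 0.1, 5000
mu = np.array([-1.0, 1.0])
cur = log_lik(*mu)
chain = np.empty((T, 2))
for t in range(T):
    prop = mu + rng.normal(0.0, zeta, size=2)          # 1.1 Gaussian random walk
    new = log_lik(*prop)
    if np.log(rng.random()) <= new - cur:              # 1.3 accept with prob min(r, 1)
        mu, cur = prop, new
    chain[t] = mu

print("posterior means over the second half of the chain:", chain[T // 2:].mean(axis=0))
```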
  • 55. Intro/Inference/Algorithms: HM/Beyond fixed k 55 [Metropolis–Hastings output in the (µ1, µ2) plane]
  • 56. Intro/Inference/Algorithms: PMC/Beyond fixed k 56 3.3 Population Monte Carlo Idea Apply dynamic importance sampling to simulate a sequence of samples x^{(t)} = (x1^{(t)}, . . . , xn^{(t)}), approximately iid from π(x), where t is a simulation iteration index (at the sample level)
  • 57. Intro/Inference/Algorithms: PMC/Beyond fixed k 57 Dependent importance sampling The importance distribution qt(x^{(t)} | x^{(t−1)}) of the sample x^{(t)} can depend on the previous sample x^{(t−1)} in any possible way, as long as the marginal distributions qit(x) = ∫ qt(x^{(t)}) dx^{(t)}_{−i} can be expressed to build the importance weights ϱit = π(xi^{(t)}) / qit(xi^{(t)})
  • 58. Intro/Inference/Algorithms: PMC/Beyond fixed k 58 Special case qt(x^{(t)} | x^{(t−1)}) = Π_{i=1}^n qit(xi^{(t)} | x^{(t−1)}) [Independent proposals] In that case, var(Ît) = (1/n²) Σ_{i=1}^n var(ϱi^{(t)} h(xi^{(t)})).
  • 59. Intro/Inference/Algorithms: PMC/Beyond fixed k 59 Population Monte Carlo (PMC) Use the previous sample (x^{(t)}), marginally distributed from π: E[ϱit h(Xi^{(t)})] = E[∫ {π(xi^{(t)}) / qit(xi^{(t)})} h(xi^{(t)}) qit(xi^{(t)}) dxi^{(t)}] = E[Eπ[h(X)]] to improve on the approximation of π
  • 60. Intro/Inference/Algorithms: PMC/Beyond fixed k 60 Resampling Over iterations (in t), the weights may degenerate: e.g., ϱ1 ≈ 1 while ϱ2, . . . , ϱn are negligible. Use instead Rubin's (1987) systematic resampling: at each iteration, resample the xi^{(t)}'s according to their weights ϱi^{(t)} and reset the weights to 1 (preserves “unbiasedness”/increases variance); a small sketch follows.
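A minimal Python illustration of the resampling step, not from the slides: it resamples the particles with replacement proportionally to their normalised weights (one standard way of implementing the step) and then resets the weights to 1.

```python
import numpy as np

rng = np.random.default_rng(4)

def resample(particles, weights):
    # draw M particles with replacement, proportionally to their weights
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()
    idx = rng.choice(len(particles), size=len(particles), replace=True, p=w)
    return particles[idx], np.ones(len(particles))     # weights reset to 1

particles = rng.normal(size=(1000, 2))                 # e.g. (mu1, mu2) draws
weights = rng.random(1000)
particles, weights = resample(particles, weights)
```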
  • 61. Intro/Inference/Algorithms: PMC/Beyond fixed k 61 PMC for mixtures Proposal distributions qit that simulate (θ^{(i)}_{(t)}, p^{(i)}_{(t)}) and associated importance weights ρ^{(i)}_{(t)} = f(x | θ^{(i)}_{(t)}, p^{(i)}_{(t)}) π(θ^{(i)}_{(t)}, p^{(i)}_{(t)}) / qit(θ^{(i)}_{(t)}, p^{(i)}_{(t)}), i = 1, . . . , M. Approximations of the form Σ_{i=1}^M {ρ^{(i)}_{(t)} / Σ_{l=1}^M ρ^{(l)}_{(t)}} h(θ^{(i)}_{(t)}, p^{(i)}_{(t)}) give (almost) unbiased estimators of E^π_x[h(θ, p)],
  • 62. Intro/Inference/Algorithms: PMC/Beyond fixed k 62
    0. Initialization: choose θ^{(1)}_{(0)}, . . . , θ^{(M)}_{(0)} and p^{(1)}_{(0)}, . . . , p^{(M)}_{(0)}
    1. Step t. For t = 1, . . . , T
    1.1 For i = 1, . . . , M
    1.1.1 Generate (θ^{(i)}_{(t)}, p^{(i)}_{(t)}) from qit(θ, p),
    1.1.2 Compute ρ^{(i)} = f(x | θ^{(i)}_{(t)}, p^{(i)}_{(t)}) π(θ^{(i)}_{(t)}, p^{(i)}_{(t)}) / qit(θ^{(i)}_{(t)}, p^{(i)}_{(t)}),
    1.2 Compute ω^{(i)} = ρ^{(i)} / Σ_{l=1}^M ρ^{(l)},
    1.3 Resample M values with replacement from the (θ^{(i)}_{(t)}, p^{(i)}_{(t)})'s using the weights ω^{(i)}
  • 63. Intro/Inference/Algorithms: PMC/Beyond fixed k 63 Example 7. Mean normal mixture Implementation without the Gibbs augmentation step, using normal random walk proposals based on the previous sample of (µ1, µ2)'s, as in Metropolis–Hastings. Selection of a “proper” scale: bypassed by the adaptivity of the PMC algorithm. Several proposals associated with a range of variances vk, k = 1, . . . , K. At each step, new variances can be selected proportionally to the performances of the scales vk on the previous iterations, for instance proportionally to their non-degeneracy rates
  • 64. Intro/Inference/Algorithms: PMC/Beyond fixed k 64 Step t. For t = 1, . . . , T
    1.1 For i = 1, . . . , M
    1.1.1 Generate k from M(1; r1, . . . , rK),
    1.1.2 Generate (µj)^{(i)}_{(t)} (j = 1, 2) from N((µj)^{(i)}_{(t−1)}, vk),
    1.1.3 Compute ρ^{(i)} = f(x | (µ1)^{(i)}_{(t)}, (µ2)^{(i)}_{(t)}) π((µ1)^{(i)}_{(t)}, (µ2)^{(i)}_{(t)}) / Σ_{l=1}^K Π_{j=1}^2 ϕ((µj)^{(i)}_{(t)}; (µj)^{(i)}_{(t−1)}, vl),
    1.2 Compute ω^{(i)} = ρ^{(i)} / Σ_{l=1}^M ρ^{(l)},
    1.3 Resample the ((µ1)^{(i)}_{(t)}, (µ2)^{(i)}_{(t)})'s using the weights ω^{(i)},
    1.4 Update the rl's: rl proportional to the number of ((µ1)^{(i)}_{(t)}, (µ2)^{(i)}_{(t)})'s generated with variance vl that are resampled.
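A compact Python sketch of this adaptive PMC scheme for the two-mean mixture, not from the slides: it assumes flat priors on the means (so the prior factor drops out), a small bank of proposal variances, the mixture-of-kernels importance denominator (here weighted by the current scale probabilities rl), multinomial resampling, and rl's updated from the survival counts of each scale; all numerical settings are illustrative.

```python
import numpy as np

rng = np.random.default_rng(5)

p, n = 0.7, 500
x = rng.normal(np.where(rng.random(n) < p, 0.0, 2.5), 1.0)

def log_lik(mu):                                 # mu: (M, 2) array of (mu1, mu2) particles
    a = np.log(p) - 0.5 * (x[None, :] - mu[:, 0:1]) ** 2
    b = np.log(1 - p) - 0.5 * (x[None, :] - mu[:, 1:2]) ** 2
    m = np.maximum(a, b)
    return np.sum(m + np.log(np.exp(a - m) + np.exp(b - m)), axis=1)

M, T = 1000, 20
vs = np.array([0.01, 0.1, 1.0, 5.0])             # bank of proposal variances v_1..v_K
r = np.ones(len(vs)) / len(vs)                   # current scale probabilities r_l
mu = rng.normal(1.0, 2.0, size=(M, 2))           # initial population

for t in range(T):
    k = rng.choice(len(vs), size=M, p=r)                          # 1.1.1 pick a scale
    prop = mu + rng.normal(size=(M, 2)) * np.sqrt(vs[k])[:, None] # 1.1.2 random-walk move
    # 1.1.3 importance weights: likelihood over the mixture of Gaussian kernels
    log_q = np.array([-np.sum((prop - mu) ** 2, axis=1) / (2 * v) - np.log(2 * np.pi * v)
                      for v in vs])                               # (K, M) per-scale kernels
    log_den = np.logaddexp.reduce(np.log(r)[:, None] + log_q, axis=0)
    log_w = log_lik(prop) - log_den
    w = np.exp(log_w - log_w.max()); w /= w.sum()                 # 1.2 normalise
    idx = rng.choice(M, size=M, replace=True, p=w)                # 1.3 resample
    mu = prop[idx]
    counts = np.bincount(k[idx], minlength=len(vs))               # 1.4 adapt the r_l's
    r = (counts + 1.0) / (counts + 1.0).sum()                     # smoothed to stay positive

print("PMC posterior mean of (mu1, mu2):", mu.mean(axis=0))
```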
  • 65. Intro/Inference/Algorithms: PMC/Beyond fixed k 65 [PMC output in the (µ1, µ2) plane]
  • 66. Intro/Inference/Algorithms/Beyond fixed k 66 4 Unknown number of components When k number of components is unknown, there are several models Mk with corresponding parameter sets Θk in competition.
  • 67. Intro/Inference/Algorithms/Beyond fixed k: RJ 67 Reversible jump MCMC Reversibility constraint put on dimension-changing moves that bridge the sets Θk / the models Mk [Green, 1995] Local reversibility for each pair (k1, k2) of possible values of k: supplement Θk1 and Θk2 with adequate artificial spaces in order to create a bijection between them:
  • 68. Intro/Inference/Algorithms/Beyond fixed k: RJ 68 Basic steps Choice of probabilities πij (with Σj πij = 1) of jumping to model Mkj while in model Mki. θ^{(k1)} is completed by a simulation u1 ∼ g1(u1) into (θ^{(k1)}, u1) and θ^{(k2)} by u2 ∼ g2(u2) into (θ^{(k2)}, u2), with (θ^{(k2)}, u2) = T_{k1→k2}(θ^{(k1)}, u1),
  • 69. Intro/Inference/Algorithms/Beyond fixed k: RJ 69 Green reversible jump algorithm
    0. At iteration t, if x^{(t)} = (m, θ^{(m)}),
    1. Select model Mn with probability πmn,
    2. Generate umn ∼ ϕmn(u),
    3. Set (θ^{(n)}, vnm) = T_{m→n}(θ^{(m)}, umn),
    4. Take x^{(t+1)} = (n, θ^{(n)}) with probability min{ [π(n, θ^{(n)}) / π(m, θ^{(m)})] × [πnm ϕnm(vnm) / (πmn ϕmn(umn))] × |∂T_{m→n}(θ^{(m)}, umn) / ∂(θ^{(m)}, umn)|, 1 }, and take x^{(t+1)} = x^{(t)} otherwise.
  • 70. Intro/Inference/Algorithms/Beyond fixed k: RJ 70 Example 8. For a normal mixture Mk : Σ_{j=1}^k pjk N(µjk, σjk²), restriction to moves from Mk to the neighbouring models Mk+1 and Mk−1. [Richardson & Green, 1997]
  • 71. Intro/Inference/Algorithms/Beyond fixed k: RJ 71 Birth and death steps: birth adds a new normal component generated from the prior; death removes one of the k components at random. Birth acceptance probability min{ [π_{(k+1)k}/π_{k(k+1)}] × [(k + 1)!/k!] × π_{k+1}(θ_{k+1}) / [π_k(θ_k) (k + 1) ϕ_{k(k+1)}(u_{k(k+1)})], 1 } = min{ [π_{(k+1)k}/π_{k(k+1)}] × [ϖ(k + 1)/ϖ(k)] × ℓ_{k+1}(θ_{k+1}) (1 − p_{k+1})^{k−1} / ℓ_k(θ_k), 1 }, where ϖ(k) is the prior probability of model Mk and ℓ_k the corresponding likelihood
  • 72. Intro/Inference/Algorithms/Beyond fixed k: RJ 72 Proposal that can work well in some settings, but can also be inefficient (i.e. high rejection rate) if the prior is vague. Alternative: devise more local jumps between models,
    (i) split: pjk = p_{j(k+1)} + p_{(j+1)(k+1)}, pjk µjk = p_{j(k+1)} µ_{j(k+1)} + p_{(j+1)(k+1)} µ_{(j+1)(k+1)}, pjk σjk² = p_{j(k+1)} σ_{j(k+1)}² + p_{(j+1)(k+1)} σ_{(j+1)(k+1)}²
    (ii) merge (reverse); a numerical check of these constraints follows.
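A small Python sketch, not from the slides, of one way to draw a split satisfying these three constraints exactly (weight, weighted mean and weighted variance of the pair match the parent component); the auxiliary variables u1, u2, u3 and their distributions are illustrative choices, not the Richardson & Green proposal.

```python
import numpy as np

rng = np.random.default_rng(6)

def split_component(p, mu, var):
    # split (p, mu, var) into two components matching the three constraints above
    u1, u2, u3 = rng.random(), rng.normal(0.0, 1.0), rng.random()
    p1, p2 = u1 * p, (1.0 - u1) * p
    mu1, mu2 = mu - u2, mu + u2 * p1 / p2
    var1, var2 = u3 * p * var / p1, (1.0 - u3) * p * var / p2
    return (p1, mu1, var1), (p2, mu2, var2)

(p1, m1, v1), (p2, m2, v2) = split_component(0.4, 1.0, 2.0)
print(np.isclose(p1 + p2, 0.4),
      np.isclose(p1 * m1 + p2 * m2, 0.4 * 1.0),
      np.isclose(p1 * v1 + p2 * v2, 0.4 * 2.0))      # all three constraints hold
```

In an actual reversible jump implementation, the densities of (u1, u2, u3) and the Jacobian of this transformation would enter the acceptance ratio, as in the generic algorithm above.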
  • 73. Intro/Inference/Algorithms/Beyond fixed k: RJ 73 Histogram and rawplot of 100,000 k's produced by RJMCMC [histogram of k over 1–5 and trace plot of k along iterations]
  • 74. Intro/Inference/Algorithms/Beyond fixed k: RJ 74 Normalised enzyme dataset [figure]
  • 75. Intro/Inference/Algorithms/Beyond fixed k: b&d 75 Birth and Death processes Use of an alternative methodology based on a Birth–&-Death (point) process Idea: Create a Markov chain in continuous time, i.e. a Markov jump process, moving between models Mk, by births (to increase the dimension), deaths (to decrease the dimension), and other moves [Preston, 1976; Ripley, 1977; Stephens, 1999]
  • 76. Intro/Inference/Algorithms/Beyond fixed k: b&d 76 Time till the next modification (jump) is exponentially distributed, with rate depending on the current state. Remember: if ξ1, . . . , ξv are exponentially distributed, ξi ∼ Exp(λi), then min_i ξi ∼ Exp(Σ_i λi). Difference with MH-MCMC: whenever a jump occurs, the corresponding move is always accepted. Acceptance probabilities are replaced with holding times (a quick numerical check follows).
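A quick Python check of this property, not from the slides: the minimum of independent exponentials with rates λi is itself exponential with rate Σi λi, which is what allows the process to draw a single holding time and then the type of the next jump.

```python
import numpy as np

rng = np.random.default_rng(7)

lam = np.array([0.5, 1.0, 2.5])                  # competing jump rates
xi = rng.exponential(1.0 / lam, size=(100_000, 3))
print(xi.min(axis=1).mean(), 1.0 / lam.sum())    # both close to 1 / sum(lambda) = 0.25
```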
  • 77. Intro/Inference/Algorithms/Beyond fixed k: b&d 77 Balance condition Sufficient to have detailed balance L(θ)π(θ)q(θ, θ′) = L(θ′)π(θ′)q(θ′, θ) for all θ, θ′, for π̃(θ) ∝ L(θ)π(θ) to be stationary. Here q(θ, θ′) is the rate of moving from state θ to θ′. Possibility to add split/merge and fixed-k processes if the balance condition is satisfied. [Capp´e, Ryd´en & CPR, 2002]
  • 78. Intro/Inference/Algorithms/Beyond fixed k: b&d 78 Case of mixtures Representation as a (marked) point process Φ = {(pj, (µj, σj))}_j. Birth rate λ0 (constant), with proposals from the prior; death rate δj(Φ) for removal of component j; overall death rate Σ_{j=1}^k δj(Φ) = δ(Φ). Balance condition: (k + 1) d(Φ ∪ {p, (µ, σ)}) L(Φ ∪ {p, (µ, σ)}) = λ0 L(Φ) π(k)/π(k + 1), with d(Φ ∖ {pj, (µj, σj)}) = δj(Φ)
  • 79. Intro/Inference/Algorithms/Beyond fixed k: b&d 79 Stephens’ original algorithm: For v = 0, 1, · · · , V: set t ← v and run until t > v + 1:
    1. Compute δj(Φ) = [L(Φ ∖ Φj) / L(Φ)] λ0 λ1
    2. δ(Φ) ← Σ_{j=1}^k δj(Φ), ξ ← λ0 + δ(Φ), u ∼ U([0, 1])
    3. t ← t − log(u)/ξ
  • 80. Intro/Inference/Algorithms/Beyond fixed k: b&d 80
    4. With probability δ(Φ)/ξ: remove component j with probability δj(Φ)/δ(Φ); k ← k − 1; pℓ ← pℓ/(1 − pj) (ℓ ≠ j). Otherwise: add component j from the prior π(µj, σj) with pj ∼ Be(γ, kγ); pℓ ← pℓ (1 − pj) (ℓ ≠ j); k ← k + 1
    5. Run I MCMC(k, β, p) steps
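A condensed Python sketch of this birth-and-death loop for a normal mixture with an unknown number of components, not from the slides: the death rates use only the likelihood ratio implied by the balance condition (with a uniform prior on k, so the prior ratio drops; the weight-prior terms appearing in Stephens' exact rates are omitted for brevity), births draw the new parameters from assumed priors, and the fixed-k MCMC step of point 5 is left as a stub; every numerical setting is illustrative.

```python
import numpy as np

rng = np.random.default_rng(8)

x = np.concatenate([rng.normal(0.0, 1.0, 70), rng.normal(3.0, 0.5, 30)])
lam0, gamma = 1.0, 1.0                       # birth rate and Be(gamma, k*gamma) weight prior

def log_lik(p, mu, sig):
    dens = (p[None, :] * np.exp(-0.5 * ((x[:, None] - mu[None, :]) / sig[None, :]) ** 2)
            / (np.sqrt(2 * np.pi) * sig[None, :])).sum(axis=1)
    return np.log(dens).sum()

p, mu, sig = np.array([1.0]), np.array([x.mean()]), np.array([x.std()])  # start with k = 1

for v in range(200):                         # V unit-time intervals
    t = 0.0
    while True:
        k = len(p)
        ll = log_lik(p, mu, sig)
        # simplified death rates: likelihood ratio after removing component j
        delta = np.array([
            lam0 / k * np.exp(log_lik(np.delete(p, j) / (1 - p[j]),
                                      np.delete(mu, j), np.delete(sig, j)) - ll)
            for j in range(k)
        ]) if k > 1 else np.zeros(1)
        xi = lam0 + delta.sum()
        t += rng.exponential(1.0 / xi)       # holding time ~ Exp(xi)
        if t > 1.0:
            break
        if rng.random() < delta.sum() / xi:  # death: remove component j, renormalise weights
            j = rng.choice(k, p=delta / delta.sum())
            p = np.delete(p, j) / (1 - p[j]); mu = np.delete(mu, j); sig = np.delete(sig, j)
        else:                                # birth from (assumed) priors on (p_j, mu_j, sigma_j)
            pj = rng.beta(gamma, k * gamma)
            p = np.append(p * (1 - pj), pj)
            mu = np.append(mu, rng.normal(x.mean(), 2 * x.std()))
            sig = np.append(sig, x.std() * np.exp(rng.normal(0.0, 0.5)))
    # point 5: I fixed-k MCMC moves on (p, mu, sig) would go here (omitted)

print("number of components at the end:", len(p))
```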
  • 81. Intro/Inference/Algorithms/Beyond fixed k: b&d 81 Rescaling time In discrete-time RJMCMC, let the time unit be 1/N and put βk = λk/N and δk = 1 − λk/N. As N → ∞, each birth proposal will be accepted and, with k components, births occur according to a Poisson process with rate λk
  • 82. Intro/Inference/Algorithms/Beyond fixed k: b&d 82 while component (w, φ) dies with rate lim_{N→∞} N δ_{k+1} × [1/(k + 1)] × min(A^{−1}, 1) = lim_{N→∞} N × [1/(k + 1)] × (likelihood ratio)^{−1} × [βk/δ_{k+1}] × b(w, φ) (1 − w)^{k−1} = (likelihood ratio)^{−1} × [λk/(k + 1)] × b(w, φ) (1 − w)^{k−1}. Hence “RJMCMC → BDMCMC”
  • 83. Intro/Inference/Algorithms/Beyond fixed k: b&d 83 Even closer to RJMCMC Exponential (random) sampling is not necessary, nor is continuous time! Estimator of I = ∫ g(θ)π(θ) dθ by Î = (1/N) Σ_{i=1}^N g(θ(τi)) where {θ(t)} is the continuous time MCMC process and τ1, . . . , τN are the sampling instants.
  • 84. Intro/Inference/Algorithms/Beyond fixed k: b&d 84 New notations: 1. Tn time of the n-th jump of {θ(t)} with T0 = 0 2. {θn} jump chain of states visited by {θ(t)} 3. λ(θ) total rate of {θ(t)} leaving state θ Then holding time Tn − Tn−1 of {θ(t)} in its n-th state θn exponential rv with rate λ(θn)
  • 85. Intro/Inference/Algorithms/Beyond fixed k: b&d 85 Rao–Blackwellisation If the sampling interval goes to 0, the limiting case is Î∞ = (1/TN) Σ_{n=1}^N g(θ_{n−1}) (Tn − T_{n−1}). Rao–Blackwellisation argument: replace Î∞ with Ĩ = (1/TN) Σ_{n=1}^N g(θ_{n−1})/λ(θ_{n−1}) = (1/TN) Σ_{n=1}^N E[Tn − T_{n−1} | θ_{n−1}] g(θ_{n−1}). Conclusion: only simulate jumps and store average holding times! Completely removes the continuous time feature (a small illustration follows)
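A small Python illustration of the two estimators, not from the slides: given a toy jump chain of visited states with total leaving rates λ(θn), it compares the time-weighted average Î∞ (based on simulated exponential holding times) with the Rao–Blackwellised version Ĩ (based on the expected holding times 1/λ(θn)); the chain, the rate function and the integrand g are arbitrary, and both estimators are normalised by their respective total (simulated or expected) holding time.

```python
import numpy as np

rng = np.random.default_rng(9)

N = 50_000
theta = rng.normal(size=N)                       # toy jump chain of visited states
lam = 1.0 + theta ** 2                           # toy total leaving rates lambda(theta_n)
g = lambda th: th ** 2                           # functional of interest

hold = rng.exponential(1.0 / lam)                # simulated holding times T_n - T_{n-1}
I_hat = np.sum(g(theta) * hold) / hold.sum()                 # time-weighted I_hat_infinity
I_rb = np.sum(g(theta) / lam) / np.sum(1.0 / lam)            # Rao-Blackwellised version

print(I_hat, I_rb)                               # I_rb typically has the smaller variance
```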
  • 86. Intro/Inference/Algorithms/Beyond fixed k: b&d 86 Example 9. Galaxy dataset Comparison of RJMCMC and CTMCMC in the Galaxy dataset [Capp´e & al., 2002] Experiment: • Same proposals (same C code) • Moves proposed in equal proportions by both samplers (setting the probability PF of proposing a fixed k move in RJMCMC equal to the rate ηF at which fixed k moves are proposed in CTMCMC, and likewise PB = ηB for the birth moves) • Rao–Blackwellisation • Number of jumps (number of visited configurations) in CTMCMC == number of iterations of RJMCMC
  • 87. Intro/Inference/Algorithms/Beyond fixed k: b&d 87 Results: • If one algorithm performs poorly, so does the other. (For RJMCMC manifested as small A’s—birth proposals are rarely accepted—while for BDMCMC manifested as large δ’s—new components are indeed born but die again quickly.) • No significant difference between samplers for birth and death only • CTMCMC slightly better than RJMCMC with split-and-combine moves • Marginal advantage in accuracy for split-and-combine addition • For split-and-combine moves, computation time associated with one step of continuous time simulation is about 5 times longer than for reversible jump simulation.
  • 88. Intro/Inference/Algorithms/Beyond fixed k: b&d 88 Box plots of the estimated posterior on k obtained from 200 independent runs: RJMCMC (top) and BDMCMC (bottom). The number of iterations varies from 5 000 (left) to 50 000 (middle) and 500 000 (right). [panels: CT and RJ at 5 000, 50 000 and 500 000 iterations]
  • 89. Intro/Inference/Algorithms/Beyond fixed k: b&d 89 Same for the estimated posterior on k obtained from 500 independent runs: top, RJMCMC and bottom, CTMCMC. The number of iterations varies from 5 000 (left plots) to 50 000 (right plots). [panels: CT and RJ at 5 000 and 50 000 iterations]