Adaptive Restore algorithm &
Importance Monte Carlo
Christian P. Robert
U. Paris Dauphine & Warwick U.
6th Workshop on Sequential Monte Carlo Methods,
ICMS
ERCall for applications
Ongoing 2023-2030 ERC funding for PhD and postdoctoral
collaborations with
▶ Michael Jordan (Berkeley)
▶ Eric Moulines (Paris)
▶ Gareth O Roberts (Warwick)
▶ myself (Paris)
Part I: Adaptive Restore algorithm
[Figure: sample path, x against t]
Joint work with H McKimm, A Wang, M Pollock,
and GO Roberts
arXiv:2210.09901 – Bernoulli
MCMC regeneration
Reversibility comes with (sub-efficient) random walk behaviour
Recent achievements in non-reversible MCMC with PDMPs
[Bouchard-Côté et al., 2018; Bierkens et al., 2019]
Link with regeneration
[Nummelin, 1978; Mykland et al., 1995]
At regeneration times, Markov chain starts again: future
independent of past
Scales poorly: regenerations recede exponentially with
dimension
PDMP: Marketing arguments
Current (?) default workhorse: reversible MCMC methods
(incl. NUTS)
[Hoffman & Gelman, 2014]
Non-reversible MCMC algorithms based on piecewise
deterministic Markov processes (aka PDMP) perform well
empirically
Quantitative convergence rates and variance now available
▶ Physics origins
[Peters & De With, 2012; Krauth et al., 2009, 15, 16]
▶ geometric ergodicity for targets with exponentially decaying tails
[Mesquita & Hespanha, 2010]
▶ ergodicity for targets on the real line
[Bierkens et al., 2016a,b]
Piecewise deterministic Markov process
A PDMP sampler is a continuous-time, non-reversible MCMC method based on auxiliary variables
1. empirically state-of-the-art performance
[Bouchard et al., 2017]
2. exact subsampling for big data
[Bierkens et al., 2017]
3. geometric ergodicity for a large class of distributions
[Deligiannidis et al., 2017]
4. ability to deal with an intractable potential log π(x) = ∫ U_ω(x) µ(dω)
[Pakman et al., 2016]
Piecewise deterministic Markov process
Piecewise deterministic Markov process {zt ∈ Z}t∈[0,∞), with
three ingredients,
1. Deterministic dynamics: between events, deterministic
evolution based on ODE
dzt/dt = Φ(zt)
2. Event occurrence rate: λ(t) = λ(zt)
3. Transition dynamics: At event time, τ, state prior to τ
denoted by zτ−, and new state generated by zτ ∼ Q(·|zτ−).
[Davis, 1984, 1993]
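To make these three ingredients concrete (an illustration of mine, not part of the slides), here is a minimal one-dimensional Zig-Zag sketch in Python for a standard Gaussian target, where the inter-event time can be obtained by exact inversion of the integrated rate; the function name and defaults are illustrative.

import numpy as np

def zigzag_1d_gaussian(t_max=100.0, x0=0.0, v0=1.0, seed=0):
    # Minimal 1-d Zig-Zag sketch for pi(x) ∝ exp(-x^2/2):
    # flow dx/dt = v, event rate lambda(x, v) = max(0, v * U'(x)) = max(0, v * x),
    # and velocity flip v -> -v at event times (the transition kernel Q).
    rng = np.random.default_rng(seed)
    x, v, t = x0, v0, 0.0
    skeleton = [(t, x, v)]
    while t < t_max:
        a = v * x                                 # slope of the rate along the current segment
        e = rng.exponential(1.0)                  # Exp(1) threshold of the Poisson clock
        # exact inversion of int_0^tau max(0, a + s) ds = e
        tau = -a + np.sqrt(max(a, 0.0) ** 2 + 2.0 * e)
        x, t = x + v * tau, t + tau               # deterministic drift up to the event
        v = -v                                    # transition dynamics at the event time
        skeleton.append((t, x, v))
    return skeleton

Time averages along the piecewise-linear path (or positions recorded at fixed time increments) then approximate expectations under π.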
Implementation hardships
Main difficulties of implementing PDMP come from
1. Computing the ODE flow Ψ: linear dynamics, quadratic dynamics
2. Simulating the inter-event time ηk: many techniques of
superposition and thinning for Poisson processes
[Devroye, 1986]
Simulation by superposition plus thinning
Most implementations thru discrete-time schemes by sampling
Bernoulli B(α(z))
For
Φ(z) = (x + vε, v) and α(z) = 1 ∧ π(x + vε)/π(x)
sampling the inter-event time for strictly convex U(·) obtained by solving
t⋆ = arg min_t U(x + vt)
and additional randomization
▶ thinning: if there exists ᾱ such that α(Φ(z)) ≥ ᾱ(x, k),
accept-reject
▶ superposition and thinning: when α(z) = 1 ∧ ρ(Φ(z))/ρ(z) and ρ(·) = ∏_i ρ_i(·), then ᾱ(z, k) = ∏_i ᾱ_i(z, k)
[Bouchard et al., 2017]
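For reference (a generic sketch, not the specific bounds ᾱ of the slide above), the basic thinning step for simulating the first event of an inhomogeneous Poisson process with a known constant bound on its rate looks as follows in Python:

import numpy as np

def first_event_by_thinning(rate, bound, rng, t_max=np.inf):
    # Propose events from a homogeneous Poisson process of rate `bound` and
    # accept each proposal with probability rate(t) / bound, assuming rate(t) <= bound.
    t = 0.0
    while t < t_max:
        t += rng.exponential(1.0 / bound)     # candidate event from the dominating process
        if rng.uniform() < rate(t) / bound:   # thinning acceptance
            return t
    return np.inf                             # no event before t_max

rng = np.random.default_rng(0)
print(first_event_by_thinning(lambda t: np.exp(-t), 1.0, rng, t_max=50.0))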
Restore process
Take {Yt}t≥0 diffusion / jump process on Rd with infinitesimal
generator LY and Y0 ∼ µ
Regeneration rate κ with associated tour length
τ = inf{ t ≥ 0 : ∫_0^t κ(Y_s) ds ≥ ξ } with ξ ∼ Exp(1)
{(Y^(i)_t)_{t≥0}, τ^(i)}_{i=0}^∞ iid realisations inducing regeneration times
T_j = Σ_{i=0}^{j−1} τ^(i)
Restore process {X_t}_{t≥0} given by:
X_t = Σ_{i=0}^∞ I_{[T_i, T_{i+1})}(t) Y^(i)_{t−T_i}
[Wang et al., 2021]
Restore process
Path of five tours of Brownian Restore with π ≡ N(0, 1²), µ ≡ N(0, 2²) and C such that min_{x∈R} κ(x) = 0, with K = 200 and Λ0 = 1000. First and last output states shown by green dots and red crosses
Stationarity of Restore
Infinitesimal generator of {X_t}_{t≥0}
L_X f(x) = L_Y f(x) + κ(x) ∫ [f(y) − f(x)] µ(y) dy
with adjoint L†_Y
Regeneration rate κ chosen as
κ(x) = L†_Y π(x) / π(x) + C µ(x) / π(x)
Implies {X_t}_{t≥0} is π-invariant:
∫_{R^d} L_X f(x) π(x) dx = 0
Restore sampler
Rewrite
κ(x) = L†_Y π(x) / π(x) + C µ(x) / π(x) = κ̃(x) + C µ(x) / π(x)
with
▶ κ̃ partial regeneration rate
▶ C > 0 regeneration constant and
▶ Cµ regeneration measure, large enough for κ(·) ≥ 0
Resulting Monte Carlo method called Restore Sampler
[Wang et al., 2021]
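As a small numerical illustration of choosing C "large enough" (my own sketch, not in the slides), here is a grid approximation of the smallest admissible C for the Gaussian example of the earlier figure, π ≡ N(0, 1²) and µ ≡ N(0, 2²), using the Brownian-dynamics rate κ̃(x) = (x² − 1)/2 that appears on the Constant approximation slide below:

import numpy as np

# kappa(x) = kappa_tilde(x) + C mu(x)/pi(x) >= 0 requires C >= [0 ∨ -kappa_tilde(x)] pi(x)/mu(x)
xs = np.linspace(-5.0, 5.0, 20001)
kappa_tilde = 0.5 * (xs ** 2 - 1.0)              # Brownian dynamics, pi = N(0, 1)
pi_over_mu = 2.0 * np.exp(-3.0 * xs ** 2 / 8.0)  # N(0, 1) density divided by N(0, 2^2) density
C_min = np.max(np.maximum(-kappa_tilde, 0.0) * pi_over_mu)
print(C_min)    # any C above this grid value keeps kappa(.) nonnegative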
Restore sampler convergence
Given π-invariance of {X_t}_{t≥0}, Monte Carlo validation follows:
Eπ[f] = E_{X0∼µ}[ ∫_0^{τ^(0)} f(X_s) ds ] / E_{X0∼µ}[τ^(0)]   (1)
and a.s. convergence of ergodic averages:
(1/t) ∫_0^t f(X_s) ds → Eπ[f]   (2)
For iid Z_i := ∫_{T_i}^{T_{i+1}} f(X_s) ds, CLT
√n ( ∫_0^{T_n} f(X_s) ds / T_n − Eπ[f] ) → N(0, σ²_f)   (3)
where
σ²_f := E_{X0∼µ}[ (Z_0 − τ^(0) Eπ[f])² ] / E_{X0∼µ}[τ^(0)]²
Estimator variance depends on expected tour length: prefer µ
favouring long tours
[Wang et al., 2020]
Minimal regeneration
Minimal regeneration measure, C+µ+ corresponding to smallest
possible rate
κ+(x) := κ̃(x) ∨ 0 = κ̃(x) + C+ µ+(x) / π(x)
leading to
µ+(x) = (1/C+) [0 ∨ −κ̃(x)] π(x)
Frequent regeneration not necessarily detrimental, except when µ is not well-aligned with π, leading to wasted computation
Minimal Restore maximizes expected tour length / minimizes
asymptotic variance
[Wang et al., 2021]
Constant approximation
When target π lacks its normalizing constant Z
π(x) = π̃(x) / Z,
take energy U(x) := − log π(x) = log Z − log π̃(x)
E.g., when {Yt}t≥0 Brownian, κ̃ function of ∇U(x) and ∆U(x)
κ̃(x) := (1/2) ( ‖∇U(x)‖² − ∆U(x) )
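For instance (a sketch of mine, not from the slides), for Brownian dynamics and a standard d-dimensional Gaussian target the rates κ̃, κ+ and κ− read, in Python:

import numpy as np

def kappa_tilde(x):
    # U(x) = ||x||^2 / 2, so grad U(x) = x, Laplacian U(x) = d,
    # and kappa_tilde(x) = (||x||^2 - d) / 2
    x = np.atleast_1d(x)
    return 0.5 * (x @ x - x.size)

def kappa_plus(x):
    return max(kappa_tilde(x), 0.0)    # minimal regeneration rate, kappa_tilde ∨ 0

def kappa_minus(x):
    return max(-kappa_tilde(x), 0.0)   # rate [0 ∨ -kappa_tilde], used by Adaptive Restore below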
Constant approximation
In regeneration rate, Z absorbed into C:
κ(x) = κ̃(x) + C µ(x) / (π̃(x)/Z) = κ̃(x) + CZ µ(x) / π̃(x) = κ̃(x) + C̃ µ(x) / π̃(x)
where C̃ = CZ is set by the user. Since C = 1/Eµ[τ], the expected tour length is 1/C = Z/C̃, so using n tours with total simulation time T, T/n ≈ Z/C̃ and
Z ≈ C̃ T / n
Adapting Restore
Adaptive Restore process defined by enriching underlying
continuous-time Markov process with regenerations at rate κ+
from distribution µt at time t
Convergence of (µt, πt) to (µ+, π): a.s. convergence of stochastic
approximation algorithms for discrete-time processes on
compact spaces
[Benaïm et al., 2018; McKimm et al., 2024]
Adapting Restore
Initial regeneration distribution µ0 and updates by addition of
point masses
µt(x) = µ0(x) if N(t) = 0, and
µt(x) = [t/(a+t)] (1/N(t)) Σ_{i=1}^{N(t)} δ_{X_{ζ_i}}(x) + [a/(a+t)] µ0(x) if N(t) > 0,
where a > 0 constant and ζ_i arrival times of the inhomogeneous Poisson process (N(t) : t ≥ 0) with rate κ−(X_t)
κ−(x) := [0 ∨ −κ̃(x)]
Poisson process simulated by Poisson thinning, under (strong)
assumption
K− := sup_{x∈X} κ−(x) < ∞
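A small helper (illustrative names, mine) to draw from µt as defined above, given the current set E of stored atoms:

import numpy as np

def sample_mu_t(E, t, a, mu0_sampler, rng):
    # With probability t/(a + t) return a uniformly chosen stored atom from E,
    # otherwise (and always while E is empty) draw from mu_0.
    if len(E) > 0 and rng.uniform() < t / (a + t):
        return E[rng.integers(len(E))]
    return mu0_sampler(rng)

rng = np.random.default_rng(0)
print(sample_mu_t([0.3, -0.1], t=50.0, a=10.0,
                  mu0_sampler=lambda r: r.normal(2.0, 1.0), rng=rng))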
Adaptive Brownian Restore Algorithm
t ← 0, E ← ∅, i ← 0, X ∼ µ0.
while i < n do
    τ̃ ∼ Exp(K+), s ∼ Exp(Λ0), ζ̃ ∼ Exp(K−).
    if τ̃ < s and τ̃ < ζ̃ then
        X ∼ N(X, τ̃), t ← t + τ̃, u ∼ U[0, 1].
        if u < κ+(X)/K+ then
            if |E| = 0 then
                X ∼ µ0.
            else
                X ∼ U(E) with probability t/(a + t), else X ∼ µ0.
            end
            i ← i + 1.
        end
    else if s < τ̃ and s < ζ̃ then
        X ∼ N(X, s), t ← t + s, record X, t, i.
    else
        X ∼ N(X, ζ̃), t ← t + ζ̃, u ∼ U[0, 1].
        if u < κ−(X)/K− then E ← E ∪ {X}.
    end
end
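Below is a minimal runnable sketch (mine, not the authors' code) of the algorithm above for a one-dimensional standard normal target with µ0 ≡ N(2, 1); the finite bound K+ is a heuristic truncation of the unbounded κ+, in the spirit of choosing it from quantiles of κ̃:

import numpy as np

def kappa_tilde(x):                              # Brownian-dynamics rate for pi = N(0, 1)
    return 0.5 * (x * x - 1.0)

def abra(n_tours=500, K_plus=20.0, K_minus=0.5, Lambda0=10.0, a=10.0, seed=0):
    rng = np.random.default_rng(seed)
    mu0 = lambda: rng.normal(2.0, 1.0)           # initial regeneration distribution
    t, E, i, X, out = 0.0, [], 0, mu0(), []
    while i < n_tours:
        tau = rng.exponential(1.0 / K_plus)      # candidate regeneration clock
        s = rng.exponential(1.0 / Lambda0)       # output clock
        zeta = rng.exponential(1.0 / K_minus)    # clock for enriching the adaptive measure
        dt = min(tau, s, zeta)
        X = rng.normal(X, np.sqrt(dt))           # Brownian motion over the elapsed time
        t += dt
        if dt == tau:                            # attempted regeneration, thinned by kappa+/K+
            if rng.uniform() < max(kappa_tilde(X), 0.0) / K_plus:
                if E and rng.uniform() < t / (a + t):
                    X = E[rng.integers(len(E))]  # regenerate from a stored atom
                else:
                    X = mu0()                    # regenerate from mu_0
                i += 1
        elif dt == s:
            out.append(X)                        # record an output state
        else:                                    # attempted atom addition, thinned by kappa-/K-
            if rng.uniform() < max(-kappa_tilde(X), 0.0) / K_minus:
                E.append(X)
    return np.array(out), E

samples, atoms = abra()
print(samples.mean(), samples.var())             # roughly 0 and 1 for pi = N(0, 1)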
ABRA...cadabra
Path of an Adaptive Brownian Restore process with
π ≡ N(0, 1), µ0 ≡ N(2, 1), a = 10. Green dots and red crosses
as first and last output states of each tour.
Calibrating initial regeneration and parameters
▶ µ0 as initial approximation of π, e.g., µ0 ≡ Nd(0, I) with π
pre-transformed
▶ trade-off in choosing discrete measure dominance time:
smaller choices of a for faster convergence versus larger
values of a for more regenerations from µ0, hence better
exploration (range of a: between 10³ and 10⁴)
▶ K+ and K− based on quantiles of κ̃, from preliminary
MCMC runs
▶ or, assuming π close to Gaussian, initial guess of K− is d/2
and initial estimate for K+ based on a χ² approximation
Final remarks
▶ Adaptive Restore benefits from global moves across the space, for targets hard to approximate with a parametric distribution
▶ Use of the minimal regeneration rate makes simulation computationally feasible, with regenerations more likely in areas where π has significant mass
▶ In comparison with random walk Metropolis, ABRA can be slow but with higher efficiency for targets π with skewed tails
Part II: Importance Monte Carlo
Joint work with C Andral, R Douc and H Marival
arXiv:2207.08271 – Stoch Proc & Appli
Rejection Markov chain
Mixing MCMC and rejection sampling :
1. Draw Markov chain (X̃i)1≤i≤m from kernel Q targeting π̃
2. With probability ρ(X̃i) ∈ (0, 1) (e.g. ρ = π/(Mπ̃)), accept
proposal
3. Resulting in new Markov chain (Xi)1≤i≤n, (n ≤ m)
X̃1 X̃2 X̃3 X̃4 X̃5
. . .
X1 X2 X3
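A toy numerical sketch of the recipe above (mine, not from the slides), taking the simplest admissible kernel, namely iid draws from π̃, with π = N(0, 1), π̃ = N(0, 2²) and M = 2 so that ρ = π/(Mπ̃) ≤ 1:

import numpy as np
from scipy.stats import norm

def rejection_chain(m=10_000, seed=0):
    rng = np.random.default_rng(seed)
    x_tilde = rng.normal(0.0, 2.0, size=m)                          # (X_tilde_i), chain targeting pi_tilde
    rho = norm.pdf(x_tilde, 0, 1) / (2.0 * norm.pdf(x_tilde, 0, 2)) # rho = pi / (M * pi_tilde)
    return x_tilde[rng.uniform(size=m) < rho]                       # accepted states form (X_i)

print(rejection_chain().std())    # close to 1, the standard deviation of pi = N(0, 1)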
Rejection Markov Chain (cont’d)
Corresponding to kernel S for (Xi)
Sh(x) = Σ_{k=1}^∞ E^Q_x[ ρ(X_k) h(X_k) ∏_{i=1}^{k−1} (1 − ρ(X_i)) ]
Nice (recursive) equation
Sh(x) = Q(ρh)(x) + Q((1 − ρ)Sh)(x)
Invariant measure for S
If π̃Q = π̃, then S is ρ · π̃ invariant
∫ ρ(x) π̃(dx) S(x, dy) = ρ(y) π̃(dy)
Direct consequence: if ρ = π/(Mπ̃), S is π-invariant
From rejection to importance
From
X̃1 X̃2 X̃3 X̃4 X̃5
. . .
X1 X2 X3
to
X̃1 X̃2 X̃3 X̃4 X̃5
. . .
X2
X1 X3 X5 X6
X4
Importance Markov chain
▶ Case when ρ may take values above 1
▶ Define ρκ(x) := κ π(x)/π̃(x) for free parameter κ
▶ Allow to repeat elements of the chain
▶ Accept Ñi ∼ R̃(X̃i, ·) copies of X̃i, with R̃ a kernel on N s.t. E[Ñi | X̃i] = ρκ(X̃i), e.g.
Ñi = ⌊ρκ(X̃i)⌋ + Be({ρκ(X̃i)})
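A two-line sketch of this default kernel (helper of mine, not the paper's code), with {ρ} the fractional part of ρ:

import numpy as np

def shifted_bernoulli(rho, rng):
    # N = floor(rho) + Bernoulli({rho}), so that E[N] = rho
    base = int(np.floor(rho))
    return base + (rng.uniform() < rho - base)

rng = np.random.default_rng(0)
print(np.mean([shifted_bernoulli(2.3, rng) for _ in range(10_000)]))   # close to 2.3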
Related works
(i) Self-regenerative chain: independent draws (Q(x, .) = π̃) and
Ni = V · S copies where S ∼ Geo(q), V ∼ Be(p)
[Sahu and Zhigljavsky 2003; Gåsemyr 2002]
(ii) Proposals in a semi-Markov approach
[Malefaki and Iliopoulos 2008]
(iii) Dynamic Weighting Monte Carlo and correctly weighted joint density: ∫ w f(x, w) dw ∝ π(x)
[Wong and Liang 1997; Liu, Liang, and Wong 2001]
IMC algorithm
Algorithm 1: Importance Markov chain (IMC)
ℓ ← 0
Set an arbitrary X̃0
for k ← 1 to n do
Draw X̃k ∼ Q(X̃k−1, ·) and Ñk ∼ R̃(X̃k, ·)
Set Nℓ = Ñk
while Nℓ ≥ 1 do
Set (Xℓ, Nℓ) ← (X̃k, Nℓ − 1)
Set ℓ ← ℓ + 1
end
end
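A minimal Python sketch of Algorithm 1 (my illustration, not the authors' code) in the independent case Q(x, ·) = π̃, with π = N(0, 1), π̃ = N(0, 2²) and the shifted-Bernoulli kernel R̃:

import numpy as np
from scipy.stats import norm

def independent_imc(n=20_000, kappa=2.0, seed=0):
    rng = np.random.default_rng(seed)
    out = []
    for _ in range(n):
        x = rng.normal(0.0, 2.0)                                   # X_tilde_k ~ pi_tilde = Q(x, .)
        rho = kappa * norm.pdf(x, 0, 1) / norm.pdf(x, 0, 2)        # rho_kappa(X_tilde_k)
        n_copies = int(np.floor(rho)) + (rng.uniform() < rho - np.floor(rho))
        out.extend([x] * n_copies)                                 # unweighted, replicated output chain
    return np.array(out)

chain = independent_imc()
print(chain.mean(), chain.var())    # roughly 0 and 1 under pi = N(0, 1)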
Dynamics of the chain
If Nk > 0: Xk+1 = Xk, Nk+1 = Nk − 1
If Nk = 0: Xk+1 ∼ Q(Xk, ·), Nk+1 ∼ R(Xk+1, ·)
Formalisation of IMC
Define an extended Markov chain (Xi, Ni) where Xi and X̃i in
same space, and Ni ∈ N (counting number of repetitions)
Associated kernel
Ph(x, n) = I{n≥1} h(x, n − 1) + I{n=0} Σ_{n′=0}^∞ ∫_X S(x, dx′) R(x′, n′) h(x′, n′)
where
ρ_R̃(x) = R̃(x, [1, ∞)) ∈ [0, 1]
R(x, n) := R̃(x, n + 1)/ρ_R̃(x)
Invariant for P
Measure π̄ on X × N:
π̄(h) = κ^{-1} Σ_{n=1}^∞ ∫_X π̃(dx) R̃(x, n) Σ_{k=0}^{n−1} h(x, k)
such that
1. If Σ_{n=0}^∞ n R̃(x, n) = ρκ(x), the marginal of π̄ on the first component is equal to π
2. if π̃Q = π̃, the Markov kernel P is π̄-invariant
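A quick check of point 1 (a sketch using the definitions above): taking h(x, k) = I_A(x) for A ⊂ X, the inner sum equals n I_A(x), so π̄(A × N) = κ^{-1} ∫_A π̃(dx) Σ_{n≥1} n R̃(x, n) = κ^{-1} ∫_A π̃(dx) ρκ(x) = κ^{-1} ∫_A κ π(x) dx = π(A).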
Strong Law of Large Numbers
Theorem 1
If for every ξ ∈ M1(X) and π̃-integrable function g,
lim_{n→∞} n^{-1} Σ_{k=0}^{n−1} g(X̃_k) = π̃(g),   P^Q_ξ-a.s.,
then, for every π̄-integrable function g,
lim_{n→∞} n^{-1} Σ_{k=0}^{K_n−1} g(X_k) N_k = π̄(g),   P^P_ξ-a.s.
Central limit theorem
Theorem 2
Under conditions on Q, π̃, take h : X → R as solution of a
Poisson equation for Q, then
(1/√n) Σ_{i=1}^n (h(X_i) − π(h)) ⇝ N(0, σ²(h)) in P^P_χ-law,
where
σ²(h) = κ σ̃²(ρh0) + κ^{-1} σ̂²(h0, κ),
σ̃²(ρh0) is the variance obtained with Q,
σ̂²(h0, κ) := ∫_X h0²(x) Var^R̃_x[N] π̃(dx),
Var^R̃_x[N] := ∫_N R̃(x, dn) n² − ( ∫_N R̃(x, dn) n )².
Here σ̃²(ρh0) is the variance coming from the instrumental chain and σ̂²(h0, κ) the variance coming from the random number of repetitions
choosing R̃
σ̂ depends on the variance of R̃
Var^R̃_x[N] := ∫_N R̃(x, dn) n² − ( ∫_N R̃(x, dn) n )²
For N an integer-valued random variable such that E[N] = ρ < ∞,
Var(N) ≥ {ρ}(1 − {ρ})
Lower bound met by N = ⌊ρ⌋ + S, where S ∼ Ber({ρ}) (used as default “shifted Bernoulli” kernel)
Geometric ergodicity
Theorem 3
Under assumptions on Q and R̃, P has a unique invariant probability measure π̄ and there exist constants δ, βr > 1, ζ < ∞, such that for all ξ ∈ M1(X × N),
Σ_{k=1}^∞ δ^k d_TV(ξ P^k, π̄) ≤ ζ ∫_{X×N} β_r^n V(x) ξ(dx dn).
Pseudo-marginal version
▶ Cases when the density π not available but replaced by
(unbiased) estimate, leading to pseudo-marginal method.
[Andrieu and Roberts 2009]
▶ Extension: Importance Markov chain “pseudo-marginal
compatible” when unbiased estimate available
draw π̂(Xi) (with expectation π(Xi)) and
Ni ∼ ⌊π̂/π̃(Xi)⌋ + Be({π̂/π̃(Xi)})
by enlarging the chain structure
▶ ... and even with unbiased estimate of π̃
▶ resulting in higher variance
IMC in practice
Several factors to choose :
▶ auxiliary distribution π̃
▶ kernel Q
▶ value of κ
About κ
Several things to note
▶ κ is arbitrary if π and π̃ are unnormalized.
▶ the length of the final chain (Xi) grows linearly in κ for a
fixed length n chain (X̃i)
▶ hence, automatic tuning of κ achieved by setting the length of the output chain
▶ For ESSκ := (Σ_{i=1}^n Ñi)² / Σ_{i=1}^n Ñi² and the usual IS ESS
ESS_IS := (Σ_{i=1}^n ρ(X̃i))² / Σ_{i=1}^n ρ(X̃i)²,
ESSκ → ESS_IS as κ → ∞
Warning: notion of ESS not accounting for convergence of
Markov chain
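For concreteness, both ESS quantities above can be computed from the replication counts and the weights as follows (small helpers of mine):

import numpy as np

def ess_kappa(N_tilde):
    # ESS_kappa := (sum_i N_tilde_i)^2 / sum_i N_tilde_i^2
    N_tilde = np.asarray(N_tilde, dtype=float)
    return N_tilde.sum() ** 2 / (N_tilde ** 2).sum()

def ess_is(rho):
    # usual importance-sampling ESS from the weights rho(X_tilde_i)
    rho = np.asarray(rho, dtype=float)
    return rho.sum() ** 2 / (rho ** 2).sum()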
Impact of κ
[Figure: ESS/n against Mn/n, ESS against κ, and Mn against κ]
Dimension = 5, mixture of 6 Gaussians: impact of κ on ESS and Markov chain length
Independent IMC and normalizing flows
▶ Target: π a d-dimensional distribution, with 2d modes,
concentrated around the sphere
π(x) ∝ exp( −(1/2) ((‖x‖ − 2)/0.1)² − Σ_{i=1}^d log( e^{−(1/2)((x_i+3)/0.6)²} + e^{−(1/2)((x_i−3)/0.6)²} ) )
▶ Instrumental density and kernel: a normalizing flow T is
trained to approximate π: Q(x, .) = π̃(.) = T♯N(0, 1)
▶ Comparison with Metropolis–Hastings and Self
Regenerative MCMC
[Sahu and Zhigljavsky 2003; Gabrié et al 2022]
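A direct transcription of this unnormalized log-density (sketch of mine; the trained normalizing flow itself is not reproduced here):

import numpy as np

def log_target(x):
    # first term: mass concentrated near the sphere of radius 2;
    # second term: minus, for each coordinate, the log of a two-component Gaussian term centred at ±3
    x = np.atleast_1d(x)
    shell = -0.5 * ((np.linalg.norm(x) - 2.0) / 0.1) ** 2
    per_coord = np.logaddexp(-0.5 * ((x + 3.0) / 0.6) ** 2,
                             -0.5 * ((x - 3.0) / 0.6) ** 2)
    return shell - per_coord.sum()

print(log_target(np.array([2.0, 0.0, 0.0, 0.0, 0.0])))   # near one of the 2d modes in d = 5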
[Figure: ESS against dimension (5 to 25) for imc, imh and osr]
Conclusion
▶ Versatile framework that applies to many different kernels
Q and auxiliary distributions π̃
▶ Extensions: adaptive version – multiple auxiliary chains –
delayed acceptance
References I
Andrieu, Christophe and Gareth O. Roberts (Apr. 2009). “The
Pseudo-Marginal Approach for Efficient Monte Carlo Computations”. In: The
Annals of Statistics 37.2, pp. 697–725.
Gåsemyr, Jørund (2002). Markov Chain Monte Carlo Algorithms with
Independent Proposal Distribution and Their Relationship to Importance
Sampling and Rejection Sampling. Tech. rep.
Liu, Jun S, Faming Liang, and Wing Hung Wong (June 2001). “A Theory for
Dynamic Weighting in Monte Carlo Computation”. In: Journal of the
American Statistical Association 96.454, pp. 561–573.
Malefaki, Sonia and George Iliopoulos (Apr. 2008). “On Convergence of
Properly Weighted Samples to the Target Distribution”. In: Journal of
Statistical Planning and Inference 138.4, pp. 1210–1225.
Sahu, Sujit K. and Anatoly A. Zhigljavsky (June 2003). “Self-Regenerative
Markov Chain Monte Carlo with Adaptation”. In: Bernoulli 9.3, pp. 395–422.
Wong, Wing Hung and Faming Liang (Dec. 1997). “Dynamic Weighting in
Monte Carlo and Optimization”. In: Proceedings of the National Academy of
Sciences 94.26, pp. 14220–14224.