Discussion of Fearnhead and Prangle, RSS, Dec. 14, 2011
Semi-automatic ABC: A Discussion
Christian P. Robert
Université Paris-Dauphine, IUF, & CREST
http://xianblog.wordpress.com
November 2, 2011
LaTeX code borrowed from arXiv:1004.1112v2
Approximate Bayesian computation (recap)
Regular Bayesian computation issues
When faced with a non-standard posterior distribution
π(θ|y) ∝ π(θ) L(θ|y)
the standard solution is to use simulation (Monte Carlo) to produce a sample
θ_1, . . . , θ_T
from π(θ|y) (or approximately, by Markov chain Monte Carlo methods)
[Robert & Casella, 2004]
Intractable likelihoods
Cases when the likelihood function f(y|θ) is unavailable and when the completion step
f(y|θ) = ∫_Z f(y, z|θ) dz
is impossible or too costly because of the dimension of z
⇒ MCMC cannot be implemented!
The ABC method
Bayesian setting: target is π(θ) f(x|θ)
When the likelihood f(x|θ) is not in closed form, likelihood-free rejection technique:
ABC algorithm
For an observation y ∼ f(y|θ), under the prior π(θ), keep jointly simulating
θ′ ∼ π(θ), z ∼ f(z|θ′),
until the auxiliary variable z is equal to the observed value, z = y.
[Rubin, 1984; Tavaré et al., 1997]
A as approximative
When y is a continuous random variable, the equality z = y is replaced with a tolerance condition,
ρ(y, z) ≤ ε
where ρ is a distance and ε > 0 a tolerance level
Output distributed from
π(θ) P_θ{ρ(y, z) < ε} ∝ π(θ | ρ(y, z) < ε)
ABC algorithm
Algorithm 1 Likelihood-free rejection sampler
for i = 1 to N do
  repeat
    generate θ′ from the prior distribution π(·)
    generate z from the likelihood f(·|θ′)
  until ρ{η(z), η(y)} ≤ ε
  set θ_i = θ′
end for
where η(y) defines a (generally insufficient) statistic
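As a concrete illustration, here is a minimal sketch of Algorithm 1 in Python on a toy normal-mean model (the model, prior, and tolerance are assumptions of mine, not an example from the paper), chosen so that η is sufficient and the exact posterior is available for comparison:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy model (illustrative assumption): y_1..y_n ~ N(theta, 1), prior theta ~ N(0, 10)
y_obs = rng.normal(2.0, 1.0, size=50)

def eta(y):            # summary statistic eta(.)
    return y.mean()

def rho(a, b):         # distance rho(., .)
    return abs(a - b)

eps = 0.05             # tolerance

def abc_rejection(N):
    """Algorithm 1: keep theta' whenever rho(eta(z), eta(y)) <= eps."""
    samples = []
    while len(samples) < N:
        theta = rng.normal(0.0, np.sqrt(10.0))       # theta' ~ pi(.)
        z = rng.normal(theta, 1.0, size=y_obs.size)  # z ~ f(.|theta')
        if rho(eta(z), eta(y_obs)) <= eps:
            samples.append(theta)                    # set theta_i = theta'
    return np.array(samples)

post = abc_rejection(500)
# eta is sufficient here, so the ABC sample concentrates near the
# exact posterior mean, itself close to y_obs.mean()
print(post.mean())
```

With ε this small the accepted θ′ values are draws from a close approximation to π(θ|η(y)); shrinking ε further trades acceptance rate against approximation error.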
Output
The likelihood-free algorithm samples from the marginal in z of:
π_ε(θ, z|y) = π(θ) f(z|θ) I_{A_{ε,y}}(z) / ∫_{A_{ε,y}×Θ} π(θ) f(z|θ) dz dθ ,
where A_{ε,y} = {z ∈ D : ρ(η(z), η(y)) < ε}.
The idea behind ABC is that the summary statistics coupled with a small tolerance should provide a good approximation of the posterior distribution:
π_ε(θ|y) = ∫ π_ε(θ, z|y) dz ≈ π(θ|η(y)) .
[Not guaranteed!]
Summary statistic selection
F&P’s setting
F&P’s ABC
Use of a summary statistic S(·), an importance proposal g(·), a kernel K(·) ≤ 1, and a bandwidth h > 0 such that
(θ, y_sim) ∼ g(θ) f(y_sim|θ)
is accepted with probability (hence the bound)
K[{S(y_sim) − s_obs}/h]
[or is it K[{S(y_sim) − s_obs}]/h, cf. (2)? typo]
with the corresponding importance weight defined by
π(θ)/g(θ)
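F&P's accept/reject step can be sketched as follows; the toy normal model, the Gaussian-shaped kernel (scaled so that K ≤ 1 and hence usable as an acceptance probability), and the choice of proposal g are all illustrative assumptions of mine:

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy model (illustrative assumption): y ~ N(theta, 1), prior theta ~ N(0, 10)
y_obs = rng.normal(2.0, 1.0, size=50)
S = lambda y: y.mean()                 # summary statistic S(.)
s_obs = S(y_obs)
h = 0.1                                # bandwidth

K = lambda u: np.exp(-0.5 * u**2)      # kernel with K <= 1

prior_pdf = lambda t: np.exp(-0.5 * t**2 / 10) / np.sqrt(2 * np.pi * 10)
g_pdf = lambda t: np.exp(-0.5 * (t - 2.0)**2) / np.sqrt(2 * np.pi)  # proposal g

thetas, weights = [], []
for _ in range(20000):
    theta = rng.normal(2.0, 1.0)                         # theta ~ g
    y_sim = rng.normal(theta, 1.0, size=y_obs.size)      # y_sim ~ f(.|theta)
    if rng.uniform() < K((S(y_sim) - s_obs) / h):        # kernel acceptance
        thetas.append(theta)
        weights.append(prior_pdf(theta) / g_pdf(theta))  # importance weight

thetas, weights = np.array(thetas), np.array(weights)
est = np.sum(weights * thetas) / np.sum(weights)         # self-normalised mean
print(est)
```

The self-normalised weighted average plays the role of E_ABC(θ|s_obs) and should sit close to the posterior mean in this toy setting.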
Errors, errors, and errors
Three levels of approximation:
π(θ|y_obs) by π(θ|s_obs): loss of information
π(θ|s_obs) by
π_ABC(θ|s_obs) = ∫ π(s) K[{s − s_obs}/h] π(θ|s) ds / ∫ π(s) K[{s − s_obs}/h] ds
: noisy observations
π_ABC(θ|s_obs) by importance Monte Carlo based on N simulations, represented by var(a(θ)|s_obs)/N_acc [N_acc the expected number of acceptances]
Average acceptance asymptotics
For the average acceptance probability/approximate likelihood
p(θ|s_obs) = ∫ f(y_sim|θ) K[{S(y_sim) − s_obs}/h] dy_sim ,
the overall acceptance probability is
p(s_obs) = ∫ p(θ|s_obs) π(θ) dθ = π(s_obs) h^d + o(h^d)
[Lemma 1]
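Lemma 1 can be checked empirically: with a uniform kernel and a one-dimensional summary (d = 1), halving h should roughly halve the overall acceptance probability. The toy normal model below is an illustrative assumption of mine:

```python
import numpy as np

rng = np.random.default_rng(2)

# Toy model (illustrative assumption): y ~ N(theta, 1), n = 20, prior theta ~ N(0, 10)
y_obs = rng.normal(0.0, 1.0, size=20)
s_obs = y_obs.mean()

def accept_rate(h, M=200000):
    """Monte Carlo estimate of p(s_obs) with a uniform kernel of width h."""
    theta = rng.normal(0.0, np.sqrt(10.0), size=M)        # theta ~ pi
    s_sim = rng.normal(theta, 1.0 / np.sqrt(20), size=M)  # S(y_sim) | theta
    return np.mean(np.abs(s_sim - s_obs) <= h)

r1, r2 = accept_rate(0.2), accept_rate(0.1)
print(r1 / r2)   # ratio close to 2 when d = 1, per Lemma 1
```

The ratio approaches 2^d as h shrinks, which is exactly the h^d scaling of the lemma.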
Optimal importance proposal
Best choice of importance proposal in terms of effective sample size:
g∗(θ|s_obs) ∝ π(θ) p(θ|s_obs)^{1/2}
[Not particularly useful in practice]
Calibration of h
“This result gives insight into how S(·) and h affect the Monte Carlo error. To minimize Monte Carlo error, we need h^d to be not too small. Thus ideally we want S(·) to be a low dimensional summary of the data that is sufficiently informative about θ that π(θ|s_obs) is close, in some sense, to π(θ|y_obs)” (p.5)
The constraint on h only addresses one term in the approximation error and the acceptance probability:
h large prevents π(θ|s_obs) from being close to π_ABC(θ|s_obs)
d small prevents π(θ|s_obs) from being close to π(θ|y_obs)
Noisy ABC
Calibrated ABC
Definition
For 0 < q < 1 and a subset A, fix [one specific?/all?] event E_q(A) with Pr_ABC(θ ∈ E_q(A)|s_obs) = q. Then ABC is calibrated if
Pr(θ ∈ A|E_q(A)) = q
Why calibrated and not exact?
Theorem
Noisy ABC, where
s_obs = S(y_obs) + hε , ε ∼ K(·) ,
is calibrated
[Wilkinson, 2008]
with no condition on h
Theorem
For noisy ABC, the expected noisy-ABC log-likelihood,
E{log[p(θ|s_obs)]} = ∫∫ log[p(θ|S(y_obs) + hε)] π(y_obs|θ_0) K(ε) dy_obs dε ,
has its maximum at θ = θ_0.
[Last line of proof contains a typo]
True for any choice of summary statistic? [Imposes at least identifiability...]
Relevant in asymptotia and not for the data at hand
Corollary
For noisy ABC, the ABC posterior converges onto a point mass on the true parameter value as m → ∞.
For standard ABC, this is not always the case (unless h goes to zero).
Strength of regularity conditions (c1) and (c2) in Bernardo & Smith, 1994? [constraints on posterior]
Some condition upon the summary statistic?
Optimal summary statistic
Loss motivated statistic
Under a quadratic loss function,
Theorem
(i) The minimal posterior error E[L(θ, θ̂)|y_obs] occurs when θ̂ = E(θ|y_obs) (!)
(ii) When h → 0, E_ABC(θ|s_obs) converges to E(θ|y_obs)
(iii) If S(y_obs) = E[θ|y_obs], then for θ̂ = E_ABC[θ|s_obs],
E[L(θ, θ̂)|y_obs] = trace(AΣ) + h² ∫ xᵀ A x K(x) dx + o(h²) .
Measure-theoretic difficulties?
The dependence of s_obs on h makes me uncomfortable
Relevant for the choice of K?
Optimal summary statistic
“We take a different approach, and weaken the requirement for π_ABC to be a good approximation to π(θ|y_obs). We argue for π_ABC to be a good approximation solely in terms of the accuracy of certain estimates of the parameters.” (p.5)
From this result, F&P derive their choice of summary statistic,
S(y) = E(θ|y)
[almost sufficient: E_ABC[θ|S(y_obs)] = E[θ|y_obs]]
and suggest
h = O(N^{−1/(2+d)}) and h = O(N^{−1/(4+d)})
as optimal bandwidths for noisy and standard ABC, respectively.
Caveat
Since E(θ|y_obs) is usually unavailable, F&P suggest:
(i) use a pilot run of ABC to determine a region of non-negligible
posterior mass;
(ii) simulate sets of parameter values and data;
(iii) use the simulated sets of parameter values and data to
estimate the summary statistic; and
(iv) run ABC with this choice of summary statistic.
Approximating the summary statistic
As in Beaumont et al. (2002) and Blum and François (2010), F&P use a linear regression to approximate E(θ|y_obs), component by component:
θ_i = β_0^(i) + β^(i) f(y_obs) + ε_i
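A sketch of this regression step in Python follows; the toy model and the feature choice f(y) = (mean, variance) are illustrative assumptions of mine, not F&P's pilot-run construction:

```python
import numpy as np

rng = np.random.default_rng(3)

# Simulate training pairs (theta_i, y_i) from the prior predictive
# Toy model (illustrative assumption): y ~ N(theta, 1), prior theta ~ N(0, 4)
M, n = 5000, 30
theta = rng.normal(0.0, 2.0, size=M)                 # theta_i ~ pi
y = rng.normal(theta[:, None], 1.0, size=(M, n))     # y_i ~ f(.|theta_i)

f = lambda y: np.column_stack([y.mean(axis=1), y.var(axis=1)])  # features f(y)
X = np.column_stack([np.ones(M), f(y)])              # design matrix with intercept

beta, *_ = np.linalg.lstsq(X, theta, rcond=None)     # least-squares fit of theta on f(y)

def S(y_new):
    """Estimated summary statistic: fitted regression value, approximating E(theta|y)."""
    return np.concatenate([[1.0], f(y_new[None, :])[0]]) @ beta

y_obs = rng.normal(1.5, 1.0, size=n)
print(S(y_obs))   # close to the posterior mean of theta given y_obs
```

The fitted value S(y) then serves as the summary statistic in a final ABC run, as in step (iv) of the caveat above.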
Applications
The paper’s second half covers:
g-and-k distribution
stochastic kinetic biochemical networks
Lotka-Volterra model
Ricker map ecological model
M/G/1 queue
tuberculosis bacteria genotype data
Questions
dependence on h and S(·) in the early stages
reduction of Bayesian inference to point estimation
approximation error in step (iii) not accounted for
not parameterisation invariant
practice shows that a proper approximation to genuine posterior distributions stems from using a (much) larger number of summary statistics than the dimension of the parameter
the validity of the approximation to the optimal summary statistic depends on the quality of the pilot run
important inferential issues like model choice are not covered by this approach