Discussion of ABC talk by Francesco Pauli, Padova, March 21, 2013

Discussion of
“Approximate Bayesian Computation (ABC) as the new
empirical Bayes approach” by Christian Robert
The validation of ABC

Francesco Pauli

DEAMS - University of Trieste

Padova, March 21st, 2013


ABC: picture

We actually have πABC(θ|y) = π(θ | s(Y) ∈ Uε(s(yobs))) or a
nonparametric approximation.

We can at best aim at π(θ|s(y))

We want π(θ|y)


ABC: picture



We want π(θ|y)

What legitimates using πABC(θ|y)?
The type of justification is also connected to whether ABC is a computational
tool or a ‘new inference machine’.


ABC: picture

From πABC(θ|y) to π(θ|s(y)): this is the easy part, there are various answers:
“ε matters little if ‘small enough’”,
ε can be included in estimation.
From π(θ|s(y)) to π(θ|y): we need a good statistic S,
good with respect to what?
consistency,
coverage.
We want π(θ|y)



Legitimacy: consistency results I

πABC(θ|s) is consistent for π(θ|s)
It is easily seen that, as ε → 0, πABC(θ|y) tends to π(θ|s(y)).
Biau et al. (2012) frame the approximation as a k-nearest-neighbour
estimate and prove its consistency.
What about π(θ|s) and π(θ|y)?
Equal if S is sufficient for θ.
Consistent if S ‘tends to sufficiency’.
The approach taken is to find conditions under which an insufficient S
still guarantees consistency.
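
The limit ε → 0 can be checked numerically. The sketch below is an illustration and not part of the talk: it assumes a toy model y_i ∼ N(θ, 1) with prior θ ∼ N(0, 2²) and takes s(y) to be the sample mean, which is sufficient here, so that π(θ|s(y)) = π(θ|y) is known in closed form. The function names are hypothetical.

```python
import random
import statistics


def simulate_pairs(n_draws, n, rng, prior_sd=2.0):
    """Draw theta from the prior and return (theta, s(y_sim)) pairs."""
    pairs = []
    for _ in range(n_draws):
        theta = rng.gauss(0.0, prior_sd)
        s_sim = statistics.fmean(rng.gauss(theta, 1.0) for _ in range(n))
        pairs.append((theta, s_sim))
    return pairs


def abc_accept(pairs, s_obs, eps):
    """Rejection ABC: keep theta whenever s(y_sim) is within eps of s(y_obs)."""
    return [theta for theta, s_sim in pairs if abs(s_sim - s_obs) <= eps]


def exact_posterior(s_obs, n, prior_sd=2.0):
    """Mean and variance of the conjugate normal posterior pi(theta | s)."""
    precision = 1.0 / prior_sd**2 + n
    return n * s_obs / precision, 1.0 / precision


rng = random.Random(2013)
n = 20
y_obs = [rng.gauss(1.0, 1.0) for _ in range(n)]  # assumed true theta = 1
s_obs = statistics.fmean(y_obs)
pairs = simulate_pairs(20000, n, rng)
post_mean, post_var = exact_posterior(s_obs, n)
print(f"exact: mean={post_mean:.3f} var={post_var:.3f}")
for eps in (3.0, 0.5, 0.05):
    kept = abc_accept(pairs, s_obs, eps)
    print(f"eps={eps}: kept={len(kept)} mean={statistics.fmean(kept):.3f} "
          f"var={statistics.variance(kept):.3f}")
```

With a large ε the accepted sample stays close to the prior; as ε shrinks, its mean and variance should approach the exact values, at the price of fewer accepted draws.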


Legitimacy: consistency results II

Using the framework of noisy ABC, consistency for π(θ|y) is shown
by Dean, Singh, Jasra & Peters (2011).
The proof is written assuming that observations and not a summary
statistic are used.
However, they also say that “If the mapping S() preserves the
identifiability of the system, that is to say if assumption (A1) also
holds for the HMMs with observations S(Y1); S(Y2), then it is trivial
to see that assumptions (A2)-(A7) will also be preserved for all
reasonable choices of S() and thus that Theorems 1, 2 and 3 will also
hold for ABC MLE performed using the summary statistic”.


Legitimacy: consistency results III

The conclusion is “Theorems 1 and 2 provide a theoretical
justification for the ABC MLE procedure analogous to that provided
for the standard MLE procedure by the classical notion of asymptotic
consistency. In particular they show that an arbitrary degree of
accuracy in the parameter estimate can be achieved given sufficient
data and a sufficiently small ε.”


Legitimacy: consistency results IV

Fearnhead and Prangle (2012) also put forward a consistency result
within the noisy ABC framework,
in particular assuming that a coverage property holds: “[. . .] under
repeated sampling from the prior, data and summary statistics, events
assigned probability q by the ABC posterior will occur with probability
q.”
they show that “[. . .] under the standard regularity conditions
(Bernardo and Smith, 1994), the noisy ABC posterior will converge
onto a point mass on the true parameter value as m → ∞.”


Legitimacy: consistency results V

Marin et al. (2012) focus on consistency in model selection;
they state conditions on the summary statistics in order to obtain
consistent model selection through Bayes factors.
They also point out that “(a) different statistics should be used for
estimation and for testing and (b) that they should not be mixed in a
single summary statistic.”


Legitimacy: consistency results VI

What is the connection between the different results? Are they
equivalent, or do they cover different situations?
Undoubtedly, they offer legitimacy to ABC procedures.
It seems to me they go in the direction of justifying the procedure
per se, not as an approximation of the standard ABC (this might be
relevant to interpreting ABC as a mere computational tool or a new
inference type).


Is consistency enough?

Consistency is a nice property.
It does not say how far from the target π(θ|y) we get in a specific
instance.
The strategy is to find a class of statistics S for which ABC is
consistent.
It does not allow us to say which of the different (insufficient)
statistics, or strategies for selecting the statistics, is better.
It is true that some of the aforementioned works state optimality of
particular strategies, for instance Fearnhead & Prangle state that
“[. . .] choosing summary statistics as the posterior means produces
ABC estimators that are optimal in terms of minimizing quadratic
loss”,
but it is also true that when different procedures are compared the
picture is not totally clear.

Comparison of procedures

Blum et al. (2013) compare methods of dimension reduction in ABC;
that is, the methods differ in the choice of the summary
statistics;
the comparison is based on simulations from three different models;
they suggest that “the most suitable set of summary statistics for
an analysis may be dataset dependent”;
in the end, no uniformly best method is found.

This would call for “application specific” validation to complement the
theory.
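
One possible form of such application-specific validation is sketched below. It is an illustration under stated assumptions, not the procedure of Blum et al.: the model is the toy y_i ∼ N(θ, 1) with prior θ ∼ N(0, 2²), ABC is done by k-nearest neighbours on a reference table, and three hypothetical candidate summaries (mean, median, maximum) are ranked by the mean squared error of the ABC posterior mean over datasets with known θ.

```python
import heapq
import random
import statistics

# Candidate summary statistics (illustrative choices).
SUMMARIES = {"mean": statistics.fmean, "median": statistics.median, "max": max}


def simulate(theta, n, rng):
    """One dataset of size n from the assumed model N(theta, 1)."""
    return [rng.gauss(theta, 1.0) for _ in range(n)]


def compare_summaries(table_size, reps, n, k, rng, prior_sd=2.0):
    """Mean squared error of the k-NN ABC posterior mean, per summary."""
    # Reference table: prior draws with all candidate summaries of their data.
    table = []
    for _ in range(table_size):
        theta = rng.gauss(0.0, prior_sd)
        y = simulate(theta, n, rng)
        table.append((theta, {name: f(y) for name, f in SUMMARIES.items()}))

    sq_err = {name: 0.0 for name in SUMMARIES}
    for _ in range(reps):
        # Pseudo-observed dataset with known parameter theta0.
        theta0 = rng.gauss(0.0, prior_sd)
        y0 = simulate(theta0, n, rng)
        for name, f in SUMMARIES.items():
            s0 = f(y0)
            nearest = heapq.nsmallest(
                k, table, key=lambda row: abs(row[1][name] - s0))
            estimate = statistics.fmean(theta for theta, _ in nearest)
            sq_err[name] += (estimate - theta0) ** 2
    return {name: total / reps for name, total in sq_err.items()}


mse = compare_summaries(3000, 100, 20, 30, random.Random(0))
for name, value in sorted(mse.items(), key=lambda item: item[1]):
    print(f"{name}: MSE = {value:.3f}")
```

In this toy model the ranking is predictable because the mean is sufficient; the point of the exercise is that the same loop can be run for a model where no such theory is available.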


Diagnostics based on coverage properties

Prangle et al. (2013) propose diagnostics based on the coverage
property: “For inference on a continuous scalar parameter, θ, an
informal definition is that a given credible interval based on (θ|y0),
where y0 ∼ π(y|θ0, m0) for fixed m0, should contain the true
parameter, θ0, the appropriate proportion of times.”
Diagnostics are obtained by repeatedly constructing ABC approximations
for known values of the parameters (for known models) and checking
that the coverage property holds.
Technically, this becomes a problem of checking that p-values are
uniformly distributed (details in the closing slides).
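
A minimal sketch of such a diagnostic follows. It is an illustration and not the implementation of Prangle et al.: it assumes the toy model y_i ∼ N(θ, 1) with prior θ ∼ N(0, 2²), the sample mean as summary, k-nearest-neighbour ABC, a randomised rank as p-value, and the Kolmogorov-Smirnov distance from U(0, 1) as the uniformity check. The `shrink` argument is a hypothetical device that makes the approximation deliberately overconfident, to show what a failed check looks like.

```python
import heapq
import random
import statistics


def simulate_summary(theta, n, rng):
    """Sample mean of n draws from N(theta, 1), used as s(y)."""
    return statistics.fmean(rng.gauss(theta, 1.0) for _ in range(n))


def build_table(size, n, rng, prior_sd=2.0):
    """Reference table of (s(y_sim), theta) pairs with theta from the prior."""
    table = []
    for _ in range(size):
        theta = rng.gauss(0.0, prior_sd)
        table.append((simulate_summary(theta, n, rng), theta))
    return table


def abc_knn(table, s_obs, k):
    """ABC sample: the theta values of the k summaries closest to s_obs."""
    nearest = heapq.nsmallest(k, table, key=lambda row: abs(row[0] - s_obs))
    return [theta for _, theta in nearest]


def coverage_p_values(table, reps, n, k, rng, shrink=1.0, prior_sd=2.0):
    """p0 = G_y0(theta0) for repeated (theta0, y0) drawn from prior and model."""
    p_values = []
    for _ in range(reps):
        theta0 = rng.gauss(0.0, prior_sd)
        s0 = simulate_summary(theta0, n, rng)
        draws = abc_knn(table, s0, k)
        centre = statistics.fmean(draws)
        # shrink < 1 narrows the ABC sample (illustrative miscalibration).
        draws = [centre + shrink * (d - centre) for d in draws]
        rank = sum(d <= theta0 for d in draws)
        # Randomised rank, so that p0 is continuous on (0, 1).
        p_values.append((rank + rng.random()) / (k + 1))
    return p_values


def ks_uniform(p_values):
    """Kolmogorov-Smirnov distance between the p-values and U(0, 1)."""
    p = sorted(p_values)
    m = len(p)
    return max(max((i + 1) / m - v, v - i / m) for i, v in enumerate(p))


rng = random.Random(1)
table = build_table(4000, 20, rng)
ks_good = ks_uniform(coverage_p_values(table, 200, 20, 50, rng))
ks_bad = ks_uniform(coverage_p_values(table, 200, 20, 50, rng, shrink=0.3))
print(f"KS distance, plain ABC: {ks_good:.3f}")
print(f"KS distance, shrunk ABC: {ks_bad:.3f}")
```

A small distance is compatible with the coverage property; the shrunk sample should give a clearly larger one because its credible intervals are too narrow.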


What kind of justification is most appropriate?

Is using validation the most appropriate thing to do?
Can we say something about how far we get from π(θ|y)?
Does using validation qualify the method as an approximation or as a new
inference?
Prangle et al. (2013) say that “Note that the above results do not
prove that the posterior π(θ|y) is the only distribution to satisfy
coverage with respect to our choice of H. However, we are unaware of
any other such distributions that are likely to arise in the ABC
context.” This may be more coherent with seeing ABC as a new
inference machine.
Connections with Monahan and Boos (1992)?


ABCel

In ABCel the likelihood is replaced by the empirical likelihood (EL);
no simulation of the sample is involved.
As a side note, it seems to me that this is a different framework, even
if we look at the empirical likelihood as a summary statistic: is ABCel
A?
Anyway, since we substitute the likelihood with a surrogate, the issue
of validating the results we get is relevant.
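
To make the substitution concrete, here is a minimal sketch in which prior draws are weighted by the empirical likelihood instead of being matched to simulated samples. It is an illustration under assumptions, not the algorithm of the talk: the parameter is a mean defined through the constraint E[y − θ] = 0, the prior is N(0, 2²), the Lagrange multiplier is found by bisection, and the function names are hypothetical.

```python
import math
import random
import statistics


def log_el_mean(x, theta, iters=60):
    """Log empirical likelihood ratio for the constraint E[x - theta] = 0."""
    z = [xi - theta for xi in x]
    z_min, z_max = min(z), max(z)
    if z_min >= 0.0 or z_max <= 0.0:
        return float("-inf")  # theta outside the convex hull of the data
    # The multiplier lam must keep every 1 + lam * z_i positive.
    lo, hi = -1.0 / z_max, -1.0 / z_min
    margin = 1e-10 * (hi - lo)
    lo, hi = lo + margin, hi - margin
    for _ in range(iters):
        lam = 0.5 * (lo + hi)
        # The score is decreasing in lam, so bisect on its sign.
        score = sum(zi / (1.0 + lam * zi) for zi in z)
        if score > 0.0:
            lo = lam
        else:
            hi = lam
    lam = 0.5 * (lo + hi)
    return -sum(math.log1p(lam * zi) for zi in z)


def el_weighted_prior_sample(x, n_draws, rng, prior_sd=2.0):
    """Prior draws with normalised empirical-likelihood weights."""
    thetas = [rng.gauss(0.0, prior_sd) for _ in range(n_draws)]
    log_w = [log_el_mean(x, theta) for theta in thetas]
    top = max(log_w)
    if top == float("-inf"):
        raise ValueError("no prior draw falls inside the range of the data")
    w = [math.exp(lw - top) for lw in log_w]
    total = sum(w)
    return thetas, [wi / total for wi in w]


rng = random.Random(7)
x = [rng.gauss(1.0, 1.0) for _ in range(30)]  # assumed data, true mean 1
thetas, weights = el_weighted_prior_sample(x, 1000, rng)
post_mean = sum(t * w for t, w in zip(thetas, weights))
print(f"sample mean = {statistics.fmean(x):.3f}")
print(f"EL-weighted posterior mean = {post_mean:.3f}")
```

No pseudo-sample is generated at any point, which is the sense in which this differs from rejection ABC; whether the weighted sample is a trustworthy posterior is exactly the validation question raised above.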


Legitimacy of EL in (A)BC I

Lazar (2003) proposed using EL in the Bayesian paradigm;
the procedure seems to lack a general justification;
in particular, a simulation study is performed;
the conclusion is that “Based on both the Monahan & Boos (1992)
heuristic and an examination of the frequentist properties of Bayesian
intervals, it appears reasonable to use empirical likelihood within the
Bayesian paradigm.”


Legitimacy of EL in (A)BC II

However, “These results need to be interpreted with some care. While
they indicate that it is feasible to consider a Bayesian inferential
procedure based on replacing the data likelihood with empirical
likelihood, the validity of the posterior inference needs to be
established for each case individually. For example, as demonstrated
in an unpublished Carnegie Mellon University technical report by L. A.
Wasserman, empirical likelihood for the median and Jeffreys’
likelihood are related, and hence the two can be expected to exhibit
similar poor behaviour.”
This may suggest that the proposals above for the diagnostics in ABC
can be exploited here.


Legitimacy of EL in (A)BC III

Adimari & Pauli (2010) also employed EL as a surrogate for the
proper likelihood in the context of pairwise likelihood inference;
they argue that “based on general results for empirical likelihood,
[. . .] such a surrogate has good asymptotic properties.”
In particular, asymptotic normality with covariance matrix the
Godambe information matrix is put forward as a justification;
they also explored its efficacy “by comparing it with the ordinary
posterior distribution on simulated datasets.”


Diagnostics based on coverage properties, details I

g(θ|y), G_y(θ): respectively the density and distribution function approximating π(θ|y);
B(α): [0, 1] → ℬ([0, 1]), mapping α to a Borel set B(α) of measure α;
C(y, α) = G_y^{-1}[B(α)]: a credible interval according to g;
H(θ, y): distribution function for (θ, y).
g satisfies the coverage property with respect to H(θ0, y0) if, for all B and α ∈ [0, 1],

P(θ0 ∈ C(y0, α)) = α

That is, if
p0 = G_{y0}(θ0) ∼ U(0, 1)


Diagnostics based on coverage properties, details II

[Figure: π(θ|y0) and g(θ|y0) plotted against θ, with the probability α and the interval C(y0, α) marked; second axis y.]


Diagnostics based on coverage properties, details III

[Figure: the probability α and the interval C(y0, α), with axes θ and y.]

