Discussion of ABC talk by Francesco Pauli, Padova, March 21, 2013
1. Discussion of “Approximate Bayesian Computation (ABC) as the new empirical Bayes approach” by Christian Robert
The validation of ABC
Francesco Pauli
DEAMS - University of Trieste
Padova, March, 21st 2013
F. Pauli (DEAMS - Univ. of Trieste) ABC: validation Padova, March, 21st 2013 1 / 19
5. ABC: picture
We actually have $\pi_{ABC}(\theta \mid y) = \pi(\theta \mid s(Y) \in U_\varepsilon(s(y_{obs})))$, or a nonparametric approximation of it.
Going from here to $\pi(\theta \mid s(y))$ is the easy part; there are various answers:
“ε matters little if ‘small enough’”,
ε can be included in the estimation.
We can at best aim at $\pi(\theta \mid s(y))$.
For this we need a good statistic S; good with respect to what?
consistency,
coverage.
We want $\pi(\theta \mid y)$.
What legitimates using $\pi_{ABC}(\theta \mid y)$?
The type of justification is also connected to whether ABC is a computational tool or a ‘new inference machine’.
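The quantity $\pi_{ABC}(\theta \mid y)$ is what the basic rejection sampler targets: keep the prior draws whose simulated summary lands in the ε-ball around the observed one. The Python sketch below is illustrative only; the normal toy model, the N(0, 3²) prior, the sample mean as s(·) and the helper names (`abc_rejection`, `prior_sample`, `simulate`) are hypothetical choices, not taken from the talk.

```python
import random
import statistics


def abc_rejection(y_obs, prior_sample, simulate, summary, eps, n_draws, rng):
    """Rejection ABC: keep prior draws whose simulated summary falls in the
    eps-ball around s(y_obs); the kept values are a sample from
    pi(theta | s(Y) in U_eps(s(y_obs)))."""
    s_obs = summary(y_obs)
    accepted = []
    for _ in range(n_draws):
        theta = prior_sample(rng)
        y_sim = simulate(theta, len(y_obs), rng)
        if abs(summary(y_sim) - s_obs) <= eps:
            accepted.append(theta)
    return accepted


# Hypothetical toy model: y_i ~ N(theta, 1), prior theta ~ N(0, 3^2).
def prior_sample(rng):
    return rng.gauss(0.0, 3.0)


def simulate(theta, n, rng):
    return [rng.gauss(theta, 1.0) for _ in range(n)]


rng = random.Random(1)
y_obs = simulate(1.5, 20, rng)
post = abc_rejection(y_obs, prior_sample, simulate, statistics.fmean,
                     eps=0.05, n_draws=20_000, rng=rng)
# Conjugate posterior mean, available only because the toy model is normal.
exact_mean = len(y_obs) * statistics.fmean(y_obs) / (len(y_obs) + 1 / 9)
print(len(post), statistics.fmean(post), exact_mean)
```

Since the sample mean is sufficient in this toy model, shrinking `eps` moves the ABC sample towards the exact posterior; with a real simulator the acceptance rate, and hence the cost, is what limits how small ε can be.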
6. Legitimacy: consistency results I
$\pi_{ABC}(\theta \mid s)$ is consistent for $\pi(\theta \mid s)$:
It is easily seen that, as ε → 0, $\pi_{ABC}(\theta \mid y)$ tends to $\pi(\theta \mid s(y))$.
Biau et al (2012) define the approximation as a k-nearest-neighbour estimate and prove its consistency.
What about $\pi(\theta \mid s)$ versus $\pi(\theta \mid y)$?
Equal if S is sufficient for θ.
Consistent if S ‘tends to sufficiency’.
The approach taken is to find conditions under which an insufficient S still guarantees consistency.
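To make the contrast between $\pi(\theta \mid s)$ and $\pi(\theta \mid y)$ concrete, here is a hypothetical Python sketch of a k-nearest-neighbour variant in which the k simulations closest to the observed summary are kept. This is a simplification in the spirit of Biau et al (2012), not their estimator. In a toy normal model (y_i ~ N(θ, 1), prior N(0, 3²)) the sample mean is sufficient, so ABC recovers the exact posterior, whereas using only the first observation is insufficient and gives a much wider $\pi(\theta \mid s)$.

```python
import heapq
import random
import statistics


def abc_knn(s_obs, prior_sample, simulate, summary, n, n_sims, k, rng):
    """k-nearest-neighbour ABC: keep the k prior draws whose simulated
    summaries are closest to s_obs, so the tolerance is set implicitly
    by the k-th smallest distance."""
    sims = []
    for _ in range(n_sims):
        theta = prior_sample(rng)
        s = summary(simulate(theta, n, rng))
        sims.append((abs(s - s_obs), theta))
    return [theta for _, theta in heapq.nsmallest(k, sims)]


# Hypothetical toy model: y_i ~ N(theta, 1), prior theta ~ N(0, 3^2).
def prior_sample(rng):
    return rng.gauss(0.0, 3.0)


def simulate(theta, n, rng):
    return [rng.gauss(theta, 1.0) for _ in range(n)]


def first_obs(y):
    return y[0]  # insufficient: ignores all but one observation


rng = random.Random(2)
y_obs = simulate(1.5, 20, rng)
n = len(y_obs)
mean_stat = statistics.fmean  # sufficient for theta in this model

post_suff = abc_knn(mean_stat(y_obs), prior_sample, simulate, mean_stat,
                    n, 10_000, 200, rng)
post_insuff = abc_knn(first_obs(y_obs), prior_sample, simulate, first_obs,
                      n, 10_000, 200, rng)

exact_mean = n * mean_stat(y_obs) / (n + 1 / 9)
exact_sd = (n + 1 / 9) ** -0.5
print(statistics.fmean(post_suff), exact_mean)
print(statistics.stdev(post_suff), exact_sd, statistics.stdev(post_insuff))
```

Both runs are "consistent" for their own $\pi(\theta \mid s)$; only the sufficient one is close to $\pi(\theta \mid y)$, which is exactly the gap the conditions on S have to close.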
7. Legitimacy: consistency results II
Using the framework of noisy ABC, Dean, Singh, Jasra & Peters (2011) show consistency for $\pi(\theta \mid y)$.
The proof assumes that the observations themselves, not a summary statistic, are used.
However, they also say that “If the mapping S() preserves the
identifiability of the system, that is to say if assumption (A1) also
holds for the HMMs with observations S(Y1); S(Y2), then it is trivial
to see that assumptions (A2)-(A7) will also be preserved for all
reasonable choices of S() and thus that Theorems 1, 2 and 3 will also
hold for ABC MLE performed using the summary statistic”
8. Legitimacy: consistency results III
The conclusion is “Theorems 1 and 2 provide a theoretical
justification for the ABC MLE procedure analogous to that provided
for the standard MLE procedure by the classical notion of asymptotic
consistency. In particular they show that an arbitrary degree of
accuracy in the parameter estimate can be achieved given sufficient
data and a sufficiently small ε.”
9. Legitimacy: consistency results IV
Fearnhead and Prangle (2012) also put forward a consistency result
within the noisy ABC framework,
in particular assuming that a coverage property holds: “[. . .] under
repeated sampling from the prior, data and summary statistics, events
assigned probability q by the ABC posterior will occur with probability
q.”
they show that “[. . .] under the standard regularity conditions
(Bernardo and Smith, 1994), the noisy ABC posterior will converge
onto a point mass on the true parameter value as m → ∞.”
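Noisy ABC, the framework behind this result and the one by Dean et al, perturbs the observed summary once before running ABC. A minimal Python sketch, assuming a uniform kernel of half-width `eps` for both the perturbation and the acceptance step, and a hypothetical toy model y_i ~ N(θ, 1) with prior N(0, 3²); the function names are illustrative.

```python
import random
import statistics


def noisy_abc_rejection(y_obs, prior_sample, simulate, summary, eps, n_draws, rng):
    """Noisy ABC sketch: jitter s(y_obs) once with noise from the acceptance
    kernel (uniform on [-eps, eps]), then run rejection ABC around the
    jittered value instead of s(y_obs) itself."""
    s_noisy = summary(y_obs) + rng.uniform(-eps, eps)
    accepted = []
    for _ in range(n_draws):
        theta = prior_sample(rng)
        if abs(summary(simulate(theta, len(y_obs), rng)) - s_noisy) <= eps:
            accepted.append(theta)
    return accepted


# Hypothetical toy model: y_i ~ N(theta, 1), prior theta ~ N(0, 3^2).
def prior_sample(rng):
    return rng.gauss(0.0, 3.0)


def simulate(theta, n, rng):
    return [rng.gauss(theta, 1.0) for _ in range(n)]


rng = random.Random(3)
y_obs = simulate(1.5, 20, rng)
post = noisy_abc_rejection(y_obs, prior_sample, simulate, statistics.fmean,
                           eps=0.05, n_draws=20_000, rng=rng)
print(len(post), statistics.fmean(post))
```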
10. Legitimacy: consistency results V
Marin et al (2012) focus on consistency in model selection;
they state conditions on the summary statistics under which model selection through Bayes factors is consistent;
they also point out that “(a) different statistics should be used for estimation and for testing and (b) that they should not be mixed in a single summary statistic.”
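A rough Python illustration of ABC model choice, loosely in the spirit of the point about statistics for testing (it is not Marin et al's formal condition): a model index is drawn from a uniform prior, and the share of each model among the k nearest simulations estimates its posterior probability, so the ratio of counts estimates the Bayes factor. The Poisson-versus-geometric setting, the priors, the (mean, variance) summary and all function names are hypothetical choices. Both models can reproduce the sample mean; only the variance separates them.

```python
import heapq
import math
import random


def sim_poisson(lam, n, rng):
    """Poisson draws by Knuth's multiplication method (fine for small lam)."""
    limit = math.exp(-lam)
    out = []
    for _ in range(n):
        k, p = 0, rng.random()
        while p > limit:
            k += 1
            p *= rng.random()
        out.append(k)
    return out


def sim_geometric(p, n, rng):
    """Number of failures before the first success, by inversion."""
    return [math.floor(math.log(1.0 - rng.random()) / math.log(1.0 - p))
            for _ in range(n)]


def summary(y):
    m = sum(y) / len(y)
    return m, sum((v - m) ** 2 for v in y) / len(y)


def abc_model_choice(y_obs, n_sims, k, rng):
    """Return how many of the k nearest simulations come from each model."""
    m_obs, v_obs = summary(y_obs)
    sims = []
    for _ in range(n_sims):
        model = rng.choice(("poisson", "geometric"))  # uniform model prior
        if model == "poisson":
            y = sim_poisson(rng.uniform(0.0, 10.0), len(y_obs), rng)
        else:
            y = sim_geometric(rng.uniform(0.01, 0.99), len(y_obs), rng)
        m, v = summary(y)
        sims.append((math.hypot(m - m_obs, v - v_obs), model))
    nearest = heapq.nsmallest(k, sims)
    n_pois = sum(1 for _, model in nearest if model == "poisson")
    return n_pois, k - n_pois


rng = random.Random(4)
y_obs = sim_poisson(3.0, 50, rng)
n_pois, n_geom = abc_model_choice(y_obs, n_sims=4000, k=100, rng=rng)
# With equal prior model weights, n_pois / n_geom estimates the Bayes factor.
print(n_pois, n_geom)
```

Dropping the variance from `summary` would leave a statistic both models can match equally well, which is the kind of summary the quoted advice warns against for testing.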
11. Legitimacy: consistency results VI
How are the different results connected? Are they equivalent, or do they cover different situations?
Undoubtedly, they offer legitimacy to ABC procedures.
It seems to me that they go in the direction of justifying the procedure per se, not as an approximation of the standard ABC (this might be relevant to interpreting ABC as a mere computational tool or as a new type of inference).
12. Is consistency enough?
Consistency is a nice property.
It does not say how far from the target $\pi(\theta \mid y)$ we get in a specific instance.
The strategy is to find a class of statistics S for which ABC is consistent.
It does not allow us to say which of the different (insufficient) statistics, or which strategy for selecting them, is better.
It is true that some of the aforementioned works state the optimality of particular strategies; for instance, Fearnhead & Prangle state that “[. . .] choosing summary statistics as the posterior means produces ABC estimators that are optimal in terms of minimizing quadratic loss”,
but it is also true that, when different procedures are compared, the picture is not entirely clear.
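The posterior-mean summaries in the quote can be approximated by regressing θ on features of the data in pilot simulations, which is roughly how Fearnhead & Prangle construct them. The Python sketch below is a hypothetical simplification: a toy model θ ~ N(0, 3²), y_i ~ N(θ, 1) with n = 5, the raw observations as regressors, and ordinary least squares solved by hand (to avoid external libraries). The fitted predictor `s_hat` estimates E[θ | y] and would then be used as s(y) in a second ABC run.

```python
import random


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fit_linear_summary(thetas, features):
    """Least-squares fit of theta on (1, features); returns the coefficients."""
    rows = [[1.0] + list(f) for f in features]
    p = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * t for r, t in zip(rows, thetas)) for i in range(p)]
    return solve(xtx, xty)


# Pilot simulations from the hypothetical model theta ~ N(0, 3^2), y_i ~ N(theta, 1).
rng = random.Random(5)
n = 5
pilot_thetas, pilot_features = [], []
for _ in range(5000):
    theta = rng.gauss(0.0, 3.0)
    pilot_thetas.append(theta)
    pilot_features.append([rng.gauss(theta, 1.0) for _ in range(n)])

beta = fit_linear_summary(pilot_thetas, pilot_features)


def s_hat(y):
    """Estimated posterior mean E[theta | y], used as the ABC summary."""
    return beta[0] + sum(b * v for b, v in zip(beta[1:], y))


# The exact posterior mean is sum(y) / (n + 1/9): equal weights of about 0.196.
print([round(b, 3) for b in beta])
```

In this toy case the regression essentially rediscovers a rescaled sample mean; in realistic models the choice of regressors is where the different strategies part ways.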
13. Comparison of procedures
Blum et al (2013) compare methods of dimension reduction in ABC;
that is, the methods differ in how the summary statistics are chosen;
the comparison is based on simulations from three different models;
they put forward that “the most suitable set of summary statistics for an analysis may be dataset dependent”;
in the end, no uniformly best method is found.
This would call for “application specific” validation to complement the
theory.
14. Diagnostics based on coverage properties
Prangle et al (2013) propose diagnostics based on the coverage property: “For inference on a continuous scalar parameter, θ, an informal definition is that a given credible interval based on $(\theta \mid y_0)$, where $y_0 \sim \pi(y \mid \theta_0, m_0)$ for fixed $m_0$, should contain the true parameter, $\theta_0$, the appropriate proportion of times.”
Diagnostics are obtained by repeatedly constructing ABC approximations for data simulated from known parameter values (and known models) and checking that the coverage property holds.
Technically, this becomes a problem of checking that certain p-values are uniformly distributed (details at the end).
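The diagnostic can be sketched in a few lines of Python. Everything specific here is a hypothetical choice: a toy normal model (θ ~ N(0, 3²), ten observations y_i ~ N(θ, 1)) in which the sample mean is simulated directly, a k-nearest-neighbour ABC built on one shared reference table, and a Kolmogorov-Smirnov distance as the uniformity check. The `shrink` argument deliberately narrows the ABC sample to show what a failed diagnostic looks like.

```python
import bisect
import math
import random

N_OBS = 10       # observations per dataset (hypothetical)
PRIOR_SD = 3.0   # prior theta ~ N(0, PRIOR_SD^2) (hypothetical)


def draw_summary(theta, rng):
    # Toy model y_i ~ N(theta, 1): the sample mean is N(theta, 1 / N_OBS),
    # so the summary is simulated directly rather than the full dataset.
    return rng.gauss(theta, 1.0 / math.sqrt(N_OBS))


def build_reference_table(n_ref, rng):
    """Prior draws and their simulated summaries, sorted by summary."""
    pairs = []
    for _ in range(n_ref):
        theta = rng.gauss(0.0, PRIOR_SD)
        pairs.append((draw_summary(theta, rng), theta))
    pairs.sort()
    return [s for s, _ in pairs], [t for _, t in pairs]


def knn_thetas(s_sorted, th_sorted, s0, k):
    """ABC sample: thetas of the k reference summaries closest to s0."""
    hi = bisect.bisect_left(s_sorted, s0)
    lo = hi - 1
    out = []
    while len(out) < k:
        if lo < 0 or (hi < len(s_sorted) and s_sorted[hi] - s0 < s0 - s_sorted[lo]):
            out.append(th_sorted[hi])
            hi += 1
        else:
            out.append(th_sorted[lo])
            lo -= 1
    return out


def coverage_pvalues(s_sorted, th_sorted, n_rep, k, rng, shrink=1.0):
    """p0 = G_{y0}(theta0), the share of ABC draws <= the true theta0, over
    n_rep datasets simulated from known theta0. shrink < 1 narrows the ABC
    sample around its mean to mimic an overconfident approximation."""
    pvals = []
    for _ in range(n_rep):
        theta0 = rng.gauss(0.0, PRIOR_SD)
        draws = knn_thetas(s_sorted, th_sorted, draw_summary(theta0, rng), k)
        centre = sum(draws) / k
        pvals.append(sum(centre + shrink * (t - centre) <= theta0 for t in draws) / k)
    return pvals


def ks_uniform(pvals):
    """Kolmogorov-Smirnov distance between the p-values and U(0, 1)."""
    p = sorted(pvals)
    n = len(p)
    return max(max((i + 1) / n - x, x - i / n) for i, x in enumerate(p))


rng = random.Random(6)
s_sorted, th_sorted = build_reference_table(50_000, rng)
d_good = ks_uniform(coverage_pvalues(s_sorted, th_sorted, 300, 200, rng))
d_bad = ks_uniform(coverage_pvalues(s_sorted, th_sorted, 300, 200, rng, shrink=0.3))
print(round(d_good, 3), round(d_bad, 3), "approx. 5% critical value:",
      round(1.36 / math.sqrt(300), 3))
```

With a good ABC approximation the distance stays at the level of sampling noise, while the overconfident version pushes the p-values towards 0 and 1 and inflates it.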
15. What kind of justification is most appropriate?
Is using validation the most appropriate thing to do?
Can we say something about how far we get from $\pi(\theta \mid y)$?
Does using validation qualify the method as an approximation or as a new inference?
Prangle et al (2013) say that “Note that the above results do not prove that the posterior $\pi(\theta \mid y)$ is the only distribution to satisfy coverage with respect to our choice of H. However, we are unaware of any other such distributions that are likely to arise in the ABC context.” This may be more consistent with seeing ABC as a new inference machine.
Connections with Monahan and Boos (1992)?
16. ABCel
In ABCel the likelihood is replaced by the empirical likelihood (EL);
no simulation of the sample is involved;
as a side note, it seems to me that this is a different framework, even if we regard the empirical likelihood as a summary statistic: is ABCel A?
In any case, since we replace the likelihood with a surrogate, validating the results we get is a relevant issue.
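To make the substitution concrete, here is a hypothetical Python sketch for the simplest case, the empirical likelihood of a mean: the log EL ratio is computed through its Lagrange multiplier (found by bisection), and prior draws are then weighted by the EL, with no samples simulated at all. Weighting prior draws by the EL is our reading of the ABCel idea; the uniform prior, the data and the function names are illustrative assumptions.

```python
import math
import random


def log_el_mean(x, mu):
    """Log empirical likelihood ratio for the mean mu of the data x.
    Returns -inf when mu is outside the convex hull of the data."""
    d = [xi - mu for xi in x]
    if min(d) >= 0 or max(d) <= 0:
        return -math.inf
    # The multiplier must keep every 1 + lam * d_i > 0.
    lo, hi = -1.0 / max(d), -1.0 / min(d)

    def score(lam):
        return sum(di / (1.0 + lam * di) for di in d)

    # score is strictly decreasing on (lo, hi), so bisect for its root.
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        if score(mid) > 0:
            lo = mid
        else:
            hi = mid
        if hi - lo <= 1e-12 * (1.0 + abs(lo) + abs(hi)):
            break
    lam = 0.5 * (lo + hi)
    return -sum(math.log1p(lam * di) for di in d)


def abcel_posterior_mean(x, prior_sample, n_draws, rng):
    """Weight prior draws by their empirical likelihood; no data are simulated."""
    thetas = [prior_sample(rng) for _ in range(n_draws)]
    log_w = [log_el_mean(x, t) for t in thetas]
    top = max(log_w)
    w = [math.exp(lw - top) for lw in log_w]  # zero weight outside the hull
    return sum(wi * t for wi, t in zip(w, thetas)) / sum(w)


rng = random.Random(7)
x = [rng.gauss(2.0, 1.0) for _ in range(30)]
est = abcel_posterior_mean(x, lambda r: r.uniform(-5.0, 5.0), 3000, rng)
print(round(est, 3), round(sum(x) / len(x), 3))
```

The EL ratio equals 1 at the sample mean and vanishes outside the convex hull of the data, which is why nothing in this sketch tells us whether the resulting "posterior" is calibrated; that still has to be checked separately.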
17. Legitimacy of EL in (A)BC I
Lazar (2003) proposed using EL in the Bayesian paradigm;
the procedure seems to lack a general justification;
instead, a simulation study is performed;
the conclusion is that “Based on both the Monahan & Boos (1992)
heuristic and an examination of the frequentist properties of Bayesian
intervals, it appears reasonable to use empirical likelihood within the
Bayesian paradigm.”
18. Legitimacy of EL in (A)BC II
however “These results need to be interpreted with some care. While
they indicate that it is feasible to consider a Bayesian inferential
procedure based on replacing the data likelihood with empirical
likelihood, the validity of the posterior inference needs to be
established for each case individually. For example, as demonstrated
in an unpublished Carnegie Mellon University technical report by L. A.
Wasserman, empirical likelihood for the median and Jeffreys’
likelihood are related, and hence the two can be expected to exhibit
similar poor behaviour.”
This suggests that the diagnostics proposed above for ABC could be exploited here.
19. Legitimacy of EL in (A)BC III
Adimari & Pauli (2010) also employed EL as a surrogate for the
proper likelihood in the context of pairwise likelihood inference;
they argue that “based on general results for empirical likelihood, [. . .] such a surrogate has good asymptotic properties.”
In particular, asymptotic normality, with covariance matrix determined by the Godambe information matrix, is put forward as a justification;
they also explored its efficacy “by comparing it with the ordinary
posterior distribution on simulated datasets.”
20. Diagnostics based on coverage properties, details I
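$g(\theta \mid y)$, $G_y(\theta)$: respectively the density and the distribution function approximating $\pi(\theta \mid y)$;
$B(\alpha): [0, 1] \to \mathcal{B}([0, 1])$ such that $M(B(\alpha)) = \alpha$, with $M$ the Lebesgue measure;
$C(y, \alpha) = G_y^{-1}[B(\alpha)]$: a credible set of level α according to g;
$H(\theta, y)$: the distribution function of $(\theta, y)$.
g satisfies the coverage property with respect to $H(\theta_0, y_0)$ if, for all B and all $\alpha \in [0, 1]$,
$$P(\theta_0 \in C(y_0, \alpha)) = \alpha.$$
That is, if
$$p_0 = G_{y_0}(\theta_0) \sim U(0, 1).$$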
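The equivalence between coverage and uniformity of $p_0$ follows in one line from the definition of $C(y, \alpha)$ as a preimage under $G_y$, assuming each $B(\alpha)$ has Lebesgue measure α:

```latex
P\big(\theta_0 \in C(y_0,\alpha)\big)
  = P\big(\theta_0 \in G_{y_0}^{-1}[B(\alpha)]\big)
  = P\big(G_{y_0}(\theta_0) \in B(\alpha)\big)
  = P\big(p_0 \in B(\alpha)\big),
  \qquad p_0 = G_{y_0}(\theta_0).
```

If $p_0 \sim U(0, 1)$, the last probability equals the Lebesgue measure of $B(\alpha)$, that is α, for every choice of B. Conversely, taking $B(\alpha) = [0, \alpha]$ gives $P(p_0 \le \alpha) = \alpha$ for all α, which is exactly uniformity.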
21. Diagnostics based on coverage properties, details II
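[Figure: the densities $\pi(\theta \mid y_0)$ and $g(\theta \mid y_0)$, a level α and the corresponding credible set $C(y_0, \alpha)$; axis labels θ and y.]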
22. Diagnostics based on coverage properties, details III
[Figure: the level α and the credible set $C(y_0, \alpha)$; axis labels θ and y.]