PAWL - GPU meeting @ Warwick
Parallel Adaptive Wang–Landau Algorithm
Pierre E. Jacob
CEREMADE - Université Paris Dauphine, funded by AXA Research
GPU in Computational Statistics
January 25th, 2012
joint work with Luke Bornn (UBC), Arnaud Doucet (Oxford),
Pierre Del Moral (INRIA & Université de Bordeaux), Robin J. Ryder (Dauphine)
Outline
1 Wang–Landau algorithm
2 Improvements
Automatic Binning
Adaptive proposals
Parallel Interacting Chains
3 Example: variable selection
4 Conclusion
Wang–Landau
Context
unnormalized target density π
on a state space X
A kind of adaptive MCMC algorithm:
It iteratively generates a sequence (X_t).
The stationary distribution is not π itself.
At each iteration a different stationary distribution is targeted.
Wang–Landau
Partition the space
The state space X is cut into d bins:
X = \bigcup_{i=1}^{d} X_i \quad \text{and} \quad \forall\, i \neq j,\; X_i \cap X_j = \emptyset
Goal
The generated sequence spends a desired proportion φ_i of
time in each bin X_i,
within each bin X_i the sequence is asymptotically distributed
according to the restriction of π to X_i.
Wang–Landau
Stationary distribution
Define the mass of π over X_i by:
\psi_i = \int_{X_i} \pi(x)\,\mathrm{d}x
The stationary distribution of the WL algorithm is:
\tilde{\pi}(x) \propto \pi(x) \times \frac{\phi_{J(x)}}{\psi_{J(x)}}
where J(x) is the index such that x ∈ X_{J(x)}.
Wang–Landau
Example with a bimodal, univariate target density: π and two π̃
corresponding to different partitions. Here φ_i = d^{-1}.
[Figure: log density against X, three panels: Original Density with partition lines; Biased by X; Biased by Log Density.]
Wang–Landau
Plugging estimates
In practice we cannot compute ψ_i analytically. Instead we plug in
estimates θ_t(i) of ψ_i/φ_i at iteration t, and define the distribution
π_{θ_t} by:
\pi_{\theta_t}(x) \propto \pi(x) \times \frac{1}{\theta_t(J(x))}
Metropolis–Hastings
At each iteration t the algorithm performs a Metropolis–Hastings step
targeting π_{θ_t}, generating a new point X_t, updating θ_t . . .
Wang–Landau
Estimate of the bias
The update of the estimated bias θ_t(i) is done according to:
\theta_t(i) \leftarrow \theta_{t-1}(i)\left[1 + \gamma_t\left(\mathbf{1}_{X_i}(X_t) - \phi_i\right)\right]
with d the number of bins and {γ_t} a decreasing sequence of "step
sizes", e.g. γ_t = 1/t.
If 1_{X_i}(X_t) = 1 then θ_t(i) increases;
otherwise θ_t(i) decreases.
Wang–Landau
The algorithm itself
1: First, ∀ i ∈ {1, . . . , d} set θ_0(i) ← 1.
2: Choose a decreasing sequence {γ_t}, typically γ_t = 1/t.
3: Sample X_0 from an initial distribution π_0.
4: for t = 1 to T do
5:   Sample X_t from P_{t−1}(X_{t−1}, ·), a MH kernel with invariant distribution π_{θ_{t−1}}.
6:   Update the bias: θ_t(i) ← θ_{t−1}(i)[1 + γ_t(1_{X_i}(X_t) − φ_i)].
7: end for
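
To make the recursion concrete, here is a minimal R sketch of the loop above. The bimodal target, the bin boundaries and the Gaussian random-walk proposal are illustrative assumptions, not the slides' example; it takes φ_i = 1/d throughout.

# Minimal Wang-Landau sketch (illustrative target, bins and proposal).
set.seed(1)
logtarget <- function(x) log(0.5 * dnorm(x, -3) + 0.5 * dnorm(x, 8))
cuts <- c(-Inf, seq(-5, 12, length.out = 9), Inf)   # bin boundaries: d = 10 bins
d <- length(cuts) - 1
phi <- rep(1 / d, d)                                # desired proportions phi_i
J <- function(x) findInterval(x, cuts)              # bin index J(x)
theta <- rep(1, d)                                  # step 1: theta_0(i) <- 1
x <- 0                                              # step 3: initial point X_0
for (t in 1:10000) {
  gamma_t <- 1 / t                                  # step 2: gamma_t = 1/t
  y <- x + rnorm(1)                                 # step 5: MH move targeting
  logratio <- (logtarget(y) - log(theta[J(y)])) -   #   pi(x) / theta(J(x))
              (logtarget(x) - log(theta[J(x)]))
  if (log(runif(1)) < logratio) x <- y
  theta <- theta * (1 + gamma_t * ((seq_len(d) == J(x)) - phi))  # step 6
}

After the run, θ(i) estimates ψ_i/φ_i up to a common constant, and the chain's visits should spread across bins roughly in the proportions φ.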
Wang–Landau
Result
In the end we get:
a sequence X_t asymptotically following π̃,
as well as estimates θ_t(i) of ψ_i/φ_i.
Wang–Landau
Usual improvement: Flat Histogram
Wait for the FH criterion to be met before decreasing γ_t:
(FH) \quad \max_{i=1,\dots,d} \left| \frac{\nu_t(i)}{t} - \phi_i \right| < c
where \nu_t(i) = \sum_{k=1}^{t} \mathbf{1}_{X_i}(X_k) and c > 0.
WL with stochastic schedule
Let κ_t be the number of times FH has been reached by iteration t. Use
γ_{κ_t} at iteration t instead of γ_t. When FH is reached, reset ν_t(i) to 0.
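
A sketch of the FH check and the schedule bookkeeping, in the notation of the earlier R sketch; the visit counter nu, the counter kappa and the threshold c = 0.5 are illustrative names and values, not the slides'.

# Flat Histogram criterion: are the observed proportions within c of phi?
flat_histogram <- function(nu, t, phi, c = 0.5) max(abs(nu / t - phi)) < c
# Inside the chain loop one would keep, for instance:
#   nu[J(x)] <- nu[J(x)] + 1; t_reset <- t_reset + 1
#   if (flat_histogram(nu, t_reset, phi)) { kappa <- kappa + 1; nu[] <- 0; t_reset <- 0 }
#   gamma_t <- 1 / (1 + kappa)    # gamma_{kappa_t} instead of gamma_t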
Wang–Landau
Theoretical understanding of WL with deterministic schedule
The schedule γ_t decreases at each iteration, hence θ_t converges,
hence P_t(·, ·) converges . . . → "diminishing adaptation".
Theoretical understanding of WL with stochastic schedule
Flat Histogram is reached in finite time for any γ, φ, c if one uses
the following update:
\log \theta_t(i) \leftarrow \log \theta_{t-1}(i) + \gamma\left(\mathbf{1}_{X_i}(X_t) - \phi_i\right)
instead of
\theta_t(i) \leftarrow \theta_{t-1}(i)\left[1 + \gamma\left(\mathbf{1}_{X_i}(X_t) - \phi_i\right)\right]
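
In code the two recursions differ by one line; a small self-contained sketch in the earlier notation, where i_hit is the bin index of the current point:

# The two updates side by side, as R functions.
update_log <- function(log_theta, gamma, i_hit, phi)   # finite-time FH version
  log_theta + gamma * ((seq_along(log_theta) == i_hit) - phi)
update_lin <- function(theta, gamma, i_hit, phi)       # original recursion
  theta * (1 + gamma * ((seq_along(theta) == i_hit) - phi))
# For small gamma they agree to first order, since log(1 + u) = u + O(u^2);
# on the log scale theta stays positive whatever gamma.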
Automatic Binning
Maintain some kind of uniformity within bins: if the histogram within
a bin is non-uniform, split the bin.
[Figure: frequency against log density, (a) before the split and (b) after the split.]
Adaptive proposals
Target a specific acceptance rate:
\sigma_{t+1} = \sigma_t + \rho_t\left(2\,\mathbf{1}(A_t > 0.234) - 1\right)
Or use the empirical covariance of the already-generated chain:
\Sigma_t = \delta \times \mathrm{Cov}(X_1, \dots, X_t)
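
Both adaptations are one-liners; a hedged R sketch, where the function names and the 2.38^2/dim default for δ are common choices of mine, not the slides':

# Tune a random-walk scale towards the 23.4% acceptance target.
adapt_scale <- function(sigma, rho, accept_rate)
  sigma + rho * (2 * (accept_rate > 0.234) - 1)
# Proposal covariance from the chain so far (rows = iterations).
adapt_cov <- function(chain, delta = 2.38^2 / ncol(chain))
  delta * cov(chain)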
Parallel Interacting Chains
N chains (X_t^{(1)}, . . . , X_t^{(N)}) instead of one,
targeting the same biased distribution π_{θ_t} at iteration t,
sharing the same estimated bias θ_t at iteration t.
The update of the estimated bias becomes:
\log \theta_t(i) \leftarrow \log \theta_{t-1}(i) + \gamma_{\kappa_t}\left(\frac{1}{N}\sum_{j=1}^{N} \mathbf{1}_{X_i}\left(X_t^{(j)}\right) - \phi_i\right)
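
A sketch of this update in R, continuing the earlier notation; xs holds the N current states, and J, d, phi are as before:

# Interacting-chains bias update: N chains share one theta.
update_log_theta <- function(log_theta, xs, gamma_kappa, J, d, phi) {
  # occupation proportions (1/N) sum_j 1_Xi(X_t^(j)), per bin i
  occup <- tabulate(vapply(xs, J, integer(1L)), nbins = d) / length(xs)
  log_theta + gamma_kappa * (occup - phi)
}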
Parallel Interacting Chains
How "parallel" is PAWL?
The algorithm's additional cost compared to N independent parallel
MCMC chains lies in:
getting the proportions \frac{1}{N}\sum_{j=1}^{N} \mathbf{1}_{X_i}(X_t^{(j)}),
updating (θ_t(1), . . . , θ_t(d)).
Parallel Interacting Chains
Example: Normal distribution
[Figure: histogram of the binned coordinate, with density on the vertical axis.]
Parallel Interacting Chains
Reaching Flat Histogram
[Figure: number of times FH is reached (#FH) against iterations, for N = 1, N = 10, N = 100.]
Parallel Interacting Chains
Stabilization of the log penalties
Figure: log θ_t against t, for N = 1.
Parallel Interacting Chains
Stabilization of the log penalties
Figure: log θ_t against t, for N = 10.
Parallel Interacting Chains
Stabilization of the log penalties
Figure: log θ_t against t, for N = 100.
Parallel Interacting Chains
Multiple effects of parallel chains
\log \theta_t(i) \leftarrow \log \theta_{t-1}(i) + \gamma_{\kappa_t}\left(\frac{1}{N}\sum_{j=1}^{N} \mathbf{1}_{X_i}\left(X_t^{(j)}\right) - \phi_i\right)
FH is reached more often when N increases, hence γ_{κ_t}
decreases more quickly;
log θ_t tends to vary much less when N increases, even for a
fixed value of γ.
Variable selection
Settings
Pollution data as in McDonald & Schwing (1973). For 60
metropolitan areas:
15 possible explanatory variables (including precipitation,
population per household, . . . ), denoted by X,
the response variable Y is the age-adjusted mortality rate.
This leads to 2^15 = 32,768 possible models to explain the data.
Variable selection
Introduce
γ ∈ {0, 1}^p the "variable selector",
q_γ the number of variables in model γ,
g some large value (g-prior, see Zellner 1986, Marin & Robert
2007).
Posterior distribution
\pi(\gamma \mid y, X) \propto (g+1)^{-(q_\gamma+1)/2} \left[ y^T y - \frac{g}{g+1}\, y^T X_\gamma \left(X_\gamma^T X_\gamma\right)^{-1} X_\gamma^T y \right]^{-n/2}
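
A sketch of this posterior as an R function; an illustrative helper of mine, assuming y is centered and X holds the p candidate covariates:

# Log posterior of model gamma under Zellner's g-prior, up to a constant.
log_post <- function(gamma, y, X, g) {
  n <- length(y); q <- sum(gamma)
  bracket <- sum(y^2)                      # y'y
  if (q > 0) {
    Xg <- X[, gamma == 1, drop = FALSE]
    bracket <- bracket - (g / (g + 1)) *   # minus g/(g+1) y'Xg (Xg'Xg)^-1 Xg'y
      as.numeric(crossprod(y, Xg) %*% solve(crossprod(Xg), crossprod(Xg, y)))
  }
  -(q + 1) / 2 * log(g + 1) - n / 2 * log(bracket)
}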
Variable selection
Most naive MH algorithm
The proposal flips one variable on/off at random at each iteration.
Binning
Along values of log π(x), found with a preliminary exploration, in
20 bins.
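
Sketches of both ingredients, as illustrative helpers rather than the paper's code:

# Naive proposal: flip one of the p variables on/off at random.
flip <- function(gamma) {
  j <- sample(length(gamma), 1)
  gamma[j] <- 1 - gamma[j]
  gamma
}
# 20 bins along log pi(x); lo and hi come from the preliminary exploration.
make_binner <- function(lo, hi, nbins = 20) {
  cuts <- seq(lo, hi, length.out = nbins + 1)
  function(logdens) findInterval(logdens, cuts, all.inside = TRUE)
}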
Variable selection
Figure: log(θ) against iteration, for N = 1, N = 10 and N = 100. Each run took 2 minutes (+/- 5 seconds). Dotted lines show the real ψ.
Variable selection
Figure: model saturation q_γ/p (mean and 95% interval) along iterations, for N = 100. Panels: Wang–Landau, and Metropolis–Hastings with Temp = 1, 10, 100.
Conclusion
Automatic binning, but . . .
We still have to define a range of plausible (or "interesting")
values.
Parallel Chains
It seems reasonable to use more than N = 1 chain, with or without
GPUs. No theoretical validation of this yet. What is the optimal N
for a given computational effort?
Need for a stochastic schedule?
It seems that using a large N makes the use, and hence the choice,
of γ_t irrelevant.
Would you like to know more?
Article: An Adaptive Interacting Wang-Landau Algorithm for
Automatic Density Exploration, with L. Bornn, P. Del Moral, A.
Doucet.
Article: The Wang-Landau algorithm reaches the Flat
Histogram criterion in finite time, with R. Ryder.
Software: PAWL, available on CRAN:
install.packages("PAWL")
References:
F. Wang, D. Landau, Physical Review E, 64(5):56101
Y. Atchadé, J. Liu, Statistica Sinica, 20:209-233