Monte Carlo Simulations, Sampling and Markov Chain Monte Carlo
1. Monte Carlo & MCMC
Xin-She Yang
© 2010

Outline
Monte Carlo: Estimating π, Buffon's problem, Probability, Monte Carlo, Monte Carlo integration, Quality of Sampling, Quasi-Monte Carlo
Pseudorandom: Pseudorandom number generation, Other distributions, Limitations, Multivariate distributions
Markov Chains: Markov chains, A Famous Markov Chain
2. Estimating π
How to estimate π using only a ruler and some matchsticks?
3. Buffon’s Needle Problem
Buffon's needle problem (1733). The probability of a needle crossing a line is

$$p = \frac{2}{\pi} \cdot \frac{L}{d},$$

where L is the length of the needles and d is the spacing between the lines (for L ≤ d).
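The crossing probability can be checked numerically. The following is a minimal sketch, not taken from the slides: the function name `buffon_estimate`, the default values L = 5 and d = 6, and the seed are illustrative assumptions. It assumes L ≤ d and, somewhat circularly, uses `math.pi` to draw the needle angle.

```python
import math
import random


def buffon_estimate(num_needles, needle_len=5.0, spacing=6.0, seed=0):
    """Estimate pi by dropping needles on parallel lines `spacing` apart."""
    if not 0 < needle_len <= spacing:
        raise ValueError("this sketch assumes 0 < needle_len <= spacing")
    rng = random.Random(seed)
    crossings = 0
    for _ in range(num_needles):
        # Distance from the needle centre to the nearest line, in [0, d/2].
        centre = rng.uniform(0.0, spacing / 2.0)
        # Acute angle between the needle and the lines, in [0, pi/2].
        angle = rng.uniform(0.0, math.pi / 2.0)
        if centre <= (needle_len / 2.0) * math.sin(angle):
            crossings += 1
    if crossings == 0:
        raise ValueError("no crossings observed; use more needles")
    # p ~ n/N = 2L/(pi d)  =>  pi ~ (2N/n) (L/d)
    return (2.0 * num_needles / crossings) * (needle_len / spacing)


estimate = buffon_estimate(200_000)
print(f"pi estimate from 200000 needles: {estimate:.4f}")
```

With a few hundred thousand needles the estimate typically agrees with π to only two or three digits, which is what a 1/√N error would suggest.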
4. Probability of Crossing a Line
Since $p \approx n/N \approx 2L/(\pi d)$, where n of the N needles cross a line, we have

$$\pi \approx \frac{2N}{n} \cdot \frac{L}{d}.$$

Lazzarini (1901): L = 5d/6, N = 3408, n = 1808, so

$$\pi \approx \frac{2 \times 3408}{1808} \cdot \frac{5}{6} \approx 3.1415929.$$

Too accurate?! Is this right? What happens when n = 1809?

Errors $\sim 1/\sqrt{N} \sim 2\%$.
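The question about n = 1809 can be answered with a few lines of arithmetic. This is a small sketch using the numbers quoted above; the helper name `pi_from_counts` is an assumption made for illustration.

```python
import math


def pi_from_counts(num_needles, crossings, length_over_spacing):
    """pi ~ (2N/n) * (L/d), rearranged from Buffon's formula."""
    return 2.0 * num_needles / crossings * length_over_spacing


RATIO = 5.0 / 6.0  # Lazzarini's L/d
for n in (1807, 1808, 1809):
    est = pi_from_counts(3408, n, RATIO)
    print(f"n = {n}: pi ~ {est:.7f}, error = {est - math.pi:+.1e}")

# Statistical error scale quoted above
print(f"1/sqrt(N) = {1.0 / math.sqrt(3408):.4f}")
```

Lazzarini's counts reduce exactly to the fraction 355/113, and a single extra or missing crossing already spoils the agreement in the third decimal place.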
5. Monte Carlo Methods
Everyone has used Monte Carlo methods in some way ...

Measure temperatures, choose a product, ...

Taste soup, wine ...
6. Monte Carlo Integration
$$I = \int_\Omega f \, dV = V \cdot \frac{1}{N} \sum_{i=1}^{N} f_i + O(\epsilon),$$

$$\epsilon \sim \sqrt{\frac{\frac{1}{N} \sum_{i=1}^{N} f_i^2 - \mu^2}{N}} \sim O(1/\sqrt{N}),$$

where µ is the sample mean of the $f_i$.
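The estimator and its error term can be sketched in one dimension, where the volume V is just the interval length. The function name `mc_integrate`, the test integrand sin(x) on [0, π] (exact value 2) and the seed are assumptions chosen for illustration.

```python
import math
import random


def mc_integrate(f, a, b, num_samples, seed=0):
    """Return (estimate, error scale) for the integral of f over [a, b]."""
    rng = random.Random(seed)
    volume = b - a
    total = 0.0
    total_sq = 0.0
    for _ in range(num_samples):
        fx = f(rng.uniform(a, b))
        total += fx
        total_sq += fx * fx
    mean = total / num_samples
    # (1/N) sum f_i^2 - mu^2, clipped at zero against round-off
    variance = max(total_sq / num_samples - mean * mean, 0.0)
    return volume * mean, volume * math.sqrt(variance / num_samples)


for n in (1_000, 10_000, 100_000):
    est, err = mc_integrate(math.sin, 0.0, math.pi, n)
    print(f"N = {n:>6}: I ~ {est:.4f}, error scale ~ {err:.4f}")
```

Each hundredfold increase in N should shrink the reported error scale by roughly a factor of ten.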
7. Importance and Quality of the Samples
Higher dimensions: even more challenging!

$$I = \int \cdots \int f(u, v, \ldots, w) \, du \, dv \cdots dw.$$

Errors $\sim 1/\sqrt{N}$

Higher-dimensional integrals: how to distribute these sampling points?

Regular grids: $E \sim O(N^{-2/d})$ in d ≥ 4 dimensions (not enough!)

Strategies: importance sampling, Latin hypercube, ...

Any other ways?
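One of the strategies named above, Latin hypercube sampling, can be sketched as follows. This is a basic variant under stated assumptions: each axis is cut into N equal strata and every stratum is used exactly once per axis. The function name `latin_hypercube` and the seed are illustrative.

```python
import random


def latin_hypercube(num_points, dim, seed=0):
    """Return num_points points in [0, 1)^dim, one per stratum on every axis."""
    rng = random.Random(seed)
    columns = []
    for _ in range(dim):
        strata = list(range(num_points))
        rng.shuffle(strata)  # independent random pairing of strata per axis
        columns.append([(s + rng.random()) / num_points for s in strata])
    return [tuple(column[i] for column in columns) for i in range(num_points)]


points = latin_hypercube(10, 3)
for axis in range(3):
    occupied = sorted(int(p[axis] * 10) for p in points)
    print(f"axis {axis}: occupied strata {occupied}")
```

Compared with plain random sampling, no stratum on any axis is left empty or hit twice, which is the property that makes the design attractive in higher dimensions.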
8. Quasi-Monte Carlo Methods
In essence, the idea is to distribute (consecutive) sampling points as far apart as possible, using quasi-random or low-discrepancy numbers (not pseudo-random): Halton, Sobol, van der Corput, ...

For example, the van der Corput sequence expresses an integer n in a prime base b,

$$n = \sum_{j=0}^{m} a_j(n) \, b^j, \qquad a_j \in \{0, 1, 2, \ldots, b-1\}.$$

Then the digits are reversed or reflected:

$$\phi_b(n) = \sum_{j=0}^{m} a_j(n) \, \frac{1}{b^{j+1}}.$$

For example, in base 2: 0, 1, 2, ..., 15 ⟹ 0, 1/2, 1/4, 3/4, 1/8, ..., 15/16.

Errors ∼ O(1/N)
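The digit reversal is short enough to write out. The sketch below follows the two formulas above and uses exact fractions so the output can be compared with the base-2 example; the function name `van_der_corput` is an assumption.

```python
from fractions import Fraction


def van_der_corput(n, base=2):
    """Radical inverse phi_b(n): mirror the base-b digits of n about the point."""
    if n < 0 or base < 2:
        raise ValueError("need n >= 0 and base >= 2")
    value = Fraction(0)
    denom = base
    while n > 0:
        n, digit = divmod(n, base)       # peel off a_j(n), lowest digit first
        value += Fraction(digit, denom)  # a_j(n) / b^(j+1)
        denom *= base
    return value


sequence = [van_der_corput(n) for n in range(16)]
print(", ".join(str(x) for x in sequence))
```

Each new point falls into the largest gap left by the previous ones, which is what "as far apart as possible" means for consecutive samples.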
9. Pseudorandom numbers by deterministic sequences

Uniform distributions:

$$d_i = (a \, d_{i-1} + c) \bmod m.$$

Classic IBM generator:

$$a = 65539, \quad c = 0, \quad m = 2^{31} \quad \text{(strong correlation!)}$$

In fact, the correlation coefficient is 1!

Better choice (old Matlab):

$$a = 7^5 = 16807, \quad c = 0, \quad m = 2^{31} - 1 = 2{,}147{,}483{,}647.$$

If scaled by m, all numbers are in [1/m, (m − 1)/m].

New Matlab: $[\epsilon, 1 - \epsilon]$, $\epsilon = 2^{-53} \approx 1.1 \times 10^{-16}$.

IEEE 64-bit floating point: 53 bits for a signed fraction in base 2 and 11 bits for a signed exponent.
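Both generators fit the same recurrence, so one helper covers them. The sketch below is illustrative (the name `lcg_states` and the seed 1 are assumptions). It also checks one well-known way the strong correlation of the classic IBM generator shows up: because 65539 = 2^16 + 3, every three consecutive states satisfy an exact linear relation modulo 2^31.

```python
from itertools import islice


def lcg_states(seed, a, c, m):
    """Yield the integer states d_i = (a * d_{i-1} + c) mod m."""
    state = seed
    while True:
        state = (a * state + c) % m
        yield state


M_IBM = 2**31
ibm = list(islice(lcg_states(1, 65539, 0, M_IBM), 1000))
# a^2 = 6a - 9 (mod 2^31), hence d_{i+2} = 6 d_{i+1} - 9 d_i (mod 2^31)
ibm_is_linear = all(
    (ibm[i + 2] - 6 * ibm[i + 1] + 9 * ibm[i]) % M_IBM == 0
    for i in range(len(ibm) - 2)
)
print("IBM generator triples are linearly related:", ibm_is_linear)

M_OLD = 2**31 - 1
old_matlab = [d / M_OLD for d in islice(lcg_states(1, 16807, 0, M_OLD), 1000)]
print("first scaled values:", [round(u, 6) for u in old_matlab[:3]])
```

Scaling the second generator by m keeps every value strictly inside (0, 1), matching the interval [1/m, (m − 1)/m] stated above.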
10. Other Distributions
Inverse transform method, rejection method, Mersenne twister, ..., Markov chain Monte Carlo.

Standard normal distribution:

$$p(u) = \frac{1}{\sqrt{2\pi}} e^{-u^2/2},$$

CDF:

$$\Phi(v) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{v} e^{-u^2/2} \, du = \frac{1}{2}\left[1 + \operatorname{erf}\left(\frac{v}{\sqrt{2}}\right)\right],$$

$$v = \Phi^{-1}(u) = \sqrt{2} \, \operatorname{erf}^{-1}(2u - 1).$$

[Figure: two histograms, one with a horizontal axis from 0 to 1 and one with a horizontal axis from -6 to 6.]
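The inverse transform method can be sketched with the Python standard library, whose `statistics.NormalDist.inv_cdf` evaluates $\Phi^{-1}$ and so stands in for $\sqrt{2}\,\operatorname{erf}^{-1}(2u - 1)$. The function name, sample size and seed are assumptions for illustration.

```python
import random
from statistics import NormalDist, fmean, pstdev


def normal_by_inverse_transform(num_samples, seed=0):
    """Map uniform draws u to v = Phi^{-1}(u)."""
    rng = random.Random(seed)
    std_normal = NormalDist(0.0, 1.0)
    samples = []
    while len(samples) < num_samples:
        u = rng.random()   # uniform on [0, 1)
        if u > 0.0:        # inv_cdf is only defined for 0 < u < 1
            samples.append(std_normal.inv_cdf(u))
    return samples


samples = normal_by_inverse_transform(20_000)
sample_mean = fmean(samples)
sample_std = pstdev(samples)
print(f"mean ~ {sample_mean:.3f}, standard deviation ~ {sample_std:.3f}")
```

The sample mean and standard deviation should come out close to 0 and 1.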
11. Transform method: Limitations
$$v = \Phi^{-1}(u) = \sqrt{2} \, \operatorname{erf}^{-1}(2u - 1),$$

$$\operatorname{erf}^{-1}(x) = \frac{\sqrt{\pi}}{2} \left[ x + \frac{\pi x^3}{12} + \frac{7 \pi^2 x^5}{480} + \frac{127 \pi^3 x^7}{40320} + \cdots \right].$$

Not so easy to calculate!

Sometimes, the inverse may not be possible.
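To see why the series is awkward in practice, the four terms shown can be compared against a reference value. In this sketch the reference is obtained from the standard library through $\operatorname{erf}^{-1}(x) = \Phi^{-1}((x+1)/2)/\sqrt{2}$; the function names are assumptions.

```python
import math
from statistics import NormalDist


def erfinv_series(x):
    """First four terms of the series for erf^{-1}(x) given above."""
    return (math.sqrt(math.pi) / 2.0) * (
        x
        + math.pi * x**3 / 12.0
        + 7.0 * math.pi**2 * x**5 / 480.0
        + 127.0 * math.pi**3 * x**7 / 40320.0
    )


def erfinv_reference(x):
    """erf^{-1}(x) via the standard normal quantile function, for -1 < x < 1."""
    return NormalDist().inv_cdf((x + 1.0) / 2.0) / math.sqrt(2.0)


for x in (0.1, 0.5, 0.9, 0.99):
    diff = abs(erfinv_series(x) - erfinv_reference(x))
    print(f"x = {x}: truncated series is off by {diff:.2e}")
```

The truncation is accurate near x = 0 but degrades quickly as x approaches 1, which is exactly where the tails of the normal distribution are sampled.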
12. Multivariate Distributions
Bivariate normal distributions:

$$p(v_1, v_2) = \frac{1}{2\pi} e^{-(v_1^2 + v_2^2)/2}.$$

Box-Muller method: from $u_1, u_2 \sim$ uniform distributions,

$$v_1 = \sqrt{-2 \ln u_1} \, \cos(2\pi u_2), \qquad v_2 = \sqrt{-2 \ln u_1} \, \sin(2\pi u_2).$$

Problems

Difficult to calculate the inverse in most cases (sometimes, even impossible!).

Other methods (e.g., rejection method) are inefficient.

So: the Markov chain Monte Carlo (MCMC) way!
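The Box-Muller transform translates directly into code. A minimal sketch, with the function name, sample size and seed as assumptions; $u_1$ is drawn from (0, 1] so that the logarithm is always finite.

```python
import math
import random
from statistics import fmean, pvariance


def box_muller(rng):
    """Turn two uniform draws into two independent standard normal draws."""
    u1 = 1.0 - rng.random()  # in (0, 1], avoids log(0)
    u2 = rng.random()
    radius = math.sqrt(-2.0 * math.log(u1))
    return radius * math.cos(2.0 * math.pi * u2), radius * math.sin(2.0 * math.pi * u2)


rng = random.Random(0)
pairs = [box_muller(rng) for _ in range(20_000)]
v1 = [a for a, _ in pairs]
v2 = [b for _, b in pairs]
covariance = fmean(a * b for a, b in pairs) - fmean(v1) * fmean(v2)
print(f"v1: mean {fmean(v1):.3f}, variance {pvariance(v1):.3f}")
print(f"v2: mean {fmean(v2):.3f}, variance {pvariance(v2):.3f}")
print(f"covariance of v1 and v2: {covariance:.3f}")
```

Both coordinates should show mean near 0 and variance near 1, with negligible covariance.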
13. Random Walk down the Markov Chains
Random walk (a drunkard's walk):

$$u_{t+1} = \mu + u_t + w_t,$$

where $w_t$ is a random variable and µ is the drift.

For example, $w_t \sim N(0, \sigma^2)$ (Gaussian).

[Figure: two random-walk plots, one over 500 steps and one showing a path in the plane.]
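The recurrence can be simulated in a few lines. This is a sketch with Gaussian steps; the function name `random_walk`, its defaults and the seed are assumptions.

```python
import random


def random_walk(num_steps, drift=0.0, sigma=1.0, start=0.0, seed=0):
    """Simulate u_{t+1} = drift + u_t + w_t with w_t ~ N(0, sigma^2)."""
    rng = random.Random(seed)
    path = [start]
    for _ in range(num_steps):
        path.append(drift + path[-1] + rng.gauss(0.0, sigma))
    return path


path = random_walk(500)
print(f"after 500 steps: position {path[-1]:.2f}, "
      f"range [{min(path):.2f}, {max(path):.2f}]")
```

Running two such walks with different seeds and plotting one against the other gives a path in the plane.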
14. Markov Chains
Markov chain: the next state depends only on the current state and the transition probability.

$$P(i, j) \equiv P(V_{t+1} = S_j \mid V_0 = S_p, \ldots, V_t = S_i) = P(V_{t+1} = S_j \mid V_t = S_i).$$

For a reversible chain, this leads to detailed balance:

$$P_{ij} \pi_i^* = P_{ji} \pi_j^*, \qquad \pi^* = \text{stationary probability distribution}.$$

Examples: Brownian motion

$$u_{i+1} = \mu + u_i + \epsilon_i, \qquad \epsilon_i \sim N(0, \sigma^2).$$
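For a finite state space, the stationary distribution and the detailed balance condition can be checked directly. The three-state transition matrix below is an invented example of a reversible chain, and the function names are assumptions.

```python
def step(dist, P):
    """One step of the chain: new_j = sum_i dist_i * P[i][j]."""
    n = len(P)
    return [sum(dist[i] * P[i][j] for i in range(n)) for j in range(n)]


def stationary(P, tol=1e-12, max_iter=10_000):
    """Iterate from the uniform distribution until it stops changing."""
    n = len(P)
    dist = [1.0 / n] * n
    for _ in range(max_iter):
        new = step(dist, P)
        if max(abs(a - b) for a, b in zip(new, dist)) < tol:
            return new
        dist = new
    raise RuntimeError("no convergence within max_iter steps")


# Hypothetical chain: move left or right, or stay put.
P = [
    [0.50, 0.50, 0.00],
    [0.25, 0.50, 0.25],
    [0.00, 0.50, 0.50],
]
pi_star = stationary(P)
print("stationary distribution:", [round(p, 6) for p in pi_star])
for i in range(3):
    for j in range(i + 1, 3):
        print(f"P[{i}][{j}] * pi[{i}] = {P[i][j] * pi_star[i]:.6f}, "
              f"P[{j}][{i}] * pi[{j}] = {P[j][i] * pi_star[j]:.6f}")
```

The two printed products agree for every pair of states, which is the detailed balance condition for this chain.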
15. Markov Chains
Monopoly (board games)

[Animation: Monopoly]
16. A Famous $Billion Markov Chain – PageRank
Google PageRank algorithm (by Page et al., 1997)

Billions of web pages: pages = states, link probability ∼ 1/t, where t ≈ the expected number of clicks.
17. Googling as a Markov Chain
$$\text{Rank}_j^{(t+1)} = \frac{1 - \alpha}{N} + \alpha \sum_{p_i \in \Omega(p_j)} \frac{\text{Rank}_i^{(t)}}{B(p_i)},$$

where N is the number of pages, $\Omega(p_j)$ is the set of pages linking to page $p_j$, $B(p_i)$ is the number of outbound links of page $p_i$, and α is a ranking factor (≈ 0.85). Initially, $\text{Rank}_i^{(t=0)} = 1/N$.

Let $R = [\text{Rank}_1, \ldots, \text{Rank}_N]^T$, and $L(p_i, p_j) = 0$ if there are no links ⟹

$$R = \frac{1}{N} \begin{pmatrix} 1 - \alpha \\ \vdots \\ 1 - \alpha \end{pmatrix} + \alpha \begin{pmatrix} L(p_1, p_1) & \cdots & L(p_1, p_j) & \cdots & L(p_1, p_N) \\ \vdots & & \vdots & & \vdots \\ L(p_i, p_1) & \cdots & L(p_i, p_j) & \cdots & L(p_i, p_N) \\ \vdots & & \vdots & & \vdots \\ L(p_N, p_1) & \cdots & L(p_N, p_j) & \cdots & L(p_N, p_N) \end{pmatrix} R,$$

where $\sum_{i=1}^{N} L(p_i, p_j) = 1$. Google Matrix (stochastic, sparse).

⟹ a stationary probability distribution R (updated monthly).
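The rank update can be iterated on a toy link graph. The sketch below is illustrative only: the four-page graph is invented, the function name `pagerank` is an assumption, and every page is assumed to have at least one outbound link so that the columns of the link matrix sum to 1.

```python
def pagerank(links, alpha=0.85, tol=1e-10, max_iter=1000):
    """links[i] lists the pages that page i links to; returns the rank vector."""
    n = len(links)
    rank = [1.0 / n] * n                      # Rank_i^(0) = 1/N
    for _ in range(max_iter):
        new = [(1.0 - alpha) / n] * n
        for i, outgoing in enumerate(links):
            if not outgoing:
                raise ValueError("this sketch assumes every page has an outbound link")
            share = alpha * rank[i] / len(outgoing)   # alpha * Rank_i / B(p_i)
            for j in outgoing:
                new[j] += share
        if max(abs(a - b) for a, b in zip(new, rank)) < tol:
            return new
        rank = new
    raise RuntimeError("no convergence within max_iter iterations")


# Hypothetical web of four pages; page 3 is linked to by nobody.
links = [[1, 2], [2], [0], [0, 2]]
ranks = pagerank(links)
for page, r in enumerate(ranks):
    print(f"page {page}: rank {r:.4f}")
```

A page with no inbound links keeps only the baseline share (1 − α)/N, and the ranks sum to 1 as a probability distribution should.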
18. Markov Chain Monte Carlo
Landmarks: Monte Carlo method (1930s, 1945, from 1950s), e.g., Metropolis algorithm (1953), Metropolis-Hastings (1970).

Markov chain Monte Carlo (MCMC) methods: a class of methods that draw samples from a probability distribution by running a Markov chain whose stationary distribution is the target.

Really took off in the 1990s, now applied to a wide range of areas: physics, Bayesian statistics, climate change, machine learning, finance, economy, medicine, biology, materials and engineering ...
19. Metropolis-Hastings
The Metropolis-Hastings algorithm:

  1. Begin with any initial $\theta_0$ at time $t \leftarrow 0$ such that $p(\theta_0) > 0$.
  2. Generate a candidate sample $\theta^* \sim q(\theta_t, \cdot)$ from a proposal distribution.
  3. Evaluate the acceptance probability $\alpha(\theta_t, \theta^*)$ given by

     $$\alpha = \min\left[ \frac{p(\theta^*) \, q(\theta^*, \theta_t)}{p(\theta_t) \, q(\theta_t, \theta^*)}, \; 1 \right].$$

  4. Generate a uniformly distributed random number $u \sim \text{Unif}[0, 1]$, and accept $\theta^*$ if $\alpha \ge u$. That is, if $\alpha \ge u$ then $\theta_{t+1} \leftarrow \theta^*$, else $\theta_{t+1} \leftarrow \theta_t$.
  5. Increase the counter or time $t \leftarrow t + 1$, and go to step 2.
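The five steps above can be sketched as a short sampler. Everything specific in this example is an assumption made for illustration: the function and argument names, the unnormalised standard normal target, the Gaussian random-walk proposal of width 1 (for which the q terms happen to cancel), the chain length and the burn-in.

```python
import math
import random
from statistics import fmean, pvariance


def metropolis_hastings(target, propose, proposal_density, theta0, num_steps, seed=0):
    """target: unnormalised p; proposal_density(a, b): density of proposing b from a."""
    rng = random.Random(seed)
    theta = theta0                                        # step 1
    if target(theta) <= 0.0:
        raise ValueError("need p(theta0) > 0")
    chain = [theta]
    for _ in range(num_steps):
        candidate = propose(theta, rng)                   # step 2
        ratio = (target(candidate) * proposal_density(candidate, theta)) / (
            target(theta) * proposal_density(theta, candidate)
        )
        alpha = min(ratio, 1.0)                           # step 3
        u = 1.0 - rng.random()                            # step 4, u in (0, 1]
        if alpha >= u:
            theta = candidate
        chain.append(theta)                               # step 5
    return chain


STEP = 1.0  # assumed proposal width


def target(x):
    return math.exp(-0.5 * x * x)            # unnormalised N(0, 1)


def propose(x, rng):
    return x + rng.gauss(0.0, STEP)


def proposal_density(a, b):
    return math.exp(-0.5 * ((b - a) / STEP) ** 2)


chain = metropolis_hastings(target, propose, proposal_density, 0.0, 20_000)
kept = chain[2_000:]                         # discard an assumed burn-in
chain_mean = fmean(kept)
chain_var = pvariance(kept)
print(f"chain mean ~ {chain_mean:.3f}, chain variance ~ {chain_var:.3f}")
```

Only ratios of p appear, so the normalising constant of the target is never needed; that is the practical appeal of the method.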
20. Mixture distribution: A distribution with known mean and variance.

$$f(x \mid \mu, \sigma^2) = \sum_{i=1}^{K} \alpha_i \, p_i(x \mid \mu_i, \sigma_i^2), \qquad \sum_{i=1}^{K} \alpha_i = 1.$$

E.g., $\alpha_1 = \alpha_2 = 1/2$, $\mu_1 = 2$, $\mu_2 = -2$ and $\sigma_1 = \sigma_2 = 1$.

[Figure: a trace of sampled values over 10000 iterations (top) and the distribution over the range −6 to 6 (bottom).]
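For the example parameters, the known mean and variance are $\sum_i \alpha_i \mu_i = 0$ and $\sum_i \alpha_i (\sigma_i^2 + \mu_i^2) - 0^2 = 5$. The sketch below checks this by drawing from the mixture directly (pick a component, then draw from it), which gives a reference against which MCMC output can be compared. Function names, sample size and seed are assumptions.

```python
import math
import random
from statistics import fmean, pvariance

WEIGHTS = (0.5, 0.5)
MEANS = (2.0, -2.0)
SIGMAS = (1.0, 1.0)


def mixture_pdf(x):
    """f(x) = sum_i alpha_i * N(x; mu_i, sigma_i^2)."""
    return sum(
        w * math.exp(-0.5 * ((x - m) / s) ** 2) / (s * math.sqrt(2.0 * math.pi))
        for w, m, s in zip(WEIGHTS, MEANS, SIGMAS)
    )


def sample_mixture(rng):
    """Choose component i with probability alpha_i, then draw from it."""
    k = rng.choices(range(len(WEIGHTS)), weights=WEIGHTS)[0]
    return rng.gauss(MEANS[k], SIGMAS[k])


rng = random.Random(0)
draws = [sample_mixture(rng) for _ in range(50_000)]
draw_mean = fmean(draws)
draw_var = pvariance(draws)
print(f"sample mean ~ {draw_mean:.3f} (exact 0), sample variance ~ {draw_var:.3f} (exact 5)")
print(f"density at the modes: f(2) ~ {mixture_pdf(2.0):.4f}")
```

The density peaks just under 0.2 at x = ±2, consistent with the vertical scale of the lower panel in the figure.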
21. When to Stop the Chain
Monte Carlo
& MCMC As the MCMC runs, convergence may be reached
Xin-She Yang
When does a chain converge? When to stop the chain ... ?
Monte Carlo
Estimating π Are the samples correlated ?
Buffon’s
problem
Probability 0
Monte Carlo
Monte Carlo
integration 100
Quality of
Sampling
200
Quasi-Monte
Carlo
Pseudorandom 300
Pseudorandom
number 400
generation
Other
distributions
500
Limitations
Multivariate
distributions 600
Markov
Chains 0 100 200 300 400 500 600 700 800 900
Markov chains
Markov chains
A Famous
Markov Chain Xin-She Yang Monte Carlo & MCMC
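Whether the samples are correlated can be measured with the sample autocorrelation of the chain at increasing lags. The sketch below is an illustration under assumptions: the chain is a stand-in generated by a simple autoregressive recurrence (coefficient 0.9) rather than real MCMC output, and the function name `autocorrelation` is hypothetical.

```python
import random
from statistics import fmean


def autocorrelation(chain, lag):
    """Sample autocorrelation of the chain at the given lag."""
    n = len(chain)
    if not 0 <= lag < n:
        raise ValueError("lag must satisfy 0 <= lag < len(chain)")
    mean = fmean(chain)
    var = sum((x - mean) ** 2 for x in chain) / n
    cov = sum((chain[t] - mean) * (chain[t + lag] - mean) for t in range(n - lag)) / n
    return cov / var


# Stand-in for MCMC output: each value remembers 90% of the previous one.
rng = random.Random(0)
RHO = 0.9
chain = [0.0]
for _ in range(20_000):
    chain.append(RHO * chain[-1] + rng.gauss(0.0, 1.0))

for lag in (0, 1, 5, 20, 50):
    print(f"lag {lag:>2}: autocorrelation ~ {autocorrelation(chain, lag):.3f}")
```

Slowly decaying autocorrelation means that consecutive samples carry little new information, so the chain has to run longer (or be thinned) to deliver the same accuracy as independent draws.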
22. A Long Single Chain or Multiple Short Chains?
When will a Markov chain converge in practice? If it has converged, what does that mean?

Is a very long chain really good enough (from a statistical point of view)?

How long is long enough?

Are multiple chains better?

How to improve the sampling efficiency and/or mixing properties?
23. Simulated Tempering
Monte Carlo
& MCMC Simulated annealing: temperature T from high to low.
Xin-She Yang Simulated tempering: raise T to a higher value, reduce to low.
Monte Carlo
Estimating π
Buffon’s
πτ = π(x)1/τ , πτ →∞ → 1, as τ → ∞.
problem
Probability
Monte Carlo The basic idea is to reduce from a very high τ to τ0 = 1.
Monte Carlo
integration
Quality of
Sampling
flatten
Quasi-Monte
Carlo
=⇒
Pseudorandom
π≥ 0 πτ = π(x)1/τ
Pseudorandom
number
generation
Other
distributions
Limitations
Tempering
Multivariate
distributions
Use flattened (near uniform) distributions as
Markov
Chains proposals/candidates to produce high quality samplings.
Markov chains
Markov chains
A Famous
Markov Chain Xin-She Yang Monte Carlo & MCMC
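The flattening effect of $\pi_\tau = \pi(x)^{1/\tau}$ can be seen by comparing a peak of a density with a valley as τ grows. The sketch below uses an equal-weight mixture of two unit-variance normal densities centred at ±2 as an example target; that choice, the function names and the list of τ values are assumptions for illustration.

```python
import math


def bimodal(x):
    """Example target: equal mixture of N(2, 1) and N(-2, 1)."""
    norm = 1.0 / math.sqrt(2.0 * math.pi)
    return 0.5 * norm * (math.exp(-0.5 * (x - 2.0) ** 2) + math.exp(-0.5 * (x + 2.0) ** 2))


def tempered(density, tau):
    """Return the unnormalised tempered density x -> density(x)^(1/tau)."""
    if tau <= 0:
        raise ValueError("tau must be positive")
    return lambda x: density(x) ** (1.0 / tau)


ratios = {}
for tau in (1, 2, 5, 20, 100):
    flat = tempered(bimodal, tau)
    ratios[tau] = flat(2.0) / flat(0.0)   # peak height relative to the valley
    print(f"tau = {tau:>3}: peak/valley ratio ~ {ratios[tau]:.3f}")
```

As τ increases the ratio approaches 1, so a chain running on the tempered density can cross between the two modes far more easily than on the original one.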
24. Sampling: Forward or Backward? Which Way?
Is this the only way?

No! Coupling from the Past & Metaheuristics

If we go backward along the chain, are there any advantages? If so, how?

Is there a universally efficient sampling tool for drawing samples in general?

No! No-free-lunch theorem (Wolpert & Macready, 1997)

The aim of the research is to find the best algorithm(s) for a given/specific problem/distribution.

Also Metaheuristics (very promising).
25. Thank you
References

Gamerman D., Markov Chain Monte Carlo, Chapman & Hall/CRC, (1997).
Corcoran J. and Tweedie R., Perfect sampling ... Jour. Stat. Plan. Infer., 104, 297 (2002).
Cox M., Forbes A. B., Harris P. M., Smith I., Classification and solution of regression ..., NPL SSfM Report, (2004).
Propp J. & Wilson D., Exact sampling ..., Random Stru. Alg., 9, 223 (1996).
Yang X. S., Nature-Inspired Metaheuristic Algorithms, Luniver Press, (2008).
Yang X. S., Introduction to Computational Mathematics, World Scientific, (2008).
Yang X. S., Engineering Optimization: An Introduction with Metaheuristic Applications, Wiley, (2010).

Acknowledgement: EPSRC, SSfM, NPL, CUED, and London Maths Society.

Thank you!