1. Parallel MCMC
Random Number Generators
Summary
Parallel Bayesian computation in R ≥ 2.14
using the packages foreach and parallel
Matt Moores Cathy Hargrave
Bayesian Research & Applications Group
Queensland University of Technology, Brisbane, Australia
CRICOS provider no. 00213J
Thursday September 27, 2012
BRAG Sept. 27 Parallel MCMC in R
Outline
1 Parallel MCMC
Introduction
R packages
2 Random Number Generators
RNG and parallel MCMC
RNGs available in R
Motivation
Why parallel?
large datasets
many MCMC iterations
multiple CPU cores now commonplace
e.g. Intel Core i5 and i7
even mobile phones have multicore CPUs
Parallel MCMC
Two kinds of parallelism:
  concurrent MCMC chains
    always applicable
    straightforward to implement
  concurrent updates within an iteration
    only useful for a very large parameter space
    ideally in a compiled language (e.g. Rcpp with OpenMP)
There is also implicit parallelism, e.g. with the Intel Math Kernel Library
Concurrent Chains
Simple Network Of Workstations
R package snow by Luke Tierney, et al.
spawns multiple copies of R
provides several options for inter-process communication:
  TCP sockets (available on any platform, including Microsoft Windows)
  Message Passing Interface (via the package Rmpi)
  Parallel Virtual Machine (via the package rpvm)
  NetWorkSpaces (via the package nws)
can run either on a local machine or on a cluster (e.g. Lyra)
multicore
R package by Simon Urbanek
implemented using the POSIX fork system call
available on Linux and Mac OS X
clones the R instance (functions + data)
takes advantage of copy-on-write
will fork as many processes as there are available CPU
cores, unless told otherwise
parallel
R package parallel included in the core R distribution
available in versions ≥ 2.14.0
incorporates subsets of snow, multicore, and rlecuyer
sensible default behaviour
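A minimal sketch of the two back ends that parallel inherits from snow and multicore. The worker count (at most 2) and the helper square() are assumptions made for illustration only.

```r
library(parallel)

# detectCores() can return NA on some platforms, so fall back to 1
n_cores <- detectCores()
if (is.na(n_cores)) n_cores <- 1L
n_workers <- max(1L, min(2L, n_cores))

# hypothetical stand-in for an expensive computation
square <- function(i) i^2

# snow-style back end: socket cluster, works on any platform
cl <- makePSOCKcluster(n_workers)
res_sock <- parLapply(cl, 1:4, square)
stopCluster(cl)

# multicore-style back end: forks the current R process
# (forking is unavailable on Windows, where mc.cores must be 1)
fork_cores <- if (.Platform$OS.type == "windows") 1L else n_workers
res_fork <- mclapply(1:4, square, mc.cores = fork_cores)

identical(res_sock, res_fork)
```

Both calls return a list with one element per input, so the rest of a script does not need to know which back end was used.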
foreach
"syntactic sugar"
    library(foreach)
    library(parallel)
    library(doParallel)
    # will automatically use a SOCK cluster on Windows
    # (otherwise uses multicore)
    registerDoParallel(cores = detectCores())
    foreach(i = 1:getDoParWorkers()) %dopar% {
      # this code will be executed concurrently
      ...
    }
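For a version that runs end to end, here is a small sketch with the loop body filled in. The choice of 2 workers, the .combine = c argument and the body itself (the mean of 1000 normal draws) are illustrative assumptions, not part of the original slide.

```r
library(foreach)
library(doParallel)

# assumption: 2 workers, to keep the example light
registerDoParallel(cores = 2)

# each task draws 1000 values centred on its own index i;
# .combine = c collects the per-task means into one numeric vector
chain_means <- foreach(i = 1:getDoParWorkers(), .combine = c) %dopar% {
  mean(rnorm(1000, mean = i))
}

# release the implicit cluster that registerDoParallel() may have created
stopImplicitCluster()

print(chain_means)
```

Without .combine, foreach returns a list with one element per task.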
foreach with SNOW
    library(foreach)
    library(parallel)
    library(doParallel)
    # setup local SOCK cluster for 4 CPU cores
    cl <- makePSOCKcluster(4)
    registerDoParallel(cl)
    foreach(i = 1:getDoParWorkers()) %dopar% {
      # this code will be executed concurrently
      ...
    }
    stopCluster(cl)
foreach with multicore
    library(foreach)
    library(parallel)
    library(doParallel)
    # fork one child process for each CPU core
    cl <- makeForkCluster(detectCores())
    registerDoParallel(cl)
    foreach(i = 1:getDoParWorkers()) %dopar% {
      # this code will be executed concurrently
      ...
    }
foreach with CODA
If your Gibbs sampler returns an mcmc object, the objects from
each chain can be combined into an mcmc.list:
    library(coda)
    samples.list <- foreach(i = 1:getDoParWorkers(),
                            .combine = mcmc.list,
                            .multicombine = TRUE) %dopar% {
      # this code will be executed concurrently
      ...
    }
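A runnable sketch of the same pattern. The function toy_sampler() is a hypothetical stand-in for a real Gibbs sampler (a random-walk Metropolis sampler targeting N(0, 1)). The 2 workers, the seed 2012 and the initial values are also assumptions.

```r
library(coda)
library(foreach)
library(doParallel)

# hypothetical stand-in for a Gibbs sampler: random-walk Metropolis on N(0, 1)
toy_sampler <- function(n_iter, init) {
  x <- numeric(n_iter)
  current <- init
  for (t in seq_len(n_iter)) {
    proposal <- current + rnorm(1)
    log_ratio <- dnorm(proposal, log = TRUE) - dnorm(current, log = TRUE)
    if (log(runif(1)) < log_ratio) current <- proposal
    x[t] <- current
  }
  # wrap the draws in a coda mcmc object with one named column
  mcmc(matrix(x, ncol = 1, dimnames = list(NULL, "theta")))
}

cl <- makePSOCKcluster(2)
clusterSetRNGStream(cl, iseed = 2012)  # independent RNG stream per worker
registerDoParallel(cl)

inits <- c(-5, 5)  # different initial value for each chain
samples.list <- foreach(i = 1:getDoParWorkers(),
                        .combine = mcmc.list,
                        .multicombine = TRUE,
                        .packages = "coda") %dopar% {
  toy_sampler(n_iter = 2000, init = inits[i])
}
stopCluster(cl)

# convergence diagnostic across the two chains
print(gelman.diag(samples.list, multivariate = FALSE))
```

Because the result is an mcmc.list, the usual coda tools (summary, plot, gelman.diag) apply to all chains at once.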
foreach with other libraries
You need to declare any libraries that are used inside the child
process. For example:
    library(mvtnorm)
    library(coda)
    foreach(i = 1:getDoParWorkers(),
            .packages = c("mvtnorm", "coda")) %dopar% {
      # this code uses mcmc(...) and rmvnorm(...)
      ...
    }
Random Number Generators for parallel MCMC
The chains of our Gibbs sampler run independently, but:
  if the same RNG is seeded with the same value, all of the chains will generate the same random numbers in the same sequence: they will be identical!
  we need to use either different seeds or different random number generators for each chain (preferably both)
  it is also advisable to choose (or generate) different initial values in each chain of our Gibbs sampler
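The seeding problem is easy to reproduce in a single R session. This is a small sketch: the seed values and the range of the initial values are arbitrary choices made for illustration.

```r
# same generator, same seed: the two "chains" see identical random numbers
set.seed(42)
chain_a <- rnorm(5)
set.seed(42)
chain_b <- rnorm(5)
identical(chain_a, chain_b)  # TRUE

# a different (arbitrary) seed for each chain gives different sequences
seeds <- c(101, 202, 303)
draws <- lapply(seeds, function(s) {
  set.seed(s)
  rnorm(5)
})

# generate a different initial value for each chain as well
set.seed(2012)
inits <- runif(length(seeds), min = -10, max = 10)
```

Different seeds avoid identical chains, but they do not by themselves guarantee that the sequences never overlap.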
Mersenne Twister
The default RNG in R
pseudo-random sequence with 32-bit precision
periodicity of 2^19937 − 1
takes 0.4 seconds to generate 10^7 random numbers
on an Intel Core i5 running R 2.15.1 and Windows 7
open-source implementation available at:
http://www.math.sci.hiroshima-u.ac.jp/~m-mat/MT/emt.html
Matsumoto & Nishimura (1998) TOMACS 8: 3–30.
Other RNGs in the base package
Wichmann-Hill (1982) Applied Statistics 31, 188–190.
Marsaglia-Multicarry
(Usenet newsgroup sci.stat.math, 1997)
Super-Duper
(Reeds, J., Hubert, S. and Abrahams, M., 1982–4)
For JAGS with up to 4 concurrent chains:
    library(rjags)  # provides parallel.seeds()
    rngInits <- parallel.seeds("base::BaseRNG", 4)
L’Ecuyer
Available via the R packages rlecuyer or parallel
Multiple independent streams of random numbers
Periodicity ≈ 2^191
(each stream is a subsequence of length 2^127)
0.6 seconds to generate 10^7 random numbers via runif
To initialize each child process in a SNOW cluster with an
independent stream:
    cl <- makeCluster(4)
    clusterSetRNGStream(cl)
    registerDoParallel(cl)
L’Ecuyer, et al. (2002) Operations Research, 50(6): 1073–1075.
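A sketch of what clusterSetRNGStream provides: each worker draws from its own stream, and fixing the iseed argument makes the whole run reproducible. The helper draw_once(), the 2 workers and the seed 123 are assumptions made for illustration.

```r
library(parallel)

# hypothetical helper: start 2 workers, seed their L'Ecuyer streams,
# and return 3 uniform draws from each worker (one column per worker)
draw_once <- function(seed) {
  cl <- makePSOCKcluster(2)
  on.exit(stopCluster(cl))
  clusterSetRNGStream(cl, iseed = seed)
  do.call(cbind, clusterEvalQ(cl, runif(3)))
}

run1 <- draw_once(123)
run2 <- draw_once(123)

identical(run1, run2)            # same iseed: the run is reproducible
identical(run1[, 1], run1[, 2])  # the two workers use different streams
```

Calling clusterSetRNGStream(cl) without iseed, as on the slide, still gives independent streams, but the results change from run to run.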
Summary
Most MCMC algorithms are "embarrassingly parallel"
chains run independently
(as long as the RNG is set up correctly)
The R packages foreach and doParallel make parallelism
easy, on any computing platform
Related topics (not covered in this presentation):
Running R on a supercomputer (e.g. lyra.qut.edu.au)
Cloud computing with Apache Hadoop
GPU programming in R (nVidia CUDA)
For Further Reading
N. Matloff. The Art of R Programming. No Starch Press, 2011.
M. Schmidberger, M. Morgan, D. Eddelbuettel, H. Yu, L. Tierney & U. Mansmann. State of the Art in Parallel Computing with R. Journal of Statistical Software, 31(1), 2009.
P. L’Ecuyer, R. Simard, E.J. Chen & W.D. Kelton. An Object-Oriented Random-Number Package with Many Long Streams and Substreams. Operations Research, 50(6): 1073–1075, 2002.
M. Matsumoto & T. Nishimura. Mersenne Twister: A 623-Dimensionally Equidistributed Uniform Pseudo-Random Number Generator. ACM Transactions on Modeling and Computer Simulation, 8: 3–30, 1998.