LupoPasini_SIAMCSE15

Iterative Performance of Monte Carlo Linear Solver Methods
Massimiliano Lupo Pasini (Emory University)
In collaboration with:
Michele Benzi (Emory University)
Thomas Evans (Oak Ridge National Laboratory)
Steven Hamilton (Oak Ridge National Laboratory)
Stuart Slattery (Oak Ridge National Laboratory)
SIAM Conference on Computational Science and Engineering 2015
March 17, 2015

Acknowledgments
This material is based upon work supported by the U.S.
Department of Energy, Office of Science, Advanced Scientific
Computing Research program.

Outline
Motivations
Standard Monte Carlo vs. Monte Carlo Synthetic Acceleration (MCSA)
Choice of the preconditioner
Convergence conditions
Numerical results

Motivations
Goal:
development of reliable algorithms to solve high-dimensional (towards exascale) sparse linear systems in parallel
Issues:
occurrence of faults (abnormal operating conditions of the computing system which cause a wrong answer)
hard faults
soft faults
[M. Hoemmen and M.A. Heroux, 2011; G. Bronevetsky and B.R. de Supinski, 2008; P. Prata and J.B. Silva, 1999]

Motivations
Resilience: ability to compute a correct output in the presence of faults
Different approaches to tackle the problem:
adaptation of Krylov subspace methods (CG, GMRES, Bi-CGStab, ...) to fault tolerance by recover-restart strategies
abandonment of the deterministic paradigm in favor of stochastic approaches
[E. Agullo, L. Giraud, A. Guermouche, J. Roman and M. Zounon, 2013]

Monte Carlo Linear Solvers
Mathematical setting
Let us consider a sparse linear system
$$Ax = b, \tag{1}$$
where $A \in \mathbb{R}^{n \times n}$, $x \in \mathbb{R}^{n}$, $b \in \mathbb{R}^{n}$.
With left preconditioning, (1) becomes
$$P^{-1}Ax = P^{-1}b, \quad P \in \mathbb{R}^{n \times n}. \tag{2}$$
(2) can be reinterpreted as a fixed point scheme
$$x = Hx + f, \quad H = I - P^{-1}A, \quad f = P^{-1}b. \tag{3}$$
Assuming $\rho(H) < 1$, (3) generates a sequence of approximate solutions $\{x_k\}_{k=0}^{\infty}$ which converges to the exact solution of (1).
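
The fixed point scheme (3) can be sketched in a few lines. The example below is only an illustration: it assumes a diagonal (Jacobi) preconditioner $P = \operatorname{diag}(A)$, a small dense matrix stored as nested lists, and a hypothetical helper name `jacobi_fixed_point`; none of these come from the talk.

```python
def jacobi_fixed_point(A, b, tol=1e-10, max_iter=1000):
    """Iterate x <- H x + f with P = diag(A), i.e. H = I - P^{-1} A, f = P^{-1} b."""
    n = len(A)
    # H and f are formed explicitly, since Monte Carlo solvers need the entries of H.
    H = [[(1.0 if i == j else 0.0) - A[i][j] / A[i][i] for j in range(n)]
         for i in range(n)]
    f = [b[i] / A[i][i] for i in range(n)]
    x = [0.0] * n
    for k in range(max_iter):
        x_new = [sum(H[i][j] * x[j] for j in range(n)) + f[i] for i in range(n)]
        # Stop when two successive iterates agree to within tol.
        if max(abs(x_new[i] - x[i]) for i in range(n)) < tol:
            return x_new, k + 1
        x = x_new
    return x, max_iter


# Illustrative system (assumption): b is chosen so that the exact solution is [1, 1, 1].
A = [[4.0, 1.0, 0.0],
     [1.0, 4.0, 1.0],
     [0.0, 1.0, 4.0]]
b = [5.0, 6.0, 5.0]
x, iters = jacobi_fixed_point(A, b)
```

Here $\|H\|_\infty = 0.5$, so $\rho(H) < 1$ and the iteration converges; for a matrix with $\rho(H) \ge 1$ the loop would simply stop at `max_iter`.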

Mathematical setting
The solution to (3) can be written in terms of a power series in $H$ (Neumann series):
$$x = \sum_{\ell=0}^{\infty} H^{\ell} f.$$
By restricting the attention to a single component of $x$ we have
$$x_i = \sum_{\ell=0}^{\infty} \sum_{k_1=1}^{n} \sum_{k_2=1}^{n} \cdots \sum_{k_\ell=1}^{n} H_{k_0,k_1} H_{k_1,k_2} \cdots H_{k_{\ell-1},k_\ell} f_{k_\ell}, \quad k_0 = i. \tag{4}$$
(4) can be reinterpreted as the sampling of an estimator defined on a random walk.

Mathematical setting: Forward (Direct) Method
Goal: evaluate a functional such as
$$J(x) = \langle h, x \rangle = \sum_{i=1}^{n} h_i x_i, \quad h \in \mathbb{R}^{n}.$$
Approach: define random walks to evaluate $J$.
Consider a random walk whose state space $S$ is the set of indices of the forcing term $f$:
$$S = \{1, 2, \dots, n\} \subset \mathbb{N}.$$
[N. Metropolis and S. Ulam, 1949; G. E. Forsythe and A. Leibler, 1950; W. R. Wasow, 1952; J. H. Halton, 1962]

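Initial probability: pick $k_0$ as the initial state of the random walk:
$$\tilde{p}(k_0 = i) = \tilde{p}_{k_0} = \frac{|h_i|}{\sum_{j=1}^{n} |h_j|}.$$
Possible choices for the transition probability
$$p(\cdot, i) : S \to [0, 1] \quad \forall i \in S, \qquad p(k_\ell = j \mid k_{\ell-1} = i) = P_{i,j}:$$
weighted:
$$P_{i,j} = \frac{|H_{i,j}|}{\sum_{k=1}^{n} |H_{i,k}|}$$
uniform:
$$P_{i,j} = \begin{cases} 0 & \text{if } H_{i,j} = 0 \\ \dfrac{1}{\#(\text{non-zeros in row } i)} & \text{if } H_{i,j} \neq 0 \end{cases}$$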
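
The two transition probabilities can be built directly from the entries of $H$. The sketch below assumes $H$ is a small dense matrix stored as nested lists; the function names and the sample matrix are illustrative, not taken from the talk.

```python
def weighted_transition(H):
    """Weighted choice: P[i][j] = |H[i][j]| / sum_k |H[i][k]|."""
    P = []
    for row in H:
        total = sum(abs(v) for v in row)
        # A row of zeros has no outgoing transition; it is left as all zeros.
        P.append([abs(v) / total if total > 0 else 0.0 for v in row])
    return P


def uniform_transition(H):
    """Uniform choice: P[i][j] = 1 / (non-zeros in row i) where H[i][j] != 0, else 0."""
    P = []
    for row in H:
        nnz = sum(1 for v in row if v != 0)
        P.append([1.0 / nnz if v != 0 else 0.0 for v in row])
    return P


# Illustrative iteration matrix (assumption): Jacobi applied to a 3x3 example.
H = [[0.0, -0.25, -0.25],
     [-0.2, 0.0, -0.4],
     [0.0, -1.0 / 3.0, 0.0]]
P_weighted = weighted_transition(H)
P_uniform = uniform_transition(H)
```

Both matrices are row-stochastic on the sparsity pattern of $H$; they differ only in how the probability mass is distributed among the non-zero entries of each row.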

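A related sequence of weights is defined by
$$w_{i,j} = \frac{H_{i,j}}{P_{i,j}}$$
in order to build an auxiliary sequence
$$W_0 = \frac{h_{k_0}}{\tilde{p}_{k_0}}, \quad W_\ell = W_{\ell-1} \, w_{k_{\ell-1},k_\ell}, \quad k_{\ell-1}, k_\ell \in S.$$
This allows us to define a random variable $X(\cdot) : \Pi \to \mathbb{R}$, where $\Pi$ is the set of realizations of a random walk $\gamma$ defined on $S$:
$$X(\nu) = \sum_{\ell=0}^{\infty} W_\ell f_{k_\ell}, \quad \nu = \text{permutation of } \gamma.$$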

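Define the expected value of $X(\cdot)$:
$$E[X] = \sum_{\nu} P_\nu X(\nu) \quad (P_\nu = \text{probability of the permutation } \nu).$$
It can be proved that
$$E[W_\ell f_{k_\ell}] = \langle h, H^{\ell} f \rangle$$
and consequently
$$E\left[\sum_{\ell=0}^{\infty} W_\ell f_{k_\ell}\right] = \langle h, x \rangle.$$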

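If $h$ is a vector of the standard basis, $h = e_i$, then
$$\tilde{p}(k_0 = j) = \delta_{i,j}, \quad i, j = 1, \dots, n,$$
and the expected value assumes the following form:
$$E\left[\sum_{\ell=0}^{\infty} W_\ell f_{k_\ell}\right] = x_i = \sum_{\ell=0}^{\infty} \sum_{k_1=1}^{n} \sum_{k_2=1}^{n} \cdots \sum_{k_\ell=1}^{n} P_{k_0,k_1} P_{k_1,k_2} \cdots P_{k_{\ell-1},k_\ell} \, w_{k_0,k_1} w_{k_1,k_2} \cdots w_{k_{\ell-1},k_\ell} \, f_{k_\ell}, \quad k_0 = i. \tag{5}$$
Drawback: the estimator is defined entry-wise.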
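
A minimal sketch of the forward estimator for a single entry $x_i$ follows. It assumes the weighted transition probability, truncates each walk after `max_steps` transitions (a practical cutoff of the infinite sum, not part of the definition), and reuses an illustrative $3 \times 3$ Jacobi iteration matrix whose exact solution is $[1, 2, 3]$. All names are hypothetical.

```python
import random


def forward_mc_entry(H, f, i, n_histories=5000, max_steps=25, seed=0):
    """Estimate x_i for x = H x + f with the forward method (h = e_i)."""
    rng = random.Random(seed)
    n = len(H)
    states = range(n)
    row_abs = [sum(abs(v) for v in row) for row in H]
    # Weighted transition probabilities P[r][c] = |H[r][c]| / sum_k |H[r][k]|.
    P = [[abs(v) / row_abs[r] if row_abs[r] > 0 else 0.0 for v in H[r]]
         for r in states]
    total = 0.0
    for _ in range(n_histories):
        k, W = i, 1.0            # h = e_i: the walk starts at k_0 = i with W_0 = 1
        estimate = W * f[k]      # l = 0 term of sum_l W_l f_{k_l}
        for _ in range(max_steps):
            if row_abs[k] == 0.0:
                break            # row of zeros: all later weights vanish
            k_next = rng.choices(states, weights=P[k])[0]
            W *= H[k][k_next] / P[k][k_next]   # w = H / P along the transition
            k = k_next
            estimate += W * f[k]
        total += estimate
    return total / n_histories


# Illustrative data (assumption): Jacobi form of a 3x3 system with solution [1, 2, 3].
H = [[0.0, -0.25, -0.25],
     [-0.2, 0.0, -0.4],
     [0.0, -1.0 / 3.0, 0.0]]
f = [9.0 / 4.0, 17.0 / 5.0, 11.0 / 3.0]
x0_estimate = forward_mc_entry(H, f, i=0)
```

Each call estimates one component only, which is the entry-wise drawback noted above: recovering the full vector requires $n$ separate sets of histories.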

Mathematical setting: Adjoint Method
Consider the adjoint linear system
$$(P^{-1}A)^{T} y = d.$$
With a fixed point approach, we get
$$y = H^{T} y + d.$$
By picking $d = e_j$ we have
$$J^{*}(y) := \langle y, f \rangle = \langle x, e_j \rangle = x_j, \quad j = 1, \dots, n.$$

Mathematical setting: Adjoint Method
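The initial probability and transition matrix may be defined as
$$\tilde{p}_{k_0} = p(k_0 = i) = \frac{|f_i|}{\sum_{k=1}^{n} |f_k|}, \qquad P_{i,j} = \frac{|H^{T}_{i,j}|}{\sum_{k=1}^{n} |H^{T}_{i,k}|} = \frac{|H_{j,i}|}{\sum_{k=1}^{n} |H_{k,i}|}.$$
Reintroduce weights as before:
$$w_{i,j} = \frac{H_{j,i}}{P_{i,j}} \;\Rightarrow\; W_0 = \operatorname{sign}(f_{k_0}) \, \|f\|_1, \quad W_\ell = W_{\ell-1} \, w_{k_{\ell-1},k_\ell}, \quad k_{\ell-1}, k_\ell \in S.$$
Expected value of the estimator for the entire solution vector $x$:
$$E\left[\sum_{\ell=0}^{\infty} W_\ell d_{k_\ell}\right] = \sum_{\ell=0}^{\infty} \sum_{k_0=1}^{n} \sum_{k_1=1}^{n} \cdots \sum_{k_\ell=1}^{n} f_{k_0} P_{k_0,k_1} P_{k_1,k_2} \cdots P_{k_{\ell-1},k_\ell} \, w_{k_0,k_1} \cdots w_{k_{\ell-1},k_\ell} \, \delta_{k_\ell,j}. \tag{6}$$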
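
The adjoint estimator can be sketched in the same style. Here every history contributes to the whole solution vector: the current weight is tallied into the component indexed by the current state. The truncation at `max_steps`, the function name and the test data are assumptions made for illustration.

```python
import random


def adjoint_mc(H, f, n_histories=20000, max_steps=20, seed=0):
    """Estimate the full vector x solving x = H x + f with the adjoint method."""
    rng = random.Random(seed)
    n = len(H)
    states = range(n)
    f_norm1 = sum(abs(v) for v in f)
    start = [abs(v) / f_norm1 for v in f]            # initial probability |f_i| / ||f||_1
    col_abs = [sum(abs(H[k][i]) for k in states) for i in states]
    # P[i][j] = |H[j][i]| / sum_k |H[k][i]|  (columns of H, i.e. rows of H^T).
    P = [[abs(H[j][i]) / col_abs[i] if col_abs[i] > 0 else 0.0 for j in states]
         for i in states]
    tally = [0.0] * n
    for _ in range(n_histories):
        k = rng.choices(states, weights=start)[0]
        W = f_norm1 if f[k] >= 0 else -f_norm1       # W_0 = sign(f_{k_0}) ||f||_1
        tally[k] += W                                # delta_{k_l, j} selects component k
        for _ in range(max_steps):
            if col_abs[k] == 0.0:
                break
            j = rng.choices(states, weights=P[k])[0]
            W *= H[j][k] / P[k][j]                   # w_{k,j} = H_{j,k} / P_{k,j}
            k = j
            tally[k] += W
    return [t / n_histories for t in tally]


# Illustrative data (assumption): same 3x3 Jacobi system, exact solution [1, 2, 3].
H = [[0.0, -0.25, -0.25],
     [-0.2, 0.0, -0.4],
     [0.0, -1.0 / 3.0, 0.0]]
f = [9.0 / 4.0, 17.0 / 5.0, 11.0 / 3.0]
x_estimate = adjoint_mc(H, f)
```

In this example $\|H\|_1 = 0.65 < 1$, so the weights shrink along each walk and the truncation error after 20 steps is negligible compared with the statistical error.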

Statistical constraints for convergence
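Expected value and variance of the estimator must be finite to apply the Central Limit Theorem.
Forward Method: $(H^{*})_{i,j} = \dfrac{(H_{i,j})^{2}}{P_{i,j}}$
$$E\left[\sum_{\ell=0}^{\infty} W_\ell f_{k_\ell}\right] < \infty \;\Leftrightarrow\; \rho(H) < 1, \qquad \operatorname{Var}\left[\sum_{\ell=0}^{\infty} W_\ell f_{k_\ell}\right] < \infty \;\Leftrightarrow\; \rho(H^{*}) < 1$$
Adjoint Method: $(H^{*})_{i,j} = \dfrac{(H_{j,i})^{2}}{P_{i,j}}$
$$E\left[\sum_{\ell=0}^{\infty} W_\ell d_{k_\ell}\right] < \infty \;\Leftrightarrow\; \rho(H) < 1, \qquad \operatorname{Var}\left[\sum_{\ell=0}^{\infty} W_\ell d_{k_\ell}\right] < \infty \;\Leftrightarrow\; \rho(H^{*}) < 1$$
[J. H. Halton, 1994; H. Ji, M. Mascagni and Y. Li, 2013]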
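
The variance condition can be checked numerically by forming $H^{*}$ and estimating its spectral radius. The sketch below uses plain power iteration, which is an assumption of this example: it is adequate when $H^{*}$ is non-negative with a strictly dominant eigenvalue, and a general-purpose eigenvalue solver would be preferable otherwise. Entries with $P_{i,j} = 0$ are set to zero, which is consistent with the weighted and uniform choices since both vanish exactly where $H$ does.

```python
def variance_matrix(H, P, adjoint=False):
    """(H*)_{ij} = H_{ij}^2 / P_{ij} (forward) or H_{ji}^2 / P_{ij} (adjoint)."""
    n = len(H)
    Hs = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            h = H[j][i] if adjoint else H[i][j]
            if P[i][j] > 0.0:
                Hs[i][j] = h * h / P[i][j]
    return Hs


def spectral_radius_power(M, iters=500):
    """Power iteration; assumes M is non-negative with a strictly dominant eigenvalue."""
    n = len(M)
    v = [1.0 / n] * n
    rho = 0.0
    for _ in range(iters):
        w = [sum(M[i][j] * v[j] for j in range(n)) for i in range(n)]
        rho = sum(abs(x) for x in w)      # ||M v||_1, with ||v||_1 = 1
        if rho == 0.0:
            return 0.0
        v = [x / rho for x in w]
    return rho


# Illustrative data (assumption): forward method with weighted probabilities.
H = [[0.0, -0.25, -0.25],
     [-0.2, 0.0, -0.4],
     [0.0, -1.0 / 3.0, 0.0]]
row_abs = [sum(abs(v) for v in row) for row in H]
P = [[abs(v) / row_abs[i] for v in H[i]] for i in range(len(H))]
rho_star = spectral_radius_power(variance_matrix(H, P))
```

With the weighted choice, $(H^{*})_{i,j} = |H_{i,j}| \sum_k |H_{i,k}|$, so the row sums of $H^{*}$ are the squared row sums of $|H|$ and $\rho(H^{*}) \le \|H\|_\infty^{2}$.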

The choice of the transition probability
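Theorem (H. Ji, M. Mascagni and Y. Li)
Let $H \in \mathbb{R}^{n \times n}$, where $\|H\|_\infty < 1$ for the Forward method and $\|H\|_1 < 1$ for the Adjoint method. Consider $\nu_k$ as the realization of a random walk $\gamma$ truncated at the $k$-th step. Then there always exists a transition matrix $P$ such that $\operatorname{Var}\bigl(X(\nu_k)\bigr) \to 0$ and $\operatorname{Var}\bigl(\sum_{\nu} X(\nu_k)\bigr)$ is bounded as $k \to \infty$.
Remark. If the hypotheses hold, then
$$P_{i,j} = \frac{|H_{i,j}|}{\sum_{k=1}^{n} |H_{i,k}|} \ \text{(Forward)} \qquad \text{or} \qquad P_{i,j} = \frac{|H^{T}_{i,j}|}{\sum_{k=1}^{n} |H^{T}_{i,k}|} \ \text{(Adjoint)}$$
guarantee $\|H^{*}\|_\infty < 1$ or $\|H^{*}\|_1 < 1$ respectively.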

First approaches to a parallelization
First attempts to apply Monte Carlo as a linear solver considered the fixed point formulation of the original linear system.
+ embarrassingly parallelizable (ideally we could employ as many processes as there are histories)
− huge amount of time to compute a reasonably accurate solution
E.g. if $\|r_0\| = \|b - Ax_0\| \approx 1$, then $\dfrac{\|r_k\|}{\|b\|} < 10^{-m} \Rightarrow N_{\text{histories}} \approx 10^{2m}$ because of the Central Limit Theorem.
[S. Branford, C. Sahin, A. Thandavan, C. Weihrauch, V. N. Alexandrov and I. T. Dimov, 2008; P. Jakovits, I. Kromonov and S. R. Srirama, 2011]
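
The $10^{2m}$ estimate follows from the $1/\sqrt{N}$ decay of the statistical error. A short worked version, assuming independent histories whose estimator has a standard deviation $\sigma$ of order one (an assumption made here for illustration):

$$\text{error} \approx \frac{\sigma}{\sqrt{N}} \le 10^{-m} \quad\Longleftrightarrow\quad N \ge \sigma^{2} \, 10^{2m}.$$

For example, with $\sigma \approx 1$ a target of $m = 7$ digits would require on the order of $10^{14}$ histories, while each additional digit multiplies the cost by 100.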

Monte Carlo Synthetic Acceleration (MCSA)
Data: $A$, $b$, $H$, $f$, $x_0$
Result: $x_{\text{num}}$
$x^{l} = x_0$;
while not reached convergence do
    $x^{l+\frac{1}{2}} = Hx^{l} + f$;
    $r^{l+\frac{1}{2}} = b - Ax^{l+\frac{1}{2}}$;
    $\delta x^{l+\frac{1}{2}} = (I - H)^{-1} r^{l+\frac{1}{2}}$; ("solved" with standard MC)
    $x^{l+1} = x^{l+\frac{1}{2}} + \delta x^{l+\frac{1}{2}}$;
end
$x_{\text{num}} = x^{l+1}$
[S. Slattery, 2013 (PhD Thesis); T. M. Evans, S. R. Slattery and P. P. H. Wilson, 2013]
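
The loop above can be sketched as follows. This is an illustrative reading, not the authors' implementation: the system is assumed to be already in the form $(I - H)x = f$, so that $A$ and $b$ in the algorithm stand for $I - H$ and $f$; the correction is computed with a truncated adjoint Monte Carlo solve; and all function names, defaults and the test problem are hypothetical.

```python
import random


def matvec(M, v):
    return [sum(m * x for m, x in zip(row, v)) for row in M]


def adjoint_mc(H, rhs, rng, n_histories, max_steps):
    """Truncated adjoint Monte Carlo estimate of (I - H)^{-1} rhs."""
    n = len(H)
    states = range(n)
    norm1 = sum(abs(v) for v in rhs)
    if norm1 == 0.0:
        return [0.0] * n
    start = [abs(v) / norm1 for v in rhs]
    col_abs = [sum(abs(H[k][i]) for k in states) for i in states]
    P = [[abs(H[j][i]) / col_abs[i] if col_abs[i] > 0 else 0.0 for j in states]
         for i in states]
    tally = [0.0] * n
    for _ in range(n_histories):
        k = rng.choices(states, weights=start)[0]
        W = norm1 if rhs[k] >= 0 else -norm1
        tally[k] += W
        for _ in range(max_steps):
            if col_abs[k] == 0.0:
                break
            j = rng.choices(states, weights=P[k])[0]
            W *= H[j][k] / P[k][j]
            k = j
            tally[k] += W
    return [t / n_histories for t in tally]


def residual(H, f, x):
    """r = f - (I - H) x."""
    return [fi - xi + hxi for fi, xi, hxi in zip(f, x, matvec(H, x))]


def mcsa(H, f, tol=1e-7, max_iter=200, n_histories=100, max_steps=10, seed=0):
    rng = random.Random(seed)
    x = [0.0] * len(H)
    f_norm = max(abs(v) for v in f)
    for it in range(1, max_iter + 1):
        x_half = [hx + fi for hx, fi in zip(matvec(H, x), f)]   # Richardson step
        r_half = residual(H, f, x_half)                         # residual at x^{l+1/2}
        dx = adjoint_mc(H, r_half, rng, n_histories, max_steps) # MC correction
        x = [xh + d for xh, d in zip(x_half, dx)]
        if max(abs(v) for v in residual(H, f, x)) <= tol * f_norm:
            return x, it
    return x, max_iter


# Illustrative data (assumption): 3x3 Jacobi system with exact solution [1, 2, 3].
H = [[0.0, -0.25, -0.25],
     [-0.2, 0.0, -0.4],
     [0.0, -1.0 / 3.0, 0.0]]
f = [9.0 / 4.0, 17.0 / 5.0, 11.0 / 3.0]
x_num, iters = mcsa(H, f)
```

The point of the design is that the Monte Carlo solve acts on the residual, whose norm shrinks at every iteration, so the statistical error of the correction shrinks with it and only a modest number of histories per iteration is needed.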

Adaptivity for the number of random walks
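Forward Method:
$\theta_i \in \mathbb{R}$, $\quad \theta_i = E\left[\sum_{l=0}^{\infty} W_l b_{k_l}\right]$, $\quad \sigma_i = \operatorname{Var}[\theta_i]$
Find $\tilde{N}_i$ s.t. $\dfrac{\sigma_i^{\tilde{N}_i}}{|\hat{x}_i|} < \varepsilon$, $\quad i = 1, \dots, n$
Adjoint Method:
$\theta \in \mathbb{R}^{n}$, $\quad \theta_i = E\left[\sum_{l=0}^{\infty} W_l d_{k_l} \delta_{k_l,i}\right]$, $\quad \sigma_i = \operatorname{Var}[\theta]_i$, $\quad i = 1, \dots, n$
Find $\tilde{N}$ s.t. $\dfrac{\|\sigma^{\tilde{N}}\|_1}{\|\hat{x}\|_1} < \varepsilon$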
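
One way to realize the adjoint stopping rule is to add histories in batches and re-evaluate the ratio after each batch. The sketch below is an assumption-laden illustration: $\sigma_i$ is taken to be the standard error of the mean of component $i$ after $N$ histories, the batch size plays the role of the granularity parameter, and a toy Gaussian sampler stands in for a real random-walk history.

```python
import math
import random


def adaptive_histories(sample_history, n, eps=0.01, batch=100, max_histories=100000):
    """Add histories in batches until ||sigma||_1 / ||x_hat||_1 < eps.

    sample_history() returns the tally of one history (a list of length n).
    """
    s1 = [0.0] * n    # running sums
    s2 = [0.0] * n    # running sums of squares
    N = 0
    mean = [0.0] * n
    while N < max_histories:
        for _ in range(batch):
            t = sample_history()
            for i in range(n):
                s1[i] += t[i]
                s2[i] += t[i] * t[i]
        N += batch
        mean = [s / N for s in s1]
        # Unbiased sample variance per component, clipped at zero against round-off.
        var = [max(s2[i] / N - mean[i] ** 2, 0.0) * N / (N - 1) for i in range(n)]
        sigma = [math.sqrt(v / N) for v in var]     # standard error of each mean
        norm_x = sum(abs(m) for m in mean)
        if norm_x > 0.0 and sum(sigma) / norm_x < eps:
            break
    return mean, N


# Toy stand-in for a random-walk history (assumption): independent Gaussian tallies.
rng = random.Random(0)


def toy_history():
    return [rng.gauss(1.0, 0.5), rng.gauss(2.0, 0.5)]


x_hat, n_used = adaptive_histories(toy_history, n=2)
```

The forward variant would apply the same test component by component, with a separate count $\tilde{N}_i$ for each entry.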

Applicable preconditioning techniques
Remark: explicit knowledge of $H_{ij}$ is needed.
In fact the entry $P_{ij}$ of the transition matrix is defined in terms of $H_{ij}$ (Forward method) or $H_{ji}$ (Adjoint method).
This limits the viable choices of preconditioners:
diagonal preconditioners
block diagonal preconditioners
sparse approximate inverse preconditioners (AINV)

Examples
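Strictly diagonally dominant (s.d.d.) matrices with diagonal preconditioning.
$A \in \mathbb{R}^{n \times n}$ s.d.d. by rows: $P = \operatorname{diag}(A)$,
$H = I - P^{-1}A$, $\quad \|H\|_\infty < 1$ $\ (\Leftrightarrow \|H^{*}\|_\infty < 1)$
$A \in \mathbb{R}^{n \times n}$ s.d.d. by columns: $P = \operatorname{diag}(A)$,
$H = I - AP^{-1}$, $\quad \|H\|_1 < 1$ $\ (\Leftrightarrow \|H^{*}\|_1 < 1)$

Strict diag. dominance | Forward method | Adjoint method
by rows | converges | not guaranteed
by columns | not guaranteed | converges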
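
These conditions are cheap to verify before running a Monte Carlo solver. The sketch below checks strict diagonal dominance and the two norms for a small dense matrix; the helper names and the example matrix are illustrative assumptions.

```python
def is_sdd_by_rows(A):
    n = len(A)
    return all(abs(A[i][i]) > sum(abs(A[i][j]) for j in range(n) if j != i)
               for i in range(n))


def is_sdd_by_columns(A):
    n = len(A)
    return all(abs(A[j][j]) > sum(abs(A[i][j]) for i in range(n) if i != j)
               for j in range(n))


def norm_inf(M):
    """Maximum absolute row sum."""
    return max(sum(abs(v) for v in row) for row in M)


def norm_1(M):
    """Maximum absolute column sum."""
    n = len(M)
    return max(sum(abs(M[i][j]) for i in range(n)) for j in range(n))


def jacobi_left(A):
    """H = I - D^{-1} A with D = diag(A)."""
    n = len(A)
    return [[(1.0 if i == j else 0.0) - A[i][j] / A[i][i] for j in range(n)]
            for i in range(n)]


def jacobi_right(A):
    """H = I - A D^{-1} with D = diag(A)."""
    n = len(A)
    return [[(1.0 if i == j else 0.0) - A[i][j] / A[j][j] for j in range(n)]
            for i in range(n)]


# Illustrative matrix (assumption): s.d.d. by rows but not by columns.
A = [[4.0, 1.0, 1.0],
     [1.0, 5.0, 2.0],
     [0.0, 1.0, 3.0]]
rows_ok = is_sdd_by_rows(A)
cols_ok = is_sdd_by_columns(A)
forward_norm = norm_inf(jacobi_left(A))
adjoint_norm = norm_1(jacobi_right(A))
```

For this matrix the third column fails the dominance test, so the forward bound $\|H\|_\infty < 1$ holds while the adjoint bound $\|H\|_1 < 1$ does not, matching the first row of the table.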

Examples
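Other examples:
$A$ is an M-matrix $\Rightarrow \exists D$ diagonal s.t. $AD$ is s.d.d.
with preconditioner $P = \operatorname{block\_diag}(A)$:
$$\sum_{\substack{j=1 \\ j \neq i}}^{p} \|A_{ii}^{-1} A_{ij}\|_\infty < 1 \quad \forall i = 1, \dots, p$$
$\Rightarrow$ Forward method with block Jacobi converges
$$\sum_{\substack{i=1 \\ i \neq j}}^{p} \|A_{ii}^{-1} A_{ij}\|_1 < 1 \quad \forall j = 1, \dots, p$$
$\Rightarrow$ Adjoint method with block Jacobi converges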

2D parabolic problems: test case 1
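Given an open and bounded set $\Omega \subset \mathbb{R}^{2}$ s.t. $\Omega = (0, 2)^{2} \setminus (1, 2)^{2}$
$$\begin{cases}
\dfrac{\partial u}{\partial t} - \mu \Delta u + \beta(x) \cdot \nabla u = 0, & x \in \Omega,\ t \in (0, T] \\
u(x, 0) = u_0, & x \in \Omega \\
u(x, t) = 0, & x \in \Gamma_D,\ t \in (0, T] \\
\dfrac{\partial u}{\partial n} + \chi u = 0, & x \in \Gamma_R,\ t \in (0, T]
\end{cases} \tag{7}$$
where $\mu = \frac{3}{200}$, $\chi = 3$, $\beta(x) = [x, -y]^{T}$.
$\Gamma_R = \{\{x = 1\} \times (1, 2)\} \cup \{(1, 2) \times \{y = 1\}\}$, $\quad \Gamma_D = \partial\Omega \setminus \Gamma_R$.
Discretization (FreeFem++ employed):
Triangular FEM
spatial discretization step $h_{\max} = 0.018$, 37,177 d.o.f.'s
time discretization step $\Delta t = h_{\max}^{2}$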

Preconditioning: sparse factorized AINV with drop tolerance $\tau = 0.05$. Code provided courtesy of Miroslav Tuma.
[M. Benzi and M. Tuma, SISC, 1998]
Numerical setting:
Adjoint MCSA employed as linear solver
relative residual tolerance: $\varepsilon_1 = 10^{-7}$
weighted transition probability
maximal # steps per history: 10
statistical error-based adaptive parameter: $\varepsilon_2 = 0.01$
granularity of the adaptive approach: $n_{\text{histories}} = 100$
simulations run on a laptop

Stiffness matrix: $A$ ($n = 37{,}177$), s.d.d. by rows and columns.
AINV preconditioner: $M$. Drop tolerance $\tau = 0.05$ for both factors.
Iteration matrix: $H = I - MA$.
Solution computed just for a generic time step.
$\dfrac{\operatorname{nnz}(H)}{\operatorname{nnz}(A)} = 2.984$
$\|H\|_1 = 0.597$ ($\rho(H) \approx 0.286$, $\rho(H^{*}) \approx 0.104$)
# MCSA iterations: 8
relative error = $2.71 \cdot 10^{-8}$
# total random walks employed: 90,000

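Given an open and bounded set $\Omega = (0, 1) \times (0, 1)$
$$\begin{cases}
\dfrac{\partial u}{\partial t} - \mu \Delta u + \beta(x) \cdot \nabla u = 0, & x \in \Omega,\ t \in (0, T] \\
u(x, 0) = u_0, & x \in \Omega \\
u(x, t) = 0, & x \in \partial\Omega,\ t \in (0, T]
\end{cases} \tag{8}$$
where $\mu = \frac{3}{200}$, $\beta(x) = [2, \sin(x)]^{T}$.
Discretization (FreeFem++ employed):
Triangular FEM
spatial discretization step $h_{\max} = 0.014$, 40,501 d.o.f.'s
time discretization step $\Delta t = h_{\max}^{2}$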

Stiffness matrix: $A$ ($n = 40{,}501$), s.d.d. by rows and columns.
$\dfrac{\operatorname{nnz}(H)}{\operatorname{nnz}(A)} = 3.106$
$\|H\|_1 = 0.185$ ($\rho(H) \approx 0.083$, $\rho(H^{*}) \approx 0.017$)
# total random walks employed: 66,800

Given an open and bounded set $\Omega = (0, 1) \times (0, 1)$
$$\begin{cases}
\dfrac{\partial u}{\partial t} - \mu \Delta u + \beta(x) \cdot \nabla u = 0, & x \in \Omega,\ t \in (0, T] \\
u(x, 0) = u_0, & x \in \Omega \\
u(x, t) = u_D(x), & x \in \partial\Omega,\ t \in (0, T]
\end{cases} \tag{9}$$
where $\mu = \frac{3}{200}$, $\beta(x) = [2y(1 - x^{2}), -2x(1 - y^{2})]^{T}$,
$u_D = 0$ on $\{\{x = 0\} \times (0, 1)\}$, $\{(0, 1) \times \{y = 0\}\}$, $\{(0, 1) \times \{y = 1\}\}$.
Discretization (IFISS toolbox employed):
Quadrilateral FEM
spatial discretization step $h = 2^{-8}$ $\Rightarrow$ 66,049 d.o.f.'s

Stiffness matrix: $A$ ($n = 66{,}049$). Not s.d.d.
$\dfrac{\operatorname{nnz}(H)}{\operatorname{nnz}(A)} = 4.736$
$\|H\|_1 = 0.212$ ($\rho(H) \approx 0.171$, $\rho(H^{*}) \approx 0.032$)
# total random walks employed: 88,400

Conclusions and future developments
Conclusions
MC solvers motivated by resilience
hypotheses for convergence difficult to satisfy in general
currently best results obtained using AINV
Future developments
extension of the set of matrices for which MC solvers are guaranteed to converge a priori
analysis of how $\rho(H)$ and $\rho(H^{*})$ affect convergence
refinement of the adaptive selection of histories

Bibliography
I. Dimov, V. Alexandrov, and A. Karaivanova.
Parallel resolvent Monte Carlo algorithms for linear algebra problems.
Mathematics and Computers in Simulation, 55(55):25–35, 2001.

T.M. Evans, S.W. Mosher, S.R. Slattery, and S.P. Hamilton.
A Monte Carlo synthetic-acceleration method for solving the thermal radiation diffusion equation.
Journal of Computational Physics, 258:338–358, 2014.

T.M. Evans, S.R. Slattery, and P.P.H. Wilson.
Mixed Monte Carlo parallel algorithms for matrix computation.
International Conference on Computational Science 2002, 2002.

T.M. Evans, S.R. Slattery, and P.P.H. Wilson.
A spectral analysis of the domain decomposed Monte Carlo method for linear systems.
International Conference on Mathematics and Computational Methods Applied to Nuclear Science & Engineering, 2013.

J.H. Halton.
Sequential Monte Carlo.
Mathematical Proceedings of the Cambridge Philosophical Society, 58(1):57–58, 1962.

J.H. Halton.
Sequential Monte Carlo techniques for the solution of linear systems.
Journal of Scientific Computing, 9(2):213–257, 1994.

H. Ji, M. Mascagni, and Y. Li.
Convergence analysis of Markov chain Monte Carlo linear solvers using Ulam-von Neumann algorithm.
SIAM Journal on Numerical Analysis, 51(4):2107–2122, 2013.

N. Metropolis and S. Ulam.
The Monte Carlo method.
Journal of the American Statistical Association, 44(247):335–341, 1949.

S. Slattery.
Parallel Monte Carlo Synthetic Acceleration Methods For Discrete Transport Problems.
PhD thesis, University of Wisconsin-Madison, 2013.

The choice of the transition probability
Theorem (H. Ji, M. Mascagni and Y. Li)
Let $H \in \mathbb{R}^{n \times n}$ with spectral radius $\rho(H) < 1$. Let $H^{+} \in \mathbb{R}^{n \times n}$, where $H^{+}_{i,j} = |H_{i,j}|$. Consider $\nu_k$ as the realization of a random walk $\gamma$ truncated at the $k$-th step. If $\rho(H^{+}) > 1$, there does not exist a transition matrix $P$ such that $\operatorname{Var}\bigl(X(\nu_k)\bigr)$ converges to zero as $k \to \infty$.

Numerical experiments
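Simplified $P_N$ equations
Steady-state, multigroup, one-dimensional, eigenvalue form of the Boltzmann transport equation:
$$\mu \frac{\partial \psi^{g}(x, \mu)}{\partial x} + \sigma^{g}(x) \psi^{g}(x, \mu) = \sum_{g'=1}^{N_g} \int_{4\pi} \sigma_s^{gg'}(x, \hat{\Omega} \cdot \hat{\Omega}') \, \psi^{g'}(x, \Omega') \, d\Omega' + \frac{1}{k} \sum_{g'=1}^{N_g} \frac{\chi^{g}}{4\pi} \int_{4\pi} \nu\sigma_f^{g'}(x) \, \psi^{g'}(x, \Omega') \, d\Omega'. \tag{10}$$
$\psi^{g}(x, \mu)$ = angular flux for group $g$
$\sigma^{g}$ = total interaction cross section
$\sigma_s^{gg'}(x, \hat{\Omega} \cdot \hat{\Omega}')$ = scattering cross section from group $g' \to g$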

Legendre polynomial spectral discretization ($P_N$ equations)
consider just odd sets of $P_N$ equations
removing lower order gradient terms from each equation ($SP_N$ equations)
$$-\nabla \cdot D_n \nabla U_n + \sum_{m=1}^{\frac{N+1}{2}} A_{nm} U_m = \frac{1}{k} \sum_{m=1}^{\frac{N+1}{2}} F_{nm} U_m, \quad n = 1, \dots, \frac{N+1}{2} \tag{11}$$
$U_m$ is a linear combination of moments.
Discretized with Finite Volumes

Figure: sparsity pattern and zoom on the structure of an $SP_1$ matrix

Properties of matrices:
discretization of $SP_1$ equations
all nonzero diagonal entries
Numerical treatment:
block Jacobi preconditioning

Kind of matrix | size | nnz | Prec. block size | nnz after prec.
SP1 (a) | 18,207 | 486,438 | 63 | 2,644,523
SP1 (b) | 19,941 | 998,631 | 69 | 1,774,786

[Matrices provided courtesy of Thomas Evans, Steven Hamilton, Stuart Slattery, ORNL]

MCSA parameter setting:
Adjoint method
residual relative tolerance: $\varepsilon_1 = 10^{-7}$
almost optimal transition probability
maximal # steps per history: 10
statistical error-based adaptive parameter: $\varepsilon_2 = 0.5$
granularity of the adaptive approach: $n_{\text{histories}} = 1{,}000$
simulations run in serial mode

matrix | $\rho(H)$ | $\rho(H^{*})$ | relative err. | # iterations
SP1 (a) | 0.9779 | 0.9758 | $9.97 \cdot 10^{-6}$ | 340
SP1 (b) | 0.9798 | 0.9439 | $3.89 \cdot 10^{-5}$ | 209

Figure 1: # histories for (a) at each iteration.
Figure 2: # histories for (b) at each iteration.

Issues raised for a general $SP_N$ matrix:
hard to find a block Jacobi preconditioner s.t. $\rho(H^{*}) < 1$ for every $SP_N$ problem
attempt to use Approximate Inverse preconditioners was not effective
  for sparse preconditioners $\rho(H^{*}) < 1$ is not respected
attempt to use ILU preconditioners was not effective
  for ILU(0) we got $\rho(H^{*}) > 1$
  massive fill-in with ILUT
reordering and scaling did not facilitate the convergence requirements

Diagonally dominant matrices
Set of matrices obtained by a diagonal shift of matrices associated with $SP_1$, $SP_3$ and $SP_5$ equations.
$$(A + sD), \quad s \in \mathbb{R}^{+}, \quad D = \operatorname{diag}(A).$$
All the matrices have been turned into s.d.d. by columns.

Initial matrix | $s$ | $\rho(H)$ | Forward $\rho(H^{*})$ | Adjoint $\rho(H^{*})$
SP1 (a) | 0.3 | 0.7597 | 0.7441 | 0.6983
SP1 (b) | 0.4 | 0.7046 | 1.1448 | 0.5680
SP3 | 0.9 | 0.5869 | 0.4426 | 0.3727
SP5 | 1.6 | 0.5477 | 0.3790 | 0.3431
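
To see why the shift helps, it is instructive to work out the simplest case. The derivation below assumes a point Jacobi preconditioner $P = \operatorname{diag}(A + sD) = (1 + s)D$, which is an assumption of this illustration: the slides do not state which preconditioner was applied to the shifted matrices, and block Jacobi is used elsewhere in the talk.

$$H_s = I - \bigl((1 + s)D\bigr)^{-1}(A + sD) = I - \frac{1}{1 + s} D^{-1}A - \frac{s}{1 + s} I = \frac{1}{1 + s}\bigl(I - D^{-1}A\bigr) = \frac{H_0}{1 + s}.$$

Hence $\rho(H_s) = \rho(H_0) / (1 + s)$. With the weighted transition probability, $P$ is unchanged by this rescaling, so $H_s^{*} = H_0^{*} / (1 + s)^{2}$ and $\rho(H_s^{*}) = \rho(H_0^{*}) / (1 + s)^{2}$: under these assumptions the shift contracts both spectral radii.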

Diagonally dominant matrices
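$$(A + sD), \quad s \in \mathbb{R}^{+}, \quad D = \operatorname{diag}(A).$$

matrix | nnz | $s$ | relative err. | # iterations
SP1 (a) | 486,438 | 0.3 | $6.277 \cdot 10^{-7}$ | 36
SP1 (b) | 998,631 | 0.4 | $9.885 \cdot 10^{-7}$ | 21
SP3 | 846,549 | 0.9 | $5.919 \cdot 10^{-7}$ | 18
SP5 | 1,399,134 | 1.6 | $4.031 \cdot 10^{-7}$ | 19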

Diagonal shift
Strict diagonal dominance is a sufficient but not necessary condition for convergence.
$$(A + sD), \quad s \in \mathbb{R}^{+}, \quad D = \operatorname{diag}(A)$$
such that $\rho(H) < 1$ and $\rho(H^{*}) < 1$.

Initial matrix | $s$ | $\rho(H)$ | Forward $\rho(H^{*})$ | Adjoint $\rho(H^{*})$
SP1 (a) | 0.2 | 0.8230 | 0.8733 | 0.8195
SP1 (b) | 0.2 | 0.8220 | 1.5582 | 0.7731
SP3 | 0.3 | 0.8126 | 0.9459 | 0.7961
SP5 | 0.7 | 0.8376 | 0.8865 | 0.8026

Diagonal shift
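$$(A + sD), \quad s \in \mathbb{R}^{+}, \quad D = \operatorname{diag}(A).$$

matrix | $s$ | relative err. | # iterations
SP1 (a) | 0.2 | $6.394 \cdot 10^{-7}$ | 48
SP1 (b) | 0.2 | $2.59 \cdot 10^{-6}$ | 36
SP3 | 0.3 | $5.35 \cdot 10^{-7}$ | 45
SP5 | 0.7 | $4.21 \cdot 10^{-7}$ | 70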
