Program on Quasi-Monte Carlo and High-Dimensional Sampling Methods for Applied Mathematics Opening Workshop, Generating Random Fields the Circulant Way - Ian H. Sloan, Sep 1, 2017
The generation of Gaussian random fields over a physical domain is a challenging problem in computational mathematics, especially when the correlation length is short and the field is rough. The traditional approach is to make use of a truncated Karhunen-Loève (KL) expansion, but the generation of even a single realisation of the field may then be effectively beyond reach (especially for 3-dimensional domains) if the need is to obtain an expected L2 error of, say, 5%, because of the potentially very slow convergence of the KL expansion. In this talk, based on joint work with Ivan Graham, Frances Kuo, Dirk Nuyens, and Rob Scheichl, a completely different approach is used, in which the field is initially generated at a regular grid on a 2- or 3-dimensional rectangle that contains the physical domain, and then possibly interpolated to obtain the field at other points. In that case there is no need for any truncation. Rather, the main problem becomes the factorisation of a large dense matrix. For this we use circulant embedding and FFT ideas. Quasi-Monte Carlo integration is then used to evaluate the expected value of some functional of the finite-element solution of an elliptic PDE with a random field as input.
1. On the generation of random fields
Ian Sloan
i.sloan@unsw.edu.au
University of New South Wales, Sydney, Australia
Joint with I Graham & R Scheichl (Bath), F Kuo (UNSW) & D Nuyens (KU Leuven)
SAMSI, September 1, 2017
3. Suppose we want to generate a Gaussian random field over this
L-shaped region with a hole:
The question is: how to generate the random field?
Application: PDE with random field as input.
Then QMC to find expected value of linear functionals
5. Gaussian random fields
Z(x) = Z(x, ω), for x ∈ D ⊂ R^d, is a Gaussian random field if for each x ∈ D, Z(x) is a normally distributed random variable; the field is fully determined by knowing its mean
Z̄(x) := E[Z(x)]
and its covariance function
r(x, y) := E[(Z(x) − Z̄(x))(Z(y) − Z̄(y))].
For simplicity we will consider throughout mean-zero fields, that is
Z̄(x) = 0, x ∈ D  ⟹  r(x, y) = E[Z(x)Z(y)].
9. Examples of 2d covariance functions
r(x, y) = σ² exp(−(|x1 − y1|² + |x2 − y2|²)/λ²)
– very smooth (in 1-d it is σ² exp(−|x − y|²/λ²) – “Gaussian”).
Here σ² is the variance, and λ is the correlation length.
r(x, y) = σ² exp(−√(|x1 − y1|² + |x2 − y2|²)/λ)
– not smooth at x = y (in 1-d it is σ² exp(−|x − y|/λ) – “exponential”).
More general is the Matérn class rν(x, y), ν ∈ [1/2, ∞), which contains the examples above at the two ends of its parameter range: ν = 1/2 gives the exponential case, ν = ∞ gives the Gaussian case.
How to compute realisations of the input field? One way is:
10. Karhunen-Loève expansion
Z(x, ω) = Σ_{j≥1} √μ_j Y_j(ω) φ_j(x),
where the Y_j are independent standard normal random numbers, and (μ_j, φ_j) satisfy
∫_D r(x, x′) φ_j(x′) dx′ = μ_j φ_j(x),  ∫_D φ_i(x) φ_j(x) dx = δ_ij.
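The truncated expansion is easy to prototype. A minimal sketch, not from the talk: all parameter values are assumed, the covariance is the 1-d exponential one on [0, 1], and the integral eigenproblem is discretised by the rectangle rule (a simple Nyström method).

```python
import numpy as np

sigma2, lam = 1.0, 0.5               # assumed variance and correlation length
n, n_terms = 200, 50                 # grid size and KL truncation level
x = np.linspace(0.0, 1.0, n)
R = sigma2 * np.exp(-np.abs(x[:, None] - x[None, :]) / lam)

w = 1.0 / n                          # rectangle-rule quadrature weight
mu, phi = np.linalg.eigh(w * R)      # discretised integral eigenproblem
mu, phi = mu[::-1], phi[:, ::-1]     # sort eigenvalues in decreasing order
phi = phi / np.sqrt(w)               # normalise: w * sum(phi_j**2) == 1

rng = np.random.default_rng(0)
Y = rng.standard_normal(n_terms)     # iid standard normals Y_j
Z = phi[:, :n_terms] @ (np.sqrt(mu[:n_terms]) * Y)   # truncated KL realisation
```

The slow decay of the μ_j for rough fields is exactly what makes the truncation level n_terms so costly in the hard cases discussed next.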
11. Why does it work?
Because if
Z(x) = Σ_{j≥1} √μ_j Y_j(ω) φ_j(x),
then formally
E[Z(x)Z(x′)] = E[(Σ_{j≥1} √μ_j Y_j φ_j(x))(Σ_{k≥1} √μ_k Y_k φ_k(x′))]
= Σ_{j≥1} Σ_{k≥1} √μ_j √μ_k φ_j(x) φ_k(x′) E[Y_j Y_k]
= Σ_{j≥1} Σ_{k≥1} √μ_j √μ_k φ_j(x) φ_k(x′) δ_{j,k}
= Σ_{j≥1} μ_j φ_j(x) φ_j(x′) = r(x, x′)
by Mercer's theorem.
12. There’s a problem if KL convergence is slow
The KL convergence can be very slow if
the field is rough (that is, if ν is small),
or if the correlation length λ is small,
or the variance σ2
is large.
13. There’s a problem if KL convergence is slow
The KL convergence can be very slow if
the field is rough (that is, if ν is small),
or if the correlation length λ is small,
or the variance σ2
is large.
And if the convergence is slow for a 1-dimensional physical domain D
then it is MUCH slower for a 2-dimensional domain, and worse still for
a 3-dimensional domain D.
This means that the truncation errors can be VERY large.
14. And another thing:
Also, the eigenvalue problem becomes non-trivial for a 3-dimensional domain D if thousands of eigenvalue and eigenfunction pairs are needed. Recall
∫_D r(x, x′) φ_j(x′) dx′ = μ_j φ_j(x),  ∫_D φ_i(x) φ_j(x) dx = δ_ij.
15. The discrete alternative
If we are going to use, for example, piecewise linear finite elements, then we don't need the field everywhere: we only need it at points related to the finite element grid – for example, at the triangle centroids, as shown in the figure. This is “the discrete alternative”.
As input we now need r(x, x′) only at the discrete points x, x′ – we need only “standard information”, in the language of IBC (information-based complexity).
18. How to generate the field at grid points?
Suppose we want the field only at a set of points x1, . . . , xM ∈ D. Now the field is a vector of length M:
Z(ω) := (Z(x1, ω), . . . , Z(xM, ω))⊤.
This is a Gaussian random vector with mean zero and a positive definite covariance matrix
R = [R_ij]_{i,j=1}^M, where R_ij = E[Z(xi)Z(xj)] = r(xi, xj).
So if r(x, y) is known, then so is the covariance matrix.
20. How to generate a random field with prescribed covariance matrix R?
Suppose we can factorise the matrix in some way:
R = BB⊤.
Because R is positive definite, we can, for example, take B to be the square root of R. Also Cholesky, ....
Once B is known we can generate the field by
Z(ω) = BY(ω), where Y(ω) = (Y1(ω), . . . , YM(ω))⊤
and Y1(ω), . . . , YM(ω) are iid standard normal variables.
21. Why does it work?
Simply note that, because Z(ω) = BY(ω), with Y a vector of iid standard normal random variables, we have
E[ZZ⊤] = E[BYY⊤B⊤] = B E[YY⊤] B⊤ = BB⊤ = R.
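This factorisation approach can be sketched in a few lines; an illustrative example only, with the grid, the exponential covariance, and all parameter values assumed, and Cholesky chosen for B:

```python
import numpy as np

sigma2, lam = 1.0, 0.2               # assumed variance and correlation length
x = np.linspace(0.0, 1.0, 100)       # M = 100 grid points in [0, 1]
R = sigma2 * np.exp(-np.abs(x[:, None] - x[None, :]) / lam)

B = np.linalg.cholesky(R)            # factorise R = B B^T (O(M^3) work)
rng = np.random.default_rng(1)
Y = rng.standard_normal(x.size)      # iid standard normal vector Y
Z = B @ Y                            # one realisation with covariance R
```

The O(M³) cost of the Cholesky step is precisely the bottleneck the next slides address.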
23. Now there is no truncation error!
Now there is no truncation error, because all we need is to factorise the covariance matrix R. The problem has turned into one of linear algebra.
But now suppose that M is very large, say in the tens of thousands, or even millions. The matrix R is typically dense, so a Cholesky factorisation will take of the order of M³ operations. This is not generally feasible when M is large.
24. Let’s specialise the covariance function
In practice most covariance functions have this form:
r(x, y) = ρ( x − y ),
That is, the covariance functiion is stationary and isotropic.
In this situation there is great benefit is taking the grid to be UNIFORM.
25. The benefits of uniformity
When the covariance function is isotropic there are great benefits to computing the field on a UNIFORM grid, because then the matrix is typically block Toeplitz. This is the path we follow: we initially compute the field only at the uniform grid points (the red crosses in the figure). After that we use bilinear interpolation to find the field at the other points (the blue points). (The resulting error is of the same order as the finite-element error.)
26. The uniform grid
We cover the original domain D by the unit cube in R^d, and on it define a uniform grid with m0 + 1 points on each edge. So the spacing is h = 1/m0, and in total there are M = (m0 + 1)^d points in the grid.
27. The 1-dimensional case
For a 1-dimensional domain D covered by the unit interval, with a uniform grid of spacing h on it, the first row of the covariance matrix R = (ρ(|xi − xj|))_{i,j=0}^{m0} is
ρ(0), ρ(h), ρ(2h), . . . , ρ(m0h),
and the second row is
ρ(h), ρ(0), ρ(h), . . . , ρ((m0 − 1)h),
etc. This is a Toeplitz matrix.
29. Extending the matrix
It can be made into a circulant matrix Rext of almost double the number of rows and columns (i.e. 2m0) by reflecting the top row, to obtain, in the 1-dimensional case, for the first row
ρ(0), ρ(h), ρ(2h), . . . , ρ((m0 − 1)h), ρ(m0h), ρ((m0 − 1)h), . . . , ρ(h),
and then, by “wrapping it around”, the second row becomes
ρ(h), ρ(0), ρ(h), . . . , ρ((m0 − 2)h), ρ((m0 − 1)h), ρ(m0h), . . . , ρ(2h),
etc – a CIRCULANT matrix.
The point is that a circulant matrix of size M × M can be factorised by FFT in a time of order M log M.
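The reflection and the one-FFT diagonalisation can be sketched as follows (an assumed 1-d example with the exponential covariance; the full circulant matrix is built explicitly only to check the eigenvalue claim, never in a real computation):

```python
import numpy as np

sigma2, lam, m0 = 1.0, 0.5, 64       # assumed parameter values
h = 1.0 / m0

def rho(t):
    return sigma2 * np.exp(-np.abs(t) / lam)   # exponential covariance

row = rho(h * np.arange(m0 + 1))     # first row of the Toeplitz matrix R
c = np.concatenate([row, row[-2:0:-1]])        # reflect: first row of Rext
eigs = np.fft.fft(c).real            # circulant eigenvalues, one FFT

# Built only to verify: row i of a circulant is the first row rolled by i.
Rext = np.array([np.roll(c, i) for i in range(c.size)])
```

Note that R sits in the top-left (m0 + 1) × (m0 + 1) corner of Rext, which is what makes the embedding useful.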
31. More precisely, write
Rext = XΛX⊤,
where Λ is the diagonal matrix of eigenvalues of Rext, and the rows of X are the normalised eigenvectors (which are just complex exponentials).
Note that the eigenvalues of Rext are real. And we use an all-real version of the FFT, which makes for efficient implementation.
If D is a 2- or 3-dimensional region then the matrix is BLOCK circulant, and again the FFT can be used.
32. History
In the 1-dimensional case there is a substantial literature on circulant
embedding for the efficient generation of Gaussian random fields:
Dietrich and Newsam, 1997
Chan and Wood, 1997
33. What’s the catch?
The catch is that the extended matrix Rext
may not be positive
definite – because some of the eigenvalues of Rext
may be
negative.
34. Before we fix the non-p.d. problem:
Let's assume for the moment that all eigenvalues of Rext are non-negative. Then we can write
Rext = XΛX⊤ = (XΛ^{1/2})(XΛ^{1/2})⊤.
How does this help with factorisation of R? Answer: R is a submatrix of Rext. By selecting the appropriate rows and columns of the factorisation above we obtain
R = BB⊤,
with B consisting of just the appropriate rows of XΛ^{1/2}.
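Putting the pieces together, a hedged sketch of the resulting sampler (an assumed 1-d exponential example; the complex-normal trick, which yields two independent real realisations per FFT, follows the Dietrich-Newsam / Chan-Wood literature cited later):

```python
import numpy as np

sigma2, lam, m0 = 1.0, 0.5, 64       # assumed parameter values
h = 1.0 / m0
row = sigma2 * np.exp(-h * np.arange(m0 + 1) / lam)   # first row of R
c = np.concatenate([row, row[-2:0:-1]])               # first row of Rext
lam_ext = np.fft.fft(c).real                          # eigenvalues of Rext
assert lam_ext.min() >= 0.0          # the embedding is p.d. in this example

n = c.size
rng = np.random.default_rng(2)
xi = rng.standard_normal(n) + 1j * rng.standard_normal(n)  # complex normals
W = np.fft.fft(np.sqrt(lam_ext) * xi) / np.sqrt(n)
Z = W.real[: m0 + 1]                 # the field at the original m0 + 1 points
# W.imag[: m0 + 1] is a second, independent realisation for free.
```

The whole sampler costs one FFT of length 2m0 per pair of realisations, rather than an O(M³) factorisation.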
35. Fixing the non-p.d. problem
We extend the matrix R before reflection: keeping the same grid spacing h, we cover the unit cube by a larger cube of side ℓ = mh, with m > m0; since h = 1/m0, this gives ℓ = m/m0 > 1.
Again this is not new, but the way we do the extension might be: sometimes the extension is done by “padding by zeros”.
36. Theorem (GKNSS, 2017)
Assume that the covariance function satisfies r(x, y) = ρ(|x − y|), with ρ ∈ L1(R^d) and ρ̂ ∈ L1(R^d) (where ρ̂ is the Fourier transform of ρ), and satisfies also
Σ_{k∈Z^d} |ρ(hk)| < ∞.
Then for ℓ = m/m0 sufficiently large the resulting extended matrix is positive definite.
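The effect of the padding factor ℓ can be explored numerically. A sketch with assumed parameter values; the smooth “Gaussian” covariance is used here because, by the later Matérn remark (more extension as ν grows), it is the case that typically needs the most extension:

```python
import numpy as np

lam, m0 = 1.0, 16                    # assumed correlation length and grid size
h = 1.0 / m0

def rho(t):
    return np.exp(-(t / lam) ** 2)   # "Gaussian" covariance (nu = infinity)

def min_embedding_eig(ell):
    m = ell * m0                     # extend first: grid on [0, ell], same h
    row = rho(h * np.arange(m + 1))  # first row on the enlarged interval
    c = np.concatenate([row, row[-2:0:-1]])   # then reflect to a circulant
    return np.fft.fft(c).real.min()  # smallest circulant eigenvalue

for ell in (1, 2, 4, 8):
    print(ell, min_embedding_eig(ell))   # watch the minimum eigenvalue grow
```

For small padding the minimum eigenvalue is often negative for smooth covariances, while for sufficiently large ℓ it is non-negative, as the theorem predicts.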
39. But how large does ℓ need to be?
Theorem (GKNSS 2017). For the exponential covariance function, and for h → 0, positive definiteness of Rext is guaranteed if
ℓ/λ ≥ C1 + C2 log(λ/h).
So, for fixed λ, ℓ needs to grow like log(1/h) = log(m0).
Remark 1. Note that the condition is easily satisfied when λ is small. That's good news, because that's the hard case!
Remark 2. For the whole Matérn class the result is similar, but more extension is needed as ν increases:
ℓ/λ ≥ C1 + C2 ν^{1/2} log(max(λ/h, ν^{1/2})).
43. The power of the technology
Some numbers to ponder. For d = 3 and m0 = 2^5 = 32:
the number of grid points in the unit cube is 33^3 = 35,937
the covariance matrix R has 35,937 rows and columns, and hence has 33^6 ≈ 1.3 × 10^9 elements
if we can take ℓ = 1 then Rext has 64^6 ≈ 7 × 10^10 elements
if ℓ = m/m0 = 6 then Rext has 6^3 × 64^6 ≈ 10^13 elements, all non-zero
And if m0 = 200, say, then ....
44. There’s also something interesting
There is something interesting about the theory and the experiments:
It is the EASY cases that require a lot of extension (and hence a very
big matrix Rext
).
the difficult cases are those with small correlation length λ or low smoothness ν.
45. There’s also something interesting
There is something interesting about the theory and the experiments:
It is the EASY cases that require a lot of extension (and hence a very
big matrix Rext
).
the difficult cases are those with small correlation length λ or low smoothness ν.
So the circulant embedding technique, while perhaps not so useful for
easy problems, might be very useful for really hard problems.
46. I. Graham, F. Kuo, D. Nuyens, R. Scheichl and I. Sloan, “Analysis of circulant embedding methods for sampling stationary random fields”, in late stage of preparation.