Program on Quasi-Monte Carlo and High-Dimensional Sampling Methods for Applied Mathematics Opening Workshop, Generating Random Fields the Circulant Way - Ian H. Sloan, Sep 1, 2017
The generation of Gaussian random fields over a physical domain is a challenging problem in computational mathematics, especially when the correlation length is short and the field is rough. The traditional approach is to make use of a truncated Karhunen-Loève (KL) expansion, but the generation of even a single realisation of the field may then be effectively beyond reach (especially for 3-dimensional domains) if the need is to obtain an expected L2 error of, say, 5%, because of the potentially very slow convergence of the KL expansion. In this talk, based on joint work with Ivan Graham, Frances Kuo, Dirk Nuyens, and Rob Scheichl, a completely different approach is used, in which the field is initially generated at a regular grid on a 2- or 3-dimensional rectangle that contains the physical domain, and then possibly interpolated to obtain the field at other points. In that case there is no need for any truncation. Rather, the main problem becomes the factorisation of a large dense matrix. For this we use circulant embedding and FFT ideas. Quasi-Monte Carlo integration is then used to evaluate the expected value of some functional of the finite-element solution of an elliptic PDE with a random field as input.
1. On the generation of random fields
Ian Sloan
i.sloan@unsw.edu.au
University of New South Wales, Sydney, Australia
Joint with I Graham & R Scheichl (Bath), F Kuo (UNSW) & D Nuyens (KU Leuven)
SAMSI, September 1, 2017
3. Suppose we want to generate a Gaussian random field over this
L-shaped region with a hole:
The question is: how to generate the random field?
Application: PDE with random field as input.
Then QMC to find expected value of linear functionals
5. Gaussian random fields
Z(x) = Z(x, ω), for x ∈ D ⊂ R^d, is a Gaussian random field if for each x ∈ D, Z(x) is a normally distributed random variable; the field is fully determined by knowing its mean
Z̄(x) := E[Z(x)]
and its covariance function
r(x, y) := E[(Z(x) − Z̄(x))(Z(y) − Z̄(y))].
For simplicity we will consider throughout mean-zero fields, that is
Z̄(x) = 0, x ∈ D  ⟹  r(x, y) = E[Z(x)Z(y)].
9. Examples of 2d covariance functions
r(x, y) = σ² exp(−(|x1 − y1|² + |x2 − y2|²)/λ²)
– very smooth (in 1-d it is σ² exp(−|x − y|²/λ²) – “Gaussian”).
Here σ² is the variance, and λ is the correlation length.
r(x, y) = σ² exp(−√(|x1 − y1|² + |x2 − y2|²)/λ)
– not smooth at x = y (in 1-d it is σ² exp(−|x − y|/λ) – “exponential”).
More general is the Matérn class rν(x, y), ν ∈ [1/2, ∞), which contains the examples above at the two ends of its parameter range: ν = 1/2 gives the exponential case, ν = ∞ gives the Gaussian case.
How to compute realisations of the input field? One way is:
10. Karhunen-Loève expansion
Z(x, ω) = Σ_{j≥1} √μ_j Y_j(ω) φ_j(x),
where the Y_j are independent standard normal random numbers, and (μ_j, φ_j) satisfy
∫_D r(x, x′) φ_j(x′) dx′ = μ_j φ_j(x),  ∫_D φ_i(x) φ_j(x) dx = δ_ij.
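The truncated expansion is easy to prototype. A minimal sketch, not from the talk: all parameter values are assumed, the covariance is the 1-d exponential one on [0, 1], and the integral eigenproblem is discretised by the rectangle rule (a simple Nyström method).

```python
import numpy as np

sigma2, lam = 1.0, 0.5               # assumed variance and correlation length
n, n_terms = 200, 50                 # grid size and KL truncation level
x = np.linspace(0.0, 1.0, n)
R = sigma2 * np.exp(-np.abs(x[:, None] - x[None, :]) / lam)

w = 1.0 / n                          # rectangle-rule quadrature weight
mu, phi = np.linalg.eigh(w * R)      # discretised integral eigenproblem
mu, phi = mu[::-1], phi[:, ::-1]     # sort eigenvalues in decreasing order
phi = phi / np.sqrt(w)               # normalise: w * sum(phi_j**2) == 1

rng = np.random.default_rng(0)
Y = rng.standard_normal(n_terms)     # iid standard normals Y_j
Z = phi[:, :n_terms] @ (np.sqrt(mu[:n_terms]) * Y)   # truncated KL realisation
```

The slow decay of the μ_j for rough fields is exactly what makes the truncation level n_terms so costly in the hard cases discussed next.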
11. Why does it work?
Because if
Z(x) = Σ_{j≥1} √μ_j Y_j(ω) φ_j(x),
then formally
E[Z(x)Z(x′)] = E[(Σ_{j≥1} √μ_j Y_j φ_j(x))(Σ_{k≥1} √μ_k Y_k φ_k(x′))]
= Σ_{j≥1} Σ_{k≥1} √μ_j √μ_k φ_j(x) φ_k(x′) E[Y_j Y_k]
= Σ_{j≥1} Σ_{k≥1} √μ_j √μ_k φ_j(x) φ_k(x′) δ_{j,k}
= Σ_{j≥1} μ_j φ_j(x) φ_j(x′) = r(x, x′)
by Mercer's theorem.
12. There’s a problem if KL convergence is slow
The KL convergence can be very slow if
the field is rough (that is, if ν is small),
or if the correlation length λ is small,
or the variance σ2
is large.
13. There’s a problem if KL convergence is slow
The KL convergence can be very slow if
the field is rough (that is, if ν is small),
or if the correlation length λ is small,
or the variance σ2
is large.
And if the convergence is slow for a 1-dimensional physical domain D
then it is MUCH slower for a 2-dimensional domain, and worse still for
a 3-dimensional domain D.
This means that the truncation errors can be VERY large.
14. And another thing:
Also, the eigenvalue problem becomes non-trivial for a 3-dimensional domain D if thousands of eigenvalue and eigenfunction pairs are needed. Recall
∫_D r(x, x′) φ_j(x′) dx′ = μ_j φ_j(x),  ∫_D φ_i(x) φ_j(x) dx = δ_ij.
15. The discrete alternative
If we are going to use, for example, piecewise linear finite elements, then we don't need the field everywhere: we only need it at points related to the finite element grid – for example, at the triangle centroids, as shown in the figure. This is “the discrete alternative”.
As input we now need r(x, x′) only at the discrete points x, x′ – we need only “standard information”, in the language of IBC (information-based complexity).
18. How to generate the field at grid points?
Suppose we want the field only at a set of points x1, . . . , xM ∈ D. Now the field is a vector of length M:
Z(ω) := (Z(x1, ω), . . . , Z(xM, ω))⊤.
This is a Gaussian random vector with mean zero and a positive definite covariance matrix
R = [R_ij]_{i,j=1}^M, where R_ij = E[Z(xi)Z(xj)] = r(xi, xj).
So if r(x, y) is known, then so is the covariance matrix.
20. How to generate a random field with prescribed covariance matrix R?
Suppose we can factorise the matrix in some way:
R = BB⊤.
Because R is positive definite, we can, for example, take B to be the square root of R. Also Cholesky, ....
Once B is known we can generate the field by
Z(ω) = BY(ω), where Y(ω) = (Y1(ω), . . . , YM(ω))⊤
and Y1(ω), . . . , YM(ω) are iid standard normal variables.
21. Why does it work?
Simply note that, because Z(ω) = BY(ω), with Y a vector of iid standard normal random variables, we have
E[ZZ⊤] = E[BYY⊤B⊤] = B E[YY⊤] B⊤ = BB⊤ = R.
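This factorisation approach can be sketched in a few lines; an illustrative example only, with the grid, the exponential covariance, and all parameter values assumed, and Cholesky chosen for B:

```python
import numpy as np

sigma2, lam = 1.0, 0.2               # assumed variance and correlation length
x = np.linspace(0.0, 1.0, 100)       # M = 100 grid points in [0, 1]
R = sigma2 * np.exp(-np.abs(x[:, None] - x[None, :]) / lam)

B = np.linalg.cholesky(R)            # factorise R = B B^T (O(M^3) work)
rng = np.random.default_rng(1)
Y = rng.standard_normal(x.size)      # iid standard normal vector Y
Z = B @ Y                            # one realisation with covariance R
```

The O(M³) cost of the Cholesky step is precisely the bottleneck the next slides address.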
23. Now there is no truncation error!
Now there is no truncation error, because all we need is to factorise the covariance matrix R. The problem has turned into one of linear algebra.
But now suppose that M is very large, say in the tens of thousands, or even millions. The matrix R is typically dense, so a Cholesky factorisation will take of the order of M³ operations. This is not generally feasible when M is large.
24. Let’s specialise the covariance function
In practice most covariance functions have this form:
r(x, y) = ρ( x − y ),
That is, the covariance functiion is stationary and isotropic.
In this situation there is great benefit is taking the grid to be UNIFORM.
25. The benefits of uniformity
When the covariance function is isotropic there are great benefits to computing the field on a UNIFORM grid, because then the matrix is typically block Toeplitz. This is the path we follow: we initially compute the field only at the uniform grid points (the red crosses in the figure). After that we use bilinear interpolation to find the field at the other points (the blue points). (The resulting error is of the same order as the finite-element error.)
26. The uniform grid
We cover the original domain D by the unit cube in R^d, and on it define a uniform grid with m0 + 1 points on each edge. So the spacing is h = 1/m0, and in total there are M = (m0 + 1)^d points in the grid.
27. The 1-dimensional case
For a 1-dimensional domain D covered by the unit interval, with a uniform grid of spacing h on it, the first row of the covariance matrix R = (ρ(|xi − xj|))_{i,j=0}^{m0} is
ρ(0), ρ(h), ρ(2h), . . . , ρ(m0h),
and the second row is
ρ(h), ρ(0), ρ(h), . . . , ρ((m0 − 1)h),
etc. This is a Toeplitz matrix.
29. Extending the matrix
It can be made into a circulant matrix Rext of almost double the number of rows and columns (i.e. 2m0) by reflecting the top row, to obtain, in the 1-dimensional case, for the first row
ρ(0), ρ(h), ρ(2h), . . . , ρ((m0 − 1)h), ρ(m0h), ρ((m0 − 1)h), . . . , ρ(h),
and then, by “wrapping it around”, the second row becomes
ρ(h), ρ(0), ρ(h), . . . , ρ((m0 − 2)h), ρ((m0 − 1)h), ρ(m0h), . . . , ρ(2h),
etc – a CIRCULANT matrix.
The point is that a circulant matrix of size M × M can be factorised by FFT in a time of order M log M.
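The reflection and the one-FFT diagonalisation can be sketched as follows (an assumed 1-d example with the exponential covariance; the full circulant matrix is built explicitly only to check the eigenvalue claim, never in a real computation):

```python
import numpy as np

sigma2, lam, m0 = 1.0, 0.5, 64       # assumed parameter values
h = 1.0 / m0

def rho(t):
    return sigma2 * np.exp(-np.abs(t) / lam)   # exponential covariance

row = rho(h * np.arange(m0 + 1))     # first row of the Toeplitz matrix R
c = np.concatenate([row, row[-2:0:-1]])        # reflect: first row of Rext
eigs = np.fft.fft(c).real            # circulant eigenvalues, one FFT

# Built only to verify: row i of a circulant is the first row rolled by i.
Rext = np.array([np.roll(c, i) for i in range(c.size)])
```

Note that R sits in the top-left (m0 + 1) × (m0 + 1) corner of Rext, which is what makes the embedding useful.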
31. More precisely, write
Rext = XΛX⊤,
where Λ is the diagonal matrix of eigenvalues of Rext, and the rows of X are the normalised eigenvectors (which are just complex exponentials).
Note that the eigenvalues of Rext are real. And we use an all-real version of the FFT, which makes for efficient implementation.
If D is a 2- or 3-dimensional region then the matrix is BLOCK circulant, and again the FFT can be used.
32. History
In the 1-dimensional case there is a substantial literature on circulant
embedding for the efficient generation of Gaussian random fields:
Dietrich and Newsam, 1997
Chan and Wood, 1997
33. What’s the catch?
The catch is that the extended matrix Rext
may not be positive
definite – because some of the eigenvalues of Rext
may be
negative.
34. Before we fix the non-p.d. problem:
Let's assume for the moment that all eigenvalues of Rext are non-negative. Then we can write
Rext = XΛX⊤ = (XΛ^{1/2})(XΛ^{1/2})⊤.
How does this help with factorisation of R? Answer: R is a submatrix of Rext. By selecting the appropriate rows and columns of the factorisation above we obtain
R = BB⊤,
with B consisting of just the appropriate rows of XΛ^{1/2}.
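Putting the pieces together, a hedged sketch of the resulting sampler (an assumed 1-d exponential example; the complex-normal trick, which yields two independent real realisations per FFT, follows the Dietrich-Newsam / Chan-Wood literature cited later):

```python
import numpy as np

sigma2, lam, m0 = 1.0, 0.5, 64       # assumed parameter values
h = 1.0 / m0
row = sigma2 * np.exp(-h * np.arange(m0 + 1) / lam)   # first row of R
c = np.concatenate([row, row[-2:0:-1]])               # first row of Rext
lam_ext = np.fft.fft(c).real                          # eigenvalues of Rext
assert lam_ext.min() >= 0.0          # the embedding is p.d. in this example

n = c.size
rng = np.random.default_rng(2)
xi = rng.standard_normal(n) + 1j * rng.standard_normal(n)  # complex normals
W = np.fft.fft(np.sqrt(lam_ext) * xi) / np.sqrt(n)
Z = W.real[: m0 + 1]                 # the field at the original m0 + 1 points
# W.imag[: m0 + 1] is a second, independent realisation for free.
```

The whole sampler costs one FFT of length 2m0 per pair of realisations, rather than an O(M³) factorisation.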
35. Fixing the non-p.d. problem
We extend the matrix R before reflection: keeping the same grid spacing h, we cover the unit cube by a larger cube of side ℓ = mh, with m > m0; since h = 1/m0, this gives ℓ = m/m0 > 1.
Again this is not new, but the way we do the extension might be: sometimes the extension is done by “padding by zeros”.
36. Theorem (GKNSS, 2017)
Assume that the covariance function satisfies r(x, y) = ρ(|x − y|), with ρ ∈ L1(R^d) and ρ̂ ∈ L1(R^d) (where ρ̂ is the Fourier transform of ρ), and satisfies also
Σ_{k∈Z^d} |ρ(hk)| < ∞.
Then for ℓ = m/m0 sufficiently large the resulting extended matrix is positive definite.
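The effect of the padding factor ℓ can be explored numerically. A sketch with assumed parameter values; the smooth “Gaussian” covariance is used here because, by the later Matérn remark (more extension as ν grows), it is the case that typically needs the most extension:

```python
import numpy as np

lam, m0 = 1.0, 16                    # assumed correlation length and grid size
h = 1.0 / m0

def rho(t):
    return np.exp(-(t / lam) ** 2)   # "Gaussian" covariance (nu = infinity)

def min_embedding_eig(ell):
    m = ell * m0                     # extend first: grid on [0, ell], same h
    row = rho(h * np.arange(m + 1))  # first row on the enlarged interval
    c = np.concatenate([row, row[-2:0:-1]])   # then reflect to a circulant
    return np.fft.fft(c).real.min()  # smallest circulant eigenvalue

for ell in (1, 2, 4, 8):
    print(ell, min_embedding_eig(ell))   # watch the minimum eigenvalue grow
```

For small padding the minimum eigenvalue is often negative for smooth covariances, while for sufficiently large ℓ it is non-negative, as the theorem predicts.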
39. But how large does ℓ need to be?
Theorem (GKNSS 2017). For the exponential covariance function, and for h → 0, positive definiteness of Rext is guaranteed if
ℓ/λ ≥ C1 + C2 log(λ/h).
So, for fixed λ, ℓ needs to grow like log(1/h) = log(m0).
Remark 1. Note that the condition is easily satisfied when λ is small. That's good news, because that's the hard case!
Remark 2. For the whole Matérn class the result is similar, but more extension is needed as ν increases:
ℓ/λ ≥ C1 + C2 ν^{1/2} log(max(λ/h, ν^{1/2})).
43. The power of the technology
Some numbers to ponder. For d = 3 and m0 = 2^5 = 32:
the number of grid points in the unit cube is 33^3 = 35,937
the covariance matrix R has 35,937 rows and columns, and hence has 33^6 ≈ 1.3 × 10^9 elements
if we can take ℓ = 1 then Rext has 64^6 ≈ 7 × 10^10 elements
if ℓ = m/m0 = 6 then Rext has 6^3 × 64^6 ≈ 10^13 elements, all non-zero
And if m0 = 200, say, then ....
44. There’s also something interesting
There is something interesting about the theory and the experiments:
It is the EASY cases that require a lot of extension (and hence a very
big matrix Rext
).
the difficult cases are those with small correlation length λ or low smoothness ν.
45. There’s also something interesting
There is something interesting about the theory and the experiments:
It is the EASY cases that require a lot of extension (and hence a very
big matrix Rext
).
the difficult cases are those with small correlation length λ or low smoothness ν.
So the circulant embedding technique, while perhaps not so useful for
easy problems, might be very useful for really hard problems.
46. I. Graham, F. Kuo, D. Nuyens, R. Scheichl and I. Sloan, “Analysis of circulant embedding methods for sampling stationary random fields”, in late stage of preparation.