A short introduction to H-matrices, given for my colleagues
Introduction to the hierarchical matrix technique
Alexander Litvinenko
KAUST, SRI-UQ Center
http://sri-uq.kaust.edu.sa/
www.hlib.org www.hlibpro.com
August 16, 2015
Content
1. Motivation
2. Low-rank matrices
3. Cluster Tree, Block Cluster Tree and Admissibility condition
4. 1D BEM example
5. Hierarchical matrices: cost and storage
6. Two applications
Sources: the H-matrices Winter School script (www.hlib.org), the PhD thesis
of Ronald Kriemann, and preprints from www.mis.mpg.de
Motivation: Iterative and direct solvers
Ax = b
Iterative methods: Jacobi, Gauss-Seidel, SOR, ...
Direct solvers: Gaussian elimination, domain decomposition, LU, ...
The cost of computing A^{-1} is O(n^3); the number of iterations is proportional to cond(A).
If A is structured (diagonal, Toeplitz, circulant), one can apply e.g. the FFT; but what if it is not?
What if you need not only x = A^{-1}b, but f(A) (e.g. A^{-1}, exp(A), sin(A), sign(A), ...)?
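For the structured case the speedup is easy to see in practice; a minimal sketch for a circulant system solved via the FFT (the matrix and right-hand side are illustrative random data, assuming NumPy/SciPy):

```python
import numpy as np
from scipy.linalg import circulant

# A circulant matrix is diagonalised by the DFT, so Ax = b reduces to
# elementwise division in Fourier space: O(n log n) instead of O(n^3).
rng = np.random.default_rng(0)
n = 256
c = rng.standard_normal(n)
c[0] += n                      # diagonal dominance -> A is invertible
A = circulant(c)               # A[i, j] = c[(i - j) mod n]
b = rng.standard_normal(n)

# A x is the circular convolution c (*) x, so fft(c) * fft(x) = fft(b)
x = np.real(np.fft.ifft(np.fft.fft(b) / np.fft.fft(c)))

residual = np.linalg.norm(A @ x - b) / np.linalg.norm(b)
```

For unstructured dense A no such trick applies, which is where H-matrices come in.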
Rank-k matrices
M ∈ R^{n×m}; approximate the SVD factors by Ũ ∈ R^{n×k} (U ≈ Ũ) and Ṽ ∈ R^{m×k} (V ≈ Ṽ), with k ≪ min(n, m). The storage for M̃ = Ũ Σ̃ Ṽ^T is k(n + m) instead of n · m for M represented in the full matrix format.

Figure: Reduced SVD M̃ = Ũ Σ̃ Ṽ^T; only the k largest singular values are kept.
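The truncation above can be sketched in a few lines; a minimal example (the kernel sampled below is an illustrative choice of a numerically low-rank matrix):

```python
import numpy as np

# Rank-k approximation via the reduced SVD: keep only the k largest
# singular values; storage drops from n*m to k*(n + m) numbers.
n, m, k = 200, 150, 10
x = np.linspace(0.0, 1.0, n)
y = np.linspace(2.0, 3.0, m)
M = 1.0 / (x[:, None] + y[None, :])   # smooth kernel -> fast singular value decay

U, s, Vt = np.linalg.svd(M, full_matrices=False)
Mk = (U[:, :k] * s[:k]) @ Vt[:k, :]   # rank-k truncation

# by Eckart-Young, the spectral-norm error of the truncation equals s[k]
rel_err = np.linalg.norm(M - Mk, 2) / s[0]
```

The relative error here is tiny because the kernel is smooth and the variables are well separated, exactly the situation the admissibility condition below is designed to detect.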
H-matrices (Hackbusch ’99)
1. Build cluster tree TI and block cluster tree TI×I .
Figure: Recursive partition of the index set: I = I1 ∪ I2, I1 = I11 ∪ I12, I2 = I21 ∪ I22, and the corresponding block partition of I × I.
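The recursive splitting of the index set can be sketched as follows (a minimal sketch with hypothetical helper names, assuming the indices are already sorted by coordinate):

```python
# A minimal cluster tree by recursive bisection of an index set.
# Each node stores its index list; leaves hold at most n_min indices.
def build_cluster_tree(indices, n_min=2):
    node = {"indices": indices, "children": []}
    if len(indices) > n_min:
        mid = len(indices) // 2
        node["children"] = [
            build_cluster_tree(indices[:mid], n_min),
            build_cluster_tree(indices[mid:], n_min),
        ]
    return node

def leaves(node):
    if not node["children"]:
        return [node["indices"]]
    return [l for c in node["children"] for l in leaves(c)]

tree = build_cluster_tree(list(range(8)), n_min=2)
print(leaves(tree))  # -> [[0, 1], [2, 3], [4, 5], [6, 7]]
```

The block cluster tree TI×I is then built by pairing nodes of TI with nodes of TI, subdividing each pair until it is admissible or small.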
Admissibility condition
2. For each block (t × s) ∈ TI×I, t, s ∈ TI, check the admissibility condition

min{diam(Qt), diam(Qs)} ≤ η · dist(Qt, Qs).

If the condition holds, M|t×s is approximated by a rank-k matrix block; otherwise M|t×s is subdivided further or, if it is small enough, defined as a dense matrix block.
Figure: Bounding boxes Qt and Qs of the clusters t and s and the distance dist(Qt, Qs).
Summary: grid → cluster tree (TI) + admissibility condition → block cluster tree (TI×I) → H-matrix → H-matrix arithmetic.
Figure: Resulting H-matrix block structure; the numbers indicate the block ranks.
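The admissibility test itself is cheap to implement for axis-aligned bounding boxes; a minimal sketch (the boxes below are illustrative):

```python
import numpy as np

# Admissibility test for axis-aligned bounding boxes, as above:
# min{diam(Qt), diam(Qs)} <= eta * dist(Qt, Qs).
def diam(box):
    lo, hi = box
    return float(np.linalg.norm(hi - lo))

def dist(box_a, box_b):
    # componentwise gap between the boxes; zero where they overlap
    (alo, ahi), (blo, bhi) = box_a, box_b
    gap = np.maximum(0.0, np.maximum(alo - bhi, blo - ahi))
    return float(np.linalg.norm(gap))

def admissible(qt, qs, eta=1.0):
    return min(diam(qt), diam(qs)) <= eta * dist(qt, qs)

qt = (np.array([0.0, 0.0]), np.array([0.25, 0.25]))
qs = (np.array([0.75, 0.75]), np.array([1.0, 1.0]))
q_near = (np.array([0.25, 0.0]), np.array([0.5, 0.25]))
print(admissible(qt, qs), admissible(qt, q_near))  # -> True False
```

Touching clusters have dist = 0 and therefore always fail the test, which is why the blocks near the diagonal stay dense.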
Where does the admissibility condition come from?
Let B1, B2 ⊂ R^d be compact sets, and let χ(x, y) be defined for (x, y) ∈ B1 × B2 with x ≠ y. Let K be an integral operator with an asymptotically smooth kernel χ on the domain B1 × B2:

(Kv)(x) = ∫_{B2} χ(x, y) v(y) dy,   x ∈ B1.
Suppose that χ^(k)(x, y) approximates χ in B1 × B2 in the separable form

χ^(k)(x, y) = Σ_{ν=1}^{k} φν^(k)(x) ψν^(k)(y),

where k is the separation rank. Then

‖χ − χ^(k)‖_{∞,B1×B2} ≤ c1 · ( c2 · min{diam(B1), diam(B2)} / dist(B1, B2) )^k.

The bound decays exponentially in k precisely when min{diam(B1), diam(B2)} ≤ η · dist(B1, B2) with η < 1/c2, which is the admissibility condition.
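The exponential decay in k can be checked numerically, e.g. for χ(x, y) = log|x − y| on two separated intervals, with Chebyshev interpolation in x playing the role of the separable approximation (a minimal sketch; the intervals and grid sizes are illustrative choices):

```python
import numpy as np
from scipy.interpolate import BarycentricInterpolator

# Separable approximation of log|x - y| on B1 x B2 by interpolating in x
# at k Chebyshev nodes: chi_k(x, y) = sum_v L_v(x) * chi(x_v, y).
a1, b1 = 0.0, 0.3            # B1
a2, b2 = 0.7, 1.0            # B2 (dist = 0.4 > diam = 0.3: admissible)
xs = np.linspace(a1, b1, 80)
ys = np.linspace(a2, b2, 80)

def max_error(k):
    theta = (2 * np.arange(k) + 1) * np.pi / (2 * k)
    nodes = 0.5 * (a1 + b1) + 0.5 * (b1 - a1) * np.cos(theta)
    vals = np.log(np.abs(nodes[:, None] - ys[None, :]))   # chi(x_v, y)
    interp = BarycentricInterpolator(nodes, vals)
    approx = interp(xs)                                   # shape (80, 80)
    exact = np.log(np.abs(xs[:, None] - ys[None, :]))
    return np.abs(approx - exact).max()

errors = [max_error(k) for k in (2, 4, 8)]
# the errors decay roughly geometrically as k grows
```

On a non-admissible pair of intervals (touching or overlapping) the same experiment shows no decay, since the kernel is singular on the diagonal x = y.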
H-Matrix Approximation of BEM Matrix
Consider the integral equation

∫_0^1 log|x − y| U(y) dy = F(x),   x ∈ (0, 1).

After discretisation by Galerkin's method we obtain

∫_0^1 ∫_0^1 φi(x) log|x − y| U(y) dy dx = ∫_0^1 φi(x) F(x) dx,   0 ≤ i < n,

in the space Vn := span{φ0, ..., φn−1}, where the φi, 0 ≤ i < n, are basis functions in BEM. The discrete solution Un ∈ Vn is Un := Σ_{j=0}^{n−1} uj φj, with the uj being the solution of the linear system
Gu = f,   Gij := ∫_0^1 ∫_0^1 φi(x) log|x − y| φj(y) dy dx,   fi := ∫_0^1 φi(x) F(x) dx.   (1)
We replace the kernel function g(x, y) = log|x − y| by a degenerate kernel

g̃(x, y) = Σ_{ν=0}^{k−1} gν(x) hν(y).   (2)
Substituting the degenerate kernel g̃(x, y) for g(x, y) = log|x − y| in (1) gives

G̃ij := ∫_0^1 ∫_0^1 φi(x) Σ_{ν=0}^{k−1} gν(x) hν(y) φj(y) dy dx.

After a simple rearrangement,

G̃ij := Σ_{ν=0}^{k−1} ( ∫_0^1 φi(x) gν(x) dx ) ( ∫_0^1 hν(y) φj(y) dy ).
Now all admissible blocks G|t×s can be represented in the form

G|t×s = A B^T,   A ∈ R^{|t|×k},   B ∈ R^{|s|×k},

where the entries of the factors A and B are

Aiν := ∫_0^1 φi(x) gν(x) dx,   Bjν := ∫_0^1 φj(y) hν(y) dy.

Since the basis functions have local support, the entries of the inadmissible blocks reduce to

G̃ij := ∫_{i/n}^{(i+1)/n} ∫_{j/n}^{(j+1)/n} log|x − y| dy dx.
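That an admissible block of G really is numerically low-rank can be verified directly; a minimal sketch with piecewise-constant basis functions and one-point (midpoint) quadrature for the entries (n, the cluster choice and the quadrature are illustrative simplifications):

```python
import numpy as np

# Galerkin matrix entries for the 1D log kernel with piecewise-constant
# basis functions, approximated by midpoint quadrature:
# G_ij ~ h^2 * log|x_i - x_j| for i != j.
n = 128
h = 1.0 / n
x = (np.arange(n) + 0.5) * h              # panel midpoints
t = np.arange(0, n // 4)                  # cluster t: first quarter of (0,1)
s = np.arange(3 * n // 4, n)              # cluster s: last quarter (well separated)
block = h * h * np.log(np.abs(x[t][:, None] - x[s][None, :]))

# the clusters are well separated, so the singular values decay fast
sv = np.linalg.svd(block, compute_uv=False)
decay = sv[8] / sv[0]
```

Replacing such a 32 × 32 block by a rank-8 factorisation A B^T loses almost nothing, while near-diagonal (inadmissible) blocks show no such decay.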
Storage and complexity (single proc. and p-proc. on shared mem.)
Let H(TI×J, k) := {M ∈ R^{I×J} | rank(M|t×s) ≤ k for all admissible leaves t × s of TI×J}, n := max(|I|, |J|, |K|).

Operation      | Sequential Compl.      | Parallel Complexity (R. Kriemann 2005)
building(M)    | N = O(n log n)         | N/p + O(|V(T)\L(T)|)
storage(M)     | N = O(k n log n)       | N/p
M·x            | N = O(k n log n)       | N/p + n/√p
αM ⊕ βM        | N = O(k^2 n log n)     | N/p
αM·M ⊕ βM      | N = O(k^2 n log^2 n)   | N/p + O(Csp(T)·|V(T)|)
M^{-1}         | N = O(k^2 n log^2 n)   | N/p + O(n·n_min^2)
LU             | N = O(k^2 n log^2 n)   |
H-LU           | N = O(k^2 n log^2 n)   | N/p + O(k^2 n log^2 n / n^{1/d})
H-matrix approximation of covariance matrices
H-matrices can be applied to compute the Karhunen-Loève expansion efficiently (which requires solving a large eigenvalue problem) and to generate large random fields (via a Cholesky factorisation and matrix-vector products).
H-matrices
Dependence of the computational time and storage requirements of CH on the rank k, n = 322.

k  | time (sec.) | memory (bytes) | ‖C − CH‖_2 / ‖C‖_2
2  | 0.04        | 2.0e+6         | 3.5e−5
6  | 0.1         | 4.0e+6         | 1.4e−5
9  | 0.14        | 5.4e+6         | 1.4e−5
12 | 0.17        | 6.8e+6         | 3.1e−7
17 | 0.23        | 9.3e+6         | 6.3e−8

The time for the dense matrix C is 3.3 sec. and the storage 1.4e+8 bytes.
Examples of covariance functions
A random field model requires specifying its spatial correlation structure:
covf (x, y) = E[(f (x, ·) − µf (x))(f (y, ·) − µf (y))],
where E is the expectation and µf (x) := E[f (x, ·)].
Let h := ( Σ_{i=1}^{3} h_i^2 / ℓ_i^2 )^{1/2}, where h_i := x_i − y_i and the ℓ_i are the covariance lengths.
Gaussian:    cov(h) = σ^2 · exp(−h^2),
exponential: cov(h) = σ^2 · exp(−h),
spherical:   cov(h) = σ^2 · ( 1 − (3/2)(h/hr) + (1/2)(h^3/hr^3) ) for 0 ≤ h ≤ hr, and cov(h) = 0 for h > hr.
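The three models above in code, as a minimal sketch (the parameter names sigma2 and hr mirror σ^2 and hr from the formulas):

```python
import numpy as np

# The three covariance models as functions of the scaled distance h
# (sigma2 = variance, hr = range of the spherical model).
def cov_gaussian(h, sigma2=1.0):
    return sigma2 * np.exp(-h**2)

def cov_exponential(h, sigma2=1.0):
    return sigma2 * np.exp(-h)

def cov_spherical(h, sigma2=1.0, hr=1.0):
    h = np.asarray(h, dtype=float)
    c = sigma2 * (1.0 - 1.5 * h / hr + 0.5 * (h / hr) ** 3)
    return np.where(h <= hr, c, 0.0)

# all models equal sigma2 at h = 0; the spherical model hits 0 at h = hr
print(cov_spherical(0.0), cov_spherical(1.0))  # -> 1.0 0.0
```

Note the different smoothness at h = 0: the Gaussian model is smooth, the exponential model has a kink, which strongly affects the H-matrix ranks needed for a given accuracy.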
Other Applications
1. The matrix exponential allows us to solve ODEs:

ẋ(t) = Ax(t), x(0) = x0  →  x(t) = exp(tA) x0.

2. Other matrix functions: use the representation by the Cauchy integral

f(A) = (1/2πi) ∫_Γ f(t) (A − tI)^{−1} dt

and an exponentially convergent quadrature rule [Hackbusch, TR 4/2005, MPI]

f(A) ≈ Σ_{j=1}^{k} wj f(tj) (A − tj I)^{−1}

to obtain an approximation.
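A minimal sketch of such a resolvent quadrature, using the trapezoidal rule on a circular contour around the spectrum and the (tI − A)^{−1} sign convention (the choice f = exp, the node count and the radius are illustrative; this is not Hackbusch's specific rule, which is tailored to unbounded spectra):

```python
import numpy as np

# Approximate f(A) = (1/(2*pi*i)) \oint f(t) (tI - A)^{-1} dt by the
# trapezoidal rule on a circle enclosing the spectrum of A; for entire
# functions f this converges exponentially in the number of nodes.
rng = np.random.default_rng(2)
A = rng.standard_normal((6, 6))
A = 0.5 * (A + A.T)                      # symmetric: real spectrum
w, V = np.linalg.eigh(A)

radius = np.abs(w).max() + 1.0           # circle encloses the spectrum
N = 128
t = radius * np.exp(2j * np.pi * np.arange(N) / N)

# with t = r*e^{i*theta}, dt = i*t*dtheta, the rule becomes
# f(A) ~ (1/N) * sum_j f(t_j) * (t_j I - A)^{-1} * t_j
I = np.eye(6)
F = sum(np.exp(tj) * np.linalg.solve(tj * I - A, I) * tj for tj in t) / N
expA = F.real

ref = (V * np.exp(w)) @ V.T              # exp(A) via eigendecomposition
err = np.linalg.norm(expA - ref) / np.linalg.norm(ref)
```

In the H-matrix setting each resolvent (tj I − A)^{−1} is itself computed in H-arithmetic, so f(A) costs k H-inversions.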
Matrix equations
(Arises in the context of optimal control problems)
(Lyapunov)  AX + XA^T + C = 0,   (3)
(Sylvester) AX − XB + C = 0,   (4)
(Riccati)   AX + XA^T − XFX + C = 0,   (5)

where A, B, C, F are given matrices and X is the unknown solution matrix.
Solution of the Sylvester equation
Theorem
Let A ∈ R^{n×n}, B ∈ R^{m×m}, and let σ(A) < σ(B). Then

X := ∫_0^∞ exp(tA) C exp(−tB) dt   (6)

solves the Sylvester equation (4). With suitable quadrature formulas [Hackbusch, TR 4/2005, MPI],

X_k := Σ_{j=0}^{k} wj exp(tj A) C exp(−tj B)   (7)

satisfies ‖X − X_k‖ ≤ C_{A,B} ‖C‖_2 exp(−√k).
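Formula (6) can be verified on a small example by truncating the integral and using a plain composite trapezoidal rule in place of the specialised quadrature (the matrices, cutoff T and step count are illustrative; scipy's dense Sylvester solver serves as the reference):

```python
import numpy as np
from scipy.linalg import expm, solve_sylvester

# X = int_0^inf exp(tA) C exp(-tB) dt solves A X - X B + C = 0 when
# the spectra are separated (here Re sigma(A) < 0 < Re sigma(B)).
rng = np.random.default_rng(3)
n, m = 5, 4
A = -np.eye(n) + 0.1 * rng.standard_normal((n, n))   # spectrum near -1
B = np.eye(m) + 0.1 * rng.standard_normal((m, m))    # spectrum near +1
C = rng.standard_normal((n, m))

# truncated integral with a composite trapezoidal rule
T, steps = 30.0, 3000
ts = np.linspace(0.0, T, steps + 1)
wts = np.full(steps + 1, T / steps)
wts[0] = wts[-1] = 0.5 * T / steps
X = sum(wi * expm(ti * A) @ C @ expm(-ti * B) for wi, ti in zip(wts, ts))

# reference: scipy solves A X + X (-B) = -C, i.e. the same equation (4)
X_ref = solve_sylvester(A, -B, -C)
rel_err = np.linalg.norm(X - X_ref) / np.linalg.norm(X_ref)
```

The point of (7) is that only k + 1 terms are needed for exp(−√k) accuracy, and each term involves matrix exponentials that admit H-matrix approximations.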
Estimation of H-matrix rank of Xk
Lemma
If C ∈ H(T, kC), exp(tj A) ≈ Aj ∈ H(T, kA) and exp(−tj B) ≈ Bj ∈ H(T, kB) for all tj (see [Gavrilyuk et al. 2002; Grasedyck et al. 2003]), then

X_k ∈ H(T, kX),   kX = O(k p^2 max{kA, kB, kC}).   (8)

Numerically, X_k can be found by a) the matrix exponential, b) multigrid, c) the ADI method, or d) the matrix sign function (see the H-matrix lecture notes, Ch. 10).
Future work
1. Parallel H-matrix implementation on multicore systems and
on GPUs
2. H-matrix approximation of large covariance matrices (Matérn)
3. Determinant and trace of an H-matrix
The Kullback-Leibler divergence (KLD) D_KL(P‖Q) is a measure of the information lost when the distribution Q is used to approximate P:

D_KL(P‖Q) = ∫_{−∞}^{∞} p(x) ln( p(x) / q(x) ) dx,

where p, q are the densities of P and Q. For multivariate normal distributions N0(µ0, Σ0) and N1(µ1, Σ1) of dimension k,

2 D_KL(N0‖N1) = tr(Σ1^{−1} Σ0) + (µ1 − µ0)^T Σ1^{−1} (µ1 − µ0) − k − ln( det Σ0 / det Σ1 ).
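The closed-form Gaussian KLD above in code, as a minimal dense sketch (in the H-matrix setting the trace and determinant terms are exactly what the planned H-arithmetic would provide cheaply):

```python
import numpy as np

# Closed-form KLD between two multivariate normals, following the
# formula above (k = dimension).
def kld_gauss(mu0, S0, mu1, S1):
    k = len(mu0)
    S1inv = np.linalg.inv(S1)
    d = mu1 - mu0
    tr = np.trace(S1inv @ S0)
    quad = d @ S1inv @ d
    logdet = np.log(np.linalg.det(S0) / np.linalg.det(S1))
    return 0.5 * (tr + quad - k - logdet)

mu = np.zeros(3)
S = np.eye(3)
print(kld_gauss(mu, S, mu, S))            # identical distributions -> 0.0
print(kld_gauss(mu, 2 * S, mu, S) > 0.0)  # -> True
```

For large k one would replace inv, trace and det by their H-matrix counterparts (H-Cholesky gives the log-determinant from the diagonal factors).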
Conclusion
+ Complexity and storage are O(k^r n log^q n), r = 1, 2, 3, q = 1, 2
+ Allows computing f(A) efficiently for some classes of functions f
+ Many examples: FEM 1D, 2D and 3D; BEM 1D, 2D and 3D; Lyapunov, Sylvester and Riccati matrix equations
+ Well suited as a preconditioner for iterative methods
+ There are sequential (www.hlib.org) and parallel (www.hlibpro.com) libraries
+ There are A LOT of implementation details!
− The implementation is not so easy
− The constants can be large in 3D