This document presents a parallel direct method for solving initial value problems for ordinary differential equations. The method approximates the initial value problem using a boundary value method, resulting in a block tridiagonal linear system. The paper describes how to solve this system in parallel using a generalization of cyclic reduction. Numerical tests show the parallel direct method provides good speedups for problems of small dimension compared to an iterative method. The method could provide an alternative approach for solving ODEs in parallel that avoids bottlenecks of iterative methods.
A Parallel Direct Method For Solving Initial Value Problems For Ordinary Differential Equations
Applied Numerical Mathematics 11 (1993) 85-93
North-Holland
APNUM 358
P. Amodio and D. Trigiante
Dipartimento di Matematica, Università di Bari, Via Orebona 4, I-70125 Bari, Italy
Abstract
Amodio, P. and D. Trigiante, A parallel direct method for solving initial value problems for ordinary
differential equations, Applied Numerical Mathematics 11 (1993) 85-93.
The aim of this paper is to solve, by direct methods, the systems arising in the numerical solution of ODEs
with the boundary value techniques [2,3,7]. The resulting block tridiagonal systems are solved by generalizing
known solvers for tridiagonal systems [1]. In particular, a generalization of the parallel cyclic reduction is
considered.
For problems of small dimension, we show that direct methods give good results.
Keywords. Ordinary differential equations; block tridiagonal systems; direct methods; parallel computers.
1. Introduction
The design of good parallel algorithms for the numerical solution of ordinary differential
equations on parallel computers is a difficult problem. A bottleneck is the iterative procedure
which is utilized to determine the solution. An interesting approach to this problem could be
the approximation of the initial value problem with a boundary value method (BVM) [2,3,7].
In this paper, the parallel solution of the following system of differential equations
$$y'(t) = A(t)\,y(t), \quad t > t_0, \qquad y(t_0) = y_0, \tag{1.1}$$
where $A(t)$ is an $m \times m$ matrix, will be obtained by using the BVMs associated with a direct
method for solving the resulting system of equations. For simplicity we shall assume that
$y(t) = 0$ is asymptotically stable.
In Section 2 we recall how to obtain a block tridiagonal linear system starting from problem
(1.1). In Section 3 we briefly discuss the stability properties of the coefficient matrix. In Section
4 a parallel direct method (derived from the cyclic reduction algorithm for a tridiagonal system)
is explained in detail, and in Section 5 some numerical tests are performed in comparison with
the subroutine LSODE.

Correspondence to: P. Amodio, Dipartimento di Matematica, Università di Bari, Via Orebona 4, I-70125 Bari, Italy.
Fax: (+39-80) 242722.
* Work supported by the European Community (P.C.A. contract #4040) and C.N.R. (P.F. “Calcolo Parallelo”,
sottoprogetto 1 and contract of research).
0168-9274/93/$05.00 © 1993 - Elsevier Science Publishers B.V. All rights reserved
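2. Boundary value techniques for solving initial value problems
In [2,3,7] it has been shown that applying boundary value techniques to problem (1.1) leads
to the discrete problem
$$\begin{pmatrix} D_1 & E_1 & & & \\ C_2 & D_2 & E_2 & & \\ & \ddots & \ddots & \ddots & \\ & & C_{n-1} & D_{n-1} & E_{n-1} \\ & & & C_n & D_n \end{pmatrix} \begin{pmatrix} y_1 \\ y_2 \\ \vdots \\ y_{n-1} \\ y_n \end{pmatrix} = \begin{pmatrix} -C_1 y_0 \\ 0 \\ \vdots \\ 0 \\ 0 \end{pmatrix} \tag{2.1}$$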
where $C_i$, $D_i$, and $E_i$ are $m \times m$ blocks which depend on the particular method used; in the
following sections we shall use Simpson’s method defined by (considering constant stepsize $h$)
$$C_i = -I - \tfrac{h}{3} A(t_{i-1}), \qquad D_i = -\tfrac{4h}{3} A(t_i), \qquad E_i = I - \tfrac{h}{3} A(t_{i+1})$$
as a two-step implicit method and the following trapezoidal method as a one-step implicit
method:
$$C_n = -I - \tfrac{h}{2} A(t_{n-1}), \qquad D_n = I - \tfrac{h}{2} A(t_n).$$
The system (2.1) is a block tridiagonal system. In [3] a similar system (derived by discretizing
problem (1.1) with the midpoint method and the implicit Euler method) has been solved by
using iterative methods, and good speed-ups were obtained when the dimension of A(t) is
large. In the case of a lower dimension, the results were not as good.
In this paper we shall treat the system (2.1) by using a parallel direct method which, as we
shall see, gives good speed-ups for small m.
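As a minimal sketch of the discretization described in this section, the following code assembles the block tridiagonal system for a constant matrix A, using Simpson blocks in the first n-1 block rows and the trapezoidal blocks in the last one. The function name `bvm_solve`, the dense storage, and the call to `numpy.linalg.solve` are illustrative assumptions and are not the parallel solver of the paper.

```python
import numpy as np

def bvm_solve(A, y0, h, n):
    """Assemble and solve the block tridiagonal BVM system for constant A.

    Block rows 1..n-1 use Simpson's method, block row n uses the trapezoidal
    method. Returns an (n, m) array whose row i approximates y(t_0 + (i+1) h).
    """
    assert n >= 2, "this sketch assumes at least two steps"
    m = A.shape[0]
    I = np.eye(m)
    C, D, E = -I - h / 3 * A, -4 * h / 3 * A, I - h / 3 * A   # Simpson blocks
    Cn, Dn = -I - h / 2 * A, I - h / 2 * A                    # trapezoidal blocks
    M = np.zeros((n * m, n * m))
    for i in range(n):                      # block row i (0-based)
        last = i == n - 1
        M[i*m:(i+1)*m, i*m:(i+1)*m] = Dn if last else D
        if i > 0:
            M[i*m:(i+1)*m, (i-1)*m:i*m] = Cn if last else C
        if not last:
            M[i*m:(i+1)*m, (i+1)*m:(i+2)*m] = E
    rhs = np.zeros(n * m)
    rhs[:m] = -C @ y0                       # only the first block is nonzero
    # Dense solve for clarity only; it ignores the block tridiagonal structure.
    return np.linalg.solve(M, rhs).reshape(n, m)
```

For instance, `bvm_solve(np.array([[-1.0]]), np.array([1.0]), 0.01, 100)` approximates the solution of y' = -y on 100 equal steps, and its last entry can be compared with exp(-1).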
3. Stability of the linear system
In order to gain a better insight on the well-conditioning, we shall discuss the easier case of
real tridiagonal systems (obtained when A(t) is scalar and real).
In this case it is known [4] that a sufficient condition for the well-conditioning of the
coefficient matrix is the algebraic diagonal dominance by rows and by columns. This means that
we have (see (2.1)):
$$|D_i| > |C_i + E_i|, \tag{3.1a}$$
$$|D_i| > |C_{i+1} + E_{i-1}|, \tag{3.1b}$$
where now Ci, Di, and Ei are scalars.
From the row condition (3.la) we obtain:
$$\tfrac{4}{3} h\,|A(t_i)| > \tfrac{1}{3} h\,|A(t_{i-1}) + A(t_{i+1})| \iff 4\,|A(t_i)| > |A(t_{i-1}) + A(t_{i+1})|.$$
If we suppose that $A(t)$ is smooth in $[t_{i-1}, t_{i+1}]$, then from
$$A(t_{i-1}) + A(t_{i+1}) = 2A(t_i) + h^2 A''(\xi_i), \quad \text{where } \xi_i \in (t_{i-1}, t_{i+1}),$$
it follows that if $h^2 A''(\xi_i)$ is small then the coefficient matrix in (2.1) can always be reduced to
an algebraic diagonally dominant one.
From (3.1b) we obtain
$$\tfrac{4}{3} h\,|A(t_i)| > \tfrac{2}{3} h\,|A(t_i)|,$$
which is always verified when $A(t_i) \neq 0$.
If the dimension of A(t) is greater than 1, then, supposing A(t) = A constant, conditions
(3.1) are verified for the block matrix element by element (the algebraic diagonal dominance
condition is verified on the elements in the same position in the blocks). This is equivalent to a
sort of block algebraic diagonal dominance. Numerical tests show that the properties of
well-conditioning are maintained.
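The scalar dominance conditions of this section can be checked numerically. The sketch below assumes the Simpson coefficients C_i = -1 - (h/3)A(t_{i-1}), D_i = -(4h/3)A(t_i), E_i = 1 - (h/3)A(t_{i+1}) with constant stepsize; the function name `dominance_margins` and the sample coefficient a(t) are hypothetical choices made only for illustration.

```python
import numpy as np

def dominance_margins(a, t, h):
    """Margins of the row and column dominance conditions at interior nodes t.

    row = |D_i| - |C_i + E_i|,  col = |D_i| - |C_{i+1} + E_{i-1}|,
    for a scalar coefficient function a(t) and constant stepsize h.
    Both conditions hold where the returned margins are positive.
    """
    C = lambda s: -1.0 - h / 3 * a(s - h)   # C at node s uses a at the previous node
    D = lambda s: -4 * h / 3 * a(s)
    E = lambda s: 1.0 - h / 3 * a(s + h)    # E at node s uses a at the next node
    row = np.abs(D(t)) - np.abs(C(t) + E(t))
    col = np.abs(D(t)) - np.abs(C(t + h) + E(t - h))
    return row, col

# Hypothetical smooth, strictly negative coefficient used only as an example.
a = lambda s: -2.0 - np.sin(s)
row, col = dominance_margins(a, np.linspace(0.1, 5.0, 50), 0.05)
```

The column margin reduces to (2h/3)|a(t_i)| regardless of the smoothness of a, while the row margin stays positive only as long as h^2 a'' remains small compared with a.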
4. The parallel algorithm
The parallel solution of tridiagonal and block tridiagonal linear systems has been the object
of several papers [1,6]. For tridiagonal systems, the most interesting parallel methods are the
cyclic reduction and the partition methods [8].
The idea of the cyclic reduction method is to cyclically reduce the size of the original system
(in each step of the reduction to a half) in order to obtain a tridiagonal system which is easier
to solve. Given the matrix
$$A = \begin{pmatrix} a_1 & c_1 & & \\ b_1 & a_2 & c_2 & \\ & \ddots & \ddots & c_{n-1} \\ & & b_{n-1} & a_n \end{pmatrix} \tag{4.1}$$
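to obtain a tridiagonal submatrix of dimension $n/2$, we consider the odd-even permutation
matrix
$$P_1 = (e_1, e_3, e_5, \ldots, e_2, e_4, e_6, \ldots)^T,$$
where $e_j$ is the $j$th unit vector, and the following factorization for $P_1 A P_1^T$:
$$P_1 A P_1^T = \begin{pmatrix} D_1 & T_1 \\ S_1 & E_1 \end{pmatrix} = \begin{pmatrix} I & 0 \\ S_1 D_1^{-1} & I \end{pmatrix} \begin{pmatrix} D_1 & 0 \\ 0 & A_1 \end{pmatrix} \begin{pmatrix} I & D_1^{-1} T_1 \\ 0 & I \end{pmatrix}, \tag{4.2}$$
where $D_1$ and $E_1$ are diagonal matrices containing respectively the odd and the even main
diagonal entries of $A$, $S_1$ and $T_1$ are bidiagonal, and $A_1 = E_1 - S_1 D_1^{-1} T_1$ is again tridiagonal.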
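One reduction step can be sketched as follows: the odd-even permutation is applied to a tridiagonal matrix, the four blocks are extracted, and the Schur complement of the odd diagonal block is formed. The function name `cyclic_reduction_step` and the dense NumPy storage are illustrative assumptions; a practical implementation would work on the three diagonals only.

```python
import numpy as np

def cyclic_reduction_step(A):
    """One odd-even reduction step for a tridiagonal matrix A (dense storage).

    Returns the permutation matrix P, the blocks D1, T1, S1, E1 of P A P^T,
    and the reduced matrix A1 = E1 - S1 D1^{-1} T1.
    """
    n = A.shape[0]
    odd = np.arange(0, n, 2)     # rows 1, 3, 5, ... in 1-based numbering
    even = np.arange(1, n, 2)    # rows 2, 4, 6, ...
    P = np.eye(n)[np.concatenate([odd, even])]
    B = P @ A @ P.T
    k = len(odd)
    D1, T1 = B[:k, :k], B[:k, k:]
    S1, E1 = B[k:, :k], B[k:, k:]
    # D1 is diagonal, so D1^{-1} amounts to scaling the columns of S1.
    A1 = E1 - (S1 / np.diag(D1)) @ T1
    return P, D1, T1, S1, E1, A1
```

Given a right-hand side f split into odd and even parts, the even unknowns satisfy the reduced system with matrix A1 and right-hand side f_even - S1 D1^{-1} f_odd; the odd unknowns then follow from a diagonal solve.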
The main advantage of the cyclic reduction with respect to partition methods is the minimum
memory requirement; the problem is that it requires synchronizations among the processors at
each step of the reduction [5].
In [1] it is demonstrated that an efficient version of the cyclic reduction (defined as a parallel
cyclic reduction algorithm) for the solution of large tridiagonal systems on parallel computers
with a small number of processors (so that $n/p \gg 1$) may be obtained by applying a scalar
cyclic reduction to each submatrix obtained through a partition method.
Let us recall the approach from which the parallel cyclic reduction derives
(see [1] for more detail): we partition the matrix (4.1) as a block tridiagonal matrix, that is,
$$A = \begin{pmatrix}
A^{(1)} & c_{k-1}^{(1)} e_{k-1} & & & & \\
b_k^{(1)} e_{k-1}^T & a_k^{(1)} & c_k^{(1)} e_1^T & & & \\
& b_0^{(2)} e_1 & A^{(2)} & c_{k-1}^{(2)} e_{k-1} & & \\
& & \ddots & \ddots & \ddots & \\
& & & \ddots & a_k^{(p-1)} & c_k^{(p-1)} e_1^T \\
& & & & b_0^{(p)} e_1 & A^{(p)}
\end{pmatrix} \tag{4.3}$$
where $e_1 = (1, 0, \ldots, 0)^T$, $e_{k-1} = (0, \ldots, 0, 1)^T$, $A^{(i)}$ is a tridiagonal block, and
$a_k^{(i)}$, $b_0^{(i)}$, $b_k^{(i)}$, $c_{k-1}^{(i)}$, and $c_k^{(i)}$ are scalars.
Factorizing (4.3) in a proper way, it is possible to make both the factorization of the blocks
$A^{(i)}$ and the solution of the associated linear systems independent. For example, we may consider
the following factorization (also defined in [6]):
[Equation (4.4): block factorization of (4.3) built from the blocks $A^{(i)}$ and identity blocks $I_{k-1}$; the matrix layout is not recoverable from the source.]
become the identity matrix and the null matrix while a dense block is substituted for each
nonzero element.
To exploit the structure of the right-hand side (see (2.1)), we consider an even-odd block
permutation matrix. In this way, the lower triangular systems need not be solved; in fact, the
solution of each lower triangular system requires (see (4.6)) the solution of a block diagonal
system with a null right-hand side and the updating of the right-hand side by means of null
vectors. The following “operative” factorization for $M^{(i)}$ results:
[Display equation not recoverable from the source.]
Here the blocks of the factors are $m \times m$ blocks with the same properties as the corresponding
scalars of the tridiagonal case, and the reduced matrix is block tridiagonal.
To obtain the factorization of $M^{(i)}$, it is necessary to invert the even blocks on the main
diagonal of each $D_i$. Therefore, the number of operations is $O(n \cdot m^3)$. Because the reduced
system has dimension $m \cdot p$, a necessary condition to obtain good speed-ups is $n \gg p$. Moreover,
$n \gg m$ is required to reach good performance with respect to iterative solvers [3].
5. Numerical tests
The numerical tests have been performed on a distributed memory parallel computer, a
multiputer with 32 transputers T800-20, each with 1 Mb of local memory. The scalar tests have
been performed on a single T800-20 with 16 Mb of memory. The topology of interconnection
used among the processors is a pipeline.

Table 1
Problem 1: error
No. of steps   h_0        γ       E_1         E_2         rtol     atol
32             5·10^-3    1.2     4.0·10^-4   8.5·10^-4   10^-4    10^-4
64             1·10^-3    1.12    2.9·10^-5   7.8·10^-5   10^-5    10^-6
128            5·10^-4    1.06    2.0·10^-6   7.4·10^-6   10^-6    10^-7
256            2·10^-4    1.03    1.2·10^-7   6.4·10^-7   10^-8    10^-8
512            1·10^-4    1.015   7.9·10^-9   5.8·10^-8   10^-?    10^-10

Table 2
Problem 1: speed-up of the algorithm
Blocks   Processors
         2      4      8      16      32
32       1.96   3.18   3.97   -       -
64       2.00   3.61   5.39   6.11    -
128      2.00   3.88   6.54   8.85    8.86
256      -      4.00   7.36   11.56   13.96
512      -      -      7.85   13.77   19.82
For this algorithm, the ideal topology should be a hypercube. In fact, the only point of
synchronization is in the solution of the $(p-1) \times (p-1)$ block tridiagonal reduced system. By
solving this system with the cyclic reduction, we obtain the solution in $\log_2 p$ steps, and each
processor communicates only with the neighbouring processors in the hypercube (see [5] for
more detail).
The parallel solver has been compared with the subroutine LSODE from ODEPACK which
is a known algorithm for solving initial value problems on scalar computers. LSODE depends
on the two parameters rtol and atol (relative and absolute tolerance parameters) which
determine the accuracy of the solution. They have been chosen in order to obtain in LSODE an
accuracy comparable to the one obtained with the parallel algorithm.
We have used Simpson’s method with variable stepsize; $h_i$ has been chosen in accordance
with the formula
$$h_i = h_0 \gamma^i,$$
where $\gamma$ is a constant which depends on the time at which the steady state is reached and on
the number of blocks (i.e. on the required precision of the parallel solver).
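To see how such a stepsize rule stretches the integration interval, the sketch below assumes a geometric rule h_i = h_0 γ^i with the index i starting at 0 and an initial time t_0 = 0; both conventions, as well as the function names `stepsizes` and `final_time`, are assumptions made for illustration only.

```python
import numpy as np

def stepsizes(h0, gamma, n):
    """Geometric stepsizes h_i = h0 * gamma**i for i = 0, ..., n-1 (assumed indexing)."""
    return h0 * gamma ** np.arange(n)

def final_time(h0, gamma, n, t0=0.0):
    """Time reached after n steps, starting from the assumed initial time t0."""
    return t0 + stepsizes(h0, gamma, n).sum()

# Parameters of the first row of Table 1: 32 steps, h0 = 5e-3, gamma = 1.2.
T = final_time(5e-3, 1.2, 32)
```

Because the steps grow geometrically, the final time equals h_0 (γ^n - 1)/(γ - 1), so a moderate γ lets a small number of blocks cover both the fast transient and the approach to the steady state.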
Problem 1. In the first test problem, the 3 × 3 matrix
$$A(t) = \begin{pmatrix} -21 & 19 & -20 \\ 19 & -21 & 20 \\ 40 & -40 & -40 \end{pmatrix}$$
has eigenvalues $\lambda_1 = -2$ and $\lambda_{2,3} = -40 \pm 40i$. The starting point is $y_0 = (1, 0, -1)^T$. For this
problem the theoretical solution is known (see [7]). Table 1 contains the two error parameters
which we have used to evaluate the accuracy of the solution for different sizes of the coefficient
matrix of the parallel solver. We have set
$$E_1 = \max_{i,j} |t_{i,j} - s_{i,j}|$$
and $E_2$ (whose defining formula is illegible in the source), where the elements $t_{i,j}$ and $s_{i,j}$ are
the $j$th components of the theoretical and the computed solution respectively after $i$ steps.
Table 2 and Table 3 show, respectively, the speed-up of the
algorithm (ratio between the scalar and the parallel execution of the algorithm) and the
speed-up of the problem (ratio between the scalar execution of LSODE and the parallel
execution of the algorithm).

Table 3
Problem 1: speed-up of the problem
Blocks   Processors
         2      4      8      16      32
32       3.14   5.09   6.36   -       -
64       2.68   4.77   7.12   8.07    -
128      2.85   5.35   9.02   12.19   12.21
256      -      4.02   7.33   11.52   13.91
512      -      -      6.58   11.54   16.61

Table 4
Problem 2: error
No. of steps   h_0        γ      E_1         E_2         rtol       atol
32             10^-2      1.4    6.4·10^-3   3.0·10^-2   8·10^-4    10^-4
64             10^-2      1.2    1.1·10^-3   6.9·10^-3   9·10^-5    10^-5
128            5·10^-3    1.09   7.1·10^-5   4.4·10^-4   2·10^-6    10^-6
256            10^-3      1.05   2.7·10^-6   3.2·10^-5   8·10^-8    10^-8
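The data of Problem 1 are easy to check numerically. The sketch below recomputes the eigenvalues of the 3 × 3 matrix, evaluates the exact solution through an eigendecomposition, and implements the maximum absolute error used as the first error parameter. The function names `exact_solution` and `max_abs_error` are hypothetical, and the eigendecomposition is only one of several ways to evaluate the matrix exponential.

```python
import numpy as np

A = np.array([[-21.0, 19.0, -20.0],
              [19.0, -21.0, 20.0],
              [40.0, -40.0, -40.0]])
y0 = np.array([1.0, 0.0, -1.0])

def exact_solution(t):
    """y(t) = exp(A t) y0, evaluated with the eigendecomposition of A."""
    lam, V = np.linalg.eig(A)                    # distinct eigenvalues, V is invertible
    c = np.linalg.solve(V, y0.astype(complex))   # coordinates of y0 in the eigenbasis
    return (V @ (c * np.exp(lam * t))).real

def max_abs_error(exact, computed):
    """Largest absolute componentwise difference over all steps and components."""
    return np.max(np.abs(np.asarray(exact) - np.asarray(computed)))

eigenvalues = np.linalg.eigvals(A)   # expected to match -2 and -40 +/- 40i
```

The slow eigenvalue -2 governs the approach to the steady state, while the pair -40 ± 40i produces a fast oscillating transient, which is what motivates small initial stepsizes followed by growing ones.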
Problem 2. In the second test problem we have considered an 8 × 8 coefficient matrix with
eigenvalues
$$\lambda_{2i-1} = -10^{i-3}, \qquad \lambda_{2i} = -5 \cdot 10^{i-3}, \qquad i = 1, \ldots, 4.$$
Tables 4-6 are analogous to the previous Tables 1-3. The theoretical solution has been
obtained in MATLAB from $y(t) = e^{At} y_0$.
A comparison between Tables 3 and 6 shows that the problem speed-up decreases when
the size of A(t) becomes larger. Better results may be obtained by considering a different
two-step method, or a higher number of processors and a different topology of interconnection.
However, the size of A(t) is the real bottleneck for this parallel solver.

Table 5
Problem 2: speed-up of the algorithm
Blocks   Processors
         4      8      16      32
32       3.47   4.91   5.36    -
64       3.81   6.18   8.10    8.24
128      -      7.12   10.92   13.36
256      -      -      13.22   19.13

Table 6
Problem 2: speed-up of the problem
Blocks   Processors
         4      8      16      32
32       2.08   2.93   3.20    -
64       1.71   2.77   3.63    3.69
128      -      2.45   3.76    4.60
256      -      -      4.29    6.20
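The comparison between the two problems can be made concrete with a few lines of code. The dictionaries below transcribe the problem speed-ups of Tables 3 and 6; the assignment of each value to a processor count is inferred from the flattened table layout and should be read as an assumption, as are the helper names `common_ratios` and `efficiency`.

```python
# Speed-up of the problem: blocks -> {processors: speed-up}.
table3 = {  # Problem 1 (3 x 3 matrix)
    32: {2: 3.14, 4: 5.09, 8: 6.36},
    64: {2: 2.68, 4: 4.77, 8: 7.12, 16: 8.07},
    128: {2: 2.85, 4: 5.35, 8: 9.02, 16: 12.19, 32: 12.21},
    256: {4: 4.02, 8: 7.33, 16: 11.52, 32: 13.91},
    512: {8: 6.58, 16: 11.54, 32: 16.61},
}
table6 = {  # Problem 2 (8 x 8 matrix)
    32: {4: 2.08, 8: 2.93, 16: 3.20},
    64: {4: 1.71, 8: 2.77, 16: 3.63, 32: 3.69},
    128: {8: 2.45, 16: 3.76, 32: 4.60},
    256: {16: 4.29, 32: 6.20},
}

def common_ratios(small, large):
    """Ratio large/small for every (blocks, processors) pair present in both tables."""
    return {(n, p): large[n][p] / small[n][p]
            for n in small if n in large
            for p in small[n] if p in large[n]}

def efficiency(table):
    """Parallel efficiency, i.e. speed-up divided by the number of processors."""
    return {n: {p: s / p for p, s in row.items()} for n, row in table.items()}

ratios = common_ratios(table3, table6)
```

Every ratio in `ratios` is below one, which matches the observation that the problem speed-up drops when the size of A(t) grows from 3 to 8.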
Acknowledgement
We thank Dr. Luigi Brugnano for his helpful comments and for presenting the results of this
work to the International Conference of Grado, and Dr. Francesca Mazzia and Ms. Paulene
Butts for their suggestions in the preparation of the manuscript.
References
[1] P. Amodio, L. Brugnano and T. Politi, Parallel factorizations and parallel solvers for tridiagonal linear systems,
Linear Algebra Appl. (to appear).
[2] A.O.H. Axelsson and J.G. Verwer, Boundary value techniques for initial value problems in ordinary differential
equations, Preprint, Mathematisch Centrum, Amsterdam (1983).
[3] L. Brugnano, F. Mazzia and D. Trigiante, Parallel implementation of BVM methods, in: Proceedings International
Conference on Parallel Methods for Ordinary Differential Equations: The State of the Art, Grado, Italy
(1991); also: Appl. Numer. Math. 11 (1993) 115-124 (this issue).
[4] L. Brugnano and D. Trigiante, Invertibility and conditioning of tridiagonal matrices, Linear Algebra Appl. 166
(1992) 131-150.
[5] I.N. Hajj and S. Skelboe, A multilevel parallel solver for block tridiagonal and banded linear systems, Parallel
Comput. 15 (1990) 21-45.
[6] S.L. Johnsson, Solving narrow banded systems on ensemble architectures, ACM Trans. Math. Software 11 (1985)
271-288.
[7] L. Lopez and D. Trigiante, Numerical boundary value techniques for initial value problems, in: Proceedings at the
International Conference on Parallel Methods for Ordinary Differential Equations: The State of the Art, Grado,
Italy (1991); also: Appl. Numer. Math. 11 (1993) 225-239 (this issue).
[8] J.M. Ortega, Introduction to Parallel and Vector Solution of Linear Systems (Plenum, New York, 1988).