'DRc- /
A Recursive Form of the Inverse of
General Sparse Matrices
Johannes Bisschop and Alexander Meeraus
Technical Note No. 1
Research Project 670-24-     //
July    1976
Development Research center
World Bank
1818 H Street, N.W.
Washington, D.C.    20433


A Recursive Form of the Inverse of
General Sparse Matrices
by
Johannes Bisschop and Alexander Meeraus
ABSTRACT:          The Hellerman-Rarick preassigned pivot procedure and a
recursive partitioning algorithm are used to find a stable and compact
representation of the inverse of a gene-al sparse matrix.
1.   Introduction
Sparse matrix technology is an important computational tool in
a broad spectrum of application areas (see [6], [9], and [11]). Common to
Lll these areas is the need for an efficient representation of the inverse
of a sparse matrix A . There are several criteria to be considered in the
design of any representation of the matrix  A     for large sparse matrices
A    Some important one-; are:
i) Numerical stability - are rounding errors kept under
control such that the resulting representation gives sufficiently accurate
results;
ii) Compactness - are storage requirements such that
all operations can be performed with a minimal amount of core; -,
iii) Flexibility - in how far is the representation of
A-1  zesponsive to changes in  A , and to the specific structure of A ;
iv) Computational requirements - how many computations
need to be performed when the representation of A-     is applied to any vector.


-2-
We will refer to these criteria in the sequel.
Many applications today use either the product form of the
inverse (PFI) or the elimination form of the inverse (EFI) , although
partitioning schemes have been successfully employed in certain cases
(see [2], [10], and (151). In an effort to minimize the build-up of
nonzero elements in either the PEI or EFI , further advantage is taken
of the structure of the original matrix by permuting rows and/or columns
(see [1], [3], [4], [5], [7], [12], [13], and [14]). The work by Hellerman
and Rarick ([4] and [51) has been an important contribution in the area
of structure detection in general, sparse matrices. Their preassigned
pivot procedure (from here on referred to as the HR procedure) is fast and
widely used in inversion routines involving sparse marices. We will study
the specific structure identified by the HR procedure, and show how this
can be used to find a compact, numerically stable representation of the
matrix A
2.   Spikes and Bumps.
The HR procedure permutes the rows and columns of a sparse matrix
A so that the permuted matrix is block lower triangular. Furthermore, each
diagonal block, or bump, is almost strictly lower triangular except for some
columns which still contain nonzero elements above the diagonal. These
columns are refer:ed to as spikes. A pictorial example of a set of spikes
contained in a single bump is presented in Figure 1 . For a complete
discussic:, uf the HR procedure, the reader is referred to the original
papers of Hel3.erman and Rarick. It should be noted that knowledge of the


3
4
5
6..
7
8
9   ...............................................
10
11
12
13
14
15
16   .............
16
18
19
21
22
21.
24
~30
World  nk-1590
FIGURE 1: A BUMP CONTAINING 30 SPIKES


-4-
original matrix A and the inverse of each bump is sufficient to compute
A b   or cA-1 efficiently for any vector b     or c . It is this observa-
tion that allows us to focus the major part of this article on single bumps.
Co-sider an m x m matrix B representing a typical bump obtained
from the HR procedure. Number all spikes in B from left to right. Let
B  contain  q spikes indexed by the set I =   1,2,...     . Each spike
icI corresponds to a column number in  B   indexed by tae set {il,i2,...,i  .
There is also a corresponding row number for the peak of each spike indexed
by the set  fjljj2p -- q). Note that jk < i        k0    . . -
Definition 2.1     A set of spikes contained in  B is said to be
properly nested if for any two spikes k   and  Z with ik <i, one of the
following inequalities holds:
i)          < j              (spike k  is said to be
dominated by spike Z);
ii)        j > i              (spike k  is not dominated
by spike  1).
Figure 1 displays a set of properly nested Spikes. In this
example every spike other than 30 is dominated by spike 30 , but spike
9 , for instance, Is uot dominated by spike 28 . Whenever two spikes are
not properly nested, a simple interchange will make their nesting proper.
It should be noted that the HR procedure, by construction, will always
produce bumps with properly nested spikes.
3.   Partitioning the Bumps.
Consider a bump B of size m x m containing q :pikes, and
let


B1  B 2
where   BI  consists of all rows and columns of     B  minus those correspond-
ing to the numbers    fil, 1 2'...I i   . The remaining    q  rows and columns
in their original order make up the matrices     B2 P B3 , and    B4     By
construction, the matrix    B   is lower triangular with nonzeros on the
diagonal, and is therefore invertible. The following tn!eorem, proven for
completeness and later reference, is due to Kron (8] .
Theorem 3.1      Consider the above partitioning scheme involving bump      B
-1                -1                            -1
Let  Q = B   - B  B    B      Then  B     exists if and only if   Q     exists,
 31  2
and knowledge of   B1, 932 ,   3 , and   Q    is sufficient to represent    B
Proof:           If  Q-1   exists, then
1   2
-1       -1     -1      -1
where               B 1 B  +             B   Q  3 3
1.        1        1   2Q     3   1
-1     -1
3         -BI 32 0
2           i  2
-1      -1
 =      -Q     B  3
3               3  1
and         4   =      1
If  B     exists, but  Q-    does not, then there exists
#0-1                                                  -1
a vector                               3 0  such that  1B  B B 1= 0  .  Set  a  = - 3  B  a
suhta           [4-   B3               1                       2  2
Then              B   B         2        =
LB3            2i             0 j    (al , C2) # (0 ,0)


-6-
This contradicts the invertibility assumption on    B
By construction, the matrix   B1  is lower triangular, so that
-11
B1   is readily available in product form without the creation of new
nonzero elements. Therefore, mere knowledge of B1 ' B      , B3 , and    Q-1
is sufficient to represent   B-1
It is straightforward to verify that tle foLlowing computations
are needed for the pre- and postmultiplication of B-      with a vector.
Let                    xlL1               :]   F:b]
2             3  a       b2,
-1           -1
then      i) x      =   Q   (b2    3 B1 b1
ii) x         B11 (b    B2 x2)
Let                                     c                 B21
L x   x2c                             2
1~      -       -1 2
then      i) X      =   (c2 -c   B 1 B2 'Q
2    2    1      -1 2
ii) X          (c  - x2 3) B-
The number of =ultiplications required in the above calculation3
is determined by the size of Q    and the degree of sparseness of B1 P B 2
and  B  . Note that    B  ,    , and  Q  are accessed once and   BI  twice.
This differs from the PFI or EFI where every element of the inverse
representation is used only once. However, the total number of nonzero


- 7-
elements in the PFI or EFI usually exceeds by far the total number
of nonzero elements in  B1 , B2 , B3 , and Q .
When comparing the storage requirements for the partitioned form
of the inverse and the PFI . consider a 100 x 100 bump with 1 spike in
column  100 . .* many as  lt-  new nonzero elements can be created in the
PFI , although the actual number is usually less than 100 , and is deter-
mined by the degree of sparsety in the bump and the actual locations of
the nonzero elements. The number of nonzeros created in the partitioned
form of the inverse is equal to the number of elements in the I x 1 Q matrix.
It should be noted, however, that the savings in storage requirements are
paid for with the additional access of the matrix Bl1
The product form requires a definite choice of pivots, usually
determined by the natural order of the spikes. If these pivots do not
satisfy numerical requirements, spike interchanges are made in an effort
to obtain a stable nverse repre3entation. It is then possible that a tall
spike is followed by a lower spike so that the build-up of new nonzero
elements is stronger than it was before such spike interchange. If all
spikes are grouped together, spike interchanges are not necessary in the
partitioned form of the inverse. All that is needed for stability is a
stable inversion method for the matrix Q . It is the inversion of the
matrix Q that will be studied in the next sectior.
4.   A recursive Form of the Matrix Q-1
In this section we will explore the structuce within bumps as
detected by the HR procedure, and show how this can be used to find a


-8-
compact representation of the mazrix  Q-1   from the previous section.
Definition  4.1         A matrix  S with    a blocks of rows and   s blocks
of columns is said to be bordered block lower triangular (BBLT) whenever
the cells S.. = 0 for j > i, j # s, i = 1,2,...,s-1.
The following theorem is fundamental in identifying the nested
BBLT  structure of Q whenever it exists
Theorem 4.1     Consider a bump  B , a set   I of properly nested spikes,
-1
and the matrix  Q = B  - B  B-    B  If for any two spikes k    and  2 in
4  3 1    2
I with i  > ik, Epike  k  is not dominated by spikL   2 , i.e. j   > ik
then the  (k,Z) th  entry of Q   is zero.
Proof:     The  (k,i) th  entry of the matrix   B   is zero by assumption
siace  j  > ik . We only need to show that the    (k,Z) th  entry of
B -1                          th
B3 2 12    is zero. In the   ik     row of the original bump   B , considering
the columns sequentially in increasing order, every element following the
diagonal element (ik , i ) is zero whenever the column indicator does not
refer to a spike. In the i.th column of B    prior to partitioning, every ele-
ment above the element (ji , i) is zero. Since ik < j., the inner product of
to              th
the  ik     row and  i.     column of the original bump  B   is zero after
one has eliminated all column (row) elements for which the column (row)
indicators refer to a spike. The resulting row and column are precisely
the k th   row of B 3  and the  Z th  column of B2 . As the matrix     B1
is lower triangular, so is its inverse. It is straight forward to verify
that any row vector with all zeros in its last p entries will still have
these zeros after premultiplication with a lower triangular matrix. This
completes the proof that the  (k,Z) th entry of Q   is zero.


-7-
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30
2                                                                   12
3
4
5
6
7
8
9
10
12
13
14
152p
16
17
18
19
20
21
22
23
30
28W 8 k 15
ZERO ELEMENT
FIGURE 2: THE STRUCTURE OF THE MATRIX Q FOR THE BUMP IN FIGURE 1


-10-
The nested BBLT form of the matrix Q can be deduced from
the spike information. Figure 2 displays this nested BBLT structure
for the example in Figure 1, where at cne point the nesting is four deep.
The followiing theorem car be employed to invert matrices with
a BBLT structure.
Theorem   4.2      Let  Q  be   BBLT  with   N  blocks of rorws and   N  blocks
of columns.   Assume that every diagonal block     Q    ,i  i = 1,2,... ,N-1  is
nonsingular. Let
N-1
Q           - IN    QNi P i
where                            j1-Qlj             Pi
Then knowledge of    Q-1   Q- I   j = 1,2,... ,N-1 , and the original bump    B
ii                                1
from which Q was derived is sufficient to re?resent Q
Proof: Every elemen. of the matrix (, can be computed from the original
elements of the bump B . We merely need to provide a procedure to compute
Q- 1 b  or  cQ-1  for any vector   b  or  c   in terms of         Q-, and
the elements of the matrix    Q  other than   Q       Let  Q = (0 i)
b = (b i) , C = (ci ) , and  x = (x i)   i,j = 1,2,...,\' . Then    x  Q11b
can be zomputed using
i)  zj        Q1 (b    -     Qji z )
ii   j    i=l
N--1
iii)  x    ~    Q-1 (b,     J Q      -1 Q   xi)      j=12..,-
ii  i  J  XN  i J


- 11 -
and x = Q     can be computed using
S-1            -
i)  z        (c -       zi Qij   j
 xN               zi QiN
N
X(C4               x~ Q )Q
iii)  x    =   ic=-     +x     ij   jj         ] = 1,2,... ,N-1l
i=j+1
To verify this, oe merely needs to reccgnize that the above
procedures are equivalent to the ones developed im section 3 . Let the
matrix   1 be the first (N-1)2 blocks contained in the north-west section
of the matrix Q . Then the abo%.L two procedures soell out the computatlons
-l          -1
involving B1 b1 and c 1B   or section 3 employing the b2ock lower triangular
structure of the maxtix B1 . This concludes the proof of the theorem.
The above theorem can be used again in the representation of
-1
Qii whenever the matrix Q       assumes the same  3BLT  structure as the
matrix Q . The exten:: to which one may want to take advantage of the
recursive BBLT   structure of the matrix Q   is largely determined by the
size of the matrices to be inverted, the additional overhead involved in
keeping track of the nesting, and the extra coputations required to
generate the representation of Q-1
It is important to ?oint out that Theorem 4.2 requires the
assumption that every diagonal block  Qii   is arnsingular. In practice
this condition is not necessarily satisfied. However, in the process of
inverting a block  Q    , a caciiumn and row can be taken out and added to
that part of the BBLT structure that does not contain the diagonal blocks.


-12-.
This will decrease the order of the block Qii by one, and increase the
size of the border correspondingly. In Figure 2 , for instance, the
inverse of the  5 x 5 block Q22 , determined by the intersection of rows
and columns 5 through 9 , may not exist. If one of the 2 x 2 diagonal
blocks of Q22   is singular, an appropriate choice of row (column) can be
inserted in between rows (columns) 8 and 9 . If the border of Q
causes the nonsingularity, inserting row (colum) 9 in between rows
(columns) 28 and 29 is appropriate. If one does not want to take
advantage of the BELT   structure of      , and an explicit representation
-1
of Q     is of interest, the elimination of a row (column) will always
add to the border consisting of rows (columns) 29 and 30 . Hellerman
and Rarick have reported in (4] that in practice very few interchanges of
spikes within buzps art needed assuming the matrix to be inverted is
nonsingular.
5. Summary and Conclusi -ns.
In this pa?er wa have explored Li detail the structure of a
general sparse matrix A as detected by the HR procedure. The follcwing
approach has been suggested when computing x = A-1 b , or x - cA-1
1. Use Gaussian elimination   for all variables x   not belonging to a bump;
2. Use Kron's partiti-ning scheme for each bump  Bi'
B,    F Bil B12
-1
where     a) B il is lower triangular so that Bil    -s readily available
in product form,


-13-
b) Qi      (Bi4    3 Bl Bi2)    is in bordered block lower
triangular form,
3. Use Kron's partitioning scheme for each matrix   Qi
Fil  12
Q    =
i         Qi3 Q14
-1
where     a) Q     is block lower triangular so that Qil   is available
in block product form after nonsingularities on the diagonal blocks
have been reived,
-1
b) Q        Q4 -    3   1 Qi2) is a full matrix to be inverted
by any stable method.
4. Use step 3 recursively for every BBLT structure nested inside the
diagonal blocks of Qil , making row and column permutations when needed.
The resulting representation of the matrix A-1 has shown to be
both compact and stable. Given the fact that most ideas in this paper have
been explored before in the literature, the main contribution is Theorem 4.1
which identifies the nested BBLT   structure of the matrices Qi . It paves
the road for the decomposition of sparse diagonal blocks of A containing
-1
many spikes. The result is a representation of Q      that requirE: relatively
little storage.


References
1]   Duff, I.S., and Reid, J.K., "A Comparison of Sparsety Orderings
for Obtaining a Pivotal Sequence in Gaussian Elimination",
J. Inst. Maths. Applics. 14 (1974), pp. 281-291.
2)   Glaser, G.H.,and Saliba, M.S., "Application of Sparse Matrices
to Aanlyticgl Photogrametry", in Sparse Matrices and Their
Applications (see reference Ill]), pp. 135-146.
3]   Harary, F., "A Graph Theoretric Approach to Matrix Inversion by
Partitioning", Numer. Math. 4 (1962), pp. 128-135.
( 4] Hellerman, E., and Rarick, D., "Reinversion with the Preassigned
Pivot Procedure", Math. Programming 2 (1971), pp. 195-216.
5]   Hellerman, E., and TZarick, D., "The Partitioned Preassigned Pivot
Procedure", in Sparse Matrices and Their Applications
(see reference 111]), pp. 67-76.
61   Himmelblau, D.M. (Editor), Decomposition of Large-Scale Problems,
North-Holland Publishing Company, 1973.
7]   Kerorkian, A.K., and Snoek, J., "Theory and Applications of
Structural Analysis in Partitioning, Disjointing and Con-
structing Hierarchical Systems", in Decomposivion of Large-
Scale Problems (see reference [6]), pp. 467.
8]   Kron, G., Diakoptics, MacDonald, London, 1956.
9]   Reid, J.K. (Editor), Large Sparse Set. of Linear Equations,
Academic Press, New York, 1971.
[10]  Rose, D.J., and Bunch, J.R., "The Role of Partitioning in the
Numerical Sclution of Sparse Systems", in Sparse Matrices
and Their Applications (see reference [11]), pp. 177-187.
[11]  Rose, D.J., and Willoughby, R.A. (Editors), Sparse Matrices and
Their Applications, Plenum Press, New York, 1972.
[12]  Spillers, W.R., "Applications of Graph Theory to Decomposition
Prcblems of Struzctures", in Decomposition of Large-Scale
Problems (see reference [61), pp. 285-305.
(13]  Steward, D.V., "Partiticning and Tearing Systems of Equations",
J. Siam Numer. Anal. Ser. B 2 (1965), pp. 345-365.
[14]  Tewarson, R.P., Sparse Matrices, Academic Press, New York, 1973.
(15]  Willoughby, R.A., "Sparse Matrix Algorithms and Their Relation to
Problem Classes and Computer Architecture", in Large Sparse
Sets of Linear Equations (see reference [9]), pp. 255-277.


A-1
Appendix:    A verification of results not formally proven in the paper.
A;        Let the matrix A be in blocked lower triangular form where
every block Aij   is of size qi x q.   , i,j - 1,2,...,n . The matrix
-1                                       -1
A    exists if and only if the matrices A     exist for i = 1,2,...,n
ii
Proof:    Let A-1 exist, but assume that for at least one    i , the
matrix Aii is singular.    Then there exists a vector a # 0 such that
i
Aii at = 0 . Let p =    X     . Then the p x p matrix A(p) contained
Z=1
in the northwest corner of A is singular. To verify this, let the last
q. elements of the p x 1 vector a be equal to a,        and let all other
elements be equal to zero. Since Aki = 0    for k > i , we have that
A(p) u = 0 , a # 0 . As A kZ    0  for k = 1,2,...,i , Z > i , it follows
that the first p rows of the matrix A form a linearly dependent set.
This contradicts the invertibility of A .
-1
Next assume that A   is singular, but that AI    exists for every
ii
i = 1,2,...,n . Then there exists a vector a # 0 such that Aa = 0 . The
first q1 ele:ents of a must be zero since A        is nonsingular. Similarly,
the following q2 elements of a must be zero since A22 is nonsingular.
This reasoning can be repeated for every diagonal block Ai , i = 1,2,...,n
This contradicts the assumption that a ' 0
B:        For the partitioned matrix
B         [ 1     ]
h1     2 B  h
the matrix B-1 has elements


А-2
g1   =   В11  + в11 g2 Q-1 вз вi1        �
Вг   °   _ в11 g2 Q 1                    :
s3   =   _ Q 1 вз в11                    1
апд                �4   =   Q-1                             �
where              Q    -   В4  - Вз В11 Вг                 .
Proof:   We пеед to verify that  В В-1  = I.
i)       В1 s1 + В2 s3     =   В1 (Bi1 + вi1 в2 Q-1 вз вi1) + в2 (- Q-1 вз в11)
=   I+ В2 Q-1 ВЗ В11  - Вг Q-1 ВЗ В11
=   г
ii)      в1 s2 + в2 s4     =   в1 �- вiг в2 Q-1� + в2 Q-1
�   _ g2 Q-1 + в2 Q-1
=   0
iii)     Вз 81 + В4 63     =   Вз iB11 + в11 В2 Q-1 вз Bi1) + в4 i- Q-1 Вз В11)
=   вз в1I + вз в11 в2 Q'1 вз в11 - в4 Q' вз вi1
=   вз в2j  -(в4 - вз в11 вг) Q-1 вз в11
�       -1  _          `1
вз в1           вз в1
�   р


л-з
iv)      Вз 6г + В4 84   =   Вз (- В11 В2 Q-1) + в4 Q-1
_   (В4 - Вз s1i В2) Q-1
=   г   .
С:       The solution of
В1   В2    Х1   Ы
3
ВЗ   В4    к2   Ъ2
is computed by the follaving two steps:
i)   хг   =   Q-1 (Ъ2 - Вз В11 Ъ1)     ,
ii)   х1   =   В11 (Ъ1 - В2 х2)        .
Proof:    В1 х1 + В2 х2      =   Ы     Ьу construction.
Вз х1 r В4 хг   =   ВЗ (В11 h1 - В11 Вг хг)  + В4 х2
-1
=   вз в1  Ъ1  + СВ4 - Вз в11 В2)к2
-1                                           -1
=   Вз В1  Ы       +(В4 _ В� 311 g2) Q-1 �b2 - Вз В1  Ы)
=   Ъ2


А-4
D:        The solutlon of
В1  В2
С х1   х2 1   В    В                  � с1   с2 �
3    4
is computed Ъу the following two steps;
1)    хг   °   ic2 - с1 В11 Вг) Q-1
ii)   х1   =   (с1 - хг Вз) В11
?roof:      х1 В1 + х2 Вз   =   с1   Ьу constructlon.
х1 В2 + х2 В4    =    ic1 В11 - хг Вз В11) Вг  + х2 В4
�    с1 g11 В? �+  х,,  (В4 - Вз Р11 Б2)
=   с В-1 В    +(с  - с  В-1 В) Q-1 (В  - В  В-1 В)
1 1     2     2     1 1    2            4    3 1    2
�   с2
Е:         The eolution of
�411  о        о   ----  41х     �х1         -ъ1
421  Q22     0   ----   Q2'.3    х2          ь2
4Э1  Q32     �33   ----   43N    х3          ь3
г :  : :                    :      :           :
1 ач1  Q�2     �чз   ----   Qчч    xrr         ьх


A-5
is computed by the following three steps
i)  z        Q   (b    J-1 Qji zi)
N-1
ii) x,            (b,       Q   z
Ni i
J-1
ii   i    JN N         Qji  i
S-1
Here,        QN,          Q    P
Ni   i
where          Q- 1 tQ,          Q   P                           J=1,2,...,N-1
ji                ji
Proof:    The above method is equivalent to the partitioning scheme used
in part C   of this appendix. To verify this, let B 1    be the first N-1
row and column blocks of the matrix  Q ; let B 4    QNTI ; and let B 2  and
B 3  consist of the remaining columns and rows respectively. Then the above
three steps merely spel- out the two steps used in part C
F:        The solution of
Qll 0     0    - - - -  Q L
Q21 Q22 0      - - - - Q2N
XN       Q31 Q32 Q33SIN                       [Cl C2        CN
Q    Q
Ni   N2  ON3  - - - - QNN
is computed by the following three steps
N-1           -1
z i      (c i  -   I   z i Qij) Q ii
i=i+l


A-6
N-1
ii)    XN   =    (cN -       Zzi QiN)Q
x      c       N       Q)Q-1                 J=1,2,. .. ,N-1
iii)    xj   =     (c  -        xiQij)Qjjj1,..N-
i= j+1
Proof:     The reasoning used in section      E  applies also to this section.
We refer back to the partitioning scheme used in part D of this
appendix.