'DRc- / A Recursive Form of the Inverse of General Sparse Matrices Johannes Bisschop and Alexander Meeraus Technical Note No. 1 Research Project 670-24- // July 1976 Development Research center World Bank 1818 H Street, N.W. Washington, D.C. 20433 A Recursive Form of the Inverse of General Sparse Matrices by Johannes Bisschop and Alexander Meeraus ABSTRACT: The Hellerman-Rarick preassigned pivot procedure and a recursive partitioning algorithm are used to find a stable and compact representation of the inverse of a gene-al sparse matrix. 1. Introduction Sparse matrix technology is an important computational tool in a broad spectrum of application areas (see [6], [9], and [11]). Common to Lll these areas is the need for an efficient representation of the inverse of a sparse matrix A . There are several criteria to be considered in the design of any representation of the matrix A for large sparse matrices A Some important one-; are: i) Numerical stability - are rounding errors kept under control such that the resulting representation gives sufficiently accurate results; ii) Compactness - are storage requirements such that all operations can be performed with a minimal amount of core; -, iii) Flexibility - in how far is the representation of A-1 zesponsive to changes in A , and to the specific structure of A ; iv) Computational requirements - how many computations need to be performed when the representation of A- is applied to any vector. -2- We will refer to these criteria in the sequel. Many applications today use either the product form of the inverse (PFI) or the elimination form of the inverse (EFI) , although partitioning schemes have been successfully employed in certain cases (see [2], [10], and (151). In an effort to minimize the build-up of nonzero elements in either the PEI or EFI , further advantage is taken of the structure of the original matrix by permuting rows and/or columns (see [1], [3], [4], [5], [7], [12], [13], and [14]). The work by Hellerman and Rarick ([4] and [51) has been an important contribution in the area of structure detection in general, sparse matrices. Their preassigned pivot procedure (from here on referred to as the HR procedure) is fast and widely used in inversion routines involving sparse marices. We will study the specific structure identified by the HR procedure, and show how this can be used to find a compact, numerically stable representation of the matrix A 2. Spikes and Bumps. The HR procedure permutes the rows and columns of a sparse matrix A so that the permuted matrix is block lower triangular. Furthermore, each diagonal block, or bump, is almost strictly lower triangular except for some columns which still contain nonzero elements above the diagonal. These columns are refer:ed to as spikes. A pictorial example of a set of spikes contained in a single bump is presented in Figure 1 . For a complete discussic:, uf the HR procedure, the reader is referred to the original papers of Hel3.erman and Rarick. It should be noted that knowledge of the 3 4 5 6.. 7 8 9 ............................................... 10 11 12 13 14 15 16 ............. 16 18 19 21 22 21. 24 ~30 World nk-1590 FIGURE 1: A BUMP CONTAINING 30 SPIKES -4- original matrix A and the inverse of each bump is sufficient to compute A b or cA-1 efficiently for any vector b or c . It is this observa- tion that allows us to focus the major part of this article on single bumps. Co-sider an m x m matrix B representing a typical bump obtained from the HR procedure. Number all spikes in B from left to right. Let B contain q spikes indexed by the set I = 1,2,... . Each spike icI corresponds to a column number in B indexed by tae set {il,i2,...,i . There is also a corresponding row number for the peak of each spike indexed by the set fjljj2p -- q). Note that jk < i k0 . . - Definition 2.1 A set of spikes contained in B is said to be properly nested if for any two spikes k and Z with ik i (spike k is not dominated by spike 1). Figure 1 displays a set of properly nested Spikes. In this example every spike other than 30 is dominated by spike 30 , but spike 9 , for instance, Is uot dominated by spike 28 . Whenever two spikes are not properly nested, a simple interchange will make their nesting proper. It should be noted that the HR procedure, by construction, will always produce bumps with properly nested spikes. 3. Partitioning the Bumps. Consider a bump B of size m x m containing q :pikes, and let B1 B 2 where BI consists of all rows and columns of B minus those correspond- ing to the numbers fil, 1 2'...I i . The remaining q rows and columns in their original order make up the matrices B2 P B3 , and B4 By construction, the matrix B is lower triangular with nonzeros on the diagonal, and is therefore invertible. The following tn!eorem, proven for completeness and later reference, is due to Kron (8] . Theorem 3.1 Consider the above partitioning scheme involving bump B -1 -1 -1 Let Q = B - B B B Then B exists if and only if Q exists, 31 2 and knowledge of B1, 932 , 3 , and Q is sufficient to represent B Proof: If Q-1 exists, then 1 2 -1 -1 -1 -1 where B 1 B + B Q 3 3 1. 1 1 2Q 3 1 -1 -1 3 -BI 32 0 2 i 2 -1 -1 = -Q B 3 3 3 1 and 4 = 1 If B exists, but Q- does not, then there exists #0-1 -1 a vector 3 0 such that 1B B B 1= 0 . Set a = - 3 B a suhta [4- B3 1 2 2 Then B B 2 = LB3 2i 0 j (al , C2) # (0 ,0) -6- This contradicts the invertibility assumption on B By construction, the matrix B1 is lower triangular, so that -11 B1 is readily available in product form without the creation of new nonzero elements. Therefore, mere knowledge of B1 ' B , B3 , and Q-1 is sufficient to represent B-1 It is straightforward to verify that tle foLlowing computations are needed for the pre- and postmultiplication of B- with a vector. Let xlL1 :] F:b] 2 3 a b2, -1 -1 then i) x = Q (b2 3 B1 b1 ii) x B11 (b B2 x2) Let c B21 L x x2c 2 1~ - -1 2 then i) X = (c2 -c B 1 B2 'Q 2 2 1 -1 2 ii) X (c - x2 3) B- The number of =ultiplications required in the above calculation3 is determined by the size of Q and the degree of sparseness of B1 P B 2 and B . Note that B , , and Q are accessed once and BI twice. This differs from the PFI or EFI where every element of the inverse representation is used only once. However, the total number of nonzero - 7- elements in the PFI or EFI usually exceeds by far the total number of nonzero elements in B1 , B2 , B3 , and Q . When comparing the storage requirements for the partitioned form of the inverse and the PFI . consider a 100 x 100 bump with 1 spike in column 100 . .* many as lt- new nonzero elements can be created in the PFI , although the actual number is usually less than 100 , and is deter- mined by the degree of sparsety in the bump and the actual locations of the nonzero elements. The number of nonzeros created in the partitioned form of the inverse is equal to the number of elements in the I x 1 Q matrix. It should be noted, however, that the savings in storage requirements are paid for with the additional access of the matrix Bl1 The product form requires a definite choice of pivots, usually determined by the natural order of the spikes. If these pivots do not satisfy numerical requirements, spike interchanges are made in an effort to obtain a stable nverse repre3entation. It is then possible that a tall spike is followed by a lower spike so that the build-up of new nonzero elements is stronger than it was before such spike interchange. If all spikes are grouped together, spike interchanges are not necessary in the partitioned form of the inverse. All that is needed for stability is a stable inversion method for the matrix Q . It is the inversion of the matrix Q that will be studied in the next sectior. 4. A recursive Form of the Matrix Q-1 In this section we will explore the structuce within bumps as detected by the HR procedure, and show how this can be used to find a -8- compact representation of the mazrix Q-1 from the previous section. Definition 4.1 A matrix S with a blocks of rows and s blocks of columns is said to be bordered block lower triangular (BBLT) whenever the cells S.. = 0 for j > i, j # s, i = 1,2,...,s-1. The following theorem is fundamental in identifying the nested BBLT structure of Q whenever it exists Theorem 4.1 Consider a bump B , a set I of properly nested spikes, -1 and the matrix Q = B - B B- B If for any two spikes k and 2 in 4 3 1 2 I with i > ik, Epike k is not dominated by spikL 2 , i.e. j > ik then the (k,Z) th entry of Q is zero. Proof: The (k,i) th entry of the matrix B is zero by assumption siace j > ik . We only need to show that the (k,Z) th entry of B -1 th B3 2 12 is zero. In the ik row of the original bump B , considering the columns sequentially in increasing order, every element following the diagonal element (ik , i ) is zero whenever the column indicator does not refer to a spike. In the i.th column of B prior to partitioning, every ele- ment above the element (ji , i) is zero. Since ik < j., the inner product of to th the ik row and i. column of the original bump B is zero after one has eliminated all column (row) elements for which the column (row) indicators refer to a spike. The resulting row and column are precisely the k th row of B 3 and the Z th column of B2 . As the matrix B1 is lower triangular, so is its inverse. It is straight forward to verify that any row vector with all zeros in its last p entries will still have these zeros after premultiplication with a lower triangular matrix. This completes the proof that the (k,Z) th entry of Q is zero. -7- 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 2 12 3 4 5 6 7 8 9 10 12 13 14 152p 16 17 18 19 20 21 22 23 30 28W 8 k 15 ZERO ELEMENT FIGURE 2: THE STRUCTURE OF THE MATRIX Q FOR THE BUMP IN FIGURE 1 -10- The nested BBLT form of the matrix Q can be deduced from the spike information. Figure 2 displays this nested BBLT structure for the example in Figure 1, where at cne point the nesting is four deep. The followiing theorem car be employed to invert matrices with a BBLT structure. Theorem 4.2 Let Q be BBLT with N blocks of rorws and N blocks of columns. Assume that every diagonal block Q ,i i = 1,2,... ,N-1 is nonsingular. Let N-1 Q - IN QNi P i where j1-Qlj Pi Then knowledge of Q-1 Q- I j = 1,2,... ,N-1 , and the original bump B ii 1 from which Q was derived is sufficient to re?resent Q Proof: Every elemen. of the matrix (, can be computed from the original elements of the bump B . We merely need to provide a procedure to compute Q- 1 b or cQ-1 for any vector b or c in terms of Q-, and the elements of the matrix Q other than Q Let Q = (0 i) b = (b i) , C = (ci ) , and x = (x i) i,j = 1,2,...,\' . Then x Q11b can be zomputed using i) zj Q1 (b - Qji z ) ii j i=l N--1 iii) x ~ Q-1 (b, J Q -1 Q xi) j=12..,- ii i J XN i J - 11 - and x = Q can be computed using S-1 - i) z (c - zi Qij j xN zi QiN N X(C4 x~ Q )Q iii) x = ic=- +x ij jj ] = 1,2,... ,N-1l i=j+1 To verify this, oe merely needs to reccgnize that the above procedures are equivalent to the ones developed im section 3 . Let the matrix 1 be the first (N-1)2 blocks contained in the north-west section of the matrix Q . Then the abo%.L two procedures soell out the computatlons -l -1 involving B1 b1 and c 1B or section 3 employing the b2ock lower triangular structure of the maxtix B1 . This concludes the proof of the theorem. The above theorem can be used again in the representation of -1 Qii whenever the matrix Q assumes the same 3BLT structure as the matrix Q . The exten:: to which one may want to take advantage of the recursive BBLT structure of the matrix Q is largely determined by the size of the matrices to be inverted, the additional overhead involved in keeping track of the nesting, and the extra coputations required to generate the representation of Q-1 It is important to ?oint out that Theorem 4.2 requires the assumption that every diagonal block Qii is arnsingular. In practice this condition is not necessarily satisfied. However, in the process of inverting a block Q , a caciiumn and row can be taken out and added to that part of the BBLT structure that does not contain the diagonal blocks. -12-. This will decrease the order of the block Qii by one, and increase the size of the border correspondingly. In Figure 2 , for instance, the inverse of the 5 x 5 block Q22 , determined by the intersection of rows and columns 5 through 9 , may not exist. If one of the 2 x 2 diagonal blocks of Q22 is singular, an appropriate choice of row (column) can be inserted in between rows (columns) 8 and 9 . If the border of Q causes the nonsingularity, inserting row (colum) 9 in between rows (columns) 28 and 29 is appropriate. If one does not want to take advantage of the BELT structure of , and an explicit representation -1 of Q is of interest, the elimination of a row (column) will always add to the border consisting of rows (columns) 29 and 30 . Hellerman and Rarick have reported in (4] that in practice very few interchanges of spikes within buzps art needed assuming the matrix to be inverted is nonsingular. 5. Summary and Conclusi -ns. In this pa?er wa have explored Li detail the structure of a general sparse matrix A as detected by the HR procedure. The follcwing approach has been suggested when computing x = A-1 b , or x - cA-1 1. Use Gaussian elimination for all variables x not belonging to a bump; 2. Use Kron's partiti-ning scheme for each bump Bi' B, F Bil B12 -1 where a) B il is lower triangular so that Bil -s readily available in product form, -13- b) Qi (Bi4 3 Bl Bi2) is in bordered block lower triangular form, 3. Use Kron's partitioning scheme for each matrix Qi Fil 12 Q = i Qi3 Q14 -1 where a) Q is block lower triangular so that Qil is available in block product form after nonsingularities on the diagonal blocks have been reived, -1 b) Q Q4 - 3 1 Qi2) is a full matrix to be inverted by any stable method. 4. Use step 3 recursively for every BBLT structure nested inside the diagonal blocks of Qil , making row and column permutations when needed. The resulting representation of the matrix A-1 has shown to be both compact and stable. Given the fact that most ideas in this paper have been explored before in the literature, the main contribution is Theorem 4.1 which identifies the nested BBLT structure of the matrices Qi . It paves the road for the decomposition of sparse diagonal blocks of A containing -1 many spikes. The result is a representation of Q that requirE: relatively little storage. References 1] Duff, I.S., and Reid, J.K., "A Comparison of Sparsety Orderings for Obtaining a Pivotal Sequence in Gaussian Elimination", J. Inst. Maths. Applics. 14 (1974), pp. 281-291. 2) Glaser, G.H.,and Saliba, M.S., "Application of Sparse Matrices to Aanlyticgl Photogrametry", in Sparse Matrices and Their Applications (see reference Ill]), pp. 135-146. 3] Harary, F., "A Graph Theoretric Approach to Matrix Inversion by Partitioning", Numer. Math. 4 (1962), pp. 128-135. ( 4] Hellerman, E., and Rarick, D., "Reinversion with the Preassigned Pivot Procedure", Math. Programming 2 (1971), pp. 195-216. 5] Hellerman, E., and TZarick, D., "The Partitioned Preassigned Pivot Procedure", in Sparse Matrices and Their Applications (see reference 111]), pp. 67-76. 61 Himmelblau, D.M. (Editor), Decomposition of Large-Scale Problems, North-Holland Publishing Company, 1973. 7] Kerorkian, A.K., and Snoek, J., "Theory and Applications of Structural Analysis in Partitioning, Disjointing and Con- structing Hierarchical Systems", in Decomposivion of Large- Scale Problems (see reference [6]), pp. 467. 8] Kron, G., Diakoptics, MacDonald, London, 1956. 9] Reid, J.K. (Editor), Large Sparse Set. of Linear Equations, Academic Press, New York, 1971. [10] Rose, D.J., and Bunch, J.R., "The Role of Partitioning in the Numerical Sclution of Sparse Systems", in Sparse Matrices and Their Applications (see reference [11]), pp. 177-187. [11] Rose, D.J., and Willoughby, R.A. (Editors), Sparse Matrices and Their Applications, Plenum Press, New York, 1972. [12] Spillers, W.R., "Applications of Graph Theory to Decomposition Prcblems of Struzctures", in Decomposition of Large-Scale Problems (see reference [61), pp. 285-305. (13] Steward, D.V., "Partiticning and Tearing Systems of Equations", J. Siam Numer. Anal. Ser. B 2 (1965), pp. 345-365. [14] Tewarson, R.P., Sparse Matrices, Academic Press, New York, 1973. (15] Willoughby, R.A., "Sparse Matrix Algorithms and Their Relation to Problem Classes and Computer Architecture", in Large Sparse Sets of Linear Equations (see reference [9]), pp. 255-277. A-1 Appendix: A verification of results not formally proven in the paper. A; Let the matrix A be in blocked lower triangular form where every block Aij is of size qi x q. , i,j - 1,2,...,n . The matrix -1 -1 A exists if and only if the matrices A exist for i = 1,2,...,n ii Proof: Let A-1 exist, but assume that for at least one i , the matrix Aii is singular. Then there exists a vector a # 0 such that i Aii at = 0 . Let p = X . Then the p x p matrix A(p) contained Z=1 in the northwest corner of A is singular. To verify this, let the last q. elements of the p x 1 vector a be equal to a, and let all other elements be equal to zero. Since Aki = 0 for k > i , we have that A(p) u = 0 , a # 0 . As A kZ 0 for k = 1,2,...,i , Z > i , it follows that the first p rows of the matrix A form a linearly dependent set. This contradicts the invertibility of A . -1 Next assume that A is singular, but that AI exists for every ii i = 1,2,...,n . Then there exists a vector a # 0 such that Aa = 0 . The first q1 ele:ents of a must be zero since A is nonsingular. Similarly, the following q2 elements of a must be zero since A22 is nonsingular. This reasoning can be repeated for every diagonal block Ai , i = 1,2,...,n This contradicts the assumption that a ' 0 B: For the partitioned matrix B [ 1 ] h1 2 B h the matrix B-1 has elements А-2 g1 = В11 + в11 g2 Q-1 вз вi1 � Вг ° _ в11 g2 Q 1 : s3 = _ Q 1 вз в11 1 апд �4 = Q-1 � where Q - В4 - Вз В11 Вг . Proof: We пеед to verify that В В-1 = I. i) В1 s1 + В2 s3 = В1 (Bi1 + вi1 в2 Q-1 вз вi1) + в2 (- Q-1 вз в11) = I+ В2 Q-1 ВЗ В11 - Вг Q-1 ВЗ В11 = г ii) в1 s2 + в2 s4 = в1 �- вiг в2 Q-1� + в2 Q-1 � _ g2 Q-1 + в2 Q-1 = 0 iii) Вз 81 + В4 63 = Вз iB11 + в11 В2 Q-1 вз Bi1) + в4 i- Q-1 Вз В11) = вз в1I + вз в11 в2 Q'1 вз в11 - в4 Q' вз вi1 = вз в2j -(в4 - вз в11 вг) Q-1 вз в11 � -1 _ `1 вз в1 вз в1 � р л-з iv) Вз 6г + В4 84 = Вз (- В11 В2 Q-1) + в4 Q-1 _ (В4 - Вз s1i В2) Q-1 = г . С: The solution of В1 В2 Х1 Ы 3 ВЗ В4 к2 Ъ2 is computed by the follaving two steps: i) хг = Q-1 (Ъ2 - Вз В11 Ъ1) , ii) х1 = В11 (Ъ1 - В2 х2) . Proof: В1 х1 + В2 х2 = Ы Ьу construction. Вз х1 r В4 хг = ВЗ (В11 h1 - В11 Вг хг) + В4 х2 -1 = вз в1 Ъ1 + СВ4 - Вз в11 В2)к2 -1 -1 = Вз В1 Ы +(В4 _ В� 311 g2) Q-1 �b2 - Вз В1 Ы) = Ъ2 А-4 D: The solutlon of В1 В2 С х1 х2 1 В В � с1 с2 � 3 4 is computed Ъу the following two steps; 1) хг ° ic2 - с1 В11 Вг) Q-1 ii) х1 = (с1 - хг Вз) В11 ?roof: х1 В1 + х2 Вз = с1 Ьу constructlon. х1 В2 + х2 В4 = ic1 В11 - хг Вз В11) Вг + х2 В4 � с1 g11 В? �+ х,, (В4 - Вз Р11 Б2) = с В-1 В +(с - с В-1 В) Q-1 (В - В В-1 В) 1 1 2 2 1 1 2 4 3 1 2 � с2 Е: The eolution of �411 о о ---- 41х �х1 -ъ1 421 Q22 0 ---- Q2'.3 х2 ь2 4Э1 Q32 �33 ---- 43N х3 ь3 г : : : : : : 1 ач1 Q�2 �чз ---- Qчч xrr ьх A-5 is computed by the following three steps i) z Q (b J-1 Qji zi) N-1 ii) x, (b, Q z Ni i J-1 ii i JN N Qji i S-1 Here, QN, Q P Ni i where Q- 1 tQ, Q P J=1,2,...,N-1 ji ji Proof: The above method is equivalent to the partitioning scheme used in part C of this appendix. To verify this, let B 1 be the first N-1 row and column blocks of the matrix Q ; let B 4 QNTI ; and let B 2 and B 3 consist of the remaining columns and rows respectively. Then the above three steps merely spel- out the two steps used in part C F: The solution of Qll 0 0 - - - - Q L Q21 Q22 0 - - - - Q2N XN Q31 Q32 Q33SIN [Cl C2 CN Q Q Ni N2 ON3 - - - - QNN is computed by the following three steps N-1 -1 z i (c i - I z i Qij) Q ii i=i+l A-6 N-1 ii) XN = (cN - Zzi QiN)Q x c N Q)Q-1 J=1,2,. .. ,N-1 iii) xj = (c - xiQij)Qjjj1,..N- i= j+1 Proof: The reasoning used in section E applies also to this section. We refer back to the partitioning scheme used in part D of this appendix.