DETECTING OR VERITING SPARSETT STRUCTURE I2? LARGE SCALE NONLINEAR PROGRAMMING byk Arne Drud Technical Note No. 14 Research Project 671-58 August 1979 Development Research Center World Bank 1818 H Street, N.W. Washington, D.C. 20433 The views expressed in this paper are those of the author and do not necessarily reflect those of the World Bank or its affiliated o-rganizations. Detecting or Verifying Sparsety Structure in Large Scale Nonlinear Programming Arne Drud Development Research Center World Bank 1818 H Street, N.W., Washington, D.C. 20433 0. Abstract This paper describes some algorithms for the following problem: Given a nonlinear function defined as a black box, find which variables the function actually depends on (the set of active variables) and for each of these variables test if the relation is linear or nonlinear. Alternatively, verify if a given set of active variables and/or a given linearity pattern is correct. The algorithms are based on group testing and binary search and they are especially efficient when the set of possible variables is large. 1. Introduction In Nonlinear Programming the user is often asked to define his problem by writing a subroutine that for given input variables evaluates.one or more of the nonlinear functions and returns their values. We assume that no der atives are supplied by the user. They must be computed numerically when needed. In large scale problems it will often be advantageous to know the sparsety pattern of the different functions, i.e. which variables enter actively in each function, because this information can be used to decrease the core storage requirements, speed up many computational procedures in the optimization routine, and - just as important in many cases - it can decrease the work required to compute numerical derivatives considerably. Much work can also be saved if we can distinguish linear variables from nonlinear. - 2 - It is of course possible to ask the user to describe the sparsety and linearity pattern in the input to the optimization code. However, most users don't like to be asked the same question two times - first write a subroutine and then describe what the subroutine contains. Therefore, this paper is concerned with the problem of detecting the sparsety and linearity pattern of a function when only function values can be computed. The research described in this paper was motivated by the following practical problem. A 228 equation econometric model was defined through more than 35 FORTRAN subroutines. The residuals of each equation could be computed separately. The set of optimization variables contained approximately 250 variables per time period and since each variable could have up to 19 lags the total number of potential variables was 5000. Evaluatinv all functions more than 5000 times each would be very expensive, so a structured search method was needed. The paper is organized as follows: in section 2 the problem is defined. In section 3 we discuss the problem of detecting if a specific variable is active in the expression, and we introduce the important concept of group testing known from statistics. The ideas from binary search are then used in section 4 to develop a fast method for identifying the set of active variables of an expression from the set of all variables. A method foi testing if we really found all active variables is developed in Section 5, and we discuss how this method also can be used to test the correctness of userspecitied structural information. The problem around detecting linearity is discussed in section 6, and in section 7 we return to our original dynamic problem, and some modifications to the static procedures are given. rhe conclusions are drawn in section 8. 2. Problem Definition Assume that a function f(x) is given through a FORTRAN subroutine. The subroutine has been compiled and we have only access to the compiled ver- sion. x is an n-dimensional vector of optimization variables. We assume that f(x) is defined for all x that satisfy the bounds a < x < 8, that the function is differentiable with continuous first derivatives (although this is not an important assumption), and that n is a large number but f only depends on a small but unknown number p -.f all the variables. We call these variables the active variables, and tne rest of the variables are called passive vari- ables. The problem is to find the p active variables and test if they enter linearly. 3. Detecting Active Variables The algorithm described in this paper depends on testing whole groups of variables at a time. The idea has earlier been used in quality control where it is called group testing, see Dorfmann [1] and Sobel et.al. [2]. Dorfman was testing blood samples for the presence of a syphilitic infection in a two level procedure. The blood samples were divided in halves and first k half-samples were mixed and the mixture was tested. If the mixture didn't show the infection all blood samples were declared ck, and if the mixture did show an infection the other halves of the mixed blood samples were tested incividually. In [2] Sobel et. a]. describes a more general but also rather complicated procedure with more test levels. The algorithm suggested in section 4 is based on the following obser- vations and assumptions: - 4 - a. if x is a passive variable then changing xi by any amount will still give the same function value, i.e. the machine representa- tions will be equal for all bits. b. If the variables x all are passive for i in some set I, then changing all the xi variables by any amount will still give the same function value. c. If x. is an active variable and x. is changed by some amount &x. 0, then it is "unlikely" that the function will give the same value on all bits. d. If the variables xi, i cI for some set I, are all-changed by nonzero amounts and if f depends or at least one of the x. - variables, then it is "unlikely" that the function will give the same value. e. If x. is an active variable and if xi is changed from m different base points the "probability" that a change in f is not detected will decrease with increasing m. f. Let the function value in a base point be f . The variables 0 corresponding to the two disjoint index sets A and J are both changed and the new function value is f l f0. When only the variables in the index set I are changed by the same amount as before the function value is f2' fl. If f0 = 2 it is not likely that there are any active variables in the set I, but there are at least one active variable in the set J. - 5 - f2. If f l 2 the .-e at least one active variable in the set I, and it is not .Xely that there are any active variables in the set J. f3. If f2 is different from both f0 and f then both index set I and index set J contains active variables. We note that it is easy to identify a variable as being active. However, it is more difficult to identify a variable as being passive because we could experience the "unlikely" event of an active variable that did not cause any change in f. The following examples show that 'unlikely" events in some cases can have a significant probability. Examole 1: f(x) = max (0, X - 1 < z < 1. If we take two random 1 -1 x-values from the feasible set the probability that f(x) will be the same is 1/4. Changing x1 from the same base point n times will decrease the -n-1. probability to 2- 3 Examole 2: f(x) = x max (0, X ), - 1 < x. < 1. If we change x, at rancom from a random base point the probability that we will detect x1 as an active variable is 1/2, independent of the number of changes made to x1. With n random ba.epoints the probability of detecting x is 2-. Although the examples show that unlikely events can happen, we propose an algorithm that disregard these events in a first phase and then makes a "clean up" in a second phase. -6- 4. The Basic Algorithm We start by defining the following notation: 1. xb<->xa (i1,i2) means that the values in the two vectors x and x are interchanges for the indices in the interval from i to i incl. 1 2 2. xb,(xa ( 2) denotes an n-vector whose coordinates have values taken from the vector x b except for indices in the interval from i1 to 12 incl. where the values are taken from Xa' 3. rXO is the largest integer less than or equal to X and LX is the smallest integer greater than or equal to X. The algorithm for finding active variables is based on the simple fact that if f(xb) # f(xa) then for some i f(xb)xa(1,i-1)) / f(xb,xa(1,i)) and x. is an active variable. The points in the sequence xb,xa(1,j) differ from each other only in one coordinate and they can be thought of as a path from x to xa. If there is only one active variable between the variables changed from xb to xa than f(xb,x a(1,j)) is equal to f(xb) for j < i and equal to f(x ) for j>i. I a- The algorithm now identifies points with different function values and makes a binary .:arch over the indices to find when f-(x x (i, i-1)) '. bp a f(xaxb((1,i)). If a new function value different from fkxb) and f(xa) shows up during the search in say f(xb,xa((1,j)) then there must be an active variable between the variables 1 to j and another between the variables j+l to n, and the search is cut in two. - 7 - A depth first strategy is used to limit the amount of storage neces- sary during the search. With this strategy we will need two vectors of length n, x and x , (plus the bound vectors), three vectors of lengtn no more than log2(n), rb, lb, and fh, plus three -vectors of length p, the number of active variables, for the linearity tests and the indices. x and x will contain two differont points. rb(d) contai:,s the right index of the interval searched 1n level d of the search. lb(d) = 0 if one of the halves of the interval in level d was assum,ed v;pty, and if 1b(d) t 0 it contains the left index of that part of the interval that has n.t y-t been searched in level d. fh(d) contains the function value f(x .x (1,rb(d))) for each level d that is o a divided into two nonempty halves. The main part of the search is performed in a subroutine whose inpu; is two vectors yb and y a, two distinct function value- f0 and fl, and two indices i and i2 . b and Ya are the basepoint and the alternative point respectively after the procedure x <->xa(i2) has been performed, i.e. b. >a(l)2)hsbepefmd,i. y= Xb,X (1,12) and ya x,Xb(,i2) f0 = f(Xb,xa(1,l1)), fl j f(x a n i1 and i2 are the endpoints of the index interval that is going to be searched. The steps in the FIND (yby a fO5f11 2) subroutine are: - 8 - 1. d: = 0; if (i1 = i2) go to d6; 2.d]: d:=d + 1; rb (d): = i2 i 3 L(ii + i2)12j +1; 3. Y >a (i3i 2); 2 3 - 1; 4.d2: f2 = b if (f2 ) go to d3; 6. 1b(d): = 0; i : = i 2 =: rb(d); 7. if (ii = i2) yo to d5; 8. d:=d + 1; rb (d): = 2 2 L 1 + 12 ) 2 9. Yb < y 1 2); go to d2; 10.12: if (f. t f ) go to da; 11. Ib(d):=0; if (i = 12) go to d5 else go to dl; 12.d4: 1b(d): =i2 + 1; fh (d):= fl, f 2 12 13. if (i = i2 go to d5 else go to dl; 14.iJ5: Yb < >yå (l'i 1) 15.d6: Save variable no. i as an active variable and make 16. a prelminary test for linearity, see section 6; 17.d7: If (d=0) return; 18. If (1b(d)=0) go to d8; 19. i i:= ib(d); lb(d):=0, 1 := rb(d); f0 :=f1; f1:= fh(d); 20. if (i1 = 2 go to Ud7 21. i2 (i + i2)12 ; y <-> ya (i, ); go to d2; 21 2 1 + 2 b a l'2 22.d3: i3:= rb(d); if (i3 >2) Yb <> ya 2 + 1, i3 2 3 23. d: = d - 1; go to d7; -9- The following coiTments should make it easier to understand the pro- 1. i2 and yb are always maintain such that y, :: xba(1i2) In line 2-3 i2 is decreased, and in line 6-8, 19-21, and in line 22 12 is increased and Yb is in all cases updated at once. 2. i1 and i 2 indicates the current search interval. If iI = i2 we are testing a single variable and if it is active we save it in "go to d6". 3. In line 2-3 the current inerval is cut in twoZ and the 'eft end of it is searched. 4. Line G-9 correspond tD the case :-hore f(xb,xa(1Il f(xb,xa(1,i2)) / f(x.,xa(1,rb( ')) such that the interval from il to i2 is assu-ed to be empZy onc the search is continued with the interval from i2 + 1 to rb(d). 5. In line 11 f(xb ,x a(1,i l-1))} F f(xb)xa(I2 (xb,xa(1,rb(d)) such that the intervai frog i2 + 1 to rb(d) is assu-ed to be empty and the search is co-tinued in line 2 and 3 with the interval from i1 to i2* 6. In line 12-13 f(xb,xa(,il- 1) 1 f(xbx a(1.i 2 f(xb,xa(1,rb(d)) and the right interval from i2 + 1 to rb(0) is saved (lb(d) = 12 + 1) w-ile the left interval from i1 to 12 is divided in two ind searched iicediately in line 2-3 unless 11= 2 in which case i1 is an active variable and it is saved. 7. The lines from 17 to 23 are a backtrack routine. If d = 0 we are back at the top level after all levels have been searched, and we are finished. - 10 - 8. In line 19-21 lb(d) i 0 which reans that the vecond half of level d has not yet been searched, see point 6 above. Restore the interval from i = lb(d) to i2 = rb(d) as well as f0 f(xb'a ( 1-1)) an i = lb' a 2), divide the interval if i 12 , update Yb and start searching agair. 9. In line 22-23 1b(d)= 0, i.e. the interval on level d has been searched co-ple-.ely. 12 is updated to the point to the riglht end of the interval in level d, and if 2 is incased v is updated at once. 10. Since the irter.is are alw-ys 3lved the nun-ber )f levels is al,o,ays less t - e al to lo g(n, . At cach level a maxiua of p differert. interva-s will be considered where p is the nur.ber of active ve r, bes we are girg to idern.ify in the search. Åt the first levels the nuner of intervals will often be less than p and at least lets than 2d. Since it only requires one function evalgtion to- go from one level *o the next the number of function e-alations in FIND is less than or equal to p . Flog2(n)l . Even for large p the search is efficient and it will never use rore than n+1 functioi evaluations including the two that 4,ere supplied as input. The search is initialized thrcucc the folLowing steps: Choose at random two points xb, the base point, an-4 x a, the alternative point, such that the bounds are satisfied and xbi ai, e.g. abs(xbi ~ xi) > j.l (Bi -CK). bi~~ ~ ~ 'j8, -.ab(*i i - 11 - Compute f0 = f(xb). xb<->x a(1,n) Ccmpute f 1 f(xb). if (fl 0) call FINO with ya I xa' b = xb, 1 1 2 = n, 0 0, and fl =f 5. A Clean-uo Procedure for the Passive Variables The purpose of the clean-up procedure is to find any active variables that were not found during the first search. The clean-up procedure is based on the fact used several times already that changing d passive variable does not change the function value. The steps in the procedure are: 1. For ibase = 1 to nbase perform step 2 to 7. 2. Choose a random base point, xb, within the bounds and compute f0 = f(xb)' 3. For ialt = 1 to nalt perform step 4 to 6. 4. Change all passive variables by some random amount and compute the function in the new point xas = f(xa). 5. if (f #f2) go to 9. 6. End of ialt-loop. 7. End of ibase-loop. 8. Stop 9. Call FIND with y b b = xa, i = 1, 12 n, f0 0,p and fl f 10. Go to 1. If nothing unexpected happens, i.e. if no new variables are recog- nized in step 5, the procedure stops after nbase - (nalt + 1) function evalua- tions. If a change in the function is found when it was not expected the binary search described in last section is used to 'lind at least one inCex i where f(x bxa(1,i-1)) i f(xb'xa(1,)). This will require no more than pl.rlog2(n function evaluations were p12. is the number of new active variables found. After the search procedure we start from the beginning of the clean-up procedure. Note that it is not necessary to take special care of the active variables found earlier in the second search since they all have the same values in x and x b* The clean-up procedure described above can also be very useful in an optimizatiGn procedure where the user defines the sparsety structure. The procedure :an be called aftar the input has been read to verify if the sparsety structure described by the user is correct. The initial set of active variablis is simply taken directly from the input, and if no new active variables are found in the clean-up procedure the input is assumed to be correct. A modificaticn of the procedure can also be used during the opti- mization to find if we have moved into an area of the feasible region whiere a variable that earlier seemed passive has become active. Only one base point - the current optimizatior point - should be used, and the alternative point should be close to the base point. The procedure could f. ex. be called if something in the optimization procedure behaves "unnormal". 6. Detecting Linear Variables It is more difficult to divide the active variables into linear and nonlinear variables because we can no longer test for equality but have to consider tolerances due to rr-qd-off errors in the f-subroutine and in other computational procedures. We define a variable x. as being linear iff f(x) = g (x1,...,x-1, x. x) + a x. + E for all x with a. independe.nt of x and E < E 1+1n 1 - .ax where Ema is a tolerance level. - 13 - As with active-passive variables it is easy to recognize a variable as being nonlinear while it is more difficult to prove that a variable is linear. We will again propose a fast first search followed by a clean-up procedure to catch any mistakes in the first search. Our first approach was to use a linear regression method based on sufficient statistics, and test the sum of squared residuals against E ma* However, the numerical accuracy of the sum of squared residuals was rather bad so a simpler and less sensitive approach is suggested. When a variable is recognized as being active we have f0 = f(xb,xa(1,i-1)) and f f(xb,xa(1,i)) available. Based on these two function values and the corres- ponding xbi and xai we compute ai = (f f0)/(xbi - xai), deltai 2E max / fxbi - x,i| and we store a = a.+delta. and a 2i = a.-delta . Each time a new pair of -points, differing only in the i'th component, is available new values for a. and delta. are computed and we update a ,j = min 1 11 i (a ,a.+delta.) and a2, = max(a ,a.-delta. If a 1 < a2 after 1,i 1 1 2,ia2,a 1 ) 1, 2 fte the update the slopes were not consistent and the variable x. is labelled non- linear. The first test for linearity proceeds as follows: 0. Label all active variables "possibly linear". 1. For ibase = 1 to bmax perform step 2 to 8: 2. Choose a random base point x b within the bounds and compute f(xb). 3. For each variable x. labelled "possibly linear" perform step 4 to 7: 4. For ialt = 1 to amax perform step 5 to: - 14 - 5. Change xi by some random but reasonably large amount still keeping within the bounes, compute f(x), update a li and a2,i and test for linear4ty. If the test fails label x. nonlinear and go to 7. 6. end of ialt - loop. 7. end of x. - loop. 8. end of ibase-loop. The inner loop runs over different steplengths and if a nonlinearity suddently shows up the variable is skipped at once such that no more function evaluations have to be done for this variable. amax is suggested to be 1 or 2. The outer loop runs over different base points which means that the function evaluation in the base point and the random numberm generated are used effi- ciently. bmax is suggested to be as low as 2 or 3 because the clean-up procedure used afterwards will make a cheaper test of all possibly linear variables at once. The Clean-ur procedure for the linear variables has the following steps: 1. If the number of linear variables = 0 stop. 2. For ibase = 1 to bmaxl perform step 3 to 8: 3. Choose a random base point, xb, within the bounds and compute f = f(xb)' 4. For itest = 1 to tmax perform step 5 to 7: 5. Generate random changes for all linear variables such that xa = xb + &x remain within the bounds and compute: f = f(x ), - 15 - f = f0 + 2*Emax + Ar a1. .x a 0 a 2,i.&X max o ax xlo 1,i x< , f min= f - 2*Emax + r a .xi + A al i.x 6. If (f1 < fmin) or (f1 max) go to 10. 7. End of itest - loop. 8. End of ibase - loop. 9. Stop. 10. Initialize the search for a nonlinear variable: fmax o +2-Emax ;fmin fo 2Emax* 11. For i = 1 to number of linear variables perform step 12 to 14: 12. x bi : = x .; compute f = f (xb); bT ai 1o if &x > 0 : f := f + a .Ax and f = f + a .x i - max max 1, i min min 2,i i if dx< 0 : f~ := fma+ a91 .dx and fmi = f i + a11 . if (f> fi ) and (f f )max to to 14. 13. Variable i is labeled nonlinear. Reduce the number of linear variables by one. If the number is zero then stop else go to step 2. 14. End of i - loop. If the test is satisfied in step 6 there is a nonlinear variable between the variables that we believed were linear. In step 10 to 14 we search for this nonlinear variable using the same values as in step 5 and 6 for x.o and dx. Since the number of linear variables probably is small we have used a linear search. A binary search like the one in section 4 could also be used if the number of variables to search is large. Note, that we make a path from xb to xa changing only one variable at a time, so if the linearity test in step 6 failed we will certainly find a nonlinear variable in the test in step 12. - 16 - 7. Functions in Dynamic Problems. We will now return to the problem that originally started this research - a large dynamic econometric model. The functions in the model can be described by the expression: f(xt, xt-1,..., x t-r ti Yt-1 ' Yt-rl g) where xt-i denotes the vector of optimization variables lagged i periods, g is a vector of fixed parameters and yt, yt-1, ... are exogenous variables, i.e. time dependent parameters. We are interested in the set of active x - variables for all lags incl. zero, and we would like to distinguish between linear variables with time independent coefficients and other active variables. Other optimization codes may take advantage of a more detailed description of the active variables but we will not consider this problem here. We assume throughout this section that we want to find the structural pattern independent of the actual values of g and y for g and y within some predefined bounds. We also assume that the structure of f does not depend on t. The first problem is to identify the active variables. We can use the procedure described in section 4 on the, set of n*(r+1) variables which will require approximately pl. (10g2(n) + log2(r+1)) function evalua- tions. p1 is the total number of active variables and n is the number of variables per time period. However, when we have apriori information on the structure, binary search is no longer optimal. In econometric models the variables entering with different numbers of lags will often be the same. We will therefore propose the following procedure: - 17 - 1. Find the active unlagged variables using the fast procedure once. All lagged variables, exogenous variables and parameters are chosen at random and then Kept fixed for the whole search. 2. For i1ag = 1 to r perform step 3 to 3. Test all variables that were active with a lag between ilag - mlag and ilag-1 by changing the variable and recomputing f. mlag is a "nemory"-parameter that depends on the type of model. 4. Apply the clean-up procedure on all variable. lagged ilag periods. Use the same basepoint and only one ilternative point. 5. If no active variables were found in step 3 or 4 go to 7. 6. End of ilag - loop. 7. Apply the clean-up procedure on the whole set of x - variables. Each time a base point is chosen both g, y and x are chosen at random. In step 3 we will find all variables that appeared not too many lags ago using only one function evaluation per variable. If the number of variables we must test is less than log2(n) times the number of variables we find, the method is better than binary search. New active variables are found afterwards in step 4 using the ordinary clean-up and binary search procedures. When we don't find any active variables at all for a certain lag we assLme there are no more and go directly to the final clean-up. The linearity test is similar to the test in section 6. However, we have to make some changes to take care of the case where a coefficient depends on g. First of all we add an extra outer loop over some random g-points. For each new g-vector we initialize the lower and upper bounds on the slope. Also, both x and y are changed with each new base point so if a derivative depends on y the variable will be classified as nonlinear. - 18 - 8. Conclusions The paper has shown that based on group testing principles we can detect the structure of a nonlinear function, i.e. which out of a large set of variables actually enters into the definition of the function and which variables are linear and which nonlinear, in a fairly small number of function evaluations. The algorithms described in the paper opens up for less detailed input descriptions than earlier for large scale sparsety based nonlinear optimization codes, or alternatively for cheap methods for testing the sparsety patterns supplied by the user. Many related problems are still open. E.g., in many cases the problem functions are not supplied one by one as assumed here, but all func- tions are computed at one time and returned as a whole vector. There seems to be no obvious generalization of the algorithms described here such that the structure of the total Jabobi matrix can be found fast. 9. References [1] Dorfman, R.: "The detection of defective members of large populations', Annals of Math. Statistics, vol 14, 1943, pp 436-440. [2] Sobel, M. and Groll, P.A.: "Group testing to eliminate efficiently all defectives in a binomial sample", Bell Syst. Tech. Journal, vol 38, 1959, pp 1179-1252.