DETECTING OR VERITING SPARSETT STRUCTURE
I2? LARGE SCALE NONLINEAR PROGRAMMING
byk
Arne Drud
Technical Note No. 14
Research Project 671-58
August 1979
Development Research Center
World Bank
1818 H Street, N.W.
Washington, D.C. 20433
The views expressed in this paper are those of the author  and do not
necessarily reflect those of the World Bank or its affiliated o-rganizations.


Detecting or Verifying Sparsety Structure in
Large Scale Nonlinear Programming
Arne Drud
Development Research Center
World Bank
1818 H Street, N.W., Washington, D.C.   20433
0. Abstract
This paper describes some algorithms for the following problem:
Given a nonlinear function defined as a black box, find which variables the
function actually depends on (the set of active variables) and for each of
these variables test if the relation is linear or nonlinear. Alternatively,
verify if a given set of active variables and/or a given linearity pattern is
correct. The algorithms are based on group testing and binary search and
they are especially efficient when the set of possible variables is large.
1.   Introduction
In Nonlinear Programming the user is often asked to define his
problem by writing a subroutine that for given input variables evaluates.one
or more of the nonlinear functions and returns their values. We assume that
no der  atives are supplied by the user. They must be computed numerically
when needed.
In large scale problems it will often be advantageous to know the
sparsety pattern of the different functions, i.e. which variables enter
actively in each function, because this information can be used to decrease
the core storage requirements, speed up many computational procedures in the
optimization routine, and - just as important in many cases - it can decrease
the work required to compute numerical derivatives considerably. Much work
can also be saved if we can distinguish linear variables from nonlinear.


- 2 -
It is of course possible to ask the user to describe the sparsety
and linearity pattern in the input to the optimization code. However, most
users don't like to be asked the same question two times - first write a
subroutine and then describe what the subroutine contains. Therefore, this
paper is concerned with the problem of detecting the sparsety and linearity
pattern of a function when only function values can be computed.
The research described in this paper was motivated by the following
practical problem. A 228 equation econometric model was defined through more
than 35 FORTRAN subroutines. The residuals of each equation could be computed
separately. The set of optimization variables contained approximately 250
variables per time period and since each variable could have up to 19 lags the
total number of potential variables was 5000. Evaluatinv all functions more
than 5000 times each would be very expensive, so a structured search method
was needed.
The paper is organized as follows: in section 2 the problem is
defined. In section 3 we discuss the problem of detecting if a specific
variable is active in the expression, and we introduce the important concept
of group testing known from statistics. The ideas from binary search are
then used in section 4 to develop a fast method for identifying the set of
active variables of an expression from the set of all variables. A method
foi testing if we really found all active variables is developed in Section 5,
and we discuss how this method also can be used to test the correctness of
userspecitied structural information. The problem around detecting linearity
is discussed in section 6, and in section 7 we return to our original dynamic
problem, and some modifications to the static procedures are given. rhe
conclusions are drawn in section 8.


2.   Problem Definition
Assume that a function f(x) is given through a FORTRAN subroutine.
The subroutine has been compiled and we have only access to the compiled ver-
sion. x is an n-dimensional vector of optimization variables. We assume that
f(x) is defined for all x that satisfy the bounds a < x < 8, that the function
is differentiable with continuous first derivatives (although this is not an
important assumption), and that n is a large number but f only depends on a
small but unknown number p -.f all the variables. We call these variables
the active variables, and tne rest of the variables are called passive vari-
ables.
The problem is to find the p active variables and test if they enter
linearly.
3.   Detecting Active Variables
The algorithm described in this paper depends on testing whole groups
of variables at a time. The idea has earlier been used in quality control
where it is called group testing, see Dorfmann [1] and Sobel et.al. [2].
Dorfman was testing blood samples for the presence of a syphilitic
infection in a two level procedure. The blood samples were divided in halves
and first k half-samples were mixed and the mixture was tested. If the mixture
didn't show the infection all blood samples were declared ck, and if the mixture
did show an infection the other halves of the mixed blood samples were tested
incividually. In [2] Sobel et. a]. describes a more general but also rather
complicated procedure with more test levels.
The algorithm suggested in section 4 is based on the following obser-
vations and assumptions:


- 4 -
a.   if x  is a passive variable then changing xi by any amount will
still give the same function value, i.e. the machine representa-
tions will be equal for all bits.
b.   If the variables x  all are passive for i in some set I, then
changing all the xi variables by any amount will still give the
same function value.
c.   If x. is an active variable and x. is changed by some amount &x.   0,
then it is "unlikely" that the function will give the same
value on all bits.
d.   If the variables xi, i cI for some set I, are all-changed by
nonzero amounts and if f depends or at least one of the x. -
variables, then it is "unlikely" that the function will give
the same value.
e.   If x. is an active variable and if xi is changed from m different
base points the "probability" that a change in f is not detected
will decrease with increasing m.
f.   Let the function value in a base point be f . The variables
0
corresponding to the two disjoint index sets A and J are both
changed and the new function value is f l  f0. When only the
variables in the index set I are changed by the same amount as
before the function value is f2'
fl. If f0 = 2 it is not likely that there are any active variables
in the set I, but there are at least one active variable in
the set J.


- 5 -
f2.  If f l   2 the    .-e at least one active variable in the set
I, and it is not .Xely that there are any active variables in
the set J.
f3.  If f2 is different from both f0 and f then both index set I
and index set J contains active variables.
We note that it is easy to identify a variable as being active.
However, it is more difficult to identify a variable as being passive because
we could experience the "unlikely" event of an active variable that did not
cause any change in f.
The following examples show that 'unlikely" events in some cases
can have a significant probability.
Examole 1: f(x) = max (0, X     - 1 < z < 1. If we take two random
1   -1
x-values from the feasible set the probability that f(x) will be the same
is 1/4. Changing x1 from the same base point n times will decrease the
-n-1.
probability to 2-
3
Examole 2: f(x) = x     max (0, X  ), - 1 < x. < 1. If we change x, at
rancom from a random base point the probability that we will detect x1
as an active variable is 1/2, independent of the number of changes made
to x1. With n random ba.epoints the probability of detecting x   is 2-.
Although the examples show that unlikely events can happen, we
propose an algorithm that disregard these events in a first phase and then
makes a "clean up" in a second phase.


-6-
4.   The Basic Algorithm
We start by defining the following notation:
1.   xb<->xa (i1,i2) means that the values in the two vectors
x  and x are interchanges for the indices in the interval from
i  to  i  incl.
1     2
2.   xb,(xa (  2) denotes an n-vector whose coordinates have values
taken from the vector x b except for indices in the interval from
i1 to 12 incl. where the values are taken from Xa'
3. rXO is the largest integer less than or equal to X and
LX is the smallest integer greater than or equal to X.
The algorithm for finding active variables is based on the simple fact
that if f(xb) # f(xa) then for some i f(xb)xa(1,i-1)) / f(xb,xa(1,i)) and
x. is an active variable. The points in the sequence xb,xa(1,j) differ from
each other only in one coordinate and they can be thought of as a path from
x to xa. If there is only one active variable between the variables changed
from xb to xa than f(xb,x a(1,j)) is equal to f(xb) for j < i and equal to
f(x ) for j>i.                                          I
a-
The algorithm now identifies points with different function values
and makes a binary .:arch over the indices to find when f-(x x (i, i-1))
'. bp a
f(xaxb((1,i)). If a new function value different from fkxb) and f(xa) shows up
during the search in say f(xb,xa((1,j)) then there must be an active variable
between the variables 1 to j and another between the variables j+l to n, and
the search is cut in two.


- 7 -
A depth first strategy is used to limit the amount of storage neces-
sary during the search. With this strategy we will need two vectors of
length n, x  and x  , (plus the bound vectors), three vectors of lengtn no
more than log2(n), rb, lb, and fh, plus three -vectors of length p, the number
of active variables, for the linearity tests and the indices. x    and x will
contain two differont points. rb(d) contai:,s the right index of the interval
searched 1n level d of the search. lb(d) = 0 if one of the halves of the
interval in level d was assum,ed v;pty, and if 1b(d) t 0 it contains the left
index of that part of the interval that has n.t y-t been searched in level d.
fh(d) contains the function value f(x .x (1,rb(d))) for each level d that is
o a
divided into two nonempty halves.
The main part of the search is performed in a subroutine whose inpu;
is two vectors yb and y a, two distinct function value- f0 and fl, and two
indices i and i2 .    b and Ya are the basepoint and the alternative point
respectively after the procedure x <->xa(i2) has been performed, i.e.
b. >a(l)2)hsbepefmd,i.
y= Xb,X (1,12) and ya        x,Xb(,i2) f0 = f(Xb,xa(1,l1)), fl j
f(x      a      n  i1 and i2 are the endpoints of the index interval that
is going to be searched. The steps in the FIND (yby a fO5f11   2) subroutine
are:


- 8 -
1.        d: = 0; if (i1 = i2) go to d6;
2.d]:     d:=d + 1; rb (d): = i2   i 3      L(ii + i2)12j +1;
3.         Y     >a (i3i 2);   2    3 - 1;
4.d2:      f2 =    b
if (f2     ) go to d3;
6.         1b(d):  =  0;  i  :  =  i  2   =:  rb(d);
7.         if (ii = i2) yo to d5;
8.         d:=d + 1; rb (d): =  2   2     L 1 + 12 )    2
9.         Yb <   y      1 2); go to d2;
10.12:     if (f. t f ) go to da;
11.        Ib(d):=0; if (i = 12) go to d5 else go to dl;
12.d4:     1b(d): =i2 + 1; fh (d):= fl, f
2                  12
13.        if (i = i2   go to d5 else go to dl;
14.iJ5:    Yb  < >yå (l'i 1)
15.d6:     Save variable no. i  as an active variable and make
16.        a prelminary test for linearity, see section 6;
17.d7:     If (d=0) return;
18.        If (1b(d)=0) go to d8;
19.        i i:=  ib(d);  lb(d):=0,  1 :=  rb(d); f0 :=f1;  f1:=  fh(d);
20.        if (i1 =  2  go to Ud7
21.        i2     (i + i2)12    ; y  <-> ya (i,  ); go to d2;
21       2       1 +  2        b      a   l'2
22.d3:     i3:= rb(d); if (i3 >2) Yb <> ya     2 + 1, i3     2    3
23.        d: = d - 1; go to d7;


-9-
The following coiTments should make it easier to understand the pro-
1.   i2 and yb are always maintain such that y, :: xba(1i2)    In
line 2-3 i2 is decreased, and in line 6-8, 19-21, and in line 22 12
is increased and Yb is in all cases updated at once.
2.   i1 and i 2 indicates the current search interval.  If iI = i2 we
are testing a single variable and if it is active we save it in
"go to d6".
3.   In line 2-3 the current inerval is cut in twoZ and the 'eft end
of it is searched.
4.   Line G-9 correspond tD the case :-hore f(xb,xa(1Il
f(xb,xa(1,i2)) / f(x.,xa(1,rb( ')) such that the interval from
il to i2 is assu-ed to be empZy onc the search is continued with
the interval from i2 + 1 to rb(d).
5.   In line 11 f(xb ,x a(1,i l-1))} F   f(xb)xa(I2  (xb,xa(1,rb(d))
such that the intervai frog i2 + 1 to rb(d) is assu-ed to be
empty and the search is co-tinued in line 2 and 3 with the interval
from i1 to i2*
6.   In  line  12-13  f(xb,xa(,il-   1)  1 f(xbx a(1.i 2
f(xb,xa(1,rb(d)) and the right interval from i2 + 1 to rb(0)
is saved (lb(d) = 12 + 1) w-ile the left interval from i1 to 12
is divided in two ind searched iicediately in line 2-3 unless
11=   2 in which case i1 is an active variable and it is saved.
7.   The lines from 17 to 23 are a backtrack routine. If d = 0 we
are back at the top level after all levels have been searched,
and we are finished.


- 10 -
8.   In line 19-21 lb(d) i 0 which reans that the vecond half of level
d has not yet been searched, see point 6 above.    Restore the
interval from i   = lb(d) to i2 = rb(d) as well as f0
f(xb'a (    1-1)) an  i  = lb' a      2), divide the
interval if i 12 , update Yb and start searching agair.
9.   In line 22-23 1b(d)= 0, i.e. the interval on level d has been
searched co-ple-.ely.  12 is updated to the point to the riglht end
of the interval in level d, and if 2 is incased v is updated
at once.
10.   Since the irter.is are alw-ys  3lved the nun-ber )f levels is
al,o,ays less t    - e  al to lo g(n,  . At cach level a maxiua
of p differert. interva-s will be considered where p is the
nur.ber of active ve r, bes we are girg to idern.ify in the
search. Åt the first levels the nuner of intervals will often
be less than p and at least lets than 2d. Since it only requires
one function evalgtion to- go from one level *o the next the
number of function e-alations in FIND is less than or equal
to p . Flog2(n)l .  Even for large p the search is efficient and it
will never use rore than n+1 functioi evaluations including the
two that 4,ere supplied as input.
The search is initialized thrcucc the folLowing steps: Choose at random two
points xb, the base point, an-4 x a, the alternative point, such that the
bounds are satisfied and xbi ai, e.g. abs(xbi ~ xi) > j.l (Bi -CK).
bi~~ ~ ~ 'j8,  -.ab(*i  i


- 11 -
Compute f0 = f(xb).
xb<->x a(1,n)
Ccmpute f 1  f(xb).
if (fl    0) call FINO with ya                        I xa' b = xb, 1 1  2 = n,
0     0, and fl =f
5.   A Clean-uo Procedure for the Passive Variables
The purpose of the clean-up procedure is to find any active variables
that were not found during the first search. The clean-up procedure is based
on the fact used several times already that changing d passive variable does
not change the function value.
The steps in the procedure are:
1.   For ibase = 1 to nbase perform step 2 to 7.
2.   Choose a random base point, xb, within the bounds and compute
f0 = f(xb)'
3.   For ialt = 1 to nalt perform step 4 to 6.
4.   Change all passive variables by some random amount and compute the
function in the new point xas    = f(xa).
5.   if (f #f2) go to 9.
6.   End of ialt-loop.
7.   End of ibase-loop.
8.   Stop
9.   Call FIND with y     b   b = xa, i = 1, 12    n, f0   0,p and fl  f
10.   Go to 1.
If nothing unexpected happens, i.e. if no new variables are recog-
nized in step 5, the procedure stops after nbase - (nalt + 1) function evalua-
tions. If a change in the function is found when it was not expected the
binary search described in last section is used to 'lind at least one inCex i


where f(x bxa(1,i-1)) i f(xb'xa(1,)). This will require no more than
pl.rlog2(n  function evaluations were p12. is the number of new active
variables found. After the search procedure we start from the beginning of
the clean-up procedure. Note that it is not necessary to take special care
of the active variables found earlier in the second search since they all
have the same values in x  and x b*
The clean-up procedure described above can also be very useful in
an optimizatiGn procedure where the user defines the sparsety structure. The
procedure :an be called aftar the input has been read to verify if the
sparsety structure described by the user is correct. The initial set of
active variablis is simply taken directly from the input, and if no new
active variables are found in the clean-up procedure the input is assumed
to be correct.
A modificaticn of the procedure can also be used during the opti-
mization to find if we have moved into an area of the feasible region whiere a
variable that earlier seemed passive has become active. Only one base point -
the current optimizatior point - should be used, and the alternative point
should be close to the base point. The procedure could f. ex. be called
if something in the optimization procedure behaves "unnormal".
6.   Detecting Linear Variables
It is more difficult to divide the active variables into linear and
nonlinear variables because we can no longer test for equality but have to
consider tolerances due to rr-qd-off errors in the f-subroutine and in other
computational procedures.
We define a variable x. as being linear iff f(x) = g (x1,...,x-1,
x.        x) + a     x. + E for all x with a. independe.nt of x and E  < E
1+1n                                    1                         -   .ax
where Ema is a tolerance level.


- 13 -
As with active-passive variables it is easy to recognize a variable
as being nonlinear while it is more difficult to prove that a variable is
linear. We will again propose a fast first search followed by a clean-up
procedure to catch any mistakes in the first search.
Our first approach was to use a linear regression method based on
sufficient statistics, and test the sum of squared residuals against E ma*
However, the numerical accuracy of the sum of squared residuals was rather
bad so a simpler and less sensitive approach is suggested. When a variable
is recognized as being active we have f0 = f(xb,xa(1,i-1)) and f
f(xb,xa(1,i)) available. Based on these two function values and the corres-
ponding xbi and xai we compute ai = (f      f0)/(xbi - xai), deltai
2E max / fxbi - x,i| and we store a    = a.+delta. and a 2i = a.-delta .
Each time a new pair of -points, differing only in the i'th component, is
available new values for a. and delta. are computed and we update a ,j = min
1          11                              i
(a   ,a.+delta.) and a2, = max(a   ,a.-delta.     If a 1  < a2   after
1,i  1      1       2,ia2,a       1       )       1,     2   fte
the update the slopes were not consistent and the variable x. is labelled non-
linear.
The first test for linearity proceeds as follows:
0.   Label all active variables "possibly linear".
1.   For ibase = 1 to bmax perform step 2 to 8:
2.   Choose a random base point x b within the bounds and compute
f(xb).
3.   For each variable x. labelled "possibly linear" perform step
4 to 7:
4.   For ialt = 1 to amax perform step 5 to:


- 14 -
5.   Change xi by some random but reasonably large amount still
keeping within the bounes, compute f(x), update a li and a2,i
and test for linear4ty. If the test fails label x. nonlinear
and go to 7.
6.   end of ialt - loop.
7.   end of x. - loop.
8.   end of ibase-loop.
The inner loop runs over different steplengths and if a nonlinearity
suddently shows up the variable is skipped at once such that no more function
evaluations have to be done for this variable. amax is suggested to be 1 or 2.
The outer loop runs over different base points which means that the function
evaluation in the base point and the random numberm generated are used effi-
ciently. bmax is suggested to be as low as 2 or 3 because the clean-up
procedure used afterwards will make a cheaper test of all possibly linear
variables at once.
The Clean-ur procedure for the linear variables has the following
steps:
1.   If the number of linear variables = 0 stop.
2.   For ibase = 1 to bmaxl perform step 3 to 8:
3.   Choose a random base point, xb, within the bounds and compute
f = f(xb)'
4.   For itest = 1 to tmax perform step 5 to 7:
5.   Generate random changes for all linear variables such that
xa = xb + &x remain within the bounds and compute:
f = f(x ),


- 15 -
f     =  f0  +   2*Emax  +  Ar     a1. .x          a  0    a 2,i.&X
max o       ax       xlo    1,i x<                   ,
f min=   f   -   2*Emax  +     r   a   .xi     +   A       al i.x
6.   If (f1 < fmin) or (f1      max)  go to 10.
7.   End of itest - loop.
8.   End of ibase - loop.
9.   Stop.
10.   Initialize the search for a nonlinear variable:
fmax      o  +2-Emax ;fmin    fo    2Emax*
11.   For i = 1 to number of linear variables perform step 12 to 14:
12.   x bi : = x .; compute f  = f (xb);
bT  ai            1o
if &x  > 0 : f    := f    + a   .Ax  and f    = f    + a    .x
i -      max    max     1,    i      min    min    2,i    i
if dx< 0 : f~ := fma+ a91 .dx       and fmi  = f i  + a11 .
if (f> fi ) and (f        f   )max  to to 14.
13.   Variable i is labeled nonlinear. Reduce the number of linear
variables by one. If the number is zero then stop else go to
step 2.
14.   End of i - loop.
If the test is satisfied in step 6 there is a nonlinear variable between
the variables that we believed were linear. In step 10 to 14 we search
for this nonlinear variable using the same values as in step 5 and 6 for
x.o and dx. Since the number of linear variables probably is small we
have used a linear search. A binary search like the one in section 4 could
also be used if the number of variables to search is large. Note, that we
make a path from xb to xa changing only one variable at a time, so if the
linearity test in step 6 failed we will certainly find a nonlinear variable
in the test in step 12.


- 16 -
7.   Functions in Dynamic Problems.
We will now return to the problem that originally started this
research - a large dynamic econometric model. The functions in the model
can be described by the expression:
f(xt, xt-1,..., x     t-r ti Yt-1   ' Yt-rl g)
where xt-i denotes the vector of optimization variables lagged i periods,
g is a vector of fixed parameters and yt, yt-1, ... are exogenous variables,
i.e. time dependent parameters. We are interested in the set of active
x - variables for all lags incl. zero, and we would like to distinguish
between linear variables with time independent coefficients and other active
variables. Other optimization codes may take advantage of a more detailed
description of the active variables but we will not consider this problem
here.
We assume throughout this section that we want to find the
structural pattern independent of the actual values of g and y for g and y
within some predefined bounds. We also assume that the structure of f does
not depend on t.
The first problem is to identify the active variables. We can
use the procedure described in section 4 on the, set of n*(r+1) variables
which will require approximately pl. (10g2(n) + log2(r+1)) function evalua-
tions. p1 is the total number of active variables and n is the number of
variables per time period. However, when we have apriori information on the
structure, binary search is no longer optimal. In econometric models the
variables entering with different numbers of lags will often be the same.
We will therefore propose the following procedure:


- 17 -
1.   Find the active unlagged variables using the fast procedure
once. All lagged variables, exogenous variables and parameters
are chosen at random and then Kept fixed for the whole search.
2.   For i1ag = 1 to r perform step 3 to
3.   Test all variables that were active with a lag between ilag - mlag
and ilag-1 by changing the variable and recomputing f. mlag is a
"nemory"-parameter that depends on the type of model.
4.   Apply the clean-up procedure on all variable. lagged ilag
periods. Use the same basepoint and only one ilternative
point.
5.   If no active variables were found in step 3 or 4 go to 7.
6.   End of ilag - loop.
7.   Apply the clean-up procedure on the whole set of x - variables.
Each time a base point is chosen both g, y and x are chosen at
random.
In step 3 we will find all variables that appeared not too many
lags ago using only one function evaluation per variable. If the number of
variables we must test is less than log2(n) times the number of variables
we find, the method is better than binary search. New active variables are
found afterwards in step 4 using the ordinary clean-up and binary search
procedures. When we don't find any active variables at all for a certain
lag we assLme there are no more and go directly to the final clean-up.
The linearity test is similar to the test in section 6. However, we
have to make some changes to take care of the case where a coefficient depends
on g. First of all we add an extra outer loop over some random g-points. For
each new g-vector we initialize the lower and upper bounds on the slope. Also,
both x and y are changed with each new base point so if a derivative depends
on y the variable will be classified as nonlinear.


- 18 -
8.   Conclusions
The paper has shown that based on group testing principles we can
detect the structure of a nonlinear function, i.e. which out of a large set of
variables actually enters into the definition of the function and which
variables are linear and which nonlinear, in a fairly small number of function
evaluations. The algorithms described in the paper opens up for less detailed
input descriptions than earlier for large scale sparsety based nonlinear
optimization codes, or alternatively for cheap methods for testing the sparsety
patterns supplied by the user.
Many related problems are still open. E.g., in many cases the
problem functions are not supplied one by one as assumed here, but all func-
tions are computed at one time and returned as a whole vector. There seems to
be no obvious generalization of the algorithms described here such that the
structure of the total Jabobi matrix can be found fast.
9.   References
[1] Dorfman, R.: "The detection of defective members of large
populations', Annals of Math. Statistics, vol 14, 1943,
pp 436-440.
[2] Sobel, M. and Groll, P.A.: "Group testing to eliminate efficiently
all defectives in a binomial sample", Bell Syst. Tech. Journal,
vol 38, 1959, pp 1179-1252.