DOMAIN ANALYSIS AND EXACT POINT-DERIVATIVE GENERATION FOR LARGE NONLINEAR PROGRAMMING SYSTEMS

by Johannes Bisschop and Alexander Meeraus

Technical Note No. 7
Research Project 671-58
July 1978

Development Research Center
World Bank
1818 H Street, N.W.
Washington, D.C. 20433

Preliminary and Confidential: Not for quotation or attribution without prior clearance from the authors. Views expressed are those of the authors and do not necessarily reflect those of the World Bank.

ABSTRACT

The role of domain analysis and the automatic generation of exact point-derivatives in the solution of large nonlinear systems of algebraic equations is examined. The paper provides an informal technical discussion of these two topics, and emphasizes the importance of a software system with algebraic manipulative capabilities.

Key Words: Nonlinear Algebraic Equations, Large-scale Programming, Domain Analysis, Exact Point-Derivatives

1. Introduction

Most large nonlinear programming problems formulated today tend to become linear programming problems incorporating piecewise approximations of simple nonlinearities. For some problems the nonlinearities involve several variables at once, and piecewise linear approximations become prohibitive, as the required number of linear approximation variables is an exponential function of the number of variables determining a specific nonlinearity. Those problems can be solved with current (experimental) nonlinear programming codes such as the LSGRG code developed by Lasdon, Jain and Saunders. There still is, however, a large gap between the reliability associated with commercial linear programming codes and that associated with existing nonlinear programming codes. Technical difficulties form the major reason for the existence of this gap, as nonlinear problems are so much harder to solve than linear ones.

One of the main differences between linear and nonlinear systems of algebraic equations is their domain of definition. Linear equations are always defined for all points of the Euclidean space E in which they are contained, while nonlinear equations are sometimes defined only on (possibly unconnected) subsets of E. Similarly, the solution space of a set of linear algebraic equations is always convex, while this is not true for nonlinear systems of equations, where examples of non-convex, non-connected solution sets are easily constructed. These differences become immediately apparent to anyone involved in developing a solution algorithm for nonlinear programming problems. Most algorithms require gradient information, and have to search along a direction within the domain of definition. It is for this reason that an efficient internal generation of both feasible domains and exact partial derivatives takes on importance in the development of large-scale nonlinear programming codes.

Characteristics of special classes of nonlinear problems also affect the design of nonlinear programming codes. In most engineering applications the feasible solution set is well within the limits of the domain of definition of the algebraic equations used in the mathematical model. Infeasibilities permitted during the solution process stay within the domain of definition, and no premature termination of the solution algorithm occurs. This is not the case for many economic models, where the feasible solution set and the domain of definition for each variable often coincide.
Any code designed to solve this class of problems must recognize specific simple model constraints as definite boundaries that cannot be crossed at any stage of the algorithm.

Although we are interested in large nonlinear programming problems in general, we have a special interest in large nonlinear economic models. They have certain characteristics that make them suitable candidates for large-scale codes. They tend to be sparse; have few nonlinear equations; have simple nonlinearities such as the multiplication of two variables, or unknowns raised to a fractional power, or simple logarithms; and have non-negative domains as their natural property.

Most nonlinear equations in economic models are composed with the operators +, -, *, /, **, ln, and e. For conceptual reasons equations may be represented by a tree, and written as a sequence of simple operations involving two operands and one operator. Consider the following illustration, where all the symbols on the left side are unknowns in a model.

Equation

    x ** a + p/q + ln(k) = g

Tree

                 g
                 |
                 +
               /   \
              +     ln
             / \      \
           **   /      k
           / \ / \
          x  a p  q

Sequence of simple operations

    T1 = x ** a
    T2 = p/q            These are intermediate
    T3 = T1 + T2        equations
    T4 = ln(k)
    g  = T3 + T4

Representing the tree as a sequence of simple operations allows for automatic domain analysis. In the above representation a system could detect on the basis of the operators that

    T1 = x ** a    requires    x > ε, or, 0 ≤ x ≤ ε and a > 0
    T2 = p/q       requires    q > ε, or, q < -ε
    T4 = ln(k)     requires    k > ε

where ε > 0 is some pre-specified tolerance level.

On the basis of our experience with nonlinear models and nonlinear solution algorithms, we know that these intermediate equations are very important. Solution algorithms have sometimes failed simply because they searched in a direction that did not consider explicitly the various domain constraints imposed by the intermediate equations. There are two extremes that will remedy this problem. Either a model builder is required to enter all intermediate equations together with the associated domain constraints (to be entered as hard constraints and never to be violated), or a modeling system assumes this role automatically. We feel strongly that the initial model representation should look as natural as possible, and that any unnecessary impositions on the model builder should be avoided. We therefore support the idea that the initial model formulation is a compact version of a larger model to be solved, where the latter, blown-up version is generated internally by a machine.

The representation of algebraic equations in terms of a sequence of simple operations (intermediate equations) can provide another major advantage in large nonlinear systems. Let us illustrate this with an example. Assume that the equation x ** a + p/q + ln(k) = g with x, a, p, q, and k as variables is part of a triangular structure not uncommon to large sparse problems. In other words, there are five equations in terms of these variables such that their incidence matrix (indicating which variable is in which equation) takes on a lower triangular form with nonzero diagonal terms. One may visualize this as follows.

                   x     a     p     k     q     Right-hand side
    Equation 1     x                                  (x)
    Equation 2    (x)    x                            (x)
    Equation 3    (x)   (x)    x                      (x)
    Equation 4    (x)   (x)   (x)    x                (x)
    Equation 5     x     x     x     x     x           g

       x    occurrence
      (x)   possible occurrence
     blank  no occurrence

In such a system one can solve first for x, then for a given x, then for p given x and a, etc. If the equation x ** a + p/q + ln(k) = g is the last equation of the system with g prespecified, one can perform a Newton step to solve for q after one has solved for x, a, p, and k.
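Before turning to how the Newton step can be avoided, the following is a minimal sketch of what automatic domain analysis over the intermediate equations might look like in code. It is our own illustration, not part of any system described in this note; Python is used purely for exposition, and the names INTERMEDIATE, check_domains and the tolerance EPS are invented.

    import math

    EPS = 1.0e-6  # hypothetical pre-specified tolerance level epsilon

    # Sequence of intermediate equations for  x ** a + p/q + ln(k) = g,
    # each written as (result, operator, first operand, second operand).
    INTERMEDIATE = [
        ("T1", "**", "x", "a"),
        ("T2", "/",  "p", "q"),
        ("T3", "+",  "T1", "T2"),
        ("T4", "ln", "k", None),
        ("g",  "+",  "T3", "T4"),
    ]

    def check_domains(point):
        """Walk the intermediate equations at a trial point; return the list
        of violated domain constraints (empty if the point is in the domain).
        Only **, / and ln generate constraints; + is defined everywhere."""
        values = dict(point)
        violations = []
        for result, op, u, v in INTERMEDIATE:
            a = values[u]
            b = values[v] if v is not None else None
            if op == "**" and not (a > EPS or (0.0 <= a <= EPS and b > 0.0)):
                violations.append(f"{result} = {u}**{v}: need {u} > eps, or 0 <= {u} <= eps with {v} > 0")
            elif op == "/" and not abs(b) > EPS:
                violations.append(f"{result} = {u}/{v}: {v} must lie outside [-eps, eps]")
            elif op == "ln" and not a > EPS:
                violations.append(f"{result} = ln({u}): {u} must be > eps")
            if violations:
                return violations            # stop before evaluating an undefined operation
            if op == "+":
                values[result] = a + b
            elif op == "/":
                values[result] = a / b
            elif op == "**":
                values[result] = a ** b
            else:                            # "ln"
                values[result] = math.log(a)
        return violations

    # A trial point with q = 0 trips the constraint generated by T2 = p/q.
    print(check_domains({"x": 2.0, "a": 1.5, "p": 4.0, "q": 0.0, "k": 3.0}))

A solution algorithm could apply such a check to every trial point so that the search never leaves the domain of definition, which is exactly the role the intermediate equations play in the discussion above.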
The iterative Newton step, however, can be replaced with one single function evaluation if one rearranges the equation and expresses q as a function of x, a, p, k, and g. This can be done by inverting some of the intermediate equations, and re-ordering them such that q is evaluated as the last equation. For the above example the re-ordered set of intermediate equations is

    T1 = x ** a     x > ε, or, 0 ≤ x ≤ ε and a > 0
    T4 = ln(k)      k > ε
    T3 = g - T4
    T2 = T3 - T1
    q  = p/T2       T2 > ε or T2 < -ε

Note that we have a new domain constraint on T2 that was not there previously, while the explicit domain constraint on q has disappeared.

The above example shows that if an equation can be rearranged after one has detected a certain structure induced by the occurrence or non-occurrence of variables in equations, simple function evaluations sometimes suffice for the calculation of variables which occur as nonlinear terms in the original equations. As most solution algorithms have to take advantage of sparsity in order to become efficient for large problems, this observation becomes important.

The following table enumerates all possible intermediate equations, their associated inverses, and their respective domains of definition.

    Equation      Domain                     Inversion          Domain
    z = x + y                                x = z - y
                                             y = z - x
    z = x - y                                x = z + y
                                             y = x - z
    z = x * y                                x = z/y            y < -ε or y > ε
                                             y = z/x            x < -ε or x > ε
    z = x/y       y < -ε or y > ε            x = z * y
                                             y = x/z            z < -ε or z > ε
    z = x ** y    x > ε, or,                 x = z ** (1/y)     z > ε and (y < -ε or y > ε), or,
                  0 ≤ x ≤ ε and y > 0                           0 ≤ z ≤ ε and y > 0
    z = e ** x                               x = ln(z)          z > ε
    z = ln(x)     x > ε                      x = e ** z

Appendix A describes how the above inversion formulas can be used while rearranging algebraic equations at a symbolic level. One may wonder if it is always possible to rearrange nonlinear equations as desired. For arbitrary nonlinear equations the answer is negative. It should be noted, however, that if a variable occurs only once in the original equation, and the operators +, -, *, /, **, ln, and e are used, it is always possible not only to express one variable in terms of the remaining ones, but also to have this task performed automatically by a machine. Although these conditions are only sufficient and not necessary, they have been satisfied for all economic models we have observed thus far.

3. Exact Point-Derivatives

Derivative information (gradients, Hessians) is important in many nonlinear programming algorithms. It is either supplied by the model builder or generated internally by a machine. For model builders this task becomes prohibitive whenever their models are large. An internal generation of derivative information is, therefore, imperative. There are several ways to accomplish this task. Most solution routines have an option to use first differences in the numerical approximation of derivatives. These approximations usually serve their purpose very well, but sometimes bring numerical trouble (inaccuracies) when they are taken near the boundary of their domain of definition. A second method is to have the machine take derivatives at a symbolic level, and to compute point evaluations of these expressions. In general these generated strings of symbols tend to become quite cumbersome, and require substantial storage. A third possibility is the evaluation of exact point-derivatives. This method requires the tree representation of equations (i.e. the intermediate equations) and a set of rules. Its major advantages are that no extra storage is needed for generated strings, and no numerical approximations are involved.
Consider the function f = ln(x**3 * y**2), which has the intermediate equations

    T1 = x ** 3
    T2 = y ** 2
    T3 = T1 * T2
    f  = ln(T3)

One may think of T1, T2, T3 and f as functions of x and y with associated gradients and Hessians. As the machine goes through the intermediate equations it can build up not only a point evaluation of the function f, but also a point evaluation of the gradient and Hessian of f. Using the rules of Appendix B and letting x = 2 and y = 3, the gradient of f can be computed as follows.

    T1 = x ** 3     ∇T1 = (T1 * 3/x, 0)           =  8        (12, 0)           Using Rule 4c
    T2 = y ** 2     ∇T2 = (0, T2 * 2/y)           =  9        (0, 6)            Using Rule 4c
    T3 = T1 * T2    ∇T3 = T2(12, 0) + T1(0, 6)    = 72        (108, 48)         Using Rule 2a
    f  = ln(T3)     ∇f  = (1/T3)(108, 48)         = 4.2767    (1.5, 0.6667)     Using Rule 7

The above example illustrates how a small set of rules can be employed to build up exact gradient information for a large class of algebraic functions. Appendix B develops the necessary notation, and enumerates the rules by which gradients and Hessians can be calculated.

This note has emphasized domain analysis and the generation of exact partial derivatives as two important components in the development of large-scale nonlinear programming codes. A special compiler is needed for the implementation of these components as they both operate on the tree structure associated with algebraic equations. We feel, however, that the extra effort required for building a compiler is a relatively small price for the gain in reliability that one should expect from any code incorporating the ideas of this paper.

Appendix A

Symbolic Rearrangement of Algebraic Equations

Although it is not our intent to describe a detailed algorithm for a machine to rearrange equations, we do want to give an example to describe how this might be accomplished.

Consider the following intermediate equations for the algebraic equation x ** a + p/q + ln(k) = g of Section II, and assume that we want to express q as a function of the remaining variables.

    T1 = x ** a     x > ε, or, 0 ≤ x ≤ ε and a > 0
    T2 = p/q        q > ε or q < -ε
    T3 = T1 + T2
    T4 = ln(k)      k > ε
    g  = T3 + T4

Search for the variable q on the right-hand side (it should be found only once), invert for q, cross out the equation, and start a new list with the inverted equation at the bottom.

    T1 = x ** a
    T3 = T1 + T2
    T4 = ln(k)
    g  = T3 + T4
                        New list:   q = p/T2

Search for T2 on the right-hand side (it occurs only once), invert for T2, cross out the equation, and add the inverted equation next to the bottom of the new list.

    T1 = x ** a
    T4 = ln(k)
    g  = T3 + T4
                        New list:   T2 = T3 - T1
                                    q  = p/T2

Search for T3 on the right-hand side, invert for T3, cross out the equation, and add the inverted equation to the new list.

    T1 = x ** a
    T4 = ln(k)
                        New list:   T3 = g - T4
                                    T2 = T3 - T1
                                    q  = p/T2

Search for T4 on the right-hand side. As it cannot be found, the rearrangement process is complete, and the remaining intermediate equations are added to the new list in their original order. The final new list is as follows.

    T1 = x ** a     x > ε, or, 0 ≤ x ≤ ε and a > 0
    T4 = ln(k)      k > ε
    T3 = g - T4
    T2 = T3 - T1
    q  = p/T2       T2 > ε or T2 < -ε
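The note deliberately stops short of a detailed algorithm; purely as our own illustration, the following Python sketch mechanizes the search-invert-reorder steps above for this example. The names (invert, rearrange) and the representation of an intermediate equation as a (result, operator, operand, operand) tuple are invented, only the operators needed for the example are handled, and the stopping test is simplified: the chain of symbols being traced (q, T2, T3, ...) ends as soon as the current symbol no longer occurs on any remaining right-hand side. For this example the sketch reproduces the final list shown above.

    def invert(lhs, op, u, v, target):
        """Rewrite  lhs = u <op> v  so that `target` (= u or v) is on the left."""
        if op == "+":
            other = v if target == u else u
            return (target, "-", lhs, other)       # target = lhs - other
        if op == "/":                              # lhs = u / v
            if target == u:
                return (u, "*", lhs, v)            # u = lhs * v
            return (v, "/", u, lhs)                # v = u / lhs
        if op == "ln":                             # lhs = ln(u)
            return (u, "exp", lhs, None)           # u = e ** lhs
        if op == "**":                             # lhs = u ** v
            if target == u:
                return (u, "**", lhs, f"1/{v}")    # u = lhs ** (1/v)
            raise NotImplementedError("inverting for the exponent needs logarithms")
        raise NotImplementedError(op)

    def rearrange(equations, target):
        """Repeatedly find `target` on a right-hand side, invert that equation,
        prepend it to the new list, and trace upward; finally prepend the
        untouched equations in their original order."""
        remaining = list(equations)
        new_list = []
        while True:
            hits = [eq for eq in remaining if target in eq[2:]]
            if not hits:                           # target not on any right-hand side
                return remaining + new_list
            lhs, op, u, v = hits[0]
            remaining.remove(hits[0])
            new_list.insert(0, invert(lhs, op, u, v, target))
            target = lhs                           # continue the chain upward

    # Intermediate equations for  x ** a + p/q + ln(k) = g  (g is given).
    eqs = [("T1", "**", "x", "a"), ("T2", "/", "p", "q"),
           ("T3", "+", "T1", "T2"), ("T4", "ln", "k", None),
           ("g",  "+", "T3", "T4")]
    for eq in rearrange(eqs, "q"):
        print(eq)       # T1, T4, T3 = g - T4, T2 = T3 - T1, q = p/T2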
Appendix B

Rules for Point Derivatives

The purpose of this appendix is to summarize the various rules that one needs to consider in the automatic generation of exact first and second order partial derivatives. The following notation will be used throughout the sequel.

    α, β           refer to scalars
    a, b and c     refer to functions of n variables x1, x2, ..., xn
    ∇              refers to a row vector of partial derivative operators
    ∇'             refers to the transpose of ∇

    ∇a  = ( ∂a/∂x1, ∂a/∂x2, ..., ∂a/∂xn )

    ∇²a = ∇'(∇a) = the n-by-n matrix whose (i,j) element is ∂²a/∂xi∂xj

∇a is called the gradient of the function a, and ∇²a is called the Hessian of the function a.

    ∇a × ∇b = (∇a)'∇b = the n-by-n matrix whose (i,j) element is (∂a/∂xi)(∂b/∂xj)

∇a × ∇b is called the cross product of ∇a and ∇b. The following properties hold for the cross product operator.

    i)   αβ( ∇a × ∇b ) = α∇a × β∇b                          (Associative Law)
    ii)  ( ∇a + ∇b ) × ∇c = ∇a × ∇c + ∇b × ∇c               (Distributive Law)
    iii) ∇'[ a∇b ] = a∇²b + ∇a × ∇b

The remaining portion of this appendix gives the rules required for the differentiation of functions of several variables. It should be noted that whenever the function is defined on some domain of definition, so are its derivative formulas. One exception is c = a ** b (rules 4a, 4b, 4c): if 0 ≤ a ≤ ε and b > 0, then c and ∇c are both zero, and the formulas for ∇c and ∇²c are to be ignored. For notational convenience we write ln²(a) and tan²(a) for [ln(a)]² and [tan(a)]².

Rules for Point Derivatives

    1)  c = a + b     ∇c = ∇a + ∇b
                      ∇²c = ∇²a + ∇²b

    2a) c = ab        ∇c = b∇a + a∇b
                      ∇²c = b∇²a + a∇²b + (∇a × ∇b + ∇b × ∇a)

    2b) c = αa        ∇c = α∇a
                      ∇²c = α∇²a

    3a) c = a/b       ∇c = (1/b)∇a - (a/b²)∇b = (1/b)(∇a - c∇b)
                      ∇²c = (1/b)∇²a - (a/b²)∇²b - (1/b²)(∇a × ∇b + ∇b × ∇a) + (2a/b³)∇b × ∇b
                          = (1/b)[ ∇²a - c∇²b - (∇c × ∇b + ∇b × ∇c) ]

    3b) c = α/b       ∇c = -(α/b²)∇b = -(c/b)∇b
                      ∇²c = -(α/b²)∇²b + (2α/b³)∇b × ∇b
                          = -(1/b)[ c∇²b + (∇c × ∇b + ∇b × ∇c) ]

    3c) c = a/β       ∇c = (1/β)∇a
                      ∇²c = (1/β)∇²a

    4a) c = a ** b    ∇c = b a**(b-1) ∇a + a**b ln(a) ∇b = c[ (b/a)∇a + ln(a)∇b ]
                      ∇²c = a**b [ (b/a)∇²a + ln(a)∇²b + (b(b-1)/a²)∇a × ∇a + ln²(a)∇b × ∇b
                                   + ((b ln(a) + 1)/a)(∇a × ∇b + ∇b × ∇a) ]
                          = c[ (b/a)∇²a + ln(a)∇²b + (1/a)(∇a × ∇b + ∇b × ∇a) - (b/a²)∇a × ∇a ]
                            + (1/c)∇c × ∇c

    4b) c = α ** b    ∇c = α**b ln(α) ∇b = c ln(α)∇b
                      ∇²c = α**b [ ln(α)∇²b + ln²(α)∇b × ∇b ] = c ln(α)∇²b + (1/c)∇c × ∇c

    4c) c = a ** β    ∇c = β a**(β-1) ∇a = (βc/a)∇a
                      ∇²c = β a**(β-1) [ ∇²a + ((β-1)/a)∇a × ∇a ] = (βc/a)[ ∇²a + ((β-1)/a)∇a × ∇a ]

    5)  c = e ** a    ∇c = e**a ∇a = c∇a
                      ∇²c = e**a ( ∇²a + ∇a × ∇a ) = c∇²a + (1/c)∇c × ∇c

    6)  c = α ** a    ∇c = α**a ln(α) ∇a = c ln(α)∇a
                      ∇²c = α**a ln(α)( ∇²a + ln(α)∇a × ∇a ) = c ln(α)∇²a + (1/c)∇c × ∇c

    7)  c = ln(a)     ∇c = (1/a)∇a
                      ∇²c = (1/a)∇²a - (1/a²)∇a × ∇a = (1/a)∇²a - ∇c × ∇c

    8)  c = sin(a)    ∇c = cos(a)∇a
                      ∇²c = cos(a)∇²a - sin(a)∇a × ∇a

    9)  c = cos(a)    ∇c = -sin(a)∇a
                      ∇²c = -sin(a)∇²a - cos(a)∇a × ∇a

    10) c = tan(a)    ∇c = ( 1 + tan²(a) )∇a = ( 1 + c² )∇a
                      ∇²c = ( 1 + tan²(a) )[ ∇²a + 2 tan(a)∇a × ∇a ] = ( 1 + c² )∇²a + 2c( 1 + c² )∇a × ∇a

    11) c = |a|       ∇c  =  ∇a  if a > 0,   -∇a  if a < 0,   undefined if a = 0
                      ∇²c =  ∇²a if a > 0,   -∇²a if a < 0,   undefined if a = 0
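As a numerical check of these rules, the following sketch (ours, not part of the note; Python with NumPy is used only for illustration, and all names are invented) propagates value, gradient and Hessian through the intermediate equations of the Section 3 example f = ln(x**3 * y**2) at x = 2, y = 3, using rules 4c, 2a and 7 together with the cross product operator. The computed Hessian agrees with the Hessian of the equivalent expression 3 ln(x) + 2 ln(y), namely diag(-3/x², -2/y²).

    import numpy as np

    def cross(u, v):
        """Cross product of Appendix B:  (grad a)'(grad b),  an n-by-n matrix."""
        return np.outer(u, v)

    x, y = 2.0, 3.0

    # Independent variables as (value, gradient, Hessian) triples in (x, y).
    vx = (x, np.array([1.0, 0.0]), np.zeros((2, 2)))
    vy = (y, np.array([0.0, 1.0]), np.zeros((2, 2)))

    def power(a, beta):                   # Rule 4c:  c = a ** beta
        val, g, H = a
        c = val ** beta
        gc = beta * val ** (beta - 1) * g
        Hc = beta * val ** (beta - 1) * (H + (beta - 1) / val * cross(g, g))
        return c, gc, Hc

    def mul(a, b):                        # Rule 2a:  c = a * b
        (va, ga, Ha), (vb, gb, Hb) = a, b
        return (va * vb,
                vb * ga + va * gb,
                vb * Ha + va * Hb + cross(ga, gb) + cross(gb, ga))

    def log(a):                           # Rule 7:   c = ln(a)
        val, g, H = a
        gc = g / val
        return np.log(val), gc, H / val - cross(gc, gc)

    T1 = power(vx, 3.0)                   # T1 = x ** 3
    T2 = power(vy, 2.0)                   # T2 = y ** 2
    T3 = mul(T1, T2)                      # T3 = T1 * T2
    f  = log(T3)                          # f  = ln(T3)

    print(f[0])   # 4.2767...  = ln(72)
    print(f[1])   # [1.5, 0.6667]
    print(f[2])   # [[-0.75, 0], [0, -0.2222]]  =  diag(-3/x**2, -2/y**2)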