SELECTED ASPECTS OF A GENERAL ALGEBRAIC MODELING LANGUAGE

by

Johannes Bisschop and Alexander Meeraus

Technical Note No. 13
Research Project 671-58
August 1979

Development Research Center
World Bank
1818 H Street, N.W.
Washington, D.C. 20433

The views expressed in this paper are those of the authors and do not necessarily reflect those of the World Bank or its affiliated organizations.

ABSTRACT

The paper touches on the role of models in a policy/planning environment, and establishes the need for a general algebraic modeling system. The main purpose of the paper, however, is to develop a notation which can be understood by both man and machine. The language is part of a general algebraic modeling system currently under development.

Keywords: Model, Modeling system, Modeling language, Compiler, Data base.

1. INTRODUCTION

In the early days of mathematical modeling, large applications were mostly of a military or industrial nature. Models were used to describe and solve well-defined problems in the areas of production and distribution, and they were employed on a routine basis. In many instances it was considered cost-effective to establish a small group of technical people whose sole responsibility was to maintain and to improve the existing package of models.
In recent years the scope of mathematical modeling applications has widened, and modeling environments different from those described above have emerged [1], [2], [3]. The U.S. government, for instance, has supported the development of a large number of models, and many planning agencies around the world use mathematical models as their major tool for analysis. In the policy/planning environment the role of models is often extended beyond their traditional use as a way to get numerical solutions to well-defined problems. Models are used to express perceptions and abstractions of reality, and they continuously change as their developers learn more about the uncertain real-world problem. Models provide the model builder/decision maker with a formal framework for data collection and analysis. They are seldom used to get definite answers, but are employed as guides in planning and decision making, or as moderators between groups of people with conflicting knowledge and/or interests. Usually a system of many loosely connected models of different types is developed, and very few models, if any, are used on a routine basis.

The cost of building and maintaining these models is high, while the benefits are not always clearly defined. The study by the National Science Foundation [1] on the development and use of mathematical models within the U.S. Government provides some interesting figures. The total development cost of the 650 models surveyed was US$ 100 million ($154,000 per model), and it took on the average 17 months to make a model operational. It was observed that 75% of all models can be operated only by the original development team, despite strong efforts in model and program documentation. Actual policy use of these models by groups other than the model designers has been minimal.
Given the median size of 25 equations (only 6 models had more than 1,000 equations), the above figures look rather depressing as it takes 3 weeks and $6,000 to develop one equation on the average. Based on our own experience, we find that eighty to ninety percent of total resources currently spent on large modeling exercises is for the generation, manipulation, and reporting of these models. It is evident that this percentage must be reduced significantly if models are to become effective tools in planning and decision making in a large variety of disciplines and institutions.

This paper is the by-product of an ongoing effort to build a general algebraic modeling system (GAMS) at the World Bank. In the description of the language we have mostly emphasized the syntactical aspects, and only touch upon some of the semantic issues. A more complete description of both the modeling system and the modeling language is forthcoming.

In section 2 we elaborate on the importance of a general algebraic modeling system as a major step toward the reduction of time, skills and money currently required for modeling exercises. Then we consider the major component of any modeling system, namely the language. In section 3 we use a small example to illustrate key components. Section 4 develops a minimal version of the language, while section 5 describes a few extensions. A simplified grammar describing the language together with a complete model description has been included in the appendix.

As a concluding remark, we would like to report on the progress we have made over the last few years. At this point in time (summer 1979) several components of the modeling system have been completed. Both the compiler and the internal data base management system are operational. In addition, the system can execute any assignment statement involving sets and parameters. These components are currently being used as operational tools in the Development Research Center of the World Bank.
The compiler is used for obtaining complete and unambiguous descriptions of models, while the data base system is used to compute model parameters. The portion still under development is the actual generating of models and the linking with solution algorithms.

2. THE NEED FOR A GENERAL ALGEBRAIC MODELING SYSTEM

One way to establish the need for a modeling system is to examine some of the problems that the modeling community is currently faced with. Based on our own experience, mostly from attempts to disseminate previous and ongoing research in a planning environment, we have encountered several problem areas.

The documentation of large models and their modifications is one such problem. If a project is large, and continues for one or two years, the cost of complete documentation becomes horrendous. A decision is usually made to maintain a few versions of a model. In practice this means that some basic experiments can be repeated. In the long run, however, the value of the available software becomes essentially zero as people change jobs, and any changes to existing versions require extensive set-up time.

A related problem is the communication of models to interested persons who are not part of the development team. As there are no standards in notation, it is often difficult to judge from any write-up what exactly the model is. Experimentation with the model may enhance one's understanding, but this requires the use of both the model and report generators. As these programs are nontrivial, they in turn require the use of a technical person. The extensive time and money requirements prohibit many outsiders from even attempting to satisfy their own curiosity with regard to the model. No effective dissemination of knowledge can therefore take place.

With the existing technology in modeling software there is no common interface with the various solution routines modelers can use for their family of models.
As each solution package usually requires different data structures, it becomes both time and money consuming to switch back and forth between solution algorithms. As a result models tend to get locked into one solution package, which at times limits their development. There is also no general-purpose software for the linking of models, an activity that has become more prevalent with the increased use of models.

Although the above problem areas tend to discourage large-scale modeling exercises, they are certainly not the major obstacle to the effective use of modeling in a policy/planning environment. It is the extensive time, skill and money resources currently required for the building of models that hinders their effective use. The heart of the problem is the fact that solution algorithms need a data structure and a problem representation which is impossible for humans to comprehend. At the same time, problem representations that are meaningful to humans are not acceptable to machines. The two translation processes required can be identified as the main source of difficulties and errors. With today's technology, each translation process is broken down into a number of interrelated steps where most of the coordination and control has to be done by humans, and is therefore subject to error. That is why extensive time, skill and money resources are required for the completion of large-scale modeling exercises. In addition it is not surprising that the overall reliability (the probability of no mistakes) of our modeling practice is embarrassingly low.

We would like to illustrate the current technology as it applies to linear programming problems. In Table 1 the solution process is broken down into 12 different tasks or processes and 15 classes of associated documents or data files. As an illustration of how to interpret the table, consider the third row.
The task is described as "design computer program to generate column/row/value records corresponding to model in matrix form." It can only be performed by a human, and it requires three inputs and one output for its completion. One necessary input is the description of data and model in conventional notation. On the basis of this input, one has to design MPS naming conventions that will be used in the naming of rows and columns of the linear programming tableau. Added to this input will be a data set coded in a program acceptable form. With these inputs the task can be executed, and the final output, a matrix generator program written in some language, will result. Remember that each input and output in the table requires human intervention. This is true even if the task itself will be performed by a machine. The final goal, of course, is the solution report, while the 13 intermediate documents are an expensive and error-prone detour.

Note that the first 7 tasks are performed by humans. The last 5 tasks are performed by the machine, but need additional control instructions to coordinate input and output (again a source of error). From experience we have learned that many errors are of an intricate nature, and do not become apparent immediately after they have been committed. They are often carried on throughout the process without having fatal effects on the solution procedure.

Table 1: Tasks and Documents Associated with Current Technology to Solve LP Models

    Task (process)                                         Performed by   Input       Output
                                                                          documents   documents
    Formulate coding rules for row and column labels       human          1           1
    Formulate rules for data representation                human          1           1
    Design computer program to generate column/row/value
      records corresponding to model in matrix form        human          3           1
    Transcribe data                                        human          1           1
    Design solution strategy                               human          1           1
    Design report program                                  human          3           1
    Generate revision files                                human          6           1
    Compile matrix generator                               computer       1           1
    Compile report generator                               computer       1           1
    Execute matrix generator                               computer       1           1
    Execute LP-system                                      computer       4           2
    Execute report generator                               computer       2           1

A remedy to all of the above problems is the evolution of a new modeling technology. We will need to move away from the existing labor/skill intensive approach to model building, and replace it with a machine intensive approach. That is why we have begun the development of a general algebraic modeling system (GAMS). This system provides the model builder with a notation that can be understood by both humans and machines. As a result, only one document is needed for the representation and generation of models. Most tasks that were previously performed by humans will now be completed by the machine. Table 2 has been included to contrast Table 1. While the traditional technology requires 12 tasks and 15 associated documents, the new technology requires only 2 tasks and 3 documents. Note that GAMS is an optimal technology in that it requires the minimum number of tasks and documents. The one single document that can be read by both man and machine serves as the complete model documentation.
Table 2: Solution of LP Models Using GAMS

                                       Documents (files)
    Tasks (processes)                  Model and data     GAMS              Solution
                                       in conventional    representation*   reports
                                       notation
    Transcription of model
      and data (human)                 INPUT              OUTPUT
    Analysis and solution
      of problem (computer)                               INPUT             OUTPUT

    * Readable by both human and machine.

In addition to providing a unified notation, the system will interface automatically with existing solution routines. It will also have the capability of linking various models. As the main purpose of this paper is to develop the language used in GAMS, we will elaborate on the modeling system in subsequent papers. The next section will serve as a first introduction to key components in the language.

3. AN ILLUSTRATION OF THE KEY WORDS IN GAMS

Most mathematical models today are specified using index sets, some data tables, some English describing manipulations involving sets and/or data, and a system of symbolic equations. As there are no guiding standards, the notation often lacks clarity, is incomplete, and shows inconsistencies. With just a little more rigor, namely replacing English with algebraic set and data mappings, existing knowledge and skills can be employed to build models. The language in GAMS stays as close as possible to existing algebraic conventions, but has a few additions to handle complexities inherent in large models. The result is a powerful notation, which allows for a complete and unambiguous representation of models.

For illustrative purposes, consider the cannery transportation problem taken from the book Linear Programming and Extensions by G.B. Dantzig. A company desires to supply its three warehouses from two canneries with given inventories in each, and wants to minimize the total shipping cost. This problem results in the following GAMS representation.
    SET C  CANNERIES   / SEATTLE, SAN-DIEGO /;

    SET W  WAREHOUSES  / NEW-YORK
                         CHICAGO
                         KANSAS     KANSAS CITY /;

    PARAMETER A  AVAILABLE INVENTORIES (CASES OF TINS PER DAY)
               / SEATTLE     350
                 SAN-DIEGO   650 /;

    PARAMETER R  REQUIRED INVENTORIES (CASES OF TINS PER DAY);

    R(W) = 300 ;

    TABLE UTCOST  UNIT TRANSPORT COST (DOLLARS PER CASE)
    *             FROM CANNERY C TO WAREHOUSE W

                   NEW-YORK   CHICAGO   KANSAS
    SEATTLE          2.5        1.7       1.8
    SAN-DIEGO        2.5        1.8       1.4  ;

    VARIABLES  X  SHIPMENTS (CASES OF TINS PER DAY);

    EQUATIONS  SUPPLY  AVAILABILITY CONSTRAINT (CASES OF TINS PER DAY)
               DEMAND  REQUIREMENT CONSTRAINT (CASES OF TINS PER DAY)
               COST    COST ACCOUNTING EQUATION (DOLLARS PER DAY) ;

    * AVAILABILITY CONSTRAINT IMPOSED ON EACH CANNERY

    SUPPLY(C)..   SUM(W, X(C,W))      =L=     A(C);
    *            ----------------            ------
    *            TOTAL SHIPMENTS             AVAILABILITY AT
    *            LEAVING CANNERY C           CANNERY C

    * REQUIREMENT CONSTRAINT IMPOSED BY EACH WAREHOUSE

    DEMAND(W)..   SUM(C, X(C,W))      =G=     R(W);
    *            ----------------            ------
    *            TOTAL SHIPMENTS ARRIVING    REQUIREMENT AT
    *            AT WAREHOUSE W              WAREHOUSE W

    * COST ACCOUNTING EQUATION REFLECTING TRANSPORT COST

    COST..   SUM((C,W), UTCOST(C,W) * X(C,W))   =E=   TRCOST;
    *        --------------------------------
    *        TOTAL TRANSPORT COST

    MODEL CANNERY  THE CANNERY TRANSPORTATION MODEL / ALL / ;

    SOLVE CANNERY USING LP MINIMIZING TRCOST ;

    DISPLAY X.AL, SUPPLY.MC ;

As can be noted from the model description, we have restricted ourselves to a small character set which is available on most computers. In addition, we have assumed that there is no carriage control available (i.e. no subscripts or superscripts), and that there are only capital letters. Within these few limitations, we have adhered as much as possible to existing mathematical conventions.

The above model statement can be viewed as an integrated data base. In addition to the data tables and assignment statements, there are the symbolic equations which represent data that can only be obtained via some solution algorithm. Both data and symbolic equations are needed for a complete model representation in GAMS.
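To make the shorthand in the model statement concrete, the following Python sketch expands the symbolic equations SUPPLY(C) and DEMAND(W) into one explicit row per set element and evaluates the COST expression for a candidate shipping plan. It is only an illustration under stated assumptions: the function names (expand_rows, row_holds, transport_cost) and the candidate plan are hypothetical and do not come from the paper, the warehouse labels are written as plain Python strings, and no solution algorithm is invoked.

```python
# Data transcribed from the cannery model statement.
C = ["SEATTLE", "SAN-DIEGO"]                  # canneries
W = ["NEW-YORK", "CHICAGO", "KANSAS"]         # warehouses
A = {"SEATTLE": 350.0, "SAN-DIEGO": 650.0}    # available inventories
R = {w: 300.0 for w in W}                     # R(W) = 300
UTCOST = {
    ("SEATTLE", "NEW-YORK"): 2.5, ("SEATTLE", "CHICAGO"): 1.7,
    ("SEATTLE", "KANSAS"): 1.8,
    ("SAN-DIEGO", "NEW-YORK"): 2.5, ("SAN-DIEGO", "CHICAGO"): 1.8,
    ("SAN-DIEGO", "KANSAS"): 1.4,
}


def expand_rows():
    """Expand SUPPLY(C) and DEMAND(W) into explicit rows.

    Each row is (name, coefficients on X(c,w), sense, right-hand side).
    """
    rows = []
    for c in C:   # SUPPLY(C)..  SUM(W, X(C,W)) =L= A(C)
        rows.append((f"SUPPLY({c})", {(c, w): 1.0 for w in W}, "<=", A[c]))
    for w in W:   # DEMAND(W)..  SUM(C, X(C,W)) =G= R(W)
        rows.append((f"DEMAND({w})", {(c, w): 1.0 for c in C}, ">=", R[w]))
    return rows


def row_holds(row, x, tol=1e-9):
    """Check one expanded row against a shipping plan x."""
    _, coeffs, sense, rhs = row
    lhs = sum(a * x.get(key, 0.0) for key, a in coeffs.items())
    return lhs <= rhs + tol if sense == "<=" else lhs >= rhs - tol


def transport_cost(x):
    """COST..  SUM((C,W), UTCOST(C,W) * X(C,W)) =E= TRCOST"""
    return sum(UTCOST[c, w] * x.get((c, w), 0.0) for c in C for w in W)


# Hypothetical candidate plan, chosen by hand for illustration only.
plan = {
    ("SEATTLE", "CHICAGO"): 300.0, ("SEATTLE", "NEW-YORK"): 50.0,
    ("SAN-DIEGO", "NEW-YORK"): 250.0, ("SAN-DIEGO", "KANSAS"): 300.0,
}
rows = expand_rows()
feasible = all(row_holds(row, plan) for row in rows)
cost = transport_cost(plan)
```

The two symbolic constraints become five explicit rows (two availability rows and three requirement rows), which is the kind of generation work the modeling system is meant to take over from the model builder.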
There are several key words used in the above model description. They are (in order of occurrence) SET, PARAMETER, TABLE, VARIABLE(S), EQUATION(S), SUM, MODEL, SOLVE .. USING .. MINIMIZING, and DISPLAY. We will comment on each of them.

Sets are used as driving indices in many mathematical models. They usually have a short name followed by a description. Following the description is a listing of the set elements contained between two "slashes." The set elements are names with up to ten characters (no blanks inside them), all separated by a comma or an end of line. Each name can have an associated description if needed (e.g. the element KANSAS has a description KANSAS CITY).

A parameter can be defined in a similar fashion, with a number following each label as we did for parameter A. An algebraic definition using an assignment statement is also possible, and this was done for parameter R (each warehouse requirement is 300 units). A third way to define a parameter is via some tabular arrangement as we did for the parameter UTCOST. Both row and column descriptions of the parameter are required. As we shall see in the next section, this two-dimensional framework can be used to represent parameters with more than two dimensions attached to them. As the table name description following the name UTCOST is restricted to one line in GAMS, we extended it using a comment statement. Any statement with a * in the first column is a comment statement in GAMS.

Variable and equation names must be defined first before they can appear in any symbolic equations. One can recognize a symbolic equation by the two dots following the equation name. Note that the availability constraint SUPPLY is defined over the domain (set) C. It is a short-hand notation for two availability constraints, namely one for each cannery. In the next section we shall see how one can control this domain of definition in equation statements.
The summation in the SUPPLY equation is indicated by SUM, and followed by the set name W to which the summation operation is to be applied. Each symbolic equation in GAMS has a type. In the above example we have =L= (a less than or equal to constraint), =G= (a greater than or equal to constraint), and =E= (an equality constraint).

A model in GAMS is the selection of a subset of the symbolic equations. In the above example all equations are included in the model. Once a model is defined, a particular algorithm must be chosen. In this case linear programming (LP) is selected to minimize the variable TRCOST in the model CANNERY. Display statements can be used to get selected pieces of data. Here we have asked for the activity levels associated with the variables (X.AL), and the shadow prices (marginal costs) associated with the availability constraints (SUPPLY.MC). Note that throughout the model description, each statement has started with a key word, and terminated with a semi-colon.

This section was written to give the reader a quick overview of several important aspects of the language in GAMS. The cannery example does not portray some of the complexities associated with the representation of large-scale models. That is why a more extensive description of the notation in GAMS is developed in the next section.

4. A MINIMAL VERSION OF THE LANGUAGE IN GAMS

Most problems associated with model building can be reduced to a basic question involving communication. How can one communicate data and its associated complex mathematical structures when the human mind is limited in its power to grasp and comprehend many issues simultaneously? The only tool available to us is our power of abstraction, which aids us in understanding the complexity of real world phenomena. It allows us to define partitionings, mappings, nestings, and short-hand notation.
The language in GAMS is essentially a short-hand notation which takes advantage of any partitionings, mappings and nestings. In this section we will examine the syntactic and some of the semantic rules that govern the notation. We have organized the material by subsections, each describing an important part of the language.

4.1 Sets and Set Mappings

A simple (one-dimensional) set in GAMS is a finite collection of labels. These sets play an important role in the indexing of algebraic statements. The cannery example in section 3 contains two such simple sets (namely C and W), and both their syntax and use are illustrated there.

Several one-dimensional sets can be related to each other in the sense that there is a correspondence between them. As an example consider the correspondence between countries and regions. Depending on one's viewpoint, this is a one-to-many or a many-to-one correspondence. To each country corresponds a specific set of regions, while each region corresponds to one specific country only. As we shall see, these correspondences play an important role in GAMS as they can be used to control the domain of definition of assignment statements and symbolic equations.

The syntax for set correspondences is much like the one for single sets. Consider the following illustration

    SET CR  COUNTRY-REGION CORRESPONDENCE /
            INDONESIA.N-SUMATRA
            INDONESIA.E-JAVA
            MALAYSIA.W-MALAYSIA
            etc.

or,

    SET CR  COUNTRY-REGION CORRESPONDENCE /
            INDONESIA.(N-SUMATRA, E-JAVA),
            MALAYSIA.W-MALAYSIA, .../;

Note that the period is used as an operator to relate the elements of the different sets, and that the order of the elements in the correspondence is fixed (in this case country first, region second). In order to reduce unnecessary repetition, the parentheses can be used when several elements in one set correspond to a single element of the other set.

There can be any number of sets in a correspondence. The following few lines illustrate a 3-dimensional set mapping.
    SET RZD  REGION ZONE DISTRICT MAPPING /
             NORTH.IRRIGATED.(W-NORTH, C-NORTH, E-NORTH)
             CENTRAL.(IRRIGATED.(NW-UPPER, NE-UPPER)
                      RAINFED.(S-UPPER, W-LOWER, E-LOWER))
             etc.

There are ways to change the information contents of sets and set mappings. This can be done via algebraic assignment statements, which require all sets to be indexed. Assume that a set R of regions has been defined, and that a copy of this set is desired. Then one can write the following GAMS statements.

    SET RR  COPY OF SET R ;
    RR(R) = R(R) ;

The next example is a redefinition of RR on the basis of the above set correspondence RZD. Assume that the new set RR should contain all regions that are not rain-fed. The instruction SUM, already mentioned in the previous section, denotes a union instead of a summation when applied to sets.

    RR(R) = R(R) - SUM(D, RZD(R, 'RAINFED', D))

Note that the 3-dimensional correspondence RZD requires 3 driving indices. Since the middle index is invariant, we have used the quotes to indicate a specific element rather than the entire set. As we have tried to limit the character set as much as possible within GAMS, certain operators have more than one use. The following table gives the conventions for some of the operators used.

    Operator    General Use          Applied to Sets
    +           Addition             Union
    SUM         Repeated addition    Repeated union
    -           Subtraction          Difference
    *           Multiplication       Intersection
    /           Division             Not defined
    **          Exponentiation       Not defined

4.2 Data Tables

Tabular arrangements of data are a very convenient way to describe multi-dimensional parameters. The unit cost table in section 3 is an example of a 2-dimensional parameter. The following table illustrates a 4-dimensional parameter, where 3 dimensions are captured in the row descriptions, while the fourth dimension is contained in the column label.
    TABLE L  LABOR COEFFICIENTS IN HOURS PER RAI
    *        BY REGIONS, CROP ROTATION, TECHNOLOGY AND MONTH

                                      JANUARY   FEBRUARY   MARCH   APRIL
    NORTH-UPP.SUGARCANE.TRAD-BUFF        2          2         2      12
    NORTH-UPP.SUGARCANE.MOD-TRACT        1          2         2      10
    etc.

    +                                   MAY       JUNE      JULY    AUGUST
    NORTH-UPP.SUGARCANE.TRAD-BUFF       12         35        30      45
    NORTH-UPP.SUGARCANE.MOD-TRACT       12         30        25      40
    etc.

Note that we have specified the units for the entire table in the table heading. As it stands at the moment, unit analysis has to be done by the model builder, although one of our goals is to make automatic unit analysis an integral part of the data base system in GAMS. The order of the sets used in the row and column descriptions in the table statement must be maintained in later references to the parameter. For the above example this will be L(R,C,T,M) where R, C, T and M refer to the simple sets. Note that all columns could not fit on one line. Any table, however, can be continued by using a plus operator at the beginning of each new set of column headings.

4.3 Assignment and Equation Statements

Most of the syntax used in assignment statements and equations is the same, although it is straightforward to detect whether a GAMS statement is an assignment or an equation.

An assignment statement in GAMS is an instruction to perform some data manipulation and store the result. It can be compared to a FORTRAN statement where the result of the operations performed is stored under the name that appears on the left side of the equal sign. As an example consider the parameter DIST(I,J) indicating the distance from location I to location J, where the elements in the sets I and J are identical. Assume that initially only the lower triangular part of DIST was specified in a TABLE statement, and that we are interested in specifying the entire matrix. We can write the following sentence

    DIST(I,J) = DIST(I,J) + DIST(J,I)

The right-hand side is defined for each 2-tuple of the Cartesian product of the sets I and J.
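As a rough illustration of how such an assignment could be evaluated over the whole Cartesian product at once, the Python sketch below reads every right-hand-side value from a snapshot of the old data before any entry is overwritten, and treats undefined entries as zero. The function name symmetrize, the three location labels and the distances are hypothetical and chosen only for this example; the sketch is not a description of how the GAMS system itself is implemented.

```python
def symmetrize(dist, labels):
    """Evaluate DIST(I,J) = DIST(I,J) + DIST(J,I) for all pairs at once.

    Every right-hand side is read from a snapshot of the old values,
    and entries that were never defined count as zero.
    """
    snapshot = dict(dist)  # old values, left untouched during evaluation
    return {
        (i, j): snapshot.get((i, j), 0.0) + snapshot.get((j, i), 0.0)
        for i in labels
        for j in labels
    }


# Hypothetical data: only the lower triangle is given.
locations = ["A", "B", "C"]
lower = {("B", "A"): 4.0, ("C", "A"): 7.0, ("C", "B"): 3.0}

full = symmetrize(lower, locations)
```

Updating the entries one at a time in place would give a different answer here: once DIST(A,B) has been set to 4, a later in-place update of DIST(B,A) would read the new value and produce 8 instead of 4.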
A copy of DIST(I,J) is stored in a temporary work array, and the entries in DIST(I,J) are replaced with the results from the additions for all pairs (I,J) in a parallel fashion. Note that all values of DIST(I,J) that were not defined in the TABLE statement are assumed to be zero. An alternative but equivalent GAMS statement for the above replacement is as follows.

    DIST(I,J) = MAX(DIST(I,J), DIST(J,I))

Here the MAX operator selects the largest of the two values inside the parentheses.

An equation in GAMS is a symbolic representation of one or more constraints to be used as part of a simultaneous system of equations, or an optimization model. It always begins with the equation name, possibly indexed, followed by two dots (periods). We again refer to the equations in the cannery example of section 3. In the next section we will develop additional examples of equations and assignment statements while describing the role of the conditional operator used in the language.

4.4 The $ Operator

Partitioning large models by using driving indices provides an elegant short-hand notation. Complexities, however, are introduced when there are restrictions imposed on the partitionings. As these complexities arise continually in large-scale models, we have strived for an elegant and effective way to incorporate them in a model statement.

Let us begin with an example. Define the sets R and D as regions and districts respectively. Assume that for each district in a region we know the level of income YD(R,D), and that we want to determine the regional income YR(R) for each of the regions. Writing the assignment statement

    YR(R) = SUM(D, YD(R,D))

is meaningless as not every district is contained in each region. We need to use, therefore, the relationship between the sets R and D. Let RD be the set correspondence between these two sets. Then we can write the following assignment statement

    YR(R) = SUM(D$RD(R,D), YD(R,D))

Here the dollar sign is used as a conditional operator.
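The conditional sum can be sketched in Python as a sum with a membership filter. The region and district labels, the income figures and the function name regional_income below are hypothetical and serve only to illustrate the idea; in particular, one income entry is deliberately given for a pair that is not in the correspondence, to show that the condition excludes it.

```python
def regional_income(regions, districts, rd, yd):
    """Sketch of YR(R) = SUM(D$RD(R,D), YD(R,D)).

    For each region r, only districts d with (r, d) in the
    correspondence rd contribute to the sum.
    """
    return {
        r: sum(yd.get((r, d), 0.0) for d in districts if (r, d) in rd)
        for r in regions
    }


# Hypothetical sets, correspondence and data.
R = ["NORTH", "SOUTH"]
D = ["D1", "D2", "D3"]
RD = {("NORTH", "D1"), ("NORTH", "D2"), ("SOUTH", "D3")}
YD = {
    ("NORTH", "D1"): 10.0,
    ("NORTH", "D2"): 20.0,
    ("SOUTH", "D3"): 5.0,
    ("SOUTH", "D1"): 99.0,  # not in RD, so it must not be counted
}

YR = regional_income(R, D, RD, YD)
```

In this sketch the condition plays the role of the `if` clause inside the generator expression: the summand is evaluated only for those pairs that belong to the correspondence.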
For each specific region R, the dollar operator restricts the sum to those elements of D for which the correspondence RD(R,D) is defined.

Let A be a name or an expression in GAMS, and let B be a name or a true-false expression. Then the phrase A $ B is a conditional statement in GAMS where the name A is considered or the expression A is evaluated if and only if the name B is defined or the expression B is true.

When the dollar operator is used in an assignment statement, it can appear both on the right and on the left of the equal sign. When it appears on the left, it controls the domain over which the assignment is defined. Whenever the condition following the name on the left is not true, the existing data values contained under that name remain unaffected. If on the other hand that same condition is applied to the right of the equal sign, the existing values contained in the name on the left will be set to zero whenever the condition is not true.

In order to illustrate the combined use of the dollar operator and logical phrases contained in an assignment statement, consider the next example. Let the sets P, I and M denote processes, plants and machines respectively. The parameter K(M,I) denotes the number of units of available capacity of machine M in plant I, while the parameter B(M,P) describes the required number of units of capacity of machine M per unit level of process P. We want to define a zero-one parameter, PPOSS(P,I), indicating which processes P need to be considered for plant I. We can write the following logical relation, which always results in either a zero or a one.

    PPOSS(P,I) = SUM(M $ (K(M,I) EQ 0), B(M,P) NE 0) EQ 0 ;

Here the expression B(M,P) NE 0 will contain a value 1 if process P is dependent on machine M, and 0 otherwise. These values are summed over all machines M that are not available in plant I. If the resulting sum is zero for process P then the process is not dependent on unavailable machines, and should therefore be considered.
Note that PPOSS is one in this case. If the resulting sum is not 0, the process is dependent on at least one unavailable machine, and should therefore not be considered. The parameter PPOSS is set to zero in this case.

When the dollar operator appears in an equation statement, it is used to control the generation of equations and (or) variables. As an illustration let CAP be an equation name referring to capacity constraints, and let Z be a variable name referring to levels of process operation. Using the notation of the previous paragraph, we can write the following symbolic equation.

    CAP(M,I) $ (K(M,I) GT 0)..
        SUM(P $ PPOSS(P,I), B(M,P) * Z(P,I)) =L= K(M,I) ;

In this example the system will generate an equation for a specific pair of machine and plant only when the capacity of that machine in that plant is strictly positive. Similarly, only those variables that refer to processes which can be operated at a positive level will be generated.

In the next section we will examine some further details and extensions to the language described thus far.

5. SOME FURTHER EXTENSIONS TO THE LANGUAGE

Until now we have always used sets as unordered collections of labels. Most sets employed in large-scale models are of this kind as their only purpose is to identify objects, properties or events that are relevant to the model description. There are a few sets, however, for which the order of the elements is crucial. One frequently used example of this kind of set is time. That is why we introduce the additional keyword CONSTANT SET. Any such set in GAMS carries with it an implied order, which, as we shall see, allows one to reference elements relative to each other. The term CONSTANT was chosen to indicate that as soon as this type of set has been defined, it cannot change anymore via subsequent assignment statements. The syntax for constant sets is exactly the same as that for ordinary sets.
For constant sets one is allowed to perform lag (backward) and lead (forward) operations on the labels. Consider the following example.

    CONSTANT SET M  MONTHS / JANUARY, FEBRUARY, MARCH, APRIL, MAY, DECEMBER / ;
    PARAMETER NSALE  PROJECTED CUMULATIVE SALES OF NITROGENOUS FERTILIZERS ;
    *                (IN 1000's OF KILOGRAMS)
    NSALE('JANUARY') = 100. ;
    LOOP(M, NSALE(M+1) = 1.05 * NSALE(M)) ;

In this example a forward projection is made on the basis of the starting value for the first month. The term NSALE('DECEMBER' + 1) is considered vacuous. Note that the looping device is necessary for the above assignment statement. Without it all operations would be performed in parallel, which would result in a proper definition of NSALE('FEBRUARY') only. All later months would keep the default value of zero.

In some agricultural models, the constant set of months has been used in a circular fashion, where JANUARY is the one-period lead of DECEMBER and DECEMBER is the one-period lag of JANUARY. As an example, assume that we want to determine the 5-dimensional parameter CLAB denoting the labor requirement coefficient by district, crop, technology, month and planting date. Assume also that the planting dates are EARLY and LATE, and that the coefficient values for both are the same except that they are shifted by one month. Let the parameter LABREQ be the labor requirement coefficients by district, crop, technology and month, obtained via a TABLE statement. Then CLAB can be generated from LABREQ as follows.

    CLAB(D,C,T,M,'EARLY') = LABREQ(D,C,T,M)
    CLAB(D,C,T,M,'LATE')  = CLAB(D,C,T,M--1,'EARLY')

For circular sets we use the special lag and lead operators -- and ++ in order to distinguish them from the ordinary, non-circular lag and lead operators - and + .

Another extension to the language is a simple algebra for MODELS and SETS. Syntactically a model is defined in exactly the same way as a set. The elements of a model are the symbolic equations it contains.
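The lag and lead operators described earlier in this section can also be illustrated outside GAMS. The following Python sketch is a hedged illustration only: the function lead and the dictionaries nsale, old and parallel are hypothetical names, and the month list simply repeats the labels of the NSALE example.

```python
# Ordered labels of the constant set, in the order given in the NSALE example.
months = ["JANUARY", "FEBRUARY", "MARCH", "APRIL", "MAY", "DECEMBER"]


def lead(label, k=1, circular=False):
    """Return the label k positions ahead (k < 0 gives a lag).

    Non-circular (+ and -): returns None when the reference is vacuous.
    Circular (++ and --): wraps around the ends of the set.
    """
    pos = months.index(label) + k
    if circular:
        return months[pos % len(months)]
    return months[pos] if 0 <= pos < len(months) else None


# Sequential evaluation, as with a LOOP over the months.
nsale = {m: 0.0 for m in months}
nsale["JANUARY"] = 100.0
for m in months:
    nxt = lead(m)
    if nxt is not None:  # NSALE('DECEMBER' + 1) is vacuous and is skipped
        nsale[nxt] = 1.05 * nsale[m]

# Parallel evaluation: every right-hand side uses the values held before the assignment.
old = {m: 0.0 for m in months}
old["JANUARY"] = 100.0
parallel = dict(old)
for m in months:
    nxt = lead(m)
    if nxt is not None:
        parallel[nxt] = 1.05 * old[m]
```

Under sequential evaluation each month builds on the previous one, whereas the parallel variant leaves every month after FEBRUARY at zero, which is why the looping device is needed.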
If all symbolic equations are included in a model, one can use the keyword ALL between the two slashes following the model name and its description. In the case of a set, the keyword ALL means that every set element appearing anywhere in the GAMS model representation is included. Such a set is called the universal set, and can be used as a driving index in all set assignment statements.

Models can be changed by merely adding and subtracting equation names. These changes take the form of assignment statements without any driving indices. For instance,

    CANNERY = CANNERY - COST ;

generates the transportation constraints without the cost function. A similar algebra without driving indices can be applied to sets, where + denotes a union, - denotes a difference, and * denotes an intersection of sets.

There are several other extensions to the language which will be discussed in subsequent work. Our main goal has been to demonstrate and illustrate some important aspects of the language. The following appendices serve as additional illustrations, as they describe a simplified grammar and a complete model statement.

Appendix I

PARTIAL GRAMMAR

Introduction

The most compact and complete way to describe the modeling language used in GAMS is to provide its grammar. The grammar is a set of rules that determines which constructs are in the language. It describes the syntax. Of course one still needs to explain the meaning of the constructs, usually referred to as the semantics. Although many readers may not be familiar with grammars, we have decided to include a simplified grammar for equations, assignments, sets and tables in this appendix. We feel that it is the key to a quick understanding of which constructs are syntactically correct. The grammar is written in a standard notation called BNF (Backus Normal Form) and contains the following symbols, called metasymbols.
Metasymbol    English Equivalent    Use

::=           is defined as         Separates a phrase name from its definition
|             or                    Separates alternative definitions of a phrase
< abc >       the value of abc      Indicates that intervening characters are to be treated as a unit

For those that are not familiar with the BNF definition of a language, consider the following simple grammar as an illustration.

    < expression > ::= < term > | < expression > + < term >
    < term >       ::= < primary > | < term > * < primary >
    < primary >    ::= x | y | z

Each statement is a rewriting rule which allows us to substitute any right part for any occurrence of its associated left part. Examine the expression x * y + z . This structure can be represented by a tree generated from the grammar. The tree is usually referred to as a phrase-structure tree.

                        < expression >
                       /      |       \
          < expression >      +      < term >
                |                        |
            < term >               < primary >
           /    |    \                   |
     < term >   *   < primary >          z
        |                |
    < primary >          y
        |
        x

One may use the grammar to analyze phrases within the language used in GAMS. Seemingly complex at first, it becomes a straightforward procedure after only a few examples. Figure 1 is the phrase-structure tree for the following GAMS statement:

    RR(D,J) = SUM (I $ (B(I) NE 0) , A(I,D) * C(I,J))

The following parts of the GAMS grammar are a simplified version of the grammar presently used in the modeling language. This simplified version concentrates only on equations, assignments, set, and table definitions, and is not concerned with relaxed punctuation, operator precedence, etc.

a. Equations and Assignments

- = $ ident ident ( ) := := ident 'index' . I nwnber I SUM ( , ) function ( ) | ) = ident I ident ( ) :: | , :: I $ = ( ) I ident .= I I = $ - + NOT + EQ IGT IGE LT ILE NE IAND I OR I XOR. .G- L =E- I - I ++ - -

b. Set Definitions

= SET ; CONSTANT SET I , = I / / !dent i Cdent text = I , ALL .= I text I element>. ( ) ( ) := Z c::n i:,r> , : index 'iad'x'

c. Table Definitions

.= TABLE . col j eol + . l - I - coZ eal .=
::= |

Figure 1: The Phrase-Structure Tree for the statement RR(D,J) = SUM(I $ (B(I) NE 0), A(I,D) * C(I,J))

Appendix II - The ASEAN Fertilizer Model

The following GAMS program is a much simplified and reduced version of a fertilizer-sector planning model for the ASEAN region (Association of South-East Asian Nations). The original model is being used to study the potential gains to be derived from a regionally harmonized policy concerning investments, production, importation, and distribution in the fertilizer industry.

Although the computer listing is straightforward to interpret, a few explanations are in order. GAMS permits the user to describe a problem in the most natural and esthetic way possible. Input is completely free field, and relaxed punctuation avoids syntactic rules that would appear unnatural or too restrictive to the non-computer scientist. There exists no predefined order in which different components or sections have to be entered, as long as all referenced entities have been defined in previous sections. The following comments refer to the line numbers of the partial input listing.

Line 8       The identifier N is declared to be a set and carries with it the descriptive text 'NUTRIENTS', which is used in reports or reference maps. The information between the two slashes assigns initial values to the set N, namely N and P205 (separated by commas), where 'NITROGEN' and 'PHOSPHOROUS' are again interpreted as texts.

Line 26-29   The set CR defines a correspondence between countries and regions, and is similar to the example given in section 4.

Line 35-42   Tables are a very convenient and widely used way to present data.
This statement defines NC to be a parameter having two dimensions, where the index positions are counted from left to right (i.e. from row to column labels). Row or column labels need not be defined in advance via set data statements.

Line 44      The second statement in this line references the parameter NC with sets C and N, which were initialized in lines 31 and 8 respectively. The symbol 'YES' following the equal sign is a system constant and means existence. The set FP will contain the elements UREA, MAP, DAP and TSP.

Line 74-76   The demand for basic nutrients could only be obtained on a country-wide basis (lines 47-54), and the regional breakdown (lines 56-68) was estimated roughly. The statement in line 74 sums up the columns of table RBC on a country basis, the statement in line 75 divides the original entries by their country totals (normalizing), and line 76 finally distributes the country estimates over regions. The use of the $ operator is similar to the example given in section 4.

Line 120-135 Zeros need not be entered in data tables, which allows a more meaningful representation of sparse matrices. The positions of column and row labels, as well as those of numerical entries, are only restricted in that they have to match. Almost any table that is understandable to a human will be accepted by the system.

Line 244     The phrase SUM((CF,I),...) defines a double summation over sets CF and I. The phrase SUM(CF, SUM(I,...)) could have been used as well.

GAMS 0   A S E A N   F E R T I L I Z E R   M O D E L   (DEMONSTRATION)   08/23/79   20.50.17.   PAGE 1

   3  * THIS IS A SIMPLIFIED VERSION OF AN INVESTMENT PLANNING MODEL DEVELOPED        NEW MARGIN 02m80
   4  * BY THE DEVELOPMENT RESEARCH CENTER OF THE WORLD BANK. THOSE PARTS OF
   5  * THE ORIGINAL MODEL THAT CONTAINED CONFIDENTIAL
INFO,Uhi TION HÅVE BEEN ELIMINATFD 01< CHANGED, 7 8 SET N NUTRTENTS / N NI TROGF N, P205 PHOSPHifROUS / 9 10 I PLANT SITES / PALEUANI;, GRESIK, KUANTAN, TOLEDO-CTY, HANGKOK 1 11 12 J DEMAND REGIONS / N-SUIATRA AJEH 13 E-JAVA GRESIK 14 W-VAL.AYSTA KUANTAN 15 ACIGKOK 16 N-PHILIPfiS BATAAN 17 C-PHJLIPNS CE13U / 18 19 P PROCFSSFS /UREA, MAP,DAP,SUJLPH-ACID,PiOS-ACID,4HMlNIA-NG, AMMONIAeFO/ 20 21 M PROjjCTTVE UNITS /URFA,AHMONPHOS,SUL PH-ACID,PHOS-ACID,AMMUNIA.NG 22 AMMUNIA.FO/ 23 24 CfR COUNTRIES /INDONESIA, MALAYSIA, Pk4ItIPPINS, THAILAND/ 2n CR C(UNTRY REGION MAPPING / INDONESIA,(N-SUMATRA,-JAVA) 27 MA AYSIA,r-MALAYSIA 21 PHILIPPINS,(N-PHILIPNS,C-PHILIPNS) 24 THAILAND,BANGKOK 30 51 C ALL COMMODITIFS fOR BALANCES/ UREA, MAP, DtP, SULPH-ACID 32 ANNONmA, C02, PHS-ACID,PHOS"RUCK 33 E-SULPHUR,N-GAS, FUELnUIL, TSP / 34 35 TABLF NC NUTRIFiT CONTENT 365 37 N P21)5 38 39 URFA 46 4 0 MAP ,11 ,S4 41 DAP ,1I .46 '42 TSP.4 43 44 SET FP PINAL PRODUCTS; FPC) = yES!SUM(N, NC(C,N) NE 0); 4r 46 47 TA3LF C85 CONSUMPTION 1985 (1000 TONS) 48 49 N P>u5 50 51 IjOONEA( f A 850 290 52 MLAYSTA 2c0 70 3 PHI.IPP INS 360 170 54 THAILAND 1.30 160 GAMS 0 A S E A N F E R To L E R M 0 D t L (DEMUNSTRATION) 08/23/79 20,50 , PAGE 2 56 TA8LE R8C PEGIOAL BRFAKDOWN OF CONSUMPTION 57 58 N P205 59 b0 N-SUMATRA 20 10 b F-JAVA 8) 90 ,6? 
63 W-MALAYSIA 100 100 6 4 65 N-PHILIPNS 60 30 .66 C -PHILIPNS f 4) 70 67 ,68 BANGKOK 103 100 69 70 MORMALIZE REGIONAL CONSoMPTIUN FRfAKDOWN AND oISTRIBULTE NATIONAL PROJECTIONS, 71 72 PARAM q RFGIONAL DfMAND IN MUTRIENTS (1000 TICINS) 1 73 74 R8CCNTR,N) = SUM(J $CR(CNTR,J), RGC(J,N)) 75 FBC(J,N) = SUM(CiTR$CR(CINTR,J), RBL(J,N)/RBC(CNTI,N))1 7b R(N,J) : SUMICNTR$CR(CNTR,J), Cöb(CNTR,N)*RC(JI,N); 77 78 79 TABLE DISP DISTANCES BETWEEN PRODUCING AREAS (NAUTICAL MILLS) 80) 81 PALEMBANG GRESIK ICUANTAN TOLEDO-CTY BANGKOK 82 83 P tF-1 t NG 560 450 1.500 1000 84 GRESI< 900 1250 1500 85 KUAJNTAN 1150 680 66 TOLED-CTY Iö5b0 87 88 89 TÄALE DISDIEM DISTANCES BETWFEN DEMAND AND PRODUCING AREAS 90 91 PALILM13ANG GRESIK KUANTAN TOLEDU-CTY BANGKOK 92 93 N 51JMAÄTRA 790 1 360 50 1920 1470 'r,4 E-JAVA sb0 900 12140 1470 95 In-MALAYSIA 450 00 1131) 680 96 N-PHILIHNS 1410 1520 12L40 340 1750 97 C-PHIL IPNS 1300 1240 1130 1640 98 ANGK0K 1000 1500 680 16503 99 100 101 T-ILE RS RATE SChEDILLS 102 103 FIX PRP 104 105 FINAL-PR 4,6 ,00125 loö AtMUNI4 12.7 .,0029 107 PHOS-ACID 8. ,063S GAMS 0 A N E N F L k I&L l Z b R ti 0 D) L L (DEM0NSTRATION) 08/23/79 20.50,17, PAGE 3 109 SET TP CnPY OF SET I 1P(I)=YES1 DISP(I,IP) DISP(I,TP) + 0>ISP(P,I); 110 i1i pARAM ICF TRANSPORT COST FOR Fl'AL PRODUCTS 112 1CI TRANSPORT COST FOR 1NTERMEDIATLS; 113 l!4 SET CI PRODUCTS FFOR INTERPLAtT SHIPMcNT 1 AMMONIA, PHOS-ACID /y 116 TCF( ,J) = i,6 t 00125*'DISDEM(J,I) 31DISDEM(J,): 117 TCI1,IP,CI) (RS(CIs'FIX'I) t RS(CI,IPROPI)*DISP(G,IP)MDISP(I,IP)l 118 119 120 TABLE A INPUT-•OUTPUT COEFFICIENTS 121 122 URLA MAP DAP SULPH-ACID PHOS-ACID AMMONIA.NG AMMONIA.FO 123 124 UREA 1,0 125 MAP 1.0 teo D>1P 1,0 121 SoLPH-ACID 1,0 -1,49 128 AMMUNIA -,6 .,15 .,23 1,0 1,0 129 CO32 W,6 1.0 1,0 130 PmiS-ACIO .99 -987 1,0 131 PHOS-ROCK -1,84 1 3? E-SULPHUR . 
344 133 N-GAS -34,7 134l FUFL-IL ,965 135 Q-CLIST -10,0 -10.0 -10,0 -5,0 -5,0 -5,0 -5,0 1 sa 137 13m SET CF FINAL PRODuCTS FROM LOCAL SOURCES; CF(C) = YES$SUM(P,A(L,P) NE 0); 139 1L40 141 IABLE n CAPACITY UTILIZATION 142 143 UREA MAP DAP SULPH-ACID PHOS-ACID AMMONIA-NG AMMONIA.VO 144 145 URFA ,85 146 AMMON-PHOS 1,25 1,0 147 S0LPH-ACID 1,0 148 PHOS-5CID 1,0 149 AMMONIA-NG ,85 150 AMMONIA-FO ,85 151 152 153 TABLE iCA INITIAL CAPACITY (ToNS PER DAY) 154 155 AMMONIA-NG AMMONIA-FO UREA 156 157 PALMb .NG 1160 1860 158 GRESIK 220 150 GAMS o A S E A N F E RA&I L I Z E R M 0 L t L (DEMONSTRATION) 08/23/79 20.i17, PAGE 4 160 TABLE IC INVESTMENT COST 161 162 SIZt CUST FIXED 163 164 URFA 1800 48.1 th,7 lb5 AMMONIA-NG 1100 8b,0 147,9 166 AMMUNIA-FU 1100 112,2 53,9 1S7 AMMON-PHOS 100) 15,3 4, 168 PHOS-ACID 10 29,0 10,0 169 SULPH-ACID 1400 ?4,0 l8,5 170 171 NOTE., ALL ClOTS IN MIl LION USS (CONSTANT 1975) HAStD ON IVA tSTIMATES 172 f EO UTFEERENT PLANT SI7S, (SS bATTLRY LlNIT, AukILIANY IACILITIES, 173 * SupPURT FACILITIESSPAE PARTS, TRAININGSTAiT-UP FTC,p 15 DAYS 174 a STIRAGt FnR ANO1A AND PHOSPHORIC ACIS, 20 0AYS 8AC SIURAGE, 175 x Pfl,ER ANO STrANh GENEFATIJN ETC,) 17e * THF C;)LIIMN 'SIZF GlvFS THF PLANT S11L WHERE ELUNOiES OF SCALE 177 a ARF EXHAISTED (TONS/nAy), COLUMNJ cfSTI GIVES THE 10TAL INVESTMENT 178 * AT CAPACITY LEVEL 'SIZEI (MILL, US$), THE COLUMJFIXr.D' GIVES THE 179 fTy0) CHARGE OF LINARIZED COST IUNCTION (MILL, US$), 180 * THE FOLLOlNG LINFARIZED COST FuNCTION IS USED,, 181 A TOTAL COST = 'PNOPi*:APACITY + ISEG#SLGMENT; WHERE S6GENT lb EQUAL TO 182 ZFRO IF ErONONIES OF SCALE ARF EXrHAUSTED; SEGMENT = tSIZE1 - LAPACITY 1113 * OTHERWISE' Th. 
ClOSTANT ,33 CONVEkTS TONS/DAY INTO l000TONS/YLAR BASED 184 L IN 330 UAYS OF OPERATION, 189 1ft IC(N,PRPpl) = IC(M,'COST' )/IC(M,'SIl't)/,33j 187 IC(mp SEGi ) = IC(M,'FIXEDl)/IC(M,'SILl)/,33j !88 19 * THE AHUVE INvESTMENT COST FTCI;oRFS IM NOT CONSIDER SiT-SPECIFIC COSTS 190 A SUCH AS SIfE PNEPIRAUIJN, C014STRUCTIjN EQU[PMLNT, HOUbING FACILITIES 191 * MEDICAL FALIL ITIES, ETC, TE LOCATION SITE FACTOR IS USED TO ADJUST 192 ItiVSTMENT COST FOR A SPECIFIC SITE, 193 194 PARAM LSC LOCATION SITE FACTOR / 195 196 PALFM3ANG :,17 197 GRFSIX 1,2S 198 KHANT AN 1 .22 199 TO.Frll-CTY I, 1 200 BANGKGK 11/ /1 201 202 PARAM yPR IMPORT PRICES (C,I,F USS PER METRIC TON) / 203 204 URFA 200 209 D AP 250 206 TSP 175 207 E-SIILPHIR 00 208a Phns-pCK 1.o 209 FUFL-OIL 85/ 210 211 SET CFI FTNAL PRODUCTS FROM FOREIGN SOURCES; CFI(PP)=YESSIPR(FF); 212 213 SET CTI RAW MATERIALS AND INTERMEDIATES IMPORTLDICII()SIPR(C)=YESeFP(C)I GAMS 0 A S E A N F~ T I L I Z E R M ti 0 1 L (DIMUNSTRATION) 08/23/79 5 215 TABLE nMS SUPPLY UF DOMESTIC MATEIALS 216 217 r4-GAS.PRICE N-GAS,LIMIT SULPH-ACID.PilCE SULPH-ACID,LIMIT 218 (s/ioooScF) (1000 ScF/Y) ($/TON) (1000TUNS/YEAR) 219 220 PALEMBANG ,60 25000 221 TOLEDO-CTY 10,0 540 222 KOANTAN 1,60 100000 223 224 225 VAR Z PrangESS LEVEL 226 X SHIpMLNT UF FINAL PRODUCTS 227 MF IDP(iR IS 1. F IDAL, PRODUC TS 228 MI IMPnRTS UF INTERMFDIATES AND RAW MATERIALS 229 D SUPPLY UF DOMESTIC MATERIALS 230 XI IHTFRPLANT ShIPME'TS 231 h CAPACITY EXPANSION 232 Y 8INAhY VAPIABLE 233 S IjVFSTMENT SEGMENTI 234 235 EQUATION NR NUTRIFNT REQUIRFMENTS 236 CCr CAPACITY COINSTRAItJTS 237 RVC bINARY VARIAAIF. CONTRAINIS 238 Sr SEGMFNI CONSTRAINTS 239 n