Research Project No. RPO 672-91
Research Project Name: An Evaluation of Industrial Location Policies for Urban Deconcentration

Abstract

The author describes the data files used in the research project, shows the steps taken to prepare the data for analyses, and provides in detail the information needed for future use of the data files. The data base consists of three main data files: the Mining and Manufacturing Surveys, 1973-1981, National Bureau of Statistics; the Location Census of Manufacturing Establishments, 1978, Ministry of Trade and Industry; and the Project Establishment Survey.

     3. Project Sample Establishment Survey
     4. Location Census of Manufacturing Establishments
III. Data Preparation
     1. Mining and Manufacturing Survey, 1973-1980
     2. Mining and Manufacturing Survey, 1981
     3. Project Sample Establishment Survey
     4. Location Census of Manufacturing Establishments
IV.  Technical Memorandum

 9. Unspecified Relocation Classifications
10. Stratified Subfiles
11. Stratification of Establishments in the Population
12. Sample Composition
13. Conversion of KID Geocode to NBS Geocode

I. INTRODUCTION

This paper describes the data base used for the Korea industrial location policy research project 1/ and provides documentation and guidelines for future users.
The data base consists of a sample survey of 500 manufacturing establishments conducted as part of the project and two existing data sets, the Mining and Manufacturing Establishments Survey for 1973-1981 collected by the Korean National Bureau of Statistics (NBS) and the Location Census of Manufacturing Establishments conducted by the Ministry of Trade and Industry in 1978.

NBS conducts the Mining and Manufacturing Survey annually, covering all manufacturing establishments with five or more employees. Every five years it takes a census of manufacturing establishments that also includes a sample of firms with fewer than five employees; 1973 and 1978 were such census years. In response to a request we made during a mission in February 1982, NBS included in the 1981 survey, conducted in April 1982, six questions on the location history of establishments, including the date of foundation, the frequency of relocation, the date of relocation, previous location, and reasons for relocation. This data set was used to summarize moving patterns of relocating firms and served as the sample frame for selecting 500 sample firms.

1/ "An Evaluation of Industrial Location Policies for Urban Deconcentration." RPO 672-58/91.

The survey instrument for the Project Sample Establishment Survey was designed by Kyu Sik Lee, Project Director, and was executed by the local project team in Korea during August-October 1983. The survey was the project's major data collection effort. The survey interviews were completed for 499 firms in the Seoul region.

In 1978, under the auspices of the Ministry of Commerce and Industry (MOCI), the Korea Industrial Development Research Institute (KID) conducted a national location survey, the Location Census of Manufacturing Establishments. The census focused on location characteristics including information on pollution, electricity and water use, and future plans to move.

This paper consists of four chapters.
Chapter II describes the information available in the data files; Chapter III documents how the data sets were prepared for various research tasks; finally, Chapter IV provides technical information needed for using these data files.

II. DATA FILE DESCRIPTIONS

1. Mining and Manufacturing Survey, 1973-1980

The annual survey covers all mining and manufacturing establishments with five or more workers; the census, taken every five years (including 1973 and 1978), also includes a sample of establishments with fewer than five workers. As shown in the survey questionnaires in Appendix 5, the following information is covered by the survey:

(1) Identification and establishment characteristics: establishment identification number, geocode for present location, year of foundation, major products by SIC codes;
(2) Establishment size: lot size, ground floor space, and total floor space;
(3) Employment and annual wage bill: the number of production and office workers by sex, and annual wage bills for the two groups;
(4) Output: value of annual shipments;
(5) Tangible fixed assets: values of land area, buildings, machinery, equipment and facilities, and transportation equipment;
(6) Annual production costs: raw materials, fuels, purchased electricity and water, and repair and maintenance;
(7) Value of inventories: value of products, raw materials, and fuels stored at the beginning and end of the year.

The original questionnaires, record layout, and one-way frequency distribution are attached in Appendices 5 and 6 for detailed information.

2. Mining and Manufacturing Survey, 1981

As indicated earlier, the 1981 survey contained the firm relocation module prepared by the research team. The module included the following questions.

QUESTIONNAIRE FOR LOCATION HISTORY OF MANUFACTURING ESTABLISHMENTS

1. Has your plant been located at the present site since your establishment was founded? Yes or No. If NO, go to the next question.
2. Where was your previous plant located? Six digit geocode
3. How many years have you been operating at the present site? _____ years.
4. Did you change the line of production after you relocated to the present site? SIC code
5. If YES, what was the line of production at the previous location? SIC code
6. Did you relocate to the present site because of certain government measures?
   a. Incentive schemes
   b. Government order to relocate
   c. None of the above

Information in the 1981 Survey File

NBS provided us with only a limited amount of information besides the relocation information. The file contains the following information:

(1) Identification and establishment characteristics: identification number, six-digit geocode for present location, major products represented by the SIC code, and year of foundation.
(2) Location history: four-digit geocode for previous location, date of relocation, and reasons for relocation.
(3) Employment: the number of employees by sex and work type.

This 1981 "mover" file was essential for our research. It made it possible to summarize moving patterns of relocating firms and to verify the extent of policy implementation. The mover file served as the sample frame for drawing a random sample of 500 firms for the project's establishment survey. Even though the mover file had a limited number of variables, it was the only data of its kind available and provided key stratification variables. The sample was stratified by types of location tenure (e.g., births, movers, and non-movers), firm size defined by employment, industry type, and location. The information on reasons for relocation made it possible to oversample those firms that were influenced by government policy instruments such as incentive systems or relocation orders.

3. Project's Sample Establishment Survey

This project's establishment survey was a major data collection effort to obtain the new data necessary for empirical studies.
The survey was carried out by the local research team during August-October 1983. The survey interview was successfully completed for 499 manufacturing establishments in the Seoul region.

As stated above, the establishments in the 1981 manufacturing survey were stratified by the following four categories: (1) location tenure (non-movers, movers, and births); (2) the type of industry defined by two-digit SIC codes; (3) firm size by employment; and (4) location defined by subareas in Seoul and Gyeonggi. We chose the textile industry and the fabricated metal industry as the industries to be studied. In order to study location decisions in recent years, the sample firms were confined to those founded or relocated in 1979 or thereafter. Movers who relocated in response to government policy actions were oversampled. Large establishments were also oversampled. Regarding the location stratification, the sample allowed equal probabilities among subareas defined by the four-digit geocode, within Seoul and Gyeonggi respectively.

Although the sampling had been completed from the 1981 survey file, we needed logistical support from NBS to execute the actual survey in Korea. The 1981 file had only identification numbers and geocodes, without the names and addresses of establishments. The NBS staff provided the names and addresses of our sample firms by matching their ID's with those of the NBS master file.

The realized sample of 499 consists of 221 mature firms (i.e., non-movers), 141 movers, and 137 births. The sample covered all seventeen Gu's (districts) in Seoul, all four Gu's in Incheon, seven of eight satellite Si's (cities), and fifteen of twenty Gun's (counties) in Gyeonggi. The textile industry had 217 firms (43.49%) and the fabricated metal industry had 273 (54.71%).
The establishment survey questionnaire contains about 150 questions divided into the following five parts (see Appendix 5):

Part 1 - A set of comprehensive questions for all establishments: It includes the firm's present location, location history with year of foundation, and plant characteristics such as industry type, lot size, floor space, and land price. It also asks about types of workers, shipments of inputs and outputs, public utility services, and government incentive schemes.
Part 2 - A set of questions to movers about their previous location, reasons for relocation, government policies intended to influence relocation, and site characteristics.
Part 3 - Questions about future plans for expansion or relocation for all establishments.
Part 4 - For birth firms: A set of questions about important factors considered in choosing the location, including incentive schemes.
Part 5 - Questions to non-movers about their on-site expansions.

4. MOCI Location Census of Manufacturing Establishments

In 1978, under the auspices of the Ministry of Commerce and Industry (MOCI), the Korea Industrial Development Research Institute (KID) conducted the Location Census of Manufacturing Establishments. The census was intended to cover all manufacturing industry establishments in the entire country as of December 31, 1978.

Unlike the NBS census or surveys, which had been regularly conducted to obtain data on the level of production and the industrial structure of the manufacturing sector, this census was designed for a special purpose -- to obtain data necessary for a study of industrial location.

A copy of the MOCI Census data was released to us on December 20, 1982. It contained 18,661 establishments in Seoul-Gyeonggi. Of these, only 878 are small firms with fewer than five employees. It was obvious that the MOCI Location Census did not cover all establishments in the manufacturing sector.
According to the Address Coding Manual of the MOCI Census, the geocode system that was developed by the KID was different from the NBS standard system. The geocode consists of six digits for each administrative district as follows: (1) the first two digits denote Seoul (01) or Gyeonggi (02); (2) the next two digits denote Gu, Si, or Gun; (3) the final two digits denote Dong, Eup, or Myeon (except for Incheon, where the final two digits are for Gu). The fact that the MOCI census used its own geocode system makes it impossible to compare the MOCI census directly with the NBS manufacturing census at the four-digit geocode level. To overcome this problem, the KID code has been converted into the NBS standard geocode. (See Section 4 of the next chapter for more details.)

As shown in Appendix 5, the questionnaire for the MOCI Location Census asks 19 questions comprising 40 variables. The following information is included:

(1) The name and address of the establishment, major products, value of capital, annual shipment, the number of employees by type of work;
(2) The zone of establishment location, lot size, and building space;
(3) The amount of electricity, water, and fuel consumed;
(4) The total volume of raw materials used and of goods produced, the means of shipment, and destination of shipment;
(5) Types of pollution generated, the facilities to prevent pollution, and whether the establishment received government orders to move due to pollution;
(6) Reasons for selection of the present location;
(7) Information on future plans to move or expand, such as desired type of location, lot size required, and distance from the present location.

III. DATA PREPARATION

1. Mining and Manufacturing Survey, 1973-1980

Creation of Subfiles

Each annual file for the 1973-1980 period was processed by a Fortran program to extract the records for Seoul and Gyeonggi. From these files mining establishments were eliminated.
Diane Reedy used the SYSTEM/SORTMERGER utility to merge these annual subfiles to create a master file for 1977-1980. The records were sorted first by firm ID, then by year for the period. This merged data file was further divided by firm type to create subfiles for births, movers, deaths, and mature firms. The birth firms were those that first appeared in the file in any year, deaths were those that disappeared from the file, and movers were those firms with different geocodes between two years during the 1977-1980 period. The mature firms were those that appeared in all years with the same address.

Particular attention was given to movers because the 1977 survey file had geocodes different from those of other years. After the 1977 geocodes were replaced by those corresponding to 1980 (as explained below), the new 1977 files were merged with the 1978-1980 annual subfiles, and then a subfile for mover firms was created.

Table 1 and Table 2 provide information about the subfiles created by year and by firm type, respectively. Table 3 shows a subfile with 20 key variables needed for production function estimations.
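The firm-type definitions above can be sketched in Python. This is a minimal illustration under stated assumptions, not the original sort-merge processing: the record layout (`firm_id`, `year`, `geocode` tuples) and the function name `classify_firms` are hypothetical, births are taken to be firms first seen after 1977, deaths to be firms last seen before 1980, and a firm may carry more than one label because the text does not say how overlaps were resolved.

```python
from collections import defaultdict

YEARS = (1977, 1978, 1979, 1980)  # period covered by the merged master file


def classify_firms(records, years=YEARS):
    """Label each firm from (firm_id, year, geocode) records.

    Returns {firm_id: set of labels}. A firm may carry several labels
    (for example a birth that later moves); no precedence is imposed
    because the source does not document one.
    """
    first, last = min(years), max(years)
    history = defaultdict(dict)  # firm_id -> {year: geocode}
    for firm_id, year, geocode in records:
        history[firm_id][year] = geocode

    labels = {}
    for firm_id, by_year in history.items():
        present = sorted(by_year)
        tags = set()
        if present[0] > first:
            tags.add("birth")   # first appears after the initial year
        if present[-1] < last:
            tags.add("death")   # drops out before the final year
        if len(set(by_year.values())) > 1:
            tags.add("mover")   # geocode differs between two years
        if len(present) == len(years) and not tags:
            tags.add("mature")  # present every year at one location
        labels[firm_id] = tags
    return labels
```

Under these assumptions, a firm observed in all four years at one geocode is labeled mature, while one observed only in 1979 and 1980 is labeled a birth. A firm with a gap in the middle of the period but no other change receives no label and would need separate handling.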
Table 1: ANNUAL MANUFACTURING SUBFILES

File Name                  Number of Records   Mining Firms (Excluded)
D/REEDY/MFG73/SEOUL3                    5832                        23
D/REEDY/MFG74/SEOUL3                    5848                        22
D/REEDY/MFG75/SEOUL3                    5542                        18
D/REEDY/MFG76/SEOUL3                    6137                        18
D/REEDY/MFG77/SEOUL3                    7282                        25
D/REEDY/MFG78/SEOUL3                    7752                        19
D/REEDY/MFG79/SEOUL3                    8246                        15
D/REEDY/MFG80/SEOUL3                    7652                        14
D/REEDY/MFG73/GYEONGGI                  2437                       235
D/REEDY/MFG74/GYEONGGI                  2386                       208
D/REEDY/MFG75/GYEONGGI                  2763                       240
D/REEDY/MFG76/GYEONGGI                  3329                       245
D/REEDY/MFG77/GYEONGGI                  2959                       242
D/REEDY/MFG78/GYEONGGI                  5229                       253
D/REEDY/MFG79/GYEONGGI                  5680                       243
D/REEDY/MFG80/GYEONGGI                  5860                       234

Table 2: FIRMTYPE SUBFILES OF MERGEFILE (1977-1980)

File Name                  Number of Records   Number of Establishments
D/REEDY/SEOUL/MATURE                   12914                       3231
D/REEDY/SEOUL/BIRTH                     8798                       4235
D/REEDY/SEOUL/DEATH                     4097                       1716
D/REEDY/SEOUL/MOVERS                     296                         97
D/REEDY/GYEONGGI/MATURE                 9576                       2394
D/REEDY/GYEONGGI/BIRTH                  6541                       3258
D/REEDY/GYEONGGI/DEATH                  1953                        809
D/REEDY/GYEONGGI/MOVERS                  149                         40

Table 3: TWENTY-VARIABLE SUBFILES (1977-1980) a/

                           Number of Records b/
File Name                  Seoul    Gyeonggi    Total
D/CHUN/MFG73/SF20           5832        2437     8269
D/CHUN/MFG74/SF20           5848        2386     8234
D/CHUN/MFG75/SF20           5542        2763     8305
D/CHUN/MFG76/SF20           6137        3329     9466
D/CHUN/MFG77/SF20           7282        2959    10241
D/CHUN/MFG78/SF20           7752        5229    12981
D/CHUN/MFG79/SF20           8246        5680    13926
D/CHUN/MFG80/SF20           7652        5860    13512

a/ See record layout in Annex IV.2 for the descriptions of 20 variables.
b/ Manufacturing establishments only.

Geocode System in Survey Files

The Korean government revised the geocode system in 1980. Because the original survey files for 1978 and 1979 had only the first two digits of the geocodes, NBS staff entered the 1980 six-digit geocodes to replace the two-digit codes appearing in the original 1978 and 1979 files to produce a consistent geocode system. But the 1977 file still had old geocodes.

Replacement of Geocodes in 1977 Files

In 1980 Seoul had 17 Gu's, an increase of four new Gu's over 1977.
Replacing the 1977 geocodes by those of 1980 required identifying the Dong's in the four new Gu's and assigning the new geocodes to them. The affected Gu's were:

1117 split into 1118 and 1119,
1120 split into 1122 and 1123,
1121 split into 1124 and 1125,
1122 split into 1126 and 1127.

A similar change occurred in Gyeonggi. In 1980 Gyeonggi gained one Gun over 1977, resulting from the breakup of Gun 3131 into two, 3131 and 3132. Tables 4 and 5 show the details of the replacement work done.

Table 4: REPLACEMENT OF 1977 GEOCODES, SEOUL

Former Geocode (1977)      Replacement Geocode (1980)
1111nn                     1111nn
1112nn                     1112nn
1113nn                     111400
1114nn                     1114nn
1115nn                     111600
1116nn                     111700
111701-111706              111900
111707-111708              111800
1118nn                     112000
1119nn                     111300
112001-112012, 112014      112300
112013, 112015-112029      112200
112101-112105              112500
112101-112103, 112106      112400
112201                     112600
112202-112206              112700
1123nn                     112100

Table 5: REPLACEMENT OF 1977 GEOCODES, GYEONGGI

Former Geocode (1977)      Replacement Geocode (1980)
3101nn                     3101nn
3102nn                     3102nn
3103nn                     3103nn
3104nn                     3104nn
3111nn                     3111nn
3112nn                     3112nn
3113nn                     3113nn
3114nn                     3114nn
3115nn                     3115nn
313101                     313101
313102-313104              313200
3132nn                     313300
3133nn                     313400
3134nn                     313500
3135nn                     313600
3136nn                     313700
3137nn                     313800
3138nn                     313900
3139nn                     314000
3140nn                     314100
3141nn                     314200
3142nn                     314300
3143nn                     314400
3144nn                     314500
3145nn                     314600
3146nn                     314700
3147nn                     314800
3148nn                     314900
n.a. (Banweol)             (31500)

2. Mining and Manufacturing Survey, 1981

The preparation of the 1981 manufacturing survey data needed extra attention for the following reasons. Since the survey file was released to us before it was finalized, we first had to go through a data cleaning process. In particular, we concentrated on identifying inconsistencies between variables in the relocation module, the set of information on each firm's relocation history. Next, for the purpose of using this file as the sample frame, we prepared various subfiles according to the stratification criteria.
The steps taken for data cleaning and creating the subfiles were as follows.

Relocation Classification (RC)

The relocation classification code is a variable which identifies whether a firm is a mover or not. RC takes values of 0, 1, or 2 as follows:

(1) If the RC value is 0, then the firm is a non-mover. Consequently, the previous location and the date of relocation are coded as zero;
(2) If the RC value is 1, the firm is a mover that has relocated within the Si, Gu, or Gun where the firm is currently located. Thus, the first four digits of the present location are equal to those of the previous location. The date that the firm relocated is represented as YYMMDD;
(3) If the RC value is 2, then the firm is a mover that has relocated from one Si, Gu, or Gun to another. Therefore, the four-digit location code has changed. The date of relocation is also given as YYMMDD.

Data Editing

Data cleaning was the first task required in order to process the 1981 manufacturing survey data. For this purpose, a Fortran program was written to verify coding and conduct consistency checks. The coding verification was focused on the relocation years and the geocodes of previous location and present location. The consistency check was done by examining the previous location and the present location based on the relocation classification (RC). In validating the geocodes, the 1980 NBS geocode system served as the base.

The following summarizes the results found from the verification and consistency check. The obvious errors, given below, were corrected on a copy file created for back-up:

1. Three firms in Table 6 having '83 or '84 as the year of relocation were removed from the file.

Table 6: IMPROPER RELOCATION YEARS

Firm ID    Present Location    SIC      Date of Relocation
3101588    311420              35113    830331
1115567    112220              35299    831000
2104706    211821              37103    840000

2. Table 7 lists twelve firms which have invalid geocodes for the previous location or the present location.
One of them (ID:2202352) has 215 for its present location. The 215 has been changed to 2215 so that the first two digits of the geocode (22), denoting Si or Do, are consistent with the first two digits of the firm ID. Two of them (ID's 1116431 and 1116447), which are mover firms (RC=2), have invalid geocodes in their previous location (1612 and 1515, respectively). These firms are left in the file unchanged. Particular care should be taken with them when they are actually encountered. Finally, the remaining nine firms have previous location geocodes of which the first two digits, indicating Si or Do, are valid but the next two, indicating Gu, Si, or Gun, are not. Since these firms are useful at the two-digit geocode level, the first two valid digits have been saved, but the next two invalid ones have been replaced by "00" (uncertain Gu, Si, or Gun).

Table 7: INVALID GEOCODES AND THE CORRECTIONS

           Present                   Previous    Date of
Firm ID    Location    SIC     RC    Location    Relocation    Corrections
3701441    374212      32132   2     2205        810501        2200
2203352    ?21519      33111   0     0                         221519
3800684    381232      35302   2     2300        790000        unchanged
1116431    112233      35599   2     1612        790510        remained
1115446    112218      35609   2     2300        790310        unchanged
3106884    315033      36996   2     2303        810329        2300
3801255    381519      38120   2     1143        80110A        1100
1116447    112233      38192   2     1515        801010        remained
1120858    112322      38196   2     2300        810430        unchanged
1100702    111122      38293   2     3123        80108         3100
1112906    111818      38321   2     1129        740601        1100
1118153    112323      38525   2     2300        770310        unchanged

3. As explained earlier, if the RC value is 1, then the previous location should be the same as the present location at the four-digit level. Although the thirteen firms listed in Table 8 have an RC value of 1, the date relocated and their previous location were unspecified (value of 0). Thus, the previous location has been replaced by the first four digits of the present location.
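The RC definitions and correction 3 can be sketched as a small consistency check. This is an illustration only, not the project's Fortran program: the dictionary keys (`rc`, `present`, `previous`, `date`), the function names, and the use of the string "0" for unspecified values are assumptions.

```python
def check_rc(record):
    """Return a list of inconsistencies between RC and the location fields.

    `record` is a dict with hypothetical keys: rc (0, 1, or 2),
    present (six-digit geocode string), previous (four-digit geocode
    string, "0" if unspecified), and date (relocation date, "0" if none).
    """
    rc, present, previous = record["rc"], record["present"], record["previous"]
    problems = []
    if rc == 0:
        # Non-movers should carry zeros in both relocation fields.
        if previous != "0" or record["date"] != "0":
            problems.append("non-mover with relocation data")
    elif rc == 1:
        # Movers within the same Si, Gu, or Gun keep the four-digit code.
        if previous == "0":
            problems.append("previous location unspecified")
        elif previous != present[:4]:
            problems.append("RC=1 but four-digit location changed")
    elif rc == 2:
        # Movers between Si, Gu, or Gun must show a changed four-digit code.
        if previous == "0":
            problems.append("previous location unspecified")
        elif previous == present[:4]:
            problems.append("RC=2 but four-digit location unchanged")
    else:
        problems.append("invalid RC value")
    return problems


def fill_previous_for_local_movers(record):
    """Apply correction 3 to a copy of the record: for RC=1 with an
    unspecified previous location, use the first four digits of the
    present location."""
    fixed = dict(record)
    if fixed["rc"] == 1 and fixed["previous"] == "0":
        fixed["previous"] = fixed["present"][:4]
    return fixed
```

For a record with RC=1, present location 112121, and previous location 0, the check reports an unspecified previous location, and the correction sets the previous location to 1121. Working on a copy mirrors the practice of making corrections on a back-up copy of the file.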
Table 8: UNSPECIFIED PREVIOUS LOCATION

           Present                   Previous    Date of
Firm ID    Location    SIC     RC    Location    Relocation
1114263    112121      32135   1     0           810410
1106466    111427      32135   1     0           810310
1114419    112128      33132   1     0           81101Q
2103045    211724      34193   1     0           800311
1101066    111124      34212   1     0           800210
3600455    360228      35113   1     0           810818
3801540    381812      35302   1     0           811210
2102760    211639      35592   1     0           781201
3107527    311522      35609   1     0           811020
3107464    311520      35609   1     0           790928
2200610    221228      36991   1     0           820218
1106838    111432      39010   1     0           800308
2103021    211724      39097   1     0           731020

4. Twelve firms in Table 9 had zeros as the values of the relocation classification. However, they specified their previous location and the date of relocation as shown in the table. Since they should certainly be regarded as movers, their RC values were revised to 2 according to the RC definition.

Table 9: UNSPECIFIED RELOCATION CLASSIFICATIONS

           Present               Previous    Date of       RC Values
Firm ID    Location    SIC       Location    Relocation    Revised
2202808    221515      32163     2213        810615        2
2105370    211875      34119     2215        790714        2
2204411    221622      34199     2212        820329        2
1119073    112532      34213     1111        790412        2
3402474    344411      33116     3131        801210        2
3402548    344441      34116     1114        810510        2
2105411    211825      35291     2113        731295        2
2105358    211825      35592     2115        760809        2
2105357    211825      35609     2114        780710        2
2105455    211825      38239     2115        770610        2
2105196    211825      38235     2115        740325        2
2105461    211825      38432     2116        810107        2

Creation of Subfiles

According to the sampling strategy described earlier, a number of subfiles (Table 10) were created to perform stratified random sampling. Three subfiles were first created according to the location history -- mature (non-movers), movers, and births. Then each of them was divided by employment size. The movers were further divided into three subgroups according to the reasons for relocation -- voluntary, government incentives, and government orders.
It should be noted here that all subfiles prepared above contained only the establishments in the two industries to be studied, textiles and fabricated metals. More details of the sampling procedures are described in the next section.

Table 10: STRATIFIED SUBFILES

File Name                              Records   Comments
D/PAHK/MFG81/SAMPLE/RCTMVR/RSN1L           371   Recent Movers-Large Voluntary
D/PAHK/MFG81/SAMPLE/RCTMVR/RSN1S           425   Recent Movers-Small Voluntary
D/PAHK/MFG81/SAMPLE/RCTMVR/RSN3             84   Recent Movers by Government Incentives
D/PAHK/MFG81/SAMPLE/RCTMVR/RSN4             72   Recent Movers by Government Orders
D/PAHK/MFG81/SAMPLE/BIRTH/LARGE            733   Birth-Large
D/PAHK/MFG81/SAMPLE/BIRTH/SMALL           1744   Birth-Small
D/PAHK/MFG81/SAMPLE/MATURE/LARGE          1726   Mature-Large
D/PAHK/MFG81/SAMPLE/MATURE/SMALL          2115   Mature-Small

3. The Project Sample Establishment Survey

This section describes the sample stratification, the random sampling algorithm, and the data cleaning done for the establishment survey.

Sample Stratification

The sample stratification criteria were described before. Table 11 shows that a total of 7,297 establishments in the two industries were stratified by those criteria. Table 12 shows the planned sample composition resulting from the stratification with the following controls: (1) the three strata by firm type have equal shares; (2) large firms are oversampled among movers and mature firms; and (3) small firms in Seoul are oversampled among births. The actual drawing of sample firms, however, was performed for 750 firms to maintain reserves for possible replacement of firms that would fail to respond to the survey.

Sample Algorithm

The final step for sampling was to develop an algorithm to perform the random sampling with the strata defined above.
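As a generic illustration of drawing within strata while holding reserves, and not a reconstruction of the project's own algorithm, the idea can be sketched as follows. The function name `draw_with_reserves`, the stratum names, the quotas, and the uniform reserve ratio are all hypothetical; only the overall ratio of 750 draws to 500 planned firms comes from the text.

```python
import math
import random


def draw_with_reserves(strata, quotas, reserve_ratio=1.5, seed=0):
    """Draw a simple random sample within each stratum, plus reserves.

    strata:  {stratum_name: list of establishment IDs}
    quotas:  {stratum_name: planned number of sample firms}
    reserve_ratio: total draws per planned firm (750 / 500 = 1.5 overall;
        applying it uniformly to every stratum is an assumption).
    Returns {stratum_name: (primary_ids, reserve_ids)}.
    """
    rng = random.Random(seed)  # fixed seed so the draw can be reproduced
    result = {}
    for name, quota in quotas.items():
        frame = strata[name]
        n_draw = min(len(frame), math.ceil(quota * reserve_ratio))
        drawn = rng.sample(frame, n_draw)  # sampling without replacement
        # The first `quota` draws form the sample; the rest are reserves
        # used to replace firms that fail to respond.
        result[name] = (drawn[:quota], drawn[quota:])
    return result


# Hypothetical frames sized like two of the mover subfiles (84 and 72 records).
strata = {
    "movers_incentives": list(range(84)),
    "movers_orders": list(range(72)),
}
quotas = {"movers_incentives": 20, "movers_orders": 20}  # illustrative quotas
sample = draw_with_reserves(strata, quotas)
```

With these illustrative quotas each stratum yields 20 primary firms and 10 reserves. Because the reserves come from the same random draw as the primary sample, a replacement firm has the same selection probability within its stratum as the firm it replaces.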