Policy Research Working Paper 9977

Proximity without Productivity
Agglomeration Effects with Plant-Level Output and Price Data

Arti Grover
William F. Maloney

Finance, Competitiveness and Innovation Global Practice
March 2022

Abstract

Recent literature suggests that the positive impact of population density on wages, the canonical measure of agglomeration effects, is multiples higher in developing countries than in advanced economies. This poses an urban productivity puzzle because on-the-ground observations do not suggest that cities in developing countries function especially well or are conducive to enhanced productivity. This paper uses manufacturing censuses from four countries at differing levels of income that allow separating plant output quantity from prices. It shows that higher wage elasticities with respect to density are due to higher marginal costs, and agglomeration elasticities of efficiency, physical total factor productivity, are in fact far lower in developing countries. Further, congestion costs decrease with country income. Both are consistent with often low rates of structural transformation that make cities in developing countries so-called “sterile agglomerations,” which are populous but not efficient.

This paper is a product of the Finance, Competitiveness and Innovation Global Practice and the Office of the Chief Economist, Latin America and the Caribbean Region. It is part of a larger effort by the World Bank to provide open access to its research and make a contribution to development policy discussions around the world. Policy Research Working Papers are also posted on the Web at http://www.worldbank.org/prwp. The authors may be contacted at agrover1@worldbank.org and wmaloney@worldbank.org.

The Policy Research Working Paper Series disseminates the findings of work in progress to encourage the exchange of ideas about development issues.
An objective of the series is to get the findings out quickly, even if the presentations are less than fully polished. The papers carry the names of the authors and should be cited accordingly. The findings, interpretations, and conclusions expressed in this paper are entirely those of the authors. They do not necessarily represent the views of the International Bank for Reconstruction and Development/World Bank and its affiliated organizations, or those of the Executive Directors of the World Bank or the governments they represent.

Produced by the Research Support Team

Proximity without Productivity: Agglomeration Effects with Plant-Level Output and Price Data

Arti Grover and William F. Maloney1

JEL Classification: O18, O40, D24, E24
Key words: agglomeration, productivity, cities, structural transformation.

1 World Bank. We thank Alvaro Garcia, Marcela Eslava, Kaleb Girma Abreha and Massimiliano Cali for graciously allowing access to micro datasets in Chile, Colombia, Ethiopia and Indonesia by running our Stata codes, with excellent research assistance from Nicolas Urdaneta Andrade (Colombia) and Bayu Agnimaruto (Indonesia). We are also grateful to Angelos Theodorakopoulos for his work in the early phases of this project, including his contributions to the Stata codes and for sharing his insights on productivity estimations. Our research benefited greatly from thoughtful and constructive comments by Gilles Duranton, Somik Lall and Tony Venables.

I. The Developing Country Urban Productivity Puzzle

There is overwhelming evidence of the economic benefits of agglomeration, arising from ‘sharing’ of factors of production, improved ‘matching’ of firms and workers, and externalities related to knowledge spillovers (see, for example, Duranton and Puga, 2004).
The canonical measure of such agglomeration effects is the elasticity of wages with respect to density, and the estimates are large: 0.043 in the United States (Chauvin et al., 2017); 0.03 in France (Combes et al., 2008); and 0.025 in Spain (de la Roca and Puga, 2012). Recent findings suggest that these effects are even larger in developing countries: 0.1 to 0.19 in China (Combes et al., 2013; Chauvin et al., 2017); 0.12 in India (Chauvin et al., 2017); and 0.17 in Africa (Henderson et al., 2019). That is, a given increase in density raises wages roughly four times as much in Africa as in the US.

This poses something of a puzzle given that other sources of information suggest that developing country cities are not so much extraordinarily productive as just crowded and even dysfunctional. Satellite and geographic information system (GIS) data covering large cities across the Sub-Saharan Africa region suggest that they are sprawling and disconnected, a far cry from the dense packing of educated workers sharing ideas in advanced country cities such as Chicago historically (McMillen 2003), or today’s Silicon Valley. As an example, land markets often function poorly with overlapping property rights regimes, suboptimal and ineffective zoning regulations, and limited supporting infrastructure, including transport, all of which hinder efficient concentration and raise costs (Lall, Lebrand, and Soppelsa 2021). Further, if Glaeser and Resseger (2010) found that areas in the US with below average human capital showed no agglomeration benefits because of limited benefits of pooling differentiated labor and knowledge spillovers, we might expect even fewer benefits in skill-poor developing countries.

This paper contributes to a resolution of this puzzle.
It first conceptually disaggregates the canonical wage elasticity into component parts and shows that increasing productivity is only one possible driver (rising urban costs are another), and that the data employed to date cannot distinguish between them. We then study four countries ranging across regions and the development spectrum: in order of income, Ethiopia, Indonesia, Colombia and Chile. These are four of the very few countries whose nationally representative establishment-level databases permit separating plant prices from physical output, are linkable to a standard set of geographical covariates, have substantial subnational variation, and are available for use. This allows estimating physical productivity or efficiency (TFPQ), which should capture the effects of better matching and externalities, and marginal costs, thereby showing how they contribute to the counterintuitive wage elasticity estimates.

We first confirm the literature’s findings of high elasticities of wages and revenue total factor productivity (TFPR) in the poorer countries. Our wage elasticity estimates using plant-level data for Ethiopia (.052) and Indonesia (.057) are not as high as those for China, India and Africa estimated using the labor surveys mentioned above, but they are three times our estimate for the richest country in our sample, Chile (.017), which is close to the estimates for more advanced countries. However, we then show that these measured benefits largely disappear once the concomitant rise in prices is accounted for: efficiency gains measured as the elasticity of physical total factor productivity (TFPQ) with respect to density show negative point estimates for the three lower income countries, and significantly so in some specifications for Ethiopia and Colombia.
Rather, we find that the elasticity of marginal cost is strongly positive for these countries, suggesting that the costs of operating in developing country cities increase sharply with population, and this pushes up prices and hence wages. Our estimates suggest that the TFPQ elasticity, in fact, increases with development, while the marginal cost elasticity decreases. The net benefit to firms of operating in a city rises steeply with development, with the productivity elasticity likely exceeding the cost elasticity only at middle income.

These findings raise the question of why firms and workers would move to the city in the first place if they are not getting productivity-enhancing benefits. It first bears noting that the equation of urbanization with industrialization, and even with increased productivity, is historically a recent one. Bairoch (1988) reminds us that urbanization began with the emergence of surplus arising from sedentary agriculture and that cities served as providers of services and distributors of rents long before the arrival of industrialization. In his most negative account, he sees Rome, the largest city in the ancient world, as an exactor of tribute, a “parasitic” city that had little in the way of industry, and sees many developing country cities as “Romes without empires” (p. 466).2 Gollin et al. (2016) echo and update this mechanism by arguing that developing country agglomerations often are not accompanied by structural transformation into industries that would benefit from spillovers and deeper factor markets, but rather are distribution points for natural resource wealth: they are consumption, not production, cities. International aid, whose first stop is generally the capital, could also serve to support consumption. Ethiopia, for instance, receives roughly 4 percent of GDP in foreign aid, which is large when we consider that Chile’s entire mining sector constitutes 9 percent of GDP.
In a similar vein, migrants may be seeking out (subsidized) health and education services and amenities unavailable in the rural areas as much as more productive jobs. Ades and Glaeser (1995) also stress political reasons for urbanization, particularly in primate cities, because spatial proximity to power increases political influence or rents. They find that dictatorships have central cities 50 percent larger than democracies. In these rent-distributing or consumption cities, utility derived from these sources, not firm productivity, needs to exceed the disutility of congestion. In a more traditional take, Krugman and Livas (1996) and others argue that high internal transport costs may lead domestic producers to locate close to income sources rather than produce in less costly venues, implying that firm size and revenues may be higher, but efficiency may not be. In all cases, locally concentrated income drives up real estate prices, requiring higher wages for local labor and generating higher marginal costs for firms.3 In this view, the wage elasticity captures little about the productivity of local industry or the city.

2 Apologies to Krugman and Livas (1996).

II. Decomposing the Agglomeration Elasticity

Figure 1 decomposes what is traditionally measured by wage elasticity estimates (see Combes et al., 2010; Duranton and Puga, 2020). Moving right, the figure explores the benefits of agglomeration including the quantity and quality effects, and externalities.

Quantity Effects: Cities or productive places tend to attract more workers and firms, due to natural geography (first nature drivers) as well as historical past and sunk investment (second nature drivers). Thus, the wage elasticity may be measuring the reverse of the causality usually assumed: places are dense because they are more productive.
Econometric estimates attempt to deal with this by instrumenting density with historical density or geological variables that may have historically been important for clustering.

Quality Effects: Cities also attract firms and workers of higher quality. More able workers may sort into cities because they offer better and greater opportunities or amenities (static gains). Once there, they may learn more (dynamic gains), either because cities offer more learning opportunities or because competition induces them to. The overall effect is to make firms and workers more productive and raise wages.4 Access to inputs (labor, finance, materials, energy), a more conducive business environment, or cultural amenities will also attract better and more entrepreneurial firms. These sorting or selection effects can be dealt with econometrically by controlling for human capital explicitly and by controlling for firm-specific fixed effects.

Externalities: Agglomeration externalities arise from knowledge creation and spillovers from new technology and innovation, networking, collaboration, and information sharing intermediated outside the market (Duranton and Puga 2004). These can include the gains from clustering with similar firms (localization or Marshallian economies) or interactions with diverse sectors (urbanization or Jacobian economies). Workers may gain human capital by interacting with a larger number and quality of workers, thus contributing to the dynamic learning effects (see dotted diagonal line in figure 2). US cities that experience higher growth in the number of college graduates also see a more rapid increase in average wages, beyond the obvious compositional effects, than those with a stagnating stock of human capital, suggesting important knowledge spillovers (Moretti 2004). More recently, researchers have recognized that it is not only about bringing more and better “clones” to the city but also a greater variety of them, which is arguably part of agglomeration externalities (see Duranton, 2016; Diamond, 2016; Atkins, Faber, Navarro, 2018; Handbury and Weinstein 2015).

3 Another diagnosis, however, finds little correlation with resource sectors and argues that it is the unusually low agricultural productivity and poor service provision that propels migrants to the cities (see Henderson and Turner 2020). This would lead to a true productivity gradient, although it is driven less by urban economies than by rural dysfunctionality and hence would be much more modest. The strong attraction of educational or other social services not available in surrounding low productivity agricultural areas provides a further, non-productivity related motivation to move. Effectively, any source of income not produced by local urban industry could create a demand for city living, pushing up prices: disbursement of taxation, foreign aid, natural resource rents. This also extends to service provision. If services are concentrated in cities, then populations will move there.

4 What an urban worker learns over a span of 3 to 10 years potentially adds another 23 percent to the urban wage premium in advanced economies such as Spain (De la Roca and Puga 2017). Duranton (2015) estimates that, over a working life, these gains could amount to between 50 and 125 percent.

Agglomeration/Congestion Costs: However, cities also present agglomeration costs due to higher crime, disease and congestion in transport, land and housing. Such costs, and the necessary compensation for them, are often not considered in interpreting estimates of the wage elasticity, but they should be, as they may create a positive elasticity in the absence of any productivity gains.5 The same issue affects other measures of city-level productivity, namely the revenue-based firm-level productivity measure, TFPR, which is the residual of the value of output controlling for capital, labor, and sometimes human capital. Firms charging higher output prices may appear more productive, even if these gains actually represent higher costs or market power (Foster et al. 2008).

5 For example, the cost elasticities with respect to city size in France and Colombia are comparable in magnitude to that of agglomeration (Combes, Duranton, and Gobillon 2019; Duranton 2016). Burger, Ianchovichina and Akbar (2022) also estimate the “pure” firm productivity gains of density, net of negative externalities associated with limited mobility, crime, and pollution and suggest that the benefits of agglomeration are lost through both limited uncongested mobility and congestion.

The next section attempts to isolate these distinct effects empirically, focusing particularly on wages, total factor productivity and marginal costs, each of which captures one or more elements discussed above, and has limitations relating to interpretation, measurement and endogeneity.

III. Measurement Challenges and Estimation Methodology

Data

Table 1 details the sources of data for the four countries used. Each is an annual manufacturing survey containing the standard input and performance variables for plants of over 10 workers or, in the case of Indonesia, 20. In addition, the included countries meet four criteria which sharply reduce the universe of possible candidates. First, they need to have data on the geographical location of plants, and to be linkable to other data sets that tabulate population density, as well as variables such as precipitation, temperature, elevation, terrain ruggedness and slope needed for instruments. Second, though we treat the data in a panel context, identification is largely driven by cross-sectional variation in population density and hence we need large countries with many heterogeneous subnational units. Three of the four have between 200 and 300 municipalities, with only Ethiopia having 91 towns. Third, they need to tabulate product price data that is essential for calculating technical efficiency. Fourth, they need to be accessible. The only advanced country with price data we were able to get access to was Belgium, which did not meet the second criterion. In the end, however, we have substantial representativeness across regions (Africa-Ethiopia; Asia-Indonesia; Latin America-Colombia and Chile) as well as level of development, with per capita income (2020 US$) ranging from 2,422 in Ethiopia to 12,068 in Indonesia, 14,570 in Colombia, and 25,068 in Chile (World Bank). After matching to geographical covariates, plant count varies from 3,465 over 9 years in Ethiopia to 34,366 over 15 years in Indonesia. The surveys contain the geographical location of each plant, which is then matched to local population and geospatial controls. Annexes A.1 and A.2 provide a detailed data description for each of the countries, including spatial variables.

Measurement and Estimation Challenges

The common use of the nominal wage elasticity arises partly because wages capture all agglomeration-related sources of productivity growth (ease of capital accumulation, better job matching, knowledge spillovers and so on) but also because the absence of location-specific production data and local prices often dictates a reliance on labor force data.6 In the present work, we use plant-level average wages (total wages/number of workers) directly observed from the production data, which can be directly compared to other plant-level productivity measures.
Since we do not have geographically disaggregated data on local consumption baskets, we do not calculate a measure of real purchasing power.

Our second principal measure, revenue-based TFP (TFPR), is the Hicks-neutral measure of a plant’s efficiency, commonly used in the literature to capture the residual output variation after controlling for tangible inputs of production. This strips out agglomeration effects arising from availability of capital or other inputs and thereby focuses on externalities arising from matching and spillovers.7 Because it is based on nominal revenue (sales), this measure incorporates prices and thereby potentially confounds higher costs and the plant’s market power with efficiency.

6 Even when price data (e.g. local housing costs) are available, it is not clear whether real wages capture agglomeration effects (Chauvin et al., 2017) because the elasticity of wages with respect to housing costs, for example in a sample of US cities examined in Beaudry et al. (2014), is nearly 1 such that “people follow jobs” rather than wages (nominal or real).

7 Further, as Combes and Gobillon (2015) note, wages may differ from true gains to the degree that local firms have monopsony power, and TFPR avoids making any assumption about the relationship between the local monopsony power and agglomeration. By construction, however, a limitation of this measure is the assumption that all employees within a firm earn the same average wage. Thus, it masks any underlying agglomeration mechanisms arising from heterogeneity in skills and occupations. For a more granular analysis with administrative worker-level data, see De la Roca and Puga (2017) and Quintero and Roberts (2018).

Our data permit constructing plant-level price indices which, using the method in Foster et al. (2008), allow us to work in physical units of output and to build a measure of technical efficiency:

$$tfpq_{it} = \ln TFPQ_{it} = \ln TFPR_{it} - \ln P_{it}$$

for plant $i$ in period $t$, where $P_{it}$ is a weighted average of prices across products and plants in plant $i$.8 As we would also like to know what is driving changes in prices across density, in the process of estimating $tfpq_{it}$, we directly recover (log) mark-ups ($\mu_{it}$) of plant $i$ at time $t$ and (log) marginal costs ($mc_{it}$). Specifically, mark-ups are calculated based on the expression derived from the first order condition of the plant’s cost minimization with respect to the flexible input, materials:

$$\mu_{it} = \ln \theta^{M}_{it} - \ln s^{M}_{it} - \varepsilon_{it}$$

where $\theta^{M}_{it}$ is the output elasticity of materials estimated from the production function of plant $i$ at time $t$, $s^{M}_{it}$ is the share of material inputs expenditure ($P^{M}_{it} M_{it}$) over total sales ($P_{it} Q_{it}$), and $\varepsilon_{it}$ is the ex-post shock to the estimated production function. As mark-ups are the wedge between prices and marginal costs,

$$mc_{it} = \ln P_{it} - \mu_{it}.$$

A remaining concern is that prices also are a measure of quality, which would presumably also be increased by access to better knowledge and skills. Hence, conceptually, the “true” productivity gains lie somewhere between TFPQ and TFPR. We explore the importance of this effect for Colombia.

Two econometric issues remain pertinent to the construction of these variables. First, TFPR is unobserved to the econometrician but known to the plant when making its periodic input decisions, potentially inducing a bias arising from the correlation between productivity and inputs (Marschak and Andrews 1944). Hence, we estimate the production function for each 2-digit industry following the structural approach in Gandhi, Navarro, and Rivers (2020) (henceforth GNR).9 We also experiment with alternative estimation techniques that rely on different functional forms, such as Cobb-Douglas and Translog, and estimate a value-added production function following Ackerberg, Caves, and Frazer (2015) (ACF).10 In addition, though we observe plant-level prices and therefore output is in physical quantities, capital and material inputs are still based on expenditures deflated with an industry-specific price index.
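As a rough numerical illustration of the three definitions above (physical productivity, the log mark-up, and log marginal cost), the following Python sketch evaluates them for a single plant-year. The function name and all input values are hypothetical; in the paper, the output elasticity of materials and the ex-post shock come from the estimated production function, which is not reproduced here.

```python
import math

def plant_measures(ln_tfpr, ln_price, theta_m, material_share, eps=0.0):
    """Return (tfpq, log mark-up, log marginal cost) for one plant-year.

    Hypothetical helper that mirrors the definitions in the text:
      tfpq        = ln TFPR - ln P
      log mark-up = ln(theta_m) - ln(material_share) - eps
      log mc      = ln P - log mark-up
    """
    tfpq = ln_tfpr - ln_price
    log_markup = math.log(theta_m) - math.log(material_share) - eps
    log_mc = ln_price - log_markup
    return tfpq, log_markup, log_mc

# Made-up inputs: a plant whose materials elasticity (0.6) exceeds its
# materials revenue share (0.5), which implies a positive log mark-up.
tfpq, log_markup, log_mc = plant_measures(
    ln_tfpr=2.0, ln_price=0.5, theta_m=0.6, material_share=0.5
)
print(f"tfpq={tfpq:.3f}  log_markup={log_markup:.3f}  log_mc={log_mc:.3f}")
```

A plant that charges a higher price with unchanged TFPR shows lower measured TFPQ, which is why revenue-based productivity can overstate efficiency when prices rise with density.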
Deflating with an industry-level index implies that, for any deviation from a perfectly competitive input market, capital and material inputs also include unobserved idiosyncratic input price variation. Thus, we correct for the ‘input price’ bias arising from the correlation between input prices and quantities following Blum et al. (2018), which extends the GNR method to recover estimates of mark-ups using output price data. Following De Loecker et al. (2016), we extend this methodology to address the input price bias stemming from unobserved plant-level input prices (henceforth, TFPQ à la De Loecker et al.).

8 See Annex B.1 for the price index. For alternative methods of constructing price indices, see Eslava and Haltiwanger (2021). The scarcity of price data has often led analysts to deflate using sectoral price data, with the typical limitation in this literature of unaccounted pricing heterogeneity (see Foster et al., 2008) that does not separate the efficiency or productivity of firms in terms of real inputs and outputs from other price and demand effects. Thus, the estimated productivity effect could still be capturing, for example, differences in mark-ups charged by firms (Holl, 2016).

9 For more details of this issue see Mendershausen (1938); Marschak and Andrews (1944); Bond and Söderbom (2005); Ackerberg et al. (2015) and a formal discussion by GNR. This approach has two key advantages compared to standard proxy variable methods. First, it solves for identification issues when the production function contains at least one flexible input such as materials. Second, it imposes no restrictions on the elasticity of substitution between inputs by modeling the production function nonparametrically.

10 For more details on the estimation routine, see Annex B.2.

Estimation: We employ the standard specification in the literature

$$y_{ict} = \beta \ln(Den_{ct}) + \gamma X_{it} + \delta_{st} + \nu_{ict} \qquad (1)$$

where $y_{ict}$ represents the set of logged values of productivity measures of plant $i$ at time $t$: wages, $w_{it}$; TFPR, $tfpr_{it}$; TFPQ, $tfpq_{it}$, using both the Foster et al. and à la De Loecker et al. methods; price, $p_{it}$; marginal cost, $mc_{it}$; and mark-ups, $\mu_{it}$. $\ln(Den_{ct})$ is the log of population density in location ($c$) where plant ($i$) is located at time ($t$). For each country included in our study, we adopt a comparable spatial scale used in the literature: analysis in Chile and Colombia is at the municipality level, while in Ethiopia it is carried out at the town level. In Indonesia, we use the district as the primary spatial unit of analysis.11 $X_{it}$ includes plant-level controls, such as plant age and size (that is, the number of workers, included in wage and price regressions only). $\delta_{st}$ are pair-wise 4-digit-industry-year fixed effects to help factor in any sectoral or time trend, and $\nu_{ict}$ is the error term. Base estimations pool the observations (OLS), and standard errors are clustered at the level of the spatial unit. Later we will exploit the cross-sectional variation to attempt to control imperfectly for firm selection.

Both quantity and quality effects described in the previous section can lead to inconsistent estimates. Though previous studies suggest that attempting to remedy this endogeneity does not drastically change the elasticity magnitudes (see the meta-analysis by Grover, Lall and Timmis, 2021; Duranton 2016 for Colombia), we follow the literature by: (i) employing geological variables such as land fertility (Combes et al. 2010) and land suitability for the construction of tall buildings (Rosenthal and Strange 2008; Combes et al. 2010) as instruments,12 and (ii) including location or plant fixed effects to capture any unobserved attributes that may have attracted more establishments to a given city (Henderson 2003). In addition, cities may attract more educated workers or productive firms in large locations, and several studies report lower estimates once controlling for observed skills (e.g., see Glaeser and Maré, 2001 for the US and Duranton 2016 for Colombia). Glaeser and Maré (2001) and Combes, Duranton, and Gobillon (2008) include worker fixed effects such that the wage elasticity is identified from differences in earnings across changing work location. Since plant relocations are much less frequent than worker relocations, firm sorting is more difficult to control for (see Faberman and Freedman, 2016; Mare and Graham, 2013).

11 Typically, research uses administrative regions or agglomerations of contiguous and economically integrated spatial units: for example, metropolitan areas in the United States, microregions in Brazil, districts in India and prefecture cities in China (see, e.g., Chauvin et al., 2017).

12 Studies also instrument using historical measures of density (Ciccone and Hall 1996), whose validity relies on past populations being uncorrelated with unobserved drivers of contemporaneous wages conditional on the other controls. While a good case can be made for this (Combes et al., 2010), it is by no means decisive. In either case, we are unable to use this instrument because, for the four countries studied in this paper, we did not consistently have a comparable measure of past density.

IV. Results

Base OLS elasticity estimates

Table 2 panel a presents the estimates from equation (1) for all plant-level productivity indicators. The patterns identified in the recent literature are confirmed for all four countries: a 1 percent increase in density leads to wage gains of .017 percent for Chile, .038 percent for Colombia, .057 percent for Indonesia, and .052 percent for Ethiopia. The wage elasticities are high and decreasing in national GDP. However, consistent with the meta-analysis in Grover, Lall and Timmis (2021), the elasticity associated with TFPR is much smaller than that for wages, falling to .01 for Chile, .003 for Colombia, and .022 for Indonesia, and becoming negative (-.023) but insignificant for Ethiopia.
The smaller TFPR elasticity does not negate the benefits of agglomerations arising from easier capital accumulation that has been stripped out, although our explorations suggest that this effect is positive and significant only in Indonesia. However, it does suggest that other potential gains from matching or externalities are probably overstated using wages.

In fact, such efficiency gains disappear altogether or become negative when we control for prices. TFPQ using Foster et al. (2008) and TFPQ à la De Loecker et al. suggest that the elasticity for Colombia, Indonesia and Ethiopia is either close to zero or negative: -.004, -.011 and -.061 (significant) for the former, and 0.04, -0.02 and -0.07 (none of them significant) for the latter. In fact, the De Loecker estimates will be generally small and insignificant throughout the exercise, supporting the point that productivity gains are unlikely to be large. Only for Chile, our highest income country, does the elasticity enter positively and significantly at .03, suggesting some productivity gains from agglomeration. As detailed in Annex table C.1, the pattern of results remains largely robust to alternative estimation techniques and functional forms.

Consistent with these findings, prices and marginal costs also rise with agglomeration and increasingly so in poorer countries.13 In Indonesia, the output price elasticity is .033, the mark-up elasticity -.016 and the marginal cost elasticity .047. In Ethiopia, the output price elasticity is slightly higher at .038, but the marginal cost elasticity is much higher at .11, offset by a negative mark-up elasticity of -.07. Colombia follows roughly the same pattern, although with no statistical significance, with a positive price elasticity of .006, a negative mark-up elasticity of -.006, and a marginal cost elasticity of .013. These findings are consistent with a higher degree of competition in cities putting pressure on mark-ups, and with rising costs being the principal driver of the price elasticity. Only in Chile are the price and marginal cost elasticities negative and significant (-.022 and -.024, respectively), suggesting that being in cities reduces costs, consistent with having the only positive TFPQ elasticity in the group.

13 This result is in line with that of Handbury and Weinstein (2015), who use detailed barcode data and find a positive elasticity of price with respect to population.

Figure 2 summarizes the results graphically by plotting the elasticities of wages, the Foster et al. TFPQ measure and marginal cost for the four countries against the log of GDP per capita. The dotted lines are “fitted” to indicate the trends with level of development. Again, consistent with the literature, the wage elasticities do appear to be higher for lower income countries, falling with development, with Chile showing “advanced country” magnitudes. However, physical productivity is not driving these high wage elasticities. The TFPQ elasticity is negative for Ethiopia and Indonesia and rises through Colombia to Chile. Again, the TFPQ elasticity rises but remains below that of wages and, for Colombia and Ethiopia, is not significantly different from zero. Using TFPQ à la De Loecker et al., the estimates generate no significant values, although, with the exception of Chile, they generate the same ranking across countries.

Robustness checks

We check the robustness of our results in several ways: instrumenting for endogeneity in quantity and quality of plants and labor; adjusting for quality in the price indexes to narrow the bounds on the TFPQ; and testing for sensitivity to alternative sample periods.
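Because log price is the sum of the log mark-up and log marginal cost, and log TFPR is the sum of log TFPQ and log price, OLS elasticities estimated on the same sample with the same controls should add up accordingly. The Python sketch below first illustrates this on synthetic data (all coefficients and the fixed-effect cells are made up), then compares the base OLS point estimates quoted in the text. Small gaps in the reported figures are to be expected from rounding and from differences in controls (plant size enters the wage and price regressions only). The Colombia TFPR and TFPQ values are those quoted in the human-capital robustness discussion, and the helper names are hypothetical.

```python
import numpy as np

def within(v, groups):
    """Demean v within each group, which absorbs group fixed effects."""
    means = np.bincount(groups, weights=v) / np.bincount(groups)
    return v - means[groups]

def elasticity(y, log_density, groups):
    """OLS slope of y on log density after absorbing group fixed effects."""
    y_w, d_w = within(y, groups), within(log_density, groups)
    return float(d_w @ y_w / (d_w @ d_w))

# --- Synthetic illustration (all parameters are made up) ---
rng = np.random.default_rng(0)
n = 5000
cells = rng.integers(0, 10, size=n)      # stand-in for industry-year cells
log_density = rng.normal(size=n)
log_markup = -0.02 * log_density + rng.normal(scale=0.1, size=n)
log_mc = 0.05 * log_density + rng.normal(scale=0.1, size=n)
log_price = log_markup + log_mc          # price = mark-up wedge + marginal cost

b_markup = elasticity(log_markup, log_density, cells)
b_mc = elasticity(log_mc, log_density, cells)
b_price = elasticity(log_price, log_density, cells)
additivity_gap = abs(b_price - (b_markup + b_mc))  # zero up to floating point

# --- Base OLS point estimates as quoted in the text ---
# (price, mark-up, marginal cost); Chile's mark-up elasticity is not quoted.
price_parts = {
    "Indonesia": (0.033, -0.016, 0.047),
    "Ethiopia": (0.038, -0.07, 0.11),
    "Colombia": (0.006, -0.006, 0.013),
}
# (TFPR, TFPQ using Foster et al., price)
tfpr_parts = {
    "Chile": (0.01, 0.03, -0.022),
    "Colombia": (0.003, -0.004, 0.006),
    "Indonesia": (0.022, -0.011, 0.033),
    "Ethiopia": (-0.023, -0.061, 0.038),
}
price_gap = {c: p - (m + mc) for c, (p, m, mc) in price_parts.items()}
tfpr_gap = {c: r - (q + p) for c, (r, q, p) in tfpr_parts.items()}
max_gap = max(abs(g) for g in [*price_gap.values(), *tfpr_gap.values()])

print(f"synthetic additivity gap: {additivity_gap:.2e}")
print(f"largest gap in reported elasticities: {max_gap:.3f}")
```

Read this way, the TFPR elasticity in the poorer countries is the sum of a negative TFPQ elasticity and a positive price elasticity, and the price elasticity in turn reflects rising marginal costs partly offset by falling mark-ups.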
Addressing endogeneity issues

Because a positive elasticity may simply reflect the reverse causality that more productive agglomerations attract a larger number of plants and workers, we follow the literature (e.g., Rosenthal and Strange, 2008; Combes et al., 2010) by instrumenting density using geological variables including soil composition or erodibility, depth to rock, water capacity, and seismic and landslide hazard. The assumption is that these characteristics are predetermined yet correlated with the current density only through their importance to agricultural settlement and subsequently manufacturing and services (Combes and Gobillon 2015).14

14 However, it should be noted that the IV strategy using natural features as instruments captures not only the relationship between city size that is endogenous (following the demand for labor) but also the size that is driven by an exogenous labor supply component. That is, it may possibly affect productivity as well, in which case the IV estimates may not be an improvement over OLS.

The estimated wage elasticities presented in table 2 panel b confirm our previous OLS findings, although the coefficients are generally larger, perhaps due to reduced measurement error. Similarly, the instrumented TFPR estimates are slightly higher for Chile and more so for Indonesia, but they remain statistically insignificant for Colombia and Ethiopia. The instrumented TFPQ estimates using Foster et al. are also relatively unchanged, with only Chile positive and Colombia now significantly negative at -.025, while Indonesia and Ethiopia remain statistically insignificant. TFPQ à la De Loecker et al. estimates are null for all countries, including Chile, although preserving the ranking found in the OLS estimates. Chile remains the only country with a negative price elasticity, but now Colombia enters with a strongly significant positive coefficient while Indonesia and Ethiopia are positive but insignificant.
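To illustrate the reverse-causality concern that motivates the instruments, the Python sketch below simulates data in which an unobserved productivity shock raises both density and wages, then compares OLS with two-stage least squares using a single exogenous instrument. Everything here is synthetic and hypothetical (the variable names, the true elasticity of 0.05, and the single instrument standing in for the geological variables). It is not the paper's estimation routine and omits controls, fixed effects, and clustered standard errors.

```python
import numpy as np

rng = np.random.default_rng(42)
n = 20_000
true_beta = 0.05                      # assumed "true" density elasticity

geology = rng.normal(size=n)          # hypothetical exogenous instrument
shock = rng.normal(size=n)            # unobserved local productivity
# Density responds to geology and to the productivity shock (reverse causality).
log_density = geology + shock + rng.normal(size=n)
log_wage = true_beta * log_density + shock + rng.normal(size=n)

def ols(y, X):
    """Least-squares coefficients of y on the columns of X."""
    return np.linalg.lstsq(X, y, rcond=None)[0]

const = np.ones(n)

# Naive OLS: biased because density is correlated with the shock.
beta_ols = ols(log_wage, np.column_stack([const, log_density]))[1]

# First stage: project density on the instrument.
Z = np.column_stack([const, geology])
density_hat = Z @ ols(log_density, Z)

# Second stage: regress wages on the fitted values of density.
beta_iv = ols(log_wage, np.column_stack([const, density_hat]))[1]

print(f"OLS: {beta_ols:.3f}   2SLS: {beta_iv:.3f}   true: {true_beta}")
```

In this simulation OLS overstates the elasticity by construction. In the paper's own estimates the instrumented coefficients are, if anything, somewhat larger than OLS, so the direction of the simulated bias is purely illustrative.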
Mark-ups broadly follow the previous pattern, albeit with Colombia gaining significance and Ethiopia losing it, but again with all but Chile showing negative coefficients. Marginal costs closely follow the pattern from the OLS results, with Chile showing the only negative and significant value, but now with Colombia showing a higher positive and significant coefficient. In sum, the key magnitudes and patterns across levels of development largely hold.

Controlling for selection effects

Since better firms may also be attracted to cities, ideally we would like to control for, for instance, managerial quality or technological capabilities. Such data are only beginning to become available at the firm level. As an alternative, Table 2, panel c reruns the OLS specification but with complete plant-level fixed effects. This is highly imperfect since, in the absence of a large number of plants moving from low to high density areas, we are also stripping out any plant-specific agglomeration effects arising from being born in a particular density area. The results are, as expected, much weaker: there is a much less clear pattern of wage and TFPR elasticities, with Indonesia alone retaining high and significant wage and TFPR elasticities and Colombia and Ethiopia now showing negative wage elasticities. This said, the TFPQ elasticities broadly retain the previous pattern, with Chile showing the only positive and significant coefficient. Further, the previous pattern on prices and marginal cost, with Chile showing the only significant and negative result, continues to hold. Agglomeration continues to put downward pressure on mark-ups.

Cities may also attract a better quality of labor, and the availability of local literacy data from Colombia and Indonesia offers a proxy for the level of local human capital to potentially control for compositional effects (Table 2, panel d).15 In Indonesia, the wage and TFPR elasticity estimates are lower, falling from .056 to .04 and .022 to .016 respectively, while the TFPQ elasticity using Foster et al. (2008) falls from -.011 to -.021, in both cases still insignificantly so. Price, mark-up and marginal cost elasticities change marginally. By contrast, in Colombia, the wage elasticity increases slightly from .038 to .04 while TFPR rises from .003 to a now significant .005, suggesting that human capital in poor, traditional economies is highly heterogeneous (Chauvin et al 2017). Point estimates for the TFPQ elasticity fall from -.004 to -.006, but are insignificant in both cases. Together, the results suggest an ambiguous impact of controlling for human capital, but in neither case do we find that the physical productivity elasticity is positive.

15 In Chile and Ethiopia, we did not have access to local human capital data at the level of our spatial identifiers.

Sensitivity to sample periods

The patterns observed in our base results are also not an artifact of disruptive sample periods. In Colombia and Indonesia, we drop the years when the country was going through a crisis (2000-01 in Colombia and 2008-09 in Indonesia), while in Ethiopia we drop the year 2005, when the census data collection was rather idiosyncratic. Table 2, panel e shows that in all cases these estimates are quite close to those from our original sample period.

Sensitivity to quality in price indices

As noted earlier, higher product prices in cities may capture higher quality products rather than costs, and hence deflating revenues may effectively lead to downward bias in the TFPQ estimates. We explore the possible impact of this using the four different price indices constructed for Colombia by Eslava and Haltiwanger (2021) that account for product quality: the constant basket, Sato-Vartia, Feenstra, and CES Unified Price Index (CUPI).16 Table 3 shows that the elasticities of the various adjusted indices are now higher than the .006 in the OLS estimates, ranging from .013 to .021, and are now uniformly significant, suggesting, counterintuitively, that quality is actually lower as urbanization increases, perhaps reflecting that producing for the mass market leads to a lower average level of quality.17 Correspondingly, the Foster et al. (2008) TFPQ estimates now move from an insignificant -.004 to a range of -.018 to .021, with all but the constant basket coefficients entering significantly. The negative impact of agglomeration on productivity seems exacerbated in the Colombia case.

16 CUPI in Redding and Weinstein (2020) adjusts prices to account for the evolution in the distribution of within-plant product appeal, emanating both from changes in appeal for continuing products and the entry/exit of products. The quality-adjusted prices in the Feenstra (2004) and Sato-Vartia approaches also consider the changing baskets of goods that reflect changing demand for quality, keeping the assumption of a constant firm appeal distribution for continuing products.

17 Our thanks to Marcela Eslava for suggesting this interpretation.

V. Conclusions

This paper offers a resolution to what it calls the urban productivity puzzle surrounding the net benefits of agglomeration in developing countries: recent estimates of the elasticity of wages with respect to population density suggest that agglomeration forces are multiples higher than in advanced countries, yet this seems at odds with the fact that developing country cities are often crowded, congested and polluted, and are home to few sophisticated plants and skilled workers that would benefit from being together. They would seem to offer proximity without productivity.

To unpack this puzzle, the paper develops a schematic of what is, in fact, being measured by existing studies using wages, and then provides the first estimates of the elasticities of physical efficiency and productivity. While our firm level data-based estimates of wage elasticities are below those found with labor market surveys, we confirm both that they are very high for poorer countries and that they decrease sharply with development, consistent with the literature. However, we also find that across a wide range of measures and functional forms, physical productivity elasticities are very low or negative in developing countries. These findings are consistent because, while agglomeration appears to put downward pressure on firm mark-ups, marginal costs rise steeply with density. Hence, the higher wage and TFPR elasticities are largely driven by higher urban costs which, once adjusted for, leave no gains in efficiency of physical productivity with density. The adverse combination of very low efficiency and high marginal cost elasticities appears to diminish on both fronts with development: Chile, the richest country in our sample, shows both positive efficiency and negative cost elasticities.

The finding of limited or negative physical productivity growth with urbanization, which we term “sterile agglomeration,” is likely due to the fact that developing world cities are often not being driven by structural transformation and positive agglomeration effects, but more likely by natural resource and aid rents, in-migration driven by a variety of push factors (e.g. poor basic services in rural areas), access to political power, and location near markets given poor transport infrastructure.

References

Ackerberg, Daniel A, Kevin Caves, and Garth Frazer. 2015. “Identification Properties of Recent Production Function Estimators.” Econometrica 83 (6): 2411–2451.
Ades, Alberto F. and Glaeser, Edward L. 1995.
Trade and Circuses: Explaining Urban Giants. The Quarterly Journal of Economics, 110(1): 195-227.
Amiti, M., O. Itskhoki, and J. Konings. 2019. International shocks, variable markups, and domestic prices. The Review of Economic Studies.
Atkin, D., B. Faber, and M. Gonzalez-Navarro. 2018. “Retail Globalization and Household Welfare: Evidence from Mexico.” Journal of Political Economy 126 (1): 1–73.
Bairoch, P. 1988. Cities and economic development: from the dawn of history to the present. University of Chicago Press.
Beaudry, Paul, David A. Green, and Benjamin M. Sand. 2014. “Spatial equilibrium with unemployment and wage bargaining: Theory and estimation.” Journal of Urban Economics, Elsevier, vol. 79(C): 2-19.
Blum, Bernardo S, Sebastian Claro, Ignatius Horstmann, and David A Rivers. 2018. “The ABCs of Firm Heterogeneity: The Effects of Demand and Cost Differences on Exporting.” Unpublished mimeo.
Bond, Stephen and Måns Söderbom. 2005. “Adjustment Costs and the Identification of Cobb Douglas Production Functions.” Unpublished Manuscript, The Institute for Fiscal Studies, Working Paper Series No. 05/4.
Burger, Martijn J., Elena I. Ianchovichina, Prottoy A. Akbar. 2022. “Heterogeneous Agglomeration Economies in Developing Countries: The Roles of Firm Characteristics, Sector Tradability, and Urban Mobility.” World Bank Policy Research Working Paper, forthcoming.
Chauvin, Juan Pablo, Edward Glaeser, Yueran Ma, Kristina Tobio. 2017. What is different about urbanization in rich and poor countries? Cities in Brazil, China, India and the United States. Journal of Urban Economics, 98, 17-49.
Ciccone, A., Hall, R.E., 1996. Productivity and the density of economic activity. American Economic Review 86 (1), 54–70.
Combes, P.-P., G. Duranton, and L. Gobillon. 2008. Spatial Wage Disparities: Sorting Matters! Journal of Urban Economics, 63 (2), 723–742.
Combes, Pierre-Philippe, Gilles Duranton, Laurent Gobillon. 2013. The Costs of Agglomeration: Land Prices in French Cities.
PSE Working Papers halshs-00849078, HAL.
Combes, P.P., Duranton, G., Gobillon, L., Puga, D., Roux, S., 2010. “Estimating Agglomeration Effects With History, Geology, and Worker Fixed-Effects,” in Agglomeration Economics, Vol. 1, ed. by E. L. Glaeser. Chicago, IL: Chicago University Press, 15–65.
Combes, P.-P., and L. Gobillon. 2015. “The Empirics of Agglomeration Economies.” In Handbook of Regional and Urban Economics, Vol. 5, edited by G. Duranton, J. V. Henderson, and W. C. Strange, 247–348. Amsterdam: North-Holland.
Combes, P.-P., G. Duranton, and L. Gobillon. 2019. “The Costs of Agglomeration: House and Land Prices in French Cities.” Review of Economic Studies 86 (4, July): 1556–89.
De la Roca, Jorge, and Diego Puga. 2017. Learning by Working in Big Cities. Review of Economic Studies 84 (1): 106–42.
De Loecker, J., Goldberg, P.K., Khandelwal, A.K. and Pavcnik, N., 2016. Prices, markups, and trade reform. Econometrica, 84(2), pp. 445-510.
De Loecker, J., C. Fuss, and J. Van Biesebroeck. 2014. International competition and firm performance: Evidence from Belgium. NBB Working Paper Series, N. 269.
Dhyne, Emmanuel, Amil Petrin, Valerie Smeets, and Frederic Warzynski. 2017. Multi product firms, import competition, and the evolution of firm-product technical efficiencies. No. w23637. National Bureau of Economic Research.
Diamond, R. 2016. “The Determinants and Welfare Implications of US Workers’ Diverging Location Choices by Skill: 1980–2000.” American Economic Review 106 (3): 479–524.
Duranton, G. 2015. “Growing through Cities in Developing Countries.” World Bank Research Observer 39 (1): 39–73.
Duranton, G. 2016. “Agglomeration Effects in Colombia.” Journal of Regional Science 56: 210–38.
Duranton, Gilles, and Diego Puga. 2020. “The Economics of Urban Density.” Journal of Economic Perspectives, 34 (3), 3-26.
Duranton, Gilles, Diego Puga. 2004. Micro-Foundations of Urban Agglomeration Economies, ch 48 in J.
Vernon Henderson, Jacques-François Thisse eds, Handbook of Regional and Urban Economics, 4, 2063-2117.
Eslava, Marcela and John Haltiwanger. 2021. “The Size and Life-Cycle Growth of Plants: The Role of Productivity, Demand and Wedges,” NBER Working Paper, 27184.
Faberman, R.J. and Freedman, M., 2016. The urban density premium across establishments. Journal of Urban Economics, 93, pp. 71-84.
Feenstra, Robert C. 1994. “New Product Varieties and the Measurement of International Prices,” American Economic Review, 84, 157-177.
Foster, Lucia, John Haltiwanger, and Chad Syverson. 2008. “Reallocation, Firm Turnover, and Efficiency: Selection on Productivity or Profitability?” American Economic Review, 98 (1): 394-425.
Gandhi, A., S. Navarro, and D. A. Rivers. 2020. “On the Identification of Gross Output Production Functions.” Journal of Political Economy 128 (8): 2973–3016.
Glaeser, Edward L. and Matthew G. Resseger, 2010. The Complementarity Between Cities and Skills, Journal of Regional Science, Wiley Blackwell, vol. 50(1), pages 221-244, February.
Glaeser, E. L., and D. C. Maré. 2001. Cities and Skills, Journal of Labor Economics, 19 (2), pp. 316–342.
Gollin, Douglas, Remi Jedwab, and Dietrich Vollrath. 2016. “Urbanization with and without industrialization.” Journal of Economic Growth 21, 1, 35-70.
Grover, A., S. V. Lall, and J. D. Timmis. 2021. “Agglomeration Economies in Developing Countries: A Meta-Analysis.” Policy Research Working Paper 9730, World Bank, Washington, DC.
Handbury, J., and D. Weinstein. 2015. “Goods Prices and Availability in Cities.” Review of Economic Studies 82 (1): 258–96.
Henderson, J. V. 2003. Marshall's scale economies. Journal of Urban Economics, 53, pp. 1–28.
Henderson, J. Vernon, Nigmatulina, Dzhamilya and Kriticos, Sebastian. 2019. “Measuring urban economic density.” Journal of Urban Economics. In press.
Henderson, J. V., and M. A. Turner. 2020.
“Urbanization in the Developing World: Too Early or Too Slow?” Journal of Economic Perspectives 34 (3): 150–73.
Holl, A., 2016. Highways and productivity in manufacturing firms. Journal of Urban Economics, 93, pp. 131-151.
Hommann, K., and S. V. Lall. 2019. Which Way to Livable and Productive Cities? A Road Map for Sub-Saharan Africa. International Development in Focus. Washington, DC: World Bank Group.
Klette, Tor Jacob and Zvi Griliches. 1996. “The Inconsistency of Common Scale Estimators When Output Prices are Unobserved and Endogenous.” Journal of Applied Econometrics 11 (4): 343–361.
Krugman, Paul and Raul Livas Elizondo. 1996. “Trade policy and the Third World metropolis,” Journal of Development Economics, Elsevier, vol. 49(1), pages 137-150, April.
Lall, S., M. Lebrand, and M. E. Soppelsa. 2021. “The Evolution of City Form: Evidence from Satellite Data.” Policy Research Working Paper 9618, World Bank, Washington, DC.
Lenzu, Simone, David Rivers, and Joris Tielens. 2019. “Financial Shocks and Productivity: Pricing response and the TFPR-TFPQ bifurcation.” Available at SSRN 3442156.
Maré, D.C. and G. Graham. 2013. Agglomeration elasticity and firm heterogeneity. Journal of Urban Economics, 75: 44-56.
Marschak, Jacob and William H. Andrews. 1944. “Random Simultaneous Equations and the Theory of Production.” Econometrica 12 (3-4): 143–205.
McMillen, D. P. 2003. “Employment subcenters in Chicago: Past, present, and future.” Economic Perspectives, Federal Reserve Bank of Chicago, 27 (2): 2–14.
Mendershausen, H., 1938. On the significance of Professor Douglas's production function, Econometrica 6, no. 2, 143-153.
Moretti, E. 2004. “Human Capital Externalities in Cities.” In Handbook of Regional and Urban Economics, Vol. 4, edited by J. V. Henderson and J-F Thisse, 2243–91. Amsterdam: North-Holland.
Nakamura, S., B. Paliwal, and N. Yoshida. 2018.
“Overview of the Trends of Monetary and Non-Monetary Poverty and Urbanization in Sub-Saharan Africa.” Paper presented at the 2018 Jobs and Development Conference, World Bank, Washington, DC.
Quintero, L. E., and M. Roberts. 2018. “Explaining Spatial Variations in Productivity: Evidence from Latin America and the Caribbean.” Policy Research Working Paper 8560, World Bank, Washington, DC.
Redding, Stephen and David E. Weinstein. 2020. “Measuring Aggregate Price Indexes with Taste Shocks: Theory and Evidence for CES Preferences,” Quarterly Journal of Economics, 135(1), 503-560.
Rosenthal, S. S., and W. C. Strange. 2008. “The Attenuation of Human Capital Spillovers.” Journal of Urban Economics 64: 373–89.

Figures and Tables

Figure 1: Components of Agglomeration Elasticity

Figure 2: Agglomeration elasticity estimates with respect to wages, marginal cost and TFPQ

Notes: The figure presents the elasticity with respect to population density of wages, Physical Total Factor Productivity (TFPQ), and Marginal Costs for Colombia, Chile, Ethiopia, and Indonesia. Elasticities are estimated using OLS (vertical axis). Dotted lines represent “fitted” values for illustrative purposes. The horizontal axis is the log of GDP per capita (World Bank).

Table 1: Data sources and details

Table 2: Elasticity estimates of density

Table 3: Elasticity estimates of prices and TFPQ using varying price indices

Annex A: Data sources

A.1: Plant-level data description

Chile

Data for the Encuesta Nacional Industrial Anual (Annual National Industrial Survey, ENIA) are collected annually by the Chilean National Institute of Statistics (INE), with direct participation of Chilean manufacturing plants. ENIA covers the universe of manufacturing plants with 10 or more workers, and contains detailed information on plant characteristics, such as sales, spending on inputs, employment, wages, investment, and export status.
It also collects rich information at the product level for every product produced by each plant, reporting sales, total variable production cost, and the number of units produced and sold, which permits constructing plant-level price indexes and backing out quantities and TFPQ. Products in ENIA are defined according to the Clasificador Unico de Productos (CUP). This ENIA-specific product category is comparable to the 7-digit ISIC code.18 Plant-product-year observations that have zero values for total employment, demand for raw materials, sales, or product quantities are excluded. After merging manufacturing census data with the geospatial variables, the establishment-level data has complete information on 22,151 observations with 6,344 establishments engaged in 102 4-digit industries located across 208 municipalities. Similarly, the establishment-product level data contains the same number of establishments, and 1,235 products (18,198 plant-product pairs).

18 For example, the wine industry (ISIC 3132) is disaggregated by CUP into 8 different categories, such as "Sparkling wine of fresh grapes", "Cider", "Chicha", and "Mosto".

Colombia

The National Administrative Department of Statistics (DANE) of Colombia conducts annual surveys of large and medium scale establishments engaged in manufacturing. Economic activities are categorized as manufacturing based on the ISIC-Rev.3 classification.19 They encompass industries in sectors 1511-3699 at the four-digit level. The survey covers all establishments with at least 10 employees or above a revenue threshold of around 340 million Colombian pesos (at 2009 prices), which is close to 100,000 USD. The dataset contains rich information on key variables including plants’ geographical location, output production and sales for each product, total sales (local and export), material inputs, industrial and non-industrial costs, employee composition, and fixed asset holdings.

Gross output is the value of production of all products. Value-added is gross output after deducting material cost. Capital is measured using the perpetual inventory method, assuming a 5% depreciation rate, with the exception of land, for which there is no depreciation. Employment size is the number of persons employed by the establishment. These include workers engaged in production and in non-production jobs. Labor cost includes wages and fringe benefits. Material costs include material input costs, electric energy expenditure, and land and building rental costs. Gross profit margin is gross output net of material cost and labor costs of workers associated with production, as a share of gross output. Net profit margin is gross output minus material cost, all labor cost, industrial and non-industrial costs, and taxes paid, as a share of gross output. Nominal values are converted to 2014 Colombian pesos using the industrial production index reported by DANE, which is the same for all industries. We deflate capital with the industrial production index reported by DANE for gross capital formation.

After merging manufacturing census data with the geospatial variables, the establishment-level data has complete information on 12,397 establishments engaged in 148 industries located across 267 municipalities. Similarly, the establishment-product level data contains the same number of establishments, and 6,711 products.

19 The Colombian Department of Statistics adapts the international ISIC classification for a version that corresponds better to Colombian production. Most sectors are defined in exactly the same way; only a few are more disaggregated in the Colombian version. For a detailed crosswalk and comparison, see https://www.dane.gov.co/files/sen/nomenclatura/tablasCorrelativas/CIIU3ACvsCIIU3Internal.pdf

Ethiopia

The Central Statistical Agency of Ethiopia conducts annual surveys of large and medium scale establishments engaged in manufacturing. In the surveys, economic activities are categorized as manufacturing based on the United Nations ISIC-Rev.3 classification, and encompass industries 15-37 at the two-digit level. The survey covers all establishments with at least 10 employees and which use power-driven machinery. The data set contains detailed information on key variables including plants’ geographical location, business ownership form, output production, sales (local and export) and other revenue receipts, material inputs (local and imported), industrial and non-industrial costs, employee composition, and fixed asset holdings.

Gross output is revenue generated from local and export sales after adjusting for the stock of goods at the beginning and end of the year. Value-added is sales revenue after deducting material cost. Capital is measured by the size of fixed asset holdings at the beginning of the year. Employment size is the number of persons engaged in production in full-time equivalents and comprises permanent as well as temporary and seasonal workers. Labor cost includes wages and fringe benefits. Nominal values are converted from local currency to U.S. dollars using the exchange rate obtained from the IMF’s International Financial Statistics, and finally converted to 2010 U.S. dollars.

After merging manufacturing census data with the geospatial variables based on town names, the establishment-level data has complete information on 3,465 establishments engaged in 18 industries located across 91 towns. Similarly, the establishment-product level data contains 2,363 establishments, 17 industries, and 116 products.

Indonesia

The economic and manufacturing census of Indonesia contains data on the universe of all manufacturing plants with at least 20 employees. The dataset provides information on the primary activity of the plant (UN ISIC Rev.3 industry classification) at the 4-digit level, sales revenue, total fixed assets, raw material input expenditure and the number of employees. All monetary variables are in units of national currency (i.e. thousand rupiah) and deflated at the industry-year level with the wholesale price index. We also observe the nominal average wage within a plant, computed as the ratio of total wage bill costs over the number of employees. This data set provides reliable information on the plant’s location at the kabupaten or district level (administrative III level). Although our analysis uses data from 2000-2015, district boundaries are made consistent with the first available year of this data set, that is, 1996. This helps control for changes in the administrative boundaries over time (e.g. district split-ups or mergers).

For countries considered in our study, plant-level product information is available with the varying Harmonized System (HS). In order to ensure consistency across the sample, this information on product-level details is concorded to the relevant year, such that product codes are comparable over time. For each detailed product we observe the quantity produced, its value, and the unit in which its quantity is measured, e.g. kg, liter, etc. Product-specific unit values are calculated by dividing total values by total quantities for each product. To ensure unit prices are comparable, we define products as unique product-unit combinations.

After merging manufacturing census data with the geospatial variables, the establishment-level data has complete information on 302,871 observations with 34,366 establishments engaged in 136 4-digit industries located across 291 districts and metros over our sample period 2000-2015. Similarly, the establishment-product level data contains the same number of establishments, and 7,826 products.
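The unit-value construction described above (total value divided by total quantity, with a product defined as a product-unit combination) can be illustrated with a minimal Python sketch. It is not the authors' Stata code: the record keys (`plant`, `year`, `product`, `unit`, `value`, `quantity`), the choice to compute unit values at the plant-year level, and the toy numbers are all assumptions made for illustration.

```python
from collections import defaultdict


def unit_values(records):
    """Return unit values keyed by (plant, year, product, unit).

    Each record is a dict with hypothetical keys: plant, year, product,
    unit, value, quantity. The measurement unit is part of the key, so
    quantities in kilograms and in liters are never pooled.
    """
    totals = defaultdict(lambda: [0.0, 0.0])  # key -> [total value, total quantity]
    for r in records:
        key = (r["plant"], r["year"], r["product"], r["unit"])
        totals[key][0] += r["value"]
        totals[key][1] += r["quantity"]
    # Cells with zero quantity are dropped: a unit value is undefined there.
    return {key: value / qty for key, (value, qty) in totals.items() if qty > 0}


# Toy data (made up): one product code reported in two different units.
records = [
    {"plant": "A", "year": 2010, "product": "0401", "unit": "kg", "value": 200.0, "quantity": 100.0},
    {"plant": "A", "year": 2010, "product": "0401", "unit": "kg", "value": 100.0, "quantity": 50.0},
    {"plant": "A", "year": 2010, "product": "0401", "unit": "liter", "value": 90.0, "quantity": 30.0},
]
prices = unit_values(records)
```

Keeping the unit inside the key is the design choice that mirrors the text: the same product code reported in kg and in liters yields two separate unit values rather than one mixed average.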
A.2: Data on local demographic and geospatial variables

Demographic variables: For population density and the instruments used in the IV strategy, we construct geospatial variables and population characteristics for the spatial units chosen in each country. Population and population density (population per sq. km of land) variables were sourced from WorldPop (Population: Unconstrained top-down estimation modelling, 100m).20 Sociodemographic characteristics (the share of literate population over 15 years old) for Colombia and Indonesia are obtained at the relevant spatial level from the National Population Census (2005 for Colombia and the 2000 and 2010 censuses for Indonesia).

Geospatial variables: For Colombia, precipitation (millimeters, total recorded in the year, average across 1971-2000) and temperature (Celsius, annual, average across 1971-2000) were obtained from the Institute of Hydrology, Meteorology and Environmental Studies of Colombia (IDEAM).21 For other countries, spatial variation in temperature was obtained from the Willmott and Matsuura (2001) datasets and averaged over appropriate spatial units, while the data on precipitation are obtained from Harris et al. (2020). Information about slope and terrain ruggedness in Colombia is obtained from Nunn and Puga (2012). The raw data calculate the average ruggedness and slope for grid cells around the globe. This paper uses the average slope and ruggedness for the grid cells within each municipality in Colombia. Elevation and slope for all countries are obtained from the ASTER Global Digital Elevation Model (GDEM) Version 3 (ASTGTM),22 while for Colombia they are obtained from the Universidad de los Andes Center for Studies on Economic Development (CEDE) Panel.23

20 See https://www.worldpop.org/project/categories?id=3
21 See http://www.ideam.gov.co/documents/21021/553571/Promedios+Climatol%C3%B3gicos+1971+-+2000.xlsx/857942de-f9d7-4d5e-bb75-df984aabe55f
22 See https://search.earthdata.nasa.gov/
23 See https://datoscede.uniandes.edu.co/es/catalogo-de-microdata

B: Methodology

B.1: Plant-level aggregate price index

Since the core of our analysis is at the plant-location level, we construct a plant-level price index by aggregating the plant-product-level information available. The main challenge with constructing a meaningful aggregate price index is both to account for intertemporal changes in product quality or mix and to aggregate across heterogeneous products. In line with De Loecker et al. (2014), Dhyne et al. (2017), Amiti et al. (2019), and Lenzu et al. (2019), we employ a similar approach whereby the plant-level price index is computed as the weighted average of adjusted product prices. Specifically, this is calculated as:

$$P_{it} = \sum_{p} s_{ipt} \, \tilde{P}_{ipt},$$

where $s_{ipt}$ is the share of the revenue from product $p$ of plant $i$ at time $t$ ($R_{ipt}$) in total revenue ($R_{it}$). The adjusted price is computed as:

$$\tilde{P}_{ipt} = \frac{R_{ipt}}{\tilde{Q}_{ipt}},$$

where $\tilde{Q}_{ipt}$ is the quantity of product $p$ of plant $i$ at time $t$, $Q_{ipt}$, scaled by the ratio of the average product price across all plant-year observations over a numeraire price. This numeraire price is captured by the average product price across all plant-year observations of the product with the highest sales over the sample period. This latter adjustment converts the product quantities to the same unit as the numeraire product. Once plant-level prices, $P_{it}$, are computed using this method, aggregated quantity is then computed as:

$$Q_{it} = \frac{R_{it}}{P_{it}}.$$

B.2: Methodological details for TFPQ Estimations

Extending the GNR procedure to retrieve physical quantity based production function and account for input price bias

We now present the steps for estimating a quantity-based production function and accounting for the potential presence of the input price bias.
The latter is crucial because the inputs observed in the data are based on input values deflated at the industry level rather than with plant-level prices. We rely on Blum et al. (2018), which extends the GNR method to recover estimates of mark-ups using output price data.24 In addition, following De Loecker et al. (2016), we further extend this methodology to address the input price bias stemming from unobserved plant-level input prices in material and capital inputs. Notably, this bias does not arise for labor inputs since we measure labor in quantity terms, that is, the number of employees.

24 While the strategy of Blum et al. (2018) accounts for multi-product firms, we rely on a simplified version in which we disregard the product-level variation by relying on output prices and quantities aggregated at the firm level, i.e. the level of aggregation at which agglomeration elasticities are estimated in equation (1).

The quantity-based production function takes the form:

$$q_{it} = f(k_{it}, l_{it}, m_{it}; \beta) + \omega_{it} + \varepsilon_{it} \qquad \text{(B.2.1)}$$

where $q_{it}$ is the output variable based on plant-level quantities and $\omega_{it}$ is the Hicks-neutral measure of physical productivity. Capital and material inputs observed in the data are based on input expenditures deflated with an industry-specific price index. Thus, for any deviation from a perfectly competitive input market, the capital and material inputs considered in the production function would also contain idiosyncratic input price variation not accounted for by the industry-specific deflation (De Loecker et al., 2016). With this data at hand, the quantity-based production function is expressed as:

$$q_{it} = f(\tilde{k}_{it}, l_{it}, \tilde{m}_{it}; \beta) + B(\tilde{w}_{it}, \tilde{k}_{it}, l_{it}, \tilde{m}_{it}; \beta) + \omega_{it} + \varepsilon_{it} \qquad \text{(B.2.2)}$$

where $\tilde{k}_{it}$ and $\tilde{m}_{it}$ are the observed plant-level input expenditures deflated by an industry-specific deflator.
Specifically, � = + � and � � = + , where and are the unobserved input quantities, and � and � are the unobserved plant–specific input prices reflecting the deviation of the actual input prices and from the industry-specific deflators and , respectively. ( ⋅ ) is a correction term for the unobserved variation arising from the plant–specific input prices � � = { � , }, and its exact functional form depends on the production function (⋅), as detailed below. Without the input price correction (⋅), the production function estimates would suffer from an ‘input price’ bias due to the correlation between input prices and quantities. To correct for unobserved input price variation, we follow De Loecker et al. (2016) in using information on observed output prices. Intuitively, more expensive products are manufactured using more expensive and better quality inputs, that is, input prices are assumed to be an increasing function of input quality, which in turn is determines the output quality. Thus, output prices proxy for input prices, expressed as the input price control function: � ( ) � = Where the control function is assumed to be the same for both capital and material inputs. Otherwise, with input-specific control function, we would not be able to separately identify the price of each input and thus the estimated output elasticity of materials used to compute mark-ups. Substituting the input price control function in (B.2.2) yields: � , , = ( , , ; ) + ( × (, � ); ) + + (B.2.3) where is a constant term and is a parameter vector estimated along with the production function parameters . Specifically, = (, ) is a combination of the production function parameters and the parameter vector governing the input price control function � (⋅). The exact form of (⋅) depends on the functional form of � (⋅). (⋅) and Although the first stage of GNR relies on the plant’s static profit maximization problem with respect to the flexible input, i.e. 
materials, in this case we do not want to impose assumptions on the type of competition in the output market. Following De Loecker et al. (2016), we rely on the first order conditions from the plant’s cost minimization problem to derive the first stage estimates.25 After re-arranging the terms and combining them with the production function (B.2.3) we retrieve the share equation:

$$\tilde{s}_{it} = \ln \tilde{\theta}^{m}(\tilde{k}_{it}, l_{it}, \tilde{m}_{it}; \tilde{\beta}) - \ln \mu_{it} - \ln \tilde{B}(p_{it} \times (c, \tilde{x}_{it}); \tilde{\delta}) - \varepsilon_{it} \tag{B.2.4}$$

where $\mu_{it} = P_{it} / MC_{it}$ is the plant-level mark-up defined as the ratio of output price over marginal cost and $\tilde{B}(\cdot)$ captures the input price bias that stems from the output elasticity of materials $\tilde{\theta}^{m}(\cdot)$ being based on industry-deflated input values. The functional form of $\tilde{B}(\cdot)$ is based on that of the output elasticity of materials $\tilde{\theta}^{m}(\cdot)$, which in turn determines the parameter vector $\tilde{\delta} = \tilde{\delta}(\tilde{\beta}, \gamma)$. Quantity produced is expressed as a function of the output price and an unobserved demand shock $\xi_{it}$, i.e. $q_{it} = q(p_{it}, \xi_{it})$, which implies that the mark-ups can be expressed as:

$$\mu_{it} = \mu(p_{it}, \xi_{it})$$

Assuming that the quantity is monotonic in the demand shock, we get that:

$$\xi_{it} = q^{-1}(p_{it}, q_{it})$$

implying that $\mu_{it} = \mu(p_{it}, q_{it})$. Thus, the share equation can account for both mark-ups and unobserved input prices:

$$\tilde{s}_{it} = \ln \tilde{\theta}^{m}(\tilde{k}_{it}, l_{it}, \tilde{m}_{it}; \tilde{\beta}) - \ln \mu(p_{it}, q_{it}) - \ln \tilde{B}(p_{it} \times (c, \tilde{x}_{it}); \tilde{\delta}) - \varepsilon_{it} \tag{B.2.5}$$

25 For a detailed discussion of the challenges involved with the cost-minimization conditions used in proxy variable methods under imperfect competition in the output market, see Doraszelski and Jaumandreu (2019).

Estimation Step 1: To estimate equation (B.2.5), we regress the material cost shares on inputs, output price, output quantity and the interaction of output price and quantity. Since quantities and prices could be measured with error, we use lagged values of inputs, prices and quantities as instruments.26 This allows us to recover a combined function of the output elasticity of intermediate inputs, the mark-up and the control function for the unobserved input price variation, and the ex-post shocks to production.
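The role of the lagged instruments in Step 1 can be illustrated with a toy example. The sketch below is not the estimator used in this annex: it assumes a single mismeasured regressor, a linear outcome equation and synthetic data, and all names (`two_stage_least_squares`, `x_obs`, `x_lag_obs`, `true_slope`) are hypothetical. It only shows why a lagged value with independent measurement error can remove the attenuation that ordinary least squares suffers.

```python
import numpy as np

def two_stage_least_squares(y, X, Z):
    """2SLS: project regressors X on instruments Z, then regress y on the fit."""
    first_stage = np.linalg.lstsq(Z, X, rcond=None)[0]
    X_hat = Z @ first_stage
    return np.linalg.lstsq(X_hat, y, rcond=None)[0]

rng = np.random.default_rng(0)
n = 20_000
true_slope = 0.5  # assumed value, chosen only for this illustration

# Persistent latent variable (for example a log price) in periods t-1 and t.
x_lag = rng.normal(size=n)
x_now = 0.8 * x_lag + 0.6 * rng.normal(size=n)

# Observed values carry independent measurement error in each period.
x_obs = x_now + 0.5 * rng.normal(size=n)
x_lag_obs = x_lag + 0.5 * rng.normal(size=n)

# Outcome depends on the true (unobserved) current value.
y = true_slope * x_now + 0.1 * rng.normal(size=n)

const = np.ones(n)
X = np.column_stack([const, x_obs])      # mismeasured regressor
Z = np.column_stack([const, x_lag_obs])  # lagged value as instrument

ols_slope = np.linalg.lstsq(X, y, rcond=None)[0][1]
iv_slope = two_stage_least_squares(y, X, Z)[1]
print(f"OLS slope: {ols_slope:.3f}, 2SLS slope: {iv_slope:.3f}")
```

In this synthetic setting the OLS slope is biased toward zero, while the instrumented slope should land close to `true_slope`. The actual first stage involves several regressors and their interactions, so this is only a one-variable analogue.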
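26 We also use twice-lagged values as instruments for over-identifying restrictions.

Estimation Step 2: Here we follow the GNR approach to estimate a quantity-based production function, with two key differences: (i) the output elasticity of the flexible input is estimated in this step instead of the first step; and (ii) the input price bias is controlled for as discussed above. As such, the logarithm of physical productivity is expressed as:

$$\omega_{it} = q_{it} - f(\tilde{k}_{it}, l_{it}, \tilde{m}_{it}; \beta) - B(p_{it} \times (c, \tilde{x}_{it}); \delta) - \varepsilon_{it} \tag{B.2.6}$$

Similar to the evolution process imposed for revenue productivity in the GNR approach to estimating TFPR, we rely on a first-order Markov process for physical productivity:

$$\omega_{it} = g(\omega_{it-1}) + [\ldots] + \eta_{it} \tag{B.2.7}$$

The only difference from the GNR approach lies in setting the instrument matrix: in line with Doraszelski and Jaumandreu (2013) and Blum et al. (2018), we instrument for materials with lagged output prices to account for the input price bias. Relying on a Translog functional form for $f$ to estimate a quantity-based production function, we have:

$$f(\tilde{k}_{it}, l_{it}, \tilde{m}_{it}; \beta) = \sum_{r_k + r_l + r_m \le r} \beta_{r_k, r_l, r_m}\, \tilde{k}_{it}^{\,r_k}\, l_{it}^{\,r_l}\, \tilde{m}_{it}^{\,r_m}, \quad \text{with } r_k, r_l, r_m \ge 0 \tag{B.2.8}$$

where $r = 2$, which implies that:

$$\tilde{\theta}^{m}(\tilde{k}_{it}, l_{it}, \tilde{m}_{it}; \tilde{\beta}) = \tilde{\beta}_{0} + \tilde{\beta}_{k} \tilde{k}_{it} + \tilde{\beta}_{l} l_{it} + \tilde{\beta}_{m} \tilde{m}_{it} \tag{B.2.9}$$

Writing $\beta_{k}$, $\beta_{kk}$, $\beta_{kl}$, and so on for the coefficients on the corresponding first- and second-order terms in (B.2.8), the correction for the input price bias is of the form:

$$B(p_{it} \times (c, \tilde{x}_{it}); \delta) = -(\beta_{k} + \beta_{m})\, \tilde{w}_{it} + (\beta_{kk} + \beta_{mm} + \beta_{km})\, \tilde{w}_{it}^{2} - (2\beta_{kk} + \beta_{km})\, \tilde{k}_{it} \tilde{w}_{it} - (2\beta_{mm} + \beta_{km})\, \tilde{m}_{it} \tilde{w}_{it} - (\beta_{kl} + \beta_{lm})\, l_{it} \tilde{w}_{it} \tag{B.2.10}$$

which in turn implies that:

$$\tilde{B}(p_{it} \times (c, \tilde{x}_{it}); \tilde{\delta}) = -(\tilde{\beta}_{k} + \tilde{\beta}_{m})\, \tilde{w}_{it} \tag{B.2.11}$$

Finally, for the input price control function we use a simple linear approximation, where:

$$\tilde{w}_{it} = \tilde{w}(p_{it}; \gamma) = \gamma\, p_{it}$$

By substituting it in (B.2.10) and (B.2.11), we get the sets of parameters $\delta$ and $\tilde{\delta}$, which are estimated alongside the parameters of the production function $\beta$ and of the output elasticity of materials $\tilde{\beta}$, respectively.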
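The algebra behind the Translog correction term in (B.2.10) can be checked numerically. The sketch below assumes a second-order polynomial in log inputs with hypothetical coefficient names (`b["k"]`, `b["kk"]`, `b["km"]`, and so on), arbitrary coefficient values, and a hypothetical slope `gamma` for a linear input price control function; it is not the estimation code. It verifies that evaluating the polynomial at the unobserved quantities (deflated expenditures minus a common log input price deviation `w`) equals the polynomial at the observed expenditures plus a correction that depends only on `w`, the observed inputs and the coefficients.

```python
import numpy as np

def translog(k, l, m, b):
    """Second-order polynomial in log capital, labor and materials."""
    return (b["k"] * k + b["l"] * l + b["m"] * m
            + b["kk"] * k**2 + b["ll"] * l**2 + b["mm"] * m**2
            + b["kl"] * k * l + b["km"] * k * m + b["lm"] * l * m)

def price_correction(k_obs, l, m_obs, w, b):
    """Correction implied by k = k_obs - w and m = m_obs - w (labor in quantities)."""
    return (-(b["k"] + b["m"]) * w
            + (b["kk"] + b["mm"] + b["km"]) * w**2
            - (2 * b["kk"] + b["km"]) * k_obs * w
            - (2 * b["mm"] + b["km"]) * m_obs * w
            - (b["kl"] + b["lm"]) * l * w)

rng = np.random.default_rng(1)

# Arbitrary coefficients and synthetic log inputs and log output prices.
names = ["k", "l", "m", "kk", "ll", "mm", "kl", "km", "lm"]
b = {name: rng.normal() for name in names}
k_obs, l, m_obs, p = rng.normal(size=(4, 1000))

gamma = 0.3    # hypothetical slope of the linear input price control function
w = gamma * p  # common input price deviation for capital and materials

# Polynomial at unobserved quantities vs. observed inputs plus correction.
lhs = translog(k_obs - w, l, m_obs - w, b)
rhs = translog(k_obs, l, m_obs, b) + price_correction(k_obs, l, m_obs, w, b)
print(np.allclose(lhs, rhs))
```

Because the identity holds for any coefficient values, the check does not depend on the particular random draws; it only confirms that the correction collects every term involving `w` when a single price deviation applies to both capital and materials.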
Output elasticities with respect to inputs

With the parameter estimates of the production function $\hat{\beta}$ and the input price bias correction function $\hat{\delta}$, we directly compute the plant-specific output elasticity with respect to capital ($\theta^{k}_{it}$), labor ($\theta^{l}_{it}$) and material ($\theta^{m}_{it}$), which under a Translog production function take the following forms:

$$\theta^{k}_{it} = \beta_{k} + 2\beta_{kk} \tilde{k}_{it} + \beta_{kl} l_{it} + \beta_{km} \tilde{m}_{it} - (2\beta_{kk} + \beta_{km})\, \tilde{w}_{it} \tag{B.2.12}$$

$$\theta^{l}_{it} = \beta_{l} + 2\beta_{ll} l_{it} + \beta_{kl} \tilde{k}_{it} + \beta_{lm} \tilde{m}_{it} - (\beta_{kl} + \beta_{lm})\, \tilde{w}_{it} \tag{B.2.13}$$

$$\theta^{m}_{it} = \beta_{m} + 2\beta_{mm} \tilde{m}_{it} + \beta_{km} \tilde{k}_{it} - (2\beta_{mm} + \beta_{km})\, \tilde{w}_{it} + \beta_{lm} l_{it} \tag{B.2.14}$$

Mark-up, marginal cost and input price index

Mark-up. With these estimates at hand, we directly compute plant-specific mark-ups from the first order condition of the plant’s cost minimization of the flexible input used to derive the share equation for the first stage (B.2.4): $\mu_{it} = \theta^{m}_{it} (\alpha^{m}_{it})^{-1}$, where $\theta^{m}_{it}$ is the output elasticity of the flexible input and $\alpha^{m}_{it}$ is the share of material input expenditure ($P^{m}_{it} M_{it}$) over total sales ($P_{it} Q_{it}$) (directly observed in the data) corrected for the ex-post shocks to production ($\varepsilon_{it}$), which are estimated from the first stage.

Marginal cost. Based on the definition of mark-ups, plant-level marginal cost is computed by dividing the observed output price by the estimated mark-up: $MC_{it} = P_{it} / \mu_{it}$.

Input price index. We use the parameter estimates of the production function $\hat{\beta}$ and the input price bias control function $\hat{\delta}$ to back out the parameter $\hat{\gamma}$ governing the input price index function. Thus, the input price index is $\tilde{w}_{it} = \tilde{w}(p_{it}) = \hat{\gamma}\, p_{it}$. Recall that this input price index is assumed to be the same for both capital and material inputs and thus cannot capture any major underlying price differences between the two inputs.

C: Robustness checks

Table C.1: Elasticity estimates using alternative methodologies and functional forms

Annex References

Harris, I., T. J. Osborn, P. Jones and D. Lister. 2020. Version 4 of the CRU TS Monthly High-Resolution Gridded Multivariate Climate Dataset. Sci Data 7, 109.
Nunn, N. and D. Puga. 2012. Ruggedness: The Blessing of Bad Geography in Africa. The Review of Economics and Statistics 94 (1): 20–36.
Willmott, C. J. and K. Matsuura. 2001.
Terrestrial Air Temperature and Precipitation: Monthly and Annual Time Series (1950 - 1999). http://climate.geog.udel.edu/~climate/html_pages/README.ghcn_ts2.html