Policy Research Working Paper 10286

Does Market Integration Increase Rural Land Inequality? Evidence from India

Claudia Berg, The World Bank
Brian Blankespoor, The World Bank
M. Shahe Emran, IPD, Columbia University
Forhad Shilpi, DECRG, World Bank

Development Economics
Development Research Group
January 2023

First Draft: July 22, 2022; This Version: January 16, 2023

This paper is a product of the Development Research Group, Development Economics. It is part of a larger effort by the World Bank to provide open access to its research and make a contribution to development policy discussions around the world. Policy Research Working Papers are also posted on the Web at http://www.worldbank.org/prwp. The authors may be contacted at fshilpi@worldbank.org.

The Policy Research Working Paper Series disseminates the findings of work in progress to encourage the exchange of ideas about development issues. An objective of the series is to get the findings out quickly, even if the presentations are less than fully polished. The papers carry the names of the authors and should be cited accordingly. The findings, interpretations, and conclusions expressed in this paper are entirely those of the authors. They do not necessarily represent the views of the International Bank for Reconstruction and Development/World Bank and its affiliated organizations, or those of the Executive Directors of the World Bank or the governments they represent.

Produced by the Research Support Team

ABSTRACT

Investments in transport infrastructure lower trade costs and lead to integration of villages with urban markets. Does spatial market integration increase land inequality in rural areas? Theoretical analysis by Braverman and Stiglitz (1989) suggests that the interactions of lower trade costs with credit market imperfections can increase land inequality. The primary mechanism is the adoption of increasing returns technology by large landowners facing lower trade costs, which makes it more profitable to expand their scale by buying land from small, credit-constrained farmers. Using high-quality household survey data (the India Human Development Survey) on land ownership in rural districts of India, this paper provides the first evidence on the effects of market integration on land ownership inequality. It develops an instrumental variables approach exploiting two sources of exogenous variation: the location of a rural district relative to the Golden Quadrilateral network (an inconsequential place design) and the length of colonial railroad in the 1880s in a district (a historical infrastructure design). This paper discusses and deals with potential objections to the exclusion restrictions. The evidence suggests that a 10 percent increase in a gravity measure of market access increases the land Gini coefficient by 2.5 percent and the share of landless households by 6.8 percent. This paper finds evidence consistent with the Braverman and Stiglitz (1989) hypothesis that the interaction of credit market imperfections with lower trade costs increases land inequality: a 10 percent increase in market access increases the adoption of increasing returns farming technology by 3.5 percent. There is a positive effect on land sales, but the instrumental variables estimates are imprecise.
The robustness of the conclusions is checked by relaxing the exclusion restrictions using the Conley et al. (2012) approach, and the bias-adjusted ordinary least squares estimator of Oster (2019) that does not impose any exclusion restrictions. The estimated effects of market access cannot be accounted for by the colonial land revenue system, demographic pressure on land, and differences in inheritance law between the Hindu and Muslim population in a district.

JEL Codes: O18, O12, D30, D63, Q15, R40

Key Words: Market integration, Transport Infrastructure, Land inequality, Landlessness, Gravity Measure, Rural India, Colonial Railroad, Golden Quadrilateral, Credit Market Imperfections, Increasing Returns Farming Technology

* We would like to thank Treb Allen for very helpful discussions on market access data, Lakshmi Iyer for help with understanding the colonial land revenue system in India, and Latika Chaudhury for population density data at the district level. We also thank Simon Alder, Harris Selod, Saher Asad, Raghbendra Jha, Hrishikesh Prakash Patel for discussions and/or comments. Standard disclaimers apply. Email for correspondence: shahe.emran.econ@gmail.com.

(1) Introduction

A highly skewed land ownership distribution in a country not only reflects inequality but can adversely affect economic efficiency and political stability (for surveys of the relevant literature see Binswanger et al. (1995), Ray (1998), Todaro and Smith (2015)). Land inequality (and wealth inequality) at the time of colonial settlement affected the long-term development of economic and political institutions in the countries colonized by European powers (Sokoloff and Engerman (2000)). High land inequality may lower provision of public goods, including public schools (Mariscal and Sokoloff (2000)), and exclude a large proportion of households from access to the formal credit market, constraining investment in human capital (Becker (1981)) and development of entrepreneurship (Banerjee and Newman (1993)).
Many countries attempted land reform in the 1950s-1970s, but recent evidence suggests that land inequality has increased in most of the countries after economic liberalization in the 1980s (Bauluz et al. (2020)).[2] The effects of market liberalization depend on the trade costs, which are largely determined by the location of a village relative to the urban centers and the availability and quality of transport infrastructure.[3] Public investment in transport and communications infrastructure has experienced a sustained increase over many decades, as governments in many developing countries focused on spatial isolation from urban markets as a primary factor behind high poverty rates in lagging hinterlands and widening regional inequality. The huge investments in roads and highways, bridges, and railways lowered the trade costs and led to spatial market integration in many developing countries, with China and India leading the way. For example, according to a gravity measure, average market access in India increased by 25 percent each decade from 1962 to 2011.

[1] As discussed by Sokoloff and Engerman (2000), the initial land inequality was largely determined by differences in factor endowment across countries.

[2] Ghatak and Roy (2007) present evidence that many decades of land reform policies in India failed to reduce land inequality significantly.

[3] For evidence that the transmission of price signals in developing countries depends on the remoteness of a location, see the analysis by Atkin and Donaldson (2015) in the context of Ethiopia and Nigeria. It is easier to appreciate the importance of trade costs by considering the polar case of extremely remote villages cut off from the urban markets and thus effectively in autarkic equilibrium. If a village was in autarkic equilibrium and is integrated with the urban markets by new transport infrastructure, it is directly affected by liberalization policies, and reaps allocational efficiency (comparative advantage) and productivity gain through lower costs of modern inputs. See Emran et al. (2019) in the context of Mali and Burkina Faso, and Blankespoor et al. (2022) in the context of Bangladesh.

Does integration of a village with urban markets unleash economic forces that lead to higher land inequality? The existing theoretical analysis suggests that lower trade costs can lead to a concentration in land ownership because of credit market imperfections. Braverman and Stiglitz (1989) develop a model where lower trade costs increase returns to land by increasing producer prices of agricultural goods irrespective of farm size but also improve profitability of technology adoption. Since large farmers have better access to credit to finance technology adoption, they buy land from credit-constrained small farmers, especially when the technology involves increasing returns such as irrigation and mechanization of cultivation and harvesting.[4]

Using data on rural districts from India Human Development Survey II in 2012 (henceforth IHDS-II), we provide the first (to our knowledge) empirical analysis of the effects of market integration on land inequality, and explore the role of technology adoption as a mechanism.[5] India is an excellent case study to understand the effects of trade costs on land inequality. India is a vast country with a long history of transport networks and substantial geographic variation: an index of market access in 1996 at the district level varies from 3.5 to 100. This spatial variation in market access helps to identify the effects of market integration. The effects of transport infrastructure on land inequality are also important from a policy perspective as land inequality and land reform have been central policy issues in India for many decades.

[4] In addition to quantitative credit rationing, the small farmers also face a higher interest rate. If market integration deepens the formal credit market and lowers the interest rate on collateralized loans from formal banks, the large landowners will get not only a larger amount of credit but also a lower interest rate. A lower interest rate implies higher capitalization of the land rents for these large landowners. Thus they would be willing to pay much more than the demand price of the small landowner because of higher rents (higher crop price), lower interest rate, and technology adoption.

[5] We rely on IHDS-II because it is a high quality household survey. It is not possible to use agricultural census data because of unavailability of land ownership information. See Bauluz et al. (2020) for a discussion on why agricultural census data are not suitable and the advantages of household surveys in studying land ownership inequality.

The empirical analysis uses two measures of land inequality in 2012: a district level land Gini for household land ownership and the proportion of landless households in a district. As a measure of adoption of increasing returns technology in agriculture, we use the proportion of households in a district owning at least one of the following types of farming equipment: tube well, electric pump, diesel pump, tractor, and thresher. We also analyze the effects of market integration on land sales, but the empirical results are not robust. For market integration, we calculate a gravity measure using travel time through roads and highways. Our main results are based on 1996 travel time, but we report results from travel time in 1988 and 2004 in the online appendix. It is important to note that our identification approaches are designed to understand the long-term effects. If the identifying variations are successful in capturing long-term changes, then the estimates should not vary substantially whether we use the 1988, 1996, or 2004 market access measure for our analysis.
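The district-level variables described above can be illustrated with a minimal sketch. It is not the authors' code: the function names are hypothetical, the Gini is the standard rank-based formula applied to household land holdings (with landless households entering as zeros), and the gravity form MA_j = sum over k != j of population_k * traveltime_jk^(-theta) is a common specification in the market access literature that is assumed here; the paper's exact construction is described in its online appendix.

```python
import numpy as np

def land_gini(land):
    """Gini coefficient of household land holdings (landless households enter as zeros)."""
    x = np.sort(np.asarray(land, dtype=float))
    n = x.size
    if n == 0 or x.sum() == 0:
        return 0.0
    ranks = np.arange(1, n + 1)
    # Standard rank-based formula for the Gini of a sorted sample.
    return float(2 * np.sum(ranks * x) / (n * x.sum()) - (n + 1) / n)

def landless_share(land):
    """Proportion of households owning no land."""
    x = np.asarray(land, dtype=float)
    return float(np.mean(x == 0))

def market_access(travel_time, population, theta=1.5):
    """Assumed gravity form: MA_j = sum_{k != j} population_k * travel_time_jk ** (-theta).

    travel_time is a square matrix of positive off-diagonal travel times (hours);
    the own-district term is excluded (an assumption of this sketch).
    """
    t = np.asarray(travel_time, dtype=float)
    pop = np.asarray(population, dtype=float)
    weights = np.zeros_like(t)
    off_diag = ~np.eye(t.shape[0], dtype=bool)
    weights[off_diag] = t[off_diag] ** (-theta)
    return weights @ pop

# Made-up data: four households in one district, three districts in the network.
holdings = [0, 0, 0, 10]
hours = [[0, 1, 4], [1, 0, 2], [4, 2, 0]]
people = [100, 200, 300]
print(land_gini(holdings), landless_share(holdings))
print(market_access(hours, people))
```

With the trade elasticity set to 1.5, a destination four hours away receives one eighth of the weight of a destination one hour away, which is why the choice of elasticity and travel-time year matters for the measure.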
We develop an instrumental variables approach to address the potential endogeneity of market access by exploiting two sources of exogenous variation: colonial railroads in the 1880s and the Golden Quadrilateral (GQ) highway network. Motivated by Duranton and Turner (2012) and Donaldson (2018a), the location of historical railroad infrastructure in the 1880s is used for estimating the effects of better market access via roads and highways on land inequality in 2012. We discuss (and deal with where necessary) potential threats to the exclusion restriction imposed on historical railroad: (i) geographic targeting by the colonial government for poverty alleviation and tax revenue, (ii) commercial motives for the railroads financed by private British investors, and (iii) potential long-term effects of colonial railroads through agglomeration and persistence. We also check whether the colonial railroad instrument partly captures the effects of colonial land revenue systems on land inequality (Banerjee and Iyer (2005)). For details, please see section (4.1) below.

As a second source of identification, we exploit the distance of a district from the relevant arm of the Golden Quadrilateral highway network (henceforth GQ) in an inconsequential place design. To strengthen the credibility of the research design, we implement three steps: (i) exclude the districts located in the main nodes connected by the GQ, (ii) construct two hypothetical GQ networks using Euclidean distance and a least-cost path algorithm, (iii) construct hypothetical feeder roads connecting a district center to the nearest GQ arm, again using Euclidean distance and a least-cost path algorithm. Our identifying instrument is based on double hypothetical routes: the hypothetical distance from the district center to the nearest arm of the hypothetical GQ network. For details, please see section (4.2) below.
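The Euclidean version of the double hypothetical route can be sketched as a point-to-segment distance calculation. This is a minimal illustration under strong assumptions, not the authors' procedure: coordinates are treated as planar (real latitude and longitude would first need a map projection), the hypothetical network is a closed loop of straight lines between nodes, the helper names are invented, and the least-cost path variant is not implemented.

```python
import math

def point_segment_distance(p, a, b):
    """Shortest planar distance from point p to the straight segment a-b."""
    px, py = p
    ax, ay = a
    bx, by = b
    dx, dy = bx - ax, by - ay
    seg_len_sq = dx * dx + dy * dy
    if seg_len_sq == 0.0:
        return math.hypot(px - ax, py - ay)
    # Project p onto the line through a and b, then clamp to the segment.
    t = ((px - ax) * dx + (py - ay) * dy) / seg_len_sq
    t = max(0.0, min(1.0, t))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))

def distance_to_network(center, nodes):
    """Distance from a district center to the nearest arm of a closed straight-line network.

    nodes is an ordered list of (x, y) tuples; arms join consecutive nodes and
    the last node is joined back to the first (an assumption of this sketch).
    """
    arms = zip(nodes, nodes[1:] + nodes[:1])
    return min(point_segment_distance(center, a, b) for a, b in arms)

# Made-up planar coordinates standing in for four nodal cities.
hypothetical_nodes = [(0.0, 0.0), (10.0, 0.0), (10.0, 10.0), (0.0, 10.0)]
print(distance_to_network((5.0, 3.0), hypothetical_nodes))
```

Because the resulting distance depends only on where a district sits relative to straight lines between distant nodes, it does not use information on where the actual highway or feeder roads were placed, which is the logic of the inconsequential place design described above.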
To check whether the main conclusions are robust to allowing for local violation of the strict exclusion restriction (i.e., allowing for a small direct impact of an instrument on the outcome variable), we implement the plausibly exogenous approach of Conley et al. (2012).

The empirical results from 2SLS, Lasso-IV, and Oster's bias-adjusted OLS show that market integration increases land Gini and the proportion of landless in a district. Estimates from Lasso-IV suggest that a 10 percent higher market access increases land Gini by 2.55 percent, and the proportion of landless by 6.78 percent in a district (both estimates are significant at the 1 percent level). Evidence on modern (increasing returns) technology adoption provides support for the Braverman-Stiglitz (1989) hypothesis: a 10 percent higher market access increases modern technology adoption by 3.5 percent (significant at the 1 percent level). The estimated effect of higher market integration on the incidence of land sales is positive and substantial in magnitude, but it lacks precision in the IV regressions (not significant at the 10 percent level).[6]

These conclusions are robust to the relaxation of the exact exclusion restrictions on the instruments imposed in the IV estimation using the Conley et al. (2012) approach, and the use of alternative values of the trade elasticity parameter (1.5, 3.8) and travel time year (1988, 1996, 2004) for calculating market access. We also find that the main conclusions are robust to an alternative empirical approach that does not rely on any exclusion restrictions. In particular, we implement the Oster (2019) bias-adjusted OLS, which extends the Altonji et al. (2005) approach of exploiting selection on observables as a guide to selection on unobservables.[7] We also find that the estimated effects of market integration are not driven by spatial heterogeneity in demographic pressure, differences in colonial land revenue systems across districts, differences in land reform policies across states, or differences in land inheritance laws and customs between Hindu and Muslim population.[8]

[6] The OLS estimate with state fixed effects suggests that a 10 percent higher market access increases the incidence of land sales by 8 percent (significant at the 5 percent level).

[7] For recent applications of the Oster (2019) approach for estimating causal effects, see, for example, van Maarseveen (2020). However, Oster's bias-adjusted OLS estimates should be treated as lower bounds in our application as this approach does not correct for attenuation bias owing to measurement error in the market access variable.

[8] Bardhan et al. (2014) provide evidence that demographic pressure and inheritance play important roles in shaping land inequality in the state of West Bengal in India. Banerjee and Iyer (2005) find that the colonial land revenue system had long-term effects on land inequality, agricultural productivity and irrigation in a district.

The rest of the paper is organized as follows. Section (2) provides a discussion on the country background, focusing on economic and land inequality and the development of transport infrastructure. We discuss the related literature in section (3) with a focus on India. Section (4) develops the empirical strategy for identifying the effects of market integration on land inequality. Section (5) provides a discussion on the household survey data we use from the India Human Development Survey and the construction of the main variables including the gravity measure of market access. The next section (6) is devoted to the estimation results from our empirical analysis. Section (7) offers the evidence on the mechanisms and tests whether the Braverman-Stiglitz hypothesis is rejected by data.
The paper ends with a set of conclusions summarizing the main findings and the contributions of the paper to the existing literature.

(2) Country Background

Inequality in India

A substantial body of evidence suggests that income and wealth inequality increased in India in recent decades. According to the estimates reported by Himanshu (2019), consumption Gini increased from 0.30 in 1983 to 0.37 in 2011-2012 (based on NSS data). Evidence suggests that wealth inequality also increased: the share of the top 10 percent grew from 45 percent in 1981 to 65 percent in 2012. The most important component in the wealth portfolio of Indian households is land (farming land and house). Over the years, land contributed more than 60-65 percent of the total household wealth, with land and building combined forming around 85-90 percent of the total household wealth (Bharti (2018)).

After independence, India adopted a socialist economic system, nationalizing the industrial sector, and imposing restrictions on international trade. But the vast swath of the agricultural economy was never seriously considered for public ownership (unlike China); the distributional concerns were to be addressed by land and tenancy reform. Over the decades, various tenancy reforms and redistributive land reforms imposing ceilings were implemented, and the policies vary widely across different states (Besley and Burgess (2000)). We control for this state-level variation in land policies by including state fixed effects. There is substantial evidence that the land reform policies failed to reduce land inequality. Even in the state of West Bengal, which implemented perhaps the most comprehensive tenancy reform, evidence suggests that land inequality did not decline after the implementation of the reforms, the forces of demographic pressure and land inheritance law overwhelming the effects of land reform (Bardhan et al. (2014)).
Ghatak and Roy (2007) report evidence that land reform in India was not effective in reducing land inequality. Besley et al. (2016) reach somewhat different conclusions: land inequality is lower in areas that saw greater intensity of tenancy reform over 3 decades, with heterogeneity across caste groups.

Transport Development in India

The transport sector in India has experienced dramatic growth in the post-independence period with important changes in the mode of transport for both freight and passenger traffic. The freight transport volume by roads and highways increased from 12.09 billion ton kilometers (henceforth btkm) in 1951 to 82.36 btkm in 1971 (about 6.8 times the 1951 volume), and to 899.26 btkm in 2001 (about 10.9 times the 1971 volume) (Chaudhury (2005)). The rail freight volume also increased but at a much lower rate: from 127 btkm in 1971 to 312 btkm in 2001 (about 2.5 times the 1971 volume). Similar trends were observed for passenger traffic. This resulted in a dramatic reversal in the share of roads vs. railways: from 25 percent in 1951 to 75 percent in 2001 in favor of roads and highways.[9] This evidence motivates our measure of market access, which is based on travel time through roads and highways. Note, however, that the estimates using colonial railroad as a source of identification may pick up some of the effects of the railroad to the extent colonial railroad length in the 1880s in a district is positively correlated with the railroad length in 1996 (or 1988, 2004). We underscore here that this poses no complications for our analysis because our goal is not to isolate the effects of roads and highways from that of railroads, but to understand the effects of better market access on land inequality.

[9] These estimates are from Chaudhury (2005). Alternative estimates reported by the Department of Roads and Highways of Government of India suggest a similar picture. In 1951, the share of roads and highways in freight traffic in India was 13.8 percent and in passenger traffic 15.4 percent, which grew to 65 percent (freight) and 86.7 percent (passenger) in 2004-2005. While the national highways constitute about 2 percent of the total road network, they carry 40 percent of total road traffic (Annual Report, Department of Road Transport and Highways, GOI, 2006-2007).

The Indian government invested heavily in transport infrastructure in the last few decades, with the Golden Quadrilateral network being the most ambitious project. A number of interesting recent studies analyze the effects of this expansion and upgrading of the GQ network during the 2000s (see the discussion on related literature below). Note that our analysis does not attempt to estimate the effects of the recent expansion (6 lanes) and improvements in the GQ network, because the effects of changes in trade costs due to the GQ expansion and upgrading on land inequality may take many decades to materialize. Our analysis focuses on the fact that many parts of the GQ network have been in existence for a long time, and constituted the main transport arteries for long distance trade even before the Mughal period. The Grand Trunk Road, which forms a large part of the Kolkata-Delhi arm of the GQ network, is a prominent example: it goes back to the Maurya era and underwent substantial improvements during the British rule, between 1833 and 1860 (Arnold (2000)). Our analysis thus deals with the long-term cumulative effects of better access to markets for the districts that are located closer to the different arms of the GQ network (for details on the identification scheme based on the GQ network, see section (4) below).

(3) Related Literature

Our analysis contributes to a large and growing literature on the effects of lower trade costs due to transport infrastructure investments. Donaldson (2015) and Berg et al. (2016) provide excellent surveys of the literature.
Evidence suggests that better access to markets reduces impediments to trade and spatial price dispersion (Donaldson (2018b), Duranton (2015), Aggarwal (2013), Jones and Salazar (2021)), changes composition of trade, employment, and pattern of specialization (Michaels (2008), Duranton et al. (2014), Blankespoor et al. (2017)), increases household consumption in villages (Emran and Hou (2013)), causes deindustrialization (Faber (2014)), counters the effects of deindustrialization by increasing agricultural productivity and welfare (Blankespoor et al. (2022)), accelerates technology adoption, and structural change in and commercialization of agriculture and the rural economy (Damania et al. (2017), Fafchamps and Shilpi (2003), Emran and Shilpi (2012)), and induces spatial decentralization of economic activities (Baum-Snow et al. (2017)).

India has been a prominent case study in the recent literature on the effects of trade costs on prices, productivity, allocational efficiency, and household welfare. There has been a surge of interest in understanding the effects of the recent expansion and upgrading of the GQ network. Ghani et al. (2016) find evidence that GQ expansion and upgrading improved the organization and efficiency of the manufacturing activities through sorting, scaling, and reallocation, by both the incumbents and the new entrants. Datta (2012) finds that the firms located closer to GQ benefited in the form of more efficient inventory management. Das et al. (2019) provide evidence that GQ spurred financial depth in the districts along the GQ, especially in the districts with a relatively better level of financial sector development to begin with. Abeberese and Chen (2021) find that firm productivity rises and product scope falls as a result of the connection with the GQ highway.
Estimates from a model of internal trade with variable markups calibrated to the Indian manufacturing sector suggest that real income increased by 2.7 percent as a result of lower trade costs from the expansion and upgrading of GQ (Asturias et al. (2018)). However, to the best of our knowledge, there are no studies that analyze the effects of market integration owing to lower trade costs on land inequality in India or any other country.

(4) Empirical Issues and Identification Strategy

We calculate a gravity measure of market access using travel time (in hours) in 1996, and population in a destination district in 1991 as an indicator of market size. Following Allen and Atkin (2016), our main market access measure uses a trade elasticity of 1.5.[10] For details on the construction of the gravity measure of market access, please see section OA.1 in the online appendix. As noted earlier, the focus on roads and highways reflects the fact that roads and highways have become the main modes of transportation in India for both goods transport and traveling needs.

[10] The conclusions in this paper are robust to alternative choices of the trade elasticity parameter, and travel time in different years. Please see the online appendix.

To understand the empirical issues, it is useful to consider the following triangular empirical model for estimating the impact of market integration on land Gini (LGini):

\[
\ln(LGini)_j = \delta_0 + \delta_1 \ln(MA)_j + \Gamma X_{1j} + \zeta_j
\]
\[
\ln(MA)_j = \alpha_0 + \Phi X_{1j} + \nu_j
\]

where $\ln(LGini)_j$ is the log of land Gini in 2012 and $\ln(MA)_j$ is the log of market access based on travel time in 1996 in district $j$, and $X_{1j}$ is a vector of variables observed by the researcher that determine market access of a district and also affect land inequality. The central empirical challenge in understanding the effects of market access on land inequality is that the placement of transport infrastructure is not random but determined by government policy objectives.
The objectives may vary over time with political change, for example, when political parties have sharply different policy agendas, and may differ from the goals pronounced by the politicians publicly. It is not possible to identify and gather data on many of the variables that went into the actual route choice, and as a result, a vector of variables $X_{2j}$ is omitted and subsumed in the error terms in the triangular model. This implies that $\mathrm{Corr}(\zeta_j, \nu_j) \neq 0$. If the overriding objective for the government was to integrate the poor lagging regions to the growth centers, then the OLS estimate of the effects of transport infrastructure may be negative even though the true effect is positive when the poor regions have lower land inequality to begin with. Evidence on India in fact suggests that land inequality is lower in a poor district; a bivariate regression of $\ln(LGini)$ on $\ln(GDP)$ at the district level yields a coefficient of 0.01 with a t statistic of 7.7. In contrast, when the roads are primarily targeted to areas with high economic potential to maximize economic growth and tax revenue, the OLS estimate of the effect of better market access on an economic outcome is biased upward (towards a substantial positive effect) because these areas also have higher land inequality due to factors unrelated to market access. It may not be possible to pin down the net direction of bias arising from such endogenous placement because, in general, both poverty targeting and tax revenue extraction have been important motives for governments and the objectives change over time.

To address the biases in the OLS estimates, we exploit two sources of exogenous variation in the market access of a district to develop an instrumental variables approach: (i) the location of colonial railroads in the 1880s, and (ii) the distance of a rural district from the Golden Quadrilateral highways. We discuss the credibility of these identifying sources in detail below.
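The logic of the omitted-variable bias and its correction by an instrument can be illustrated with a small simulation. This is a hedged sketch on made-up data, not the paper's estimation: the true elasticity (0.3), the strength of the instrument, and the omitted placement variable are all invented, and the helper names are hypothetical. The omitted variable raises market access but lowers land inequality, mimicking the case where roads are targeted to poorer districts with lower land inequality.

```python
import numpy as np

def ols(y, X):
    """Least-squares coefficients of y on the columns of X."""
    return np.linalg.lstsq(X, y, rcond=None)[0]

def two_stage_least_squares(y, endog, instruments, controls):
    """2SLS coefficient on a single endogenous regressor (point estimate only)."""
    Z = np.column_stack([controls, instruments])
    fitted = Z @ ols(endog, Z)                   # first stage: project endog on Z
    X_hat = np.column_stack([controls, fitted])  # second stage uses fitted values
    return ols(y, X_hat)[-1]

rng = np.random.default_rng(0)
n = 20_000
const = np.ones((n, 1))

z = rng.normal(size=n)        # instrument (e.g., a historical or hypothetical route measure)
omitted = rng.normal(size=n)  # unobserved placement motive, part of both error terms
ln_ma = 0.8 * z + omitted + rng.normal(size=n)
ln_gini = 0.3 * ln_ma - 0.5 * omitted + rng.normal(size=n)

beta_ols = ols(ln_gini, np.column_stack([const, ln_ma]))[-1]
beta_iv = two_stage_least_squares(ln_gini, ln_ma, z, const)
print(f"OLS: {beta_ols:.3f}  2SLS: {beta_iv:.3f}  (simulated true value: 0.3)")
```

In this setup the OLS coefficient is pulled toward zero while the 2SLS coefficient is close to the simulated true value. The sketch returns point estimates only; standard errors from the naive second-stage regression would be incorrect, so a dedicated IV routine should be used for inference.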
(4.1) Colonial Railroads in the 1880s

The length of colonial rail track in the 1880s in a district is our first instrument for identifying the effects of market access of a district in 1996 on land inequality in 2012. We first discuss the plausibility of the exclusion restriction imposed. Then we explain why the colonial railroads are expected to be positively correlated with the market access in 2012 and report the relevant evidence.

As noted by Donaldson (2018), the locations of railroads built by the colonial government up to the 1880s were primarily dictated by defense considerations rather than economic objectives. The railroads were built and maintained by the military engineering corps (National Transport Development Committee Report, vol 3, 2013, GOI). Following Donaldson (2018), we exclude the rail stations built after the 1880s, as the Famine Commission report prompted the colonial government to target railroads to poor drought-prone districts more vulnerable to famine, thus making them potentially endogenous.

Some of the colonial railroads were financed by private railroad companies (Macpherson (1955)), and one might worry that they were likely to target the districts with higher economic potential to ensure adequate returns to the investors. However, in colonial India, the privately financed railroads were guaranteed a 5 percent return by the government, which ensured that the location choices were not driven by the imperative of ensuring a reasonable return to the investors. As a result, many of the private railroads were built in economically lagging districts. Hurd (1983, p. 743) writes: "..., many, if not most, of the unprofitable lines depended for their very existence upon the guarantee. Those earning less than 5 per cent included some of the lines in the north-west and in the Ganges valley, most of those in the Deccan, and all of the lines in Sind and south India. Thus, ..., had the guarantee not existed, it is unlikely that private capital would have invested in railways for large areas of India. These areas would, then, have had no rail service at all."

Footnote: British investors invested 95 million pounds between 1845-1875 (Macpherson (1955)).

Another potential threat to the exclusion restriction is the possibility that the colonial railroads might have led to agglomeration and persistent effects. If historical rail stations created centers of commerce, they might lead to agglomeration economies and persistent growth effects even after rail transport became less important or train stations were abandoned. A substantial body of recent research on the economic history of India allays this concern. The colonial railroads in India were unique in that they did not affect long-term growth and structural transformation in any significant way, unlike historical railroads in many other countries (see Bogart et al. (2015)). Thus, the long-term direct effects of colonial railroads on a district are likely to be negligible once we condition on the market access of a district in 1996. As a conservative strategy, we control for 1961 population density in a district to mop up any potential long-term impacts working through the agglomeration channel.

A concern for the interpretation of the estimates based on the colonial railroads is that they might be picking up the effects of the colonial land revenue policies. In a widely cited paper, Banerjee and Iyer (2005) provide convincing evidence that the districts under the landlord system of land taxation had higher land inequality and lower irrigation investment. If the railroad length in a district is correlated with whether it was under the landlord system, then the IV estimates using the colonial railroads will partly reflect the effects of the colonial land revenue system. We will check this possibility by including an indicator for the colonial revenue system in a district using data from Banerjee and Iyer (2005).
Footnote: Population density is the most commonly used indicator of agglomeration in economic geography.

The discussion above suggests that the exclusion restriction imposed on the colonial railroad is plausible. However, a reader might wonder whether we could be unaware of some other channels through which the 1880s railroads may have very small direct effects on land inequality in 2012. Note that if this direct effect captures the role of the current railway network, we do not consider this a violation of the identifying assumption. To the extent our instrument captures some of these other components of transport infrastructure (railway and water transport), it is part of the causal effect under focus. The sources of violation of the exclusion restriction have to be something different from these other transport infrastructures captured by the instrument. The important question here is whether allowing for such an arbitrarily small direct impact of the colonial railroad through non-market access channels has a substantial impact on the magnitude of the estimated causal effect of interest. To assess the sensitivity of the IV estimates with regard to such local (small) violations of the exclusion restriction, we take advantage of the Conley et al. (2012) bounds approach (see section 6.3 below).

The next question we address is that of the relevance of the historical railroad as an instrument. Recall that our measure of market access is based on travel time through roads and highways. A natural question then arises: why should we expect the colonial railroad to have a significant correlation with the gravity measure of market access based on roads and highways? If the colonial railroad is only tangentially related to market access through roads and highways in 1996 (or 1988, 2004), then the IV estimates will be biased and unstable, and can yield implausible magnitudes (Stock and Yogo (2005)).
To check whether the historical railroad locations have systematically higher market access, we plot the kernel density function of market access for two samples, with and without a colonial rail station. Figure 1 shows clearly that the presence of a colonial railroad in the 1880s shifts the density function of market access in 1996 substantially to the right. The districts with colonial railroad have a higher mean and lower variance. The first stage F statistics reported later in the IV regressions confirm that the 1880 railroad length in a district has substantial power in explaining the variation in market access of districts in 1996.

Footnote: Districts with colonial railroad: mean=15.641 and variance=0.348. Districts without railroad: mean=14.911 and variance=0.372.

There are a number of plausible reasons behind the significant positive correlation between colonial railroads and current market access via roads and highways in Figure 1. To the extent transport infrastructure placement is determined by topography, we would expect a positive correlation between the placement of rail lines and roads. Two topographical features are especially important for the placement of railroads: slope and curvature. The optimal rail track location tries to minimize the slope (on a steep slope, going uphill requires a more powerful locomotive, and braking is difficult going downhill) and curves (a sharp curve reduces the maximum speed) (AREMA, 2003, Chapter 6). These two factors are also important for the choice of cost-minimizing road and highway routes. Because of the topographical constraints, it is common to have rail tracks and highways placed close to each other. In fact, the old railroad bed may be the lowest-cost route for a new highway.

Footnote: The most widely used HDM highway planning model of the World Bank highlights the importance of slope (rise and fall) in choosing an "optimal" path for highways. See the discussion by Robinson and Thagensen (2004).
As Duranton and Turner (2012) note: "(B)uilding both railroad tracks and automobile roads requires leveling and grading a roadbed. Hence, an old railroad track is likely to become a modern road...without the expense of leveling and grading."

(4.2) Golden Quadrilateral: An Inconsequential Place Design

The basic insight behind the inconsequential place design is that most of the interstate (national) highways are built to connect major metropolitan cities, and whether a village (or a small town) is located close to such highways is purely accidental and can be treated as quasi-random (see Redding and Turner (2015) and Donaldson (2015) for excellent discussions). In India, the Golden Quadrilateral highways (GQ) that connect 5 metropolitan cities (New Delhi, Kolkata, Chennai, Mumbai and Bangalore) offer an excellent opportunity to develop an inconsequential place design (see Figure 2 for a map of the GQ network). For example, the fact that the distance from Patna (in the state of Bihar) to the Delhi-Kolkata arm of GQ is much lower than that from Darjiling (in the state of West Bengal) is not because GQ was targeted to Patna; the better exposure to markets for Patna is incidental (see Figure 2). We rely on such incidental variation in market access of different rural districts to identify the effects of market access on land inequality.

Footnote: For applications of inconsequential place design see, among others, Banerjee et al. (2012), Faber (2014), Datta (2012), and Ghani et al. (2016).

As noted earlier, many parts of the GQ network existed for centuries. A comparison of the roads and highways network in 1872 (see Figure AF.1 in online appendix) with the network in 1992 in Figure 2 shows substantial overlaps. Most notably, the Grand Trunk Road (Kolkata to Delhi) goes back to ancient times and formed the main conduit for commerce and development ("the river of life" in the words of poet Rudyard Kipling) over centuries. Our analysis attempts to capture the long term cumulative effects of market integration on land inequality across districts.

Footnote: The full length of this ancient transport corridor stretches as far north as Kabul, Afghanistan and as far south as Chittagong, Bangladesh. During the British rule, the Grand Trunk Road was developed into a two lane carriage and motor way (1833-1860). For discussions on the history of Grand Trunk Road in India, please see Singh (1995), and World Bank (2018).

For a credible empirical design, we need to address three issues in this set-up. First, as widely noted in the literature (see, for example, Faber (2014), Ghani et al. (2012)), we need to exclude the nodal districts (for example, Kolkata) as they were the targets of the GQ network. Second, the arms of the GQ network show a substantial amount of zigzag, and one might worry (with some measure of justification) that the actual placement of the GQ arm reflects government targeting and political lobbying. Third, similar (perhaps stronger) concerns apply to the placement of feeder roads that connect a district center to the nearest point of the GQ arm. To strengthen the credibility of the identifying assumption, we need to purge such potentially endogenous components of the GQ arm and the feeder roads. We implement an approach developed by Faber (2014) where, in place of the actual road and highway network, a hypothetical road and highway network is used to purge the endogenous components. To construct the hypothetical highways and feeder roads, we use two approaches: (i) Euclidean distance, and (ii) the least cost path that exploits topographical features, especially slope and elevation. The Euclidean distance does not take into account the exogenous variation in the distance due to topographical constraints, for example, when the least cost path goes around a mountain rather than over it (or through it by tunnel).
The deviations of the least cost path from the linear (Euclidean) network arising from differences in elevation and slope provide a source of exogenous variation in market access. Our main results are based on the Euclidean distance, and the corresponding estimates using the least cost path are reported in the online appendix. The advantage of the Euclidean distance is that it is independent of topographical features, and thus the exclusion restriction is not threatened by any direct impact of topography on land inequality. As an additional precaution, we include slope and elevation of a district as controls in all regressions in addition to other indicators of natural agricultural endowment such as rainfall (mean and SD) and an index of land productivity. However, these controls are more important for the exclusion restriction imposed on the instrument based on the least cost path distance of a district to the nearest GQ arm.

Footnote: See the discussion by Donaldson (2015) on the Faber (2014) approach.

Footnote: The insight that deviations of roads from a linear network caused by topography offer credible identifying variation has been used by many papers, for example, Emran and Hou (2013).

(5) Data and Variables Definitions

Our analysis uses data constructed from several sources, which are presented below. Our unit of observation is the District defined from the 2001 census. Data on our outcome variables (land inequality, landlessness, and modern technology adoption) come from the second round of the India Human Development Survey (IHDS) in 2012. The IHDS is a high quality household survey with nationally representative coverage. IHDS II (2012) surveyed 42,152 households in 1,420 villages and 1,042 urban neighborhoods. IHDS was jointly organized by the University of Maryland and the National Council of Applied Economic Research in New Delhi (Desai et al., 2005; Desai and Vanneman, 2012).
The summary statistics for the variables used in our analysis are reported in online appendix Table A.6.

Our main indicator of land inequality is the Gini coefficient for land ownership in a district. Formally, we calculate the Gini as follows:

$$\mathrm{Gini} = \frac{1}{2N^{2}\bar{y}} \sum_{i=1}^{N} \sum_{j \neq i}^{N-1} |y_i - y_j|$$

where $N$ is the number of households within the District, $y_i$ is land owned by household $i$, and $\bar{y}$ is the average land ownership within the District. As a second indicator of land inequality, we also consider the percentage of households in a district that are landless. Both measures of land inequality are calculated from the "area of land owned" variable of the IHDS survey.

To explore the role of technology adoption as a mechanism à la Braverman and Stiglitz (1989), we use the share of households within a district that report using modern technology in agriculture, specifically, whether they report using any one of the following types of equipment: tube well, electric pump, diesel pump, tractor/tiller, or thresher. We also look at whether market integration helps deepen the formal credit market, using an indicator of a formal bank branch in a rural district. The survey has information on land sales at the household level, and we estimate the impacts of market integration on land sales.

We include several geographic control variables in our analysis. Climatological variables, including rainfall and temperature variation, are from BioClim and use 1961-1990 as reference (Hijmans et al., 2005). Elevation data are from Shuttle Radar Topography Mission (SRTM) Digital Elevation Data 30m (Farr et al., 2007). Mean slope estimates are from Verdin et al. (2007). Crop suitability is calculated as the maximum suitability among four high-input, rainfed crops (cotton, dry-land rice, maize, and wheat). These are calculated from Global Agro-Ecological Zone (GAEZ) data available from the Food and Agriculture Organization (Fischer et al., 2012).
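As a rough illustration of the two inequality measures defined in this section, the sketch below computes a land Gini from pairwise absolute differences and a landless share for a single district. The function names and the toy holdings are hypothetical and are not taken from the IHDS data; treating a household with zero land as landless is also an assumption made only for this example.

```python
def land_gini(holdings):
    """Gini of land ownership: sum of |y_i - y_j| over ordered pairs i != j,
    divided by 2 * N^2 * mean(y)."""
    n = len(holdings)
    if n == 0:
        raise ValueError("need at least one household")
    mean = sum(holdings) / n
    if mean == 0:
        raise ValueError("Gini is undefined when no household owns land")
    total = sum(
        abs(yi - yj)
        for i, yi in enumerate(holdings)
        for j, yj in enumerate(holdings)
        if i != j
    )
    return total / (2 * n * n * mean)


def landless_share(holdings):
    """Share of households with zero land (assumed definition of landless)."""
    return sum(1 for y in holdings if y == 0) / len(holdings)


# Hypothetical district: three landless households and one owning 4 units.
toy_district = [0.0, 0.0, 0.0, 4.0]
gini = land_gini(toy_district)            # 24 / (2 * 16 * 1) = 0.75
landless = landless_share(toy_district)   # 3 / 4 = 0.75
```

The double loop is quadratic in the number of households, which is acceptable for district-sized samples; a sorted-rank formula would be the usual choice for larger inputs.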
We also control for population density in 1961 using population data from the India District Database (Vanneman and Barnes, 2000). District area data are from Statoids.

For the GQ based identification scheme, we construct a hypothetical linear network connecting the main cities targeted by the GQ (see Figure 2). We then calculate the Euclidean distance between each district's centroid and this linear network. For robustness, we also consider the Euclidean distances to the least cost path GQ arms in the online appendix.

Our treatment variable, market access, is calculated as the weighted sum of the populations of all other locations, with a weight that decreases with travel time. We calculate market access as follows:

$$MA_{it} = \sum_{j \neq i} \left( 1 / tt_{ijt}^{\theta} \right) P_{jt},$$

where $MA_{it}$ is the market access of District $i$ at time $t$, $tt_{ijt}$ is the travel time (in hours) between Districts $i$ and $j$ at time $t$, $P_{jt}$ is the population in destination District $j$ at time $t$, and $\theta$ is the trade elasticity. Travel times between pairs of districts are from Allen and Atkin (2016a). District populations are from Brinkhoff (2020). For the trade elasticity, we follow Allen and Atkin (2016a) and adopt 1.5 for our main results (and use alternative values for robustness). Our main results focus on market access calculated from travel time in 1996 and population in 1991.

(6) Evidence on the Effects of Market Integration on Land Inequality

We use two measures of land inequality for our empirical analysis: a land Gini at the district level and the proportion of landless households in a district. These measures are calculated for the year 2012. The regressions reported below include state fixed effects. State fixed effects are important for our empirical approach because there are important inter-state differences in land policy which can be traced back to the Zamindari system of revenue collection under British colonial rule (Banerjee and Iyer (2005)).
The implementation of land reform in the 1960s and 1970s also varied substantially across different states (Ghatak and Roy (2007)).

We build the evidence in a step by step fashion. The goal is to use a battery of alternative approaches and see if the evidence, taken together, leads to a robust conclusion.

(6.1) OLS and Oster Bias-Adjusted OLS Estimates

We begin with the OLS estimates, and check whether the OLS estimates remain robust once we correct for omitted variables bias using the approach developed by Altonji et al. (2005) and Oster (2019), where selection on observables is used as a guide to selection on unobservables. In particular, we use the bias adjusted OLS estimator (henceforth BA-OLS) proposed by Oster (2019). The advantage of the BA-OLS estimator is that the conclusions do not rely on any exclusion restrictions, but, unlike the IV estimates, this approach cannot correct for attenuation bias due to measurement error in the market access measure.

The estimates from the OLS and Oster (2019) BA-OLS estimators are reported in Table 1. The OLS estimate of the coefficient of the indicator of market access ($\ln(MA)_j$) is 0.108 without any controls, and it is significant at the 1 percent level. To check whether the positive impact of market integration found in the baseline OLS estimate is driven by unobserved heterogeneity, we include state fixed effects and a set of agro-climatic controls that can affect the productivity of land: an index of crop suitability (from FAO), the elevation and slope of a district, the long-term average and standard deviation of rainfall, and the average of and seasonality in temperature. The estimates in column 2 of Table 1 are striking; the point estimate remains virtually unchanged (0.108 (column 1) to 0.111 (column 2)), even though the $R^2$ more than doubles from 0.166 (column 1) to 0.380 (column 2) once we add the control variables.
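To see what a stable coefficient alongside a large rise in R-squared implies for omitted variables bias, the column 1 and column 2 figures can be fed into the linear approximation to the Oster (2019) bias-adjusted coefficient. This is only a sketch under stated assumptions: the proportionality parameter delta = 1 and the bound R_max = 1.3 times the controlled R-squared are commonly used defaults assumed here, not values reported in the text, and the function name is hypothetical. The output is therefore not expected to reproduce the BA-OLS figure in Table 1.

```python
def oster_approx(beta_short, r2_short, beta_long, r2_long, delta=1.0, r2_max=None):
    """Approximate bias-adjusted coefficient:
    beta* ~ beta_long - delta * (beta_short - beta_long)
                        * (r2_max - r2_long) / (r2_long - r2_short)
    'short' = regression without controls, 'long' = regression with controls.
    """
    if r2_max is None:
        # Assumed rule of thumb: R_max = 1.3 * R-squared of the long regression.
        r2_max = min(1.0, 1.3 * r2_long)
    if r2_long <= r2_short:
        raise ValueError("controls must raise R-squared for the adjustment")
    coefficient_movement = beta_short - beta_long
    r2_scale = (r2_max - r2_long) / (r2_long - r2_short)
    return beta_long - delta * coefficient_movement * r2_scale


# Inputs quoted in the text for the land Gini regressions (columns 1 and 2).
ba_estimate = oster_approx(
    beta_short=0.108, r2_short=0.166,
    beta_long=0.111, r2_long=0.380,
)
```

Because the coefficient rises slightly when the controls are added, the adjustment pushes the estimate further up rather than toward zero.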
As discussed by Oster (2019), the sensitivity of an OLS estimate to the inclusion of control variables is informative about the importance and direction of omitted variables bias only when the control variables increase the $R^2$ substantially. The BA-OLS estimate in column 3 of Table 1 corrects for selection on unobservables in addition to the observed control variables added in column 2, and the point estimate again increases slightly: from 0.110 (column 2) to 0.115 (column 3). This pattern of estimates contradicts the idea that unobserved heterogeneity biases the estimated impact of market integration upward. This suggests that the OLS estimate with controls is likely to be biased downward when we take into account measurement error in the measure of market access $\ln(MA)_j$.

The OLS and BA-OLS estimates of the effects of market integration on the proportion of landless households also suggest a positive impact of market integration which is significant at the 1 percent level across the board (see columns 4-6 in Table 1). The estimated coefficients vary only marginally: from 0.360 (OLS without controls) to 0.357 (OLS with controls) to 0.352 (BA-OLS). Again, this lack of sensitivity is observed despite the fact that the set of controls has substantial explanatory power: the $R^2$ increases from 0.167 to 0.428 once we add the controls. This strengthens the idea that the OLS estimates of the effects of market integration on land inequality are biased downward.

(6.2) IV Estimates of the Impact of Market Integration on Land Inequality

The IV estimates are reported in Tables 2A (land Gini) and 2B (landless). The regression specification includes the set of controls in column 2 of Table 1 discussed above in addition to the state fixed effects. We report 3 different estimates for each measure of land inequality. The first two columns use 2SLS and rely on the 1880 railroad length (column 1) and Euclidean distance to the hypothetical linear GQ network (column 2) as identifying instruments.
The third column reports estimates from Lasso-IV based on an extended set of instruments consisting of the colonial railroad, Euclidean distance to the nearest GQ arm, and the interactions of these two instruments with all the exogenous controls in the model such as slope, elevation, and rainfall. The Lasso-IV picks a parsimonious subset of efficient instruments from the extended set of instruments.

The evidence suggests that the 1880 railroad is a particularly strong source of exogenous variation in our market access variable, with first stage F statistics of 18.94 (land Gini regression) and 18.49 (landless regression). Euclidean distance to the linear GQ network has good power in the landless regression with an F statistic of 11.14, but it lacks adequate power in the land Gini regression (F=6.25). Consistent with a priori expectations, a district with a longer railroad network in the 1880s has better market access in 1996, and a district further away from the linear GQ network has lower market access, and both instruments are significant at the 1 percent level irrespective of the indicator of land inequality we consider.

Since Lasso picks multiple instruments, we can implement the estimation in two different ways. The first and straightforward approach is to use the set of instruments in a 2SLS regression. However, it is well understood that the weak instrument bias is smallest in a just identified model, and we can exploit this by adding a zero stage that predicts market access using the set of instruments along with all other exogenous variables and then using the predicted MA as a single instrument, thus converting it into a just identified model. As noted by Kolesár et al. (2015), this approach relies on a weaker identifying assumption in that we do not impose exclusion restrictions separately on each instrument. However, the point estimates from these alternative approaches are very close in all our estimations.
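The zero-stage idea can be sketched on simulated data. The example below is a toy illustration and not the paper's estimation: it has one endogenous regressor, two hypothetical instruments, no control variables, a plain OLS zero stage in place of Lasso selection, and a made-up true effect of 0.25. All names and parameter values are assumptions introduced for the example.

```python
import random


def solve_linear(a, b):
    """Solve a small dense system a @ x = b by Gauss-Jordan elimination."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [v - factor * p for v, p in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def ols_fitted(columns, y):
    """Fitted values from OLS of y on an intercept plus the given columns."""
    rows = [[1.0] + [c[i] for c in columns] for i in range(len(y))]
    k = len(rows[0])
    xtx = [[sum(r[p] * r[q] for r in rows) for q in range(k)] for p in range(k)]
    xty = [sum(r[p] * yi for r, yi in zip(rows, y)) for p in range(k)]
    coef = solve_linear(xtx, xty)
    return [sum(c * v for c, v in zip(coef, r)) for r in rows]


def cov(u, v):
    """Sample covariance (divisor n)."""
    mu, mv = sum(u) / len(u), sum(v) / len(v)
    return sum((a - mu) * (b - mv) for a, b in zip(u, v)) / len(u)


def zero_stage_iv(y, x, instruments):
    """Zero stage: predict x from the instruments, then use the prediction
    as the single instrument in a just identified IV regression."""
    x_hat = ols_fitted(instruments, x)
    return cov(x_hat, y) / cov(x_hat, x)


random.seed(0)
n = 5000
z1 = [random.gauss(0, 1) for _ in range(n)]   # hypothetical instrument 1
z2 = [random.gauss(0, 1) for _ in range(n)]   # hypothetical instrument 2
u = [random.gauss(0, 1) for _ in range(n)]    # unobserved confounder
x = [0.6 * a - 0.4 * b + c for a, b, c in zip(z1, z2, u)]
y = [0.25 * xi - 0.8 * ui + 0.5 * random.gauss(0, 1) for xi, ui in zip(x, u)]

beta_ols = cov(x, y) / cov(x, x)          # biased downward by the confounder
beta_iv = zero_stage_iv(y, x, [z1, z2])   # should be close to 0.25
```

In this toy setting the fitted value is a linear projection on the instruments, so the just identified estimate coincides numerically with 2SLS on the same instruments.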
Column (4) in Tables 2A and 2B reports the estimate from Lasso-IV using the multiple instruments directly. The last column (5) contains the estimate from a specification where we combine the three interaction based instruments picked by Lasso in column (4) with the railroad and GQ instruments and use the predicted market access from a zero stage to convert it to a just identified model. The point estimates vary somewhat between columns (4) and (5) for land Gini (Table 2A), but are very close for landless (Table 2B).

The IV estimates strengthen the conclusion that market integration increases land inequality: the impact on both land Gini and the proportion of landless households in a district is positive and significant at the 1 percent level. The numerical magnitudes of the estimates are larger compared to the corresponding Oster (2019) BA-OLS estimates. This probably reflects a combination of correction of attenuation bias and dealing with the omitted variables bias in a more adequate way. To get a sense of the magnitudes of the impacts, we focus on the Lasso-IV estimates. The Lasso-IV estimates imply that land Gini in a district with 10 percent higher market access is 2.6 percent higher. For the impact on the landless, the corresponding estimate is 6.78 percent higher landlessness. These are clearly substantial impacts.

Footnote: For application of this procedure, see Rajan and Subramanian (2008), and Emran et al. (2020), among others.

Footnote: Note that we do not report Hansen's J test as a test for validity of the exclusion restrictions when using both historical railroad and GQ instruments. Since there is no reason to expect that the compliers for the two instruments overlap substantially, the IV estimates from just identified models using the alternative instruments one at a time should be different.
In fact, the estimates in Table 2 show clearly the heterogeneity (compare the results in columns 4 and 5 for landless in Table 2), and a Hansen's J test would reject the exclusion restriction incorrectly because of heterogeneity in the effects of market integration.

(6.3) Relaxing the Exclusion Restriction: Evidence from Conley et al. (2012) Bounds

The IV estimates in Tables 2A and 2B require that the exclusion restriction imposed on an instrument holds exactly (Conley et al. (2012)), i.e., the IV has a precisely zero direct impact on the outcomes of interest: land Gini and the proportion of landless in a district. A reader might worry that this exact exclusion restriction may be violated locally where an instrument exerts a small direct impact (positive or negative) through some unspecified channels. It is thus a reasonable question to ask: is the main conclusion that market integration increases land inequality robust to allowing for such small direct impact of the instruments? To address this, we implement the approach developed by Conley et al. (2012). The relaxation of the exact exclusion restriction implies that we no longer have point identification, but can estimate bounds on the causal effect of interest.

To understand the basic intuition behind the approach, consider the following extension of the triangular empirical model set out earlier in section 3 above (with colonial railroad (denoted as $R_j$) as the identifying instrument):

$$\ln(LGini)_j = \delta_0 + \delta_1 \ln(MA)_j + \Gamma X_{1j} + \theta R_j + \zeta_j$$

$$\ln(MA)_j = \alpha_0 + \Phi X_{1j} + \beta R_j + \nu_j$$

The IV (2SLS) estimates in Table 2A rely on the following identifying assumptions: $\theta = 0$ (exact exclusion restriction) and $\beta \neq 0$ (instrument relevance). Conley et al. (2012) develop methods to estimate bounds on the parameter $\delta_1$ under the assumption that $\theta$ belongs to a narrow interval around zero, i.e., $\theta \in [-\epsilon, +\epsilon]$ for arbitrarily small values of $\epsilon > 0$. In particular, we implement the UCI (union of confidence intervals) method proposed by Conley et al.
(2012). This approach is the most conservative as it only specifies the support of the distribution for the parameter $\theta$.

The results from the Conley et al. (2012) bounds approach are reported in Table 3. We report bounds on the estimated $\delta_1$ for three values of $\epsilon$: 0.0001, 0.001, and 0.01. The results show that the estimated bound for the parameter remains positive even when we assume a relatively large interval for $\theta$ with $\epsilon = 0.01$, except for the cases when Euclidean distance to GQ is used as the sole instrument. The wide bounds for the GQ instrument reflect its lack of strength in the first stage regression discussed earlier. The estimates from the IVs picked by Lasso together provide us the most credible evidence, and the impact of market integration on both measures of land inequality (land Gini and proportion of landless) remains numerically substantial.

(6.4) Robustness Checks

The main results on land inequality in 2012 presented in Tables 1-3 are based on a measure of market access calculated using travel time in 1996 (based on Allen and Atkin (2016b)). We check whether the conclusions change when we use travel time from other years. We use 1988 and 2004 travel time to calculate market access, and the results from alternative estimators are reported in Table A.1 in the online appendix. The estimates are broadly consistent with the conclusion that market integration increases land Gini and the proportion of landless in a district.

We also provide evidence on the potential sensitivity of the main conclusions to different assumptions regarding the trade elasticity parameter in calculating market access. As discussed before, our main results are based on a trade elasticity of 1.5. In online appendix Table A.2, we report estimates for an alternative value: 3.8. The results are again consistent with the main conclusions based on Tables 1-3 in the paper.
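The role of the trade elasticity in the gravity measure of market access can be made concrete with a small sketch. The function below sums destination populations weighted by travel time raised to the power minus theta, and evaluates it at the two elasticity values mentioned above. The three-district travel-time matrix and the populations are invented for illustration and do not come from the Allen and Atkin data; the function name is hypothetical.

```python
def market_access(travel_time, population, theta=1.5):
    """Gravity measure: MA_i = sum over j != i of P_j / tt_ij ** theta."""
    n = len(population)
    return [
        sum(population[j] / travel_time[i][j] ** theta for j in range(n) if j != i)
        for i in range(n)
    ]


# Hypothetical symmetric travel times in hours between three districts.
travel_hours = [
    [0.0, 1.0, 4.0],
    [1.0, 0.0, 2.0],
    [4.0, 2.0, 0.0],
]
# Hypothetical district populations (arbitrary units).
population = [100.0, 200.0, 400.0]

ma_main = market_access(travel_hours, population, theta=1.5)  # main-results value
ma_alt = market_access(travel_hours, population, theta=3.8)   # robustness value

# District 0 under theta = 1.5: 200 / 1**1.5 + 400 / 4**1.5 = 200 + 50 = 250.
```

A larger theta discounts distant markets more heavily, so in this example every district's index falls when theta moves from 1.5 to 3.8, while the ranking of districts is unchanged.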
Next, we check if the positive impact of market integration on land inequality in Tables 1-3 partly captures the effects of (i) differences in the colonial land revenue system, (ii) demographic pressure, and (iii) differences in inheritance rules (laws and customs) between Hindu and Muslim populations. As noted earlier, Banerjee and Iyer (2005) provide evidence that colonial land tax policies had long-term effects on land inequality in India. To check if our IV estimates (especially using the colonial railroad as a source of identifying variation) partly capture the persistent effects of the land revenue system, we include two dummies indicating the type of land tax system that was in place in a district during the British colonial period. The estimates in column (1) of Table A.3 in the online appendix suggest that our estimated effects of market access remain virtually unchanged.

In an analysis of land inequality in West Bengal, Bardhan et al. (2014) show that land inequality is influenced by demographic changes through population growth, and by land inheritance law and customs. Note that we include 1961 population density in all the regressions as a control for possible agglomeration effects of historical infrastructure. This also takes care of differences in population growth and demographic pressure on land up to 1961. To understand the role of population growth after 1961, we include 2011 population density as an additional control. We also include the proportion of Muslims in the regressions and find that the impact of market integration on land inequality is barely affected by the inclusion of these two control variables (see Table A.3).

(7) Mechanisms

The theoretical analysis of Braverman and Stiglitz (1989) emphasizes the role of technology adoption as a mechanism for increasing land inequality in response to market integration caused by falling trade costs.
Since agricultural technologies that can give rise to increasing returns are especially important, we estimate the impact of market integration on a measure of technology adoption based on the ownership of the following farming equipment: tube well, electric pump, diesel pump, tractor/tiller, or thresher. We also check whether market integration has had a significant effect on land sales in a district in 2012. Table 4 reports the estimates for technology adoption (panel A) and land sales (panel B). We also report estimates of the effects on financial deepening measured by formal bank branch (see panel C). For each outcome of interest, we present 8 estimates, including OLS, BA-OLS, and different IV estimates.

The evidence suggests that market integration increases adoption of farming technologies that are subject to increasing returns, and the effect is statistically significant at the 1 percent level in the specification using the Lasso-IVs (see column (7) in Table 4). The magnitude of the effect is also not small: a 10 percent increase in the market access index increases the adoption of technology by 3.5 percent.

The evidence on land sales is statistically imprecise, but the estimates overall suggest a positive impact of market integration. Even though the OLS estimate of the effects on land sales is positive, numerically substantial, and significant at the 5 percent level after controlling for the agroclimatic heterogeneity and 1961 population density, the BA-OLS and IV estimates have large standard errors. Interestingly, the point estimates from BA-OLS and Lasso-IV are larger in magnitude when compared to the OLS estimates in columns 1 and 2 (see panel B of Table 4).

As noted briefly in the introduction, when market integration deepens the formal credit market through expansion of bank branches, the advantages the large land owners enjoy relative to the functionally landless and small landholders would be reinforced. Das et al.
(2019) provide an extensive analysis of this issue, with convincing evidence that better market access does in fact help develop the formal financial sector in India. We also add some suggestive evidence. Panel C in Table 4 reports the estimated impacts of market integration on the access to formal banks: the dependent variable of interest is a dummy for the existence of a formal bank branch in a village. The estimates suggest that better market access leads to a higher probability of having a formal bank branch in a village. The evidence taken together thus suggests that the deepening of the formal financial sector is an important mechanism for understanding the effects of market integration on land inequality.

(8) Conclusions

The world has witnessed dramatic improvements in transport infrastructure in the last few decades which substantially reduced trade costs and led to spatial market integration. We provide evidence on the effects of market integration on land inequality in rural areas. Our empirical analysis uses data on land ownership from a high quality household survey in India, and exploits two sources of exogenous variation: a historical infrastructure design based on colonial railroads in the 1880s and an inconsequential place design based on the Golden Quadrilateral network of highways. We also report estimates from the Oster (2019) approach that does not impose any exclusion restrictions and, following Altonji et al. (2005), relies on selection on observables as a guide to unobservables for tackling the omitted variables biases.

The evidence suggests that market integration increases land inequality in a district: a 10 percent higher market access leads to a 2.6 percent increase in land inequality (land Gini) and a 6.8 percent increase in the incidence of landlessness. These conclusions are robust across alternative econometric approaches and different measures of market access. The conclusion
The conclusion that market integration increases land inequality holds even when we relax the exclusion restrictions imposed in the IV estimation by using the plausibly exogenous approach developed by Conley et al. (2012). We explore the mechanisms giving rise to the positive effect of market integration on land inequality. We find evidence that market integration increases the adoption of increasing returns technology in agriculture, a mechanism emphasized by the theoretical model of Braverman and Stiglitz (1989). The evidence on the effects of market access on land sales shows a positive impact, but the estimates are imprecise. The evidence also suggests that market integration leads to a deepening of the formal banking sector, which reinforces the advantages of large landholders in land market transactions.

References

Abeberese, A. B. and Chen, M. (2021). Intranational trade costs, product scope and productivity: Evidence from India's Golden Quadrilateral project. Journal of Development Economics, page 102791.
Aggarwal, S. (2013). Do rural roads create pathways out of poverty? Evidence from India. UC Santa Cruz.
Allen, T. and Atkin, D. (2016a). Volatility and the gains from trade. Technical report, National Bureau of Economic Research.
Allen, T. and Atkin, D. (2016b). Volatility and the Gains from Trade. NBER Working Papers 22276, National Bureau of Economic Research, Inc.
Altonji, J., Elder, T., and Taber, C. (2005). Selection on Observed and Unobserved Variables: Assessing the Effectiveness of Catholic Schools. Journal of Political Economy, 113(1):151–184.
Arnold, D. (2000). The New Cambridge History of India. Cambridge University Press.
Asturias, J., García-Santana, M., and Ramos, R. (2018). Competition and the Welfare Gains from Transportation Infrastructure: Evidence from the Golden Quadrilateral of India. Journal of the European Economic Association, 17(6):1881–1940.
Atkin, D. and Donaldson, D. (2015). Who's Getting Globalized?
The Size and Implications of Intra-national Trade Costs. NBER Working Papers 21439, National Bureau of Economic Research, Inc.
Banerjee, A., Duflo, E., and Qian, N. (2012). On the Road: Access to Transportation Infrastructure and Economic Growth in China. In NBER Working Paper. National Bureau of Economic Research.
Banerjee, A. and Iyer, L. (2005). History, Institutions, and Economic Performance: The Legacy of Colonial Land Tenure Systems in India. American Economic Review, 95(4):1190–1213.
Banerjee, A. V. and Newman, A. F. (1993). Occupational Choice and the Process of Development. Journal of Political Economy, 101(2):274–298.
Bardhan, P., Luca, M., Mookherjee, D., and Pino, F. (2014). Evolution of land distribution in West Bengal 1967–2004: Role of land reform and demographic changes. Journal of Development Economics, 110:171–190.
Bauluz, L., Govind, Y., and Novokmet, F. (2020). Global Land Inequality. World Inequality Lab Working Papers halshs-03022360, HAL.
Baum-Snow, N., Brandt, L., Henderson, J. V., Turner, M. A., and Zhang, Q. (2017). Roads, Railroads, and Decentralization of Chinese Cities. The Review of Economics and Statistics, 99(3):435–448.
Becker, G. (1981). A Treatise on the Family. Harvard University Press.
Berg, C. N., Deichmann, U., Liu, Y., and Selod, H. (2016). Transport Policies and Development. The Journal of Development Studies, 53(4):465–480.
Besley, T. and Burgess, R. (2000). Land reform, poverty reduction, and growth: Evidence from India. The Quarterly Journal of Economics, 115(2):389–430.
Besley, T., Leight, J., Pande, R., and Rao, V. (2016). Long-run impacts of land regulation: Evidence from tenancy reform in India. Journal of Development Economics, 118:72–87.
Bharti, N. (2018). Wealth Inequality, Class and Caste in India, 1961-2012. Working paper no. 14, World Inequality Lab.
Binswanger, H. P., Deininger, K., and Feder, G. (1995). Power, distortions, revolt and reform in agricultural land relations. In Chenery, H.
and Srinivasan, T., editors, Handbook of Development Economics, volume 3 of Handbook of Development Economics, chapter 42, pages 2659–2772. Elsevier.
Blankespoor, B., Emran, M. S., Shilpi, F., and Xu, L. (2022). Bridge to Bigpush or Backwash? Market Integration, Reallocation, and Productivity Effects of Jamuna Bridge in Bangladesh. Journal of Economic Geography.
Blankespoor, B., Khan, A., and Selod, H. (2017). Consolidating Data of Global Urban Populations: a Comparative Approach. Technical report, World Bank.
Bogart, D., Chaudhary, L., and Herranz-Loncán, A. (2015). The Growth Contribution of Colonial Indian Railways in Comparative Perspective. CEH Discussion Papers 033, Centre for Economic History, Research School of Economics, Australian National University.
Braverman, A. and Stiglitz, J. (1989). Credit rationing, tenancy, productivity, and the dynamics of inequality. In Bardhan, P., editor, The Economic Theory of Agrarian Institutions. Oxford University Press.
Brinkhoff, T. (2020). City Population http://www.citypopulation.de.
Chaudhury, P. D. (2005). Modal split between rail and road modes of transport in India. Vikalpa, 30(1):17–34.
Conley, T. G., Hansen, C. B., and Rossi, P. E. (2012). Plausibly Exogenous. The Review of Economics and Statistics, 94(1):260–272.
Damania, R., Berg, C., Russ, J., Barra, A. F., Nash, J., and Ali, R. (2017). Agricultural Technology Choice and Transport. American Journal of Agricultural Economics, 99(1):265–284.
Das, A., Ghani, E., Grover, A., Kerr, W., and Nanda, R. (2019). Infrastructure and Finance: Evidence from India's GQ Highway Network. Harvard Business School Working Papers 19-121, Harvard Business School.
Datta, S. (2012). The impact of improved highways on Indian firms. Journal of Development Economics, 99(1):46–57.
Desai, S., Dubey, A., Joshi, B., Sen, M., Shariff, A., Vanneman, R., and Codebook, H. (2005). India Human Development Survey (IHDS).
NCAER, India and Inter-university Consortium for Political and Social Research, MI, USA.
Desai, S. and Vanneman, R. (2012). India Human Development Survey (IHDS)-II. ICPSR36151. Ann Arbor, MI: Inter-university Consortium for Political and Social Research.
Donaldson, D. (2015). The Gains from Market Integration. Annual Review of Economics, 7(1):619–647.
Donaldson, D. (2018a). Railroads of the Raj: Estimating the Impact of Transportation Infrastructure. American Economic Review.
Donaldson, D. (2018b). Railroads of the Raj: Estimating the impact of transportation infrastructure. American Economic Review, 108(4-5):899–934.
Duranton, G. (2015). Roads and trade in Colombia. Economics of Transportation, 4(1):16–36.
Duranton, G., Morrow, P. M., and Turner, M. A. (2014). Roads and Trade: Evidence from the US. The Review of Economic Studies, 81(2):681–724.
Duranton, G. and Turner, M. A. (2012). Urban Growth and Transportation. Review of Economic Studies, 79(4):1407–1440.
Emran, M. S. and Hou, Z. (2013). Access to Markets and Rural Poverty: Evidence from Household Consumption in China. Review of Economics and Statistics, 95(2):682–697.
Emran, M. S., Islam, A., and Shilpi, F. (2020). Distributional Effects of Corruption When Enforcement is Biased: Theory and Evidence from Bribery in Schools in Bangladesh. Economica, 87(348):985–1015.
Emran, M. S. and Shilpi, F. (2012). The extent of the market and stages of agricultural specialization. Canadian Journal of Economics, 45(3):1125–1153.
Emran, M. S., Shilpi, F. J., Coulombe, H., and Blankespoor, B. (2019). Temporary Trade Shocks, Spatial Reallocation, and Persistence in Developing Countries: Evidence from a Natural Experiment in West Africa. Policy Research Working Paper Series 8962, The World Bank.
Faber, B. (2014). Trade Integration, Market Size, and Industrialization: Evidence from China's National Trunk Highway System. The Review of Economic Studies, 108(4-5):899–934.
Fafchamps, M. and Shilpi, F. (2003).
The spatial division of labour in Nepal. Journal of Development Studies, 39(6):23–66.
Farr, T. G., Rosen, P. A., Caro, E., Crippen, R., Duren, R., Hensley, S., Kobrick, M., Paller, M., Rodriguez, E., Roth, L., et al. (2007). The shuttle radar topography mission. Reviews of Geophysics, 45(2).
Fischer, G., Nachtergaele, F. O., Prieler, S., Teixeira, E., Tóth, G., Van Velthuizen, H., and Wiberg, D. (2012). Global agro-ecological zones (GAEZ v3.0): Model documentation. International Institute for Applied Systems Analysis (IIASA), Laxenburg.
Ghani, E., Goswami, A. G., and Kerr, W. R. (2016). Highway to success: The impact of the Golden Quadrilateral project for the location and performance of Indian manufacturing. The Economic Journal, 126(591):317–357.
Ghatak, M. and Roy, S. (2007). Land reform and agricultural productivity in India: a review of the evidence. Oxford Review of Economic Policy, 23(2):251–269.
Hijmans, R. J., Cameron, S. E., Parra, J. L., Jones, P. G., and Jarvis, A. (2005). Very high resolution interpolated climate surfaces for global land areas. International Journal of Climatology: A Journal of the Royal Meteorological Society, 25(15):1965–1978.
Himanshu (2019). Inequality in India. Technical report.
Jones, S. and Salazar, C. (2021). Infrastructure Improvements and Maize Market Integration: Bridging the Zambezi in Mozambique. American Journal of Agricultural Economics, 103(2):620–642.
Kolesár, M., Chetty, R., Friedman, J., Glaeser, E., and Imbens, G. (2015). Identification and Inference With Many Invalid Instruments. Journal of Business & Economic Statistics, 33(4):474–484.
Mariscal, E. and Sokoloff, K. L., editors (2000). Schooling, Suffrage and the Persistence of Inequality in the Americas, 1800-1945. Palo Alto: Hoover Institution Press.
Michaels, G. (2008). The Effect of Trade on the Demand for Skill: Evidence from the Interstate Highway System. The Review of Economics and Statistics, 90(4):683–701.
Oster, E. (2019).
Unobservable selection and coefficient stability: Theory and evidence. Journal of Business & Economic Statistics, 37(2):187–204.
Rajan, R. and Subramanian, A. (2008). Aid and Growth: What Does the Cross-Country Evidence Really Show? The Review of Economics and Statistics, 90(4):643–665.
Ray, D. (1998). Development Economics. Princeton University Press.
Redding, S. and Turner, M. (2015). Transportation Costs and the Spatial Organization of Economic Activity. pages 1339–1398.
Robinson, R. and Thagensen, B., editors (2004). Road Engineering for Development. Spon Press: Taylor and Francis Group.
Singh, R., editor (1995). Grand Trunk Road: A Passage Through Time. Aperture Books.
Sokoloff, K. L. and Engerman, S. L. (2000). Institutions, Factor Endowments, and Paths of Development in the New World. Journal of Economic Perspectives, 14(3):217–232.
Todaro, M. and Smith, S. (2015). Economic Development. Pearson.
van Maarseveen, R. (2020). The urban-rural education gap: do cities indeed make us smarter? Journal of Economic Geography, 21(5):683–714.
Vanneman, R. and Barnes, D. (2000). Indian district data, 1961-1991: machine-readable data file and codebook. Center on Population, Gender, and Social Inequality. University of Maryland.
Verdin, K. L., Godt, J. W., Funk, C., Pedreros, D., Worstell, B., and Verdin, J. (2007). Development of a global slope dataset for estimation of landslide occurrence resulting from earthquakes. Technical report.
World Bank (2018). The Web of Transportation Corridors in South Asia. Technical report, World Bank Group, Washington DC.

Figure 1. Kernel Density of Market Access in Districts with and without Colonial Railways
Source: Author calculation.

Figure 2. Golden Quadrilateral Road Networks
Source: Author elaboration using data from Ghani et al. (2016).

Table 1. Effects of Market Access on Land Inequality: OLS and Bias-Adjusted OLS Estimates

| | (1) | (2) | (3) | (4) | (5) | (6) |
|---|---|---|---|---|---|---|
| Dep. var. | ln(Landgini 2012) | ln(Landgini 2012) | ln(Landgini 2012) | ln(Landless 2012) | ln(Landless 2012) | ln(Landless 2012) |
| Estimator | OLS | OLS | Bias-Adjusted | OLS | OLS | Bias-Adjusted |
| ln(MA1996) | 0.108*** (0.02) | 0.111*** (0.02) | 0.115*** (0.04) | 0.360*** (0.07) | 0.357*** (0.07) | 0.352*** (0.11) |
| ln(Slope) | | -0.037 (0.03) | | | -0.078 (0.09) | |
| Elevation | | -0.138 (0.14) | | | 0.148 (0.43) | |
| Crop Suitability | | -0.002 (0.00) | | | -0.005 (0.00) | |
| ln(Rain) | | 0.039 (0.05) | | | 0.153 (0.16) | |
| ln(Temperature) | | -0.188 (0.35) | | | 0.621 (0.96) | |
| ln(Rain CV) | | 0.089 (0.13) | | | 0.256 (0.39) | |
| ln(Temp. seasonality) | | -0.056 (0.08) | | | -0.064 (0.21) | |
| Pop. density, 1961 | | 0.023*** (0.01) | | | 0.037 (0.02) | |
| Constant | -1.942*** (0.25) | -0.980 (2.08) | | -6.262*** (1.14) | -11.059* (5.84) | |
| State Dummies | No | Yes | Yes | No | Yes | Yes |
| Observations | 200 | 200 | 200 | 212 | 212 | 212 |
| R-squared | 0.166 | 0.380 | | 0.167 | 0.428 | |

Note: Columns (1), (2), (4) and (5) are estimated by OLS. Columns (3) and (6) are estimated by Oster's bias-adjusted OLS (Oster (2019)). Robust standard errors in parentheses, *** p<0.01, ** p<0.05, * p<0.1.

Table 2A. Effects of Market Access on Land Gini (IV Estimates)

| | (1) | (2) | (3) | (4) | (5) |
|---|---|---|---|---|---|
| Dep. var. | ln(Landgini 2012) | ln(Landgini 2012) | ln(Landgini 2012) | ln(Landgini 2012) | ln(Landgini 2012) |
| Estimator | 2SLS | 2SLS | 2SLS | Lasso-IV | 2SLS |
| ln(MA1996) | 0.244*** (0.07) | 0.232** (0.09) | 0.241*** (0.06) | 0.255*** (0.06) | 0.197*** (0.05) |
| ln(Slope) | -0.044 (0.03) | -0.043 (0.03) | -0.044 (0.03) | -0.044 (0.03) | -0.041 (0.03) |
| Elevation | -0.191 (0.15) | -0.186 (0.15) | -0.190 (0.15) | -0.195 (0.16) | -0.172 (0.14) |
| Crop Suitability | -0.003** (0.00) | -0.003** (0.00) | -0.003** (0.00) | -0.003** (0.00) | -0.003** (0.00) |
| ln(Rain) | 0.053 (0.06) | 0.052 (0.06) | 0.053 (0.06) | 0.054 (0.06) | 0.048 (0.05) |
| ln(Temperature) | -0.428 (0.42) | -0.406 (0.44) | -0.421 (0.41) | -0.447 (0.42) | -0.343 (0.37) |
| ln(Rain CV) | 0.138 (0.14) | 0.134 (0.14) | 0.137 (0.13) | 0.142 (0.14) | 0.121 (0.13) |
| ln(Temp. seasonality) | -0.024 (0.08) | -0.027 (0.08) | -0.025 (0.08) | -0.021 (0.08) | -0.035 (0.08) |
| Pop. density, 1961 | 0.015** (0.01) | 0.016* (0.01) | 0.016** (0.01) | 0.015** (0.01) | 0.018*** (0.01) |
| Constant | -1.908 (2.45) | -1.806 (2.42) | -1.877 (2.40) | -2.271 (2.47) | -1.506 (2.22) |
| First Stage | | | | | |
| ln(km of railroad) | 0.117*** (0.03) | | 0.114*** (0.03) | | 0.192*** (0.07) |
| ln(dist. to GQ) | | -0.128*** (0.05) | -0.120** (0.05) | | -0.015 (0.08) |
| ln(dist. to GQ) x ln(slope) | | | | -0.131*** (0.04) | -0.112* (0.06) |
| ln(dist. to GQ) x Elevation | | | | 0.152 (0.18) | 0.106 (0.19) |
| ln(km of railroad) x Crop suitability | | | | 0.002*** (0.00) | -0.001 (0.00) |
| Weak identification | 18.94 | 6.25 | 21.93 | 9.19 | 34.23 |
| | 0.0000 | 0.0134 | 0.0000 | 0.0000 | 0.0000 |
| Observations | 200 | 200 | 200 | 200 | 200 |
| R-squared | 0.239 | 0.263 | 0.247 | 0.216 | 0.321 |

Notes: Columns (1) and (2) are estimated by two-stage least squares (2SLS) using railroad length and distance to the Golden Quadrilateral as instrumental variables, respectively. Column (3) is estimated using predicted Market Access as an instrumental variable. Predicted Market Access is the fitted value from a zero stage with both railroad length and distance to GQ. The Lasso-IV in column (4) is estimated using a parsimonious set of instruments chosen by Lasso from a broad set that includes railroad length, distance to GQ, and each of their interactions with the exogenous control variables. Lasso selects three instruments, which are reported in the first stage in column (4). Column (5) is estimated using predicted Market Access as an instrumental variable. Predicted Market Access is the fitted value from a zero stage including railroad length, distance to GQ, and all the instruments chosen by Lasso. In columns (3) and (5), under the first stage, we report the zero stage estimated coefficients of the instrumental variables. Robust standard errors in parentheses, *** p<0.01, ** p<0.05, * p<0.1.

Table 2B. Effects of Market Access on Landlessness (IV Estimates)

| | (1) | (2) | (3) | (4) | (5) |
|---|---|---|---|---|---|
| Dep. var. | ln(Landless 2012) | ln(Landless 2012) | ln(Landless 2012) | ln(Landless 2012) | ln(Landless 2012) |
| Estimator | 2SLS | 2SLS | 2SLS | Lasso-IV | 2SLS |
| ln(MA1996) | 0.974*** (0.23) | 0.403* (0.21) | 0.758*** (0.16) | 0.678*** (0.14) | 0.662*** (0.14) |
| ln(Slope) | -0.079 (0.11) | -0.078 (0.09) | -0.078 (0.10) | -0.078 (0.10) | -0.078 (0.10) |
| Elevation | -0.106 (0.50) | 0.129 (0.41) | -0.017 (0.44) | 0.016 (0.42) | 0.022 (0.42) |
| Crop Suitability | -0.010** (0.00) | -0.005 (0.00) | -0.008** (0.00) | -0.008** (0.00) | -0.008** (0.00) |
| ln(Rain) | 0.120 (0.19) | 0.151 (0.15) | 0.132 (0.17) | 0.136 (0.16) | 0.137 (0.16) |
| ln(Temperature) | -0.530 (1.34) | 0.534 (0.99) | -0.127 (1.12) | 0.021 (1.06) | 0.051 (1.03) |
| ln(Rain CV) | 0.575 (0.50) | 0.280 (0.38) | 0.463 (0.43) | 0.422 (0.41) | 0.414 (0.41) |
| ln(Temp. seasonality) | -0.104 (0.26) | -0.067 (0.19) | -0.090 (0.22) | -0.085 (0.21) | -0.084 (0.21) |
| Pop. density, 1961 | 0.000 (0.02) | 0.034 (0.02) | 0.013 (0.02) | 0.018 (0.02) | 0.019 (0.02) |
| Constant | -14.063* (7.87) | -10.750* (5.56) | -12.808* (6.68) | -13.660** (6.33) | -12.255* (6.33) |
| First Stage | | | | | |
| ln(km of railroad) | 0.114*** (0.03) | | 0.114*** (0.03) | | 0.186*** (0.07) |
| ln(dist. to GQ) | | -0.148*** (0.04) | -0.120** (0.05) | | -0.030 (0.08) |
| ln(dist. to GQ) x ln(slope) | | | | -0.131*** (0.04) | -0.106* (0.06) |
| ln(dist. to GQ) x Elevation | | | | 0.119 (0.17) | 0.081 (0.18) |
| ln(km of railroad) x Crop suitability | | | | 0.002*** (0.00) | -0.001 (0.00) |
| Weak identification | 18.49 | 11.14 | 26.96 | 12.36 | 41.70 |
| | 0.0000 | 0.0010 | 0.0000 | 0.0000 | 0.0000 |
| Observations | 212 | 212 | 212 | 212 | 212 |
| R-squared | 0.154 | 0.427 | 0.312 | 0.3535 | 0.361 |

Notes: Columns (1) and (2) are estimated by two-stage least squares (2SLS) using railroad length and distance to the Golden Quadrilateral as instrumental variables, respectively. Column (3) is estimated using predicted Market Access as an instrumental variable. Predicted Market Access is the fitted value from a zero stage including both railroad length and distance to GQ. The Lasso-IV in column (4) is estimated using a parsimonious set of instruments chosen by Lasso from a broad set that includes railroad length, distance to GQ, and each of their interactions with the exogenous control variables. Lasso selects three instruments, which are reported in the first stage in column (4). Column (5) is estimated using predicted Market Access as an instrumental variable. Predicted Market Access is the fitted value from a zero stage including railroad length, distance to GQ, and all the instruments chosen by Lasso. In columns (3) and (5), under the first stage, we report the zero stage estimated coefficients of the instrumental variables. Robust standard errors in parentheses, *** p<0.01, ** p<0.05, * p<0.1.

Table 3. Relaxing the Exclusion Restrictions: Conley et al. Bound Estimates

| Outcome | Instrument | Interval for θ | Lower Bound | Upper Bound |
|---|---|---|---|---|
| ln(Landgini 2012) | ln(km of railroad) | θ ∈ [-0.01*β, 0.01*β] | 0.097 | 0.399 |
| | | θ ∈ [-0.05*β, 0.05*β] | 0.023 | 0.502 |
| | | θ ∈ [-0.1*β, 0.1*β] | -0.082 | 0.636 |
| | ln(dist. to GQ) | θ ∈ [-0.01*β, 0.01*β] | 0.028 | 0.445 |
| | | θ ∈ [-0.05*β, 0.05*β] | -0.033 | 0.541 |
| | Predicted Market Access 1 | θ ∈ [-0.01*β, 0.01*β] | 0.129 | 0.353 |
| | | θ ∈ [-0.05*β, 0.05*β] | 0.121 | 0.364 |
| | | θ ∈ [-0.1*β, 0.1*β] | 0.110 | 0.378 |
| | Predicted Market Access 2 | θ ∈ [-0.01*β, 0.01*β] | 0.140 | 0.371 |
| | | θ ∈ [-0.05*β, 0.05*β] | 0.131 | 0.383 |
| | | θ ∈ [-0.1*β, 0.1*β] | 0.120 | 0.398 |
| ln(Landless 2012) | ln(km of railroad) | θ ∈ [-0.01*β, 0.01*β] | 0.441 | 1.549 |
| | | θ ∈ [-0.05*β, 0.05*β] | 0.157 | 1.991 |
| | | θ ∈ [-0.1*β, 0.1*β] | -0.263 | 2.563 |
| | ln(dist. to GQ) | θ ∈ [-0.01*β, 0.01*β] | -0.068 | 0.876 |
| | Predicted Market Access 1 | θ ∈ [-0.01*β, 0.01*β] | 0.427 | 1.091 |
| | | θ ∈ [-0.05*β, 0.05*β] | 0.402 | 1.124 |
| | | θ ∈ [-0.1*β, 0.1*β] | 0.370 | 1.165 |
| | Predicted Market Access 2 | θ ∈ [-0.0001, 0.0001] | 0.375 | 1.022 |
| | | θ ∈ [-0.05*β, 0.05*β] | 0.351 | 1.053 |
| | | θ ∈ [-0.1*β, 0.1*β] | 0.320 | 1.092 |

Notes: (1) θ is the direct effect of an instrument on the outcome variable.
The lower and upper bounds are the estimated effects of market access on the relevant measure of land inequality given that θ belongs to a specified interval. (2) Predicted Market Access refers to estimated fitted values from a zero stage regression. This allows us to estimate a just-identified IV model using predicted market access as a single instrument. (3) Predicted Market Access 1 is from a zero stage with both railroad length and distance to GQ. Predicted Market Access 2 is from a zero stage with the three instruments chosen by the Lasso: distance to GQ interacted with slope, distance to GQ interacted with elevation, and railroad length interacted with crop suitability.

Table 4. Understanding the Mechanisms

| | (1) | (2) | (3) | (4) | (5) | (6) | (7) | (8) |
|---|---|---|---|---|---|---|---|---|
| Estimator | OLS | OLS | Bias-Adjusted | 2SLS | 2SLS | 2SLS | Lasso-IV | 2SLS |
| Panel A: ln(Tech. Use 2012) | | | | | | | | |
| ln(MA1996) | 0.273*** (0.06) | 0.235*** (0.04) | 0.188 (0.70) | 0.423*** (0.14) | 0.289** (0.12) | 0.366*** (0.09) | 0.350*** (0.09) | 0.364*** (0.08) |
| First Stage Weak Identification | | | | 18.49 | 11.14 | 26.95 | 12.36 | 41.70 |
| | | | | 0.0000 | 0.0010 | 0.000 | 0.0000 | 0.000 |
| R-squared | 0.186 | 0.538 | | 0.488 | 0.533 | 0.5136 | 0.519 | 0.514 |
| Panel B: ln(Land Sale, 2012) | | | | | | | | |
| ln(MA1996) | 1.172* (0.66) | 2.330** (0.90) | 3.093*** (0.96) | -1.537 (2.65) | 1.902 (3.13) | -0.072 (2.04) | 0.433 (2.11) | 1.127 (1.90) |
| First Stage Weak Identification | | | | 18.49 | 11.14 | 26.95 | 12.36 | 41.70 |
| | | | | 0.0000 | 0.0010 | 0.000 | 0.0000 | 0.000 |
| R-squared | 0.014 | 0.163 | | 0.080 | 0.162 | 0.131 | 0.143 | 0.155 |
| Panel C: Credit: Formal Bank | | | | | | | | |
| ln(MA1996) | 0.017 (0.01) | 0.014 (0.01) | 0.011 (7.07) | 0.067 (0.04) | 0.064 (0.04) | 0.066* (0.04) | 0.063* (0.04) | 0.057* (0.03) |
| First Stage Weak Identification | | | | 18.49 | 11.14 | 26.95 | 12.36 | 41.70 |
| | | | | 0.0000 | 0.0010 | 0.000 | 0.0000 | 0.000 |
| R-squared | 0.005 | 0.340 | | 0.282 | 0.318 | 0.314 | 0.317 | 0.322 |
| State Dummies | No | Yes | Yes | Yes | Yes | Yes | Yes | Yes |
| Observations | 212 | 212 | 212 | 212 | 212 | 212 | 212 | 212 |

Notes: Columns (1) and (2) are estimated by OLS. Column (1) is a bivariate regression without any controls while column (2) includes state fixed effects and other controls.
Column (3) is estimated by Oster's bias-adjusted OLS estimator. Columns (4) and (5) are estimated by 2SLS using railroad length and distance to GQ as IVs, respectively. Column (6) is estimated by 2SLS using predicted market access as the IV, which is estimated from a zero stage OLS regression including both railroad length and distance to GQ. Column (7) reports the Lasso estimates using a broad set of instruments that includes railroad length, distance to GQ, and each of their interactions with the exogenous control variables as IVs. The three IVs chosen by the Lasso are: distance to GQ interacted with slope, distance to GQ interacted with elevation, and railroad length interacted with crop suitability. Column (8) is estimated by 2SLS using predicted market access estimated from a zero stage OLS regression including railroad length, distance to GQ, and all the Lasso-chosen IVs. Robust standard errors in parentheses, *** p<0.01, ** p<0.05, * p<0.1.

Online Appendix

OA.1 Construction of the Market Access Index

Our Market Access measure is calculated as follows:

$$MA_{it} = \sum_{j \neq i} \frac{1}{\tau_{ijt}^{\theta}} \, P_{jt}$$

where
$MA_{it}$ is market access of District i at time t,
$\tau_{ijt}$ is the travel time (in hours) between Districts i and j at time t,
$P_{jt}$ = population in destination District j at time t,
$\theta$ = trade elasticity.

To construct the Market Access measures, population and travel times were aligned as follows:

| MA | Travel Time | Population |
|---|---|---|
| MA1988 | 1988 | 1991 |
| MA1996 | 1996 | 1991 |
| MA2004 | 2004 | 2001 |

Note that for the main analysis, we focus on Market Access calculated using travel time from 1996 and population from 1991. We explore alternative travel times (1988 and 2004) as robustness checks. Following Allen and Atkin (2016), we use a trade elasticity equal to 1.5, and report results using a trade elasticity equal to 3.8 as well.
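The market access construction in OA.1 can be illustrated with a minimal sketch. It assumes the gravity form implied by the definitions above: for each district, destination populations are summed over all other districts after being discounted by travel time raised to the trade elasticity. The function name `market_access`, the three-district travel-time matrix, and the population figures are all hypothetical and are not taken from the paper's data.

```python
import numpy as np

def market_access(travel_time, population, theta=1.5):
    """Gravity-style market access (assumed form):
    MA_i = sum over j != i of population_j / travel_time_ij ** theta."""
    tt = np.asarray(travel_time, dtype=float)
    pop = np.asarray(population, dtype=float)
    weights = np.zeros_like(tt)
    # Exclude the own-district term (j == i), whose travel time is zero.
    off_diag = ~np.eye(tt.shape[0], dtype=bool)
    weights[off_diag] = tt[off_diag] ** (-theta)
    return weights @ pop

# Hypothetical example: travel times in hours, made-up populations.
travel_time = np.array([[0.0, 1.0, 4.0],
                        [1.0, 0.0, 2.0],
                        [4.0, 2.0, 0.0]])
population = np.array([100.0, 200.0, 400.0])

ma_main = market_access(travel_time, population, theta=1.5)   # main elasticity
ma_alt = market_access(travel_time, population, theta=3.8)    # robustness value
print(ma_main, ma_alt)
```

A larger trade elasticity discounts distant destinations more heavily, so districts far from large populations lose relatively more market access when θ moves from 1.5 to 3.8.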
OA.2 Calculation of the GQ-based instruments

Our identification strategy relies on two main sets of instrumental variables: one based on India’s Golden Quadrilateral (GQ) highways and the other on its colonial railways in the 1880s. Both are calculated using geospatial software. Here, we describe the calculation of the first instrumental variable. For the GQ IV, we construct a hypothetical network connecting the main cities targeted by the GQ project (see Figure 2). The target cities of the GQ are New Delhi, Kolkata, Chennai, Mumbai and Bangalore. We construct two networks: one based on Euclidean distance and another based on a “least cost path” derived from elevation and slope data. Our main results rely on the linear, Euclidean distance instrument but are robust to using the least-cost path instrument instead. It is also important to note that although the GQ highway project is a relatively recent investment, parts of it follow historical roads (Figure AF.1). The least cost path is calculated with a time cost raster method, using the Cost Connectivity algorithm available in ESRI ArcGIS, as the minimum-time route given off-road travel speeds on land. We construct the time cost raster by assigning a speed to cross each pixel based on Tobler’s hiking function (1993) with a weight derived from historical land cover class (1900) following Ali et al. (2015). We use land cover from the HYDE model (Goldewijk et al. 2011) and slope data from Verdin et al. (2007). The speed is given by

$$\text{speed} = 6 \cdot e^{-3.5\,|s + 0.05|} \cdot 0.6$$

where s is mean slope. We then calculate the distance between each district’s centroid and each linear network. For robustness, we also consider the Euclidean distances from each district’s centroid to the actual GQ highways, which are available from Ghani et al (2014).
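The speed rule behind the time cost raster can be sketched as follows. This is only an illustration of Tobler's hiking function with the 0.6 off-road factor that appears in the formula above. It assumes slope is expressed as rise over run (the unit used in the slope data is not stated here), it leaves out the land cover weights, and the function names and the 1 km pixel size are hypothetical.

```python
import math

def offroad_speed_kmh(slope, offroad_factor=0.6):
    """Tobler's hiking function scaled by an off-road factor.
    `slope` is assumed to be rise over run (dimensionless)."""
    return 6.0 * math.exp(-3.5 * abs(slope + 0.05)) * offroad_factor

def crossing_time_hours(pixel_km, slope):
    """Time needed to cross one raster pixel of width `pixel_km`."""
    return pixel_km / offroad_speed_kmh(slope)

# Hypothetical 1 km pixels at a few illustrative slopes.
for s in (-0.05, 0.0, 0.10, 0.30):
    print(s, round(offroad_speed_kmh(s), 3), round(crossing_time_hours(1.0, s), 3))
```

Speed peaks on a gentle downhill slope (s = -0.05) and falls off exponentially as terrain gets steeper in either direction, which is why a least cost path favors flat routes.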
Similar to Faber (2014), we address the concern of non-random local route placements on the way between targeted city nodes by constructing a hypothetical network based on Euclidean distance and a least cost path based on elevation and slope. Faber uses a simple land cover model based on the engineering literature (see Jha et al., 2001; Jong and Schonfeld, 2003; cited in Faber 2014) and includes measures of slope, development, water and wetland, where the algorithm prefers short and flat routes.

Figure AF.1: Historical Roads of India, 1872 vs Golden Quadrilateral Roads in 1992
Note: Schwartzberg roads in red are from 1872. Digital Chart of the World roads in black depict roads in 1992.
Source: Schwartzberg Atlas, Growth of Road Network, p. 125, available from the Digital South Asia Library. Digital Chart of the World.

Table A.1. Using Market Access with travel times from other years: 1988 and 2004, Lasso

Table A.1, Panel A: 1988

| | (1) | (2) | (3) | (4) |
|---|---|---|---|---|
| | ln(Landgini, 2012) | ln(Landless, 2012) | ln(Tech use, 2012) | ln(Land Sale, 2012) |
| ln(MA1988) | 0.260*** (0.07) | 0.732*** (0.19) | 0.372*** (0.12) | 0.495 (2.62) |
| ln(Slope) | -0.033 (0.03) | -0.060 (0.10) | -0.065 (0.05) | -0.591 (1.13) |
| Elevation | -0.251 (0.16) | -0.148 (0.45) | 0.177 (0.27) | 3.611 (5.49) |
| Crop Suitability | -0.003** (0.00) | -0.007* (0.00) | -0.001 (0.00) | 0.004 (0.05) |
| ln(Rain) | 0.052 (0.06) | 0.155 (0.17) | 0.017 (0.10) | -1.143 (2.14) |
| ln(Temperature) | -0.552 (0.42) | -0.363 (1.13) | 0.459 (0.75) | 2.242 (15.62) |
| ln(Rain CV) | 0.161 (0.14) | 0.497 (0.43) | 0.457* (0.26) | 10.056* (5.73) |
| ln(Temp. seasonality) | -0.022 (0.08) | -0.060 (0.22) | -0.087 (0.12) | -4.483 (3.27) |
| Pop. density, 1961 | 0.016** (0.01) | 0.017 (0.02) | -0.009 (0.01) | -0.076 (0.33) |
| Constant | -1.725 (2.33) | -12.714** (6.29) | -10.57*** (3.88) | -13.192 (89.47) |
| First Stage | | | | |
| ln(dist. to GQ) x Elevation | -0.258** (0.11) | -0.310*** (0.11) | -0.310*** (0.11) | -0.310*** (0.11) |
| ln(km of railroad) x Crop Suitability | 0.002*** (0.00) | 0.002*** (0.00) | 0.002*** (0.00) | 0.002*** (0.00) |
| Weak Identification | 7.10 | 8.59 | 8.59 | 8.59 |
| | 0.0011 | 0.0003 | 0.0003 | 0.0003 |
| Observations | 200 | 212 | 212 | 212 |
| R-squared | 0.210 | 0.322 | 0.511 | 0.146 |

Table A.1, Panel B: 2004

| | (1) | (2) | (3) | (4) |
|---|---|---|---|---|
| | ln(Landgini, 2012) | ln(Landless, 2012) | ln(Tech use, 2012) | ln(Land Sale, 2012) |
| ln(MA2004) | 0.277*** (0.07) | 0.712*** (0.17) | 0.382*** (0.09) | 0.879 (2.35) |
| ln(Slope) | -0.063** (0.03) | -0.120 (0.10) | -0.097* (0.05) | -0.656 (1.12) |
| Elevation | -0.170 (0.15) | 0.054 (0.41) | 0.272 (0.24) | 3.612 (5.28) |
| Crop Suitability | -0.003** (0.00) | -0.008** (0.00) | -0.001 (0.00) | 0.000 (0.05) |
| ln(Rain) | 0.086 (0.06) | 0.186 (0.17) | 0.034 (0.10) | -1.114 (2.14) |
| ln(Temperature) | -0.491 (0.44) | -0.147 (1.08) | 0.526 (0.66) | 1.586 (14.85) |
| ln(Rain CV) | 0.143 (0.15) | 0.451 (0.42) | 0.445* (0.24) | 10.237* (5.65) |
| ln(Temp. seasonality) | 0.014 (0.08) | -0.036 (0.22) | -0.074 (0.11) | -4.464 (3.25) |
| Pop. density, 1961 | 0.017*** (0.01) | 0.026 (0.02) | -0.006 (0.01) | -0.088 (0.31) |
| Constant | -2.983 (2.65) | -14.393** (6.53) | -11.56*** (3.77) | -16.911 (90.32) |
| First Stage | | | | |
| ln(dist. to GQ) x ln(Slope) | -0.132*** (0.04) | -0.127 (0.04) | -0.127 (0.04) | -0.127 (0.04) |
| ln(dist. to GQ) x Elevation | 0.162 (0.17) | 0.111 (0.17) | 0.111 (0.17) | 0.111 (0.17) |
| ln(km of railroad) x Crop Suitability | 0.001*** (0.00) | 0.001*** (0.00) | 0.001*** (0.00) | 0.001*** (0.00) |
| Weak Identification | 7.68 | 9.93 | 9.93 | 9.93 |
| | 0.0001 | 0.0000 | 0.0000 | 0.0000 |
| Observations | 200 | 212 | 212 | 212 |
| R-squared | 0.181 | 0.331 | 0.524 | 0.151 |

Note: All columns are estimated by Lasso and include state fixed effects. Weak identification and R-squared are from 2SLS using the IVs chosen by Lasso. Robust standard errors in parentheses, *** p<0.01, ** p<0.05, * p<0.1.
Table A.2: Effects of Market Integration: Market Access Measure Based on Trade Elasticity 3.8

| | (1) | (2) | (3) | (4) |
|---|---|---|---|---|
| | ln(Landgini, 2012) | ln(Landless, 2012) | ln(Tech use, 2012) | ln(Land Sale, 2012) |
| ln(MA1996), θ = 3.8 | 0.092** (0.04) | 0.306*** (0.11) | 0.182*** (0.07) | 0.775 (1.07) |
| ln(Slope) | -0.003 (0.04) | 0.079 (0.14) | 0.019 (0.08) | -0.210 (1.25) |
| Elevation | -0.560** (0.27) | -1.117 (0.75) | -0.438 (0.42) | 0.339 (7.08) |
| Crop Suitability | -0.003* (0.00) | -0.008* (0.00) | -0.002 (0.00) | -0.009 (0.05) |
| ln(Rain) | -0.099 (0.07) | -0.303 (0.24) | -0.257* (0.15) | -2.334 (2.73) |
| ln(Temperature) | -1.452** (0.71) | -3.052 (1.94) | -1.285 (1.12) | -7.621 (19.96) |
| ln(Rain CV) | 0.403* (0.21) | 1.146* (0.64) | 0.880** (0.38) | 12.487** (6.34) |
| ln(Temp. seasonality) | -0.133 (0.09) | -0.357 (0.29) | -0.265 (0.17) | -5.270 (3.37) |
| Pop. density, 1961 | 0.006 (0.01) | -0.018 (0.04) | -0.034 (0.02) | -0.241 (0.38) |
| Constant | 7.254* (3.78) | 13.753 (10.45) | 5.086 (5.94) | 40.458 (113.36) |
| First Stage Weak Identification | 9.06 | 10.94 | 10.94 | 10.94 |
| | 0.003 | 0.0011 | 0.0011 | 0.0011 |
| Observations | 200 | 212 | 212 | 212 |
| R-squared | -0.126 | -0.147 | 0.161 | 0.165 |

Notes: Estimated by 2SLS using predicted market access as the instrumental variable. Predicted market access is estimated from a zero stage including railway length, distance to GQ, and the IVs chosen by Lasso (distance to GQ x Pop. density, 1961; and railway length x Crop suitability). All columns are estimated by 2SLS and include state fixed effects. Robust standard errors in parentheses, *** p<0.01, ** p<0.05, * p<0.1.
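The zero-stage procedure referred to in the table notes (regress market access on the instruments, then use the fitted value as a single instrument in a just-identified IV model) can be sketched on simulated data. Everything below is an assumption made for illustration: the data are synthetic, the variable names `z1`, `z2`, `u` are hypothetical stand-ins for two instruments and an unobserved confounder, and the control variables and state fixed effects used in the paper are omitted.

```python
import numpy as np

def ols(X, y):
    """Least-squares coefficients for y on the columns of X."""
    return np.linalg.lstsq(X, y, rcond=None)[0]

rng = np.random.default_rng(0)
n = 5000
z1 = rng.normal(size=n)          # stand-in for the first instrument
z2 = rng.normal(size=n)          # stand-in for the second instrument
u = rng.normal(size=n)           # unobserved confounder
x = 0.8 * z1 - 0.6 * z2 + u + rng.normal(size=n)   # endogenous regressor
y = 0.25 * x + u + rng.normal(size=n)              # true effect is 0.25

const = np.ones(n)

# Zero stage: fitted values of x from a regression on both instruments.
Z = np.column_stack([const, z1, z2])
x_hat = Z @ ols(Z, x)

# Just-identified IV with x_hat as the single instrument:
# solve (W'X) b = W'y, where W holds the constant and the instrument.
W = np.column_stack([const, x_hat])
X = np.column_stack([const, x])
beta_iv = np.linalg.solve(W.T @ X, W.T @ y)
beta_ols = ols(X, y)

print("OLS slope:", beta_ols[1])
print("IV slope:", beta_iv[1])
```

In this simulation OLS is biased upward because the confounder raises both variables, while the IV slope stays close to the true value of 0.25. The sketch reports point estimates only and does not compute the robust standard errors shown in the tables.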
Table A.3: Estimates Controlling for the Colonial Land Revenue System

                              (1)                  (2)                  (3)                  (4)
                              ln(Landgini, 2012)   ln(Landless, 2012)   ln(Tech use, 2012)   ln(Land Sale, 2012)
ln(MA1996)                    0.278***             0.386                0.285**              -0.403
                              (0.06)               (0.24)               (0.14)               (2.31)
British control (yes=1)       -0.061               0.075                -0.020               2.926*
                              (0.04)               (0.13)               (0.08)               (1.60)
No landlord (yes=1)           -0.084               -0.019               0.074                3.497
                              (0.05)               (0.15)               (0.10)               (2.15)
ln(Slope)                     -0.030               -0.078               -0.084*              -1.226
                              (0.03)               (0.09)               (0.05)               (1.21)
Elevation                     -0.242               0.169                0.283                5.813
                              (0.16)               (0.42)               (0.25)               (5.52)
Crop Suitability              -0.003**             -0.005               -0.001               -0.001
                              (0.00)               (0.00)               (0.00)               (0.05)
ln(Rain)                      0.039                0.146                0.028                -0.383
                              (0.06)               (0.15)               (0.09)               (2.21)
ln(Temperature)               -0.542               0.554                0.816                6.563
                              (0.43)               (1.04)               (0.67)               (15.37)
ln(Rain CV)                   0.129                0.340                0.340                10.346*
                              (0.15)               (0.38)               (0.23)               (5.61)
ln(Temp. seasonality)         -0.042               -0.045               -0.090               -3.010
                              (0.08)               (0.21)               (0.12)               (3.38)
Pop. density, 1961            0.011                0.038                -0.006               0.098
                              (0.01)               (0.03)               (0.02)               (0.35)
Constant                      -1.657               -11.039**            -10.019***           -46.566
                              (2.54)               (5.51)               (3.74)               (92.78)
First Stage
ln(dist. to GQ) x ln(Slope)   -0.129***            -0.128***            -0.128***            -0.128***
                              (0.04)               (0.03)               (0.03)               (0.03)
ln(dist. to GQ) x Elevation   0.217                0.161                0.161                0.161
                              (0.18)               (0.17)               (0.17)               (0.17)
ln(dist. to GQ) x ln(Slope)   0.002***             0.002***             0.002***             0.002***
                              (0.00)               (0.00)               (0.00)               (0.00)
Weak identification           8.05                 10.96                10.96                10.96
                              0.0000               0.0000               0.0000               0.0000
Observations                  200                  212                  212                  212
R-squared                     0.1756               0.3458               0.5119               0.1446

Note: All columns are estimated by Lasso and include state fixed effects. Robust standard errors in parentheses, *** p<0.01, ** p<0.05, * p<0.1.

Table A.4: Including population density from 2011 and proportion of Muslims as additional controls

                              (1)                  (2)                  (3)                  (4)
Dep. var.                     ln(Landgini, 2012)   ln(Landless, 2012)   ln(Tech use, 2012)   ln(Land Sale, 2012)
ln(MA1996)                    0.273***             0.691***             0.327**              -0.162
                              (0.06)               (0.15)               (0.13)               (2.37)
ln(Slope)                     -0.012               -0.003               -0.048               -1.318
                              (0.03)               (0.09)               (0.05)               (1.24)
Elevation                     -0.217               -0.009               0.267                4.322
                              (0.15)               (0.40)               (0.24)               (5.27)
Crop Suitability              -0.003**             -0.007**             -0.001               -0.003
                              (0.00)               (0.00)               (0.00)               (0.05)
ln(Rain)                      0.042                0.160                0.036                -0.874
                              (0.06)               (0.16)               (0.09)               (2.21)
ln(Temperature)               -0.478               0.083                0.774                4.327
                              (0.44)               (1.04)               (0.69)               (15.09)
ln(Rain CV)                   0.166                0.371                0.333                8.771
                              (0.14)               (0.41)               (0.25)               (5.80)
ln(Temp. seasonality)         -0.012               -0.015               -0.073               -5.132
                              (0.08)               (0.20)               (0.11)               (3.24)
Pop. density, 1961            0.335***             0.716*               0.145                -9.044*
                              (0.12)               (0.37)               (0.25)               (4.78)
Pop. density, 2011            -0.172***            -0.373*              -0.081               4.828*
                              (0.07)               (0.20)               (0.13)               (2.59)
Muslim prop.                  0.251***             0.989***             0.529***             -4.683*
                              (0.10)               (0.25)               (0.14)               (2.82)
Constant                      -2.566               -14.956**            -10.655***           -3.339
                              (2.53)               (6.11)               (3.76)               (89.29)
First Stage
ln(dist. to GQ) x ln(Slope)   -0.101***            -0.102***            -0.102***            -0.102***
                              (0.04)               (0.03)               (0.03)               (0.03)
ln(dist. to GQ) x Elevation   0.067                0.049                0.049                0.049
                              (0.17)               (0.16)               (0.16)               (0.16)
ln(dist. to GQ) x ln(Slope)   0.002***             0.001***             0.001***             0.001***
                              (0.00)               (0.00)               (0.00)               (0.00)
Weak identification           8.02                 10.66                10.66                10.66
                              0.0000               0.0000               0.0000               0.0000
Observations                  200                  212                  212                  212
R-squared                     0.2310               0.4024               0.5478               0.1567

Note: All columns are estimated by 2SLS and include state fixed effects. Robust standard errors in parentheses, *** p<0.01, ** p<0.05, * p<0.1.

Table A.5: Using Distance to Least-Cost GQ Network as Instrument

                              (1)                  (3)                  (5)                  (7)
Dep. var.                     ln(Landgini, 2012)   ln(Landless, 2012)   ln(Tech use, 2012)   ln(Land Sale, 2012)
ln(MA1996)                    0.188***             0.653***             0.334***             1.983
                              (0.06)               (0.17)               (0.10)               (2.05)
ln(Slope)                     -0.041               -0.078               -0.074               -0.607
                              (0.03)               (0.10)               (0.05)               (1.12)
Elevation                     -0.169               0.026                0.264                3.094
                              (0.14)               (0.42)               (0.25)               (5.19)
Crop Suitability              -0.003**             -0.008**             -0.001               -0.010
                              (0.00)               (0.00)               (0.00)               (0.05)
ln(Rain)                      0.047                0.137                0.008                -1.237
                              (0.05)               (0.16)               (0.09)               (2.15)
ln(Temperature)               -0.326               0.069                0.672                -0.343
                              (0.37)               (1.03)               (0.64)               (14.30)
ln(Rain CV)                   0.117                0.409                0.414*               10.793*
                              (0.13)               (0.40)               (0.23)               (5.60)
ln(Temp. seasonality)         -0.037               -0.083               -0.098               -4.600
                              (0.07)               (0.21)               (0.12)               (3.21)
Pop. density, 1961            0.019***             0.019                -0.009               -0.165
                              (0.01)               (0.02)               (0.01)               (0.33)
Constant                      -1.426               -12.200*             -10.039**            -27.137
                              (2.18)               (6.32)               (3.95)               (87.85)
First Stage
Weak Identification           24.9                 28.46                28.46                28.46
                              0.0000               0.0000               0.0000               0.0000
Observations                  200                  212                  212                  212
R-squared                     0.333                0.365                0.524                0.162

Note: Estimated by 2SLS using predicted market access as the instrumental variable. Predicted market access is estimated from a zero stage including railway length, distance to least-cost network, and the IVs chosen by Lasso (distance to GQ x Pop. density, 1961; and railway length x Crop suitability). All columns are estimated by 2SLS and include state fixed effects. Robust standard errors in parentheses, *** p<0.01, ** p<0.05, * p<0.1.

Table A.6: Summary Statistics

                                              Mean          Std. Dev.     Min           Max
Dependent
  Land Gini (index), 2012                     0.78          0.13          0.46          0.99
  Landlessness (%), 2012                      56.36         24.71         4.29          98.63
  Technology use (%), 2012                    73.19         21.19         9.09          100.00
  Land sale (rupees), 2012                    495,037       1,427,056     0             12,900,000
Treatment
  Market access (index), 1996                 6,862,575     4,261,530     817,823       22,800,000
Controls
  Mean Slope (%)                              5.74          7.87          0.97          53.20
  Elevation (1,000 meters)                    0.35          0.35          0.01          2.94
  Crop suitability index                      63            19            1             100
  Rainfall (millimeters)                      1,151         678           209           4,157
  Rainfall coefficient of variation           254           24            80            289
  Temperature (Celsius)                       120           21            57            156
  Temperature seasonality (st. dev. Celsius)  4421          1687          926           7399
  Population density, 1961                    0.33          0.71          0.02          9.50
Number of observations                        200
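
The notes to Tables A.4 and A.5 state that the columns are estimated by 2SLS. For readers who want to see the mechanics, the following is a minimal sketch of two-stage least squares with a single endogenous regressor, run on synthetic data. Nothing here reproduces the paper's estimates: the function name `two_stage_least_squares`, the simulated variables, and the true coefficient of 0.3 are all assumptions made for illustration, and state fixed effects, Lasso instrument selection, the zero stage of Table A.5, and robust standard errors are omitted.

```python
import numpy as np


def two_stage_least_squares(y, x_endog, instruments, controls):
    """Hypothetical helper: 2SLS with one endogenous regressor.

    Returns coefficients ordered as [endogenous, constant, controls...].
    """
    n = len(y)
    exog = np.column_stack([np.ones(n), controls])   # constant + exogenous controls
    Z = np.column_stack([instruments, exog])         # full instrument matrix
    X = np.column_stack([x_endog, exog])             # regressors of interest
    # First stage: fitted values of X from a regression on Z.
    X_hat = Z @ np.linalg.lstsq(Z, X, rcond=None)[0]
    # Second stage: regress y on the fitted regressors.
    return np.linalg.lstsq(X_hat, y, rcond=None)[0]


# Synthetic data (assumed): u is an unobserved confounder that biases OLS.
rng = np.random.default_rng(0)
n = 20_000
control = rng.normal(size=n)
instrument = rng.normal(size=n)
u = rng.normal(size=n)
x = 0.8 * instrument + 0.5 * control + u + rng.normal(size=n)
y = 0.3 * x + 0.2 * control - u + rng.normal(size=n)

# OLS for comparison: same regressors, no instrumenting.
X_ols = np.column_stack([x, np.ones(n), control])
beta_ols = np.linalg.lstsq(X_ols, y, rcond=None)[0]
beta_iv = two_stage_least_squares(y, x, instrument, control)

print(f"OLS estimate:  {beta_ols[0]:.3f}")
print(f"2SLS estimate: {beta_iv[0]:.3f}")
```

Under this simulated design the OLS coefficient on `x` should be pulled well below 0.3 by the confounder, while the 2SLS coefficient should land close to 0.3, because the instrument shifts `x` without being correlated with `u`.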