Policy Research Working Paper 9908

Nonclassical Measurement Error and Farmers’ Response to Information Reveal Behavioral Anomalies

Kibrom A. Abay
Christopher B. Barrett
Talip Kilic
Heather Moylan
John Ilukor
Wilbert Drazi Vundru

Development Economics
Development Data Group
January 2022

Abstract

This paper reports on a randomized experiment conducted among Malawian agricultural households to study nonclassical measurement error in self-reported plot area and farmers’ responses to new information (the objective plot area measure) that was provided to correct nonclassical measurement error. Farmers’ pre-treatment self-reported plot areas exhibit considerable nonclassical measurement error, most of which follows a regression-to-mean pattern with respect to plot area, and another 18 percent of which arises from asymmetric rounding to half-acre increments. Randomized provision of GPS-based measures of true plot area generates four important findings. First, farmers incompletely update mistaken self-reports; most nonclassical measurement error persists even after the provision of true plot area measures. Second, farmers update asymmetrically in response to information, with upward corrections being far more common than downward ones even though most plot sizes were initially overestimated. Third, the magnitude of updating varies by true plot area and the magnitude and direction of initial nonclassical measurement error. Fourth, the information treatment affects self-reported information about non-land inputs, such as fertilizer and labor, indicating that the effects of measurement error and updating spill over across variables. Nonclassical measurement error reflects behavioral anomalies and carries implications for both survey data collection methods and the design of information-based interventions.

This paper is a product of the Development Data Group, Development Economics.
It is part of a larger effort by the World Bank to provide open access to its research and make a contribution to development policy discussions around the world. Policy Research Working Papers are also posted on the Web at http://www.worldbank.org/prwp. The authors may be contacted at k.abay@cgiar.org or tkilic@worldbank.org.

The Policy Research Working Paper Series disseminates the findings of work in progress to encourage the exchange of ideas about development issues. An objective of the series is to get the findings out quickly, even if the presentations are less than fully polished. The papers carry the names of the authors and should be cited accordingly. The findings, interpretations, and conclusions expressed in this paper are entirely those of the authors. They do not necessarily represent the views of the International Bank for Reconstruction and Development/World Bank and its affiliated organizations, or those of the Executive Directors of the World Bank or the governments they represent.

Produced by the Research Support Team

Kibrom A. Abay†^, Christopher B. Barrett‡, Talip Kilic⁑, Heather Moylan⁑, John Ilukor⁑ and Wilbert Drazi Vundru⁑

JEL Codes: C83, C93, D83, Q12

Keywords: asymmetric learning, inattention, misperception, measurement error, land area, household surveys, Malawi, Sub-Saharan Africa.

† International Food Policy Research Institute (IFPRI). ^ Corresponding author: k.abay@cgiar.org.
‡ Cornell University, cbb2@cornell.edu.
⁑ Living Standards Measurement Study (LSMS), Development Data Group, World Bank, tkilic@worldbank.org; hmoylan@worldbank.org; jilukor@worldbank.org; wvundru@worldbank.org.
The authors thank Brian Dillon, Sylvan Herskowitz, Ben Norton and participants at the IPA/Northwestern conference for comments on an earlier draft, and the Malawi National Statistical Office (NSO) management and field staff for their hard work in survey implementation. This paper was produced with financial support from the World Bank LSMS Program (www.worldbank.org/lsms) and the 50x2030 initiative (www.50x2030.org), a multi-partner program that seeks to bridge the global agricultural data gap by transforming data systems in 50 countries in Africa, Asia, the Middle East and Latin America by 2030. Any remaining errors are the authors' sole responsibility.

1. Introduction

Household survey data commonly exhibit considerable nonclassical measurement error (NCME) (Bound et al., 2001). This is true even for key assets and factors of production, like agricultural land, that are readily measurable and observable to the survey respondent, and that heavily affect farmers' incomes and livelihoods. The measurement error literature largely treats NCME as an econometric challenge to overcome. This would be appropriate if NCME arises purely due to respondents misreporting their true, accurate beliefs about land holdings and other key variables. In that case, NCME carries no implications for behavior and respondents’ decision-making processes; it is merely a statistical nuisance. Abay et al. (2021) show, however, that a large share of NCME in plot sizes reported by farmers in four African countries appears to reflect farmers' reporting mistaken beliefs, not misreporting of accurate beliefs.[1] Such findings suggest that NCME in household survey data not only matters for statistical inference but can also shed light on respondents’ decision-making processes and behaviors in ways that may reveal actionable information, consistent with a longstanding behavioral economics literature that routinely finds that people often act upon mistaken beliefs or misperceptions (Tversky and Kahneman, 1973; Kahneman and Tversky, 1984; Angner and Loewenstein, 2012).

[1] Berazneva et al. (2018), Burke et al. (2020), Wineman et al. (2020), Michelson et al. (2021) and Wossen et al. (2021) provide similar evidence of misperception in soil quality, plot size, crop variety, chemical fertilizer quality, and crop variety, respectively.

If at least some NCME reflects mistaken beliefs, then uncovering the nature and sources of NCME and their prospective implications for measurement and policy design becomes important for multiple reasons. First, mistaken beliefs might reveal smallholder farmer behavioral phenomena that drive these errors, such as inattention, self-esteem, or confirmation bias. As a burgeoning literature on “choice architecture” underscores, organizations worldwide increasingly design policies around behavioral anomalies (Thaler and Sunstein, 2009; Ariely, 2016), but the agricultural development community has been slow to do so.

Second, we know little about whether and how farmers incorporate objective information to correct mistaken beliefs, although this surely matters for information-based interventions, such as agricultural extension programming, cadastral surveys, and market information services. If there exists heterogeneity or asymmetries in learning – i.e., the updating of reported beliefs – or learning failures – i.e., limited or no updating of mistaken beliefs after receiving accurate information – that should inform the design of and investment in interventions intended to mitigate misperceptions.
Third, while conventional wisdom suggests that information campaigns can ameliorate farmers’ mistaken beliefs, the empirical validity of this assumption remains largely untested. Some behavioral phenomena that may generate NCME (e.g., inattention, confirmation, and self-esteem bias) might also obstruct learning and updating of mistaken beliefs, implying that the effectiveness of information interventions may be limited and might vary across observable and unobservable characteristics of farmers. Different farmers may pay attention to different technological features – or 'objects' – as the multi-object and selective attention learning literatures emphasize (Gabaix et al., 2006; Hanna et al., 2014; Schwartzstein, 2014; Ghosh, 2016; Wolitzky, 2018; Gabaix, 2019; Nourani, 2019; Kohlhas and Walther, 2021; Maertens et al., 2021).

Fourth, mistaken beliefs about one object may spill over to affect beliefs and decisions about other objects. As such, the provision of information that is meant to correct NCME in self-reported plot areas may subsequently impact a farmer’s reporting on other objects during the same household survey interview (as we demonstrate concerning fertilizer and labor inputs), and, given that some NCME reflects mistaken beliefs, prospectively the farmers' future decisions regarding these objects (which is beyond the scope of this paper). Both scenarios have implications for measurement, inference and policy. Understanding whether and how farmers respond to new information about the size and quality of one production input by adjusting their reporting of other agricultural inputs can help us understand survey data generating processes and farmers’ decision-making processes.[2] Misinformation spillovers generate correlated NCME, which considerably complicates econometric correction for measurement error, because incomplete correction of measurement error in the presence of correlated NCME can aggravate rather than reduce bias in key parameter estimates (Abay et al., 2019).

[2] For example, if providing farmers with GPS measures of their plots' size affects self-reported input use, introducing GPS-based plot area measurement in follow-up longitudinal data collection (as part of national panel surveys or impact evaluation studies) may compromise the inter-temporal comparability of self-reported, non-land variables.

We embedded an information experiment within an agricultural household survey in Malawi, providing an uncommon opportunity to study the nature of NCME and learning in response to information that corrects subjects' erroneous self-reports of cultivated land areas. Part of the appeal of this design is that most economics studies on inattention, learning, or confirmation, self-esteem or self-serving bias study prediction tasks concerning choice outcomes – e.g., the returns to stock choices, entrepreneurial efforts, technology adoption, etc., or the welfare gains from consumption choices among goods or services (Foster and Rosenzweig, 1995; Handel, 2013; Hanna et al., 2014; Handel and Kolstad, 2015; Bhargava et al., 2017; Hastings et al., 2017; Kohlhas and Walther, 2021). However, estimation of respondent learning about outcomes requires accurately specifying the outcome data generating process, typically represented as a production, cost, or profit function. The possibility always exists that a respondent's beliefs accurately reflect unobservable farmer attributes that cause their outcome distribution to deviate from the analyst's estimates. Our experimental design, by contrast, rules out the possibility of unobserved heterogeneity by studying directly measurable, observable, and valuable agricultural inputs.
No unobservables should materially influence answers to questions such as: what is the size of this plot? Nor should they influence responses to questions about how much fertilizer or labor the farmer applied to the plot before or during planting. Errors in reporting observable production inputs almost surely reflect either misreporting or systematic behavioral errors that lead to mistaken beliefs. NCME in directly measurable inputs is less likely to be confounded by other, unobserved arguments to the respondent's mental model of the relevant data generating process. Furthermore, failure to update completely in response to demonstrably accurate, corrective information about an objectively verifiable value, such as the size of a plot, signals asymmetric or incomplete learning – or even complete learning failures, i.e., no updating at all – and carries important implications for information-based interventions, such as extension messaging, market information services, cadastral surveys, public health, and nutrition education. The literature on attention and learning typically focuses on learning about outcomes, technologies, or some other phenomena that are not directly observable. By studying response to corrective information about a directly observable variable, we offer a more direct test of incomplete learning of various sorts.

Consistent with many prior studies (e.g., Carletto et al., 2013; 2015; 2017; Kilic et al., 2017; Abay et al., 2019; Abay et al., 2021; Dillon et al., 2019), we find pervasive, mean-reverting NCME in farmers' self-reported plot areas and considerable asymmetric rounding to easy-to-remember half-acre increments. In view of the importance of land for social status and income generation in these agrarian communities, these findings suggest behavioral anomalies that have been documented in other contexts, in particular, inattention to important details and self-esteem bias.
Furthermore, the analysis reveals that farmers' updating of mistaken beliefs in response to information on their true plot size is remarkably incomplete and asymmetric, indicating a greater willingness to adjust beliefs up than down. Updating is stronger only among larger plots, but still asymmetric. These patterns are likewise consistent with inattention as well as confirmation and self-esteem biases. Moreover, the information treatment affects self-reported information on other, non-land inputs, such as fertilizer and labor, consistent with the hypothesis that farmers employ simplifying mental models – e.g., optimal prediction error (Hyslop and Imbens, 2001) – to track these variables. This implies that NCME in one production input likely propagates to other production inputs, generating correlated measurement error, which further complicates econometric correctives, because replacing an erroneous self-reported variable with an accurate measure of the same variable can aggravate rather than reduce bias in regression coefficient estimates if one cannot also correct for the correlated NCME in other variables (Abay et al., 2019). The scale and persistence of the NCME we observe almost surely have distributional and welfare implications, although estimating those effects falls beyond the scope of this paper.

2. Experimental Design and Data

The data come from a randomized experiment that was embedded into the Malawi National Crop Cutting Study (NCCS), which was implemented by the National Statistical Office (NSO) in 2019/20, in collaboration with the World Bank’s Living Standards Measurement Study (LSMS) team. The NCCS was implemented in a national sample of 72 enumeration areas (EAs) selected at random from the sample of EAs that were scheduled to be visited by the Fifth Integrated Household Survey (IHS5) in the months of December 2019 and January-February 2020.[3] In each EA, 24 maize-cultivating households were selected at random from the universe of maize-cultivating households identified through a full household listing in each EA. Of the sampled households, 16 were selected at random for a separate crop cutting experiment and were subject to two visits (post-planting and crop cutting/post-harvest) – these comprise our treatment group – while the remaining 8 households were subject to a single, post-harvest visit – they serve as our control group.

[3] The IHS5, a nationally representative household survey, ran from April 2019 to April 2020, covering a sample of 11,472 households in 717 EAs. To access the anonymized survey data and documentation from the IHS5, please visit: https://microdata.worldbank.org/index.php/catalog/3818.

During the first, post-planting survey visit, each treatment group household completed a short parcel-plot-crop-level module on farm organization, which listed all parcels and the plots within them, in accordance with the IHS5 parcel and plot definitions.[4] Once the roster of parcels and plots was completed, one maize plot was selected at random. The manager of this plot was the target respondent for questions on the rest of the activities occurring on this plot. The target respondent then self-reported the plot area. After the respondent self-reported the size of the selected plot, the enumerator and respondent visited that plot together. The enumerator measured the plot area with a handheld Garmin eTrex 30 GPS unit,[5] then recorded and shared with the farmer both the GPS-based plot area and the measurement error in farmer-reported area vis-à-vis the GPS-based counterpart (both in levels and as a share of GPS-based plot area).[6] This was the information treatment, a demonstrably accurate measure of the plot area.[7] After finishing the plot visit, the enumerator and the farmer returned to the dwelling to administer the rest of the post-planting questionnaire, which asked the farmer to self-report labor, fertilizer and other inputs used on the plot before and during planting.

[4] A parcel is defined as a continuous piece of land that is not split by a river or a path wide enough to fit an oxcart or vehicle. A plot is a continuous piece of land on which a unique crop or a mixture of crops is grown, under a uniform, consistent crop management system.
[5] After walking the perimeter of a given plot with the plot manager to identify the boundaries, the enumerators measured the area with the GPS unit. The NSO enumerators were experienced users of the handheld GPS technology, which was adopted by the NSO for land area measurement in 2010 in the context of the Third Integrated Household Survey and the Integrated Household Panel Survey (IHPS).
[6] Once the GPS-based plot area information was entered into the Survey Solutions Computer-Assisted Personal Interviewing (CAPI) application, the measurement error was calculated and displayed automatically.
[7] We cannot gauge the extent to which the sampled farmers trusted the GPS-based plot area measures and therefore cannot test whether mistrust of objective evidence might help explain highly imperfect updating of self-reported plot area.

During a second, post-harvest visit to each treatment household,[8] the respondent was asked again to report the selected plot area, whose GPS-based measure had been shared with the respondent during the post-planting visit. If the self-reported plot area differed from the GPS-based plot area, the respondent was then asked for his/her recollection of the GPS-based plot area. The respondent then re-reported all labor and non-labor inputs on the plot, in view of the possibility of non-labor input applications not having been finalized at the time of the post-planting interview.

[8] The post-harvest visit was scheduled according to the households’ harvest calendars, as the primary purpose of this visit was to harvest and weigh the crops on pre-designated crop cut sub-plots on the selected maize plot.

Conversely, control group households received a single, post-harvest visit during which they completed a unified agricultural questionnaire, including the same self-reporting of plot size, in one sitting prior to any plot visits. The only contact with these households was made during the time that the field teams returned to the EA for the post-harvest visit to the treatment households. At the conclusion of each interview, one maize plot was then selected at random, and the enumerator accompanied the farmer to the selected maize plot, whose area and plot outline were obtained using the handheld GPS device. This protocol mirrors the current interview flow in the household surveys that have been supported by the World Bank Living Standards Measurement Study – Integrated Surveys on Agriculture (LSMS-ISA) initiative, including the IHS5 for Malawi. Table A1 provides an overview of the fieldwork implementation timeline, and the distribution of interviews with treatment and control households over time.

3. NCME as a Window on Behavioral Anomalies

NCME in self-reported plot areas has been widely reported (e.g., Carletto et al., 2013; Carletto et al., 2015; Dillon et al., 2019; Abay et al., 2019; Gourlay et al., 2019; Abay et al., 2021). This analysis corroborates prior findings and allows us to make more nuanced observations and link these patterns to behavioral anomalies observed in the broader economics literature. These findings lay the foundation for section 4's exploration of the causal impacts of providing objectively verifiable information that might resolve NCME.

Table 1 reports the descriptive statistics for sampled households and plots.
Most households are male-headed and literate, with about three-quarters of them relying on farming as their main source of livelihood. On average, the plots are half an acre while the average farm size is roughly 1.5 acres. Table 1 shows that the randomization worked. Most of the observable characteristics are balanced across the treatment and control groups, and we cannot reject the null hypothesis of jointly zero coefficients associated with the regressors in Table 1.[9] Most importantly, pre-treatment self-reported plot area appears to be statistically similar across the control and treatment group plots.

[9] The F-test statistic for the regression of the treatment indicator variable on the characteristics listed in Table 1 equals 1.14, with p-value=0.29.

Table 1: Balance between control and treatment groups

                                                Control group      Treatment group
Variable                                        No. obs.  Mean     No. obs.  Mean     Mean difference
Household head male (0/1)                       558       0.738    977       0.713    0.025
Age of household head                           558       44.26    977       44.663   -0.403
Household head literate (0/1)                   558       0.76     977       0.738    0.022
Household head married (0/1)                    564       0.739    982       0.732    0.007
Household head Christian (0/1)                  558       0.81     977       0.801    0.009
Household head main occupation (farming) (0/1)  564       0.761    982       0.746    0.014
Household engaged in nonfarming (0/1)           564       0.124    982       0.119    0.005
Area: self-reported, post planting (acre)       564       0.642    982       0.610    0.032
Area: GPS (acre)                                564       0.558    982       0.507    0.051*
Area: self-reported, post planting (log, acre)  564       -0.755   982       -0.796   0.041
Area: GPS (log, acre)                           564       -1.042   982       -1.119   0.077
Farm size (acre)                                564       1.474    982       1.398    0.076
Plot acquired through local admin or inherited  564       0.243    982       0.22     0.023
Plot acquired through rental or purchase        564       0.073    982       0.116    -0.043***
Plot under customary tenure system              564       0.832    982       0.833    -0.001
Pure stand cropping (0/1)                       564       0.426    982       0.412    0.013
Soil type sandy or clay (0/1)                   564       0.555    982       0.601    -0.046*
Soil color red or brown (0/1)                   564       0.381    982       0.376    0.005
Slope of plot is flat (0/1)                     564       0.644    982       0.678    -0.035
Soil quality good (0/1)                         564       0.495    982       0.529    -0.034
Soil texture fine or very fine (0/1)            564       0.413    982       0.395    0.018
Soil texture coarse or very coarse (0/1)        564       0.138    982       0.153    -0.014

Notes: This table compares characteristics of control and treatment plots using information collected before the treatment. The last column provides mean differences across treatment and control group characteristics. * p < 0.10, ** p < 0.05, *** p < 0.01.

Table 2 reports the distribution of plot sizes and measurement error in the post-planting and post-harvest interviews. We present these discrepancies across quartiles of "true" plot size (based on GPS measurement) and report differences in terms of biases relative to the true size. That is, for each quartile and survey round we compute (i) $\text{Relative bias}_{PP} = \left(\frac{\overline{SR}_{PP} - \overline{GPS}}{\overline{GPS}}\right) \times 100$ for the post-planting, pre-treatment round and (ii) $\text{Relative bias}_{PH} = \left(\frac{\overline{SR}_{PH} - \overline{GPS}}{\overline{GPS}}\right) \times 100$ for the post-harvest, post-treatment round.[10]

[10] $\overline{SR}_{PP}$ and $\overline{SR}_{PH}$ stand for average self-reported plot size in the post-planting and post-harvest interviews, respectively. $\overline{GPS}$ stands for average true plot size (measured using GPS device).

Although the overall relative bias (at the sample mean) in self-reported plot size in the post-planting survey is just 18 percent, this disguises a strong, systematic pattern across the plot size distribution. Farmers overestimate the size of the smallest quartile of plots by an average of 194 percent while they underestimate the largest quartile's plot sizes by an average of 11 percent. These findings are consistent with previous studies showing that self-reported area measurement suffers from regression-to-mean biases: overestimation for smaller plots and underestimation for larger plots (e.g., Carletto et al., 2013; Carletto et al., 2015; Dillon et al., 2019; Abay et al., 2019; Gourlay et al., 2019). There is not much difference between reporting biases in the post-planting and post-harvest periods, except for the last quartile.
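The relative bias measure used for Table 2 can be illustrated with a short sketch. The code below is only a minimal illustration under stated assumptions: the function name `relative_bias_by_quartile` and the eight sample plots are hypothetical and are not taken from the study data. It groups plots into quartiles of GPS-measured area and computes the percentage gap between the mean self-report and the mean GPS area within each quartile.

```python
from statistics import mean

def relative_bias_by_quartile(sr, gps):
    """Relative bias (%) of mean self-reported area versus mean GPS area,
    computed within quartiles of GPS-measured plot size."""
    # Rank plots by GPS area, then split the ranking into four groups.
    order = sorted(range(len(gps)), key=lambda i: gps[i])
    n = len(order)
    biases = []
    for q in range(4):
        idx = order[q * n // 4:(q + 1) * n // 4]
        sr_bar = mean(sr[i] for i in idx)    # average self-report in quartile
        gps_bar = mean(gps[i] for i in idx)  # average GPS area in quartile
        biases.append((sr_bar - gps_bar) / gps_bar * 100)
    return biases

# Hypothetical plots in acres (illustrative values, NOT study data).
gps = [0.10, 0.12, 0.25, 0.30, 0.50, 0.60, 1.20, 1.30]
sr = [0.25, 0.50, 0.50, 0.50, 0.50, 1.00, 1.00, 1.00]
biases = relative_bias_by_quartile(sr, gps)
print([round(b, 1) for b in biases])
```

In this made-up sample the bias is positive and shrinking over the first three quartiles and negative in the last, which is the regression-to-mean shape described in the text. The bias is computed from quartile means rather than as an average of plot-level ratios, in line with the notes to Table 2.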
We investigate these post-planting versus post-harvest differences further in the next sections. Considerable average overestimation is one striking feature of self-reported plot areas.

Table 2: Discrepancies between self-reported (SR) and GPS-based plot size measures

Plot size quartile  Observations  Mean SR: PP  Mean GPS  Mean SR: PH  Relative bias (%): PP  Relative bias (%): PH
0-25%               387           0.28         0.10***   0.28         193.73                 198.52
25-50%              389           0.46         0.26***   0.47         74.03                  76.85
50-75%              390           0.66         0.54***   0.66         23.93                  23.23
75-100%             380           1.09         1.22***   1.14         -10.57                 -6.30
Total               1,546         0.62         0.53      0.64         18.27                  21.10

Notes: GPS stands for area measurement using handheld Global Positioning Systems, while SR stands for self-reported plot size in acres. PP stands for post-planting visit while PH stands for post-harvest visit. These are quartile-specific means and relative biases as a percent of (mean) GPS-measured plot size. *** indicates a statistically significant difference between GPS values and self-reported values in the post-planting survey.

Figure 1 shows that rounding is a major source of measurement error in self-reported plot area. Moreover, Figure 1 shows that heaping at focal points remained pervasive even after the true plot areas and the extent of measurement error were shared with treatment group farmers. About half of the self-reported plot sizes take one of the four rounded values (0.5, 1.0, 1.5 or 2.0) in the post-planting and post-harvest interviews.[11] The slight differences in the distribution of plot sizes between the post-planting and post-harvest rounds appear only for plots greater than 1 acre, where rounding at 1.5 and 2 acres slightly diminishes in the post-harvest survey. Appendix Table A2 reports Kolmogorov-Smirnov tests for differences between each pair of distributions in Figure 1.
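Two of the descriptive quantities used here, the share of reports that sit exactly on a focal value and the two-sample Kolmogorov-Smirnov statistic, can be sketched in a few lines. This is a minimal illustration, assuming hypothetical helper names (`focal_share`, `ks_statistic`) and made-up sample values. It computes only the test statistic, not the p-values reported in Appendix Table A2, and it is not the authors' code.

```python
FOCAL = (0.5, 1.0, 1.5, 2.0)  # rounded values named in the text (acres)

def focal_share(areas, focal=FOCAL, tol=1e-9):
    """Share of reported areas that equal one of the focal values."""
    hits = sum(any(abs(a - f) <= tol for f in focal) for a in areas)
    return hits / len(areas)

def ks_statistic(x, y):
    """Two-sample Kolmogorov-Smirnov statistic: the largest absolute gap
    between the empirical cumulative distribution functions of x and y."""
    d = 0.0
    for v in sorted(set(x) | set(y)):
        fx = sum(1 for a in x if a <= v) / len(x)  # ECDF of x at v
        fy = sum(1 for b in y if b <= v) / len(y)  # ECDF of y at v
        d = max(d, abs(fx - fy))
    return d

# Hypothetical self-reported and GPS areas in acres (NOT study data).
sr = [0.5, 0.5, 1.0, 0.25, 1.0, 2.0, 0.75, 1.5]
gps = [0.31, 0.42, 0.88, 0.12, 1.07, 1.73, 0.52, 1.21]
print(focal_share(sr), focal_share(gps), ks_statistic(sr, gps))
```

The statistic is evaluated at every observed value, which is simple but quadratic in sample size. For samples of the size analyzed in the paper, a library routine such as `scipy.stats.ks_2samp`, which also returns a p-value, would be the more practical choice.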
The non-parametric tests in Appendix Table A2 suggest that we cannot reject the null hypothesis of no significant differences between self-reported values across control and treatment groups, and between self-reported pre-treatment and post-treatment values within the treatment group. However, we can clearly detect statistically significant differences between self-reported and GPS values, both for the control and treatment group plots. We explore some of these differences in the next section.

[11] The corresponding share associated with GPS-measured values is less than 3 percent.

Figure 1: Distribution of self-reported and GPS plot size measures. Panels: (a) Control group; (b) Post-planting visit: treatment; (c) Post-harvest visit: treatment. Each panel plots the fraction of plots (vertical axis) against plot size from 0 to 2 (horizontal axis) for the GPS and self-reported measures.

Furthermore, the heaping around focal points is quite asymmetric. Figure 2 displays the distribution of GPS values across the most common self-reported focal values. The mode and median of GPS plot size measures consistently fall below the rounded, self-reported area. Farmers systematically round asymmetrically (upward) at all plot sizes, although far more acutely for smaller plots, those less than 1 acre. Farmers are more likely to round upward than downward, but that likelihood decreases with true plot size. Overall, more than twice as many farmers overestimated their plot size as underestimated it (647 of 982 plots were overestimated).

Figure 2: Distribution of GPS-based plot sizes at selected self-reported rounded intervals (post-planting). The figure plots the density of GPS-measured area (in acres, from 0 to 3.5) separately for plots self-reported at 0.25, 0.5, 1, 1.5 and 2 acres.

We combine these features to explore parametrically the multivariate patterns in self-reported plot size measurement error.
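The three plot-level outcome measures analyzed in Table 3 can be written out explicitly. The sketch below is a minimal illustration. The function name and the two example plots are hypothetical, and the sketch assumes that the overestimation and underestimation rates are the positive and negative parts of the percentage difference relative to GPS area, so that each is zero whenever the other is positive.

```python
import math

def measurement_error_measures(sr, gps):
    """Return (log error, % overestimation, % underestimation) for one plot,
    given self-reported area `sr` and GPS-measured area `gps` in acres."""
    log_error = math.log(sr) - math.log(gps)  # log(SR) - log(GPS)
    pct_diff = (sr - gps) / gps * 100         # signed percentage difference
    over = max(pct_diff, 0.0)                 # zero if the plot is underestimated
    under = max(-pct_diff, 0.0)               # zero if the plot is overestimated
    return log_error, over, under

# Two hypothetical plots, both self-reported as exactly 1 acre.
print(measurement_error_measures(1.0, 0.8))   # true size smaller: overestimate
print(measurement_error_measures(1.0, 1.25))  # true size larger: underestimate
```

Because each rate is bounded below by zero, many plots take the value zero on one of the two measures. This is consistent with Tobit estimates and counts of censored observations being reported for those two outcomes.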
Table 3 reports the results from the regressions of (i) measurement error, computed as logarithmic differences between self-reported and objective measures, and (ii) overestimation and underestimation rates, each computed as the non-negative percentage difference between self-reported and GPS measure, as a function of observable plot-level and household-level characteristics. We also report the Shapley decomposition of the explained variation (measured by R²) in measurement error over groups of regressors (Huettner and Sunder, 2012).

Table 3: Characterizing measurement error in plot size (post-planting round only)

                                  (1)               (2)       (3)               (4)
                                  Log(SR)-log(GPS)  Shapley   %Overestimation   %Underestimation
                                  OLS estimates               Tobit estimates   Tobit estimates
Plot size                                           78.72%
  Log (area: GPS)-centered        -0.680***                   -210.759***       37.617***
                                  (0.017)                     (6.953)           (1.859)
  Log (area: GPS)-centered-square -0.014                      11.307***         8.928***
                                  (0.011)                     (4.097)           (1.039)
Rounding of values                                  18.28%
  Rounding at 0.5 acre            0.419***                    120.048***        -19.058***
                                  (0.034)                     (11.903)          (2.781)
  Rounding at 1 acre              0.905***                    270.761***        -58.441***
                                  (0.042)                     (15.149)          (4.025)
  Rounding at 1.5 acre            1.186***                    354.823***        -81.165***
                                  (0.070)                     (24.455)          (6.902)
  Rounding at 2 acres             1.430***                    401.143***        -92.946***
                                  (0.096)                     (32.847)          (9.508)
Household characteristics                           1.69%
Plot characteristics                                1.32%
Constant                          0.006                       -94.661*          -11.636
                                  (0.160)                     (55.243)          (13.848)
Controls                          Yes                         Yes               Yes
Mean dependent variable           0.313                       96.248            10.423
R²                                0.548             100%
No. observations                  1535                        1535              1535
No. censored observations         -                           554               1035

Notes: The first column provides OLS estimates, and the second column reports the Shapley decomposition associated with the R² in the first column. The third and fourth columns are Tobit estimates. Prior to expressing it in natural logarithmic terms, GPS-based plot size was demeaned to center the data. Household characteristics include the female identifier, age, literacy, religion, marital status, and non-farm work status of the household head, as well as total farm size and number of plots managed. Plot characteristics include indicator variables for tenure status, rental or owned, pure stand cropping, soil type, and slope. Standard errors, clustered at enumeration area level, are given in parentheses. * p < 0.10, ** p < 0.05, *** p < 0.01. Appendix Table A4 reports full regression results.

Several important findings stand out from Table 3. First, measurement error is not random (i.e., classical) but rather is strongly correlated with a range of observables, as manifest in an R² of 0.55 in column 1. Second, measurement error is negatively and significantly correlated with true plot size, implying that larger (smaller) plots are more likely to be underestimated (overestimated), confirming prior findings of regression-to-mean patterns in area NCME. True plot size is the primary variable associated with NCME, accounting for 79 percent of the explained variation in measurement error. Third, the coefficient estimates on rounding indicator dummy variables for a self-reported plot size of 0.5, 1.0, 1.5 or 2.0 acres are statistically significant in all specifications, negatively (positively) correlated with under- (over-)estimation rates, confirming that farmers are more likely to round up than down (i.e., asymmetric focal point bunching). These rounding indicators together account for 18 percent of the explained variation in measurement error. Farmers with larger and more numerous plots are statistically significantly less likely to incorrectly report a rounded plot size (Appendix Table A3). Fourth, other observable plot and household level characteristics account for only about 3 percent of the explained variation in measurement error.
Even though some such characteristics – e.g., farm size, the tenure status of the plot – are statistically significantly associated with plot size measurement error (see Appendix Table A4), these collectively make little difference to explaining measurement error. Fifth and finally, overestimation is proportionately far greater than underestimation; the mean overestimation rate is roughly 96 percent of true plot size while the mean underestimation rate is only about 10 percent (Table 3, columns 3 and 4).

Systematic errors in self-reported plot areas are consistent with several behavioral phenomena identified in the broader behavioral economics literature. Various formal models exist to help isolate one or another of these phenomena. Because we study empirically several such phenomena at once, exploiting a field experiment to address prospective confounders, we remain agnostic as to which among many candidate structural models best explains the data, and eschew construction of a unified model of behavioral phenomena related to farmers' beliefs about salient features of their livelihoods. Rather, we tackle several key behavioral phenomena in turn, starting with inattention to salient information, then working through self-esteem and confirmation bias, in each case using the field experimental data to show the concepts' salience to smallholder farmers' reporting on their agricultural activities.

The first behavioral anomaly apparent in the NCME patterns is inattention to salient, observable information. Inattention to detail may be rational, in the sense that the costs of expending mental energy, space and time on remembering fine details may exceed the corresponding benefits of accurate recall of more granular information (Sims, 2010; Kohlhas and Walther, 2021). Alternatively, human memory may simply be imperfect, with people paying only selective attention even to key details they would benefit from remembering (Kahneman, 1973; Mullainathan, 2002; Schwartzstein, 2014; Gabaix, 2017).
We cannot identify why our survey respondents appear inattentive to directly observable and highly salient information – the size of the plot they cultivate – i.e., whether NCME reflects frictions, mental gaps or both (Handel and Schwartzstein, 2018). However, only 31 out of 982 farmers (3.1 percent) had accurately self-reported their plot size (prior to the information treatment).

One would not expect farmers to exhibit uniform inattention to plot sizes for the simple reason that the costs of inattention likely vary predictably with observable farmer, farm, and plot characteristics. In particular, respondents who are more likely to incur greater financial or material losses from holding mistaken beliefs about farm size – e.g., those with larger plots – might be more likely to hold accurate beliefs that inform their production and marketing choices, and thus their incomes. Consistent with Kohlhas and Walther's (2021) model of asymmetric attention, farmers with larger plots are therefore less likely to round, and they exhibit measurement error of smaller relative magnitude. This may help explain the strong relationship between NCME and plot size.

Another natural result of inattention will be focal point heaping in reported plot sizes. If remembering detailed information is costly (i.e., rational inattention) or if people just do not bother to pay attention to key details, then they likely do not respond "I don't know" but rather self-report, and perhaps believe, a simple proxy measure.¹² For example, a respondent who cultivates a 0.824-acre plot or a 1.107-acre plot may believe and report its size as one acre, leading to the heaping around focal points that we observe in Figure 1.¹³

The mental cost of retaining precise information may not be the only reason for a farmer's mistaken beliefs about the amount of land she operates. People might favor false beliefs that boost their self-esteem.¹⁴ A vast psychology literature finds that people routinely hold beliefs that boost their self-esteem (Pyszczynski et al., 2004).¹⁵ In smallholder farming communities, land is not merely a critical production input; it is also a source of status and identity. If a farmer's utility rises with her (perhaps mistaken) belief in the size of her own plot or farm, then the (psycho-emotional) gains from mistakenly believing an inflated estimate of one's plot size may exceed the (material or financial) costs of acting on erroneous information. Self-esteem bias could thus be rational, in the sense that it is a natural (if perhaps subconscious) choice in response to the non-material returns to retaining mistaken beliefs.

The asymmetric errors and asymmetric focal point bunching evident in the data might reflect self-esteem bias, i.e., respondents are more likely to round up than to round down to the nearest simple fraction or integer because they feel better overstating rather than understating their land holdings. By contrast, misreporting or pure rational inattention should have symmetric effects on NCME in plot size.

The patterns of NCME evident in farmers' misreporting/misperception of a readily observable and measurable variable that matters a great deal to their livelihood seem to be a matter of concern not only for survey measurement and statistical inference, but also of interest for the behavioral insights they offer that might help inform policy design. Those insights are further corroborated by the experimental results from the information treatment we ran among these farmers.

¹² The survey also inquired about whether the plots had been measured in the past. Only 10 percent of treatment and control plots had ever been measured by any method, and of those, only 20 percent (i.e., 2 percent of our sample) had been measured with GPS. This pattern is balanced across the treatment and control groups, and while not shown here, the incidence of the plot having been measured by any method in the past is not a significant predictor of NCME. Those results are available upon request.

¹³ Note that random misreporting would exhibit a regression-to-mean pattern, with no focal point bunching. There could be random misreporting around focal points, although in our data, the drop off from the focal point to surrounding values appears far too sharp for random misreporting to play a major role.

¹⁴ Self-esteem bias is closely related to – arguably, it is a sub-set of – self-serving bias that has been widely studied and relates to the seemingly-unfounded confidence people often exhibit in their own ability and accomplishments (Camerer and Lovallo, 1999). We focus on self-esteem as the benefit that comes from overconfidence that may come at a price, as when people exaggerate their ability to pick winners in financial markets or to succeed in a job that requires technical skills. Overconfidence in one's ability may actually improve performance (Compte and Postlewaite, 2004; Rabin and Vayanos, 2010; Rosenqvist and Skans, 2015), but in ways that seem less likely for a farmer holding mistaken beliefs about her plot size.

¹⁵ For example, students routinely overstate their achievements, resulting in the well-known "Lake Wobegon effect" (Maxwell and Lopus, 1994).

4. Incomplete, Heterogeneous, and Asymmetric Learning

We observe plot size measurement error both before and after treatment group farmers observed the GPS measurement of their plot and were told the true area and the measurement error in the self-report they had provided earlier in that interview. Thus, whatever measurement error existed in the first survey round was easily and fully correctable before the subsequent round, which was fielded three to four months later.
Studying treatment group farmers' response to that information treatment, on its own and in comparison to the control group, reinforces the prior section's suggestion of ubiquitous behavioral phenomena, especially inattention and self-esteem bias, compounded by confirmation bias.

Table 4 reports the distribution of measurement error for the control group and the treatment group pre-treatment (PP), as well as the treatment group post-harvest (PH) measure following the information treatment. Because we have already seen that NCME is strongly associated with plot size, we disaggregate results across plot size quartiles. In the absence of the information treatment, control and treatment group farmers report statistically similar error in self-reported plot size for all plot size quartiles, consistent with the experimental balance we already established between the control and treatment groups (Table 1). Table 4 shows that, pre-treatment, the size of the error in self-reported plot size is statistically similar across the two groups. As reflected by the significance indicators (* and #) in the last column of Table 4, the information treatment clearly affected the share of plots with under- or overestimated sizes, and did so differentially across the plot size distribution.

Table 4: Distribution of measurement error in plot size, before and after treatment

| Plot size quartile | Measure | Control group: Obs. | Control group: Mean | Treatment group: Obs. | Treatment group: Mean (PP) | Treatment group: Mean (PH) |
|---|---|---|---|---|---|---|
| Q1 | Share of plots overestimated (binary) | 141 | 0.865 | 246 | 0.907 | 0.886 |
| Q1 | Share of plots underestimated (binary) | 141 | 0.128 | 246 | 0.077 | 0.061* |
| Q1 | %Overestimation | 141 | 229.455 | 246 | 259.170 | 266.113 |
| Q1 | %Underestimation | 141 | 5.052 | 246 | 4.431 | 3.047 |
| Q2 | Share of plots overestimated (binary) | 125 | 0.752 | 264 | 0.746 | 0.705 |
| Q2 | Share of plots underestimated (binary) | 125 | 0.216 | 264 | 0.220 | 0.208 |
| Q2 | %Overestimation | 125 | 92.405 | 264 | 78.352 | 80.046 |
| Q2 | %Underestimation | 125 | 4.560 | 264 | 5.393 | 4.046 |
| Q3 | Share of plots overestimated (binary) | 136 | 0.493 | 254 | 0.559 | 0.488 |
| Q3 | Share of plots underestimated (binary) | 136 | 0.478 | 254 | 0.413 | 0.362* |
| Q3 | %Overestimation | 136 | 38.732 | 254 | 36.952 | 33.13 |
| Q3 | %Underestimation | 136 | 14.127 | 254 | 11.394 | 7.49*[#] |
| Q4 | Share of plots overestimated (binary) | 162 | 0.383 | 218 | 0.390 | 0.376 |
| Q4 | Share of plots underestimated (binary) | 162 | 0.599 | 218 | 0.560 | 0.39*[#] |
| Q4 | %Overestimation | 162 | 15.333 | 218 | 11.966 | 8.085* |
| Q4 | %Underestimation | 162 | 20.790 | 218 | 19.586 | 9.484*[#] |

Notes: Overestimation and underestimation rates are computed as percentage difference between self-reported and GPS measures for values above zero and zero otherwise. That is, for each over(under)estimated plot we compute percentage difference between self-reported and objective measures. Standard errors in parentheses. Q1, Q2, Q3, Q4 stand for first, second, third and fourth quartiles of plot size. PP stands for post-planting round and PH stands for post-harvest round. * indicates that differences between control and treatment group are statistically significant while # implies that differences between post-planting and post-harvest values (for the treatment group) are statistically significant.

Three important patterns stand out from Table 4. First, adjustment in response to new information appears highly asymmetric. Farmers who underestimated their plot size initially are far more likely to update their answer, and by a larger amount, than those who had initially overestimated their plot size.
Second, farmers with larger plots appear to respond more than those with smaller plots; differences between treatment and control group plot size measurement errors are only significant for the last two quartiles. Third, among the treatment group farmers, those with larger plots are more likely to update and correct their pre-treatment errors, as shown by the significant differences between pre-treatment and post-treatment values for the third and fourth quartiles. Consistent with other patterns of asymmetry already reported, farmers' revisions to their self-reported plot size are more pronounced among those farmers who underestimated their plots.

Incomplete updating is apparent in the scatter plot and nonparametric kernel regression displayed in Figure 3, which plots the percentage measurement error in the post-treatment (post-harvest) survey round on the vertical axis against the pre-treatment measurement error on the horizontal axis. If the information treatment fully eliminated NCME, the scatter plot would be a horizontal line at the zero mark on the vertical axis. Only 13 percent of treated households report plot size accurately after receiving the GPS measure of the plot. Notably, those observations are heavily concentrated among respondents whose initial measurement error was modest, within roughly the [-50,50] interval. Meanwhile, only another 28 percent of households exhibit some correction of mistaken beliefs, as shown by the clustering of points around the zero value on the vertical axis and within the range bounded by zero and the post-planting measurement error. A plurality of farmers (37 percent) did not change their incorrect beliefs, which remained identically wrong over time, as depicted by the observations along the 45-degree line. It is striking that nearly three times as many farmers did not change their mistaken beliefs at all as fully updated to the correct plot size based on a measurement they witnessed of an observable variable.
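The response categories described above (full correction, partial correction, no change) can be made concrete with a short sketch. This is only an illustration under stated assumptions, not the authors' procedure: the function names, the tolerance `tol`, and the residual "other" category are hypothetical, and the percentage error is assumed to be the signed percentage difference between the self-report and the GPS measure.

```python
def pct_error(self_reported, gps):
    """Signed percentage measurement error relative to the GPS area (assumed definition)."""
    return 100.0 * (self_reported - gps) / gps


def classify_update(pre, post, tol=1e-9):
    """Classify a revision from pre- and post-treatment percentage errors.

    The labels and the tolerance are illustrative assumptions:
      - "fully corrected": post-treatment error is zero (a point on the horizontal axis)
      - "unchanged": post-treatment error equals pre-treatment error (the 45-degree line)
      - "partially corrected": post-treatment error lies strictly between zero and
        the pre-treatment error
      - "other": anything else (e.g., the error grew or changed sign)
    """
    if abs(post) <= tol:
        return "fully corrected"
    if abs(post - pre) <= tol:
        return "unchanged"
    if pre * post > 0 and abs(post) < abs(pre):
        return "partially corrected"
    return "other"


if __name__ == "__main__":
    gps_area = 0.8          # hypothetical true plot size in acres
    pre_report = 1.0        # hypothetical pre-treatment self-report
    pre = pct_error(pre_report, gps_area)
    for post_report in (0.8, 0.9, 1.0, 1.5):
        post = pct_error(post_report, gps_area)
        print(post_report, classify_update(pre, post))
```

Shares such as those quoted in the text would then be category frequencies over treated plots, although the exact accuracy threshold used in the paper is not stated in this section.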
The nonparametric regression indicates a strong, positive correlation between pre- and post-treatment measurement errors, but of shallower slope than the 45-degree line, signaling incomplete adjustment to corrective information, i.e., partial learning failures.

Figure 3 reveals two other linear relationships in the pre- and post-treatment data. These reflect the propensity of respondents to report round values for plot area. To see this, define the self-reported plot size from survey round $t$ (PP or PH) as $SR_t$, the true value as $T$ (which does not vary over time, thus no subscript), and rounded values as $R_i$ (with $i$ indexing different positive integer multiples of 0.5, e.g., 0.5, 1.0, 1.5, 2.0). Then, for respondents who adjust from $SR_{PP}=R_1$ to $SR_{PH}=R_2$ with $R_1 \neq R_2$, $R_1 \neq T$ and $R_2 \neq T$, a linear relationship between measurement error in post-harvest and post-planting emerges with a slope $R_2/R_1 \neq 1$ as one varies $T$. To see why, write the percentage error in round $t$ as $e_t = 100\,(SR_t - T)/T$; substituting $SR_{PP}=R_1$ and $SR_{PH}=R_2$ and eliminating $T$ gives $e_{PH} = (R_2/R_1)(e_{PP} + 100) - 100$. For $R_1>R_2$ ($R_1$