Policy Research Working Paper 9745

Agricultural Data Collection to Minimize Measurement Error and Maximize Coverage

Calogero Carletto
Andrew Dillon
Alberto Zezza

Development Economics, Development Data Group
July 2021

Abstract

Advances in agricultural data production provide ever-increasing opportunities for pushing the research frontier in agricultural economics and designing better agricultural policy. As new technologies present opportunities to create new and integrated data sources, researchers face trade-offs in survey design that may reduce measurement error or increase coverage. This paper first reviews the econometric and survey methodology literatures that focus on the sources of measurement error and coverage bias in agricultural data collection. Second, it provides examples of how agricultural data structure affects testable empirical models. Finally, it reviews the challenges and opportunities offered by technological innovation to meet old and new data demands and address key empirical questions, focusing on the scalable data innovations of greatest potential impact for empirical methods and research.

This paper is a product of the Development Data Group, Development Economics. It is part of a larger effort by the World Bank to provide open access to its research and make a contribution to development policy discussions around the world. Policy Research Working Papers are also posted on the Web at http://www.worldbank.org/prwp. The authors may be contacted at gcarletto@worldbank.org.

The Policy Research Working Paper Series disseminates the findings of work in progress to encourage the exchange of ideas about development issues. An objective of the series is to get the findings out quickly, even if the presentations are less than fully polished. The papers carry the names of the authors and should be cited accordingly. The findings, interpretations, and conclusions expressed in this paper are entirely those of the authors. They do not necessarily represent the views of the International Bank for Reconstruction and Development/World Bank and its affiliated organizations, or those of the Executive Directors of the World Bank or the governments they represent.

Produced by the Research Support Team

Agricultural Data Collection to Minimize Measurement Error and Maximize Coverage

Calogero Carletto, World Bank
Andrew Dillon, Northwestern University
Alberto Zezza, World Bank

Keywords: Agriculture, Measurement Error, Sampling Error, Survey Design, Data Collection

Acknowledgments: We are fortunate to have close collaborators who have greatly influenced our thinking on agricultural data collection, including past and present colleagues at the World Bank Living Standards Measurement Study team, national statistical office collaborators, Innovations for Poverty Action, and the Global Poverty Research Lab. We are grateful for comments on the draft from two anonymous reviewers, Leah Bevis, Sarah Kopper, Karen Macours, Christopher Udry, and the Handbook's editors Christopher Barrett and David Just. We appreciate research support from Raka Banerjee. This working paper is a pre-typeset version of a chapter prepared for the Handbook of Agricultural Economics, Volume 5: Agricultural Production and Research Methods, edited by Chris Barrett and David Just.

Table of Contents

1. Introduction
2. Minimizing Measurement Error
   2.1. Questionnaire design
   2.2. Interviewer effects
   2.3. Respondent effects
   2.4. Mode of data collection
   2.5. Processing errors
3. Trade-offs in Maximizing Coverage
   3.1. Sampling frame
   3.2. Units of analysis
   3.3. Survey timing
   3.4. Mode of data collection
   3.5. Attrition
4. Empirical Specification, Data Structure, and Measurement Error
   4.1. Profit and production functions
   4.2. The agricultural household model
5. Advances in Data Collection
   5.1. Advances in selected thematic areas
   5.2. Advances in data collection modes and data structures
6. Conclusions
Bibliography

1. Introduction

In the past two decades, innovations in data systems have led to the production of more real-time, disaggregated, and interoperable data on agriculture than ever before. Increasing data demands and emerging policy questions are driving much of this innovation, with fast technological change and methodological advances providing an opportunity to collect more and better data at lower costs (Akogun et al., 2020; Carletto et al., 2015; Dillon et al., 2021a; Kosmowski et al., 2019; Liao, 2018; Lobell et al., 2019).
Investments in country-level data infrastructure have enabled new approaches to methodological innovation, such as incorporating randomized control trials into national panel data collection or devising improved methods to ensure greater data interoperability. Meanwhile, new types of data – such as remote sensing data and citizen-generated data – and new technologies – such as portable sensors, DNA fingerprinting, and computer-assisted personal interviewing (CAPI) – provide unparalleled prospects for collecting and analyzing a wide array of agricultural constructs in a more granular, timely, and cost-effective manner. These advantages are further enhanced by integrating new types of data with traditional data sources such as household surveys, censuses, and administrative data.

While other data sources are becoming increasingly important, household and farm surveys are likely to remain the centerpiece of policy research for agricultural and development economists. Not only are household surveys a key data source in their own right, but they serve as interoperable complements and validation instruments for other data sources, such as for the ground-truthing of remote sensing data, or for the ex-post adjustment of bias in studies based on citizen-generated and other non-probability data. Emerging literature on a wide array of agricultural measurement issues in land, production, and gender analysis has relied upon innovations in survey design, as fostered in the past decade through data initiatives like the Living Standards Measurement Study-Integrated Surveys on Agriculture (LSMS-ISA) and the Global Strategy to Improve Agricultural and Rural Statistics (GSARS).

The influential publication on household survey data collection by Grosh and Glewwe (2000), and in particular the chapter by Reardon and Glewwe (2000) on agriculture, together with other chapters on consumption, income, and enterprises, provided an original contribution to the field of survey measurement issues that remains relevant to this day, as does the influential work by Sudman and Bradburn (1974) on response effects in the United States. However, significant innovations in methodological development for household surveys have taken place in recent years, including on the collection of agricultural data in multi-purpose surveys. Agricultural survey design continues to evolve through important innovations such as scaling up the collection of plot-level data in low-income countries, gender-disaggregated agricultural data, [1] agricultural panel surveys, and the collection of national agricultural household and enterprise data, [2] inter alia.

While the importance of household and farm surveys within national agricultural data systems is indisputable, it is equally important to recognize their limitations in addressing new data challenges. For instance, household and farm surveys may be ill-suited to capture the evolving value chains of rapidly transforming agri-food systems (Barrett et al., forthcoming). Surveys seldom collect sufficient data on contracting and on the different agents involved in transactions with the household and, when they do, they tend to be case studies focused on a few commodities in limited geographies or be qualitative in nature [3] (Barrett et al., 2020; Minten et al., 2016). Furthermore, surveys often lack sufficient spatial and temporal resolution and are unable to provide the real-time data needed by policy makers, being limited by cost and sample size considerations.
In higher income countries, remote sensing has been widely used for decades as a complement to ground-based measures for an array of applications, including sample frame construction, crop area and land use estimation, crop conditions assessment, climate data, and production forecasting (Hale et al., 1999). In recent years, the use of Earth Observation data for agricultural applications, combined and validated with ground-based measurements, has been spreading rapidly in low- and middle-income countries, yielding promise for more accurate and timely agricultural data in these contexts (Lobell et al., 2019; Gourlay et al., 2019).

[1] See Doss and Quisumbing (this volume citation) for an extensive review of gender-disaggregated data.
[2] Agricultural sector censuses such as FAO's World Programme for the Census of Agriculture include agricultural households and agricultural enterprises. For a recent review of this program, see WCA (2020).
[3] An exception in national surveys is the collection of network data in a few LSMS-ISA surveys, where information is collected from respondents on agents involved in the transaction of agricultural inputs and outputs.

Unfortunately, despite impressive progress in both traditional and new data sources, large gaps still persist in terms of the availability and quality of agricultural data. Furthermore, mounting global challenges such as rising inequality, climate change, and rapid population growth are likely to disproportionately affect the agriculture sector and rural areas, with more significant impacts for low- and middle-income countries. Meanwhile, the ongoing COVID-19 pandemic provided a stark reminder of the need to accelerate the production of more timely and accurate data to save lives. The pandemic has also exposed growing inequities in data systems across countries, with innovation moving at a faster pace in higher-income countries (United Nations and World Bank, 2020). Worse still, agricultural data gaps tend to be the largest where good data are needed the most, that is, in resource-constrained countries for which agriculture represents the lifeline of the majority of households and the whole economy. At the same time, the emergence and diffusion of complex farms in higher income countries (Kling and Mackie, 2019; Macdonald, 2016) creates new layers of difficulty in data collection and measurement. Recognizing that individual data sources are often unable to singlehandedly address these complex and multi-faceted challenges, researchers are increasingly focusing on the potential offered by improved data integration and interoperability between data sources.

While appreciating the importance of improving agricultural data in all countries along the entire income gradient, this paper intentionally focuses on some of the data challenges and scalable applications and tools most suitable to low- and middle-income countries. Because of this geographic focus, we primarily limit our discussion to household and farm surveys, as they are likely to remain the instrument of choice and backbone of agricultural data systems in many countries for years to come. The attention to surveys is also warranted by the availability of a fully developed total survey quality framework around which we develop the narrative of the paper. The growing attention to survey design issues and a burgeoning literature on rigorous survey methodological experiments (de Weerdt et al., 2020) also provide added motivation for the focus of the paper.
In this paper, we will argue and provide evidence that renewed attention to data quality issues – specifically in terms of measurement error and data coverage – is critical for advancing the research frontier in agricultural economics and designing better agricultural policy. Both measurement error and issues of limited data coverage threaten the internal and external validity of empirical analysis on agriculture, constraining its efficacy and relevance in informing sectoral policies and investments. A better understanding of measurement error and error-generating processes is crucial, as errors negatively affect the accuracy and validity of inferences resulting from data, and thus limit the usefulness of data to policy making. Given the significance of these issues, agricultural economists and survey practitioners have paid increasing attention to measurement error in recent years, drawing on insights from existing literature on labor economics, survey methodology, and statistics. The fact that this is the first paper fully dedicated to measurement and data is testament to the prominence that data, in general, and measurement issues, in particular, have acquired in the profession today.

The purpose of this paper is to demonstrate that improving agricultural data structures – that is, making agricultural data systems more credible and fit-for-purpose – can address both measurement error and coverage issues to facilitate better empirical analysis on agriculture. For our purposes, we define data structure as the full set of survey design choices that comprise the data production process, including sampling, questionnaire design, and fieldwork implementation. Today, technology and a well-piloted modernization agenda offer the opportunity to push the data production frontier, in terms of both the availability and the quality of data. Furthermore, increasing demands for evidence-based policy making and accountability have generated the tailwind to achieve critical advances in agricultural data in general, and agricultural survey data in particular. Addressing existing flaws in survey data would greatly contribute to raising the credibility and, ultimately, the quality of the resulting research and analysis (Jerven and Johnston, 2015). Achieving the "credibility revolution" in empirical research as advocated by Angrist and Pischke (2010) calls for better research design choices, which begins with addressing measurement error and coverage issues. Making agricultural research more policy-relevant, credible, and fit-for-purpose begins with improving the quality of its underlying data to expand the set of testable empirical models.

This paper highlights the importance of improving agricultural data structures for empirical analysis, while accounting for the inherent trade-offs intrinsic to designing data collection for agricultural research and policy analysis. In the section that follows, we review sources of measurement error from the perspective of the economics, survey methodology, and statistics literatures, referring to this rich bibliography for a more detailed discussion of the issues. In section three, we turn to design choices related to coverage, including sampling design, the unit of analysis, survey timing, data collection modes, and attrition.
The fourth section integrates sources of measurement error and coverage biases to assess their implications and trade-offs in the empirical specification of a few examples of agricultural models, documenting where innovation in data structure has advanced the research frontier. The fifth section offers innovative approaches for addressing measurement error and coverage biases in agricultural data, based on recent technological advances and foreseen opportunities. In the sixth and final section, we conclude with recommendations on priorities for accelerating improvements in the accuracy and coverage of agricultural data, ultimately to support higher-quality research for better agricultural policy.

2. Minimizing Measurement Error

Measurement error, and non-random measurement error in particular, has been discussed since some of the earliest work by Fisher (1926) and Working (1925). Since then, these topics have been extensively articulated and well-documented across many subdisciplines in economics, such as health, labor, industrial organization, and applied welfare analysis (Bound et al., 2001; Chesher and Schluter, 2002; De Haan et al., 2019; Gottschalk and Huynh, 2010; Hu and Schennach, 2008; Hyslop and Imbens, 2001; Pischke, 1995; Schennach, 2016, 2004; Rom et al., 2020). Most of these papers consistently highlight that the bias induced in parameter estimates depends on the structure of the measurement error found in the data, as well as the identifying assumptions that empirical economists make when estimating those parameters. Making the right assumptions about these structures and tackling the sources of errors, at both the design and analytical stages, can greatly improve the accuracy and relevance of agricultural data.

While the field of statistics boasts a rich and longstanding literature on measurement error (Biemer, 2010, 2009; Biemer et al., 1991; Biemer and Lyberg, 2003; Carroll et al., 2006; Deming, 1944; Groves, 1989; Groves and Lyberg, 2010; Kasprzyk, 2005; Kish, 1965; Wansbeek and Meijer, 2000), we have only more recently witnessed a burgeoning literature in agricultural and development economics journals addressing the sources, magnitude, and implications of measurement error, and proposing new ways to validate and correct for measurement error biases.

Measurement error can result in both bias and variable error, or variance. Non-random measurement error biases parameter estimates, leading to faulty conclusions and misguided policies. Even with random measurement error, increased statistical noise requires larger sample sizes to identify parameters of interest, increasing the cost of data collection. Hence, we again emphasize the importance of understanding the sources of measurement error and attenuating its impact.

In the field of survey methodology, the Total Survey Error (TSE) framework has been the dominant paradigm. The framework serves as a useful organizing structure for assessing the extent and composition of different sources of errors that affect estimates, guiding researchers and data collection practitioners towards appropriate design choices for minimizing measurement error and maximizing coverage (Groves and Lyberg, 2010). TSE "refers to the accumulation of all errors that may arise in the design, collection, processing and analysis of survey data" (Biemer, 2010).
The paradigm implies that total errors must be minimized for a given budget and that the major sources of errors should be identified and prioritized to achieve maximum accuracy for a given cost (Biemer, 2010). Broadly speaking, TSE can be viewed as encompassing the concept of data quality which, in statistical terms, is partially captured by the Mean Square Error (MSE), a metric of the accuracy of the estimated variable.

Minimizing measurement error in agricultural data has been problematic due to a number of inherent features of agricultural processes, particularly for certain crops and agronomic practices in smallholder farming. These features include the highly seasonal nature of production and the irregularity of inputs required in the sequencing of production. Multiple studies have shown that across a variety of issues, farmers' self-reported information, which often involves long recall periods, has proven to be inadequate (Beegle et al., 2013; Deininger et al., 2011; Fermont and Benson, 2011; Gourlay et al., 2017).

Although long aware of the existence of measurement error, only recently have agricultural economists shown interest in how these errors affect their inferences and the policy recommendations deriving from their analysis. Even when measurement errors were considered, the common practice was to make rather cavalier suppositions about the properties and distribution of the errors by assuming classical measurement error (CME) – that is, assuming that the error in the variable of interest is independent of its true value as well as of the measurement errors in all other variables in the model and the stochastic error term. While reliance on the CME assumption can be justified in some instances, it is seldom the case for many variables, for which the error-generating processes appear to follow more complex and systematic patterns that fail the classical assumption. The assumption appears to be even more troublesome for non-linear models (Bound et al., 2001). More recently, the agricultural economics literature has aptly focused on the potential systematic biases resulting from measurement error and how design choices and new technologies can help improve measurement (for some recent applications of non-classical measurement error in agricultural data, see Abay et al., 2019; Carletto et al., 2013; Desiere and Jolliffe, 2018; Gourlay et al., 2017). We argue that addressing potential bias ex-ante through appropriate design choices may ultimately be a more effective way to tackle the issue, although careful ex-post analysis and modeling may also be helpful in mitigating its impact on estimates (Gollin and Udry, 2021; Maue et al., 2020). Policy researchers hold the power and responsibility to make wiser design choices at the data collection stage for given objectives and budget constraints. To this end, the TSE framework provides a useful blueprint for understanding the underlying error-generating processes and the relative importance of the different components, as well as how to ameliorate their impact on estimates. While the TSE framework is useful for this paper, given its focus on sample surveys as one of the main sources of data for policy research in agriculture, it is important to note that most features of TSE also apply to other data sources.
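To fix the algebra behind two of the concepts just discussed – accuracy as captured by the MSE, and the CME assumption – the standard textbook results can be stated as follows (generic notation, not specific to any of the cited studies):

```latex
% Accuracy as Mean Square Error: a bias and a variance component
\mathrm{MSE}(\hat{\theta}) = \mathbb{E}\big[(\hat{\theta} - \theta)^2\big]
                           = \mathrm{Bias}(\hat{\theta})^2 + \mathrm{Var}(\hat{\theta})

% Classical measurement error (CME) in a single regressor:
y = \alpha + \beta x^{*} + \varepsilon, \qquad x = x^{*} + u, \qquad u \perp (x^{*}, \varepsilon)

% OLS on the mismeasured x is attenuated toward zero by the reliability ratio:
\mathrm{plim}\, \hat{\beta}_{OLS} = \beta\,\lambda, \qquad
\lambda = \frac{\sigma_{x^{*}}^{2}}{\sigma_{x^{*}}^{2} + \sigma_{u}^{2}} < 1
```

The attenuation result also previews why non-classical error is more troublesome: once u is correlated with the true value or with errors in other variables, the bias need not shrink toward zero and can take either sign.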
Biemer (2017), for instance, argues that TSE provides very useful insights on how to deal with errors in Big Data, drawing clear parallels between errors in surveys and the often selective, incomplete, and erroneous nature of Big Data-generating processes. As researchers increasingly rely on alternative data sources such as citizen-generated data and crowdsourcing to collect agricultural data, similar data quality frameworks should be developed for those types of data. However, even in the case of TSE, full consensus on a comprehensive typology of errors is yet to exist. Groves and Lyberg (2010) conclude that this lack of consensus is the natural consequence of the continuous evolution of methods and data collection technologies, as well as the different objectives and constraints of different data producers and analysts. As a result, any list defining the universe of TSE is bound to be incomplete and/or to emphasize certain components over others (Groves and Lyberg, 2010).

We must note here that focusing solely on minimizing total survey error with expensive measurement methods ignores the research design cost-variance trade-off and the full set of research design choices. For instance, a researcher may be willing to accept some degree of measurement error if reducing such error would also reduce the statistical power of the research design. If a researcher is implementing a randomized control trial, measurement error that is not correlated with treatment status may not bias estimates, whereas in a non-experimental design, measurement error might bias parameter estimates and thus have consequences for internal validity and policy recommendations.

To conceptualize these research design trade-offs more clearly, Dillon et al. (2020) build on earlier writing in the statistical literature (Biemer, 2010, among others) to introduce the idea of the data quality production function. For any given research project, the researcher's objective is to maximize the knowledge or evidence generated from the research project. To do so, the researcher makes decisions about the identification strategy, statistical power, and external validity of the project, subject to a budget constraint and the data quality production function. The data quality production function includes choices on questionnaire design as well as other variables such as sampling, empirical approach, and field implementation modes, protocols, and constraints. These latter choices include decisions based on the availability of financial resources, personnel capacity, and the competing demands and/or mandates of the researcher or agency collecting the data. Thus, measurement error and bias, which closely relate to the concept of internal validity, must be weighed against other important features of model inferences, including the power of the estimates, external validity and coverage, and the intended use of the data (Dillon et al., 2020).

From a user's perspective, data accuracy (and the costs involved in achieving it) must be weighed against other idiosyncratic user preferences related to the broader construct of fitness-for-use of the data (Juran and Gryna, 1980) as part of a broader Total Survey Quality (TSQ) framework (Biemer, 2010). This more complete construct of survey quality, going beyond accuracy, includes concepts such as comparability, relevance, timeliness, accessibility, credibility, usability, interpretability, completeness, and coherence (Biemer, 2010).
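The researcher's problem sketched by Dillon et al. (2020) above can be written in a stylized form – this shorthand is ours, not their notation: the researcher chooses a design d (sampling, questionnaire, mode, field protocols) to maximize the evidence value of the project subject to the budget and the data quality production function,

```latex
\max_{d}\; K\big(\mathrm{identification}(d),\; \mathrm{power}(d),\; \mathrm{external\ validity}(d)\big)
\quad \text{s.t.} \quad C(d) \le B, \qquad q = Q(d)
```

where Q(.) is the data quality production function mapping design choices into quality q, C(.) is the cost of the design, and B is the budget.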
Among the quality dimensions just listed, for instance, the temporal or spatial granularity of the estimates and other features related to improved coverage may be more important to some users, who may be willing to sacrifice some degree of accuracy in exchange. Another highly relevant dimension is the interoperability of the data and how data integration can improve accuracy and decrease bias while also playing a role in enhancing and/or reducing coverage. For instance, the use of mixed-mode data collection – such as high-frequency phone surveys that are fully integrated into a less-frequent face-to-face large-scale survey – has the potential to reduce measurement errors due to recall bias, but may introduce other problems such as under-coverage due to the incompleteness of sampling frames or higher levels of attrition. As proposed by Biemer and Lyberg (2003), one could treat all these additional dimensions of quality as constraints in an error minimization problem (Biemer, 2010). While highly relevant for sample surveys, the total survey quality paradigm can also be extended to other sources of data (Amaya et al., 2020).

Keeping in mind the specific design choices that researchers face, we group the possible sources of measurement error – corresponding to what Groves (1989) calls errors of observation – into five categories: (1) questionnaire design, (2) interviewer effects, (3) respondent effects, (4) mode of data collection, and (5) data processing. Equally important sources of errors may derive from incomplete coverage, or lack of representativeness (that is, errors of non-observation), including sampling errors as well as non-sampling errors, further categorized into coverage errors and non-response – we address these in the following section.

This taxonomy of sources of errors can be juxtaposed with a typical data structure – with units of observation in the rows and variables in the columns – to show the relationship, and thus potential trade-offs, between sources of measurement errors affecting variables (the columns) vis-à-vis non-coverage errors affecting units of observation (the rows), as well as the trade-offs between measurement error and coverage. It must be noted, however, that many of these sources and design choices may affect both measurement error and coverage (e.g., mode effects) or be correlated and have covariate effects on total error (e.g., interviewer and respondent effects). Furthermore, sources of measurement errors are likely to simultaneously affect multiple variables, both dependent and independent, generating complex error structures that have differential implications for inferences. Hausman (2001) reviews approaches to dealing with measurement error in either dependent or independent variables and in the case of continuous and discrete variables. Hyslop and Imbens (2001) provide a clear and succinct classification of the effect of measurement error on either dependent or independent variables, as well as on both.

A common approach to measurement error in empirical labor economics is to model measurement error as an 'errors in variables' problem whose proposed solution is an instrumental variable. However, increased concerns about weak instruments have caused such methods to be disfavored in labor economics, and this approach to measurement error in empirical agricultural economics has been rare.
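To illustrate the errors-in-variables logic concretely, the simulation below – a minimal sketch, not taken from any of the cited papers – generates a regressor reported twice with independent classical errors and uses the second report as an instrument for the first, in the spirit of Ashenfelter and Krueger (1994):

```python
"""Minimal sketch (illustrative only): two independent noisy reports of the
same regressor, with the second report used as an instrument for the first."""
import numpy as np

rng = np.random.default_rng(42)
n, beta = 10_000, 0.5

x_true = rng.normal(0.0, 1.0, n)              # correctly measured regressor
y = beta * x_true + rng.normal(0.0, 1.0, n)   # outcome

# Two self-reports of x_true with independent classical measurement errors
x1 = x_true + rng.normal(0.0, 1.0, n)
x2 = x_true + rng.normal(0.0, 1.0, n)

# OLS on the mismeasured x1: attenuated by var(x*)/(var(x*)+var(u)) = 0.5
beta_ols = np.cov(x1, y)[0, 1] / np.var(x1, ddof=1)

# IV: instrument x1 with x2; the reporting errors are independent, so IV is consistent
beta_iv = np.cov(x2, y)[0, 1] / np.cov(x2, x1)[0, 1]

print(f"true {beta:.2f} | OLS {beta_ols:.3f} | IV {beta_iv:.3f}")
```

With the simulated noise equal in variance to the true regressor, OLS recovers roughly half of the true coefficient (the reliability ratio), while the IV estimate is consistent so long as the two reporting errors are independent – exactly the assumption that the weak-instrument concerns noted above call into question in practice.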
Finally, classical measurement error in a continuous dependent variable may reduce statistical precision without necessarily introducing bias – but the cost of increasing sample size (adding more rows), particularly for numerous sample strata and domains of inference, is often high. While econometric approaches are inherently ex-post solutions to measurement error that take the data-generating process as given, we see opportunities for ex-ante solutions within current international efforts to build capacity in data quality assurance and methodological innovation in national statistical offices. These capacity building initiatives provide an opportunity to create better agricultural data structures that address research hypotheses and policy concerns by maximizing data quality. To this end, with due consideration to the various trade-offs, researchers can make design choices in several areas to minimize measurement error in the collection of agricultural data. Below, we present in detail the five main sources of measurement errors listed above. Understanding these groupings and their potential impact on bias and variance can help researchers make the right design choices for their research objectives.

2.1. Questionnaire design

Agricultural questionnaire design requires researchers and policy makers to clearly outline the unit of analysis and agricultural processes that they would like to measure. Rozelle (1991) outlined various approaches to agricultural survey design, such as production function approaches, income statement approaches, and balance sheet approaches, each of which requires a different questionnaire design. Production function approaches map inputs to outputs to estimate the returns to inputs. Income statement designs measure farm profits based on revenue and expense information. A balance sheet approach values farm assets and liabilities in addition to inputs and outputs (a stylized algebraic summary of the three designs follows below). An early resource for agricultural questionnaire design is the Reardon and Glewwe (2000) agricultural chapter in Grosh and Glewwe (2000), which outlines broad principles of agricultural module design in multi-topic household surveys. Dillon et al. (2021a) provide a recent updated review incorporating recent innovations in survey design choices for agricultural questionnaires, including the integration of plot-level crop production and input modules as well as livestock production questionnaires.

A broad questionnaire design literature explores best practices to minimize measurement error. Errors from questionnaire design may result from unclear wording, poor formatting, priming, excessive length of questions and instrument, sequencing and skipping of questions, duration of reference period, and differences in reference periods or the coding of responses (Schwarz, 1997; Fowler, 1995; Gideon, 2012; Iarossi, 2006; Manski and Molinari, 2008; Payne, 1980; Sudman and Bradburn, 1973; Sudman et al., 1996; de Weerdt et al., 2020). The impact of questionnaire design choices on data quality can be substantial, with even minor changes having adverse consequences on the accuracy and comparability of estimates (Beegle et al., 2020; Das et al., 2012; De Weerdt et al., 2016). Specification errors, which occur "when the concept implied by the survey question and the concept that should be measured in the survey differ" (Biemer, 2010), can also contribute to errors from poorly designed questionnaires.
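The three designs distinguished by Rozelle (1991) earlier in this subsection can be summarized in stylized form (the notation is ours, for exposition only):

```latex
% Production function approach: map input quantities x_k (and conditions A) into output
q = f(x_{1}, \dots, x_{K};\, A)

% Income statement approach: profit as revenue minus expenses across crops c
\pi = \sum_{c} p_{c} q_{c} \;-\; \sum_{k} w_{k} x_{k}

% Balance sheet approach: stocks alongside the flows above
\mathrm{Net\ worth} = \mathrm{Assets} - \mathrm{Liabilities}
```

Each identity implies a different questionnaire: the first requires plot-level input and output quantities, the second prices and expenditures, and the third asset and liability valuations – which is why the choice among them is a questionnaire design decision rather than a purely analytical one.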
One example of specification error in many agricultural surveys is lack of clarity when defining plots relative to parcels, which may have large implications for productivity estimates (see section 3.2 on units of analysis for a discussion of plots versus parcels). Lack of consistent specification in the definition of household membership, or contextual differences in the social and economic criteria of household membership, may also lead to faulty estimates (Beaman and Dillon, 2012). Other examples of questionnaire design choices are the use of rosters to collect individual or plot-level data, or the collection of individual components of income or profits in lieu of eliciting information in a more aggregated format (De Mel et al., 2008; Vijverberg and Mead, 2000).

In both paper and CAPI-based questionnaires, visual aids are widely used for capturing non-standard units of measurement for more accurate estimation of both agricultural production and food consumption (Eisenhower et al., 1991; Mathiowetz, 2000; Oseni et al., 2017). Particularly in electronic questionnaires, area maps using GPS are increasingly used for estimating land area, for listing dwellings and plots for sampling, as well as for supervision and quality control purposes.

Questionnaire length and complexity, as well as the sequencing of questions and modules, may also have important implications for measurement error (Kilic and Sohnesen, 2019; Strack et al., 1988; Schuman and Presser, 1981; Schwarz and Hippler, 1991). Furthermore, the use of an open or closed format may have an impact on responses, where closed questions with clearly identified response options can help respondents in both remembering information and choosing appropriate responses (Kasprzyk, 2005; Schwarz and Hippler, 1991). Finally, the language(s) and translation of the questionnaire, as well as differences in language and cultural background between the survey designer and the respondent, may also contribute to errors (Vaessen et al., 1987).

Agricultural surveys have attempted to reduce measurement error by leveraging the number of visits within the agricultural season to reduce the length of recall and align the visits to key stages of production. There are obvious costs to this approach, with a higher number of visits possibly increasing respondent and interviewer fatigue as well as field costs. Meanwhile, the potential advantages include reducing the length of the recall period, breaking up long, complex questionnaires and interviews, and using the first interview to identify respondents for specific follow-ups in a second interview or as a temporal reference point to help respondents better contextualize their answers (known as bounding questions). Evidence from the measurement of consumption clearly points to an excessive number of visits negatively affecting data quality, most likely linked both to respondent fatigue and interviewer effects (see Engle-Stone et al., 2017, for nutrient consumption in a survey employing up to 7 visits over 14 days in Bangladesh; Schündeln, 2018, for a consumption survey with households visited up to 10 times in Ghana).

Several of the surveys supported by the LSMS-ISA program (Dillon et al., 2021a) have attempted to adjust the survey visit schedule to the calendar of the agricultural campaign by scheduling a post-planting and a post-harvest visit.
This visit structure aimed to ease the cognitive burden on respondents by asking questions on agricultural operations and harvests at most a few weeks or months after they occur, instead of 12 months or more, while also limiting the number of visits to contain survey costs and respondent burden. Using data for Tanzania and Malawi, Wollburg et al. (2020) confirm the presence of non-random measurement error systematically related to the length of the recall period. They find evidence of such error in all the main variables of interest in any agricultural survey, including quantities harvested, labor and fertilizer inputs, and even the number of cultivated plots. The magnitude of the recall effect typically varies between two and five percent per additional month of recall length, rendering its impact on the reliability of key agricultural indicators economically significant. Recently, some African national statistical offices, such as the Uganda Bureau of Statistics, have been experimenting with an additional visit that can be used to collect supplementary objective measures on farm plots, including crop cuts (Ponzini et al., 2021). Technology is aiding these innovations by facilitating the transfer of information across survey visits via increasingly flexible CAPI applications.

2.2. Interviewer effects

Interviewer effects occur when personal characteristics of the interviewer, such as education, ability, motivation, or language barriers, affect the interview process. Proper recruitment, training, and monitoring of job performance are used to minimize errors associated with interviewer effects (Fowler, 2004). A meta-analysis of the literature (West and Blom, 2017) establishes that interviewer behavioral traits and demographic characteristics influence survey responses, and by extension, data quality. Response rates and response biases are particularly influenced by specific interviewer characteristics (such as age, ethnicity, experience, and education), behaviors (such as formal versus conversational interview styles), and cognitive and non-cognitive skills (such as mathematical ability, reading, attention to detail, and empathy).

Existing literature on interviewer effects finds that data quality can be a function of who is asking the questions. Responses vary by the interviewer's ethnicity (Davis et al., 2010; Davis and Silver, 2003), gender (Benstead, 2010; Flores and Lawson, 2008), and religion (Blaydes and Gillum, 2013), especially for questions sensitive to race, gender, and religion, respectively. Studies have also explored the association of data quality with interviewer skills and behaviors such as probing, providing feedback for responses, and rapport building (Belli et al., 2004). Some interviewer characteristics are fixed, while skill-based characteristics may change in response to training.

Responses and measurement error may also vary based on the interviewer's adherence to a script. For instance, in the context of the Agricultural Labor Survey in the United States, Ridolfo et al. (2021) show how interviewers' lack of adherence to the script resulted in significant measurement errors. Similarly, using the same survey, Rodhouse et al. (2019) quantify the extent to which deviating from the script affects the likelihood of measurement errors and conclude that the presence of measurement error is highly associated with the interviewer's ability to adhere to the script.
Biagas et al. (2019), using the same data, use a novel multi-method approach to identify patterns of interviewer behavior and their contribution to total survey error. Recent research on interviewer effects in a randomized experiment in Uganda by Di Maio and Fiala (2019) found that interviewer characteristics, and their differences from respondent characteristics, affected survey responses and ultimately data quality for sensitive topics. On the contrary, responses to less sensitive topics were much less, or not at all, susceptible to interviewer characteristics. This is supported by additional research suggesting that the salience and sensitivity of the questions influence the nature and magnitude of interviewer effects (Himelein, 2015; Laajaj and Macours, 2021). Marx et al. (2018) provide evidence on the impacts of team composition and ethnic diversity on interviewer performance. Data on the time use of field teams suggest that teams composed solely of interviewers organize tasks more efficiently than teams that include supervisors, interviewers, and data monitors, which demonstrate lower levels of effort.

In a review of several studies, Groves (1989) suggests that demographic traits result in interviewer effects only when the specific question is related to the demographic characteristics of the interviewer (i.e., an interviewer effect based on the race of the interviewer may be found for questions about race). This may be particularly relevant in contexts with large ethnic and racial diversity. The effect of priming in surveys and the inconsistent application of interviewing protocols and wording across interviewers is also likely to generate systematic biases (Lavrakas, 2008). Similarly, the interview setting may also affect interviewers' recording of responses and result in systematic errors. Collecting detailed metadata on the interview process is often used to partially control for potential biases generated by poor interview settings; unfortunately, this practice is not consistently applied across surveys.

2.3. Respondent effects

Respondents can also contribute to TSE in several ways, either intentionally or unintentionally. Assumptions about the structure of those respondent biases are often uninformed by empirical evidence, although Hyslop and Imbens (2001) provide a categorization of different types of potential biases. For instance, respondents may intentionally under-report the amount of land they own because of taxation concerns or may conversely over-report their land holdings because of prestige considerations or social desirability. Social desirability concerns are likely to result in the under-reporting of "socially undesirable" behavior, and the over-reporting of socially desirable occurrences (Bound et al., 2001). For some agricultural statistics, such as child labor, context may determine whether children's work in agriculture carries social stigma and hence potential reporting bias. Similar response behavior may also occur with the reporting of income variables (Tourangeau et al., 2000). Respondents may also round the amount of land owned to integer values, resulting in a phenomenon known as heaping. Research on land area measurement consistently finds systematic errors in farmers' self-reporting, with farmers that own smaller land holdings systematically over-reporting (and farmers with larger land holdings under-reporting), as well as considerable heaping in reporting (Carletto et al., 2015, 2013).
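A simple diagnostic for the heaping just described is to compare the share of reported areas falling exactly on round values against a measured benchmark; the sketch below simulates this check (all values are artificial, and the 0.5-hectare rounding grid is an arbitrary illustration):

```python
"""Illustrative check for heaping in self-reported land areas: the share of
reports lying exactly on 'round' values. Variable names and the rounding
grid are hypothetical."""
import numpy as np

rng = np.random.default_rng(0)
true_area = rng.lognormal(mean=0.0, sigma=0.7, size=5_000)  # hectares

# Simulated self-reports: half of respondents round to the nearest 0.5 ha
rounds = rng.random(5_000) < 0.5
reported = np.where(rounds, np.round(true_area * 2) / 2, true_area)

def share_at_multiples(x, step=0.5, tol=1e-6):
    """Share of observations lying (numerically) on multiples of `step`."""
    return np.mean(np.abs(x / step - np.round(x / step)) < tol)

print(f"heaped share, true areas:   {share_at_multiples(true_area):.3f}")
print(f"heaped share, self-reports: {share_at_multiples(reported):.3f}")
# A large gap between the two shares flags heaping in the self-reports.
```

In real data the benchmark would be the GPS or compass-and-rope measures used in the validation studies cited above, rather than a simulated truth.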
Errors in responses may be unintentional, resulting from limited knowledge or from recall bias due to memory decay as the length of the recall period increases. Errors may also derive from limited understanding of the questions, potentially correlated with the cognitive level of the respondent (Laajaj et al., 2019; Laajaj and Macours, 2021). The use of bounding techniques, providing an easy-to-remember temporal reference point in respondents' memory to better contextualize the answer, can reduce the effect of telescoping in recall (Abate et al., 2020; Neter and Waksberg, 1964). Recall biases are also affected by the salience of the event being recalled (Beegle et al., 2012; Gaddis et al., 2020; Kilic et al., 2021; Wollburg et al., 2020). Gaddis et al. (2020) and Arthi et al. (2017) analyze the impact of recall on the measurement of agricultural labor. Their findings suggest that a seasonal recall approach to agricultural labor measurement may result in underestimated labor productivity. In their cross-country study, Beegle et al. (2012) find no evidence of bias in harvested quantities for both staple and cash crops. Recall bias, however, was present in hired labor reporting, although the direction of these biases varied by country. As already mentioned, similar findings emerge from Wollburg et al. (2020) related to the design choices of number and timing of field visits, as more visits and shorter roll-out periods reduce the length of the recall period.

Interestingly, at least in domains outside of agriculture, perceptions of salience may be influenced by the length of the recall period (Winkielman et al., 1998) and may vary by the income level of the respondent (Das et al., 2012). Understanding respondents' cognitive strategies is crucial for choosing the most appropriate length of recall and thus minimizing measurement errors in responses. Evidence suggests that beyond a certain recall length, respondents switch from enumeration to estimation, each translating into different errors (de Nicola and Giné, 2014; Scott and Amenuvegbe, 1991).

The use of proxy respondents and widespread reliance on the most informed respondent – often the male head of the household – is also likely to result in biased responses (Dillon and Mensah, 2020; Doss et al., 2019; Kilic et al., 2020; Kilic and Moylan, 2016; Krosnick, 1999; Moore, 1988). Bardasi et al. (2011) find that using proxy responses led to the under-reporting of men's participation rates in agricultural activities. Kilic et al. (2020a) show significant impacts of respondent selection strategy on the collection of labor data. Kilic et al. (2020b) also find that the common practice of proxy reporting results in different reporting of land assets relative to those reported by self-respondents. The use of proxy reporting by the "most knowledgeable household member" results in higher rates of exclusive ownership of agricultural land among men, and lower rates of joint ownership among women, as compared to the gold standard approach of individual, self-respondent interviews (Kilic et al., 2020b). In this context, the interview setting has also been shown to greatly affect responses. For instance, the common practice of non-private interviewing
(i.e., where other members of the household and community may be present during the interview), more often conducted through proxies, results in significant under-reporting of employment relative to measurement through private, self-respondent interviews, with stronger effects for women than men (Kilic et al., 2020a). Dillon and Mensah (2020) note that when proxies report household-level agricultural variables as opposed to individual-level responses, proxy response bias is composed of both aggregation errors and asymmetric information within the household. Thus, their findings suggest that proxy response bias is not solely due to asymmetric information within the household, as is commonly assumed in the literature on proxy response bias for individual-level variables.

Measurement error can also derive from the use of peers (e.g., neighbors, co-workers, key informants, etc.) as proxy respondents, potentially resulting from projection or false consensus biases, among other things (Hogset and Barrett, 2010). Despite the potential biases of proxy reporting of peer behaviors (Ashenfelter and Krueger, 1994), the practice is widely used (Hogset and Barrett, 2010). In some cases, however, the practice may be justified, such as when collecting highly sensitive information for which gathering data directly from the subjects may be sub-optimal. While the use of proxy respondents should be minimized to the extent possible, one must also acknowledge the trade-offs between respondent bias and coverage, as restricting interviewing to the selected respondents is likely to result in higher attrition and unit missingness. Furthermore, the use of proxy respondents is often unavoidable due to logistics or cost considerations. In such cases, the proxy respondent selection process should be conducted based on strict standardized field protocols.

2.4. Mode of data collection

The mode of data collection – whether face-to-face, self-administered or interviewer-led, by phone, or by web, and whether on paper or in electronic format – can have substantial effects on measurement error as well as coverage. In terms of measurement error, several studies have shown that the effect depends on the type of question as well as on interviewer ability and respondent characteristics (Biemer and Lyberg, 2003; Caeyers et al., 2010; De Leeuw, 2005; De Leeuw and Van der Zouwen, 1988). While errors of coverage may result from incompleteness of frames and respondent selection, phone or web surveys may allow more frequent data collection for greater temporal granularity and lower measurement error due to recall bias. Similarly, crowdsourcing and other forms of citizen-generated data are increasingly used in agriculture and potentially offer enormous opportunities for collecting data at greater temporal and spatial resolution. However, these modes of data collection also exhibit serious limitations in terms of representativeness as well as overall data quality which, if left unaddressed, are bound to produce biased inferences (Aceves-Bueno et al., 2017; Japec et al., 2015; Wiggins et al., 2011; Ambel et al., 2021; Brubaker et al., 2021). Using data from several African countries, Brubaker et al. (2021) address the issue of representativeness of phone surveys for gender-level estimates based on individual non-random respondents in the household, most commonly the head of the household, and propose ways to mitigate the bias.
Gibson et al. (2017) summarize the literature on data quality comparing phone to face-to-face interviewing, much of which focuses on health rather than agricultural variables. A significant increase in methodological work related to data quality in phone surveys occurred in response to the COVID-19 pandemic. Das et al. (2021) and Dillon et al. (2021b, c) provide summaries of random digit dialing phone surveys related to improving response rates, optimal timing of phone survey call attempts, and the impact of pre-survey text messaging. Greenleaf et al. (2020) estimate a difference of 20 percentage points in reported contraceptive use, with higher reported use under phone interviewing. In this case, presumably because sensitive subjects may induce lower reporting, phone interview modes may decrease measurement error by providing respondents more anonymity. In Lamanna et al. (2018), phone interviewing methods also induced higher reporting in dietary measures, with increases of 28, 14, and 18 percentage points for minimum dietary diversity, minimum meal frequency, and minimum acceptable diet measures, respectively. Mahfoud et al. (2014) compared estimates of alcohol consumption and exercise, finding a four percentage point increase in reported alcohol consumption and a seven percentage point decrease in reported exercise levels under the phone survey mode relative to face-to-face interviewing. However, not all studies find statistically significant impacts on data quality by survey mode; for example, Gallup (2012) finds no differences between survey modes in an experiment in Honduras. Furthermore, much of the existing evidence does not test the behavioral mechanisms that might explain why responses differ by survey mode (Tourangeau and Yan, 2007).

2.5. Processing errors

Finally, processing errors include possible errors generated during data entry, editing, coding, weighting, and analysis of data. New technologies and data processing power have transformed the set of opportunities and ways to reduce processing errors. Unfortunately, the relatively faster growth in the volume of data being generated, combined with the complexity of the new data landscape, has created additional challenges in terms of processing errors. For a review of processing errors and how they may impact total error, see Biemer and Lyberg (2003).

Many authors emphasize and empirically demonstrate how new data sources, methods, and technologies have proven instrumental in attenuating the sources of measurement error in data collection, as discussed in section 5 of this paper (Abay et al., 2019a, 2020; Dillon et al., 2019; Gourlay et al., 2019; Kosmowski et al., 2019; Lobell et al., 2020). In particular, the recent proliferation of well-designed validation studies, relying on complementary data sources and readily available technology, is contributing to a better understanding of the relationship between the measured and true value of the variable of interest, as well as providing insights on the magnitude and direction of potential bias. For instance, recent advances in CAPI, the use of sensors and other direct measurements, and the use of metadata and paradata are increasingly being used to offset the threats to data quality of more traditional data collection techniques that rely heavily on farmers' self-reporting (Akogun et al., 2020; Pratt et al., 2020; Sinha et al., 2020).
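As a concrete example of how CAPI paradata can support quality control, the sketch below computes interview durations from timestamp paradata and flags implausibly short interviews for supervisory review; the column names and the 20-minute threshold are hypothetical choices for illustration, not a protocol from the cited studies:

```python
"""Illustrative use of CAPI paradata for quality control: flag interviews
whose recorded duration is implausibly short. Column names and the
20-minute threshold are hypothetical."""
import pandas as pd

paradata = pd.DataFrame({
    "interview_id": [101, 102, 103],
    "start_time": pd.to_datetime(
        ["2021-07-01 09:00", "2021-07-01 10:40", "2021-07-01 11:05"]),
    "end_time": pd.to_datetime(
        ["2021-07-01 10:25", "2021-07-01 10:52", "2021-07-01 12:30"]),
    "interviewer": ["A", "A", "B"],
})

paradata["minutes"] = (
    paradata["end_time"] - paradata["start_time"]).dt.total_seconds() / 60
paradata["flag_short"] = paradata["minutes"] < 20  # review threshold

# Share of flagged interviews by interviewer, for supervision dashboards
print(paradata.groupby("interviewer")["flag_short"].mean())
```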
3. Trade-offs in Maximizing Coverage

As described in the previous section, measurement error can be addressed ex-ante through design choices, or ex-post through proper analysis and modeling, particularly when error-generating processes are non-random and thus potentially result in biased inferences. Aside from measurement error due to the poor design and administration of survey instruments, there are several other non-sampling errors which negatively affect researchers' analyses if left unaddressed. Of particular concern are coverage errors, which occur when individuals or units of interest are excluded from the sample, with serious repercussions for the validity of the estimates, as they fail to represent the entire population or area of interest. Most troublingly, errors of coverage are seldom random and tend to exclude subpopulations of interest, such as female farmers, smallholders living in remote areas, pastoralist and nomadic populations, and more distant plots, among other marginal groups. They may also result from the use of particular sampling frames such as population-based listings (based on the population living in the area covered by the survey) or area frames (based on the geographic area covered in the survey) which, by construction, exclude or undercount certain subgroups. For example, using population-based listings to collect agricultural data for the estimation of total national agricultural production or the farm size-productivity relationship prevents the inclusion of larger commercial farms in the list. The exclusion of commercial farming from population-based listings may also hamper the analysis of potential spillovers in terms of labor and other inputs for neighboring smallholders. For instance, Ali et al. (2019), using data from Ethiopia, find little or no benefits in terms of job creation, input market access, or technology transfer due to the presence of large farms. The possible exclusion or undercounting of medium- and large-scale farms from population-based listings may also constrain analysis of the rapid process of agricultural transformation and land consolidation occurring in many countries (Jayne et al., 2019).

The advantages of population-based sampling frames relative to area frames have also been questioned, based on the assertion that respondents tend to under-report on plot ownership and use, thus leading to the underestimation of total production. On the other hand, the use of area frames may result in systematic biases in the collection of socioeconomic variables from the plot owners. The use of multiple frame sampling is often advocated as a way to overcome the limitations of either approach (Gonzalez Villalobos and Wigton, 2011; FAO, 2015b).

Coverage errors can also derive from omitted variables in model specification due, for instance, to missing environmental variables affecting production decisions (Sherlund et al., 2002). Information on environmental conditions capturing inter-farm heterogeneity that affects farmers' choices is seldom collected in surveys and is thus often missing, potentially biasing the resulting estimates. With the widespread availability of inexpensive geospatial data that can be linked to household-level data, filling these gaps and potentially reducing the risk of omitted variable bias has become increasingly easier. However, the georeferencing of survey data at farm and plot level is yet to become common practice in many low- and middle-income countries.
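As an illustration of such linking, the sketch below samples a raster of an environmental variable at georeferenced plot locations; the file name, coordinates, and variable are placeholders, the rasterio package is assumed to be available, and the coordinates are assumed to share the raster's coordinate reference system (they would otherwise need reprojection):

```python
"""Sketch of linking georeferenced plot coordinates to a geospatial raster
(e.g., rainfall or NDVI) to fill in missing environmental covariates.
The file name and coordinates are placeholders."""
import rasterio

# (longitude, latitude) of plot centroids, e.g., from a CAPI-collected GPS module
plot_coords = [(32.58, 0.32), (34.76, -0.10), (33.20, 1.05)]

with rasterio.open("annual_rainfall.tif") as src:  # placeholder raster
    # sample() yields one array of band values per coordinate pair
    rainfall = [vals[0] for vals in src.sample(plot_coords)]

for (lon, lat), r in zip(plot_coords, rainfall):
    print(f"plot at ({lon:.2f}, {lat:.2f}): rainfall = {r}")
```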
Furthermore, many of the hurdles related to data privacy for secure and confidential dissemination and use remain unresolved.

In addition to non-coverage errors due to a priori exclusion from the frame, non-response and other sources of attrition are also likely to affect the validity of the estimates. Non-response can be further divided into unit and item non-response. Unit non-response occurs when a unit of concern is included in the frame but is either not reached or is unwilling to participate in the survey. This missingness is seldom random and is often treated ex-post through imputation methods (Kilic et al., 2020; Rubin, 1987, 1996). Conversely, item non-response occurs when a respondent fails to provide information for a question during interviewing. This missingness is most often handled at the data processing stage through complex imputation methods. However, the lack of consistent guidelines and practices for value imputation, combined with poor documentation of how missing values have been treated – potentially leading to systematic errors – once again highlights the potential trade-offs between coverage and measurement error.

In line with the broader total survey quality framework, we review the issue of coverage error found in the statistical literature, considering several of the dimensions of coverage related to fitness-for-use. As postulated above, researchers are likely to be faced with trade-offs between measurement error and other dimensions of data quality. Furthermore, these trade-offs are also related to the particular data source. For instance, the feasibility of various possible decisions on the unit of analysis, timing, or level of disaggregation will vary based on whether data is being collected through an agricultural census, a national survey, or a randomized control trial. Coverage error is also affected by the mode of data collection, such as face-to-face, phone, or web-based interviewing, which can also lead to mode effects, another factor contributing to total error. In this section, we describe some of the design choices related to sampling frames, units of analysis, survey timing, modes of data collection, and attrition, all of which have implications for data coverage and the trade-offs between measurement error and coverage.

3.1. Sampling frame

The sampling frame used for agricultural surveys, whether based on population listings or geographic areas, may often be missing some sub-populations or units of concern. Under-coverage could be accidental or intentional and, if deliberate, may be driven by many motives. Of particular relevance to smallholder agriculture is the issue of under-coverage of more remote areas, farm holdings, or individual plots, driven by either cost considerations or convenience (Kilic et al., 2017). Capturing pastoralist and transient populations is also particularly challenging, and their exclusion is likely to result in biased estimates (Himelein et al., 2014). As stated above, coverage will also depend on the type of sampling frame chosen. Multi-purpose household surveys like the LSMS-ISA use a population-based listing, with the household as the unit of analysis. More recent agricultural surveys, particularly in more developed economies, have increasingly relied on area or point frames, which present advantages when the objective is to estimate production at the national or sub-national level.
In fact, one concern with using population-based listings is the possibility of missing some plots due to misreporting, thus resulting in underestimates of total production. This problem is further compounded for pastoralist and semi-nomadic populations, which are entirely missing from or hard to reach using population frames; in these cases, using area frames in combination with population listings may be more appropriate (Himelein et al., 2014). However, collecting socioeconomic information on the farm household starting from a given area or point is particularly challenging, and hardly ever done in socioeconomic studies, thus limiting the analytical use of the data. Finding ways of reconciling the choice of frames and maximizing coverage by linking multiple frames has been the subject of recent research under the 50x2030 Data Smart Agriculture initiative (D’Orazio, 2020).

One key shortcoming of using a population frame for agricultural data collection is the wholesale exclusion of medium and large commercial farms. This has been the practice in agricultural household surveys such as those conducted under the LSMS-ISA initiative, which has raised concerns about the validity of inferences made on a truncated distribution of farm holdings (Muyanga and Jayne, 2019; Ali and Deininger, 2014). On the other hand, agricultural censuses and farm surveys that focus on both farming households and commercial farms may be more suitable for sector-level estimation of agricultural indicators, but fall short of meeting the analytical objectives of surveys of households as both production and consumption units (Singh et al., 1986). The use of multi-frame sampling strategies, combining the strengths of the individual sources, is gaining ground in lower-income countries, albeit at a slower pace due to capacity constraints and the technical difficulties involved.

When a household listing is chosen as the population-based frame, the definitions of household and household membership have important measurement implications, in part because social and economic definitions of the household diverge (Beaman and Dillon, 2012). This is especially the case in communities with extended farming families with land inheritance claims, common production of family lands, or complicated land use rights. The household definition matters from an agricultural perspective, as household membership defines the individuals that will be included in modules on agricultural land holdings, labor, assets, and marketing decisions. Unsurprisingly, reported values of agricultural production are lower when the household definition excludes some agricultural producers. Residency requirements also complicate the measurement of pastoralist activities when households are involved in transhumant pastoralism.

3.2. Units of analysis

Agricultural data is often collected at different units of analysis, including at national or sub-national levels, geospatial area units, and at the household, farm, plot, or plot-crop-season-manager levels. Depending on the frame used, selection may be based on a population listing or a map subdivided into grids. It may be the case that, for a specific application, both types of frames are used, which requires reconciling the estimates at the national or sub-national level. For instance, Pelletier et al. (2020) use small area estimation (SAE) to reconcile deforestation estimates from area-based frames with smallholders’ use of modern inputs drawn from a population-based frame.
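For readers unfamiliar with the mechanics, the following is a stylized Python sketch of the shrinkage logic that underlies small area estimation in the spirit of a Fay-Herriot model; all numbers are hypothetical, and production applications would estimate the between-area model variance rather than fix it, as is done here for brevity.

    import numpy as np

    direct = np.array([2.1, 3.4, 1.8])       # direct survey estimates by area
    samp_var = np.array([0.40, 0.05, 0.90])  # their sampling variances
    synthetic = np.array([2.5, 3.0, 2.4])    # model predictions from covariates
    model_var = 0.20                         # assumed between-area model variance

    gamma = model_var / (model_var + samp_var)    # shrinkage weights
    eblup = gamma * direct + (1 - gamma) * synthetic
    # Areas with noisy direct estimates (large samp_var) are pulled toward
    # the synthetic prediction; precisely measured areas keep their own value.

The design choice is the trade-off this paper keeps returning to: areas too small for a reliable direct estimate borrow strength from auxiliary data, at the cost of model dependence.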
In fact, it is conceptually appealing to use area frames when the focus is the measurement of land area and related agronomic features, as a population-based survey may result in under-counting plots and farming activities. On the other hand, the use of population-based listings may be more appropriate than area frame sampling in capturing data from hard-to-reach farmers living far away from their plots. The use of multiple frames, which in the agricultural domain involves the combined use of both area and list frames, has been advocated as a way to ensure completeness of the frame and full coverage of the sector (FAO, 2015b; Gonzalez Villalobos and Wigton, 2011). However, avoiding duplication and overlapping units is often a challenge when constructing multiple frames. Indirect sampling has also been proposed as a way to overcome the shortcomings intrinsic to household listings as frames for agricultural statistics, by applying the Generalized Weight Share Method (GWSM) to obtain estimates for landholdings (the unknown and more relevant universe) from a household listing (the known population) (Falorsi et al., 2016; Gennari et al., 2013).

Livestock sector outcomes should be recorded at the herd level rather than at the household level, particularly for nomadic populations. Household surveys with population-based sampling frames will never capture herd-level outcomes, for which area-based sampling may provide a more suitable basis.

In an example of innovative survey design, LSMS-ISA surveys redesigned agricultural modules by recognizing that the unit of observation for agricultural production is often not the household but the plot, which may be managed by different household members, with significant implications for sex-disaggregated analysis. In early versions of LSMS surveys and many other multi-topic household surveys, agricultural activities were not detailed at the plot level, as this imposes a higher respondent burden relative to household-level recall. However, a plot-level approach more accurately measures the relationship between inputs and outputs in agricultural production. Production heterogeneity results from differences in the crops cultivated, which require different levels of inputs, and from plots that may not be managed by the household head. The level of detail required in subsequent modules makes this survey design choice non-trivial. For example, plot-level data collection requires not simply measuring land area at the plot level, but also production, labor, capital, chemical inputs, and land management techniques.

Choosing to measure production at the plot level requires a wider series of choices in identifying the unit of analysis, which has implications for both design and implementation. First, agricultural production is seasonal, and multiple crop cycles on a given plot need to be measured. Second, plots are not always associated with a single crop, as multi-cropping or inter-cropping is a common land management practice for increasing yield and land quality. In contexts such as smallholder agriculture in Sub-Saharan Africa, inter-cropping is the norm, not the exception. For instance, two-thirds of maize plots in Uganda include multiple crops (authors’ calculations based on the 2018 Uganda National Panel Survey). Third, property rights are not necessarily well established in many rural contexts, and multiple family members may work on the plot or cultivate the plot in different seasons.
The landowner may differ from the person making decisions to cultivate the land, as sharecropping, land leasing, or land lending may mean that landowners are not making agricultural decisions. Holden et al. (2016) describe survey design choices in modules used to describe land tenure. Depending on the empirical applications of the data collected, information on the sources of land acquisition (including inheritance and legal status), land transactions, formal and informal property rights, land conflicts, perceptions of tenure security, and trust in land-related institutions may all be important additional questions complementary to the land roster. This is especially true if the survey aims to monitor the Sustainable Development Goals (SDGs) related to land tenure, particularly SDG indicators 5.a.1 and 1.4.2. Recognizing the importance of land tenure security as it relates to the control of and access to other assets, through access to credit and investment in land, for example, these SDG indicators seek to measure specific aspects of land tenure at the individual level, rather than at the household level.

In summary, the key innovation in conducting plot-level production analysis is not to simply measure inputs and outputs at the plot level, but to distinguish the unit of analysis as plot-crop-season-manager. This unit of analysis facilitates comprehensive measurement of household production, allowing multiple analytical strategies from seasonal, crop, and gender perspectives, but also has some limitations, particularly in the context of a panel survey, given the changing demarcation of plots across seasons. Tracking parcels over time is often a more feasible option.

Across different agricultural systems, the vocabulary associated with an agricultural landholding may also differ. Farmers use different words to indicate their farms, parcels, and plots, often with contradictory meanings. It is important that any agricultural survey design reflects a clear conception of the hierarchy of units consistent with the agricultural system that is being measured. Carletto et al. (2016) provide an overview of land area measurement survey design issues, noting differences in units of land measurement as well as variation across LSMS-ISA surveys in land reporting units. Holdings, parcels, fields, and plots all have internationally accepted definitions, although their interpretation by academics, NSOs, and policy makers often leads to ambiguity. The agricultural holding is the primary unit of analysis in agricultural surveys, whereas the household is the primary unit of analysis in household surveys. The Food and Agriculture Organization of the United Nations (FAO) (2015) defines the agricultural holding as an “economic unit of agricultural production under single management comprising all livestock kept and all land used wholly or partly for agricultural production purposes, without regard to title, legal form or size...” Holdings can be divided into parcels, and the FAO notes that “a distinction should be made between a parcel, a field and a plot”, where “a field is a piece of land in a parcel separated from the rest of the parcel by easily recognizable demarcation lines such as paths, cadastral boundaries, fences, waterways or hedges. A field may consist of one or more plots, where a plot is a part or whole of a field on which a specific crop or crop mixture is cultivated, or which is fallow or waiting to be planted” (FAO, 2015).
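One way to make this hierarchy concrete is as a set of nested survey data structures. The Python sketch below encodes the FAO tiers, with observations recorded at the plot-crop-season-manager level as discussed above; the field names are purely illustrative, not a standard schema.

    from dataclasses import dataclass, field
    from typing import List

    @dataclass
    class PlotObservation:          # one row per plot-crop-season-manager
        plot_id: str
        crop: str                   # or a crop mixture on inter-cropped plots
        season: str
        manager_id: str             # household member managing this plot-season
        area_ha: float
        output_kg: float

    @dataclass
    class Field:                    # demarcated piece of land within a parcel
        field_id: str
        observations: List[PlotObservation] = field(default_factory=list)

    @dataclass
    class Parcel:
        parcel_id: str
        fields: List[Field] = field(default_factory=list)

    @dataclass
    class Holding:                  # single-management economic unit (FAO)
        holding_id: str
        parcels: List[Parcel] = field(default_factory=list)

Writing the tiers out this way makes explicit that aggregating to any higher level (field, parcel, holding, or household) is a deliberate modeling choice, not a property of the data.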
However, when designing and implementing an agricultural survey, practitioners should confirm the tiers and definitions used by the national statistical office, as these may not always coincide with the FAO definitions. As suggested above, tracking parcels may be a more practical option for longitudinal studies, given the changing size of plots across seasons.

Recording agricultural information at the household level inherently aggregates individual production and imposes a linearity assumption across plots for input utilization and asset use. The main trade-off in recording agricultural information at the plot level is that farmers must recall input allocation at the plot level, which requires more cognitive effort and response time. These recall biases may be compounded by proxy response bias, as plot-level self-reporting is time-consuming in the field and may not be feasible for all survey responses. Proxy respondents may have incomplete information on plots managed by other household members. For farmers who purchase inputs collectively with their family for multiple plots, it may be difficult to accurately assess how much fertilizer, seed, or other input was applied to a particular plot. As with time use data, it may also be difficult for a farmer to recall individual household labor allocations to particular plots over an agricultural season or with respect to particular agricultural tasks. While more research is needed to understand the measurement implications of disaggregating input and production data from the household level to the plot level, the known analytical advantages of doing so – such as the analysis of male-managed plots vis-à-vis female-managed plots – have been judged to outweigh the still poorly understood measurement risks in many surveys, including LSMS-ISA surveys.

Due to variation in land tenure status and land use rights, it is also important to account for seasonality in production on plots and changes in plot management when considering the unit of analysis. Depending on the agricultural season, a plot may be cultivated by a different member of the household and use different levels of inputs along with different cropping choices. Researchers have often cited asymmetries in crop type and input use, and therefore productivity and earnings, by the gender of the plot manager. O’Sullivan et al. (2014) estimate that, after controlling for plot size and region, productivity differences across male- and female-managed plots in Africa ranged from 23 to 66 percent. In order to appropriately account for plot-level production, and to enable analysis of the timing of production and gender asymmetries, both the season and the plot manager should be recorded. The plot manager may differ from one season to the next, often depending on gender-based norms. Just as there are trade-offs in empirical specifications among units of analysis, differences in units of analysis also imply different constraints when repeated observations are an objective of the survey design.

3.3. Survey timing

Survey timing is a critical design choice that affects coverage as well as measurement error due to questionnaire design or respondent effects. Survey timing can refer to the timing of visits within a single agricultural season, as well as the timing of visits between seasons. LSMS-ISA surveys feature an innovative survey design that includes repeated visits both within and between seasons (Carletto et al., 2010).
New international surveys such as those produced under the 50x2030 Initiative will also feature multiple within- and between-season observations. Here, we review survey timing decisions that increase coverage across certain dimensions of time. As a research and policy issue, seasonality and the presence of multiple cropping cycles imply that agricultural survey visits may be best timed according to cropping cycles or the agricultural calendar. Survey timing choices can decrease recall bias, particularly for agricultural choices such as input decisions, labor allocation, or sales, which are often frequent and difficult to recall. LSMS-ISA surveys collect agricultural information during the post-planting and post-harvest periods, but timing could be more frequent to correspond to multiple cropping periods, which may overlap, particularly when agricultural systems have fewer water constraints. When variables such as labor inputs are a research objective, higher-frequency surveys may reduce recall bias as well as increase statistical power, depending on the autocorrelation of the variable over time (McKenzie, 2012).

Agricultural panel surveys most frequently track households between seasons to capture changes in production decisions over time, and their correspondence to changes in welfare. Despite interest in understanding changes in agricultural activities over time, plots are rarely tracked in panel surveys, as tracking plots over time is a time-intensive field activity that limits coverage. For example, LSMS-ISA surveys are conducted as a household panel (or household-parcel panel in select countries), with repeated cross-sections of a tracked household’s plot, production, and input information. Variation in production and other agricultural variables, as well as the ability to monitor shocks and household resilience, could also be captured through community sentinel sites complementing less frequent surveys (Barrett and Headey, 2014). The authors convincingly argue for establishing a multi-country system of sentinel sites in selected communities as a way to improve the timeliness and coverage of agricultural data, in the face of ever more frequent shocks affecting the resilience of rural households.

3.4. Mode of data collection

As discussed in the previous section, the choice of survey mode may have significant implications in terms of measurement error, either directly or through its interaction with other design choices related to questionnaire design, interviewer selection, and respondent features. Similarly, certain modes of data collection may also affect survey coverage. Poor representativeness, due to inadequate sampling frames, selectivity, and potentially high attrition, is a major challenge for phone surveys (Ballivian et al., 2015; Gibson et al., 2019). For instance, the use of mobile phones or the web affects not just how responses are elicited, but also whether respondents agree to participate in the survey, and/or whether respondents are included in the frame in the first place. In terms of frames, phone surveys predominantly rely on three options: (1) recent representative surveys with phone numbers of respondents; (2) lists of phone numbers from mobile phone providers; and (3) random digit dialing (Kastelic et al., 2020; McGee et al., 2020). Each option involves significantly different implications for both coverage and attrition.
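To fix ideas on option (3), the following is a minimal Python sketch of how a random digit dialing frame can be constructed by drawing subscriber numbers within known mobile prefixes. The prefixes and number length are hypothetical; in practice they would come from the national numbering plan, and drawn numbers would be screened for validity before release to interviewers.

    import random

    rng = random.Random(42)
    mobile_prefixes = ["070", "075", "078"]   # hypothetical operator prefixes

    def draw_numbers(n, subscriber_digits=7):
        frame = set()
        while len(frame) < n:                 # de-duplicate as we draw
            prefix = rng.choice(mobile_prefixes)
            suffix = rng.randrange(10 ** subscriber_digits)
            frame.add(prefix + str(suffix).zfill(subscriber_digits))
        return sorted(frame)

    sample_frame = draw_numbers(1000)

A frame built this way covers unlisted subscribers that provider lists miss, but it covers only phone owners, which is exactly the coverage concern discussed next.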
Proper tracking and field protocols, combined with the collection of selected information for ex-post weighting and bias adjustment, can greatly enhance the representativeness and usability of phone surveys and reduce mode bias.

Irrespective of the type of frame used, phone surveys are more likely to miss more remote and poorly connected households, as well as poorer households who do not own a phone or live in areas with poor mobile coverage. This is particularly relevant for agricultural data, where large shares of respondents live in remote and poorly connected areas and are more likely to be poor and technologically illiterate. Educational level, age, and technological literacy will also systematically affect overall coverage. Similarly, selection bias among respondents in citizen-generated data and crowdsourcing makes collecting agricultural data using those modes particularly concerning in terms of representativeness and coverage, especially when it comes to their use in official statistics. These concerns, combined with the huge opportunities that these new modes of data collection provide, are generating significant attention in the recent literature (see Hill et al., 2019 and some of the papers cited therein, including Buil-Gil et al., 2020; Diego-Rosell et al., 2020; Salganik, 2017).

Coverage biases related to phone survey modes may be due to either non-response or non-completion. Little literature exists on response rates in phone surveys in low-income countries, but recent studies prompted by COVID-19 restrictions on face-to-face interviewing are generating new evidence. Dillon et al. (2021b) conducted random digit dialing surveys in nine countries, documenting response rates ranging from as low as 7 percent to as high as 60 percent. They find most coverage biases are due to non-response when respondents do not answer their telephone. Hence, a key survey design feature in telephone surveys is decreasing non-response. Dillon et al. (2021c) evaluated choices on pre-contact notification via text message and time-of-day/day-of-week as potential telephone survey protocol decisions that might reduce non-response. They found that pre-survey text messages did little to reduce non-response but did increase survey completion. In the Philippines, pre-survey text messages actually increased non-response, while having no effect in the Colombia, Mexico, or Rwanda samples. In those countries, pre-survey text messaging increased survey completion by between one and four percentage points. In nine countries, time-of-day and day-of-week effects were estimated, with midday interviews increasing participation and evening calls reducing participation (Das et al., 2021). These effects were relatively small, with effect sizes of four and eight percentage points over the base pickup and completion rates, respectively. The day-of-week effects varied substantially between countries, with few generalizable patterns. Within countries, effect sizes were substantial, but often in different directions across countries. More country-specific evidence may need to be generated to reduce non-response, underscoring the importance of understanding local contexts and the differences in time distributions between work and leisure within a country. The impact of the mode of data collection on coverage extends to other survey design features.
For instance, the use of diaries, while possibly more accurate than recall modes when properly implemented, may lead to greater under-coverage among illiterate respondents as well as higher non-response among richer households that face higher opportunity costs for completing lengthy diaries. Also, differences in record-keeping across groups of respondents, such as between smallholders and larger-scale farmers, may result in systematic variations based on the chosen method (Lyberg and Kasprzyk, 2004; Silberstein and Scott, 2004). In the next section, we discuss in detail how measurement error and coverage bias affect the empirical estimation of common agricultural models.

3.5. Attrition

Significant coverage biases due to attrition affect both the internal and external validity of empirical work for both randomized control trials and observational panels (Beegle et al., 2011; Falaris, 2003; Outes-Leon and Dercon, 2008; Rosenzweig, 2003; Thomas et al., 2012, 2001; Zabel, 1998). Millan and Macours (2019) discuss attrition in the context of randomized control trials, where tracking protocols may affect intent-to-treat effects. In a 10-year panel from Nicaragua, they find that excluding attrited individuals led to an overestimate of the intent-to-treat effect by 35 percent. Tracking the same respondent over time is challenging in large national surveys. Thomas et al. (2012) and Witoelar (2011) discuss minimizing attrition and improving tracking in the context of a large-scale national survey. The integration of mobile phones and household geo-referenced data increases the traceability of households, but also raises concerns about privacy and data protection.

While multiple papers suggest ex-post methods of dealing with attrition (see, for example, DiNardo et al., 2006; Millan and Macours, 2019; Wooldridge, 2002), little consensus on ex-ante methods of reducing attrition has emerged in the literature. One notable exception is Olsen (2005), who discusses design features relevant to attrition reduction in the National Longitudinal Survey of Youth. Thomas et al. (2012) also discuss planning for attrition and protocols for reducing attrition. They find that success in tracking movers depended not only on observable characteristics of respondents, but also on the characteristics of the interviewers who initially interviewed them. Reducing coverage bias due to attrition is likely to be most successful not simply when surveys are designed to track respondents who have moved, but also when initial interviews collect tracking data and interviewers are trained to establish connections with survey respondents.

4. Empirical Specification, Data Structure, and Measurement Error

For any empirical analysis, the set of theoretical models that can be tested is defined by the available data. Each data set has its own data structure, which we defined above as the full set of survey design choices that comprise the data production process, including sampling, questionnaire design, and fieldwork implementation choices. National production surveys such as agricultural censuses imply a specific subset of production models that can be tested. Household surveys that integrate agricultural data, such as LSMS-ISA surveys, are implicitly informed by producer models or agricultural household models, but measurement error or coverage bias can reduce the precision and utility of estimates and restrict the set of testable models.
In this section, we review trade-offs in the empirical specification of agricultural models and data requirements. We discuss how survey design choices that increase data coverage present a trade-off in potentially increasing measurement error in prominent empirical models. While we cannot review the interaction of data structure and empirical specification across all prominent models in agricultural economics given the focus of this paper, it is illustrative to choose a few common specifications to demonstrate how innovations in data structure have expanded the set of testable models. For this purpose, we review examples from the profit and production function literatures and the agricultural household model literature. Improvements in the estimation of these models over the last few decades, as international household surveys have emerged, are directly cited as motivation in Ghosh and Glewwe (2001), among others.

4.1. Profit and production functions

A large literature examines models of the producer problem (Chambers, 1988; Chambers and Quiggin, 2000; Mundlak, 2001) and the specification of the agricultural production function (Pope and Just, 2001). Pope and Just (2003) provide a summary of production technologies and their functional forms. In this earlier production literature, measurement error and coverage bias were central concerns in the field. Pope and Just (2003) specifically discuss coverage bias and its effect on production function specification, as well as the modeling of measurement error. Aggregated district or national data sources led to misattribution of the returns to inputs, as the unit of analysis in the data was not at the producer level where profit-maximizing decisions were made in the theoretical model.

Measurement error due to unobservable decision variables is also a source of bias in production function estimation, but distinguishing measurement error from unobserved heterogeneity and potential misallocation is challenging. Yields can be biased by errors in output or land size measurement, as noted by Abay et al. (2020). Inputs such as fertilizer or labor can be biased by errors in both quantity and quality over the relevant recall period. In the case of livestock production, inputs such as medical care and feeding practices may be difficult to attribute within herds. Measurement error in these input and output variables is likely correlated with unobserved heterogeneity in farmer ability. As agricultural production is also characterized by stochastic disturbances such as weather shocks, which require modeling assumptions similar to those used to address unobserved farmer heterogeneity, error terms capture multiple sources of stochastic variation.

In principle, researchers can model such errors in the producer problem depending on the data structure. Denoting output by y, true input use by x*, stochastic production shocks by ε, and the production technology by f, Pope and Just (2003) consider the case where measurement error in the demand and supply functions is uncorrelated between input and output – i.e., y = f(x*, ε, θ) – as opposed to y = f(x* + δ, ε, θ), where the input disturbance δ affects output directly. The latter case is called errors in optimization, or misallocation, where disturbances are interpreted as errors in decision making. Pope and Just (2003) distinguish misallocation from additive measurement error (their errors-in-variables case) and from the errors-in-uncontrolled-conditions case, in which disturbances are modeled as errors that affect production after producer decisions are made.
For the producer problem, π(p, w) = max_x E[p f(x, ε, θ) − wx], the supply and demand equations are x = x*(p, w) + δ + ν and y = f(x* + δ, ε, θ) + μ, where δ is misallocation, ν is error in the measurement of inputs that does not affect outputs, ε represents stochastic production shocks such as weather, and μ represents measurement error in outputs. Using agricultural data from the United States, Pope and Just (2003) estimate the model, finding no evidence of measurement error, but cannot reject misallocation.

In a similar spirit, Gollin and Udry (2021) model measurement error, unobserved heterogeneity, and misallocation using panel data from Tanzania and Uganda. Several important identification challenges are addressed through the data structure, in particular a unit of analysis at the farmer-crop-plot-season level. After explaining differences in production across farms due to observable differences, Gollin and Udry (2021) note that unobserved variation could be due to unobserved land characteristics, risk, measurement error, or misallocation. With repeated panel data on farmers over time, a production function whose error term is disaggregated among these different unobserved components can be estimated. Gollin and Udry’s (2021) estimates suggest that measurement error and heterogeneity explain two-thirds to three-quarters of productivity differences, while misallocation affects productivity only modestly.

In considering these advances in the identification of measurement error and misallocation in the production function, we note from a data structure perspective the trade-off between improved empirical specification of misallocation and measurement error due to survey design. Recall of input allocations at the farmer-crop-plot-season level is ideal, as it permits researchers to map inputs to outputs, but measurement error may actually increase if farmers cannot recall inputs at the farmer-crop-plot-season level. For example, farmers may make bulk fertilizer purchases within their household that are then divided within the household and across the farmer’s plots. Precisely recalling the amount of fertilizer applied to a farmer’s maize field relative to their inter-cropped legumes may be impossible, even if the farmer knows exactly how much fertilizer was purchased in total. We note that the Gollin and Udry (2021) identification strategy capitalizes on a farmer panel to disentangle the effects of measurement error from misallocation and farmer unobservables, but we also note that assumptions about the production technology, as in Pope and Just (2003), are required. Estimates of misallocation using different production functions would certainly vary, along with the level of measurement error estimated for each. This provides an important example of the trade-offs between data structure and empirical modeling. Advances in farmer panels allow Gollin and Udry (2021) to estimate misallocation and measurement error while addressing farmer unobservables through farmer fixed effects in their production function.

Coverage biases in profit and production functions also largely depend on the unit of analysis. In national surveys, units of analysis for agricultural data include the household (Reardon and Glewwe, 2000), the agricultural holding (FAO, 2016), or the plot (Carletto et al., 2016). When land is recorded at the household level, aggregation bias and asymmetric information among household farmers may cause landholdings to be misreported (Dillon and Mensah, 2020).
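To illustrate the errors-in-variables mechanism discussed above, the following small Python simulation (our own illustration, not taken from the cited papers) shows how classical measurement error in a reported log input attenuates the OLS estimate of its output elasticity, which a naive analysis could misread as evidence on returns or misallocation; all parameter values are arbitrary.

    import numpy as np

    rng = np.random.default_rng(1)
    n, beta = 10_000, 0.6
    log_x = rng.normal(0.0, 1.0, n)                  # true (log) input use
    log_y = beta * log_x + rng.normal(0.0, 0.5, n)   # production shocks

    for err_sd in (0.0, 0.5, 1.0):
        log_x_obs = log_x + rng.normal(0.0, err_sd, n)   # reported input
        b_ols = np.polyfit(log_x_obs, log_y, 1)[0]
        # Attenuation: plim b_ols = beta * var(x) / (var(x) + err_sd**2),
        # i.e., roughly 0.60, 0.48, and 0.30 for the three error levels.
        print(f"error sd={err_sd:.1f}  OLS elasticity={b_ols:.3f}")

The bias grows with the error variance, which is why panel designs that difference out farmer unobservables, as in Gollin and Udry (2021), still need an explicit stand on how inputs are measured.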
Coverage biases could be significant when commercial farms or large farms are omitted by household sampling frames (Muyanga and Jayne, 2019). The improved coverage of plot-level information may also increase measurement error in input reporting, as noted above. High-frequency agricultural surveys are rare and, as such, much knowledge about seasonality is based on post-harvest recall rather than observation within a production season. Models of sequential agricultural decision-making such as Fafchamps (1993) are few.

Foster and Rosenzweig (2010) discuss coverage and measurement error biases in the estimation of profit functions, particularly in the inference of returns to inputs in technology adoption problems. Cross-sectional inference of returns to fertilizer is biased, as farm heterogeneity and low price variation make it difficult to disentangle the marginal product of fertilizer and land quality in explaining differences in profits. Panel data with increased coverage of farmers across seasons do not necessarily improve the identification of returns to inputs even if measurement error is low, because multiple sources of unobserved heterogeneity remain as correlated biases in the production function residual, such as plot unobservables, between-season misallocation, or climatic variation which may be unobservable ex-post.

While coverage biases are a significant constraint in improving estimates of agricultural profit functions, measurement error remains a significant source of bias, primarily due to labor recall, input quality, and estimating the use of and return to agricultural assets. Akogun et al. (2020), Carletto et al. (2012), and Dillon et al. (2020) document the challenges of agricultural labor recall in household surveys. Not only is plot-level detail of person-days challenging for respondents to recall, but agricultural wages are difficult to accurately measure in household surveys, since much agricultural labor is household labor. Physical activity measurement of agricultural labor, particularly for physically demanding tasks, may be one approach to better measuring labor quality or effort, a key variable in much of the off-farm labor and contracting literatures. Differentiating between adult and child labor on the farm is another important dimension of labor input quality. National surveys of child labor are often conducted as standalone surveys, rather than integrated into agricultural surveys. As much child labor is crop-specific, detailed child labor data is often difficult to collect in national surveys. Input quality concerns are not limited to labor: chemical inputs often face questions of quality due to inappropriate mixing, as in the case of pesticide use and exposure, or adulteration, as in the case of fertilizer (Michaelson et al., 2021; Norton et al., 2020). Asset ownership and use also vary considerably by respondent type (Doss and Quisumbing, 2019). We note advances in measuring labor and assets in section five.

4.2. The agricultural household model

A second class of models frequently used in agricultural economics links agricultural production decisions with household welfare. We sketch an agricultural household model to motivate measurement error and coverage biases when welfare analysis of production decisions is an empirical objective. In cases where separability is assumed, the model reduces to a profit maximization problem and a utility maximization problem given production choices.
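A compact statement of the separable case, in the notation used below and abstracting from the time endowment and labor market for brevity (this is our own illustrative paraphrase of the textbook treatments cited in the next paragraph, not a reproduction of any one of them), is:

    max over (c_a, c_m, l) of U(c_a, c_m, l; Z, μ)
    subject to p_m c_m + p_a c_a ≤ π + y,
    where π = max over v of {p_a F(v; ε) − p_v v}.

Under complete markets, the profit problem in the constraint can be solved first and independently of preferences; when any relevant market fails, input choices v and consumption choices must be determined jointly, which is why input prices p_v enter the consumption demand in equation (1) below.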
The agricultural household model is a useful example of trade-offs in measurement error and coverage because we implicitly cover a range of producer models within the agricultural household model. Household decisions are constrained by an agricultural production function, a time endowment, and an intertemporal budget constraint (see Bardhan and Udry, 1999; LaFave et al., 2013; Singh et al., 1986). The household’s problem is to choose own-produced agricultural goods (c_a), purchased market goods (c_m), agricultural inputs (v), and leisure (l) to maximize the discounted stream of expected utility, given observed (Z) and unobserved household characteristics (μ). In a non-separable formulation of the agricultural household model, production factors such as input prices also influence the household’s consumption choices. Coverage biases may exist in the collection of input price data if household surveys do not measure the market prices actually faced by farmers. Imputed input price data for fertilizer, seed, or pesticides/herbicides ignore substantial price variation within an input class correlated with product quality and efficacy.

Equation 1 provides the reduced-form purchased market goods demand, which can be derived from the first order conditions:

c_m = c_m(p_m, p_a, p_v, r_{t+1}, π(p_a, p_v, ε; μ), y, λ; Z, μ)     (1)

where consumption of good m depends on market goods prices (p_m) and own-produced agricultural goods prices (p_a), the price of variable inputs (p_v) such as agricultural labor, fertilizer, pesticides, or herbicides, interest rates (r_{t+1}), farm profits (π) conditional on climate variability (ε), exogenous income (y), and future prices via the marginal utility of wealth (λ). Consumption also depends on household characteristics, both observed (size and composition, Z) and unobservable (food preferences, μ). Input prices affect household consumption when markets are incomplete, and we cannot assume that income alone determines household consumption demand. Therefore, the consumption demand equation includes not only variables that affect household income, but also those that affect production decisions.

While we have discussed above the challenges in measuring the agricultural variables in the demand equation, we now highlight coverage biases and measurement error in the estimation of equation 1. First, coverage biases could have significant effects on estimated consumption demand when consumption is measured substantially after agricultural variables are realized and/or uses a different reference period. For example, annual household surveys that record production data from the last agricultural season may be lagged by months relative to household food consumption data, which is often recorded for the last seven-day reference period. Second, measurement error in food consumption aggregates can be substantial, due to the conversion of non-standard units and the subsequent imputation of food prices (Oseni et al., 2017). Deaton and Zaidi (2002) provide a detailed description of consumption aggregate choices. As documented by Beegle et al. (2012), survey design choices related to different recall periods and survey modes (such as diaries versus in-person recall) have substantial effects on measured household consumption and consequently on imputed food prices and welfare. Third, an important specification issue in the demand equation is the inclusion of prices of consumption goods, own-produced goods, and inputs. As agricultural surveys are often collected at a single point in time, capturing the relevant prices to correctly specify equation 1 could result in substantial measurement error, given seasonal price fluctuations in both inputs and outputs.
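The non-separability discussed above is also testable in data. The following is a hedged Python sketch in the spirit of tests such as LaFave et al. (2013): under separability, household composition should not predict farm labor demand conditional on prices and farm characteristics. All variable names are hypothetical, and the data frame is synthetic so the snippet runs as-is.

    import numpy as np
    import pandas as pd
    import statsmodels.formula.api as smf

    # Synthetic farm-season panel; in practice df would come from survey data.
    rng = np.random.default_rng(3)
    n = 400
    df = pd.DataFrame({
        "household_id": rng.integers(0, 100, n),
        "season": rng.choice(["long", "short"], n),
        "log_wage": rng.normal(0, 0.2, n),
        "log_land_area": rng.normal(0, 0.5, n),
        "log_hh_adults": rng.normal(1.0, 0.3, n),
    })
    # Simulate labor demand that violates separability (composition matters)
    df["log_farm_labor"] = (0.8 * df["log_land_area"] - 0.5 * df["log_wage"]
                            + 0.3 * df["log_hh_adults"] + rng.normal(0, 0.3, n))

    model = smf.ols(
        "log_farm_labor ~ log_hh_adults + log_wage + log_land_area + C(season)",
        data=df,
    ).fit(cov_type="cluster", cov_kwds={"groups": df["household_id"]})

    # Under separability the coefficient on log_hh_adults should be zero;
    # here it is about 0.3 by construction, so the test rejects.
    print(model.summary())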
In the next section, we discuss advances in measuring agricultural variables that are paramount to the producer and agricultural household models described above, but also to a wide set of models in agricultural economics that are beyond the scope of this paper. We note that the two models chosen in this section are examples, but issues of identification, measurement error, and coverage are not limited to producer and agricultural household models. Advances in data infrastructure improve internal and external validity by expanding the possibilities for improved identification and coverage, providing data sources for testing a wide range of potential empirical models.

5. Advances in Data Collection

The combined availability of new data sources, affordable computing power and data storage options, and digital technologies allowing for innovative modes of data collection (such as mobile and smart phones, tablets, and sensors of all kinds) have created a new data landscape with novel opportunities for more accurate, affordable, and timely data collection (Hill et al., 2019). In some cases, new data collection modes or innovations may help correct for existing biases – for example, measuring land area using GPS alongside farmers’ self-reported information – while in others, they may introduce new biases via under-coverage or response biases – for example, phone surveys or citizen-generated data (Amaya et al., 2019; Hill et al., 2019). Integration of new data collection modes with household surveys requires assessing trade-offs between cost, measurement error, coverage bias, and the knowledge generated from testing new empirical models. A recent surge in the number of survey experiments, including on issues related to agriculture and food, has greatly contributed to making rigorously evaluated progress in survey design in areas of interest to agricultural economists (De Weerdt et al., 2020).

In sections 2, 3, and 4 of this paper, we covered measurement error and coverage bias, and provided some examples of the expanded frontier of empirical models. In this section, we discuss advances in data collection with a focus on their impact on reducing measurement error and/or increasing coverage. (A more comprehensive, and prescriptive, treatment of agricultural survey design choices is provided in Dillon et al., 2021a, which builds on the guidance in Glewwe and Reardon, 2000.) With this context in mind, we organize this section around (1) advances in specific thematic areas of relevance to agricultural economists, and (2) modes or data structures that provide new solutions and address challenges to reducing error and increasing coverage. For both topics, we highlight how these advances speak to the issues highlighted in the previous sections, including Total Survey Error, bias, measurement error, and coverage, inter alia. Many recent advances in data collection have resulted from addressing constraints to data collection in low- and middle-income countries, but we also highlight experiences from high-income settings to emphasize how these issues are in fact globally relevant.

5.1. Advances in selected thematic areas

Land area measurement

Recent evidence from studies in Africa (Abay et al., 2019a; Carletto et al., 2017b, 2016; Desiere and Jolliffe, 2018) and Asia (Dillon and Rao, 2021) that included both GPS and self-reported measures of land area, all following the type of survey experiment set-up advocated in De Weerdt et al. (2020), found remarkably consistent presence and patterns of non-classical measurement error in farmers’ self-reporting.
The superiority of GPS with respect to self-reported measures has been confirmed by studies that also included the more expensive compass-and-rope method, such as Carletto et al. (2016) and Dillon et al. (2019). The integration of hand-held GPS devices in survey work for land area measurement has since become commonplace to overcome this prototypical example of a respondent effect. While GPS measurements are not entirely free of error (Dillon et al., 2019; Cohen, 2019), the associated measurement error is larger in relative terms for very small plots but is not correlated with land size (Carletto et al., 2017b, 2016). Innovation is now proceeding in the direction of integrating GPS measurement into CAPI applications through the testing of features allowing plot delineation on preloaded satellite imagery (Masuda et al., 2020) or on printed high-resolution imagery (Dillon and Rao, 2021), or through the use of the GPS receivers integrated into interviewer tablets for in situ land area measurement. In geographies where accurate knowledge of land area by respondents is commonplace, similar technological developments are being pursued to enable the efficient delivery and evaluation of programs tied to land area and land use, such as the Land Parcel Identification System (LPIS) in the European Union (Pluto-Kossakowska et al., 2008; Devos, 2011; Tarko et al., 2015), or in the area survey implemented by the National Agricultural Statistics Service (NASS) in the United States, which is also successfully experimenting with the use of a mobile plot delineation application (Abreu et al., 2017).

In the coming years, these developments can be expected to be brought to scale to address some of the drawbacks of measuring land area with GPS units, such as the cost of plot visits and the inability to measure all plots, particularly those that are more distant or particularly large. While in situ GPS measurement certainly reduces bias, some of these concerns about item non-response can be mitigated through imputation methods, which have been shown to effectively predict GPS measures for unmeasured plots using farmers’ self-reports alongside other plot characteristics (Kilic et al., 2017), or through further technological development, if plot delineation on high-resolution imagery can reduce the drudgery of the field visit that typically plagues GPS measurement.

Agricultural output and yields

Recent empirical work has reviewed the quality of agricultural output data, related both to the level of data collection and to biases in farmers’ self-reporting of agricultural output. Abay et al. (2019a), Desiere and Jolliffe (2018), Gourlay et al. (2019), and Lobell et al. (2020) all point to the presence of non-classical measurement error in farmers’ self-reporting of crop output, with farmers substantially over-reporting production on small plots and under-reporting production on larger plots. Currently, these biases can be corrected through the use of crop cuts on sub-samples and, looking ahead, through Earth Observation data calibrated with ground-truthing from field observations.
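The calibration logic is simple to sketch. The Python example below (synthetic data and hypothetical names, not a reproduction of any cited study) fits a linear correction of satellite-predicted yields against crop-cut yields on a small ground-truth subsample and applies it to all plots.

    import numpy as np

    rng = np.random.default_rng(7)
    true_yield = rng.uniform(0.5, 4.0, 500)                       # t/ha, synthetic
    satellite = 0.6 * true_yield + 0.4 + rng.normal(0, 0.3, 500)  # biased proxy

    # Crop cuts on a 10 percent subsample serve as the ground-truth benchmark
    idx = rng.choice(500, 50, replace=False)
    crop_cut = true_yield[idx] + rng.normal(0, 0.15, 50)

    # OLS calibration on the subsample: crop_cut = a + b * satellite
    b, a = np.polyfit(satellite[idx], crop_cut, 1)
    calibrated = a + b * satellite        # applied to the full sample

The design point is that a small, expensive, high-quality measurement exercise can discipline a cheap, scalable, but biased one, rather than the two competing as alternatives.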
Two levels of integration will be key to moving the agenda forward: integration between subjective (recall) and objective (crop cut) data, and between ground and satellite data. Where available, administrative data can also be combined with survey data (as well as with satellite imagery and climate data) to produce disaggregated model-based yield estimates (see for instance Erciulescu et al., 2019 for county-level yield estimates in the United States).

Meanwhile, challenges persist in the measurement of yields in fields using mixed or inter-cropping planting techniques (Dillon et al., 2020; Wineman et al., 2018). Estimating the land area apportioned to a specific crop, as well as its production, is particularly difficult. Most household surveys acknowledge the complications of production and input estimates on inter-cropped plots by identifying these plots and apportioning the area planted, so that plot-level input reports can be divided across the production reported by crop (a worked example follows at the end of this subsection). However, proportional input attribution implies that crop input demands, including fertilizer, weeding, and harvest time, are similar across crops, which may not always be an accurate assumption. The Global Strategy to Improve Agricultural and Rural Statistics provides methodological guidance on implementing the above methods to measure the area under a given crop in inter-cropped systems (GSARS, 2018). Unfortunately, guidance on best practices supported by evidence from methodological survey experiments is not currently available.

Remote sensing or crop cut production estimates are possible alternatives, but these measures are also challenging to implement. For instance, crop cutting, in addition to its high costs due to the need for closer supervision and multiple visits over the growing period, can only be done in a very restricted time window, which may be difficult to plan correctly in a large survey operation. It also carries implementation difficulties that are associated with specific error generation mechanisms (Kosmowski et al., 2021). Furthermore, Wahab (2020) finds a substantial discrepancy between crop cuts and self-reported output measures, which he ascribes in part to the variability in crop performance within plots, leading to plot area loss in the course of the season.

Yield prediction models based on remote sensing data clearly face bigger challenges the smaller the plots and the more complex the cropping patterns, particularly related to the degree of inter-cropping or the presence of canopy cover. Lobell et al. (2019) report lower accuracy of remotely sensed production estimates compared to crop cut production estimates for inter-cropped maize plots in Uganda. However, they also clearly show the benefit of properly calibrating the spatial model through accurate ground-truthing based on high-quality crop cutting, even if only on a small sub-sample of plots. Řezník et al. (2020) compare yield predictions from satellite data with measured yield data on spring barley, winter wheat, corn, and oilseed rape in the Czech Republic, finding the yield predictions to be credible, with only two out of nine measures reporting differences between measured and predicted yields larger than 5 percent.
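Returning to inter-cropped plots, here is the worked example of the proportional attribution rule referenced above, with hypothetical numbers: plot-level inputs are split across crops in proportion to farmer-apportioned area shares, and, as noted, the rule assumes identical per-hectare input demands across crops.

    plot_area_ha = 1.5
    fertilizer_kg = 60.0                           # reported for the whole plot
    area_shares = {"maize": 0.67, "beans": 0.33}   # farmer-apportioned shares

    inputs_by_crop = {
        crop: {"area_ha": share * plot_area_ha,
               "fertilizer_kg": share * fertilizer_kg}
        for crop, share in area_shares.items()
    }
    # maize: 1.005 ha and 40.2 kg; beans: 0.495 ha and 19.8 kg

If, say, the maize received most of the fertilizer in reality, this rule mechanically understates maize input intensity and overstates it for beans, which is exactly the bias the text cautions against.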
Agricultural labor

SDG 2.3, which defines productivity in terms of output per unit of labor, has increased attention to the measurement of labor productivity. At the same time, results from survey methods research have unearthed the staggering magnitude of recall bias (a respondent effect) in measures of agricultural labor, with one influential study showing hours worked per person-plot to be 3.0 to 3.7 times higher in recall surveys compared to benchmark estimates based on weekly visits (Arthi et al., 2018). Agricultural labor data have typically been sourced through labor force surveys or national censuses (with information generally limited to the primary occupation) and used primarily in aggregate-level productivity analysis and macro-level comparisons of national agricultural GDP with labor shares. The availability of higher quality labor data in the last decade has raised questions about the validity of evidence that shows a six-fold labor productivity gap between the agricultural and non-agricultural sectors of the economy (Gollin et al., 2014). Studies that use more carefully collected labor data from household surveys have shown that the measured labor productivity gap is substantially reduced when data allow for measuring production per hour worked, as opposed to just per person per year (McCullough, 2017), and for individual fixed effects (Hamory et al., 2021). In the US, where data on agricultural labor are collected via a dedicated survey, farm labor hours have historically been difficult for respondents to report, as a low percentage of operators based their responses on formal records (National Research Council, 2008; Ott, 1999). Difficulties in this case also relate to capturing the complexity of the pay structure, recording information on different tasks, since many agricultural workers perform multiple tasks on the farm (Ridolfo and Ott, 2021), and collecting data on contract workers (Ridolfo and Ott, 2020).

Advances in the measurement of labor inputs in recent years have been based on both technology-enhanced and low-tech innovations, including by leveraging the mode of data collection to ease the cognitive response burden. Notable technology-enhanced innovations include the use of mobile phones for high-frequency interviews (Arthi et al., 2018; Dillon, 2012), and the use of wearable accelerometers for the measurement of physical effort (Akogun et al., 2020). Arthi et al. (2018) find that phone surveys can be a more accurate alternative to face-to-face interviews for measuring labor inputs, and this finding remains consistent when the research question calls for collecting high-frequency data or repeated measures. In such cases, the cost of additional phone interviews is a fraction of the cost that would be implied by additional face-to-face visits (Table 1).

Table 1. Per-Household Interviewing Cost Increases
Source: Arthi et al., 2017.

Akogun et al. (2020) measure the physical activity of sugarcane cutters using accelerometers, which provide a direct measure of effort in their piece-rate wage setting. They find a high correlation between administrative data on output per worker recorded by the firm and workers’ physical activity, as well as large changes in the intensity of such activity in response to malaria testing and treatment. Integrating objective physical activity measures for a sub-sample of observations in national surveys could help calibrate biases in reported time as well as predict effort-based measures of agricultural labor productivity.

Aside from the mode of data collection, substantial recent advances in methodologies relate to the key set of survey design choices in agricultural labor measurement.
Bardasi et al. (2011) investigate how survey design elements such as screening questions and proxy response result in biased estimates of labor force participation, hours worked, and income by gender and sector of employment. Female labor participation statistics are not affected by the use of proxy respondents in their survey experiment from Tanzania, but male employment rates are, due to the under-reporting of agricultural activity by proxy respondents. Using data from Malawi, Kilic et al. (2020) find that employment is further under-reported when recall periods increase and when women are the subject of proxy reporting. Recent advances in data collection software and the ubiquitous use of CAPI can also make it easier to avoid another source of coverage-related bias unearthed by Ambler et al. (2020). They show that the fact that household members are not listed randomly in the labor module, coupled with respondent fatigue, leads to age- and gender-related biases in employment measures. Software that allows for randomizing the ordering of household members when collecting data in the labor module can mitigate this source of systematic bias, as can avoiding the use of proxy respondents. Avoidance of proxy respondents to minimize measurement error, however, can potentially lead to greater errors of coverage.

The effects of different recall periods for measuring agricultural labor are investigated by Arthi et al. (2018), who use data from Tanzania to compare weekly agricultural labor reporting with end-of-season reporting. The latter is associated with a fourfold increase in the hours reported by individuals at the plot level, in comparison to reports obtained via weekly visits, their preferred benchmark. However, they note that aggregation to household-level reporting causes the differences in reported hours between the weekly and end-of-season recall periods to disappear. In interpreting these findings, the authors note how recall biases are driven not only by memory decay (which shorter recall periods help address), but also by the mental burden of reporting, which varies with the level of aggregation. In their study, aggregating plot-person hours to the household level happens to compensate for competing biases arising from over-reporting at the intensive margin and under-reporting at the extensive margin. However, this is not a result that can be extrapolated to other settings. Understanding the level of disaggregation at which individuals provide the most accurate reports on their agricultural labor inputs should be an area of focus for future research. Gaddis et al. (2020), working in Ghana, find much less dramatic differences in the magnitude of the recall bias compared to Arthi et al. (2018), but also discover that an important source of bias is the omission of plots and farm workers at the listing stage, which can be mitigated by explicit attention to this specific aspect of survey design.

In the United States, a substantial amount of randomized testing (Reist et al., 2019) and cognitive interview piloting (Ridolfo and Ott, 2020, 2021) is routinely devoted to testing innovations aimed at easing response burden and addressing complex questions about workers’ remuneration and tasks. The findings suggest that the optimal design of instruments to collect labor data will likely require a fair amount of adaptation based on the context and the intended use of the data.
Non-labor inputs

One empirical regularity that has recently come to the fore is that measurement error in land area is strongly correlated with farmers' self-reporting of their application levels of agricultural inputs (Abay et al., 2019b; Bevis and Barrett, 2019; Burke et al., 2019). These patterns in the data naturally raise questions about the mechanisms that drive the relationship between non-classical measurement error (NCME) in land area and self-reported input application rates. One such mechanism could be that farmers have a mental heuristic for input application rates and thus self-report, for example, seed or fertilizer quantities based on the amount of land they believe they cultivated, along the lines of the optimal error prediction model of measurement error. Such a heuristic is easy to imagine in the case of fertilizer or seed, for which extension agents and agricultural input dealers commonly offer recommendations in the form of application rates per unit of land cultivated. If this is indeed the mechanism behind the observed correlation between area NCME and agricultural input levels or application rates, it could imply either of two possibilities. On the one hand, NCME in land area might propagate into NCME in agricultural input data; that is, the measurement error in inputs would merely reflect the error in land area, permitting statistical correction using observed area measurement error. On the other hand, land area NCME could actually affect agricultural input use by farmers, if farmers' decisions on input intensity are based on misperceived land area (Abay et al., 2019b). Eliciting input use information after the collection of objective land area measures to better understand how the mental heuristic of optimal application rates may be influencing farmers' self-reporting is a key methodological research area for improving data collection on input use.
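The heuristic mechanism is easy to see in a simulation. In the sketch below (all parameter values illustrative), farmers report fertilizer as a fixed recommended rate times their perceived area, so error in perceived area passes one-for-one into reported quantities, and the self-reported application rate inherits the area error by construction.

```python
# A minimal simulation of heuristic reporting: reported fertilizer equals a
# recommended rate times *perceived* area, so area NCME propagates into the
# reported input quantity and into the rate computed against measured area.
import numpy as np

rng = np.random.default_rng(42)
n = 2000
true_area = rng.lognormal(mean=0.0, sigma=0.5, size=n)    # hectares (e.g., GPS)
area_error = rng.normal(loc=0.0, scale=0.25, size=n)      # proportional misperception
perceived_area = true_area * np.exp(area_error)

rate = 100.0                                               # kg/ha recommendation
reported_fert = rate * perceived_area                      # heuristic self-report
reported_rate = reported_fert / true_area                  # kg per measured hectare

# The log reported rate equals log(rate) + area_error, so its correlation
# with the area error is perfect by construction in this stylized setting.
print(np.corrcoef(area_error, np.log(reported_rate))[0, 1])
```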
Aside from application rates, measuring the quality of inputs is an important and often unobserved characteristic of input investments. The fact that input quality is often not directly observable poses a problem not only for the analysis of agricultural productivity, but also for farmers in making decisions on input use. Perceived quality may influence input demand and use more than actual attributes of quality. Such questions were difficult to explore until recently, when economists began complementing traditional data collection from farmer respondents with laboratory analysis. The latter is not free from error either. An early study by Bold et al. (2017) finding widespread problems with nutrient quality in Uganda has since been contradicted by a series of large-scale sample surveys finding limited evidence of widespread quality issues in synthetic urea in East Africa. There is also evidence that perceptions of quality are influenced by other factors that in turn influence productivity, such as rainfall patterns (Hoel et al., 2021; Michaelson et al., 2020; Ashour et al., 2019a; Sanabria et al., 2018). Collecting better data on both perceived and actual fertilizer quality is essential to explain farmers' behavior with respect to adoption, and the extent to which possible remedial action for low levels of fertilizer use may come from certification or the use of other policy levers (Hoel et al., 2021). For herbicides, Ashour et al. (2019b) find that there are widespread quality issues with the herbicides available in local markets in Uganda, but that farmers' perceptions of poor herbicide quality are overstated and poorly correlated with actual measures of product quality from laboratory testing. Prices correlate with measured quality, but only weakly. In a technical report using the same data set, Ashour et al. (2019a) report poor correlation between tests in two different labs and ascribe the difference to flawed procedures in one of the facilities, a reminder to researchers that 'objective' measures conducted with the aid of technology are, as with any measurement operation, not immune from error.

In countries that have administrative data systems around the use of agricultural inputs such as pesticides, these offer the potential to be combined with survey data to improve the accuracy of the data compared to respondents' recall, while also reducing the burden on survey participants. This is, for instance, the case in the United States, where at least some states (Arizona, California) use data from mandatory pesticide use reporting systems instead of asking farmers (NRC, 2008). However, these methods may be more difficult to implement when the objective is to collect crop- or field-level data: in such cases, the US National Agricultural Statistics Service (NASS) collects data from respondents on one randomly selected field for selected crops of interest (NASS, 2021). A similar use of multiple data sources may also be more difficult to implement in poorer countries, where administrative data systems suffer from low quality and credibility. These studies are examples of ways in which administrative or market-level data collection can be combined with household-level survey data to provide evidence on the use and quality of inputs available to farmers. In terms of our conceptual framework, this implies efforts towards improving the accuracy of input quality (via objective testing) and quantity (via the use of administrative records) estimates, as well as the coverage (e.g., via market-level sampling for quality testing, which can be linked to farm-level behavioral variables), but also towards collecting additional (omitted) variables related to farmers' perception of quality, as these may be only tenuously linked to actual quality attributes.

Soil quality and soil health

Stevens (2018) writes that soil health "is a straightforward concept in the abstract, but difficult to define in practice". Not only do soils have many attributes that require multiple, complex measures, but these attributes are also interdependent, and the attributes (or their combinations) of significance can vary depending upon the application for which an assessment of soil health is needed. In Europe, the 'Land Use/Cover Area Frame Statistical Survey Soil' (LUCAS Soil) is a regular topsoil survey that is implemented every three years on approximately 20,000 soil samples collected across the European Union (Orgiazzi et al., 2018). The United States Department of Agriculture's Natural Resources Conservation Service (NRCS) maintains a century-old soil survey of the United States (NRCS, 2021).
While both these data sets have relatively good national coverage and are spatially explicit, their use in conjunction with the main farm surveys in the European Union and United States for economic and policy analysis remains limited, partly because confidentiality concerns in data dissemination prevent record linkage across data sets (NRC, 2008). In low-income settings, where large-scale soil surveys are not usually available, recent research has cast serious doubts on the reliability of farmers' self-reporting on soil quality and soil health, with studies for Ethiopia (Carletto et al., 2017a; Kosmowski et al., 2020a), Kenya, and Tanzania (Berazneva et al., 2018) consistently finding poor or no correlation between farmers' assessments of soil quality and objective measures based on lab analyses or portable spectrometers. Unlike land area measurement, there are no clear systematic biases emerging in the case of soil quality attributes; the concern is mainly with the lack of explanatory power of the traditional measures relying on farmers' assessments. While some predictive power has been reported for soil type (Berazneva et al., 2018) and soil color and texture (Gourlay, 2017), the reported correlations are very weak. Efforts to pilot the use of portable spectrometers for in situ objective measurement of key soil health features such as organic carbon, pH, nitrogen, potassium, and clay percentage have shown them to perform well when compared to conventional soil analysis (Carletto et al., 2017a; Kosmowski et al., 2020a; Vasques et al., 2020). While portable spectrometers are not nearly as widely available as GPS units, their cost and weight are expected to decline rapidly as technology advances, making the prospects for their use at scale ever more attractive, particularly when soil attributes are important for the research question at hand. In lieu of field-ready soil sensors, some survey efforts have moved towards smartphone-based soil assessments such as LandPKS (Herrick et al., 2013), but these have largely been limited to pilot-level or small-sample surveys (see for example Nord and Snapp, 2020). The other related avenue through which advances in soil health data can be expected to rapidly materialize is the integration of remote sensing data with georeferenced survey data. The correlation between available modeled georeferenced data such as AfSIS (see Hengl et al., 2015 for details) and ground measurements has been shown to be encouraging but far from perfect, particularly when there are high variations in soil quality within a given geography (Gourlay et al., 2017). As more objectively measured ground data on soil health are collected and used to train models based on Earth Observation data, however, the quality of the modeled data will increase (Kosmowski et al., 2020a).
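The validation logic in these spectrometer studies reduces to benchmarking device-based estimates against laboratory values. The sketch below does this on entirely synthetic data for soil organic carbon, using correlation and root mean squared error as the summary metrics.

```python
# A stylized device-versus-lab validation on synthetic data: spectrometer
# estimates of soil organic carbon (SOC) benchmarked against conventional
# laboratory analysis.
import numpy as np

rng = np.random.default_rng(3)
n = 300
lab_soc = rng.gamma(shape=4.0, scale=0.5, size=n)        # lab SOC, percent
spectro_soc = lab_soc + rng.normal(0.0, 0.3, size=n)     # spectrometer estimate

corr = np.corrcoef(lab_soc, spectro_soc)[0, 1]
rmse = np.sqrt(np.mean((spectro_soc - lab_soc) ** 2))
print(f"corr = {corr:.2f}, RMSE = {rmse:.2f} percentage points")
```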
Agricultural machinery and farm implements

Agricultural capital in the form of machinery and farm implements can increase the production capacity of smallholder farmers. Understanding the mechanization of agriculture is critical to understanding changes in farm size and profitability over time. While it is generally regarded as easy for farmers to recall agricultural capital within the household, the plot-level attribution and control of such capital are measurement challenges. Plot-level attribution of machinery use is often avoided, as the survey designer may assume that agricultural capital is shared equally within the household.

A large literature on women's empowerment in agriculture has focused on accurately measuring women's and men's ownership of assets relative to their use rights (Alkire et al., 2013). Doss and Kieran (2014) provide a comprehensive review and guidelines for collecting gender-disaggregated asset data, which apply generally to agricultural capital modules. Kilic and Moylan (2016) provide experimental evidence on the effects of variation in respondent selection protocols and questionnaire design compared to commonly used approaches for eliciting information on the individual ownership of and rights to assets. These studies all emphasize the importance of respondent selection and the method of collecting ownership and use rights. Lessons learned from this body of work have been consolidated in the recently published Guidelines for Producing Statistics on Asset Ownership from a Gender Perspective (United Nations, 2019). Data from the machinery and farm implements modules can be linked to plot-disaggregated production and other inputs modules to assess differences in the intra-household allocation of inputs (Udry, 1996). Recall periods for agricultural machinery and implements usually focus on the availability of assets over the previous 12 months. Differences in input use by crop-plot-season are important to capture, but this may not be possible if the frequency of survey administration is annual rather than seasonal. The age of machinery is usually collected with the intention of calculating depreciation, but depreciation depends largely on maintenance and frequency of use.

Crop variety identification

Possibly the most important technological choice farmers face is that of choosing which crop, and specifically which crop variety, to plant. A good proportion of the budget for agricultural research globally is directed at breeding crops and livestock with desirable traits. While data on the uptake and impact of improved varieties have traditionally been collected by eliciting information from either farmers or panels of experts, the shortcomings of such methods have become evident in the past decade; as a result, they are gradually being replaced or combined with more objective methods (Maredia et al., 2016; Stevenson et al., 2018; Wossen et al., 2019). The method currently gaining the widest adoption is DNA fingerprinting, which entails the collection of plant material that is subsequently sent for lab analysis. While logistically cumbersome, its implementation has been shown to be possible at reasonable scale, and protocols for its adoption are emerging (Poets et al., 2020). Asking farmers to identify the crop variety they are planting has often been shown to be highly inaccurate, even when augmented with photo aids or phenotypic trait-related questions aimed at improving the accuracy of the data. This holds true for different crops across different settings, including sweet potato (Kosmowski et al., 2019), wheat, maize, barley, and sorghum in Ethiopia (Jaleta et al., 2020; Kosmowski et al., 2020b; Yirga et al., 2016), cassava in Ghana and beans in Zambia (Maredia et al., 2016), maize in Uganda (Kilic et al., 2017) and Tanzania (Wineman et al., 2020), and cassava in Vietnam (Le et al., 2019), Colombia (Floro et al., 2017), and Nigeria (Wossen et al., 2019).
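Once self-reports and DNA results are linked at the plot level, the headline accuracy metric in these studies is a simple match rate, computed either over exact varieties or over the coarser modern-versus-traditional distinction; the sketch below uses hypothetical data and labels.

```python
# A toy calculation of varietal-identification accuracy: the share of plots
# where the farmer-reported variety matches the DNA-fingerprinting benchmark.
import pandas as pd

df = pd.DataFrame({
    "farmer_report": ["improved_A", "local", "improved_B", "local", "improved_A"],
    "dna_benchmark": ["improved_B", "local", "improved_B", "improved_A", "local"],
})

def is_modern(s):
    return s.str.startswith("improved")

exact_match = (df["farmer_report"] == df["dna_benchmark"]).mean()
class_match = (is_modern(df["farmer_report"]) == is_modern(df["dna_benchmark"])).mean()
print(f"exact variety match: {exact_match:.0%}; modern-vs-local match: {class_match:.0%}")
```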
A few studies report more encouraging self-reported results, with farmers in Bangladesh being most able to discern modern from traditional varieties for both rice (Kletzschmar et al., 2018) and lentils (Yigezu et al., 2019). The latter study is also of interest in that the panel of experts was, on the contrary, found to overestimate adoption by 89 percent compared to DNA fingerprinting. Even in the studies where farmers' self-reporting is close to the objective benchmark, DNA fingerprinting was found to have advantages for the analysis of determinants of adoption (Yigezu et al., 2019) as well as for detecting lack of authenticity in modern varieties present in seed markets and in the field (Kletzschmar et al., 2018). When technology adoption is an important component of research design, researchers should consider adopting DNA fingerprinting as a data collection method. The option of conducting such objective, yet more costly, measurement could be more routinely considered on a sub-sample or for priority crops of interest. When field visits for area measurement or crop cuts for output measurement are being performed, the research design can exploit significant economies of scale by performing additional tasks during the same visit to the plot. This does pose other constraints on data collection processes, as such field work needs to be performed within a specific time window (i.e., while crops are still in the field). Ethiopia has been able to incorporate DNA fingerprinting at scale in a national socioeconomic survey for three main crops: wheat, barley, and sorghum (Kosmowski et al., 2020b). Barriga and Fiala (2020) use DNA lab analysis to investigate seed quality along the seed supply chain, looking at genetic variation, physical purity, and performance, focusing for the latter on germination rate, moisture level, and vigor. This allows them to identify issues with the handling and storage of seeds, rather than counterfeiting or adulteration. In addition, Kosmowski and Worku (2018) report promising results for the use of spectrometers for varietal identification on cultivars of barley, chickpea, and sorghum in Ethiopia, with overall correct classification accuracies of 89, 96, and 87 percent, respectively, in their sample. Sinha et al. (2020) report similarly encouraging results from a study on banana varieties in Uganda by extrapolating ground-based hyperspectral measures to high-resolution satellite imagery, thereby creating the potential for mapping the distribution of banana varieties at a higher spatial resolution. This is an exciting area of innovation which is currently at the experimental stage but is likely to become mainstream over the next few years, provided validation efforts continue and implementation protocols are devised.

Measurement of farm-level food losses

While research on food losses has increased in recent years, the available data are extremely heterogeneous with respect to the measurement approaches used, the stages of the value chain investigated, and the conceptual framework adopted. Bellemare et al. (2017) propose a different conceptualization of food waste from that used by others in this domain, whose estimates of food losses would be largely overestimated according to their definition (Table 2).

Table 2. A Comparison of Quantity and Cost Estimates of Food Waste Across Definitions (table not reproduced here). Source: Bellemare et al., 2017.

In the existing literature, storage is the stage of the value chain where most food losses are concentrated (FAO, 2019). (The discussion on food losses in this section draws on text provided by Marco Tiberti and FAO, based on unpublished material.) Xue et al. (2017) attribute differences in food losses to different storage conditions, and research from Bachewe et al. (2018) and Minten et al. (2015) also points to the importance of storage losses.
Despite the interest and prominence that the debate on food losses has acquired, data of sufficient quality and robustness on storage losses are lacking, hindering the design and implementation of interventions to reduce them systematically and at scale. Comparisons between objective and self-reported measurements of food losses routinely find systematic differences between the two. While objective measures are more accurate, they are also more costly, time-consuming (selecting, sorting, and weighing samples of grains), and logistically challenging. Model-generated methods of estimation are therefore being researched, as they offer a possibility to deliver measurements in a more cost-effective manner (FLW Protocol Steering Committee, 2016). Model-based estimates could be used in conjunction with rather than as a replacement for survey data, for instance, by estimating losses between survey rounds. These estimates can determine storage outcomes by taking into account the effect of variables related directly to storage conditions (e.g., the type of storage facility, the application of pest protection products, or the moisture content at which the grain is stored) as well as contextual variables (e.g., weather conditions, crop variety, or farmer skills), and the interaction between the two. The African Postharvest Losses Information System (APHLIS) is one example of the production of loss estimates based on the modeling of agronomic and bio-physical relationships of factors including the presence of rain at harvest time, as well as agricultural storage and marketing practices.

Livestock production and management

The bias of agricultural economists in favor of crops over livestock is reflected in the relatively limited efforts seen to date on developing better data collection methods for livestock (Barrett et al., 2008; Kristjanson et al., 2014; Little et al., 2008; McCarthy et al., 2004; Pica-Ciamarra et al., 2014). Most methodological work has been directed at pastoral or agro-pastoral systems, which is to be expected, given both the specific challenges these systems pose to data collection and the importance of livestock for people living in regions where pastoralism is prevalent. Recent work in this area has focused on herd mobility, to address the challenges that it poses for enumerating nomadic or semi-nomadic populations, as well as to study mobility patterns linked to the state and management of natural resources (e.g., grazing, water) upon which livestock and their herders depend. For example, Himelein et al. (2014) conducted a pilot in the Afar region of Ethiopia to explore the use of random geographic cluster sampling as an alternative to conventional sampling methods. The approach is based on the random selection of points around which circles are drawn; all eligible respondents found inside those circles are interviewed. The approach aims to reduce the under-coverage of mobile populations expected when samples are drawn based on lists of dwellings within a primary sampling unit, as is typically the case for household surveys.
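A toy implementation of this selection rule makes the mechanics concrete. In the sketch below, the study window, unit locations, circle count, and radius are all hypothetical, and distances are treated as planar; a real application would use projected coordinates or geodesic distances.

```python
# A minimal sketch of random geographic cluster sampling: draw random points,
# trace a circle of fixed radius around each, and select every eligible unit
# (e.g., herder camp) falling inside any circle.
import numpy as np

rng = np.random.default_rng(7)
units = rng.uniform(0.0, 100.0, size=(1000, 2))   # unit locations, km

def rgcs_sample(units, n_circles=10, radius_km=5.0, seed=0):
    """Return indices of all units inside randomly placed circles."""
    r = np.random.default_rng(seed)
    centers = r.uniform(0.0, 100.0, size=(n_circles, 2))
    dists = np.linalg.norm(units[:, None, :] - centers[None, :, :], axis=2)
    return np.flatnonzero((dists <= radius_km).any(axis=1))

selected = rgcs_sample(units)
print(f"{selected.size} units selected across the sampled circles")
```

Appropriate design weights, reflecting each unit's probability of falling inside at least one circle, are then needed at the estimation stage.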
Otherwise, methods have not evolved significantly from the surveys at enumeration points (i.e., water, dipping, or vaccination points, and stock routes) and aerial surveys recommended by ILCA in the 1970s and 1980s (FAO, 1992; GSARS, 2016; ILCA, 1990), except that these aerial surveys can now also be implemented using higher-resolution imagery captured by drones (Chamoso et al., 2014) or satellites. However, these methods are still in the experimental stage and have not, to our knowledge, been applied at scale. Advances in spatial data, both from satellites and on the ground, are creating opportunities for the collection of data on the interaction between livestock, mobility, and natural resources. On the ground, GPS trackers placed on cattle have been used to characterize the mobility of herds and their use of rangeland resources (Bailey et al., 2018; Liao et al., 2018, 2017; Swain et al., 2011; Turner et al., 2000), although few of these applications have appeared in economics journals. From space, satellite imagery is being used to characterize the state of rangeland resources (Reinermann et al., 2020), and we expect that the potential for applications in agricultural and natural resource economics will expand dramatically as a result.

On improved measures of livestock productivity, recent studies led by economists are limited. Specialized livestock surveys often select a random animal in the herd and ask questions about that animal. In household surveys, this is not generally done, as the herd may not be present, and a visit would add to the interview time. Livestock experts also tend to measure productivity using the reproductive capacity of the herd, and thus their focus is on demographic parameters (Lesnoff et al., 2014). For milk off-take, a methodological study conducted in Niger comparing different types of recall to an objective measure provides some confidence in the accuracy of recall measures (Zezza et al., 2016a). Other technologies, such as 3D and thermal cameras, are being used to assess livestock weight and health (Song et al., 2018; Stajnko et al., 2008), but mostly by animal scientists rather than economists or statisticians. Nonetheless, there is clear potential for economic applications to emerge, as the value of livestock is primarily determined by parameters linked to weight and health, which are notoriously difficult to elicit from survey respondents. Guidance for data collection on livestock in low-income countries has been systematized in recent years in GSARS (2016) and Zezza et al. (2016b). Model-based estimates of livestock populations have been developed by researchers at the FAO (Robinson et al., 2014) and are continuously being updated as new spatial data sets become available and modeling techniques evolve (Nicolas et al., 2016; Da Re et al., 2020).

Land tenure

Holden et al. (2016) document that few low- and middle-income countries have nationally representative data that can be used to understand how land tenure policies or tenure reforms may affect land market activity, land productivity, technology adoption, or changes in the distribution of farm size. Measurement challenges in this area are primarily related to the complexity of the concept of tenure and the different set of rights that define it (FAO, 2002; United Nations, 2019), as well as to the fact that different individuals may have different perceptions of tenure, particularly in the case of joint ownership (Ambler et al., 2020; Kilic and Moylan, 2016).
In high-income countries, increasing challenges for data collection arise from more complex forms of ownership, linked to the rise of corporate land ownership and of intricate company arrangements for corporate farms (National Academies of Sciences, Engineering, and Medicine, 2019; MacDonald, 2016). With respect to adequately capturing the different dimensions of tenure, the consensus has converged on the need for survey data to cover a bundle of ownership rights, including documented ownership, reported ownership, and the rights to sell and bequeath (United Nations, 2019), and survey instruments have been developed to implement this guidance (FAO et al., 2019). In the United States, where corporate farms account for an increasingly important share of agricultural value added, the Agricultural Census form includes a specific set of questions on the type of farm organization (whether a Limited Liability Company) and its legal tax status (family, partnership, incorporated, or other). The census information is then integrated with a separate Tenure, Ownership, and Transition of Agricultural Land (TOTAL) Survey, which focuses specifically on all land rented out for agricultural purposes, whether by farmers and ranchers (operator landlords) or by non-operator landlords. Given the complexity of some operations, the surveys face challenges in defining the landlord, identifying the owners (particularly when incorporated), and assessing the location and size (combined acreage) of landowners' holdings (Hamer, 2016). For household farms, whenever individual-level data are of interest, such as when the research objective is to study gender gaps in productivity, wealth, or vulnerability, land ownership should be reported by self rather than proxy respondents, owing to well-documented and large discrepancies between proxy and self-responses. While research on the implications of different possible approaches is still needed, the primary issue is the method of respondent selection, where researchers increasingly favor interviewing multiple individuals per household. Approaches may vary, and will also depend on the objective of the analysis, but they can be reduced to essentially three options: (1) interview all household members, (2) focus on the members of the principal couple if one is present, or (3) select a random age-eligible household member and his/her partner if applicable (Doss et al., 2019). When multiple household members are interviewed, they should be interviewed separately and, whenever possible, concurrently or consecutively, so as to avoid the possibility of contamination in their responses (United Nations, 2019).

Climate: weather events, perceptions of and adaptation to climate change

Climate data have experienced a revolution in recent decades, one that continues to the present day. While climate and weather have always been central to explanations of agricultural productivity, attention has increased with the emergence of debates on climate change, climate-smart agriculture (Lipper et al., 2018), and index-based insurance (Benami et al., 2021; Carter et al., 2017; Jensen and Barrett, 2017; Rosenzweig and Udry, 2014). Dell et al. (2014) and Auffhammer et al. (2013) provide excellent reviews of the types of available climate data as well as their accompanying measurement bias and coverage concerns, which economists should consider when relying on climate data for making inferences.
In terms of the production and availability of climate data, there has been a surge in data from remote sensing and in situ sensors (which are discussed later in the paper), as well as concerns in Africa and small island states regarding the decline in the availability of traditional meteorological stations (Dinku, 2019; Dobardzic et al., 2019). Weather data are commonly classified into four categories: ground station data, gridded data, satellite data, and reanalysis data. Data from ground stations offer direct observation of key weather variables, but their coverage is neither universal nor constant over time, with weather stations being relatively sparse in many low- and middle-income countries. Additionally, their coverage and trends are often related to the distribution of weather variables, posing estimation problems similar to those of selective attrition. Gridded data provide complete coverage at different resolutions by interpolating weather station data and assigning a value for weather variables to each cell on the grid. They offer the desirable advantage of balanced panels, but analysts should be aware that results will differ across products, particularly for outcomes that have greater spatial variation, such as precipitation. The presence of missing values in the underlying station data and the spatial correlation introduced by extrapolation algorithms both create potential biases in the estimated coefficients and standard errors when gridded products are used as independent variables in econometric analyses (Dell et al., 2014). Satellite data use readings from satellite-borne sensors but do not directly measure weather events. Their time series are shorter than those for station and gridded data (starting in the 1990s and increasing since the 2000s), and their quality may not be uniform, due to changes in satellites and sensor features. Reanalysis data combine information from other weather data sources and process them through a climate model to estimate (not simply interpolate) weather variables across a grid. Analysts should consider whether such modeled data are preferable to interpolated gridded data, given the objective of the analysis, and should be aware that the correlation across models is often weak, particularly for rainfall data. Dell et al. (2014) and Auffhammer et al. (2013) provide a more detailed discussion, while Michler et al. (2020) and Parkes et al. (2019) provide examples of empirical applications testing the behavior of different gridded products as explanatory variables in agricultural productivity analyses for India and Sub-Saharan Africa, respectively.

Analysts must also identify the most appropriate set of climatic variables to use when specifying explanatory models for outcomes heavily dependent on climatic inputs. Advances have come from the increased cross-fertilization between crop science and statistical models, which has expanded the range of climate variables used in empirical analysis beyond standard rainfall and temperature measures. Newly adopted climate variables include growing degree days (GDD) and extreme heat degree days (EHDD), as well as measures to better account for humidity and evapotranspiration such as vapor pressure deficit (VPD), wind speed, and sunshine duration (Roberts et al., 2013; Zhang et al., 2017).
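As an illustration, the sketch below computes GDD using the common daily-average formulation with a base temperature and an upper cap; the thresholds shown (10°C and 30°C, often used for maize) and the temperature series are assumptions for the example.

```python
# A short illustration of computing growing degree days (GDD), one of the
# climate variables mentioned above, from a daily temperature series.
import numpy as np

def growing_degree_days(tmax, tmin, t_base=10.0, t_cap=30.0):
    """Sum of daily degree-day contributions over the season."""
    tmax = np.minimum(np.asarray(tmax, dtype=float), t_cap)   # cap extreme heat
    tmean = (tmax + np.asarray(tmin, dtype=float)) / 2.0
    return np.maximum(tmean - t_base, 0.0).sum()

# Hypothetical five-day series, degrees Celsius.
print(growing_degree_days(tmax=[28, 31, 33, 25, 29], tmin=[14, 16, 18, 12, 15]))
```

Extreme heat degree days can be built analogously, accumulating only the degrees above a damage threshold.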
Challenges remain for statistical models in accounting for the effects of carbon dioxide (CO2) that accompany warming or the concentration of ozone (O3) that may be associated with the burning of fossil fuels (Lobell and Asseng, 2017).

In parallel, there have also been increased efforts to capture both subjective perceptions of climate change and the adoption of adaptation practices by farmers. While several researchers have engaged in collecting data with this objective in mind (e.g., Di Falco, 2011), there have been few attempts (McCarthy, 2011) to systematize data collection instruments in this domain; as such, this remains an area in need of further development. However, recent studies comparing self-reported data on weather events to recorded, observed weather data find a very weak correlation between the two. More importantly, they find that self-reported weather data are influenced by variables of interest, such as involvement in off-farm activities (Nguyen & Nguyen, 2020; Waldman et al., 2019). Self-reported data hold more promise for investigating perceptions and adaptation actions by farmers, whereas indicators referring to realized weather events should be based on objective data whenever possible. Researchers of smallholder, rain-fed production systems face particular challenges in achieving the granular resolution required for conducting plot-level analysis of the determinants of productivity, yield variability, and other key outcomes.

5.2. Advances in data collection modes and data structures

Earth Observation

The growing number of satellites orbiting the Earth has vastly expanded the availability of satellite-borne sensors supplying a variety of data at high temporal and spatial resolution. A classification of satellite sensor categories, with their main features, following the European Space Agency nomenclature, is provided in Table 3.

Table 3. Classification of satellite sensor categories, based on the European Space Agency (ESA) nomenclature (table not reproduced here). Source: GSARS (2017).

Remote sensing data are being used and adapted for countless purposes in farm management, agricultural programs, agricultural statistics, and empirical agricultural economics. GSARS (2017) provides a comprehensive overview of the uses of remote sensing in agricultural statistics, including land cover mapping, the design of sampling frames, crop mapping, crop area and yield estimation, and early warning systems. With Earth Observation data becoming available more frequently and at increasing granularity, recent research has focused on facilitating and validating the use of these data for different cropping systems, at scale, and in a timely fashion (Defourny et al., 2019). Recent studies have also focused on developing and validating methods and standards for the efficient collection of in situ ground-truthing data for model calibration (Azzari et al., 2021; d'Andrimont, 2018; Paliwal and Jain, 2020).
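The mechanical step underlying the integration of such data with surveys is attaching a gridded value to each georeferenced record. The sketch below shows the nearest-cell lookup at its simplest, on a synthetic raster with planar coordinates and an illustrative resolution; production work would handle map projections, cell edges, and any buffering applied to masked coordinates.

```python
# A minimal sketch of linking a gridded product to georeferenced survey plots
# via nearest-cell lookup (synthetic raster; all values hypothetical).
import numpy as np

res, lat0, lon0 = 0.05, 0.0, 30.0                        # raster origin and resolution
grid = np.random.default_rng(1).random((100, 100))       # e.g., an NDVI composite

def extract_at(lat, lon):
    """Return the grid value for the cell containing (lat, lon)."""
    row = int((lat - lat0) / res)
    col = int((lon - lon0) / res)
    return grid[row, col]

plots = [(1.23, 31.48), (3.71, 34.02)]                   # surveyed plot coordinates
print([round(extract_at(lat, lon), 3) for lat, lon in plots])
```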
For empirical applications in agricultural economics, remote sensing data offer the promise of far greater accuracy, objectivity, temporal resolution, and coverage than could be achieved through traditional survey methods relying on farmers' self-reporting. However, remote sensing data sets are not immune from measurement error. Michler et al. (2020) identify three sources of error when using remote sensing weather data in conjunction with survey data. Errors can be introduced through the measurement technology, the algorithm that converts the measurement into a variable for analytical use (e.g., rainfall), or the resolution of the data. Errors can also occur in linking remote sensing data to the household, plot, or farm on which the analysis is run, as well as from using variables that are not 'fit for purpose' from an agronomic perspective.

The use of remote sensing for crop area estimation and crop and yield mapping is now widespread, with a continuous flow of new competing products being developed by public sector agencies, academics, and the private sector, often in partnership. However, remote sensing presents specific challenges in smallholder systems, which require high resolution and often feature inter-cropping patterns that are hard to characterize based on satellite data (Burke and Lobell, 2017; Jain et al., 2016; Jin et al., 2019; Rustowicz et al., 2019). Thus, remote sensing and ground data are much more productively seen as complements rather than substitutes. The use of survey data, particularly objective measures such as crop cuts, for ground-truthing and training models based on satellite data can greatly increase the accuracy of remote sensing predictions (see Lobell et al., 2019; d'Andrimont, 2018; and Paliwal and Jain, 2020 for yield measurement, and Hengl et al., 2020 for global soil mapping). The combined use of multiple sources is the most promising avenue for agricultural data systems to minimize error and maximize coverage. As for climate variables, users should be aware of the error structures present in modeled estimates when using them as independent variables in econometric analyses. One key obstacle to using Earth Observation data in conjunction with spatially explicit survey data is that of overcoming confidentiality concerns. For some years now, the United States Department of Agriculture has been aware of the lack of precise spatial information as a major weakness of its flagship ARMS survey, limiting its value for a range of applications. Other international survey programs such as the Demographic and Health Survey (DHS) and the Living Standards Measurement Study (LSMS) adopt protocols to publicly disseminate 'masked' coordinates while preserving anonymity. However, researchers and the global statistical community are still searching for dissemination standards that can maximize the value of spatially explicit data for analytical applications while also preserving anonymity (Croft et al., 2021).

Crowdsourced and citizen-generated data

An innovative source of data that is likely to be increasingly used for research in the coming years is citizen-generated data (Lämmerhirt et al., 2018). This includes data generated via crowdsourcing, that is, by enlisting a large 'crowd' of individuals (volunteers or paid participants) or devices (e.g., sensors) to collect and share data. In the cognitive science discipline, one third to one half of the scientific papers in top-tier journals are now based on crowdsourced data sets (Stewart et al., 2017). However, at the time of this writing, crowdsourced data are still relatively underused in agricultural economics and are more often employed for operational purposes rather than academic work. In economic research more broadly, the disciplines most likely to use such data are those more amenable to the wholesale enlisting of respondents through dedicated platforms, such as labor market or consumer research.
Citizen-generated data are already contributing, or demonstrating the potential to contribute, to advancing the global data agenda (Fraisl et al., 2020). Their supply and use can be expected to expand rapidly in the coming years, but this will require solutions to overcome issues around quality control and validation (Balázs et al., 2021; Wiggins et al., 2021). In the agriculture and food domain, crowdsourced data are more common in price data collection efforts, where agents or volunteers can be recruited to survey markets (UN Global Pulse, 2015; Zeug et al., 2017; Ochieng and Baulch, 2020). They are also used for obtaining climate data, such as rainfall, which is weakly correlated across space (Minet et al., 2017) and can be crowdsourced by connecting micro rain gauges to the internet (Van de Giesen et al., 2014). Another option is soil data collection, which can be crowdsourced to farmers using smartphone apps to collect soil profile information (Herrick et al., 2013). One study crowdsourced the visual interpretation of satellite imagery from popular mapping applications to estimate the global distribution of field size (Lesiv et al., 2019). In a review article, Ebitu et al. (2021) identify data collection as the main current thrust for citizen science in agriculture, with key challenges including validation procedures, but primarily the recruitment, motivation, and retention of volunteers.

Citizen-generated data are attractive due to their potential to return data at high levels of spatial and temporal resolution with relatively limited costs. However, these data present significant limitations in their representativeness and in the quality of the data generation process that must be understood and managed for statistical inference. Based on a review of survey data, Wiggins et al. (2011) propose a quality assurance framework for citizen science data organized along two categories of sources of errors (which may derive from participants or field protocols) and three entry points in the data production process. While the potential of citizen science data for agriculture and beyond is huge, it is clear that before such data can be mainstreamed in data production, more effort must go into ensuring that data collected through 'volunteers' with varying levels of expertise and commitment are of acceptable quality (Bonter and Cooper, 2012). Mehrabi et al. (2021) warn of an emerging global divide in data-driven farming, linked to the differential access to mobile data technologies for low-resourced farmers, particularly in Africa, as a result of a combination of differential ownership of mobile devices, poorer data connections, and connectivity costs. However, the rapid increase in both mobile phone ownership and phone coverage in most countries bodes well for a more widespread adoption of phone data collection. In the cognitive science literature, where crowdsourced data are mainly generated via the Amazon Mechanical Turk platform, concerns have arisen about the professionalization of the individuals contributing the data, with many of them sharing information on internet fora in ways that pose concerns for the independence of the observations (Stewart et al., 2017). Statistics Canada is one of the few statistical offices that have actively published data generated through crowdsourcing, for public policy applications ranging from urban planning to gauging the price of marijuana on the illegal market ahead of its legalization.
Tellingly, such data are not accompanied by the indications of accuracy (including bias and coverage) that accompany other published statistics (Statistics Canada, 2021). Methodologies for validating and correcting crowdsourced data through post-stratification (Arbia et al., 2020) or other efforts to assess and improve the bias and variability of the estimates are now starting to emerge (Buil-Gil et al., 2020). With their further development, crowdsourced data will surely become an increasingly important source of data for agricultural economics applications.

Phone surveys

Phone surveys have been around for decades and are in fact part and parcel of survey data collection in several high-income countries (NRC, 2008; Slavec and Toninelli, 2015). In low-income countries, phone surveys were for some time confined predominantly to the collection of data in conflict- or disaster-affected areas where ground operations are more constrained (Hoogeveen and Pape, 2020), or in urban areas where phone ownership and coverage are higher. However, their adoption quickly became ubiquitous with the onset of the COVID-19 pandemic in 2020, as statistical offices and practitioners increasingly recognize how phone surveys can become an integral part of a modernized survey system beyond the contingency of the pandemic response period (Glazerman et al., 2020; Young Lives, 2020; Josephson et al., 2021). There are specific coverage concerns for phone surveys linked to the extent and patterns of (mobile) phone penetration, which can be expected to be correlated with variables of interest. Such concerns are far more severe in low-income countries, where phone penetration has been increasing but is still far from universal, and specifically in rural areas, where agricultural economists often focus their research interests (Dillon, 2012; Ballivian et al., 2015; Leo et al., 2015; Lamanna et al., 2019; Mehrabi et al., 2021; GSMA, 2020; Dabalen et al., 2016). During the COVID-19 pandemic, phone surveys made it possible to contact respondents amid widespread travel and social distancing restrictions, without exposing them or the enumerators to a health risk. Phone surveys can also generate much more frequent data relative to face-to-face interviews, due to their reduced cost and simplified logistics (e.g., not requiring travel). This can limit survey error for variables that are more prone to recall error, such as agricultural labor (Arthi et al., 2018) or continuous crop production estimates (Kilic et al., 2021), as well as increase the temporal dimension of data collection for outcomes that have low autocorrelation (McKenzie, 2011), or where short-term changes over time are of the essence, as is the case for the study of resilience (Knippenberg et al., 2019). Concerns remain for the representativeness and coverage of phone surveys, not only for specific households that may be less likely to have access to a phone connection, but also for individuals who are less likely to be phone owners or are otherwise less represented in phone survey samples (Leo et al., 2015; Brubaker et al., 2021). Such issues can be mitigated when the phone survey sampling frame is based on an adequate set of information on observable household and individual characteristics, as is the case when the phone survey is tied to a recent representative face-to-face survey that collected respondent phone numbers (Ambel et al., 2021); a stylized reweighting example follows below.
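In the sketch below, phone-survey respondents are reweighted so that the weighted distribution of a characteristic known from the face-to-face frame matches the frame's population shares. A single urban/rural stratum is used for clarity, and all data and shares are hypothetical; applications such as Ambel et al. (2021) use richer sets of characteristics.

```python
# A minimal post-stratification sketch: reweight phone-survey respondents to
# the population shares of strata known from a representative face-to-face frame.
import pandas as pd

frame_shares = {"rural": 0.70, "urban": 0.30}            # shares in the frame

# Hypothetical phone sample in which rural households are under-represented.
phone = pd.DataFrame({"stratum": ["rural"] * 40 + ["urban"] * 60})

sample_shares = phone["stratum"].value_counts(normalize=True)
phone["weight"] = phone["stratum"].map(lambda s: frame_shares[s] / sample_shares[s])

# Weighted shares now reproduce the frame: 0.70 rural, 0.30 urban.
print(phone.groupby("stratum")["weight"].sum() / len(phone))
```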
While the sample size of phone surveys using this approach is limited by the sample size of the existing representative survey, phone surveys that sample numbers from a list or via Random Digit Dialing (RDD) usually lack sociodemographic information associated with each phone number, making it harder to assess and improve their representativeness (Henderson and Rosenbaum, 2020; Himelein et al., 2020). Other limitations of phone surveys relate to the type of information that can be asked over the phone, both because of content that respondents may not feel comfortable sharing over the phone and because of limits on overall interview length (Abay et al., 2021). Even so, recent experience has demonstrated the value of collecting information over the phone on issues related to agriculture and food security (Amankwah and Gourlay, 2020; Hirvonen et al., 2021), charting the way for a survey research and implementation agenda to leverage the integration of high-frequency data collection via phones and other mobile technology with traditional face-to-face surveys. Such a mixed-mode approach can carry the added advantage of freeing up space in face-to-face surveys by moving items that can be collected remotely, generating data characterized by both reduced survey error and higher temporal resolution. Mixed-mode models can also be instrumental for achieving the temporal resolution needed for many indicators, as well as for providing a low-cost platform to collect more accurate data on high-frequency, repeated occurrences, such as labor allocation in agriculture and other time use data. This is a likely direction for investment in the survey research agenda in the coming years, where the involvement of agricultural economists in influencing the structure and features of the resulting data will be paramount.

Panel data

Understanding agriculture and the rapid transformation processes under way in countries at all stages of development requires panel data. Partly in response to this renewed awareness, we have recently witnessed a surge in the availability of panel data related to agriculture and rural development in low- and middle-income countries. While for decades the ICRISAT village study (Walker and Ryan, 1990) was one of the few longitudinal data sets allowing research on agricultural and rural livelihoods, over the past two decades the availability of such data sets has increased substantially, even if they remain limited in number and geographic coverage. Examples include the panel data set for Ghana collected by researchers at Yale and the Institute of Statistical, Social, and Economic Research in Ghana, the panel data sets collected by statistical offices in eight Sub-Saharan African countries under the World Bank's LSMS-ISA program, the National Income Dynamics Study (NIDS) in South Africa, the Kagera Health and Development Survey in Tanzania, the Family Life Surveys in Indonesia and Mexico, and the panel data collected by IFPRI in several countries in Asia and Africa, by the Tegemeo Institute in Kenya, and by Michigan State University in Zambia, among others. These surveys have generated an invaluable wealth of research and contributed to answering key policy questions that cross-sectional data have been unable to convincingly address. We have discussed above (see section 3.5) several actions that can be taken to manage attrition in panel data, whether ex-ante, by improving the design and implementation of tracking protocols, or ex-post; a simple ex-post diagnostic is sketched below.
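One common ex-post starting point is to compare baseline characteristics of households successfully re-interviewed with those lost to follow-up, before reweighting or modeling attrition. The panel below is entirely synthetic, with smaller farms assumed likelier to drop out.

```python
# A minimal selective-attrition check on a synthetic panel: do attritors
# differ systematically at baseline from households that were re-interviewed?
import numpy as np
import pandas as pd

rng = np.random.default_rng(5)
n = 1000
base = pd.DataFrame({
    "land_ha": rng.lognormal(0.0, 0.6, n),
    "head_age": rng.integers(20, 80, n),
})
# Assumed attrition process: smaller farms are likelier to move and be lost.
p_stay = 1.0 / (1.0 + np.exp(-(0.5 + 0.4 * np.log(base["land_ha"]))))
base["reinterviewed"] = rng.random(n) < p_stay

print(base.groupby("reinterviewed")[["land_ha", "head_age"]].mean().round(2))
print(f"attrition rate: {1 - base['reinterviewed'].mean():.1%}")
```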
The availability and penetration of mobile phones and the growing adoption of CAPI have been important innovations, enabling the implementation and improvement of tracking for longitudinal surveys in low-income countries. Collecting as many contact numbers as feasible at baseline greatly improves the likelihood of being able to recontact households that move between survey waves, and has also played a fundamental role in allowing the longitudinal tracking of households for phone surveys during the COVID-19 pandemic (Glazerman et al., 2020; Gourlay et al., 2021). Georeferencing households is an additional technology-based solution that can help relocate the site of the dwelling in areas where dwellings are not otherwise clearly marked or identifiable (Witoelar, 2011). Additionally, the technology embedded in CAPI applications is providing new approaches for survey designers and implementers to understand and manage attrition (Kreuter, 2013), as well as to improve data quality through better remote supervision. Specifically, the paradata produced during CAPI interviews enable the analyst to understand certain features that predict attrition as they materialize during the course of the interview, including enumerator effects. These paradata can inform actions to minimize attrition and monitor individuals at higher risk of dropping out of the sample, thus countering the predominant coverage issue for longitudinal data (Mercer, 2012; Roßmann and Gummer, 2016). Finally, following the onset of the COVID-19 pandemic, the availability of well-established long-term longitudinal studies put countries at an advantage for rapidly shifting to high-frequency phone surveys to monitor the impact of the pandemic. This served to fill critical data demands, while also reducing the potential coverage biases of phone surveys by providing better sampling frames and a wealth of information for the ex-post mitigation of bias.

6. Conclusions

Agricultural data continue to suffer from lack of availability, poor quality, and incomplete coverage. However, in recent years, increasing data demands and emerging policy questions, such as those around climate change and demographic trends, have driven innovation in the sector, with rapid technological change and methodological advances providing an opportunity to collect more and better data at lower cost. In the past two decades, technology has expanded the data production frontier to generate more accurate, granular, and frequent data within shrinking budget envelopes. These innovations have been accompanied by greater attention to issues of measurement error and coverage, focused on ways to attenuate trade-offs and achieve both high accuracy and high representativeness to the greatest extent possible, and by greater rigor in testing the validity of changes in methods via randomized validation exercises. This paper is a testament to the increased importance of data and data quality issues within the agricultural economics profession. Researchers hold the power and responsibility to make wiser design choices throughout the data production process. However, reaching the full potential of improvements in data structures for producing policy-relevant empirical analysis may require changes in researchers' incentives and priorities to generate knowledge that is accurate, relevant, and credible.
For instance, a recent evidence synthesis paper exposes a striking disconnect between empirical agricultural and social science research and policy questions (Porciello et al., 2020). Throughout the paper, we have highlighted the importance of improving agricultural data structures for empirical analysis, while accounting for the inherent trade-offs involved in designing data collection for agricultural research and policy. Measurement error creates both internal and external validity issues that limit causal inference and the descriptive understanding of national agricultural systems. Coverage biases also create internal and external validity issues, particularly when limited coverage biases the testing of underlying mechanisms that drive agricultural choices. While surveys remain the linchpin of agricultural policy analysis, other traditional data sources such as administrative data and agricultural censuses, as well as newer sources like Earth Observation and remote sensing data, play equally important roles in improving the coverage of agricultural data in its many domains. Additionally, alternative data sources such as citizen-generated data and methods such as machine learning, while not yet mainstreamed in agricultural data production, offer tremendous opportunities for the future. To achieve their potential, these newer data sources require fully developed quality assurance frameworks to address multiple sources of errors and biases, just as traditional ones do.

As data users become more integrated into data system design, data systems can be better designed for empirical research and policy to minimize measurement error. As emphasized by many authors, non-classical measurement error and its effects vary by sample and are not necessarily adequately treated and corrected using ex-post econometric tools. Nonetheless, trade-offs are inevitable, as increased coverage can lead to measurement error and internal validity concerns, while low coverage reduces policy relevance and the external validity of parameter estimates. To promote more systemic learning, validation studies and experimentation must be carried out more systematically within or in parallel to other data collection efforts, and lessons learned from the existing vast body of research in the impact evaluation literature must be streamlined and systematized to offer guidelines on best practices for researchers. Specifically, we propose bridging the gap between the impact evaluation literature and observational studies by methodically incorporating survey experiments to validate new methods and types of data collection. The empirical standard in many validation studies is to use a "gold standard" as numeraire, although such "gold standard" metrics are also likely to be measured with error. As a result, many of the available validation studies tend to measure error relative to a standard deemed "closer to the truth". While technology presents an opportunity to benchmark agricultural measures and generate more objective benchmarks for validation purposes (e.g., DNA fingerprinting to measure improved seed variety adoption), these processes are often considered too costly to be conducted at scale. However, the rapidly decreasing costs and increasing diffusion of new technologies bode well for the future.
Furthermore, future survey experiments need to expand the set of econometric techniques used to identify unbiased effects of survey design choices beyond pairwise comparisons (Dillon et al., 2019), as has been the case in labor economics, where even the 'gold standard' of United States administrative data has been challenged (Abowd and Stinson, 2013). Fostering greater integration and interoperability across data sources would also create more opportunities for minimizing measurement error while maximizing spatial and temporal coverage. As shown, sample surveys have been used to ground-truth remote sensing imagery for the estimation of crop productivity and other agricultural metrics from space. These experiments are examples of how reducing measurement error and improving coverage can be achieved simultaneously through better data interoperability. This is best done when proper design choices are made ex-ante, so as to also minimize the measurement errors of the ground data. Achieving greater reliability of remote sensing data could radically improve the geographic granularity, timeliness, and frequency of agricultural estimates, while also potentially constraining costs. Attaining such a goal will require better coordination and acceleration of research efforts, including the production of multi-purpose ground layers of high-quality measurements.

Maximizing the coverage of agricultural data also requires improving other traditional sources such as routine data systems and agricultural censuses. The weak data quality of both sources, as well as the low periodicity and predictability of agricultural censuses, particularly in lower-income countries and regions, remain matters of concern. With regard to administrative data, underfunding and the persistent neglect of extension services in past decades are responsible for the current unenviable state of affairs. Digitalization and the adoption of technological solutions can accelerate progress in this area. Furthermore, linking administrative data to newer data sources such as crowdsourced data or high-frequency community surveys through sentinel sites could go a long way towards enhancing the statistical rigor of administrative data. Rethinking administrative data collection and its interoperability with other data sources, while also ensuring better access, should be prioritized to contribute to minimizing error and maximizing coverage of agricultural data. The trend towards greater reliance on administrative data is well advanced in more developed economies, with low- and middle-income countries lagging behind.

New data sources and modes of data collection such as phone or web surveys, as well as crowdsourcing and other forms of citizen-generated data, offer tremendous potential to improve the availability and frequency of agricultural data. However, to fully exploit these opportunities, better methods are needed to account for likely biases due to selectivity and under-coverage. It is also important to raise awareness, particularly among young researchers, of the pitfalls of ignoring these potential errors, and to build their capacity to address them, both at the design and analytical stages. Finally, relying on direct measurements, in contrast to the more common practice of asking farmers to self-report, often based on long recall periods, has become steadily more feasible due to the declining cost of technology.
Bibliography

Abate, G., de Brauw, A., Gibson, J., Hirvonen, K., & Wolle, A. (2020). Telescoping Causes Overstatement in Recalled Food Consumption: Evidence from a Survey Experiment in Ethiopia. IFPRI Discussion Paper 1976. Washington, DC: International Food Policy Research Institute (IFPRI).
Abay, K. (2020). Measurement Errors in Agricultural Data and their Implications on Marginal Returns to Modern Agricultural Inputs. Agricultural Economics, 51. doi:10.1111/agec.12557
Abay, K. A., Abate, G. T., Barrett, C. B., & Bernard, T. (2019). Correlated non-classical measurement errors, 'second best' policy inference, and the inverse size-productivity relationship in agriculture. Journal of Development Economics, 139, 171-184. doi:10.1016/j.jdeveco.2019.03
Abay, K. A., Berhane, G., Hoddinott, J., & Tafere, K. (2021). Assessing Response Fatigue in Phone Surveys: Experimental Evidence on Dietary Diversity in Ethiopia. Policy Research Working Paper No. 9636. World Bank, Washington, DC.
Abay, K. A., Bevis, L., & Barrett, C. B. (2020). Measurement Error Mechanisms Matter: Agricultural Intensification with Farmer Misperceptions and Misreporting. American Journal of Agricultural Economics, 103(2).
Aceves-Bueno, E., Adeleye, A., Feraud, M., Huang, Y., Tao, M., Yang, Y., & Anderson, S. (2017). The Accuracy of Citizen Science Data: A Quantitative Review. Bulletin of the Ecological Society of America, 98, 278-290. doi:10.1002/bes2.1336
Akogun, O. B., Dillon, A. S., Friedman, J., Prasann, A., & Serneels, P. M. (2020). Productivity and Health: Physical Activity as a Measure of Effort. The World Bank Economic Review.
Alcser, K., Clemens, J., Holland, L., Guyer, H., & Hu, M. (2016). Interviewer recruitment, selection, and training. Guidelines for Best Practice in Cross-Cultural Surveys, 419-468.
Ali, D. A., & Deininger, K. (2014). Is there a farm-size productivity relationship in African agriculture? Evidence from Rwanda. Tech. rep., The World Bank.
Ali, D., K. Deininger and A. Harris (2019). Does Large Farm Establishment Create Benefits for Neighboring Smallholders? Evidence from Ethiopia. Land Economics, 95(1).
Alkire, S., Meinzen-Dick, R., Peterman, A., Quisumbing, A., Seymour, G., & Vaz, A. (2013). The women's empowerment in agriculture index. World Development, 52, 71-91.
Amankwah, A. & Gourlay, S. (2021). Impact of COVID-19 Crisis on Agriculture: Evidence from Five Sub-Saharan African Countries. LSMS Integrated Surveys on Agriculture. Washington, DC: World Bank Group.
Amaya, A., Bach, R., Keusch, F., & Kreuter, F. (2019). New Data Sources in Social Science Research: Things to Know Before Working With Reddit Data. Social Science Computer Review. doi:10.1177/0894439319893305
Amaya, A., Biemer, P., & Kinyon, D. (2020). Total Error in a Big Data World: Adapting the TSE Framework to Big Data. Journal of Survey Statistics and Methodology, 8, 89-119. doi:10.1093/jssam/smz056
Ambel, A., K. McGee and A. Tsegay (2021). Reducing Bias in Phone Survey Samples: Effectiveness of Reweighting Techniques Using Face-to-Face Surveys as Frames in Four African Countries. Policy Research Working Paper No. 9676. World Bank, Washington, DC.
Ambler, K., Herskowitz, S., & Maredia, M. (2020). Are we done yet? Response fatigue and rural livelihoods. IFPRI Discussion Paper 1980. International Food Policy Research Institute.
Angrist, J. and J.-S. Pischke (2010). The Credibility Revolution in Empirical Economics: How Better Research Design Is Taking the Con out of Econometrics. Journal of Economic Perspectives, 24(2).
Arbia, G., Solano-Hermosilla, G., Micale, F., Nardelli, V., & Genovese, G. (2020). Post-sampling crowdsourced data to allow reliable statistical inference: the case of food price indices in Nigeria.
Arthi, V., Beegle, K., De Weerdt, J., & Palacios-Lopez, A. (2017). Not your average job: Measuring farm labor in Tanzania. Journal of Development Economics, 130. doi:10.1016/j.jdeveco.2017.10.005
Ashenfelter, O. and A. Krueger (1994). Estimates of the Economic Return to Schooling from a Sample of Twins. American Economic Review, 84(5).
Ashour, M., Gilligan, D., Hoel, J., & Karachiwalla, N. (2018). Do Beliefs About Herbicide Quality Correspond with Actual Quality in Local Markets? Evidence from Uganda. The Journal of Development Studies, 55, 1-22. doi:10.1080/00220388.2018.1464143
Auffhammer, M., Hsiang, S., & Schlenker, W. (2013). Using weather data and climate model output in economic analyses of climate change. Review of Environmental Economics and Policy, 6.
Bachewe, F. N., Minten, B., Taffesse, A. S., Pauw, K., Cameron, A., & Endaylalu, T. G. (2020). Farmers' grain storage and losses in Ethiopia: Measures and associates. Journal of Agricultural and Food Industrial Organization, 18. doi:10.1515/jafio-2019-0059
Bailey, D., Trotter, M., Knight, C., & Thomas, M. (2018). Use of GPS tracking collars and accelerometers for rangeland livestock production research. Translational Animal Science, 2. doi:10.1093/tas/txx006
Balázs, B., Mooney, P., Nováková, E., Bastin, L., & Jokar Arsanjani, J. (2021). Data Quality in Citizen Science. In K. Vohland et al. (eds.), The Science of Citizen Science. Springer, Cham. https://doi.org/10.1007/978-3-030-58278-4_8
Bakker, B. F., Van Rooijen, J., & Van Toor, L. (2014). The system of social statistical datasets of Statistics Netherlands: an integral approach to the production of register-based social statistics. Statistical Journal of the United Nations ECE, 30, 411-424. doi:10.3233/SJI-140803
Ballivian, A., Azevedo, J. P., & Durbin, W. (2015). Using Mobile Phones for High-Frequency Data Collection. In Mobile Research Methods: Opportunities and Challenges of Mobile Research Methodologies, 21-39.
Bardasi, E., Sabarwal, S., & Terrell, K. (2011). How do female entrepreneurs perform? Evidence from three developing regions. Small Business Economics, 37, 417-441. doi:10.1007/s11187-011-9374-z
Bardhan, P., & Udry, C. (1999). Development Microeconomics. OUP Oxford.
Barrett, C. B., Gebru, G., McPeak, J. G., Mude, A. G., Vanderpuye-Orgle, J., & Yirbecho, A. T. (2008). Codebook for data collected under the Improving Pastoral Risk Management on East African Rangelands (PARIMA) project. Unpublished, Cornell University.
Barrett, C., T. Reardon, J. Swinnen and D. Zilberman (forthcoming). Agro-Food Value Chain Revolutions in Low- and Middle-Income Countries. Journal of Economic Literature.
Baulch, B., Ochieng, D. O., & others. (2020). Most Malawian maize and soybean farmers sell below official minimum farmgate prices. Tech. rep., International Food Policy Research Institute (IFPRI).
Beaman, L., & Dillon, A. (2012). Do household definitions matter in survey design? Results from a randomized survey experiment in Mali. Journal of Development Economics, 98, 124-135. doi:10.1016/j.jdeveco.2011.06
Beaman, L., BenYishay, A., Magruder, J., & Mobarak, A. (2018). Can Network Theory-Based Targeting Increase Technology Adoption? SSRN Electronic Journal. doi:10.2139/ssrn.3225815
Beegle, K., De Weerdt, J., & Dercon, S. (2011). Migration and economic mobility in Tanzania: Evidence from a tracking survey. Review of Economics and Statistics, 93, 1010-1033.
Beegle, K., Himelein, K., & Ravallion, M. (2012). Frame-of-reference bias in subjective welfare. Journal of Economic Behavior & Organization, 81. doi:10.1016/j.jebo.2011.07.020
Beegle, K., Olinto, P., Sobrado, C., & Uematsu, H. (2013). The State of the Poor: Where Are The Poor, Where Is Extreme Poverty Harder to End, and What Is the Current Profile of the World's Poor? World Bank Economic Premise.
Bellemare, M. G., Dabney, W., & Munos, R. (2017). A distributional perspective on reinforcement learning. International Conference on Machine Learning (pp. 449-458).
Belli, R. F., et al. (2004). Calendar and question-list survey methods: Association between interviewer behaviors and data quality. Journal of Official Statistics, 20(2), 185.
Benami, E., Jin, Z., Carter, M. R., Ghosh, A., Hijmans, R. J., Hobbs, A., . . . Lobell, D. B. (2021). Uniting remote sensing, crop modelling and economics for agricultural risk management. Nature Reviews Earth & Environment, 1-20.
Berazneva, J., McBride, L., Sheahan, M., & Güereña, D. (2018). Empirical assessment of subjective and objective soil fertility metrics in east Africa: Implications for researchers and policy makers. World Development, 105, 367-382. doi:10.1016/j.worlddev.2017.12.009
Beullens, K., & Loosveldt, G. (2016). Interviewer Effects in the European Social Survey. Survey Research Methods, 10(2), 103-118. European Survey Research Association.
Bevis, L., & Barrett, C. (2019). Close to the edge: High productivity at plot peripheries and the inverse size-productivity relationship. Journal of Development Economics, 143, 102377. doi:10.1016/j.jdeveco.2019.102377
Biagas, D., E. Abayomi, J. Rodhouse and H. Ridolfi (2019). Examining Interviewer Effects on the Agricultural Labor Survey: A Mixed-Methods Approach. 2019 Workshop: Interviewers and Their Effects from a Total Survey Error Perspective, 34. http://digitalcommons.unl.edu/sociw/34
Biemer, P. (2009). Chapter 12 - Measurement Errors in Sample Surveys. In C. R. Rao (Ed.), Handbook of Statistics (Vol. 29, pp. 281-315). Elsevier. doi:10.1016/S0169-7161(08)00012-6
Biemer, P. (2010). Total Survey Error: Design, Implementation, and Evaluation. The Public Opinion Quarterly, 74, 817-848. doi:10.2307/40985407
Biemer, P. (2017). Errors and inference. In Big Data and Social Science: A Practical Guide to Methods and Tools, 265-298.
Biemer, P. P., Groves, R. M., Lyberg, L. E., Mathiowetz, N. A., & Sudman, S. (1991). Measurement Errors in Surveys. John Wiley & Sons.
Biemer, P., & Lyberg, L. (2003). Introduction to Survey Quality. doi:10.1002/0471458740
Blaydes, L., and R. M. Gillum (2013). Religiosity-of-interviewer effects: Assessing the impact of veiled enumerators on survey response in Egypt. Politics & Religion, 6(3), 459-482.
Bold, T., Kaizzi, K., Svensson, J., & Yanagizawa-Drott, D. (2017). Lemon Technologies and Adoption: Measurement, Theory and Evidence from Agricultural Markets in Uganda. Quarterly Journal of Economics, 132. doi:10.1093/qje/qjx009
Bonter, D. N. and C. B. Cooper (2012). Data Validation in Citizen Science: A Case Study from Project FeederWatch. Frontiers in Ecology and the Environment, 10(6).
Bound, J., Brown, C., & Mathiowetz, N. (2001). Chapter 59 - Measurement Error in Survey Data. In J. J. Heckman & E. Leamer (Eds.), Handbook of Econometrics, Volume 5. Elsevier. doi:10.1016/S1573-4412(01)05012-7
Brubaker, J. M., T. Kilic and P. Wollburg (2021). Representativeness of Individual-Level Data in COVID-19 Phone Surveys: Findings from Sub-Saharan Africa. Policy Research Working Paper 9660. World Bank, Washington, DC.
Buil-Gil, D., Solymosi, R., & Moretti, A. (2020). Nonparametric Bootstrap and Small Area Estimation to Mitigate Bias in Crowdsourced Data. In C. A. Hill et al. (Eds.), Big Data Meets Survey Science (pp. 487-517). John Wiley & Sons, Ltd. doi:10.1002/9781118976357.ch16
Burke, W., Frossard, E., Kabwe, S., & Jayne, T. (2019). Understanding fertilizer adoption and effectiveness on maize in Zambia. Food Policy. doi:10.1016/j.foodpol.2019.05.004
Caeyers, B., Chalmers, N., & De Weerdt, J. (2010). A comparison of CAPI and PAPI through a randomized field experiment. SSRN Electronic Journal. doi:10.2139/ssrn.1756224
Cannell, C. F., Marquis, K. H., & Laurent, A. (1976). A summary of studies of interviewing methodology (Vol. 77). Department of Health, Education, and Welfare, Public Health Service, Health ….
Carfagna, E., & Gallego, F. J. (2005). Using Remote Sensing for Agricultural Statistics. International Statistical Review, 73, 389-404. doi:10.1111/j.1751-5823.2005.tb00155.x
Carletto, C., Aynekulu, E., Gourlay, S., & Shepherd, K. (2017a). Collecting the dirt on soils: advancements in plot-level soil testing and implications for agricultural statistics. The World Bank.
Carletto, C., Corral, P., & Guelfi, A. (2017b). Agricultural commercialization and nutrition revisited: Empirical evidence from three African countries. Food Policy, 67, 106-118. doi:10.1016/j.foodpol.2016.09.020
Carletto, C., Deininger, K., Savastano, S., & Muwonge, J. (2012). Using diaries to improve crop production statistics: Evidence from Uganda. Journal of Development Economics, 98(1), 42-50.
Carletto, C., Gourlay, S., Murray, S., & Zezza, A. (2017c). Cheaper, faster, and more than good enough: Is GPS the new gold standard in land area measurement? Survey Research Methods, 11, 235-265.
Carletto, G., Gourlay, S., Murray, S., & Zezza, A. (2016). Land Area Measurement in Household Surveys: A Guidebook. Tech. rep., Washington, DC: World Bank.
Carletto, G., Ruel, M., Winters, P., & Zezza, A. (2015). Farm-Level Pathways to Improved Nutritional Status: Introduction to the Special Issue. The Journal of Development Studies, 51, 945-957. doi:10.1080/00220388.2015.1018908
Carletto, G., Zezza, A., & Banerjee, R. (2013). Towards Better Measurement of Household Food Security: Harmonizing Indicators and the Role of Household Surveys. Global Food Security, 2, 30-40. doi:10.1016/j.gfs.2012.11.006
Carroll, R., Ruppert, D., Stefanski, L., & Crainiceanu, C. (2006). Measurement Error in Nonlinear Models: A Modern Perspective. Boca Raton: Chapman & Hall.
Carter, R., Lau, M., Johnson, V., & Kirkinis, K. (2017). Racial Discrimination and Health Outcomes Among Racial/Ethnic Minorities: A Meta-Analytic Review. Journal of Multicultural Counseling and Development, 45, 232-259. doi:10.1002/jmcd.12076
Chambers, R. G. (1988). Applied Production Economics: A Dual Approach.
Chambers, R. G., & Quiggin, J. (2000). Uncertainty, Production, Choice, and Agency: The State-Contingent Approach. Cambridge University Press.
Chamoso, P., Raveane, W., Parra, V., & González, A. (2014). UAVs Applied to the Counting and Monitoring of Animals. Advances in Intelligent Systems and Computing, 291. doi:10.1007/978-3-319-07596-9_8
Chesher, A., & Schluter, C. (2002). Welfare Measurement and Measurement Error. The Review of Economic Studies, 69, 357-378.
Cohen, A. (2019). Estimating Farm Production Parameters with Measurement Error in Land Area. Economic Development and Cultural Change, 68, 305-334. doi:10.1086/700557
Dabalen, A., Etang, A., Hoogeveen, J., Mushi, E., Schipper, Y., & Engelhardt, J. von. (2016). Mobile Phone Panel Surveys in Developing Countries: A Practical Guide for Microdata Collection. The World Bank.
Da Re, D., Gilbert, M., Chaiban, C., Bourguignon, P., Thanapongtharm, W., Robinson, T., & Vanwambeke, S. (2020). Downscaling livestock census data using multivariate predictive models: Sensitivity to modifiable areal unit problem. PLOS ONE, 15, e0221070. doi:10.1371/journal.pone.0221070
Das, J., J. Hammer and C. Sanchez-Paramo (2012). The Impact of Recall Periods on Reported Morbidity and Health Seeking Behavior. Journal of Development Economics, 98.
Das, N., Davies, E., Dillon, A., Glazerman, S., & Rosenbaum, M. (2021). Optimal Timing for Random Digit Dialing. Global Poverty Research Lab Working Paper No. 21-107.
Datashift. (2015). What is Citizen-Generated Data and What Is the DataShift Doing to Promote It?
Davis, D. W., & Silver, B. D. (2003). Stereotype Threat and Race of Interviewer Effects in a Survey on Political Knowledge. American Journal of Political Science, 47, 33-45.
Davis, R. E., Couper, M. P., Janz, N. K., Caldwell, C. H., & Resnicow, K. (2010). Interviewer Effects in Public Health Surveys. Health Education Research, 25, 14-26.
De Haan, W., Van Berkel, S., Van Der Asdonk, S., Finkenauer, C., Forder, C., Van Ijzendoorn, M., . . . Alink, L. (2019). Out-of-home placement decisions: How individual characteristics of professionals are reflected in deciding about child protection cases. Developmental Child Welfare, 1. doi:10.1177/2516103219887974
De Leeuw, E. (2004). To Mix or Not to Mix Data Collection Modes in Surveys. Journal of Official Statistics, 21.
De Leeuw, E. D., & Van der Zouwen, J. (1988). Data quality in telephone and face to face surveys: a comparative meta-analysis. In Telephone Survey Methodology.
De Mel, S., McKenzie, D., & Woodruff, C. (2008). Returns to Capital in Microenterprises: Evidence from a Field Experiment. The Quarterly Journal of Economics, 123, 1329-1372. doi:10.1162/qjec.2008.123.4.1329
De Nicola, F. and X. Giné (2014). How Accurate are Recall Data? Evidence from Coastal India. Journal of Development Economics, 106(1).
De Weerdt, J., K. Beegle, J. Friedman and J. Gibson (2016). The Challenge of Measuring Hunger through Survey. Economic Development and Cultural Change, 64(4).
De Weerdt, J., J. Gibson and K. Beegle (2020). What Can We Learn from Experimenting with Survey Methods? Annual Review of Resource Economics, 12.
Deaton, A., & Zaidi, S. (2002). Guidelines for constructing consumption aggregates for welfare analysis (Vol. 135). World Bank.
Deininger, K., Byerlee, D., Lindsay, J., Norton, A., Selod, H., & Stickler, M. (2011). Rising Global Interest in Farmland: Can It Yield Sustainable and Equitable Benefits? The World Bank.
Dell, M., Jones, B. F., & Olken, B. A. (2014). What Do We Learn from the Weather? The New Climate–Economy Literature. Journal of Economic Literature, 52, 740-798.
Deming, W. E. (2006). On Errors in Surveys (An Excerpt). The American Statistician, 60, 34-38. doi:10.1198/000313006X91755
Deming, W. E. (1944). On errors in surveys. American Sociological Review, 9, 359-369.
Desiere, S., & Jolliffe, D. (2018). Land productivity and plot size: Is measurement error driving the inverse relationship? Journal of Development Economics, 130, 84-98. doi:10.1016/j.jdeveco.2017.10.002
Di Falco, S., Veronesi, M., & Yesuf, M. (2011). Does Adaptation to Climate Change Provide Food Security? A Micro-Perspective from Ethiopia. American Journal of Agricultural Economics, 93, 825-842. doi:10.1093/ajae/aar006
Di Maio, M., & Fiala, N. (2019). Be Wary of Those Who Ask: A Randomized Experiment on the Size and Determinants of the Enumerator Effect. The World Bank Economic Review, 34, 654-669. doi:10.1093/wber/lhy024
Diego-Rosell, P., Nichols, S., Srinivasan, R., & Dilday, B. (2020). Assessing Community Wellbeing Using Google Street-View and Satellite Imagery. In C. A. Hill et al. (Eds.), Big Data Meets Survey Science (pp. 435-486). John Wiley & Sons, Ltd. doi:10.1002/9781118976357.ch15
Dillon, A., & Mensah, E. (2020). Respondent Biases in Household Surveys. Global Poverty Lab Working Paper.
Dillon, A., & Rao, L. (2018). Land Measurement Bias: Comparisons from Global Positioning System, Self-Reports, and Satellite Data. SSRN Electronic Journal. doi:10.2139/ssrn.3188522
Dillon, A., Carletto, G., Gourlay, S., Wollburg, P., & Zezza, A. (2021a). Advancements in Data Collection Methods for Agricultural Surveys: Lessons from the LSMS-ISA and Beyond. Tech. rep., FAO.
Dillon, A., Glazerman, S., & Rosenbaum, M. (2021b). Understanding Response Rates in Random Digit Dial Surveys. Global Poverty Research Lab Working Paper No. 21-105.
Dillon, A., Glazerman, S., and Rosenbaum, M. (2021c). Messaging to Improve Response Rates: Effectiveness of Pre-Survey SMS Messages. Global Poverty Research Lab Working Paper No. 21-106.
Dillon, A., Gourlay, S., McGee, K., & Oseni, G. (2019). Land Measurement Bias and Its Empirical Implications: Evidence from a Validation Exercise. Economic Development and Cultural Change, 67. doi:10.1086/698309
Dillon, A., Karlan, D., Udry, C., & Zinman, J. (2020). Good identification, meet good data. World Development, 127, 104796. doi:10.1016/j.worlddev.2019.104796
Dillon, B. (2012). Field Report: Using Mobile Phones to Collect Panel Data in Developing Countries. Journal of International Development, 24(4), 518-527. https://doi.org/10.1002/jid
DiNardo, J., McCrary, J., & Sanbonmatsu, L. (2006). Constructive proposals for dealing with attrition: An empirical example. NBER Working Paper, 1-46.
Dinku, T. (2019). Chapter 7 - Challenges with availability and quality of climate data in Africa. In A. M. Melesse, W. Abtew, & G. Senay (Eds.), Extreme Hydrology and Climate Variability (pp. 71-80). Elsevier. doi:10.1016/B978-0-12-815998-9.00007-5
Dobardzic, S., Dengel, C. G., Gomes, A. M., Hansen, J., Bernardi, M., Fujisawa, M., . . . others. (2019). 2019 State of Climate Services: Agriculture and Food Security. World Meteorological Organization.
D'Orazio, M. (2020). Finding ways of reconciling the choice of frames and maximizing coverage by linking multiple frames has been the subject of recent research under the 50x2030 Data Smart Agriculture initiative. Tech. rep., FAO.
Doss, B., Roddy, M., Nowlan, K., Rothman, K., & Christensen, A. (2018). Maintenance of Gains in Relationship and Individual Functioning Following the Online OurRelationship Program. Behavior Therapy, 50. doi:10.1016/j.beth.2018.03.011
Doss, C., & Kieran, C. (2014). Standards for Collecting Sex-Disaggregated Data for Gender Analysis: A Guide for CGIAR Researchers.
Doss, C., & Quisumbing, A. (2019). Understanding rural household behavior: Beyond Boserup and Becker. Agricultural Economics, 51. doi:10.1111/agec.12540
Doss, C., Kieran, C., & Kilic, T. (2020). Measuring Ownership, Control, and Use of Assets. Feminist Economics, 26, 144-168.
Doss, C., Kovarik, C., Peterman, A., Quisumbing, A., & van den Bold, M. (2015). Gender inequalities in ownership and control of land in Africa: myth and reality. Agricultural Economics, 46, 403-434. doi:10.1111/agec.12171
Eisenhower, D., Mathiowetz, N. A., & Morganstein, D. (1991). Recall Error: Sources and Bias Reduction Techniques. In P. P. Biemer, R. M. Groves, L. E. Lyberg, N. A. Mathiowetz, & S. Sudman (Eds.), Measurement Errors in Surveys (pp. 125-144). John Wiley & Sons, Ltd. doi:10.1002/9781118150382.ch8
Engle-Stone, R., Sununtnasuk, C., & Fiedler, J. L. (2017). Investigating the significance of the data collection period of household consumption and expenditures surveys for food and nutrition policymaking: Analysis of the 2010 Bangladesh household income and expenditure survey. Food Policy, 72, 72-80. doi:10.1016/j.foodpol.2017.08.014
Fafchamps, M. (1993). Sequential Labor Decisions Under Uncertainty: An Estimable Household Model of West-African Farmers. Econometrica, 61, 1173-1197.
Falaris, E. (2003). The effect of survey attrition in longitudinal surveys: evidence from Peru, Cote d'Ivoire and Vietnam. Journal of Development Economics, 70, 133-157.
Falorsi, P. D. and D. Bako (2016). Indirect Sampling, a Way to Overcome the Weakness of the Lists in Agricultural Surveys. Seventh International Conference on Agricultural Statistics, Rome, Italy. doi:10.1481/icasVII.2016.f35e
FAO. (1992). Collecting Data on Livestock (Vol. 4). FAO Statistical Development Series.
FAO. (2002). Land Tenure and Rural Development.
FAO. (2015). World Programme for the Census of Agriculture 2020, Volume 1: Programme, Concepts and Definitions. FAO, Rome, Italy.
FAO. (2015). Handbook on Master Sampling Frames for Agricultural Statistics: Frame Development, Sample Design and Estimation. FAO, Rome, Italy.
FAO. (2017). Global database of GHG emissions related to feed crops: Methodology. Version 1. Livestock Environmental Assessment and Performance Partnership. FAO, Rome, Italy.
FAO. (2019). The State of Food and Agriculture 2019: Moving Forward on Food Loss and Waste Reduction. Rome, Italy.
FAO, World Bank, UN Habitat. (2019). Measuring Individuals' Rights to Land: An Integrated Approach to Data Collection for SDG Indicators 1.4.2 and 5.a.1. Washington, DC: World Bank.
Fermont, A., & Benson, T. (2011). Estimating yield of food crops grown by smallholder farmers: A review in the Uganda context. International Food Policy Research Institute Discussion Paper 01097.
Fisher, R. (1926). The Arrangement of Field Experiments. Journal of the Ministry of Agriculture, 33, 503-515.
Flores-Macias, F., & Lawson, C. (2008). Effects of Interviewer Gender on Survey Responses: Findings from a Household Survey in Mexico. International Journal of Public Opinion Research, 20, 100-110.
Floro, V., Labarta, R., Becerra Lopez-Lavalle, L., Martínez, J., & Ovalle, T. (2017). Household Determinants of the Adoption of Improved Cassava Varieties using DNA Fingerprinting to Identify Varieties in Farmer Fields: A Case Study in Colombia. Journal of Agricultural Economics, 69. doi:10.1111/1477-9552.12247
FLW Protocol Steering Committee. (2016). Food Loss and Waste Accounting and Reporting Standard.
Foster, A. D., & Rosenzweig, M. R. (2010). Microeconomics of Technology Adoption. Annual Review of Economics, 2, 395-424. doi:10.1146/annurev.economics.102308.124433
Fowler Jr, F. J. (1995). Improving Survey Questions: Design and Evaluation. Sage.
Fowler Jr, F. J. (2004). Reducing interviewer-related error through interviewer training, supervision, and other means. In Measurement Errors in Surveys, 259-278.
Fowler, F. J., & Mangione, T. W. (1985). The Value of Interviewer Training and Supervision. Center for Survey Research.
Fraisl, D., Campbell, J., See, L., Wehn, U., Wardlaw, J., Gold, M., Moorthy, I., Arias, R., Piera, J., Oliver, J. L., Masó, J., Penker, M., & Fritz, S. (2020). Mapping citizen science contributions to the UN sustainable development goals. Sustainability Science, 15, 1735-1751. https://doi.org/10.1007/s11625-020-00833-7
Gaddis, I., Oseni, G., Palacios-Lopez, A., & Pieters, J. (2020). Measuring Farm Labor: Survey Experimental Evidence from Ghana. The World Bank Economic Review. doi:10.1093/wber/lhaa012
Gallup. (2012). Listening to LAC (L2L). Washington, DC: World Bank.
Gennari, P., P. D. Falorsi and C. A. Khalil (2013). The Indirect Sampling as a General Approach for Defining Unbiased Sampling Strategies for Integrated Agricultural Surveys. Proceedings of the 59th ISI World Statistics Congress, 25-30 August 2013, Hong Kong.
Gibson, D. G., Wosu, A. C., Pariyo, G. W., Ahmed, S., Ali, J., Labrique, A. B., . . . Hyder, A. A. (2019). Effect of airtime incentives on response and cooperation rates in non-communicable disease interactive voice response surveys: randomised controlled trials in Bangladesh and Uganda. BMJ Global Health, 4, e001604.
Gibson, D., Ochieng, B., Kagucia, E., Were, J., Hayford, K., Moulton, L., . . . Feikin, D. (2017). Mobile phone-delivered reminders and incentives to improve childhood immunisation coverage and timeliness in Kenya (M-SIMU): a cluster randomised controlled trial. The Lancet Global Health, 5, e428-e438. doi:10.1016/S2214-109X(17)30072-4
Gideon, L. (2012). Handbook of Survey Methodology for the Social Sciences.
Glazerman, S., Rosenbaum, M., Sandino, R., & Shaughnessy, L. (2020). Remote Surveying in a Pandemic: Handbook. https://www.poverty-action.org/sites/default/files/publications/IPA-Phone-Surveying-in-a-Pandemic-Handbook-Updated-December-2020.pdf
Gollin, D., & Udry, C. (2021). Heterogeneity, Measurement Error, and Misallocation: Evidence from African Agriculture. Journal of Political Economy, 129, 1-80. doi:10.1086/711369
Gollin, D., Lagakos, D., & Waugh, M. (2014). The Agricultural Productivity Gap. American Economic Review: Papers & Proceedings, 104, 165-170.
Gonzalez Villalobos, A., and W. H. Wigton (2011). On Applying Area and Multiple Frame Sampling Methods in a Wide Range of Baseline Agricultural and Rural Survey Programmes. Proceedings of the 58th World Statistical Congress, Dublin.
Gottschalk, P., & Huynh, M. (2010). Are Earnings Inequality and Mobility Overstated? The Impact of Nonclassical Measurement Error. The Review of Economics and Statistics, 92, 302-315.
Gourlay, S., Kilic, T., & Lobell, D. (2017). Could the Debate Be Over? Errors in Farmer-Reported Production and Their Implications for the Inverse Scale-Productivity Relationship in Uganda. Policy Research Working Paper No. 8192. World Bank, Washington, DC.
Gourlay, S., Kilic, T., & Lobell, D. (2019). A new spin on an old debate: Errors in farmer-reported production and their implications for the inverse scale-productivity relationship in Uganda. Journal of Development Economics, 141, 102376. doi:10.1016/j.jdeveco.2019.102376
Gourlay, S., Kilic, T., Martuscelli, A., Wollburg, P., & Zezza, A. (2021). High-Frequency Phone Surveys on COVID-19: Best Practices, Open Questions. Tech. rep., World Bank.
Gourley, J., Flamig, Z., Vergara, H., Kirstetter, P.-E., Clark, R., Argyle, E., . . . Howard, K. (2017). The FLASH project: improving the tools for flash flood monitoring and prediction across the United States. Bulletin of the American Meteorological Society, 98.
Greenleaf, A. R., Gadiaga, A., Guiella, G., Turke, S., Battle, N., Ahmed, S., & Moreau, C. (2020). Comparability of modern contraceptive use estimates between a face-to-face survey and a cellphone survey among women in Burkina Faso. PLOS ONE, 15, 1-15. doi:10.1371/journal.pone.0231819
Griliches, Z. (1986). Productivity, R&D, and Basic Research at the Firm Level in the 1970s. Working Paper, National Bureau of Economic Research. doi:10.3386/w1547
Grosh, M. and P. Glewwe (2000). Designing Household Survey Questionnaires for Developing Countries: Lessons from 15 Years of the Living Standards Measurement Study. Washington, DC: World Bank.
Groves, R. M. (1989). Survey Errors and Survey Costs. Wiley.
Groves, R., & Lyberg, L. (2010). Total Survey Error: Past, Present, and Future. The Public Opinion Quarterly, 74, 849-879. doi:10.2307/40985408
GSARS. (2016). Guidelines for the Enumeration of Nomadic and Semi-Nomadic (Transhumant) Livestock.
GSARS. (2017). Handbook on Remote Sensing for Agricultural Statistics. Publication prepared in the framework of the Global Strategy to improve Agricultural and Rural Statistics.
GSARS. (2018). Handbook on Crop Statistics: Improving Methods for Measuring Crop Area, Production and Yield. Publication prepared in the framework of the Global Strategy to improve Agricultural and Rural Statistics.
GSMA. (2020). Mobile Economy. GSMA Intelligence.
Hale, R. (1999). Appropriate Role of Remote Sensing in U.S. Agricultural Statistics. FAO regional project - Improvement of Agricultural Statistics in Asia and Pacific Countries (GCP/RAS/171/JPN). FAO, Bangkok, Thailand.
Hausman, J. (2001). Mismeasured Variables in Econometric Analysis: Problems from the Right and Problems from the Left. Journal of Economic Perspectives, 15, 57-67. doi:10.1257/jep.15.4.57
Henderson, S., & Rosenbaum, M. (2020). Remote Surveying in a Pandemic: Research Synthesis. Innovations for Poverty Action.
Hengl, T., Heuvelink, G. B., Kempen, B., Leenaars, J. G., Walsh, M. G., Shepherd, K. D., . . . others. (2015). Mapping soil properties of Africa at 250 m resolution: Random forests significantly improve current predictions. PLoS ONE, 10, e0125814.
Hengl, T., Miller, M., Križan, J., Shepherd, K., Sila, A., Kilibarda, M., . . . Crouch, J. (2020). African Soil Properties and Nutrients Mapped at 30 m Spatial Resolution using Two-scale Ensemble Machine Learning. doi:10.21203/rs.3.rs-120359/v1
Herrick, J., Sala, O., & Karl, J. (2013). Land degradation and climate change: a sin of omission? Frontiers in Ecology and the Environment, 11. doi:10.2307/23470470
Hicks, J. H., Kleemans, M., Li, N. Y., & Miguel, E. (2017). Reevaluating Agricultural Productivity Gaps with Longitudinal Microdata. Working Paper, National Bureau of Economic Research. doi:10.3386/w23253
Hill, C., Biemer, P., Buskirk, T., Callegaro, M., Cazar, A., Eck, A., . . . Sturgis, P. (2019). Exploring New Statistical Frontiers at the Intersection of Survey Science and Big Data: Convergence at "BigSurv18". Survey Research Methods, 13. doi:10.18148/srm/2019.v1i1.7467
Himelein, K. (2015). Interviewer Effects in Subjective Survey Questions: Evidence From Timor-Leste. International Journal of Public Opinion Research, 28, 511-533. doi:10.1093/ijpor/edv031
Himelein, K., Eckman, S., Kastelic, J., McGee, K., Wild, M., Yoshida, N., & Hoogeveen, J. (2020). High Frequency Mobile Phone Surveys of Households to Assess the Impacts of COVID-19: Guidelines on Sampling Design. World Bank, Washington, DC.
Himelein, K., Eckman, S., & Murray, S. (2014). Sampling Nomads: A New Technique for Remote, Hard-to-Reach, and Mobile Populations. Journal of Official Statistics, 30. doi:10.2478/jos-2014-0013
Hirvonen, K., de Brauw, A., & Abate, G. T. (2021). Food Consumption and Food Security during the COVID-19 Pandemic in Addis Ababa. American Journal of Agricultural Economics, 103(3), 772-789. https://doi.org/10.1111/ajae.12206
Hogset, H. and C. B. Barrett (2010). Social Learning, Social Influence, and Projection Bias: A Caution on Inferences Based on Proxy Reporting of Peer Behavior. Economic Development and Cultural Change, 58(3).
Holden, S., Ali, D., Deininger, K., & Hilhorst, T. (2016). A Land Tenure Module for LSMS.
Hu, Y., & Schennach, S. M. (2008). Instrumental Variable Treatment of Nonclassical Measurement Error Models. Econometrica, 76, 195-216.
Hyslop, D. R., & Imbens, G. W. (2001). Bias from Classical and Other Forms of Measurement Error. Journal of Business & Economic Statistics, 19, 475-481.
Iarossi, G. (2006). The Power of Survey Design: A User's Guide for Managing Surveys, Interpreting Results, and Influencing Respondents. Washington, DC: World Bank.
ILCA. (1990). Livestock System Research. Vol. 1, International Livestock Center for Africa, Addis Ababa.
Jaleta, M., Tesfaye, K., Kilian, A., Yirga, C., Habte, E., Beyene, H., . . . Erenstein, O. (2020). Misidentification by farmers of the crop varieties they grow: Lessons from DNA fingerprinting of wheat in Ethiopia. PLoS ONE, 15, e0235484.
Japec, L., Kreuter, F., Berg, M., Biemer, P., Decker, P., Lampe, C., . . . Usher, A. (2015). Big Data in Survey Research. Public Opinion Quarterly, 79, 839-880. doi:10.1093/poq/nfv039
Jayne, T. S., M. Muyanga, A. Wineman, H. Ghebru, C. Stevens, M. Stickler, A. Chapoto, W. Anseeuw, D. van der Westhuizen and D. Nyange (2019). Are Medium-Scale Farms Driving Agricultural Transformation in Sub-Saharan Africa? Agricultural Economics, 50.
Jensen, N., & Barrett, C. (2017). Agricultural Index Insurance for Development. Applied Economic Perspectives and Policy, 39, 199-219. doi:10.1093/aepp/ppw022
Jerven, M. and D. Johnston (2015). Statistical Tragedy in Africa? Evaluating the Database for African Economic Development. Journal of Development Studies, 51(2).
Josephson, A., Kilic, T., & Michler, J. D. (2021). Socioeconomic impacts of COVID-19 in low-income countries. Nature Human Behaviour, 5, 557-565. https://doi.org/10.1038/s41562-021-01096-7
Juran, J. M., & Gryna, F. M. (1980). Quality Planning and Analysis. New York: McGraw-Hill.
Kasprzyk, D. (2005). Chapter IX: Measurement error in household surveys: sources and measurement. In Household Surveys in Developing and Transition Countries. United Nations.
Kastelic, K. H., Eckman, S., Kastelic, J. G., McGee, K. R., Wild, M., Yoshida, N., & Hoogeveen, J. G. (2020). High Frequency Mobile Phone Surveys of Households to Assess the Impacts of COVID-19 (Vol. 2): Guidelines on Sampling Design. Tech. rep., Washington, DC: World Bank.
Kilic, T., & Moylan, H. (2016). Methodological Experiment on Measuring Asset Ownership from a Gender Perspective. Tech. rep., World Bank, Washington, DC. doi:10.1596/33653
Kilic, T., Djima, I., and Carletto, G. (2017). Mission Impossible? Exploring the Promise of Multiple Imputation for Predicting Missing GPS-Based Land Area Measures in Household Surveys. doi:10.1596/1813-9450-8138
Kilic, T., Moylan, H. G., and Koolwal, G. B. (2020b). Getting the (Gender-Disaggregated) Lay of the Land: Impact of Survey Respondent Selection on Measuring Land Ownership and Rights. Policy Research Working Paper Series. Washington, DC: World Bank.
Kilic, T., Moylan, H., Ilukor, J., Mtengula, C., & Pangapanga-Phiri, I. (2021). Root for the tubers: Extended-harvest crop production and productivity measurement in surveys. Food Policy. https://doi.org/10.1016/j.foodpol.2021.102033
Kilic, T. and T. Sohnesen (2019). Same Question but Different Answer: Experimental Evidence on Questionnaire Design's Impact on Poverty Measured by Proxies. Review of Income and Wealth, 65(1).
Kilic, T., Van den Broeck, G., Koolwal, G., & Moylan, H. (2020a). Are You Being Asked? Impacts of Respondent Selection on Measuring Employment. Tech. rep., The World Bank.
Kilic, T., Zezza, A., Carletto, G., & Savastano, S. (2016). Missing(ness) in Action: Selectivity Bias in GPS-Based Land Area Measurements. World Development, 92. doi:10.1016/j.worlddev.2016.11.018
Kish, L. (1965). Survey Sampling. Wiley.
Knippenberg, E., Jensen, N., & Constas, M. (2019). Quantifying household resilience with high frequency data: Temporal dynamics and methodological options. World Development, 121, 1-15. https://doi.org/10.1016/j.worlddev.2019.04.010
Kosmowski, F., Chamberlin, J., Ayalew, H., Sida, T., Abay, K., & Craufurd, P. (2021). How accurate are yield estimates from crop cuts? Evidence from smallholder maize farms in Ethiopia. Food Policy, 102, 102122. https://doi.org/10.1016/j.foodpol.2021.102122
Kosmowski, F., & Worku, T. (2018). Evaluation of a miniaturized NIR spectrometer for cultivar identification: The case of barley, chickpea and sorghum in Ethiopia. PLOS ONE. doi:10.1371/journal.pone.0193620
Kosmowski, F., Abebe, A., & Ozkan, D. (2020). Challenges and lessons for measuring soil metrics in household surveys. Geoderma. doi:10.1016/j.geoderma.2020.114500
Kosmowski, F., Aragaw, A., Kilian, A., Ambel, A., Ilukor, J., Yigezu, B. I., & Stevenson, J. (2019). Varietal identification in household surveys: Results from three household-based methods against the benchmark of DNA fingerprinting in southern Ethiopia. Experimental Agriculture, 55, 1-15. doi:10.1017/S0014479718000030
Kretzschmar, T., Mbanjo, G., Magalit, G., Dwiyanti, M., Habib, M., Diaz, M., . . . Yamano, T. (2018). DNA fingerprinting at farm level maps rice biodiversity across Bangladesh and reveals regional varietal preferences. Scientific Reports, 8. doi:10.1038/s41598-018-33080-z
Kreuter, F. (2013). Improving Surveys with Paradata: Analytic Uses of Process Information. John Wiley & Sons, Inc.
Kristjanson, P., Waters-Bayer, A., Johnson, N., Tipilda, A., Njuki, J., Baltenweck, I., . . . Macmillan, S. (2014). Livestock and Women's Livelihoods. In A. Quisumbing, R. Meinzen-Dick, T. Raney, A. Croppenstedt, J. Behrman, & A. Peterman (Eds.), Gender in Agriculture. doi:10.1007/978-94-017-8616-4_9
Krosnick, J. A. (1999). Survey research. Annual Review of Psychology, 50, 537-567.
Laajaj, R., K. Macours, D. A. Pinzon Hernandez, O. Arias, S. Gosling, J. Potter, M. Rubio-Codina and R. Vakis (2019). Challenges to Capture the Big Five Personality Traits in non-WEIRD Populations. Science Advances, 5(7).
Laajaj, R., & Macours, K. (2021). Measuring skills in developing countries. Journal of Human Resources, 56(4).
LaFave, D., Peet, E., & Thomas, D. (2013). Are rural markets complete? Prices, profits and recursion. Tech. rep.
Lamanna, C., Hachhethu, K., Chesterman, S., Singhal, G., Mwongela, B., Ngendo, M., Passeri, S., Farhikhtah, A., Kadiyala, S., Bauer, J. M., & Rosenstock, T. S. (2019). Strengths and limitations of computer assisted telephone interviews (CATI) for nutrition data collection in rural Kenya. PLoS ONE, 14(1), 1-20. https://doi.org/10.1371/journal.pone.0210050
Lämmerhirt, D., Gray, J., Venturini, T., & Meunier, A. (2018). Advancing sustainability together? Citizen-generated data and the sustainable development goals. SSRN Electronic Journal. doi:10.2139/ssrn.3320467
Lavrakas, P. J. (2008). Encyclopedia of Survey Research Methods. Sage Publishing. doi:10.4135/9781412963947
Lavrakas, P. J. (2011). Is the Exclusion of Mobile Phones from Telephone Surveys a Problem: The U.S. Experience. Presentation prepared for the Australian Mobile Phone Survey Workshop, Melbourne, Australia.
Leo, B., Morello, R., Mellon, J., Peixoto, T., & Davenport, S. T. (2015). Do Mobile Phone Surveys Work in Poor Countries? Center for Global Development Working Paper No. 398. https://doi.org/10.2139/ssrn.2623097
Lesnoff, M., Lancelot, R., Moulin, C.-H., Messad, S., Juanès, X., & Sahut, C. (2014). Calculation of Demographic Parameters in Tropical Livestock Herds. doi:10.1007/978-94-017-9026-0
Liao, C., Clark, P. E., DeGloria, S. D., & Barrett, C. B. (2017). Complexity in the spatial utilization of rangelands: Pastoral mobility in the Horn of Africa. Applied Geography, 86, 208-219. doi:10.1016/j.apgeog.2017.07.003
Liao, C., Clark, P., Shibia, M., & Degloria, S. (2018). Spatiotemporal dynamics of cattle behavior and resource selection patterns on East African rangelands: evidence from GPS-tracking. International Journal of Geographical Information Science, 32, 1-18. doi:10.1080/13658816.2018.1424856
Lipper, L., McCarthy, N., Zilberman, D., Asfaw, S., & Branca, G. (2017). Climate Smart Agriculture: Building Resilience to Climate Change. Springer Nature.
Little, P., McPeak, J., Barrett, C., & Kristjanson, P. (2008). Challenging Orthodoxies: Understanding Poverty in Pastoral Areas of East Africa. Development and Change, 39, 587-611. doi:10.1111/j.1467-7660.2008.00497.x
Lobell, D., & Asseng, S. (2017). Comparing estimates of climate change impacts from process-based and statistical crop models. Environmental Research Letters, 12, 015001. doi:10.1088/1748-9326/aa518a
Lobell, D., Deines, J., & Tommaso, S. (2020). Changes in the drought sensitivity of US maize yields. Nature Food, 1, 1-7. doi:10.1038/s43016-020-00165-w
Lobell, D., Tommaso, S., You, C., Djima, I., Burke, M., & Kilic, T. (2019). Sight for Sorghums: Comparisons of Satellite- and Ground-Based Sorghum Yield Estimates in Mali. Remote Sensing, 12, 100. doi:10.3390/rs12010100
Lyberg, L., & Kasprzyk, D. (2004). Data Collection Methods and Measurement Error: An Overview. In P. P. Biemer, R. M. Groves, L. E. Lyberg, N. A. Mathiowetz, & S. Sudman (Eds.), Measurement Errors in Surveys (pp. 235-257). John Wiley & Sons, Ltd. doi:10.1002/9781118150382.ch13
MacDonald, J. M. (2016). Structural Transformation in North America: What Does it Mean for Agricultural Statistics? Seventh International Conference on Agricultural Statistics, Rome, Italy.
Mahfoud, Z., Ghandour, L., Ghandour, B., Mokdad, A., & Sibai, A. (2014). Cell Phone and Face-to-face Interview Responses in Population-based Surveys: How Do They Compare? Field Methods, 27, 39-54. doi:10.1177/1525822X14540084
Manski, C., & Molinari, F. (2008). Skip sequencing: A decision problem in questionnaire design. The Annals of Applied Statistics, 2, 264-285. doi:10.1214/07-AOAS134
Maredia, M., Reyes, B., Manu-Aduening, J., Dankyi, A., Hamazakaza, P., Muimui, K., . . . Raatz, B. (2016). Testing Alternative Methods of Varietal Identification Using DNA Fingerprinting: Results of Pilot Studies in Ghana and Zambia. doi:10.13140/RG.2.2.11573.27361
Masuda, Y., Kelly, A., Robinson, B., Holland, M., Bedford, C., Childress, M., . . . Veit, P. (2020). How do practitioners characterize land tenure security? Conservation Science and Practice, 2. doi:10.1111/csp2.186
Mathiowetz, N. A. (2000). The effect of length of recall on the quality of survey data. Fourth Conference on Methodological Issues in Official Statistics, Stockholm, Sweden.
Maue, C. C., M. Burke and K. J. Emerick (2020). Productivity Dispersion and Persistence Among the World's Most Numerous Firms. NBER Working Paper No. 26924.
McCarthy, N. (2011). Understanding Agricultural Households' Adaptation to Climate Change and Implications for Mitigation: Land Management and Investment Options. LSMS Guidebook. Washington, DC: World Bank.
McCarthy, N., Dutilly-Diane, C., Drabo, B., Kamara, A., & Vanderlinden, J.-P. (2004). Managing resources in erratic environments: An analysis of pastoralist systems in Ethiopia, Niger and Burkina Faso. Research Report of the International Food Policy Research Institute.
McCullough, E. (2016). Labor productivity and employment gaps in Sub-Saharan Africa. Food Policy, 67. doi:10.1016/j.foodpol.2016.09.013
Mercer, A. (2012). Using Paradata to Understand Effort and Attrition in a Panel Survey. Section on Survey Research Methods - JSM, 3822-3833.
Meyer, B., Mok, W., & Sullivan, J. (2015). Household Surveys in Crisis. Journal of Economic Perspectives, 29, 199-226. doi:10.1257/jep.29.4.199
Michelson, H., Fairbairn, A., Ellison, B., Maertens, A., & Manyong, V. (2021). Misperceived quality: Fertilizer in Tanzania. Journal of Development Economics, 148. doi:10.1016/j.jdeveco.2020.102579
Michler, O., Decker, R., & Stummer, C. (2019). To trust or not to trust smart consumer products: a literature review of trust-building factors. Management Review Quarterly, 70. doi:10.1007/s11301-019-00171-8
Millán, T., Barham, T., Macours, K., Maluccio, J., & Stampini, M. (2019). Long-Term Impacts of Conditional Cash Transfers: Review of the Evidence. The World Bank Research Observer, 34, 119. doi:10.1093/wbro/lky005
Minet, J., Curnel, Y., Gobin, A., Goffart, J. P., Mélard, F., Tychon, B., . . . Defourny, P. (2017). Crowdsourcing for agricultural applications: A review of uses and opportunities for a farmsourcing approach. Computers and Electronics in Agriculture, 142, Part A, 126-138. doi:10.1016/j.compag.2017.08.026
Minten, B., Beyene, S., Legesse, E., & Kuma, T. (2015). Transforming Staple Food Value Chains in Africa: The Case of Teff in Ethiopia. The Journal of Development Studies, 52, 1-19. doi:10.1080/00220388.2015.1087509
Minten, B., S. Tamru, E. Engida and T. Kuma (2016). Transforming Staple Food Value Chains in Africa: The Case of Teff in Ethiopia. Journal of Development Studies, 52(5).
Moore, J. C. (1988). Self/proxy response status and survey response quality. Journal of Official Statistics, 4, 155-172.
Mundlak, Y. (2001). Chapter 1: Production and supply. Handbook of Agricultural Economics, 1, 3-85. doi:10.1016/S1574-0072(01)10004-6
Muyanga, M., & Jayne, T. S. (2019). Revisiting the Farm Size-Productivity Relationship Based on a Relatively Wide Range of Farm Sizes: Evidence from Kenya. American Journal of Agricultural Economics, 101, 1140-1163.
National Academies of Sciences, Engineering, and Medicine. (2019). Improving Data Collection and Measurement of Complex Farms. Kling, C. and C. Mackie, eds. Washington, DC: The National Academies Press. https://doi.org/10.17226/25260
Neter, J., & Waksberg, J. (1964). A Study of Response Errors in Expenditures Data from Household Interviews. Journal of the American Statistical Association, 59, 18-55.
Nguyen, G., & Nguyen, T. T. (2020). Exposure to weather shocks: A comparison between self-reported record and extreme weather data. Economic Analysis and Policy, 65, 117-138. doi:10.1016/j.eap.2019.11.009
Nicolas, G., Robinson, T. P., Wint, G. W., Conchedda, G., Cinardi, G., & Gilbert, M. (2016). Using random forest to improve the downscaling of global livestock census data. PLoS ONE, 11.
Nord, A., & Snapp, S. (2020). Documentation of farmer perceptions and site-specific properties to improve soil management on smallholder farms in Tanzania. Land Degradation & Development, 31, 2074-2086. doi:10.1002/ldr.3582
Norton, B. P., Hoel, J. B., & Michelson, H. (2020). The demand for (fake?) fertilizer: Using an experimental auction to examine the role of beliefs on agricultural input demand in Tanzania. 2020 Annual Meeting, July 26-28, Kansas City, Missouri 304444, Agricultural and Applied Economics Association.
O'Sullivan, M., Rao, A., Banerjee, R., Gulati, K., & Vinez, M. (2014). Levelling the Field: Improving Opportunities for Women Farmers in Africa. World Bank and One Campaign, Washington, DC.
Ochieng, D. O., & Baulch, B. (2020). Report on a study to crowdsource farmgate prices for maize and soybeans in Malawi. MaSSP reports, International Food Policy Research Institute (IFPRI).
Olsen, R. (2005). The Problem of Respondent Attrition: Survey Methodology is Key. Monthly Labor Review, 128.
Oseni, G., Durazo, J., & McGee, K. (2017). The Use of Non-Standard Units for the Collection of Food Quantity: A Guidebook for Improving the Measurement of Food Consumption and Agricultural Production in Living Standards Surveys. Washington, DC: World Bank.
Outes-Leon, I., & Dercon, S. (2008). Survey attrition and attrition bias in Young Lives.
Parkes, B., Higginbottom, T., Hufkens, K., Ceballos, F., Kramer, B., & Foster, T. (2019). Weather dataset choice introduces uncertainty to estimates of crop yield responses to climate variability and change. Environmental Research Letters, 14. doi:10.1088/1748-9326/ab5ebb
Payne, S. L. (1980). The Art of Asking Questions: Studies in Public Opinion, 3 (Vol. 451). Princeton University Press.
Pelletier, J., H. Ngoma, N. M. Mason and C. B. Barrett (2020). Does Smallholder Maize Intensification Reduce Deforestation? Evidence from Zambia. Global Environmental Change, 63.
Pica-Ciamarra, U., Baker, D., Morgan, N., Zezza, A., Azzarri, C., Ly, C., . . . Sserugga, J. (2014). Investing in the Livestock Sector: Why Good Numbers Matter. A Sourcebook for Decision Makers on How to Improve Livestock Data.
Pischke, J.-S. (1995). Individual Income, Incomplete Information, and Aggregate Consumption. Econometrica, 63, 805-840.
Poets, A., Silverstein, K., Pardey, P. G., Hearne, S., & Stevenson, J. (2020). DNA fingerprinting for crop varietal identification: Fit-for-purpose protocols, their costs and analytical implications.
Ponzini, G. et al. (2021). Documenting the Uganda experience in the 50 by 2030 Initiative. Unpublished manuscript. World Bank and FAO: Washington, DC and Rome.
Pope, R. D., & Just, R. E. (2003). Distinguishing Errors in Measurement from Errors in Optimization. American Journal of Agricultural Economics, 85, 348-358. doi:10.1111/1467-8276.00124
Porciello, J., Ivanina, M., Islam, M., Einarson, S., & Hirsh, H. (2020). Accelerating evidence-informed decision-making for the Sustainable Development Goals using machine learning. Nature Machine Intelligence, 2, 559-565. doi:10.1038/s42256-020-00235-5
Pratt, M., Sallis, J. F., Cain, K. L., Conway, T. L., Palacios-Lopez, A., Zezza, A., . . . Kilic, T. (2020). Physical activity and sedentary time in a rural adult population in Malawi compared with an age-matched US urban population. BMJ Open Sport & Exercise Medicine, 6. doi:10.1136/bmjsem-2020-000812
Reardon, T., & Glewwe, P. (2000). "Agriculture" and "Module for Chapter 19". In M. Grosh and P. Glewwe (eds.), Designing Household Survey Questionnaires for Developing Countries: Lessons from 15 Years of the Living Standards Measurement Study. World Bank.
Reinermann, S., Asam, S., & Kuenzer, C. (2020). Remote Sensing of Grassland Production and Management: A Review. Remote Sensing, 12. doi:10.3390/rs12121949
Řezník, T., Pavelka, T., Herman, L., Lukas, V., Širůček, P., Leitgeb, Š., & Leitner, F. (2020). Prediction of Yield Productivity Zones from Landsat 8 and Sentinel-2A/B and Their Evaluation Using Farm Machinery Measurements. Remote Sensing, 12(12), 1917. https://doi.org/10.3390/rs12121917
Ridolfo, H., D. Biagas, E. J. Abayomi and J. Rodhouse (2021). Behavior Coding of the 2018 Agricultural Labor Survey. RDD Research Report No. RDD-21-01, NASS, Washington, DC.
Roberts, M. J., Schlenker, W., & Eyer, J. (2013). Agronomic Weather Measures in Econometric Models of Crop Yield with Implications for Climate Change. American Journal of Agricultural Economics, 95, 236-243. doi:10.1093/ajae/aas047
Robinson, T. P., Wint, G. W., Conchedda, G., Van Boeckel, T. P., Ercoli, V., Palamara, E., . . . Gilbert, M. (2014). Mapping the global distribution of livestock. PLoS ONE, 9, e96084.
Rodhouse, J., H. Ridolfo, E. Abayomi and D. Biagas (2019). Did the Respondent Really Mean That? How the Behaviors of CATI Interviewers and Data Editors Impact Measurement and Processing Errors in Establishment Surveys. 2019 Workshop: Interviewers and Their Effects from a Total Survey Error Perspective, 35. http://digitalcommons.unl.edu/sociw/35
Rosenzweig, M. (2003). Payoffs from Panels in Low-Income Countries: Economic Development and Economic Mobility. American Economic Review, 93, 112-117. doi:10.1257/000282803321946903
Rosenzweig, M. R., & Udry, C. (2014). Rainfall Forecasts, Weather and Wages over the Agricultural Production Cycle. Working Paper, National Bureau of Economic Research. doi:10.3386/w19808
Roßmann, J., & Gummer, T. (2016). Using Paradata to Predict and Correct for Panel Attrition. Social Science Computer Review, 34(3), 312-332. doi:10.1177/0894439315587258
Rubin, D. B. (1987). Multiple Imputation for Nonresponse in Surveys. Wiley.
Rubin, D. B. (1996). Multiple Imputation after 18+ Years. Journal of the American Statistical Association, 91, 473-489. doi:10.1080/01621459.1996.10476908
Sagesaka, A., Palacios-Lopez, A., & Amankwah, A. (2020). Measuring Work on Household Farms using Household Surveys: A Practical Guidebook on Designing Household Surveys for Effective Data Collection of Work on Household Farms. Tech. rep., World Bank: Washington, DC.
Salganik, M. J. (2019). Bit by Bit: Social Research in the Digital Age. Princeton University Press.
Schennach, S. (2016). Recent Advances in the Measurement Error Literature. Annual Review of Economics, 8, 341-377. doi:10.1146/annurev-economics-080315-015058
Schennach, S. M. (2004). Estimation of Nonlinear Models with Measurement Error. Econometrica, 72, 33-75.
Schuman, H. and S. Presser (1981). Questions and Answers in Attitude Surveys. New York: Academic Press.
Schündeln, M. (2018). Multiple Visits and Data Quality in Household Surveys. Oxford Bulletin of Economics and Statistics, 80, 380-405. doi:10.1111/obes.12196
Schwarz, N. (1997). Questionnaire Design: the Rocky Road from Concept to Answer. In L. Lyberg et al. (eds.), Survey Measurement and Process Quality. New York: John Wiley and Sons.
Schwarz, N. and H. Hippler (1991). Response alternatives: the impact of their choice and presentation order. In P. Biemer et al. (eds.), Measurement Errors in Surveys. New York: John Wiley and Sons.
Scott, C. and B. Amenuvegbe (1991). Recall Loss and Recall Duration: An Experimental Study in Ghana. Inter-Stat, 4(1).
Sherlund, S., C. B. Barrett and A. Adesina (2002). Smallholder Technical Efficiency Controlling for Environmental Production Conditions. Journal of Development Economics, 69.
Silberstein, A. R., & Scott, S. (2004). Expenditure Diary Surveys and Their Associated Errors. In P. P. Biemer, R. M. Groves, L. E. Lyberg, N. A. Mathiowetz, & S. Sudman (Eds.), Measurement Errors in Surveys (pp. 303-326). John Wiley & Sons, Ltd. doi:10.1002/9781118150382.ch16
Singh, I., Squire, L., & Strauss, J. (1986). Agricultural Household Models: Extensions, Applications, and Policy. Washington, DC: World Bank.
Sinha, P., Robson, A., Schneider, D., Kilic, T., Mugera, H. K., Ilukor, J., & Tindamanyire, J. M. (2020). The potential of in-situ hyperspectral remote sensing for differentiating 12 banana genotypes grown in Uganda. ISPRS Journal of Photogrammetry and Remote Sensing, 167, 85-103. doi:10.1016/j.isprsjprs.2020.06.023
Slavec, A., & Toninelli, D. (2015). An Overview of Mobile CATI Issues in Europe. In D. Toninelli, R. Pinter, & P. de Pedraza (Eds.), Mobile Research Methods: Opportunities and Challenges of Mobile Research Methodologies (pp. 41-62). Ubiquity Press.
Song, L., Jiang, Q., Shi, Y.-E., Feng, X.-T., Li, Y., Su, F., & Liu, C. (2018). Feasibility Investigation of 3D Printing Technology for Geotechnical Physical Models: Study of Tunnels. Rock Mechanics and Rock Engineering, 51. doi:10.1007/s00603-018-1504-3
Stajnko, D., Brus, M., & Hočevar, M. (2008). Estimation of bull live weight through thermographically measured body dimensions. Computers and Electronics in Agriculture, 61, 233-240. doi:10.1016/j.compag.2007.12.002
Stewart, N., Chandler, J., & Paolacci, G. (2017). Crowdsourcing Samples in Cognitive Science. Trends in Cognitive Sciences, 21. doi:10.1016/j.tics.2017.06.007
Strack, F., L. Martin and N. Schwarz (1988). Priming and Communication: Social Determinants of Information Use in Judgements of Life Satisfaction. European Journal of Social Psychology, 18(3).
Sudman, S., & Bradburn, N. M. (1973). Effects of Time and Memory Factors on Response in Surveys. Journal of the American Statistical Association, 68, 805-815. doi:10.1080/01621459.1973.10481428
Sudman, S., & Bradburn, N. M. (1974). Response Effects in Surveys: Review and Synthesis. Chicago: Aldine.
Sudman, S., N. Bradburn and N. Schwarz (1996). Thinking about Answers: The Application of Cognitive Processes to Survey Methodology. San Francisco, CA: Jossey-Bass.
Swain, D., Friend, M., Bishop-Hurley, G. J., Handcock, R., & Wark, T. (2011). Tracking livestock using global positioning systems - are we still lost? Animal Production Science, 51, 167-175. doi:10.1071/an10255
Thomas, D., Frankenberg, E., & Smith, J. P. (2001). Lost but Not Forgotten: Attrition and Follow-up in the Indonesia Family Life Survey. The Journal of Human Resources, 36, 556-592.
Thomas, D., Witoelar, F., Frankenberg, E., Sikoki, B., Strauss, J., Sumantri, C., & Suriastini, W. (2012). Cutting the costs of attrition: Results from the Indonesia Family Life Survey. Journal of Development Economics, 98, 108-123. doi:10.1016/j.jdeveco.2010.08
Tourangeau, R., L. J. Rips and K. Rasinski (Eds.) (2000). The Psychology of Survey Response. Cambridge University Press. https://doi.org/10.1017/CBO9780511819322
Tourangeau, R., & Yan, T. (2007). Sensitive questions in surveys. Psychological Bulletin, 133(5), 859-883.
Turner, L., Udal, M., Larson, B., & Shearer, S. (2000). Monitoring cattle behavior and pasture use with GPS and GIS. Canadian Journal of Animal Science, 80, 405-413. doi:10.4141/A99-093
Udry, C. (1996). Gender, Agricultural Production, and the Theory of the Household. Journal of Political Economy, 104, 1010-1046.
UN Global Pulse. (2015). Feasibility Study: Crowdsourcing High-Frequency Food Price Data in Rural Indonesia.
United Nations. (2019). Guidelines for Producing Statistics on Asset Ownership from a Gender Perspective. United Nations, New York.
United Nations, World Bank. (2020). Monitoring the State of Statistical Operations under the COVID-19 Pandemic. Washington, DC: World Bank.
Vaessen, M. et al. (1987). Translation of Questionnaires into Local Languages. In J. Cleland and C. Scott (eds.), The World Fertility Survey: An Assessment. New York: Oxford University Press.
Van De Giesen, N., Hut, R., & Selker, J. (2014). The Trans-African Hydro-Meteorological Observatory (TAHMO). Wiley Interdisciplinary Reviews: Water, 1. doi:10.1002/wat2.1034
Vasques, G., Rodrigues, H., Coelho, M., Baca, J., Dart, R., Oliveira, R., . . . Ceddia, M. (2020). Field Proximal Soil Sensor Fusion for Improving High-Resolution Soil Property Maps. Soil Systems, 4. doi:10.3390/soilsystems4030052
Vijverberg, W. P., & Mead, D. C. (2000). Household Enterprises. In M. Grosh and P. Glewwe (eds.), Designing Household Survey Questionnaires for Developing Countries: Lessons from 15 Years of the Living Standards Measurement Study, Volume 3 (pp. 105-137). The World Bank.
Waldman, K., Vergopolan, N., Attari, S., Sheffield, J., Estes, L., Caylor, K., & Evans, T. (2019). Cognitive biases about climate variability in smallholder farming systems in Zambia. Weather, Climate, and Society, 11(2), 369-383. doi:10.1175/WCAS-D-18-0050.1
Walker, T. S., & Ryan, J. G. (1990). Village and household economics in India's semi-arid tropics. Johns Hopkins University Press.
Wansbeek, T., & Meijer, E. (2000). Measurement Error and Latent Variables in Econometrics. Amsterdam: North-Holland.
West, B., & Blom, A. (2017). Explaining Interviewer Effects: A Research Synthesis. Journal of Survey Statistics and Methodology, 5, 175-211. doi:10.1093/jssam/smw024
Wiggins, A., Newman, G., Stevenson, R., & Crowston, K. (2011). Mechanisms for Data Quality and Validation in Citizen Science. 2011 IEEE Seventh International Conference on e-Science Workshops, Stockholm, Sweden. doi:10.1109/eScienceW.2011.27
Winkielman, P., Knäuper, B., & Schwarz, N. (1998). Looking back at anger: Reference periods change the interpretation of emotional frequency questions. Journal of Personality and Social Psychology, 75(3).
Witoelar, F. (2011). Tracking in Longitudinal Household Surveys. Tech. rep., Washington, DC: World Bank.
Wollburg, P., Tiberti, M., & Zezza, A. (2020). Recall length and measurement error in agricultural surveys. Food Policy, 102003. doi:10.1016/j.foodpol.2020.102003
Wooldridge, J. M. (2002). Inverse probability weighted M-estimators for sample selection, attrition, and stratification. Portuguese Economic Journal, 1, 117-139.
Working, H. (1925). The Statistical Determination of Demand Curves. The Quarterly Journal of Economics, 39(4). doi:10.2307/1883264
Wossen, T., Abdoulaye, T., Alene, A., Nguimkeu, P., Feleke, S., Rabbi, I. Y., . . . Manyong, V. (2019). Estimating the productivity impacts of technology adoption in the presence of misclassification. American Journal of Agricultural Economics, 101, 1-16.
Xue, L., Liu, G., Parfitt, J., Liu, X., van Herpen, E., Stenmarck, Å., . . . Cheng, S. (2017). Missing Food, Missing Data? A Critical Review of Global Food Losses and Food Waste Data. Environmental Science & Technology, 51. doi:10.1021/acs.est.7b00401
Yigezu, Y. A., Alwang, J., Rahman, W., Mollah, M. B., El-Shater, T., Aw-Hassan, A., & Sarker, A. (2018). Is DNA fingerprinting the gold standard for estimation of adoption and impacts of improved lentil varieties? Food Policy, 83. doi:10.1016/j.foodpol.2018.11.004
Yirga, C., & Alemu, D. (2016). Adoption of Crop Technologies among Smallholder Farmers in Ethiopia: Implications for Research and Development. Ethiopian Journal of Agricultural Science, 1-16.
Young Lives. (2020). Listening to Young Lives at Work COVID-19 Phone Survey: First Call shows widening inequality. Young Lives. https://www.younglives.org.uk/content/listening-young-lives-work-covid-19-phone-survey-first-call-shows-widening-inequality
Zabel, J. E. (1998). An Analysis of Attrition in the Panel Study of Income Dynamics and the Survey of Income and Program Participation with an Application to a Model of Labor Market Behavior. Journal of Human Resources, 33, 479-506.
Zeug, H., Zeug, G., Bielski, C., Solano-Hermosilla, G., M'barek, R., & others. (2017). Innovative Food Price Collection in Developing Countries: Focus on Crowdsourcing in Africa. JRC Working Papers. doi:10.2788/12343
Zezza, A., Federighi, G., Kalilou, A. A., & Hiernaux, P. (2016a). Milking the data: Measuring milk off-take in extensive livestock systems. Experimental evidence from Niger. Food Policy, 59, 174-186. doi:10.1016/j.foodpol.2016.01.005
Zezza, A., Pica-Ciamarra, U., Mugera, K. H., Mwisomba, T., & Okello, P. (2016b). Measuring the Role of Livestock in the Household Economy: A Guidebook for Designing Household Survey Questionnaires. 67pp.
Zhang, P., Zhang, J., & Chen, M. (2017). Economic impacts of climate change on agriculture: The importance of additional climatic variables other than temperature and precipitation. Journal of Environmental Economics and Management, 83, 8-31.