Policy Research Working Paper 10738

Using Survey-to-Survey Imputation to Fill Poverty Data Gaps at a Low Cost: Evidence from a Randomized Survey Experiment

Hai-Anh Dang, Talip Kilic, Vladimir Hlasny, Kseniya Abanokova, Calogero Carletto

Development Economics, Development Data Group, March 2024

Abstract

Survey data on household consumption are often unavailable or incomparable over time in many low- and middle-income countries. Based on a unique randomized survey experiment implemented in Tanzania, this study offers new and rigorous evidence demonstrating that survey-to-survey imputation can fill consumption data gaps and provide low-cost and reliable poverty estimates. Basic imputation models featuring utility expenditures, together with a modest set of predictors on demographics, employment, household assets, and housing, yield accurate predictions. Imputation accuracy is robust to varying the survey questionnaire length, the choice of base surveys for estimating the imputation model, different poverty lines, and alternative (quarterly or monthly) Consumer Price Index deflators. The proposed approach to imputation also performs better than multiple imputation and a range of machine learning techniques. In the case of a target survey with modified (shortened or aggregated) food or non-food consumption modules, imputation models including food or non-food consumption as predictors do well only if the distributions of the predictors are standardized vis-à-vis the base survey. For the best-performing models to reach acceptable levels of accuracy, the minimum required sample size should be 1,000 for both the base and target surveys. The discussion expands on the implications of the findings for the design of future surveys.

This paper is a product of the Development Data Group, Development Economics.
It is part of a larger effort by the World Bank to provide open access to its research and make a contribution to development policy discussions around the world. Policy Research Working Papers are also posted on the Web at http://www.worldbank.org/prwp. The authors may be contacted at hdang@worldbank.org and tkilic@worldbank.org.

The Policy Research Working Paper Series disseminates the findings of work in progress to encourage the exchange of ideas about development issues. An objective of the series is to get the findings out quickly, even if the presentations are less than fully polished. The papers carry the names of the authors and should be cited accordingly. The findings, interpretations, and conclusions expressed in this paper are entirely those of the authors. They do not necessarily represent the views of the International Bank for Reconstruction and Development/World Bank and its affiliated organizations, or those of the Executive Directors of the World Bank or the governments they represent.

Produced by the Research Support Team

Using Survey-to-Survey Imputation to Fill Poverty Data Gaps at a Low Cost: Evidence from a Randomized Survey Experiment

Hai-Anh Dang, Talip Kilic, Vladimir Hlasny, Kseniya Abanokova and Calogero Carletto*

Keywords: consumption, poverty, survey-to-survey imputation, household surveys, Tanzania.

JEL Codes: C15, I32, O15.

* The senior authorship is shared between Dang and Kilic.
Dang (hdang@worldbank.org; corresponding author) is a senior economist in the Living Standards and Measurement Study (LSMS) Unit at the World Bank Development Data Group in Washington, DC and is also affiliated with GLO, IZA, Indiana University, and London School of Economics and Political Science; Kilic (tkilic@worldbank.org; corresponding author) is the senior program manager for the LSMS Unit at the World Bank Development Data Group in Washington, DC; Hlasny (vhlasny@gmail.com) is an economic affairs officer at the UN ESCWA in Beirut, Lebanon; Abanokova (kabanokova@worldbank.org) is an economist in the LSMS Unit at the World Bank Development Data Group in Washington, DC; and Carletto (gcarletto@worldbank.org) is the senior manager for the LSMS and the Strategy and Operations Units at the World Bank Development Data Group in Washington, DC. We would like to thank Benoit Dercef, Andrew Dillon, Anne Swindale, and participants at the 2023 European Survey Research Association (ESRA) conference, the IPA/GRPL conference (Northwestern) and various seminars and workshops at Australian National University, University of Oxford, and the World Bank for helpful discussion and feedback on the earlier drafts. We are grateful for the funding from the United States Agency for International Development (USAID).

1. Introduction

Household consumption survey data that underlie monetary poverty estimates in low- and middle-income countries are often unavailable, unreliable or incomparable. To address these challenges, imputation-based methods have become increasingly common not only to fill poverty data gaps in data-scarce and resource-constrained contexts, but also to identify project/program beneficiaries and evaluate development project/program impacts on poverty at low cost (World Bank, 2021; Smythe and Blumenstock, 2022; Dang and Lanjouw, 2023). 1 Building on the seminal technique that obtains small area estimates of monetary poverty by imputing from a household consumption survey into a census (Elbers et al., 2003), survey-to-survey imputation builds an imputation model using appropriate predictor variables from an existing older consumption survey (base survey), which can be subsequently applied to the same variables in another non-consumption survey (target survey) to provide poverty estimates for the latter survey. The target survey can be either an existing, non-consumption survey, such as a Demographic and Health Survey (DHS) or a labor force survey (Stifel and Christiaensen, 2007; Douidich et al., 2016), or a purposefully commissioned survey that only collects the requisite predictors. Recent applications also include sourcing the data for the requisite predictors from administrative records to impute poverty for hard-to-reach refugee populations (Altindag et al., 2021; Dang and Verme, 2023), or phone call detail records to target the ultra-poor (Aiken et al., 2023).

1 Imputation techniques are widely used by international organizations and national statistical agencies to fill in missing data gaps such as education statistics (UOE, 2020) and income data (US Census Bureau, 2017). See also Dang and Lanjouw (2023) for a recent review of poverty imputation studies.

Three key conceptual, but understudied, issues motivate our work. First, the literature on survey-to-survey imputation has long emphasized the requirement of having identical questions for poverty predictors in both base and target surveys. However, even if this requirement is fulfilled, substantial differences may still exist between base versus target surveys regarding length, thematic scope, and complexity of questionnaires.
These differences may lead to considerable differences in interview duration and respondent burden, which can affect measurement in diverse ways that are ultimately context- and subject-specific (Kreuter et al., 2011; Eckman et al., 2014). In our case, the understudied topic is whether poverty imputation accuracy can be affected by the fact that the target survey questionnaire, by design, would be lighter and less burdensome than its older, base survey counterpart, even if the requisite questions underlying the poverty predictors are identical across base and target survey questionnaires. The only available evidence regarding this question comes from a randomized experiment that was implemented in Malawi but not replicated elsewhere, and that shows the measurement of poverty predictors can indeed be affected by the length of the target survey in a way that can also impact predicted poverty estimates (Kilic and Sohnesen, 2019).

The second and related issue is whether shorter consumption modules included in a target survey (e.g., with reduced or aggregated item lists vis-à-vis the base survey) can provide cheaper-to-collect but reliable predictors that can further boost the accuracy of poverty predictions under marginal additional costs of data collection. In this case, the requisite questions underlying the poverty predictors may be non-identical across base and target survey questionnaires, relaxing the aforementioned traditional requirement for survey-to-survey imputation. In this respect, only two studies exist, and they offer inconclusive evidence. While Christiaensen et al. (2022) suggest that using consumption sub-aggregates for poverty imputation only works under certain stringent conditions, Dang et al. (forthcoming) analyze 14 surveys from various countries and demonstrate that adding household utility expenditures to a basic imputation model with household demographic and employment attributes can produce accurate poverty predictions, consistently within the 95 percent confidence interval, and often within one standard error, of the observed “true” poverty rate. 2

Finally, the last issue motivating our work is that existing studies that “validate” imputed poverty estimates were implemented in artificial settings. Specifically, these studies typically pursue validation by estimating an imputation model on an older, base consumption survey and applying the model to a more recent, target consumption survey, pretending that there were no consumption data in the latter survey. These studies subsequently compare the resulting imputed estimate to the true poverty rate based on the actual consumption data in the target survey. The fact that the newer survey round serves both as the target survey and as the source of true poverty abstracts away from real-life differences in base versus target survey design that motivate our work in the first place. These traditional artificial settings also differ from many practical applications for survey-to-survey imputation where a new survey with a different design is implemented as the target survey (e.g., a survey that does not collect consumption data or that administers lighter consumption modules, as in the case of most proxy-means tests).

Against this background, we report on a unique randomized and nationally representative household survey experiment that was implemented in Tanzania in 2022 to systematically investigate the understudied topics that have a bearing on the operational/practical applications of survey-to-survey imputation to fill poverty data gaps. The experiment featured three treatment arms (TA) that sampled households were randomly assigned to and that differed in terms of questionnaire design.
Treatment Arm 1 (TA 1) households were administered a questionnaire that is identical to the base survey questionnaire: it collects comprehensive data on household consumption, allows for the computation of benchmark poverty estimates, and permits the estimation of a wide range of competing imputation models. TA 2 households were administered a light questionnaire variant that included only the questions needed to estimate a data-modest subset of imputation models, plus the TA 1 food consumption module with a reduced list of key food items. Finally, TA 3 households were administered an alternative light questionnaire that shares the same core as the TA 2 questionnaire and that includes alternate, aggregated versions of TA 1 food and non-food consumption modules.

2 We use the term “true” poverty rate to refer to the poverty rate that can be estimated using the actual household consumption data.

These data are in turn complemented with the data from the nationally representative Tanzania National Panel Survey (TZNPS), and specifically the TZNPS 2020/21 and 2019/20 rounds that are used as base surveys for the estimation of the imputation models that are in turn applied to each treatment arm to obtain across-year predictions.

Through our research, we make novel contributions to the literature by (a) providing experimental evidence regarding the effects of target survey design on poverty imputation, (b) sidestepping usual concerns regarding the “validation” of imputed estimates by offering a real-life setting with benchmark data, and (c) providing new evidence regarding the minimum-required base and target survey sample sizes. To our knowledge, we offer the first study that leverages a randomized and nationally representative survey experiment to rigorously study these interconnected, but little-explored, research questions that are at the heart of survey-to-survey imputation.
In this sense, our work is also broadly related to a growing literature that relies on randomized survey experiments in low- and middle-income contexts to gauge the relative accuracy and cost-effectiveness of competing survey methods vis-à-vis gold-standard measurement approaches (Beegle et al., 2012; Arthi et al., 2018; Gourlay et al., 2019; De Weerdt et al., 2020; Kilic et al., 2021; Abate et al., 2023).

The analysis demonstrates that if the predictors in the target survey are elicited through questions that are identical to their counterparts in the base survey, imputation accuracy is not impacted by the remaining differences between the base and target surveys in terms of scope and complexity. Basic imputation models, including a core set of predictors on demographics, employment, household assets and housing, and/or utility expenditures, yield highly accurate predictions vis-à-vis the true poverty rate. Furthermore, regarding TA 2 or TA 3 with modified (either shortened or aggregated) food and non-food consumption modules, imputation models including food consumption or non-food consumption expenditures as predictors do well only if the distributions of the predictors are standardized vis-à-vis the base survey (which can be either the TZNPS or TA 1). Finally, for the best-performing models to reach acceptable levels of accuracy, the analysis shows that the minimum-required sample size should be 1,000 observations for both the base survey and the target survey. The results are robust to the choice of base surveys used for imputation model estimation; different poverty lines; and alternative (quarterly or monthly) CPI deflators. Our proposed approach to imputation is also shown to perform better than multiple imputation and a range of machine learning techniques.

This paper consists of six sections. Section 2 presents the experimental design (Section 2.1) and descriptive statistics (Section 2.2). Section 3 discusses the analytical framework.
Section 4 presents the main estimation results (Section 4.1) and robustness checks (Section 4.2), followed by Section 5 on various extensions. Section 6 concludes. We provide additional estimation results in Appendix A, further description of the consumption aggregates in Appendix B, and more detailed discussion of the formulas and intuition behind the method in Appendix C.

2. Experimental design and descriptive statistics

2.1. Experimental design

The data come from the Tanzania Methodological Survey Experiment on Household Consumption Measurement, which was conducted from April to July 2022 by the Tanzania National Bureau of Statistics, with technical support from the World Bank Living Standards Measurement Study (LSMS) program. Informed by the power calculations based on the past rounds of the Tanzania National Panel Survey (TZNPS) and the Household Budget Survey (HBS), the experiment spanned 143 enumeration areas (EAs) across Mainland Tanzania and Zanzibar, including both urban and rural areas. In each sampled EA, 25 households were selected at random from a fresh household listing, and five of these households were assigned at random to each of the five survey treatment arms. We analyze three survey treatment arms that are most relevant for our study. 3

3 The two additional treatment arms that are not discussed/used in this paper were (a) the sample that was subject to a 14-day diary for data collection on food consumption, following the HBS 2017/18 methodology, and otherwise identical non-food consumption expenditure modules vis-à-vis TA 1; and (b) the sample that was subject to a modified version of the TA 1 questionnaire, specifically with a food consumption module that was set up to be aligned with the TA 1/TZNPS food consumption module but with the HBS food item list.

Treatment Arm 1 (TA 1) administered the standard TZNPS household questionnaire that provides observed consumption and poverty estimates and that permits the estimation of all imputation models presented in Dang et al. (forthcoming), whose Tanzania-specific portions of the research relied on the data from the previous rounds of the TZNPS. Table A.1 in Appendix A shows each of the models and their predictors. The TA 1 sample consists of 711 households.

Treatment Arm 2 (TA 2) administered a light questionnaire that includes:
(1) “Core modules” that only include the questions necessary for computing the predictors for a data-modest subset of models that are presented in Dang et al. (forthcoming), specifically Models 1, 2, 8 and 9, which require predictors related to household demographics, employment attributes, housing characteristics, assets, and utility expenditures, and
(2) A shorter version of the TA 1 food consumption module, with an identical set-up/set of questions but with a reduced list of food items, aligned with the earlier Survey of Household Welfare and Labour in Tanzania (SHWALITA) and specifically the “short list” treatment arm in that study. 4

The TA 2 food consumption module is slotted immediately after the TA 2 core modules, covering 26 items out of the 71 items included in TA 1. 5 These selected items account for 69 percent of the monetary value of food consumption in TA 1, indicating that the reduced list of food consumption items under TA 2 misses out on a considerable share of the food expenditure compared to the full TA 1 food consumption module. As discussed later, TA 2 data on food consumption are used to estimate an additional imputation model, namely Model 3 as presented in Dang et al. (forthcoming), which includes household food consumption expenditures as a predictor. The TA 2 sample consists of 701 households. Table A.2 in Appendix A presents expenditures on these food categories for TA 2 and TA 3 in comparison with those from TA 1.
Finally, Treatment Arm 3 (TA 3) administered an alternative light questionnaire variant that includes:
(1) The same TA 2 core modules that allow for the estimation of Models 1, 2, 8, and 9 as presented in Dang et al. (forthcoming),
(2) An aggregated food consumption module that corresponds to the “collapsed list” treatment arm in the SHWALITA study, and
(3) A series of short, aggregated non-food consumption expenditure modules that were informed by the variants from the SHWALITA study but were refined in some instances to better align with the COICOP categories (United Nations, 2018), related, for instance, to education, health, and utilities expenditures.

4 For more information regarding SHWALITA, please see Beegle et al. (2012) and visit https://www.uantwerpen.be/en/staff/joachim-deweerdt/public-data-sets/shwalita/#introduction.

5 TA 2 covers 13 individual food items and 4 item categories corresponding to 13 items in TA 1. The 13 individual items include: rice (husked); maize (grain); maize (flour); millet and sorghum (flour); cassava fresh; cassava dry/flour; sweet potatoes; cooking bananas and plantains; sugar; beef including minced sausage; dried/salted/canned fish and seafood; fresh milk; cooking oil. The 4 grouped item categories (covering 13 items in TA 1) include: peas, beans, lentils, and other pulses; onions, tomatoes, carrots, and green peppers; spinach, cabbage, and other green vegetables; and fresh fish and seafood.

The TA 3 collapsed food consumption module is slotted immediately after the core modules, covering all 12 broad food categories (including alcoholic beverages), and only asking the respondent to state the monetary value that the consumed quantity of total food in that category would have cost, had it been purchased. 6 TA 3 non-food consumption expenditure modules are then slotted immediately after the TA 3 collapsed food consumption module, and together, these sets of modules permit the estimation of Models 3 and 4 as presented in Dang et al. (forthcoming). The TA 3 sample consists of 698 households.

6 TA 3 covers: cereals and cereal products; starches; sugar and sweets; pulses, dry; nuts and seeds; vegetables; fruits; meat, meat products, fish; milk and milk products; oil and fats; spices and other foods; alcoholic and non-alcoholic beverages.

These data are in turn complemented with the data from the nationally representative TZNPS 2020/21 and 2019/20 rounds, which are used as base surveys to estimate the imputation models that are in turn applied to each treatment arm. The main results are based on the 2020/21 round, while Appendix A includes consistent findings based on the 2019/20 round, as discussed below. The TZNPS is a multi-topic, nationally representative longitudinal household survey that has been implemented by the NBS since 2008, with financial and technical support from the World Bank Living Standards Measurement Study – Integrated Surveys on Agriculture (LSMS-ISA) project. The questions for the poverty predictors required for the estimation of Models 1, 2, 8 and 9 are identical across the consumption experiment as well as the TZNPS 2020/21 and 2019/20 rounds. The sample sizes were 4,644 households in 2020/21 (following up with a panel sample that was first interviewed during the 2014/15 round) and 1,179 households in 2019/20 (following up with a subset of an older panel sample that had been interviewed as part of the TZNPS 2008/09, 2010/11 and 2012/13). As discussed above, there are differences in terms of food and non-food consumption modules that were introduced in TA 2 and TA 3 to understand the potential for using lighter versions of these modules to obtain accurate poverty predictions.
Finally, in TA 1 and the TZNPS 2020/21 and 2019/20 rounds, the total consumption is taken to be the sum of food (consumed at and away from home) and non-food consumption (health, education, utilities, furnishing and household expenses, transport, communication, retreats, and other). We provide more detailed discussion on the food and non-food consumption expenditure aggregates for the TZNPSs and the three TAs in Appendix B.

2.2. Descriptive statistics

We spatially and inter-temporally deflate all the consumption aggregates in the three TAs and the TZNPSs. The spatial and temporal price differences in nominal household consumption expenditures within all survey rounds are corrected using Fisher price indices. These price indices are estimated within each survey round by stratum and quarter (or month, in the case of the experiment), and the base period in each estimation comprises the entire period of each round. The across-survey intertemporal deflation is in turn conducted using the annual inflation series for various consumption groups, as obtained from the World Bank Global Database of Inflation (Ha et al., 2023). 7 Specifically, food expenditure is deflated using the consumer price inflation for food and non-alcoholic beverages, while utilities expenditure is deflated using the consumer price inflation for energy (capturing housing, water, electricity, gas and other fuels). Remaining non-food consumption expenditure is deflated using the headline average consumer price inflation.

7 To access the database, visit: https://www.worldbank.org/en/research/brief/inflation-database.

The year 2022 is used as the base year. Hence, the consumption expenditures as measured in our experiment in 2022 are taken in their nominal values, while the expenditures in previous rounds are deflated. The expenditure values elicited during the TZNPS 2020/21, conducted between December 2020 and January 2022, are deflated in accordance with the 2021-2022 inflation.
Similarly, the expenditure values elicited during the TZNPS 2019/20, conducted between January 2019 and January 2020, are deflated in accordance with the 2020-2022 inflation. In what follows, all expenditures are reported in year-2022 Tanzanian shillings (TSH), and total annual consumption per adult equivalent is compared to the TZNPS 2020/21 poverty line deflated to prices in 2022.

Table 1 provides descriptive statistics for the TZNPS 2020/21 and 2019/20 rounds and for each of the survey treatment arms, coupled with the results from the tests of mean differences among the TAs. The “good” news is that across treatments, comparisons of the prospective poverty predictors that are required for Models 1, 2, 8 and 9 largely do not reveal statistically significant differences. The only exceptions are participation in wage work, and bicycle ownership, between TA 1 and TA 2; radio ownership, urban–rural residence and utility expenditures (though with marginal differences) between TA 2 and TA 3; and access to piped water, between TA 1 and TA 3. These findings are in stark contrast with those of Kilic and Sohnesen (2019) 8 and do not raise flags regarding the sensitivity of measurement to the differences in length and complexity between the base survey and target survey questionnaire design, provided that the identical questions are utilized across the surveys. It is thus reasonable that changes in the distributions of the predictor variables over time for these four models can capture the change in the poverty rate between the rounds (i.e., satisfying Assumption 2 in our imputation framework discussed in the next section).

8 Kilic and Sohnesen (2019) report on a randomized survey experiment that was conducted in Malawi in 2016 and that shows that observationally equivalent, as well as identical, households in fact answer the same questions differently depending on whether they are interviewed with a short questionnaire or its longer counterpart. The authors find large and statistically significant differences in reporting across a range of topics and question types, which can lead to a difference of 3 to 7 percentage points in predicted poverty estimates, depending on the imputation model. The authors, however, demonstrate that the imputation model using only the predictors that are elicited prior to the variation in questionnaire design provides identical poverty predictions irrespective of the short versus longer questionnaire treatment.

On the other hand, Table 1 also shows that the food and non-food consumption aggregates that can be created with the TA 2 and TA 3 data, as explained above, present large and statistically significant differences vis-à-vis the TA 1 counterparts. Relative to TA 1, the TA 2 and TA 3 food consumption modules decrease the reported household food consumption expenditures, respectively, by 22 and 31 percent. The non-food consumption module administered in TA 3 decreases the reported household non-food consumption expenditures by 46 percent compared to TA 1. The results lower our expectations regarding the predictive accuracy of Models 3 and 4 that would be applied to target survey data with reduced food and non-food consumption modules. Hence, as discussed later, we also explore “standardizing” the distributions of the predictors in TA 2 and TA 3 so as to match the distributions of the same variables that are obtained in the base survey (Dang et al., 2017). In practical terms, making the standard assumption that the variables to be standardized have a normal distribution, standardization implies (1) subtracting the target survey mean from each variable (i.e., demeaning), (2) multiplying the demeaned variable by the ratio of the standard deviations (the square roots of the variances) of the variable in the base survey and the target survey, and (3) adding the base survey mean (see Appendix C for further discussion). 9

9 Alternatively, we can employ a rescaling approach since food and non-food expenditures reported in the TZNPS can provide the benchmark against which we can adjust (rescale) the corresponding expenditures under TA 2 and TA 3 such that these expenditures are equal. While rescaling the shares of these expenditures against those for the corresponding groups in the TZNPS improves imputation accuracy for food expenditures in TA 2, it does not improve estimates of food and non-food expenditures in TA 3.

Overall, the differences in questionnaire design do not significantly affect the poverty predictors required for Models 1, 2, 8 and 9. However, differences in food and non-food consumption aggregates collected with the short questionnaires can have a significant impact on the predictive accuracy of Models 3 and 4, highlighting the importance of questionnaire design for poverty imputation and survey research in general.

3. Analytical framework

The analytical framework features the poverty imputation method developed in Dang et al. (2017), which builds on the survey-to-census poverty mapping method in Elbers et al. (2003). 10 The method has been applied to fill poverty data gaps for general household populations in various low- and middle-income countries from different regions, including India, Jordan, Tunisia, Viet Nam, and Sub-Saharan African countries (Beegle et al., 2016; Dang and Lanjouw, 2023), as well as to estimate poverty within refugee sub-populations (Dang and Verme, 2023). Most recently, the method is employed in Dang et al. (forthcoming) to analyze data from 14 multi-topic household surveys from Ethiopia, Malawi, Nigeria, Tanzania, and Viet Nam. We briefly describe the method below.
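As an aside on the standardization step described in Section 2.2 (demean in the target survey, rescale by the ratio of standard deviations, add the base survey mean), the following is a minimal sketch rather than the authors' implementation. The function name, the simulated values, and the choice to work with unweighted log expenditures are all assumptions made for illustration; an applied version would presumably use survey weights.

```python
import numpy as np

def standardize_to_base(x_target, x_base):
    """Rescale a target-survey variable so its mean and variance match the base survey.

    Hypothetical helper: unweighted, and assumes approximate normality.
    """
    x_target = np.asarray(x_target, dtype=float)
    x_base = np.asarray(x_base, dtype=float)
    mean_t, mean_b = x_target.mean(), x_base.mean()
    sd_t, sd_b = x_target.std(ddof=1), x_base.std(ddof=1)
    # (1) demean in the target survey, (2) rescale by sd_base / sd_target,
    # (3) add the base-survey mean.
    return (x_target - mean_t) * (sd_b / sd_t) + mean_b

# Simulated (not real) log food expenditures: the lighter target module under-reports.
rng = np.random.default_rng(0)
base_food = rng.normal(loc=13.0, scale=0.6, size=2000)
target_food = rng.normal(loc=12.7, scale=0.5, size=700)

adjusted = standardize_to_base(target_food, base_food)
print(round(adjusted.mean() - base_food.mean(), 6))
print(round(adjusted.std(ddof=1) - base_food.std(ddof=1), 6))
```

By construction the adjusted variable reproduces the base survey's first two moments while preserving each household's rank within the target survey, which is why the approach leans on the normality assumption stated above.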
Building on a household utility budget model (Deaton and Muellbauer, 1980), (log) household consumption per capita ($y_j$) is typically estimated using the following reduced-form linear model for survey $j$, for $j = 1, 2$: 11

$$y_j = \beta_j' x_j + \mu_j \tag{1}$$

In Equation (1), $x_j$ can include household variables that capture not only household socio-demographic characteristics, but also preferences and attitudes that may shape household consumption patterns; $x_j$ can include household assets or incomes; and $\mu_j$ is the error term (Elbers et al., 2003; Ravallion, 2016). 12 $\mu_j$ consists of two components, a cluster random effect ($\upsilon_j$) and an idiosyncratic error term ($\varepsilon_j$), which are assumed independent and normally distributed given $x_j$, such that $\upsilon_j \mid x_j \sim N(0, \sigma^2_{\upsilon_j})$ and $\varepsilon_j \mid x_j \sim N(0, \sigma^2_{\varepsilon_j})$.

10 Previous studies offer various refinements of certain features of the poverty mapping technique, such as imposing a parametric probit functional form on the error term (Tarozzi, 2007) or offering a different formula to estimate the standard errors (Mathiassen, 2009). Dang et al. (2017) offer a well-defined framework for survey-to-survey imputation with simpler variance formulas and formulas for standardization of variables from surveys with different sampling designs (e.g., imputing from a household consumption survey into a LFS).

11 The subscript for households is omitted for less cluttered notation in the subsequent discussion.

The consumption data exist in the base survey (i.e., $j = 1$, or survey 1) but are not available in the other survey(s). Our goal is to use the model estimated on the base survey ($j = 1$) to impute consumption, and hence poverty, in the target survey ($j = 2$), where consumption data may be unavailable or of poor quality. The estimation process relies on two assumptions that must be met.
Assumption 1 states that the sampled data in survey 1 and survey 2 are representative of the same population in each respective time period, and Assumption 2 states that changes in the distributions of the explanatory variables between the two periods can capture the change in the poverty rate in the next period. Given Assumptions 1 and 2, to obtain the imputed consumption for survey 2 we can replace $x_1$ with $x_2$ in Equation (1):

$$y_2^1 = \beta_1' x_2 + \upsilon_1 + \varepsilon_1 \tag{2}$$

Put differently, Equation (2) applies the model parameter $\beta_1$ and the distributions of the error terms $\upsilon_1$ and $\varepsilon_1$ from the base survey to the $x_2$ characteristics in the target survey to obtain estimates of household consumption $y_2^1$ in the target survey (with the superscript indicating that the household consumption variable is predicted using the model parameters from the base survey).

12 We suppress the index for households in the equations to make the notation less cluttered.

Since the parameters used in Equation (2) are estimated using a base survey that is different from the target survey, we can use simulation to estimate Equation (2) (for a single draw) as follows:

$$\hat{y}^1_{2,s} = \tilde{\hat{\beta}}_{1,s}' x_2 + \tilde{\hat{\upsilon}}_{1,s} + \tilde{\hat{\varepsilon}}_{1,s} \tag{3}$$

In Equation (3), $\tilde{\hat{\beta}}_{1,s}$, $\tilde{\hat{\upsilon}}_{1,s}$, and $\tilde{\hat{\varepsilon}}_{1,s}$ represent the $s$th random draw (simulation) from their estimated distributions using the base survey, for $s = 1, \ldots, S$. It can be proved that the poverty rate in the target survey and its variance can then be estimated as

$$\hat{P}_2 = \frac{1}{S} \sum_{s=1}^{S} P\left(\hat{y}^1_{2,s} \le z_1\right) \tag{4}$$

$$V(\hat{P}_2) = \frac{1}{S} \sum_{s=1}^{S} V\left(\hat{P}_{2,s} \mid x_2\right) + V\left(\frac{1}{S} \sum_{s=1}^{S} \hat{P}_{2,s} \,\Big|\, x_2\right) \tag{5}$$

where $\hat{P}_{2,s}$ in Equation (5) is defined as $\hat{P}_{2,s} = P(\hat{y}^1_{2,s} \le z_1)$ and $z_1$ is the poverty line in the base survey (see Appendix C for more discussion). For consistency, it is $z_1$, rather than $z_2$ in the target survey, that should be used in combination with the predicted consumption to obtain poverty estimates since all the estimates for $\beta_1$, $\upsilon_1$, and $\varepsilon_1$ in Equation (2) also come from the base survey.
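The simulation logic behind Equations (3) to (5) can be sketched in a few lines. This is an illustrative sketch under strong simplifying assumptions, not the authors' estimator: the data are simulated, the cluster and idiosyncratic variances come from a crude split of OLS residuals rather than a random-effects estimator, errors are drawn from normal distributions, the within-simulation variance uses a simple-random-sampling approximation, and the between-simulation variance stands in for the second term of Equation (5). All function and variable names are hypothetical.

```python
import numpy as np

def fit_base_model(y, X, cluster):
    """OLS of log consumption on predictors in the base survey (crude variance split)."""
    Xc = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(Xc, y, rcond=None)
    resid = y - Xc @ beta
    _, inv = np.unique(cluster, return_inverse=True)
    cl_mean = np.bincount(inv, weights=resid) / np.bincount(inv)
    sigma2_u = cl_mean.var(ddof=1)                  # cluster component (rough)
    sigma2_e = (resid - cl_mean[inv]).var(ddof=1)   # idiosyncratic component (rough)
    dof = len(y) - Xc.shape[1]
    cov_beta = (resid @ resid / dof) * np.linalg.inv(Xc.T @ Xc)
    return beta, cov_beta, sigma2_u, sigma2_e

def impute_poverty(model, X2, cluster2, z1, S=200, seed=0):
    """Average the simulated poverty rate over S draws, using the base-survey line z1 (in logs)."""
    beta, cov_beta, s2u, s2e = model
    rng = np.random.default_rng(seed)
    X2c = np.column_stack([np.ones(len(X2)), X2])
    _, inv = np.unique(cluster2, return_inverse=True)
    n, n_cl = len(X2c), inv.max() + 1
    p_s = np.empty(S)
    for s in range(S):
        b = rng.multivariate_normal(beta, cov_beta)        # draw of beta
        u = rng.normal(0.0, np.sqrt(s2u), size=n_cl)[inv]  # draw of cluster effects
        e = rng.normal(0.0, np.sqrt(s2e), size=n)          # draw of idiosyncratic errors
        y_hat = X2c @ b + u + e                            # Equation (3)
        p_s[s] = np.mean(y_hat <= z1)                      # simulated poverty rate
    p_hat = p_s.mean()                                     # Equation (4)
    within = np.mean(p_s * (1.0 - p_s) / n)                # SRS approximation (assumption)
    between = p_s.var(ddof=1)                              # between-simulation variance
    return p_hat, within + between                         # in the spirit of Equation (5)

# Simulated base and target surveys (not real data).
rng = np.random.default_rng(1)

def fake_survey(n_clusters, per_cluster, shift=0.0):
    cluster = np.repeat(np.arange(n_clusters), per_cluster)
    X = rng.normal(shift, 1.0, size=(len(cluster), 2))
    u = rng.normal(0.0, 0.3, n_clusters)[cluster]
    y = 10.0 + X @ np.array([0.5, 0.3]) + u + rng.normal(0.0, 0.5, len(cluster))
    return y, X, cluster

y1, X1, c1 = fake_survey(100, 10)             # base survey with consumption
y2, X2, c2 = fake_survey(100, 7, shift=0.1)   # target survey; y2 kept only as a benchmark
z1 = np.quantile(y1, 0.25)                    # hypothetical base-survey poverty line (log)

p_hat, var_hat = impute_poverty(fit_base_model(y1, X1, c1), X2, c2, z1)
true_rate = np.mean(y2 <= z1)
print(p_hat, var_hat ** 0.5, true_rate)
```

Replacing the two normal draws with resampling from the estimated residuals would give an empirical-distribution variant of the same loop.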
Notably, using the base-survey poverty line is an advantage of the survey imputation method because it helps preclude various data challenges such as obtaining the right consumption deflators for the target survey or ensuring the new poverty line is constructed in a comparable manner to that in the base survey. Unlike the traditional econometric model that estimates the impacts of the explanatory variables ($x$) on consumption ($y$), our focus is on predicting (imputing) $y$ given $x$, so endogeneity concerns about $x$ are less important in our context.

Individual characteristics include variables such as age, sex and education. Household characteristics include variables such as household size and composition, the living area of the house, the physical quality of the house (e.g., whether its roof or wall is of good quality or whether the toilet is improved, such as a flush toilet), quality of drinking water, and household assets. These observed characteristics are commonly used as proxies for measuring household wealth.

For comparison and robustness testing, two estimation methods relying on different assumptions about the error terms are considered in Tables 2–6. Method 1 uses the normal linear regression model that assumes a normal distribution of the error terms, and Method 2 uses the empirical distribution of the error terms. Both methods include the random effects at the primary sampling unit. To ensure that consumption data are consistent, the poverty lines that are used to provide the imputation-based estimates are those in the base surveys. But we will also offer robustness checks when we use different poverty lines and alternative modelling techniques such as multiple imputation (MI) and machine learning.

Figure 1 presents a visual summary of the two types of poverty imputation that we implement. The first type of imputation is to employ the 2020/21 TZNPS as the base survey to impute into the different TAs to obtain poverty estimates over time (i.e., across-year imputation).
While this offers our main estimation results, we will also provide robustness checks where we use the 2019/20 TZNPS as an alternative base survey. The second type of imputation is to use TA 1 as the base survey to impute into the other two TAs to obtain poverty estimates in the same year (i.e., within-year imputation). Both types of imputation have implications for policy advice and survey design. While across-year imputation provides estimates on consistent poverty trends over time, within-year imputation offers cost savings and reduced logistical challenges where a small-scale consumption survey can be implemented instead of a full-scale consumption survey.

4. Estimation results

4.1. Main results

Tables 2–6 report the main estimation results for the predicted poverty rates in each of the experimental survey treatment arms, using either the NPS 2020/21 round (in Tables 2–4) for across-year predictions or the TA 1 round (in Tables 5–6) as the base survey round for within-year predictions. Model 1 is the most parsimonious model and consists of household size, household heads’ age and gender, household heads’ highest completed levels of schooling (as binary indicators), the shares of household members in the age ranges 0–14, 15–24, and 60 years old and older (with the reference group being those 25–59 years old), binary indicators indicating whether the head worked in the past 7 days or was self-employed, and a binary variable indicating urban residence.

Model 2 adds household asset variables and house (dwelling) characteristics to Model 1. Household assets include variables indicating whether the household has a motor vehicle, bicycle, mobile phone, video/DVD player, television set, computer, refrigerator/freezer, air conditioner/fan, radio, and mosquito net. Additional controls include the number of rooms in the house, construction materials used for the house’s roof, wall and floor, and access to drinking water from a pipe/truck, and a flush/VIP toilet.
Models 3–8 additionally control for one observed expenditure category, respectively: total food consumption, total non-food consumption, furnishing and other expenses, health expenditures, education expenditures, and utility expenditures. Finally, Model 9 controls for utility expenditures, as in Model 8, but does not control for household assets and dwelling characteristics. The specific predictors that are used, as well as their estimated coefficients, are provided in Appendix A, Tables A.5 to A.7 for the TZNPSs 2019/20, 2020/21 as well as TA 1. As discussed above, for comparison and robustness testing, we show the results using two estimation methods relying on different assumptions about the error terms in Tables 2–6. Method 1 uses the normal linear regression model, and Method 2 uses the empirical distribution of the error terms.

Across-year imputation

Table 2 provides the estimates of poverty rates in the experimental TA 1 using the NPS 2020/21 as the base survey. This is the benchmark imputation run, where the most recent full-size household consumption survey is used as the base survey, and the experimental treatment arm using the standard full set of consumption items is used as the target survey, allowing the comparison of estimates to the observed “true” poverty rate (i.e., the poverty rate that is estimated based on the actual consumption survey data).

The estimation results show that all Models 1–9 provide adequate poverty estimates that are not statistically significantly different from the true poverty rate of 21.0 percent. Estimates from Models 1–9 range from 20.4 to 23.1 percent, all within the 95 percent confidence interval (CI) of the true poverty rate [16.6, 25.4], and within one standard error [18.8, 23.2]. Both estimation methods thus perform similarly across all models, with no consistent ranking across models in terms of prediction errors or significance.
Across-year imputation with lighter questionnaires

Similar results emerge in Tables 3 and 4 in the comparisons of the observed TA 1 poverty rate (21.0 percent, with a one-standard-error band of 18.8–23.2) and the predicted poverty estimates from the experimental TA 2 and TA 3 samples (using the NPS 2020/21 as the base survey). Table 3 (for TA 2-specific findings) indicates that Models 1, 2 and 8 yield estimates of 21.6–22.5 percent, all within one standard error of the true poverty rate. Model 9 yields estimates of 23.3–23.4 percent, within the 95 percent CI of the true poverty rate. Notably, Model 3.1’s estimates (28.4 percent and 28.7 percent) using the food expenditures data collected with the lighter questionnaire under TA 2 are outside of the 95 percent CI of the true rate.

Likewise, Table 4 (for TA 3-specific findings) indicates that Models 1 and 2 estimate poverty rates at 21.6–22.6 percent, within one standard error of the true poverty rate. The estimates for Models 8 and 9, specifically 23.4–23.8 percent, are still within the 95 percent CI of the true poverty rate. The estimates for Models 3.1 and 4.1, specifically 38.5–48.9 percent, are significantly higher than and outside of the 95 percent CI for the true rate.

However, when we standardize the distribution of the variables associated with Model 3 in TA 2 and Models 3 and 4 in TA 3 by those in NPS 2020/21, more promising results emerge. The means of the standardized variables in TA 2 and TA 3 are not statistically different from those in NPS 2020/21 (Appendix A, Table A.3). When we predict the poverty rate using standardized variables, the estimation results demonstrate statistically significant improvements for Model 3.2 in TA 2 (Table 3), as well as for Models 3.2 and 4.2 in TA 3 (Table 4), with all estimates falling within one standard error of the true poverty rate.
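To make the idea of standardizing a target-survey predictor by its base-survey distribution concrete, the sketch below applies a simple location-scale adjustment in Python. This is an assumption made for illustration: the procedure actually used for Models 3.2 and 4.2 may differ, the function name and the toy values are hypothetical, and survey weights are ignored.

```python
import statistics


def standardize_to_base(target_values, base_values):
    """Rescale a target-survey predictor to the base survey's mean and spread.

    Assumption of this sketch: an unweighted location-scale adjustment.
    """
    mean_t = statistics.fmean(target_values)
    sd_t = statistics.pstdev(target_values)
    mean_b = statistics.fmean(base_values)
    sd_b = statistics.pstdev(base_values)
    if sd_t == 0:
        raise ValueError("target predictor has no variation to rescale")
    # Convert each value to a z-score in the target, then map it to the base scale.
    return [mean_b + (v - mean_t) / sd_t * sd_b for v in target_values]


# Hypothetical log food expenditures; the lighter module records lower values.
base_log_food = [13.2, 13.6, 13.9, 14.1, 14.5]
target_log_food = [12.7, 13.0, 13.4, 13.5, 14.0]
adjusted = standardize_to_base(target_log_food, base_log_food)
```

After the adjustment, the target predictor has the same mean and standard deviation as in the base survey while the ranking of households is preserved, so the base-survey coefficients are applied to values on a comparable scale.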
The improvement after standardizing variables is consistent with earlier standardization evidence using household surveys and labor force surveys for Jordan (Dang et al., 2017). Figure 2 provides a visualization of the key results from Tables 2–4.

Within-year imputation with lighter questionnaires

Tables 5 and 6 utilize the experimental TA 1 as the base round for imputing the poverty rate in TA 2 and TA 3, respectively. Both tables yield analogous results: Models 1–2 and 8–9 produce poverty estimates that are within one standard error of the expected value. By contrast, Models 3.1 and 4.1 produce worse estimates, specifically 29.7–30.1 percent in TA 2 and 40.9–45.8 percent in TA 3. The results are again robust to the choice between the normal regression model and the empirical distribution of the error terms.

For the within-year imputation, we follow a similar approach as the across-year imputation mentioned earlier, by standardizing the distribution of the variables in TA 2 and TA 3 by those in TA 1. The variables’ distributions become statistically similar to those in TA 1, as shown in Appendix A, Table A.4. Similar to the previous findings with across-year imputation, the standardization of variables results in a significant improvement of imputation accuracy for Model 3.2 in TA 2 (Table 5) and Models 3.2 and 4.2 in TA 3 (Table 6), producing estimates that fall even within one standard error of the true rate.

In sum, the results in Tables 2–6 show that using total food consumption, total non-food consumption or non-food consumption components as predictors in Models 3–7 does not significantly improve on the accuracy of imputation in Models 1, 2 or 9, and may come at the cost of accuracy. The light questionnaire modules for deriving “imperfect” measures of food and non-food consumption do not end up providing the requisite predictors that permit the reliable estimation of Models 3 and 4.
However, the standardization procedures for both TA 2 and TA 3 exhibit a substantial improvement in imputation accuracy. Comparing the results in Tables 5–6 to those in Tables 2–4 suggests that using TA 1 as the base survey offers slightly stronger statistical significance than using the 2020/21 TZNPS as the base survey. But the improvements do not appear consistent or noticeably large for all the models, perhaps due to the closer time interval between the 2020/21 TZNPS and the 2022 experimental data.

4.2. Robustness checks

Cost-of-living differences

The consumption aggregates across the TZNPS survey rounds are expressed in 2022 prices using the World Bank Global Database of Inflation (Ha et al. 2021), as discussed earlier (Section 2.2). To assess the sensitivity of our results to the approach to standardizing prices, namely the degree of disaggregation of price adjustments by types of commodities and the frequency of adjustment, alternative approaches were evaluated. Our preferred models relying on deflating prices using annual inflation at the level of commodity groups (food, utilities, and other non-food commodities) are compared to models relying on deflation by the traditionally used overall CPI deflators provided by the International Monetary Fund. [13] Deflation using the annual CPI disaggregated by groups is also compared to that using the quarterly CPI between the midpoints of survey rounds (Q1-2009, Q1-2011, Q2-2013, Q2-2015, Q2-2019, Q2-2021, and Q2-2022) and the monthly CPI (3/2009, 3/2011, 6/2013, 6/2015, 7/2019, 6/2021, 6/2022). [14]

Figures 3–5 reveal that, across all models and pairs of base-target survey rounds (with a single exception for Model 4 in imputing into TA 1), annual commodity-disaggregated deflation performs no worse than, and frequently better than, non-disaggregated deflation.
The advantage of commodity-disaggregated deflation is particularly notable in Models 8–9 when imputing into TA 1, Model 8 when imputing into TA 2, and Model 2 when imputing into TA 3. Regarding the frequency of deflation, there appears to be no advantage to going below the annual level. Deflation using quarterly or monthly CPI performs no better and sometimes worse than deflation using annual (commodity-disaggregated) CPI. While commodity-disaggregated annual deflation performs notably better than IMF CPI in certain models and treatment arms, there is no significant difference in the performance of Models 8 and 9 when comparing deflation using quarterly or monthly CPI to deflation using the aggregated IMF CPI.

[13] To access the database, visit: https://data.imf.org/?sk=4ffb52b2-3653-409a-b471-d47b46d904b5.
[14] Quarterly and monthly inflation rates for food and energy are not available prior to February 2010. For years 2009–2010, annual inflation rates for those sub-aggregates are used.

Alternative base survey

The main results in Section 4.1 relied on base survey data that preceded the target survey round by less than two years. Had we relied on older base survey data, would that have compromised the performance of the promising imputation models? Figure 2 highlights a limited test of varying the timespan between the base and target survey rounds, by nearly doubling it from 1–2 years (in the case of using the TZNPS 2020/21 as the base survey) to 2–3 years (using the TZNPS 2019/20 as the base survey). The full poverty estimates are shown in Appendix A, Tables A.8 to A.10. The results indicate that both base surveys perform similarly in terms of producing estimates close to the true poverty rates (except Model 4), with the earlier round performing even better for Models 8–9 in TA 2 and TA 3. [15] That is despite the fact that the sample size of the more distant round, NPS 2019/20, is only a quarter of the sample in NPS 2020/21.
Varying poverty lines

Finally, poverty as measured under the latest consumption data collected under TA 1 can vary by more than two percentage points depending on whether we use the poverty line in the 2019/2020 TZNPS or in the 2020/2021 TZNPS. To examine the sensitivity of our results to the poverty line, we re-estimate the main results shown in Tables 2 to 6 but using the poverty line in the 2019/2020 TZNPS instead of the 2020/2021 TZNPS. [16] The results, shown in Appendix A, Tables A.11 to A.15, remain qualitatively similar with lower significance for Model 9 in between-year imputations. Further meta-analysis that controls for various modeling characteristics such as the imputation methods, the poverty lines, the treatment arms, and the source of deflator data confirms that employing the poverty line that is consistent with the base survey offers better results for between-year imputation (Appendix A, Figure A.2). [17]

[15] Figure 2 shows estimates that are obtained using the normal linear regression models. The estimates obtained using the empirical distribution of the error terms are similar (Appendix A, Figure A.1).
[16] The poverty lines for 2020/21 and 2019/20 are based on data from the NPS and may vary due to changes in the household consumption questionnaire. While the food components of both years' poverty lines remain consistent, the non-food components differ. The 2019/20 poverty line includes information from earlier NPS rounds, while the 2020/21 poverty line includes data from later rounds, notably incorporating imputed rents, the value of durable goods, and clothing. To ensure consistency, we employ the appropriate consumption aggregate definition aligned with the respective poverty line.
[17] This result is consistent with earlier observations for various contexts in Dang et al. (2019) and Dang and Lanjouw (2023). The key intuition is that the poverty line that is typically produced using prices (and to some extent, consumption patterns) specific to one time period would be more consistent with the consumption (poverty) levels in that period than those in another period. The full marginal effects from the logit regressions underlying Figure A.2 are shown in Appendix A, Table A.17.

Alternative modeling options

We consider two alternative modelling options, which are multiple imputation (MI) and machine learning (ML). It is useful to note that the poverty imputation literature is broadly related to a larger literature on missing data (or multiple imputation in statistics; see, e.g., Little and Rubin, 2019; Carpenter et al., 2023). Certain differences, however, exist between the two literatures. One difference is that MI studies tend to employ Bayesian techniques for their estimation, which can require more computational time for drawing from posterior distributions. Another difference is that economists tend to use economic theory rather than statistical theory for model selection. [18] We employ two common MI techniques, linear model and predictive mean matching (PMM).

On the other hand, ML methods have been useful additions to the economists’ toolbox (Mullainathan and Spiess, 2017; Athey and Imbens, 2019). However, unlike our proposed imputation approach that builds imputation models based on economic theory (and perhaps closer to MI), ML approaches follow a data mining philosophy and build the imputation model using variables that maximize predictive power from the training sample. This imputation model is then applied to the estimation sample. In this study, we employ several ML techniques, namely LASSO, Elastic Net, and Random Forest, and use the base survey as the training sample and the target survey as the estimation sample.

[18] Recent studies that apply MI techniques to economic issues include Jenkins et al. (2011), Douidich et al. (2016), Dang et al. (2017), Yoshida et al. (2022), and Dang et al. (forthcoming).
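Of the two MI techniques named above, predictive mean matching can be sketched in a few lines of Python. This is a simplified illustration rather than the specification estimated in the paper: it uses a single hypothetical predictor, fits one least-squares line without drawing coefficients from a posterior distribution, produces a single imputation instead of several, and all names and values are made up for the example.

```python
import random
import statistics


def fit_line(x, y):
    """Least-squares intercept and slope for a single predictor."""
    mean_x, mean_y = statistics.fmean(x), statistics.fmean(y)
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def pmm_impute(base_x, base_y, target_x, k=3, seed=0):
    """Impute target consumption by borrowing observed base-survey values.

    For each target household, the k base households with the closest
    predicted means are the donor pool; one donor's observed y is drawn.
    """
    rng = random.Random(seed)
    intercept, slope = fit_line(base_x, base_y)
    base_pred = [intercept + slope * xi for xi in base_x]
    imputed = []
    for xi in target_x:
        pred = intercept + slope * xi
        donors = sorted(range(len(base_x)),
                        key=lambda i: abs(base_pred[i] - pred))[:k]
        imputed.append(base_y[rng.choice(donors)])
    return imputed


# Hypothetical base survey: an asset index and observed log consumption.
base_assets = [1.0, 2.0, 3.0, 4.0, 5.0]
base_log_cons = [13.0, 13.4, 13.3, 14.0, 14.2]
target_assets = [1.5, 2.5, 4.5]
imputed_log_cons = pmm_impute(base_assets, base_log_cons, target_assets)

# Share of target households imputed below a hypothetical log poverty line.
poverty_rate = sum(v <= 13.35 for v in imputed_log_cons) / len(imputed_log_cons)
```

Because every imputed value is an observed base-survey value, this approach does not rely on a normality assumption for the error term, which is one reason it is often contrasted with the linear-model variant.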
We show the estimation results for imputation from NPS 2020/21 to the three treatment arms in Appendix A, Tables A.18 to A.20. [19] The MI estimates perform well for 19 out of 23 imputation models for the three treatment arms, where two to four estimates fall within one standard error of the true rates (with the PMM technique obtaining somewhat greater accuracy than the linear model). The ML estimates perform worst, yielding accurate estimates for only 2 to 3 imputation models when imputing to TAs 1 and 2 and yielding inaccurate estimates when imputing into TA 3. In contrast, our proposed imputation method performs well for 20 out of 23 imputation models for the three treatment arms, providing 17 estimates within one standard error of the true rates.

[19] The final selected variables and goodness-of-fit statistics for the machine learning estimates are shown in Appendix A, Tables A.21 to A.23.

5. Further extensions

Next, we examine how the imputation model performance is impacted by the size of the base and target survey samples. Given the comparability of the data on the poverty predictors across different treatment arms, we pool the TA 1, TA 2, and TA 3 samples to form an expanded target survey sample and investigate the implications of varying the target survey sample size for the predictive accuracy of competing imputation models. [20]

[20] We show additional estimation results when imputing into varying sample sizes of TA 1 in Appendix A, Figure A.3.

Figure 6 provides a graphical overview of our results that are obtained when we vary both the target survey sample and the base survey sample. The former ranges from 10 to 100 percent of the sample, while the latter is shown at 10, 20, 30, and 100 percent of the sample (which helps make the graph less crowded). Since the combined TA sample contains 2,110 observations, we consider a range of 211 to 2,110 observations for the target survey sample size.
Given the sample size of 4,644 households for the TZNPS 2020/21, we explore the predictive accuracy of imputation models using a base survey sample ranging from 464 to 4,644 observations. Figure 6 illustrates that when using only 10 percent of the base survey (464 observations) to construct the models, most of the estimates for different target survey samples (indicated by the dotted blue line) fall outside the 95 percent CI of the true poverty rate (represented by the light gray zone). These estimates also display more variation, especially for smaller samples of the target survey. Increasing the sample size of the base survey from 10 percent to 20 percent (930 observations) (the dotted light green line) results in a dramatic improvement in the performance of the models, with almost all the estimates falling inside the 95 percent CI of the true poverty rate. Further increasing the base survey to 30 percent (1,393 observations) (the dotted orange line) slightly improves the imputation accuracy. Yet, estimates using larger sample sizes of the base survey (not shown) result in only marginal improvement, as seen with the 100 percent sample of the base survey (the dotted black line). [21]

In general, the predictive performance of the models stabilizes at the minimum sample size of around 1,000 observations for the target survey (close to 50 percent of the sample) and a similar minimum sample size of around 1,000 observations for the base survey (slightly over 20 percent of the sample), irrespective of the model being used. Given these samples, almost all of the estimates and their 95 percent CIs fall inside the 95 percent CI around the true rate. The finding that the minimum sample size for the target survey is around 1,000 observations is consistent with previous evidence (Dang and Verme, 2023) and corroborates the findings in Section 4.

[21] For example, a further increase in the size of the base survey from 20 percent to 60 percent (2,786 observations) does not appear to significantly improve the accuracy of the poverty estimates in all models, although there is a marginal improvement in Model 9 by bringing estimates closer to one standard error of the true poverty rate (the dark gray zone). Moreover, there is only a marginal difference between the behavior of Models 1 and 2 and that of Models 8 and 9, as Models 1 and 2 display less fluctuation starting from 844 observations (40 percent of the target survey), while Models 8 and 9 become more stable starting from 1,055 observations (50 percent of the target survey).

6. Conclusion

We offer fresh randomization evidence on the effects of survey design on poverty imputation, on which very few studies currently exist. We implemented a randomized experiment in Tanzania with three treatment arms: (TA 1) a standard, full consumption questionnaire; (TA 2) a light questionnaire variant that includes a reduced list of food items; and (TA 3) an alternative light questionnaire variant that includes more aggregated food and non-food consumption modules, which we combine with rich household consumption survey data to provide imputation-based poverty estimates.

We find that we can obtain reasonably accurate results for quite a few imputation models when imputing either from the TZNPSs into the full consumption questionnaire (TA 1) or other lighter questionnaires (TAs 2 and 3), or from TA 1 into the other lighter questionnaires (TAs 2 and 3). Interestingly, imputing into the light questionnaire modules (TAs 2 and 3) using the less-than-ideal consumption data, with the variables in TA 2 and TA 3 standardized by those in the base survey, helps significantly improve accuracy.
Our estimation results are robust to the choice of base surveys used for imputation model estimation; different poverty lines; and alternative (quarterly or monthly) CPI deflators. Further, our approach to imputation performs better than competing methods, including multiple imputation and a range of machine learning techniques. There is some meta-analysis evidence pointing to imputation accuracy gains when we employ the poverty line that is most consistent with the base survey, a more recent base survey, or more disaggregated CPI deflators than the annual data. We also find that the predictive performance of the imputation models stabilizes at the minimum sample size of around 1,000 observations for both the base survey and the target survey.

If similar evidence is obtained in other contexts, this can suggest a promising cost-saving approach with two options for survey design. The first option is to impute from an existing base survey (if such a base survey exists) into a new target, non-consumption survey. The second option is to field a survey consisting of a smaller sample (preferably with at least around 1,000 households) with a full consumption questionnaire and a separate sample (of an equal or larger size) with partial or no consumption data collection. We can subsequently build an imputation model using the smaller sample with full consumption data to impute into the larger sample with partial or no consumption data. The tradeoff between the two options is that the first option incurs no cost of collecting new consumption data. Consequently, the first option is less expensive but may result in somewhat reduced imputation accuracy as discussed earlier, while the opposite holds for the second option. Yet, both options are still less expensive than the traditional approach of implementing a full household consumption survey to obtain updated poverty estimates.

References

Abate, G. T., De Brauw, A., Hirvonen, K., & Wolle, A. (2023).
Measuring consumption over the phone: Evidence from a survey experiment in urban Ethiopia. Journal of Development Economics, 161, 103026.
Aiken, E. L., Bedoya, G., Blumenstock, J. E., & Coville, A. (2023). Program targeting with machine learning and mobile phone data: Evidence from an anti-poverty intervention in Afghanistan. Journal of Development Economics, 161, 103016.
Altındağ, O., O'Connell, S. D., Şaşmaz, A., Balcıoğlu, Z., Cadoni, P., Jerneck, M., & Foong, A. K. (2021). Targeting humanitarian aid using administrative data: Model design and validation. Journal of Development Economics, 148, 102564.
Arthi, V., Beegle, K., De Weerdt, J., & Palacios-López, A. (2018). Not your average job: Measuring farm labor in Tanzania. Journal of Development Economics, 130, 160-172.
Athey, S., & Imbens, G. W. (2019). Machine learning methods that economists should know about. Annual Review of Economics, 11, 685-725.
Beegle, K., De Weerdt, J., Friedman, J., & Gibson, J. (2012). Methods of household consumption measurement through surveys: Experimental results from Tanzania. Journal of Development Economics, 98(1), 3-18.
Beegle, K., Christiaensen, L., Dabalen, A., & Gaddis, I. (2016). Poverty in a rising Africa. World Bank Publications.
Carpenter, J. R., Bartlett, J. W., Morris, T. P., Wood, A. M., Quartagno, M., & Kenward, M. G. (2023). Multiple imputation and its application. John Wiley & Sons.
Christiaensen, L., Ligon, E., & Sohnesen, T. P. (2022). Consumption subaggregates should not be used to measure poverty. The World Bank Economic Review, 36(2), 413-432.
De Weerdt, J., Gibson, J., & Beegle, K. (2020). What can we learn from experimenting with survey methods? Annual Review of Resource Economics, 12, 431-447.
Dang, H. A. H., & Lanjouw, P. F. (2023). Regression-based imputation for poverty measurement in data-scarce settings. In Silber, J. (Ed.), Research handbook on Measuring Poverty and Deprivation (pp. 141-150). Edward Elgar Publishing.
Dang, H. A. H., & Verme, P.
(2023). Estimating poverty for refugees in data-scarce contexts: an application of cross-survey imputation. Journal of Population Economics, 36(2), 653-679.
Dang, H. A. H., Lanjouw, P. F., & Serajuddin, U. (2017). Updating poverty estimates in the absence of regular and comparable consumption data: methods and illustration with reference to a middle-income country. Oxford Economic Papers, 69(4), 939-962.
Dang, H. A., Kilic, T., Abanokova, K., & Carletto, C. (forthcoming). Poverty imputation in contexts without consumption data: a revisit with further refinements. Review of Income and Wealth.
Deaton, A., & Muellbauer, J. (1980). Economics and consumer behavior. Cambridge University Press.
Douidich, M., Ezzrari, A., Van der Weide, R., & Verme, P. (2016). Estimating quarterly poverty rates using labor force surveys: a primer. The World Bank Economic Review, 30(3), 475-500.
Eckman, S., Kreuter, F., Kirchner, A., Jäckle, A., Tourangeau, R., & Presser, S. (2014). Assessing the mechanisms of misreporting to filter questions in surveys. Public Opinion Quarterly, 78(3), 721-733.
Elbers, C., Lanjouw, J. O., & Lanjouw, P. (2003). Micro-level estimation of poverty and inequality. Econometrica, 71(1), 355-364.
Gourlay, S., Kilic, T., & Lobell, D. B. (2019). A new spin on an old debate: Errors in farmer-reported production and their implications for inverse scale-productivity relationship in Uganda. Journal of Development Economics, 141, 102376.
Ha, J., Kose, M. A., & Ohnsorge, F. (2023). One-stop source: A global database of inflation. Journal of International Money and Finance, 102896.
Jenkins, S. P., Burkhauser, R. V., Feng, S., & Larrimore, J. (2011). Measuring inequality using censored data: a multiple-imputation approach to estimation and inference. Journal of the Royal Statistical Society Series A: Statistics in Society, 174(1), 63-81.
Kilic, T., & Sohnesen, T. P. (2019).
Same question but different answer: experimental evidence on questionnaire design's impact on poverty measured by proxies. Review of Income and Wealth, 65(1), 144-165.
Kilic, T., Moylan, H., Ilukor, J., Mtengula, C., & Pangapanga-Phiri, I. (2021). Root for the tubers: Extended-harvest crop production and productivity measurement in surveys. Food Policy, 102, 102033.
Kreuter, F., McCulloch, S., Presser, S., & Tourangeau, R. (2011). The effects of asking filter questions in interleafed versus grouped format. Sociological Methods & Research, 40(1), 88-104.
Little, R. J., & Rubin, D. B. (2019). Statistical analysis with missing data. 3rd Edition. John Wiley & Sons.
Mathiassen, A. (2009). A model based approach for predicting annual poverty rates without expenditure data. The Journal of Economic Inequality, 7, 117-135.
Mullainathan, S., & Spiess, J. (2017). Machine learning: an applied econometric approach. Journal of Economic Perspectives, 31(2), 87-106.
Ravallion, M. (2016). The economics of poverty: History, measurement, and policy. New York: Oxford University Press.
Smythe, I. S., & Blumenstock, J. E. (2022). Geographic microtargeting of social assistance with high-resolution poverty maps. Proceedings of the National Academy of Sciences, 119(32), e2120025119.
Stifel, D., & Christiaensen, L. (2007). Tracking poverty over time in the absence of comparable consumption data. The World Bank Economic Review, 21(2), 317-341.
Tanzania’s National Bureau of Statistics. (2011). Basic Information Document: National Panel Survey 2010-11.
Tarozzi, A. (2007). Calculating comparable statistics from incomparable surveys, with an application to poverty in India. Journal of Business & Economic Statistics, 25(3), 314-336.
United Nations. (2018). Classification of individual consumption according to purpose (COICOP). Statistical Papers. Department of Economic and Social Affairs. Statistics Division, 265.
UNESCO-UIS/OECD/EUROSTAT (UOE). (2020).
Data collection on formal education: Manual on concepts, definitions and classifications. Montreal/Paris/Luxembourg.
United States Census Bureau. (2017). Current Population Survey, Imputation of Unreported Data Items. Accessed on the Internet on May 24, 2021 at https://www.census.gov/programs-surveys/cps/technical-documentation/methodology/imputation-of-unreported-data-items.html
World Bank. (2021). World development report 2021: Data for better lives. World Bank.
Yoshida, N., Takamatsu, S., Yoshimura, K., Aron, D. V., Chen, X., Malgioglio, S., ... & Zhang, K. (2022). The Concept and Empirical Evidence of SWIFT Methodology. World Bank.

Table 1. Descriptive statistics

Difference Variables NPS 2019/20 NPS 2020/21 TA 1 TA 2 TA 3 TA 2-TA 1 TA 3-TA 1 TA 3-TA 2 1,293,438.69 1,550,865.76 1,367,296.94 Total household expenditures (78,620.14) (64,838.25) (61,797.40) Total food and non-alcoholic 827,883.46 863,962.65 838,428.94 652,206.43 578,765.01 -186222.5*** -259663.9*** -73,441.4*** expenditures (40,583.02) (14,057.96) (29,529.69) (20,823.03) (21,470.64) (26373.9) (25870.8) (20,021.8) 443,477.52 686,575.42 527,176.40 284,569.30 -242607.1*** Total non-food expenditures (38,953.77) (56,821.82) (37,769.48) (21,614.27) (31467.2) 47,791.26 105,111.39 101,997.02 50,953.59 -51043.4*** Health expenditures (4,645.77) (15,402.42) (11,856.50) (5,144.21) (11955.5) 65,397.27 81,120.01 52,454.27 39,537.99 -12916.3* Education expenditures (9,591.65) (6,315.51) (6,105.98) (6,552.07) (7600.1) Utilities: Water, Kerosene, El., 55,163.38 56,605.99 47,764.41 50,880.13 44,317.04 3,115.7 -3,447.4 -6,563.1* Matches, Bulbs, Charcoal (5,700.76) (2,158.06) (5,067.41) (6,014.63) (4,426.19) (4,274.0) (3,021.5) (3,774.2) Furnishing and household 21,781.89 33,105.06 18,974.28 15,264.53 -3709.8*** expenses (2,693.92) (3,556.79) (1,134.98) (1,118.15) (1348.0) 7.17 6.38 6.25 6.77 6.71 0.5 0.5 -0.1 Household size (0.52) (0.10) (0.17) (0.49) (0.30) (0.5) (0.3) (0.5) 47.59 45.42 45.47 45.09 45.60
-0.4 0.1 0.5 Age of HH Head (0.86) (0.26) (0.58) (0.62) (0.73) (0.8) (0.8) (0.9) 0.19 0.23 0.23 0.21 0.23 -0.0 -0.0 0.0 HH Head is Female (0.02) (0.01) (0.02) (0.02) (0.02) (0.0) (0.0) (0.0) Head does not have formal 0.16 0.21 0.19 0.19 0.20 -0.0 0.0 0.0 education (0.02) (0.01) (0.02) (0.02) (0.03) (0.0) (0.0) (0.0) 0.68 0.63 0.64 0.65 0.63 0.0 -0.0 -0.0 Head has primary education (0.03) (0.01) (0.02) (0.02) (0.03) (0.0) (0.0) (0.0) Head has secondary ordinary 0.11 0.12 0.13 0.12 0.13 -0.0 0.0 0.0 education (0.01) (0.01) (0.02) (0.02) (0.02) (0.0) (0.0) (0.0) Head has secondary advanced 0.03 0.04 0.04 0.03 0.04 -0.0 -0.0 0.0 education and higher (0.01) (0.00) (0.01) (0.01) (0.01) (0.0) (0.0) (0.0) 0.43 0.44 0.44 0.43 0.43 -0.0 -0.0 0.0 Share of HH members in 0-14 (0.01) (0.00) (0.01) (0.01) (0.01) (0.0) (0.0) (0.0) 0.18 0.19 0.19 0.20 0.20 0.0 0.0 -0.0 Share of HH members in 15-24 (0.01) (0.00) (0.01) (0.01) (0.01) (0.0) (0.0) (0.0) 0.33 0.33 0.33 0.32 0.33 -0.0 -0.0 0.0 Share of HH members in 25-59 (0.01) (0.00) (0.01) (0.01) (0.01) (0.0) (0.0) (0.0) Share of HH members in 60 and 0.06 0.04 0.04 0.04 0.04 0.0 -0.0 -0.0 older (0.00) (0.00) (0.00) (0.00) (0.00) (0.0) (0.0) (0.0) HH Head did any wage work 0.26 0.27 0.21 0.27 0.26 0.1** 0.0 -0.0 during the last 7 days (0.03) (0.01) (0.02) (0.03) (0.02) (0.0) (0.0) (0.0) HH Head was self-employed 0.20 0.22 0.21 0.23 0.20 0.0 -0.0 -0.0 (non-farm) during the last 7 days (0.02) (0.01) (0.02) (0.02) (0.02) (0.0) (0.0) (0.0) Household owns a motor vehicle 0.17 0.13 0.16 0.17 0.16 0.0 -0.0 -0.0 31 (0.02) (0.01) (0.02) (0.02) (0.02) (0.0) (0.0) (0.0) 0.33 0.37 0.31 0.37 0.34 0.1** 0.0 -0.0 Household owns a bicycle (0.03) (0.01) (0.02) (0.03) (0.03) (0.0) (0.0) (0.0) 0.89 0.88 0.87 0.90 0.89 0.0 0.0 -0.0 Household owns a mobile phone (0.01) (0.01) (0.02) (0.02) (0.02) (0.0) (0.0) (0.0) 0.18 0.14 0.12 0.13 0.12 0.0 -0.0 -0.0 Household owns a video/DVD (0.02) (0.01) (0.02) (0.02) (0.02) (0.0) (0.0) (0.0) 0.25 0.28 0.28 0.28 0.27 
0.0 -0.0 -0.0 Household owns a television (0.02) (0.01) (0.03) (0.03) (0.03) (0.0) (0.0) (0.0) 0.02 0.03 0.03 0.03 0.04 -0.0 0.0 0.0 Household owns a computer (0.01) (0.00) (0.01) (0.01) (0.01) (0.0) (0.0) (0.0) Household owns a 0.08 0.09 0.08 0.10 0.09 0.0 0.0 -0.0 refrigerator/freezer (0.01) (0.01) (0.02) (0.02) (0.01) (0.0) (0.0) (0.0) 0.07 0.07 0.07 0.07 0.07 -0.0 0.0 0.0 Household owns an AC/fan (0.01) (0.01) (0.01) (0.01) (0.01) (0.0) (0.0) (0.0) 0.49 0.44 0.42 0.46 0.39 0.0 -0.0 -0.1** Household owns a radio (0.03) (0.01) (0.02) (0.03) (0.03) (0.0) (0.0) (0.0) 0.85 0.85 0.83 0.86 0.86 0.0 0.0 -0.0 Household owns a mosquito net (0.02) (0.01) (0.02) (0.02) (0.02) (0.0) (0.0) (0.0) 0.44 0.45 0.46 0.48 0.48 0.0* 0.0 -0.0 Log of per capita residential area (0.01) (0.01) (0.01) (0.01) (0.01) (0.0) (0.0) (0.0) Roof is made of concrete/metal 0.85 0.84 0.83 0.85 0.85 0.0 0.0 -0.0 sheets/tiles (0.02) (0.01) (0.02) (0.03) (0.02) (0.0) (0.0) (0.0) Wall is made of burnt 0.62 0.55 0.60 0.56 0.56 -0.0 -0.0 -0.0 bricks/concrete (0.03) (0.02) (0.03) (0.04) (0.03) (0.0) (0.0) (0.0) Floor is made of 0.51 0.52 0.48 0.47 0.50 -0.0 0.0 0.0 concrete/cement/tiles/timber (0.03) (0.01) (0.03) (0.03) (0.03) (0.0) (0.0) (0.0) Source of drinking water: piped 0.41 0.44 0.40 0.38 0.34 -0.0 -0.1** -0.0 water/truck (0.04) (0.02) (0.04) (0.04) (0.04) (0.0) (0.0) (0.0) 0.33 0.35 0.36 0.35 0.34 -0.0 -0.0 -0.0 Toilet: flush/VIP (0.03) (0.01) (0.03) (0.03) (0.03) (0.0) (0.0) (0.0) 0.25 0.28 0.25 0.24 0.27 -0.0 0.0* 0.0** Residence: Urban (0.02) (0.01) (0.04) (0.04) (0.04) (0.0) (0.0) (0.0) 0.75 0.72 0.75 0.76 0.73 0.0 -0.0* -0.0** Residence: Rural (0.02) (0.01) (0.04) (0.04) (0.04) (0.0) (0.0) (0.0) Number of observations 1179 4644 711 701 698 1412 1409 1399 Note: The standard errors are in parentheses, and the differences are estimated considering the complex survey design. * p < 0.1, ** p < 0.05, *** p < 0.01. The population weights are applied. 
The consumption data are expressed on a monthly basis and are spatially and temporally deflated within each round. The consumption data in NPS 2019/20 and in NPS 2020/21 are deflated to 2022 prices using the annual WB deflator.

Table 2. Predicted Poverty Rates Based on Imputation, from NPS 2020/21 to TA 1 using 2020/21 Poverty Line, Tanzania (percentage)

| Method (target: TA 1) | Model 1 | Model 2 | Model 3 | Model 4 | Model 5 | Model 6 | Model 7 | Model 8 | Model 9 |
|---|---|---|---|---|---|---|---|---|---|
| 1) Normal linear regression model | 20.7a (2.1) | 22.1a (2.2) | 21.0a (2.2) | 22.5a (2.2) | 20.9a (2.2) | 20.5a (2.2) | 22.0a (2.2) | 23.1a (2.3) | 23.0a (2.2) |
| 2) Empirical distribution of the error terms | 20.4a (2.1) | 22.0a (2.2) | 21.1a (2.2) | 22.5a (2.2) | 20.7a (2.2) | 20.4a (2.2) | 21.9a (2.2) | 23.1a (2.3) | 22.9a (2.2) |
| Control variables | | | | | | | | | |
| Food expenditures | | | Y | | | | | | |
| Non-food expenditures | | | | Y | | | | | |
| Furnishings and household expenses | | | | | Y | | | | |
| Health expenditures | | | | | | Y | | | |
| Education expenditures | | | | | | | Y | | |
| Utilities: water, kerosene, lighting | | | | | | | | Y | Y |
| Household assets & house characteristics | | Y | Y | Y | Y | Y | Y | Y | |
| Demographics & employment | Y | Y | Y | Y | Y | Y | Y | Y | Y |
| R² | 0.40 | 0.54 | 0.85 | 0.79 | 0.56 | 0.55 | 0.54 | 0.55 | 0.44 |
| N1 (base survey) | 4,644 | 4,644 | 4,644 | 4,644 | 4,644 | 4,644 | 4,644 | 4,644 | 4,644 |
| N2 (target survey) | 711 | 711 | 711 | 711 | 711 | 711 | 711 | 711 | 711 |

True poverty rate in TA 1: 21.0 (2.2)

Note: The standard errors in parentheses are adjusted for complex survey design. All estimates are obtained with the population weights. Method 1 uses the normal linear regression model with the theoretical distribution of the error terms and Method 2 uses the empirical distribution of the error terms. Both methods employ cluster random effects. The imputed poverty rates for TA 1 use the estimated parameters based on the NPS 2020/21 data. 1000 simulations are implemented. The true poverty rate for TA 1 is the estimate directly obtained using the consumption data in TA 1 and the poverty line in NPS 2020/21 deflated to 2022 prices using the annual WB deflator.
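The two imputation methods described in the note can be illustrated with a minimal Python sketch. It is not the authors' code: the function name, its arguments, and the decision to hold the regression coefficients fixed across simulations are assumptions made for illustration only. Each simulation adds a cluster random effect and an idiosyncratic error to the linear prediction of log consumption, then computes the weighted share of households below the (log) poverty line; the error is drawn either from a normal distribution (Method 1) or by resampling base-survey residuals (Method 2).

```python
import random


def impute_poverty_rate(xb, clusters, weights, log_z, sigma_u, sigma_e,
                        residuals=None, n_sim=1000, seed=0):
    """Average simulated poverty rate (as a share) in a target survey.

    xb        : linear predictions x'beta of log consumption, one per household
    clusters  : cluster id of each household
    weights   : population weight of each household
    log_z     : log of the poverty line
    sigma_u   : std. dev. of the cluster random effect (from the base survey)
    sigma_e   : std. dev. of the idiosyncratic error (from the base survey)
    residuals : optional list of base-survey residuals; if given, errors are
                resampled from it instead of drawn from a normal distribution
    All names are hypothetical; coefficients are held fixed (an assumption).
    """
    rng = random.Random(seed)
    total_weight = sum(weights)
    cluster_ids = sorted(set(clusters))
    rates = []
    for _ in range(n_sim):
        # One random-effect draw per cluster in each simulation.
        u = {c: rng.gauss(0.0, sigma_u) for c in cluster_ids}
        poor_weight = 0.0
        for x, c, w in zip(xb, clusters, weights):
            if residuals is None:
                e = rng.gauss(0.0, sigma_e)   # theoretical (normal) errors
            else:
                e = rng.choice(residuals)     # empirical errors
            if x + u[c] + e < log_z:
                poor_weight += w
        rates.append(poor_weight / total_weight)
    return sum(rates) / len(rates)
```

Calling the function with `residuals=None` corresponds to the normal-error variant, while passing a list of residuals switches to the empirical variant; multiplying the result by 100 gives a percentage comparable to the table entries.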
The estimates in bold or marked with an “a” fall within the 95% CI or within one standard error of the true poverty rate, respectively. The underlying regression results are provided in Appendix A, Table A.6.

Table 3. Predicted Poverty Rates Based on Imputation, from NPS 2020/21 to TA 2 using 2020/21 Poverty Line, Tanzania (percentage)

| Method (target: TA 2) | Model 1 | Model 2 | Model 3.1 | Model 3.2 | Model 8 | Model 9 |
|---|---|---|---|---|---|---|
| 1) Normal linear regression model | 21.9a (2.7) | 21.9a (2.8) | 28.4 (2.7) | 20.6a (2.3) | 22.5a (2.9) | 23.4 (2.9) |
| 2) Empirical distribution of the error terms | 21.7a (2.7) | 21.9a (2.8) | 28.7 (2.8) | 20.2a (2.3) | 22.4a (2.9) | 23.3 (2.9) |
| Control variables | | | | | | |
| Food expenditures | | | Y | Y | | |
| Utilities: water, kerosene, lighting | | | | | Y | Y |
| Household assets & house characteristics | | Y | Y | Y | Y | |
| Demographics & employment | Y | Y | Y | Y | Y | Y |
| R² | 0.40 | 0.54 | 0.85 | 0.86 | 0.55 | 0.44 |
| N1 (base survey) | 4,644 | 4,644 | 4,644 | 4,644 | 4,644 | 4,644 |
| N2 (target survey) | 701 | 701 | 701 | 701 | 701 | 701 |

True poverty rate in TA 1: 21.0 (2.2)

Note: The standard errors in parentheses are adjusted for complex survey design. All estimates are obtained with the population weights. Method 1 uses the normal linear regression model with the theoretical distribution of the error terms and Method 2 uses the empirical distribution of the error terms. Both methods employ cluster random effects. Model 3.1 employs non-standardized variables. Model 3.2 employs variables in TA 2 standardized by those in NPS 2020/21. Food expenditures, household size, and age in both surveys are transformed to normality using the Box-Cox method before standardization. Imputed poverty rates for TA 2 use the estimated parameters based on the NPS 2020/21 data. 1000 simulations are implemented. The true poverty rate for TA 1 is the estimate directly obtained using the consumption data in TA 1 and the poverty line in NPS 2020/21 deflated to 2022 prices using the annual WB deflator. The estimates shown in bold or marked with an “a” fall within the 95% CI or within one standard error of the true poverty rate, respectively.
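The accuracy criterion in the note can be written as a small check. The sketch below is illustrative only: the function name is hypothetical, and it assumes a normal-approximation 95% confidence interval (true rate plus or minus 1.96 standard errors), which may differ slightly from the design-based interval used for the published tables.

```python
def classify_estimate(estimate, true_rate, true_se, z=1.96):
    """Place an imputed poverty rate relative to the true rate.

    Assumes a normal-approximation 95% CI (true_rate +/- z * true_se);
    the function name and the value of z are illustrative choices.
    """
    gap = abs(estimate - true_rate)
    if gap <= true_se:
        return "within one SE"
    if gap <= z * true_se:
        return "within 95% CI"
    return "outside 95% CI"


# Example with the true TA 1 poverty rate of 21.0 (standard error 2.2).
for value in (21.9, 23.4, 28.4):
    print(value, classify_estimate(value, 21.0, 2.2))
```

An estimate within one standard error is necessarily also inside the 95% interval, which is why the stricter label is checked first.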
The underlying regression results are provided in Appendix A, Table A.6.

Table 4. Predicted Poverty Rates Based on Imputation, from NPS 2020/21 to TA 3 using 2020/21 Poverty Line, Tanzania (percentage)

| Method (target: TA 3) | Model 1 | Model 2 | Model 3.1 | Model 3.2 | Model 4.1 | Model 4.2 | Model 8 | Model 9 |
|---|---|---|---|---|---|---|---|---|
| 1) Normal linear regression model | 21.7a (2.5) | 22.6a (2.6) | 38.5 (2.6) | 20.5a (2.1) | 48.6 (2.9) | 22.0a (2.5) | 23.4 (2.6) | 23.7 (2.6) |
| 2) Empirical distribution of the error terms | 21.6a (2.5) | 22.6a (2.6) | 39.7 (2.6) | 19.8a (2.1) | 48.9 (2.9) | 22.0a (2.5) | 23.5 (2.6) | 23.8 (2.6) |
| Control variables | | | | | | | | |
| Food expenditures | | | Y | Y | | | | |
| Non-food expenditures | | | | | Y | Y | | |
| Utilities: water, kerosene, lighting | | | | | | | Y | Y |
| Household assets & house characteristics | | Y | Y | Y | Y | Y | Y | |
| Demographics & employment | Y | Y | Y | Y | Y | Y | Y | Y |
| R² | 0.40 | 0.54 | 0.85 | 0.86 | 0.79 | 0.71 | 0.55 | 0.44 |
| N1 (base survey) | 4,644 | 4,644 | 4,644 | 4,644 | 4,644 | 4,644 | 4,644 | 4,644 |
| N2 (target survey) | 698 | 698 | 698 | 698 | 698 | 698 | 698 | 698 |

True poverty rate in TA 1: 21.0 (2.2)

Note: The standard errors in parentheses are adjusted for complex survey design. All estimates are obtained with the population weights. Method 1 uses the normal linear regression model with the theoretical distribution of the error terms and Method 2 uses the empirical distribution of the error terms. Both methods employ cluster random effects. Models 3.1 and 4.1 employ non-standardized variables. Models 3.2 and 4.2 employ variables in TA 3 standardized by those in NPS 2020/21. Food and non-food expenditures, household size, and age in both surveys are transformed to normality using the Box-Cox method before standardization. Imputed poverty rates for TA 3 use the estimated parameters based on the NPS 2020/21 data. 1000 simulations are implemented. The true poverty rate for TA 1 is the estimate directly obtained using the consumption data in TA 1 and the poverty line in NPS 2020/21 deflated to 2022 prices using the annual WB deflator.
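The standardization step mentioned in the note can be sketched as follows. This is a hedged illustration, not the authors' procedure: it assumes that "standardized by" means rescaling the Box-Cox-transformed target-survey variable so that its mean and standard deviation match those of the base survey, it takes the Box-Cox parameter `lam` as given rather than estimating it, and it ignores survey weights. The function names and the example values are hypothetical.

```python
import math
import statistics


def box_cox(values, lam):
    """Box-Cox transform of strictly positive values for a given lambda."""
    if lam == 0:
        return [math.log(v) for v in values]
    return [(v ** lam - 1.0) / lam for v in values]


def standardize_to_base(target, base):
    """Rescale target values to share the mean and std. dev. of base values.

    Unweighted moments are used for brevity (an assumption of this sketch).
    """
    m_t, s_t = statistics.fmean(target), statistics.pstdev(target)
    m_b, s_b = statistics.fmean(base), statistics.pstdev(base)
    return [m_b + s_b * (v - m_t) / s_t for v in target]


# Hypothetical food expenditures; the shortened module records lower values.
base_food = [120.0, 90.0, 300.0, 150.0, 210.0]
target_food = [80.0, 60.0, 190.0, 100.0, 140.0]
lam = 0.25  # assumed Box-Cox parameter

adjusted = standardize_to_base(box_cox(target_food, lam),
                               box_cox(base_food, lam))
```

After this step the adjusted target variable has the same first two moments as the transformed base-survey variable, which is the property that the comparison tables after standardization report as near-zero differences in means.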
The estimates shown in bold or marked with an “a” fall within the 95% CI or within one standard error of the true poverty rate, respectively. The underlying regression results are provided in Appendix A, Table A.6.

Table 5. Predicted Poverty Rates Based on Imputation, from TA 1 to TA 2 using 2020/21 Poverty Line, Tanzania (percentage)

| Method (target: TA 2) | Model 1 | Model 2 | Model 3.1 | Model 3.2 | Model 8 | Model 9 |
|---|---|---|---|---|---|---|
| 1) Normal linear regression model | 21.0a (2.6) | 20.6a (2.7) | 29.7 (2.7) | 20.0a (2.1) | 20.0a (2.7) | 20.4a (2.8) |
| 2) Empirical distribution of the error terms | 21.1a (2.6) | 20.4a (2.7) | 30.1 (2.7) | 20.0a (2.1) | 19.8a (2.7) | 20.6a (2.8) |
| Control variables | | | | | | |
| Food expenditures | | | Y | Y | | |
| Utilities: water, kerosene, lighting | | | | | Y | Y |
| Household assets & house characteristics | | Y | Y | Y | Y | |
| Demographics & employment | Y | Y | Y | Y | Y | Y |
| R² | 0.34 | 0.51 | 0.89 | 0.89 | 0.55 | 0.41 |
| N1 (base survey) | 711 | 711 | 711 | 711 | 711 | 711 |
| N2 (target survey) | 701 | 701 | 701 | 701 | 701 | 701 |

True poverty rate in TA 1: 21.0 (2.2)

Note: The standard errors in parentheses are adjusted for complex survey design. All estimates are obtained with the population weights. Method 1 uses the normal linear regression model with the theoretical distribution of the error terms and Method 2 uses the empirical distribution of the error terms. Both methods employ cluster random effects. Model 3.1 employs non-standardized variables. Model 3.2 employs variables in TA 2 standardized by those in TA 1. Food expenditures, household size, and age in both surveys are transformed to normality using the Box-Cox method before standardization. Imputed poverty rates for TA 2 use the estimated parameters based on TA 1. 1000 simulations are implemented. The true poverty rate for TA 1 is the estimate directly obtained using the consumption data in TA 1 and the poverty line in NPS 2020/21 deflated to 2022 prices with the annual WB deflator. The estimates shown in bold or marked with an “a” fall within the 95% CI or within one standard error of the true poverty rate, respectively.
The underlying regression results are provided in Appendix A, Table A.7.

Table 6. Predicted Poverty Rates Based on Imputation, from TA 1 to TA 3 using 2020/21 Poverty Line, Tanzania (percentage)

| Method (target: TA 3) | Model 1 | Model 2 | Model 3.1 | Model 3.2 | Model 4.1 | Model 4.2 | Model 8 | Model 9 |
|---|---|---|---|---|---|---|---|---|
| 1) Normal linear regression model | 20.6a (2.4) | 20.9a (2.4) | 40.9 (2.5) | 19.5a (2.1) | 45.7 (2.9) | 20.8a (2.4) | 20.8a (2.4) | 20.6a (2.5) |
| 2) Empirical distribution of the error terms | 20.6a (2.4) | 20.7a (2.4) | 41.6 (2.5) | 19.0a (2.1) | 45.8 (2.9) | 20.5a (2.4) | 20.4a (2.4) | 20.7a (2.4) |
| Control variables | | | | | | | | |
| Food expenditures | | | Y | Y | | | | |
| Non-food expenditures | | | | | Y | Y | | |
| Utilities: water, kerosene, lighting | | | | | | | Y | Y |
| Household assets & house characteristics | | Y | Y | Y | Y | Y | Y | |
| Demographics & employment | Y | Y | Y | Y | Y | Y | Y | Y |
| R² | 0.34 | 0.51 | 0.89 | 0.89 | 0.76 | 0.73 | 0.55 | 0.41 |
| N1 (base survey) | 711 | 711 | 711 | 711 | 711 | 711 | 711 | 711 |
| N2 (target survey) | 698 | 698 | 698 | 698 | 698 | 695 | 698 | 698 |

True poverty rate in TA 1: 21.0 (2.2)

Note: The standard errors in parentheses are adjusted for complex survey design. All estimates are obtained with the population weights. Method 1 uses the normal linear regression model with the theoretical distribution of the error terms and Method 2 uses the empirical distribution of the error terms. Both methods employ cluster random effects. Models 3.1 and 4.1 employ non-standardized variables. Models 3.2 and 4.2 employ variables in TA 3 standardized by those in TA 1. Food and non-food expenditures, household size, and age in both surveys are transformed to normality using the Box-Cox method before standardization. Imputed poverty rates for TA 3 use the estimated parameters based on TA 1. 1000 simulations are implemented. The true poverty rate for TA 1 is the estimate directly obtained using the consumption data in TA 1 and the poverty line in NPS 2020/21 deflated to 2022 prices with the annual WB deflator. The estimates shown in bold or marked with an “a” fall within the 95% CI or within one standard error of the true poverty rate, respectively.
The underlying regression results are provided in Appendix A, Table A.7.

Figure 1: Flow Chart for Imputation Schemes

Figure 2. Predicted Poverty Rates Based on Imputation, from NPS 2019/20 and NPS 2020/21 to TAs, Normal Linear Regression Model, Tanzania

[Figure: three panels (TA 1, TA 2, TA 3). x-axis: imputation model (m1 to m9 for TA 1; m1, m2, m3, m8, m9 for TA 2; m1, m2, m3, m4, m8, m9 for TA 3). y-axis: estimated poverty rate (%). Series: NPS 2019/20 and NPS 2020/21.]

Note: 1000 simulations are implemented. Model 3 in TA 2 and Models 3 and 4 in TA 3 use standardized variables. The dashed lines represent the true poverty rates for TA 1 using the 2019/20 and 2020/21 poverty lines. The dotted lines represent confidence intervals of the true poverty rates.

Figure 3. Predicted Poverty Rates in TA 1 Based on Imputation Using Different Price Adjustments, from NPS 2020/21 to TA 1, Tanzania

[Figure: four panels (Panel 1: IMF; Panel 2: WB annually; Panel 3: WB quarterly; Panel 4: WB monthly). x-axis: imputation model (m1 to m9). y-axis: estimated poverty rate (%).]

Note: 1000 simulations are implemented. The dashed lines represent the true poverty rates in TA 1 using the 2020/21 poverty lines. The light grey areas represent confidence intervals of the true poverty rates in TA 1 using the 2020/21 poverty lines. The dark grey areas represent standard errors of the true poverty rates in TA 1 using the 2020/21 poverty lines.

Figure 4. Predicted Poverty Rates in TA 2 Based on Imputation Using Different Price Adjustments, from NPS 2020/21 to TA 2, Tanzania

[Figure: four panels (Panel 1: IMF; Panel 2: WB annually; Panel 3: WB quarterly; Panel 4: WB monthly). x-axis: imputation model (m1, m2, m3, m8, m9). y-axis: estimated poverty rate (%).]

Note: 1000 simulations are implemented. Model 3 uses standardized variables. The dashed lines represent the true poverty rates in TA 1 using the 2020/21 poverty lines. The light grey areas represent confidence intervals of the true poverty rates in TA 1 using the 2020/21 poverty lines. The dark grey areas represent standard errors of the true poverty rates in TA 1 using the 2020/21 poverty lines.

Figure 5. Predicted Poverty Rates in TA 3 Based on Imputation Using Different Price Adjustments, from NPS 2020/21 to TA 3, Tanzania

[Figure: four panels (Panel 1: IMF; Panel 2: WB annually; Panel 3: WB quarterly; Panel 4: WB monthly). x-axis: imputation model (m1, m2, m3, m4, m8, m9). y-axis: estimated poverty rate (%).]

Note: 1000 simulations are implemented. Models 3 and 4 use standardized variables. The dashed lines represent the true poverty rates in TA 1 using the 2020/21 poverty lines. The light grey areas represent confidence intervals of the true poverty rates in TA 1 using the 2020/21 poverty lines. The dark grey areas represent standard errors of the true poverty rates in TA 1 using the 2020/21 poverty lines.

Figure 6. Predicted Poverty Rates Based on Imputation, from NPS 2020/21 to Combined Target Survey Sample (TA 1+TA 2+TA 3) for Different Sample Sizes, using 2020/21 Poverty Line, Normal Linear Regression Model, Tanzania

[Figure: four panels (Model 1, Model 2, Model 8, Model 9). x-axis: % of target survey, from 10 (211) to 100 (2110) in steps of 10, with the sample size in parentheses. Legend (share of base survey, with the sample size in parentheses): 10% (464), 20% (930), 30% (1393), 100% (4644).]

Note: The normal linear regression model with the theoretical distribution of the error terms employs cluster random effects. Imputed poverty rates for TA (TA 1+TA 2+TA 3) use the estimated parameters based on the NPS 2020/21. 1000 simulations are implemented. All estimates are obtained with the population weights. The true poverty rate for TA 1 is the estimate directly obtained using the consumption data in TA 1 and the poverty line from NPS 2020/21 deflated to 2022 prices. The sample size of TA (TA 1+TA 2+TA 3) is selected as a percentage of the target survey sample varying from (randomly selected) 10% to 100% of the TA sample (with the sample size in parentheses). The sample size of NPS 2020/21 is selected as a percentage of the base survey sample varying from (randomly selected) 10%, 20%, 30% to 100% of the NPS 2020/21 sample (with the sample size in parentheses). The total sample size of TA (TA 1+TA 2+TA 3) is 2,110 households, and the sample size of NPS 2020/21 is 4,644 households.

Appendix A: Additional Tables and Figures

Table A.1.
Overview of the imputation models

Columns: Variables, Model 1 to Model 9. Rows:

- Food expenditures
- Non-food expenditures
- Furnishings and household expenses
- Health expenditures
- Education expenditures
- Utilities: water, kerosene, lighting

Household assets & house characteristics
- Household owns motor vehicle
- Household owns bicycle
- Household owns mobile phone
- Household owns DVD
- Household owns TV
- Household owns computer
- Household owns refrigerator
- Household owns AC
- Household owns radio
- Household owns mosquito nets
- Household dwelling wall materials
- Household dwelling roof materials
- Household dwelling floor materials
- Household dwelling water source
- Household dwelling toilet facility

Demographics & employment
- Head's age
- Head is female
- Head's education
- Household size
- Shares of household members in age groups
- Head's employment
- Urban/rural

Table A.2. Food expenditures in TA 2 and in TA 3 against TA 1 by food consumption groups

| Food groups/items | TA 1 | TA 2 | TA 3 | Difference: TA 2 - TA 1 | Difference: TA 3 - TA 1 |
|---|---|---|---|---|---|
| Cereals and Cereal products | 219,978.61 (6,388.87) | 182,831.31 (5,396.40) | 120,300.51 (5,345.95) | -37,147.30*** (8,484.85) | -99,678.10*** (8,156.26) |
| - Rice (husked), maize (grain, flour), millet | 168,892.52 (5,689.94) | 182,831.31 (5,396.40) | | 13,938.79* (8,373.57) | |
| Starches | 72,038.14 (4,075.93) | 67,029.74 (4,306.58) | 79,346.75 (4,687.48) | -5,008.41 (5,536.67) | 7,308.61 (6,376.82) |
| - Cassava, sweet potato, cooking bananas | 62,093.06 (3,854.22) | 67,029.74 (4,306.58) | | 4,936.68 (5,675.08) | |
| Sugar, jam, honey, chocolate, and confectionery | 24,527.27 (1,330.40) | 18,963.47 (1,151.30) | 17,981.84 (1,242.37) | -5,563.80*** (1,811.30) | -6,545.43*** (1,794.15) |
| - Sugar | 21,188.81 (1,045.65) | 18,963.47 (1,151.30) | | -2,225.34 (1,537.61) | |
| Pulses, Dry | 34,089.53 (2,106.26) | 30,919.96 (1,936.26) | 24,466.10 (1,417.30) | -3,169.57 (2,937.77) | -9,623.42*** (2,665.53) |
| Nuts and Seeds | 23,431.80 (2,379.13) | | 8,512.58 (1,424.31) | | -14,919.23*** (3,068.21) |
| Vegetables | 50,606.82 (1,755.06) | 50,317.69 (2,286.78) | 30,870.77 (1,282.99) | -289.13 (2,632.84) | -19,736.05*** (2,330.62) |
| - Onion, tomato, carrot, pepper | 31,160.59 (1,412.97) | 30,932.88 (1,576.36) | | -227.71 (1,830.00) | |
| - Spinach, cabbage, greens | 19,203.13 (755.86) | 19,384.81 (1,034.52) | | 181.68 (1,305.25) | |
| Fruits | 30,576.32 (2,472.49) | | 14,517.59 (1,357.38) | | -16,058.73*** (2,696.03) |
| Meat, meat products, fish | 114,913.26 (7,695.80) | 69,167.11 (4,517.71) | 52,788.03 (4,037.24) | -45,746.15*** (8,735.25) | -62,125.23*** (9,007.89) |
| - Beef including minced sausage | 28,856.58 (3,048.25) | 30,520.84 (2,680.97) | | 1,664.26 (3,537.19) | |
| - Fresh & dried fish | 39,689.34 (2,844.28) | 38,646.27 (3,113.56) | | -1,043.07 (4,838.64) | |
| Milk and milk products | 20,939.79 (2,625.66) | 20,955.29 (3,338.87) | 18,977.35 (1,975.50) | 15.50 (4,076.32) | -1,962.45 (3,100.95) |
| - Fresh milk | 17,268.50 (2,359.13) | 20,955.29 (3,338.87) | | 3,686.79 (3,947.52) | |
| Oil and fats | 39,806.20 (2,293.77) | 39,309.48 (2,429.29) | 41,298.84 (2,731.88) | -496.71 (3,453.70) | 1,492.64 (3,488.40) |
| - Cooking oil | 38,408.41 (2,244.02) | 39,309.48 (2,429.29) | | 901.08 (3,428.76) | |
| Spices and other foods | 5,476.61 (501.86) | | 11,866.76 (1,039.96) | | 6,390.15*** (1,169.95) |
| Non-alcoholic beverages | 16,266.92 (2,817.81) | | 6,356.04 (753.18) | | -9,910.88*** (2,841.49) |
| Alcoholic beverages | 3,048.25 (683.66) | | 2,971.90 (758.26) | | -76.35 (1,013.01) |
| Number of observations | 711 | 701 | 698 | | |

Note: The standard errors are in parentheses, and the differences are estimated considering the complex survey design. * p < 0.1, ** p < 0.05, *** p < 0.01. The population weights are applied. The consumption data are expressed on a per adult equivalent monthly basis and are spatially and temporally deflated within each treatment arm.

Table A.3.
Comparison of Variables after Standardization between NPS 2020/21 and TA 2/TA 3

| Variables | NPS 2020/21 | TA 2 | TA 2 diff | TA 3 | TA 3 diff |
|---|---|---|---|---|---|
| Food expenditures | 36.13 (0.10) | 36.16 (0.18) | 0.03 (0.22) | 35.88 (0.17) | -0.24 (0.21) |
| Non-food expenditures | 9.99 (0.02) | | | 10.00 (0.05) | 0.01 (0.05) |
| Household size | 2.24 (0.02) | 2.28 (0.08) | 0.04 (0.09) | 2.28 (0.06) | 0.04 (0.07) |
| Age of HH Head | 5.02 (0.01) | 5.02 (0.02) | 0.00 (0.03) | 5.05 (0.03) | 0.03 (0.03) |
| HH Head is Female | 0.23 (0.01) | 0.22 (0.02) | -0.00 (0.02) | 0.21 (0.02) | -0.02 (0.02) |
| Head does not have formal education | 0.21 (0.01) | 0.21 (0.03) | 0.01 (0.03) | 0.21 (0.03) | 0.01 (0.03) |
| Head has primary education | 0.63 (0.01) | 0.62 (0.02) | -0.00 (0.03) | 0.62 (0.03) | -0.01 (0.03) |
| Head has secondary ordinary education | 0.12 (0.01) | 0.12 (0.02) | 0.00 (0.02) | 0.12 (0.02) | 0.00 (0.02) |
| Head has secondary advanced education and higher | 0.04 (0.00) | 0.03 (0.01) | -0.00 (0.01) | 0.04 (0.01) | 0.00 (0.01) |
| Share of HH members in 0-14 | 0.44 (0.00) | 0.44 (0.01) | 0.00 (0.01) | 0.43 (0.01) | -0.00 (0.01) |
| Share of HH members in 15-24 | 0.19 (0.00) | 0.19 (0.01) | -0.00 (0.01) | 0.21 (0.01) | 0.01 (0.01) |
| Share of HH members in 25-59 | 0.33 (0.00) | 0.34 (0.01) | 0.01 (0.01) | 0.31 (0.01) | -0.01 (0.01) |
| Share of HH members in 60 and older | 0.04 (0.00) | 0.04 (0.00) | -0.01 (0.00) | 0.05 (0.00) | 0.00 (0.00) |
| HH Head did any wage work during the last 7 days | 0.27 (0.01) | 0.27 (0.03) | 0.01 (0.03) | 0.26 (0.02) | -0.01 (0.02) |
| HH Head was self-employed (non-farm) during the last 7 days | 0.22 (0.01) | 0.23 (0.02) | 0.00 (0.02) | 0.23 (0.02) | 0.00 (0.02) |
| Household owns a motor vehicle | 0.13 (0.01) | 0.13 (0.02) | 0.00 (0.02) | 0.14 (0.02) | 0.01 (0.02) |
| Household owns a bicycle | 0.37 (0.01) | 0.37 (0.03) | 0.01 (0.03) | 0.38 (0.03) | 0.01 (0.03) |
| Household owns a mobile phone | 0.88 (0.01) | 0.89 (0.02) | 0.01 (0.02) | 0.89 (0.02) | 0.00 (0.02) |
| Household owns a video/DVD | 0.14 (0.01) | 0.15 (0.02) | 0.00 (0.02) | 0.14 (0.02) | -0.00 (0.02) |
| Household owns a television | 0.28 (0.01) | 0.28 (0.03) | 0.01 (0.03) | 0.29 (0.03) | 0.01 (0.03) |
| Household owns a computer | 0.03 (0.00) | 0.02 (0.01) | -0.01 (0.01) | 0.03 (0.01) | 0.00 (0.01) |
| Household owns a refrigerator/freezer | 0.09 (0.01) | 0.10 (0.02) | 0.01 (0.02) | 0.09 (0.01) | -0.00 (0.02) |
| Household owns an AC/fan | 0.07 (0.01) | 0.07 (0.01) | 0.00 (0.02) | 0.06 (0.01) | -0.01 (0.02) |
| Household owns a radio | 0.44 (0.01) | 0.44 (0.03) | 0.00 (0.03) | 0.45 (0.03) | 0.01 (0.03) |
| Household owns a mosquito net | 0.85 (0.01) | 0.86 (0.02) | 0.01 (0.02) | 0.86 (0.02) | 0.01 (0.02) |
| Log of per capita residential area | 0.45 (0.01) | 0.46 (0.01) | 0.01 (0.01) | 0.46 (0.01) | 0.01 (0.01) |
| Roof is made of concrete/metal sheets/tiles | 0.84 (0.01) | 0.84 (0.03) | -0.00 (0.03) | 0.83 (0.03) | -0.01 (0.03) |
| Wall is made of burnt bricks/concrete | 0.55 (0.02) | 0.55 (0.04) | -0.00 (0.04) | 0.56 (0.03) | 0.01 (0.04) |
| Floor is made of concrete/cement/tiles/timber | 0.52 (0.01) | 0.53 (0.03) | 0.01 (0.04) | 0.52 (0.03) | 0.01 (0.04) |
| Source of drinking water: piped water/truck | 0.44 (0.02) | 0.45 (0.04) | 0.02 (0.04) | 0.42 (0.04) | -0.02 (0.04) |
| Toilet: flush/VIP | 0.35 (0.01) | 0.37 (0.03) | 0.02 (0.04) | 0.35 (0.03) | -0.00 (0.03) |
| Residence: Rural | 0.72 (0.01) | 0.73 (0.04) | 0.00 (0.04) | 0.70 (0.04) | -0.02 (0.04) |

Note: The standard errors are in parentheses. Differences are estimated with t-tests that take into account complex survey design. * p<0.10, ** p<0.05, *** p<0.01.

Table A.4.
Comparison of Variables after Standardization between TA 1 and TA 2/TA 3

| Variables | TA 1 | TA 2 | TA 2 diff | TA 3 | TA 3 diff |
|---|---|---|---|---|---|
| Food expenditures | 13.96 (0.04) | 13.96 (0.03) | -0.01 (0.05) | 13.91 (0.03) | -0.05 (0.05) |
| Non-food expenditures | 10.47 (0.04) | | | 10.45 (0.04) | -0.02 (0.05) |
| Household size | 2.58 (0.05) | 2.66 (0.10) | 0.07 (0.10) | 2.66 (0.07) | 0.07 (0.09) |
| Age of HH Head | 6.14 (0.03) | 6.14 (0.04) | -0.00 (0.05) | 6.19 (0.04) | 0.04 (0.05) |
| HH Head is Female | 0.23 (0.02) | 0.23 (0.02) | 0.00 (0.03) | 0.22 (0.02) | -0.01 (0.03) |
| Head does not have formal education | 0.19 (0.02) | 0.19 (0.02) | 0.00 (0.03) | 0.20 (0.03) | 0.00 (0.04) |
| Head has primary education | 0.64 (0.02) | 0.64 (0.02) | 0.01 (0.03) | 0.63 (0.03) | -0.00 (0.04) |
| Head has secondary ordinary education | 0.13 (0.02) | 0.13 (0.02) | 0.00 (0.02) | 0.13 (0.02) | 0.00 (0.02) |
| Head has secondary advanced education and higher | 0.04 (0.01) | 0.04 (0.01) | -0.01 (0.01) | 0.04 (0.01) | 0.00 (0.01) |
| Share of HH members in 0-14 | 0.44 (0.01) | 0.44 (0.01) | 0.00 (0.01) | 0.44 (0.01) | 0.00 (0.01) |
| Share of HH members in 15-24 | 0.19 (0.01) | 0.19 (0.01) | -0.01 (0.01) | 0.20 (0.01) | 0.01 (0.01) |
| Share of HH members in 25-59 | 0.33 (0.01) | 0.34 (0.01) | 0.01 (0.01) | 0.31 (0.01) | -0.02* (0.01) |
| Share of HH members in 60 and older | 0.04 (0.00) | 0.04 (0.00) | -0.00 (0.00) | 0.05 (0.00) | 0.01* (0.00) |
| HH Head did any wage work during the last 7 days | 0.21 (0.02) | 0.22 (0.02) | 0.01 (0.03) | 0.20 (0.02) | -0.01 (0.03) |
| HH Head was self-employed (non-farm) during the last 7 days | 0.21 (0.02) | 0.21 (0.02) | 0.00 (0.03) | 0.21 (0.02) | 0.00 (0.03) |
| Household owns a motor vehicle | 0.16 (0.02) | 0.16 (0.02) | -0.00 (0.03) | 0.17 (0.02) | 0.00 (0.03) |
| Household owns a bicycle | 0.31 (0.02) | 0.33 (0.03) | 0.03 (0.04) | 0.34 (0.03) | 0.03 (0.04) |
| Household owns a mobile phone | 0.87 (0.02) | 0.88 (0.02) | 0.01 (0.02) | 0.88 (0.02) | 0.01 (0.02) |
| Household owns a video/DVD | 0.12 (0.02) | 0.13 (0.02) | 0.00 (0.02) | 0.12 (0.02) | -0.00 (0.02) |
| Household owns a television | 0.28 (0.03) | 0.28 (0.03) | 0.00 (0.03) | 0.28 (0.03) | 0.01 (0.04) |
| Household owns a computer | 0.03 (0.01) | 0.02 (0.01) | -0.01 (0.01) | 0.03 (0.01) | 0.00 (0.01) |
| Household owns a refrigerator/freezer | 0.08 (0.02) | 0.09 (0.02) | 0.01 (0.02) | 0.08 (0.01) | -0.00 (0.02) |
| Household owns an AC/fan | 0.07 (0.01) | 0.08 (0.01) | 0.01 (0.02) | 0.07 (0.01) | -0.00 (0.02) |
| Household owns a radio | 0.42 (0.02) | 0.45 (0.03) | 0.03 (0.04) | 0.45 (0.03) | 0.03 (0.04) |
| Household owns a mosquito net | 0.83 (0.02) | 0.85 (0.02) | 0.01 (0.02) | 0.84 (0.02) | 0.01 (0.02) |
| Log of per capita residential area | 0.46 (0.01) | 0.47 (0.01) | 0.01 (0.01) | 0.46 (0.01) | 0.00 (0.01) |
| Roof is made of concrete/metal sheets/tiles | 0.83 (0.02) | 0.83 (0.03) | -0.00 (0.03) | 0.82 (0.03) | -0.00 (0.03) |
| Wall is made of burnt bricks/concrete | 0.60 (0.03) | 0.56 (0.04) | -0.04 (0.04) | 0.56 (0.03) | -0.04 (0.04) |
| Floor is made of concrete/cement/tiles/timber | 0.48 (0.03) | 0.49 (0.03) | 0.02 (0.04) | 0.49 (0.03) | 0.01 (0.04) |
| Source of drinking water: piped water/truck | 0.40 (0.04) | 0.41 (0.04) | 0.01 (0.04) | 0.38 (0.04) | -0.03 (0.05) |
| Toilet: flush/VIP | 0.36 (0.03) | 0.38 (0.03) | 0.02 (0.04) | 0.35 (0.03) | -0.01 (0.04) |
| Area: Rural | 0.75 (0.04) | 0.76 (0.04) | 0.01 (0.04) | 0.74 (0.04) | -0.02 (0.05) |

Note: The standard errors are in parentheses. The differences are estimated with t-tests that take into account complex survey design. * p<0.10, ** p<0.05, *** p<0.01.

Table A.5.
Household consumption model, Tanzania, 2019/20

| Variables | Model 1 | Model 2 | Model 3 | Model 4 | Model 5 | Model 6 | Model 7 | Model 8 | Model 9 |
|---|---|---|---|---|---|---|---|---|---|
| Household size | -0.032*** (0.01) | -0.030*** (0.01) | -0.006** (0.00) | -0.017*** (0.01) | -0.031*** (0.01) | -0.032*** (0.01) | -0.030*** (0.01) | -0.028*** (0.01) | -0.026*** (0.01) |
| Age of HH Head | -0.003* (0.00) | -0.005*** (0.00) | -0.000 (0.00) | -0.005*** (0.00) | -0.004*** (0.00) | -0.005*** (0.00) | -0.006*** (0.00) | -0.005*** (0.00) | -0.004** (0.00) |
| HH Head is Female | 0.088** (0.04) | 0.075** (0.04) | 0.015 (0.02) | 0.052* (0.03) | 0.065* (0.04) | 0.070* (0.04) | 0.073** (0.04) | 0.074** (0.04) | 0.080** (0.04) |
| Head has primary education | 0.063 (0.05) | -0.013 (0.04) | 0.021 (0.02) | -0.028 (0.04) | -0.015 (0.04) | -0.020 (0.04) | -0.013 (0.04) | -0.018 (0.04) | 0.034 (0.05) |
| Head has secondary ordinary education | 0.375*** (0.06) | 0.111** (0.05) | 0.035 (0.03) | 0.052 (0.05) | 0.108** (0.05) | 0.087 (0.05) | 0.115** (0.06) | 0.106* (0.05) | 0.319*** (0.06) |
| Head has secondary advanced education and higher | 0.658*** (0.09) | 0.175** (0.09) | 0.140*** (0.04) | 0.046 (0.07) | 0.169** (0.08) | 0.138* (0.08) | 0.181** (0.09) | 0.161* (0.08) | 0.569*** (0.08) |
| Share of HH members in 0-14 | -0.730*** (0.09) | -0.505*** (0.09) | -0.145*** (0.04) | -0.336*** (0.07) | -0.497*** (0.09) | -0.546*** (0.09) | -0.543*** (0.10) | -0.508*** (0.09) | -0.724*** (0.09) |
| Share of HH members in 15-24 | -0.295*** (0.08) | -0.223*** (0.07) | -0.035 (0.03) | -0.118* (0.06) | -0.231*** (0.07) | -0.213*** (0.07) | -0.236*** (0.07) | -0.206*** (0.07) | -0.236*** (0.08) |
| Share of HH members in 60 and older | 0.094 (0.11) | 0.107 (0.10) | -0.017 (0.05) | 0.190** (0.09) | 0.100 (0.10) | 0.095 (0.10) | 0.122 (0.10) | 0.102 (0.10) | 0.103 (0.11) |
| HH Head did any wage work during the last 7 days | 0.150*** (0.04) | 0.102*** (0.04) | 0.028 (0.02) | 0.059* (0.03) | 0.089** (0.04) | 0.109*** (0.04) | 0.101*** (0.04) | 0.102*** (0.04) | 0.140*** (0.04) |
| HH Head was self-employed (non-farm) during the last 7 days | 0.174*** (0.04) | 0.056 (0.04) | 0.026 (0.02) | 0.010 (0.03) | 0.054 (0.04) | 0.065* (0.04) | 0.054 (0.04) | 0.047 (0.04) | 0.130*** (0.04) |
| Urban area | -0.299*** (0.04) | -0.104*** (0.04) | -0.069*** (0.02) | -0.030 (0.03) | -0.105*** (0.03) | -0.103*** (0.03) | -0.102*** (0.04) | -0.094*** (0.04) | -0.220*** (0.04) |
| Total food and non-alcoholic expenditures | | | 0.831*** (0.01) | | | | | | |
| Total non-food expenditures | | | | 0.282*** (0.01) | | | | | |
| Furnishing and household expenses | | | | | 0.022*** (0.00) | | | | |
| Health expenditures | | | | | | 0.020*** (0.00) | | | |
| Education expenditures | | | | | | | 0.004 (0.00) | | |
| Utilities: Water, Kerosene, El., Matches, Bulbs, Charcoal | | | | | | | | 0.016*** (0.01) | 0.043*** (0.01) |
| Household owns a motor vehicle | | 0.223*** (0.05) | 0.141*** (0.02) | 0.102*** (0.04) | 0.206*** (0.05) | 0.217*** (0.05) | 0.221*** (0.05) | 0.222*** (0.05) | |
| Household owns a bicycle | | 0.019 (0.03) | -0.037** (0.02) | 0.055** (0.03) | 0.014 (0.03) | 0.028 (0.03) | 0.019 (0.03) | 0.021 (0.03) | |
| Household owns a mobile phone | | 0.147*** (0.05) | 0.068*** (0.02) | 0.039 (0.04) | 0.125*** (0.05) | 0.142*** (0.05) | 0.144*** (0.05) | 0.148*** (0.05) | |
| Household owns a video/dvd | | 0.120** (0.05) | 0.015 (0.02) | 0.095** (0.04) | 0.104** (0.05) | 0.118** (0.05) | 0.120** (0.05) | 0.113** (0.05) | |
| Household owns a television | | 0.070 (0.05) | 0.044* (0.03) | 0.020 (0.04) | 0.066 (0.05) | 0.071 (0.05) | 0.066 (0.05) | 0.063 (0.05) | |
| Household owns a computer | | 0.185** (0.08) | 0.148*** (0.04) | 0.065 (0.07) | 0.175** (0.08) | 0.222*** (0.08) | 0.180** (0.08) | 0.188** (0.08) | |
| Household owns a refrigerator/freezer | | 0.116** (0.05) | 0.076*** (0.02) | 0.055 (0.04) | 0.133*** (0.05) | 0.116** (0.05) | 0.111** (0.05) | 0.106** (0.05) | |
| Household owns an air conditioner/fans | | 0.189*** (0.05) | 0.065*** (0.02) | 0.103** (0.04) | 0.177*** (0.05) | 0.172*** (0.05) | 0.189*** (0.05) | 0.183*** (0.05) | |
| Household owns a radio | | 0.031 (0.03) | 0.005 (0.01) | 0.006 (0.03) | 0.025 (0.03) | 0.033 (0.03) | 0.031 (0.03) | 0.032 (0.03) | |
| Household owns a mosquito net | | 0.090** (0.04) | 0.022 (0.02) | 0.051 (0.03) | 0.087** (0.04) | 0.080** (0.04) | 0.092** (0.04) | 0.091** (0.04) | |
| Log of per capita residential area | | 0.404*** (0.07) | 0.078** (0.03) | 0.333*** (0.06) | 0.403*** (0.06) | 0.392*** (0.06) | 0.403*** (0.07) | 0.400*** (0.07) | |
| Roof is made of concrete/metal sheets/tiles | | 0.091* (0.05) | 0.007 (0.02) | 0.057 (0.04) | 0.094* (0.05) | 0.091* (0.05) | 0.090* (0.05) | 0.077 (0.05) | |
| Wall is made of burnt bricks/concrete | | 0.013 (0.04) | 0.028 (0.02) | 0.001 (0.03) | -0.001 (0.04) | 0.022 (0.04) | 0.012 (0.04) | 0.014 (0.04) | |
| Floor is made of concrete/cement/tiles/timber | | 0.048 (0.04) | 0.051** (0.02) | -0.039 (0.04) | 0.056 (0.04) | 0.046 (0.04) | 0.045 (0.04) | 0.033 (0.04) | |
| Source of drinking water: piped water/truck | | 0.125*** (0.03) | 0.024 (0.02) | 0.101*** (0.03) | 0.148*** (0.03) | 0.132*** (0.03) | 0.125*** (0.03) | 0.102*** (0.04) | |
| Toilet: flush/VIP | | 0.134*** (0.04) | 0.033* (0.02) | 0.095*** (0.03) | 0.133*** (0.04) | 0.129*** (0.04) | 0.133*** (0.04) | 0.124*** (0.04) | |
| _cons | 14.501*** (0.10) | 13.727*** (0.11) | 2.508*** (0.19) | 10.404*** (0.18) | 13.566*** (0.11) | 13.666*** (0.11) | 13.751*** (0.11) | 13.610*** (0.12) | 14.075*** (0.11) |
| sigma_e | 0.51 | 0.46 | 0.22 | 0.39 | 0.46 | 0.45 | 0.46 | 0.46 | 0.50 |
| sigma_u | 0.19 | 0.09 | 0.00 | 0.00 | 0.07 | 0.07 | 0.09 | 0.09 | 0.15 |
| rho | 0.13 | 0.04 | 0.00 | 0.00 | 0.03 | 0.02 | 0.04 | 0.04 | 0.08 |
| r2_o | 0.39 | 0.54 | 0.90 | 0.68 | 0.56 | 0.57 | 0.54 | 0.55 | 0.43 |
| N | 1179 | 1179 | 1179 | 1179 | 1179 | 1179 | 1179 | 1179 | 1179 |

Note: The standard errors are in parentheses. * p<0.10, ** p<0.05, *** p<0.01.

Table A.6.
Household consumption model, Tanzania, 2020/21

| Variables | Model 1 | Model 2 | Model 3 | Model 4 | Model 5 | Model 6 | Model 7 | Model 8 | Model 9 |
|---|---|---|---|---|---|---|---|---|---|
| Household size | -0.043*** (0.00) | -0.043*** (0.00) | -0.009*** (0.00) | -0.025*** (0.00) | -0.042*** (0.00) | -0.043*** (0.00) | -0.044*** (0.00) | -0.039*** (0.00) | -0.038*** (0.00) |
| Age of HH Head | -0.002* (0.00) | -0.003*** (0.00) | -0.000 (0.00) | -0.003*** (0.00) | -0.003*** (0.00) | -0.004*** (0.00) | -0.004*** (0.00) | -0.004*** (0.00) | -0.002** (0.00) |
| HH Head is Female | -0.016 (0.02) | 0.041** (0.02) | 0.028*** (0.01) | -0.003 (0.02) | 0.040** (0.02) | 0.029 (0.02) | 0.037* (0.02) | 0.035* (0.02) | -0.023 (0.02) |
| Head has primary education | 0.038 (0.02) | -0.021 (0.02) | 0.008 (0.01) | -0.051*** (0.02) | -0.019 (0.02) | -0.028 (0.02) | -0.025 (0.02) | -0.035 (0.02) | 0.003 (0.02) |
| Head has secondary ordinary education | 0.273*** (0.03) | 0.078*** (0.03) | 0.028* (0.02) | 0.008 (0.02) | 0.089*** (0.03) | 0.080*** (0.03) | 0.076*** (0.03) | 0.057* (0.03) | 0.203*** (0.03) |
| Head has secondary advanced education and higher | 0.679*** (0.05) | 0.241*** (0.05) | 0.108*** (0.03) | 0.081** (0.04) | 0.213*** (0.05) | 0.230*** (0.05) | 0.241*** (0.05) | 0.213*** (0.05) | 0.583*** (0.05) |
| Share of HH members in 0-14 | -0.622*** (0.05) | -0.390*** (0.05) | -0.077*** (0.03) | -0.297*** (0.04) | -0.389*** (0.05) | -0.419*** (0.05) | -0.448*** (0.05) | -0.394*** (0.05) | -0.613*** (0.05) |
| Share of HH members in 15-24 | -0.259*** (0.05) | -0.194*** (0.04) | 0.006 (0.02) | -0.205*** (0.03) | -0.204*** (0.04) | -0.212*** (0.04) | -0.214*** (0.04) | -0.197*** (0.04) | -0.266*** (0.04) |
| Share of HH members in 60 and older | -0.015 (0.06) | 0.030 (0.06) | -0.046 (0.03) | 0.170*** (0.05) | 0.030 (0.06) | 0.027 (0.06) | 0.063 (0.06) | 0.020 (0.06) | -0.009 (0.06) |
| HH Head did any wage work during the last 7 days | 0.076*** (0.02) | 0.064*** (0.02) | 0.010 (0.01) | 0.029* (0.02) | 0.056*** (0.02) | 0.067*** (0.02) | 0.064*** (0.02) | 0.050** (0.02) | 0.047** (0.02) |
| HH Head was self-employed (non-farm) during the last 7 days | 0.194*** (0.02) | 0.124*** (0.02) | 0.031*** (0.01) | 0.064*** (0.02) | 0.117*** (0.02) | 0.128*** (0.02) | 0.122*** (0.02) | 0.109*** (0.02) | 0.158*** (0.02) |
| Urban area | -0.294*** (0.03) | -0.061** (0.03) | -0.041*** (0.01) | 0.027 (0.02) | -0.095*** (0.02) | -0.060** (0.02) | -0.057** (0.03) | -0.027 (0.03) | -0.172*** (0.03) |
| Total food and non-alcoholic expenditures | | | 0.783*** (0.01) | | | | | | |
| Total non-food expenditures | | | | 0.360*** (0.01) | | | | | |
| Furnishing and household expenses | | | | | 0.029*** (0.00) | | | | |
| Health expenditures | | | | | | 0.020*** (0.00) | | | |
| Education expenditures | | | | | | | 0.006*** (0.00) | | |
| Utilities: Water, Kerosene, El., Matches, Bulbs, Charcoal | | | | | | | | 0.031*** (0.00) | 0.048*** (0.00) |
| Household owns a motor vehicle | | 0.268*** (0.03) | 0.179*** (0.01) | 0.083*** (0.02) | 0.250*** (0.03) | 0.260*** (0.03) | 0.263*** (0.03) | 0.268*** (0.03) | |
| Household owns a bicycle | | 0.039** (0.02) | 0.009 (0.01) | 0.020 (0.02) | 0.029 (0.02) | 0.029 (0.02) | 0.037* (0.02) | 0.043** (0.02) | |
| Household owns a mobile phone | | 0.144*** (0.03) | 0.088*** (0.01) | -0.029 (0.02) | 0.126*** (0.03) | 0.126*** (0.03) | 0.143*** (0.03) | 0.125*** (0.03) | |
| Household owns a video/dvd | | 0.019 (0.03) | -0.008 (0.01) | 0.013 (0.02) | 0.006 (0.03) | 0.013 (0.03) | 0.017 (0.03) | 0.015 (0.03) | |
| Household owns a television | | 0.157*** (0.03) | 0.071*** (0.01) | 0.066*** (0.02) | 0.144*** (0.02) | 0.150*** (0.02) | 0.154*** (0.03) | 0.136*** (0.02) | |
| Household owns a computer | | 0.336*** (0.05) | 0.228*** (0.03) | 0.160*** (0.04) | 0.297*** (0.05) | 0.328*** (0.05) | 0.331*** (0.05) | 0.334*** (0.05) | |
| Household owns a refrigerator/freezer | | 0.130*** (0.03) | 0.085*** (0.02) | 0.057** (0.02) | 0.122*** (0.03) | 0.122*** (0.03) | 0.127*** (0.03) | 0.123*** (0.03) | |
| Household owns an air conditioner/fans | | 0.044 (0.03) | 0.059*** (0.02) | -0.008 (0.02) | 0.054* (0.03) | 0.036 (0.03) | 0.047 (0.03) | 0.037 (0.03) | |
| Household owns a radio | | 0.078*** (0.02) | 0.024** (0.01) | 0.045*** (0.01) | 0.061*** (0.02) | 0.075*** (0.02) | 0.079*** (0.02) | 0.073*** (0.02) | |
| Household owns a mosquito net | | 0.052** (0.02) | 0.011 (0.01) | 0.014 (0.02) | 0.029 (0.02) | 0.033 (0.02) | 0.051** (0.02) | 0.044** (0.02) | |
| Log of per capita residential area | | 0.346*** (0.04) | 0.079*** (0.02) | 0.214*** (0.03) | 0.343*** (0.04) | 0.368*** (0.04) | 0.344*** (0.04) | 0.345*** (0.04) | |
| Roof is made of concrete/metal sheets/tiles | | 0.047* (0.03) | 0.011 (0.02) | -0.008 (0.02) | 0.043 (0.03) | 0.043 (0.03) | 0.042 (0.03) | 0.046* (0.03) | |
| Wall is made of burnt bricks/concrete | | 0.027 (0.02) | 0.029** (0.01) | -0.006 (0.02) | 0.033 (0.02) | 0.030 (0.02) | 0.026 (0.02) | 0.020 (0.02) | |
| Floor is made of concrete/cement/tiles/timber | | 0.085*** (0.02) | 0.047*** (0.01) | 0.012 (0.02) | 0.103*** (0.02) | 0.083*** (0.02) | 0.083*** (0.02) | 0.061*** (0.02) | |
| Source of drinking water: piped water/truck | | 0.061*** (0.02) | 0.025** (0.01) | 0.010 (0.02) | 0.080*** (0.02) | 0.070*** (0.02) | 0.059*** (0.02) | 0.024 (0.02) | |
| Toilet: flush/VIP | | 0.103*** (0.02) | 0.042*** (0.01) | 0.040** (0.02) | 0.097*** (0.02) | 0.095*** (0.02) | 0.103*** (0.02) | 0.087*** (0.02) | |
| _cons | 14.545*** (0.05) | 13.769*** (0.07) | 3.132*** (0.11) | 9.702*** (0.09) | 13.582*** (0.07) | 13.701*** (0.07) | 13.811*** (0.07) | 13.564*** (0.07) | 14.078*** (0.06) |
| sigma_e | 0.56 | 0.52 | 0.29 | 0.41 | 0.50 | 0.51 | 0.51 | 0.51 | 0.55 |
| sigma_u | 0.20 | 0.15 | 0.08 | 0.09 | 0.15 | 0.14 | 0.15 | 0.15 | 0.19 |
| rho | 0.11 | 0.08 | 0.07 | 0.05 | 0.08 | 0.07 | 0.08 | 0.08 | 0.11 |
| r2_o | 0.34 | 0.46 | 0.84 | 0.66 | 0.48 | 0.48 | 0.46 | 0.47 | 0.38 |
| N | 4644 | 4644 | 4644 | 4644 | 4644 | 4644 | 4644 | 4644 | 4644 |

Note: The standard errors are in parentheses. * p<0.10, ** p<0.05, *** p<0.01.

Table A.7.
Household consumption model, Tanzania, TA 1

| Variables | Model 1 | Model 2 | Model 3 | Model 4 | Model 8 | Model 9 |
|---|---|---|---|---|---|---|
| Household size | -0.039*** (0.01) | -0.037*** (0.01) | -0.005 (0.00) | -0.020** (0.01) | -0.029*** (0.01) | -0.031*** (0.01) |
| Age of HH Head | 0.001 (0.00) | -0.002 (0.00) | 0.000 (0.00) | -0.002 (0.00) | -0.002 (0.00) | -0.000 (0.00) |
| HH Head is Female | -0.057 (0.05) | 0.033 (0.05) | 0.007 (0.02) | 0.001 (0.04) | 0.027 (0.05) | -0.053 (0.05) |
| Head has primary education | 0.002 (0.06) | -0.016 (0.05) | 0.015 (0.02) | -0.052 (0.04) | 0.002 (0.05) | 0.018 (0.05) |
| Head has secondary ordinary education | 0.289*** (0.08) | 0.091 (0.08) | 0.040 (0.03) | 0.007 (0.06) | 0.083 (0.07) | 0.245*** (0.08) |
| Head has secondary advanced education and higher | 0.659*** (0.12) | 0.081 (0.13) | 0.102* (0.06) | -0.043 (0.10) | 0.043 (0.13) | 0.543*** (0.12) |
| Share of HH members in 0-14 | -0.430*** (0.13) | -0.212* (0.12) | 0.043 (0.06) | -0.250*** (0.09) | -0.162 (0.12) | -0.368*** (0.12) |
| Share of HH members in 15-24 | -0.190 (0.12) | -0.112 (0.12) | 0.105** (0.05) | -0.247*** (0.09) | -0.102 (0.11) | -0.191 (0.12) |
| Share of HH members in 60 and older | -0.085 (0.12) | 0.115 (0.11) | -0.023 (0.05) | 0.120 (0.09) | 0.168 (0.11) | 0.054 (0.12) |
| HH Head did any wage work during the last 7 days | 0.055 (0.06) | 0.059 (0.05) | 0.027 (0.02) | 0.016 (0.04) | 0.049 (0.05) | 0.042 (0.05) |
| HH Head was self-employed (non-farm) during the last 7 days | 0.114** (0.05) | -0.008 (0.05) | -0.005 (0.02) | -0.019 (0.04) | -0.017 (0.05) | 0.090* (0.05) |
| Urban area | -0.299*** (0.06) | 0.049 (0.06) | -0.082*** (0.03) | 0.142*** (0.05) | 0.105* (0.06) | -0.122** (0.06) |
| Total food and non-alcoholic expenditures | | | 0.864*** (0.02) | | | |
| Total non-food expenditures | | | | 0.419*** (0.02) | | |
| Utilities: Water, Kerosene, El., Matches, Bulbs, Charcoal | | | | | 0.040*** (0.01) | 0.052*** (0.01) |
| Household owns a motor vehicle | | 0.234*** (0.06) | 0.156*** (0.03) | 0.025 (0.05) | 0.223*** (0.06) | |
| Household owns a bicycle | | 0.066 (0.05) | 0.009 (0.02) | 0.047 (0.03) | 0.076* (0.04) | |
| Household owns a mobile phone | | 0.130** (0.06) | 0.065** (0.03) | -0.060 (0.04) | 0.125** (0.06) | |
| Household owns a video/dvd | | -0.032 (0.07) | -0.038 (0.03) | 0.011 (0.06) | -0.037 (0.07) | |
| Household owns a television | | 0.113* (0.06) | 0.013 (0.03) | 0.040 (0.05) | 0.067 (0.06) | |
| Household owns a computer | | 0.282** (0.14) | -0.032 (0.06) | 0.198* (0.11) | 0.280** (0.14) | |
| Household owns a refrigerator/freezer | | 0.161* (0.09) | 0.106*** (0.04) | 0.032 (0.07) | 0.149* (0.09) | |
| Household owns an air conditioner/fans | | 0.299*** (0.09) | -0.038 (0.04) | 0.238*** (0.07) | 0.307*** (0.09) | |
| Household owns a radio | | 0.128*** (0.04) | 0.033* (0.02) | 0.060* (0.03) | 0.133*** (0.04) | |
| Household owns a mosquito net | | 0.068 (0.05) | 0.025 (0.02) | 0.016 (0.04) | 0.057 (0.05) | |
| Log of per capita residential area | | 0.209** (0.10) | 0.016 (0.04) | 0.143* (0.07) | 0.254*** (0.09) | |
| Roof is made of concrete/metal sheets/tiles | | -0.048 (0.06) | 0.004 (0.03) | -0.093** (0.04) | -0.076 (0.06) | |
| Wall is made of burnt bricks/concrete | | 0.036 (0.05) | 0.047** (0.02) | -0.005 (0.04) | 0.047 (0.05) | |
| Floor is made of concrete/cement/tiles/timber | | 0.180*** (0.05) | 0.053** (0.02) | 0.075* (0.04) | 0.140*** (0.05) | |
| Source of drinking water: piped water/truck | | 0.067 (0.05) | 0.016 (0.02) | 0.014 (0.04) | 0.019 (0.05) | |
| Toilet: flush/VIP | | 0.148*** (0.05) | 0.025 (0.02) | 0.083** (0.04) | 0.120** (0.05) | |
| _cons | 14.416*** (0.13) | 13.608*** (0.16) | 2.019*** (0.23) | 8.868*** (0.23) | 13.238*** (0.16) | 13.835*** (0.14) |
| sigma_e | 0.52 | 0.48 | 0.22 | 0.36 | 0.46 | 0.50 |
| sigma_u | 0.19 | 0.12 | 0.04 | 0.10 | 0.12 | 0.16 |
| rho | 0.12 | 0.06 | 0.03 | 0.07 | 0.06 | 0.10 |
| r2_o | 0.27 | 0.43 | 0.89 | 0.69 | 0.47 | 0.35 |
| N | 711 | 711 | 711 | 711 | 711 | 711 |

Note: The standard errors are in parentheses. * p<0.10, ** p<0.05, *** p<0.01.

Table A.8.
Predicted Poverty Rates Based on Imputation, from NPS 2019/20 to TA 1 using 2019/20 Poverty Line, Tanzania (percentage)

| Method (TA 1) | Model 1 | Model 2 | Model 3 | Model 4 | Model 5 | Model 6 | Model 7 | Model 8 | Model 9 |
|---|---|---|---|---|---|---|---|---|---|
| 1) Normal linear regression model | 15.9 (1.9) | 17.0a (2.0) | 17.9a (2.0) | 13.8 (1.9) | 16.2a (2.0) | 14.7 (1.9) | 17.0a (2.0) | 17.7a (2.0) | 18.5a (2.0) |
| 2) Empirical distribution of the error terms | 15.8a (1.9) | 16.7a (2.0) | 18.1a (2.0) | 13.1 (1.9) | 16.0a (1.9) | 14.5 (1.9) | 16.7a (2.0) | 17.5a (2.0) | 18.6a (2.0) |
| Control variables: | | | | | | | | | |
| Food expenditures | | | Y | | | | | | |
| Non-food expenditures | | | | Y | | | | | |
| Furnishings and household expenses | | | | | Y | | | | |
| Health expenditures | | | | | | Y | | | |
| Education expenditures | | | | | | | Y | | |
| Utilities: water, kerosene, lighting | | | | | | | | Y | Y |
| Household assets & house characteristics | | Y | Y | Y | Y | Y | Y | Y | |
| Demographics & employment | Y | Y | Y | Y | Y | Y | Y | Y | Y |
| R2 | 0.39 | 0.54 | 0.90 | 0.67 | 0.55 | 0.56 | 0.54 | 0.54 | 0.43 |
| N1 (base survey) | 1,179 | 1,179 | 1,179 | 1,179 | 1,179 | 1,179 | 1,179 | 1,179 | 1,179 |
| N2 (target survey) | 711 | 711 | 711 | 711 | 711 | 711 | 711 | 711 | 711 |

True poverty rate in TA 1: 17.7 (1.9)

Note: The standard errors in parentheses are adjusted for complex survey design. All estimates are obtained with the population weights. Method 1 uses the normal linear regression model with the theoretical distribution of the error terms and Method 2 uses the empirical distribution of the error terms. Both methods employ cluster random effects. The imputed poverty rates for TA 1 use the estimated parameters based on the 2019/20 data. 1,000 simulations are implemented. The true poverty rate for TA 1 is the estimate directly obtained using the consumption data in TA 1 and the poverty line in NPS 2019/20 deflated to 2022 prices with the annual WB deflator. The estimates shown in bold or with an “a” respectively fall within the 95% CI or one standard error of the true poverty rate. The underlying regression results are provided in Appendix A, Table A.5.

Table A.9.
Predicted Poverty Rates Based on Imputation, from NPS 2019/20 to TA 2 using 2019/20 Poverty Line, Tanzania (percentage)

| Method (TA 2) | Model 1 | Model 2 | Model 3.1 | Model 3.2 | Model 8 | Model 9 |
|---|---|---|---|---|---|---|
| 1) Normal linear regression model | 16.4a (2.4) | 16.5a (2.5) | 25.9 (2.5) | 18.3a (2.1) | 16.9a (2.5) | 18.3a (2.6) |
| 2) Empirical distribution of the error terms | 16.6a (2.4) | 16.2a (2.5) | 26.3 (2.5) | 18.3a (2.0) | 16.7a (2.5) | 18.5a (2.6) |
| Control variables: | | | | | | |
| Food expenditures | | | Y | Y | | |
| Utilities: water, kerosene, lighting | | | | | Y | Y |
| Household assets & house characteristics | | Y | Y | Y | Y | |
| Demographics & employment | Y | Y | Y | Y | Y | Y |
| R2 | 0.39 | 0.54 | 0.90 | 0.90 | 0.54 | 0.43 |
| N1 (base survey) | 1,179 | 1,179 | 1,179 | 1,179 | 1,179 | 1,179 |
| N2 (target survey) | 701 | 701 | 701 | 701 | 701 | 701 |

True poverty rate in TA 1: 17.7 (1.9)

Note: The standard errors in parentheses are adjusted for complex survey design. All estimates are obtained with the population weights. Method 1 uses the normal linear regression model with the theoretical distribution of the error terms and Method 2 uses the empirical distribution of the error terms. Both methods employ cluster random effects. Model 3.1 employs non-standardized variables. Model 3.2 employs variables in TA 2 standardized by those in NPS 2019/20. Food expenditures, household size, and age in both surveys are transformed to normality using the Box-Cox method before standardization. The imputed poverty rates for TA 2 use the estimated parameters based on the 2019/20 data. 1,000 simulations are implemented. The true poverty rate for TA 1 is the estimate directly obtained using the consumption data in TA 1 and the poverty line in NPS 2019/20 deflated to 2022 prices with the annual WB deflator. The estimates shown in bold or with an “a” respectively fall within the 95% CI or one standard error of the true poverty rate. The underlying regression results are provided in Appendix A, Table A.5.

Table A.10.
Predicted Poverty Rates Based on Imputation, from NPS 2019/20 to TA 3 using 2019/20 Poverty Line, Tanzania (percentage)

| Method (TA 3) | Model 1 | Model 2 | Model 3.1 | Model 3.2 | Model 4.1 | Model 4.2 | Model 8 | Model 9 |
|---|---|---|---|---|---|---|---|---|
| 1) Normal linear regression model | 16.6a (2.2) | 17.2a (2.3) | 37.3 (2.4) | 18.1a (2.0) | 23.6 (2.5) | 18.3a (2.2) | 17.8a (2.3) | 18.9a (2.4) |
| 2) Empirical distribution of the error terms | 16.6a (2.2) | 17.0a (2.3) | 37.7 (2.4) | 17.7a (2.0) | 23.1 (2.5) | 18.0a (2.2) | 17.6a (2.3) | 19.0a (2.4) |
| Control variables: | | | | | | | | |
| Food expenditures | | | Y | Y | | | | |
| Non-food expenditures | | | | | Y | Y | | |
| Utilities: water, kerosene, lighting | | | | | | | Y | Y |
| Household assets & house characteristics | | Y | Y | Y | Y | Y | Y | |
| Demographics & employment | Y | Y | Y | Y | Y | Y | Y | Y |
| R2 | 0.39 | 0.54 | 0.89 | 0.89 | 0.67 | 0.67 | 0.54 | 0.43 |
| N1 (base survey) | 1,179 | 1,179 | 1,179 | 1,179 | 1,179 | 1,179 | 1,179 | 1,179 |
| N2 (target survey) | 698 | 698 | 698 | 698 | 698 | 695 | 698 | 698 |

True poverty rate in TA 1: 17.7 (1.9)

Note: The standard errors in parentheses are adjusted for complex survey design. All estimates are obtained with the population weights. Method 1 uses the normal linear regression model with the theoretical distribution of the error terms and Method 2 uses the empirical distribution of the error terms. Both methods employ cluster random effects. Models 3.1 and 4.1 employ non-standardized variables. Models 3.2 and 4.2 employ variables in TA 3 standardized by those in NPS 2019/20. Food and non-food expenditures, household size, and age in both surveys are transformed to normality using the Box-Cox method before standardization. The imputed poverty rates for TA 3 use the estimated parameters based on the 2019/20 data. 1,000 simulations are implemented. The true poverty rate for TA 1 is the estimate directly obtained using the consumption data in TA 1 and the poverty line in NPS 2019/20 deflated to 2022 prices with the annual WB deflator. The estimates shown in bold or with an “a” respectively fall within the 95% CI or one standard error of the true poverty rate.
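The marking rule stated in the note can be sketched as a short check. This is a minimal illustration, assuming the 95% CI is the true rate plus or minus 1.96 standard errors (a normal approximation; the interval actually used may be design-based and differ slightly). The function name `classify_estimate` is hypothetical; the numbers are the Method 1 estimates and the true TA 1 rate reported in the table.

```python
def classify_estimate(estimate, true_rate, true_se, z=1.96):
    """Return (within_one_se, within_95ci) for an imputed poverty rate.

    Assumption: the 95% CI is true_rate +/- z * true_se with z = 1.96.
    """
    diff = abs(estimate - true_rate)
    within_one_se = diff <= true_se       # would carry the "a" marker
    within_95ci = diff <= z * true_se     # would be shown in bold
    return within_one_se, within_95ci


# True TA 1 poverty rate and its standard error, as reported in the table.
TRUE_RATE, TRUE_SE = 17.7, 1.9

# Method 1 estimates for the food and non-food models (from the table).
estimates = {
    "Model 3.1": 37.3,  # food expenditures, non-standardized
    "Model 3.2": 18.1,  # food expenditures, standardized
    "Model 4.1": 23.6,  # non-food expenditures, non-standardized
    "Model 4.2": 18.3,  # non-food expenditures, standardized
}

for label, est in estimates.items():
    one_se, ci95 = classify_estimate(est, TRUE_RATE, TRUE_SE)
    print(f"{label}: within one SE = {one_se}, within 95% CI = {ci95}")
```

Under this approximation, the non-standardized Models 3.1 and 4.1 fall outside both bands, while the standardized Models 3.2 and 4.2 fall within one standard error, which matches the “a” markers in the table.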
The underlying regression results are provided in Appendix A, Table A.5.

Table A.11. Predicted Poverty Rates Based on Imputation, from NPS 2020/21 to TA 1 using 2019/20 Poverty Line, Tanzania (percentage)

| Method (TA 1) | Model 1 | Model 2 | Model 3 | Model 4 | Model 5 | Model 6 | Model 7 | Model 8 | Model 9 |
|---|---|---|---|---|---|---|---|---|---|
| 1) Normal linear regression model | 18.2a (2.0) | 19.1a (2.1) | 17.8a (2.0) | 15.5 (2.0) | 17.9a (2.0) | 17.6a (2.0) | 19.1a (2.1) | 20.1 (2.1) | 20.3 (2.1) |
| 2) Empirical distribution of the error terms | 17.7a (2.0) | 18.8a (2.1) | 17.5a (2.0) | 14.9 (2.0) | 17.5a (2.0) | 17.3a (2.0) | 18.7a (2.1) | 20.0 (2.1) | 19.9 (2.1) |
| Control variables: | | | | | | | | | |
| Food expenditures | | | Y | | | | | | |
| Non-food expenditures | | | | Y | | | | | |
| Furnishings and household expenses | | | | | Y | | | | |
| Health expenditures | | | | | | Y | | | |
| Education expenditures | | | | | | | Y | | |
| Utilities: water, kerosene, lighting | | | | | | | | Y | Y |
| Household assets & house characteristics | | Y | Y | Y | Y | Y | Y | Y | |
| Demographics & employment | Y | Y | Y | Y | Y | Y | Y | Y | Y |
| R2 | 0.34 | 0.46 | 0.84 | 0.65 | 0.47 | 0.47 | 0.45 | 0.47 | 0.37 |
| N1 (base survey) | 4,644 | 4,644 | 4,644 | 4,644 | 4,644 | 4,644 | 4,644 | 4,644 | 4,644 |
| N2 (target survey) | 711 | 711 | 711 | 711 | 711 | 711 | 711 | 711 | 711 |

True poverty rate in TA 1: 17.7 (1.9)

Note: The standard errors in parentheses are adjusted for complex survey design. All estimates are obtained with the population weights. Method 1 uses the normal linear regression model with the theoretical distribution of the error terms and Method 2 uses the empirical distribution of the error terms. Both methods employ cluster random effects. The imputed poverty rates for TA 1 use the estimated parameters based on the 2020/21 data. 1,000 simulations are implemented. The true poverty rate for TA 1 is the estimate directly obtained using the consumption data in TA 1 and the poverty line in NPS 2019/20 deflated to 2022 prices with the annual WB deflator. The estimates shown in bold or with an “a” respectively fall within the 95% CI or one standard error of the true poverty rate. The underlying regression results are provided in Appendix A, Table A.6.

Table A.12.
Predicted Poverty Rates Based on Imputation, from NPS 2020/21 to TA 2 using 2019/20 Poverty Line, Tanzania (percentage)

| Method (TA 2) | Model 1 | Model 2 | Model 3.1 | Model 3.2 | Model 8 | Model 9 |
|---|---|---|---|---|---|---|
| 1) Normal linear regression model | 19.3a (2.5) | 19.2a (2.6) | 28.8 (3.1) | 17.7a (2.0) | 19.7 (2.7) | 20.6 (2.7) |
| 2) Empirical distribution of the error terms | 18.6a (2.5) | 18.8a (2.6) | 28.7 (3.1) | 17.0a (1.9) | 19.4a (2.7) | 20.2 (2.7) |
| Control variables: | | | | | | |
| Food expenditures | | | Y | Y | | |
| Utilities: water, kerosene, lighting | | | | | Y | Y |
| Household assets & house characteristics | | Y | Y | Y | Y | |
| Demographics & employment | Y | Y | Y | Y | Y | Y |
| R2 | 0.34 | 0.46 | 0.65 | 0.85 | 0.46 | 0.37 |
| N1 (base survey) | 4,644 | 4,644 | 4,644 | 4,644 | 4,644 | 4,644 |
| N2 (target survey) | 701 | 701 | 701 | 701 | 701 | 701 |

True poverty rate in TA 1: 17.7 (1.9)

Note: The standard errors in parentheses are adjusted for complex survey design. All estimates are obtained with the population weights. Method 1 uses the normal linear regression model with the theoretical distribution of the error terms and Method 2 uses the empirical distribution of the error terms. Both methods employ cluster random effects. Model 3.1 employs non-standardized variables. Model 3.2 employs variables in TA 2 standardized by those in NPS 2020/21. Food expenditures, household size, and age in both surveys are transformed to normality using the Box-Cox method before standardization. The imputed poverty rates for TA 2 use the estimated parameters based on the 2020/21 data. 1,000 simulations are implemented. The true poverty rate for TA 1 is the estimate directly obtained using the consumption data in TA 1 and the poverty line in NPS 2019/20 deflated to 2022 prices with the annual WB deflator. The estimates shown in bold or with an “a” respectively fall within the 95% CI or one standard error of the true poverty rate. The underlying regression results are provided in Appendix A, Table A.6.

Table A.13.
Predicted Poverty Rates Based on Imputation, from NPS 2020/21 to TA 3 using 2019/20 Poverty Line, Tanzania (percentage)

| Method (TA 3) | Model 1 | Model 2 | Model 3.1 | Model 3.2 | Model 4.1 | Model 4.2 | Model 8 | Model 9 |
|---|---|---|---|---|---|---|---|---|
| 1) Normal linear regression model | 19.1a (2.3) | 19.7a (2.4) | 35.7 (2.5) | 17.3a (2.0) | 28.5 (2.7) | 19.0a (2.3) | 20.5 (2.4) | 20.9 (2.4) |
| 2) Empirical distribution of the error terms | 18.5a (2.3) | 19.3a (2.4) | 36.7 (2.4) | 16.4a (1.9) | 28.4 (2.7) | 18.7 (2.3) | 20.2 (2.4) | 20.6 (2.4) |
| Control variables: | | | | | | | | |
| Food expenditures | | | Y | Y | | | | |
| Non-food expenditures | | | | | Y | Y | | |
| Utilities: water, kerosene, lighting | | | | | | | Y | Y |
| Household assets & house characteristics | | Y | Y | Y | Y | Y | Y | |
| Demographics & employment | Y | Y | Y | Y | Y | Y | Y | Y |
| R2 | 0.34 | 0.46 | 0.84 | 0.84 | 0.66 | 0.68 | 0.47 | 0.37 |
| N1 (base survey) | 4,644 | 4,644 | 4,644 | 4,644 | 4,644 | 4,644 | 4,644 | 4,644 |
| N2 (target survey) | 698 | 698 | 698 | 698 | 698 | 695 | 698 | 698 |

True poverty rate in TA 1: 17.7 (1.9)

Note: The standard errors in parentheses are adjusted for complex survey design. All estimates are obtained with the population weights. Method 1 uses the normal linear regression model with the theoretical distribution of the error terms and Method 2 uses the empirical distribution of the error terms. Both methods employ cluster random effects. Models 3.1 and 4.1 employ non-standardized variables. Models 3.2 and 4.2 employ variables in TA 3 standardized by those in NPS 2020/21. Food and non-food expenditures, household size, and age in both surveys are transformed to normality using the Box-Cox method before standardization. The imputed poverty rates for TA 3 use the estimated parameters based on the 2020/21 data. 1,000 simulations are implemented. The true poverty rate for TA 1 is the estimate directly obtained using the consumption data in TA 1 and the poverty line in NPS 2019/20 deflated to 2022 prices with the annual WB deflator. The estimates shown in bold or with an “a” respectively fall within the 95% CI or one standard error of the true poverty rate.
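The transformation described in the note for Models 3.2 and 4.2 can be illustrated with a small sketch. Several assumptions are made here and are not taken from the paper: “standardized by” is read as rescaling the transformed target-survey variable so that its mean and standard deviation match those of the base survey; the Box-Cox parameter is fixed at 0 (the log transform) rather than estimated; survey weights are ignored; and the expenditure values are hypothetical toy numbers. The function names `box_cox` and `standardize_to_base` are likewise hypothetical.

```python
import math
import statistics


def box_cox(values, lam):
    """Box-Cox transform of strictly positive values for a given lambda."""
    if any(v <= 0 for v in values):
        raise ValueError("Box-Cox requires strictly positive values")
    if lam == 0:
        return [math.log(v) for v in values]
    return [(v ** lam - 1.0) / lam for v in values]


def standardize_to_base(target, base):
    """Rescale target values to the mean and SD of the base values.

    Assumption: this is one plausible reading of "standardized by those
    in the base survey"; the exact procedure may differ.
    """
    mu_t, sd_t = statistics.fmean(target), statistics.pstdev(target)
    mu_b, sd_b = statistics.fmean(base), statistics.pstdev(base)
    if sd_t == 0:
        raise ValueError("target values have zero variance")
    return [mu_b + sd_b * (v - mu_t) / sd_t for v in target]


# Hypothetical food expenditures (not survey data): the target survey's
# shortened module is assumed to record systematically lower values.
base_food = [52000.0, 61000.0, 75000.0, 98000.0, 140000.0]
target_food = [30000.0, 41000.0, 47000.0, 66000.0, 90000.0]

lam = 0.0  # assumed Box-Cox parameter (log transform)
base_t = box_cox(base_food, lam)
target_t = box_cox(target_food, lam)
target_std = standardize_to_base(target_t, base_t)

print("base mean:  ", round(statistics.fmean(base_t), 4))
print("target mean:", round(statistics.fmean(target_std), 4))
```

After this step the standardized target variable has the same first two moments as the base variable, so the coefficients estimated on the base survey are applied to a predictor on a comparable scale.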
The underlying regression results are provided in Appendix A, Table A.6.

Table A.14. Predicted Poverty Rates Based on Imputation, from TA 1 to TA 2 using 2019/20 Poverty Line, Tanzania (percentage)

| Method (TA 2) | Model 1 | Model 2 | Model 3.1 | Model 3.2 | Model 8 | Model 9 |
|---|---|---|---|---|---|---|
| 1) Normal linear regression model | 16.5a (2.4) | 16.0a (2.3) | 24.4 (2.3) | 15.6 (1.8) | 15.3 (2.3) | 15.9a (2.5) |
| 2) Empirical distribution of the error terms | 16.5a (2.4) | 15.5 (2.3) | 24.5 (2.3) | 15.5 (1.8) | 15.0 (2.3) | 16.0a (2.5) |
| Control variables: | | | | | | |
| Food expenditures | | | Y | Y | | |
| Utilities: water, kerosene, lighting | | | | | Y | Y |
| Household assets & house characteristics | | Y | Y | Y | Y | |
| Demographics & employment | Y | Y | Y | Y | Y | Y |
| R2 | 0.27 | 0.43 | 0.89 | 0.89 | 0.47 | 0.35 |
| N1 (base survey) | 711 | 711 | 711 | 711 | 711 | 711 |
| N2 (target survey) | 701 | 701 | 701 | 701 | 701 | 701 |

True poverty rate in TA 1: 17.7 (1.9)

Note: The standard errors in parentheses are adjusted for complex survey design. All estimates are obtained with the population weights. Method 1 uses the normal linear regression model with the theoretical distribution of the error terms and Method 2 uses the empirical distribution of the error terms. Both methods employ cluster random effects. Model 3.1 employs non-standardized variables. Model 3.2 employs variables in TA 2 standardized by those in TA 1. Food expenditures, household size, and age in both surveys are transformed to normality using the Box-Cox method before standardization. The imputed poverty rates for TA 2 use the estimated parameters based on the TA 1 data. 1,000 simulations are implemented. The true poverty rate for TA 1 is the estimate directly obtained using the consumption data in TA 1 and the poverty line in NPS 2019/20 deflated to 2022 prices with the annual WB deflator. The estimates shown in bold or with an “a” respectively fall within the 95% CI or one standard error of the true poverty rate. The underlying regression results are provided in Appendix A, Table A.7.

Table A.15.
Predicted Poverty Rates Based on Imputation, from TA 1 to TA 3 using 2019/20 Poverty Line, Tanzania (percentage)

| Method (TA 3) | Model 1 | Model 2 | Model 3.1 | Model 3.2 | Model 4.1 | Model 4.2 | Model 8 | Model 9 |
|---|---|---|---|---|---|---|---|---|
| 1) Normal linear regression model | 15.9a (2.1) | 16.1a (2.1) | 35.7 (2.4) | 15.0 (1.8) | 31.7 (2.8) | 15.9a (2.1) | 16.0a (2.1) | 16.1a (2.1) |
| 2) Empirical distribution of the error terms | 15.9a (2.1) | 15.8 (2.1) | 36.3 (2.3) | 14.6 (1.8) | 31.6 (2.8) | 15.6 (2.1) | 15.7 (2.0) | 16.1a (2.1) |
| Control variables: | | | | | | | | |
| Food expenditures | | | Y | Y | | | | |
| Non-food expenditures | | | | | Y | Y | | |
| Utilities: water, kerosene, lighting | | | | | | | Y | Y |
| Household assets & house characteristics | | Y | Y | Y | Y | Y | Y | |
| Demographics & employment | Y | Y | Y | Y | Y | Y | Y | Y |
| R2 | 0.27 | 0.43 | 0.89 | 0.89 | 0.69 | 0.69 | 0.47 | 0.35 |
| N1 (base survey) | 711 | 711 | 711 | 711 | 711 | 711 | 711 | 711 |
| N2 (target survey) | 698 | 698 | 698 | 698 | 698 | 695 | 698 | 698 |

True poverty rate in TA 1: 17.7 (1.9)

Note: The standard errors in parentheses are adjusted for complex survey design. All estimates are obtained with the population weights. Method 1 uses the normal linear regression model with the theoretical distribution of the error terms and Method 2 uses the empirical distribution of the error terms. Both methods employ cluster random effects. Models 3.1 and 4.1 employ non-standardized variables. Models 3.2 and 4.2 employ variables in TA 3 standardized by those in TA 1. Food and non-food expenditures, household size, and age in both surveys are transformed to normality using the Box-Cox method before standardization. The imputed poverty rates for TA 3 use the estimated parameters based on the TA 1 data. 1,000 simulations are implemented. The true poverty rate for TA 1 is the estimate directly obtained using the consumption data in TA 1 and the poverty line in NPS 2019/20 deflated to 2022 prices with the annual WB deflator. The estimates shown in bold or with an “a” respectively fall within the 95% CI or one standard error of the true poverty rate. The underlying regression results are provided in Appendix A, Table A.7.

Table A.16.
Comparison of Variables after Standardization between NPS 2019/20 and TA 2/TA 3

| Variable | NPS 2019/20 | TA 2 | TA 2 diff | TA 3 | TA 3 diff |
|---|---|---|---|---|---|
| Food expenditures | 11.04 (0.03) | 11.08 (0.02) | 0.03 (0.04) | 11.05 (0.02) | 0.00 (0.04) |
| Non-food expenditures | 14.07 (0.10) | | | 14.21 (0.09) | 0.14 (0.14) |
| Household size | 2.09 (0.07) | 2.05 (0.08) | -0.04 (0.11) | 2.05 (0.06) | -0.04 (0.09) |
| Age of HH Head | 2.80 (0.01) | 2.79 (0.01) | -0.01 (0.01) | 2.80 (0.01) | 0.00 (0.01) |
| HH Head is Female | 0.19 (0.02) | 0.22 (0.02) | 0.03 (0.03) | 0.21 (0.02) | 0.02 (0.03) |
| Head does not have formal education | 0.16 (0.02) | 0.19 (0.02) | 0.03 (0.03) | 0.20 (0.03) | 0.03 (0.03) |
| Head has primary education | 0.68 (0.03) | 0.64 (0.02) | -0.04 (0.03) | 0.63 (0.03) | -0.05 (0.04) |
| Head has secondary ordinary education | 0.11 (0.01) | 0.11 (0.02) | 0.01 (0.02) | 0.11 (0.02) | 0.01 (0.02) |
| Head has secondary advanced education and higher | 0.03 (0.01) | 0.03 (0.01) | -0.00 (0.01) | 0.04 (0.01) | 0.01 (0.01) |
| Share of HH members in 0-14 | 0.43 (0.01) | 0.42 (0.01) | -0.01 (0.01) | 0.42 (0.01) | -0.01 (0.02) |
| Share of HH members in 15-24 | 0.18 (0.01) | 0.17 (0.01) | -0.01 (0.01) | 0.19 (0.01) | 0.01 (0.01) |
| Share of HH members in 25-59 | 0.33 (0.01) | 0.35 (0.01) | 0.02 (0.01) | 0.32 (0.01) | -0.01 (0.01) |
| Share of HH members in 60 and older | 0.06 (0.00) | 0.06 (0.00) | -0.00 (0.01) | 0.07 (0.00) | 0.01 (0.01) |
| HH Head did any wage work during the last 7 days | 0.26 (0.03) | 0.29 (0.03) | 0.03 (0.04) | 0.27 (0.02) | 0.01 (0.03) |
| HH Head was self-employed (non-farm) during the last 7 days | 0.20 (0.02) | 0.21 (0.02) | 0.00 (0.03) | 0.21 (0.02) | 0.00 (0.03) |
| Household owns a motor vehicle | 0.17 (0.02) | 0.15 (0.02) | -0.02 (0.03) | 0.16 (0.02) | -0.01 (0.03) |
| Household owns a bicycle | 0.33 (0.03) | 0.33 (0.03) | 0.01 (0.04) | 0.34 (0.03) | 0.01 (0.04) |
| Household owns a mobile phone | 0.89 (0.01) | 0.87 (0.02) | -0.02 (0.02) | 0.87 (0.02) | -0.02 (0.02) |
| Household owns a video/dvd | 0.18 (0.02) | 0.22 (0.02) | 0.03 (0.03) | 0.21 (0.02) | 0.02 (0.03) |
| Household owns a television | 0.25 (0.02) | 0.28 (0.03) | 0.03 (0.04) | 0.28 (0.03) | 0.04 (0.04) |
| Household owns a computer | 0.02 (0.01) | 0.02 (0.01) | -0.00 (0.01) | 0.03 (0.01) | 0.01 (0.01) |
| Household owns a refrigerator/freezer | 0.08 (0.01) | 0.09 (0.02) | 0.01 (0.02) | 0.08 (0.01) | -0.00 (0.02) |
| Household owns an air conditioner/fans | 0.07 (0.01) | 0.07 (0.01) | 0.01 (0.02) | 0.06 (0.01) | -0.00 (0.02) |
| Household owns a radio | 0.49 (0.03) | 0.50 (0.03) | 0.01 (0.04) | 0.50 (0.03) | 0.01 (0.04) |
| Household owns a mosquito net | 0.85 (0.02) | 0.84 (0.02) | -0.00 (0.03) | 0.84 (0.02) | -0.01 (0.03) |
| Log of per capita residential area | 0.44 (0.01) | 0.47 (0.01) | 0.03 (0.02) | 0.46 (0.01) | 0.02 (0.02) |
| Roof is made of concrete/metal sheets/tiles | 0.85 (0.02) | 0.85 (0.03) | -0.01 (0.04) | 0.84 (0.03) | -0.01 (0.03) |
| Wall is made of burnt bricks/concrete | 0.62 (0.03) | 0.59 (0.04) | -0.03 (0.05) | 0.60 (0.03) | -0.02 (0.05) |
| Floor is made of concrete/cement/tiles/timber | 0.51 (0.03) | 0.51 (0.03) | -0.00 (0.05) | 0.51 (0.03) | -0.01 (0.05) |
| Source of drinking water: piped water/truck | 0.41 (0.04) | 0.43 (0.04) | 0.02 (0.05) | 0.39 (0.04) | -0.01 (0.05) |
| Toilet: flush/VIP | 0.33 (0.03) | 0.36 (0.03) | 0.03 (0.04) | 0.34 (0.03) | 0.01 (0.04) |
| Residence: Rural | 0.75 (0.02) | 0.74 (0.04) | -0.01 (0.05) | 0.71 (0.04) | -0.04 (0.05) |

Note: The standard errors are in parentheses. The differences are estimated with t-tests that take into account complex survey design. * p<0.10, ** p<0.05, *** p<0.01.

Table A.17. Meta-analysis of Imputation Methods, Between-Year Imputations, Marginal Effects from Logit Regressions

| Variable | b (se) |
|---|---|
| Imputation method: normal distribution of error terms | -0.030 (0.02) |
| Poverty line is consistent | 0.724** (0.28) |
| Base year is NPS 2020/21 | 0.694** (0.27) |
| Type of questionnaire: Treatment Arm 2 | 0.028 (0.05) |
| Type of questionnaire: Treatment Arm 3 | -0.020 (0.05) |
| Inflation source: quarterly WB CPI | -0.063 (0.05) |
| Inflation source: monthly WB CPI | -0.069 (0.05) |
| Inflation source: annual IMF CPI | -0.031 (0.06) |
| Number of observations | 480 |

Note: * p < 0.1, ** p < 0.05, *** p < 0.01. The estimation results are obtained from logit regressions.
The dependent variable is a binary variable that indicates whether the predicted poverty rate is within the 95% confidence interval of the true poverty rate. The robust standard errors are clustered at the model level. The reference groups are the empirical distribution of the error terms for the “imputation method,” the inconsistent poverty line for the “type of poverty line,” NPS 2019/20 for the “base year,” TA 1 for the “type of questionnaire,” and the annual WB disaggregated CPI for the “inflation source.” The consistent poverty line is defined as the poverty line measured at the base year (i.e., the 2020/21 poverty line used for the imputations from 2020/21 to TAs and the 2019/20 poverty line used for the imputations from 2019/20 to TAs).

Table A.18. Predicted poverty rates (and SEs) based on LASSO, Elastic net and Random Forest from NPS 2020/21 to TA 1 using 2020/21 Poverty Line, Tanzania (percentage)

| Method (TA 1) | Model M1 | Model M2 | Model M3 | Model M4 | Model M5 | Model M6 | Model M7 | Model M8 | Model M9 |
|---|---|---|---|---|---|---|---|---|---|
| 1) Multiple imputation (PMM) | 17.6 (2.0) | 17.7 (1.9) | 16.1 (1.6) | 17.4 (1.8) | 16.7 (2.0) | 16.5 (2.0) | 17.8 (1.8) | 18.3 (1.9) | 19.1a (2.1) |
| 2) Multiple imputation (linear) | 17.3 (2.0) | 17.6 (1.8) | 15.9 (1.7) | 17.5 (1.8) | 17.0 (1.8) | 16.6 (1.8) | 17.9 (1.9) | 18.6 (1.8) | 18.7 (1.9) |
| 3) Lasso | 4.5 (1.4) | 10.5 (1.8) | 19.1a (2.1) | 19.6a (2.3) | 11.0 (1.9) | 10.1 (1.8) | 10.0 (1.8) | 13.6 (2.2) | 11.7 (2.1) |
| 4) Elastic Net | 3.8 (1.3) | 10.6 (1.8) | 19.1a (2.1) | 19.6a (2.3) | 11.0 (1.9) | 10.1 (1.8) | 10.1 (1.8) | 13.5 (2.2) | 11.7 (2.1) |
| 5) Random Forest | 10.4 (1.5) | 7.8 (1.5) | 20.0a (2.1) | 17.9 (2.2) | 8.5 (1.6) | 8.8 (1.7) | 8.9 (1.7) | 9.8 (1.9) | 10.0 (1.6) |
| Control variables: | | | | | | | | | |
| Food expenditures | | | Y | | | | | | |
| Non-food expenditures | | | | Y | | | | | |
| Furnishings and household expenses | | | | | Y | | | | |
| Health expenditures | | | | | | Y | | | |
| Education expenditures | | | | | | | Y | | |
| Utilities: water, kerosene, lighting | | | | | | | | Y | Y |
| Household assets & house characteristics | | Y | Y | Y | Y | Y | Y | Y | |
| Demographics & employment | Y | Y | Y | Y | Y | Y | Y | Y | Y |
| N1 (base survey) | 4,644 | 4,644 | 4,644 | 4,644 | 4,644 | 4,644 | 4,644 | 4,644 | 4,644 |
| N2 (target survey) | 711 | 711 | 711 | 711 | 711 | 711 | 711 | 711 | 711 |

True poverty rate: 21.0 (2.2)

Note: Estimates shown in boldface or with an “a” respectively fall within the 95% confidence interval or one standard error of the true poverty rate. Lasso, Elastic net linear models, and Random Forest models are trained in the first round, when the data are split into two subsets: 50% used for training and 50% used for testing (validation). The list of selected variables with penalized unstandardized coefficients is provided in Appendix A, Table A.21 for Lasso, and Table A.22 for Elastic net. The number of sub-trees is set at 1,000 in Random Forest. Both out-of-bag error and validation error are used to determine the best possible model. The list of selected variables with the variable importance scores is provided in Appendix A, Table A.23. We use five nearest neighbors with the predictive mean matching method. 50 simulations are implemented. The true poverty rate is the estimate directly obtained from the survey data.

Table A.19. Predicted poverty rates (and SEs) based on LASSO, Elastic net and Random Forest from NPS 2020/21 to TA 2 using 2020/21 Poverty Line, Tanzania (percentage)

| Method (TA 2) | Model M1 | Model M2 | Model M3.1 | Model M3.2 | Model M8 | Model M9 |
|---|---|---|---|---|---|---|
| 1) Multiple imputation (PMM) | 18.0 (2.0) | 18.1 (1.9) | 25.3 (1.9) | 17.7 (1.7) | 17.9 (1.9) | 19.0a (2.0) |
| 2) Multiple imputation (linear) | 17.8 (2.0) | 17.6 (1.9) | 24.5 (1.9) | 17.6 (1.7) | 18.0 (2.0) | 18.2 (2.0) |
| 3) Lasso | 8.4 (3.2) | 10.8 (3.0) | 26.2 (2.7) | 18.1 (2.0) | 14.0 (3.6) | 13.7 (3.7) |
| 4) Elastic Net | 8.7 (3.2) | 10.8 (3.0) | 26.2 (2.7) | 18.1 (2.0) | 14.3 (3.6) | 13.7 (3.7) |
| 5) Random Forest | 11.4 (1.9) | 8.5 (1.5) | 24.5 (2.1) | 17.7 (1.7) | 9.3 (1.6) | 10.8 (1.7) |
| Control variables: | | | | | | |
| Food expenditures | | | Y | Y | | |
| Utilities: water, kerosene, lighting | | | | | Y | Y |
| Household assets & house characteristics | | Y | Y | Y | Y | |
| Demographics & employment | Y | Y | Y | Y | Y | Y |
| N1 (base survey) | 4,644 | 4,644 | 4,644 | 4,644 | 4,644 | 4,644 |
| N2 (target survey) | 701 | 701 | 701 | 701 | 701 | 701 |

True poverty rate in TA 1: 21.0 (2.2)

Note: Estimates shown in boldface or with an “a” respectively fall within the 95% confidence interval or one standard error of the true poverty rate. Lasso, Elastic net linear models, and Random Forest models are trained in the first round, when the data are split into two subsets: 50% used for training and 50% used for testing (validation). For multiple imputation, we use five nearest neighbors with the predictive mean matching method. 50 simulations are implemented. The true poverty rate is the estimate directly obtained from the survey data.

Table A.20. Predicted poverty rates (and SEs) based on LASSO, Elastic net and Random Forest from NPS 2020/21 to TA 3 using 2020/21 Poverty Line, Tanzania (percentage)

| Method (TA 3) | Model M1 | Model M2 | Model M3.1 | Model M3.2 | Model M4.1 | Model M4.2 | Model M8 | Model M9 |
|---|---|---|---|---|---|---|---|---|
| 1) Multiple imputation (PMM) | 18.3 (2.0) | 18.7 (2.0) | 33.3 (2.0) | 17.0 (1.6) | 41.1 (2.4) | 17.7 (1.9) | 19.3a (2.2) | 19.6a (2.0) |
| 2) Multiple imputation (linear) | 18.1 (2.1) | 18.6 (1.8) | 31.8 (2.0) | 17.0 (1.6) | 46.6 (2.3) | 18.5 (1.9) | 19.4a (1.9) | 19.8a (2.0) |
| 3) Lasso | 8.9 (2.4) | 14.3 (2.5) | 39.4 (2.5) | 16.4 (2.0) | 48.4 (3.0) | 17.1 (2.6) | 15.9 (2.6) | 14.6 (2.5) |
| 4) Elastic Net | 8.9 (2.4) | 14.5 (2.6) | 39.4 (2.5) | 16.4 (2.0) | 48.3 (3.0) | 11.2 (1.9) | 16.2 (2.6) | 14.6 (2.5) |
| 5) Random Forest | 13.4 (2.0) | 9.7 (1.9) | 37.0 (2.3) | 16.4 (1.9) | 42.8 (3.0) | 14.4 (2.1) | 10.1 (1.8) | 13.3 (2.0) |
| Control variables: | | | | | | | | |
| Food expenditures | | | Y | Y | | | | |
| Non-food expenditures | | | | | Y | Y | | |
| Utilities: water, kerosene, lighting | | | | | | | Y | Y |
| Household assets & house characteristics | | Y | Y | Y | Y | Y | Y | |
| Demographics & employment | Y | Y | Y | Y | Y | Y | Y | Y |
| N1 (base survey) | 4,644 | 4,644 | 4,644 | 4,644 | 4,644 | 4,644 | 4,644 | 4,644 |
| N2 (target survey) | 698 | 698 | 698 | 698 | 698 | 698 | 698 | 698 |

True poverty rate in TA 1: 21.0 (2.2)

Note: The one-standard-error rule was used to select lambda. Estimates shown in boldface or with an “a” respectively fall within the 95% confidence interval or one standard error of the true poverty rate.
Lasso, Elastic net linear models and Random Forest models are trained in the first round when the data is split into two subsets: 50% used for training and 50% used for testing (validation). For multiple imputation, we use five nearest neighbors with the predictive mean matching method. 50 simulations are implemented. True poverty rate is the estimate directly obtained from the survey data. 64 Table A.21. The list of selected variables in LASSO with penalized unstandardized coefficients for imputation from NPS 2020/21 to TA 1 using 2020/21 Poverty Line Model Model Model Model Model Model Model Model Model M1 M2 M3 M4 M5 M6 M7 M8 M9 Household size -0.049 -0.045 -0.014 -0.016 -0.044 -0.045 -0.046 -0.041 -0.043 Age of HH Head -0.001 -0.003 -0.001 -0.001 -0.003 -0.003 -0.003 -0.004 -0.002 Head has secondary ordinary education 0.297 0.086 0.033 0.096 0.089 0.085 0.064 0.238 Head has secondary advanced education and higher 0.751 0.260 0.121 0.041 0.232 0.249 0.260 0.233 0.662 Share of hh_membr_014 -0.670 -0.381 -0.059 -0.227 -0.376 -0.406 -0.404 -0.381 -0.621 Share of hh_membr_1524 -0.313 -0.230 -0.155 -0.235 -0.245 -0.239 -0.228 -0.285 HH Head did any wage work during the last 7 days 0.065 0.042 0.013 0.033 0.045 0.042 0.028 0.023 HH Head was self-employed (non-farm) during the last 7 days 0.201 0.116 0.027 0.036 0.107 0.119 0.116 0.102 0.148 Urban -0.338 -0.075 -0.067 0.041 -0.101 -0.072 -0.073 -0.044 -0.201 HH Head is Female 0.045 0.024 0.039 0.033 0.043 0.039 -0.003 Head has primary education -0.016 -0.024 -0.017 -0.023 -0.018 -0.031 Household owns a motor vehicle 0.280 0.192 0.004 0.258 0.271 0.278 0.280 Household owns a bicycle 0.020 0.018 0.012 0.011 0.020 0.025 Household owns a mobile phone 0.141 0.080 -0.035 0.126 0.124 0.141 0.124 Household owns a video/dvd 0.006 -0.001 0.011 0.002 0.006 0.001 Household owns a television 0.196 0.095 0.175 0.188 0.195 0.176 Household owns a computer 0.362 0.240 0.074 0.318 0.349 0.360 0.360 Household owns a refrigerator/freezer 
0.138 0.094 0.125 0.132 0.137 0.129 Household owns a radio 0.080 0.022 0.025 0.062 0.076 0.080 0.075 Household owns a mosquito net 0.027 0.006 0.027 0.019 Log of per capita residential area 0.426 0.195 0.434 0.448 0.427 0.425 Concrete/Metal sheets/Tiles 0.063 0.036 -0.047 0.056 0.060 0.061 0.061 Burnt bricks/Concrete 0.027 0.041 -0.039 0.033 0.031 0.027 0.023 Floor is improved 0.101 0.055 -0.022 0.123 0.099 0.100 0.076 Piped water/Truck 0.068 0.044 -0.025 0.089 0.078 0.067 0.032 Flush/VIP 0.124 0.058 0.118 0.116 0.124 0.106 Household owns an air c/fans 0.059 -0.018 0.054 Log of real per adult equivalent food consumption, annual, 2010/11 prices 0.720 lnpcnf6 0.624 Log of real adult equivalent furnishings & household expenses cons, annual, 2010/11 prices 0.025 Log of real per adult equivalent health expenditure, annual, 2010/11 prices 0.018 Log of real per adult equivalent education expenditure, annual, 2010/11 prices 0.002 Log of real per adult equivalent electricity & kerosene & water cons, annual, 2010/11 prices 0.029 0.050 _cons 14.759 13.864 4.048 6.298 13.691 13.799 13.878 13.676 14.213 Goodness of fit statistics R_squared in training sub-sample 0.38 0.53 0.85 0.80 0.55 0.55 0.53 0.54 0.43 R_squared in testing sub-sample 0.40 0.54 0.86 0.77 0.56 0.55 0.54 0.54 0.44 R_squared 0.40 0.54 0.85 0.79 0.56 0.55 0.54 0.55 0.44 MSE in training sub-sample 0.33 0.25 0.08 0.11 0.24 0.24 0.25 0.24 0.31 MSE in testing sub-sample 0.33 0.25 0.08 0.12 0.24 0.24 0.25 0.25 0.31 MSE 0.33 0.25 0.08 0.11 0.24 0.24 0.25 0.24 0.30 N 4644 4644 4644 4644 4644 4644 4644 4644 4644 Note: Lasso linear model is used. The model is trained using Tanzania 2020/21 round. The training data is split into two subsets: 50% used for training (training sub-sample) and 50% used for validation (testing sub-sample). 65 Table A.22. 
The list of selected variables in Elastic net with penalized unstandardized coefficients for imputation from NPS 2020/21 to TA 1 using 2020/21 Poverty Line Model Model Model Model Model Model Model Model Model M1 M2 M3 M4 M5 M6 M7 M8 M9 Household size -0.049 -0.045 -0.014 -0.016 -0.043 -0.045 -0.046 -0.041 -0.014 Age of HH Head -0.001 -0.003 -0.001 -0.001 -0.003 -0.003 -0.003 -0.004 -0.002 Head has primary education 0.064 -0.017 -0.026 -0.016 -0.023 -0.019 -0.030 -0.004 Head has secondary ordinary education 0.352 0.086 0.032 0.096 0.089 0.085 0.064 0.046 Head has secondary advanced education and higher 0.808 0.261 0.120 0.042 0.231 0.249 0.261 0.233 0.115 Share of hh_membr_014 -0.673 -0.383 -0.058 -0.235 -0.369 -0.404 -0.406 -0.377 -0.178 Share of hh_membr_1524 -0.315 -0.232 -0.162 -0.228 -0.244 -0.240 -0.224 -0.071 HH Head did any wage work during the last 7 days 0.064 0.042 0.016 0.034 0.044 0.042 0.030 0.002 HH Head was self-employed (non-farm) during the last 7 days 0.199 0.117 0.027 0.039 0.108 0.118 0.116 0.104 0.027 urban==Urban -0.332 -0.075 -0.068 0.043 -0.102 -0.072 -0.073 -0.045 -0.071 HH Head is Female 0.046 0.022 0.040 0.033 0.044 0.042 0.019 Household owns a motor vehicle 0.281 0.191 0.005 0.258 0.271 0.278 0.281 0.178 Household owns a bicycle 0.021 0.019 0.013 0.011 0.021 0.027 -0.009 Household owns a mobile phone 0.141 0.079 -0.038 0.129 0.124 0.141 0.127 0.065 Household owns a video/dvd 0.007 0.012 0.002 0.006 0.002 -0.020 Household owns a television 0.196 0.095 0.175 0.188 0.194 0.177 0.084 Household owns a computer 0.362 0.239 0.074 0.318 0.349 0.360 0.361 0.213 Household owns a refrigerator/freezer 0.138 0.094 0.010 0.126 0.132 0.137 0.130 0.087 Household owns a radio 0.080 0.022 0.026 0.062 0.076 0.080 0.076 0.017 Household owns a mosquito net 0.028 0.006 0.027 0.020 -0.021 Log of per capita residential area 0.426 0.195 0.427 0.448 0.427 0.418 0.213 Concrete/Metal sheets/Tiles 0.063 0.035 -0.048 0.056 0.059 0.061 0.062 0.025 Burnt 
bricks/Concrete 0.027 0.041 -0.041 0.034 0.031 0.027 0.024 0.046 Floor is improved 0.101 0.056 -0.023 0.123 0.099 0.100 0.077 0.059 Piped water/Truck 0.068 0.043 -0.027 0.089 0.078 0.068 0.033 0.058 Flush/VIP 0.124 0.058 0.118 0.116 0.123 0.106 0.051 Household owns an air c/fans 0.059 -0.026 0.055 0.068 Log of real per adult equivalent food consumption, annual, 2010/11 prices 0.718 0.705 Log of real per adult equivalent non-food consumption, annual, 2010/11 prices 0.625 Share of hh_membr_60 0.072 0.077 0.052 Log (real per adult equivalent furnishings & household expenses cons, annual, 2010/11 prices 0.025 0.010 Log of real per adult equivalent health expenditure, annual, 2010/11 prices 0.018 0.012 Log of real per adult equivalent education expenditure, annual, 2010/11 prices 0.002 0.007 Log of real per adult equivalent electricity & kerosene & water cons, annual, 2010/11 prices 0.028 _cons 14.697 13.864 4.077 6.301 13.700 13.799 13.878 13.687 4.206 Goodness of fit statistics R_squared in training sub-sample 0.38 0.53 0.85 0.80 0.55 0.55 0.53 0.54 0.43 R_squared in testing sub-sample 0.40 0.54 0.86 0.77 0.56 0.55 0.54 0.55 0.44 R_squared 0.40 0.54 0.85 0.79 0.56 0.55 0.54 0.55 0.44 MSE in training sub-sample 0.33 0.25 0.08 0.11 0.24 0.24 0.25 0.24 0.31 MSE in testing sub-sample 0.33 0.25 0.08 0.12 0.24 0.24 0.25 0.25 0.31 MSE 0.33 0.25 0.08 0.11 0.24 0.24 0.25 0.24 0.30 N 4644 4644 4644 4644 4644 4644 4644 4644 4644 Note: Elastic net linear model is used. The model is trained using Tanzania 2020/21 round. The training data is split into two subsets: 50% used for training (training sub-sample) and 50% used for validation (testing sub-sample). 66 Table A.23. 
The list of selected variables in Random Forest with the variable importance scores for imputation from NPS 2020/21 to TA 1 using 2020/21 Poverty Line Model Model Model Model Model Model Model Model Model M1 M2 M3 M4 M5 M6 M7 M8 M9 Household size 0.082 0.226 0.096 0.100 0.228 0.119 0.217 0.274 0.225 Age of HH Head 0.030 0.098 0.034 0.046 0.097 0.035 0.089 0.126 0.096 HH Head is Female 0.026 0.068 0.027 0.042 0.064 0.030 0.060 0.083 0.072 Head has primary education 0.036 0.072 0.028 0.040 0.071 0.029 0.064 0.091 0.097 Head has secondary ordinary education 0.166 0.117 0.029 0.040 0.113 0.045 0.104 0.144 0.242 Head has secondary advanced education and higher 0.612 0.297 0.084 0.067 0.262 0.090 0.232 0.361 1.000 Share of hh_membr_014 0.053 0.133 0.049 0.056 0.135 0.052 0.126 0.170 0.152 Share of hh_membr_1524 0.031 0.090 0.031 0.047 0.090 0.035 0.082 0.114 0.093 Share of hh_membr_60older 0.034 0.095 0.031 0.054 0.095 0.038 0.085 0.121 0.099 HH Head did any wage work during the last 7 days 0.027 0.064 0.024 0.043 0.062 0.028 0.057 0.081 0.072 HH Head was self-employed (non-farm) during the last 7 days 0.048 0.091 0.034 0.052 0.085 0.041 0.080 0.107 0.110 urban==Urban 1.000 0.367 0.187 0.052 0.439 0.178 0.366 0.409 0.619 Household owns a motor vehicle 0.223 0.093 0.046 0.204 0.097 0.182 0.306 Household owns a bicycle 0.064 0.027 0.043 0.064 0.028 0.056 0.082 Household owns a mobile phone 0.097 0.032 0.049 0.094 0.047 0.087 0.120 Household owns a video/dvd 0.142 0.041 0.041 0.149 0.047 0.120 0.169 Household owns a television 1.000 0.937 0.506 1.000 1.000 1.000 1.000 Household owns a computer 0.559 0.205 0.460 0.493 0.288 0.480 0.727 Household owns a refrigerator/freezer 0.505 0.210 0.051 0.481 0.259 0.513 0.592 Household owns an air c/fans 0.154 0.066 0.032 0.174 0.045 0.131 0.189 Household owns a radio 0.080 0.029 0.041 0.076 0.036 0.070 0.100 Household owns a mosquito net 0.068 0.025 0.048 0.065 0.031 0.061 0.086 Log of per capita residential area 0.188 0.092 0.070 
0.210 0.098 0.178 0.253
Concrete/Metal sheets/Tiles 0.096 0.035 0.056 0.098 0.038 0.086 0.124
Burnt bricks/Concrete 0.158 0.059 0.048 0.173 0.054 0.143 0.187
Floor is improved 0.571 0.383 0.064 0.620 0.364 0.573 0.607
Piped water/Truck 0.138 0.039 0.046 0.152 0.048 0.123 0.163
Flush/VIP 0.709 1.000 0.573 0.863 0.553 0.649 0.723
Log of real per adult equivalent food consumption, annual, 2010/11 prices 0.663
Log of real per adult equivalent non-food consumption, annual, 2010/11 prices 1.000
Log of real per adult equivalent furnishings & household expenses cons, annual, 2010/11 prices 0.248
Log of real per adult equivalent health expenditure, annual, 2010/11 prices 0.087
Log of real per adult equivalent education expenditure, annual, 2010/11 prices 0.164
Log of real per adult equivalent electricity & kerosene & water cons, annual, 2010/11 prices 0.455 0.333
Predictive accuracy
RMSE for testing sub-sample 0.60 0.50 0.28 0.34 0.48 0.47 0.49 0.48 0.53
Corresponding minimum number of variables 4 5 13 23 5 10 5 5 4
RMSE 0.59 0.48 0.24 0.34 0.46 0.45 0.47 0.44 0.49

Note: The values are scaled proportional to the largest value in the set. The model is trained using Tanzania 2020/21 round. The training data is split into two subsets: 50% used for training (training sub-sample) and 50% used for validation (testing sub-sample).

Figure A.1. Predicted Poverty Rates Based on Imputation, from NPS 2019/20 and NPS 2020/21 to TA, Empirical Distribution of the Error, Tanzania

[Figure: three panels (TA 1, TA 2, TA 3). Horizontal axis: imputation model (m1 to m9 for TA 1; m1, m2, m3, m8, m9 for TA 2; m1, m2, m3, m4, m8, m9 for TA 3). Vertical axis: estimated poverty rate (%), from 10 to 30. Series: NPS 2019/20 and NPS 2020/21.]

Note: 1000 simulations are implemented. Model 3 in TA 2 and Models 3 and 4 in TA 3 use standardized variables. The dashed lines represent the true poverty rates for TA 1 using the 2019/20 and 2020/21 poverty lines. The dotted lines represent confidence intervals of the true poverty rates.

Figure A.2. Further Meta-analysis with Between-Year Imputations, Marginal Effects from Logit Regressions

[Figure: marginal effects with error bars for the following regressors: Imputation method is normal; Poverty line is consistent; Base year is NPS 2020/21; Treatment Arm 2; Treatment Arm 3; Quarterly WB CPI; Monthly WB CPI; Annual IMF CPI. Horizontal axis from -.5 to 1.5.]

Note: The figure displays the results from logit regressions. The dependent variable is a binary variable that indicates whether the predicted poverty rate is within the 95% confidence interval of the true poverty rate. The robust standard errors are clustered at the model level. The reference groups are empirical distribution of the error terms for the “imputation methods,” inconsistent poverty line for “the type of the poverty line,” NPS 2019/20 for the “base year,” TA 1 for the “type of the questionnaire,” annual WB disaggregated CPI for the “inflation source.” The consistent poverty line is defined as poverty line measured at the base year (i.e., the 2020/21 poverty line used for the imputations from 2020/21 to TAs and the 2019/20 poverty line used for the imputations from 2019/20 to TAs). The error bars are 95% confidence intervals.

Figure A.3. Predicted Poverty Rates Based on Imputation, from NPS 2020/21 to Target Survey Sample TA 1 for Different Sample Sizes, using 2020/21 Poverty Line, Normal Linear Regression Model, Tanzania

[Figure: four panels (Model 1, Model 2, Model 8, Model 9). Horizontal axis: % of target survey, from 10 to 100 in steps of 10, with sample sizes in parentheses: (71), (142), (213), (284), (356), (427), (498), (569), (640), (711). Vertical axis: predicted poverty rate, from 10 to 40. Series by share of the base survey: 10% (464), 20% (930), 30% (1393), 100% (4644).]

Note: The normal linear regression model with the theoretical distribution of the error terms employs cluster random effects. The imputed poverty rates for TA 1 use the estimated parameters based on NPS 2020/21. 1000 simulations are implemented. All estimates are obtained with the population weights. The true poverty rate for TA 1 is the estimate directly obtained using the consumption data in TA 1 and the poverty line from NPS 2020/21 deflated to 2022 prices. The sample size of TA 1 is selected as a percentage of the target survey varying from (randomly selected) 10% to 100% of the TA sample (with the sample size in parentheses). The sample size of NPS 2020/21 is selected as a percentage of the base survey sample, varying from (randomly selected) 10%, 20%, 30% to 100% of the NPS 2020/21 sample (with the sample size in parentheses). The total sample size of TA 1 is 711 households, and the sample size of NPS 2020/21 is 4,644 households.

Appendix B: Further description of the food and non-food consumption aggregates in the TZNPSs and the experiment

The main consumption aggregate in NPS consists of the food and non-food consumption expenditure aggregates.
Food component

Food consumption is made up of expenditures on thirteen food categories: cereals and cereal products; starches; sugars and sweets; dry pulses; nuts and seeds; vegetables; fruits; meat products and fish; milk and milk products; oils and fats; spices and other foods; non-alcoholic beverages; and alcoholic beverages. It includes all sources of consumption, such as purchased food, meals eaten outside, food produced at home, and gifted food. The value of food that was eaten, including non-purchased food, is incorporated into the welfare measurement.

Non-food component

NPS collects data covering a wide array of non-food items, including health, education, utilities and energy sources (water, kerosene, electricity, matches, bulbs, and charcoal), furnishings and household items, clothing, transport, communication, recreation, and other non-food consumption. Each non-food category has a specific reference period based on its frequency of purchase or consumption: for example, expenses for mobile phones and personal care are gathered over the last month, and expenses on furnishings over the past twelve months.

Some non-food goods and services were excluded from the consumption aggregate in earlier rounds of NPS, either because data on those items were not collected or because the data were incomparable with other rounds. As a result, there are two methods for defining the consumption aggregate. The new classification method includes three additional items in non-food consumption (the use value of durables, imputed rent on user-occupied property, and clothing expenses), while the old method excludes them. Hence, the new method yields higher figures for non-food and total consumption, owing to the inclusion of durables and imputed rent. We employ the new method to construct the consumption aggregates for analysis.
The descriptive statistics for the consumption aggregate calculated using the new method and the old method are shown respectively in Table 1 in the main text and Table B.1 below. In all NPS rounds as well as the experimental round TA 1, food consumption can be disaggregated into 83 individual commodities, while in TA 2, only 17 items are reported. Quantities and unit costs are recorded from which total expenditures are computed. Finally, in TA 3, only monetary expenditures on 13 basic food categories are surveyed, without surveying quantities or prices. 71 Table B.1. Descriptive statistics, old method Difference Variables NPS 2019/20 NPS 2020/21 TA 1 TA 2 TA 3 TA 2-TA 1 TA 3-TA 1 TA 3-TA 2 1,222,517.32 1,373,065.65 1,217,650.85 Total household expenditures (74,322.95) (62,880.44) (51,738.37) Total food and non-alcoholic 827,883.46 863,962.65 838,428.94 652,206.43 578,765.01 -186222.5*** -259663.9*** -73,441.4*** expenditures (40,583.02) (14,057.96) (29,529.69) (20,823.03) (21,470.64) (26373.9) (25870.8) (20,021.8) 372,556.15 508,775.31 377,530.31 245,480.26 -132050.0*** Total non-food expenditures 34,474.17 55,263.20 28,080.77 19,928.04 (24185.1) 47,791.26 105,111.39 101,997.02 50,953.59 -51043.4*** Health expenditures 4,645.77 15,402.42 11,856.50 5,144.21 (11955.5) 65,397.27 81,120.01 52,454.27 39,537.99 -12916.3* Education expenditures 9,591.65 6,315.51 6,105.98 6,552.07 (7600.1) Utilities: Water, Kerosene, El., 55,163.38 56,605.99 47,764.41 50,880.13 44,317.04 3,115.7 -3,447.4 -6,563.1* Matches, Bulbs, Charcoal 5,700.76 2,158.06 5,067.41 6,014.63 4,426.19 (4,274.0) (3,021.5) (3,774.2) Furnishing and household 21,781.89 33,105.06 18,974.28 15,264.53 -3709.8*** expenses (2,693.92) (3,556.79) (1,134.98) 1,118.15 (1348.0) 7.17 6.38 6.25 6.77 6.71 0.5 0.5 -0.1 Household size (0.52) (0.10) (0.17) (0.49) (0.30) (0.5) (0.3) (0.5) 47.59 45.42 45.47 45.09 45.60 -0.4 0.1 0.5 Age of HH Head (0.86) (0.26) (0.58) (0.62) (0.73) (0.8) (0.8) (0.9) 0.19 0.23 0.23 0.21 0.23 -0.0 
-0.0 0.0 HH Head is Female (0.02) (0.01) (0.02) (0.02) (0.02) (0.0) (0.0) (0.0) Head does not have formal 0.16 0.21 0.19 0.19 0.20 -0.0 0.0 0.0 education (0.02) (0.01) (0.02) (0.02) (0.03) (0.0) (0.0) (0.0) 0.68 0.63 0.64 0.65 0.63 0.0 -0.0 -0.0 Head has primary education (0.03) (0.01) (0.02) (0.02) (0.03) (0.0) (0.0) (0.0) Head has secondary ordinary 0.11 0.12 0.13 0.12 0.13 -0.0 0.0 0.0 education (0.01) (0.01) (0.02) (0.02) (0.02) (0.0) (0.0) (0.0) Head has secondary advanced 0.03 0.04 0.04 0.03 0.04 -0.0 -0.0 0.0 education and higher (0.01) (0.00) (0.01) (0.01) (0.01) (0.0) (0.0) (0.0) 0.43 0.44 0.44 0.43 0.43 -0.0 -0.0 0.0 Share of HH members in 0-14 (0.01) (0.00) (0.01) (0.01) (0.01) (0.0) (0.0) (0.0) 0.18 0.19 0.19 0.20 0.20 0.0 0.0 -0.0 Share of HH members in 15-24 (0.01) (0.00) (0.01) (0.01) (0.01) (0.0) (0.0) (0.0) 0.33 0.33 0.33 0.32 0.33 -0.0 -0.0 0.0 Share of HH members in 25-59 (0.01) (0.00) (0.01) (0.01) (0.01) (0.0) (0.0) (0.0) Share of HH members in 60 and 0.06 0.04 0.04 0.04 0.04 0.0 -0.0 -0.0 older (0.00) (0.00) (0.00) (0.00) (0.00) (0.0) (0.0) (0.0) HH Head did any wage work 0.26 0.27 0.21 0.27 0.26 0.1** 0.0 -0.0 during the last 7 days (0.03) (0.01) (0.02) (0.03) (0.02) (0.0) (0.0) (0.0) HH Head was self-employed 0.20 0.22 0.21 0.23 0.20 0.0 -0.0 -0.0 (non-farm) during the last 7 days (0.02) (0.01) (0.02) (0.02) (0.02) (0.0) (0.0) (0.0) Household owns a motor vehicle 0.17 0.13 0.16 0.17 0.16 0.0 -0.0 -0.0 72 (0.02) (0.01) (0.02) (0.02) (0.02) (0.0) (0.0) (0.0) 0.33 0.37 0.31 0.37 0.34 0.1** 0.0 -0.0 Household owns a bicycle (0.03) (0.01) (0.02) (0.03) (0.03) (0.0) (0.0) (0.0) 0.89 0.88 0.87 0.90 0.89 0.0 0.0 -0.0 Household owns a mobile phone (0.01) (0.01) (0.02) (0.02) (0.02) (0.0) (0.0) (0.0) 0.18 0.14 0.12 0.13 0.12 0.0 -0.0 -0.0 Household owns a video/DVD (0.02) (0.01) (0.02) (0.02) (0.02) (0.0) (0.0) (0.0) 0.25 0.28 0.28 0.28 0.27 0.0 -0.0 -0.0 Household owns a television (0.02) (0.01) (0.03) (0.03) (0.03) (0.0) (0.0) (0.0) 0.02 0.03 0.03 
0.03 0.04 -0.0 0.0 0.0 Household owns a computer (0.01) (0.00) (0.01) (0.01) (0.01) (0.0) (0.0) (0.0) Household owns a 0.08 0.09 0.08 0.10 0.09 0.0 0.0 -0.0 refrigerator/freezer (0.01) (0.01) (0.02) (0.02) (0.01) (0.0) (0.0) (0.0) 0.07 0.07 0.07 0.07 0.07 -0.0 0.0 0.0 Household owns an AC/fan (0.01) (0.01) (0.01) (0.01) (0.01) (0.0) (0.0) (0.0) 0.49 0.44 0.42 0.46 0.39 0.0 -0.0 -0.1** Household owns a radio (0.03) (0.01) (0.02) (0.03) (0.03) (0.0) (0.0) (0.0) 0.85 0.85 0.83 0.86 0.86 0.0 0.0 -0.0 Household owns a mosquito net (0.02) (0.01) (0.02) (0.02) (0.02) (0.0) (0.0) (0.0) 0.44 0.45 0.46 0.48 0.48 0.0* 0.0 -0.0 Log of per capita residential area (0.01) (0.01) (0.01) (0.01) (0.01) (0.0) (0.0) (0.0) Roof is made of concrete/metal 0.85 0.84 0.83 0.85 0.85 0.0 0.0 -0.0 sheets/tiles (0.02) (0.01) (0.02) (0.03) (0.02) (0.0) (0.0) (0.0) Wall is made of burnt 0.62 0.55 0.60 0.56 0.56 -0.0 -0.0 -0.0 bricks/concrete (0.03) (0.02) (0.03) (0.04) (0.03) (0.0) (0.0) (0.0) Floor is made of 0.51 0.52 0.48 0.47 0.50 -0.0 0.0 0.0 concrete/cement/tiles/timber (0.03) (0.01) (0.03) (0.03) (0.03) (0.0) (0.0) (0.0) Source of drinking water: piped 0.41 0.44 0.40 0.38 0.34 -0.0 -0.1** -0.0 water/truck (0.04) (0.02) (0.04) (0.04) (0.04) (0.0) (0.0) (0.0) 0.33 0.35 0.36 0.35 0.34 -0.0 -0.0 -0.0 Toilet: flush/VIP (0.03) (0.01) (0.03) (0.03) (0.03) (0.0) (0.0) (0.0) 0.25 0.28 0.25 0.24 0.27 -0.0 0.0* 0.0** Residence: Urban (0.02) (0.01) (0.04) (0.04) (0.04) (0.0) (0.0) (0.0) 0.75 0.72 0.75 0.76 0.73 0.0 -0.0* -0.0** Residence: Rural (0.02) (0.01) (0.04) (0.04) (0.04) (0.0) (0.0) (0.0) Number of observations 1179 4644 711 701 698 1412 1409 1399 Note: The standard errors are in parentheses, and the differences are estimated considering the complex survey design. * p < 0.1, ** p < 0.05, *** p < 0.01. The population weights are applied. The consumption data in NPS 2019/20 and in NPS 2020/21 are deflated to 2022 prices using the annual WB CPI. 
Appendix C: Additional Formulas and Proofs

We briefly review below some relevant technical details and the intuition of our proposed imputation method from Dang et al. (2017), Dang et al. (2019), and Dang and Lanjouw (2023). Further details and formal proofs are provided in these studies.

Given our consumption model for the base survey (survey 1) and the target survey (survey 2), we can write out Equation (1) in full as follows

$$y_1 = \beta_1' x_1 + \upsilon_{c1} + \varepsilon_1 \tag{C.1}$$

$$y_2 = \beta_2' x_2 + \upsilon_{c2} + \varepsilon_2 \tag{C.2}$$

By Assumption 1, since both $x_1$ and $x_2$ are representative of the population at the two survey periods (or at the same time), we can replace $x_1$ with $x_2$ in Equation (C.1) to obtain the imputed household consumption in survey 2 as

$$y_2^1 = \beta_1' x_2 + \upsilon_{c1} + \varepsilon_1 \tag{C.3}$$

Further given Assumption 2 that changes in the distributions of the explanatory variables between the two periods can capture the change in the poverty rate in the next period, we can obtain poverty estimates in the absence of actual consumption data in the target survey

$$P(y_2^1 \le z_1) = P(y_2 \le z_2) \tag{C.4}$$

where $P(.)$ is the given poverty function and $z_j$ is the poverty line in survey $j$, $j$ = 1 and 2. Since the poverty function $P(.)$ is defined as the averaged poverty rate for the population, it is an expectation function.
Using the iterated expectation rule, combining Equations (C.3) and (C.4), and plugging in the estimated parameters for $\beta_1$ and the estimated distributions of the error terms for $\upsilon_{c1}$ and $\varepsilon_1$, we have

$$E\big(P(\hat{\beta}_1' x_2 + \hat{\upsilon}_{c1} + \hat{\varepsilon}_1 \le z_1)\big) = P(y_2 \le z_2) \tag{C.5}$$

Since we do not exactly know the predicted error terms $\hat{\upsilon}_{c1}$ and $\hat{\varepsilon}_1$ in survey 1 that are associated with the characteristics $x_2$ for each household in survey 2, we can simulate these error terms from their estimated distributions and approximate the first term on the left-hand side of Equation (C.5) as

$$\frac{1}{S}\sum_{s=1}^{S} P(\hat{y}^1_{2,s} \le z_1) \rightarrow E\big(P(\hat{\beta}_1' x_2 + \hat{\upsilon}_{c1} + \hat{\varepsilon}_1 \le z_1)\big) \tag{C.6}$$

where $\hat{y}^1_{2,s} = \hat{\beta}_1' x_2 + \tilde{\hat{\upsilon}}_{c1,s} + \tilde{\hat{\varepsilon}}_{1,s}$, and $\tilde{\hat{\upsilon}}_{c1,s}$ and $\tilde{\hat{\varepsilon}}_{1,s}$ represent the $s$th random draw from their estimated distributions for $\upsilon_{c1}$ and $\varepsilon_1$, for $s = 1, \ldots, S$ (see, e.g., Gourieroux and Monfort, 1996). The number of simulations $S$ should thus be large enough for the term on the left-hand side to converge to the term on the right-hand side of Equation (C.6), which results in Equation (4). We recommend using 1,000 simulations.

The terms on the left-hand side of Equations (C.4) and (C.5) also make it clear that we should use the poverty line $z_1$ in the base survey (rather than the poverty line $z_2$ in the target survey) together with the imputed consumption $\hat{y}_2^1$ based on the base survey to obtain poverty estimates. Put differently, since all the estimates for $\beta_1$, $\upsilon_{c1}$, and $\varepsilon_1$ in Equation (C.5) come from the base survey, the poverty line should also come from the same base survey. This is an advantage of survey imputation methods because it helps preclude various data challenges, such as obtaining the right consumption deflators for the target survey, or ensuring that various other variables in the target survey (such as the new poverty line, the new consumption basket, or the new consumption aggregate itself) are constructed in a manner comparable to those in the base survey.
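To make the simulation step in Equation (C.6) concrete, the following is a minimal Python sketch, not the authors' implementation (the paper's own routine is a Stata module). It assumes that consumption and the poverty line are both in logs, that error terms are drawn with replacement from the empirical distributions of the base-survey residuals, and that households are unweighted. The function name `simulate_poverty_rate` and all argument names are hypothetical.

```python
import numpy as np


def simulate_poverty_rate(x2, beta_hat, cluster_ids, ups_resid, eps_resid,
                          z1, n_sims=1000, seed=0):
    """Approximate the left-hand side of (C.6); an unweighted, illustrative sketch.

    x2          : (n, k) target-survey covariates (hypothetical input)
    beta_hat    : (k,) coefficients estimated on the base survey
    cluster_ids : (n,) cluster identifier of each target-survey household
    ups_resid   : estimated cluster effects from the base survey
    eps_resid   : estimated idiosyncratic residuals from the base survey
    z1          : base-survey poverty line, on the same (log) scale as y
    """
    rng = np.random.default_rng(seed)
    xb = np.asarray(x2, dtype=float) @ np.asarray(beta_hat, dtype=float)
    # Map each household to the index of its cluster.
    clusters, cluster_idx = np.unique(np.asarray(cluster_ids).ravel(),
                                      return_inverse=True)
    rates = np.empty(n_sims)
    for s in range(n_sims):
        # One cluster-effect draw per cluster, shared by its households.
        ups = rng.choice(ups_resid, size=clusters.size)[cluster_idx]
        # One idiosyncratic draw per household.
        eps = rng.choice(eps_resid, size=xb.size)
        y_sim = xb + ups + eps               # simulated consumption, draw s
        rates[s] = np.mean(y_sim <= z1)      # headcount poverty rate, draw s
    # The average over draws approximates the poverty rate in (C.6).
    return rates.mean(), rates
```

Returning the per-draw rates alongside their average keeps the ingredients needed for a variance calculation across simulations. Sampling weights, which the paper applies throughout, are deliberately left out of this sketch.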
The variance formula in Equation (5) is based on the total variance formula provided in Equation (5.20) in Little and Rubin (2019), where

$$V(\hat{P}_2) = \frac{1}{S}\sum_{s=1}^{S} V(\hat{P}_{2,s} \mid x_2) + V\Big(\frac{1}{S}\sum_{s=1}^{S} \hat{P}_{2,s} \mid x_2\Big) + \frac{1}{S} V\Big(\frac{1}{S}\sum_{s=1}^{S} \hat{P}_{2,s} \mid x_2\Big) \tag{C.7}$$

When $S$ tends to infinity (or is practically large enough), the third term on the right-hand side of Equation (C.7) will vanish, thus the stated result follows. Consequently, as discussed above with Equation (C.6), we suggest that 1,000 simulations or more should be used to obtain the poverty estimates and their variances. Dang et al. (2019) further show that the proposed imputation method provides less bias and a better variance than wealth indexes and most proxy means testing methods, largely because it better takes into account the variances of the unobserved effects $\upsilon_{c1}$ and $\varepsilon_1$.

Making the additional, but standard, assumption that the variables to be standardized between the two surveys have a normal distribution, such that $x_j \sim N(\theta_j, \sigma_j^2)$ for survey $j$, we can standardize the variables in survey 2 according to those in survey 1 as follows

$$x_{2 \to 1} = (x_2 - \theta_2)\frac{\sigma_1}{\sigma_2} + \theta_1 \tag{C.8}$$

where $x_{2 \to 1}$ is the standardized variable in survey 2. The proof is given in Dang et al. (2017), who further suggest that we can relax the assumption of normality and assume more generally that the distributions of $x_1$ and $x_2$ belong to the same location-scale family (see, e.g., Casella and Berger (2002, p. 104)). In addition, Dang et al. (2017) also offer a formula to standardize the variables over multiple periods given further assumptions.

To make the notation simpler, we do not show the equations above with sampling weights. However, sampling weights can be straightforwardly incorporated in the formulas.
For example, consider a commonly used stratified two-stage sample design where the population is divided into $R$ strata, $n_r$ clusters are sampled out of a population total of $N_r$ clusters from stratum $r$ in the first stage, and $m_{rc}$ households are sampled out of a population total of $M_{rc}$ households from cluster $c$ in stratum $r$ in the second stage. Suppose that all individuals within a household share the same poverty status (i.e., poverty is measured at the household level), and household $h$ has $m_{rch}$ members. The formula for estimating the poverty rate at the $s$th random draw given in Equation (4) can be modified accordingly as

$$P(\hat{y}^1_{2,s} \le z_1) = \sum_{r=1}^{R}\sum_{c=1}^{n_r}\sum_{h=1}^{m_{rc}} w_{rch}\, P(\hat{y}^1_{2rchi,s} \le z_1) \tag{C.9}$$

with the sampling weight $w_{rch} = \left(\frac{N_r}{n_r}\right)\left(\frac{M_{rc}}{m_{rc}}\right) m_{rch}$, and the subscript $i$ indexing individual $i$ in household $h$. We provide a Stata user-written routine that automates the estimation procedures (Dang and Nguyen, 2014).

Dang and Lanjouw (2023) discuss several scenarios where survey-to-survey imputation is most useful in the absence of consumption data. These include i) filling in missing data gaps (most commonly in poorer countries), ii) providing an alternative to conducting new surveys that are prohibitively expensive or for which technical and administrative capacity is unavailable, iii) overcoming issues of non-comparability in existing surveys or side-stepping the non-availability of reliable price deflators, and iv) back-casting consumption from a more recent to an older survey for better comparison with older surveys.

Additional references

Casella, G., & Berger, R. L. (2002). Statistical inference. 2nd Edition. California: Duxbury Press.

Dang, H. A., & Nguyen, M. (2014). POVIMP: Stata module to provide poverty estimates in the absence of actual consumption data. Statistical Software Components S457934. Boston College, Department of Economics.

Dang, H. A., Jolliffe, D., & Carletto, C. (2019). Data gaps, data incomparability, and data imputation: A review of poverty measurement methods for data‐scarce environments. Journal of Economic Surveys, 33(3), 757-797.

Gourieroux, C., & Monfort, A. (1996). Simulation-based econometric methods. Oxford University Press.