WPS8403

Policy Research Working Paper 8403

To Impute or Not to Impute? A Review of Alternative Poverty Estimation Methods in the Context of Unavailable Consumption Data

Hai-Anh H. Dang

Development Economics
Development Data Group
April 2018

Abstract

There is an increasingly stronger demand for more frequent and accurate poverty estimates, despite the oftentimes unavailable household consumption data. This paper offers a review of alternative imputation methods that have been employed to provide poverty estimates in such contexts. These range from estimates on a nonmonetary basis, estimates for specific project targeting or tracking trends at the national level, to estimates at a more disaggregated level, as well as estimates of poverty dynamics. The paper provides a concise and accessible synthesis, which serves as an introduction to the literature. The focus is on intuition and practical insights that highlight the nuanced differences between the existing methods rather than technical aspects.

This paper is a product of the Development Data Group, Development Economics. It is part of a larger effort by the World Bank to provide open access to its research and make a contribution to development policy discussions around the world. Policy Research Working Papers are also posted on the Web at http://econ.worldbank.org. The author may be contacted at hdang@worldbank.org.

The Policy Research Working Paper Series disseminates the findings of work in progress to encourage the exchange of ideas about development issues. An objective of the series is to get the findings out quickly, even if the presentations are less than fully polished. The papers carry the names of the authors and should be cited accordingly. The findings, interpretations, and conclusions expressed in this paper are entirely those of the authors.
They do not necessarily represent the views of the International Bank for Reconstruction and Development/World Bank and its affiliated organizations, or those of the Executive Directors of the World Bank or the governments they represent.

Produced by the Research Support Team

Key words: poverty, imputation, consumption, wealth index, synthetic panels, household survey

JEL: C15, I32, O15

* Dang (hdang@worldbank.org) is an economist in the Survey Unit, Development Data Group, World Bank, a non-resident research scholar with School of Public and Environmental Affairs, Indiana University, and a non-resident senior research fellow with Vietnam’s Academy of Social Sciences. We would like to thank Gero Carletto, Dean Jolliffe, and Peter Lanjouw for helpful discussions on related work. We are grateful to the UK Department of International Development for funding assistance through its Strategic Research Program (SRP) and Knowledge for Change (KCP) program.

Introduction

Fighting poverty is a challenging and complex undertaking faced by policy makers in developing and richer countries alike. This undertaking can include different policy options ranging from crafting short-term safety-net programs that prevent vulnerable households from sliding into poverty during a time of crisis, to designing long-term plans that invest in education and skills formation to promote economic growth. But regardless of the specific poverty reduction strategies, a prerequisite for the success of the whole process is a clear understanding of poverty trends and dynamics (either at any single snapshot in time or over different time periods).
Indeed, inaccurate measurement naturally results in an inefficient, and even wasteful, use of resources if, say, short-term employment programs are employed to address a chronic poverty situation caused by inadequate infrastructure.

Perhaps the greatest challenge with poverty measurement is the fact that household consumption (or income) data, the underlying data source that provides poverty estimates, often do not meet the necessary requirements.1 For example, such data may simply be unavailable, or may not be comparable from one survey round to the next. As another example, household consumption data are seldom collected on the same households (or individuals) over time, thus making it difficult, if not impossible, to track the dynamics of these households’ movements into or out of poverty in different periods.

1 We use the terms “income” and “consumption” interchangeably in this review. The latter is often considered to offer better measures of household welfare, especially in developing countries (see, e.g., Deaton (1997)). We focus on monetary poverty in this paper; see Alkire et al. (2015) for a comprehensive discussion of the alternative approach of multi-dimensional poverty.

We offer in this paper a review of alternative poverty estimation methods in contexts where consumption data are unavailable. The economic literature on poverty imputation has grown rapidly in the past 20 years, and various methods have been developed. In fact, methods that have been proposed and discussed using different frameworks and terminology may turn out to be more similar than one might think.2 A typical development practitioner who does not keep regular track of the latest advances in the field may find it time-consuming, and perhaps even quite challenging, to stay abreast of this literature. At the same time, to our knowledge, currently very few studies provide a systematic introduction to this literature.
We thus aim to fill this gap by providing a succinct but user-friendly synthesis of existing poverty imputation methods. In particular, we aim to achieve the following objectives in this review:
i) Offer a classification of the various poverty imputation situations
ii) Lay out clearly the contexts where imputation is most relevant
iii) Provide an accessible description of imputation techniques, with explicit flags for common (or potential) mistakes
iv) Point out “gray” areas with current imputation methods that need more research.

2 This is not to mention the (beneficial) interactions between the economic literature and the well-established, and more general, statistical literature on missing data imputation. We return to more discussion in later sections. Given our focus on intuition and practical insights in this paper, we only provide a superficial description of the imputation methods, and we leave more technical details to footnotes where it is useful to do so. A more technical review with software instructions is offered in Dang, Jolliffe, and Carletto (2017).

Our goal is to offer a systematic discussion of imputation methods in a consistent format, which starts with each method’s motivation, followed by a brief description of the method, some recent application examples, and the remaining challenges. While this format may appear rigid, it offers a (somewhat) self-contained treatment of different methods. It also helps facilitate comparison and cross-reference between the various methods and highlight the nuanced differences among them. To help readers, particularly those who are new to the topic, obtain a quick overview of the literature and the “feel” behind each method, we will focus on offering the intuition rather than the technical details. Readers who are more familiar with these methods may also find some useful practical insights and suggestions for further research.
We will discuss several existing (well-cited) studies as examples, with a focus on developing countries.

This paper consists of eight sections. We provide a simple, but new classification of poverty imputation methods in the next section, which also offers a roadmap to the other sections in the paper. This roadmap can function both as a graphical illustration that links the different methods, and as a “user’s guide” that can help readers better identify the issues of their interest. It is subsequently followed by more detailed discussion for poverty estimates on a non-monetary basis (i.e., using wealth indexes) (Section II), poverty estimates for project targeting (Section III), and monitoring poverty trends at the national level (Section IV). We then discuss poverty estimates that are disaggregated below the national level (Section V) and estimates of poverty dynamics (Section VI) before offering some further reflections (Section VII) and the conclusion (Section VIII).

I. A “Roadmap” of Poverty Imputation Methods

I.1. Roadmap

The need to provide imputed poverty estimates varies from context to context, and nuanced differences exist between seemingly similar imputation methods. It can thus be useful to be explicit about the outcomes of interest, as well as the desired types of analysis and their associated challenges, before starting an imputation project. We offer in Figure 1 a simple list of five common questions that can be asked about key aspects of the poverty imputation process, and the suggested strategy to address each question to be discussed in more detail (in the section listed in parentheses). Put differently, this figure proposes a checklist in the decision process to identify the relevant imputation method.

Figure 1 suggests that the first question a researcher can ask is whether poverty estimates are to be produced on a monetary (or money-metric) basis.
If the answer to this question is no, the relevant imputation method is to construct an asset (wealth) index, and this method is discussed in more detail in Section II. If the answer is yes, the next useful question is whether the desired poverty estimate is nationally representative. If the answer to this question is no, proxy means testing is likely the relevant method (Section III). A yes to this question leads to the third question of whether the poverty estimate will be used for dynamics analysis, which involves an examination of household movements into or out of poverty based on synthetic panel data (Section VI). A no to this question leads to the fourth question of whether the poverty estimate will be disaggregated below the national level. If yes, techniques commonly known as “poverty mapping” (or small-area estimation techniques) are most relevant. Another name for these techniques is survey-to-census imputation, where the imputation model is first built from a survey and subsequently applied to a census, since the latter offers reliable disaggregate data (Section V). If no, survey-to-survey imputation methods are likely most appropriate. The final question is whether the imputation will involve surveys of the same design. Within-survey imputation (Section IV.1) and across-survey imputation (Section IV.2) should be respectively employed if imputation is done on surveys of the same design or of a different design. A related but different classification is offered in Dang et al. (2017a), where a poverty imputation situation is classified according to the degree of missing consumption data. We present a slightly modified version of this classification in Table 1, which offers three categories in a roughly decreasing order of the severity of missing consumption data: completely missing (Category A), partially missing (Category B), and available cross-sectional data but missing panel data (Category C). 
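The checklist in Figure 1, as described in the text, can be written as a short decision function. This is only an illustrative sketch: the function name, argument names, and returned labels are our own assumptions, and only the order of the questions and the section references come from the description above.

```python
def choose_imputation_method(monetary,
                             nationally_representative=False,
                             dynamics=False,
                             disaggregated=False,
                             same_survey_design=False):
    """Walk through the five questions of Figure 1 (hypothetical helper).

    Returns a (method, section) pair, where `section` is the section of
    the paper that discusses the method.
    """
    # Question 1: is the poverty estimate on a monetary basis?
    if not monetary:
        return ("wealth (asset) index", "II")
    # Question 2: is the desired estimate nationally representative?
    if not nationally_representative:
        return ("proxy means testing", "III")
    # Question 3: is the estimate used for poverty dynamics analysis?
    if dynamics:
        return ("synthetic panels", "VI")
    # Question 4: is the estimate disaggregated below the national level?
    if disaggregated:
        return ("survey-to-census imputation (poverty mapping)", "V")
    # Question 5: do the two surveys share the same design?
    if same_survey_design:
        return ("within-survey imputation", "IV.1")
    return ("across-survey imputation", "IV.2")


# Example: a national poverty trend updated with a labor force survey
method, section = choose_imputation_method(
    monetary=True, nationally_representative=True, same_survey_design=False
)
```

In this example the function returns across-survey imputation and points to Section IV.2.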
In short, while Figure 1 focuses on the functional or practical aspects of poverty imputation methods (and the associated roadmap pointing to the relevant discussion in this paper), Table 1 offers another classification that is more technical and data-oriented.

Table 1 also lists the typical poverty imputation (or corresponding data) situation, examples of surveys, and some recent studies corresponding to each missing data category. In particular, Category A can be further broken down into two data situations: one where the available survey produces no consumption data, and the other where the available survey is designed for project targeting purposes. Corresponding to these data situations are the Demographic and Health Surveys (DHS) and most small-scale surveys. Category B can also be further disaggregated into three different but related data situations: consumption data are incomparable across survey rounds, consumption data are unavailable in the current survey but available in some other related surveys, and consumption data are unavailable at more disaggregated administrative levels than those offered in the current survey. Finally, Category C addresses the widespread situation that most surveys in developing countries do not provide (nationally representative) household panel data. Table 1 also offers some recent studies that we will discuss in more detail in later sections.

I.2. Typical Estimation Equations

While we focus in this paper on the intuition behind existing studies, it can be useful to briefly discuss the commonly used empirical framework for clarity. Let $x_j$ be a vector of characteristics that represent all the factors that determine a household’s consumption, where j indicates the survey type.
More generally, j can indicate either another round of the same household expenditure survey, or a different survey (census), for j = 1, 2.3 Subject to data availability, $x_j$ can include household variables such as the household head’s age, sex, education, ethnicity, religion, language (i.e., which can represent household tastes), occupation, and household assets or incomes. Occupation-related characteristics can generally include whether the household head works, the share of household members that work, the type of work that household members participate in, as well as context-specific variables such as the share of female household members that participate in the labor force, or some variables at the region level. Other community or regional variables can also be added since these can help control for different labor market conditions.

3 More generally, j can indicate any type of relevant surveys that collect household data sufficiently relevant for imputation purposes such as labor force surveys or demographic and health surveys.

The following linear model is typically employed in empirical studies to project household consumption on household and other characteristics ($x_j$)

$$y_{ij} = \beta_j' x_{ij} + \mu_{ij} \qquad (1)$$

for household i in survey j, for i = 1,…, N. Equation (1) is often extended to a more general model

$$y_j = \beta_j' x_j + \upsilon_{cj} + \varepsilon_j \qquad (2)$$

where the error term $\mu_j$ is now broken down into two components: one ($\upsilon_{cj}$) is a cluster random effect and the other ($\varepsilon_j$) is the idiosyncratic error term. Note that we suppress the subscript that indexes households to make the notation less cluttered.4 For convenience, we also refer to the survey that we are interested in imputing poverty estimates for as the target survey, and the survey that we can estimate Equation (1) on as the base survey. The former survey is usually more recent (or offers more disaggregated information, as in the case of a census) and has no consumption data, while the latter is usually older and has consumption data.
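To make the framework concrete, the sketch below simulates a hypothetical base survey from Equation (2) and estimates the coefficients of Equation (1) by ordinary least squares. Everything specific here is an assumption for illustration: the covariates, the coefficient values, the error variances, and the treatment of $y$ as log consumption. Splitting the residual into a cluster mean and a household-level remainder is a simple moment-based shortcut, not the estimator used in the studies cited in this paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical base survey: 200 clusters with 20 households each
n_clusters, per_cluster = 200, 20
n = n_clusters * per_cluster
cluster = np.repeat(np.arange(n_clusters), per_cluster)

# Assumed characteristics x: intercept, head's years of schooling, household size
X = np.column_stack([
    np.ones(n),
    rng.integers(0, 13, size=n),
    rng.integers(1, 9, size=n),
])
beta_true = np.array([6.0, 0.08, -0.10])  # made-up coefficients

# Equation (2): y = beta'x + cluster random effect + idiosyncratic error
v = rng.normal(0.0, 0.3, size=n_clusters)[cluster]
eps = rng.normal(0.0, 0.5, size=n)
y = X @ beta_true + v + eps

# Equation (1): OLS projection of y on x
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
resid = y - X @ beta_hat  # estimate of the composite error mu

# Naive split of the residual into cluster and household components
cluster_mean = np.bincount(cluster, weights=resid) / np.bincount(cluster)
v_hat = cluster_mean[cluster]
eps_hat = resid - v_hat
```

In a real application the survey weights and sampling design would also enter the estimation, which this sketch ignores.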
We discuss the various poverty imputation methods starting from the next section.

4 Conditional on household characteristics, the cluster random effects and the error terms are usually assumed uncorrelated with each other and to follow a normal distribution such that $\upsilon_{cj} | x_j \sim N(0, \sigma^2_{\upsilon_j})$ and $\varepsilon_j | x_j \sim N(0, \sigma^2_{\varepsilon_j})$. While the normal distribution assumption results in the standard linear random effects model that is more convenient for mathematical manipulations and computation, it is not necessary for this type of model. As can be seen later, we can remove this assumption and use the empirical distribution of the error terms instead, albeit at the cost of somewhat more computing time.

II. Non-Monetary Poverty Estimates

Motivation

Consumption data are not always collected in a household survey for various reasons. The main reason is that since a typical consumption module (e.g., that of a Living Standards Measurement Survey) usually consists of up to hundreds of items of food and non-food consumption, it takes time and a certain level of technical expertise to design such a module well. Furthermore, these items need to be updated from time to time to reflect changes in household consumption patterns (e.g., buying a smart phone is becoming an increasingly common purchase these days, but might not have been so just 10 years ago). Other factors such as seasonal variations in household consumption (e.g., consumption during holidays can be larger than that during regular times) could only be appropriately accounted for by fielding the consumption module in different months throughout the year.
Consequently, collecting household consumption data requires considerable financial resources, time, and logistical capacity, which results in most household consumption surveys being implemented every few years rather than on an annual basis.5

Given this data situation, for the years when consumption data are not collected, but other non-consumption surveys such as labor force surveys (LFS) or small-scale surveys are available, these surveys may be “repurposed” to generate some substitute variable for consumption data. Indeed, one well-known example is the DHS, which (usually) do not collect consumption data but offer a wealth index instead.

5 We return to more discussion on this point in Section VII.

Method Description

A well-established method to produce poverty estimates for surveys that have no consumption data but have data on household assets and the physical characteristics of the house is to construct a wealth index from these assets. This method typically consists of two steps. The first step is to identify the list of assets to be used in the construction of the index, and the second step is to apply some aggregation method to convert these assets into a (single-valued) index. Filmer and Pritchett (2001), who popularized the use of such indexes in the economic literature, employ the following variables from the India National Sample Survey (NSS): household ownership of consumer durables (including clock/watch, bicycle, radio, television, sewing machine, refrigerator, and car), characteristics of the household’s dwelling (including indicators about toilet facilities, the source of drinking water, the rooms in the dwelling, the building materials used, and the main source of lighting and cooking), and household land ownership.
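The two steps above can be sketched as follows, using the first principal component as one possible aggregation method. The data are simulated, and the asset list, the noise levels, and the 40 percent relative cutoff are illustrative assumptions rather than values taken from any of the studies cited here.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 1000

# Step 1 (assumed asset list): six binary ownership indicators that all
# depend on a latent wealth level, which the analyst does not observe
latent = rng.normal(size=n)
thresholds = np.array([-1.0, -0.5, 0.0, 0.5, 1.0, 1.5])
noise = rng.normal(scale=0.8, size=(n, 6))
assets = (latent[:, None] + noise > thresholds).astype(float)

# Step 2: standardize the indicators and keep the first principal component
z = (assets - assets.mean(axis=0)) / assets.std(axis=0)
eigvals, eigvecs = np.linalg.eigh(np.cov(z, rowvar=False))
weights = eigvecs[:, -1]   # eigh sorts eigenvalues in ascending order
if weights.sum() < 0:      # the sign of an eigenvector is arbitrary
    weights = -weights     # orient it so that owning assets raises the index
wealth_index = z @ weights

# A wealth index only ranks households, so the poverty cutoff is relative:
# here the bottom 40 percent of the index distribution (an assumed cutoff)
cutoff = np.quantile(wealth_index, 0.40)
poor = wealth_index < cutoff
```

Because the index is built from a few binary indicators, many households share the same index value, so the share flagged as poor can fall somewhat below the nominal 40 percent.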
Filmer and Pritchett (2001) then utilize the principal component analysis (PCA) technique to create a wealth index, which offers different combinations of these assets that aim to capture as much variation in the data as possible.6

Variations exist on both of these steps. Since household questionnaires vary in different contexts, the list of assets to be employed depends on data availability. For example, Filmer and Scott (2012) examine all available asset variables in the DHS from four different countries and find these variables to vary across countries. In particular, their data sets contain between 12 (Uganda) and 29 (Nicaragua) indicators of asset ownership, and between 4 (Ghana) and 12 (Albania) and even as many as 37 (Zambia) indicators of housing characteristics. Besides the PCA technique, the aggregation method can range from simply counting (or adding up) all the indicators of asset ownership to using other techniques such as factor analysis (Sahn and Stifel, 2000).7

6 See, for example, Jolliffe (2002) for a comprehensive treatment of PCA methods.

7 Moser and Felton (2009) propose a variant of the PCA technique, where analysis is done on the components of each of several types of capital including physical capital, productive capital, human capital, and social capital. Another approach is to produce households’ ranks in the population with the number of consumption items they own, if we make the additional assumption that households place an order of importance on their consumption items when having to reduce their consumption expenditure (Deutsch, Silber, and Wang, 2017). Notably, a common mistake when constructing wealth indexes is to convert ordinal asset variables to dummy variables and then aggregate; see Kolenikov and Angeles (2009) for a careful analysis of this issue. An alternative approach is to collect data on a reduced set of consumption items that may offer strong correlation with the total consumption aggregate (Morris et al., 2000).

Examples and Remaining Challenges

Sahn and Stifel (2000) offer a well-cited study that constructs wealth indexes for two or more periods from the DHS for 11 Sub-Saharan African countries between 1986 and 1998. They find that poverty generally declined, largely due to improvements in rural areas. A recent study by Harttgen, Klasen, and Vollmer (2013) analyzes 160 DHS surveys from 33 African and 34 non-African countries and constructs wealth indexes in different ways. This study argues, however, that employing wealth indexes as a proxy for trends in household consumption is subject to various types of biases. These include changing preferences for certain assets (e.g., the increasing ownership of smart phones) and changing relative prices among different assets leading to more demand for one asset at the expense of others (e.g., the dramatically decreasing price of smart phones).

Other additional challenges exist with using wealth indexes to measure poverty. First, as suggested by the title of this section, wealth indexes offer a non-monetary or relative measure of poverty. Put differently, a wealth index can only identify a household as poor by its relative position on the population’s wealth distribution. This renders it difficult, if not impossible, to compare the poverty rate among different countries without assuming that all the countries under comparison have comparable distributions of wealth. Practically speaking, the list of assets must be either identical or comparable for all these countries, which is certainly harder to satisfy than comparing a consumption distribution denominated on a monetary basis such as the international Purchasing Power Parity (PPP) dollars.
For example, an air-conditioner may be a valuable asset in countries in a tropical climate, but it may not even be found in those in a frigid climate, and vice versa with assets such as a heater. Another inconvenience is that the poverty line (threshold) is relative and would need to be fixed for the whole distribution.8 Furthermore, wealth indexes tend to provide biased estimates of poverty rates (as measured by household consumption) and may only be able to help track poverty trends over time under special conditions (Dang et al., 2017a).

8 A similar concern applies to comparing the wealth index within a country (or countries) over time. In this case, assets for all countries from two periods must be pooled together to construct the wealth index.

We end this section on a cautious note that even if all the challenges discussed above are overcome, to our knowledge, several other technical challenges remain with the usage of wealth assets to measure poverty. For example, it remains unclear how many assets are sufficient in constructing a wealth index. Assets may be more comparable within a country, but it is unclear to what extent assets are comparable across countries (for cross-country analysis). It also remains unclear how to take into account the issue of quality versus quantity of assets (e.g., the ownership of one brand-new luxurious car can be different from that of three old and cheap cars).9

III. Poverty Estimates for Project Targeting

Motivation

Identifying poor households that are beneficiaries of social transfer projects is a common task which is also known as proxy means testing, but completing this task is not simple when there are no data on household consumption. Put differently, this case shares a similar constraint with that of the previous section: consumption data are not collected because of various costs or logistical constraints.
One major difference with proxy means testing, however, is that it targets a subset of the poor population that is to benefit from (government or NGO) subsidies rather than measuring poverty for the whole population. Another important feature is that proxy means testing is (mostly) implemented when there are two sources of household survey data: one is a nationally representative survey that collects both household consumption data and the variables ($x_j$, that are used in Equation (1)), and the other is the special and (much) lighter survey conducted to collect only the variables ($x_j$) for the purpose of proxy means testing.

9 Ngo (2018) offers a method to construct a utility-based living standards index based on the values of assets that may address this issue to some extent (assuming that asset quality is correctly measured by asset values). But one practical challenge with this approach is that it requires the collection of asset price data.

Method Description

Proxy means testing involves two steps. The first step is to estimate the household consumption model using the nationally representative household (or larger) survey, as expressed in Equation (1). Once this is done, the second step is to combine the predicted coefficients $\hat{\beta}$ obtained from the first step with the $x$ variables in the smaller survey to generate the predicted consumption (i.e., estimating the term $\hat{\beta}' x$) for household i in the smaller survey. In other words, proxy means testing employs the predicted coefficients $\hat{\beta}$ that come from the larger consumption survey and the $x$ variables in the smaller survey to generate the predicted household consumption, which is subsequently utilized to generate poverty estimates.10

Examples and Remaining Challenges

A major advantage of proxy means testing over wealth indexes is that the former offers an estimate of household consumption, while the latter an estimate of household wealth.
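The two steps of proxy means testing can be sketched on simulated data as follows. The covariates, coefficients, error variance, and poverty line are all made-up assumptions, and the consumption of the smaller survey is generated only so that the predicted headcount can be compared with a benchmark, which would not be possible in practice.

```python
import numpy as np

rng = np.random.default_rng(2)

def simulate(n):
    """Hypothetical survey: an intercept plus two household characteristics."""
    X = np.column_stack([np.ones(n), rng.normal(size=n), rng.normal(size=n)])
    y = X @ np.array([5.0, 0.4, -0.3]) + rng.normal(scale=0.6, size=n)
    return X, y

# Step 1: estimate Equation (1) on the larger survey with consumption data
X_large, y_large = simulate(5000)
beta_hat, *_ = np.linalg.lstsq(X_large, y_large, rcond=None)

# Step 2: apply the coefficients to the lighter survey, which only has x
X_small, y_small_unobserved = simulate(2000)
y_pred = X_small @ beta_hat  # predicted consumption, beta_hat'x only

poverty_line = 4.5  # assumed line, set below mean consumption
pmt_rate = np.mean(y_pred < poverty_line)
# Benchmark that is available only because the data are simulated
benchmark_rate = np.mean(y_small_unobserved < poverty_line)
```

Since the prediction leaves out the error term, predicted consumption is less dispersed than actual consumption, and with a poverty line below the mean the predicted headcount in this simulation falls short of the benchmark.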
Because it estimates consumption rather than wealth, proxy means testing perhaps provides a better estimate of poverty rates (that are based on household consumption data). Still, proxy means testing tends to offer biased poverty estimates. Indeed, Brown, Ravallion, and van de Walle (2016) offer a recent assessment of the performance of various proxy means testing methods using data for nine African countries. They find that standard proxy means testing is useful with targeting but excludes many poor people. Furthermore, even with some methodological adjustments, there is room for improvement with proxy means tests, particularly with identifying the poorest.

10 For recent reviews of proxy means tests, see Grosh et al. (2008), Coady et al. (2014), and Brown, Ravallion and van de Walle (2016).

The intuition behind this result is rather straightforward: the household poverty status is based on the household consumption that consists of two terms on the right-hand side of Equation (1), but proxy means testing likely offers biased estimates since it offers the estimate for only one of these terms (i.e., $\hat{\beta}' x$).11 Perhaps the largest advantage of proxy means testing is that for a smaller population group, it is rather simple to implement (relatively speaking, compared with other imputation methods). Consequently, it can offer a quick and inexpensive estimate of poverty in the absence of household consumption data. This, however, may come at the expense of estimate precision.

IV. Poverty Estimates for Tracking Trends at the National Level

Motivation

While proxy means testing targets a subset of the poor population that are to benefit from some social transfer programs, tracking trends in poverty rates at the national level requires poverty estimates for the whole population. The motivation for poverty imputation in this case is related to discussions regarding national poverty trends.
For example, the Sustainable Development Goals (SDGs) require frequent monitoring of (national and global) poverty trends that would perhaps require much more frequent collection of household consumption data than most existing household consumption surveys allow (see our earlier discussion with Figure 1).12 In particular, most developing countries (India, for example) do not collect household consumption data annually. As such, poverty estimates for these countries would have to be obtained using alternative methods such as poverty imputation methods.

11 Proxy means tests offer unbiased estimates of poverty only in the special case that the poverty line is set exactly at the mean consumption level (see Dang et al. (2017a) for a more detailed technical discussion). See also Diamond et al. (2016) for a careful comparison of the poverty score card (a variant of proxy means testing) and other (regression-based) poverty imputation methods.

12 The first goal of the SDGs is to eliminate extreme poverty and reduce national poverty levels at least by half by 2030. For details see https://sustainabledevelopment.un.org/sdg1.

In this case, similar to proxy means testing, we also need two sources of household survey data: one survey that collects both household consumption data and the variables ($x_j$, that are used in Equation (1)), and the other is another non-consumption survey that offers only the variables ($x_j$). One major difference is that both surveys should provide nationally representative data.

We can divide this case into two subcategories depending on whether the non-consumption survey is of the same design or a different design from the consumption survey. The former subcategory, hereafter referred to as within-survey imputation, includes situations where the household consumption data collected in the two surveys are not comparable over time due to changes to the consumption items (while the $x_j$ variables remain similar).
This situation occurs more often than one might think; for example, national statistical agencies regularly update the list of consumption items to reflect changes in household consumption patterns regarding high-technology goods such as smart phones or smart televisions. The latter subcategory, hereafter referred to as across-survey imputation, includes situations where the consumption survey was implemented a few years back, and a newer round is yet to be fielded. At the same time, there exists another more recent non-consumption survey such as a labor force survey (LFS) that can be combined with the consumption survey to provide more recent poverty estimates.

Both of these subcategories of within-survey imputation and across-survey imputation are also commonly known as survey-to-survey imputation (which is different from survey-to-census imputation or the poverty mapping technique discussed in the next section). We discuss next the techniques and challenges for both these subcategories.

Method Description

Similar to proxy means testing, survey-to-survey imputation methods consist of two main steps. The first step is to estimate the household consumption model using the nationally representative household (or larger) survey, as expressed in Equation (1). Once this is done, the second step is to combine the predicted coefficients $\hat{\beta}$ and the distribution of the error term obtained from the first step with the $x$ variables in the smaller survey to generate the predicted consumption.
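A stripped-down sketch of these two steps is given below, drawing the error term from the empirical distribution of the base-survey residuals. It is not the estimator of any study cited in this paper: the simulated data, the number of simulations, and the poverty line are assumptions, and the sketch ignores cluster random effects, survey weights, and the sampling uncertainty of $\hat{\beta}$.

```python
import numpy as np

rng = np.random.default_rng(3)

def simulate(n):
    """Hypothetical survey: an intercept plus two household characteristics."""
    X = np.column_stack([np.ones(n), rng.normal(size=n), rng.normal(size=n)])
    y = X @ np.array([5.0, 0.4, -0.3]) + rng.normal(scale=0.6, size=n)
    return X, y

X_base, y_base = simulate(5000)                  # base survey: x and consumption
X_target, y_target_unobserved = simulate(5000)   # target survey: x only

# Step 1: estimate Equation (1) on the base survey and keep the residuals
beta_hat, *_ = np.linalg.lstsq(X_base, y_base, rcond=None)
resid = y_base - X_base @ beta_hat

# Step 2: predicted consumption = beta_hat'x + a draw from the residuals,
# repeated over simulations and averaged
poverty_line = 4.5   # assumed poverty line
n_sim = 200
rates = np.empty(n_sim)
for s in range(n_sim):
    mu_draw = rng.choice(resid, size=len(X_target), replace=True)
    rates[s] = np.mean(X_target @ beta_hat + mu_draw < poverty_line)

imputed_rate = rates.mean()
# Benchmark that is available only because the data are simulated
benchmark_rate = np.mean(y_target_unobserved < poverty_line)
```

In this simulation, where the target survey shares the base survey's parameters and covariate distribution by construction, the imputed headcount lies close to the benchmark.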
However, different from proxy means testing, the predicted household consumption generated using survey-to-survey imputation methods is composed of both terms on the right-hand side of Equation (1), that is, the fitted component $\hat{\beta}'x_j$ and the estimated error term.13 Notably, the estimation framework utilized by most existing economic studies is largely based on the seminal survey-to-census imputation method offered by Elbers, Lanjouw, and Lanjouw (2003), which we discuss further in the next section.14 Most recently, building on the Elbers et al. (2003) method, Dang, Lanjouw, and Serajuddin (2017b) propose further improvements to the survey-to-survey poverty imputation method, which include simpler variance formulas, more guidance on the selection of control variables for model building, and formulas for standardization of variables from surveys with different sampling designs. They validate estimation results against both household consumption data and LFS data from Jordan before combining these two sources of data to provide more recent poverty estimates.

13 Furthermore, the estimated error term is usually disaggregated into a cluster random effects term and an idiosyncratic error term, as in Equation (2). This feature, as well as the addition of the cluster random effects term to the error term, results in more accurate estimates of household consumption and poverty than proxy means testing (see Dang et al. (2017a) for more detailed discussion). Also note that, for consistency, the poverty line in the base survey, rather than that in the target survey, should be used together with the predicted consumption to obtain poverty estimates.

14 Variants on this method exist. For example, Tarozzi (2007) proposes a two-step inverse probability weighting probit estimator, with the relevant weights derived in the first step from the change in the distribution of household characteristics across the two surveys. Mathiassen (2009) also employs a probit estimator, but proposes an exact expression for the standard errors and imposes a stricter parametric functional form on the error term. On the empirical front, Christiaensen et al. (2012) and Mathiassen (2013) apply the Elbers et al. (2003) framework to provide poverty estimates based on within-survey imputation for several countries including China, Kenya, the Russian Federation, Uganda, and Vietnam. Using the same technique, other studies combine the household consumption survey with other surveys such as the DHS (Stifel and Christiaensen, 2007) or the LFS (Mathiassen, 2009; Douidich et al., 2016) to provide across-survey imputation.

Examples and Remaining Challenges

Poverty reduction in India has been subject to intense debate in the past, owing to comparability issues over time across different rounds of the National Sample Survey (NSS), the country’s mainstream source of consumption survey data (Deaton and Kozel, 2005). Similar concerns, albeit to a lesser extent, were raised about the dramatic poverty reduction between 2004 and 2012 as well. One main reason is that the questionnaire design of the consumption module in the 2011/12 (68th) round of the NSS is not comparable to that in the 2009/10 (66th) round (and the 2004/05, or 61st, round), which may result in inconsistently constructed and incomparable consumption data. Dang and Lanjouw (in press) apply imputation methods to provide checks on the poverty trend. They first build an imputation model using the 2004/05 round as the base survey to obtain poverty estimates for 2009/10, which do not differ appreciably from the actual poverty rates. They subsequently employ the same model using the 2009/10 round as the base survey to obtain poverty estimates for 2011/12. These estimates are close to the actual rates in this year, thus providing supportive evidence for the swift fall in poverty observed in the data.
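A validation exercise of this kind can be summarized with a simple comparison. The sketch below, in Python, checks whether an imputed poverty rate falls inside an approximate 95 percent confidence band around the survey-based rate. It assumes simple random sampling and ignores both the imputation variance and the complex survey design that a full analysis would account for, so it should not be read as the test used by Dang and Lanjouw (in press). The function name and all numbers are hypothetical.

```python
import math


def within_sampling_band(imputed_rate, actual_rate, n_households, z=1.96):
    """Return (is_within, lower, upper) for an approximate confidence band.

    The band is actual_rate +/- z * sqrt(p * (1 - p) / n), which assumes
    simple random sampling (an assumption of this sketch).
    """
    std_error = math.sqrt(actual_rate * (1.0 - actual_rate) / n_households)
    lower = actual_rate - z * std_error
    upper = actual_rate + z * std_error
    return lower <= imputed_rate <= upper, lower, upper


# Hypothetical numbers, not taken from any survey.
ok, lower, upper = within_sampling_band(
    imputed_rate=0.305, actual_rate=0.30, n_households=4000
)
print(f"band = [{lower:.3f}, {upper:.3f}], imputed rate inside band: {ok}")
```

Where at least two earlier consumption rounds exist, the same comparison can be repeated for each pair of rounds before the model is applied to a survey without consumption data.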
A key assumption for survey-to-survey imputation is that the coefficients estimated from the previous consumption survey can be combined with the variables in the more recent survey to obtain poverty estimates.15 While concerns exist that this assumption is likely to be valid only under normal circumstances, rather than during periods of fast (economic growth and) poverty reduction, it has been shown to hold during a period of dramatic economic growth in China and Vietnam where poverty incidence was cut by around half (Christiaensen et al., 2012). Furthermore, a weaker version of this assumption has been proposed and validated for data from various countries such as India, Jordan, and Vietnam (Dang et al., 2017a; Dang et al., 2017b; Dang and Lanjouw, in press). Yet, we would like to note that the validity of this assumption can be context-specific, and it can be useful to check it using at least two previous rounds of household consumption surveys wherever such data are available. Common sense also suggests that the longer the time interval between the base survey and the target survey, the more likely it is that this assumption is violated.

15 This is also commonly known as the constant parameter assumption.

Another concern with survey-to-survey imputation or, more accurately, with across-survey imputation methods is that the variables used in the imputation should have the same distribution in the base survey and the target survey. This seemingly innocuous condition is often taken for granted in many studies, but if it is not satisfied, it may result in severely biased estimates (Dang et al., 2017b).16
The intuition is rather straightforward: variables such as household size or labor force participation may be defined differently in a household consumption survey and a labor force survey, and the data may accordingly be collected in different ways.17 Consequently, this condition should be carefully checked, and appropriate adjustments (e.g., standardizing the variables) should be made before imputation is implemented on surveys of a different design.

16 Dang et al. (2017b) also offer a simple method to standardize the variables from the two different surveys. This study also provides more discussion on the related issue of selecting variables in estimating the consumption model as in Equation (1).

17 The inconsistency between different surveys is well documented in studies using data from richer countries. For example, Abraham et al. (2013) examine the differences in employment data between the U.S. Current Population Survey and employer-reported administrative data. See also Angrist and Krueger (1999) for a related review of comparability and other data issues with a focus on labor force surveys.

V. Disaggregated Poverty Estimates at the Sub-National Level

Motivation

In most household consumption surveys, consumption data are rarely available at a disaggregated level (such as the state or province level) due to the typically limited sample size of household surveys. However, there exists a strong demand to produce poverty estimates at more disaggregated levels for various purposes such as social transfer targeting or budget allocation. For example, statistical agencies such as the U.S. Census Bureau routinely implement this task to
identify poor communities.18 This task of identifying poor households is commonly known as “poverty mapping” in most studies on developing countries, since poverty estimates are usually plotted on a map at a lower administrative level (such as that of a district or a local community).19 In other words, this case typically involves survey-to-census imputation, since only censuses can offer more disaggregated data than those available in a household survey. Put differently, survey-to-census imputation can often be regarded as a type of geographical imputation, which differs from the (mostly) temporal imputation offered by survey-to-survey imputation.

18 See, e.g., https://www.census.gov/srd/csrm/SmallArea.html.

19 Another name for this topic in the statistical literature is “small-area estimation” (see, e.g., Rao and Molina (2015) for a recent textbook treatment).

Method Description

Similar to proxy means testing and survey-to-survey imputation methods, survey-to-census imputation methods also consist of two main steps. The first step is to estimate the household consumption model in Equation (2) using the more aggregated-level household survey. The second step is to combine the predicted coefficients $\hat{\beta}$ and the distribution of the two estimated error terms (the cluster random effects term and the idiosyncratic error term) obtained from the first step with the variables in the census to generate predicted consumption data at a more disaggregated level.

Elbers et al. (2003) is perhaps the first study that introduces a formal framework for survey-to-census imputation in economics. Building on this framework, Tarozzi and Deaton (2009) propose another condition where the conditional distribution of household consumption given the household characteristics $x_j$ is the same for both the survey and the census. This assumption ensures that the estimated parameters from the smaller areas (as represented in the survey) can be imposed on the data for the larger areas (as represented in the census). Recent developments have been proposed, mostly
in the statistics literature to offer extensions or alternative estimation techniques to the Elbers et al. (2003) method.20

20 See, e.g., Bilton et al. (2017) for a proposal to use a classification trees technique for poverty mapping, and Das and Chambers (2017) for alternative standard error formulae with the Elbers et al. (2003) method. See also Pratesi (2016) for a recent collection of studies discussing various technical aspects of poverty mapping. Another study by Steele et al. (2017) applies machine learning techniques and big data (i.e., cell phone and satellite data) to evaluate poverty mapping.

Examples and Remaining Challenges

Elbers et al. (2007) use “poverty maps” from three countries for an ex ante evaluation of the distributional incidence of geographic targeting of public resources. They simulate the impact on poverty of transferring an exogenously given budget to geographically defined sub-groups of the population according to their relative poverty status. They find large gains from targeting smaller administrative units, such as districts or villages. They also suggest that poverty map-based geographic targeting can be combined with within-community targeting mechanisms for better estimation results. Lanjouw, Marra, and Nguyen (2017) employ small area estimation techniques to estimate the poverty indexes of Vietnam's provinces and districts in 2009. They find that poverty rates become more spatially concentrated over time, which is consistent with agglomeration-related growth processes. They offer simulation results suggesting that in both 1999 and 2009 geographic targeting for poverty alleviation improves upon a uniform lump-sum transfer, particularly for the more spatially disaggregated target populations.

We note that survey-to-census imputation shares an issue with across-survey imputation methods: the variables used in the imputation should have the same distribution in the (base) survey and the (target) census. The importance of this condition can hardly be overemphasized, given both the theoretical results and the empirical evidence (offered by Tarozzi and Deaton (2009) and Dang et al. (2017b)). However, to our knowledge, few studies offer explicit checks of this condition before implementing the imputation. Even fewer studies, if any, attempt to standardize the variables in both the survey and the census.

Another interesting and useful area that needs more research is how to produce and interpret the evolution of poverty maps over time. Multiple challenges exist with generating such dynamic poverty maps. One is that we would need survey and census data at two points in time, and these data sources should be comparable both at each point in time (i.e., the issue discussed immediately above) and over time (i.e., the issue of the constant parameter assumption as discussed with survey-to-survey imputation methods). While some alternative techniques have been proposed,21 it seems that this topic still needs more development.

VI. Dynamic Poverty Estimates

Motivation

Different poverty situations are best addressed with different policy responses. In particular, transitory and chronic poverty typically require different policy instruments, and no single policy may successfully address both. For example, while transitory poverty can be alleviated with safety net programs, chronic poverty would need to be tackled with structural and longer-term interventions such as investment in human capital and building infrastructure.22 However, poverty estimates based on cross-sectional data provide only static snapshots of poverty rates, rather than the dynamics of poverty transitions over time.
Absent a clear understanding of poverty dynamics, a seemingly unchanged poverty rate of, say, 15 percent in two periods could conceal dynamic processes ranging from zero mobility (i.e., where all households see no change in their welfare) to perfect mobility (i.e., where all poor households in the first period escaped poverty and were all replaced by households that had been non-poor in the first period) and any scenario between these two extremes. Dynamics analysis is crucial for the design of effective and efficient poverty reduction policies, but such analysis requires panel survey data that are usually unavailable (particularly for developing countries).

21 For example, Nguyen (2011) offers an innovative study that uses panel data from household surveys to estimate the relation between expenditure in the second period and household characteristics in the first period. The estimated parameters are then applied to a census in the second period to predict expenditure and poverty measures in a future third period. This approach may address, partially but not completely, the issues raised above.

22 See, e.g., Barrett (2005) and Ravallion (2016) for more discussion on different policy approaches to reduce poverty.

Method Description

In the absence of actual panel data, Dang et al. (2014) and Dang and Lanjouw (2013) propose methods to construct synthetic panels from repeated cross sections, which have provided encouraging results in various settings.23 Their techniques share certain similarities with survey-to-survey imputation methods and include the following steps. First, estimate the household consumption model in Equation (1) using the available cross sections to obtain the predicted coefficients $\hat{\beta}$ and the estimated standard deviations of the error terms. Second, estimate the correlation coefficient of the error terms, $\rho$, using cohort-aggregated household consumption between the two surveys. Third, combine these estimated parameters to obtain estimates of poverty mobility using bivariate probability formulae.
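The third step can be sketched in Python for a single household. The sketch assumes that the two error terms are jointly normal, so that the joint probabilities of poverty status in the two rounds follow from a bivariate normal distribution function; that function is evaluated here by numerical integration using only the standard library. The function names, parameter values, and poverty lines are hypothetical and serve only to show how the predicted consumption $\hat{\beta}'x$, the error standard deviations, and the correlation coefficient enter the calculation.

```python
import math


def norm_pdf(x):
    """Standard normal density."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def norm_cdf(x):
    """Standard normal distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def bvn_cdf(a, b, rho, n_steps=2000, lower=-8.0):
    """P(Z1 <= a, Z2 <= b) for standard normals with correlation rho, |rho| < 1.

    Integrates pdf(x) * cdf((b - rho * x) / sqrt(1 - rho^2)) over x <= a
    with Simpson's rule on [lower, a]; the mass below `lower` is negligible.
    """
    if a <= lower:
        return 0.0
    scale = math.sqrt(1.0 - rho * rho)

    def integrand(x):
        return norm_pdf(x) * norm_cdf((b - rho * x) / scale)

    h = (a - lower) / n_steps  # n_steps must be even for Simpson's rule
    total = integrand(lower) + integrand(a)
    for i in range(1, n_steps):
        total += (4.0 if i % 2 else 2.0) * integrand(lower + i * h)
    return total * h / 3.0


def poverty_transitions(mean1, mean2, sd1, sd2, rho, line1, line2):
    """Joint probabilities of poverty status in rounds 1 and 2.

    mean1 and mean2 stand for predicted (log) consumption in each round,
    sd1 and sd2 for the error standard deviations, rho for their correlation.
    """
    a = (line1 - mean1) / sd1
    b = (line2 - mean2) / sd2
    both_poor = bvn_cdf(a, b, rho)
    return {
        "poor, poor": both_poor,
        "poor, non-poor": norm_cdf(a) - both_poor,
        "non-poor, poor": norm_cdf(b) - both_poor,
        "non-poor, non-poor": 1.0 - norm_cdf(a) - norm_cdf(b) + both_poor,
    }


# Hypothetical parameter values for one household (all in logs).
probs = poverty_transitions(mean1=1.4, mean2=1.5, sd1=0.4, sd2=0.4,
                            rho=0.6, line1=1.3, line2=1.3)
for status, p in probs.items():
    print(f"{status}: {p:.3f}")
```

Averaging these household-level probabilities over the sample would give population-level mobility estimates; a higher correlation coefficient shifts probability mass toward the two cells in which poverty status does not change.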
A key difference, however, with survey-to-survey imputation methods is that the $x_j$ in Equation (1) should consist of time-invariant characteristics alone. These include such variables as ethnicity, religion, language, place of birth, and parental education, which provide the connectors between different rounds of cross sections with the appropriate age adjustments. For example, the cohorts age 25 to 55 in the first survey round are likely the same cohorts age 30 to 60 in the second survey round five years later.24 The effects of the time-varying variables are thus subsumed in the correlation coefficient of the error terms.

23 For example, recent applications and further validations of these synthetic panel methods include Ferreira et al. (2013), Cruces et al. (2015), and Vakis et al. (2015) for Latin American countries, Bourguignon, Moreno, and Dang (2018) and Foster and Rothbaum (2015) for Mexico, Balcazar et al. (2018) for Colombia, Martinez et al. (2013) for the Philippines, Garbero (2014) for Vietnam, Cancho et al. (2015) for countries in Europe and Central Asia, Dang and Ianchovichina (forthcoming) for countries in the Middle East and North Africa region, Dang and Dabalen (in press) and Dang, Lanjouw and Swinkels (2017) for Sub-Saharan African countries, and Dang and Lanjouw (2017 and forthcoming) for India, Vietnam, and the United States. Researchers at international organizations including the UNDP and the Asian Development Bank have also applied these methods for analysis of welfare mobility (UNDP, 2016; Jha et al., 2018); see also OECD (2015) for an application by the OECD to study labor transitions in richer countries. See also Gibson (2001) for a related study on how panel data on a subset of individuals can be used to infer chronic poverty for a larger sample.
This feature stands in contrast with the survey-to-survey imputation methods discussed in earlier sections, which aim to capture as many relevant (time-invariant and time-varying) variables as possible on the right-hand side of their estimation models.25

24 The age range can be adjusted similarly if there is a different time interval between the two survey rounds. Other time-varying household characteristics can also be used if retrospective questions about the round-1 values of such characteristics are asked in the second-round survey.

Examples and Remaining Challenges

Despite a growing collection of nationally representative panel surveys for African countries, coverage is unfortunately limited to only seven countries, and the time periods spanned by these panel surveys are mostly limited to three years or less. To overcome this data shortage, Dang and Dabalen (in press) construct synthetic panel data for more than 20 countries accounting for two-thirds of the population in Sub-Saharan Africa; these synthetic panels span an average of six years for each country. Their analysis suggests that these countries as a whole have had pro-poor growth, with one-third of the poor population escaping poverty, which is larger than the proportion of the population that fell into poverty in the same period. Chronic poverty, however, remains high and a considerable proportion of the population is vulnerable to falling into poverty.

Despite their increasing popularity, synthetic panels are not a perfect substitute for actual panel data. In particular, not much is known about whether, and how usefully, synthetic panels can be employed in applications that involve studying a causal relationship or regression analysis. The analysis offered to date in terms of profiling the poverty trajectories for population groups with these
synthetic panels is mostly descriptive, with little explicit attention to underlying causal mechanisms. More research is thus needed on the application of these synthetic panels in such contexts. Put differently, it is useful to know to what extent synthetic panels can substitute for actual panel survey data.

25 This difference is further accentuated with some missing data imputation methods in the statistical literature where sample design variables such as sampling weights, strata and cluster identifiers are also included in the estimation model (see, e.g., Rubin (1987)).

VII. Further Reflections on Related Issues

We are more often than not faced with contexts where we either have no consumption data, or the available consumption data have quality issues. Indeed, Serajuddin et al. (2015) find that over the period 2002-2011, almost one-fifth (i.e., 28) of the 155 countries have only one poverty data point in the WDI database, and as many as 29 countries do not have any poverty data point in the same period. Another recent survey by Beegle et al. (2016) indicates that only slightly more than half (i.e., 27) of the 48 countries in Sub-Saharan Africa had two or more comparable household surveys for the period between 1990 and 2012. Clearly, poverty imputation is useful, and may well be the only choice in these cases.

But what about other contexts where we have a choice between poverty imputation and implementing a full-scale household consumption survey? Dang et al. (2017a) suggest that even in these contexts, there are still advantages to poverty imputation methods, particularly in the following scenarios:
i) In the immediate term (when micro survey data are not fully available for all countries)
ii) Survey costs and/or survey implementation pose a challenge
iii) Back-casting consumption from a more recent survey for better comparison with older surveys.
It can be useful to offer some additional commentary on these cases.
Cases (i) and (ii) are closely related and are perhaps the main driving factor behind poverty imputation, since very few, if any, developing countries can meet the financial and logistical demands of fielding a household consumption survey every year. Consequently, most developing countries are likely to implement the household consumption survey only every few years. In such contexts, poverty imputation methods can offer a (far) less costly option to provide estimates for the intervening years between the surveys.26 Case (iii), although less common, represents the scenario where poverty imputation is the only route to provide poverty estimates for surveys fielded in the past.

Another advantage of poverty imputation methods that has received little attention is that such methods, particularly survey-to-survey imputation, can help us bypass the oftentimes thorny issues of obtaining the appropriate (intertemporal and intraregional) price deflators. This issue worsens for cross-country comparison, since in that case we have to employ conversion factors to convert the different currencies to the same base.

Notably, a promising direction for further development with poverty imputation methods is that they need not be restricted to the topic of poverty alone but can be utilized in other fields as well. For example, Fujii (2010) and Sohnesen et al. (2017) employ the Elbers et al. (2003) method to map child malnutrition in Cambodia and Ethiopia, respectively; Gibson (2018) uses the same method to study the effect of deforestation on subsequent inequality in the rural Solomon Islands. As another example, a recent study by Dang and Ianchovichina (forthcoming) constructs synthetic panels from the cross sections in the Gallup Poll for 16 countries in the Middle East to offer analysis of dynamics of subjective well-being during the Arab Spring period.
Figure 3, taken from this study, plots the percentage of the poor or vulnerable in the first year who move up one or two welfare categories in the second year for major population groups classified by gender, education levels, work status, migration status, and residence areas. This figure suggests that upward mobility is weaker than downward mobility for both Arab Spring and other Arab countries, and that migrants tend to be less upwardly mobile (and more downwardly mobile) in Arab Spring countries, while the opposite holds for non-Arab Spring countries. In a similar spirit, other potentially useful applications of poverty imputation methods may include emerging policy-relevant topics such as vulnerability, multidimensional poverty, or gender equality.

26 Recent estimates suggest that the average cost of implementing a household consumption survey (in 2014 or later) ranges from approximately US$800,000 to US$5 million depending on the context and sample sizes (Kilic et al., 2017). At the same time, implementing poverty imputation may require only a fraction of this cost since its major expense is to cover analytical time. Indeed, selective pairing of international experts with national statistical agencies’ staff can form small teams that provide both cost-effective analysis and local capacity building.

Yet, we end this section on a cautious note: poverty imputation methods, like most other statistical models, rely heavily on the modeling techniques and their accompanying assumptions. If the model assumptions are satisfied by the data, poverty imputation can yield encouraging and low-cost results. However, the opposite holds where the model assumptions are invalid. It would thus be useful to offer careful checks on the relevant modeling assumptions as well as the variable selection process before providing imputation-based poverty estimates.

VIII. Conclusion

The growing demand for more frequent and accurate poverty estimates is not satisfied by the current availability of household consumption data, at least in the short run. Imputation methods offer a promising solution against this background and have been widely used.27 These methods have also received increasing attention. For example, a recent and high-profile report on monitoring global poverty (Atkinson, 2017) explicitly calls for further exploration of imputation techniques for poverty measurement purposes in data-scarce contexts. Yet, there is currently a severe dearth of research that can bridge the gap between typical development practitioners and the latest advances in the field. We aim to help fill this gap with this review.

27 See, e.g., Jolliffe et al. (2015) for a recent review.

References

Abraham, Katharine G., John Haltiwanger, Kristin Sandusky, and James R. Spletzer. (2013). “Exploring Differences in Employment between Household and Establishment Data”. Journal of Labor Economics, 31, S129-S172. Alkire, Sabina, James Foster, Suman Seth, Jose Manuel Roche, and Maria Emma Santos. (2015). Multidimensional Poverty Measurement and Analysis. USA: Oxford University Press. Angrist, J. D. and Krueger, A. B. (1999). “Empirical Strategies in Labor Economics.” In Ashenfelter, Orley and David E. Card. (Eds.). Handbook of Labor Economics, Vol. 3c. Amsterdam: North-Holland. Atkinson, Anthony B. (2017). Monitoring Global Poverty: Report of the Commission on Global Poverty. Washington, DC: The World Bank. Balcazar, Carlos Felipe, Hai-Anh Dang, Eduardo Malasquez, Sergio Olivieri and Julieth Pico. (2018). “Welfare Dynamics in Colombia: Results from Synthetic Panels”. Working paper. Barrett, Christopher B. (2005). "Rural Poverty Dynamics: Development Policy Implications." Agricultural Economics, 32: 45-60. Beegle, Kathleen, Luc Christiaensen, Andrew Dabalen, and Isis Gaddis. (2016). Poverty in a Rising Africa. Washington, DC: The World Bank.
Bilton, Penny, Geoff Jones, Siva Ganesh, and Steve Haslett. (2017). "Classification Trees for Poverty Mapping." Computational Statistics & Data Analysis, 115: 53-66. Bourguignon, Francois, Hector Moreno, and Hai-Anh Dang. (2018). “On the Construction of Synthetic Panels”. Working paper. Paris School of Economics. Brown, Caitlin, Martin Ravallion, and Dominique van de Walle. (2016). “A Poor Means Test? Econometric Targeting in Africa”. World Bank Policy Research Working Paper No. 7915. Washington DC: The World Bank. Cancho, Author César, María E. Dávalos, Giorgia Demarchi, Moritz Meyer, and Carolina Sánchez Páramo. (2015). “Economic Mobility in Europe and Central Asia: Exploring Patterns and Uncovering Puzzles”. World Bank Policy Research Paper No. 7173. Christiaensen, Luc, Peter Lanjouw, Jill Luoto, and David Stifel. (2012). "Small Area Estimation- based Prediction Models to Track Poverty: Validation and Applications.” Journal of Economic Inequality, 10(2):267-297. Coady, David, Margaret Grosh, and John Hoddinott. (2014). “Targeting Outcomes Redux”. World Bank Research Observer, 19:61–85. 26 Cruces, Guillermo, Peter Lanjouw, Leonardo Lucchetti, Elizaveta Perova, Renos Vakis, and Mariana Viollaz. (2015). “Estimating Poverty Transitions Repeated Cross-Sections: A Three- country Validation Exercise”. Journal of Economic Inequality, 13:161–179. Cuesta Jose and Gabriel Lara Ibarra. (2018). “Comparing Cross-Survey Micro Imputation and Macro Projection Techniques: Poverty in Post Revolution Tunisia”. Journal of Income Distribution. Dang, Hai-Anh and Andrew L. Dabalen. (in press). “Is Poverty in Africa Mostly Chronic or Transient? Evidence from Synthetic Panel Data”. Journal of Development Studies. Dang, Hai-Anh and Elena Ianchovichina. (forthcoming). “Welfare Dynamics with Synthetic Panels: The Case of the Arab World in Transition”. Review of Income and Wealth. Dang, Hai-Anh and Peter Lanjouw. (2013). 
“Measuring Poverty Dynamics with Synthetic Panels Based on Cross-Sections”. World Bank Policy Research Working Paper No. 6504, World Bank, Washington, DC. ---. (2016). “Toward a New Definition of Shared Prosperity: A Dynamic Perspective from Three Countries”. In Kaushik Basu and Joseph Stiglitz. (Eds.). Inequality and Growth: Patterns and Policy. Palgrave MacMillan Press. ---. (2017). “Welfare Dynamics Measurement: Two Definitions of a Vulnerability Line and Their Empirical Application”. Review of Income and Wealth, 63(4): 633-660. ---. (in press). “Poverty and Vulnerability Dynamics for India during 2004-2012: Insights from Longitudinal Analysis Using Synthetic Panel Data”. Economic Development and Cultural Change. Dang, Hai-Anh, Dean Jolliffe, and Calogero Carletto. (2017a). "Data Gaps, Data Incomparability, and Data Imputation: A Review of Poverty Measurement Methods for Data-Scarce Environments". World Bank Policy Research Paper # 8282. World Bank: Washington, DC. Dang, Hai-Anh, Peter Lanjouw, Umar Serajuddin. (2017b). “Updating Poverty Estimates at Frequent Intervals in the Absence of Consumption Data: Methods and Illustration with Reference to a Middle-Income Country.” Oxford Economic Papers, 69(4): 939-962. Dang, Hai-Anh, Peter Lanjouw, Jill Luoto, and David McKenzie. (2014). “Using Repeated Cross- Sections to Explore Movements in and out of Poverty”. Journal of Development Economics, 107: 112-128. Das, Sumonkanti, and Ray Chambers, R., (2017). “Robust mean‐squared error estimation for poverty estimates based on the method of Elbers, Lanjouw and Lanjouw”. Journal of the Royal Statistical Society: Series A (Statistics in Society), 180(4): 1137-1161. Deaton, Angus. (1997). The Analysis of Household Surveys: A Microeconometric Approach to Development Policy. MD: The Johns Hopkins University Press. 27 Deaton, Angus and Valerie Kozel. (2005). The Great Indian Poverty Debate. New Delhi: Macmillan. Deutsch, Joseph, Jacques Silber, and Guanghua Wan. (2017). 
“Curbing One’s Consumption and the Impoverishment Process: The Case of Western Asia”. Research on Economic Inequality, 25: 1-24. Diamond, Alexis, Michael Gill, Miguel Rebolledo Dellepiane, Emmanuel Skoufias, Katja Vinha, and Yiqing Xu. (2016). “Estimating Poverty Rates in Target Populations: An Assessment of the Simple Poverty Scorecard and Alternative Approaches”. Policy Research Working Paper No. 7793. World Bank, Washington, DC. Douidich, Mohamed, Abdeljaouad Ezzrari, Roy van der Weide, and Paolo Verme. (2016). “Estimating Quarterly Poverty Rates Using Labor Force Surveys: A Primer.” World Bank Economic Review, 30(3): 475-500. Elbers, Chris, Jean O. Lanjouw, and Peter Lanjouw. (2003). “Micro-Level Estimation of Poverty and Inequality.” Econometrica, 71(1): 355-364. Elbers, Chris, Tomoki Fujii, Peter Lanjouw, Berk Özler, and Wesley Yin. (2007). "Poverty Alleviation through Geographic Targeting: How Much Does Disaggregation Help?" Journal of Development Economics, 83(1): 198-213. Ferreira, Francisco H. G., Julian Messina, Jamele Rigolini, Luis-Felipe López-Calva, Luis Felipe López-Calva, and Renos Vakis. (2012). Economic Mobility and the Rise of the Latin American Middle Class. Washington DC: World Bank. Filmer, Deon and Lant Pritchett. (2001). “Estimating Wealth Effects without Expenditure Data— or Tears: An Application to Educational Enrollments in States of India”. Demography, 38(1): 115–132. Filmer, Deon and Kinnon Scott. (2012). “Assessing Asset Indices.” Demography, 49 (1): 359–92. Fujii, Tomoki. (2010). “Micro-Level Estimation of Child Undernutrition Indicators in Cambodia”. World Bank Economic Review, 24(3): 520–553. Garbero, Alessandra. (2014). “Estimating Poverty Dynamics Using Synthetic Panels for IFAD- supported Projects: A Case Study from Vietnam”. Journal of Development Effectiveness, 6(4): 490-510. Gibson, John. (2001). “Measuring Chronic Poverty without a Panel”, Journal of Development Economics 65(2): 243-66. 28 ---. (2018). 
“Forest Loss and Economic Inequality in the Solomon Islands: Using Small-Area Estimation to Link Environmental Change to Welfare Outcomes”. Ecological Economics, 148: 66–76. Grosh, M., C. Del Ninno, E. Tesliuc, and A. Ouerghi. (2008). For Protection and Promotion: The Design and Implementation of Effective Safety Nets. Washington, DC: World Bank. Harttgen, Kenneth, Stephan Klasen, and Sebastian Vollmer. (2013). “An African Growth Miracle? Or: What Do Asset Indices Tell Us about Trends in Economic Performance?” Review of Income and Wealth, 59(S1): S37–S61. Jha, S., A. Martinez, P. Quising, Z. Ardaniel, and L. Wang. (2018). “Natural Disasters, Public Spending, and Creative Destruction: A Case Study of the Philippines”. ADBI Working Paper 817. Tokyo: Asian Development Bank Institute. Jolliffe, Dean, Peter Lanjouw, Shaohua Chen, Aart Kraay, Christian Meyer, Mario Negre, Espen Prydz, Renos Vakis, and Kyla Wethli. (2015). A Measured Approach to Ending Poverty and Boosting Shared Prosperity: Concepts, Data, and the Twin Goals. Washington DC: The World Bank. Kilic, Talip, Umar Serajuddin, Hiroki Uematsu, and Nobuo Yoshida. (2017). "Costing Household Surveys for Monitoring Progress toward Ending Extreme Poverty and Boosting Shared Prosperity." World Bank Policy Research Paper no. 7951, World Bank, Washington, DC. Kolenikov, S. and Angeles, G. (2009). “Socioeconomic status measurement with discrete proxy variables: Is principal component analysis a reliable answer?” Review of Income and Wealth, 55(1): 128-165. Lanjouw, Peter, Marleen Marra, and Cuong Nguyen. (2017). "Vietnam’s Evolving Poverty Index Map: Patterns and Implications for Policy." Social Indicators Research, 133(1): 93-118. Martinez, Arturo Jr., Mark Western, Michele Haynes, Wojtek Tomaszewski. (2013). “Measuring Income Mobility Using Pseudo-Panel Data”. Philippine Statistician, 62(2): 71-99. Mathiassen, Astrid. (2009). “A Model Based Approach for Predicting Annual Poverty Rates without Expenditure Data”. 
Journal of Economic Inequality, 7:117–135.
---. (2013). “Testing Prediction Performance of Poverty Models: Empirical Evidence from Uganda”. Review of Income and Wealth 59, no. 1:91–112.
Morris, Saul S., Calogero Carletto, John Hoddinott, and Luc JM Christiaensen. (2000). “Validity of Rapid Estimates of Household Wealth and Income for Health Surveys in Rural Africa.” Journal of Epidemiology & Community Health, 54(5): 381-387.
Moser, Caroline and Andrew Felton. (2009). “The Construction of an Asset Index: Measuring Asset Accumulation in Ecuador”. In Addison, T., Hulme, D. and Kanbur, R. (Eds.) Poverty Dynamics: Interdisciplinary Perspectives. Oxford: Oxford University Press.
Ngo, Diana K. L. (2018). “A Theory-based Living Standards Index for Measuring Poverty in Developing Countries”. Journal of Development Economics, 130: 190-202.
Nguyen, Cuong V. (2011). “Poverty Projection Using a Small Area Estimation Method: Evidence from Vietnam”. Journal of Comparative Economics, 39:368–382.
OECD. (2015). OECD Employment Outlook 2015. OECD Publishing, Paris. http://dx.doi.org/10.1787/empl_outlook-2015-en
Pratesi, Monica. (Eds.) (2016). Analysis of Poverty Data by Small Area Estimation. John Wiley & Sons.
Rao, J. N. K. and Isabel Molina. (2015). Small Area Estimation, 2nd edition, New York: Wiley.
Ravallion, Martin. (2016). The Economics of Poverty: History, Measurement, and Policy. New York: Oxford University Press.
Rubin, Donald B. (1987). Multiple Imputation for Nonresponse in Surveys. New York: Wiley.
Sahn, David E. and David C. Stifel. (2000). “Poverty Comparison over Time and across Countries in Africa”. World Development, 28(12): 2123-2155.
Serajuddin, Umar, Hiroki Uematsu, Christina Wieser, Nobuo Yoshida, and Andrew Dabalen. (2015). “Data deprivation: another deprivation to end.” World Bank Policy Research Paper no. 7252, World Bank, Washington, DC.
Sohnesen, Thomas Pave, Alemayehu Azeze Ambel, Peter Fisker, Colin Andrews, and Qaiser Khan. (2017). “Small Area Estimation of Child Undernutrition in Ethiopian Woredas.” PloS one 12(4): e0175445.
Steele, Jessica E., Pål Roe Sundsøy, Carla Pezzulo, Victor A. Alegana, Tomas J. Bird, Joshua Blumenstock, Johannes Bjelland, Kenth Engø-Monsen, Yves-Alexandre de Montjoye, Asif M. Iqbal, Khandakar N. Hadiuzzaman, Xin Lu, Erik Wetter, Andrew J. Tatem, and Linus Bengtsson. (2017). “Mapping Poverty Using Mobile Phone and Satellite Data”. Journal of the Royal Society Interface. DOI: 10.1098/rsif.2016.0690.
Stifel, D. and Christiaensen, L. (2007). “Tracking Poverty over Time in the Absence of Comparable Consumption Data”. World Bank Economic Review, 21, 317-341.
Tarozzi, Alessandro. (2007). “Calculating Comparable Statistics from Incomparable Surveys, With an Application to Poverty in India”. Journal of Business and Economic Statistics 25, no. 3:314-336.
Tarozzi, Alessandro and Angus Deaton. (2009). “Using Census and Survey Data to Estimate Poverty and Inequality for Small Areas”. Review of Economics and Statistics, 91(4): 773-792.
United Nations Development Programme (UNDP). (2016). Multidimensional Progress: Well-being beyond Income. New York: United Nations Development Programme.
Vakis, Renos, James Rigolini, and Leonardo Lucchetti. (2015). Left Behind: Chronic Poverty in Latin America and the Caribbean. Washington, DC: World Bank.

Table 1: Categories of Missing Household Consumption Data and Recent Sample Studies

| Type | Extent of Missing Consumption Data | Typical Situation | Example | Recent Sample Studies |
|---|---|---|---|---|
| A | Completely missing (e.g., wealth index) | i) Non-consumption surveys | Demographic and Health Surveys and most small-scale surveys | Sahn and Stifel (2000); Filmer and Pritchett (2001); Filmer and Scott (2012) |
| A | Completely missing (e.g., wealth index) | ii) Proxy means test/project targeting | Most small-scale surveys | Grosh et al. (2008); Coady et al. (2014); Brown, Ravallion, and van de Walle (2016) |
| B | Partially missing (e.g., imputed consumption) | i) Consumption data not comparable across survey rounds | Some rounds of India's National Sample Surveys | Tarozzi (2007); Christiaensen et al. (2012); Mathiassen (2013) |
| B | Partially missing (e.g., imputed consumption) | ii) Consumption data unavailable in current survey but available in another related survey | The annual LFS does not have consumption data, but the household consumption survey is implemented every few years | Mathiassen (2009); Douidich et al. (2016); Dang, Lanjouw, and Serajuddin (2017) |
| B | Partially missing (e.g., imputed consumption) | iii) Consumption data unavailable at more disaggregated administrative levels than those in current survey | Population census data are representative at a lower administrative level than a household consumption survey, but do not include consumption data | Elbers, Lanjouw, and Lanjouw (2003); Elbers et al. (2007); Tarozzi and Deaton (2007) |
| C | Available cross sections, but missing panel data (e.g., synthetic panels) | | Most surveys in developing countries do not offer panel data | Dang et al. (2014); Dang and Lanjouw (2013); Bourguignon, Moreno, and Dang (2018) |

Note: LFS stands for Labor Force Surveys. This table is a modified and expanded version of Table 1 in Dang, Jolliffe, and Carletto (2017).

Figure 1: Decision Process to Select Appropriate Poverty Imputation Methods
[Flowchart. Decision nodes: Money-metric (absolute) poverty?; Nationally representative?; Poverty dynamics?; Disaggregation below national level?; Imputation using same survey design? Suggested methods: Asset index (Section II); Proxy means test (Section III); Within-survey imputation (Section IV.1); Across-survey imputation (Section IV.2); Poverty mapping (Section V); Synthetic panels (Section VI).]
Note: Rhombuses represent the desired poverty estimates, and circles represent the suggested poverty imputation methods.

Figure 2: Number of Household Surveys vs. Countries’ Income Level, 1981-2014
[Scatter plot with fitted line y = -7.6 + 2.9x; standard errors are 2.9 for the intercept and 0.5 for the slope. Horizontal axis: log of mean consumption. Vertical axis: number of surveys.]
Note: Estimated coefficients are shown from an OLS regression of the number of surveys on log of mean consumption; standard errors are in parentheses. Source: Dang et al. (2017a).

Figure 3: Subjective Wellbeing Dynamics in Arab Spring Countries and Other Arab Countries Based on Synthetic Panels, 2007-2012
[Two bar-chart panels. Panel A: Upward mobility. Panel B: Downward mobility. Horizontal axis: population groups. Vertical axis: percentage (%). Series: Arab Spring; Others.]
Note: Dashed lines represent the regional averages for upward mobility and downward mobility, respectively. Source: Dang and Ianchovichina (forthcoming).