Policy Research Working Paper                   9530




           Measuring Poverty Rapidly Using
             Within-Survey Imputations
                                     Utz Pape




Poverty and Equity Global Practice
January 2021


  Abstract
  Poverty is an indicator of paramount importance for gauging the socioeconomic well-being of a population.
  Especially during or after a shock, poverty estimates are invaluable for assessing the severity of the impact
  and for identifying which parts of the population were most affected. The measurement of consumption-based
  monetary poverty, however, has traditionally been very time consuming. A household consumption questionnaire
  usually includes more than 200 items, including food and nonfood items, often requiring more than two hours
  to administer. This paper proposes a new methodology that combines an innovative questionnaire design with
  standard imputation techniques. It substantially shortens the time required to administer a household
  consumption questionnaire to less than 60 minutes by imputing deliberately absent consumption values for
  items that are not explicitly asked. The proposed methodology makes it possible to derive poverty estimates
  without compromising the credibility of the resulting estimate, and it performs considerably better than
  alternative approaches based on reduced consumption aggregates and cross-survey imputations. This new
  methodology is particularly useful in fragile states given the significant risks associated with lengthy
  interviews, as well as to rapidly assess the impact of a shock or of a project. It can also be useful to
  reduce enumerator and respondent fatigue, or to mitigate the problem of high nonresponse rates.




 This paper is a product of the Poverty and Equity Global Practice. It is part of a larger effort by the World Bank to
 provide open access to its research and make a contribution to development policy discussions around the world. Policy
 Research Working Papers are also posted on the Web at http://www.worldbank.org/prwp. The author may be contacted
 at upape@worldbank.org.




         The Policy Research Working Paper Series disseminates the findings of work in progress to encourage the exchange of ideas about development
         issues. An objective of the series is to get the findings out quickly, even if the presentations are less than fully polished. The papers carry the
         names of the authors and should be cited accordingly. The findings, interpretations, and conclusions expressed in this paper are entirely those
         of the authors. They do not necessarily represent the views of the International Bank for Reconstruction and Development/World Bank and
         its affiliated organizations, or those of the Executive Directors of the World Bank or the governments they represent.


                                                       Produced by the Research Support Team
    Measuring Poverty Rapidly Using Within-Survey Imputations


                                                    Utz Pape 1




Keywords: Poverty and inequality measurement, survey methods

JEL: C83, D63, I32




1
 Corresponding author: Utz Pape (upape@worldbank.org). The findings, interpretations and conclusions
expressed in this paper are entirely those of the author, and do not necessarily represent the views of the World
Bank, its Executive Directors, or the governments of the countries they represent. The author would like to thank
Johan Mistiaen for discussions and support in pursuing the idea and Chris Elbers for help with the statistical
properties of the new methodology, as well as Kathleen Beegle, Tomoki Fujii, Kristen Himelein, Dean Jolliffe, Peter
Lanjouw, Emmanuel Skoufias, Shinya Takamatsu, Roy Van der Weide and Nobuo Yoshida for discussions. The
author is also grateful to the Kenya National Bureau of Statistics for implementing the methodology in a pilot
survey.
1. Introduction
Poverty is an indicator of paramount importance for gauging the socioeconomic well-being of a
population. Especially during or after a shock, poverty estimates are invaluable for understanding the
situation, as well as for assessing the severity of the impact and for identifying which parts of the
population were most affected. Especially in the developing world, consumption-based monetary poverty
measures are used, defining the poor as those households with consumption levels that fall below a set
poverty line (Deaton and Zaidi 2002). The poverty line is usually set at a consumption level adequate
for sustaining the minimum level of welfare required for healthy living (Ravallion 1998). Consumption-
based poverty measures are widely used in development contexts and play a critical role in policy
decisions (e.g. Beegle et al. 2016).

The measurement of consumption, however, has traditionally been very time consuming. A typical
household consumption questionnaire contains a series of questions about the price and quantity
consumed for each item, and whether it has been purchased, self-produced, or bartered. Usually
encompassing more than 200 food and nonfood items, the time required to administer such a
questionnaire can often substantially exceed two hours. In addition to high administration costs due to
long interview times, measurement errors may become significant towards the end of the questionnaire
as enumerators and respondents become fatigued. Respondents might also cancel the interview before
it is completed, thus contributing to a higher non-response rate.

Enumerator and respondent fatigue are well documented in the literature (Tourangeau, Rips, and Rasinski
2000; Krosnick 1991) and become more pronounced for longer questionnaires (Rolstad, Adler, and Rydén
2011; Diehr et al. 2005; Snyder et al. 2007). Enumerator fatigue often increases measurement errors over
the course of a day as well as over the course of the survey (Baird, Hamory, and Miguel 2008).
Especially in consumption surveys, a long list of items can lead to enumerators cutting corners and
fabricating data (Fiedler and Mwangi 2016; Finn and Ranchhod 2015) as well as prematurely ending
interviews (Deaton and Grosh 2000). Respondents also become fatigued and, for example, learn to say
no to consumption of items to evade more detailed follow-up questions (Kreuter et al. 2011; Eckman et
al. 2014).

To overcome the challenges inherent to measuring consumption poverty, we propose a new methodology
that combines an innovative questionnaire design with standard imputation techniques. 2 This new
methodology allows us to substantially shorten the consumption questionnaire and reduce the interview
time (less than 60 minutes for a standard questionnaire) by imputing deliberately absent consumption
values for those items that are not explicitly asked about. Poverty estimates can be derived in this way
without compromising the credibility of the resulting estimate. This new methodology is particularly
useful in fragile states given the significant risks associated with lengthy interviews. It can also be useful
to reduce enumerator and respondent fatigue, or to mitigate the problem of high non-response rates.

The most straightforward way to reduce the expected interview time is to skip rarely consumed items.
Another simple strategy is to ask the respondent about an aggregate amount of spending on an entire
category of consumption (e.g., total expenditure on flour) instead of individual items (e.g., expenditure
on corn flour, wheat flour, etc.). However, altering the set of items in the questionnaire can result in a
nontrivial change in the reported consumption amount (Olson-Lanjouw and Lanjouw 2001). Both
approaches are likely to lead to an underestimation of consumption and overestimation of poverty, as
was demonstrated in a study in Tanzania that directly compared various methods of measuring
consumption (Beegle et al. 2012).

2
    A precursor methodology based on the same principle was previously published in Pape and Mistiaen 2015.

An alternative approach is to apply methods of cross-survey imputation. In situations where full
household expenditure surveys are too costly or impractical to administer, data gaps can be filled using
other surveys that have common covariates that are correlated with household expenditure. For example,
data from a full consumption survey can be combined with data from shorter and more frequent labor
force surveys to generate poverty estimates (Douidich et al. 2013). While such methods may work well
even when there is a rapid economic change (Christiaensen et al. 2011), the assumption of a stable
structural parameter typically cannot be tested and may not be valid, especially in the context of large
and systemic shocks, after implementation of projects, or if a substantial amount of time has passed since
the baseline survey was implemented. It is also possible to design a survey such that one sample has a full
consumption module and another sample has only the covariates of consumption. Consumption can thus
be imputed and poverty estimates can be derived at a reduced cost, even though the magnitude of
potential cost reduction may be modest (Fujii and van der Weide 2016). In such a setup, however, the
sample for the full consumption module must be chosen randomly to avoid biased estimates of the model
parameters. Thus, this approach is only of limited usability in the case of fragile countries as it might not
be feasible to administer the full consumption module in particular insecure areas, creating a downward
bias in poverty estimates for those areas.

The proposed methodology uses statistical imputations to obtain estimates for deliberately absent
consumption values. Statistical imputation techniques are widely used to replace missing values in surveys
(Little and Rubin 2019; Ambler, Omar, and Royston 2007; Van Buuren 2007). Straightforward methods
simply replace the missing values with aggregate statistics like a mean. However, this makes the strong
and often violated assumption that data are missing at random (Carpenter, Kenward, and White 2007).
Model-based approaches can take into account covariates and often use a regression framework to
estimate missing values but distort the variance if based on point-estimates. Multiple imputations help to
mitigate this by drawing multiple estimates from the posterior distribution using a Bayesian approach
(Rubin 2004).

This paper is organized as follows. We first present the proposed methodology with its statistical
properties as well as the data in Section 2. In Section 3, we apply the methodology to different scenarios
showing the trade-offs between the performance and parameters of the approach, and then compare it
to a reduced consumption approach as well as a more sophisticated reduced consumption approach
adjusting the poverty line. The section ends with a real-world example based on a pilot survey in Kenya,
assessing the performance of the new methodology and comparing it to a cross-survey imputation
approach. Section 4 concludes by summarizing the findings and discussing some of the limitations
of the new approach.


2. Methodology
Overview
The rapid approach being proposed here applies a split-questionnaire design to the consumption module
of a household survey, thereby generating systematically absent data that can be conveniently imputed. 3
Instead of having all households report on all consumption items, important items are assigned to a core
module and the remaining items are split into two or more optional modules. 4 Each household then
answers the questions in the core module and in only one of the optional modules. This approach reduces
average interview time considerably, down to 45 to 60 minutes per household for a standard household
consumption survey. The cost of this efficiency gain is that data are deliberately absent for those optional
modules that were not administered to certain households. We can, however, offset this cost by estimating
the deliberately absent data for each household based on the data collected from other households for
that module. While this approach utilizes a structural model for the imputation of the deliberately absent
data, the model is estimated within the survey rather than between two surveys, thereby circumventing
the problem of biased structural parameters due to having different sample populations or considering
the same population at different points in time.

The rapid approach starts by defining the number of core items and the number of optional modules for
the non-core consumption items. The smaller the number of core items and the greater the number of
optional modules used, the faster the questionnaire can be administered, as fewer items need to be asked
for each household. However, having fewer core items and more optional modules also increases the
uncertainty in the estimation as less consumption information is available. Thus, the choice of these two
parameters can be informed by simulations on a previous or similar survey to gauge the performance of
the estimation vis-à-vis the time savings in administering the questionnaire. Another consideration is that
it is beneficial to balance the number of households for each optional module, ideally at the cluster level
of the survey.

The next step is to select core consumption items. Although consumption in any given country will exhibit
some variability, data on a few dozen key items will usually be sufficient to capture the majority of
consumption. Important consumption items can be identified using average consumption share per
household or across households, as estimated by previous consumption surveys in the same context or
recorded consumption shares in neighboring and/or similar countries. While a good choice of core items
will improve the performance of the estimation, the methodology still works if no core items are used,
e.g. in a context without any prior information. The identified key items are then assigned to the core
module that will be administered to all households.

Finally, non-core items are randomly partitioned into optional modules. It is important to note that the
conceptual distinction between core and optional items should not be reflected in the layout of the
questionnaire. Instead, all items administered to a household need to be grouped by category of consumption
(e.g. meat, fruits, vegetables, cereals) and by recall period. It is therefore recommended to use
CAPI (Computer-Assisted Personal Interviewing) technology, which makes it possible to hide the modular
structure of the consumption questions within the layout of the questionnaire.

3
  While the split-questionnaire design is more popular in other disciplines such as psychology (e.g. Graham, Hofer,
and MacKinnon 1996), the approach has not yet been applied to large-scale household-based surveys, nor with the
goal of reducing the time required to estimate consumption or poverty.
4
  As is shown below, the core module is not strictly necessary, further reducing the interview time.

Once the core and optional modules have been defined and the design has been finalized, the survey can
be implemented. The assignment of optional modules to households is performed randomly and is
stratified by enumeration area, thus ensuring an appropriate representation of all optional modules in
each enumeration area. Once the data have been collected and cleaned, household consumption is
estimated by imputation. The average consumption of each optional module can be estimated based on
the sub-sample of households assigned to that optional module.
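
The partitioning and assignment steps described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the implementation used for the survey: the function names, the item and household labels, the round-robin balancing rule and the fixed seed are all hypothetical choices made for the example.

```python
import random
from collections import Counter


def partition_items(items, n_modules, rng):
    """Randomly split non-core items into n_modules optional modules of near-equal size."""
    shuffled = list(items)
    rng.shuffle(shuffled)
    # Dealing the shuffled items round-robin keeps module sizes within one item of each other.
    return [shuffled[m::n_modules] for m in range(n_modules)]


def assign_modules(households_by_ea, n_modules, rng):
    """Assign one optional module (numbered 1..n_modules) to every household,
    balanced within each enumeration area (EA)."""
    assignment = {}
    for ea, households in households_by_ea.items():
        order = list(households)
        rng.shuffle(order)                # random order of households within the EA
        start = rng.randrange(n_modules)  # random starting module, so no module is favoured
        for pos, hh in enumerate(order):
            assignment[hh] = 1 + (start + pos) % n_modules
    return assignment


rng = random.Random(42)  # fixed seed only to make the illustration reproducible
non_core_items = [f"item_{j}" for j in range(204)]  # hypothetical item labels
modules = partition_items(non_core_items, 3, rng)

# Hypothetical sample: 4 enumeration areas with 6 households each.
households_by_ea = {f"ea_{e}": [f"hh_{e}_{h}" for h in range(6)] for e in range(4)}
assignment = assign_modules(households_by_ea, 3, rng)

for ea, hhs in households_by_ea.items():
    print(ea, Counter(assignment[h] for h in hhs))
```

With six households and three optional modules per enumeration area, each module is administered to exactly two households in every area, which is the balance property the text asks for.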

Theoretical Properties
Consumption for a household $i$ is the sum of the consumption for each item in each module

$$y_i = \sum_k y_{ik} = \sum_k \sum_j y_{ikj}$$

where $y_{ikj}$ denotes the consumption of item $j$ in module $k$. 5 Applying the rapid approach, we only observe
a subset of modules $y_{ik}$: for each household, the core module $k = 0$ and one other module with $k > 0$. We can
formalize this by using a binary (0,1) variable $b_k$, which is independent of $y_{ik}$, where $P(b_k = 1) = \pi_k$. In practice,
the assignment of optional modules can be done more systematically to ensure a balanced design at the
cluster level, which does not invalidate the assumed independence of $b_k$ from consumption. The expected
consumption of a household is:

$$E y_i = E \sum_k y_{ik} \frac{b_k}{\pi_k} = \sum_k E y_{ik}$$

We obtain a consistent and unbiased estimator for expected consumption if we can find consistent and
unbiased estimators for expected module consumption. This also holds for regressions, assuming $b_k$ and
household characteristics $x_i$ are independent:

$$E(y_i \mid x_i) = E\Big(\sum_k y_{ik} \frac{b_k}{\pi_k} \,\Big|\, x_i\Big) = \sum_k E(y_{ik} \mid x_i)$$

Furthermore, the second moment can be estimated as follows:

$$E y_i^2 = E\Big(\sum_k y_{ik}\Big)^2 = E \sum_k y_{ik}^2 + 2E \sum_{k < l} y_{ik} y_{il} = \sum_k E y_{ik}^2 \frac{b_k}{\pi_k} + 2 \sum_{k < l} E y_{ik} y_{il} \frac{b_k b_l}{\pi_k \pi_l}$$

Similarly, higher moments can be constructed. Thus, the complete distributional information of y can
theoretically be recovered from sufficiently large samples if the design of the split questionnaire allows
for the estimation of correlations between modules.
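
As a worked check of the first-moment identity, the sketch below computes the exact expectation of the inverse-probability-weighted total for a single household over all possible module assignments. It is an illustration only: the consumption values and assignment probabilities are made up, the core module is treated as always observed (probability 1), and exactly one optional module is assumed to be administered per household.

```python
from fractions import Fraction


def expected_weighted_total(core, optional, probs):
    """Exact expectation of y_0 + sum_k y_k * b_k / pi_k over the random module assignment,
    where exactly one optional module k_star is observed with probability probs[k_star]."""
    assert sum(probs) == 1, "assignment probabilities must sum to one"
    total = Fraction(0)
    for k_star, p in enumerate(probs):
        # Under this assignment b_k = 1 only for k = k_star, so only that module is observed.
        weighted = core + optional[k_star] / p
        total += p * weighted
    return total


# Hypothetical per-capita consumption by module for one household.
core = Fraction(50)
optional = [Fraction(12), Fraction(7), Fraction(21)]
probs = [Fraction(1, 2), Fraction(1, 4), Fraction(1, 4)]

result = expected_weighted_total(core, optional, probs)
print(result, core + sum(optional))
```

The two printed numbers coincide, so in this example weighting each observed optional module by the inverse of its assignment probability recovers the household's full consumption in expectation, even with unequal assignment probabilities.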


5
    Note that we assume consumption to be per-capita throughout the paper.

Consumption Estimator
Distinguishing between the administered core module $k = 0$, the administered optional module $k_i^*$, and the
non-administered remaining optional modules $0 < k \neq k_i^*$, we obtain as estimator for consumption

$$\hat{y}_i = y_{i0} + y_{ik_i^*} + \sum_{0 < k \neq k_i^*} \hat{y}_{ik}.$$

As shown above, the estimator is unbiased for $E y_i$ as

$$E y_i = \sum_k E y_{ik} = E y_{i0} + E y_{ik_i^*} + \sum_{0 < k \neq k_i^*} E \hat{y}_{ik} = E \hat{y}_i$$


The variance of consumption can be decomposed as

$$\begin{aligned}
\mathrm{Var}(y_i) &= \mathrm{Var}(y_{i0}) + \sum_{k > 0} \mathrm{Var}(y_{ik}) + 2 \sum_{k > 0} \mathrm{Cov}(y_{i0}, y_{ik}) + \sum_{k, l > 0,\ k \neq l} \mathrm{Cov}(y_{ik}, y_{il}) \\
&\geq \mathrm{Var}(y_{i0}) + \sum_{k > 0} \mathrm{Var}(\hat{y}_{ik}) + 2 \sum_{k > 0} \mathrm{Cov}(y_{i0}, \hat{y}_{ik}) = \mathrm{Var}(\hat{y}_i)
\end{aligned}$$

with the inequality given by the assumption of positive correlation between optional modules. 6 The
variance is thus underestimated, as we cannot measure correlation between modules and so assume
them to be independent, $\mathrm{Cov}(y_{ik}, y_{il}) = 0$ for all distinct optional modules $k$ and $l$. The more optional modules
are used, the greater the underestimation of the variance. Conversely, the larger the fraction of the
variance captured in the core module, the lower the underestimation of the variance. This suggests using
a low number of optional modules with a large number of items in the core module. This represents the
fundamental trade-off between the accuracy of the estimator and time savings, which are higher with
more optional modules and fewer items in the core module.
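
The size of the underestimation can be illustrated numerically. The sketch below uses a made-up covariance matrix for one core module (index 0) and three optional modules, and compares the variance of total consumption with the variance obtained when covariances between different optional modules are set to zero. The numbers and function names are hypothetical and chosen only so that the arithmetic is easy to follow.

```python
def total_variance(cov):
    """Variance of the sum of all modules: the sum of every entry of the covariance matrix."""
    return sum(sum(row) for row in cov)


def variance_ignoring_optional_covariances(cov):
    """Same sum, but with Cov(y_ik, y_il) set to 0 for distinct optional modules k, l > 0."""
    n = len(cov)
    total = 0.0
    for k in range(n):
        for l in range(n):
            both_optional = k > 0 and l > 0
            if both_optional and k != l:
                continue  # dropped term: covariance between two different optional modules
            total += cov[k][l]
    return total


# Hypothetical covariance matrix: row/column 0 is the core module, 1..3 are optional modules.
cov = [
    [4.0, 1.0, 1.0, 1.0],
    [1.0, 2.0, 0.5, 0.5],
    [1.0, 0.5, 2.0, 0.5],
    [1.0, 0.5, 0.5, 2.0],
]

full = total_variance(cov)
restricted = variance_ignoring_optional_covariances(cov)
print(full, restricted, full - restricted)
```

In this example the gap equals the sum of the six off-diagonal covariances between optional modules. Adding optional modules adds more such terms, while moving items into the core module shifts variance into terms that are retained.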

We apply the Foster–Greer–Thorbecke measures of poverty (Foster, Greer, and Thorbecke 1984) to the
consumption aggregate, defined as

$$FGT_{\alpha,z} = N^{-1} \sum_{i:\, y_i < z} \left( \frac{z - y_i}{z} \right)^{\alpha}$$

where $N$ denotes the number of households, $z$ is the poverty line, and $y_i$ is consumption for a given
household. By selecting the coefficient $\alpha$ we can produce different poverty measures: $\alpha = 0$ for the
poverty headcount, $\alpha = 1$ for poverty depth and $\alpha = 2$ for poverty severity. Given that we are
underestimating the variance of $y_i$, this implies that the estimator $\widehat{FGT}_{\alpha,z}$ will be underestimated for a
poverty line $z$ smaller than the mode of $y$ and overestimated for larger poverty lines.
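
The FGT formula translates directly into code. The following is a minimal, unweighted sketch: it ignores survey weights and household size, and the consumption values and poverty line are invented for illustration.

```python
def fgt(consumption, z, alpha):
    """Foster-Greer-Thorbecke measure: the sum of ((z - y) / z) ** alpha over households
    with y < z, divided by the total number of households N."""
    n = len(consumption)
    return sum(((z - y) / z) ** alpha for y in consumption if y < z) / n


# Hypothetical per-capita consumption for five households and a poverty line of 100.
y = [50.0, 80.0, 100.0, 150.0, 200.0]
z = 100.0

headcount = fgt(y, z, 0)  # share of households below the poverty line
depth = fgt(y, z, 1)      # average relative shortfall, counting the non-poor as zero
severity = fgt(y, z, 2)   # squared shortfalls give more weight to the poorest
print(headcount, depth, severity)
```

Two of the five households fall below the line, so the headcount is 2/5. The depth averages the relative gaps 0.5 and 0.2 over all five households, and the severity does the same with the squared gaps.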



6
  Even though the consumption aggregate consists of complements and substitutes, a random allocation of items
into optional modules will tend to make the correlation between modules positive except in the unlikely case of
two modules sharing a large number of complements with one module capturing, for example, all the items
typically consumed by the poor. Thus, the optional modules can be assumed to be positively correlated.

Estimation
The optional module consumption can be estimated in the log-space conditional on strictly positive
consumption:

$$\log y_{ik} = X_i \beta + \varepsilon_{ik} \mid y_{ik} > 0$$

where $X_i$ denotes a vector of household characteristics and $\varepsilon_{ik}$ the error term. This is implemented as a
two-step estimation procedure with the first step utilizing a logit regression to estimate whether $y_{ik} = 0$
and the second step using an OLS regression. We use the framework of multiple imputations to obtain
several point estimates by drawing from the error distribution to ensure accurate tails of the consumption
distribution.

The household characteristics $X_i$ are selected based on a step-forward algorithm minimizing the AIC by
regressing the observed core and assigned non-core consumption on household characteristics, including
a fixed effect for the assigned module:

$$\log\left(y_{i0} + y_{ik_i^*}\right) = X_i \beta + \delta_{ik} + \varepsilon_{ik} \mid y_{i0} + y_{ik_i^*} > 0$$

where $\delta_{ik}$ represents k dummy variables with the kth variable equal to 1 if household $i$ is assigned to
module $k$ and equal to 0 otherwise.
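
The two-step procedure with repeated draws from the error distribution can be sketched as follows. This is a deliberately simplified illustration and not the paper's implementation. The first step is replaced by the observed share of positive values (an intercept-only stand-in for the logit regression), and the second step is an OLS with a single hypothetical covariate `x`. Only the error term is redrawn for each imputation. All data values and function names are assumptions.

```python
import math
import random
import statistics


def fit_two_step(x_obs, y_obs):
    """Step 1: share of households with y > 0 (intercept-only stand-in for the logit step).
    Step 2: OLS of log(y) on x among households with y > 0."""
    positive = [(x, math.log(y)) for x, y in zip(x_obs, y_obs) if y > 0]
    p_positive = len(positive) / len(y_obs)
    xs = [x for x, _ in positive]
    logs = [l for _, l in positive]
    x_bar, l_bar = statistics.fmean(xs), statistics.fmean(logs)
    slope = sum((x - x_bar) * (l - l_bar) for x, l in positive) / sum((x - x_bar) ** 2 for x in xs)
    intercept = l_bar - slope * x_bar
    residuals = [l - (intercept + slope * x) for x, l in positive]
    # Residual standard deviation with two estimated coefficients.
    sigma = math.sqrt(sum(r * r for r in residuals) / (len(positive) - 2))
    return p_positive, intercept, slope, sigma


def impute(x_missing, model, n_imputations, rng):
    """Draw n_imputations values per household rather than a single point estimate."""
    p_positive, intercept, slope, sigma = model
    draws = []
    for x in x_missing:
        household_draws = []
        for _ in range(n_imputations):
            if rng.random() < p_positive:
                # Positive consumption: predicted log value plus a fresh error draw.
                household_draws.append(math.exp(intercept + slope * x + rng.gauss(0.0, sigma)))
            else:
                household_draws.append(0.0)
        draws.append(household_draws)
    return draws


# Hypothetical households that were administered the module (covariate, module consumption).
x_obs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y_obs = [0.0, 12.0, 15.0, 30.0, 28.0, 55.0]
model = fit_two_step(x_obs, y_obs)

# Hypothetical households for which the module is deliberately absent.
rng = random.Random(0)
draws = impute([2.5, 5.5], model, 20, rng)
print([statistics.fmean(d) for d in draws])
```

Keeping all draws, rather than only their mean, is what preserves the spread of the imputed consumption distribution. Poverty measures would be computed on each set of draws and then averaged.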

Performance Assessment
We assess the performance of the estimation based on the bias and the coefficient of variation (CV). The
bias is defined as the expected value of the absolute percentage difference of $\widehat{FGT}_{\alpha,z}$ estimated using the
rapid approach and $FGT_{\alpha,z}$ estimated based on full consumption data. Using the additional index $1 \leq s \leq 20$
for the simulation, we obtain

$$E_z E_s \left| \widehat{FGT}_{\alpha,z,s} - FGT_{\alpha,z,s} \right|$$

Each simulation uses random allocations of items to optional modules and random assignments of
households to optional modules. 7 We average the bias over poverty lines z chosen so that 1 percent,
2 percent, and so on up to 99 percent of the population are defined as poor based on the full consumption
distribution. The integration over all possible poverty lines makes the resulting performance measures
independent of the poverty line, while the absolute difference in the definition of the bias avoids canceling
out errors.

Accordingly, the coefficient of variation is defined as the average ratio of the standard deviation and the
mean of the FGT measure over all possible poverty lines:

$$E_z \frac{\sqrt{E_s \left( \widehat{FGT}_{\alpha,z,s} - FGT_{\alpha,z,s} \right)^2}}{E_s FGT_{\alpha,z,s}}.$$
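
Both performance measures can be computed from two tables of poverty estimates indexed by poverty line and simulation. The sketch below is one plain reading of the definitions above, with expectations replaced by simple averages. The nested-list layout (`est[z][s]`), the function names and the example numbers are assumptions for illustration.

```python
import math


def bias(est, full):
    """Average over poverty lines z and simulations s of |est[z][s] - full[z][s]|."""
    per_line = [
        sum(abs(e - f) for e, f in zip(est_z, full_z)) / len(est_z)
        for est_z, full_z in zip(est, full)
    ]
    return sum(per_line) / len(per_line)


def coefficient_of_variation(est, full):
    """Average over z of the root mean squared deviation across simulations,
    divided by the mean full-data measure across simulations."""
    ratios = []
    for est_z, full_z in zip(est, full):
        mse = sum((e - f) ** 2 for e, f in zip(est_z, full_z)) / len(est_z)
        ratios.append(math.sqrt(mse) / (sum(full_z) / len(full_z)))
    return sum(ratios) / len(ratios)


# Hypothetical headcount rates: two poverty lines (rows) and two simulations (columns).
full = [[0.20, 0.20], [0.50, 0.50]]
est = [[0.22, 0.18], [0.53, 0.47]]

b = bias(est, full)
cv = coefficient_of_variation(est, full)
print(b, cv)
```

In this example the estimates miss the benchmark by 0.02 at the first poverty line and 0.03 at the second, giving a bias of 0.025. The errors have opposite signs across simulations, which is why the definition uses absolute differences.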



7
 The constraints in these allocations are to ensure that items are uniformly distributed among optional modules
and that each optional module is assigned equally often to households within each cluster.

Data
We applied this method to recent household consumption data from Kenya. Kenya’s source for official
poverty estimates is the Kenya Integrated Household Budget Survey (KIHBS). The two most recent rounds were
implemented in 2005/6 and 2015/16. The 2005/6 round used a representative sample of households in
Kenya stratified by 7 provinces and 69 districts split into urban and rural. The sample size of the cleaned
data set includes 12,695 households in 1,338 clusters. The 2015/16 round used a representative sample
of households in Kenya stratified by 47 counties split into urban and rural. The sample size of the cleaned
data set includes 21,585 households in 2,387 clusters. For both surveys, data collection was carried out
over a period of 12 months.

The 2015/16 round was accompanied by a CAPI pilot implementing the rapid approach. The pilot used the
same sampling frame as the 2015/16 round and interviewed up to an additional 6 households in the same
clusters, resulting in a sample size of 12,662 households (as not all clusters and all intended households
were interviewed due to non-response).8 The questionnaire was derived from the KIHBS 2015/16
questionnaire but was considerably simplified across all modules. Specifically, the consumption module
was administered according to the rapid approach. We use the data set that was constructed with
5 food and 5 non-food items in the core (selected based on the highest consumption share in KIHBS
2005/6), with the remaining 128 food and 76 non-food items partitioned into 3 optional modules. 9 Thus,
the expected time saving was about 30 percent.
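
The construction of such a design can be sketched as follows. This is a simplified illustration, not the allocation algorithm used for the pilot: the function names `partition_items` and `assign_households` are hypothetical, items are spread over optional modules by a random shuffle followed by round-robin dealing, and households within a cluster are rotated through the modules so that each module is used about equally often.

```python
import random

def partition_items(shares, n_core, n_modules, seed=0):
    """shares: dict mapping item -> consumption share in a previous survey."""
    ranked = sorted(shares, key=shares.get, reverse=True)
    core, rest = ranked[:n_core], ranked[n_core:]
    # Shuffle the non-core items, then deal them out so module sizes differ by at most one.
    random.Random(seed).shuffle(rest)
    modules = [rest[k::n_modules] for k in range(n_modules)]
    return core, modules

def assign_households(cluster_of, n_modules, seed=0):
    """cluster_of: dict mapping household id -> cluster id. Returns household -> module index."""
    rng = random.Random(seed)
    by_cluster = {}
    for household, cluster in cluster_of.items():
        by_cluster.setdefault(cluster, []).append(household)
    assignment = {}
    for households in by_cluster.values():
        rng.shuffle(households)
        start = rng.randrange(n_modules)
        # Rotate through the modules so each is used about equally often per cluster.
        for i, household in enumerate(households):
            assignment[household] = (start + i) % n_modules
    return assignment
```

Calling `partition_items` once for food and once for non-food items, each with `n_core=5` and `n_modules=3`, would mirror the 5 + 5 core items and 3 optional modules described above.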

In the remainder of the paper, we use KIHBS 2015/16 with the full consumption module as a benchmark.
The previous round of 2005/6 is used to define the core items, and for two of the alternative approaches
it is used to adjust the poverty line and to build the consumption model for the cross-survey imputation.
The 2015/16 pilot is used as a real-world example for the implementation of the rapid approach. We
harmonize the data sets to ensure comparability across the different surveys. The harmonized data sets
include 133 food and 81 non-food items. The consumption aggregate is exclusively based on these 214
items. In addition, we use harmonized location and household characteristics for the various models,
including a binary and a categorical location variable as well as 6 additional binary, 9 additional categorical
and 9 continuous variables for household characteristics (Table 5 in the Appendix).

Consumption shares differ markedly between the 2005/6 and 2015/16 surveys, which is unsurprising
given the 10-year gap (Kenya National Bureau of Statistics 2018). 10 For example, the top 10 food items in
2005/6 capture 58 percent of the food consumption share, but only 39 percent of the consumption share
in 2015/16. Non-food consumption changed to a lesser degree. The top 10 items in 2005/6 represent 64
percent of non-food consumption shrinking to 59 percent in 2015/16. These differences will impact the




8
  Balance mean tests are shown in Table 6 indicating similar households in both surveys with similar although not
always statistically indistinguishable characteristics.
9
  The assignment of optional modules to households was balanced with 4,222 households assigned to module 1,
4,192 to optional module 2, and 4,248 to optional module 3.
10
   See Table 4 for the shares of the items with the top 20 consumption shares for KIHBS 2005/6 and KIHBS
2015/16.

performance of those approaches that strongly rely on data from previous surveys, e.g. adjusting poverty
lines and cross-survey imputations.


3. Results
We assess the performance of the rapid approach vis-à-vis alternative approaches (Table 1). The long
questionnaire of the full-consumption approach can increase unit and item non-response but is
nevertheless used as the benchmark because it is the de facto standard for consumption surveys. The rapid approach
compromises on the long list of items by introducing a subset of core items and distributing remaining
items in optional modules, reducing the questionnaire length with beneficial impacts on unit and item
non-response but at the cost of additional estimation error. The reduced approach further decreases
questionnaire length by simply dropping items altogether, creating substantial bias in the resulting
estimates. The bias can be minimized by adjusting the poverty line based on data from a previous
consumption survey, called the adjusted reduced approach below. Finally, the largest time savings are
generated by completely abandoning consumption data and using a structural model to impute
consumption from a baseline survey using common co-variates. The time gap to the baseline survey, as
well as shocks and other structural changes (for example, from the implementation of a project to reduce
poverty), invalidate the structural model, leading to substantial bias in the estimates.

Table 1: Comparison of consumption methodologies and sources of error.

Full consumption
  Unit non-response: Elevated levels of non-response, particularly among urban and wealthy households (Korinek, Mistiaen, and Ravallion 2006; Osier 2016)
  Item non-response: Long list of items increase item non-response and measurement error (Fiedler and Mwangi 2016; Finn and Ranchhod 2015) as well as unfinished interviews (A. Deaton and Grosh 2000)
  Implementation issues: Relatively limited issue as questionnaire is straightforward though length may be issue
  Error inherent to method: Theoretically unbiased if implemented correctly with full response

Rapid approach
  Unit non-response: Non-response is an issue but not as much as in full consumption due to the shorter questionnaire
  Item non-response: Less of an issue than full consumption due to shorter questionnaire
  Implementation issues: Could be substantial issue with paper questionnaires but almost completely mitigated with CAPI
  Error inherent to method: Trade-off between length of questionnaire and accuracy of poverty and inequality estimates due to underestimation of variance attenuated by less core items and more optional modules (Table 2; Figure 1; Figure 2 and Figure 3)

Reduced consumption
  Unit non-response: Non-response is an issue but not as much as in full consumption due to the shorter questionnaire
  Item non-response: Less of an issue than full consumption due to shorter questionnaire
  Implementation issues: Limited issue as questionnaire is straightforward
  Error inherent to method: Substantial bias in total consumption attenuating inequality (Figure 1 and Figure 2) (Beegle et al. 2012)

Cross-Survey (X-Survey) imputations
  Unit non-response: Non-response is an issue but not as much as in full consumption due to the shorter questionnaire
  Item non-response: Very small issue as no consumption section
  Implementation issues: Very small issue as questions are simple
  Error inherent to method: Reliance on old data introducing bias in structural model leading to biased poverty estimates (Figure 3)




Using the full consumption data from KIHBS 2015/16 as benchmark, in this section, first, we investigate
the empirical trade-off between the number of core items and the number of optional modules for the
rapid approach with respect to the performance of poverty indicators. Second, we compare the rapid
approach with a traditional reduced consumption aggregate, without any adjustment of the poverty line.
Third, we again use a reduced consumption aggregate, but adjust the poverty line based on previously
observed consumption from 2005/6. Fourth, we compare the application of the rapid approach in the
pilot in Kenya with a cross-survey imputation approach also using the 2005/6 data as baseline.

Trade-off with Number of Core Items and Optional Modules
As discussed in the methodology section, the rapid approach creates a trade-off between the number of
core items and the number of optional modules, as a larger number of core items (smaller number of
optional modules) will improve the performance of the estimation. A larger number of core items will
capture a larger fraction of the variance in the core module, minimizing the estimation error for the
variance. Similarly, a smaller number of optional modules reduces the estimation error of the covariance
between modules. However, the time savings by the rapid approach are larger for a smaller number of
items in the core module and a larger number of optional modules as fewer items are asked about for
each household.
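
The time-saving side of this trade-off can be made concrete with a small calculation. The sketch below assumes that each household is asked all core items plus exactly one optional module, and that optional modules are of equal size; `items_asked` is a hypothetical helper, and 214 is the number of harmonized items reported in the Data section.

```python
def items_asked(n_items, n_core, n_modules):
    """Average number of items put to one household under the rapid design."""
    if not 0 <= n_core <= n_items or n_modules < 1:
        raise ValueError("need 0 <= n_core <= n_items and n_modules >= 1")
    # Core items are asked of everyone; the rest are split evenly across modules.
    return n_core + (n_items - n_core) / n_modules

if __name__ == "__main__":
    for n_core, n_modules in [(0, 2), (0, 4), (10, 3), (20, 2)]:
        asked = items_asked(214, n_core, n_modules)
        print(n_core, n_modules, round(asked / 214, 3))
```

This raw count treats every item as equally time-consuming. The effective time savings used in the comparisons below additionally depend on how often the administered items are actually consumed.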

Table 2: Trade-off for FGT measures by number of optional modules and number of core items measured by bias and cv.

                    Bias, by number of core items               CV, by number of core items
Measure  Opt. mod.  0      1      3      5      10     20       0      1      3      5      10     20
fgt0     2          0.021  0.019  0.015  0.012  0.009  0.006    0.081  0.075  0.051  0.043  0.029  0.017
fgt0     4          0.033  0.032  0.024  0.019  0.014  0.007    0.125  0.122  0.085  0.070  0.051  0.026
fgt0     6          0.039  0.036  0.027  0.021  0.015  0.010    0.139  0.140  0.099  0.081  0.060  0.037
fgt0     8          0.042  0.042  0.031  0.026  0.021  0.009    0.151  0.155  0.105  0.096  0.080  0.035
fgt1     2          0.015  0.014  0.010  0.008  0.005  0.003    0.128  0.120  0.081  0.071  0.051  0.028
fgt1     4          0.024  0.023  0.016  0.013  0.009  0.005    0.190  0.189  0.129  0.114  0.087  0.045
fgt1     6          0.027  0.026  0.019  0.015  0.011  0.007    0.207  0.214  0.150  0.131  0.103  0.062
fgt1     8          0.029  0.029  0.020  0.018  0.015  0.006    0.224  0.232  0.156  0.151  0.132  0.063
fgt2     2          0.011  0.010  0.007  0.006  0.004  0.002    0.166  0.157  0.108  0.098  0.075  0.040
fgt2     4          0.017  0.017  0.012  0.010  0.007  0.004    0.241  0.241  0.166  0.154  0.123  0.064
fgt2     6          0.020  0.019  0.014  0.011  0.008  0.005    0.259  0.272  0.192  0.174  0.144  0.085
fgt2     8          0.021  0.022  0.015  0.013  0.011  0.005    0.279  0.290  0.198  0.198  0.179  0.090
gini     2          0.005  0.001  0.001  0.000  0.003  0.002    0.014  0.004  0.004  0.004  0.009  0.005
gini     4          0.004  0.001  0.003  0.001  0.000  0.003    0.013  0.009  0.010  0.004  0.004  0.010
gini     6          0.009  0.005  0.005  0.008  0.006  0.008    0.025  0.014  0.014  0.021  0.015  0.021
gini     8          0.001  0.008  0.011  0.003  0.006  0.007    0.011  0.022  0.030  0.010  0.015  0.018


Based on simulations using KIHBS 2015/16, we compute the bias and the coefficient of variation of the
estimates of FGT0, FGT1 and FGT2 as well as the Gini, using the full consumption aggregate from the
survey as the reference. 11 As expected, all performance measures deteriorate for a smaller number of core
items, as well as for a larger number of optional modules (Table 2). Using the minimum number of optional
modules (2), we obtain an average bias of 0.021 for FGT0 using no core items, but a considerably smaller
bias of only 0.006 (a more than 70 percent reduction) if using 20 core items. An increase of the number of
optional modules from 2 to 8 doubles the bias for FGT0 from 0.021 to 0.042 using 0 core items. We observe
similar trade-offs for FGT1 and FGT2. The trade-offs for the Gini coefficient are less clear because,
independent of the number of core items and modules, the bias is consistently very low, almost always below
0.01.

Rapid vs. Reduced Estimation
Traditionally, time savings in administering consumption modules are achieved by reducing the number
of consumption items included in the questionnaire. Here we compare the rapid approach with a reduced
approach based on the effective time savings achieved. Assuming that only consumed items require
substantial interview time, we estimate the average number of items that were administered and
consumed by households relative to the total number of items in the full consumption module. The
smaller the number of items consumed and administered in the questionnaire, the larger the time savings.
This measure takes into account that effective time savings are smaller if fewer but often-consumed items
are administered to a household, compared to a larger number of items, which are rarely consumed. The
results show simulations with varying numbers of core items and optional modules for the rapid approach,
and varying numbers of items included in the reduced module.
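
The measure of effective questions asked described above can be sketched as follows. This is an illustration under stated assumptions: `effective_share_asked` is a hypothetical name, both inputs are boolean household-by-item matrices, and the average count of items that were both administered and consumed is divided by the total number of items in the full module, following the wording in the text.

```python
import numpy as np

def effective_share_asked(consumed, administered):
    """consumed, administered: boolean arrays of shape (n_households, n_items)."""
    # Only items that are both asked about and actually consumed are assumed to cost time.
    asked_and_consumed = np.logical_and(consumed, administered).sum(axis=1)
    return asked_and_consumed.mean() / consumed.shape[1]

if __name__ == "__main__":
    consumed = np.array([[True, True, False, False],
                         [True, False, False, False]])
    few_popular = np.array([[True, False, False, False]] * 2)  # one often-consumed item
    many_rare = np.array([[False, False, True, True]] * 2)     # two rarely consumed items
    print(effective_share_asked(consumed, few_popular))
    print(effective_share_asked(consumed, many_rare))
```

In the small example, administering a single popular item yields a larger effective share than administering two items that nobody consumes, which is the asymmetry the measure is meant to capture.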




11
  If not noted otherwise, each core and optional module configuration is run 20 times, each using 50 multiple
imputations. Note that the same survey data are used to define the core module items (based on highest
consumption shares).

Figure 1: Absolute bias and coefficient of variation (cv) of rapid vs. reduced poverty estimation, by effective questions asked




Based on KIHBS 2015/16 using the full consumption aggregate as the reference, we compare the average
and maximum bias across all poverty lines for FGT0, FGT1 and FGT2 as well as the Gini (Figure 1). The
results clearly show the superiority of the rapid approach for any time saving larger than 10 percent.
Generally, the bias and coefficient of variation increase with larger time savings. The exception is the
Gini, for which the bias of the rapid approach is usually less than 0.01, while the reduced
approach over-estimates the Gini by up to 0.04. The maximum bias for FGT0 remains below 7 percent in
95 percent of the simulations. The average bias generally remains below 5.3 percent. The average
coefficient of variation slightly increases for larger time savings, but generally remains below 20 percent.
Note that the coefficient of variation is only meaningful for the rapid approach, as the reduced approach
is deterministic across simulations.

Time savings of 50 percent can be achieved with the rapid approach by accepting an average bias of 1.8
percent and a maximum bias of 3 percent for FGT0 for 0 items in the core and 2 optional modules. For
similar time savings, the reduced approach would need to consist of the 20 percent of items with the largest
consumption, which are consumed by most households, but this results in an average and maximum bias
of 15 percent and 21.9 percent, respectively. A 75 percent time saving is possible with 0 core items and 4
optional modules. It comes at the cost of an average and maximum bias of 3.3 percent and 4.6 percent
respectively.

Rapid vs. Adjusted Reduced Estimation
The reduced approach from the previous section can be improved by adjusting the poverty line based on
a previous survey (Olson-Lanjouw and Lanjouw 2001). To simulate this case, we use the KIHBS 2005/6
survey to re-estimate the poverty line for the reduced approach and use the survey for the definition of
core items for the rapid approach (optional module items are randomly assigned). The re-estimated
poverty line for the reduced approach and the core module assignment is then applied to the KIHBS
2015/16 survey (Figure 2). As before, we only show the coefficient of variation for the rapid approach, as
the adjusted poverty line approach is deterministic for each simulation.

   Figure 2: Absolute bias and coefficient of variation (cv) of rapid vs. Olson-Lanjouw and Lanjouw poverty estimation, by
                                                   effective questions asked




The rapid approach performs very similarly to before. With zero core items, its performance is unchanged
for any number of optional modules, since no items from the previous survey enter the design. However, the
best performance is now achieved with some core items, as they help to isolate more of the variance from the
estimation. For example, a time
saving of 75 percent can now be achieved with a core module of 5 items and 10 optional modules,
resulting in an average and maximum bias of only 0.5 percent and 1.2 percent respectively for FGT0. This
is a considerable reduction in bias as compared to using 0 core items and 4 optional modules (reported in
the previous subsection), although in both cases the time savings are the same. Thus, it is useful to include
a few core items with highest consumption shares, even if they are selected from a rather outdated survey
as in Kenya with a 10-year gap.

The approach of using an adjusted poverty line performs significantly better than the simple reduced
approach presented in the previous section, but at the cost of the distributional shape captured in the
Gini coefficient. For the FGT measures, it is thus generally advisable to adjust the poverty line if a reduced
approach must be used, even if the adjustment is based on outdated consumption shares from an old
survey. For FGT0, the average bias of the adjusted reduced approach is usually around or above the
maximum bias of the rapid approach. Furthermore, the gap between maximum and average bias is
considerably larger for the adjusted reduced approach than for the rapid approach. For FGT1, the average
bias of the adjusted reduced approach becomes more comparable with that of the rapid approach, but its
maximum bias is still significantly larger than the maximum bias of the rapid approach. For FGT2, the
performance of the adjusted reduced approach becomes more difficult to interpret. The Gini coefficient
is not well conserved, with the bias approaching 0.05 for the highest time savings.

As the results show, the adjusted reduced approach has larger variation across poverty lines (as shown
by the larger difference between average and maximum bias) as well as across the different number of
items (implying different time savings). The definition of the estimator explains this. The estimator
depends strongly on the items selected for the reduced approach, as the accuracy relies on the adjustment
factor of the poverty line, which is the share captured by those items. The error can be decomposed into
two components. The first component is the change of the distributional shape between the survey used
for the adjustment and the application of the adjusted poverty line. The second component depends on
the difference in the share captured by those items between the two surveys. Both errors become zero if
the approach is implemented for the same population at the same time. In the usual case though, neither
the population nor the time point is the same. In these cases, especially the second component of the
error leads to large variation of the performance. Even though the consumption shares of the items can
change, the changes might cancel out leading to a good performance of the approach. However, adding
one more item to the consumption module can void the cancellation, leading to a worse performance.
Without knowing the share of the items from total consumption (which is not measured), it is impossible
to predict how many items should be selected for a good performance of the approach.
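
The dependence on the captured share can be illustrated with a short sketch. The source states only that the adjustment factor is the share captured by the selected items; computing it as the ratio of weighted mean reduced consumption to weighted mean total consumption in the earlier survey is an assumption made here, and the names `adjusted_line` and `headcount` are hypothetical.

```python
import numpy as np

def adjusted_line(z_full, baseline_items, reduced_idx, weights):
    """Scale the full poverty line by the share that the reduced items capture.

    baseline_items: array (n_households, n_items) of item-level consumption
    from the earlier survey; reduced_idx: column indices of the retained items.
    """
    total = baseline_items.sum(axis=1)
    reduced = baseline_items[:, reduced_idx].sum(axis=1)
    share = np.average(reduced, weights=weights) / np.average(total, weights=weights)
    return z_full * share, share

def headcount(consumption, weights, z):
    """Weighted share of households with consumption below z (FGT0)."""
    return np.average(consumption < z, weights=weights)
```

If every household spends the same share on the retained items and that share is unchanged between surveys, the adjusted line reproduces the full-consumption headcount. Once the share drifts between the two surveys, the line is scaled by the wrong factor, which corresponds to the second error component discussed above.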

Application to Kenya: Rapid vs. Cross-Survey Estimation
In 2015/16, a CAPI pilot was implemented alongside KIHBS 2015/16 using the rapid approach. While the
configuration is conservative with only 30 percent time savings, the results show impressive performance
for all FGT measures (Figure 3) in comparison with the full consumption as estimated for KIHBS 2015/16.
Across all potential poverty lines, the rapid approach has a bias below 1.3 percentage points for FGT0,
0.7 percentage points for FGT1 and 0.6 percentage points for FGT2. The Gini has a bias of only 0.012.

We compare the performance with a cross-survey imputation of consumption using a structural model
built based on the KIHBS 2005/6 data set and applied to KIHBS 2015/16, ignoring the collected
consumption data in 2015/16. The performance of the structural model is then assessed against the KIHBS
2015/16 full consumption aggregate (Figure 3). 12 The cross-survey imputation cannot provide convincing
results. FGT0 is under-estimated by up to 8.1 percentage points, FGT1 by up to 5.0 percentage points, and
FGT2 by up to 4.4 percentage points. The Gini is off by 0.036. This is not surprising given the 10-year gap
between the parameters of the structural model from 2005/6 and inference of poverty for 2015/16. In
such long timeframes, not only do consumption patterns change but also structural drivers or correlates
with poverty. To the further detriment of cross-survey imputations, it is in practice not possible to
estimate the error, making it difficult to recommend its usage.
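
The mechanics of a cross-survey imputation of this kind can be sketched in a few lines. This is a deliberately simplified illustration, not the model used in the paper: the step-forward AIC variable selection is omitted, a plain least-squares regression of log consumption on the covariates is used, residuals are assumed to be normal with constant variance, and the names `fit_log_model` and `impute_headcount` are hypothetical.

```python
import numpy as np

def fit_log_model(X, consumption):
    """Fit log(consumption) = b0 + X b + e on the baseline survey by least squares."""
    Xd = np.column_stack([np.ones(len(X)), X])
    y = np.log(consumption)
    beta, *_ = np.linalg.lstsq(Xd, y, rcond=None)
    resid = y - Xd @ beta
    # Residual standard deviation with a degrees-of-freedom correction.
    sigma = np.sqrt(resid @ resid / (len(X) - Xd.shape[1]))
    return beta, sigma

def impute_headcount(X_new, beta, sigma, z, n_imputations=50, seed=0):
    """Average poverty headcount over multiple imputations drawn in log-space."""
    rng = np.random.default_rng(seed)
    Xd = np.column_stack([np.ones(len(X_new)), X_new])
    mu = Xd @ beta
    rates = []
    for _ in range(n_imputations):
        # Each imputation adds a fresh residual draw before transforming back to levels.
        draw = np.exp(mu + rng.normal(0.0, sigma, size=len(mu)))
        rates.append(np.mean(draw < z))
    return float(np.mean(rates))
```

The sketch makes the key limitation visible: `beta` and `sigma` come entirely from the baseline survey, so any change in the relationship between covariates and consumption between the two surveys passes into the imputed poverty rate without leaving a trace in the new data.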

              Figure 3: Bias of rapid vs. cross-survey (X-Survey) poverty estimation, by poverty percentile. 13




4. Conclusions
The rapid approach proposed in this paper can be used to achieve significant time savings, while only
introducing a small bias into poverty estimates. The choice of the number of items in the core module and
the number of optional modules allows for a precise calibration of time savings. The results show that it

12
   The cross-survey imputation is based on the best model minimizing the AIC using a step-forward algorithm on
the variables from KIHBS 2005/6. The results of the model selection are provided in Table 6 in the Appendix. The
imputation is performed in log-space with 50 multiple imputations.
13
   Note that this figure shows the bias while previous figures showed the absolute bias to avoid canceling negative
and positive bias across percentiles.

is helpful to utilize a previous survey from the same or a similar population to assign a few key items to
the core module. In the best case, the selected items are still highly consumed and will improve the
estimates of poverty. In the worst case, the selected items are no longer important in which case they will
hardly affect the time savings compared to a design with zero core items and also equal its performance.
In countries with large variation in diet across regions, subnational core modules can potentially improve
estimates even further.

This paper also demonstrates the difficulty of achieving convincing results using alternative methods.
Simply removing items from the consumption aggregate to create time savings, but without adjusting the
poverty line, can lead to considerable bias in the poverty estimates. Better results can be achieved by
adjusting the poverty line based on a previous survey in order to accommodate the reduced number of
items, but without conserving the shape of the consumption distribution measured by the Gini.
Furthermore, the potential for large changes in consumption patterns, which cannot be determined under
this approach, creates considerable uncertainty in the resulting estimates. Similarly, a survey based only
on covariates and their structural relationship with poverty estimates from a previous survey introduces
a large bias in the estimates, at least in the studied case with a 10-year gap between surveys. The proposed
rapid approach outperforms all these approaches, only at the cost of increased complexity.

The rapid approach introduces additional complexity into both the questionnaire design and the
estimation of poverty. The capacity of enumerators is often low in developing countries. While the rapid
approach increases the complexity of the questionnaire, CAPI technology easily solves this problem.
Survey software can automatically compile a single consumption module based on the core and optional
modules for each household, without making the partition explicit to the enumerator or demanding the
execution of complex skip patterns. Furthermore, advanced CAPI technology can be used to generate the
questionnaire automatically based on the assignment of the household to an optional module. While
enumerators should be made aware that different households will be asked for different items,
administering a rapid questionnaire does not require any additional training of enumerators beyond the
standard skills for consumption questionnaires.

Conversely, the analysis of data using the rapid approach requires high analytical capacity, something that
is usually lacking in developing countries. While the general concept of the assignment of optional
consumption modules to households can usually be explained to local partners, poverty analysis based on
a bootstrapped sample of the consumption distribution can potentially be too demanding for local
capacity. However, even standard poverty analysis often surpasses the limits of local capacity, especially
in conflict or post-conflict settings. Therefore, capacity building tends to focus on data collection skills
with the long-term perspective of creating data analysis capacity. In addition, the rapid approach might
be the only possibility to create poverty estimates in certain areas. For example in the case of Somalia,
the rapid approach limited overall questionnaire administering time to less than 60 minutes for more than
90 percent of households as required by security considerations for enumerators (Pape and Mistiaen
2018; Pape and Wollburg 2019).

The rapid approach administers different consumption modules to different households. In theory, this
can create a response bias if households report differently on a consumed item depending on the type

and number of items previously asked. Unfortunately, we cannot estimate such a response bias in the
available data. However, implementation of the rapid approach with an enhanced design with different
optional modules varying in their comprehensiveness of items can in general shed light on this bias.
Comparison between responses for the same item in a comprehensive and a non-comprehensive list can
also indicate a lower bound for response bias. Assuming that the context of a comprehensive list is a
better estimate, the response bias could be corrected for. However, it is expected that this type of
response bias is very small in comparison to general measurement and estimation errors.

The main source of bias for the rapid approach is created by the assumption of zero co-variance between
optional modules. Further research can help to estimate co-variance between modules within the survey
and adjust the consumption estimates accordingly. Using a random assignment of items to optional
modules, the co-variance between groups of items within an optional module can potentially be used to
estimate the co-variance between optional modules. Alternatively, administering optional modules that share
items might help to estimate the co-variance between modules.
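
The role of the zero co-variance assumption can be seen from the standard variance decomposition of a sum. As a sketch, write $y_m$ for a household's consumption from optional module $m$ and $M$ for the number of optional modules (this notation is introduced here for illustration and is not taken from the paper):

```latex
\operatorname{Var}\Big(\sum_{m=1}^{M} y_m\Big)
  = \sum_{m=1}^{M} \operatorname{Var}(y_m)
  + 2 \sum_{1 \le m < n \le M} \operatorname{Cov}(y_m, y_n)
```

If the co-variance terms are set to zero while the true co-variances are positive, the variance of total consumption is understated, which is consistent with the underestimation of variance noted in Table 1.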

The rapid approach reduces administering time considerably. While this creates opportunities to include
additional questionnaire modules on different topics (e.g. remittances or health), it also has the potential
to reduce the non-response rate (A. Deaton and Grosh 2000). The KIHBS 2015/16 survey suffered from a
high non-response rate specifically in wealthier areas. For example, the capital city Nairobi had a response
rate of 77 percent compared to 92 percent in the rest of the country. The highest non-response rates were
specifically observed in clusters in wealthier areas within Nairobi. A high correlation of non-response with
welfare status can lead to biased poverty estimates. The considerably shorter pilot survey – which not
only used the rapid approach for consumption but generally shortened the questionnaire across modules
and was carried out using tablets rather than paper – did not show the same pattern of lower response
rates in wealthier areas. The pilot response rate of 99 percent in Nairobi was considerably higher than the
standard KIHBS and was the same as in the rest of the country. 14 Thus, the rapid approach can help to
contribute to shorter questionnaires mitigating concerns around low response rates, especially if
correlated with welfare status. In addition, the rapid approach is likely to reduce enumerator and
respondent fatigue based on the documented impact of fatigue on (consumption) estimates in the
literature (Rolstad, Adler, and Rydén 2011; Diehr et al. 2005; Snyder et al. 2007; Fiedler and Mwangi 2016;
Finn and Ranchhod 2015). 15

The rapid approach might also be particularly useful in the context of evaluating shock or project impacts
on poverty. In these cases, reliance on structural models estimated between surveys is dangerous. Shocks
are likely to distort structural relationships between household characteristics and poverty. For example,
a light shock is often mitigated by the household reducing its consumption, rather than selling assets or
moving into another dwelling. A structural model estimated before the shock will not be able to capture
the reduced consumption, thereby underestimating the impact of the shock. Similarly, project impacts
cannot be adequately estimated using a structural model from before the project. For example, the


14
   The change in the response rate is unlikely to be explained by the transition from PAPI to CAPI (Banks and Laurie
2000; Schräpler, Schupp, and Wagner 2010).
15
   Estimating the effect of reduced fatigue on measurement error would require a specifically designed survey.

distribution of metal sheets as rooftop materials is unlikely to change consumption patterns, but a
structural model might use the absence of metal roofs to help predict poverty. While administering a full
consumption module is often not feasible, especially in the case of shocks or in fragile settings, the rapid
approach can readily be applied without relying on the assumptions of a structural model that would likely
be violated.




References
Ambler, Gareth, Rumana Z. Omar, and Patrick Royston. 2007. “A Comparison of Imputation Techniques
         for Handling Missing Predictor Values in a Risk Model with a Binary Outcome.” Statistical Methods
         in Medical Research 16 (3): 277–98. https://doi.org/10.1177/0962280206074466.
Baird, S., J. Hamory, and E. Miguel. 2008. “Tracking, Attrition and Data Quality in the Kenyan Life Panel
         Survey Round 1 (KLPS-1).” UC Berkeley: Center for International and Development Economics
         Research.
Banks, Randy, and Heather Laurie. 2000. “From Papi to Capi: The Case of the British Household Panel
         Survey.” Social Science Computer Review 18 (4): 397–406.
         https://doi.org/10.1177/089443930001800403.
Beegle, Kathleen, Luc Christiaensen, Andrew Dabalen, and Isis Gaddis. 2016. Poverty in a Rising Africa.
         Washington, DC: World Bank Publications.
Beegle, Kathleen, Joachim De Weerdt, Jed Friedman, and John Gibson. 2012. “Methods of Household
         Consumption Measurement through Surveys: Experimental Results from Tanzania.” Journal of
         Development Economics 98 (1): 3–18.
Carpenter, James R., Michael G. Kenward, and Ian R. White. 2007. “Sensitivity Analysis after Multiple
         Imputation under Missing at Random: A Weighting Approach.” Statistical Methods in Medical
         Research 16 (3): 259–75. https://doi.org/10.1177/0962280206075303.
Christiaensen, L., P. Lanjouw, J. Luoto, and D. Stifel. 2011. “Small Area Estimation-Based Prediction
         Methods to Track Poverty: Validation and Applications.” Journal of Economic Inequality 10 (2):
         267–97.
Deaton, A., and M. Grosh. 2000. “Consumption.” In Designing Household Survey Questionnaires for
         Developing Countries: Lessons from 15 Years of the Living Standards Measurement Survey, edited
         by M. Grosh and P. Glewwe, 1:91–134. Washington, DC: World Bank.
Deaton, Angus, and Salman Zaidi. 2002. “Guidelines for Constructing Consumption Aggregates for Welfare
         Analysis.” Living Standards Measurement Study, 135.
Diehr, Paula, Lu Chen, Donald Patrick, Ziding Feng, and Yutaka Yasui. 2005. “Reliability, Effect Size, and
         Responsiveness of Health Status Measures in the Design of Randomized and Cluster-Randomized
         Trials.” Contemporary Clinical Trials 26 (1): 45–58. https://doi.org/10.1016/j.cct.2004.11.014.
Douidich, M., A. Ezzrari, R. van der Weide, and P. Verme. 2013. “Estimating Quarterly Poverty Rates Using
         Labor Force Surveys.” World Bank Policy Research Working Paper, no. 6466.
Eckman, Stephanie, Frauke Kreuter, Antje Kirchner, Annette Jäckle, Roger Tourangeau, and Stanley
         Presser. 2014. “Assessing the Mechanisms of Misreporting to Filter Questions in Surveys.” Public
         Opinion Quarterly 78 (3): 721–33. https://doi.org/10.1093/poq/nfu030.
Fiedler, J.L., and D.M. Mwangi. 2016. “Improving Household Consumption and Expenditure Surveys’ Food
         Consumption Metrics: Developing a Strategic Approach to the Unfinished Agenda.” 1570. IFPRI
         Discussion Paper. Intl Food Policy Res Inst.
Finn, A., and V. Ranchhod. 2015. “Genuine Fakes: The Prevalence and Implications of Data Fabrication in
         a Large South African Survey.” World Bank Economic Review, no. 129203.
Foster, James, Joel Greer, and Erik Thorbecke. 1984. “A Class of Decomposable Poverty Measures.”
         Econometrica: Journal of the Econometric Society, 761–66.
Fujii, Tomoki, and Roy van der Weide. 2016. “Is Predicted Data a Viable Alternative to Real Data?” World
         Bank Policy Research Working Paper, no. 7841 (October).
         https://papers.ssrn.com/abstract=2848469.
Graham, J. W., S. M. Hofer, and D. P. MacKinnon. 1996. “Maximizing the Usefulness of Data Obtained with
         Planned Missing Value Patterns: An Application of Maximum Likelihood Procedures.” Multivariate
         Behavioral Research 31 (2): 197–218. https://doi.org/10.1207/s15327906mbr3102_3.


Kenya National Bureau of Statistics. 2018. “Basic Report on Well-Being in Kenya.”
Korinek, Anton, Johan A. Mistiaen, and Martin Ravallion. 2006. “Survey Nonresponse and the Distribution
         of Income.” The Journal of Economic Inequality 4 (1): 33–55.
         https://doi.org/10.1007/s10888-005-1089-4.
Kreuter, Frauke, Susan McCullock, Stanley Presser, and Roger Tourangeau. 2011. “The Effects of Asking
         Filter Questions in Interleafed versus Grouped Format.” Sociological Methods and Research, no.
         1: 88–104.
Krosnick, Jon A. 1991. “Response Strategies for Coping with the Cognitive Demands of Attitude Measures
         in Surveys.” Applied Cognitive Psychology 5 (3): 213–36.
         https://doi.org/10.1002/acp.2350050305.
Little, Roderick JA, and Donald B Rubin. 2019. Statistical Analysis with Missing Data. Vol. 793. John Wiley
         & Sons.
Olson-Lanjouw, Jean, and Peter Lanjouw. 2001. “How to Compare Apples and Oranges: Poverty
         Measurement Based on Different Definitions of Consumption.” Review of Income and Wealth 47
         (1): 25–42.
Osier, Guillaume. 2016. “Unit Non-Response in Household Wealth Surveys.” 15. Statistics Paper Series.
         European Central Bank.
Pape, U, and J Mistiaen. 2015. “Measuring Household Consumption and Poverty in 60 Minutes: The
         Mogadishu. Washington DC: World Bank.” Proceedings of ABCA Conference 2015. Washington
         DC: World Bank.
Pape, Utz Johann, and Johan A. Mistiaen. 2018. “Household Expenditure and Poverty Measures in 60
         Minutes: A New Approach with Results from Mogadishu.” World Bank Policy Research Working
         Paper, no. 8430. https://ideas.repec.org/p/wbk/wbrwps/8430.html.
Pape, Utz Johann, and Philip Randolph Wollburg. 2019. “Estimation of Poverty in Somalia Using Innovative
         Methodologies.” World Bank Policy Research Working Paper no 8735 (February): 1–62.
Ravallion, Martin. 1998. “Poverty Lines in Theory and Practice.” LSM133. The World Bank.
         http://documents.worldbank.org/curated/en/916871468766156239/Poverty-lines-in-theory-and-practice.
Rolstad, Sindre, John Adler, and Anna Rydén. 2011. “Response Burden and Questionnaire Length: Is
         Shorter Better? A Review and Meta-Analysis.” Value in Health 14 (8): 1101–8.
         https://doi.org/10.1016/j.jval.2011.06.003.
Rubin, Donald B. 2004. Multiple Imputation for Nonresponse in Surveys. Vol. 81. John Wiley & Sons.
Schräpler, Jörg-Peter, Jürgen Schupp, and Gert G. Wagner. 2010. “Changing from PAPI to CAPI: Introducing
         CAPI in a Longitudinal Study.” Journal of Official Statistics 26 (2): 239–69.
Snyder, Claire F., Maria E. Watson, Joseph D. Jackson, David Cella, and Michele Y. Halyard. 2007. “Patient-
         Reported Outcome Instrument Selection: Designing a Measurement Strategy.” Value in Health 10
         (November): S76–85. https://doi.org/10.1111/j.1524-4733.2007.00270.x.
Tourangeau, Roger, Lance J. Rips, and Kenneth Rasinski. 2000. The Psychology of Survey Response.
         Cambridge: Cambridge University Press. https://doi.org/10.1017/CBO9780511819322.
Van Buuren, Stef. 2007. “Multiple Imputation of Discrete and Continuous Data by Fully Conditional
         Specification.” Statistical Methods in Medical Research 16 (3): 219–42.




Appendix A: Performance of Estimation Techniques
Consumption of non-assigned optional modules can be estimated by different techniques. In addition to
the two-step approach presented in the main text, simple summary statistics and simple regression
models can be used.

Summary Statistics (average and median)

This class of techniques computes a summary statistic of the collected module-specific consumption and
applies the result to the non-administered modules. For each module $k$, a summary statistic
$F(y_{jk} \mid j: k_j^* = k)$ can be computed based on the households $j$ to which module $k$ was
administered, so that consumption for household $i$ can be estimated as

$$\hat{y}_{ik} = F(y_{jk} \mid j: k_j^* = k).$$

Using this approach, every household is assigned the same consumption value for a given non-administered
module. The summary statistic F can be, for example, a simple average or the median. The median has the
advantage of being more robust against outliers but cannot capture small module-specific consumption
if more than half of the households have zero consumption for the module, because the median is then zero.
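
As a minimal sketch of this class of techniques, the following Python snippet fills non-administered modules
with a per-module mean or median. The function name, the array layout, the toy data, and the simplification
that each household is assigned exactly one optional module are assumptions made for illustration; they are
not taken from the paper's implementation.

```python
import numpy as np

def impute_by_summary(consumption, assigned, stat=np.mean):
    """Fill non-administered modules with a per-module summary statistic.

    consumption: (n_households, n_modules) array; entry [i, k] is observed only
                 where assigned[i] == k (other entries may be NaN).
    assigned:    (n_households,) array holding the module index given to each household.
    stat:        summary statistic, e.g. np.mean or np.median.
    """
    imputed = np.array(consumption, dtype=float)
    assigned = np.asarray(assigned)
    for k in range(imputed.shape[1]):
        administered = assigned == k
        # Summary statistic over households that actually answered module k.
        summary_k = stat(imputed[administered, k])
        # Every other household receives the same value for module k.
        imputed[~administered, k] = summary_k
    return imputed

# Hypothetical toy data: 5 households, 2 optional modules.
consumption = np.array([
    [10.0, np.nan],
    [20.0, np.nan],
    [np.nan, 4.0],
    [np.nan, 0.0],
    [np.nan, 0.0],
])
assigned = np.array([0, 0, 1, 1, 1])

by_mean = impute_by_summary(consumption, assigned, stat=np.mean)
by_median = impute_by_summary(consumption, assigned, stat=np.median)
```

In this toy example the median imputes zero for the second module, because two of its three observed values
are zero, whereas the mean still assigns a small positive amount.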

Regression (OLS and Tobit regression)

Module-wise estimation applies a regression model for each module and exploits the differences in
observed household characteristics $x$

$$y_{jk} = \beta_k x_j + \varepsilon_{jk} \quad \text{for } j: k_j^* = k$$

so that the deliberately absent consumption can be estimated as

$$\hat{y}_{ik} = \hat{\beta}_k x_i$$

with $\hat{\beta}_k$ representing the estimated OLS coefficient. Given the impossibility of negative
consumption, a Tobit regression with a lower bound of 0 can be used instead of a standard OLS regression.
Where the OLS regression is used, negative imputed values are set to zero.
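
The OLS variant can be sketched as follows: one regression per module is fitted on the households that
answered it, predictions are made for all other households, and negative predictions are set to zero. The
function name, the inclusion of an intercept, the toy data, and the one-module-per-household layout are
assumptions for illustration only; the Tobit variant is not shown.

```python
import numpy as np

def impute_by_ols(consumption, assigned, X):
    """Module-wise OLS imputation with negative predictions set to zero.

    consumption: (n_households, n_modules) array, observed where assigned[i] == k.
    assigned:    (n_households,) array of administered module indices.
    X:           (n_households,) or (n_households, p) household characteristics.
    """
    imputed = np.array(consumption, dtype=float)
    assigned = np.asarray(assigned)
    n = imputed.shape[0]
    # Adding an intercept column is an assumption of this sketch.
    design = np.column_stack([np.ones(n), X])
    for k in range(imputed.shape[1]):
        administered = assigned == k
        # Least-squares fit on households to which module k was administered.
        beta_k, *_ = np.linalg.lstsq(design[administered], imputed[administered, k], rcond=None)
        predicted = design[~administered] @ beta_k
        # Consumption cannot be negative, so negative predictions are set to zero.
        imputed[~administered, k] = np.maximum(predicted, 0.0)
    return imputed

# Hypothetical toy data: module 0 follows 2x - 3, module 1 follows 1 + x.
x = np.array([2.0, 3.0, 4.0, 0.0, 1.0, 5.0])
assigned = np.array([0, 0, 0, 1, 1, 1])
consumption = np.full((6, 2), np.nan)
consumption[:3, 0] = 2.0 * x[:3] - 3.0
consumption[3:, 1] = 1.0 + x[3:]

imputed = impute_by_ols(consumption, assigned, x)
```

For the households with x = 0 and x = 1, the fitted line for module 0 predicts negative consumption, so the
imputed values are truncated at zero.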

Multiple Imputation
Single imputation of the consumption aggregate underestimates the variance of household consumption.
Depending on the location of the poverty line relative to the consumption distribution, this can
consistently under- or over-estimate poverty. Thus, the regression can also be embedded in a multiple
imputation framework that takes into account the variation absorbed in the residual term, estimated via
bootstrapping, so that the resulting estimate becomes

$$\hat{y}_{ik} = \hat{\beta}_k x_i + \hat{\varepsilon}_{ik}$$

where $\hat{\varepsilon}_{ik}$ are repeated draws from the modeled residual distribution.
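
One way to sketch this idea for a single module is to fit the regression, resample the empirical residuals
with replacement, and add one residual draw to each prediction in each imputation. Resampling the empirical
residuals, truncating negative values at zero, the intercept, and all names and parameters below are
assumptions of this sketch rather than the paper's exact procedure.

```python
import numpy as np

def multiply_impute_module(y_obs, X_obs, X_mis, n_imputations=5, seed=0):
    """Return an (n_imputations, n_missing) array of imputed module consumption.

    y_obs: observed module consumption for households that answered the module.
    X_obs: their characteristics; X_mis: characteristics of the other households.
    """
    rng = np.random.default_rng(seed)
    y_obs = np.asarray(y_obs, dtype=float)
    design_obs = np.column_stack([np.ones(len(y_obs)), X_obs])
    design_mis = np.column_stack([np.ones(len(X_mis)), X_mis])
    beta, *_ = np.linalg.lstsq(design_obs, y_obs, rcond=None)
    residuals = y_obs - design_obs @ beta
    # Draw residuals with replacement, one per household and imputation (assumption).
    draws = rng.choice(residuals, size=(n_imputations, design_mis.shape[0]), replace=True)
    # Truncation at zero mirrors the rule used for OLS predictions (assumption).
    return np.maximum(design_mis @ beta + draws, 0.0)

# Hypothetical toy data with noise around a linear relationship.
rng = np.random.default_rng(1)
x_obs = np.linspace(0.0, 10.0, 50)
y_obs = 5.0 + 2.0 * x_obs + rng.normal(0.0, 1.0, size=50)
x_mis = np.array([2.0, 4.0, 6.0])

imputations = multiply_impute_module(y_obs, x_obs, x_mis, n_imputations=20)
```

A poverty indicator would then be computed separately on each of the imputed data sets and the results
averaged, so that the residual variation carries through to the final estimate.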



Performance Comparison
The comparison of the different estimation techniques reveals that the two-step estimation works well,
with the highest consistency across different numbers of core items and different numbers of optional
modules, and that it also outperforms the simple regression approach (Table 3 and Table 4).
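
The indicators compared in these tables are the FGT poverty measures (Foster, Greer, and Thorbecke 1984) and
the Gini coefficient. For reference, a minimal unweighted sketch of both is given below; the function names,
the absence of survey weights, and the toy data are assumptions, and the bias and cv statistics reported in
the tables are not reproduced here.

```python
import numpy as np

def fgt(consumption, poverty_line, alpha):
    """Foster-Greer-Thorbecke measure: alpha=0 headcount, 1 poverty gap, 2 severity."""
    y = np.asarray(consumption, dtype=float)
    poor = y < poverty_line
    # Normalized shortfall from the poverty line, zero for the non-poor.
    gap = np.clip((poverty_line - y) / poverty_line, 0.0, None)
    return np.mean(poor * gap ** alpha)

def gini(consumption):
    """Gini coefficient from sorted values using the rank-based formula."""
    y = np.sort(np.asarray(consumption, dtype=float))
    n = len(y)
    ranks = np.arange(1, n + 1)
    return np.sum((2 * ranks - n - 1) * y) / (n * np.sum(y))

# Hypothetical toy data: four households and a poverty line of 100.
y = np.array([50.0, 100.0, 150.0, 200.0])
headcount = fgt(y, 100.0, 0)   # share of households below the line
poverty_gap = fgt(y, 100.0, 1)
severity = fgt(y, 100.0, 2)
inequality = gini(y)
```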

Table 3: Performance by number of core items and estimation technique, using 2 optional modules.

core items              0             1             3             5            10            20
                  bias     cv   bias     cv   bias     cv   bias     cv   bias     cv   bias     cv
FGT0  avg         0.16   0.55   0.16   0.55   0.14   0.51   0.12   0.45    0.1    0.4   0.06   0.29
      med          0.1   0.38    0.1   0.37   0.08   0.32   0.06   0.28   0.05   0.21   0.02   0.11
      mi_2cel     0.02   0.08   0.02   0.08   0.01   0.05   0.01   0.04   0.01   0.03   0.01   0.02
      mi_reg      0.05    0.4   0.05    0.4   0.05   0.39   0.04   0.33   0.04   0.31   0.03   0.24
      reg         0.03   0.09   0.03   0.09   0.02   0.06   0.01   0.04   0.01   0.03   0.01   0.03
      tobit       0.02    0.1   0.02   0.08   0.02   0.07   0.02   0.06   0.01   0.03   0.01   0.02
FGT1  avg          0.1   0.72    0.1   0.72   0.09   0.68   0.08   0.63   0.07   0.57   0.05   0.45
      med         0.05    0.5   0.05   0.48   0.04   0.44   0.03    0.4   0.03   0.32   0.01   0.18
      mi_2cel     0.02   0.13   0.01   0.12   0.01   0.08   0.01   0.07   0.01   0.05  <0.01   0.03
      mi_reg      0.06   2.57   0.06   2.53   0.06   2.38   0.05   1.79   0.04   1.58   0.03   1.09
      reg         0.02   0.12   0.02    0.1   0.01   0.06   0.01   0.04   0.01   0.05  <0.01   0.04
      tobit       0.01   0.15   0.01    0.1   0.01   0.09   0.01   0.08  <0.01   0.03  <0.01   0.01
FGT2  avg         0.08   0.81   0.08   0.81   0.07   0.78   0.06   0.74   0.05   0.68   0.04   0.56
      med         0.04   0.61   0.04    0.6   0.03   0.55   0.03   0.52   0.02   0.43   0.01   0.26
      mi_2cel     0.01   0.17   0.01   0.16   0.01   0.11   0.01    0.1  <0.01   0.08  <0.01   0.04
      mi_reg      0.12  13.98   0.12  13.52   0.11  12.16   0.08   7.92   0.07    6.6   0.05   3.94
      reg         0.01   0.14   0.01    0.1   0.01   0.07   0.01   0.05  <0.01   0.07  <0.01   0.06
      tobit       0.01   0.24   0.01   0.14   0.01   0.14  <0.01   0.12  <0.01   0.04  <0.01   0.01
Gini  avg         0.19    0.5   0.19    0.5   0.17   0.43   0.14   0.36   0.12    0.3   0.08    0.2
      med         0.15   0.39   0.15   0.39   0.13   0.32    0.1   0.27   0.08   0.21   0.05   0.12
      mi_2cel     0.01   0.01  <0.01  <0.01  <0.01  <0.01  <0.01  <0.01  <0.01   0.01  <0.01   0.01
      mi_reg      0.02   0.04   0.02   0.04   0.01   0.02  <0.01   0.01  <0.01  <0.01   0.01   0.02
      reg         0.03   0.08   0.03   0.08   0.02   0.06   0.02   0.04   0.01   0.03   0.01   0.02
      tobit       0.01   0.03   0.01   0.03   0.01   0.02  <0.01   0.01  <0.01   0.01  <0.01  <0.01




Table 4: Performance by number of optional modules and estimation technique, using 0 core items.

opt. modules            2             4             6             8
                  bias     cv   bias     cv   bias     cv   bias     cv
FGT0  avg         0.16   0.55   0.23   0.67   0.25    0.7   0.26   0.71
      med          0.1   0.38   0.15   0.49   0.18   0.54    0.2   0.58
      mi_2cel     0.02   0.08   0.03   0.12   0.04   0.14   0.04   0.15
      mi_reg      0.05    0.4   0.07   0.48   0.07   0.48   0.07    0.5
      reg         0.03   0.09   0.04   0.11   0.05   0.16   0.05   0.17
      tobit       0.02    0.1   0.04   0.13   0.06   0.16   0.09   0.26
FGT1  avg          0.1   0.72   0.13    0.8   0.14   0.82   0.14   0.82
      med         0.05    0.5   0.06   0.52   0.06   0.52   0.07   0.53
      mi_2cel     0.02   0.13   0.02   0.19   0.03   0.21   0.03   0.22
      mi_reg      0.06   2.57   0.08   3.53   0.09    3.6   0.09   3.78
      reg         0.02   0.12   0.02   0.14   0.03   0.19   0.03   0.21
      tobit       0.01   0.15   0.02    0.2   0.03   0.18   0.05   0.32
FGT2  avg         0.08   0.81   0.09   0.88   0.09   0.89    0.1   0.89
      med         0.04   0.61   0.04   0.62   0.04   0.59   0.04   0.57
      mi_2cel     0.01   0.17   0.02   0.24   0.02   0.26   0.02   0.28
      mi_reg      0.12  13.98   0.18  22.71   0.19  23.59    0.2  25.36
      reg         0.01   0.14   0.01   0.16   0.02   0.21   0.02   0.23
      tobit       0.01   0.24   0.01   0.32   0.02    0.2   0.04   0.35
Gini  avg         0.19    0.5   0.28   0.73   0.31   0.81   0.33   0.85
      med         0.15   0.39   0.24   0.61   0.27   0.69   0.29   0.74
      mi_2cel     0.01   0.01  <0.01   0.01   0.01   0.02  <0.01   0.01
      mi_reg      0.02   0.04   0.02   0.06   0.03   0.07   0.03   0.07
      reg         0.03   0.08   0.04    0.1   0.05   0.13   0.06   0.15
      tobit       0.01   0.03   0.01   0.03   0.03   0.07   0.04   0.11




Appendix B: Additional Tables
                               Table 5: Consumption shares of the top 20 items for KIHBS 2005/6 and 2015/16.

                                                 Food                                                                                  Non-Food
Rank                  KIHBS 2005/6                                 KIHBS 2015/16                                   KIHBS 2005/6                            KIHBS 2015/16
   1 Milk - fresh unpacketed              8.7%   Maize Flour - loose                         8.3%   city bus / matatu fares            17.8%   household soap / bar soap   12.3%
   2 Sugar + Sugar cane                   7.6%   Milk - fresh unpacketed                     7.8%   household soap / bar soap          14.4%   city bus / matatu fares     11.2%
   3 Maize Grain - Loose                  7.0%   Sugar + Sugar cane                          6.0%   water                               6.7%   boda boda fare               8.0%
   4 Maize Flour - loose                  6.6%   Beef - with bones                           4.5%   batteries (dry cells)               5.9%   water                        7.4%
   5 Beans                                5.3%   Hotel and restaurants (food + beverages)    4.0%   petroleum jelly                     4.4%   hair dressing (women)        6.7%
   6 Beef - with bones                    4.8%   Kale + Traditional Vegetables               3.9%   detergents                          3.6%   country bus fare             4.6%
   7 Maize Flour - sifted                 3.6%   Beans                                       3.7%   hair dressing (women)               3.3%   detergents                   4.5%
   8 Cooking Fat                          3.4%   Bread (White + Brown)                       3.3%   hair cut (men)                      3.2%   hair cut (men)               4.3%
   9 Rice Grade 2                         3.1%   Maize Flour - sifted                        3.0%   match box                           2.9%   petroleum jelly              4.0%
  10 Cakes                                2.7%   Rice-Grade1-Pishori / Basmati               2.7%   country bus fare                    2.1%   sanitary pads                2.4%
  11 Potatoes                             2.7%   Cooking oil                                 2.7%   fever / pain killers eg panadol     2.0%   toilet paper                 2.3%
  12 Kale + Traditional Vegetables        2.5%   Tomatoes                                    2.4%   shoe polish / cream                 2.0%   Cell phone                   2.1%
  13 Cooking banana                       2.4%   Potatoes                                    2.2%   toothpaste                          1.8%   toothpaste                   2.1%
  14 Tomatoes                             2.3%   Maize Grain - Loose                         2.2%   sanitary pads                       1.8%   batteries (dry cells)        1.9%
  15 Chicken                              1.8%   Wheat Flour (White + Brown)                 2.1%   toilet soap                         1.8%   toilet soap                  1.7%
  16 Tea Leaves                           1.8%   milk - fresh packeted unflavoured           2.0%   medicine anti-malaria               1.8%   match box                    1.7%
  17 Bread (White + Brown)                1.6%   Rice Grade 2                                2.0%   books                               1.4%   medicine anti-malaria        1.4%
  18 Mutton / Goatmeat                    1.6%   Chicken                                     1.8%   Cell phone                          1.4%   body lotion                  1.4%
  19 Onion / Leeks                        1.3%   Mutton / Goatmeat                           1.7%   mattress                            1.2%   cups and saucers             1.3%
  20 milk - fresh packeted unflavoured    1.3%   Banana                                      1.6%   cold tablets / cough syrup          1.2%   cooking sufuria              1.1%
     Total top 20 items                  72.1%   Total top 20 items                         68.1%   Total top 20 items                 80.5%   Total top 20 items          82.6%


                                                          Table 6: Harmonized household variables

                                         Category                           Variable                                     Type
                                         Location                           strata                                       categorical
                                                                            urban                                        binary
                                         Household Characteristics          owns house                                   binary
                                                                            wall type                                    categorical
                                                                            roof type                                    categorical
                                                                            floor type                                   categorical
                                                                            improved drinking water source               binary
                                                                            improved sanitation facility                 binary
                                                                            access to electricity                        binary
                                                                            asset index from PCA                         continuous
                                                                            quartiles of asset index from PCA            categorical
                                                                            number of rooms in household                 continuous
                                                                            quartiles of number of rooms                 categorical
                                                                            number of persons in household               continuous
                                                                            number of children in household              continuous
                                                                            proportion of children in household          continuous
                                                                            number of adults in household                continuous
                                                                            proportion of adults in household            continuous
                                                                            number of seniors in household               continuous
                                                                            proportion of seniors in household           continuous
                                                                            dependency ratio by intervals                categorical
                                                                            at least one member is literate 15+          binary
                                                                            male household head                          binary
                                                                            household head age group                     categorical
                                                                            household head education level               categorical
                                                                            household head employment type               categorical




Table 7: Balance tests for KIHBS 2015/16 and CAPI pilot.

                                         KIHBS      CAPI                                                                         KIHBS      CAPI
                      Characteristics                                                            Characteristics
                                        2015/16     Pilot Difference                                                            2015/16     Pilot Difference
number of rooms in household              2.501     2.767  0.266***         Household floor type: other                           0.229     0.247   0.019**
                                         (0.026)   (0.035) (<0.001)                                                              (0.007)   (0.009)  (0.031)
Owns house                                0.595     0.606    0.011          Household dependency ratio: >0.2 & <0.5               0.228     0.263  0.035***
                                         (0.009)   (0.010)  (0.173)                                                              (0.005)   (0.006) (<0.001)
Improved drinking water source            0.765     0.629 -0.136***         Household dependency ratio: >=0.5 & <0.67             0.348     0.357    0.009
                                         (0.007)   (0.009) (<0.001)                                                              (0.005)   (0.007)  (0.217)
Improved sanitation facility              0.657     0.662    0.004          Household dependency ratio: >=0.67 & <=1              0.109     0.111    0.002
                                         (0.008)   (0.008)  (0.594)                                                              (0.003)   (0.004)  (0.567)
HH has access to electricity              0.435     0.414 -0.022***         Male household head                                   0.677     0.680    0.003
                                         (0.009)   (0.010)  (0.002)                                                              (0.005)   (0.006)  (0.739)
Number of children in household           1.632     1.716  0.084***         Household head age group: 30 - 44                     0.392     0.402    0.010
                                         (0.020)   (0.025) (<0.001)                                                              (0.005)   (0.007)  (0.276)
Proportion of children in household       0.312     0.323  0.012***         Household head age group: 45 - 59                     0.228     0.229    0.001
                                         (0.003)   (0.004)  (0.003)                                                              (0.004)   (0.005)  (0.820)
Number of adults in household             2.195     2.253  0.059***         Household head age group: 60+                         0.173     0.173    0.001
                                         (0.015)   (0.019)  (0.009)                                                              (0.004)   (0.005)  (0.880)
Proportion of adults in household         0.625     0.606 -0.020***         Household head education level: primary               0.458     0.457    -0.001
                                         (0.004)   (0.004) (<0.001)                                                              (0.006)   (0.007)  (0.935)
Number of seniors in household            0.158     0.162    0.005          Household head education level: secondary             0.355     0.283 -0.071***
                                         (0.004)   (0.006)  (0.410)                                                              (0.006)   (0.008) (<0.001)
Proportion of seniors in household        0.063     0.060    -0.003         Household head education level: tertiary              0.052     0.126  0.075***
                                         (0.002)   (0.003)  (0.225)                                                              (0.005)   (0.006) (<0.001)
At least one member is literate 15+       0.900     0.734 -0.167***         Household head employment category: self-employed     0.478     0.496   0.019**
                                         (0.003)   (0.008) (<0.001)                                                              (0.006)   (0.007)  (0.024)
Asset index from PCA                      0.007     0.008    0.000          Household head employment category: unemployed        0.082     0.176  0.094***
                                         (0.024)   (0.028)  (0.988)                                                              (0.003)   (0.005) (<0.001)
Household wall type: stone                0.175     0.221  0.046***         Household head employment category: other             0.036     0.017 -0.019***
                                         (0.009)   (0.011) (<0.001)                                                              (0.002)   (0.002) (<0.001)
Household wall type: wood                 0.100     0.195  0.096***         Quartiles of number of rooms: 2nd                     0.222     0.207  -0.015**
                                         (0.004)   (0.007) (<0.001)                                                              (0.005)   (0.006)  (0.027)
Household wall type: brick                0.081     0.070   -0.011*         Quartiles of number of rooms: 3rd                     0.205     0.332  0.127***
                                         (0.006)   (0.003)  (0.066)                                                              (0.005)   (0.008) (<0.001)
Household wall type: other                0.323     0.299  -0.023**         Quartiles of number of rooms: 4th                     0.218     0.152 -0.065***
                                         (0.011)   (0.011)  (0.015)                                                              (0.005)   (0.006) (<0.001)
Household roof type: grass                0.082     0.088    0.006*         Quartiles of asset index: 2nd                         0.316     0.274 -0.042***
                                         (0.004)   (0.004)  (0.074)                                                              (0.005)   (0.007) (<0.001)
Household roof type: other                0.095     0.101    0.006          Quartiles of asset index: 3rd                         0.205     0.256  0.052***
                                         (0.007)   (0.008)  (0.281)                                                              (0.005)   (0.007) (<0.001)
Household floor type: earth               0.294     0.269 -0.025***         Quartiles of asset index: 4th                         0.200     0.156 -0.045***
                                         (0.007)   (0.008) (<0.001)                                                              (0.006)   (0.007) (<0.001)
Observations                             21,585    12,662    34,247         Observations                                         21,585    12,662    34,247


Note: Standard errors of the means and p-values for the differences are shown in parentheses, based on an adjusted Wald
test taking the survey design into account.




           Table 8: Model selection for rapid approach and cross-survey estimation.

                                                   rapid                cross-survey
dataset                                            KIHBS 2015/16 pilot  KIHBS 2005/6
urban                                              0.0382     (1.71)    0.148***    (7.17)
owns house                                                              -0.0453*    (-2.15)
wall type (category 2)                                                  0.0454*     (2.24)
wall type (category 3)                             0.0419*    (2.16)    0.112***    (5.77)
wall type (category 4)                             -0.117***  (-5.18)
wall type (category 5)                             -0.0506*   (-2.46)   0.0730**    (2.74)
roof type (category 2)                             -0.143***  (-4.27)   -0.106***   (-5.18)
roof type (category 3)                             0.0985**   (3.08)    -0.0610*    (-2.54)
floor (category 2)                                 -0.157***  (-7.70)   -0.166***   (-9.33)
floor (category 3)                                 0.0673**   (2.98)    -0.111**    (-2.75)
improved drinking water source                     0.0398**   (2.64)    0.0532***   (3.88)
improved sanitation facility                       -0.0541**  (-3.02)   0.0671***   (5.17)
access to electricity                              0.0383     (1.63)    0.0991***   (4.25)
asset index from PCA                               0.0442**   (3.14)    0.110***    (19.98)
quartiles of asset index from PCA (2nd quartile)   0.0665**   (2.89)    0.0293      (1.46)
quartiles of asset index from PCA (3rd quartile)   0.118***   (3.69)    0.0349*     (2.37)
quartiles of asset index from PCA (4th quartile)   0.105      (1.95)
number of rooms in household                       0.0380***  (4.05)    0.0382***   (8.62)
quartiles of number of rooms (2nd quartile)        0.0524*    (2.22)    0.0263*     (1.97)
quartiles of number of rooms (3rd quartile)        0.0534     (1.76)
quartiles of number of rooms (4th quartile)        0.0126     (0.25)
number of persons in household                     -0.00942   (-0.20)   0.194       (1.34)
number of children in household                    -0.0562    (-1.17)   -0.226      (-1.56)
proportion of children in household                0.475*     (2.13)    -1.024***   (-16.40)
number of adults in household                      -0.148**   (-3.06)   -0.352*     (-2.43)
proportion of adults in household                  1.148***   (5.18)
number of seniors in household                     -0.179***  (-3.30)   -0.379**    (-2.60)
proportion of seniors in household                 1.162***   (5.04)
dependency ratio by intervals (2nd interval)                            0.0539**    (3.07)
dependency ratio by intervals (3rd interval)                            0.0237      (1.37)
at least one member is literate 15+                                     0.0528*     (2.44)
male household head                                -0.0384*   (-2.30)
household head age group (category 2)                                   -0.0413*    (-2.08)
household head age group (category 3)                                   -0.0322     (-1.40)
household head age group (category 4)                                   -0.0770**   (-2.68)
household head education level (category 2)        0.117***   (4.52)    0.0628**    (3.15)
household head education level (category 3)        0.134***   (4.54)    0.132***    (5.52)
household head education level (category 4)        0.186***   (5.24)    0.351***    (7.27)
household head employment type (category 2)        0.0632***  (3.30)
household head employment type (category 3)        -0.0418    (-1.58)   -0.0778***  (-3.89)
household head employment type (category 4)        -0.135     (-1.34)   0.0227      (1.28)
assigned 2nd module                                0.0224     (1.26)
assigned 3rd module                                -0.140***  (-7.88)
constant                                           -0.587**   (-2.62)   1.352***    (34.65)
N                                                  12658                12695
R-sq                                               0.373                0.511
adj. R-sq                                          0.371                0.509
AIC                                                21502.0              20232.4
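
The fit statistics in Table 8 can be cross-checked with the textbook formula for the adjusted R-squared, 1 - (1 - R^2)(n - 1)/(n - k - 1). The sketch below is illustrative only and is not the estimation code used for the paper. It assumes that the rows displayed in Table 8 are the complete set of regressors, which gives k = 35 for the rapid model and k = 33 for the cross-survey model (both counts exclude the constant), and that no survey-design correction enters the statistic. The function name adjusted_r2 is hypothetical.

```python
def adjusted_r2(r2: float, n_obs: int, n_regressors: int) -> float:
    """Textbook adjusted R-squared for a linear model with an intercept."""
    return 1.0 - (1.0 - r2) * (n_obs - 1) / (n_obs - n_regressors - 1)


# Assumption: regressor counts are tallied from the rows shown in Table 8
# (constant excluded); R-sq and N are the rounded values reported there.
rapid = adjusted_r2(r2=0.373, n_obs=12658, n_regressors=35)
cross = adjusted_r2(r2=0.511, n_obs=12695, n_regressors=33)

print(f"rapid: {rapid:.3f}  cross-survey: {cross:.3f}")
```

Under these assumptions the rapid model reproduces the reported 0.371. The cross-survey model comes out at about 0.510 against the reported 0.509, a gap that is within what rounding of the reported R-sq to three decimals can produce.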

