Global Poverty Monitoring Technical Note                               35




   Changes to the Extrapolation Method
      for Global Poverty Estimation

                       Daniel Gerszon Mahler and David Newhouse




                                     March 2024



Keywords: Extrapolation, nowcasting, missing data, national accounts




Development Data Group
Development Research Group
Poverty and Equity Global Practice Group


Abstract
This technical note summarizes changes to how the Poverty and Inequality Platform (PIP)
lines up survey-based estimates of poverty to a common reference year. The prior line-up
method assumed that welfare vectors grow in accordance with growth in real Household
Final Consumption Expenditure (HFCE) per capita for all countries except for countries
in Sub-Saharan Africa, where growth in real GDP per capita was used instead. This note
leverages PIP data to test various alternative line-up rules and evaluates their performance
out of sample. It proposes an equally simple rule that can reduce the error as measured by
the mean absolute deviation by 7.5%, which is estimated to amount to 60% of the potential
error that can be reduced given available information. This rule is that upper middle-
income and high-income countries use growth in real HFCE per capita, while low and
lower middle-income countries use growth in real GDP per capita, and that only 70% of
growth in either HFCE or GDP passes through to growth in consumption vectors.


Both authors are with the World Bank Development Economics Data Group. Corresponding author:
Daniel Gerszon Mahler (dmahler@worldbank.org). The authors are thankful for comments and guidance
received from the Global Poverty Working Group under the leadership of Deon Filmer, Haishan Fu, Luis-
Felipe Lopez-Calva, and Carolina Sanchez-Paramo. The authors are also grateful for comments from
Christoph Lakner, Espen Beer Prydz and Sergio Olivieri. The authors gratefully acknowledge financial
support from the UK government through the Data and Evidence for Tackling Extreme Poverty (DEEP)
Research Programme.




The Global Poverty Monitoring Technical Note Series publishes short papers that document methodological aspects of
the World Bank’s global poverty estimates. The papers carry the names of the authors and should be cited accordingly.
The findings, interpretations, and conclusions expressed in this paper are entirely those of the authors. They do not
necessarily represent the views of the International Bank for Reconstruction and Development/World Bank and its
affiliated organizations, or those of the Executive Directors of the World Bank or the governments they represent. Global
Poverty Monitoring Technical Notes are available at https://pip.worldbank.org/publication.
   1. Introduction
This note summarizes changes to the method used by the Poverty and Inequality Platform (PIP)
for extrapolating, or “lining up”, country poverty rates to a reference year. Poverty rates are
calculated using household surveys that collect household per capita consumption or income
(referred to henceforth as ‘welfare’) depending on the country. Lining up these survey-based
estimates is necessary because the surveys measuring poverty are not fielded each year, but
global and regional poverty estimates need to be reported for a common year. The prior method
shifted the welfare measures obtained from a household survey by a common scale factor, which
is equal to a measure of growth taken from national accounts data: growth in real Household
Final Consumption Expenditure per capita (henceforth ‘growth in HFCE’) for countries outside
of Sub-Saharan Africa, and growth in real GDP per capita (henceforth ‘growth in GDP’) for
countries in Sub-Saharan Africa (Prydz et al. 2019).

This rests on four key assumptions:
   1. The shape of the welfare distribution changes little over the extrapolation period.
   2. Growth in welfare can be proxied reasonably well by growth from national accounts data.
   3. All growth from national accounts data is passed through to growth in welfare.
   4. In Sub-Saharan Africa, growth in GDP is a better proxy for growth in welfare than growth
        in HFCE, while the opposite is the case everywhere else.

In recent research, we have found that the first two assumptions are difficult to improve upon
without using microsimulation tools (Mahler et al. 2022). In particular, efforts to model inequality
add little to nowcasting accuracy, and more sophisticated models using a thousand variables to
predict growth in the mean perform about as well as scaling up the mean by a fraction of GDP
growth. In contrast, the third assumption is not supported by the data, as there is a large and
growing gap between means from national accounts and those from survey data (Prydz et al.
2022). Little systematic research has been carried out on the final assumption regarding the
relevance of HFCE in Sub-Saharan Africa for extrapolations. The decision to use growth in GDP
instead of HFCE dates to the early 2000s and was justified partly by the difficulty of obtaining
HFCE estimates in Sub-Saharan Africa. The other justification is the weak empirical correlation
between growth in HFCE and survey consumption in that region using data from the 1990s,
although the reported correlation was equally weak in several other regions (Ravallion 2003).

To address concerns about the third and fourth assumptions, we propose two changes,
summarized in Table 1. We propose these changes because they combine accuracy with
simplicity.

The first major change introduces a “passthrough rate”, a factor that determines what share of
growth from national accounts “passes through” to the survey welfare measure. We adopt a
passthrough rate of 0.7 for countries that use consumption to measure welfare. In other words, if
a country using consumption grew 2 percent in the year following a survey according to national
accounts, consumption measured in the survey would be increased by 1.4 percent. The selection
of a 0.7 passthrough rate, and the decision to apply it only to countries that use a consumption
aggregate, are based on empirical evidence presented in what follows, which shows that growth
in mean consumption measured in surveys systematically falls short of growth in national
accounts. There is no similar pattern for income.

                            Table 1: Summary of proposed changes

                             Prior method                        New method
    Passthrough rate         • 1 (full passthrough) for all      • 0.7 for consumption aggregates
                               countries                         • 1 for income aggregates
    Measure of               • HFCE outside of Sub-Saharan       • HFCE for upper middle-income and
    economic growth            Africa                              high-income countries
                             • GDP in Sub-Saharan Africa         • GDP for low and lower middle-
                                                                   income countries


The World Bank has used passthrough rates for nowcasting and forecasting since at least 2016
(World Bank 2016). This has generated an inconsistency between the method used for
extrapolating backwards and forwards for historical estimates, and the method used for
generating more recent estimates. The approach developed here removes this inconsistency.

The second major change alters the measure of economic growth that is used to scale up survey-
based estimates of welfare. In particular, we suggest using GDP growth for all countries classified
as low-income or lower middle-income, and HFCE growth for countries classified as upper
middle-income and high-income.
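
The rule in Table 1 amounts to a small lookup. The following Python sketch encodes it under stated assumptions: the function name `lineup_rule` and the string labels for welfare type and income group are illustrative choices made here, not part of PIP's code base.

```python
def lineup_rule(welfare_type: str, income_group: str) -> tuple[str, float]:
    """Return (national accounts measure, passthrough rate) under the new method.

    welfare_type: "consumption" or "income" (hypothetical labels).
    income_group: "LIC", "LMIC", "UMIC" or "HIC" (World Bank income groups).
    """
    if welfare_type not in {"consumption", "income"}:
        raise ValueError(f"unknown welfare type: {welfare_type}")
    if income_group not in {"LIC", "LMIC", "UMIC", "HIC"}:
        raise ValueError(f"unknown income group: {income_group}")

    # GDP growth for low and lower middle-income countries, HFCE growth otherwise.
    measure = "GDP" if income_group in {"LIC", "LMIC"} else "HFCE"
    # Only 70% of growth passes through for consumption aggregates.
    passthrough = 0.7 if welfare_type == "consumption" else 1.0
    return measure, passthrough


print(lineup_rule("consumption", "LIC"))  # ('GDP', 0.7)
print(lineup_rule("income", "HIC"))       # ('HFCE', 1.0)
```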

A myriad of methods exist to extrapolate estimates of poverty backwards and forwards in time
(World Bank 2023). In this paper, we restrict ourselves to a narrow set of methods: those that
extrapolate entire welfare vectors using a fraction of growth rates in national accounts. These are
referred to as “distribution-neutral” methods because they assume that a common scale factor is
applied to all households’ welfare in the survey. It is possible that microsimulation tools, which
allow different households to experience different growth rates, could outperform the methods
used here, although we are not yet aware of systematic global evidence that demonstrates this.
For a discussion of how to evaluate microsimulation models, see Walsh and De Menezes (2023).

The rest of the paper is organized as follows. The next section outlines the prior method used for
lining-up in PIP while section 3 details how we test alternative approaches. Section 4 presents the
main results and section 5 contains robustness checks. Section 6 shows the implications for global
and regional poverty while section 7 discusses the results and concludes.




    2. Prior approach
Few countries have survey estimates of poverty available every year. To estimate poverty at the
regional and global level, the survey estimates need to be aligned to a reference year and
aggregated. Such alignment and aggregation require assumptions about how to interpolate and
extrapolate data. Below we explain how the extrapolation was done prior to the changes
implemented with this paper.

For countries that do not have welfare aggregates at or since a specific reference year, but for
which earlier welfare aggregates are available, PIP extrapolates their most recent aggregate
forward using growth rates from national accounts. This is done by first finding the growth in
national accounts that occurred between the survey year $s$ and the reference year $r$,
$g_{NA,s,r}$, and scaling the survey welfare distribution, $F(y_s)$, by this growth factor. This
yields an estimate of the welfare distribution at the reference year, $F(y_r)$, as summarized below:

                     $$F(y_r) = (1 + \delta \cdot g_{NA,s,r}) \cdot F(y_s)$$     [1]

Prior to the changes proposed in this paper, real GDP per capita was used for countries in
Sub-Saharan Africa, while real HFCE per capita was used for all other countries. $\delta$
indicates the share of growth from national accounts passed through to growth in welfare. In the
prior approach, this was assumed to be 1.

Poverty for the reference year is then estimated using this extrapolated distribution. The
extrapolation method assumes distribution-neutral growth, i.e. that everyone’s welfare grows at
the same rate. This implies that inequality is assumed to stay constant. A similar approach is used
to extrapolate backwards, when the earliest survey estimate available is more recent than the
desired reference year.

Figure 1 shows the consumption aggregate from the 2015.5 Ethiopian survey expressed in 2017
PPP-adjusted dollars, and the 2018 Ethiopian distribution one would get from extrapolating this
welfare vector to 2018 using the method described above. The 2015.5 daily consumption mean is
$3.12 per capita and the growth rate in national accounts between 2015.5 and 2018 is 14.5%. The
positive growth pushes the distribution to the right, changing the share of extreme poor from
30.8% in 2015.5 to 22.3% in 2018.
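
A minimal Python sketch of the distribution-neutral extrapolation in equation [1] is given below. It uses the mean ($3.12) and growth rate (14.5%) quoted for Ethiopia, but the welfare vector itself is synthetic (a lognormal draw with an arbitrary dispersion) and the $2.15 poverty line is an assumption made for the illustration, so the headcounts it prints will not reproduce the 30.8% and 22.3% reported above. The function names are hypothetical.

```python
import numpy as np


def extrapolate_welfare(welfare, na_growth, passthrough=1.0):
    """Scale every welfare value by (1 + passthrough * na_growth), as in equation [1]."""
    return (1.0 + passthrough * na_growth) * np.asarray(welfare, dtype=float)


def headcount(welfare, poverty_line):
    """Unweighted share of observations strictly below the poverty line."""
    return float(np.mean(np.asarray(welfare) < poverty_line))


# Synthetic lognormal welfare vector (NOT survey data), rescaled to a mean of 3.12.
rng = np.random.default_rng(0)
welfare_2015 = rng.lognormal(mean=0.0, sigma=0.6, size=10_000)
welfare_2015 *= 3.12 / welfare_2015.mean()

growth = 0.145       # national accounts growth between 2015.5 and 2018 (from the text)
poverty_line = 2.15  # assumed daily poverty line in 2017 PPP dollars

welfare_full = extrapolate_welfare(welfare_2015, growth, passthrough=1.0)
welfare_partial = extrapolate_welfare(welfare_2015, growth, passthrough=0.7)

print(round(float(welfare_full.mean()), 2))     # 3.57 (full passthrough)
print(round(float(welfare_partial.mean()), 2))  # 3.44 (passthrough of 0.7)
print(headcount(welfare_2015, poverty_line), headcount(welfare_full, poverty_line))
```

Because every value is multiplied by the same factor, relative inequality in the extrapolated vector is unchanged; only the position of the distribution relative to the poverty line moves.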

The extrapolation only applies after the latest survey or before the first survey. In cases where
the reference year falls between two surveys, poverty is interpolated for the reference year
using the nearest survey on each side of the reference year, and the resulting estimates are
averaged. In general, such interpolations are less dependent on whether a passthrough rate
less than 1 is applied and which measure from national accounts is used. For more information
on interpolations, see Prydz et al. (2019). This paper only deals with extrapolations.




                              Figure 1: Illustration of extrapolation




   3. Data and Methodology
   3.1 Sample selection
We conduct an empirical analysis of 1,940 surveys from PovcalNet, the precursor to PIP, covering
158 countries between 1981 and 2019. We rely on the spring 2021 vintage of the data. We merged
this data with national accounts data from the World Development Indicators, supplemented
when necessary with data from the World Economic Outlook and Maddison Project Database.
We retain information on real GDP per capita, real HFCE per capita, real FCE per capita (the sum
of HFCE and government expenditure) and real GNI per capita. We chose these variables based
on what has been used in past research to extrapolate welfare vectors forward and what was
found to be most predictive of growth in mean welfare in a companion paper (Mahler et al. 2022).

We convert these data into a dataset of spells. A spell refers to two adjacent household surveys
from a country measured using comparable welfare aggregates, using PIP’s information on
survey comparability within countries over time. We retain 1,501 spells spanning 141 countries.
For each spell, we calculate the annualized growth in mean welfare and the annualized growth
in the four national accounts variables mentioned above.
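
The construction of spells can be sketched as follows. This is an illustration under assumptions rather than the code used for the analysis: the `Survey` fields, the `comparable_group` identifier, and the compound formula for annualizing growth are choices made here, and the numbers in the example are invented.

```python
from dataclasses import dataclass


@dataclass
class Survey:
    year: float            # survey year, possibly decimal (e.g. 2015.5)
    mean_welfare: float    # mean welfare per capita
    comparable_group: int  # hypothetical id: surveys sharing an id are comparable


def annualized_growth(start_value, end_value, years):
    """Compound annual growth rate between two values (assumed annualization)."""
    return (end_value / start_value) ** (1.0 / years) - 1.0


def build_spells(surveys):
    """Pair each survey with the next one in time, keeping only comparable pairs."""
    ordered = sorted(surveys, key=lambda s: s.year)
    spells = []
    for first, second in zip(ordered, ordered[1:]):
        if first.comparable_group != second.comparable_group:
            continue  # skip spells across a comparability break
        spells.append({
            "start": first.year,
            "end": second.year,
            "growth_survey": annualized_growth(
                first.mean_welfare, second.mean_welfare, second.year - first.year
            ),
        })
    return spells


# Invented example: three comparable surveys for one country.
surveys = [
    Survey(2010, 2.00, 1),
    Survey(2015, 2.00 * 1.02 ** 5, 1),
    Survey(2020, 2.00 * 1.02 ** 5 * 1.01 ** 5, 1),
]
spells = build_spells(surveys)
print([(s["start"], s["end"], round(s["growth_survey"], 4)) for s in spells])
```

The same `annualized_growth` helper could be applied to each national accounts series over the start and end year of a spell.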

Instead of computing spells from adjacent household surveys, we could have calculated spells
using any two comparable estimates, regardless of whether other surveys were conducted in between.
Suppose for example a country has comparable welfare aggregates in 2010, 2015, and 2020. Our
method consists of creating two spells from 2010-15 and 2015-20, but we could likewise have
created a spell from 2010-2020. Including these longer spells does not change our qualitative
results but lowers the errors of all models we test, likely because idiosyncratic errors decrease in
importance over longer spells (Appendix Figure A.1). We chose only to work with adjacent spells
in the main specification because these spell lengths reflect the approximate length for which
actual extrapolations take place. In other words, the performance over 20 years for a country with
annual data is less relevant for this exercise because we rarely need to extrapolate 20 years for
such a country.

Another choice would have been not to restrict our sample to comparable spells but use any spells,
even if the welfare aggregates are not comparable. This could be relevant under the following
rationale: Suppose that incomparabilities often happen because the survey instrument changes
in a way such that more consumption gets captured in the newer survey. This would imply that
passthrough rates would be higher (as welfare growth between non-comparable surveys would
be higher than between comparable surveys, all else equal). If we believe that there will be similar
noncomparabilities in the future, then accounting for them in the passthrough rates would give
nowcasts closer to what eventually will be captured from survey data. We opt only to use
comparable spells because we are interested in extrapolating the current welfare vector forward
and backwards in time in a manner that retains comparability with the survey-based welfare
aggregate. Because of this, the extrapolations should be interpreted as changes in poverty due to
economic factors, excluding changes in how consumption or income is measured in surveys.

Survey data from before 2000 are rarely used, and we are arguably more interested in extrapolating
poverty from 2000 onwards. As another robustness check, we use only spells that start in
2000 or later to see if this subsample yields different results. This does not appear to be the case,
though it again lowers the errors of all models, likely because welfare aggregates were measured
with more noise in earlier decades. With post-2000 data, Final Consumption Expenditure performs
relatively better as the national accounts measure (Appendix Figure A.2).

    3.2 Model estimation
To estimate passthrough rates and evaluate different types of national accounts data, we run
regressions of the following form:

     $$g_{surv,c,s} = \beta \cdot g_{NA,c,s} + \gamma \cdot g_{NA,c,s} \cdot X_{c,s} + \varepsilon_{c,s}$$        [2]

Here $g_{surv,c,s}$ represents the annualized growth rate in mean welfare (consumption or income)
per capita in country $c$ for spell $s$. $g_{NA,c,s}$ represents the annualized growth rate taken from
national accounts data between the two surveys of a particular spell. $X_{c,s}$ represents a candidate
characteristic for calculating separate passthrough rates for different countries or spells. We
examine seven different characteristics: region, welfare type (income vs. consumption), fragility
status, income group, the decade of the spell, the length of the spell, and the sign of the growth
rate. For income group and fragility status, we use the classification at the end of the spell to
maximize the number of spells for which we can assign an income group or fragility status.1
$\beta + \gamma \cdot X_{c,s}$ equals the estimated passthrough rate for country $c$ and spell $s$: the
percentage point change in the welfare growth rate associated with a one percentage point increase
in the national accounts growth rate.

1 Spells ending before 1989 have no income classification and spells ending before 2000 have no fragility classification,
using World Bank definitions. When running regressions, we assign these spells to a separate group so the regressions

The specification largely follows Ravallion (2003), with the exception that we do not include an
intercept term. Including an intercept would make the passthrough rates harder to communicate.
Rather than the passthrough rate equaling $\beta + \gamma \cdot X_{c,s}$, it would equal
$\beta + \gamma \cdot X_{c,s} + \alpha / g_{NA,c,s}$, and
every single spell would have its own passthrough rate based on its national accounts growth
rate. Including an intercept would also imply that even if a country has no growth in its national
accounts, its mean welfare would grow by $\alpha$, which seems counterintuitive. For these reasons,
we prefer the specification without intercepts. As a robustness check, we ran the regressions with
an intercept. The intercepts were rarely significantly different from zero (a finding similar to
Ravallion 2003) and excluding them gave better out-of-sample accuracy (Appendix Figure A.3).
This was also true when we included group-specific intercepts defined by $X_{c,s}$.
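
The effect of an intercept on the passthrough rate can be written out in one step. The notation below is chosen here for illustration: $g_{surv,c,s}$ and $g_{NA,c,s}$ denote survey and national accounts growth for country $c$ and spell $s$, $X_{c,s}$ the covariate, $\beta$ and $\gamma$ the slope coefficients, and $\alpha$ the intercept. Ignoring the error term and dividing fitted survey growth by national accounts growth gives the implied share of growth passed through:

```latex
\begin{align*}
g_{surv,c,s} &= \alpha + \beta \, g_{NA,c,s} + \gamma \, g_{NA,c,s} \, X_{c,s} \\
\frac{g_{surv,c,s}}{g_{NA,c,s}} &= \beta + \gamma \, X_{c,s} + \frac{\alpha}{g_{NA,c,s}}
\end{align*}
```

The last term differs for every spell and is undefined when national accounts growth is zero, which is why a model with an intercept does not yield a single rate per group.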

When running the regressions, we weight each country equally, meaning that for a country with
10 spells, each spell gets a weight of 0.1, while for a country with two spells, each spell gets a
weight of 0.5. HFCE, GNI, and FCE estimates are missing for some spells. To avoid the results
being driven by the use of different samples across regressions, we replace missing predictions
when HFCE, GNI, or FCE is missing with the predictions from the same regression using GDP
data.
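
A hedged sketch of the estimation step follows: weighted least squares without an intercept, where each spell is weighted by the inverse of its country's number of spells, plus the fallback to GDP-based predictions. It assumes a single 0/1 covariate for simplicity, and the function names and the synthetic data are illustrative; this is not the code used for the results in this note.

```python
from collections import Counter

import numpy as np


def country_weights(countries):
    """Weight each spell by 1 / (number of spells of its country)."""
    counts = Counter(countries)
    return np.array([1.0 / counts[c] for c in countries])


def fit_passthrough(g_survey, g_na, group, countries):
    """Weighted least squares of g_survey on g_na and g_na * group, no intercept.

    group is a 0/1 indicator; the passthrough rate is beta for group 0
    and beta + gamma for group 1.
    """
    g_survey = np.asarray(g_survey, dtype=float)
    g_na = np.asarray(g_na, dtype=float)
    group = np.asarray(group, dtype=float)
    # Multiplying rows by sqrt(weight) turns ordinary into weighted least squares.
    root_w = np.sqrt(country_weights(countries))
    design = np.column_stack([g_na, g_na * group])
    coef, *_ = np.linalg.lstsq(design * root_w[:, None], g_survey * root_w, rcond=None)
    return float(coef[0]), float(coef[1])


def fill_missing(pred, pred_gdp):
    """Use the GDP-based prediction wherever the other prediction is missing (NaN)."""
    pred = np.asarray(pred, dtype=float)
    return np.where(np.isnan(pred), np.asarray(pred_gdp, dtype=float), pred)


# Synthetic, noise-free data built with passthrough 0.7 (group 0) and 1.0 (group 1).
countries = ["A", "A", "B", "B", "B", "C"]
g_na = np.array([0.02, 0.04, -0.01, 0.03, 0.05, 0.01])
group = np.array([0, 0, 1, 1, 1, 0])
g_survey = 0.7 * g_na + 0.3 * g_na * group

beta, gamma = fit_passthrough(g_survey, g_na, group, countries)
print(round(beta, 3), round(beta + gamma, 3))  # 0.7 1.0
```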

    3.3 Model evaluation
To evaluate the predictions, we use ten-fold cross-validation to ensure that we do not overfit the
model. We repeat ten-fold cross-validation ten times to average out any randomness from a
particular fold partition. We also tried using temporal-block cross-validation (Roberts et al. 2017),
which, instead of dividing the spells into random folds, sequentially holds out data from a specific
time range. This does not change our results (Appendix Figure A.4).
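
Repeated ten-fold cross-validation can be sketched as below. To keep the example short, it fits a single passthrough rate with no covariate and no country weights, both of which are simplifications relative to the regressions described above; the function name is hypothetical.

```python
import numpy as np


def repeated_kfold_error(g_survey, g_na, n_folds=10, n_repeats=10, seed=0):
    """Out-of-sample mean absolute deviation of a single-passthrough model."""
    g_survey = np.asarray(g_survey, dtype=float)
    g_na = np.asarray(g_na, dtype=float)
    rng = np.random.default_rng(seed)
    n = len(g_survey)
    errors = []
    for _ in range(n_repeats):
        # A fresh random partition of the spells into folds for each repetition.
        folds = np.array_split(rng.permutation(n), n_folds)
        abs_dev = np.empty(n)
        for test_idx in folds:
            train = np.ones(n, dtype=bool)
            train[test_idx] = False
            # No-intercept OLS slope estimated on the training spells only.
            rate = (g_na[train] @ g_survey[train]) / (g_na[train] @ g_na[train])
            abs_dev[test_idx] = np.abs(rate * g_na[test_idx] - g_survey[test_idx])
        errors.append(abs_dev.mean())
    # Average over repetitions to smooth out the randomness of any one partition.
    return float(np.mean(errors))


# Synthetic spells: true passthrough of 0.7 plus noise.
rng = np.random.default_rng(1)
g_na = rng.normal(0.03, 0.03, size=200)
g_survey = 0.7 * g_na + rng.normal(0.0, 0.02, size=200)
print(repeated_kfold_error(g_survey, g_na))
```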

To evaluate the fit, we calculate the mean absolute deviation (MAD) between the predicted and
the withheld survey growth rate in mean welfare:

     $$MAD = \frac{1}{N_C} \sum_{c \in C} \frac{1}{N_{S_c}} \sum_{s \in S_c} \left| \hat{g}_{surv,c,s} - g^{*}_{surv,c,s} \right|$$              [3]

Here $C$ is the set of countries, $S_c$ is the set of spells of a country, $N_C$ is the number of countries in
the data, $N_{S_c}$ is the number of spells of country $c$, $g^{*}_{surv,c,s}$ is the (true) growth rate in welfare
calculated from the withheld survey data, and $\hat{g}_{surv,c,s}$ is the predicted growth rate in welfare
estimated from the regression.

all have the same sample of spells. In practical implementations of our preferred approach, which relies on income
group status, we suggest forward-extrapolating the last welfare vector year-by-year continually using the income
classification of the year in question. For backward-extrapolating the welfare vector to the 1980s, we suggest using the
first income classification available.

Each country is given equal weight in the calculation of the mean
absolute deviation, so the accuracy measure is not disproportionately weighted towards
countries with more survey data. We prefer the mean absolute deviation over the mean squared
error because the latter gives disproportionate weight to extreme outcomes. In Appendix Figure
A.5, we show that using the mean squared error gives similar results.
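
The country-weighted MAD of equation [3] can be sketched in a few lines of Python. The function name and the toy numbers are illustrative assumptions; the example shows how equal country weights differ from pooling all spells.

```python
from collections import defaultdict


def country_weighted_mad(countries, predicted, observed):
    """Average over countries of each country's mean absolute deviation."""
    per_country = defaultdict(list)
    for country, pred, obs in zip(countries, predicted, observed):
        per_country[country].append(abs(pred - obs))
    # Inner mean over a country's spells, then outer mean over countries.
    country_means = [sum(devs) / len(devs) for devs in per_country.values()]
    return sum(country_means) / len(country_means)


# Toy example: country A has two spells (errors 1 and 3), country B has one (error 4).
countries = ["A", "A", "B"]
predicted = [1.0, 3.0, 4.0]
observed = [0.0, 0.0, 0.0]

print(country_weighted_mad(countries, predicted, observed))  # 3.0
# Pooling all spells would instead give (1 + 3 + 4) / 3, about 2.67,
# which tilts the measure towards the country with more surveys.
```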

When using the MAD, we create an inconsistency between how we estimate passthrough rates
(which relies on an OLS regression, and hence minimizes the MSE) and how we assess the
performance of the model fit, which relies on absolute deviations. As a robustness check, we
instead run quantile regressions estimating the conditional median, as median regression
minimizes the sum of absolute deviations. Doing so gives rise to the same model selection (Appendix
Figure A.6), with one exception: whereas our main results suggest using GDP for low-income
and lower middle-income countries, quantile regressions give a marginally lower error when
only using GDP for low-income countries. Yet, as the error of that model is greater than that of
our preferred model using OLS, we stick to OLS.
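
To illustrate the difference between the two estimators, consider the simplest case of one passthrough rate, no covariate and no intercept. In that special case the slope minimizing the (weighted) sum of absolute deviations is a weighted median of the ratios of survey growth to national accounts growth, with weights proportional to the absolute national accounts growth rate. The sketch below implements this special case only; it is not the quantile regression routine used for Appendix Figure A.6, and the function name and numbers are invented.

```python
import numpy as np


def median_passthrough(g_survey, g_na, weights=None):
    """Slope b minimizing sum_i w_i * |g_survey_i - b * g_na_i| (no intercept)."""
    g_survey = np.asarray(g_survey, dtype=float)
    g_na = np.asarray(g_na, dtype=float)
    w = np.ones_like(g_na) if weights is None else np.asarray(weights, dtype=float)
    keep = g_na != 0  # spells with zero growth contribute a constant, whatever b is
    ratios = g_survey[keep] / g_na[keep]
    mass = w[keep] * np.abs(g_na[keep])
    order = np.argsort(ratios)
    cumulative = np.cumsum(mass[order])
    # First ratio at which the cumulative mass reaches half of the total.
    idx = np.searchsorted(cumulative, 0.5 * cumulative[-1])
    return float(ratios[order][idx])


def ols_passthrough(g_survey, g_na):
    """No-intercept OLS slope, for comparison."""
    g_survey = np.asarray(g_survey, dtype=float)
    g_na = np.asarray(g_na, dtype=float)
    return float((g_na @ g_survey) / (g_na @ g_na))


# Invented example with one outlying spell.
g_na = [1.0, 1.0, 1.0]
g_survey = [0.5, 0.7, 2.0]
print(median_passthrough(g_survey, g_na))  # 0.7
print(ols_passthrough(g_survey, g_na))     # about 1.07, pulled up by the outlier
```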

We do not estimate the uncertainty of the mean absolute deviation of the different models. This
is largely because our accuracy measures are obtained using cross-validation. Estimating proper
measures of uncertainty from statistics obtained using cross-validation is not straightforward due
to correlations between the measured accuracies of each fold (Bates et al. 2022). Although nested
cross-validation methods show some promise, their ability to produce accurate uncertainty
estimates has not been established in all settings.


   4. Results
   4.1 Finding the best performing model
We first calculate the mean absolute deviation from running various iterations of regression [2]
with different national accounts measures and different covariates to estimate group-specific
passthrough rates. We consider the candidate covariates from section 3.2, for which passthrough
rates are allowed to differ, the prior practice of setting the passthrough rate equal to one, and
a constant passthrough rate estimated for all observations (without using any covariate,
referred to as “no covariate” in the figures that follow). The results are shown in Figure 2. Each
dot is the result of a different regression with different estimated passthrough rates (save for when
the passthrough rate is set equal to one). Each color shows the regression results using a different
national accounts measure in equation [2].




                  Figure 2: Error by national accounts measure and covariate




                       Note: The legend refers to the national accounts measure used. X-axis
                       categories are sorted by the average error across national accounts
                       measures. The error with FCE and passthrough=1 is larger than the
                       maximum value of the y-axis. “No covariate” refers to running the
                       regression without a covariate and estimating one unique
                       passthrough rate for all observations.



All methods yield errors that are quite high. The prior method, which used GDP for Sub-Saharan
Africa and HFCE for the remaining countries along with a passthrough equal to one, has an
average error of 3.58. This means that on average across all spells, the annualized predicted
growth rate in mean welfare is 3.58 percentage points off the survey-based estimates. By
comparison, the standard deviation of the distribution of annualized growth in mean welfare is
6 percentage points. The error is reduced when passthrough rates are applied. Most notably, a
passthrough rate that differs by welfare type (income or consumption) provides the lowest average
error. Using such a passthrough rate together with GDP for Sub-Saharan Africa and HFCE
elsewhere would reduce the error from 3.58 to 3.37, a reduction of 6%.

Somewhat surprisingly, using passthrough rates that vary by covariates can yield higher average
errors than using a constant passthrough rate for all observations. This indicates that it is very
easy to overfit the data. To check this more thoroughly, we look at the average errors if an income-
consumption specific passthrough rate is used in conjunction with region and/or income group
specific passthrough rates -- the two candidate covariates that provided the best fit with the mixed
national accounts measure. Figure 3 shows the results. Evidently, combining income-consumption
passthrough rates with region- and/or income-group-specific passthrough rates increases the
error. Hence, more complex models easily increase the error.




             Figure 3: Error by national accounts measure and multiple covariates




                       Note: The legend refers to the national accounts measure used.

The evidence thus far supports the national accounts measure that uses GDP for Sub-Saharan
Africa and HFCE for the rest, while having a separate passthrough rate for income and
consumption aggregates. This mixed national accounts measure may work well because the
calculation of GDP in developing countries gets relatively more attention than HFCE. Contrary to
common belief, there is no compelling evidence that developing countries are worse at
measuring GDP growth than developed countries (Angrist et al. 2021). HFCE may be measured
more poorly in developing countries because countries with low statistical capacity measure
HFCE as the residual portion of GDP. If this is indeed the case, it would be better to use HFCE
for countries with high statistical capacity and GDP for countries with low statistical capacity,
rather than using GDP specifically for countries in Sub-Saharan Africa.

Checking this requires a measure of statistical capacity. This runs into the challenge that there is
no cross-country series of statistical capacity dating from 1981 to the present time as far as we
know. Two of the World Bank’s attempts at measuring statistical capacity (the statistical capacity
index and its successor, the statistical performance indicators) do not have sufficiently long
temporal coverage. We therefore proxy statistical capacity with country income groups, which
are based on thresholds for per capita GNI determined by the World Bank. We see if using GDP
for countries below a certain income level can improve the fit. The results are shown in Figure 4.
In Appendix Figure A.7, we proxy statistical capacity with the share of years in which an
international poverty estimate exists within a five-year window on each side of a reference year.
This does not give rise to a lower error than what is shown below.




                     Figure 4: Error by use of GDP for various subgroups




                      Note: GDP shows the error if only using GDP data, and similarly for
                      HFCE. LIC = Low-income country, LMIC= Lower middle-income
                      country, SSA = Sub-Saharan Africa.


Using GDP for low-income and lower middle-income countries (LICs and LMICs) instead of for
Sub-Saharan Africa reduces the error from 3.37 to 3.32. One could question whether it is precisely
at the LMIC/UMIC cut-off that HFCE starts being more accurate at predicting growth in welfare.
We explore this by looking at the errors of only using GDP or HFCE at any given welfare level.
Though average welfare does not map exactly to income groups, for each average welfare level,
we overlay the likeliest income group. The results are shown in Figure 5.

                    Figure 5: Error using GDP and HFCE by welfare level




                 Note: Average error using only GDP or only HFCE data for spells around a
                 given mean welfare level. Income-group overlays show the likeliest income
                 group at each welfare level.

It turns out that it is exactly at the UMIC threshold that HFCE tends to produce a lower error than
GDP. For that reason, our preferred model is to use GDP for LICs and LMICs and HFCE for UMICs
and HICs. Doing so leaves us with a reduction in the error of 7.5% vis-à-vis the prior method.
Although this may seem like a small reduction, much of the error in the estimates is irreducible,
so getting to zero, or even close to zero, is impossible. One useful benchmark is the
error from using 1000+ variables and machine learning to predict growth in mean consumption.
Such an exercise from Mahler et al. (2022) gave an error of 3.16 percentage points. If we treat this
as a lower bound, the suggested model would eliminate more than 60% of the reducible error.
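
The arithmetic behind the last statement can be reconstructed from the figures quoted in the text (a prior error of 3.58, a reduction of 7.5%, and a benchmark error of 3.16). The rounding below is ours and is meant only as a rough check:

```latex
\begin{align*}
\text{reduction achieved} &= 0.075 \times 3.58 \approx 0.27 \\
\text{reducible error} &= 3.58 - 3.16 = 0.42 \\
\text{share eliminated} &\approx 0.27 / 0.42 \approx 0.64
\end{align*}
```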


   4.2 Estimating the passthrough rates
The best-performing model from the prior analysis separates spells by whether an income or
consumption aggregate is used and by whether the country is a LIC/LMIC or UMIC/HIC. This
means that there will be four passthrough rates. Estimating these passthrough rates on all past
data gives the results shown in Figure 6. Tables with the underlying regression output are shown
in Appendix Table 1.

                               Figure 6: Estimated passthrough rates




               Note: Estimated passthrough rates for consumption aggregates (left part) and
               income aggregates (right part). GDP data is used for LICs and LMICs while HFCE
               data is used for UMICs and HICs. The error bars represent 95 percent
               confidence intervals.

The two consumption passthrough rates are both close to, and not statistically different from, 0.7.
For that reason, we suggest proxying consumption passthrough rates by 0.7, denoted by the red
horizontal line in the left panel of Figure 6. The income passthrough rate for UMICs and HICs
using HFCE is 1.02, and not statistically different from 1. For that reason, we suggest proxying it
by 1. The estimated passthrough rate for income aggregates for LICs and LMICs using GDP is 1.34,
but this estimate is associated with large uncertainty. This is because few LICs and LMICs use
income aggregates. There are only 169 such cases, most of which concern spells from countries
in Latin America from the 1990s and 2000s. Because this combination of welfare and national
accounts measure is rare and 1 lies within the confidence interval, we propose also to use a
passthrough rate of 1 in these cases.
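
The comparison with the proxies 0.7 and 1 can be checked against the coefficients and standard errors in Table A.1. The sketch below assumes a normal approximation (a critical value of 1.96) for the 95 percent interval; the intervals behind Figure 6 may have been computed differently.

```python
# Coefficient, standard error (Table A.1) and the proposed proxy value.
estimates = {
    ("consumption", "LIC/LMIC, GDP"): (0.68, 0.053, 0.7),
    ("consumption", "UMIC/HIC, HFCE"): (0.73, 0.084, 0.7),
    ("income", "LIC/LMIC, GDP"): (1.34, 0.195, 1.0),
    ("income", "UMIC/HIC, HFCE"): (1.02, 0.085, 1.0),
}

Z_95 = 1.96  # assumption: normal approximation

def interval(coef, se, z=Z_95):
    """Two-sided confidence interval around a coefficient."""
    return coef - z * se, coef + z * se

results = {}
for key, (coef, se, proxy) in estimates.items():
    lower, upper = interval(coef, se)
    results[key] = (lower, upper, lower <= proxy <= upper)
    print(key, f"[{lower:.2f}, {upper:.2f}]", "contains proxy:", results[key][2])
```

Under this approximation all four intervals contain the proposed proxy. For income aggregates in LICs and LMICs the interval is roughly 0.96 to 1.72, so 1 sits just above its lower end.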

Note that these passthrough rates are consistent with evidence from Lakner et al. (2022), which
finds a passthrough rate of 1.01 for income aggregates and 0.72 for consumption aggregates, and
with Wollburg et al. (2023), which finds a passthrough rate of 0.70 for consumption aggregates
(while not estimating passthrough rates for income aggregates). They are also largely consistent
with evidence from Prydz et al. (2022), which finds a passthrough rate of 0.75 (0.73) for consumption
aggregates with GDP (HFCE) and 0.94 (0.89) for income aggregates with GDP (HFCE), none of
which is significantly different from 0.7 and 1, respectively.


   5. Robustness checks
   5.1 Resource rents
There are reasons to believe that relying on GDP data is less appropriate in certain circumstances.
For example, Mahler et al. (2022) found GDP extrapolations to be less accurate when specific
components of GDP make up an unusually large share of overall GDP growth, or during periods
of large price increases. Of particular concern may be when GDP growth is driven by natural
resource rents, as they may not trickle down to households’ consumption. In this case using HFCE
could be preferable to GDP, even for low- and lower-middle income countries. Figure 7 tests this.
It shows the errors using either GDP or HFCE, both with a separate passthrough rate for income
and consumption aggregates, as a function of natural resource rents (i.e. for spells within a
particular range of natural resource rents). It uses data only for low-income and lower-middle
income countries.

Figure 7 shows that using GDP growth does not lead to less accurate results than using HFCE
growth for countries with high natural resource rents. On the contrary, GDP performs slightly
better, although the difference is far from statistically significant. As such, the plot suggests that
switching to HFCE when GDP growth is driven by natural resources will not improve accuracy.

   5.2 Bias
It is possible for a method to predict growth in the mean well while being biased in doing so.
Given that a bias would systematically over- or underestimate poverty, it is desirable that the bias
is small. We calculate the bias as
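\[
\text{bias} = \frac{1}{N_C} \sum_{c \in C} \frac{1}{N_{S_c}} \sum_{s \in S_c} \left( \hat{g}_{s,c,t} - g^{*}_{s,c,t} \right)
\]
where $\hat{g}_{s,c,t}$ is the predicted and $g^{*}_{s,c,t}$ the observed growth in the survey mean
for spell $s$ in country $c$, $S_c$ is the set of $N_{S_c}$ spells for country $c$, and $C$ is the set
of $N_C$ countries.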




                            Figure 7: Errors by natural resource rents




                      Note: Errors of using separate passthrough rates for income and
                      consumption aggregates while relying only on GDP or HFCE data,
                      as a function of total natural resource rents. Only spells from low-
                      and lower middle-income countries are included.



This is similar to our primary loss function, but it does not take the absolute value before taking
the mean. Biases by national accounts measure and passthrough covariate are shown in Figure 8.
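
The difference between the two measures can be sketched as follows. The code assumes that errors are first averaged within each country and then across countries, which is how we read the bias formula; the function names and the three example spells are purely illustrative.

```python
from collections import defaultdict

def country_averaged(spells, transform=lambda e: e):
    """Average transformed errors within each country, then across countries.

    spells: iterable of (country, predicted_growth, actual_growth).
    """
    per_country = defaultdict(list)
    for country, predicted, actual in spells:
        per_country[country].append(transform(predicted - actual))
    country_means = [sum(v) / len(v) for v in per_country.values()]
    return sum(country_means) / len(country_means)

def bias(spells):
    """Signed errors: over- and underpredictions can cancel out."""
    return country_averaged(spells)

def mean_absolute_deviation(spells):
    """Absolute errors: every miss counts, whatever its sign."""
    return country_averaged(spells, abs)

# Illustrative spells: (country, predicted growth, actual growth), pct. points.
example = [("A", 3.0, 2.0), ("A", 1.0, 2.0), ("B", 4.0, 3.0)]
print(bias(example))                     # 0.5
print(mean_absolute_deviation(example))  # 1.0
```

In country A the two errors cancel in the bias but not in the mean absolute deviation, which is why a method can have a small bias and still a large error, or the reverse.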

The prior method had a bias of 0.28, meaning that it overstated growth in mean consumption by
0.28 percentage points on average and therefore underestimated poverty rates. The new method
(the black dot at the “Income/consumption” x-axis category) has a bias of -0.19, so it
systematically underpredicts growth in mean consumption. Though this represents a 32% decline
in the magnitude of the bias, it is worth considering whether other methods could decrease the
bias further. Nearly all methods have equal or larger biases. The only exception is the method
that estimates separate passthrough rates by whether the growth rate in national accounts is
positive or negative. The bias in such a model with the proposed national accounts variable is
-0.14. One could speculate that separate passthrough rates by positive or negative GDP growth
arise because individuals are loss averse and would like to smooth consumption in times of
crisis. This would make passthrough rates from GDP to consumption aggregates lower for
negative growth spells. Yet we find that the passthrough rate from GDP to consumption is 0.7
for both negative and positive growth spells. In negative growth spells, the passthrough rate is
lower from GDP to income aggregates and from HFCE to consumption and income aggregates,
which cannot be attributed to consumption smoothing. Because we cannot rationalize these
differential passthrough rates, and because this model increases the mean absolute deviation
from 3.32 to 3.36, we prefer the simpler model.


                      Figure 8: Bias by national accounts variable and passthrough covariate




                                          Note: The legend refers to the national accounts measure used.
                                          X-axis categories are sorted by the average absolute value of the
                                          bias across national accounts measures. “No covariate” refers to
                                          running the regression without a covariate and estimating one
                                          unique passthrough for all observations.

In general, the bias seems to appear because the relationship between passthrough rates and
growth rates is not fully linear. A model which includes a piece-wise second-order polynomial,
such that
\[
g^{*}_{s,c,t} = \beta \, g^{NA}_{s,c,t} + \beta_1 \, g^{NA}_{s,c,t} \, \mathbb{1}[g^{NA}_{s,c,t} < 0] + \beta_2 \, (g^{NA}_{s,c,t})^2 \, \mathbb{1}[g^{NA}_{s,c,t} > 0] + \beta_3 \, (g^{NA}_{s,c,t})^2 \, \mathbb{1}[g^{NA}_{s,c,t} < 0] + \varepsilon_{c,t},
\]
with $g^{*}_{s,c,t}$ denoting growth in the survey mean and $g^{NA}_{s,c,t}$ growth in the national
accounts measure, reduces the bias to nearly zero, but again with increases in the mean absolute
deviation. This model would also have the disadvantage of implying a different passthrough rate
for each spell, which is more challenging to explain. Another way to reduce the bias would be to
include the constant term, but again this comes at the expense of increasing the overall error and
complexity of the model.
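
To illustrate why the piece-wise polynomial implies a different passthrough rate for each spell, the sketch below builds the four regressors and fits them by least squares without an intercept. The coefficients are made up for the example and are not estimates from this note.

```python
import numpy as np

def piecewise_design(na_growth):
    """Columns: x, x*1[x<0], x^2*1[x>0], x^2*1[x<0]."""
    x = np.asarray(na_growth, dtype=float)
    neg = (x < 0).astype(float)
    pos = (x > 0).astype(float)
    return np.column_stack([x, x * neg, x**2 * pos, x**2 * neg])

def fit_piecewise(na_growth, survey_growth):
    """Least-squares fit without an intercept."""
    design = piecewise_design(na_growth)
    y = np.asarray(survey_growth, dtype=float)
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef

def implied_passthrough(coef, na_growth):
    """Predicted survey growth divided by national accounts growth (x != 0)."""
    x = np.asarray(na_growth, dtype=float)
    return (piecewise_design(x) @ np.asarray(coef, dtype=float)) / x

# Hypothetical coefficients (beta, beta_1, beta_2, beta_3), illustration only.
true_coef = np.array([0.7, 0.2, -1.0, 0.5])
x = np.linspace(-0.10, 0.10, 41)
y = piecewise_design(x) @ true_coef

estimated = fit_piecewise(x, y)
print(np.round(estimated, 3))
print(implied_passthrough(estimated, [-0.05, 0.05]))
```

With these illustrative coefficients the implied passthrough is 0.875 at a growth rate of -5% and 0.65 at +5%, so the rate depends on the size and sign of growth in each spell.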

       5.3 Distribution of errors
Another possible concern is that the new method might perform better on average but may be
worse at some points of the distribution of errors. Figure 9 plots the cumulative distribution of
the error with the prior method and the new method. It shows that the new method nearly
first-order stochastically dominates the prior method in terms of error magnitudes.
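
One way to check such a dominance relationship numerically is to compare the empirical distribution functions of the absolute errors on a grid of thresholds. The sketch below is a generic check with hypothetical function names and toy data, not the procedure behind Figure 9.

```python
import numpy as np

def share_within(abs_errors, thresholds):
    """Empirical CDF: share of errors no larger than each threshold."""
    sorted_errors = np.sort(np.asarray(abs_errors, dtype=float))
    counts = np.searchsorted(sorted_errors, thresholds, side="right")
    return counts / sorted_errors.size

def dominance_summary(new_errors, prior_errors, n_grid=200):
    """Share of thresholds where the new method is at least as good,
    and the worst gap between the two empirical CDFs."""
    new_abs = np.abs(np.asarray(new_errors, dtype=float))
    prior_abs = np.abs(np.asarray(prior_errors, dtype=float))
    grid = np.linspace(0.0, max(new_abs.max(), prior_abs.max()), n_grid)
    gap = share_within(new_abs, grid) - share_within(prior_abs, grid)
    return float(np.mean(gap >= 0)), float(gap.min())

# Toy example: every error of the "new" method is 10% smaller in magnitude.
prior = np.array([1.0, -2.0, 3.0, -4.0])
new = 0.9 * prior
print(dominance_summary(new, prior))   # first element is 1.0: full dominance
print(dominance_summary(prior, new))   # first element below 1.0
```

A first element of exactly 1 corresponds to first-order dominance on the grid; a value slightly below 1 corresponds to the "nearly" dominates case.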

       5.4 Stability of passthrough rates
Yet another concern may be that the passthrough rates are not stable over time. We have already
shown that calculating separate passthrough rates by decade does not reduce the error, yet Figure
10 shows there is a minor downward trend in passthrough rates over time (or at least until 2013).
Though this trend is hardly statistically significant, it could suggest revisiting the estimation of
passthrough rates periodically.
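
A rolling estimate of this kind can be sketched as below. The code assumes that each spell is assigned a single reference year and that the window covers five years on each side of the center year; both the function name and the synthetic data are hypothetical.

```python
import numpy as np

def rolling_passthrough(years, na_growth, survey_growth, half_window=5):
    """No-intercept passthrough rate using only spells near each center year."""
    years = np.asarray(years)
    x = np.asarray(na_growth, dtype=float)
    y = np.asarray(survey_growth, dtype=float)
    rates = {}
    for center in range(int(years.min()), int(years.max()) + 1):
        in_window = np.abs(years - center) <= half_window
        x_w, y_w = x[in_window], y[in_window]
        denominator = x_w @ x_w
        if denominator > 0:  # skip windows without usable spells
            rates[center] = float(x_w @ y_w / denominator)
    return rates

# Synthetic data with a constant passthrough of 0.7 (illustration only).
years = np.arange(1990, 2021)
na = 0.01 + 0.001 * np.arange(years.size)
svy = 0.7 * na
rates = rolling_passthrough(years, na, svy)
print(rates[2005])
```

Re-running such an estimate when new survey data arrive is one simple way to monitor whether the proposed rates of 0.7 and 1 remain appropriate.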


                                   Figure 9: Distribution of errors




                   Note: The figure shows the inverse cumulative distribution function of
                   errors with the prior method and the new method.




                    Figure 10: Passthrough rates estimated at different years




                  Note: Estimated passthrough rates using only spells in a 10-year window (five
                  years on both sides). GDP data is used for low- and lower-middle-income
                  countries while HFCE data is used for upper middle-income and high-income
                  countries.




    5.5 Using the CPI deflator
Another concern is that the GDP and HFCE deflators used do not reflect the price changes
consumers experience. If this is the case, one could instead use the nominal growth in GDP or
HFCE deflated by the inflation measured from the consumer price index. Doing so gives notably
larger errors throughout, as shown in Figure 11.
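
For reference, deflating a nominal growth rate by CPI inflation can be written as a one-line conversion. The sketch below uses the exact ratio formula rather than the subtraction approximation; the numbers are illustrative, and the note does not state which of the two was used.

```python
def real_growth(nominal_growth, inflation):
    """Real growth implied by nominal growth and inflation over the same period."""
    return (1.0 + nominal_growth) / (1.0 + inflation) - 1.0

# Illustrative values: 10% nominal growth in HFCE per capita, 4% CPI inflation.
cpi_deflated = real_growth(0.10, 0.04)
print(f"{cpi_deflated:.4f}")  # 0.0577
```

The same function applies when the GDP or HFCE deflator is used instead; only the inflation series changes.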

                   Figure 11: Errors by national accounts measure and deflator




                          Note: X-axis categories are sorted by average error across
                          national accounts measures. “No covariate” refers to running
                          the regression without a covariate and estimating one unique
                          passthrough for all observations. ‘GDP’ uses the GDP deflator
                          and ‘HFCE’ the HFCE deflator.

    5.6 Using different passthrough rates at various parts of the distribution
It is possible that growth on expectation trickles down to different parts of the distribution at
different rates. If, for example, the elite disproportionately capture the benefits of economic
growth, the passthrough rate at the top of the distribution would be higher than at the bottom of
the distribution. This would imply that using one passthrough rate together with distribution-
neutrality would be a poor choice. We test for this by calculating the passthrough rate from GDP
growth to each percentile of the distribution. To that end, we have derived the welfare level at
100 points of the distribution of each survey. We take the points on the distribution where the
CDF equals 0.005, 0.015, …, 0.995. For the purpose of this exercise, we pool the comparable income
and consumption spells.
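
The construction of the 100 points and the growth rate at each of them can be sketched as follows. The sketch ignores survey weights and uses synthetic welfare vectors, so it illustrates the idea rather than the exact procedure applied to the PIP data.

```python
import numpy as np

def percentile_points(n_points=100):
    """Midpoints of n equal-sized bins: 0.005, 0.015, ..., 0.995 for n = 100."""
    return (np.arange(n_points) + 0.5) / n_points

def welfare_at_points(welfare, points=None):
    """Welfare level at each point of the distribution (unweighted)."""
    if points is None:
        points = percentile_points()
    return np.quantile(np.asarray(welfare, dtype=float), points)

def annualized_percentile_growth(welfare_start, welfare_end, years):
    """Annualized growth in welfare at each of the 100 points."""
    start = welfare_at_points(welfare_start)
    end = welfare_at_points(welfare_end)
    return (end / start) ** (1.0 / years) - 1.0

# Synthetic, distribution-neutral growth of 21% over two years.
rng = np.random.default_rng(1)
base = rng.lognormal(mean=1.0, sigma=0.8, size=5000)
growth = annualized_percentile_growth(base, 1.21 * base, years=2)
print(growth.min(), growth.max())
```

Under distribution-neutral growth every point grows at the same annualized rate (10% here). Regressing each point's growth on GDP growth across spells then yields one passthrough rate per percentile.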

Figure 12 shows that the passthrough rate is relatively stable over the distribution. If anything, it
is slightly higher at the bottom, though this is not statistically significant. The fact that it is slightly
higher at the bottom is equivalent to saying that over the entire sample of spells we consider,
inequality on average decreased a little.


         Figure 12: Passthrough rate from GDP growth to each percentile of welfare




              Note: The figure shows the $\beta$'s from 100 regressions of the following form,
              $g^{*}_{s,c,p,t} = \beta \, g^{NA}_{s,c,t} + \varepsilon_{c,t}$, where $g^{*}$ is growth in welfare, $g^{NA}$ is GDP
              growth, and p reflects the welfare at a percentile.


   6. Implications for global and regional poverty
Figure 13a shows the implications of implementing our proposed method vis-à-vis the prior
method on global poverty. We continue to rely on the spring 2021 vintage of data. Evidently, the
changes to global poverty are hardly noticeable, with the exception of the 1990s where the
proposal would lower the global poverty rate by about 1 percentage point.

This global picture masks important changes at the regional and country level. At the regional
level, the Middle East & North Africa and Europe & Central Asia in particular experience visible
changes (Figure 13b). In the Middle East and North Africa, the trend is also impacted somewhat,
with our proposed method removing a one-off hike in poverty in the early 90s and mitigating the
increase in poverty projected in recent years. As a result of the latter, the Middle East & North
Africa would not be poorer than Latin America & the Caribbean in recent years. Notably, the
regional trend for the Middle East and North Africa is also smoother with the proposal, as will
be explained in more detail below.

These patterns can be explained by looking at country-level differences. Figure 14 shows the 15
countries with the largest change in poverty in any given year if our proposal were to be
implemented. The changes in Middle East and North Africa in the 1990s are driven by Iraq, while
the change in the recent years is driven by Syria and Yemen. In both cases, the passthrough rate
of 0.7 adopted mitigates the projected changes. A similar picture is found for countries such as
Myanmar and Cabo Verde. For other countries, such as Tajikistan and Bhutan, the changes are
driven by a switch from HFCE to GDP. In both cases, the trends with the new method appear
more stable because the HFCE data for those countries were very volatile.


                        Figure 13: Implications for global and regional poverty
                    (a) Global poverty                             (b) Regional poverty




Note: The figure compares the prior line-up rule with the new rule. The figure uses the spring 2021 vintage of data. The
$1.90 poverty line with 2011 PPPs is used given that the $2.15 line was not yet adopted at this point.




                            Figure 14: Implications for country-level poverty




   Note: The figure compares the prior line-up rule with the new rule. The figure uses the spring 2021 vintage of
   data. It includes the 15 countries with the largest changes to poverty rates.




    7. Discussion and conclusion
This paper considers how best to extrapolate welfare vectors only using national accounts
measures and assuming distributional neutrality. It addresses (i) which national accounts variable
should be used to extrapolate forward in time, (ii) what fraction of growth in national accounts
should be passed through to growth in welfare, and (iii) how this varies for different cases.

We argue that under these conditions, a well-performing method is to use growth in HFCE for
upper middle-income and high-income countries, and growth in GDP for low- and lower-middle-
income countries. For income aggregates, all growth in GDP and HFCE is passed through, while
for consumption aggregates only 70% is passed through. We show that these results have few
implications for poverty estimates at the global level but do entail large differences in estimated
poverty in some countries. Notably, the switch to using GDP for low- and lower-middle-income
countries makes the poverty trends less erratic over time for some countries. This is likely due to
large levels of uncertainty in measured HFCE for some poor countries.
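
The proposed rule can be summarized in a few lines of code. The sketch below is a simplified illustration with hypothetical names: it assumes that the passthrough rate is applied to each annual growth rate before compounding, which this note does not spell out, and it scales every observation by the same factor to reflect distribution-neutrality.

```python
# National accounts series by income group and passthrough by welfare type.
NA_SERIES = {"LIC": "GDP", "LMIC": "GDP", "UMIC": "HFCE", "HIC": "HFCE"}
PASSTHROUGH = {"consumption": 0.7, "income": 1.0}

def line_up(welfare, income_group, welfare_type, annual_growth):
    """Extrapolate a welfare vector to a later reference year.

    annual_growth maps "GDP" / "HFCE" to a list of annual real per capita
    growth rates between the survey year and the reference year.
    """
    series = NA_SERIES[income_group]
    rate = PASSTHROUGH[welfare_type]
    factor = 1.0
    for g in annual_growth[series]:
        factor *= 1.0 + rate * g  # assumption: passthrough applied year by year
    # Distribution-neutral: every observation is scaled by the same factor.
    return [w * factor for w in welfare], series, factor

# Illustrative example: a lower middle-income country with a consumption survey.
growth = {"GDP": [0.05, 0.03], "HFCE": [0.04, 0.02]}
scaled, series, factor = line_up([1.0, 2.5, 4.0], "LMIC", "consumption", growth)
print(series, round(factor, 6))  # GDP 1.056735
```

For an upper middle-income country with an income survey, the same call would use the HFCE series and pass growth through in full.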

The proposed changes can be rationalized by both economic theory and measurement issues. A
challenge with our largely empirically driven approach is that it is difficult to distinguish these
two concerns. We speculate that three factors can explain a passthrough rate for consumption of
less than one: First, consumption aggregates may exclude or underestimate some items that make
up an increasingly large share of household budgets as countries develop, such as health
expenditures and food eaten away from home. Second, the share of consumption devoted to non-
food items tends to increase as economies grow, and longer recall periods for non-food items may
lead to greater underreporting of non-food expenditures than food expenditures. Finally, a
passthrough rate less than one from GDP to consumption is consistent with an increasing savings
rate as countries develop, which in turn is consistent with empirical evidence (Gross et al.
2020, Drescher et al. 2020, and Crozier and Zavaleta 2022).2

We have not discussed what happens to the prediction error the longer one extrapolates. For very
long spells, it is plausible that the old survey data carry little relevant information, and it may be
better to in part or fully predict without using the survey data. For example, the latter could be
done by assigning the country a poverty rate based on a cross-country regression of poverty rates
on country characteristics such as GDP/capita, demographic variables, or indicators such as built-
up area derived from satellite imagery. How best to combine estimates obtained through
intertemporal extrapolation of the type considered here with cross-country predictions is an
interesting question for future research.




2 This explanation, though, cannot explain why the passthrough rate is less than one when using HFCE instead of GDP,
because HFCE excludes savings by construction. Perhaps HFCE is measured with more noise than GDP.

References
Angrist, Noam, Pinelopi Koujianou Goldberg, and Dean Jolliffe. 2021. "Why Is Growth in Developing
        Countries So Hard to Measure?" Journal of Economic Perspectives 35(3): 215-42.

Bates, S., Hastie, T., & Tibshirani, R. 2023. "Cross-Validation: What Does It Estimate and How Well Does It
         Do It?" Journal of the American Statistical Association: 1-12.

Crozier S. L., F. B. Zavaleta. 2022. "The Marginal Propensity to Consume of 2020 COVID-19 Stimulus
        Payments in Peru." International Journal of Economics and Finance 14 (3): 115–15

Deaton, Angus. 2005. "Measuring Poverty in a Growing World (or Measuring Growth in a Poor
       World)." Review of Economics and Statistics 87 (1): 1-19.

Drescher K., Fessler P., Lindner P. 2020. "Helicopter Money in Europe: New Evidence on the Marginal
       Propensity to Consume across European Households." Economics Letters 195(October): 109416.

Gross T., M. J. Notowidigdo, J. Wang. 2020. "The Marginal Propensity to Consume over the Business
       Cycle." American Economic Journal: Macroeconomics 12 (2): 351–84.

Lakner, Christoph, Daniel Gerszon Mahler, Mario Negre, and Espen Beer Prydz. 2022. "How Much Does
        Reducing Inequality Matter for Global Poverty?" The Journal of Economic Inequality 20(3): 559-585.

Mahler, Daniel Gerszon, R. Andrés Castañeda Aguilar, and David Newhouse. 2022. "Nowcasting Global
        Poverty." The World Bank Economic Review 36(4): 835-856.

Prydz, Espen Beer, Dean M. Jolliffe, Christoph Lakner, Daniel Gerszon Mahler, and Prem Sangraula. 2019.
        "National Accounts Data Used in Global Poverty Measurement." Global Poverty Monitoring
        Technical Note 8.

Prydz, Espen Beer, Dean Jolliffe, and Umar Serajuddin. 2022. "Disparities in Assessments of Living
       Standards Using National Accounts and Household Surveys." Review of Income and Wealth 68: S385-
       S420.

Ravallion, Martin. 2003. "Measuring Aggregate Welfare in Developing Countries: How Well Do National
        Accounts and Surveys Agree?" Review of Economics and Statistics 85(3): 645-652.

Roberts, David R., Volker Bahn, Simone Ciuti, Mark S. Boyce, Jane Elith, Gurutzeta Guillera-Arroita,
        Severin Hauenstein et al. 2017. "Cross-Validation Strategies for Data with Temporal, Spatial,
        Hierarchical, or Phylogenetic Structure." Ecography 40(8): 913-929.

Wollburg, Philip, Stephane Hallegatte, and Daniel Gerszon Mahler. 2023. "Ending Extreme Poverty Has a
       Negligible Impact on Global Greenhouse Gas Emissions." Nature 623: 982-986.

Walsh, Brian, and Ana De Menezes. 2023. "Historical Validation for Macrosocial Metrics." Mimeo.

World Bank. 2016. Poverty and Shared Prosperity 2016: Taking on Inequality. The World Bank.

World Bank. 2023. Macro Poverty Outlook. https://www.worldbank.org/en/publication/macro-poverty-
       outlook. The World Bank.




Appendix A: Additional figures and tables
         Figure A.1: Errors using all possible spells rather than only adjacent spells
  a. By national accounts measure and covariate        b. By use of GDP for other subgroups




Note (panel a): The legend refers to the national accounts measure used. X-axis categories are sorted by
average error across national accounts measures. “No covariate” refers to running the regression without a
covariate and estimating one unique passthrough for all observations. The error with FCE and
passthrough=1 is larger than the maximum value of the y-axis.
Note (panel b): GDP shows the error if only using GDP data, and similarly for HFCE. LIC = Low-income,
LMIC = Lower middle-income, SSA = Sub-Saharan Africa.

                       Figure A.2: Errors using spells from 2000 onwards
  a. By national accounts measure and covariate         b. By use of GDP for other subgroups




Note (panel a): The legend refers to the national accounts measure used. X-axis categories are sorted by
average error across national accounts measures. “No covariate” refers to running the regression without a
covariate and estimating one unique passthrough for all observations. The error with FCE and
passthrough=1 is larger than the maximum value of the y-axis.
Note (panel b): GDP shows the error if only using GDP data, and similarly for HFCE. LIC = Low-income,
LMIC = Lower middle-income, SSA = Sub-Saharan Africa.




                  Figure A.3: Comparison of errors with and without intercept(s)
               a. With overall intercept                    b. With group-based intercept




Note: The left figure compares errors without an intercept to errors with an intercept. The right figure adds group-
specific intercepts as well (for example, intercepts by region, datatype, income group – whichever group is used to
calculate passthrough rates). Each dot is the error from a particular model, that is, a particular national accounts
measure combined with a particular group to distinguish passthrough rates by.


                           Figure A.4: Errors using temporal cross-validation

   a. By national accounts measure and covariate                    b. By use of GDP for other subgroups




 Note (panel a): The legend refers to the national accounts measure used. X-axis categories are sorted by
 average error across national accounts measures. “No covariate” refers to running the regression without a
 covariate and estimating one unique passthrough for all observations. The error with FCE and
 passthrough=1 is larger than the maximum value of the y-axis.
 Note (panel b): GDP shows the error if only using GDP data, and similarly for HFCE. LIC = Low-income,
 LMIC = Lower middle-income, SSA = Sub-Saharan Africa.


                            Figure A.5: Errors using the mean squared error
  a. By national accounts measure and covariate                    b. By use of GDP for other subgroups




Note (panel a): The legend refers to the national accounts measure used. X-axis categories are sorted by
average error across national accounts measures. “No covariate” refers to running the regression without a
covariate and estimating one unique passthrough for all observations.
Note (panel b): GDP shows the error if only using GDP data, and similarly for HFCE. LIC = Low-income,
LMIC = Lower middle-income, SSA = Sub-Saharan Africa.

                               Figure A.6: Errors using quantile regression
  a. By national accounts measure and covariate                    b. By use of GDP for other subgroups




Note (panel a): The legend refers to the national accounts measure used. The error with FCE and
passthrough=1 is larger than the maximum value of the y-axis. X-axis categories are sorted by average error
across national accounts measures. “No covariate” refers to running the regression without a covariate and
estimating one unique passthrough for all observations.
Note (panel b): GDP shows the error if only using GDP data, and similarly for HFCE. LIC = Low-income,
LMIC = Lower middle-income, SSA = Sub-Saharan Africa.

Figure A.7: Errors by use of GDP or HFCE based on statistical capacity proxy




          Note: The blue line shows the error if spells associated with a sufficiently
          low statistical capacity use GDP while spells with a high statistical
          capacity use HFCE. We proxy statistical capacity with the share of years
          in which an international poverty estimate exists within a five-year
          window on each side of a reference year. If a country conducted a
          household survey in 2000 and 2007 and no other years, then in 2000, it
          would have a statistical capacity score of 1/11, in 2005 of 2/11. We do this
          to check if there is a level of statistical capacity below which it makes sense
          to use GDP growth, in the sense that it would give a lower error than our
          main specification. We estimate errors if any spell with a statistical capacity
          score above x uses HFCE and any score below x uses GDP. Regardless of
          which cut-off we use, we do not reduce the error beyond the LIC/LMIC
          cut-off (dashed black horizontal line).
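
The statistical capacity proxy described in the note to Figure A.7 can be computed as in the sketch below. The function name is hypothetical, and survey years stand in for years with an international poverty estimate.

```python
def statistical_capacity(survey_years, reference_year, half_window=5):
    """Share of years with a survey within five years on each side of a reference year."""
    surveys = set(survey_years)
    window = range(reference_year - half_window, reference_year + half_window + 1)
    hits = sum(1 for year in window if year in surveys)
    return hits / len(window)

# Example from the note: surveys in 2000 and 2007 only.
print(statistical_capacity([2000, 2007], 2000))  # 1/11
print(statistical_capacity([2000, 2007], 2005))  # 2/11
```

Spells with a score above a chosen cut-off x would then use HFCE and the remaining spells GDP.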

                            Table A.1: Regression output

                                 Consumption                        Income
                           LIC/LMIC UMIC/HIC               LIC/LMIC UMIC/HIC
                              GDP        HFCE                 GDP         HFCE
        Coefficient            0.68       0.73                 1.34        1.02
        Standard error       (0.053)    (0.084)              (0.195)     (0.085)
        R2                     0.50       0.46                 0.25        0.47
        Observations           373        160                  169         739
     Note: Results from regressing annualized growth in mean income or consumption per
     capita on annualized growth in GDP or HFCE per capita. Spells for LICs and
     LMICs use GDP while spells for UMICs and HICs use HFCE. Regressions are run
     without an intercept. Only comparable spells using adjacent surveys are used.
     Standard errors (in parentheses) are clustered at the country level.



