Policy Research Working Paper 10231 Outlier Detection for Welfare Analysis Federico Belotti Giulia Mancini Giovanni Vecchi Poverty and Equity Global Practice November 2022 Policy Research Working Paper 10231 Abstract Extreme values are common in survey data and represent provides useful diagnostic tools. The output of outdetect a recurring threat to the reliability of both poverty and compares statistics—with focus on inequality and poverty inequality estimates. The adoption of a consistent criterion measures—obtained before and after the exclusion of out- for outlier detection is useful in many practical applica- liers. Finally, the paper carries out an extensive sensitivity tions, particularly when international and intertemporal exercise, where the same outlier detection method is applied comparisons are involved. This paper discusses a simple, consistently to per capita expenditure across more than 30 univariate detection procedure to flag outliers in the dis- household budget surveys. The results are clear-cut and tribution of any variable of interest. It presents outdetect, provide a sense of the influence of extreme values on poverty a Stata command that implements the procedure and and inequality estimates. This paper is a product of the Poverty and Equity Global Practice. It is part of a larger effort by the World Bank to provide open access to its research and make a contribution to development policy discussions around the world. Policy Research Working Papers are also posted on the Web at http://www.worldbank.org/prwp. The authors may be contacted at gvecchi@worldbank.org. The Policy Research Working Paper Series disseminates the findings of work in progress to encourage the exchange of ideas about development issues. An objective of the series is to get the findings out quickly, even if the presentations are less than fully polished. The papers carry the names of the authors and should be cited accordingly. The findings, interpretations, and conclusions expressed in this paper are entirely those of the authors. They do not necessarily represent the views of the International Bank for Reconstruction and Development/World Bank and its affiliated organizations, or those of the Executive Directors of the World Bank or the governments they represent. Produced by the Research Support Team Outlier Detection for Welfare Analysis Federico Belotti, Giulia Mancini, and Giovanni Vecchi1 Keywords: outliers, extreme values, inequality, poverty, incremental trimming curve. JEL codes: I32, C87, D60. 1 Federico Belotti and Giovanni Vecchi are Professors at the University of Rome “Tor Vergata”, Department of Economics and Finance; Giulia Mancini is a Research Fellow at the University of Sassari, Italy, Department of Economics and Business. All three authors are consultants to the World Bank. This paper was prepared under the supervision of the Poverty & Equity Global Practice Global Solutions Group 1: Data for Policy, and benefitted from the oversight of Utz Pape (World Bank). We thank Silvia Redaelli and Alexandra Jarotschkin for useful comments shared while reviewing the paper. Any errors and omissions remain ours. 1 Introduction Outliers – observations that “appear to deviate markedly from other members of the sample in which they occur” (Grubbs, 1969) – are omnipresent in real-world data sets, and are almost always a cause for concern. 
If the ‘abnormality’ of extreme values is the result of measurement error, rather than a genuine manifestation of the variability of the data, then including outliers in calculations can cause serious bias to many statistics of interest. Given the stakes, it is no surprise that the task of dealing with outliers often takes up considerable time and effort on the part of the analyst (Mancini and Vecchi, 2022). Even the very foundation of any strategy to handle outliers – identifying which observations qualify as ‘extreme’ – is far from trivial. Underlying a number of popular outlier detection methods is a basic algorithm: the analyst defines some concept of distance from the bulk of the distribution, which then allows to flag as ‘extreme’ any observation for which this distance is larger than a certain threshold. This principle is as simple as it is elusive: in practice, it is declined in a great many different ways, and the criteria used are not always well defined. This poses a number of problems: ‘good practices’ fail to consolidate, and the steps taken to deal with outliers are often difficult to replicate, which is an issue for scientific rigor as well as for comparability of data analysis. In this paper we present outdetect, a tool designed to help the analyst i) identify extreme values in a univariate distribution based on a transparent procedure, and ii) gauge their potential impact on selected statistics of interest.2 While the focus is on distributional analysis (e.g. inequality and poverty estimation – see Cowell and Flachaire, 2007, 2015), the use of outdetect naturally extends to any situation where the presence of ‘too large’ or ‘too small’ values in a distribution is an issue. The algorithm employed by outdetect relies on the normalization of the target variable, and the imposition of cutoffs to define an outlier region. A popular version of this procedure is to log-transform the distribution, and flag observations that are more than two or three standard deviations from the mean (e.g., Deaton and Tarozzi, 2005). However, outdetect allows users to choose from a number of alternative (and more flexible) normalizing transformations, and to use a number of alternative and statistically robust measures of location and scale. The use of transformations other than the log allows the analyst to apply a consistent detection criterion to a wide range of variables, which is 2 The command outdetect can be installed in Stata by typing ssc install outdetect. 2 especially useful when dealing with the skewed and heavy-tailed distributions that are commonplace in welfare analysis (e.g. household consumption, income, and wealth). Some of these techniques are not yet available as stand-alone Stata commands (namely the transformation proposed in Yeo and Johnson, 2000). Robust location and scale estimators are useful when the detection rule itself is sensitive to precisely those extreme values it is designed to flag (Davies and Gather, 1993).3 outdetect does not offer an automated way to treat outliers – to replace or drop them – and deliberately so. Any alteration of the raw data is potentially problematic, and must be informed by careful investigation of the nature of extreme values. Instead, outdetect focuses on sensitivity: it produces an array of statistics using both the ‘raw’ distribution (the data as they are) and its outlier-free counterpart (a distribution where observations that are flagged as outliers have been excluded from all calculations). 
Borrowing from the analytical framework developed by Hampel (1974) and Hampel, Ronchetti, Rousseeuw and Stahel (1986), it also produces two types of diagnostic plots: the Incremental Trimming Curve (ITC), which plots the value of a statistic of interest against the proportion of extreme values trimmed from the sample, and the High-influence Observation Curve (HOC), first suggested by Cowell and Flachaire (2007), which describes the effect of any one (extreme) observation on the estimated value of the statistic. These instruments allow the analyst to assess the influence of extreme values on results, and to inform next steps. Of particular note is the fact that outdetect allows for the use of complex survey settings, so that the aforementioned comparisons can be performed using population statistics. In the last part of the paper we use outdetect to investigate the influence of extreme values on key inequality and poverty statistics, computed on the basis of household budget survey data. We use a collection of data sets from the Rural Livelihoods Information System (RuLIS), a joint initiative of the Food and Agriculture Organization (FAO), the World Bank, and the International Fund for Agricultural Development (IFAD). We find that the share of observations flagged as outliers of per capita consumption is 0.8% on average, never exceeding 2.5%; and that the inclusion or exclusion of these observations in calculations causes differences as large as 6 percentage points for the Gini index, circa 10 for the Mean Log Deviation and the Atkinson index, and 29 for the Theil index. Poverty indices are found to be less sensitive. While these findings are, of course, specific to the surveys that were included in the exercise, they give empirical support to more general considerations, with important implications for the practice of applied welfare measurement, in particular for welfare comparisons: a common framework for detecting and treating extreme values is essential for both international comparisons and within-country time trends.
3 Commands bacon (Billor, Hadi and Velleman, 2000; Weber, 2010) and gboxplot (Bruffaerts, Verardi and Vermandele, 2014; Verardi and Vermandele, 2018) are also available to detect outliers with Stata, though their functionalities and range of application are adjacent to, rather than overlapping with, those of outdetect.
The paper is organized as follows: section 2 illustrates the method for outlier detection that is used in the rest of the paper; section 3 presents the command outdetect; section 4 shows international evidence on the sensitivity of poverty and inequality estimates to extreme values of welfare indicators; section 5 offers some concluding remarks.
2 Outlier detection
In this section the variable of interest – the target for outlier detection – will be denoted by x, and its probability density function by f(x). For expositional simplicity, we shall refer to x as 'consumption', but nothing prevents one from thinking of x as standing for a price, unit value, quantity, wage, income, any expenditure component, or any other continuous variable whose extreme values are seen as potentially problematic. How exactly should one identify outliers in the distribution of x? When does a high or low value of consumption qualify as 'extreme', that is, 'too far away' from the bulk of the distribution, so much as to arouse suspicion as to whether it is genuine?
Out of the many criteria proposed in the literature, outdetect uses one that is based on the construction of an outlier region with reference to f(x) (Davies and Gather, 1993; Gather and Becker, 1997). If the distribution of interest is known – for example, if f(x) is Normal – then one can consider an observation to be an outlier if it falls into a range of values that occur with arbitrarily low probability. An observation x_i (i = 1, …, n) falling into a range defined in this way could conceivably be produced by the theoretical distribution f(x), but that would be a rare occurrence, making x_i an extreme value, or an outlier. A conventional application of this criterion identifies the bounds of the outlier region for a Normal distribution as the mean, μ, plus or minus three times the standard deviation, σ (each tail region defined in this way has a probability of roughly 0.1%). There are two issues with the application of this procedure to most situations. First, the distribution: f(x) is not known, and the empirical distribution of consumption – as of most other variables of interest to welfare analysts – is certainly not Normal. Rather, it is typically asymmetric and heavy-tailed compared to the Gaussian distribution. However, if one can transform x into something that is approximately Normal, the algorithm can still be applied: observations that are flagged in the transformed distribution are also outliers of the untransformed distribution. A second, subtler problem is related to the definition of the outlier detection region in terms of the mean and standard deviation of the distribution. The empirical mean and standard deviation are not robust statistics, i.e. they are vulnerable precisely to the outliers one is concerned about. These considerations suggest a more general outlier detection strategy, which is implemented by outdetect. It can be broken down into two steps: 1) transform the variable of interest to induce normality in its empirical pdf; 2) set robust thresholds to identify the outlier region. To accomplish the first step, one needs to select an appropriate normalizing transformation g(·). We shall denote the transformed (normalized) variable as y = g(x). Section 2.1 elaborates on the transformations available in outdetect. To accomplish the second step, one needs to pick a measure of central tendency, or location (such as the mean, or a robust alternative), and a measure of dispersion, or scale (such as the standard deviation, or a robust alternative). Section 2.2 elaborates on the measures of location and scale that are available in outdetect. The rule used to detect outliers can be expressed conveniently in terms of the z-score of y, defined as z = (y − m)/s. Here, the letters m and s indicate the mean and standard deviation of y, or any robust alternatives (for convenience, we shall continue to refer to z as a z-score, albeit a 'robust' one, when we depart from the mean and standard deviation in favor of robust measures).
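To fix ideas before introducing the formal rule, the following minimal Stata sketch applies the two steps by hand to a hypothetical strictly positive variable x (the variable name is a placeholder), using the natural log as the normalizing transformation and the median and the median absolute deviation (introduced in section 2.2) as robust measures of location and scale; it is only an illustration of the logic, not a substitute for outdetect.

* illustrative only: flag observations whose robust z-score exceeds 3
generate double y = ln(x)                    // normalizing transformation (x > 0 assumed)
quietly summarize y, detail
scalar m = r(p50)                            // robust location: the median of y
generate double absdev = abs(y - scalar(m))
quietly summarize absdev, detail
scalar s = 1.4826 * r(p50)                   // robust scale: the MAD, rescaled for normality
generate byte outlier = abs((y - scalar(m)) / scalar(s)) > 3 if !missing(y)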
The goal is to choose a conventional value z_α to define an outlier region over the distribution of z, as follows:

x_i is flagged as an outlier if |z_i| > z_α     (1)

According to equation (1), an observation of the variable of interest, x_i, is flagged as an outlier if z_i – the (robust) z-score associated with the transformation y_i = g(x_i) – exceeds, in absolute value, the threshold z_α.4 Equivalently, the value x_i is flagged as an outlier if its transformation, y_i, falls outside the following region:

[m − z_α × s, m + z_α × s]     (2)

4 The use of the absolute value in equation (1) allows for the detection of both 'top' and 'bottom' outliers (both 'too large' and 'too small'), and can easily be replaced by one-sided versions when the focus is only on one tail of the distribution.

As for the choice of z_α, the threshold for the outlier region in terms of the (robust) z-score, a degree of arbitrariness is inevitable: z_α is not tied to any statistical requirement, beyond the need to identify a 'low-probability' outlier region. The choice of the value 3 is customary, but smaller and larger values (2.5, 3.5, or 4) are not uncommon. Higher values of z_α will shrink the outlier region, lower the probability thresholds for the tails of the distribution of z-scores, and therefore flag fewer outliers. Once outliers are detected according to the rule described by equations (1) and (2), outdetect produces a table that shows the number of observations that were flagged, both at the 'top' and at the 'bottom' (i.e. large and small extreme values, respectively), and their proportion over the total number of observations. The output also includes a set of summary statistics and diagnostic tools, designed to inform the user about the sensitivity of the statistics of interest to the presence of outliers. These are described in Section 2.3.
2.1 Normalizing transformations
There is no lack of choice of transformations for achieving approximate normality in the literature. During the early 2000s, in a contribution to the 'great Indian poverty debate', Deaton and Tarozzi (2005) explored the use of the natural logarithm as a normalizing transformation for unit values of commodities consumed by households. This is the simplest of transformations – the logarithm "squeezes" large values more than small ones, so that the logarithm of a skewed distribution becomes more symmetrical, and closer to a Normal. Dupriez (2007) followed another route, and adopted the Box-Cox transformation (Box and Cox, 1964), which includes the log transformation as a special case. Other useful transformations are available, such as those proposed in Yeo and Johnson (2000), and Friedline et al. (2015), among others. Table 1 lists the transformations that are implemented by outdetect, with x indicating the target variable (say, consumption) and y indicating the transformed variable (say, log consumption).

Table 1. Transformations to approximate normality

Description                Formula                                                          Source
Natural log                y = ln(x)                                                        1
Log base 10                y = log10(x + c), with c = max[0, −(min(x) − 0.001)]             2
Box-Cox                    y = (x^λ − 1)/λ if λ ≠ 0;  y = ln(x) if λ = 0                    3
Yeo-Johnson                y = [(x + 1)^λ − 1]/λ if x ≥ 0, λ ≠ 0;
                           y = log(x + 1) if x ≥ 0, λ = 0;
                           y = −[(−x + 1)^(2−λ) − 1]/(2 − λ) if x < 0, λ ≠ 2;
                           y = −log(−x + 1) if x < 0, λ = 2                                 4
Inverse hyperbolic sine    y = ln(x + √(x² + 1))                                            5
Square root                y = √x                                                           6

Sources: [1, 2 and 6] any math textbook; [3] Box and Cox (1964); [4] Yeo and Johnson (2000); [5] Friedline et al. (2015).

Each of the transformations in Table 1 has properties that fit different needs.
For example, while the natural log may only be applied to strictly positive variables, log10(x + c) takes care of negative and zero values, too. Similarly, while the use of Box-Cox is limited to strictly positive variables, the Yeo-Johnson and the inverse hyperbolic sine transformations apply to all variables and perform relatively better in the presence of highly skewed distributions (which makes them particularly suitable for the analysis of household wealth data). A general criterion to select the "best" transformation is goodness-of-fit: the transformation that provides the best approximation to a Normal distribution for the specific variable and data set in use will be the best choice for that particular context. outdetect allows the user to maximize goodness-of-fit relying on the Pearson chi-squared test (Snedecor and Cochran, 1989). As pointed out by Peterson and Cavanaugh (2019), the Pearson statistic χ², divided by its degrees of freedom, df, converges to 1 as the data approach a Gaussian distribution: χ²/df can therefore be interpreted as a measure of how close a distribution is to normality, and used to rank transformations according to how successful they are in normalizing the data.
2.2 Measures of location and scale
When m is the mean and s is the standard deviation of the normalized consumption distribution y, the outlier detection rule in equations (1) and (2) is itself sensitive to the presence of outliers. Davies and Gather (1993) suggest the use of robust location and scale measures to counter this problem. There is little dispute over the use of the median instead of the mean: the median is simple to calculate and, unlike the mean, provides a greater degree of resilience to the presence of outliers.5 On the other hand, the choice of a robust scale estimator appears to be somewhat debated. Table 2 shows a selection of candidate estimators for the parameter s in equations (1) and (2).

5 The price for the robustness of the median is a loss of efficiency with respect to the mean (Rousseeuw and Hubert 2017: 2), but in large samples the main, if not the only, property that matters is their bias – efficiency is only relevant for small samples. Note that we are assuming unweighted estimators. The weighted median does not, in fact, qualify as a robust statistic (Filzmoser, Gussenbauer and Templ 2016: 15).

Table 2. Robust scale estimators for the transformed variable y

Estimator                                                  Source
IQR = Q3 − Q1                                              1
MAD = 1.4826 × med_i | y_i − med_j(y_j) |                  2
S = 1.1926 × med_i { med_j | y_i − y_j | }                 3
Q = 2.2219 × { | y_i − y_j | ; i < j }_(k)                 4

Note: Q3 and Q1 denote the 75th and 25th percentiles, respectively; y is the transformation of the target variable x. Sources: [1] any statistics textbook; [2] Hampel (1974); [3, 4] Rousseeuw and Croux (1993). The use of (k) as a subscript indicates that the pairwise distances have been sorted in increasing order: given a sample of n observations, the k-th order statistic of the sample is its k-th smallest value.

An outlier detection algorithm based on a z-score with the interquartile range (IQR) as the denominator has been experimented with in Dupriez (2007), for instance. Hampel (1974) suggested the use of the median absolute deviation (MAD), defined in row 2 of Table 2: the MAD is the median value of the distance between the transformed expenditure of each household (y_i) and the center of the distribution, as estimated by the median of the transformed distribution (med[y]). Under the assumption that y is close enough to the standard Normal distribution, the coefficient 1.4826 ≈ 1/Φ⁻¹(3/4) in the formula is required to make the MAD a consistent estimator of the standard deviation. Rousseeuw and Croux (1993) introduced the S and Q estimators, described in rows 3 and 4 of Table 2, which are shown to have the same 50% maximal breakdown point as the MAD, while being more efficient than the MAD when y = g(x) is Normally distributed. Moreover, S and Q are location-free scale estimators, which makes them suitable for dealing with skewed distributions, the case of highest interest to welfare analysts. Both S and Q are "sturdy" estimators of the standard deviation, but both are computationally demanding.6 Nevertheless, efficient algorithms proposed by Croux and Rousseeuw (1992) make their burden manageable. Given that its core is written in Mata, outdetect takes full advantage of these improvements, speeding up the computation considerably.7
The availability of alternative estimators for the scale parameter of y raises the same question that comes up for normalizing transformations: which one should the analyst pick? No alternative is clearly superior according to the literature. Rousseeuw and Croux (1993) show that S and Q have desirable statistical properties, and once the burden associated with their computation is reduced – outdetect is fast, due to its reliance on Mata – they turn out to work well in most practical applications. For practical purposes, the analyst may refer to the following ranking: Q ≽ S ≻ MAD ≻ IQR, where we use the sign "≻" to mean 'is preferred to' and "≽" for 'is weakly preferred to'.8

6 To calculate the S estimator, for example, one needs to compute, for each household i in the sample, the expression med_j | y_i − y_j | for j = 1, …, n. This gives n numbers, the median of which gives the estimate S (the number 1.1926 in the formula is required for making S a consistent estimator of the standard deviation under the assumption of normality). Similarly, the Q estimator (row 4 in Table 2) is obtained by sorting all pairwise distances | y_i − y_j | and taking the value that occupies the k-th position in the ranking, with k being roughly half the number of observations (the number 2.2219 in the formula is required for making Q a consistent estimator of the standard deviation under the assumption of normality).
7 In order to compute the Q statistic, outdetect adapts the code written by Ben Jann for the robstat command (Jann, Verardi and Vermandele, 2018).
8 The final choice between S and Q is subjective, because their advantages and disadvantages are not easily compared (Rousseeuw and Croux, 1992). Indeed, while both are location-free robust scale estimators with a 50% breakdown point, Q is slightly preferred in terms of efficiency at the Gaussian model and, unlike S, it has a continuous influence function. On the other hand, S requires only half as much computation time and storage as Q.

2.3 Sensitivity of statistics to the presence of outliers
Once outliers have been identified, outdetect assesses the sensitivity of selected statistics to whether or not those same outliers are included in calculations. This is done by taking two heuristic approaches (Hampel 1974). On the one hand, the output of outdetect reports comparisons between estimates based on 'raw' data (that is, inclusive of all values) and data where all observations classified as outliers are excluded from calculations.
On the other hand, outdetect includes an option to draw the Incremental Trimming Curve (ITC), which shows how the value of a statistic of choice changes as the largest or smallest observations in the data set are consecutively excluded from calculations. The ITC is inspired by the (finite-sample versions of the) empirical influence curve (Hampel 1974; Hampel, Ronchetti, Rousseeuw and Stahel 1986, ch. 2) and Tukey's sensitivity curve (Huber 2002), and is similar to the curve proposed by Cowell and Flachaire (2007) to identify influential observations. A first set of statistics that appear in the output are standard descriptive statistics: the mean and the median, the standard deviation (SD), the coefficient of variation (CV), and the interquartile range (IQR). A second set of statistics focuses on inequality measures and Foster, Greer, and Thorbecke (1984) poverty measures. As for the ITC, it is defined as follows:

ITC(i) = θ̂_(i),    i = 0, 1, …, n

where θ̂_(i) denotes the statistic of interest calculated on the distribution of x, after sorting the values of x and excluding the i most extreme (cumulated) observations. If θ̂ denotes, say, the Gini index, then θ̂_(i) is the Gini index for x, calculated leaving the first i observations out of the sample. If i = 0, then θ̂_(0) = θ̂, which corresponds to the case when no extreme value is discarded, and the ITC returns the Gini index estimated on the raw data set. If i = 1, then θ̂_(1) is the value of the Gini index obtained after discarding the first extreme value. Similarly, θ̂_(2) corresponds to the Gini index calculated when the two most extreme values have been discarded from the data set. Note that the data can be sorted either in ascending or in descending order, so that outdetect produces two ITCs, one assessing the impact of 'too small' values, the other focused on the impact of 'too large' values. Overall, the gradient of the ITC provides a neat indication of the extent to which the chosen statistic is affected by the presence of extreme values: the steeper the ITC, the higher the impact. The ITC is further discussed and illustrated in sections 3 and 4.
3 How to use the outdetect command
This section is devoted to illustrating the typical use of outdetect. After explaining the syntax, we provide examples of the command "in action", using a sample of 12,447 households from Malawi's Fourth Integrated Household Survey (2016-2017).9 outdetect can be installed from the Statistical Software Components Archive by typing ssc install outdetect. Stata 15.1 is the earliest version that can run outdetect. The core of the command is written entirely in Mata to minimize computation time. The command handles complex survey settings automatically, when svyset is used to declare the sampling design features of the data to Stata (see help svyset). The use of pweights is also allowed. The general syntax of the command is as follows:

outdetect varname [ if ] [ in ] [ weight ] [, options ]

outdetect identifies extreme values, either "too small" or "too large" observations, in the distribution of varname. We shall call these observations bottom outliers and top outliers, respectively (small values being at the bottom of the distribution of varname, and large values being at the top).
By default, outdetect creates a new variable, _out, containing numeric codes that flag outliers of varname:
0  observation is not an outlier
1  observation is a bottom outlier ("too small")
2  observation is a top outlier ("too large")
The output of outdetect reports "Raw" statistics (computed using varname as is), as well as "Trimmed" statistics (computed using just those observations of varname that are not flagged as outliers). A full description of all the available options is provided in the outdetect help file.
3.1 Basic usage
In this example, the target variable is per capita annual household expenditure (here called pce) in Malawi; monetary amounts are expressed in thousands of Malawian kwacha (MWK), with 1 USD exchanged against circa 800 MWK at the time of writing.9 As a first step, we load the data into memory and summarize pce:

use malawi, clear
summarize pce [aw=weight], detail

9 The data set is part of the Rural Livelihoods Information System (RuLIS), and is also publicly available at the World Bank Microdata Library.

The distribution of pce displays the typical features of expenditure distributions from survey data anywhere: it is highly skewed to the right, heavily leptokurtic, and some values appear to be abnormally high, as well as, possibly, abnormally low, with respect to the bulk of observations. The two largest observations in the sample, in particular, are one order of magnitude above any other large values in the distribution. Similarly, the smallest values in the sample (corresponding to 20-30 US cents per day) appear implausibly low. To gain a better sense of the extent to which these extreme values might impact the analysis, you can issue outdetect using the default syntax (note that while outdetect allows users to specify weights the same way as other Stata commands, svy-setting the data set prior to running outdetect is the recommended practice):

svyset psu [pweight = weight]
outdetect pce

The output of outdetect is organized into three parts. The top panel specifies the settings of the outlier detection procedure, that is, the ways in which the parameters of equation (1) (g, m, s, and z_α) are set, and whether outliers are detected in both tails of the distribution. When the default syntax is used, outdetect uses the Yeo and Johnson (2000) transformation. In this example, m is the median of y, s is the Q estimator, and z_α is equal to 3. Because the z-score in equation (1) is considered in absolute value, outliers are detected both at the top and at the bottom of the distribution (i.e. both large and small values). The middle section of the output summarizes how many observations are flagged as outliers. The first column (Freq.) reports the frequency: 121 observations in total, 82 at the bottom and 39 at the top; the second column (Percent) shows that observations flagged as outliers amount to 0.97 percent of the total sample size, with a prevalence of bottom (0.66 percent) over top (0.31 percent) outliers; the third column (Share) gives the breakdown by bottom and top outliers. In our example, about 2 out of 3 outliers are 'bottom outliers'. Overall, the information on the distribution of the outliers between the two tails of the distribution helps analysts to form expectations on the potential instability of their statistics of interest due to the presence of extreme values.
Cowell and Flachaire (2007), for instance, provide a full account of the behavior of inequality measures in the presence of extreme values: according to their Result 1, Generalized Entropy indices with coefficient α > 1 (e.g. the squared coefficient of variation) are very sensitive to high incomes in the data, while their Result 2 shows that Atkinson measures with ε > 1 (where ε denotes the inequality aversion parameter) are very sensitive to low incomes in the data. The default categorical _out variable flags the two types of outliers in the data set: _out takes on the value of 0 if the observation is not an outlier, 1 if it is a bottom outlier, and 2 if it is a top outlier. This can be verified by issuing tabulate _out. The bottom section of the output contains the core results of outdetect: it compares an array of 16 descriptive statistics obtained "with" and "without" outliers. The first column ("Raw") uses pce as is, meaning that the statistics are computed using all nonmissing observations of pce; the second column ("Trimmed") excludes all observations flagged as outliers from calculations. In the case of Malawi, although the incidence of observations flagged as outliers is small (0.97%), their impact on most indicators is quite significant.10

10 The addition of standard errors and tests to assess the statistical significance of differences is a feature of an upcoming update of outdetect.

In certain cases, as for the variable considered here, it can be useful to gauge the sensitivity of poverty estimates to extreme values as well. The user can expand the output by specifying a poverty line, as shown below (the nogen option is used so that the command refrains from creating the default _out variable, which already exists in the data set, given that outdetect has been issued before):

outdetect pce, pline(300) nogen

The additional statistics at the bottom of the output indicate that poverty estimates are not as impacted by the exclusion of extreme values. The results displayed so far are, of course, dependent on the settings of the outlier detection routine, which users may wish to customize. For instance, to reproduce the procedure originally implemented by Deaton and Tarozzi (2005), they will specify that i) the normalization of pce be the natural logarithm, ii) the z-score be computed by subtracting the mean and dividing by the standard deviation of the log of pce, iii) outliers be detected only at the top of the distribution, and iv) the threshold marking the outlier region be set at 2.5 instead of 3 (the default value used by outdetect). The following code implements these settings:

outdetect pce, norm(ln) zscore(mean std) out(top) alpha(2.5) replace

The replace option generates a new _out variable, which replaces the existing one. To apply the best fitting transformation, the user may specify the option bestnormalize:

outdetect pce, bestnormalize replace

In this case, the "best" normalization turns out to be Box and Cox (1964); however, results in terms of outliers flagged do not change with respect to those obtained using the default Yeo and Johnson (2000) transformation, since the latter is equivalent to the generalized Box-Cox transformation [(x + 1)^λ − 1]/λ, for x > −1, where the shift constant 1 is included.
Finally, the user can specify the option excel to export the table of results to an Excel file, which is saved in the current working directory or in any other specified location:

outdetect pce, excel(demo, replace) replace

In this case, the replace option within parentheses refers to the Excel file, while the one outside refers once again to the default _out variable.
3.2 Diagnostic graphs
The sensitivity of the statistics of interest to the presence of extreme values can also be assessed independently of the settings of the outlier detection procedure, by producing one or more diagnostic graphs. Depending on the graph selected, the output shown on the screen may change. First is the Incremental Trimming Curve (ITC), defined in section 2.3. For example, the ITC for a selection of statistics of interest can be plotted by issuing the commands below:

outdetect pce, graph(itc(2: mean))             (figure 1, panel a)
outdetect pce, graph(itc(2: gini))             (panel b)
outdetect pce, graph(itc(2: h pline(300)))     (panel c)
outdetect pce, graph(itc(2: pg pline(300)))    (panel d)

The syntax specifies that (i) the diagnostic graph to be produced is the ITC, (ii) the curve should be plotted for the top and bottom 2 percent of observations, and (iii) the statistics of interest are the mean, the Gini index, the poverty headcount ratio, and the poverty gap index. The results are shown in Figure 1. To clarify the interpretation of the curves, let us take the top-right panel, showing the ITC for the Gini index. The solid line shows the value of the Gini index as the largest observations are dropped from the sample, one at a time; the dashed line shows the Gini index when the smallest observations are removed, one at a time. The steeper the curve, the more sensitive the Gini index is to the presence of extreme values.11 As expected, mean expenditure turns out to be quite sensitive to the presence of large outliers (much less so to outliers in the left tail). Similarly, the Gini index behaves consistently with the analysis of Cowell and Flachaire (2007): it shows a remarkable stability to bottom outliers, and high sensitivity to top outliers. Finally, both the headcount and poverty gap indices in panels c and d show an asymmetric response to extreme values: they are robust (but not totally insensitive) to extremely large values, but sensitive to bottom outliers. Even more so is the poverty gap squared index (not shown in the figure), consistently with its analytical properties (Cowell and Victoria-Feser 1996b).

11 Each time an observation is dropped, the weights of the remaining observations are recalibrated so as to sum up to the entire population.

Figure 1. Incremental Trimming Curves for Malawi, 2017. Panels: a. Mean; b. Gini index (%); c. Headcount poverty (%); d. Poverty gap index.

Note that when ITC graphs are produced, outdetect does not generate the default _out variable, nor does it show the output described in section 3.1. Instead, the command generates a table like those shown in Figure 1, to facilitate the interpretation of the curve. The table reports values of the ITC for selected shares of discarded observations. For instance, if we focus on panel (a), we can interpret the table associated with the ITC for the mean as follows: when 0% of observations are discarded, the mean of pce is equal to 897.88 thousand MWK (per person/year); when 1% of the smallest observations are discarded, the mean is equal to 900.64 thousand MWK, whereas if 1% of the largest observations are discarded, the mean is equal to 795.75 thousand MWK; and so on.
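The ITC produced by graph(itc()) can also be traced by hand, which may help clarify what the curve plots. The sketch below is a simplified, unweighted version for the mean of pce (outdetect itself recalibrates the survey weights at each step, as noted in footnote 11), dropping the largest observations one at a time:

* unweighted illustration of the incremental-trimming logic for the mean
gsort -pce                                  // sort in descending order: largest values first
forvalues k = 0/10 {
    quietly summarize pce if _n > `k'       // exclude the `k' largest observations
    display "top " `k' " observations trimmed: mean = " %9.2f r(mean)
}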
Note that users can also produce the same type of graph by specifying that a certain number (rather than a share) of observations be discarded; they will simply need to add the option abs, for absolute. The example below produces the ITC for the mean, focusing on the smallest and largest 10 observations in the sample:

outdetect pce, graph(itc(10: mean abs))

One last type of diagnostic plot that is available as part of outdetect has to do with monitoring the goodness-of-fit of the normalization of the target distribution. This is an important check because, if the normalization is successful, then the characterization of outliers as "low-probability observations", as defined by the tails of a Normal distribution, carries over accurately to the original (non-Normal) distribution. The syntax below produces a quantile-quantile (Q-Q) plot comparing the quantiles of the transformed distribution of pce, which we indicated by y in Section 2, to those of a Normal distribution having the same mean and standard deviation as y:

outdetect pce, graph(qqplot) nogen

Figure 2. Assessing normalization: the quantile-quantile plot

At the bottom of the Q-Q plot, outdetect also shows the p-values from three popular normality tests: i) the Shapiro-Wilk test (Shapiro and Wilk 1965), ii) the Shapiro-Francia test (Shapiro and Francia 1972), and iii) the D'Agostino, Belanger and D'Agostino (1990) test. The null hypothesis being tested is that the variable is normally distributed, which is rejected only by the first of the three tests.
4 The influence of outliers on inequality and poverty measures
In this section, we apply the outlier detection procedure illustrated in section 2 to per capita expenditure in a wide array of countries, taking advantage of the collection of survey data made available by the Rural Livelihoods Information System (RuLIS). We use data from 34 of these countries (we excluded four surveys, which were conducted in 2005 or earlier). Our sources are listed in the Appendix. Figure 3 shows some examples of the distribution of per capita consumption aggregates based on the "raw" data.12 All curves display a few features that are common to many variables of interest in welfare analysis (consumption expenditure, income, wealth, but also quantities of items consumed, calorie intake, unit values, and many others): they are skewed and leptokurtic (fat tails), which may be symptoms of the presence of outliers.

12 The "raw" label is used here for convenience: survey microdata shared by National Statistical Offices have almost certainly been edited between the end of fieldwork and the time of dissemination, as is routine – but they are raw, as far as the analyst is concerned. The conversion into per capita (or adult equivalent) terms is necessary in this context, because values that may appear 'extreme' on a per-household basis may turn out not to be, once household size is accounted for.

Figure 3. Per capita expenditure (LCU/person/year), untransformed variables. Panels: Malawi (2017), Nigeria (2012), Côte d'Ivoire (2008), Tanzania (2015), Mali (2014), Uganda (2016), Senegal (2011), Rwanda (2014), Iraq (2012), Bangladesh (2010), India (2012), Pakistan (2014), Mexico (2014), Peru (2015), Armenia (2013), Georgia (2015). Source: Authors' estimates based on data from RuLIS (2021); RuLIS database accessed in July 2021.
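The exercise that follows applies outdetect with identical settings to each survey in turn. Purely as an illustration, a batch run of this kind might be scripted as in the sketch below; the file names in the loop are placeholders rather than the actual RuLIS file layout, and only the share of flagged observations is tallied:

* hypothetical batch loop over country data sets (file names are placeholders)
foreach ctry in malawi nigeria mexico {
    use `ctry', clear
    svyset [pweight = weight]
    quietly outdetect pce, bestnorm
    quietly count if _out != 0
    local nflag = r(N)
    quietly count if !missing(pce)
    display "`ctry': " %5.2f 100 * `nflag' / r(N) " percent of observations flagged"
}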
24 Figure 4 shows the incidence of outliers (as a percentage of the total number of observations in each sample) in the target distributions. There is remarkable heterogeneity in the fraction of flagged observations across countries; however, this fraction is relatively small: the average outlier incidence across countries is 0.8%, with a peak of 2.2% in Kyrgyzstan. The code underlying Figure 4 is the following syntax of outdetect: svyset [pweight = weight] outdetect pce, bestnorm This implies that outliers are detected after transforming the target variable with the best fitting normalization, using the median and the Q estimator to compute a robust z-score, and applying a threshold of 3 to the resulting standardized distribution. Figure 4. Incidence of outliers (% of sample size) of per capita consumption Because we use a threshold of 3 to flag extreme values, the expected share of outliers detected – were the normalization of the consumption distribution perfectly successful – should be around 0.3%. The higher percentage of outliers detected is due to excess skewness of the distributions with respect to a standard normal, which is on par with the 25 distributions displayed in Figure 3. Top outliers (extremely large observations) are more frequently flagged than bottom outliers (extremely small observations) – the latter are, however, present, despite the fact that small values typically do not attract as much attention in welfare analysis. Besides counting how many observations are flagged, it is important to figure out how influential these observations are. A simple way to assess the sensitivity of any statistics of interest to extreme values is to track the value of the statistic – say, the Gini index – when we progressively drop the largest and smallest observations in the distribution of per capita consumption, a process we can describe as incremental trimming of the distribution. The resulting graph, the Incremental Trimming Curve (ITC), is displayed in Figure 5 for selected statistics.13 The figure uses data from Mexico’s Encuesta Nacional de Ingresos y Gastos de los Hogares, conducted in 2014, and reports ITCs for eight different statistics of interest: 1) average per capita consumption, 2) Gini index, 3) mean logarithmic deviation, 4) Theil index, 5) Atkinson index (with inequality aversion parameter epsilon equal to 2), 6) poverty headcount ratio, 7) poverty gap, and 8) poverty gap squared. The ITCs in Figure 5 show that inequality statistics can be extremely sensitive to the presence of extreme values, although to different extents. The Theil index, for instance, experiences a vertical drop – 45.5% to 36% – when as few as 0.5% of large values, or about 100 observations, are excluded from calculations. The Atkinson index with a relatively high value for the inequality aversion parameter (epsilon is set equal to 2 in the figure), on the other hand, is more sensitive to the exclusion of small values. Poverty indices experience smaller and more linear changes in general when extreme values are dropped. The case of Mexico is not peculiar: in fact, it is representative of patterns that emerge in most of the countries examined in this section. 13 The ITC is defined in Section 2.3. 26 Figure 5. 
Incremental Trimming Curves (ITCs) for selected statistics, Mexico 2014. Panels: mean (expenditure/person/year); Gini index (%); mean log deviation (%); Theil index (%); Atkinson index (%), A(2); poverty headcount ratio (%); poverty gap index (%); poverty gap squared index (%). Source: Authors' estimates based on data from RuLIS (2021).

The empirical analysis conducted so far confirms what is known from the theoretical literature: outliers are, in general, highly influential for most statistics of interest. A more operational indication of the influence of extreme values on final estimates can be derived from a simple comparison of the "raw" data ('raw' as defined in footnote 12) to some counterfactual, "what if" scenario. A prima facie approach is to exclude outliers from the distribution of the target variable – that is, all values flagged by outdetect are "rejected" (to use terminology that is common in the outlier literature), and excluded from the calculation of final estimates. Using the outlier-free distribution of consumption as a counterfactual to the raw data amounts to looking at what would happen if the observations flagged by the algorithm did not occur at all in the distribution. This is compatible with the view that observations flagged as outliers are contaminants of the true distribution (originating, for instance, from measurement error). This framework, of course, does not imply that trimming extreme values is the best way to treat them. In fact, the perspective in the present context is exploratory in nature: the main purpose is that of experimenting with a limiting, hypothetical scenario, one where all the outliers flagged are, in fact, the result of measurement error.

Table 3. Difference (in percentage points) between statistics computed before and after the exclusion of outliers

Country          Gini   MLD   Theil   Atkinson   H      PG    PG2
Armenia          0.3    0.4   0.3     0.8        0.2    0.1   0.1
Bangladesh       1.2    1.5   2.9     1.6        0.1    0.1   0.1
Bolivia          3.7    8.5   12.5    10.1       0.3    0.5   0.6
Bulgaria         1.0    1.3   1.3     3.0        0.4    0.4   0.3
Burkina Faso     0.4    0.5   0.9     0.5        0.1    0.1   0.0
Cambodia         0.4    0.5   0.7     0.7        0.2    0.2   0.1
Cameroon         1.2    1.9   3.3     1.4        0.0    0.0   0.0
Côte d'Ivoire    2.2    3.2   5.5     2.9        0.1    0.1   0.1
Ecuador          0.9    1.3   2.1     1.4        0.1    0.1   0.1
Ethiopia         1.1    1.4   2.5     1.2        -0.1   0.0   0.0
Georgia          1.6    2.2   3.5     2.5        0.1    0.2   0.2
Ghana            1.2    1.8   3.2     1.6        0.0    0.1   0.1
Guatemala        1.0    1.4   2.3     1.6        0.1    0.1   0.1
India            1.8    2.6   5.1     2.3        0.1    0.1   0.1
Iraq             1.1    1.2   1.8     1.4        0.1    0.1   0.1
Kyrgyzstan       1.3    1.2   1.4     2.1        0.5    0.3   0.2
Malawi           5.0    7.6   28.2    5.7        0.1    0.1   0.1
Mali             4.2    7.0   15.5    4.7        0.0    0.1   0.1
Mexico           4.8    8.3   13.7    8.9        0.3    0.5   0.4
Mongolia         0.6    0.7   0.9     1.6        0.1    0.1   0.1
Mozambique       5.2    9.0   16.5    9.4        0.7    0.8   0.7
Nepal            1.2    1.6   2.7     1.6        0.1    0.1   0.1
Nicaragua        0.6    1.0   1.4     1.3        0.2    0.2   0.1
Niger            0.7    0.9   1.6     0.9        0.0    0.0   0.0
Nigeria          6.0    9.7   27.7    7.4        0.4    0.4   0.3
Pakistan         0.7    0.9   1.4     1.0        0.3    0.2   0.1
Peru             0.8    1.2   1.8     1.4        0.1    0.1   0.1
Rwanda           3.4    5.6   11.5    4.3        0.3    0.3   0.2
Senegal          1.3    2.4   3.0     4.0        0.5    0.5   0.5
Serbia           1.1    1.3   1.8     1.9        0.3    0.2   0.2
Sierra Leone     0.5    0.8   0.9     1.3        0.3    0.3   0.2
Tanzania         0.8    1.2   1.8     1.1        0.0    0.0   0.0
Uganda           2.4    4.1   10.1    3.0        0.2    0.2   0.2
Vietnam          2.8    3.8   7.0     3.6        0.2    0.2   0.2
Average          1.8    2.9   5.8     2.9        0.2    0.2   0.2
SD               1.6    2.8   7.2     2.6        0.2    0.2   0.2
Min              0.3    0.4   0.3     0.5        -0.1   0.0   0.0
Max              6.0    9.7   28.2    10.1       0.7    0.8   0.7

Note: MLD is for Mean Logarithmic Deviation; 'Atkinson' is for the Atkinson index, where the inequality aversion parameter epsilon is set equal to 2; H is for the poverty headcount ratio; PG is for the poverty gap; PG2 is for the poverty gap squared.
Poverty indices are computed using a relative poverty line, equal to 60% of median per capita consumption in the original distribution. 29 Table 3 quantifies the sensitivity of an array of statistics of interest for welfare analysis. The first four columns refer to inequality measures: the Gini index (column 2), two members of the Generalized Entropy family of indices (Mean Log Deviation and Theil index, in columns 3 and 4), and the Atkinson index (with the inequality aversion parameter set equal to two, column 5). The remaining columns refers to Foster, Greer and Thorbecke (1984) poverty measures: the headcount ratio (H), the poverty gap index (PG), and the poverty gap squared index (PG2). The results show that the exclusion of outliers decreases estimated inequality according to all measures. The Gini index is the least sensitive among the inequality measures shown in Table 3, although before-after differences can be large enough to cast doubt on the validity of estimates obtained from the raw data: the Gini index for Nigeria, Mozambique, Malawi, and Mexico is bumped down by 5 to 6 percentage points, a salient difference by all metrics. The Theil index is the most sensitive measure among those we examined, showing differences as large as 28 percentage points. Overall, these results align with the theoretical analysis in Cowell and Flachaire (2007). Differences observed for three poverty measures – the poverty headcount, poverty gap, and poverty gap squared – are smaller across the board, in line with the theoretical findings available in the literature (Cowell and Victoria-Feser 1996b). However, our illustrative example uses relative poverty lines, which may affect results. The empirical magnitude of the effects documented in Table 3 is consequential, and can be appreciated focusing on the case of Gini index (the more conservative scenario in our setting). Figure 6 (top panel) shows the point estimates of the Gini index calculated on the RuLIS ‘raw data sets’, that is on data as available from the data provider. Countries are ranked low (Kyrgyzstan) to high (Bolivia) accordingly. The bottom panel of Figure 6 shows the estimated Gini indices, for the same countries, after excluding outliers (as identified by outdetect). Even a cursory inspection of the two graphs bring to light a number of rerankings: countries move up or down the ladder as a consequence of how outliers are treated. 30 Figure 6 – Gini index (%), for selected countries All observations Without outliers Source: our estimates based on RuLIS (2021). Although the comparison between “raw” and “trimmed” statistics in Table 3 is meant as a sensitivity exercise, rather than an outlier treatment suggestion, clearly the behavior of the statistics of interest does depend on the choice of what to do with observations flagged as outliers. As an example, we consider winsorization (or censoring), which amounts to changing the value of each outlier to that of the nearest inlier (Tukey 1962: 18). In Table 4 we compare “raw” statistics with those obtained by winsorizing the per capita expenditure variable, rather than trimming it. 31 Table 4. 
Difference (in percentage points) between statistics computed before and after winsorizing the target variable Country Gini MLD Theil Atkinson H PG PG2 Armenia 0.0 0.1 0.0 0.2 0.0 0.0 0.0 Bangladesh 0.5 0.7 1.6 0.6 0.0 0.0 0.0 Bolivia 1.7 3.7 7.8 5.5 0.0 0.0 0.0 Bulgaria 0.3 0.5 0.5 1.4 0.0 0.1 0.1 Burkina Faso 0.0 0.1 0.2 0.1 0.0 0.0 0.0 Cambodia 0.1 0.1 0.2 0.2 0.0 0.0 0.0 Cameroon 0.4 0.7 1.6 0.4 0.0 0.0 0.0 Côte d’Ivoire 0.8 1.3 2.8 1.0 0.0 0.0 0.0 Ecuador 0.2 0.4 0.7 0.4 0.0 0.0 0.0 Ethiopia 0.4 0.6 1.3 0.5 0.0 0.0 0.0 Georgia 0.6 0.9 1.9 0.9 0.0 0.0 0.0 Ghana 0.4 0.7 1.5 0.6 0.0 0.0 0.0 Guatemala 0.2 0.4 0.7 0.6 0.0 0.0 0.0 India 0.7 1.2 2.8 0.9 0.0 0.0 0.0 Iraq 0.3 0.4 0.7 0.4 0.0 0.0 0.0 Kyrgyzstan 0.4 0.4 0.6 0.8 0.0 0.1 0.1 Malawi 3.9 6.2 26.2 4.2 0.0 0.0 0.0 Mali 2.5 4.3 11.3 2.6 0.0 0.0 0.0 Mexico 2.2 4.0 8.1 4.8 0.0 0.0 0.1 Mongolia 0.2 0.2 0.3 0.9 0.0 0.0 0.0 Mozambique 2.4 4.5 10.4 4.8 0.0 0.1 0.1 Nepal 0.4 0.5 1.1 0.4 0.0 0.0 0.0 Nicaragua 0.2 0.4 0.6 0.6 0.0 0.0 0.0 Niger 0.2 0.3 0.7 0.3 0.0 0.0 0.0 Nigeria 4.5 7.5 24.4 5.0 0.0 0.0 0.1 Pakistan 0.2 0.3 0.6 0.3 0.0 0.0 0.0 Peru 0.2 0.4 0.7 0.5 0.0 0.0 0.0 Rwanda 1.4 2.5 6.3 1.8 0.0 0.0 0.0 Senegal 0.5 0.9 1.5 1.8 0.0 0.0 0.1 Serbia 0.3 0.4 0.7 0.6 0.0 0.0 0.0 Sierra Leone 0.1 0.2 0.2 0.4 0.0 0.0 0.0 Tanzania 0.3 0.4 0.9 0.4 0.0 0.0 0.0 Uganda 1.5 2.7 8.0 1.8 0.0 0.0 0.0 Vietnam 1.4 2.0 4.4 1.6 0.0 0.0 0.0 Average 0.9 1.5 3.9 1.4 0.0 0.0 0.0 SD 1.1 1.9 6.3 1.6 0.0 0.0 0.0 Min 0.0 0.1 0.0 0.1 0.0 0.0 0.0 Max 4.5 7.5 26.2 5.5 0.0 0.1 0.1 Note: MLD is for Mean Logarithmic Deviation; ‘Atkinson’ is for the Atkinson index, where the inequality aversion parameter epsilon is set equal to 2; H is for the poverty headcount ratio; PG is for the poverty gap; PG2 is for the poverty gap squared. Poverty indices are computed using a relative poverty line, equal to 60% of median per capita consumption in the original distribution. 32 On average, differences between “raw” and “treated” results are about half as large when winsorizing (Table 4), as they are when trimming (Table 3), and patterns are consistent: the surveys where the impact of outliers is the largest remain the same (Nigeria, Malawi, Mexico…). This exercise is not intended to convey general messages but is a useful complement to the results presented in Table 3. As to whether trimming or winsorization is to be preferred, no general recommendation can be found in the literature, nor any agreed-upon practices exist (van Kerm 2007; Turkiewicz, 2017; Hlasny 2020). When it comes to making a decision, it is worth reminding that winsorization produces artificial clusters of observations at the extremes of the distribution. The case of Mexico 2014 in Figure 7 illustrates: the histogram of the (log) of per capita expenditure shows two spikes (highlighted in red), created by winsorization. Depending on the statistics of interest to the analyst, the presence of these masses might be more or less influential: clearly, the choice between trimming and winsorizing has an impact on cardinal comparisons involving tail-sensitive poverty and inequality measures, which leads to recommendation that spatial and temporal poverty and inequality comparisons are carried out on a common method for treating outliers. 
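In Stata, winsorization at the nearest inlier can be sketched directly from the _out flag created by outdetect. The few lines below are an unweighted illustration of the treatment just described, applied to a copy of pce; they are not the procedure used to produce Table 4:

* illustrative winsorization of pce at the nearest inlier values
generate double pce_w = pce
quietly summarize pce if _out == 0
replace pce_w = r(min) if _out == 1         // bottom outliers set to the smallest inlier
replace pce_w = r(max) if _out == 2         // top outliers set to the largest inlier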
Figure 7 – Winsorized per capita expenditure, Mexico 2014 33 5 Concluding remarks Despite the increase in quality of household surveys in recent years, most variables of interest to welfare analysts suffer from measurement error: the outdetect command focuses on outliers, that is, on values that are so large or so small that they seem “too far away” from the rest of the data. While ‘outlier’ is not necessarily a synonym for ‘error’, in practice their investigation often leads to the identification of gross errors; and even when outliers turn out to be legitimate values, their analysis is informative, and is integral to the initial stages of any data analysis. The literature provides three sets of results that motivate our interest in developing outdetect as a tool for detecting outliers and assessing their potential impact on poverty and inequality measures. First, the body of theoretical results that demonstrate the extent to which different inequality and poverty measures are sensitive to the presence of extreme values (Cowell and Victoria-Feser 1996a, 1996b; Victoria-Feser 2000; Cowell and Flachaire 2007, 2015; Cowell and van Kerm 2015). In general, welfare indices have been shown to be highly sensitive to extreme values in the tails of the distribution of income. Second, extreme values are omni-present in household surveys, both in low- and high-income countries (Hlasny and Verme 2018, Hlasny, Ceriani and Verme 2021). Third, while common experience and anecdotal evidence suggest that analysts always inspect their data, and often adjust them in some way, to protect their analyses from the impact of extreme values, these practices are rarely documented. Based on the examination of the documentation accompanying the release of some 200 official poverty and inequality estimates by national statistical offices, Mancini and Vecchi (2022) find that the overwhelming majority does not even mention whether outliers were dealt with, and how. outdetect does not purport to solve the complex and long-standing problem of outlier identification and treatment once and for all. In fact, outdetect does not include options to treat extreme values, as no automatic procedures exist that can be recommended for that. Rather, the command should be thought of as a heuristic tool, much in the spirit of Hampel (1974) and Huber (1981), aimed at helping practitioners, particularly welfare analysts, with assessing the impact of extreme values in their calculations. Our effort in developing outdetect has focused on taking stock of both theoretical results and the practical constraints faced by empirical analysts. As a result, outdetect is simple to use, fast to execute even in the presence of relatively large samples, and rooted in the academic literature. The command is intended to greatly facilitate the documentation of choices made 34 at the “pre-analytical” stage, and the reproducibility of the analysis. This, in turn, has far- reaching implications for the consistency of welfare comparisons: when the methodology for identifying outliers differs, the consistency of both inter-temporal and international and intra-country geographic comparisons is at risk (Ravallion 1994; Atkinson 2019: ch. 4). Our preliminary empirical exploration, based on a selection of data sets drawn from the RuLIS project, provides a tentative assessment of the effect that lack of harmonization in dealing with outliers can have on welfare comparisons. 35 Appendix Table A1. 
Appendix

Table A1. RuLIS data sets used in section 4

Country Survey Year Households
Armenia Integrated Living Conditions Survey 2013 5,184
Bangladesh Household Income-Expenditure Survey 2010 12,240
Bolivia Encuesta de los Hogares 2008 3,940
Bulgaria Multitopic Household Survey 2007 4,300
Burkina Faso Enquête Multisectorielle Continue 2014/15 10,800
Cambodia Cambodia Socio-Economic Survey 2009 11,971
Cameroon Fourth Cameroon Household Survey 2014 10,303
Côte d'Ivoire Enquête Niveau de Vie des Menages 2008 2008 12,600
Ecuador Encuesta sobre Condiciones de Vida 2014 28,970
Ethiopia Ethiopia Socioeconomic Survey 2015/16 4,954
Georgia Integrated Household Survey 2015 10,999
Ghana Ghana Living Standards Survey 2012/13 16,772
Guatemala Encuesta Nacional de Condiciones de Vida 2014 11,536
India India Human Development Survey 2012 42,129
Iraq The Iraq Household Socio-Economic Survey 2012 24,944
Kyrgyzstan Integrated Sample Household Budget and Labour Survey 2013 5,013
Malawi Integrated Household Survey 2017 12,447
Mali Enquête Agricole de Conjoncture Integrée aux Conditions de Vie des Ménages 2014 3,804
Mexico Encuesta Nacional de Ingresos y Gastos de los Hogares 2014 19,479
Mongolia Socioeconomic Survey 2014 16,174
Mozambique Inquérito sobre Orçamento Familiar 2009 10,832
Nepal Nepal Living Standards Survey 2011 5,988
Nicaragua Encuesta Nacional de Hogares sobre Medición de Nivel de Vida 2014 6,851
Niger National Survey on Household Living Conditions and Agriculture 2014 3,617
Nigeria General Household Survey 2012/13 4,738
Pakistan Pakistan Social and Living Standards Measurement Survey 2013/14 17,989
Peru Encuesta Nacional de Hogares 2015 32,188
Rwanda Integrated Household Living Conditions Survey 2013/14 14,419
Senegal Enquête de Suivi de la Pauvreté au Sénégal 2011 5,953
Serbia Living Standards Measurement Survey 2007 5,557
Sierra Leone Integrated Household Survey 2011 2011 6,727
Tanzania National Panel Survey 2012/13 3,256
Uganda The Uganda National Panel Survey 2013/14 3,352
Vietnam Household Living Standards Survey 2010 9,399

Note: Information supplied by RuLIS technical documentation, http://www.fao.org/in-action/rural-livelihoods-dataset-rulis/technical-documentation/en/.

References

Atkinson, A. B. 2019. Measuring poverty around the world. Princeton University Press.
Barnett, V., and Lewis, T. 1994. Outliers in Statistical Data. 3rd edition. J. Wiley and Sons.
Billor, N., Hadi, A. S., and Velleman, P. F. 2000. “BACON: Blocked Adaptive Computationally Efficient Outlier Nominators.” Computational Statistics & Data Analysis, 34: 279-298.
Box, G. E., and Cox, D. R. 1964. “An analysis of transformations.” Journal of the Royal Statistical Society: Series B (Methodological), 26(2): 211-243.
Bruffaerts, C., Verardi, V., and Vermandele, C. 2014. “A generalized boxplot for skewed and heavy-tailed distributions.” Statistics and Probability Letters, 95: 110-117.
Cowell, F. A. 2011. Measuring Inequality. Oxford University Press.
Cowell, F. A., and Flachaire, E. 2007. “Income distribution and inequality measurement: The problem of extreme values.” Journal of Econometrics, 141(2): 1044-1072.
Cowell, F. A., and Flachaire, E. 2015. “Statistical methods for distributional analysis.” In Handbook of Income Distribution, vol. 2: 359-465. Elsevier.
Cowell, F. A., and van Kerm, P. 2015. “Wealth Inequality: A Survey.” Journal of Economic Surveys, 29(4): 671-710.
Cowell, F. A., and Victoria-Feser, M. P. 1996a. “Robustness Properties of Inequality Measures.” Econometrica, 64(1): 77-101.
Cowell, F. A., and Victoria-Feser, M. P. 1996b. “Poverty measurement with contaminated data: A robust approach.” European Economic Review, 40(9): 1761-1771.
Croux, C., and Rousseeuw, P. J. 1992. “Time-efficient algorithms for two highly robust estimators of scale.” Computational Statistics, 1: 411-428.
D’Agostino, R. B., Belanger, A., and D’Agostino, R. B., Jr. 1990. “A Suggestion for Using Powerful and Informative Tests of Normality.” The American Statistician, 44(4): 316-321.
Davies, L., and Gather, U. 1993. “The Identification of Multiple Outliers.” Journal of the American Statistical Association, 88(423): 782-792.
Deaton, A., and Tarozzi, A. 2005. “Prices and poverty in India.” In Deaton, A. and Kozel, V. (eds.), The Great Indian Poverty Debate. New Delhi, India: Macmillan: 381-411.
Dupriez, O. 2007. “Building a household consumption database for the calculation of poverty PPPs.” Technical note. The World Bank, Washington DC.
Filzmoser, P., Gussenbauer, J., and Templ, M. 2016. “Detecting outliers in household consumption survey data.” Vienna University of Technology.
Foster, J., Greer, J., and Thorbecke, E. 1984. “A class of decomposable poverty measures.” Econometrica, 52(3): 761-766.
Friedline, T., Masa, R. D., and Chowa, G. A. 2015. “Transforming wealth: Using the inverse hyperbolic sine (IHS) and splines to predict youth’s math achievement.” Social Science Research, 49: 264-287.
Gather, U., and Becker, C. 1997. “Outlier Identification and Robust Methods.” In Handbook of Statistics, edited by Maddala, G. S. and Rao, C. R., vol. 15: 123-142.
Grubbs, F. E. 1969. “Procedures for detecting outlying observations in samples.” Technometrics, 11(1): 1-21.
Hampel, F. R. 1974. “The influence curve and its role in robust estimation.” Journal of the American Statistical Association, 69(346): 383-393.
Hampel, F. R., Ronchetti, E. M., Rousseeuw, P. J., and Stahel, W. A. 1986. Robust Statistics: The Approach Based on Influence Functions. New York: John Wiley & Sons.
Hlasny, V. 2020. “Nonresponse Bias in Inequality Measurement: Cross-Country Analysis Using Luxembourg Income Study Surveys.” Social Science Quarterly, 101(2): 712-731.
Hlasny, V., Ceriani, L., and Verme, P. 2021. “Bottom Incomes and the Measurement of Poverty and Inequality.” Review of Income and Wealth.
Hlasny, V., and Verme, P. 2018. “Top Incomes and Inequality Measurement: A Comparative Analysis of Correction Methods Using the EU SILC Data.” Econometrics, 6(30): 1-21.
Huber, P. J. 1981. Robust Statistics. New York: John Wiley & Sons.
Huber, P. J. 2002. “John W. Tukey's Contributions to Robust Statistics.” Annals of Statistics, 30(6): 1640-1648.
Jann, B., Verardi, V., and Vermandele, C. 2018. robstat: Stata module to estimate robust univariate statistics. Available from http://ideas.repec.org/c/boc/bocode/s458524.html.
Mancini, G., and Vecchi, G. 2022. On the Construction of the Consumption Aggregate for Inequality and Poverty Analysis. The World Bank: Washington DC.
Ravallion, M. 1994. Poverty Comparisons. London: Routledge.
Rousseeuw, P. J., and Croux, C. 1992. “Explicit Scale Estimators With High Breakdown Point.” In L1 Statistical Analysis and Related Methods, edited by Dodge, Y. Amsterdam: North Holland: 77-92.
Rousseeuw, P. J., and Croux, C. 1993. “Alternatives to the median absolute deviation.” Journal of the American Statistical Association, 88(424): 1273-1283.
Rousseeuw, P., and Hubert, M. 2017. “Anomaly detection by robust statistics.” Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery, 8(2): 1236, 1-14.
Shapiro, S. S., and Francia, R. S. 1972. “An approximate analysis of variance test for normality.” Journal of the American Statistical Association, 67(337): 215-216.
Shapiro, S. S., and Wilk, M. B. 1965. “An analysis of variance test for normality (complete samples).” Biometrika, 52(3/4): 591-611.
Snedecor, G. W., and Cochran, W. G. 1989. Statistical Methods. Ames: Iowa State University Press.
Tukey, J. W. 1962. “The Future of Data Analysis.” The Annals of Mathematical Statistics, 33(1): 1-67.
Turkiewicz, K. 2017. “Data trimming.” In M. Allen (ed.), The SAGE Encyclopedia of Communication Research Methods, vol. 1: 347-348. SAGE Publications.
Van Kerm, P. 2007. “Extreme Incomes and the Estimation of Poverty and Inequality Indicators from EU-SILC.” IRISS Working Paper Series no. 2007-01.
Verardi, V., and Vermandele, C. 2018. “Univariate and Multivariate Outlier Identification for Skewed or Heavy-Tailed Distributions.” Stata Journal, 18(3): 517-532.
Victoria-Feser, M. P. 2000. “Robust methods for the analysis of income distribution, inequality and poverty.” International Statistical Review, 68(3): 277-293.
Weber, S. 2010. “bacon: An effective way to detect outliers in multivariate data using Stata (and Mata).” Stata Journal, 10(3): 331-338.
Yeo, I. K., and Johnson, R. A. 2000. “A new family of power transformations to improve normality or symmetry.” Biometrika, 87(4): 954-959.