Policy Research Working Paper 11012

Pakistan Poverty Map 2019–2020

Ramiro Malaga Ortega
Moritz Meyer
Paul Corral Rodas

Poverty and Equity Global Department
December 2024

Abstract

This paper summarizes the approach used to estimate monetary poverty for Pakistan in 2019–20 at the district level. The small area estimation method is used to impute welfare from the Household Income and Expenditure Survey 2018–19 into the Pakistan Social and Living Standard Measurement Survey 2019–20. This application differs from the standard small area estimation implementations that use a survey-to-census method because the two surveys are household surveys. Using surveys instead of a census as the target data offers additional information for modeling poverty, but it also comes at the expense of noise due to sampling. Monetary poverty rates are estimated for 126 districts in Punjab, Sindh, Khyber Pakhtunkhwa, and Balochistan, including, for the first time, the districts of the former Federally Administered Tribal Areas and the former Frontier Regions. Using the Census Empirical Best method, the analysis obtains poverty estimates with higher precision and accuracy than those of the methodology previously implemented in Pakistan.

This paper is a product of the Poverty and Equity Global Department. It is part of a larger effort by the World Bank to provide open access to its research and make a contribution to development policy discussions around the world. Policy Research Working Papers are also posted on the Web at http://www.worldbank.org/prwp. The authors may be contacted at obarriga@worldbank.org.

The Policy Research Working Paper Series disseminates the findings of work in progress to encourage the exchange of ideas about development issues. An objective of the series is to get the findings out quickly, even if the presentations are less than fully polished. The papers carry the names of the authors and should be cited accordingly.
The findings, interpretations, and conclusions expressed in this paper are entirely those of the authors. They do not necessarily represent the views of the International Bank for Reconstruction and Development/World Bank and its affiliated organizations, or those of the Executive Directors of the World Bank or the governments they represent.

Produced by the Research Support Team

Pakistan Poverty Map 2019–2020¹
Technical Paper

Ramiro Malaga Ortega, The World Bank Group
Moritz Meyer, The World Bank Group
Paul Corral Rodas, The World Bank Group

Keywords: small area estimation, poverty, inequality, inclusive growth, spatial disparities, Pakistan
JEL classification: I32, C13

¹ Ramiro A. Malaga Ortega is a Consultant in the Poverty and Equity Global Practice at the World Bank; Moritz Meyer (corresponding author: mmeyer3@worldbank.org) is a Senior Economist in the Poverty and Equity Global Practice at the World Bank; and Paul Corral is a Senior Economist in the Poverty and Equity Global Practice. The authors would like to thank the Pakistan Bureau of Statistics for sharing data and providing technical feedback, and Alexandru Cojocaru, Minh Cong Nguyen, Shujaat Farooq, and Shahid Imdad for excellent suggestions. The authors also would like to thank Lander Bosch and Marziya Farooq for support to produce maps. We declare that we have no relevant or material financial interests that relate to the research described in this paper. The findings, interpretations, and conclusions expressed in this work do not necessarily reflect the views of The World Bank Group or any affiliated organizations, its Board of Executive Directors, or the governments they represent. The World Bank does not guarantee the accuracy of the data included in this work.

1. Introduction

Household surveys are the primary source of information used to measure poverty and inequality in developing countries.
However, policy makers are often interested in poverty estimates at a higher disaggregation than those typically available from household surveys. Knowing which districts (or tehsils) have higher poverty rates or a disproportionally large headcount of poor helps improve the design of policy interventions to support poor households, whether through enhanced targeting or other policies and programs. Given the cost and time needed to collect high-quality consumption data, there is a trade-off between sample size and the cost and quality of the data, constraining the sample size of household surveys used to estimate monetary poverty to higher levels of aggregation, such as the provincial or national levels.

In the case of Pakistan, this survey is the Household Income and Expenditure Survey (HIES), which provides detailed consumption data that are representative at the provincial and rural-urban levels. In addition, the Pakistan Social and Living Standard Measurement Survey (PSLM) uses the same sampling frame as the HIES, with most of its questions belonging to a subset of the HIES questionnaire. While the PSLM does not include a consumption module, the survey has a larger sample size, which allows for representativeness at the district level.²

With this setup, it is possible to combine the information of the HIES and PSLM to estimate monetary poverty at the district level using a statistical approach called small area estimation (SAE). The premise behind the SAE method is to project the information on household welfare from the dataset with consumption data to another dataset with higher geographical resolution (usually a population census) but without a consumption module. SAE combines two data sources, one of which has detailed consumption expenditure or income data but is not representative of the desired granularity level (district level in our case).
In contrast, the other, usually a census, does not have information on consumption expenditure or income but is representative at the desired level. A linear model establishes a relationship between welfare and other characteristics. The model’s estimated parameters are then used to predict consumption expenditure or income in the data lacking consumption expenditure or income. The advantage of this approach is that the measure of poverty used at the lower tier is the same as the metric used at the national and provincial levels. However, this strategy can only succeed if the two datasets share the same variables for a robust poverty prediction.

The World Bank has widely used the SAE method to estimate poverty at granular levels and combine these estimates with non-monetary poverty measures. Examples include, among others, Albania, Bolivia, Bulgaria, Cambodia, Ecuador, Indonesia, Mexico, Sri Lanka, and Viet Nam. The poverty map created here differs from most other implementations because the target dataset is not a census but a large household survey (PSLM). This setup has advantages, such as having more variables in common in both datasets and higher explanatory power. Nonetheless, the high number of eligible covariates can lead to overfitting, and the estimates may have additional uncertainty due to the data being a sample and not the entire population.³

This paper is structured in seven sections. Section 2 presents background literature on unit-level SAE, while Section 3 describes the applied methodology. Section 4 describes the datasets used, and Section 5 discusses the econometric modeling and estimation in practice. Section 6 presents key findings, and Section 7 summarizes the main conclusions. Annexes give the district estimates and the list of variables selected on the right-hand side for each provincial model.

² The PSLM 2019–20 did not collect information for five districts in Balochistan (Chagai, Jhal Magsi, Musakhel, Panjgur, and Zhob) where we could not estimate poverty figures.

³ Future work would need to address the random nature of the survey data as an additional source of uncertainty. It is likely that modeling this as part of the prediction stage would further increase the standard errors.

2. Literature

The article “Micro-level estimation of poverty and inequality,” by Elbers, Lanjouw, and Lanjouw (2003), is an essential reference for poverty mapping. It describes a method for obtaining small-area poverty estimates and has become widely known as the ELL method. Until recently, this method was the default method used for most poverty maps produced by the World Bank. The ELL methodology belongs to a class of unit-level models for SAE. The model is specified at the household level, and additional variables can be added at a higher level.

Following ELL, the literature has offered continuous improvements.⁴ These include Molina and Rao (2010), who introduced the Empirical Best (EB) estimator that uses the available survey data more efficiently. For EB estimators, the locality effects for areas present in the census and the survey are simulated using the predicted locality effects from the survey as the first moment of the distribution.

The ELL methodology quickly gained ground at the World Bank, primarily due to a stand-alone PovMap program by Zhao (2006). Nguyen et al. (2018) then translated PovMap into Stata, making the software accessible to a broader audience, and distributed it as a Stata ado. Corral et al. (2020) updated the method by adapting the Monte Carlo simulation procedure from Molina and Rao (2010) to the model-fitting procedure from Van der Weide (2014), which considers heteroscedasticity and survey weights. The programs also adapted the parametric bootstrap approach for estimating the mean squared error (MSE) considered by Molina and Rao (2010) and proposed originally by González-Manteiga et al. (2008) to these extended EB estimators. Finally, Corral et al.
(2020) adapted the Census EB estimator from Correa et al. (2012). They conducted simulations highlighting the importance of choosing the proper transformation of the dependent variable and showed that the best results are obtained when the area level of the model coincides with the level at which the results need to be reported. Corral et al. (2020) also illustrated that when sample sizes are small relative to the population, Census EB estimates approximate EB estimates.

Unit-level models require access to microdata from the housing and population census or a large representative survey. Alternatives include area-level models, which use data aggregated at levels above the household. The best-known area-level model for estimating poverty for small areas is the Fay-Herriot (FH) model introduced by Fay and Herriot (1979). This class of area-level models was used to produce estimates beyond poverty for populations in many geographic regions in the United States and to inform the allocation of federal funds to school districts. More recently, Seitz (2019) used the FH model to produce small-area poverty estimates for Central Asia.

⁴ For a more developed discussion on small area estimation of poverty and inequality, see Molina et al. (2022).

3. Methodology

The Pakistan poverty map uses the Census EB estimator, a further development based on the original ELL estimator. Both ELL and Census EB estimators assume that the welfare $y_{ch}$ of each household $h$ within each location $c$ in the population is linearly related to a $1 \times K$ vector of characteristics $x_{ch}$ for that household, according to the following nested error model:

$$y_{ch} = x_{ch}\beta + \eta_c + e_{ch}, \qquad h = 1, \ldots, N_c, \quad c = 1, \ldots, C,$$

where $\eta_c$ and $e_{ch}$ are, respectively, the location (district) effects and the household-specific idiosyncratic errors, assumed to be independent of each other, following normal distributions:

$$\eta_c \sim N(0, \sigma_\eta^2), \qquad e_{ch} \sim N(0, \sigma_e^2),$$

where $N_c$ is the number of households in the population in area $c$ and $C$ is the number of areas, in our case, the number of districts.
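To make the structure of the nested error model concrete, the following is a minimal simulation sketch in Python. It is an illustration only, not the code used for the poverty map (which relies on the SAE Stata package). The function name, the argument layout, and the parameter values in the usage note are hypothetical, and welfare is assumed to be already on its transformed scale.

```python
import random


def simulate_nested_error(x_by_area, beta, sigma_eta, sigma_e, seed=0):
    """Simulate y_ch = x_ch * beta + eta_c + e_ch for every household.

    x_by_area: list over areas c; each entry is a list of covariate rows x_ch.
    Returns one tuple (c, h, y_ch, eta_c, e_ch) per household.
    All names here are illustrative assumptions, not part of any package.
    """
    rng = random.Random(seed)
    records = []
    for c, households in enumerate(x_by_area):
        # One location effect per area, shared by all of its households.
        eta_c = rng.gauss(0.0, sigma_eta)
        for h, x in enumerate(households):
            # Household-specific idiosyncratic error, drawn independently.
            e_ch = rng.gauss(0.0, sigma_e)
            xb = sum(xk * bk for xk, bk in zip(x, beta))
            records.append((c, h, xb + eta_c + e_ch, eta_c, e_ch))
    return records
```

For example, `simulate_nested_error([[[1.0, 2.0], [1.0, 3.0]], [[1.0, 4.0]]], [0.5, 2.0], 0.3, 0.5)` returns three household records, the first two of which share the same draw of the location effect because they belong to the same area.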
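Under ELL, the locations are often the primary sampling units (PSU). Nevertheless, aggregation of estimates to higher levels may lead to noisier estimates as well as inaccurate estimates of noise (Marhuenda et al. 2017). Following the recommendation of Marhuenda et al. (2017), we specify the location effects at the level of aggregation at which poverty is intended to be reported (districts).

Under ELL, the next step consists of imputing welfare in the census, where welfare is not observed:

$$y^*_{ch} = x_{ch}\beta^* + \eta^*_c + e^*_{ch}.$$

In the original ELL application, the parameters were obtained from their joint posterior distributions:

$$\beta^* \sim N\left(\hat{\beta}, \mathrm{vcov}(\hat{\beta})\right), \qquad \sigma_e^{2*} \sim \hat{\sigma}_e^2 \, \frac{n - K}{\chi^{2*}_{n-K}},$$

and we draw the residuals as $e^*_{ch} \sim N(0, \sigma_e^{2*})$. Similarly,

$$\sigma_\eta^{2*} \sim \mathrm{Gamma}\left(\hat{\sigma}_\eta^2, \mathrm{var}(\hat{\sigma}_\eta^2)\right),$$

and we draw the random location effects as $\eta^*_c \sim N(0, \sigma_\eta^{2*})$.

Nevertheless, following the recommendations from Corral et al. (2021) and Corral et al. (2022), we use the adaptation from Molina and Rao (2010) made by Corral et al. (2021). First, we fit the model using the GLS from Van der Weide (2014) and obtain parameter estimates for the observed sample:

$$\hat{\theta}_0 = \left(\hat{\beta}_0, \hat{\sigma}^2_{\eta 0}, \hat{\sigma}^2_{e 0}\right).$$

We then use the estimated parameters to simulate $M$ vectors of welfare in the census data:

$$y^*_{ch} = x_{ch}\hat{\beta}_0 + \eta^*_c + e^*_{ch},$$

where $\hat{\beta}_0$ is maintained fixed for all the $M$ vectors and, following Van der Weide (2014), $\eta^*_c$ is generated as:

$$\eta^*_c \sim N\left(\hat{\eta}_{c0}, \widehat{\mathrm{var}}[\hat{\eta}_{c0}]\right)$$

with

$$\hat{\eta}_{c0} = \hat{\gamma}_c \left(\sum_h \frac{w_{ch}}{\hat{\sigma}^2_{e_{ch}0}}\right)^{-1} \sum_h \frac{w_{ch}}{\hat{\sigma}^2_{e_{ch}0}} \, \hat{u}_{ch},$$

$$\hat{\gamma}_c = \hat{\sigma}^2_{\eta 0} \left[\hat{\sigma}^2_{\eta 0} + \sum_h \left(\frac{w_{ch}/\hat{\sigma}^2_{e_{ch}0}}{\sum_h w_{ch}/\hat{\sigma}^2_{e_{ch}0}}\right)^2 \hat{\sigma}^2_{e_{ch}0}\right]^{-1},$$

$$\widehat{\mathrm{var}}[\hat{\eta}_{c0}] = \hat{\sigma}^2_{\eta 0} - \hat{\gamma}_c^2 \left[\hat{\sigma}^2_{\eta 0} + \sum_h \left(\frac{w_{ch}/\hat{\sigma}^2_{e_{ch}0}}{\sum_h w_{ch}/\hat{\sigma}^2_{e_{ch}0}}\right)^2 \hat{\sigma}^2_{e_{ch}0}\right],$$

where $w_{ch}$ are the survey weights and $\hat{u}_{ch}$ are the survey residuals. Household-specific residuals are obtained from:

$$e^*_{ch} \sim N\left(0, \hat{\sigma}^2_{e_{ch}0}\right).$$

For areas that are not in the consumption survey (in our case, the HIES 2018–19) but are in the target dataset (in our case, the PSLM 2019–20), the local effects are generated as:

$$\eta^*_c \sim N\left(0, \hat{\sigma}^2_{\eta 0}\right).$$

The Census EB estimator for a location is obtained by averaging across Monte Carlo simulations:

$$\hat{\tau}_c^{CEB} = \frac{1}{M}\sum_{m=1}^{M} \hat{\tau}_c^{*(m)},$$

where $\tau_c$ would be the rate of poverty (FGT0).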
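The simulation steps above can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not the SAE Stata implementation: it assumes equal survey weights and homoskedastic household errors, so the shrinkage factor reduces to $\hat{\gamma}_c = \hat{\sigma}^2_{\eta 0} / (\hat{\sigma}^2_{\eta 0} + \hat{\sigma}^2_{e0}/n_c)$, with $n_c$ the number of sampled households in the area. The function names, arguments, and the poverty line `z` (expressed on the same transformed scale as welfare) are hypothetical.

```python
import random
from statistics import mean


def eb_location_effect(residuals, sigma2_eta, sigma2_e):
    """Predicted location effect and its variance for one sampled area.

    Simplifying assumptions (not in the paper): equal weights and a common
    error variance, so each household gets weight 1/n in the area mean.
    """
    n = len(residuals)
    noise = sigma2_e / n                      # variance of the mean residual
    gamma = sigma2_eta / (sigma2_eta + noise)
    eta_hat = gamma * mean(residuals)
    var_eta = max(sigma2_eta - gamma ** 2 * (sigma2_eta + noise), 0.0)
    return eta_hat, var_eta


def census_eb_fgt0(xb_target, eta_hat, var_eta, sigma2_e, z, M=100, seed=0):
    """Average the simulated poverty headcount (FGT0) over M welfare vectors.

    xb_target: fitted linear predictors x_ch * beta_0 for the target-data
    households of one area; beta_0 stays fixed across simulations.
    """
    rng = random.Random(seed)
    rates = []
    for _ in range(M):
        # One location-effect draw per simulated welfare vector.
        eta_star = rng.gauss(eta_hat, var_eta ** 0.5)
        poor = 0
        for xb in xb_target:
            y_star = xb + eta_star + rng.gauss(0.0, sigma2_e ** 0.5)
            poor += y_star < z
        rates.append(poor / len(xb_target))
    return mean(rates)
```

For an area that appears only in the target data, the same second function can be called with `eta_hat = 0.0` and `var_eta` set to the estimated location-effect variance, which mirrors the out-of-sample rule stated above.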
To estimate the noise, we rely on the parametric bootstrap of González-Manteiga et al. (2008), shown via simulations by Corral et al. (2020) to yield a more accurate measure of noise than the method from ELL. The models and estimates are all obtained using the latest SAE Stata package by Nguyen et al. (2018).⁵

4. Data

This implementation of the Pakistan Poverty Map 2019–20 differs from most implementations because it does not impute welfare from a survey into a census but rather involves two surveys. This approach reflects the access restrictions to microdata from the Population and Housing Census 2017. Hence, this application uses two technically very similar surveys. The HIES 2018–19 and the PSLM 2019–20 are household sample surveys, whereby the PSLM has a sample size 6.8 times larger than the HIES. Using both surveys offers additional information for modeling because the PSLM questionnaire covers more household characteristics than the census. Another advantage of this approach is that both HIES and PSLM are conducted frequently, which allows for more frequent monitoring of poverty at the district level.

Box 1: Political division of Pakistan

At the time of the implementation of the HIES 2018–19 and the PSLM 2019–20, Pakistan’s political division consisted of six levels of government: National Federal Government, Provincial government, Division administration, District government, Tehsil municipal administration, and Union administration. Aside from the four provinces of Balochistan, Khyber Pakhtunkhwa (KP), Punjab, and Sindh, there is the Islamabad Capital Territory (ICT), which for the survey design is treated as part of Punjab. Federally Administered Tribal Areas (FATA) and Frontier Regions (FR) were recently merged into KP. The FATA became new districts, and the FRs were merged into neighboring districts, sampled in the HIES 2018–19 and the PSLM 2019–20.
The political division of Pakistan remains dynamic and has been evolving over the past two decades. In 2006, there were 106 districts in KP, Punjab, Sindh, and Balochistan, but at the time of the HIES 2018–19 and the PSLM 2019–20, there were 131 districts in those four provinces.

⁵ The latest package can be downloaded from https://github.com/pcorralrodas/SAE-Stata-Package.

Since 1963, the HIES has been conducted regularly but with some breaks. Between 1990 and 1997, four surveys (1990–91, 1992–93, 1993–94, and 1996–97) were conducted in compliance with the new national accounts system. The 1998–99 and 2001–02 surveys were merged with the Pakistan Integrated Household Survey (PIHS), a more extensive survey with district-level representativeness. In 2004, the PIHS was renamed the Pakistan Social and Living Standards Measurement (PSLM) Survey, but the HIES module remained intact. The larger survey design has two parts: (i) the longer questionnaire with provincial representativeness (HIES) and (ii) the part with a shorter questionnaire with district representativeness (PSLM).

Even though both surveys share part of their questionnaire and follow common sampling design principles and field protocols, it is more accurate to consider them separate surveys. Their implementation is independent: districts covered do not always coincide (particularly for Balochistan), the PSLM questionnaire is not a perfect subset of the HIES questionnaire, and there exist significant differences, such as district coding and electronic methods of data collection, which were rolled out independently.

4.1. Household Income and Expenditure Survey 2018–19 (HIES 2018–19)

The HIES 2018–19 provides detailed outcome indicators on education, health, population welfare, housing, water, sanitation, hygiene, information and communication technology (ICT), food insecurity, and income and expenditure.
Its universe consists of the urban and rural areas of the four provinces of Pakistan, with 1,802 primary sampling units (PSU) and 24,809 households in the four provinces. For the first time since 2001–02, it includes the Federally Administered Tribal Areas (FATA) and the Frontier Regions (FR) as Khyber Pakhtunkhwa (KP) districts. The HIES 2018–19 updated the sampling frame based on the Population and Housing Census 2017 with a stratified two-stage sample design. The COVID-19 lockdown did not impact HIES 2018–19 because this survey was completed before the onset of the pandemic.

4.2. Pakistan Social and Living Standard Measurement Survey 2019–20 (PSLM 2019–20)

The PSLM 2019–20 is designed to have district-level inference and to collect information on education, ICT, health, disability, migration, housing, water supply and sanitation (WSS), household perceptions and satisfaction, and food insecurity. Since disability is a rare event, its inclusion in the survey required a significant increase in the sample size, from 5,326 PSU and 78,635 households in PSLM 2014–15 to 5,673 PSU and 170,246 households in the four provinces of Pakistan in 2019–20. The increase in the sample size improved the precision of the district poverty estimators compared with previous poverty map implementations.

The Pakistan Bureau of Statistics (PBS) carried out the PSLM 2019–20 field activities between October 2019 and March 2020, which resulted in a limited impact of the COVID-19 pandemic restrictions on the data collection process. The Punjab sample was almost entirely collected before the lockdown, with only 0.8 percent collected after the lockdown. However, in Balochistan, this proportion was 8.1 percent, 4.3 percent in Sindh, and 3.4 percent in KP. Due to COVID-19 lockdowns and security restrictions, 563 PSUs were dropped in the four provinces.
Five districts of Balochistan (Zhob, Panjgur, Jhal Magsi, Chagai, and Musa Khel) were completely dropped, and the entire urban areas of four districts of Balochistan (Kalat, Khuzdar, Qilla Saifullah, and Shaheed Sikandarabad) were also excluded from the data collection.⁶ A critical change in the PSLM 2019–20 was the use of a new technology for collecting information via computer-assisted personal interviews (CAPI) for the first time.

⁶ Only 607 from 163,676 sampling blocks were affected by the COVID-19 pandemic, or un-approachable/security problems/military restricted areas in the four provinces.

4.3. Matching the HIES 2018–19 and PSLM 2019–20

The current standard for poverty mapping (Census EB) is based on estimating a consumption model on the training data (HIES 2018–19) and applying it to the target data (PSLM 2019–20) for prediction purposes using the predicted mean and variance of the locality effects for the areas in common. There are two critical requirements necessary to use the Census EB methodology: (i) matching the area codes in HIES 2018–19 and PSLM 2019–20 (Section 4.3.1); and (ii) consistency between the variables in the HIES 2018–19 and the PSLM 2019–20, for which a total of 305 variables were constructed in both datasets (Section 4.3.2).

4.3.1. Matching the area codes in HIES 2018–19 and in PSLM 2019–20

To assign the predicted mean and variance of the locality effects from HIES 2018–19 to the same localities in the PSLM 2019–20, localities between the two surveys need to be matched.⁷ Although the HIES 2018–19 and the PSLM 2019–20 are based on the same sampling frame, the district codes were different and did not match between the two surveys. We obtained the district for each PSU in HIES 2018–19 from PBS to make the match. Only for Punjab, Sindh, and KP was it possible to match all the districts in both surveys. For Balochistan, the five districts not sampled in the PSLM 2019–20 cannot have poverty estimated using SAE.
The Census EB methodology could not improve the poverty estimates of Balochistan's Awaran district because it lacks the HIES 2018–19 information.

⁷ Zhao (2006) provides a detailed summary on requirements for a good match. These include that the variable to match localities is numeric and has a specific structure to obtain different aggregations of the poverty and inequality indicators in the SAE ado.

4.3.2. Consistency between the variables in HIES 2018–19 and PSLM 2019–20

Since both surveys share a similar design, most of the modules and questions in the PSLM 2019–20 are also part of the HIES 2018–19, and both surveys share the following modules: demographic characteristics, dwelling characteristics, education, employment, water, sanitation and hygiene, solid waste management, durables, vaccination, pre-natal and post-natal care, and food insecurity. Only the modules of migration and disability were in PSLM 2019–20 and not in HIES 2018–19, providing a total of 305 variables with the potential of being used to explain the dependent variable, household consumption. Among the candidate variables, we include interactions with provinces, rural areas, and the provincial means for the variables.

Given the size of Pakistan and its population, this poverty map compared a national model with independent models for each province. Hence, the selected variables must be comparable across provinces for both data sources. In the end, a national model was selected.

We follow a three-step process to select the variables in the HIES 2018–19 and the PSLM 2019–20 that will be included in the modeling:

Step 1: Comparing HIES 2018–19 and PSLM 2019–20 questionnaires to identify questions in both surveys with similar wording and framing.

Step 2: Creating candidate variables in both surveys that could explain household consumption in the HIES 2018–19.

Step 3: Comparing the national and provincial distributions of the candidate variables constructed in Step 2 to examine whether they capture the same underlying characteristics or their empirical distributions differ significantly despite similar question-wording. We discarded those variables with more than 1 percent missing values or with means and standard deviations in the PSLM 2019–20 outside a range of plus and minus 20 percent of the respective mean and standard deviation from the HIES 2018–19.⁸

After filtering, KP has 171 variables, Punjab 170, Sindh 126, and Balochistan 85. The national model has 106 variables. The complete list of variables selected in this stage and the variables specified for the final model is in Table A5 to Table A8 in the Annex.

4.3.3. Discrepancy between the number of household members

Table 1 shows that the PSLM 2019–20 population expanded using the household sampling weights is 20 million people smaller than the 2017 Census population, even after discounting the 169,544 inhabitants of the five districts not sampled in the PSLM 2019–20. The expanded number of households in the PSLM 2019–20 exceeds the projected number of households by more than 1.4 million using the 1.9 percent annual growth rate. According to the documentation of PSLM 2019–20, it follows the same sampling framework of the Census 2017, leaving the number of household members as the most probable source of the difference. The expanded number of households per province in PSLM 2019–20 is consistent with the increase in the number of households between 2004–05 and 2017. In Annex 10, we present a graphic analysis for the four provinces.

Table 1.
Population, households, and average number of household members in Census 2017, HIES 2018–19, and PSLM 2019–20

Indicator | Census 2017 | HIES 2018–19 | PSLM 2019–20
Total population, weighted | 207,684,626 | 207,881,239 | 187,539,838
Total number of households, weighted | 32,205,111 | 33,326,273 | 35,475,036
Average household size, weighted | 6.39 | 6.21 | 5.29

Note: According to PBS, the average population growth rate based on the comparison between Census 1998 and 2017 was 2.4 percent annually. The annual growth rate in the number of households was 1.9 percent annually.

Descriptive statistics for the HIES 2018–19 and the PSLM 2019–20 suggest significant differences between the number of household members in both surveys, with the PSLM 2019–20 reporting on average almost one household member less than the HIES 2018–19. Table 2 compares the mean number of household members, illustrating that, on average, in a household in the HIES 2018–19, there were 6.21 members, while in the PSLM 2019–20, there were only 5.29 members. When disaggregating this information by the relationship to the household head, most of this difference was in the number of sons/daughters (-0.38) and grandchildren (-0.21), but with fewer members in the PSLM 2019–20 in almost all the other household member categories. Not all provinces show the same level of discrepancies: the difference in Punjab was -0.46, in KP -1.17, in Sindh -1.53, while in Balochistan, it was -2.61.

Table 2. Average number of household members in HIES 2018–19 and PSLM 2019–20

Relationship to the household head | HIES 2018–19 | PSLM 2019–20 | Difference PSLM-HIES
Household head | 1.00 | 1.00 | 0.00
Spouse | 0.84 | 0.86 | 0.02
Sons/Daughters | 3.07 | 2.69 | -0.38
Grandchildren | 0.50 | 0.29 | -0.21
Father/mother | 0.19 | 0.11 | -0.08
Brothers/sisters | 0.17 | 0.09 | -0.08
Other family members | 0.44 | 0.24 | -0.20
All members | 6.21 | 5.29 | -0.92

Source: World Bank staff estimations based on HIES 2018–19 and PSLM 2019–20

⁸ For the case of Balochistan, we increased the threshold to 30 percent.

Figure 1. Comparing the Census 2017 and the PSLM 2019–20 mean number of household members per province and district, weighted by 2019–20 population
Source: World Bank staff estimations based on the Census 2017 and PSLM 2019–20

To elucidate which of the two surveys had the figures for the average number of household members closest to reality, we compared them to statistics from the Population and Housing Census 2017, where the average number of household members was 6.39. Figure 1 shows the mean number of household members per province (crosses) and district (circles) for the Census 2017 and the PSLM 2019–20, which has district inference. Most observations are below the 45-degree line, which suggests that the number of household members in the PSLM 2019–20 has a systematic measurement error. Furthermore, the magnitude of the error differs substantially across provinces and districts. Consistent with the previous comparison between the Census 2017 and the PSLM 2019–20, Figure 2 shows that the PSLM 2019–20 does not follow the secular pattern of slow change in the number of household members and has a larger share of population living in households comprising fewer members.

Figure 2. Number of household members, as a percentage of the population
Source: World Bank staff estimations based on the HIES (2004–05 to 2018–19) and PSLM (2004–05 to 2019–20).

The number of household members is the most critical variable in poverty estimation after the consumption aggregate. The natural logarithm of the number of household members was the variable that explained the largest proportion of the variability of the per-adult-equivalent consumption in the regression and was the only variable related to the household composition that remained in a parsimonious model.
These discrepancies in household size affect our models in two ways. First, they create a negative bias in the poverty rate if we include the number of household members from the PSLM 2019–20 because the per-adult-equivalent consumption will increase if there are fewer members in the household. Other household characteristics can also create a bias. For this reason, we estimated models without family composition variables to compare them with the full model after correcting the number of household members. Second, they create a negative bias in the poverty rates because not all households or districts were affected similarly. Those with larger measurement errors will have lower poverty and less weight in the final district poverty rate because their expansion weights will be smaller. In the SAE data, each observation is a household. To expand the population, we must multiply the household weights by the number of household members. The relative importance of the households more affected by the measurement error will be smaller, creating a bias in the aggregated poverty rates.

This paper proposes two solutions: (i) correction of the expansion factor using cross-entropy weight calibration and (ii) rescaling of the number of household members using the Census 2017 district mean number of household members so that the PSLM expanded population matches the 2019–20 United Nations population projection.

The variables unrelated to the household composition had much more similar distributions in the HIES 2018–19 and the PSLM 2019–20. In particular, the variables related to durables, dwelling characteristics, and the education of adults are time-invariant and are similar in both surveys.

5. Application to Poverty Mapping in Pakistan

This section summarizes critical steps for the implementation of the Pakistan poverty map.
This section also discusses the criteria used to select the best model among the options available to estimate per-adult-equivalent consumption, as well as the different modeling options that were evaluated.

5.1. The Objective

One of the objectives of poverty estimation is to increase the precision of the estimates and to explain the sources of their variability. Elbers, Lanjouw, and Lanjouw (2002) decomposed the difference between the actual value of the error term and its estimator and identified three sources:

• The first type of error is the idiosyncratic error, defined as the difference between the actual welfare indicator of a locality and its expected value due to the realization of the unobserved component of the expenditure. Under standard assumptions, this error increases as one focuses on smaller target populations, limiting the level of disaggregation of the locality effects. One way to contain this error is to avoid modeling the impact for localities with a small number of households. Since the unobserved expenditure component causes this error, another way to reduce it is to increase the overall explanatory power of the model. Hence, one of the goals of the modeling stage is to obtain a large adjusted R squared. We used all the variables plus the interaction terms that we identified as potential candidates for the consumption model in step 3 of Section 4.3.2.

• The second type of error is the model error, and its variance in large samples is a function of the asymptotic covariance matrix of the first-stage coefficients. It does not increase or fall systematically as the size of the target population increases. To reduce this error, we remove from the first-stage model all non-significant independent variables, whose inclusion would likely increase the variance of the coefficients. With Census EB, the betas are fixed in all the simulations, and the impact of this error on the overall precision of the estimates is smaller. Here, we select models in which all the independent variables are strongly significant (low p-values) and the variances and covariances in the variance-covariance matrix of the fitted regression coefficients are small (low variance inflation factor).

• The third source of error is the computation error, which is correlated with neither the idiosyncratic error nor the model error. We reduce it as far as computational resources allow. We use a Monte Carlo simulation with 500 repetitions, 400 samples for bootstrapping, and double precision for critical variables such as the dependent variable. This error is less relevant with Census EB because, unlike ELL, it does not draw the model parameters.

Two additional considerations on the quality of the model were followed. The first was that the residuals of the final HIES 2018–19 model must satisfy the normality assumptions necessary to model the perturbation variance. The second was the capacity to predict the poverty rate accurately, not only the welfare aggregate. We calculated the district poverty rate, aggregated it to the provincial and national levels, and compared the results with the direct estimates from the HIES 2018–19 to assess how closely the model can predict poverty. This attention to poverty rate prediction was critical because poverty is determined by the left tail of the welfare distribution.

There were several options for modeling, making it difficult to evaluate which was the most suitable for the problem. For this reason, we used ten-fold cross-validation to inform the model selection. The only exception was Balochistan, where we used five-fold cross-validation because the sample size was too small to have feasible simulations. After the best model of each type was defined, the whole HIES 2018–19 was used to compare them.

5.2. Modeling Options

5.2.1. Choosing the level of the locality effects

The first modeling decision was to choose the level of aggregation at which locality effects would be defined. The main aim of the Pakistan poverty map was to estimate poverty rates and inequality at the district level. Since the target dataset is not a census but a household survey, the level of inference is the binding restriction. The PSLM 2019–20 is representative only down to the district level; lower-level disaggregation would not have been statistically valid.

5.2.2. Choosing the transformation for the welfare

The second modeling decision was to choose the best transformation of the welfare variable such that the model assumptions hold. The ELL and the Census EB methods assume that the household-specific idiosyncratic error and the random location effects follow a normal distribution. The welfare variable in the case of Pakistan was the per-adult-equivalent monthly consumption expenditure, which is not normally distributed. We evaluated three different transformations of this variable: (i) the natural logarithm, which produces a left-skewed distribution; (ii) the weighted log-shift, which consists of taking the natural logarithm of the dependent variable displaced k units to the left, with k chosen such that the skewness is reduced to zero; and (iii) the Box-Cox transformation, which also reduces the skewness to zero. Figure 3 shows the three alternatives. We selected the log-shift transformation, even though the Box-Cox transformation brought the dependent variable closer to a normal distribution, because the log-shift transformation led to a model with residuals closest to a normal distribution.
The choice of the log-shift transformation was confirmed when the model using the Box-Cox-transformed dependent variable produced districts (the level at which the PSLM 2019–20 is representative) whose predicted welfare fell outside the distribution of welfare in the HIES 2018–19; some districts had very high mean predicted welfare.

5.2.3 Choosing between a national model and four provincial models

The third modeling decision was to select between estimating a national model or provincial models. The national model included interactions at the provincial and urban/rural levels. The provincial models included interactions at the urban/rural level. Even though some provincial models have a higher R squared and match the national direct poverty estimates better, the national model has a sufficiently high R squared and better poverty predictions for each province.9 Table 3 compares the provincial and the national models. An F test to discriminate between the national and provincial models was not feasible because the best provincial models did not result from a linear restriction of the best national model. Hence, we used the criteria detailed at the beginning of this section, and we chose four provincial models with urban/rural interactions. It is important to note that the selection was based on predictive power, and the national model performed as well as or better than the provincial models in all provinces.

9 The Government of Pakistan publishes poverty numbers through the Economic Survey on the national level.

Figure 3. Four specifications for the dependent variable
Source: World Bank staff estimations based on the HIES 2018–19.
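The weighted log-shift transformation of Section 5.2.2 can be illustrated with a minimal sketch. It is not the code used for the poverty map: the consumption values and weights below are invented for illustration, the function names are hypothetical, and bisection is only one of several ways to search for the shift k. The sketch looks for the k at which the weighted skewness of ln(y - k) is zero, which is the form of the dependent variable reported in the header of Table 4.

```python
import math

def weighted_skewness(values, weights):
    """Weighted skewness: third central moment over the variance to the power 1.5."""
    total = sum(weights)
    mean = sum(w * v for v, w in zip(values, weights)) / total
    m2 = sum(w * (v - mean) ** 2 for v, w in zip(values, weights)) / total
    m3 = sum(w * (v - mean) ** 3 for v, w in zip(values, weights)) / total
    return m3 / m2 ** 1.5

def log_shift(values, k):
    """Natural logarithm of the variable displaced k units to the left."""
    return [math.log(v - k) for v in values]

def find_shift(values, weights, tol=1e-9, max_iter=200):
    """Bisection search for the k that sets the weighted skewness of ln(y - k) to zero."""
    span = max(values) - min(values)
    lo = min(values) - 1000.0 * span  # far below the minimum: ln(y - k) is almost linear in y
    hi = min(values) - 1e-9 * span    # just below the minimum: the lower tail is stretched
    skew_lo = weighted_skewness(log_shift(values, lo), weights)
    skew_hi = weighted_skewness(log_shift(values, hi), weights)
    if not (skew_lo > 0.0 > skew_hi):
        raise ValueError("skewness does not change sign on the search interval")
    for _ in range(max_iter):
        mid = 0.5 * (lo + hi)
        if weighted_skewness(log_shift(values, mid), weights) > 0.0:
            lo = mid  # still right-skewed: move the shift closer to the minimum
        else:
            hi = mid  # left-skewed: move the shift away from the minimum
        if hi - lo < tol:
            break
    return 0.5 * (lo + hi)

# Hypothetical per-adult-equivalent monthly consumption (PKR) and population weights.
consumption = [2100, 2600, 3000, 3400, 3900, 4500, 5400, 6800, 9500, 16000]
weights = [120, 150, 140, 160, 130, 110, 100, 90, 60, 40]

shift = find_shift(consumption, weights)
transformed = log_shift(consumption, shift)
print("shift k:", round(shift, 3))
print("weighted skewness after transformation:", weighted_skewness(transformed, weights))
```

The search only works when the untransformed variable is right-skewed, which is why the function checks for a sign change before iterating.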
5.2.4 Choosing how to correct the number of household members in the PSLM 2019–20

The number of household members is an essential variable for three reasons: (i) it affects the expansion factor to the population, which is the product of the household sampling weights and the household size; (ii) it affects the poverty rates because, if the number of household members is underestimated, per-adult-equivalent consumption is overestimated and the poverty rate is biased downward; and (iii) it is one of the best predictors of welfare remaining on the right-hand side.

As described in Section 4.3.2, the mean number of household members in the PSLM 2019–20 was significantly smaller than that reported in the HIES 2018–19 and the Census 2017. In addition, the proportions of households with fewer members based on the PSLM did not match the trend from the latest HIES/PSLM surveys (Figure 1). We explored two options to correct this data discrepancy.

Option 1 - Recalculate the population weights at the provincial level using the cross-entropy weight calibration method, using as restrictions the mean number of household members from the 2017 Census adjusted for its trend and the mean and variance of the welfare prediction from the HIES model.10 The results were unsatisfactory because the resulting weights made the district populations differ significantly from the 2017 Census figures. Alternatively, we used as constraints the proportions of people in households of size 1, 2, 3, 4, 5, 6, 7, and 8 or more, projected from the PSLM 2004–05 to PSLM 2014–15 trend, together with the mean and variance of the welfare prediction from the HIES model. These results were also unsatisfactory because the expanded district populations again differed substantially from the 2017 Census figures.

Option 2 - Rescale the number of household members for each household in the PSLM. After a first rescaling, the PSLM districts' average number of household members matched the 2017 Census districts' average number of household members. In the second stage, the number of household members was rescaled such that the total weighted population at the national level reached the World Bank's population projection for 2020. We chose this approach because the resulting mean number of household members is closer to the census data and the expanded population matches the projected 2020 population.

5.2.5 Final model selection

For each province, we used ten-fold cross-validation and the procedure described in this section to inform the modeling decisions described in the previous section. Then, we repeated the same procedure for the whole province sample and obtained the final model. We used the weighted log-shift transformation of the monthly per-adult-equivalent consumption as the dependent variable, with effects at the district level. For modeling purposes, we discarded cases where the transformed dependent variable was outside the interval of 3.5 standard deviations around the mean, in all instances discarding less than 0.5 percent of the observations.

We compared four different ways to select the variables for the model: (i) choose those whose PSLM 2019–20 mean was inside the 99 percent confidence interval of the HIES 2018–19 mean; (ii) the same as above, but using the 95 percent confidence interval; (iii) choose those whose PSLM mean and standard deviation were between 0.9 and 1.1 times the mean and standard deviation of the HIES 2018–19; and (iv) the same as above, but using 0.8 and 1.2 times the mean and standard deviation. After comparing the different models, we chose the first option because its models had better predictive power. Then, we added all possible interactions with the urban/rural indicator on the right-hand side and obtained an initial model with more variables than optimal.
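The covariate screening rules (i) and (iii) above can be sketched as follows. This is only an illustration under simplifying assumptions: it treats both surveys as simple random samples and ignores sampling weights, strata, and clustering, which the actual comparison would need to take into account. The variable names and data are hypothetical.

```python
import math
from statistics import NormalDist, mean, stdev

def inside_confidence_interval(source, target, level=0.99):
    """Rule (i)/(ii): is the target-survey mean inside the CI of the source-survey mean?"""
    z = NormalDist().inv_cdf(0.5 + level / 2.0)          # about 2.576 for 99 percent
    standard_error = stdev(source) / math.sqrt(len(source))
    centre = mean(source)
    return centre - z * standard_error <= mean(target) <= centre + z * standard_error

def inside_ratio_band(source, target, lower=0.9, upper=1.1):
    """Rule (iii)/(iv): are the ratios of means and of standard deviations inside the band?"""
    mean_ratio = mean(target) / mean(source)
    sd_ratio = stdev(target) / stdev(source)
    return lower <= mean_ratio <= upper and lower <= sd_ratio <= upper

# Hypothetical household-level data: source survey (HIES) and target survey (PSLM).
hies = {
    "owns_heater": [0, 1, 0, 0, 1, 0, 1, 0, 0, 1, 0, 0],
    "hh_size": [6, 7, 5, 8, 6, 7, 9, 6, 5, 7, 8, 6],
}
pslm = {
    "owns_heater": [0, 1, 0, 1, 0, 0, 1, 0, 0, 0, 1, 0],
    "hh_size": [4, 5, 3, 5, 4, 6, 5, 4, 3, 5, 4, 4],
}

kept_by_ci = [name for name in hies if inside_confidence_interval(hies[name], pslm[name])]
kept_by_ratio = [name for name in hies if inside_ratio_band(hies[name], pslm[name])]
print("kept under rule (i):", kept_by_ci)
print("kept under rule (iii):", kept_by_ratio)
```

In this toy example the household size variable is dropped under both rules because its mean in the target survey is far below the source mean, mirroring the discrepancy discussed in Section 5.2.4.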
To obtain the base model, we discarded variables that do not explain the variability of the dependent variable or are too correlated with each other. For that purpose, we used the lasso method, stepwise regression, and the variance inflation factor (VIF) to obtain a model with all the covariates significant at the 0.0001 level and a VIF smaller than three. The lasso method has the advantage of ruling out variables with low explanatory power because it allows the least contributive variables to have a coefficient close or equal to zero.11 We selected the lambda using the Bayesian Information Criterion (BIC), which tends to choose models with fewer regressors. In cases in which lasso took too much time to find the optimal model, we first used a backward stepwise regression to discard variables that were not significant using a p-value of 0.50 and then used the lasso method. After lasso, we used a forward stepwise regression using the Census EB with the H3 variance decomposition to sequentially add covariates that were significant. The VIF measures how much a variable contributes to the standard error in the regression. Discarding the variables with higher VIF reduces multicollinearity and improves the prediction.

10 wentropy ado: Corral, P. (2022) Should you impute that? Mimeo.
11 We used the lasso command in Stata 17, which includes cluster effects by locality.

We calculated the residual prediction using the SAE ado's alpha test (residual) option to obtain the Alpha Model. Then, we regressed the residuals against the variables that passed the lasso step but were not in the base model. Next, we eliminated variables using lasso, stepwise regression, and VIF. The remaining variables explain the variance of the residuals. Then, we did a final stepwise elimination of the covariates, including the variables that explained the variance of the residuals.
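The VIF criterion can be illustrated with a short sketch. For covariate j, the VIF is 1 / (1 - R_j^2), where R_j^2 is the R squared from regressing covariate j on the remaining covariates and a constant. The code below is not the Stata routine used for the poverty map; it is a pure-Python illustration with invented data, and the helper names are hypothetical.

```python
def solve(matrix, rhs):
    """Solve a linear system by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    m = [row[:] + [value] for row, value in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x

def r_squared(y, columns):
    """R squared of an OLS regression of y on the given columns plus an intercept."""
    rows = [[1.0] + [col[i] for col in columns] for i in range(len(y))]
    k = len(rows[0])
    xtx = [[sum(r[a] * r[b] for r in rows) for b in range(k)] for a in range(k)]
    xty = [sum(r[a] * yi for r, yi in zip(rows, y)) for a in range(k)]
    beta = solve(xtx, xty)
    fitted = [sum(b * v for b, v in zip(beta, r)) for r in rows]
    y_bar = sum(y) / len(y)
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_tot = sum((yi - y_bar) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot

def variance_inflation_factors(data):
    """VIF of each covariate: 1 / (1 - R^2) from regressing it on all the others."""
    result = {}
    for name, y in data.items():
        others = [col for other, col in data.items() if other != name]
        result[name] = 1.0 / (1.0 - r_squared(y, others))
    return result

# Hypothetical covariates: x3 is x1 plus small noise, x2 is unrelated to both.
covariates = {
    "x1": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0],
    "x2": [1.0, -1.0, -1.0, 1.0, 1.0, -1.0, -1.0, 1.0],
    "x3": [1.2, 1.8, 3.2, 3.8, 4.8, 6.2, 6.8, 8.2],
}

vif = variance_inflation_factors(covariates)
to_review = [name for name, value in vif.items() if value >= 3.0]
print(vif)
print("covariates at or above the threshold of three:", to_review)
```

A covariate flagged this way is a candidate for removal; in practice one would drop the variable with the highest VIF and recompute, since removing one of two collinear variables lowers the VIF of the other.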
Once we had the variables necessary to estimate the generalized least squares (GLS) model, we proceeded to the last step of sequential elimination of variables. In Figure 4, we show the kernel density of the residuals and the quantile plot for each of the four provinces. The normality of the residuals is essential because, if the residuals were not normal, all the simulation processes based on the normality assumption would be invalid.

Figure 4. Normality of the residuals of each province for the national model
Source: World Bank staff estimations based on the HIES 2018–19.

5.3 Comparing with Direct Estimates and Model Selection

One way to validate the accuracy of the methodology is to aggregate the SAE results to geographic levels where the HIES 2018–19 is representative and compare these aggregated poverty rates to direct estimates from that household survey. Table 3 compares the SAE results of the national model with direct estimates at the national, national rural, and national urban levels and for the four provinces.

• Line 1 has the HIES 2018–19 poverty estimates for the whole sample (direct estimates). Line 2 has the poverty estimates without the six districts12 sampled in the HIES 2018–19 but not in the PSLM 2019–20. Since these districts are in Balochistan, the poverty figures for KP, Punjab, and Sindh in lines 1 and 2 remain unchanged. Line 2 is the benchmark against which the predictions are compared.

• Line 3 has the HIES-in-HIES prediction of the best model when we start with all the variables that passed the selection process. The adjusted R squared is above 0.48 in all cases except Balochistan and is larger when we do not drop the natural logarithm of the number of household members.

• Line 4 shows the HIES-in-PSLM prediction of the model of line 3. As we illustrate in Figure 2, it underestimates the number of household members, leading to a downward bias of poverty rates by up to 10 percent (Balochistan). As highlighted in Section 4.3.2, the problem was caused by the measurement error in the number of household members in the PSLM 2019–20, leading to higher per-adult-equivalent consumption and a lower poverty rate.

• Line 5 shows the HIES-in-PSLM prediction for the model of line 3 after rescaling the number of household members with the Census 2017 mean number of household members. This is the model we chose.

12 Chagai, Jhal Magsi, Musakhel, Panjgur, and Zhob. Lehri was merged into Sibi before the PSLM 2019–20.

Table 3. Direct estimates and SAE estimates

                                                   National  Rural  Urban     KP  Punjab  Sindh  Balochistan
1. HIES 2018-19 estimates                             21.89  28.18  10.91  29.52   16.24  24.10        42.69
2. HIES 2018-19 estimates without five districts
   not in PSLM (this is the benchmark to compare)     21.76  28.02  10.87  29.52   16.24  24.10        42.37
SAE prediction, national model
3. SAE prediction HIES in HIES                        22.07  28.91  10.43  28.87   16.21  25.47        44.50
4. SAE prediction HIES in PSLM                        22.14  28.81  11.18  29.43   16.38  25.12        37.67
5. SAE prediction HIES in PSLM, rescaling using
   Census 2017 household size                         22.30  29.15  10.51  30.04   16.56  26.17        38.73

Source: World Bank staff estimations based on the Census 2017 and PSLM 2019–20.

Table 4 presents the final model. All four provincial GLS models have right-hand-side variables significant at least at the 0.0009 level, and the Alpha models have variables significant at least at the 0.009 level. Table 5 compares the predictions of the provincial models with those of the national model in line 5 of Table 3 and reports the confidence intervals of the direct estimates; it shows that the national model failed to match the provincial direct estimate for Balochistan.

Box 2: Areas of future research

The main limitation of this poverty map is that the target dataset is not a census but another survey, because the World Bank would need access to the unit-level records of the Census 2017.
Using a survey as a target dataset introduces an additional source of randomness for the standard errors, and estimating a poverty map using the census will be the next step. Another area of future research is to include raster and satellite data on light, dust, roads, rivers, etc., to improve the precision of the model.

Table 4. National estimation

Dependent variable: ln(peaexpM-1264.774)

Variable                                                      Coefficient  (Std. error)
Household head completed basic education                      -0.1080***   (0.0095)
Household head is female                                       0.2457***   (0.0119)
Household head completed tertiary education                    0.1674***   (0.0223)
Household head never attended                                 -0.1922***   (0.0089)
Household head occupation: manager or professional             0.1996***   (0.0149)
Household head occupation: skilled worker                      0.1782***   (0.0114)
Household head occupation: technician or services              0.0757***   (0.0090)
Household head has a pension                                   0.0964***   (0.0157)
Indicator of air conditioner                                   0.2541***   (0.0170)
Indicator of cooking range                                     0.1724***   (0.0301)
Indicator of heater                                            0.1019***   (0.0136)
Indicator of microwave                                         0.1842***   (0.0161)
Number of car                                                  0.2316***   (0.0160)
Number of clock                                                0.0372***   (0.0052)
Number of dryer                                                0.0871***   (0.0121)
Number of rickshaw/chingchi                                   -0.1007***   (0.0228)
Number of ups                                                  0.1814***   (0.0129)
Floor: cement/cement tiles                                     0.0278***   (0.0079)
Dwelling type: Part of the large unit                          0.0540***   (0.0096)
Roof: Wood/Bamboo                                             -0.1393***   (0.0106)
Walls: Burned bricks/block                                     0.0996***   (0.0120)
KP indicator for household head never attended school          0.0759***   (0.0179)
KP indicator for household head skilled                       -0.1020***   (0.0248)
KP indicator for water heater in dwelling                      0.1456***   (0.0296)
KP interaction for number of clocks                           -0.0479***   (0.0098)
KP indicator for metal roof                                   -0.2043***   (0.0247)
KP indicator for pit or latrine sanitation                    -0.2351***   (0.0512)
Sindh indicator for intermediate education                    -0.1116***   (0.0169)
Sindh indicator for skilled household head                    -0.0777***   (0.0210)
Sindh indicator for washing machine                            0.1190***   (0.0170)
Sindh indicator for concrete roof                              0.0807***   (0.0203)
Sindh indicator for inside dwelling hand pump                  0.1272***   (0.0220)
Sindh indicator for no sewing drainage                        -0.0885***   (0.0182)
Balochistan indicator of number of cars                       -0.2034***   (0.0465)
Balochistan indicator of number of clocks                     -0.0555***   (0.0112)
Balochistan interaction for time to reach drinking water      -0.0032***   (0.0007)
Rural indicator of no durables at home                        -0.1647***   (0.0231)
Rural indicator of paying for drinking water                  -0.0854***   (0.0169)
Drinking water: Inside dwelling hand pump                     -0.1001***   (0.0127)
Drinking water: Inside dwelling open well                     -0.1493***   (0.0502)
Indicator of normally paying for drinking water                0.1100***   (0.0116)
Open defecation: fields / open places or other                -0.1251***   (0.0120)
Toilet: Composting toilet                                     -0.0807***   (0.0178)
Constant                                                       8.1043***   (0.0180)

Observations (beta model)                                      24,618
R-sq. Beta Model                                               0.4706
R-sq. Alpha Model                                              0.0015
Sigma eta sq.                                                  0.0123
Ratio of sigma eta sq. over MSE                                0.0593
Variance of epsilon                                            0.1969

Standard errors in parentheses. * p<0.10, ** p<0.05, *** p<0.01.
Source: World Bank staff estimations based on HIES 2018–19 and PSLM 2019–20.

Table 5. Prediction performance of the national model vs. the provincial models

Level of     Model                   Obs.    Adj. R2  F-value  Predicted poverty rate  Actual poverty  Absolute    95% CI of the actual
estimation                                                                             rate/a          prediction  poverty rate/a
                                                                                                       error
National     National level model    24,618  0.4706   509.95   National/b       22.30  21.76           0.60        20.78  22.74
             Province level model            -        -        Provinces/c      22.37  21.76           0.61        20.78  22.74
KP           National level model            0.4706   509.95   National/d       30.04  29.52           0.52        26.83  32.20
             Province level model    4,485   0.3621   150.72   KP/e             29.03  29.52           0.50        26.83  32.20
Punjab       National level model            0.4706   509.95   National/d       16.56  16.23           0.33        15.25  17.21
             Province level model    11,781  0.4870   448.26   Punjab/e         16.21  16.23           0.02        15.25  17.21
Sindh        National level model            0.4706   509.95   National/d       26.17  24.10           2.07        22.40  25.78
             Province level model    6,216   0.5481   377.83   Sindh/e          25.75  24.10           1.65        22.40  25.78
Balochistan  National level model            0.4706   509.95   National/d       38.73  42.37           3.64        38.13  46.61
             Province level model    2,136   0.3413   123.89   Balochistan/e    42.50  42.37           0.13        38.13  46.61

Notes:
/a The actual poverty rate corresponds to the national poverty headcount estimated using the HIES survey of 2018-19 without the five districts not in PSLM. The confidence intervals were estimated considering the survey sample design (strata and primary sampling unit) of the HIES survey.
/b The national poverty rate is calculated as the weighted average of the district poverty estimates obtained from the cross-entropy reweighted sample.
/c Province models (one for each of the four provinces of Pakistan) are used to estimate the poverty headcount for the districts belonging to the corresponding province. The national poverty rate is calculated as the weighted average of the poverty headcount of all but five districts not sampled in PSLM, using weights after rescaling with the Census 2017.
/d The poverty rates at the province level are calculated as the weighted average of the poverty rates of the districts belonging to the corresponding province using the National model.
/e Province-level models (one for each of the four provinces in Pakistan) are used to estimate the poverty headcount for the districts belonging to the corresponding province. The poverty headcount at the province level is calculated as the weighted average of the poverty rates of the districts belonging to the corresponding province.

6. Main Findings

This section summarizes the main findings from the Pakistan poverty map, showing the poverty rates, the poverty gap, the poverty headcount, and the median per-adult-equivalent monthly consumption for each district.

6.1 Poverty Map for Pakistan

The findings from the Pakistan poverty map show significant spatial disparities in poverty rates across districts (Figure 5). The map shows a high degree of inter-district inequality, with districts with very low poverty rates, such as ICT (3.5 percent), as well as districts such as Tharparkar, where more than three-quarters of the population is below the poverty line (76.9 percent). The poorest districts are in Balochistan and Sindh, while the districts with the lowest poverty rates are in the big cities, such as Karachi (Sindh), Lahore (Punjab), ICT, and Abbottabad (KP). Sindh and KP both have districts with high and districts with low poverty rates, while, in general, poverty rates in Punjab are below 40 percent.

Figure 5. Pakistan Poverty Map 2019–20
Source: World Bank staff estimations based on HIES 2018-19 and PSLM 2019–20.

Figure 6 shows the relationship between poverty rates and poverty headcount or, in other words, the relationship between the share of poor people in a district and the total number of poor people in the district. The results show that none of the districts of Balochistan, the poorest province, is among the ten districts with the highest poverty headcount.
The poorest districts are in Balochistan, Sindh, and KP (cluster 1), while the districts with the three largest poverty headcounts are in Punjab (cluster 2), where each of them comprises more than 1 million poor (Muzaffargarh, Rahim Yar Khan, and Dera Ghazi Khan). Tharparkar is an extreme case, with the highest poverty rate and a large poverty headcount.

Figure 6. The district poverty rate and the poverty headcount (Cluster 1 and Cluster 2 are marked in the figure)
Source: World Bank staff estimations based on HIES 2018-19 and PSLM 2019–20.

An analysis of the poverty gap13 (Figure 7) suggests that, for almost half of the districts in Pakistan, a relatively small increase in welfare among the poor could reduce the poverty rate substantially, given that there are 69 districts with a poverty gap (FGT1) of less than 5 percent (0.05). For a district with a poverty gap of 5 percent and an urban poverty line of PKR 3,768, an adult-equivalent member of the poor would need to increase income, on average, by only PKR 188 to escape from poverty. In total, 115 out of 126 districts have a poverty gap of less than 10 percent.

13 The poverty gap is a measure of how poor the poor are. The poverty gap belongs to the family of Foster-Greer-Thorbecke (FGT) indices, with parameter $\alpha$ equal to one: $FGT_1 = \frac{1}{N}\sum_{i=1}^{q}\left(\frac{z - y_i}{z}\right)$, where $N$ is the population, $q$ is the number of poor, $z$ is the poverty line, and $y_i$ is the consumption of poor person $i$.

Figure 7. Pakistan Poverty Map 2019–20 - poverty gap
Source: World Bank staff estimations based on HIES 2018-19 and PSLM 2019–20.

Figure 8 shows the relationship between the poverty headcount and the per-adult-equivalent monthly consumption of the poor averaged by district population. At the time of the PSLM 2019–20 survey, the urban poverty line was PKR 3,768.46 per month, while the rural poverty line was PKR 3,741.23 per month. The y-axis measures poverty intensity because it indicates how far, on average, the poor were from consuming enough to escape from poverty. The lower the value on the y-axis, the more intense poverty is in the district.
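The poverty measures discussed above can be computed with a few lines of code. The sketch below is illustrative only: the consumption values and population weights are invented, and the function name is hypothetical. It computes the FGT index for a given parameter (0 for the poverty rate, 1 for the poverty gap) and reproduces the arithmetic behind the PKR 188 figure. Note that the product of the poverty gap and the poverty line is the shortfall averaged over the whole population; dividing it by the poverty rate gives the average shortfall among the poor, which is the quantity related to the y-axis of Figure 8.

```python
def fgt(consumption, weights, poverty_line, alpha):
    """Foster-Greer-Thorbecke index: alpha=0 is the poverty rate, alpha=1 the poverty gap."""
    total_weight = sum(weights)
    accumulated = 0.0
    for y, w in zip(consumption, weights):
        if y < poverty_line:
            accumulated += w * ((poverty_line - y) / poverty_line) ** alpha
    return accumulated / total_weight

# Hypothetical per-adult-equivalent monthly consumption (PKR) for six households and
# their population weights (household sampling weight times household size).
consumption = [2800.0, 3500.0, 3700.0, 4200.0, 5200.0, 8000.0]
weights = [5.0, 6.0, 4.0, 7.0, 8.0, 10.0]
urban_line = 3768.46  # urban poverty line quoted in the text

poverty_rate = fgt(consumption, weights, urban_line, alpha=0)
poverty_gap = fgt(consumption, weights, urban_line, alpha=1)

# Average shortfall per person in the whole population, and among the poor only.
shortfall_per_capita = poverty_gap * urban_line
shortfall_among_poor = shortfall_per_capita / poverty_rate

# Arithmetic behind the example in the text: a 5 percent gap and a line of PKR 3,768.
example_amount = round(0.05 * 3768)

print("poverty rate:", poverty_rate)
print("poverty gap:", round(poverty_gap, 4))
print("average shortfall among the poor (PKR):", round(shortfall_among_poor, 2))
print("0.05 x 3,768 rounds to PKR", example_amount)
```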
In Tharparkar, the average monthly per-adult-equivalent consumption is PKR 927 below the poverty line. Another way to interpret Figure 8 is that, if we multiply the difference between the poverty line and the value on the y-axis by the value on the x-axis, we obtain the total monthly amount in Pakistani rupees that would need to be transferred to the poor for them to escape from poverty. Figure 8 shows that, in general, poverty is more intense in Balochistan and less severe in Punjab, with Sindh and KP districts showing a wide variation in poverty intensity.

Figure 8. Poverty headcount per district and mean monthly per-adult-equivalent consumption of the poor per district (averaged by district population)
Source: World Bank staff estimations based on HIES 2018-19 and PSLM 2019–20.

Figure 9. Where the Poor live
Source: World Bank staff estimations based on HIES 2018-19 and PSLM 2019–20.
Note: The size of the bubble is proportional to the number of poor in the respective district.

Figure 9 shows that the districts with the largest numbers of poor people are concentrated in Punjab and Sindh. Even though Balochistan has higher poverty rates, its lower population density means that the districts with the largest poverty headcounts are elsewhere. The population density, poverty rate, and intensity of poverty all need to be considered in the design of policies for poverty alleviation.

A question for decision-makers is whether there are geographical patterns in poverty rates across districts. We calculated the spatial autocorrelation using inverse-distance weights and the Euclidean metric. The Moran Index (value of 0.40 with a p-value of zero) suggests that there are groups of poor or non-poor districts that do not follow a random pattern. The analysis identifies two clusters of districts, one of low poverty and one of high poverty. Interestingly, these clusters cross provincial boundaries, suggesting geographical effects beyond the political divisions.
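The global Moran Index with inverse-distance weights can be sketched as below. This is an illustration under stated assumptions, not the routine used for the paper: the coordinates and poverty rates are invented, the weights are plain inverse Euclidean distances without row standardization or a distance cutoff, and no significance test is computed. The index is (n / S0) times the weighted sum of cross-products of deviations from the mean, divided by the sum of squared deviations, where S0 is the sum of all weights.

```python
import math

def inverse_distance_weights(points):
    """Spatial weights w_ij = 1 / Euclidean distance between i and j, and 0 on the diagonal."""
    n = len(points)
    weights = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                dx = points[i][0] - points[j][0]
                dy = points[i][1] - points[j][1]
                weights[i][j] = 1.0 / math.hypot(dx, dy)
    return weights

def morans_i(values, weights):
    """Global Moran's I for the given values and spatial weight matrix."""
    n = len(values)
    mean = sum(values) / n
    deviations = [v - mean for v in values]
    s0 = sum(sum(row) for row in weights)
    cross = sum(
        weights[i][j] * deviations[i] * deviations[j]
        for i in range(n)
        for j in range(n)
    )
    return (n / s0) * cross / sum(d * d for d in deviations)

# Hypothetical district centroids: two groups of three neighbouring districts.
centroids = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (10.0, 0.0), (11.0, 0.0), (12.0, 0.0)]
w = inverse_distance_weights(centroids)

# Poverty rates that cluster in space versus rates that alternate between neighbours.
clustered_rates = [60.0, 65.0, 70.0, 10.0, 12.0, 15.0]
alternating_rates = [60.0, 10.0, 65.0, 12.0, 70.0, 15.0]

i_clustered = morans_i(clustered_rates, w)
i_alternating = morans_i(alternating_rates, w)
print("Moran's I, clustered rates:", round(i_clustered, 3))
print("Moran's I, alternating rates:", round(i_alternating, 3))
```

A positive value, as in the clustered case, indicates that neighbouring districts tend to have similar poverty rates, which is the pattern reported in the text.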
Figure 10 shows the low-poverty cluster in blue, which overlaps roughly with the rich Punjab plains in the provinces of Punjab and part of KP. In this group of low-poverty districts, the Buner district is an outlier of high poverty (in orange on the map). One high-poverty cluster is in red, in the south of KP and the central part of Balochistan. These high-poverty areas are roughly located in arid mountainous regions, each with a local outlier of lower poverty rates (in light blue). In the north of Sindh, there is a low-poverty district embedded in a high-poverty cluster. In contrast, central KP has a high-poverty district inside a low-poverty group of districts in Punjab and KP.

Figure 10. Pakistan Poverty Map 2019–20 - spatial autocorrelation
Source: World Bank staff estimations based on HIES 2018-19 and PSLM 2019–20.

When we compare the 2019–20 poverty map with the poverty maps of previous years, both the rank correlation and the simple correlation between the poverty rates are in the range of the interannual correlations between poverty rates from 2004–05 to 2014–15. In Figure 11, most districts are below the 45-degree line, consistent with an overall decline in poverty over these five years.14 Punjab had the lowest poverty rates but also showed the largest reductions in poverty, with some districts going from poverty rates of around 20 to 30 percent to below 10 percent. The highest poverty rates are in Balochistan, where some districts show large reductions and others large increases in poverty rates (above the 45-degree line). However, part of this volatility could result from the lower precision of the poverty estimation in this province. Finally, Sindh was between the low rates of Punjab and the high rates of Balochistan, but with a general decrease in poverty rates between 2014–15 and 2019–20, with Tharparkar and Badin as notable exceptions.
14 Figure 11 does not show the ex-FATA areas because they were not part of KP in the PSLM 2014–15; the 2019–20 poverty map is the first to include them after their incorporation into KP.

Figure 11. Comparing 2014–15 poverty per district with 2019–20
Source: World Bank staff estimations based on HIES 2018–19, PSLM 2019–20, HIES 2013–14, and PSLM 2014–15.

A complementary measure of household well-being is the Multidimensional Poverty Index (MPI), composed of sub-indicators on health, education, and living standards (Alkire and Suppa 2020). If a person is deprived in one-third or more of the 10 (weighted) indicators, then she is classified as MPI poor. The incidence indicator measures the percentage of MPI poor in each district. Figure 12 compares the district MPI incidence indicator for 2014–15 (the latest available year) and the district monetary poverty rate estimated using SAE. The correlation between the two indicators was 0.74, and the Spearman rank correlation was 0.79.

Figure 12. MPI incidence indicator 2014–15 and PSLM 2019–20 poverty rate
Source: World Bank staff estimations based on HIES 2018–19, PSLM 2019–20, and 2014–15.

6.2 Provincial Results

6.2.1 Khyber Pakhtunkhwa

The KP poverty map (Figure 13a) includes the former Federally Administered Tribal Areas (FATA) and the former Frontier Regions (FR) for the first time, with the former FATA now being the districts of South Waziristan, North Waziristan, Kurram, Orakzai, Khyber, Mohmand, and Bajaur. The former FRs were merged into their neighboring districts. South Waziristan, Mohmand, and Tank districts are much poorer than the rest of the province. In addition, Figure 13c shows the confidence intervals for the poverty rate; most of the ex-FATA districts have the highest poverty rates, with North Waziristan being the only exception.

Figure 13a. KP poverty rate estimates
Figure 13b. KP poverty gap
Figure 13c. KP poverty confidence intervals
Source: World Bank staff estimations based on HIES 2018-19 and PSLM 2019–20.

6.2.2 Punjab

Punjab is the province with the lowest poverty rates in Pakistan and the smallest differences in poverty rates between its districts (Figure 14a). However, there are four adjoining districts in the southwest of Punjab (Rajanpur, Dera Ghazi Khan, Rahim Yar Khan, and Muzaffargarh) where poverty rates are, on average, more than 20 percentage points higher than the average poverty rate of the province. Only these three districts have a poverty gap above 5 percent, making Punjab the province with the lowest poverty gap (Figure 14b). Figure 14c shows the confidence intervals for the Punjab poverty rate.

Figure 14a. Punjab poverty rate estimates
Figure 14b. Punjab poverty gap
Figure 14c. Punjab poverty confidence intervals
Source: World Bank staff estimations based on HIES 2018-19 and PSLM 2019–20.

6.2.3 Sindh

In Sindh, the south of the province is the poorest (Figure 15a) and the area with the largest poverty gap (Figure 15b), especially Tharparkar, Sujawal, and Badin. The district of Tharparkar is a particular case, with a poverty rate of 76.9 percent and a relatively high poverty gap of 19 percent. Sindh has much more variability in its poverty gap, with only Tharparkar and Badin having a poverty gap above 10 percent. Figure 15c shows the results by district with 95 percent confidence intervals.

Figure 15a. Sindh poverty rate estimates
Figure 15b. Sindh poverty gap
Figure 15c. Sindh poverty confidence intervals
Source: World Bank staff estimations based on HIES 2018-19 and PSLM 2019–20.

6.2.4 Balochistan

Figure 16a shows the poverty map for Balochistan, and Figure 16b shows the corresponding poverty gap map. Five districts were not sampled in the PSLM 2019–20, so their poverty rates cannot be estimated: Chagai, Panjgur, Jhal Magsi, Zhob, and Musakhel (Lehri, the sixth, was merged into Sibi).
Balochistan has the highest variability in both poverty rates and poverty gaps, with Khuzdar, Ziarat, and Shaheed Sikandarabad (Surab) having a combination of high poverty rates and high poverty gaps. Figure 16c shows the results by district with the 95 percent confidence intervals. Awaran has much lower precision estimates because there were no households from this district in the HIES 2018–19 sample that could be used to increase the precision of the distribution parameters used to generate the simulations. In general, sample sizes in Balochistan are smaller than those for other provinces, resulting in larger confidence intervals.

Figure 16a. Balochistan poverty rate estimates
Figure 16b. Balochistan poverty gap
Figure 16c. Balochistan poverty confidence intervals
Source: World Bank staff estimations based on HIES 2018-19 and PSLM 2019–20.

7. Conclusions

This paper documents the approach to preparing the Pakistan poverty map 2019–20. Using the SAE methodology, findings from the poverty map yield poverty rates, the poverty headcount, the poverty intensity, and the severity of poverty for 126 districts in Pakistan. The poverty map is based on data from two household surveys, namely the HIES 2018–19 and the PSLM 2019–20. In contrast to standard applications, this poverty map does not arrive at poverty estimates by predicting from the household survey (HIES 2018–19) into a population census but instead uses another household survey (PSLM 2019–20) as a target survey. For the first time, the Pakistan poverty map includes the Federally Administered Tribal Areas (FATA), which became part of KP, and the Frontier Regions (FR), which were merged with their adjacent districts. Finally, compared with previous applications, the Pakistan Poverty Map 2019–20 has a higher precision with smaller confidence intervals, partly because of the new Census EB methodology and partly because of the increased sample size of the PSLM 2019–20.
The main findings can be summarized as follows:
• Beyond the national poverty rate of 21.9 percent, poverty rates differ substantially across districts, ranging from 3.5 percent in the Islamabad Capital Territory (ICT) to 76.9 percent in Tharparkar.
• The districts with the highest poverty rates are in Sindh and Balochistan provinces; in contrast, districts in Punjab province report, on average, much lower poverty rates. Sindh and KP have both districts with high poverty rates and districts with low poverty rates.
• The poverty map shows two clusters in the country: one cluster of high poverty in the south of KP and the north of Balochistan, and one cluster of low poverty in the Punjab plains.
• While Balochistan has the poorest districts, the province is also sparsely populated; as such, the ten districts with the highest poverty headcounts are in Punjab, Sindh, and KP, and eight districts have more than 1 million poor.
• Balochistan is the province with the highest poverty gap, i.e., on average, its poor are farther from reaching a consumption level high enough to be non-poor. An extreme case is Tharparkar in Sindh, where the average monthly per-adult-equivalent consumption of the poor is PKR 927 below the national poverty line.
• Compared with the Pakistan 2014–15 poverty map, most Punjab, Sindh, and KP districts report declining poverty rates. However, poverty rates for some districts in Sindh and Balochistan increased.

References

Alkire, S., Kanagaratnam, U., and Suppa, N. 2020. “The global Multidimensional Poverty Index (MPI) 2020.” OPHI MPI Methodological Note 49. Oxford Poverty and Human Development Initiative, University of Oxford.
Battese, G. E., Harter, R. M., and Fuller, W. A. 1988. “An error-components model for predicting county crop areas using survey and satellite data.” Journal of the American Statistical Association 83 (401): 28–36.
Bedi, T., Coudouel, A., and Simler, K. 2007. More than a Pretty Picture: Using Poverty Maps to Design Better Policies and Interventions. Washington, DC: World Bank.
Correa, L., Molina, I., and Rao, J. N. K. 2012. “Comparison of methods for estimation of poverty indicators in small areas.” Unpublished report.
Corral, P., Molina, I., and Nguyen, M. C. 2021. “Pull your small area estimates up by the bootstraps.” Journal of Statistical Computation and Simulation 91 (16): 3304–3357.
Corral, P., Himelein, K., McGee, K., and Molina, I. 2021. “A Map of the Poor or a Poor Map?” Mathematics 9 (21): 2780.
Elbers, C., Lanjouw, J. O., and Lanjouw, P. 2002. “Micro-level estimation of welfare.” World Bank Policy Research Working Paper 2911.
Fay III, R. E., and Herriot, R. A. 1979. “Estimates of income for small places: An application of James-Stein procedures to census data.” Journal of the American Statistical Association 74 (366a): 269–277.
Foster, J., Greer, J., and Thorbecke, E. 1984. “A class of decomposable poverty measures.” Econometrica: Journal of the Econometric Society, 761–766.
González-Manteiga, W., Lombardía, M. J., Molina, I., Morales, D., and Santamaría, L. 2008. “Bootstrap mean squared error of a small-area EBLUP.” Journal of Statistical Computation and Simulation 78 (5): 443–462.
Marhuenda, Y., Molina, I., Morales, D., and Rao, J. 2017. “Poverty mapping in small areas under a twofold nested error regression model.” Journal of the Royal Statistical Society: Series A (Statistics in Society) 180 (4): 1111–1136.
Molina, I., and Rao, J. 2010. “Small area estimation of poverty indicators.” Canadian Journal of Statistics 38 (3): 369–385.
Molina, I., Corral, P., and Nguyen, M. 2022. “Estimation of poverty and inequality in small areas: review and discussion.” TEST, 1–24.
Nguyen, M. C., Corral, P., Azevedo, J. P., and Zhao, Q. 2018. “SAE: A Stata package for unit-level small area estimation.” World Bank Policy Research Working Paper 8630.
Seitz, William Hutchins. 2019. “Where They Live: District-level Measures of Poverty, Average Consumption, and the Middle Class in Central Asia.” World Bank Policy Research Working Paper 8940.
Van der Weide, R. 2014.
“GLS estimation and empirical Bayes prediction for linear mixed models with heteroskedasticity and sampling weights: A background study for the povmap project.” World Bank Policy Research Working Paper 7028.
Zhao, Q. 2006. User manual for povmap. World Bank.

Annexes

Annex 1: Tables of Results and Variables

A1. KP results

Province | District | Poverty Rate | Poverty Rate MSE | Poverty Gap | Poverty Gap MSE | Population PSLM 2019–20 | Poverty Headcount | Mean Monthly Per Adult-Equivalent Consumption | Mean Per Adult-Equivalent Consumption of the poor
KP | Bannu | 0.3911 | 0.001535 | 0.0692 | 0.000098 | 1,287,145 | 503,364 | 4,700 | 3,080
KP | Lakki Marwat | 0.4506 | 0.002087 | 0.0833 | 0.000129 | 959,509 | 432,389 | 4,365 | 3,052
KP | North Waziristan | 0.3480 | 0.001976 | 0.0593 | 0.000113 | 574,922 | 200,047 | 4,691 | 3,106
KP | Tank | 0.4137 | 0.002779 | 0.0788 | 0.000181 | 454,202 | 187,908 | 4,567 | 3,031
KP | South Waziristan | 0.5587 | 0.002242 | 0.1163 | 0.000158 | 718,155 | 401,245 | 3,950 | 2,963
KP | Dera Ismail Khan | 0.4012 | 0.001063 | 0.0717 | 0.000075 | 1,728,436 | 693,501 | 4,777 | 3,077
KP | Batagram | 0.1864 | 0.002055 | 0.0262 | 0.000093 | 507,068 | 94,528 | 5,630 | 3,214
KP | Haripur | 0.0854 | 0.000714 | 0.0104 | 0.000027 | 1,065,206 | 90,972 | 6,985 | 3,287
KP | Torghar | 0.2999 | 0.002582 | 0.0468 | 0.000128 | 182,246 | 54,657 | 4,780 | 3,158
KP | Abbottabad | 0.0697 | 0.000616 | 0.0081 | 0.000021 | 1,417,867 | 98,884 | 7,101 | 3,311
KP | Mansehra | 0.0884 | 0.000901 | 0.0107 | 0.000036 | 1,654,680 | 146,307 | 6,672 | 3,290
KP | Kohistan | 0.2192 | 0.001392 | 0.0305 | 0.000084 | 834,615 | 182,944 | 7,004 | 3,220
KP | Kohat | 0.2870 | 0.001248 | 0.0454 | 0.000061 | 1,181,937 | 339,171 | 5,085 | 3,155
KP | Orakzai | 0.4693 | 0.003383 | 0.0852 | 0.000175 | 270,475 | 126,937 | 4,136 | 3,062
KP | Kurram | 0.3201 | 0.001498 | 0.0573 | 0.000088 | 654,507 | 209,509 | 5,133 | 3,074
KP | Karak | 0.3585 | 0.002113 | 0.0630 | 0.000113 | 750,219 | 268,984 | 4,794 | 3,083
KP | Hangu | 0.3133 | 0.002804 | 0.0503 | 0.000135 | 551,805 | 172,885 | 4,805 | 3,146
KP | Malakand Protected Area | 0.1901 | 0.001491 | 0.0277 | 0.000065 | 763,455 | 145,155 | 5,834 | 3,199
KP | Chitral | 0.2721 | 0.001978 | 0.0410 | 0.000095 | 476,092 | 129,546 | 5,146 | 3,180
KP | Bajaur | 0.4020 | 0.001627 | 0.0684 | 0.000111 | 1,160,368 | 466,420 | 4,433 | 3,105
KP | Buner | 0.4989 | 0.001681 | 0.0963 | 0.000088 | 952,407 | 475,141 | 4,226 | 3,019
KP | Shangla | 0.2696 | 0.002319 | 0.0411 | 0.000110 | 807,916 | 217,814 | 5,025 | 3,170
KP | Swat | 0.2853 | 0.000861 | 0.0464 | 0.000041 | 2,455,441 | 700,455 | 5,121 | 3,139
KP | Lower Dir | 0.2810 | 0.000723 | 0.0448 | 0.000045 | 1,527,410 | 429,156 | 5,126 | 3,146
KP | Upper Dir | 0.3155 | 0.001167 | 0.0511 | 0.000065 | 1,007,651 | 317,891 | 5,099 | 3,136
KP | Swabi | 0.2667 | 0.000629 | 0.0415 | 0.000032 | 1,728,849 | 461,026 | 5,242 | 3,164
KP | Mardan | 0.3179 | 0.000672 | 0.0520 | 0.000037 | 2,524,335 | 802,511 | 4,984 | 3,133
KP | Khyber | 0.5198 | 0.001384 | 0.1032 | 0.000109 | 1,046,839 | 544,101 | 4,812 | 3,006
KP | Peshawar | 0.2587 | 0.000296 | 0.0426 | 0.000018 | 4,607,450 | 1,192,087 | 5,993 | 3,135
KP | Mohmand | 0.6564 | 0.003082 | 0.1480 | 0.000220 | 504,511 | 331,136 | 3,681 | 2,898
KP | Nowshera | 0.2104 | 0.000757 | 0.0303 | 0.000040 | 1,617,723 | 340,323 | 5,528 | 3,207
KP | Charsadda | 0.3287 | 0.000785 | 0.0543 | 0.000049 | 1,713,409 | 563,153 | 4,994 | 3,126
Source: World Bank staff estimations based on HIES 2018–19, PSLM 2019–20, and Census 2017

A2. Punjab results

Province | District | Poverty Rate | Poverty Rate MSE | Poverty Gap | Poverty Gap MSE | Population PSLM 2019–20 | Poverty Headcount | Mean Monthly Per Adult-Equivalent Consumption | Mean Per Adult-Equivalent Consumption of the poor
Punjab | Bahawalpur | 0.2215 | 0.000347 | 0.0328 | 0.000020 | 3,902,518 | 864,257 | 5,616 | 3,195
Punjab | Bahawalnagar | 0.2414 | 0.000396 | 0.0373 | 0.000025 | 3,164,893 | 764,066 | 5,419 | 3,169
Punjab | Rahim Yar Khan | 0.3132 | 0.000440 | 0.0511 | 0.000029 | 5,113,512 | 1,601,774 | 5,178 | 3,136
Punjab | Dera Ghazi Khan | 0.3925 | 0.000586 | 0.0724 | 0.000042 | 3,128,179 | 1,227,745 | 4,663 | 3,056
Punjab | Muzaffargarh | 0.3477 | 0.000457 | 0.0589 | 0.000034 | 4,603,823 | 1,600,732 | 5,177 | 3,112
Punjab | Rajanpur | 0.4587 | 0.000887 | 0.0889 | 0.000075 | 2,122,977 | 973,799 | 4,287 | 3,021
Punjab | Layyah | 0.2060 | 0.000869 | 0.0309 | 0.000046 | 1,939,992 | 399,623 | 5,639 | 3,183
Punjab | Faisalabad | 0.1212 | 0.000240 | 0.0158 | 0.000013 | 8,383,728 | 1,016,395 | 6,604 | 3,266
Punjab | Toba Tek Singh | 0.1683 | 0.000472 | 0.0237 | 0.000022 | 2,330,863 | 392,387 | 5,918 | 3,219
Punjab | Jhang | 0.1651 | 0.000486 | 0.0229 | 0.000028 | 2,917,051 | 481,716 | 5,943 | 3,227
Punjab | Chiniot | 0.2253 | 0.000843 | 0.0340 | 0.000044 | 1,455,699 | 327,995 | 5,344 | 3,183
Punjab | Islamabad | 0.0346 | 0.000203 | 0.0038 | 0.000006 | 2,130,772 | 73,634 | 11,487 | 3,340
Punjab | Hafizabad | 0.2033 | 0.000753 | 0.0307 | 0.000032 | 1,230,531 | 250,170 | 5,771 | 3,186
Punjab | Sialkot | 0.1817 | 0.000314 | 0.0252 | 0.000013 | 4,142,637 | 752,648 | 5,823 | 3,229
Punjab | Narowal | 0.2566 | 0.000429 | 0.0384 | 0.000017 | 1,816,168 | 466,036 | 5,106 | 3,184
Punjab | Mandi Bahauddin | 0.1674 | 0.000464 | 0.0246 | 0.000021 | 1,695,412 | 283,784 | 6,279 | 3,198
Punjab | Gujrat | 0.1348 | 0.000276 | 0.0181 | 0.000010 | 2,931,575 | 395,220 | 6,454 | 3,245
Punjab | Gujranwala | 0.1733 | 0.000231 | 0.0242 | 0.000011 | 5,329,745 | 923,768 | 6,325 | 3,232
Punjab | Sheikhupura | 0.1035 | 0.000432 | 0.0129 | 0.000021 | 3,680,043 | 380,826 | 6,749 | 3,285
Punjab | Nankana Sahib | 0.0839 | 0.000652 | 0.0101 | 0.000028 | 1,441,156 | 120,970 | 6,853 | 3,297
Punjab | Lahore | 0.0382 | 0.000143 | 0.0042 | 0.000006 | 11,827,162 | 451,289 | 9,705 | 3,353
Punjab | Kasur | 0.1020 | 0.000405 | 0.0125 | 0.000020 | 3,674,594 | 374,867 | 6,526 | 3,288
Punjab | Multan | 0.2125 | 0.000340 | 0.0319 | 0.000021 | 5,047,998 | 1,072,923 | 35,473 | 3,190
Punjab | Lodhran | 0.2655 | 0.000849 | 0.0407 | 0.000041 | 1,807,785 | 480,045 | 5,067 | 3,170
Punjab | Vehari | 0.2064 | 0.000556 | 0.0298 | 0.000029 | 3,086,639 | 636,960 | 5,526 | 3,205
Punjab | Khanewal | 0.2093 | 0.000489 | 0.0313 | 0.000029 | 3,105,945 | 650,228 | 5,488 | 3,187
Punjab | Rawalpindi | 0.0673 | 0.000164 | 0.0079 | 0.000006 | 5,745,945 | 386,657 | 7,997 | 3,313
Punjab | Attock | 0.1315 | 0.000507 | 0.0169 | 0.000020 | 2,006,342 | 263,933 | 6,088 | 3,267
Punjab | Jhelum | 0.1071 | 0.000458 | 0.0136 | 0.000016 | 1,300,142 | 139,224 | 6,858 | 3,273
Punjab | Chakwal | 0.0656 | 0.000346 | 0.0076 | 0.000012 | 1,590,567 | 104,403 | 7,336 | 3,311
Punjab | Sahiwal | 0.0922 | 0.000433 | 0.0113 | 0.000020 | 2,672,826 | 246,545 | 7,112 | 3,287
Punjab | Okara | 0.0912 | 0.000353 | 0.0113 | 0.000015 | 3,234,207 | 294,956 | 7,014 | 3,285
Punjab | Pakpattan | 0.1078 | 0.000542 | 0.0136 | 0.000026 | 1,940,240 | 209,208 | 6,511 | 3,275
Punjab | Bhakkar | 0.2306 | 0.000729 | 0.0347 | 0.000040 | 1,752,647 | 404,203 | 5,276 | 3,184
Punjab | Khushab | 0.1168 | 0.000715 | 0.0153 | 0.000036 | 1,361,797 | 159,092 | 6,593 | 3,259
Punjab | Mianwali | 0.1120 | 0.000658 | 0.0150 | 0.000036 | 1,640,703 | 183,677 | 6,666 | 3,247
Punjab | Sargodha | 0.0977 | 0.000332 | 0.0124 | 0.000020 | 3,931,273 | 384,065 | 7,117 | 3,273
Source: World Bank staff estimations based on HIES 2018–19, PSLM 2019–20, and Census 2017

A3. Sindh results

Province | District | Poverty Rate | Poverty Rate MSE | Poverty Gap | Poverty Gap MSE | Population PSLM 2019–20 | Poverty Headcount | Mean Monthly Per Adult-Equivalent Consumption | Mean Per Adult-Equivalent Consumption of the poor
Sindh | Sujawal | 0.4922 | 0.001277 | 0.0960 | 0.000115 | 828,607 | 407,825 | 4,181 | 3,014
Sindh | Badin | 0.5281 | 0.000801 | 0.1100 | 0.000085 | 1,919,744 | 1,013,740 | 4,121 | 2,967
Sindh | Jamshoro | 0.2911 | 0.000909 | 0.0560 | 0.000069 | 1,057,116 | 307,719 | 5,507 | 3,033
Sindh | Tando Allahyar | 0.3912 | 0.001182 | 0.0725 | 0.000076 | 891,853 | 348,867 | 4,657 | 3,055
Sindh | Thatta | 0.4147 | 0.001159 | 0.0755 | 0.000101 | 1,044,597 | 433,164 | 4,456 | 3,065
Sindh | Dadu | 0.3160 | 0.000910 | 0.0523 | 0.000054 | 1,648,987 | 521,140 | 4,922 | 3,129
Sindh | Matiari | 0.3711 | 0.001147 | 0.0682 | 0.000072 | 819,011 | 303,969 | 4,863 | 3,060
Sindh | Hyderabad | 0.1433 | 0.000292 | 0.0211 | 0.000013 | 2,339,833 | 335,381 | 6,702 | 3,211
Sindh | Tando Muhammad Khan | 0.4757 | 0.001186 | 0.0921 | 0.000106 | 720,158 | 342,614 | 4,316 | 3,023
Sindh | Karachi Central | 0.0598 | 0.000186 | 0.0069 | 0.000006 | 3,160,347 | 189,079 | 8,388 | 3,334
Sindh | Malir | 0.1892 | 0.000469 | 0.0271 | 0.000019 | 2,046,725 | 387,287 | 5,707 | 3,220
Sindh | Karachi West | 0.1493 | 0.000321 | 0.0199 | 0.000012 | 4,155,535 | 620,616 | 5,892 | 3,266
Sindh | Karachi East | 0.0610 | 0.000157 | 0.0075 | 0.000005 | 3,058,171 | 186,604 | 9,548 | 3,303
Sindh | Korangi | 0.0629 | 0.000301 | 0.0071 | 0.000009 | 2,741,476 | 172,539 | 7,344 | 3,346
Sindh | Karachi South | 0.0842 | 0.000310 | 0.0099 | 0.000009 | 1,881,744 | 158,406 | 7,478 | 3,323
Sindh | Shikarpur | 0.2968 | 0.000865 | 0.0495 | 0.000053 | 1,312,221 | 389,402 | 5,305 | 3,123
Sindh | Kashmor | 0.2485 | 0.001088 | 0.0380 | 0.000065 | 1,159,676 | 288,223 | 5,545 | 3,173
Sindh | Larkana | 0.2405 | 0.000804 | 0.0367 | 0.000043 | 1,618,564 | 389,205 | 5,293 | 3,181
Sindh | Kambar Shahdad Kot | 0.2214 | 0.000732 | 0.0330 | 0.000044 | 1,423,127 | 315,092 | 5,449 | 3,190
Sindh | Jacobabad | 0.3122 | 0.001202 | 0.0514 | 0.000072 | 1,071,050 | 334,341 | 5,086 | 3,134
Sindh | Umer Kot | 0.4345 | 0.001306 | 0.0816 | 0.000128 | 1,141,736 | 496,119 | 4,636 | 3,045
Sindh | Mirpur Khas | 0.3969 | 0.000827 | 0.0736 | 0.000072 | 1,600,115 | 635,142 | 4,877 | 3,053
Sindh | Tharparkar | 0.7685 | 0.001245 | 0.1903 | 0.000134 | 1,751,779 | 1,346,296 | 3,255 | 2,817
Sindh | Naushahro Feroze | 0.2213 | 0.000789 | 0.0330 | 0.000045 | 1,714,565 | 379,403 | 5,450 | 3,188
Sindh | Sanghar | 0.3647 | 0.000708 | 0.0655 | 0.000051 | 2,180,235 | 795,201 | 4,857 | 3,078
Sindh | Shaheed Benazirabad | 0.4140 | 0.000784 | 0.0753 | 0.000054 | 1,716,117 | 710,538 | 4,508 | 3,068
Sindh | Ghotki | 0.2771 | 0.000685 | 0.0448 | 0.000047 | 1,753,558 | 485,968 | 5,348 | 3,142
Sindh | Sukkur | 0.2240 | 0.000651 | 0.0360 | 0.000038 | 1,583,025 | 354,673 | 6,069 | 3,151
Sindh | Khairpur | 0.2624 | 0.000518 | 0.0411 | 0.000037 | 2,558,148 | 671,174 | 5,413 | 3,163
Source: World Bank staff estimations based on HIES 2018–19, PSLM 2019–20, and Census 2017

A4. Results in Balochistan

Province | District | Poverty Rate | Poverty Rate MSE | Poverty Gap | Poverty Gap MSE | Population PSLM 2019–20 | Poverty Headcount | Mean Monthly Per Adult-Equivalent Consumption | Mean Per Adult-Equivalent Consumption of the poor
Balochistan | Washuk | 0.3487 | 0.002524 | 0.0590 | 0.000188 | 186,886 | 65,177 | 4,742 | 3,113
Balochistan | Kalat | 0.5953 | 0.002526 | 0.1241 | 0.000175 | 224,632 | 133,721 | 3,747 | 2,966
Balochistan | Awaran | 0.5127 | 0.009600 | 0.0966 | 0.000771 | 129,568 | 66,425 | 4,004 | 3,037
Balochistan | Lasbela | 0.3968 | 0.001780 | 0.0673 | 0.000103 | 612,919 | 243,199 | 4,381 | 3,118
Balochistan | Mastung | 0.4198 | 0.001723 | 0.0767 | 0.000108 | 282,572 | 118,630 | 4,495 | 3,058
Balochistan | Khuzdar | 0.6940 | 0.001048 | 0.1521 | 0.000115 | 849,702 | 589,720 | 3,912 | 2,928
Balochistan | Kharan | 0.4560 | 0.003565 | 0.0843 | 0.000226 | 173,117 | 78,941 | 4,275 | 3,050
Balochistan | Surab | 0.5989 | 0.003208 | 0.1201 | 0.000250 | 213,631 | 127,939 | 3,717 | 2,991
Balochistan | Gwadar | 0.2501 | 0.002315 | 0.0394 | 0.000111 | 278,931 | 69,753 | 5,282 | 3,164
Balochistan | Kech | 0.2307 | 0.001832 | 0.0341 | 0.000109 | 964,874 | 222,560 | 6,502 | 3,199
Balochistan | Kachhi | 0.4891 | 0.004061 | 0.0936 | 0.000287 | 329,642 | 161,240 | 4,158 | 3,025
Balochistan | Jaffarabad | 0.3326 | 0.002088 | 0.0529 | 0.000125 | 546,658 | 181,804 | 4,656 | 3,153
Balochistan | Sohbatpur | 0.2490 | 0.004110 | 0.0365 | 0.000219 | 213,172 | 53,086 | 5,065 | 3,193
Balochistan | Nasirabad | 0.4884 | 0.001981 | 0.0936 | 0.000183 | 518,872 | 253,408 | 4,134 | 3,029
Balochistan | Pishin | 0.3191 | 0.001153 | 0.0532 | 0.000059 | 783,766 | 250,068 | 4,864 | 3,122
Balochistan | Quetta | 0.2325 | 0.000572 | 0.0368 | 0.000036 | 2,413,800 | 561,136 | 15,124 | 3,161
Balochistan | Killa Abdullah | 0.5179 | 0.002281 | 0.0983 | 0.000180 | 806,582 | 417,761 | 3,983 | 3,036
Balochistan | Nushki | 0.3112 | 0.003857 | 0.0494 | 0.000204 | 190,327 | 59,235 | 4,869 | 3,154
Balochistan | Ziarat | 0.6456 | 0.003160 | 0.1400 | 0.000252 | 170,276 | 109,924 | 3,602 | 2,930
Balochistan | Sibi | 0.4507 | 0.002047 | 0.0885 | 0.000149 | 191,182 | 86,167 | 4,634 | 3,012
Balochistan | Dera Bugti | 0.4769 | 0.003464 | 0.0857 | 0.000281 | 333,022 | 158,804 | 4,109 | 3,076
Balochistan | Harnai | 0.4742 | 0.004015 | 0.0876 | 0.000263 | 103,224 | 48,944 | 4,145 | 3,059
Balochistan | Kohlu | 0.2269 | 0.002327 | 0.0341 | 0.000085 | 227,538 | 51,620 | 5,464 | 3,178
Balochistan | Loralai | 0.3259 | 0.002734 | 0.0532 | 0.000147 | 259,992 | 84,734 | 4,772 | 3,137
Balochistan | Killa Saifullah | 0.4349 | 0.002437 | 0.0774 | 0.000204 | 364,741 | 158,616 | 4,274 | 3,080
Balochistan | Duki | 0.3619 | 0.004617 | 0.0600 | 0.000269 | 162,706 | 58,880 | 4,497 | 3,121
Balochistan | Sherani | 0.5471 | 0.005610 | 0.1052 | 0.000585 | 162,679 | 89,008 | 3,869 | 3,022
Balochistan | Barkhan | 0.5463 | 0.003920 | 0.1065 | 0.000264 | 181,901 | 99,376 | 3,924 | 3,012
Source: World Bank staff estimations based on HIES 2018–19, PSLM 2019–20, and Census 2017

A5.
KP: variable comparison of HIES 2018–19 and PSLM 2019–20 Order Variable Label Variable Name HIES mean HIES sd PSLM mean PSLM sd GLS model Alpha model 1 Language is Urdu L_urdu 0.125 0.331 0.128 0.334 0 0 2 Language is Punjabi L_punjabi 0.000 0.012 0.000 0.019 0 0 3 Language is Sindhi L_sindhi 0.000 0.000 0.001 0.034 0 0 4 Language is Hindko L_hindko 0.069 0.253 0.081 0.273 0 0 5 Had to skip a meal because there was not enough money or other resources to get T_4 0.063 0.242 0.058 0.234 0 0 6 Ate less than you thought you should because of a lack of money or other resourc T_5 0.157 0.364 0.122 0.327 0 0 7 Occupation: Subsidized rent H_occup_rentsubs 0.015 0.123 0.010 0.101 0 0 8 Dwelling type: Independent house/compound H_house_independ 0.800 0.400 0.834 0.372 0 0 9 Dwelling type: Part of the large unit H_house_poflunit 0.179 0.383 0.149 0.356 1 0 10 Floor: earth/sand H_floor_earth 0.556 0.497 0.572 0.495 0 0 11 Floor: cement/ cement tiles H_floor_cement 0.367 0.482 0.326 0.469 1 0 12 Roof: RCC/RBC H_roof_rccrbc 0.363 0.481 0.351 0.477 0 1 13 Roof: Wood/Bamboo H_roof_wood 0.369 0.483 0.353 0.478 1 0 14 Roof: Metal/Tin/Girders/T-Iron H_roof_grader 0.170 0.375 0.208 0.406 0 0 15 Roof: Other H_roof_other 0.002 0.050 0.002 0.048 0 0 16 Walls: Burned bricks/block H_walls_burntbricks 0.629 0.483 0.628 0.483 1 0 17 Walls: Raw bricks/mud H_walls_mudbricks 0.203 0.402 0.197 0.398 0 0 18 Walls: Stone H_walls_stones 0.162 0.369 0.164 0.370 0 0 19 Lighting: Electricity H_lighting_elect 0.870 0.336 0.861 0.346 0 0 20 Lighting: Solar H_lighting_solar 0.113 0.317 0.115 0.318 0 0 21 Drinking water: Inside dwelling hand pump W_dkw_inshandpump 0.110 0.313 0.120 0.325 1 0 22 Drinking water: Inside dwelling closed well W_dkw_insclosedwell 0.044 0.205 0.033 0.179 0 0 23 Drinking water: Inside dwelling open well W_dkw_insopenwell 0.025 0.156 0.029 0.169 1 0 24 Drinking water: Outside dwelling hand pump W_dkw_outhandpump 0.013 0.115 0.012 0.107 0 0 25 Drinking water: Outside dwelling 
closed well W_dkw_outclosedwell 0.016 0.127 0.012 0.110 0 0 26 Drinking water: Outside dwelling un protected spring W_dkw_outunprsprng 0.059 0.236 0.062 0.240 0 0 27 Drinking water: pond/canal / river W_dkw_pond 0.037 0.190 0.033 0.180 0 0 28 Drinking water: bottled water W_dkw_bottwater 0.000 0.014 0.000 0.022 0 0 29 Drinking water: tanker /truck/water bearer W_dkw_tanker 0.012 0.110 0.012 0.108 0 0 30 Drinking water: hand pump W_dkw_handpump 0.124 0.329 0.132 0.338 0 0 31 Drinking water: closed well W_dkw_closedwell 0.060 0.238 0.045 0.208 0 0 32 Drinking water: un protected spring W_dkw_unprsprng 0.103 0.304 0.086 0.281 0 0 33 Drinking water: no water delivery system W_dkw_nodelivery 0.112 0.316 0.115 0.319 0 0 34 Drinking water: safe to drink W_dkw_safe 0.798 0.402 0.813 0.390 0 0 35 Who installed the water delivery system: ngo, private W_whoinspriva 0.037 0.188 0.052 0.222 0 0 36 How much time is consumed (in minutes) on a round trip to fetch the drinking wat W_time 3.936 9.986 3.868 9.273 0 0 37 Doing anything to the water to make it safer: Don't know W_saferdk 0.015 0.121 0.009 0.093 0 0 38 Indicator of normaly paying for drinking water W_drinkingpay 0.147 0.354 0.141 0.348 1 0 39 Toilet: No Toilet W_toilet_notoilet 0.079 0.270 0.109 0.312 0 0 40 Toilet: Composting toilet W_toilet_pitlat 0.035 0.184 0.028 0.165 1 0 41 Open defecation: fields / open places or other W_opendefecation 0.077 0.267 0.108 0.311 1 0 42 Drainage: no, no system W_drainage_nosystem 0.492 0.500 0.465 0.499 0 0 43 Cooking water: hand pump W_ckg_handpump 0.124 0.330 0.133 0.339 0 0 44 Cooking water: closed well W_ckg_closedwell 0.058 0.234 0.045 0.208 0 0 45 Cooking water: un protected spring W_ckg_unprsprng 0.106 0.308 0.086 0.280 0 0 46 Cooking water: pond/canal / river W_ckg_pond 0.037 0.189 0.033 0.180 0 0 47 Cooking water: bottled water W_ckg_bottwater 0.000 0.000 0.000 0.012 0 0 48 Cooking water: tanker /truck/water bearer W_ckg_tanker 0.012 0.109 0.011 0.105 0 0 49 Cooking water: 
filtration plant W_ckg_filtration 0.001 0.037 0.001 0.037 0 0 50 Handwashing water: hand pump W_handw_handpump 0.126 0.332 0.134 0.340 0 0 51 Handwashing water: closed well W_handw_closedwell 0.058 0.234 0.045 0.207 0 0 52 Handwashing water: un protected spring W_handw_unprsprng 0.104 0.305 0.084 0.277 0 0 53 Handwashing water: pond/canal / river W_handw_pond 0.039 0.193 0.032 0.177 0 0 54 Handwashing water: tanker /truck/water bearer W_handw_tanker 0.010 0.098 0.011 0.105 0 0 55 Handwashing water: other W_handw_other 0.005 0.068 0.009 0.094 0 0 56 Number of radio D_nradio 0.120 0.329 0.130 0.338 0 0 57 Number of dryer D_ndryer 0.080 0.279 0.093 0.302 1 0 58 Number of cooking range D_ncookingrange 0.013 0.113 0.013 0.119 0 0 59 Number of microwave D_nmicrowave 0.037 0.194 0.034 0.192 0 0 60 Number of watetr filter D_nwaterfilter 0.005 0.070 0.009 0.098 0 0 61 Number of ups D_nups 0.122 0.336 0.122 0.346 1 0 62 Number of heater D_nheater 0.124 0.475 0.112 0.398 0 0 63 Number of geaser D_ngeaser 0.086 0.320 0.090 0.311 0 0 64 Number of richshaw/chingchi D_nrichshaw 0.021 0.148 0.026 0.168 1 0 65 Number of car D_ncar 0.068 0.267 0.075 0.281 1 0 66 Number of van/truck/bus D_nvantruckbus 0.021 0.153 0.019 0.145 0 0 67 Number of boat D_nboat 0.000 0.012 0.001 0.030 0 0 68 Number of clock D_nclock 0.899 0.961 1.035 0.933 1 0 69 Indicator of radio D_iradio 0.119 0.324 0.130 0.336 0 0 70 Indicator of washing D_iwashing 0.585 0.493 0.594 0.491 0 0 71 Indicator of dryer D_idryer 0.078 0.269 0.090 0.286 0 0 72 Indicator of air conditoner D_iairconditioning 0.045 0.206 0.042 0.201 1 0 73 Indicator of fan D_ifan 0.878 0.327 0.914 0.280 0 0 74 Indicator of cooking range D_icookingrange 0.013 0.111 0.013 0.114 1 0 75 Indicator of microwave D_imicrowave 0.036 0.187 0.033 0.178 1 0 76 Indicator of watetr filter D_iwaterfilter 0.005 0.070 0.009 0.096 0 0 77 Indicator of ups D_iups 0.120 0.325 0.119 0.323 0 0 78 Indicator of solar panel D_isolarpanel 0.399 0.490 0.407 0.491 0 0 79 
Indicator of heater D_iheater 0.085 0.279 0.094 0.291 1 0 80 Indicator of geaser D_igeaser 0.078 0.268 0.085 0.278 0 0 81 Indicator of richshaw/chingchi D_irichshaw 0.020 0.141 0.025 0.157 0 0 82 Indicator of car D_icar 0.065 0.246 0.071 0.257 0 0 83 Indicator of van/truck/bus D_ivantruckbus 0.020 0.140 0.018 0.131 0 0 84 Indicator of boat D_iboat 0.000 0.012 0.000 0.021 0 0 85 There are no info about durables D_nodurables 0.002 0.047 0.004 0.061 0 0 86 Household head has no education information C_noeducationinfo 0.000 0.019 0.000 0.000 0 0 87 Household head has no employment information C_noemploymentinfo 0.000 0.019 0.000 0.000 0 0 88 Household head is female C_female 0.155 0.362 0.129 0.335 1 0 89 Household head is widower C_widow 0.056 0.229 0.045 0.206 0 0 90 Household head is minor C_minor 0.002 0.043 0.001 0.036 0 0 91 Household head can read but not write C_canreadbnotwrite 0.004 0.061 0.008 0.088 0 0 92 Household head never attended C_neverattended 0.482 0.500 0.500 0.500 1 0 93 Household head attended school in the past C_attendedpast 0.511 0.500 0.496 0.500 0 0 94 Household head currently attending school C_currentattending 0.006 0.077 0.004 0.063 0 0 95 Household head studied in a government institute C_schoolgov 0.489 0.500 0.474 0.499 0 0 96 Household head completed basic education C_basiceducation 0.135 0.341 0.135 0.342 1 0 97 Household head completed intermediate education C_intermededucation 0.271 0.445 0.261 0.439 0 0 98 Household head completed tertiary education C_highesteducation 0.038 0.191 0.039 0.194 1 0 99 Household head occupation: manager or professional C_occmangprof 0.068 0.252 0.070 0.254 1 0 100 Household head occupation: technician or services C_occtecserv 0.176 0.381 0.183 0.387 1 0 101 Household head occupation: skilled worker C_occskilled 0.176 0.381 0.147 0.354 1 0 102 Household head occupation: plant/machinery operators or craft/trades C_occplntcrft 0.161 0.368 0.161 0.368 0 0 103 Household head activity: services C_actserv 
0.338 0.473 0.349 0.477 0 0 104 Household head has a pension C_pension 0.073 0.260 0.070 0.256 1 0 Source: World Bank staff estimations based on HIES 2018–19, PSLM 2019–20, and Census 2017 35 A6. Punjab: variable comparison HIES 2018–19 and PSLM 2019–20 Order Variable Label Variable Name HIES mean HIES sd PSLM mean PSLM sd GLS model Alpha model 1 Language is Urdu L_urdu 0.288 0.453 0.229 0.420 0 0 2 Language is Punjabi L_punjabi 0.555 0.497 0.567 0.496 0 0 3 Language is Sindhi L_sindhi 0.000 0.013 0.001 0.037 0 0 4 Language is Hindko L_hindko 0.000 0.013 0.000 0.008 0 0 5 Had to skip a meal because there was not enough money or other resources to get T_4 0.113 0.317 0.095 0.293 0 0 6 Ate less than you thought you should because of a lack of money or other resourc T_5 0.121 0.326 0.131 0.338 0 0 7 Occupation: Subsidized rent H_occup_rentsubs 0.010 0.102 0.006 0.077 0 0 8 Dwelling type: Independent house/compound H_house_independ 0.764 0.425 0.815 0.388 0 0 9 Dwelling type: Part of the large unit H_house_poflunit 0.171 0.377 0.127 0.333 1 0 10 Floor: earth/sand H_floor_earth 0.239 0.426 0.263 0.440 0 0 11 Floor: cement/ cement tiles H_floor_cement 0.444 0.497 0.415 0.493 1 0 12 Roof: RCC/RBC H_roof_rccrbc 0.349 0.477 0.356 0.479 0 1 13 Roof: Wood/Bamboo H_roof_wood 0.131 0.337 0.126 0.332 1 0 14 Roof: Metal/Tin/Girders/T-Iron H_roof_grader 0.495 0.500 0.500 0.500 0 0 15 Roof: Other H_roof_other 0.012 0.110 0.005 0.067 0 0 16 Walls: Burned bricks/block H_walls_burntbricks 0.933 0.251 0.917 0.276 1 0 17 Walls: Raw bricks/mud H_walls_mudbricks 0.060 0.237 0.071 0.256 0 0 18 Walls: Stone H_walls_stones 0.003 0.056 0.006 0.076 0 0 19 Lighting: Electricity H_lighting_elect 0.955 0.208 0.957 0.203 0 0 20 Lighting: Solar H_lighting_solar 0.019 0.135 0.022 0.146 0 0 21 Drinking water: Inside dwelling hand pump W_dkw_inshandpump 0.174 0.379 0.164 0.370 1 0 22 Drinking water: Inside dwelling closed well W_dkw_insclosedwell 0.002 0.041 0.004 0.066 0 0 23 Drinking water: Inside 
dwelling open well W_dkw_insopenwell 0.000 0.017 0.001 0.035 1 0 24 Drinking water: Outside dwelling hand pump W_dkw_outhandpump 0.055 0.227 0.055 0.228 0 0 25 Drinking water: Outside dwelling closed well W_dkw_outclosedwell 0.001 0.035 0.002 0.043 0 0 26 Drinking water: Outside dwelling un protected spring W_dkw_outunprsprng 0.001 0.036 0.001 0.038 0 0 27 Drinking water: pond/canal / river W_dkw_pond 0.003 0.052 0.006 0.074 0 0 28 Drinking water: bottled water W_dkw_bottwater 0.008 0.088 0.010 0.101 0 0 29 Drinking water: tanker /truck/water bearer W_dkw_tanker 0.037 0.190 0.030 0.171 0 0 30 Drinking water: hand pump W_dkw_handpump 0.228 0.420 0.219 0.414 0 0 31 Drinking water: closed well W_dkw_closedwell 0.003 0.054 0.006 0.079 0 0 32 Drinking water: un protected spring W_dkw_unprsprng 0.002 0.041 0.002 0.040 0 0 33 Drinking water: no water delivery system W_dkw_nodelivery 0.051 0.220 0.051 0.220 0 0 34 Drinking water: safe to drink W_dkw_safe 0.955 0.207 0.956 0.205 0 0 35 Who installed the water delivery system: ngo, private W_whoinspriva 0.112 0.315 0.095 0.293 0 0 36 How much time is consumed (in minutes) on a round trip to fetch the drinking wat W_time 4.053 8.227 4.486 8.622 0 0 37 Doing anything to the water to make it safer: Don't know W_saferdk 0.004 0.059 0.005 0.072 0 0 38 Indicator of normaly paying for drinking water W_drinkingpay 0.213 0.409 0.219 0.414 1 0 39 Toilet: No Toilet W_toilet_notoilet 0.119 0.324 0.092 0.290 0 0 40 Toilet: Composting toilet W_toilet_pitlat 0.006 0.074 0.008 0.087 1 0 41 Open defecation: fields / open places or other W_opendefecation 0.116 0.320 0.091 0.287 1 0 42 Drainage: no, no system W_drainage_nosystem 0.284 0.451 0.285 0.451 0 0 43 Cooking water: hand pump W_ckg_handpump 0.216 0.411 0.202 0.402 0 0 44 Cooking water: closed well W_ckg_closedwell 0.003 0.055 0.006 0.079 0 0 45 Cooking water: un protected spring W_ckg_unprsprng 0.002 0.042 0.002 0.042 0 0 46 Cooking water: pond/canal / river W_ckg_pond 0.002 0.044 
0.004 0.067 0 0 47 Cooking water: bottled water W_ckg_bottwater 0.002 0.041 0.003 0.054 0 0 48 Cooking water: tanker /truck/water bearer W_ckg_tanker 0.021 0.143 0.020 0.141 0 0 49 Cooking water: filtration plant W_ckg_filtration 0.043 0.203 0.049 0.216 0 0 50 Handwashing water: hand pump W_handw_handpump 0.199 0.399 0.214 0.410 0 0 51 Handwashing water: closed well W_handw_closedwell 0.003 0.051 0.006 0.078 0 0 52 Handwashing water: un protected spring W_handw_unprsprng 0.001 0.038 0.002 0.042 0 0 53 Handwashing water: pond/canal / river W_handw_pond 0.002 0.049 0.004 0.062 0 0 54 Handwashing water: tanker /truck/water bearer W_handw_tanker 0.003 0.056 0.003 0.059 0 0 55 Handwashing water: other W_handw_other 0.003 0.055 0.004 0.064 0 0 56 Number of radio D_nradio 0.015 0.126 0.009 0.095 0 0 57 Number of dryer D_ndryer 0.195 0.409 0.211 0.416 1 0 58 Number of cooking range D_ncookingrange 0.021 0.145 0.020 0.145 0 0 59 Number of microwave D_nmicrowave 0.112 0.320 0.120 0.331 0 0 60 Number of watetr filter D_nwaterfilter 0.012 0.109 0.013 0.114 0 0 61 Number of ups D_nups 0.131 0.343 0.122 0.341 1 0 62 Number of heater D_nheater 0.123 0.402 0.108 0.355 0 0 63 Number of geaser D_ngeaser 0.077 0.279 0.083 0.283 0 0 64 Number of richshaw/chingchi D_nrichshaw 0.027 0.174 0.029 0.172 1 0 65 Number of car D_ncar 0.061 0.266 0.063 0.275 1 0 66 Number of van/truck/bus D_nvantruckbus 0.006 0.104 0.007 0.090 0 0 67 Number of boat D_nboat 0.000 0.018 0.001 0.025 0 0 68 Number of clock D_nclock 0.739 0.918 0.810 0.899 1 0 69 Indicator of radio D_iradio 0.015 0.120 0.009 0.095 0 0 70 Indicator of washing D_iwashing 0.611 0.487 0.626 0.484 0 0 71 Indicator of dryer D_idryer 0.190 0.392 0.208 0.406 0 0 72 Indicator of air conditoner D_iairconditioning 0.086 0.280 0.081 0.272 1 0 73 Indicator of fan D_ifan 0.948 0.222 0.947 0.224 0 0 74 Indicator of cooking range D_icookingrange 0.021 0.143 0.019 0.136 1 0 75 Indicator of microwave D_imicrowave 0.111 0.314 0.118 0.323 1 0 76 
Indicator of water filter D_iwaterfilter 0.012 0.108 0.013 0.112 0 0
77 Indicator of ups D_iups 0.130 0.336 0.120 0.325 0 0
78 Indicator of solar panel D_isolarpanel 0.063 0.244 0.053 0.224 0 0
79 Indicator of heater D_iheater 0.108 0.311 0.099 0.299 1 0
80 Indicator of geyser D_igeaser 0.075 0.263 0.082 0.274 0 0
81 Indicator of rickshaw/chingchi D_irichshaw 0.026 0.158 0.028 0.164 0 0
82 Indicator of car D_icar 0.055 0.229 0.057 0.231 0 0
83 Indicator of van/truck/bus D_ivantruckbus 0.005 0.072 0.006 0.080 0 0
84 Indicator of boat D_iboat 0.000 0.009 0.001 0.025 0 0
85 No information about durables D_nodurables 0.011 0.104 0.011 0.106 0 0
86 Household head has no education information C_noeducationinfo 0.000 0.016 0.000 0.000 0 0
87 Household head has no employment information C_noemploymentinfo 0.000 0.016 0.000 0.000 0 0
88 Household head is female C_female 0.122 0.327 0.098 0.298 1 0
89 Household head is widower C_widow 0.084 0.277 0.078 0.268 0 0
90 Household head is minor C_minor 0.001 0.030 0.001 0.029 0 0
91 Household head can read but not write C_canreadbnotwrite 0.010 0.098 0.019 0.135 0 0
92 Household head never attended C_neverattended 0.384 0.486 0.396 0.489 1 0
93 Household head attended school in the past C_attendedpast 0.613 0.487 0.601 0.490 0 0
94 Household head currently attending school C_currentattending 0.003 0.056 0.003 0.055 0 0
95 Household head studied in a government institute C_schoolgov 0.592 0.491 0.574 0.495 0 0
96 Household head completed basic education C_basiceducation 0.187 0.390 0.185 0.389 1 0
97 Household head completed intermediate education C_intermededucation 0.325 0.468 0.324 0.468 0 0
98 Household head completed tertiary education C_highesteducation 0.023 0.151 0.025 0.158 1 0
99 Household head occupation: manager or professional C_occmangprof 0.057 0.233 0.059 0.235 1 0
100 Household head occupation: technician or services C_occtecserv 0.200 0.400 0.201 0.401 1 0
101 Household head occupation: skilled worker C_occskilled 0.192 0.394 0.194 0.395 1 0
102 Household head occupation: plant/machinery operators or craft/trades C_occplntcrft 0.175 0.380 0.177 0.382 0 0
103 Household head activity: services C_actserv 0.343 0.475 0.341 0.474 0 0
104 Household head has a pension C_pension 0.056 0.231 0.054 0.226 1 0
Source: World Bank staff estimations based on HIES 2018–19, PSLM 2019–20, and Census 2017

A7. Sindh: variable comparison HIES 2018–19 and PSLM 2019–20
Order Variable Label Variable Name HIES mean HIES sd PSLM mean PSLM sd GLS model Alpha model
1 Language is Urdu L_urdu 0.426 0.494 0.449 0.497 0 0
2 Language is Punjabi L_punjabi 0.003 0.055 0.003 0.055 0 0
3 Language is Sindhi L_sindhi 0.571 0.495 0.534 0.499 0 0
4 Language is Hindko L_hindko 0.000 0.013 0.001 0.027 0 0
5 Had to skip a meal because there was not enough money or other resources to get T_4 0.091 0.287 0.084 0.277 0 0
6 Ate less than you thought you should because of a lack of money or other resourc T_5 0.163 0.370 0.160 0.367 0 0
7 Occupation: Subsidized rent H_occup_rentsubs 0.011 0.102 0.011 0.106 0 0
8 Dwelling type: Independent house/compound H_house_independ 0.695 0.461 0.670 0.470 0 0
9 Dwelling type: Part of the large unit H_house_poflunit 0.116 0.321 0.149 0.356 1 0
10 Floor: earth/sand H_floor_earth 0.407 0.491 0.328 0.470 0 0
11 Floor: cement/cement tiles H_floor_cement 0.449 0.497 0.456 0.498 1 0
12 Roof: RCC/RBC H_roof_rccrbc 0.322 0.467 0.329 0.470 0 1
13 Roof: Wood/Bamboo H_roof_wood 0.310 0.462 0.287 0.452 1 0
14 Roof: Metal/Tin/Girders/T-Iron H_roof_grader 0.333 0.471 0.306 0.461 0 0
15 Roof: Other H_roof_other 0.001 0.038 0.011 0.105 0 0
16 Walls: Burned bricks/block H_walls_burntbricks 0.731 0.443 0.742 0.437 1 0
17 Walls: Raw bricks/mud H_walls_mudbricks 0.235 0.424 0.207 0.405 0 0
18 Walls: Stone H_walls_stones 0.000 0.000 0.001 0.029 0 0
19 Lighting: Electricity H_lighting_elect 0.862 0.345 0.868 0.338 0 0
20 Lighting: Solar H_lighting_solar 0.035 0.184 0.048 0.214 0 0
21 Drinking water: Inside dwelling hand pump W_dkw_inshandpump 0.262 0.440 0.274 0.446 1 0
22 Drinking water: Inside dwelling closed well W_dkw_insclosedwell 0.010 0.102 0.000 0.020 0 0
23 Drinking water: Inside dwelling open well W_dkw_insopenwell 0.000 0.019 0.002 0.043 1 0
24 Drinking water: Outside dwelling hand pump W_dkw_outhandpump 0.098 0.297 0.085 0.279 0 0
25 Drinking water: Outside dwelling closed well W_dkw_outclosedwell 0.005 0.072 0.001 0.037 0 0
26 Drinking water: Outside dwelling unprotected spring W_dkw_outunprsprng 0.001 0.025 0.001 0.037 0 0
27 Drinking water: pond/canal/river W_dkw_pond 0.021 0.144 0.017 0.128 0 0
28 Drinking water: bottled water W_dkw_bottwater 0.078 0.268 0.069 0.253 0 0
29 Drinking water: tanker/truck/water bearer W_dkw_tanker 0.045 0.207 0.043 0.204 0 0
30 Drinking water: hand pump W_dkw_handpump 0.360 0.480 0.359 0.480 0 0
31 Drinking water: closed well W_dkw_closedwell 0.016 0.124 0.002 0.042 0 0
32 Drinking water: unprotected spring W_dkw_unprsprng 0.001 0.035 0.001 0.038 0 0
33 Drinking water: no water delivery system W_dkw_nodelivery 0.150 0.358 0.139 0.346 0 0
34 Drinking water: safe to drink W_dkw_safe 0.909 0.287 0.894 0.308 0 0
35 Who installed the water delivery system: ngo, private W_whoinspriva 0.032 0.176 0.063 0.243 0 0
36 How much time is consumed (in minutes) on a round trip to fetch the drinking wat W_time 6.477 14.514 5.246 11.523 0 0
37 Doing anything to the water to make it safer: Don't know W_saferdk 0.003 0.056 0.007 0.081 0 0
38 Indicator of normally paying for drinking water W_drinkingpay 0.276 0.447 0.280 0.449 1 0
39 Toilet: No Toilet W_toilet_notoilet 0.119 0.324 0.091 0.288 0 0
40 Toilet: Composting toilet W_toilet_pitlat 0.104 0.305 0.085 0.278 1 0
41 Open defecation: fields/open places or other W_opendefecation 0.117 0.322 0.089 0.285 1 0
42 Drainage: no, no system W_drainage_nosystem 0.393 0.488 0.397 0.489 0 0
43 Cooking water: hand pump W_ckg_handpump 0.376 0.484 0.367 0.482 0 0
44 Cooking water: closed well W_ckg_closedwell 0.017 0.128 0.002 0.040 0 0
45 Cooking water: unprotected spring W_ckg_unprsprng 0.001 0.026 0.001 0.037 0 0
46 Cooking water: pond/canal/river W_ckg_pond 0.023 0.151 0.016 0.126 0 0
47 Cooking water: bottled water W_ckg_bottwater 0.037 0.188 0.027 0.161 0 0
48 Cooking water: tanker/truck/water bearer W_ckg_tanker 0.044 0.206 0.049 0.216 0 0
49 Cooking water: filtration plant W_ckg_filtration 0.007 0.080 0.009 0.093 0 0
50 Handwashing water: hand pump W_handw_handpump 0.373 0.484 0.372 0.483 0 0
51 Handwashing water: closed well W_handw_closedwell 0.016 0.125 0.002 0.040 0 0
52 Handwashing water: unprotected spring W_handw_unprsprng 0.000 0.018 0.001 0.035 0 0
53 Handwashing water: pond/canal/river W_handw_pond 0.026 0.160 0.017 0.130 0 0
54 Handwashing water: tanker/truck/water bearer W_handw_tanker 0.045 0.208 0.044 0.206 0 0
55 Handwashing water: other W_handw_other 0.013 0.113 0.004 0.065 0 0
56 Number of radio D_nradio 0.017 0.130 0.030 0.171 0 0
57 Number of dryer D_ndryer 0.069 0.254 0.053 0.225 1 0
58 Number of cooking range D_ncookingrange 0.023 0.158 0.021 0.144 0 0
59 Number of microwave D_nmicrowave 0.068 0.252 0.059 0.237 0 0
60 Number of water filter D_nwaterfilter 0.019 0.137 0.017 0.132 0 0
61 Number of ups D_nups 0.062 0.247 0.070 0.258 1 0
62 Number of heater D_nheater 0.041 0.214 0.012 0.111 0 0
63 Number of geyser D_ngeaser 0.076 0.267 0.062 0.241 0 0
64 Number of rickshaw/chingchi D_nrichshaw 0.012 0.114 0.020 0.142 1 0
65 Number of car D_ncar 0.057 0.257 0.056 0.240 1 0
66 Number of van/truck/bus D_nvantruckbus 0.004 0.065 0.003 0.058 0 0
67 Number of boat D_nboat 0.001 0.050 0.000 0.020 0 0
68 Number of clock D_nclock 0.699 0.941 0.650 0.770 1 0
69 Indicator of radio D_iradio 0.016 0.127 0.030 0.171 0 0
70 Indicator of washing D_iwashing 0.501 0.500 0.497 0.500 0 0
71 Indicator of dryer D_idryer 0.068 0.252 0.053 0.223 0 0
72 Indicator of air conditioner D_iairconditioning 0.081 0.272 0.063 0.243 1 0
73 Indicator of fan D_ifan 0.869 0.338 0.872 0.334 0 0
74 Indicator of cooking range D_icookingrange 0.022 0.148 0.021 0.144 1 0
75 Indicator of microwave D_imicrowave 0.068 0.252 0.059 0.236 1 0
76 Indicator of water filter D_iwaterfilter 0.019 0.137 0.017 0.130 0 0
77 Indicator of ups D_iups 0.061 0.240 0.069 0.254 0 0
78 Indicator of solar panel D_isolarpanel 0.192 0.394 0.211 0.408 0 0
79 Indicator of heater D_iheater 0.038 0.192 0.012 0.107 1 0
80 Indicator of geyser D_igeaser 0.076 0.265 0.062 0.241 0 0
81 Indicator of rickshaw/chingchi D_irichshaw 0.011 0.105 0.020 0.139 0 0
82 Indicator of car D_icar 0.052 0.222 0.053 0.225 0 0
83 Indicator of van/truck/bus D_ivantruckbus 0.004 0.060 0.003 0.057 0 0
84 Indicator of boat D_iboat 0.001 0.027 0.000 0.020 0 0
85 No information about durables D_nodurables 0.065 0.247 0.048 0.214 0 0
86 Household head has no education information C_noeducationinfo 0.000 0.000 0.000 0.000 0 0
87 Household head has no employment information C_noemploymentinfo 0.000 0.000 0.000 0.000 0 0
88 Household head is female C_female 0.038 0.191 0.039 0.193 1 0
89 Household head is widower C_widow 0.057 0.231 0.058 0.234 0 0
90 Household head is minor C_minor 0.002 0.041 0.001 0.028 0 0
91 Household head can read but not write C_canreadbnotwrite 0.025 0.157 0.008 0.087 0 0
92 Household head never attended C_neverattended 0.392 0.488 0.417 0.493 1 0
93 Household head attended school in the past C_attendedpast 0.607 0.489 0.581 0.493 0 0
94 Household head currently attending school C_currentattending 0.002 0.040 0.002 0.044 0 0
95 Household head studied in a government institute C_schoolgov 0.588 0.492 0.558 0.497 0 0
96 Household head completed basic education C_basiceducation 0.206 0.404 0.189 0.391 1 0
97 Household head completed intermediate education C_intermededucation 0.237 0.425 0.237 0.425 0 0
98 Household head completed tertiary education C_highesteducation 0.031 0.174 0.033 0.178 1 0
99 Household head occupation: manager or professional C_occmangprof 0.082 0.275 0.075 0.264 1 0
100 Household head occupation: technician or services C_occtecserv 0.228 0.419 0.236 0.425 1 0
101 Household head occupation: skilled worker C_occskilled 0.171 0.377 0.206 0.404 1 0
102 Household head occupation: plant/machinery operators or craft/trades C_occplntcrft 0.174 0.379 0.173 0.378 0 0
103 Household head activity: services C_actserv 0.397 0.489 0.396 0.489 0 0
104 Household head has a pension C_pension 0.029 0.169 0.033 0.178 1 0
Source: World Bank staff estimations based on HIES 2018–19, PSLM 2019–20, and Census 2017

A8. Balochistan: variable comparison HIES 2018–19 and PSLM 2019–20
Order Variable Label Variable Name HIES mean HIES sd PSLM mean PSLM sd GLS model Alpha model
1 Language is Urdu L_urdu 0.603 0.489 0.448 0.497 0 0
2 Language is Punjabi L_punjabi 0.000 0.000 0.003 0.056 0 0
3 Language is Sindhi L_sindhi 0.152 0.359 0.107 0.309 0 0
4 Language is Hindko L_hindko 0.000 0.000 0.000 0.000 0 0
5 Had to skip a meal because there was not enough money or other resources to get T_4 0.067 0.250 0.137 0.344 0 0
6 Ate less than you thought you should because of a lack of money or other resourc T_5 0.159 0.365 0.202 0.401 0 0
7 Occupation: Subsidized rent H_occup_rentsubs 0.011 0.105 0.015 0.121 0 0
8 Dwelling type: Independent house/compound H_house_independ 0.944 0.231 0.865 0.342 0 0
9 Dwelling type: Part of the large unit H_house_poflunit 0.032 0.175 0.058 0.234 1 0
10 Floor: earth/sand H_floor_earth 0.680 0.467 0.682 0.466 0 0
11 Floor: cement/cement tiles H_floor_cement 0.286 0.452 0.245 0.430 1 0
12 Roof: RCC/RBC H_roof_rccrbc 0.099 0.299 0.085 0.279 0 1
13 Roof: Wood/Bamboo H_roof_wood 0.633 0.482 0.613 0.487 1 0
14 Roof: Metal/Tin/Girders/T-Iron H_roof_grader 0.241 0.428 0.251 0.434 0 0
15 Roof: Other H_roof_other 0.004 0.061 0.004 0.060 0 0
16 Walls: Burned bricks/block H_walls_burntbricks 0.330 0.470 0.366 0.482 1 0
17 Walls: Raw bricks/mud H_walls_mudbricks 0.609 0.488 0.585 0.493 0 0
18 Walls: Stone H_walls_stones 0.050 0.218 0.020 0.142 0 0
19 Lighting: Electricity H_lighting_elect 0.748 0.434 0.770 0.421 0 0
20 Lighting: Solar H_lighting_solar 0.168 0.374 0.178 0.383 0 0
21 Drinking water: Inside dwelling hand pump W_dkw_inshandpump 0.031 0.174 0.014 0.120 1 0
22 Drinking water: Inside dwelling closed well W_dkw_insclosedwell 0.025 0.157 0.061 0.239 0 0
23 Drinking water: Inside dwelling open well W_dkw_insopenwell 0.008 0.088 0.002 0.044 1 0
24 Drinking water: Outside dwelling hand pump W_dkw_outhandpump 0.037 0.189 0.021 0.144 0 0
25 Drinking water: Outside dwelling closed well W_dkw_outclosedwell 0.027 0.163 0.015 0.123 0 0
26 Drinking water: Outside dwelling unprotected spring W_dkw_outunprsprng 0.032 0.175 0.019 0.137 0 0
27 Drinking water: pond/canal/river W_dkw_pond 0.095 0.294 0.121 0.326 0 0
28 Drinking water: bottled water W_dkw_bottwater 0.023 0.151 0.004 0.061 0 0
29 Drinking water: tanker/truck/water bearer W_dkw_tanker 0.165 0.371 0.174 0.379 0 0
30 Drinking water: hand pump W_dkw_handpump 0.068 0.252 0.036 0.186 0 0
31 Drinking water: closed well W_dkw_closedwell 0.053 0.223 0.076 0.265 0 0
32 Drinking water: unprotected spring W_dkw_unprsprng 0.034 0.182 0.020 0.139 0 0
33 Drinking water: no water delivery system W_dkw_nodelivery 0.321 0.467 0.330 0.470 0 0
34 Drinking water: safe to drink W_dkw_safe 0.673 0.469 0.661 0.473 0 0
35 Who installed the water delivery system: ngo, private W_whoinspriva 0.026 0.160 0.084 0.278 0 0
36 How much time is consumed (in minutes) on a round trip to fetch the drinking wat W_time 11.386 16.937 14.441 20.348 0 0
37 Doing anything to the water to make it safer: Don't know W_saferdk 0.008 0.087 0.015 0.120 0 0
38 Indicator of normally paying for drinking water W_drinkingpay 0.324 0.468 0.323 0.468 1 0
39 Toilet: No Toilet W_toilet_notoilet 0.173 0.379 0.172 0.377 0 0
40 Toilet: Composting toilet W_toilet_pitlat 0.335 0.472 0.311 0.463 1 0
41 Open defecation: fields/open places or other W_opendefecation 0.172 0.377 0.166 0.372 1 0
42 Drainage: no, no system W_drainage_nosystem 0.587 0.492 0.631 0.482 0 0
43 Cooking water: hand pump W_ckg_handpump 0.067 0.251 0.024 0.154 0 0
44 Cooking water: closed well W_ckg_closedwell 0.054 0.226 0.077 0.266 0 0
45 Cooking water: unprotected spring W_ckg_unprsprng 0.030 0.171 0.023 0.149 0 0
46 Cooking water: pond/canal/river W_ckg_pond 0.095 0.293 0.137 0.344 0 0
47 Cooking water: bottled water W_ckg_bottwater 0.021 0.142 0.004 0.063 0 0
48 Cooking water: tanker/truck/water bearer W_ckg_tanker 0.161 0.367 0.192 0.394 0 0
49 Cooking water: filtration plant W_ckg_filtration 0.002 0.040 0.001 0.035 0 0
50 Handwashing water: hand pump W_handw_handpump 0.066 0.248 0.028 0.165 0 0
51 Handwashing water: closed well W_handw_closedwell 0.055 0.227 0.077 0.266 0 0
52 Handwashing water: unprotected spring W_handw_unprsprng 0.031 0.174 0.023 0.149 0 0
53 Handwashing water: pond/canal/river W_handw_pond 0.098 0.297 0.139 0.346 0 0
54 Handwashing water: tanker/truck/water bearer W_handw_tanker 0.169 0.375 0.188 0.391 0 0
55 Handwashing water: other W_handw_other 0.009 0.096 0.016 0.126 0 0
56 Number of radio D_nradio 0.060 0.240 0.140 0.350 0 0
57 Number of dryer D_ndryer 0.026 0.168 0.041 0.203 1 0
58 Number of cooking range D_ncookingrange 0.007 0.082 0.008 0.104 0 0
59 Number of microwave D_nmicrowave 0.030 0.175 0.017 0.132 0 0
60 Number of water filter D_nwaterfilter 0.004 0.062 0.018 0.139 0 0
61 Number of ups D_nups 0.023 0.157 0.072 0.266 1 0
62 Number of heater D_nheater 0.371 0.931 0.587 1.144 0 0
63 Number of geyser D_ngeaser 0.055 0.232 0.108 0.343 0 0
64 Number of rickshaw/chingchi D_nrichshaw 0.011 0.116 0.015 0.129 1 0
65 Number of car D_ncar 0.067 0.263 0.088 0.308 1 0
66 Number of van/truck/bus D_nvantruckbus 0.005 0.071 0.013 0.135 0 0
67 Number of boat D_nboat 0.001 0.023 0.002 0.054 0 0
68 Number of clock D_nclock 0.838 1.259 0.746 0.893 1 0
69 Indicator of radio D_iradio 0.060 0.238 0.139 0.346 0 0
70 Indicator of washing D_iwashing 0.430 0.495 0.426 0.494 0 0
71 Indicator of dryer D_idryer 0.025 0.157 0.040 0.196 0 0
72 Indicator of air conditioner D_iairconditioning 0.015 0.122 0.015 0.121 1 0
73 Indicator of fan D_ifan 0.844 0.363 0.889 0.314 0 0
74 Indicator of cooking range D_icookingrange 0.007 0.082 0.007 0.085 1 0
75 Indicator of microwave D_imicrowave 0.029 0.167 0.017 0.128 1 0
76 Indicator of water filter D_iwaterfilter 0.004 0.062 0.018 0.132 0 0
77 Indicator of ups D_iups 0.022 0.148 0.071 0.257 0 0
78 Indicator of solar panel D_isolarpanel 0.233 0.423 0.413 0.492 0 0
79 Indicator of heater D_iheater 0.173 0.378 0.292 0.455 1 0
80 Indicator of geyser D_igeaser 0.054 0.227 0.101 0.301 0 0
81 Indicator of rickshaw/chingchi D_irichshaw 0.009 0.096 0.014 0.117 0 0
82 Indicator of car D_icar 0.064 0.245 0.081 0.273 0 0
83 Indicator of van/truck/bus D_ivantruckbus 0.005 0.071 0.012 0.108 0 0
84 Indicator of boat D_iboat 0.001 0.023 0.002 0.043 0 0
85 No information about durables D_nodurables 0.042 0.202 0.019 0.136 0 0
86 Household head has no education information C_noeducationinfo 0.000 0.000 0.000 0.000 0 0
87 Household head has no employment information C_noemploymentinfo 0.000 0.000 0.000 0.000 0 0
88 Household head is female C_female 0.019 0.136 0.013 0.114 1 0
89 Household head is widower C_widow 0.038 0.191 0.026 0.160 0 0
90 Household head is minor C_minor 0.001 0.027 0.001 0.026 0 0
91 Household head can read but not write C_canreadbnotwrite 0.010 0.099 0.012 0.109 0 0
92 Household head never attended C_neverattended 0.597 0.491 0.539 0.498 1 0
93 Household head attended school in the past C_attendedpast 0.401 0.490 0.459 0.498 0 0
94 Household head currently attending school C_currentattending 0.002 0.046 0.002 0.045 0 0
95 Household head studied in a government institute C_schoolgov 0.391 0.488 0.424 0.494 0 0
96 Household head completed basic education C_basiceducation 0.147 0.354 0.182 0.386 1 0
97 Household head completed intermediate education C_intermededucation 0.167 0.373 0.200 0.400 0 0
98 Household head completed tertiary education C_highesteducation 0.034 0.182 0.030 0.171 1 0
99 Household head occupation: manager or professional C_occmangprof 0.075 0.263 0.072 0.259 1 0
100 Household head occupation: technician or services C_occtecserv 0.192 0.394 0.254 0.435 1 0
101 Household head occupation: skilled worker C_occskilled 0.165 0.371 0.275 0.446 1 0
102 Household head occupation: plant/machinery operators or craft/trades C_occplntcrft 0.218 0.413 0.139 0.346 0 0
103 Household head activity: services C_actserv 0.404 0.491 0.426 0.494 0 0
104 Household head has a pension C_pension 0.023 0.149 0.023 0.151 1 0
Source: World Bank staff estimations based on HIES 2018–19, PSLM 2019–20, and Census 2017

Annex 2: Analyzing the Number of Households

We estimated the trend in the number of households using the HIES surveys from 2004–05 to 2018–19, the PSLM surveys from 2004–05 to 2019–20, and the Census 2017. In every province, the number of households in the PSLM 2019–20 falls inside the 95 percent confidence interval of the quadratic fit, which suggests that the household weights were calculated correctly.

A9. Number of Households in HIES and PSLM from 2004–05 to 2019–20, and in Census 2017
Source: World Bank staff estimations based on HIES and PSLM 2004–05 to 2019–20, and Census 2017

Annex 3: Comparing Different Specifications

We estimated models that combine three choices: the level of estimation (national or provincial); the method used to select the candidate variables (a 99 percent confidence interval, a 95 percent confidence interval, a threshold of 20 percent, or a threshold of 10 percent); and the transformation of the dependent variable (natural logarithm, zero-skewness natural logarithm, Box-Cox transformation, or Box-Cox transformation after natural logarithm).
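The space of specifications described above can be enumerated as in the following sketch. It is illustrative only and is not the authors' code: the option labels, the `box_cox` helper, and its `lam` parameter are assumptions made for this example. The text does not state that every combination was estimated, so the count below is only the size of the full grid. The zero-skewness logarithm is listed but not implemented, because it requires a numerical search for a shift parameter.

```python
import math
from itertools import product

# Illustrative labels that mirror the options listed in the text (assumed names).
LEVELS = ["national", "provincial"]
SELECTION = ["ci_99", "ci_95", "threshold_20", "threshold_10"]
TRANSFORMS = ["log", "zero_skew_log", "box_cox", "box_cox_of_log"]


def box_cox(y, lam):
    """Box-Cox transform of a positive value; equals ln(y) when lam == 0."""
    if y <= 0:
        raise ValueError("Box-Cox requires a positive value")
    return math.log(y) if lam == 0 else (y ** lam - 1.0) / lam


def box_cox_of_log(y, lam):
    """Box-Cox applied to ln(y); needs ln(y) > 0, that is, y > 1."""
    return box_cox(math.log(y), lam)


# Full grid: every level x selection rule x transformation.
specifications = list(product(LEVELS, SELECTION, TRANSFORMS))
print(len(specifications))
```

In practice the Box-Cox parameter would be estimated from the data rather than fixed by hand; `lam` is passed explicitly here only to keep the sketch self-contained.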
For each specification, we calculated the absolute difference between the poverty prediction in the target data set (PSLM 2019–20) and the direct estimate from the HIES 2018–19, and compared this difference with the adjusted R-squared for each province. In general, the national model that selects variables using a 99 percent confidence interval strikes a balance between explanatory power and small differences from the direct estimates.

A10. Comparing different models using the HIES-in-PSLM prediction
Source: World Bank staff estimations based on HIES and PSLM 2004–05 to 2019–20, and Census 2017

A11. Comparing national models with different transformations using the HIES-in-PSLM prediction
Source: World Bank staff estimations based on HIES and PSLM 2004–05 to 2019–20, and Census 2017
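The comparison in Annex 3 can be sketched as follows, under stated assumptions. The specification names, the province keys "A" and "B", and all numbers are placeholders invented for illustration; they are not results from the paper. Averaging across provinces is a simplification introduced here, since the text compares the two measures province by province.

```python
def abs_differences(predicted, direct):
    """Absolute gap between model-based and direct poverty rates, by province."""
    return {prov: abs(predicted[prov] - direct[prov]) for prov in direct}


def summarize(specs, direct):
    """Return (name, mean absolute gap, mean adjusted R2) per spec, smallest gap first."""
    rows = []
    for name, spec in specs.items():
        gaps = abs_differences(spec["predicted"], direct)
        mean_gap = sum(gaps.values()) / len(gaps)
        mean_r2 = sum(spec["adj_r2"].values()) / len(spec["adj_r2"])
        rows.append((name, mean_gap, mean_r2))
    return sorted(rows, key=lambda row: row[1])


# Placeholder inputs for illustration only; NOT estimates from the paper.
direct = {"A": 0.20, "B": 0.30}
specs = {
    "national_ci99": {
        "predicted": {"A": 0.21, "B": 0.29},
        "adj_r2": {"A": 0.60, "B": 0.55},
    },
    "provincial_ci95": {
        "predicted": {"A": 0.25, "B": 0.33},
        "adj_r2": {"A": 0.65, "B": 0.58},
    },
}

for name, gap, r2 in summarize(specs, direct):
    print(f"{name}: mean |diff| = {gap:.3f}, mean adj. R2 = {r2:.3f}")
```

Keeping the gap and the adjusted R-squared side by side, rather than collapsing them into one score, matches the trade-off the text describes between explanatory power and closeness to the direct estimates.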