Harmonization of ethnicity indicators across household surveys: A pilot for Africa
Paola Ballon, Omar Alburqueque. SSI, Global Unit
May 24, 2024

Outline
I. Why do we need to harmonize ethnicity?
II. Why Africa?
III. Our proposal for harmonization
IV. Exploratory analysis
V. Harmonization outcomes
VI. Applications
VII. Concluding remarks & recommendations

I. Why do we need to harmonize ethnicity?
• The WBG Scorecard FY24-FY30 calls for developing disaggregated metrics to foster insights supporting vulnerable groups.
• The WBG Inclusion Agenda promotes social inclusion by improving conditions for disadvantaged groups and addressing barriers on ethnicity and other characteristics.
• In a recent concept note titled “Inclusive Data-Inclusive Analytics” we stress the need to enhance disaggregated data for vulnerable groups (e.g. ethnic groups) to link with poverty reduction.
• The lack of a universal definition and varied research operationalization of ethnicity complicates its measurement and comparability across studies.
• A harmonized framework for measuring ethnicity addresses these challenges, ensuring data comparability across studies and contexts, enhancing understanding of ethnic dynamics.
• By harmonizing ethnicity, we can deliver accurate, consistent, and actionable insights, improving policies supporting vulnerable groups.

II. Why Africa?
• Africa hosts thousands of ethnic groups with shared culture, language, religion, and history, speaking 900 to 1,500 languages (Britannica, 2024).
• Ethnicity has deeply influenced Africa's history, as nations balance national unity with ethnic diversity, a challenge stemming from colonialism’s arbitrary borders. Some countries suppress diversity, while others celebrate ethnic differences.
• In nations with ethnic violence (e.g. Rwanda), ethnicity is a taboo subject.
In others, like Ethiopia, ethnicity is a central component of national strategy, with ethnic federalism providing regional autonomy.

III. Our proposal for harmonization
• We use a frequentist approach for harmonizing ethnicity, starting with the distribution of ethnic groups within each country. From this, we generate a ranking by population size and establish Top 3 and Top 5 classifications.
• The harmonization approach involves recoding data specific to each survey/country.
• This initiative does not aim to homogenize/standardize the ethnicity variable across countries.
• This approach complements, not replaces, existing methods of capturing ethnicity in household surveys.
• Top 3 Classification: Identifies the three largest ethnic groups by population as distinct categories, with an additional ‘residual’ category encompassing all other ethnic groups.
• Top 5 Classification: Identifies the five largest ethnic groups by population as distinct categories, with an additional ‘residual’ category encompassing all other ethnic groups.

Structure of our harmonization approach
Ethnic Group | Share of the population* | Top 3 Classification | Top 5 Classification
Group 1 | 20.00% | 1st | 1st
Group 2 | 10.00% | 2nd | 2nd
Group 3 | 5.00% | 3rd | 3rd
Group 4 | 2.50% | Rest | 4th
Group 5 | 1.25% | Rest | 5th
… | … | Rest | Rest
Group N | 0.00% | Rest | Rest
*These represent hypothetical values.

• Our framework, based on population size for ethnic categorization, aims to minimize subjective bias related to ethnic identity and politics.
• Limitations in harmonization efforts are inevitable, so it is crucial to note:
  • Our work is based on surveys that are representative at the national level and, at most, the ADM1 subnational level, which may result in the underrepresentation of some ethnic groups within countries.
  • Ethnicity is a multifaceted and evolving concept, influenced by heritage and ethnic practices.
However, for practical reasons, our methodology confines ethnicity to self-identification, aligning with measurement in household surveys.

A main goal of ethnicity harmonization is to disaggregate socioeconomic information by ethnic group. To achieve this, we will use the microdata used for the Global Monitoring Database (GMD) collection, which encompasses a broad array of socioeconomic data.

IV. Exploratory Data Analysis
• Our data review is based on the World Bank's datalibweb portal (developed and supported by the Global Poverty Team for Statistical Development), which provides a range of harmonized and raw microdata, as well as documentation and related files.

Pilot Countries
• We targeted ethnicity data in 16 countries: Benin, Burkina Faso, Côte d'Ivoire, Gabon, Ghana, Guinea, Gambia, Guinea-Bissau, Mali, Niger, Senegal, São Tomé and Príncipe, Seychelles, Chad, Togo, and South Africa.
• Most of these countries are located in West Africa (11); the rest are divided between Central (3), East (1), and Southern (1) Africa.

Country | Survey | Year | Sampling | Representation | Number of ethnic groups | Missing values (%) | Observations
Benin | Enquête Harmonisée sur les Conditions de Vie des Ménages (EHCVM) | 2018 | Two-stage sampling, Probability Proportional to Size (PPS) | National and regional | 51 | 2.58% | None
Burkina Faso | Enquête Harmonisée sur les Conditions de Vie des Ménages (EHCVM) | 2018 | Two-stage sampling, PPS | National and regional. Representative for the strata of Ouagadougou | 11 | 0% | None
Côte d'Ivoire | Enquête Harmonisée sur les Conditions de Vie des Ménages (EHCVM) | 2018 | Two-stage sampling, PPS | National and regional | 67 | 0.01% | None
Gabon | Enquête Gabonaise pour l'Evaluation et le Suivi de la Pauvreté (EGEP) | 2017 | N.A. | N.A. | 67 | 0% | None
Ghana | Ghana Living Standards Survey 7 (GLSS-VII) | 2016 | Two-stage stratified sampling, PPS | National and regional | 58 | 0.01% | None
Guinea | Enquête Harmonisée sur les Conditions de Vie des Ménages (EHCVM) | 2018 | N.A. | N.A. | 11 | 0% | None
Gambia | Integrated Household Survey (IHS) | 2015 | Two-stage stratified sampling, PPS | National | 13 | 0% | None
Guinea-Bissau | Inquérito Harmonizado sobre as Condições de Vida dos Agregados Familiares (EHCVM) | 2018 | Two-stage sampling, PPS | National and regional | 11 | 0% | None
Mali | Enquête Harmonisée sur les Conditions de Vie des Ménages (EHCVM) | 2018 | Two-stage stratified sampling, PPS | National and regional. Representative for the Bamako district | - | - | No data on ethnicity
Niger | Enquête Harmonisée sur les Conditions de Vie des Ménages (EHCVM) | 2018 | Two-stage sampling, PPS | National and regional | 10 | 99.80% | Too many missing values
Senegal | Enquête Harmonisée sur les Conditions de Vie des Ménages (EHCVM) | 2018 | Two-stage sampling, PPS | National and regional | 13 | 0.01% | None
São Tomé and Príncipe | Inquérito aos Orçamentos Familiares (IOP) | 2017 | N.A. | N.A. | - | - | No data on ethnicity
Seychelles | Household Budget Survey (HBS) | 2013 | Stratified, systematic random sampling | National and regional | - | - | No data on ethnicity
Chad | Enquête Harmonisée sur les Conditions de Vie des Ménages (EHCVM) | 2018 | N.A. | N.A. | - | - | No data on ethnicity
Togo | Enquête Harmonisée sur les Conditions de Vie des Ménages (EHCVM) | 2018 | Two-stage sampling, PPS | National and regional | 36 | 0% | None
South Africa | Living Conditions Survey (LCS) | 2014 | Multi-stage sampling | National and regional | 4 | 0% | Indicator refers to race instead of ethnicity

[Figure: Histogram: Number of ethnic groups]
• About half of the countries with ethnicity data have fewer than 20 ethnic categories in their survey questionnaires.
• The remaining countries each have at least 30 ethnic categories.
• Excluding South Africa, which recognizes four racial categories, Burkina Faso, Guinea, and Guinea-Bissau each have only 11 ethnic categories.
• On the other hand, Côte d'Ivoire and Gabon stand out, each having 67 ethnic categories.

V. Harmonization Outcomes
Top 3 Classification Results
The ethnic composition varies among the analyzed countries. In South Africa, which has only 4 categories, the top 3 encompass over 90% of the population. In contrast, in Benin, which has over 50 ethnic categories, the top 3 account for a lower share of the population.
Available codes can be found here

Top 5 Classification Results
The share of the population covered increases slightly when moving from the top 3 to the top 5 classification.

VI. Applications
Educational attainment – Benin (EHCVM, 2018)
In 2018, significant differences in educational attainment were observed between ethnic groups in Benin. Nationally, 45% lack primary education; this rises to 58% among the Bariba and drops to 33% among the Fon.
Note: Educational attainment is defined according to GMD harmonization guidelines.

Educational attainment – Burkina Faso (EHCVM, 2018)
A similar pattern is observed between the Mossi and Peulh ethnic groups in Burkina Faso.
Note: Educational attainment is defined according to GMD harmonization guidelines.

Labor Status – Benin (EHCVM, 2018)
In Benin, there do not seem to be significant employment disparities between ethnic groups.
Note: Labor Status is defined according to GMD harmonization guidelines.

Labor Status – Burkina Faso (EHCVM, 2018)
In Burkina Faso, there is a notable difference in employment rates between the Gourmatché (74%) and Mossi (62%) ethnic groups.
Note: Labor Status is defined according to GMD harmonization guidelines.

VII. Concluding remarks and recommendations
Our harmonization framework offers key advantages:
• Informing data collection: Integrating a harmonized ethnicity variable standardizes comparisons across surveys, enhancing demographic trend identification and insights into population growth and social changes.
• Visibility and sensitivity: Retaining the original ethnicity variable alongside harmonized categories improves visibility and balances the risk of discrimination, demonstrating our ethical commitment to representation and the impact on marginalized groups.
• Communication and clearance: Clear, objective categorization criteria simplify stakeholder engagement, improving transparency and facilitating research approvals.

We have identified some recommendations to improve ethnic identity categorization in surveys:
• Inclusive design: Ensure surveys allow respondents to acknowledge their ethnic identity without feeling forced into a category.
• Flexible options: Include a question format that does not compel respondents to select an ethnic group if they do not identify with one.
• "None" Option: Add options like "I do not belong to any ethnic group" or "Prefer not to say" for respondents who do not identify with listed groups or prefer not to disclose their ethnic identity.

Thanks.
pballon@worldbank.org
oalburquequechav@worldbank.org
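
Illustrative sketch (Section III): the Top 3 / Top 5 recoding can be expressed in a few lines of Python. This is a minimal sketch under stated assumptions, not the code used for the pilot. The function name top_n_classification, the field names "ethnicity" and "weight", the residual label "Rest", and the sample values are all hypothetical, and the use of survey weights to measure population size is an assumption. It ranks groups by weighted population share, keeps the n largest as distinct categories, and assigns every other group to the residual category, leaving the original variable untouched.

```python
from collections import defaultdict


def top_n_classification(records, n=3, group_key="ethnicity",
                         weight_key="weight", rest_label="Rest"):
    """Recode self-reported ethnicity into the n largest groups plus a residual.

    All names and defaults here are hypothetical, chosen for illustration.
    Returns (harmonized, shares): one harmonized label per record, and the
    weighted population share of each original group, largest first.
    """
    # Weighted population total per group; records with missing ethnicity
    # are skipped here and stay missing (None) in the harmonized variable.
    totals = defaultdict(float)
    for rec in records:
        group = rec.get(group_key)
        if group is None:
            continue
        totals[group] += rec.get(weight_key, 1.0)

    grand_total = sum(totals.values())
    # Rank by population size, largest first; ties are broken by name so
    # that the ranking is reproducible.
    ranked = sorted(totals, key=lambda g: (-totals[g], str(g)))
    shares = {g: totals[g] / grand_total for g in ranked}
    top = set(ranked[:n])

    harmonized = []
    for rec in records:
        group = rec.get(group_key)
        if group is None:
            harmonized.append(None)
        elif group in top:
            harmonized.append(group)
        else:
            harmonized.append(rest_label)
    return harmonized, shares


# Hypothetical sample: six groups whose weights sum to 100.
sample = [
    {"ethnicity": g, "weight": w}
    for g, w in [("A", 40), ("B", 25), ("C", 15), ("D", 10), ("E", 6), ("F", 4)]
]
top3, shares = top_n_classification(sample, n=3)
top5, _ = top_n_classification(sample, n=5)
print(top3)  # ['A', 'B', 'C', 'Rest', 'Rest', 'Rest']
print(top5)  # ['A', 'B', 'C', 'D', 'E', 'Rest']
```

Because the function returns a new list instead of overwriting the input field, the original ethnicity variable stays available alongside the harmonized one, in line with the recommendation in Section VII.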