Policy Research Working Paper                    10046




      A Method to Scale-Up Interpretative
    Qualitative Analysis, with an Application
   to Aspirations in Cox’s Bazaar, Bangladesh
                               Julian Ashwin
                              Vijayendra Rao
                             Monica Biradavolu
                              Aditya Chhabra
                               Arshia Haque
                                Afsana Khan
                             Nandini Krishnan




Development Economics
Development Research Group
May 2022


  Abstract
 The qualitative analysis of open-ended interviews has vast potential in economics but has found limited use. This
 is partly because the interpretative, nuanced human reading of text and coding that it requires is labor intensive
 and very time consuming. This paper presents a method to simplify and shorten the coding process by extending
 a small set of interpretative human-codes to a larger, representative, sample using natural language processing and
 thus analyze qualitative data at scale. It applies it to analyze 2,200 open-ended interviews on parent’s aspirations
 for children with Rohingya refugees and their Bangladeshi hosts. It shows that studying aspirations with open-ended
 interviews extends the economics focus on material goals to ideas from philosophy and anthropology that emphasize
 aspirations for moral and religious values, and the navigational capacity to achieve these aspirations. The paper shows
 how to assess the robustness and reliability of this approach and finds that extending the sample of interviews, rather
 than the human-coded training set, is likely to be optimal.




 This paper is a product of the Development Research Group, Development Economics. It is part of a larger effort by the
 World Bank to provide open access to its research and make a contribution to development policy discussions around the
 world. Policy Research Working Papers are also posted on the Web at http://www.worldbank.org/prwp. The authors may
 be contacted at vrao@worldbank.org.




         The Policy Research Working Paper Series disseminates the findings of work in progress to encourage the exchange of ideas about development
         issues. An objective of the series is to get the findings out quickly, even if the presentations are less than fully polished. The papers carry the
         names of the authors and should be cited accordingly. The findings, interpretations, and conclusions expressed in this paper are entirely those
         of the authors. They do not necessarily represent the views of the International Bank for Reconstruction and Development/World Bank and
         its affiliated organizations, or those of the Executive Directors of the World Bank or the governments they represent.


                                                       Produced by the Research Support Team
A Method to Scale-Up Interpretative Qualitative Analysis, with
    an Application to Aspirations in Cox’s Bazaar, Bangladesh*
    Julian Ashwin, Vijayendra Rao,† Monica Biradavolu, Aditya Chhabra, Arshia Haque,
                             Afsana Khan, Nandini Krishnan




 Originally published in the Policy Research Working Paper Series in May 2022. This
 version was updated in February 2023.
 To obtain the originally published version, please email prwp@worldbank.org.




   * This paper supersedes our previous paper “Qualitative Analysis at Scale: An Application to Aspirations in Cox’s
Bazaar, Bangladesh.” We have developed an open-source Python package to use the methods developed in this paper
which is available here: https://github.com/worldbank/iQual. We would like to thank participants at the CSAE Lunchtime
Seminar, Methods and Measurement Conference 2021, the World Bank’s “Half-Baked” seminar, the October 2022 meeting
of the CIFAR Boundaries, Membership and Belonging Program, the LSE Inequalities Seminar, and Ikechi Okorie for their
useful comments and feedback. Peer Nagi, Eleni Kalamara and Sudarshan Aittreya provided valuable research assistance
for the project. The authors are grateful to the World Bank’s Knowledge for Change Program, and the World Bank-UNHCR
Joint Data Center on Forced Displacement for financial support.
    † Corresponding author: Development Research Group, The World Bank, 1818 H Street NW, Washington DC 20433,
vrao@worldbank.org
1    Introduction

Economists almost never analyze qualitative data. We typically analyze quantitative data from struc-

tured survey questions because they are easier to administer to large representative samples of respon-

dents, and easier to analyze using standard econometric methods. However, many questions of inter-

est to economists may be better captured with open-ended qualitative interviews rather than structured

questionnaires. These include important concepts like well-being, social norms, cultural change, vul-

nerability, resilience, decision-making, processes of change in interventions and experiments, and –

the focus of this paper – aspirations. Structured questions work best on concepts where the possible

range of responses, and follow-up questions, can be predicted in advance by the researcher. They also

require that respondents have the same understanding of the latent construct underlying the question

as the researcher.

    For these reasons, structured quantitative questions do not work well for more complex concepts

where respondents have a heterogeneous understanding of the concept, where responses can be diffi-

cult to predict, and where probes and their range of responses cannot be anticipated in advance. When

structured questions require responses with a number, or a selection from a set of choices, they can re-

sult in metrics that have the appearance of being clearly defined but hide the complexity of the “truth”

(Espeland and Stevens, 1998). Latent constructs that are more subtle and nuanced are, therefore,

arguably better studied with open-ended questions where the respondent is allowed the freedom to re-

spond in an open-ended conversational style and in the manner of their choosing, and where a trained

interviewer can probe an issue in a relatively unstructured manner by iteratively asking follow-up

questions in a more conversational style. This process also has the advantage of eliciting information

that is more “bottom-up” and driven by the respondent rather than designed ex-ante by the researcher.

    Open-ended approaches to interviews have not been employed much by economists because an-

alyzing them is hard and almost impossible to do at scale with statistically representative samples

(Rao, 2022). They are primarily the domain of qualitative researchers in anthropology, sociology

and related fields who mull over recordings or transcripts of interviews for considerable periods of

time, listening, reading, interpreting, and carefully coding them within the context of a theory or con-

ceptual framework. Coding is a labor-intensive process typically done by trained social scientists,


and is an essential step in conducting nuanced analysis of qualitative data that is based on human

interpretation. Interpretative qualitative analysis is consequently associated with very small sample

studies. This small sample challenge that has been intrinsic to qualitative methods has resulted in

a large methodological literature on qualitative and case-study methods focusing on justifying and

interpreting data from interviews gathered from samples that are not designed to be statistically rep-

resentative of larger populations. Their general approach has been to inductively draw out inferences

that reflexively expand our understanding of an issue, or to inform theory, rather than claim statistical

representativeness (Small, 2009).

    This paper outlines a new method to analyze open-ended interviews at scale with statistically

representative samples by combining interpretative human coding and machine learning. The method

attempts to follow the logic of traditional qualitative analysis as closely as possible. Briefly, a

sub-sample of the transcripts of open-ended interviews is coded by a small team of trained coders who

read the transcripts, decide on a “coding-tree” and then code the transcripts using qualitative analysis

software which is designed for this purpose. This human coded sub-sample is then used as a training

set to predict the codes on the full, statistically representative sample. The annotated data on the

“enhanced” sample is then analyzed using standard regression analysis. The methods developed

in our paper are less a major advance in Natural Language Processing (NLP) and machine

learning than a practical contribution to the menu of tools available to economists and

social scientists. We also extensively test the robustness and reliability of this approach. Our methods

allow social scientists to analyze representative samples of open-ended qualitative interviews, and

to do so by inductively creating a coding structure that emerges from a close, human reading of a

sub-sample of interviews that are then used to predict codes on the larger sample. We see this as an

organic extension of traditional, interpretative, human-coded qualitative analysis, but done at scale.
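
The train-then-predict step described above can be illustrated with a minimal sketch. It assumes a multinomial Naive Bayes classifier on word counts, chosen here only because it fits in a few lines of standard-library Python; the paper cross-validates over a range of text representations and classifiers (Section 5), so this is a stand-in and not the authors' model. The example sentences, the function names, and the use of just two codes are hypothetical.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase a transcript and split it into word tokens."""
    return text.lower().split()


def train_naive_bayes(coded_docs):
    """Fit word counts per code from (text, code) pairs coded by humans."""
    class_counts = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for text, code in coded_docs:
        class_counts[code] += 1
        for tok in tokenize(text):
            word_counts[code][tok] += 1
            vocab.add(tok)
    return class_counts, word_counts, vocab


def predict(model, text):
    """Return the most probable code for an uncoded transcript."""
    class_counts, word_counts, vocab = model
    n_docs = sum(class_counts.values())
    best_code, best_score = None, -math.inf
    for code in sorted(class_counts):
        total = sum(word_counts[code].values())
        score = math.log(class_counts[code] / n_docs)  # log prior
        for tok in tokenize(text):
            if tok in vocab:  # words never seen in training are ignored
                # Laplace (add-one) smoothed log likelihood
                score += math.log((word_counts[code][tok] + 1) / (total + len(vocab)))
        if score > best_score:
            best_code, best_score = code, score
    return best_code


# Hypothetical human-coded sub-sample (the training set).
human_coded = [
    ("i want my son to become a doctor", "ambition"),
    ("she should finish school and get a job", "ambition"),
    ("i hope he grows up to be a good person", "aspiration"),
    ("i pray she follows the path of religion", "aspiration"),
]

# Hypothetical uncoded transcripts from the larger sample.
uncoded = [
    "i want her to become a teacher and get a job",
    "i hope she is a good and religious person",
]

model = train_naive_bayes(human_coded)
predicted_codes = [predict(model, text) for text in uncoded]
print(predicted_codes)
```

The predicted codes for the uncoded transcripts would then enter a regression alongside the human codes, which is the sense in which the sample is “enhanced”.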

    This method has several advantages over “unsupervised” NLP methods used for analyzing text

such as topic modeling (which searches for words that occur in clusters in the data) in that it attempts

to hew as closely as possible to traditional qualitative analysis by scaling up the inductive judgement

of informed human coders, rather than having computers make sense of the data. It

also has an advantage over methods such as sentiment analysis which maps text against pre-defined



dictionaries; sentiment analysis can only provide broad assessments of the “sentiments” observed in

the data and is not good for nuanced analysis, and dictionaries in non-European languages are not well

developed. Working with human codes in a sub-set of the data falls in the category of “supervised”

NLP methods – but gives us a training set that is specific to the sample being analyzed, and thus

has the potential for nuanced, context-specific analysis. It is thus analogous to a dictionary created

specifically for the analytic sample. We believe the method has wide applicability for a variety of

questions of interest to economists. In this paper we apply it to study parents’ aspirations for their

children by analyzing data from open-ended interviews conducted on a sample of approximately

2,200 Rohingya refugees and their Bangladeshi hosts in Cox’s Bazaar, Bangladesh.1

    Aspirations are an interesting subject for this method, because an open-ended approach

allows us to study dimensions of aspirations that are difficult to capture in structured questionnaires.

The literature on aspirations in development economics (Fruttero et al., 2021) focuses on what the

philosopher Agnes Callard (2018) has called “ambition” - specific goals that parents may have for

their children such as a level of education, or a profession. Open-ended interviews allow us to expand

this to explore its moral and spiritual dimensions - what Callard calls “aspiration” to distinguish it

from “ambition” - such as being a “good person”, or being religiously inclined. They also allow

us to study what the anthropologist Arjun Appadurai (2004) has called the “capacity to aspire” or

the capability to navigate your way to achieving a given goal. This paper applies the method we

develop to differentiate between, and analyze the correlates of, ambition, navigational capacity, and

aspirations (in Callard’s sense) among Rohingya refugees and their Bangladeshi hosts using open-

ended interviews. It demonstrates that they are independent concepts that have distinctly different

determinants which suggest different policy responses.

    The paper proceeds next by providing a brief overview of the literature on narrative analysis in
    1 Basic human-coding has been used by a few others to create a training set for analyzing large corpora of secondary
text data (e.g. Bonikowski and DiMaggio (2022)). Our contribution in this paper is to develop a method that uses iterative,
inductive and relatively nuanced human-coding, typical of reflexive qualitative research, to train a model that extends these
codes to a representative sample of primary qualitative data collected by the authors. While the sample is much larger than
those analyzed in standard qualitative analysis, it is much smaller than the tens of thousands of text documents usually
analyzed by existing NLP methods. This presents us with some small-sample challenges for NLP that we have attempted to
resolve (Bonikowski and Nelson, 2022). Coding packages such as Atlas-TI can be used to compare the “thematic proximity”
between themes identified by qualitative analysis (Armborst, 2017). While this approach uses the annotations provided by
qualitative analysis, it does not expand the size of the annotated sample as we are proposing.




economics and aspirations in development economics, as well as placing this paper in the context of

the natural language processing literature. Section 3 provides some context to the data - on Cox’s

Bazaar and the process by which the open-ended interviews were conducted and transcribed. Section

4 explains the human coding process – the development of the coding tree, the process of coding val-

idation and checking, and inter-coder reliability. In Section 5 we then move to the NLP methodology

for extending the human coded sample, describing how we cross-validate over text representations

and classifiers. We also include a discussion of the role of machine translation in our analysis. Sec-

tion 6 then discusses a range of tests that assess the value of the enhanced sample we create: testing

for bias, efficiency and interpretability. Section 7 then sets out and discusses results for both the hu-

man and enhanced samples, illustrating the added value of the sample enhancement. Section 8 then

describes a series of experiments in which we assess how the numbers of human and machine anno-

tated documents affect results. We find that for researchers on a limited budget, partially machine

annotating their sample is likely to be optimal. Finally, Section 9 concludes and makes suggestions

for further work. We have developed an open-source Python package called iQual (for Interpretative

Qualitative Analysis) that will facilitate the use of the method.2



2     Narrative Analysis and Aspirations

Narrative Analysis in Economics. The difficulties with using qualitative methods at scale on repre-

sentative samples have led to their largely being neglected in modern economics. There are notable

exceptions, such as the widely used monetary policy shock series developed by Romer and Romer

(2004) that uses detailed readings of central bank minutes and the narrative approach to business

cycles proposed by Shiller (2020). However, the introduction of natural language processing (NLP)

methods has led to a recent focus on using text data in a quantitative manner as an important source

of information in economic research (Gentzkow et al., 2019).3

     Most work in economics that uses text in a quantitative way falls into two categories that, while

relevant in our context, are conceptually quite different from the method we propose: the use of
    2 https://github.com/worldbank/iQual.
    3 This trend is also present in other social sciences; see Ferguson-Cradler (2021) for a discussion of the use of computational
text analysis to identify narratives in economic history.


unsupervised statistical models to reduce the dimensionality of text documents into a set of inter-

pretable variables that are used in further analysis; and the use of dictionary methods to extract a

signal of interest from documents. An example of the former approach in development economics

is Parthasarathy et al. (2019), who use a structural topic model (Roberts et al., 2016) to decompose

the transcripts of village assemblies in rural India. Other examples of such work in non-development

contexts include Hansen et al. (2018), Nimark and Pitschner (2019) and Larsen et al. (2021).

    Dictionary methods are common, particularly for the analysis of sentiment, and a wide range

of general purpose and context-specific lexicons are available. An early example of this is Tetlock

(2007) who uses a psychosocial dictionary to quantitatively measure interactions between media sen-

timent and the stock market. Many economic researchers have proposed context-specific dictionaries

that help them extract their particular signal of interest. Loughran and McDonald (2011) introduce a

dictionary that classifies words as positive or negative in the context of economic news. These dictio-

nary methods are not limited to the analysis of positive vs negative “sentiment”, but have also been

developed to measure a wide variety of other information, for example by Apel and Grimaldi (2012)

to measure the “hawkishness” of central bank communication, by Correa et al. (2017) to measure

financial stability and by Nyman et al. (2021) to measure systemic risk. The influential economic policy

uncertainty index developed by Baker et al. (2016) is also based on a simple dictionary-based method.

The context-specificity of these word lists is of course a limitation as well as an advantage.4 They are

also limited in that they impose a structure on the text features that they try to extract - the presence

or absence of certain sets of words.
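
To make the structure that dictionary methods impose concrete, here is a minimal sketch of a dictionary-based score. The two word lists are hypothetical and are not taken from any of the lexicons cited above; the scoring rule (positive minus negative matches, divided by total matches) is one common convention, assumed here for illustration.

```python
# Hypothetical word lists, for illustration only.
POSITIVE = {"growth", "gain", "improve", "strong"}
NEGATIVE = {"loss", "decline", "weak", "risk"}


def dictionary_score(text):
    """Net tone in [-1, 1]: (positive - negative) / matched words, 0.0 if none match."""
    tokens = [t.strip(".,;:!?").lower() for t in text.split()]
    pos = sum(t in POSITIVE for t in tokens)
    neg = sum(t in NEGATIVE for t in tokens)
    matched = pos + neg
    return 0.0 if matched == 0 else (pos - neg) / matched


score_mixed = dictionary_score("Strong growth offset a small loss.")
score_no_match = dictionary_score("The committee met on Tuesday.")
# Negation is invisible to a pure word-count rule.
score_negated = dictionary_score("Growth was not strong.")
print(score_mixed, score_no_match, score_negated)
```

The negated sentence receives the maximum positive score because only the presence or absence of listed words is counted, which is the limitation noted above.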

    Our approach is to extend a small set of human annotations conducted by qualitative researchers

to a larger representative sample using a model trained on this subsample. We are therefore perhaps

closest to the literature that combines qualitative work with NLP methods. It is quite common to

use a subset of manually classified articles to validate a measure derived from text, e.g. Baker et al.

(2016) and Shapiro et al. (2020), but our focus is on using the manual classifications to construct a

measure.5 Michalopoulos and Xue (2021) use an archive of manually coded motifs in folklore intro-
   4 In fact, Ashwin et al. (2021) suggest that, particularly in a forecasting context, as tailored dictionaries have been
constructed with previously observed events in mind, they do not capture unexpected phenomena as well as general purpose
methods.
   5 There are also recent examples of manually annotating large samples, such as Andre et al. (2021) who use open-ended




duced by Berezkin (2015) and then use NLP to classify these motifs into different concepts. A recent

paper by Jayachandran et al. (2021) is similar to ours in spirit, as they use a subset of manually coded

documents in order to identify which quantitative survey questions best capture women’s agency. Although

their approach is methodologically quite different, the aim is similar: to find a way to scale

up the measurement of nuanced and complex concepts to large samples. In ongoing work Alexander

et al. (2017) conduct a “qualitative census” of poverty in the United States through open-ended inter-

views with a representative sample of poor households, which would be a potential use case of the

methodology we discuss here.

    There is also a related literature outside economics on training supervised models on human

annotations. However, while our focus is on whether and how such methods can assist substantive

economic analysis, this literature typically focuses on either maximizing predictive performance or assisting

an ongoing coding process.6 Yordanova et al. (2019) provide a good summary of the literature that

focuses on predictive performance. Much of this literature aims to show that a particular modelling

approach yields superior predictive performance in these tasks, but that is not our focus in this paper.

Instead, we cross-validate over a wide range of both text representations and classifiers - allowing

the data to determine which modelling approach is optimal in a given context. An application of this

sort of approach in an economic context is Mann and Püttmann (2018), who use a supervised NLP

model to classify whether patents are related to automation. However, to the best of our knowledge, our

paper represents the first attempt to demonstrate that extending samples in this way can add value

in a context of open-ended interviews dealing with nuanced and complex topics that matter to social

scientists.

    Good examples of using NLP to assist the process of human annotation are Liew et al. (2014) and

Wiedemann (2019) who propose an “active learning” approach in which a model is trained on a small

annotated sample to maximize the true positives, which are then corrected manually. Meanwhile,

Karamshuk et al. (2017) use a hybrid approach where they first get a small number of high quality
survey responses to measure narratives about the macroeconomy, but rely on research assistants to annotate their entire
sample.
   6 Furthermore, the text features dealt with here are often quite straightforward, so potentially quite different from con-

cepts like aspiration and navigational capacity. In fact, Crowston et al. (2010) find that simple rule-based algorithms perform
better for many of their text features than their supervised models.




annotations, and then use these to crowdsource a much larger one and train a neural network on this

larger sample. While we think this is potentially a very useful approach, the use of crowdsourced

annotations may not be ideally suited to nuanced and complex concepts. Other work, such as Chen

et al. (2018), focuses on ambiguity and disagreement across coders. This is certainly an important

issue in qualitative work and one where NLP techniques may prove useful, but it is not the focus of our

paper.

    Aspiration, Ambition and Navigational Capacity. There is a thriving literature on aspirations

in development economics that emerged from Debraj Ray’s seminal paper (Ray, 2006) which ex-

tended conventional economic models of human capital investments by arguing that preferences are

not exogenously determined but are social - shaped by what an individual observes around them in their

“cognitive neighborhood” that results in an “aspirations window.” This aspirations window can be

multidimensional and include things ranging from education and income to dignity and good health.

This idea was then extended by Genicot and Ray (2017) and others, reviewed in Genicot and Ray

(2020), to show that socially determined aspirations can fundamentally affect issues that range from

education and mobility to collective action and conflict. The development of theory has gone in par-

allel with a thriving empirical literature (Fruttero et al., 2021) that analyzes how aspirations matter in

a variety of important spheres, and particularly in educational and labor market investments.

    The empirical literature is based on quantitative measures of aspirations using structured question-

naires and, perhaps consequently, does not delve into broader dimensions of aspirations that Ray first

talked about such as dignity or cultural heritage which are more difficult to measure. It also misses an

important point first made by the anthropologist Arjun Appadurai (2004) that aspirations are affected

not just by an individual’s ability to imagine a different future for themselves or their children, and

by the economic resources that they can draw on, but also by their “capacity to aspire” which is a

cultural and cognitive resource that allows them to navigate their way to a better future. Furthermore,

more recently, the philosopher Agnes Callard has argued that it is important to distinguish between

what she calls “ambition” and “aspiration” (Callard, 2018). She defines an “aspiration” as a process

of reversing a “core value” that results in a “change in the self.” An “ambition” to her is a specific goal

that “she is fully capable of grasping in advance of achieving it” (Callard, 2018, page 229).



Ambition, to her, is “often directed at those goods – wealth, power, fame – that can be well appreciated

even by those who do not have them.” By Callard’s definition, the economist’s understanding of aspi-

ration is more in line with what she would call “ambition” rather than “aspiration”, a distinction that

we adopt in this paper as well.

    These distinctions are not just semantic. They have implications for measurement. Navigational

capacity, being a cognitive and culturally determined capacity, is likely to be less amenable to struc-

tured questions because responses are not easy to predict in advance. Similarly, aspirations

in Callard’s sense, as transformative processes, are potentially conceived very differently by different

individuals, who thus have heterogeneous understandings of the latent concept – which also makes them

difficult to study with structured questionnaires.

    These distinctions could also have potentially important implications for policy – if navigational

capacity matters it could suggest that interventions to improve cognitive ability might matter, as might

interventions to guide less advantaged people towards achieving their goals. If aspirations matter in a

way that is different from ambition, it might be important to distinguish between them in understand-

ing how people might invest time and resources in achieving aspirations vs ambitions, and – perhaps –

in designing interventions that, for instance, are delivered by cultural or faith-based institutions rather

than government.



3    Data

The data analyzed in this paper is from Cox’s Bazaar in Bangladesh where about 750,000 Rohingya

refugees who were forcibly displaced from Myanmar in 2017-18 are primarily housed. The

challenges faced by displaced populations and hosting communities go beyond monetary or monetiz-

able welfare measures such as food consumption and security, household expenditures, labor market

outcomes and earnings, and basic living standards. Particularly in contexts of forced displacement

outside the country of origin, the displacement experience is often accompanied by reliance on hu-

manitarian assistance, lack of documentation, limited or no access to labor markets and services, and

limited mobility, at least in the short term.

    Host communities, at the same time, face a sudden influx of population, increasing pressure on

scarce local resources (land, jobs and services, for instance), fears of insecurity and illicit activities, and

risks to the social cohesiveness of their communities. To the extent that displaced populations move

into poorer or lagging hosting areas, with limited capacity to adjust, these pressures may exacerbate

pre-existing challenges to welfare and socio-economic mobility among the host community.

    The 2017 influx of the Rohingya from Myanmar to Bangladesh has remained overwhelmingly

concentrated in the border district of Cox’s Bazaar. It has implied a massive increase in localized

density in the two primary hosting sub-districts of Teknaf and Ukhia, which were already lagging

compared to the rest of the district in terms of human capital, access to services and jobs in growing

sectors, and reliance on low productivity agriculture and service sector jobs. While humanitarian

assistance has been largely successful in meeting the basic needs of the displaced Rohingya in terms

of food, shelter and water, sanitation and hygiene, like many other forcibly displaced populations,

they continue to face challenges in terms of access to formal education for their children, restrictions

on freedom of movement and limited livelihood options. Our survey has three rounds: a baseline

survey and then two further rounds of open-ended interviews. The baseline survey of 5,020 ran-

domly selected households from the Cox’s Bazaar population, split evenly between Rohingya and

their Bangladeshi hosts, was conducted between April and August 2019 (World Bank, 2019). It

consisted of two modules:

   1. A household questionnaire, primarily administered to an adult member of the household (age

      > 15) who is knowledgeable about the household’s day-to-day activities. The household ques-

      tionnaire included modules on household roster and composition, housing characteristics, food

      security, consumption, household income, sources of assistance, assets and anthropometrics for

      children under 5.

   2. An adult questionnaire administered to two randomly selected adult members of the household

      (age > 15) about their individual information and experiences. This included modules on labor

      market and labor market history, history of migration, access to health services, crime and

      conflict and mental health.

The qualitative, open-ended, questions were conducted in two subsequent survey rounds in October

to December 2020 and May and July 2021. We will refer to these three waves as Round 1, Round

2 and Round 3, where Round 1 included the baseline quantitative survey, and Round 2 and Round 3

feature the open-ended interviews.

    For the qualitative interviews, we attempted to obtain information from a random sample of 25%

of the full sample (i.e. 1,255 households) in Round 2 and 50% of the baseline sample (i.e. 2,500

households) in Round 3. Some households we contacted were deemed ineligible because they did

not have any children, other households refused to be interviewed, and some of the recordings were

inaudible because of phone network disruptions. With this we have a completed sample of 1,040

interviews in Round 2, and 2,038 interviews in Round 3. Of the 3,078 interviews conducted we

restrict ourselves in this analysis to households whose eldest child lived with them and was still

of school-going age. This allows for a meaningful interview on the parent’s aspirations for the child,

and to link the child being referred to in the open-ended interview to their individual characteristics in

the baseline data. With this we lose about 901 interviews leaving us with 2,177 for the analysis. Round

2 interviews on aspirations lasted around 15 minutes on average. Round 3 interviews were longer as

they covered two additional domains; however, to be consistent across the two rounds, the

questions on aspiration were the same as in Round 2.7 Both sets of open-ended interviews in the two

rounds were conducted over the phone.

    Interviews began with a short quantitative, structured questionnaire to elicit the households’ edu-

cational ambitions for their children, which included a few questions on the impact that COVID had

on children’s education. After extensive pre-testing and piloting, the final qualitative interview proto-

col that followed at the end of the short education module consisted of the following two questions:

   1. Can you tell me about the hopes and dreams you have for your children?

   2. What have you done to help them achieve these goals?

    7 The additional modules on well-being and inter-group relations extended the total interview duration to as long as

40 minutes. We leave an analysis of these additional modules for future work.

Round 2 qualitative data was collected by five interviewers, supervised jointly by a local survey firm

and a subset of the authors of this paper. Interviewers for hosts were required to be verbally proficient

in the local Chittagonian dialect, and those interviewing the refugees were also required to be

familiar with the Rohingya dialect. Interviewers who had participated in Round 2 were hired again

for Round 3, supplemented by additional interviewers; Round 3 data was collected by a total of 12

interviewers (5 males and 7 females). Several days of training including practice mock sessions were

conducted before both rounds. The primary contents covered in the training included: i) an overview

of qualitative interviews to guide interviewers on the importance of probes and the usage of respon-

dents’ own words to ask follow-up questions; ii) Qualitative Interview Ethics to reiterate principles

of interviewing such as right to privacy of personal information; and iii) Probing Exercises which

required each interviewer to pen down examples of “leading” versus “good” probes. Additionally,

throughout the data collection in both rounds, interviewers participated in debriefing sessions, which

allowed interviewers to brainstorm with the full team on appropriate interview techniques and best

practices for responding to any ethical challenges.


                                        [Table 1 about here.]


   A real-time dashboard demonstrating daily interview attempts, completed interviews, average

interview duration and similar tracking components was developed and used to guide interviewers on

pace and quality. The duration spent interviewing each of the domains was used as a quality flag.

When the duration of an interview was significantly different from the average, the recordings were

sent to supervisors for a thorough check. The supervisors and an external local language expert

were each randomly assigned 5 recordings per day to check. Their aggregated comments were

later taken to debriefs to discuss scope and specific areas for improvement.

   The following pipeline was put in place to produce clean transcripts of interviews. First, 12

interviewers conducted qualitative interviews using SurveyCTO and its built-in recording features.

Second, 16 transcribers prepared handwritten Bengali verbatim transcripts of the audio. Handwritten

transcripts were then typed and a team listened to randomly selected audio recordings and checked

for mismatches, typing and spelling errors. Third, a CATI system developed solely for uploading

transcripts was used to submit the typed transcripts. The Bengali transcripts were then translated into

English using the Google Translate API. A team of 12 translators appointed by the local firm were

additionally used to manually translate the Bengali transcripts into English. A smaller subset was

subsequently employed to correct the machine translated transcripts.



      While the interviews were conducted in Bengali, we work with English translations of the tran-

scripts. The qualitative coding described in the next Section was performed on the machine trans-

lations of the transcripts that had been manually corrected. We discuss the merits of machine vs

human translation in more detail in Section ??. Across both Round 2 and Round 3, the interviews

are on average 12.6 distinct question-answer pairs long, with each answer made up of 13.7 words on

average.

      In addition to the open-ended interviews, we also use several quantitative variables from the

baseline survey on household characteristics. Table 1 shows summary statistics for these variables.



4      Qualitative Analysis

4.1     Coding Tree

The development of the “coding tree” for the qualitative annotation exercise comprised two distinct

steps. First, co-author Vijayendra Rao [henceforth, VR] employed a concept-driven or deductive

approach in defining three broad categories, Aspiration, Ambition and Navigational Capacity, as the

primary response classification goals. Second, a classical inductive approach was employed

by three co-authors (Arshia Haque [henceforth, AH], Afsana Khan [AK], Monica Biradavolu

[MB]), who conducted a focused reading exercise on a sub-sample of 40 transcripts to produce 21

sub-codes and their respective definitions. We ensured that this initial reading included transcripts of

male and female, and host and refugee respondents to maximize the diversity in probable sub-codes

at a very early stage. With an annotation sample as large as 400 for each of the two rounds, the

inductive approach we followed substantially improved coding efficiency by limiting the discovery

of new codes during annotation, and thereby the time needed to revisit previous transcripts to annotate those

additions.

      Using Atlas-TI, a qualitative data analysis software, a two-person team (AH and AK) coded 789

transcripts. 400 interviews (comprising 50% host and 50% refugee) were randomly drawn from the

1,040 transcripts to be coded in Round 2. A further 400 interviews, again equally split by refugee

status, were randomly drawn from the 2,038 transcripts in Round 3. Of these 800 allocated

interviews, 11 were left uncoded due to either poor audio leading to missing data, call drop-offs, or very

short responses with no plausible code applicability. Coders were asked to annotate interviews at the

question-answer pair level to preserve granularity while being able to replicate the sub-division of

interviews in the unannotated documents.

                                       [Figure 1 about here.]

   Figure 1 shows the coding tree. The qualitative distinctions between aspiration and ambition

were adapted in this paper to the context and nature of the “dreams” parents expressed for their

children. For example, concrete and measurable dreams for a child (e.g. wishing a child would become

a doctor, teacher, entrepreneur, or reach specific educational goals) were classified as ambition,

while intangible, value-oriented goals (e.g. wishing the child to live with dignity or be a good human

being) were classified as aspiration. Aspirations, following Callard’s definition, were divided into

“Religious” and “Secular”. Ambition was divided into seven major categories: Education (further

sub-coded into High, Low, Neutral and Religious), Salaried Employment, Marriage, Entrepreneurship,

Migration, Vocational Training, and No Ambition. While ambition and aspiration came up at any

point in an interview, “capacity to aspire” or Navigational Capacity only appeared in response to the

third question of the instrument, i.e. “what have parents been able to do to fulfill dreams for their

children?” Navigational Capacity was coded into six sub-codes, including Low and High “Ability” and Low

and High “Budget”. There were also three additional codes that did not fit into the structure of

aspiration, ambition and navigational capacity. These additional codes were for Covid Impacts, Public

Assistance and Worries/Anxieties.

   Descriptions and examples of these codes are displayed in Appendix A.1, but Figure 2 includes a

few examples to illustrate some differences between aspirations, ambition and capacity.

                                       [Figure 2 about here.]

   Atlas-TI software was used to set up the human coded database. The data was first organized fol-

lowing Atlas-TI’s manual on ‘column control via field name prefixes’ to name each of the documents

using their Case IDs as well as to group the documents into preferable segments before using ‘sur-

vey import’ into Atlas Desktop. The project was then set up on the cloud version of Atlas-TI where

both coders could work independently. To review coded excerpts, projects were imported back into

Atlas’ desktop version to generate an Excel spreadsheet with desired variables and quotation sheets

segregated by codes.

      We follow a standard approach to ensuring cross-coder agreement, with each interview being

reviewed and any disagreements resolved through discussion between AH, AK and MB. Further details

on this are given in Appendix A.2.


4.2     First look at the annotations

While we will compare the human annotated sample to our enhanced sample in detail throughout the

remainder of the paper, Table 2 shows some summary statistics for the human annotations. We see

that annotations are very sparse at the question-answer pair level (for example only 3.0% of question-

answer pairs are annotated as Religious Aspirations). However, when aggregated to the interview

level there is much less sparsity (for example, 23% of interviews have at least one question-answer

pair annotated as Religious Aspirations). There are also notable differences between rounds, which

should not be due to differences in coding as the same coders and coding tree were used across rounds.

A decrease in the question-answer pair level proportion is at least partly due to the longer interviews,

but differences in the proportion of interviews with at least one positive are plausibly due to changes

in circumstances/attitudes over the intervening year (which of course included a global pandemic).

For example we see an increase from 14.9% of interviews in R2 mentioning Covid Impacts to 27.2%

of interviews in R3.
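The gap between sparsity at the question-answer pair level and at the interview level can be illustrated with a minimal sketch. The interview IDs and labels below are made up for illustration and are not taken from Table 2; the two helper functions are hypothetical names, not part of any package used in the paper.

```python
from collections import defaultdict

# Toy QA-level annotations: (interview_id, 1 if the QA pair carries the code else 0).
# All values are invented for illustration only.
qa_labels = [
    ("int_1", 0), ("int_1", 1), ("int_1", 0), ("int_1", 0),
    ("int_2", 0), ("int_2", 0), ("int_2", 0),
    ("int_3", 0), ("int_3", 0), ("int_3", 0),
]


def qa_level_share(labels):
    """Share of QA pairs that carry the code."""
    return sum(y for _, y in labels) / len(labels)


def interview_level_share(labels):
    """Share of interviews with at least one coded QA pair."""
    any_positive = defaultdict(int)
    for interview_id, y in labels:
        any_positive[interview_id] = max(any_positive[interview_id], y)
    return sum(any_positive.values()) / len(any_positive)


print(qa_level_share(qa_labels))         # 1 of 10 QA pairs is coded
print(interview_level_share(qa_labels))  # 1 of 3 interviews has a coded pair
```

The same code appears in a much larger share of interviews than of QA pairs, and adding uncoded QA pairs to an interview (a longer interview) lowers the first share without changing the second.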


                                         [Table 2 about here.]



5     Methodology

In this Section, we describe the NLP modeling approach we use to scale up our sub-sample of human

annotations to the whole corpus of interviews. First, we describe in general terms how our strategy

of enhancing a human coded sample with NLP works. Second, we provide some greater detail and

discussion on the options for supervised models, text representations and training method we use.


In the following Section, we then explain how to assess the value and performance of this enhanced

sample.


5.1     Approach

The overall goal of our approach is to use our subset of annotated interviews to provide reliable

annotations for the remainder of the sample. Broadly, we do this by training a series of classifier

models on our annotated set and then using these models to predict annotations for the unannotated

set. More concretely, for a total of $N$ interviews, let $N_h$ be the number for which we have high-quality

human annotations and $N_m = N - N_h$ the number of interviews which have not been human annotated.

Our goal is to create an “enhanced” sample in which we retain the $N_h$ human annotations but add

machine annotations for the remaining $N_m$ interviews.


                                         [Figure 3 about here.]


      We train and predict for each of the 24 annotations separately, so the model for Religious Aspira-

tion will be trained and make its predictions separately from the model for Secular Aspiration. Fur-

thermore, as mentioned in Section 4, the qualitative annotations are defined at the level of question-

answer pairs (QA). This allows us to represent each annotation as a binary classification problem at

the QA level.
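As a minimal sketch of what "one binary classification problem per annotation" means in data terms, the snippet below turns a set of codes attached to each QA pair into one 0/1 target vector per annotation. The interview IDs, answers, code names and the assignment of codes to answers are illustrative placeholders, and `binary_targets` is a hypothetical helper rather than a function from the authors' package.

```python
# Each QA pair keeps the set of codes a human coder attached to it.
# Texts, IDs and code assignments are invented for illustration.
qa_pairs = [
    {"interview": "int_1", "answer": "I want my son to become a doctor.",
     "codes": {"ambition_salaried_employment"}},
    {"interview": "int_1", "answer": "Above all he should be a good human being.",
     "codes": {"aspiration_secular"}},
    {"interview": "int_2", "answer": "I want her to study the Quran and be pious.",
     "codes": {"aspiration_religious", "ambition_education_religious"}},
    {"interview": "int_2", "answer": "We could not do much this year.",
     "codes": set()},
]

annotation_names = [
    "aspiration_religious",
    "aspiration_secular",
    "ambition_salaried_employment",
    "ambition_education_religious",
]


def binary_targets(pairs, names):
    """Return {annotation: [0/1 per QA pair]}, i.e. one binary problem per annotation."""
    return {name: [int(name in pair["codes"]) for pair in pairs] for name in names}


targets = binary_targets(qa_pairs, annotation_names)
for name, y in targets.items():
    print(name, y)
```

Each target vector can then be paired with the same QA texts and passed to its own classifier, so that a QA pair carrying several codes is positive in several of the binary problems.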

      Figure 3 illustrates our overall methodology for a single annotation. On the left hand side we

see a “human” sample of size $N_h$, in which interviews include both text $w$ and annotations $y$, and

a “machine” sample in which interviews include only the text. As annotations are defined at the

QA (question-answer pair) level, we represent $w^h_{i,s}$ as the $s$th QA in interview $i$ in the human

sample, with $y^h_{i,s}$ being the binary annotation on that QA. In other words, if the annotation is Religious

Aspiration, $y^h_{i,s}$ will be equal to one if that QA has been annotated as displaying religious aspirations,

and will be zero otherwise.

      We then train some classifier f() parameterised by $\theta$ to predict $y^h_{i,s}$ based on the QA text $w^h_{i,s}$.

As we will discuss below, there are many options for both the classifier we can use here and

how to represent the text numerically. A key point here is that the text representation must be

fully unsupervised, i.e. we do not use any information about $y$ or any further information about

the interview subject when creating a numerical representation of the text. The text representation,

classifier and a variety of hyperparameters are chosen using k-fold cross-validation, as we discuss

in Section 5.2. Given this trained classifier we can then predict annotations at the QA level for our

unannotated “machine” sample. This gives us the predicted annotations $\hat{y}^m_{i,s}$.

   Training at this more granular level, rather than at the level of the whole interview has two advan-

tages. Firstly it allows for our qualitative coders to be more precise in their annotation: potentially

picking up multiple contradictory signals within a single interview, or allowing a comparison of the

frequency with which a signal appears within interviews. Secondly, it gives our NLP models a greater

number of more precise observations on which to be trained, while splitting up the interviews in a

way that we can replicate in the unannotated sample. If the documents were not in a question-answer

interview form, the annotation and training could be done at the sentence or paragraph level to give

similar advantages.

   We then aggregate the QA level annotations to the interview level using aggregation function

g(). The choice of this aggregation function is at least in part a substantive question that depends

on the research question. For example, if we take the mean value of y across QA pairs for each

interview this gives us a measure of the intensity with which this concept comes up. On the other

hand, if we take the maximum value across the interview this gives us a measure of interviews in

which this concept comes up at least once. We perform this aggregation for the observed human

annotations $Y^h$, the “in-sample” predicted human annotations $\hat{Y}^h$ and the “out-of-sample” predicted

machine annotations $\hat{Y}^m$. The predicted annotations for the human sample can then be used to assess

the measurement errors introduced by the model. Particularly for the quantification of measurement

errors, we make extensive use of bootstrapping, but as this is conceptually separate from the core

intuition of our method, we leave a discussion of this to Section 5.2. The observed human annotations

and machine annotations are then combined to give an enhanced sample $\tilde{Y}$. Once we have verified

that the enhancement does indeed add value, we proceed with substantive analysis.
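The aggregation step and the construction of the enhanced sample can be sketched as follows. This is an illustrative reading of the description above, not the authors' implementation: `aggregate` and `enhanced_sample` are hypothetical names, the QA-level values are invented, and the `source` field is an assumption added so that human and machine annotated interviews remain distinguishable downstream.

```python
from statistics import mean


def aggregate(qa_values, how="mean"):
    """Aggregation function g(): collapse QA-level 0/1 values to one interview-level value."""
    if how == "mean":   # intensity with which the concept comes up
        return mean(qa_values)
    if how == "max":    # whether the concept comes up at least once
        return max(qa_values)
    raise ValueError(f"unknown aggregation: {how}")


# Invented QA-level values: observed human annotations and predicted machine annotations.
human_qa = {"int_1": [0, 1, 0, 1], "int_2": [0, 0, 0]}
machine_qa = {"int_3": [0, 0, 1, 0, 0]}


def enhanced_sample(human, machine, how="mean"):
    """Stack human and machine annotated interviews, recording where each value came from."""
    rows = []
    for source, sample in (("human", human), ("machine", machine)):
        for interview_id, qa_values in sample.items():
            rows.append({
                "interview": interview_id,
                "source": source,
                "value": aggregate(qa_values, how),
            })
    return rows


for row in enhanced_sample(human_qa, machine_qa, how="mean"):
    print(row)
```

Switching `how` from `"mean"` to `"max"` changes the substantive meaning of the interview-level variable without touching the QA-level classifier.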

   We can then assess the value of this enhanced sample, as described in more detail in Section 6.

Broadly speaking, we test whether the enhancement introduces a bias, whether it increases efficiency



(i.e. reduces standard errors) and whether it increases the interpretability of substantive analysis. This

is an important step, as any interpretation of substantive results should be done with these assessments

in mind. Finally, we can use our larger enhanced sample for substantive analysis, taking advantage

of the larger sample size to identify effects that would not be observable with only the human anno-

tated sample. We describe this analysis and our results around ambition, aspiration and navigational

capacity in Section 7.


5.2       Modelling choices and Bootstrapping

There are many possible options for the numerical representation of the text $w$, the

classifier f() and the aggregation function g(). While the choice of aggregation function is something

that we leave to the researcher’s discretion, we use cross-validation to select the text representation

and the classifier. As we train the classifier for each annotation independently, this allows for the fact

that a different classification model or text representation may be optimal for different annotations.

Appendix B.1 gives an exhaustive list of the text representations, models and hyperparameters that

are selected over during cross validation.

      Cross validation. As we are working with the QA pair level data we have 9,964 distinct observa-

tions in the human annotated sample, which come from the 789 annotated interviews. In our baseline

case, we use the entire annotated sample as a training set and split it into three folds for cross

validation.8 We then use a combination of a grid search and the Optuna hyperparameter tuning framework

(Akiba et al., 2019) to choose the text representation, classifier and hyperparameters of that classifier

that give the best validation set performance (as measured by the F1 score).
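The selection step can be sketched with scikit-learn alone. This is a simplified stand-in, not the authors' pipeline: it uses `GridSearchCV` in place of the combination of grid search and Optuna, a toy corpus with invented texts and labels, and a much smaller search space than the one listed in Appendix B.1. The step names `vectorizer` and `classifier` are arbitrary.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline

# Toy QA texts with an invented binary annotation (1 = concrete ambition).
texts = [
    "I want him to become a doctor",
    "she should be a teacher one day",
    "I hope he gets a government job",
    "my dream is that she becomes an engineer",
    "he will study and become a doctor",
    "I want her to be a teacher",
    "we pray that he lives with dignity",
    "I want her to be a good human being",
    "he should be honest and respect elders",
    "may she live a pious and peaceful life",
    "I hope he is a good and honest person",
    "she should live with dignity and respect",
]
labels = [1] * 6 + [0] * 6

pipeline = Pipeline([
    ("vectorizer", CountVectorizer()),
    ("classifier", LogisticRegression()),
])

# Text representation, classifier and hyperparameters are all part of the search space.
vectorizers = [CountVectorizer(), TfidfVectorizer()]
param_grid = [
    {
        "vectorizer": vectorizers,
        "vectorizer__ngram_range": [(1, 1), (1, 2)],
        "classifier": [LogisticRegression(max_iter=1000)],
        "classifier__C": [0.1, 1.0, 10.0],
    },
    {
        "vectorizer": vectorizers,
        "vectorizer__ngram_range": [(1, 1), (1, 2)],
        "classifier": [RandomForestClassifier(n_estimators=50, random_state=0)],
    },
]

# Three folds, scored by F1; for classifiers scikit-learn stratifies the folds by default.
search = GridSearchCV(pipeline, param_grid, scoring="f1", cv=3)
search.fit(texts, labels)

print(search.best_params_)
print(search.best_score_)
```

Running one such search per annotation lets each annotation end up with its own representation and classifier, which is the flexibility the text argues for.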


                                             [Figure 4 about here.]


   8 In Section 8 we show how varying the size of the annotated sample affects performance, which allows fully out-of-

sample analysis.

      Text representations. In order to use the text of the QA pairs as inputs in a classifier, we need

to represent them numerically. There are many possible ways to do this, but we select over several

commonly used text representations. We allow the text representation to vary along three dimensions,

illustrated by the first three panels of Figure 4, which show the proportion of bootstraps in which each

representation is chosen.

   1. We include the answer of each QA pair only, or both the question and answer. As shown in the

      first panel of Figure 4, in most cases only the answer is selected. However, for some annotations

      like Education Neutral, both the question and answer are usually included.

   2. The text representation can be based on either the English translation or a transliteration of

      Bengali into Latin characters. As shown in the second panel of Figure 4, in most cases the

      English translation performs better.

   3. We select over a range of approaches to transform the text into numerical vectors. These include

      simple vectors based on phrase counts such as the CountVectorizer and TfidfVectorizer as well

      as vectors based on pre-trained word embedding models, described in detail in Appendix B.1.

      When using count based metrics we allow for both single and two word phrases.

By allowing selection over text representations each time we train a classifier, we allow for the

fact that the text representation that best captures relevant text features can vary across the different

annotations in our data.
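Two of the three dimensions above (answer only versus question and answer, and the choice of count-based vectorizer with one- and two-word phrases) can be made concrete with a short sketch. The answers are invented, `qa_text` is a hypothetical helper, and the choice between English and transliterated text is not shown because it only changes which strings are passed in.

```python
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer

# The two protocol questions, each paired with an invented answer.
qa_pairs = [
    {"question": "Can you tell me about the hopes and dreams you have for your children?",
     "answer": "I want her to be a good human being"},
    {"question": "What have you done to help them achieve these goals?",
     "answer": "I pay for a private tutor every month"},
]


def qa_text(pair, include_question):
    """Dimension 1: use the answer only, or the question followed by the answer."""
    if include_question:
        return pair["question"] + " " + pair["answer"]
    return pair["answer"]


answers_only = [qa_text(p, include_question=False) for p in qa_pairs]
question_and_answer = [qa_text(p, include_question=True) for p in qa_pairs]

# Dimension 3: count-based vectors over single words and two-word phrases.
count_vectorizer = CountVectorizer(ngram_range=(1, 2))
counts = count_vectorizer.fit_transform(answers_only)

tfidf_vectorizer = TfidfVectorizer(ngram_range=(1, 2))
tfidf = tfidf_vectorizer.fit_transform(question_and_answer)

print(counts.shape, tfidf.shape)
print(sorted(count_vectorizer.vocabulary_)[:5])
```

Both vectorizers are fitted on the text alone, with no reference to the annotations, which is what the requirement of a fully unsupervised representation amounts to here.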

    Classifiers. As with numerical representations of text, there are many choices of classification

model available. Our goal is not to argue for a certain model or approach to the classification task,

but rather to argue for flexibility as different models will perform better in different contexts. We

thus select over a range of popular classification models including logistic regression, random forests,

support vector machines and neural networks (see Appendix B.1). Each of these models has a separate

set of hyperparameters that are chosen through k-fold cross validation. We then compare the

validation set performance of each model and choose that which performs best. Unsurprisingly, given

the sparse nature of our annotations and the small training set, we find that simpler classifiers such as

a random forest and logistic regression outperform larger models such as neural networks.

    As we will conduct our substantive analysis at the household level, we aggregate the annotations

into interview-level variables. There are of course different ways to do this, i.e. different choices of

the aggregation function g(). For the sake of clarity, in the remainder of this paper we will use the

mean annotation value across QA pairs within an interview. This gives us a quasi-continuous measure

between 0 and 1, where 0 would denote that the annotation does not appear at all in the interview and 1

would denote that every QA pair in the interview has that annotation. This mean aggregation therefore

gives a measure of the intensity with which an annotation appears in an interview, while controlling

for the interview’s overall length.

    Bootstrapping. We use two forms of bootstrapping to account for uncertainty in our predicted

annotations. Firstly, when using the entire available training set we re-train the models with a different

draw for the validation set split and any stochastic processes involved in training the model. Secondly,

as described in more detail in Section 8, we also train models on a subset of the training data which

is sampled without replacement.
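The two forms of resampling can be sketched with the standard library. This only generates the splits and subsets; the model training that would follow each draw is omitted. The function names are hypothetical, and drawing subsets at the interview level is an assumption, since the text does not state the unit that is sampled.

```python
import random


def fold_assignment(n_obs, n_folds, seed):
    """Randomly assign each observation to one of n_folds equally sized validation folds."""
    rng = random.Random(seed)
    folds = [i % n_folds for i in range(n_obs)]
    rng.shuffle(folds)
    return folds


def training_subset(unit_ids, subset_size, seed):
    """Draw a subset of training units without replacement."""
    rng = random.Random(seed)
    return rng.sample(list(unit_ids), subset_size)


# First form: the full training set, with a new validation split per bootstrap run.
# The same seed would also be passed to any stochastic model training step.
bootstrap_splits = [fold_assignment(n_obs=12, n_folds=3, seed=b) for b in range(5)]

# Second form: a smaller training set per run, sampled without replacement.
interview_ids = [f"int_{i}" for i in range(100)]
bootstrap_subsets = [training_subset(interview_ids, subset_size=30, seed=b) for b in range(5)]

print(bootstrap_splits[0])
print(bootstrap_subsets[0][:5])
```

Varying `subset_size` in the second form is what allows performance to be traced against the size of the human annotated training set.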

    By using validation set performance to select over such a wide range of text representations and

classifiers, we seek to demonstrate that the specifics of the supervised model used to extend the sample

are not central to our approach. In fact, we advocate being flexible over these details, as the optimal text

representations and classifiers will differ across contexts. Our methodology is implemented in our

Python package iQual.

    Performance. Figure 5 shows the validation set performance, as measured by the F1 scores,

across each annotation and bootstrap run.9 As a natural benchmark, the annotation sparsity which

corresponds to performance under random guesses is shown in red. In all cases our text-based models

do much better than random guesses, suggesting that our enhanced sample will add value over the

human annotated sample. It is worth noting however that there is considerable heterogeneity across

annotations. In particular, annotations that are associated with less concrete concepts, such as Aware-

ness Information Low, No Ambition and Vague Non Specific, appear to be more difficult to

predict accurately. Furthermore, it is important to note that in all cases performance is imperfect -

we are introducing additional measurement error so we need to verify that the sample enhancement

is still worth it.
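Why annotation sparsity serves as the random-guess benchmark for the F1 score can be seen from a short calculation. It rests on an assumption that is ours rather than the text's: a random guesser labels each QA pair positive with probability $q$, independently of the truth, while a share $p$ of QA pairs is truly positive. In a large sample the expected precision and recall are then approximately

```latex
\text{precision} \approx \Pr(y = 1 \mid \hat{y} = 1) = p,
\qquad
\text{recall} \approx \Pr(\hat{y} = 1 \mid y = 1) = q,
\qquad
F_1 = \frac{2 \cdot \text{precision} \cdot \text{recall}}{\text{precision} + \text{recall}}
    \approx \frac{2pq}{p + q}.
```

Guessing at the base rate, $q = p$, gives $F_1 \approx p$, so under this assumption a classifier with no information scores roughly the annotation's sparsity, and sparser annotations have a lower bar to clear.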

                                                 [Figure 5 about here.]

   9 As we will show in Section 8, validation set performance is a good guide for true out of sample performance.

    Translation. In our main results we allow our model to select between machine translations

of our transcripts into English and a transliteration of the Bengali transcripts into Latin characters.

In Appendix B.2 we explicitly compare the validation set performance of these representations along

with a human translation of the interviews into English and the raw Bengali transcripts in their original

Bengali script. We find that the machine English translation outperforms the human translation in

most cases and in all cases the transliterated Bengali outperforms models trained on the raw Bengali

transcripts.10 This may be because in translating or transliterating the transcripts, we reduce some

variance in the text while preserving the relevant content. Additionally, machine translations may be

preferable to human translations because they will be more consistent across documents.



6      Assessing the Value of the Enhanced Sample

By enhancing our human annotated sample we increase the sample size, but introduce an additional

source of measurement error. A priori, we therefore do not know if the enhanced sample has added

value. Fortunately, we can assess the value of our enhanced sample once we have created it. By

quantifying the measurement errors in our validation sets and comparing results in the human and

enhanced sample we can assess whether our enhanced sample adds value along three dimensions:

bias, efficiency and interpretability.

                                               [Figure 6 about here.]

      Bias. If our machine annotations introduce a sizeable bias, this is obviously a problem for any

later analysis (e.g. if we always over-predict Secular aspirations for refugees we might get misleading

results in the enhanced sample). We therefore need to verify that any results in our enhanced sample

are not driven by biases introduced in the predictions. In order to do this we test the association

of prediction errors with household characteristics for each interview described in Table 1. We use

the validation set predictions, and regress the implied prediction errors on a range of household

characteristics. The F-statistic of these regressions tests whether there is evidence of a significant

relationship between household characteristics and the prediction errors and so forms a natural test

of bias.
    10 The average validation F1 scores across all annotations are 0.558 for Machine translation, 0.542 for Transliteration,
0.535 for Human translation and 0.420 for Raw Bengali.
    Figure 6 shows the F-statistics for this bias test across each annotation. The test is carried out

for each bootstrap iteration (shown as the hollow points) as well as for the mean value across bootstraps

(the solid points). The colour of each point indicates the significance level. A statistically significant F

statistic here indicates that there may be a bias in the prediction errors that is related to the household

characteristics.
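A minimal sketch of such a bias test is given below, using synthetic data. It computes the classical F-statistic for the joint null that no household characteristic predicts the prediction error; whether the paper uses this classical form or a robust variant is not stated, so this choice is an assumption, and `regression_f_statistic` is a hypothetical helper.

```python
import numpy as np


def regression_f_statistic(errors, characteristics):
    """Classical F-statistic for H0: all slopes are zero in an OLS of errors on characteristics."""
    y = np.asarray(errors, dtype=float)
    X = np.asarray(characteristics, dtype=float)
    n, k = X.shape
    design = np.column_stack([np.ones(n), X])          # add an intercept
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)  # OLS coefficients
    residuals = y - design @ beta
    rss = residuals @ residuals                        # unexplained variation
    tss = ((y - y.mean()) ** 2).sum()                  # total variation
    return ((tss - rss) / k) / (rss / (n - k - 1))


# Synthetic example: two household characteristics for 200 interviews.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
noise = rng.normal(size=200)

f_unbiased = regression_f_statistic(noise, X)                 # errors unrelated to X
f_biased = regression_f_statistic(noise + 1.0 * X[:, 0], X)   # errors depend on X

print(f_unbiased, f_biased)
```

The statistic would be compared with the critical value of an F distribution with k and n - k - 1 degrees of freedom; a large value, as in the second case, signals that prediction errors vary systematically with household characteristics.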

    In three cases there is evidence at the 5% level of a relationship between household characteristics

and prediction errors - No Ambition, Education Neutral and Awareness Information Low, so we can

look at these in more detail. The regressions in question, shown in Appendix C, indicate that No

Ambition, Education Neutral and Awareness Information Low are all under-predicted for refugees

(i.e. the prediction errors are positive) and that Awareness Information Low is over-predicted for

more educated parents. We will need to bear this in mind in our substantive analysis discussed in

Section 7.

    In addition to explicitly testing for bias, we also include a dummy variable for whether an in-

terview is machine or human annotated in any regressions using enhanced sample data. This will

account for any overall under or over prediction.

    Efficiency. Even if measurement error does not introduce a bias in the machine annotations, it

will add extra noise to these observations. However, we can quantify the variance of this noise and

account for it in our analysis. Following Elbers et al. (2003), we account for two of the types of error

in our machine annotations: idiosyncratic error (i.e. the prediction error) and model error (i.e. the

sampling errors in the model).11

    To approximate the model error, we bootstrap the model by sampling the interviews with re-

placement B times. This gives us an empirical distribution over the predictions based on the sampled

distribution. The variance of the machine annotations, taking model error into account, can then be

approximated by the variance across all of these bootstrap samples

$$\hat{\sigma}^2_{\hat{y}} = \frac{1}{BN} \sum_{i=1}^{N} \sum_{b=1}^{B} \left( \bar{\hat{y}} - \hat{y}_{b,i} \right)^2 \tag{1}$$

where $\bar{\hat{y}} = \frac{1}{BN} \sum_{i=1}^{N} \sum_{b=1}^{B} \hat{y}_{b,i}$. This can be calculated either in the training set only, or also in the

out-of-sample predictions, but we find that the estimates are virtually identical in each.

  11 The authors thank Berk Ozler for his suggestions on this point.

    The idiosyncratic error can then be calculated as the difference between the observed $y_i$ and

$\hat{y}_i$. To ensure that these predictions are out of sample, we only use the validation set predictions to

compute these errors.12 The estimated variance of this idiosyncratic error, $\hat{\sigma}^2_{\varepsilon}$, is then the variance of

the validation set prediction errors. Of course, this variance has to be calculated in the human sample,

as these are the only observations for which we observe $y_i$.

    Assuming that these errors are normally distributed, if the idiosyncratic and modeling errors are

independent then the estimated variance of the machine annotated sample will be the sum of these two

variances: $\hat{\sigma}^2_m = \hat{\sigma}^2_{\hat{y}} + \hat{\sigma}^2_{\varepsilon}$. The estimated variance of the human annotated sample ($\hat{\sigma}^2_h$) is simply the

variance of the $N_h$ observed human annotations. This gives us an estimate for the enhanced sample as

a weighted sum of the estimated variances for the human and machine annotated samples.

$$\hat{\sigma}^2_{enh} = \frac{N_h \hat{\sigma}^2_h + N_m \hat{\sigma}^2_m}{N} \tag{2}$$

This demonstrates that even if our measurement errors are unbiased there is still potentially a trade-

off due to the increase in variance. As our NLP models are imperfect, we would in general expect

$\hat{\sigma}^2_m > \hat{\sigma}^2_h$. Enhancing our sample therefore increases the number of observations but also increases


the noise in the sample.
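
As an illustration only, the variance calculation in equation (2) can be sketched in Python. The function name, variable names and all numerical inputs below are hypothetical and chosen only to show the arithmetic; using the population variance for the validation errors is likewise an assumption.

```python
from statistics import pvariance


def enhanced_variance(var_human, var_model, var_idio, n_human, n_machine):
    """Return (machine variance, enhanced-sample variance) as in equation (2)."""
    # Machine-sample variance: modelling plus idiosyncratic error variance,
    # which relies on the two error sources being independent.
    var_machine = var_model + var_idio
    n_total = n_human + n_machine
    # Weighted sum of the human and machine variances.
    var_enh = (n_human * var_human + n_machine * var_machine) / n_total
    return var_machine, var_enh


# Hypothetical validation-set errors (observed y_i minus predicted y_i).
val_errors = [0.0, 1.0, -1.0, 0.0, 0.0, 1.0, 0.0, -1.0]
var_idio = pvariance(val_errors)

# Hypothetical human and modelling variances; sample sizes as in the text.
var_machine, var_enh = enhanced_variance(
    var_human=0.20, var_model=0.05, var_idio=var_idio,
    n_human=789, n_machine=1618,
)
```

With these made-up inputs the enhanced-sample variance lies between the human and machine variances, which is the trade-off described above.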

    Whether this sample-size vs variance trade-off is worth accepting of course depends on the context

in which we intend to use our enhanced sample. However, we can illustrate it with the standard error

on an estimate of the population mean. The standard error on the estimated mean using the enhanced

sample will include the weighted sum of the variance terms for the human and machine annotated

observations.
$$\widehat{se}_{enh} = \sqrt{\frac{N_h\hat{\sigma}_h^2 + N_m\hat{\sigma}_m^2}{N^2}} \qquad (3)$$

The standard error on the estimate for the human sample will be of the usual form. The standard error

in the enhanced sample will therefore be smaller if a condition on the ratio of variances in the human
  12 As we show in Section 8, where we also compute errors for observations in a held-out test set, for a sufficiently large
sample size, performance in the validation set and in a held-out test set coincide.




and machine annotated samples, relative to the increase in sample size, is met:

$$\frac{\hat{\sigma}_m^2}{\hat{\sigma}_h^2} < \frac{N_m + 2N_h}{N_h}$$
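
A short derivation of this condition, given as a sketch: it compares the variance of the enhanced-sample mean, $(N_h\hat{\sigma}_h^2 + N_m\hat{\sigma}_m^2)/N^2$, with that of the human-only mean, which is assumed here to be $\hat{\sigma}_h^2/N_h$ (our reading of "the usual form"), and uses $N = N_h + N_m$.

```latex
\begin{align*}
\frac{N_h\hat{\sigma}_h^2 + N_m\hat{\sigma}_m^2}{N^2} &< \frac{\hat{\sigma}_h^2}{N_h} \\
N_h N_m \hat{\sigma}_m^2 &< \left(N^2 - N_h^2\right)\hat{\sigma}_h^2
  = N_m\left(N_m + 2N_h\right)\hat{\sigma}_h^2 \\
\frac{\hat{\sigma}_m^2}{\hat{\sigma}_h^2} &< \frac{N_m + 2N_h}{N_h}
\end{align*}
```

The second line multiplies both sides by $N^2 N_h$ and uses $N^2 - N_h^2 = (N - N_h)(N + N_h)$.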

Note that the right hand side here will always be greater than two, but the condition shows that adding

only a small number of highly noisy machine annotations may not make estimates of the population

mean more precise. For our entire sample, where $N_h = 789$ and $N_m = 1618$, our enhanced sample

will have a smaller standard error for an estimate of the population mean if $\hat{\sigma}_m^2/\hat{\sigma}_h^2 < 4$.
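
The threshold can be checked numerically. The sketch below is illustrative only: the function names are hypothetical, the human-only standard error is assumed to be the usual $\sqrt{\hat{\sigma}_h^2/N_h}$, and the variances passed in are made-up values.

```python
import math


def variance_ratio_threshold(n_human, n_machine):
    """Largest ratio var_machine / var_human at which enhancing still helps."""
    return (n_machine + 2 * n_human) / n_human


def se_mean_human(var_human, n_human):
    """Standard error of the mean in the human annotated sample (assumed form)."""
    return math.sqrt(var_human / n_human)


def se_mean_enhanced(var_human, var_machine, n_human, n_machine):
    """Standard error of the mean in the enhanced sample, as in equation (3)."""
    n_total = n_human + n_machine
    return math.sqrt((n_human * var_human + n_machine * var_machine) / n_total ** 2)


# Sample sizes reported in the text; the exact threshold is about 4.05.
threshold = variance_ratio_threshold(789, 1618)

# Hypothetical variances just below and just above the threshold.
se_h = se_mean_human(0.20, 789)
se_below = se_mean_enhanced(0.20, 0.20 * 4.0, 789, 1618)
se_above = se_mean_enhanced(0.20, 0.20 * 4.2, 789, 1618)
```

With the reported sample sizes the exact threshold is 3196/789, which rounds to the value of 4 quoted above.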


                                          [Table 3 about here.]


    Table 3 shows these variances computed at the interview level for the mean predictions across

bootstraps in the cross-validated models. Standard errors for the population mean that have been

adjusted as described above are also shown. We can see that in all cases the standard error of the

population mean is lower than that of the human only sample. Enhancing the sample with our method

thus increases the precision of these estimates, in spite of the fact that predictive performance of

our models is sometimes relatively low.

    We can thus think of the machine annotated sample as being subject to an additional measurement

error due to model and idiosyncratic noise. We can check for biases in these errors and estimate their

variance in the manner described above. Once the measurement error has been quantified, we can

make the appropriate adjustments.

    Interpretability. We assess interpretability of our enhanced sample in two complementary ways.

Firstly, we compare the statistical significance of regressions of annotations on household charac-

teristics in the enhanced and human annotated samples. Secondly, we use a supervised topic model

trained on the predicted annotations.

                                        [Figure 7 about here.]

    Assuming text-based variables should be related to household characteristics, if our enhanced

sample has improved the interpretability of our analysis it should give stronger evidence of a relation-




ship between the annotations and household characteristics.13 We can therefore compare F statistics

for regression of annotations on household characteristics in the human and enhanced samples. If

the enhanced sample increases this F statistic relative to the human sample it suggests that the larger

sample leads to more interpretable results in spite of the greater measurement error.
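
The comparison can be sketched as follows. This is a deliberately simplified illustration, not the regressions reported in the paper: it uses a single hypothetical household characteristic rather than the full set, simulated data, and a made-up misclassification rate for the machine annotations.

```python
import random


def f_statistic(x, y):
    """F statistic for an OLS regression of y on a constant and one regressor x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    syy = sum((yi - mean_y) ** 2 for yi in y)
    ess = sxy * sxy / sxx          # explained sum of squares
    rss = syy - ess                # residual sum of squares
    return ess / (rss / (n - 2))   # one numerator degree of freedom


rng = random.Random(0)  # fixed seed so the illustration is reproducible


def simulate(n, flip_prob):
    """Simulate a characteristic x and a binary annotation y related to it."""
    xs, ys = [], []
    for _ in range(n):
        x = rng.gauss(0.0, 1.0)
        y = 1 if x + rng.gauss(0.0, 1.0) > 0 else 0  # noise-free annotation
        if rng.random() < flip_prob:                  # machine misclassification
            y = 1 - y
        xs.append(x)
        ys.append(y)
    return xs, ys


x_h, y_h = simulate(200, flip_prob=0.0)   # hypothetical human annotated sample
x_m, y_m = simulate(600, flip_prob=0.2)   # hypothetical machine annotated sample

f_human = f_statistic(x_h, y_h)
f_enhanced = f_statistic(x_h + x_m, y_h + y_m)
```

Whether `f_enhanced` exceeds `f_human` depends on how the gain in sample size compares with the added noise, which is the comparison the test is designed to make.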

    Figure 7 shows the F statistics of these regressions in the human and enhanced samples. The F

statistic in the human sample is shown as the cross and the enhanced sample as a hollow circle for

each bootstrap iteration, with the solid circle for the mean prediction across bootstraps. In all cases

the F statistic in the mean enhanced sample is higher than in the human sample. In some cases this

difference is quite small though (e.g. Reliance on God) and in individual bootstrap runs there is a

decrease in some cases. There is thus no guarantee of increased interpretability when we enhance our

sample, but in all our cases we see an increase for the mean across bootstrap iterations.

    An alternative sense of interpretability relates to the relationship between the predicted annota-

tions and the text itself. The classifier models we use to create our enhanced sample are in general

optimised for prediction rather than to give directly interpretable relationships between the text and annotations.

However, once we have these predictions we can use an alternative model to assess which text

features are associated with an annotation in a more interpretable way. We thus estimate a supervised

topic model (Blei and McAuliffe, 2008) for our machine prediction of each annotation, based on the

interview text. We can thus verify that the topics most (and least) associated with the predictions for

each annotation roughly correspond to our definitions of that annotation.


                                                 [Figure 8 about here.]


    Figure 8 shows the output of these supervised topic models for the two aspiration annotations.

There are ten topics, represented along the vertical axes by the ten most highly weighted words

in each topic. Each topic is then associated with a coefficient where a positive coefficient means

that topic is more likely to be associated with that annotation. We can thereby verify that the text

features associated with the predictions for each annotation correspond to our understanding of each

annotation. In this case, we see that the topic most associated with secular aspirations highly weights
  13 Note that this test does not require any assumptions on how the text and household characteristics are related, just the
relatively weak assumption that there is some relationship between them.


words such as “good”, “dream”, “human” and “educated”, consistent with our definition of secular

aspirations. Similarly, religious aspirations are associated with topics that place a high weight on

religious terms like “hafez”, “madrasa”, “god” and “allah”.



7    Results

The overarching trend is that the results with the enhanced sample have smaller standard errors. For

instance, if we compare the correlation matrix computed on the enhanced sample in Figure 9 with

that on the human sample (Appendix C), we see that the enhanced sample shows a much higher

proportion of statistically significant correlations. Crucially, the signs of correlations and coefficients

across the human and enhanced samples are the same. Our method thus appears to be successful in

increasing the available sample size without introducing a bias that changes interpretation.


                                         [Figure 9 about here.]


    Focusing on Figure 9, we first look at the correlations within each of the three code domains -

Ambition, Aspirations, and Capacity. Within the Ambition domain, having “no ambition” is nega-

tively correlated with wanting a high education, a salaried job or being an entrepreneur, but positively

correlated with marriage. Ambition for a high education is on the other hand negatively correlated with

marriage but positively with salaried employment. It is interesting to note also that marriage is nega-

tively correlated with salaried employment suggesting that parents who are focused on getting a child

married are less likely to say that they want her to have a salaried job. Note that parents who want

their children to have a religious education tend also to talk about wanting them to have higher levels of

secular education.

    Within the Aspiration domain, however, parents who profess to have Secular Aspirations for their

children do not say that they have Religious Aspirations, suggesting that aspiration, in the sense that

Agnes Callard defines it, captures something different from “ambition.”

    The codes in the Capacity domain - which attempt to capture Arjun Appadurai’s concept of Nav-

igational Capacity - tend to move in the same direction. People who display High Ability or good

navigational capacity also tend to have higher budgets and more information. Conversely, people who

display Low Ability tend to have lower budgets, but Low Ability is positively correlated with both Low and High

informational awareness, suggesting that Low Ability is not necessarily a function of low information.

Similarly, people with low information seem to report both high and low budgets.

      Looking at the correlations across domains, the codes in the Capacity domain are generally positively

correlated with high ambition. People who show High Ability also have higher education

ambition, and are more likely to want a salaried job for their children. Similarly, people who have higher

budgets and high information awareness are also more likely to have higher education ambitions and want

salaried jobs for their children. On the other hand, parents who report that they are budget constrained

tend to want a religious education for their children, and want them to be married, have vocational

training and entrepreneurial jobs.

      In the Aspirations domain, parents who report having Secular Aspirations for their children also

want higher levels of education and salaried jobs for their children, and tend to be of higher ability

and have high information awareness. Parents with Religious Aspirations for their children show a

positive correlation with higher levels of both Religious and Secular education, and also report a

higher Reliance on God.


7.1     Ambition

Next, in Tables 4-9 we report results from a set of reduced form regressions where we regress the

codes against household characteristics from the 2019 “baseline” survey. We include refugee status,

number of children, whether it is a female headed household, the age of the head, the parent’s years

of education, whether s/he is religiously educated, whether the child is female, the household’s 2019

asset index, its 2019 income, and a trauma score for the head of the household. We report results

from the human-coded sub-sample and the enhanced sample next to each other for all the regressions

and again note that the enhanced sample regressions tend to be more precisely estimated.14 We also

control for whether the data is from the second qualitative round (which is Round 3 because the

baseline survey was Round 1), and in the enhanced sample regressions we include a dummy for

whether the household interview was human annotated.
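
As a sketch, the specification described above can be written as follows. The linear form and all symbols are assumptions introduced here for illustration: $y_i$ is a code for household $i$, $X_i$ collects the household characteristics listed above, $\mathrm{Round3}_i$ indicates the second qualitative round, and $\mathrm{Human}_i$ indicates a human annotated interview and enters only in the enhanced sample regressions.

```latex
y_i = \alpha + X_i'\beta + \gamma\,\mathrm{Round3}_i + \delta\,\mathrm{Human}_i + u_i
```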
  14 Across all regression tables, results using the human annotated sample are reported in odd numbered columns, and

results using the enhanced sample in even numbered columns.



                                           [Table 4 about here.]

      We start with the Ambition domain, reporting Education results in Table 4. Parents with higher

levels of education have higher education ambitions for their children, and are less likely to want

a religious education. Wealthier parents with more household assets also report higher education

ambition. However, parents report lower education ambition for their female children. Parents also

report lower Religious Education ambitions for female children, as do more educated parents. Parents

with a religious education, however, also report wanting their children to be religiously educated.

                                           [Table 5 about here.]

      Table 5, which reports results on Employment ambition, shows that parents of female children are

less likely to report wanting them to have salaried employment or to be entrepreneurs. More

educated parents are more likely to want their children to have salaried employment. Religiously

educated parents are more likely to want their children to have vocational training and less likely to

want their children to be entrepreneurs.

                                           [Table 6 about here.]

      Table 6 reports results from parents who report “no ambition” and ambitions for marriage and

migration. Refugees are more likely to not have ambitions for their children, and female heads of

household are less likely to do so. We should bear in mind here that our bias tests found that the “no

ambition” code was systematically under predicted for refugees in our machine predictions (Table 23).

The coefficient on refugee status in the enhanced sample is thus likely an underestimate, although it

remains positive and significant. Parents with a female eldest child are much more likely to have

marriage oriented ambitions, and less likely to speak about migration.


7.2     Aspirations

Table 7 reports results on Aspirations - coded into either Secular or Religious. Interestingly, parents

are less likely to report having aspirations, both secular and religious, for female children. More

educated parents tend to report more secular aspirations, as do younger parents, while parents with a

religious education are more likely to have religious aspirations for their children.

                                         [Table 7 about here.]


7.3     Navigational Capacity

Moving to the codes categorized under Navigational Capacity, we first look at Low and High Ability

and Low and High Budget in Table 8. Note that refugees are less likely to be coded as having low

ability. This is likely because of the selection process associated with having escaped war and conflict

and having reached the relative safety of the camp which would make surviving refugees people who

are more capable than average. Ability Low is also less likely to be present with more educated par-

ents, and those from wealthier households. Ability High shows consistent results with more educated

parents more likely to be those of high ability, while older parents and female headed households are

less likely to demonstrate high levels of ability. Refugees, understandably, are more likely to talk

about being constrained by budgets, as are less wealthy households, less educated parents

and parents with larger numbers of children. More educated parents, consistently, are more likely to

report having relatively high budgets.


                                         [Table 8 about here.]


      Table 9 reports results from the other Ability codes - information awareness, reliance on God

and “vague-non-specific answers.” Refugees are less likely to give vague non-specific answers to the

navigational capacity question, and are also less likely to report a reliance on God as a response to the

question. Not surprisingly, better educated parents are also more likely to have high information awareness

and less likely to report a reliance on God. Parents of girls are more likely to show low informational

awareness and less likely to show high informational awareness.


                                         [Table 9 about here.]


      In summary, the three domains of codes for aspirations - Ambition, Aspiration and Ability - show

some interesting patterns. First, they seem to be distinct concepts with different determinants. Callard’s

distinction between Ambition and Aspiration is important, and Appadurai’s notion of Navigational

Capacity also matters - just having an ambition or aspiration for your child is not enough; there is

a lot of heterogeneity in the capacity of parents to know how to achieve these goals. It is affected

by their ability to articulate a clear strategy of how to get there, but also by constrained budgets and

information. There is clear evidence of gender bias - parents of girls tend to have lower education

and job ambitions for them and seem to be more focused on marriage. There is some evidence that

refugees tend to be of higher ability than average, but more constrained by budgets. Finally, it is clear

that parents have both religious and secular aspirations for their children that seem to be distinct from

each other - with religious aspirations more likely to be professed by parents who have a religious

education.


7.4     Does Qualitative Add Value?

Aspirations related to education goals and ambitions have been extensively studied by economists,

which raises the question of whether the open-ended questions on ambition add value. To answer this

we included a standard structured question on education ambition in round 2, in addition to the open-ended

questions. We regressed the quantitative education response on the same set of exogenous variables,

and then added the qualitative ambition codes to the regression - results are reported in

Table 10.


                                         [Table 10 about here.]


      The quant and qual education ambition results have very similar signs and significance levels,

but the qual question seems to add nuance and capture additional variation. The refugee coefficient

on quant education ambition is strongly negative, which would lead us to believe that refugees

have much lower education ambitions than hosts. The qualitative code regressions (reported in Table

6), however, reveal a more complex interpretation. Refugees have similar levels of high and low

education ambition as hosts. However, they are much more likely to report “Neutral” Education

Ambition. Neutral is the code we used for situations where respondents expressed helplessness in the

context of ambitions or said that they were unable to have dreams or plans on a given topic. This tells us

that it is not that refugees are less ambitious on education than hosts, but that having had disrupted

lives they have more trouble expressing a clear education ambition. Table 12 also shows that when the


qual education ambition codes are added to the reduced form regression on quant education ambition,

they add explanatory power without substantially changing the coefficients on the original set of right

hand side variables, suggesting that they are capturing additional variation.

      While ambition is relatively easy to turn into a structured question, latent concepts like “aspi-

ration” and “capacity” are harder because they are more subtle. Since we conducted these surveys

in the COVID period we were unable to conduct the extensive field work needed to develop and

pretest good structured questions on (Callardian) aspirations and navigational capacity to enable a

direct comparison between quant and qual versions of these concepts. However, we show that asking

relatively straightforward open-ended questions on aspirations and navigational capacity captured a

great deal of content. This raises the question of whether (a) developing a quantitative module would

have added value to these intrinsically more subtle latent concepts, and (b) relying entirely

on quantitative representations could detract from understanding the point of view of respondents.



8      Varying sample sizes

A key question for researchers looking to scale up manual annotations using supervised models is

how many of their documents to annotate in order to make the most of their data. In this Section, we

vary the size of both the human annotated (Nh ) and machine annotated (Nm ) samples to explore this

question. We find that, while out-of-sample performance and interpretability of results increase with

the number of human annotated interviews, both of these display diminishing returns to scale. Results

remain largely the same if at least 400 interviews are annotated. While these results are encouraging,

further work will be needed to test whether similar results will be found in other datasets.

      We also show that simple cost-benefit analysis exercises, based on either (i) maximising the average

F statistic in enhanced sample regressions or (ii) minimising the standard errors of key estimated

coefficients, recommend a role for machine annotation.


8.1     Performance

Increasing Nh will improve the out-of-sample performance of the classifier models as they can be

trained on a larger sample. However, annotating extra interviews without compromising the anno-

tation quality is both time consuming and expensive. It is therefore useful to explore how much

performance improves as the annotated sample size Nh increases. To assess this we draw subsets of

our human annotated sample without replacement, varying the sample size from 100 to the full 789

(drawing 10 separate samples for each possible Nh ). We then train our classifier models on these

subsets, selecting the text representation and model independently in each case. Given that we now

no longer use the entire human annotated set for training, we can also quantify performance in the

out-of-bag test set as well as the validation set performance of each model.
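
The subsampling step can be sketched as below. This is an illustration under stated assumptions, not the code used for the paper: the function name is hypothetical, interviews are represented by integer ids, and the grid of sample sizes (steps of 100 up to the full 789) is assumed, since the exact grid is not given here.

```python
import random


def draw_subsamples(n_annotated, sizes, n_draws, seed=0):
    """Draw training subsets without replacement, plus their out-of-bag sets."""
    rng = random.Random(seed)
    all_ids = range(n_annotated)
    draws = []
    for size in sizes:
        for _ in range(n_draws):
            train_ids = rng.sample(all_ids, size)  # sampling without replacement
            chosen = set(train_ids)
            # Annotated interviews not used for training form the out-of-bag test set.
            oob_ids = [i for i in all_ids if i not in chosen]
            draws.append((size, train_ids, oob_ids))
    return draws


sizes = list(range(100, 789, 100)) + [789]  # assumed grid: 100, 200, ..., 700, 789
draws = draw_subsamples(n_annotated=789, sizes=sizes, n_draws=10)
```

Each element of `draws` would then be passed to the model selection and training step; at the full sample size the out-of-bag set is empty, so only validation set performance is available there.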

      As Nh increases, there are improvements in both validation set and out-of-bag test set performance.

When the human annotated sample is very small, out-of-bag test set performance is worse

than validation set performance; however, once we have 400 annotations, validation set and test set

performance are nearly the same for most annotations. Averaging across annotations, moving

from 100 to 200 human annotated interviews increases the out of sample F1 score by 0.05, moving to

300 then gives an extra 0.03, moving to 400 an extra 0.013. So while increasing the size of the human

annotated set does improve prediction performance, there do appear to be diminishing returns here.15

The validation and test set performance for each annotation as Nh varies is shown in Appendix D.

Whether our machine annotations introduce bias or increase efficiency

depends on the out-of-sample performance of the supervised models. In line with the results on per-

formance, we find that a larger human annotated sample reduces bias and decreases measurement

error, but that this also displays diminishing returns after around 400 or 500 annotated interviews.


8.2     Interpretability

Interpretability can be affected by increasing both Nh and Nm . In other words, we may be able to

get stronger results by either annotating more interviews or by conducting additional interviews and

machine annotating them. For each of the models trained on samples from 100 to the full 789 hu-

man annotated interviews we therefore generate predictions for a randomly sampled subset of the

unannotated interviews, from 200 to 1,400 (again drawing 10 separate samples for each possible
   15 There may of course be non-linearities in the performance of the text based classifier models, particularly if for very

large samples a more sophisticated model becomes feasible. However, for the number of documents that it would be realistic
to annotate manually this is unlikely to be a concern.




combination of Nh and Nm ).

    Our enhanced sample interpretability measure - the F statistic of a regression of the annotation

on household characteristics - generally increases as extra interviews are added through Nh (on the

horizontal axis) and Nm (on the vertical axis). Averaging across all annotations, we find adding 100

more human annotated interviews increases the F statistic in the enhanced sample by 8.4% while

adding 100 more machine annotated interviews increases it by an average of 6.1%. Annotating an

additional 100 existing interviews by hand therefore increases the F statistic by around 2.3%. On

average, an additional interview therefore has around 3 times the benefit of annotating an existing

interview. This interpretability measure for each annotation as both Nh and Nm vary is shown in

Appendix D.
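
The arithmetic behind this comparison can be laid out explicitly. The sketch below uses only the percentages quoted above; the variable names are ours, and it reads the gain from annotation alone as the difference between the two reported gains.

```python
# Percentage increase in the enhanced-sample F statistic per 100 interviews.
gain_new_human = 8.4    # 100 additional interviews that are human annotated
gain_new_machine = 6.1  # 100 additional interviews that are machine annotated

# Hand-annotating 100 interviews that already exist: the difference of the two.
gain_annotation_only = gain_new_human - gain_new_machine

# Benefit of an additional interview relative to annotating an existing one.
ratio = gain_new_machine / gain_annotation_only
```

The ratio is roughly 2.65, which corresponds to the "around 3 times" stated above.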

                                        [Figure 10 about here.]

    We can also look at the effect of increasing Nh while holding N = Nh + Nm fixed. Intuitively, we

can think of this as adding human annotations to some of the existing interviews that are currently

machine annotated. Figure 10 shows how the F statistic test for interpretability changes for each

annotation as Nh is increased while N is constant. In blue we see the F statistic for the human only

sample - a higher Nh will of course increase this as it increases the size of the human sample and so

we get a more statistically significant relationship between the text and household characteristics. In

green we see the F statistic on the enhanced sample. While a higher Nh does generally increase the

F statistic in the enhanced sample (as predictions are more accurate) the overall sample size doesn’t

increase.

    Interestingly, while the enhanced sample F statistics in Figure 10 do increase somewhat with

Nh , this increase is relatively small suggesting that it may be sufficient to annotate a relatively small

number of interviews. In many cases there appear to be diminishing returns to extra annotations -

at some point the enhanced sample is good enough that adding additional annotations doesn’t really

make a difference.

    As an alternative to the very general approach of focusing on the F-statistic across all annotations,

we can instead focus on how estimates of specific coefficients change as the sample sizes change. This

approach may be more appropriate in many applications where there will be a specific effect or effects

that are of primary interest. To illustrate this, Figure 11 shows how two coefficients of interest from

Section 7 (the coefficient on refugee status for ability low and the coefficient on Female eldest child in

Secular Aspirations) vary as the size of the human annotated and machine annotated samples changes.

The distribution of the coefficients in the enhanced sample is shown in red and the human sample

in blue. Rather strikingly, the enhanced sample coefficients do not change very much with Nh , again

suggesting that enhancing the sample is useful even with a very small number of human annotated

interviews.

                                                [Figure 11 about here.]


8.3     A cost benefit analysis

To give a sense of the trade-offs a researcher may face in deciding on how many interviews to conduct

and how many to annotate we conduct a simple cost benefit analysis exercise based on the results

discussed above. In our case, the marginal cost of conducting a single additional interview was

around $12 while the marginal cost of annotating one additional interview was around $3 (all costs

here are given in 2021 US dollars).16 For a given budget (between $10,000 and $20,000) we then find

the combination of Nh and Nm that maximises some objective. We report results under three different

objectives here:

   1. Maximising the average F statistic in the enhanced samples for a regression of annotation on

        household characteristics, across all annotations (i.e. Figure 7).

   2. Minimising the 95% confidence interval of the refugee status coefficient in the enhanced sample

        regression on ability low (i.e. upper panels in Figure 11).

   3. Minimising the 95% confidence interval of the Female eldest child coefficient in the enhanced sample

        regression on Secular Aspirations (i.e. lower panels in Figure 11).

      For each objective, increasing either Nh or Nm will generally lead to a better outcome, but will be

more expensive. In our case, disregarding unannotated interviews for which we have missing data on
  16 This cost figure for annotation is likely a substantial underestimate as the major difficulty with high quality annotation
is finding annotators with adequate skills.


household characteristics, we conducted 2,270 interviews of which 789 were annotated. Ignoring

fixed costs, this had an estimated cost of $30,000.
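
The cost figure can be reproduced from the marginal costs quoted above. The sketch assumes that every interview incurs the interview cost and that human annotated interviews incur the annotation cost on top; the names are hypothetical.

```python
COST_INTERVIEW = 12   # USD per interview conducted (from the text)
COST_ANNOTATION = 3   # USD per interview annotated by hand (from the text)


def variable_cost(n_human, n_machine):
    """Variable cost of n_human annotated and n_machine unannotated interviews."""
    return COST_INTERVIEW * (n_human + n_machine) + COST_ANNOTATION * n_human


# 2,270 interviews in total, of which 789 were human annotated.
total = variable_cost(n_human=789, n_machine=2270 - 789)
```

This gives 29,607 USD, in line with the rounded estimate of $30,000.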


                                        [Figure 12 about here.]


    For a given budget and starting point, we can thus calculate the optimal mix of Nm and Nh .

Figure 12 shows how different combinations of Nh and Nm perform across the three objectives we

consider. These points can be thought of as forming iso-cost curves: for a given budget we can

choose the allocation across Nh and Nm that maximises our objective. Unsurprisingly, this curve

is considerably smoother when the average enhanced sample log F statistic is the objective as this

encompasses all annotations and household characteristics. In contrast, when the objective is a single

coefficient there is likely a lot more idiosyncratic noise in the performance for a given combination

of Nh and Nm .
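
A search along an iso-cost curve can be sketched as follows. This is a minimal illustration, not the procedure behind Table 11: the function names and the step size are hypothetical, the objective is a toy stand-in with diminishing returns that has no empirical basis, and spending the whole remaining budget on unannotated interviews assumes the objective does not fall as Nm rises.

```python
import math


def best_allocation(budget, objective, step=100, cost_interview=12, cost_annotation=3):
    """Return (value, n_human, n_machine) maximising objective within the budget."""
    cost_human = cost_interview + cost_annotation  # interview plus hand annotation
    best = None
    n_human = step
    while cost_human * n_human <= budget:
        remaining = budget - cost_human * n_human
        n_machine = remaining // cost_interview    # rest goes to unannotated interviews
        value = objective(n_human, n_machine)
        if best is None or value > best[0]:
            best = (value, n_human, n_machine)
        n_human += step
    return best


def toy_objective(n_human, n_machine):
    """Hypothetical objective with diminishing returns in both sample sizes."""
    return math.log(n_human) + math.log(1 + n_machine)


value, n_human, n_machine = best_allocation(10_000, toy_objective)
```

In practice `toy_objective` would be replaced by an estimate of the average F statistic or of the width of a confidence interval (with the sign flipped) at each combination of Nh and Nm.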


                                         [Table 11 about here.]


    Table 11 shows the optimal combination of Nh and Nm for budgets of $10,000, $15,000 and

$20,000. In each case, a mix of human annotated and machine annotated interviews appears to

be preferred, suggesting that there is value in the enhanced sample procedure set out in this paper.

Interestingly, in the case of the most general objective (the average F statistic), our exercise suggests

that 500 interviews be human annotated and then any extra budget be used for machine annotated

interviews. This further suggests that a relatively small human annotated sample can lead to good

results when combined with machine annotation of a larger sample.



9    Conclusion

Interpretative qualitative analysis, which is a common tool in anthropology, sociology and related

disciplines, is rarely used by economists but is potentially of considerable value. It is predicated on

a close, careful, inductive, and nuanced human reading and coding of textual information – usually

on open-ended interviews with respondents. The method is “reflexive” in that it allows for data to

be collected and analyzed in a more bottom-up manner that is driven more by respondents than by

researchers. Instead of requiring respondents to provide quantitative responses to questions that may

force false precision on a latent concept, it allows respondents to speak about a topic in a manner that

is closer to how they understand the concept, resulting in more accurate and nuanced responses.

    Interpretative qualitative analysis could potentially be very useful for a variety of topics of interest

to contemporary economists, such as well-being, cultural change, social norms, networks,

decision-making and the topic of this paper – aspirations. It could also be of great value in

understanding change processes and mechanisms in experiments and randomized trials. However,

the high level of human effort required in employing the method has generally restricted its use to

small samples. This is one reason why it has not been used by economists. This paper presents a

machine learning method to extend interpretative human coding (iQual) to large, representative

samples. The method takes a smaller sub-sample that is coded by human, interpretative coders. This

human-coded sub-sample is used as a training set to extend the coding to a larger, representative “enhanced”

sample that allows us to use standard econometric tools to analyze the data. Rather than recommend a

particular text representation or supervised model, we select both to optimise for out-of-sample pre-

dictive performance. We demonstrate that this sample enhancement adds value by testing for bias and

showing that the enhanced sample increases the efficiency and interpretability of analysis.
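
    The logic of the sample enhancement can be sketched in a few lines of Python. This is a toy illustration under stated assumptions and does not use the iQual API: the n-gram bag-of-words representation and nearest-centroid classifier are simple stand-ins for the text representations and supervised models considered in the paper, all function names are hypothetical, and the example sentences and labels are invented. The representation is chosen by leave-one-out accuracy on the human-coded set and the fitted model then annotates the uncoded interviews.

```python
from collections import Counter
from math import sqrt
from typing import Dict, List, Sequence, Tuple


def featurize(text: str, max_n: int) -> Counter:
    """Bag of word n-grams for n = 1..max_n."""
    tokens = text.lower().split()
    feats: Counter = Counter()
    for n in range(1, max_n + 1):
        for i in range(len(tokens) - n + 1):
            feats[" ".join(tokens[i:i + n])] += 1
    return feats


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(v * b.get(k, 0) for k, v in a.items())
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def fit_centroids(texts: Sequence[str], labels: Sequence[int], max_n: int) -> Dict[int, Counter]:
    """Nearest-centroid model: summed feature counts per label."""
    centroids: Dict[int, Counter] = {}
    for text, label in zip(texts, labels):
        centroids.setdefault(label, Counter()).update(featurize(text, max_n))
    return centroids


def predict(centroids: Dict[int, Counter], text: str, max_n: int) -> int:
    feats = featurize(text, max_n)
    return max(sorted(centroids), key=lambda lab: cosine(feats, centroids[lab]))


def loo_accuracy(texts: List[str], labels: List[int], max_n: int) -> float:
    """Leave-one-out accuracy as a simple out-of-sample performance measure."""
    hits = 0
    for i in range(len(texts)):
        model = fit_centroids(texts[:i] + texts[i + 1:], labels[:i] + labels[i + 1:], max_n)
        hits += predict(model, texts[i], max_n) == labels[i]
    return hits / len(texts)


def enhance(human_texts: List[str], human_labels: List[int],
            uncoded_texts: List[str], options: Sequence[int] = (1, 2)) -> Tuple[int, List[int]]:
    """Pick the representation with the best out-of-sample score, then annotate uncoded text."""
    best_n = max(options, key=lambda n: loo_accuracy(human_texts, human_labels, n))
    model = fit_centroids(human_texts, human_labels, best_n)
    return best_n, [predict(model, t, best_n) for t in uncoded_texts]


# Invented examples: 1 = mentions an education ambition, 0 = does not.
human_texts = ["she will study at university", "he must study in school",
               "he should be a good person", "she will be a good muslim"]
human_labels = [1, 1, 0, 0]
uncoded_texts = ["my son will study hard", "my daughter is a good girl"]
best_n, machine_labels = enhance(human_texts, human_labels, uncoded_texts)
```

    The human and machine annotations can then be pooled into an enhanced sample, with an indicator for machine-annotated observations as in the regression tables.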

    We apply the method to over 2,000 open-ended interviews on parents’ aspirations for children

collected from a representative sample of Rohingya refugees and their Bangladeshi hosts in Cox’s

Bazaar, Bangladesh. In the open-ended interviews we are able to show that aspirations have

dimensions that are much broader than how they have generally been understood in the economics literature. In

economics, aspirations have generally been viewed as “ambition” – specific goals on education or jobs

that parents have for their children. We show that open-ended interviews allow this to be broadened

to include the moral and spiritual dimensions of aspirations – for instance being a “good person”

or a “good Muslim,” and to understand a parent’s “navigational capacity” – their ability to act in a

way that allows aspirations to be realized. We show that these three distinct domains of “ambition,”

“aspiration” and “navigational capacity” are correlated in interesting ways with each other, and have

distinct relationships with exogenous household characteristics.

    We explore the role of the size of the human annotated sample for the value of the sample

enhancement through a series of simulations. We find that, in our application at least, annotating even

a relatively small number of interviews and scaling these up with NLP can be a cost-effective way of

analysing a large corpus of open-ended interviews. These simulations also demonstrate the robustness

of our results to different annotated sets.

    This paper comes with a Python package, iQual (https://github.com/worldbank/iQual) that imple-

ments the supervised models and various tests that we perform.




    Author affiliations.

    Julian Ashwin: Department of Economics, London Business School.

    Vijayendra Rao: World Bank Group.

    Monica Biradavolu: Qual Analytics.

    Aditya Chhabra: World Bank Group.

    Arshia Haque: World Bank Group.

    Afsana Khan: Princeton School of Public and International Affairs, Princeton University.

    Nandini Krishnan: World Bank Group.




A     Qualitative coding
A.1   Coding Tree
                                           [Table 12 about here.]

                                           [Table 13 about here.]

                                           [Table 14 about here.]

A.2   Achieving Cross-coder Agreement
To achieve agreement between coders, two coders [AH and AK] first applied the codes to 30 transcripts
in Atlas-TI. The coded excerpts were shared in an Excel matrix that was reviewed by MB. Any unclear
applications of codes were identified, discussed, and resolved in weekly meetings. The process of review
and resolution was conducted throughout the coding process, in batches of approximately 60 until all 789
were coded. The continuous review process not only reduced disagreement between coders but also led to
the creation of new codes and a deeper understanding, and sharper definitions, of certain codes.
    Table 15 illustrates the process by which codes were refined to be more nuanced and context-specific
as a result of the review process. As an example, take expressions of religious aspirations and ambitions.
Initially, when a parent stated that they wanted their child to be a maulvi or be alem/alemdar or hafez, or
wanted their child to go to a madrassa or noorani school, these instances were coded as Religious Aspira-
tion. After review and seeking expert input, we understood that these references should not just be coded for
religious aspiration, but also for religious ambition, specifically for Ambition:Education:Religious. Further,
this religious education ambition could be scaled using ranked codes: Ambition:Education:High, Ambi-
tion:Education:Neutral or Ambition:Education:Low. As a result, the definitions for both the aspirations and
the ambition group of codes were better specified, leading to a deeper understanding of respondents’ hopes
and dreams for their children.

                                           [Table 15 about here.]

    To account for instances where the two coders (AK and AH) and the coding reviewer (MB) did not agree
on a code, we created a 3-level ranking system for each code - “fuzzy”, “reliable”, and “very reliable”. At
the end of each batch of coding, the two coders ranked each code on whether they considered their own
application of codes to be fuzzy, reliable, or very reliable. The reviewer similarly ranked each code using
the same scale. Whenever there was a mismatch in ranks provided by these three individuals, quotations
under that code would be refined to reach a clearer definition.
    In the example shown in Table 16, MB rated the code “Salaried Employment” as fuzzy as she ob-
served religious jobs such as “madrassa teacher” coded under salaried employment by both coders. This
was resolved by further refining the “Salaried Employment” code and creating further sub-codes to separate
different types of jobs that parents aspired to for their children. On the other hand, the “Vocational
Training” code was considered “very reliable” because each coder evaluated that the application of this code was
unproblematic, and the reviewer agreed with this assessment.

                                           [Table 16 about here.]

The goal of the process was to ensure that at the end of each review process, both the coders and the reviewer
agreed that all codes were assigned the rank of “very reliable”.
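
The ranking rule described above can be sketched as follows. This is an illustrative Python fragment and not part of the coding workflow actually used: the function names are hypothetical, and the example ranks given for the two coders on “Salaried Employment” are assumptions (the text only reports that the reviewer rated that code as fuzzy).

```python
from typing import Dict, List

RANKS = ("fuzzy", "reliable", "very reliable")

# code name -> rater initials -> rank given at the end of a batch
Ratings = Dict[str, Dict[str, str]]


def codes_to_refine(ratings: Ratings) -> List[str]:
    """Codes on which the raters' ranks differ, so the definition needs refining."""
    flagged = []
    for code, by_rater in ratings.items():
        for rater, rank in by_rater.items():
            if rank not in RANKS:
                raise ValueError(f"unknown rank {rank!r} from {rater} for {code}")
        if len(set(by_rater.values())) > 1:
            flagged.append(code)
    return flagged


def batch_is_settled(ratings: Ratings) -> bool:
    """True once every rater ranks every code as 'very reliable'."""
    return all(
        rank == "very reliable"
        for by_rater in ratings.values()
        for rank in by_rater.values()
    )


# Example batch; the coders' ranks for Salaried Employment are assumed.
batch = {
    "Salaried Employment": {"AH": "reliable", "AK": "reliable", "MB": "fuzzy"},
    "Vocational Training": {"AH": "very reliable", "AK": "very reliable", "MB": "very reliable"},
}
needs_work = codes_to_refine(batch)
settled = batch_is_settled(batch)
```

In this example only “Salaried Employment” is flagged, and the batch is not settled until that code has been refined and re-ranked.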


B     Modelling
B.1   Model details
                                           [Table 17 about here.]

                                           [Table 18 about here.]

                                           [Table 19 about here.]

                                           [Table 20 about here.]

                                           [Table 21 about here.]

                                           [Table 22 about here.]

B.2   Translation methodology
                                           [Figure 13 about here.]


C     Additional Results
                                           [Table 23 about here.]

                                           [Table 24 about here.]

                                           [Figure 14 about here.]


D     Varying Sample Size
                                           [Figure 15 about here.]

                                           [Figure 16 about here.]




References
Akiba, T., Sano, S., Yanase, T., Ohta, T. and Koyama, M. (2019), Optuna: A next-generation hyperpa-
  rameter optimization framework, in ‘Proceedings of the 25th ACM SIGKDD international conference on
  knowledge discovery & data mining’, pp. 2623–2631.

Alexander, J. T., Andersen, R., Cookson Jr, P. W., Edin, K., Fisher, J., Grusky, D. B., Mattingly, M. and
  Varner, C. (2017), ‘A qualitative census of rural and urban poverty’, The Annals of the American Academy
  of Political and Social Science 672(1), 143–161.

Andre, P., Haaland, I., Roth, C. and Wohlfart, J. (2021), ‘Narratives about the macroeconomy’.

Apel, M. and Grimaldi, M. (2012), ‘The information content of central bank minutes’, Riksbank Research
  Paper Series No. 92 .

Appadurai, A. (2004), ‘The capacity to aspire: Culture and the terms of recognition’, Culture and Public
  Action, ed. Vijayendra Rao and Michael Walton, Stanford, California: Stanford University Press pp. 59–
  84.

Armborst, A. (2017), ‘Thematic proximity in content analysis’, Sage Open 7(2), 2158244017707797.

Ashwin, J., Kalamara, E. and Saiz, L. (2021), ‘Nowcasting euro area gdp with news sentiment: a tale of two
  crises’.

Baker, S. R., Bloom, N. and Davis, S. J. (2016), ‘Measuring economic policy uncertainty’, The Quarterly
  Journal of Economics 131(4), 1593–1636.

Berezkin, Y. E. (2015), ‘Folklore and mythology catalogue: its lay-out and potential for research’, The
  Retrospective Methods Network (S10), 58–70.

Blei, D. M. and McAuliffe, J. D. (2008), Supervised topic models, in ‘Advances in Neural Information
  Processing Systems’, pp. 121–128.

Bonikowski, B. and DiMaggio, P. (2022), ‘Mapping culture with latent class analysis: A response to eger
  and hjerm’, Nations and Nationalism 28(1), 353–365.

Bonikowski, B. and Nelson, L. K. (2022), ‘From ends to means: The promise of computational text analysis
  for theoretically driven sociological research’, Sociological Methods & Research 51(4), 1469–1483.

Callard, A. (2018), Aspiration: The agency of becoming, Oxford University Press.

Chen, N.-C., Drouhard, M., Kocielnik, R., Suh, J. and Aragon, C. R. (2018), ‘Using machine learning
  to support qualitative coding in social science: Shifting the focus to ambiguity’, ACM Transactions on
  Interactive Intelligent Systems (TiiS) 8(2), 1–20.

Correa, R., Garud, K., Londono, J. M., Mislang, N. et al. (2017), ‘Constructing a dictionary for financial
  stability’, IFDP notes. Board of Governors of the Federal Reserve System, Washington, DC .

Crowston, K., Liu, X. and Allen, E. E. (2010), ‘Machine learning and rule-based automated coding of
  qualitative data’, proceedings of the American Society for Information Science and Technology 47(1), 1–
  2.

Elbers, C., Lanjouw, J. O. and Lanjouw, P. (2003), ‘Micro-level estimation of poverty and inequality’,
  Econometrica 71(1), 355–364.

Espeland, W. N. and Stevens, M. L. (1998), ‘Commensuration as a social process’, Annual review of sociol-
  ogy 24(1), 313–343.

Ferguson-Cradler, G. (2021), ‘Narrative and computational text analysis in business and economic history’,
  Scandinavian Economic History Review pp. 1–25.

Filmer, D. and Pritchett, L. H. (2001), ‘Estimating wealth effects without expenditure data—or tears: an
  application to educational enrollments in states of india’, Demography 38(1), 115–132.

Fruttero, A., Muller, N. and Calvo-Gonzalez, O. (2021), The power and roots of aspirations, Technical
  report, World Bank.

Genicot, G. and Ray, D. (2017), ‘Aspirations and inequality’, Econometrica 85(2), 489–519.

Genicot, G. and Ray, D. (2020), ‘Aspirations and economic behavior’, Annual Review of Economics 12.

Gentzkow, M., Kelly, B. and Taddy, M. (2019), ‘Text as data’, Journal of Economic Literature 57(3), 535–
  74.

Hansen, S., McMahon, M. and Prat, A. (2018), ‘Transparency and deliberation within the fomc: a compu-
  tational linguistics approach’, The Quarterly Journal of Economics 133(2), 801–870.

Jayachandran, S., Biradavolu, M. and Cooper, J. (2021), Using machine learning and qualitative interviews
  to design a five-question women’s agency index, Technical Report 21-104, Northwestern Global Poverty
  Research Lab Working Paper.

Karamshuk, D., Shaw, F., Brownlie, J. and Sastry, N. (2017), ‘Bridging big data and qualitative methods
  in the social sciences: A case study of twitter responses to high profile deaths by suicide’, Online Social
  Networks and Media 1, 33–43.

Larsen, V. H., Thorsrud, L. A. and Zhulanova, J. (2021), ‘News-driven inflation expectations and informa-
  tion rigidities’, Journal of Monetary Economics 117, 507–520.

Liew, J. S. Y., McCracken, N., Zhou, S. and Crowston, K. (2014), Optimizing features in active machine
  learning for complex qualitative content analysis, in ‘Proceedings of the ACL 2014 Workshop on Lan-
  guage Technologies and Computational Social Science’, pp. 44–48.

Loughran, T. and McDonald, B. (2011), ‘When is a liability not a liability? textual analysis, dictionaries,
  and 10-ks’, The Journal of Finance 66(1), 35–65.

Mann, K. and Püttmann, L. (2018), ‘Benign effects of automation: New evidence from patent texts’,
  Available at SSRN 2959584.

Michalopoulos, S. and Xue, M. M. (2021), ‘Folklore’, The Quarterly Journal of Economics 136(4), 1993–
  2046.

Nimark, K. P. and Pitschner, S. (2019), ‘News media and delegated information choice’, Journal of Eco-
  nomic Theory 181, 160–196.

Nyman, R., Kapadia, S. and Tuckett, D. (2021), ‘News and narratives in financial systems: exploiting big
  data for systemic risk assessment’, Journal of Economic Dynamics and Control 127, 104119.

Parthasarathy, R., Rao, V. and Palaniswamy, N. (2019), ‘Deliberative democracy in an unequal world: A
  text-as-data study of south india’s village assemblies’, American Political Science Review 113(3), 623–
  640.

Rao, V. (2022), ‘Can economics become more reflexive? exploring the potential of mixed-methods’, World
  Bank Group Policy Research Working Paper (9918).

Ray, D. (2006), ‘Aspirations, poverty, and economic change’, Understanding poverty 1, 409–421.

Roberts, M. E., Stewart, B. M. and Airoldi, E. M. (2016), ‘A model of text for experimentation in the social
  sciences’, Journal of the American Statistical Association 111(515), 988–1003.

Romer, C. D. and Romer, D. H. (2004), ‘A new measure of monetary shocks: Derivation and implications’,
  American Economic Review 94(4), 1055–1084.

Shapiro, A. H., Sudhof, M. and Wilson, D. J. (2020), ‘Measuring news sentiment’, Journal of Econometrics.

Shiller, R. J. (2020), Narrative economics, Princeton University Press.

Small, M. L. (2009), ‘“How many cases do I need?” On science and the logic of case selection in field-based
  research’, Ethnography 10(1), 5–38.

Tetlock, P. C. (2007), ‘Giving content to investor sentiment: The role of media in the stock market’, The
  Journal of Finance 62(3), 1139–1168.

Wiedemann, G. (2019), ‘Proportional classification revisited: Automatic content analysis of political mani-
  festos using active learning’, Social Science Computer Review 37(2), 135–159.

World Bank (2019), Cox’s bazaar - baseline survey (april 2019-august 2019): Baseline information docu-
  ment, Technical report, Poverty Global Practice, World Bank.

Yordanova, K. Y., Demiray, B., Mehl, M. R. and Martin, M. (2019), Automatic detection of everyday social
  behaviours and environments from verbatim transcripts of daily conversations, in ‘2019 IEEE International
  Conference on Pervasive Computing and Communications (PerCom)’, IEEE, pp. 1–10.

List of Tables
  1    Quantitative variable summary statistics
  2    Qualitative human annotations summary
  3    Measurement error variances
  4    Educational ambition variables and household characteristics
  5    Employment ambition variables and household characteristics
  6    Other ambition variables and household characteristics
  7    Aspiration variables and household characteristics
  8    Ability, Budget and household characteristics
  9    Other Navigational Capacity variables and household characteristics
  10   Quant Education Ambition
  11   Cost Benefit Scenarios
  12   Definitions and Examples from transcripts of Aspiration
  13   Definitions and Examples from transcripts of Ambition
  14   Definitions and Examples from transcripts of Navigational Capacity
  15   Coding religious education
  16   Resolving disagreement
  17   Statistical methods for text vectorization
  18   Pre-trained embeddings for text vectorization
  19   Classifier Options I
  20   Classifier Options II
  21   Classifier Options III
  22   Classifier Options III
  23   Annotations with evidence of bias
  24   Quant Ambition: full results with coefficients for all quant variables




                 Table 1: Quantitative variable summary statistics

Statistic                       N      Mean     St. Dev.   Notes
Refugee status                 2,407   0.460     0.498     Dummy variable, 1 for refugee
Female eldest child            2,294   0.525     0.499     Dummy variable, 1 for female
Eldest child’s age             2,295   10.069    4.963     Integer
Female household head          2,407   0.185     0.388     Dummy variable
Household head’s age           2,287   33.231   10.425     Integer
Number of children             2,405   2.675     1.463     Integer
Parent’s years of education    2,398   3.548     3.837     Integer
Parent’s religious education   2,398   0.036     0.186     Dummy variable
Asset Index                    2,406   0.147     1.820     Principal Component of assets
                                                           owned following
                                                           Filmer and Pritchett (2001)
Household Income               2,407   1.125     2.340     Income for last month
                                                           in 10,000s Bangladeshi taka
Trauma Event Score             2,287   2.641     2.410     Sum of positive responses for
                                                           experience of twelve possible
                                                           traumatic events following
                                                           Harvard Trauma Questionnaire
Quant Education Ambition       1,267   4.272     1.657     Ordered categorical (1-7) from
                                                           question on parents’ ambitions
                                                           for eldest child’s education




                Table 2: Qualitative human annotations summary

                                           Proportion of QA pairs   Proportion of interviews
Category      Annotation                    R2      R3     Total     R2      R3       Total
Aspirations   Religious                    0.036   0.028   0.030    0.208   0.234    0.230
              Secular                      0.053   0.028   0.036    0.332   0.332    0.360
Ambition      No Ambition                  0.021   0.013   0.016    0.129   0.129    0.137
              Salaried Employment          0.094   0.101   0.099    0.310   0.362    0.354
              Vocational Training          0.007   0.004   0.005    0.041   0.041    0.045
              Entrepreneur                 0.031   0.012   0.018    0.154   0.122    0.150
              Education Low                0.013   0.035   0.028    0.094   0.475    0.312
              Education Neutral            0.185   0.062   0.101    0.772   0.574    0.691
              Education High               0.064   0.048   0.053    0.375   0.454    0.427
              Education Religious          0.035   0.016   0.022    0.210   0.168    0.198
              Marriage                     0.082   0.036   0.050    0.385   0.396    0.418
              Migration                    0.022   0.007   0.012    0.104   0.079    0.097
Capacity      Vague Non-Specific           0.066   0.017   0.033    0.420   0.234    0.349
              Reliance on God              0.039   0.017   0.024    0.243   0.228    0.253
              Ability High                 0.048   0.032   0.037    0.311   0.424    0.391
              Ability Low                  0.035   0.033   0.034    0.230   0.360    0.321
              Budget High                  0.033   0.013   0.020    0.200   0.188    0.212
              Budget Low                   0.111   0.039   0.062    0.522   0.401    0.492
              Awareness Information High   0.070   0.031   0.043    0.387   0.297    0.367
              Awareness Information Low    0.007   0.012   0.010    0.061   0.145    0.114
Other         Covid Impacts                0.022   0.021   0.021    0.149   0.272    0.226
              Public Assistance            0.020   0.005   0.010    0.137   0.058    0.107
              Worries Anxieties            0.049   0.014   0.025    0.268   0.175    0.242




                                          Table 3: Measurement error variances

             Category        Annotation                     $\hat{\sigma}^2_h$   $\hat{\sigma}^2_{\hat{y}}$   $\hat{\sigma}^2_{\varepsilon}$   $\widehat{se}_h$   $\widehat{se}_{enh}$
             Aspirations     Religious                          0.0060     0.0073     0.0020      0.0027     0.0018
                             Aspirations Secular                0.0090     0.0084     0.0042      0.0034     0.0022
             Ambition        No Ambition                        0.0015     0.0010     0.0010      0.0014     0.0009
                             Salaried Employment                0.0156     0.0175     0.0055      0.0045     0.0029
                             Vocational Training                0.0014     0.0010     0.0003      0.0013     0.0007
                             Entrepreneur                       0.0053     0.0075     0.0015      0.0026     0.0018
                             Education High                     0.0093     0.0090     0.0055      0.0034     0.0023
                             Education Neutral                  0.0245     0.0267     0.0108      0.0056     0.0037
                             Education Low                      0.0027     0.0023     0.0014      0.0019     0.0012
                             Education Religious                0.0047     0.0049     0.0023      0.0024     0.0016
                             Marriage                           0.0133     0.0127     0.0016      0.0041     0.0024
                             Migration                          0.0042     0.0026     0.0007      0.0023     0.0012
             Capacity        Vague Non-Specific                 0.0062     0.0073     0.0049      0.0028     0.0021
                             Reliance on God                    0.0041     0.0043     0.0020      0.0023     0.0015
                             Ability High                       0.0064     0.0098     0.0036      0.0029     0.0021
                             Ability Low                        0.0057     0.0050     0.0038      0.0027     0.0018
                             Budget High                        0.0046     0.0055     0.0025      0.0024     0.0017
                             Budget Low                         0.0156     0.0116     0.0060      0.0044     0.0026
                             Awareness Information High         0.0091     0.0096     0.0070      0.0034     0.0024
                             Awareness Information Low          0.0010     0.0008     0.0010      0.0011     0.0008


Note: This Table reports the measurement error variances and standard error on the sample mean for each annotation. The
$\hat{\sigma}^2_h$ column reports the variance of the annotation in the human sample. The $\hat{\sigma}^2_{\hat{y}}$ column
reports the variance of the machine annotated sample, across all bootstraps. The $\hat{\sigma}^2_{\varepsilon}$ column reports
the variance of validation set errors. Finally, $\widehat{se}_h$ and $\widehat{se}_{enh}$ represent the
standard errors of the sample mean in the human and enhanced samples respectively, taking the measurement error adjustments
into account.




                      Table 4: Educational ambition variables and household characteristics

                                                                       Dependent variable:
                                 Education High        Education Neutral              Education Low           Education Religious
                                (1)         (2)         (3)              (4)         (5)        (6)            (7)           (8)
R3                            −0.029∗    −0.021∗∗    −0.122∗∗∗        −0.169∗∗∗   0.037∗∗∗    0.040∗∗∗       −0.005        0.0004
                              (0.016)     (0.009)     (0.023)          (0.012)     (0.009)    (0.004)        (0.012)       (0.006)

Machine annotated                          0.001                      0.027∗∗∗               −0.004∗∗                      0.0003
                                          (0.004)                      (0.006)                (0.002)                      (0.003)

Refugee                       −0.005      −0.008      0.032∗∗         0.026∗∗∗      0.005      0.004          0.012         0.003
                              (0.011)     (0.006)     (0.016)          (0.008)     (0.006)    (0.003)        (0.008)       (0.004)

Number of children            −0.0004     −0.001      −0.005          −0.004∗      −0.002     −0.001         −0.003        −0.001
                              (0.003)     (0.001)     (0.004)         (0.002)      (0.001)    (0.001)        (0.002)       (0.001)

Female HH head                −0.002       0.003       0.008           −0.003      −0.004      0.001         −0.011        −0.001
                              (0.009)     (0.005)     (0.014)          (0.007)     (0.005)    (0.002)        (0.007)       (0.004)

Age of HH head                 0.0001    0.00001     −0.0002          −0.001∗∗∗   −0.0002     0.00005        0.0004        0.0001
                              (0.0004)   (0.0002)    (0.001)           (0.0003)   (0.0002)    (0.0001)      (0.0003)      (0.0002)

Parent’s years of education   0.005∗∗∗   0.004∗∗∗     −0.002            0.001     −0.001∗    −0.001∗∗        −0.001       −0.001∗
                               (0.001)    (0.001)     (0.002)          (0.001)    (0.001)    (0.0003)        (0.001)      (0.0004)

Religiously educated parent    0.023       0.004      −0.020            0.001      −0.008     −0.006          0.008       0.018∗∗
                              (0.018)     (0.011)     (0.026)          (0.015)     (0.010)    (0.005)        (0.013)      (0.008)

Female eldest child           −0.006     −0.013∗∗∗    −0.008            0.007       0.001     −0.001        −0.011∗∗     −0.014∗∗∗
                              (0.007)     (0.004)     (0.011)          (0.006)     (0.004)    (0.002)        (0.005)      (0.003)

Age of eldest child           0.0004      0.0001      −0.001           0.0002     −0.0003    −0.0004∗∗      −0.001∗      −0.001∗∗∗
                              (0.001)    (0.0003)     (0.001)         (0.0004)    (0.0003)    (0.0001)      (0.0004)      (0.0002)

HH asset index                 0.003     0.005∗∗∗      0.002            0.001      −0.001    −0.002∗∗         0.001         0.001
                              (0.003)     (0.002)     (0.005)          (0.002)     (0.002)    (0.001)        (0.002)       (0.001)

HH income                      0.002      −0.001       0.003          −0.0001      −0.001     −0.0001        −0.002        −0.001
                              (0.002)     (0.001)     (0.003)         (0.001)      (0.001)    (0.0005)       (0.002)       (0.001)

Parent trauma experience      −0.002       0.001      −0.001           0.0003      0.0003      0.001         0.0003         0.001
                              (0.002)     (0.001)     (0.002)          (0.001)     (0.001)    (0.0004)       (0.001)       (0.001)

Constant                      0.054∗∗∗   0.069∗∗∗    0.243∗∗∗         0.247∗∗∗    0.029∗∗∗    0.014∗∗∗      0.047∗∗∗      0.054∗∗∗
                               (0.017)    (0.010)     (0.025)          (0.013)     (0.009)     (0.004)       (0.012)       (0.007)

Observations                    696        2,177        696             2,177       696        2,177          696          2,177
R²                             0.086       0.070       0.251            0.297      0.104       0.149         0.060         0.050
F Statistic                   5.347∗∗∗   12.481∗∗∗   19.106∗∗∗        70.149∗∗∗   6.591∗∗∗   29.047∗∗∗      3.662∗∗∗      8.842∗∗∗
Note:                                                                                                 ∗ p<0.1; ∗∗ p<0.05; ∗∗∗ p<0.01




        Table 5: Employment ambition variables and household characteristics

                                                          Dependent variable:
                              Salaried Employment         Vocational Training            Entrepreneur
                                (1)         (2)            (3)          (4)            (5)           (6)
R3                             0.021       0.014         −0.005      −0.002          −0.013      −0.028∗∗∗
                              (0.021)     (0.012)        (0.006)     (0.003)         (0.011)      (0.007)

Machine annotated                         −0.002                     −0.002                       0.007∗∗
                                          (0.006)                    (0.001)                      (0.003)

Refugee                       −0.014     −0.013∗         −0.002       0.0003         −0.009        −0.001
                              (0.014)    (0.008)         (0.004)      (0.002)        (0.007)       (0.005)

Number of children             0.001     −0.003∗         −0.001       0.0005          0.003        0.002∗
                              (0.003)    (0.002)         (0.001)     (0.0004)        (0.002)       (0.001)

Female HH head                −0.026∗∗    −0.001         −0.001      −0.002         0.014∗∗         0.001
                               (0.012)    (0.007)        (0.004)     (0.002)        (0.007)        (0.004)

Age of HH head                0.0001     −0.00003        −0.0001    −0.00002         0.0004        0.0002
                              (0.001)    (0.0003)        (0.0002)   (0.0001)        (0.0003)      (0.0002)

Parent’s years of education   0.006∗∗∗   0.004∗∗∗         −0.001     −0.0002        −0.0004       −0.001∗
                               (0.001)    (0.001)        (0.0004)    (0.0002)       (0.001)       (0.0005)

Religiously educated parent    0.008      −0.009          0.010      0.007∗∗         −0.007      −0.021∗∗
                              (0.023)     (0.014)        (0.007)     (0.003)         (0.013)      (0.009)

Female eldest child           −0.018∗    −0.012∗∗         0.004      0.003∗∗        −0.012∗∗     −0.013∗∗∗
                              (0.009)     (0.006)        (0.003)     (0.001)         (0.005)      (0.003)

Age of eldest child           −0.0002    −0.0003          0.0001     0.00005       −0.00005        0.0003
                              (0.001)    (0.0004)        (0.0002)    (0.0001)      (0.0004)       (0.0003)

HH asset index                 0.005       0.003         0.0002       0.0001          0.001        −0.001
                              (0.004)     (0.002)        (0.001)     (0.0005)        (0.002)       (0.001)

HH income                     −0.0003      0.002         0.0003      0.00001         −0.001        −0.001
                              (0.003)     (0.001)        (0.001)     (0.0003)        (0.002)       (0.001)

Parent trauma experience      0.0001       0.002         −0.001      −0.0003         −0.001       −0.0001
                              (0.002)     (0.001)        (0.001)     (0.0003)        (0.001)      (0.001)

Constant                      0.095∗∗∗   0.112∗∗∗        0.014∗∗      0.005∗          0.017       0.026∗∗∗
                               (0.022)    (0.013)        (0.007)      (0.003)        (0.012)       (0.008)

Observations                    696       2,177            696        2,177           696          2,177
R²                             0.094      0.050           0.015       0.010          0.042         0.040
F Statistic                   5.883∗∗∗   8.777∗∗∗         0.874       1.662∗        2.467∗∗∗      6.927∗∗∗
Note:                                                                         ∗ p<0.1; ∗∗ p<0.05; ∗∗∗ p<0.01
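
As a reading aid for the significance stars used throughout these tables, the sketch below recomputes a star level from a coefficient and the standard error printed in parentheses beneath it. It assumes a two-sided test with a standard normal approximation, which the tables do not specify, and the function name `stars` is hypothetical. Because the printed values are rounded, borderline entries may not reproduce exactly.

```python
import math


def stars(coef: float, se: float) -> str:
    """Return significance stars using the tables' thresholds.

    Assumption (not stated in the source): two-sided p-value from a
    standard normal approximation to the test statistic coef / se.
    """
    z = abs(coef / se)
    # Two-sided tail probability of a standard normal variable.
    p = math.erfc(z / math.sqrt(2))
    if p < 0.01:
        return "***"
    if p < 0.05:
        return "**"
    if p < 0.1:
        return "*"
    return ""


# Entries as printed in Table 5 (rounded to three decimals).
examples = {
    "R3, column (6)": (-0.028, 0.007),
    "Female HH head, column (1)": (-0.026, 0.012),
    "R3, column (1)": (0.021, 0.021),
}
for label, (coef, se) in examples.items():
    print(f"{label}: {coef} ({se}) {stars(coef, se)}")
```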




               Table 6: Other ambition variables and household characteristics

                                                            Dependent variable:
                                   No Ambition                     Marriage                        Migration
                                 (1)         (2)             (3)              (4)            (5)               (6)
R3                              0.004       0.003         −0.086∗∗∗     −0.093∗∗∗         −0.013          −0.001
                               (0.007)     (0.003)         (0.018)        (0.010)         (0.008)         (0.005)

Machine annotated                        −0.005∗∗∗                       −0.006                            0.002
                                          (0.001)                        (0.005)                          (0.002)

Refugee                       0.014∗∗∗    0.005∗∗∗          0.007          0.007           0.007           0.002
                               (0.004)     (0.002)         (0.012)        (0.006)         (0.006)         (0.003)

Number of children            −0.001      −0.0003           0.004         0.003∗           0.002          0.0001
                              (0.001)     (0.0004)         (0.003)        (0.002)         (0.001)         (0.001)

Female HH head                −0.008∗∗     −0.002           0.007        −0.0002          −0.003           0.001
                               (0.004)     (0.002)         (0.011)       (0.006)          (0.005)         (0.003)

Age of HH head                0.0003∗      0.0001         −0.0004        −0.0004          −0.0001          0.0002
                              (0.0002)    (0.0001)        (0.0005)       (0.0002)         (0.0002)        (0.0001)

Parent’s years of education   −0.00002    −0.0001          −0.001         0.0002          0.00000         −0.001∗
                              (0.0004)    (0.0002)         (0.001)        (0.001)         (0.001)         (0.0003)

Religiously educated parent   −0.006       −0.001          −0.008        −0.013            0.006          −0.001
                              (0.008)      (0.003)         (0.020)       (0.012)          (0.010)         (0.006)

Female eldest child             0.001       0.001         0.033∗∗∗       0.045∗∗∗        −0.011∗∗∗       −0.008∗∗∗
                               (0.003)     (0.001)         (0.008)        (0.005)         (0.004)         (0.002)

Age of eldest child           0.00005      0.0001          0.002∗∗       0.002∗∗∗          0.0001        −0.0002
                              (0.0002)    (0.0001)         (0.001)       (0.0004)         (0.0003)       (0.0002)

HH asset index                −0.002       −0.001         −0.007∗        −0.003           0.0004         −0.0004
                              (0.001)     (0.0005)        (0.003)        (0.002)          (0.002)        (0.001)

HH income                     −0.00000    −0.0001          0.004∗        −0.0003          −0.001         −0.00003
                               (0.001)    (0.0003)         (0.002)       (0.001)          (0.001)         (0.001)

Parent trauma experience       0.0001     −0.0002           0.001        −0.001           −0.001           −0.001
                               (0.001)    (0.0003)         (0.002)       (0.001)          (0.001)         (0.0005)

Constant                      −0.010       0.005∗         0.049∗∗∗       0.053∗∗∗         0.021∗∗         0.018∗∗∗
                              (0.007)      (0.003)         (0.019)        (0.011)         (0.009)          (0.005)

Observations                    696        2,177            696           2,177             696            2,177
R²                             0.064       0.030           0.098          0.105            0.034           0.015
F Statistic                   3.888∗∗∗    5.211∗∗∗        6.199∗∗∗      19.488∗∗∗         2.025∗∗         2.619∗∗∗
Note:                                                                               ∗ p<0.1; ∗∗ p<0.05; ∗∗∗ p<0.01




  Table 7: Aspiration variables and household characteristics

                                                Dependent variable:
                                      Secular                           Religious
                                (1)              (2)              (3)               (4)
R3                            −0.035∗∗        −0.029∗∗∗         0.016           0.012
                              (0.015)          (0.008)         (0.013)         (0.008)

Machine annotated                              0.008∗∗                         0.006∗
                                               (0.004)                         (0.004)

Refugee                       −0.005           −0.005         −0.00002         −0.002
                              (0.010)          (0.005)         (0.009)         (0.005)

Number of children             0.002           −0.001          −0.002          −0.001
                              (0.003)          (0.001)         (0.002)         (0.001)

Female HH head                 0.002           −0.002          −0.012           0.001
                              (0.009)          (0.005)         (0.008)         (0.005)

Age of HH head                −0.0005         −0.0004∗∗         0.001∗          0.0002
                              (0.0004)         (0.0002)        (0.0003)        (0.0002)

Parent’s years of education   0.002∗          0.002∗∗∗         −0.0002         −0.001
                              (0.001)          (0.001)         (0.001)         (0.001)

Religiously educated parent    0.003            0.008           0.015          0.017∗
                              (0.017)          (0.010)         (0.015)         (0.010)

Female eldest child           −0.013∗         −0.012∗∗∗       −0.021∗∗∗       −0.022∗∗∗
                              (0.007)          (0.004)         (0.006)         (0.004)

Age of eldest child           0.0002          0.00003         −0.001∗∗        −0.001∗∗∗
                              (0.001)         (0.0003)        (0.0005)         (0.0003)

HH asset index                −0.001           −0.001          −0.005∗         −0.002
                              (0.003)          (0.001)         (0.003)         (0.001)

HH income                     0.0002            0.001          −0.001          −0.001
                              (0.002)          (0.001)         (0.002)         (0.001)

Parent trauma experience      −0.001           0.00003         0.0002           0.001
                              (0.002)          (0.001)         (0.001)         (0.001)

Constant                      0.074∗∗∗        0.078∗∗∗         0.043∗∗∗        0.060∗∗∗
                               (0.016)         (0.009)          (0.014)         (0.009)

Observations                    696            2,177             696            2,177
R²                             0.044           0.050            0.051           0.040
F Statistic                   2.595∗∗∗        8.790∗∗∗         3.046∗∗∗        6.893∗∗∗
Note:                                                    ∗ p<0.1; ∗∗ p<0.05; ∗∗∗ p<0.01




                               Table 8: Ability, Budget and household characteristics

                                                                       Dependent variable:
                                   Ability Low           Ability High                  Budget Low                 Budget High
                                (1)          (2)       (3)              (4)          (5)        (6)            (7)           (8)
R3                             0.004       −0.005    −0.019      −0.048∗∗∗       −0.070∗∗∗   −0.062∗∗∗       −0.020∗     −0.034∗∗∗
                              (0.012)      (0.006)   (0.013)          (0.007)      (0.019)    (0.010)        (0.011)      (0.006)

Machine annotated                           0.002                      0.005                  −0.003                        0.005
                                           (0.003)                    (0.003)                 (0.005)                      (0.003)

Refugee                       −0.015∗    −0.018∗∗∗    0.002            0.001     −0.036∗∗∗   −0.030∗∗∗        0.004         0.003
                              (0.008)     (0.004)    (0.009)          (0.005)     (0.012)     (0.007)        (0.007)       (0.004)

Number of children            0.004∗      0.004∗∗∗    0.001           −0.001       0.007∗∗    0.006∗∗∗       −0.0002      −0.002∗
                              (0.002)      (0.001)   (0.002)          (0.001)      (0.003)     (0.002)       (0.002)      (0.001)

Female HH head                 0.006        0.004    −0.006          −0.009∗∗      0.019∗      0.004         0.0002        −0.005
                              (0.007)      (0.004)   (0.008)          (0.004)      (0.011)    (0.006)        (0.006)       (0.003)

Age of HH head                 0.0003      0.0001    −0.0005     −0.0004∗∗         0.0001      0.0001        −0.0003      −0.0002
                              (0.0003)    (0.0002)   (0.0003)     (0.0002)        (0.0005)    (0.0002)       (0.0003)     (0.0001)

Parent’s years of education   −0.001∗    −0.002∗∗∗   0.003∗∗∗        0.003∗∗∗    −0.003∗∗∗   −0.003∗∗∗       0.002∗∗∗     0.002∗∗∗
                              (0.001)     (0.0004)    (0.001)        (0.0005)     (0.001)     (0.001)         (0.001)     (0.0004)

Religiously educated parent   −0.016      −0.017∗∗    0.002            0.007        0.006     −0.001         −0.0005       −0.002
                              (0.014)      (0.008)   (0.015)          (0.009)      (0.021)    (0.012)        (0.012)       (0.007)

Female eldest child           −0.006        0.003     0.004            0.001        0.008     0.008∗         −0.003        −0.002
                              (0.006)      (0.003)   (0.006)          (0.003)      (0.009)    (0.005)        (0.005)       (0.003)

Age of eldest child           −0.0003      0.0001     0.0001          0.0003       0.0003      0.0003        −0.0001       0.0001
                              (0.0004)    (0.0002)   (0.0005)        (0.0003)      (0.001)    (0.0004)       (0.0004)     (0.0002)

HH asset index                −0.005∗    −0.004∗∗∗    0.003            0.001       −0.005    −0.006∗∗∗        0.003         0.001
                              (0.002)     (0.001)    (0.003)          (0.001)      (0.004)    (0.002)        (0.002)       (0.001)

HH income                     −0.002       −0.001    0.003∗            0.001       −0.004     −0.002∗        0.003∗∗        0.001
                              (0.002)      (0.001)   (0.002)          (0.001)      (0.003)    (0.001)        (0.001)       (0.001)

Parent trauma experience      −0.002∗∗    −0.001∗     0.001            0.001        0.001     −0.0002        0.0004       −0.0003
                               (0.001)    (0.001)    (0.001)          (0.001)      (0.002)    (0.001)        (0.001)      (0.001)

Constant                      0.041∗∗∗    0.041∗∗∗   0.052∗∗∗        0.072∗∗∗     0.106∗∗∗    0.099∗∗∗       0.038∗∗∗     0.048∗∗∗
                               (0.013)     (0.007)    (0.014)         (0.008)      (0.020)     (0.011)        (0.011)      (0.006)

Observations                    696        2,177       696             2,177        696        2,177           696         2,177
R²                             0.036       0.036      0.059            0.095       0.125       0.097          0.081        0.080
F Statistic                   2.127∗∗     6.205∗∗∗   3.589∗∗∗        17.400∗∗∗    8.104∗∗∗   17.841∗∗∗       4.994∗∗∗    14.381∗∗∗
Note:                                                                                                 ∗ p<0.1; ∗∗ p<0.05; ∗∗∗ p<0.01




                      Table 9: Other Navigational Capacity variables and household characteristics

                                                                          Dependent variable:
                              Awareness Information Low   Awareness Information High        Reliance On God            Vague Non-specific
                                 (1)           (2)           (3)             (4)            (5)         (6)              (7)           (8)
R3                            −0.002          0.003       −0.038∗∗        −0.024∗∗∗      −0.023∗∗    −0.014∗∗∗       −0.037∗∗∗     −0.040∗∗∗
                              (0.005)        (0.003)       (0.016)         (0.008)        (0.011)     (0.005)         (0.012)       (0.006)

Machine annotated                          −0.003∗∗∗                        0.002                     0.0002                          0.003
                                            (0.001)                        (0.004)                    (0.003)                        (0.003)

Refugee                         0.001       −0.0002        0.0004          −0.008       −0.028∗∗∗    −0.014∗∗∗       −0.031∗∗∗     −0.011∗∗∗
                               (0.004)      (0.002)        (0.010)         (0.006)       (0.007)      (0.004)         (0.008)       (0.004)

Number of children            −0.001        −0.001∗        −0.003         −0.002∗         −0.001      −0.001          −0.003         −0.001
                              (0.001)       (0.0004)       (0.003)        (0.001)         (0.002)     (0.001)         (0.002)        (0.001)

Female HH head                −0.003          0.001         0.010           0.003         0.010∗       0.002          −0.008         −0.002
                              (0.003)        (0.002)       (0.009)         (0.005)        (0.006)     (0.003)         (0.007)        (0.004)

Age of HH head                0.00000       0.0001∗       −0.00001        0.00001        −0.0004     −0.0002          −0.0002      −0.0003∗
                              (0.0001)      (0.0001)      (0.0004)        (0.0002)       (0.0003)    (0.0001)         (0.0003)     (0.0002)

Parent’s years of education   −0.001∗∗     −0.0004∗∗       0.003∗∗         0.001∗∗       −0.002∗∗    −0.001∗∗∗        −0.001         0.0003
                              (0.0004)      (0.0002)       (0.001)         (0.001)        (0.001)     (0.0003)        (0.001)       (0.0004)

Religiously educated parent   −0.002         −0.001        −0.003          −0.004         −0.011      −0.003           0.004          0.006
                              (0.006)        (0.003)       (0.017)         (0.010)        (0.012)     (0.006)         (0.014)        (0.007)

Female eldest child            0.005∗       0.004∗∗∗       −0.001         −0.009∗∗         0.007       0.003           0.003         −0.001
                               (0.003)       (0.001)       (0.007)         (0.004)        (0.005)     (0.003)         (0.006)        (0.003)

Age of eldest child            0.0001      −0.00003       −0.00004        −0.0002         0.0001      0.0001           −0.001       −0.0001
                              (0.0002)     (0.0001)        (0.001)        (0.0003)       (0.0004)    (0.0002)         (0.0004)      (0.0002)

HH asset index                −0.0005        0.0002         0.002          −0.002         −0.003     −0.0001           0.001          0.001
                              (0.001)       (0.0005)       (0.003)         (0.002)        (0.002)    (0.001)          (0.002)        (0.001)

HH income                      0.0001        0.0002         0.003          0.002∗∗         0.001      0.0001          −0.001         −0.001
                               (0.001)      (0.0003)       (0.002)         (0.001)        (0.001)     (0.001)         (0.002)        (0.001)

Parent trauma experience       0.0002        0.0001         0.001           0.001         0.0001      0.0003          −0.0002        0.0003
                               (0.001)      (0.0003)       (0.002)         (0.001)        (0.001)     (0.001)         (0.001)        (0.001)

Constant                       0.011∗        0.005∗       0.065∗∗∗        0.073∗∗∗       0.069∗∗∗    0.052∗∗∗         0.116∗∗∗      0.084∗∗∗
                               (0.006)       (0.003)       (0.016)         (0.009)        (0.011)     (0.006)          (0.013)       (0.007)

Observations                    696          2,177          696            2,177           696        2,177             696          2,177
R²                             0.023         0.016         0.069           0.038          0.069       0.028            0.166         0.121
F Statistic                    1.347        2.664∗∗∗      4.219∗∗∗        6.603∗∗∗       4.210∗∗∗    4.705∗∗∗        11.330∗∗∗     22.981∗∗∗
Note:                                                                                                           ∗ p<0.1; ∗∗ p<0.05; ∗∗∗ p<0.01




                                         Table 10: Quant Education Ambition

                                                                Dependent variable:
                                                                   eld edu ambition
                                        (1)          (2)           (3)          (4)             (5)          (6)
            Refugee                 −1.644∗∗∗    −1.499∗∗∗                                  −1.472∗∗∗    −1.379∗∗∗
                                      (0.216)      (0.117)                                   (0.203)      (0.111)

            Female eldest child       −0.120     −0.313∗∗∗                                    0.006       −0.189∗∗
                                      (0.142)     (0.077)                                    (0.136)       (0.076)

            Machine annotated                                                 0.228∗∗                       0.099
                                                                              (0.089)                      (0.076)

            No Ambition                                       −10.972∗∗∗    −9.305∗∗∗        −4.469∗     −5.138∗∗∗
                                                               (2.784)       (2.075)         (2.386)      (1.740)

            Salaried Employment                                2.663∗∗∗      1.587∗∗∗        1.793∗∗∗     1.181∗∗∗
                                                               (0.637)       (0.361)         (0.573)      (0.314)

            Vocational Training                                −3.879∗       −3.258∗         −2.416       −2.335
                                                               (1.978)       (1.709)         (1.657)      (1.429)

            Entrepreneur                                         0.550        −0.686         −1.615∗      −0.182
                                                                (1.033)       (0.543)        (0.946)      (0.465)

            Education Low                                     −4.184∗∗∗     −5.363∗∗∗        −1.429      −3.279∗∗∗
                                                               (1.564)       (1.040)         (1.347)      (0.915)

            Education Neutral                                  −0.860∗       −0.598∗∗        −0.015         0.128
                                                               (0.461)        (0.261)        (0.442)       (0.247)

            Education High                                     3.636∗∗∗      3.705∗∗∗        2.606∗∗∗     2.204∗∗∗
                                                               (0.810)       (0.495)         (0.747)      (0.441)

            Education Religious                               −3.264∗∗∗     −3.396∗∗∗        −1.385      −1.773∗∗∗
                                                               (0.962)       (0.591)         (0.850)      (0.522)

            Marriage                                          −1.853∗∗∗     −1.782∗∗∗       −1.911∗∗∗    −1.783∗∗∗
                                                               (0.709)       (0.391)         (0.661)      (0.359)

            Migration                                           −0.045        −1.083          1.560       −0.330
                                                                (1.257)       (0.817)        (1.390)      (0.760)

            Constant                 3.613∗∗∗     3.987∗∗∗     4.039∗∗∗      4.171∗∗∗        3.587∗∗∗     3.917∗∗∗
                                     (0.332)      (0.179)      (0.160)       (0.105)         (0.339)      (0.193)

            Observations                392        1,184          426          1,267           392         1,184
            R²                         0.411       0.389         0.286         0.206          0.515        0.466
            F Statistic              24.089∗∗∗   67.704∗∗∗     16.610∗∗∗     29.566∗∗∗      18.672∗∗∗    45.974∗∗∗
            Note:                                                                     ∗ p<0.1; ∗∗ p<0.05; ∗∗∗ p<0.01

Note: Coefficients on Number of children, Female HH head, Age of HH head, Parent’s years of education, Religiously educated
parent, Age of eldest child, HH asset index, HH income and Parent trauma experience are omitted to save space; full results are
shown in Appendix C.



                     Table 11: Cost Benefit Scenarios

Objective                                 Budget    Nh     Nm      Price
Average F statistic                       $10,000   500    200     $9,900
Ability Low Refugee effect                $10,000   500    200     $9,900
Secular Aspirations Female child effect   $10,000   100    600     $8,700
Average F statistic                       $15,000   500    600    $14,700
Ability Low Refugee effect                $15,000   500    600    $14,700
Secular Aspirations Female child effect   $15,000   200   1,000   $15,000
Average F statistic                       $20,000   500   1,000   $19,500
Ability Low Refugee effect                $20,000   300   1,200   $18,900
Secular Aspirations Female child effect   $20,000   300   1,200   $19,500
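
The Price column above can be checked against a simple linear cost. The sketch below assumes Nh counts human-coded interviews and Nm counts machine-annotated ones, and uses per-interview costs of $15 and $12; these unit costs are inferred from the table rows and are not quoted from the table itself. The names `price`, `SCENARIOS`, and `mismatches` are hypothetical.

```python
# Assumed unit costs in dollars, inferred from the Price column of Table 11.
COST_HUMAN = 15    # per human-coded interview (Nh)
COST_MACHINE = 12  # per machine-annotated interview (Nm)


def price(n_h: int, n_m: int) -> int:
    """Total cost of a scenario under the assumed linear cost."""
    return COST_HUMAN * n_h + COST_MACHINE * n_m


# (objective, budget, Nh, Nm, listed price) exactly as printed in Table 11.
SCENARIOS = [
    ("Average F statistic", 10_000, 500, 200, 9_900),
    ("Ability Low Refugee effect", 10_000, 500, 200, 9_900),
    ("Secular Aspirations Female child effect", 10_000, 100, 600, 8_700),
    ("Average F statistic", 15_000, 500, 600, 14_700),
    ("Ability Low Refugee effect", 15_000, 500, 600, 14_700),
    ("Secular Aspirations Female child effect", 15_000, 200, 1_000, 15_000),
    ("Average F statistic", 20_000, 500, 1_000, 19_500),
    ("Ability Low Refugee effect", 20_000, 300, 1_200, 18_900),
    ("Secular Aspirations Female child effect", 20_000, 300, 1_200, 19_500),
]

# Rows whose listed price differs from the assumed linear cost.
mismatches = [
    (objective, budget, n_h, n_m, listed, price(n_h, n_m))
    for objective, budget, n_h, n_m, listed in SCENARIOS
    if price(n_h, n_m) != listed
]

for row in mismatches:
    print("listed price differs from assumed linear cost:", row)
```

Under these assumed unit costs, every scenario stays within its budget and all rows but the last reproduce the listed price; the last row shares Nh and Nm with the row before it but lists a different price, and the table does not say which entry is intended.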




                   Table 12: Definitions and Examples from transcripts of Aspiration
Code         Subcode      Definition                                            Examples from transcripts
Aspiration   Religious    Religiously motivated aspirations for children.       Expressions of parental desires for their children that
                                                                                were coded for religious aspirations:
                                                                                • Ability to read Quran
                                                                                • Maintain Islamic covering
                                                                                • Prays regularly or Prays 5 times
                                                                                • Works in Islamic banks
                                                                                • Become a maulvi / alem / alemdar / elamdar /
                                                                                  mawlana [i.e., equivalent to an Islamic scholar]
                                                                                • Become hafiz / hafez [i.e., memorize Quran] or wants
                                                                                  to send to hafez khana [i.e., send to schooling that
                                                                                  primarily focuses on helping children memorize
                                                                                  Quran]
                                                                                • Send to noorani madrassa / school [i.e., schooling for
                                                                                  religious education equivalent to primary level]
                                                                                • Wants to send to madrassa [i.e., attend schooling
                                                                                  which follows religious curriculum]
                                                                                • Wants the child to learn/study Arabic

Aspiration   Secular      Expressions of parental aspirations in terms of       Expressions of parental desires for their children that
                          positive character traits, which can be intangi-      were coded for secular aspirations:
                          ble, or desire for unspecified positive things to     • Take care of wife and children and old parents by
                          happen to the child (e.g., hoping for a good life       doing jobs
                          partner for the child or hoping the child attains a   • Earn enough money to live a beautiful life
                          decent standard of living).                           • Be healthy and have a respectable job
                                                                                • If people recognize him [give him recognition]
                                                                                • Earn well and build a house
                                                                                • The more prosperous my child gets, the happier I
                                                                                  will be.
                                                                                • Make him a doctor for the good of the nation




                 Table 13: Definitions and Examples from transcripts of Ambition
Code       Subcode        Definition                                             Examples from transcripts
Ambition   No Ambition    Expressions of helplessness in context of am-
                          bitions or implied unwillingness to, or lack of        •   There is nothing to do except sitting quietly.
                          dream/plan.                                            •   I have no hope
                                                                                 •   There is no plan because I don’t understand
                                                                                 •   No hope for girls, they will get married

Ambition   Salaried Em-   Coded when specific job, occupation or work            Doctor, Government job, NGO job, Teacher in non-
           ployment       type was highlighted.                                  religious school
Ambition   Vocational     Any vocational training in the context of ambi-        Tailoring/Handicrafts
           Training       tion is mentioned.
Ambition   Entrepreneur   Coded when non-wage enterprise job is men-             Shopkeeper, business, own farm
                          tioned. Applies regardless of whether business
                          type is specified.
Ambition   Education      Coded when dreams for the child’s education
           Low            are lower or equivalent to higher secondary (for       • I hope to educate him up to tenth grade.
                          non-religious education) or noorani madrassa           • I had hoped to educate her up to SSC but now I can-
                          (for religious education). The code is not used          not educate her due to the lack of money.
                          if parent indicates the current status of the child,
                          e.g., “my child is studying at class 10”. For the
                          code to apply, it has to be a future ambition. Also,
                          code is not used if the education is not specific,
                          e.g., “I want to teach my child Arabic.”
Ambition   Education      Coded when education is mentioned in vague
           Neutral        terms. Also coded when “madrassa” is referred to       • I hope to get the boy educated till the end.
                          as a religious education ambition.                     • If he wants to study, then I will educate him as long
                                                                                   as he wants to.

Ambition   Education      Coded when dreams for the child’s education are
           High           above higher secondary (for non-religious edu-         • I want my child to study engineering.
                          cation) or for high religious education.               • I want my child to be a maulvi.

Ambition   Education      Coded along with all Aspiration:Religious as-          My child will become a:
           Religious      piration codes aside from when hafezi is men-          • Maulvi / Alem / Alemdar / Elamdar / Mawlana
                          tioned. However, code also if “sending to hafez        My child will go to:
                          khana” is a future dream.                              • noorani madrassa/school
                                                                                 • madrassa
                                                                                 • Hafez khana
                                                                                 • learn arabic

Ambition   Marriage       Coded any time marriage is mentioned in the
                          context of ambition.                                   • will get her married

Ambition   Migration      Any time ambition is related to leaving current
                          place of residence for work, studying or reset-        • Go abroad
                          tling.                                                 • Go back to Burma




               Table 14: Definitions and Examples from transcripts of Navigational Capacity
Code           Subcode        Definition                                            Examples from transcripts
Navigational   Vague/Non      When parent mentioned unspecific or unclear
Capacity       Specific       attempts/measures to help achieve dreams for          •   trying hard
                              child.                                                •   will do as much as I can
                                                                                    •   will do my best
                                                                                    •   let’s see what happens

Navigational   Reliance on    When either the parent fully/partially relies on
Capacity       God            God to fulfill future dream for children or is ful-   • even if there is hope, it depends on God willing
                              ly/partially reliant on God at present.               • god is running our lives somehow

Navigational   Ability High   Coded when the parent demonstrates having
Capacity                      gone the extra mile to ensure a better future for the • I am somehow managing my children’s education by
                              child. This needs to be coded inferentially, as         borrowing money from my brothers.
                              no specific sequence of repeating words/phrases       • We try to cover our expenditures by selling some of
                              can be strictly identified to classify instances of     the items from the monthly aid that we get. [Double
                              high ability.                                           coded with Budget Low]

Navigational   Ability Low    Coded when the parent specified having no re-
Capacity                      sources to help the child.                            • What can we do from here? We are having to stay
                                                                                      how we are.

Navigational   Budget High    Coded when the parent expresses having money,
Capacity                      including an ability to save or spend money.          • I am educating her anyway I can. By helping finan-
                                                                                      cially, with hard work, appointing a private tutor and
                                                                                      financing their education.

Navigational   Budget Low     Coded when the parent expresses not having
Capacity                      money.                                                • Hoping to teach her as per the ability Allah grants
                                                                                      me. However, if there is money involved, I cannot
                                                                                      educate her.

Navigational   Awareness      Coded when the parent displays awareness or in-
Capacity       Information    formation. Inferentially coded.                       • I talk to my husband, so that he doesn’t obstruct the
               High                                                                   children’s education in any way. There is nothing
                                                                                      to do here without education. If they do not study,
                                                                                      their future will be dark. To brighten their future,
                                                                                      they have to be educated in any way. We had places
                                                                                      and properties when we were in Myanmar. But now,
                                                                                      we don’t have anything here, except to study. That’s
                                                                                      why I am trying to educate my children. [Double
                                                                                      coded with High Ability]

Navigational   Awareness      Not knowing what to do, cluelessness.
Capacity       Information                                                          • Question: What kind of doctor would you be happy
               Low                                                                    with? Answer: He could be a popular doctor.




                            Table 15: Coding religious education

Statement                                 Code applied
Wants child to be a Maulvi/alem           Aspiration:Religious + Ambition:Education:Religious +
                                          Ambition:Education:High
Wants child to go to madrassa             Aspiration:Religious + Ambition:Education:Religious +
                                          Ambition:Education:Neutral
Wants to send child to noorani madrassa   Aspiration:Religious + Ambition:Education:Religious +
                                          Ambition:Education:Low
Wants child to be a hafez                 Aspiration:Religious
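
The coding rule in Table 15 can be read as a lookup from the type of religious-education statement to the set of codes applied. The following is a minimal sketch of that lookup, not the authors' coding procedure: the keys of `RULES` and the function name `codes_for` are hypothetical, and the religious-education subcode is written in the colon-separated form `Ambition:Education:Religious`.

```python
from typing import Dict, List

# Hypothetical encoding of the Table 15 rules; key names are illustrative only.
BASE = "Aspiration:Religious"
EDU_RELIGIOUS = "Ambition:Education:Religious"

RULES: Dict[str, List[str]] = {
    "maulvi_or_alem": [BASE, EDU_RELIGIOUS, "Ambition:Education:High"],
    "madrassa": [BASE, EDU_RELIGIOUS, "Ambition:Education:Neutral"],
    "noorani_madrassa": [BASE, EDU_RELIGIOUS, "Ambition:Education:Low"],
    # Hafez alone receives only the aspiration code.
    "hafez": [BASE],
}


def codes_for(statement_type: str) -> List[str]:
    """Return a copy of the codes Table 15 assigns to a statement type."""
    return list(RULES[statement_type])


print(codes_for("madrassa"))
```

Every row shares `Aspiration:Religious`; only the education level attached to it changes, and the hafez row carries no ambition code at all.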




                               Table 16: Resolving disagreement

Code                  Description                              AH              AK              MB
Salaried Employment   Coded when secular job, occupation       Reliable        Reliable        Fuzzy
                      or work type was highlighted.
Vocational training   Any vocational training in the context   Very Reliable   Very Reliable   Very Reliable
                      of ambition is mentioned.




                         Table 17: Statistical methods for text vectorization

TfidfVectorizer

Description: TfidfVectorizer is a method for converting text into numerical representations,
specifically term frequency-inverse document frequency (TF-IDF) vectors. It counts the frequency
of words in a document and down-weights the importance of commonly used words. This can be useful
for text classification tasks, as it allows the classifier to focus on the words that are most
distinctive to a particular document.

Hyperparameters (Options):
• ngram_range: The range of n-grams to consider when creating the vocabulary.
• min_df: The minimum number of documents a word must be in to be included in the vocabulary.
• max_df: The maximum number of documents a word can be in to be included in the vocabulary.
• max_features: The maximum number of words to keep in the vocabulary, based on word frequency.
• use_idf: A boolean flag indicating whether to use the inverse-document-frequency weighting.
• norm: The type of normalization to apply to the vectors.
• smooth_idf: A boolean flag indicating whether to smooth the idf values.
• sublinear_tf: A boolean flag indicating whether to apply sublinear scaling to the term frequency.

Hyperparameters (Used):
• max_features: The maximum number of words to keep in the vocabulary, based on word frequency.
  [1000, 10000]
• ngram_range: The lower and upper boundary of the range of n-values for different word n-grams
  to be extracted. { (1,1), (1,2), (1,3) }


CountVectorizer

Description: CountVectorizer is a method for converting text into numerical representations,
specifically a sparse matrix of word counts. It counts the frequency of words in a document and
does not down-weight the importance of commonly used words. This can be useful for text
classification tasks, as it allows the classifier to consider all words equally, rather than
down-weighting the importance of commonly used words.

Hyperparameters (Options):
• ngram_range: The range of n-grams to consider when creating the vocabulary.
• min_df: The minimum number of documents a word must be in to be included in the vocabulary.
• max_df: The maximum number of documents a word can be in to be included in the vocabulary.
• max_features: The maximum number of words to keep in the vocabulary, based on word frequency.
• binary: A boolean flag indicating whether to create binary vectors, with 0/1 values indicating
  the presence/absence of a word in a document.

Hyperparameters (Used):
• max_features: The maximum number of words to keep in the vocabulary, based on word frequency.
  [1000, 10000]
• ngram_range: The lower and upper boundary of the range of n-values for different word n-grams
  to be extracted. { (1,1), (1,2), (1,3) }
• binary: Whether to use binary or frequency counts. {True, False}
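
To make the difference between the two vectorizers concrete, the sketch below computes count vectors and TF-IDF vectors in plain Python on three short documents. It is an illustration, not scikit-learn's implementation: the function names `ngrams`, `count_vectors` and `tfidf_vectors` are hypothetical, tokenization is a simple whitespace split, and the idf formula ln((1 + n) / (1 + df)) + 1 followed by L2 normalization is assumed to match the smoothed default documented by scikit-learn.

```python
import math
from collections import Counter


def ngrams(tokens, ngram_range=(1, 1)):
    """All word n-grams with n between the lower and upper bound (inclusive)."""
    lo, hi = ngram_range
    out = []
    for n in range(lo, hi + 1):
        out += [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]
    return out


def count_vectors(docs, ngram_range=(1, 1), binary=False):
    """Return (vocabulary, rows) where each row holds term counts (or 0/1)."""
    counts = [Counter(ngrams(d.lower().split(), ngram_range)) for d in docs]
    vocab = sorted(set().union(*counts))
    rows = []
    for c in counts:
        row = [c[t] for t in vocab]
        if binary:
            row = [1 if v else 0 for v in row]
        rows.append(row)
    return vocab, rows


def tfidf_vectors(docs, ngram_range=(1, 1)):
    """Count vectors re-weighted by smoothed idf, then L2-normalized per row."""
    vocab, rows = count_vectors(docs, ngram_range)
    n = len(docs)
    df = [sum(1 for r in rows if r[j] > 0) for j in range(len(vocab))]
    idf = [math.log((1 + n) / (1 + d)) + 1 for d in df]
    out = []
    for r in rows:
        w = [tf * i for tf, i in zip(r, idf)]
        norm = math.sqrt(sum(x * x for x in w)) or 1.0
        out.append([x / norm for x in w])
    return vocab, out


docs = ["I have no hope", "I hope to educate him", "no plan no hope"]
vocab, counts = count_vectors(docs)
_, tfidf = tfidf_vectors(docs)
print(vocab)
print(counts[2])
print([round(x, 3) for x in tfidf[0]])
```

In the first document, "hope" (present in all three documents) receives a lower TF-IDF weight than "have" (present in only one), whereas the count vector treats both equally.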




                                 Table 18: Pre-trained embeddings for text vectorization


Model Name                           Dimensions     Description


all-mpnet-base-v2                    768            This is a pre-trained language understanding model that combines the advantages
                                                    of masked language modeling (MLM) and permuted language modeling (PLM)
                                                    to address the limitations of both methods. It leverages the dependency among
                                                    predicted tokens through PLM and takes auxiliary position information as input
                                                    to make the model see a full sentence, reducing the position discrepancy between
                                                    pre-training and fine-tuning. This model was pre-trained on a large-scale dataset
                                                    and generates a vector of 768 dimensions.


all-roberta-large-v1                 1024           This is a pre-trained language understanding model with a vector representation of
                                                    1024 dimensions. It was developed as an improvement upon the BERT model and
                                                    was trained using the masked language modeling (MLM) objective. It has achieved
                                                    strong performance on natural language processing tasks and can be fine-tuned on
                                                    labeled datasets for specific tasks such as classification or language translation.


average_word_embeddings_             300            This is a method for converting text into numerical representations, specifically
glove.6B.300d                                       word embeddings. It uses a pre-trained GloVe model to generate 300-dimensional
                                                    vector representations for each word in a document, and then averages these vectors
                                                    to create a single representation for the entire document. This can be useful for text
                                                    classification tasks, as it allows the classifier to consider the semantic relationships
                                                    between words, rather than just their frequencies.


distiluse-base-multilingual-cased-   512            This is a pre-trained language understanding model that maps text into a 512-
v2                                                  dimensional vector representation. It is a smaller and faster version of the popular
                                                    transformer model, BERT, and has been trained on a large multilingual dataset, al-
                                                    lowing it to process text in multiple languages. It has also been cased, meaning it
                                                    can distinguish between upper and lower case letters. This model is useful for nat-
                                                    ural language processing tasks such as language translation and text classification,
                                                    and can be fine-tuned on labeled datasets for specific tasks.
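
The averaging step described for the GloVe-based representation can be sketched in a few lines. This is a toy illustration under stated assumptions: `toy_vectors` is a made-up three-dimensional lookup standing in for the 300-dimensional GloVe vectors, the function name `average_embedding` is hypothetical, and skipping out-of-vocabulary words is an assumption rather than something the table specifies.

```python
def average_embedding(text, word_vectors, dim):
    """Average the vectors of known words; return zeros if none are known."""
    vecs = [word_vectors[w] for w in text.lower().split() if w in word_vectors]
    if not vecs:
        return [0.0] * dim
    return [sum(v[k] for v in vecs) / len(vecs) for k in range(dim)]


# Made-up vectors for illustration only (real GloVe vectors have 300 dimensions).
toy_vectors = {
    "good": [1.0, 0.0, 2.0],
    "job": [0.0, 1.0, 0.0],
    "doctor": [2.0, 2.0, 1.0],
}

doc_vec = average_embedding("Good job doctor", toy_vectors, 3)
print(doc_vec)
```

Because the result is a plain average, word order is discarded: "good job doctor" and "doctor good job" map to the same vector, which is one way this representation differs from the transformer-based models in the table.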




                                              Table 19: Classifier Options I


LogisticRegression

Description: This is a linear classifier that uses a logistic function to predict the probability
of a sample belonging to a particular class. It is commonly used for binary classification tasks,
but can also be used for multi-class classification by implementing a one-versus-rest approach.

Hyperparameters (Options):
• C: The inverse of the regularization strength, with higher values indicating less regularization.
• penalty: The type of regularization to use, either L1 or L2.
• fit_intercept: A boolean flag indicating whether to fit an intercept term.
• tol: The tolerance for stopping criteria.
• intercept_scaling: The scaling of the intercept term, if it is being fitted.
• class_weight: The class weights to use for unbalanced classes.
• max_iter: The maximum number of iterations for the optimization algorithm.

Hyperparameters (Used):
• penalty: The type of regularization to use: L1 or L2.
• C: Inverse of regularization strength. [0.00002, 10000]


SGDClassifier

Description: This is a linear classifier that uses stochastic gradient descent to learn the
parameters of the model. The modified Huber loss function is a smooth approximation of the hinge
loss, which is commonly used for linear classification tasks.

Hyperparameters (Options):
• loss: The loss function to use, with options such as “hinge”, “log”, “modified_huber”,
  “squared_hinge”, and “perceptron”.
• penalty: The type of regularization to use, with options such as L1, L2, “elasticnet”, and “none”.
• alpha: The regularization strength, with higher values indicating stronger regularization.
• l1_ratio: The proportion of L1 regularization to use in the elasticnet penalty.
• tol: The tolerance for the stopping criteria.
• learning_rate: The learning rate for the optimization algorithm, with options such as
  “constant”, “optimal”, and “invscaling”.
• eta0: The initial learning rate for the “constant” and “invscaling” learning rate schedules.
• power_t: The exponent for the “invscaling” learning rate schedule.

Hyperparameters (Used):
• loss: The loss function to use. (“modified_huber”)
• penalty: The type of regularization to use: L1 or L2.
• learning_rate: The learning rate schedule to use. (“optimal”)
• alpha: The constant that multiplies the regularization term. [0.00002, 1000]
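
The relationship between the hinge loss and the modified Huber loss used for SGDClassifier can be made explicit with a short sketch. The code assumes labels coded as -1/+1 and a raw decision score, and uses the standard piecewise definition of the modified Huber loss (squared hinge for margins of at least -1, linear below that); the function names are illustrative.

```python
def hinge(y, score):
    """Hinge loss for a label y in {-1, +1} and a raw decision score."""
    return max(0.0, 1.0 - y * score)


def modified_huber(y, score):
    """Modified Huber loss: squared hinge near the margin, linear for large errors."""
    z = y * score
    if z >= -1.0:
        return max(0.0, 1.0 - z) ** 2
    return -4.0 * z


# Compare the two losses for a positive example at several decision scores.
for s in (2.0, 0.5, 0.0, -1.0, -3.0):
    print(s, hinge(1, s), modified_huber(1, s))
```

Both losses are zero once the example is classified correctly with a margin of at least 1. The modified Huber loss is differentiable at that point and grows only linearly for badly misclassified examples, which limits the influence of outliers.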




                                                   Table 20: Classifier Options II


RandomForestClassifier

Description: This is an ensemble classifier that uses multiple decision trees to make predictions.
It randomly selects a subset of features to consider at each split in the tree, which helps to
reduce overfitting and improve the generalization of the model.

Hyperparameters (Options):
• n_estimators: The number of decision trees in the forest.
• criterion: The function to measure the quality of a split, with options such as “gini” and
  “entropy”.
• max_depth: The maximum depth of the decision tree.
• min_samples_split: The minimum number of samples required to split an internal node.
• min_samples_leaf: The minimum number of samples required to be at a leaf node.
• min_weight_fraction_leaf: The minimum weighted fraction of the sum total of weights required to
  be at a leaf node.
• max_features: The number of features to consider when looking for the best split.
• max_leaf_nodes: The maximum number of leaf nodes in the tree.
• min_impurity_decrease: The minimum decrease in impurity required to split the node.
• bootstrap: A boolean flag indicating whether to use bootstrap samples when building the trees.
• oob_score: A boolean flag indicating whether to use out-of-bag samples to estimate the
  generalization error.

Hyperparameters (Used):
• n_estimators: The number of trees in the forest. [100, 1000]
• max_depth: The maximum depth of the tree. [10, 100]


DecisionTreeClassifier

Description: This is a classifier that uses a tree structure to make decisions based on the
features of a sample. At each node in the tree, the classifier considers a single feature and
splits the data based on the value of that feature. The final decision is made based on the path
taken through the tree.

Hyperparameters (Options):
• criterion: The function to measure the quality of a split, with options such as “gini” and
  “entropy”.
• splitter: The strategy to use when searching for a split, with options such as “best” and
  “random”.
• max_depth: The maximum depth of the tree.
• min_samples_split: The minimum number of samples required to split an internal node.
• min_samples_leaf: The minimum number of samples required to be at a leaf node.
• min_weight_fraction_leaf: The minimum weighted fraction of the sum total of weights required to
  be at a leaf node.
• max_features: The number of features to consider when looking for the best split.
• max_leaf_nodes: The maximum number of leaf nodes in the tree.
• min_impurity_decrease: The minimum decrease in impurity required to split the node.

Hyperparameters (Used):
• max_depth: The maximum depth of the tree. [5, 100]
• min_impurity_decrease: A node will be split if this split induces a decrease of the impurity
  greater than or equal to this value. [0.00002, 10000]
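
The “gini” and “entropy” criteria and the impurity decrease that `min_impurity_decrease` thresholds can be illustrated as follows. This is a minimal sketch with hypothetical function names; the weighted form of the decrease (scaled by the share of all samples reaching the node) is assumed to follow the definition in the scikit-learn documentation, without sample weights.

```python
import math
from collections import Counter


def gini(labels):
    """Gini impurity: 1 minus the sum of squared class shares."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def entropy(labels):
    """Shannon entropy (base 2) of the class distribution."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def impurity_decrease(parent, left, right, n_total, criterion=gini):
    """Weighted impurity decrease achieved by splitting parent into left/right."""
    n_t = len(parent)
    children = (len(left) / n_t) * criterion(left) + (len(right) / n_t) * criterion(right)
    return (n_t / n_total) * (criterion(parent) - children)


parent = [1, 1, 0, 0]
left, right = [1, 1], [0, 0]
print(gini(parent), entropy(parent))
print(impurity_decrease(parent, left, right, n_total=4))
```

A split is kept only if this decrease is at least `min_impurity_decrease`, so larger values of that hyperparameter produce shallower trees.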




                                      Table 21: Classifier Options III


MLPClassifier

Description: This is a classifier that uses a neural network with multiple layers to make
predictions. It is commonly used for classification tasks and can handle both continuous and
categorical data. The number of layers and the number of units in each layer can be adjusted to
fit the complexity of the task.

Hyperparameters (Options):
• hidden_layer_sizes: The number of neurons in each hidden layer.
• activation: The activation function to use, with options such as “identity”, “logistic”,
  “tanh”, and “relu”.
• solver: The algorithm to use for optimization, with options such as “lbfgs”, “sgd”, and “adam”.
• alpha: The regularization strength, with higher values indicating stronger regularization.
• batch_size: The number of samples to use in each iteration of the optimization algorithm.
• learning_rate: The learning rate for the optimization algorithm, with options such as
  “constant”, “invscaling”, and “adaptive”.
• learning_rate_init: The initial learning rate for the “constant” and “invscaling” learning rate
  schedules.
• power_t: The exponent for the “invscaling” learning rate schedule.
• max_iter: The maximum number of iterations to run the optimization algorithm.
• shuffle: A boolean flag indicating whether to shuffle the training data before each epoch.
• tol: The tolerance for the stopping criteria.
• warm_start: A boolean flag indicating whether to reuse the solution of the previous call to fit.
• momentum: The momentum for the optimization algorithm.
• nesterovs_momentum: A boolean flag indicating whether to use Nesterov’s momentum.
• early_stopping: A boolean flag indicating whether to use early stopping to terminate the
  optimization early.
• validation_fraction: The fraction of the training data to use as validation data for early
  stopping.
• beta_1: The beta 1 parameter for the Adam optimization algorithm.

Hyperparameters (Used):
• hidden_layer_sizes: The ith element represents the number of neurons in the ith hidden layer.
  [(100,), (100, 100), (100, 100, 100)]
• activation: Activation function for the hidden layer. (“tanh”, “relu”)
• alpha: L2 penalty (regularization term) parameter. [0.01, 1]




                                                Table 22: Classifier Options III


Method: KNeighborsClassifier

Description: This is a non-parametric classifier that uses the K nearest neighbors of a sample to make a prediction. It is commonly used for classification tasks and can handle both continuous and categorical data. The number of neighbors to consider (K) is a hyperparameter that can be adjusted to fit the complexity of the task.

Hyperparameters (Options):
  • n_neighbors: The number of neighbors to use when making a prediction.
  • weights: The weight function to use when making a prediction, with options such as “uniform” and “distance”.
  • algorithm: The algorithm to use for finding the nearest neighbors, with options such as “brute” and “kd_tree”.
  • leaf_size: The number of points at which to switch to a brute force search for the nearest neighbors.
  • p: The power parameter for the Minkowski distance metric.
  • metric: The distance metric to use, with options such as “euclidean”, “manhattan”, and “minkowski”.
  • metric_params: Additional parameters for the distance metric.

Hyperparameters (Used):
  • n_neighbors: Number of neighbors to use by default for kneighbors queries. [10, 10000]
  • weights: Weight function used in prediction. (“uniform”, “distance”)


Method: SVC

Description: This is a classifier that uses a support vector machine (SVM) to find the optimal hyperplane to separate the different classes. It is commonly used for classification tasks and can handle both continuous and categorical data. The kernel function used to project the data into a higher dimensional space can be adjusted to fit the complexity of the task.

Hyperparameters (Options):
  • C: The regularization parameter. The strength of the regularization is inversely proportional to C, so higher values indicate weaker regularization.
  • kernel: The kernel to use for the decision function, with options such as “linear”, “poly”, “rbf”, “sigmoid”, and “precomputed”.
  • degree: The degree of the polynomial kernel.
  • gamma: The kernel coefficient for the rbf, poly, and sigmoid kernels.
  • coef0: The independent term in the polynomial and sigmoid kernels.
  • shrinking: A boolean flag indicating whether to use the shrinking heuristic.
  • probability: A boolean flag indicating whether to enable probability estimates.
  • tol: The tolerance for the stopping criteria.
  • class_weight: The class weights to use for unbalanced classes.
  • verbose: The level of verbosity in the output.
  • decision_function_shape: The shape of the decision function, with options such as “ovo” and “ovr”.

Hyperparameters (Used):
  • C: Penalty parameter C of the error term. [0.00001, -00]
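
To make the two KNeighborsClassifier hyperparameters that were tuned (n_neighbors and weights) more concrete, the following is a minimal sketch of a nearest-neighbor vote written from scratch. It is an illustration only, not scikit-learn's implementation and not the code used for the paper; the function name `knn_predict` and the toy one-dimensional data are hypothetical.

```python
import math
from collections import defaultdict


def knn_predict(train_X, train_y, x, n_neighbors=3, weights="uniform"):
    """Predict a label for x by a vote among its n_neighbors nearest points.

    weights="uniform": every neighbor counts equally.
    weights="distance": each neighbor's vote is weighted by 1 / distance.
    """
    # Euclidean distance from x to every training point, nearest first
    neighbors = sorted(
        (math.dist(point, x), label) for point, label in zip(train_X, train_y)
    )[:n_neighbors]

    votes = defaultdict(float)
    for distance, label in neighbors:
        if weights == "uniform":
            votes[label] += 1.0
        elif weights == "distance":
            # small constant avoids division by zero for exact matches
            votes[label] += 1.0 / (distance + 1e-12)
        else:
            raise ValueError("weights must be 'uniform' or 'distance'")
    return max(votes, key=votes.get)


# Hypothetical toy data: one feature, two labels
train_X = [(0.0,), (1.0,), (1.2,), (5.0,), (5.5,)]
train_y = ["A", "B", "B", "B", "B"]
query = (0.1,)

uniform_label = knn_predict(train_X, train_y, query, 3, "uniform")
distance_label = knn_predict(train_X, train_y, query, 3, "distance")
print(uniform_label, distance_label)
```

In this toy case the three nearest neighbors contain one very close point with label "A" and two more distant points with label "B", so the uniform vote and the distance-weighted vote disagree. This is the kind of difference the weights option controls.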




                 Table 23: Annotations with evidence of bias

                                                 Dependent variable:
                              No Ambition    Education Neutral    Awareness Information Low
                                 errors           errors                   errors
                                  (1)               (2)                      (3)
R3                              −0.002           0.056∗∗∗                  −0.001
                                (0.006)           (0.018)                  (0.006)

Refugee                        0.012∗∗∗           0.025∗∗                  0.009∗∗
                                (0.004)           (0.012)                  (0.004)

Number of Children              −0.001            −0.003                   −0.001
                                (0.001)           (0.003)                  (0.001)

Female HH head                 −0.006∗             0.014                   −0.004
                               (0.003)            (0.011)                  (0.003)

Age of HH head                  0.0002            0.0003                   −0.0001
                               (0.0001)          (0.0005)                  (0.0001)

Parent’s years of education    −0.0002           −0.0002                  −0.001∗∗∗
                               (0.0004)          (0.001)                   (0.0004)

Religiously educated parent     −0.005            −0.008                   −0.011∗
                                (0.006)           (0.020)                  (0.006)

Female eldest child              0.001            −0.005                    0.002
                                (0.003)           (0.008)                  (0.003)

Age of eldest child             0.0001            −0.001                    0.0001
                               (0.0002)           (0.001)                  (0.0002)

HH asset index                  −0.001             0.005                    0.001
                                (0.001)           (0.003)                  (0.001)

HH income                      −0.0001            −0.001                   0.0004
                               (0.001)            (0.003)                  (0.001)

Parent trauma experience       −0.0001             0.001                   0.0002
                               (0.001)            (0.002)                  (0.001)

Constant                        −0.009           −0.050∗∗∗                  0.005
                                (0.006)           (0.019)                  (0.006)

Observations                     696               696                       696
R²                              0.057             0.043                     0.038
F Statistic                    3.421∗∗∗          2.585∗∗∗                  2.251∗∗∗
Note:                                                        ∗ p<0.1; ∗∗ p<0.05; ∗∗∗ p<0.01
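
As a rough consistency check on Table 23, the overall F statistic of a regression can be recovered from its R², the number of observations and the number of regressors. The sketch below assumes the reported statistic is the standard overall F test of an OLS regression with an intercept, which the table does not state explicitly; the helper name `f_from_r2` is hypothetical. Because the reported R² values are rounded to three decimals, only approximate agreement should be expected.

```python
def f_from_r2(r2, n_obs, n_regressors):
    """Overall F statistic implied by R² for OLS with an intercept."""
    df_model = n_regressors
    df_resid = n_obs - n_regressors - 1
    return (r2 / df_model) / ((1.0 - r2) / df_resid)


N_OBS = 696          # observations reported in Table 23
N_REGRESSORS = 12    # rows from R3 to Parent trauma experience

# (rounded R², reported F statistic) for columns (1) to (3)
reported = {
    "No Ambition errors": (0.057, 3.421),
    "Education Neutral errors": (0.043, 2.585),
    "Awareness Information Low errors": (0.038, 2.251),
}

implied = {
    name: f_from_r2(r2, N_OBS, N_REGRESSORS)
    for name, (r2, _) in reported.items()
}

for name, (r2, f_reported) in reported.items():
    print(f"{name}: implied F = {implied[name]:.3f}, reported F = {f_reported}")
```

The implied values differ from the reported ones only in the second decimal place, which is within what rounding of R² alone can produce.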




Table 24: Quant Ambition: full results with coefficients for all quant variables

                                                              Dependent variable:
                                                                eld_edu_ambition
                                  (1)         (2)             (3)              (4)           (5)              (6)
 Refugee                       −1.644∗∗∗   −1.499∗∗∗                                     −1.472∗∗∗        −1.379∗∗∗
                                (0.216)     (0.117)                                       (0.203)          (0.111)

 Number of children             0.119∗∗      0.028                                        0.113∗∗            0.022
                                (0.055)     (0.029)                                       (0.051)           (0.028)

 Female HH head                  0.134       0.106                                          0.134            0.059
                                (0.198)     (0.101)                                        (0.186)          (0.095)

 Age of HH head                 0.016∗∗    0.012∗∗∗                                        0.012∗          0.009∗∗
                                (0.007)     (0.004)                                        (0.007)         (0.004)

 Parent’s years of education   0.064∗∗∗    0.074∗∗∗                                         0.029          0.056∗∗∗
                                (0.020)     (0.011)                                        (0.019)          (0.011)

 Religiously educated parent     0.271     0.676∗∗∗                                         0.282          0.768∗∗∗
                                (0.364)     (0.201)                                        (0.337)          (0.190)

 Female eldest child            −0.120     −0.313∗∗∗                                        0.006         −0.189∗∗
                                (0.142)     (0.077)                                        (0.136)         (0.076)

 Age of eldest child             0.004       0.003                                          0.003            0.005
                                (0.006)     (0.003)                                        (0.006)          (0.003)

 HH asset index                  0.088      0.075∗∗                                         0.070          0.058∗∗
                                (0.058)     (0.030)                                        (0.053)         (0.028)

 HH income                      −0.003       0.015                                          0.010            0.013
                                (0.037)     (0.017)                                        (0.035)          (0.016)

 Parent trauma experience        0.007      0.030∗                                          0.014           0.029∗
                                (0.031)     (0.017)                                        (0.029)          (0.017)

 Machine annotated                                                           0.228∗∗                         0.099
                                                                             (0.089)                        (0.076)

 No Ambition                                            −10.972∗∗∗         −9.305∗∗∗      −4.469∗         −5.138∗∗∗
                                                         (2.784)            (2.075)       (2.386)          (1.740)

 Salaried Employment                                        2.663∗∗∗        1.587∗∗∗      1.793∗∗∗         1.181∗∗∗
                                                             (0.637)         (0.361)       (0.573)          (0.314)

 Vocational Training                                        −3.879∗         −3.258∗        −2.416           −2.335
                                                            (1.978)         (1.709)        (1.657)          (1.429)

 Entrepreneur                                                0.550           −0.686       −1.615∗           −0.182
                                                            (1.033)          (0.543)      (0.946)           (0.465)

 Education Low                                          −4.184∗∗∗          −5.363∗∗∗       −1.429         −3.279∗∗∗
                                                         (1.564)            (1.040)        (1.347)         (0.915)

 Education Neutral                                          −0.860∗         −0.598∗∗       −0.015            0.128
                                                            (0.461)          (0.261)       (0.442)          (0.247)

 Education High                                             3.636∗∗∗        3.705∗∗∗      2.606∗∗∗         2.204∗∗∗
                                                             (0.810)         (0.495)       (0.747)          (0.441)

 Education Religious                                    −3.264∗∗∗          −3.396∗∗∗       −1.385         −1.773∗∗∗
                                                         (0.962)            (0.591)        (0.850)         (0.522)

 Marriage                                               −1.853∗∗∗          −1.782∗∗∗     −1.911∗∗∗        −1.783∗∗∗
                                                         (0.709)            (0.391)       (0.661)          (0.359)

 Migration                                                  −0.045           −1.083         1.560           −0.330
                                                            (1.257)          (0.817)       (1.390)          (0.760)

 Constant                      3.613∗∗∗    3.987∗∗∗         4.039∗∗∗        4.171∗∗∗      3.587∗∗∗         3.917∗∗∗
                                (0.332)     (0.179)          (0.160)         (0.105)       (0.339)          (0.193)

 Observations                     392        1,184         426               1,267          392             1,184
 R²                              0.411       0.389        0.286              0.206         0.515            0.466
 F Statistic                   24.089∗∗∗   67.704∗∗∗    16.610∗∗∗          29.566∗∗∗     18.672∗∗∗        45.974∗∗∗
 Note:                                                                                 ∗ p<0.1; ∗∗ p<0.05; ∗∗∗ p<0.01




List of Figures
  1    Coding tree
  2    Examples of qualitative codes
  3    Methodology
  4    Choices of text representation and classifier
  5    Validation set performance
  6    Bias test for each annotation
  7    Interpretability test
  8    Example of supervised LDA topics
  9    Correlations between annotations in enhanced sample
  10   F-statistic test for interpretability increases with $N_h$ (holding $N$ fixed)
  11   Distribution of regression coefficients of interest with $N_h$ and $N_m$
  12   Cost trade-offs
  13   Validation set performance across different translation approaches
  14   Correlations between annotations in human sample
  15   Validation and test set performance for increasing $N_h$
  16   F-statistic test for interpretability increases with both $N_h$ and $N_m$




Figure 1: Coding tree




                                  Figure 2: Examples of qualitative codes

(a) Ambition:Education:Low
“God willing, I will teach my son up to 10th class. If he wants to stay in Bangladesh for 20-25 years, I want him to get a job here”

(b) Ambition:Education:High
“My daughter’s dream is to study. I’ll do it. If Allah keeps me alive, I will educate my daughter so she can get a job in administration.”

(c) Navigational Capacity:Ability:Low
“I don’t do much at home. I help her as much as I can.”

(d) Navigational Capacity:Ability:High
“The school is still closed for Corona. So, by selling some of my food, I have arranged for private teacher by paying at minimum.”

(e) Aspiration:Secular
“They will become well behaved, good human beings. Will have a respectable job.”

(f) Aspiration:Religious
“I don’t want make my son work. I want him to become a religious cleric (hujur)..”




                                                              Figure 3: Methodology

[Flow diagram. Human sample (Interview 1, …): the annotations on each question-answer pair are aggregated to the interview level and used to train a classifier; the classifier's predicted annotations are also aggregated to the interview level, and the difference between human and predicted annotations gives the measurement errors. Machine sample (Interview 1, …): the trained classifier is used to predict annotations, which are aggregated to the interview level. The human annotations and the machine predictions together form the enhanced sample, which feeds into two steps: assess bias, efficiency and interpretability; and substantive analysis.]




                                                            Figure 4: Choices of text representation and classifier

[Four-panel plot. Vertical axis: Annotation. Horizontal axis in each panel: Selected (0.0 to 1.0).
Panels and options:
  Question: Answer only; Question + Answer.
  Language: Bengali Transliteration; English Translation.
  Vectorization: TfidfVectorizer; CountVectorizer; all-mpnet-base-v2; all-roberta-large-v1; distiluse-base-multilingual-cased-v2.
  Classifier: LogisticRegression; SGDClassifier; SVC; RandomForestClassifier; KNeighborsClassifier.
Annotations shown:
  Aspiration: Religious, Secular.
  Ambition: No Ambition, Salaried Employment, Vocational Training, Entrepreneur, Education High, Education Neutral, Education Low, Education Religious, Marriage, Migration.
  Capacity: Vague Non Specific, Reliance On God, Ability High, Ability Low, Budget High, Budget Low, Awareness Information High, Awareness Information Low.]


Note: This Figure shows the selected text representation and classifier for each annotation across 25 bootstraps. The first panel
shows the proportion of runs in which the Question is included in the text representation. The second panel shows whether the
chosen text representation was based on Bengali transliterated into Latin characters, or a machine translation into English. The
third panel shows the selected vectorizer, which is applied to convert the text into numeric vectors. Finally, the fourth panel shows
the selected classifier.




                                                         Figure 5: Validation set performance

[Plot. Vertical axis: Annotation. Horizontal axis: F1 score (0.0 to 0.8). Legend (F1 score): Random; Validation set.
Annotations shown:
  Aspiration: Religious, Secular.
  Ambition: No Ambition, Salaried Employment, Vocational Training, Entrepreneur, Education High, Education Neutral, Education Low, Education Religious, Marriage, Migration.
  Capacity: Vague Non Specific, Reliance On God, Ability High, Ability Low, Budget High, Budget Low, Awareness Information High, Awareness Information Low.]


Note: This Figure shows validation set performance of the selected model for each annotation and each bootstrap run, as measured
by the F1 score. The sparsity of the annotation across QA pairs is shown in red as a reference point: this would be the expected F1
score if predictions were drawn randomly based on the overall proportion of positives.
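
The random reference point described in the note can be illustrated with a short simulation. If an annotation is positive on a share p of QA pairs and predictions are drawn at random with the same probability p, then precision and recall are both approximately p, so the F1 score is also approximately p. The sketch below uses a hypothetical prevalence of 10% and simulated labels; it does not use the paper's data, and the function name `f1_score` is defined here rather than taken from a library.

```python
import random


def f1_score(y_true, y_pred):
    """F1 score for binary labels, computed from TP, FP and FN counts."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t and p)
    fp = sum(1 for t, p in zip(y_true, y_pred) if not t and p)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t and not p)
    if tp == 0:
        return 0.0
    return 2 * tp / (2 * tp + fp + fn)


rng = random.Random(0)      # fixed seed for reproducibility
prevalence = 0.10           # hypothetical share of positive QA pairs
n_pairs = 200_000           # large sample so the simulated F1 is stable

# Simulated "true" annotations and independent random predictions,
# both positive with probability equal to the prevalence
y_true = [rng.random() < prevalence for _ in range(n_pairs)]
y_pred = [rng.random() < prevalence for _ in range(n_pairs)]

random_f1 = f1_score(y_true, y_pred)
print(f"prevalence = {prevalence:.3f}, random-prediction F1 = {random_f1:.3f}")
```

A sparse annotation therefore has a low random baseline, which is why the validation F1 scores in the figure are best read relative to the reference point for each annotation rather than on a common absolute scale.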




                                                         Figure 6: Bias test for each annotation

[Plot. Vertical axis: Annotation. Horizontal axis: log F statistic (−1 to 1). Legend: p-value (<1%, <5%, >5%); point type (Bootstrap, Mean).
Annotations shown:
  Aspiration: Religious, Secular.
  Ambition: No Ambition, Salaried Employment, Vocational Training, Entrepreneur, Education High, Education Neutral, Education Low, Education Religious, Marriage, Migration.
  Capacity: Vague Non Specific, Reliance On God, Ability High, Ability Low, Budget High, Budget Low, Awareness Information High, Awareness Information Low.]



Note: This Figure shows the log F statistic for the regression of the validation set errors on household characteristics, for each
annotation. The color of each point indicates the significance level of the F statistic. The hollow circles represent the statistic
for each bootstrap and the solid circle represents the statistic for an enhanced sample based on the mean prediction across each
bootstrap.




                                                                   Figure 7: Interpretability test

[Plot. Vertical axis: Annotation. Horizontal axis: log F statistics (0 to 4). Legend (Sample): Enhanced; Human.
Annotations shown:
  Aspiration: Religious, Secular.
  Ambition: No Ambition, Salaried Employment, Vocational Training, Entrepreneur, Education High, Education Neutral, Education Low, Education Religious, Marriage, Migration.
  Capacity: Vague Non Specific, Reliance On God, Ability High, Ability Low, Budget High, Budget Low, Awareness Information High, Awareness Information Low.]


Note: This Figure shows the log F statistic for the regression of each annotation on household characteristics in the enhanced
and human samples. The hollow circles represent the statistic for each bootstrap run and the solid circle represents the statistic for an
enhanced sample based on the mean prediction across bootstrap runs.
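
The statistic plotted in this test can be illustrated with a minimal sketch. It assumes the F statistic is the standard overall F test for joint significance of the regressors in an OLS regression; the paper's exact specification is not reproduced here. The function name `log_f_statistic` and the synthetic "household characteristics" are hypothetical and serve only as an illustration.

```python
import numpy as np

def log_f_statistic(y, X):
    """Log of the overall OLS F statistic for a regression of y on X.

    Assumption: an intercept is added and all columns of X are tested jointly.
    """
    y = np.asarray(y, dtype=float)
    X = np.asarray(X, dtype=float)
    n, k = X.shape
    design = np.column_stack([np.ones(n), X])          # add the intercept
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)  # OLS coefficients
    resid = y - design @ beta
    rss = float(resid @ resid)                         # residual sum of squares
    tss = float(((y - y.mean()) ** 2).sum())           # total sum of squares
    f_stat = ((tss - rss) / k) / (rss / (n - k - 1))
    return float(np.log(f_stat))

# Synthetic illustration (not the paper's data): one binary annotation that
# depends on the first characteristic, and one that is pure noise.
rng = np.random.default_rng(0)
characteristics = rng.normal(size=(500, 3))
annotation_related = (characteristics[:, 0] + rng.normal(size=500) > 0).astype(float)
annotation_noise = rng.integers(0, 2, size=500).astype(float)

log_f_related = log_f_statistic(annotation_related, characteristics)
log_f_noise = log_f_statistic(annotation_noise, characteristics)
print(log_f_related, log_f_noise)
```

An annotation that carries information related to household characteristics should produce a larger log F statistic than one that is unrelated to them, which is the comparison the figure makes between the enhanced and human samples.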




Figure 8: Example of supervised LDA topics

[Figure: two panels plotting an estimate for each topic. Vertical axis: Topics (each topic shown by its top words). Horizontal axis: Estimate. Topics are listed below in the order they appear on the vertical axis, top to bottom.]

(a) Aspirations: Secular (horizontal axis: Estimate, 0.0 to 0.6)
  1. will people good well make children dream human educated child
  2. get job will married good education dream marry government studies
  3. business son shop house boy abroad father take send don
  4. make want doctor dream son master wants become eldest hafez
  5. god will allah hope try dreams willing whatever fulfill many
  6. children educate boys will girls study work hope want teach
  7. school home read teaching teach now madrasa send closed child
  8. yes one two class girl three years brother sister little
  9. money pay can rupees eat earn save hard education cost
  10. can much else anything don now still say help see

(b) Aspirations: Religious (horizontal axis: Estimate, 0.0 to 0.3)
  1. make want doctor son master wants desire become will hafez
  2. school home read teaching now madrasa teach closed study reading
  3. god will allah try dreams many willing hope whatever can
  4. money pay rupees hard work save eat education earn cost
  5. yes one two class three years old sister understand brother
  6. son house let abroad boy can still small child take
  7. can money much don else teach help study anything want
  8. get job married will good girl education dream government studies
  9. will work boys girls study grow future able give studying
  10. children educate people want well good will hope educated study




Figure 9: Correlations between annotations in enhanced sample

[Figure: upper-triangular correlation matrix for the enhanced sample, shaded on a "Correlation" color scale from −1.0 to 1.0. The entries are listed below row by row, with significance stars as printed in the figure. Columns follow the same annotation order as the rows.]

Annotations: (1) Aspiration: Religious; (2) Aspiration: Secular; (3) Ambition: No Ambition; (4) Ambition: Salaried Employment; (5) Ambition: Vocational Training; (6) Ambition: Entrepreneur; (7) Ambition: Education High; (8) Ambition: Education Neutral; (9) Ambition: Education Low; (10) Ambition: Education Religious; (11) Ambition: Marriage; (12) Ambition: Migration; (13) Capacity: Vague Non Specific; (14) Capacity: Reliance On God; (15) Capacity: Ability High; (16) Capacity: Ability Low; (17) Capacity: Budget High; (18) Capacity: Budget Low; (19) Capacity: Awareness Information High; (20) Capacity: Awareness Information Low.

(1) with (2) to (20): −0.009, −0.01, −0.047**, −0.024, −0.05**, 0.355***, 0.025, −0.022, 0.854***, 0.013, −0.011, 0.013, 0.095***, 0.04*, 0.014, −0.016, 0.033, 0.106***, 0.01
(2) with (3) to (20): −0.088***, 0.109***, 0.009, −0.044**, 0.092***, 0.274***, −0.085***, −0.028, −0.02, −0.007, 0.182***, 0.008, 0.134***, −0.029, 0.098***, 0.046**, 0.157***, −0.043**
(3) with (4) to (20): −0.131***, −0.03, −0.043**, −0.112***, −0.095***, 0.056***, −0.026, 0.145***, −0.031, −0.059***, −0.017, −0.059***, 0.038*, −0.031, −0.022, −0.07***, 0.042**
(4) with (5) to (20): −0.052**, −0.132***, 0.451***, −0.019, −0.012, −0.035*, −0.144***, −0.072***, 0.002, −0.038*, 0.176***, −0.097***, 0.178***, −0.084***, 0.086***, −0.033
(5) with (6) to (20): 0.099***, −0.039*, 0.008, 0.007, −0.018, −0.008, 0.03, −0.042**, −0.001, 0.058***, 0.059***, 0.011, 0.126***, 0.11***, −0.003
(6) with (7) to (20): −0.063***, 0.064***, −0.062***, −0.052**, −0.033, 0.042**, 0.009, −0.002, 0.08***, 0.067***, 0.053***, 0.103***, 0.081***, −0.03
(7) with (8) to (20): 0.028, −0.155***, 0.39***, −0.073***, −0.009, 0.048**, 0.066***, 0.167***, −0.008, 0.131***, 0.028, 0.166***, −0.008
(8) with (9) to (20): −0.159***, 0.063***, 0.197***, 0.041**, 0.239***, 0.042**, 0.233***, −0.014, 0.173***, 0.1***, 0.19***, −0.034*
(9) with (10) to (20): −0.038*, −0.003, 0.017, −0.138***, −0.044**, −0.084***, 0.034*, −0.092***, −0.045**, −0.051**, 0.022
(10) with (11) to (20): 0.022, −0.001, 0.038*, 0.074***, 0.043**, 0.008, −0.006, 0.043**, 0.085***, 0.006
(11) with (12) to (20): −0.033, 0.041**, 0.005, 0.077***, 0.064***, 0.081***, 0.111***, 0.004, −0.028
(12) with (13) to (20): −0.026, 0.032, 0.04**, 0, 0.063***, 0.08***, 0.119***, −0.031
(13) with (14) to (20): 0.068***, 0.093***, −0.082***, 0.059***, −0.002, 0.059***, −0.01
(14) with (15) to (20): −0.024, 0.032, −0.04*, 0.073***, 0.116***, 0.002
(15) with (16) to (20): −0.103***, 0.765***, 0.11***, 0.346***, −0.066***
(16) with (17) to (20): −0.101***, 0.45***, 0.052**, 0.062***
(17) with (18) to (20): −0.008, 0.234***, −0.058***
(18) with (19) to (20): 0.309***, 0.018
(19) with (20): 0.014
Figure 10: F-statistic test for interpretability increases with Nh (holding N fixed)

[Figure: 20 panels, one per annotation (Aspiration: Religious; Aspiration: Secular; Ambition: No Ambition; Ambition: Salaried Employment; Ambition: Vocational Training; Ambition: Entrepreneur; Ambition: Education High; Ambition: Education Neutral; Ambition: Education Low; Ambition: Education Religious; Ambition: Marriage; Ambition: Migration; Capacity: Vague Non Specific; Capacity: Reliance On God; Capacity: Ability High; Capacity: Ability Low; Capacity: Budget High; Capacity: Budget Low; Capacity: Awareness Information High; Capacity: Awareness Information Low). Vertical axis: log F statistic (0 to 4). Horizontal axis: Human annotated sample size (200 to 800). Legend (F statistic): Enhanced, Human.]



Note: This Figure shows, for each annotation, the F statistic of a regression on household characteristics in the human (in blue) and
enhanced (in green) samples as existing interviews are annotated. The total sample size in the enhanced sample is thus constant,
but interviews are moved from the machine annotated set to the human annotated set. Each point represents a bootstrap run and the
lines show a local regression fit to these points.




Figure 11: Distribution of regression coefficients of interest with Nh and Nm

[Figure: two rows of four panels, with columns for Nh = 100, Nh = 200, Nh = 400 and Nh = 700. Upper row: Ability Low, vertical axis: Coefficient on Refugee (−0.1 to 0.1). Lower row: Secular Aspirations, vertical axis: Coefficient on Female eldest child (−0.10 to 0.10). Horizontal axis in all panels: Machine annotated sample size (Nm), 200 to 1400. Legend (Sample): Enhanced, Human.]

Note: This Figure shows how the distributions of the estimates of two coefficients of interest change as the number of
human annotated interviews (Nh) and the number of machine annotated interviews (Nm) are varied. The upper panels show the
coefficient on the refugee status variable in the regression for Ability Low, controlling for other household characteristics, i.e. from
the first two columns of Table 8. The lower panels show the coefficient on the female eldest child variable in the regression for
Secular Aspirations, i.e. from the first two columns of Table 7. In each case, the distribution of the coefficient estimated on the
human annotated sample is shown in blue and that estimated on the enhanced sample is shown in red. Across panels, from left to right,
we show the effect of an increase in Nh, so within each panel the blue distribution is the same. As Nh increases, the coefficient
estimated on the human sample becomes more precise. Within each panel, from left to right, we show the effect of an increase in Nm.
As Nm increases, the coefficient estimated on the enhanced sample becomes more precise. Interestingly, the estimated coefficient in
the enhanced sample for a large Nm does not vary much with Nh, suggesting that a large human annotated sample is not necessary
to get value from the enhancement. The coefficient distributions are calculated by bootstrapping both which interviews are
included in the training sample and the coefficient estimate itself.
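
As an illustration of how such a coefficient distribution tightens with sample size, the following Python sketch bootstraps a regression slope on synthetic data. It is not the code used in the paper: the data-generating process, the effect size of -0.05, the number of draws and all function names are assumptions made for the example, the regression has a single binary regressor rather than the full set of household controls, and only the coefficient estimate is bootstrapped (the paper also resamples which interviews enter the training sample).

```python
import random
import statistics


def ols_slope(x, y):
    """Slope from a simple regression of y on x (with an intercept)."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    return sxy / sxx


def bootstrap_slopes(x, y, n_draws, rng):
    """Re-estimate the slope on n_draws resamples drawn with replacement."""
    n = len(x)
    slopes = []
    while len(slopes) < n_draws:
        idx = [rng.randrange(n) for _ in range(n)]
        xs = [x[i] for i in idx]
        if len(set(xs)) < 2:  # skip resamples with no variation in x
            continue
        slopes.append(ols_slope(xs, [y[i] for i in idx]))
    return slopes


def simulate(n, rng, true_effect=-0.05):
    """Synthetic data (assumed): binary regressor x and binary annotation y."""
    x = [rng.randint(0, 1) for _ in range(n)]
    y = [1 if rng.random() < 0.3 + true_effect * xi else 0 for xi in x]
    return x, y


rng = random.Random(0)
spread = {}
for n in (200, 1400):  # a small and a large sample size
    x, y = simulate(n, rng)
    spread[n] = statistics.stdev(bootstrap_slopes(x, y, 300, rng))
```

The dictionary `spread` holds the standard deviation of the bootstrap distribution at each sample size; the larger sample should give the narrower distribution, which is the pattern described in the note above.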




                                                                                                Figure 12: Cost trade-offs

[Figure: three panels, each with Price (thousand US$) on the vertical axis. Panel 1, Average enhanced log F statistic: horizontal axis − log F statistic. Panel 2, Ability Low: horizontal axis Refugee coef 95% CI. Panel 3, Secular Aspirations: horizontal axis Female child coef 95% CI.]



Note: This Figure plots each combination of Nh and Nm, with the objective on the horizontal axis and the price on the vertical axis.
In all three panels, moving further to the south west indicates a cheaper combination and a better outcome. The first panel
uses the average enhanced-sample F statistic for a regression of each annotation on household characteristics, averaged across all
annotations. The objectives in the second and third panels are, respectively, the coefficient on refugee status for Ability Low and
the coefficient on a female eldest child for Secular Aspirations (the axes are labelled with the 95% CI of each coefficient).
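
The "south west is better" reading of these panels can be made concrete by keeping only the combinations that no other combination beats on both price and objective. The Python sketch below does this for a handful of made-up points; the labels, objective values and prices are hypothetical and are not taken from the paper, and the paper does not state that it computes such a frontier.

```python
def pareto_frontier(points):
    """Keep points that no other point dominates (lower is better on both axes).

    Each point is a tuple (label, objective, price).
    """
    frontier = []
    for label, obj, price in points:
        dominated = any(
            (o <= obj and p <= price) and (o < obj or p < price)
            for _, o, p in points
        )
        if not dominated:
            frontier.append((label, obj, price))
    return sorted(frontier, key=lambda t: t[2])  # cheapest first


# Hypothetical (Nh, Nm) combinations: (label, objective, price in thousand US$)
combos = [
    ("Nh=200, Nm=600", 0.050, 8.0),
    ("Nh=200, Nm=1400", 0.030, 14.0),
    ("Nh=600, Nm=600", 0.048, 12.0),
    ("Nh=800, Nm=1400", 0.029, 22.0),
    ("Nh=800, Nm=200", 0.060, 16.0),
]
frontier = pareto_frontier(combos)
frontier_labels = [label for label, _, _ in frontier]
```

In this made-up example the combination with Nh=800 and Nm=200 drops out, because another combination is both cheaper and has a better objective.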




                  Figure 13: Validation set performance across different translation approaches


[Figure: F1 score (horizontal axis, 0.0 to 0.8) for each annotation (vertical axis), with one series each for Human translation, Machine translation, Random, Raw Bengali and Transliteration. Annotations, from top to bottom: Aspiration: Religious; Aspiration: Secular; Ambition: No Ambition; Ambition: Salaried Employment; Ambition: Vocational Training; Ambition: Entrepreneur; Ambition: Education High; Ambition: Education Neutral; Ambition: Education Low; Ambition: Education Religious; Ambition: Marriage; Ambition: Migration; Capacity: Vague Non Specific; Capacity: Reliance On God; Capacity: Ability High; Capacity: Ability Low; Capacity: Budget High; Capacity: Budget Low; Capacity: Awareness Information High; Capacity: Awareness Information Low.]


Note: This Figure shows the validation set performance across different translation approaches. As in Figure 5, the sparsity of each
annotation is shown in red as a reference point. In each translation approach, we select over the possible vectorizers as described
in Section 5.2. The average validation F1 scores across all annotations are 0.558 for Machine translation, 0.542 for Transliteration,
0.535 for Human translation and 0.420 for Raw Bengali.
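
For readers less familiar with the metric, the Python sketch below computes an F1 score per annotation and then a simple average across annotations, which is how we read "average validation F1 scores across all annotations". The label and prediction vectors are toy values invented for the example, and the function names are ours.

```python
def f1_score(y_true, y_pred):
    """F1 for binary labels: harmonic mean of precision and recall."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0  # no true positives: F1 is zero by convention here
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def average_f1(labels, preds):
    """Per-annotation F1 scores and their unweighted mean."""
    scores = {name: f1_score(labels[name], preds[name]) for name in labels}
    return scores, sum(scores.values()) / len(scores)


# Toy human labels and model predictions (hypothetical values)
labels = {
    "Aspiration: Religious": [1, 0, 1, 1, 0, 0],
    "Capacity: Ability Low": [0, 0, 1, 0, 0, 1],
}
preds = {
    "Aspiration: Religious": [1, 0, 0, 1, 1, 0],
    "Capacity: Ability Low": [0, 1, 1, 0, 0, 1],
}
scores, mean_f1 = average_f1(labels, preds)
```

Here the first annotation has precision and recall of 2/3 each, so its F1 is 2/3; the second has precision 2/3 and recall 1, so its F1 is 0.8.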




Figure 14: Correlations between annotations in human sample

Panel: Human. Each cell gives the correlation between the row annotation and the column annotation. Columns are numbered as in the row labels, with (20) = Capacity: Awareness Information Low. Cells marked "." are not shown in the figure. Asterisks are reproduced as printed. In the original figure, cells are shaded by correlation on a scale from −1.0 to 1.0.

                                           (2)        (3)        (4)        (5)        (6)        (7)        (8)        (9)        (10)       (11)       (12)       (13)       (14)       (15)       (16)       (17)       (18)       (19)       (20)
(1) Aspiration: Religious                  −0.024     0.035      −0.058     −0.043     −0.07**    0.208***   0.006      0.051      0.734***   −0.028     −0.052     0.023      0.046      −0.038     0.006      −0.065*    −0.015     −0.006     0.017
(2) Aspiration: Secular                    .          −0.086**   0.068*     0.007      −0.026     0.073**    0.259***   −0.05      −0.055     −0.003     0.017      0.161***   −0.017     0.076**    0          0.09**     0.11***    0.093***   −0.038
(3) Ambition: No Ambition                  .          .          −0.142***  −0.041     −0.04      −0.124***  −0.082**   0.03       −0.003     0.151***   −0.024     −0.065*    −0.029     −0.072**   0.058      −0.035     −0.022     −0.102***  0.092***
(4) Ambition: Salaried Employment          .          .          .          −0.054     −0.13***   0.462***   −0.136***  0.001      −0.046     −0.189***  −0.058     −0.013     −0.105***  0.167***   −0.072**   0.158***   −0.097***  0.183***   −0.076**
(5) Ambition: Vocational Training          .          .          .          .          0.147***   −0.015     0.045      −0.014     −0.035     −0.058     0.069*     −0.041     0.033      0.138***   0.056      0.073**    0.13***    0.139***   0.012
(6) Ambition: Entrepreneur                 .          .          .          .          .          −0.057     0.077**    −0.041     −0.058     −0.036     0.094***   0.008      0.077**    0.016      0.069*     −0.019     0.175***   0.105***   −0.005
(7) Ambition: Education High               .          .          .          .          .          .          −0.083**   −0.205***  0.227***   −0.145***  −0.024     0.091**    0.03       0.204***   −0.06*     0.188***   −0.029     0.169***   −0.052
(8) Ambition: Education Neutral            .          .          .          .          .          .          .          −0.136***  0.075**    0.111***   0.108***   0.119***   0.062*     0.08**     0.009      0.065*     0.069*     0.111***   −0.05
(9) Ambition: Education Low                .          .          .          .          .          .          .          .          0.033      0.023      0.004      −0.102***  −0.067*    −0.037     0.051      −0.072**   −0.062*    −0.11***   0.012
(10) Ambition: Education Religious         .          .          .          .          .          .          .          .          .          −0.02      −0.013     0.051      0.039      −0.036     0.006      −0.043     0.012      −0.008     0.024
(11) Ambition: Marriage                    .          .          .          .          .          .          .          .          .          .          −0.003     −0.003     0.042      −0.005     0.085**    0.035      0.084**    −0.064*    0.011
(12) Ambition: Migration                   .          .          .          .          .          .          .          .          .          .          .          −0.016     0.025      0.11***    −0.03      0.103***   0.055      0.161***   −0.013
(13) Capacity: Vague Non Specific          .          .          .          .          .          .          .          .          .          .          .          .          0.071**    −0.008     −0.127***  0.019      −0.045     0.093***   −0.004
(14) Capacity: Reliance On God             .          .          .          .          .          .          .          .          .          .          .          .          .          −0.033     0.046      −0.029     0.111***   0.091**    −0.038
(15) Capacity: Ability High                .          .          .          .          .          .          .          .          .          .          .          .          .          .          −0.197***  0.681***   −0.028     0.343***   −0.113***
(16) Capacity: Ability Low                 .          .          .          .          .          .          .          .          .          .          .          .          .          .          .          −0.115***  0.312***   −0.08**    0.057
(17) Capacity: Budget High                 .          .          .          .          .          .          .          .          .          .          .          .          .          .          .          .          −0.129***  0.331***   −0.094***
(18) Capacity: Budget Low                  .          .          .          .          .          .          .          .          .          .          .          .          .          .          .          .          .          0.104***   0
(19) Capacity: Awareness Information High  .          .          .          .          .          .          .          .          .          .          .          .          .          .          .          .          .          .          −0.062*

                                                   Figure 15: Validation and test set performance for increasing Nh

[Figure: one panel for each of the 20 annotations, from Aspiration: Religious to Capacity: Awareness Information Low. Vertical axis: F1 score (ticks from 0.00 to 0.75). Horizontal axis: Human annotated sample size (ticks at 200, 400, 600 and 800). Series: Random, Test set, Validation set.]

Note: This Figure shows validation set performance (in green) and held-out test set performance (in blue) for each annotation as
the size of the human annotated training set increases along the horizontal axes. Each point represents a bootstrap run and the lines
show a local regression fit to these points. The sparsity of the annotation across QA pairs in the training set is shown in red as a
reference point.
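
The note does not say which local regression routine produced the fitted lines. As a rough illustration of the idea only, the Python sketch below fits a tricube-weighted straight line around a target point using the nearest fraction of the data. The function name, the `frac` parameter and the example data are all assumptions for this sketch, and the example uses a noise-free line rather than actual bootstrap runs.

```python
def local_linear_fit(xs, ys, x0, frac=0.5):
    """Tricube-weighted linear fit evaluated at x0.

    Uses the nearest `frac` share of points to set the bandwidth.
    """
    n = len(xs)
    k = max(2, int(round(frac * n)))
    dists = sorted(abs(x - x0) for x in xs)
    h = dists[k - 1] or 1e-12  # bandwidth: distance to the k-th nearest point
    w = [(1 - min(abs(x - x0) / h, 1.0) ** 3) ** 3 for x in xs]
    sw = sum(w)
    mx = sum(wi * x for wi, x in zip(w, xs)) / sw
    my = sum(wi * y for wi, y in zip(w, ys)) / sw
    sxx = sum(wi * (x - mx) ** 2 for wi, x in zip(w, xs))
    sxy = sum(wi * (x - mx) * (y - my) for wi, x, y in zip(w, xs, ys))
    slope = sxy / sxx if sxx > 0 else 0.0  # fall back to the weighted mean
    return my + slope * (x0 - mx)


# Sanity check on a noise-free line (hypothetical values): the local fit
# should reproduce the line itself.
xs = list(range(200, 900, 50))          # training set sizes
ys = [0.3 + 0.0005 * x for x in xs]     # an exactly linear "F1" curve
fitted_mid = local_linear_fit(xs, ys, 500)
fitted_edge = local_linear_fit(xs, ys, 200)
```

Evaluating the function on a grid of sample sizes would trace out a smooth curve through noisy bootstrap points in the same spirit as the lines in the figure.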




                                                         Figure 16: F-statistic test for interpretability increases with both Nh and Nm

[Figure: one heatmap panel for each of the 20 annotations, from Aspiration: Religious to Capacity: Awareness Information Low. Vertical axis: Machine annotated sample size (ticks at 400, 800 and 1200). Horizontal axis: Human annotated sample size (ticks at 200, 400, 600 and 800). Color scale: Standardised Enhanced F statistic, from −3 to 1.]



Note: This Figure shows how the interpretability of each annotation, as measured by the F statistic in a regression of the annotation on
household characteristics in the enhanced sample, varies with the number of human annotated interviews (Nh) along the horizontal
axis and the number of machine annotated interviews (Nm) along the vertical axis. The color of each cell corresponds to the mean
F statistic in the enhanced sample across draws for that Nm and Nh. These F statistics are standardised so that they have zero mean
and unit standard deviation within each annotation, ensuring a consistent color gradient for each annotation.
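
The within-annotation standardisation can be sketched in a few lines of Python. The F statistics below are invented for the example, the function name is ours, and the use of the population standard deviation is an assumption, since the note does not say which variant is used.

```python
import statistics


def standardise_within(f_stats):
    """Map {annotation: {(Nh, Nm): mean F}} to z-scores within each annotation."""
    out = {}
    for name, cells in f_stats.items():
        vals = list(cells.values())
        mu = statistics.fmean(vals)
        sd = statistics.pstdev(vals)  # population SD (an assumption)
        out[name] = {
            key: (v - mu) / sd if sd > 0 else 0.0 for key, v in cells.items()
        }
    return out


# Hypothetical mean F statistics; the second annotation is 10x the first
f_stats = {
    "Aspiration: Secular": {
        (200, 400): 2.0, (200, 1200): 4.0, (800, 400): 3.0, (800, 1200): 7.0,
    },
    "Ambition: Migration": {
        (200, 400): 20.0, (200, 1200): 40.0, (800, 400): 30.0, (800, 1200): 70.0,
    },
}
z = standardise_within(f_stats)
```

Because the two made-up annotations differ only in scale, their standardised values coincide cell by cell, which is why a single color gradient can be shared across panels.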



