What Makes Public Sector Data Valuable for Development?




                                                                                                                      Downloaded from https://academic.oup.com/wbro/article/38/2/325/7118955 by Joint Bank-Fund library user on 04 September 2023
     Dean Jolliffe, Daniel Gerszon Mahler, Malarvizhi Veerappan, Talip Kilic,
                                and Philip Wollburg


Data produced by the public sector can have transformational impacts on development out-
comes through better targeting of resources, improved service delivery, cost savings, in-
creased accountability, and more. Around the world, the amount of data produced by the
public sector is increasing rapidly, but we argue the full potential of data to improve develop-
ment outcomes has not been realized yet. We outline 12 features needed for data to generate
greater value for development and present case studies substantiating these features. We ar-
gue that a key reason why the transformational value of data has not yet been realized is that
suboptimal data—data not satisfying these 12 features—are being supplied. The features
are that the data should be of adequate spatial and temporal coverage (complete, frequent,
and timely), should be of high quality (accurate, comparable, and granular), should be easy
to use (accessible, understandable, and interoperable), and should be safe to use (impartial,
confidential, and appropriate).
JEL Codes: C81, C83, O10, O20
Keywords: data, development, statistics.


Around the world, the supply of public sector data has increased rapidly. Since 2005,
the number of countries without a population and housing census conducted over
the preceding 10 years has fallen by nearly 80 percent (from 36 to 8), the number
of countries without a labor force survey conducted over the same period has fallen
by 50 percent (from 98 to 49), and the number of countries without nationally rep-
resentative administrative data on learning assessments within a five-year range has
fallen by 83 percent since 2008 (from 36 to 6) (World Bank 2021a).
   Public sector data can have a transformational role in development and efforts to
reduce poverty. Amongst many fruitful uses, data can be used to increase access to
government services, prepare for and respond to emergencies, target resources and
foster the inclusion of marginalized groups, save money and resources in policy implementation
and service delivery, monitor progress and track performance, increase
accountability, and empower individuals (World Bank 2021b).
The World Bank Research Observer
© 2023 International Bank for Reconstruction and Development / The World Bank. Published by Oxford University Press
https://doi.org/10.1093/wbro/lkad004                                                                   38:325–346
   While the increased availability of data has improved development outcomes, we
argue that their full potential for development is far from being realized. We suggest
that a major, and sometimes overlooked, reason for this shortfall is that the data pro-




duced do not have certain features that make them valuable. We first develop a con-
ceptual framework specifying 12 features often needed for data to most effectively
generate value: data should have adequate coverage (complete, frequent, and timely),
be of high quality (accurate, comparable, and granular), be easy to use (accessible,
understandable, and interoperable), and be safe to use (impartial, confidential, and
appropriate). We argue that often the feature least present can be decisive for the
value, or lack thereof, that can be derived from the data. Next, we use a collection
of case studies to provide support for these features and showcase how they matter
in practice. Too often, we find, the data produced by governments do not satisfy these
features and thus are not conducive to transforming development outcomes. The data
may be of poor quality, siloed in various administrative systems, not shared with the
public, not readable by computers, and so forth.
   We aim to contribute to the literature in several ways. We develop a framework of
features that increase the potential of data to be valuable. We discuss how the inter-
action of the 12 features is crucial to understanding the value of data and the policies
necessary to increase the value of data. We further explore how our framework re-
lates to the equilibrium of data supply and demand. We directly tie the theoretical
framework of data features to case studies to support our inferences on how best to
scale-up public data systems to realize greater value.
   We make no attempt to estimate the net social value of public sector data, in part
because many of the benefits occur in dimensions for which prices do not exist (e.g.,
improved health from drinking clean water), but this also reflects our view that there
is no satisfactory way to assign a monetary value to an inexhaustible, or nonrival,
good. The limitless scope for data to be used and reused to address new and unex-
pected problems, as well as the potential for data to be misused for harm, suggests to
us that any estimate of the expected net social value of public sector data is one that
will be very imprecise.
   Rather, by reviewing a series of use cases of public sector data, we provide evidence
for the theoretical framework. Following recommendations in the literature on case
studies, this has involved an iterative and inductive approach of using case studies to
develop the theory and testing the validity of the theory using case studies (Dubois
and Gadde 2002, Eisenhardt and Graebner 2007). The use of case studies is consid-
ered appropriate when current theoretical perspectives are inadequate due to little
empirical substantiation (Eisenhardt 1989), which we believe is pertinent due to the
difficulty of estimating the net social value of public sector data discussed above. We

326                                           The World Bank Research Observer, vol. 38, no. 2 (2023)
acknowledge the limitations of a case study approach, which some may consider to lend
itself to opinion-based and prescriptive analyses.
   We restrict our analysis to data collected in the public sector at large, focusing on
low- and middle-income countries. This means that we neglect private sector data,
most citizen-generated data,¹ and data from high-income countries. We also shy
away from discussing two closely related issues: (a) why these 12 features are often




missing from public sector data, that is, issues concerning the political economy of
data production, and (b) under what conditions the demand for public sector data can
be maximized. Our focus on data production means that the features we identify in-
crease the potential for data to be valuable for development. However, for that value to
be realized, the data need to be demanded and used for legitimate purposes.
   There are other frameworks that list features conducive for data to be valuable.
Such frameworks have been created by national statistical offices (Statistics Canada
2017), international organizations (OECD 2011 and United Nations 2019), and
academia (Biemer 2010 and Wilkinson et al. 2016). Other frameworks have been cre-
ated in more narrow contexts, such as for the case of health data (Wang and Desalvo
2018), official statistics in the US (Groshen 2021), and for the purpose of maintain-
ing confidence in public sector data (Gore et al. 1991). There is a considerable overlap
between the different frameworks, though the exact names and definitions of the fea-
tures in them differ from source to source. All 12 features we identify can be found in
some shape or form in one of the other frameworks. Though many of the frameworks
originate from work with survey data, they can be adapted to other data types. The
framework of Biemer (2010), for example, which is based on the concept of a total
survey error, has been adapted to big data (Amaya et al. 2020) and agricultural data
(Carletto et al. 2021).
   Other collections of case studies of the impact of data on development exist as well,
and we will draw examples from some of these other collections. Among these are the
Data Impact Case Studies gathered by Open Data Watch (2020), the Value of Data
Case Studies gathered by the Global Partnership for Sustainable Development Data
(2021), Open Data Impact Map’s collection of use cases of open data (Center for Open
Data Enterprise 2021), GovLab’s collection on the same topic (Verhulst and Young
2016), a World Bank collection on the same topic (World Bank 2015), data2x’s collection
of examples related to gender data (data2x 2019), and the World Bank’s collection
of examples related to digital identification in the health sector (World Bank
2018a), the public sector (World Bank 2018b), and the private sector (World Bank 2018c).

Conceptual Framework
We outline 12 features that we argue are needed for public sector data to maximize
their potential value for development. The 12 features relate to coverage, quality, and
usability. With respect to coverage, inferences from the data need to be valid for all

Jolliffe et al.                                                                     327
Figure 1. Features of Data Conducive to Maximizing their Value for Development




relevant units and time periods. We argue that this means that the data need to be
complete, frequent, and timely (these features, and those that follow, are defined
more precisely below). With respect to quality, the data need to accurately
measure the concepts of interest to the users and be comparable and granular. To en-
sure usability of the data, they should be easy and safe to use. Easy to use means that
the data can be accessed, that they are understandable, and that they are interoper-
able. To ensure that the data are safe to use, they need to be impartial, confidential,
and appropriate (Figure 1).
   Though many different types of public sector data exist—such as censuses, survey
data, administrative data, and geospatial data—we argue that these features increase
the potential value of all public sector data. The exact way in which each feature mat-
ters varies from data type to data type. One of the features we will outline, for exam-
ple, is that data benefit from being frequent. However, frequent data mean a different
thing for GDP data than for weather data. Moreover, public sector data are used for a
variety of purposes, such as to inform policy making, program implementation, and
monitoring. We likewise argue that satisfying these features increases the potential
value of data used for any of these purposes despite their differences. Again, the exact
ways the features matter, and the relative importance of different features, differ
between use cases.

Adequate Coverage:

• Complete: Data are representative of the population of interest. Often, for data to maximize their
  value for development, they need to cover the entire population of interest, whether geo-
  graphical areas, households, or something else. In the case of census and administrative
  data, this typically means that the entire population of interest is enumerated. In the case
  of sample data, this means a complete sampling frame containing the entire population of




  interest, each member having some known, positive probability of being selected into the
  sample. In this case, completeness does not imply that the entire population of interest is
  directly captured in the data, but that the data are representative of the population of in-
  terest. Completeness is particularly relevant for redistributive policies where identifying the
  worst off requires that all are identified.
• Frequent: Data that are produced at regular intervals. The frequency of data relates to how of-
  ten new data points are made available. If one is only able to get new data on a topic, say,
  every 10 years, then understanding developments and implementing policies to improve
  outcomes becomes difficult. High frequency of data is particularly relevant for evaluating
  projects, tracking goals in national development plans, monitoring macroeconomic conditions, and
  more. For some outcomes, such as weather forecasts, the ideal frequency is at least daily,
  if not more frequent, while for other outcomes, such as macroeconomic data, the desired
  frequency may be monthly or quarterly. In each case, the desired frequency of data depends
  on the temporal dynamics of the outcome of interest, and this frequency may change as
  other conditions change.
• Timely: Data that are released shortly after collection or after an event occurs. Timeliness is
  particularly relevant for emergencies where policy response needs to be immediate, such as
  environmental disasters, health crises, conflicts and more. The timeliness of data depends
  on the mode of data collection. Some machine-generated data can be made available nearly
  instantaneously, while the collection of other data, such as survey data, often takes place over a
  longer period of time, and the data need careful cleaning and analysis before release, leaving a time lag
  of several months.
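
To illustrate the completeness condition for sample data, the following minimal sketch shows why a known, positive selection probability matters: weighting each sampled unit by the inverse of its selection probability yields an estimate for the whole population of interest. The household values, the probabilities, and the function names are hypothetical and chosen for illustration only.

```python
# Hypothetical sample: (household consumption, probability of selection).
# All numbers are illustrative assumptions, not data from any survey.
sample = [
    (120.0, 0.10),
    (80.0, 0.10),
    (300.0, 0.50),
    (45.0, 0.05),
]


def weighted_total(units):
    """Inverse-probability-weighted estimate of a population total."""
    total = 0.0
    for value, prob in units:
        if prob <= 0:
            # A unit with zero selection probability can never be represented.
            raise ValueError("every unit needs a positive selection probability")
        total += value / prob
    return total


def weighted_mean(units):
    """Estimated population mean: weighted total over estimated population size."""
    estimated_population = sum(1.0 / prob for _, prob in units)
    return weighted_total(units) / estimated_population


estimated_total = weighted_total(sample)
estimated_mean = weighted_mean(sample)
```

If part of the population is missing from the sampling frame, its selection probability is zero and no weight can recover it, which is the sense in which an incomplete frame undermines representativeness.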

High Quality:

• Accurate: Data that measure concepts of interest with minimal error. For data to be valuable
  they need to measure their concept of interest with minimal error. Minimizing the error
  requires paying attention both to the variance and bias of the statistic of interest. The former
  implies having enough power to minimize sampling error while the latter means that the
  data in expectation measure the statistic of interest accurately. Jointly minimizing the error
  means that the number of observations is adequate for relevant policy questions and that
  data collection and processing use methods and tools designed to capture the signal of some
  phenomena while filtering out noise or measurement error that can bias conclusions and
  evidence-based policies.
• Comparable: Data that are comparable across space and time. For data to be used for cross-
  country analysis or for tracking patterns over time, they need to be comparable across space
  and time. This means, for example, that the data should conform to certain standards and
  data collection instruments should be relatively stable over time. Examples of conforming
  to standards include the near-universal acceptance of the System of National Accounts
  for measuring the size of economies, and similarly the widely accepted standards for mea-
  suring geolocation and units of time. There are fewer clear examples of how to maintain


  comparability over time, but one common view is to avoid making unnecessary changes to
  the data collection processes which may induce a break in comparability of the data series.
• Granular: Data that can be broken down by subgroups. Often for data to maximize their value
  they need to be broken down by subgroups, such as geographical areas, time, sex, disability
  status, and more. This requires these subgroups to be captured appropriately in the data.
  The level of detail available with granular data means that de-identifying subjects becomes
  critical to protecting the privacy of the subjects.
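
The accuracy condition can be summarized with the standard mean squared error decomposition. Here $\hat{\theta}$ denotes an estimator of the statistic of interest $\theta$; this notation is introduced only for illustration and is not part of the framework itself.

```latex
\mathrm{MSE}(\hat{\theta})
  = \mathbb{E}\big[(\hat{\theta} - \theta)^2\big]
  = \mathrm{Var}(\hat{\theta}) + \big(\mathbb{E}[\hat{\theta}] - \theta\big)^2
```

A larger sample reduces the variance term but leaves the squared bias term unchanged, which is why attention to both is needed.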




Easy to Use:

• Accessible: Data that are made available to a wide audience of users. Failing to make data available
  to relevant users prevents them from being used effectively. Making data accessible entails
  that the data are machine-readable, meaning that content can be processed by computers,
  and, where relevant, that the data are made available free of charge. Not all data should be
  made available to everyone, and to ensure the safe use of data, accessibility may be condi-
  tioned on a terms of use agreement and on other restrictions, such as how and where data
  can be accessed.
• Understandable: Data that are easy to understand, process, and use. Making data accessible is
  not enough for them to be used. To facilitate use, the data need to be easy to understand
  as well. This entails that the data come with metadata describing how they were gener-
  ated/collected and how to process them. For certain applications, understandability neces-
  sitates that the websites hosting the data are translated to English (Ekhator-Modayode and
  Hoogeveen 2021). For other applications, the value and use of data can also be maximized
  when they are summarized or visualized in figures, tables and more.
• Interoperable: Data that can be linked to other data sources. For data to maximize their use, it
  is often necessary to link and combine different data sources through common identifiers
  for persons, facilities, firms, geospatial coordinates, time stamps, or common classification
  standards. This ensures that information from multiple datasets can be combined, maximizing
  the potential uses. Interoperability increases when data conform to specific
  standards. Interoperability amplifies the risk of data breaches and misuse, implying that
  terms of use agreements ought to be in place for users wishing to merge different datasets.
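
As a rough sketch of what linking through a common identifier involves, the example below joins two small record sets on a shared facility identifier. The identifiers, field names, and values are hypothetical assumptions, not drawn from any dataset discussed here.

```python
# Hypothetical records keyed by a shared facility identifier.
# Field names and values are illustrative assumptions.
facilities = {
    "F001": {"district": "North", "type": "clinic"},
    "F002": {"district": "South", "type": "school"},
    "F003": {"district": "South", "type": "clinic"},
}
water_access = {
    "F001": {"improved_water": True},
    "F003": {"improved_water": False},
    "F004": {"improved_water": True},  # no match in the facility register
}


def link(left, right):
    """Join two record sets on their shared keys and list keys that fail to match."""
    matched = {key: {**left[key], **right[key]} for key in left.keys() & right.keys()}
    unmatched = sorted(left.keys() ^ right.keys())
    return matched, unmatched


linked, unlinked_ids = link(facilities, water_access)
```

The unmatched identifiers indicate where records cannot be combined, which is where the absence of a common identifier or standard limits the potential uses of the data.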

Safe to Use:

• Impartial: Data that are immune to the influence of stakeholders. For data to be used safely, they
  need to be immune to harmful influences of any stakeholders in the data lifecycle, such as
  funders, producers, or users. If stakeholders can negatively influence data, such as by altering
  or censoring data values to promote some agenda, the data lose credibility, which prevents objective
  and accurate data analysis. Lack of impartiality can have externalities on other data
  products; if users know that one data source has been meddled with, they may lose trust in
  all data products from the same institution.
• Confidential: Data that protect personal and sensitive information and are only accessible in a safe
  and secure manner. Protecting personal and sensitive information requires de-identification
  of data, such that individuals or establishments cannot be identified in the data. Accessibil-
  ity in a safe and secure manner allows access for legitimate purposes but seeks to limit the
  scope for misuse of the data, which could imply, for instance, that data can only be accessed
  from certain virtual or physical enclaves or through systems preventing users from storing
  the data locally.


• Appropriate: Data that measure concepts of interest with a clear development purpose. One way
  to restrict misuse of data is to ensure that only data appropriate for measuring the concepts
  of interest for a clear policy purpose are produced and used, without attempts to collect
  excessive information or conduct surveillance. Appropriateness implies a proportionality or
  adequacy principle, by which the amount of data is proportional or adequate to the need.




Interactions between Data Features

To understand how value can be derived from these features it is constructive to create
a data production function, which is a function of the 12 features:
$\text{supply} = f(x_1, x_2, \ldots, x_{12})$.² Though such a function cannot be estimated in practice, the signs
of its partial and cross derivatives are often known. As a general rule, the more
present a feature is, the greater potential the data bring to development, meaning
that $\partial f / \partial x_i > 0$. But there are important caveats and nuances to this broad statement,
given that the effect of one feature on the value of data is not independent of
the other data features. More specifically, there are three important ways in which
value is determined from the interaction of features: (a) the feature least present is
often the constraining factor on the value that can be derived (i.e., increasing other
features may not increase value if the least present feature is unchanged), (b) at times
there are positive spillovers between features, (c) at times there are negative spillovers
between features.
1. The importance of the features least present. Often the derived value will be determined
   by the feature least present. When this is the case, the data production
   function can be represented through a Leontief specification:
   $\text{supply} = \min(x_1, x_2, \ldots, x_{12})$. Such a situation is comparable to Kremer’s O-Ring theory
   of economic development, where one malfunction in the value chain in produc-
   tion can dramatically reduce a product’s value (Kremer 1993). More colloquially,
   this is sometimes expressed as a chain being only as strong as its weakest link. If,
   for instance, nearly all features are present, but the data are not produced with
   adequate coverage, they may not be able to serve their intended policy objective.
   When data are not of high quality, they could misguide policy decisions. When
   the data are not easy to use, they might not be used at all. When data are not
   safe to use, they may end up causing harm due to data breaches, surveillance, or
   exclusion.
2. Positive spillovers between features. The strength of certain features in a dataset may
   nurture progress in other features. For the data production function, this implies
   that the cross derivatives are positive in such cases: $\partial^2 f / \partial x_i \partial x_j > 0$. For example,
   when data are made accessible to a larger public, it tends to create a sense of
   greater scrutiny of the data which fosters impartiality (or, in other words, serves
   as a check on efforts to manipulate data). The argument also goes the other way;
   weakness in a feature might cause weakness in another feature. For example,
   when data quality is weak, and especially when methodological foundations are
   vague, it becomes easier to manipulate statistics in one direction or another, vio-
   lating impartiality (Jerven 2019).
3. Negative spillovers between features. On other occasions the presence of one feature
   has a negative impact on other features. In such cases, the cross derivatives of the
   data production function are negative: $\partial^2 f / \partial x_i \partial x_j < 0$. For data producers, this
   can create trade-offs and conflicts between features. For example, when advanc-




   ing one feature is costly—such as collecting more data, which can help with cover-
   age and accuracy—this may limit resources that can be devoted to other features.
   Data producers may also have to balance temporal comparability with the need
   to revise the data collection instruments if the behavior of the enumerated entity
   changes significantly (e.g., list of consumption items may need to change if new
   items are introduced into society) or if international standards of data collection
   are updated over time, in part to facilitate the adoption of methods that increase accuracy.
   And perhaps most importantly, the potential harm from data not being safe
   to use increases when certain other features are in place (Hasanzadeh et al. 2020,
   Heijlen and Crompvoets 2021). For example, when data are granular, interopera-
   ble, and have wide coverage, they can be misused more easily and more effectively
   for illicit ends, magnifying the risk of confidentiality violations. In essence, because
   these features make the data very useful and usable, they multiply the ways in
   which the data could be used to cause harm.
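
The Leontief case in point 1 can be sketched numerically. The feature scores below are hypothetical values on a 0 to 1 scale, and the function names are illustrative assumptions; as noted above, such a function cannot be estimated in practice.

```python
# Hypothetical feature scores on a 0-1 scale. The names follow the 12 features;
# the values are illustrative assumptions only.
features = {
    "complete": 0.9, "frequent": 0.8, "timely": 0.7,
    "accurate": 0.9, "comparable": 0.8, "granular": 0.6,
    "accessible": 0.2, "understandable": 0.7, "interoperable": 0.5,
    "impartial": 0.9, "confidential": 0.8, "appropriate": 0.9,
}


def leontief_value(scores):
    """Potential value is capped by the feature least present."""
    return min(scores.values())


def binding_feature(scores):
    """Name of the feature that currently constrains value."""
    return min(scores, key=scores.get)


baseline = leontief_value(features)

# Raising a non-binding feature leaves the value unchanged.
more_granular = {**features, "granular": 0.9}
# Raising the binding feature lifts value up to the next constraint.
more_accessible = {**features, "accessible": 0.8}
```

In this sketch, improving granularity does not change the value because accessibility remains the constraint, while improving accessibility raises the value only until interoperability becomes the new constraint.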


The Equilibrium of Data Supply and Demand

Even if the 12 features are present in some public sector data, the data need not gen-
erate value for development. For the value to be realized, the data also need to be de-
manded and used effectively for legitimate purposes.
   Just like we have specified a data supply function, we could likewise specify the de-
terminants of a data demand function, which would include data literacy, infrastruc-
ture, incentives, and more. Discussing the determinants of data demand is beyond
the scope of this paper and is covered in greater detail in World Bank (2021b). Here
we want to discuss whether the market for data—the intersection of data supply and
data demand—clears with an efficient amount of data being produced. In practice,
this does not tend to be the case for several reasons, three of which are:

1. Markets for public sector data are largely non-existent. In private markets, prices re-
   veal to producers of private goods the efficient supply of goods to provide. In the
   case of public sector data, there are no prices that inform governments of how
   people value data. Without prices, there are no economic forces that help ensure
   an equilibrium outcome where supply equals demand.
2. Government data supply is often a public good. Most public sector data, really data in
   general, can be characterized as having attributes of a public good. In particular,

   data are nonrival—one person’s use does not diminish another person’s use. As
   with all public goods, this tends to lead to an undersupply.
3. Data use is often a positive externality. Most legitimate data use has positive impacts
   on individuals beyond the one using the data. This means that public sector data
   generate positive externalities and, by consequence, that both the use and production of
   the data are below the optimal level.




   These three issues have been widely recognized and led to market interventions,
most notably large international and national efforts aimed at boosting the capac-
ity of national statistical offices and at boosting data literacy (World Bank 2021b).
Though these efforts certainly are important, our analysis sheds light on a further
aspect of the sub-optimal supply of data. In particular, we argue that sub-optimal
supply is not only related to a sub-optimal amount of data supplied but also to the
wrong kind of data supplied—data that do not have the 12 features listed above. Im-
proving the feature least present may in many cases be quicker and cheaper than pro-
ducing new datasets or improving data literacy. Making data more understandable,
for example, does not require producing new data and increases the use of data at any
given data literacy level. Likewise, introducing safeguards that minimize the potential
for data to lose their impartiality comes at minimal economic cost and is likely to
increase the trust users have in data and hence have positive spillovers on other data
products.


Case Studies
This section relies on case studies illustrating how each feature can bring about pos-
itive change or how the absence of a feature has resulted in worse development out-
comes. Most examples involve several features either directly or indirectly, and as out-
lined above, having in place a single feature in isolation is seldom enough to bring
about value.


Completeness
When data cover the entire population of interest it is possible to make inferences
that one otherwise would not be able to. One of the most fundamental ways in which
countries can make sure their data cover their entire population of interest is through
population registration systems and by assuring that all individuals are covered in
government databases. In Thailand, at the turn of the century, only 71 percent of
the population was covered by a public health insurance scheme that was intended
to be universal. Yet the country had a near-universal population register in which
citizens were issued a personal identification number when they were born or when
their household was registered for the first time. Leveraging this register and the

personal identification information from the existing public insurance scheme, the
government was able to identify the population not covered and increased health insurance
coverage from 71 percent to 95 percent (World Bank 2020a and World Bank 2018a).
    In the absence of nation-wide administrative data, representative survey data ful-
fill a similar role. In Nigeria, for example, the government commissioned the 2015
National Water Supply and Sanitation Survey to understand access to water and




sanitation of all households in Nigeria. As part of the survey, 201,842 households,
89,721 improved water points, 5,100 water schemes, and 51,551 public facilities in-
cluding schools and health care facilities were enumerated. The data revealed that
sanitation services of 130 million Nigerians did not meet the standard for sanitation
as expressed in the United Nations’ Millennium Development Goals. More specifically,
the data revealed that inadequate access to water is particularly an issue for
poor households, and that public expenditure in water and sanitation is limited and
of poor quality (Figure 2) (World Bank 2017). In response to these findings, Presi-
dent Muhammadu Buhari declared a state of emergency in the sector and launched
the National Action Plan for the Revitalization of Nigeria’s Water, Sanitation and
Hygiene (WASH) Sector (Nigeria Federal Ministry of Water Resources 2018). Pres-
ident Buhari pledged that his administration would prioritize WASH infrastructure
development, long-term planning, and stakeholder coordination. In recognition of
the sector-wide crisis, the government requested from the World Bank a 700 million
USD lending operation to support the sector.

Frequency
When data are frequent, they can be better used for monitoring. Often, this requires
data that are at least annual, so they can inform annual budget and policy decisions.
This has been the case in Costa Rica where a Multidimensional Poverty Index using
a household survey has been adopted as an official measure to inform and monitor
poverty reduction strategies. The index reveals how the country is performing along
key indicators related to education, health, housing, employment, and social protec-
tion. In May 2016, a Presidential Directive was issued stating that the index should
be used for budgetary planning and as an official measure for allocating resources
and monitoring and evaluating social programs. Through this Directive, the index
has been used to modify the allocation of resources, which helped accelerate poverty
reduction during austerity without an increase in the budget (Multidimensional
Poverty Peer Network 2017).

Figure 2. Complete Data Pinpointed Areas of Nigeria that Needed Better Sanitation
Source: World Bank (2017).

   When data are infrequent, the consequences can be dire. A study of intergovernmental
fiscal transfers in Bolivia, Ecuador, and El Salvador revealed that transfers
can be misallocated in the absence of frequent data. Since the transfers rely
on subnational population estimates and given that recent population estimates are
not always available, the transfers at times rely on outdated data. By employing
retrospective census estimates, Roseth et al. (2019) were able to simulate how transfers
would have been allocated had population data been available at the time of their
allocation. In El Salvador, inaccuracies in municipal population sizes led to 92 million
US dollars being misallocated between 2000 and 2007. This is equivalent to more
than 27 times the annual budget of the national statistical office and 8 times the budget
of the latest census.
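
   The logic of such a simulation can be illustrated with a minimal sketch. It assumes, purely
for illustration, that a fixed transfer pool is split in proportion to municipal population; the
municipality names, population figures, and function names are hypothetical, and the actual
method in Roseth et al. (2019) may differ.

```python
def allocate(total, populations):
    """Split a fixed transfer pool in proportion to municipal population."""
    pop_sum = sum(populations.values())
    return {m: total * p / pop_sum for m, p in populations.items()}


def misallocated(total, outdated, actual):
    """Amount that reached the wrong municipality because outdated counts were used."""
    used = allocate(total, outdated)
    ideal = allocate(total, actual)
    # Each unit over-allocated to one municipality is under-allocated to another,
    # so the sum of absolute gaps counts every misplaced unit twice.
    return sum(abs(used[m] - ideal[m]) for m in actual) / 2


# Hypothetical population counts (not taken from the study).
outdated = {"A": 50_000, "B": 30_000, "C": 20_000}
actual = {"A": 40_000, "B": 30_000, "C": 30_000}

gap = misallocated(1_000_000, outdated, actual)
print(gap)
```

In this toy case, one-tenth of the pool goes to the wrong municipality simply because population
shifted between A and C after the last count.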


Timeliness

When data are timely, they can lead to better emergency response when disasters hit,
whether environmental, financial, health, or conflict related. For example, weather
data, especially weather forecasts, can help people anticipate and prepare for extreme
events. The value of such data can be illustrated by two intense cyclones in the Bay
of Bengal that occurred 14 years apart. The 1999 cyclone caught the Indian state
of Odisha by surprise, causing massive devastation, killing more than 10,000 people,
and destroying housing and public infrastructure. Since then, the Odisha State Disas-
ter Management Authority and the government of Odisha have invested in weather
forecast data and disaster response measures. When another cyclone hit in 2013,
nearly 1 million people were evacuated to cyclone shelters, safe houses, and inland
locations, and only 38 people died during and immediately after the storm (Hallegatte
et al. 2017). These impressive results would not have been possible without the weather
data that gave sufficient advance warning of the cyclone.
   Satellite data can likewise offer a timely response to emerging threats. In Sri Lanka,
where more than two million hectares of natural forest are home to over 30 endan-
gered mammals, satellite data have been used to safely and quickly respond to defor-
estation of protected areas (Jamilla and Ruiz 2020). The Department of Forest Con-
servation used to rely mostly on routine patrolling, which can be a strain on resources
and be a dangerous task due to the terrain and animals that may be encountered
by patrollers. Propelled by Covid-19, which restricted the use of patrolling, the de-
partment now relies on a mobile app provided by the Global Forest Watch. The app
relays real-time satellite imagery to alert users of possible deforestation and provide
evidence when deforestation has occurred. In one case, a popular local TV channel
reported that encroachments had taken place without being addressed by the depart-
ment, upon which the department used the satellite data and alerts to identify two
locations of encroachment. As a result of this, legal actions were taken to stop the
encroachments from expanding.

Granularity

Data can be granular along a number of dimensions, for example granular in space,
granular in time, or granular in demographic attributes, such as sex. Spatial gran-
ularity can help target resources and foster inclusion. In Croatia, for instance, data
from the population census were combined with household survey data and admin-
istrative data to create detailed maps of poverty and deprivations (Figure 3) (Croatian
Bureau of Statistics 2016 and Croatian Bureau of Statistics 2017). The maps re-
vealed large differences in living standards across municipalities and within the re-
gions used for allocating funds from the European Union (EU). More than one-third of
the EU’s annual budget—equivalent to more than €50 billion—is dedicated to invest-
ments in infrastructure, such as hospitals and schools, in less economically developed
regions. The allocation of the funds depends on regions’ gross domestic product per
capita, which means that poor municipalities situated in non-poor regions may be
prevented from receiving funding. Armed with the poverty map, Croatia responded
with proposals for new regional divisions that concentrate EU funds in the poorest
areas (Government of the Republic of Croatia 2019). This reordering, thanks to better
data and analysis, has the potential to reduce inequality and pockets of poverty in
Croatia.

Figure 3. Mapping Pockets of Poverty in Croatia Allowed Better Targeting of Antipoverty Funds
Source: Croatian Bureau of Statistics (2016).

   In the absence of granular data, resources may be poorly allocated. This was the
case in Sierra Leone, which lacked granular data when making 1,561 water investments
across 12 districts in 2012. The investments reached 28,556 individuals, a majority of
whom were located in areas where the surrounding populations
were already served by other functional water points. A retrospective analysis using
highly granular water point data made available through the Water Point Data Ex-
change (WPdx)—an online platform for rural water point data sharing, access, and
analysis—showed that the results could have been very different if granular data
were available and used in 2012. Had the WPdx data been available and used for
the investments in 2012, it would have been possible to reach nearly 4 times as
many people with only about a third of water point investments. This is equivalent to
a reduction in costs per-person reached from 54.66 to 3.94 USD (WPdx 2022).

Accuracy
In the absence of a sound methodological base, indicators derived from data may be
inaccurate and misleading, limiting their policy relevance. In the Middle East and
North Africa, basic statistics such as unemployment rates are measured with imprecise
definitions, blurring the line between unemployment and informality, which distorts
the role of women and rural areas in assessments of national employment. Small
changes in the definitions have large implications on the quantity of interest (Arezki
et al. 2020).
   Inaccuracy also affects GDP figures. In 2010, Ghana officially revised
its GDP figures upwards by about 60 percent. This change suggested that Ghana’s
economic performance had been significantly underestimated for years. The revision
was the result of moving from the 1968 System of National Accounts guidelines
to the 1993 vintage as well as accounting more accurately for emerging sectors of
the Ghanaian economy (Jerven and Duncan 2012). Similar upwards revisions oc-
curred in other African nations, including Nigeria, Senegal, Kenya, and Zimbabwe
(Koumane et al. 2019). While these revisions have no direct effect on wellbeing,
an accurate assessment of a country’s output better informs macroeconomic
policies, affecting wellbeing indirectly. It also improves cross-country comparability,
which allows countries to better monitor relative performance.

Comparability

When data lack comparability, they will be of limited use. A good example of this is
COVID-19 case data, which, though it arrived daily in most countries of the world,
could only imperfectly be used for within-country comparisons over time. The pri-
mary reason for this is that as countries increased their testing capabilities over time,
more people were reported as having contracted the virus. Though increased testing
is critical, this made data on confirmed cases less comparable over time within coun-
tries.
    Conversely, when a country’s data are comparable to those of other countries, the
country can benchmark its performance against peers and use the comparison
to evaluate national policies and assess national priorities. Countries often respond
with reforms in areas where they are lagging. As one example, the Democratic Repub-
lic of Congo (DRC) made gender equity reforms upon seeing data from the Women,
Business and the Law Index—an index created by the World Bank to compare laws
and regulations affecting women’s economic opportunity across economies. The reform
effort led to changes to the DRC’s Family Code, which for decades contained legal
provisions that prevented married women from carrying out economic activities. The
adoption of a new Family Code in July 2016 allowed married female entrepreneurs in
DRC to start formal businesses, open bank accounts, register a company, and perform
a host of other economic activities without interference from their husbands.

Accessibility

When data producers make data widely available, they empower individuals to make
better choices through more information and knowledge. The digital revolution has
increased the potential and ease with which information can be shared. In Ethiopia,
small-scale farmers lack access to reliable price information, and thus often receive
prices far below market value. In 2008 the Ethiopian Commodity Exchange opened,
providing price information to farmers through text messages, hotlines, and online
information. As a result, the farmers have been able to cut trader margins by half
and increase their revenue (Vaitla et al. 2015).
   Another example where data attained greater value when it was made accessible
relates to public procurement. Too often, public projects are not implemented ade-
quately due to poor procurement, such as inflated costs, corruption, or ghost con-
tracts. Since 12 percent of global GDP is spent on public procurement, this matters
tremendously for development outcomes (Bosio and Djankov 2020). In Uganda, in
an attempt to improve procurement outcomes, local government entities made ad-
ministrative procurement data from the bidding process, down to the execution of
contracts, available to certain Civil Society Organizations (CSOs). The CSOs trained
community members to understand the information in the contracts and conduct site
checks to verify it. The findings revealed mismanagement of resources by contractors
and government officials and high dependence on noncompetitive contracts. Aside
from the direct benefits of assuring that contracts complied with national procure-
ment standards, the national public procurement agency upgraded its procurement
portal in line with international open contracting data standards, making Uganda
the first African country to do so (Africa Freedom of Information Centre 2018 and
Global Partnership for Social Accountability 2020).

Understandability

Data accessibility is not enough to ensure that data are used by governments or indi-
viduals when the data are difficult to understand or there is a lack of skills to under-
stand how to use the data. This was the case in Brazil where receiving summarized
data on research findings was instrumental for mayors to make policy changes. Evidence
from 2,150 municipalities found that informing municipal mayors of research
findings on the effectiveness of a simple policy change increased the probability that
their municipality implemented the policy by 10 percentage points (Hjort et al. 2021).

   Another example, where making the data more understandable helped guide pol-
icy choices, comes from Pakistan. Prior to 2008, Pakistan’s Punjab province suffered
from poor government service delivery due to inefficiencies, rent seeking and more.
Lack of digitized service delivery processes made it impossible to track service deliv-
ery and monitor performance and satisfaction with services. In an attempt to take
on these challenges, in 2008, officials in one district of Punjab put in place a pilot
Citizen Feedback Monitoring Program (CFMP), in which data on service provision
from citizens was crowd-sourced using simple text messaging and other information
and communications technology (ICT) applications. Analytical reports were sent to
government officials enabling them to identify patterns and take evidence-based cor-
rective measures. In 2012, the initiative was scaled up to 36 districts of the province
and across 25 different public services. As of 2019, CFMP had contacted 29 million
citizens to solicit their feedback, and the government of Punjab had taken 41,600
corrective measures in response to CFMP data, including warnings, penalties and sus-
pension. One of many results of this is that the availability of medicine increased from
46 percent to 77 percent (Global Delivery Initiative 2019). In a 2015 evaluation of
the program, 90 percent of respondents said it had helped build trust between citizens
and authorities (Masud 2015).



Interoperability

When data sources are interoperable, the potential use and value of data increase.
As one example, interoperability can help governments save scarce resources by
eliminating duplicate or ghost recipients of social transfers (beneficiaries, often of
pension funds, who are no longer alive). This was the case in
Argentina where the government identified noneligible beneficiaries across various
social programs using the country’s system of unique taxpayer ID numbers to link
datasets, generating an estimated savings of 143 million USD over eight years (World
Bank 2020a and World Bank 2018b).
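
   A minimal sketch of this kind of linkage follows. It assumes that beneficiary rolls and a
death registry share a common identifier; the field name `tax_id`, the sample records, and the
function name are hypothetical and are not drawn from the Argentine system.

```python
from collections import Counter


def flag_ineligible(beneficiaries, death_registry_ids):
    """Flag IDs that appear more than once on the roll or match a death record."""
    counts = Counter(b["tax_id"] for b in beneficiaries)
    duplicates = {tax_id for tax_id, n in counts.items() if n > 1}
    deceased = set(counts) & set(death_registry_ids)
    return {"duplicates": duplicates, "deceased": deceased}


# Hypothetical records keyed by a shared unique identifier.
pension_roll = [
    {"tax_id": "20-111", "program": "pension"},
    {"tax_id": "20-222", "program": "pension"},
    {"tax_id": "20-222", "program": "pension"},  # same person listed twice
    {"tax_id": "20-333", "program": "pension"},
]
death_registry = ["20-333", "20-999"]

flags = flag_ineligible(pension_roll, death_registry)
print(flags)
```

Without a shared identifier, the same check would require fuzzy matching on names and
addresses, which is slower and more error-prone.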
   Interoperability can also induce cost savings for the private sector, particularly when
public data contain key national identifiers of companies, individuals, geographical units, and
other entities, which allow for easy linking with a company’s own data. National ID
systems, for example, can increase private sector efficiency through cost savings such
as removing the need for companies to create their own ID systems, and increase rev-
enues by expanding the potential customer base (World Bank 2020a and World Bank
2018c). In India, Aadhaar, the unique 12-digit identity number that can be obtained
by residents and citizens of India, has reduced the need for identity verification for
firms. As a result, it is estimated that a firm’s typical onboarding costs for new em-
ployees could decrease from 1,500 rupees to 10 rupees—a reduction of 99.3 percent
(World Bank 2020a and World Bank 2018c).

Impartiality

When data are not impartial, the potential for misuse increases and trust in data can
be eroded. Assuring that data cannot be manipulated by producers can have positive
consequences on government budgets. India saved 19 percent on a rural employment
guarantee scheme, the world’s largest workfare program, after introducing
e-governance to the payments system. Most of the savings arose because local
officials were no longer tasked with distributing the funds and thus could not mismanage
them and misreport the data. The change reduced officials’ personal wealth
by 10 percent (Banerjee et al. 2020).
   The World Bank recently faced a case in which the impartiality of one
of its key data products was called into question. An independent external re-
view established “data irregularities”—data choices not immune to the influence of
stakeholders—with respect to four countries’ rankings in the annual Doing Business
report. One of these was with respect to China’s 2018 ranking, where the independent
review found that World Bank senior leadership pushed for last-minute methodological
changes in an attempt to boost China’s ranking. This came against the backdrop of
high-ranking Chinese government officials expressing concerns that the country’s
ranking did not reflect its economic reforms at a time when the World Bank was
heavily reliant on China in negotiations for its capital increase goals (Machen et al.
2021). As a result of finding these irregularities, the credibility of the Doing Business
report was compromised and the World Bank decided to discontinue it.



Confidentiality

When data are not confidential, individuals may be identifiable. This violates core
principles of data protection and increases the likelihood that harm can be done to the
identified subjects. De-identifying individuals is not always enough to maintain con-
fidentiality. In the 1990s, for example, the Governor of Massachusetts approved mak-
ing de-identified medical records of state employees available for researchers. Though
the data had key identifiers such as names and addresses removed, by triangulating the
information available with other public information, a researcher was able to iden-
tify the medical records of the governor. Other individuals could likewise be identified
(Heffetz and Ligett 2014).
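
   How triangulation can defeat de-identification is sketched below. The sketch assumes that
both files share a few indirect identifiers; the specific fields (`zip`, `birth`, `sex`), the
records, and the function name are hypothetical and do not reproduce the data used in the
Massachusetts case.

```python
def reidentify(deidentified, public, keys):
    """Link records whose combination of indirect identifiers is unique in both files."""
    def index(rows):
        grouped = {}
        for row in rows:
            grouped.setdefault(tuple(row[k] for k in keys), []).append(row)
        return grouped

    public_index = index(public)
    matches = []
    for combo, rows in index(deidentified).items():
        candidates = public_index.get(combo, [])
        # A unique combination on both sides pins the record to one named person.
        if len(rows) == 1 and len(candidates) == 1:
            matches.append((candidates[0]["name"], rows[0]["diagnosis"]))
    return matches


# Hypothetical de-identified medical file: names and addresses already removed.
medical = [
    {"zip": "00001", "birth": "1950-01-01", "sex": "M", "diagnosis": "condition X"},
    {"zip": "00002", "birth": "1960-02-02", "sex": "F", "diagnosis": "condition Y"},
    {"zip": "00002", "birth": "1960-02-02", "sex": "F", "diagnosis": "condition Z"},
]
# Hypothetical public list that includes names alongside the same fields.
public_list = [
    {"name": "Person A", "zip": "00001", "birth": "1950-01-01", "sex": "M"},
    {"name": "Person B", "zip": "00002", "birth": "1960-02-02", "sex": "F"},
]

matches = reidentify(medical, public_list, keys=("zip", "birth", "sex"))
print(matches)
```

Person A is re-identified because the combination of fields is unique; Person B is not, because
two medical records share the same combination. This is why coarsening or suppressing
indirect identifiers matters as much as removing names.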
   Beyond re-identification, data breaches also pose a threat to confidentiality because
they raise doubts as to whether personal information can be safe. Aadhaar, the unique
12-digit identity number of India mentioned before, has suffered from several data
leaks. In one instance, more than 200 government websites accidentally made personal
data including demographic data, names, phone numbers, religion, bank account
numbers and more, available publicly on the internet (TECH2 2018). Though
the data were quickly removed, such leaks often have permanent implications for confidence
in data systems.

Appropriateness
To avoid data misuse and surveillance, the amount of data collected should not be excessive
but appropriate for the particular needs. Governments’ tracking and contact
tracing of infected and at-risk individuals through smartphone apps or phone loca-
tion data during the COVID-19 pandemic shed light on this. One example of how to
minimize the risk of surveillance beyond what is needed to minimize the spread of
the virus, which has been taken up by some countries, is to rely on a protocol devel-
oped by Apple and Google for their contact-tracing apps. With this protocol, phones in
close proximity of each other will use low energy Bluetooth technology to exchange a
temporary key code, which changes every 15 minutes. This information is stored on
each phone, rather than in a centralized database. When a user reports a positive test
result, the temporary key codes are used to notify other users of their potential ex-
posure (Zastrow 2020). This solution serves the intended purpose of notifying users
that may have been exposed to the virus, while being relatively privacy-preserving. It
cannot be used to enforce quarantine, track, or identify individuals, limiting the scope
for surveillance and other inappropriate uses (Privacy International 2020).
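
   The decentralized design described above can be sketched in simplified form. This is an
illustration under stated assumptions, not the Apple and Google protocol itself: the class and
method names are hypothetical, and codes are drawn at random here, whereas a real protocol
would derive them cryptographically.

```python
import secrets

ROTATION_MINUTES = 15  # rotation interval stated in the text


class Phone:
    def __init__(self):
        self.own_codes = {}       # interval index -> temporary code this phone broadcast
        self.heard_codes = set()  # codes received from nearby phones, kept on the device

    def code_at(self, minute):
        """Return the temporary code for the 15-minute interval containing `minute`."""
        interval = minute // ROTATION_MINUTES
        if interval not in self.own_codes:
            self.own_codes[interval] = secrets.token_hex(16)
        return self.own_codes[interval]

    def meet(self, other, minute):
        """Two phones in close proximity exchange their current temporary codes."""
        self.heard_codes.add(other.code_at(minute))
        other.heard_codes.add(self.code_at(minute))

    def report_positive(self):
        """On a positive test, only the phone's own temporary codes are published."""
        return set(self.own_codes.values())

    def was_exposed(self, published_codes):
        """Matching happens locally; no location or identity is needed."""
        return bool(self.heard_codes & published_codes)


alice, bob, carol = Phone(), Phone(), Phone()
alice.meet(bob, minute=5)
published = alice.report_positive()
print(bob.was_exposed(published), carol.was_exposed(published))
```

Because the published codes are random and short-lived, they reveal neither who reported the
positive test nor where the encounter took place.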
   In contrast, some countries have taken a more expansive approach to contact
tracing, in which more sensitive information was collected and used and, in some
instances, shared with law enforcement agencies. In Israel, mobile phone location
data was used by the domestic security agency to identify individuals exposed to the
virus and by the police to enforce quarantine, a practice which was subsequently
challenged as unconstitutional in the Supreme Court (Amit et al. 2020; Bradford
et al. 2020). India’s contact-tracing app stores location data alongside a set of demographic
information, including age, gender, phone number, and travel history, on
a centralized server. It was initially mandatory for air and rail travel as well as for
government employees to return to work (Arun 2020; Bradford et al. 2020; Privacy
International 2020).

Conclusion
Around the world, more and more data are produced in the public sector, yet more
value could be reaped from these data. In this paper, we have presented a con-
ceptual framework and empirical arguments suggesting why the returns to ex-
isting public sector data are not maximized. We show that for data to yield re-
turns for governments they must live up to certain features, these being that they
are of adequate spatial and temporal coverage (complete, frequent, and timely),
of high quality (accurate, comparable, and granular), easy to use (accessible,
understandable, and interoperable) and safe to use (impartial, confidential, and
appropriate).
    We substantiate and validate these features through case studies covering a wide
range of countries, topics, data types, and data uses—an approach we acknowledge
may not satisfy all readers and may appear prescriptive and opinion-based. These case
studies, we argue, illustrate how the 12 features matter in practice for generating
value from data.
    Too often governments or other public sector institutions are concerned with pro-
ducing a particular dataset, with less of an eye to how those data will generate value.
It is imperative for governments not only to invest in more data, but also to ensure
that data possess the right features. Often the value derived from data is determined
by the features least present, emphasizing the need for a comprehensive analysis of
which of the 12 features elicited in this paper may be lacking. This can ensure that
the whole chain from data production to data use is in place and the data can come
closer to realizing their value for improved wellbeing of the population and economic
development.


Notes
Development Data Group The World Bank Group (@worldbank.org) 2121 Pennsylvania Av-
enue, NW Washington, DC 20433, USA. Corresponding author: Daniel Gerszon Mahler
(dmahler@worldbank.org).
The author ordering was constructed through American Economic Association’s randomization tool
(confirmation code: qhPJIjX8dDgr). All authors are with the Development Data Group of the World
Bank. The authors are thankful for comments and feedback received from Andrew Dabalen, Gero Car-
letto, Kathleen Beegle, Lucas Kitzmuller, Luis Alberto Andres, Paolo Verme, Robert Cull, Tim Kelly, Umar
Serajuddin, and Vivien Foster. The authors are also thankful to the editor and three anonymous re-
viewers of the World Bank Research Observer for valuable comments. In addition, they are thankful for
the many colleagues who provided and reviewed case studies including Ann-Sofie Jespersen, Aparajita
Goyal, Audrey Ariss, Benjamin David Roseth, Brian Banks, Emilia Galiano, Elizabeth Goldman, Florence
Kondylis, Frederic Meunier, Hana Brixi, Isis Gaddis, Joao Pedro Wagner De Azevedo, Katy Sill, Madalina
Papahagi, Marc Tobias Schiffbauer, Marelize Gorgens, Maria Poli, Natalia Baal, Natasha Rovo, Megumi
Kubota, Paul Andres Corral Rodas, Sabina Alkire, Sonali Vyas, Stephane Hallegatte, Stephanie Jamilla,
Tea Trumbic, Theresa Beltramo, Utz Johann Pape, and Zelalem Yilma Debebe.
    1. Arguably, our framework can still speak to the features under which these other data types can be
repurposed and used for public good, but a sufficiently in-depth discussion of these data types is beyond
the scope of this paper.
    2. See Dillon et al. (2020) for another example of a data production function.



References
Africa Freedom of Information Centre. 2018. “Eyes on the Contract: Citizens Voice in Improving
   the Performance of Public Contracts in Uganda.” https://africafoicentre.org/download/eyes-on-the-
   contract-citizens-voice-in-improving-the-performance-of-public-contracts-in-uganda/.


Amaya, A., P. P. Biemer, and D. Kinyon. 2020. “Total Error in a Big Data World: Adapting the TSE Frame-
  work to Big Data.” Journal of Survey Statistics and Methodology 8 (1): 89–119.
Amit, M., H. Kimhi, T. Bader, J. Chen, E. Glassberg, and A. Benov. 2020. “Mass-Surveillance Technolo-
  gies To Fight Coronavirus Spread: The Case of Israel.” Nature Medicine 26 (8): 1167–9.
Arezki, R., D. Lederman, A. A. Harb, N. Y. L. W. Elmallakh, Y. Fan, A. M. Islam, H. A M. Nguyen, and
   M. Zouaidi. 2020. “Middle East and North Africa Economic Update, April 2020: How Transparency
   Can Help the Middle East and North Africa.” Country Economic Memorandum
   No. 147545, World Bank, Washington, DC. http://documents.worldbank.org/curated/en/
   343911586470772558/Middle-East-and-North-Africa-Economic-Update-April-2020-How-
   Transparency-Can-Help-the-Middle-East-and-North-Africa.
Arun, C., 2020. “India’s Contact Tracing App Is a Bridge Too Far.” Council on Foreign Relations. Digital
  and Cyberspace Policy Program, September 2. https://www.cfr.org/blog/indias-contact-tracing-app-
  bridge-too-far.
Banerjee, A., E. Duflo, C. Imbert, S. Mathew, and R. Pande. 2020. “E-Governance, Account-
  ability, and Leakage in Public Programs: Experimental Evidence from a Financial Man-
  agement Reform in India.” American Economic Journal: Applied Economics 12 (4): 39–72.
  https://www.aeaweb.org/articles?id=10.1257/app.20180302.
Biemer, P. P. 2010. “Total Survey Error: Design, Implementation, and Evaluation.” Public Opinion Quarterly
   74 (5): 817–48.
Bosio, E., and S. Djankov. 2020. “How Large Is Public Procurement.” Let’s Talk Development, February
   5. https://blogs.worldbank.org/developmenttalk/how-large-public-procurement.
Bradford, L., M. Aboy, and K. Liddell. 2020. “COVID-19 Contact Tracing Apps: A Stress Test for Privacy,
   the GDPR, and Data Protection Regimes.” Journal of Law and the Biosciences 7 (1): .
Carletto, C., A. Dillon, and A. Zezza. 2021. Agricultural Data Collection to Minimize Measurement Error and
   Maximize Coverage. Policy Research Working Paper No. 9745, World Bank, Washington, DC.
Center for Open Data Enterprise. 2021. “Open Data Impact Map.” Accessed August 10, 2021.
   https://www.opendataimpactmap.org/usecases.
Croatian Bureau of Statistics. 2016. “Croatia Small-Area Estimation of Consumption-Based Poverty.”
   https://razvoj.gov.hr/UserDocsImages//Istaknute%20teme/Kartom%20siroma%C5%A1tva//
   Croatia%20Small-Area%20Estimation%20of %20Consumption-Based%20Poverty%20(Poverty%
   20Maps).pdf .
———. 2017. “Index of Multiple Deprivation: Conceptual Framework for Identifying Lagging
 Municipalities and Towns in Croatia.” https://razvoj.gov.hr/UserDocsImages//Istaknute%20
 teme/Kartom%20siroma%C5%A1tva//Index%20of %20Multiple%20Deprivation%20-
 %20Conceptual%20framework_18.06.2019.pdf .
data2x. 2019. “Big Data, Big Impact? Toward Gender-Sensitive Data Systems.” https://data2x.org/wp-
   content/uploads/2019/11/BigDataBigImpact-Report-WR.pdf .
Dillon, A., D. Karlan, C. Udry, and J. Zinman. 2020. “Good Identification, Meet Good Data.” World Devel-
    opment 127: 104796. https://doi.org/10.1016/j.worlddev.2019.104796.
Dubois, A., and L.-E. Gadde. 2002. “Systematic Combining: An Abductive Approach to Case Research.”
  Journal of Business Research 55 (7): 553–60.
Eisenhardt, K. M. 1989. “Building Theories from Case Study Research.” Academy of Management Review
   14 (4): 532–50.
Eisenhardt, K. M., and M. E. Graebner. 2007. “Theory Building from Cases: Opportunities and Chal-
   lenges.” Academy of Management Journal 50 (1): 25–32.
Ekhator-Mobayode, U. E., and J. Hoogeveen. 2021. “Microdata Collection and Openness in the Middle
   East and North Africa: Introducing the MENA Microdata Access Indicator.” Policy Research Working
   Paper No. 9892. World Bank, Washington, DC.

Global Delivery Initiative. 2019. “Improving Public Service Delivery in Pakistan through Citizen Feed-
   back.” https://www.globaldeliveryinitiative.org/sites/default/files/case-studies/cs_pakistancitizen_
   v3a.pdf
Global Partnership for Social Accountability. 2020. “Making Public Contracts Work for People: Expe-
   riences from Uganda.” Accessed June 30, 2021. https://www.thegpsa.org/stories/making-public-
   contracts-work-people-experiences-uganda.
Global Partnership for Sustainable Development Data. 2021. “Value of Data Case Studies.” Accessed
   August 10, 2021. https://www.data4sdgs.org/resources/value-data-case-studies.
Gore, S. M., T. Holt, and I. P. Fellegi. 1991. “Maintaining Public Confidence in Official Statistics.” Journal
  of the Royal Statistical Society. Series A (Statistics in Society): 1–6.
Government of the Republic of Croatia. 2019. “Gov’t Launches Changes to Country’s Statistical Subdivision.”
  News release, January 23.
  https://vlada.gov.hr/news/gov-t-launches-changes-to-country-s-statistical-subdivision/25178.
Groshen, E. L. 2021. “The Future of Official Statistics.” Harvard Data Science Review 3 (4).
   https://doi.org/10.1162/99608f92.591917c6.
Hallegatte, S., A. Vogt-Schilb, M. Bangalore, and J. Rozenberg. 2017. Unbreakable: Building the Resilience
  of the Poor in the Face of Natural Disasters. World Bank, Washington, DC.
Hasanzadeh, K., A. Kajosaari, D. Häggman, and M. Kyttä. 2020. “A Context Sensitive Approach to
  Anonymizing Public Participation GIS Data: From Development to the Assessment of Anonymization
  Effects on Data Quality.” Computers, Environment and Urban Systems 83: 101513.
Heffetz, O., and K. Ligett. 2014. “Privacy and Data-Based Research.” Journal of Economic Perspectives 28
   (2): 75–98.
Heijlen, R., and J. Crompvoets. 2021. “Open Health Data: Mapping the Ecosystem.” Digital Health 7:
   20552076211050167.
Hjort, J., D. Moreira, G. Rao, and J. F. Santini. 2021. “How Research Affects Policy: Experimental Evidence
   from 2,150 Brazilian Municipalities.” American Economic Review 111 (5): 1442–80.
Jamilla, S., and S. Ruiz. 2020. “Satellite Data Helps Sri Lankan Forest Officers Patrol during Pandemic, at
   a Safe Distance.” Global Forest Watch, July 23.
   https://blog.globalforestwatch.org/people/sri-lanka-covid-19-forest-monitoring/.
Jerven, M. 2019. “The Problems of Economic Data in Africa.” In Oxford Research Encyclopedia of Politics.
   Oxford University Press. https://doi.org/10.1093/acrefore/9780190228637.013.748.
Jerven, M., and M. E. Duncan. 2012. “Revising GDP Estimates in sub-Saharan Africa: Lessons from
   Ghana.” African Statistical Journal 15: 13–24.
Koumane, C. Y., B. B. N. Kalimi, and F. Pirlea. 2019. “Many African Economies Are Larger Than
  Previously Estimated.” World Development Indicators Stories, September 10.
  https://datatopics.worldbank.org/world-development-indicators/stories/many-economies-in-ssa-larger-than-previously-thought.html.
Kremer, M. 1993. “The O-Ring Theory of Economic Development.” Quarterly Journal of Economics 108
   (3): 551–75.
Machen, R. C., M. T. Jones, G. P. Varghese, and E. L. Stark. 2021. “Investigation of Data Irregularities
  in Doing Business 2018 and Doing Business 2020: Investigation Findings and Report to the Board of
  Executive Directors.” WilmerHale.
Masud, M. O. 2015. “Calling Citizens, Improving the State: Pakistan’s Citizen Feedback Monitoring
  Program, 2008–2014.” Innovations for Successful Societies, Princeton University, Princeton, NJ.
  https://successfulsocieties.princeton.edu/publications/calling-citizens-improving-state-pakistan%E2%80%99s-citizen-feedback-monitoring-program-2008-E2%80%93
Multidimensional Poverty Peer Network. 2017. “Dimensions.” August, Number 4.
  https://www.mppn.org/wp-content/uploads/2017/08/Dim_4_ENGLISH_online.pdf

Nigeria Federal Ministry of Water Resources. 2018. “National Action Plan for Revitalization of the
   WASH Sector.” June 26. Accessed June 26, 2020. https://waterresources.gov.ng/policy-documents/.
OECD. 2011. “Measuring Trust in Official Statistics.” https://www.oecd.org/sdd/50027008.pdf
Open Data Watch. 2020. “Data Impact Case Studies.” Accessed November 1, 2020.
  https://dataimpacts.org/case-studies/.
Privacy International. 2020. “Covid Contact Tracing Apps Are a Complicated Mess: What You
   Need To Know.”
   https://privacyinternational.org/long-read/3792/covid-contact-tracing-apps-are-complicated-mess-what-you-need-know
Roseth, B., A. Reyes, and K. Y. Amézaga. 2019. “The Value of Official Statistics: Lessons from
   Intergovernmental Transfers.” Inter-American Development Bank. https://doi.org/10.18235/0001883.
Statistics Canada. 2017. Data quality toolkit, release date September 27.
   https://www.statcan.gc.ca/eng/data-quality-toolkit
TECH2. 2018. “Aadhaar Security Breaches: Here Are the Major Untoward Incidents That Have
  Happened with Aadhaar and What Was Actually Affected.”
  https://www.firstpost.com/tech/news-analysis/aadhaar-security-breaches-here-are-the-major-untoward-incidents-that-have-happened-with-aadhaar-and-what-was-actually-affected-4300349.html
United Nations. 2019. “United Nations National Quality Assurance Frameworks Manual for
  Official Statistics.” https://unstats.un.org/unsd/methodology/dataquality/un-nqaf-manual/#UN-
  NQAF-Manual https://dataimpacts.org/project/health-surveys/.
Vaitla, B., C. Wells, and C. Van Horn. 2015. “Market Data Raise Farmers’ Income.” Data Impacts Case
   Studies. https://dataimpacts.org/project/market-data-raise-farmer-income/.
Verhulst, S., and A. Young. 2016. “Open Data Impact: When Demand and Supply Meet.” Accessed
   August 10, 2021. https://thegovlab.org/static/files/publications/open-data-impact-key-findings.pdf.
Wang, Y. C., and K. DeSalvo. 2018. “Timely, Granular, and Actionable: Informatics in the Public Health
  3.0 Era.” American Journal of Public Health 108 (7): 930–4.
Wilkinson, M. D., M. Dumontier, I. J. J. Aalbersberg, G. Appleton, M. Axton, A. Baak, N. Blomberg,
   et al. 2016. “The FAIR Guiding Principles for Scientific Data Management and Stewardship.” Scientific
   Data 3 (1): 1–9.
World Bank. 2015. Open Data for Sustainable Development. Transport and ICT, World Bank, Washington,
  DC.
———. 2017a. A Wake Up Call: Nigeria Water Supply, Sanitation, and Hygiene Poverty Diagnostic. World
 Bank, Washington, DC.
———. 2018a. The Role of Digital Identification for Healthcare: The Emerging Use Cases. World Bank,
 Washington, DC.
———. 2018b. Public Sector Savings and Revenue from Identification Systems: Opportunities and Constraints.
 World Bank, Washington, DC.
———. 2018c. Private Sector Economic Impacts from Identification Systems. World Bank, Washington, DC.
———. 2020a. Benin, Burkina Faso, Togo and Niger - Second Phase of West Africa Unique Identification for
 Regional Integration and Inclusion (WURI) Project. World Bank, Washington, DC.
———. 2021a. “Statistical Performance Indicators.” World Bank, Washington, DC. Accessed
 September 16, 2021. https://www.worldbank.org/en/programs/statistical-performance-indicators.
———. 2021b. “World Development Report 2021: Data for Better Lives.” World Bank, Washington, DC.
 https://wdr2021.worldbank.org/
WPdx. 2022. “Data Use Impact Desktop Study.” https://www.waterpointdata.org/wp-content/
  uploads/2022/02/Data-Use-Impact-Desktop-Methodology-and-Results_revised.pdf
Zastrow, M. 2020. “Coronavirus Contact-tracing Apps: Can They Slow the Spread of COVID-19?” Nature.
   https://doi.org/10.1038/d41586-020-01514-2

346                                                    The World Bank Research Observer, vol. 38, no. 2 (2023)