AUTHOR ACCEPTED MANUSCRIPT
                                 FINAL PUBLICATION INFORMATION

                     Assessing Bias in Smartphone Mobility Estimates in Low Income Countries

                           The definitive version of the text was subsequently published in

      Proceedings of the 4th ACM SIGCAS Conference on Computing and Sustainable Societies, 2021-06-28

    Published by Association for Computing Machinery and found at http://dx.doi.org/10.1145/3460112.3471968




                    THE FINAL PUBLISHED VERSION OF THIS MANUSCRIPT
                      IS AVAILABLE ON THE PUBLISHER’S PLATFORM



This Author Accepted Manuscript is copyrighted by World Bank and published by Association for Computing
Machinery. It is posted here by agreement between them. Changes resulting from the publishing process—such as
editing, corrections, structural formatting, and other quality control mechanisms—may not be reflected in this version
of the text.

You may download, copy, and distribute this Author Accepted Manuscript for noncommercial purposes. Your license
is limited by the following restrictions:

    (1) You may use this Author Accepted Manuscript for noncommercial purposes only under a CC BY-NC-ND
        3.0 IGO license http://creativecommons.org/licenses/by-nc-nd/3.0/igo.

    (2) The integrity of the work and identification of the author, copyright owner, and publisher must be preserved
        in any copy.

    (3) You must attribute this Author Accepted Manuscript in the following format: This is an Author Accepted
        Manuscript by Milusheva, Sveta; Björkegren, Daniel; Viotti, Leonardo Assessing Bias in Smartphone
        Mobility Estimates in Low Income Countries © World Bank, published in the Proceedings of the 4th ACM
        SIGCAS Conference on Computing and Sustainable Societies 2021-06-28 CC BY-NC-ND 3.0 IGO http://
        creativecommons.org/licenses/by-nc-nd/3.0/igo http://dx.doi.org/10.1145/3460112.3471968




                                                 © 2022 World Bank
 Assessing Bias in Smartphone Mobility Estimates in Low Income Countries
                           Sveta Milusheva, Daniel Björkegren, Leonardo Viotti
                                                         2021


    It has become common for governments and practitioners to measure mobility using data from smartphones,
especially during the COVID-19 pandemic. Yet in countries where few people have smartphones, or use mobile internet,
the movement of smartphones may not be a good indicator of the movement of the population. This paper develops a
framework for approaching potential bias that can arise when measuring mobility with smartphones. Using mobile
phone operator records in Uganda, we compare the mobility of smartphones and the basic and feature phones that are
more common. Smartphones have different travel patterns, and decrease mobility substantially more in response to a
COVID-19 lockdown. This suggests caution when interpreting smartphone mobility estimates in contexts with low
adoption.


1     Introduction
     Understanding the mobility of populations is crucial for transportation (35; 23; 15; 19), the spread of disease
(2; 7; 25; 28; 31; 37; 38; 45; 39; 46), natural disasters (14; 36; 22; 8), and–during a pandemic–measuring social contact
(1; 18; 20). A wide array of recent work uses data collected from the motion of smartphones to infer how people
in a society move (9; 11; 12; 27; 30; 29). Under the COVID-19 pandemic this type of analysis has crossed into the
mainstream, with a proliferation of analyses using data from providers such as Google Mobility Reports, Facebook, Unacast,
Cuebiq, SafeGraph, and Baidu. But this raises the question: do smartphones move in the same way as the population? This
is a particular concern in societies where few people own smartphones. Smartphone owners are likely to be wealthier,
may live in different areas, and may move differently. If so, smartphone mobility estimates may be misleading about
how populations move (10). This paper assesses this question by comparing how smartphones move with how other types
of mobile phones move, both in a baseline month and in response to the shock of the arrival of COVID-19.
     Smartphone mobility data has several advantages for measuring mobility: many smartphones have GPS, which
provides precise locations, and they can collect data passively at high frequency. Additionally, mobility can be measured
by independent apps. However, in many countries few people own smartphones, and even among adopters, usage is low
due to the high cost of data and sparse wifi coverage. Another possibility is to measure mobility from operator records,
which note the cell towers used to transmit transactions. These records are harder for researchers and policymakers to
access, but they include the mobility of both smartphones and basic/feature phones. One drawback is that location
measurement is active rather than passive: operators typically record an individual's location only when they make a transaction.
     In this paper we develop a framework for thinking about two biases that can arise when inferring mobility from
digital data: selection into ownership of the technology and selective use of the technology. We use data from a
major mobile network operator in Uganda, a lower-income country with phone ownership patterns similar to those of other
countries in sub-Saharan Africa. We identify likely smartphone users in the dataset and compare the behavior of this
sample of users with that of non-data users.
     We find that data users (smartphones) have different mobility patterns from users who do not regularly use data
packets. Data users make more long-distance trips, with 13% to 22% more daily trips to non-neighboring counties
at baseline on average. They also decrease their mobility more than non-data users after the COVID-19 lockdown
policies, particularly in the counties most affected by COVID-19 policies. This means that inferring mobility from
smartphones could lead policymakers to erroneously believe that population mobility has dropped more than it
actually has. This is in line with research from developed country settings that has found larger decreases in mobility
among higher income populations (42; 17; 21).

1.1    Related literature
    This paper joins a large literature that uses internal data from mobile phone operators (Call Detail Records, or
CDR) to measure the mobility of mobile phones in developing country contexts (44; 16; 25; 45; 46; 4; 3; 6; 19). It is
challenging to access these data, however, and these examples are difficult to replicate across countries (26). As more
people have adopted smartphones, it has become common to measure mobility using smartphone apps in developed
countries, where many people own smartphones. There is an emerging literature using these measures in developing
country contexts (27; 29; 20; 34). We study a low income context where few people have smartphones.
    This paper builds on work that studies biases that arise from measuring mobility using data from mobile phones
(13). These works have primarily assessed two types of bias:
    Mobility estimates may not be representative when they are measured on a subset of a population that has adopted a
digital technology. (43) finds that mobility differs little by demographics among mobile phone owners in Kenya during
an early period of adoption, 2008-2009.
    Many digital technologies collect digital trace data only when particular software and features are enabled (e.g.,
GPS and apps that collect user data), or actions are taken (e.g., calls are placed, for operator data, or app check-ins or
posts). (47; 32) find that mobility as measured by where actions are taken can differ from more passively collected GPS
measures.



    (10) assesses the net of both biases in the US, finding that smartphone mobility measures undercount older
demographics when compared to voting records in a national election.
    This paper evaluates whether smartphones have different mobility patterns from basic/feature phones in a developing
country. In settings like ours, smartphone penetration is still low and the selection bias may be larger because the
devices are costly relative to average income. Additionally, people often prepay for phone services, and given the high
cost of data, this may further limit the population that is captured in smartphone data. This is a setting with limited
data to validate indicators; we provide evidence by comparing those with access to a smartphone with those without
access within the same dataset, and we study how the generated mobility indicators differ. In this way we limit any
differences that might arise from the data sources and can focus on differences in behavior that are measured by the
same source for different types of users.


2    Background
    Mobile phone subscriptions have grown dramatically in the last two decades, even in low-income countries. In 2005,
there were 23 mobile phone subscriptions per 100 people in developing countries; by 2019 there were 103 subscriptions
per 100 people (40). However, in lower income countries few of these phones are smartphones connected to the internet.
In Africa, there are only 33 mobile broadband subscriptions per 100 people (Figure 1). Given the lack of data in many
African countries, though, there is substantial interest in statistics generated using smartphones. We consider what
biases may arise when studying population mobility based on smartphones.




                                Figure 1: Mobile Phone Subscriptions by Region, 2020

Notes: Regions are based on the regional grouping of the ITU’s Telecommunication Development Bureau. Values for
2020 are June 2020 estimates, and the data were updated in November 2020. Source: ITU World Telecommunication/ICT
Indicators database.


    We focus on Uganda, which has similar rates of adoption to other countries in sub-Saharan Africa. Per 100
people, Uganda has 57 mobile phones, but only 34 mobile broadband connections (10th percentile and 16th percentile
respectively, out of 184 countries reported by ITU in 2018 or 2019). The proportion of mobile phones that have
broadband connections (around 0.59) is close to the median across sub-Saharan African countries (0.67).
    Households with smartphones are wealthier and more educated than those without. Table 1 shows demographic
characteristics by phone type, as collected by Research ICT Africa (RIA)’s 2018 After Access ICT Access and Use
Survey. People without a mobile phone are less likely to have electricity, have fewer years of education, and have lower
log household income and fewer assets on average. Those with a basic phone score higher on all of these characteristics,
but still lower than those with a feature phone and much lower than those with a smartphone. These patterns are
consistent across the other eight countries in sub-Saharan Africa where RIA conducted this survey in 2018 (see Appendix
for table of statistics). This suggests that smartphones will tend to track the behavior of high income people.



    People with internet access self-report about twice as much travel as those without access in the 2016 Demographic
and Health Survey (DHS) (41). Male internet users reported taking an average of 11 trips in the last 12 months, but
non-internet users only 6 (for females this difference was 4 versus 2).1 Phone owners move more as well: male phone
owners took 9 trips, compared to only 4 reported by non-owners (for females, this difference was 3 compared to 2).
    We focus on early 2020, with data that bracket the first case of COVID-19 in Uganda (March 21, 2020). Several
strict measures were put in place to reduce transmission of COVID-19, and these had important effects on mobility.
They included the suspension of public gatherings on March 18th, the closing of schools, and a ban on travel to
countries labeled as high risk due to their case numbers. This was followed by a suspension of public transport and a
ban on international travel starting on March 25th, and then a lockdown and nationwide curfew from 7pm to 6:30am on
March 30th. The lockdown was extended past May 5th, with some restrictions easing after this date. Although the
curfew was extended on May 18th, shops, public transport and schools started to reopen in a limited way at that time
(24).
    The COVID-19 pandemic provides an important example of how these types of mobility indicators generated from
mobile phones can be relevant and timely for policymakers in a crisis. Additionally, the context provides an opportunity
to study how different measures of mobile phone data portray mobility, in baseline conditions and in response to new
policies and shocks.

                                        Table 1: Demographics by Phone Owned in Uganda

                                           Type of Phone Owned
                                 (1)         (2)         (3)         (4)
                                Basic      Feature      Smart       None

    Percent of Individuals      34.7%        5.9%        8.0%       51.4%
    Percent of Phone Owners     71.3%       12.2%       16.5%         -
    Years of Education           7.6        10.1        13.1         5.3
    Has Electricity              0.6         0.8         0.9         0.5
    Number of Major Assets       0.9         1.1         2.1         0.5
    Log HH Income               11.8        12.1        12.8        10.9
    HH Size                      4.9         5.6         4.7         5.2
    Observations                 685         131         247         801

Notes: Data come from the After Access Africa 2018 survey conducted by Research ICT Africa (RIA). Nationally
representative individual weights were applied to produce mean values for characteristics. Number of assets was
calculated by summing how many of the following assets were owned by the household: landline, refrigerator, radio,
TV, car, motorcycle.



3      Methods
3.1         Theory: A sampling problem
   Each individual $i$ within the population of interest $N$ has true mobility given by their full sequence of locations
over time, that is,

$$L_i = (l_{it})_{t \in T},$$

where $l_{it}$ is the location visited at each moment in time $t \in T$.
    Any digital device of type $d$ captures only a sample of this mobility. Measured mobility may differ from the
population's true mobility in two ways:

    1. Device $d$ records only the individuals $N_d \subseteq N$ who have adopted that device. If adopters have different
       mobility patterns, they may not be representative of the population.
    2. Device $d$ captures locations only at particular times $t_d \subseteq T$. For example, smartphones may record a
       location every few minutes when the GPS is on, while CDR record the tower used when a call is placed. If the
       sampling times $t_d$ are correlated with location, mobility measures may be biased.2 The sampling frequency
       can also affect measures of mobility. If an identical movement pattern is tracked with different devices, with
       $d'$ sampled less frequently than $d$ ($t_{d'} \subset t_d$), $d'$ may appear to have less mobility because more
       location observations are missing.
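The second source of bias can be made concrete with a toy simulation. The sketch below is hypothetical (the hourly trajectory and the sampling schedules are invented for illustration, not taken from the data): it observes one true trajectory $L_i$ at nested sets of times and counts a trip whenever two consecutive observations fall in different places. Sparser subsets of times can only miss movement, and sampling that is correlated with location (here, observing the user only while at home, as in footnote 2) can hide it entirely.

```python
def count_trips(locations):
    """Count a trip for each pair of consecutive observations in different places."""
    return sum(1 for prev, cur in zip(locations, locations[1:]) if prev != cur)


def observe(trajectory, times):
    """Return the locations l_it that a device records, only at its sampled times t_d."""
    return [trajectory[t] for t in sorted(times)]


# Hypothetical true trajectory L_i: one location per hour over a single day, so T = 0..23
trajectory = ["home"] * 8 + ["work"] * 9 + ["market"] * 2 + ["home"] * 5
T = range(len(trajectory))

schedules = {
    "every hour (all of T)": set(T),
    "every 3 hours": set(range(0, 24, 3)),
    "every 6 hours": set(range(0, 24, 6)),    # a subset of the 3-hour schedule
    "every 12 hours": set(range(0, 24, 12)),  # a subset of the 6-hour schedule
    # Sampling correlated with location: observed only while at home
    "only at home": {t for t in T if trajectory[t] == "home"},
}

for name, times in schedules.items():
    print(f"{name}: {count_trips(observe(trajectory, times))} trips")
```

Running it prints 3, 3, 2, 1 and 0 trips for the five schedules: the same underlying movement looks progressively less mobile as $t_d$ shrinks, and looks completely stationary when observations occur only at home.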
     In our setting, adoption of smartphones is far lower than adoption of any mobile phone, so that
$N_{\text{smartphone}} \subset N_{\text{any mobile phone}} \subset N$. Our aim is to assess the first potential bias for smartphone owners,
relative to owners of any mobile phone, while holding fixed the second type of bias. To do this, we attempt to comparably
measure the mobility of smartphone and non-smartphone users within data that includes both.
     This approach will not uncover the bias resulting from omitting people who do not have mobile phones at all. This
is in contrast to (10), who instead take a particular time t corresponding to the U.S. election and compare smartphone
mobility estimates to ground truth poll statistics. Such ground truth data is rare in low income countries.
    1 Individual-level, nationally representative weights are used.
    2 For example, if smartphone users keep location sensing on when traveling but turn it off in their neighborhood to conserve battery, they will
appear to be away from home more than they actually are. Or, if a user places calls only while at home, they will appear in CDR data to remain
stationary at home, regardless of their actual travel.




Figure 2: Geographic Representation. Panels: (a) Proportion of adults with mobile phones (DHS 2016); (b) Proportion of
phones that use data (CDR); (c) Population density (Census 2014).

Notes: Panel a uses regionally representative weights. In panel b, proportions were calculated by dividing number of
data users by the total number of users, for a given home location in February based on data from the main mobile
phone provider that are used in this paper.



3.2         Data
    We work with the largest mobile network operator (MNO) in Uganda, which supported COVID-19 efforts by
allowing access to anonymized, aggregated data to understand mobility and the epidemiology of the disease. As a side
effect of operation, MNOs store Call Detail Records (CDR) for billing purposes, which contain a record for each call
and internet data use for each account, including a timestamp and the location of the closest cell phone tower.3 When a
user makes calls or uses internet data in different locations, these records reveal that the person has connected with
different towers, and thus that the user has moved. We use voice call and data observations for February and April 2020.
    This mobility data from operators differs in several respects from commonly used smartphone mobility data (13; 32).
First, crucially, it captures the mobility of both smartphones and basic/feature phones. This allows us to compute the
mobility of people with smartphones, who would appear in these common datasets, and the mobility of people with
basic/feature phones who are omitted from those datasets. Second, it tends to be less precise, since towers can be spaced
far apart, especially in rural areas. Third, it collects location data only under active use (when a call or data packet is
sent), while smartphone mobility data may be collected more passively. (In this context it is fairly common for users to
turn off GPS to conserve battery, so smartphone mobility data may be more actively selected than in other contexts.)
A fourth difference is that smartphone mobility data is typically reported only for people who have particular apps
installed, regardless of their operator (though these apps tend to be common). Our data does not restrict based on apps,
but it is only for a single operator which has a large share of the market and has a similar fraction of smartphones as the
national average.

3.3         Methods
   We identify smartphones based on use of mobile data. We define a user as a data user (likely smartphone) if they
have at least one internet data transaction per day on average, in at least one of the three months: February, March or
April; and a non-data user otherwise.4 In February, there were a total of 11,818,038 unique subscribers with at least
one observation. Of these, 4,299,886 were defined as data users (36% of mobile phones).5 We define a trip as two
consecutive calls from the same user routed through mobile phone towers located in different counties.
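A minimal sketch of the two definitions above follows. The input structures are assumptions for illustration (per-user monthly counts of internet data transactions, and per-user lists of timestamped voice calls already mapped to counties); the operator's actual schema is not described here. February 2020 had 29 days, since 2020 is a leap year.

```python
# Days per month in 2020 (a leap year), for the three months used in the definition
DAYS_IN_MONTH = {"2020-02": 29, "2020-03": 31, "2020-04": 30}


def classify_data_users(data_txn_counts, days_in_month=DAYS_IN_MONTH):
    """Return users averaging at least one data transaction per day in at least one month.

    data_txn_counts maps (user_id, month) -> number of internet data transactions.
    Every other subscriber is treated as a non-data user.
    """
    return {
        user
        for (user, month), n in data_txn_counts.items()
        if n / days_in_month[month] >= 1
    }


def count_county_trips(calls):
    """Count trips as consecutive calls routed through towers in different counties.

    calls is a list of (timestamp, county) pairs for one user, in any order.
    """
    counties = [county for _, county in sorted(calls)]
    return sum(1 for a, b in zip(counties, counties[1:]) if a != b)


# Hypothetical users and calls
counts = {("A", "2020-02"): 40, ("B", "2020-02"): 20, ("B", "2020-04"): 25, ("C", "2020-03"): 35}
print(sorted(classify_data_users(counts)))  # ['A', 'C']
calls = [(1, "c1"), (3, "c2"), (2, "c1"), (5, "c2"), (7, "c1")]
print(count_county_trips(calls))  # 2
```

User B never reaches one data transaction per day in any single month (20 of 29 days, 25 of 30 days), so B counts as a non-data user even though B uses some data, matching the paper's point that occasional data use need not indicate a smartphone.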


4      Analysis
4.1         Geographic Representation
    Mobile phone ownership is moderate and uneven across Uganda, as shown in Figure 2 Panel a. In the capital,
Kampala, 86% of adults own mobile phones, and ownership remains high in nearby regions, as reported by the DHS (41).
However, ownership is lower in the periphery, with only 25% of adults owning a phone in Karamoja in the northeast.
    Few Ugandans own smartphones, and those who do are predominantly in urban areas. We present the proportion of
mobile phones that are smartphones in our CDR data in panel b of Figure 2.6 In some counties only 16% of phone
subscribers use data, while in others the share is as high as 62%. Comparing the map of the smartphone proportion in
our data (panel b) with a map of population density in Uganda (panel c) shows that the areas with the highest
percentage of data users are the denser urban areas and the greater Kampala area. These spatial differences suggest
that smartphone data is likely to underweight rural areas (see Figure 7 in the Appendix for a comparison of population
density versus proportion of data users by county).
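The home-location rule in footnote 6 and the per-county share of data users mapped in panel b can be sketched as follows. The input format (user, day, timestamp, county) is an assumption, and ties in the mode are broken here by whichever county was counted first, a choice the paper does not specify.

```python
from collections import Counter, defaultdict


def home_locations(observations):
    """Home = the most common county among each day's last observation (footnote 6).

    observations: iterable of (user_id, day, timestamp, county) tuples.
    """
    last_of_day = {}
    for user, day, ts, county in observations:
        key = (user, day)
        if key not in last_of_day or ts > last_of_day[key][0]:
            last_of_day[key] = (ts, county)
    per_user = defaultdict(Counter)
    for (user, _day), (_ts, county) in last_of_day.items():
        per_user[user][county] += 1
    # most_common(1) keeps the first-counted county when counts tie
    return {user: counts.most_common(1)[0][0] for user, counts in per_user.items()}


def data_user_share_by_county(homes, data_users):
    """Data users divided by all users, grouped by home county."""
    totals, data = Counter(), Counter()
    for user, county in homes.items():
        totals[county] += 1
        if user in data_users:
            data[county] += 1
    return {county: data[county] / totals[county] for county in totals}


# Hypothetical observations for three users
obs = [
    ("u1", 1, 10, "c1"), ("u1", 1, 22, "c2"),
    ("u1", 2, 23, "c2"), ("u1", 3, 21, "c1"),
    ("u2", 1, 20, "c1"),
    ("u3", 1, 5, "c2"),
]
homes = home_locations(obs)
print(homes)  # {'u1': 'c2', 'u2': 'c1', 'u3': 'c2'}
print(data_user_share_by_county(homes, data_users={"u1"}))  # {'c2': 0.5, 'c1': 0.0}
```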
    3 The data are de-identified, with account numbers replaced by a random ID that can be followed over time.
    4 We allow a user to qualify in any of the months, as COVID may have affected data usage.
    5 Note that 6,546,047 users have at least one data observation during the sample period, but having just one observation is unlikely to be
indicative of owning a smartphone. There is a large group of users with only one or two data observations (they may have a feature phone that
allows some limited data use but is unlikely to run mobile apps that collect smartphone location data).
    6 These proportions were calculated by dividing the number of data users by the total number of users, for a given home location in February.
Home location is the mode of the location of the last observation of each day that month (33; 25).




4.2     Mobility
    We compare the mobility of data users (which would be observed in smartphone mobility data) with non-data users
(which would not), to assess whether smartphone mobility is representative of the mobility of all phone owners. We
first consider baseline mobility in February, to analyze differences during typical conditions, and then the change during
a crisis when multiple policies affected mobility.




      Figure 3: Cumulative Distribution Function of Average Daily Transactions Before and After Downsampling

Notes: The number of users at each daily frequency of voice calls was averaged across the days of the month. For the
downsampled figures, the number of voice calls per user was randomly reduced to align the number of daily calls for
data users and non-data users and to produce the same daily average of around 5.4 calls per user.



4.2.1    Sampling procedure
    In order to compare mobility across user types, it is necessary to address the differential use bias: locations are
captured only at specific times, which in our data means only when transactions are placed. Since we aim to compare
data users and non-data users, we try to equalize this bias across the two groups (without claiming to remove it
entirely) by ensuring that the timing and frequency of location observations are comparable. In our February data, each
data user has their location observed 29 times per day on average, but each non-data user only 6 times. This imbalance
would mechanically produce differences in measured mobility between the two groups: we would see more movement
among data users simply because they have more observations.
    We explore one approach to correcting for differences in active usage between smartphone and other phone users.
We subsample location observations, computing mobility using only observations that are deemed more comparable:
$\tilde{t}_{\text{data user}} \subseteq t_{\text{data user}}$ and $\tilde{t}_{\text{non-data user}} \subseteq t_{\text{non-data user}}$. The sampling of locations could differ between data users and
non-data users in many subtle ways, and the success of this approach will depend on the interaction of usage and
mobility, which requires further work to explore. We demonstrate a proof of concept here, which has two steps:
    First, we restrict consideration to locations observed during voice calls, which are less likely to be used differently
on the two types of handsets. We omit data transactions, which are used primarily by smartphones, and SMS, because
where a person would otherwise send a text, smartphone users may instead use WhatsApp or other chat apps.7 This step
mitigates a large part of the difference in phone usage: in February data users have 8 calls per day on average, while
non-data users have 6.
    Second, we downsample to account for different temporal resolution between groups, since even after restricting
observations to voice calls, data users have more transactions (see Figure 3 for the cumulative distribution function).
The downsampling is conducted so as to match the daily distribution of voice call frequency for data users and non-data
users. With $N_d$ data users and $N_n$ non-data users, the larger non-data user population is sorted by daily number of
transactions and split into 2000 bins, and the number of transactions is averaged within each bin. The data user sample
is similarly grouped into 2000 bins based on daily call frequency. For each data user, we randomly draw as many calls
from their set of calls as the average number of calls in the corresponding bin of non-data users.
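The binning step can be sketched as below. This is an illustrative reimplementation under stated assumptions, not the authors' code: users are ranked by their number of calls on a day and assigned to equal-sized quantile bins; because a bin average is generally not a whole number, the sketch rounds it to the nearest integer (an assumption) and never draws more calls than a user has. Kept calls stay in chronological order so trips can still be computed from consecutive observations. The paper uses 2000 bins; the toy example uses 4 so the result is easy to check.

```python
import random
from statistics import mean


def bin_index(rank, n_items, n_bins):
    """Map a 0-based rank among n_items to one of n_bins equal-sized quantile bins."""
    return rank * n_bins // n_items


def target_bin_means(reference_counts, n_bins=2000):
    """Average daily call count within each quantile bin of the reference distribution."""
    ordered = sorted(reference_counts)
    if len(ordered) < n_bins:
        raise ValueError("need at least n_bins reference users")
    bins = [[] for _ in range(n_bins)]
    for rank, count in enumerate(ordered):
        bins[bin_index(rank, len(ordered), n_bins)].append(count)
    return [mean(b) for b in bins]


def downsample_calls(calls_by_user, bin_means, rng):
    """Keep, for each user, a random subset of calls sized by the matching reference bin.

    calls_by_user maps user_id -> that user's calls on one day, in time order.
    """
    n_bins = len(bin_means)
    ranked = sorted(calls_by_user, key=lambda u: len(calls_by_user[u]))
    kept = {}
    for rank, user in enumerate(ranked):
        calls = calls_by_user[user]
        target = round(bin_means[bin_index(rank, len(ranked), n_bins)])
        k = min(target, len(calls))
        # Sample positions rather than calls so the kept calls remain in time order
        positions = sorted(rng.sample(range(len(calls)), k))
        kept[user] = [calls[i] for i in positions]
    return kept


# Toy check with 4 bins: reference (non-data users) and data users on the same day
means = target_bin_means([1, 1, 2, 2, 3, 3, 4, 4], n_bins=4)  # [1, 2, 3, 4]
data_user_calls = {u: list(range(n)) for u, n in {"a": 2, "b": 5, "c": 6, "d": 9}.items()}
sampled = downsample_calls(data_user_calls, means, random.Random(0))
print({u: len(c) for u, c in sampled.items()})  # {'a': 1, 'b': 2, 'c': 3, 'd': 4}
```

Matching ranks rather than raw counts means the heaviest data users are paired with the heaviest non-data users, so the whole distribution of daily observations is aligned, not only its mean.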
    We base the target daily transaction distribution on the day at the 10th percentile of daily calls per subscriber
across February and April.8 We apply this non-data user distribution to downsample both data users and non-data users
for each day in February and April. This helps to mitigate differences in phone usage both between data and non-data
users and across time.
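Choosing the reference day can be sketched as follows, assuming a hypothetical mapping from each day to its average calls per subscriber. The paper does not state which percentile convention it uses; this sketch takes the nearest rank on a 0 to n-1 index scale, which is one common choice.

```python
def reference_day(calls_per_subscriber, pct=10):
    """Return the day whose average calls per subscriber sits at the pct-th percentile.

    calls_per_subscriber maps day -> average daily calls per subscriber.
    Uses a nearest-rank rule on a 0..n-1 index scale (an assumed convention).
    """
    days = sorted(calls_per_subscriber, key=calls_per_subscriber.get)
    index = round(pct / 100 * (len(days) - 1))
    return days[index]


# Hypothetical values for eleven days: the 10th percentile lands on the second-lowest day,
# so the single lowest (possibly outlying) day is not used
example = {f"day{i}": 5.0 + 0.1 * i for i in range(11)}
print(reference_day(example))  # day1
```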
    Prior to downsampling, data and non-data users on average have 7.6 and 6.2 daily observations in February and 6.8
and 5.6 daily observations respectively in April. After the downsampling, both data users and non-data users have 5.4
calls per day on average in February and April and the distributions across months and users are the same (Figure 3).
    Downsampling reduces the number of trips we infer for data users from 0.83 to 0.67 per day on average in February,
lower than the average of 0.73 trips for non-data users (down from 0.80). This difference could arise from an actual
difference in mobility, or if data users’ calls do not have the same distribution with respect to travel that non-data users
   7 After we limit to voice observations, the number of subscribers reduces as there are some users that only have data observations. 11,439,718
users remain in total, and 4,074,677 data users remain.
   8 We did not choose the day with the fewest calls per subscriber in February and April because that day would likely be an outlier, and its
distribution of calls might therefore be atypical.


do. Uniform random downsampling would not resolve the temporal sampling problem if, for example, data users are
less likely to call businesses for information while traveling. We further break down movement into close and far trips,
comparing trips to neighboring counties with trips to non-neighboring counties. For trips further away, both the
downsampled dataset and the dataset with all voice observations show that data users make more trips per person than
non-data users (12.7% more in the downsampled dataset and 22% more in the full dataset). For closer trips to
neighboring counties, the two datasets disagree: in February, data users make more trips per person than non-data users
in the full dataset, but fewer in the downsampled dataset (see Appendix Figure 11).
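Splitting trips into near and far ones requires a county adjacency list. The sketch below assumes a hypothetical, symmetric `neighbors` mapping (for example, one derived from county boundary files) and reports how many percent more trips per person one group makes than another; the toy numbers are illustrative and are not the paper's estimates.

```python
def split_trips(counties, neighbors):
    """Split one user's trips into (near, far) using county adjacency.

    counties: the user's time-ordered sequence of call counties.
    neighbors: county -> set of adjacent counties (assumed symmetric).
    """
    near = far = 0
    for origin, dest in zip(counties, counties[1:]):
        if origin == dest:
            continue  # same county: not a trip
        if dest in neighbors.get(origin, set()):
            near += 1
        else:
            far += 1
    return near, far


def percent_more(group_rate, reference_rate):
    """How many percent more trips per person the group makes than the reference group."""
    return 100 * (group_rate / reference_rate - 1)


neighbors = {"c1": {"c2"}, "c2": {"c1", "c3"}, "c3": {"c2"}}
print(split_trips(["c1", "c2", "c2", "c3", "c1"], neighbors))  # (2, 1)
print(round(percent_more(0.61, 0.50), 1))  # 22.0
```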
    We compare the probability of traveling from an origin location to a destination location across data and non-data
users in a baseline time period (February). We then compare their responses to the sudden implementation of policies
that limited mobility due to the COVID-19 pandemic.

4.2.2     Baseline mobility
    We visualize baseline movement between counties using an origin/destination (OD) matrix, for February 2020
(Figure 4). The y axis represents origins, and the x axis destinations, ordered by county ID and clustered by the four
main regions in the country. Each cell is colored by an intensity corresponding to the fraction of trips from that origin to
that destination, out of all trips made by that group in the country. The dimensions of each cell are scaled by county
population. Panel a shows the OD matrix for data users, while Panel b shows it for non-data users. Because this measure
compares the relative amounts of mobility within each group, it minimizes exposure to any remaining transaction
frequency bias.
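Each group's matrix in Figure 4 normalizes trip counts by that group's total, and the two groups can then be compared cell by cell. A minimal sketch with hypothetical (origin, destination) trip lists; it omits the population scaling of cell sizes, which affects only the plot layout.

```python
from collections import Counter


def od_shares(trips):
    """Share of a group's trips on each (origin, destination) pair; shares sum to 1."""
    counts = Counter(trips)
    total = sum(counts.values())
    return {pair: n / total for pair, n in counts.items()}


def od_difference_pp(shares_a, shares_b):
    """Cell-by-cell difference in percentage points (group a minus group b)."""
    pairs = set(shares_a) | set(shares_b)
    return {p: 100 * (shares_a.get(p, 0.0) - shares_b.get(p, 0.0)) for p in pairs}


# Hypothetical trips: data users travel c1 -> c2 three times as often as c2 -> c1
data_trips = [("c1", "c2")] * 3 + [("c2", "c1")]
nondata_trips = [("c1", "c2"), ("c2", "c1")]
diff = od_difference_pp(od_shares(data_trips), od_shares(nondata_trips))
print(diff[("c1", "c2")], diff[("c2", "c1")])  # 25.0 -25.0
```

Because each group's shares sum to one, the differences across all cells sum to zero: a group that concentrates its travel on some routes must appear relatively less mobile on others.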
    The mobility patterns of data and non-data users share many features. Movement is highly clustered within regions:
the four regional squares along the diagonal are darker. Kampala, in the Central Region, is a mobility hub and the widest
cell (given it has the highest population). Its row and column are the only ones that are almost entirely dark,
representing links with most other counties.
    However, data users’ trips are much more concentrated in the central region. Panel c shows the percentage point
difference between the two matrices.9 Trips to Kampala account for only 4% of non-data users’ trips but 10% of data
users’ trips. Data users are relatively less likely to travel within the eastern region, as shown by the higher proportion of
negative values (this region had lower smartphone adoption in Figure 2). More generally, they have relatively less travel
between all other regions since so much of their travel is concentrated in the central region.
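Aggregating cell differences to region pairs, and computing one destination's share of a group's trips (as in the Kampala comparison above), can be sketched as below. Footnote 9 says the differences are population weighted, but the exact weights are not given, so `weight` here is a hypothetical hook that defaults to 1 (a plain sum); the county-to-region mapping is likewise assumed.

```python
from collections import defaultdict


def region_pair_totals(cell_values, region_of, weight=lambda origin, dest: 1.0):
    """Sum (optionally weighted) county-pair values within each (origin region, destination region)."""
    totals = defaultdict(float)
    for (origin, dest), value in cell_values.items():
        totals[(region_of[origin], region_of[dest])] += weight(origin, dest) * value
    return dict(totals)


def destination_share(shares, destination):
    """Fraction of a group's trips that end in the given destination, over all origins."""
    return sum(s for (_, dest), s in shares.items() if dest == destination)


# Hypothetical counties, regions, and cell values
region_of = {"c1": "Central", "c2": "Central", "c3": "Eastern"}
diff_pp = {("c1", "c2"): 4.0, ("c2", "c1"): 1.0, ("c3", "c1"): -2.0}
print(region_pair_totals(diff_pp, region_of))
# {('Central', 'Central'): 5.0, ('Eastern', 'Central'): -2.0}

shares = {("c2", "c1"): 0.25, ("c3", "c1"): 0.125, ("c1", "c3"): 0.625}
print(destination_share(shares, "c1"))  # 0.375
```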




Figure 4: Origin/Destination Matrices for February 2020. Panels: (a) Data users’ mobility; (b) Non-data users’ mobility;
(c) Regional difference: data users - non-data users.

Notes: Number of trips as a proportion of total trips made by the group during the month. Cell dimensions are scaled by
county population. In panel c, the population-weighted county differences are summed at the regional level.


   9 We compute the population-weighted difference over all counties within each region.




4.2.3    Change in mobility in response to COVID-19
    There is often interest in how mobility changes in response to new policies (15; 21; 30). During the COVID
pandemic, there was substantial interest in both monitoring changes from baseline, and understanding how policies
aimed at mitigating spread of the disease reduced mobility and curbed disease transmission. We compute how mobility
changed from baseline before COVID-19 disrupted Uganda (February 2020) to a month under lockdown (April 2020).
    Data users decrease their mobility 64% more than non-data users when mobility is measured with the downsampled data, and 40% more when it is measured using all voice calls. In the downsampled measures, non-data users decrease their daily trips by 14% on average, while data users decrease theirs by 23%.¹⁰ If we instead measure mobility using all voice calls, non-data users decrease their trips by 19% and data users by 27%.¹¹ Appendix Figure 9 shows the daily trips per subscriber as a percent of the baseline.
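Footnote 10 describes how each decrease is computed. The sketch below follows that description under stated assumptions. It takes a hypothetical `{date: value}` series of daily trips per user for each month, and its function names are illustrative rather than taken from the paper. The baseline for each weekday is the mean over February days that fall on that weekday. Each April day is compared with its weekday's baseline, and the resulting percent changes are averaged. The last function expresses how much larger one group's decrease is than another's. Applied to the rounded downsampled figures above, it gives 23/14 - 1 ≈ 0.64.

```python
from datetime import date, timedelta
from statistics import mean


def weekday_baseline(baseline_days):
    """Mean value per weekday (0 = Monday ... 6 = Sunday) from a {date: value} series."""
    by_weekday = {}
    for day, value in baseline_days.items():
        by_weekday.setdefault(day.weekday(), []).append(value)
    return {wd: mean(values) for wd, values in by_weekday.items()}


def mean_percent_change(baseline_days, comparison_days):
    """Average over comparison days of the % change from the same weekday's baseline."""
    base = weekday_baseline(baseline_days)
    changes = []
    for day, value in comparison_days.items():
        ref = base[day.weekday()]
        changes.append(100.0 * (value - ref) / ref)
    return mean(changes)


def relative_gap(decrease_a, decrease_b):
    """How much larger decrease_a is than decrease_b, as a fraction of decrease_b."""
    return decrease_a / decrease_b - 1.0


# Toy series (hypothetical values): 2.0 trips per user every day in February 2020,
# 1.5 every day in April 2020, so every April day is 25% below its baseline.
feb = {date(2020, 2, 1) + timedelta(days=i): 2.0 for i in range(29)}
apr = {date(2020, 4, 1) + timedelta(days=i): 1.5 for i in range(30)}
print(mean_percent_change(feb, apr))   # -25.0
print(round(relative_gap(23, 14), 2))  # 0.64
```

Matching on weekday keeps regular weekly rhythms, such as lower weekend travel, from being read as a lockdown effect.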
    We compare the percent decrease in the number of trips per person at the county level for data users (y axis) versus non-data users (x axis) in Figure 5. If movement decreased equally for data users and non-data users, the circles in the figure would fall along the red 45-degree line. Instead, the circles deviate from the line at the top right. The same conclusion holds when we calculate and plot the values using all voice observations without downsampling (Appendix Figure 8). Given that, as shown in Table 1, smartphone owners in Uganda have higher incomes on average, this aligns with research from developed-country settings that has found larger decreases in mobility among higher-income populations (42; 17; 21). This suggests that mobility as measured by smartphones may have subtle but systematic biases in low-income populations.
    We break down changes in mobility further in the OD matrices in Figure 6. Generally, movement decreased between most county pairs for both data users and non-data users, although mobility increased for a few county pairs. When we compare the decrease in mobility for data users and non-data users, panel c shows that the largest difference is in movement to and from the Central region: the decreases in movement to and from this region, as well as between counties within it, are much larger. The ability to break mobility down to such a fine level and analyze its heterogeneity is one of the important benefits of working with mobile phone or smartphone data like this.


5 Discussion
    Digital technologies provide opportunities to better understand populations about which little data has been gathered. However, how an individual is represented in these data depends on whether they have adopted these technologies and how they use them. In particular, there is concern that the poorest segments of the population may be omitted (5).
  10 Decreases are calculated by comparing daily trips per user for each day in April to the average value for the corresponding day of the week in February. We then calculate the average value across days in April.
  11 This aligns with the fact that voice observations per subscriber per day decline from February to April; therefore, some of the decrease measured using all observations may arise from a true decrease in movement, and some may arise mechanically from a decrease in observations. The downsampling procedure aims to correct for this by equalizing the number of observations per subscriber in February and April, but it may overcompensate if user mobility behavior is correlated with phone usage behavior.
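Footnote 11 refers to a downsampling procedure that equalizes observations per subscriber across the two months. The sketch below shows one hypothetical way to do this and is not necessarily the authors' procedure. Records are assumed to be `(subscriber_id, timestamp, county)` tuples. Each subscriber's records in both months are randomly subsampled down to the smaller of the two counts, and subscribers seen in only one month are dropped, which is a further assumption. A fixed `seed` makes the random draw reproducible.

```python
import random
from collections import defaultdict


def group_by_subscriber(records):
    """Group (subscriber_id, timestamp, county) records by subscriber."""
    groups = defaultdict(list)
    for record in records:
        groups[record[0]].append(record)
    return groups


def equalize_observations(period_a, period_b, seed=0):
    """Subsample both periods so each subscriber keeps the same number of records in each.

    Hypothetical sketch: subscribers seen in only one period are dropped.
    """
    rng = random.Random(seed)
    a = group_by_subscriber(period_a)
    b = group_by_subscriber(period_b)
    kept_a, kept_b = [], []
    for sub in sorted(a.keys() & b.keys()):
        n = min(len(a[sub]), len(b[sub]))
        kept_a.extend(rng.sample(a[sub], n))
        kept_b.extend(rng.sample(b[sub], n))
    return kept_a, kept_b


# Toy records (hypothetical): s1 has 5 February and 3 April records,
# s2 has 2 and 4, and s3 appears only in April.
feb = [("s1", t, "Kampala") for t in range(5)] + [("s2", t, "Gulu") for t in range(2)]
apr = [("s1", t, "Kampala") for t in range(3)] + [("s2", t, "Gulu") for t in range(4)]
apr.append(("s3", 0, "Lira"))
kept_feb, kept_apr = equalize_observations(feb, apr)
print(len(kept_feb), len(kept_apr))  # 5 5
```

As the footnote cautions, equalizing counts this way can overcorrect if people who call less in April also move less.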




     Figure 5: Percent Change in Number of Trips per Person for Data Users and for Non-Data Users by County

Notes: The daily trips per person indicator is calculated by taking an average of the total number of subscribers entering
a county on a given day divided by the number of subscribers whose home location is that county that month.
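The indicator defined in these notes can be sketched as follows. Several details are assumptions rather than statements from the paper. Entry events are taken as already derived from the raw records, as hypothetical `(subscriber_id, day, county_entered)` tuples. Each subscriber is counted at most once per county and day. Days with no entries count as zero. Counties with no resident subscribers are skipped to avoid dividing by zero.

```python
from collections import defaultdict


def daily_trips_per_person(entries, home, days):
    """Average over `days` of (subscribers entering a county that day / county's home subscribers).

    entries: iterable of (subscriber_id, day, county_entered) events.
    home: {subscriber_id: home_county} for the month.
    """
    residents = defaultdict(int)
    for county in home.values():
        residents[county] += 1

    entering = defaultdict(set)  # (county, day) -> distinct subscribers entering
    for subscriber, day, county in entries:
        entering[(county, day)].add(subscriber)

    result = {}
    for county, n_home in residents.items():
        ratios = [len(entering.get((county, d), ())) / n_home for d in days]
        result[county] = sum(ratios) / len(ratios)
    return result


# Toy data (hypothetical): two home subscribers per county, two days.
home = {"a": "Kampala", "b": "Kampala", "c": "Gulu", "d": "Gulu"}
entries = [("c", 1, "Kampala"), ("d", 1, "Kampala"), ("c", 1, "Kampala"), ("a", 2, "Gulu")]
print(daily_trips_per_person(entries, home, days=[1, 2]))  # {'Kampala': 0.5, 'Gulu': 0.25}
```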


    In these cases, ideally one would be able to compare digital measures to an authoritative ground truth (10). However,
in many developing countries, such ground truth data is not available. This paper shows how one digital source of data
can be used to better understand what is measured by another.
    We use data from an operator that serves all types of phones in Uganda to better understand what can be captured in smartphone mobility data. We find that smartphone adoption is concentrated in urban areas. We also find that data sampled actively, when a technology is being used, can lead to biases. We demonstrate one simple possible adjustment that involves downsampling, but more work is needed to understand the origin of biases in different types of data. We are optimistic that this process can lead both to a better understanding of biases in new forms of data and to feasible corrections that allow these data to measure behavior more inclusively.
    Several takeaways hold regardless of whether the data are downsampled. Data users traveled to non-neighboring counties more on average during the baseline period and decreased their movement much more in response to the COVID-19 lockdown. Given that these users made more long-distance trips on average at baseline and reacted the most to the restrictive measures, their response may have substantially reduced the likelihood of the infection spreading within the country. At the same time, the results point to the need to recognize the limitations of data coming solely from wealthier users. When using data from digital devices, policymakers should consider the potential omission of low-income people.


Acknowledgments
    This research would not have been possible without the support of MTN Uganda, who provided the data as well
as their time and technical resources for processing the data in order to support research that could help with the
ongoing COVID-19 pandemic. We are grateful to Aidan Coville for valuable comments and Ruiwan Zhang for research
assistance. The research has been funded with UK aid from the UK government through the ieConnect for Impact
program; with support from the Trust Fund for Statistical Capacity Building III (TFSCB-III), which is funded by the
United Kingdom’s Foreign, Commonwealth & Development Office, the Department of Foreign Affairs and Trade
of Ireland, and the Governments of Canada and Korea; as well as support from the Research Support Budget in the
Development Economics Vice-Presidency.
    The findings, interpretations and conclusions expressed in this paper do not necessarily reflect the views of the
World Bank, the Executive Directors of the World Bank or the governments whom they represent. The World Bank
does not guarantee the accuracy of the data included in this work.


References
 [1] Hamada S Badr, Hongru Du, Maximilian Marshall, Ensheng Dong, Marietta M Squire, and Lauren M Gardner.
     2020. Association between mobility patterns and COVID-19 transmission in the USA: a mathematical modelling
     study. The Lancet Infectious Diseases 20, 11 (2020), 1247–1254.


[Figure 6 panels: (a) Data users’ mobility; (b) Non-data users’ mobility; (c) Difference: data users - non-data users]
Figure 6: Origin/Destination Matrices: Percent Change from February to April 2020
Notes: Change in mobility is calculated as the difference in average daily trips in April versus in February.
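A minimal sketch of the per-pair calculation behind these matrices follows. It assumes hypothetical `{(origin, destination): average daily trips}` dictionaries for each month. Because the figure title refers to a percent change, the sketch expresses the April-versus-February difference relative to February. Pairs with no February trips are skipped, since their percent change is undefined, and a pair missing in April is treated as zero trips. Panel c then corresponds to the data users' change minus the non-data users' change, for pairs present in both groups.

```python
def od_percent_change(feb_avg, apr_avg):
    """Percent change in average daily trips per (origin, destination) pair, April vs. February."""
    return {
        pair: 100.0 * (apr_avg.get(pair, 0.0) - feb_value) / feb_value
        for pair, feb_value in feb_avg.items()
        if feb_value > 0  # undefined when February has no trips
    }


def change_difference(change_data, change_nondata):
    """Data users' percent change minus non-data users' percent change, per shared pair."""
    shared = change_data.keys() & change_nondata.keys()
    return {pair: change_data[pair] - change_nondata[pair] for pair in shared}


# Toy averages (hypothetical values, not the paper's data).
feb_data = {("A", "B"): 10.0, ("B", "A"): 4.0, ("A", "C"): 0.0}
apr_data = {("A", "B"): 5.0, ("B", "A"): 5.0}
change = od_percent_change(feb_data, apr_data)
print(change)  # {('A', 'B'): -50.0, ('B', 'A'): 25.0}
```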




 [2] Duygu Balcan, Vittoria Colizza, Bruno Goncalves, Hao Hu, José J Ramasco, and Alessandro Vespignani. 2009.
     Multiscale Mobility Networks and the Spatial Spreading of Infectious Diseases. Proceedings of the National
     Academy of Sciences 106, 51 (2009), 21484–21489.
 [3] Linus Bengtsson, Jean Gaudart, Xin Lu, Sandra Moore, Erik Wetter, Kankoe Sallah, Stanislas Rebaudet, and
     Renaud Piarroux. 2015. Using mobile phone data to predict the spatial spread of cholera. Scientific reports 5
     (2015), 8923.
 [4] Linus Bengtsson, Xin Lu, Anna Thorson, Richard Garfield, and Johan Von Schreeb. 2011. Improved response to
     disasters and outbreaks by tracking population movements with mobile phone network data: a post-earthquake
     geospatial study in Haiti. PLoS medicine 8, 8 (2011).
 [5] Joshua Blumenstock. 2018. Don’t forget people in the use of big data for development.

 [6] Joshua E Blumenstock. 2012. Inferring patterns of internal migration from mobile phone call records: evidence
     from Rwanda. Information Technology for Development 18, 2 (2012), 107–125.
 [7] Isaac I Bogoch, Oliver J Brady, MU Kraemer, Matthew German, Marisa I Creatore, Manisha A Kulkarni, John S
     Brownstein, Sumiko R Mekaru, Simon I Hay, Emily Groot, et al. 2016. Anticipating the International Spread of
     Zika Virus from Brazil. Lancet 387, 10016 (2016), 335–336.
 [8] Leah Platt Boustan, Matthew E Kahn, and Paul W Rhode. 2012. Moving to higher ground: Migration response to
     natural disasters in the early twentieth century. American Economic Review 102, 3 (2012), 238–44.
 [9] Serina Chang, Emma Pierson, Pang Wei Koh, Jaline Gerardin, Beth Redbird, David Grusky, and Jure Leskovec.
     2021. Mobility network models of COVID-19 explain inequities and inform reopening. Nature 589, 7840 (2021),
     82–87.
[10] Amanda Coston, Neel Guha, Derek Ouyang, Lisa Lu, Alexandra Chouldechova, and Daniel E Ho. 2021. Leveraging
     Administrative Data for Bias Audits: Assessing Disparate Coverage with Mobility Data for COVID-19 Policy.
     In Proceedings of the 2021 ACM Conference on Fairness, Accountability, and Transparency. 173–184.
[11] Corentin Cot, Giacomo Cacciapaglia, and Francesco Sannino. 2021. Mining Google and Apple mobility data:
     temporal anatomy for COVID-19 social distancing. Scientific reports 11, 1 (2021), 1–8.
[12] Song Gao, Jinmeng Rao, Yuhao Kang, Yunlei Liang, Jake Kruse, Dorte Dopfer, Ajay K Sethi, Juan Francisco Mandujano
     Reyes, Brian S Yandell, and Jonathan A Patz. 2020. Association of mobile phone location data indications
     of travel and stay-at-home mandates with COVID-19 infection rates in the US. JAMA network open 3, 9 (2020),
     e2020485–e2020485.

[13] Kyra H Grantz, Hannah R Meredith, Derek AT Cummings, C Jessica E Metcalf, Bryan T Grenfell, John R Giles,
     Shruti Mehta, Sunil Solomon, Alain Labrique, Nishant Kishore, et al. 2020. The use of mobile phone data to
     inform analysis of COVID-19 pandemic epidemiology. Nature communications 11, 1 (2020), 1–8.
[14] Clark L Gray and Valerie Mueller. 2012. Natural disasters and population mobility in Bangladesh. Proceedings of
     the National Academy of Sciences 109, 16 (2012), 6000–6005.
[15] Rema Hanna, Gabriel Kreindler, and Benjamin A Olken. 2017. Citywide effects of high-occupancy vehicle
     restrictions: Evidence from “three-in-one” in Jakarta. Science 357, 6346 (2017), 89–93.
[16] Felana Angella Ihantamalala, Vincent Herbreteau, Feno MJ Rakotoarimanana, Jean Marius Rakotondramanga,
     Simon Cauchemez, Bienvenue Rahoilijaona, Gwenaëlle Pennober, Caroline O Buckee, Christophe Rogier,
     Charlotte Jessica Eland Metcalf, et al. 2018. Estimating sources and sinks of malaria parasites in Madagascar.
     Nature communications 9, 1 (2018), 3897.
[17] Jonathan Jay, Jacob Bor, Elaine O Nsoesie, Sarah K Lipson, David K Jones, Sandro Galea, and Julia Raifman.
     2020. Neighbourhood income and physical distancing during the COVID-19 pandemic in the United States.
     Nature human behaviour 4, 12 (2020), 1294–1302.

[18] Moritz UG Kraemer, Chia-Hung Yang, Bernardo Gutierrez, Chieh-Hsi Wu, Brennan Klein, David M Pigott, Louis
     Du Plessis, Nuno R Faria, Ruoran Li, William P Hanage, et al. 2020. The effect of human mobility and control
     measures on the COVID-19 epidemic in China. Science 368, 6490 (2020), 493–497.
[19] Gabriel E Kreindler and Yuhei Miyauchi. 2021. Measuring commuting and economic activity inside cities with
     cell phone records. Technical Report. National Bureau of Economic Research.

[20] Shengjie Lai, Nick W Ruktanonchai, Liangcai Zhou, Olivia Prosper, Wei Luo, Jessica R Floyd, Amy Wesolowski,
     Mauricio Santillana, Chi Zhang, Xiangjun Du, et al. 2020. Effect of non-pharmaceutical interventions to contain
     COVID-19 in China. Nature 585, 7825 (2020), 410–413.
[21] Minha Lee, Jun Zhao, Qianqian Sun, Yixuan Pan, Weiyi Zhou, Chenfeng Xiong, and Lei Zhang. 2020. Human
     mobility trends during the early stage of the COVID-19 pandemic in the United States. PLoS One 15, 11 (2020),
     e0241468.
[22] Xin Lu, Linus Bengtsson, and Petter Holme. 2012. Predictability of population displacement after the 2010 Haiti
     earthquake. Proceedings of the National Academy of Sciences 109, 29 (2012), 11576–11581.
[23] Glenn Lyons and John Urry. 2005. Travel time use in the information age. Transportation Research Part A: Policy
     and Practice 39, 2-3 (2005), 257–276.


[24] Federica Margini, Anooj Pattnaik, Tapley Jordanwood, Angellah Nakyanzi, and Sarah Byakika. 2020. Case Study:
     The Initial COVID-19 Response in Uganda. Technical Report. ThinkWell and Ministry of Health Uganda.
[25] Sveta Milusheva. 2020. Managing the spread of disease with mobile phone data. Journal of Development
     Economics 147 (2020), 102559.

[26] Sveta Milusheva, Anat Lewin, Tania Begazo Gomez, Dunstan Matekenya, and Kyla Reid. 2021. Challenges and
     Opportunities in Accessing Mobile Phone Data for COVID-19 Response in Developing Countries. Data & Policy
     Forthcoming (2021).
[27] Pierre Nouvellet, Sangeeta Bhatia, Anne Cori, Kylie EC Ainslie, Marc Baguelin, Samir Bhatt, Adhiratha
     Boonyasiri, Nicholas F Brazeau, Lorenzo Cattarino, Laura V Cooper, et al. 2021. Reduction in mobility and
     COVID-19 transmission. Nature communications 12, 1 (2021), 1–9.

[28] Emily Oster. 2012. Routes of Infection: Exports and HIV Incidence in Sub-Saharan Africa. Journal of the
     European Economic Association 10, 5 (2012), 1025–1058.
[29] Pedro S Peixoto, Diego Marcondes, Cláudia Peixoto, and Sérgio M Oliva. 2020. Modeling future spread of
     infections via mobile geolocation data and population dynamics. An application to COVID-19 in Brazil. PloS one
     15, 7 (2020), e0235732.
[30] Emanuele Pepe, Paolo Bajardi, Laetitia Gauvin, Filippo Privitera, Brennan Lake, Ciro Cattuto, and Michele
     Tizzoni. 2020. COVID-19 outbreak response, a dataset to assess mobility changes in Italy following national
     lockdown. Scientific data 7, 1 (2020), 1–7.
[31] R Mansell Prothero. 1977. Disease and Mobility: a Neglected Factor in Epidemiology. International Journal of
     Epidemiology 6, 3 (1977), 259–267.
[32] Gyan Ranjan, Hui Zang, Zhi-Li Zhang, and Jean Bolot. 2012. Are call detail records biased for sampling human
     mobility? ACM SIGMOBILE Mobile Computing and Communications Review 16, 3 (2012), 33–44.
[33] Nick W Ruktanonchai, Patrick DeLeenheer, Andrew J Tatem, Victor A Alegana, T Trevor Caughlin, Elisabeth
     zu Erbach-Schoenberg, Christopher Lourenço, Corrine W Ruktanonchai, and David L Smith. 2016. Identifying
     Malaria Transmission Foci for Elimination Using Human Mobility Data. PLoS Computational Biology 12, 4
     (2016), e1004846.
[34] Nick Warren Ruktanonchai, Corrine Warren Ruktanonchai, Jessica Rhona Floyd, and Andrew J Tatem. 2018.
     Using Google Location History data to quantify fine-scale human mobility. International journal of health
     geographics 17, 1 (2018), 1–13.

[35] Andreas Schafer and David G Victor. 2000. The future mobility of the world population. Transportation Research
     Part A: Policy and Practice 34, 3 (2000), 171–205.
[36] Xuan Song, Quanshi Zhang, Yoshihide Sekimoto, Ryosuke Shibasaki, Nicholas Jing Yuan, and Xing Xie. 2016.
     Prediction and simulation of human mobility following natural disasters. ACM Transactions on Intelligent Systems
     and Technology (TIST) 8, 2 (2016), 1–23.
[37] David Stuckler, Sanjay Basu, Martin McKee, and Mark Lurie. 2011. Mining and Risk of Tuberculosis in
     Sub-Saharan Africa. American journal of public health 101, 3 (2011), 524–530.
[38] Clarence C Tam, Mishal S Khan, and Helena Legido-Quigley. 2016. Where Economics and Epidemics Collide:
     Migrant Workers and Emerging Infections. The Lancet 388, 10052 (2016), 1374–76.

[39] Andrew J Tatem and David L Smith. 2010. International population movements and regional Plasmodium
     falciparum malaria elimination strategies. Proceedings of the National Academy of Sciences 107, 27 (2010),
     12222–12227.
[40] ITU World Telecommunications. 2021. ICT Indicators database.
     https://www.itu.int/en/ITU-D/Statistics/Pages/stat/default.aspx (2021).

[41] Uganda Bureau of Statistics (UBOS) and ICF. 2018. Uganda Demographic and Health Survey 2016. Technical
     Report. UBOS and ICF.
[42] Joakim A Weill, Matthieu Stigler, Olivier Deschenes, and Michael R Springborn. 2020. Social distancing responses
     to COVID-19 emergency declarations strongly differentiated by income. Proceedings of the National Academy of
     Sciences 117, 33 (2020), 19658–19660.
[43] Amy Wesolowski, Nathan Eagle, Abdisalan M Noor, Robert W Snow, and Caroline O Buckee. 2013. The impact
     of biases in mobile phone ownership on estimates of human mobility. Journal of the Royal Society Interface 10,
     81 (2013), 20120986.
[44] Amy Wesolowski, Nathan Eagle, Andrew J Tatem, David L Smith, Abdisalan M Noor, Robert W Snow, and
     Caroline O Buckee. 2012. Quantifying the Impact of Human Mobility on Malaria. Science 338, 6104 (2012),
     267–270.
[45] Amy Wesolowski, CJE Metcalf, Nathan Eagle, Janeth Kombich, Bryan T Grenfell, Ottar N Bjørnstad, Justin
     Lessler, Andrew J Tatem, and Caroline O Buckee. 2015. Quantifying Seasonal Population Fluxes Driving Rubella
     Transmission Dynamics Using Mobile Phone Data. Proceedings of the National Academy of Sciences 112, 35
     (2015), 11114–11119.


[46] Amy Wesolowski, Taimur Qureshi, Maciej F Boni, Pål Roe Sundsøy, Michael A Johansson, Syed Basit Rasheed,
     Kenth Engø-Monsen, and Caroline O Buckee. 2015. Impact of Human Mobility on the Emergence of Dengue
     Epidemics in Pakistan. Proceedings of the National Academy of Sciences 112, 38 (2015), 11887–11892.
[47] Zengbin Zhang, Lin Zhou, Xiaohan Zhao, Gang Wang, Yu Su, Miriam Metzger, Haitao Zheng, and Ben Y Zhao.
     2013. On the validity of geosocial mobility traces. In Proceedings of the Twelfth ACM Workshop on Hot Topics in
     Networks. 1–7.




6 Appendices

                          Table 2: Demographics by Phone Owned in sub-Saharan Africa

                          (1)            (2)            (3)            (4)
                          No Phone       Basic Phone    Feature Phone  Smartphone
Kenya
  Years of Education      7.3            9.7            11.4           14.3
  Has Electricity         0.5            0.6            0.6            0.9
  Log Household Income    8.0            8.9            9.0            9.8
  Number of Assets        0.9            1.1            1.3            2.3
  Observations            134            512            153            409
Mozambique
  Years of Education      3.6            5.7            8.3            10.7
  Has Electricity         0.4            0.6            0.7            1.0
  Log Household Income    6.8            7.5            8.1            8.6
  Number of Assets        0.7            1.1            2.5            2.7
  Observations            504            427            54             186
Ghana
  Years of Education      4.5            7.1            7.0            12.3
  Has Electricity         0.6            0.9            0.8            1.0
  Log Household Income    4.5            5.5            5.9            6.0
  Number of Assets        1.3            1.8            2.2            2.5
  Observations            266            502            123            309
Nigeria
  Years of Education      4.1            8.6            10.4           13.5
  Has Electricity         0.5            0.9            0.8            1.0
  Log Household Income    8.9            9.7            10.0           9.8
  Number of Assets        1.1            2.0            2.3            2.6
  Observations            628            425            456            299
Rwanda
  Years of Education      4.1            6.2            7.2            13.0
  Has Electricity         0.2            0.5            0.4            1.0
  Log Household Income    9.6            10.2           10.6           11.8
  Number of Assets        0.4            1.0            1.0            2.1
  Observations            551            387            144            129
South Africa
  Years of Education      7.2            8.6            10.9           12.1
  Has Electricity         0.9            0.9            1.0            1.0
  Log Household Income    7.1            7.3            7.1            7.8
  Number of Assets        2.2            2.5            2.7            3.2
  Observations            263            623            133            796
Tanzania
  Years of Education      5.5            7.2            8.5            11.9
  Has Electricity         0.3            0.5            0.6            0.9
  Log Household Income    10.5           11.3           11.9           12.4
  Number of Assets        0.7            1.1            1.2            2.4
  Observations            402            468            77             244
Senegal
  Years of Education      2.3            3.3            9.3            10.5
  Has Electricity         0.7            0.9            0.9            1.0
  Log Household Income    10.3           10.8           10.5           11.2
  Number of Assets        1.6            1.8            2.4            2.9
  Observations            233            564            112            324


Notes: Data come from the After Access Africa 2018 survey conducted by Research ICT Africa (RIA). Nationally
representative individual weights were applied to produce mean values for characteristics. Number of assets was
calculated by summing how many of the following assets were owned by the household: landline, refrigerator, radio,
TV, car, motorcycle.
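The two summary steps in these notes can be illustrated with a minimal sketch. The first counts how many of the six listed assets a household owns. The second computes a weighted mean using survey weights. The dictionary keys and the example weights are hypothetical, since the After Access survey's own variable names are not given here.

```python
# Hypothetical field names for the six assets listed in the notes.
ASSETS = ("landline", "refrigerator", "radio", "tv", "car", "motorcycle")


def number_of_assets(household):
    """Count how many of the six listed assets a household owns ({asset: bool})."""
    return sum(1 for asset in ASSETS if household.get(asset, False))


def weighted_mean(values, weights):
    """Weighted mean, e.g. with nationally representative individual weights."""
    total_weight = sum(weights)
    return sum(v * w for v, w in zip(values, weights)) / total_weight


# Toy respondents (hypothetical values).
households = [
    {"radio": True, "tv": True},
    {"radio": True, "refrigerator": True, "car": True, "landline": False},
]
counts = [number_of_assets(h) for h in households]
print(counts)                             # [2, 3]
print(weighted_mean(counts, [1.0, 3.0]))  # 2.75
```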




            Figure 7: Population Density versus Proportion of Phones that use Data at the County Level

Notes: Values are calculated for 202 counties. Population data used to calculate density come from the 2014 Uganda Census. Proportions were calculated by dividing the number of data users by the total number of users with a given home location in February, based on data from the main mobile phone provider used in this paper.
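The proportion described in these notes is a grouped ratio. Below is a minimal sketch that assumes hypothetical `{subscriber_id: home_county}` and `{subscriber_id: is_data_user}` mappings for February. Subscribers missing from the second mapping are treated as non-data users, which is an assumption.

```python
from collections import Counter


def data_user_share(home, is_data_user):
    """Share of subscribers in each home county who are data users."""
    totals = Counter(home.values())
    data_users = Counter(
        county for subscriber, county in home.items() if is_data_user.get(subscriber, False)
    )
    return {county: data_users[county] / n for county, n in totals.items()}


# Toy mappings (hypothetical values).
home = {"a": "Kampala", "b": "Kampala", "c": "Gulu"}
print(data_user_share(home, {"a": True, "b": False, "c": False}))  # {'Kampala': 0.5, 'Gulu': 0.0}
```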




Figure 8: Percent Decrease in Number of Trips per Person for Data Users and for Non-Data Users by County, Using All Voice Observations

Notes: All voice observations are used for data users and non-data users without downsampling. The daily trips per person indicator is calculated by taking the total number of subscribers entering a county on a given day and dividing it by the number of subscribers whose home location is that county that month.




                 Figure 9: Daily Trips Per Person as a Percent of the Baseline Daily Trips Per Person

Notes: The baseline is defined for each day of the week by averaging days in February.




         Figure 10: Average Daily Moves Per Subscriber versus Average Daily Observations Per Subscriber

Notes: Observations and trips per subscriber are calculated at the national level. The values are calculated with all voice
observations. The arrows start at the centroids of the February clusters and point to the centroids of the April clusters.




      Figure 11: Average Daily Trips Per Day Per Subscriber for Neighboring and Non-Neighboring Counties

Notes: Neighboring county is defined as a county that shares any part of a border with the origin county.
Non-neighboring counties are all other counties.
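The neighboring/non-neighboring split can be sketched with an adjacency map. This assumes a hypothetical `borders` dictionary that lists, for each county, the counties sharing any part of its border. Building that map from county boundary data is outside this sketch. Trips are assumed to be `(origin, destination)` pairs. Dividing each count by subscribers times days is also an assumption about how the figure is normalized. The county labels are placeholders, not real Ugandan counties.

```python
from collections import Counter


def trip_type(origin, destination, borders):
    """'neighboring' if the destination shares a border with the origin, else 'non-neighboring'."""
    return "neighboring" if destination in borders.get(origin, set()) else "non-neighboring"


def avg_daily_trips_by_type(trips, borders, n_subscribers, n_days):
    """Average daily trips per subscriber, split by trip type."""
    counts = Counter(trip_type(o, d, borders) for o, d in trips)
    denominator = n_subscribers * n_days
    return {kind: counts[kind] / denominator for kind in ("neighboring", "non-neighboring")}


# Toy adjacency (placeholder counties A-D) and trips.
borders = {"A": {"B", "C"}, "B": {"A"}, "C": {"A"}, "D": set()}
trips = [("A", "B"), ("A", "D"), ("B", "A"), ("D", "A")]
print(avg_daily_trips_by_type(trips, borders, n_subscribers=2, n_days=2))
```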




[Figure 12 panels: (a) Data users’ mobility; (b) Non-data users’ mobility; (c) Regional difference: data users - non-data users]
Figure 12: Origin/Destination Matrices for February 2020 (No Downsampling)




[Figure 13 panels: (a) Data users’ mobility; (b) Non-data users’ mobility; (c) Regional difference: data users - non-data users]
Figure 13: Origin/Destination Matrices: Percent Change from February to April 2020 (No Downsampling)



