WPS4195
  USING THE GLOBAL POSITIONING SYSTEM IN HOUSEHOLD SURVEYS
                 FOR BETTER ECONOMICS AND BETTER POLICY*



                                 John Gibson, University of Waikato
                 David McKenzie, Development Research Group, World Bank




                                                  Abstract
Distance and location are important determinants of many choices that economists study.
While these variables can sometimes be obtained from secondary data, economists often
rely on information that is self-reported by respondents in surveys. These self-reports are
used especially for the distance from households or community centers to various
features such as roads, markets, schools, clinics and other public services. There is
growing evidence that self-reported distance is measured with error and that these errors
are correlated with outcomes of interest. In contrast to self-reports, the Global Positioning
System (GPS) can determine almost exact location (typically within 15 meters). The
falling cost of GPS receivers (typically below US$100) makes it increasingly feasible for
field surveys to use GPS as a better method of measuring location and distance. In this
paper we review four ways that GPS can lead to better economics and better policy: (i)
through constructing instrumental variables that can be used to understand the causal
impact of policies, (ii) by helping to understand policy externalities and spillovers, (iii)
through better understanding of the access to services, and (iv) by improving the
collection of household survey data. We also discuss several pitfalls and unresolved
problems with using GPS in household surveys.

JEL codes: C81, O12, R20
Keywords: Global Positioning System; Distance; Location; Survey Measurement;
Networks; Externalities.


World Bank Policy Research Working Paper 4195, April 2007

The Policy Research Working Paper Series disseminates the findings of work in progress to encourage the
exchange of ideas about development issues. An objective of the series is to get the findings out quickly,
even if the presentations are less than fully polished. The papers carry the names of the authors and should
be cited accordingly. The findings, interpretations, and conclusions expressed in this paper are entirely
those of the authors. They do not necessarily represent the view of the World Bank, its Executive Directors,
or the countries they represent. Policy Research Working Papers are available online at
http://econ.worldbank.org.



* We are grateful to Kathleen Beegle, Chris Bennett, Piet Buys, Geua Boe-Gibson, Alan de Brauw, Uwe
Deichmann, John Hoddinott, and Ben Olken for helpful comments and advice. Financial support from
Marsden Fund grant UOW0503 is gratefully acknowledged.

Introduction

         Distance and location are important determinants of many choices that economists study.

For example, in the von Thünen model, distance to market determines landowners' decisions

about what crop is most profitable to produce. In studies of child labor market activity, distance

from urban areas is shown to be an important determinant of both schooling and work decisions

(Fafchamps and Wahba, 2006). In migration models, greater distance between origin and

destination implies larger migration costs and reduces migration flows (Borjas, 2004).

         While location and distance can sometimes be obtained from secondary data, economists

often rely on information that is self-reported by respondents in surveys. These self-reports are

especially used for the distance from households or community centers to various features such

as roads, markets, schools, clinics and other public services. There is growing evidence that self-

reported distances and areas are measured with error (Goldstein and Udry, 1999) and that these

errors are correlated with outcomes of interest (Escobal and Laszlo, 2005). In contrast to self-

reports, the Global Positioning System (GPS) can determine almost exact location (typically

within 15 meters).

         These GPS locations are determined from satellites (currently 30) with highly precise

atomic clocks that orbit about 20,000 kilometers above the surface of the earth and send out a

unique radio signal with a time-stamp. A GPS receiver uses the time delay between transmission

and reception to calculate the distance to each satellite, and calculates the latitude and longitude

of the current location by trilateration. More precise calculations, including elevation, can

be made if four satellites are in view (El-Rabbany, 2006). Accuracy depends partly on whether

anything obscures the GPS receiver's view of the sky and the quality of the receiver used to

process the satellite signal. Consumer-grade GPS receivers are accurate to within 15 meters,
95 percent of the time,1 with further improvements in accuracy to about three meters achieved by

using differential GPS where information from a local reference station augments that from the
satellites.2
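
As a rough illustration of the ranging step described above, the distance to a satellite is the signal's travel time multiplied by the speed of light. The sketch below is illustrative only: the function name `range_from_delay` is hypothetical, and receiver clock error and atmospheric delay are ignored.

```python
# Speed of light in a vacuum, in meters per second (exact SI value).
SPEED_OF_LIGHT_M_PER_S = 299_792_458.0


def range_from_delay(delay_seconds: float) -> float:
    """Distance implied by a signal's travel time (clock error ignored)."""
    return SPEED_OF_LIGHT_M_PER_S * delay_seconds


# A satellite about 20,000 km away implies a delay of roughly 0.067 seconds.
delay = 20_000_000.0 / SPEED_OF_LIGHT_M_PER_S

# A timing error of one microsecond shifts the computed range by about 300 m.
error_m = range_from_delay(1e-6)
```

Because such a small timing error translates into a large range error, receivers without atomic clocks generally treat their own clock offset as an additional unknown, which is one reason a fourth satellite improves the solution.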

         Two principal factors have dramatically increased the feasibility and usefulness of
collecting GPS information in household surveys. First, on May 1st, 2000, the U.S. military


1 http://www.garmin.com/support/faqs/faq.jsp?faq=582&webPage=Main%20web%20page
2 Specifically, a `base station' GPS receiver is set up on a precisely known location and used to compare position
based on the satellite signals with this known location. The difference is then applied to other GPS receivers in the
area to correct their calculations of their unknown locations.



turned off selective availability (SA), which had introduced random errors of up to 100 meters in

the civilian signal. The removal of SA allowed more accurate measurement, increasing the range

of possible applications. Second, the cost of a basic GPS receiver has fallen to under US$100,
making it within the budget of most household surveys.3 Coverage and precision will improve

even further in the next few years with the launch of the European GALILEO system, expected
to be operational by 2008.4

        Surveys that are well-known to economists that have used GPS to geo-reference the

locations of community centers (and hence clusters of households, given the sample design)

include the Demographic and Health Surveys (DHS) (since 1997) and the Indonesia Family Life

Survey (IFLS). GPS has been used to provide locations for individual households (and

enterprises) in a few recent World Bank surveys including the Rural Investment Climate Surveys

(RICS) in Indonesia and Sri Lanka, and the Living Standard Measurement Surveys (LSMS) in

Albania and Tanzania. Nevertheless, the majority of household surveys collected in developing

countries still do not geo-reference communities or households, in part through lack of

information about the benefits of doing so.

        An alternative to using GPS is to use secondary data on locations. In developed countries

postal addresses are widely used for this purpose. For example, the United Kingdom has

2.1 million postcodes for 26 million addresses. Thus postcodes are a very accurate proxy for

household location since it is possible to get map grid references to the nearest 100 meters for
most postcodes.5 These very detailed location data have been used to examine the location

patterns of manufacturing by Duranton and Overman (2005). However, very few developing

countries have detailed postcodes, so this method is typically infeasible. It is also the case that in

developing countries, face-to-face interviewing predominates, compared with telephone

interviewing in developed countries, so it is quite feasible for field teams to gather GPS data as a

part of their usual survey workload.

        Another source of secondary data is from remote sensing, which is the gathering of data

from a sensor mounted on either an aircraft or satellite. These data are especially used in studies

of land cover, such as (de)forestation (Deininger and Minten, 2002) and urban sprawl


3 The Garmin eTrex GPS unit has been used in a number of household surveys. On February 21, 2007 it could be
purchased for $88 at Walmart.com and $93 at Amazon.com.
4 See http://ec.europa.eu/dgs/energy_transport/galileo/index_en.htm for more details. [accessed March 12, 2007].
5 http://www.xyzmaps.com/NewPostcode.htm



(Burchfield, Overman, Puga and Turner, 2006). The unit of analysis is the pixel or picture

element, which determines the size of the smallest landscape feature that can be distinguished

and mapped. Typical sizes are 30 meter × 30 meter or 1 kilometer × 1 kilometer grids. However,

these grid cells are not individual agents and do not make decisions. Hence, using data at this

level often involves an aggregation across decision makers. This aggregation may lead to an

`ecological fallacy' of drawing inferences about the behavior of individuals from analyses based

on grouped or area-level data (Freedman, 2004). Instead, under the general principle of matching

the spatial scale of the decision process and the scale at which measurement is carried out

(Anselin, 2002), it may be better to survey individual decision-making agents and then use GPS to

link them to other spatial data.

        This linking of different layers of data takes place in a Geographic Information System

(GIS). While GIS can be seen simply as a tool for combining, manipulating and displaying

spatial information that may have been captured in a variety of ways, including GPS, a broader

view sees an emerging geographic information science (Goodchild, 1992). This science may

enable researchers to produce more measurements than just the distance between features and to

discover new relationships for geographically referenced information. Some of the literature that

we review here relies more heavily on GIS than on GPS but still serves as an example of the
types of analyses that could be facilitated by a greater use of GPS in household surveys.6

        In this paper we review four ways that GPS can help lead to better economics and better

policy: (i) through constructing instrumental variables that can be used to understand the causal

impact of policies, (ii) by helping to understand policy externalities and spillovers, (iii) through

better understanding of the access to services, and (iv) by improving the collection of household

survey data. We then discuss some pitfalls, unresolved problems, and ongoing research issues.


FOUR WAYS USING GPS CAN LEAD TO BETTER ECONOMICS AND BETTER POLICY

1. GPS can be used to help identify causal impacts

        The majority of empirical work in development economics aims to identify the effect of a

particular variable of interest, X, on a particular outcome, Y. For example, Deininger and Minten



6 Recent reviews of the use of GIS in economics that are based largely on developed countries are Overman (2006)
and Bateman et al. (2002).



(2002) wish to examine whether poverty is associated with higher or lower levels of

deforestation in southern Mexico. A standard concern is that there are other variables which are

correlated with X and which also affect Y. Failure to control for these variables then gives biased

results. One of the most basic uses of GPS is to allow researchers to better control for geographic

and locational characteristics in their regressions. Such characteristics are increasingly found to

be relevant to outcomes of interest for development economists and practitioners. For example,

Deininger and Minten obtain data from a GIS on soil quality, rainfall, elevation, slope and other

geographic features, and find that higher levels of poverty are statistically associated with

greater likelihoods of deforestation. However, when they re-estimate the model without using

the GIS data they find poverty to be associated with lower levels of deforestation. The problem

here is that the poor live on worse-quality land, which limits the benefits of deforestation, so

failing to control for land quality gives an opposite result.
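
The sign reversal is an instance of the standard omitted-variable-bias formula. In the notation below, which is introduced here for illustration and is not taken from Deininger and Minten, D is deforestation, P is poverty, Q is land quality, and the last line gives the probability limit of the coefficient on poverty when Q is left out of the regression.

```latex
\begin{align}
D_i &= \beta P_i + \gamma Q_i + \varepsilon_i, \\
Q_i &= \delta P_i + \nu_i, \\
\operatorname{plim}\,\tilde{\beta} &= \beta + \gamma\delta .
\end{align}
```

If better land raises the payoff to clearing (so that \gamma > 0) and the poor hold worse land (so that \delta < 0), then \gamma\delta is negative and the short regression understates \beta. When |\gamma\delta| exceeds \beta, the estimated sign flips, which is the pattern described above.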

        Propensity-score matching has become a popular tool for investigating policy impacts

(see Ravallion, 2006 for a recent review). The basic idea is to compare individuals subject to a

policy to similar individuals not subject to the policy. Typical variables used for matching are

household socioeconomic characteristics, and an often crude set of community-level variables.

Brady and Hui (2006) argue that GIS can be used to more explicitly include geography in

matching. They present three arguments for doing so:

    1) many individual characteristics that we would like to match on are unmeasured, and so place can

        serve as a proxy for unmeasured individual characteristics;

    2) near-by places are more likely to share community characteristics, such as culture, trust,

        and government ability; and

    3) geographic matching can be visually persuasive, if you see sudden changes in outcomes

        across administrative borders when a program is in one community and not its neighbor.

Nevertheless, they acknowledge that in some cases places which are most comparable in terms

of cultural or socioeconomic characteristics may not be geographically close. Therefore it is

important that matching not only be done on geography. Although the U.S. labor literature has

emphasized the importance of comparing participants in training programs and non-participants

from the same local labor markets (see Heckman et al. 1997), the matching literature to date has

generally not explicitly included geographic proximity as a criterion when matching individuals in





different communities. As more surveys include GPS coordinates, this will become increasingly

possible.

         The above two examples highlight the ability of GPS to help researchers to better control

for (potentially) observable characteristics. Several recent papers are also using GPS to create
instruments for use in instrumental variables estimation.7 The most standard application is to use

distance as an instrument. For example, Oster (2006) wishes to examine the response of sexual

behavior to HIV prevalence rates in Africa. The concern is that HIV prevalence is endogenous,

as places where people have a lot of risky sex are likely to have high rates of HIV prevalence.

Her solution is to use the GPS information contained in the Demographic and Health Surveys

(DHS) to calculate the distance of each cluster to the Democratic Republic of Congo (DRC),

where HIV is thought to have originated. She argues that the spread of the virus should

be related to the distance from the DRC, but that after controlling for region, latitude, longitude,

and country-level measures of development, distance from the DRC should not otherwise affect

sexual behavior. As a further check, she uses data on pre-marital sex to show that distance from

the DRC was not correlated with sexual behavior before the spread of HIV occurred.
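
A distance variable of this kind can be computed directly from latitude and longitude. The sketch below uses the haversine great-circle formula on a spherical Earth; it is illustrative only and is not Oster's procedure. The function name and all coordinates in the example are hypothetical.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; a spherical approximation


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometers between two points in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


# Hypothetical reference point and survey clusters (latitude, longitude).
reference = (-4.0, 22.0)
clusters = {"cluster_a": (-1.3, 36.8), "cluster_b": (-13.0, 28.6)}

distance_to_reference = {
    name: haversine_km(lat, lon, reference[0], reference[1])
    for name, (lat, lon) in clusters.items()
}
```

A single reference point is a simplification; distance to a country would more naturally be measured to its border or to a chosen centroid, and that choice should be reported.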

         A potential concern with using distance as an instrument is that distance to borders and

major cities is likely to also determine access to markets, schools, health facilities, and other

infrastructure (see section below), which in turn can have important impacts on economic

behavior. As with all instruments, it is therefore important to perform robustness checks, and to

control for potential threats to the exclusion restriction. However, when used appropriately,

distance can perform well as an instrument. McKenzie, Gibson and Stillman (2006) compare the

performance of different non-experimental estimators in estimating the income gains from

migrating from Tonga to New Zealand against the experimental estimate obtained using a visa

lottery. They find that using the GPS-measured distance from a household in Tonga to the

location of the New Zealand immigration office in Tonga where application forms must be

deposited is a strong predictor of the decision to apply to migrate. Furthermore, they verify

among a sample of migrants in New Zealand that there is no relationship between their wages


7 Of course researchers can use geography to create instruments without using GPS, through painstaking map work.
Recent examples include Woodruff and Zenteno (2007) who use distance from the capital of the state an individual
was born in to the nearest station on the north/south railway lines as they existed in the early 1900s as an instrument
for migration in Mexico; and Hoxby (2000) who uses the number of streams in a metropolitan area in the U.S. as an
instrument for the number of school districts in examining the impact of school choice. GPS can make such
applications more accurate and less time-consuming.



earned in New Zealand and how far away they lived from the immigration office in Tonga. As a

result, they would expect this instrument to perform well, and show that it gets within two

percent of the income gain predicted by the experimental estimator.
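
The logic of using distance as an instrument can be seen in the simplest case of one instrument and one endogenous regressor, where the instrumental-variables slope is the ratio cov(z, y) / cov(z, x). The sketch below uses made-up numbers, not data from any study cited here, and all function names are hypothetical. The unobserved factor u is constructed to be exactly uncorrelated with the instrument z, which is the exclusion restriction that must be argued for in practice.

```python
def mean(values):
    return sum(values) / len(values)


def cov(a, b):
    """Population covariance of two equal-length sequences."""
    ma, mb = mean(a), mean(b)
    return sum((ai - ma) * (bi - mb) for ai, bi in zip(a, b)) / len(a)


def iv_slope(z, x, y):
    """Instrumental-variables slope with a single instrument z."""
    return cov(z, y) / cov(z, x)


def ols_slope(x, y):
    """Ordinary least squares slope of y on x (with an intercept)."""
    return cov(x, y) / cov(x, x)


# Made-up data: z plays the role of distance, u is an unobserved factor
# that raises both x and y but is uncorrelated with z by construction.
z = [0.0, 1.0, 2.0, 3.0]
u = [1.0, -1.0, -1.0, 1.0]
x = [zi + ui for zi, ui in zip(z, u)]               # endogenous regressor
y = [2.0 * xi + 3.0 * ui for xi, ui in zip(x, u)]   # true slope on x is 2

iv_estimate = iv_slope(z, x, y)    # recovers the true slope
ols_estimate = ols_slope(x, y)     # biased upward by the omitted u
```

If distance also affected the outcome through another channel, such as access to markets, cov(z, u) would no longer be zero and the ratio would be biased, which is why the robustness checks discussed above matter.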

         More innovative uses of GPS explicitly use geography as a source of exogenous variation

which can be used to identify the impact of the variable of interest. For example, Olken (2006)

uses GIS data on community locations and geography to look at the impact of television and

radio on social capital in Indonesian villages. To identify this impact, he exploits differences in

the over-the-air signal strength in different villages caused by mountains located between some

villages and the transmission towers. To mitigate the concern that distance and geography might

otherwise be correlated with social capital he controls for district fixed effects, distance and

travel time to major cities, and elevation and uses a physical model of radio transmission that

predicts how signal strength should theoretically vary with topography.

         Geo-referencing is also being used to help create instruments in more aggregate

applications. For example, Duflo and Pande (2007) study the impact of dams on agricultural

production and poverty in India. They use GIS data to calculate differences in the gradients of

rivers across districts and use this gradient as an instrument which predicts whether or not a dam

is constructed in that district. A U.S. example is provided by Rosenthal and Strange (2006) who

examine whether a greater density of workers in a given area leads to higher wages. Motivated

by the Manhattan skyline, they note that the height of buildings, and hence density of

employment, is determined in part by the underlying geology. They therefore use GIS

information on the type of underlying bedrock, and on seismic and landslide hazards as

instruments for the density of employment.

         Finally, many unobserved variables, such as climate and soil in agricultural settings, are
spatially correlated, leading to spatial autocorrelation in the error term of regression equations.8

Failure to account for this structure in the error terms will lead to incorrect standard errors being

used for inference, possibly leading one to conclude that a policy has a significant effect when it

does not, or vice versa. Distances between observations obtained through GPS can be used to

account for spatial autocorrelation in the error term of the regression equation. Case (1991) and

Conley (1999) provide procedures for doing this.
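
To make the idea concrete, the sketch below computes distance-weighted standard errors for a regression of y on a constant and one regressor. It is written in the spirit of Conley (1999) but is a simplified illustration, not a reproduction of that estimator: it assumes planar coordinates, a single cutoff applied to both coordinates, and weights that decline linearly to zero in each coordinate. The function name and data are hypothetical.

```python
import math


def ols_with_spatial_se(x, y, coords, cutoff):
    """OLS of y on a constant and x with distance-weighted (spatial HAC) errors.

    coords holds planar (east, north) positions in the same units as cutoff.
    The weight for a pair is the product of two linear (Bartlett-type)
    windows, one per coordinate, and is zero beyond the cutoff.
    """
    n = len(x)
    sx, sxx = sum(x), sum(v * v for v in x)
    sy, sxy = sum(y), sum(a * b for a, b in zip(x, y))
    det = n * sxx - sx * sx
    inv = [[sxx / det, -sx / det], [-sx / det, n / det]]  # (X'X)^-1
    b0 = inv[0][0] * sy + inv[0][1] * sxy
    b1 = inv[1][0] * sy + inv[1][1] * sxy
    resid = [yi - b0 - b1 * xi for xi, yi in zip(x, y)]

    # Middle term: sum over pairs of weight * e_i * e_j * x_i x_j'.
    meat = [[0.0, 0.0], [0.0, 0.0]]
    for i in range(n):
        for j in range(n):
            dx = abs(coords[i][0] - coords[j][0])
            dy = abs(coords[i][1] - coords[j][1])
            w = max(0.0, 1.0 - dx / cutoff) * max(0.0, 1.0 - dy / cutoff)
            if w == 0.0:
                continue
            s = w * resid[i] * resid[j]
            xi, xj = (1.0, x[i]), (1.0, x[j])
            for r in range(2):
                for c in range(2):
                    meat[r][c] += s * xi[r] * xj[c]

    # Sandwich variance: (X'X)^-1 * meat * (X'X)^-1.
    tmp = [[sum(inv[r][k] * meat[k][c] for k in range(2)) for c in range(2)]
           for r in range(2)]
    var = [[sum(tmp[r][k] * inv[k][c] for k in range(2)) for c in range(2)]
           for r in range(2)]
    return (b0, b1), (math.sqrt(var[0][0]), math.sqrt(var[1][1]))


x = [0.0, 1.0, 2.0, 3.0]
y = [1.0, 3.0, 2.0, 5.0]
coords = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0), (10.0, 10.0)]
coefs, ses = ols_with_spatial_se(x, y, coords, cutoff=1.0)
```

With a cutoff smaller than the spacing between observations, as in the example, only each observation's own residual enters and the result reduces to the usual heteroskedasticity-robust standard errors. Larger cutoffs let nearby observations' residuals contribute.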


8 See Anselin (2002) for an accessible review of spatial econometrics. Note also that in the agricultural example
given here, the omitted climate and soil variables are likely to be correlated with the regressors of interest, and so
one will wish to also include detailed spatial variables as controls in the regression.



2. Using GPS can help understand policy externalities and spillovers

         The spatial proximity of one household to another may be directly of interest, particularly

for understanding the interactions between actions taken by different households, the role of

social networks, and the potential spillovers from policies which treat some households and not

others.

         One example of interactions between households is the possibility that they learn from

one another's actions. Conley and Udry (2005) study learning in the context of the decision to

adopt pineapple in Ghana, and of how much fertilizer to apply to it. They note that the classic

identification problem here is that the fact that a farmer is more likely to adopt a new technology

soon after his neighbors have done so might just be a consequence of some unobserved variable

that is spatially correlated (such as soil types, pests or topographic features) rather than the

result of genuine learning. They therefore use GPS to define the geographic neighbors of a given

plot to be those within 1 kilometer of the center of the plot, and also collect data on who farmers

talk to (informational neighbors). Controlling for the deviation of a farmer's input from his

geographic neighbors, they can then identify learning through the impact of informational
neighbors' choices.9 Furthermore, they do find evidence of positive spatial correlation in

unobserved shocks to the productivity of fertilizer, highlighting the importance of controlling for

geographic effects when examining learning.
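
The geographic-neighbor control described above can be sketched as follows. This is an illustration under stated assumptions, not Conley and Udry's code: plot centers are assumed to be in projected coordinates measured in meters, and the function name and data are hypothetical.

```python
import math


def neighbor_deviation(plots, radius_m=1000.0):
    """Own input minus the mean input of other plots within radius_m.

    plots: list of (east_m, north_m, input_level) tuples.
    Returns None for a plot with no neighbor inside the radius.
    """
    deviations = []
    for i, (east, north, own) in enumerate(plots):
        nearby = [
            level
            for j, (east2, north2, level) in enumerate(plots)
            if j != i and math.hypot(east - east2, north - north2) <= radius_m
        ]
        deviations.append(own - sum(nearby) / len(nearby) if nearby else None)
    return deviations


plots = [
    (0.0, 0.0, 10.0),
    (600.0, 0.0, 20.0),
    (0.0, 900.0, 30.0),
    (5000.0, 5000.0, 40.0),   # isolated plot with no geographic neighbors
]
deviations = neighbor_deviation(plots)
```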

         Another example of using GPS to study learning from neighbors is provided by

McKenzie, Gibson and Stillman (2007) who study how negative employment experiences for

emigrants affect the expectations of would-be emigrants. These would-be emigrants were all

unsuccessful in a random ballot in Tonga that offers an opportunity for ballot winners who

obtain employment to move to New Zealand. When interviewed subsequently about their

employment (and income) expectations had they moved to New Zealand, the would-be

emigrants greatly understated employment rates and incomes compared with the actual outcomes

for the emigrants. One factor explaining this understatement is that many ballot winners who

moved found that their initial job opening in New Zealand was no longer available, and news of

this negative outcome appears to flow back to the would-be emigrants in Tonga. Specifically, if

all ballot-winning emigrants within a six-kilometer circle (based on the GPS measurements) did



9 They also allow for the error term to be spatially correlated across plots as a general function of their physical
distance, using the spatial GMM estimator of Conley (1999).



not take up their initial job in New Zealand, the employment expectations of the ballot losers

were lower by 19.6 percentage points.

        The standard approach to evaluating the impact of a policy is to compare outcomes for

those subject to the policy to outcomes for a comparable group not subject to that policy.

However, as Miguel and Kremer (2004) point out, this can give misleading estimates of the

effect of a policy when there are externalities. They investigate the impact of a deworming

treatment in schools in Kenya which was randomized across schools. Using GPS distances at the

level of the school, they control for the number of primary school pupils within a certain distance

of the school, and then use the number of treated pupils within this distance to measure health

spillovers. They find that naïve estimates which fail to take externalities into account would

underestimate the program treatment effects, leading to the mistaken conclusion that deworming

is not cost-effective.

        Miguel and Kremer (2004) use GPS distances at the level of a school, and note that these

are subject to some measurement error due to the U.S. government using Selective Availability to

downgrade GPS accuracy at the time of their survey. Now that more accurate measurement is

available, a refinement to their research would be to use GPS locations of the residences of each

individual child, which could then be used to construct a child-specific measure of exposure to

treated and non-treated children. This would provide more variation in the extent of spillover,

which could be used to examine the heterogeneity in treatment effects.
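
A child-specific exposure measure of this kind could be built as in the sketch below, which counts all children and treated children living within a chosen radius of each child. This is an illustration, not Miguel and Kremer's procedure: residence coordinates are assumed to be projected and measured in kilometers, and the function name and data are hypothetical.

```python
import math


def exposure_counts(children, radius_km):
    """Count all and treated children living within radius_km of each child.

    children: list of (x_km, y_km, treated) tuples.
    Returns one (n_all, n_treated) pair per child, excluding the child itself.
    """
    counts = []
    for i, (x, y, _) in enumerate(children):
        n_all = n_treated = 0
        for j, (x2, y2, treated) in enumerate(children):
            if i != j and math.hypot(x - x2, y - y2) <= radius_km:
                n_all += 1
                n_treated += int(treated)
        counts.append((n_all, n_treated))
    return counts


children = [
    (0.0, 0.0, True),
    (1.0, 0.0, False),
    (0.0, 2.0, True),
    (10.0, 10.0, False),   # lives far from everyone else
]
counts = exposure_counts(children, radius_km=3.0)
```

Including the total count alongside the treated count mirrors the approach described above: the total controls for local population density, so that the treated count captures spillovers.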


3. Information on the spatial distribution of population and services is essential to

understanding access to services

        One of the most common uses of GPS information to date in developing countries has

been to measure access to infrastructure and social services, particularly health care. For

example, Perry and Gessler (2000) use GPS to measure access from communities to primary

health care facilities in Andean Bolivia and use this to propose an alternative model of health

distribution in the study area.

        In addition to providing purely descriptive measures of access, GPS data on distance and

travel times can be used to understand barriers to the use of particular services. Entwisle et al.

(1997) examine the importance of accessibility to family planning on choice of contraceptive

device, and in doing so, demonstrate two advantages of GPS over survey-based measures of




access. They note first that data on family planning accessibility is often collected in surveys

only for certain political or administrative boundaries, such as whether there is a facility in the

village. However, facilities in neighboring administrative units may be closer. Using geo-

referenced data allows more flexible specification of boundaries, which are not constrained by

administrative definitions. Secondly, they note that reported travel times to health facilities in

their survey are often heaped in terms of 30 minute multiples, whereas using GIS gives no

clumping, allowing better specification of functional form.
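
Heaping of this kind is easy to check for. The sketch below computes the share of travel-time values that are exact multiples of 30 minutes, using made-up numbers rather than data from Entwisle et al.; the function name is hypothetical.

```python
def heaping_share(minutes, multiple=30):
    """Share of travel times that are exact multiples of `multiple` minutes."""
    if not minutes:
        raise ValueError("no travel times supplied")
    return sum(1 for m in minutes if m % multiple == 0) / len(minutes)


# Made-up example: self-reported times versus times derived from a GIS.
reported = [30, 60, 30, 45, 90, 20, 60, 30]
gis_based = [27, 64, 33, 45, 88, 20, 58, 31]

reported_share = heaping_share(reported)     # most reports fall on 30, 60 or 90
gis_share = heaping_share(gis_based)         # no clumping at multiples of 30
```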

        Gibson et al. (2006) examine the use of different financial channels for receiving

remittances in Tonga. Transactions costs on money transfers are much higher using Western

Union than when the recipient withdraws funds from an Automatic Teller Machine (ATM).

There are eight ATMs on the main island of Tongatapu compared to five Western Union

branches, so a branches-per-capita measure of access would suggest that ATMs are more

accessible. However, they collect GPS coordinates of the ATMs and Western Union branches,

and combine this with village-level population information from the Census and a digitized

road network to measure the share of the population within different travel distances of the two

competing financial channels. Figure 1 shows their results. Although Western Union has fewer

branches, they are more dispersed, and cover 97 percent of the population within a 10km travel

distance, compared to only 77 percent of the population covered by ATMs within this distance.

Figure 1 also illustrates how effective the combination of GPS data collection and mapping

software can be at illustrating access in a form accessible for policymakers.
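
A coverage measure of this type can be sketched as below. The sketch is a simplification of the approach described above: it uses straight-line distance in projected kilometers rather than travel distance along a road network, and the function name and all numbers are made up for illustration.

```python
import math


def population_share_within(villages, facilities, max_km):
    """Share of population living within max_km of the nearest facility.

    villages: list of (x_km, y_km, population) tuples.
    facilities: list of (x_km, y_km) tuples.
    """
    total = sum(pop for _, _, pop in villages)
    covered = sum(
        pop
        for x, y, pop in villages
        if any(math.hypot(x - fx, y - fy) <= max_km for fx, fy in facilities)
    )
    return covered / total


villages = [(0.0, 0.0, 500), (8.0, 0.0, 300), (20.0, 0.0, 200)]
clustered_outlets = [(1.0, 0.0), (2.0, 0.0), (3.0, 0.0)]   # more outlets, bunched
dispersed_outlets = [(0.0, 0.0), (15.0, 0.0)]              # fewer, spread out

clustered_share = population_share_within(villages, clustered_outlets, 10.0)
dispersed_share = population_share_within(villages, dispersed_outlets, 10.0)
```

In this made-up example the three bunched outlets reach 80 percent of the population while the two dispersed outlets reach everyone, which is the kind of contrast a per capita count would miss.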

        More recent health applications combine distance with measures of infrastructure quality.

Hong, Montana and Mishra (2006) use the 2003 DHS in Egypt to look at the relationship

between IUD contraceptive use and the quality of family planning services available. They link

each household to the nearest family planning clinic within 10km, and then use detailed DHS

survey data to measure the quality of the facility. Rosero-Bixby (2004) uses GPS data on census

tracts and locations of health facilities in Costa Rica to assess the extent to which health reforms

led to improvements in access, measuring access with a combination of distance and services

provided by the facility. He notes that households may not necessarily use the nearest facility,

particularly if it is low quality, and by using GIS one can calculate measures such as the density

of services that meet a standard quality within a specified radius.





        Figure 1: Service Areas for ATMs (left) and Western Union branches (right) for Tongatapu, Tonga.




Source: Figure 4 in Gibson et al. (2006)


        A limitation with the above set of health studies is that they only measure distance at the

level of a community, whereas households on opposite sides of a village or town may be closer

to different facilities from each other. A second limitation is that distance to health facilities

could be correlated with a host of other unmeasured factors, such as poverty, disease

environment, and other infrastructure, which could also affect health decisions. A recent

innovative experiment by Thornton (2006) in rural Malawi is able to identify the causal impact

of distance. She studies the decisions of individuals who had been tested for HIV to attend a

voluntary counseling and testing (VCT) center to learn their test results. Using their GPS

coordinates, households in villages were grouped into zones, and a location within each zone was

randomly selected to place a small portable tent, which served as the temporary counseling

center, with the average straight-line distance to the center 2.1km. As Figure 2 below shows, she

finds a strong negative impact of distance on the probability of learning the results of the test,

particularly for those living within 1km of the center.





        Figure 2: Greater distance lowers the probability of accessing VCT centers in Malawi




                                        Source: Figure 3 of Thornton (2006)



4. Using GPS can improve the collection of household survey data

         GPS is also starting to be used to improve the quality and cost-effectiveness of collecting

household survey data. These uses occur at several phases of data collection, from the

development of a sample frame, to quality control, and use for follow-up surveys. More accurate

and cost-effective surveying enables researchers to carry out better analysis and provide better

evidence-based advice to policymakers.

         Representative household surveys require an accurate sample frame. The most common

approach involves using a recent Census to select enumeration areas. However, censuses may

become outdated during periods of rapid urbanization, and will be of little use in drawing

samples in post-conflict countries that have not had a census for decades. For example,
Afghanistan is planning on completing a census in 2007, its first since 1979,10 while Lalasz

(2006) reports 15 countries have not taken a Census since 1990. The traditional solution to this

problem is to do area sampling, in which enumerators list all households in a well-defined block,

such as a village or an area bounded by certain city streets. Such blocks are largely determined

by the convenience in defining them and locating them, and can be expensive to enumerate.

         Landry and Shen (2005) show how GPS can be used to do area-based sampling quickly

and cheaply, since enumeration areas can be defined in terms of spatial coordinates, and made


10 http://afghanistan.unfpa.org/projects.html [accessed December 28, 2006]



arbitrarily small. They apply this to the problem of surveying in China, where household

registration lists are widely used as sample frames. Widespread migration from rural areas

however means that many households are unlikely to be found on these registration lists. They

use GPS to survey randomly chosen 54 × 54 meter squares (approximately one square second),

and find that 45 percent of the households reached were not on household registration lists.
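
The idea of defining enumeration areas purely by coordinates can be sketched in a few lines of code. The example below is a minimal illustration, not the procedure of Landry and Shen: it assumes a rectangular study area, cells of one arc-second on each side, and a hypothetical function name and bounding box.

```python
import random

ARCSEC = 1.0 / 3600.0  # one arc-second expressed in decimal degrees


def sample_grid_cells(south, west, north, east, n_cells, seed=None):
    """Draw n_cells distinct one-arc-second cells from a bounding box.

    Returns the (lat, lon) of the south-west corner of each sampled cell.
    The rectangular study area is an assumption made for illustration.
    """
    rng = random.Random(seed)
    n_rows = int(round((north - south) / ARCSEC))
    n_cols = int(round((east - west) / ARCSEC))
    total = n_rows * n_cols
    if n_cells > total:
        raise ValueError("more cells requested than exist in the box")
    # Sample cell indices without replacement, then convert to coordinates.
    picks = rng.sample(range(total), n_cells)
    cells = []
    for idx in picks:
        row, col = divmod(idx, n_cols)
        cells.append((south + row * ARCSEC, west + col * ARCSEC))
    return cells


# Hypothetical bounding box of 0.01 x 0.01 degrees (36 x 36 cells).
cells = sample_grid_cells(39.90, 116.30, 39.91, 116.31, n_cells=10, seed=1)
```

Enumerators would then navigate to each sampled cell with a GPS receiver and list every dwelling inside it.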

        However, one potential problem with this approach is that the sample size is not known

until data collection has occurred, since the number of households within a spatial block is not

known ex ante. Landry and Shen use existing population data to create a rough population model

of Beijing, but found that the number of dwellings within their spatial units was four times as

large as they had budgeted for, so they only administered their questionnaire to one-fourth of the

units. It appears likely that aerial photography will alleviate such problems in the future. For

example, Cowen and Jensen (1998) extracted individual dwelling unit information in a 32 census

block area in South Carolina from aircraft multispectral data. They found that dwelling unit

data derived from remote sensing had a correlation of 0.91 with similar data derived from the

census. As satellite imagery continues to improve in resolution and fall in price,

it appears likely that the combination of remote sensing and spatial sampling will become the

standard for constructing sample frames in situations where reliable census or registrar data are

not available.

        Another example of combining remote sensing and GPS for drawing samples is provided

by Kumar (2007) in a survey of 1600 households spread across different air pollution zones in

Delhi, India. The study area was partitioned into different strata characterized by air pollution

levels (obtained by remote sensing) and proximity to main point sources of air pollution.

Random points were then simulated using GIS techniques (weighting by the size of the

residential area in each stratum) and GPS was then used to navigate to the households located at

each selected point. These households were then asked to participate in the survey. This method

of creating a frame and drawing a sample should be more efficient than simply imposing a

regular grid across the study area, since air pollution is irregularly distributed over space.
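
One step in such a design, splitting the sample points across strata in proportion to residential area, can be illustrated as follows. This is a sketch under assumed inputs: the stratum names and areas are hypothetical, and the largest-remainder rounding rule is our choice rather than something reported by Kumar (2007).

```python
def allocate_points(areas, n_points):
    """Split n_points across strata in proportion to area (largest remainder)."""
    total = sum(areas.values())
    quotas = {k: n_points * a / total for k, a in areas.items()}
    alloc = {k: int(q) for k, q in quotas.items()}
    leftover = n_points - sum(alloc.values())
    # Hand the remaining points to the strata with the largest fractional parts.
    by_remainder = sorted(quotas, key=lambda k: quotas[k] - alloc[k], reverse=True)
    for k in by_remainder[:leftover]:
        alloc[k] += 1
    return alloc


# Hypothetical residential areas (square kilometers) for three pollution strata.
allocation = allocate_points({"high": 50, "medium": 30, "low": 20}, 1600)
```

Random points would then be drawn within the residential area of each stratum until its allocation is filled.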

         Visualization of the locations at which sampling has occurred can provide a useful form

of quality control to ensure that interviewers conduct surveys where they are supposed to, and to

check whether any dwellings are inadvertently missed. In 2004, Timor-Leste became the first

country to use GPS units to record the locations of all households in its Census. USAID




Timor-Leste (2004) reports that survey managers checked the GPS points visited by the Census

teams against detailed aerial photograph maps, and used this to detect areas missed in the

enumeration, sending enumerators back to complete the surveys. Population counts from the

Census are used in many countries for a variety of policy purposes, including the division of

federal money and for defining political representation. Undercount can be particularly high in

developing countries: Lalasz (2006) reports that the 1991 Census is thought to have

undercounted Nigeria's population (officially put at 89 million) by perhaps 20 million people.

The use of GPS can help show where such undercount has occurred, and help survey managers

reduce it.

        Another potential use of GPS is through reducing the cost and time taken to re-locate the

same dwelling for follow-up surveys. Two reasons for follow-ups are to allow field managers to

check errors made by enumerators, and for the collection of panel data. A recurrent problem in

many developing countries is the lack of street addresses, making re-locating the same dwelling

time consuming, especially in densely populated urban areas. A pilot study conducted by

Dwolatsky et al. (2006) with the aim of tracing patients who left a tuberculosis control program

in South Africa shows the potential for using GPS to re-locate dwellings. They compared the

time taken to re-find a home given residential addresses with the time taken using a customized

personal digital assistant (PDA) linked to GPS. The time taken to find the dwelling was found to

be 20-50 percent less using the PDA/GPS device. The main limitation of this study was that it

was a small pilot of only 20 houses, so further experiments are needed to confirm the promising

results found here.

        When panel surveys attempt the more difficult (but conceptually correct) task of tracking

individuals rather than dwellings, GPS can be very useful for tracking people who had

previously been co-residents in the same household. For example, the Kagera Health and

Development Survey 2004 (KHDS 2004) in Tanzania used GPS to record the locations of 2700

households that contain members who had been in the baseline sample of 900 households first

interviewed in 1991-94 (Beegle, de Weerdt and Dercon, 2006). Measures such as how far people

have moved from either their baseline village center or from households with members who had

been co-residents in the baseline surveys can be related to various socioeconomic characteristics.

        Finally, collecting GPS data for households allows the possibility of linking the

household data set to other surveys and other datasets. There is considerable option value in




doing this, since many potential uses of the data will not be known at the time of collecting the

survey.


HOW MUCH IMPROVEMENT DOES GPS GIVE OVER SELF-REPORTS,

AND IS A STRAIGHT LINE GOOD ENOUGH?

       A subset of the uses of GPS detailed above involves measuring distances from

households and communities to other households, communities, or infrastructure. The natural

question which then arises is whether we need to use GPS to measure these distances, or whether

distances can be obtained directly through self-reports in household surveys. A follow-up

question is then whether a simple straight-line (crows-fly) distance is sufficient, or whether the

GPS coordinates should be integrated with GIS information on transport routes and topography

to measure travel distances and travel times.

       The consequences of mis-measuring distance depend on how distance is going to be used,

how badly it is mis-measured, and on the nature of the mis-measurement. If measurement errors

are classical, then when distance is used as a regressor, as in the studies of access, the effect will

be an attenuation bias which understates the impact of distance. Using distance as an instrument

with classical measurement error will lower the power of the instrument, potentially giving rise

to weak instrument concerns, but will still result in consistent estimates.
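
The attenuation result can be checked with a small simulation. Under classical error, the probability limit of the OLS slope is the true coefficient multiplied by var(true distance) / (var(true distance) + var(error)). The sketch below uses made-up parameter values (a true slope of -1 and equal variances, so the expected estimate is about -0.5) and is only meant to illustrate the mechanism.

```python
import random


def ols_slope(x, y):
    """Slope from a bivariate OLS regression of y on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


def simulate_attenuation(beta=-1.0, sd_true=1.0, sd_error=1.0, n=20000, seed=0):
    """Return (slope using true distance, slope using error-ridden distance)."""
    rng = random.Random(seed)
    true_dist = [rng.gauss(0, sd_true) for _ in range(n)]
    outcome = [beta * d + rng.gauss(0, 1) for d in true_dist]
    # Classical error: independent of true distance and of the outcome.
    reported = [d + rng.gauss(0, sd_error) for d in true_dist]
    return ols_slope(true_dist, outcome), ols_slope(reported, outcome)


slope_true, slope_reported = simulate_attenuation()
```

With these assumed values the slope on reported distance is roughly half the slope on true distance, which is the understatement described above.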

       However, there are strong reasons to believe that measurement errors are not random.

Entwisle et al. (1984) note as an example that if people are asked to report travel times to a

health provider, those who currently use the source will have more accurate knowledge than

those who do not. Thus the measurement error is likely to be correlated with usage patterns, a

problem if one wishes to investigate the impact of distance on usage. Indeed, Andrabi et al.

(2007) report that in their survey in Punjab, Pakistan, many households do not even know the

name of their nearest school, let alone its location. If the measurement error is correlated with

socioeconomic variables which also affect the outcome of interest, then the mis-measured

distance will also give inconsistent instrumental variable estimates.

       At present there are very few studies which systematically compare self-reports of

distance and travel times to GPS measurements, particularly in developing countries. However,

the two studies of which we are aware suggest errors can be large, and correlated with

socioeconomic variables, although it should be noted that neither explicitly considers physical



distance.11 Goldstein and Udry (1999) map agricultural plots in Ghana using GPS equipment and

compare the estimates to cultivators' estimates of plot size. Figure 3 shows their striking results:

the correlation between reported plot size and mapped plot size is only 0.15. Moreover, this

measurement error is non-random: using their data we find that richer households report a
smaller area than do poorer households for the same measured plot size12, and that women report
a larger area than men do for the same measured plot size.13


  Figure 3: The low correlation between reported plot size and GPS-measured plot size in rural Ghana

  [Scatter plot: Reported Size (Ropes), axis from 0 to 80, against Measured Size (Hectares), axis from 0 to 3]

  Source: Figure A1 in Goldstein and Udry (1999).


11 There are also several other surveys which have collected both self-reports and validated measures, for which
formal write-ups do not yet exist. Alan de Brauw and John Hoddinott at IFPRI report that surveys in rural Ethiopia
and China have generally found quite good agreement between self-reported area and measured area, whereas in
Mozambique self-reports were very inaccurate. One hypothesis is that the accuracy of land area measurement is
related to the scarcity of land: in places like China and Ethiopia, where land is scarce and allocated by the
Government, people have a better idea of the size of their land than in places where land is relatively abundant.
Roberts et al. (2006) report on a survey in Bukoba, Tanzania where self-reported distance was compared with
distances calculated by using pedometers and an estimate of average step length. They find that over 60 percent of
self-reported distances were more than twice the calculated distances.
12 This parallels the classic psychology study of Bruner and Goodman (1947) in which poor children recall coins as
bigger than rich children do.
13 These coefficients come from a median regression of reported area per measured area on gender, education, and
household wealth. Both gender and household wealth are significant at the 1% level.



         Escobal and Laszlo (2005) compare self-reports of the time in minutes it takes

agricultural producers in Peru to get to the nearest populated center with the true travel time. The

latter is measured by having surveyors walk with a random sample of respondents, and time their

journeys, following the same route and pace as the respondent, and using GPS to measure

latitude, longitude, altitude and distance. GIS is then used to compute travel time accounting for

terrain for those in the survey who weren't accompanied by the surveyor. They find respondents

consistently under-report the time taken to reach the center. For example, among coffee

producers in the Selva, mean self-reported time was 6.7 minutes, compared to a mean true time

of 13.0 minutes. The correlation between self-reported time and true travel time is only 0.28 for

the coffee producers, 0.29 for the potato farmers, and -0.08 for the rice farmers in their sample.

Furthermore, Escobal and Laszlo do find that measurement errors are correlated with

socioeconomic variables. Not surprisingly, individuals who own a watch give more accurate

reports of travel times. They also find a negative correlation between the measurement error and

education, so that more educated people have less measurement error.

         These results therefore strongly suggest that self-reported distances will be misleading,

with measurement errors correlated with outcomes of interest. GPS coordinates can then be used

to give more accurate measurement. The simplest approach is to calculate the straight-line

distance between points. This has the advantage of computational simplicity, and does not

require the user to have additional geo-coded information on transport networks or topography.

Alternatively, users can combine GPS point coordinates with information on the location of

transportation routes, and perhaps information on road quality and topography, to measure exact

travel distances and predict travel times.

         The correlation between GPS straight-line travel distances and exact travel distances is

likely to be much higher than the correlation between self-reported distance and GPS-distances.

For example, in McKenzie, Gibson and Stillman (2006) we computed the distance in meters

from each household in our sample in Tonga to the New Zealand immigration office in

Nuku'alofa. The Pearson (Spearman) correlation between the straight-line and exact travel

distance based on road networks is 0.82 (0.78). We do find the absolute percentage measurement

error to be correlated with whether or not an individual migrates, and with their income from

work in Tonga. The measurement error is greater for more remote individuals (on the other side

of a lagoon), and this remoteness in turn is correlated with economic behavior. Nevertheless, the




size of this error is small enough in our application that we find no difference in the IV estimates

when we use straight-line distances compared to road distances. Using the straight-line distance

as an instrument for migration we estimate the income gain from migration to be $280 (s.e.

$122) compared to $281 (s.e. $101) when we use the road-distance.

        More generally, the difference between straight-line and road distance measures will be

larger when geographical features such as mountains, lakes, lagoons, and rivers lie between a

household or village and the location of interest. Hence the difference between straight-line and

road distances will be correlated with how remote a location is, which in turn is likely to affect

many variables of interest. As a result, road distances are preferred to straight-line distances

where possible.

        Furthermore, since a curve between two points is always longer than a line, travel

distances will be longer than straight-line distances. As a result, measures of access based on

straight-line distances will over-estimate the proportion of the population which is covered by a

given service. A good example of this is provided by Noor et al. (2006) who examine health

coverage in Kenya, where the government has set a target of ensuring that the whole population

lives within 1 hour of effective health services by 2010. Using straight-line distances, which is the

standard approach to coverage, they would estimate that 82% of the population is currently

within 1 hour of government health services. However, when they adjust for the travel

network, the proportion currently covered drops to 63-68% of the population.

Extrapolating this to all of Kenya, this would mean that 19 million rather than 25 million are

currently covered.

        These errors will especially matter when travel cost is a key feature of the analysis, as it

often is for non-market valuation. For example, Bateman, Brainard, Garrod and Lovett (1999)

use the travel-cost method (TCM) of estimating consumer surplus for a woodland recreation site.

They use actual road network distance and then compare the consumer surplus estimates with

those obtained by assuming straight line travel. Since straight line travel is shorter, using this

unrealistic simplifying assumption results in an underestimate of consumer surplus values of up

to 20 percent.





PITFALLS AND UNRESOLVED PROBLEMS

Interviewer error

        Although taking GPS readings is quite straightforward, it requires good training,

otherwise surveys end up with another source of measurement error. One solution is to take

multiple readings for the same location and then take their average. Some GPS receivers have an

inbuilt function for doing this. For example, the guidelines for collecting GPS data in DHS

surveys (Montana and Spencer, 2004) recommend taking multiple readings within a five minute
period and then averaging them.14
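
Where the receiver has no built-in averaging, the same calculation is easy to do afterwards. The sketch below assumes readings stored as (latitude, longitude) pairs in decimal degrees; the function name and sample values are hypothetical.

```python
from statistics import mean


def average_fix(readings):
    """Average repeated (lat, lon) readings taken at a single location.

    A simple mean is adequate for readings a few meters apart; it is not
    valid for points that straddle the 180-degree meridian.
    """
    if not readings:
        raise ValueError("at least one reading is required")
    lats = [lat for lat, _ in readings]
    lons = [lon for _, lon in readings]
    return mean(lats), mean(lons)


# Three hypothetical readings taken at one dwelling.
readings = [(-21.1390, -175.2040), (-21.1392, -175.2044), (-21.1394, -175.2042)]
avg_lat, avg_lon = average_fix(readings)
```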


Datum and Coordinate Projection Problems

        Geographic coordinate systems use latitude and longitude to define the location of points

on the surface of the earth. Because the earth is not a true sphere (the poles are flattened and it

bulges at the equator) it is difficult to solve geodetic problems for point location. Instead, a

spheroid approximation to the shape of the earth is used, and this surface, its origin and the

orientation of its latitude and longitude lines make up a geodetic datum. GPS receivers typically

use the World Geodetic System 1984 (WGS84) datum, which is a geocentric datum (its origin

coincides with the center of the Earth) designed for making measurements worldwide. However,

there are hundreds of other datums which may use a different center, spheroid, or reference point

on the Earth's surface in order to be locally more accurate. Coordinate values resulting from

interpreting latitude, longitude, and height values based on one datum, as though they were based

on another datum, can cause position errors of up to one kilometer (Ramachandran, 2000)

although the discrepancy will typically be less. Bennett (2006) gives the example of walking

around Tiananmen Square in Beijing with a GPS receiver and then importing the measurements

into Google Earth which showed a path offset by approximately 14 meters due to the difference

between the WGS84 datum and the datum used by Google Earth.

        In addition to these datum shift errors, which may be small enough to escape notice,

another common source of error comes from mixing geographic and projected coordinate

systems. These projected coordinates are designed to overcome a major weakness of latitude and

longitude for distance- and area-measuring purposes, which is that these are not constant units. A



14 The field guides of Spencer et al. (2003) and Montana and Spencer (2004) provide a good starting point for
researchers planning on using GPS in a household survey.



degree of latitude is 110.6 kilometers at the equator and 111.7 kilometers at the poles. A degree

of longitude is rather more elastic, being 111.3 kilometers at the equator, 55.8 kilometers at 60

degrees latitude and only 19.4 kilometers at 80 degrees latitude. Consequently Cartesian

geometry cannot be used to measure either distances or areas when working with latitude and
longitude coordinates.15
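
A straight-line distance that respects the curvature of the earth can be computed with the haversine form of the great circle formula. The sketch below treats the earth as a sphere with an assumed mean radius of 6,371 kilometers, so its results differ slightly from the ellipsoidal figures quoted above; the function name is our own.

```python
import math

EARTH_RADIUS_KM = 6371.0  # assumed mean radius; a spherical approximation


def great_circle_km(lat1, lon1, lat2, lon2):
    """Great circle distance in kilometers between two points in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    # Haversine formula, which is numerically stable for short distances.
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


# One degree of longitude at the equator and at 60 degrees latitude.
deg_at_equator = great_circle_km(0, 0, 0, 1)
deg_at_60 = great_circle_km(60, 0, 60, 1)
```

On this spherical approximation a degree of longitude at 60 degrees latitude is about half its length at the equator, which is why differences in degrees cannot be treated as distances.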

          Projected coordinate systems convert latitude and longitude coordinates from the earth's

three dimensional surface onto a two dimensional map. For example, a widely used projected

coordinate system for measuring distance is the Universal Transverse Mercator (UTM) which

divides the earth into 60 zones, each spanning six degrees of longitude and ranging from latitude

80 degrees South to 84 degrees North. Each zone has its own central meridian and locations

within a zone are measured in meters eastward from the central meridian (which is given a value

of 500,000 meters so that even the furthest west point in the zone does not get a negative
coordinate) and meters northward from the equator.16 For example, the coordinates of Lima, Peru

translate from decimal degrees of 12.05S, 77.05W to an Easting of 276,836 and a Northing of
8,667,084 within the UTM zone 18S.17
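
The Lima example can be reproduced with the standard series expansion for the transverse Mercator projection (as given, for example, by Snyder). The sketch below assumes the WGS84 ellipsoid, ignores the special zone exceptions around Norway and Svalbard, and uses a function name of our own; in practice a GIS or a projection library would normally do this conversion.

```python
import math

A = 6378137.0            # WGS84 semi-major axis (meters)
F = 1 / 298.257223563    # WGS84 flattening
K0 = 0.9996              # UTM scale factor on the central meridian


def latlon_to_utm(lat, lon):
    """Return (zone, hemisphere, easting, northing) for decimal-degree input."""
    zone = int((lon + 180) // 6) + 1
    lon0 = math.radians((zone - 1) * 6 - 180 + 3)  # central meridian of the zone
    phi, lam = math.radians(lat), math.radians(lon)
    e2 = F * (2 - F)        # first eccentricity squared
    ep2 = e2 / (1 - e2)     # second eccentricity squared
    n = A / math.sqrt(1 - e2 * math.sin(phi) ** 2)
    t = math.tan(phi) ** 2
    c = ep2 * math.cos(phi) ** 2
    a_ = (lam - lon0) * math.cos(phi)
    # Meridian arc length from the equator to latitude phi.
    m = A * ((1 - e2 / 4 - 3 * e2 ** 2 / 64 - 5 * e2 ** 3 / 256) * phi
             - (3 * e2 / 8 + 3 * e2 ** 2 / 32 + 45 * e2 ** 3 / 1024) * math.sin(2 * phi)
             + (15 * e2 ** 2 / 256 + 45 * e2 ** 3 / 1024) * math.sin(4 * phi)
             - (35 * e2 ** 3 / 3072) * math.sin(6 * phi))
    easting = 500000 + K0 * n * (
        a_ + (1 - t + c) * a_ ** 3 / 6
        + (5 - 18 * t + t ** 2 + 72 * c - 58 * ep2) * a_ ** 5 / 120)
    northing = K0 * (m + n * math.tan(phi) * (
        a_ ** 2 / 2 + (5 - t + 9 * c + 4 * c ** 2) * a_ ** 4 / 24
        + (61 - 58 * t + t ** 2 + 600 * c - 330 * ep2) * a_ ** 6 / 720))
    if lat < 0:
        northing += 10000000  # false northing used in the Southern Hemisphere
    return zone, ("S" if lat < 0 else "N"), easting, northing


zone, hemisphere, easting, northing = latlon_to_utm(-12.05, -77.05)
```

For the Lima coordinates this should return zone 18 in the Southern Hemisphere, with an Easting and Northing within a few meters of the values given in the text.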

          Consequently if location data from a GPS (which are for a three dimensional surface and

so use a geographic coordinate system) are combined with two dimensional map data (in a

projected coordinate system), a conversion has to be made to line up the various data layers. This

is a common and easily done task in a GIS, and building up different layers of data for

households, villages, and features of interest like roads, coastlines, rivers, or public services adds

value to the information in each layer. However, the data layers will not line up if different

coordinate systems are used for each layer (for example, using the wrong UTM zone number

could place the same features several hundred kilometers apart). It is surprisingly easy to

end up with mismatched coordinate systems because metadata, which should tell users about

the coordinate system and datum used, are not always included with existing geographical data.

For example, one of the authors digitized a road around the edge of the island of Tongatapu in


15 Instead the formula for great circle distance should be used, which can be found at:
http://mathforum.org/library/drmath/view/51711.html. Stata code that implements this formula is available in the
"globdist" ado written by Ken Simons.
16 For locations in the Southern Hemisphere, the Northing is calculated as 10,000,000 minus the meters south of the
equator.
17 This zone ranges from 72 to 78 degrees west with a central meridian of 75 degrees. The Easting of 276,836
indicates that Lima is approximately 223 kilometers west of this meridian. The Northing of 8,667,084 indicates that
it is approximately 1,333 kilometers south of the equator.



Tonga by driving on it with a GPS receiver turned on. An existing base map of the coastline was

obtained but no metadata were available to show the projection used. Even after choosing the

most likely coordinate system for the base map, UTM1S for the UTM zone that Tonga is located

in, the road and the coastline were misaligned since it appeared that on one side of the island the

road that the author had driven on was in the ocean.

        A more general issue is that in many developing countries there is not a lot of off-the-

shelf geographical information available at the resolution level needed to merge it with village or

community level data. Information is more often available at a coarser scale, making it difficult

to link household locations to local level geographic features. Even when information is

available at high resolutions, it may not always match up with the household survey due to these

differences in coordinate systems. It is therefore important for countries to have a spatial data

infrastructure (SDI) which coordinates different collection activities so that different geographic
datasets can be matched together.18


Road Network Problems

        Practical difficulties are increased when constructing a road network for measuring either

distances or traveling times. The algorithm used in a GIS for calculating shortest distance

requires good alignment between the lines and junctions of the digitized road network. For

example, if digitized road segments at a junction do not line up, the algorithm will back-track

and seek a path elsewhere. These problems may be especially apparent when roads have been

digitized for another reason, such as cartographic display, so once again metadata about the

origin and purpose of geographic layers used in conjunction with GPS data are very useful. It is

also sensible to budget considerable research assistance time to clean a roads dataset since

misalignment problems may be common. For example, the road network underlying the service

areas for money transfer facilities in Figure 1 took more than one week to clean. The digitized

roads had been obtained from an earlier cartographic project rather than being digitized with a

GPS when the research team was in the field. So even though it looked like a digital road

network it was in fact based on polygons (the basic units in GIS are points, lines and polygons)




18 The Global Spatial Data Infrastructure Association provides a cookbook of how an SDI can be built. See
www.gsdi.org [accessed February 21, 2007].



and a large effort was required to convert these into continuous lines and junctions so that a

usable road network was available for calculating travel distance and time.
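
The sensitivity of shortest-path calculations to misaligned junctions can be seen in a toy example. The sketch below is not the algorithm of any particular GIS: it assumes road segments given as pairs of projected (x, y) coordinates in meters, merges endpoints that lie within a chosen snapping tolerance, and then applies Dijkstra's algorithm. The function names and the tolerance are hypothetical.

```python
import heapq
import math
from collections import defaultdict


def build_graph(segments, snap_tol=0.0):
    """Build an undirected graph from ((x1, y1), (x2, y2)) road segments.

    Endpoints within snap_tol of an existing node are merged into it, which
    closes small digitizing gaps at junctions.
    """
    nodes = []
    graph = defaultdict(dict)

    def node_id(pt):
        for i, q in enumerate(nodes):
            if math.dist(pt, q) <= snap_tol:
                return i
        nodes.append(pt)
        return len(nodes) - 1

    for p, q in segments:
        u, v = node_id(p), node_id(q)
        length = math.dist(p, q)
        # Keep the shorter edge if two segments join the same pair of nodes.
        graph[u][v] = graph[v][u] = min(length, graph[u].get(v, math.inf))
    return nodes, graph


def shortest_distance(graph, source, target):
    """Dijkstra's algorithm; returns math.inf if no route exists."""
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if u == target:
            return d
        if d > dist.get(u, math.inf):
            continue
        for v, w in graph[u].items():
            nd = d + w
            if nd < dist.get(v, math.inf):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return math.inf


# Two roads that should meet at (100, 0) but were digitized about 0.6 m apart.
roads = [((0, 0), (100, 0)), ((100.5, 0.3), (100, 80))]
nodes_raw, graph_raw = build_graph(roads)                  # no snapping
nodes_snap, graph_snap = build_graph(roads, snap_tol=1.0)  # snap within 1 m
```

Without snapping, no route exists between the two ends of this toy network; with a one-meter tolerance the junction closes and a distance of roughly 180 meters is returned. Choosing the tolerance is itself a judgment call, since too large a value will join roads that do not actually meet.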

       One way of reducing the effort required to obtain a usable roads network dataset is to

digitize only the main roads and then assume a network of feeder roads, which will automatically

have nicely aligned junctions. This approach is used by Staal et al. (2000, 2002) who study

market access and its effect on market participation and technology adoption for smallholder

Kenyan dairy farms. To build a road network linking their sample of farms to Nairobi and other

urban areas they use topographic maps to digitize three classes of roads: 1) all weather, bound

surface, 2) all weather, loose surface, and 3) dry weather only roads. Since this left many of the

surveyed farms still off the actual road network they add a 4-kilometer grid of assumed feeder

roads to fill in the areas between existing roads. It is not clear how much error is introduced by

using this combination of actual and assumed roads.


Confidentiality Issues

       The accuracy of GPS in measuring either household or community locations also poses a

challenge for maintaining the privacy of survey respondents. VanWey et al. (2005) discuss how

the ethical need to ensure confidentiality of information collected about human research subjects

may conflict with a desire to link the characteristics and actions of individuals or households to a

particular geographic location. Uncertainty about how best to proceed in these circumstances

may mean either that spatially explicit data are under-utilized, undermining the role of data

sharing and data preservation in advancing science, or that researchers inadvertently disclose

information that can identify survey respondents. These conflicts affect not only the original

producers of data but also any data archivists charged with maintaining the database and

providing it to other researchers while continuing to honor the commitments to confidentiality

made when the data were first collected.

       Moreover, this conflict between the confidentiality and usability of GPS data is not

limited just to the sharing of data. It also affects the reporting and display of results based on

geo-referenced data. For example, to show the confidentiality challenges posed by mapping point

data, Curtis, Mills and Leitner (2006) re-engineer (i.e., reverse address match back to an

individual residence) a newspaper map showing the locations in New Orleans where deaths

occurred during Hurricane Katrina. The location marks in the newspaper each covered




approximately one and a half city blocks and there were no roads and few other reference points

illustrated. Nevertheless, over 30 percent of the re-engineered locations fell within 25 meters of
the actual residence where a death occurred.19 In several cases both the re-engineered location

and the field verified residence where death occurred were for the same house. The authors

scatter a series of random coordinates throughout the study area to show that chance alone would

not give the same level of discovery. They conclude that: "[t]he fact that many of the re-

engineered coordinates could be used to identify an actual address, or an address within the

immediate vicinity, should sound a note of caution for academics publishing maps displaying

human cases as points" (p.53).

        Typical approaches for maintaining confidentiality with GPS data are to limit access to

approved researchers who promise not to identify respondents, to convert point data to either

surfaces or distances so that individual locations are not revealed, to aggregate and report data

only for larger areas, and to use some geographical masking procedure. These masking

procedures add either stochastic or deterministic noise to the geographic coordinates for sampled

households and communities. Many surveys use a combination of these four approaches.
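
A simple stochastic mask displaces each point by a random distance in a random direction. The sketch below is an illustration only and is not the procedure used by any particular survey: the maximum displacement is an arbitrary parameter, and the conversion from kilometers to degrees uses the approximate equatorial degree lengths quoted earlier, which is adequate for small shifts away from the poles.

```python
import math
import random

KM_PER_DEG_LAT = 110.6          # approximate length of a degree of latitude
KM_PER_DEG_LON_EQUATOR = 111.3  # approximate length of a degree of longitude at the equator


def mask_point(lat, lon, max_shift_km, rng=random):
    """Displace a point by a random bearing and a distance of at most max_shift_km."""
    # The square root spreads masked points evenly over the disc
    # instead of clustering them near the true location.
    shift_km = max_shift_km * math.sqrt(rng.random())
    bearing = rng.uniform(0, 2 * math.pi)
    dlat = shift_km * math.cos(bearing) / KM_PER_DEG_LAT
    dlon = shift_km * math.sin(bearing) / (
        KM_PER_DEG_LON_EQUATOR * math.cos(math.radians(lat)))
    return lat + dlat, lon + dlon


# Mask a hypothetical household location by up to 5 kilometers.
masked_lat, masked_lon = mask_point(-21.14, -175.20, 5.0, random.Random(42))
```

The larger the maximum displacement, the lower the disclosure risk but the greater the measurement error introduced into any distances calculated from the masked coordinates.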

        Stricter approval procedures for obtaining geo-referenced data than for ordinary

household survey data are common. For example, researchers have to provide additional

justifications and commitments before they can obtain (masked) GPS data on community

locations in DHS surveys. An even more stringent approach is to use a data enclave, which is

typically located within a survey organization; accredited researchers come to the enclave to

run their analysis. All output is checked for disclosure risk before release and even in this

monitored environment there are typically restrictions on the linking of datasets and on the

identity or location of individual respondents. Researchers often have to pay entry fees to these

data enclaves and the limited number of enclave locations may act as a barrier to research.

Remote (or virtual) data enclaves are being explored in some countries to overcome these

problems. With a remote enclave a researcher can access the data server where the confidential

data are stored, carry out the desired analysis and obtain aggregated results (e.g. choropleth maps

or regression coefficients). Various rules on minimum cell sizes or the size of spatial units to be

mapped can be imposed so that no individual-identifiable details are provided to the researcher.



19 The validation for the actual location where deaths occurred came from the search and rescue markings on
dwellings, which were recorded during a field survey.



An example is provided by Cromley, Cromley and Ye (2004) where user queries yield results

only if the cell contains at least six records (and the minimum population in the smallest mapping

unit is about 1,000).
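
A minimum cell size rule of this kind is straightforward to apply before results are released. The sketch below assumes each record carries an identifier for the spatial cell it falls in and suppresses any cell with fewer than six records; the function name and data are hypothetical, and this is not the system of Cromley, Cromley and Ye.

```python
from collections import Counter


def suppress_small_cells(cell_ids, min_records=6):
    """Count records per cell, returning None for cells below the threshold."""
    counts = Counter(cell_ids)
    return {cell: (n if n >= min_records else None) for cell, n in counts.items()}


# Hypothetical query result: cell B has too few records to be reported.
released = suppress_small_cells(["A"] * 7 + ["B"] * 5 + ["C"] * 6)
```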

        Point data on household locations can be converted to a continuous surface to represent

the spatial distribution of either characteristics or outcomes without identifying respondents. For

example, geographers have a variety of spatial interpolation methods, such as spatial variants of

the kernel density estimators increasingly used by economists (Bithell, 1990). Alternatively,

point location coordinates can be replaced with distances to various features of interest in any

public release dataset. However, these methods are not very flexible and are likely to limit future

research use of the data. Surfaces do not provide the micro data needed for studies that seek to

measure causal impacts. The features of interest that distances are reported for are likely to vary

from one study to another and as distances to more features are included the possibility of using

triangulation to identify household locations increases. Moreover, distances are often needed for

more than just features of interest. For example, knowing the position of households relative to

other households helps study learning from neighbors (Conley and Udry, 2005) and helps

improve the modeling of spatial autocorrelation (Gibson and Olivia, 2007).
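
The disclosure risk from releasing several distances can be illustrated with a short sketch. It assumes, for simplicity, planar coordinates in kilometers and exact straight-line distances to three features whose locations are public; strictly speaking the calculation is trilateration (locating a point from distances) rather than triangulation. The function name `locate_from_distances` and the example coordinates are hypothetical.

```python
import math


def locate_from_distances(features, distances):
    """Recover a planar location from distances to three known features.

    `features` is a sequence of three (x, y) points and `distances` holds
    the corresponding straight-line distances. Subtracting the first
    circle equation from the other two gives a 2x2 linear system, solved
    here with Cramer's rule.
    """
    (x1, y1), (x2, y2), (x3, y3) = features
    d1, d2, d3 = distances

    a11, a12 = 2 * (x2 - x1), 2 * (y2 - y1)
    a21, a22 = 2 * (x3 - x1), 2 * (y3 - y1)
    b1 = d1**2 - d2**2 + x2**2 - x1**2 + y2**2 - y1**2
    b2 = d1**2 - d3**2 + x3**2 - x1**2 + y3**2 - y1**2

    det = a11 * a22 - a12 * a21
    if abs(det) < 1e-12:
        raise ValueError("features are collinear; location is not unique")

    x = (b1 * a22 - a12 * b2) / det
    y = (a11 * b2 - b1 * a21) / det
    return x, y


if __name__ == "__main__":
    # Hypothetical household and three public features (road, market, school).
    household = (3.0, 4.0)
    public_features = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0)]
    released = [
        math.hypot(household[0] - fx, household[1] - fy)
        for fx, fy in public_features
    ]
    print(locate_from_distances(public_features, released))
```

With rounded or road-network distances the recovered point would only be approximate, but each additional released distance narrows the set of possible locations.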

        Aggregating groups of observations into larger reporting units is a widely used method

for maintaining confidentiality of both survey and census records. With GPS coordinates, the

locations of individual households could be aggregated into larger areal units like Census Blocks

or Census Tracts so that all that is reported is either an administrative code for the larger area or

a polygon showing the boundaries of the area where the household is located. These techniques

can also be applied to the visual display of data by using dot points on a map that are sufficiently

large to prevent disclosure risk. For example, VanWey et al. (2005) show how the size of the

required buffer around the locations of sampled schools in a U.S. survey would need to vary

from only six kilometers in a city to over 50 kilometers in the countryside in order to hold

disclosure risk to only five percent when mapping the sample points. Although aggregation can

reduce disclosure risk it comes at the cost of seriously degrading the analyses that can be

conducted since so much detailed spatial information is lost. For example, Fefferman, O'Neil

and Naumova (2005) describe a case where areal aggregation yields little benefit of

additional privacy and large costs in terms of impaired ability of statistical tools to analyze

patterns of disease incidence.




         Geographical masking methods work by modifying the geographic coordinates linked to

each household or community. Either random perturbations or affine transformations can be

used. If the relationship of sample points to one another is important while the relationship to

another data layer (say, a base map or a road network) is not, then simply moving all points by

some given distance and direction or rotating them about a chosen point may preserve
confidentiality and not greatly degrade usability of the data.20 Normally, however, point locations

obtained with a GPS are merged onto other data layers so these affine transformations will

introduce error that reduces the usability of the GPS data. Another option is to introduce random

perturbation around the original point with the radius of the perturbation circle chosen by the

data custodian, possibly weighting the size of the circle by population density at each point to

take into account the effect of population density on the risk of disclosure (Kwan, Casas and

Schmitz, 2004). For example, many recent DHS surveys include tests for HIV infection and

because of confidentiality issues related to HIV status, up to two kilometers of random error in

any direction is added to cluster locations in urban areas, and up to five kilometers of random

error is added to cluster locations in rural areas. Additionally, one point in each survey with HIV
testing is displaced up to 12 km in any direction.21 Only limited research has been conducted on

the effect of random perturbation on either disclosure risk or the accuracy of results. Kwan et al.

(2004) show that as the size of the perturbation circle increases the accuracy of results

diminishes. Zimmerman and Pavlik (2006) show that releasing metadata about the perturbation

methods and having different masked versions of the same dataset (e.g., a spatially aggregated

dataset and a randomly perturbed dataset) can considerably increase disclosure risk.
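
The two masking approaches can be sketched as follows. This is a minimal illustration under stated assumptions, not the procedure used by any survey: coordinates are treated as planar and measured in kilometers, the displacement distance and direction are each drawn uniformly (the text specifies only the maximum radius), and the function names `displace_cluster` and `rigid_transform` are hypothetical. The radii of two and five kilometers are those quoted above for urban and rural DHS clusters; the occasional larger displacement of up to 12 km is not modeled. The second function covers only rotation plus translation, the distance-preserving special case of an affine transformation.

```python
import math
import random

# Maximum displacement radii (km) quoted in the text for DHS clusters.
MAX_DISPLACEMENT_KM = {"urban": 2.0, "rural": 5.0}


def displace_cluster(point, setting, rng):
    """Randomly perturb one point within the radius for its setting.

    Assumption: distance and direction are both uniform draws.
    """
    x, y = point
    radius = rng.uniform(0.0, MAX_DISPLACEMENT_KM[setting])
    angle = rng.uniform(0.0, 2.0 * math.pi)
    return x + radius * math.cos(angle), y + radius * math.sin(angle)


def rigid_transform(points, angle_rad, shift, pivot=(0.0, 0.0)):
    """Rotate all points about `pivot`, then move them by `shift`.

    Distances between sample points are unchanged, but the points no
    longer line up with other data layers such as a road network.
    """
    cos_a, sin_a = math.cos(angle_rad), math.sin(angle_rad)
    px, py = pivot
    dx, dy = shift
    moved = []
    for x, y in points:
        rx = px + (x - px) * cos_a - (y - py) * sin_a
        ry = py + (x - px) * sin_a + (y - py) * cos_a
        moved.append((rx + dx, ry + dy))
    return moved


if __name__ == "__main__":
    rng = random.Random(2007)  # fixed seed so the sketch is repeatable
    print(displace_cluster((10.0, 20.0), "rural", rng))
    print(rigid_transform([(0.0, 0.0), (3.0, 4.0)], math.radians(30), (100.0, -50.0)))
```

Keeping the seed, rotation angle and shift confidential matters here, since releasing such metadata can increase disclosure risk.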


CONCLUSIONS

         The removal of selective availability and falling costs of GPS receivers have made the

collection of GPS information increasingly feasible in household surveys, yet many household

surveys still do not use this technology. This paper argues that the collection of GPS coordinates

should become a routine part of household survey collection, since doing so can lead to better

economics and better policy advice. In particular, we have shown how GPS is being used to help




20 This is the approach used by the Rural Investment Climate Survey in Indonesia.
21 For more details see: http://www.measuredhs.com/topics/gis/methodology.cfm.



better measure and understand the causal impacts of policies, policy externalities, and access to

services. In addition, using GPS can improve the quality of the household survey data collected.

        Moreover, one of the greatest arguments for collecting GPS information now is the
option value it gives for unforeseen future applications.22 As the stock of geo-referenced data

increases within developing countries, we are likely to see a number of innovative applications

which combine household survey data with other geographical information. There are also a

number of interesting new research questions in econometrics and sampling methodology which

arise out of the use of GPS. For example, typical household surveys often involve population-

based clusters, which are not randomly spread across geographic areas (Montana and Spencer,

2004). As a consequence, more research is needed to determine how to best estimate spatial

models using a sample which is not spatially representative at the local level, and to determine

how to best sample within communities in order to best allow both spatial and non-spatial uses of

the data.


References
        Andrabi, Tahir, Jishnu Das, Asim Khwaja, Tara Vishwanath, and Tristan Zajonc (2007)
"The Learning and Educational Achievement in Punjab Schools (LEAPS)
Report" Mimeo The World Bank.
        Anselin, Luc (2002) "Under the hood: Issues in the specification and interpretation of
spatial regression models", Agricultural Economics 27: 247-67.
        Bateman, I. J., J. S. Brainard, G. D. Garrod and A. A. Lovett (1999a), "The impact of
journey origin specification and other measurement assumptions upon individual travel cost
estimates of consumer surplus: A Geographical Information Systems analysis", Regional
Environmental Change 1(1), 24-30.
        Bateman, I, A. Jones, A. Lovett, I. Lake and B. Day (2002) "Applying Geographical
Information Systems (GIS) to Environmental and Resource Economics", Environmental and
Resource Economics 22:219-69
        Beegle, Kathleen, Joachim De Weerdt and Stefan Dercon (2006) "Kagera Health and
Development Survey 2004 Basic Information Document" The World Bank
www.worldbank.com/lsms/country/kagera2/docs/KHDS2004%20BID%20feb06.pdf [accessed
March 13, 2007].
        Bithell, John (1990) "An application of density estimation to geographical
epidemiology", Statistics in Medicine 9(5): 691-701.
        Borjas, George (2004) "Economics of migration" International Encyclopedia of the
Social and Behavioral Sciences pp. 9803-9809.
        Brady, Henry and Iris Hui (2006) "Is it worth going the extra mile to improve causal
inference? Understanding Voting in Los Angeles County", Mimeo. Department of Political
Science, UC Berkeley.


22 See Turner (2006) for some intriguing ideas with regard to visual representation of geo-coordinates.



        Bruner, Jerome S. and Cecile C. Goodman (1947) "Value and Need as Organizing
Factors in Perception", Journal of Abnormal and Social Psychology, 42: 33-44
        Burchfield, Marcy, Henry Overman, Diego Puga and Matthew Turner (2006) "The
determinants of sprawl: A portrait from space", Quarterly Journal of Economics 121(2):
587-633.
        Case, Anne (1991) "Spatial patterns in household demand", Econometrica 59(4): 953-65
        Conley, Timothy (1999) "GMM estimation with cross-sectional dependence", Journal of
Econometrics 92(1): 1-45.
        Conley, Timothy and Christopher Udry (2005) "Learning about a new technology:
Pineapple in Ghana", Mimeo. Yale University.
        Cowen, David and John Jensen (1998) "Extraction and Modeling of Urban Attributes
using Remote Sensing Technology", pp. 164-88 in D. Liverman, E. Moran, R. Rindfuss and P.
Stern (eds.) People and Pixels: Linking Remote Sensing and Social Science, National Academy
Press, Washington D.C.
        Cromley, Ellen, Robert Cromley and Yanlin Ye (2004) "On-line reporting and mapping of
spatially aggregated individual records selected by user queries" Cartographica 39(2): 5-13.
        Curtis, Andrew, Jacqueline Mills and Michael Leitner (2006) "Spatial confidentiality and
GIS: re-engineering mortality locations from published maps about Hurricane Katrina"
International Journal of Health Geographics 5(1): 44-56.
        Deininger, Klaus and Bart Minten (2002) "Determinants of Deforestation and the
Economics of Protection: An Application to Mexico", American Journal of Agricultural
Economics 84(4): 943-960.
        Duflo, Esther and Rohini Pande (2007) "Dams", Quarterly Journal of Economics 122(2):
forthcoming.
        Duranton, Gilles and Henry Overman (2005) "Testing for Localisation Using Micro-
Geographic Data", Review of Economic Studies 72(4): 1077-1106.
        Dwolatzky, Barry, Estelle Trengove, Helen Struthers, James McIntyre and Neil
Martinson (2006) "Linking the global positioning system (GPS) to a personal digital assistant
(PDA) to support tuberculosis control in South Africa: a pilot study", International Journal of
Health Geographics, August 16, 5:34.
        El-Rabbany, Ahmed (2006) Introduction to GPS: The Global Positioning System Artech,
Boston, MA.
        Entwisle, Barbara, Albert Hermalin, Peerasit Kamnuansilpa, and Apichat
Chamratrithirong (1984) "A Multilevel Model of Family Planning Availability and
Contraceptive Use in Rural Thailand" Demography, 21(4): 559-574.
        Entwisle, Barbara, Ronald R. Rindfuss, Stephen J. Walsh, Tom P. Evans, and Sara R.
Curran (1997) "Geographic Information Systems, Spatial Network Analysis, and Contraceptive
Choice" Demography, 34(2): 171-187.
        Escobal, Javier and Sonia Laszlo (2005) "Measurement Error in Access to Markets",
Mimeo. McGill University.
        Fafchamps, Marcel and Jackline Wahba (2006) "Child labor, urban proximity and
household composition" Journal of Development Economics 79(2): 374-397.
        Fefferman, Nina, Eileen O'Neil and Elena Naumova (2005) "Confidentiality and
confidence: Is data aggregation a means to achieve both?" Journal of Public Health Policy 26(4):
430-449.





       Freedman, David (2004) "Ecological inference and the ecological fallacy" International
Encyclopedia of the Social and Behavioral Sciences pp. 4027-4030.
       Gibson, John, Geua Boe-Gibson, Halahingano Rohorua and David McKenzie (2006)
"Efficient Financial Services for Development in the Pacific", Mimeo. University of Waikato
and World Bank.
       Gibson, John and Susan Olivia (2007) "Spatial autocorrelation and non-farm rural
enterprises in Indonesia", Paper presented at the 51st Conference of the Australian Agricultural
and Resource Economics Society, Queenstown, February, 2007.
       Goldstein, Markus and Christopher Udry (1999) "Agricultural Innovation and Resource
Management in Ghana", Final Report to IFPRI under MP17, Mimeo. Yale University.
       Heckman, James, Hidehiko Ichimura, and Petra Todd (1997), "Matching as an
Econometric Evaluation Estimator: Evidence from Evaluating a Job Training Programme,"
Review of Economic Studies 64(4), 605-654.
       Goodchild, Michael (1992), "Geographical Information Science", International Journal
of Geographical Information Science 6(1): 31-45.
       Hong, Rathavuth, Livia Montana and Vinod Mishra (2006) "Family planning services
quality as a determinant of use of IUD in Egypt", BMC Health Services Research, June 22, 6:79.
       Hoxby, Caroline (2000), "Does Competition Among Public Schools Benefit Students and
Taxpayers?", American Economic Review, 90(5):1209-38.
       Kumar, Naresh (2007) "Spatial sampling for collecting demographic data" Paper
presented at the Annual Meeting of the Population Association of America, New York, March,
2007.
       Kwan, Mei-Po, Irene Casas and Ben Schmitz (2004) "Protection of geoprivacy and
accuracy of spatial information: How effective are geographical masks?" Cartographica 39(2):
15-28.
       Landry, Pierre F. and Mingming Shen (2005) "Reaching Migrants in Survey Research:
The Use of Global Positioning System to Reduce Coverage Bias in China", Political Analysis
13:1-22
       Lalasz, Robert (2006) "In the news: The Nigerian Census", Population Reference Bureau,
http://www.prb.org/Template.cfm?Section=PRB&template=/ContentManagement/ContentDispla
y.cfm&ContentID=13767 [accessed December 28, 2006].
       McKenzie, David, John Gibson and Steven Stillman (2006) "How Important is
Selection? Experimental Vs Non-experimental Measures of the Income Gains from Migration",
World Bank Policy Research Working Paper No. 3906
       McKenzie, David, John Gibson and Steven Stillman (2007) "A land of milk and honey
with streets paved with gold: Do emigrants have over-optimistic expectations about incomes
abroad?" Mimeo The World Bank.
       Miguel, Edward and Michael Kremer (2004) "Worms: Identifying Effects on Education
and Health in the Presence of Treatment Externalities", Econometrica 72(1): 159-217.
       Montana, Livia and John Spencer (2004) "Incorporating Geographic Information into
MEASURE surveys: A Field Guide to GPS Data Collection", MeasureDHS,
http://www.measuredhs.com/basicdoc/gps/DHS_GPS_Manual.pdf [accessed February 21, 2007].
       Noor, Abdisalan, Abdinasir Amin, Peter Gething, Peter Atkinson, Simon Hay and Robert
Snow (2006) "Modelling distances travelled to government health services in Kenya", Tropical
Medicine and International Health 11(2): 188-96.





        Olken, Benjamin (2006) "Do Television and Radio Destroy Social Capital? Evidence
from Indonesian Villages", BREAD Working Paper No. 130
        Oster, Emily (2006) "HIV and Sexual Behavior Change: Why not Africa?", Mimeo.
University of Chicago.
        Overman, Henry G. (2006) "Geographical Information Systems (GIS) and Economics",
forthcoming in S. Durlauf and L. Blume (eds.) The New Palgrave Dictionary of Economics,
Palgrave Macmillan.
        Perry, Baker and Wil Gessler (2000) "Physical access to primary health care in Andean
Bolivia", Social Science and Medicine 50(9): 1177-88.
        Ramachandran, R (2000) "Public access to Indian geographical data", Current Science
79(4): 450-467.
        Ravallion, Martin (2006) "Evaluating Anti-Poverty Programs", forthcoming in R.E.
Evenson and T.P.Schultz (eds.) Handbook of Development Economics, Volume 4, Amsterdam,
North-Holland.
        Roberts, Peter, KC Shyam and Cordula Rastogi (2006) "Rural Access Index: A Key
Development Indicator" Transport Papers No. 10, Transport Sector Board, The World Bank.
        Rosenthal, Stuart and William Strange (2006) "The attenuation of human capital
spillovers: A Manhattan skyline approach", Paper presented at the AEA Annual Meeting,
January 6, 2006.
        Rosero-Bixby, Luis (2004) "Spatial Access to Health Care in Costa Rica and its Equity:
A GIS-Based Study", Social Science and Medicine 58(7): 1271-84.
        Spencer, John, Brian Frizzelle, Philip Page and John Vogler (2003) Global Positioning
System: A Field Guide for the Social Sciences, Blackwell Publishing: Oxford.
        Staal, S., Delgado, C., Baltenweck, I., and Kruska, R. (2003) "Spatial aspects of producer
milk price formation in Kenya: a joint household-GIS approach", Paper presented at the 24th
Conference of the International Association of Agricultural Economists, Berlin, August, 2000.
        Staal S., Baltenweck I., Waithaka M., de Wolff T. and Njoroge L. (2002)
"Location and uptake: Integrated household and GIS analysis of technology adoption and land
use, with application to smallholder dairy farms in Kenya", Agricultural Economics
27(2): 295-315.
        Thornton, Rebecca (2006) "The Demand for and Impact of Learning HIV Status:
Evidence from a Field Experiment", Mimeo. University of Michigan.
        Turner, Andrew (2006) "Introduction to Neogeography", O'Reilly Short Cuts, December.
        USAID Timor-Leste (2004) "East Timor Completes the World's First GPS-Based
Census", USAID Timor-Leste Small Grants Program Highlights,
http://timor-leste.usaid.gov/PrintVersion/SGArchive51Print.htm [Accessed December 28, 2006]
        VanWey, Leah, Ronald Rindfuss, Myron Gutmann, Barbara Entwisle, and Deborah Balk
(2005) "Confidentiality and spatially explicit data: concerns and challenges", Proceedings of the
National Academy of Sciences 102, pp.15337-15342.
        Woodruff, Christopher, and Rene Zenteno (2007) "Migration networks and
microenterprises in Mexico", Journal of Development Economics, 82(2): 509-28.
        Zimmerman, Dale and Claire Pavlik (2006) "Quantifying the effects of mask metadata
disclosure and multiple releases on the confidentiality of geographically masked health data",
Mimeo Department of Biostatistics, University of Iowa.



