BIG DATA and Thriving CITIES
Innovations in analytics to build sustainable, resilient, equitable and livable urban spaces

CONTENTS

INTRODUCTION
    What is Big Data in a Development Context?
    Big Data in Urban Development
    Profiling Big Data Approaches
CASE STUDY: Using Geospatial Data to Track Changes in Urbanization
CASE STUDY: Big Data to Beat Congestion
CASE STUDY: People Power: Crowdsourcing to Track Urban Crime
CONCLUSION: Big Data for Optimal Cities of the Future
REFERENCES

INTRODUCTION

Economic growth and urbanization are complementary and critical to poverty reduction in low- and middle-income countries. With 450 million new urban dwellers expected between 2010 and 2040, the typical African city will double its population and many new cities will be built [1]. For many developing countries this presents an opportunity to leapfrog development cycles, catch up and even surpass some of the 21st-century hubs of high-income nations.

Rapid urbanization is frequently unplanned, as cities struggle to keep up with the infrastructure needs of fast-growing populations. Most rapidly expanding cities also lack the data needed even to monitor growth, let alone to guide it. Newly available technologies, techniques and intellectual capital provide a huge opportunity to help close this gap.

These new approaches are already being deployed worldwide in order to improve insights into urbanization and for data-driven decision making. The recent global diffusion of new technologies, combined with the use of big data analytics, can help policymakers promote the effective development of future cities that provide living and work environments in which citizens can thrive [2].

In particular, innovative applications of geospatial and sensing technologies and the penetration of mobile phone technology are providing unprecedented data collection opportunities. This data can be analyzed for many purposes, including tracking population and mobility, private sector investment, and transparency in federal and local government. It can be visualized accurately and in real time, helping policymakers to make informed investments in infrastructure to meet both short- and long-term needs, while minimizing opportunity costs.

To help development practitioners within and beyond the World Bank take advantage of these trends, this brief profiles a sample of big data applications to support improved urban development in low- and middle-income countries. It also cites potential opportunities for big data analytics to help developing nations achieve sustainable urban growth, while reducing the economic differential with high-income countries.

1. https://openknowledge.worldbank.org/bitstream/handle/10986/16657/815460WP0Afric00Box379851B00PUBLIC0.txt?sequence=2
2. The living environment determines the quality of housing, access to public goods, opportunities for recreation and socialization, and life prospects of children. The work environment determines the opportunities that exist for education, jobs and income generation.

WHAT IS BIG DATA IN A DEVELOPMENT CONTEXT?

Big data is an umbrella term used to describe the constantly increasing flows of data emitted from connected individuals and things, as well as a new generation of approaches being used to deliver insight and value from these data flows. It is said that more data has been generated in the past two years alone than in all previous years combined. Big data can be defined as high-volume, high-velocity and high-variety datasets that can be analyzed to identify and understand previously unknown patterns, trends and associations.

While most of the attention given to big data has focused on high-income countries, the rapid diffusion of technologies such as the internet, cellphones, ground sensors, drones and satellites – to name a few – is also driving big data innovation in low- and middle-income countries. And while data flows in the developing world are typically smaller and less diverse than in the developed world, they still present incredible opportunities for data scientists, economists and statisticians to use big data to enhance or supplement traditional analytical approaches.

Developing world big datasets still maintain many of the unique characteristics that make big data different from traditional datasets. They include the comparatively large volume of data, its varied and unconventional sources, and the relative speed with which it accumulates. Such characteristics enable analysts to access radically new insights and understand phenomena that traditional data collection systems cannot offer. Unlike previous sources of development data, such as household surveys, which address specific research questions, big data is usually produced in the course of some other activity (such as making a cellphone call). This, along with the size and complexity of some datasets, requires different research methods. Big data analytics is the emerging set of tools and methods to manage and analyze this explosive growth of digital information. It includes data science methods such as machine learning, predictive analytics and visualization. These methods offer significant opportunities to draw on real-time information to address development challenges.

The spread of analytics expertise, open-source software and low-cost analytics packages means that big data will become increasingly indispensable in helping low- and middle-income countries analyze trends and develop policy. The potential of big data in developing nations will only grow as they continue to digitize fast.

BIG DATA IN URBAN DEVELOPMENT

Big data and urban spaces are closely connected. The concentration of people and data-producing assets such as phones, traffic and security cameras means cities naturally generate massive amounts of data. Formal government data on factors such as health, the economy or the environment are also prolific in cities. In lower-income countries, capacities for data collection and analysis are often most available in large cities, where technology and state services are most developed. This abundance of data means big data analytics has strong potential to help generate new approaches and solutions to challenges in urban development.

There are, however, particular challenges to using big data in low- and middle-income cities. Analytics alone cannot create better cities. It can inform the development and implementation of solutions, but government policies, investment, financing, local support and capable institutions are also essential. Less developed urban environments may lack the technological resources to collect data, such as closed-circuit TV, smartphones or traffic sensors. Biases can be introduced when data is not representative – for example, the lower end of the income distribution can be missed from data collected through cellphones. While not an important issue in higher-income countries, bias is likely to be a serious challenge in lower-income contexts. In addition, the extensive informal exchange networks that are vital to urban economies in low- and middle-income countries are extremely difficult to track, even with the latest technology. This suggests that big data approaches in such contexts cannot rely on a hardwired urban data collection infrastructure, such as traffic monitors, or high-tech tools such as GPS-enabled smartphones. Rather, they may be best applied to phenomena that can be tracked using cheaper passive systems, such as low-cost cameras, satellite imaging, cellphones and crowdsourcing. National restrictions on privacy or the use of data may also affect the extent to which big data can be used in specific countries.

Despite these potential hurdles, numerous innovative big data programs have already been created or implemented in low- and middle-income countries. Many are on a par with the developed world and may even present models for high-income cities to follow.

PROFILING BIG DATA APPROACHES

The increased availability and reduced cost of technology is allowing low- and middle-income countries to leapfrog many of the steps taken in high-income nations to develop a robust technological infrastructure. As digital progress continues, many more big data approaches will be harnessed for urban development in lower-income countries, at a faster rate than before.

Numerous examples show that such approaches offer valuable new ways of understanding emergent events and trends that are difficult to measure. This brief offers three key case studies, and many more examples, as a taster of the potential of big data analytics in low- and middle-income urban contexts. These cover different implementation timeframes, from already being deployed to expected readiness within the next 10 years. They focus on how the outputs of big data analytics can be used to improve policy in five focus areas of the World Bank's urban work: Green Cities, Inclusive Cities, Systems and Governance, Resilient Cities and Competitive Cities.

The brief's conclusion looks at big data's role in shaping the cities of the future, highlighting exciting technologies and techniques that could emerge in low- and middle-income countries over the coming decade. All these big data approaches are applicable in a wide variety of countries and contexts. They could achieve improvements as diverse as informing better urban planning, helping beat congestion and crime, or promoting financial services and efficient tax collection. Together they showcase new possibilities for innovation in the quest for functioning, resilient and sustainable cities that enhance their residents' lives.

KEY TO CASE-STUDY TIMEFRAMES
• Big Data Near-Term Solutions
• Big Data Solutions in Development (3-5 years)
• Big Data Solutions of the Future (up to 10 years)

KEY TO CASE-STUDY POLICY AREAS
• Resilient Cities
• Inclusive Cities
• Green Cities
• Competitive Cities
• Systems & Governance

World Bank staff can find tools, training and knowledge on big data on the intranet at //bigdata

CASE STUDY
Using Geospatial Data to Track Changes in Urbanization

NEAR-TERM

HIGHLIGHTS
• Analysis of Earth observation (EO) big data from satellites and sensors can help stakeholders track and understand urban development over time.
• Working with providers of geospatial data, the World Bank has carried out analytics to measure the qualitative and quantitative aspects of urban transformation, such as the distribution and density of urban sprawl, changes in land use and the growth rate of built-up areas. This allows analysts to begin to understand the drivers of land consumption.
• Combining EO data with other data on population and growth reveals dramatic insights on the overall economic viability, inclusivity, resilience, sustainability and quality of life in urban areas. This enables stakeholders and policymakers to develop and implement informed policies in response, to create thriving cities of the future.

PUBLIC POLICY USES FOR EO BIG DATA
• Track the growth of urban areas and understand economic drivers.
• Evaluate the current state of amenities and identify opportunities and priorities for developing dynamic, equitable, sustainable and resilient cities.
• Underpin smart policymaking to promote optimal spatial and transportation links between jobs, affordable property, health and education services, and recreational areas.

Practice Areas: Resilient Cities, Inclusive Cities
Countries Involved: Various in South Asia, East Asia and Africa
Data Types: Remote sensing, satellite imagery

The World Bank has been using big data to track and study changes in urbanization in low- and middle-income countries, to help ensure that it is sustainable, equitable and supports economic growth. As the world's population becomes more urbanized, reducing unemployment and promoting sustainability and resilience in urban economies is vital. However, much urbanization is not well planned or managed in developing countries, resulting in cities which have grown rapidly but lack critical infrastructure and are unable to take advantage of economies of scale. For citizens, this means poor transport and services, weak links between job opportunities and the workforce, increased inequality, low resilience to shocks and a lack of sustainability.

Successful urbanization depends on the coordination of three distinct but interdependent processes: public investment in infrastructure, private investment in productive capital and household investment in housing [3]. However, the speed of development and the lack of information available to each set of actors often prevents this coordination. Big data can play an important role filling this gap and facilitating improved coordination. It is especially valuable where the speed and pace of urbanization outstrip the authorities' ability to understand how cities are growing and changing. Without this baseline information, policymakers cannot meet the challenges and opportunities of urbanization through properly informed decisions.

3. Source: Building African Cities that Work: A study on the Spatial Development of African Cities (P148736). World Bank concept note, November 13, 2013

Visualizing urban landscapes

In South and East Asia, the World Bank has explored the patterns, consequences and policy implications of cities' spatial development by drawing on the increasing availability of spatial data and developments in analytics. Satellite or Earth observation (EO) data can deliver quality results to measure urban growth over a wide range of spatial and temporal scales, particularly when combined with data from other sources. The resulting digital urban maps provide an up-to-date, accurate and cost-effective resource to help national, regional and city governments understand the nature of urban development and make informed decisions. EO datasets allow harmonized and standardized measurements, enabling spatially and temporally consistent comparisons, as well as global assessment. Such data is particularly important for monitoring and understanding the evolution of cities – for example, allowing officials to see when built-up areas spill across formal administrative boundaries. This indicates the need to work with adjoining administrative areas on issues such as connective infrastructure (roads, water mains) or collecting garbage.

Drawing on analysis of EO data, the World Bank has built a database that describes the speed, magnitude and spatial form of urbanization. Using these data, Bank teams have examined the drivers and impacts of the nature of urbanization and how the urban landscape has evolved to its current state. This provides a baseline from which to understand the effects of policy change and identify priorities for new initiatives. In particular, the Bank focused on exploring the institutional frameworks for urban management (such as mechanisms to coordinate service delivery across administrative jurisdictions), investment (in transport, services and other network infrastructure) and regulation (such as land use, zoning and pricing of services).

In 2008, in close collaboration with the European Space Agency (ESA), the World Bank launched the "Earth Observation for Development" initiative. This provides data on urban and other trends in areas where data are traditionally scarce and often unreliable. Such information can be used to establish project baselines against which progress can be gauged, mitigation measures determined and high-priority issues identified. The initiative focuses on a number of areas, including urban development and related fields such as disaster risk management, the environment, water and energy. To facilitate greater collaboration towards these objectives between policymakers and other development stakeholders, the Bank also developed the Platform for Urban Management and Analysis (PUMA) [4]. This tool allows users with no prior GIS experience to access, analyze and share urban spatial data in an interactive and customizable way.

These initiatives have led to more than 30 World Bank technical assistance projects delivered to national or city partners or due for completion in the 2008-18 period. The results have been used to maintain and strengthen policy dialogues, as well as to guide the design of new projects. They have led to highly specialized big data mapping products and monitoring systems that leverage EO data for South Asian cities – for example, mapping urban extent and analysis of the internal spatial structures of different cities, or monitoring land subsidence in a city where a high reliance on tube wells for water is lowering the water table. The World Bank-ESA partnership has been expanded until 2018, with Urban Development among its top three priority themes. It aims to provide a systematic source of development information, so that stakeholders can draw on state-of-the-art EO capabilities to develop best practices and sustainability plans.

4. Available at http://puma.worldbank.org/

Mapping the megacities

Under the World Bank's South Asia Megacities Improvement Program, EO big data was used to analyze 20 years of urban expansion in the metropolitan areas of Delhi, Mumbai and Dhaka. These data enabled measurement of the qualitative and quantitative aspects of urban transformation, such as the distribution and density of urban sprawl, changes in urban land use and the growth rate of built-up areas. Using this information, analysts can see how informal settlements grow outside the cities' administrative boundaries, and can begin to understand the drivers of land consumption.

The analysis revealed important insights into land cover and use in the three cities (see Figure 1), showing the percentage of land taken by residential build-up, industrial build-up, agriculture, natural or semi-natural vegetation and forest. This helps city planners and development stakeholders understand existing requirements and plan for future needs. In Delhi, for example, the maps show that urban sprawl is accelerated by industrial development. This mainly took place between 2003-10, although a significant increase in construction sites indicates that it will continue in the future – and must therefore be planned for.

Digitized spatial data allows analysis at different administrative levels: metropolitan, city, district or sub-district, as well as other non-administrative units. Such datasets allow flexible aggregation, for instance, showing the proportion of sprawl by district, its distribution and density, class evolution within urban areas and the drivers of urban change. (If housing comes before roads, for example, this indicates informal and incremental city building. If roads come first, development is formal.) Combined with environmental or socio-economic data, the data can provide information concerning the ratio of population growth to urban growth, and can measure indicators such as compactness (as a function of city density), the ratio of green space to citizens, and the proximity and accessibility of green areas.

Figure 1: Sample visualizations from the South Asia geospatial analysis

In 2014, the World Bank used satellite imagery and demographic data to measure expansion and population change between 2000-10 in East Asian urban areas of 100,000 people or more. Analysts used change-detection methods that draw on satellite data from the Landsat remote sensing project operated by the US Geological Survey and the National Aeronautics and Space Administration (NASA). These maps rely on a geophysical definition of built-up areas as landscape units with more than 50 percent coverage of non-vegetative, human-constructed elements. These areas were combined with the AsiaPop map, from the WorldPop mapping project [5]. The refined land cover datasets were then combined with population density information derived from census data, and used to disaggregate population counts to a grid of 100 m × 100 m cells. This approach has allowed the Bank to understand the entire region so as to establish systematically where urbanization is occurring, to what degree and how quickly. This highlights the responses required from stakeholders, such as meeting needs for services (water, sanitation or transport) and regulation. Analysts also quantified the relationship between urbanization, income growth and inequality.

The approach provided critical information on the growth of urbanization in East Asia. For example, by 2010, the region had 869 urban areas with more than 100,000 people (600 of these in China). Populations in these cities had grown dramatically, lifting the region's new urban population to approximately 200 million people. However, despite this growth, only 36 percent of the region's total population lives in urban areas. This suggests that the trend in urban growth is likely to continue for many decades. Lower-middle-income countries such as Indonesia, the Philippines and Vietnam had the fastest urban population growth, whereas upper-middle-income countries such as China had the fastest spatial growth (hence most urban areas outside China became denser, while the density of many Chinese urban areas declined). Most of this growth occurred in small and medium-sized cities rather than the megacities. Unsurprisingly, urban density was high, and increased through the period.

Findings from EO big data can help coordination between public investment in infrastructure, private investment in productive capital and household investment in housing. They can allow policymakers to promote optimal spatial and transportation links between jobs, affordable housing and business units, health and education services and recreational areas. These insights can also be used to help support recent rural-to-urban migrants, bringing them to the attention of city authorities and ensuring that rapid urbanization is inclusive. As EO-based big data techniques spread across Africa and other developing regions, and are refined and adapted, they will provide valuable tools and insights to policymakers, and even greater benefits for the citizens of the future.

5. http://www.worldpop.org.uk

Creating livable cities

Further EO big data approaches are also helping drive sustainable urban development. Research into using high-resolution satellite data for poverty mapping in Sri Lanka draws on emerging techniques that can profile fast-changing urban areas in near real time. These techniques can identify built-up area, building and car density, and types of roofing and road. Using open-source image-processing algorithms, they can even calculate whether buildings are more rectangular or have more chaotic angles (indicating higher poverty) and construct poverty indicators such as the percentage of paved roads in an area. This helps stakeholders target their interventions precisely where they are most needed.

In Kosovo, after a period of chaotic urban expansion, geospatial data collected by drones has been used to secure property rights quickly and cheaply – vital to household security and economic growth. The drones record imagery which is processed into high-resolution orthophotographs (aerial photographs corrected to have the same lack of distortion as a map) in a fraction of the time of conventional aerial surveys. The orthophotos are used to gather property boundary information from local residents, which can be used for formal property registration. In the fast-growing city of Ferizaj, this approach was used to support a government program for unregistered land owners to legalize their property rights. The data from the drone flights were processed in 24 hours using two local high-end desktop computers, resulting in orthophoto maps from which land owners could easily identify their property. The initiative can be scaled up globally, especially to secure land rights in low-income countries.

All these projects demonstrate that analysis of EO big data can be a significant tool for managing urban development in low- and middle-income countries. It can measure, baseline and track the growth of urban areas, and highlight the drivers of economic growth. This allows policymakers and stakeholders to better understand the factors leading to inefficiencies and inequality in urban areas, and to develop informed policies in response. They can also build resilience into urban environments, so that residents, institutions, businesses and systems can adapt to chronic stresses or acute shocks. And they can create livable cities that fulfill their residents' needs.

An Earth observation project shows a successful approach to tracking changes in urbanization using geospatial data

HIGHLIGHTS
• New cloud-based computational platforms such as Google Earth Engine (GEE) enable urbanization to be monitored using multi-spectral imagery from multiple satellites over extended time periods.
• Current and historic satellite data can be ground-truthed by manually labeling areas as "built-up" or "not built-up". A machine-learning algorithm allows for the conversion of these data into highly accurate classification of land for a particular area – in this case, cities in India.
• This allows urbanization to be measured with a high degree of geographic precision and close to real time, which will transform public policy design to help tackle the economic and environmental challenges of urbanization.

POTENTIAL USES FOR MONITORING DATA FROM MULTIPLE SATELLITES OVER TIME
• Accurate, near-real-time detection of urban areas – even across large-scale regions with diverse land cover – allowing urbanization to be traced and understood
• Urban planning based on unprecedented levels of information about past, current and predicted urban growth
• Understanding of the ecological, environmental, social and economic impacts of urbanization, from land classification data combined with other geographic data

While urbanization in rapidly growing nations is helping lift hundreds of millions of people out of poverty, it is also creating immense societal challenges. It is expanding greenhouse gas emissions, destabilizing fragile ecosystems, and creating new demands on education, health, and transportation infrastructure. Despite the importance of understanding the drivers of urban growth, it is still not possible to quantify the magnitude and pace of urbanization at a global scale. Standard empirical approaches use data from household surveys, which are costly to carry out, produced infrequently and subject to often-severe measurement problems. Reliable and up-to-date data on urbanization – particularly from developing countries – remain scarce.

The coming revolution in geospatial data holds the potential to transform the way in which we study cities. As satellite imagery at ever-improving spatial and temporal resolutions becomes available, new approaches to machine learning are being developed that convert these images into meaningful information about the nature and pace of change in urban landscapes. Research on land use is rapidly shifting towards remote-sensing methods designed to capture urban features as they are observed in terrestrial Earth observation data.

Despite the promise of satellite imagery for urban analysis, current information on land use is still subject to many drawbacks. Existing satellite-based classifications of urban areas cover limited geographic extents and time periods, and frequently disagree in terms of the size and shape of particular cities. Progress is further inhibited by lack of large datasets that give the "ground truth" regarding urbanization, which are essential for validating the micro-detailed maps of urban areas that remote-sensing methods produce. These deficiencies mean that urbanization still cannot be tracked with a high degree of precision across space and time.

Global-scale analysis of satellite imagery

As powerful new cloud-based computational platforms become available to the research community, it is becoming feasible to monitor urbanization using multi-spectral imagery from multiple satellites over an extended time period. One such platform is Google Earth Engine (GEE). GEE leverages cloud-computational services for planetary-scale analysis and consists of petabytes of geospatial and tabular data. This includes a full archive of scenes from the US remote sensing project Landsat (as well as other satellites), together with a JavaScript- and Python-based application programming interface (API), and algorithms for supervised and unsupervised image classification.

Recent World Bank research demonstrates the applicability of GEE for studying urban areas at scale. It provides, for the first time, reliable and comprehensive open-source, ground-truth data for supervised image classification that delineates urban areas – in this case, in India. As a large, geographically diverse nation undergoing a rapid urban transition, India represents an ideal context in which to illustrate the applicability of new approaches for mapping urban expanse. The team leveraged the computational power of GEE and its full Landsat archive to introduce a practical and adaptable procedure for analysis of urban areas at a global scale.

The dataset consists of 21,030 polygons that were manually labeled as "built-up" urban areas or "not built-up" areas. To generate a machine-learning algorithm that allows for the conversion of these ground-truth data into a classification for the country as a whole, the team assessed alternative supervised classifiers (Random Forest, Classification and Regression Tree (CART) and Support Vector Machine) and examined the effects of various inputs and class combinations on the performance of the classifiers. They proposed a methodology – "spatial k-fold cross-validation" – to evaluate the extent to which the classifiers could be generalized in spatial terms, and their performance in a large and geographically heterogeneous context.

Figure 2: Classification of built-up areas (visualized in red) compared to raw satellite images in three regions in India (Classifier: Random Forest with 10 trees; Input: Landsat 8). Satellite images from DigitalGlobe. Includes copyrighted material of DigitalGlobe, Inc. (Westminster, CO, USA), All Rights Reserved.

Accurate mapping across space and time

The team found that their ground-truth dataset can be used in GEE to produce high-quality maps of built-up areas in India, across space and time. Their classification captures the fabric of built-up urban areas, as well as the fine boundaries between cities and their peripheries.

In validating the classification results, the team showed that the dataset, when used with standard classifiers available in GEE, achieves a high overall accuracy rate of around 87 percent in identifying built-up areas for grid cells with a dimension of 30 meters. Of the three types of classifiers examined, Random Forest achieved the best performance (a balanced accuracy rate of 80 percent). However, the performance of the classifiers strongly improves as the size of the training dataset increases (especially with CART). With the spatial k-fold cross-validation methodology, which is designed to evaluate the spatial generalizability of classifiers, the team showed that the classifiers also perform well when the training examples are sampled from areas with heterogeneous land cover (such as a mix of dense vegetation and bare ground).

As inputs for the classifiers, the team used Landsat 7 and Landsat 8, launched in 1999 and 2013 respectively. Although Landsat 7 is of a lower spectral resolution than Landsat 8, it allows for a longer time horizon over which to study urbanization. The project demonstrated that the addition of two pixel-level indices to Landsat 7 as inputs to the classifiers – the Normalized Difference Vegetation Index and the Normalized Difference Built-up Index – improves classifier performance so that it approaches the performance achieved when Landsat 8 is used as the input. This shows that the classifiers can be used to detect urban areas in historic data. When the extent of urban areas in 2000 was mapped in this way, the team found a high overall classification accuracy of 86 percent.

Towards precise, real-time urban measurement

Urbanization is a fundamental driver of economic growth in the 21st century. Although it implies massive productivity gains for an economy, it is also accompanied by congestion, pollution and heavy demands on public-sector resources. Understanding the ecological, environmental, social and economic impacts of these changes is essential for preserving a sustainable society.

As parallel computational platforms become accessible to researchers, it is now possible to expand urban research across space and time. In this study the team developed a large-scale geo-referenced dataset that was used to facilitate the detection of urban areas at a national level, and to provide a useful and reliable tool for temporal analysis of urban zones and their rural peripheries. The study highlights the potential of GEE for urban research and illustrates the applicability of the dataset for the detection of urban areas in a country with a large population and a diverse land cover. The methodology and the evaluation procedure are suitable for studies that analyze large-scale regions, and can easily be applied to other countries and contexts.

However, the team noted several limitations for their approach and made recommendations for future studies. First, the dataset was labeled according to 2014-15 imagery using a visual-interpretation method, which, by its nature, may be subject to idiosyncratic variation across individuals performing the manual classification. It is necessary to ensure that across the dataset each example is labeled by multiple people and to account for the agreement between them. Second, the analysis is limited to India. Creating manually labeled ground-truth data is expensive and time consuming. However, crowdsourcing platforms may allow researchers to scale – at low cost – the labeling method and to construct larger and more comprehensive ground-truth datasets. Third, the sampling method used in this study was designed to detect the boundaries between built-up areas and their periphery. The majority of the labeled examples were sampled from highly populated areas and from their adjacent, low-population environs. This approach may create a risk of false-positive detections when classifying distant or remote areas. It is therefore suggested that future projects taking this approach include in the training set examples from remote areas that are less populated.

New studies that take into account these limitations and successfully exploit emerging approaches such as crowdsourced information will enable the measurement of urbanization with a high degree of geographic precision and in close to real time. This has the potential to transform how public policy is designed, helping planners to achieve successful urban growth and address many of the most persistent challenges of economic development.

Big Pixel, Case Study of India: University of California San Diego. Big Pixel Research Team: Ran Goldblatt with Wei You, Gordon Hanson and Amit K. Khandelwal.
The study was published in Remote Sensing: Goldblatt, R.; You, W.; Hanson, G.; Khandelwal, A.K. Detecting the Boundaries of Urban Areas in India: A Dataset for Pixel-Based Image Classification in Google Earth Engine. Remote Sens. 2016, 8, 634.
Google Earth Engine: https://earthengine.google.com/

CASE STUDY
Big Data to Beat Congestion

NEAR-TERM

HIGHLIGHTS
• Growing cities are being challenged by the burden of large populations they were not built to accommodate. In response, they need to innovate to address problems of congestion and infrastructure improvement.
• Big data analytics can help cities better understand and manage traffic and infrastructure needs, even in circumstances where traditional data-collection methods cannot be applied.
• Successful approaches include congestion reduction models based on low-resolution traffic cameras, using cellphone data to understand people's intra-city travel needs, and analyzing GPS data from smartphones to assess traffic flow.

BEYOND TRAFFIC: POTENTIAL USES FOR DATA COLLECTION FROM CAMERAS AND SENSORS
• Track and manage city service delivery, such as waiting times in government offices, water and sewage systems or garbage collection
• Maintenance of law and order: tracking crowd formation and criminal activity
• Disaster management and response

Practice Areas: Systems and Governance, Green Cities, Competitive Cities
Countries Involved: Kenya, Vietnam, Tanzania, Haiti, the Philippines
Data Types: Low Resolution Cameras, Traffic Sensors

The United Nations estimates that the global urban population will grow by 2.5 billion by 2050, either through migration or the urbanization of rural areas. Ninety percent of this increase will occur in Asia and Africa, bringing the share of the global population living in urban areas to 66 percent [6]. Such population growth adds to the already significant challenges policymakers face to provide critical infrastructure and key services to urban populations. This is especially so in low- and middle-income countries, where urban development is frequently informal and incremental, and where cities struggle to keep up with infrastructure needs.

High among the challenges city officials face is congestion. Traffic congestion has negative impacts on economic growth and can exacerbate urban air pollution and greenhouse gas emissions. It hampers cities’ resilience and competitiveness. In Nairobi, Kenya, for example, road congestion is estimated to cost around US$600,000 per day in lost productivity [7], and a considerable amount more if the impact of wasted physical resources and emissions is taken into account. Nairobi’s infrastructure is decades old, planned for a city of around 350,000 inhabitants instead of the more than 3.4 million people who live there today.

Improvements in public transport, roads and other critical infrastructure needed to support mobility in this expanded population require expensive investment. This needs to be carefully planned by drawing on diverse data sources. New construction and infrastructure development is typically a long process spanning several years, often involving the shutdown of existing transport facilities on which urban populations are critically dependent. But emerging techniques in big data analytics can track urban congestion and offer insights that pave the way for effective real-time mitigation measures.

High-quality insights from poor-quality data

Big data is already being used to tackle problems of urban mobility and transportation in many lower-income countries and rapidly growing cities. In Kenya, Twende Twende (Swahili for “let’s go”) is a platform developed by IBM’s research center in Nairobi which uses predictive big data analytics to address congestion. At its core, Twende Twende takes images captured by existing low-cost cameras and applies network-flow algorithms to estimate traffic flow. The model overcomes challenges often associated with big data in low- and middle-income countries, such as a lack of data-collecting infrastructure, to deliver a cheap yet significant solution for congestion management.

The researchers parsed openly-available camera feeds from low-resolution traffic cameras, put in place by Access Kenya (a local internet provider), in collaboration with city authorities [8]. They then overcame two major technical challenges that had prevented such a system from being developed previously. Firstly, the camera images were not of the quality which could be used by traditional computer vision algorithms to discern and count the number of individual cars. Secondly, the footage only covered about five percent of Nairobi’s roads (as of 2013 [9]).

IBM overcame these challenges by extracting unique data features and observations that had not been used in this context before. Instead of attempting to count the number of cars from the camera feeds and images, the researchers flipped the problem and tried to see how much of the road and background they could pick up with the presence of cars and other objects on it. This way, they were able to get a general sense of the percentage to which a road was congested – a data feature good enough to build a real-time model of traffic flow throughout the city.

To overcome the low percentage of roads covered by camera, the team took data from the five percent that were covered and used it to predict conditions on the “blind” 95 percent. They integrated data from their algorithm with empirical data from directly observed traffic in Nairobi’s business districts, and built a predictive model of how traffic at key junctions would affect traffic on all the other connected roads. Estimates of traffic congestion from their predictive algorithm stood up well when compared to empirically collected traffic data.

Twende Twende is now being used to provide congestion warnings and route recommendations that users can access via SMS and mobile app-based map interfaces (through two of the country’s largest mobile telecom providers – Airtel and Safaricom). This opens up a completely new avenue of crowdsourced data collection that would augment IBM’s existing sources.

The data from the camera streams and inference of traffic patterns are also very useful for urban development and city officials, and other stakeholders wanting to improve transportation infrastructure. They can help policymakers prioritize development projects by objectively identifying critical infrastructure most in need of overhaul. They can also help in the effective planning of long-term projects, so that burdens to the population are minimized.

The power of open data

Central to Twende Twende’s success has been the open availability of data from Access Kenya’s traffic cameras, and a willingness to change the way that issues are typically addressed. Open data initiatives are promoting high-value data projects within and beyond government, providing businesses, academics, policy centers and other interested stakeholders with access to data that not only reveals insights but can also be leveraged to change existing power structures and decision-making processes.

Of great significance to lower-income countries is that Twende Twende overcomes the poor quality of data recorded by the low-resolution cameras by using smart algorithms and analytics techniques. Beyond traffic and congestion management, the platform’s technology can be repurposed to create data streams valuable for numerous aspects of urban management – from monitoring, via closed-circuit TV footage, waiting times at offices that provide city services, to tracking garbage collection and sewage processing systems using sensors. Analytics tools are demonstrably effective in ensuring that numerous city facilities run smoothly. They provide efficient, low-cost solutions, boosting resilience and competitiveness in urban centers to help them keep up with growing populations. This potential means that development stakeholders must create the opportunities and infrastructure for data collection (even if minimal) and incentivize the provision of open access to it.

Next-generation congestion management

Twende Twende demonstrates how effectively big data can be gathered and used even within the constrained resources of growing cities. IBM’s researchers are currently working to extend its capabilities to include data on accidents, weather conditions and roadworks to create a comprehensive view of human mobility [10].

More traditional avenues of big data collection are also being piloted in many developing countries and are proving useful in improving resilience and managing congestion. Da Nang is Vietnam’s biggest seaport and fourth-largest city, with close to one million inhabitants. It is also the country’s fastest growing metropolitan sprawl. Since 2013, as part of IBM’s Smart Cities Challenge, Da Nang’s traffic control center has had tools in place to predict and prevent congestion on the city’s roads, and to better coordinate responses to situations caused by adverse weather or accidents. Data is aggregated from multiple streams, such as sensors embedded in roads and public transportation units. City officials analyze this to detect anomalies and control the flow of traffic. The system also provides the transport department with real-time information for its fleet of buses, allowing it to view details such as the location, speed and predicted journey times for each vehicle, and to respond to changing demand by adjusting services. Traffic data, along with sensors that monitor water levels on the flood-prone Han River, are facilitating early warning systems for disruptions to the city’s transportation network, while also regulating activities at the port. Although Da Nang’s traffic control system is in its nascent stages, the €37 million [11] project has already demonstrated the power of big data as a cost-effective approach to managing scarce infrastructure resources wisely.

Further innovative systems to manage congestion are emerging, with cellphone data and open-source software becoming increasingly important. In Dar es Salaam, Tanzania, World Bank economists Nancy Lozano Gracia and Talip Kilic are combining analysis of big data from sensor-embedded smartphones with face-to-face and phone interviews, to capture accurately and affordably the route, purpose, travel mode and cost of every trip more than 500 individuals make within the city over a period of time. Reliable information about how individuals move around cities, and the constraints they face, enables policymakers to make informed, coordinated decisions on transport investments and land-use. Traditional methods for understanding individuals’ travel patterns involve collecting travel diaries kept by respondents and supervised for completion by field staff. These methods are resource-intensive, subject to recall error and demanding on respondents. In response, the team sought to create a dataset highly informative about individuals’ travel patterns as a function of their socioeconomic background, the purpose of their travels and the associated costs.

After developing sensors and software that were installed in GPS-enabled smartphones, the team selected a random sub-sample of respondents to the World Bank’s 2013-14 Measuring Living Standards Survey in Dar es Salaam. Respondents had already taken part in a face-to-face interview covering their socioeconomic background and travel patterns. Each was supplied with a smartphone able to collect and transmit the time and GPS location of individual movements at one-minute intervals for a one-month period. To encourage continued participation, respondents were told they could keep the phones after the study. Journey records from their phones were then validated via follow-up phone interviews every three days, covering the origin, destination, route, purpose and cost of each trip. Initial data analysis is now underway, focusing on understanding the key determinants for how people choose modes of transport.

The team aims to combine project data with cellphone call records to assess the feasibility of using lower-cost phone data for informing transport planning. Subsequent analysis will also examine how the travel patterns recorded compare to those based on traditional data sources, such as the Living Standards survey. The project’s methodology could be applied to any city. With accurate understanding of people’s travel needs and the challenges they face when moving around, city officials can make well-informed, coordinated decisions to optimize ease of movement and minimize congestion.

Cellphones are also at the center of a big data pilot in Haiti, which aims to carry out innovative analysis of call data records to provide valuable inputs for strengthening urban transportation and land use planning. These inputs can help planners unlock the economic potential of Haiti’s cities and expand job opportunities. From the call data records, the pilot will identify key intra-city connectivity challenges, producing an employment accessibility analysis for the country’s two biggest conurbations, Port-au-Prince and Cap-Haitien. Scenario analyses will be conducted to simulate potential interruptions in mobility between different parts of the cities, including from hazard and disaster events (such as flooding), and their impact on access to jobs. This will help identify key corridors that require resilient strategies to ensure people can reach employment in the event of a disaster.

The project will also include analysis of migration flows to and from urban areas to gauge access to opportunities across the country. Based on this analysis, the team will identify major bottlenecks and possible interventions to improve city infrastructure, land use and coordination mechanisms, so as to promote inclusive employment. Access to jobs can be enhanced either by increasing travel speeds (through more or better roads and public transport) or by reducing distance to jobs. Based on the accessibility analysis, scenarios can be produced to test the potential for improvements in transportation and land use to enhance job access and social inclusion.

In the Philippines, the Open Traffic program leverages open-source software and innovative partnerships to substantially reduce the cost of traditional traffic data collection and analysis, while simultaneously improving the quality. The project team worked with the Cebu City Government to develop the first scalable, open-source platform of its kind for collecting, visualizing and analyzing traffic speed data derived from taxi drivers’ smartphones. Using GPS data from an on-demand taxi service, Open Traffic successfully analyzed peak-hour congestion, travel time reliability and corridor vulnerability across 10 South-east Asian cities, and has prepared travel time analyses for select origin-destination pairs. This analysis would not previously have been possible without substantial time and resources. It will enable traffic management agencies to make better, evidence-based decisions about signal timings, public transport, road infrastructure, emergency traffic management and travel demand management.

These approaches all show that the next generation of congestion and travel management solutions will leapfrog the capital-intensive approaches of the past, enabling urban authorities and other stakeholders to make affordable, evidence-based planning decisions.

6. United Nations Report: World Urbanization Prospects (http://www.un.org/en/development/desa/news/population/world-urbanization-prospects-2014.html)
7. Source: http://asmarterplanet.com/blog/2013/10/27837.html
8. http://traffic.accesskenya.com/
9. Source: http://www.forbes.com/sites/ehrlichfu/2015/03/03/fixing-traffic-congestion-in-kenya-twende-twende/
10. http://www.research.ibm.com/articles/africa.shtml
11. Source: World Bank Background Document: BIG DATA AND URBAN MOBILITY (June 2014) (http://www.worldbank.org/content/dam/Worldbank/Feature%20Story/mena/Egypt/Egypt-Doc/Big-Data-and-Urban-Mobility-v2.pdf)

Figure 4: Low-resolution traffic cameras show images of cars as unclear (left) or overlapping (right). Twende Twende’s innovative approach avoids these problems by analyzing the images for roadspace, rather than for vehicles.
Source: IBM Research/Forbes Magazine

CASE STUDY
People Power: Crowdsourcing Data to Track Crime
Resilient Cities, Inclusive Cities, Systems & Governance
DEVELOPMENT (3-5 years)

HIGHLIGHTS
• Reducing crime and violence is a priority for public policy stakeholders looking to support inclusive, resilient and competitive urban environments.
• The proliferation of technology is opening new avenues for data collection through crowdsourcing, for tracking, analyzing and responding effectively to criminal activity in urban areas. Crowdsourced data can also be leveraged to predict and proactively prevent crime.
• Open-source crowd-mapping platforms can be used to monitor a wide range of factors, including election fairness, corruption, natural disasters, and the supply of utilities and urban services.

CROWDSOURCED DATA: POLICY IMPLICATIONS
• Dynamic feedback to evaluate the effectiveness of measures and to allow programs to be refined
• Increased transparency and government accountability
• Efficient information sharing for targeted management in crisis situations
• Predictive insights into future amenity and infrastructure needs of urban populations, through crowdsourcing combined with other data streams

Practice Areas: Resilient Cities, Inclusive Cities, Systems and Governance
Countries Involved: Colombia, Kenya, Syria
Data Types: Crowdsourcing, digital listening, social media

Crime and violence are now a key development issue in many lower-income countries worldwide. Beyond the trauma and suffering of individual victims, they carry staggering economic costs at both local and national levels. Accounting for expenditures on citizen security, law enforcement and health care, experts estimate that crime costs close to 8 percent of Gross Domestic Product (GDP) in some Central American countries.
Crime and violence also undermine economic growth, not just through the loss of victims’ wages and labor, but by polluting the investment climate and diverting scarce government resources from supporting economic activity to strengthening law enforcement. Some estimates suggest that a 10 percent reduction in violence in Central American countries with the highest murder rates could boost annual economic growth per capita by as much as one percent [12].

Reducing crime and violence is a priority for stakeholders looking to support inclusive, resilient and competitive urban environments. One of the largest problems with tackling violent criminal activity in low- and middle-income countries is the absence of conventional frameworks for the lateral reporting of crime. This results in an environment in which government authorities and policymakers are unable to identify and focus targeted law enforcement efforts on problematic areas, while crime victims and the general population lack trust in the criminal justice system.

Traditionally, big data has not been applied to the tracking and management of crime in lower-income cities owing to the lack of data collection and analysis mechanisms. Weak law enforcement infrastructure is reflected in crime statistics that are rarely collected and organized, and that are largely unreliable or incomplete. However, emerging technology has the potential to open a completely new avenue of data collection for tracking criminal activity: Crowdsourcing.

Mapping crime through crowdsourcing

In Colombia’s capital, Pilas Bogota (which loosely translated means “get sharp, Bogota”) is the first crime map of its kind that draws on crowdsourced information from victims and witnesses of crime. The platform was developed in 2011 by a “Hacks Hacker” chapter in Colombia (a collective of technologists and app developers), in collaboration with the International Center for Journalists. It collects the time, date and location at which incidents occur from victims, witnesses and other citizens, and plots the collected information onto an interactive dynamic map. This is actively administered and monitored by journalists at El Tiempo, a local news media outlet. They trawl the information for tips and trends to produce more thorough and in-depth reporting about crime in the city. Citizens can also report incidents by sharing posts and images or video using social media, with geo-located tags and hashtags. These are picked up by Pilas Bogota’s digital listening components or by mobile SMS – through which people can respond to short surveys about events close to them.

Pilas Bogota rests on the open-source Ushahidi Crowdmap platform, initially developed to map reports of post-election violence in Kenya in 2008, using information submitted via the web and mobile phones. Since its initial deployment, the platform has been repurposed for data collection and monitoring many crisis events, such as water shortages, earthquakes and natural disasters [13], as well as routine monitoring of high-crime areas or during elections.

Pilas Bogota has largely been used so far for collecting citizen reports of crime for journalistic purposes. However, the crowdsourcing approach to visualizing incidents in real-time is also useful for policymakers and city authorities, who have in the past relied solely on lagging police statistics and official reporting for the tracking and management of criminal activity. Even in cases where this information is reliable and complete, it can only provide a partial picture of the ongoing crime situation on the ground, as only a fraction of crime that occurs is reported or acted on by law enforcement agencies. As a result, non-profit and citizen-run organizations are embracing crowdsourced data not only for reporting crime, but also to augment the management of other city operations, including transport, services and corruption monitoring. Two additional crowdsourcing platforms have been launched: Mi Bogota Verde, to map Bogota’s ongoing problems with garbage accumulation, and Monitor de Corrupción, to monitor corruption.

Combining datasets for added strength

Crowdsourcing gives city authorities the advantage of being able to report and analyze data in real time. This enables citizens, civil society, law enforcement agencies and humanitarian response agencies to respond quickly. During the Kenyan general election of 2013, a Ushahidi-based tool named Uchaguzi (Swahili for “election”) was deployed. The process involved partnerships with civil society organizations and citizen groups, and used the same techniques as Pilas Bogota: Web and online reporting, coupled with social media and mobile SMS input. The platform crowdsourced data around key events and possible criminal activity during the election, followed by verification and appropriate escalation. This facilitated real-time intervention by the appropriate authorities.

Based on the success of this deployment, the Ushahidi platform launched “The Resilience Network Initiative”. This provides city governments with online tools and the capacity to connect more closely with their citizens, collecting and sharing data and information, and allowing various stakeholders to provide input for improved municipal decision-making.

As with other types of data, crowdsourcing data becomes significantly more valuable from a visualization and analysis perspective when combined with other datasets. The same open-source Ushahidi-based platform was deployed in Syria in 2012 to complement an open-source web and social media tracking platform that mines thousands of online sources for evidence of human rights violations, killings, torture and detainment. The crowdsourcing tool, called Syria Tracker, coupled the open-source data with crowd-sourced human intelligence, such as field-based eye-witness reports shared via webform, email, Twitter, Facebook, YouTube and voicemail [14].

Using this approach, the Syria Tracker team and its relatively small group of volunteers have been able to verify almost 90 percent of the documented killings mapped on the platform, thanks to video or photographic evidence. They have also been able to name around 88 percent of those reported killed by Syrian forces since the uprising began. Depending on the levels of violence, the turnaround time for a report to be mapped on Syria Tracker is 1-3 days. The team produces weekly situation reports based on the data collected, along with detailed graphical analysis. Files providing a more precisely geo-tagged tally of deaths per location are made available regularly and can be uploaded and viewed using Google Earth.

This approach could easily be applied to other contexts and issues, such as petty crime in a city, women commuting through unsafe urban areas, environmental hazards, corruption and challenges to urban infrastructure facing quickly growing populations.

Crowdsourcing for predictive analytics

Crowdsourced data collected over longer periods of time also contains significant potential for the measurement and evaluation of steps that city authorities and policymakers are taking towards the mitigation of crime and violence. It is often difficult to ascertain which policies work best to ensure safe urban environments, especially if official reporting and police statistics are lagging and are the only data sources available. Policymakers can benefit from the dynamic feedback that crowdsourced data streams offer in order to refine programs and enforcement efforts as they take place. This kind of dynamic interaction provides predictive insights, helping to shape operations and optimize use of government resources.

As more and more people use smartphones and social media, urban law enforcement officials in high-income countries are increasingly using crowdsourced data for gathering information and evidence in crisis situations, such as terrorist attacks, civil unrest or natural disasters. A wealth of geo-coded and time-stamped data is emerging for authorities to use in their efforts to piece together ongoing events and prepare appropriate responses. Some law enforcement departments have started using crowdsourcing technology to harness the public in trying to find missing property and capture petty criminals in their communities.

When coupled with other disparate datasets, crowdsourced data can go beyond simple descriptive analysis, offering the predictive and prescriptive analysis needed to proactively deter urban crime. Numerous studies show that if treated like a contagion, crime could be predicted from historical events [15]. In California, the Santa Cruz Police Department is using advanced analytical techniques to predict property crimes such as home burglaries and car thefts, and deploying officers to suspected locations in advance. Based on models for predicting aftershocks from earthquakes, their approach analyzes and detects patterns in past crime data, to generate projections about which areas and times are at highest risk of future crimes. The projections are recalibrated daily, as new crimes occur and updated data is fed into the program [16].

All these approaches highlight the importance of publicly available, open-source mapping platforms. These can harness the knowledge and experiences of ordinary people in the quest to develop safe cities.

Current Projects on the Ushahidi Platform:
Pilas Bogota: Citizen and Journalist crime tracking utility
Monitor de Corrupción: Crowdsourced corruption monitoring
Uchaguzi: Election monitoring mechanism
Syria Tracker: Combined crowdsourced information with digital listening for crisis monitoring

Figure 5: An interactive page on Pilas Bogota, with news reports and collection prompt for crowdsourced information (bottom right)

12. Source: World Bank Report - Crime and Violence in Central America: A Development Challenge (2011) (http://siteresources.worldbank.org/INTLAC/Resources/FINAL_VOLUME_I_ENGLISH_CrimeAndViolence.pdf)
13. http://www.ushahidi.com/mission/
14. http://irevolution.net/2012/03/25/crisis-mapping-syria/, Retrieved 6/29/2015
15. http://www.nber.org/papers/w12409
16. http://www.nytimes.com/2011/08/16/us/16police.html?_r=0

CONCLUSION: BIG DATA FOR OPTIMAL CITIES OF THE FUTURE
DEVELOPMENT (up to 10 years)

Informed, considered and coordinated urban planning is needed to build sustainable, resilient, equitable and livable cities. To plan effectively, stakeholders need detailed descriptive and predictive information on both macro and micro levels – from large-scale trends such as the nature and pace of urbanization, to the needs of individual communities and citizens. The cities of the future must raise real incomes of the poorest people, be resilient to shocks and stresses, protect environmental resources and actively improve the lives of their residents.

Big data can help bring about step changes in all these areas, from national to local government levels. Initiatives like those profiled in this brief are just the beginning – both in terms of their own lifecycle and impact, and in terms of the variety of ways in which big data analytics will help cities become better places. On the horizon are many more big data approaches, which look set to help streamline urban planning and management in the coming decade. Some of the most exciting include:

Collecting taxes through geo-spatial data
Stakeholders: Local authorities, policymakers
Data Types: Geo-spatial information (satellite), drones

The availability of Geo-spatial Information Systems (GIS) such as high-resolution satellite imagery and drones has resulted in an increased use of these data sources in urban planning and development. So far, this has mainly been at national or macro policy level. However, integration of insights derived from GIS data into local urban planning and management offers great potential. A recent feasibility study into integrating GIS from the bottom up into Ghana’s Land Administration Project evaluated local administration technological and organizational preparedness for GIS use. It also laid out a policy roadmap for GIS adoption from national databases and systems to the local administrative levels, for a pilot project to improve property tax collection [16]. On a wider scale, the World Bank has funded a local government revenue collection system using a GIS platform in 16 municipalities in Tanzania, under its Urban Local Government Strengthening Program [17]. This management tool supports local government tax reporting, revenue collection, operations and maintenance, urban planning, licensing and land management systems. It means analysis of big data from GIS systems can deliver valuable insights beyond the central government level, direct to city authorities.

Mining mobile money transactions for insight
Stakeholders: Individuals, businesses, policymakers
Data Types: Mobile

Cellphones have become ubiquitous and are used increasingly by people in the world’s poorest countries. In 2012, there were 5.9 billion active mobile connections globally, forecast to increase to 7.6 billion in 2017. Much of this increase will come from cellphone penetration in Africa and Asia. In Africa, services like mPesa, offering mobile wallet and digital transactions, are increasingly important, with many sections of the urban population choosing them for everyday financial transactions. Figures from Kenya’s Central Bank showed that for the year ending November 2014, the value of mobile money transactions rose 26 percent, to 1.94 trillion Kenyan shillings (KES), while card transactions fell 18 percent to KES1.1 trillion [18]. As mobile money transactions are possible between individuals, they are fast replacing cash as the preferred medium of exchange between people. Today, many development practitioners are either hesitant to use, or prohibited from using, cellphone transaction data, owing to the lack of policy frameworks and privacy protection mechanisms. As the value of using this data becomes more apparent (revealing pockets of poverty through low transaction amounts, for example), such restrictions will be removed and adequate data-sharing frameworks will be put in place. These will protect individuals’ privacy, while enabling the wealth of big data generated by cellphones and mobile money to be harnessed for development.

16. Study carried out by researchers working for the annual paper competition for advanced graduate studies on issues relating to urban poverty, co-sponsored by USAID, the Wilson Center, the World Bank, the International Housing Coalition and the Cities Alliance.
17. Source: http://www.worldbank.org/projects/P118152/tanzania-second-local-government-support-project?lang=en
19. Source: Kenyan Central Bank payment systems data (http://www.thepaypers.com/mobile-payments/kenya-mobile-money-hits-usd-21-bln-as-transactions-surge/758093-16)25

… factors. The methods are widely applicable in other locations and for other crimes.
Understanding how infrastructure A rich learning process affects crime The potential for big data is huge, but it is not a Stakeholders: Local authorities, individuals, magic bullet. Innovative paths inevitably involve businesses, policymakers hurdles, reveal useful lessons and require Data Types: Geo-spatial perseverance. Big data demands that users capture, prepare and store data meticulously – Latin America is highly urbanized, with above- and plan enough time to do so. For the Twende average crime rates. Its cities are typically Twende congestion management system in unplanned, with high socio-economic inequality, Kenya, a critical first step was the availability yet the association between crime and of data. To enable big data analytics, it is infrastructure has not been clearly defined essential that city authorities, policymakers, or quantified. Colombia’s capital, Bogota, non-governmental organizations and other collects considerable geo-coded data on urban stakeholders promote the public availability of infrastructure and has reliable geo-coded useful data from a wide range of sources (with information on population and crime. Drawing policies to support its use). on this rich data, a World Bank project quantified the occurrence of crimes in relation to specific Successful big data solutions often involve characteristics in the built environment, through approaching existing situations from new a technique called risk terrain modeling. This angles or combining previously unrelated uses an algorithm that identifies relationships data sources – such as using EO data, between different layers of data and correlates captured for other purposes, to assess urban them with crime, using models which are characteristics and growth. Despite the central then linked to places on a digitized map. 
The role of computational power, the human approach combines separate layers of map (one element also remains vital to the success per risk) to produce maps showing the intensity of these projects. Approaches such as the of all risk factors at every location throughout a Ushahidi crowdsourcing platform are reliant on landscape – the ‘risk terrain’. This allowed the human input. Big data analysis can often be team to identify locations near bus stations, enhanced by traditional research techniques, public hospitals, schools and drugstores as such as socio-economic surveys. There is also being associated with assault and homicide. need to invest in partnerships, or to combine The modeling also revealed peak times of day human and computational power for optimum for crime, and predicted areas of the city more results. Big data analytics is a team sport: likely to experience future crime. Combined with Effective collaboration between data experts, local stakeholder interpretations, risk terrain technologists and business sector specialists mapping can reliably suggest action to reduce is crucial. crime associated with particular environmental 28 BIG DATA and Thriving CITIES All these novel data approaches must be tested, Worldbank/Feature%20Story/mena/Egypt/Egypt-Doc/Big- validated and adapted for mainstream use – Data-and-Urban-Mobility-v2.pdf but the potential rewards of big data make People Power: Crowdsourcing Data to Track Crime such effort worthwhile. As the case studies in Kling, Jeffrey R. J. L. (2005). Is Crime Contagious? CEPS Working Paper No. 117. this brief show, big data analytics can improve Knight International Journalism Fellowships. (2012). development effectiveness in cities and help Colombia: Use Crowd Sourcing Technology to Track Crime initiatives within and beyond the World Bank and Corruption. 
Retrieved from International Center for Journalists (ICFJ): Our Work: http://www.icfj.org/our-work/ achieve results through improved evidence, colombia-use-crowd-sourcing-technology-track-crime- corruption efficiency, awareness, understanding and Lovler, R. (2012, Oct 17). Colombian Media Rely on Mapping forecasting. Ultimately, big data initiatives to Track Crime, Corruption, Environmental Issues. Retrieved from International Center for Journalists (ICFJ) - Blogs: can be a powerful accelerator for ending http://www.icfj.org/blogs/colombian-media-rely-mapping- poverty and boosting shared prosperity. track-crime-corruption-environmental-issues Lovler, R. (2012, March 26). Mapping Crime and Corruption in Colombia: Knowledge is Power, Thanks to New Digital Technology. Retrieved from International Center for Journalists: Blogs: http://www.icfj.org/blogs/mapping- crime-and-corruption-colombia-knowledge-power-thanks- REFEREN CE S new-digital-technology Meier, P. (2012, March 25). Crisis Mapping Syria: Automated Using Geospatial Data to Track Changes in Urbanization Data Mining and Crowdsourced Human Intelligence. World Bank group. (2013). Building African Cities that Retrieved from iRevolutions: From Innovations to Work: A study on the Spatial Development of African Cities Revolutions: http://irevolution.net/2012/03/25/crisis- (Concept Note). World Bank Group. mapping-syria/ World Bank Group. (2015). East Asia’s Changing Urban Meier, P. (2012). Crisis mapping in action: How open source Landscape: Measuring a Decade of Spatial Growth. World software and global volunteer networks are changing the Bank Group. world, one map at a time. Journal of Map & Geography Big Data to Beat Congestion Libraries, 8(2), 89-100. Bills, T., Bryant, R., & Bryant, A. W. (2014, October). Towards Okolloh, O. (2009). Ushahidi, or ‘testimony’: Web 2.0 tools for a frugal framework for monitoring road quality. In Intelligent crowdsourcing crisis information. 
Participatory learning and Transportation Systems (ITSC), 2014 IEEE 17th International action, 59(1), 65-70. Conference on (pp. 3022-3027). IEEE. VanCalcar, J. E. (2006). Collection and representation of GIS Ehrlich, T., & Fu, E. (2015, March 3). Fixing Traffic Congestion data to aid household water treatment and safe storage In Kenya: Twende Twende. Retrieved from Forbes: http:// technology implementation in the Northern Region of www.forbes.com/sites/ehrlichfu/2015/03/03/fixing-traffic- Ghana (Doctoral dissertation, Massachusetts Institute of congestion-in-kenya-twende-twende/ Technology). IBM Research. (n.d.). IBM Resarch - Africa: Developing ZULUAGA, Á. B. (2012, Oct 16). Sea un reportero de la solutions in Africa, for Africa and the world. Retrieved from seguridad. Retrieved from El Tiempo: Archivo: http://www. IBM Research Articles. eltiempo.com/archivo/documento/CMS-12306723 Kinai, A., Bryant, R. E., Walcott-Bryant, A., Mibuari, E., Conclusion: Big Data for Optimal Cities of the Future Weldemariam, K., & Stewart, O. (2014, June). Twende- Cartesian; Bill & Melinda Gates Foundation. (2014). Using twende: a mobile application for traffic congestion Mobile Data for Development. Cartesian. Retrieved from awareness and routing. In Proceedings of the 1st http://www.cartesian.com/wp_content/upload/Using-Mobile- International Conference on Mobile Software Engineering Data-for-Development.pdf and Systems (pp. 93-98). ACM. Diko, S. K., & Akrof, A. A. (2011). Optimizing Property Rate Larsen, L. (2013, Nov 5). IBM Nairobi Lab’s First Offering is a Returns for Urban Development in Ghana, Using Geographic Traffic-Dodging Mobile App. Retrieved from IEEE Spectrum: Information Systems. Innovation in Urban Development: http://spectrum.ieee.org/tech-talk/computing/software/ Incremental Housing, Big Data and Gender (Wilson Center ibm-nairobi-labs-first-offering-is-a-trafficdodging-mobile-app and USAID Report), 146-165. Witchalls, C. (2013). 
Text messages tell drivers when there’s World Bank Group. (2014, May 30). Projects & Operations. a jam ahead. New Scientist, 220(2944), 23. Retrieved July 28, 2015, from http://www.worldbank.org/ World Bank Group. (2014). Big Data and Urban Mobility. projects/P118152/tanzania-second-local-government- Retrieved from http://www.worldbank.org/content/dam/ support-project?lang=en BIG DATA and Thriving CITIES 29 AC KN OWLED GE MENTS This solutions brief is one of several sector focused big data knowledge products delivered through the “Innovations in Big Data Analytics” program which resides in the Global Operations and Knowledge Management Unit (GOKMU). It has been prepared jointly in cooperation with the Social, Urban, Rural and Resilience Global Practice and Booz Allen Hamilton. Trevor Monroe (GOKMO), Andrew Whitby (DECDG), and Luda Bujoreanu (GTIDR) coordinated the publication. A number of WBG staff provided valuable contributions to make this publication possible, including peer reviewers Ellen Hamilton, Nancy Lozano Gracia, Katherine Kelm, and Kai Kaiser. The Innovations in Big Data Analytics program works to accelerate the effective use of big data analytics across the World Bank, and to position the World Bank as a leader in the big data for development community. For additional information about this brief or to find out more about the program, please visit http://bigdata (WBG intranet) or contact Adarsh Desai (adesai@worldbank.org) or Trevor Monroe (tmonroe@worldbank.org). 30 BIG DATA and Thriving CITIES
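Illustrative note on risk terrain modeling. The brief describes risk terrain modeling as combining separate map layers, one per risk factor, into a single map of risk intensity at every location. The Python sketch below shows that layering idea in its simplest form. It is not the method used in the World Bank's Bogota project, whose details the brief does not give. The 5x5 grid, the feature coordinates, the one-cell radius and the weights are all hypothetical values chosen for illustration, and in practice the weights would be estimated from observed crime data rather than set by hand.

```python
from math import hypot

def proximity_layer(width, height, features, radius):
    """Binary map layer: 1 where a cell lies within `radius` cells of any feature."""
    return [
        [1 if any(hypot(x - fx, y - fy) <= radius for fx, fy in features) else 0
         for x in range(width)]
        for y in range(height)
    ]

def risk_terrain(layers, weights):
    """Combine the layers cell by cell as a weighted sum (the 'risk terrain')."""
    names = list(layers)
    height = len(layers[names[0]])
    width = len(layers[names[0]][0])
    return [
        [sum(weights[n] * layers[n][y][x] for n in names) for x in range(width)]
        for y in range(height)
    ]

# Hypothetical 5x5 city grid with one feature per risk factor, as (x, y) cells.
WIDTH, HEIGHT = 5, 5
features = {
    "bus_station": [(1, 1)],
    "hospital": [(2, 1)],
    "school": [(2, 2)],
}
# Assumed weights; a real model would derive these from crime records.
weights = {"bus_station": 2.0, "hospital": 1.0, "school": 1.5}

layers = {
    name: proximity_layer(WIDTH, HEIGHT, points, radius=1)
    for name, points in features.items()
}
terrain = risk_terrain(layers, weights)

# The cell where the most risk factors overlap has the highest combined score.
hotspot = max(
    ((x, y) for y in range(HEIGHT) for x in range(WIDTH)),
    key=lambda cell: terrain[cell[1]][cell[0]],
)
print("Highest-risk cell:", hotspot, "score:", terrain[hotspot[1]][hotspot[0]])
```

Keeping one layer per risk factor means any single factor can be added, removed or reweighted without rebuilding the rest of the map, which fits the brief's point that local stakeholders interpret the combined terrain before suggesting action.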