GLOBAL PROGRAM RESILIENT HOUSING

GUIDANCE NOTE

Capturing Housing Data in Small Island Developing States

JUNE 2022

PHILIPSBURG, ST. MAARTEN. Sean Pavone, iStock.

©2022 International Bank for Reconstruction and Development / International Development Association or The World Bank.
1818 H Street NW, Washington DC, 20433
www.worldbank.org

This work is a product of the staff of The World Bank with external contributions from independent consultants. The funding that supported this work was provided by the Global Facility for Disaster Reduction and Recovery (GFDRR) at the World Bank. The findings, interpretations, and conclusions expressed in this work do not necessarily reflect the views of The World Bank, its Board of Executive Directors, or the governments they represent. The World Bank does not guarantee the accuracy of the data included in this work. The boundaries, colors, denominations, and other information shown on any map in this work do not imply any judgment on the part of The World Bank concerning the legal status of any territory or the endorsement or acceptance of such boundaries.

RIGHTS & PERMISSIONS

The World Bank supports the free online communication and exchange of knowledge as the most effective way of ensuring that the fruits of research, economic and sector work, and development practice are made widely available, read, and built upon. It is therefore committed to open access, which, for authors, enables the widest possible dissemination of their findings and, for researchers, readers, and users, increases their ability to discover pertinent information.

The material in this work is made available under a Creative Commons BY 4.0 License. You are encouraged to share and adapt this content for any purpose, including commercial use, as long as full attribution to this work is given.

Any queries on rights and licenses, including subsidiary rights, should be addressed to the Office of the Publisher, The World Bank, 1818 H Street NW, Washington, DC 20433, USA; fax: 202-522-2422; e-mail: pubrights@worldbank.org.

ATTRIBUTION

Please cite the work as follows: World Bank (2022). Capturing Housing Data in Small Island Developing States. Washington, DC. License: Creative Commons Attribution CC BY 4.0.

ACKNOWLEDGEMENTS

This guidance note was prepared by the Global Program for Resilient Housing (GPRH), a team within the Global Facility for Disaster Reduction and Recovery (GFDRR) at the World Bank. The team consisted of Sarah Elizabeth Antos (Senior Land Administration Specialist), Luis Miguel Triveno Chan Jan (Senior Urban Development Specialist), Adam R. Benjamin (Drone Consultant), Jessica Gosling-Goldsmith (GIS Consultant), Jonathan Hasoloan (GIS Consultant), Brian O’Hare (Web Developer Consultant), Charles Wang (ML Consultant), Alejandro Cantera López (ML Consultant), and Nelson Hernandez (GIS Consultant). Managerial guidance was provided by Francis Ghesquiere (Practice Manager, Urban EAP), Ming Zhang (Practice Manager, Urban EAP), David Sislen (Practice Manager, Urban LAC), and Niels Holm-Nielsen (Practice Manager, GFDRR). Valuable country-specific input was provided by Gabriel Sergio Arrisueno Fajardo (Senior Urban Specialist), Annie Gapihan (Senior Urban Specialist), Giuliana De Mendiola Ramirez (Urban Specialist, Consultant), Cathy Lynch (Senior Urban Specialist), Tiguist Fisseha (Senior Disaster Risk Management Specialist), and Keren Carla Charles (Senior Disaster Risk Management Specialist).

The team would like to express gratitude for the valuable reviews and feedback received from: Jian Vun (Senior Disaster Risk Management Specialist), Edward Charles Anderson (Senior Disaster Risk Management Specialist), Pierre Anselme Gilbert Chrzanowski (Disaster Risk Management Specialist), Mira Lilian Gupta (Disaster Risk Management Specialist, Consultant), and Peter Rabley (Founder, Place). In addition, the team is grateful for the technical support and equipment generously provided by Trimble.

Design was done by Xavier Conesa.

ABOUT THE GLOBAL PROGRAM FOR RESILIENT HOUSING

GPRH provides technical and financial support to governments interested in increasing safety and resilience in the housing sector. GPRH has developed a methodology that combines: a) technology to identify which homes can be made safe before the next disaster; b) policy to connect families with government-sponsored housing programs; and c) private sector participation to create jobs and local economic development through private investments in the construction and financial sectors. While the program focuses on housing, it developed a methodology to extract urban clues from drone and street view imagery with multiple applications.

OBJECTIVE AND AUDIENCE

The purpose of this guidance note is to demonstrate how to utilize geospatial technologies to accurately and rapidly assess infrastructure within the built environment of a small island developing state. The document presents a methodology on how to collect drone (i.e., unmanned aerial system (UAS)) and street view imagery, process the imagery, and apply machine learning (ML) models to generate a housing assessment database for decision-makers.

The audience is primarily World Bank task team leaders and task teams who need information about the housing stock in a small island developing state. It also could help disaster risk management practitioners and governments. This document will interest people who want to gather imagery remotely for project preparation, supervision, and impact evaluation. Those collecting, processing, and synthesizing mobile mapping data (e.g., drone pilots, street view camera operators, ML engineers, and other developers) will find the specific recommendations and requirements within the detailed workflow useful.
The audience is not required to use the entire methodology. Teams can reference the document when collecting drone data for baseline assessments or when they want to use street view for post-damage assessment. The lessons learned and guidance provided in this document can be useful for a wide range of projects at various stages.

ABBREVIATIONS AND ACRONYMS

3D       three-dimensional
AOI      area of interest
AWS S3   Amazon Simple Storage Service
cm       centimeter
CNN      Convolutional Neural Network
CORS     continuously operating reference station
CVAT     Computer Vision Annotation Tool
DEM      digital elevation model
DL       deep learning
DOM      digital object model
DRM      disaster risk management
DSM      digital surface model
DTM      digital terrain model
EXIF     exchangeable image file
FK       foreign key
GA       Geoscience Australia
GAMS     GNSS Azimuth Measurement Subsystem
GCP      ground control point
GFDRR    Global Facility for Disaster Reduction and Recovery
GNSS     Global Navigation Satellite Systems
GPRH     Global Program for Resilient Housing
GPS      Global Positioning System
Hz       hertz
IMU      inertial measurement unit
INS      inertial navigation system
kg       kilogram
km       kilometer
LCP      LadybugCapPro
m        meter
ML       machine learning
MP       megapixel
NCAT     NGS Coordinate Conversion and Transformation Tool
NGS      National Geodetic Survey
NOAA     National Oceanic and Atmospheric Administration
OPUS     Online Positioning User Service
PK       primary key
POI      point of interest
PPK      post-processed kinematic
ppm      parts per million
PPP      precise point positioning
PVF2     polyvinylidene fluoride
ResNet   residual network
RGB      red-green-blue
RMSE     root-mean-square error
RTK      real-time kinematic
SfM      structure-from-motion
SIDS     small island developing state
SQL      Structured Query Language
SRTM     Shuttle Radar Topography Mission
SSD      Single Shot MultiBox Detector
TBC      Trimble Business Center
UAS      unmanned aerial system
UAV      unmanned aerial vehicle
UN       United Nations
UNDP     United Nations Development Programme
VGG-19   Visual Geometry Group CNN with 19 convolutional layers
VRS      virtual reference station
VTOL     vertical takeoff and landing
WASH     water, sanitation, and hygiene
WGS84    World Geodetic System 1984
XML      Extensible Markup Language

TABLE OF CONTENTS

Introduction
Data Collection
    Project Planning – Area of Interest
    Data Fusion – Georeferencing
        Georeferencing – Base Stations
    Street View Imagery Acquisition
    Drone Imagery Acquisition
Initial Data Processing
    Georeferencing – Base Station Data Processing
    Street View Imagery Processing
        Navigation Trajectory Correction
        Image Production
    Drone Imagery Processing
    Georeferencing – Dataset Alignment
Machine Learning
    Drone Imagery Analysis
        Drone ML Accuracy Assessment
    Street View Imagery Analysis
        Street View ML Accuracy Assessment
Housing Portal
Project Implementation
Annex 1: Inventory of GPRH Data
Annex 2: GPRH Housing Portal Metadata
Annex 3: Sample Timeline
Annex 4: Budget Template

FIGURES

Figure 1: GPRH Housing Portal
Figure 2: GPRH workflow
Figure 3: Public CORS GNSS base station availability across the Caribbean
Figure 4: GNSS base station on a fixed-height tripod in a secure site with minimal overhead obstructions
Figure 5: Examples of high-contrast ground control points for indirect georeferencing of drone imagery
Figure 6: Mapping application on mobile device from a GPRH mission in St. Maarten
Figure 7: Trimble MX7 mobile mapping system used for street view data collection in St. Lucia
Figure 8: Example of a 360° backpack camera system being deployed in Asuncion, Paraguay
Figure 9: Community engagement in data collection with Sensefly eBee drones
Figure 10: Equirectangular panoramic image and corresponding cube images
Figure 11: SfM-generated 3D point clouds
Figure 12: Primary drone datasets
Figure 13: Example of GCP identification within street view imagery in Trimble Business Center
Figure 14: Delineated rooftop polygons overlaid on a drone-derived orthomosaic in St. Lucia
Figure 15: Labeling images using CVAT
Figure 16: Depiction of façade material predictions (plaster)
Figure 17: Diagram of PostgreSQL relational database schema
Figure 18: Custom area selected for building quality analysis in Padang, Indonesia
Figure 19: Home in custom area selected for review in Padang, Indonesia
Figure 20: GPRH Housing Portal application documentation
Figure PS-1: Housing data from a Dominica neighborhood overlaid on a satellite imagery base map
Figure PS-2: Rapid building quality evaluation conducted in St. Maarten
Figure PS-3: Rooftop shapes that were modeled for a housing vulnerability assessment project in St. Lucia
Figure PS-4: Examples of common rooftop construction materials found on Caribbean islands
Figure PS-5: Visualization of predicted housing vulnerability to hurricane damage in St. Lucia
Figure PS-6: Integration of housing vulnerability dataset from St. Lucia into the GPRH Housing Portal

TABLES

Table 1: Algorithms developed to derive building characteristics from drone images
Table 2: Accuracy and F1 score (drone)
Table 3: Algorithms developed to derive building characteristics from street view images
Table 4: Accuracy and F1 score (street view)
Table A-1: Inventory of GPRH data as of June 2021
Table A-2: Metadata descriptions for building data attributes in the GPRH Housing Portal
Table A-3: Metadata descriptions for sector and greenspace data attributes in the GPRH Housing Portal
Table A-4: Budget template for a project following the complete GPRH methodology

Introduction

The United Nations (UN) Human Settlements Programme estimates that 1.6 billion people (20 percent of the world population) live in inadequate housing.¹ In addition, the UN recognizes 58 small island developing states (SIDS) for their social, economic, and environmental vulnerabilities.²,³ SIDS face challenges specific to their land area, relative coastline exposure, and low elevation coastal zones.⁴ While SIDS vary in population and land area size, small population sizes can result in higher costs per capita for physical infrastructure and services to plan for and recover from disasters.⁵ Furthermore, a multitude of stressors are placed on physical infrastructure, ranging from the impacts of climate change (e.g., sea level rise, high-intensity tropical cyclones, extreme forest fires) to population growth outpacing housing supply and structural maintenance needs exceeding available
resources. With more frequent, damaging, and deadly natural disasters, the housing sector continues to suffer significant damage, particularly for the poor.

Before breaking ground, structural engineering studies are needed for resilient housing design and construction. Traditionally, engineers start their study by entering the field with rapid field assessment tools (e.g., the Rapid Visual Screening of Buildings for Potential Seismic Hazards) that can identify housing vulnerabilities.⁶ These assessments require engineers to evaluate homes from the street. Unfortunately, this is a time-consuming and expensive process, which makes it untenable to scale. However, advanced technology – drones, street view cameras, machine learning (ML) algorithms – can now be leveraged to inform resilient housing policy design and implementation. These geospatial technologies enable governments with forward-thinking housing policies, targeted housing subsidies, and simplified risk management guidelines to invest in disaster-resilient retrofitting that serves the public good. The World Bank’s GPRH developed a methodology implementing these advanced geospatial technologies to support the strengthening of urban communities and substandard housing.⁷ This document provides technical guidance for those interested in adopting a similar process supporting SIDS.

SIDS are ideal locations to apply the GPRH methodology. Governments and statistical offices of these states commonly face human and financial resource challenges that restrict the generation of detailed, updated datasets, resulting in reliance on customary datasets or limited use of incomplete or obsolete datasets.⁸,⁹ However, the land area of SIDS, whether a single island or a series of small islands, makes them a suitable size for collecting data using drones and street view cameras. In addition, their population density (22 SIDS are among the 50 most densely populated countries in the world) allows this technology to efficiently capture housing and infrastructure data.¹⁰ With limited land, compounded by natural disasters and climate change threats, a database generated using the GPRH methodology helps to effectively manage rapidly changing and competing priorities for housing, infrastructure, and land use on islands. This supports evidence-based decision-making within SIDS.

1 UN-Habitat (United Nations Human Settlements Programme). The Value of Sustainable Urbanization. World Cities Report 2020. Nairobi, Kenya: UN-Habitat, 2020. https://unhabitat.org/World%20Cities%20Report%202020
2 UN-OHRLLS (United Nations Office of the High Representative for the Least Developed Countries, Landlocked Developing Countries and Small Island Developing States), https://www.un.org/ohrlls/content/list-sids
3 UN-OHRLLS, https://www.un.org/ohrlls/content/about-small-island-developing-states
4 Pelling, Mark and Juha Uitto. Small island developing states: natural disaster vulnerability and global change. Global Environmental Change Part B: Environmental Hazards 3, no. 2 (2001): 49–62. https://doi.org/10.1016/S1464-2867(01)00018-3
5 Pelling and Uitto. Small island developing states: natural disaster vulnerability and global change.
6 FEMA (Federal Emergency Management Agency). Rapid Visual Screening of Buildings for Potential Seismic Hazards: A Handbook. Washington, DC: FEMA, January 2015. https://www.fema.gov/sites/default/files/2020-07/fema_earthquakes_rapid-visual-screening-of-buildings-for-potential-seismic-hazards-a-handbook-third-edition-fema-p-154.pdf
7 World Bank Group. “Global Program for Resilient Housing.” Last modified January 31, 2022. https://www.worldbank.org/en/topic/disasterriskmanagement/brief/global-program-for-resilient-housing
While the primary application of the technical methodology shared within the guidance note is housing assessment, this workflow can also be applied to additional infrastructure needs, from public services to asset inventory. To gain confidence in these additional application areas, pilot testing on more complex public facilities, including transportation and water, sanitation, and hygiene (WASH) infrastructure, is encouraged. In particular, this approach is well-suited to document current infrastructure status for climate change mitigation and adaptation efforts similar to those undertaken by the World Bank Group in the Marshall Islands and Solomon Islands.¹¹,¹² Adopting the data collection and analysis workflows outlined herein would provide a more thorough assessment of buildings and associated infrastructure that will be impacted by rising sea levels and higher intensity storms in the coming decades. Furthermore, this methodology can be both scalable and less expensive than other potential data acquisition and analysis processes. Thus, the opportunities to utilize geospatial technologies and ML in infrastructure monitoring are numerous and available for governments looking to modernize how capital funding is allocated to address the challenges populations face in the near future.

GPRH supports the strengthening of urban communities in SIDS by providing a) technical assistance for the identification of high-risk buildings; b) advice on program policy and design; and c) identification of technical solutions that could increase household resilience to hazards and pandemics. A first step in addressing these goals is to assess the condition of the current housing inventory for a given jurisdiction. Thus, high-resolution imagery of each home within the jurisdiction is systematically captured from two perspectives: from above via cameras on drones as well as from the ground via 360-degree, vehicle-mounted cameras.
Once a neighborhood has been canvassed from the sky and from the street, ML algorithms are trained to extract specific characteristics of each house. GPRH then integrates these various housing attributes into a spatial database and visualizes them in an interactive web tool called the Housing Portal. The Housing Portal allows decision-makers to gain an overview of the housing stock at the city, neighborhood, and block levels as well as navigate along the street to examine the housing images and ML results.

Extracting geospatial data from drone and street view images has several major advantages. First, this approach reduces field time and expenses relative to traditional methods. The information technology capacity of SIDS governments ranges broadly, from islands that are largely paper-based to islands that own and operate drones regularly. Regardless of the differences, they all seem to share an eagerness to adopt new technology that would improve operational efficiency. Second, the technology approach minimizes direct personal contact during data acquisition, which is beneficial in areas with security concerns. Furthermore, minimization of personal contact can also reduce some potential biases arising from human-derived assessments collected by multiple people in the field across a longer time frame. Third, this workflow integrates the benefits of covering broad areas at scale with time-tested methods that utilize detailed, individual, on-the-ground assessments. Thus, this methodology avoids the pitfall that can accompany projects reliant solely on surveying large areas via satellite imagery analysis. Often, those projects provide assessments with less actionable information because a project’s objectives may only be addressed suitably through the integration of additional ground-based data collection. Lastly, by emphasizing accuracy during geospatial data collection, this GPRH methodology avoids issues that arise from street-level data collected with ambiguous geographic locations due to positioning inaccuracies prevalent on mobile platforms. The subsection “Data Fusion – Georeferencing” provides further details on this issue.

8 Montalvo, J. Daniel, Mitchell A. Seligson, and Elizabeth J. Zechmeister. “Data Collection in Cross-National and International Surveys.” In Advances in Comparative Survey Methods, 569–82. Hoboken: John Wiley & Sons, Ltd, 2018. https://doi.org/10.1002/9781118884997.ch27
9 Paris21. National Strategy for the Development of Statistics Guidelines for SIDS, 2018. https://paris21.org/sites/default/files/2018-02/SIDS-NSDS-Guidelines_final_web.pdf
10 Adeoti T, Fantini C, Morgan G, Thacker S, Ceppi P, Bhikhoo N, Kumar S, Crosskey S & O’Regan N. Infrastructure for Small Island Developing States. Copenhagen: UNOPS, 2020. https://content.unops.org/publications/Infrastructure_SIDS_EN.pdf
11 World Bank Group. “Marshall Islands: New Climate Study Visualizes Confronting Risk of Projected Sea Level Rise.” Last modified October 29, 2021. https://www.worldbank.org/en/news/press-release/2021/10/29/marshall-islands-new-climate-study-visualizes-confronting-risk-of-projected-sea-level-rise
12 Government of Solomon Islands and Global Facility for Disaster Reduction and Recovery. “Solomon Islands: Rapid Assessment of the Macro and Sectoral Impacts of Flash Floods in the Solomon Islands, April 2014.” Washington, DC: World Bank, 2014. https://openknowledge.worldbank.org/handle/10986/21818

End users derive value from this methodology by extracting important information from collected data into a well-structured, easily accessible database. The Housing Portal database can be tailored to integrate other datasets to best serve the needs of individual projects and support decision-making related to housing, disaster risk management (DRM), urban planning, fragility, and conflict mitigation, among other concerns.
Given the flexibility of ML models to accurately capture infrastructure characteristics when provided with sufficient training data, there are numerous potential Housing Portal applications for workflow-derived data, including cadaster updating, regularization, and mass valuation. Specifically, planning departments can extract information related to a structure’s access to public infrastructure and networks (e.g., utilities, roads). Likewise, taxation departments can access information related to the characteristics of a structure (e.g., use, condition, materials, features) to define an objective and transparent valuation of the subject structure. Furthermore, if governments of developing countries are interested in adopting a Fit-For-Purpose Land Administration approach to supporting land tenure security, the GPRH methodology is flexible enough to deliver datasets with spatial accuracies that accommodate the specific registry needs of citizens within that jurisdiction.¹³

This data can be integrated with other spatial or socio-economic data to infer various insights about a neighborhood or a city. Preventative and post-damage analysis can be developed for several types of risk and can allow intervention in areas that may be too fragile for on-the-ground analysis. The result is a housing database that can identify homes that a) are likely to be newly constructed; b) can be made safer at reasonable costs; c) need to be relocated; and d) need further technical assessments.

13 Enemark, Stig, Keith Clifford Bell, Christian Lemmen, and Robin McLaren. Fit-For-Purpose Land Administration: Joint FIG / World Bank Publication. 2nd ed. Copenhagen, Denmark: FIG Publications, 2014. https://fig.net/resources/publications/figpub/pub60/figpub60.asp

The housing data capture and assessment methodology has been applied in countries in Africa, Southeast Asia, Latin America, and the Caribbean: Colombia (Cartagena, Neiva, Bogota, Armenia, Montenegro, Filandia), Ghana (Accra), Guatemala (Guatemala City), Indonesia (Padang), Mexico (Salina Cruz, Juchitan), Paraguay (Asuncion), Peru (Lima), and the islands of St. Maarten and St. Lucia (Figure 1). For a full list of data generated as of June 2021, refer to Annex 1.

FIGURE 1. GPRH Housing Portal
Highlighted panel: 6. Locating homes that need roof retrofitting to withstand future hurricanes in Castries, St. Lucia. Using damage assessment data from a Category 5 hurricane in Dominica, a model based on housing characteristics was created to predict future damage in St. Lucia.
Panels:
1. The COVID-19 Vulnerability Index: Locating the 40% more vulnerable in Bogota, Colombia
2. Locating overcrowding homes at the block level in Lima, Peru
3. Locating slums in Philipsburg, Sint Maarten
4. Locating homes that could be most affected by floods in Asuncion, Paraguay
5. Locating homes built with unreinforced masonry in Salina Cruz, Mexico
6. Locating homes that need roof retrofitting to withstand future hurricanes in Castries, St. Lucia
7. Locating backyard homes in Philipsburg, Sint Maarten
8. Locating homes with dirt floors in Guatemala City, Guatemala
9. Classifying structures by age of construction in Lima, Peru
10. Locating homes built with reinforced masonry in Padang, Indonesia
11. Locating the 40% most vulnerable in Cartagena, Colombia
12. Locating units that have not paid property taxes in Lima, Peru
13. Conducting mass property valuation in Bogota, Colombia
14. Estimated potential for increasing property tax collection in Lima, Peru
15. Locating Venezuelan migrants in Cartagena, Colombia
SOURCE: Original figure for this publication.
The GPRH methodology relies on a variety of experts to collect and process drone and street view images and fuse the results into a dynamic Housing Portal. Data collection, initial image processing, and ML are the main components of the workflow found in Figure 2. In general, these workflow steps are applied separately to drone and street view imagery due to the distinct steps and techniques required to prepare and subsequently synthesize results from both sources into the Housing Portal. Because these are distinct data processing pipelines, the knowledge needed to create the Housing Portal is distributed across multiple people.

FIGURE 2: GPRH workflow
Determine Area of Interest (AOI)
COLLECT DATA. Drone: collect drone data. Street view: collect street view data.
INITIAL IMAGE PROCESSING. Drone: generate orthomosaic and three DEMs. Street view: trajectory correction and image production.
MACHINE LEARNING (image segmentation). Drone: delineate rooftops to generate GIS vector data. Street view: delineate buildings to generate GIS vector data.
LABELING. Drone: local rooftop labeling. Street view: local building labeling.
MACHINE LEARNING (image classification). Drone: classify rooftops. Street view: classify buildings.
Join predictions and GIS vector data
Add related housing attributes and import to Housing Portal
SOURCE: Original figure for this publication.

The objective of this guidance note is to provide an overview of best practices and considerations for SIDS to reference when capturing housing data using the GPRH methodology. The intent of this document is to ensure that others undertaking data acquisition and data processing of drone and street view imagery avoid common pitfalls by collecting data efficiently for subsequent integration into a housing database such as a Housing Portal.
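The last two steps of the Figure 2 workflow, joining ML predictions to the GIS vector data and adding housing attributes, can be illustrated with a minimal sketch. It assumes each delineated building carries a unique identifier shared by the footprint records and the classifier output; the field names (`building_id`, `geometry_wkt`, `roof_material`, `confidence`) and the sample values are hypothetical and are not taken from the GPRH database schema.

```python
from typing import Any, Dict, List

Record = Dict[str, Any]


def join_predictions(footprints: List[Record],
                     predictions: List[Record],
                     key: str = "building_id") -> List[Record]:
    """Left-join ML predictions onto building footprints by a shared key.

    Every footprint is kept; footprints without a prediction simply
    carry no predicted attributes.
    """
    predictions_by_id = {p[key]: p for p in predictions}
    joined = []
    for footprint in footprints:
        record = dict(footprint)  # copy so the input is not modified
        prediction = predictions_by_id.get(footprint[key], {})
        for name, value in prediction.items():
            if name != key:
                record[name] = value
        joined.append(record)
    return joined


# Hypothetical sample data: two delineated rooftops, one classified so far.
footprints = [
    {"building_id": 1, "geometry_wkt": "POLYGON((0 0, 10 0, 10 8, 0 8, 0 0))"},
    {"building_id": 2, "geometry_wkt": "POLYGON((20 0, 28 0, 28 6, 20 6, 20 0))"},
]
predictions = [
    {"building_id": 1, "roof_material": "metal sheet", "confidence": 0.91},
]

housing_records = join_predictions(footprints, predictions)
```

A left join is used here so that a building missing from the classifier output still appears in the result and can be flagged for review rather than silently dropped.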
Given that this guidance note focuses on a specific workflow and application, interested readers, especially those who are new to drone mapping, are encouraged to read additional guidance notes from the World Bank Group on drone mapping technical guidelines and cost/benefit analyses of drone technology deployment.¹⁴,¹⁵

14 World Bank, and Humanitarian OpenStreetMap Team. Technical Guidelines for Small Island Mapping with UAVs. Washington, DC: World Bank, 2020. http://hdl.handle.net/10986/33455
15 Stokenberga, Aiga, and Maria Catalina Ochoa. Unlocking the Lower Skies: The Costs and Benefits of Deploying Drones across Use Cases in East Africa. Washington, DC: World Bank, 2021. http://hdl.handle.net/10986/35593

Street view imagery collection in St. Maarten captured 168.7K panoramic images and more than 1 million cube images using a Trimble MX7 system (a 6-camera Ladybug imaging sensor with a Trimble mobile mapping system).
PHILIPSBURG, SINT MAARTEN. GPRH.

Data Collection

Project Planning – Area of Interest

A clearly defined area of interest (AOI) is a necessary first step prior to data acquisition on the ground and in the air. AOI selection is a collaborative process with government officials to ensure data collection efforts align with local population interests and needs (e.g., disaster-prone areas, informal housing areas). When considering the extents of an AOI based on available field time, note that project areas with flatter topography will take less time to capture via drone than areas with steep and highly variable terrain. Part of the AOI selection process also requires consulting local airspace regulations to ensure survey areas do not encroach on no-fly zones (e.g., close proximity to airports, over military and national security interests). Prior to performing drone data collection, all necessary permits need to be secured from local and national aviation authorities.
Street view imagery collection does not typically require a permit. However, alerting the community and police is highly recommended, especially before entering areas where the camera would draw attention. After securing the permits and delineating the AOI, data collection from the sky and street can be planned accordingly.

Project planning also needs to incorporate a strategy for managing data in the newly surveyed areas. While local site conditions will vary and impact resultant datasets, the data inventory of previous projects found in Annex 1 is provided to help end users approximate a relationship between the area of the AOI and the volume of data collected. This relationship can be used to help estimate future data storage and processing needs depending upon the desired area of the AOI to be surveyed. Additional details are discussed in the “Street View Imagery Acquisition” section.

Data Fusion – Georeferencing

To create a reliable housing condition assessment which brings together data from multiple sources, best practices for geospatial data fusion need to be followed. Typically, this entails accurate georeferencing using a common geodetic framework to tie these datasets together. For housing inventory assessments, this means that drone and street view imagery data need to share the same geodetic basis.

While much of this GPRH workflow is built on the premise of best practices for new data acquisition, existing geospatial imagery datasets may also be integrated in lieu of new collections. Many agencies and governments may be interested in adopting this data reuse approach to save on time and expense. To leverage these previous collection efforts, it is critical that reused data a) have a known geodetic basis that newly collected data can tie into via high accuracy georeferencing and b) meet the data specifications laid out in subsequent sections on drone and street view imagery acquisition.
For imagery data acquisition, on-board camera sensors have accompanying positioning and navigation sensors that receive uncorrected, autonomous positions from Global Navigation Satellite Systems (GNSS) with a horizontal positional accuracy of 2-10 m.16 For some applications, misalignment arising between two datasets from 10 m horizontal positional errors is acceptable. In these cases, additional georeferencing project planning is unnecessary. Frequently, this level of horizontal positional error is unacceptable for housing-related projects, especially when working in areas with high housing density. Thus, additional georeferencing steps are required to ensure accurate data fusion.

16 Van Sickle, Jan. GPS for Land Surveyors. 4th ed. Boca Raton (Fla.): CRC Press Taylor & Francis Group, 2015.

There are two methods for establishing a higher accuracy, common geodetic basis. The first, and most common, is to use a GNSS base station or GNSS Virtual Reference Station (VRS) to provide corrections for both data collection pipelines. The second method is georeferencing one dataset to a higher accuracy dataset.

Georeferencing – Base Stations

The optimal method for ensuring a high accuracy geodetic basis is to use the same GNSS base station data for post-processing the carrier-phase GNSS data collected on-board the drone (if the system has capable sensors) and on-board the street view camera. If the drone does not have on-board GNSS receivers capable of collecting data for post-processed kinematic (PPK) processing, then see the data processing section “Georeferencing – Dataset Alignment.” It is imperative that base station GNSS data is collected once per second (1 Hz) so that the GNSS data collected on-board the street view camera and the drone can be fully processed. The horizontal positional error of the resulting datasets is directly correlated to the baseline distance between the base station and the platform.
A common rule of thumb is that the error grows by about 1 part per million (ppm) of the baseline distance, that is, roughly 1 mm for every kilometer between the base station and the platform. Thus, a base station that is 10 km from the survey area can be expected to contribute about 1 cm (10 mm) of additional error on top of the 2-5 cm of horizontal error that is normally attributed to post-processed kinematic GNSS data. Base stations that are far from a project site (over 150 km) have fewer satellites in common with data collected on the drone and street view platforms. This leads to increased errors and at times an inability to post-process data, which means that data collected on the platform would then revert to the accuracy levels of autonomous positions (2-10 m).

PROJECT SPOTLIGHT: Dominica: importance of accurate geographic coordinates

Data gathered using the GPRH methodology can be integrated with other field survey data. Both street view and drone datasets require accurate geographic coordinates. During reconstruction efforts following Hurricane Maria in 2017, detailed survey data was collected house by house in Dominica as shown in Figure PS-1.* After this resource-intensive effort, the descriptive housing dataset was unfortunately limited by its coarsely collected geographic coordinates. With coordinates lacking sufficient accuracy, project teams were unable to link the field-collected housing data to each unique home seen in the satellite imagery with a high degree of confidence.

FIGURE PS-1: Housing data from a Dominica neighborhood overlaid on satellite imagery base map
Map areas: Roseau North, Roseau Valley, Roseau Central, Roseau South. Legend: Minimal damage, Minor damage, Major damage, Destroyed.
*UNDP - Dominica Post-Hurricane Maria Recovery Project, https://www.bb.undp.org
SOURCE: United Nations Development Programme (UNDP), ESRI basemap, and Global Program for Resilient Housing.
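Returning to the baseline rule of thumb stated before the project spotlight, the arithmetic can be sketched in a few lines of Python. This is only an illustration: the function names are hypothetical, and simply adding the baseline term to the nominal 2-5 cm PPK error is a simplifying assumption rather than a formal error budget.

```python
def baseline_error_cm(baseline_km: float, ppm: float = 1.0) -> float:
    """Additional horizontal error (cm) attributed to the baseline length.

    1 ppm of a baseline equals 1 mm per km, so error_mm = ppm * baseline_km.
    """
    if baseline_km < 0:
        raise ValueError("baseline_km must be non-negative")
    error_mm = ppm * baseline_km
    return error_mm / 10.0  # mm -> cm


def expected_ppk_error_cm(baseline_km: float,
                          nominal_cm=(2.0, 5.0),
                          ppm: float = 1.0):
    """Rough (low, high) horizontal error range in cm.

    Assumption: the baseline term is added directly to the nominal
    2-5 cm PPK error quoted in the text.
    """
    extra = baseline_error_cm(baseline_km, ppm)
    return (nominal_cm[0] + extra, nominal_cm[1] + extra)


if __name__ == "__main__":
    for km in (10, 25, 150):
        low, high = expected_ppk_error_cm(km)
        print(f"{km:>4} km baseline: about {low:.1f}-{high:.1f} cm horizontal error")
```

For the 10 km case in the text, the sketch reproduces the 1 cm of additional error. It does not model the loss of common satellites at very long baselines, which can prevent post-processing altogether.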
To reduce the burden on field teams, it is ideal if the GNSS base station is local to the project site (i.e., within 25 km) and already a part of a continuously operating reference station (CORS) network. When using data from a CORS network, known absolute coordinates for the base station are already available. In the case of publicly-funded CORS networks, data is typically provided free of charge to end users. The National Oceanic and Atmospheric Administration’s National Geodetic Survey (NGS) operates a dense CORS network across the United States with a sparser network of CORS stations found worldwide, including the Caribbean.17 NGS has a web interface that provides GNSS data free of charge to users soon after the data is collected at the CORS base station.18 Likewise, a university consortium called UNAVCO runs a smaller CORS network throughout the Americas. When using GNSS data from NGS or UNAVCO, only base stations which provide 1 Hz GNSS data should be used. Figure 3 shows the availability of 1 Hz GNSS CORS stations from the two above-mentioned public CORS networks. For operations in the Eastern Hemisphere, interested readers are referred to the Australian Government’s Geoscience Australia (GA). GA aggregates data from multiple CORS networks throughout the Asia-Pacific region.19,20 Alternatively, there are additional country-specific CORS networks such as the one operated by Indonesia’s Geospatial Information Agency.21 During project planning, it is important to consider not only the presence of a CORS GNSS base station near a project site but also its availability (is the receiver collecting raw data and sharing this data over the web?), its recording interval (is the raw data collected at 1 Hz?), and its reliability (how frequently has the base station data been available over the preceding months?).
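The screening criteria above (proximity, 1 Hz recording interval, and reliability) can be sketched as a simple filter. This is a minimal illustration and not an interface to NGS or UNAVCO: the `Station` record, its field names, and the 0.9 availability threshold are hypothetical, and the station attributes would have to be gathered manually from the network web pages.

```python
import math
from dataclasses import dataclass

EARTH_RADIUS_KM = 6371.0088  # mean Earth radius


@dataclass
class Station:
    name: str
    lat: float           # decimal degrees
    lon: float           # decimal degrees
    interval_s: float    # recording interval in seconds (1.0 means 1 Hz)
    availability: float  # fraction of recent days with data, 0..1 (hypothetical metric)


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points in kilometers."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def screen_stations(stations, site_lat, site_lon,
                    max_km=25.0, min_availability=0.9):
    """Return (station, distance_km) pairs that pass all three checks,
    nearest first. max_km follows the 25 km guidance in the text;
    min_availability is an assumed threshold."""
    keep = []
    for s in stations:
        d = haversine_km(site_lat, site_lon, s.lat, s.lon)
        if d <= max_km and s.interval_s <= 1.0 and s.availability >= min_availability:
            keep.append((s, d))
    return sorted(keep, key=lambda pair: pair[1])


if __name__ == "__main__":
    # Made-up stations around a made-up project site, for illustration only.
    candidates = [
        Station("HYPO_A", 18.05, -63.10, 1.0, 0.98),
        Station("HYPO_B", 17.90, -62.80, 30.0, 0.99),  # not 1 Hz
        Station("HYPO_C", 18.20, -63.00, 1.0, 0.50),   # unreliable
    ]
    for station, dist in screen_stations(candidates, 18.04, -63.05):
        print(f"{station.name}: {dist:.1f} km")
```

If no station passes the filter, the text's fallback applies: bring a base station and place it within the project AOI.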
These attributes can be checked via web mapping interfaces and subsequent hyperlinks to the desired CORS base stations.22,23 In addition to public CORS networks, private CORS networks do exist and should also be considered depending again upon location, availability, and fees.

FIGURE 3: Public CORS GNSS base station availability across the Caribbean, April 25, 2022
SOURCE: Original figure for this publication with data from April 25, 2022.

17 National Geodetic Survey – The NOAA CORS Network (NCN), https://www.ngs.noaa.gov/CORS/
18 National Geodetic Survey - User Friendly CORS, https://www.ngs.noaa.gov/UFCORS/
19 Geoscience Australia - GNSS Network Map, https://gnss.ga.gov.au/network
20 Geoscience Australia - GNSS Data Repository, https://data.gnss.ga.gov.au/docs/home/index.html
21 Geospatial Information Agency of Indonesia - Geospatial Reference System Webmap, https://srgi.big.go.id/jkg-active
22 National Geodetic Survey – CORS Map, https://www.ngs.noaa.gov/CORS_Map/
23 UNAVCO – Real-time Networks & Stations Monitoring, https://www.unavco.org/instrumentation/networks/status/all/realtime

When a data acquisition project requires high accuracy georeferencing and a CORS base station is not available, it is recommended to bring a GNSS base station and place it within the project AOI. GNSS base stations from prominent positioning and navigation system manufacturers (e.g., Trimble, Topcon, Leica) can be rented for a few thousand dollars (US) for a month’s operation with all necessary supplementary equipment. The purchase of similar equipment can cost 5 to 6 times as much. Meanwhile, there are alternative low-cost and ‘Do-It-Yourself’ solutions available for those on tighter budgets and with more time to research and test options. Whether renting or purchasing, the GNSS base station receiver must be capable of logging static, multi-frequency raw GNSS data at 1 Hz for the duration of all active surveys.
This means that coordination between data collection teams is necessary to ensure that the base station is always logging raw data whenever the drone is aloft capturing data or the street view camera is collecting 360° imagery.

Site selection for the GNSS base station is based on a) clear view of the sky (i.e., minimal satellite obstructions from 10° above the horizon to directly overhead in all directions); b) lack of electrical interference from nearby high power infrastructure such as high tension power lines and electrical substations; c) stability and permanence of the point over which the GNSS receiver is placed; d) security of the site to avoid issues with unattended equipment; and e) ease of access to exchange batteries and check on data logging status as needed during each day for the duration of a project mission. Figure 4 depicts an example of a GNSS receiver set on a fixed-height tripod and used as a base station in Asuncion, Paraguay. Lastly, it is best practice to communicate with local government officials about the need to establish a GNSS base station meeting the above criteria so potential sites can be scouted and recommended prior to a team arriving onsite.

FIGURE 4: GNSS base station on a fixed-height tripod in a secure site with minimal overhead obstructions
Panels: Facing North; Facing West.
SOURCE: Global Program for Resilient Housing.

Many lower-cost, entry-level drones do not have high accuracy on-board positioning sensors (e.g., PPK-capable GNSS receivers). To overcome this limitation and provide highly accurate georeferencing of drone imagery-derived datasets, the primary method entails the placement of high-contrast aerial targets throughout the survey area to serve as ground control points (GCPs). These GCPs can be installed or painted by field crews or can exist already in the urban environment as shown in Figure 5.
Regardless of when they were established, the GCPs need to be surveyed with high positional accuracy using real-time kinematic (RTK) GNSS surveying or similar methods to obtain final coordinates for use within the drone image processing software. Interested readers are again referred to the previously cited guidance notes from the World Bank Group on indirect georeferencing for drone mapping operations as well as articles from the scientific literature on best practices using GCPs.24 The installation and surveying process of GCPs can be quite time-consuming, which can limit the effective area covered during a time-limited project mission. Thus, the section “Georeferencing – Dataset Alignment” outlines an alternative approach leveraging street view imagery to provide an aligned dataset that is superior in horizontal positional accuracy to the autonomous positioning without the need to laboriously survey GCPs in the field.

FIGURE 5: Examples of high-contrast ground control points for indirect georeferencing of drone imagery
Panels: Aerial targets installed for use as GCPs; Existing features that can be used as GCPs.
SOURCE: Original figure for this publication.

24 Sanz-Ablanedo, Enoc, Jim Chandler, José Rodríguez-Pérez, and Celestino Ordóñez. “Accuracy of Unmanned Aerial Vehicle (UAV) and SfM Photogrammetry Survey as a Function of the Number and Location of Ground Control Points Used.” Remote Sensing 10, no. 10 (October 9, 2018): 1-19. https://doi.org/10.3390/rs10101606

Street View Imagery Acquisition

Street view image data acquisition requires driving a vehicle up and down neighborhood streets with a high-resolution, car-mounted 360° street view camera. The aim is to capture the street-facing facades of all buildings within the AOI.
Prior to data acquisition, the road network for a given municipality or jurisdiction along with the AOI extents are downloaded to a mapping application on an Android or iOS device with location service capability (i.e., GPS/GNSS capable devices). Note that the mobile device does not require a 3G or 4G data plan; however, Wi-Fi access is needed before the trip to download and cache the necessary road and satellite imagery basemaps. While the vehicle is navigating the road network during data acquisition, the portions of the road network being captured are highlighted in the mapping application as shown in Figure 6. This record keeping helps the camera operator map neighborhoods efficiently by reducing data gaps and avoiding redundant mapping of the same roadways. While each neighborhood and municipality is unique, common issues encountered during navigation of the road network for data capture include a) blocked roadways, b) gated communities that are inaccessible, c) one-way streets, and d) narrow passageways that the vehicle cannot safely navigate through. These points of interest (POIs) are also marked in the mapping application to serve as a reminder when post-processing data upon returning from the data acquisition mission.

FIGURE 6: Mapping application on mobile device from a GPRH mission in St. Maarten
Legend: AOI extent; Roadway where data was successfully captured; Roadway where data was not yet captured or inaccessible; Point of interest denoting features that blocked further mapping of a given roadway.
SOURCE: Original figure for this publication.

To ensure that there is enough data coverage of each façade for the ML algorithms, street view images are collected approximately every two meters along the route using a high-resolution, car-mounted, 360-degree street view camera. Specifically, the GPRH team uses a Trimble MX7 system as shown in Figure 7, which combines a 6-camera Ladybug imaging sensor with a Trimble mobile mapping system.
At each 2 m interval, the portable system takes six pictures (top, bottom, front, back, left, and right of the MX7) for subsequent stitching into a 30 megapixel (MP) panoramic photo. Each JPG image file stores EXIF metadata that includes the timestamp of data capture, the WGS84 latitude and longitude coordinates from a GNSS-derived autonomous position, and the camera facing direction.

FIGURE 7: Trimble MX7 mobile mapping system used for street view data collection in St. Lucia
SOURCE: Global Program for Resilient Housing.

Thanks to its Trimble mobile mapping system components, the MX7 supports precise positioning in post-processing by combining raw data collected on-board by a tightly-coupled GNSS receiver and inertial navigation system (INS) with the previously described GNSS base station data. The MX7 is also equipped with a GNSS Azimuth Measurement Subsystem (GAMS) to continuously calibrate the inertial measurement unit (IMU) by maintaining heading accuracy and ensuring that the azimuth does not drift. When using this MX7 system, it is imperative that the vehicle has a) an appropriate roof rack as shown in Figure 7; and b) a new battery to avoid a weak power supply disrupting data acquisition. For project planning purposes, previous projects with this system yielded approximately 8 hours of continuous data capture prior to internal storage reaching maximum capacity.

When streets are too narrow for cars to pass through, alternative, lighter weight and highly mobile spherical cameras can be used to achieve the building façade coverage goals obtained by the MX7. A GoPro Fusion or an Insta360 Pro are examples of commercially-available, 360° cameras that can be attached to a backpack or motorcycle for navigating between tightly packed houses.
In addition to proprietary systems, there are open source, Do-It-Yourself 3D street view systems that can be developed with patience and ingenuity by those with technical savvy.25 Figure 8 shows a GPRH team member using a commercial backpack system due to narrow roadways in the Los Banados neighborhood in Asuncion, Paraguay.

25 Rainbow Sensing. https://rainbowsensing.com/

FIGURE 8: Example of 360° backpack camera system being deployed in Asuncion, Paraguay
SOURCE: Global Program for Resilient Housing.

Data collection from previous projects in Annex 1 reveals that the average number of panoramic images in the resultant Housing Portal per kilometer driven by a street view vehicle (582 img/km ± 165 img/km) is close to the expected 500 images per kilometer (i.e., 1000 m divided by 2 m/img). The ratio ranges from 345 img/km to 875 img/km. The low end reflects areas where data acquisition rates were coarser (e.g., a set of images every 3 m), and the high end reflects roadways mapped with high redundancy. Additionally, the expected driving distance based on the area of the AOI from previous projects is 17 km driven per km² of area flown, with a standard deviation of ± 8 km. These ratios can be used as approximations to help gauge street view data collection needs for future projects.

Drone Imagery Acquisition

While airborne imagery could be collected by a multitude of airborne platforms (e.g., helicopters, manned airplanes, small ultralights, surveillance drones, etc.), the GPRH methodology typically captures data via small UAS (sUAS), that is, drones weighing less than 25 kg. These drones are relatively portable for travel purposes, capable of capturing high quality imagery over substantial areas (e.g., 10s to 100s of km²), and moderately priced, which eases accessibility for the governments of SIDS.
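Before moving on to drone platforms, the street view planning ratios reported above (2 m image spacing, 582 img/km, and 17 ± 8 km driven per km² of AOI) can be combined into a rough estimator. The sketch below is an approximation under stated assumptions: the function names are hypothetical, the ± 1 standard deviation range is used only as an indicative spread, and six cube images per panorama follows the description of the MX7 output.

```python
CUBE_FACES = 6             # cube images per panorama (top, bottom, front, back, left, right)
KM_DRIVEN_PER_KM2 = 17.0   # mean from previous projects (Annex 1)
KM_DRIVEN_SD = 8.0         # reported standard deviation
OBSERVED_IMG_PER_KM = 582.0


def expected_images_per_km(spacing_m: float = 2.0) -> float:
    """Nominal panoramas per kilometer for a given image spacing."""
    if spacing_m <= 0:
        raise ValueError("spacing_m must be positive")
    return 1000.0 / spacing_m


def estimate_street_view(aoi_km2: float, img_per_km: float = OBSERVED_IMG_PER_KM) -> dict:
    """Rough driving distance and image counts for an AOI of aoi_km2."""
    km = aoi_km2 * KM_DRIVEN_PER_KM2
    panoramas = round(km * img_per_km)
    return {
        "km_driven": km,
        "km_driven_low": aoi_km2 * (KM_DRIVEN_PER_KM2 - KM_DRIVEN_SD),
        "km_driven_high": aoi_km2 * (KM_DRIVEN_PER_KM2 + KM_DRIVEN_SD),
        "panoramas": panoramas,
        "cube_images": panoramas * CUBE_FACES,
    }


if __name__ == "__main__":
    print(expected_images_per_km(2.0))   # nominal rate at 2 m spacing
    print(estimate_street_view(10.0))    # hypothetical 10 km2 AOI
```

As a consistency check on the six-faces assumption, the 168.7K panoramas reported for St. Maarten correspond to 168,700 × 6 = 1,012,200 cube images, in line with the "more than 1 million" figure quoted earlier.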
Drone data is collected by flying over an entire AOI in weather that is conducive to capturing clear photos of the ground (i.e., minimal clouds, wind, haze). Before and during sUAS flight, it is incumbent on drone operators to follow drone operations risk management best practices as detailed in the World Bank Group’s comprehensive guidance note.26 In particular, operators should be aware that SIDS often lack robust regulatory frameworks for determining ground risk assessment, specific operational risk assessment, and airport deconfliction. Thus, adherence to the aforementioned best practices will help minimize risk for all involved in drone data acquisition.

26 World Bank Group. 2017. “Guidance Note: Managing the Risks of Unmanned Aircraft Operations in Development Projects.” Washington, DC, USA: World Bank Group. https://documents1.worldbank.org/curated/en/895861507912703096/pdf/Guidance-note-managing-the-risks-of-unmanned-aircraft-operations-in-development-projects.pdf

The objective of drone data collection is acquiring imagery with a spatial resolution of approximately 4 cm that is consistent in orthometric and colorimetric values across the geographic area of interest. This offers a level of detail that is approximately eight times the resolution of high-end commercial satellite imagery (e.g., WorldView-3).27 The spatial resolution of the drone imagery is determined by the physical size of the sensor in the camera, the focal length of the camera lens, and the flight altitude of the drone. When using typical flight planning software, the specifications of the camera and the lens are already preconfigured in the software and loaded based on a dropdown menu of imaging sensors that can be selected. Thus, beyond selecting the sensor/lens combination that will be flown on-board the drone, the only parameter left to modify the spatial resolution is the flight altitude.

Popular drones such as the Sensefly eBee X with a S.O.D.A. camera and the DJI Phantom 4 Pro both have global-shutter cameras to reduce potential image blur and a large physical sensor in the camera to reduce noise in the captured images. These and similar sensors can provide input imagery to generate high quality orthomosaics that are suitable for subsequent processing in ML models. In addition, higher-end drones often have superior, larger format sensors (e.g., PhaseOne iXM cameras that can provide 50 MP or 100 MP images captured on sensors larger than a full frame camera) that would provide excellent input imagery to produce geometrically-correct, color-balanced orthomosaics.28

In general, common fixed-wing drones (e.g., Sensefly eBee series, Wingtra) are more efficient fliers over longer distances than the common multicopter drones (e.g., DJI Phantom or DJI Mavic series). Consequently, the fixed-wing drones can stay aloft longer (i.e., up to an hour of flight time depending on wind conditions) and cover larger areas than the multicopters can within similar time frames. Multicopter platforms are logistically easier to operate since they are capable of taking off and landing within a very small area; meanwhile, many fixed wing aircraft require a larger takeoff and landing zone area (e.g., the size of a recreational soccer field). For this reason, some manufacturers are introducing hybrid drone platforms that incorporate vertical takeoff and landing (VTOL) capability on a fixed wing airframe to provide users with operational site selection flexibility and longer flight duration performance. Longer duration UAS are becoming more common in the marketplace. These drones feature high efficiency platforms powered by lithium batteries and gasoline. When traveling to project locations, drones that use gasoline or large lithium batteries (i.e., batteries over 100 watt-hours) require additional travel logistical planning to ensure safe shipment of the drone and its components via means other than commercial aircraft.
The previously mentioned technical note from the World Bank Group on small island mapping using drones is an excellent resource for choosing an appropriate platform for a given project.29

The GPRH methodology can be undertaken with any drone platform type; however, it has most frequently been applied using a fixed-wing, Sensefly eBee X drone equipped with a S.O.D.A 3D camera. A benefit of this camera is the ability to take alternating oblique and nadir images during flight. The oblique images provide better coverage of building facades and the ground underneath canopies and roof overhangs than the downward-facing nadir imagery. At 1.3 kg (including the battery and the camera), this aircraft is extremely portable and unlikely to cause significant injury if it were to have an issue during flight. Furthermore, the maximum battery capacity for the eBee X is less than 100 watt-hours, which means transportation to project sites via commercial aircraft both domestically and internationally is possible. In addition, the drone utilizes a precise, on-board PPK positioning system so each camera exposure location and the corresponding georeferenced, RGB photo can have positional accuracies approaching cm-level magnitude when post-processed with local GNSS base station data. For entities requiring higher-end drone platforms, some manufacturers include two on-board GNSS receivers capable of PPK positioning to increase confidence in derived positional accuracies by utilizing heading orientation between the two receivers as an additional navigational data input.

27 European Space Agency, https://earth.esa.int/eogateway/missions/worldview-3
28 PhaseOne iXM, https://geospatial.phaseone.com/cameras/ixm-100/
29 World Bank, and Humanitarian OpenStreetMap Team. Technical Guidelines for Small Island Mapping with UAVs. Washington, DC: World Bank, 2020. http://hdl.handle.net/10986/33455
During drone mission planning, an image overlap of 70 to 80 percent between adjacent photos ensures all captured imagery can be aligned and accurate elevation datasets are derived. When possible, it is advised to use a digital elevation model (e.g., SRTM global DEM with 30 m spatial resolution) for flight planning, especially in areas with mountainous or steep terrain as encountered in many SIDS.30 Mission parameters such as maximum altitude and maximum flight distance from the landing zone are dependent upon airspace rules governing a municipality. Maximum flight distance is highly dependent upon local communication interference (e.g., first responder communication towers) and local topography (e.g., extreme elevation gradients lose communication over shorter distances than flat terrain).

When preparing for a project, it is important to communicate with local government officials regarding the need to access specific, suitable takeoff and landing zone areas (e.g., sports fields, parks with few trees). Securing permission to fly from these areas before the data acquisition team is onsite contributes significantly to flight operation efficiency and maximizing available flight windows. Regardless of the site location, involving the local community before and after flight operations as shown in Figure 9 is an enjoyable and educational experience for all.

30 UN Department of Economic and Social Affairs - Small Island Developing States, https://www.un.org/esa/sustdev/sids/sidslist.htm

FIGURE 9: Community engagement in data collection with Sensefly eBee drones
SOURCE: Global Program for Resilient Housing.
LESSONS LEARNED: Flight planning for survey areas with variable terrain elevations

• For operators with lower-cost, entry-level drones, most basic flight planning software (e.g., Pix4Dcapture, DroneDeploy) does not account for the changing terrain of the project area when calculating the ground sample distance (GSD) (i.e., spatial resolution) of the resultant imagery.* Instead, the GSD in the mission planner is usually based on the altitude above ground level (AGL) of the takeoff area.
• For areas with highly variable terrain, a flight that is flown at a constant altitude referenced to the takeoff area will result in drastically different GSDs within the project area. Front and side overlap percentages will also differ considerably depending on the difference between the terrain elevation and the elevation of the aircraft.
• When possible, it is best to use flight planning software that has terrain awareness (e.g., UgCS) and/or to group areas with similar terrain elevations together in one mission.**
• When this is not possible, GSD flight planning based on the midpoint in the terrain elevation between the highest point and the lowest point may be the best option to minimize discrepancies across the entire area. This will require pre-planning to determine the difference in elevation between the terrain midpoint elevation and the takeoff elevation.
• It is best to be a bit more conservative in front and side overlap percentages when terrain changes significantly within a flight AOI (80 to 85 percent front overlap and 70 to 75 percent side overlap). This is especially true when mission planning software does not have terrain awareness.

* Pix4Dcapture, https://www.pix4d.com/product/pix4dcapture. DroneDeploy, https://www.dronedeploy.com/
** UgCS, https://www.ugcs.com/

A Sensefly eBee drone collected imagery in St. Lucia. CASTRIES, ST. LUCIA. GPRH.
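The terrain effects described in the lessons learned on flight planning can be illustrated with the standard nadir GSD relationship (GSD = sensor width × height above ground / (focal length × image width in pixels)). The sketch below is a simplified illustration under stated assumptions: the camera values are illustrative figures for a generic 1-inch sensor and are not taken from this note, the function names are hypothetical, and the overlap formula assumes the photo spacing stays fixed at the value planned for the reference height.

```python
# Illustrative camera parameters (assumed, generic 1-inch sensor).
CAMERA = dict(sensor_width_mm=13.2, focal_length_mm=10.6, image_width_px=5472)


def gsd_cm(height_m, sensor_width_mm, focal_length_mm, image_width_px):
    """Ground sample distance (cm/pixel) for a nadir image taken
    height_m above the ground directly below the aircraft."""
    return 100.0 * sensor_width_mm * height_m / (focal_length_mm * image_width_px)


def height_for_gsd_m(target_gsd_cm, sensor_width_mm, focal_length_mm, image_width_px):
    """Height above ground (m) that yields the target GSD."""
    return (target_gsd_cm / 100.0) * focal_length_mm * image_width_px / sensor_width_mm


def effective_overlap(planned_overlap, planned_height_m, actual_height_m):
    """Overlap actually achieved when the photo spacing was planned for
    planned_height_m but the ground is actual_height_m below the aircraft.
    The image footprint scales linearly with height above ground."""
    return 1.0 - (1.0 - planned_overlap) * planned_height_m / actual_height_m


if __name__ == "__main__":
    # Hypothetical AOI: terrain ranges from 0 m to 120 m; plan on the midpoint.
    terrain_low_m, terrain_high_m = 0.0, 120.0
    midpoint_m = (terrain_low_m + terrain_high_m) / 2.0
    planned_height_m = height_for_gsd_m(4.0, **CAMERA)  # 4 cm target GSD
    flight_elevation_m = midpoint_m + planned_height_m

    for label, terrain_m in (("lowest", terrain_low_m),
                             ("midpoint", midpoint_m),
                             ("highest", terrain_high_m)):
        h = flight_elevation_m - terrain_m
        print(f"{label:>8}: height {h:6.1f} m, "
              f"GSD {gsd_cm(h, **CAMERA):.2f} cm, "
              f"front overlap {effective_overlap(0.80, planned_height_m, h):.0%}")
```

With these assumed camera values, a 4 cm GSD corresponds to a height of roughly 176 m above ground. Over the highest terrain the aircraft is closer to the ground, so the GSD becomes finer while the achieved overlap drops below the planned value, which is why the more conservative overlap settings are suggested.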
Initial Data Processing

Georeferencing – Base Station Data Processing

Before processing the drone or street view imagery, the geodetic basis for the entire project must be determined to ensure spatially accurate data fusion. When using either a CORS GNSS base station or an independent base station to provide accurate georeferencing for the resultant datasets, the first step is to determine the absolute final coordinates that will be used for control in the drone and street view navigation trajectory post-processing. If using a CORS GNSS base station, these coordinates are provided on the accompanying data sheet with a specific coordinate system and datum. Many software packages and web tools (e.g., NGS Coordinate Conversion and Transformation Tool [NCAT]) exist to convert coordinates from one coordinate system and datum to another.31 When using data from an independently set up base station, additional steps are required to determine the absolute final coordinates. The most straightforward option is to process all of the base station data using online GNSS processing tools such as the NGS Online Positioning User Service (OPUS) to compute one set of final base station coordinates. Depending upon the project location (e.g., remote project sites), some tools that utilize Precise Point Positioning (PPP) will provide better solutions than network processed solutions such as OPUS. Once the final coordinates for the base station are determined, the processing of the navigation trajectories can commence.

Street View Imagery Processing

The initial processing of data from the MX7 sensor suite consists of two components: a) navigation trajectory correction and b) image production. The general workflow is similar for other street view sensors; however, the software packages and raw data collected by each sensor are slightly different. One should make particular note of the GNSS data that is being captured by different street view sensors.
Lower-cost sensors, in particular, should store geographic coordinates for each image in the image metadata; however, there may not be a method to post-process and refine those locations due to the lack of a raw GNSS data receiver on-board. As always, it is highly recommended to field test and develop equipment workflows prior to project deployment.

Navigation Trajectory Correction

The Trimble MX7 utilizes a tightly-coupled GNSS/INS sensor suite to produce a precise navigation trajectory. Since the MX7 navigates on roadways in urban settings for housing-related projects, data gaps in GNSS raw data may exist due to GNSS satellite outages when traveling in close proximity to tall buildings and underneath tree canopy or overpasses. Therefore, post-processing is necessary to improve the accuracy of the produced trajectory, particularly during the signal outage periods. Applanix IN-fusion technology applies a centralized Kalman Filter to utilize inertial data to resolve ambiguities during GNSS signal disruptions. Applanix POSPac is the software used for post-processing the GNSS and INS data from Trimble MX7 operations. During post-processing, it is important to a) monitor root-mean-square (RMS) errors for the horizontal and vertical components of the navigation trajectory to troubleshoot large inaccuracies should they arise and b) overlay processed navigation trajectories on satellite imagery in GIS software (e.g., QGIS, ArcGIS) to ensure that there are no systematic biases (e.g., incorrect control coordinates) causing large horizontal positional errors. Upon completing the quality assurance and quality control, the navigation trajectory is output as an SBET file, which is then used as an input in the image production process.

31 National Geodetic Survey – Coordinate Conversion and Transformation Tool (NCAT), https://www.ngs.noaa.gov/NCAT/
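The RMS monitoring step described above can be sketched as a small quality check. This is an illustration only and not a POSPac interface: it assumes the per-epoch RMS values have already been exported to tuples of (time, north, east, down) in meters, and the function names, tuple layout, and thresholds are hypothetical.

```python
import math

# Each epoch: (time_s, rms_north_m, rms_east_m, rms_down_m). Hypothetical layout.


def horizontal_rms(rms_north_m, rms_east_m):
    """Combine north and east RMS components into one horizontal value."""
    return math.hypot(rms_north_m, rms_east_m)


def flag_epochs(epochs, max_horizontal_m=0.05, max_vertical_m=0.10):
    """Return the times of epochs whose horizontal or vertical RMS exceeds
    the (assumed) thresholds."""
    flagged = []
    for time_s, rms_n, rms_e, rms_d in epochs:
        if horizontal_rms(rms_n, rms_e) > max_horizontal_m or abs(rms_d) > max_vertical_m:
            flagged.append(time_s)
    return flagged


def group_intervals(times, max_gap_s=1.0):
    """Merge flagged times (sorted ascending) into contiguous intervals so
    they can be compared with known outages such as overpasses or canopy."""
    intervals = []
    for t in times:
        if intervals and t - intervals[-1][1] <= max_gap_s:
            intervals[-1][1] = t
        else:
            intervals.append([t, t])
    return [(start, end) for start, end in intervals]


if __name__ == "__main__":
    # Made-up values for illustration.
    sample = [
        (0.0, 0.01, 0.01, 0.02),
        (1.0, 0.04, 0.04, 0.03),
        (2.0, 0.05, 0.05, 0.03),
        (3.0, 0.01, 0.01, 0.02),
        (4.0, 0.01, 0.01, 0.20),
    ]
    print(group_intervals(flag_epochs(sample)))
```

Flagged intervals are only a starting point for troubleshooting; the overlay of the trajectory on satellite imagery remains the check for systematic biases such as incorrect control coordinates.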
Image Production

Data from the Trimble MX7 is processed in both a) LadybugCapPro (LCP), an open-source software, and b) Trimble Business Center (TBC), a proprietary software which complements the MX7 workflow. Each software package is used to export a specific set of street view imagery as shown in Figure 10.

FIGURE 10: Equirectangular panoramic image and corresponding cube images
Comparison of two street view image datasets from a project in St. Lucia with the equirectangular panoramic image on the top and the set of six square cube images on the bottom.
SOURCE: Global Program for Resilient Housing.

The LCP software is used to output cube images, which are a set of six square images capturing the six sides (top, bottom, front, back, left, and right) for the MX7 camera. The cube images serve as the input images used in the ML processing pipeline. To generate the cube images, PGR files provided by the Trimble MX7 at the end of the data capturing process are input into LCP.

The TBC software is used to output a) panoramic images, which are the six sides of the cube images stitched into an equirectangular image; and b) geolocation data, which includes the location information for each panoramic image stored in a CSV file format. The CSV file is the navigation trajectory dataset that includes the timestamp of data capture, heading, geographic coordinates (i.e., latitude, longitude), and elevation. To import and export panoramic images, TBC requires two inputs: a) the SBET navigation trajectory file from POSPac and b) the TRIDB file, which is the database indicating the relationships between the image directories and the navigation data.
The resultant panoramic images are then uploaded to Mapillary, an online mapping platform that gives users in the OpenStreetMap community and other mapping platforms free access to the street view images.32 Finally, the CSV trajectory serves as the foundation for linking attributes derived from the cube images and the panoramic images to the data and attributes derived from the drone imagery.

32 Mapillary, https://www.mapillary.com/

Drone Imagery Processing

Following a PPK-enabled drone flight, the navigation trajectory of the drone is post-processed using the GNSS base station absolute final coordinates and raw GNSS data to refine the image exposure location coordinates. By using the GNSS base station to post-process both the drone and street view navigation trajectories, the resultant photo locations for both imaging systems will align with high spatial accuracy. During the drone imagery processing pipeline, structure-from-motion (SfM) software generates a dense point cloud from the overlapping imagery. To obtain accurate elevation models, the SfM-generated 3D point cloud is classified, filtered, and cleaned so that all ground and non-ground points are appropriately classified as shown in Figure 11. While many software programs can perform automated ground classification of 3D point clouds, it is inevitable that the process will entail some manual editing and cleaning of the resultant classification. Manual refinement is especially necessary in areas with steep slopes and for buildings that are built into the sides of hills with walkouts on the downhill side.

FIGURE 11: SfM-generated 3D point clouds
Point clouds from a project in St. Maarten shown in true color on the left and classified ground (brown) and non-ground (white) on the right.
SOURCE: Original figure for this publication.
The primary drone data deliverables for the subsequent steps in the GPRH workflow are uncompressed raster GeoTIFFs of the full, radiometrically color-balanced orthophoto mosaic and three digital elevation models (DEMs) as shown in Figure 12. The orthophoto mosaic is provided in natural color with a non-resampled spatial resolution of 4 cm or less, spectral resolution of at least 3 bands (RGB), and 16-bit radiometric resolution. The three raster DEM layers are provided with a non-resampled spatial resolution of 8 cm or less. The first DEM, the digital terrain model (DTM), is a bare-Earth representation of the topography. The second DEM, the digital surface model (DSM), is an elevation model that includes all ground and non-ground features (e.g., buildings, trees). The third DEM, the digital object model (DOM), is the difference between the DSM and the DTM and provides the modelled height of all non-ground objects. The accuracy of the DOM for determining structural heights is adversely impacted by steeply sloped terrain and dense tree canopy abutting the structure. Depending on project objectives, additional drone data (e.g., raw flight data such as JPG, TXT, and/or other raw output files; flight logs; classified 3D point cloud(s) in LAS format) may need to be delivered as well.

LESSONS LEARNED: Drone image processing of large, contiguous survey areas
• When a project area is mapped with multiple individual flights, avoid processing each unique flight separately and subsequently delivering individual orthomosaics for each flight.
• Instead, it is critical to align images from all flights together in the SfM software (e.g., Pix4Dmapper, Agisoft Metashape, OpenDroneMap) before proceeding to create the dense point cloud, digital elevation models, and orthomosaic.*
• When delivering large orthomosaics covering 10 to 30 km² at 4-5 cm spatial resolution, image tiles are the most effective input for the ML analysis pipeline.
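The tiling recommendation in the last bullet above can be sketched as a small helper that splits a large raster into fixed-size pixel windows. This is only an illustration under stated assumptions: the function name and the 1,024-pixel tile size are hypothetical, and the sketch computes window offsets only. Reading the pixels for each window would be done with a raster library of the team's choice.

```python
def tile_windows(width, height, tile=1024):
    """Yield (col_off, row_off, tile_width, tile_height) windows that
    cover a raster of width x height pixels without overlap.

    Tiles on the right and bottom edges are clipped to the raster extent.
    The 1024-pixel default is a hypothetical choice.
    """
    if width <= 0 or height <= 0 or tile <= 0:
        raise ValueError("width, height, and tile must be positive")
    for row_off in range(0, height, tile):
        for col_off in range(0, width, tile):
            yield (
                col_off,
                row_off,
                min(tile, width - col_off),
                min(tile, height - row_off),
            )


# Small hypothetical raster of 2500 x 1100 pixels.
windows = list(tile_windows(2500, 1100, tile=1024))
for window in windows:
    print(window)
```

At 4 cm resolution, a 5 km by 5 km orthomosaic is 125,000 pixels on each side, which is why passing tiles rather than the full mosaic to the ML pipeline keeps memory use manageable.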
* Pix4Dmapper, https://www.pix4d.com/product/pix4dmapper-photogrammetry-software; Agisoft Metashape, https://www.agisoft.com/; OpenDroneMap, https://www.opendronemap.org/

FIGURE 12: Primary drone datasets
Drone datasets for GPRH workflows encompass an orthomosaic (a) and three digital elevation models (b-d) as shown for this example project site in St. Maarten.
Panels: orthomosaic (a), DOM (b), DSM (c), DTM (d).
SOURCE: Original figure for this publication.

Georeferencing – Dataset Alignment

Spatially aligning the drone and street view imagery datasets is important for most applications. Thus, when high positional accuracy is required and data acquisition teams are not using PPK-enabled drones, GPRH teams have leveraged the positional accuracy of post-processed street view data to improve the positional accuracy of drone-derived datasets. Leveraging this existing street view dataset is important because the time needed to field survey an appropriate number of GCPs over a 20-25 km² area would prevent drone data acquisition from being completed in a timely manner. Typically, GPRH data processing is done separately for drone and street view data. For this alignment workflow, the drone processing team communicates with the street view processing team regarding potential GCPs within the AOI. Common GCP recommendations include the previously described painted lines and symbols shown in Figure 5 as well as the center of manhole covers or well-defined intersections of pavement and sidewalk patches. It is best practice to use GCPs on the ground to avoid issues with relief displacement in the imagery from elevated aerial targets. Again, the goal is to recommend multiple potential GCPs that are well-distributed around the AOI and may be observable in both the street view and drone imagery. The street view processing team can then identify these recommended points in at least two street view images within TBC as shown in Figure 13.
The resultant GCP coordinates are then output in the same coordinate system and datum as the street view imagery. Lastly, these street view GCPs are integrated into the drone data processing pipeline to ensure that the datasets derived from both the street view imagery and drone imagery align with high spatial accuracy.

FIGURE 13: Example of GCP identification within street view imagery in Trimble Business Center
SOURCE: Original figure for this publication.

PROJECT SPOTLIGHT: St. Maarten: post-hurricane building stock evaluation

After Hurricane Irma, St. Maarten confronted reconstruction challenges compounded by limited housing data. Government officials collaborated with GPRH to efficiently gather information to evaluate the housing status of its residents. Using the GPRH methodology, drones captured natural color imagery (4-6 cm spatial resolution) spanning 28 km². High-risk neighborhoods could be located from the sky (e.g., informal settlements established close to a new trash site). Advanced ML algorithms applied to drone and street view imagery assessed the condition of 13,000 buildings as shown in Figure PS-2. Building attributes, such as size, use (e.g., critical infrastructure, residential, commercial, mixed), and quality (e.g., good, fair, poor, damaged), were made accessible through a geospatial database. This meant that buildings and temporary structures (e.g., backyard homes) were counted and accurately located; meanwhile, their relevant characteristics could be assessed at a neighborhood level. As a result, government officials and other stakeholders possessed actionable information as they evaluated reconstruction efforts and planned investments.

FIGURE PS-2: Rapid building quality evaluation conducted in St. Maarten
SOURCE: Global Program for Resilient Housing.

Two years after Hurricane Irma, GPRH collected drone imagery covering 28 km² to assess housing in St. Maarten.
PHILIPSBURG, ST. MAARTEN. GPRH.
Machine Learning

Once drone and street-level data are fully processed and aligned, ML algorithms are used to identify building characteristics. Traditionally, many of the algorithms developed for street view imagery focus on transportation (e.g., Convolutional Neural Network (CNN)-based detection or segmentation of objects such as road signs, crosswalks, and car information).33 However, a great opportunity exists to use street view imagery in building characteristic detection because these characteristics have been shown to be an accurate indicator of socioeconomic trends.34 In addition, ML segmentation of school signs, crosswalks, hospital entrances, and petrol logos allows for the automatic detection of critical infrastructure such as schools, health facilities, and gas stations. Similarly, information regarding the home’s façade (e.g., windows, garages, doors) can also be extracted.

Before proceeding to the analysis of drone and street view imagery with ML, it is important to highlight that AI and ML need to be used responsibly to avoid data misuse, bias, and ethics concerns. The World Bank Group published a working group summary that addresses best practices for ethical use of AI.35 This document is an excellent resource for those new to the field of AI and ML who want to educate themselves on how to benefit from using this powerful technology while minimizing unintended negative impacts.

Drone Imagery Analysis

ML is a powerful analytical tool for detecting patterns in imagery datasets using computers that require minimal human intervention. When using drone imagery in the GPRH ML workflow, there are four main components: a) image segmentation; b) model training; c) image classification; and d) model accuracy assessment. The orthomosaics and DEMs derived from drone imagery permit the a) delineation of rooftops; b) extraction of building height to calculate the number of floors per home; and c) estimation of ground slope under each building.
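The height and slope extraction described above can be illustrated with a minimal sketch that works on tiny elevation grids held as nested Python lists. It assumes the DOM is computed cell by cell as DSM minus DTM, as defined in the drone data deliverables. The 3 m floor height, the use of the median height inside a footprint, and all function names are hypothetical choices for illustration, not the GPRH implementation.

```python
import math
from statistics import median


def object_heights(dsm, dtm):
    """DOM = DSM - DTM, cell by cell; None marks missing data."""
    return [
        [None if s is None or t is None else s - t for s, t in zip(s_row, t_row)]
        for s_row, t_row in zip(dsm, dtm)
    ]


def building_height(dom_cells):
    """Median object height (meters) of the valid cells inside a footprint."""
    valid = [h for h in dom_cells if h is not None]
    if not valid:
        raise ValueError("no valid DOM cells in footprint")
    return median(valid)


def estimate_floors(height_m, floor_height_m=3.0):
    """Hypothetical rule: one floor per floor_height_m, with at least one floor."""
    return max(1, round(height_m / floor_height_m))


def mean_slope_degrees(dtm, cell_size_m):
    """Mean terrain slope over interior cells using central differences."""
    if len(dtm) < 3 or len(dtm[0]) < 3:
        raise ValueError("need at least a 3 x 3 grid")
    slopes = []
    for r in range(1, len(dtm) - 1):
        for c in range(1, len(dtm[0]) - 1):
            dz_dx = (dtm[r][c + 1] - dtm[r][c - 1]) / (2 * cell_size_m)
            dz_dy = (dtm[r + 1][c] - dtm[r - 1][c]) / (2 * cell_size_m)
            slopes.append(math.degrees(math.atan(math.hypot(dz_dx, dz_dy))))
    return sum(slopes) / len(slopes)


# Hypothetical 3 x 3 grids with 1 m cells: terrain rises 1 m per cell to the
# east, and a 6 m tall structure sits on the center cell.
dtm = [[0.0, 1.0, 2.0], [0.0, 1.0, 2.0], [0.0, 1.0, 2.0]]
dsm = [[0.0, 1.0, 2.0], [0.0, 7.0, 2.0], [0.0, 1.0, 2.0]]

dom = object_heights(dsm, dtm)
height = building_height([dom[1][1]])
floors = estimate_floors(height)
slope = mean_slope_degrees(dtm, cell_size_m=1.0)
print(f"height: {height} m, floors: {floors}, slope: {slope:.1f} degrees")
```

Using the median rather than the maximum is one way to reduce the influence of antennas, water tanks, or overhanging tree canopy on the height estimate; other statistics may suit a given building stock better.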
To delineate roofs using image segmentation, a U-Net CNN is applied. Depending on the compactness of the structures (i.e., overlapping or adjacent rooflines of abutting structures), manual delineation may be required to supplement the CNN-based delineation. Essentially, this manual delineation is quality assurance and quality control of the image segmentation portion of the ML process. Figure 14 shows an example of rooftop delineation with black polygons approximating the extents of a building’s structure.

33 Neuhold, Gerhard, Tobias Ollmann, Samuel Rota Bulò, and Peter Kontschieder. “The Mapillary Vistas Dataset for Semantic Understanding of Street Scenes.” 2017 IEEE International Conference on Computer Vision (ICCV) (2017): 5000-5009. doi: 10.1109/ICCV.2017.534.
34 Galobardes, Bruna, Mary Shaw, Debbie A. Lawlor, John W. Lynch, and George Davey Smith. “Indicators of socioeconomic position (part 1).” Journal of Epidemiology and Community Health vol. 60,1 (2006): 7-12. doi:10.1136/jech.2004.023531.
35 World Bank Group. 2021. “Responsible AI for Disaster Risk Management.” Working Group Summary. Washington, DC, USA: World Bank Group. https://opendri.org/wp-content/uploads/2021/06/ResponsibleAI4DRM.pdf

FIGURE 14: Delineated rooftop polygons overlaid on a drone-derived orthomosaic in St. Lucia
SOURCE: Global Program for Resilient Housing.

The next steps in the ML processing pipeline are model training and image classification. Once the rooftops are delineated, the roof material and condition are predicted via image classification for each rooftop polygon using a Visual Geometry Group CNN with 19 layers (VGG-19), where the classes are shown in Table 1. To make correct predictions, the network has been previously trained with labeled images from other cities, as well as a small sample of local labeling to identify endogenous characteristics of the location.
As projects expand to new geographic areas around the world, the diversity of building characteristics in the training dataset continues to broaden. Inevitably, there will still be local construction practices that are not completely captured by the training dataset and that need to be supplemented through additional labeling.

TABLE 1. Algorithms developed to derive building characteristics from drone images
BUILDING PROPERTIES
Roof condition | good, fair, poor, under construction, or vacant
Roof material | concrete, metal, mixed, tile, or other
Further details are provided in Table A-2 found in Annex 2.

Drone ML Accuracy Assessment

The final step in the ML processing of drone imagery is determining the reliability of the ML-derived classifications. The identification of a rooftop characteristic in Table 1 depends on how clear the feature is in the image, how many training samples of the characteristic were provided to the ML model, how many classes are used for a specific building characteristic, and how well the classes capture the variability of all evaluated features for a given characteristic. Multiple metrics exist for evaluating classification quality. The simplest metric is accuracy, which measures the number of correct predictions as a percentage of the total number of predictions. Accuracy can be a good metric if there is a balanced number of samples in each class. Unfortunately, this is rarely the case, so another metric that handles imbalanced classes is needed, namely the F1 score. The F1 score is a combination of two prediction evaluation metrics: precision and recall. Precision is the percentage of all positive predictions for a given class that are actually positive (i.e., true_positives / [true_positives + false_positives]). Meanwhile, recall is the percentage of all actual positive cases for a given class that the model correctly predicted (i.e., true_positives / [true_positives + false_negatives]).
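The metric definitions above can be sketched in a few lines for a single class. The example assumes a hypothetical set of ten roof material labels and uses the standard definition of the F1 score as the harmonic mean of precision and recall. How the per-class scores were averaged into the figures reported for this work is not stated in the source, so the sketch stops at the per-class values.

```python
def accuracy(y_true, y_pred):
    """Share of predictions that match the reference labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def precision_recall_f1(y_true, y_pred, positive):
    """Precision, recall, and F1 score for one class treated as 'positive'."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical, imbalanced sample: eight metal roofs and two tile roofs.
y_true = ["metal"] * 8 + ["tile"] * 2
y_pred = ["metal"] * 8 + ["metal", "tile"]

acc = accuracy(y_true, y_pred)
precision, recall, f1 = precision_recall_f1(y_true, y_pred, positive="tile")
print(f"accuracy: {acc:.2f}")
print(f"tile precision: {precision:.2f}, recall: {recall:.2f}, F1: {f1:.2f}")
```

In this sample the accuracy is 90 percent even though the model finds only half of the tile roofs, which is the imbalance problem that motivates reporting the F1 score alongside accuracy.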
The F1 score combines the recall and precision metrics as follows:

$$\text{F1 score} = 2 \times \frac{\text{precision} \times \text{recall}}{\text{precision} + \text{recall}}$$

Thus, a model that has both good precision and recall will have a high F1 score (i.e., close to 100 percent). The performance of the ML model for evaluating drone rooftop characteristics is shown in Table 2.

TABLE 2. Accuracy and F1 score (drone)
CHARACTERISTIC | ACCURACY (%) | F1 SCORE (%)
Material | 83.64 | 81.20
Condition | 77.14 | 74.67
SOURCE: Original table for this publication.

To provide end users with a categorical attribute for the reliability of the ML model, the drone confidence level indicates how certain the model is when identifying the roof material and condition. For each rooftop polygon, the ML model a) calculates an array of values equal in length to the number of potential classifications and b) assigns each potential classification a probability between 0 and 1, where 1 equals 100 percent probability. The confidence value is the highest probability in the array minus the sum of the remaining probabilities. If the probabilities in the array sum to 1, this value equals twice the highest probability minus 1. Confidence levels are critical to understanding the reliability of the ML-based classification. This method offers a specific confidence level for each prediction:
• High: if the value is >0.75, the difference between the material (or condition) with the highest probability and the rest is large, so the confidence is high.
• Medium: if the value is >0.25 and <0.75, there is a material (or condition) that has a higher probability than the rest, but the difference is not clear, so the confidence is medium.
• Low: if the value is <0.25, the model cannot clearly differentiate between at least two materials (or conditions), so the confidence is low.

PROJECT SPOTLIGHT: St. Lucia: building vulnerability assessment

ML models can be built and trained for specific applications. In St. Lucia, GPRH successfully undertook a study assessing the vulnerability of housing structures to impacts from hurricanes. The housing vulnerability literature shows that buildings were less affected by hurricanes when they had a square floor plan, a roof with 4 panels, a roof slope of greater than 30 degrees, and short roof overhangs.* Further, an analysis of data collected in Dominica after Hurricane Maria showed that the least vulnerable structures had a) a hip, pyramid, or hip and valley roof shape, b) a larger building footprint, and c) roof material made of concrete or PVF2 sheeting. To investigate the applicability of these findings to a specific project site in St. Lucia, drone-derived datasets as described in the drone data processing section were generated for the entire AOI. Similar to other projects using the GPRH workflow, image segmentation was used to generate roof outlines for further analysis. A model was built using attributes related to rooftop structure and rooftop material as shown in Figure PS-3 and Figure PS-4.

FIGURE PS-3: Rooftop shapes that were modeled for a housing vulnerability assessment project in St. Lucia
Panels: open gable, hip, hip and valley, flat, combination, pyramid hip.
SOURCE: Modified from https://www.roofingcalc.com/top-15-roof-types-and-their-pros-cons/

* New Jersey Institute of Technology. “Home Shapes And Roofs That Hold Up Best In Hurricanes.” ScienceDaily. www.sciencedaily.com/releases/2007/06/070619155735.htm (accessed May 16, 2022).

FIGURE PS-4: Examples of common rooftop construction materials found on Caribbean islands
Panels: PVF2 sheeting, galvanized metal, shingle, cement.
SOURCE: Original figure for this publication.

The result was a rich dataset on housing vulnerability for Dennery, St. Lucia, shown in Figure PS-5.
The local government could then use this up-to-date database within the Housing Portal (Figure PS-6) to identify the houses that would benefit most, leveraging geospatial technology and ML to improve the lives of local residents. It should be emphasized that this methodology provides information for high-level housing assessments and the subsequent identification of the housing in St. Lucia that is most at risk. Meanwhile, those St. Lucian residents seeking detailed, engineering-grade vulnerability analyses of specific buildings would require additional point-by-point structural assessments that cannot be accomplished via exterior observations alone.

FIGURE PS-5: Visualization of predicted housing vulnerability to hurricane damage in St. Lucia
Legend: very low, low, medium, high, very high.
SOURCE: Original figure for this publication.

FIGURE PS-6: Integration of housing vulnerability dataset from St. Lucia into the GPRH Housing Portal
SOURCE: Original figure for this publication.

Street View Imagery Analysis

As with any ML model, the model and algorithms are only as good as the training data upon which they are built. Thus, high-quality training datasets are a necessity to develop high-performing ML models. For street view information extraction, a free, open-source, web-based annotation tool called Computer Vision Annotation Tool (CVAT), shown in Figure 15, is used to label thousands of randomly selected street-level images captured in the AOI. To assemble the ground truth labels for the training dataset, the CVAT labeling results are exported to an XML document before conversion to the COCO format, which is imported into the ML network architecture.

FIGURE 15: Labeling images using CVAT
SOURCE: Global Program for Resilient Housing.

LESSONS LEARNED: Developing (or supplementing) a training dataset
• For a simple classification, at least 2,000 labels per class are recommended.
• Provide visual examples for each label and clear instructions on handling edge cases to support labeling consistency. Edge cases arise when a feature does not decisively fit within one class for a given attribute.
• Crowdsourcing labeling through services like Mechanical Turk may negatively impact the quality of the training dataset. This crowdsourced approach was tested, and the label quality from external sources was poor enough that the ML results were substantially worse. Thus, a dedicated labeling team is preferred.

To derive building characteristics visible from the street, a deep learning (DL)-based image segmentation technique, Mask R-CNN, is used to detect instances of building characteristics in images and classify the detected instances. Examples of detected building characteristics include the predicted building use, wall material, and wall condition, among others shown in Table 3. Detectron2 is the primary tool used for implementing the Mask R-CNN technique. Detectron2 is built upon and powered by the PyTorch DL framework.

TABLE 3.
Algorithms developed to derive building characteristics from street view images
BUILDING PROPERTIES
Complete | construction status of the building: complete or incomplete
Condition | wall condition: good, fair, or poor
Construction type | predominant construction: unreinforced masonry, reinforced masonry, or unknown
Designed | designed (designed at one time) or undesigned (designed incrementally)
Material | wall material: brick or concrete block; plaster; wood - polished; wood - crude/plank; adobe; corrugated metal; stone with mud/ashlar with lime or cement; container/trailer; plant material; mix/unclear/other
Security | secured or unsecured
Use | residential, commercial, critical infrastructure, or mixed
Vintage | estimated era of building construction: 1) pre-1940, 2) 1941–1974, 3) 1975–1999, 4) 2000–present
BUILDING PARTS
Door | number of doors
Garage | number of garages
Window | number of windows
Further details are provided in Table A-2 found in Annex 2.
SOURCE: Original table for this publication.

To identify building parts (e.g., windows, doors, garages) from street view imagery in projects using the GPRH workflow, the ML model is developed with the TensorFlow (TF) Object Detection API using a Residual Network (ResNet) model component for feature extraction and a Single Shot MultiBox Detector (SSD) model component for object proposals. This multi-tooled approach enabled creation of a model that performed well across multiple geographic regions for identifying numerous building properties (e.g., building construction type, exterior wall material, use, condition). Additionally, the parts of the building (e.g., windows, doors, garages) are not only identified but also counted. The predicted building properties and parts are linked to each street view image using a unique image frame ID. A detection ray connects each geolocated image taken from the street to the building footprint.
Figure 16 illustrates an example of the vehicle’s GPS trajectory joined by the detection ray to the photographed building and related prediction.

FIGURE 16: Depiction of façade material predictions (plaster)
A detection ray connects each geolocated view of the building with the building footprint.
SOURCE: Global Program for Resilient Housing.

To make this information accessible to end users, the street view images, street view detections, and building footprints are stored in separate tables in a PostgreSQL database as shown in Figure 17. The street view images (sv_images) table contains image information from the trajectory files. The street view detections (sv_detections) table contains the ML detections from the building parts and properties files. Lastly, the building footprints (buildings) table contains the locations of the buildings and related metadata. The SQL database structure was selected to manage the housing data since it permits fast, detailed, and dynamic queries for the web-based interface of the Housing Portal.

FIGURE 17: Diagram of PostgreSQL relational database schema
PK stands for the primary key field in a given table and FK stands for the foreign key field linking the database tables.
SOURCE: Global Program for Resilient Housing.

Street View ML Accuracy Assessment

Similar to the drone imagery ML process, data users need to quantitatively understand the reliability of the ML-derived classifications for the building characteristics in Table 3. As with the drone models, a model that has both good precision and recall will have a high F1 score (i.e., close to 100 percent), as shown for the four building characteristics listed in Table 4.

TABLE 4. Accuracy and F1 score (street view)
CHARACTERISTIC | ACCURACY (%) | F1 SCORE (%)
Construction type | 98.97 | 98.97
Material | 97.08 | 93.30
Use | 94.62 | 88.31
Condition | 78.85 | 69.68
SOURCE: Original table for this publication.
Each individual classification receives a prediction score between 0 and 1 based on the probability that the building characteristic identified in the image is assigned to the most probable class among all available classes for an attribute, with 1 indicating the highest confidence level (100 percent). With multiple street view images of the same building, the final building characteristic classification is determined by the highest prediction score among all images for that specific class.

Having plentiful, high-quality training data is paramount to the accuracy of an ML model’s results. For example, the three main material classes (plaster, brick or concrete block, and mix/unclear/other) display high accuracy (>90 percent) given the high quantity of associated labels for these features. Those three classes have approximately 76,000, 17,000, and 19,000 labels, respectively, within the street view ML model; meanwhile, the remaining material classes found in Table 3 have at most 2,000 labels and often only hundreds of labels. This relationship between label quantity and classification accuracy suggests that the model could also achieve high performance for the less common material classes (e.g., adobe, wood - crude/plank, wood - polished, corrugated metal) once more training data is collected and provided. Similarly, the training data for the wall condition class is rather imbalanced. With more training data for buildings in ‘good’ condition (i.e., currently 12,000 and 55,000 labels fewer than ‘poor’ and ‘fair’ labels, respectively), the overall class accuracy and F1 score should subsequently improve, given that the model already shows high accuracy (>90 percent) for the ‘fair’ and ‘poor’ wall condition predictions, which have a substantial amount of training data.
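The rule for combining predictions from multiple street view images of the same building can be read in more than one way. The minimal sketch below shows one plausible reading, in which the detection with the highest prediction score across all images of a building supplies the final class. The tuple layout, the building identifiers, and the function name are hypothetical and do not reflect the actual GPRH database fields.

```python
def final_classification(detections):
    """Pick, for each building, the predicted class with the highest score.

    detections: iterable of (building_id, predicted_class, score) tuples,
    one per street view image in which the building was detected.
    Returns {building_id: (predicted_class, score)}.
    """
    best = {}
    for building_id, label, score in detections:
        if building_id not in best or score > best[building_id][1]:
            best[building_id] = (label, score)
    return best


# Hypothetical wall material detections from several images.
detections = [
    ("b1", "plaster", 0.91),
    ("b1", "brick or concrete block", 0.62),
    ("b1", "plaster", 0.97),
    ("b2", "wood - polished", 0.55),
]

result = final_classification(detections)
for building_id, (label, score) in sorted(result.items()):
    print(building_id, label, score)
```

Taking the single best score favors the clearest view of a façade; an alternative design, such as averaging scores per class across images, would weigh all views equally and could behave differently for partially occluded buildings.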
After prediction scores are generated for every building in the dataset, all final prediction scores are ranked from highest to lowest and grouped into three equally-sized categories (i.e., tertiles) to determine high, medium, and low confidence levels for each building characteristic. The characteristic for a given building is then given a categorical confidence level based on its prediction score relative to the prediction scores of other buildings in the dataset. If prediction scores for a building characteristic are all greater than 95 percent for a given dataset, then all predictions for that building characteristic are assigned high confidence. By ranking confidence levels in this manner, end users can determine the reliability of a specific building characteristic classification when comparing the selected classification to classifications of the same characteristic across all buildings within a dataset.

GPRH worked in St. Maarten in 2019, two years after Hurricane Irma devastated the area. Capturing images offered an inventory of homes and public spaces. Now, after the next storm, it will be possible to quickly measure any damage to these structures, allowing reconstruction to begin.
PHILIPSBURG, ST. MAARTEN. Gerben van Es.

Housing Portal

The overarching goal for projects using the GPRH methodology is to provide end clients with an attribute-rich, comprehensive dataset on housing conditions that can be accessed and used to inform housing interventions and investments. In order to bring this data into the hands of decision makers, a demo Housing Portal was developed. This Housing Portal is open source and designed to be tailored to the local context. The Housing Portal can show more than 25 characteristics for complete cities, neighborhoods, or housing units as shown in Table A-2 and Table A-3 in Annex 2.
Furthermore, the Housing Portal is interactive: users can define their own areas of analysis (e.g., examining a particular neighborhood or block) and apply filters to examine and visualize specific housing characteristics as shown in Figure 18. It is even possible to navigate along the street from street view image to street view image and gain further insight from the complete imagery for each home as illustrated in Figure 19. Another benefit of the Housing Portal is that housing data can be exported for further analysis in a variety of formats (e.g., GeoJSON, GeoPackage, ESRI shapefile, CSV).

FIGURE 18: Custom area selected for building quality analysis in Padang, Indonesia
SOURCE: Global Program for Resilient Housing.

FIGURE 19: Home in custom area selected for review in Padang, Indonesia
SOURCE: Global Program for Resilient Housing.

Given the richness of the data displayed in the Housing Portal, privacy and security are paramount. Images that contain a) faces or b) any personal information about the people living in the homes are blurred. To ensure security, each country has a dedicated Housing Portal with login permissions exclusive to each user. An example of the Housing Portal landing page can be viewed at: https://gprh.geoweb.io/. Public access is not provided by default. However, once a government is hosting its own instance of the Housing Portal, it may choose to provide open access.

The Housing Portal application architecture consists of two parts: a backend and a frontend. Documentation for both components is provided on the platform as shown in Figure 20. On the backend, spatial and user data are kept in a PostGIS database. The spatial data for each AOI is stored in a PostGIS instance with individual tables for buildings, images, and detections as shown in Figure 17. The application components are run within Docker containers, except for Nginx in production. The frontend is a JavaScript application built using Vue.js.
All data is kept behind the World Bank firewall on Amazon Simple Storage Service (AWS S3). The code to develop the Housing Portal and its database is open source and available on GitHub: https://github.com/GPRH/housing_portal. By providing the code and detailed instructions on GitHub, governments can create their own instance of the system to be run locally or on AWS. While the portal has been designed to be transferred to governments, continued IT support and knowledge sharing should also be included in the transfer. Support for the installation and deployment will help ensure government IT experts can administer and maintain the Housing Portal well into the future.

FIGURE 20: GPRH Housing Portal application documentation
SOURCE: Global Program for Resilient Housing.

The capabilities of the Housing Portal can be extended to tailor the understanding of cities by governments and companies. For example, by integrating official geospatial information into the portal (e.g., cadastral, census, or hazard layers), governments can start understanding the land tenure security or socio-demographic profiles of homes. By pulling in supplementary data, the Housing Portal can be queried to identify homes and neighborhoods to support the implementation of formal, affordable, and resilient housing to improve lives and safeguard economies. For a government client interested in integrating this database into its own national data repository, the PostGIS database has a flexible design to allow for modifications (e.g., reprojection) to fit the spatial data infrastructure needs of the national government.
The government client could also permit the release of the housing dataset in the World Bank Data Catalog to improve discoverability and data reuse.³⁶ To share non-sensitive data with the general public, housing data could be uploaded to OpenStreetMap’s free, editable geographic database, benefiting those who rely on non-proprietary map data.³⁷ Simple rooftop polygon datasets with fewer attributes (e.g., rooftop material, number of stories) derived from GPRH workflows could update existing data or fill in gaps across the maps of many SIDS in OpenStreetMap. As the number of projects and geographic areas expands, it is anticipated that this unique blend of geospatial technologies and ML will continue to serve the World Bank’s twin goals of ending extreme poverty and increasing the prosperity of vulnerable populations throughout SIDS and beyond.

36 World Bank Data Catalog, https://datacatalog.worldbank.org/
37 OpenStreetMap, https://www.openstreetmap.org/

GPRH helped the government find homes that need roof retrofitting to withstand future hurricanes in St. Lucia. There was a high concentration of small units (average area: 33 m²) with a high to very high likelihood of complete damage located near the port.

CASTRIES, ST. LUCIA. ATGImages, iStock.

Project Implementation

Understanding project implementation best practices is a prerequisite for successfully executing a project using the GPRH methodology. Readers are encouraged to consult these recommendations alongside the relevant sections and annexes of the guidance note when applying this approach and technology to a wide variety of World Bank projects. Four annexes are available to guide project implementation. Annex 1 helps to estimate data collection time based on prior projects, and Annex 2 shares the standardized metadata for the resulting housing database.
Annex 3 presents a timeline for a sample project, and Annex 4 provides an associated budget template (Table A-4). Given the unique scope and location of each project, a generic budget estimate for a sample project would have little long-term relevance. A budget template is shared instead so that project teams can ensure no major components of the workflow are overlooked during project planning. Lastly, teams that would like additional support in making their geospatial data more sustainable can contact GFDRR’s Digital Earth Platform for assistance.³⁸

LESSONS LEARNED: Project Implementation
• Start field data collection with equipment calibration and testing before the equipment is shipped via diplomatic pouch to the project site.
• Ensure data collection teams meet with local government officials and World Bank Group staff from the country office to confirm that all parties are coordinating activities with the same goals in mind.
• The most successful project missions are those where knowledge transfer is consistent throughout the data acquisition. Thus, encourage local officials’ involvement so they can:
  – learn mapping techniques through hands-on experience, and
  – be proactively involved in streamlining access to data acquisition areas.
• Data collection via drone and street view vehicles requires flexibility from the field teams because of the previously mentioned issues: takeoff and landing zone access for drones, inaccessible roadways for street view vehicles, and unpredictable weather events for both teams.
• Delegate data processing tasks to processing teams to create the deliverables outlined in the data processing sections.
  – While a rigid timeline for data processing delivery can be difficult to pinpoint due to topography and other geographic features, data processing and ML components typically take two to four times the data collection duration.
• When communicating ML results, it is imperative to emphasize model performance for each prediction (e.g., level of confidence for a given attribute).
• Once the ML is complete, integrate the datasets using standardized data formats (consistent naming conventions, attribute definitions, and data types) into a housing database (e.g., Housing Portal) for visualization and further analysis.
• Based on previous and pending projects in SIDS, the data derived from GPRH workflows address similar issues across these island nations:
  – locating and characterizing informal settlements,
  – detecting backyard homes,
  – retrofitting roofs,
  – assessing speed of reconstruction efforts after major natural disasters, and
  – identifying vacant and suitable lands for construction and densification.

38 World Bank GFDRR Digital Earth Platform. https://www.gfdrr.org/en/digitalearthpartnership

The work of GPRH supported resilient housing for neighborhoods in St. Lucia.

ANSE-LA-RAYE, ST. LUCIA. Benjamin Howell, iStock.

Annex 1 – Inventory of GPRH Data

TABLE A-1: Inventory of GPRH data as of June 2021
Note that data continues to be collected within Colombia, among other countries.

COUNTRY | COLOMBIA | GHANA | GUATEMALA | INDONESIA | MEXICO | PARAGUAY | PERU | ST. LUCIA | ST. MAARTEN
Drone coverage (km²) | 40 | 11 | 4 | 80 | 31 | 17 | 20 | 8 | 28
Car distance (km, inc. overlap) | 470 | – | 120 | 1,700 | 580 | 230 | 140 | 210 | 320
Panoramic images | 411,910 | – | 62,036 | 587,880 | 337,601 | 121,368 | 107,337 | 109,200 | 168,691
Cube images | 2,792,710 | – | 372,216 | 6,465,264 | 2,025,606 | 728,208 | 644,022 | 655,200 | 1,012,146
Street view labels | 26,925 | – | – | 25,590 | 8,240 | 23,417 | 20,061 | – | 12,865
Rooftops | 109,300 | 25,400 | 4,900 | 146,000 | 67,900 | 35,100 | 61,600 | 9,100 | 13,000

Annex 2 – GPRH Housing Portal Metadata

TABLE A-2: Metadata descriptions for building data attributes in the GPRH Housing Portal

BUILDINGS
TITLE | ATTRIBUTE | DESCRIPTION
GENERAL OR GOVERNMENT INFORMATION
address | street address | Street address.
aoi | area of interest | Area of interest.
block | block | Block.
geohash | geohash | Alphanumeric string to encode geographic coordinates.
count | number of tax records | Number of tax records available per building.
id | building ID | Building ID in database.
pt_avg | average property tax | Average property tax per building.
pt_avg_owed | average property tax owed | Average property tax owed (cumulative balance including prior years).
pt_sum | sum of property tax | Sum of property tax per building.
pt_sum_owed | sum of property tax owed | Sum of property tax owed (cumulative balance including prior years).
DRONE
d_area | roof area (m²) | Estimated roof area in square meters.
d_avg_height | height (m) | Average (mean) height of the building in meters derived from the rooftop polygon and the digital height information derived from the drone. Calculated using zonal statistics.
d_condition | roof condition | Roof condition is based on construction appearance, such as patching and coloring (rust). Good: new, well-constructed (no holes) and very minimal patching or discoloration. Fair: roof is patched or discolored, but still sturdy; may look old or drab but seems livable and not precarious. Poor: lots of patching, holes, items to hold it down, or bags to stop leaks. Otherwise, the building may be under construction or vacant.
d_material | roof material | Concrete: more than 50% of the rooftop is visibly concrete. Metal: vast majority of the rooftop is covered in metal (70-90%). Mixed: multiple materials used to cover the roof and keep the inhabitants dry. Typically, this is less than 50% concrete. Tile: more than 50% of the rooftop is clay tile or metal tile. Other: tent or tarp material.
d_slope | ground slope (degree) | Average (mean) slope of the ground in degrees underneath the roof. This tends to be overestimated.
d_volume | volume (m³) | Estimated volume of building in cubic meters.
STREET VIEW
sv_complet | complete | Construction status of the building: complete or incomplete.
sv_condit | wall condition | Good: new construction and sturdy. Fair: sturdy but shows signs of aging. Poor: dilapidated, temporary, self-built, or not well-maintained.
sv_constru | construction | The predominant construction type is organized into the following categories. Unreinforced masonry: refers to buildings made from brick, stone or concrete blocks that appear from the outside to be missing concrete columns or beams (or both). Other examples include buildings made from adobe or constructed using timber or wooden frames. Reinforced masonry: refers to buildings with confined masonry or concrete frames, which may be called reinforced masonry in some countries. Reinforcement components such as rebar inside of blocks are not always possible to determine from street view analysis, but at times, particularly when buildings are ‘incomplete’, rebar is visible. Unknown: unknown.
sv_design | designed | Designed: building designed at one time. Undesigned: building designed incrementally.
sv_door | doors | Number of doors.
sv_garage | garages | Number of garages.
sv_materia | wall material | The predominant classes are plaster, brick or concrete block, and mix/other/unclear. Classes: brick or concrete block; plaster; wood – polished; wood – crude/plank; adobe; corrugated metal; stone with mud/ashlar with lime or cement; container/trailer; plant material; mix/unclear/other.
sv_securit | extra security | Secured or unsecured.
sv_use | building use | Residential: used solely for residential purposes. Commercial: used for commercial purposes (e.g., store). Critical infrastructure: used for public purposes, such as education, government, public services, health care, religion, banks or other public infrastructures. This class was used sparingly. Mixed: used for residential and non-residential purposes. A common case is a mini-market on the first floor and residential housing above.
sv_vintage | vintage | Estimated era of building construction (for neighborhoods with distinct architectural periods): 1) pre-1940, 2) 1941–1974, 3) 1975–1999, 4) 2000–present.
sv_window | windows | Number of windows.
HAZARDS
hz_earthqu | earthquake risk | 1=very low, 2=low, 3=medium, 4=high, 5=very high.
hz_flood | flood risk | 1=very low, 2=low, 3=medium, 4=high, 5=very high.
hz_landslide | landslide risk | 1=very low, 2=low, 3=medium, 4=high, 5=very high.
hz_tsunami | tsunami risk | 1=very low, 2=low, 3=medium, 4=high, 5=very high.
hz_wind | wind risk | 1=very low, 2=low, 3=medium, 4=high, 5=very high.
ANALYSIS AND FIELD WORK
cap_payment | payment capacity | Capacity of payment from households: maximum household annual income=(estimated value/5). Estimated value can be modeled from field surveys. This is for illustrative purposes.
dem_insur | home insurance | Demand for home insurance premiums: yes, if total quality=good or very good; all other demands=no AND general value=medium or high. Otherwise, no. General value can be calculated from the total quality and building volume, determined per AOI. This is for illustrative purposes.
dem_micro | home microfinance | Demand for home improvement microloans: yes, if ONLY demand for quality improvement=yes and structural improvements=yes; and capacity of payment > USD 10,000. This is for illustrative purposes.
dem_reset | resettlement | Demand for resettlement: yes, if any hazard=5; total quality=poor or very poor. Otherwise, no.
dem_struct | structural improvement | Demand for structural improvement: yes, if earthquake hazard is 3 or lower AND flood, landslide or wind hazard is between 0 and 4 AND construction type=unreinforced masonry or reinforced masonry AND soft story=yes AND total quality=good or fair. Otherwise, no.
dem_qualit | quality improvement | Demand for quality improvement: yes, if hazards are below 5 AND construction type=reinforced masonry AND possible soft story=yes AND total quality=fair, poor or very poor. Otherwise, no.
extra_attrs | extra attributes | General purpose field where extra data may be added, such as from a survey. Exporting as a GeoPackage is recommended for viewing.
infrastruc | access to paved roads | Is the building within 10 m of a paved road? 1=yes, 0=no.
K3 | COVID-19 index | The COVID-19 index locates the bottom-40 and bottom-10 vulnerable households at the block level. Variables correspond to overcrowding, age, illness, disability, and access to water, sewerage, electricity, and internet. Values range from 1-3 with 1 being most vulnerable. If all K3 values=3 in export, the index is not available for the AOI.
land_publi | public land | Is the building on public land? 1=yes, 0=no.
land_servi | access to services | Is it possible to bring public services? For example, is there a bus stop within 400 m (five-minute walk)? 1=yes, 0=no.
opp_expansion | expansion | Opportunity for expansion: yes, if building height is less than 3 m AND within 200 m of greenspace AND within 10 m of paved road. Otherwise, no.
park | greenspace | Does this building have good access to greenspace (i.e., is it within walking distance, or less than 200 meters, of a park)? 1=yes, 0=no.
soft_story | soft story | Is this a potential soft story building? For example, the building height is at least 7.5 meters AND has at least one garage AND at least two windows. Other calculations are possible. 1=yes, 0=no.
tot_qualit | total quality | Determined by comparing the roof and wall condition derived from drone and street view imagery. If roof and wall condition are not the same designation (good, fair or poor), the lower value of the two is taken. Very good: if both roof and wall condition are good. Good: if roof or wall condition is good and the other is under construction, vacant or otherwise unknown. Fair: if both roof and wall condition are fair; or if one is fair and the other is good, under construction, vacant or otherwise unknown. Poor: if roof or wall condition is poor and the other is good, fair, under construction, vacant or otherwise unknown. Very poor: if both roof and wall condition are poor.
value | estimated value | Estimated value modelled from field survey.

TABLE A-3: Metadata descriptions for sector and greenspace data attributes in the GPRH Housing Portal

SECTORS
TITLE | ATTRIBUTE | DESCRIPTION
GENERAL OR GOVERNMENT INFORMATION
aoi | area of interest | Area of interest.
area_km | area (km²) | Area of sector in square kilometers.
avg_tax | average tax | Average tax per sector.
avg_tax_owed | average tax owed | Average tax owed per sector.
id | id | Sector ID in database.
name | sector | Name of sector.
sector_id | sector id | Sector ID in database (string).
ANALYSIS AND FIELD WORK
building_count | building count | Total number of buildings in sector.
commercial | commercial buildings | Number of commercial buildings.
critical_infrastructure | critical infrastructure | Number of critical infrastructure buildings.
fair_quality | fair quality | Number of buildings in fair condition (total quality).
good_quality | good quality | Number of buildings in good condition (total quality).
mixed | mixed buildings | Number of mixed buildings.
poor_quality | poor quality | Number of buildings in poor condition (total quality).
resettlement | resettlement | Number of buildings in demand for resettlement.
residential | residential buildings | Number of residential buildings.
softstory | soft story | Number of potential soft story buildings.
very_good_quality | very good quality | Number of buildings in very good condition (total quality).
very_poor_quality | very poor quality | Number of buildings in very poor condition (total quality).

GREENSPACE
TITLE | ATTRIBUTE | DESCRIPTION
area_m | area (m²) | Area in square meters.
id | id | Greenspace ID in database.
type | type of feature | Designation of cemetery, forest, grass, meadow, park, playing field, recreation ground, or scrub derived from OpenStreetMap.³⁹

39 OpenStreetMap, https://www.openstreetmap.org/

Annex 3 – Sample Timeline

This schedule outlines the workflow and estimated time to complete a project, once preparatory steps have been completed (i.e., determined AOI, produced detailed flying and driving maps, secured flight permits for drones, rented GNSS base station). This estimate is based on a project with four distinct AOIs covering a total area of 100 km².

ACTIVITY | ESTIMATED TIME
Shipment of Equipment | 1 week via diplomatic pouch
Data Collection | 3 weeks⁴⁰
  – Installation of base station
  – Drone: take-off and landing from accessible fields
  – Street view: driving all navigable roads
Knowledge Transfer | Throughout data collection
  – On-the-job training for one or two drone pilots (option to use their drones) and several virtual meetings
  – 1-3 government staff members to shadow or assist with street view capture following the mission
  – Training on methodology (description of each step) and how to best process drone imagery
  – Provide training materials and implementation training reports
Data Processing | 2 months
  – Drone: 5-8 weeks
  – Street view: 6-8 weeks
Machine Learning | 4 months
  – Drone: 2-3 months
  – Street view: 2 months (using existing models)
  – Followed by synthesis of building characteristics into GIS layers
Continued Capacity Building | 2-3 weeks
  – Discuss the sustainability and next steps for the application of this data and how it can be updated
Follow up
  – Virtual meeting to coincide with data delivery
TOTAL | 8 months

40 In general, one can estimate covering 5 km² per day; however, additional time is required to travel between AOIs and for data collection in areas with steep terrain, high road network density, narrow roadways, or low road surface quality. During the mission, other conditions such as weather or availability of secure, open areas will affect time and coverage.

Annex 4 – Budget Template

TABLE A-4: Budget template for a project following the complete GPRH methodology

PURPOSE | ESTIMATE
Drone collection | –
Street view collection | –
Base station rental | –
Shipment of equipment | –
Collection Subtotal | –
Drone and street view imagery - processing | –
Drone imagery - machine learning (and manual inspection of rooftop delineation) | –
Street view imagery - machine learning (classification) | –
GIS database | –
Processing and Digitization Subtotal | –
Continued capacity building | –
Staff | –
Knowledge Transfer and Project Management Subtotal | –
TOTAL | –

CANARIES, SAINT LUCIA. iStock.

Sarah Antos, santos1@worldbankgroup.org
Luis Triveño, ltriveno@worldbank.org
Adam Benjamin, abenjamin1@worldbank.org
Jessica Gosling-Goldsmith, jgoslinggoldsmit@worldbank.org