Results and Performance of the World Bank Group 2020
AN INDEPENDENT EVALUATION
November 30, 2020

© 2020 International Bank for Reconstruction and Development / The World Bank
1818 H Street NW
Washington, DC 20433
Telephone: 202-473-1000
Internet: www.worldbank.org

ATTRIBUTION
Please cite the report as: World Bank. 2020. Results and Performance of the World Bank Group 2020. Independent Evaluation Group. Washington, DC: World Bank.

COVER PHOTO
shutterstock

EDITING AND PRODUCTION
Amanda O’Brien

This work is a product of the staff of The World Bank with external contributions. The findings, interpretations, and conclusions expressed in this work do not necessarily reflect the views of The World Bank, its Board of Executive Directors, or the governments they represent. The World Bank does not guarantee the accuracy of the data included in this work. The boundaries, colors, denominations, and other information shown on any map in this work do not imply any judgment on the part of The World Bank concerning the legal status of any territory or the endorsement or acceptance of such boundaries.

RIGHTS AND PERMISSIONS
The material in this work is subject to copyright. Because The World Bank encourages dissemination of its knowledge, this work may be reproduced, in whole or in part, for noncommercial purposes as long as full attribution to this work is given. Any queries on rights and licenses, including subsidiary rights, should be addressed to World Bank Publications, The World Bank Group, 1818 H Street NW, Washington, DC 20433, USA; fax: 202-522-2625; e-mail: pubrights@worldbank.org.

Contents

1. Introduction
2. Part I: Assessing Performance through Ratings
   World Bank Projects
   Country Programs
   Explaining the Trends
   Responding to COVID-19 and Other Shocks
   IFC Projects
   Explaining the IFC Trends
   MIGA Projects
3. Part II: Assessing Outcome Levels
   Introduction
   Outcome Classification Framework
   Project Outcomes
   Project Outcome Levels and Ratings
   Thematic Area Outcomes
4. Conclusions: Getting to Outcomes
   Findings and Conclusions
   Implications
   Looking Ahead
Bibliography
Photo Credits

Boxes
Box 1.1. Key Terms in This Report
Box 2.1. Aspects of Bank Performance
Box 2.2. Elements of Monitoring and Evaluation Quality
Box 2.3. Smoothing Country Program Ratings
Box 2.4. IFC’s Reforms to Strengthen Upstream Engagement
Box 3.1. The Outcome Level Framework
Box 3.2. IFC’s AIMM System for Setting Project Objectives
Box 4.1. Setting Project Objectives
Box 4.2. The Coronavirus Pandemic and IFC Project Ratings
Box 4.3. A Fresh Approach to Understanding Country Outcomes

Figures
Figure 1.1. Outcome Levels Classification
Figure 2.1. World Bank Project Outcome Ratings, Annual
Figure 2.2. Project Outcome Ratings, FY12–14 and FY17–19
Figure 2.3. Country Program Outcome Ratings, FCV and Non-FCV Countries
Figure 2.4. Decomposing the World Bank Project Rating Increase over FY12–14 and FY16–18
Figure 2.5. Outcome Rating Plotted against M&E Quality Rating
Figure 2.6. Ratings for Country Program Objectives by Type of Objective
Figure 2.7. Country Client Perceptions in FCV and Non-FCV Countries
Figure 2.8. Relationship between Quality at Entry and Project Preparation Time
Figure 2.9. IFC Investment Project Development Outcome Rating (annual data)
Figure 2.10. IFC Advisory Project Development Effectiveness Rating, Three-Year Moving Averages
Figure 2.11. IFC Investment Project Development Outcome Ratings by Industry Group
Figure 2.12. Factors Affecting IFC Investment Performance
Figure 2.13. MIGA Project Development Outcome Rating, Six-Year Rolling Basis
Figure 3.1. Steps in the Outcome Levels
Figure 3.2. Representative World Bank Project Objectives
Figure 3.3. Outcome Levels in IPF and DPF Projects
Figure 3.4. IFC Project and Market Claims’ Outcome Levels
Figure 3.5. Representative Examples of IFC Claims
Figure 3.6. Ratings and Outcome Levels, by Instrument

Tables
Table 3.1. Ratings and Outcome Levels, by Instrument
Table 3.2. Ratings and Outcome Levels for Select Global Practices and Project Types
Table 3.3. Project M&E Rating and Evaluated Projects with Lack of Evidence, by Outcome Level

Abbreviations

COVID-19  coronavirus
CY        calendar year
DPF       development policy financing
FCV       fragility, conflict, and violence
FY        fiscal year
GP        Global Practice
IBRD      International Bank for Reconstruction and Development
ICRR      Implementation Completion and Results Report Review
IDA       International Development Association
IEG       Independent Evaluation Group
IFC       International Finance Corporation
IPF       investment project financing
M&E       monitoring and evaluation
MIGA      Multilateral Investment Guarantee Agency
MS+       moderately satisfactory or above
MTI       Macroeconomics, Trade, and Investment
RAP       Results and Performance of the World Bank Group
S+        satisfactory or better

All dollar amounts are US dollars unless otherwise indicated.

Acknowledgments

Rasmus Heltberg, task manager, led the work for this report under the supervision of Alison Evans, Director-General, Evaluation. The core team included Mariana Branco, Claudia Figueroa Huidobro, Gaby Loibl, Xiaoxiao Peng, Stephen Porter, Melvin P. Vaz, Alena Lappo Voronetskaya, and Yi Yao. Other Independent Evaluation Group colleagues also made valuable contributions, including Harsh Anuj, Ana Belen Barbeito, Leonardo Alfonso Bravo, Eric Cruikshank, Unurjargal Demberel, Hiroyuki Hatashima, Estelle Raimondo, Santiago Ramirez Rodriguez, Luis Alvaro Sanchez, Shiva Sharma, and Ichiro Toda.
Maximillian Ashwill was the lead editor, and JESS3 created the graphics. The Independent Evaluation Group’s quality enhancement panel included Alison Evans, Oscar Calvo-Gonzalez, and Andrew Stone. The external advisory panel included Dr. Jörg Faust, director of DEval, the German Institute for Development Evaluation; Tamar Manuelyan Atinc, retired World Bank staff; and Hans M. Boehmer, retired World Bank staff and adjunct faculty, Columbia University.

Overview

This Results and Performance of the World Bank Group (RAP) assesses the World Bank Group’s performance in two ways: by analyzing the achievement of project and program objectives through ratings and by classifying project objectives according to their outcome levels. The report examines performance and outcomes from different perspectives using evidence from the Bank Group’s results measurement systems. Previous RAPs relied on the project and country program ratings that these systems collect. This report breaks with tradition and draws on the systems’ larger evidence base, beyond ratings, to classify outcome levels for World Bank and International Finance Corporation (IFC) projects. It also reviews how results measurement systems for select corporate priorities add up results and derives implications for the Bank Group’s outcome orientation. The shift in focus beyond ratings responds in part to the Board of Executive Directors’ request for more evidence on the Bank Group’s development outcomes and outcome orientation. The data in this report cover a period ending in 2019 and do not show the coronavirus (COVID-19) pandemic’s consequences for outcomes and performance, though the report identifies some implications for the Bank Group’s COVID-19 response.
Part I: Assessing Performance through Ratings

World Bank Projects and Country Programs

Independent Evaluation Group project data for fiscal year (FY)19 show that 79 percent of World Bank lending operations were rated moderately satisfactory or above (MS+) at completion. This compares with 81 percent in FY18. Looking back over a longer period, the share of closed projects rated MS+ was 71 percent in FY09, declining to 63 percent in FY13 and rising since then. Measured by volume, 82 percent of lending operations were rated MS+ in FY19, staying relatively constant since FY13. When results for FY12–14 and FY17–19 are compared, outcome ratings for investment project financing (IPF) show improved performance, from 68 percent MS+ to 81 percent MS+, and the share of development policy financing (DPF) operations rated MS+ decreased modestly from 72 to 69 percent.

Ratings increased in nearly all Regions and Global Practices. The Middle East and North Africa Region had the largest outcome ratings increases and now has the highest rating at 93 percent MS+ in FY17–19. Among the two Africa Regions, Western and Central Africa increased from 52 percent MS+ in FY12–14 to 71 percent MS+ in FY17–19. Eastern and Southern Africa was also at 71 percent MS+ in FY17–19 and had remained stable at that level over the period.

Project outcome ratings in countries affected by fragility, conflict, and violence (FCV) show improvement but continue to lag those in non-FCV countries. Between FY12–14 and FY17–19, the share of MS+ projects in FCV-affected countries increased from 69 to 77 percent compared with an increase from 69 to 81 percent in non-FCV-affected countries.

Other types of project ratings also increased over the past decade. Bank performance improved from 69 percent rated MS+ in FY13 to 84 percent in FY18 and 82 percent in FY19.
Quality at entry ratings increased from 58 percent MS+ for projects that closed in FY14 to 75 percent for projects that closed in FY18 and FY19. Monitoring and evaluation quality ratings increased from 31 percent of projects rated substantial or above in FY09 to 51 percent rated the same in FY19. The improvement across these aspects of World Bank project performance, together with broadly conducive economic and institutional conditions in many larger countries during project implementation, helps explain the overall positive outcome ratings trends.

Beyond the project level, ratings for country program outcomes reached 72 percent MS+ in FY19, up from 51 percent MS+ in FY09. This increase occurred in International Bank for Reconstruction and Development countries, while country program outcome ratings stayed flat in International Development Association (IDA) and FCV-affected countries. Country program performance was particularly low in FCV-affected countries because of external challenges, including large shocks for which country programs are often not sufficiently prepared. Weak political and technical capacity of governments in FCV-affected countries also explains the lower performance rating for projects focused on institutional and governance reform compared with those focused on service delivery.

Responding to COVID-19

Bank Group teams are preparing COVID-19 response projects under tight deadlines amid complex economic and public health contexts. Projects’ quality at entry could suffer because the teams have less time and opportunity to conduct foundational work, client dialogues, and relationship building. Consequently, more frequent project and country program course corrections might be needed during implementation to respond to shocks and unforeseen circumstances and to mitigate issues associated with shorter project preparation time. Simpler procedures for restructuring and canceling projects could enable course corrections.
Additionally, in low-capacity settings, teams could consider reducing country program scope when adding COVID-19 response components to avoid overtaxing countries’ low implementation capacity.

IFC Investment and Advisory Projects

IFC investment project ratings for calendar year (CY)18 are the first to show a slight improvement after a 10-year decline. The CY18 data show that 43 percent of IFC investment projects were rated mostly successful or better on development outcome, down from a peak of 75 percent in CY08 but slightly up from 40 percent in CY17. Measured by net commitment volumes and three-year moving averages, IFC’s development outcome ratings declined from 83 percent rated mostly successful or better in CY07–09 to 43 percent in CY16–18 and 48 percent in CY17–19. Over this longer period, performance declined for all Regions, industry groups, country categories, and equity and loan instruments.

A combination of internal work quality issues, external risk factors, and broader market trends helps explain IFC’s performance trends. Issues with IFC staffing, incentives, accountability, and focus on volume targets over development results affected work quality. Market, country, and sponsor risks often distinguished higher-rated projects from lower-rated projects. Projects with strong sponsors and business fundamentals coped better with market risks than projects without those characteristics. Additionally, projects that were better prepared to cope with currency devaluations and political and regulatory risks were more likely to receive higher ratings. Broader market trends may have made IFC’s business model more exposed to risk and weakened the pool of available projects with attractive risk-reward profiles. IFC has taken steps to improve work quality, focus on development results, grow the pool of bankable investment projects, and better identify risks and market opportunities.
Development effectiveness ratings began to improve for IFC advisory services projects evaluated in FY17–19, when 50 percent of them were mostly successful or better. The share rated mostly successful or better had declined from 65 percent in FY12–14 to 38 percent in FY15–17. Measured by funding amounts, development effectiveness ratings declined from 70 percent mostly successful or better in FY12–14 to 33 percent in FY15–17 but then increased to 49 percent in FY17–19. Successful advisory projects often had strong client commitment, flexible and proactive supervision, and robust project monitoring and evaluation.

Multilateral Investment Guarantee Agency Projects

Multilateral Investment Guarantee Agency (MIGA) projects’ development outcome ratings have continued on an increasing trend. These ratings increased from 64 percent satisfactory or better (S+) in FY07–12 to 69 percent S+ in FY13–18 when calculated by number of projects, and from 61 percent to 75 percent S+ when calculated by gross issuance amounts. MIGA projects in IDA and FCV-affected countries achieved high ratings: for example, 77 percent of MIGA projects in IDA countries were S+ compared with 63 percent in non-IDA countries in FY13–18. An analysis of MIGA projects in IDA countries found that MIGA promoted private sector investment by deterring political risks and resolving issues such as arrears payments by governments.

Part II: Assessing Outcome Levels

Classification Framework

This RAP uses a theory of change framework to classify outcome levels, thus providing new information on the most common types of Bank Group project outcomes. The framework captures the intended and achieved outcomes of World Bank projects and the intended outcomes of IFC projects.
The four outcome levels are the following:
1. Outputs from Bank Group projects and activities
2. Early outcomes such as a new capacity or better access to public services
3. Intermediate outcomes such as a meaningful change in policy outcomes or beneficiaries’ lives
4. Long-term outcomes with systemic effects nationally or across sectors that contribute to general well-being

Outcome Levels

Project objectives cluster in clear outcome patterns depending on the sector and lending instrument. The patterns show that most IPF objectives focus on quality and access to services and cluster at level 2. However, IPF objectives in a few sectors (most notably agriculture and environment) have a clearer focus on end beneficiaries and cluster at level 3. Most DPFs, which focus on policy reform objectives, cluster at level 3, as do recently approved IFC projects, which often focus on market creation objectives.

The relationship between projects’ outcome levels and their performance rating is only modest and becomes insignificant when controlling for other factors. Ratings for projects with level 3 and 4 outcomes are modestly lower than for projects with level 2 outcomes, but the difference in ratings is insignificant when controlling for instrument and monitoring and evaluation quality. Many projects with higher-level objectives manage to achieve good Independent Evaluation Group ratings, in part by having strong results frameworks to measure outcome achievement. This finding suggests that there is no systematic trade-off between projects’ outcome level and ratings, though it would not be realistic or desirable to expect all World Bank projects to have objectives at outcome level 3 or 4.
Differences in rating performance between IPFs and DPFs and between the lowest-rated Global Practice and other Global Practices appear more closely associated with levels of risk and the inherent difficulty in achieving policy and institutional reforms compared with service delivery improvements. In practice, evaluation methods also differ between IPFs and DPFs, which may play a role.

Thematic Area Outcomes

This RAP finds that the Bank Group clearly articulates higher-level outcomes for its global and thematic work in key thematic areas such as FCV, gender, and climate change. Results measurement systems in these thematic areas serve an essential accountability function by assuring that business units meet output and process targets, which are under the Bank Group’s direct control. Yet a strong focus on monitoring targets can cause a risk-averse corporate culture and lead to box-checking behavior, meaning perfunctory rather than substantive compliance. Overall, systems that measure thematic area results do little to orient the Bank Group toward achieving higher-level outcomes.

Conclusions: Getting to Outcomes

This RAP concludes that the Bank Group often has limited evidence of its higher-level outcomes and can improve how its incentives and results measurement systems support outcome orientation. Projects’ objectives need to balance realism and ambition, and therefore, one should not expect all projects to have higher outcome levels. There are more opportunities to gather evidence on broader outcomes at the country program level.

Confronting trade-offs related to the purposes of the Bank Group’s results measurement systems is necessary for improving outcome orientation. The Bank Group’s results measurement systems collect evidence needed for ratings and for process and compliance monitoring.
Systems collect little evidence on the Bank Group’s contributions to higher-level outcomes, partly because such outcomes are hard to monitor and combine. At the project level, setting objectives and assessing achievements that can be attributed to Bank Group support continue to be important for the institution’s accountability. Beyond the project level, there is a need to rethink the approach to collecting outcome evidence. A suitable approach would downplay ratings-based accountability, focus on contribution rather than attribution, and help stakeholders understand how different projects and types of Bank Group engagements collectively contribute to country-level outcomes over a longer period.

Management Comments

Management of the World Bank Group institutions welcomes the Independent Evaluation Group (IEG) report, Results and Performance of the World Bank Group 2020 (RAP 2020). Management welcomes the positive overall findings by IEG on performance at both the project and country program levels. The report’s findings provide useful inputs to both learning and strategic decision-making.

World Bank Management Comments

Management welcomes IEG’s positive overall findings regarding performance at the project and country program levels. The report notes that 79 percent of World Bank projects that closed in fiscal year (FY)19 and were evaluated by IEG were rated moderately satisfactory or better (MS+) at completion, surpassing corporate targets (75 percent), and also that project ratings by volume have remained above corporate targets since FY13. Bank Group country program outcome ratings increased from 51 percent MS+ in FY09 to 74 percent MS+ in FY17 across all reviewed country program cycles. Management believes that these positive trends are partly the result of proactive management of quality at entry and enhanced focus on supervision.
Management notes that although project outcome ratings for International Development Association (IDA) countries and countries affected by fragility, conflict, and violence (FCV) have improved, country program outcome ratings stayed flat in these countries. Ratings for projects in IDA countries increased from 68 percent to 78 percent MS+ between FY12–14 and FY17–19. Projects in FCV-affected countries rose from 69 percent to 77 percent MS+ over the same period. Management notes, however, that these improvements were not concomitantly reflected in country outcome ratings. Management believes that in addition to the more difficult contexts, this trend is partly explained by a somewhat rigid use of results frameworks, which penalizes course correction in countries that necessitate more flexibility. Management is reassured that this challenge has also been pointed out by IEG in the report Outcome Orientation at the Country Level and that pilot solutions are being explored. Anticipating this, the February 2020 FCV strategy articulated, as one of its operationalizing measures, that the World Bank would enhance its evaluation framework for country programs in FCV settings by encouraging more realism (both in objective setting and in project design and implementation) and would also make the evaluative framework more adaptable to dynamic circumstances and to situations of low institutional capacity and high levels of risk and uncertainty.

Management is pleased to note the positive trends for Bank performance (82 percent in FY19), quality of supervision (86 percent in FY19), quality at entry (75 percent in FY18–19), and monitoring and evaluation (M&E; 51 percent in FY19) and acknowledges room for continuous improvement. Management is particularly reassured to note that sustained efforts to improve M&E are starting to yield results.
Given that IEG ratings are based on closed operations, which were designed 5–8 years earlier in most cases, this improvement in M&E quality ratings demonstrates that recurrent management efforts, such as enhanced tools, guidance, training, and resources for staff to strengthen project quality and promote more robust M&E practices (including a stronger focus on intervention logic, results frameworks, results indicators, and M&E), are proving effective. Just recently, management launched an M&E Gateway to serve as a one-stop clearinghouse for M&E resources across the World Bank. These efforts are expected to improve M&E quality further, particularly as management rolls out its pathways to strengthen outcome orientation in operations and country partnership frameworks.

Management recognizes the need to ensure that development policy financing (DPF) projects perform as strongly as investment project financing (IPF) projects but does not see sufficient evidence of a deteriorating trend. The report notes that the share of DPF operations rated MS+ decreased modestly to an average of 69 percent between FY17 and FY19, from 72 percent between FY12 and FY14. The report rightly identifies that “some risks relate to the nature of the DPF instrument itself” and that “evaluation methods also play a role, as, de facto, they differ between IPFs and DPFs.” What the report does not make explicit is that, in FY19, only three DPFs were added to the analysis of DPF outcome ratings, with the result that performance changes period-over-period were driven in part by a sample too small to be representative.

Management notes the new outcome classification that IEG has explored in the evaluation and is reassured by IEG’s finding that operations that specify higher-level outcomes can perform well when the operational context is right. At the same time, management urges caution regarding the inference that operations with higher-level development objectives are more ambitious.
Although all operations use a theory of change approach, not all project development objectives are explicitly set at the same level for multiple reasons. For example, identifying objectives that can be attributed to World Bank interventions continues to be important for accountability and transparency. It is also part of applying the theory of change rigorously and consistently and requires elaborating objectives at lower levels, where attribution is typically stronger. This underpins the report’s finding that approximately 72 percent of IPFs state their outcome objectives at level 2, and another 26 percent at level 3. These level 2 outcomes (for example, improved quality or access to social- or infrastructure-related public services) are relatively easier to attribute to World Bank support by the time the project closes, and clients often favor them. Experience has also shown that, for IPFs, level 4 outcome objectives are hard to reach. Human Development projects, for example, rarely go to level 4 outcomes because these outcomes take time to be realized and are often affected by factors outside the control of a single project or program. That said, management recognizes that more needs to be done to ensure that operations establish more explicit lines of sight toward higher-level outcomes and international commitments. This is an intended effect of the World Bank’s renewed efforts to strengthen outcome orientation and to connect the risk and results thinking.

Management is of the view that corporate results measurement is instrumental to advance outcome orientation. The report argues that there is a “trade-off” between the World Bank’s outcome orientation and its existing Results Measurement Systems (RMS) for tracking commitments. Management views these as complementary.
The IDA RMS and the Bank Group Corporate Scorecard, along with other corporate reporting systems, are designed to monitor short- to medium-term results, which can be attributed to individual projects and aggregated for portfoliowide reporting. It is possible and often desirable to indicate how these same short- to medium-term results contribute to longer-term results that are higher up the same results chain. Both the RMS and the Corporate Scorecard incorporate long-term development outcome indicators (tier I) to contextualize the global development environment in which we are operating, in addition to reporting aggregate World Bank–supported results. Inclusion of selected indicators, accompanied by targeted actions, in these instruments has proved effective over time to advance new priorities and commitments, such as gender, climate change, and citizen engagement.

International Finance Corporation Management Comments

Management of the International Finance Corporation (IFC) welcomes IEG’s Results and Performance of the World Bank Group 2020 report. The report provides both an assessment of project performance and a review of whether the set project objectives are sufficiently focused on development outcomes, that is, whether development impact is accurately and adequately captured by aggregated project performance metrics. IFC management appreciates the increasingly collaborative approach by IEG in support of improvement in the methodologies underpinning the assessment of IFC’s development performance. In addition to IFC’s initiative to engage IEG on the impact of the coronavirus (COVID-19; described in box 4.2 of the RAP), IFC looks forward to sustaining this engagement on methodology. IFC values both IEG’s and the Board of Executive Directors’ willingness to consider this question because it is important to assess whether the metrics we have are appropriate for what we want to measure and ultimately know about our impact.
IFC management regrets that the treatment and presentation of the performance data in chapter 2 makes it impossible to discuss IFC’s current direction with respect to addressing a historic trend in performance. New readers of the report may fail to appreciate that what is being reported here is not IFC’s current performance but rather how IEG has evaluated projects for which objectives were set approximately seven years ago. The most recent investment projects evaluated in the report were approved by the Board in calendar year (CY)13, and were part of the CY18 Expanded Project Supervision Report (XPSR) cohort, which IEG evaluated during CY19. Similarly, for advisory projects, the cohort includes only a preliminary sample of projects closing in FY19, with most conclusions being drawn from projects designed an average of seven years ago. IFC’s ongoing efforts in the past two years to improve performance have started showing noticeable results in the CY19 evaluations and ratings, which may not be apparent to shareholders and stakeholders until the 2021 RAP. For improvements related to quality at entry, an even longer period will be required for these to be reflected in the RAP results. It is worth further noting that, given the evaluation and reporting time lag, projects assessed ex ante under the Anticipated Impact Measurement and Monitoring (AIMM) framework at the time of Board approval will start being evaluated as part of the CY22 XPSR program, with results being validated and reported by IEG in subsequent years. IFC is nevertheless pleased to see that the work mentioned to address declining development effectiveness ratings for advisory services projects in management responses to previous RAPs is reflected in recognizable year-on-year improvements in performance.
IFC believes that to ensure that this is clearly understood by all readers, including new readers, IEG should provide clearer metadata and signposting with respect to exactly what is being described, including in the headings of graphics. In this context, IFC management believes that it is worth restating observations made previously with respect to the sustained effort to turn around IFC’s performance, as this provides context and brings the report up to date. IFC has previously highlighted that the effort to deliver greater development effectiveness has included the creation of the Economics and Private Sector Development Vice Presidential Unit to strengthen project and macroeconomic analyses, the launch of the AIMM framework, and the Accountability Initiative. The latter initiative informed subsequent decisions in the operational realignment, including very significant changes to the Accountability and Decision-Making framework. The strengthening of IFC’s operational practices and processes is ongoing. IFC management has initiated multiple efforts to improve the quality of self-evaluations (Expanded Project Supervision Reports [XPSRs] / Project Completion Reports) and proactively engage on other associated activities, including the review of validation notes (EvNotes) and IEG independent evaluations undertaken on closed projects (Project Evaluation Summaries). This effort comprises targeted, expert advice to strengthen the analysis and articulation of a project’s overall outcome, including development impact, along with increased support to facilitate the effective management and processing of XPSRs and Project Completion Reports or Project Evaluation Summaries. 
IFC management wishes to complement the report’s consideration of an outcome-based approach to evaluation by noting that IFC has already made a conscious decision to go beyond the direct impacts captured by the Development Outcome Tracking System and include indirect effects of IFC investments (on market creation and development impact). In addition, through AIMM, IFC introduced an ex ante component to complement the existing ex post M&E approach. Furthermore, IFC strengthened the broader corporate incentives to put development impact at the heart of IFC’s decision-making with respect to where and how to deploy IFC’s scarce resources by ensuring that AIMM scores inform the project assessment and approval process. IFC management welcomes the restatement in the report of earlier work to consider risk factors that influence project performance. However, these findings suggest the need for an evaluation approach that accounts for external shocks (generally unexpected and beyond the control of either IFC or our clients) and allows for a more systematic treatment of risk, which the report does not fully consider. IEG’s machine learning exercise discussed in the report confirms the results of an earlier IFC-IEG joint study, which found that sponsor-selection risks, market risks, country risks, and transaction structuring are the factors that most frequently distinguished investment projects with good ratings from less successful ones. IFC appreciates the observation made in the report that, over time, IFC has taken on (and will continue to take on as part of the IFC 3.0 strategy) greater risks over which it can only exercise limited control.
In this context, which is further exacerbated by the forces unleashed as a result of the COVID-19 crisis on these extant risks, there is a clear need to pursue a more systematic approach to performance measurement and evaluation that factors in the changing nature of IFC’s business model to one with an inherently higher risk tolerance, in a dynamic private sector environment. Encouraging private sector investment as part of recovery from COVID-19 will demand that IFC encourage sponsors of projects to consider new business lines within broader reallocation of resources by the private sector across sectors within the economy, and that IFC’s clients and portfolio absorb the exogenous shock of weak consumer demand. This is in addition to the policy decision in the Forward Look to take on more country risk, implicit in the pivot to IDA and fragile and conflict-affected situations (FCS). In this context, there is a distinct possibility that the recently observed improvement in development outcome ratings as a result of IFC’s turnaround efforts, may not be sustained due to the disruption experienced by projects designed in a pre–COVID-19 environment. Even in instances in which IFC’s investments deliver a positive development outcome in the context of the current crisis, for some projects the precrisis designed objectives may not be achievable in a radically changed environment. IFC management welcomes the inclusion of a summary outline of IFC’s reforms to strengthen upstream engagement. As the report highlights, as with previous IFC initiatives in recent years, the goal is improved coordination toward greater development outcomes. Upstream is a more proactive way of doing business by getting involved earlier in the sector and project development process, including conceiving opportunities for unlocking sectors of the economy and conducting feasibility studies to generate investment-ready opportunities. 
It is IFC’s most recent initiative, perhaps the most critical building block of the internal reforms IFC has implemented over the past four years. IFC believes that upstream will be an important component of the institution’s response to the restructuring and recovery phase of the pandemic and key to an effective crisis response. It highlights the potential value of a shift away from a project-by-project approach to evaluation of IFC’s performance toward a model more anchored in outcomes, as failures are implicit in the design of a more adaptive project discovery approach within IFC’s broader business model. Evaluating success or failure with respect to the impact of these actions will demand a “bigger picture” perspective of IFC’s overall performance than can currently be captured in the RAP. IFC management notes that the report sets out that there is a need for instruments suited for collecting higher-level outcome evidence. However, it is not clear what form a new system might take or what the effect could be on resources. IFC believes that, should an outcomes-based approach be pursued, it will be necessary to explore the detailed implications, including the implications for costs, staff, and existing systems. The creation and implementation of the AIMM approach and training of staff has required a very considerable effort over the past four years, which suggests that we should approach this question as one of continuous and steady evolution of our understanding of the contribution we make to effecting change.

Multilateral Investment Guarantee Agency Management Comments

MIGA welcomes RAP 2020. The Multilateral Investment Guarantee Agency (MIGA) welcomes IEG’s Results and Performance of the World Bank Group 2020 report and finds it useful and important. MIGA commends IEG for streamlining and sharpening the focus of the RAP 2020 report and exploring new themes.
MIGA thanks IEG for the productive engagement during the drafting of the report. Historically high MIGA development results. The report presents many useful findings, and MIGA appreciates IEG’s observations. In particular, the report notes the steady increase in the development outcome success rates of MIGA guarantee projects over the past 10 years. The development outcome success rate for the period under review, FY13–18, reached the MIGA historic high of 69 percent by number of projects (n = 71) and 75 percent by gross issuance amount ($7,725 million). The increase in the development outcome success rate has been driven by strong performance in IDA (74 percent by number, 82 percent by amount), FCV (78 percent by number, 84 percent by amount), the Energy and Extractive industries sector (79 percent by number, 87 percent by amount), and the Eastern Europe and Central Asia Region (73 percent by number, 78 percent by amount). MIGA notes that the sustained increase in development outcome success rates to historic high levels validates the Agency’s increased emphasis on underwriting impactful projects in difficult settings and increased attention to monitoring, evaluation, and learning. In addition, MIGA’s efforts to diversify the Europe and Central Asia portfolio away from financial markets, which were adversely impacted by the 2008 global financial crisis, to other sectors have been instrumental in improving overall performance, as noted in the report. Good performance of IDA and FCS projects. The report finds that MIGA played an active and important role in promoting private sector investment through projects in IDA and FCS countries. MIGA considers the good IDA and FCS performance to be an important foundation for the Agency’s FY21–23 strategy, which emphasizes continued support for IDA and FCS as strategic priorities.
MIGA notes that the strong IDA and FCS results bode well for the Agency’s ambition for further deepening the development impact of MIGA guarantee projects. Remarkable progress in environmental and social (E&S) performance. MIGA welcomes the report’s recognition of the remarkable progress made regarding the E&S results of MIGA guarantee projects. During FY13–18, E&S effects was the highest-rated development outcome indicator, with a success rate of 84 percent by number and 88 percent by amount, compared with 50 percent by number and 46 percent by amount during FY07–12. MIGA notes the rapid strides made in E&S monitoring and supervision after the adoption of Performance Standards on Social and Environmental Sustainability in 2007 and the launching of E&S policy implementation monitoring in MIGA guarantee projects in 2011. MIGA notes that the strong E&S results highlighted in the report have been on account of the Agency’s enhanced E&S monitoring and supervision efforts of its guarantee projects. MIGA notes the good example cited in the RAP 2016 report of an oil and gas sector project in Uzbekistan, where the MIGA team helped solve critical E&S issues by convening external industry experts. COVID-19 and MIGA projects. The report states that IEG and IFC are discussing potential adjustments to project ratings to account for shocks like COVID-19, including making project objectives more realistic by rating projects based on projects’ midcourse correction targets rather than those set at approval before the shock occurred and giving IFC more flexibility to choose the evaluation timing, which may help projects recover and meet targets at a later time. MIGA notes that, given the broad similarities between the ex post project evaluation frameworks for IFC investment projects and MIGA guarantee projects, COVID-19–type shocks impact MIGA projects as well. 
MIGA looks forward to working with IEG and exploring similar rating and evaluation timing adjustments to MIGA guarantee projects, which are facing broadly similar challenges to IFC investment projects from the COVID-19 pandemic. Characteristics of MIGA guarantee projects. In its discussion on the historically high development outcome success performance, the report delineates some key characteristics of MIGA guarantee projects, which are reflective of the Agency’s mandate and business model: (i) MIGA’s clients are larger multinational investors; (ii) MIGA political risk insurance guarantees against political risks; (iii) the relatively large size of MIGA-supported projects makes them visible in host countries and motivates governments to help them succeed; and (iv) MIGA originates the majority of its projects from part 1 countries. In addition to these project characteristics, MIGA notes the significant initiatives that the Agency has undertaken to (i) enhance project selection; (ii) strengthen assessment, underwriting, and monitoring; (iii) bolster results measurement systems; (iv) implement an ex ante development impact assessment system; and (v) promote learning from evaluation. These measures have played a critical role in the steady improvement in the development outcome success rates of MIGA guarantee projects to the current historically high levels of 69 percent by number and 74 percent by amount. MIGA also notes an important caveat to the report’s reference to the relatively larger size of MIGA guarantee projects: MIGA support for smaller guarantee projects, including small and medium enterprises, through the Small Investment Program (https://www.miga.org/small-investment-program) is evaluated on a programmatic basis rather than at the project level.
In other words, MIGA support for small guarantee projects is not reflected in IEG’s RAP 2020 project evaluations database, and therefore the report’s reference to the “relatively large” size of MIGA guarantee projects is not fully accurate.

1. Introduction

This is the 10th Results and Performance of the World Bank Group (RAP) by the Independent Evaluation Group (IEG). The RAP assesses the World Bank Group’s performance by analyzing the achievement of project and program objectives through validated ratings and by classifying these objectives according to their outcome levels. It also explains key results and performance trends and discusses ways in which the Bank Group can continue to enhance its results measurement systems and outcome orientation. Shifting the focus beyond ratings was partially in response to the Board of Executive Directors’ request for more evidence on development outcomes and outcome orientation. It was also prompted by the recent capital increases to the International Bank for Reconstruction and Development (IBRD) and the International Finance Corporation (IFC), the International Development Association (IDA) Replenishment, and the need to report on a wider range of project and country outcomes from that expanded resource base.

Box 1.1. Key Terms in This Report
Project development objectives: World Bank projects’ stated objectives framed as a positive outcome. In the International Finance Corporation’s new Anticipated Impact Measurement and Monitoring system, project claims and market claims are similar statements of objectives or intended outcomes.
Outcome orientation: a term used when the World Bank Group generates credible evidence on the outcomes from its development interventions and uses this evidence to engage clients and adapt interventions and portfolios to bolster performance
Outcomes: changes in behaviors, conditions, or situations resulting from Bank Group activities. Outcomes include intended, unintended, positive, and negative changes.
Ratings: a measure of projects’ and programs’ success relative to objectives stated at approval or revised subsequently. Different aspects of projects and programs have separate ratings. For World Bank projects, the outcome rating measures how effectively and efficiently the project achieved its relevant objective.
Results: an all-encompassing term that refers to the outputs and outcomes from a development intervention
Results measurement systems: measurement systems that add up ratings and indicators from multiple projects and programs. The Bank Group has different primary results measurement systems for its main business lines, including World Bank, International Finance Corporation, and Multilateral Investment Guarantee Agency projects and country programs. The Bank Group also has different aggregated results measurement systems, such as the Corporate Scorecards and the results measurement systems for International Development Association, gender, and climate change.
Self-evaluation: the formal, empirical assessment of a project, program, or policy written by or for those in charge of the activity
Validation: the Independent Evaluation Group’s independent, critical review of the evidence, results, and assessments from self-evaluations
Source: Independent Evaluation Group.

This report examines ratings and outcomes from different perspectives using evidence from the Bank Group’s results measurement systems (see box 1.1 for key terms).
Previous RAPs have relied on the project and country program ratings that these systems collect to understand the Bank Group’s results and performance. However, these results measurement systems contain much more evidence and many more indicators beyond project and program ratings. This report breaks with tradition and analyzes this larger evidence base to also describe outcomes and classify outcome levels, particularly for closed and rated World Bank projects and for recently approved World Bank and IFC projects. It also reviews how results measurement systems for select corporate priorities add up results. To do so, the report synthesized sectoral theories of change derived from World Bank and IFC projects, among other sources, to build an outcome classification framework that could classify interventions’ stated objectives along a change pathway. Figure 1.1 defines this framework. In doing so, the report could examine different types and levels of outcomes and how these relate to performance, and assess the line of sight—or connection—between the Bank Group’s results measurement systems and higher-level outcomes. 
Figure 1.1. Outcome Levels Classification
Level 1, Outputs: Activities and delivered outputs, such as knowledge products, goods, equipment, and services
Level 2, Early or Immediate Outcomes: New capacities and better access to public, private, or environmental services
Level 3, Intermediate Outcomes: Meaningful change in policy outcomes or beneficiaries’ lives
Level 4, Long-Term Outcomes: Sustained long-term outcomes that eventually arise with sustained changes in delivery, governance, or citizens’ well-being
Source: Independent Evaluation Group. Refer to the full methodology in part II.

This report is in two parts. Part I is on performance as assessed through ratings and reports ratings trends for projects and country programs and identifies explanatory factors behind portfolio performance. Part II is on assessing outcome levels and classifies objectives according to their outcome levels, examines links between performance and outcome levels, and discusses results measurement systems’ outcome orientation. The RAP concludes with some key findings and implications for the Bank Group’s coronavirus (COVID-19) pandemic response and its outcome orientation.

2. Part I: Assessing Performance through Ratings

This chapter reports Bank Group ratings trends for World Bank projects, the Bank Group’s country programs, IFC investment projects and advisory services, and Multilateral Investment Guarantee Agency (MIGA) projects. It also explains some major trends and patterns in ratings, focusing on World Bank projects and IBRD country programs’ positive performance trends; programs in countries affected by fragility, conflict, and violence (FCV); and IFC’s less positive performance. In line with common practice, the chapter treats ratings as a success metric. Ratings measure projects’ achievement relative to objectives and targets stated at approval or revised subsequently. Ratings are not comparable across the three Bank Group institutions because of differences in mandates and business models.

World Bank Projects

Overall outcome ratings for World Bank lending are high. Of the 167 Project Completion Reports for projects that closed in fiscal year (FY)19 and were validated by IEG, 79 percent were rated moderately satisfactory or above (MS+) on achieving their stated outcomes. This is a slight decrease from 81 percent in FY18. Looking back over a 10-year period, outcome ratings declined from 71 percent MS+ for project closures in FY09 to 68 percent MS+ in FY13, and they increased again to 81 percent MS+ in FY18 and 79 percent in FY19. A numerical conversion of the ratings scale, done to test the trend’s robustness, shows the same pattern of ratings declines until FY13 and increases afterward (figure 2.1). Because of the improved project performance, outcome ratings increasingly cluster in the moderately satisfactory or satisfactory points of the scale.

Figure 2.1. World Bank Project Outcome Ratings, Annual
[Line chart, 2009–2019. Series: average rating (six-point scale) and outcome rated MS+ (percent). Axes: rating, outcome rated MS+ (percent), and year.]
Source: Independent Evaluation Group.
Note: The dark blue line shows the numerical value of the six-point rating scale, which assigns 1 for highly unsatisfactory, 2 for unsatisfactory, and so on, with 6 being highly satisfactory. The light blue line represents the conventional percentage of projects rated moderately satisfactory or above. MS+ = moderately satisfactory or above.

Outcome ratings have been more stable over time when measured by project volume, increasing from 78 percent MS+ in FY08 to 82 percent in FY13, 84 percent in FY18, and 82 percent in FY19. IEG ratings for Bank performance improved from 69 percent of projects rated MS+ in FY13 to 84 percent in FY18 and 82 percent in FY19, reflecting better ratings for quality of supervision and quality at entry, the two components of the Bank performance rating (box 2.1).

Box 2.1. Aspects of Bank Performance
Quality at entry refers to the extent to which the World Bank identified, prepared, and appraised the operation so that it was most likely to achieve planned development outcomes.
Quality of supervision refers to the extent to which the World Bank identified and resolved threats to the achievement of development outcomes and to fiduciary aspects. The rating for quality at entry combined with the rating for quality of supervision determines the Bank performance rating.
Monitoring and evaluation (M&E) quality refers to the design and implementation of the project’s M&E arrangements and the extent to which the data are used to improve performance.
M&E quality is not a formal dimension of the Bank performance rating, though aspects of M&E overlap with quality at entry and quality of supervision.
Source: Independent Evaluation Group.

Outcome ratings increased in most parts of the portfolio. To have robust sample sizes, IEG compared the project closings of three-year cohorts. It compared FY12–14, when outcome ratings were at their lowest point (69 percent MS+), with FY17–19, when ratings were 80 percent MS+ (figure 2.2). Ratings for investment project financing (IPF) operations rose from 68 percent MS+ to 81 percent, and ratings for development policy financing (DPF) operations declined from 72 percent MS+ to 69 percent between FY12–14 and FY17–19, based on preliminary FY19 data. Outcome ratings moved upward in nearly all Global Practices (GPs). Outcome ratings decreased in the Macroeconomics, Trade, and Investment (MTI) GP, which has the lowest ratings among all GPs, at 55 percent MS+ in FY17–19.1 The MTI GP leads on many DPFs. Among the GPs with sizeable portfolios, the Education and Environment GPs’ project ratings increased the most. Currently, the Education GP has the highest ratings, at 92 percent MS+. Part II examines the reasons for the ratings differential between IPFs and DPFs and between the highest- and lowest-rated GPs. Ratings for projects in IBRD countries increased from 71 percent MS+ in FY12–14 to 82 percent in FY17–19. Ratings for projects in IDA countries increased from 68 percent MS+ to 78 percent over the same period. Ratings increased in nearly all Regions. The Middle East and North Africa Region saw the largest outcome ratings increases and now has the highest rating, at 93 percent MS+ in FY17–19. The Africa Region was split into two vice presidential units, effective July 1, 2020. Although the two Africa Regions were both at 71 percent MS+ in FY17–19, their trends differ. Western and Central Africa increased from 52 percent MS+ in FY12–14, but Eastern and Southern Africa remained broadly stable (72 percent MS+ in FY12–14). Outcome ratings in FCV-affected countries increased modestly but remained below those in non-FCV-affected countries. Projects in FCV- and non-FCV-affected countries were both at 69 percent MS+ in FY12–14. Outcome ratings rose to 77 percent MS+ in FCV-affected countries in FY17–19 compared with 81 percent in non-FCV-affected countries (figure 2.2).

1 Again, the fiscal year (FY)19 data are preliminary, so these numbers will change as more projects complete their evaluations.

Figure 2.2. Project Outcome Ratings, FY12–14 and FY17–19 (percent rated MS+)
Each row lists the FY12–14 value followed by the FY17–19 value.

Global Practice
Poverty and Equity: 50, 60
Education: 66, 92
Urban, Resilience, and Land: 76, 86
Social Protection and Jobs: 89, 87
Transport: 71, 83
Environment, Natural Resources, and the Blue Economy: 61, 90
Agriculture and Food: 70, 80
Energy and Extractives: 66, 81
Health, Nutrition, and Population: 78, 82
Finance, Competitiveness, and Innovation: 72, 78
Water: 61, 72
Governance: 54, 64
Macroeconomics, Trade, and Investment: 67, 55

Region
Eastern and Southern Africa: 72, 71
Western and Central Africa: 52, 71
Latin America and the Caribbean: 76, 81
South Asia: 77, 82
Europe and Central Asia: 75, 83
East Asia and Pacific: 68, 88
Middle East and North Africa: 63, 93

Instrument
DPF: 72, 69
IPF: 68, 81

Agreement type
Recipient-executed trust fund: 67, 79
IDA: 68, 78
IBRD: 71, 82

FCV status
FCV: 69, 77
Non-FCV: 69, 81

Source: Independent Evaluation Group.
Note: DPF = development policy financing; FCV = fragility, conflict, and violence; FY = fiscal year; IPF = investment project financing; MS+ = moderately satisfactory or above.

The underlying improved performance trends are also seen in higher ratings for projects’ quality at entry (see definition in box 2.1).
The share of MS+ quality at entry ratings increased from 58 percent for projects that closed in FY12–14 to 75 percent for projects that closed in both FY18 and FY19. Quality at entry ratings increased in all Regions and all Practice Groups except for Equitable Growth, Finance, and Institutions. Projects in FCV-affected countries have similar quality at entry ratings: 73 percent MS+ in FY18, a pattern also seen in previous years. The World Bank maintained strong quality at entry even as it responded to the global financial crisis and increased its annual commitments to client countries by 130 percent. In fact, quality at entry improved for projects that had been approved since FY09. In part, this was possible because the World Bank increased the size of projects under preparation during the global financial crisis more than it increased the number of new projects.2

Monitoring and evaluation (M&E) quality ratings increased over the past 10 years for projects in all Practice Groups and Regions. M&E ratings rose from 31 percent of projects rated substantial or above in FY09 to 51 percent rated the same in FY19. All Regions increased M&E ratings over this period, and the Middle East and North Africa Region’s ratings increased the most. The overall increase in M&E quality ratings masks variation among GPs. Only four GPs achieved M&E ratings of substantial or above on at least half of their projects in FY16–18: Social Protection and Jobs (72 percent); Education (64 percent); Health, Nutrition, and Population (55 percent); and Urban, Resilience, and Land (51 percent). There have been many efforts to enhance tools, guidance, and training for staff to strengthen project M&E quality. Some examples include focusing attention on theories of change in project documents, restructuring projects to improve results frameworks, and building staff capacity by training existing staff and recruiting dedicated M&E specialists.
Even so, interviews and desk reviews suggest that project M&E struggles for attention amid competing operational agendas. About 60 percent of the projects that closed between FY07 and FY18 have a mismatch, or disconnect, between the rating given to M&E quality in the last supervision report (the Implementation Status and Results Report) and IEG’s validation of M&E quality based on the Implementation Completion and Results Report. The size of the mismatch varies quite widely by GP. It is possible that optimism bias is affecting assessments of M&E quality during implementation. The elements of what drives M&E quality are rather intuitive, as described in box 2.2.

2 The World Bank nearly doubled the average size of new projects, from $87 million in FY05–07 to $157 million in FY09–10 (see also World Bank 2012).

Box 2.2. Elements of Monitoring and Evaluation Quality
Good project monitoring involves collecting the right data and using it in the right way. Projects with successful monitoring and evaluation (M&E) have outcome indicators that reflect project objectives without being too complicated. These projects plan and execute data collection that is computerized, quality controlled, aligned with client systems, and integrated into the operation rather than an ad hoc process. Teams use the data to track progress and identify implementation challenges. For example, an irrigation project in Mozambique had a specific project objective with clear, measurable, and directly linked indicators. The theory of change was sound, data collection was planned and executed regularly, weaknesses were corrected, and the team used the data to track progress, adjust the results framework during restructuring, and document project outcomes.
Even better M&E also ensures country ownership over M&E arrangements, seeks to embed project M&E into client monitoring systems, and focuses on collecting useful data that can inform project implementation (versus more compliance-focused data). By contrast, projects with unsuccessful M&E had overambitious or complicated data collection plans and unclear results frameworks, resulting in delayed baseline data, irregular reporting, and information that lacked credibility.
Sources: Independent Evaluation Group; World Bank 2016a.

Country Programs

Bank Group country program outcome ratings have improved over the past 10 years in IBRD countries but not in IDA and FCV-affected countries. Bank Group country program outcome ratings increased from 51 percent MS+ in FY09 to 74 percent in FY17 across all reviewed country program cycles. However, country program outcome ratings stayed flat in IDA and FCV-affected countries. These data are after smoothing, as explained in box 2.3. Among the six Regions, Europe and Central Asia and South Asia had the highest country program ratings in FY08–19, both at 79 percent MS+, and Africa and East Asia and Pacific had the lowest, at 44 and 57 percent MS+, respectively. FCV-affected countries had lower country program outcome ratings over the period, at 50 percent MS+, compared with 66 percent MS+ for non-FCV (figure 2.3).

Figure 2.3. Country Program Outcome Ratings, FCV and Non-FCV Countries
[Line chart, 2009–2017. Series: FCV and non-FCV. Axes: outcome rated MS+ (percent) and year.]
Source: Independent Evaluation Group.
Note: The dark blue line shows the numerical value of the six-point rating scale, which assigns 1 for highly unsatisfactory, 2 for unsatisfactory, and so on, with 6 being highly satisfactory. The light blue line represents the conventional percentage of projects rated moderately satisfactory or above. MS+ = moderately satisfactory or above.

Box 2.3. Smoothing Country Program Ratings
This Results and Performance of the World Bank Group uses a new data smoothing method to compare project ratings across country programs. The Independent Evaluation Group conducts reviews of Completion and Learning Reviews (CLRs) for country programs at the end of every country program cycle, usually every four to five years. With only about 20 CLR reviews per year, the sample size is too small to allow many comparisons and identify meaningful trends. To overcome this data challenge, this report smooths annual data fluctuations by assigning each country program outcome rating to every year of the four-to-five-year CLR period rather than only to the CLR’s exit year. This method increases the number of data points per year and smooths country program outcome ratings over time.
Source: Independent Evaluation Group.

Explaining the Trends

The World Bank operates within country programs. Transforming its technical and financial support into results depends on both the country’s capacity and economic environment and the quality of the World Bank’s support. This RAP explores some of these external and internal factors further for World Bank projects and Bank Group country programs. It finds that improvements in project design, M&E, and supervision, combined with broadly conducive economic and institutional conditions during project implementation (that is, before the pandemic) in many of the larger countries, help explain the overall positive ratings trends. The worse performance in FCV-affected countries can partly be explained by difficult context and large shocks for which country programs in those countries were not sufficiently prepared. IEG used decomposition analysis to account for the factors behind the increase in project outcome ratings between FY12–14 and FY16–18.
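As a rough illustration of what a decomposition of this kind involves, the sketch below applies a generic shift-share split. The change in an aggregate MS+ share is divided, for each portfolio element, into a composition effect (the element's portfolio share changed) and a within-element effect (the element's rating changed). This is a textbook technique and not necessarily the method IEG used; the function name, element names, shares, and ratings are all hypothetical and chosen only to make the arithmetic visible.

```python
def shift_share(before, after):
    """Split the change in a share-weighted aggregate rating by element.

    `before` and `after` map an element name to (portfolio_share, rating).
    Returns a dict mapping each element to (composition_effect, within_effect).
    Average weights are used so the two effects add up exactly.
    """
    effects = {}
    for name in sorted(before.keys() | after.keys()):
        share0, rating0 = before.get(name, (0.0, 0.0))
        share1, rating1 = after.get(name, (0.0, 0.0))
        # Effect of the element taking up more or less of the portfolio.
        composition = (share1 - share0) * (rating0 + rating1) / 2
        # Effect of the element's own rating rising or falling.
        within = (rating1 - rating0) * (share0 + share1) / 2
        effects[name] = (composition, within)
    return effects


# Hypothetical two-element portfolio: (share of portfolio, percent rated MS+).
before = {"Element A": (0.6, 70.0), "Element B": (0.4, 60.0)}
after = {"Element A": (0.5, 80.0), "Element B": (0.5, 75.0)}

effects = shift_share(before, after)
for name, (composition, within) in effects.items():
    print(f"{name}: composition {composition:+.2f}, within {within:+.2f}")

total_change = sum(c + w for c, w in effects.values())
print(f"Total change in aggregate rating: {total_change:+.2f} percentage points")
```

Because average weights are used, the two effects for each element sum exactly to that element's contribution, and the contributions sum to the total change in the aggregate rating.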
The analysis decomposed the overall increase in World Bank project ratings over the period into changes in the size of different portfolio elements (such as Region, country, GP, lending instrument, and so on) and changes in the ratings for the portfolio elements. Figure 2.4 shows how much each portfolio element contributed to the total ratings increase. Decomposed this way, the increased portfolio share of projects in the South Asia Region (from 10 to 15 percent of the total portfolio size), together with modest improvements in project ratings, was an important contributor to improved performance ratings overall. Bangladesh, China, and Pakistan all had growing ratings and portfolios, thus increasing the total. IPF projects were the biggest contributor to improved average project outcome ratings.

Figure 2.4. Decomposing the World Bank Project Rating Increase over FY12–14 and FY16–18
[Bubble chart. Portfolio elements shown: Practice Groups (Sustainable Development; Human Development; Infrastructure; Equitable Growth, Finance, and Institutions); Global Practices (Energy and Extractives; Environment, Natural Resources, and the Blue Economy; Health, Nutrition, and Population; Education; Agriculture and Food; Urban, Resilience, and Land; Poverty and Equity; Social Protection and Jobs); FCV status (FCV; non-FCV); project size (less than $10 million; $10–30 million; $30–100 million; larger than $100 million); and countries (Nepal, Bangladesh, Madagascar, Kenya, Pakistan, Philippines, Peru).]
Source: Independent Evaluation Group.
Note: The circle sizes represent how much each portfolio element contributed to the total ratings increase. FCV = fragility, conflict, and violence; FY = fiscal year; GP = Global Practice; M = millions; PG = Practice Group.

A conducive institutional and economic environment and good performance in many of the World Bank’s larger client countries contributed to improved project ratings.
Many of the larger client countries saw good rates of economic growth and an uptick in their Country Policy and Institutional Assessment scores over this period.3 Studies have found a positive and statistically significant influence of economic growth and Country Policy and Institutional Assessment score on World Bank projects’ performance (Geli, Kraay, and Nobakht 2014; World Bank 2018b). IEG’s qualitative analysis of 14 projects rated highly satisfactory and 14 rated highly unsatisfactory found that the successful projects often benefited from a conducive context with strong political support and an enabling policy and regulatory framework. The opposite was true for the unsuccessful projects, which also suffered from political instability and clients’ weak implementation and coordination capacity.

3 The Country Policy and Institutional Assessment score is an indicator of countries’ policy framework and institutional capacity.

There is evidence of improvement across several aspects of the World Bank’s work quality. Most projects are designed well, as judged from the positive quality-at-entry ratings. The increase in projects’ M&E quality helps explain the increasing outcome ratings. Ratings methodology plays a role because IEG gives poor ratings to projects with insufficient evidence of their achievement. Regression analysis that attempts to control for the role of ratings methodology has shown that World Bank projects with good-quality M&E tend to have substantially, and statistically significantly, higher ratings on outcomes than similar projects do (Raimondo 2016). The correlation between M&E quality and outcome ratings has held up over time and, in fact, has increased somewhat. As a result, when outcome ratings are plotted against M&E quality, the slope has become modestly steeper (figure 2.5).
The analysis of 14 projects rated highly satisfactory and 14 rated highly unsatisfactory found that M&E data collection and use of data for decision-making were among the most frequent distinguishing factors. IEG ratings for supervision quality are also high, at 86 percent MS+ in FY19. This matters because studies have found that the task team’s ability to identify and mitigate potential risks to the project during supervision improves project outcome ratings.

Figure 2.5. Outcome Rating Plotted against M&E Quality Rating
[Two bubble-chart panels: a. FY09–11; b. FY16–18. Vertical axis: outcome rating (HU, U, MU, MS, S, HS). Horizontal axis: M&E quality rating (negligible, moderate, substantial, high). Legend categories: both high; high M&E vs. low outcome; both moderate; both low.]
Source: Independent Evaluation Group.
Note: Circle sizes indicate how many projects fall in each category. FY = fiscal year; HS = highly satisfactory; HU = highly unsatisfactory; M&E = monitoring and evaluation; MS = moderately satisfactory; MU = moderately unsatisfactory; S = satisfactory; U = unsatisfactory.

Project outcomes can be achieved despite serious challenges if the task team can identify risks early, elicit support from managers, and act quickly to mitigate these risks, for example, by restructuring the project.4 The analysis of projects rated highly satisfactory found that these projects often benefited from collaborative supervision (active engagement of clients and partners, local presence, and a good mix of skills in the World Bank team) and timely reactions to challenges.

Some non-IEG data also point to World Bank performance often being strong in the field.
Country Opinion Surveys since 2012 indicate that country clients generally perceive the Bank Group positively as a long-term partner that collaborates well with government and contributes quality knowledge work, especially on good development and M&E practices. Respondents to a separate survey conducted by AidData perceived the World Bank to be among the most influential donors, with particularly high influence of its knowledge products (Custer and others 2015).5

The worse performance in FCV-affected countries can be explained by a vicious cycle that these countries face in which large shocks prevent them from building capacity and improving governance. This has to be understood in a context of a somewhat rigid results framework architecture that requires forecasting results and is not sufficiently adaptable to dynamic circumstances, shocks, and high levels of uncertainty. These factors affect country program outcomes in various ways. In a sample of 15 FCV-affected countries, all experienced large shocks, such as Ebola outbreaks, disasters, oil price shocks, and political crises. These shocks altered national priorities, prevented countries from building stable and credible institutions, and compelled the country team to reallocate resources and adjust country programs’ implementation. Political shocks and armed conflict (for example, in Madagascar and the Republic of Yemen) are especially challenging. The reduced staff presence during a political crisis naturally made it hard to reengage and achieve program objectives after the crisis subsided.

4 Two World Bank reports (2016a, 2018b) summarize the evidence, including an internal audit study.
5 Unlike the International Finance Corporation (IFC), the World Bank does not have a rating system for its knowledge products. Perception surveys suggest they can be influential. AidData’s survey in Custer and others (2015) was updated in AidData’s 2014 Reform Efforts Survey Aggregate Data Set (2017).
Institutional and governance reforms in FCV-affected countries are often unsuccessful. About half of country program objectives in FCV-affected countries focused on institutional and governance reforms, and the other half focused on infrastructure development and public service delivery. Only 22 percent of objectives that focused on institutional and governance reform were achieved or mostly achieved, compared with 66 percent of objectives focused on service delivery (figure 2.6).6 Institutional and governance reforms are harder to insulate from FCV-affected governments’ weak political and technical capacity and need more time to achieve objectives than public service delivery projects do, which could explain the lower achievement rates of these reforms. The longer timeline exposes these reforms to more shocks and more government and World Bank staff turnover.

Figure 2.6. Ratings for Country Program Objectives by Type of Objective
[Bar chart. Vertical axis: objectives achieved or mostly achieved (percent). Bars are grouped by service delivery focus (service delivery focused; social sectors; infrastructure and real sectors) and institutions and governance focus (institutions and governance focused; business environment; governance; macrofiscal).]
Source: Independent Evaluation Group.
Note: MS+ = moderately satisfactory or above.

6 This is according to Completion and Learning Report Reviews, the Independent Evaluation Group’s (IEG) reviews of closed country programs.

Overburdened country programs performed worse in response to large shocks and crises. In this context, overburdened refers to FCV-affected country programs with weak relevance and selectivity. Bank Group country programs performed better during shocks when they limited and consolidated interventions, such as in Haiti, Kosovo, Lebanon, Nepal, and Timor-Leste.
Some other shock-affected countries saw an influx of new projects or made existing projects more complex, overstretching the World Bank’s and clients’ capacity. Most FCV-affected country programs were already stretched to capacity before the shocks occurred. In a sample of 15 recently evaluated FCV-affected country programs, 11 had low or weak relevance, defined as the likelihood a program will achieve its intended objectives given the program’s resources and instruments. Nine programs had low or weak selectivity, which is defined as concentrating resources on priority objectives in a way that maximizes development impact and does not overburden the client’s or the World Bank’s implementation capacity. Eight programs were neither relevant nor selective. Four FCV-affected country programs were both relevant and selective, and three of these (Liberia, the Solomon Islands, and The Gambia) performed well despite major shocks. Mirroring these findings, IEG’s project validations often find that project designs that are too complex relative to clients’ capacity lead to weak ratings in FCV-affected countries. Some econometric studies have associated active conflict, inflation, natural resource dependence, and distortionary trade, fiscal, and monetary policies with lower project performance.7

Staff quality and presence also matter for performance. Studies have linked the quality and stability of the project’s task team leader to project performance (see, for example, Denizer, Kaufmann, and Kraay 2011; Geli, Kraay, and Nobakht 2014; Ralston 2014; Moll, Geli, and Saavedra 2015; and World Bank 2016b). Yet client perceptions of the Bank Group staff’s availability and the quality of its work are often less favorable in FCV-affected countries. On most Country Opinion Survey questions, perceptions of the Bank Group are worse in FCV-affected countries than in non-FCV-affected countries.
This is especially true of the Bank Group’s respectful treatment of clients and stakeholders, the technical quality of its knowledge work, its value as an information source on global development practices, its project M&E, and its staff accessibility (figure 2.7). Similarly, responses in AidData’s survey on the World Bank’s perceived influence were markedly lower in FCV-affected countries than in others.8 It is not clear why FCV clients respond less favorably to perception surveys. The Bank Group has increased its budget and staff resources for FCV-affected countries over time, though recruiting qualified staff to work in FCV-affected countries has often been difficult, something that the Bank Group’s strategy for FCV (2020–25) is aiming to address through enhanced support, training, and incentives for staff working in fragile settings.

7 World Bank 2018b summarizes the studies.
8 According to a custom calculation that AidData provided to the Results and Performance of the World Bank Group, the average score for World Bank influence was 3.76 among non–fragility, conflict, and violence–affected respondents, compared with 3.28 for respondents in countries classified as fragility, conflict, and violence–affected in FY14, the year before the survey. This is based on data documented in Custer and others (2015) and updated in AidData’s 2014 Reform Efforts Survey Aggregate Data Set (2017).

Figure 2.7. Country Client Perceptions in FCV and Non-FCV Countries
Being a long-term partner: FCV 7.73 (N = 49); non-FCV 7.79 (N = 183)
Collaboration with government: FCV 7.41 (N = 62); non-FCV 7.52 (N = 236)
Treating clients and stakeholders with respect***: FCV 6.91 (N = 62); non-FCV 7.36 (N = 233)
Technical quality of knowledge work***: FCV 7.00 (N = 62); non-FCV 7.34 (N = 233)
Source of information on global good practices***: FCV 6.99 (N = 61); non-FCV 7.21 (N = 200)
Effectiveness of M&E***: FCV 6.85 (N = 60); non-FCV 7.14 (N = 236)
Staff accessibility***: FCV 5.72 (N = 62); non-FCV 6.42 (N = 237)
Speed of things achieved in the field: FCV 5.93 (N = 50); non-FCV 6.09 (N = 191)
Collaboration with private sector: FCV 5.91 (N = 44); non-FCV 6.08 (N = 168)
Adequately staffed***: FCV 5.58 (N = 53); non-FCV 6.64 (N=3)
Source: Independent Evaluation Group, based on World Bank Group Country Opinion Survey data collected annually from 2012 to 2019.
Note: All scores except for one are measured with the following Likert scale: 1 = to no degree at all; 10 = to a very significant degree. Technical quality of knowledge work is measured with the following Likert scale: 1 = very low technical quality; 10 = very high technical quality. Averages (“N”) are based on the number of country-years. Statistical significance is for difference of means tests between question responses in FCV and non-FCV countries. Two-sample mean tests are used, assuming equal variances. FCV = fragility, conflict, and violence; M&E = monitoring and evaluation.
***p < .01.

Responding to COVID-19 and Other Shocks

The study of shocks and their impact on project and program results can also contribute some insights for the World Bank’s ongoing pandemic response. Teams are preparing pandemic response projects (many of which are new rather than additional financing) under tight time pressures and amid complex political, economic, and public health contexts and logistical challenges, such as the inability to travel or conduct meetings in person.
According to IEG evaluations, there is sometimes less time for data collection, technical studies, learning from past lessons, and designing strong results frameworks when the World Bank rushes to prepare crisis responses (see, for example, World Bank 2010a, 2010b, and 2017). World Bank (2019) comprehensively analyzed the factors that influence quality at entry and found that foundational work matters. Less foundational work limits the World Bank’s understanding of local policy, capacity, and institutions and its ability to fine-tune procurement arrangements and other elements of project design. The logistical challenges could adversely affect the World Bank’s local staff presence and ability to build trusting relationships and partnerships, factors that World Bank (2019) also found critical for quality at entry.9

9 Mirroring this, econometric research has linked project outcomes to the project team’s access to time, budget, and knowledge (see Ika 2015; and World Bank 2016b, 2017).

There is a statistical association between time pressures during preparation and projects’ quality at entry. IEG calculated a variable for project preparation time in a sample of more than 3,000 evaluated projects.10 Projects in the first three deciles of this variable, meaning projects with low preparation time relative to duration, were rated significantly lower on quality at entry than projects at or above the median of project preparation time (figure 2.8).11

Figure 2.8. Relationship between Quality at Entry and Project Preparation Time
[Chart. Vertical axis: average quality at entry, scored from −3 (HU) to 3 (HS). Horizontal axis: deciles (1 to 10) of projects sorted by the difference between approval and execution time.]
Source: Independent Evaluation Group.
Note: HS = highly satisfactory; HU = highly unsatisfactory; MS = moderately satisfactory; MU = moderately unsatisfactory; S = satisfactory; U = unsatisfactory.

10 This index is the absolute difference in months between projects’ approval time (time from inception to approval) and projects’ duration (time from effectiveness to project close).
11 The score for quality at entry was calculated by converting the 6-point scale into numerical values: highly unsatisfactory = −3; unsatisfactory = −2; moderately unsatisfactory = −1; moderately satisfactory = 1; satisfactory = 2; highly satisfactory = 3.

Robust implementation support can counter shocks, problems, and quality-at-entry weaknesses. The difference between projects rated highly satisfactory and highly unsatisfactory was less about the presence of shocks or the number of supervision missions than about the World Bank teams’ timeliness in flagging concerns, taking corrective measures, complying with mandated safeguards, undertaking Mid-Term Reviews, revising objectives, and collecting data. These often distinguished successful and unsuccessful projects. For example, Somalia’s Emergency Drought Response and Recovery Project, which IEG rated highly satisfactory on outcomes, was prepared in five weeks and required complex support to implement. It involved intense collaboration and overcoming institutional differences between the World Bank and the International Committee of the Red Cross on rules and procedures for M&E, procurement, financial management, and even protocols for communicating with government officials.

IFC Projects

Investments

IFC investment projects’ development outcome ratings have declined over the past 10 years, but there are early signs that this decline has stopped or may be starting to reverse.
IFC development outcome ratings declined from a peak of 75 percent of projects rated mostly successful or better by IEG in calendar year (CY)08 to 40 percent in CY17 and 43 percent in CY18 (figure 2.9). These ratings are based on a stratified random representative sample, which in CY18 covered 99 projects, or 39 percent of all projects approved in CY13 and eligible for evaluation. Average ratings can also be measured by net commitment volumes rather than the number of projects and by using three-year instead of annual averages. Calculated this way, IFC’s development outcome ratings declined from 83 percent rated mostly successful or better in CY07–09 to 43 percent in CY16–18 and 48 percent in CY17–19. As these numbers suggest, the ratings decline may have stopped or reversed since CY17.12

Figure 2.9. IFC Investment Project Development Outcome Rating (annual data)
[Chart showing annual ratings with confidence intervals for the inferred success rates for the population. Vertical axis: outcome rated MS+ (percent). Horizontal axis: evaluation year, 2010 to 2018.]
Source: Independent Evaluation Group.
Note: IFC = International Finance Corporation; MS+ = mostly successful or better.

12 The tentative reversal in IFC’s ratings trend is, however, within the margin of error, given that only a sample of IFC projects undergo ex post evaluation and that not all of the projects sampled for evaluation in the calendar year 2019 cohort have finished their evaluations.

IFC infrastructure projects’ development outcome ratings fell from 63 percent mostly successful or better in CY13–15 to 40 percent in CY16–18. Development outcome ratings for projects involving oil and gas exploration and junior mining companies declined sharply (from 73 percent mostly successful or better in CY13–15 to 13 percent in CY16–18); IFC has halted or reoriented most of its oil, gas, and mining investing.
Core infrastructure projects (that is, excluding oil, gas, and mining) were 46 percent mostly successful or better, which is similar to other Industry Groups. IEG’s review of infrastructure projects contains another lesson with wider applicability for IFC project success. These projects have shown that essential client actions, such as obtaining operating permits or licenses or reporting monitoring data, must be completed before IFC disburses the equity investment to the client. This is because IFC is a minority shareholder with limited recourse or influence after investments are disbursed. These projects also show that IFC’s early and continuous project engagement contributed to successful social and environmental ratings, particularly when companies expanded into different sectors and countries and thus benefited more from IFC’s advice. In recent years, IFC has expanded its advice to existing and prospective clients on social, gender, environmental, and community engagement issues.

Advisory Services

Development effectiveness ratings for IFC advisory services projects show signs of improvement. Development effectiveness ratings peaked in FY12–14, when 65 percent of advisory projects were rated mostly successful or better (figure 2.10). This declined to 38 percent in FY15–17 before increasing to 41 percent for projects evaluated in FY16–18 and 50 percent for FY17–19 (based on very preliminary FY19 data and therefore subject to change).13 When calculated by the advisory project’s funding amount rather than the number of projects, development effectiveness ratings declined from 70 percent mostly successful or better in FY12–14 to 33 percent in FY15–17, before increasing to 49 percent in FY17–19.

Figure 2.10. IFC Advisory Project Development Effectiveness Rating, Three-Year Moving Averages
[Chart. Vertical axis: development effectiveness rated MS+ (percent). Horizontal axis: three-year periods from 2006–08 to 2017–19.]
Source: Independent Evaluation Group.
Note: IFC = International Finance Corporation; MS+ = mostly successful or better.

13 Although the FY17–19 estimate is based on 171 evaluated projects, the FY19 data are based on only 36 evaluated projects out of 54 projects sampled for evaluation. Estimates will therefore change as more projects finish their evaluations.

Explaining the IFC Trends

IEG researched many possible explanations for the long period of decline in IFC investments’ development outcome ratings. The underlying joint IFC-IEG evaluation and ratings methodologies did not change during this period, so changes in methods cannot explain IFC’s ratings decline. The ratings trends for IFC investments differ substantially from those of the Asian Development Bank and the European Bank for Reconstruction and Development, which are the only other multilateral development banks with published ratings for private sector operations. Both institutions’ development outcome ratings for private sector investment projects increased over the same 10-year period in which IFC’s ratings dropped, so global economic conditions alone cannot explain the ratings decline. Additionally, IFC’s development outcome ratings declined in all Regions; for all four industry groups (figure 2.11); in IDA-eligible, FCV-affected, and IBRD countries; for both equity and loan instruments; and in both greenfield and expansion projects. Therefore, major declines in specific project categories cannot explain IFC’s ratings decline because declines were across the board.
Furthermore, IFC’s business volume stayed approximately the same over the past 10 years with no major investment increases in low-capacity countries, so rapid business growth cannot explain IFC’s ratings decline. In fact, IFC ratings in IDA-eligible countries are slightly higher than in IBRD countries.

Figure 2.11. IFC Investment Project Development Outcome Ratings by Industry Group
[Line chart. Series: CDF, FM, Infra, and MAS industry groups. Vertical axis: outcome rated MS+ (percent). Horizontal axis: three-year periods from 2009–11 to 2016–18.]
Source: Independent Evaluation Group.
Note: CDF = Disruptive Technology and Funds; FM = Financial Markets; IFC = International Finance Corporation; Infra = Infrastructure; MAS = Manufacturing, Agribusiness, and Services; MS+ = mostly successful or better.

A combination of internal work quality issues, external risk factors, and broader market trends helps explain IFC’s investment performance trends. A joint IFC-IEG study from 2017 identified work quality and credit and country risks as significant drivers of investment projects’ development outcome ratings. Staffing, incentives, organizational culture, focus on volume targets over development results, and diffused accountability were the main factors affecting IFC’s work quality. IFC endorsed those findings and has since taken many steps to implement the joint study’s recommendations, including setting up a vice presidential unit to focus on development results, seeking stronger country engagement with improved analytics, and screening projects ex ante for anticipated outcomes.14

External risk factors also influence projects’ performance. IFC invests in many domestic, medium-size firms affected by a variety of risks.
IEG’s review of project validations found that market, country, and sponsor risks and transaction structuring were the factors most clearly associated with IFC investment projects’ performance. IEG reviewed nearly two-thirds of the projects that it had evaluated from 2016 to 2018 and used a machine learning framework to analyze all 720 IFC investment projects that IEG evaluated between 2010 and 2018. Both reviews sought to identify factors associated with projects’ success and underperformance, and both identified sponsor selection risks, market risks, country risks, and transaction structuring as the factors that most frequently distinguished projects with good ratings from less successful projects (figure 2.12). The machine learning algorithm clustered projects into two groups: projects with high development outcome and high work quality (referred to below as high work quality; 318 projects) and projects with low development outcome and low work quality (referred to below as low work quality; 213 projects).

14 IFC’s managerial actions translate into ratings for the mature portfolio with a long delay. That is because the projects rated this year were approved years before the mentioned actions.

• Sponsor risks (risks linked to the client company in which IFC invests) were 1.8 times more frequent in projects with low work quality than in those with high work quality. IFC knew the sponsor well in the positively rated projects (high work quality). Either the sponsor was a repeat client in good standing and had strong business fundamentals or IFC’s due diligence had concluded that the client had the necessary knowledge and experience. By contrast, sponsors of projects with low work quality had started new business lines in which they lacked relevant experience or were highly leveraged.
• Projects with high work quality coped better with market risks, which affected all types of projects.
For example, slowdowns in visiting tourists and an oversupply of hotel rooms affected some tourism projects. Weak consumer demand caused by slowing economic growth and currency devaluations affected some agribusiness and forestry projects. Demand that was weaker than expected or competitors adding infrastructure capacity affected some infrastructure projects. Such market risks had only temporary effects on projects with sound underlying business fundamentals, strong sponsors, and enough liquidity. Market risks had the most lasting impacts on the success of projects without these strong fundamentals.
• Country risks increased in relative influence on investment projects. The most common country risks were currency devaluations and political and regulatory risks. These risks increased for projects with both low work quality and high work quality that IEG evaluated in 2013–15 and 2016–18. However, the machine learning algorithm found that in projects with high work quality, IFC and its clients adapted well to country risks. Sometimes, IFC successfully mitigated the impact of currency devaluations on projects through local currency loans. However, there are examples where IFC lent to clients in foreign currency even though local currency IFC loans were also available. Other projects with low work quality relied on anticipated regulatory changes for project viability. However, these changes often took much longer than anticipated or did not happen at all, adversely affecting project results.
• The quality of IFC’s transaction structuring, additionality, and sensitivity analyses varied between projects with high work quality and low work quality. Details vary across industry groups.
Examples of strong IFC transaction structuring included good selection of IFC investment products, careful scrutiny of intragroup risks when investing in holding companies, rigorous analysis of market and exchange rate risks, and realistic consideration of banks’ condition and priorities before investing in them.

Figure 2.12. Factors Affecting IFC Investment Performance
Findings from Machine Learning, 2016–18
Projects affected by each factor (percent), for projects with high DO ratings and high work quality ratings versus projects with low DO ratings and low work quality ratings:
Sponsor risk: high 14; low 27
Country risk: high 26; low 24
Market risk: high 12; low 12
IFC-specific factors, including structuring: high 30; low 27
Other factors: high 18; low 10
Source: Independent Evaluation Group.
Note: DO = development outcome; IFC = International Finance Corporation.

Broader market trends may have made IFC’s business model more exposed to certain risks. IFC screens for risks when selecting projects, but there is a finite pool of repeat clients and bankable or viable investment projects, so IFC needs to accept certain risks when it invests. Moreover, the pool of viable projects available to IFC may have shrunk because rival financiers, both private and multilateral, have expanded into emerging and developing markets over the last few decades. A weaker pool of viable investment projects can translate into less attractive risk-reward profiles, thus contributing to the ratings decline. This means that better internal identification of risks during project preparation may not suffice. In recent years, IFC has taken many steps to grow the pool of bankable investment projects and to better identify market opportunities and constraints, as described in box 2.4.
It has also taken other steps to increase its focus on outcomes, including providing specialist resources to advise teams, encouraging midcourse corrections, and introducing a new tool (Anticipated Impact Monitoring and Measurement) to assess and screen projects for expected outcomes prior to approval.15 Hopefully, these steps will help align IFC’s business model to market, country, and sponsor risks, though it is too early to tell if they will improve ratings. Additional steps IFC could consider include enhanced tools and processes to identify and mitigate risks during supervision.

Box 2.4. IFC’s Reforms to Strengthen Upstream Engagement
The International Finance Corporation (IFC) has prioritized upstream engagement in its strategy IFC 3.0. Upstream engagement can increase the number of bankable investment opportunities through regulatory reforms to unlock private investment and development of viable investment projects. To do so, IFC has updated its funding and operating model to encourage upstream engagement and invested significant resources in developing its project pipeline. IFC has strengthened its focus on country outcomes through new IFC Country Strategies and analytical tools such as Country Private Sector Diagnostics and IFC Sector Deep Dives. These tools aim to provide a deeper understanding of market constraints and opportunities and help develop better coordinated upstream engagements with hopefully greater development outcomes. IFC has also integrated advisory teams into industry groups and introduced a new additionality framework.
Source: Independent Evaluation Group, based on documents from IFC.

15 See World Bank (2019, 19) for a fuller description of IFC’s efforts to improve work quality.
For IFC advisory projects, project size and duration and a change of team leader had a statistically significant negative association with project success.16 Some of the larger and longer-lasting projects were riskier, for example, if they involved public sector clients and complex regulatory reforms such as those in business climate and public-private partnerships. Some of these larger advisory projects were more likely to encounter difficulties with political economy and counterparts’ capacity compared with simpler projects with private sector clients. Such difficulties could increase in importance as IFC expands its upstream engagements and its program in challenging and fragile markets.

Other factors that mattered for IFC advisory projects’ success included the client’s commitment, IFC’s flexible and proactive supervision, and robust project M&E.17

Client commitment was a major driver of advisory projects’ success. Indications of client commitment include alignment with the client’s established business plan or ongoing activities, client contributions to project costs, and level of seniority of interlocutor staff. Commitment can be fostered by aligning client and project objectives, involving clients closely in project design, and establishing a variety of client interlocutors beyond the project’s day-to-day individual counterparts. The staff’s patience and flexibility to respond to changing circumstances (such as government personnel changes) also contributed to success, and detecting signs of waning client commitment and restructuring projects accordingly proved important. However, such project restructurings were helpful only when clients showed continued commitment, for example, through responsiveness and engagement; otherwise, canceling the projects was preferable.
IFC staff and managers’ proactive involvement in decisions to restructure, cancel, or reduce the duration of projects was important because IFC consultants contracted to the project may lack the incentive to recommend such actions. Robust project M&E provides IFC teams with a more detailed understanding of projects’ achievements and challenges so that they can adjust implementation as needed and achieve results. As reported in RAP 2018, IFC has worked to strengthen its work quality for some years, including through greater attention to projects’ scope and results frameworks, self-evaluations, and staff training.

16 This is based on all 169 advisory projects evaluated between FY16 and FY18.
17 This is based on IEG’s review of 42 advisory projects evaluated in FY18.

MIGA Projects

Ratings for MIGA projects’ development outcomes increased over the past 10 years. Specifically, the ratings increased from 64 percent satisfactory or better (S+) in FY07–12 to 69 percent in FY13–18 (figure 2.13). When calculating ratings by gross issuance amounts, MIGA development outcome ratings increased from 61 percent S+ to 75 percent over the same time frame. These higher ratings have continued into the most recent ratings period. The increases are driven by higher ratings for MIGA projects in IDA countries (from 59 percent S+ in FY07–12 to 77 percent in FY13–18), in Europe and Central Asia (from 56 percent S+ to 73 percent), and in the Energy and Extractive Industries sector (from 67 percent S+ to 79 percent). MIGA projects are rated at 77 percent S+ in IDA countries, which are a strategic focus for MIGA, compared with 63 percent S+ in non-IDA countries. Projects in FCV-affected countries also had high ratings at 88 percent S+ in FY13–18. The financial markets sector had the lowest ratings at 58 percent S+. These low ratings for financial markets projects were caused by adverse impacts that the global financial crisis had on financial markets in Eastern Europe and Central Asia and by issues with MIGA’s assessments, underwritings, and monitoring. MIGA has diversified its portfolio away from financial markets to other sectors in Eastern Europe and Central Asia, which has helped improve MIGA’s performance trend in that region.

Figure 2.13. MIGA Project Development Outcome Rating, Six-Year Rolling Basis
[Chart: MIGA projects rated S+ (percent), by six-year rolling period from FY06–11 to FY13–18.]
Source: Independent Evaluation Group.
Note: FY = fiscal year; MIGA = Multilateral Investment Guarantee Agency; S+ = satisfactory or better.

MIGA’s work quality has improved. Ratings for MIGA’s Assessment, Underwriting and Monitoring increased from 54 percent S+ in FY07–12 to 59 percent in FY13–18. Ratings for the environmental and social effects of MIGA guarantee projects increased from 50 percent S+ in FY07–12 to 83 percent in FY13–18 on the heels of MIGA adopting its Performance Standards on Social and Environmental Sustainability in 2007.

MIGA’s clients are larger multilateral investors, reflecting its mandate to promote cross-border investment in developing countries by providing guarantees to investors and lenders. MIGA guarantees against political risks. Other investors carry the credit risk. The relatively large size of MIGA-supported projects (on average $109 million in gross issuance) makes these projects visible to host countries and motivates governments to help these projects succeed, for example, by undertaking planned regulatory reforms. MIGA originates about 62 percent of its projects from Part 1 countries.
MIGA played an active and important role in promoting private sector investment through projects in IDA and FCV-affected countries. This is based on IEG’s review of 13 MIGA projects in IDA and FCV-affected countries evaluated in FY17 and FY18. The reviewed projects were all relevant because they fit with MIGA’s and host countries’ strategic priorities. Capable international investors who introduced competitive power generation or other technologies sponsored successful infrastructure projects. Some large-scale power projects were the first of their kind in the country. MIGA helped deter political risks and resolve emerging issues, for example, on arrears payments by governments. In successful agribusiness projects, MIGA provided reinsurance for foreign direct investment in IDA countries, created new supply chains, provided trademark license agreement guarantees, and integrated farmers and others into new processing facilities, irrigation networks, or distribution networks. Generally, the agribusiness projects were socially, economically, and environmentally sustainable, and the demonstration effect encouraged future private sector participation in the sector.

This highlights a main difference between successful and less successful MIGA projects: the project’s market and business sustainability.18 For example, unsuccessful power sector projects had low market and business sustainability because of lower consumer demand for power and intense competition from rival sources of power generation. In the telecom sector, some projects were unsustainable because episodes of violence or increased competition led to fewer subscribers than expected.

18 Of the 13 projects in International Development Association and fragility, conflict, and violence–affected countries evaluated in FY17 and FY18, 10 projects were rated satisfactory or better and 3 projects were rated less than satisfactory. All of the projects fit with the Multilateral Investment Guarantee Agency’s and host countries’ strategic priorities.

3. Part II: Assessing Outcome Levels

Introduction

Project and program ratings give a helpful picture of Bank Group achievement against stated objectives, but objectives set outcomes at different levels, and the line of sight to higher-order development goals varies considerably. Beneath every rating is a wealth of information about where the Bank Group is focusing its efforts and what these mean in relation to its outcome orientation. This part of the RAP classifies objectives according to their outcome levels and examines links between performance and outcome levels.

This chapter presents a theory of change framework to classify outcome levels. Because of a lack of data, the framework is by no means exhaustive, but it still offers a common lens to understand outcomes and outcome levels across sectors and Bank Group institutions, thus providing essential new information about the most typical types of outcomes that make up the project portfolio. The next section looks at the distribution of project outcome levels in samples of World Bank and IFC projects. This is followed by an assessment of the relationship between projects’ outcome levels and ratings, which may help shed light on some of the risk-return trade-offs when project teams are formulating project objectives. This part concludes by reviewing the outcome orientation of key thematic areas.

Outcome Classification Framework

The novel outcome classification framework uses a theory of change logic to define comparable and complementary outcome levels. IEG synthesized sectoral theories of change derived from World Bank and IFC projects, among other sources, to build the outcome classification framework and validated the classifications on World Bank and IFC projects.
Box 3.1 describes the framework and the samples that IEG applied it to.

The framework defines four outcome levels. Each level corresponds to a step in a theory of change for how the Bank Group’s work influences clients’ development outcomes, ranging from outputs at level 1 to early, intermediate, and long-term outcomes at levels 2 to 4. IEG defined shifters to distinguish one outcome level clearly from another (figure 3.1).19

Figure 3.1. Steps in the Outcome Levels
• Level 1 to level 2: Shift from outputs to changes in the status quo or in behavior that happens as a consequence of the outputs. Government, private sector, and nonstate actors can gain new skills or capabilities; citizens can have enhanced access to better-quality services or environmental benefits and see early changes.
• Level 2 to level 3: Shift from a change in the status quo or behavior to meaningful changes in the lives of ultimate beneficiaries. Beneficiaries and other actors apply new capabilities to solve problems. Service access or improved service quality improves well-being.
• Level 3 to level 4: Shift to more sustained changes in delivery, governance, or citizens’ well-being. Changes are often at national or sectoral scale.

19 These terms come from standard evaluation and results-based management literature. However, although these terms call attention to outcomes’ time dimension, the coding framework emphasized the sequential steps in the logic of how interventions lead to outcomes.

For example, level 1 outcomes include project deliverables, but level 2 outcomes include changes in the development status quo that resulted from the level 1 deliverables. Hence, level 2 outcomes follow quite directly from project outputs and often focus on improved access, capacity, regulation, planning, provision, and quality of public services, all of which represent relatively immediate benefits to beneficiaries.
Level 3 outcomes follow indirectly from project interventions and are beyond the direct control of the World Bank and its clients. At level 3, the level 2 outcomes have led to material improvements that solved development problems, causing sectorwide ripple effects that benefit end beneficiaries. The ripple effects of level 4 outcomes are even deeper and wider. These are outcomes with systemic effects nationally or across sectors that contribute to general well-being. Level 4 outcomes correspond to the Sustainable Development Goals, the twin goals, and other higher-level outcomes to which the Bank Group aspires. Figure 3.2 shows representative examples taken from World Bank project objectives.20

The framework captures World Bank projects’ intended and achieved outcomes and IFC projects’ intended outcomes. IEG designed the framework so that outcomes can be compared across sectors (box 3.1). At project design, all project documents state a clear objective, called a project development objective at the World Bank and called claims at IFC under its new Anticipated Impact Monitoring and Measurement framework. During project implementation, teams and clients manage projects to achieve these objectives. When projects close, self-evaluations review whether they achieved their stated objectives, with validation by IEG.

20 This is a departure from IEG’s traditional project ratings. These are determined by IEG after projects close, based on achieved outcomes relative to intended objectives. IEG project development objective ratings consider the declared project development objective only to a limited extent, namely, in the assessment of the project’s relevance.

Figure 3.2. Examples of Outcome Levels

Transport
• Level 1: Prepare a transport plan
• Level 2: Provide access to transport. Improve transport efficiency, reliability, and quality of services
• Level 3: Improve mobility, reduce travel time, and improve connectivity to economic activity
• Level 4: Improve household well-being

Public Finance
• Level 1: Develop an operational tax management IT system
• Level 2: Develop capacities for better tax revenue administration system. Improve tax compliance
• Level 3: Improve transparency and accountability of tax management
• Level 4: Improve tax-to-national income ratio

Agriculture
• Level 1: Provide appropriate seeds and technology. Develop participatory plans
• Level 2: Develop farmers’ capacities. Implement improved extension outreach
• Level 3: Increase agricultural productivity and yields, farmers’ income, and profitability
• Level 4: Improve performance of irrigated agriculture

Nutrition
• Level 1: Develop training programs and awareness-raising activities
• Level 2: Improve nutritional behavior. Increase use and quality of nutrition services
• Level 3: Change in weight and vitality
• Level 4: Reduce stunting among children under five

Source: Independent Evaluation Group.
Note: GDP = gross domestic product; IT = information technology.

Box 3.1. The Outcome Level Framework

This framework and its application have strengths and weaknesses. Comparability across sectors and countries is a key strength, which the Independent Evaluation Group (IEG) ensured by defining comparable yardsticks and applying internal and external quality assurance. For example, IEG validated the framework through pilots and expert consultations, defined key terms in project development objectives (PDOs) that are indicative of different outcome levels, developed detailed coding guidance, and tested for interrater reliability (the extent to which multiple coders code the same outcomes consistently) by having multiple team members independently code the same PDO to standardize coding scores.

However, the framework is a blunt tool. It focuses on stated and measured objectives, which may not be the same as the actual outcomes.
It simplifies outcomes’ complex social realities into four categories that do not factor in context, so one country’s simple achievement could be another’s ambitious outcome. Coding focused on PDOs, which are summaries that approximate projects’ intended outcomes, but sometimes they are vague or may not comprehensively reflect all of the project’s objectives. To overcome this challenge, IEG consulted the project’s indicators when in doubt but did this less often for investment project financing than for development policy financing, which was harder to assess because of long PDOs with multiple parts. For composite PDOs with more than one subobjective, IEG used the highest outcome level among them. For two samples, new World Bank projects and new International Finance Corporation (IFC) projects, IEG followed a different approach that reviewed PDOs and indicators’ outcome levels separately to compare them.

Another challenge of the framework was differentiating between level 3 and level 4 outcomes. The difference between these outcomes is conceptually clear, but in practice, it can require the coder to subjectively judge the projects’ real objectives and then approximate how deep and how systemic the outcomes are to which these objectives aspire.

IEG applied the framework to four project samples:
• Recently closed projects: all 989 Implementation Completion and Results Report Reviews completed from fiscal year (FY)17 to FY20 (April). This sample is large enough to allow Global Practice comparisons.
• Older projects with IEG field evaluations: all 42 Project Performance Assessment Reports from FY19 and FY20 available in March 2020. Analysis focused on achieved outcomes in the sample’s 114 component objectives. This sample shows actual project outcomes that IEG verified in the field.
• New World Bank projects: a statistically representative sample of 161 projects approved in FY19, indicative of recent approvals.
• New IFC projects: a random sample of 29 recently approved IFC investment projects. Analysis covered the 100 project and market claims in this sample.a

Source: Independent Evaluation Group.
Note: a. The Independent Evaluation Group assessed project objective statements and related indicators in International Finance Corporation projects approved in FY20 to understand the types and levels of outcomes in projects processed under its new Anticipated Impact Monitoring and Measurement (AIMM) system. IEG identified all AIMM claims in the project summaries for 29 randomly selected investment projects and indicators in 21 of these projects. This was not an evaluation of AIMM as a tool. IEG did not review AIMM scores or the underlying methodologies for calculating AIMM scores or review projects’ actual outcomes. The 29 sampled projects made 100 claims, of which 61 were project claims and 39 were market claims. All project and market claims were clearly formulated objective statements. Sampled projects contained 142 indicators, of which 58 percent were project indicators, 18 percent were market indicators, and the remaining 24 percent were required corporate and other indicators unrelated to AIMM claims.

Project Outcomes

This section analyzes the distribution of projects’ objectives to understand what types of outcomes most projects intend to achieve and measure. It does so partly in response to Board members’ demands for more evidence on outcomes. Until now, the understanding has been that projects pursue diverse objectives across diverse sectors, contexts, and instruments, with limited room for generalization. For the first time, this research shows that project objectives cluster in clear patterns depending on sector and lending instrument. Most IPF objectives cluster at level 2 around quality and access to services.
A few sectors, most notably agriculture and environment, state IPF objectives at level 3 with clearer focus on end beneficiaries, and most DPFs state their objectives at level 3 with a focus on policy reform outcomes. Recently approved IFC projects often state their objectives at level 3, particularly in relation to market creation objectives.

Most IPFs have project development objectives that aim for level 2 outcomes. IEG classified 72 percent of IPF projects’ objectives in the recent Implementation Completion and Results Report Review (ICRR) sample at level 2 (figure 3.3).21 By far, the most common level 2 IPF objectives improve quality or access to social- or infrastructure-related public services. Most IPFs, and by extension most of the World Bank’s work, intend to strengthen public sector capacity. This reflects the strong emphasis in World Bank operations on improving public sector capacity and performance as an enabler of higher-level change. The prevalence of service access objectives also reflects the relatively easier measurement and attribution to World Bank support of such objectives.

Figure 3.3. Outcome Levels in IPF and DPF Projects
Outcome level | IPF | DPF
Level 1 | 1% | 0%
Level 2 | 72% | 26%
Level 3 | 26% | 54%
Level 4 | 1% | 19%
Source: Independent Evaluation Group.
Note: DPF = development policy financing; IPF = investment project financing.

21 The share was similar in the recently approved sample, at 68 percent of investment project financing objectives at level 2.

IPFs in a few sectors pursue level 3 outcomes more often. Level 3 outcomes were found in 26 percent of all IPF project objectives and level 4 in 1 percent (figure 3.3). However, there is clustering in some sectors: half of Agriculture and Environment GP projects, 35 percent of Transport GP projects, 31 percent in the Water GP, and 27 percent in Energy and Extractives.
The share of level 3 and 4 objectives is far lower in other GPs, ranging between 10 and 14 percent. Common examples of IPF level 3 outcomes include improved agricultural productivity, yields, and incomes; improved management of protected areas; climate resilience; and transport connectivity. A focus on sectorwide change and end beneficiaries characterizes these types of level 3 outcomes. Such outcomes are different from most IPFs’ focus on level 2 service access and capacity. The reason for the variation across GPs in objectives’ outcome levels is not entirely clear, though the ability to define suitable indicators plays a role in how teams set objectives.

DPFs’ objectives cluster around yet other types of outcomes. Objective statements at outcome levels 3 and 4 were found in 54 and 19 percent, respectively, of World Bank DPFs (figure 3.3). DPFs seek to induce change through policy, institutional, and governance reforms. DPFs achieve their objectives less often, resulting in lower ratings compared with IPFs, as seen in part I. Representative examples of level 3 DPF objectives include macrofiscal stability, improved transparency and accountability, and increased domestic tax revenue. However, 26 percent of DPF outcomes in the ICRR sample were at level 2. Examples include technical support to policy and regulatory reforms and the first of a series of planned DPFs, with higher intended outcomes for subsequent DPFs.

Level 1 outputs are rare in project objective statements. Only 1 percent of objectives in recent ICRRs and 9 percent of objectives in recent approvals had level 1 outputs in the objective statement. No DPFs or IFC projects in the samples had output objective statements. It is established good practice to focus on outcomes, so it is positive that so few projects have level 1 objective statements.

Project objectives in FCV-affected countries are not distributed differently.
In FCV projects, 71 percent of objectives are at level 2, 25 percent are at level 3, and 4 percent are at level 4. This compares with 64 percent at level 2, 31 percent at level 3, and 4 percent at level 4 in non-FCV-affected countries. The similarity of outcome levels in FCV and non-FCV countries is surprising because of the higher contextual risks in FCV-affected countries and the need for quick and simple attainable goals, as discussed in part I. IEG also observed that results frameworks in FCV projects did not commonly capture conflict drivers or outcomes on fostering the country’s resilience to conflict and violence. FCV countries need agile responses to their unique challenges, an aspect that falls outside the outcome level classification framework.

IEG also compared projects’ indicators to their objective statements to assess whether the indicators’ levels matched the objectives’ outcome level. It found that 14 percent of recently approved projects have no indicator of the same level as the objective, which could suggest that there are no indicators able to measure the objective’s achievement.

Recently approved IFC projects under the Anticipated Impact Monitoring and Measurement system often state their objectives in relation to higher-level outcomes because they are aligned with IFC’s goals of creating markets and fostering private sector development (box 3.2). Outcome level 3 and 4 objective statements were found in 67 and 15 percent, respectively, of recent IFC market claims, and in 39 and 13 percent, respectively, of project claims (figure 3.4). Level 2 objective statements were found in 18 percent of market claims and 48 percent of project claims. Figure 3.5 shows representative examples of IFC claims.
Twelve percent of IFC project and market claims did not have indicators that matched the claim’s outcome level, which could suggest that none of the selected proxy indicators are able to measure the objective’s achievement.

Figure 3.4. IFC Project and Market Claims’ Outcome Levels
Outcome level | Project claims | Market claims
Level 2 | 48% | 18%
Level 3 | 39% | 67%
Level 4 | 13% | 15%
Source: Independent Evaluation Group.
Note: IFC = International Finance Corporation.

Box 3.2. IFC’s AIMM System for Setting Project Objectives

The Independent Evaluation Group (IEG) analyzed objectives in recent International Finance Corporation (IFC) projects to understand how IFC’s new Anticipated Impact Monitoring and Measurement (AIMM) system articulates intended outcomes.a Under AIMM, IFC projects include multiple project claims and market claims, whereas the World Bank sets only one objective per project. Project claims are defined as a project’s direct and indirect effects on stakeholders, the economy, and the environment and are comparable to World Bank projects’ project development objectives. Market claims are derived effects, defined as a project’s ability to catalyze systemic changes beyond those effects brought about by the project itself. IEG did not review AIMM scores, an index number for a combination of the depth and likelihood of project outcomes and contribution to market creation.

Overall, IEG found that the system of project and market claims contained clear objective statements that aligned well with IFC’s higher-level goals of creating markets and fostering private sector development. AIMM ensures that project objectives align with IFC’s goals. Although IFC can ensure such alignment because of its focused business model and goals, the World Bank operates with objectives that are more diverse because of its diverse sector and country contexts.
It is too early to tell what impact AIMM will have on outcome achievement, ratings, evidence, and incentives because no project under AIMM has been evaluated yet.

Source: Independent Evaluation Group.
Note: a. This sample is different from the rated sample analyzed in part I, which did not include projects with AIMM claims.

Figure 3.5. Representative Examples of IFC Claims

Project claims
• Design investment products or services for small, medium, or large enterprises
• Provide access to capital to small, medium, or large enterprises
• Transfer technical skills, expertise, and knowledge
• Promote client firms’ expansion and revenue growth
• Increase national employment
• Promote large-scale economic growth

Market claims
• Design products, services, or catalyzation activities
• Invest into novel types of markets
• Grow trade finance offerings and improve efficiency
• Show successful market demonstration and replicability
• Create new dynamic and inclusive markets in multiple emerging markets

Source: Independent Evaluation Group.
Note: IFC = International Finance Corporation.

Project Outcome Levels and Ratings

This section combines the outcome level classification and ratings to examine the relationship between projects’ outcome levels and projects’ performance. This analysis was motivated by World Bank management’s efforts to better identify the risk-return trade-offs when formulating project objectives and related questions about whether the rating system influences project teams’ incentives when setting objectives. The analysis also aimed to explore potential explanations for key performance patterns identified in part I.
Efficacy ratings (which assess to what extent projects achieve their stated objectives) and outcome ratings (which consider the project’s relevance and efficiency) were used.22

The relationship between objectives’ outcome levels and projects’ performance is only modest and becomes insignificant when controlling for other factors. Specifically, ratings for projects with level 3 and 4 outcomes are modestly lower than for projects with level 2 outcomes, and the difference in ratings is insignificant when controlling for instrument and other factors such as M&E quality (box 3.3). This finding runs counter to a key assumption prior to doing the analysis that one of the reasons for not setting higher-level objectives is the risk of a lower rating. Instead, the finding shows no systematic trade-off between projects’ outcome level and ratings. This implies that many projects with higher-level objectives manage to achieve good outcome ratings, in part by having strong results frameworks to measure outcome achievement. Although the model does not provide any more detail on the causal relationship between objectives set at design and projects’ eventual performance (both depend on specific country and sector contexts), it does point to larger questions about when it makes sense for projects to set higher-level objectives and what it takes for such projects to be successful in reaching their intended outcomes.

22 IEG uses a numerical conversion of the four-point efficacy rating. Efficacy ratings are sometimes given for subobjectives. In that case, the average of the subobjective ratings was calculated.

Box 3.3. Regression of Projects’ Performance on Outcome Levels and Other Factors

A regression analysis on the Implementation Completion and Results Report Review sample shows that projects’ outcome levels do not play a statistically significant role for these projects’ efficacy rating when controlling for lending instrument. Regressing efficacy ratings on outcome levels and a dummy for lending instrument shows that investment project financing projects have markedly higher efficacy ratings than development policy financing projects do, in line with the findings of part I (model 2 in table B3.3.1). The difference in efficacy rating between lending instruments is statistically significant at the 0.001 level, whereas the outcome level is not statistically significant in this model. The negative relationship between efficacy and outcome level in model 1 is driven by the fact that investment project financing projects, which have higher efficacy ratings than development policy financing projects, also have lower outcome levels. The results are also robust to including projects’ monitoring and evaluation quality rating (model 3). In this model, the lending instrument and monitoring and evaluation quality affect efficacy rating at the 0.001 significance level, and outcome level remains statistically insignificant. The results are the same when controlling for Global Practice as random effect (model 4).

Table B3.3.1. Regression Results
Variable | Model 1 | Model 2 | Model 3 | Model 4
Outcome level | -0.0970** (0.0323) | -0.0500 (0.0356) | -0.0305 (0.0298) | -0.0305 (0.0328)
IPF (vs. DPF) | | -0.1982*** (0.0538) | -0.2151*** (0.0427) | -0.2151*** (0.0304)
M&E rating | | | -0.4793*** (0.0243) | -0.4793*** (0.0328)
GP as random effect | | | | Yes
Number of observations | 949 | 946 | 944 | 944
R2 | 0.0102 | 0.0225 | 0.3201 | 0.3201
Source: Independent Evaluation Group.
Note DPF Note development DPF = development policy policy financing IPF = financing IPF = investment investment project project financing **p › 0.01**p › .01 financing GP = Global Practice M&E = monitoring and evaluation ***p › 0.001 GP = Global Practice M&E = monitoring and evaluation ***p › .001 Chapter 3 Independent Evaluation Group 47 Pairwise comparisons of efficacy and outcome ratings illustrate the same tendency of modestly lower ratings as outcome levels increase (figure 3.6 and table 3.1). DPFs at level 3 are rated modestly lower on outcomes than DPFs at level 2—true in both the ICRR and the Project Performance Assessment Report sample—and DPFs at level 4 are rated lower. Efficacy ratings are marginally lower for DPFs at higher outcome levels. IPFs tell a similar story. IPFs with level 2 outcomes have marginally higher efficacy ratings and somewhat higher outcome ratings than IPFs with level 3 outcomes—77 percent MS+ compared with 72 percent MS+. (The result for level 4 is not robust because of the small sample size.) Similar patterns are seen in many GPs and in a large sample of older projects (box 3.4).23 IEG next examines whether outcome levels help explain ratings differences reported in part I between GPs and project types. Table 3.1. Ratings and Outcome Levels, by Instrument IPF DPF Projects MS+ Average Projects MS+ Average Outcome Level (no.) (percent) efficacy ratinga (percent) (percent) efficacy ratinga Level 1 9 89 2.8 0 n.a. n.a. Level 2 580 77 2.7 45 71 2.5 Level 3 211 73 2.7 93 68 2.5 Level 4 7 57 2.6 33 61 2.3 Total 807 76 2.7 171 67 2.5 Source: Independent Evaluation Group. Source: Independent Evaluation Group. Note DPF = development policy financing MS+ = moderately satisfactory or above IPF == Note DPF development investment policy project financing n.a. MS+ financing = moderately satisfactory or above = not applicable a IEG uses a numerical conversion IPF = investment project financing of the four-point n.a. = efficacy rating. 
23 For example, Transport projects with level 3 outcomes have somewhat lower outcome ratings (74 percent moderately satisfactory or above) than projects with level 2 outcomes (83 percent moderately satisfactory or above), though efficacy ratings are identical for both outcome levels: 2.7 out of a maximum of 4.

Figure 3.6. Ratings and Outcome Levels, by Instrument
[Figure: average efficacy rating and share of projects with outcome rated MS+, shown for IPF and DPF at outcome levels 2 and 3.]
Source: Independent Evaluation Group.
Note: Circles show efficacy ratings, and lines show percentage of projects rated MS+. DPF = development policy financing; IPF = investment project financing; MS+ = moderately satisfactory or above.

Box 3.4. Outcome Levels and Ratings over a Longer Period

The Independent Evaluation Group used machine learning to extend the outcome classification to older projects. It used the population of all 3,119 projects that have been rated since 2009 and for which the relevant information was readily available. The machine learning algorithm classified projects based on their objectives at either level 2 or level 3, with 92 percent accuracy on a test data set. Precision was lower for levels 1 and 4 because of small sample sizes, and therefore these results are not used here.

Looking at only investment project financing (IPF), the outcome levels were broadly constant across the years. The algorithm coded 75 percent of projects at level 2 and 25 percent at level 3. Development policy financing projects had higher outcome levels than IPF projects in the machine-coded data. Furthermore, consistent with the other findings of this section, the outcome ratings for level 3 IPF projects were only marginally lower than those for level 2 IPF projects: 71 percent moderately satisfactory or above compared with 73 percent.
Source: Independent Evaluation Group.

When revisiting the key performance patterns identified in part I, IEG finds that projects’ outcome levels do not explain the low ratings for projects in the MTI and Governance GPs. IEG combined the Education; Urban, Resilience, and Land; and Transport GPs (which tend to deliver basic services and are among the highest-rated GP portfolios) and combined the MTI and Governance GPs (both of which focus on policy and institutional reforms, often using DPFs, and are the lowest-rated GP portfolios). Table 3.2 shows that MTI and Governance projects have lower ratings compared with Urban, Education, and Transport projects, regardless of their outcome level. The table also shows that MTI and Governance projects with level 3 outcomes achieved essentially the same ratings as projects with level 2 outcomes (61 percent MS+ compared with 60 percent), and there was only a limited ratings decline for projects with level 4 outcomes (55 percent MS+). Looking only at FCV-affected countries, MTI and Governance projects with level 2 and 3 outcomes are again rated equally.

Instead, the explanation for these key performance trends is related to the DPF instrument, which MTI and Governance GPs use much more often than other GPs. Multiple factors help explain lower ratings for DPFs (and thus for MTI and Governance GP projects). Policy and institutional reform objectives are more prone to risk and uncertainty than service delivery objectives. Some of those risks relate to the longer time frame needed for DPFs’ policy reforms to lead to outcomes. Such reforms must successfully proceed through a long change pathway to arrive at desired outcomes.
For example, a policy reform supported by a DPF must build from a prior action (for example, a parliamentary proposal for a legislative change) to approving and enacting the change and waiting for that change to achieve intended higher-level outcomes, such as people or firms behaving differently and spurring economic growth. Each of the links in this chain depends on actions by governments, parliaments, and economic actors outside of the project’s control. Some risks relate to the nature of the DPF instrument itself. For example, the World Bank has less room to make course corrections to achieve results in DPFs than it does in IPFs, especially in stand-alone DPFs. Evaluation methods also play a role because, in practice, they differ between IPFs and DPFs. For example, DPFs’ outcome ratings are based only on an assessment of relevance and efficacy, with no assessment of efficiency as is done for IPFs, and assessing DPFs’ relevance and efficacy poses its own challenges.24

24 It is hard to assess how and how much development policy financing contributed to overall reform outcomes, given that development policy financing’s prior actions are part of broader reform plans. Instead, evaluators can focus on the relevance of the prior actions and the results indicators. Planned reforms of the evaluation methodology for development policy financing aim to strengthen these dimensions.

Table 3.2. Ratings and Outcome Levels for Select Global Practices and Project Types

| Outcome Level | Education, URL, and Transport: Projects (no.) | Education, URL, and Transport: MS+ (percent) | Education, URL, and Transport: Average efficacy rating (a) | MTI and Governance: Projects (no.) | MTI and Governance: MS+ (percent) | MTI and Governance: Average efficacy rating (a) |
|---|---|---|---|---|---|---|
| Level 1 | 6 | 100 | 3.1 | 0 | n.a. | n.a. |
| Level 2 | 237 | 84 | 2.8 | 53 | 60 | 2.5 |
| Level 3 | 39 | 77 | 2.7 | 100 | 61 | 2.4 |
| Level 4 | 3 | 100 | 2.8 | 28 | 55 | 2.2 |
| Total | 285 | 84 | 2.8 | 181 | 60 | 2.4 |

Source: Independent Evaluation Group; recent Implementation Completion and Results Report Review sample.
Note: MS+ = moderately satisfactory or above; MTI = Macroeconomics, Trade, and Investment; n.a. = not applicable; URL = Urban, Resilience, and Land.
a. IEG uses a numerical conversion of the four-point efficacy rating.

Projects with level 3 and 4 objectives appear to have adequate results frameworks and M&E systems as often as other projects do. It seems intuitive that it would be harder to design adequate results frameworks for projects with higher-level outcomes, yet the evidence suggests otherwise.25 Project M&E ratings decline little as outcome levels increase. Projects with objectives at level 2 were rated 45 percent high or substantial on M&E quality compared with 43 percent for level 3 and 44 percent for level 4 projects (table 3.3). A similar pattern emerges when looking at IPFs only. Similarly, IEG rates IPF projects low when there is insufficient evidence to confirm the projects’ achievement of objectives. This happened to at least 6 percent of all IPF projects with level 2 outcomes and 9 percent with level 3 outcomes (table 3.3).26

25 Recall that the quality of projects’ monitoring and evaluation is important for ratings, according to the regression analysis and the analysis presented in part I.

26 These figures are a lower bound estimate based on Implementation Completion and Results Report Reviews, in which the IEG reviewer explicitly noted weak evidence as a reason for the rating decision.

Table 3.3. Ratings and Outcome Levels for Select Global Practices and Project Types

| Outcome Level | Projects Rated High and Substantial on M&E (percent) | IPF Projects Rated High and Substantial on M&E (percent) | IPF Projects with Lack of Evidence (percent) |
|---|---|---|---|
| Level 1 | n.a. | n.a. | n.a. |
| Level 2 | 45 | 45 | 6 |
| Level 3 | 43 | 41 | 9 |
| Level 4 | 44 | n.a. | n.a. |

Source: Independent Evaluation Group.
Note: Values for sample sizes of 10 or fewer projects not shown. IPF = investment project financing; M&E = monitoring and evaluation; n.a. = not applicable.

The risk-return trade-off does not appear to be very pronounced in these data. Outcome levels vary across GPs and instruments, but this is not the reason for performance differences because IPF projects with level 3 objectives and DPF projects with level 3 and 4 objectives do not appear to have markedly higher risk of weak performance compared with projects with lower-level objectives. Half of agricultural and environmental IPF projects set their objectives at level 3 and still register mostly strong achievements. Differences in performance appear to be more closely associated with levels of risk and uncertainty and the time and complexity involved in pursuing policy and institutional reforms. Questions remain about what is required for projects to set and achieve ambitious objectives.

Thematic Area Outcomes

This section considers how the Bank Group aggregates project and program results in key thematic areas and the implications for its outcome orientation. The IEG team reviewed corporate strategies, Corporate Scorecards, and results measurement systems for three Global Themes: Gender; Climate Change; and Fragility, Conflict, and Violence (looking particularly closely at Climate Change).
The Bank Group has clearly articulated higher-level outcomes for its thematic work. Bank Group corporate strategy documents set out clear high-level outcome goals, most famously the twin goals on poverty and shared prosperity and the commitment to the Sustainable Development Goals. Furthermore, there are many other goals, targets, and policy commitments set in different sectoral and thematic areas through the 2018 Bank Group capital package; the IDA Replenishments; World Bank Group Climate Change Action Plan 2016–2020; World Bank Group Gender Strategy (2016–2023): Gender Equality, Poverty Reduction, and Inclusive Growth; and World Bank Group Strategy for Fragility, Conflict, and Violence 2020–2025, among others. The Bank Group has extensive systems to track and aggregate its results, but these systems often operate at some distance from higher-level outcomes. All projects and country programs have results frameworks with objectives, indicators, and M&E systems to capture those indicators. These projects and programs undergo self-evaluations that IEG validates and rates, and these form the backbone of the Bank Group’s results measurement system. Aggregated data from projects and country programs appear in the Bank Group’s Corporate Scorecards, IDA’s results measurement system, and thematic results measurement systems, such as those for gender and climate change. Yet these data focus on internal processes and the number of people reached by health, water, financial, education, sanitation, electricity, and agricultural services. Such reach indicators correspond to level 2 outcomes, but they convey little about the service’s quality and impact on human well-being and, therefore, do not help staff manage to those outcomes. Only a few of the indicators in the Corporate Scorecards, IDA’s results measurement system, and thematic results measurement systems track higher-level outcomes. 
The results measurement systems for thematic areas do little to support the Bank Group’s outcome orientation. The RAP defines outcome orientation as gathering credible evidence on outcome achievement; using this evidence to adapt interventions and portfolios, engage clients, and learn; and thus becoming more effective at achieving positive social change. This definition is not about encouraging staff to aim for any particular level of outcomes. Rather, strong outcome orientation requires collecting credible evidence on progress and achievements and ensuring that staff have the right incentives to use the evidence to pursue positive social change relevant to the context of countries and sectors. Outcome orientation is different from achieving targets and monitoring processes.

Instead, corporate results measurement systems help senior management track and incentivize operational fulfillment of corporate policy commitments. The Bank Group’s ability to track and report on its policy commitments confers legitimacy and credibility on the organization and has undoubtedly helped it secure strong IDA replenishments and IBRD and IFC capital increases. Corporate indicators incentivize operations to integrate these themes into their work streams and meet targets. For example, when the World Bank committed to engage citizens in all applicable projects and started tracking this, the share of projects with citizen engagement indicators in their results frameworks increased quickly, but there was limited evidence on the quality, influence, or outcomes of citizen engagement (World Bank 2018a). Box 3.5 examines how the climate change results measurement system has helped the Bank Group meet or exceed its climate action targets.

Box 3.5. The Climate Change Results Measurement System

The World Bank Group Climate Change Action Plan (CCAP) was adopted in April 2016 and lays out ambitious climate-related targets for 2016–20.
The Bank Group has reported annually on progress for over 30 climate change–related actions and targets and is preparing a retrospective summary report. Through these targets, the Bank Group monitors how well it integrates climate change into operations and strategies. The vast majority of indicators, 90 percent, relate to actions under the World Bank’s direct control, including inputs, such as financing for climate action; internal processes, such as greenhouse gas accounting and risk screening; and outputs, such as the number of products that support countries and cities with climate-related policies, strategies, and capacity building.

The results measurement system used for tracking CCAP targets has a limited focus on projects’ and programs’ quality and higher-level outcomes. The system incentivizes operations to adhere to process requirements, for example, to adjust cost-benefit analysis for the shadow price of carbon and screen or assess projects’ potential climate risks. However, it is unclear if requiring risk screening influences projects’ and country programs’ design, quality, and outcomes. Furthermore, there is no evidence on how well projects address climate risks.

In the CCAP itself, the main commitment to increase the share of climate change–related commitments to 28 percent has driven all subsequent outputs and outcomes. At the level of the many institutional CCAP targets, only approximately 10 percent relate to outcomes, including level 2 outcome indicators, such as the amount of commercial funds mobilized for clean energy or the number of people covered by climate-adaptive social protection and early-warning services. The CCAP reporting does not, however, assess the Bank Group’s contributions to greener or more resilient national development trajectories.
On the whole, the CCAP results measurement system has driven accountability and internal incentives to mainstream climate action across the Bank Group and has tracked progress in meeting targets, but it does not guide operations toward key outcomes or assess the quality of those outcomes.

Source: Independent Evaluation Group.

Corporate mandates and indicators cascade down to operational departments and can potentially drive box-checking behaviors. If operations sought to maximize the reach indicators for service access in the Bank Group Corporate Scorecard, they could increase the number of people covered by water, health, electricity, and other services at the cost of service quality. However, if operations instead had evidence of the quality of services, the capacity of institutions, and beneficiaries’ productivity and well-being, they might be better able to manage for those outcomes. In another example, an emergency health project in an Ebola-affected country was held back at one point because it did not meet a minimum threshold for climate cobenefits. Overall, the challenge is to ensure that targets create incentives that are compatible with outcome orientation, as discussed in the next, concluding, chapter.

4. Conclusions: Getting to Outcomes

This report expanded on past RAPs by focusing not only on core performance as assessed through ratings but also on outcomes and relationships between ratings and outcome levels. This chapter draws out some key findings, conclusions, and implications for the Bank Group’s COVID-19 response and then for its outcome orientation. It finds that there are trade-offs between using results measurement systems for tracking commitment targets and outcome orientation. Confronting these trade-offs is necessary if the Bank Group wants to better support outcome orientation.
Findings and Conclusions

The analysis of performance showed positive ratings trends for World Bank and MIGA projects and Bank Group country programs in IBRD countries. The analysis linked the positive outcome ratings trends to strong work quality on project design, implementation support, and M&E, and broadly conducive economic and institutional conditions in many larger countries before the pandemic. Performance trends for IFC projects and in FCV-affected countries are less positive, albeit with signs of recent slight improvements for IFC. Less successful results were often linked to large shocks and issues with projects’ and programs’ preparation for risks and their response when shocks occurred.

The analysis of outcomes showed that most IPF objectives cluster at level 2 around quality and access to services, though a few sectors state IPF objectives at level 3 with a clearer focus on end beneficiaries. Most DPFs state their objectives at level 3 with a focus on policy reform outcomes, and recently approved IFC projects often state their objectives at level 3, particularly in relation to market creation objectives. Projects’ outcome levels have only a modest relationship with their ratings, and the relationship becomes insignificant when controlling for other factors. Looking beyond projects, the analysis showed limited higher-level outcome data. The existing results measurement systems collect evidence needed for ratings and for process and compliance monitoring, which is different from evidence on outcome achievement.

Based on the evidence and findings, this RAP concludes that the Bank Group can improve how its incentives and results measurement systems support outcome orientation. At the project level, many projects with higher-level objectives manage to achieve good IEG ratings, in part by having strong results frameworks to measure outcome achievement.
Even so, it would not be realistic or desirable to expect all World Bank projects to have objectives at outcome level 3 or 4, as discussed in box 4.1. At the country program level, the Bank Group has opportunities to take a broader and more strategic view beyond individual projects, yet there is often little evidence on higher-level country outcomes, as discussed in IEG’s forthcoming evaluation of country programs’ outcome orientation. At the corporate level, the Bank Group’s extensive systems cover different thematic work areas and collect process and output indicators to help senior management incentivize and report on operations’ fulfillment of corporate policy commitments, but they do not help staff to manage for higher-level outcomes. Box 4.1. Setting Project Objectives Objective setting needs to balance the opposing demands of realism and ambition. Realism demands that objectives be achievable, given the projects’ resources, timeline, and context. Objectives that are far removed from project interventions jeopardize the ability to show contribution. Outcome achievement should be measurable. Country and sector context, geographic scope, and beneficiaries are some of the context factors that also matter. Ambition, however, demands objectives with a line of sight to systemwide, transformative, or other important higher-level outcomes. Ambition also demands result frameworks that measure how the project changes beneficiaries’ conditions, with attention to gender and distributional aspects. Balancing realism and ambition requires judgment and dialogue between client counterparts and the World Bank. Therefore, universal rules are unlikely to be helpful. In practice, it is plausible that some projects aspire to and achieve outcomes at a higher level than those captured in their objectives. Source: Independent Evaluation Group. 
Excessive focus on monitoring targets can cause a risk-averse corporate culture and stifle staff’s intrinsic motivation to pursue positive social change. IEG’s evaluation of the Bank Group’s self-evaluation systems found that staff often have little use for the collected data and find little value in it. Instead, incentives push staff to focus on checking the box, that is, meeting targets and feeding the demand for corporate monitoring data (World Bank 2016a). At the same time, a corporate culture focused on compliance, disbursements, and meeting targets can induce risk aversion, reduce openness about problems, interfere with staff learning and experimentation, and stifle how staff use evidence to pursue outcomes.27 For these and other reasons, there are trade-offs between outcome orientation and using results measurement systems for reporting and incentivizing fulfillment of policy commitments.

Confronting trade-offs related to the purposes of the Bank Group’s results measurement systems is necessary for improving outcome orientation. The corporate results measurement systems for projects, programs, and thematic areas were purposefully designed to meet the Bank Group’s need to collect data it can report to shareholders to show attributable results and that allow shareholders’ representatives to hold it accountable. The purpose of tracking and reporting results data dictates what systems collect, leading to a focus on tracking commitments using quantifiable indicators of activities and lower-level results that can be attributed to Bank Group interventions, added up across portfolios, and used to verify whether targets have been met. Systems designed for such purposes are not geared toward understanding and managing higher-level outcomes.

Implications

This RAP’s findings and conclusions have implications for the Bank Group’s ongoing response to the pandemic and other shocks.
Some projects may need more robust implementation support and more frequent course correction during implementation, both to respond to shocks and unforeseen circumstances and to counter the potential effects of short preparation times on quality at entry. M&E systems across the Bank Group need to enable such responses and can do so by maintaining sight of project objectives and enabling teams and clients to identify issues and make nimbler course corrections. The World Bank’s and clients’ administrative procedures for restructuring and canceling projects could be streamlined. Rating systems should not unduly penalize necessary changes to targets and objectives set at approval that may need adjustment in light of COVID-19 and other shocks. Box 4.2 explains what that might involve for IFC. Furthermore, the analysis of IFC’s performance highlighted the need for enhanced tools and processes to identify and mitigate market, country, and sponsor risks.

27 Based on research covering many development agencies (including the World Bank), academic Dan Honig discusses how to promote staff’s intrinsic motivation to achieve outcomes. Arguing for more “navigation by judgment,” Honig suggests promoting a less risk-averse corporate culture that embraces bold ambitions and gathers, uses, and learns from outcome evidence (Honig 2018, 2020).

The analysis of country programs in low-capacity countries suggests that when adding new elements in response to large shocks, there is a need to simplify other program elements to avoid overtaxing country capacity. More broadly, such programs need to be designed from a premise of high risk. This includes aiming for short-term gains, sequencing longer-term reform agendas into discrete items with shorter time frames, and avoiding overburdened programs.

Box 4.2.
The Coronavirus Pandemic and IFC Project Ratings

The coronavirus pandemic represents a shock to International Finance Corporation projects, putting the economic and financial sustainability of those projects at risk, at least temporarily. As one response, the International Finance Corporation and the Independent Evaluation Group are discussing how to adjust ratings processes and methodologies to account for shocks like the pandemic, which make the projects’ implementation environment and country context more challenging. Proposals include making project objectives more realistic by rating projects based on their midcourse correction targets rather than those set at approval (before the shock occurred), and giving the International Finance Corporation more flexibility to choose the evaluation timing, which may help projects recover and meet targets later.

Source: Independent Evaluation Group.

The outcome orientation findings suggest a need to rethink the approach to collecting outcome evidence beyond the project level. The existing self-evaluation instruments and results measurement systems aggregate data from individual projects, but higher-level outcomes result from the interplay of different projects over time, something that none of the Bank Group’s existing self-evaluation instruments capture. In line with past practice, this RAP does not make formal recommendations. However, IEG’s forthcoming evaluation of country programs’ outcome orientation discusses how using a wider set of methods and a focus on contribution rather than attribution could help support longer-term thinking and engagement on how the Bank Group contributes to important country-level outcomes. This includes more flexibly accounting for shocks and necessary program adjustments and better capturing the Bank Group’s contribution to institutional change in countries (box 4.3).

It would be helpful to differentiate the purpose of collecting outcome evidence.
At the project level, setting objectives and assessing achievements that can be attributed to Bank Group support continues to be important for the institution’s accountability and credibility. This requires realism when setting projects’ development objectives, as discussed in box 4.1. But beyond the project level, for results in country programs and at thematic levels, the purpose of collecting outcome evidence should not be to track and report and hold the institutions accountable for attributable results. Assessing outcomes often requires dedicated, context-specific evidence, which does not always lend itself easily to portfoliowide aggregation. Outcome evidence can be robust when based on sound evaluation methods that use plausible theories of change and credible data to relate Bank Group activities to observed outcomes in sectors and countries.

Box 4.3. A Fresh Approach to Understanding Country Outcomes

The Independent Evaluation Group’s evaluation of country programs’ outcome orientation finds that a satisfactory self-evaluation instrument would need to go beyond the present approach, which is centered on “results frameworks premised on metrics, attribution, and time-boundedness” (World Bank 2020, x). A self-evaluation instrument suited for collecting higher-level outcome evidence would have to cover a longer period and focus on a sector or country to capture contributions to outcomes and assess the cumulative effects from multiple World Bank, International Finance Corporation, and Multilateral Investment Guarantee Agency lending, knowledge, and convening interventions.

“A renewed country-level results system could conceive accountability differently, based on evidence of achievement and failures and description of learning and adaptation. It could acknowledge that the Bank Group can influence but not control country outcomes.
It could recognize that country teams cannot decide all targets and objectives at design but must adapt during implementation” for reasons relating to shocks, uncertainty, changing circumstances, and, especially for the International Finance Corporation and the Multilateral Investment Guarantee Agency, unpredictable client demand. And it could realize that capturing contributions to country outcomes and assessing cumulative effects from multiple interventions requires dedicated evaluation inquiries, not just measurement of indicators. Data for such a renewed system could come from existing project evaluations, impact evaluations, ratings, stakeholder surveys, and other sources.

Source: World Bank 2020.

Looking Ahead

IEG plans to continue producing annual RAPs that aim to provide a broad perspective on the Bank Group’s performance. Though the exact shape of future RAPs is still undecided, IEG will continue its efforts to offer a lens through which to understand outcomes and outcome levels across sectors and Bank Group institutions. The Bank Group exists to work with its client countries on improving human conditions. A clear focus on outcomes helps it stay on course.

Bibliography

Custer, Samantha, Zachary Rice, Takaaki Masaki, Rebecca Latourell, and Bradley Parks. 2015. Listening to Leaders: Which Development Partners Do They Prefer and Why? Williamsburg, VA: AidData.

Denizer, Cevdet, Daniel Kaufmann, and Aart Kraay. 2011. “Good Countries or Good Projects? Macro and Micro Correlates of World Bank Project Performance.” Journal of Development Economics 105 (November): 288–302.

Geli, Patricia, Aart Kraay, and Hoveida Nobakht. 2014. “Predicting World Bank Project Outcome Ratings.” Policy Research Working Paper 7001, World Bank, Washington, DC.

Honig, Dan. 2018. Navigation by Judgment: Why and When Top-Down Management of Foreign Aid Doesn’t Work. Oxford: Oxford University Press.

Honig, Dan. 2020. “Actually Navigating by Judgment: Towards a New Paradigm of Donor Accountability Where the Current System Doesn’t Work.” Policy Paper 169, Center for Global Development, Washington, DC.

Ika, Lavagnon A. 2015. “Opening the Black Box of Project Management: Does World Bank Project Supervision Influence Project Impact?” International Journal of Project Management 33 (5): 1111–23.

Moll, Peter G., Patricia Geli, and Pablo Saavedra. 2015. “Correlates of Success in World Bank Development Policy Lending.” Policy Research Working Paper 7181, World Bank, Washington, DC.

Raimondo, Estelle. 2016. “What Difference Does Good Monitoring and Evaluation Make to World Bank Project Performance?” Policy Research Working Paper 7726, World Bank, Washington, DC.

Ralston, Laura. 2014. “Success in Difficult Environments: A Portfolio Analysis of Fragile and Conflict-Affected States.” Policy Research Working Paper 7098, World Bank, Washington, DC.

World Bank. 2010a. “Responding to Floods in West Africa: Lessons from Evaluation.” Independent Evaluation Group Note, World Bank, Washington, DC.

World Bank. 2010b. “Response to Pakistan’s Floods: Evaluative Lessons and Opportunity.” Independent Evaluation Group Note, World Bank, Washington, DC.

World Bank. 2012. The World Bank Group’s Response to the Global Economic Crisis, Phase II. Independent Evaluation Group. Washington, DC: World Bank. http://ieg.worldbank.org/sites/default/files/Data/Evaluation/files/crisis2_full_report.pdf.

World Bank. 2013. World Bank Group Assistance to Low-Income Fragile and Conflict-Affected States: Independent Evaluation Group. Washington, DC: World Bank. http://ieg.worldbank.org/sites/default/files/Data/reports/fcs_eval_0.pdf.

World Bank. 2016a. Behind the Mirror: A Report on the Self-Evaluation Systems of the World Bank Group. Independent Evaluation Group. Washington, DC: World Bank. http://ieg.worldbank.org/sites/default/files/Data/Evaluation/files/behindthemirror_0716.pdf.

World Bank. 2016b. Results and Performance of the World Bank Group 2015. Independent Evaluation Group. Washington, DC: World Bank. http://ieg.worldbank.org/sites/default/files/Data/Evaluation/files/rap15_fullreport.pdf.

World Bank. 2017. Crisis Response and Resilience to Systemic Shocks: Lessons from IEG Evaluations. Independent Evaluation Group. Washington, DC: World Bank. http://ieg.worldbank.org/sites/default/files/Data/reports/building-resilience.pdf.

World Bank. 2018a. Engaging Citizens for Better Development Results: Independent Evaluation Group. Washington, DC: World Bank. http://ieg.worldbank.org/sites/default/files/Data/Evaluation/files/Engaging_Citizens_for_Better_Development_Results_FullReport.pdf.

World Bank. 2018b. Results and Performance of the World Bank Group 2017. Independent Evaluation Group. Washington, DC: World Bank. http://ieg.worldbankgroup.org/sites/default/files/Data/Evaluation/files/rap2017.pdf.

World Bank. 2019. Results and Performance of the World Bank Group 2018. Independent Evaluation Group. Washington, DC: World Bank. https://ieg.worldbankgroup.org/sites/default/files/Data/Evaluation/files/rap2018.pdf.

World Bank. 2020. The World Bank Group Outcome Orientation at the Country Level. Independent Evaluation Group. Washington, DC: World Bank.

Photo Credits

Page 2 Sifting seeds in a field along Red River in Northern Vietnam QD-VN001 World Bank | Quy-Toan Do / World Bank | Vietnam

Page 4 Happy students at a school in Uganda, Africa. Students raising their hands. 1784638013 | Boxed Lunch Productions | Uganda

Page 5 Kuala Lumpur is the capital city of Malaysia, landscape view over rice field plantation farming in morning sunrise 131496056 | Szefei, from Switzerland | Malaysia

Page 6 ANSA-AW (Affiliated Network for Social Accountability-Arab World) was officially launched on March 14-15, 2012, in Rabat, Morocco.
The event was organized by the World Bank in collaboration with CARE International (Egypt). ANSA-AW is a multi-stakeholder regional network comprised of CSOs, media, private sector and government representatives. Hoel_120313_DSC_7346 | Arne Hoel / World Bank
Page 11 Irrigation system watering a crop of soy beans at field. 663246409 | Fotokostic, from Serbia
Page 13 Rossing Uranium Mine lies about 70km inland from Swakopmund close to the small town of Arandis. It was founded by amateur geologist Capt. Peter Louw in 1928 largely owned by the Rio Tinto Group (69% shares). In 2006 the mine produced about 7% of the world production of primary produced uranium. The mine is a long term supplier of uranium to the world nuclear power industry. Haul trucks being repaired and serviced at the workshop on the mine. JH-NA070906_0049 | John Hogg / World Bank
Page 15 Thermo-solar power plant. Ain Beni Mathar Integrated Combined Cycle Thermo-Solar Power Plant. DS-MA111 | Dana Smillie / World Bank
Page 17 Andean family taking their livestock to grazing pastures in the Andes, Peru, South America 298577849 | Duncan Andison, from U.K. | Peru
Page 18 Winding desert road in Wadi Rum, Jordan 115982191 | Boris Stroujko, from Switzerland | Jordan
Page 21 Soap and Water for Clean Hands for African Children 1007424487 | Riccardo Mayer, from Germany
Page 22 Munnar, Kerala, India - October 12, 2007: Beautiful landscape of a road passing a village and tree plantations near Munnar 784009363 | ImagesofIndia | India
Page 23 Baidoa / Somalia - March 2017 - People who carry water rest under a tree in the refugee camp. 1100529911 | Mustafa Olgun, from Turkey | Somalia
Page 24 Low angle shot of modern glass buildings and green with clear sky background. 613341923 | James Teoh Art
Page 25 The oil pump, industrial equipment 1664994739 | Pan Demin, from China
Page 26 Kolony, Uganda – October 02, 2016: Many pregnant women waiting for an ultrasound scan at the Kolonyi hospital in Uganda. A German doctor is there to educate the local doctors. 766560016 | Dennis Wegewijs, from Germany | Uganda
Page 29 Metro Manila / Philippines - April 2019: Bonifacio Global City skyline at magic hour. Bonifacio Global City, or BGC, is a financial and lifestyle district in Metro Manila, Philippines. 1367780063 | Hit Uno, from Japan | Philippines
Page 30 Daily life in Monrovia, Liberia on December 2, 2014. Liberia_Scene_Setters_0004 | Dominic Chavez / World Bank
Page 33 Daily life in Monrovia, Liberia on December 2, 2014. Liberia_Scene_Setters_0002 | Dominic Chavez / World Bank
Page 34 Indigenous Peruvian Quechua woman with traditional hat and textile along an Andes road, Sacred Valley of the Inca, Cusco, Peru 1802841382 | Sebastien Lecocq, from Belgium | Peru
Page 35 Workers building a new road. Albes Fusha / World Bank | Albania
Page 36 Good quality cocoa beans that are carefully selected in the hands of the owners of agricultural workers 1389682223 | Attasit saentep, from Thailand
Page 38 According to the World Bank’s Malaysia Economic Monitor, June 2013, the country’s recent economic performance and near term outlook owes much to the commodities sector which includes palm oil. Palm oil is used for products such as animal feed. Nafise Motlaq / World Bank | Kuala Lumpur, Malaysia.
Page 40 Cattle and donkeys near a water point in Kenya’s Eastern Province. FP-KE-0639 | Flore de Preneuf / World Bank
Page 41 Young boys on fishing boat. AH-GH061111_5002 | Arne Hoel / The World Bank | Ghana
Page 42 Cleaning solar panels. Ain Beni Mathar Integrated Combined Cycle Thermo-Solar Power Plant DS-MA117 | Dana Smillie / World Bank
Page 43 Local Intha woman weaving blue lotus fabric on a loom at the local lotus cloth weaving workshop at Inle lake, Shan State, Myanmar (Burma). January 2019. Selective focus 1359398909 | Anya Newrcha, from Russia | Burma
Page 44 Abdul Satar, 30, says, before the cementing of the floor of the canal and the establishment of the sluices they had a lot of problems with the irrigation of their farms. 10/26/2014. Deh Surkh Village, Zenda jan district, Herat, Afghanistan. Ghulam Abbas Farzami / World Bank
Page 46 Zaheda feeding her chicken in the farm. Livestock Extension, FFS methodology training, National Horticulture and Livestock Project. 27 Jan 2015, Itifaq Mena, Surkhrud district, Jalalabad, Afghanistan. ABBAS Farzami / Rumi Consultancy / World Bank
Page 49 Solar panels in desert under colorful sunset sky clouds, sun energy and electricity generation in Africa. Investment project to reduce greenhouse gas emissions. 1384724600 | Yasmin Meraki, from Netherlands | Africa
Page 52 Water Projects, Lesotho. Advance Infrastructure of the Metolong Dam and Water Supply Programme included bridges (two) and a 32km tarred access road to the site from Maseru. Also power supply, water and sanitation, telecommunications, construction camp and permanent operational facilities. Bridge 1 over the Phuthiatsana River at Ha-Makhoathi. There is also small scale agriculture next to the river, some of which is irrigated; Lesotho farmers, however, rely more on rainfall than irrigation. JH-LS-090625-2 | John Hogg / World Bank
Page 54 Parched soil by the White Nile. AH-SD2161869 | Arne Hoel / World Bank | Khartoum, Sudan.
Page 55 A Metrobus system bus, part of the new mass transportation system in Panama City, Panama. Gerardo Pesantez / World Bank
Page 56 African health professional or physician wearing face mask for protection and scrubs, child wearing homemade mask sitting on her, looking at camera in covid-19 pandemic 1798128712 | Yaw Niel, from Ghana
Page 59 Baobab trees along the rural road at sunny day 187374941 | Dudarev Mikhail, from Russia | Senegal
Page 61 New rural roads have provided access to markets for the local communities. Ana Gjokutaj / World Bank | Albania