Policy Research Working Paper 9884

A Puzzle with Missing Pieces: Explaining the Effectiveness of World Bank Development Projects

Louise Ashton, Jed Friedman, Diana Goldemberg, Mustafa Zakir Hussain, Thomas Kenyon, Akib Khan, Mo Zhou

Operations Policy and Country Services Vice-Presidency & Development Research Group
December 2021

Abstract

The identification of key determinants of aid effectiveness is a long-standing question in the development community. This paper reviews the literature on aid effectiveness at the project level and then extends the inquiry in a variety of dimensions with new data on World Bank investment project financing. It confirms that the country institutional setting and quality of project supervision are associated with project success, as identified previously. However, many aspects of the development project cycle, especially project design, have been difficult to measure and therefore under-investigated. The paper finds that project design, as proxied by the estimated value added of design staff, the presence of prior analytic work, and other specially collected measures, is a significant predictor of ultimate project success. These factors generally grow in predictive importance as the income level of the country rises. The results also indicate that a key determinant of the staff’s contribution is their experience with previous World Bank projects, but not other characteristics such as age, education, or country location. Key inputs to the project production process associated with subsequent performance are not captured in routine data systems, although it is feasible to do so. Further, the conceptualization and measurement of the success of project-based aid should be revisited by evaluative bodies to reflect a project’s theorized contribution to development outcomes.
This paper is a product of the Investment and Program Lending Unit, Operations Policy and Country Services Vice-Presidency and the Development Research Group, Development Economics. It is part of a larger effort by the World Bank to provide open access to its research and make a contribution to development policy discussions around the world. Policy Research Working Papers are also posted on the Web at http://www.worldbank.org/prwp. The authors may be contacted at jfriedman@worldbank.org.

The Policy Research Working Paper Series disseminates the findings of work in progress to encourage the exchange of ideas about development issues. An objective of the series is to get the findings out quickly, even if the presentations are less than fully polished. The papers carry the names of the authors and should be cited accordingly. The findings, interpretations, and conclusions expressed in this paper are entirely those of the authors. They do not necessarily represent the views of the International Bank for Reconstruction and Development/World Bank and its affiliated organizations, or those of the Executive Directors of the World Bank or the governments they represent.

Produced by the Research Support Team

JEL codes: H43, O22
Key words: Development project success, project effectiveness

The authors would like to acknowledge the valuable contributions of Lily Chu, Enrique Pantoja, Eun Jung Park, Jaime Sarmiento, John Underwood, Nastassha Arreza, Bastien Koch, Sangill Lee, and Olatunde Olatunji. All authors are with the World Bank, except for Akib Khan at Uppsala University. Corresponding author: Jed Friedman (jfriedman@worldbank.org).

1. Conceptual frameworks and knowledge gaps in development project effectiveness: an introduction

Broad questions of aid effectiveness surround international efforts to achieve the Sustainable Development Goals. These questions include: Is development aid worth the resources devoted to it? What aspects can improve the efficiency of aid? What do we consider to be a development success? This paper reviews and then extends the inquiry into determinants of development aid success insofar as aid takes the form of the development project. We confirm that project success is associated with the quality of both a country’s institutional setting and individual project supervision, as found previously. However, other key inputs to the development project cycle, in particular the quality and suitability of project design, have rarely been investigated due to the paucity of standardized quantifiable information at the project level. We develop proxies for the quality of project design, such as the estimated value-added of design staff, the presence of prior analytic work, and specially collected measures from original project documents. By and large, project success is positively associated with these measures, and these measures in fact grow in predictive importance as the income level of the country rises. Yet despite the significant associations with the newly developed measures, the overall goodness-of-fit of statistical models that predict project success does not appreciably change, and the clear majority of variation in project success is left unexplained. This leads to broader questions about how we conceptualize and measure the success of project-based aid.
Economic researchers have adopted various approaches to investigate aid effectiveness and the characteristics of aid delivery that are associated with success. An aggregate country-level approach generally focuses on economic growth as the outcome and macro-level aid measures as explanatory variables. A second strand of the literature adopts a micro-econometric perspective. It relates project-level outcomes, typically the ratings assigned ex post by independent evaluators, to some combination of country-level and project-level characteristics. Other outcome levels, notably the sector level above the project, have been relatively under-exploited by researchers.

To better situate the question of development project effectiveness, Table 1 summarizes alternative analytic frameworks investigating different levels of aid and corresponding outcomes of interest. The lowest practical unit of analysis is the intervention or project component, such as a vaccination campaign or investment in a wastewater treatment plant. Where feasible, impact evaluation methods have been applied to a wide range of such interventions, generating a burgeoning literature on their effectiveness.[1] The bulk of micro-econometric work on development effectiveness has focused on the project as the unit of analysis and explored various correlates or predictors of project success. This is the literature we review and then extend in this paper. We can envision development projects as comprising a bundle of interventions, sometimes aggregating them across sectors or along a service delivery chain. For example, a project may combine assistance to strengthen public financial management with financing for investments in schools. Projects also vary in the extent to which they combine financial with other forms of support and in the type of disbursement conditionality they impose on the borrower.
The scope of previous studies has been driven largely by data availability, yet the production function for an investment project is complex and few inputs into the process are routinely collected in administrative data.

[1] See Cameron et al. (2016) for an account of the growth of impact evaluations in developing contexts, and Banerjee et al. (2017) and Deaton and Cartwright (2018) for discussions of their potential and limitations.

Project production comprises a cycle of strategy development, identification, preparation, appraisal and executive board approval, implementation, supervision and completion, followed by evaluation.[2] There are numerous contributors at each stage. On the World Bank side, strategy development is the responsibility of the country management unit. In particular, the country director determines the size of the financing envelope. Project preparation, appraisal, supervision and completion are largely the responsibility of the task team, which comprises technical and fiduciary specialists and receives guidance from sector managers and country office representatives. To the extent that previous studies have estimated the impact of individual professionals, this has largely been limited to the project manager during project implementation. At the World Bank, the project manager is commonly referred to as the Task Team Leader, or TTL. Almost none of these studies have identified or attempted to measure the contribution of other participants in the process. Along these lines, one contribution of this study is to assess more comprehensively the contribution of project design and supervision to development effectiveness. We consider the usual explanatory factors of project-level success in the literature, clustered around (i) country characteristics, such as institutional quality, political and economic stability, and economic growth trajectories; and (ii) project characteristics, such as duration, size, sector and lending instrument.
Additionally, we include (iii) aspects of project design, such as the clarity of results frameworks, the number of components, and the characteristics of staff involved in design; and (iv) aspects of project supervision, such as the intensity, quality, and continuity of oversight. For this, we combine data on World Bank project outcomes with a novel data set derived from project preparation documents and internal staff records. Specifically, we examine the contributions of the TTL who prepared the project, the Practice Manager who oversaw that TTL, and the Country Director responsible for the country program within which the project was set. We also examine the chain of TTLs who supervised the project until closing.

A second contribution of this study is to examine measures of project quality as captured in original project documents. For a stratified random sample of 120 projects, we manually extract detailed information from primary project documents. This includes an expert panel assessment of design quality and a thorough review of evaluation reports from outlier projects. Outliers are defined either as projects with highly satisfactory outcome ratings in low-capacity environments (‘positive outliers’) or as projects with highly unsatisfactory ratings in high-capacity environments (‘negative outliers’). These harder-to-measure aspects of project design and supervision include support for center-of-government functions such as planning and budgeting, human resource management, financial management and procurement. They also include support for strengthening the implementation capacity of sector agencies, for example through results-based management. In theory, their effect may be felt not just on the management of interventions financed by the project but also through institutional spillovers to other areas of government activity.

The project cycle is similarly complex on the government side.
Responsibility for advising on the content of the World Bank’s strategy and identifying projects lies with key economic ministries, typically finance, planning and investment, and sometimes international cooperative bodies. Investment projects are generally implemented by line ministries or, in some cases, specialized government agencies. There are also operations implemented by subnational governments (states, provinces or municipalities), whose capacity can vary substantially. In each case, direct responsibility typically lies with a project implementation unit, staffed with civil servants who may or may not have previous experience of World Bank projects. All investment projects also depend to a greater or lesser degree on the effectiveness of government procurement and financial management systems. But to the extent that previous studies have estimated the impact of the government’s contribution, this has generally been through country-level rather than agency- or team-level measures of administrative capacity. A fuller approach to evaluating the determinants of project success would be more sensitive to the role of all participants on both sides, and at all stages of the project cycle, particularly identification and evaluation.

[2] See Figure A3.1 in Annex 3 for a schematic representation of this process, as detailed in World Bank (2011).

Following the literature on donor effectiveness, the ratings assigned by the Independent Evaluation Group (IEG) constitute our primary measure of individual project success.[3] These ratings are the culmination of a two-stage process. The first stage is the World Bank’s self-evaluation end-of-project report, the Implementation Completion and Results Report (ICR).[4] The second is a desk-based critical review by an external evaluator, the ICR Review (ICRR). In about 20 percent of cases the ICRR is followed by a detailed field-based evaluation, the Project Performance Assessment Report (PPAR).
The ICRR and the PPAR are both conducted by IEG. Together these lead to a six-point outcome rating that captures the extent to which projects achieved their intended development outcomes, ranging from highly unsatisfactory to highly satisfactory. We take as our measure the ICRR rating or, where available, the PPAR rating. While certainly noisy to some degree, these project success measures are widely seen as valid proxies for true project performance owing to their institutional independence and transparent criteria.

Previous studies face measurement error in both the outcomes and key inputs into the project production process, and they omit other inputs altogether. Together, these problems likely explain why the explanatory power of previous studies has been relatively limited. No reviewed study explains more than 25-30 percent of the variance in project outcomes, and most explain far less.

After systematically reviewing the extant literature, the rest of this paper provides a more complete analysis of the associations between inputs to project production and ultimate project success. We distinguish between project design and implementation characteristics, as well as the characteristics of those responsible for designing and implementing projects, and look at both. We add these new measures, which are significantly associated with project success. Even so, up to 70% of variation in project outcome scores remains unexplained. Therefore, in the concluding section we broaden the discussion to include conceptual and interpretive issues pertaining to the scope of project evaluations and ratings. We note that, since they are limited to individual projects, effectiveness ratings often fail to capture how development projects are incorporated into policy making or the cumulative impacts that projects may have.
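The following Python sketch shows one way the outcome measure described above might be assembled. It encodes the six-point IEG scale and takes the PPAR rating when one exists, otherwise the ICRR rating. It then applies the binary cut-off of 4 that the paper uses for a ‘satisfactory’ project. The intermediate labels follow IEG's standard scale wording. The helper names (final_rating, is_satisfactory, OUTCOME_SCALE) are hypothetical, and this is not the authors' code.

```python
from typing import Optional

# IEG six-point outcome scale (1 = worst, 6 = best).
OUTCOME_SCALE = {
    1: "Highly Unsatisfactory",
    2: "Unsatisfactory",
    3: "Moderately Unsatisfactory",
    4: "Moderately Satisfactory",
    5: "Satisfactory",
    6: "Highly Satisfactory",
}

SATISFACTORY_CUTOFF = 4  # ratings of 4 or more count as 'satisfactory'


def final_rating(icrr: int, ppar: Optional[int] = None) -> int:
    """Hypothetical helper: prefer the PPAR rating when a field evaluation exists, else use the ICRR."""
    rating = ppar if ppar is not None else icrr
    if rating not in OUTCOME_SCALE:
        raise ValueError(f"rating {rating} is outside the six-point scale")
    return rating


def is_satisfactory(rating: int) -> bool:
    """Binary success indicator: True for Moderately Satisfactory (4) or better."""
    return rating >= SATISFACTORY_CUTOFF


# Usage: a project rated 5 in the ICRR but 3 in a later PPAR
r = final_rating(icrr=5, ppar=3)
print(r, OUTCOME_SCALE[r], is_satisfactory(r))  # 3 Moderately Unsatisfactory False
```

Passing the PPAR rating as an optional argument keeps the precedence rule in one place. Any project with a field evaluation is scored on it, and all others fall back to the desk review.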
We close by sketching out a research agenda that relates more closely to the theorized contribution of projects to development outcomes, and we discuss its implications for the analysis conducted in this paper and the literature we survey.

[3] The IEG faces different incentives than the rest of the World Bank, reporting directly to the institution’s Board rather than its President; this independence is at the core of its credibility. IEG considers three criteria in its evaluations of projects: relevance, efficacy, and efficiency. Relevance is assessed as the degree to which project objectives were aligned with the country strategy, country capacity and previous World Bank experience in the sector. Efficacy refers to the extent to which the project’s objectives were achieved, or are expected to be achieved. It takes into account their relative importance and the extent to which they are attributable to the activities or actions supported by the project. Efficiency is a measure of how economically project resources were converted into results and whether the project was designed and implemented at lowest cost.

[4] Typically the managing World Bank unit requires the self-evaluation end-of-project report (the ICR) to be started by the borrower and drafted independently of the staff tasked with project design and supervision. Taking a random sample of 100 projects in our data set, we confirm that it is uncommon for the supervision TTL to be part of the team drafting the ICR (25% of cases, with only 19% as the main author). Further, draft ICRs are subject to a peer-review process within the Bank before finalization.

2. What do we know about project effectiveness? A review of the literature

A growing body of literature examines the factors associated with aid effectiveness at the project level. It tries to understand both the characteristics of successful projects and the contexts in which those projects are more likely to succeed.
We conducted a systematic review of the published and grey literature on the topic, using a two-step approach. First, we searched four meta-databases (JSTOR, SSRN, RePEc, and Google Scholar) for articles posted between 2000 and 2021 containing the keywords ("project quality" AND "development") OR ("project success" AND "development"). Results were screened according to two inclusion criteria:
(a) the paper analyzes projects funded by multilateral financial institutions or bilateral development cooperation agencies; and
(b) the paper employs quantitative empirical analysis with projects as the unit of analysis.
Second, additional relevant articles were identified through the reference lists of the included articles. The full list of reviewed studies is available in Annex 1.

The outcome variable in most of the studies reviewed is the ex-post success rating produced by donor staff and evaluation experts. This rating is intended to capture a project's relevance, effectiveness, efficiency, sustainability, and impact, and maps onto a broader OECD Development Assistance Committee standard (OECD, 2019). The literature on donor effectiveness uses these ratings extensively, considering them a noisy but valid measure of project performance (Briggs, 2020; Denizer et al., 2013; Dreher et al., 2013; Geli et al., 2014). Other outcome variables documented at the project level are the economic rate of return (Isham et al., 1997; Mubila et al., 2000) and the disbursement profile (Álvarez et al., 2012; Kilby, 2013; Kersting and Kilby, 2016). We discuss the identified papers’ findings by the categories of explanatory factors investigated, which fall under three clusters: (i) project characteristics, (ii) staff/management characteristics, and (iii) country characteristics.

2.1. Project characteristics

To the extent that previous contributions have examined the importance of project design, they have been concerned mainly with easily observable characteristics like size, duration and sector. Project size is typically proxied by the total financial amount pledged or spent. Evidence on whether project size is indicative of project performance is mixed. Some studies find a positive association (Shin et al., 2017; Winters, 2019; Wood et al., 2020), others find none (Geli et al., 2014; Caselli and Presbitero, 2021), and one finds a negative association (Denizer et al., 2013). However, disbursements falling short of commitment amounts are associated with worse outcomes, suggesting that poorly performing projects may be cut short (Feeny and Vuong, 2017). Regarding project length, projects that take longer to implement are less likely to be successful. This holds regardless of whether the completion time meets a planned deadline (Geli et al., 2014) or exceeds it (Bulman et al., 2017). This finding generalizes across most donors (Briggs, 2020). When project length is divided into the durations of the various stages of the project cycle, these durations are differentially associated with project success. Longer preparation time is positively correlated with success (Chauvet et al., 2013; Kilby, 2015), while delays between approval and the start of implementation are negatively correlated with success (Denizer et al., 2013).

Other dimensions of project design examined in the literature include sectors, funding sources, and implementation counterparts. Winters (2019) finds that World Bank projects with co-financing from foreign and/or domestic sources receive somewhat worse ratings. However, this effect explains only a small part of within-country variation in project effectiveness. Shin et al. (2017) find that World Bank projects implemented by non-governmental actors have better evaluation scores. They acknowledge, however, that only 29 of the 647 projects they study fall into this category.
While the quality of project design may be difficult to define and observe, there have been a few attempts to investigate the role of design quality in project success. Here, design quality means the extent to which project objectives are relevant and achievable. Higher scores on the quality of the economic analysis in project appraisal documents are associated with better ratings during project implementation (Vawda et al., 2003; Corral and McCarthy, 2020). Projects with weak or unclear results frameworks, meanwhile, are more likely to be downgraded (Blanc et al., 2016). Studies using IEG's quality-at-entry ratings of World Bank projects produce similar findings (Chauvet et al., 2010; Smets et al., 2013; Wane, 2004). Perceptions of World Bank managers confirm project design as a critical success factor (Ika et al., 2012).[5] Moreover, early studies indicate that a strong analytical underpinning is positively associated with the quality of World Bank projects (Deininger et al., 1998; Wane, 2004). These studies proxy analytical underpinning by labor inputs from designated analytic staff.

2.2. Staff and management characteristics

Several studies have investigated the role of project supervision in subsequent project performance. For example, Kilby (2000) finds that the number of weeks dedicated to project supervision has a positive and significant association with project performance. Relatedly, Kilby (2001) concludes that the benefit of project supervision is strongest in the early stages of implementation and diminishes as the project progresses. Nevertheless, there is some evidence that poorly designed projects are difficult to turn around and that supervision cannot wholly compensate for poor quality at entry (Corral and McCarthy, 2020; Limodio, 2011). Others have investigated the association of supervisory staff characteristics with project success, including attempts to proxy for supervision inputs on the quality margin as well as the quantity margin. Denizer et al.
(2013) proxy for good supervisory leadership with a measure of supervisory quality. They define it as the weighted average success rate of all projects that the TTL has ever worked on, with weights proportional to how much time the TTL spent on each project. They estimate that a one standard deviation increase in this TTL track record variable is associated with a 6% increase in the chance of project success. Moll et al. (2015) extend the work on supervisor quality to include a quantity measure of experience as well as the average success of past projects. They find a positive and significant relationship between project success and the TTL quality variable, as found before. They also find that the number of projects the TTL has taken to the Board is not a reliable indicator of the likelihood of project success.

Little research has addressed the role of staff in other key phases of the project cycle, such as project design, or of staff who provide overall management at the country or sector level. Two notable exceptions, using World Bank project and personnel data, are Honig (2020) and Limodio (2021). Since these studies rely on publicly available and web-scraped data, their data sets differ from ours in size and quality. Limodio (2021) analyzes a selected sample of 715 TTLs who post their CVs on public websites, whereas we rely on Human Resources records of the over 2,700 distinct TTLs in our sample.

[5] IEG project evaluations include a ‘quality at entry’ rating, which captures the ‘extent to which the World Bank identified, facilitated preparation of, and appraised the project so that it was most likely to achieve planned development outcomes and was consistent with the World Bank’s fiduciary role.’ The interpretive challenge with this ex-post measure is that it is generated simultaneously with the overall project rating and is therefore highly correlated with it.
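The track-record measure of Denizer et al. (2013) is a time-weighted average of the success outcomes of a TTL's projects. A minimal Python sketch, under stated assumptions, appears below. This paper elsewhere describes that measure as using the TTL's other observed projects, so the sketch excludes the project being predicted. The dictionary layout, the project identifiers, the 0/1 success coding, and weeks as the time unit are all illustrative assumptions.

```python
from typing import Dict, Tuple

# Hypothetical input: project id -> (success flag 0/1, weeks the TTL spent on it)
History = Dict[str, Tuple[int, float]]


def ttl_track_record(history: History, exclude: str) -> float:
    """Time-weighted success rate over the TTL's projects other than `exclude`."""
    others = [(success, weeks) for pid, (success, weeks) in history.items() if pid != exclude]
    total_weeks = sum(weeks for _, weeks in others)
    if total_weeks <= 0:
        raise ValueError("no time recorded on the TTL's other projects")
    # Each project's outcome is weighted by its share of the TTL's total time.
    return sum(success * weeks for success, weeks in others) / total_weeks


history = {"P1": (1, 30.0), "P2": (0, 10.0), "P3": (1, 20.0), "P4": (0, 15.0)}
print(round(ttl_track_record(history, exclude="P4"), 3))  # 0.833
```

Dividing by total time makes the weights sum to one, so the measure stays on the 0-1 success scale however many projects a TTL has handled.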
In addition, Limodio (2021) considers preparation TTLs exclusively, whereas we broaden the set of staff considered to include supervision TTLs, Practice Managers, and Country Directors. Limodio (2021) finds that good preparation TTLs move between countries less often and are in charge of more and larger projects. He finds no correlation between TTL performance and either gender or years of experience, but the number of internal promotions correlates positively with performance. Moreover, Limodio (2021) documents negative assortative matching between high-performing TTLs and low-performing countries. Honig (2020) takes a different approach, focusing on the relationship between staff presence in recipient countries and project performance rather than on individual characteristics. He finds that TTL presence in country offices during preparation and implementation has no significant effect on project success. However, Honig (2020) finds that the local presence of a Country Director is associated with greater project success. This association is strongest in fragile states and since the "Strategic Compact" initiative increased the power of Country Directors.

2.3. Country-level characteristics

Alongside micro-correlates of project success, various studies have looked at macro-correlates such as financial and growth indicators and, especially, institutional quality. Isham et al. (1997), for instance, report that recipient-country civil liberties indicators are positively related to projects' economic rate of return and likelihood of a successful rating. Several studies support the view that strong public sector institutions matter for project success. They document the effects of the government effectiveness index (Buntaine and Parks, 2013) and state capacity measures (Hanson and Sigman, 2019) on project outcomes. In a related vein, Denizer et al. (2013) and Geli et al.
(2014) find that institutional quality, measured by the Country Policy and Institutional Assessment (CPIA) score, has a significant and positive correlation with project success.[6] There are contradictory findings elsewhere, however. Feeny and Vuong (2017) use Freedom House scores for civil liberties and political rights. They find that Asian Development Bank projects appear less likely to succeed in more democratic countries. And Blum (2014) finds no clear evidence that projects, at least in public sector management, are more likely to succeed in countries with higher administrative capacity. He measures capacity with CPIA scores and International Country Risk Guide (ICRG) Bureaucratic Quality ratings. Blum (2014) offers an explanation for this seemingly counterintuitive result, arguing that TTLs may adjust the ambitiousness of their project objectives to the level of administrative capacity in the respective country. Therefore, he argues, the likelihood of reaching these objectives is relatively constant across countries and institutional environments.

The idea that economic growth and stability are conducive to project success is intuitively appealing and supported by evidence. Higher GDP growth rates are associated with more successful projects (Denizer et al., 2013; Briggs, 2020; Bulman et al., 2017; Feeny and Vuong, 2017; Presbitero, 2016). Economically unstable environments, by contrast, are associated with worse project outcomes. Examples include high export volatility (Guillaumont and Laajaj, 2006), inflation (Deininger et al., 1998) and GDP volatility (Caselli and Presbitero, 2021). On the other hand, Presbitero (2016) finds that World Bank projects implemented during periods of public investment booms are less likely to succeed, which is suggestive evidence of absorptive capacity constraints in recipient countries.
[6] These scores comprise a set of criteria grouped into four clusters: economic management, structural policies, policies for social inclusion and equity, and public sector management and institutions.

Adding to the findings on the effects of economic instability, Chauvet et al. (2010) explore the extent to which post-civil war situations shape the outcomes of World Bank projects. Unsurprisingly, they find that projects that start in a post-war period have lower chances of success than projects implemented in countries at peace. Caselli and Presbitero (2021) expand on these results using a larger data set of aid projects funded by seven donors over 1980-2012. They document that a project implemented in a fragile state is eight percentage points less likely to succeed than a similar project in a non-fragile developing country.

There is some suggestive evidence that geopolitical context influences disbursement patterns, project design, and evaluation at the World Bank. Projects in geopolitically important countries have less stringent conditions (Clark and Dolan, 2021), shorter preparation periods (Kilby, 2013), and are disbursed faster (Kilby, 2013; Kersting and Kilby, 2016). Dreher et al. (2013) find no significant average effect of political influence on the likelihood of a successful project evaluation. They do find some evidence that politics affects project performance in exceptional cases of high political stakes and challenging economic circumstances.

Several papers have assessed the relative importance of project- and country-level factors in explaining variation in project success. Denizer et al. (2013) report that 80% of the variation in World Bank project outcomes occurs within countries. This is similar to the 75%-90% obtained by Bulman et al. (2017) for both the World Bank and the Asian Development Bank.
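A one-way sum-of-squares decomposition can illustrate the kind of within-country share reported in these studies. Divide the sum of squared deviations of each project's rating from its country mean by the total sum of squared deviations from the grand mean. This equals one minus the R-squared of a regression of outcomes on country dummies. The Python sketch below uses invented toy ratings. It is one standard way to compute such a share, not necessarily the exact procedure of Denizer et al. (2013) or Bulman et al. (2017).

```python
from collections import defaultdict
from statistics import fmean
from typing import List, Tuple


def within_country_share(ratings: List[Tuple[str, float]]) -> float:
    """Share of total outcome variation that lies within countries.

    ratings: (country, outcome rating) pairs, one per project.
    """
    grand_mean = fmean(r for _, r in ratings)
    by_country = defaultdict(list)
    for country, r in ratings:
        by_country[country].append(r)
    country_mean = {c: fmean(rs) for c, rs in by_country.items()}

    total_ss = sum((r - grand_mean) ** 2 for _, r in ratings)
    if total_ss == 0:
        raise ValueError("outcomes do not vary")
    within_ss = sum((r - country_mean[c]) ** 2 for c, r in ratings)
    return within_ss / total_ss  # the between-country share is 1 minus this


# Toy data (invented): two countries, two projects each
toy = [("A", 4), ("A", 6), ("B", 3), ("B", 5)]
print(within_country_share(toy))  # 0.8
```

In the toy data the two country means differ by only one point, while projects within each country differ by two points. Most of the variation is therefore within countries, which mirrors the pattern these studies report.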
Similar analyses of projects funded by the Australian aid agency (Wood et al., 2020) and by seven diverse donors (Briggs, 2020) also find that development project success varies more within countries than across countries.

3. Data and methods: What is our empirical approach?

We combine data on World Bank project outcomes from the Bank’s Independent Evaluation Group (IEG) with a new data set derived from project preparation documents and internal staff records. We use the combined data to assess the contribution of project design and management, as well as other relevant factors, to development effectiveness. Specifically, we ask to what extent observed variations in project quality are associated with: (i) factors related to the selection and retention of key staff and their managers; and (ii) other project design and supervision factors that are within the World Bank’s control. We draw on hitherto unexploited administrative data to assess how other, previously unobserved aspects of the project production function relate to outcome ratings. In particular, we examine the contributions of three sets of staff:
- the TTLs who either prepared or supervised the project;
- the Practice Manager[7] (PM) who oversaw project development; and
- the Country Director (CD) responsible for the country program within which the project was set.
TTLs design and supervise Bank projects in collaboration with the client government. PMs make decisions on staffing and clear all aspects of project preparation and subsequent implementation. CDs shape the overall dialogue with the country and chair the meetings that finalize project design. For each staff member in our data, we then use internal staff records to gather key characteristics. We also assess the influence of aspects of project design that the literature has identified as relevant, such as complexity and the clarity of project results frameworks.
7 This role has had different titles over time at the World Bank; for this study we use the current title of Practice Manager and treat the previous titles of Sector Manager and Division Chief as equivalent.
We use institutional records to identify the preparation TTLs for 3,355 projects, PMs for 3,154 projects, and CDs at the time of approval for 4,342 projects. The identity of supervision TTLs is already captured in routine reporting systems, whereas the identities of preparation TTLs, PMs, and CDs are scraped from internal records; this process is described in more detail in Annex 2. From HR records we also observe each TTL’s age, education, previous Bank experience, and location (headquarters or a field office). Our measure of staff contribution is a ‘value-added’ measure, defined as the difference between the realized performance of the individual’s previous projects and their predicted performance based on national and fixed project characteristics. 8 While we have collected data on projects evaluated by the World Bank’s IEG from the late 1990s through the end of Fiscal Year 2017 (i.e., June 2017), we focus the analysis on two sub-samples. The main data set comprises 2,845 projects approved between Fiscal Years 1995 and 2009. 9 This date range was chosen for two reasons: (a) starting with FY1995 projects minimizes missing values in the novel project input measures, which are more prevalent among projects approved before 1995, and (b) ending with FY2009 approvals minimizes data censoring, as many projects approved after 2009 had yet to close and be rated. A second contribution of this paper is to test the influence of various aspects of project design and implementation on project outcomes. We do so in part with a stratified random sample of 120 projects from the full data set.
This smaller data set, which for convenience we term the qualitative data set, supplements the existing measures with qualitative assessments of project characteristics extracted from primary documents such as Project Appraisal Documents (PADs), Implementation Completion and Results Reports (ICRs), and IEG ICR Reviews (ICRRs). The qualitative variables include an expert panel assessment of the quality of Project Development Objectives (PDOs) and results frameworks. They also include data on restructurings and a set of contingent indicators, such as the number of PDO changes and closing date extensions for projects that underwent restructuring, i.e., factors that are countable but not counted in existing administrative data. We complement this with a review of the ICRRs of outlier projects, defined either as those with highly satisfactory outcome ratings in low-capacity environments (‘positive outliers’) or as those with highly unsatisfactory ratings in high-capacity environments (‘negative outliers’).
The project success ratings themselves fall on a six-point scale ranging from 1 (highly unsatisfactory) to 6 (highly satisfactory). In binary terms, a project is classified as satisfactory if it is rated 4 (moderately satisfactory) or higher; 72% of projects in our data meet this threshold. IEG tends to downgrade the ratings assigned by project management and, in the case of Project Performance Assessment Reports (PPARs), those of its own reviewers. Of the 1,968 projects for which we have final ICR and ICRR ratings, 628 were downgraded from the ICR rating and 54 upgraded. Of the 563 projects for which we have PPARs, 159 were downgraded and 54 upgraded. Because most downgrades are from satisfactory to moderately satisfactory and from moderately satisfactory to moderately unsatisfactory, they produce more bunching in the middle of the distribution than would otherwise be the case.
8 This measure is distinct from the staff quality measure in Denizer et al. (2013), who adopt the average outcome of all other observed projects of the TTL as their quality measure. Our quality measure is somewhat more closely related to the one in Limodio (2021), although ours uses additional project-level information in the value-added calculation; it is discussed further in the next section.
9 This set of projects excludes 29 regional (multi-country) projects because of the inferential challenges of working with operations in multiple countries. Table A2.1 in Annex 2 lists the total number of projects by fiscal year along with the number of projects included in the main analytic data set.
Table 2 reports summary measures of core project characteristics in our two analytic samples. Two estimates of each measure are reported: the project-weighted average and the dollar-weighted average (weighted by pledged loan volume). The table first shows the distribution of IEG project outcome scores. In the analytic data set there is clear bunching of outcomes at the ‘satisfactory’ and ‘moderately satisfactory’ ratings, while only 3% of projects are rated highly satisfactory. Similarly, only 1% of projects were rated highly unsatisfactory, and of the remaining 26% of projects deemed unsuccessful, 15% are rated ‘moderately unsatisfactory’. That larger projects are slightly more likely to be rated successful is reflected in the fact that 76% of pledged dollars went to ultimately successful projects, compared with 72% of projects. While World Bank investment projects are most numerous in Sub-Saharan Africa, by dollar volume they are concentrated in the East Asia and Pacific region, followed by Latin America and the Caribbean. The Sustainable Development (SD) sector dominates by both project count and dollar volume, but almost half of projects are in the Equitable Growth, Finance, and Institutions (EFI) and Human Development (HD) sectors. 10 Slightly more than half of investment projects are financed through IDA, while 44% fall within the less concessional IBRD vehicle. By dollar volume, however, 64% of lending over the period studied was provided through IBRD. 11 The average real project size at approval is 89.6 million USD (in constant 2015 USD), while the slightly lower average total commitment of 87.3 million USD reflects the amount actually transferred, including both partial cancellations and any increases in loan size during project restructuring. The mean intended project length at approval is five years; however, extensions over the life of a project are common, as the actual mean length of completed projects is 6.8 years. We include two country-level measures, reported both at the beginning of projects and as the change over the project lifetime. The World Bank’s Country Policy and Institutional Assessment (CPIA) rates countries’ policy and institutional frameworks for their conduciveness to poverty reduction, sustainable growth, and the effective use of development assistance. The rating is based on 16 criteria grouped into four clusters: economic management, structural policies, policies for social inclusion and equity, and public sector management and institutions. For the countries in which these projects are located, the average CPIA score is 3.5, on a scale from 1 (weak country performance) to 6 (strong performance). The average change in score over the life of the projects is positive, indicating an improving institutional environment over the implementation period. Average log GDP per capita at approval corresponds to 5,625 USD in PPP terms, and the average annual economic growth rate over the life of the project was 3.8%.
10 The sectors considered in this paper correspond to the World Bank’s three practice groups as of FY2018, within which the global practices (GPs) are nested. EFI includes Finance and Markets; Governance; Macroeconomics and Fiscal Management; and Poverty & Equity. HD includes Education; Health, Nutrition & Population; and Social Protection and Labor. Lastly, SD comprises Agriculture and Rural Development; Energy and Extractives; Environment and Natural Resources; Social, Urban, Rural and Resilience; Transport and ICT; and Water.
11 The International Development Association (IDA) and the International Bank for Reconstruction and Development (IBRD) are the two lending arms of the World Bank responsible for public sector projects. IDA and IBRD share the same staff and headquarters, and a country’s eligibility for each funding source is determined by its income level. Eligibility for concessional IDA support is updated annually; for 2022 the threshold is a GNI per capita below US$1,205. Countries above this cut-off are generally eligible for IBRD financing.
In the qualitative data set, drawn as a stratified random sample from the larger group of projects, the mean project characteristics are virtually the same as in the full set, even though the rate of project success is only 50%. This is because the sample was stratified to include roughly equal numbers of low- and high-rated projects.
The analytic approach used to identify significant predictors of project performance is a straightforward regression framework:
$$Y_{ict} = \beta_0 + \beta_1 X_{ict} + \beta_2 Z_{ict} + f_E + \varepsilon_{ict}$$
where $Y_{ict}$ is the rating of project $i$ approved in year $t$ in country $c$, $X_{ict}$ is a vector of predictors that varies with the particular analysis, $Z_{ict}$ is a vector of standard project covariates included in all specifications, $f_E$ is a vector of evaluator fixed effects to control for possible rater bias, and $\varepsilon_{ict}$ is the error term.
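A minimal NumPy sketch of this estimating equation, on simulated data, may help make the specification concrete. Evaluator fixed effects enter as dummy columns (one evaluator dropped as the reference category), and standard errors are clustered by country, as the paper does for its main results. The data-generating values, the function names, and the CR0 cluster-robust variance formula (without the small-sample correction that statistical packages usually apply) are assumptions of this sketch, not the authors’ code; the simulated outcome is continuous, whereas the actual IEG rating is ordinal.

```python
import numpy as np

def dummies(codes):
    """One-hot encode category codes, dropping the first category as the reference."""
    codes = np.asarray(codes)
    levels = np.unique(codes)
    return (codes[:, None] == levels[None, 1:]).astype(float)

def ols_cluster_se(y, X, clusters):
    """OLS coefficients with CR0 cluster-robust standard errors."""
    XtX_inv = np.linalg.inv(X.T @ X)
    beta = XtX_inv @ (X.T @ y)
    resid = y - X @ beta
    meat = np.zeros((X.shape[1], X.shape[1]))
    for g in np.unique(clusters):
        in_g = clusters == g
        score = X[in_g].T @ resid[in_g]      # summed score contribution of cluster g
        meat += np.outer(score, score)
    vcov = XtX_inv @ meat @ XtX_inv          # sandwich estimator
    return beta, np.sqrt(np.diag(vcov))

# Hypothetical simulated data: 600 projects, 40 countries, 15 evaluators.
rng = np.random.default_rng(0)
n = 600
country = rng.integers(0, 40, n)
evaluator = rng.integers(0, 15, n)
x = rng.normal(size=n)                       # predictor of interest (X)
z = rng.normal(size=n)                       # standard covariate (Z)
rater_bias = rng.normal(scale=0.3, size=15)[evaluator]
country_shock = rng.normal(scale=0.3, size=40)[country]   # within-country correlation
y = 4.0 + 0.10 * x + 0.20 * z + rater_bias + country_shock + rng.normal(scale=0.8, size=n)

design = np.column_stack([np.ones(n), x, z, dummies(evaluator)])
beta, se = ols_cluster_se(y, design, country)
print(f"coefficient on X: {beta[1]:.3f} (country-clustered SE {se[1]:.3f})")
```

Clustering by country allows residuals of projects in the same country to be correlated, which is why the sandwich "meat" sums scores within each country before taking outer products.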
The standard set of project covariates (Z) includes world region, project sector, whether the loan is a standard (IBRD) loan or a concessional (IDA) loan extended to low-income countries, the original intended size of the project (in constant 2015 dollars), the original intended project length, the country’s CPIA score in the year of approval, and the change in the CPIA score over the project’s lifetime. The main outcome, the IEG rating, is measured on an ordinal six-point scale, and the main analytic results (Tables 4 to 9) are estimated by OLS with standard errors clustered at the country level. The results are robust to alternative specification choices, such as an ordered probit, a probit with a dichotomized success/failure outcome, and weighting observations by the constant-dollar value of the intended loan volume, as well as to the missingness of key explanatory variables (most notably the ‘value added’ of the Practice Manager, which has a high rate of missingness). Annex 4 presents all alternative specifications for each of the analytic tables discussed in the next section, as well as all main results stratified by IDA or IBRD project status.
4. What do we find?
We begin by investigating the relative importance of project and country characteristics in explaining the observed variation in project ratings. Table 3 presents several measures related to the decomposition of variance in the IEG rating, namely the mean sum of squares, the F-statistic, and the adjusted R-square.
The project characteristics investigated include country and fiscal year indicators, project sector, and various accountings of key project staff identity: the TTL at preparation responsible for project design, the supervisory TTL at the midpoint of the project, the supervisory TTL at project close, the practice manager and the country director at the start of the project, and the staff responsible for project evaluation after close. As judged by the F-statistic, all characteristics investigated except project sector have some predictive power for ultimate project success. The country of operation itself is relatively influential, with an adjusted R-square of 0.07. Interacting country with fiscal year raises the R-square to 0.09. In contrast, controlling for the identity of key project staff, such as the TTL at preparation or the supervisory TTLs at midpoint or endpoint, raises the adjusted R-square to 0.13-0.14. The most variation is explained when we combine measures of TTL identity at the start and close of the project, with an adjusted R-square of 0.29. The mean sum of squares statistic also suggests that preparatory and supervisory staff measures absorb the greatest degree of variation. More senior management, such as the country director or practice manager, also absorbs some project quality variation, but to a substantially lesser extent, with adjusted R-squares of 0.04-0.06. This is approximately the same predictive influence as that of the project evaluator. While only a starting point for our analysis, these results reflect some of the themes found in the earlier literature. As in Denizer et al. (2013), country setting is an important predictor of project success, but supervisory staff are even more consequential.
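As a rough illustration of how adjusted R-squares like those in Table 3 can be computed, the sketch below regresses a simulated rating on a complete set of identity indicators (country or TTL). With a full set of group dummies, the OLS fitted values are simply the group means, so no matrix inversion is needed. The simulated variance components and the function name are hypothetical, chosen only so that staff identity matters more than country, mirroring the pattern in the text. The adjustment penalizes the many staff indicators (hundreds of TTLs versus far fewer countries), which is why the adjusted rather than the raw R-square is the relevant comparison.

```python
import numpy as np

def adjusted_r2_on_groups(y, groups):
    """Adjusted R-square from regressing y on a full set of group indicators."""
    y = np.asarray(y, dtype=float)
    labels, codes = np.unique(groups, return_inverse=True)
    # With a complete set of group dummies, OLS fitted values are the group means.
    group_means = np.bincount(codes, weights=y) / np.bincount(codes)
    fitted = group_means[codes]
    n, k = len(y), len(labels)               # k estimated parameters (one per group)
    ss_res = np.sum((y - fitted) ** 2)
    ss_tot = np.sum((y - y.mean()) ** 2)
    return 1 - (ss_res / (n - k)) / (ss_tot / (n - 1))

# Hypothetical simulation: 3,000 projects, 100 countries, 600 TTLs.
rng = np.random.default_rng(1)
n = 3000
country = rng.integers(0, 100, n)
ttl = rng.integers(0, 600, n)
rating = (0.25 * rng.normal(size=100)[country]   # country component
          + 0.50 * rng.normal(size=600)[ttl]     # staff component
          + rng.normal(size=n))                  # unexplained noise

print("country adj. R2:", round(adjusted_r2_on_groups(rating, country), 3))
print("TTL adj. R2:    ", round(adjusted_r2_on_groups(rating, ttl), 3))
```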
For the first time, we are able to attribute variation in project scores to the TTL at the design stage, and we find that this role has only slightly less explanatory power than the supervision TTLs. Senior management positions also contribute some explanatory power, but substantially less than project-level staff. We also note that much of the observed variation in project outcomes remains unexplained. A fuller specification that includes all the covariates featured in Table 3 explains almost 30% of the total variation in project outcomes, roughly equal to the variation explained by the Bank’s internal problem-project monitoring system when the final rating is regressed on the problem-project indicators accrued over the life of the project.
Country-level characteristics
At the country level, we investigate numerous potential predictors of project success. We consistently find that the most influential predictor is a country’s institutional quality, as proxied by the CPIA rating. While various measures, such as the GDP growth rate, predict project ratings in a univariate framework, most of the national policy and macroeconomic measures considered show no association with project quality once they are included in a multivariate framework with the CPIA score. Other national measures investigated include log GDP per capita, GDP growth over the project life, log population, Freedom House scores of political rights and civil liberties, assessed political risks such as government stability or the impartiality of the legal system, and assessed financial risks such as foreign debt service as a percentage of exports. Table 4 summarizes the national-level analysis. A one-point increase in the CPIA score in the year of project approval is associated with a 0.16-point increase in the ultimate IEG rating.
Perhaps even more important for project success is the change in the CPIA score over the life of the project: a project is significantly more likely to be rated a success if the country’s institutional environment improves over time. A one-point increase in the CPIA score over the project’s life is associated with a 0.32-point higher IEG rating. Also of interest is the apparent negative covariation between CPIA levels and CPIA changes: when the project success rating is regressed on both terms (column 3 of Table 4), both coefficients roughly double in magnitude. To underscore the predictive importance of the CPIA score, the final three columns of Table 4 show how a factor that is influential in a univariate setting, GDP growth over the project lifetime, diminishes in importance and loses precision once the CPIA score is also included in the analysis. The CPIA’s predictive importance also matters for the next sub-section, since the value-added estimates of individual staff control for the CPIA score of the country of operation.
We have no means of controlling for within-country variation in institutional capacity across sectors or regions, although such variation is likely significant. Of the 21 negative outliers in our sample (projects in countries with an above-mean CPIA rating whose outcomes were rated highly unsatisfactory and for which ICRRs are available), nine were loans to sub-national entities, either regions or municipalities, that had little or no prior experience of working with World Bank lending projects. Some of these were also multi-sector projects with more than one implementing agency.
Staffing inputs to the project production process
We turn now to the role of key staff in the production of project quality. Prior work has consistently found the supervisory TTL to be important in determining outcomes. Both Denizer et al. (2013) and Geli et al. (2014) find that the record of the supervisory TTL is, along with the CPIA, one of the two best predictors of quality. Similar results were obtained by Moll et al. (2015) for World Bank development policy financing, 12 and by Álvarez et al. (2012) for Inter-American Development Bank projects. Earlier work by Kilby (2000) finds that the number of weeks dedicated to project supervision has a positive and significant impact on project performance. We build on these earlier studies by extending our consideration of staff inputs to preparation TTLs, sector managers, and country directors. The key metric we construct for staff inputs is a ‘value added’ measure that attempts to capture the contribution of the TTL to project quality by examining how all other projects developed or supervised by the TTL performed relative to expectations. Expected project performance is estimated by a regression model that predicts project performance from standard characteristics such as country CPIA level and change, national population, per capita GDP, project sector, and pledged project size, as well as country fixed effects. The staff ‘value added’ measure, which we term staff predicted performance, is then calculated as the average deviation of the actual project rating from the predicted rating across all other projects the TTL has either designed or supervised. 13 Once normalized, this approach generates a distribution of predicted performance across staff with a mean of zero and a fair degree of dispersion. This approach is in line with the teacher value-added framework (Hanushek, 1971; Chetty et al., 2014) and has also been used to study the effects of managers in firms (Bertrand and Schoar, 2003). Annex 3 discusses this ‘value added’ framework in more detail and presents the ‘value added’ regressions and the distributions of predicted performance for preparation TTLs, supervisory TTLs, practice managers, and country directors.
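The leave-one-out logic of the staff ‘value added’ measure can be sketched in a few lines of NumPy. The sketch assumes that predicted ratings from the first-stage regression are already available; it computes, for each project, the mean residual (actual minus predicted rating) over the other projects handled by the same staff member, and returns a missing value when that staff member has no other projects. The function names, the toy numbers, and standardizing across projects (rather than across staff) are simplifying assumptions, not the authors’ implementation; per footnote 13, such a calculation would be run separately for design and supervision roles.

```python
import numpy as np

def leave_one_out_value_added(actual, predicted, staff):
    """Mean residual (actual - predicted) over the *other* projects of the same
    staff member; NaN when the staff member has no other projects."""
    resid = np.asarray(actual, dtype=float) - np.asarray(predicted, dtype=float)
    _, codes = np.unique(staff, return_inverse=True)
    resid_sum = np.bincount(codes, weights=resid)   # total residual per staff member
    n_projects = np.bincount(codes)                 # projects per staff member
    n_other = n_projects[codes] - 1
    with np.errstate(divide="ignore", invalid="ignore"):
        loo_mean = (resid_sum[codes] - resid) / n_other
    return np.where(n_other > 0, loo_mean, np.nan)

def standardize(values):
    """Rescale to mean zero and unit standard deviation, ignoring missing values."""
    return (values - np.nanmean(values)) / np.nanstd(values)

# Toy example (hypothetical): staff "A" handled three projects, "B" and "C" one each.
actual    = [5, 4, 3, 6, 2]
predicted = [4, 4, 4, 5, 3]
staff     = ["A", "A", "A", "B", "C"]
va = leave_one_out_value_added(actual, predicted, staff)
# A's projects get -0.5, 0.0 and 0.5; B and C get NaN (no other projects).
print(va)
print(standardize(va))
```

Excluding the project itself keeps its own rating from mechanically entering the staff measure used to predict it.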
Table 5 explores the relationship between the predicted performance of staff and project success. Our first finding is that TTL ‘quality’, as proxied by predicted performance, is indeed associated with project success. A one standard deviation increase in the predicted performance of the preparation TTL is associated with a 0.09-point increase in the project rating, controlling for the standard project and country characteristics. By comparison, a one standard deviation change in the CPIA score at the year of project approval is associated with a 0.16 change in outcomes, so the preparation TTL association is a little more than half as strong (0.09/0.16 ≈ 0.56). However, the association is substantially greater among IBRD borrowers (Annex 4: Table A4.6e). Indeed, for projects in IBRD countries, the preparation TTL coefficient is 0.15, of the same order of magnitude as that of the CPIA score at approval. We will see that the differential influence of staff predicted performance by country income status is a common thread in the findings on the importance of staffing, suggesting that the influence of ‘soft’ inputs may be greater in the higher-capacity environments of higher-income countries.
12 The World Bank offers different financing instruments to client countries. The focus of Moll et al. (2015) is development policy financing, which provides budget support to governments, while this paper investigates investment project financing, which provides financing to governments for activities that create the physical and social infrastructure necessary to reduce poverty and create sustainable development.
13 Where applicable, the same individual has separate predicted performance measures for project design and project supervision activities.
Concerning supervisory activities, there is a strong association between the predicted performance of the supervision TTL and final quality ratings, and supervision quality is more strongly associated with project success than preparation quality when all our measures are included in the same model (column 4 of Table 5). 14 An increase of 1 SD in the supervision TTL value-added measure is associated with an increase of 0.13 points in the final project rating, roughly the same order of magnitude as a one-point increase in the CPIA score at the year of approval. However, this supervisory quality association is stronger in the second half of project implementation, perhaps suggesting that implementation problems take time to emerge and that the scope for correcting them is greater after the mid-term review. Indeed, when conditioning on supervision in both halves of the project life, only the predicted performance of staff in the second half predicts ultimate success. Similar to the preparation TTL results, the importance of supervisory staff quality is greater in IBRD than in IDA countries: while positive and fairly precisely estimated in IDA countries, the association is roughly twice as large for projects in IBRD countries (Annex Table A4.5e). With a 72% baseline rate of project success, these TTL estimates suggest a substantial association: raising the quality of both the preparation and the supervision TTL by 1 SD is linked to an 11 percentage point increase in the likelihood of project success. 15 Regarding the ‘value added’ of higher levels of management, we find no significant association between the predicted performance of either the CD or the PM and project quality in the full data set.
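Whether the insignificant CD and PM estimates are informative depends on the smallest association the regression could reliably detect. A common convention, assumed here because the text does not state which power level it uses, is a two-sided 5% test with 80% power, for which the minimum detectable effect is (z₀.₉₇₅ + z₀.₈₀) ≈ 2.8 times the coefficient’s standard error. The sketch uses only Python’s standard library; the function name and the standard errors are hypothetical.

```python
from statistics import NormalDist

def minimum_detectable_effect(se, alpha=0.05, power=0.80):
    """Smallest true coefficient detectable by a two-sided test of size alpha at the given power."""
    std_normal = NormalDist()
    z_alpha = std_normal.inv_cdf(1 - alpha / 2)   # about 1.96 for alpha = 0.05
    z_power = std_normal.inv_cdf(power)           # about 0.84 for 80% power
    return (z_alpha + z_power) * se

for se in (0.02, 0.05, 0.08):                     # hypothetical standard errors
    print(f"SE {se:.2f} -> MDE {minimum_detectable_effect(se):.3f}")
```

Because the minimum detectable effect scales linearly with the standard error, estimates that rest on relatively few distinct managers can fail to reach significance even when the point estimate is of a meaningful size.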
Given the relatively small numbers of CDs and PMs in the data compared with TTLs (there are 1,466 distinct preparation TTLs and 2,780 distinct supervisory TTLs, but only 273 unique PMs and 170 unique CDs), the lack of precision in the estimated associations of CD or PM quality with project success is perhaps expected. Indeed, the point estimate for PMs suggests an association of 0.075 rating points for a 1 SD increase in PM value added, while the regression is only powered to detect an association of 0.11 or greater at standard levels of significance. In fact, when projects are weighted by the dollar volume of commitments, the association of project success with PM quality is substantially greater (a coefficient of 0.15) and precisely estimated, indicating that management influence is perhaps more salient in larger projects. Interestingly, in the relatively small set of projects in upper-middle-income countries, the CD association emerges as positive and significant. Similar to the IBRD findings for preparation and supervision TTLs, the importance of staff predicted performance emerges in higher-income contexts, where project complexity and the policy environment may render staff ‘quality’ measures more meaningful.
Another aspect of staff management that has been found to predict project performance is staff turnover: the higher the rate of turnover, the lower the subsequent rating. We extend this investigation by examining three aspects of staff turnover: a change in TTL between project approval and the first supervisory period, the rate of staff turnover in the first half of the project (before the mid-term review), and the turnover rate in the second half of the project.
14 While the preparation TTL quality coefficient loses significance when other staff measures are included in the OLS specification, the same coefficient remains significant in the ordered probit, the probit, and OLS weighted by loan volume. This suggests (a) a relatively robust relation between project outcomes and preparation staff ‘value added’, and (b) greater predictive power of preparation TTL quality in larger projects.
15 This estimate is based on a probit with a dichotomized success/failure outcome, in which moderately satisfactory, satisfactory, and highly satisfactory ratings are considered a success (Annex Table A4.5b, column 8).
Contrary to concerns expressed within the institution about project handover at this sensitive moment, we find little relation between TTL turnover at the start of the project and subsequent project outcomes. However, we do find that the rate of turnover during project implementation is associated with poorer performance, especially turnover in the second half of the project. After the mid-term review, an increase in the ratio of TTLs to bi-annual progress reports from 0.25 to 0.50 is associated with a reduction of 0.15 points in the IEG rating, roughly the same magnitude as a one standard deviation increase in the predicted performance of supervisory staff. Turnover in the first half of the project is also detrimental, but less so, and the size of its coefficient declines once second-half turnover is included in the specification. This again indicates the importance of supervision in the second half of the project’s life for ultimate success. Similar to the findings above, the detrimental associations of high turnover are identified only in IBRD countries. Of course, the identified association between staff turnover and poor project quality is only correlational.
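The three turnover measures can be made concrete with a small sketch. It reads the “ratio of TTLs to bi-annual progress reports” as the number of distinct TTLs divided by the number of reports in each half of the project, split at the mid-term review; that operationalization, the function names, and the example project are assumptions for illustration. Under this reading, two TTLs across eight reports give a ratio of 0.25 and four TTLs across eight reports give 0.50, the change discussed in the text.

```python
def turnover_ratio(ttl_by_report):
    """Distinct TTLs per bi-annual progress report (hypothetical operationalization)."""
    if not ttl_by_report:
        return float("nan")
    return len(set(ttl_by_report)) / len(ttl_by_report)

def turnover_measures(preparation_ttl, ttl_by_report, mtr_index):
    """Three turnover measures, splitting the report sequence at the mid-term review."""
    first_half = ttl_by_report[:mtr_index]
    second_half = ttl_by_report[mtr_index:]
    return {
        "handover_at_start": preparation_ttl != ttl_by_report[0],
        "ratio_first_half": turnover_ratio(first_half),
        "ratio_second_half": turnover_ratio(second_half),
    }

# Hypothetical project: eight reports before and eight after the mid-term review.
reports = ["A"] * 4 + ["B"] * 4 + ["B", "B", "C", "C", "D", "D", "E", "E"]
measures = turnover_measures("A", reports, mtr_index=8)
print(measures)
```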
It is not clear whether causality runs from a lack of management continuity to poor performance or, perhaps, whether already poorly performing projects deter TTLs from remaining with them. Nevertheless, excessive staff turnover can be defined and monitored as a potential warning indicator that justifies further management review of project challenges.
Turning to specific staff characteristics, earlier work has found that a TTL’s previous experience does not affect project outcomes (Denizer et al., 2013). In contrast, we find in Table 7 that previous experience does have some predictive power for project success in at least two senses. First, previous experience as a preparation TTL (measured as the number of projects previously prepared) or as a supervisory TTL (measured as the log of the number of bi-annual project reports submitted) is significantly related to the TTL predicted performance measure. Second, there is also a direct, though far weaker, relationship between previous experience and project ratings (the second panel of Table 7): only the previous supervisory experience of the TTL managing the second half of a project is related to project outcomes. Consistent with the earlier pattern, this relationship is identified only in IBRD projects. This again highlights that the relation between project success and staff quality measures emerges more strongly in the more complex settings of middle-income countries.
Regarding other TTL characteristics, we are able to investigate possible associations between project success and TTL age, education, World Bank tenure, and posting location. There is no identifiable relationship between a TTL’s predicted effectiveness and their age or level of education (in effect, whether they have a PhD, since all TTLs have at least a master’s degree). 16 Additionally, neither the length of their tenure at the World Bank nor whether they are based at the World Bank’s Washington headquarters or in country offices appears to matter, though we do find a significant negative association for TTLs based in the project country for projects prepared before 2003. 17 Relatively few TTLs were based in country offices before 2003. After 2003, the World Bank decentralized further, resulting in more field-based staff, and consequently the country office penalty for TTL location disappeared. The only measurable staff characteristic with a possible influence on project quality is the TTL’s World Bank-specific project experience.
16 The exception is TTLs supervising projects in the World Bank’s human development practice groups (education, health, and social protection), for whom holding a PhD is negatively associated with project outcomes.
17 Limodio (2021), with a somewhat selected sample of TTLs who post CVs on the website LinkedIn, also finds no association between the contribution of TTLs and their tenure in the institution. He does, however, find a positive correlation between TTL performance and holding an MBA, having completed a degree in their own country, and the number of their publications. He also finds a negative correlation between their performance and having worked at the International Monetary Fund.
Project design in the qualitative data and the link between World Bank analytic work and project quality
While there is some indication that the quality of project design plays a role in ultimate project success (specifically, the ‘value added’ of the preparation TTL covaries with the IEG rating, at least in IBRD settings, and previous experience designing World Bank projects helps predict TTL ‘value added’), the evidence for the importance of design is relatively scant compared with the evidence for the importance of supervisory quality.
For this reason, we turn next to the qualitative data set and explore measures of project design not routinely captured in administrative data. Table 8 presents simple bivariate associations (estimated with OLS) in the qualitative data between the IEG rating and various generalized design features. These features include design complexity measures, such as the number of project components and sub-components, the number of primary development objectives, and the number of intermediate indicators described in core project documents. Additional measures include expert panel ratings (on a 4-point Likert scale) of the quality of the stated project objectives and the project results framework, as well as whether and to what extent project components were modified as part of a restructuring (while project restructuring may occur for various reasons, changes to project components suggest possible design challenges). Many of these indicators are significantly associated with the project’s success rating. Design complexity measures such as the number of project components or the number of intermediate indicators are significantly and negatively associated with the subsequent IEG rating. Project designs that fail to define the basic aspects of a results framework (as evidenced by a low score from the expert review panel) also receive lower outcome ratings. The most common results framework shortcomings identified in our sample of 120 operations were inadequate baseline information or indicator targets and an absence of indicators relating to the project’s expected outcome. Further evidence on the stickiness of design flaws comes from our data on project restructuring. Almost two-thirds of the 120 projects in our sample were restructured, some more than once. There is no penalty attached to restructuring per se; indeed, failure to restructure promptly, or even to cancel, is often mentioned in ICRRs as a contributor to poor performance.
But restructuring to compensate for poor design, as measured by changes to project components, is associated with lower outcome ratings. To contextualize the qualitative data set findings, we construct an index of design features in which the weight for each individual element (listed in the top panel of Table 8) is determined through principal components analysis. This index is significantly related to the project rating: a one standard deviation increase in the index is associated with a 0.22-point increase in the rating. For comparison, an index derived from standard project characteristics related to project success, such as the duration of the project development period, is also significantly related to the IEG rating, but at only half the magnitude of the design index. Another benchmark is the CPIA at the year of project approval. This measure is not significantly related to the project rating in the qualitative data (likely because of the small sample size), although the magnitude of its point estimate, 0.30, is only slightly larger than the coefficient on the design index.
An additional measure linked to project design quality is whether the project has been informed by prior analytic work tailored to the sector and country of operation. The World Bank devotes significant resources to the production of analytic work, and we would therefore hope to see the impact of such spending on subsequent operation quality. In Table 9 we find that prior analytic work is indeed linked to improved project outcomes, especially in lower-income IDA settings and possibly even in fragile and conflict-affected settings (FCVs). Analytic work completed up to three years before project approval is associated with an increase of 0.12 in IEG outcome ratings (0.15 points for IDA countries), half the association observed for a one-point improvement in the CPIA.
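Two of the constructed measures in this sub-section can be sketched briefly. The first function builds a design index as the first principal component of standardized design features, rescaled to unit standard deviation; because the sign of a principal component is arbitrary, it is oriented so that a chosen reference feature loads positively. The second flags whether a project was preceded by analytic work for the same country and sector completed up to three years before approval. The function names, field names, the inclusive integer-year window, the toy features, and the orientation rule are assumptions; in practice the index would be oriented so that higher values indicate better design.

```python
import numpy as np

def design_index(features, orient_col=0):
    """First principal component of standardized features, rescaled to unit SD."""
    X = np.asarray(features, dtype=float)
    Z = (X - X.mean(axis=0)) / X.std(axis=0)        # standardize each feature
    _, _, vt = np.linalg.svd(Z, full_matrices=False)
    weights = vt[0]                                  # loadings of the first component
    if weights[orient_col] < 0:                      # component sign is arbitrary
        weights = -weights
    scores = Z @ weights
    return scores / scores.std(), weights

def has_prior_analytic_work(project, products, window_years=3):
    """True if a product for the same country and sector was completed within
    `window_years` (inclusive) before the approval year; field names are hypothetical."""
    return any(
        p["country"] == project["country"]
        and p["sector"] == project["sector"]
        and 0 <= project["approval_year"] - p["completed_year"] <= window_years
        for p in products
    )

# Toy design features: quality score, components, intermediate indicators (hypothetical).
features = [[3, 4, 12], [2, 6, 20], [4, 3, 8], [1, 7, 25], [3, 5, 15]]
index, weights = design_index(features, orient_col=0)
project = {"country": "X", "sector": "Health", "approval_year": 2005}
products = [{"country": "X", "sector": "Health", "completed_year": 2003}]
print(index.round(2), weights.round(2), has_prior_analytic_work(project, products))
```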
The association between prior analytic work and project outcomes suggests an important role for World Bank-initiated analytic work in relatively “knowledge-scarce” country environments, where the information these analytic efforts generate can inform subsequent project design.

5. Summary and discussion of unsettled questions

This paper reviews the literature on the correlates of World Bank project outcome ratings and then extends previous analytic work by identifying further staffing inputs into the production process, linking previous sector analytic work to individual projects, and extracting a richer set of design quality explanatory variables from a subset of projects. Results from these analytic extensions all point toward the importance of project design for ultimate project success, especially in the more sectorally complex settings of middle- and upper-middle-income countries (IBRD). Previous quantitative work has found fairly weak evidence for the importance of project design; however, this appears to be partly due to the paucity of relevant indicators captured in routine World Bank administrative data. It would be an error to conclude from these earlier studies that project design is not an especially influential determinant of project success.

Like much of the project effectiveness literature, the analysis discussed here is descriptive. Hence, any observed association between project inputs and outcomes may reflect either an underlying causal relationship or confounding by unobserved factors. While our understanding of the project cycle leads us to regard these correlations as plausibly capturing the role certain inputs play in project success, alternative explanations could render them spurious. For example, project staff are not randomly assigned to develop or supervise projects but are chosen by management. Other work suggests negative assortative matching between high-performing TTLs and low-performing environments (Limodio, 2021).
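A stylized simulation illustrates how negative assortative matching of this kind can push a naive estimate of the TTL effect downward. Purely for illustration, assume that ratings depend on TTL quality (true effect 0.3) and on how favorable the environment is (effect 0.5), and that stronger TTLs are assigned to less favorable environments (TTL quality = -0.5 × environment + noise). Every parameter is invented. When the environment is omitted, the OLS slope on TTL quality converges to 0.3 + 0.5 × (-0.5)/1.25 = 0.1 rather than 0.3, because Var(TTL quality) = 0.25 + 1 = 1.25.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

# Hypothetical data-generating process; every parameter is invented.
environment = rng.normal(size=n)                        # higher = more favorable setting
ttl_quality = -0.5 * environment + rng.normal(size=n)   # negative assortative matching
TRUE_EFFECT = 0.3
rating = TRUE_EFFECT * ttl_quality + 0.5 * environment + rng.normal(size=n)


def ols_slopes(y: np.ndarray, *regressors: np.ndarray) -> np.ndarray:
    """OLS coefficients (excluding the intercept) of y on the given regressors."""
    X = np.column_stack([np.ones_like(y), *regressors])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef[1:]


naive = ols_slopes(rating, ttl_quality)[0]                    # environment omitted
controlled = ols_slopes(rating, ttl_quality, environment)[0]  # environment observed
print(f"naive: {naive:.3f}  controlled: {controlled:.3f}  true: {TRUE_EFFECT}")
```

In practice the environment is only partly captured by controls such as the CPIA, so under this kind of model an estimate would typically land between the naive and the fully controlled slope.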
If management systematically places some of its most effective staff in the most challenging environments, the estimated relation between TTL quality and project success would be biased downward. While we still largely find a positive relation between TTL ‘value added’ and project ratings, the true relation may be even larger. Short of instituting experimental staff assignments, few analytic strategies can cleanly identify the causal relationship between inputs and project success. One avenue exploits the mandatory staff rotation rules under which, after a fixed period (either four or five years in the recent past), staff must rotate to a different global region. Staff turnover due to rotation rules is arguably exogenous to other project characteristics, so information on staff tenure in a region could be leveraged to estimate the relation between staff turnover and project success more carefully.

Regarding the overall variation in the IEG rating, we explain slightly more than a quarter of the observed variance in project outcomes, most of it accounted for by staffing and project characteristics, with approximately five percent of total variance attributable to country-level factors. Some of the most significant contributions to project success appear to be made by the ‘value added’ of supervision TTLs, and the most important observed determinant of their contribution is previous World Bank project experience, not their age, education, tenure, or country location. This argues for deploying scarce expertise more effectively, including through mentorship and shadowing programs for junior staff.

Some of our other findings also have implications for World Bank policy, particularly with respect to human resources management. The negative association between TTL turnover and project quality sits uneasily with the institution’s practice of rotating staff every four years.
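Table 6 measures TTL turnover in each half of supervision as the number of TTLs divided by the number of Implementation Status and Results reports (ISRs). A minimal Python sketch of that ratio follows. It assumes that each ISR records one TTL, that "number of TTLs" means distinct TTLs, and that splitting the ISR sequence at its midpoint (with the middle ISR of an odd count assigned to the first half) stands in for the two halves of supervision. The function `ttl_turnover` and its input format are hypothetical.

```python
from collections.abc import Sequence


def ttl_turnover(isr_ttls: Sequence[str]) -> tuple[float, float]:
    """Return (first-half, second-half) TTL turnover for one project.

    `isr_ttls` lists the TTL recorded on each ISR in chronological order
    (hypothetical input format). Turnover in each half is the number of
    distinct TTLs divided by the number of ISRs in that half.
    """
    n = len(isr_ttls)
    if n < 2:
        raise ValueError("need at least two ISRs to split supervision into halves")
    mid = (n + 1) // 2  # assumption: middle ISR of an odd count goes to the first half
    first, second = isr_ttls[:mid], isr_ttls[mid:]
    return len(set(first)) / len(first), len(set(second)) / len(second)


# One TTL across eight ISRs versus a new TTL on every ISR.
print(ttl_turnover(["Ana"] * 8))                   # (0.25, 0.25)
print(ttl_turnover(["Ana", "Ben", "Cal", "Dee"]))  # (1.0, 1.0)
```

Under this definition a half with k ISRs and a single TTL scores 1/k rather than 0, so the ratio also varies with how often ISRs were filed; a count of TTL changes would be an alternative measure, but the sketch follows the ratio stated in Table 6.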
The apparent absence of a performance penalty for headquarters-based staff sits uneasily with the institution’s periodic efforts at decentralization, though field-based staff may contribute in ways other than improving project quality. Finally, the positive association between prior analytical work and operational outcomes argues for integrating analytic work more systematically into project preparation, especially in ‘knowledge-scarce’ countries.

There are several possible reasons for the large share of unexplained variance in World Bank project success. One is that we do not account for the participation of technical specialists on project teams, beyond TTLs and Practice Managers, whose contributions may be significant in determining project quality. Project design and supervision are jointly produced by numerous specialized staff, managed by the TTL. Focusing on the TTL is empirically convenient (reliable data exist or can be constructed) but ignores potentially critical inputs from other staff. Another possible reason is that the capacity and willingness of government counterparts to implement World Bank projects vary within countries, perhaps reflecting their prior experience of working with the institution, and this variation is not captured in national-level data such as CPIA scores. Third, we have no systematic measure of the process through which projects are selected by professional staff and their government counterparts. For example, we do not observe projects that were considered and discussed with government counterparts but abandoned before World Bank Board approval. We would expect this process to be a critical determinant of the relevance and attainability of the objectives of existing projects, two criteria on which project success is rated.

Beyond shortcomings in explanatory power, there is another reason for revisiting how we evaluate project effectiveness.
Evaluation ratings themselves should probably not be the sole measure of development impact under investigation. By assessing primarily whether a project has achieved its stated objectives, IEG outcome ratings miss the contribution that project advisors make in defining an intervention’s objectives and theory of change. The ratings may also overstate or understate the contribution of the Bank because they neither establish a counterfactual nor rule out alternative explanations for observed outcomes. One possible complement to the type of analysis reviewed and discussed here would focus on the role of projects as intermediaries between interventions and sector outcomes, taking advantage of the more rigorous body of evidence on the effectiveness of the former and the possibly more reliable measurement of the latter. In this view, a project can be conceived of as an intervening structure between individual interventions and sectoral outcomes.[18] Project-level features such as the quality of implementation often help or hinder the effectiveness of single interventions. Variations in implementation capacity across interventions can also undermine the validity of inferences from randomized controlled trials and limit the extent to which those inferences can be extrapolated across sites.[19]

Empirically, such an approach might relate donors’ contributions to strengthening these government functions to variation in sectoral outcomes in areas where there is solid evidence on the effectiveness of individual interventions. Some studies have analyzed how health and education outcomes respond to the amount of development assistance received (Marty et al., 2017; Odokonyero et al., 2018; Dreher et al., 2008; Gehring et al., 2017; Riddell and Niño-Zarazúa, 2016). However, these studies do not observe characteristics of the aid received, information that could help bridge the gap between the micro-level literature on interventions and the macro-level literature on aid flows.

[18] Williams (2020) posits that contextual differences affect intervention impacts when they interact with the policy’s theory of change, and that mechanism mapping can prospectively predict the impact of transporting or scaling up an intervention that has succeeded elsewhere. Development projects can fulfill this role of mechanism mapping.

[19] Allcott (2015) explores this question for energy conservation programs in the US. See Das, Friedman and Kandpal (2018) for a developing country health sector example.

The results of such a research effort would have implications for the broad literature on the determinants of development effectiveness. Were we, for example, to find that project ratings have little explanatory power for development outcomes, we might conclude that alternative measures of project performance need to be developed. Nonetheless, it would still be worthwhile, as we have done in this paper, to collect further data on other aspects of project design and implementation that we would expect to correlate with improvements in government capacity to manage interventions of the sort supported through World Bank projects, and to relate them to long-term development outcomes. Such an approach might focus more closely on the role of the World Bank in supporting the adaptation of these interventions to local context, and on their evaluation and replication.

The World Bank has evolved from providing stand-alone lending operations to following engagement guidelines under which all operations must be conceived as building blocks of a country assistance strategy. Over time, the focus of these country strategies has shifted from inputs (a series of analytical and lending products) to development results, as the strategies became more participatory, involving consultation with country governments and other stakeholders.[20] Initial IEG evaluations of country programs highlight the importance of carefully considering recipient country capacities and the World Bank’s comparative advantages when selecting priorities, of grounding all operations in solid analytical work, and of adequately sequencing interventions (Independent Evaluation Group, 2016). An evaluation framework that treats projects as parts of a coherent country program, rather than as individual units, strikes us as a promising direction for future work.

[20] Currently, this is operationalized through the preparation of Country Partnership Frameworks, which rely on the analytical underpinnings of a Systematic Country Diagnostic. For more details, see World Bank (2014).

References

Allcott, H. (2015). Site Selection Bias in Program Evaluation. The Quarterly Journal of Economics, 130(3), 1117–1165. https://doi.org/10.1093/qje/qjv015
Álvarez, C., Bueso-Merriam, J., & Stucchi, R. (2012). So you think you know what drives disbursements at the IDB? Think, think again... (Technical Note No. 479). Inter-American Development Bank, Washington, DC. https://publications.iadb.org/en/so-you-think-you-know-what-drives-disbursements-idb-think-think-again
Banerjee, A., Banerji, R., Berry, J., Duflo, E., Kannan, H., Mukerji, S., Shotland, M., & Walton, M. (2017). From Proof of Concept to Scalable Policies: Challenges and Solutions, with an Application. Journal of Economic Perspectives, 31(4), 73–102. https://doi.org/10.1257/jep.31.4.73
Bertrand, M., & Schoar, A. (2003). Managing with style: The effect of managers on firm policies. The Quarterly Journal of Economics, 118(4), 1169–1208. https://doi.org/10.1162/003355303322552775
Blanc, M., Esmail, T., Mascarell, C., & Rodriguez, R. (2016). Predicting Project Outcomes: A Simple Methodology for Predictions Based on Project Ratings (Policy Research Working Paper No. 7800). World Bank, Washington, DC. https://openknowledge.worldbank.org/handle/10986/25045
Blum, J. R. (2014).
What Factors Predict How Public Sector Projects Perform? A Review of the World Bank's Public Sector Management Portfolio (Policy Research Working Paper No. 6798). World Bank, Washington, DC. https://doi.org/10.1596/1813-9450-6798
Bold, T., Kimenyi, M., Mwabu, G., Ng’ang’a, A., & Sandefur, J. (2018). Experimental evidence on scaling up education reforms in Kenya. Journal of Public Economics, 168, 1–20. https://doi.org/10.1016/j.jpubeco.2018.08.007
Briggs, R. C. (2020). Results from single-donor analyses of project aid success seem to generalize pretty well across donors. The Review of International Organizations, 15(4), 947–963. https://doi.org/10.1007/s11558-019-09365-x
Bulman, D., Kolkma, W., & Kraay, A. (2017). Good countries or good projects? Comparing macro and micro correlates of World Bank and Asian Development Bank project performance. The Review of International Organizations, 12(3), 335–363. https://doi.org/10.1007/s11558-016-9256-x
Buntaine, M. T., & Parks, B. C. (2013). When Do Environmentally Focused Assistance Projects Achieve their Objectives? Evidence from World Bank Post-Project Evaluations. Global Environmental Politics, 13(2), 65–88. https://doi.org/10.1162/GLEP_a_00167
Cameron, D. B., Mishra, A., & Brown, A. N. (2016). The growth of impact evaluation for international development: How much have we learned? Journal of Development Effectiveness, 8(1), 1–21. https://doi.org/10.1080/19439342.2015.1034156
Caselli, F. G., & Presbitero, A. F. (2021). Aid Effectiveness in Fragile States. In Macroeconomic Policy in Fragile States (pp. 493–520). Oxford University Press. https://doi.org/10.1093/oso/9780198853091.003.0016
Chauvet, L., Collier, P., & Duponchel, M. (2010). What explains aid project success in post-conflict situations? (Policy Research Working Paper No. 5418). World Bank, Washington, DC. https://doi.org/10.1596/1813-9450-5418
Chauvet, L., Collier, P., & Fuster, A. (2013).
Supervision and Project Performance: A Principal-Agent Approach (Document de Travail No. 2015–04). Université Paris-Dauphine, Paris, France. https://basepub.dauphine.fr//handle/123456789/14817
Chetty, R., Friedman, J. N., & Rockoff, J. E. (2014). Measuring the impacts of teachers I: Evaluating bias in teacher value-added estimates. The American Economic Review, 104(9), 2593–2632. https://doi.org/10.1257/aer.104.9.2593
Clark, R., & Dolan, L. R. (2021). Pleasing the Principal: U.S. Influence in World Bank Policymaking. American Journal of Political Science, 65(1), 36–51. https://doi.org/10.1111/ajps.12531
Corral, L. R., & McCarthy, N. (2020). Organisational Efficiency or Bureaucratic Quagmire: Do Quality-At-Entry Assessments Improve Project Performance? The Journal of Development Studies, 56(4), 765–781. https://doi.org/10.1080/00220388.2018.1554210
Das, A., Friedman, J., & Kandpal, E. (2018). Does involvement of local NGOs enhance public service delivery? Cautionary evidence from a malaria-prevention program in India. Health Economics, 27(1), 172–188. https://doi.org/10.1002/hec.3529
Deaton, A., & Cartwright, N. (2018). Understanding and misunderstanding randomized controlled trials. Social Science & Medicine (1982), 210, 2–21. https://doi.org/10.1016/j.socscimed.2017.12.005
Deininger, K., Squire, L., & Basu, S. (1998). Does Economic Analysis Improve the Quality of Foreign Assistance? The World Bank Economic Review, 12(3), 385–418. http://www.jstor.org/stable/3990181
Denizer, C., Kaufmann, D., & Kraay, A. (2013). Good countries or good projects? Macro and micro correlates of World Bank project performance. Journal of Development Economics, 105, 288–302. https://doi.org/10.1016/j.jdeveco.2013.06.003
Dreher, A., Nunnenkamp, P., & Thiele, R. (2008). Does Aid for Education Educate Children? Evidence from Panel Data. The World Bank Economic Review, 22(2), 291–314. https://doi.org/10.1093/wber/lhn003
Dreher, A., Klasen, S., Vreeland, J. R., & Werker, E. (2013).
The Costs of Favoritism: Is Politically Driven Aid Less Effective? Economic Development and Cultural Change, 62(1), 157–191. https://doi.org/10.1086/671711
Feeny, S., & Vuong, V. (2017). Explaining Aid Project and Program Success: Findings from Asian Development Bank Interventions. World Development, 90, 329–343. https://doi.org/10.1016/j.worlddev.2016.10.009
Gehring, K., Michaelowa, K., Dreher, A., & Spörri, F. (2017). Aid Fragmentation and Effectiveness: What Do We Really Know? World Development, 99, 320–334. https://doi.org/10.1016/j.worlddev.2017.05.019
Geli, P., Kraay, A., & Nobakht, H. (2014). Predicting World Bank Project Outcome Ratings (Policy Research Working Paper No. 7001). World Bank, Washington, DC. https://doi.org/10.1596/1813-9450-7001
Guillaumont, P., & Laajaj, R. (2006). When instability increases the effectiveness of aid projects (Policy Research Working Paper No. 4034). World Bank, Washington, DC. https://documents.worldbank.org/en/publication/documents-reports/documentdetail/419101468165570868/When-instability-increases-the-effectiveness-of-aid-projects
Hanson, J. K., & Sigman, R. (2019). State Capacity and World Bank Project Success. [Conference presentation paper] Annual meeting of the American Political Science Association, Washington, DC. http://hdl.handle.net/10945/64991
Hanushek, E. (1971). Teacher characteristics and gains in student achievement: Estimation using micro data. The American Economic Review, 61(2), 280–288. http://www.jstor.org/stable/1817003
Honig, D. (2020). Information, power, and location: World Bank staff decentralization and aid project success. Governance, 33(4), 749–769. https://doi.org/10.1111/gove.12493
Independent Evaluation Group. (2016). World Bank Group Country Engagement: An Early-Stage Assessment of the Systematic Country Diagnostic and Country Partnership Framework Process and Implementation. World Bank, Washington, DC.
https://openknowledge.worldbank.org/handle/10986/32123
Isham, J., Kaufmann, D., & Pritchett, L. H. (1997). Civil Liberties, Democracy, and the Performance of Government Projects. The World Bank Economic Review, 11(2), 219–242. https://doi.org/10.1093/wber/11.2.219
Kersting, E. K., & Kilby, C. (2016). With a little help from my friends: Global electioneering and World Bank lending. Journal of Development Economics, 121, 153–165. https://doi.org/10.1016/j.jdeveco.2016.03.010
Kilby, C. (2000). Supervision and performance: The case of World Bank projects. Journal of Development Economics, 62(1), 233–259. https://doi.org/10.1016/S0304-3878(00)00082-1
Kilby, C. (2001). World Bank-Borrower Relations and Project Supervision. Canadian Journal of Development Studies / Revue Canadienne d’études Du Développement, 22(1), 191–218. https://doi.org/10.1080/02255189.2001.9668807
Kilby, C. (2013). An Empirical Assessment of Informal Influence in the World Bank. Economic Development and Cultural Change, 61(2), 431–464. https://doi.org/10.1086/668278
Kilby, C. (2015). Assessing the impact of World Bank preparation on project outcomes. Journal of Development Economics, 115, 111–123. https://doi.org/10.1016/j.jdeveco.2015.02.005
Limodio, N. (2011). The success of infrastructure projects in low-income countries and the role of selectivity (Policy Research Working Paper No. 5694). World Bank, Washington, DC. https://documents.worldbank.org/en/publication/documents-reports/documentdetail/857331468176676870/The-success-of-infrastructure-projects-in-low-income-countries-and-the-role-of-selectivity
Limodio, N. (2021). Bureaucrat Allocation in the Public Sector: Evidence from the World Bank. The Economic Journal. https://doi.org/10.1093/ej/ueab008
Marty, R., Dolan, C. B., Leu, M., & Runfola, D. (2017). Taking the health aid debate to the subnational level: The impact and allocation of foreign health aid in Malawi. BMJ Global Health, 2(1), e000129.
https://doi.org/10.1136/bmjgh-2016-000129
Moll, P., Geli, P., & Saavedra, P. (2015). Correlates of Success in World Bank Development Policy Lending (Policy Research Working Paper No. 7181). World Bank, Washington, DC. https://openknowledge.worldbank.org/handle/10986/21471
Mubila, M., Lufumpa, C., & Kayizzi-Mugerwa, S. (2000). A Statistical Analysis of Determinants of Project Success: Examples from the African Development Bank (Economic Research Paper No. 56). African Development Bank, Abidjan, Côte d’Ivoire. https://www.afdb.org/fr/documents/document/working-paper-56-a-statistical-analysis-of-determinants-of-project-success-examples-from-the-african-development-bank-9020
Odokonyero, T., Marty, R., Muhumuza, T., Ijjo, A. T., & Moses, G. O. (2018). The impact of aid on health outcomes in Uganda. Health Economics, 27(4), 733–745. https://doi.org/10.1002/hec.3632
OECD. (2019). Better Criteria for Better Evaluation: Revised Evaluation Criteria Definitions and Principles for Use. OECD/DAC Network on Development Evaluation. https://www.oecd.org/dac/evaluation/revised-evaluation-criteria-dec-2019.pdf
Riddell, A., & Niño-Zarazúa, M. (2016). The effectiveness of foreign aid to education: What can be learned? International Journal of Educational Development, 48, 23–36. https://doi.org/10.1016/j.ijedudev.2015.11.013
Shin, W., Kim, Y., & Sohn, H.-S. (2017). Do Different Implementing Partnerships Lead to Different Project Outcomes? Evidence from the World Bank Project-Level Evaluation Data. World Development, 95, 268–284. https://doi.org/10.1016/j.worlddev.2017.02.033
Smets, L., Knack, S., & Molenaers, N. (2013). Political ideology, quality at entry and the success of economic reform programs. The Review of International Organizations, 8(4), 447–476. https://doi.org/10.1007/s11558-013-9164-2
Wane, W. (2004). The Quality of Foreign Aid: Country Selectivity or Donors Incentives? (Policy Research Working Paper No. 3325). World Bank, Washington, DC.
https://doi.org/10.1596/1813-9450-3325
Williams, M. J. (2020). External Validity and Policy Adaptation: From Impact Evaluation to Policy Design. The World Bank Research Observer, 35(2), 158–191. https://doi.org/10.1093/wbro/lky010
Winters, M. S. (2019). Too Many Cooks in the Kitchen? The Division of Financing in World Bank Projects and Project Performance. Politics and Governance, 7(2), 117–126. https://doi.org/10.17645/pag.v7i2.1826
World Bank. (2011). A Guide to the World Bank: Third Edition. World Bank, Washington, DC. https://openknowledge.worldbank.org/handle/10986/2342
World Bank Group. (2014). World Bank Group Guidance for Country Partnership Framework Products. World Bank Group, Washington, DC.
Wood, T., Otor, S., & Dornan, M. (2020). Australian aid projects: What works, where projects work and how Australia compares. Asia & the Pacific Policy Studies, 7(2), 171–186. https://doi.org/10.1002/app5.300

Table 1. Development effectiveness: contrasting levels of analysis, outcomes investigated, and explanatory factors investigated

Level of analysis | Typical outcome | Explanatory factors interacting with development assistance
Country | Growth outcomes | Institutional quality, political/economic stability
Sector | Sectoral outcomes (health, education, etc.) | Institutional quality, familiarity with donor requirements, coherence with country strategy
Project | Evaluation rating of project; rate of disbursement; internal rate of return | Design/supervision quality (partly proxied by team composition); country and project characteristics
Intervention | Outcome directly or indirectly targeted by intervention | Design/content of intervention, population characteristics

Note: This conceptual table provides a high-level summary of the literature. See Annex 1 for a descriptive list of recent studies at the project level.

Table 2.
Summary Statistics for Project Characteristics in Analytical and Qualitative Data Sets

Analytical Data Set: N = 2,845 projects; Qualitative Data Set: N = 120 projects.

Panel A – Distribution of projects | Analytical, in projects | Analytical, in $ value | Qualitative, in projects | Qualitative, in $ value
Share by IEG outcome rating
Highly unsatisfactory | 0.01 | 0.01 | 0.05 | 0.02
Unsatisfactory | 0.11 | 0.09 | 0.12 | 0.12
Moderately unsatisfactory | 0.15 | 0.13 | 0.33 | 0.31
Moderately satisfactory | 0.37 | 0.36 | 0.29 | 0.36
Satisfactory | 0.32 | 0.36 | 0.18 | 0.18
Highly satisfactory | 0.03 | 0.04 | 0.03 | 0.01
Share by region
East Asia and Pacific | 0.16 | 0.24 | 0.18 | 0.23
Europe and Central Asia | 0.22 | 0.15 | 0.23 | 0.17
Latin America and the Caribbean | 0.20 | 0.21 | 0.18 | 0.26
Middle East and North Africa | 0.06 | 0.05 | 0.04 | 0.04
South Asia | 0.11 | 0.20 | 0.12 | 0.15
Sub-Saharan Africa | 0.25 | 0.15 | 0.26 | 0.15
Share by sector
Equitable Growth, Finance and Institutions (EFI) | 0.17 | 0.11 | 0.20 | 0.11
Human Development (HD) | 0.28 | 0.26 | 0.27 | 0.35
Sustainable Development (SD) | 0.56 | 0.63 | 0.53 | 0.54
Share by lending type
International Development Association (IDA) | 0.56 | 0.36 | 0.61 | 0.40
International Bank for Reconstruction and Development (IBRD) | 0.44 | 0.64 | 0.39 | 0.60

Panel B – Project and Country Characteristics | Analytical, mean | Analytical, S.D. | Qualitative, mean | Qualitative, S.D.
Commitment amounts
Original [constant 2015 USD millions] | 89.6 | 125.7 | 81.3 | 142.0
Net [constant 2015 USD millions] | 87.3 | 141.0 | 77.6 | 147.1
Project duration
Original length [years] | 5.05 | 1.30 | 4.96 | 1.27
Actual length [years] | 6.75 | 2.00 | 6.71 | 2.05
Country-level economic indicators
GDPpc at approval year [PPP 2015 USD] | 5,625 | 4,601 | 6,144 | 5,106
Real GDPpc growth over project lifetime | 3.83% | 4.77% | 4.10% | 3.36%
Country Policy and Institutional Assessment (CPIA)
CPIA rating at approval year [1 low - 6 high] | 3.51 | 0.48 | 3.51 | 0.55
CPIA change over project lifetime | 0.15 | 0.38 | 0.13 | 0.36

Note: The analytical data set of 2,845 projects approved between Fiscal Years 1995 and 2009 informs the staffing and country analysis, while the deep-dive data set (a smaller stratified sample of 120 projects) informs the analysis of design-related factors and project performance. The sectors listed correspond to the 3 WB practice groups as of FY2018, in which the global practices were nested as follows: (EFI) Finance and Markets; Governance; Macro, Economics and Fiscal Management; Poverty & Equity. (HD) Education; Health, Nutrition & Population; and Social Protection and Labor. (SD) Agriculture and Rural Development; Energy and Extractives; Environment and Natural Resources; Social, Urban, Rural and Resilience; Transport and ICT; Water.

Table 3.
Decomposition of the variance of IEG outcome ratings

Variables | Adjusted R-squared | Mean sum of squares | F-statistic | Prob(>F) | N
Evaluator | 0.044 | 1.630 | 1.472 | 0.000 | 2,809
Country | 0.071 | 2.436 | 2.625 | 0.000 | 2,809
Country and Year (approval) | 0.072 | 2.309 | 2.490 | 0.000 | 2,809
Country x Year | 0.093 | 1.125 | 1.242 | 0.000 | 2,809
Practice | 0.002 | 1.378 | 1.382 | 0.160 | 2,809
Country, Year and Practice | 0.073 | 2.208 | 2.383 | 0.000 | 2,809
TTL at preparation | 0.125 | 1.115 | 1.277 | 0.000 | 2,809
TTL at midpoint | 0.140 | 1.113 | 1.301 | 0.000 | 2,797
TTL at endpoint | 0.144 | 1.107 | 1.301 | 0.000 | 2,794
TTL at prep and TTL at endpoint | 0.287 | 1.026 | 1.499 | 0.000 | 2,794
Practice Manager at preparation | 0.056 | 1.316 | 1.400 | 0.000 | 1,837
Country Director at preparation | 0.037 | 1.571 | 1.632 | 0.000 | 2,806

Note: The table reports estimates from different approaches to variance decomposition.

Table 4. Relationship between country-level characteristics and IEG outcome rating
Dependent variable: IEG 1-6 outcome rating (1) (2) (3) (4) (5) (6) (7)
CPIA rating at approval year 0.159** 0.407*** 0.387*** (0.072) (0.079) (0.093)
CPIA change over project lifetime 0.316*** 0.580*** 0.530*** (0.083) (0.107) (0.118)
Log(GDPpc PPP$ at approval year) -0.072 -0.051 -0.105* (0.057) (0.056) (0.053)
Real GDPpc growth over project lifetime 1.549** 1.586** 1.396 (0.740) (0.734) (0.891)
Constant 2.707*** 3.396*** 2.056*** 3.431*** 3.270*** 3.323*** 2.723*** (0.394) (0.271) (0.338) (0.454) (0.313) (0.447) (0.463)
Observations 2,790 2,767 2,767 2,735 2,665 2,580 2,516
Adjusted R-squared 0.079 0.086 0.105 0.071 0.072 0.069 0.093
Note: Robust standard errors clustered at the country level are reported in parentheses (***p<0.01, **p<0.05, *p<0.10). All regressions are estimated using OLS and controlling for region, sector, IBRD/IDA, original commitment amount of project, original length of project and IEG evaluator fixed effects.

Table 5.
Relationship between staffing predicted performance and IEG outcome rating
Dependent variable: IEG 1-6 outcome rating (1) (2) (3) (4) (5) (6) (7) (8)
Preparation TTL 0.085*** 0.044 0.050 0.067 (0.025) (0.030) (0.038) (0.041)
1st half supervision TTL 0.144*** 0.039 -0.009 -0.003 (0.028) (0.039) (0.050) (0.052)
2nd half supervision TTL 0.162*** 0.126*** 0.131** 0.130** (0.028) (0.038) (0.051) (0.054)
Practice Manager(1) 0.054 0.031 0.075 (0.042) (0.049) (0.055)
Country Director(1) 0.013 -0.092 (0.042) (0.060)
Constant 2.427*** 2.216*** 2.263*** 2.691*** 1.875*** 2.567*** 1.814*** 2.395*** (0.339) (0.303) (0.306) (0.348) (0.376) (0.488) (0.276) (0.527)
Observations 2,038 2,560 2,513 1,946 1,723 1,222 2,540 1,129
Adjusted R-squared 0.122 0.116 0.115 0.130 0.124 0.134 0.107 0.149
Note: Robust standard errors clustered at the country level are reported in parentheses (***p<0.01, **p<0.05, *p<0.10). All regressions are estimated using OLS and controlling for region, sector, IBRD/IDA, original commitment amount of project, original length of project, CPIA score of project country at approval, change in the CPIA score over project lifetime and IEG evaluator fixed effects. (1) Practice Manager and Country Director at project preparation.

Table 6. Relationship between Task Team Leader (TTL) turnover and IEG outcome rating
Dependent variable: IEG 1-6 outcome rating (1) (2) (3) (4)
Dummy for preparation TTL change in the 1st or 2nd ISR -0.053 -0.018 (0.044) (0.045)
TTL turnover during first half (Number of TTLs/Number of ISRs) -0.412*** -0.309* (0.156) (0.166)
TTL turnover during second half (Number of TTLs/Number of ISRs) -0.579*** -0.575*** (0.171) (0.166)
Constant 2.066*** 2.240*** 2.471*** 2.603*** (0.330) (0.331) (0.365) (0.354)
Observations 2,763 2,762 2,757 2,752
Adjusted R-squared 0.105 0.107 0.108 0.110
Note: Robust standard errors clustered at the country level are reported in parentheses (***p<0.01, **p<0.05, *p<0.10).
All regressions are estimated using OLS and controlling for region, sector, IBRD/IDA, original commitment amount of project, original length of project, CPIA score of project country at approval, change in the CPIA score over project lifetime and IEG evaluator fixed effects.

Table 7. Relationship between staff experience, staff predicted performance, and IEG outcome rating
Dependent variable: staff predicted performance in columns (1) to (3); IEG 1-6 outcome rating in columns (4) to (6)

Variable | (1) | (2) | (3) | (4) | (5) | (6)
Number of projects prepared before by the preparation TTL | 0.068*** (0.014) | | | -0.012 (0.010) | |
(Log) Number of ISRs signed before by the mid-point supervision TTL | | 0.062*** (0.019) | | | 0.010 (0.021) |
(Log) Number of ISRs signed before by the final supervision TTL | | | 0.058*** (0.019) | | | 0.042* (0.023)
Constant | -0.105*** (0.037) | -0.184*** (0.056) | -0.230*** (0.059) | 2.051*** (0.344) | 2.030*** (0.342) | 1.978*** (0.324)
Observations | 2,038 | 2,560 | 2,513 | 2,767 | 2,762 | 2,757
Adjusted R-squared | 0.011 | 0.004 | 0.003 | 0.105 | 0.105 | 0.104

Note: Robust standard errors clustered at the country level are reported in parentheses (***p<0.01, **p<0.05, *p<0.10). All regressions are estimated using OLS. Columns (1) to (3) do not include any controls. Columns (4) to (6) include controls for region, sector, IBRD/IDA, original commitment amount of project, original length of project, CPIA score of project country at approval, change in the CPIA score over project lifetime and IEG evaluator fixed effects.

Table 8.
Relationship between project design and IEG outcome rating
Dependent variable: IEG 1-6 outcome rating

Design index components | (1) | (2) | (3) | (4) | (5) | (6) | (7)
Number of project components | -0.162** (0.071) | | | | | |
Number of project sub-components | | -0.024 (0.016) | | | | |
Number of PDO indicators at entry | | | -0.038 (0.024) | | | |
Number of intermediate indicators at entry | | | | -0.030*** (0.011) | | |
PDO rating (1 high – 4 low) | | | | | 0.024 (0.155) | |
Results framework rating (1 high – 4 low) | | | | | | -0.226* (0.128) |
Extent of changes to project components | | | | | | | -0.290** (0.133)
Constant | 2.985*** (0.810) | 2.565*** (0.786) | 2.640*** (0.793) | 2.768*** (0.772) | 2.335** (0.912) | 3.159*** (0.886) | 3.189*** (0.850)
Observations | 120 | 120 | 120 | 120 | 120 | 120 | 120
Adjusted R-squared | 0.09 | 0.067 | 0.067 | 0.110 | 0.046 | 0.073 | 0.086

Dependent variable: IEG 1-6 outcome rating

Variable | (8) | (9) | (10) | (11)
Design index | 0.223*** (0.071) | | |
Project Characteristics Index for comparison(1) | | 0.121** (0.057) | |
CPIA at Approval | | | 0.300 (0.195) |
Change in CPIA over Project Lifetime | | | | -0.294 (0.261)
Observations | 120 | 120 | 120 | 120
Adjusted R-squared | 0.126 | 0.084 | 0.062 | 0.053

Note: Explanatory Power of Design index = 5.54, Project Characteristics = 1.14 and Country Characteristics 2.28. This table provides estimates of IEG outcome ratings regressed on common design features and quality ratings. Using principal component analysis, an index is created for design based on these features and ratings. (1) For comparison, an index is created for project characteristics based on the log of original commitment, preparation cost, first and second half supervision cost, the change in project size from approval to closing, and the time it takes projects to move from one key milestone in the project lifecycle to another. The table also provides the estimates of regressing the IEG outcome rating on two country variables, the CPIA score of the project country at approval and the change in the score over the project lifetime.

Table 9.
Relationship between ESW/ASA and IEG outcome rating by project group

Dependent variable: IEG 1-6 outcome rating

| | Overall | IBRD | IDA | FCV | UMIC |
|---|---|---|---|---|---|
| Group type | | By lending source | By lending source | By country group (2) | By country group (2) |
| Dummy for any ESW/ASA in the corresponding GP within 3 years before project approval (1) | 0.120** (0.051) | 0.065 (0.095) | 0.145** (0.059) | 0.182 (0.136) | -0.012 (0.105) |
| Constant | 2.346*** (0.278) | 2.197*** (0.462) | 2.453*** (0.369) | 2.184*** (0.743) | 3.555*** (0.708) |
| Observations | 2,374 | 1,031 | 1,343 | 301 | 324 |
| Adjusted R-squared | 0.081 | 0.093 | 0.077 | 0.035 | 0.034 |

Note: Robust standard errors clustered at the country level are reported in parentheses (***p<0.01, **p<0.05, *p<0.10). All regressions are estimated using OLS and controlling for region, sector, CPIA score of project country at approval, change in the CPIA score over project lifetime, original commitment amount of project and the exit year. Sub-sample of projects with exit fiscal years 2004-2016. Specific project groups displayed in the first row consist of: International Bank for Reconstruction and Development (IBRD); International Development Association (IDA); fragile, conflict and violence settings (FCV); upper-middle income countries (UMIC); Equitable Growth, Finance and Institutions (EFI); Human Development (HD); Sustainable Development (SD). (1) Considers only Economic and Sector Work/Analytical and Advisory Services (ESW/ASA) work executed by the same Global Practice (GP) for a given country. (2) Country groups are neither mutually exclusive nor collectively exhaustive, with FCV and UMIC selected only for reference. Designation as FCV is determined by the project country being on the WB Harmonized List of Fragile Situations in the fiscal year of approval.

Annex 1 – Literature Review

In Section 2 (What do we know so far?), we reviewed the literature on factors associated with aid effectiveness at the project level. All the studies identified in the systematic search are listed in Table A1.1 below.
For the subset of studies more closely related to this one, we report further details in Table A1.2, including an overview of the population, methods and variables in the studies reviewed on the determinants of development-project success. Full references for studies not included in the main-text literature review can be found at the end of this Annex.

Table A1.1 Complete list of studies reviewed on the determinants of development-project success

Aga et al., 2016. Included in Literature Review: No. Question of interest: What are the mechanisms that explain how leadership affects NGO-led project success in Ethiopia? Main findings: Results indicate that team building partially mediates the effect of transformational leadership on project success.

Ahsan and Gunawan, 2010. Included in Literature Review: No. Question of interest: What are the correlations between Asian Development Bank project costs, delay and outcome ratings? Main findings: Most late projects experience cost underrun, which seems linked to poor performance.

Álvarez et al., 2012. Included in Literature Review: Yes (Table A1.2 and Text). Question of interest: What drives project disbursement performance at the Inter-American Development Bank? Main findings: Most of the variance in IDB project disbursements is found within country. The project team leader plays an important role in understanding project performance, although the effect of the project team leader's location on disbursement performance has not been constant over time. Project team leader characteristics related to experience or seniority are not significantly associated with project performance.

Biscaye et al., 2017. Included in Literature Review: No. Question of interest: Meta-analysis of the literature that empirically tests the associations between bilateral and multilateral aid flows and various development outcomes at the country level. Main findings: The authors find no consistent evidence that either bilateral or multilateral aid is more effective overall.

Blanc et al., 2016. Included in Literature Review: Yes (Text only). Question of interest: Review the efficacy of existing World Bank monitoring systems (ISRs) in providing warnings that an investment project in the East Asia and Pacific region may fail to deliver its intended results. Main findings: Projects with lower ISR ratings have a higher risk of being rated unsatisfactory at exit. For projects that do not have low ISR ratings, having at least three of six Bank monitoring flags (project management, procurement, M&E, safeguards, counterpart funding and financial management) rated as low is also predictive of an unsatisfactory IEG outcome rating.

Blum, 2014. Included in Literature Review: Yes (Table A1.2 and Text). Question of interest: Which key country context, reform content, process and project management characteristics predict the performance of World Bank Public Sector Management (PSM) projects, and do they have differential effects on PSM vs. non-PSM projects? Main findings: Political economy factors are key predictors of how the World Bank’s PSM projects perform. These factors distinctly predict PSM project performance vs. non-PSM project performance, suggesting PSM projects are particularly sensitive to the political context.

Briggs, 2020. Included in Literature Review: Yes (Table A1.2 and Text). Question of interest: Do the findings on development project success that focus on World Bank data generalize across multiple donors? Main findings: Similar factors seem to drive project success across donors. For all donors, country-level factors end up explaining only about 15% of the variation in project outcomes. All but one of the donors with a large sample of projects in the database show a negative relationship between project duration and outcome ratings. All donors found the same country-time periods relatively easy or hard, and had more successfully rated projects in recipient countries with faster growth.

Bulman et al., 2017. Included in Literature Review: Yes (Table A1.2 and Text). Question of interest: What is the relative importance of country vs. project characteristics in driving aid effectiveness at the WBG and ADB? Main findings: Country-level characteristics explain only 10-25% of project success, indicating an important role for project-specific factors in understanding project outcomes. Shorter project duration, the presence of additional financing and the track record of the project manager in delivering successful projects are highly significantly correlated with project outcomes. Findings do not significantly differ between the two institutions.

Buntaine and Parks, 2013. Included in Literature Review: Yes (Table A1.2 and Text). Question of interest: Does the influence of previously identified potential predictors of successful outcomes in environmentally focused assistance projects carry over to environmentally focused World Bank projects’ outcome ratings? Main findings: Three predictor variables had a robust relationship with outcome ratings: the government effectiveness index and the IEG rating for quality of supervision showed a strong and positive relationship; a project focus on preventing climate change or protecting biodiversity (“global outcomes”) had a negative relationship.

Caselli and Presbitero, 2021. Included in Literature Review: Yes (Table A1.2 and Text). Question of interest: How predictable and effective are aid inflows to fragile states? Main findings: Using a large data set of aid projects funded by seven large donors over 1956-2012, the authors find that a project implemented in a fragile state is about 8 percentage points less likely to be successful than a similar project financed in another developing country.

Chauvet et al., 2010. Included in Literature Review: Yes (Table A1.2 and Text). Question of interest: What is the probability that World Bank projects will be successful depending on a set of project and country characteristics, including detailed factors relating to the history of civil war in countries? Main findings: The probability that World Bank projects will be successful depending on a set of project and country characteristics, and factors relating to the history of civil war in countries.

Chauvet et al., 2013. Included in Literature Review: Yes (Table A1.2 and Text). Question of interest: Is the impact of supervision on project success related to the degree of divergence of interests between the donor and the recipient country? Main findings: Supervision is differentially effective in improving the performance of projects where donor and recipient country interests are least congruent. However, donors do not accordingly expend greater effort in supervision in situations of widely divergent interests, suggesting that either there are offsetting costs of supervision, or the incentives facing donor management in allocating administrative budgets are not well aligned with the objective of project success.

Clark and Dolan, 2021. Included in Literature Review: Yes (Text only). Question of interest: How do policies in international organizations reflect the preferences of powerful institutional stakeholders? Examining evidence of US influence in WB policy loans. Main findings: Using a data set on the conditions associated with World Bank policy loans, the authors find that borrower countries that vote with the United States at the United Nations are required to enact fewer domestic policy reforms, and on fewer and softer issue areas (prior action data set for DPLs).

Corral and McCarthy, 2020. Included in Literature Review: Yes (Table A1.2 and Text). Question of interest: Do quality-at-entry assessments enhance the delivery of development projects at the IDB? Main findings: In the IDB, quality-at-entry project assessment (especially higher scores on project logic and economic analysis) is correlated with implementation performance. However, monitoring and impact assessment scores had limited impacts on performance.

Deininger et al., 1998. Included in Literature Review: Yes (Table A1.2 and Text). Question of interest: Does economic and sector work (ESW) improve the quality of World Bank loans? How do managers decide on staff time allocation trade-offs? Main findings: The article sets out an idealized model of decision-making in which a country manager allocates resources between lending services and economic and sector work. ESW has a significant positive impact on the quality of World Bank loans, but country managers choose to allocate resources in favor of preparation at the expense of ESW due to disbursement concerns.
Denizer et al., 2013. Included in Literature Review: Yes (Table A1.2 and Text). Question of interest: To investigate the relative importance of country-level “macro” factors and project-level “micro” factors in driving project-level outcomes. Main findings: Country-level “macro” measures of the quality of policies and institutions are strongly correlated with project outcomes; however, the success of individual development projects varies much more within countries than it does between them (80-20). Project size, project length, the effort devoted to project preparation and supervision, and early-warning indicators that flag problematic projects during the implementation stage account for some of this within-country variation. Measures of World Bank project manager quality also matter significantly for ultimate project outcomes. Excessive TTL turnover worsens project outcomes.

Dollar and Levin, 2005. Included in Literature Review: No. Question of interest: What are the effects of country-specific institutional and policy indicators on project-level success? Main findings: The existence of high-quality institutions in a recipient country raises the probability of project success. The study considers, for each country, the average outcome of projects in the country rather than each project individually.

Dreher et al., 2008. Included in Literature Review: No. Question of interest: To investigate education-specific aid effectiveness in terms of achieving universal primary school enrollment. Main findings: The analysis suggests that higher per capita aid for education significantly increases primary school enrollment, while increased domestic government spending on education does not.

Dreher et al., 2013. Included in Literature Review: Yes (Table A1.2 and Text). Question of interest: How might political favoritism negatively influence the impact of foreign aid in general and the performance of World Bank projects in particular? Main findings: Politics affect World Bank project performance only in exceptional cases of high political stakes and challenging economic circumstances. The authors find no significant average effect of political influence on the likelihood of a successful project evaluation, but when examining projects granted to UNSC members experiencing economic mismanagement or vulnerability, there is evidence that project performance suffers.

Dreher et al., 2017. Included in Literature Review: No. Question of interest: To investigate the degree of leeway donors of foreign aid grant to recipient governments when their preferences over how to implement the aid are different, and both the donor and recipient possess some private information about the most effective policies. Main findings: The authors propose a model of donor decision-making and test it using dyadic data for 28 bilateral aid donors, treating project aid as “centralized” aid and budget aid as “decentralized” aid. Their empirical findings are aligned with the model.

Dreher et al., 2019. Included in Literature Review: No. Question of interest: To investigate whether foreign aid from China is prone to political capture in aid-receiving countries. Specifically, the authors examine whether more Chinese aid is allocated to the birth regions of political leaders, controlling for indicators of need and various fixed effects. Main findings: Foreign aid from China is prone to political capture in Africa (more is allocated to the birth regions of political leaders when incumbents face upcoming elections), but such a pattern is not observed in World Bank projects.

Dreher et al., 2020. Included in Literature Review: No. Question of interest: To investigate whether foreign aid from China is effective in boosting economic growth and whether it impairs the overall effectiveness of aid from Western donors. Main findings: Chinese aid boosts short-term economic growth, with no evidence of impairing the overall effectiveness of aid from Western donors.

Dreher et al., 2021. Included in Literature Review: No. Question of interest: Are favoritism and political capture a threat to Chinese aid effectiveness in Africa? Main findings: Chinese aid improves local development outcomes (as measured by per-capita nighttime light emissions), regardless of whether such aid is given to politically favored jurisdictions.

Feeny and Vuong, 2017. Included in Literature Review: Yes (Table A1.2 and Text). Question of interest: What macro and micro determinants are relevant for ADB project and program success, and to what extent is the variation in project and program success determined by project-level versus country-level factors? Main findings: The paper tested both macro (country-level) and micro (project-level) determinants. The rate of per capita income growth has a positive association with project outcomes, whilst the probability of success appears higher in less democratic countries. Projects are found to have a higher probability of success than programs, with project size positively associated with final outcomes. The authors also find evidence of capacity constraints reducing the probability of success, since interventions that received less funding than anticipated are less likely to be successful.

Gehring et al., 2017. Included in Literature Review: No. Question of interest: How does aid fragmentation affect its effectiveness (country-level outcomes)? Main findings: Aid fragmentation has mixed effects. In education, donor concentration appears to be detrimental. For economic growth, a negative effect is found only if fragmentation is conceptualized as a lack of lead donors.

Geli et al., 2014. Included in Literature Review: Yes (Table A1.2 and Text). Question of interest: Can information on WB project characteristics, available early in project implementation, predict final project outcome ratings? Main findings: Project size, preparation time, initially planned project length, a TTL’s track record and country policy performance create a model more capable of predicting final WB project outcomes than implementation progress report ratings, with TTL track record and country policy performance being the most significant correlates of project outcomes.

Goyal and Howlett, 2019. Included in Literature Review: No. Question of interest: By how much and how do internal and external evaluations of ADB projects differ? Main findings: The results show that internal evaluations focused more on micro- and meso-level characteristics, while external evaluations laid more emphasis on meso- and macro-level constructs, such as dimensions of policy and the institutional environment in the recipient country, or its level and rate of economic growth.
Guillaumont and Laajaj, 2006. Included in Literature Review: Yes (Table A1.2 and Text). Question of interest: Does aid effectiveness increase when receiving countries face exogenous shocks, and does it decrease when the total amount of aid increases? Main findings: During economic vulnerability, as the total amount of aid increases, project effectiveness decreases, likely due to absorptive capacity limitations.

Hanson and Sigman, 2019. Included in Literature Review: Yes (Table A1.2 and Text). Question of interest: In what types of institutional environments are international development projects most likely to succeed? Main findings: Using a new measure of state capacity with continuous coverage from 1960-2015 from Hanson and Sigman (2013), the authors find that projects are most likely to succeed where state capacity is relatively high, regardless of regime type or Freedom House scores. In turn, successful World Bank projects can have a positive impact on state capacity.

Hernandez, 2017. Included in Literature Review: No. Question of interest: To investigate whether World Bank conditionality in DPLs is affected by the presence of “new” donors (such as China), using panel data for 54 African countries over the 1980–2013 period. Main findings: Empirical results indicate that the World Bank delivers loans with significantly fewer conditions to recipient countries that are assisted by China. These countries receive 15% fewer conditions for every percentage-point increase in Chinese aid.

Honig, 2020. Included in Literature Review: Yes (Table A1.2 and Text). Question of interest: To investigate the relationship between staff presence in recipient countries and aid project performance. Main findings: Merely placing WB staff in developing countries has little effect on the success of development projects. However, in the most fragile states, the presence of senior personnel (WB Country Director) is associated with greater project success after the “Strategic Compact” increased the CD’s power. In short, the impact of WB staff decentralization is mixed and appears to be driven primarily by the power of senior personnel in the field, not the ability of field staff to gather local information.
Ika et al., 2012. Included in Literature Review: Yes (Text only). Question of interest: What is the relationship between critical success factors (CSFs) for World Bank projects and projects’ success? Main findings: There is a statistically significant and positive relationship between project success and each of the five CSFs: monitoring; coordination; design; training; and institutional environment. The most prominent CSFs for project supervisors are design and monitoring.

Isham et al., 1997. Included in Literature Review: Yes (Table A1.2 and Text). Question of interest: Is there a link between WB project success and country governance indicators? Main findings: There is a strong statistical link between a country’s civil liberties and WB project performance; however, the type of political regime and the status of more purely political liberties do not appear to significantly affect project performance.

Kersting and Kilby, 2016. Included in Literature Review: Yes (Text only). Question of interest: Investigates how WB lending responds to upcoming elections in borrowing countries, according to U.S. foreign policy interests. Main findings: Projects in geopolitically important countries are disbursed faster before elections.

Kilby, 2000. Included in Literature Review: Yes (Table A1.2 and Text). Question of interest: What is the impact of donor supervision on WB development project performance? Main findings: Early supervision (measured by number of staff weeks spent on supervision) has a positive impact on performance. Supervision is most effective early in implementation and in projects with smaller loans, but has a relatively homogeneous impact across regions and sectors. An interaction term between supervision and loan amount indicates that the marginal impact of supervision falls as the size of the loan increases.

Kilby, 2001. Included in Literature Review: Yes (Table A1.2 and Text). Question of interest: How do agency problems between the World Bank and borrowing countries affect the implementation and performance of WB projects? Main findings: Evidence from WB projects suggests that agency problems have a substantial impact on project implementation, due to limits on supervision and the absence of direct WB implementation.
Kilby, 2013. Included in Literature Review: Yes (Text only). Question of interest: To attempt to isolate post-approval informal influence by examining WB disbursements after controlling for prior commitments. Main findings: The author finds small but positive and significant links between UN voting and WB disbursements of already approved loans, reflecting US informal influence.

Kilby, 2015. Included in Literature Review: Yes (Table A1.2 and Text). Question of interest: What is the impact of World Bank preparation on the outcome of World Bank funded projects? What is the role of donor influence in international financial institutions at different stages in the resource transfer process? How does this explain how donors impact the efficacy of international financial institutions? Main findings: Predicted preparation duration has a positive and significant relationship with project outcomes. The effects are conditional on the degree of economic vulnerability, and the impact is greater in countries experiencing debt problems.

Limodio, 2011. Included in Literature Review: Yes (Text only). Question of interest: What is the causal link between the quality of WB project implementation (on the side of the borrower) and its outcome? Main findings: Success of infrastructure projects depends fundamentally on the quality of implementation. Although bad implementation can harm structurally solid projects, good implementation cannot make structurally weak projects successful. Governance and selection of well-designed projects are essential for success.

Limodio, 2021. Included in Literature Review: Yes (Table A1.2 and Text). Question of interest: Whether, and to what extent, World Bank managers (Task Team Leaders at PAD) affect project success, and what determines the assignment of a high-performing manager to a specific country? Main findings: World Bank project team leaders’ performance indices and country effects both correlate positively with project outcomes. There is evidence of negative assortative matching, with high-performing managers assigned to low-performing countries.
Moll et al., 2015. Included in Literature Review: Yes (Table A1.2 and Text). Question of interest: Are there design elements and other factors of policy-based loans which are tightly linked with success in achieving the intended development outcomes? Main findings: Congruence or “line of sight” between the policy reforms supported and the results framework is critical for success. Task team leader skills in general, and staff affiliated with the former “Economic Policy” department of the World Bank, also increase the odds of success. A weaker set of reforms and reforms supported in the energy sector tend to reduce the chance of success.

Mubila et al., 2000. Included in Literature Review: Yes (Text only). Question of interest: To assess macroeconomic determinants of project success at the African Development Bank by looking at the relationship between project economic rates of return at appraisal and at completion, to determine whether they differ more in relatively more challenging environments. Main findings: Rates of return at appraisal are weak indicators of future project success. A good policy environment (proxied by rates of economic growth, inflation and a country’s level of development) is as important for project success as project characteristics such as size and sector.

Presbitero, 2016. Included in Literature Review: Yes (Table A1.2 and Text). Question of interest: To test for the presence of absorptive capacity constraints in developing countries, and their relationship to aid flows and donor fragmentation. Main findings: World Bank investment projects are less likely to be successful if implemented during periods of public investment booms, because of absorptive capacity constraints in recipient countries.

Shin et al., 2017. Included in Literature Review: Yes (Table A1.2 and Text). Question of interest: Do different types of implementing partnerships (i.e., state or non-state agencies) affect the outcome of WB development projects? Main findings: WB projects implemented by non-state actors are more likely to receive higher effectiveness ratings.
Smets et al., 2013. Included in Literature Review: Yes (Table A1.2 and Text). Question of interest: Do World Bank employees exert higher effort in designing programs when faced with governments with left-leaning economic ideologies? Main findings: The quality at entry of an economic policy loan is significantly higher for governments with a left-wing party orientation, and incumbent tenure is also positively associated with quality at entry, suggesting that WB staff put more effort into their design.

Vawda et al., 2003. Included in Literature Review: No. Question of interest: Do World Bank project appraisal documents (PADs) of education projects which are judged “good” have a higher probability of leading to successful project outcomes than projects for which the PADs are judged “poor”? Main findings: Outcomes on World Bank education projects are better when the quality of project appraisal is good. There is a strong relationship between the quality of cost-benefit and cost-effectiveness analysis and the quality of project outcomes.

Wane, 2004. Included in Literature Review: Yes (Table A1.2 and Text). Question of interest: To show that the design of WB projects is a key determinant of their quality and hence effectiveness. Main findings: Design effects are a crucial component of quality. The quality of aid is endogenous to the relationship between the donor agency and the recipient government. Highly capable and accountable governments only accept well-designed projects. The stock of ESW deliveries at project entry has a positive impact on the quality of projects.

Winters, 2010. Included in Literature Review: No. Question of interest: Has the World Bank operated in such a way that it favors programmatic lending in well-governed countries and project lending in poorly governed ones? And in the context of project lending, does the World Bank use more targeted projects (for example, subnational projects) in countries that suffer from poor governance? Main findings: The author examines the impact of borrower governance characteristics on overall World Bank aid flows and finds that, controlling for other country characteristics, the Bank gives more money to well-governed countries. Moving beyond this aggregate focus to look at specific aid project decisions, the study reports how the World Bank alters its lending strategies in terms of the modalities through which it gives aid, finding that well-governed countries receive a larger proportion of their project lending as nationwide projects.

Winters, 2019. Included in Literature Review: Yes (Table A1.2 and Text). Question of interest: How does the division of financing (number and concentration of financial collaborators in WB projects) relate to overall project effectiveness? Main findings: World Bank projects with any cofinancing and with less concentrated financing receive less positive evaluations. The estimated correlations, however, are generally small in size, conditioning on other project characteristics and using country and year fixed effects.

Wood et al., 2020. Included in Literature Review: Yes (Table A1.2 and Text). Question of interest: To investigate country-level “macro” factors and project-level “micro” factors in driving project-level outcomes for Australian aid and compare them to other donors. Main findings: Between 7% and 13% of the total variation in Australian aid project outcomes is a result of country-level factors as opposed to attributes associated with projects, in line with other donors. Larger projects tend to be evaluated as more effective. The only significant country-level covariate was GDP per capita, with wealthier countries having more successful projects. Limited project-level characteristics were available to test (size, duration and sector only).

Notes: ‘Question of interest’ and ‘Main findings’ are often quoted directly from the abstract. Full references for studies not included in the main literature review are listed at the end of this Annex.
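Several studies in Table A1.1 summarize how much of the variation in project outcomes is attributable to country-level factors (for example, about 15% in Briggs, 2020; 10-25% in Bulman et al., 2017; 7%-13% in Wood et al., 2020). One common way to express such a share is a one-way variance decomposition: the between-country sum of squares divided by the total sum of squares, which equals the unadjusted R-squared from regressing ratings on a full set of country dummies. The Python sketch below illustrates this under simple assumptions (a flat list of ratings paired with hypothetical country codes, no controls); it is not a reproduction of any cited study's specification, several of which use richer models.

```python
from collections import defaultdict
from typing import Hashable, Sequence


def between_country_share(ratings: Sequence[float], countries: Sequence[Hashable]) -> float:
    """Share of total variance in outcome ratings explained by country means.

    Equals the unadjusted R-squared from an OLS regression of the ratings
    on a full set of country dummies (one-way variance decomposition).
    """
    if len(ratings) != len(countries) or len(ratings) == 0:
        raise ValueError("ratings and countries must be non-empty and of equal length")

    grand_mean = sum(ratings) / len(ratings)

    # Group ratings by country so that country means can be computed.
    by_country = defaultdict(list)
    for rating, country in zip(ratings, countries):
        by_country[country].append(rating)

    total_ss = sum((r - grand_mean) ** 2 for r in ratings)
    between_ss = sum(
        len(vals) * (sum(vals) / len(vals) - grand_mean) ** 2
        for vals in by_country.values()
    )
    # If all ratings are identical there is no variance to decompose.
    return between_ss / total_ss if total_ss > 0 else 0.0


# Hypothetical toy data: two countries, four projects each, ratings on a 1-6 scale.
ratings = [5, 4, 6, 5, 3, 4, 2, 3]
countries = ["A", "A", "A", "A", "B", "B", "B", "B"]
share = between_country_share(ratings, countries)
print(f"between-country share: {share:.3f}, within-country share: {1 - share:.3f}")
```

In this toy example the country means (5 and 3) account for two thirds of the variance; the remaining third is within-country variation, the component that studies such as Denizer et al. (2013) find to dominate in real project data.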
Table A1.2 Overview of population, methods and variables in the studies reviewed on the determinants of development-project success

Outcome Coefficients (2) Project Management Country-level Manager characteristics Economic rate of return Demographic indicators Other economic factors GDP per capita growth Ex-post success rating Other design features Conflict / Geopolitical Management inputs GDP per capita level Project Duration Early warnings Project Size Institutions Other Reference N Project population (1) Period

Álvarez et al., 2012 1,049 IADB 1996-2011 . . X + . . – . – . . . . . .
Blum, 2014 3,202 WB (public sector management) 1990-2013 X . . ns ns . . . – . . . + . .
Briggs, 2020 11,722 PPD 1956-2012 X . . . – . . . . . + . ns . .
Bulman et al., 2017 5,119 ADB and WB 1995-2012 X . . + – . + . – . + . + . .
Buntaine and Parks, 2013 157 WB (environment) 1994-2007 X . . ns . . . . – . . . + . .
Caselli and Presbitero, 2021 14,000 PPD 1956-2012 X . . . ns . ns . . ns + – . . –
Chauvet et al., 2010 6,404 WB 1961-2002 X . . . – . . + . + + . . ns –
Chauvet et al., 2013 2,023 WB 1977-2002 X . . . + . . + . ns . . + . .
Corral and McCarthy, 2020 497 IADB 2008-2014 . . X nr nr . . + – . . . . . .
Deininger et al., 1998 5,064 WB 1975-1995 X X . . . . . + ns . . – + . .
Denizer et al., 2013 6,569 WB 1983-2011 X . . – ns + + + – . + . + . .
Dreher et al., 2013 6,000 WB 1975-2008 X . . ns . . . . . ns . ns ns ns ns
Feeny and Vuong, 2017 1,553 ADB 1970-2010 X . . + ns . . . – + + . – ns .
Geli et al., 2014 2,729 WB (IPF) 1995-2012 X . . ns – . + . – . . . + . .
Guillaumont and Laajaj, 2006 2,894 WB 1981-2002 X . . . . . . . . ns . – ns . .
Hanson and Sigman, 2019 10,000 WB 1960-2015 X . . . – . . . . ns + – + ns .
Honig, 2020 8,367 WB 1975-2005 X . . + . . + . . . . . . . .
Isham et al., 1997 761 WB 1975-1993 X X . . . . . . . . . . + . .
Kilby, 2000 1,426 WB 1981-1991 X . . + . . . + . ns + . ns . .
Kilby, 2001 1,447 WB 1981-1991 X . . . – – . . . + + . . . .
37 Kilby, 2015 3,627 WB 1994-2010 X . . ns . . . + . + + . . + . Limodio, 2021 3,385 WB 1970-2012 X . . . . . . . . . . . . . . Moll et al., 2015 312 WB (DPL) 2004-2012 X . . ns . + . . . ns . . ns . . Presbitero, 2016 3,945 WB (IPF) 1970-2007 X . . ns . – . . . ns + – ns . . Shin et al., 2017 647 WB 2006-2014 X . . + ns + . . . . . . . . . Smets et al., 2013 182 WB (DPL macro) 1985-2008 X . . ns . . . + . ns . ns ns . + Wane, 2004 1,697 WB 1996-2002 X . . ns . . . + . . . . + . . Winters, 2019 2,024 WB 2000-2017 X . . + . – . . . . . . . . . Wood et al., 2020 456 AusA 2005-2019 X . . + ns . . . . + ns . ns . . Notes: (1) Asian Development Bank (ADB); Australian Government Aid (AusA); Inter-American Development Bank (IADB); Project Performance Database (PPD) which combines 7 international donors; and World Bank (WB). When a study analyzed only specific sectors or instruments - such as Investment Project Financing (IPF) or Development Policy Lending (DPL) - this is noted in parenthesis. (2) Coefficients coded as (+) (–) indicate the signal of a coefficient found to be significant at the 5 percent level; (ns) indicates the coefficient was not significant at the 5 percent level; (nr) indicates the factor was included as a control but its direction and/or significance were not reported. Those markers refer only to the main specification in the paper, according to the discussion of the results. 38 Additional References Aga, D. A., Noorderhaven, N., & Vallejo, B. (2016). Transformational leadership and project success: The mediating role of team-building. International Journal of Project Management, 34(5), 806–818. https://doi.org/10.1016/j.ijproman.2016.02.012 Ahsan, K., & Gunawan, I. (2010). Analysis of cost and schedule performance of international development projects. International Journal of Project Management, 28(1), 68–78. https://doi.org/10.1016/j.ijproman.2009.03.005 Biscaye, P. E., Reynolds, T. W., & Anderson, C. L. (2017). 
Relative Effectiveness of Bilateral and Multilateral Aid on Development Outcomes. Review of Development Economics, 21(4), 1425–1447. https://doi.org/10.1111/rode.12303 Dollar, D., & Levin, V. (2005). Sowing and Reaping: Institutional Quality and Project Outcomes in Developing Countries (Policy Research Working Paper No. 3524). World Bank, Washington, DC. https://doi.org/10.1596/1813-9450-3524 Dreher, A., Nunnenkamp, P., & Thiele, R. (2008). Does Aid for Education Educate Children? Evidence from Panel Data. The World Bank Economic Review, 22(2), 291–314. https://doi.org/10.1093/wber/lhn003 Dreher, A., Langlotz, S., & Marchesi, S. (2017). Information Transmission and Ownership Consolidation in Aid Programs. Economic Inquiry, 55(4), 1671–1688. https://doi.org/10.1111/ecin.12450 Dreher, A., Fuchs, A., Hodler, R., Parks, B. C., Raschky, P. A., & Tierney, M. J. (2019). African leaders and the geography of China’s foreign assistance. Journal of Development Economics, 140, 44–71. https://doi.org/10.1016/j.jdeveco.2019.04.003 Dreher, A., Fuchs, A., Parks, B., Strange, A., & Tierney, M. J. (2020). Aid, China, and Growth: Evidence from a New Global Development Finance Dataset. American Economic Journal: Economic Policy. https://doi.org/10.1257/pol.20180631 Dreher, A., Fuchs, A., Hodler, R., Parks, B. C., Raschky, P. A., & Tierney, M. J. (2021). Is Favoritism a Threat to Chinese Aid Effectiveness? A Subnational Analysis of Chinese Development Projects. World Development, 139, 105291. https://doi.org/10.1016/j.worlddev.2020.105291 Gehring, K., Michaelowa, K., Dreher, A., & Spörri, F. (2017). Aid Fragmentation and Effectiveness: What Do We Really Know? World Development, 99, 320–334. https://doi.org/10.1016/j.worlddev.2017.05.019 Goyal, N., & Howlett, M. (2019). Combining internal and external evaluations within a multilevel evaluation framework: Computational text analysis of lessons from the Asian Development Bank. Evaluation, 25(3), 366–380. 
https://doi.org/10.1177/1356389019827035
Hernandez, D. (2017). Are “New” Donors Challenging World Bank Conditionality? World Development, 96, 529–549. https://doi.org/10.1016/j.worlddev.2017.03.035
Vawda, A. Y., Moock, P., Gittinger, J. P., & Patrinos, H. A. (2003). Economic analysis of World Bank education projects and project outcomes. International Journal of Educational Development, 23(6), 645–660. https://doi.org/10.1016/S0738-0593(03)00100-7
Winters, M. S. (2010). Choosing to Target: What Types of Countries Get Different Types of World Bank Projects. World Politics, 62(3), 422–458. https://doi.org/10.1017/S0043887110000092

Annex 2 – Further details on data coverage and sourcing

Although data were initially collected for 4,348 Investment Project Financing (IPF) projects that closed between fiscal year (FY) 1995 and FY2017, the paper's analysis concentrates on those approved between FY1995 and FY2009. This window limits two problems: missing values in selected measures, which are more common among projects approved before FY1995, and censoring, which affects projects approved after FY2009 because many of them had yet to close or be rated at the time of data collection. The analysis therefore focuses on relatively recent World Bank IPF lending. The analytical data set covers 90 percent of IPF projects approved between FY1995 and FY2009, equivalent to 89 percent of projects that exited the portfolio between FY2001 and FY2015.[1] Table A2.1 summarizes the distribution of projects by fiscal year. Because a central objective of this paper is to examine the design and preparation phase of the project cycle, an extensive data collection exercise was needed to capture information on the preparation task team that is not systematically stored in the Bank's centralized data systems, and to match it to the corresponding projects.
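The 90 percent coverage figure, and how coverage varies across approval cohorts, can be checked directly against the counts in Table A2.1. The Python sketch below simply transcribes those counts; the names `TABLE_A2_1`, `coverage_by_year`, and `overall_coverage` are illustrative and are not part of the paper's data pipeline.

```python
# Counts transcribed from Table A2.1:
# approval fiscal year -> (all WB IPF approvals, projects in the analytical data set)
TABLE_A2_1 = {
    "FY95": (221, 213), "FY96": (228, 221), "FY97": (213, 204),
    "FY98": (252, 243), "FY99": (244, 226), "FY00": (205, 194),
    "FY01": (205, 181), "FY02": (192, 181), "FY03": (208, 186),
    "FY04": (210, 192), "FY05": (228, 198), "FY06": (220, 200),
    "FY07": (192, 164), "FY08": (186, 145), "FY09": (147, 97),
}


def coverage_by_year(table):
    """Share of each year's IPF approvals that enter the analytical data set."""
    return {fy: kept / total for fy, (total, kept) in table.items()}


def overall_coverage(table):
    """Return (total approvals, retained projects, retained share)."""
    total = sum(t for t, _ in table.values())
    kept = sum(k for _, k in table.values())
    return total, kept, kept / total


if __name__ == "__main__":
    total, kept, share = overall_coverage(TABLE_A2_1)
    print(f"{kept:,} of {total:,} projects retained ({share:.1%})")
    for fy, s in coverage_by_year(TABLE_A2_1).items():
        print(f"{fy}: {s:.1%}")
```

Run as a script, it reports 2,845 of 3,151 projects retained (90.3%). Retention is about 96 percent for FY95 approvals but about 66 percent for FY09, a pattern that would be consistent with the censoring concern for the latest cohorts described above.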
By design, the World Bank's administrative and project-level data systems store several fields only in their current state, overwriting historical values. These fields include the TTL who prepared the project and took it to the Board, as well as the Practice Manager and Country Director at that time. For example, if, as is often the case, the project's Practice Manager at closing differs from the Practice Manager at entry, system data cannot be used because they record only the last known Practice Manager. To circumvent this issue without manually collecting the preparation teams of 4,348 projects, staffing information was extracted by text scraping from the projects' Project Appraisal Documents (PADs). This proved far more efficient than manual collection, but it brought its own set of challenges. Occasionally, a project's PAD could not be found in electronic format in the Bank's main document directories; this occurred more often for older projects whose preparation documents had been scanned into the system from hard copy. The PAD template has also changed several times over the years, multiplying the programming variations needed to read the text and locate the correct staffing information. Beyond changing layouts, conventions about who is listed in a PAD as part of the preparation team also changed. For example, most PADs listed the Country Manager, a position that reports to the Country Director, rather than the Country Director; information on Country Directors was eventually sourced from the World Bank's "Ask an Archivist" service. Distinguishing the Sector Manager from the more senior Sector Director raised a separate set of issues.
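The template problem can be illustrated with a short Python sketch that tries several label patterns in turn and flags a PAD for manual review when none matches. The label variants in `TTL_PATTERNS` are hypothetical, not the actual PAD templates, and `extract_ttl` is an illustrative helper, not the code used for this paper.

```python
import re
from typing import Optional

# Hypothetical label variants: because PAD templates changed over time, the same
# field may appear under different headings. These patterns are assumptions for
# illustration only, not the Bank's actual templates.
TTL_PATTERNS = [
    re.compile(r"Task\s+Team\s+Leader\s*:\s*(?P<name>[^\n]+)", re.IGNORECASE),
    re.compile(r"Task\s+Manager\s*:\s*(?P<name>[^\n]+)", re.IGNORECASE),
    re.compile(r"Team\s+Leader\s*\(TTL\)\s*:\s*(?P<name>[^\n]+)", re.IGNORECASE),
]


def extract_ttl(pad_text: str) -> Optional[str]:
    """Return the TTL name matched by the first known template, or None.

    None signals that the PAD needs manual review, e.g. a scanned document
    with no text layer or a template not covered by the patterns."""
    for pattern in TTL_PATTERNS:
        match = pattern.search(pad_text)
        if match:
            # Collapse runs of whitespace left over from PDF text extraction.
            return " ".join(match.group("name").split())
    return None


if __name__ == "__main__":
    older_layout = "Country Director: A. Person\nTask Manager:  Jane   Doe\n"
    newer_layout = "Task Team Leader: John Roe\nPractice Manager: X\n"
    for text in (older_layout, newer_layout, "scanned image, no text layer"):
        print(extract_ttl(text))
```

Ordering the patterns from most to least specific is a design choice of the sketch: when a document contains more than one recognizable label, the first pattern in the list wins.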
Quality checks of the scraped data found cases where the person listed as Sector Manager for a project had in fact been serving as Sector Director at the time, and the two positions need not have interchangeable effects on project quality. Sector Manager data at approval were therefore first collected manually and then cross-checked against historical HR reports to confirm that the person recorded held the title of Sector Manager and not Sector Director. Because names can be duplicated and staff record their names in different ways (for example, the full name on HR's staffing record but only the first and last name in the PAD), the most reliable identifier is a staff member's Unique Personal Identification number (UPI), which the data set required in order to link project staff to their characteristics. Resolving the cases where names scraped from PADs could not be located immediately in HR reports, or where more than one UPI was associated with a name, took significant time and required further investigation through the Bank's internal staff directory, "PeopleFinder", to match each person to the right project. Going forward, more reliable population of these internal staff directory pages, together with access to the back-end data, is one possible way around the fact that the Bank's project-level systems do not store historical data on a project's TTL. Staff characteristics such as education and age are sensitive and confidential; these were provided through HR reports subject to the corresponding restrictions and permissions.

[1] These shares were determined as of July 2017, when the data set was created. The missing 10 percent of IPF lending approvals between FY1995 and FY2009 is mainly due to projects that, although approved, were cancelled within permitted timeframes and without sufficient disbursement to warrant an IEG evaluation.
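The name-to-UPI step can be sketched as follows, assuming an HR extract of (UPI, full name) pairs. Names are reduced to a normalized first-plus-last key so that a PAD's short form can match HR's full form; a key that maps to several UPIs, or to none, is returned for manual follow-up, which in this paper's case meant a search in PeopleFinder. The helper names, UPIs, and staff names below are invented for illustration.

```python
import unicodedata
from collections import defaultdict


def name_key(name: str) -> str:
    """Normalize a name to 'first last': strip accents and punctuation, lowercase,
    and drop middle names or initials."""
    text = unicodedata.normalize("NFKD", name)
    text = text.encode("ascii", "ignore").decode("ascii")  # drops combining accents
    cleaned = "".join(ch if ch.isalnum() else " " for ch in text.lower())
    tokens = cleaned.split()
    if len(tokens) <= 1:
        return " ".join(tokens)
    return f"{tokens[0]} {tokens[-1]}"


def build_index(hr_records):
    """Map each normalized name key to the set of UPIs that share it."""
    index = defaultdict(set)
    for upi, full_name in hr_records:
        index[name_key(full_name)].add(upi)
    return index


def match_upi(pad_name, index):
    """Return ('matched', upi), ('ambiguous', sorted UPIs) or ('unmatched', None)."""
    candidates = index.get(name_key(pad_name), set())
    if len(candidates) == 1:
        return "matched", next(iter(candidates))
    if candidates:
        return "ambiguous", sorted(candidates)  # needs manual disambiguation
    return "unmatched", None  # needs manual search


if __name__ == "__main__":
    hr = [("000101", "Renée Claire Dubois"), ("000102", "John A. Smith"),
          ("000103", "John B. Smith")]
    idx = build_index(hr)
    for pad_name in ("Renee Dubois", "John Smith", "Nobody Here"):
        print(pad_name, match_upi(pad_name, idx))
```

A first-plus-last key is a deliberately crude choice: it resolves the common "full name in HR, short name in the PAD" case but will miss compound surnames recorded differently across sources, which is why unmatched and ambiguous cases still need a person to resolve them.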
TTL location at preparation and supervision had to be sourced from historical end-of-year snapshots of the active staff directory provided by HR. Table A2.2 shows where each measure was sourced and for how many projects in the full data set the information is available. While most variables have good coverage across the full data set, certain staffing information does not. For example, in addition to the TTL, it would be of interest to assess the contributions of other team members, but that information is not available for the preparation phase. More sensitive data on senior staff members could not be obtained either. Notwithstanding legitimate data privacy and confidentiality concerns, such data could be stored and shared at the project level in anonymized form, and the institution should consider doing so in the future.

Table A2.1 Coverage of analytical data set
Approval fiscal year   World Bank IPF lending (N = 3,151 projects)   Analytical data set (N = 2,845 projects)
FY95   221   213
FY96   228   221
FY97   213   204
FY98   252   243
FY99   244   226
FY00   205   194
FY01   205   181
FY02   192   181
FY03   208   186
FY04   210   192
FY05   228   198
FY06   220   200
FY07   192   164
FY08   186   145
FY09   147   97
Analytical data set as a share of WB IPF lending: 90%

Table A2.2 Data sources and availability of project-level staffing variables
Full data set: N = 4,348 projects; analytical data set: N = 2,845 projects

Panel A - Data sources   Full data set: frequency (percent)   Analytical data set: frequency (percent)
For Preparation Task Team Leader (at approval)
  Data scraping from PADs   1,740 (52)   1,696 (60)
  Manually collected   929 (28)   538 (19)
  TTL as listed on 1st ISR   610 (18)   541 (19)
  Staff time charge code data   76 (2)   65 (2)
For Mid-point Supervision Task Team Leader (mid-point ISR)
  ISR reports (stored on internal database)   3,649 (100)   2,833 (100)
For Closing Supervision Task Team Leader (last ISR)
  ISR reports (stored on internal database)   3,819 (100)   2,838 (100)
For Practice Manager (at approval)
  Manual collection cross-referenced with HR reports   3,123 (99)   2,842 (99)
  Data scraping from PADs   31 (1)   17 (1)
For Country Director (at approval)
  "Ask an Archivist" service   4,342 (100)   2,842 (100)

Panel B - Availability of staffing characteristics   Full data set: frequency   Analytical data set: frequency
For Preparation Task Team Leader (at approval)
  Age   2,916   2,793
  Education   2,929   2,704
  Experience (# of WB projects as TTL)   3,355   2,840
For Mid-point Supervision Task Team Leader (mid-point ISR)
  Age   3,250   2,621
  Education   3,128   2,516
  Experience (# of WB projects as TTL)   3,649   2,833
For Closing Supervision Task Team Leader (last ISR)
  Age   3,817   2,838
  Education   3,598   2,674
  Experience (# of WB projects as TTL)   3,819   2,827
For Practice Manager (at approval)
  Age   1,477   1,344
  Education   1,217   1,119
  Experience (# of WB projects approved as PM)   2,211   1,862
For Country Director (at approval)
  Age   -   -
  Education   -   -
  Experience (# of WB projects approved as CD)   4,342   2,842

Annex 3 – Technical approach to estimating TTL quality

Within the World Bank, Task Team Leaders (TTLs) are akin to project managers. They are responsible for preparing the Project Appraisal Document (PAD), periodic Implementation Status Reports (ISRs), and a final Implementation Completion and Results report (ICR). Above the TTL sits a layer of management: Practice Managers (PMs) and Country Directors (CDs) are charged with the immediate management of the World Bank project portfolio. TTLs design and supervise World Bank projects in collaboration with the client government; PMs make staffing decisions and clear all aspects of project preparation and subsequent implementation; CDs shape the overall dialogue with the country and chair the meetings that finalize project design. Figure A3.1 conveys the roles and responsibilities of key staff positions in the project cycle.
Figure A3.1 The project cycle: roles and responsibilities

One straightforward approach to measuring TTL "quality" is the average outcome rating of the projects the TTL has previously led as supervisory leader (Denizer et al., 2013).[2] In those studies, the TTL quality measure attached to a given project is the mean outcome of all other projects ever supervised by the same TTL (i.e., the leave-out mean). An alternative measure of staff quality is an estimate of the "value added" by a particular staff member to project outcomes, which attempts to capture the contribution of the TTL, PM, or CD to project quality over and above other factors such as the country's CPIA score. In this study, the TTL predicted performance (the "value-added" measure) for a given project is estimated as the "average value added to other projects", that is, the average difference between the actual and predicted outcomes of the other projects on which the same TTL ever served. This follows the teacher value-added framework (Hanushek, 1971; Chetty et al., 2014), which has also been used to study the effects of managers in firms (Bertrand and Schoar, 2003). A project's predicted outcome is obtained from a linear regression with the IEG outcome rating (on the 6-point scale) as the dependent variable and CPIA, GDP per capita, population size, project size, practice group, and country fixed effects as the independent variables. When calculating a supervision TTL's value added, it makes sense to give more weight to the outcomes of projects that the TTL supervised for longer. The supervision TTL predicted performance is therefore weighted by the number of ISRs that the TTL reported for each project.

[2] Moll et al. (2015) found an analogous result for Development Policy Financing, an alternative mode of World Bank development financing.
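The two-step value-added calculation described above can be sketched in Python with NumPy. The first step fits a linear model of the IEG rating and keeps the residual (actual minus predicted rating); the second step averages, for each project, the residuals of the other projects led by the same TTL, optionally weighting each by the number of ISRs that TTL filed. Passing no weights gives the plain leave-out mean, which is how this sketch would treat the preparation TTL, since the annex describes ISR weighting only for supervision TTLs. The function names and toy numbers are hypothetical, and the sketch omits details the annex does not specify, such as the handling of projects with missing covariates.

```python
import numpy as np


def first_step_residuals(y, X):
    """Residuals y - y_hat from an OLS fit of y on X plus an intercept.

    X is assumed to already contain the covariates and any dummy columns
    (practice, country) standing in for fixed effects."""
    y = np.asarray(y, dtype=float)
    design = np.column_stack([np.ones(len(y)), np.asarray(X, dtype=float)])
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    return y - design @ beta


def leave_out_value_added(ttl_ids, residuals, weights=None):
    """For each project, the weighted mean residual of the OTHER projects led
    by the same TTL (NaN when the TTL led no other project)."""
    ttl_ids = np.asarray(ttl_ids)
    residuals = np.asarray(residuals, dtype=float)
    if weights is None:
        weights = np.ones_like(residuals)  # unweighted leave-out mean
    weights = np.asarray(weights, dtype=float)
    value_added = np.full(len(residuals), np.nan)
    for ttl in np.unique(ttl_ids):
        idx = np.flatnonzero(ttl_ids == ttl)
        total_w = weights[idx].sum()
        total_wr = (weights[idx] * residuals[idx]).sum()
        for i in idx:
            other_w = total_w - weights[i]  # drop the project itself
            if other_w > 0:
                value_added[i] = (total_wr - weights[i] * residuals[i]) / other_w
    return value_added


if __name__ == "__main__":
    ttl = ["A", "A", "A", "B", "B"]
    resid = [0.5, -0.5, 1.0, 0.2, -0.2]
    isr_counts = [2, 6, 2, 3, 1]  # hypothetical ISRs filed by the TTL per project
    print(leave_out_value_added(ttl, resid, isr_counts))
```

With these toy inputs, the first project's value added is the ISR-weighted mean residual of the other two projects led by TTL "A": (6 × (-0.5) + 2 × 1.0) / 8 = -0.125. Using `np.linalg.lstsq` rather than an explicit matrix inverse keeps the first-step residuals well defined even when a full set of country dummies is collinear with the intercept.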
This approach generates a wide dispersion of estimates of TTL quality, as shown in Figure A3.2a, which presents the distributions of the three value-added measures: the staff quality of the project's preparation TTL, of the supervision TTL during the first half of the project, and of the supervision TTL during the second half. For practice managers and country directors, the larger average number of associated projects allows a more robust estimate of staff value added. The inferential challenge in isolating the influence of the PM or CD from other important factors is that the presence of managerial staff largely coincides with the country of operation, and therefore with all the country-level factors that influence project quality independently of the PM or CD. In this study, the value-added measure for a PM or CD is therefore estimated as the average difference between the actual and predicted project quality of that manager's portfolio in all other countries where they ever served. Again, predicted project quality is generated by an OLS model with the IEG outcome rating (6-point scale) as the dependent variable and CPIA, GDP per capita, population size, project size, global practice (where relevant), and country fixed effects as the independent variables. The only difference is that the first-step regression for estimating CD quality replaces the global practice dummy variables with dummies for sectors (EFI/HD/SD), within which the global practices are nested. The distributions of the PM and CD value-added measures are shown in Figure A3.2b. Tables A3.1 and A3.2 summarize the first-step regressions that generate the predicted project outcomes. As explained above, the difference between the actual and predicted project outcome is taken as the value added by a staff member to a given project.
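For PMs and CDs, the only change to the second step is which projects enter the average: those the same manager oversaw in other countries. A minimal sketch, assuming first-step residuals have already been computed (for example, from specifications like those in Tables A3.1 and A3.2); the function name, manager IDs, country codes, and residuals below are invented for illustration.

```python
import numpy as np


def other_country_value_added(manager_ids, country_ids, residuals):
    """For each project, the mean first-step residual of the same manager's
    projects in OTHER countries (NaN if the manager served in no other country).

    Excluding the project's own country keeps country-level factors shared
    with the project out of the manager's score."""
    manager_ids = np.asarray(manager_ids)
    country_ids = np.asarray(country_ids)
    residuals = np.asarray(residuals, dtype=float)
    value_added = np.full(len(residuals), np.nan)
    for i in range(len(residuals)):
        mask = (manager_ids == manager_ids[i]) & (country_ids != country_ids[i])
        if mask.any():
            value_added[i] = residuals[mask].mean()
    return value_added


if __name__ == "__main__":
    managers = ["M1", "M1", "M1", "M2"]
    countries = ["KEN", "KEN", "PER", "PER"]
    resid = [0.4, 0.2, -0.3, 0.1]
    print(other_country_value_added(managers, countries, resid))
```

Here both Kenya projects of manager M1 receive -0.3 (the residual of M1's only project elsewhere), while M2, observed in a single country, receives no score. The pairwise loop is quadratic in the number of projects, which is acceptable at the scale of a few thousand projects.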
Figure A3.2a Distributions of values in TTL quality measures
Figure A3.2b Distribution of values in PM and CD quality measures

Table A3.1 First-step regression used to calculate expected IEG outcomes, from which the value-added measures of TTL and PM quality are generated
Variable: coefficient (standard error), column (1)
CPIA at approval calendar year: 0.279** (0.132)
CPIA change over project lifetime: 0.358*** (0.108)
(Log) GDP per capita PPP $ (at approval fiscal year): -0.341*** (0.086)
(Log) Population (at approval fiscal year): -0.423 (0.258)
(Log) Original commitment amount (in millions, constant 2015 $): 0.000 (0.023)
Education Practice: 0.175* (0.105)
Energy & Extractives Practice: 0.064 (0.111)
Environment & Natural Resources Practice: -0.259* (0.132)
Finance & Markets Practice: -0.044 (0.134)
Governance Practice: -0.099 (0.129)
Health, Nutrition & Population Practice: -0.018 (0.102)
Macro Economics & Fiscal Management Practice: 0.054 (0.198)
Poverty & Equity Practice: 0.824*** (0.106)
Social Protection & Labor: 0.327*** (0.107)
Social, Urban, Rural and Resilience Practice: 0.188* (0.105)
Trade & Competitiveness Practice: 0.168 (0.206)
Transport & ICT Practice: 0.448*** (0.103)
Water Practice: 0.018 (0.097)
Constant: 12.89***
Observations: 3,498
Adjusted R-squared: 0.102
Note: Robust standard errors are reported in parentheses (***p<0.01, **p<0.05, *p<0.10). Regression estimated using OLS on the full 4,348-project data set. All specifications include country fixed effects.
Table A3.2 First-step regression used to calculate expected IEG outcomes, from which the value-added measure of CD quality is generated
Variable: coefficient (standard error), column (1)
CPIA at approval calendar year: 0.267** (0.131)
CPIA change over project lifetime: 0.354*** (0.11)
(Log) GDP per capita PPP $ (at approval fiscal year): -0.282*** (0.086)
(Log) Population (at approval fiscal year): -0.416 (0.258)
(Log) Original commitment amount (in millions, constant 2015 $): 0.030 (0.024)
Human Development Sector Group: 0.171*** (0.061)
Sustainable Development Sector Group: 0.176*** (0.0634)
Constant: 12.17***
Observations: 3,498
Adjusted R-squared: 0.084
Note: Robust standard errors are reported in parentheses (***p<0.01, **p<0.05, *p<0.10). Regression estimated using OLS on the full 4,348-project data set. All specifications include country fixed effects.

Annex 4 – Alternative specifications of main results

Table A4.4a Relationship between country-level characteristics and IEG outcome rating, ordered probit
Dependent variable: IEG 1-6 outcome rating; specifications (1)-(7)
CPIA rating at approval year: 0.167** (0.075); 0.437*** (0.080); 0.425*** (0.094)
CPIA change over project lifetime: 0.343*** (0.085); 0.630*** (0.106); 0.580*** (0.117)
Log(GDPpc PPP$ at approval year): -0.074 (0.060); -0.052 (0.059); -0.115* (0.056)
Real GDPpc growth over project lifetime: 1.856*** (0.712); 1.905** (0.700); 1.486 (0.926)
Observations: (1) 2,790; (2) 2,767; (3) 2,767; (4) 2,735; (5) 2,665; (6) 2,580; (7) 2,516
Pseudo R-squared: (1) 0.070; (2) 0.073; (3) 0.081; (4) 0.069; (5) 0.070; (6) 0.070; (7) 0.079
Note: Robust standard errors clustered at the country level are reported in parentheses (***p<0.01, **p<0.05, *p<0.10). All regressions are estimated using ordered probit, controlling for region, sector, IBRD/IDA, original commitment amount of project, original length of project and IEG evaluator fixed effects.
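The ordered-probit tables treat the six IEG ratings as ordered categories generated by a latent index: with linear index xβ and increasing cutpoints c_1 < ... < c_5, P(rating = k) = Φ(c_k - xβ) - Φ(c_{k-1} - xβ), where c_0 = -∞ and c_6 = +∞. Coefficients therefore shift the latent index rather than the rating itself, so their magnitudes are not directly comparable with the OLS estimates in the other tables. The Python sketch below computes these category probabilities; the cutpoints are hypothetical, not estimates from the paper. The binary probit used in Table A4.4b is the special case with a single cutpoint.

```python
import math


def std_normal_cdf(z: float) -> float:
    """Standard normal CDF via the error function (handles +/- infinity)."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def ordered_probit_probs(xb, cutpoints):
    """Probabilities of categories 1..K given linear index xb and K-1
    strictly increasing cutpoints."""
    if any(b <= a for a, b in zip(cutpoints, cutpoints[1:])):
        raise ValueError("cutpoints must be strictly increasing")
    bounds = [-math.inf, *cutpoints, math.inf]
    return [std_normal_cdf(hi - xb) - std_normal_cdf(lo - xb)
            for lo, hi in zip(bounds, bounds[1:])]


if __name__ == "__main__":
    cuts = [-2.0, -1.0, 0.0, 1.0, 2.0]  # hypothetical cutpoints for a 6-point scale
    for xb in (0.0, 0.5):
        probs = ordered_probit_probs(xb, cuts)
        # Ratings 4-6 correspond to the last three categories.
        print(xb, [round(p, 3) for p in probs], "P(rating >= 4) =", round(sum(probs[3:]), 3))
```

Raising the index from 0 to 0.5 moves probability mass toward the upper ratings, which is how a positive coefficient such as the one on CPIA should be read in these tables.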
Table A4.4b Relationship between country-level characteristics and binary IEG outcome rating, probit
Dependent variable: binary IEG outcome rating (Satisfactory = 1, Non-satisfactory = 0); specifications (1)-(7)
CPIA rating at approval year: 0.190** (0.083); 0.482*** (0.099); 0.422*** (0.109)
CPIA change over project lifetime: 0.355*** (0.102); 0.673*** (0.126); 0.593*** (0.135)
Log(GDPpc PPP$ at approval year): -0.041 (0.060); -0.011 (0.061); -0.057 (0.060)
Real GDPpc growth over project lifetime: 2.346** (1.195); 2.452** (1.209); 2.363** (1.110)
Observations: (1) 2,593; (2) 2,572; (3) 2,572; (4) 2,528; (5) 2,475; (6) 2,386; (7) 2,331
Pseudo R-squared: (1) 0.082; (2) 0.086; (3) 0.099; (4) 0.080; (5) 0.084; (6) 0.085; (7) 0.101
Note: Robust standard errors clustered at the country level are reported in parentheses (***p<0.01, **p<0.05, *p<0.10). All regressions are estimated using probit, controlling for region, sector, IBRD/IDA, original commitment amount of project, original length of project and IEG evaluator fixed effects.

Table A4.4c Relationship between country-level characteristics and IEG outcome rating, weighted by original commitment (in millions, constant 2015 $)
Dependent variable: IEG 1-6 outcome rating; specifications (1)-(7)
CPIA rating at approval year: 0.160* (0.085); 0.425*** (0.117); 0.346*** (0.116)
CPIA change over project lifetime: 0.241** (0.100); 0.505*** (0.150); 0.374*** (0.147)
Log(GDPpc PPP$ at approval year): -0.072 (0.091); 0.016 (0.087); -0.014 (0.081)
Real GDPpc growth over project lifetime: 3.715*** (1.295); 3.992*** (1.374); 3.656*** (1.254)
Constant: (1) 3.122*** (0.474); (2) 3.848*** (0.370); (3) 2.340*** (0.458); (4) 3.733*** (0.700); (5) 3.631*** (0.355); (6) 3.097*** (0.663); (7) 2.326*** (0.721)
Observations: (1) 2,790; (2) 2,767; (3) 2,767; (4) 2,735; (5) 2,665; (6) 2,580; (7) 2,516
Adjusted R-squared: (1) 0.187; (2) 0.193; (3) 0.208; (4) 0.188; (5) 0.197; (6) 0.203; (7) 0.222
Note: Robust standard errors clustered at the country level are reported in parentheses (***p<0.01, **p<0.05, *p<0.10).
All regressions are estimated using OLS, weighting by original commitment amount, and controlling for region, sector, IBRD/IDA, original commitment amount of project, original length of project and IEG evaluator fixed effects.

Table A4.4d Relationship between country-level characteristics and IEG outcome rating, mean imputation for missingness
Dependent variable: IEG 1-6 outcome rating; specifications (1)-(7)
CPIA rating at approval year: 0.157** (0.713); 0.406*** (0.077); 0.426*** (0.080)
CPIA change over project lifetime: 0.320*** (0.083); 0.582*** (0.106); 0.580*** (0.108)
Log(GDPpc PPP$ at approval year): -0.075 (0.053); -0.0694 (0.052); -0.116** (0.048)
Real GDPpc growth over project lifetime: 1.380* (0.734); 1.331* (0.719); 0.586 (0.425)
Constant: (1) 2.701*** (0.395); (2) 3.386*** (0.270); (3) 2.048*** (0.337); (4) 3.820*** (0.499); (5) 3.272*** (0.313); (6) 3.789*** (0.490); (7) 2.848*** (0.457)
Observations: 2,837 in each of columns (1)-(7)
Adjusted R-squared: (1) 0.080; (2) 0.087; (3) 0.106; (4) 0.077; (5) 0.078; (6) 0.079; (7) 0.109
Note: Robust standard errors clustered at the country level are reported in parentheses (***p<0.01, **p<0.05, *p<0.10). All regressions are estimated using OLS, controlling for region, sector, IBRD/IDA, original commitment amount of project, original length of project and IEG evaluator fixed effects. Mean imputation used for missing values of CPIA and GDP measures.
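Mean imputation, as used in Table A4.4d, replaces each missing value with the mean of the observed values so that every project stays in the sample, which is why each column reports the same 2,837 observations. A minimal NumPy sketch with invented CPIA values; the helper name `mean_impute` is hypothetical.

```python
import numpy as np


def mean_impute(column):
    """Replace NaNs with the mean of the observed values in the column."""
    col = np.asarray(column, dtype=float)
    missing = np.isnan(col)
    if missing.all():
        raise ValueError("cannot impute a column with no observed values")
    return np.where(missing, col[~missing].mean(), col)


if __name__ == "__main__":
    cpia = [3.5, np.nan, 4.0, np.nan, 3.0]  # invented CPIA scores with gaps
    filled = mean_impute(cpia)
    print(filled)
    print("observed std:", np.nanstd(cpia), "imputed std:", filled.std())
```

One side effect worth keeping in mind when comparing Table A4.4d with the complete-case tables: because every filled-in value sits exactly at the mean, the imputed regressor has less variance than the observed one.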
Table A4.4e Relationship between country-level characteristics and IEG outcome rating, results for IBRD/IDA
Dependent variable: IEG 1-6 outcome rating; specifications (1)-(7), by lending source

IBRD projects
CPIA rating at approval year: 0.159 (0.107); 0.423*** (0.112); 0.390*** (0.140)
CPIA change over project lifetime: 0.338*** (0.122); 0.611*** (0.154); 0.514*** (0.174)
Log(GDPpc PPP$ at approval year): -0.104 (0.152); -0.005 (0.140); -0.142 (0.135)
Real GDPpc growth over project lifetime: 4.488*** (1.413); 4.502*** (1.389); 1.800 (1.427)
Constant: (1) 3.924*** (0.504); (2) 4.525*** (0.416); (3) 3.083*** (0.452); (4) 5.407*** (1.308); (5) 4.499*** (0.400); (6) 4.600*** (1.190); (7) 4.504*** (1.086)
Observations: (1) 1,247; (2) 1,236; (3) 1,236; (4) 1,198; (5) 1,195; (6) 1,143; (7) 2,837
Adjusted R-squared: (1) 0.073; (2) 0.084; (3) 0.107; (4) 0.077; (5) 0.070; (6) 0.077; (7) 0.109

IDA projects
CPIA rating at approval year: 0.169* (0.090); 0.414*** (0.108); 0.439*** (0.134)
CPIA change over project lifetime: 0.323** (0.126); 0.572*** (0.157); 0.572*** (0.169)
Log(GDPpc PPP$ at approval year): -0.080 (0.067); -0.091 (0.071); -0.112 (0.068)
Real GDPpc growth over project lifetime: 0.944* (0.529); 1.087* (0.547); 0.658 (1.330)
Constant: (1) 2.504*** (0.460); (2) 3.179*** (0.304); (3) 1.901*** (0.442); (4) 3.300*** (0.556); (5) 3.085*** (0.350); (6) 3.476*** (0.584); (7) 2.537*** (0.635)
Observations: (1) 1,543; (2) 1,531; (3) 1,531; (4) 1,537; (5) 1,470; (6) 1,437; (7) 1,386
Adjusted R-squared: (1) 0.067; (2) 0.073; (3) 0.089; (4) 0.059; (5) 0.065; (6) 0.061; (7) 0.080
Note: Robust standard errors clustered at the country level are reported in parentheses (***p<0.01, **p<0.05, *p<0.10). All regressions are estimated using OLS, controlling for region, sector, original commitment amount of project, original length of project and IEG evaluator fixed effects.
Table A4.5a Relationship between staffing predicted performance and IEG outcome rating, ordered probit
Dependent variable: IEG 1-6 outcome rating; specifications (1)-(8)
Preparation TTL: 0.104*** (0.027); 0.055* (0.030); 0.069* (0.040); 0.087** (0.044)
1st half supervision TTL: 0.168*** (0.030); 0.051 (0.040); -0.002 (0.052); 0.005 (0.056)
2nd half supervision TTL: 0.189*** (0.030); 0.150*** (0.040); 0.156*** (0.054); 0.161*** (0.058)
Practice Manager (1): 0.0637 (0.042); 0.0373 (0.053); 0.096 (0.061)
Country Director (1): 0.0117 (0.044); -0.132** (0.067)
Observations: (1) 2,038; (2) 2,560; (3) 2,513; (4) 1,946; (5) 1,723; (6) 1,222; (7) 2,540; (8) 1,129
Pseudo R-squared: (1) 0.095; (2) 0.087; (3) 0.088; (4) 0.101; (5) 0.102; (6) 0.120; (7) 0.084; (8) 0.130
Note: Robust standard errors clustered at the country level are reported in parentheses (***p<0.01, **p<0.05, *p<0.10). All regressions are estimated using ordered probit, controlling for region, sector, IBRD/IDA, original commitment amount of project, original length of project, CPIA score of project country at approval, change in the CPIA score over project lifetime and IEG evaluator fixed effects. (1) Practice Manager and Country Director at project preparation.

Table A4.5b Relationship between staffing predicted performance and binary IEG outcome rating, probit
Dependent variable: binary IEG outcome rating (Satisfactory = 1, Non-satisfactory = 0); specifications (1)-(8)
Preparation TTL: 0.096*** (0.034); 0.056 (0.040); 0.105** (0.053); 0.140** (0.059)
1st half supervision TTL: 0.143*** (0.037); 0.025 (0.056); -0.101 (0.076); -0.010 (0.081)
2nd half supervision TTL: 0.190*** (0.037); 0.173*** (0.056); 0.222*** (0.073); 0.220*** (0.079)
Practice Manager (1): 0.063 (0.050); 0.039 (0.060); 0.101 (0.069)
Country Director (1): -0.004 (0.056); -0.104 (0.081)
Observations: (1) 1,851; (2) 2,366; (3) 2,316; (4) 1,767; (5) 1,543; (6) 1,069; (7) 2,355; (8) 985
Pseudo R-squared: (1) 0.130; (2) 0.109; (3) 0.111; (4) 0.138; (5) 0.125; (6) 0.174; (7) 0.105; (8) 0.195
Note: Robust standard errors clustered at the country level are reported in parentheses (***p<0.01, **p<0.05, *p<0.10).
All regressions are estimated using probit, controlling for region, sector, IBRD/IDA, original commitment amount of project, original length of project, CPIA score of project country at approval, change in the CPIA score over project lifetime and IEG evaluator fixed effects. (1) Practice Manager and Country Director at project preparation.

Table A4.5c Relationship between staffing predicted performance and IEG outcome rating, weighted by original commitment (in millions, constant 2015 $)
Dependent variable: IEG 1-6 outcome rating; specifications (1)-(8)
Preparation TTL: 0.122*** (0.039); 0.083** (0.042); 0.104** (0.052); 0.135** (0.057)
1st half supervision TTL: 0.200*** (0.036); 0.061 (0.053); 0.051 (0.069); 0.048 (0.058)
2nd half supervision TTL: 0.198*** (0.030); 0.136** (0.053); 0.143*** (0.048); 0.144*** (0.046)
Practice Manager (1): 0.061 (0.064); 0.065 (0.064); 0.148** (0.059)
Country Director (1): -0.043 (0.066); -0.127* (0.074)
Constant: (1) 2.802*** (0.363); (2) 2.440*** (0.414); (3) 2.387*** (0.394); (4) 3.137*** (0.344); (5) 2.227*** (0.477); (6) 2.671*** (0.517); (7) 1.924*** (0.356); (8) 2.474*** (0.581)
Observations: (1) 2,038; (2) 2,560; (3) 2,513; (4) 1,946; (5) 1,723; (6) 1,222; (7) 2,540; (8) 1,129
Adjusted R-squared: (1) 0.253; (2) 0.238; (3) 0.237; (4) 0.271; (5) 0.224; (6) 0.303; (7) 0.219; (8) 0.338
Note: Robust standard errors clustered at the country level are reported in parentheses (***p<0.01, **p<0.05, *p<0.10). All regressions are estimated using OLS, weighted by original commitment amount and controlling for region, sector, IBRD/IDA, original commitment amount of project, original length of project, CPIA score of project country at approval, change in the CPIA score over project lifetime and IEG evaluator fixed effects. (1) Practice Manager and Country Director at project preparation.
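Weighting by original commitment, as in Tables A4.4c and A4.5c, means choosing coefficients to minimize Σ w_i (y_i - x_i'β)² with w_i equal to the project's commitment, so that large projects dominate the fit. A common way to compute this is to rescale each row by √w_i and run ordinary least squares, sketched below with NumPy. The function name and inputs are hypothetical, and the sketch returns point estimates only; it does not reproduce the country-clustered robust standard errors reported in the tables.

```python
import numpy as np


def wls(y, X, weights):
    """Weighted least squares with an intercept.

    Minimizes sum_i w_i * (y_i - a - x_i @ b)**2 by scaling each row of the
    design matrix and the outcome by sqrt(w_i) and solving ordinary least
    squares. Returns [a, b_1, ..., b_k]."""
    y = np.asarray(y, dtype=float)
    design = np.column_stack([np.ones(len(y)), np.asarray(X, dtype=float)])
    sqrt_w = np.sqrt(np.asarray(weights, dtype=float))
    beta, *_ = np.linalg.lstsq(design * sqrt_w[:, None], y * sqrt_w, rcond=None)
    return beta


if __name__ == "__main__":
    rating = [4.0, 5.0, 3.0, 4.0]
    cpia = [3.2, 3.8, 2.9, 3.5]
    commitment = [10.0, 250.0, 5.0, 40.0]  # hypothetical, US$ millions
    print(wls(rating, cpia, commitment))
```

Multiplying all weights by the same constant leaves the estimates unchanged, so only relative project size matters; a project with zero weight drops out of the fit entirely.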
Table A4.5d Relationship between staffing predicted performance and IEG outcome rating, mean imputation for missingness
Dependent variable: IEG 1-6 outcome rating; specifications (1)-(8)
Preparation TTL: 0.096*** (0.023); 0.052** (0.025); 0.051** (0.025); 0.051** (0.025)
1st half supervision TTL: 0.148*** (0.027); 0.067* (0.034); 0.068** (0.034); 0.068** (0.034)
2nd half supervision TTL: 0.158*** (0.029); 0.103*** (0.036); 0.101*** (0.036); 0.101*** (0.036)
Practice Manager (1): 0.061 (0.037); 0.051 (0.039); 0.052 (0.039)
Country Director (1): 0.010 (0.044); -0.009 (0.042)
Constant: (1) 2.119*** (0.339); (2) 2.332*** (0.313); (3) 2.408*** (0.302); (4) 2.444*** (0.304); (5) 2.093*** (0.335); (6) 2.472*** (0.301); (7) 2.066*** (0.346); (8) 2.465*** (0.309)
Observations: 2,767 in each of columns (1)-(8)
Adjusted R-squared: (1) 0.112; (2) 0.118; (3) 0.119; (4) 0.123; (5) 0.106; (6) 0.123; (7) 0.105; (8) 0.123
Note: Robust standard errors clustered at the country level are reported in parentheses (***p<0.01, **p<0.05, *p<0.10). All regressions are estimated using OLS, controlling for region, sector, IBRD/IDA, original commitment amount of project, original length of project, CPIA score of project country at approval, change in the CPIA score over project lifetime and IEG evaluator fixed effects. Mean imputation used for missing values of staffing predicted performance measures. (1) Practice Manager and Country Director at project preparation.
Table A4.5e Relationship between staffing predicted performance and IEG outcome rating, results for IBRD/IDA
Dependent variable: IEG 1-6 outcome rating; specifications (1)-(8), by lending source

IBRD projects
Preparation TTL: 0.158*** (0.041); 0.151*** (0.047); 0.146* (0.079); 0.166** (0.081)
1st half supervision TTL: 0.187*** (0.046); -0.033 (0.063); -0.086 (0.089); -0.080 (0.098)
2nd half supervision TTL: 0.191*** (0.036); 0.152*** (0.051); 0.177** (0.083); 0.170* (0.089)
Practice Manager (1): -0.025 (0.082); -0.052 (0.094); -0.017 (0.121)
Country Director (1): -0.003 (0.072); -0.099 (0.122)
Constant: (1) 3.198*** (0.512); (2) 3.431*** (0.473); (3) 3.315*** (0.468); (4) 3.276*** (0.509); (5) 2.269*** (0.767); (6) 1.959** (0.807); (7) 2.906*** (0.415); (8) 1.750** (0.857)
Observations: (1) 942; (2) 1,144; (3) 1,129; (4) 904; (5) 644; (6) 470; (7) 1,137; (8) 432
Adjusted R-squared: (1) 0.158; (2) 0.142; (3) 0.135; (4) 0.172; (5) 0.178; (6) 0.236; (7) 0.105; (8) 0.220

IDA projects
Preparation TTL: -0.007 (0.036); -0.060 (0.042); -0.064 (0.048); -0.045 (0.055)
1st half supervision TTL: 0.088** (0.041); 0.073 (0.056); 0.043 (0.070); 0.068 (0.070)
2nd half supervision TTL: 0.127** (0.049); 0.101 (0.063); 0.102 (0.068); 0.080 (0.074)
Practice Manager (1): 0.083 (0.053); 0.099 (0.069); 0.114 (0.070)
Country Director (1): 0.036 (0.050); -0.044 (0.075)
Constant: (1) 1.853*** (0.516); (2) 1.877*** (0.399); (3) 1.953*** (0.407); (4) 2.130*** (0.552); (5) 1.903*** (0.497); (6) 2.710*** (0.706); (7) 1.797*** (0.394); (8) 2.580*** (0.740)
Observations: (1) 1,096; (2) 1,416; (3) 1,384; (4) 1,042; (5) 1,079; (6) 752; (7) 1,403; (8) 697
Adjusted R-squared: (1) 0.114; (2) 0.098; (3) 0.097; (4) 0.119; (5) 0.100; (6) 0.101; (7) 0.096; (8) 0.126
Note: Robust standard errors clustered at the country level are reported in parentheses (***p<0.01, **p<0.05, *p<0.10). All regressions are estimated using OLS, controlling for region, sector, original commitment amount of project, original length of project and IEG evaluator fixed effects. (1) Practice Manager and Country Director at project preparation.
Table A4.6a Relationship between Task Team Leader (TTL) turnover and IEG outcome rating, ordered probit
Dependent variable: IEG 1-6 outcome rating; specifications (1)-(4)
Dummy for preparation TTL change in the 1st or 2nd ISR: -0.056 (0.046); -0.020 (0.047)
TTL turnover during first half (number of TTLs/number of ISRs): -0.412*** (0.158); -0.305* (0.170)
TTL turnover during second half (number of TTLs/number of ISRs): -0.614*** (0.182); -0.623*** (0.176)
Observations: (1) 2,763; (2) 2,762; (3) 2,757; (4) 2,752
Pseudo R-squared: (1) 0.081; (2) 0.082; (3) 0.082; (4) 0.083
Note: Robust standard errors clustered at the country level are reported in parentheses (***p<0.01, **p<0.05, *p<0.10). All regressions are estimated using ordered probit, controlling for region, sector, IBRD/IDA, original commitment amount of project, original length of project, CPIA score of project country at approval, change in the CPIA score over project lifetime and IEG evaluator fixed effects.

Table A4.6b Relationship between Task Team Leader (TTL) turnover and IEG outcome rating, probit
Dependent variable: binary IEG outcome rating (Satisfactory = 1, Non-satisfactory = 0); specifications (1)-(4)
Dummy for preparation TTL change in the 1st or 2nd ISR: -0.116* (0.062); -0.079 (0.063)
TTL turnover during first half (number of TTLs/number of ISRs): -0.530*** (0.204); -0.382* (0.211)
TTL turnover during second half (number of TTLs/number of ISRs): -0.614*** (0.209); -0.682*** (0.203)
Observations: (1) 2,568; (2) 2,567; (3) 2,563; (4) 2,558
Pseudo R-squared: (1) 0.101; (2) 0.101; (3) 0.102; (4) 0.105
Note: Robust standard errors clustered at the country level are reported in parentheses (***p<0.01, **p<0.05, *p<0.10). All regressions are estimated using probit, controlling for region, sector, IBRD/IDA, original commitment amount of project, original length of project, CPIA score of project country at approval, change in the CPIA score over project lifetime and IEG evaluator fixed effects.
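The turnover regressors in Tables A4.6a to A4.6e divide the number of TTLs by the number of ISRs in each half of the project. The annex does not spell out how the ISR sequence is split or whether "number of TTLs" counts distinct individuals, so the Python sketch below assumes distinct TTLs and assigns the middle ISR to the first half when the count is odd. It also shows one possible reading of the early-change dummy (the preparation TTL differs from the TTL on either of the first two ISRs). All function names and inputs are hypothetical.

```python
def turnover_by_half(isr_ttls):
    """(distinct TTLs / ISRs) for the first and second half of a project's
    chronologically ordered ISR sequence; NaN for an empty half."""
    n = len(isr_ttls)
    if n == 0:
        raise ValueError("project has no ISRs")
    mid = (n + 1) // 2  # assumption: odd-length sequences give the extra ISR to the first half
    halves = (isr_ttls[:mid], isr_ttls[mid:])
    return tuple(len(set(h)) / len(h) if h else float("nan") for h in halves)


def prep_ttl_changed_early(prep_ttl, isr_ttls):
    """1 if the TTL on the 1st or 2nd ISR differs from the preparation TTL, else 0."""
    return int(any(ttl != prep_ttl for ttl in isr_ttls[:2]))


if __name__ == "__main__":
    isrs = ["A", "A", "B", "B", "B", "C"]  # TTL signing each ISR, in order
    print(turnover_by_half(isrs))
    print(prep_ttl_changed_early("A", isrs))
```

Under this definition a project with a single, unchanged TTL still scores 1 divided by the number of ISRs in that half rather than zero, so the measure also falls mechanically as ISR counts rise.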
Table A4.6c Relationship between Task Team Leader (TTL) turnover and IEG outcome rating, weighted by original commitment (in millions, constant 2015 $)
Dependent variable: IEG 1-6 outcome rating; specifications (1)-(4)
Dummy for preparation TTL change in the 1st or 2nd ISR: -0.049 (0.069); -0.0245 (0.068)
TTL turnover during first half (number of TTLs/number of ISRs): -0.253 (0.171); -0.130 (0.175)
TTL turnover during second half (number of TTLs/number of ISRs): -0.789*** (0.227); -0.782*** (0.226)
Constant: (1) 2.344*** (0.455); (2) 2.433*** (0.453); (3) 2.867*** (0.542); (4) 2.909*** (0.534)
Observations: (1) 2,763; (2) 2,762; (3) 2,757; (4) 2,761
Adjusted R-squared: (1) 0.208; (2) 0.209; (3) 0.218; (4) 0.208
Note: Robust standard errors clustered at the country level are reported in parentheses (***p<0.01, **p<0.05, *p<0.10). All regressions are estimated using OLS, weighted by original commitment, and controlling for region, sector, IBRD/IDA, original commitment amount of project, original length of project, CPIA score of project country at approval, change in the CPIA score over project lifetime and IEG evaluator fixed effects.

Table A4.6d Relationship between Task Team Leader (TTL) turnover and IEG outcome rating, mean imputation for missingness
Dependent variable: IEG 1-6 outcome rating; specifications (1)-(4)
Dummy for preparation TTL change in the 1st or 2nd ISR: -0.055 (0.044); -0.028 (0.045)
TTL turnover during first half (number of TTLs/number of ISRs): -0.418*** (0.153); -0.336** (0.159)
TTL turnover during second half (number of TTLs/number of ISRs): -0.587*** (0.169); -0.548*** (0.169)
Constant: (1) 2.088*** (0.333); (2) 2.248*** (0.334); (3) 2.504*** (0.359); (4) 2.644*** (0.352)
Observations: (1) 2,763; (2) 2,762; (3) 2,757; (4) 2,761
Adjusted R-squared: (1) 0.208; (2) 0.209; (3) 0.218; (4) 0.208
Note: Robust standard errors clustered at the country level are reported in parentheses (***p<0.01, **p<0.05, *p<0.10).
All regressions are estimated using OLS, and controlling for region, sector, IBRD/IDA, original commitment amount of project, original length of project, CPIA score of project country at approval, change in the CPIA score over project lifetime and IEG evaluator fixed effects. Mean imputation used for missing values of TTL turnover measures. 55 Table A4.6e Relationship between Task Team Leader (TTL) turnover and IEG outcome rating, IBRD/IDA results Dependent variable: IEG 1-6 outcome rating By lending source IBRD IBRD IBRD IBRD Dummy for preparation TTL change -0.0573 -0.005 in the 1st or 2nd ISR (0.085) (0.087) TTL turnover during first half (Number of -0.682** -0.495* TTLs/Number of ISRs) (0.258) (0.263) TTL turnover during second half (Number of -1.047*** -1.028*** TTLs/Number of ISRs) (0.273) (0.277) 3.924*** 4.525*** 3.083*** 3.355*** Constant (0.504) (0.416) (0.452) (0.470) Observations 1,235 1,235 1,230 1,229 Adjusted R-squared 0.106 0.112 0.124 0.126 Dependent variable: IEG 1-6 outcome rating By lending source IDA IDA IDA IDA Dummy for preparation TTL change -0.069 -0.053 in the 1st or 2nd ISR (0.053) (0.053) TTL turnover during first half (Number of -0.173 -0.116 TTLs/Number of ISRs) (0.210) (0.217) TTL turnover during second half (Number of -0.167 -0.189 TTLs/Number of ISRs) (0.264) (0.240) 3.924*** 4.525*** 3.083*** 2.119*** Constant (0.504) (0.416) (0.452) (0.501) Observations 1,528 1,527 1,527 1,537 Adjusted R-squared 0.093 0.092 0.086 0.059 Note: Robust standard errors clustered at the country level are reported in parentheses (***p<0.01, **p<0.05, *p<0.10). All regressions are estimated using OLS, and controlling for region, sector, original commitment amount of project, original length of project and IEG evaluator fixed effects. 
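Every table in this annex reports standard errors clustered at the country level. For the OLS specifications, the cluster-robust "sandwich" estimator can be sketched for a single regressor plus an intercept, as below. This is an illustration only: controls and fixed effects are omitted, the function names are hypothetical, and the finite-sample factor G/(G-1) * (N-1)/(N-K) is the common Stata-style adjustment, assumed here rather than taken from the paper.

```python
import math
from collections import defaultdict
from typing import Dict, Hashable, List, Sequence, Tuple

Matrix = List[List[float]]


def _inv2(m: Matrix) -> Matrix:
    """Inverse of a 2x2 matrix."""
    det = m[0][0] * m[1][1] - m[0][1] * m[1][0]
    return [[m[1][1] / det, -m[0][1] / det], [-m[1][0] / det, m[0][0] / det]]


def _mul2(a: Matrix, b: Matrix) -> Matrix:
    """Product of two 2x2 matrices."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)] for i in range(2)]


def ols_cluster_se(
    y: Sequence[float], x: Sequence[float], clusters: Sequence[Hashable]
) -> Tuple[Tuple[float, float], Tuple[float, float]]:
    """OLS of y on [1, x] with cluster-robust standard errors.

    Returns ((intercept, slope), (se_intercept, se_slope)).
    """
    n, k = len(y), 2
    xtx = [[float(n), sum(x)], [sum(x), sum(v * v for v in x)]]
    xty = [sum(y), sum(a * b for a, b in zip(x, y))]
    bread = _inv2(xtx)
    beta = [bread[i][0] * xty[0] + bread[i][1] * xty[1] for i in range(2)]

    # Sum the score vector u_i * [1, x_i] within each cluster (e.g., country).
    scores: Dict[Hashable, List[float]] = defaultdict(lambda: [0.0, 0.0])
    for yi, xi, g in zip(y, x, clusters):
        u = yi - beta[0] - beta[1] * xi
        scores[g][0] += u
        scores[g][1] += u * xi

    # "Meat" of the sandwich: sum over clusters of s_g s_g'.
    meat = [[0.0, 0.0], [0.0, 0.0]]
    for s in scores.values():
        for i in range(2):
            for j in range(2):
                meat[i][j] += s[i] * s[j]

    g = len(scores)
    if g < 2:
        raise ValueError("need at least two clusters")
    # ASSUMPTION: Stata-style finite-sample adjustment.
    c = g / (g - 1) * (n - 1) / (n - k)
    v = _mul2(_mul2(bread, meat), bread)
    return (beta[0], beta[1]), (math.sqrt(c * v[0][0]), math.sqrt(c * v[1][1]))
```

Clustering allows residuals to be correlated among projects in the same country, which is why the variance is built from per-country score sums rather than from individual residuals.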
Table A4.7a. Relationship between staff experience and IEG outcome rating, ordered probit
Dependent variable: IEG 1-6 outcome rating

                                        (1)        (2)        (3)
Number of projects prepared before      -0.018*
  by the preparation TTL                (0.011)
(Log) Number of ISRs signed before                 0.008
  by the mid-point supervision TTL                 (0.022)
(Log) Number of ISRs signed before                            0.040*
  by the final supervision TTL                                (0.024)
Observations                            2,767      2,762      2,757
Pseudo R-squared                        0.081      0.081      0.08

Note: Robust standard errors clustered at the country level are reported in parentheses (***p<0.01, **p<0.05, *p<0.10). All regressions are estimated using ordered probit, with controls for region, sector, IBRD/IDA, original commitment amount of the project, original length of the project, CPIA score of the project country at approval, change in the CPIA score over the project lifetime, and IEG evaluator fixed effects.

Table A4.7b. Relationship between staff experience and IEG outcome rating, probit
Dependent variable: Binary IEG outcome rating (Satisfactory=1, Non-satisfactory=0)

                                        (1)        (2)        (3)
Number of projects prepared before      -0.004
  by the preparation TTL                (0.015)
(Log) Number of ISRs signed before                 -0.016
  by the mid-point supervision TTL                 (0.030)
(Log) Number of ISRs signed before                            0.035
  by the final supervision TTL                                (0.028)
Observations                            2,572      2,567      2,563
Pseudo R-squared                        0.099      0.099      0.099

Note: Robust standard errors clustered at the country level are reported in parentheses (***p<0.01, **p<0.05, *p<0.10). All regressions are estimated using probit, with controls for region, sector, IBRD/IDA, original commitment amount of the project, original length of the project, CPIA score of the project country at approval, change in the CPIA score over the project lifetime, and IEG evaluator fixed effects.
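Tables A4.6a and A4.7a model the six-point IEG rating with an ordered probit. In that model a latent index x'b is compared with estimated cutpoints c_1 < ... < c_5, and P(rating = k) = Phi(c_k - x'b) - Phi(c_{k-1} - x'b), with c_0 = -infinity and c_6 = +infinity. The Python sketch below computes these category probabilities; the cutpoints used with it are hypothetical, because the tables do not report them.

```python
import math
from typing import List, Sequence


def normal_cdf(z: float) -> float:
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def ordered_probit_probs(xb: float, cutpoints: Sequence[float]) -> List[float]:
    """Return P(y = k) for k = 1..len(cutpoints) + 1.

    P(y = k) = Phi(c_k - xb) - Phi(c_{k-1} - xb), with the lowest bound
    set to -inf and the highest to +inf.
    """
    if any(b <= a for a, b in zip(cutpoints, cutpoints[1:])):
        raise ValueError("cutpoints must be strictly increasing")
    bounds = [-math.inf, *cutpoints, math.inf]
    return [normal_cdf(hi - xb) - normal_cdf(lo - xb) for lo, hi in zip(bounds, bounds[1:])]


# Hypothetical cutpoints for a six-category rating.
cuts = [-2.0, -1.0, 0.0, 1.0, 2.0]
print(ordered_probit_probs(0.0, cuts))
print(ordered_probit_probs(-0.4, cuts))
```

Lowering the latent index (as a negative coefficient does when its regressor rises) moves probability mass toward the lower rating categories, which is how the negative turnover coefficients should be read.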
Table A4.7c. Relationship between staff experience, staff predicted performance, and IEG outcome rating, weighted by original commitment (in millions, constant 2015 US$)
Dependent variable: staff predicted performance in columns (1)-(3); IEG 1-6 outcome rating in columns (4)-(6)

                                        (1)        (2)        (3)        (4)        (5)        (6)
Number of projects prepared before      0.026**                          -0.007
  by the preparation TTL                (0.013)                          (0.010)
(Log) Number of ISRs signed before                 0.072***                         0.021
  by the mid-point supervision TTL                 (0.020)                          (0.030)
(Log) Number of ISRs signed before                            0.053***                         0.083**
  by the final supervision TTL                                (0.019)                          (0.038)
Constant                                0.481***   0.300***   0.258***   2.337***   2.286***   2.105***
                                        (0.039)    (0.061)    (0.060)    (0.462)    (0.476)    (0.444)
Observations                            2,038      2,560      2,513      2,767      2,762      2,757
Adjusted R-squared                      0.001      0.004      0.003      0.208      0.208      0.214

Note: Robust standard errors clustered at the country level are reported in parentheses (***p<0.01, **p<0.05, *p<0.10). All regressions are estimated using OLS, weighted by original commitment amount. Columns (1)-(3) include no controls. Columns (4)-(6) include controls for region, sector, IBRD/IDA, original commitment amount of the project, original length of the project, CPIA score of the project country at approval, change in the CPIA score over the project lifetime, and IEG evaluator fixed effects.
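The experience regressors in Tables A4.7a to A4.7e include the log of the number of ISRs a TTL signed before the project. A hedged sketch of that count follows. It assumes a hypothetical log of (TTL identifier, signing date) pairs and uses log(1 + count), so that TTLs with no earlier ISRs receive 0 rather than an undefined log(0); the tables do not state how zero counts were handled.

```python
import math
from datetime import date
from typing import Iterable, Tuple


def log_prior_isrs(isr_log: Iterable[Tuple[str, date]], ttl_id: str, as_of: date) -> float:
    """log(1 + number of ISRs signed by ttl_id strictly before as_of).

    ASSUMPTION: isr_log is a hypothetical list of (TTL id, signing date)
    pairs covering all World Bank projects, and log1p is used to keep
    zero counts finite.
    """
    count = sum(1 for signer, signed_on in isr_log if signer == ttl_id and signed_on < as_of)
    return math.log1p(count)


history = [
    ("T1", date(2010, 1, 1)),
    ("T1", date(2011, 1, 1)),
    ("T2", date(2010, 6, 1)),
    ("T1", date(2015, 1, 1)),
]
print(log_prior_isrs(history, "T1", date(2012, 1, 1)))  # log(1 + 2)
```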
Table A4.7d. Relationship between staff experience, staff predicted performance, and IEG outcome rating, mean imputation for missingness
Dependent variable: staff predicted performance in columns (1)-(3); IEG 1-6 outcome rating in columns (4)-(6)

                                        (1)        (2)        (3)        (4)        (5)        (6)
Number of projects prepared before      0.068***                         -0.013
  by the preparation TTL                (0.014)                          (0.010)
(Log) Number of ISRs signed before                 0.062***                         0.011
  by the mid-point supervision TTL                 (0.020)                          (0.021)
(Log) Number of ISRs signed before                            0.058***                         0.041*
  by the final supervision TTL                                (0.019)                          (0.023)
Constant                                -0.105***  -0.184***  -0.230***  2.051***   2.035***   2.007***
                                        (0.039)    (0.057)    (0.060)    (0.344)    (0.345)    (0.320)
Observations                            2,038      2,560      2,513      2,767      2,767      2,767
Adjusted R-squared                      0.011      0.004      0.003      0.105      0.105      0.106

Note: Robust standard errors clustered at the country level are reported in parentheses (***p<0.01, **p<0.05, *p<0.10). All regressions are estimated using OLS, weighted by original commitment amount. Columns (1)-(3) include no controls. Columns (4)-(6) include controls for region, sector, IBRD/IDA, original commitment amount of the project, original length of the project, CPIA score of the project country at approval, change in the CPIA score over the project lifetime, and IEG evaluator fixed effects. Mean imputation is used for missing values of TTL experience.
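Table A4.7d, like Table A4.6d, fills missing values of the staff measures with the sample mean. A minimal sketch of that step follows, assuming missing values are represented as `None` and the mean is taken over all observed values; the function name is hypothetical.

```python
from statistics import mean
from typing import List, Optional, Sequence


def mean_impute(values: Sequence[Optional[float]]) -> List[float]:
    """Replace None entries with the mean of the observed entries."""
    observed = [v for v in values if v is not None]
    if not observed:
        raise ValueError("no observed values to average")
    fill = mean(observed)
    return [fill if v is None else v for v in values]


print(mean_impute([1.0, None, 3.0]))  # [1.0, 2.0, 3.0]
```

Mean imputation keeps every project in the estimation sample but compresses the variance of the imputed regressor; the table notes do not say whether a missingness indicator was also included.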
Table A4.7e. Relationship between staff experience, staff predicted performance, and IEG outcome rating, results for IBRD/IDA
Dependent variable: staff predicted performance in columns (1)-(3); IEG 1-6 outcome rating in columns (4)-(6)
By lending source: IBRD

                                        (1)        (2)        (3)        (4)        (5)        (6)
Number of projects prepared before      0.043**                          -0.003
  by the preparation TTL                (0.017)                          (0.014)
(Log) Number of ISRs signed before                 0.049*                           0.006
  by the mid-point supervision TTL                 (0.030)                          (0.039)
(Log) Number of ISRs signed before                            0.005                            0.081**
  by the final supervision TTL                                (0.028)                          (0.037)
Constant                                0.294***   0.223**    0.267***   3.077***   3.063***   2.842***
                                        (0.054)    (0.088)    (0.090)    (0.453)    (0.479)    (0.445)
Observations                            942        1,144      1,129      942        1,144      1,129
Adjusted R-squared                      0.005      0.002      -0.001     0.005      0.002      -0.001

Dependent variable: staff predicted performance in columns (1)-(3); IEG 1-6 outcome rating in columns (4)-(6)
By lending source: IDA

                                        (1)        (2)        (3)        (4)        (5)        (6)
Number of projects prepared before      0.039*                           -0.032
  by the preparation TTL                (0.023)                          (0.021)
(Log) Number of ISRs signed before                 0.006                            -0.005
  by the mid-point supervision TTL                 (0.024)                          (0.025)
(Log) Number of ISRs signed before                            0.046*                           -0.020
  by the final supervision TTL                                (0.024)                          (0.031)
Constant                                -0.356***  -0.341***  -0.472***  1.899***   1.909***   1.882***
                                        (0.049)    (0.067)    (0.073)    (0.045)    (0.444)    (0.462)
Observations                            1,096      1,416      1,384      1,531      1,527      1,527
Adjusted R-squared                      0.002      -0.001     0.002      0.090      0.092      0.086

Note: Robust standard errors clustered at the country level are reported in parentheses (***p<0.01, **p<0.05, *p<0.10). All regressions are estimated using OLS, weighted by original commitment amount. Columns (1)-(3) include no controls. Columns (4)-(6) include controls for region, sector, IBRD/IDA, original commitment amount of the project, original length of the project, CPIA score of the project country at approval, change in the CPIA score over the project lifetime, and IEG evaluator fixed effects.
Table A4.8a. Relationship between project design and IEG outcome rating, ordered probit
Dependent variable: IEG 1-6 outcome rating

Design index components
                                        (1)       (2)       (3)       (4)       (5)       (6)       (7)
Number of project components            -0.163**
                                        (0.069)
Number of project sub-components                  -0.024
                                                  (0.015)
Number of PDO indicators at entry                           -0.038*
                                                            (0.023)
Number of intermediate indicators                                     -0.030***
  at entry                                                            (0.011)
PDO rating (1 high - 4 low)                                                     0.029
                                                                                (0.145)
Results framework rating                                                                  -0.228*
  (1 high - 4 low)                                                                        (0.123)
Extent of changes to project                                                                        -0.294**
  components                                                                                        (0.129)
Observations                            120       120       120       120       120       120       120
Pseudo R-squared                        0.059     0.051     0.051     0.066     0.044     0.053     0.058

Dependent variable: IEG 1-6 outcome rating
                                        (8)       (9)       (10)      (11)
Design index                            0.230***
                                        (0.072)
Project Characteristics Index                     0.131***
  for comparison [1]                              (0.056)
CPIA at approval                                            0.290
                                                            (0.184)
Change in CPIA over project                                           -0.294
  lifetime                                                            (0.247)
Observations                            120       120       120       120
Pseudo R-squared                        0.072     0.059     0.043     0.040

Notes: This table provides estimates of IEG outcome ratings regressed on common design features and quality ratings. Using principal component analysis (PCA), a design index is created from these features and ratings. [1] For comparison, a Project Characteristics Index is created from the log of original commitment, preparation cost, first- and second-half supervision cost, the change in project size from approval to closing, and the time it takes projects to move from one key milestone in the project lifecycle to another. The table also reports estimates from regressing the IEG outcome rating on two country variables: the CPIA score of the project country at approval and the change in that score over the project lifetime.
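The design index in Tables A4.8a to A4.8c is built with principal component analysis. The sketch below shows one common construction, assumed here rather than documented in the notes: standardize each design feature, find the leading eigenvector of the features' correlation matrix by power iteration, and score each project as the loading-weighted sum of its standardized features. The function names are hypothetical, and a production analysis would use a linear-algebra library instead.

```python
import math
from statistics import mean, pstdev
from typing import List, Sequence


def standardize(col: Sequence[float]) -> List[float]:
    """Z-scores using the population standard deviation."""
    m, s = mean(col), pstdev(col)
    if s == 0:
        raise ValueError("feature has zero variance")
    return [(v - m) / s for v in col]


def first_pc_index(rows: Sequence[Sequence[float]], iters: int = 500) -> List[float]:
    """Score each row on the first principal component of its standardized features."""
    n, k = len(rows), len(rows[0])
    cols = [standardize([r[j] for r in rows]) for j in range(k)]
    # Correlation matrix of the standardized features.
    corr = [[sum(a * b for a, b in zip(cols[i], cols[j])) / n for j in range(k)] for i in range(k)]
    # Power iteration for the leading eigenvector (the PCA loadings).
    w = [1.0 / math.sqrt(k)] * k
    for _ in range(iters):
        w_new = [sum(corr[i][j] * w[j] for j in range(k)) for i in range(k)]
        norm = math.sqrt(sum(v * v for v in w_new))
        if norm == 0:
            # The start vector happened to be orthogonal to every non-null direction.
            raise ValueError("power iteration failed; try another start vector")
        w = [v / norm for v in w_new]
    return [sum(w[j] * cols[j][i] for j in range(k)) for i in range(n)]


print(first_pc_index([(1, 2), (2, 4), (3, 6)]))
```

The sign of a principal component is arbitrary, so in practice the index would be oriented (multiplied by -1 if needed) so that higher values correspond to better-rated design.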
Table A4.8b. Relationship between project design and IEG binary outcome rating, probit
Dependent variable: Binary IEG outcome rating (Satisfactory=1, Non-satisfactory=0)

Design index components
                                        (1)       (2)       (3)       (4)       (5)       (6)       (7)
Number of project components            -0.121
                                        (0.084)
Number of project sub-components                  -0.026
                                                  (0.019)
Number of PDO indicators at entry                           -0.047
                                                            (0.030)
Number of intermediate indicators                                     -0.032***
  at entry                                                            (0.013)
PDO rating (1 high - 4 low)                                                     -0.191
                                                                                (0.179)
Results framework rating                                                                  -0.373**
  (1 high - 4 low)                                                                        (0.154)
Extent of changes to project                                                                        -0.253
  components                                                                                        (0.157)
Observations                            120       120       120       120       120       120       120
Pseudo R-squared                        0.073     0.073     0.076     0.099     0.068     0.097     0.077

Dependent variable: Binary IEG outcome rating (Satisfactory=1, Non-satisfactory=0)
                                        (8)       (9)       (10)      (11)
Design index                            0.280***
                                        (0.091)
Project Characteristics Index                     0.082***
  for comparison [1]                              (0.067)
CPIA at approval                                            0.395*
                                                            (0.231)
Change in CPIA over project                                           -0.445
  lifetime                                                            (0.312)
Observations                            120       120       120       120
Pseudo R-squared                        0.121     0.070     0.058     0.053

Notes: This table provides estimates of IEG outcome ratings regressed on common design features and quality ratings. Using principal component analysis (PCA), a design index is created from these features and ratings. [1] For comparison, a Project Characteristics Index is created from the log of original commitment, preparation cost, first- and second-half supervision cost, the change in project size from approval to closing, and the time it takes projects to move from one key milestone in the project lifecycle to another. The table also reports estimates from regressing the IEG outcome rating on two country variables: the CPIA score of the project country at approval and the change in that score over the project lifetime.
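Tables A4.6b, A4.7b, A4.8b and A4.9b collapse the six-point IEG rating into a binary outcome. The sketch below assumes the numeric coding 1 = Highly Unsatisfactory through 6 = Highly Satisfactory and the conventional IEG split in which Moderately Satisfactory or better counts as satisfactory; neither assumption is stated in these notes, and the names are hypothetical.

```python
from typing import Dict

# ASSUMPTION: numeric coding of the six-point IEG outcome scale.
IEG_SCALE: Dict[int, str] = {
    1: "Highly Unsatisfactory",
    2: "Unsatisfactory",
    3: "Moderately Unsatisfactory",
    4: "Moderately Satisfactory",
    5: "Satisfactory",
    6: "Highly Satisfactory",
}


def to_binary_outcome(rating: int, threshold: int = 4) -> int:
    """Return 1 (satisfactory) if rating >= threshold, else 0 (non-satisfactory)."""
    if rating not in IEG_SCALE:
        raise ValueError(f"unknown IEG rating: {rating}")
    return int(rating >= threshold)


print([to_binary_outcome(r) for r in range(1, 7)])  # [0, 0, 0, 1, 1, 1]
```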
Table A4.8c. Relationship between project design and IEG outcome rating, weighted by original commitment (in millions, constant 2015 US$)
Dependent variable: IEG 1-6 outcome rating

Design index components
                                        (1)       (2)       (3)       (4)       (5)       (6)       (7)
Number of project components            -0.010
                                        (0.057)
Number of project sub-components                  -0.030**
                                                  (0.013)
Number of PDO indicators at entry                           -0.058
                                                            (0.020)
Number of intermediate indicators                                     -0.005
  at entry                                                            (0.011)
PDO rating (1 high - 4 low)                                                     0.031
                                                                                (0.132)
Results framework rating                                                                  -0.242**
  (1 high - 4 low)                                                                        (0.109)
Extent of changes to project                                                                        -0.291**
  components                                                                                        (0.119)
Observations                            120       120       120       120       120       120       120
Pseudo R-squared                        0.131     0.173     0.191     0.133     0.131     0.169     0.176

Dependent variable: IEG 1-6 outcome rating
                                        (8)       (9)       (10)      (11)
Design index                            0.120**
                                        (0.060)
Project Characteristics Index                     0.191***
  for comparison [1]                              (0.052)
CPIA at approval                                            0.444**
                                                            (0.204)
Change in CPIA over project                                           -0.0685
  lifetime                                                            (0.278)
Observations                            120       120       120       120
Pseudo R-squared                        0.164     0.226     0.14      0.104

Notes: This table provides estimates of IEG outcome ratings regressed on common design features and quality ratings. Using principal component analysis (PCA), a design index is created from these features and ratings. [1] For comparison, a Project Characteristics Index is created from the log of original commitment, preparation cost, first- and second-half supervision cost, the change in project size from approval to closing, and the time it takes projects to move from one key milestone in the project lifecycle to another. The table also reports estimates from regressing the IEG outcome rating on two country variables: the CPIA score of the project country at approval and the change in that score over the project lifetime.
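Tables A4.6c, A4.7c, A4.8c and A4.9c weight each project by its original commitment. For a single regressor, the weighted least squares point estimates reduce to weighted means and weighted (co)variances, as in the sketch below. Controls, fixed effects and the clustered standard errors are omitted, and the function name `wls_simple` is hypothetical.

```python
from typing import Sequence, Tuple


def wls_simple(y: Sequence[float], x: Sequence[float], w: Sequence[float]) -> Tuple[float, float]:
    """Weighted least squares of y on [1, x]; returns (intercept, slope).

    Weights w would be, e.g., original commitment in US$ millions.
    """
    sw = sum(w)
    xbar = sum(wi * xi for wi, xi in zip(w, x)) / sw
    ybar = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxy = sum(wi * (xi - xbar) * (yi - ybar) for wi, xi, yi in zip(w, x, y))
    sxx = sum(wi * (xi - xbar) ** 2 for wi, xi in zip(w, x))
    slope = sxy / sxx
    return ybar - slope * xbar, slope


# Same data, equal weights versus a heavier weight on the first project.
print(wls_simple([0, 2, 1], [0, 0, 1], [1, 1, 1]))
print(wls_simple([0, 2, 1], [0, 0, 1], [3, 1, 1]))
```

Because each observation counts in proportion to its weight, estimates in the weighted tables describe the relationship for the typical dollar committed rather than the typical project, which is one reason they can differ from the unweighted results.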
Table A4.9a. Relationship between ESW/ASA and IEG outcome rating by project group, ordered probit

                                                   By lending source     By country group [2]
                                        Overall    IBRD       IDA        FCV        UMIC
Dummy for any ESW/ASA in the            0.147***   0.099      0.173***   0.178      -0.004
  corresponding GP within 3 years       (0.055)    (0.099)    (0.066)    (0.142)    (0.123)
  before project approval [1]
Observations                            2,374      1,031      1,343      301        324
Pseudo R-squared                        0.036      0.046      0.036      0.042      0.045

Note: Robust standard errors clustered at the country level are reported in parentheses (***p<0.01, **p<0.05, *p<0.10). All regressions are estimated using ordered probit, controlling for region, sector, IBRD/IDA, CPIA score of the project country at approval, change in the CPIA score over the project lifetime, original commitment amount of the project, and exit year. The sample is restricted to projects with exit fiscal years 2004-2016. Project group abbreviations: International Bank for Reconstruction and Development (IBRD); International Development Association (IDA); fragile, conflict and violence settings (FCV); upper-middle-income countries (UMIC); Equitable Growth, Finance and Institutions (EFI); Human Development (HD); Sustainable Development (SD). [1] Considers only Economic and Sector Work/Analytical and Advisory Services (ESW/ASA) executed by the same Global Practice (GP) for a given country. [2] Country groups are neither mutually exclusive nor collectively exhaustive; FCV and UMIC are shown for reference only. A project is designated FCV if its country is on the World Bank Harmonized List of Fragile Situations in the fiscal year of approval.
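The ESW/ASA regressor in Tables A4.9a to A4.9c is a dummy for any analytical work by the same Global Practice in the same country within three years before project approval. A minimal sketch follows, assuming a hypothetical record layout with a delivery date and reading "within 3 years before approval" as the half-open window [approval minus 3 years, approval); the paper's exact date field and window boundaries are not stated.

```python
from dataclasses import dataclass
from datetime import date
from typing import Iterable


@dataclass(frozen=True)
class AsaRecord:
    """Hypothetical ESW/ASA record: country, Global Practice, delivery date."""
    country: str
    global_practice: str
    delivered: date


def prior_asa_dummy(
    asa: Iterable[AsaRecord], country: str, gp: str, approval: date, years: int = 3
) -> int:
    """1 if any same-country, same-GP ESW/ASA was delivered in the window, else 0."""
    try:
        start = approval.replace(year=approval.year - years)
    except ValueError:  # approval on 29 February in a leap year
        start = approval.replace(year=approval.year - years, day=28)
    return int(
        any(
            r.country == country and r.global_practice == gp and start <= r.delivered < approval
            for r in asa
        )
    )


records = [AsaRecord("KEN", "Education", date(2012, 5, 1))]
print(prior_asa_dummy(records, "KEN", "Education", date(2014, 1, 1)))  # 1
```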
Table A4.9b. Relationship between ESW/ASA and binary IEG outcome rating by project group, probit

                                                   By lending source     By country group [2]
                                        Overall    IBRD       IDA        FCV        UMIC
Dummy for any ESW/ASA in the            0.055      -0.036     0.109      0.294      -0.133
  corresponding GP within 3 years       (0.066)    (0.105)    (0.088)    (0.189)    (0.164)
  before project approval [1]
Observations                            2,374      1,031      1,343      301        324
Pseudo R-squared                        0.050      0.069      0.056      0.061      0.087

Note: Robust standard errors clustered at the country level are reported in parentheses (***p<0.01, **p<0.05, *p<0.10). All regressions are estimated using probit, controlling for region, sector, IBRD/IDA, CPIA score of the project country at approval, change in the CPIA score over the project lifetime, original commitment amount of the project, and exit year. The sample is restricted to projects with exit fiscal years 2004-2016. Project group abbreviations: International Bank for Reconstruction and Development (IBRD); International Development Association (IDA); fragile, conflict and violence settings (FCV); upper-middle-income countries (UMIC); Equitable Growth, Finance and Institutions (EFI); Human Development (HD); Sustainable Development (SD). [1] Considers only Economic and Sector Work/Analytical and Advisory Services (ESW/ASA) executed by the same Global Practice (GP) for a given country. [2] Country groups are neither mutually exclusive nor collectively exhaustive; FCV and UMIC are shown for reference only. A project is designated FCV if its country is on the World Bank Harmonized List of Fragile Situations in the fiscal year of approval.
Table A4.9c. Relationship between ESW/ASA and IEG outcome rating by project group, weighted by original commitment (in millions, constant 2015 US$)

                                                   By lending source     By country group [2]
                                        Overall    IBRD       IDA        FCV        UMIC
Dummy for any ESW/ASA in the            0.161*     0.052      0.319***   0.247*     -0.207*
  corresponding GP within 3 years       (0.091)    (0.126)    (0.066)    (0.141)    (0.117)
  before project approval [1]
Observations                            2,374      1,031      1,343      301        324
Adjusted R-squared                      0.090      0.097      0.084      0.034      0.030

Note: Robust standard errors clustered at the country level are reported in parentheses (***p<0.01, **p<0.05, *p<0.10). All regressions are estimated using OLS, weighted by original commitment, and control for region, sector, IBRD/IDA, CPIA score of the project country at approval, change in the CPIA score over the project lifetime, original commitment amount of the project, and exit year. The sample is restricted to projects with exit fiscal years 2004-2016. Project group abbreviations: International Bank for Reconstruction and Development (IBRD); International Development Association (IDA); fragile, conflict and violence settings (FCV); upper-middle-income countries (UMIC); Equitable Growth, Finance and Institutions (EFI); Human Development (HD); Sustainable Development (SD). [1] Considers only Economic and Sector Work/Analytical and Advisory Services (ESW/ASA) executed by the same Global Practice (GP) for a given country. [2] Country groups are neither mutually exclusive nor collectively exhaustive; FCV and UMIC are shown for reference only. A project is designated FCV if its country is on the World Bank Harmonized List of Fragile Situations in the fiscal year of approval.