Policy Research Working Paper 10845

Are Short-Term Gains in Learning Outcomes Possible? Evidence from the Malawi Education Sector Improvement Project

Salman Asim
Ravinder Casley Gera

Education Global Practice
July 2024

Abstract

This paper presents evidence of the impact of a five-year package of interconnected interventions intended to improve learning environments in eight disadvantaged districts in Malawi. The intervention, which was implemented over five years, provided additional finance to schools to support the hiring of additional teachers and construction of learning shelters to improve class sizes in lower primary, along with constructing classrooms and providing results-based finance to reward improvements in staffing. The interventions were targeted to eight districts with longstanding disadvantages in staffing, learning environments, and learning outcomes, particularly for girls. Employing administrative data and data from a nationally representative independent sample of public primary schools, the analysis finds that these investments closed the gap in learning outcomes between the targeted districts and the rest of Malawi. There is also suggestive evidence that the program reduced learning gaps between girls and boys. The findings suggest that even in a low-income environment with significant constraints, targeted efforts to reduce class sizes can close district-level gaps in learning.

This paper is a product of the Education Global Practice. It is part of a larger effort by the World Bank to provide open access to its research and make a contribution to development policy discussions around the world. Policy Research Working Papers are also posted on the Web at http://www.worldbank.org/prwp. The authors may be contacted at sasim@worldbank.org.

The Policy Research Working Paper Series disseminates the findings of work in progress to encourage the exchange of ideas about development issues.
An objective of the series is to get the findings out quickly, even if the presentations are less than fully polished. The papers carry the names of the authors and should be cited accordingly. The findings, interpretations, and conclusions expressed in this paper are entirely those of the authors. They do not necessarily represent the views of the International Bank for Reconstruction and Development/World Bank and its affiliated organizations, or those of the Executive Directors of the World Bank or the governments they represent.

Produced by the Research Support Team

Are Short-Term Gains in Learning Outcomes Possible? Evidence from the Malawi Education Sector Improvement Project

Salman Asim,* Ravinder Casley Gera†

JEL Classification: I21; I28; C99.
Keywords: Education Quality; Primary School; Education Reform; Econometrics.

* Senior Economist, Education Global Practice, World Bank: sasim@worldbank.org.
† Education Specialist, Education Global Practice, World Bank

Introduction

Despite substantial investments in infrastructure and teachers, many low-income countries still struggle with large class sizes in primary school, particularly in lower grades. The typical primary school in Sub-Saharan Africa has more than 30 pupils per class, but average sizes rise to above 50 in countries including the Democratic Republic of Congo, Tanzania and Chad (Bashir et al., 2018). In Malawi, the context for this study, class sizes are particularly large, with a typical school having more than 125 students per class (ibid.). These poor conditions stem from the failure of national public education systems to keep up with rapid growth in school enrollments, the result of rapid population growth and rising enrollment rates over more than two decades since the introduction of free primary education by the majority of low-income countries in the 1990s (ibid.).

These poor conditions contribute to low rates of learning in lower grades.
In Malawi, fewer than 25 percent of students in Grade 2 achieve minimum proficiency levels in reading according to the Early Grade Reading Assessment (EGRA) conducted by the United States Agency for International Development (USAID) (USAID, 2013). These poor levels of learning in early grades contribute to high rates of repetition, further exacerbating large class sizes, a case of ‘early grade bulge’ (Bashir et al., 2018). In the case of Malawi, repetition rates are typically above 25 percent in lower grades (Ministry of Education, 2022). Faced with these poor conditions, many students simply drop out. Fewer than three-quarters of students remain in school until Grade 5 in a number of countries in Sub-Saharan Africa, including Uganda, Burkina Faso, and Mozambique (World Bank, b). In Malawi, around one in four students drop out prior to Grade 5 (Ministry of Education, 2022).

Large class sizes may prove detrimental to learning by reducing the level of attention each student receives from teachers, reducing students’ ability to ask and answer questions and seek guidance, and reducing the likelihood that teachers engage in time-intensive teaching tasks such as marking homework. In Malawi, evidence from the Malawi Longitudinal Schools Survey (see section ) suggests that, while primary school teachers are as likely as those in neighboring countries to correct mistakes and give positive reinforcement, they are substantially less likely to set and mark homework and to be available to support students after class, activities whose time demands grow with class size (Asim and Casley Gera, 2024).
In addition to overall poor conditions in lower primary, many low-income countries have wide variations in conditions between schools, with poorer and more remote districts and sub-district areas typically having larger class sizes than wealthier districts and those closer to capital cities, particularly as a result of inefficiencies in the distribution of teachers (Mulkeen, 2010; Bashir et al., 2018). These inequitable conditions are associated with inequities in learning outcomes, with the least well-equipped districts and schools typically having lower learning outcomes. Evidence from cross-sectional data derived from a range of primary level standardized tests suggests that ‘raising the floor’ on test scores by improving the performance of the lowest-performing students and schools is likely to be the most cost-effective way for low-income countries to raise overall learning levels in the short to medium term (Crouch and Rolleston, 2017). In Tanzania, a recent set of reforms targeted to seven disadvantaged regions, providing pre-primary education, teacher training and mentoring for female students,¹ led to an increase of more than one-third in Grade 4 learning outcomes between 2014 and 2016, enabling these regions to overtake the rest of the country (Asim et al., 2019).

There is substantial evidence that reducing class sizes can lead to improved test scores, although the evidence is mixed and dependent on context and the size of classes before and after reductions. In high-income countries, Angrist and Lavy (1999), exploiting a class size limit of 40 in Israel to conduct regression discontinuity analysis, estimate that reductions in class size can induce substantial improvements in test scores for older primary school students (Grade 4 and 5); however, later analysis using more recent data suggested no impacts on learning from the class size reduction (Angrist et al., 2017).
Similarly, Krueger (1999), analyzing longitudinal data from the United States, finds reductions in class size associated with improvement in standardized tests of at least four percentile points; however, Hoxby (2000), conducting further analysis on similar data, finds no significant impacts on learning from class size reduction. Analyzing cross-country data from 47 countries with pupil fixed effects, Altinok and Kingdon (2011) conclude that the negative effects of class size are small and observed in a minority of countries.

Turning to low- and middle-income countries, where class sizes tend to be significantly larger, Muralidharan and Sundararaman (2013) combine experimental and panel data and estimate that the addition of contract teachers to schools in India to reduce pupil-teacher ratios (PTRs) led to an improvement in student learning of 0.15-0.16 standard deviations (s.d.) in math and language tests. Duflo et al. (2015), in an experimental evaluation of a similar intervention in Kenya, find apparent benefits of class size reduction only for students in the classes taught by contract teachers, and not for students in other classes with regular Government teachers who experienced a similar reduction in class size. They conclude that improved practices and reduced absenteeism among contract teachers compared to Government teachers, rather than class size reduction, explained the gains in learning.

In recent years, new research has directly explored the question of how large class sizes must be to negatively affect learning outcomes. Datta and Kingdon (2021), examining large-scale test data from India with pupil fixed effects, find that negative effects from large class sizes begin at a size of approximately 40 for science subjects and 50 for non-science subjects, and that India could allow its current national PTR to increase to 40² without negatively affecting test scores.
However, there is a lack of rigorous analysis of the effects of class size reduction focused on countries where class sizes are substantially larger. Evidence from Malawi suggests that PTRs above 90 are associated with significantly lower test scores in Grade 4, equivalent to several weeks’ lost learning, even when controlling for a wide range of school, teacher and student characteristics (Asim and Casley Gera, 2024).

¹ The Education Quality Improvement Programme (EQUIP-Tanzania).
² From a current 22.8.

Finally, although the evidence primarily assesses the impact of class size on test scores, student repetition and promotion rates – which are generally closely linked to learning – may also respond to reductions in class size. Walter (2018), assessing evidence from a panel of 20 high-, middle- and low-income countries, estimates that reducing PTRs by improving the allocation of existing teachers between schools could improve student promotion rates by between 0.1 and 4.2 percentage points.

In this paper, we present evidence of the impact of a five-year package of interconnected interventions intended to reduce class size in eight disadvantaged districts of Malawi. The intervention, implemented over five years, constructed classrooms; provided additional finance to schools to support hiring of contract teachers and construction of learning shelters; and provided results-based finance to reward improvements in staffing in lower grades. The interventions were targeted to eight districts with longstanding disadvantages in staffing, learning environments, and learning outcomes, particularly for girls.

Employing administrative data and data from a nationally representative independent sample of public primary schools, we estimate the impacts of the intervention on class sizes, repetition rates, and test scores.
We exploit the targeting of the interventions to particular districts to conduct difference-in-difference analysis between schools and students in the targeted districts and those in a comparison group of 17 rural districts (all non-urban, mainland districts which did not receive a similar intervention). We find that the interventions succeeded in reducing PTRs and pupil-classroom ratios (PCRs) in lower primary; reduced repetition rates in lower primary by 0.7 points; and closed the gap in learning outcomes between the targeted districts and the rest of Malawi. We also find suggestive evidence that the treatment reduced the gap in learning between girls and boys. The findings suggest that even in a low-income environment with significant constraints, targeted efforts to reduce class sizes can close district-level gaps in learning.

Our study is one of the first to assess the impact of an interconnected set of interventions to reduce class sizes conducted at large scale. Previous experimental evaluations of interventions to reduce class sizes, such as those described above, have been conducted on a small scale under pilot conditions.³ While randomized evaluations have historically been considered the ‘gold standard’ for causal estimation of the impact of an intervention on selected outcomes (Cartwright, 2011), there is increasing concern that small-scale experiments can have low levels of external validity. Experiments in which allocation to treatment and control is done randomly, but from a purposively selected overall evaluation sample, may meet the technical definition of a randomized control trial without achieving representativeness of the target population (Deaton and Cartwright, 2018). Furthermore, even RCTs in which both sample selection and allocation to treatment are randomized typically involve interventions conducted on a pilot basis with unique implementation arrangements; these may have an extremely low level of external validity in relation to the implementation of similar interventions at scale through conventional public service institutions (Vivalt, 2020; Pritchett, 2019). For example, the impacts of reducing class sizes through provision of contract teachers observed in Kenya in pilot conditions (Duflo et al., 2015) disappeared when the intervention was implemented by Government rather than by a non-governmental organization (Bold et al., 2018). More generally, evaluation of specific individual interventions is inherently of limited external validity to the context of real-life policymaking, where multiple reforms and interventions are typically conducted simultaneously (Pritchett, 2019). By assessing the impact on learning outcomes of a set of interconnected interventions, implemented at scale, our study is able to provide evidence that is likely to reflect the potential impact of similar interventions in other contexts.

³ For example, Duflo et al. (2015) employ a sample of 70 schools while Muralidharan and Sundararaman (2013) employ a sample of 100 schools.

Design

Research Questions and General Hypotheses

Our research question is: can a set of interconnected interventions reduce district-level inequities in school staffing and, as a result, reduce student repetition and improve test scores?

Our general hypotheses are:

1. The provision of the MESIP interventions is expected to improve PTRs in lower primary in the eight disadvantaged districts relative to the 17 comparison districts.
2. The improvement in PTR in lower primary is expected to lead to improvements in student repetition rates and learning outcomes.

Context

Malawi has made impressive achievements in access to schooling since the introduction of free primary education in 1994, but its education system has failed to keep pace with increased enrolments, resulting in overcrowded classrooms and understaffed schools.
The average class in Grade 1⁴ has about 150 students, and in Grade 2 about 125 students (Bashir et al., 2018).

These large class sizes reflect weaknesses in the management of Malawi’s teacher workforce. Malawi has a barely adequate supply of teachers, with a national PTR of 62:1 (Ministry of Education, 2022). However, as recently as 2016 when MESIP began, the national PTR was significantly higher at 69:1 (World Bank, 2015). Although Malawi produces around 4,000 graduates each year from its standard pre-service training for primary school teachers, the Initial Primary Teacher Education (IPTE), in recent years these teachers have typically waited 1-2 years following graduation before being hired into the national teaching workforce, slowing the reduction in the national PTR.

⁴ Grades are known as Standards in Malawi, but will be referenced as Grades throughout this paper. Grades range from Grade 1-12. Primary Education is from Grade 1-8, and Secondary Education from Grade 9-12.

Moreover, teachers are poorly distributed between districts and between schools. The most poorly staffed eight districts have an overall PTR of 68 or more, and school PTRs can vary within a single zone (sub-district) by a factor of ten or more (Ministry of Education, 2020). The result is that more than 2,200 of Malawi’s approximately 5,800 schools have pupil-qualified teacher ratios (PqTRs) above 90 in lower primary, and more than 800 of these have PqTRs above 190 in lower primary, double the Government of Malawi’s target of 60 (Ministry of Education, 2021).

In addition to being poorly distributed between schools, Malawi’s teachers are also poorly distributed within schools, between grades, with severe under-allocation to lower grades. Average PqTRs range from 83.2 in Grade 1 to 52.1 in Grade 5 and 21.6 in Grade 8 (Ministry of Education, 2021). These misallocations exacerbate large class sizes in lower grades.
Mixed methods analysis conducted during MESIP confirms that these large class sizes constrain Malawi’s teachers from maintaining certain positive teaching practices, including setting and marking homework and being available for additional support after class (Asim and Casley Gera, 2024).

Severe shortages of classrooms exacerbate large class sizes. The average school pupil-permanent classroom ratio (PpCR) is 98:1, well above the target of 60:1 (MoE, 2022). Moreover, as with teachers, the available infrastructure is inequitably distributed between schools, and more than 500 schools have a PCR of more than 160 in lower primary.

These extremely poor conditions lead to high rates of grade repetition and dropout. Most primary schools in Malawi require students to achieve a passing mark in end-of-year examinations to progress to the next grade. In 2021/22, repetition rates nationwide averaged 25 percent (MoE, 2022b). Only 62 percent of students enrolling in Grade 1 survive to Grade 5, and only 39 percent to Grade 8 (ibid.). According to a recent World Bank report, among 12 sampled countries in sub-Saharan Africa, Malawi’s students are the most likely to repeat early grades, and the least likely to survive to Grade 8 (Bashir et al., 2018).

Students who remain in primary school experience poor learning outcomes. At Grade 2 level, fewer than 25 percent of students achieve minimum proficiency levels in the Early Grade Reading Assessment (EGRA) conducted by the United States Agency for International Development (USAID). At Grade 6, fewer than 25 percent of students achieve minimum proficiency levels in the Southern and Eastern Africa Consortium for Monitoring Educational Quality (SACMEQ) assessment in Mathematics, placing Malawi near the bottom in the region (USAID, 2013).

Disadvantaged districts. In addition to these overall poor conditions and outcomes, Malawi’s public school system has historically been burdened by wide disparities between districts.
PTRs and PCRs vary significantly between districts, with certain districts demonstrating consistent disadvantage compared to the others. As described in more detail below, the key MESIP interventions were focused in eight districts⁵ selected on the basis of longstanding disadvantage in terms of conditions and learning outcomes.

Figure 1 demonstrates that PCRs and PTRs in the eight disadvantaged districts were consistently higher in the years leading up to MESIP (2010-2016) than in 17 comparable rural districts (the ratio of PCR/PTR in the eight districts to those in the 17 districts was consistently above 1). See Sample for more details on the selection of the eight districts and the construction of the comparison group.

⁵ Malawi has 28 districts, the larger of which are subdivided into two or three educational districts for administration of education, for a total of 34 educational districts. For the remainder of this study, we will use the word ‘district’ to refer to educational districts.

Figure 1: Eight disadvantaged districts: historical trend

Intervention

MESIP was a large-scale program of education investment and reforms, implemented by the Ministry of Education of Malawi and financed by the Global Partnership for Education. A sister program, MESIP-Extended, financed by the Royal Norwegian Embassy, extended the MESIP interventions to additional schools. MESIP was implemented from December 2016 to July 2021.

MESIP included a range of interconnected interventions, including:

1. The provision of grants of an average US$1,761 per year to selected public primary schools, to be used to implement a range of strategies to improve promotion for all students in lower grades and reduce dropout for girls in Grades 6-8;
2. Construction of 500 classrooms and more than 300 sanitation blocks;
3. Results-based financing rewarding Government for improvement in school staffing, specifically (i) reduced PqTRs in Grades 1-2 and (ii) increased female-to-male teacher ratios in Grades 5-8; as well as (iii) reduced repetition rates in Grades 1-4;
4. School Leadership Program providing training to 1,200 headteachers, along with their deputies and Primary Education Advisers (PEAs), to raise skills in resource management, recordkeeping, teacher management and the creation of inclusive school cultures; and
5. ICT-based community information and engagement, providing schools with either (a) monthly visits from zonal EMIS officers to collect data on key performance indicators, disseminated to communities via SMS and printed report cards; or (b) access to an SMS-based system for community dialogue around education and school conditions, practices and outcomes.

Of these, interventions 1-3 were focused in eight disadvantaged districts of Malawi, with longstanding disadvantages in staffing, learning environments, and learning outcomes, particularly for girls. Interventions 4 and 5 were implemented in a random selection of schools within all districts of Malawi. In this study, we focus on the impacts of the first three MESIP interventions, which were focused in the eight disadvantaged districts and which were expected to lead to reductions in class sizes.

Grants. Malawi already operated a school grants scheme prior to MESIP, called the Primary School Improvement Program (PSIP), which was intended to provide a School Improvement Grant (SIG) of an average US$950 per year to public primary schools to complete activities recorded in a School Improvement Plan (SIP). However, these SIGs were often subject to extreme delays in delivery, often not being received by schools until several months into the school year, and in many cases schools would never receive the full amount (Asim and Casley Gera, 2024).
Under MESIP, 800 randomly selected schools within the eight disadvantaged districts received an additional grant, paid on time at the beginning of the school year.⁶ The grants were paid on a per-school basis, with per-learner top-up payments for schools with more than 1,000 students, and averaged US$1,761 per year. In addition, 400 of these schools were eligible to receive an additional performance-based grant of up to US$1,200 per year, in exchange for improvements in promotion rates. The first grants were received by schools in May 2017; thereafter they were received in September-November each year, near the beginning of the school year.

⁶ On-time delivery was achieved through payment of funds to schools direct from the MESIP finance, without use of central and local government budgeting and flow of funds processes which are associated with delays and diversion of funds.

The grant finance was accompanied by a set of guidelines specifying particular strategies for the use of the grant finance to improve promotion for all students in lower grades and reduce dropout for girls in Grades 6-8. The strategies included, among others: hiring auxiliary teachers to reduce class sizes (see below); construction of low-cost ‘learning shelters’ to also reduce class sizes where there were insufficient classrooms; provision of rewards for rapidly improving students; zone-level collaboration between teachers to strengthen skills; and provision of materials for remedial classes for low-performing learners. Primary Education Advisers (PEAs), sub-district education officials, provided support to schools in the implementation of the guidelines. In addition to the grants and guidelines, a randomly selected subset of 400 schools was selected to receive training on the implementation of the strategies.⁷

⁷ These schools were randomly selected as part of crossover design for impact evaluation. Two hundred schools receiving only the main MESIP grant, and 200 also eligible for the performance-based grant, received the training. Training lasted one day and was conducted at zone (sub-district) level by district officials.

In this study, we focus on the two largest expenditure items for schools receiving the grants:

1. Auxiliary teachers. A key use of the grant finance by schools was to hire ‘auxiliary’ (contract) teachers to reduce class sizes, particularly in lower primary. Auxiliary teachers (ATs) were qualified teachers who had recently completed pre-service training, but were awaiting deployment into the regular teaching workforce. Under MESIP, they were hired by schools under direct contracts with School Management Committees (SMCs) and allocated to reduce large class sizes in lower primary grades. Although hired by schools directly, ATs were hired on standardized contracts, with PEAs supporting their recruitment and management.
2. Learning shelters. A second key use of the grant finance was the construction by communities of simple, low-cost classrooms, known as learning shelters. Constructed from brick and with roofs and paved floors, but only partial walls, these shelters used a standardized, low-cost design which was intended to provide adequate shelter for a class at a fraction of the cost of a traditional classroom (around US$5,000). In addition to shelters, schools also constructed changeroom facilities for the menstrual needs of older female students.⁸

⁸ Change rooms are simple rooms, often with a sink or other handwashing facility such as a bucket, to provide a dedicated place for girls’ menstrual health.

Classroom construction. In addition to the learning shelters produced by schools receiving grants, MESIP supported construction of 500 conventional classrooms, targeted to schools with the highest PCRs. These classrooms were constructed by large-scale construction firms through centralized contracting. 342 sanitation blocks were also constructed.
Results-based financing. In addition to supporting appointment of auxiliary teachers, MESIP also supported improved staffing and class sizes through results-based finance (RBF). As described above, MESIP provided RBF according to improvement in three areas: early grade PqTRs, upper grade female/male teacher ratios, and lower primary repetition rates. A total of US$13.5 million of MESIP’s total US$45 million finance was provided through results-based financing, with US$7 million tied to the targets relating to staffing:

• US$2 million rewarded the preparation of a strategy and action plan to improve the distribution of teachers between schools.
• US$2.5 million rewarded the reduction of average PqTRs in Grades 1-2 in the eight disadvantaged districts, from 166:1 to 132:1 (a 20 percent decrease).
• US$2.5 million rewarded the increase in the average female teacher to male teacher ratio in Grades 5-8 in the eight disadvantaged districts, from 0.31 to 0.34, a ten percent increase.

Theory of Change

We anticipate impacts from the intervention on PTR in schools in the eight MESIP disadvantaged districts, as a result of both the hiring of auxiliary teachers and the improvement in teacher allocation to schools and grades. We expect particular improvement in PTRs in lower primary, as a result of the targeting of the majority of auxiliary teachers to these grades and the priority placed on early grades in the results-based finance.

We anticipate impacts from the intervention on PCR as a result of the construction of learning shelters. As learning shelters were assigned for use in lower primary classes, we expect the impacts on PCR to be primarily observed in lower grades.

We anticipate reduced PTRs and PCRs to lead to reduced class sizes. As a result of these improvements in class sizes, we anticipate improved student learning outcomes and reduced repetition.
Repetition is closely associated with learning outcomes in Malawi, with promotion decisions in the eight disadvantaged districts being decided directly through performance in end-of-year school testing. Therefore, we expect improvements in repetition rates and learning outcomes to occur in tandem. See Figure 2 for the full theory of change.

Figure 2: Theory of Change

Identification Strategy

We exploit the district targeting of MESIP to use a difference-in-difference approach, comparing the change in outcomes in the eight disadvantaged districts targeted by MESIP with that in the 17 comparison districts. We assume a parallel trend in class sizes and learning outcomes between the disadvantaged and comparison districts, as evidenced by Figure 1. We measure the change in outcomes between 2016, the year in which MESIP began implementation, and 2020, the last full year of MESIP implementation.⁹

Outcomes of interest

Our intermediate outcomes of interest are school-level PTR and PCR.

PTR: We report PTR at school level using EMIS data. We report PTR rather than PqTR, with no restrictions on the type of teacher.¹⁰ For overall PTR, we sum the total enrollment at each school and divide by the total number of teachers employed at the school. For lower primary PTR, we sum the total enrollment in Grades 1-4 and divide by the total number of teachers employed at the school who teach entirely or partially in these grades.¹¹

⁹ MESIP implementation continued until July 2021; however, most activities were completed by December 2020 and the final months were used to complete activities delayed as a result of COVID-19. Exact measurement date varies according to data source: see Data.
¹⁰ As described, the auxiliary teachers appointed via MESIP SIG were all qualified teachers. Approximately 95 percent of Malawi’s primary school teachers are qualified; however, school-level administrative data on the qualification status of teachers have not been found to be consistently accurate. Therefore, for simplicity and the reduction of noise, we employ PTR without specifying the qualifications of teachers.
¹¹ The majority of Malawi’s primary school teachers are class teachers teaching all subjects to one primary class; however, in larger schools teachers may teach multiple grades. Our expansive definition of lower primary PTR includes all teachers teaching at least some of their time regularly in Grades 1-4.

PCR: We report PCR at school level using EMIS data. We report PCR rather than PpCR, with no restrictions on the type of classroom.¹² We report PCR school-wide and at lower primary level. For overall PCR, we sum the total enrollment at each school and divide by the total number of classrooms available for use at the school. For lower primary PCR, we sum the total enrollment in Grades 1-4 and divide by the total number of classrooms at the school which are used entirely or partially for these grades.¹³

¹² Although the learning shelters constructed via MESIP SIG are considered permanent classrooms, there is some evidence some were mis-classified as temporary during EMIS data collection; we use PCR to ensure all MESIP-supported and other construction is identified.
¹³ The majority of Malawi’s primary school teachers are class teachers teaching all subjects to one primary class; however, in larger schools teachers may teach multiple grades. Our expansive definition of lower primary PTR includes all teachers teaching at least some of their time regularly in Grades 1-4.

Our ultimate outcomes of interest are student repetition rates and test scores.

Repetition: We report repetition rates at school level, both the overall rate across grades and the lower primary rate across Grades 1-4. We calculate repetition rates from EMIS data. The rate is calculated using the number of students in each grade repeating a grade in the most recently completed school year, divided by the total number of students enrolled in that grade for that year; averaged across all grades offered in a school for overall repetition, or across Grades 1-4 for lower primary repetition.

Student test scores: We report student test scores at student level, using data from the MLSS learning assessments, which are norm-referenced tests that cover competencies from Grade 1 through Grade 6. We employ data from a longitudinal sample of students who were in Grade 4 at baseline and progressed to upper primary grades during the course of MESIP (see Sample). We report student scores in English, Chichewa and Mathematics. Percentage scores in each subject are adjusted using Item Response Theory and converted to knowledge scores mean-centered at 500 points with a standard deviation of 100 points. For more on the MLSS learning assessments, see Appendix A.

• Mathematics Knowledge Score: This is measured by the knowledge score of each student in the mathematics test for each school.
• English Knowledge Score: This is measured by the knowledge score of each student in the English test for each school.
• Total Knowledge Score: This is measured by the knowledge score of each student averaged across both subjects for each school.

Data

We employ both administrative data and data from a large-scale longitudinal survey.

Education Management Information System. The Government of Malawi operates an Education Management Information System (EMIS) with data on all public schools in Malawi. The EMIS data, which is collected primarily through an Annual School Census (ASC) completed by Head Teachers and zonal EMIS officers, includes data on a wide range of indicators including enrollment, dropout and repetition rates, staffing, and availability of infrastructure and textbooks.
Because EMIS covers all public schools, its data is available for the entire sample of public primary schools in all districts of Malawi. For our difference-in-difference analysis, we compare data from EMIS 2016, collected during the 2015/16 school year prior to the start of MESIP, with data from EMIS 2020, collected during the 2019/20 school year towards the end of MESIP implementation.14 We employ EMIS data for measurement of PTR, PCR, and repetition rates.

Malawi Longitudinal School Survey. For test scores, we draw on data from the Malawi Longitudinal School Survey (MLSS). The MLSS is an independent, nationally representative survey that provides data on students, teachers, schools and school communities with the objective of capturing and evaluating the impact of MESIP over 2016-2020. The MLSS is designed to align with the MESIP project cycle, both to provide information to the government on implementation and outcomes of the project and to support evaluation of the project. Visits are unannounced.15 Data is collected through observations and interviews: for example, observations of school and classroom facilities; observation of lessons and teaching practices; interviews with Head Teachers, teachers, and members of community committees; and interviews and testing of students in both a cohort and longitudinal sample (see below). Testing is conducted under classroom conditions in a primarily multiple-choice format, supervised by the class teacher as well as MLSS enumerators. For more details of the MLSS instruments and procedures, see Appendix A.

For our difference-in-difference analysis, we compare the data from MLSS baseline and endline. MLSS baseline data collection was conducted in a sample of 559 schools between May and September 2016, just prior to the commencement of MESIP (Asim and Casley Gera, 2024). Endline data collection took place between April 2021 and February 2022, following the completion of the main MESIP activities.
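As an illustration of the two-period, two-group difference-in-difference comparison described above, the following sketch recovers the four terms of the comparison from group means alone. It assumes a saturated specification with no covariates or weights, in which case the constant, time, treatment and interaction terms are simple differences in means. The function name `did_from_means` is ours and is not part of the paper; the inputs are the lower primary PTR means reported in Table 3.

```python
def did_from_means(treat_pre, treat_post, comp_pre, comp_post):
    """Terms of a saturated 2x2 difference-in-difference comparison.

    Assumption: no covariates and no weights, so every term is a
    difference of group means.
    """
    return {
        # Comparison-group mean at baseline
        "constant": comp_pre,
        # Change over time in the comparison group
        "time": comp_post - comp_pre,
        # Baseline gap between treated and comparison groups
        "treatment": treat_pre - comp_pre,
        # Extra change in the treated group relative to the comparison group
        "time_x_treatment": (treat_post - treat_pre) - (comp_post - comp_pre),
    }


# Lower primary PTR means from Table 3 (MESIP vs non-MESIP, 2015-16 vs 2019-20)
ptr = did_from_means(treat_pre=128.46, treat_post=88.24,
                     comp_pre=103.97, comp_post=85.14)

for name, value in ptr.items():
    print(f"{name}: {value:.2f}")
```

Under these assumptions, the printed terms can be compared directly with the lower primary PTR column of Table 4.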
14 EMIS 2020 data is the last collected prior to COVID-19. COVID-19 introduced significant disruption to schooling (see Asim et al., 2022) and to EMIS data collection; we use the last available pre-COVID data to mitigate these potential confounding factors. This is a conservative approach which may underestimate the full extent of impacts from MESIP activities.
15 The first visit to each school is unannounced to help capture the real situation of the school in terms of infrastructure, school performance, school and classroom management practices, student and teacher absenteeism and student learning outcomes. If required to complete all instruments, a second visit is made on an announced or pre-scheduled basis.

Sample

District sample

The eight MESIP disadvantaged districts were selected on the basis of being among the worst-off in the nation in terms of PqTR; female/male teacher ratio; shortages of classrooms; and repetition, promotion, and dropout rates in lower primary. The selection was also conducted in such a way as to ensure the inclusion of at least one district from each of Malawi's six sub-regional education divisions. Ultimately, the chosen districts were Kasungu, Lilongwe Rural West, and Dedza from the large Central region; Mzimba South from the smaller Northern region; and Chikwawa, Machinga, Mangochi, and Thyolo, all from the largest region, Southern (see Tables 1 and 2).

For our comparison sample, we first identify the 26 districts not targeted by the main MESIP interventions. We remove the four districts into which these interventions were extended by MESIP-Extended.16 Although MESIP-Extended supported similar interventions to MESIP, there were differences in sequencing and implementation modality which make it difficult to robustly and simultaneously evaluate both programs.

We also exclude Malawi's four urban districts, as the MESIP interventions were designed for implementation in rural areas.
Finally, we also exclude Likoma district, a small island in Lake Malawi whose geography creates a very particular set of schooling challenges. With these exclusions, we are left with a group of 17 rural education districts as a comparison group. See Table 1 for a summary of the selection.

16 MESIP-Extended extended the MESIP interventions focused in the eight disadvantaged districts to four more districts which were similarly disadvantaged: Dowa, Mulanje, Nkhotakota, and Rumphi. MESIP-Extended did not extend the RBF incentives to new districts.

School Sample

For indicators drawn from EMIS, our sample is all public primary schools in the eight disadvantaged districts and the 17 comparison districts (a total of 4,428 schools).17 For test scores, which are drawn from MLSS, our sample is all public primary schools in the eight disadvantaged districts and the 17 comparison districts which are also part of the MLSS sample and for which baseline and endline data is available (a total of 444 schools). The MLSS sample is constructed using stratified probability proportional to size (PPS) sampling, with strata defined based on the six educational divisions. From each stratum, a random sample of schools was selected using PPS, using the number of schools in each stratum as the measure of size. The PPS process generated a recommended sample size of 700 schools, 571 within the 25 study districts. From these districts, 468 schools were included in the MLSS sample and visited at least once, of which 444 were visited in both the baseline and endline rounds.

17 There were 1,856 schools in the eight disadvantaged districts and 2,572 schools in the 17 comparison districts. We include schools which are present in both EMIS 2016 and EMIS 2020 data.
On average, our sample includes 81 percent of the PPS-required schools in each district: an average of 74 percent in the eight disadvantaged districts and 85 percent in the 17 comparison districts (see Table 2).

Student Sample

For test scores, we report student-level scores from a longitudinal sample of students. As the MLSS is a longitudinal study, it employs both a cohort and a longitudinal sample for students. A gender-balanced random sample of 25 Grade 4 students per school (13 girls and 12 boys) is selected at baseline. We then select 15 of these at random (8 girls and 7 boys) for retesting at endline. Students who have dropped out or transferred to other schools are traced to their new schools or their homes and complete the learning assessment and a modified version of the student interview. Two students are additionally selected as a reserve sample to replace students who have died, left Malawi, or cannot be traced. We expect to trace 90 percent of the selected students at endline.18 We compare the baseline and endline scores of each student to calculate the student-level difference in learning.

Attrition

There is no school-level attrition. At student level, our final sample includes data from 6,940 students of an original longitudinal sample of 7,594 for the sampled schools (91 percent).

Summary statistics

Table 3 shows summary statistics for the key outcomes of interest for the schools in the eight disadvantaged districts and the 17 comparison districts.

Results

Implementation of MESIP

Auxiliary teachers. In total, 478 ATs (362 male and 116 female) were hired by schools in the eight disadvantaged districts (Ministry of Education, 2021). In most cases, these ATs served for 1-2 years before being deployed into the formal teaching workforce. Before MESIP closed, ATs were typically (but not exclusively) hired into the same schools in which they had served as ATs, formalizing the improvements in staffing gained from their hiring.
18 In addition to this longitudinal sample, at endline the MLSS also selects a cohort sample of 15 students in Grade 4 at endline (8 girls and 7 boys) for testing. In this study, we focus on test scores for the longitudinal sample of students. In addition to testing, students complete a short interview at both baseline and endline.

Learning Shelters. In total, 1,345 learning shelters were constructed by schools in the eight disadvantaged districts. The Government estimates that this construction enabled 79,500 students to move from open-air classrooms to learning in a shelter (Ministry of Education, 2021). The majority of shelters (754) were constructed within the first year of the receipt of grants (World Bank, a). In addition, 542 change rooms were constructed for girls' menstrual needs.

Classroom construction. In addition to learning shelters, MESIP constructed 500 classrooms in the eight disadvantaged districts. The construction was subject to significant delays, with fewer than half (224) constructed by October 2019 and the remainder constructed prior to project closing in July 2021. A total of 342 sanitation blocks were also constructed.

Deployment of regular government teachers. Prior to the implementation of MESIP, there had been severe delays in hiring of teachers into the regular government teaching workforce, with teachers typically waiting two years after graduation from the IPTE before being hired into schools. In 2017, the government deployed 4,900 new teachers. In response to the results-based financing provided under MESIP, this deployment was substantially targeted towards the disadvantaged districts, with 58 percent of these teachers deployed to the eight disadvantaged districts in a bid to reduce the historical understaffing.
In 2018, to reduce the shortage of teachers and the waiting period for IPTE graduates, the government conducted a 'double deployment', hiring both IPTE10 and IPTE11 teachers: more than 8,000 in total,19 compared to the typical 4,000-5,000. Again, this 'double deployment' was targeted to the eight disadvantaged districts, with around 4,000 teachers allocated to these districts in response to the results-based financing.

Results-Based Financing. The Government fully achieved all three of the staffing-related RBF targets.20 A Primary Teacher Management Strategy was completed and approved by Government for use to improve teacher distribution as well as more generally supporting improved teacher hiring, promotion, transfer and management. The PqTR in Grades 1-2 in the eight disadvantaged districts was reduced from 166:1 to 107:1, a reduction of 36 percent. The female/male teacher ratio in Grades 5-8 in the eight disadvantaged districts was increased from 0.31 to 0.37, an increase of 19 percent (Ministry of Education, 2021).

19 Both 2017 and 2018 deployments also included approximately 1,500 Open Distance Learning (ODL) graduates.
20 The Government also fully achieved targets relating to the passage of national strategies and plans to improve the female/male teacher ratio and reduce repetition rates, but only partially achieved the target relating to repetition (reducing the repetition rate from 23.7 percent to 21.4 percent). In total, the government successfully triggered the release of US$12.2 million out of a maximum US$13.5 million in results-based financing.

Difference-in-difference analysis: Intermediate outcomes

Table 4 shows the impact of MESIP on PCR and PTR, drawn from EMIS data. The first row confirms that PCRs and PTRs improved in the comparison group of schools between 2016 and 2020; the second confirms that baseline PCRs and PTRs were higher in the MESIP districts. The third row shows difference-in-difference results: PCRs in lower primary decreased significantly more rapidly in MESIP districts, by 14.8 pupils per classroom; overall PTRs decreased by 12.3 pupils per teacher; and PTRs in lower primary experienced a large reduction of 21.4 pupils per teacher.

Difference-in-difference analysis: Outcomes

Repetition

Table 5 shows difference-in-difference analysis of the impact of MESIP on repetition rates. We find that repetition rates increased slightly between 2016 and 2020 in both MESIP and non-MESIP districts. However, we find that in lower primary the increase was significantly smaller in MESIP districts, by 1.0 percentage point (the Time x Treatment coefficient for lower primary in Table 5); the corresponding overall difference of 0.7 percentage points is not statistically significant.

In order to confirm the relationship between class size reduction and repetition, we explore the dynamics of change in PTR and lower primary repetition rates, exploiting the fact that both PTR and repetition rate are measured using EMIS data, which is available on an annual basis. Table 6 shows the year-by-year change in PTR in both MESIP and comparison districts. A particularly rapid reduction in PTRs in MESIP districts occurs between the 2016/17 and 2017/18 school years, reflecting the commencement of MESIP, the allocation of auxiliary teachers to schools, and the targeting of regular deployment of teachers to the eight disadvantaged districts.

To more clearly assess the relationship between PTR reduction and repetition rates, Table 7 presents year-on-year difference-in-difference analyses of repetition rates for a particular subset of treatment and control schools for three time periods: 2017/18 to 2018/19, 2018/19 to 2019/20, and 2019/20 to 2020/21.
For treatment schools, for each year-on-year comparison, we limit the analysis to schools in the MESIP districts which had an overall PTR above 90 in the first year of the comparison; received additional teachers, either auxiliary teachers or new regular government teachers; and as a result had a PTR below 90 in the second year of the comparison. For control schools, we limit the analysis to schools in the control districts which obtained one or more additional teachers but did not reduce PTR from above to below 90. This is a lagged analysis, with repetition measured one year following the observed change in PTR. For example, where we report changes in repetition in the period 2017/18-2018/19, the treatment schools are those whose PTR fell below 90 between 2016/17 and 2017/18. This is because repetition rates are established at the end of the academic year when promotion decisions are made, so the first set of promotion decisions reflecting the impact of this change in PTR would be made at the end of the 2017/18 year and reflected in the data for 2018/19.

We find that MESIP reduced repetition rates in lower primary by a large amount, 5.5 percentage points, between 2017/18 and 2018/19, and that this reduction is statistically significant in comparison to control (difference-in-difference). This suggests that the large reduction in PTR observed in 2017/18 had an impact on promotion decisions made at the end of that year, reflecting improvements in student performance.

Test scores

Table 8 shows the impact of MESIP on test scores. Again, the first row confirms that scores increased in the comparison districts and the second shows that scores in the MESIP districts were below those in the comparison districts at baseline. The third row shows difference-in-difference results. In terms of total scores, students in the MESIP districts achieved an increase in scores 39 points greater than those in the comparison districts, equivalent to several months' additional learning.
The average gain in learning for students in the MESIP districts between baseline and endline was 200 points, one-fifth greater than in the comparison districts. The treatment effect is larger than the baseline gap in learning, demonstrating that MESIP enabled students in the eight disadvantaged districts to close the gap with those in the comparison districts. A similar pattern persists with English, Math and Chichewa scores specifically, with Chichewa experiencing the largest impacts from MESIP: students' average Chichewa scores in the eight districts increased by 190 points, one-third more than in the comparison districts.

Gender

A number of aspects of the MESIP interventions were expected to particularly support girls' learning, including the provision of sanitary facilities using grants and the results-based support for improved utilization of female teachers. Table 9 presents difference-in-difference analysis of the impact of MESIP on test scores, interacted with whether a student is female. Here again, we find suggestive evidence that the effects of MESIP were greater among female students, by approximately ten points (in total scores). However, these impacts are not statistically significant, as the analysis is not adequately powered to detect significant impacts (there are 2,914 female students in our sample, 52 percent of the total sample).

Conclusions

Our analysis confirms that, where class sizes are extremely large (above 90 students per class), reductions in class size through hiring of additional teachers and construction of classrooms can lead to reductions in student repetition rates and improvements in test scores. The provision of a package of interventions to reduce class sizes led to significant reductions in lower primary repetition rates and gains of several months' learning in Grade 4 test scores. We also find suggestive evidence that the treatment reduced the gap in learning between girls and boys.
These findings have important implications in low-income countries, where class sizes above 90 students remain common in lower primary grades.

Our analysis also confirms that a coordinated package of interconnected interventions, targeted to districts with longstanding disadvantages in conditions and outcomes, can close the gap in learning with other districts. This also has important implications in low-income countries, where inter-district disparities are a key driver of low overall learning outcomes (Crouch and Rolleston, 2017).

In Malawi, with MESIP having substantially addressed inter-district variation in conditions and outcomes, the government has now shifted its investment paradigm to focus on school-level variation. The successor project to MESIP, the Malawi Education Reform Program (MERP), active from 2022 to 2025, scales up the support to hiring of auxiliary teachers and construction of learning shelters introduced under MESIP, providing dedicated finance for auxiliary teachers and construction of low-cost classrooms (using a model adapted from the MESIP learning shelter, with an improved design). Under MERP, this support is targeted specifically to schools with severe overcrowding in lower primary (defined as PqTR or PCR above 90), regardless of district. MERP aims to reduce the share of schools with severe shortages of teachers in lower primary from 70 percent to 30 percent, and the share of schools with severe shortages of classrooms from 61 percent to 30 percent, by 2025.

References

Altinok, N. and G. Kingdon (2011). "New Evidence on Class Size Effects: A Pupil Fixed Effects Approach". Oxford Bulletin of Economics and Statistics 74, 203–234.
Angrist, J. and V. Lavy (1999). "Using Maimonides' Rule to Estimate the Effect of Class Size on Scholastic Achievement". The Quarterly Journal of Economics 114, 553–575.
Angrist, J., V. Lavy, J. Leder-Luis, and S. Adi (2017). "Maimonides' Rule Redux". NBER Working Paper No. 23486.
Asim, S. and R. Casley Gera (2024). "What Matters for Learning in Malawi? Evidence from the Malawi Longitudinal School Survey".
Bashir, S., M. Lockheed, E. Ninan, and J.-P. Tan (2018). Facing Forward: Schooling for Learning in Africa.
Bold, T., M. Kimenyi, G. Mwabu, A. Ng'ang'a, and J. Sandefur (2018). "Experimental Evidence on Scaling Up Education Reforms in Kenya". Journal of Public Economics 168.
Cartwright, N. (2011). "A Philosopher's View of the Long Road from RCTs to Effectiveness". The Lancet 377.
Crouch, L. and C. Rolleston (2017). "Raising the Floor on Learning Levels: Equitable Improvement Starts with the Tail". https://riseprogramme.org/sites/default/files/publications/RISE%20Equity%20Insight%20UPDATE.pdf. Accessed 1 December 2022.
Datta, S. and G. Kingdon (2021). "Class Size and Learning: Has India Spent Too Much on Reducing Class Size?" RISE Working Paper 21/059. https://riseprogramme.org/sites/default/files/2021-01/RISE_WP-0059_Datta_Kingdon.pdf. Accessed 1 February 2022.
Deaton, A. and N. Cartwright (2018). "Understanding and Misunderstanding Randomized Controlled Trials". Social Science & Medicine 210, 2–21.
Duflo, E., P. Dupas, and M. Kremer (2015). "School Governance, Teacher Incentives, and Pupil-Teacher Ratios: Experimental Evidence from Kenyan Primary Schools". Journal of Public Economics 123.
Hoxby, C. (2000). "The Effects of Class Size on Student Achievement: New Evidence from Population Variation". The Quarterly Journal of Economics 115, 1239–1285.
Krueger, A. (1999). "Experimental Estimates of Education Production Functions". The Quarterly Journal of Economics.
Ministry of Education (2020). Malawi Education Statistics 2019/20.
Ministry of Education (2021). Project Completion Report: Malawi Education Sector Improvement Project. Mimeo.
Ministry of Education (2022). Malawi Education Statistics 2021/22.
MoEST (2015). Education Management Information System 2014-15 [database].
Mulkeen, A. (2010). Teachers in Anglophone Africa: Issues in Teacher Supply, Training, and Management. World Bank Publications.
Muralidharan, K. and V. Sundararaman (2013). "Contract Teachers: Experimental Evidence from India". NBER Working Paper No. 19440.
Pritchett, L. (2019). "Randomizing Development: Method or Madness?" Mimeo.
USAID (2013). "Malawi National Early Grade Reading Assessment Survey: Final Assessment". https://ierc-publicfiles.s3.amazonaws.com/public/resources/Madagascar%20national%20EGRA.pdf. Accessed 1 December 2022.
Vivalt, E. (2020). "How Much Can We Generalize from Impact Evaluations?" Journal of the European Economic Association.
Walter, T. (2018). "Misallocation of State Capacity?" PhD Thesis. http://etheses.lse.ac.uk/3852/1/Walter_misallocation-of-state-capacity.pdf. Accessed 1 December 2022.
World Bank. Malawi Education Sector Improvement Project (MESIP) Mid-Term Review, Aide Memoire. Mimeo.
World Bank. World Development Indicators: Persistence to Grade 5. https://data.worldbank.org/indicator/SE.PRM.PRS5.ZS. Accessed 1 December 2022.
World Bank (2015). Project Appraisal Document for a Malawi Education Sector Improvement Project.
Tables

Table 1: District Sampling

Division          District         PqTR    Classrooms  Promotion  Repetition  Dropout    Sample
                                           required    rate       rate        rate
                                                       (Std 1-4)  (Std 1-4)   (Std 1-4)
Central Eastern   Dowa             131.3   908         68.7       25.5        5.8        **
                  Kasungu          138.1   1217        69.3       23.1        7.6        MESIP
                  Nkhotakota       136.2   746         64.4       29.9        5.7        **
                  Ntchisi          107.9   435         64.3       24.7        11.0       Comp
                  Salima           127.8   587         64.9       26.8        8.3        Comp
Central Western   Dedza            161.2   1087        66.0       26.7        7.4        MESIP
                  Lilongwe City    109.0   713         80.2       18.4        1.4        *
                  Lilongwe RE      178.9   817         71.9       17.7        10.5       Comp
                  Lilongwe RW      137.7   1016        69.2       22.0        8.8        MESIP
                  Mchinji          127.6   969         66.0       28.2        5.7        Comp
                  Ntcheu           125.9   760         68.5       27.8        3.7        Comp
Northern          Chitipa          76.4    519         72.8       23.4        3.7        Comp
                  Karonga          105.1   799         71.3       26.4        2.3        Comp
                  Likoma           49.3    8           69.7       30.3        0.0        *
                  Mzimba North     86.8    689         69.5       26.1        4.4        Comp
                  Mzimba South     101.3   858         71.0       21.3        7.7        MESIP
                  Mzuzu City       87.2    242         80.6       18.6        0.8        *
                  Nkhata Bay       83.4    555         70.4       27.3        2.3        Comp
                  Rumphi           73.7    507         72.4       21.1        6.5        **
Shire Highlands   Chiradzulu       117.2   516         70.9       28.0        1.1        Comp
                  Mulanje          144.4   1137        73.4       23.6        3.0        **
                  Phalombe         136.3   499         70.6       22.1        7.2        Comp
                  Thyolo           118.4   1258        68.4       27.3        4.3        MESIP
Southern Eastern  Balaka           127.5   650         63.9       28.0        8.1        Comp
                  Machinga         173.4   916         68.0       23.8        8.2        MESIP
                  Mangochi         191.5   1158        62.7       25.5        11.8       MESIP
                  Zomba Rural      145.5   930         69.6       23.8        6.6        Comp
                  Zomba Urban      65.0    107         81.1       18.2        0.7        *
Southern Western  Blantyre City    104.2   422         82.2       15.9        1.8        *
                  Blantyre Rural   110.4   651         74.7       20.3        5.0        Comp
                  Chikwawa         167.4   701         70.7       22.4        6.9        MESIP
                  Mwanza           125.7   201         67.8       31.0        1.2        Comp
                  Neno             114.5   204         69.6       30.4        0.0        Comp
                  Nsanje           141.3   245         70.2       18.4        11.4       Comp

Notes: Yellow highlight denotes MESIP eight disadvantaged districts (marked MESIP in the Sample column). Green highlight denotes comparison districts (marked Comp). * denotes urban district/Likoma. ** denotes MESIP-Extended district.
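The selection of the comparison group summarized in Table 1 can be tallied as a short check. The sketch below is illustrative only: the grouping dictionary and its key names are ours, and the district labels are transcribed from the Sample column of Table 1 and the accompanying notes.

```python
# District groups transcribed from the Sample column of Table 1.
# The dictionary layout and key names are illustrative, not the authors'.
groups = {
    "mesip": ["Kasungu", "Dedza", "Lilongwe RW", "Mzimba South",
              "Thyolo", "Machinga", "Mangochi", "Chikwawa"],
    "mesip_extended": ["Dowa", "Nkhotakota", "Rumphi", "Mulanje"],
    "urban": ["Lilongwe City", "Mzuzu City", "Zomba Urban", "Blantyre City"],
    "likoma": ["Likoma"],
    "comparison": ["Ntchisi", "Salima", "Lilongwe RE", "Mchinji", "Ntcheu",
                   "Chitipa", "Karonga", "Mzimba North", "Nkhata Bay",
                   "Chiradzulu", "Phalombe", "Balaka", "Zomba Rural",
                   "Blantyre Rural", "Mwanza", "Neno", "Nsanje"],
}

total_districts = sum(len(names) for names in groups.values())

# Step 1: districts not targeted by the main MESIP interventions
not_targeted = total_districts - len(groups["mesip"])

# Step 2: drop MESIP-Extended, urban districts, and Likoma
remaining = (not_targeted
             - len(groups["mesip_extended"])
             - len(groups["urban"])
             - len(groups["likoma"]))

print(total_districts, not_targeted, remaining)
```

The remaining count should equal the number of districts labelled as comparison districts in the table.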
24 Table 2: School sample District Total schools PPS MLSS Represent (EMIS) sample size sample size -ativeness (%) Chikwawa 176 21 20 95% Dedza 236 39 25 64% Kasungu 338 46 33 72% Lilongwe Rural West 241 28 28 100% MESIP eight disadvantaged districts Machinga 161 24 11 46% Mangochi 259 35 19 54% Mzimba South 302 32 27 84% Thyolo 179 26 19 73% Subtotal 1892 251 182 81% Balaka 154 17 15 88% Blantyre Rural 157 18 13 72% Chiradzulu 88 13 13 100% Chitipa 170 26 14 54% Karonga 167 19 19 100% Lilongwe Rural East 204 28 25 89% Mchinji 196 19 11 58% Mwanza 45 9 9 100% Comparison Mzimba North 259 32 23 72% 17 districts Neno 71 9 9 100% Nkhata Bay 188 25 20 80% Nsanje 104 12 12 100% Ntcheu 237 29 26 90% Ntchisi 143 18 15 83% Phalombe 88 11 10 91% Salima 139 16 13 81% Zomba Rural 193 19 15 79% Subtotal 2603 320 262 85% Total 4495 571 468 84.9 25 Table 3: Lower Primary Summary Statistics and T-test differences in MESIP and Non-MESIP districts by year MESIP districts Non-MESIP districts EMIS 2015-16 EMIS 2019-20 Difference EMIS 2015-16 EMIS 2019-20 Difference Mean/(SD) Mean/(SD) T-test/(SE) Mean/(SD) Mean/(SD) T-test/(SE) Enrollment 625.73 647.00 21.27 536.91 542.26 5.35 (425.60) (399.73) (13.55) (382.41) (382.85) (10.67) Teachers 5.68 7.76 2.08 ∗ ∗∗ 5.79 6.61 0.82 ∗ ∗∗ (4.30) (5.07) (0.15) (4.67) (4.61) (0.13) PCR 159.94 137.12 −22.82 ∗ ∗∗ 131.02 122.99 −8.03 ∗ ∗∗ 26 (102.78) (87.93) (3.14) (90.76) (84.49) (2.45) PTR 128.46 88.24 −40.22 ∗ ∗∗ 103.97 85.14 −18.83 ∗ ∗∗ (73.86) (34.15) (1.89) (60.56) (37.79) (1.41) Repetition Rate 0.25 0.26 0.01 ∗ ∗∗ 0.26 0.28 0.02 ∗ ∗∗ (0.12) (0.10) (0.00) (0.11) (0.11) (0.00) Dropout Rate 0.05 0.05 0.01 ∗ ∗ 0.03 0.03 −0.00 ∗ ∗ (0.07) (0.07) (0.00) (0.05) (0.05) (0.00) Schools (N) 2572 - 1856 - Significance levels: * p<0.10, ** p<0.05, *** p<0.01 Table 4: MESIP Impact on PCR and PTR between 2016 and 2020 (1) (2) (3) (4) PCR PCR Lower Primary PTR PTR Lower Primary Change from Baseline -4.39∗∗ -8.03∗∗ -12.07∗∗∗ -18.83∗∗∗ to Endline (Time) (1.65) 
(2.45) (0.86) (1.41) Schools in MESIP districts 18.66∗∗∗ 28.92∗∗∗ 14.92∗∗∗ 24.49∗∗∗ (Treatment) (2.00) (2.98) (1.22) (2.09) Time x Treatment -3.34 -14.79∗∗∗ -12.32∗∗∗ -21.39∗∗∗ (2.72) (3.98) (1.41) (2.36) Constant 105.37∗∗∗ 131.02∗∗∗ 79.65∗∗∗ 103.97∗∗∗ (1.20) (1.79) (0.71) (1.19) All schools 4428 Treatment [MESIP districts] 1856 Control [Non-MESIP districts] 2572 Note: ∗ p < 0.05, ∗∗ p < 0.01, ∗∗∗ p < 0.001 27 Table 5: MESIP Impact on repetition rates between 2016 and 2020 (1) (2) Repetition Rate Repetition Rate Lower Primary Change from Baseline to Endline (Time) 0.016∗∗∗ 0.023∗∗∗ (0.003) (0.003) Schools in MESIP districts (Treatment) -0.011∗∗∗ -0.006 (0.003) (0.004) Time x Treatment -0.007 -0.010∗ (0.004) (0.005) Constant 0.251∗∗∗ 0.257∗∗∗ (0.002) (0.002) All schools 4428 Treatment [MESIP districts] 1856 Control [Non-MESIP districts] 2572 Note: ∗ p < 0.05, ∗∗ p < 0.01, ∗∗∗ p < 0.001 28 Table 6: Lower Primary Summary Statistics: Schools in MESIP and Non-MESIP districts EMIS 2015-16 EMIS 2016-17 EMIS 2017-18 EMIS 2018-19 EMIS 2019-20 Non-MESIP MESIP Non-MESIP MESIP Non-MESIP MESIP Non-MESIP MESIP Non-MESIP MESIP Mean/[S.D.] Mean/[S.D.] Mean/[S.D.] Mean/[S.D.] Mean/[S.D.] Mean/[S.D.] Mean/[S.D.] Mean/[S.D.] Mean/[S.D.] Mean/[S.D.] 
Enrollment 536.91 625.73 551.66 644.63 549.82 670.43 547.41 665.38 542.26 647.00 29 [382.41] [425.60] [393.24] [393.87] [383.94] [410.04] [378.71] [405.93] [382.85] [399.73] Teachers 5.79 5.68 6.08 6.24 6.54 7.34 6.75 7.77 6.61 7.76 [4.67] [4.30] [4.79] [4.48] [4.73] [4.68] [4.82] [5.01] [4.61] [5.07] PTR 103.97 128.46 98.51 117.62 88.72 97.60 84.47 90.72 85.14 88.24 [60.56] [73.86] [50.80] [60.25] [42.24] [42.35] [35.88] [34.75] [37.79] [34.15] Schools (N) 2572 1856 Table 7: Lagged analysis: Repetition Rate in Lower Primary Period Period Period 2017-18 / 2018-19 2018-19 / 2019-20 2019-20 / 2020-21 Control 0.283 ∗ ∗∗ 0.283 ∗ ∗∗ 0.291 ∗ ∗∗ (0.013) (0.009) (0.010) Time 0.000 −0.002 −0.057 ∗ ∗∗ (0.018) (0.014) (0.014) Treatment 0.006 −0.057 ∗ ∗∗ 0.023 (0.015) (0.019) (0.029) DiD (Treatment and Time) −0.055 ∗ ∗ −0.015 −0.014 (0.021) (0.024) (0.035) Control [Non-MESIP] 80 151 164 Treatment [MESIP] 214 48 26 Note: Robust standard errors in parentheses. Schools in the treatment group are in MESIP districts, received additional teachers in the first year of comparison, and reduced their overall PqTR from above 90 in the first year of comparison to below 90 in the second year. Schools in control group are in 17 comparison districts, received additional teachers in the first year of comparison, but did not reduce their overall PqTR from above 90 in the first year of comparison to below 90 in the second year. 
* p<0.10, ** p<0.05, *** p<0.01

Table 8: MLSS Knowledge scores in 8 MESIP vs 17 Non-MESIP districts

                                          (1)          (2)          (3)          (4)
                                          Total        English      Math         Chichewa
Change from Baseline to Endline (Time)    160.58***    176.53***    162.19***    142.25***
                                          (2.45)       (2.68)       (2.81)       (2.86)
Schools in MESIP districts (Treatment)    -27.98***    -26.63***    -30.83***    -27.13***
                                          (2.28)       (2.35)       (2.64)       (3.04)
Time x Treatment                          39.26***     29.87***     41.21***     48.10***
                                          (3.82)       (4.12)       (4.36)       (4.52)
Constant                                  425.20***    418.62***    423.52***    432.82***
                                          (1.46)       (1.53)       (1.67)       (1.92)
Students: In treatment | In control       2255 | 3352
Schools: In treatment | In control        182 | 262

Note: Harmonized scores. * p < 0.05, ** p < 0.01, *** p < 0.001

Table 9: MLSS Knowledge scores in treatment districts interacted with time and student gender

                                          (1)          (2)          (3)          (4)
                                          Total        English      Math         Chichewa
Change from Baseline to Endline (Time)    161.67***    178.65***    164.66***    141.05***
                                          (3.57)       (3.89)       (4.09)       (4.13)
Schools in MESIP districts (Treatment)    -22.32***    -17.33***    -28.51***    -20.88***
                                          (3.28)       (3.34)       (3.83)       (4.38)
Female student                            -5.03        -1.10        -13.23***    -0.46
                                          (2.93)       (3.06)       (3.34)       (3.83)
Time x Treatment                          33.81***     21.45***     37.32***     43.03***
                                          (5.58)       (5.99)       (6.37)       (6.61)
Time x Female student                     -2.01        -3.99        -4.64        2.40
                                          (4.92)       (5.37)       (5.62)       (5.73)
Treatment x Female student                -10.62*      -17.61***    -4.08        -11.85
                                          (4.56)       (4.69)       (5.27)       (6.09)
Time x Treatment x Female student         9.98         15.66        7.05         9.20
                                          (7.65)       (8.24)       (8.72)       (9.05)
Constant                                  427.79***    419.19***    430.33***    433.06***
                                          (2.09)       (2.16)       (2.43)       (2.71)
Students: In treatment | In control       2255 | 3352
Schools: In treatment | In control        182 | 262

Note: Harmonized scores. * p < 0.05, ** p < 0.01, *** p < 0.001

Appendices

A. Malawi Longitudinal School Survey

The Malawi Longitudinal School Survey (MLSS) collects extensive data on schools, classrooms, teachers, Grade 4 students, community members and parents. This Appendix provides summary details. For additional details, see section .

Instruments

The survey contains the following instruments:

1. Observation of school and classroom facilities
2. Lesson observation
3. Head Teacher interview, including details of teachers and committee members; information about their background and procedures; and school information from records
4. Student interview
5. Student learning assessment
6. Teacher interview
7. Teacher knowledge assessment
8. Community interviews with members of the School Management Committee, Parent-Teacher Association Executive Committee, and Mother Group
9. Group Village Headman interview

All instruments were included in all rounds, except the Group Village Headman interview, which was not included in the 2016 phase of baseline.21 The MLSS instruments are based on similar tools used as part of the Service Delivery Indicators (SDI) survey implemented by the World Bank. The SDI instruments were adapted with additional indicators that were appropriate for the Malawian context and/or specific to the MESIP program and related impact evaluations.

Learning assessments: MLSS includes learning assessments in English and Mathematics. These subjects not only capture students' literacy and numeracy skills but also allow a wide range of cognitive skills to be tested. The curriculum of these subjects is relatively standardized across schools. The inclusion of Mathematics and English also makes the test comparable with other standardized international tests. The assessments are targeted to Grade 4 students but contain items aligned with the curricula for Grades 1-6. The test was designed by a psychometrician in collaboration with experts who had prior experience in designing similar tests and were familiar with the primary school syllabus of Malawi.
Teachers in government schools were also consulted during the design and formulation of the test items so as to keep the structure of questions as close as possible to textbooks. Items were developed in reference to international tests and adapted for the Malawian context. Student percentage scores are converted to a mean-centred scale, centred at 500, using Item Response Theory.

21 A Group Village Headman is an intermediary-level official in Malawi's Traditional Authority structure, broadly analogous to a Village Chief.

Sampling

The sampling frame for the MLSS was derived from the most up-to-date list of schools available prior to baseline, the 2015 EMIS (MoEST, 2015).22 Twenty percent of urban schools, which were mainly concentrated in the four major cities of the country (i.e. Blantyre, Lilongwe, Mzuzu and Zomba cities), were randomly selected. For rural schools, stratified probability proportional to size (PPS) sampling was used, with strata defined based on the six educational divisions. From each stratum, a random sample of schools was selected using PPS, using the number of schools in each stratum as the measure of size.23 At the next stage, for districts that had few schools selected in the first round of PPS sampling, random oversampling was conducted to increase the final number of schools in each district to about 24. This oversampling allows district-specific analysis. The urban and the rural samples were then combined to form the final survey sample of 924 schools.

As the MLSS is a longitudinal study, it employs both a cohort and a longitudinal sample for students. A gender-balanced random sample of 25 Grade 4 students per school (13 girls and 12 boys) is selected at baseline. We then select 15 of these at random (8 girls and 7 boys) for resurvey at endline. Students who have dropped out or transferred to other schools are traced to the new schools or their homes and complete the learning assessment and a modified version of the student interview.
Two students are additionally selected as a reserve sample to replace students who have died, left Malawi, or cannot be traced. In addition, a new cohort of 15 Grade 4 students is surveyed at endline.

For teachers, a primarily longitudinal sample is used. Ten teachers per school are selected at baseline, using a protocol which ensures representation of lower and upper primary and of female and male teachers while maintaining random selection. All of the sampled teachers are resurveyed at endline if eligible. Teachers who have transferred to new schools are tracked and administered a modified version of the interview, while those who have died, left Malawi, left teaching, or cannot be traced are replaced with teachers who have more recently joined the school or were not selected at baseline.

B. Supplementary Data

Supplementary data associated with this article can be found in the Online Annex at: https://bit.ly/MESIP-Overall-Evaluation-Annex.

22 The original sample frame contained 5,738 schools with identifier variables such as division, district and zone. A total of 323 private schools were removed from the sample frame, leaving an overall total of 5,415 primary schools subordinated to government or religious agencies.

23 The number of schools is used as a measure of stratum size instead of enrollment, as the EMIS enrollment data were found to be unreliable in many instances.
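As a reading aid for the Sampling subsection of Appendix A, the rural selection steps can be sketched in code. This is only one possible reading and not the survey team's actual procedure: it assumes that using the number of schools per stratum as the measure of size amounts to allocating the sample across divisions in proportion to their school counts, followed by simple random selection within each division and a random top-up of under-represented districts. The function names, the record fields (`id`, `division`, `district`) and the seed are hypothetical; only the top-up target of about 24 schools per district comes from the text.

```python
import random
from collections import defaultdict


def allocate_proportional(stratum_sizes, total_sample):
    """Split total_sample across strata in proportion to their school counts.

    Uses largest-remainder rounding so the allocation sums to total_sample.
    """
    total = sum(stratum_sizes.values())
    quotas = {s: total_sample * n / total for s, n in stratum_sizes.items()}
    alloc = {s: int(q) for s, q in quotas.items()}
    leftover = total_sample - sum(alloc.values())
    # Hand the remaining slots to the strata with the largest fractional parts.
    by_remainder = sorted(quotas, key=lambda s: quotas[s] - alloc[s], reverse=True)
    for s in by_remainder[:leftover]:
        alloc[s] += 1
    return alloc


def draw_rural_sample(schools, total_sample, district_target=24, seed=0):
    """Hypothetical two-step draw: stratified selection, then district top-up.

    schools: list of dicts with keys 'id', 'division' and 'district'
    (assumed field names, not taken from the EMIS).
    """
    rng = random.Random(seed)

    # Step 1: group the frame by division (the strata) and allocate the sample.
    by_division = defaultdict(list)
    for school in schools:
        by_division[school["division"]].append(school)
    alloc = allocate_proportional(
        {d: len(members) for d, members in by_division.items()}, total_sample
    )

    # Step 2: simple random selection within each division.
    chosen = {}
    for division, members in by_division.items():
        k = min(alloc[division], len(members))
        for school in rng.sample(members, k):
            chosen[school["id"]] = school

    # Step 3: random oversampling in districts that fall short of the target.
    by_district = defaultdict(list)
    for school in schools:
        by_district[school["district"]].append(school)
    for district, members in by_district.items():
        have = sum(1 for s in chosen.values() if s["district"] == district)
        pool = [s for s in members if s["id"] not in chosen]
        extra = min(max(district_target - have, 0), len(pool))
        for school in rng.sample(pool, extra):
            chosen[school["id"]] = school

    return list(chosen.values())
```

Because the top-up step raises the selection probability of schools in small or under-sampled districts, any analysis pooling districts from a sample drawn this way would need weights that reflect those unequal probabilities.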