Youth Development Notes

Evaluating Youth Interventions

Youth development projects aim to improve the lives and livelihoods of young people around the world. Interventions for youth are often multi-sectoral in nature, ranging from job- and life-skills development to programs for better health and nutrition. Rigorous impact evaluation is key to producing the knowledge base required by policymakers and practitioners to choose among different options, and implement the most cost-effective projects. This note outlines some approaches to producing evidence of what works in the context of youth development projects, and looks at expanding the set of outcome indicators to more fully capture the effects of these projects on the welfare of young people around the world.

Today's youth (15-24) constitute the largest cohort ever to enter the transition to adulthood. Nearly 90% live in developing countries, and the challenges they face--low quality education, lack of marketable skills, high rates of unemployment, crime, early pregnancy, social exclusion, and the highest rates of new HIV/AIDS infections--are costly to themselves and to society at large. Client demand for policy advice on how to tap the enormous potential of youth is large and growing. This series aims to share research findings and lessons from the field to address these important cross-sectoral topics.

Volume II, Number 5, June 2007
www.worldbank.org/childrenandyouth

the challenge of incorporating impact evaluation into youth projects

". . . few solid evaluations of youth programs in developing countries unambiguously identify the causality from policy to program to effect . . . many (youth) programs fall into the promising but unproven camp . . ."
--World Development Report 2007 (WDR07), Development and the Next Generation

The evaluation of youth development projects poses special challenges, both conceptual and logistical, particularly if they are multi-sectoral. Youth development projects are often diffuse in nature and scope, extend over a long period of time, vary widely across applications, and have outcomes across a range of sectors. These challenges must be addressed in an evaluation to ensure that causality is well established and that outcomes are adequately measured. For example, when looking at the effects of a youth intervention on employment, we know that obtaining a job is also a function of health and schooling. Alternatively, we may want to know if discouraging girls from early marriage is more effective when girls are in school. We can use impact evaluation to isolate the impact of any one component of a youth intervention, test the optimal combination of interventions in different contexts, or look at potential spillover effects across populations.

Although evaluations are usually narrowly defined, the multifaceted nature of youth transitions means that interventions can have unexpected outcomes. Recent evaluations of youth programs use a wider range of outcome indicators to capture these different impacts. For example, interventions focusing on education have also been shown to affect risky behavior: conditional cash transfers may reduce alcohol use and smoking, early child development may reduce crime, violence and teen pregnancy, and additional schooling may lower the incidence of teen pregnancy and HIV/AIDS (1).

When considering the evaluation of youth projects, there are also a number of logistical considerations to keep in mind. Young people are exceptionally mobile, and it is important to make provisions to track individuals in the evaluation sample over time. Similarly, when interviewing minors, issues of parental consent are important, while at the same time providing the necessary safeguards to protect the young person's privacy.

The remainder of this note outlines key aspects of an impact evaluation design, and considers a number of issues that are unique to the evaluation of youth development projects.

elements of effective impact evaluation design

An impact evaluation design allows us to isolate the effect of a youth development program on a given outcome, or to test the optimal combination of interventions in different contexts. Impact evaluation helps us understand "what is the effect of X on Y?" For example: what is the effect of a youth training program on employment? Ideally, this would be estimated by comparing the employment status of an individual with and without the training program at the same point in time. Given that we will never observe the same individual in two different states at the same time, impact evaluation must attempt to construct a plausible alternative for comparison, or counterfactual: that is, "what would have happened to the youth without the training program?" As depicted in Figure 1, the program impact is the difference between the observed outcome (the continuous line) and an estimate of the outcome had no program been offered (the dashed line--i.e. the counterfactual).

Counterfactuals are estimated using control groups, that is, a group of individuals who do not participate in a program. Identifying a valid counterfactual is critical to good impact evaluation. Typically, identifying the control group entails determining why one group of individuals was treated and the other was not. Doing this retrospectively can be challenging, especially if the two groups were not randomly selected. There may be unobserved differences between those in the treatment group and those in the control group that affect the outcome, and this will confound measurement of the impact of treatment. When working prospectively in the planning phase of an intervention, one can either explicitly select--or preferably randomly assign--individuals into treatment and control groups.

Identifying a control group

By knowing Who is eligible, Where the intervention will go and When the intervention will be delivered, we can identify a control group that can be used to estimate a valid counterfactual for the estimation of a program's impact. By working within the context of program planning and operations, we can minimize the ethical concerns that may arise from denying treatment to the control group. For example, in the early stages of program implementation, budgetary and logistical constraints usually limit the number of eligible youth or groups that can receive the intervention. Everyone who is eligible will receive the intervention, just not all at the same time. When a project cannot go everywhere at the same time, managers must use some rule to determine where the project will begin and how it will scale up. Provided we understand the scaling-up rules, the individuals who do not receive the intervention in the early stages can provide valid controls for those who do.

Suppose that 100 localities are identified as the areas of highest youth unemployment, but budgetary and logistical constraints only permit coverage of a training program in 50 localities during the first year. One fair way to assign the benefit is to give each locality an equal chance of receiving it; for example, by using a lottery to select the localities that will receive the intervention this year. In that case, the localities that will receive the program in the future serve as a counterfactual control group for the localities that receive the program in the first year. On average, there will be no differences (observed or unobserved) between the two groups before the program is rolled out, and assignment to treatment and control groups is by design unrelated to any characteristics of the localities. Therefore, differences in outcomes between the two groups following program implementation can be attributed to the causal effect of the program, since the only difference between the groups is that one received training and the other did not.

When randomization is not possible, other good options for identifying valid control groups can be found using program eligibility rules. For example, interventions are often targeted to groups or individuals that meet certain criteria, such as poverty: those with incomes just below the threshold are eligible, while those just above are ineligible. Arguably, pre-intervention differences between two individuals with incomes on either side of the threshold are very small, and differences in outcomes after the intervention can be largely attributed to the intervention itself.

Box 2. A Counterfeit Counterfactual

A commonly used counterfactual that may produce misleading results is the comparison of the same individual before and after the intervention. For example, say that the youth employment rate in a given community is 20%. A training program enters the community and trains young people in employable skills. After the training, it is observed that youth employment has increased to 60%. Was the training program a success? If, for example, positive economic growth also affected youth employment over the same period, the contribution of the program to the increase in employment may be only minor or zero. On the other hand, in the case of an economic recession over the same period, youth employment would have been much lower, and the simple change in employment rates will underestimate the true impact of the program. Thus, just comparing an outcome for the same individual before and after the introduction of a program may lead us to erroneous conclusions about a program's success.

Identifying relevant outcomes and indicators

Program activities produce outputs, and the resulting changes observed in the beneficiaries are the outcomes.
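The lottery-based rollout and the "counterfeit counterfactual" of Box 2 can be illustrated with a small simulation. This is a hypothetical sketch: the number of localities, the baseline employment level, the time trend, and the treatment effect are all invented for illustration, not drawn from any actual program.

```python
import random
from statistics import mean

random.seed(1)

# Hypothetical setup: 100 localities; a lottery assigns 50 to receive
# training in year one (treatment) and 50 to wait (control).
localities = list(range(100))
random.shuffle(localities)
treated, control = set(localities[:50]), set(localities[50:])

TRUE_EFFECT = 10.0   # assumed causal effect on employment (pct. points)
TIME_TREND = 5.0     # economy-wide improvement affecting everyone
BASELINE = 40.0      # assumed pre-program employment rate

def employment_after(loc):
    """Post-program employment rate for one locality."""
    noise = random.gauss(0, 5)  # idiosyncratic variation
    return BASELINE + noise + TIME_TREND + (TRUE_EFFECT if loc in treated else 0)

outcomes = {loc: employment_after(loc) for loc in localities}

treated_mean = mean(outcomes[l] for l in treated)
control_mean = mean(outcomes[l] for l in control)

# Difference in means between randomized groups recovers roughly the
# true effect, because the time trend affects both groups equally.
impact_estimate = treated_mean - control_mean

# A simple before/after comparison in treated localities absorbs the
# time trend into the "program effect" (the counterfeit counterfactual).
before_after = treated_mean - BASELINE

print(round(impact_estimate, 1), round(before_after, 1))
```

With these assumed numbers, the before/after comparison overstates the program's effect by roughly the size of the time trend, while the randomized comparison does not.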
For example, in the case of vocational training, an outcome is employable skills, while an output is receiving the training. Outcomes are observed characteristics of the beneficiary, and not of the program; and whether short or long term, they should have measurable proxies or indicators.

Any evaluation should have some idea of how and why the intervention leads to the expected outcomes. Impact evaluation should include a review of program implementation, or a "process evaluation," to understand this chain of events. Some programs do not work because planned activities are not carried out as planned. When a program is poorly implemented, there may not be a great need to delve deeply into all the hypothesized causal links in the chain.

The selection of relevant outcome indicators is a critical step in the design of an impact evaluation, and should be guided by the logical framework that connects program activities to direct outcomes. These direct outcomes may in turn lead to other, more indirect outcomes. Examples of measurable outcomes are described in Box 1.

Frequently, the diversity of objectives of an intervention makes selecting valid indicators difficult. For example, projects that aim at providing skills may have a direct impact on competencies and employment, but may also have equally important indirect impacts on reducing risk behaviors. It is necessary to anticipate both direct and indirect outcomes, keeping in mind that direct outcomes may not always be the most relevant from a social and policy perspective.

Box 1. Outcome indicators

Let's consider a job training program in a post-conflict setting characterized by inter-ethnic conflicts, designed to increase employment skills, as well as reduce inter-ethnic conflict and risky behaviors, while promoting tolerance and civic participation. Direct and indirect expected outcomes to consider are as follows:

Direct outcomes
1. Competencies in training program skills (e.g., basic business, finance, and accounting knowledge)
2. Business activities (e.g., size, profitability, employment, youth employed, revenues and revenue growth, return on investment, etc.)
3. Credit and capital activities (sources of credit and equity raised)
4. Economic status (e.g., employment, wages, days employed, average earnings, asset ownership)
5. General skill competency (e.g., literacy test score, numeracy test score, English language skills test score, computer skills test score)

Indirect outcomes
6. Risky behaviors (e.g., school absenteeism or dropout, inactivity - neither in school nor in work, substance use, early sexual initiation, unsafe sexual practices, criminal behavior)
7. Violent behavior (e.g., hostility, participation in fights, carrying a weapon, participation in riots or violent protest, attitudes towards the use of violence)
8. Ethnic and religious attitudes (e.g., ethnic and religious tolerance, ability to articulate another ethnic group's point of view)
9. Political and community participation (e.g., membership in community groups, civic participation, participation in peaceful protests, political extremism)

Note: These outcome indicators have been defined for the impact evaluation of the Post Conflict Fund for Kosovo Youth Development (in particular for the Business Development for Young Entrepreneurs component). For further information, please contact Silvia Paruzzolo (sparuzzolo@worldbank.org).

Collecting data for the evaluation of youth development programs

The success and reliability of an evaluation rests heavily on the quality of the data used. Since primary data collection can represent the lion's share of an evaluation budget (2), data collection strategies need to be carefully considered. Samples should be representative of the target population and include sufficient sample sizes to detect the desired effect size (power calculations can help determine required sample sizes). Survey methods should also be carefully considered, especially when collecting sensitive data. For example, in the case of risk behavior outcomes, experience suggests that audio or computer-assisted self-interviewing (CASI) can help young people to discuss candidly a range of sensitive and potentially embarrassing subjects.1 Compared to face-to-face interviews, both audio-CASI and self-administered questionnaires have been shown to provide better prevalence estimates of youth risk behavior in culturally conservative societies, especially for particularly stigmatized or legally sanctioned behaviors (4).

Figure 1. Outcome level, outcome change and program effect (impact)
[Figure: outcome plotted against time (before, during and after the program). The continuous line shows the outcome status with the program, rising from the pre-program outcome level to the post-program outcome level; the dashed line shows the outcome status without the program. The gap between the two lines after the program is the program effect (impact).]
Source: Adapted from Rossi et al. (2004)

Measuring all the program's costs and benefits

Finally, for the purpose of informing policy decisions, an evaluation is not complete until one considers the costs of the program. Impact is only one criterion for program selection. The program must be effective in both a statistical or clinical sense and an economic sense; the most effective program may not be the most cost-effective one. An intervention may have a profound impact on participants, but if it is extremely expensive, it may not make sense to implement or continue it. It may be preferable to select a program that has a smaller impact but is much less costly.

This highlights the importance of measuring all of the costs and all of the benefits of a given program. Some benefits, and some costs, may not become apparent until some time after the intervention. And as noted above, a program's benefits may be unrelated to its original goals. Similarly, the program may have social costs as well as financial costs, and all of the resources used will have shadow costs--that is, even volunteers to a program have potential alternative uses, and it is the job of the evaluation to determine whether the intervention presents the best feasible use of these scarce resources. Finally, the program's average costs may not be a good indicator of marginal costs--that is, what it will cost to scale up the program.

References and Recommended Reading

(1) World Bank. 2006. World Development Report 2007: Development and the Next Generation. New York: Oxford University Press.
(2) Rossi P.H., Lipsey M.W., Freeman H.E. 2004. Evaluation: A Systematic Approach (7th ed.). Thousand Oaks, Calif.: SAGE Publications.
(3) Baker J. 2000. Evaluating the Impacts of Development Projects on Poverty: A Handbook for Practitioners. Washington, D.C.: The World Bank.
(4) MacMillan H.L. 1999. "Computer survey technology: a window on sensitive issues." CMAJ Specialty Spotlight.
· Duflo E., Glennerster R., Kremer M. 2006. "Using Randomization in Development Economics Research: A Toolkit." Also downloadable at http://www.povertyactionlab.com/papers/Using%20Randomization%20in%20Development%20Economics.pdf.
· The Poverty Action Lab webpage (http://www.povertyactionlab.com/).
· The World Bank impact evaluation webpage.

1. With audio-CASI, prerecorded questions are presented through headphones and on a computer screen. Answers are given using numbered keys on a computer keyboard.
This obviates the need for interviewers but, given the costs of the technology, may not reduce overall survey costs.

Children & Youth Unit, Human Development Network, The World Bank
www.worldbank.org/childrenandyouth

This note was prepared by Silvia Paruzzolo, M&E Specialist (HDNCY), Sebastian Martinez, Economist (AFTRL), Luisa Sigrid Vivo, Economist (HDNVP), Linda McGinnis, Lead Economist (HDNCY), Mattias Lundberg, Senior Economist (HDNCY) and Paul Gertler, Professor of Economics at the University of California, Berkeley. Photo credit: Ray Witlin. The views expressed in these notes are those of the authors and do not necessarily reflect the views of the World Bank or their respective institutions.
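As a closing illustration of the power calculations mentioned in the data collection section, the standard normal-approximation formula for comparing two proportions can be sketched as follows. The employment rates, significance level, and power target are assumed for illustration and are not taken from any particular evaluation.

```python
from math import ceil
from statistics import NormalDist

def sample_size_two_proportions(p_control, p_treatment, alpha=0.05, power=0.80):
    """Per-arm sample size needed to detect the difference between two
    proportions with a two-sided test, using the normal approximation:
    n = (z_{1-alpha/2} + z_{power})^2 * [p1(1-p1) + p2(1-p2)] / (p1 - p2)^2
    """
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)   # critical value, two-sided test
    z_beta = z.inv_cdf(power)            # quantile for the desired power
    variance = p_control * (1 - p_control) + p_treatment * (1 - p_treatment)
    effect = p_treatment - p_control     # minimum detectable effect
    return ceil((z_alpha + z_beta) ** 2 * variance / effect ** 2)

# Hypothetical example: detecting a rise in youth employment from 40% to 50%
# with 80% power at the 5% significance level.
n = sample_size_two_proportions(0.40, 0.50)
print(n)  # per-arm sample size
```

Note how the required sample size grows rapidly as the minimum detectable effect shrinks (it scales with the inverse square of the effect), which is why the target effect size should be fixed before fieldwork begins.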