Exercise 5: Improving power

In the situations described below, think about the following questions:
(i) Will the evaluation design provide adequate statistical power?
(ii) If not, how can we improve upon this design to achieve our desired level of statistical power?
(iii) What are the implications of our design improvement in terms of data requirements and costs?

1. Suppose your government wants to evaluate a voucher program for preprimary education. In particular, you want to know if the voucher succeeds in increasing access to preprimary education among poorer populations and if it contributes to increasing school readiness among children. The chosen evaluation design involves randomly assigning 5 communities to a group that receives the vouchers and 5 communities to a group that does not receive the voucher.

This design with 5 communities in each treatment group is unlikely to give us adequate statistical power for both outcomes of interest. We can improve upon this design either by (a) increasing the number of communities that the voucher is offered to, or by (b) randomizing at a smaller unit, such as a household or a neighborhood within the community. If the number of communities assigned to each experimental group is increased, intervention costs might increase because it might be more expensive to work in more communities. If all eligible households are offered the voucher in all program communities, then intervention costs will certainly increase. Survey costs could go up or down. On the one hand, an adequately powered design with the community as the unit of randomization and only 5 communities would require a very large sample of households, perhaps even more households than currently live in the communities. Thus, going to more communities will decrease the total number of households in the study and so decrease the number of surveys needed. On the other hand, it is costlier to go to multiple communities to collect data. The net effect will depend on whether the increased cost of collecting data in more communities is outweighed by the savings from interviewing fewer households. Changing the unit of randomization to the household will decrease the total number of households required for the study. If the program is offered only to households in the study, then intervention costs and evaluation costs will both go down. (See the sketch for item 1 at the end of this exercise.)

2. Now suppose that, from some field tests, you observe that take-up of the vouchers might be lower than you expected. Specifically, for every 100 vouchers offered, only 50 households might use them.

Lower expected take-up decreases statistical power. Impact evaluations estimate the average effect of a program over the entire sample assigned to the treatment group, including households that never use the voucher. Therefore, lower take-up decreases the magnitude of the average effect of the program. Since a larger sample is required to detect a smaller effect, low take-up increases the required sample size and thus increases evaluation costs. (See the sketch for item 2 at the end of this exercise.)

3. Suppose a research team approaches you with an evaluation of a teacher training program designed to increase the number of developmentally appropriate teaching activities that teachers do in a classroom. Currently, teachers complete approximately 20 percent of a checklist of developmentally appropriate activities taught during the training. The researchers select the sample size of schools and teachers so that they have sufficient power to detect an increase in average checklist completion from 20 percent to 80 percent.
This is an underpowered study. Going from 20 percent to 80 percent is a giant (and possibly unrealistic) expected increase in the outcome of interest. If the study design only has enough power to detect an effect this large, it does not have the power to detect a change from 20 percent to 60 percent; that is, a tripling of the checklist completion rate could not be statistically distinguished from no change at all. It would be better to aim for a smaller minimum detectable effect size. This would require increasing the sample size of schools and possibly teachers. (See the sketch for item 3 at the end of this exercise.)

4. Suppose another research team submits an alternative design for the same evaluation where the sample size of schools and teachers will allow you to estimate an improvement from a checklist completion rate of 20 percent to a completion rate of 22 percent.

This is a highly powered study: it is adequately powered to detect a 2 percentage point (10 percent relative) increase in the checklist completion rate. In fact, it might be overpowered and therefore require too large a sample. We might want to ask what this increase means in practice. If, for example, it corresponds to only half of a checklist item, then we might want to treat an increase of that size, or even one slightly larger, as equivalent to no increase at all. If that is the case, then we can reduce our sample size. (See the sketch for item 4 at the end of this exercise.)

5. If you want to test the impact of a new curriculum on school readiness among children, what sample design is likely to give you higher statistical power:
a. Sampling 30 schools with 50 students sampled from each school
b. Sampling 50 schools with 30 students sampled from each school

Option B would give us much higher power because it has more clusters. Power in clustered designs depends far more on the number of clusters than on the number of students sampled within each cluster; it is rare to see more than 30 students sampled per cluster, and adding students beyond that point is unlikely to add much power. (See the sketch for item 5 at the end of this exercise.)
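Sketch for item 1. A minimal Python sketch of why 5 communities per group is unlikely to be enough. Every numeric input is an assumption chosen for illustration rather than a figure from the exercise: a minimum detectable effect of 0.2 standard deviations, 80 percent power, a 5 percent significance level, and an intra-cluster correlation (rho) of 0.10.

# Sketch for item 1 -- cluster randomization with only 5 communities per group.
# Assumed inputs: MDE of 0.2 SD, 80% power, 5% significance, ICC (rho) of 0.10.
from scipy.stats import norm

alpha, power, delta, rho = 0.05, 0.80, 0.20, 0.10

z = norm.ppf(1 - alpha / 2) + norm.ppf(power)
n_ind = 2 * z**2 / delta**2        # per-group n if households were randomized
print(f"per-group n under household randomization: {n_ind:.0f}")   # about 392

def households_per_community(k):
    """Households per community needed so that k communities per group match
    that power, using the design effect 1 + (m - 1) * rho."""
    denom = k - n_ind * rho
    return float("inf") if denom <= 0 else n_ind * (1 - rho) / denom

for k in (5, 50, 100):
    print(k, "communities per group ->",
          round(households_per_community(k), 1), "households per community")
# With k = 5 no community size, however large, reaches 80% power; with
# k = 50, roughly 33 households per community are enough.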
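Sketch for item 2. A back-of-the-envelope sketch of how 50 percent take-up dilutes the detectable effect. The 0.30 standard deviation effect among voucher users is an assumed number, and clustering is ignored.

# Sketch for item 2 -- partial take-up and the required sample size.
# Assumed effect among voucher users: 0.30 SD; clustering is ignored.
from scipy.stats import norm

alpha, power, effect_on_users = 0.05, 0.80, 0.30
z = norm.ppf(1 - alpha / 2) + norm.ppf(power)

for take_up in (1.0, 0.5):
    avg_effect = take_up * effect_on_users   # average effect over everyone assigned
    n_per_group = 2 * z**2 / avg_effect**2
    print(f"take-up {take_up:.0%}: average effect {avg_effect:.2f} SD, "
          f"~{n_per_group:.0f} households per group")
# Halving take-up halves the detectable average effect and roughly
# quadruples the required sample.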
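Sketch for item 3. A sketch, using statsmodels, of how little power a sample sized for a 20 to 80 percent jump has against a 20 to 60 percent improvement. Treating checklist completion as a simple proportion and ignoring clustering across schools are simplifying assumptions for illustration.

# Sketch for item 3 -- power against a smaller, more realistic improvement.
from statsmodels.stats.power import NormalIndPower
from statsmodels.stats.proportion import proportion_effectsize

analysis = NormalIndPower()

# Per-group sample size that gives 80% power for 20% -> 80%.
n = analysis.solve_power(effect_size=proportion_effectsize(0.80, 0.20),
                         alpha=0.05, power=0.80)
print(f"per-group n sized for 20% -> 80%: {n:.1f}")            # under 10

# Power of that same sample against 20% -> 60%.
achieved = analysis.solve_power(effect_size=proportion_effectsize(0.60, 0.20),
                                nobs1=n, alpha=0.05)
print(f"power against 20% -> 60% with that n: {achieved:.2f}")  # well below 0.80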
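Sketch for item 4. A sketch of how large a sample it takes to distinguish a 20 percent from a 22 percent completion rate, and how much smaller the study becomes if a change that small is treated as no meaningful change and the design instead targets 25 percent. The 25 percent target is an assumption, and clustering is again ignored.

# Sketch for item 4 -- a tiny target effect requires a very large sample.
from statsmodels.stats.power import NormalIndPower
from statsmodels.stats.proportion import proportion_effectsize

analysis = NormalIndPower()
for target in (0.22, 0.25):
    es = proportion_effectsize(target, 0.20)
    n = analysis.solve_power(effect_size=es, alpha=0.05, power=0.80)
    print(f"detect 20% -> {target:.0%}: ~{n:,.0f} teachers per group")
# Roughly 6,500 teachers per group to detect 22%, versus about 1,100 for 25%.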
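Sketch for item 5. A sketch comparing options (a) and (b) through their effective sample sizes; the intra-cluster correlation of 0.15 is an assumed value.

# Sketch for item 5 -- same total of 1,500 students, different numbers of clusters.
rho = 0.15   # assumed intra-cluster correlation

def effective_n(schools, students_per_school):
    """Total sample divided by the design effect 1 + (m - 1) * rho."""
    total = schools * students_per_school
    return total / (1 + (students_per_school - 1) * rho)

print("a. 30 schools x 50 students ->", round(effective_n(30, 50)), "effective observations")
print("b. 50 schools x 30 students ->", round(effective_n(50, 30)), "effective observations")
# Option (a) behaves like a simple random sample of about 180 students,
# option (b) like one of about 280, so option (b) has more power.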