Policy Research Working Paper 10148

Property Tax Compliance in Tanzania: Can Nudges Help?

Matthew Edward Collin, Vincenzo Di Maro, David K. Evans, Frederik Manang

Development Impact Evaluation Group & Macroeconomics, Trade and Investment Global Practice
August 2022

Abstract

Low tax compliance in low- and middle-income countries around the world limits the ability of governments to offer effective public services. This paper reports the results of a randomly rolled out text message campaign aimed at promoting tax compliance among landowners in Dar es Salaam, Tanzania. Landowners were randomly assigned to one of four groups designed to test different aspects of tax morale. They received a simple text message reminder to pay their tax (a test of salience), a message highlighting the connection between taxes and public services (reciprocity), a message communicating that people who did not pay were not contributing to local or national development (social pressure), or no message (control). Recipients of any message were 18 percent (or 2 percentage points) more likely to pay any property tax by the end of the study period. Each type of message resulted in gains in payment rates, although social pressure messages delivered the lowest gains. Total payment amounts were highest for those who received reciprocity messages. Nudges were most effective in areas with lower initial rates of tax compliance. The average estimated benefit-cost ratio across treatments is 36:1 due to the low cost of the intervention, with higher cost-effectiveness for reciprocity messages.

This paper is a product of the Development Impact Evaluation Group, Development Economics and the Macroeconomics, Trade and Investment Global Practice. It is part of a larger effort by the World Bank to provide open access to its research and make a contribution to development policy discussions around the world.
Policy Research Working Papers are also posted on the Web at http://www.worldbank.org/prwp. The authors may be contacted at mcollin@worldbank.org, imaro@worldbank.org, devans@cgdev.org, and fmanang@gmail.com.

The Policy Research Working Paper Series disseminates the findings of work in progress to encourage the exchange of ideas about development issues. An objective of the series is to get the findings out quickly, even if the presentations are less than fully polished. The papers carry the names of the authors and should be cited accordingly. The findings, interpretations, and conclusions expressed in this paper are entirely those of the authors. They do not necessarily represent the views of the International Bank for Reconstruction and Development/World Bank and its affiliated organizations, or those of the Executive Directors of the World Bank or the governments they represent.

Produced by the Research Support Team

Property Tax Compliance in Tanzania: Can Nudges Help?

Matthew Collin (World Bank), Vincenzo Di Maro (World Bank), David K. Evans (Center for Global Development), Fredrick Manang (University of Dodoma)

Keywords: Tax Compliance, Tax Morale, Public Finance, Nudges
JEL classification: H26, H13, O17

∗ The authors are grateful for support from the Tanzania Revenue Authority (TRA), especially the non-tax revenue department at TRA headquarters. We thank Anne Brockmeyer, Santiago de la Cadena Becerra, Addisu Lashitew, Fabrizio Santoro, Stuti Khemani, Thiago Scot, and participants at the CSAE conference and at World Bank workshops for discussions and comments. We thank the DIME Analytics team at the World Bank for their careful replication review. The Impact Evaluation to Development Impact (i2i) multi-donor trust fund at the World Bank provided funding. The authors declare no competing interests. The findings, interpretations, and conclusions expressed in this paper are entirely those of the authors.
They do not reflect the views of the Executive Directors of the World Bank or the governments they represent, nor of the employers of non-World Bank authors. Corresponding author: Manang (fmanang@udom.ac.tz). Other authors: Collin (mcollin@worldbank.org), Di Maro (vdimaro@worldbank.org), and Evans (devans@cgdev.org).

1 Introduction

Governments need revenue to fund public goods and services. It is becoming clearer that the source of that revenue also matters (Gadenne 2017): the state’s ability to raise tax revenue is thought to invite public scrutiny and strengthen the “social contract” between citizen and state, leading to positive effects on institutional and economic development (Besley and Persson 2013; Ali, Fjeldstad, and Katera 2017; Dincecco and Katz 2014).

Despite the purported benefits of higher tax capacity, low- and middle-income countries largely struggle to raise the same levels of tax revenue as their higher income peers. As of 2014, the tax-to-GDP ratio of the median low or lower-middle income country was 15.8 percent, compared to 20.5 percent for higher income countries. Evidence suggests that this may condemn some of these countries to lower levels of economic growth in the long term (Gaspar, Jaramillo, and Wingender 2016).

One area where governments seek to improve tax compliance is the taxation of immovable property, which presents an attractive source of revenue as it is, in theory, both easier to target and easier to tax in a non-distortionary way. But low- and middle-income countries perform even worse in the collection of property tax than they do overall: where OECD countries bring in approximately 2.1 percent of GDP of revenue from property taxes, poorer countries bring in only 0.6 percent (Norregaard 2013).

Tanzania, the context of our study, has historically struggled with low rates of tax revenue, both overall (11.8 percent of GDP) and for property tax (0.1 percent of GDP).
Furthermore, the government has oscillated between a regime of decentralized revenue collection (where the local authorities are responsible for collecting property tax) and that of centralized collection (where the Tanzania Revenue Authority is responsible) (Fjeldstad, Ali, and Katera 2017). This process has led to unstable and unpredictable levels of property tax revenues over the past decade. Furthermore, the Tanzanian government has struggled with low levels of compliance. This may in part be due to a lack of perceived reciprocity by taxpayers: property owners do not understand how the government will use their money (PO-RALG 2013).1 It is also the result of a small tax base: while legally every property owner is obligated to pay tax, local authorities have previously prioritized those with larger properties living in the most affluent areas of the city (PO-RALG 2013). It is within this context that the Tanzania Revenue Authority (TRA) is examining new ways to improve property tax compliance in the cities for which it is responsible. This paper investigates the impact of a series of reminders via text message that leverage different aspects of citizens’ voluntary motivation to pay property taxes (i.e., their “tax morale”). Working with the TRA we randomly allocated a group of more than 200,000 individuals in Dar es Salaam who were liable to pay property tax - but as of one month prior to the annual deadline had not paid any - into four groups.2 Three groups received a text message treatment: a simple reminder (increasing the salience of tax paying), a message that emphasized the link between tax revenue and publicly-provided goods (focusing on reciprocity), or a message highlighting the non-cooperative nature of tax evasion (focusing on social pressure). 
The fourth group served as a control.

We find that all three treatments had a positive impact on both the propensity of taxpayers to make payments to the TRA and the total amount paid. Those receiving the reciprocity treatment paid higher amounts of total tax. Similar to what has been found with other “nudge” style interventions, the intervention is highly cost effective, with the increase in revenue driven by treatment exceeding its cost by roughly a factor of 36.

Finally, we also document two interesting sources of heterogeneity. The first is across the geography of the city: areas that have a higher rate of tax compliance among the control group also had lower treatment effects, suggesting that nudges may be more successful in low-compliance areas when policymakers are able to identify them ex-ante. Second, we find heterogeneity across the amount that taxpayers owed the TRA: average treatment effects appear to be strongest for those who owe less than 25,000 TSh, predominantly smaller properties that do not bring in much revenue and so are less likely to be subject to TRA follow up. This indicates that tax nudges may, in some circumstances, be regressive in their impact on compliance.

The rest of the paper is structured as follows. Section 2 discusses prevailing theories of tax compliance and the existing body of evidence on the impact of messages on taxpayer behavior. Section 3 discusses the structure of the experiment and the data we will use to examine its impact. Section 4 presents and discusses the results and we conclude with Section 5.

1 In Pakistan, even an explicit intervention to collect citizen preferences on public services and then to deliver services based on those preferences had little initial effect on tax payments (Khwaja et al. 2020).
2 As discussed later, taxpayers were actually assigned to five groups, but two groups received the same treatment due to an implementation error.
2 Background

Until recently, economists have viewed tax compliance largely through the lens of enforcement (Allingham and Sandmo 1972), where taxpayers increase their compliance when the perceived probability of detection goes up. There is evidence that letters and electronic forms of communication have the potential to do this: research in many high-income economies suggests that letters containing an implicit or explicit threat of audit increase tax payments (Coleman 1996; Blumenthal et al. 2001; Hasseldine et al. 2007; Kleven et al. 2011; Fellner et al. 2013; Castro and Scartascini 2015; Hallsworth et al. 2017; Pomeranz 2015; Meiselman 2018; Hernandez et al. 2017), although the effect sizes vary across contexts and are not always significant (Ariel 2012). Work in low- and middle-income countries has largely revealed similar results (Ortega and Scartascini 2020; Brockmeyer et al. 2019; Kettle et al. 2016; Brockmeyer et al. 2020), again with results not always significant (Del Carmen, Espinal Hernandez, and Scot 2020). Evidence from Rwanda suggests that less aggressive messages (such as reciprocity or reminder-framed messages) work slightly better than those aimed at deterrence (Mascagni, Nell, and Monkam 2017), whereas evidence from Uganda suggests stronger impacts for enforcement-focused messages (Cohen 2020). In another study in Tanzania (albeit in a different part of the country than our study), both direct threats of fines sent via text message and indirect threats passed on through local leaders led to an increase in payments by Tanzanian property taxpayers (Mwaijande et al. 2021).

In recent years, economists have extended the Allingham and Sandmo model to include the concept of “tax morale,” a bundle of mechanisms which explain voluntary tax compliance (Luttmer and Singhal 2014). Recent experiments have attempted to make the components of tax morale more salient through careful messaging, with mixed results. Kettle et al.
(2016) finds that both letters emphasizing national pride and those emphasizing social norms improve compliance in Guatemala, but not significantly more than a letter invoking a heightened probability of audit. In richer countries, randomized studies of letter or e-mail campaigns typically find that attempts to emphasize the social contract or civic duty either have little impact or are marginally effective (Coleman 1996; Blumenthal et al. 2001; Torgler 2004; Ariel 2012; Fellner et al. 2013; Castro and Scartascini 2015; Meiselman 2018), with some exceptions (Hallsworth et al. 2017). Krause (2020) finds that messages that emphasize the social pressure mechanism in Haiti might even have a negative effect on tax compliance. This is consistent with results from psychology research which have shown that in contexts with low rates of pro-social coordination, a mechanism known as antisocial punishment could be at play (Herrmann et al. 2008).

A recent meta-analysis (Antinyan and Asatryan 2020) of studies of nudges for tax compliance finds that deterrence nudges are on average more effective than tax morale nudges. The meta-analysis also finds that nudges seem to work better for the sub-samples of late payers, which is the sample we focus on in this paper.

3 Experiment and data collection

3.1 Baseline data and randomization

The frame for this experiment is a list of 241,200 properties for which, as of June 1, 2018, no property tax had been paid to the TRA for the 2017/2018 tax year. The deadline for property tax payments to be completed was June 30th. After June 30 had passed, the TRA extended the deadline for another two weeks, although it continued to accept payments after this point. As some taxpayers own multiple parcels, we collapsed these data to the taxpayer level (237,699 taxpayers), as indicated by the taxpayer identification number associated with the property.
We use two sources of information in the randomization: the location of the property and whether or not the property had been served a ‘demand notice’ at the time the data was collected. The location of the property is the lowest level of administration in Dar es Salaam, the sub-ward or ‘mtaa’ level. We assign taxpayers the same location as their property. When taxpayers have multiple properties that span more than one sub-ward, we pick the modal sub-ward. Where there is no modal sub-ward, we randomly choose one of those sub-wards to assign to the taxpayer. Ultimately, the randomization was conducted within 1,211 different strata, which were defined both by the location of the property (sub-ward) and whether a bill had been issued.

Demand notices are bills issued by the TRA to landowners. Approximately 19 percent of the experimental sample had been issued a bill at the time of the data collection. The TRA issued bills (called “demand notices”) in bulk for a specific area of the city (this could be a ward or a set of core streets) and sent them to landowners by manual delivery by TRA officers and temporary interns, typically after seeking the support of street leaders. While the goal is to cover all areas of the city every fiscal year, limited financial and human resources explain why only a subset of the city is covered in practice.

To better understand how our experimental sample compared to the average property in Dar es Salaam, we matched our experimental data to a set of data comprising every parcel in TRA’s database for the city. Out of approximately 830,000 parcels in the city, nearly 30 percent are owned by a taxpayer in our experimental sample. Parcels included in our experiment were only slightly (1.4 percentage points) less likely to have been issued a bill.
Of those that were issued bills, only 35 percent of properties in the experimental sample had been valued by the TRA, as opposed to 74 percent in the rest of the city (those that were not valued were charged a TSh 10,000 flat tax). Conditional on being billed and rated, the median value for properties included in the experiment was higher, approximately 31.2 million TSh (13,500 USD) versus 20 million TSh (8,600 USD) for the rest of the city. Properties that were billed and rated faced, on average, an annual tax rate of approximately 0.16% of the property’s rated value.

3.2 Treatments

In collaboration with the Tanzania Revenue Authority, we randomized each property owner into one of five groups.3 The treatments are summarized in Table 1. The control group was not to receive any message from the TRA. Group T1 received a simple message reminding them to pay their property tax, indicating the due date (June 30th) and providing information the taxpayers could use to contact the TRA in case they had any questions. All other treatments included this reminder message. Treatment 2 added a ‘reciprocity’ message, where taxpayers were reminded that taxes fund social services and infrastructure and finished with the TRA’s slogan “Together we build our Nation.” Treatment 3 included the simple reminder as well as a ‘social pressure’ message in which taxpayers were reminded, in a negative fashion, that non-compliers were not contributing to the development of the country or their own communities. The final planned treatment was an enforcement message which included the simple statement “Pay your tax early to avoid penalties.” However, for reasons we discuss below, during the implementation of the experiment no taxpayers were sent this message.

3.3 Implementation and challenges

Following the randomization, a list of taxpayers and their phone numbers (included in the original TRA dataset) was provided to the TRA.
The majority of messages were sent out after June 20, fewer than ten days before the initial deadline to pay property tax. While there was overlap in the delivery of different treatments, completion of each treatment arm proceeded sequentially, with reminder messages being sent first to group T1, reciprocity messages second to group T2, followed by social pressure messages to group T3.4

Table 1: Treatment arms and treatment assignments

| Treatment | Type | Message | N |
|---|---|---|---|
| Control | | No message | 47,555 |
| T1 | Reminder | “Dear taxpayer, TRA reminds you to pay your property tax. Pay before 30th June. For more information: dial * 152 * 00 #, visit your nearest TRA office or call 0800780078. Thank you.” | 47,502 |
| T2 | Reciprocity | [Reminder] + “Your tax facilitates access to social services and infrastructure. Together we build our nation.” | 47,547 |
| T3 | Social Pressure | [Reminder] + “Non-taxpayers are not contributing to national development and thus hindering development of their communities.” | 47,538 |
| T4 | Enforcement* | [Reminder] + “Pay your tax early to avoid penalties.” | 47,553 |

*Note: Subjects randomly allocated to T4 were mistakenly sent T1. See sub-section 3.3 for details. Original messages were sent in Swahili.

There were two errors in the message delivery process. First, the firm in charge of sending the messages sent Treatment 1 messages to taxpayers who had been randomly allocated to Treatment 4 (Enforcement), essentially doubling the size of the first treatment arm. Thus no enforcement messages were sent as a part of the experiment (although a few were sent independently by the TRA). Second, instead of using the list of cleaned and prepared phone numbers they were provided, the firm chose an unformatted list which contained the same numbers, some of which were not usable due to missing prefixes or mistakenly included country codes. As a result, between 22 and 33 percent of each treatment arm was not sent the intended message. Using data from the text message delivery, we can account for which taxpayers were or were not sent a message due to this error.

Finally, the randomization was conducted at the taxpayer identification level, but a small subset of taxpayer IDs shared identical phone numbers. This is likely because some taxpayers were issued multiple taxpayer IDs, or households sharing a single number contained multiple taxpayers. This leads to spillovers in actual treatment between the various treatment arms and the control group. These spillovers complicate our ability to discern the effects of each treatment in isolation. To account for this, for our main analyses we restrict our sample to the roughly 60% of taxpayers that have a unique phone number (one that is not shared with any other taxpayer in the sample). As we show in Appendix section A1.3, those who only have unique phone numbers are no more or less likely to receive any particular treatment, so this restriction does not undermine the randomization. The actual frequencies of treatment across each group are summarized in Table 2. Because we know the actual distribution of messages that were sent, we can instrument for actual message delivery with assignment to treatment.

3 The randomization was conducted using the Stata command randtreat, with “misfits” (i.e., observations beyond those that are a multiple of the number of treatment groups) being dealt with using the strata method, which randomly allocates misfits across all strata (Carril et al. 2017).
4 Figure A1 in the Appendix shows the timing of message delivery. One concern is that the differential timing of message delivery affected the impacts. In the results section, we provide suggestive evidence that our pattern of results is robust to this.

3.4 Outcome data

Our data on outcomes were retrieved from the TRA at the beginning of August 2018.
We merged the complete record of all property tax payments made for a given taxpayer ID between June and the beginning of August to our original experimental sample. For each taxpayer ID we record, for each date during this period, whether any payments associated with that ID had been made up to that date as well as the cumulative amount of payments made so far. For a subset of approximately 45,000 taxpayers we also have the final demand notice (bill) that was issued by the TRA.5 This allows us to also investigate the impact our treatments have had on the proportion of the final tax bill each taxpayer has paid, as well as the probability that they have paid their entire bill.

Table 2: Frequencies of actual treatment, by treatment group

| Treatment Arm | Received No Message | Received T1 | Received T2 | Received T3 |
|---|---|---|---|---|
| Control | 99.99% | 0% | 0% | 0% |
| T1 - Reminder | 11.1% | 88.9% | 0% | 0% |
| T2 - Reciprocity | 4.3% | 0% | 93.01% | 2.7% |
| T3 - Social pressure | 15.18% | 0.01% | 2.7% | 82.13% |

Note: As some taxpayers received multiple messages, the above frequencies can exceed 100. Sample restricted to taxpayers with unique phone numbers.

4 Results

For most of the main results, we display the results from either a single treatment dummy (for any treatment) or a full set of dummies for each treatment:

$$P_i = \alpha + \beta \times treated_i + \gamma_s + \varepsilon_i \tag{1}$$

and

$$P_i = \alpha + \beta_1 T1_i + \beta_2 T2_i + \beta_3 T3_i + \gamma_s + \varepsilon_i \tag{2}$$

where $P_i$ is alternatively whether the taxpayer has paid anything to the TRA or the amount the taxpayer has paid. In equation (1), $treated_i$ is an indicator variable equal to one if the taxpayer was randomized into any of the three treatment groups. In equation (2), the indicator variables $T1_i$, $T2_i$ and $T3_i$ are dummy variables equal to one if the taxpayer was randomized into the reminder, reciprocity or social pressure treatments, respectively. Unless otherwise specified, we run both specifications (1) and (2) with strata fixed effects (indicated by $\gamma_s$) and cluster the standard errors at the taxpayer level.
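As an illustration of specification (1), the following is a minimal sketch of how the coefficient on the treatment indicator can be computed with strata fixed effects, using within-stratum demeaning (which is numerically equivalent to including a dummy for each stratum). The function name and the toy data are hypothetical and are not taken from the paper's replication code; the sketch returns only the point estimate and does not compute the standard errors described above.

```python
from collections import defaultdict


def itt_with_strata_fe(paid, treated, strata):
    """Point estimate of beta in P_i = alpha + beta*treated_i + gamma_s + e_i.

    Demeans the outcome and the treatment indicator within each stratum,
    then runs a one-variable OLS on the demeaned data.
    """
    groups = defaultdict(list)
    for i, s in enumerate(strata):
        groups[s].append(i)

    num = 0.0  # sum of (treated deviation) * (outcome deviation)
    den = 0.0  # sum of squared treated deviations
    for idx in groups.values():
        y_bar = sum(paid[i] for i in idx) / len(idx)
        t_bar = sum(treated[i] for i in idx) / len(idx)
        for i in idx:
            t_dev = treated[i] - t_bar
            num += t_dev * (paid[i] - y_bar)
            den += t_dev ** 2

    if den == 0:
        raise ValueError("no within-stratum variation in treatment")
    return num / den


# Toy data (hypothetical): two strata, six taxpayers.
paid = [1, 0, 0, 0, 1, 1]
treated = [1, 1, 0, 0, 1, 0]
strata = ["a", "a", "a", "a", "b", "b"]
beta_hat = itt_with_strata_fe(paid, treated, strata)
```

Strata in which every taxpayer has the same treatment status contribute nothing to the estimate, which is why the randomization within strata matters for this estimator.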
Figure 1 shows the intent-to-treat (ITT) coefficient estimates from specification (1) when the outcome is whether the taxpayer has paid anything, measured at different points of time during the experiment. As can be seen, prior to the introduction of the text messages, treated taxpayers had the same propensity to have made a payment to the TRA as untreated taxpayers. Only following the introduction of the text messages do we see a difference. By the end of the period for which we have administrative data, those randomized into a message treatment were 1.8 percentage points more likely to have made a payment to the TRA, over a baseline of approximately 11 percent.

Figure 1: Timeline of ITT effect of all messages on payment rates. Panels: (a) Impact on probability of payment; (b) Impact on total amount paid.
Note: Figure shows the pooled effect of being randomized into one of the treatment groups over time. Subfigure (a) shows the intent-to-treat effect on the probability that the taxpayer has made any payment to the TRA by the date shown. Subfigure (b) shows the intent-to-treat effect on the total amount the taxpayer has paid to the TRA by the date shown. All dates are for 2018. 95% confidence intervals shown.

5 As we show in Table A1 in the Appendix, there is no imbalance within our experimental sample in the probability of being issued a bill, although for the bill subsample there are some minor imbalances between the reciprocity and social pressure treatment arms for the overall bill amount.

Figure 2 shows (a) the average payment rate and (b) the average amount paid for the control group and each treatment group over the course of the study relative to the timing of the treatment (indicated by the dark grey shaded bars). On average, all three treatment messages outperform the control group, with the gap opening up around the time of the first payment deadline on June 30th.
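The time paths in Figures 1 and 2 rest on outcomes measured cumulatively at each date. A minimal sketch of how such outcomes could be built from a taxpayer's payment records follows; the function name, the record layout, and the example payments are assumptions made for illustration and do not describe the TRA's actual data format.

```python
from datetime import date


def cumulative_outcomes(payments, as_of_dates):
    """Return {as_of_date: (paid_anything, cumulative_amount)} for one taxpayer.

    `payments` is a list of (payment_date, amount_in_tsh) tuples.
    """
    outcomes = {}
    for d in as_of_dates:
        # Sum every payment made on or before the as-of date.
        total = sum(amount for paid_on, amount in payments if paid_on <= d)
        outcomes[d] = (total > 0, total)
    return outcomes


# Hypothetical taxpayer with two payments in 2018.
payments = [(date(2018, 6, 25), 10000), (date(2018, 7, 10), 5000)]
checkpoints = [date(2018, 6, 1), date(2018, 6, 30), date(2018, 8, 1)]
result = cumulative_outcomes(payments, checkpoints)
```

Running the regression in specification (1) separately on the outcome at each checkpoint date would then trace out a timeline of estimated effects of the kind plotted in Figure 1.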
Figure 2: Timeline of payment rates by treatment group. Panels: (a) Probability of any payment to TRA; (b) Amount paid to TRA.
Note: Figures show the average outcomes for each treatment group for (a) the share of taxpayers who made any payment to the TRA and (b) the average amount paid to the TRA in TSh, both over the period we have data for. The shaded bar graph indicates the timing of the treatment: each bar indicates the proportion of messages that were sent to taxpayers on a given day.

Table 3 shows the results of specification (2) when the outcome is whether the taxpayer has made any payment to the TRA measured at different points of time during the experiment. Columns (1)-(3) show the ITT estimates and (4)-(6) show the 2SLS estimates, where treatment assignment is used to instrument receipt of the correct text message. For the latter, the results indicate that one month after the deadline, those that received a reminder, reciprocity, or a social pressure message were 2, 2.3, or 1.7 percentage points more likely to make a payment, respectively, over a control mean of about 11 percent. At the bottom of each column we present a test of equality of these coefficients: we find that even though the reciprocity treatment has a larger effect size than the other treatments, it only has a significantly larger impact than the social pressure treatment. Table 4 follows the same approach, but the outcome is the amount paid in TSh.
Recipients of the reminder, reciprocity, and social pressure messages paid an additional 542 TSh above a control mean of approximately 3,544 TSh, an increase of about 15 percent. Those receiving the reciprocity message gave substantially more: nearly 700 TSh. This is significantly greater than reminder messages, although not significantly greater than social pressure messages at conventional levels of inference.

Table 3: Impact of message assignment on payment rates

| | ITT: Start of experiment | ITT: First tax deadline | ITT: One month after first deadline | 2SLS: Start of experiment | 2SLS: First tax deadline | 2SLS: One month after first deadline |
|---|---|---|---|---|---|---|
| Pooled treatment arms | | | | | | |
| Treated | -0.000578 | 0.0147*** | 0.0179*** | -0.000663 | 0.0169*** | 0.0205*** |
| | (-1.04) | (8.60) | (8.72) | (-1.07) | (8.88) | (8.96) |
| Separate treatment arms | | | | | | |
| T1: Reminder | -0.000738 | 0.0153*** | 0.0177*** | -0.000830 | 0.0173*** | 0.0199*** |
| | (0.000605) | (0.00190) | (0.00227) | (0.000681) | (0.00214) | (0.00256) |
| T2: Reciprocity | -0.000601 | 0.0180*** | 0.0220*** | -0.000638 | 0.0191*** | 0.0232*** |
| | (0.000699) | (0.00224) | (0.00267) | (0.000740) | (0.00237) | (0.00283) |
| T3: Social pressure | -0.000234 | 0.0101*** | 0.0144*** | -0.000264 | 0.0117*** | 0.0167*** |
| | (0.000707) | (0.00219) | (0.00263) | (0.000850) | (0.00263) | (0.00317) |
| Constant | 0.00735*** | 0.0716*** | 0.109*** | | | |
| | (0.000503) | (0.00150) | (0.00182) | | | |
| Control mean | 0.007 | 0.072 | 0.109 | 0.007 | 0.072 | 0.109 |
| First stage f-stat | | | | 135,432 | 135,432 | 135,432 |
| R2 | 0.013 | 0.041 | 0.048 | 0.000 | 0.001 | 0.001 |
| Obs | 143,425 | 143,425 | 143,425 | 143,425 | 143,425 | 143,425 |
| Test: T1 = T2 | 0.816 | 0.181 | 0.070 | 0.766 | 0.418 | 0.210 |
| Test: T1 = T3 | 0.401 | 0.008 | 0.156 | 0.428 | 0.016 | 0.254 |
| Test: T2 = T3 | 0.597 | 0.001 | 0.005 | 0.650 | 0.007 | 0.047 |

Notes: Outcome is the probability a taxpayer made any payment to the TRA by the date indicated. ITT estimates indicate impact of being assigned to treatment. 2SLS estimates instrument the receipt of each message type with assignment to treatment. Robust standard errors in parentheses. + p < 0.10, * p < 0.05, ** p < 0.01, *** p < 0.001
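To make the link between the ITT and 2SLS columns concrete: with a single instrument and a single endogenous regressor, the 2SLS coefficient equals the reduced-form (ITT) coefficient divided by the first-stage coefficient, so the ratio of the two reported estimates recovers the implied share of assigned taxpayers who were actually sent a message. The sketch below applies this to the pooled "Treated" estimates one month after the first deadline, as reported in Tables 3 and 4. It assumes the pooled ITT and 2SLS regressions share the same sample and fixed effects, and the result is only approximate because the published coefficients are rounded; the function name is hypothetical.

```python
def implied_first_stage(itt, tsls):
    """Implied first-stage coefficient from ITT = first_stage * 2SLS."""
    if tsls == 0:
        raise ValueError("2SLS estimate must be non-zero")
    return itt / tsls


# Pooled estimates, one month after the first deadline.
rate_from_payment = implied_first_stage(0.0179, 0.0205)  # any payment (Table 3)
rate_from_amount = implied_first_stage(468.6, 542.3)     # amount paid in TSh (Table 4)
```

Both ratios come out at roughly 0.87, which is in line with the delivery frequencies reported in Table 2.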
In Table 5 we use our subsample of taxpayers for whom we have bill data to unpack what proportion of their final tax bill has been paid. Column (1) replicates column (6) of Table 3, then column (2) restricts the sample to the bill subsample, showing that the effect sizes are similar across the two samples. Column (3) investigates the impact of the three treatments on the proportion of the total tax bill (the 2017/18 bill and any arrears) paid by the taxpayer. During the time period we consider, the control group has paid roughly 10 percent of their total bill on average, and we find that treated taxpayers pay an additional 1.6-1.9 percent of their bill on average. Only 8.4 percent of control group taxpayers paid off their entire bill. Those receiving a reminder or a reciprocity message were 1.6-2 percentage points more likely to fully pay. Again, reciprocity performed better than the other message types, although not significantly so.

To check whether the timing of the messages across groups affected the impacts, we include analysis of our main results (whether citizens paid any part of their bill and the amount that they paid) for the subgroup that received messages on June 26 or later (Appendix Figure A1). Our pattern of results is consistent in this smaller sample.
The impact on payment rates is higher for those who received any treatment, and the point estimate is highest for the reciprocity group (Appendix Table A2). Likewise, the point estimates on total amount paid are highest for the reciprocity group, although in this subsample, the amount paid is quite high for the social pressure group as well (Appendix Table A3).

Table 4: Impact of message assignment on payment amounts

| | ITT: Start of experiment | ITT: First tax deadline | ITT: One month after first deadline | 2SLS: Start of experiment | 2SLS: First tax deadline | 2SLS: One month after first deadline |
|---|---|---|---|---|---|---|
| Pooled treatment arms | | | | | | |
| Treated | -18.44 | 395.7*** | 468.6*** | -18.77 | 460.1*** | 542.3*** |
| | (-0.72) | (5.16) | (5.25) | (-0.66) | (5.38) | (5.45) |
| Separate treatment arms | | | | | | |
| T1: Reminder | -47.06+ | 354.5*** | 401.3*** | -52.97+ | 399.0*** | 451.7*** |
| | (27.01) | (84.13) | (97.71) | (30.41) | (94.70) | (110.0) |
| T2: Reciprocity | 12.31 | 559.1*** | 660.0*** | 12.96 | 590.6*** | 695.7*** |
| | (33.94) | (100.3) | (115.8) | (35.98) | (106.4) | (122.8) |
| T3: Social pressure | 8.130 | 313.8** | 411.3*** | 9.469 | 362.9** | 478.1*** |
| | (33.54) | (98.08) | (114.2) | (40.32) | (117.9) | (137.2) |
| Constant | 262.1*** | 2434.8*** | 3540.1*** | | | |
| | (23.10) | (67.89) | (79.25) | | | |
| Control mean | 259.641 | 2435.597 | 3544.343 | 259.641 | 2435.597 | 3544.343 |
| First stage f-stat | | | | 135,432 | 135,432 | 135,432 |
| R2 | 0.019 | 0.047 | 0.055 | 0.000 | 0.000 | 0.000 |
| Obs | 143,425 | 143,425 | 143,425 | 143,425 | 143,425 | 143,425 |
| Test: T1 = T2 | 0.037 | 0.021 | 0.011 | 0.034 | 0.049 | 0.029 |
| Test: T1 = T3 | 0.050 | 0.638 | 0.920 | 0.064 | 0.725 | 0.824 |
| Test: T2 = T3 | 0.904 | 0.016 | 0.034 | 0.932 | 0.060 | 0.119 |

Notes: Outcome is the amount in Tanzanian shillings a taxpayer paid to the TRA by the date indicated. ITT estimates indicate impact of being assigned to treatment. 2SLS estimates instrument the receipt of each message type with assignment to treatment. Payment amounts winsorized at the 99th percentile. Robust standard errors in parentheses. + p < 0.10, * p < 0.05, ** p < 0.01, *** p < 0.001

The remainder of the results section presents some extensions beyond the main results.
This discussion is exploratory in nature and may be extended in future research.

4.1 The geographic distribution of effects

In this subsection, we ask a simple question: had the TRA targeted a specific region of Dar es Salaam with messages, would they have had the same impact observed across the entire city? Recent meta-analyses of nudges have revealed a significant amount of heterogeneity across contexts (Antinyan and Asatryan 2020; DellaVigna and Linos 2020). This heterogeneity is important to the policymaker, who will need to know the likelihood a given treatment will be effective in a new setting. If messages sent to taxpayers in different parts of the city have drastically different effects, then a revenue authority may want to consider targeting areas where the intervention is more cost effective, or tailor its messaging to be more effective in those areas.

As mentioned above, the randomization was conducted within 1,211 different strata, which were defined both by the location of the property (sub-ward) and whether a bill had been issued. This allows us to compare effect sizes across different locations to see if any systematic relationships appear.
Table 5: Payment results for subsample with bills

                          Full Sample        Bill Sample
                          (1)                (2)                (3)                (4)
                          Pr(any payment)    Pr(any payment)    Proportion paid    Pr(fully paid)
Pooled treatment arms
Treated                   0.0205∗∗∗          0.0223∗∗∗          0.0197∗∗∗          0.0166∗∗∗
                          (0.00229)          (0.00486)          (0.00440)          (0.00418)
Separate treatment arms
T1: Reminder              0.0199∗∗∗          0.0224∗∗∗          0.0190∗∗∗          0.0159∗∗∗
                          (0.00256)          (0.00545)          (0.00492)          (0.00468)
T2: Reciprocity           0.0232∗∗∗          0.0241∗∗∗          0.0224∗∗∗          0.0201∗∗∗
                          (0.00283)          (0.00600)          (0.00545)          (0.00519)
T3: Social pressure       0.0167∗∗∗          0.0185∗∗           0.0162∗∗           0.0111+
                          (0.00317)          (0.00676)          (0.00610)          (0.00578)
Control mean              0.109              0.109              0.095              0.078
First stage F-stat        135,432            28,065.7           28,065.7           28,065.7
R2                        0.001              0.001              0.001              0.001
Obs                       143,425            31,554             31,554             31,554
Test: T1 = T2             0.210              0.752              0.498              0.383
Test: T1 = T3             0.254              0.518              0.596              0.349
Test: T2 = T3             0.047              0.418              0.317              0.132

Notes: Table shows results for payment outcomes for both the entire experimental sample and the subsample for which we have bill data. The outcome in columns (1) and (2) is an indicator equal to one if the taxpayer made any payment to the TRA during the study period. The outcome in column (3) is the share of the total cumulative tax bill the taxpayer paid by the end of the study period. The outcome in column (4) is an indicator equal to one if the taxpayer fully paid their tax bill by the end of the study period. All results are 2SLS results where the actual messages sent to the taxpayer are instrumented using the original treatment assignment. Robust standard errors in parentheses. + p < 0.10, ∗ p < 0.05, ∗∗ p < 0.01, ∗∗∗ p < 0.001

For this analysis, we drop sub-wards with fewer than 100 observations (roughly 4 percent of the total experimental sample), leaving us with 360 sub-wards.
For each sub-ward, we run our standard regression: the outcome is an indicator equal to one if the taxpayer made any payment to the TRA during the study period, regressed on the reduced-form treatment assignment and a control for whether that taxpayer received a bill. From each regression, we recover the coefficient β and calculate the control group mean µ. The top half of Figure 3 shows (i) the distribution of those effect sizes across the 360 sub-wards, and (ii) the distribution of average control group compliance. The figures demonstrate a substantial amount of variation in effect sizes across strata. To give a sense of the implications for comparing small-scale experiments to city-wide ones: if the experiment had been conducted in a single randomly chosen stratum, the chance that the resulting effect size would have been within one percentage point of the “true” effect estimate is only 58 percent. Only 65 percent of sub-wards had estimated effect sizes greater than zero.

The bottom half of Figure 3 compares the estimated distributions of control group compliance and treatment group compliance across strata. The bottom left graph shows that the distribution of treatment group compliance is shifted up and to the right of control group compliance. The scatterplot in the bottom right shows that effect sizes are negatively related to control group compliance: the treatment seems to be more effective in areas that have a higher “baseline” rate of non-compliance.
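The sub-ward exercise described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' code: the function names (`ols`, `stratum_effects`, `share_within`) and the row layout `(stratum, treated, billed, paid)` are hypothetical, the regression is plain OLS solved from the normal equations, and standard errors are omitted.

```python
from collections import defaultdict

def ols(X, y):
    """Solve the normal equations (X'X) b = X'y by Gauss-Jordan elimination."""
    k = len(X[0])
    # Augmented matrix [X'X | X'y]
    A = [[sum(r[i] * r[j] for r in X) for j in range(k)]
         + [sum(r[i] * yi for r, yi in zip(X, y))] for i in range(k)]
    for c in range(k):
        p = max(range(c, k), key=lambda r: abs(A[r][c]))  # partial pivoting
        A[c], A[p] = A[p], A[c]
        for r in range(k):
            if r != c:
                f = A[r][c] / A[c][c]
                A[r] = [a - f * b for a, b in zip(A[r], A[c])]
    return [A[i][k] / A[i][i] for i in range(k)]

def stratum_effects(rows, min_obs=100):
    """rows: iterable of (stratum, treated, billed, paid) tuples (hypothetical layout).

    Returns {stratum: (beta, control_mean)} for strata with at least min_obs rows,
    where beta is the coefficient on `treated` from a regression of `paid` on
    a constant, `treated`, and `billed`.
    """
    groups = defaultdict(list)
    for s, t, b, y in rows:
        groups[s].append((t, b, y))
    out = {}
    for s, obs in groups.items():
        if len(obs) < min_obs or len({t for t, _, _ in obs}) < 2:
            continue  # too small, or no variation in treatment
        # Drop the bill control if it does not vary (it would duplicate the constant).
        use_bill = len({b for _, b, _ in obs}) == 2
        X = [[1.0, float(t)] + ([float(b)] if use_bill else []) for t, b, _ in obs]
        y = [float(yi) for _, _, yi in obs]
        beta = ols(X, y)[1]
        controls = [yi for t, _, yi in obs if t == 0]
        out[s] = (beta, sum(controls) / len(controls))
    return out

def share_within(betas, reference, tol=0.01):
    """Share of stratum-level effects within `tol` of a reference effect."""
    betas = list(betas)
    return sum(abs(b - reference) <= tol for b in betas) / len(betas)
```

Applying `share_within` to the stratum-level coefficients, with the pooled estimate as the reference and a tolerance of 0.01, corresponds to the "within one percentage point" comparison in the text. The sketch assumes treatment and bill status are not perfectly collinear within a sub-ward.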
Figure 3: Distribution of effects and end-of-experiment compliance rates across strata
(a) Distribution of average compliance across strata for taxpayers receiving any message versus the control group
(b) Strata-specific effect size versus the average compliance rate for the control group
(c) Distribution of strata-specific effect sizes
(d) Distribution of control-group compliance rate across strata
Note: Each observation is either the coefficient or the constant from a strata-specific regression of whether a household has made any payments to the TRA during the study period on a dummy = 1 if the household was randomized into a message treatment. Sample includes all strata with at least 100 observations.

4.2 Cost-effectiveness of the impacts and comparisons to other estimated effects

While the messages in our experiment have a relatively small effect on compliance and amount paid, the effect is still of substantial practical relevance once costs are taken into account. Through a straightforward calculation, we estimate that the experiment’s benefits are around 36 times its cost. This comes from dividing the average increase in the amount paid to the TRA due to the intervention (around 540 TSh for those who received a message) by the cost of the text-message campaign, which is estimated to be around 15 TSh per text message. The reciprocity treatment (impact on amount paid estimated at 695 TSh) would be the most cost-effective intervention in our setting, with an increase in the amount paid of more than 46 times the messages’ cost.

There are, of course, several factors that this simple calculation does not consider. First, the cost of the messages could be lower if the government secures a discount for sending large numbers of messages. Second, as more taxpayers comply with payment, other taxpayers could start to emulate them, and so the messages could have an additional spillover effect that we do not consider in this simple cost-effectiveness analysis.
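As a rough check on the arithmetic above, the benefit-cost ratios can be reproduced directly from the figures quoted in the text. The function name and the 20 percent bulk discount in the last line are hypothetical, and the sketch assumes, as the simple calculation does, that the cost per treated taxpayer is the cost of one text message.

```python
COST_PER_MESSAGE_TSH = 15.0  # approximate cost per text message, as stated in the text

def benefit_cost_ratio(gain_tsh, cost_tsh=COST_PER_MESSAGE_TSH):
    """Extra revenue per treated taxpayer divided by the cost of messaging them."""
    return gain_tsh / cost_tsh

pooled = benefit_cost_ratio(540.0)       # pooled treatments: 540 / 15
reciprocity = benefit_cost_ratio(695.0)  # reciprocity message: 695 / 15

# Hypothetical scenario: a 20 percent bulk discount on message costs.
pooled_discounted = benefit_cost_ratio(540.0, COST_PER_MESSAGE_TSH * 0.8)

print(f"pooled: {pooled:.1f}, reciprocity: {reciprocity:.1f}, "
      f"pooled with discount: {pooled_discounted:.1f}")
```

Because the ratio is a simple quotient, any reduction in the per-message cost raises it proportionally, which is why the first caveat about bulk discounts works in favor of the intervention.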
On the other hand, we cannot rule out that the messages have only a short-term effect, although we suspect this is not the case. Previous work on text message reminders suggests that high-frequency messages (e.g., daily reminders to take medicines) may lose efficacy, but that occasional reminders do not (Pop-Eleches et al. 2011).

As described earlier in the paper, we did not explicitly test an enforcement message as part of the randomized controlled trial. However, a subset of taxpayers (not experimentally assigned) did receive an enforcement message from the TRA, indicating that they may face fines if they did not make their payment on time. We include those results in Section A1.2 in the Appendix, with the caveat that those receiving the enforcement messages differed significantly from other participants across a number of covariates.

Related to cost-effectiveness, we can also infer some insights on how program implementation affects the program’s benefits. From Table 4, we observe that the ITT estimated impact is around 470 TSh (see pooled treatment arms), while the 2SLS estimated impact is around 540 TSh. As the ITT includes potential implementation failures (i.e., text messages not effectively received or opened), it gives an indication of the impact under imperfect implementation. The 2SLS estimate, on the other hand, gives an indication of the impact under implementation with greater fidelity. In the case of this program, implementation failures led to a roughly 13 percent decrease in the effectiveness of the program (470 versus 540 TSh).

4.3 Distributional effects

Another dimension we explore in the data is the distribution of effects across types of properties. For the sample of properties for which a bill was issued, we rank the bills by percentile of the amount owed (for the 17/18 tax year, not including back taxes) and plot the percentage of the total potential bill revenue for each percentile in Figure 4.
The graph indicates that, in terms of potential revenues, most of the opportunity for increased revenue would come from the top 10 percent of billed properties, which account for more than 80 percent of the total potential bill revenue.

We then look at treatment heterogeneity across bill amounts in Figure 5. The red curve (with 95 percent confidence bands in red) shows the proportion of the bill paid after receipt of any treatment, while the blue curve (with 95 percent confidence bands in blue) shows the proportion of the bill paid for the control group (i.e., the space between the lines indicates the impact of the program). The graph suggests that the treatments have a bigger impact for smaller bills, primarily those below 25,000 TSh. This result offers insight into why the overall impact of the messages in terms of payment amounts is relatively low: most of the potential tax take would come from the very highly valued properties/bills, but the messages have a stronger impact for the smaller bills. Because this heterogeneity is measured relative to the bill amount for a given property rather than household wealth or income, we cannot draw clear conclusions about the progressivity or regressivity of the reminders.[6]

Figure 4: Potential bill revenue and bill amount percentiles
Note: Figure shows (on the y-axis) the cumulative proportion of the total 2017/18 fiscal year bill amount charged by the TRA against (on the x-axis) the percentile of bill amounts the taxpayer is in. For example, the first 50% of taxpayers, ranked by their bill amount, account for less than 5% of the amount being charged by the TRA.

5 Conclusion

In the face of limited tax compliance and limited capacity for enforcement of tax compliance, this study reports the impact of a randomized controlled trial to test different ways of leveraging tax morale in Dar es Salaam, Tanzania.
The study tests salience (via a simple reminder), reciprocity (via a message highlighting the link between tax payment and publicly provided services), and social pressure (via a message emphasizing that those who do not pay are not contributing to national or local development). We find positive impacts of all the messages, suggesting an impact of simple salience (since any message boosts salience). But we find evidence that reciprocity treatments led to higher overall tax payments, suggesting that it is possible for government to leverage these aspects of tax morale. While the absolute gains are not enormous, the interventions are very cheap, such that the benefit-to-cost ratio is estimated to be over 30:1 on average across all treatments and over 40:1 for the more effective treatments.

[6] Further complicating inferences on progressivity, we find that the reminders have a larger impact on properties that the TRA has valued, and it is highly likely that the TRA has focused on valuing properties that are worth more (Appendix Figure A2).

Figure 5: Proportion of bill paid and bill amount
Note: Figure shows local polynomial estimates of the relationship between (on the y-axis) the proportion of the total 2017/18 fiscal year bill amount paid by taxpayers in the experimental sample and (on the x-axis) the log of the total bill amount, divided by whether they are (blue) in the control group or (red) received any treatment. 95% confidence intervals shown.

Text message reminders are one useful tool that governments can draw on to mobilize domestic resources for public services. That said, in absolute terms, tax systems will require a broad range of improvements, including more expansive registration and valuation and more effective enforcement, to substantively raise domestic revenues.
Future work in this area can be designed and statistically powered to test the progressivity of tax morale interventions and a wider range of potential messages to ascertain the sensitivity of tax morale interventions to implementation details around wording, length, and frequency.

Data availability statement

The tax data used in this paper were received from the TRA for the purposes of this research, and because this analysis uses the trajectory of payments, anonymization of the data is not straightforward. Thus, the data are not publicly available. We would be happy to run any requested sensitivity analysis, and we welcome other researchers to approach the TRA for access to revenue data.

References

Ali, M., O.-H. Fjeldstad, and L. Katera (2017). Property taxation in developing countries. CMI Brief.
Allingham, M. G. and A. Sandmo (1972). Income tax evasion: a theoretical analysis. Journal of Public Economics 1 (3-4), 323–338.
Antinyan, A. and Z. Asatryan (2020). Nudging for tax compliance: A meta-analysis. CESifo Working Papers 8500.
Ariel, B. (2012). Deterrence and moral persuasion effects on corporate tax compliance: findings from a randomized controlled trial. Criminology 50 (1), 27–69.
Besley, T. and T. Persson (2013). Taxation and development. In Handbook of public economics, Volume 5, pp. 51–110. Elsevier.
Blumenthal, M., C. Christian, J. Slemrod, and M. G. Smith (2001). Do normative appeals affect tax compliance? Evidence from a controlled experiment in Minnesota. National Tax Journal, 125–138.
Brockmeyer, A., A. M. Estefan, K. R. Arras, and J. C. S. Serrato (2020). Taxing property in developing countries: Theory and evidence from Mexico. Working Paper.
Brockmeyer, A., S. Smith, M. Hernandez, and S. Kettle (2019). Casting a wider tax net: Experimental evidence from Costa Rica. American Economic Journal: Economic Policy 11 (3), 55–87.
Callaway, B. and P. H. Sant’Anna (2021). Difference-in-differences with multiple time periods.
Journal of Econometrics 225 (2), 200–230.
Carril, A. et al. (2017). Dealing with misfits in random treatment assignment. Stata Journal 17 (3), 652–667.
Castro, L. and C. Scartascini (2015). Tax compliance and enforcement in the Pampas evidence from a field experiment. Journal of Economic Behavior & Organization 116, 65–82.
Cohen, I. (2020). Low-cost tax capacity: A randomized evaluation with the Uganda Revenue Authority.
Coleman, S. (1996). The Minnesota income tax compliance experiment.
Del Carmen, G., E. E. Espinal Hernandez, and T. Scot (2020). Targeting in tax compliance interventions: Experimental evidence from Honduras. Working Paper.
DellaVigna, S. and E. Linos (2020). RCTs to scale: Comprehensive evidence from two nudge units. Technical report, National Bureau of Economic Research.
Dincecco, M. and G. Katz (2014). State capacity and long-run economic performance. The Economic Journal 126 (590), 189–218.
Fellner, G., R. Sausgruber, and C. Traxler (2013). Testing enforcement strategies in the field: Threat, moral appeal and social information. Journal of the European Economic Association 11 (3), 634–660.
Fjeldstad, O.-H., M. Ali, and L. Katera (2017). Taxing the urban boom in Tanzania: Central versus local government property tax collection. CMI Insight.
Gadenne, L. (2017). Tax me, but spend wisely? Sources of public finance and government accountability. American Economic Journal: Applied Economics.
Gaspar, V., L. Jaramillo, and M. P. Wingender (2016). Tax Capacity and Growth: Is there a Tipping Point? International Monetary Fund.
Goodman-Bacon, A. (2021). Difference-in-differences with variation in treatment timing. Journal of Econometrics.
Hallsworth, M., J. A. List, R. D. Metcalfe, and I. Vlaev (2017). The behavioralist as tax collector: Using natural field experiments to enhance tax compliance. Journal of Public Economics 148, 14–31.
Hasseldine, J., P. Hite, S. James, and M. Toumi (2007).
Persuasive communications: Tax compliance enforcement strategies for sole proprietors. Contemporary Accounting Research 24 (1), 171–194.
Hernandez, M., J. Jamison, E. Korczyc, N. Mazar, and R. Sormani (2017). Applying behavioral insights to improve tax collection: experimental evidence from Poland.
Herrmann, B., C. Thöni, and S. Gächter (2008). Antisocial punishment across societies. Science 319, 1362–1367.
Kettle, S., M. Hernandez, S. Ruda, and M. Sanders (2016). Behavioral interventions in tax compliance: evidence from Guatemala. The World Bank.
Khwaja, A. I., O. Haq, A. Q. Khan, B. Olken, and M. Shaukat (2020). Rebuilding the social compact: Urban service delivery and property taxes in Pakistan. 3ie.
Kleven, H. J., M. B. Knudsen, C. T. Kreiner, S. Pedersen, and E. Saez (2011). Unwilling or unable to cheat? Evidence from a tax audit experiment in Denmark. Econometrica 79 (3), 651–692.
Krause, B. (2020). Balancing purse and peace: Tax collection, public goods and protests.
Luttmer, E. F. and M. Singhal (2014). Tax morale. Journal of Economic Perspectives 28, 149–168.
Mascagni, G., C. Nell, and N. Monkam (2017). One size does not fit all: a field experiment on the drivers of tax compliance and delivery methods in Rwanda.
Meiselman, B. S. (2018). Ghostbusting in Detroit: Evidence on nonfilers from a controlled field experiment. Journal of Public Economics 158, 180–193.
Mwaijande, F., M. Kachwamba, J. Mwakalikamo, D. Shirima, and G. Cruces (2021). Local authorities and tax collection: Experimental evidence from Tanzania.
Norregaard, M. J. (2013). Taxing Immovable Property Revenue Potential and Implementation Challenges. Number 13-129. International Monetary Fund.
Ortega, D. and C. Scartascini (2020). Don’t blame the messenger. The delivery method of a message matters. Journal of Economic Behavior & Organization 170, 286–300.
PO-RALG (2013). A study on LGAs own source revenue collection. PO-RALG - Internal Report.
Pomeranz, D. (2015).
No taxation without information: Deterrence and self-enforcement in the value added tax. American Economic Review 105 (8), 2539–69.
Pop-Eleches, C., H. Thirumurthy, J. P. Habyarimana, J. G. Zivin, M. P. Goldstein, D. de Walque, L. MacKeen, J. Haberer, S. Kimaiyo, J. Sidle, D. Ngare, and D. R. Bangsberg (2011). Mobile phone technologies improve adherence to antiretroviral treatment in a resource-limited setting: a randomized controlled trial of text message reminders. AIDS 25 (6), 825–834.
Torgler, B. (2004). Moral suasion: An alternative tax policy strategy? Evidence from a controlled field experiment in Switzerland. Economics of Governance 5 (3), 235–253.

A1 Appendix

A1.1 Additional Graphs and Tables

Table A1: Balance tests for bill subsample

                          Full Sample                Bill Sample
                          (1)                        (2)                 (3)                 (4)
                          Probability(bill issued)   Log(total bill)     Log(17/18 bill)     IHS(property value)
T1: Reminder              -0.00164                   -0.0107             -0.00848            -0.128
                          (0.00300)                  (0.0132)            (0.0116)            (0.0907)
T2: Reciprocity           -0.00112                   -0.00722            0.00317             -0.0763
                          (0.00347)                  (0.0152)            (0.0135)            (0.105)
T3: Social pressure       -0.00316                   0.00816             0.00631             0.00126
                          (0.00347)                  (0.0154)            (0.0136)            (0.106)
Control mean (TSh)        0.2222144628               268,066.239         120,048.821         74,139,772
R2                        0.000                      0.465               0.549               0.547
Obs                       143,520                    31,550              27,090              31,554
Test: T1 = T2             0.861                      0.789               0.307               0.570
Test: T1 = T3             0.612                      0.152               0.202               0.159
Test: T2 = T3             0.554                      0.312               0.815               0.461

Notes: Table shows balance tests for four bill-related outcomes. Outcome (1) is a binary outcome = 1 if the taxpayer was issued a bill by the TRA. Outcome (2) is the log of the cumulative bill amount (current and back taxes) across all the taxpayer’s properties. Outcome (3) is the log of the 2017/2018 tax owed only. Outcome (4) is the inverse hyperbolic sine transformation of the property value assessed by the TRA. The treatment measures are the original, intent-to-treat indicators. Robust standard errors in parentheses.
+ p < 0.10, ∗ p < 0.05, ∗∗ p < 0.01, ∗∗∗ p < 0.001

Table A2: Impact of message assignment on payment rates for taxpayers that only received a message four or fewer days before the first deadline

                          ITT estimates                                2SLS estimates
                          Start of      First tax     One month      Start of      First tax     One month
                          experiment    deadline      after first    experiment    deadline      after first
                                                      deadline                                   deadline
Pooled treatment arms
Treated                   -0.000464     0.0114∗∗∗     0.0152∗∗∗      -0.000467     0.0137∗∗∗     0.0182∗∗∗
                          (-0.79)       (6.36)        (7.06)         (-0.68)       (6.51)        (7.22)
Separate treatment arms
T1: Reminder              -0.000852     0.0107∗∗∗     0.0138∗∗∗      -0.00108      0.0136∗∗∗     0.0174∗∗∗
                          (0.000683)    (0.00217)     (0.00259)      (0.000864)    (0.00274)     (0.00328)
T2: Reciprocity           -0.000222     0.0147∗∗∗     0.0189∗∗∗      -0.000237     0.0159∗∗∗     0.0204∗∗∗
                          (0.000791)    (0.00249)     (0.00298)      (0.000866)    (0.00274)     (0.00327)
T3: Social pressure       -0.000216     0.00986∗∗∗    0.0142∗∗∗      -0.000255     0.0115∗∗∗     0.0166∗∗∗
                          (0.000709)    (0.00219)     (0.00263)      (0.000851)    (0.00264)     (0.00317)
Constant                  0.00735∗∗∗    0.0717∗∗∗     0.109∗∗∗
                          (0.000505)    (0.00151)     (0.00182)
Control mean              0.007         0.072         0.109          0.007         0.072         0.109
First stage F-stat                                                   76,992.6      76,992.6      76,992.6
R2                        0.015         0.044         0.050          0.000         0.001         0.001
Obs                       106,832       106,832       106,832        106,832       106,832       106,832
Test: T1 = T2             0.409         0.114         0.087          0.350         0.435         0.410
Test: T1 = T3             0.348         0.691         0.875          0.329         0.448         0.807
Test: T2 = T3             0.993         0.055         0.120          0.985         0.147         0.307

Notes: Outcome is the probability a taxpayer made any payment to the TRA by the date indicated. ITT estimates indicate impact of being assigned to treatment. 2SLS estimates instrument the receipt of each message type with assignment to treatment. Robust standard errors in parentheses. Sample restricted to taxpayers that received no message or only received a message on June 26th or after.
+ p < 0.10, ∗ p < 0.05, ∗∗ p < 0.01, ∗∗∗ p < 0.001

Table A3: Impact of message assignment on payment amounts for taxpayers that only received a message four or fewer days before the first deadline

                          ITT estimates                                2SLS estimates
                          Start of      First tax     One month      Start of      First tax     One month
                          experiment    deadline      after first    experiment    deadline      after first
                                                      deadline                                   deadline
Pooled treatment arms
Treated                   -10.04        281.4∗∗∗      360.1∗∗∗       -3.410        356.6∗∗∗      451.5∗∗∗
                          (-0.37)       (3.51)        (3.86)         (-0.11)       (3.79)        (4.13)
Separate treatment arms
T1: Reminder              -48.66        166.5+        218.1∗         -61.60        210.5+        275.8∗
                          (30.31)       (94.99)       (110.4)        (38.36)       (120.2)       (139.7)
T2: Reciprocity           24.27         432.1∗∗∗      516.5∗∗∗       26.68         466.2∗∗∗      554.9∗∗∗
                          (38.76)       (111.4)       (128.3)        (42.56)       (122.3)       (140.7)
T3: Social pressure       7.746         301.4∗∗       404.9∗∗∗       8.555         351.9∗∗       475.0∗∗∗
                          (33.58)       (98.11)       (114.2)        (40.39)       (118.0)       (137.3)
Constant                  262.7∗∗∗      2442.6∗∗∗     3543.6∗∗∗
                          (23.21)       (67.97)       (79.30)
Control mean              259.922       2437.188      3541.804       259.922       2437.188      3541.804
First stage F-stat                                                   76,992.6      76,992.6      76,992.6
R2                        0.022         0.049         0.057          0.000         0.000         0.000
Obs                       106,832       106,832       106,832        106,832       106,832       106,832
Test: T1 = T2             0.046         0.016         0.019          0.039         0.050         0.062
Test: T1 = T3             0.071         0.164         0.096          0.071         0.241         0.154
Test: T2 = T3             0.675         0.248         0.391          0.702         0.401         0.610

Notes: Outcome is the amount in Tanzanian shillings a taxpayer paid to the TRA by the date indicated. ITT estimates indicate impact of being assigned to treatment. 2SLS estimates instrument the receipt of each message type with assignment to treatment. Sample restricted to taxpayers that received no message or only received a message on June 26th or after. Payment amounts winsorized at the 99th percentile. Robust standard errors in parentheses. + p < 0.10, ∗ p < 0.05, ∗∗ p < 0.01, ∗∗∗ p < 0.001

Figure A1: Timeline of message delivery
Note: Graph shows the cumulative proportion of messages sent (out of messages sent to all taxpayers, not just those in our experimental sample) over time.
Figure A2: Impact of treatment on probability of any payment by type of property
Note: Figure shows the effect size (estimated using 2SLS) by type of property: (a) those that had not been issued a bill, (b) those that had been issued a bill but had not been valued by the TRA (and thus owed a flat fee of 10,000 TSh), (c) those that had both been issued a bill and positively valued by the TRA, and (d) those that had not been issued a bill or had been issued a bill but not valued.

A1.2 Enforcement message difference-in-difference estimates

As described in the main text, those assigned to enforcement messages in our experiment were, due to an error, instead sent reminder messages. However, the Tanzania Revenue Authority did send its own set of enforcement messages to a subset of all property taxpayers, including nearly 600 taxpayers in our experimental sample. To investigate how an enforcement message might have induced compliance among this group, we followed an event-study design to observe whether payment rates increased following the receipt of the first message.

There are several caveats to this approach. The taxpayers that received enforcement messages were not randomly selected. We observe that they were more likely to have received a bill from the TRA and, conditional on receiving a bill, had higher estimated property values. Thus any estimated effect might be driven by a combination of the actual effect of the enforcement message together with the effect of any unobserved characteristics (e.g., propensity to pay, unobserved effort on the part of the TRA) which induced these taxpayers to pay before the initial deadline.

Because different taxpayers received an initial message at different points in time, we account for the fact that this is what is now referred to as a “staggered” difference-in-differences framework.
To account for the problems that can develop when units are treated at different times and treatment effects are expected to be heterogeneous across units and over time (Goodman-Bacon 2021), we proceed using the doubly robust method of estimating difference-in-differences developed by Callaway and Sant’Anna (2021).

For each message type (enforcement, reciprocity, reminder and social pressure), we retain all taxpayers that received that message and all control taxpayers and run the doubly robust estimator separately, so that in each instance the receipt of that type of message is compared to a pure control group that received no message.

Figure A3 displays the event study coefficients (βj) for (a) any payment being made and (b) the total amount paid in Tanzanian shillings. For both outcomes, taxpayers that received an enforcement message saw substantially faster growth following the treatment period. The average treatment effect on the treated (ATT) for those sent enforcement messages is 7 percentage points for any payment (versus 2.4 percentage points for reciprocity messages) and roughly TSh 4,150 for the total amount paid (versus TSh 668 for reciprocity messages).[7] Given the challenge with non-comparability across groups, and some suggestion of pre-trends in the enforcement message treatment, this yields only suggestive evidence that enforcement messages may merit further study. However, the effect sizes are in line with what was observed by Mwaijande, Kachwamba, Mwakalikamo, Shirima, and Cruces (2021).

[7] These effects also appear to hold as a percentage of the total amount owed, when we consider the subset of taxpayers with bills (results available upon request).

Figure A3: Stacked event-study estimates of impact of different message types
(a) Outcome: any payment
(b) Outcome: total paid
Note: Figures graph the event time estimates of the impact of each message type, using the doubly robust method from Callaway and Sant’Anna (2021).
They chart the impact of receiving a message from the TRA on either (a) whether the taxpayer has made any payment or (b) the total amount paid to the TRA. The sample is the experimental sample described in the main paper. 95% confidence intervals shown.

A1.3 Phone number spillovers and adjustments made to the final sample

The randomization was originally conducted over 237,699 taxpayer IDs, and the phone numbers associated with those IDs were used for the message treatments. But, as discussed above, the same phone number could be associated with multiple taxpayer IDs. This means that even if a taxpayer was only allocated into a single treatment arm (for example, the pure control), they might have inadvertently received a message intended for another treatment arm if they shared a phone number with a taxpayer in that arm.

Our estimates indicate that 39.62% of taxpayers share at least one number with another taxpayer and that, on average, a taxpayer shares one number with other taxpayers. Some taxpayers share several numbers: at the 99th percentile, a taxpayer shares a number with nine others. Those that share more numbers are more likely, by chance, to receive multiple treatment assignments. The starkest example of this is for the control group: those with any shared numbers had an 80% chance of receiving an experimental message, whereas those without any shared numbers had essentially no chance of receiving one.

However, a priori there is no reason to believe that this probability would vary across treatment arms. We can investigate this in two ways. First, Figure A4 shows the results of regressing the (inverse hyperbolic sine of the) number of shared numbers on each treatment arm (controlling for strata fixed effects), as well as a dummy = 1 if any numbers are shared. There appears to be no significant difference across treatment arms: those with more shared numbers do not appear to be more likely to be in the reminder treatment than in the pure control, for example.
Thus, we are able to restrict our analysis sample to taxpayers with unique phone numbers without undermining the randomization.

Figure A4: Differences in the number of shared phone numbers across treatment arms

Figure A5: Proportion of taxpayers who received perfect treatment assignment
Note: Perfect assignment is defined as: receiving no experimental messages (pure control group), or receiving only the correct experiment message (all treatment groups) and no other. * While we include the enforcement treatment assignment group separately, because this group was inadvertently assigned to the reminder treatment, we judge perfect assignment here as only receiving a reminder message.

Figure A6: Distribution of received messages
Note: Perfect assignment is defined as: receiving no experimental messages (pure control group), or receiving only the correct experiment message (all treatment groups) and no other. * While we include the enforcement treatment assignment group separately, because this group was inadvertently assigned to the reminder treatment, we judge perfect assignment here as only receiving a reminder message.

Figure A7: Distribution of received messages when we restrict to taxpayers with a unique phone number
Note: Perfect assignment is defined as: receiving no experimental messages (pure control group), or receiving only the correct experiment message (all treatment groups) and no other. * While we include the enforcement treatment assignment group separately, because this group was inadvertently assigned to the reminder treatment, we judge perfect assignment here as only receiving a reminder message.