Bottom-Up Estimation of the VAT Reporting Gap in Bulgaria

World Bank Group
Brian Erard
8 October 2023

Contents

1 Introduction
2 Top-Down Estimation
3 Bottom-Up Estimation
3.1 Random Audit Studies
3.2 Selection-on-Observables Approaches
3.3 Sample Selection Approaches
4 VAT Gap Estimation Methodology for Bulgaria
4.1 Risk Group Classification
4.2 Entropy Balancing
4.3 Statistical Matching
4.4 Variable Selection
4.5 Outlier Control Strategy
5 Sampling Design and Data Summary
6 Results
6.1 Baseline Results
6.2 Sensitivity Analysis Regarding Audit Coverage Period
6.3 Sensitivity Analysis Regarding Estimation Methodology
6.4 Sensitivity Analysis Using Higher Cutoff Rule
7 Concluding Remarks
Appendix A: Variable Selection Results
A.1 Potential Explanatory Variables
A.2 Logit Results for Final Risk-Classification Specifications
A.3 Multinomial Logit Results for Selected Explanatory Variables for Entropy Balancing
Appendix B: Breakdowns of Statistical Matching VAT Reporting Gap Estimates
B.1 Matching-on-Level Results
B.2 Matching-on-Rate Results
Appendix C: Breakdowns of Entropy Balancing VAT Reporting Gap Estimates – Higher Cutoff Rule

1 Introduction

This report summarizes the methodology and findings of a preliminary study to estimate the Value-Added Tax (VAT) reporting gap in Bulgaria in 2019. The VAT reporting gap is defined as the difference between the amount of VAT that is actually owed by taxpayers and the amount that they declare on their tax returns.1 The VAT gap is commonly estimated using a “top-down” approach, based largely on national accounts data. While top-down estimation methodologies yield a rough sense of overall trends and levels of VAT noncompliance, they are largely incapable of producing more useful granular-level information on different components of the gap. In this report, we introduce a novel bottom-up estimation strategy for the VAT reporting gap in Bulgaria, which permits the development of disaggregated estimates of tax noncompliance by industry sector, region, and size separately for those taxpayers reporting a net VAT balance due and those claiming a refund.

The bottom-up estimation strategy relies on an extrapolation of findings from risk-based VAT audits to the overall VAT population, and the estimates can be sensitive to the underlying modeling assumptions. This report illustrates how top-down VAT gap estimates can be used as a control measure against which the bottom-up estimates may be calibrated.

The author is grateful to the National Revenue Agency for its substantial assistance in explaining the features and nuances of VAT reporting and administration in Bulgaria, including the audit selection process for VAT returns, as well as for compiling and providing access to an extremely detailed data sample for tax gap estimation within a short timeframe.

2 Top-Down Estimation

Top-down VAT gap estimates can be derived using either a “consumption-side” or a “production-side” approach.
Under the consumption-side approach (see, for example, European Commission et al., 2022), national accounts data on final consumption serve as the basis for estimates of the tax base, while data from household budget surveys and fiscal registers are relied upon to estimate the effective rates of tax that apply to different categories of expenditure, taking into account exempt transactions and legally unregistered entities.2 By applying these estimated rates to the underlying consumption base and aggregating the results, an overall estimate of net VAT true tax liability (VTTL) is obtained.

1 In a broader sense, the full VAT gap includes not only this “reporting gap”, but also VAT liabilities that are owed by firms that fail to file their required tax returns and VAT liabilities that are reported on tax returns but not fully paid. These latter components of the VAT gap, which generally represent only a small share of the full VAT gap, are not addressed in this report.

Under the production-side approach, typified by the IMF RA-GAP methodology (Hutton, 2017), sectoral national accounts data from the supply and use tables are employed to estimate the net VTTL within each sector, taking into account the statutory VAT rates that apply to imports, input purchases, and outputs of various goods and services within the sector. This methodology incorporates various parameters to account for transactions that are not subject to input tax credits or output taxes, including tax-exempt sales (and the inputs used in the creation of those outputs) as well as transactions involving unregistered businesses.3 The estimated net tax on the value-added within each sector is then aggregated to arrive at an estimate of overall net VTTL.

Under both above-described top-down approaches, actual tax collections are subtracted from the overall VTTL estimate to arrive at an estimate of the aggregate VAT gap.
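The arithmetic behind a consumption-side estimate can be illustrated with a minimal sketch. The expenditure categories, bases, effective rates, and collections figure below are all invented for illustration (they are not Bulgarian data), and the function names `vttl` and `vat_gap` are hypothetical.

```python
# Hypothetical consumption categories: (final consumption base, effective VAT rate).
# All figures are invented for illustration (millions of currency units).
categories = {
    "food": (20_000.0, 0.20),
    "housing": (15_000.0, 0.00),    # treated as exempt in this toy example
    "hospitality": (5_000.0, 0.09),
    "other": (30_000.0, 0.20),
}


def vttl(cats):
    """VAT true tax liability: sum over categories of base times effective rate."""
    return sum(base * rate for base, rate in cats.values())


def vat_gap(cats, collections):
    """Top-down gap: estimated VTTL minus actual VAT collections."""
    return vttl(cats) - collections


collections = 9_500.0  # hypothetical actual VAT receipts
total_vttl = vttl(categories)
gap = vat_gap(categories, collections)
print(f"VTTL = {total_vttl:,.0f}, gap = {gap:,.0f} ({gap / total_vttl:.1%} of VTTL)")
```

The sketch makes plain why the result is sensitive to the inputs: any error in an estimated base or effective rate passes directly into the VTTL and therefore into the gap.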
Since the VTTL is estimated separately for each sector under the RA-GAP approach, a decomposition of the overall VAT gap by sector is potentially feasible. In practice, however, sector classifications in the national accounts often do not align with those in fiscal registers, which confounds comparisons of the estimated VTTL against actual VAT receipts within a given sector.4

The accuracy of each of these top-down approaches depends on the quality of the available national accounts data as well as the underlying modeling assumptions. Certain key information sources, such as supply and use tables, may be out of date, while other important statistics might be based on preliminary estimates. The degree to which informal sector transactions are accounted for in the official statistics is also of potential concern, particularly in economies with large informal sectors. Consequently, top-down measures of the VAT gap are generally best viewed as ballpark figures rather than as precise estimates.

2 National statistics agencies in the EU routinely produce estimates of the weighted average rate of VAT (WAR) for use by top-down estimation approaches.
3 Such estimates typically rely on supplementary data from surveys and fiscal registers (as do the figures provided in the supply and use tables).
4 See the discussion in European Commission et al. (2022).

3 Bottom-Up Estimation

Bottom-up VAT gap estimation studies are comparatively rare. Such studies attempt to estimate the tax gap by extrapolating firm-level results from VAT audits to the overall population of audited and unaudited taxpayers. Like top-down methods, bottom-up approaches rely on various modeling assumptions, and the results tend to have a large margin of error. As with top-down methods, bottom-up estimates of the VAT gap should be viewed more as ballpark measures than precise estimates.

Bottom-up methods provide an alternative and complementary means of estimating the overall VAT gap.
In countries with sufficiently rigorous national accounting systems, they provide an opportunity to corroborate top-down estimates. Owing to their foundation on micro-level data, bottom-up methods also provide a means of decomposing the estimated gap in various useful ways, permitting a much more granular perspective on the VAT gap than one can achieve using a top-down approach.

Bottom-up approaches to tax gap estimation generally fall into three categories:
• Random audit studies;
• Applications of “selection-on-observables” methods to risk-based audit data; and
• Applications of sample selection methods to risk-based audit data.

3.1 Random Audit Studies

If a sufficient number of tax returns is randomly selected for audit and thoroughly examined, the audit results obtained from this sample should be broadly representative of the results one would achieve from auditing all returns within the general population. Under such an approach, one can estimate the tax gap by simply computing the weighted sum (using the sample weights associated with the random sampling design) of the audit adjustments within the sample.

Although random audits are viewed by many researchers and policy-makers as the “gold standard” for tax gap estimation, they are subject to several practical limitations. First, they have a high opportunity cost in that they divert scarce resources from more productive risk-based audits (in the sense of direct revenue yield) and other important administrative functions (such as taxpayer services). This cost can be reduced to some degree through the employment of a stratified random sampling design that oversamples returns with higher compliance risks. In addition, results from random audit programs serve as a valuable resource for improving risk-based audit selection, which can help to justify the initial investment. Nevertheless, cost considerations generally lead most tax administrations to undertake only small-scale random audit programs, if any at all.
As a consequence, the resulting tax gap estimates tend to be quite imprecise. Small scale random audit studies are actually better suited for estimating noncompliance rates, such as the share of returns with substantive compliance, than noncompliance magnitudes. A second limitation of random audit studies is that it can be difficult to motivate tax examiners to conduct thorough audits. By their nature, tax examiners are drawn to issues where they perceive a substantial potential for significant misreporting; they may view it as a waste of their time to review issues on randomly selected returns where they perceive little scope for serious compliance problems. Third, even when examiners are highly motivated and thorough in their approach, audits sometimes fail to uncover all noncompliance that is present on a tax return. Although the undiscovered portion of the tax gap can be quite significant, it is often not accounted for under bottom-up tax gap estimation strategies.5 Undetected noncompliance is a pervasive issue for all tax gap estimation approaches that rely on audits, not just random audit studies. A final limitation is that small scale random audit studies are unlikely to capture the portion of the tax gap associated with low-incidence but high-impact compliance issues (such as overly aggressive tax planning activities or outright fraud). Owing to these limitations, random audit studies have the potential to substantially understate the tax gap. The Danish Customs and Tax Administration (SKAT) has employed periodic random audit studies to estimate the VAT gap based on stratified random samples of approximately 2,000 VAT-registered businesses.6 The estimates for 2010 indicate a modest VAT gap amounting to less than two-tenths of one percent of GDP – a figure that is much lower than top-down estimates for this period, which range between one-half and nine-tenths of one percent of GDP. 
One limitation of the bottom-up approach employed by SKAT is that the random sample excludes large firms (i.e., those that employ more than 250 workers), which makes the sample less representative of the overall population.7 A second limitation is that a random audit program is unlikely to capture the effects of large organized fraud activities, which can only be effectively targeted under a risk-based audit program.

Since Bulgaria does not perform random VAT audits, a random audit approach to VAT gap estimation was not feasible for this mission.

5 An important exception is the bottom-up methodology employed by the U.S. Internal Revenue Service for individual income tax gap estimation, which relies on the detection-controlled estimation strategy developed by Erard and Feinstein (2011) based on the pioneering work by Feinstein (1991). This methodology relies on a statistical comparison of audit findings by a number of different examiners who have each thoroughly audited a number of different tax returns. Many audit-based studies of tax gaps do not involve a sufficient number of such examiners (or do not have the requisite information on examiner identities) to permit application of this technique.
6 The SKAT random audit program is described in International Monetary Fund (2016).

3.2 Selection-on-Observables Approaches

When random audit data are not available, bottom-up estimation of tax gaps typically relies on results from risk-based audit programs. A key challenge with such an approach is that taxpayers who are targeted under a risk-based audit selection procedure will tend to differ in important ways from taxpayers who are not. Consequently, the results of such audits will not be representative of the degree of noncompliance to be expected in the overall population.
More specifically, assuming that the risk-based audit selection process is effective, the incidence and magnitude of tax underreporting will tend to be more substantial among audited returns than unaudited ones.

Selection-on-observables methods for tax gap estimation attempt to control for relevant observed differences between audited and unaudited taxpayers when extrapolating results from audited tax returns to the general return population. Under this approach, the compliance behavior of unaudited taxpayers is predicted based on the recommended tax adjustments for audited taxpayers with similar compliance risk factors, thereby permitting more of an “apples-to-apples” approach to measuring the tax gap. Such an approach is most promising when all relevant compliance risk factors are available and recorded in the data used for tax gap estimation.

7 The tax gap is projected based on the ratio of underreported VAT to VAT receipts.

As an illustration of this strategy, Adu-Ababio et al. (2023) employ several alternative selection-on-observables approaches to estimate the VAT and CIT gaps in Zambia. Each method begins by estimating the relationship between risk-based tax audit adjustments and various predictors of tax noncompliance within a sample of audited taxpayers. This estimated relationship is then used to estimate the magnitude of tax underreporting for each unaudited taxpayer based on the values of the observed predictors for that taxpayer. The first method relies on a multiple linear regression specification, while two alternative methods are based on machine learning algorithms (neural networks and random forests).8 The results indicate a very large combined VAT and CIT gap of 47-52 percent of true tax liability, which is roughly in line with available top-down country-wide estimates.
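The regression variant of this strategy can be sketched in a few lines. The example below is a toy illustration under stated assumptions: a single hypothetical risk score stands in for the full set of predictors, the data are invented, and the helper `fit_ols` is not taken from any of the cited studies.

```python
def fit_ols(x, y):
    """Least-squares intercept and slope for a single predictor."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


# Hypothetical audited taxpayers: risk score and audit adjustment.
audited_risk = [1.0, 2.0, 3.0, 4.0]
audited_adj = [100.0, 200.0, 300.0, 400.0]

# Hypothetical unaudited taxpayers, who tend to have lower risk scores.
unaudited_risk = [0.5, 1.0, 1.5]

# Step 1: estimate the adjustment/risk relationship on audited returns.
intercept, slope = fit_ols(audited_risk, audited_adj)

# Step 2: predict underreporting for each unaudited return (floored at zero).
predicted = [max(0.0, intercept + slope * r) for r in unaudited_risk]
gap_unaudited = sum(predicted)

# Naive alternative: assume unaudited returns look like the average audited one.
naive_gap_unaudited = len(unaudited_risk) * sum(audited_adj) / len(audited_adj)

print(gap_unaudited, naive_gap_unaudited)
```

In this toy setting the naive extrapolation (750) is well above the risk-adjusted prediction (300), because the unaudited returns carry lower risk scores than the audited ones.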
3.3 Sample Selection Approaches

A potential challenge with selection-on-observables methods for tax gap estimation is that all relevant risk factors for audit selection may not be observed and recorded in the data set used for estimation. For instance, suppose that some taxpayers are selected for audit on the basis of leads or referrals from third parties, which reliably indicate a high potential for noncompliance. If this audit risk factor is not recorded in the estimation sample, a selection-on-observables method will tend to predict that noncompliance among observationally similar unaudited taxpayers (who do not possess this risk factor but otherwise have similar values for the recorded risk factors) will be comparable to that found for the audited taxpayers (who do possess this risk factor). Consequently, the tax gap would tend to be overestimated owing to the failure to account for this important unobserved risk factor.

Whereas selection-on-observables models only control for recorded risk factors when predicting noncompliance, sample selection models attempt to control also for unrecorded risk factors, such as the presence of leads or referrals in the above example. They do this by specifying equations that jointly model the likelihood of audit and the presence/magnitude of noncompliance as functions of various recorded risk factors along with a set of error terms. These error terms account for unobserved risk factors, and their impact is indicated by the estimated correlation(s) among them across the equations describing audit selection and noncompliance. If the estimated correlations between the error terms of these equations are positive, this indicates that an audited taxpayer will tend to be less compliant than an unaudited taxpayer with similar recorded characteristics (just as in the above example).
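The direction of the bias in the leads-and-referrals example can be seen with a small deterministic calculation. All counts and amounts below are hypothetical, and the setup is deliberately extreme: every taxpayer has identical recorded risk factors, and only taxpayers with an unrecorded lead are audited.

```python
# Hypothetical population in which all recorded risk factors are identical.
n_with_lead, n_without_lead = 100, 900
adj_with_lead, adj_without_lead = 100.0, 10.0  # true underreporting per taxpayer

# Audits target only the taxpayers with leads, but the lead is not recorded.
audited_mean_adjustment = adj_with_lead

# A selection-on-observables method sees no recorded difference between the
# groups, so it assigns the audited mean to every unaudited taxpayer.
predicted_unaudited_gap = n_without_lead * audited_mean_adjustment

# The true gap among unaudited taxpayers is much smaller.
true_unaudited_gap = n_without_lead * adj_without_lead

overstatement_factor = predicted_unaudited_gap / true_unaudited_gap
print(predicted_unaudited_gap, true_unaudited_gap, overstatement_factor)
```

Here the predicted gap among unaudited taxpayers (90,000) is ten times the true figure (9,000), purely because the factor driving audit selection is missing from the data.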
As an illustration of this approach, Udell (2021) estimates the VAT reporting gap in Latvia using a sample selection modeling framework developed by Dubin and McFadden (1984). The model includes equations that describe the likelihood of an audit, the type of audit to be undertaken (VAT-specific or a joint audit of VAT and other taxes), the likelihood of an adjustment to VAT liability as a result of the audit, and the magnitude of the adjustment when present. The model is estimated using a sample of audited and unaudited returns. The estimated coefficients and correlation terms are then used to predict the likelihood and magnitude of noncompliance among the unaudited returns in the sample.

8 See Alaimo Di Loro, P., D. Scacciatelli, and G. Tagliaferri (2023) for another example of a machine learning-based application of a selection-on-observables approach to bottom-up VAT gap estimation, which has been applied to Italian VAT data.

In principle, sample selection models seem like an attractive approach to controlling for unrecorded audit risk factors when estimating tax gaps. In practice, however, they have a high potential to produce misleading and unreliable estimates. Such models tend to be very sensitive to assumptions regarding the distributions of the error terms and the functional forms of the equations describing audit selection and compliance behavior; plausible alternative specifications will sometimes yield wildly different tax gap estimates. A possible solution is to rely on semiparametric versions of sample selection models, which rely on fewer modeling assumptions.
However, a semiparametric model for tax gap estimation will not be identified (i.e., unique estimates of the parameters of the model, which are required to develop a unique estimate of the tax gap, cannot be obtained) unless the equation(s) describing the audit selection process include at least one explanatory variable that has been excluded from the equation(s) that describe tax underreporting.9 In practice, it can be very difficult to incorporate a valid exclusion restriction of this sort owing to the risk-based nature of the audit selection process. In particular, audit selection activities tend to be driven precisely by those factors that are known to be good indicators of compliance risk. Such risk indicators therefore belong as explanatory variables in both the audit and noncompliance equations; they should not be excluded from the latter.

Given the high sensitivity of sample selection models to the choice of modeling assumptions and the lack of plausible exclusion restrictions, a sample selection approach to modeling the VAT reporting gap in Bulgaria was rejected.

9 See Honoré & Hu (2020) for a review of semiparametric estimation of sample selection models and the importance of exclusion restrictions. Traditional parametric models of sample selection can be estimated in the absence of any exclusion restrictions. However, the model estimates in that case are entirely reliant on the validity of the functional form and distributional assumptions that have been imposed; moreover, the estimation results can be very sensitive to plausible sets of alternative assumptions.

4 VAT Gap Estimation Methodology for Bulgaria

For this mission, a novel selection-on-observables estimation strategy was developed and implemented to estimate the Bulgarian VAT reporting gap in 2019. There is a rather lengthy lag between the time VAT returns are filed and when the audit process for those returns has been completed.
The year 2019 was chosen as it is the most recent year for which the audit process is largely complete.

Intuitively, the selected VAT gap estimation strategy involves the application of a set of balancing weights to the members of a population of audited taxpayers to mirror the relevant characteristics of the unaudited taxpayer population. The weighted sum of the observed VAT adjustments within the audit population then serves as an estimate of the VAT reporting gap within the population of unaudited taxpayers, while the unweighted sum of these VAT adjustments serves as an estimate of the gap within the population of audited taxpayers. The process that is employed for selecting these balancing weights is known as “entropy balancing.”

VAT audit selection criteria will tend to differ for taxpayers who claim refunds and those who report a balance due. In Bulgaria, for instance, the audit selection process for refund claimants relies partially on the assigned values of certain compliance risk scores, known as “automated periodic risk assessment” scores (APRAs). In contrast, no such risk scores are assigned in the case of taxpayers reporting a net tax balance due. To address underlying differences in the audit selection process, refund claimants and taxpayers reporting a non-negative net tax balance on an annual basis are analyzed separately. Henceforth, we shall respectively refer to these samples as the “refund sample” and the “balance-due sample”.

In contrast to audited VAT returns, it is reasonable to expect that a sizeable share of unaudited returns are at relatively low risk of noncompliance. In order to exploit this insight to improve tax gap estimation, a preliminary classification analysis was conducted to assign returns into low- and high-risk subgroups. Separate analyses were then conducted for each subgroup.
Ultimately, this resulted in four separate analysis subgroups: (1) low-risk members of the refund sample; (2) high-risk members of the refund sample; (3) low-risk members of the balance-due sample; and (4) high-risk members of the balance-due sample.

As described in Section 5, the data set used for VAT gap estimation allows us to control for many relevant VAT audit risk factors. Nevertheless, some portion of those taxpayers that were selected for audit were likely chosen based on unrecorded selection criteria (such as leads, referrals, or other reasonably reliable compliance risk indicators) that we are unable to control for directly. As discussed in Section 3.3, such members of the audit population are likely to be associated with higher levels of noncompliance than observationally similar unaudited taxpayers in our data sample (based solely on the recorded risk factors). To address this issue, we employ a strategy that excludes a portion of the most extreme audit adjustment cases (which are presumed to be attributable to taxpayers with unrecorded indicators of high compliance risk) from the analysis. As discussed in Section 4.5, this “outlier control strategy” can be calibrated to yield an overall bottom-up VAT gap estimate that is consistent with an independent estimate, such as one obtained through a top-down estimation approach. This calibration approach has the desirable feature of producing top-line gap estimates that match another trusted source (where one is available), while permitting a decomposition of the overall VAT gap along desired dimensions, such as sector, region, size, and refund/balance-due status.

In the case of Bulgaria, a top-line estimate of the VAT gap for 2019 is available based on a top-down estimation approach (European Commission et al., 2022). However, this estimate has received a low-confidence rating by the authors of the study, largely owing to its reliance on outdated information on the structure of intermediate consumption.
As discussed in Section 4.5, evidence from the estimation sample used for our bottom-up analysis indicates that the top-down estimate of the VAT gap may be too low, which highlights an important benefit of the bottom-up framework; namely, both the direct findings from the underlying VAT audit data and the preliminary predictions from bottom-up models serve as valuable tools for assessing the plausibility of top-down estimates, tools which heretofore have been seldom exploited. In cases where an alternative top-line estimate of the VAT gap is either unavailable or of doubtful quality, the bottom-up methodology described in this report can be calibrated over a plausible range of top-line values, so that the sensitivity of estimates of the composition of the gap to the top-line value can be explored.

As a reliability check on our methodology, we compare our findings with those of an alternative estimation strategy. The alternative approach involves directly matching each unaudited taxpayer in the relevant estimation sample to an audited taxpayer in that sample with a similar compliance risk profile. This risk profile is summarized by estimates of the likelihood that a given taxpayer would be selected for either a VAT-specific audit or a joint audit of VAT and one or more other taxes. Each unaudited taxpayer is assumed to have the same level (or, alternatively, rate) of noncompliance as the taxpayer’s matched audited counterpart. A more detailed discussion of our primary and alternative estimation methodologies is provided below.

4.1 Risk Group Classification

Taxpayers from the refund and balance-due samples were separately classified into either high- or low-risk categories using a logit model:

$$
\Pr(A_i) =
\begin{cases}
\dfrac{1}{1 + \exp(\beta' X_i)} & A_i = 0 \text{ (high risk)} \\[2ex]
\dfrac{\exp(\beta' X_i)}{1 + \exp(\beta' X_i)} & A_i = 1 \text{ (low risk).}
\end{cases}
$$

In the above expression, $i$ indexes taxpayers, $X_i$ is a set (vector) of covariates used to explain compliance risk, and $\beta$ represents the vector of coefficients that are to be estimated. The dependent variable $A_i$ is an indicator for whether a VAT audit adjustment is less than or equal to BGN 1,000 (low risk) or not (high risk).

This model is estimated using the audited taxpayers in the relevant sample (refund or balance due), and the estimated coefficients $\tilde{\beta}$ are then used to predict, for each member of the sample, the probability that the taxpayer is low-risk using the formula $\exp(\tilde{\beta}' X_i) / \left(1 + \exp(\tilde{\beta}' X_i)\right)$. Each taxpayer in the sample is then classified as either high- or low-risk according to whether the predicted probability exceeds 50 percent.

4.2 Entropy Balancing

Under the entropy balancing approach (Hainmueller, 2012), a maximum entropy reweighting scheme is employed to calibrate a set of weights for the audited taxpayers in the relevant sample (the specified risk subgroup of the refund or balance-due sample) that ensures balance with the unaudited taxpayers in that sample across a specified set of covariates (audit risk factors). In other words, the scheme ensures that the weighted mean value of each of the covariates is the same for the audited and unaudited taxpayers in the sample (using the balancing weights for the audited taxpayers and the original sample design weights for the unaudited taxpayers).
The balancing weights are obtained by minimizing the following objective function over the audit group observations:

$$
L = \sum_i w_i \ln(w_i)
$$

subject to a set of $R$ balancing conditions involving the covariates $X_i$:

$$
\frac{1}{N_u} \sum_i w_i X_{ri} = m_r, \qquad r = 1, \ldots, R,
$$

as well as the constraints that all weights ($w_i$) are non-negative and sum to the count of unaudited taxpayers in the population ($N_u$).10 The subscript $i$ indexes audited taxpayers in the sample, and $m_r$ represents the weighted sample mean for covariate $r$ among the unaudited taxpayers in the sample (based on the design weights specified in Section 5). In essence, this optimization program solves for a set of weights for the audited taxpayers in the sample that satisfies each of the specified balancing conditions while deviating as little as possible (in terms of the entropy metric) from a uniform set of weights.

To facilitate disaggregated estimates of the VAT reporting gap by sector, region, and size, indicators of these characteristics are included among the specified set of covariates, thereby ensuring that the weighted audit sample is representative of the unaudited taxpayer population along these dimensions.

The overall VAT reporting gap associated with the unaudited taxpayers from the relevant population (as represented by a risk subgroup of the refund sample or the balance-due sample) is readily estimated as the weighted sum of the VAT adjustments among the audited taxpayers in the sample (using the above-described balancing weights). The portion of the tax gap associated with the audited taxpayers is just as readily computed as the unweighted sum of these same audit results.
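To make the mechanics concrete, the following is a minimal sketch of entropy balancing on invented data. It is not the implementation used for this study: it assumes uniform base weights, solves the dual of the optimization problem by plain gradient descent (a choice made here for brevity), and, as described in footnote 10, first finds weights that sum to one and then scales them by the unaudited count. The function name `entropy_balance` and all inputs are hypothetical.

```python
import math


def entropy_balance(X, m, n_u, step=0.1, tol=1e-10, max_iter=100_000):
    """Balancing weights for audited taxpayers (illustrative sketch).

    X   : covariate rows for the audited taxpayers
    m   : target covariate means among the unaudited taxpayers
    n_u : (weighted) count of unaudited taxpayers
    """
    R = len(m)
    Z = [[row[r] - m[r] for r in range(R)] for row in X]  # centred covariates
    lam = [0.0] * R  # dual parameters (one per balancing condition)
    for _ in range(max_iter):
        scores = [sum(l * z for l, z in zip(lam, zi)) for zi in Z]
        top = max(scores)  # subtract the max for numerical stability
        expo = [math.exp(s - top) for s in scores]
        total = sum(expo)
        p = [e / total for e in expo]  # weights normalised to sum to one
        # Gradient of the dual = weighted covariate means minus targets.
        grad = [sum(pi * zi[r] for pi, zi in zip(p, Z)) for r in range(R)]
        if max(abs(g) for g in grad) < tol:
            break
        lam = [l - step * g for l, g in zip(lam, grad)]
    return [n_u * pi for pi in p]


# Hypothetical audited sample: [turnover band, sector indicator] and adjustments.
X_audited = [[1.0, 0.0], [2.0, 1.0], [3.0, 0.0], [4.0, 1.0], [5.0, 1.0]]
adjustments = [50.0, 0.0, 400.0, 1200.0, 3000.0]
m_unaudited = [2.5, 0.4]   # hypothetical weighted means among unaudited taxpayers
n_unaudited = 200.0        # hypothetical weighted count of unaudited taxpayers

w = entropy_balance(X_audited, m_unaudited, n_unaudited)
gap_unaudited = sum(wi * a for wi, a in zip(w, adjustments))  # weighted sum
gap_audited = sum(adjustments)                                # unweighted sum
print([round(wi, 2) for wi in w], round(gap_unaudited, 1), gap_audited)
```

Because the weights take an exponential form, they are strictly positive by construction, so the non-negativity constraint never binds. A production implementation would more likely use a Newton-type solver and would need to handle cases in which the target means cannot be reproduced by any reweighting of the audited sample.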
To compute the portions of the estimated tax gap associated with different segments of a population of unaudited taxpayers, such as by sector, we assume that the (weighted) subsample of unaudited taxpayers within a given segment has the same rate of noncompliance as the (weighted) subsample of audited taxpayers in that segment.11 The rate of noncompliance within a segment of the refund sample is defined as the ratio of the aggregate (weighted) refund overstatement to the aggregate (weighted) reported refund amount within that segment. In the case of the balance-due sample, the noncompliance rate is defined as the ratio of the aggregate (weighted) VAT understatement to aggregate (weighted) net reported VAT balance due within the segment.12

10 In practice, the optimization algorithm solves for a set of weights that are constrained to sum to one, and the weights are then multiplied by the (weighted) count of observations in the unaudited taxpayer sample.
11 Note that the equivalence between the aggregate (weighted) reported net tax balance between the audited and unaudited taxpayer samples is a feature of the model design (i.e., the balancing weights ensure that this is the case). However, this equivalence will not typically hold within subgroups of the populations, which is what motivates our approach of assuming common noncompliance rates (rather than common absolute levels of noncompliance) within the subgroups.
12 The decomposition of the tax gap by segment is carried out using the combined subsamples of low-risk and high-risk taxpayer groups.

To decompose the overall tax gap estimate for a specified population (associated with either the refund or balance-due sample), we therefore begin by computing the rate of noncompliance among the audited taxpayers within each segment.
We then multiply this rate by the aggregate reported net tax balance (i.e., the aggregate (weighted) reported refund in the case of the refund sample, or the aggregate (weighted) reported net VAT balance due in the case of the balance-due sample) among the unaudited taxpayers in the segment. This serves as our preliminary estimate of the tax gap for the unaudited taxpayers within the segment. However, the sum of the preliminary gap estimates for each segment will tend to deviate to some degree from our estimate of the overall tax gap for this population. To address this issue, an adjustment factor is applied to the preliminary tax gap estimate for each segment to ensure that the final estimates are consistent with the overall tax gap estimate.[13]

4.3 Statistical Matching

As an alternative to our entropy balancing methodology for VAT gap estimation, we have experimented with a statistical matching approach. Under this approach, each unaudited taxpayer within the relevant estimation sample (the specified risk subgroup within the refund or balance-due sample) is matched to an audited taxpayer from that sample with a similar risk profile. This risk profile is based on the predicted likelihood that a given taxpayer would be selected for two alternative types of audit: (1) a VAT-specific audit; and (2) a joint audit of VAT and one or more other taxes.[14] A multinomial logit model is used to construct these two elements of the audit risk profile for each taxpayer in the sample:

$$\Pr(A_i = g) = \frac{\exp(\beta^{g\prime} X_i)}{\sum_{h=0}^{2} \exp(\beta^{h\prime} X_i)}, \qquad \text{where } g = \begin{cases} 0 & \text{no audit} \\ 1 & \text{VAT-specific audit} \\ 2 & \text{joint audit.} \end{cases}$$

In the above expression, “$i$” indexes taxpayers, $X_i$ is a set (vector) of covariates used to explain audit selection, and $\beta^g$ represents the vector of coefficients associated with subgroup $g$ that are to be estimated.[15]

[13] The adjustment factor scales the gap estimates for each segment by the ratio of the overall gap estimate to the sum of the gap estimates for the individual segments.
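The segment-level decomposition from Section 4.2, including the adjustment factor described in footnote 13, can be sketched as follows. The function name and the toy inputs are hypothetical and serve only to illustrate the arithmetic.

```python
def decompose_gap(segments, overall_gap):
    """Illustrative rate-based decomposition of an overall gap estimate.

    segments maps a segment name to a tuple of weighted aggregates:
      (audited adjustment, audited reported balance, unaudited reported balance)
    """
    preliminary = {}
    for name, (adj_audited, reported_audited, reported_unaudited) in segments.items():
        # Noncompliance rate among audited taxpayers in the segment.
        rate = adj_audited / reported_audited
        # Apply that rate to the unaudited taxpayers' reported balance.
        preliminary[name] = rate * reported_unaudited
    # Scale so that segment estimates add up to the overall estimate.
    factor = overall_gap / sum(preliminary.values())
    return {name: gap * factor for name, gap in preliminary.items()}

# Toy example: preliminary estimates are 20 and 10, and the overall estimate is 33,
# so each segment is scaled by 33 / 30 = 1.1.
example = decompose_gap({"A": (10.0, 100.0, 200.0), "B": (5.0, 200.0, 400.0)}, 33.0)
```

The proportional scaling preserves each segment's relative share of the preliminary total while forcing consistency with the top-line estimate.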
[14] Accounting for the potential for each type of audit was deemed important, because selection for a joint audit of VAT and other taxes might involve different criteria than selection for a VAT-specific audit.

[15] The vector of coefficients $\beta^0$ is normalized to the zero vector.

Each taxpayer “$i$” in the relevant estimation sample is assigned a risk profile summarized by the estimated log-odds of each type of audit: $\{\tilde{\beta}^{1\prime} X_i,\ \tilde{\beta}^{2\prime} X_i\}$, where the tildes denote the estimated values of the parameters.[16] For each unaudited taxpayer, an audited taxpayer with similar values of each of the two log-odds measures is selected (with replacement) as a match. More formally, the Mahalanobis distance metric is computed between an unaudited taxpayer “$i$” and each audited taxpayer “$j$”:

$$D_{ij} = (X_j - X_i)^{\prime}\, \tilde{\beta}\, \tilde{V}^{-1} \tilde{\beta}^{\prime}\, (X_j - X_i),$$

where $\tilde{V}$ represents the sample covariance matrix involving the two log-odds statistics among the subsample of unaudited taxpayers, and $\tilde{\beta}$ collects the full set of estimated multinomial logit model coefficients, $\tilde{\beta} = [\tilde{\beta}^{1}\ \ \tilde{\beta}^{2}]$. The audited taxpayer “$j^*$” with the smallest value based on this distance metric (i.e., the “nearest neighbor”) is then chosen as the match for taxpayer “$i$”.

We experiment with two alternative approaches to estimating the tax gap under this framework. Under the first approach, we assign to each unaudited taxpayer the VAT audit adjustment that was made to the return of its matched counterpart. The tax gap among unaudited taxpayers is then readily computed by taking the weighted sum of their assigned VAT adjustments within the estimation sample. A decomposition of the overall estimated VAT reporting gap is also straightforward under this approach: one simply takes a weighted sum of the assigned VAT adjustments within each of the population segments of interest.
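A minimal sketch of the nearest-neighbor step and the first (matching-on-level) estimate is given below. It assumes that the two estimated log-odds values have already been computed for every taxpayer, uses the unweighted sample covariance among unaudited taxpayers (whether design weights enter the covariance is not stated in the text), and relies on hypothetical function names.

```python
def nearest_neighbor_matches(unaudited_scores, audited_scores):
    """Match each unaudited taxpayer to the closest audited taxpayer.

    Each score is a pair (log-odds of VAT-specific audit, log-odds of joint audit).
    Distances use the Mahalanobis metric based on the covariance of the two
    log-odds measures among unaudited taxpayers. Matching is with replacement.
    """
    n = len(unaudited_scores)
    m1 = sum(z[0] for z in unaudited_scores) / n
    m2 = sum(z[1] for z in unaudited_scores) / n
    v11 = sum((z[0] - m1) ** 2 for z in unaudited_scores) / (n - 1)
    v22 = sum((z[1] - m2) ** 2 for z in unaudited_scores) / (n - 1)
    v12 = sum((z[0] - m1) * (z[1] - m2) for z in unaudited_scores) / (n - 1)
    det = v11 * v22 - v12 ** 2              # must be non-zero for the inverse to exist
    a, b, c = v22 / det, -v12 / det, v11 / det   # entries of the inverse covariance

    def distance(zj, zi):
        d1, d2 = zj[0] - zi[0], zj[1] - zi[1]
        return a * d1 * d1 + 2 * b * d1 * d2 + c * d2 * d2

    return [min(range(len(audited_scores)),
                key=lambda j: distance(audited_scores[j], zi))
            for zi in unaudited_scores]

def matched_level_gap(matches, audited_adjustments, design_weights):
    """First approach: weighted sum of the adjustments assigned from the matches."""
    return sum(w * audited_adjustments[j] for w, j in zip(design_weights, matches))
```

Because the distance depends on the covariates only through the two log-odds measures, computing it from the pair of scores is equivalent to the expression for $D_{ij}$ above.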
Under the second approach, we assume that the weighted sample of unaudited taxpayers has the same aggregate noncompliance rate (rather than the same level of noncompliance) as their matched sample of audited counterparts. Therefore, to estimate the VAT reporting gap among the unaudited taxpayers, we begin by computing the rate of noncompliance within the matched audit sample. We then apply this rate to the aggregate (weighted) net tax balance reported by the unaudited taxpayers in the sample (i.e., the aggregate net reported refund in the case of the refund sample, or the aggregate net reported VAT in the case of the balance-due sample) to estimate their share of the tax gap. The corresponding share of the tax gap attributable to audited taxpayers is estimated as the sum of the VAT adjustments among the (unmatched) subsample of audited taxpayers. The same approach is employed to decompose the overall estimate of the tax gap by segments of the population, such as by sector, except that an adjustment factor (comparable to the one described in Section 4.2 for the entropy balancing methodology) is employed to ensure that the sum of the estimated gaps for the individual segments is consistent with the estimated gap for the overall population.

[16] Taxpayers were matched on the log-odds of the vector of propensity scores rather than the propensity scores themselves, to account for the choice-based nature of the sample. See Heckman and Todd (2009) for a discussion of this issue.

4.4 Variable Selection

The VAT gap estimation sample that was made available for this mission by the NRA includes a rich set of audit and compliance risk factors. In order to reduce this set to a more parsimonious collection of the most relevant determinants, a variable selection process is employed. Similar procedures are employed to select covariates for risk classification, entropy balancing, and statistical matching.
In the case of risk classification, the logit model described in Section 4.1 is initially estimated using all potential risk factors. The risk factors that are found to be individually statistically insignificant at the 20 percent level (p-value >= 0.20) are then excluded from the model. Each of the excluded risk factors is then re-introduced to the model, one at a time, and re-tested for significance. The risk factor is retained if its associated p-value is less than 10 percent. In the final step, the model is estimated using all retained variables from the previous steps, and each variable is again tested for statistical significance. The variables with p-values of less than 20 percent are retained for the final model.

For the entropy balancing and statistical matching applications, the multinomial model of audit selection described in Section 4.3 is initially estimated using all potential risk factors. Since it is desirable to ensure balance on the categories to be used for segmentation (NACE sector, region, and size) as well as the net reported tax balance (reported refund amount in the case of the refund sample and VAT balance owed in the case of the balance-due sample), the relevant indicators and measures are automatically retained in the model. The remaining risk factors that are found to be individually statistically insignificant at the 20 percent level (p-value >= 0.20) are excluded from the model. Each of the excluded risk factors is then re-introduced to the model, one at a time, and re-tested for significance. The risk factor is retained if its associated p-value is less than 20 percent. In the final step, the model is estimated using all retained variables from the previous steps, and each variable (with the exception of those that are automatically retained) is again tested for statistical significance. The variables with p-values of less than 20 percent are then retained for the final model.
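The three-step selection procedure can be sketched generically as follows. The sketch assumes a user-supplied function `fit_pvalues` (hypothetical) that estimates the relevant logit or multinomial logit model on a list of variables and returns a p-value for each; it also assumes that an excluded factor is re-tested in a model containing the automatically retained and first-pass variables, a detail the text does not spell out.

```python
def select_variables(candidates, fit_pvalues, forced=(),
                     drop_at=0.20, reenter_below=0.10, final_below=0.20):
    """Illustrative three-step variable selection.

    candidates    : risk factors subject to testing
    fit_pvalues   : callable(list of variables) -> {variable: p-value}
    forced        : variables that are automatically retained
    reenter_below : 0.10 for risk classification, 0.20 for the multinomial model
    """
    forced = list(forced)
    candidates = list(candidates)

    # Step 1: estimate with all factors; set aside the insignificant ones.
    p_all = fit_pvalues(forced + candidates)
    kept = [v for v in candidates if p_all[v] < drop_at]
    dropped = [v for v in candidates if p_all[v] >= drop_at]

    # Step 2: re-introduce each excluded factor, one at a time, and re-test it.
    readmitted = []
    for v in dropped:
        p_try = fit_pvalues(forced + kept + [v])
        if p_try[v] < reenter_below:
            readmitted.append(v)

    # Step 3: re-estimate with all retained variables and test once more.
    p_final = fit_pvalues(forced + kept + readmitted)
    return forced + [v for v in kept + readmitted if p_final[v] < final_below]
```

For the multinomial applications, `fit_pvalues` would return the p-value of the joint test across both audit types in the first steps, with the segmentation indicators and the net reported tax balance passed through `forced`.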
In the case of the entropy balancing application, the relevant test is for the joint significance of the coefficients of the variable for each audit type (VAT-specific and joint), since it is desirable to achieve balance with respect to the variable if it is a relevant predictor of either type of examination. In the case of the statistical matching application, the relevant test in the final step is for the individual significance of the coefficient for a specific type of audit.[17] If the p-value is greater than or equal to 20 percent, that coefficient is restricted to zero in the analysis, and otherwise it is freely estimated by the model. For this application, then, a potential risk factor might be included in the estimation of the log-odds for one type of audit, but not the other (which makes sense, since the selection processes for VAT-only audits and joint audits are likely to depend on different criteria). Since taxpayers are matched on the basis of their values for both log-odds measures, it is desirable to separately select the most important determinants associated with each measure. The variable selection results for risk classification and entropy balancing models are presented in Appendix A.

4.5 Outlier Control Strategy

Although our entropy balancing and statistical matching approaches control for all relevant recorded audit and compliance risk factors, they do not account for unrecorded risk factors, such as leads, third-party referrals, and other drivers of audit selection that are not present in our VAT gap estimation sample. Consequently, the subset of taxpayers that were selected for audit on the basis of such unrecorded risk factors are not suitable candidates for the balancing or matching analyses.
As discussed in Section 3.3, such taxpayers will tend to be associated with more extreme levels of noncompliance than observationally similar taxpayers that were not selected for examination (i.e., who do not possess the unrecorded risk factor but otherwise have similar values for the recorded factors). Consequently, the inclusion of these taxpayers would result in a potentially substantial over-estimation of the VAT reporting gap.

[17] In the previous steps, a joint-significance test of both coefficients of the variable is carried out, because it is desirable to retain the variable in the model if it is a relevant predictor of either type of audit. In the last step, the testing is refined in order to assess whether the variable belongs in only one specific equation or in both equations.

Since the subset of taxpayers that were selected for audit on the basis of unrecorded risk factors cannot be directly observed in our data sample, we must rely on an indirect (and somewhat imprecise) method to eliminate them from the analysis. Based on the assumption that such taxpayers will tend to be concentrated among the more extreme audit adjustments observed in the estimation samples, we exclude a portion of the more extreme cases from our analysis. We refer to this approach as an “outlier control strategy”. Under this strategy, we begin by specifying a cut-off threshold for VAT adjustments. We then exclude observations in our sample with VAT adjustments exceeding this threshold value from the balancing and statistical matching analyses. Although such observations therefore do not contribute to the estimation of the portion of the VAT reporting gap that is attributable to unaudited taxpayers, they are accounted for in the estimated portion of the gap associated with the audit population.

The key issue for the outlier control strategy is how to select the cut-off threshold.
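Before turning to the choice of threshold, the accounting rule just described can be sketched in a few lines. The function name and the toy figures are hypothetical; the point is only that excluded cases leave the pool used for balancing or matching but still count toward the audited portion of the gap.

```python
def apply_outlier_control(audit_adjustments, threshold):
    """Illustrative outlier control for a list of audit VAT adjustments.

    Returns (donor_pool, audited_gap):
      donor_pool  : adjustments at or below the threshold, eligible for the
                    balancing or matching analyses
      audited_gap : unweighted sum over ALL audited cases, including the
                    excluded extreme ones
    """
    donor_pool = [a for a in audit_adjustments if a <= threshold]
    audited_gap = sum(audit_adjustments)
    return donor_pool, audited_gap

# Toy example: the 10,000 case is dropped from the donor pool only.
pool, audited_gap = apply_outlier_control([0.0, 50.0, 200.0, 10000.0], threshold=1000.0)
```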
Our baseline approach is to calibrate our cut-off rule so that it yields an overall estimate of the VAT gap that is approximately equal to the official 2019 top-down VAT gap estimate of BGN 1,185 million (European Commission et al., 2023, Table 11, p. 54). To achieve this objective, we first divide our overall estimation data set into the four previously described subsamples: (1) low-risk members of the refund sample; (2) high-risk members of the refund sample; (3) low-risk members of the balance-due sample; and (4) high-risk members of the balance-due sample. Next, for each subsample, we specify a common cut-off percentile based on the subset of audit cases with a positive VAT adjustment.[18] We then apply our entropy balancing analysis and compute an estimate of the overall VAT gap based on that cut-off rule. If the estimated VAT gap exceeds (falls short of) BGN 1,185 million, we lower (raise) the cut-off percentile and try again. This process continues until a cut-off percentile is found that yields an overall VAT gap estimate of approximately BGN 1,185 million.

Application of this procedure yields a cut-off percentile of 66 percent. In other words, roughly one-third of the most extreme positive VAT adjustment cases are excluded from the process used to predict the VAT gap associated with unaudited VAT returns under the baseline analysis.

[18] Approximately 30% of all audited returns in our VAT gap estimation data sample had no adjustment to VAT liability.

During this mission, we held extensive discussions with the NRA regarding the audit selection process for VAT returns and the types of risk indicators that the Agency relies upon when selecting returns. As described below in Section 5, the NRA was able to provide us with a rich estimation data set that includes many of the relevant indicators.
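The calibration of the cut-off percentile described above (lower the percentile when the estimate is too high, raise it when it is too low) can be sketched as a bisection search. This is a systematic stand-in for the trial-and-error process, not the procedure actually run for this report. It assumes a hypothetical callable `estimate_gap(percentile)` that reruns the full analysis and is non-decreasing in the percentile, and it uses a nearest-rank percentile convention, which the text does not specify.

```python
import math

def percentile_threshold(adjustments, percentile):
    """Threshold at the given percentile of POSITIVE adjustments (nearest rank)."""
    positive = sorted(a for a in adjustments if a > 0)
    rank = max(1, math.ceil(percentile / 100.0 * len(positive)))
    return positive[rank - 1]

def calibrate_cutoff(estimate_gap, target, lo=0.0, hi=100.0, rel_tol=0.005, max_iter=60):
    """Bisection search for the percentile whose gap estimate is close to target."""
    pct, gap = lo, estimate_gap(lo)
    for _ in range(max_iter):
        pct = (lo + hi) / 2.0
        gap = estimate_gap(pct)
        if abs(gap - target) <= rel_tol * target:
            break
        if gap > target:
            hi = pct      # estimate too high: lower the cut-off percentile
        else:
            lo = pct      # estimate too low: raise the cut-off percentile
    return pct, gap

# Toy stand-in for the full estimation pipeline (purely illustrative numbers).
toy_pipeline = lambda pct: 500.0 + 10.4 * pct
cutoff, gap = calibrate_cutoff(toy_pipeline, target=1185.0)
```

In practice each call to `estimate_gap` would apply the percentile within each of the four subsamples, rerun entropy balancing, and add the audited portion of the gap.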
It is therefore somewhat surprising that it is necessary to exclude roughly one-third of all positive VAT adjustment cases to obtain a top-line VAT gap estimate that is comparable to the official European Commission top-down estimate. Given that we are able to control for a wide range of risk factors used for VAT audit selection, we had anticipated that we would find a smaller share of extreme audit outcomes that would appear to be attributable to taxpayers who were selected on the basis of unrecorded risk factors. One possible explanation for this anomaly is that the official top-down VAT gap estimate for 2019 may be too low.

Another indication that the official top-down VAT gap estimate may be biased downward is that completed audits of returns filed in 2019 are associated with an aggregate VAT adjustment of BGN 461.4 million, or 38.9 percent of the entire estimated VAT gap for that year. To put this finding in context, it is important to recognize that the overall VAT audit rate for 2019 returns was equal to 2.36 percent. The audit rate for large firms was slightly lower than this average (2.35 percent), while the rate for medium-sized firms was slightly higher (3.04 percent). While it is reasonable to assume that the NRA has a rigorous and effective audit selection and examination process, an estimate that suggests the Agency was able to uncover 38.9 percent of the VAT gap through audits of only 2.36 percent of the population seems rather optimistic.[19]

[19] As a practical matter, audit-based VAT adjustments to tax returns sometimes overstate the degree of true noncompliance. For instance, some adjustments are later reversed or reduced through dispute processes (although it should be kept in mind that the final resolution of a dispute is not always a superior measure of the true level of tax compliance; also, undetected noncompliance is potentially a more pervasive issue than excessive audit adjustments).
It should also be recognized that top-down VAT gap estimates are meant to account for forms of noncompliance that are not captured by audits, such as the portion of the gap that is attributable to taxpayers who fail to comply with their registration requirements (registration gap) and the portion that is attributable to taxpayers who report their tax liability on their returns but fail to fully pay the balance due (payment gap). Even allowing for some degree of excess in VAT adjustments, then, the share of valid adjustments resulting from audits may actually exceed 38.9% of that portion of the official overall estimate of the VAT gap that is attributable to misreporting behavior.

Another concern with the official top-down VAT gap measures for Bulgaria is that they show a remarkably steep decline over the 2016-2019 period, from 12.7 percent of VTTL to 9.3 percent. Even allowing for relevant improvements in VAT legislation, tax administration, information sharing, and technology over this period, the inference that there was such a rapid and dramatic improvement in voluntary compliance with the VAT seems rather optimistic. Changes over time in the overall level of voluntary tax compliance are normally rather gradual unless there is a sudden and dramatic change in the tax system, its administration, or the economy at large. Although policy-makers and researchers are generally advised to put more stock in the trends revealed by top-down VAT gap estimates than the levels (see, for instance, Hutton, 2017), neither level nor trend estimates should be used to draw firm conclusions without first making reasonable attempts to substantiate the findings, especially when the results appear anomalous or counterintuitive.[20]

The above discussion illustrates an important benefit of the bottom-up evaluation approach: it serves as a useful tool for assessing the validity of top-down tax gap estimates, albeit one that has been largely overlooked with regard to the VAT gap.
The other major advantage of a bottom-up evaluation approach is that it permits a decomposition of the overall VAT reporting gap, thereby providing a much more detailed portrait of the compliance landscape. To assess how sensitive this portrait is to the specified cut-off threshold employed under our outlier control strategy, we have created an alternative set of VAT reporting gap estimates using a higher threshold than the 66th percentile of positive adjustments that was imposed for our baseline estimates. Under this alternative, we employ a cut-off percentile of 81.2 percent, which was chosen to yield an overall VAT reporting gap target estimate of approximately BGN 1,846 million (the actual estimate is BGN 1,839 million). At this target value, the aggregate revenue from audits of 2019 VAT returns represents approximately 25 percent of the tax gap rather than the more optimistic value of 38.9 percent implied by the official top-down estimate for this year (and used as the source for our baseline analysis).

[20] The case for putting greater stock in trend estimates than level estimates rests on the assumption that the direction and degree of bias in annual top-down VAT gap estimates will tend to remain fairly constant. While this notion might have some validity in certain contexts, it is doubtful that it generally describes the state of affairs across a diverse set of countries. Nor, as far as I am aware, has the validity of this assumption been subjected to rigorous empirical examination.

5 Sampling Design and Data Summary

Our bottom-up estimation strategy requires a representative sample of audited and unaudited tax returns for 2019. The selected sampling design was developed based on the following objectives:

• Oversample audited returns – audited returns reflect a small fraction of the population, but play a crucial role in tax gap estimation.
• Oversample medium and large firms – the VAT population includes relatively few medium and large firms, so it is important to oversample them when attempting to decompose the VAT reporting gap by firm size.
• Include a sufficiently large sample of unaudited returns to permit a decomposition of the VAT reporting gap along various dimensions, including refund/balance-due status, risk level, sector, region, and size.

To achieve these objectives, the specified sampling design for this mission involved 100 percent sampling of all audited returns, 100 percent sampling of all unaudited returns filed by medium and large firms,[21] and a 3 percent random sample of returns filed by unaudited small businesses. The actual sample included a somewhat larger random sample of the latter group (3.89 percent). To make the estimation sample representative of the overall population of VAT returns filed for 2019, a set of design weights was constructed based on the reciprocal of the sampling probabilities. Specifically, unaudited small businesses were assigned a design weight of (1/0.0389) = 25.7609, while all other taxpayers were assigned a design weight of 1.

Table 1 summarizes the composition of the sample and population of returns filed in 2019. The filer population includes all firms that filed at least one monthly return for 2019, and the audit population includes all taxpayers who experienced an audit covering at least one monthly period for that year. The sample includes 22,851 taxpayers: 7,474 audited taxpayers and 15,377 unaudited taxpayers, the latter drawn from a population of 309,786 unaudited taxpayers.

[21] The designation of firm size was based on the territorial directorate where the taxpayer is registered (Large Taxpayers Directorate, Medium Taxpayers Directorate, or another directorate). This measure differs from the European Union designation of firm size.
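As a rough consistency check, the sketch below applies the design weights to the sample counts reported in Table 1 and recomputes the audit rates quoted in Section 4.5. The counts and the weight of 25.7609 are taken from the text as stated; the variable and function names are hypothetical.

```python
# (audited, not audited) counts by firm size, as reported in Table 1.
SAMPLE = {"small": (7373, 11906), "medium": (75, 2390), "large": (26, 1081)}
POPULATION = {"small": (7373, 306315), "medium": (75, 2390), "large": (26, 1081)}

SMALL_UNAUDITED_WEIGHT = 25.7609   # design weight as stated in the text

def design_weight(size, audited):
    """Reciprocal of the sampling probability; 1 for fully sampled groups."""
    if size == "small" and not audited:
        return SMALL_UNAUDITED_WEIGHT
    return 1.0

def weighted_unaudited_count(sample):
    """Design-weighted count of unaudited taxpayers in the sample."""
    return sum(design_weight(size, False) * unaudited
               for size, (_, unaudited) in sample.items())

def audit_rate(audited, unaudited):
    """Audit coverage rate in percent."""
    return 100.0 * audited / (audited + unaudited)

total_audited = sum(a for a, _ in POPULATION.values())
total_unaudited = sum(u for _, u in POPULATION.values())
overall_rate = audit_rate(total_audited, total_unaudited)   # about 2.36 percent
medium_rate = audit_rate(*POPULATION["medium"])             # about 3.04 percent
large_rate = audit_rate(*POPULATION["large"])               # about 2.35 percent
```

The design-weighted unaudited count comes out within a fraction of one percent of the population count of 309,786, as one would expect when weights are the reciprocal of a rounded sampling rate.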
Table 1: Sample and Population Counts by Audit Status and Firm Size

                  Small                   Medium                  Large                   All
                  Audited   Not Audited   Audited   Not Audited   Audited   Not Audited   Audited   Not Audited
Sample Size       7,373     11,906        75        2,390         26        1,081         7,474     15,377
Population Size   7,373     306,315       75        2,390         26        1,081         7,474     309,786

The data set includes detailed information regarding each taxpayer and each monthly return filed, including:

• Line-item details from each monthly return that was filed for 2018 and 2019;
• Indicators for firm size;
• Indicators for organizational form;
• Indicators for the region where registered (NUTS2 Code);
• Indicators for industry sector (NACE Code);
• Number of years in business;
• Number of employees;
• Annual reported net profit in 2019;
• Indicator for a history of late-filing; and
• A set of risk indicators that is relied upon for VAT audit selection.

A set of 126 distinct potential explanatory variables for the risk classification, entropy balancing, and statistical matching procedures was constructed from these variables, reflecting annual taxpayer characteristics and risk factors, year-to-year and month-to-month variations in certain reported items, and various accounting ratios. The full set of potential explanatory variables is described in Appendix A.

For the subsample of audited taxpayers, the data set includes additional key information regarding the audits, including:

• An indicator for whether the audit involved only VAT or if it was a joint audit that covered other taxes as well;
• The period covered by the audit; and
• The magnitude of the VAT adjustment resulting from the audit.

Table 2 displays the distribution of audits by the number of months in 2019 that were subject to examination. While many audits cover a substantial share of the monthly reports filed during the year, some cover only one or a few months. The implications of this finding for VAT gap estimation are explored in Section 6.2.
Table 2: Distribution of Audits by Coverage Period

Audit length     Number    Percentage
1-3 months       1,756     23.5
4-6 months       1,019     13.6
7-9 months       734       9.8
10-12 months     3,965     53.1
Total            7,474     100.0

Table 3 summarizes the audit results by annual refund/balance-due status. Refund claimants are subject to a higher overall audit rate than those who report a balance due (3.31 percent vs. 2.19 percent). At the same time, audits of refund claimants are associated with a higher no-change rate (35.8 percent vs. 29.1 percent). The mean VAT adjustment among audited taxpayers that experience a positive adjustment is much higher among refund claimants (BGN 131,075) than those who report a balance due (BGN 78,861). However, this large difference in mean adjustment values is attributable to a single outlier with an extremely large VAT adjustment.[22] The median VAT adjustment among audited returns with a positive adjustment is actually somewhat smaller among refund claimants (BGN 13,515) than those reporting a balance due (BGN 14,314).

[22] The mean VAT adjustment among refund claimants is equal to BGN 83,329 when this one outlier case is excluded.

Table 3: Audit Results by Refund/Balance-Due Status

                                                       Refund      Balance Due*
Number of returns audited                              1,537       5,937
(Audit coverage rate)                                  (3.31%)     (2.19%)
Percentage of audits resulting in no VAT adjustment    35.8        29.1
Mean VAT adjustment when positive (BGN)                131,075     78,861
(Median VAT adjustment when positive (BGN))            (13,513)    (14,314)

*Includes taxpayers with a zero net reported balance due.

6 Results

This section presents the results of the analysis, including our baseline estimates and our various sensitivity analyses.

6.1 Baseline Results

Our baseline estimation approach for the Bulgarian VAT reporting gap in 2019 was calibrated to approximately match the official top-line VAT gap estimate of BGN 1,185 million for that year. As shown in Table 4, our baseline estimate of BGN 1,189 million is a close match.
Overall, the estimated gap represents 10.4 percent of aggregate net reported VAT liability – a measure that we refer to as the “noncompliance rate”. A commonly used alternative measure of overall VAT noncompliance is the ratio of the estimated VAT gap to aggregate VAT true tax liability (VTTL). For this measure, the estimated rate is 9.4 percent.[23] In this report, we rely on the former noncompliance rate measure, because it can be applied equally well to the subgroup of the population that claims a net refund on an annual basis and the subgroup that reports a non-negative tax balance. As shown in Table 4, the baseline estimate indicates that the noncompliance rate among taxpayers reporting a balance due (3.17 percent) is substantially larger than the rate for refund claimants (1.46 percent). The latter group accounts for approximately 22 percent of the overall gap.

[23] These are based on an estimated aggregate net VAT reported in our weighted estimation sample of BGN 11.445 billion, which is slightly higher than the corresponding estimate in the official VAT gap report by the EC for 2019 (BGN 11.061 billion). The corresponding statistics based on the lower estimate of aggregate reported net VAT liability used in that report are 10.7 percent and 9.7 percent, respectively.

The breakdown in Table 4 also yields another important insight into the nature of compliance under a system that taxes on the basis of value added. Observe that the overall rate of noncompliance is much higher than the corresponding rates for the refund and balance-due groups. This is because the base against which misreporting is compared when the groups are combined is equal to the difference between the bases of the latter and former groups.
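The arithmetic behind these rates can be checked with a short sketch. It uses the gap amounts from Table 4 and the aggregate net reported VAT figures from footnote 23, and it assumes that VTTL is approximated by net reported VAT plus the estimated gap, which reproduces the percentages quoted above. Function and variable names are hypothetical.

```python
def noncompliance_rates(gap, net_reported):
    """Return (gap / net reported VAT, gap / VTTL), with VTTL assumed to be
    net reported VAT plus the estimated gap."""
    return gap / net_reported, gap / (net_reported + gap)

TOTAL_GAP = 1189.0                      # BGN millions, Table 4
rate_sample, vttl_sample = noncompliance_rates(TOTAL_GAP, 11445.0)   # about 10.4% and 9.4%
rate_ec, vttl_ec = noncompliance_rates(TOTAL_GAP, 11061.0)           # about 10.7% and 9.7%

# Bases implied by the (rounded) group-level rates in Table 4.
refund_base = 260.9 / 0.0146            # aggregate reported refunds
balance_base = 928.1 / 0.0317           # aggregate reported balance due
net_base = balance_base - refund_base   # base for the combined rate

refund_share = 260.9 / TOTAL_GAP        # about 22 percent of the overall gap
```

The implied net base is far smaller than either group's own base, which is why the combined rate of roughly 10.4 percent exceeds both the 1.46 percent and 3.17 percent group rates.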
Thus, while refund claimants as a group overstate the refunds to which they are entitled by a modest degree (1.46 percent), and those reporting a balance due as a group understate how much they owe to a somewhat larger degree (3.17 percent), the impact of their misreporting with respect to overall net tax revenue (10.39 percent) is quite substantial.

Table 4: Baseline Top-Line VAT Reporting Gap Estimates by Refund/Balance-Due Status* (BGN Millions)

                        Refund     Balance Due**    Total
VAT Reporting Gap       260.9      928.1            1,189
Noncompliance Rate      1.46%      3.17%            10.39%

*Noncompliance rate represents the ratio of the VAT reporting gap to the aggregate net reported refund or balance due amount.
**Includes taxpayers with a zero net reported balance due.

Table 5 digs a bit deeper by breaking these figures down by taxpayer size group. Overall, the estimated rate of noncompliance is much higher for small taxpayers (12.65 percent) than medium (2.86 percent) or large taxpayers (0.72 percent). This general pattern does not hold when one focuses on refund claimants, however. Within this subgroup, the estimated rate of noncompliance is substantially higher for medium taxpayers (4.77 percent) than small ones (1.74 percent), although large taxpayers do continue to have the lowest rate (0.12 percent).

Table 5: Breakdown of Baseline VAT Reporting Gap Estimates by Refund/Balance-Due Status and Taxpayer Size, Amounts in BGN Millions (Noncompliance Rate in Parentheses)

          Refund     Balance Due    Total
Small     240.7      913.6          1,154.3
          (1.74%)    (3.98%)        (12.65%)
Medium    15.8       8.1            24.0
          (4.77%)    (0.70%)        (2.86%)
Large     4.3        6.4            10.7
          (0.12%)    (0.12%)        (0.72%)

Table 6 breaks down the VAT reporting gap estimates by NACE sector. The overall estimated rates of noncompliance vary substantially across the different sectors. Extremely high rates are observed for real estate (107.83 percent), professional, scientific, and technical activities (52.15 percent), and taxpayers with missing NACE information (146.09 percent).
To some degree, these findings reflect the fact that refunds claimed by the refund group offset a very substantial share of the revenue received from the balance-due group, resulting in a small base for computation of the noncompliance rate. In the case of the missing NACE code category, refund claims slightly more than offset the revenue received from the balance-due group. While not as extreme as the results for these three industry sectors, the overall estimated rate of noncompliance is also rather high for the combined service category for NACE codes K, P, Q, R, and S (17.49 percent).

Among refund claimants, the estimated rate of noncompliance is especially high for the professional, scientific, and technical activities sector (20.52 percent). Audited returns account for 92.8 percent of the entire estimated gap within this sector, which indicates that the NRA VAT audit program is very effectively addressing the known compliance risks associated with this population segment.[24] Relatively high estimated rates of noncompliance are also observed within certain other service sectors, including food and beverage (7.91 percent) and administrative and support activities (8.37 percent). The estimated noncompliance rate within the construction sector is also rather high (8.56 percent). Especially low estimated rates of noncompliance among refund claimants are observed in the primary [agriculture, forestry, and fishing (0.62 percent), mining, electricity, water supply (0.75 percent)] and manufacturing (0.14 percent) industries as well as the information and communication sector (0.48 percent).

Among those reporting a balance due, the pattern of estimated noncompliance rates is fairly similar to what was observed for the combined groups, with the exception that the estimated rate for the real estate sector is much lower (3.54 percent).
The highest estimated noncompliance rates within the balance-due population are for the combined service category for NACE codes K, P, Q, R, and S (7.48 percent), the agricultural, forestry, and fishing sector (11.22 percent), professional, scientific, and technical activities (17.36 percent), and, especially, taxpayers with missing NACE information (67.61 percent). Within these four sector categories, audited returns respectively account for 32.3, 14.9, 19.1, and 49.9 percent of the entire estimated gap within the category. This indicates that the NRA VAT audit program is actively working to address the known compliance risks associated with these segments of the balance-due population.

24 Under the breakdown of the estimated VAT gap in Appendix B based on a higher top-line gap estimate, 78.1 percent of the estimated gap associated with refund claimants in this sector is accounted for by audited taxpayers.

Table 6: Breakdown of Baseline VAT Reporting Gap Estimates by Refund/Balance-Due Status and Sector, Amounts in BGN Millions (Noncompliance Rate in Parentheses)

Sector                                                        Refund           Balance Due       Total
Agriculture, forestry, fishing (NACE A)                       6.89 (0.62%)     46.13 (11.22%)    53.02 (7.57%)
Mining, electricity, water supply (NACE B, D, E)              4.05 (0.75%)     6.49 (0.32%)      10.54 (0.69%)
Manufacturing (NACE C)                                        13.78 (0.14%)    26.67 (0.44%)     40.45 (1.10%)
Construction (NACE F)                                         31.42 (8.56%)    57.67 (1.34%)     89.09 (2.27%)
Wholesale & retail trade (NACE G)                             80.28 (3.42%)    407.04 (3.92%)    487.32 (6.07%)
Transportation & storage (NACE H)                             7.71 (0.94%)     15.29 (1.40%)     23.00 (8.37%)
Accommodation (NACE 55)                                       12.63 (1.63%)    9.41 (3.13%)      22.03 (4.65%)
Food & beverage service (NACE 56)                             2.75 (7.91%)     42.81 (7.00%)     45.56 (7.89%)
Information & communication (NACE J)                          3.85 (0.48%)     31.21 (2.22%)     35.06 (5.87%)
Finance, insur., educ., health, & soc. work (NACE K, P, Q)*   (combined)       15.79 (5.98%)     (combined)
Arts, entertainment, recreation (NACE R)*                     8.98 (5.40%)     1.11 (2.32%)      37.45 (17.49%)
Other service activities (NACE S)*                            (combined)       11.57 (16.90%)    (combined)
Real estate activities (NACE L)                               12.15 (2.62%)    17.35 (3.54%)     29.50 (107.83%)
Professional, scientific, & technical activities (NACE M)     54.36 (20.52%)   96.05 (17.36%)    150.41 (52.15%)
Administrative and support service activities (NACE N)        12.88 (8.37%)    23.61 (2.18%)     36.50 (3.92%)
Unknown (missing NACE code)                                   9.13 (3.44%)     119.94 (67.61%)   129.07 (146.09%)

*These three industry categories were combined for the refund group analysis to address low sample size issues, while they were estimated separately for the balance-due group analysis. For the balance-due group, the combined categories are associated with an estimated VAT gap of BGN 380.5 million and a noncompliance rate of 7.48 percent.

Table 7 breaks down the baseline VAT reporting gap estimates by region. The extremely high estimated overall rates of noncompliance within the Northcentral (41.71 percent) and, especially, Northwest (212.35 percent) regions are partly attributable to the fact that refunds claimed by the refund group slightly exceed revenue from the balance-due group, leaving only a small base for estimation of the overall noncompliance rate. However, both regions have relatively high levels of noncompliance within the separate refund and balance-due groups as well, suggesting that noncompliance is indeed relatively high in these regions. Noncompliance rates are also relatively high within both the refund and balance-due populations registered in the Southwest region.
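The small-base effect described above can be checked with a short calculation. The sketch below is illustrative only: it assumes that each published noncompliance rate equals the gap divided by some base, backs out that implied base from two rows of Table 6, and compares the implied overall base with the balance-due base net of the refund base. The function name `implied_base` and this reading of the base are assumptions made for illustration, not definitions taken from the report.

```python
# Illustrative sketch. Assumption: rate = gap / base, so base = gap / rate.
# Gap amounts are in BGN millions and rates in percent, as printed in Table 6.

def implied_base(gap_bgn_m, rate_pct):
    """Back out the base implied by a published gap and noncompliance rate."""
    return gap_bgn_m / (rate_pct / 100.0)

sectors = {
    # sector: ((refund gap, rate), (balance-due gap, rate), (total gap, rate))
    "Professional (NACE M)": ((54.36, 20.52), (96.05, 17.36), (150.41, 52.15)),
    "Real estate (NACE L)": ((12.15, 2.62), (17.35, 3.54), (29.50, 107.83)),
}

results = {}
for name, (refund, balance_due, total) in sectors.items():
    refund_base = implied_base(*refund)
    balance_due_base = implied_base(*balance_due)
    net_base = balance_due_base - refund_base  # revenue net of refunds (assumed)
    total_base = implied_base(*total)
    results[name] = (refund_base, balance_due_base, net_base, total_base)
    print(f"{name}: refund base {refund_base:.1f}, "
          f"balance-due base {balance_due_base:.1f}, "
          f"net {net_base:.1f}, implied total base {total_base:.1f}")
```

Under these assumptions, the real estate sector has refund and balance-due bases of similar size, so the net base is small and the overall rate exceeds either group rate. Because the published rates are rounded, the back-calculated bases are only approximate.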
Table 7: Breakdown of Baseline VAT Reporting Gap Estimates by Refund/Balance-Due Status and Region, Amounts in BGN Millions (Noncompliance Rate in Parentheses)

Region          Refund            Balance Due        Total
Northeast       12.19 (0.61%)     58.94 (1.65%)      71.14 (4.52%)
Northcentral    17.23 (0.73%)     144.03 (7.34%)     161.26 (41.71%)
Northwest       21.56 (1.78%)     204.27 (18.52%)    225.83 (212.35%)
Southeast       3.65 (0.30%)      23.16 (1.18%)      26.80 (3.62%)
Southcentral    20.41 (0.41%)     68.84 (0.91%)      89.25 (3.50%)
Southwest       185.83 (3.04%)    428.90 (3.25%)     614.73 (8.69%)

6.2 Sensitivity Analysis Regarding Audit Coverage Period

As discussed in Section 5, some of the audits in our sample covered only a relatively small portion of the monthly returns filed during the year. This raises the possibility that the results from such audits fail to capture the full extent of noncompliance on an annual basis; in particular, to the extent that taxes were understated on some of the monthly returns that were not subject to audit, this would not be captured. To explore the implications of this possibility, we have conducted a sensitivity analysis in which the audit sample has been restricted to taxpayers who were subjected to an audit covering all monthly returns that were actually filed for 2019.25 Taxpayers who were only subject to partial-year audits are effectively treated as members of the unaudited taxpayer population for purposes of this analysis. The top-line results are summarized in Table 8. Compared with our baseline estimates presented in Table 4, the overall estimated VAT reporting gap when partial-year audits have been excluded is actually about 10 percent smaller: BGN 1,061.3 million compared to BGN 1,189.0 million. Based on this finding, it seems prudent to retain the partial-year audit cases for VAT gap estimation.
These results do not necessarily indicate that taxpayers subject to partial-year audits are fully compliant with respect to the monthly returns that have not been subjected to examination, but they do suggest that such taxpayers have a relatively high degree of noncompliance with respect to the months that were covered by the audit.

Table 8: Sensitivity Analysis Top-Line VAT Reporting Gap Estimates by Refund/Balance-Due Status (BGN Millions)

                      Refund    Balance Due    Total
VAT Reporting Gap     201.9     859.4          1,061.3
Noncompliance Rate    1.13%     2.93%          9.27%

25 Some taxpayers in the sample filed fewer than 12 monthly returns, in some cases because they became newly registered to collect VAT during the course of the year. We retained partial-year filers in our audit sample so long as the audit covered all months that a return was filed.

6.3 Sensitivity Analysis Regarding Estimation Methodology

As discussed in Section 4.3, we have performed an alternative analysis of the VAT reporting gap using a statistical matching strategy as a check on our entropy balancing methodology. As described previously, we have developed two sets of estimates using this alternative approach, one which assumes that unaudited taxpayers have the same level of tax underreporting as their matched audited counterparts (“matching on level”), and one which assumes they have the same rate of underreporting (“matching on rate”). The same outlier cutoff rule (66th percentile) has been used for this exercise as was used for our baseline analysis. Table 9 summarizes the top line estimates from our two alternative approaches. The matching-on-level results are extremely similar to those obtained using entropy balancing. The matching-on-rate results produce a somewhat higher estimate of the overall VAT gap, characterized by a somewhat smaller gap among refund claimants and a somewhat higher gap among those reporting a balance due.
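The distinction between the two matching variants can be made concrete with a small sketch. It is a hypothetical illustration rather than the report's implementation: the record fields (`reported_tax`, `adjustment`), the helper names, and the definition of the rate as the audit adjustment divided by reported tax are all assumptions introduced here, and the matched pair is taken as given.

```python
from dataclasses import dataclass


@dataclass
class Taxpayer:
    reported_tax: float      # reported VAT amount (hypothetical units)
    adjustment: float = 0.0  # audit adjustment; known only for audited taxpayers


def impute_on_level(unaudited: Taxpayer, match: Taxpayer) -> float:
    """Matching on level: assume the same amount of underreporting as the match."""
    return match.adjustment


def impute_on_rate(unaudited: Taxpayer, match: Taxpayer) -> float:
    """Matching on rate: assume the same underreporting rate as the match.

    The rate is defined here (an assumption) as adjustment / reported tax.
    """
    if match.reported_tax == 0:
        return 0.0  # rate undefined; this fallback is also an assumption
    rate = match.adjustment / match.reported_tax
    return rate * unaudited.reported_tax


# Hypothetical matched pair: a smaller audited firm and a larger unaudited one.
audited = Taxpayer(reported_tax=10_000.0, adjustment=500.0)
unaudited = Taxpayer(reported_tax=40_000.0)

level_estimate = impute_on_level(unaudited, audited)  # the match's adjustment
rate_estimate = impute_on_rate(unaudited, audited)    # the match's rate times own reported tax
print(level_estimate, rate_estimate)
```

In this hypothetical pair the unaudited taxpayer is larger than its matched counterpart, so the rate-based variant scales the imputed underreporting up. This is one way in which the two variants can yield different aggregate estimates.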
Breakdowns of the statistical matching estimation results by size, sector, and region are provided in Appendix B. They tend to show patterns of noncompliance similar to those from our baseline methodology, although they indicate a somewhat lower level of noncompliance among medium-sized refund claimants as well as taxpayers registered in the Northcentral and Northwest regions. Overall, the broad similarity in results is encouraging.

Table 9: Top-Line Matching-on-Level and Matching-on-Rate VAT Reporting Gap Estimates

                      Matching on Level                     Matching on Rate
                      Refund    Balance due    Total        Refund    Balance due    Total
VAT Reporting Gap     264.1     910.4          1,174.4      226.6     1,137.7        1,364.3
Noncompliance Rate    1.48%     3.11%          10.26%       1.27%     3.88%          11.92%

6.4 Sensitivity Analysis Using Higher Cutoff Rule

As discussed in Section 4.5, we have re-estimated our entropy balancing model using a higher audit cutoff percentile of 81.2. This implies an overall 2019 VAT reporting gap of BGN 1,839 million, with audit adjustments accounting for about 25 percent of the gap rather than the 39.6 percent under our baseline strategy. Table 10 summarizes our top-line estimates. As with our baseline model, the noncompliance rate tends to be substantially higher among balance-due taxpayers than refund claimants.

Table 10: Alternative Top-Line VAT Reporting Gap Estimates by Refund/Balance-Due Status (BGN Millions)

                      Refund    Balance Due    Total
VAT Reporting Gap     392.1     1,447.3        1,839.4
Noncompliance Rate    2.20%     4.94%          16.07%

Appendix C provides a breakdown of these alternative estimates based on size, sector, and region. Although the estimated levels and rates of noncompliance tend, of course, to be higher than the baseline results, the patterns of noncompliance across the various population segments are reasonably similar. Again, this is encouraging.

7 Concluding Remarks

This study has provided a rather thorough preliminary investigation of the Bulgarian VAT reporting gap in 2019 using a novel bottom-up tax gap estimation methodology.
As has been demonstrated, this model can be calibrated to the official European Commission top-line VAT gap estimate based on top-down estimation methods, or it can be employed to produce estimates over a specified range for the overall gap, chosen based on factors such as overall audit yields in relation to the audit coverage rate and the plausibility of different cutoff rules for calibration of the estimates. Several sensitivity analyses have been performed, and the model appears to be reasonably robust and informative. In contrast to the top-down approach, which only yields an overall estimate of the VAT gap, the bottom-up estimation strategy developed in this study permits a decomposition of the gap in various useful ways, thereby providing a rather detailed portrait of the compliance landscape. The report suggests that revenue authorities could fine-tune their compliance programs to more effectively address taxpayer populations with relatively high compliance risks. The results of the bottom-up VAT gap study indicate that the published top-down VAT gap estimates for Bulgaria by the European Commission may be too low and that the sharp estimated decline in the gap as a share of taxes owed in recent years is potentially misleading. Consequently, there may be a greater scope for increasing VAT revenues via improvements to audit processes and other compliance-oriented activities than has heretofore been recognized. The bottom-up estimates provide evidence on taxpayer groups for which the rate of VAT noncompliance is relatively high. For instance, taxpayers reporting a net balance due tend to have a higher estimated rate of noncompliance than those who claim a refund.
Among those reporting a net balance due, the estimated rate of noncompliance is particularly high among medium-sized businesses and companies that operate in certain sectors (including various services; professional, scientific, and technical activities; agriculture, forestry, and fishing; as well as companies that fail to report their sector of operation). The estimated rates of noncompliance are also relatively high among VAT refund claimants in certain sectors, including professional, scientific, and technical activities; construction; and various service activities. Geographically, taxpayers that are registered in the northwest, northcentral, and, to a lesser extent, southwest regions have relatively high estimated rates of VAT noncompliance. The bottom-up study indicates that audits are already effectively addressing certain pockets of VAT noncompliance, including some of the sectors that pose an especially high compliance risk. However, there are potential ways to refine audit programs and other compliance activities both to achieve a higher voluntary rate of compliance and to recover a larger share of unreported taxes. For instance, although automated periodic risk assessment scores are employed by the NRA to help target audit resources towards areas of high compliance risk among refund claimants, no comparable risk scores are assigned to taxpayers reporting a net balance due. Given the relatively high rates of noncompliance within the balance-due population, it would seem worthwhile to explore ways to develop suitable risk scores for these taxpayers as well. More generally, potential improvements in VAT revenues might be achieved through a more systematic approach to VAT audit selection and other compliance activities, guided to a greater degree by data analytics.

References

Adu-Ababio, K., A. Koivisto, E. Lungu, E. Mwale, J. Msoni, and K. Musole (2023) “Estimating Tax Gaps in Zambia: A Bottom-Up Approach Based on Audit Assessments,” WIDER Working Paper 2023/25, United Nations University, February. https://www.wider.unu.edu/sites/default/files/Publications/Working-paper/PDF/wp2023-25-estimating-tax-gaps-Zambia-bottom-up-approach-audit-assessments.pdf

Alaimo Di Loro, P., D. Scacciatelli, and G. Tagliaferri (2023) “2-Step Gradient Boosting Approach to Selectivity Bias Correction in Tax Audit: An Application to the VAT Gap in Italy,” Statistical Methods & Applications (32), 237-270.

Dubin, J.A. and McFadden, D.L. (1984) “An Econometric Analysis of Residential Electric Appliance Holdings and Consumption,” Econometrica 52(2), 345-362.

Erard, B. and Feinstein, J.S. (2011) “The Individual Income Tax Reporting Gap: What We See and What We Don’t,” in A. Plumley (Ed.) Recent Research on Tax Administration and Compliance: Selected Papers Given at the 2011 IRS-TPC Research Conference on New Perspectives in Tax Administration, 73-92, Washington, DC. https://www.irs.gov/pub/irs-soi/11rescon.pdf

European Commission, Directorate-General for Taxation and Customs Union, Poniatowski, G., Bonch-Osmolovskiy, M., Śmietanka, A., et al. (2022) “VAT Gap in the EU: Report 2022,” Publications Office of the European Union. https://op.europa.eu/en/publication-detail/-/publication/030df522-7452-11ed-9887-01aa75ed71a1

Feinstein, J.S. (1991) “An Econometric Analysis of Income Tax Evasion and Its Detection,” RAND Journal of Economics 22(1), 14-35.

Fiscalis Tax Gap Project Group (2016) “The Concept of Tax Gaps: Report on VAT Gap Estimations,” European Commission Directorate-General Taxation and Customs Union, Brussels, March. https://taxation-customs.ec.europa.eu/system/files/2016-09/tgpg_report_en.pdf

Hainmueller, J. (2012) “Entropy Balancing for Causal Effects: A Multivariate Reweighting Method to Produce Balanced Samples in Observational Studies,” Political Analysis (20:1), 25-46.

Honoré, B.E. and Hu, L. (2020) “Selection Without Exclusion,” Econometrica 88(3), 1007-1029.

Hutton, E. (2017) “The Revenue Administration—Gap Analysis Program: Model and Methodology for Value-Added Tax Gap Estimation,” IMF Technical Notes and Manuals 17/04, Washington, DC: International Monetary Fund, March. https://www.imf.org/-/media/Files/Publications/TNM/2017/tnm1704.ashx

International Monetary Fund (2016) “Denmark Technical Assistance Report—Revenue Administration Gap Analysis Program—the Value-Added Tax Gap,” IMF Country Report No. 16/59, February. https://www.imf.org/-/media/Websites/IMF/imported-full-text-pdf/external/pubs/ft/scr/2016/_cr1659.ashx

Udell, M. (2021) “Preliminary Estimates of the CIT Tax Gap and Estimates of the VAT Tax Gap Using Company Level Tax Returns for Tax Years 2015, 2016, and 2017,” World Bank Group slides, March 30.

Appendix A: Variable Selection Results

A.1 Potential Explanatory Variables

Table A1: Description of Potential Explanatory Variables for Risk Classification, Balancing, and Matching

Size
medium – 1/0 indicator for size_reg = “TD_MEDIUM_TAXPAYERS”
large – 1/0 indicator for size_reg = “TD_LARGE_TAXPAYERS”
lnbase – natural log of aggregate value of vat_cl_1 for 2019 (equals zero if vat_cl_1 <= 0)26
zerobase (refund) – 1/0 indicator for aggregate value of vat_cl_1 for 2019 equal to 0
lnprofit – natural log of 2019 firm profit (equals zero if profit<=0)
noprofit – 1/0 indicator for 2019 profit = 0
loss – 1/0 indicator for 2019 profit < 0
missprof – 1/0 indicator for missing profit measure for 2019
lnnumemp – natural log of one plus the number of employees
noemp – 1/0 indicator for number of employees is either zero or missing
under6emp – 1/0 indicator for number of employees < 6

Organizational Form
singlellc – 1/0 indicator for single-owner limited liability company
asfouco – 1/0 indicator for cooperative, association, or foundation
civil – 1/0 indicator for civil company
foreign – 1/0 indicator for foreign branch or foreign
non-person
insure – 1/0 indicator for insurance company
jointstock – 1/0 indicator for joint-stock company or single-owner joint-stock company
soletrader – 1/0 indicator for sole trader
other – 1/0 indicator for other legal form of organization (REIT or “unknown”)

26 The term “vat_cl_k” refers to the entry numbered “cl. k” on the Bulgarian monthly VAT return.

Region
northwest – 1/0 indicator for northwestern region (BG31)
northcentral – 1/0 indicator for northern central region (BG32)
northeast – 1/0 indicator for northeastern region (BG33)
southeast – 1/0 indicator for southeastern region (BG34)
southcentral – 1/0 indicator for southern central region (BG42)

Age
newfirm – 1/0 indicator for firm age of 0 years as of 2019
under5 – 1/0 indicator for firm age of less than 5 years as of 2019
lnage – natural log of one plus firm age in 2019
firstyr – 1/0 indicator for first VAT return filed for 2019 (no returns filed for 2018).

Sector
nace_2 – 1/0 indicator for NACE sector code (nace_sect) = “B”
nace2_alt – 1/0 indicator for nace_sect = “B”, “D”, or “E”
nace_3 – 1/0 indicator for nace_sect = “C”
nace_4 – 1/0 indicator for nace_sect = “D”
nace_5 – 1/0 indicator for nace_sect = “E”
nace_6 – 1/0 indicator for nace_sect = “F”
nace_7 – 1/0 indicator for nace_sect = “G”
nace_8 – 1/0 indicator for nace_sect = “H”
nace_9 – 1/0 indicator for NACE division code (nace_div) equal to “55”
nace_10 – 1/0 indicator for nace_div = “56”
nace_11 – 1/0 indicator for nace_sect = “J”
nace_12 – 1/0 indicator for nace_sect = “K”
nace_12_refund – 1/0 indicator for nace_sect = “K”, “P”, “Q”, “R” or “S”
nace_12_baldue – 1/0 indicator for nace_sect = “K”, “P”, or “Q”
nace_13 – 1/0 indicator for nace_sect = “L”
nace_14 – 1/0 indicator for nace_sect = “M”
nace_15 – 1/0 indicator for nace_sect = “N” or “O”
nace_16 – 1/0 indicator for nace_sect = “P” or “Q”
nace_17 – 1/0 indicator for nace_sect = “R”
nace_18 – 1/0 indicator for nace_sect = “S”
nace_19 – 1/0 indicator for a missing NACE sector code
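Several of the size, employment, and age variables listed above are simple transformations of raw fields. The following sketch shows one way they could be computed for a single taxpayer. It is illustrative only: the function name, the argument names, and the treatment of a missing employee count as zero for `lnnumemp` and `under6emp` are assumptions, not details taken from the report.

```python
import math
from typing import Optional


def size_and_age_features(vat_cl_1_total: float,
                          num_employees: Optional[int],
                          firm_age_years: int) -> dict:
    """Build a few Table A1 style variables from raw (hypothetical) inputs."""
    # Assumption: a missing employee count is treated as zero employees.
    emp = 0 if num_employees is None else num_employees
    return {
        # natural log of the aggregate base, set to zero when the base is non-positive
        "lnbase": math.log(vat_cl_1_total) if vat_cl_1_total > 0 else 0.0,
        "zerobase": int(vat_cl_1_total == 0),
        # natural log of one plus the number of employees
        "lnnumemp": math.log1p(emp),
        "noemp": int(num_employees is None or num_employees == 0),
        "under6emp": int(emp < 6),
        "newfirm": int(firm_age_years == 0),
        "under5": int(firm_age_years < 5),
        # natural log of one plus firm age
        "lnage": math.log1p(firm_age_years),
    }


# Hypothetical taxpayer: no reported base, missing employee count, new firm.
example = size_and_age_features(0.0, None, 0)
print(example)
```

Adding one before taking logs keeps `lnnumemp` and `lnage` defined for firms with no employees or an age of zero, while the separate indicators (`noemp`, `newfirm`, `zerobase`) let a model treat those boundary cases distinctly.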
NRA Risk Factors
pra_hi – 1/0 indicator for either pra_score_1_monthly > 383 or pra_score_2_monthly > 198 in any month in 2019
pra_lo – 1/0 indicator if the maximum monthly value of pra_score_1_monthly is less than 151 or the maximum monthly value of pra_score_2_monthly < 82 in 2019
no_pra_score – 1/0 indicator for the absence of both pra_score_1_monthly and pra_score_2_monthly scores for all monthly returns filed for 2019
red_ind – 1/0 indicator for membership in register of risky taxpayers
red_trans_monthly – 1/0 indicator for presence of any transactions with a member of the register of risky taxpayers in 2019
num_red_trans – number of transactions with members of the register of risky taxpayers in 2019
dereg_hist – 1/0 indicator for whether taxpayer was de-registered and then re-registered for VAT in the past 10 years
vies – 1/0 indicator for frequent occurrence of corrections of turnover data in VIES-declarations
cred_notes – 1/0 indicator for posting multiple credit notes and/or credit notes of large amounts in the sales ledger
intraeu_disc – 1/0 indicator for discrepancies in intra-EU transactions between the taxpayer and a foreign seller
other_risk – 1/0 indicator for established discrepancies for the specific period after a counter-detection has been carried out
late_filing_hist – 1/0 indicator for presence of any late-filed monthly VAT returns for 2019
num_late_returns – number of late-filed monthly VAT returns for 2019
missing_months – 1/0 indicator for fewer than 12 monthly returns filed for 2019
num_missing – equals twelve minus the number of monthly returns filed for 2019

2019 VAT Return Characteristics
lnowe – natural log of one plus the aggregate net tax balance (vat_cl_20 - vat_cl_40) in 2019 (restricted to cases for which the aggregate balance is non-negative)
lnrefund – natural log of one plus the absolute value of the aggregate net tax balance (vat_cl_20 - vat_cl_40) in 2019 (restricted to cases for which the aggregate balance is negative)
zerovat – 1/0 indicator for aggregate value of (vat_cl_20 - vat_cl_40) is equal to zero for 2019
net_tax_paid – aggregate value of (vat_cl_71 - vat_cl_80 - vat_cl_81 - vat_cl_82) reported for 2019 (restricted to cases for which the aggregate value is non-negative)
ln_net_tax_paid – natural log of one plus net_tax_paid
net_tax_refund – aggregate value of (vat_cl_80 + vat_cl_81 + vat_cl_82 - vat_cl_71) reported for 2019 (restricted to cases for which the aggregate value is non-negative)
ln_net_tax_refund – natural log of one plus net_tax_refund
num_months_owe – number of months in 2019 for which vat_cl_50 > 0
allnil – 1/0 indicator for all monthly returns containing nil reports (no non-zero values for any relevant fields on the return)
intraeu_pos – 1/0 indicator for vat_cl_12 > 0 on any monthly return for 2019
intraeu_neg – 1/0 indicator for vat_cl_12 < 0 on any monthly return for 2019
ninepct – 1/0 indicator for a non-zero amount reported for vat_cl_13 on any monthly return for 2019
zero_14 – 1/0 indicator for vat_cl_14 not equal to zero on any monthly return for 2019
zero_15 – 1/0 indicator for vat_cl_15 not equal to zero on any monthly return for 2019
zero_16 – 1/0 indicator for vat_cl_16 not equal to zero on any monthly return for 2019
zerorated – 1/0 indicator for non-zero amount reported for any of vat_cl_14, vat_cl_15, or vat_cl_16 on any monthly return for 2019
exempt_17 – 1/0 indicator for vat_cl_17 not equal to zero on any monthly return for 2019
exempt_18 – 1/0 indicator for vat_cl_18 not equal to zero on any monthly return for 2019
exempt_19 – 1/0 indicator for vat_cl_19 not equal to zero on any monthly return for 2019
for_rev_neg – 1/0 indicator for aggregate value of (vat_cl_17 + vat_cl_18 + vat_cl_19) for 2019 is less than 0
credbase_30 – 1/0 indicator for non-zero amount reported for vat_cl_30 on any monthly return for 2019
credbase_31 – 1/0 indicator for non-zero amount reported for vat_cl_31 on any monthly return for 2019
credbase_32 – 1/0 indicator for non-zero amount reported for vat_cl_32 on any monthly return for 2019
reimb_tax_70 – 1/0 indicator for vat_cl_70 > 0 on any monthly return for 2019
reimb_tax_80 – 1/0 indicator for vat_cl_80 > 0 on any monthly return for 2019
reimb_tax_81 – 1/0 indicator for vat_cl_81 > 0 on any monthly return for 2019
hi_taxpurch_shr – 1/0 indicator for ratio of aggregate value of (vat_cl_31 + vat_cl_32) to aggregate value of (vat_cl_11 + vat_cl_12 + vat_cl_13) is greater than one (variable equals zero if denominator is non-positive)
hi_totpurch_shr – 1/0 indicator for ratio of aggregate value of (vat_cl_30 + vat_cl_31 + vat_cl_32) to the aggregate value of (vat_cl_1 + vat_cl_17 + vat_cl_18 + vat_cl_19) for 2019 is greater than one (variable equals zero if denominator is non-positive)
base_to_totrev – ratio of aggregate value of vat_cl_1 to aggregate value of (vat_cl_1 + vat_cl_17 + vat_cl_18 + vat_cl_19) for 2019 (variable equals 0 if denominator is non-positive)
refund_red – 1/0 indicator for aggregate value of vat_cl_60 is less than the aggregate value of (vat_cl_80 + vat_cl_81 + vat_cl_82 - 5) for 2019
refund_inc – 1/0 indicator for aggregate value of vat_cl_60 is greater than the aggregate value of (vat_cl_80 + vat_cl_81 + vat_cl_82 + 5)
tax_red – 1/0 indicator for aggregate value of vat_cl_71 is less than aggregate value of (vat_cl_50 - 5)

Relative 2019 Reporting Characteristics Within Size-NACE Categories
rel_eff_vat_rate – ratio of eff_vat_rate to mean value of eff_vat_rate within the taxpayer’s size (small=1 or small=0) and sector (nace_1 – nace_19) category, where:
    eff_vat_rate – ratio of aggregate value of vat_cl_20 to aggregate value of vat_cl_1 for 2019 (variable equals zero if denominator is non-positive)
rel_eff_vat_rate_new_bus – interaction of rel_eff_vat_rate and an indicator for a business whose age is less than 2 years.
rel_share_intra_eu – ratio of share_intra_eu to the mean value of share_intra_eu within the taxpayer’s size and sector category, where:
    share_intra_eu – ratio of aggregate value of vat_cl_12 to aggregate value of vat_cl_1 for 2019 (variable equals zero if denominator is non-positive; variable equals one if ratio is greater than one)
rel_share_ninepct – ratio of share_ninepct to the mean value of share_ninepct within the taxpayer’s size and sector category, where:
    share_ninepct – ratio of aggregate value of vat_cl_13 to aggregate value of vat_cl_1 for 2019 (variable equals zero if denominator is non-positive)
rel_share_zerorate – ratio of share_zerorate to the mean value of share_zerorate within the taxpayer’s size and sector category, where:
    share_zerorate – ratio of aggregate value of (vat_cl_14 + vat_cl_15 + vat_cl_16) to aggregate value of vat_cl_1 for 2019 (variable equals zero if denominator is non-positive)
rel_tax_purch_to_sales – ratio of tax_purch_to_sales to the mean value of tax_purch_to_sales within the taxpayer’s size and sector category, where:
    tax_purch_to_sales – ratio of aggregate value of (vat_cl_31 + vat_cl_32) to aggregate value of (vat_cl_11 + vat_cl_12 + vat_cl_13) for 2019 (variable equals zero if denominator is non-positive; variable equals one if ratio is greater than one)
rel_tot_purch_to_sales – ratio of tot_purch_to_sales to the mean value of tot_purch_to_sales within the taxpayer’s size and sector category, where:
    tot_purch_to_sales – ratio of aggregate value of (vat_cl_30 + vat_cl_31 + vat_cl_32) to aggregate value of (vat_cl_1 + vat_cl_17 + vat_cl_18 + vat_cl_19) for 2019 (variable equals zero if denominator is non-positive)

2018 VAT Return Characteristic and 2018-2019 Reporting Changes
net_tax_refund_2018_ind – 1/0 indicator for aggregate value of (vat_cl_71 - vat_cl_80 - vat_cl_81 - vat_cl_82) less than zero for 2018
ztp_vat – 1/0 indicator for aggregate amount reported for vat_cl_20 was positive for 2019 but non-positive for 2018
ptz_vat – 1/0 indicator for aggregate amount reported for vat_cl_20 was positive for 2018 but non-positive for 2019
ztp_ieu – 1/0 indicator for vat_cl_12 was positive for 2019 but non-positive for 2018
ptz_ieu – 1/0 indicator for aggregate amount reported for vat_cl_12 was positive for 2018 but non-positive for 2019
ztp_9pct – 1/0 indicator for aggregate amount reported for vat_cl_13 was positive for 2019 but non-positive for 2018
ptz_9pct – 1/0 indicator for aggregate amount reported for vat_cl_13 was positive for 2018 but non-positive for 2019
ztp_0rate – 1/0 indicator for aggregate amount reported for (vat_cl_14 + vat_cl_15 + vat_cl_16) was positive for 2019 but non-positive for 2018
ptz_0rate – 1/0 indicator for aggregate amount reported for (vat_cl_14 + vat_cl_15 + vat_cl_16) was positive for 2018 but non-positive for 2019
ptz_for – 1/0 indicator for aggregate amount reported for (vat_cl_17 + vat_cl_18 + vat_cl_19) was positive for 2018 but non-positive for 2019
ztp_for – 1/0 indicator for aggregate amount reported for (vat_cl_17 + vat_cl_18 + vat_cl_19) was positive for 2019 but non-positive for 2018
hipratechg – 1/0 indicator for eff_vat_rate for 2019 over 5 percentage points higher than eff_vat_rate for 2018
hinratechg – 1/0 indicator for eff_vat_rate for 2019 over 5 percentage points lower than eff_vat_rate for 2018
hipbchg – 1/0 indicator for an increase in the aggregate value of vat_cl_1 by more than 50 percent and also more than BGN 10,000 between 2018 and 2019
hinbchg – 1/0 indicator for a decrease in the aggregate value of vat_cl_1 by more than 50 percent and also more than BGN 10,000 between 2018 and 2019
hipieuchg – 1/0 indicator for an increase in share_intra_eu of more than 0.10 between 2018 and 2019
hinieuchg – 1/0 indicator for a decrease in share_intra_eu of more than 0.10 between 2018 and 2019
hip9chg – 1/0 indicator for an increase in share_ninepct of more than 0.05 between 2018 and 2019
hin9chg – 1/0 indicator for a decrease in share_ninepct of more than 0.05
between 2018 and 2019
hiptptschg – 1/0 indicator for an increase in tax_purch_to_sales of more than 0.20 between 2018 and 2019
hintptschg – 1/0 indicator for a decrease in tax_purch_to_sales of more than 0.20 between 2018 and 2019
hip0chg – 1/0 indicator for an increase in share_zerorate of more than 0.05 between 2018 and 2019
hin0chg – 1/0 indicator for a decrease in share_zerorate of more than 0.05 between 2018 and 2019

A.2 Logit Results for Final Risk-Classification Specifications

Table A2: Audited Taxpayers Reporting a Refund
Dependent Variable: VAT Audit Adjustment < BGN 1,000

medium                 -1.058 (1.55)
zerobase                0.793 (2.94)**
singlellc              -0.393 (3.14)**
civil                   1.511 (1.67)
northwest               0.559 (1.77)
northcentral            0.607 (2.25)*
southeast               0.989 (5.39)**
newfirm                 0.641 (1.83)
lnage                   0.281 (3.29)**
lnrefund               -0.073 (2.54)*
ln_net_tax_refund       0.125 (2.96)**
pra_lo                  0.376 (1.40)
red_trans_monthly      -1.070 (6.37)**
intraeu_disc           -0.312 (1.93)
other_risk             -0.324 (1.75)
missing_months         -0.619 (2.82)**
intraeu_pos             0.507 (3.72)**
credbase_30             0.418 (2.87)**
tax_purch_to_sales      0.255 (1.54)
tot_purch_to_sales     -0.185 (1.37)
ztp_9pct               -2.144 (2.01)*
ptz_for                 0.445 (1.72)
hipbchg                 0.197 (1.34)
hip9chg                 1.599 (1.96)*
nace_11                 0.575 (1.50)
nace_12old             -0.876 (1.62)
hip0chg                -0.637 (3.54)**
noprofit               -0.619 (1.67)
asfouco                -0.779 (1.80)
num_months_owe         -0.035 (1.49)
refund_red              0.270 (1.81)
ptz_vat                 0.564 (1.76)
_cons                  -1.547 (3.46)**
N                       1,537
* p<0.05; ** p<0.01

Table A3: Audited Taxpayers Reporting a Net Balance Due
Dependent Variable: VAT Audit Adjustment < BGN 1,000

zerobase                0.672 (4.05)**
missprof               -0.682 (3.87)**
singlellc              -0.099 (1.42)
other                  -0.736 (2.05)*
southeast               0.664 (7.04)**
newfirm                 0.426 (2.53)*
lnage                   0.158 (3.38)**
lnowe                   0.045 (3.89)**
red_ind                -0.683 (4.20)**
red_trans_monthly      -1.084 (9.45)**
num_red_trans          -0.125 (5.69)**
intraeu_disc           -0.507 (4.75)**
other_risk             -0.524 (4.94)**
missing_months         -0.849 (8.02)**
intraeu_pos             0.240 (3.02)**
for_rev_neg             1.962 (2.37)*
rel_share_intra_eu      0.023 (2.93)**
rel_tax_purch_to_sales -0.217 (5.34)**
ptz_vat                -0.257 (1.25)
ztp_0rate              -0.217 (1.48)
ptz_0rate              -0.213 (1.47)
ptz_for                 0.346 (2.45)*
hipbchg                -0.365 (4.12)**
nace_6                  0.213 (2.38)*
nace_12                 1.390 (3.16)**
nace_13                 0.397 (2.52)*
nace_17                 0.795 (3.00)**
nace_19                 1.064 (4.93)**
num_months_owe         -0.015 (1.34)
nace_10                -0.205 (1.59)
_cons                  -0.124 (0.79)
N                       5,937

A.3 Multinomial Logit Results for Selected Explanatory Variables for Entropy Balancing

Table A4: Estimation Results, Low-Risk Refund Claimants

                         VAT-Only            Joint Audit
net_tax_refund           -0.000 (0.95)       -0.000 (0.17)
large                    -4.832 (5.55)**     -2.805 (4.61)**
northcentral             -0.108 (0.27)       -0.625 (2.00)*
northwest                 0.212 (0.49)       -0.072 (0.22)
southeast                 0.115 (0.31)        1.094 (4.37)**
southcentral              0.438 (1.24)       -0.303 (1.07)
southwest                 1.356 (4.32)**     -0.126 (0.50)
nace_2_alt               -0.618 (0.84)        0.402 (0.81)
nace_3                   -0.132 (0.36)       -1.179 (3.47)**
nace_6                    0.861 (1.93)        0.857 (2.16)*
nace_7                    0.299 (0.98)       -0.320 (1.21)
nace_8                   -0.560 (1.38)       -0.555 (1.82)
nace_9                    0.602 (1.14)        0.688 (1.83)
nace_10                   1.707 (1.59)        1.498 (1.98)*
nace_11                  -0.781 (1.95)       -1.354 (3.12)**
nace_12_refund            0.532 (1.00)        0.403 (0.88)
nace_13                   0.214 (0.58)       -0.050 (0.15)
nace_14                  -0.556 (1.41)       -0.457 (1.24)
nace_15                   0.210 (0.33)        0.552 (1.16)
nace_19                   0.042 (0.08)       -1.135 (2.03)*
noemp                    -0.992 (3.77)**     -1.152 (4.54)**
under6emp                -0.671 (2.93)**     -0.744 (3.45)**
lnnumemp                 -1.144 (9.91)**     -0.737 (7.21)**
noprofit                  1.067 (1.56)       -0.597 (0.74)
lnprofit                 -0.011 (0.66)       -0.047 (3.30)**
singlellc                 0.383 (2.27)*       0.704 (4.67)**
lnrefund                 -0.024 (0.79)        0.021 (0.70)
ln_net_tax_refund         0.367 (6.11)**      0.126 (2.21)*
pra_lo                   -1.459 (4.15)**     -0.501 (2.14)*
red_trans_monthly         1.668 (1.36)        0.587 (0.52)
num_red_trans             0.801 (1.23)        0.857 (1.36)
vies                      0.810 (1.45)        0.940 (1.78)
intraeu_disc              0.767 (3.01)**      0.465 (1.94)
other_risk                0.547 (1.82)        0.466 (1.76)
late_filing_hist         -1.085 (1.49)        0.293 (0.54)
rel_share_ninepct         0.003 (0.63)       -0.003 (0.43)
ztp_vat                   0.832 (1.57)        1.066 (2.05)*
hipratechg                0.634 (1.63)       -0.459 (1.14)
hiptptschg               -0.615 (1.81)       -0.600 (2.26)*
hin0chg                   0.659 (2.37)*       0.677 (2.48)*
exempt_18                -0.273 (1.00)       -0.471 (1.57)
exempt_19                 0.320 (1.44)        0.502 (2.66)**
intraeu_pos              -0.589 (2.84)**     -0.400 (2.13)*
zerorated                 0.406 (1.60)        0.024 (0.10)
rel_eff_vat_rate          0.035 (0.26)        0.295 (2.46)*
rel_tot_purch_to_sales    0.205 (1.23)        0.156 (0.95)
reimb_tax_81             -0.605 (2.58)**     -0.137 (0.66)
_cons                    -3.272 (4.85)**     -0.838 (1.40)
* p<0.05; ** p<0.01

Table A5: Estimation Results, High-Risk Refund Claimants

                         VAT-only            Joint Audit
net_tax_refund           -0.000 (0.57)       -0.000 (0.10)
medium                   -4.863 (9.51)**     -2.804 (6.59)**
large                    -5.171 (6.11)**     -3.753 (3.41)**
northcentral             -0.052 (0.11)       -0.668 (1.36)
northwest                -0.269 (0.44)       -0.689 (1.16)
southeast                 1.620 (2.30)*       1.586 (2.25)*
southcentral              0.733 (2.46)*      -0.494 (1.70)
southwest                 1.280 (4.72)**     -0.680 (2.51)*
nace_2_alt               -0.652 (0.96)       -1.328 (1.50)
nace_3                   -0.682 (1.67)       -0.672 (1.53)
nace_6                    0.262 (0.59)       -0.172 (0.33)
nace_7                    0.048 (0.14)       -0.002 (0.00)
nace_8                   -0.308 (0.74)       -0.124 (0.27)
nace_9                    1.304 (2.15)*      -2.047 (1.71)
nace_10                   0.103 (0.16)       -0.286 (0.38)
nace_11                  -2.612 (2.96)**     -0.878 (1.05)
nace_12_refund            1.255 (2.55)*       0.752 (1.29)
nace_13                  -0.482 (1.14)       -0.695 (1.38)
nace_14                  -0.087 (0.19)        0.485 (0.99)
nace_15                  -0.700 (1.26)       -0.676 (1.07)
nace_19                   0.848 (1.42)        0.340 (0.42)
lnbase                    0.043 (1.38)        0.039 (1.05)
noemp                    -0.037 (0.18)       -0.513 (2.07)*
lnnumemp                 -0.656 (7.03)**     -0.451 (4.24)**
lnprofit                 -0.027 (1.01)        0.062 (1.72)
singlellc                 0.893 (4.44)**      0.420 (1.84)
asfouco                   1.130 (1.73)        0.483 (0.66)
jointstock                0.663 (1.65)       -0.609 (1.07)
firstyr                  -0.896 (2.02)*      -1.968 (3.36)**
ln_net_tax_refund         0.314 (6.64)**      0.145 (2.54)*
num_months_owe            0.088 (2.67)**      0.030 (0.81)
pra_lo                   -1.073 (2.90)**     -0.182 (0.47)
red_trans_monthly         1.719 (5.57)**      0.838 (2.35)*
num_red_trans             0.113 (1.59)        0.119 (1.53)
vies                      0.419 (0.85)        0.745 (1.44)
intraeu_disc              0.546 (2.38)*       0.380 (1.39)
other_risk                0.784 (3.19)**      0.405 (1.39)
late_filing_hist         -0.061 (0.08)        2.309 (1.48)
num_late_returns          0.158 (0.37)       -0.877 (0.71)
missing_months            1.401 (2.75)**      0.775 (1.12)
num_missing              -0.125 (1.91)       -0.019 (0.21)
refund_red                0.650 (2.46)*       0.103 (0.33)
refund_inc                0.490 (2.53)*       0.086 (0.37)
rel_eff_vat_rate         -0.055 (0.34)        0.428 (2.35)*
rel_share_zerorate        0.010 (0.41)        0.044 (1.95)
rel_tax_purch_to_sales   -0.795 (3.25)**     -0.740 (2.49)*
rel_tot_purch_to_sales    0.505 (2.70)**      0.261 (1.18)
ztp_vat                  -0.138 (0.47)       -1.312 (2.89)**
ztp_ieu                   0.512 (1.60)       -0.540 (1.19)
ztp_0rate                 0.554 (1.76)        0.876 (2.49)*
ptz_0rate                 0.529 (1.19)       -0.090 (0.18)
hinratechg               -0.318 (0.96)        0.226 (0.60)
hinbchg                   0.432 (1.99)*       0.342 (1.46)
hinieuchg                -0.354 (1.25)       -0.601 (1.83)
hin9chg                  -2.699 (2.00)*       2.241 (1.96)
hip0chg                   0.595 (2.13)*       0.273 (0.83)
hin0chg                   0.372 (1.11)        0.576 (1.47)
zero_14                  -0.412 (1.85)       -0.305 (1.16)
zero_15                  -0.127 (0.52)       -0.306 (1.06)
exempt_17                -0.542 (2.52)*      -0.545 (2.07)*
exempt_18                 0.408 (1.63)       -0.413 (1.24)
exempt_19                -0.210 (0.92)       -0.499 (1.81)
reimb_tax_80             -0.306 (1.34)        0.156 (0.55)
loss                      0.045 (0.15)        1.290 (3.21)**
net_tax_refund_2018_ind  -0.552 (2.87)**     -0.112 (0.50)
_cons                    -4.320 (6.14)**     -2.497 (3.13)**
* p<0.05; ** p<0.01

Table A6: Estimation Results, Low-Risk Balance-Due Sample

                         VAT-only            Joint Audit
net_tax_paid             -0.000 (2.54)*      -0.000 (2.56)*
medium                   -3.079 (7.05)**     -2.392 (11.30)**
large                    -3.017 (3.78)**     -2.367 (4.98)**
northcentral              0.537 (1.42)       -0.365 (2.41)*
northwest                 0.347 (0.76)       -0.152 (0.93)
southeast                 0.860 (2.66)**      0.965 (8.97)**
southcentral              1.120 (3.69)**      0.243 (2.25)*
southwest                 1.666 (5.90)**      0.052 (0.52)
nace_2_alt               -2.207 (2.05)*      -0.716 (2.37)*
nace_3                   -0.424 (1.06)       -0.744 (3.79)**
nace_6                   -0.030 (0.08)        0.124 (0.66)
nace_7                   -0.057 (0.16)       -0.260 (1.47)
nace_8                   -0.739 (1.60)       -0.613 (2.81)**
nace_9                    0.744 (1.06)        0.012 (0.04)
nace_10                  -0.631 (0.99)        0.156 (0.69)
nace_11                  -1.071 (2.08)*      -0.679 (2.72)**
nace_12_baldue           -0.188 (0.38)       -0.829 (2.85)**
nace_13                  -0.605 (1.45)       -0.206 (1.01)
nace_14                  -0.859 (2.08)*      -0.722 (3.45)**
nace_15                  -0.922 (1.72)       -0.628 (2.60)**
nace_17                   0.166 (0.32)        0.529 (1.95)
nace_18                  -0.636 (1.06)       -0.774 (2.28)*
nace_19                  -1.270 (2.96)**     -1.348 (5.79)**
lnbase                    0.266 (5.33)**      0.253 (8.34)**
zerobase                  1.866 (3.45)**      1.766 (5.53)**
noemp                    -0.840 (4.04)**     -1.127 (10.03)**
under6emp                -0.701 (4.25)**     -0.616 (6.75)**
lnnumemp                 -0.626 (6.92)**     -0.517 (11.24)**
noprofit                 -0.600 (2.06)*      -0.436 (2.77)**
lnprofit                 -0.033 (2.47)*      -0.037 (5.12)**
singlellc                 0.107 (0.84)        0.161 (2.48)*
asfouco                   1.039 (2.04)*       0.390 (1.22)
civil                    -1.392 (2.40)*      -0.547 (1.85)
foreign                   1.956 (4.00)**      0.189 (0.31)
jointstock                0.447 (1.84)       -0.062 (0.44)
soletrader               -2.523 (3.48)**     -2.283 (8.08)**
newfirm                  -2.388 (6.71)**     -1.448 (4.27)**
lnage                    -0.127 (1.41)       -0.160 (3.14)**
firstyr                   2.802 (9.88)**      0.867 (3.04)**
net_tax_refund_2018_ind   0.389 (1.89)       -0.075 (0.57)
num_months_owe           -0.098 (4.14)**     -0.026 (2.01)*
pra_lo                   -1.190 (2.19)*      -0.726 (2.10)*
red_ind                   2.538 (3.95)**      1.265 (1.94)
red_trans_monthly         1.965 (2.76)**      1.309 (2.63)**
dereg_hist                0.185 (0.39)        0.473 (1.76)
cred_notes                0.768 (4.65)**      0.008 (0.08)
late_filing_hist          1.140 (4.31)**      0.507 (2.53)*
num_missing               0.130 (4.15)**      0.115 (4.86)**
allnil                   -0.946 (3.71)**     -0.429 (2.48)*
intraeu_pos              -0.590 (3.28)**     -0.318 (3.60)**
ninepct                  -0.692 (1.10)        0.489 (2.53)*
zerorated                 0.597 (3.33)**      0.185 (1.92)
hi_totpurch_shr           0.499 (1.97)*       0.612 (4.33)**
base_to_totrev           -0.674 (1.96)*      -0.455 (2.25)*
refund_red                1.313 (2.57)*       0.746 (1.89)
refund_inc                0.370 (2.83)**      0.095 (1.40)
rel_eff_vat_rate_newbus   0.157 (1.19)       -0.250 (1.55)
rel_share_intra_eu        0.026 (1.97)*       0.007 (0.83)
ztp_ieu                   0.496 (1.86)        0.338 (2.55)*
ptz_9pct                -20.000               1.156 (2.77)**
hipratechg               -0.219 (0.68)        0.392 (2.18)*
hipbchg                   0.538 (2.46)*       0.271 (2.22)*
hinbchg                   0.673 (4.29)**      0.396 (4.59)**
hinieuchg                -0.299 (1.09)        0.206 (1.51)
hintptschg                0.061 (0.35)        0.373 (4.19)**
no_pra_score             -0.679 (2.37)*       0.007 (0.03)
lnowe                     0.020 (0.60)       -0.019 (0.92)
rel_tax_purch_to_sales   -0.007 (0.05)        0.188 (2.53)*
ptz_0rate                 0.323 (1.15)        0.150 (0.97)
ztp_for                   0.131 (0.51)        0.268 (2.09)*
_cons                    -3.666 (4.61)**     -1.762 (3.95)**
* p<0.05; ** p<0.01

Table A7:
Estimation Results, High-Risk Balance-Due Sample VAT-only Joint Audit net_tax_paid -0.000 0.000 (1.05) (1.01) medium -4.275 -3.001 (11.93)** (11.29)** large -4.465 -2.898 (6.65)** (6.49)** northcentral 0.049 -0.120 (0.23) (0.68) northwest 0.281 0.237 (1.22) (1.26) southeast 0.893 1.300 (3.02)** (4.88)** southcentral 0.485 0.541 (2.94)** (3.99)** southwest 0.708 -0.077 (4.89)** (0.62) nace_2_alt 0.246 -0.503 (0.50) (1.07) nace_3 -0.188 -0.492 (0.62) (1.90) nace_6 0.584 0.217 (2.01)* (0.86) nace_7 0.002 -0.349 (0.01) (1.49) nace_8 -0.454 -0.476 (1.44) (1.80) nace_9 0.431 -0.484 (0.79) (1.03) nace_10 0.501 0.440 (1.63) (1.70) nace_11 -0.759 -0.787 (2.20)* (2.60)** nace_12_baldue 0.686 -0.033 xx (1.52) (0.07) nace_13 0.439 0.332 (1.07) (0.91) nace_14 -0.206 -0.518 (0.68) (1.96)* nace_15 0.284 0.097 (0.83) (0.33) nace_17 -0.514 -0.409 (0.88) (0.75) nace_18 -0.244 -0.727 (0.66) (2.07)* nace_19 0.060 -0.419 (0.15) (1.15) lnbase 0.299 0.264 (10.83)** (9.88)** zerobase 2.003 1.535 (4.89)** (3.49)** noemp -0.979 -1.130 (7.02)** (8.68)** under6emp -0.424 -0.642 (3.64)** (5.90)** lnnumemp -0.719 -0.735 (11.44)** (12.76)** missprof 0.743 0.780 (3.67)** (4.19)** lnprofit -0.009 -0.021 (0.90) (2.39)* singlellc 0.372 0.435 (3.65)** (4.66)** civil -1.637 -1.205 (3.48)** (2.85)** jointstock -0.440 -0.824 (1.70) (3.11)** soletrader -3.366 -2.233 (3.27)** (4.92)** other 0.422 -1.287 (1.15) (2.57)* newfirm -1.467 -0.629 (8.77)** (3.50)** under5 -0.159 -0.217 (1.76) (2.61)** firstyr 1.689 0.444 (9.00)** (2.33)* net_tax_refund_2018_ind -0.146 -0.392 (0.74) (2.25)* zerovat -0.369 -0.529 xxi (1.63) (2.53)* lnowe -0.104 -0.136 (3.66)** (5.19)** num_months_owe -0.034 0.016 (2.01)* (1.04) red_ind 0.803 0.312 (4.31)** (1.62) red_trans_monthly 1.833 1.307 (13.53)** (10.02)** num_red_trans 0.298 0.211 (11.24)** (7.94)** vies 0.263 0.884 (0.53) (2.09)* cred_notes 0.130 -0.217 (1.10) (1.93) intraeu_disc 0.428 0.377 (3.41)** (3.21)** other_risk 0.827 0.559 (6.20)** (4.61)** missing_months 0.036 
0.018 (0.18) (0.09) num_missing 0.062 0.009 (2.74)** (0.38) allnil -0.868 -0.278 (2.39)* (0.70) intraeu_neg -0.162 -0.342 (0.51) (1.14) credbase_30 -0.316 -0.225 (3.29)** (2.48)* hi_totpurch_shr 0.335 0.135 (2.52)* (1.07) rel_share_zerorate 0.011 0.001 (1.91) (0.17) rel_tax_purch_to_sales -0.117 -0.127 (2.48)* (2.60)** ztp_vat -0.506 -0.760 (1.52) (2.47)* ptz_vat 0.483 0.037 (1.25) (0.10) ztp_ieu 0.106 0.425 (0.62) (2.76)** ptz_ieu 0.232 0.024 (1.18) (0.14) ztp_9pct 0.094 1.137 (0.10) (1.64) ztp_for 0.154 0.291 xxii (0.88) (1.86) hipratechg 0.589 0.405 (2.01)* (1.53) hinbchg 0.358 0.510 (2.69)** (4.26)** hinieuchg -0.112 0.323 (0.52) (1.74) hintptschg 0.250 0.551 (1.45) (3.73)** no_pra_score -0.555 -0.701 (2.01)* (2.92)** ninepct -0.400 0.296 (0.98) (0.93) tax_red -0.136 -0.045 (1.61) (0.58) ptz_9pct -0.848 0.609 (0.62) (0.76) hinratechg -0.016 -0.278 (0.05) (1.03) _cons -2.977 -0.708 (5.45)** (1.46) * p<0.05; ** p<0.01 Appendix B: Breakdowns of Statistical Matching VAT Reporting Gap Estimates B.1 Matching-on-Level Results Table B1: Breakdown of Baseline VAT Reporting Gap Estimates by Refund/Balance-Due Status and Taxpayer Size, Amounts in BGN Millions (Noncompliance Rate in Parentheses) Refund Balance Due Total Small 255.0 896.1 1,151.1 (1.84%) (3.90%) (12.61%) Medium 4.2 7.3 11.5 (1.28%) (0.62%) (1.37%) Large 4.9 7.0 11.9 (0.13%) (0.14%) (0.80%) xxiii Table B2: Breakdown of Baseline VAT Reporting Gap Estimates by Refund/Balance-Due Status and Sector, Amounts in BGN Millions (Noncompliance Rate in Parentheses) Refund Balance Due Total Agriculture, forestry, fishing (NACE A) 17.12 18.98 36.10 (1.54%) (4.62%) (5.15%) Mining, electricity, water supply (NACE B, D, E) 5.54 13.23 18.77 (1.03%) (0.64%) (1.24%) Manufacturing (NACE C) 38.80 71.43 110.23 (0.40%) (1.18%) (2.99%) Construction (NACE F) 7.72 86.86 94.58 (2.10%) (2.02%) (2.41%) Wholesale & retail trade (NACE G) 69.22 309.79 379.01 (2.95%) (2.98%) (4.72%) Transportation & storage (NACE H) 23.45 44.49 67.94 
(2.87%) (4.07%) (24.71%) Accommodation (NACE 55) 3.00 11.12 14.12 (0.39%) (3.70%) (2.98%) Food & beverage service (NACE 56) 1.93 37.69 39.62 (5.55%) (6.16%) (6.86%) Information & communication (NACE J) 6.38 28.48 34.86 (0.79%) (2.02%) (5.84%) Finance, insur., educ., health, & soc. work (NACE K, P, Q) 11.82 (4.47%) 3.89 37.89 Arts, entertainment, recreation (NACE R) (2.34%) 4.66 (17.70%) (9.73%) Other service activities (NACE S) 17.52 (25.60%) Real estate activities (NACE L) 12.62 21.30 33.92 (2.72%) (4.34%) (123.96%) Professional, scientific, & technical activities (NACE M) 58.87 63.40 122.27 (22.22%) (11.46%) (42.39%) Administration and support service activities (NACE N) 4.97 25.58 30.55 (3.23%) (2.36%) (3.28%) Unknown (missing NACE code) 10.57 144.00 154.57 (3.98%) (81.17%) (174.95%) . xxiv Table B3: Breakdown of Baseline VAT Reporting Gap Estimates by Refund/Balance-Due Status and Region, Amounts in BGN Millions (Noncompliance Rate in Parentheses) Refund Balance Due Total Northeast 23.85 87.62 111.47 (1.20%) (2.46%) (7.09%) Northcentral 12.00 61.04 73.04 (0.51%) (3.11%) (18.89%) Northwest 7.20 37.97 45.17 (0.60%) (3.44%) (42.47%) Southeast 6.30 41.14 47.44 (0.51%) (2.09%) (6.41%) Southcentral 46.30 162.51 208.81 (0.93%) (2.16%) (8.18%) Southwest 168.44 520.07 688.50 (2.76%) (3.95%) (9.74%) B.2 Matching-on-Rate Results Table B4: Breakdown of Baseline VAT Reporting Gap Estimates by Refund/Balance-Due Status and Taxpayer Size, Amounts in BGN Millions (Noncompliance Rate in Parentheses) Refund Balance Due Total Small 218.7 1,112.5 1,331.2 (1.58%) (4.84%) (14.58%) Medium 1.3 7.9 9.3 (0.40%) (0.67%) (1.11%) Large 6.5 17.3 23.8 (0.18%) (0.33%) (1.61%) xxv Table B5: Breakdown of Baseline VAT Reporting Gap Estimates by Refund/Balance-Due Status and Sector, Amounts in BGN Millions (Noncompliance Rate in Parentheses) Refund Balance Due Total Agriculture, forestry, fishing (NACE A) 10.65 13.45 24.10 (0.96%) (3.27%) (3.44%) Mining, electricity, water supply (NACE B, D, E) 
5.57 39.25 44.82 (1.04%) (1.91%) (2.95%) Manufacturing (NACE C) 32.66 113.67 146.34 (0.34%) (1.88%) (3.97%) Construction (NACE F) 12.32 152.40 164.72 (3.36%) (3.55%) (4.19%) Wholesale & retail trade (NACE G) 71.15 564.65 635.80 (3.03%) (5.44%) (7.91%) Transportation & storage (NACE H) 10.24 32.08 42.32 (1.25%) (2.94%) (15.39%) Accommodation (NACE 55) 4.39 8.98 13.37 (0.57%) (2.90%) (2.82%) Food & beverage service (NACE 56) 2.92 21.87 24.79 (8.38%) (3.57%) (4.29%) Information & communication (NACE J) 2.92 47.42 50.34 (0.36%) (3.37%) (8.43%) Finance, insur., educ., health, & soc. work (NACE K, P, Q) 14.90 (1.51%) Arts, entertainment, recreation (NACE R) 2.00 15.30 2.01 (4.17%) (7.15%) (1.21%) Other service activities (NACE S) 7.30 (10.66%) Real estate activities (NACE L) 7.56 14.90 22.45 (1.63%) (3.04%) (82.06%) Professional, scientific, & technical activities (NACE M) 52.30 36.67 88.97 (19.74%) (6.63%) (30.85%) Administrative and support service activities (NACE N) 1.64 16.98 18.62 (1.07%) (1.56%) (2.00%) Unknown (missing NACE code) 10.25 62.09 72.35 (3.86%) (35.0%) (81.89%) xxvi Table B6: Breakdown of Baseline VAT Reporting Gap Estimates by Refund/Balance-Due Status and Region, Amounts in BGN Millions (Noncompliance Rate in Parentheses) Refund Balance Due Total Northeast 21.52 61.32 82.84 (1.08%) (1.72%) (5.27%) Northcentral 9.61 39.82 49.43 (0.41%) (2.03%) (12.79%) Northwest 7.92 25.33 33.25 (0.65%) (2.30%) (31.26%) Southeast 6.03 60.52 66.55 (0.49%) (3.08%) (8.99%) Southcentral 39.14 207.94 247.09 (0.79%) (2.76%) (9.68%) Southwest 142.37 742.77 885.14 (2.33%) (5.64%) (12.52%) Appendix C: Breakdowns of Entropy Balancing VAT Reporting Gap Estimates – Higher Cutoff Rule Table C1: Breakdown of Baseline VAT Reporting Gap Estimates by Refund/Balance-Due Status and Taxpayer Size, Amounts in BGN Millions (Noncompliance Rate in Parentheses) Refund Balance Due Total Small 372.0 1,427.0 1,799.0 (2.69%) (6.21%) (19.71%) Medium 15.7 14.1 29.9 (4.74%) (1.21%) (3.56%) Large 4.3 
6.2 10.5 (0.12%) (0.12%) (0.71%) xxvii Table C2: Breakdown of Baseline VAT Reporting Gap Estimates by Refund/Balance-Due Status and Sector, Amounts in BGN Millions (Noncompliance Rate in Parentheses) Refund Balance Due Total Agriculture, forestry, fishing (NACE A) 12.19 136.32 148.51 (1.10%) (33.16%) (21.20%) Mining, electricity, water supply (NACE B, D, E) 4.39 8.35 12.74 (0.82%) (0.41%) (0.64%) Manufacturing (NACE C) 36.76 41.88 78.65 (0.38%) (0.69%) (2.13%) Construction (NACE F) 51.97 77.67 129.64 (14.16%) (1.81%) (3.30%) Wholesale & retail trade (NACE G) 117.95 578.34 696.29 (5.02%) (5.56%) (8.67%) Transportation & storage (NACE H) 17.48 31.64 49.11 (2.13%) (2.90%) (17.86%) Accommodation (NACE 55) 17.74 6.61 24.35 (2.29%) (2.20%) (5.13%) Food & beverage service (NACE 56) 4.08 60.83 64.90 (11.72%) (9.94%) (11.24%) Information & communication (NACE J) 4.43 93.49 97.92 (0.55%) (6.64%) (16.39%) Finance, insur., educ., health, & soc. work (NACE K, P, Q) 44.65 (16.90%) Arts, entertainment, recreation (NACE R) 1.95 71.02 12.88 (4.06%) (33.17%) (7.74%) Other service activities (NACE S) 11.54 (16.86%) Real estate activities (NACE L) 13.80 23.24 37.04 (2.98%) (4.74%) (135.38%) Professional, scientific, & technical activities (NACE M) 64.58 140.03 204.61 (24.37%) (25.30%) (70.94%) Administrative and support service activities (NACE N) 22.86 38.35 61.21 (14.85%) (3.53%) (6.57%) Unknown (missing NACE code) 10.97 152.39 163.37 (4.13%) (85.90%) (184.91%) xxviii Table C3: Breakdown of Baseline VAT Reporting Gap Estimates by Refund/Balance-Due Status and Region, Amounts in BGN Millions (Noncompliance Rate in Parentheses) Refund Balance Due Total Northeast 18.71 88.56 107.27 (0.94%) (2.48%) (6.82%) Northcentral 28.57 224.97 253.54 (1.22%) (11.47%) (65.58%) Northwest 58.59 320.14 378.73 (4.84%) (29.03%) (356.12%) Southeast 4.54 43.74 48.28 (0.37%) (2.23%) (6.52%) Southcentral 28.03 114.03 142.06 (0.56%) (1.51%) (5.56%) Southwest 253.64 655.83 909.47 (4.15%) (4.98%) (12.86%) xxix
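As a quick consistency check on the breakdown tables, the amounts in the Refund and Balance Due columns should add up to the Total column, up to rounding. The following is a minimal Python sketch using the Table C1 amounts; the tolerance and the function name `additivity_gaps` are illustrative assumptions, not part of the report's methodology. The rates in parentheses do not sum across columns in the tables, so only the amounts are checked.

```python
# Amounts in BGN millions, copied from Table C1 as (refund, balance due, total).
table_c1 = {
    "Small": (372.0, 1427.0, 1799.0),
    "Medium": (15.7, 14.1, 29.9),
    "Large": (4.3, 6.2, 10.5),
}

# Assumed tolerance: every figure is rounded to one decimal place, so the
# published total can differ from the sum of the two published components
# by up to 0.15 (three rounding errors of at most 0.05 each).
TOLERANCE = 0.15


def additivity_gaps(table):
    """Return, for each row, total minus (refund + balance due)."""
    return {
        name: round(total - (refund + due), 2)
        for name, (refund, due, total) in table.items()
    }


gaps = additivity_gaps(table_c1)
for name, gap in gaps.items():
    status = "ok" if abs(gap) <= TOLERANCE else "check"
    print(f"{name}: gap = {gap:+.2f} ({status})")
```

The same function could be applied to the sector and region tables, bearing in mind that rows with jointly reported cells would need to be aggregated before the comparison.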