99223

The Causes of Crime and Violence: A Guide for Empirical Researchers

Pablo Fajnzylber, Daniel Lederman and Norman Loayza¹

February 1999

1. Crime Data Sources

To facilitate comparisons across the case studies being commissioned for this project, some preliminary analyses should be common to all of them. As a first step, each study should describe the available sources of crime data. All possible sources should be considered at this point, even those that, because of limitations of the data, will not be used in the statistical analysis of the causes of crime. This analysis should include a detailed international comparison of the types of data sources available in several countries.

The study clearly has two functions. The first is to measure violent crime in each country or city and its trend over time. The second is to address the causes of violent crime. The same tools, i.e. the basic data sets, will be used for each task. In addition, all the studies should shed some light on the particular causes of violent crime in their countries by comparing them to other countries. Hence, an international focus is essential in these studies.

The first type of data set is collected from victimization surveys. This is unquestionably the primary workhorse for measuring crime. To be really useful, this sort of data must have geographic identifiers that enable the researcher to link the individual to the community in which he lives and was robbed. The researcher then also needs to know the attributes of that community. These attributes may be drawn from broader national surveys or the national census.

The second type of data set relies on official crime statistics. This data set is useful, but its value depends on the quality of the data on reported crimes (and arrests). With this sort of data at the locality level, one might run regressions connecting city-level characteristics with the level of reported crime.
However, reporting (and recording) problems in some of the countries included in this project are severe enough that this type of data set has many shortcomings.

The third type of data set involves homicide (or intentional injury) data from hospitals and morticians. This type of data is generally more reliable than official crime statistics. It is also less sensitive to changing definitions of crimes across cultures. In principle, data on the victim can also provide a rich set of stylized facts about the nature of homicides in the country in question.

¹ Fajnzylber is Professor of Economics at the Federal University of Minas Gerais, Belo Horizonte, Brazil; Lederman is Economist with the Office of the Chief Economist for Latin America and the Caribbean (LAC) of the World Bank; Loayza is Senior Economist with the Central Bank of Chile and is on leave for the Development Economics Research Group of the World Bank. The document was written to serve as the terms of reference for the case studies to be written by Latin American research groups in five countries for the project, "Crime in Latin American Cities," supported by the Regional Studies Program of the LAC Region of the World Bank. We are grateful for the suggestions and advice provided by professors Edward Glaeser (Harvard University) and Jeffrey Grogger (UCLA).

A final data source is the offender survey. This type of survey can be conducted through traditional survey methods, where respondents are asked whether they have been arrested (or, less reliably, whether they have committed a crime). Alternatively, these surveys can be done at the point of arrest, or through surveys of the prison population. For that type of survey to be effective, it must be assumed that the police arrest a relatively random sample of the population of criminals. When this type of data set is merged with data on the population at large, it is possible to identify how criminals differ from average citizens.
In addition, anecdotes and case studies may present valuable information on questions where data is hard to come by. Together, all of these data sets help both to form a picture of the level of crime and to identify its potential causes.

The presentation of the available sources of crime, victimization and offender data should be summarized in a table with four columns: Data Source, Coverage, Description, and Observations. The second column should provide information on both the spatial and the temporal coverage of each data source. In the case of the data from health and police reports, this column should also indicate whether crime and population data is available at the sub-city level (e.g. police precincts, neighborhoods or census units). The column labeled "Description" should include the list of variables available from each source, such as the types of crimes covered and, in the case of victimization and offender surveys, the information provided by the surveys on the demographic and/or socioeconomic characteristics of victims and/or offenders. The column labeled "Observations" should comment on possible limitations of the data and/or the methodologies used for its construction.

2. Descriptive Statistics on Crime and Victimization Rates

The second preliminary analysis to be undertaken by all case studies is the presentation of descriptive statistics on crime and victimization rates. Each study should produce a reliable estimate of the amount of violent crime in each country based on a variety of sources. The study should show both the level of crime, in a way that will be comparable across studies, and the trend of crime for as long a period as possible. If possible, crime rates should be presented at the national, the city and the sub-city level. This preliminary analysis should be carried out in a comparative framework, where each country's statistics are compared and contrasted with those of other countries.
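As an illustration of presenting levels and trends in a comparable way, the following minimal sketch expresses counts as rates per 100,000 population (the convention this guide uses for homicide rates) and computes the year-on-year change. The function names and all figures are hypothetical and serve only to show the arithmetic; they are not data from any of the case studies.

```python
def rate_per_100k(count, population):
    """Crime rate expressed per 100,000 inhabitants."""
    if population <= 0:
        raise ValueError("population must be positive")
    return 100_000 * count / population


def rate_trend(counts_by_year, population_by_year):
    """Return {year: rate per 100,000} for the years present in both inputs."""
    years = sorted(set(counts_by_year) & set(population_by_year))
    return {y: rate_per_100k(counts_by_year[y], population_by_year[y])
            for y in years}


def pct_change(series):
    """Year-on-year percentage change of a {year: value} mapping."""
    years = sorted(series)
    return {cur: 100 * (series[cur] - series[prev]) / series[prev]
            for prev, cur in zip(years, years[1:])}


# Hypothetical homicide counts and populations for one city.
homicides = {1995: 120, 1996: 150, 1997: 135}
population = {1995: 1_000_000, 1996: 1_000_000, 1997: 900_000}

rates = rate_trend(homicides, population)   # homicide rate by year
changes = pct_change(rates)                 # percentage change by year
```

In this made-up example the count falls between 1996 and 1997 while the rate does not, because the population also falls. This is why levels are more comparable across cities and over time when expressed as rates rather than counts.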
In order to provide as complete a picture as possible of the crime problem in each city, all the available sources should be used, and an analysis of their consistency should be undertaken. It would be particularly useful to see comparisons of homicide rates (or even homicide counts) from police and public health (or coroner's) reports. The comparison of homicide data from different sources will help provide a gauge of the adequacy of each country's aggregate data collection systems. Likewise, each study should include a simple analysis of age-offending (and age-victimization) profiles for major categories of crimes. Time and age profiles have proven invaluable to the development of theory and the evaluation of policy in the United States, and it seems likely that they would prove similarly valuable in the countries represented in this project.

The suggested steps to be taken by each study are the following:

i) Aggregate data on homicides from health authorities
- Trend data should be presented for as long a period as possible, at both the national and the city level.
- If possible, homicide rates (per 100,000 population) should be calculated for different demographic groups, formed on the basis of gender and age (i.e. males, females, population aged less than 15, between 15 and 34, and older than 34).
- If data is available at the sub-city level, summary statistics of the distribution of homicide rates within the city should be presented.

ii) Aggregate data on homicides and other violent crimes using police data
- Trend data on crime rates (per 100,000 population) should be presented for as long a period as possible, at both the national and the city level.
- If data is available at the sub-city level, summary statistics of the distribution of homicide and other crime rates within the city should be presented. The bivariate correlation between the various crime rates should be calculated.
- The consistency of the homicide data from health and police sources should be analyzed: first, by comparing the levels and the trends of the crime rates calculated with data from the different sources; second, by calculating the correlation between the homicide rates calculated at the sub-city level using health and police data.

iii) Victimization surveys
- For each city and survey, a summary table should be presented containing the mean victimization rates (the ratio of victims to the total number of individuals sampled) by type of crime and by type of individual (e.g. the victimization rate of young men).

3. The Causes of Crime

As emphasized by Professor Glaeser, we can think of the determinants of crime as falling into six separate categories: (1) opportunity cost of time, (2) social disorder, (3) returns to crime, (4) drugs and gangs, (5) weapons and (6) enforcement. Below we review the basic ways in which these different theories can be tested, and then present a suggested set of analyses to be undertaken by each study to investigate the possible determinants of crime rates and of the probability of victimization.

The opportunity cost of time is customarily measured by wages or unemployment, or by other measures of poverty. In victimization studies, this theory is tested with wage variables at the community level (so the subject's probability of victimization is regressed on the average wage level in his community). It is also tested with the characteristics of the victim; the assumption for this type of test is that victims often resemble criminals. The theory can also be tested with cross-sectional regressions in which the poverty, inequality, wages or unemployment of an area are used to explain the level of crime. Ideally, it would be tested with offender-level data, where one can examine whether offenders with fewer opportunities in the legal sector are more likely to engage in crime.
Another recommendation that is applicable to most of the proposals is that, if possible, wages AND unemployment rates should both be used as explanatory variables to estimate the effects of the labor market on crime. For years, researchers in the United States have noted that unemployment and crime are largely uncorrelated, and have concluded that labor market factors play little role in determining crime. Grogger (1998), however, analyzed this issue with individual-level data from the U.S. and found that wages were a strong determinant of crime even though crime was essentially independent of employment. The lesson is that the link between crime and the labor market is best measured using wages. The project would benefit from examining this particular question, since these findings for the U.S. may not hold in certain contexts that are common in developing countries (e.g., lack of effective unemployment insurance).

Social disorder (and social capital) are generally measured with variables such as the level of migration into the area, the number of single-parent families, survey-type questions on trust, and other non-economic variables. All studies should be able to test this hypothesis by using this type of variable at the community level together with victimization data. Cross-sectional or offender-level data can also be used to test this hypothesis.

The returns to crime are generally measured by the proximity of rich people or businesses (or tourists) to poor people. This can be tested by using community-level variables in cross-sectional studies. In the victimization studies, it can be tested by examining whether higher individual income raises the individual's probability of victimization, holding the level of poverty in the community constant.

Drugs and weapons are much harder to measure. In principle, disruptions to the drug trade or exogenous inflows of guns might be measured in particular cases. Individual studies should certainly try to do this when possible.
However, measurement difficulties make it hard to be certain that either of these theories can be tested reliably.

Enforcement has a major identification problem because countries tend to focus enforcement activities where there are higher levels of crime. Thus, empirically, we tend to find a spurious positive correlation between the level of enforcement and the level of crime. However, it is still important to try to measure enforcement and to examine how it differs across space and what the correlation between enforcement and crime may be. When reliable data on reported crimes and arrests is available, the arrest rate may serve as a plausible proxy for enforcement. Alternatively, measures of the number of police personnel or of expenditures on police may be used.

One way to deal with the spurious correlation between crime and enforcement variables is to use an instrumental variables approach to estimate the effect of law enforcement expenditures on violent crime rates. As mentioned, the problem with simply regressing crime rates on expenditures is endogeneity: governments raise law enforcement spending when crime is high, so one typically finds a positive relationship between crime and spending on police. What one needs is an instrumental variable that isolates spending increases that occur for reasons other than an increase in crime; Levitt (1997) proposes using electoral cycles, on the basis that the party in power may raise police spending in order to elicit votes. The approach works well on U.S. data, where there is political competition. Yet we understand that finding "good" instruments may be troublesome, and we leave it up to each research team to assess whether a particular instrumental variables approach is feasible. Nevertheless, if proper instruments are not found and the regressions are run without instruments, the interpretation of the resulting coefficients must consider the problem of endogeneity explicitly.
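The endogeneity problem described above can be illustrated with a minimal sketch of the simplest instrumental variables estimator: one endogenous regressor and one instrument, where the slope is cov(z, y) / cov(z, x). This is not the specification used by Levitt (1997); the data are constructed by hand, and the variable names (`election_year`, `police`, `crime`, `shock`) are hypothetical. The unobserved `shock` raises both crime and police spending, and the instrument is built to be uncorrelated with it.

```python
def mean(xs):
    return sum(xs) / len(xs)


def cov(xs, ys):
    """Sample covariance (dividing by n; the divisor cancels in the ratios)."""
    mx, my = mean(xs), mean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / len(xs)


def ols_slope(x, y):
    """Slope from regressing y on x and a constant."""
    return cov(x, y) / cov(x, x)


def iv_slope(z, x, y):
    """Just-identified IV slope: instrument z for regressor x."""
    first_stage = cov(z, x)
    if first_stage == 0:
        raise ValueError("instrument is uncorrelated with the regressor")
    return cov(z, y) / first_stage


# Constructed (hypothetical) data for four city-years.
election_year = [0, 1, 0, 1]      # instrument z
shock = [1, 1, -1, -1]            # unobserved crime shock, cov(z, shock) = 0
# Police spending responds to elections AND to the crime shock (endogeneity).
police = [10 + 2 * z + 3 * u for z, u in zip(election_year, shock)]
# True effect of police on crime is -1 by construction.
crime = [50 - 1 * p + 4 * u for p, u in zip(police, shock)]

naive = ols_slope(police, crime)                   # biased upward by the shock
iv = iv_slope(election_year, police, crime)        # recovers the true effect
```

With these constructed numbers the naive regression slope is positive even though the true effect is negative, while the instrumented estimate equals the true effect. In real data the instrument's validity cannot be verified this way and has to be argued case by case.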
In many countries, the police are themselves a major part of the crime problem. Instead of acting as servants of the public, police often support criminal activities and engage in them themselves. To measure this, questions regarding crimes by the police must be included in victimization studies. Anecdotal evidence (hopefully with some data) can also help to explore this issue.

Below we present the minimum steps that each study should take in order to implement the analysis of the causes of crime. It is worth emphasizing that these steps constitute a lower, rather than an upper, bound for each study. Certainly each study would do well to present as much data and qualitative analysis as possible, using a wide range of different data sources and highlighting the issues that are most relevant for the city or country. For instance, in the case of Brazil, the dissimilar incentives faced by the police in Rio and Sao Paulo deserve special attention; and in the case of El Salvador, the remains of the civil war (a demobilized army, a large number of weapons) must be taken into account. Similarly, in the case of Colombia, the drug trade is without doubt the most important issue. Thus, we urge each research team to deal with the issues that make crime and crime prevention unique in each city. Hopefully, they can be dealt with in a statistically rigorous way in the context of the analysis of victimization and crime rates. However, we understand that in some cases qualitative rather than econometric analysis is called for. In any case, what we are asking of every research team is to go well beyond the minimum requirements for the study and to address in an original way the issues that are most relevant to their cities.

i) Individual Risk of Victimization
- Victimization surveys constitute the main data source for the study of the determinants of individual risks of victimization.
However, as already mentioned, if geographic identifiers are available that allow the researcher to link individuals to the communities in which they live and/or were victimized, those surveys should be complemented with data on the social and economic characteristics of the corresponding communities. This data should be extracted from broader national household surveys and/or the national census.
- Before proceeding to the estimation of a model of the individual risk of victimization, each study should present a table with a summary description of the explanatory variables that will be used in the analysis. This table should have four columns: Name of the Variable, Unit of Analysis (individual, household, neighborhood, police precinct, etc.), Data Source, and Year of Reference (e.g. the census year when this is the data source).
- Next, a table with summary statistics for the explanatory variables (presented in the preceding table) should be constructed. This table should have six columns: Name of the Variable, followed by the Mean (with the Standard Deviation in parentheses) for each of five groups: the whole sample, non-victims, victims of non-economically motivated crimes, victims of economically motivated violent crimes, and victims of non-violent economically motivated crimes.
- The explanatory variables to be considered can be classified under three categories. Below, we present a tentative list of variables (and types of variables) that should be considered in each of the three categories. Of course, the specific variables to be used in each study will depend on the type and amount of data available for each city, as well as on the possibility of linking each surveyed individual/household to some particular aggregate, or community, for which socioeconomic data is available.
a) individual demographic characteristics of the victims:
   - gender;
   - age;
   - marital status;
   - racial/ethnic group;
   - household/family characteristics (gender of the head of the household, etc.).

b) individual socioeconomic (and other) characteristics of the victims:
   - family income;
   - family income as a percentage of mean income in the sample;
   - educational achievement;
   - employment status;
   - home ownership status;
   - home characteristics;
   - social capital indicators (church attendance, participation in community organizations, etc.);
   - gun availability in the household;
   - drug and alcohol consumption;
   - value of property taken;
   - victim knew offender;
   - victim reported crime to the police;
   - an arrest was made in connection with the crime;

c) characteristics of the communities where the victims live:
   - income variables (average level, rates of change and distribution);
   - poverty indicators;
   - unemployment rate;
   - urban development indicators;
   - average educational achievement;
   - primary and secondary enrollment rates;
   - variables representing the strength of the police and justice system (arrest rates, police personnel and expenditures);
   - variables representative of the existence of profitable criminal activities (drug trafficking, gambling, etc.);
   - average demographic characteristics of the community;
   - average family characteristics (percent female head of household, etc.);
   - urban density;
   - gun availability in the community;
   - incidence of drug and alcohol consumption;
   - social capital variables at the community level.

- Finally, a regression analysis should be performed in order to determine the effect of the above-mentioned variables on the probability of being victimized. Probit or Logit techniques should be used. First, victimization should be regressed only on individual demographic characteristics. Second, victimization should be regressed on demographic, socioeconomic and other individual characteristics.
Finally, victimization should be regressed on individual and community characteristics.

It is likely that victimization surveys contain information on the type of crime suffered by the victim. Having various types of crime opens up the range of potential dependent variables. However, some grouping of crime types is necessary, first, to keep the econometric exercise to a manageable size and, second, to facilitate the interpretation of the estimated coefficients. We suggest the following categories of common crime: economically motivated crime and non-economically motivated crime. The first category can be subdivided into crime involving the use or threat of violence (for example, robbery and, in some cases, homicide and house burglary) and crime without violence (for example, theft and, in some cases, house burglary). We presume that all non-economically motivated crime involves violence (for example, assault, rape and, in some cases, homicide).

Having these categories in mind, our suggestion is to work with two types of discrete dependent-variable models, namely, binomial and multinomial models. In the case of binomial models, we suggest the analysis of the following "events" (where the alternative "event" for each of them is given implicitly):
1. Event: being a victim of any type of crime
2. Event: being a victim of an economically motivated crime (with or without violence)
3. Event: being a victim of a violent crime (whether or not economically motivated)
4. Event: being a victim of a non-economically motivated crime

In the multinomial case, we suggest the analysis of the following models:
1. Events: being a victim of a non-economically motivated crime, being a victim of an economically motivated crime, and being a non-victim.
2. Events: being a victim of a violent crime, being a victim of an economically motivated crime without violence, and being a non-victim.
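The groupings above can be turned into dependent variables with a small amount of recoding. The sketch below is one possible way to do it, assuming each survey record has already been reduced to three flags: whether the respondent was a victim, whether the crime was economically motivated, and whether it involved violence. The function name, the flag names and the category labels are hypothetical, and the sketch assumes a single incident per respondent.

```python
def code_events(victim, economic, violent):
    """Map one survey record to the four binomial events and the two
    multinomial outcomes suggested in the text.

    victim, economic, violent are booleans (hypothetical survey flags).
    """
    if not victim:
        return {
            "any_crime": 0, "economic": 0, "violent": 0, "non_economic": 0,
            "multinomial_1": "non_victim", "multinomial_2": "non_victim",
        }
    # The guide presumes that all non-economically motivated crime is
    # violent, so a non-economic, non-violent incident does not fit.
    if not economic and not violent:
        raise ValueError("non-economic crime is presumed to involve violence")
    return {
        # Binomial events 1 to 4 (the alternative is implicit).
        "any_crime": 1,
        "economic": int(economic),
        "violent": int(violent),
        "non_economic": int(not economic),
        # Multinomial model 1: non-economic / economic / non-victim.
        "multinomial_1": "economic" if economic else "non_economic",
        # Multinomial model 2: violent / economic without violence / non-victim.
        "multinomial_2": "violent" if violent else "economic_nonviolent",
    }


# Hypothetical records: a theft, a robbery, an assault and a non-victim.
theft = code_events(victim=True, economic=True, violent=False)
robbery = code_events(victim=True, economic=True, violent=True)
assault = code_events(victim=True, economic=False, violent=True)
nobody = code_events(victim=False, economic=False, violent=False)
```

Each binomial indicator can then serve as the dependent variable of a Probit or Logit model, and each multinomial label as the outcome of a multinomial model. Crimes that are only "sometimes" violent or economically motivated, such as homicide and house burglary, have to be classified incident by incident before this recoding is applied.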
We want to stress that the groupings and the econometric analysis using victimization-survey data depend largely on the characteristics of the available surveys. The suggestions given here attempt to homogenize all case studies, but they should be applied with flexibility by each research team. Thus, each team is free to study additional groupings of types of crime and their corresponding "events," in accordance with both the focus and the data availability of each case study.

Finally, a technical point. It is important that all regressions with community-level characteristics appropriately cluster the data to correct for correlation in the error terms within communities. The error term corresponding to each individual has two components, one corresponding to the individual himself (which is uncorrelated with other error terms) and the other corresponding to the community in which the crime was committed. The second component is thus common to all individuals victimized in a given community, which brings about a correlation between their error terms. This correlation must be controlled for in the process of estimation; otherwise, the estimated standard errors would be inconsistent. In addition, if the "community" component of the error term is correlated with any of the community-level variables, then the coefficient estimates will also be inconsistent. The way to control for the presence of the community component in the error term is to cluster observations by community and to allow community-specific intercepts in the estimation.

ii) Social and Economic Determinants of Crime Rates Over Time and Across City Subdivisions

Although the data from victimization surveys can be thought to be more reliable than the data from official health and police sources, the latter have the advantage of covering homicides and other violent crimes for which data is not available from victimization surveys.
Furthermore, official health and police statistics are produced more frequently than victimization surveys. For these reasons, it is important to perform, when possible, an analysis of the social and economic determinants of crime rates over time and across neighborhoods, precincts, census units or other city subdivisions for which data is available.
- A regression analysis should be performed using the homicide rate (and/or other crime rates) calculated at the sub-city level as the dependent variable. The explanatory variables should be the community-level variables constructed for the analysis of the determinants of victimization risks (the third group of variables listed above).
- The econometric techniques to be used will depend on the characteristics of the data set to be constructed. Ideally, if data is available both across units and over time, panel data techniques should be used. However, whenever possible, even in the context of a cross-sectional analysis, an effort should be made to construct appropriate instrumental variables that correct for the possible endogeneity of some of the explanatory variables, especially those related to the strength of the police and judicial system.
- At the city level, a complementary analysis consists of calculating the correlation between time series of crime rates (particularly homicide rates) and time series of variables representative of the socioeconomic characteristics of the cities (again from the third group of variables listed above).

5. Summary of the Tables (and Graphs) Related to the Measurement and Explanation of Crime and Victimization Rates

1) Table describing the available sources of crime data;
2) Tables and/or graphs with the levels and trends of homicide rates using data from public health reports: by demographic group and at the national, city, and sub-city levels;
3) Tables and/or graphs with the levels and trends of homicide and other crime rates using police data: at the national, city, and sub-city levels, and including correlations between the various crime rates;
4) Tables with mean victimization rates: by type of crime and by demographic group;
5) Table with a description of the variables to be used in the explanation of victimization and crime rates;
6) Table with summary statistics of the variables to be used in the explanation of victimization and crime rates: mean and standard deviation for the total sample, the victims only and the non-victimized individuals only;
7) Tables with the results of regressions of victimization rates (for the various types of crimes) on the explanatory variables described in the previous tables;
8) Tables with the results of regressions performed using crime rates at the sub-city level (calculated from police and/or public health data) as dependent variables;
9) Tables and/or graphs with the time-series correlation between crime rates at the city level and the explanatory variables used in the regressions above, also calculated at the city level.

References

Grogger, Jeff. 1998. "Market Wages and Youth Crime." Journal of Labor Economics 16(4): 756-791.

Levitt, Steven D. 1997. "Using Electoral Cycles in Police Hiring to Estimate the Effect of Police on Crime." American Economic Review 87(3): 270-290.