Probable Causation Podcast with Giovanni Mastrobuoni on Keycrime delia® Suite

by Jennifer Doleac &
Giovanni Mastrobuoni

Below is the complete transcript of the Probable Causation podcast #57 from August 31st, 2021, hosted by Jennifer Doleac of Texas A&M University. She interviewed Professor Giovanni Mastrobuoni, who is the Carlo Alberto Chair at Collegio Carlo Alberto and Professor of Economics at the University of Turin, on his paper “Crime is terribly revealing: Information technology and police productivity”. The paper was published in March of 2020 in the Review of Economic Studies (REStud) by Oxford University Press.

Probable Causation podcast #57: Giovanni Mastrobuoni

Jennifer Doleac (JD): Hello and welcome to Probable Causation, a show about law, economics and crime. I’m your host, Jennifer Doleac of Texas A&M University, where I’m an economics professor and the director of the Justice Tech Lab. My guest this week is Giovanni Mastrobuoni. Giovanni is the Carlo Alberto Chair at Collegio Carlo Alberto and Professor of Economics at the University of Turin. Giovanni, welcome to the show. Today we’re going to talk about your research on how predictive policing technology affects crime. But before we get into that, could you tell us about your research expertise and how you became interested in this topic?

Giovanni Mastrobuoni (GM): Ok, so I see myself as an empirical public and labor economist, so I see criminals as just a different type of worker that we analyze. The way I got into this subject is right after my PhD in 2006, I moved back to Italy and that happened a few months after a massive collective pardon took place. So imagine a third of the prison population was released within a few weeks – and at the same time the Minister of Justice, his name was Mastella, was arguing that there would be no increase in crime. So my first reaction was, you know, how can this be? You know, why would you keep people in prison if you’re not expecting any recidivism, right? So I started looking for data. I realized that there were no data other than in paper format. After a lot of work, I managed to get these into electronic format. I found out that there had been several collective pardons: on average, after World War II, every five years, and so in my first crime paper, I use these pardons to estimate the incapacitation effect of prison time. That project then led to additional projects.

The first one was with the Italian Banking Association and they were interested because after the 2006 pardon, bank robberies doubled within a month. So they gave me data that allowed me to study bank robberies, and that led to sort of a paper on the disutility of prison time, which I presented at the Italian Banking Association. And it happened that there was a police officer in the audience and he called me up later in the evening on my cell phone – which was a bit scary, I thought – and he told me: look, I have even better data than what you analyzed. So we started talking and talking and talking. And after a while he trusted me and shared the data, and that led to a couple of papers. One is the paper on predictive policing. The other paper is where I estimate the effect of police presence on clearance rates.

JD: Yes, your paper that we’re going to talk about today is titled “Crime is terribly revealing: Information technology and police productivity”. It was published in the Review of Economic Studies in 2020. So big picture, what is predictive policing?

GM: OK, so the way I see predictive policing is as an evolution of hot spots policing. The main idea is to use data about the past, about past crimes, to predict future ones and then deploy police forces accordingly. Of course this can be done with different degrees of, let’s call it statistical sophistication, and potentially with different goals, mainly deterrence versus incapacitation. So for example, if the idea is to deter criminals from committing crimes, you want the police patrols to be highly visible. So this is, in a nutshell, what predictive policing is.

JD: And how common is this sort of technology in police departments?

GM: So this is hard to say. I tried to look into it, but we don’t really have reliable statistics on this. It’s pretty clear that it’s growing fast and it’s growing also together with statistical models. The kind of data I managed to find is that for the US we have that between 1987 and 2003, the proportion of agencies that use information technology more generally for criminal investigations, dispatch and fleet management went up by a lot: from 11%, 9% and 7% in 1987 to 59%, 58% and 34% in 2003. And we know that in 2013, about 90% of agencies used information technology for analysis. Several of these also use mapping strategies. The main issue with sort of measurement here is that, rather than being driven by national strategies, the adoption of predictive policing is often in the hands of individual law enforcement agencies. And so there are no statistics or numbers available. What I managed to do in the paper is to simply count the number of news articles that feature the main market leaders in predictive policing, which are PredPol, HunchLab and Precobs. And basically I show that it’s growing exponentially over time.

JD: So this is pretty common, growing exponentially in its use. So before this paper, what had we known about whether this technology works, and what the effects are on policing outcomes like clearances or crime rates?

GM: I really couldn’t find much, so the main focus in the literature was about the ability to predict crimes. So that’s what you see, especially in a couple of papers by Mohler et al. that were published in the Journal of the American Statistical Association, showing that PredPol (which, by the way, was their own product) was better at predicting crime than crime analysts. So there was no paper that sort of looked at crime rates or clearance rates. And of course they were advertising PredPol, which they used in Santa Cruz, by saying that PredPol had managed to sort of reduce crime in a pre-post analysis. So there was no control group, and the crime reductions were quite large.

JD: And so that kind of leads into why we don’t know more than we do, and you’re alluding here to the identification challenge. Just looking at pre-post is not going to tell us the answer about whether this particular technology is what caused the change. So more broadly, what were the challenges that you had to overcome as you’re thinking about how to answer those questions? Is it mostly data access or is it mostly identification or both?

GM: Ok, I think it’s difficult for several reasons. Now first of all, we can imagine that the introduction of these new strategies is endogenous. So for example, the Santa Cruz Police Department decided to use predictive policing, PredPol, after an unprecedented crime wave. Now we know as statisticians that this implies that mean reversion could potentially explain the following reduction in crime. So this is the first issue. So if you don’t have a control group, you know you’re not going to solve that issue. Second, especially we economists, we’re sort of afraid that crime displacement may actually undo the effects on crime. So you introduce predictive policing, you generate deterrence and criminals just go somewhere else. The net effect might be much, much smaller than the one you measure. And finally, especially with respect to predictive policing, I think there is the issue that the type of arrests that are made could potentially be selective. Meaning that patrols may cherry pick the more predictable and potentially poorly organized crimes, therefore overstating the effectiveness of predictive policing.

JD: And so in this paper you’re going to focus on a specific predictive policing technology developed by an analyst in Milan, Italy. It’s called KeyCrime. So what does KeyCrime do?

GM: So KeyCrime is a bit different from its competitors in that it focuses on incapacitation rather than deterrence. So the main aim is to improve the officers’ role as apprehension agents. And so rather than predicting aggregate crime rates, what KeyCrime does is to try to predict individual robberies through crime linking. So there’s a whole procedure to link crimes over time. And they do that by gathering individual characteristics of robbers and their criminal strategies using both CCTV cameras as well as victim interviews. And so as an economist or as a statistician, I think of this as an attempt to build panel data of criminal events and use the within-group predictions rather than the overall predictions.

JD: Can you give some examples about how this might work in practice? I found this piece of the paper just fascinating. Like you basically are trying to find the individual robbers and match them across crimes, right?

GM: Right. KeyCrime allows you to visually see the distribution of crimes on a map and then easily check what kind of characteristics these individual robberies have, including any footage that comes from CCTV cameras, and then do comparisons with past crimes. And oftentimes, it’s fairly easy to see that two robberies are linked, because simply, you see in the picture that the robber is the same guy. You know he’s dressed in the same way and you know he potentially has the same weapon. Sometimes when they don’t have CCTV cameras, it’s going to be with the help of interviews. So they ask a lot of questions about, you know, even sort of little details that can later help the police generate these links. For example, you know if someone was wearing a particular watch, like a gold watch and at the same time, earrings. If they see that information in different robberies, they use that to generate these links.

JD: OK, so one question people might have is how good these predictions actually are. It might seem like smart offenders would vary their targets in the days and times they commit their crime to keep the police on their toes. So in the paper you show that crime is in fact terribly revealing. A very fitting Agatha Christie quote, and you make the case that if you’re a criminal, there are costs to varying your behavior, and so what offenders did in the past does indeed tell you a good amount about what they’ll do in the future. So you can look at this in the data. So in the data, how predictable is the second or subsequent offense by particular robbers when you have information on their earlier offenses?

GM: You said almost everything. So it is true that the most prolific criminals are those that are more unpredictable, so you see that in the data. So you see that those, for example, that operate on a wider sort of geographic scale and are less focused, sort of geographically, are the ones that managed to sort of commit the largest number of robberies before getting arrested, if they get arrested. But as you said, the data are amazing in that they allow me to compare conditional versus unconditional predictions. So one can use the past to predict what is going to happen next – or not. So for example, you know you can look at what is the likelihood that a random robber targets a bank. It’s about 15%. And then you can ask yourself, wait a minute. But what is the likelihood that he or she does target a bank if he or she has targeted banks previously, and that’s more like 80%. And so what I do in the paper is compute these different probabilities, let’s call them marginal versus conditional probabilities, for several dimensions. These are targets, so the type of business that is targeted, the mode of transportation, the neighborhoods, the day of the week, the time of the day, and the week of the month. To summarize, basically, if police patrols choose to patrol predicted targets, meaning in a specific neighborhood in a given shift, and for several days – and this is because most repeated robberies happened within a couple of weeks, so you don’t have to do this forever – you have almost a 12.5% chance of being in the right place at the right time. And therefore you’re able to arrest the offender. While if you do sort of random patrolling, so you don’t use any information about the past, the likelihood is only 0.6%, which is about 20 times smaller. OK, so information about the past allows you to have predictions which are about 20 times better.
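The marginal-versus-conditional comparison GM describes can be sketched in a few lines of Python. This is only an illustration: the robbery sequences and the function names `marginal_probability` and `conditional_probability` are hypothetical and are not the KeyCrime data or the paper's code. Only the 12.5% and 0.6% figures come from the interview.

```python
# Hypothetical toy data: each inner list is one robber's targets in
# chronological order. These are NOT the KeyCrime data.
sequences = [
    ["bank", "bank", "bank"],
    ["pharmacy", "pharmacy", "shop"],
    ["shop", "shop"],
    ["pharmacy", "pharmacy", "pharmacy", "pharmacy"],
    ["bank", "pharmacy"],
    ["shop", "pharmacy", "shop"],
]

def marginal_probability(seqs, target):
    """Share of all robberies that hit `target`, ignoring who committed them."""
    all_targets = [t for seq in seqs for t in seq]
    return all_targets.count(target) / len(all_targets)

def conditional_probability(seqs, target):
    """Share of robberies hitting `target` among those whose previous
    robbery by the same robber also hit `target`."""
    hits = 0
    total = 0
    for seq in seqs:
        # Pair each robbery with the next one by the same robber.
        for prev, nxt in zip(seq, seq[1:]):
            if prev == target:
                total += 1
                hits += int(nxt == target)
    return hits / total if total else float("nan")

p_marginal = marginal_probability(sequences, "bank")
p_conditional = conditional_probability(sequences, "bank")
print(f"marginal: {p_marginal:.2f}, conditional: {p_conditional:.2f}")

# Ratio of the two patrol success rates quoted in the interview (in percent).
quoted_ratio = 12.5 / 0.6
print(f"informed vs. random patrolling: about {quoted_ratio:.0f}x")
```

In the toy data, as in the interview, knowing a robber's previous target raises the predicted probability well above the unconditional share, which is the sense in which the past is informative.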

JD: So I guess the question then is whether that additional information, how much value it adds to what the police would have done otherwise. So would they really be patrolling randomly and maybe they would, but I could imagine some cops listening to this and saying, well, we know that stuff too, and so the question in all of this is, when you look in the data, what is the causal effect on things like how often you can make an arrest or how much crime goes down. That tells you what the value add is. Is that the right way to think about it?

GM: That’s right. I mean, random patrolling is sort of assuming that they, you know, use no information whatsoever. Now, to see how the use of KeyCrime compares to a sort of business-as-usual kind of patrolling, what I do in the data is compare what the Polizia does, which is the force that uses KeyCrime, with what the Carabinieri does, which is the other police force, the one that doesn’t have access to KeyCrime.

JD: Yeah, so let’s talk more about that. So it turns out that the way policing is done in Milan is a bit unusual, and I think this was one of the first econo-crime papers I saw presented after I finished grad school. And I remember just being blown away by the cool natural experiment you found here. So there is not just one police force in Milan, but two. And, as you just said, only one had access to KeyCrime during the period you’re studying. So describe the two police forces that have jurisdiction in the city and how they are assigned to cover different areas.

GM: It is indeed unusual, but there are other countries that have two police forces, like Spain or France. This is mainly for historical reasons. For example, the current Carabinieri was the police force of the royal family, while the Polizia was the police force of the government. Now, when Italy became a Republic, the two forces were sort of operating side by side, and over time, I think also, you know, through bargaining and so on, they developed into almost two identical forces. In the 1990s, to save on costs, they decided to divide the city into sectors, so what you have in Milan, but also in most larger cities in Italy, is that two sectors are patrolled by the Polizia. And so for Milan, these sectors would use KeyCrime, thanks to this police officer who, in a bottom-up way, developed this predictive policing software, and one would not, which would be the sector patrolled by the Carabinieri.

This is great, but by itself, it would not help me much. Now, on top of this division, you have the fact that these sectors rotate every time there’s a shift change. So if criminals are not aware of this rotation mechanism, they cannot “target” the area that is weaker in terms of law enforcement. And so what you have is sort of an experiment where investigations are almost randomly assigned to one of the two forces. So there is no cherry picking. There’s no selection, and that’s one great advantage of the way that policing happens to be organized in Italy.

JD: So when was KeyCrime adopted by the police?

GM: They started in 2008. So at the end of 2007 they finished producing the software, and then at the beginning of 2008 they started using it to try to combat robberies, or commercial robberies.

JD: Ok, so in the first part of the paper you consider how that adoption of KeyCrime affected robbery rates in Milan relative to other cities. So walk us through how you do that.

GM: The way I do this is through a synthetic control type of approach. So very simply by comparing Milan to other Italian cities. And the main issue that I faced was that, similar to what had happened in Santa Cruz, what I find is that KeyCrime is adopted after a fairly large increase in robberies. So there were some positive pre-trends. And so, in the synthetic control language, I’m sort of outside of what is called the convex hull of the control cities. So what was happening in Milan wasn’t happening anywhere else. OK, so this is a little bit of an issue because you have no great comparison city to pick. And so I had to sort of twist the synthetic control method a bit and allow for pre-existing differences in these trends, and what I ended up doing is using lasso regressions with an added time trend.

What you see in the overall picture is that Milan has this amazing increase in robberies, but then, once KeyCrime is adopted, you have a fairly large reduction.
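The convex hull problem GM mentions can be illustrated with a one-dimensional sketch in Python. It is not the lasso-with-time-trend estimator used in the paper; it only shows why standard synthetic control weights (nonnegative, summing to one) cannot reproduce a treated city whose pre-trend is steeper than every control city's. All city names and numbers below are hypothetical.

```python
# Hypothetical pre-adoption robbery growth rates (percent per year).
# Illustrative numbers only, not the paper's data.
donor_trends = {"city_a": 1.0, "city_b": 2.5, "city_c": -0.5, "city_d": 3.0}
treated_trend = 6.0  # a treated city growing faster than every donor

def in_convex_hull_1d(value, donors):
    """In one dimension, convex combinations of the donors fill exactly
    the interval [min, max], so membership is a simple range check."""
    return min(donors) <= value <= max(donors)

def best_convex_fit_1d(value, donors):
    """Closest value reachable with nonnegative weights that sum to one."""
    return max(min(donors), min(value, max(donors)))

donors = list(donor_trends.values())
inside = in_convex_hull_1d(treated_trend, donors)
gap = treated_trend - best_convex_fit_1d(treated_trend, donors)

print(f"treated trend inside donor hull: {inside}")
print(f"unmatched pre-trend gap: {gap} points per year")
```

When the gap is nonzero, no weighted average of the comparison cities tracks the treated city before adoption, which is the motivation GM gives for allowing pre-existing trend differences in the estimation.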

JD: And once you kind of take advantage of or, as you said, use these methods, kind of adjust for that pre-existing trend. Yeah, it looks like it’s just like flat relative to these comparison cities and then it just starts declining, which is what you would expect if basically you know you’ve got 2/3 of the city covered by this new predictive policing technology and it’s working. If it’s doing something. Which is sort of what you were trying to show here. So what data were you using for that piece of the paper?

GM: For this part of the paper, since I needed data before KeyCrime was adopted as well as after, I use municipality-level bank robbery data, which is data that I got from the Italian Banking Association. But I also use yearly province-level data on commercial robberies as well as other crimes. So the municipality-level data are great because KeyCrime is used at the municipality level in the municipality of Milan, and they are great because they are monthly. On the other hand, it’s only one type of commercial robbery: bank robbery. And so as a robustness check, I also look at the yearly province-level data.

JD: And then how big was that effect of KeyCrime on robbery rates?

GM: Very large. So what you see is that, within a few years, robbery rates fell by about 80%. What I also mention is that this could potentially be subject to some biases as well. So there could still be displacement, the one I mentioned before, as well as mean reversion, potentially. And so I think this is why it’s important that one also looks at individual-level data, which is sort of what I do next.

JD: Yeah so next, and this is the really cool part, you’re going to use this rotation of police assignment to measure the causal effect of predictive policing within Milan. So to do this, you’re going to compare clearance rates for the two police forces. So that is the rate at which they’re solving these crimes. So basically you’re making arrests on individual robberies. So you’ve got the one police force that uses KeyCrime and one that didn’t, for offenses where technology should matter compared with offenses where it shouldn’t matter. So you’re going to use a difference-in-differences design here. So walk us through your empirical approach in this second part of the paper.

GM: You’re totally right, so I shift the focus from crime rates to clearance rates. So why do I do this? One reason is that having data on repeat offenders, at the end of the day, I can map these changes in clearances into changes in crime. So you know if I know the likelihood with which someone repeats an offense and that guy is arrested, I can also sort of pick out sort of how many crimes, what kind of reductions in crime we would expect. And in addition, focusing on clearances bypasses the issue of displacement because if someone is arrested, I know that he’s not going to operate somewhere else. As for the difference-in-differences design, here the main idea is that KeyCrime is able to predict crimes but only once some data have been gathered. And so you need at least one data point, so one robbery, to predict the next one. And so, if you don’t have that first robbery, you don’t have data, so you can’t generate a prediction. And so I used the very first robbery in the data to measure pre-existing differences in the productivity of the two forces. So for the very first robbery, you know, I know that KeyCrime hasn’t been used, and so I can check sort of how the Polizia compares to the Carabinieri in the absence of KeyCrime to make sure that there are no pre-existing differences between the two forces.

JD: Right, so if you’d seen, if you just compared them, in a simple comparison and it turned out the Polizia had higher clearance rates, that might just be because they’re better on other dimensions. They’re just better at solving crime for other reasons. So yeah, OK, great. So and then, what data do you use for this part of the study?

GM: So these data were the ones that I told you about at the beginning. So the ones that the inventor, the developer of KeyCrime, gave me. So these are individual-level data for each commercial robbery that took place in Milan between 2008 and 2009. I wasn’t given the algorithm that they use, but I was given data on how much money was stolen, whether the individual was later arrested and when he was arrested, as well as information on the type of weapon used, the mode of transportation, the location, and the exact time that the robbery took place.

JD: It sounds like an amazing data set. I think you said in the paper that they didn’t give you the actual photos of the robbers, but they gave you everything else.

GM: That’s right. Yes, I didn’t have the precise description of the robbers.

JD: Got it, got it OK, so let’s talk about the results here. So the first thing you do is simply compare that baseline difference in clearance rates for the two police forces. So what do you find there?

GM: That’s right. So for the first robbery I find almost no differences in clearance rates between the two forces. So after a robbery takes place, both forces have about a 12% chance of clearing a case, meaning, you know, arresting the perpetrator before the perpetrator commits another robbery. So a very similar chance of clearing the case.

JD: Yes, so then next you compare the effects across police forces for the first and subsequent offenses in that sequence by the same set of robbers. And again, the idea here is KeyCrime should be helpful for making arrests in the subsequent offenses, but not that first offense. So what do you find there?

GM: So what I find is that, for the subsequent robberies, the likelihood that the Carabinieri make an arrest is only 7.4%. And you can think that this is for two reasons. The pool of repeat offenders, so the offenders that decide to get back in action, are those that were not arrested to begin with, the ones that probably did a better job to begin with. OK, so it’s a selected group of criminals. They learn more and they were not arrested. And so the likelihood that the Carabinieri make an arrest after a subsequent robbery is much lower than after the very first. But for the Polizia, the force that uses KeyCrime, what you see is that their likelihood of making an arrest is 9 percentage points larger than for the Carabinieri. They’re not hit by this selection effect. So thanks to KeyCrime, it seems they can keep their productivity fairly high.
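The difference-in-differences arithmetic behind these numbers can be written out as a short Python sketch. It rests on two simplifying assumptions that go beyond the interview: the first-robbery clearance rate is set to exactly 12% for both forces (GM says "about 12%" with almost no difference), and the Polizia rate for subsequent robberies is derived as 7.4 plus 9 percentage points. The function name `diff_in_diff` is illustrative, and this is not the paper's regression specification.

```python
# Clearance rates in percent.
# Assumption: first-robbery rates are exactly equal at 12.0 for both forces.
# Assumption: the Polizia subsequent rate is derived as 7.4 + 9.0.
clearance = {
    ("carabinieri", "first"): 12.0,
    ("carabinieri", "subsequent"): 7.4,
    ("polizia", "first"): 12.0,
    ("polizia", "subsequent"): 7.4 + 9.0,
}

def diff_in_diff(rates):
    """Change for the force with KeyCrime minus change for the force without."""
    change_treated = rates[("polizia", "subsequent")] - rates[("polizia", "first")]
    change_control = (
        rates[("carabinieri", "subsequent")] - rates[("carabinieri", "first")]
    )
    return change_treated - change_control

# Rounded to one decimal to avoid floating-point noise in the display.
did = round(diff_in_diff(clearance), 1)
print(f"difference-in-differences estimate: {did} percentage points")
```

Because the two forces start from the same baseline in this sketch, the estimate simply equals the gap on subsequent robberies. If the baselines differed, subtracting the first-robbery gap is what would net out pre-existing differences in productivity.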

JD: OK, so that’s sort of one big piece of evidence that KeyCrime is working, and that this is adding value. And then you also use a different approach to control for baseline differences across the police forces. In this case, you take advantage of a delay in when information about an offense was added to KeyCrime so that the predictions can be updated. So first tell us a little bit about that information delay, and second tell us what you find when you use that information.

GM: Yeah, so, luckily, what happens after each robbery is that the Polizia, so the force that uses KeyCrime, wait until the next morning to interview the victims. And that’s mainly to reduce the victim’s immediate distress and to avoid therefore any recall bias about the robbery. So they sensed that interviewing the victims right after the robbery would not be very useful. And so this is great because I can compare clearances of robberies that happened within the same day, when KeyCrime was not updated because the victims had not been interviewed yet, with clearances that happened after one day or two days, and so on. What I see is that the improvement, which is in this case close to 12 percentage points, happens only once the predictions get updated. And also, in addition, what I find is that these results get larger as more data are acquired, so as more robberies have taken place and more data can be analyzed. And I forgot to mention that this is something I find also in the previous experiment: as more data are gathered, the productivity gap between the Carabinieri and the Polizia grows.

JD: Excellent. OK, and then the last piece of your analysis is that you consider what happened when prosecutors forced the Polizia to share the KeyCrime predictions with the Carabinieri, to kind of even the playing field, in 2010. So first, what are you looking for in that policy change? And what do you find was the effect of that change?

GM: So basically the way it works is that the Polizia, once an arrest is made, the Polizia were sharing all the information with the prosecutors. And so it went that at one point the prosecutors told them well, you need to inform the Carabinieri about this. You need to show them what you’re doing. And so this happened in the beginning of 2010. So the Carabinieri were given these predictions. And so what I find, indeed, is that the Carabinieri, after 2010, they close the gap right after they get informed about these predictions.

JD: So it helps kind of reassure if you think there’s like anything else, that your difference in difference wasn’t capturing or something. I think this helps reassure you that it really was KeyCrime. It really was the KeyCrime predictions.

GM: Someone still needed, you know, additional evidence?

JD: Yeah, exactly. So, based on your estimates, you consider the costs and benefits of this technology in terms of the number of robberies that KeyCrime helped Milan avoid. So what do you find when you crunch those numbers?

GM: What I can do, for repeat offenders, is compute how differences in clearance rates map into differences in the expected number of robberies. It turns out that the expected number of robberies is just one over the clearance rate for repeat offenders. So with a 10 percentage point difference, and the kind of productivity that we see in the data, this implies that the number of expected robberies per criminal group drops from 17.8 to 6.4. Now, since each year there are, in Milan, about 85 new repeat offenders (and let’s remember that these are commercial robberies, so these are fairly serious crimes), we expect to have 900 fewer robberies per year. In the data I also have information about the hauls, so I can compute the average haul, which is €2,800, which is probably something like $3,200. This generates a total reduction in the direct cost of crime of about €2.5 million, or about 2.8 million dollars. This is without considering any other costs that crime may generate. On the other hand, the running costs of KeyCrime are very low. We’re talking about five officers who are paid less than €20,000 a year, and so overall, the use of this technology seems to pay off.
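The one-over-the-clearance-rate rule is the mean of a geometric distribution: if each robbery in a series ends in arrest with probability p, the expected length of the series is 1/p. The Python sketch below inverts that rule to recover the clearance rates implied by the 17.8 and 6.4 figures and reproduces the direct-cost arithmetic. The function name `expected_robberies` is illustrative, a constant per-robbery clearance rate is an assumption of the sketch, and all inputs are the rounded figures quoted in the interview.

```python
def expected_robberies(clearance_rate):
    """Mean of a geometric distribution: if each robbery ends in arrest
    with probability p, a series lasts 1 / p robberies on average."""
    return 1.0 / clearance_rate

# Figures quoted in the interview.
robberies_without = 17.8       # expected robberies per group, lower clearance
robberies_with = 6.4           # expected robberies per group, higher clearance
avg_haul_eur = 2800            # average haul per robbery, in euros
fewer_robberies_quoted = 900   # yearly reduction quoted by GM

# Clearance rates implied by inverting the 1 / p rule.
p_without = 1.0 / robberies_without
p_with = 1.0 / robberies_with
gap_pp = 100 * (p_with - p_without)

# Direct cost of crime avoided, using the quoted yearly reduction.
direct_savings_eur = fewer_robberies_quoted * avg_haul_eur

print(f"implied clearance rates: {p_without:.3f} vs {p_with:.3f}")
print(f"gap: {gap_pp:.1f} percentage points")
print(f"direct savings: EUR {direct_savings_eur:,}")
```

The implied gap comes out at roughly 10 percentage points, in line with the difference GM mentions, and the savings figure matches the "about €2.5 million" he quotes.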

JD: All right, so that is your paper. Are there any other papers related to this topic that have come out since you first started working on this study?

GM: You’re probably better at keeping track of all the papers in the literature. So there aren’t many papers, I would say, that estimate the effect of, broadly speaking, technology on police productivity. And you probably wrote most of these papers. The ones where you look at the effect of DNA databases on crime. Another paper somehow related that I can think of is by a former student of mine, Evelina Gavrilova, and Vincenzo Bove, where they estimate the effect of police militarization on crime and find large reductions, which they interpret as deterrence. I think there have been also other papers that have looked at police militarization. Another paper that is related to this paper on predictive policing is a paper that I’m writing with Jordi Blanes i Vidal, where we basically look at the other side, the effect of random patrolling. What we find is that random patrolling has almost no effect on crime or, to be more precise, we find very large deterrent effects in the short term, about 30 minutes, but it seems that once the police patrols are gone, criminals go back to business very quickly. And so when you average over the day, you find fairly small effects on crime.

JD: That’s really interesting. Yeah, I agree with you that the technology and policing space is pretty sparse. It’s sort of amazing. It’s like all this stuff is super interesting and technology is a bigger and bigger part of policing, and there are always fancy new tools that police forces are trying out. And we know so little about whether any of them are having any or adding any value. So it’s a good space for people to work in if they’re looking for topics, I think. So what are the policy implications of this paper and the other work in this area? What should policymakers be taking away from this?

GM: My sense of things, if we want to summarize it brutally, is that law enforcement should embrace statistical methods and, broadly speaking, technology. You know, I think it works. So my feeling is that random patrolling works very little, while focused patrolling and also, particularly through incapacitation, works much better.

JD: Especially when you’ve got that other paper to compare side by side, it is particularly striking how much better the police are. Especially looking at robbery. I guess there’s still a question of how much value this kind of technology might add on other types of crime? What do you think about that?

GM: Yes, exactly. I think this is still an open question. Up until now, I’ve talked about this technology as something great, but we know that it works for robberies. We still don’t know whether that works for other types of crimes. So we don’t know how much other types of crime are predictable. Robberies, by definition, have at least a victim who is also a witness, and it’s a witness who will give you a lot of useful information to build such predictions. And we also know that robbers tend to be prolific offenders, so you know once you have a prediction you don’t have to wait for long before that individual is back in business. I think for other violent crimes it might not be that easy. And another open question is whether it works for property crimes.

JD: Yeah, so those are some open questions. What are other big questions on the research frontier? Other questions that you and enterprising grad students should be thinking about going forward?

GM: Well, so I think another question, and we haven’t talked about this, but it’s a lot in the news, is about the relationship between predictive policing and racial biases. So my understanding is that KeyCrime, by focusing on individual predictions of serial criminals, shouldn’t be subject to these biases. But we know that aggregate predictions may in principle, for example, target areas based on crime differences, and these can potentially be correlated with racial differences. But I must admit I haven’t done any research on this yet. It is certainly a challenging research question.

JD: Yeah, so the idea there is basically, if you’re sending cops to places where they have made lots of arrests in the past or detected lots of crime in the past, then you could just be sending them back to already over-policed communities again and again. And yeah, people get very worried about that for good reason.

GM: Exactly, and this may happen even if you don’t use race at all, as one of your considerations.

JD: Right. On the other hand, it could make you, to the extent that the technology is directing you to places where there is, you know, real crime happening that we really care about and is able to sort of redirect police to places they wouldn’t have gone otherwise, it could help police to reduce biases. I agree. I think it’s a really interesting empirical question that we just need more work on.

GM: And then more generally, I think we just sort of run into the other paper. I think that patrolling and deterrence in general, you know, my sense is that the jury is still out on how large those effects are. So we know that static patrolling is very effective in reducing crime. But whether mobile patrolling does the same, I think this is still an open question. It’s a tough question because, of course, it’s much more difficult to follow officers that are moving than officers that are not. But I think better and better data, for example AVL data, so data that have GPS locators, should allow us to get there.

JD: Yes, I agree, those data are super cool and I look forward to more papers using them. Alright, well my guest today has been Giovanni Mastrobuoni from Collegio Carlo Alberto and the University of Turin. Giovanni, thanks so much for talking with me.

GM: Thanks for having me. It was a pleasure.

This interview provided courtesy of Jennifer Doleac.

Podcast: https://www.probablecausation.com/podcasts/episode-57-giovanni-mastrobuoni

“Crime is Terribly Revealing: Information Technology and Police Productivity” by Giovanni Mastrobuoni.