, Ph.D. is a University Lecturer in Criminology and Complex Networks at the Institute of Criminology, University of Cambridge, and leads the Crime & Networks Group.
, Ph.D. has been a post-doctoral research associate in the Department of Economics at the University of Technology–Sydney since 2017, and since 2020 has been a member of the
We explore how we can best predict violent attacks with injury using a limited set of information on (a) previous violence, (b) previous knife and weapon carrying, and (c) violence-related behaviour of known associates, without analysing any demographic characteristics.
Our initial data set consists of 63,022 individuals involved in 375,599 events that police recorded in Merseyside (UK) from 1 January 2015 to 18 October 2018.
We split our data into two periods: T1 (initial 2 years) and T2 (the remaining period). We predict “violence with injury” at time T2 as defined by Merseyside Police using the following individual-level predictors at time T1: violence with injury; involvement in a knife incident and involvement in a weapon incident. Furthermore, we relied on social network analysis to reconstruct the network of associates at time T1 (co-offending network) for those individuals who have committed violence at T2, and built three additional network-based predictors (associates’ violence; associates’ knife incident; associates’ weapon incident). Finally, we tackled the issue of predicting violence (a) through a series of robust logistic regression models using a bootstrapping method and (b) through a specificity/sensitivity analysis.
We found that 7720 individuals committed violence with injury at T2. Of those, 2004 were also present at T1 (27.7%) and co-offended with a total of 7202 individuals.
Regression models suggest that previous violence at time T1 is the strongest predictor of future violence (with an increase in odds never smaller than 123%), knife incidents and weapon incidents at the individual level have some predictive power (but only when no information on previous violence is considered), and the behaviour of one’s associates matters. Prior association with a violent individual and prior association with a knife-flagged individual were the two strongest network predictors, with a slightly stronger effect for knife flags. The best performing regressors are (a) individual past violence (36% of future violence cases correctly identified); (b) associates’ past violence (25%); and (c) associates’ knife involvement (14%). All regressors are characterised by a very high level of specificity in predicting who will not commit violence (80% or more).
Network-based indicators
Violent attacks are one of the rising challenges to the security of urban environments and the well-being of local communities in the UK and elsewhere. For example, England and Wales recorded 43,500 offences involving a sharp instrument (often a knife or a blade) in the year ending March 2019—the highest number since the year 2010/11 (Allen et al.
Such an increase in knife crime in England and Wales is often described as a “knife epidemic”—a label that echoes the “gunshot epidemics” of Boston and Chicago (NYTimes
Epidemics are based on the notion of connectivity, which makes them an inherently relational phenomenon (Jackson
In this paper, we define a network as a set of actors (individuals) and the relations among them (following Wasserman and Faust
A limited number of works have so far relied on network analysis to study violence, starting from the pioneering work by Kennedy et al. (
Papachristos et al. (
The relatively few studies conducted so far have shown the benefit of applying a network approach to study violence. Yet, they are limited in their focus on (a) gang-related events (with the exception of Papachristos et al.,
In this paper, we advance this line of work in three different ways. First, we move beyond gang and organised crime-related violent events. Second, we move beyond gun-related violence alone and consider all instances of violence with injury. Third, we offer the first analysis of its kind outside the USA and in a setting not characterised by high intensity of generalised violence (Merseyside, UK), where homicide is far less frequent per capita than in US cities like Chicago.
Our study also builds on the pioneering work by Massey et al. (
Our main question is: how can we best predict violent attacks with injury at Time 2 (T2) using information from Time 1 (T1) on (a) previous violence, (b) previous knife and weapon carrying, and (c) the violence-related behaviour of known associates?
We will explore six sub-questions, in which “violence” denotes attacks with injury: (a) What is the predictive value of violence at T1 for committing violence at T2? (b) What is the predictive value of carrying a knife at T1 for committing violence at T2? (c) What is the predictive value of carrying a weapon at T1 for committing violence at T2? (d) What is the predictive value of associating with someone who has committed violence at T1 for committing violence at T2? (e) What is the predictive value of associating with someone who has carried a knife at T1 for committing violence at T2? (f) What is the predictive value of associating with someone who has carried a weapon at T1 for committing violence at T2?
Our data consist of 63,022 individuals involved in 375,599 police-recorded events in Merseyside, spanning the period from 1 January 2015 to 18 October 2018. The data were collected and made available to us in a fully anonymised form by Merseyside Police, a territorial police force responsible for policing a large area in the northwest of England (UK). Its jurisdiction covers a population of around 1.5 million people, of whom roughly half a million reside in its main city, Liverpool.
We have taken a broad approach to our analysis by including all recorded events regardless of their criminal justice outcome: this includes events in which a person was arrested, cautioned, charged, or wanted on warrant, as well as events in which a person was interviewed or suspected, or in which no further action was taken. We have, however, excluded from our analysis events classified by the police as domestic incidents and sexual offences.
To study the emergence of violence, we split our dataset into two periods: T1 and T2. The first period (T1) runs from 1 January 2015 to 31 December 2016 (2 years). The second period (T2) runs from 1 January 2017 to 18 October 2018 (the date of the data extraction), which gives T2 a length of just over 21 months.
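The split into T1 and T2 can be sketched in a few lines. This is a minimal illustration, not the authors' code: the period boundaries come from the text, while the event tuples and the helper `period_of` are hypothetical.

```python
from datetime import date

# Period boundaries as stated in the text; everything else is hypothetical.
T1_START, T1_END = date(2015, 1, 1), date(2016, 12, 31)
T2_START, T2_END = date(2017, 1, 1), date(2018, 10, 18)


def period_of(event_date):
    """Return 'T1', 'T2', or None for dates outside the study window."""
    if T1_START <= event_date <= T1_END:
        return "T1"
    if T2_START <= event_date <= T2_END:
        return "T2"
    return None


# Hypothetical records: (event id, person id, date of the event).
events = [
    ("e1", "p1", date(2015, 3, 14)),
    ("e2", "p2", date(2016, 12, 31)),
    ("e3", "p1", date(2017, 1, 1)),
    ("e4", "p3", date(2018, 10, 18)),
]

# Group the events by period, dropping anything outside the window.
by_period = {"T1": [], "T2": []}
for event_id, person_id, when in events:
    label = period_of(when)
    if label is not None:
        by_period[label].append((event_id, person_id))
```

Both boundaries are inclusive here, so an event on 31 December 2016 falls in T1 and one on 1 January 2017 falls in T2.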
In this work, we interpret violence (our dependent variable) as “violence with injury” as defined by Merseyside Police. The main offences included in this category are as follows: murder and attempted murder; assault occasioning actual bodily harm (Section 47); wounding with intent to do grievous bodily harm (Section 18); inflicting grievous bodily harm; malicious wounding; racially or religiously aggravated actual bodily harm; allowing a dog to be dangerously out of control injuring any person (both in a public space or in a non-public space).
Our independent variables are a set of binary variables (0/1) defined as follows:
Violence T1: whether the individual has committed violence with injury at time T1.
Weapon T1: whether the individual has been flagged by the police for a weapon incident at time T1. (To this end, we relied on the flag ‘gun_involved’ included in the dataset.)
Knife T1: whether the individual has been flagged by the police for a knife incident at time T1. (To this end, we relied on the flag ‘knife_involved’ included in the dataset.)
This set of variables captures the behaviour of a single individual
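The network element is captured by three binary variables (0/1):
Violence associates T1: whether any of the individual's associates has committed violence with injury at time T1.
Weapon associates T1: whether any of the individual's associates has been flagged for a weapon incident at time T1.
Knife associates T1: whether any of the individual's associates has been flagged for a knife incident at time T1.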
We first identified the individuals who have committed violence at time T2. To create the co-offending network, we dropped those individuals who were only present at T2, and focussed on individuals who had been present at both T1 and T2. The network of their associates was then built based on offences recorded at T1. Our model is intentionally simplified as it does not require information on socio-demographic characteristics of the individuals or spatial information on the individuals and/or criminal events. Future development of the model can integrate such additional perspectives.
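The construction of the co-offending network and of the three associate-based predictors can be sketched as follows. This is a simplified illustration under stated assumptions: two people are treated as associates when they appear in the same T1 event, event-level flags are attributed to everyone involved in the event, and the record layout and names (`t1_events`, `associate_flags`) are hypothetical.

```python
from collections import defaultdict
from itertools import combinations

KINDS = ("violence", "knife", "weapon")

# Hypothetical T1 records: event id -> (people involved, violence, knife, weapon).
t1_events = {
    "e1": ({"a", "b"}, True, False, False),
    "e2": ({"b", "c"}, False, True, False),
    "e3": ({"d"}, False, False, True),
}

associates = defaultdict(set)  # person -> set of co-offenders at T1
flags = defaultdict(lambda: dict.fromkeys(KINDS, False))  # individual-level flags

for people, violence, knife, weapon in t1_events.values():
    for person in people:
        # Assumption: a flag on the event applies to every person involved.
        flags[person]["violence"] |= violence
        flags[person]["knife"] |= knife
        flags[person]["weapon"] |= weapon
    # Link every pair of people recorded in the same event.
    for u, v in combinations(sorted(people), 2):
        associates[u].add(v)
        associates[v].add(u)


def associate_flags(person):
    """Network-based predictors: does ANY associate carry each flag?"""
    return {
        kind: any(flags[other][kind] for other in associates[person])
        for kind in KINDS
    }
```

In this toy data, person "a" has no knife flag of their own but inherits one through associate "b", while the solo offender "d" has no associates and therefore scores 0 on all three network variables.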
In this work, we tackle the issue of predicting violence using two different approaches: (a) a series of “bootstrapping” logistic regressions and (b) a “sensitivity/specificity” analysis.
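To illustrate the bootstrapping idea, the sketch below estimates the standard error of a logit coefficient by resampling individuals with replacement. It is not the estimation code used for the models reported here. It relies on the fact that, for a logit model with a single binary predictor, the fitted slope is the log odds ratio of the 2x2 table. The data are synthetic, the 0.5 cell correction is an assumption added to keep resamples finite, and all function names are hypothetical.

```python
import math
import random
import statistics


def log_odds_ratio(rows):
    """Log odds ratio of the 2x2 table built from (predictor, outcome) pairs.

    Assumption: 0.5 is added to every cell so that a resample with an
    empty cell still yields a finite value.
    """
    cells = [[0.5, 0.5], [0.5, 0.5]]
    for x, y in rows:
        cells[x][y] += 1
    return math.log((cells[1][1] * cells[0][0]) / (cells[1][0] * cells[0][1]))


def bootstrap_se(rows, replications=500, seed=0):
    """Standard deviation of the estimate across resamples of the rows."""
    rng = random.Random(seed)
    estimates = [
        log_odds_ratio(rng.choices(rows, k=len(rows)))
        for _ in range(replications)
    ]
    return statistics.stdev(estimates), estimates


# Synthetic data: x = flag at T1, y = violence at T2 (probabilities are made up).
data_rng = random.Random(42)
rows = []
for _ in range(2000):
    x = 1 if data_rng.random() < 0.3 else 0
    y = 1 if data_rng.random() < (0.34 if x else 0.18) else 0
    rows.append((x, y))

estimate = log_odds_ratio(rows)
std_error, replicates = bootstrap_se(rows)
print(f"log odds = {estimate:.3f}, bootstrap std. error = {std_error:.3f}")
```

With several predictors the closed form no longer applies, and each replication would require refitting the full logistic regression on the resampled individuals.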
We ran seven logistic regression models: four are presented in the main text (Models 1–4) and three in the
Sensitivity and specificity analyses are carried out in medicine to understand the performance of a medical test in identifying individuals who carry a specific disease.
We found that a population of 7720 individuals committed violence (with injury) during T2

[Figure: The network of target individuals and their associates. Note: large dark grey = target individuals; small light grey = associates (co-offenders)]
Next, we look at the determinants of violence at time T2 through a series of logistic regression models capturing behaviour at time T1.

Predicting violence in Merseyside: target individuals

                  Log odds   Std. error   Probability   Odds ratio
Model 1
  Violence T1        0.865        0.055      0.000***         2.37
  Constant          −1.509        0.030      0.000***         0.22
Model 2
  Violence T1        0.842        0.053      0.000***         2.32
  Knife T1           0.112        0.107      0.295            1.12
  Weapon T1          0.361        0.183      0.049*           1.43

Note: Dependent variable = violence at T2. Bootstrap estimations with 500 replications. N obs. = 9206. Significance: *** < 0.001; ** < 0.01; * < 0.05; + < 0.1
Model 1 is akin to a baseline model: we know from the literature that previous violence tends to be associated with higher chances of committing violence in subsequent periods, and we found support for this relationship in our data as well. The odds ratio for violence at T2 associated with having committed violence at T1 is 2.37. In percentage terms, the odds that an individual who committed violence at T1 also commits violence at T2 are 137.5% higher than for those who did not commit violence at T1 (we remind the reader that in our models we are comparing against non-violent offenders, not against the general population).
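The step from a log-odds coefficient to an odds ratio and a percentage change in odds is a one-line transformation. The sketch below applies it to the Model 1 coefficient for Violence T1 (0.865); the function names are illustrative.

```python
import math


def odds_ratio(log_odds):
    """Odds ratio implied by a logistic regression coefficient."""
    return math.exp(log_odds)


def percent_change_in_odds(log_odds):
    """Percentage change in the odds when the binary predictor goes 0 -> 1."""
    return (math.exp(log_odds) - 1.0) * 100.0


coef_violence_t1 = 0.865  # Model 1 coefficient for Violence T1, from the table

ratio = odds_ratio(coef_violence_t1)
change = percent_change_in_odds(coef_violence_t1)
print(f"odds ratio = {ratio:.3f}, change in odds = {change:+.1f}%")
```

A negative coefficient gives an odds ratio below 1 and therefore a negative percentage change, which is how a reduction in odds would be read.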
Model 2 adds the effect of having been flagged at time T1 for (a) a weapon-involved incident and (b) a knife-involved incident. Having committed previous violence remains the strongest predictor; both knife and weapon have a positive effect, but only having been flagged for a weapon-involved incident is statistically significant. In terms of percentage changes, a weapon incident at T1 increases the odds of violence at T2 by 43%. (Model A1 in the Appendix Table
Next, we assess the effect of network-based indicators (Table

Predicting violence in Merseyside: target individuals and their associates

                           Log odds   Std. error   Probability   Odds ratio
Model 3
  Violence T1                 0.804        0.059      0.000***         2.23
  Violence associates T1      0.248        0.065      0.000***         1.28
  Constant                   −1.544        0.032      0.000***
Model 4
  Violence T1                 0.804        0.057      0.000***         2.23
  Knife T1                   −0.004        0.120      0.969            0.99
  Weapon T1                   0.200        0.210      0.342            1.22
  Violence associates T1      0.150        0.082      0.068+           1.16
  Knife associates T1         0.18         0.104      0.083+           1.20
  Weapon associates T1        0.021        0.154      0.891            1.02
  Constant                   −1.54         0.033      0.000            0.21
                              0.0263   −4696.73

Note: Dependent variable = violence at T2. Bootstrap estimations with 500 replications. N obs. = 9206. Significance: *** < 0.001; ** < 0.01; * < 0.05; + < 0.1
Model 3 shows that the behaviour of an offender’s associates
In Model 4, we jointly consider the full set of indicators: individual-based and network-based. Firstly, previous violence by a target individual remains a strong predictor of future violence: there is a 123% increase in the odds of committing violence at T2 compared with an offender who has not committed violence at T1. Secondly, the behaviour of the associates continues to matter: prior association with an individual who has committed violence at T1 increases the odds of committing violence at T2 by 16%; prior association with an individual who has been flagged for a knife incident increases the odds by slightly more (+ 20%).
We do not find, however, any effect in Model 4 for prior association with an individual flagged for a weapon incident. Finally, when adding the associates’ behaviour to the model, the fact that a target individual has been flagged for a knife incident loses predictive power, showing no effect in Model 4. Weapon flagging at time T1 still shows a positive effect (+ 22% increase in odds), but loses its statistical significance. This is partially due to the relatively small number of weapon-related incidents in the dataset (we return to this point in the sensitivity/specificity analysis below).
Next, we contextualise the relevance of each predictor under consideration through a sensitivity/specificity analysis. We ask to what extent the fact that an individual is flagged or
We remind the reader that sensitivity indicates the extent to which actual positive cases are correctly classified: true positives are correctly identified and false negatives minimised. Specificity indicates the extent to which negative occurrences are correctly classified: actual negatives are identified and false positives minimised. In our context, a true positive is when a knife flag at T1 is associated with violence at T2; conversely, we expect no knife flag at T1 to be associated with no violence at T2 (true negative). A false positive is when a knife flag at T1 is associated with an individual who will not commit violence at T2; a false negative is when no knife flag at T1 is associated with violence at T2.
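These definitions translate directly into counts. The sketch below computes sensitivity and specificity for one binary indicator at T1 against violence at T2; the two short lists are made-up illustrative data, not figures from the study.

```python
def confusion_counts(flag_t1, violence_t2):
    """Count true/false positives and negatives for a binary indicator."""
    tp = fp = tn = fn = 0
    for flagged, violent in zip(flag_t1, violence_t2):
        if flagged and violent:
            tp += 1  # flagged at T1, violent at T2
        elif flagged and not violent:
            fp += 1  # flagged at T1, not violent at T2
        elif not flagged and violent:
            fn += 1  # not flagged at T1, violent at T2
        else:
            tn += 1  # not flagged at T1, not violent at T2
    return tp, fp, tn, fn


def sensitivity(tp, fn):
    """Share of actual positives that the indicator flags."""
    return tp / (tp + fn)


def specificity(tn, fp):
    """Share of actual negatives that the indicator leaves unflagged."""
    return tn / (tn + fp)


# Hypothetical data for eight individuals (1 = yes, 0 = no).
knife_flag_t1 = [1, 1, 0, 0, 0, 0, 1, 0]
violence_t2 = [1, 0, 1, 0, 0, 0, 1, 1]

tp, fp, tn, fn = confusion_counts(knife_flag_t1, violence_t2)
print(f"sensitivity = {sensitivity(tp, fn):.2f}, specificity = {specificity(tn, fp):.2f}")
```

In this toy example the flag catches two of the four violent individuals (sensitivity 0.50) and leaves three of the four non-violent individuals unflagged (specificity 0.75).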
As there is normally a trade-off between the two measures, we need to look at them jointly to draw conclusions on the strength of each indicator relative to the predicted behaviour of individuals.

[Figure: Sensitivity and specificity of violence predictors]
For our purposes, the key measure of interest is sensitivity as we seek to answer the following question: if we rely on, say, knife flagging at T1, what is our ability to correctly identify individuals who will commit violence at T2? To put it in another way, we are trying to minimise false negatives (e.g. no knife at T1 and violence at T2) while capturing as many true positives as possible.
The indicators show a striking heterogeneity in the degree of sensitivity. The best performer is past violence (violence T1): 36% of all individuals observed at T1 who then commit violence at T2 are flagged with violence T1. This is followed by prior association with a violent individual, which correctly identifies 25% of cases. Prior association with a knife-flagged individual correctly identifies 14% of cases. Knife flagging at T1, prior association with a weapon-flagged individual, and weapon flagging at T1 show the lowest level of sensitivity (this is in line with the results of the regression models discussed above). Weapon flagging at T1 (sensitivity level 3%) performs 91% worse than the best available regressor (violence T1). On the other hand, the sensitivity levels of network-based counterparts of these regressors are on average 98% higher.
As for the question of who will
It is also important to note that the network-based indicators still perform well on this measure vis-à-vis individual-based indicators, recording a level of specificity never lower than 83%.
In sum, the joint specificity and sensitivity analysis shows that a better understanding of an individual’s co-offending network allows a practitioner to cast a more robust judgement relative to future violent behaviour. In particular, we remark that violence associates at T1 does a good job in both sensitivity and specificity tests, thus efficiently complementing the individual violence T1. For these two variables, both the presence
In this paper, we offered an assessment of a new class of network-based indicators to predict future violence vis-à-vis individual-based indicators. We have purposely used a limited set of information on (a) past violence, (b) weapon incidents, and (c) knife incidents. We relied on two different approaches to explore the emergence of violence and assess the strength of our indicators: a series of robust logistic regression models and a sensitivity/specificity analysis. In this work, we relied on evidence from Merseyside Police spanning the period from January 2015 to October 2018.
What we did not rely on was any information about the demographic characteristics of the subjects, the communities in which they reside, or any other information besides the variables we have displayed in this report. Compared to previous models predicting high vs. low-risk of offending (e.g. Berk et al.
The robust logistic regression models with these limited, demographic-free data pointed to three main findings. Firstly, information on previous violence is the strongest predictor of future violence across all model specifications, with an increase in odds never smaller than 123%. Previous involvement in a weapon incident increases the odds by 43% when only individual-level information is considered, but stops being informative when network-level information is considered. Individual involvement in a knife incident has little, if any, predictive power.
Secondly, network information on one’s associates matters and
The importance of prior violence has been confirmed by a sensitivity/specificity analysis of the indicators. The same analysis has also shown that network-based measures may help practitioners cast a more robust judgement relative to future violent behaviour. Prior association with a violent individual and, to a lesser extent, prior association with a knife-flagged individual are identified as good performers.
As this is a preliminary work, it has a number of limitations. Firstly, we used ‘weapon’ and ‘knife’ flags as recorded by the police, but future work might rely on different ways to assess knife and weapon carrying. Secondly, the boundaries of what constitutes violence with injury might be redefined, moving away from the statutory police definition. Thirdly, we implemented a fixed separation between T1 and T2, but future developments might rely on moving windows or individual-based time windows. This would also make it possible to increase the number of individuals for whom a co-offending network can be built (in our work, this is limited to 27.7% of those who have committed violence at time T2). Finally, it is possible to expand our approach by integrating network information on associates with spatial information on where the crime events took place.
This approach can enable police to limit the time they invest in preventing violence by people who are already unlikely to commit violence with injuries. Like solvability factors, which predict that a past crime will not be solved, these prediction factors can guide police in what cases
We are extremely grateful to Larry Sherman for his intellectual generosity and for motivating us to carry out this study; he has also offered insightful comments on an earlier version of this paper. We would also like to express our sincere gratitude to Merseyside Police and particularly to Chris Gibson and the evidence-based team that have made this research possible. A special thank you goes to Louise Kane and Christopher Wells for extracting the data from the police records management system and preparing the original dataset. Our thanks also go to Alyssa Knisley for her comments on an earlier version of the paper.
PC received financial support from Leverhulme Trust through the Research Grant RPG-2018-119; AG reports support from the Australian Research Council through Discovery Project DP170100429.
Predicting violence in Merseyside: targeted individuals

                  Log odds   Std. error   Probability   Odds ratio
Model A1
  Knife T1           0.359        0.103      0.000***         1.43
  Weapon T1          0.626        0.174      0.000***         1.87
  Constant          −1.313        0.026      0.000***         0.27

Note: Dependent variable = violence at T2. Bootstrap estimations with 500 replications. N obs. = 9206. Significance: *** < 0.001; ** < 0.01; * < 0.05; + < 0.1

Predicting violence in Merseyside: associates only

                           Log odds   Std. error   Probability   Odds ratio
Model A2
  Violence associates T1      0.4796       0.058      0.000***         1.62
  Constant                   −1.378        0.029      0.000***         0.25
Model A3
  Knife associates T1         0.395        0.085      0.000***         1.48
  Weapon associates T1        0.214        0.127      0.093+           1.23
  Constant                   −1.335        0.025      0.000***         0.26

Note: Dependent variable = violence at T2. Bootstrap estimations with 500 replications. N obs. = 9206. Significance: *** < 0.001; ** < 0.01; * < 0.05; + < 0.1
Campana and Varese (
The formula for sensitivity is as follows: true positive/(true positive + false negative).
The formula for specificity is as follows: true negative/(true negative + false positive).
A note on how to interpret logistic coefficients: if the coefficient
In Model A3 in the Appendix, Table
The very high discrepancy in sensitivity and specificity tests for weapon-flagged and knife-flagged indicators is supportive of our findings relative to the weak statistical strength of such regressors in the logit analysis.