Jordan Cashmore, a student at Nottingham Trent University, recently asked me for help determining if correlation and causality exist between the rise of internet use over the last 15 years and the drop in British crime over the same period. Jordan, for his dissertation, proposes that correlation and causality do exist based on criminology theory, and requested help quantifying the proposed relationship with statistics. Using data he provided, I took on the challenge to practice my statistical consulting:
After accounting for the possible distortions in linear models brought about by regressions involving time-series data, I concluded that internet usage in the UK correlates inversely with British crime over the last 15 years (Pearson’s r of -0.949 with a p-value of 6.079e-10). The correlation is statistically significant.
The data is insufficient for statistical (Granger) causal inference. Analysts will have to rely on criminological theory to discern whether a causal relationship exists between the rise of internet use and Britain’s drop in crime.
I obtained crime victimization counts from the British Crime Survey for the years 1991 through 2009, and used linear interpolation to fill in missing values. This interpolation may introduce noise into the calculations; I chose to accept this risk. For internet usage, the World Bank provides yearly data detailing the percentage of the UK population using the internet.
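To make the interpolation step concrete, here is a minimal sketch in Python (the analysis itself was done in R). The function name `interpolate_missing` and the numbers are made up for illustration; they are not the British Crime Survey figures.

```python
def interpolate_missing(years, values):
    """Fill None entries by linear interpolation between the nearest known neighbours."""
    known = [(y, v) for y, v in zip(years, values) if v is not None]
    filled = []
    for year, value in zip(years, values):
        if value is not None:
            filled.append(float(value))
            continue
        # Nearest known observation on each side of the gap.
        left = max((p for p in known if p[0] < year), key=lambda p: p[0], default=None)
        right = min((p for p in known if p[0] > year), key=lambda p: p[0], default=None)
        if left is None or right is None:
            raise ValueError(f"cannot interpolate {year}: no known value on both sides")
        (y0, v0), (y1, v1) = left, right
        filled.append(v0 + (v1 - v0) * (year - y0) / (y1 - y0))
    return filled


# Hypothetical survey counts (thousands), with gaps in the non-survey years.
years = [1991, 1992, 1993, 1994, 1995]
counts = [15000, None, 18000, None, 19000]
filled = interpolate_missing(years, counts)
print(filled)
```

Interpolated points lie exactly on a straight line between their neighbours, so they carry no independent information. That is one reason to treat the later significance tests with some caution.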
Plotting the two time-series against each other and conducting OLS regression yields:
The regression provides strong evidence of correlation between the two time-series. However, because regressions involving time-series can be dodgy, I computed the Durbin-Watson statistic from the residuals to test for serial correlation. Since the Durbin-Watson statistic exceeds the R² value of the residuals vs. their lags, I concluded that no serial correlation distorts the model [1].
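For readers who want to see the mechanics, the following is a rough Python sketch (not the R code used for this analysis) of a one-variable OLS fit followed by the Durbin-Watson statistic on its residuals. The helper names and the data are invented for illustration.

```python
def ols_fit(x, y):
    """Simple one-variable OLS; returns slope, intercept and residuals."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
    return slope, intercept, residuals


def durbin_watson(residuals):
    """Sum of squared successive differences divided by the sum of squared residuals."""
    numerator = sum((residuals[t] - residuals[t - 1]) ** 2
                    for t in range(1, len(residuals)))
    denominator = sum(e ** 2 for e in residuals)
    return numerator / denominator


# Made-up series: percent of population online, and crime counts in millions.
internet_pct = [5, 10, 20, 30, 45, 60, 70, 80]
crime = [19.0, 18.1, 16.9, 15.2, 13.8, 12.1, 11.4, 10.0]

slope, intercept, residuals = ols_fit(internet_pct, crime)
dw = durbin_watson(residuals)
print(f"slope={slope:.4f} intercept={intercept:.4f} Durbin-Watson={dw:.3f}")
```

The statistic always falls between 0 and 4. Values near 2 suggest little first-order serial correlation, values toward 0 suggest positive serial correlation, and values toward 4 suggest negative serial correlation.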
I also checked that the residuals are approximately normally distributed:
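One numerical way to run such a check is the Jarque-Bera statistic, sketched below in Python. This is an assumption on my part for illustration, not necessarily the check used in the R code, and the residual values are invented. With so few observations the chi-square approximation is rough, so treat the p-value as indicative only.

```python
import math


def jarque_bera(residuals):
    """Return skewness, kurtosis, the Jarque-Bera statistic and its asymptotic p-value."""
    n = len(residuals)
    mean = sum(residuals) / n
    centered = [e - mean for e in residuals]
    m2 = sum(c ** 2 for c in centered) / n
    m3 = sum(c ** 3 for c in centered) / n
    m4 = sum(c ** 4 for c in centered) / n
    skewness = m3 / m2 ** 1.5
    kurtosis = m4 / m2 ** 2          # equals 3 for a normal distribution
    jb = n / 6 * (skewness ** 2 + (kurtosis - 3) ** 2 / 4)
    # Under normality JB is asymptotically chi-square with 2 degrees of freedom,
    # whose survival function is exp(-x / 2).
    p_value = math.exp(-jb / 2)
    return skewness, kurtosis, jb, p_value


# Hypothetical regression residuals.
example_residuals = [0.3, -0.5, 0.1, 0.4, -0.2, -0.1, 0.6, -0.4, 0.0, -0.2]
skewness, kurtosis, jb, p_value = jarque_bera(example_residuals)
print(f"skewness={skewness:.3f} kurtosis={kurtosis:.3f} JB={jb:.3f} p={p_value:.3f}")
```

A large p-value means the test finds no evidence against normality; it does not prove the residuals are normal.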
After deciding from this analysis that linear models are appropriate, I computed the Pearson and Spearman correlations, along with tests for the null hypothesis that the true correlation is zero:
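As a rough illustration of these computations, here is a Python sketch (again, not the R code used for the analysis) of the Pearson and Spearman coefficients, plus the t statistic commonly used to test the null hypothesis of zero correlation. The function names and data are invented; turning the t statistic into a p-value requires the t distribution with n - 2 degrees of freedom, which I leave to a statistics package.

```python
import math


def pearson_r(x, y):
    """Pearson product-moment correlation coefficient."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def ranks(values):
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def spearman_rho(x, y):
    """Spearman rank correlation: the Pearson correlation of the ranks."""
    return pearson_r(ranks(x), ranks(y))


def t_statistic(r, n):
    """t statistic for H0: true correlation is zero (undefined when |r| = 1)."""
    return r * math.sqrt((n - 2) / (1 - r * r))


# Made-up series: percent of population online, and crime counts in millions.
internet_pct = [5, 10, 20, 30, 45, 60, 70, 80]
crime = [19.0, 18.1, 16.9, 15.2, 13.8, 12.1, 11.4, 10.0]

r = pearson_r(internet_pct, crime)
rho = spearman_rho(internet_pct, crime)
t = t_statistic(r, len(crime))
print(f"Pearson r={r:.4f} (t={t:.2f}), Spearman rho={rho:.4f}")
```

Spearman only looks at rank order, so it is less sensitive than Pearson to the relationship being monotone but not perfectly linear.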
I tested for Granger causality between the differenced time-series with one through four lags, and in both directions. The data showed no Granger causality between the two series. However, this is unsurprising since there is relatively little data to work with (15 years, sampled yearly). Therefore, investigators will have to rely on criminology theory to infer causality; these Granger tests should be treated as inconclusive.
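The idea behind a Granger test can be sketched in a few lines of Python (a simplified illustration, not the R routine used here). It compares a regression of a series on its own lags with one that also includes lags of the other series, and forms an F statistic from the two residual sums of squares. All names and data below are hypothetical.

```python
def difference(series):
    """First differences of a series."""
    return [b - a for a, b in zip(series, series[1:])]


def least_squares_rss(rows, targets):
    """Residual sum of squares of an OLS fit; each row already includes the constant."""
    k = len(rows[0])
    # Augmented normal equations: (X'X | X'y).
    aug = [[sum(r[i] * r[j] for r in rows) for j in range(k)]
           + [sum(r[i] * t for r, t in zip(rows, targets))] for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda idx: abs(aug[idx][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for row in range(k):
            if row != col:
                factor = aug[row][col] / aug[col][col]
                aug[row] = [a - factor * b for a, b in zip(aug[row], aug[col])]
    beta = [aug[i][k] / aug[i][i] for i in range(k)]
    return sum((t - sum(b * v for b, v in zip(beta, r))) ** 2
               for r, t in zip(rows, targets))


def granger_f(cause, effect, lags=1):
    """F statistic for H0: lagged `cause` adds nothing to predicting `effect`."""
    time_points = range(lags, len(effect))
    targets = [effect[t] for t in time_points]
    restricted = [[1.0] + [effect[t - l] for l in range(1, lags + 1)]
                  for t in time_points]
    unrestricted = [row + [cause[t - l] for l in range(1, lags + 1)]
                    for row, t in zip(restricted, time_points)]
    rss_r = least_squares_rss(restricted, targets)
    rss_u = least_squares_rss(unrestricted, targets)
    dof = len(targets) - (2 * lags + 1)   # observations minus fitted parameters
    return ((rss_r - rss_u) / lags) / (rss_u / dof), dof


# Made-up yearly levels, differenced before testing.
internet = [1, 2, 4, 7, 13, 21, 27, 34, 46, 56, 65, 70]
crime = [19.1, 18.6, 17.9, 16.4, 15.7, 14.1, 13.5, 12.2, 11.8, 10.9, 10.1, 9.6]
d_internet, d_crime = difference(internet), difference(crime)

f_forward, dof_forward = granger_f(d_internet, d_crime, lags=1)
f_reverse, dof_reverse = granger_f(d_crime, d_internet, lags=1)
print(f"internet -> crime: F={f_forward:.3f} on (1, {dof_forward}) df")
print(f"crime -> internet: F={f_reverse:.3f} on (1, {dof_reverse}) df")
```

Each extra lag costs one observation and two parameters. With roughly 15 yearly points, the residual degrees of freedom shrink quickly as lags increase, which illustrates why the tests have so little power on this data.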
1. Marcus Marktanner, Chapter Four of online class notes, http://marcusmarktanner.com/Lecture%20Notes/Applied%20Econometrics/CHAPTER%204%20PERFORMING%20STEPS%20IN%20TIME%20SERIES%20REGRESSION.pdf, Accessed 4 March 2012.
R code used for this analysis is posted at http://badassdatascience.com/wiki/index.php?title=British_Crime_and_Internet_Use.