Jordan Cashmore, a student at Nottingham Trent University, recently asked me for help determining whether correlation and causality exist between the rise of internet use over the last 15 years and the drop in British crime over the same period. For his dissertation, Jordan proposes, based on criminology theory, that both correlation and causality exist, and he requested help quantifying the proposed relationship with statistics. Using data he provided, I took on the challenge to practice my statistical consulting:

## Results

### Correlation

After accounting for the possible distortions that time-series data can introduce into linear regression, I concluded that internet usage in the UK correlates inversely with British crime over the last 15 years (Pearson’s r of -0.949 with a p-value of 6.079e-10). The correlation is statistically significant.

### Causality

The data is insufficient for statistical (Granger) causal inference. Analysts will have to rely on criminological theory to discern whether a causal relationship exists between the rise of internet use and Britain’s drop in crime.

## Method

I obtained crime victimization counts from the British Crime Survey for the years 1991 through 2009, and used linear interpolation to fill in missing values. This interpolation may introduce noise into the calculations; I chose to accept this risk. For internet usage, the World Bank provides yearly data detailing the percentage of the UK population using the internet.
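The gap-filling step can be sketched as follows. This is a minimal Python illustration, not the R code used for this analysis; the function name `interpolate_missing` and the numbers are made up for the example and are not survey figures. It assumes the years are sorted ascending and that every gap has a known value on both sides.

```python
def interpolate_missing(years, values):
    """Fill None entries by linear interpolation between the nearest known years.

    Assumes `years` is sorted ascending and gaps are interior (hypothetical helper).
    """
    known = [(y, v) for y, v in zip(years, values) if v is not None]
    filled = []
    for year, value in zip(years, values):
        if value is not None:
            filled.append(float(value))
            continue
        before = [p for p in known if p[0] < year]
        after = [p for p in known if p[0] > year]
        if not before or not after:
            raise ValueError(f"cannot interpolate {year}: no known value on both sides")
        (y0, v0), (y1, v1) = before[-1], after[0]
        # Straight line between the two nearest known points.
        filled.append(v0 + (v1 - v0) * (year - y0) / (y1 - y0))
    return filled


# Illustrative data only.
years = [1991, 1992, 1993, 1994, 1995]
counts = [100.0, None, 120.0, None, 90.0]
print(interpolate_missing(years, counts))
```

Each filled value lies exactly on the line between its neighbours, which is why interpolated points can understate year-to-year variation.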

### Correlation

Plotting the two time-series against each other and conducting OLS regression yields:

The regression provides strong evidence of correlation between the two time-series. However, because regressions involving time-series can be dodgy, I computed the Durbin-Watson statistic from the residuals to test for serial correlation. Since the Durbin-Watson statistic exceeds the R² value obtained by regressing the residuals against their lags, I concluded that no serial correlation distorts the model [1].
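For readers who want to see the quantities involved, here is a minimal Python sketch (again, not the R code used for this analysis). It fits a one-variable OLS line, computes the Durbin-Watson statistic from the residuals, and computes the R² of the residuals regressed on their one-step lags. The helper names and the data are hypothetical, and using a single lag is an assumption of the sketch.

```python
def ols_residuals(x, y):
    """Residuals from a simple least-squares fit of y on x (with intercept)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]


def durbin_watson(residuals):
    """Sum of squared successive differences divided by the sum of squares."""
    num = sum((b - a) ** 2 for a, b in zip(residuals, residuals[1:]))
    den = sum(e ** 2 for e in residuals)
    return num / den


def lag_r_squared(residuals):
    """R^2 from regressing each residual on its predecessor (squared Pearson r)."""
    prev, curr = residuals[:-1], residuals[1:]
    n = len(prev)
    mean_p, mean_c = sum(prev) / n, sum(curr) / n
    spp = sum((p - mean_p) ** 2 for p in prev)
    scc = sum((c - mean_c) ** 2 for c in curr)
    spc = sum((p - mean_p) * (c - mean_c) for p, c in zip(prev, curr))
    return spc ** 2 / (spp * scc)


# Illustrative data only, not the survey or World Bank figures.
internet_pct = [1.0, 2.0, 4.0, 7.0, 13.0, 21.0, 27.0, 34.0]
crime_index = [98.0, 99.5, 95.0, 93.5, 88.0, 81.5, 78.0, 71.0]
res = ols_residuals(internet_pct, crime_index)
print("Durbin-Watson:", durbin_watson(res))
print("R^2 of residuals vs. lag:", lag_r_squared(res))
```

The Durbin-Watson statistic ranges from 0 to 4, with values near 2 indicating little first-order serial correlation in the residuals.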

I also ensured the residuals are normally distributed:

After deciding from this analysis that linear models are appropriate, I computed the Pearson and Spearman correlations, along with tests for the null hypothesis that the true correlation is zero:
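As a rough illustration of what these two correlations measure, here is a minimal Python sketch (not the R code used for this analysis; function names and data are made up). Pearson's r measures linear association, Spearman's rho is Pearson's r applied to ranks, and the usual test of zero correlation uses the statistic t = r·sqrt((n − 2)/(1 − r²)), which is compared against a t distribution with n − 2 degrees of freedom. The sketch stops at the t statistic; a statistics library would supply the p-value.

```python
import math


def pearson_r(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    return sxy / math.sqrt(sxx * syy)


def average_ranks(values):
    """1-based ranks, with tied values sharing the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman rank correlation: Pearson's r computed on the ranks."""
    return pearson_r(average_ranks(x), average_ranks(y))


def t_statistic(r, n):
    """Test statistic for H0: true correlation is zero (requires |r| < 1)."""
    return r * math.sqrt((n - 2) / (1 - r * r))


# Illustrative data only.
internet_pct = [1.0, 2.0, 4.0, 7.0, 13.0, 21.0, 27.0, 34.0]
crime_index = [98.0, 99.5, 95.0, 93.5, 88.0, 81.5, 78.0, 71.0]
r = pearson_r(internet_pct, crime_index)
print("Pearson r:", r)
print("Spearman rho:", spearman_rho(internet_pct, crime_index))
print("t statistic:", t_statistic(r, len(internet_pct)))
```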

### Causality

I tested for Granger causality between the differenced time-series with one through four lags, and in both directions. The data showed no Granger causality between the two series. However, this is unsurprising since there is relatively little data to work with (15 years, sampled yearly). Therefore, investigators will have to rely on criminology theory to infer causality; these Granger tests should be treated as inconclusive.
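The idea behind the test can be sketched in Python (this is not the R code used for this analysis, and all names and data here are hypothetical). After differencing, one regresses the target series on its own lags (the restricted model) and on its own lags plus lags of the other series (the unrestricted model), then forms an F statistic from the two residual sums of squares. The sketch stops at the F statistic; the p-value would come from an F distribution with `lags` and `n - 2*lags - 1` degrees of freedom.

```python
def difference(series):
    """First differences of a series."""
    return [b - a for a, b in zip(series, series[1:])]


def solve_linear(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular design matrix")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def rss(design, target):
    """Residual sum of squares of an OLS fit, via the normal equations."""
    k = len(design[0])
    xtx = [[sum(row[i] * row[j] for row in design) for j in range(k)] for i in range(k)]
    xty = [sum(row[i] * t for row, t in zip(design, target)) for i in range(k)]
    beta = solve_linear(xtx, xty)
    return sum((t - sum(b * v for b, v in zip(beta, row))) ** 2
               for row, t in zip(design, target))


def granger_f(x, y, lags):
    """F statistic for the hypothesis that lags of x help predict y."""
    target = y[lags:]
    restricted, unrestricted = [], []
    for t in range(lags, len(y)):
        own = [y[t - i] for i in range(1, lags + 1)]
        other = [x[t - i] for i in range(1, lags + 1)]
        restricted.append([1.0] + own)
        unrestricted.append([1.0] + own + other)
    dof = len(target) - (2 * lags + 1)
    if dof <= 0:
        raise ValueError("too few observations for this many lags")
    rss_r, rss_u = rss(restricted, target), rss(unrestricted, target)
    return ((rss_r - rss_u) / lags) / (rss_u / dof)
```

With yearly data, each added lag costs one observation and two parameters, so the residual degrees of freedom shrink quickly; that is one reason short series give such tests little power.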

## References

1. Marcus Marktanner, Chapter Four of online class notes, http://marcusmarktanner.com/Lecture%20Notes/Applied%20Econometrics/CHAPTER%204%20PERFORMING%20STEPS%20IN%20TIME%20SERIES%20REGRESSION.pdf, Accessed 4 March 2012.

## Code

R code used for this analysis is posted at http://badassdatascience.com/wiki/index.php?title=British_Crime_and_Internet_Use.

Hi – this is a great site. And your online analysis is a great contribution to knowledge progression. That said – I may be wrong, and please correct me if I am – but wasn’t this data supplied to you by an undergraduate student? If so – I think it’s good scholarly etiquette to make the fact known. He’s been working on the idea for over a year, after all. In fact, since reading my blog: http://www.bestthinking.com/thinkers/science/social_sciences/sociology/mike-sutton?tab=blog&blogpostid=9634%2c9634

Good scholars cite each other’s contributions, and he has cited your blog in his dissertation. If it gets a mark above 70 percent it gets published in the Internet Journal of Criminology. And he has an academic article in draft… which is also going to cite your blog. Quid pro quo?

Best wishes

Mike

Of course I’ll change the post to cite the source of my data. Sorry about the oversight!

From 1990 to 1995 the correlation was the opposite; how is that explained? What other macro statistics trended up (or down) steadily from 1995 to 2005 in particular? Purchasing power of the minimum wage perhaps? Average literacy levels? It could be anything. There are no doubt dozens of such statistics which would show an equally strong correlation. You do point out that “analysts will have to rely on criminological theory to discern whether a causal relationship exists” but no such theory is presented, and without it you might as well be talking about correlation to the price of palladium or even a packet of crisps.