This analysis may exceed the bounds of my statistics knowledge, but I will deliver it anyway in the name of “process” blogging. I welcome experienced critique of the method!
A modest positive correlation exists between a county’s industrial diversity and its percent change in unemployment rate over the period 2007-2010.
Several months ago I reported using an ecological species diversity equation to calculate a US county’s “industrial diversity”, producing the map shown below illustrating the degree of diversity of industries operating in each county. The computation method is detailed here.
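For readers who have not seen the earlier post, here is a minimal sketch of what a diversity calculation of this kind can look like. It assumes the Shannon index, a common ecological diversity measure; the equation actually used is the one in the linked post and may differ. The function name, the use of employment counts as "abundance", and the example counties are all hypothetical.

```python
import math

def shannon_diversity(employment_by_industry):
    """Shannon index H = -sum(p_i * ln p_i) over industries with nonzero employment.

    Assumption: industries play the role of species and employment counts
    play the role of abundance. This is an illustration, not the author's code.
    """
    total = sum(employment_by_industry.values())
    if total <= 0:
        raise ValueError("county has no recorded employment")
    h = 0.0
    for count in employment_by_industry.values():
        if count > 0:
            p = count / total
            h -= p * math.log(p)
    return h

# Hypothetical counties with made-up employment counts
one_industry = {"mining": 1000}
even_mix = {"mining": 250, "retail": 250, "health": 250, "farming": 250}

print(shannon_diversity(one_industry))  # a single industry gives zero diversity
print(shannon_diversity(even_mix))      # four equal industries give ln(4), about 1.386
```

Under this measure, a county scores higher when it has more industries and when employment is spread more evenly among them.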
Using county-level unemployment data from the Bureau of Labor Statistics (http://www.bls.gov/lau/tables.htm), I computed the percent change in each county’s unemployment rate from 2007 to 2010. A histogram of these values looks (by eyeball) roughly gamma-distributed:
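The percent-change step can be sketched as follows. This is only an illustration: the function name, the county labels, and the rates are made up, and the real analysis reads the rates from the BLS tables.

```python
def percent_change(rate_2007, rate_2010):
    """Percent change in the unemployment rate from 2007 to 2010."""
    if rate_2007 == 0:
        raise ValueError("2007 rate is zero; percent change is undefined")
    return 100.0 * (rate_2010 - rate_2007) / rate_2007

# Hypothetical unemployment rates (in percent) for 2007 and 2010
rates = {
    "County A": (4.0, 9.0),
    "County B": (5.0, 5.0),
    "County C": (6.0, 4.5),
}

changes = {name: percent_change(r07, r10) for name, (r07, r10) in rates.items()}
print(changes)  # {'County A': 125.0, 'County B': 0.0, 'County C': -25.0}
```

Note that a county whose rate goes from 4% to 9% shows a 125% change, not a 5% change: the measure is relative to the 2007 rate.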
Model Including Outliers
I then removed the cases with negative or zero percent change (16 of 3139 counties), since the gamma distribution is defined only for positive values, and fit a gamma regression. In the regression model, the dependent variable (change in unemployment) and the independent variable (industrial diversity) were paired by county.
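To make the regression step concrete, here is a small hand-rolled sketch of a gamma regression with a log link and a single predictor, fit by iteratively reweighted least squares. It is an assumption-laden illustration, not the code used for this post: the function name and the data are hypothetical, and in practice a statistics library (for example, the GLM class in statsmodels with a gamma family) would do this work and also report standard errors and p-values, which this sketch does not.

```python
import math

def fit_gamma_log_link(x, y, iterations=50, tol=1e-10):
    """Fit E[y] = exp(b0 + b1 * x) by iteratively reweighted least squares.

    For a gamma family with a log link the IRLS weights are constant,
    so each step is an ordinary least-squares fit of the working
    response z = eta + (y - mu) / mu on x.
    """
    n = len(x)
    x_mean = sum(x) / n
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    # Start from a flat line at the log of the mean response
    b0, b1 = math.log(sum(y) / n), 0.0
    for _ in range(iterations):
        eta = [b0 + b1 * xi for xi in x]
        mu = [math.exp(e) for e in eta]
        z = [e + (yi - m) / m for e, yi, m in zip(eta, y, mu)]
        z_mean = sum(z) / n
        new_b1 = sum((xi - x_mean) * (zi - z_mean) for xi, zi in zip(x, z)) / sxx
        new_b0 = z_mean - new_b1 * x_mean
        converged = abs(new_b0 - b0) + abs(new_b1 - b1) < tol
        b0, b1 = new_b0, new_b1
        if converged:
            break
    return b0, b1

# Hypothetical (diversity, percent change) pairs, one per county
pairs = [(0.5, 40.0), (1.0, 55.0), (1.5, -3.0), (2.0, 90.0), (2.5, 0.0), (3.0, 150.0)]

# Keep only counties with a strictly positive percent change
kept = [(d, c) for d, c in pairs if c > 0]
x = [d for d, _ in kept]
y = [c for _, c in kept]

intercept, slope = fit_gamma_log_link(x, y)
print(len(kept), intercept, slope)
```

Pairing by county matters here: the filter drops the diversity value together with its percent change, so the two lists stay aligned.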
The low p-values prompt us to reject the null hypothesis that the regression coefficients equal zero, so we conclude that an association exists. Plotting the fitted curve (back-transformed through the log link) shows the modest correlation:
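"Adjusting for the log link" means the model is linear in the logarithm of the expected value, so the fitted line has to be exponentiated before it is drawn on the original scale. A minimal sketch, using hypothetical coefficients rather than the fitted values from this analysis:

```python
import math

def predicted_mean(intercept, slope, diversity):
    """Invert the log link: the model is linear in log(mean), so exponentiate."""
    return math.exp(intercept + slope * diversity)

# Hypothetical coefficients, NOT the fitted values from this analysis
intercept, slope = 4.0, 0.1

# Evaluate the curve on a grid of diversity values from 0.0 to 3.0
grid = [i * 0.5 for i in range(7)]
curve = [(d, predicted_mean(intercept, slope, d)) for d in grid]

for d, m in curve:
    print(f"diversity={d:.1f}  predicted percent change={m:.1f}")
```

One consequence of the log link is that the slope acts multiplicatively: each one-unit increase in diversity multiplies the predicted percent change by exp(slope), so the plotted curve bends slightly rather than being a straight line.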
As expected, the residuals appear gamma-distributed:
Model Excluding Outliers
While performing this analysis, I became concerned that the outliers were inappropriately biasing the model. So I ran the analysis again with the major outliers removed:
The result is fundamentally the same:
Python code used in this analysis is available here.