Ecologists use Shannon entropy to measure species diversity in a given region. Here I apply the same measure to the mix of industries in each US county. In the map below, darker colors indicate greater industrial diversity:
Method
I downloaded the 2009 County Business Patterns data from the US Census Bureau and extracted the number of business establishments in each six-digit NAICS code for each county. Then, for each county, I computed the Shannon entropy of those establishment counts across NAICS codes. Finally, I binned the entropy values into nine color classes to fill in the map above.
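Written out, the per-county quantity is the standard Shannon entropy. The notation here is mine, not the post's: n_i is the number of establishments in NAICS code i, and N is the number of codes.

```latex
H = -\sum_{i=1}^{N} p_i \ln p_i, \qquad p_i = \frac{n_i}{\sum_{j=1}^{N} n_j}
```

H is 0 when every establishment falls in a single code and reaches its maximum, ln N, when establishments are spread evenly over all N codes. Codes with no establishments contribute nothing (by the convention 0 ln 0 = 0). The post doesn't say which logarithm base was used; changing the base only rescales H and leaves the ranking of counties unchanged.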
Code implementing the above-described computations is available on the Badass Data Science wiki.
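The wiki code isn't reproduced here. As a rough stand-in, here is a minimal Python sketch of the same two steps. It assumes the CBP data have already been reduced to (county FIPS, NAICS code, establishment count) rows, one row per county and code. The function names are hypothetical, and so is the quantile-based choice of nine classes, since the post doesn't say how the color breaks were chosen.

```python
import math
from collections import defaultdict


def shannon_entropy(counts):
    """Shannon entropy (natural log) of a list of non-negative counts.

    Zero counts are skipped, following the convention 0 * ln 0 = 0.
    """
    total = sum(counts)
    if total == 0:
        return 0.0
    h = 0.0
    for n in counts:
        if n > 0:
            p = n / total
            h -= p * math.log(p)
    return h


def county_entropies(rows):
    """rows: iterable of (county_fips, naics_code, establishment_count).

    Hypothetical input layout: the real CBP file has more columns and
    must first be filtered down to six-digit NAICS codes.
    """
    by_county = defaultdict(list)
    for fips, _naics, establishments in rows:
        by_county[fips].append(establishments)
    return {fips: shannon_entropy(c) for fips, c in by_county.items()}


def quantile_classes(values, k=9):
    """Assign each value a class 0..k-1 by rank (quantile binning).

    Assumption: the post only says "nine colors", not how breaks were set.
    Tied values may land in adjacent classes depending on input order.
    """
    order = sorted(range(len(values)), key=lambda i: values[i])
    classes = [0] * len(values)
    for rank, i in enumerate(order):
        classes[i] = rank * k // len(values)
    return classes
```

For example, a county whose establishments are split evenly between two NAICS codes gets ln 2 ≈ 0.693, while a county with every establishment in a single code gets 0.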
Acknowledgments
I used the Python mapping method detailed here.

Just ran a regression on the 2009 county unemployment rates vs. the 2009 county industry diversity indices shown above. Found no correlation between the two.
It might be the case, though, that industry diversity is correlated with how much unemployment varies over time. I'd guess a negative correlation, since having all your eggs in one basket would make a county more prone to boom and bust. But you'd need to repeat this analysis annually for a few years, or else go back in time, meaning pull data from earlier years.