This marks the 100th post to badass data science. I’ve written about everything from Lady Gaga to computational fluid dynamics, usually with a science- or data-related spin.
I thought I’d look at my posts analytically rather than simply reminisce. First, here is a tag cloud for the first 99 posts:
From this tag cloud, I can see that either Python or R is used in many posts, and that most posts cover statistical and data science topics. Engineering is also a frequent tag.
I then produced a graph view using NetworkX, where the nodes are tags and an edge connects any two tags that occur in the same post. Displaying this graph as VRML:
It is a little hard to see, so here is a closer view:
In this image one can see that “statistics” is a primary hub. In rotated views of the graph (not shown), Python, R, and data science show similar prominence.
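For readers who want to try this on their own blog, here is a minimal sketch of the tag graph described above. It is not the attached code: the `posts` list is made-up sample data, and the helper names (`cooccurrence_edges`, `degrees`, `to_networkx`) are my own illustrative choices. The counting uses only the standard library; `to_networkx` assumes NetworkX is installed and hands the weighted edges to a `Graph`.

```python
from collections import Counter
from itertools import combinations

# Hypothetical sample data: each inner list holds the tags of one post.
posts = [
    ["python", "statistics", "data science"],
    ["r", "statistics"],
    ["python", "engineering"],
    ["statistics", "engineering", "python"],
]

def cooccurrence_edges(posts):
    """Count how often each unordered pair of tags shares a post."""
    edges = Counter()
    for tags in posts:
        # sorted(set(...)) drops duplicate tags and fixes the pair order
        for a, b in combinations(sorted(set(tags)), 2):
            edges[(a, b)] += 1
    return edges

def degrees(edges):
    """Number of distinct neighbours per tag; hubs have a high degree."""
    neighbours = {}
    for a, b in edges:
        neighbours.setdefault(a, set()).add(b)
        neighbours.setdefault(b, set()).add(a)
    return {tag: len(ns) for tag, ns in neighbours.items()}

def to_networkx(edges):
    """Build a weighted NetworkX graph (assumes networkx is installed)."""
    import networkx as nx
    graph = nx.Graph()
    graph.add_weighted_edges_from((a, b, w) for (a, b), w in edges.items())
    return graph

edges = cooccurrence_edges(posts)
deg = degrees(edges)
hub = max(deg, key=deg.get)  # tag connected to the most other tags
```

In this toy data the hub is whichever tag shares posts with the most other tags, which is the same idea behind spotting “statistics” as a hub in the image.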
Finally, I computed the frequency of the most common tags:
From this I see that I wrote about Python more than R, which surprised me. I expected an even split. However, this matches the fact that I favor Python over R whenever possible, because Python is a full-featured programming language capable of easy string parsing and web deployment. It also looks like the engineering posts and the science posts are split evenly, which accurately reflects my technical interactions with the world.
I do not think my posts are “badass” enough for the blog’s title, so I’ll try to up the ante. Maybe I need something involving sharks and tornadoes. The post on claiming squatters’ rights is the closest I’ve come to my goal of “badass data science”.
I actually want to branch out and write more detailed analyses of less technical things, such as policy. Or take highly technical topics, such as synthetic biology, and write about them in a layperson’s voice.
Most importantly, I’ll just keep writing. I’m bound to hit on something good.
Code used to create the VRML shown above is attached.