church to bar ratio, by U.S. county (3rd edition)

Church to bar ratio by county from U.S. Census Bureau data: The brighter the color, the higher the church to bar ratio. Counties missing data necessary for the computation are shown in black. Method: From the 2013 County Business Patterns data published at http://www.census.gov/econ/cbp/download/, I extracted the number of establishments in each county that have […]

DIY Twitter analytics (part 3: hashtag network)

I’ve been mathematically analyzing my Twitter feed to determine how best to position my tweets for maximum impact, and have been documenting the work on this blog. While I’ve not come to any brilliant conclusions yet, I’ve made progress. My first post on the subject described clustering my followers by their hashtag use to see […]

DIY Twitter analytics (part 1: clustering related users)

I’ve started working with the Twitter API to develop my own Twitter analytics tool chain. My goals are to figure out who the influencers in my subjects are, figure out how best to position my tweets, etc. I could certainly pay for this service, but then I wouldn’t learn any new technical skills in the […]

graph database for heterogeneous biological data

To assist with a project I’m working on, I recently implemented a substantial portion of DisGeNET as a graph database. Furthermore, I added MeSH, OMIM, Entrez, and GO into the database to facilitate linking of data between these sources. Here I briefly describe these data sources, describe graph databases, and then show how use of […]

HRC Corporate Equality Index correlates with Fortune’s 50 most admired companies

The Human Rights Campaign, one of America’s largest civil rights groups, scores companies in its yearly Corporate Equality Index (CEI) according to their treatment of lesbian, gay, bisexual, and transgender employees [1]. The companies automatically evaluated are the Fortune 1000 and American Lawyer’s top 200. Additionally, any sufficiently large private sector organization can request inclusion […]

fast genomic coordinate comparison using PostgreSQL’s geometric operators

PostgreSQL provides operators for comparing geometric data types, for example for computing whether two boxes overlap or whether one box contains another. Such operators are quick compared to similar calculations implemented using normal comparison operators, which I’ll demonstrate below. Here I show use of such geometric data types and operators for determining whether one segment […]
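
As a rough illustration of the idea in this excerpt, here is a minimal Python sketch of the two box predicates it mentions, overlap and containment, applied to genomic segments treated as one-dimensional boxes. The `Segment` class, the function names, and the example coordinates are hypothetical and are not taken from the post. In PostgreSQL itself the corresponding box operators are `&&` (overlaps) and `@>` (contains).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Segment:
    """A closed genomic interval [start, end] on one chromosome (hypothetical type)."""
    chrom: str
    start: int
    end: int

    def __post_init__(self):
        if self.start > self.end:
            raise ValueError("start must not exceed end")


def overlaps(a: Segment, b: Segment) -> bool:
    """True if the two segments share at least one position."""
    return a.chrom == b.chrom and a.start <= b.end and b.start <= a.end


def contains(outer: Segment, inner: Segment) -> bool:
    """True if `inner` lies entirely within `outer`."""
    return (outer.chrom == inner.chrom
            and outer.start <= inner.start
            and inner.end <= outer.end)


# Made-up coordinates, for illustration only.
gene = Segment("chr1", 1_000, 5_000)
exon = Segment("chr1", 1_200, 1_500)
read = Segment("chr1", 4_900, 5_100)

print(overlaps(gene, read))   # True
print(contains(gene, exon))   # True
print(contains(gene, read))   # False
```

Each predicate above needs several scalar comparisons per pair of segments, which is the kind of "normal comparison operator" calculation the excerpt contrasts with a single geometric operator.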

Bayesian network modeling stock price change

Update (29 April 2018): I suspect this result is erroneous in that the graph often shows two arrows between any two given nodes, one inward and one outward. I’ll investigate this further and get back to you… – Emily. Introduction: Taking a cue from the systems biology folks, I decided to model stock price change […]

gene annotation database with MongoDB

After reading Datanami’s recent post “9 Must-Have Skills to Land Top Big Data Jobs in 2015” [1], I decided to round out my NoSQL knowledge by learning MongoDB. I have previously reported NoSQL work with Neo4j on this blog, where I discussed building a gene annotation graph database [2]. Here I build a similar gene […]

clustering stocks by price correlation (part 2)

In my last post, “clustering stocks by price correlation (part 1)”, I performed hierarchical clustering of NYSE stocks by correlation in weekly closing price. I expected the stocks to cluster by industry, and found that they did not. I proposed several explanations for this observation, including that perhaps I chose a poor distance metric for […]

clustering stocks by price correlation (part 1)

I’ve been building my knowledge of clustering techniques to apply to genetic circuit engineering, and decided to try the same tools for stock price analysis. In this post I describe building a hierarchical cluster of stocks by pairwise correlation in weekly price, to see how well the stocks cluster by industry, and compare the derived […]
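
To make the approach in this excerpt concrete, here is a minimal sketch of hierarchical clustering by pairwise price correlation, using only the Python standard library. The excerpt does not say which distance or linkage the post uses, so the choices here (distance = 1 − Pearson correlation, average linkage) are assumptions, and the tickers and prices are invented toy data.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def cluster(prices, n_clusters):
    """Average-linkage agglomerative clustering (assumed distance: 1 - correlation)."""
    dist = {
        frozenset((a, b)): 1.0 - pearson(prices[a], prices[b])
        for a, b in combinations(prices, 2)
    }
    clusters = [frozenset([name]) for name in prices]

    def linkage(c1, c2):
        # Mean pairwise distance between members of the two clusters.
        pairs = [dist[frozenset((a, b))] for a in c1 for b in c2]
        return sum(pairs) / len(pairs)

    # Repeatedly merge the two closest clusters until n_clusters remain.
    while len(clusters) > n_clusters:
        c1, c2 = min(combinations(clusters, 2), key=lambda p: linkage(*p))
        clusters = [c for c in clusters if c not in (c1, c2)] + [c1 | c2]
    return clusters


# Invented weekly closing prices for four hypothetical tickers.
prices = {
    "AAA": [10, 11, 12, 13, 14],
    "BBB": [20, 22, 24, 26, 28.5],
    "CCC": [30, 29, 28, 27, 26],
    "DDD": [15, 14.5, 14, 13, 12.5],
}

result = cluster(prices, n_clusters=2)
print(sorted(sorted(c) for c in result))  # [['AAA', 'BBB'], ['CCC', 'DDD']]
```

With real data, one would compare the resulting groups against industry labels, which is the comparison the excerpt describes.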