picking stocks by graph database (part 2: machine learning)

In our last post, we demonstrated a graph database created to enable study of the stock market, particularly the study of causality relationships. So how to proceed from there? At this stage we want to pick winning stocks, not write an academic paper, so our focus turns toward practical machine learning. Source Data: We start […]

picking stocks by graph database (part one)

Historical stock price data is readily available at daily resolution. So we calculated the Granger causality for each pair of stocks we hold data for, at one- and two-day lags (testing the question “does daily percent change in volume for stock X Granger-cause daily percent change in adjusted close price for stock Y?”). […]
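As a rough illustration of the test described in this excerpt, here is a minimal sketch of a pairwise Granger F statistic in plain Python. It is not the code from the post: the function names (`pct_change`, `granger_f`), the synthetic data, and the choice of an ordinary least squares F test are all assumptions made for illustration. It returns only the F statistic; converting that to a p-value needs an F-distribution CDF from a statistics library.

```python
import random

def pct_change(series):
    """Daily percent change: (today - yesterday) / yesterday."""
    return [(b - a) / a for a, b in zip(series, series[1:])]

def _solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x

def _rss(X, y):
    """Residual sum of squares of the least-squares fit of y on X."""
    k = len(X[0])
    XtX = [[sum(r[i] * r[j] for r in X) for j in range(k)] for i in range(k)]
    Xty = [sum(r[i] * yi for r, yi in zip(X, y)) for i in range(k)]
    beta = _solve(XtX, Xty)
    return sum((yi - sum(b * v for b, v in zip(beta, r))) ** 2
               for r, yi in zip(X, y))

def granger_f(x, y, lag):
    """F statistic for the hypothesis 'x Granger-causes y' at `lag` days.

    Compares y regressed on its own lags (restricted model) against
    y regressed on its own lags plus the lags of x (full model).
    """
    n = len(y)
    target = y[lag:]
    restricted = [[1.0] + [y[t - k] for k in range(1, lag + 1)]
                  for t in range(lag, n)]
    full = [row + [x[t - k] for k in range(1, lag + 1)]
            for row, t in zip(restricted, range(lag, n))]
    rss_r, rss_f = _rss(restricted, target), _rss(full, target)
    dof = len(target) - (2 * lag + 1)  # observations minus full-model parameters
    return ((rss_r - rss_f) / lag) / (rss_f / dof)

# Synthetic stand-ins for the two change series (hypothetical data):
# y follows x with a one-day delay, so x should Granger-cause y.
random.seed(0)
x = [random.gauss(0, 1) for _ in range(300)]
y = [0.0] + [0.8 * x[t - 1] + random.gauss(0, 0.5) for t in range(1, 300)]

for lag in (1, 2):
    print(lag, granger_f(x, y, lag), granger_f(y, x, lag))
```

With real data, the volume series for stock X and the adjusted close series for stock Y would each go through `pct_change` first, and the loop would run over every ordered pair of stocks.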

Bayesian method for filtering out mRNA turnover rate bias from siRNA knockdown measurements

Abstract: siRNA performance prediction calculations for a given siRNA may be divided into two broad categories: functions of the siRNA’s sequence, hereafter referred to as “intrinsic” properties of the siRNA, and functions of the target mRNA, hereafter referred to as “extrinsic” properties of the siRNA. When training a statistical or machine learning model to select […]

on leadership: dead reckoning

Sometimes circumstances require that you calculate your position using no information other than knowledge of your previous direction and distance traveled. Of course, this statement specifically refers to marine navigation, but it serves as a rather good metaphor for life and leadership. Two years ago I became “Emily”, drawing courage only from deep confidence in […]

RNAfold’s and RNAcofold’s predicted dG correlates with sequence length

This seems rather obvious, but I decided to double check before building a machine learning model based on RNAfold’s and RNAcofold’s predictions involving sequences of varying length. Method: I generated 30,000 random RNA sequences of random length between 15 and 30 bases. I ran RNAfold on this list, and RNAcofold on this same list where […]
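The sequence-generation step in this excerpt can be sketched as follows. This is an illustrative guess, not the post's code: the function name `random_rna_sequences`, the uniform choice of bases, and the uniform choice of lengths are assumptions, since the excerpt does not say how the randomness was drawn.

```python
import random

def random_rna_sequences(count, min_len=15, max_len=30, seed=None):
    """Return `count` random RNA strings with lengths in [min_len, max_len].

    Assumes each base (A, C, G, U) and each length is equally likely.
    """
    rng = random.Random(seed)
    return [
        "".join(rng.choice("ACGU") for _ in range(rng.randint(min_len, max_len)))
        for _ in range(count)
    ]

sequences = random_rna_sequences(30000, seed=1)
print(len(sequences), min(map(len, sequences)), max(map(len, sequences)))
```

Passing a fixed `seed` makes the list reproducible, which helps when the same sequences need to go through two different tools.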

how I make a living: what is bioinformatics? (part #1)

I’m constantly asked to explain what I do for a living. Here is an attempt to do so in laypersons’ terms. I’ll assume my readers are non-scientists and non-engineers, but that they’ve taken a high school biology class. “Bioinformatics” is the application of mathematics and computer science to biological data, particularly molecular biology data. By […]

church to bar ratio, by U.S. county (3rd edition)

Church to bar ratio by county from U.S. Census Bureau data: The brighter the color, the higher the church to bar ratio. Counties missing data necessary for the computation are shown in black. Method: From the 2013 County Business Patterns data published at http://www.census.gov/econ/cbp/download/, I extracted the number of establishments in each county that have […]

DIY Twitter analytics (part 3: hashtag network)

I’ve been mathematically analyzing my Twitter feed to determine how best to position my tweets for maximum impact, and have been documenting the work on this blog. While I’ve not come to any brilliant conclusions yet, I’ve made progress. My first post on the subject described clustering my followers by their hashtag use to see […]