simulating RNA-seq read counts

The Challenge: I want to explore the statistics of RNA sequencing (RNA-seq) on next-generation sequencing (NGS) platforms in greater detail, so I thought I’d start by simulating read counts to experiment with. This post details how I constructed a simulated set of read counts, examines its concordance with the expected negative binomial distribution of the […]
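
As a rough illustration of what simulating negative binomial read counts can look like, here is a minimal sketch that draws counts from a gamma-Poisson mixture, which is one standard way to obtain a negative binomial. It is not necessarily the construction the post uses; the function names, the mean of 20, and the dispersion of 0.2 are assumptions chosen only for the example.

```python
import math
import random


def poisson_draw(rng, lam):
    """Draw one Poisson(lam) variate with Knuth's multiplication method.

    Suitable for modest rates only; exp(-lam) underflows for very large lam.
    """
    threshold = math.exp(-lam)
    k, prod = 0, rng.random()
    while prod > threshold:
        k += 1
        prod *= rng.random()
    return k


def nb_read_count(rng, mean, dispersion):
    """Draw one negative binomial count as a gamma-Poisson mixture.

    The Poisson rate is Gamma(shape=1/dispersion, scale=mean*dispersion),
    so the count has variance mean + dispersion * mean**2.
    """
    rate = rng.gammavariate(1.0 / dispersion, mean * dispersion)
    return poisson_draw(rng, rate)


# Hypothetical parameters for a single gene, for illustration only.
rng = random.Random(0)
counts = [nb_read_count(rng, mean=20.0, dispersion=0.2) for _ in range(5000)]

sample_mean = sum(counts) / len(counts)
sample_var = sum((c - sample_mean) ** 2 for c in counts) / (len(counts) - 1)
print(f"mean={sample_mean:.2f} variance={sample_var:.2f}")
```

The sample variance should come out well above the sample mean, which is the overdispersion that distinguishes negative binomial counts from plain Poisson counts.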

net change of zero between closing and opening stock prices

I decided to investigate the variation between each trading day’s closing price and the following trading day’s opening price for stocks listed on the New York Stock Exchange. I started with data in the following format for all trading days between January 2nd 2000 and October 30th 2014. I then calculated the percent change between one […]
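
The close-to-open calculation can be sketched as follows. This assumes the percent change is taken relative to the prior day's close, which the excerpt does not spell out; the function name and the prices are made up for illustration.

```python
def overnight_pct_changes(closes, opens):
    """Percent change from each day's close to the next day's open.

    Assumes both lists are ordered by trading day and have equal length.
    """
    return [
        100.0 * (next_open - close) / close
        for close, next_open in zip(closes[:-1], opens[1:])
    ]


# Hypothetical prices for three consecutive trading days.
closes = [100.0, 102.0, 101.0]
opens = [99.0, 101.0, 102.0]

changes = overnight_pct_changes(closes, opens)
print(changes)
```

Three trading days yield two overnight changes, because the last close has no following open to compare against.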

EC2 spot instance price change: no correlation with day of week

My plans for world domination involve heavy use of Amazon EC2 instances, but I have to be frugal about it so I’m running spot instances to save cash. Therefore a means of forecasting spot instance prices would be helpful. Thus far I’ve had little success using mainstream forecasting tools such as ARIMA and exponential smoothing. […]

Apache Spark and stock price causality

The Challenge: I wanted to compute Granger causality (described below) for each pair of stocks listed on the New York Stock Exchange. Moreover, I wanted to analyze between one and thirty lags for each pair’s comparison. Needless to say, this requires massive computing power. I used Amazon EC2 as the computing platform, but needed a […]

hacking the stock market (part 1)

Caveat: I am not a technical investor, just a hobbyist, so take this analysis with a grain of salt. I am also just beginning my Master’s work in statistics. I wanted to examine the correlation between changes in the daily closing price of the Dow Jones Industrial Average (DJIA) and lags of those changes, to […]
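
Correlating daily changes with their own lags can be sketched like this. It uses a plain Pearson correlation between the change series and a shifted copy of itself; the function names and the closing prices are made up for illustration and are not DJIA data.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (spread_x * spread_y)


def lagged_correlation(changes, lag):
    """Correlation between each change and the change `lag` days earlier."""
    return pearson(changes[lag:], changes[:-lag])


# Hypothetical closing prices that alternate up and down by one point.
closes = [100.0, 101.0, 100.0, 101.0, 100.0, 101.0, 100.0]
changes = [today - prev for prev, today in zip(closes, closes[1:])]

lag1 = lagged_correlation(changes, 1)
lag2 = lagged_correlation(changes, 2)
print(lag1, lag2)
```

For this toy series each change is the exact opposite of the previous one, so the lag-1 correlation is strongly negative and the lag-2 correlation strongly positive. Real index data would be far noisier.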

building a web-enabled temperature logger

Not wanting to miss out on the “Internet of Things”, I decided to learn some of its foundational technology, namely microprocessor programming. Actually, I used a Raspberry Pi in this project instead of a classic microprocessor, but the idea is the same. Here I describe building a web-enabled temperature logger, complete with a web application […]

100th post to badass data science

This marks the 100th post to badass data science. I’ve written about everything from Lady Gaga to computational fluid dynamics, usually with a science or data-related spin. I thought I’d look at my posts analytically rather than simply reminisce. First, here is a tag cloud for the first 99 posts. From this tag cloud, […]

pyDome updates: tangential and spoke angles

In a previous post, I introduced pyDome, a Python program for calculating geodesic dome vertices, chords, and faces. I have since added two hub angle computations to the program, and report on that progress here. Face angle calculations still need to be implemented. Angles Between Chords and the Hub Tangent Plane: The angle between a […]

maximized entropy of a finite distribution

I received the following tweet yesterday from @ProbFact and decided to check it out in more detail. Two-Dimensional Case: I generated the following test to investigate the claim: Create four-category discrete distributions where two of the categories have 0.25 probability each, and the third category probability varies between 0.1 and 0.4. The fourth category’s […]
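
The test described above can be sketched as a small sweep over the third category's probability. This assumes the fourth category takes whatever probability mass remains (0.5 minus the third), which the truncated excerpt implies but does not state; the function names and the 0.01 step size are also assumptions.

```python
import math


def shannon_entropy(probs):
    """Shannon entropy in bits; zero-probability categories contribute nothing."""
    return -sum(p * math.log2(p) for p in probs if p > 0)


def four_category(p3):
    """Two categories fixed at 0.25, a third at p3, a fourth taking the rest.

    Assumption: the fourth probability is the remainder, 0.5 - p3.
    """
    return [0.25, 0.25, p3, 0.5 - p3]


# Sweep the third category from 0.10 to 0.40 in steps of 0.01.
grid = [round(0.10 + 0.01 * i, 2) for i in range(31)]
entropies = {p3: shannon_entropy(four_category(p3)) for p3 in grid}

best_p3 = max(entropies, key=entropies.get)
print(best_p3, entropies[best_p3])
```

Under these assumptions the sweep peaks where all four categories are equal, at 2 bits, which is the largest entropy a four-category distribution can have.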