The Challenge I want to explore the statistics of RNA sequencing (RNA-seq) on next-generation sequencing (NGS) platforms in greater detail, so I thought I’d start by simulating read counts to experiment with. This post details how I constructed a simulated set of read counts, examines its concordance with the expected negative binomial distribution of the […]
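
To make the idea of simulated read counts concrete, here is a minimal Python sketch, not the post's own code. It draws negative binomial counts as a gamma-Poisson mixture using only the standard library. The mean of 50, the dispersion ("size") of 5, and the helper names `poisson` and `negative_binomial` are all assumptions chosen for illustration.

```python
import math
import random
import statistics

def poisson(rng, lam):
    """Knuth's multiplication method; adequate for the modest means used here."""
    threshold = math.exp(-lam)
    k, product = 0, rng.random()
    while product > threshold:
        k += 1
        product *= rng.random()
    return k

def negative_binomial(rng, mean, size):
    """Gamma-Poisson mixture with variance mean + mean**2 / size."""
    lam = rng.gammavariate(size, mean / size)  # shape, scale: E[lam] == mean
    return poisson(rng, lam)

rng = random.Random(42)
mean, size = 50.0, 5.0  # hypothetical expression level and dispersion
counts = [negative_binomial(rng, mean, size) for _ in range(5000)]

sample_mean = statistics.fmean(counts)
sample_var = statistics.variance(counts)
print(f"sample mean     {sample_mean:.1f} (expected {mean})")
print(f"sample variance {sample_var:.1f} (expected {mean + mean ** 2 / size})")
```

The variance should come out well above the mean, which is the overdispersion that separates negative binomial counts from Poisson counts.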

## overfitting in statistics and machine learning (part one)

Overfitting is a common risk when designing statistical and machine-learning models. Here I give a brief demonstration of overfitting in action, using simple regression models. A later post will more rigorously address how to quantify and avoid overfitting. We start by sampling data from the process using the R code: Then we produce a linear […]
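
A small Python sketch of the same idea follows (the post itself uses R, and its data-generating process is not shown in this excerpt). It assumes a linear process with Gaussian noise and fits polynomials of increasing degree with a hand-rolled least-squares solver. The function names, the noise level, and the degrees tried are illustrative choices, not the author's.

```python
import random

def fit_polynomial(xs, ys, degree):
    """Least-squares polynomial fit via the normal equations (fine for low degrees)."""
    n = degree + 1
    # Augmented matrix [X^T X | X^T y].
    a = [[sum(x ** (i + j) for x in xs) for j in range(n)]
         + [sum(y * x ** i for x, y in zip(xs, ys))]
         for i in range(n)]
    # Gaussian elimination with partial pivoting.
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            for c in range(col, n + 1):
                a[r][c] -= factor * a[col][c]
    # Back substitution.
    coeffs = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(a[i][j] * coeffs[j] for j in range(i + 1, n))
        coeffs[i] = (a[i][n] - tail) / a[i][i]
    return coeffs  # coeffs[k] multiplies x**k

def mse(coeffs, xs, ys):
    preds = [sum(c * x ** k for k, c in enumerate(coeffs)) for x in xs]
    return sum((p - y) ** 2 for p, y in zip(preds, ys)) / len(xs)

def sample(n, rng):
    """Assumed process: y = 2x + Gaussian noise."""
    xs = [rng.uniform(-1, 1) for _ in range(n)]
    ys = [2 * x + rng.gauss(0, 0.5) for x in xs]
    return xs, ys

rng = random.Random(1)
train_x, train_y = sample(12, rng)
test_x, test_y = sample(200, rng)

results = {}
for degree in (1, 3, 6):
    coeffs = fit_polynomial(train_x, train_y, degree)
    results[degree] = (mse(coeffs, train_x, train_y), mse(coeffs, test_x, test_y))
    print(f"degree {degree}: train MSE {results[degree][0]:.3f}, "
          f"test MSE {results[degree][1]:.3f}")
```

Training error can only fall as the degree rises, because each model contains the previous one. Whether held-out error rises along the way is what reveals overfitting.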

## simulated confidence intervals

In my last post, I demonstrated how repeated sampling from a probability distribution with finite variance produces an approximately normal distribution of sample means, given a sufficiently large sample size. Here I describe how to use this distribution of sample means to define a confidence interval around the mean of any given sample, and simulate production of such […]
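
One way to check such intervals by simulation is sketched below in Python. This is an illustration under stated assumptions rather than the post's own code: samples come from an exponential distribution with a hypothetical true mean of 3.0, and each interval is the sample mean plus or minus 1.96 standard errors.

```python
import math
import random
import statistics

def interval_covers(rng, n, true_mean, z=1.96):
    """Draw one sample and report whether its ~95% interval contains true_mean."""
    sample = [rng.expovariate(1.0 / true_mean) for _ in range(n)]
    centre = statistics.fmean(sample)
    half_width = z * statistics.stdev(sample) / math.sqrt(n)
    return centre - half_width <= true_mean <= centre + half_width

rng = random.Random(7)
trials = 2000
hits = sum(interval_covers(rng, n=50, true_mean=3.0) for _ in range(trials))
coverage = hits / trials
print(f"{coverage:.1%} of {trials} simulated intervals contained the true mean")
```

For a skewed distribution and a sample size of 50, coverage typically lands a little under the nominal 95% and moves toward it as the sample size grows.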

## SWIG, C++, Python, and Monte-Carlo simulation

In the previous post, I introduced MCS-libre, my C++ library for Monte-Carlo simulation. Here I show how to access it from Python using the Simplified Wrapper and Interface Generator (SWIG), while in the process demonstrating how to use SWIG with C++ classes. First we download and decompress the MCS-libre library code: Next, we create a SWIG […]

## monte-carlo simulation in C++ with MCS-libre

Monte-Carlo simulation is a sometimes elegant (and sometimes crude) method for simulating complex systems. Parameters that affect the system are selected from random distributions and the system response to these values is then calculated. Repeating this process many times builds up a distribution of responses that often reveals useful information about the system. The method is especially useful for examining non-linear systems […]
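
The loop described above can be sketched in a few lines of Python. This is a generic illustration, not the MCS-libre API: the `system_response` function, its two parameters, and their distributions are all hypothetical.

```python
import random
import statistics

def system_response(gain, load):
    """Hypothetical non-linear system: response grows with the square of load."""
    return gain * load ** 2

def simulate(rng, runs):
    responses = []
    for _ in range(runs):
        gain = rng.uniform(0.5, 1.5)  # assumed parameter distribution
        load = rng.gauss(1.0, 0.2)    # assumed parameter distribution
        responses.append(system_response(gain, load))
    return responses

rng = random.Random(0)
responses = simulate(rng, 20000)
mean_response = statistics.fmean(responses)
at_mean_inputs = system_response(1.0, 1.0)

print(f"mean simulated response: {mean_response:.3f}")
print(f"response at mean inputs: {at_mean_inputs:.3f}")
print(f"spread (std dev):        {statistics.stdev(responses):.3f}")
```

Because the response is non-linear, the mean simulated response differs from the response at the mean inputs. A single calculation at the average parameter values would miss that gap.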

## simulating a synthetic biology circuit with system dynamics

McAdams and Arkin report the following synthetic biology oscillator circuit in their paper “Gene regulation: Towards a circuit engineering discipline” [1]: The circuit works by having gene R1’s protein inhibit production of R3, whose protein inhibits production of R2, which in turn inhibits production of R1. Delays in the inhibition processes cause sufficient expression of […]
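
A rough Python sketch of a ring of three mutually inhibiting genes is given below. It is not the system dynamics model from the post. As a stand-in for the explicit delays, it assumes steep Hill-type repression with first-order protein decay, integrated with a simple Euler step. The rate constants, the Hill coefficient, and the starting levels are hypothetical.

```python
def repression(inhibitor, max_rate=50.0, hill=4):
    """Assumed production rate of a gene, falling as its inhibitor accumulates."""
    return max_rate / (1.0 + inhibitor ** hill)

def simulate(steps=6000, dt=0.01, start=(1.0, 2.0, 3.0)):
    """Euler integration of the ring: R1 -| R3 -| R2 -| R1."""
    r1, r2, r3 = start
    history = []
    for _ in range(steps):
        d1 = repression(r2) - r1  # R2 inhibits production of R1
        d2 = repression(r3) - r2  # R3 inhibits production of R2
        d3 = repression(r1) - r3  # R1 inhibits production of R3
        r1, r2, r3 = r1 + dt * d1, r2 + dt * d2, r3 + dt * d3
        history.append((r1, r2, r3))
    return history

history = simulate()
late_r1 = [state[0] for state in history[len(history) // 2:]]
swing = max(late_r1) - min(late_r1)
print(f"R1 ranges from {min(late_r1):.2f} to {max(late_r1):.2f} "
      f"in the second half of the run")
```

A persistent swing in the second half of the run indicates sustained oscillation rather than settling to a steady state. The starting levels are deliberately unequal, since identical levels would stay identical forever in this symmetric model.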

## simulated ROC curves

How receiver operating characteristic (ROC) curves vary as the separation between simulated groups increases in steps. Computational notes: these curves were created in R using the “ROCR” package. Be sure to say “ROCR” really fast! The simulated data are normally distributed within each group.
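
For readers without R, the same experiment can be approximated in plain Python. The sketch below does not use ROCR. The group sizes, the unit standard deviations, and the separations of 0 to 3 are assumptions made for illustration.

```python
import random

def roc_points(negatives, positives):
    """(false positive rate, true positive rate) at each observed threshold."""
    thresholds = sorted(set(negatives) | set(positives), reverse=True)
    return [(sum(n >= t for n in negatives) / len(negatives),
             sum(p >= t for p in positives) / len(positives))
            for t in thresholds]

def auc(negatives, positives):
    """Probability that a random positive scores above a random negative."""
    wins = sum((p > n) + 0.5 * (p == n) for p in positives for n in negatives)
    return wins / (len(positives) * len(negatives))

rng = random.Random(3)
aucs = {}
curves = {}
for separation in (0, 1, 2, 3):
    negatives = [rng.gauss(0.0, 1.0) for _ in range(300)]
    positives = [rng.gauss(float(separation), 1.0) for _ in range(300)]
    curves[separation] = roc_points(negatives, positives)
    aucs[separation] = auc(negatives, positives)
    print(f"separation {separation}: AUC = {aucs[separation]:.3f}")
```

With no separation the curve hugs the diagonal and the area under it sits near 0.5. Each step of separation pulls the curve toward the top-left corner.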

## system dynamics model of the Oregon Health Plan’s client caseload

I developed this model and wrote this description in 2007 as an analyst for the State of Oregon. We never used or published the model; I’m posting it here in the hope that someone will find it useful when a Google search delivers it. Introduction. The State of Oregon offers medical assistance to low-income individuals […]

## DIY caffeine pharmacokinetics

A night of insomnia last weekend prompted me to build a mathematical model of my caffeine throughput. System dynamics provides the framework: Model description. The stock and flow diagram shown above describes the basic system: “Pipes” represent caffeine flow into and out of “reservoirs” (the boxes) that store caffeine. The text labels denote system variables, […]
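
A stock-and-flow model of this kind can be sketched in Python as below. The structure is an assumption rather than the diagram from the post: two reservoirs (gut and body) joined by first-order flows, with the cleared caffeine tracked as a third stock. The dose, the absorption rate, and the five-hour half-life are illustrative values, not measurements.

```python
import math

def simulate(dose_mg=100.0, absorb_per_h=2.0, half_life_h=5.0,
             hours=24.0, dt=0.01):
    """Euler integration of gut -> body -> cleared with first-order flows."""
    eliminate_per_h = math.log(2) / half_life_h
    gut, body, cleared = dose_mg, 0.0, 0.0
    trace = []
    for step in range(int(hours / dt)):
        absorbed = absorb_per_h * gut * dt         # flow from gut into body
        eliminated = eliminate_per_h * body * dt   # flow from body to cleared
        gut -= absorbed
        body += absorbed - eliminated
        cleared += eliminated
        trace.append(((step + 1) * dt, body))
    return trace, gut, body, cleared

trace, gut, body, cleared = simulate()
peak_time, peak_body = max(trace, key=lambda point: point[1])
print(f"body caffeine peaks at {peak_body:.1f} mg after {peak_time:.2f} h")
print(f"after 24 h: {body:.1f} mg in body, {cleared:.1f} mg cleared")
```

Every milligram that leaves one reservoir enters another, so the three stocks always sum to the original dose. That makes a handy sanity check on any stock-and-flow model.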

## data scientist goes coolhunting…

Intuitive coolhunting scales poorly. Here’s some math to help fix that problem: Axioms of cool Five axioms enable us to mathematically model cool: No one is intrinsically cool, individuals simply channel it. Ability to temporarily hold coolness varies by individual. Coolness naturally flows into some individuals more readily than others. Rate of coolness flow into […]