In my last post, I illustrated how the Kaplan-Meier estimator can be used to estimate the survival curve of mRNA half-lives. In this post I will expand on that analysis and show how to compare two mRNA half-life Kaplan-Meier curves, each corresponding to a measured gene outcome, to see if mRNA half-life differs between outcomes….

# Category: science

## mRNA half-life survival curve estimation

In a recent post, I demonstrated the use of the Kaplan-Meier estimator for estimating survival curves of fictional characters undergoing treatment in a fictional drug trial. Here I illustrate the Kaplan-Meier estimator on real data, which differ from typical survival analysis data in that the event under consideration is neither time until death…

## stalling an airplane mid-flight

Thanks to my brother, I recently had the opportunity to fly a small Cessna aircraft under the supervision of a flight instructor. The instructor took off and landed, but gave me the controls during flight. During this time we went through a few instructive maneuvers, including stalling the plane mid-flight. Here I explain how stalling an…

## the Kaplan-Meier estimator

In my last post, I wrote about censored data. This post continues the survival analysis theme by focusing on estimation of survival curves. In survival and reliability analysis, it is useful to determine the survival curve for the population under study. This is the curve defined by the probability that a random variable indicating the…
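As a rough illustration of the estimator this post is about, here is a minimal Python sketch of the standard Kaplan-Meier product-limit calculation. It is not the code from the original post; the function name `kaplan_meier` and the toy data are assumptions made for this example. At each distinct event time, the running survival probability is multiplied by (1 - deaths / number still at risk), and censored subjects simply leave the risk set without counting as events.

```python
from collections import Counter


def kaplan_meier(durations, observed):
    """Return [(time, survival_probability)] at each distinct event time.

    durations: time at which each subject left the study
    observed:  True if the event occurred, False if the subject was censored
    """
    # Number of observed events at each time point.
    events = Counter(t for t, seen in zip(durations, observed) if seen)
    # Number of subjects leaving the risk set (event or censoring) at each time.
    exits = Counter(durations)

    at_risk = len(durations)
    survival = 1.0
    curve = []
    for t in sorted(exits):
        d = events.get(t, 0)
        if d:
            # Product-limit step: scale by the fraction surviving this time.
            survival *= 1 - d / at_risk
            curve.append((t, survival))
        # Everyone who exited at time t is no longer at risk afterwards.
        at_risk -= exits[t]
    return curve


# Hypothetical toy data: five subjects, two of them censored.
toy_durations = [1, 2, 2, 3, 4]
toy_observed = [True, True, False, True, False]
toy_curve = kaplan_meier(toy_durations, toy_observed)
```

The sketch treats a subject censored at time t as still at risk for events at that same time, which is the usual convention.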

## “right censored” data

In clinical trials and reliability studies, researchers often measure the time until an event occurs for each patient or object in the study. That event may be patient death in the case of clinical trials for a new cancer drug, or bridge failure in the case of a reliability study of bridges. Sometimes, however, the…

## Excel mangles NCBI gene symbols

Using Microsoft’s Excel for bioinformatics work sucks, but sometimes a spreadsheet is the best format for communicating results to other scientists. The program’s default behavior mangles some NCBI gene symbols when you import them from a text file. Here is how to deal with it. Suppose you have the following list of gene symbols, and…

## simulating a synthetic biology circuit with system dynamics

McAdams and Arkin report the following synthetic biology oscillator circuit in their paper “Gene regulation: Towards a circuit engineering discipline” [1]: The circuit works by having gene R1’s protein inhibit production of R3, whose protein inhibits production of R2, which in turn inhibits production of R1. Delays in the inhibition processes cause sufficient expression of…

## CPAP and the “bends”

Can using a CPAP machine cause decompression sickness (a.k.a. the “bends”)? No. The following discussion outlines why. Decompression Sickness As they descend underwater, scuba divers breathe air compressed to the same pressure as the surrounding water. For example, at sea level they breathe air at a pressure of 1 atm, while at a depth of…

## statistical reasoning in “The Simpsons”

FOX recently* broadcast a fundamental question that drives good science: “I’m sure there’s a correlation, but could there be a causation?” The intrepid Lisa Simpson, the greatest cartoon scientist of our time, spoke these words after observing a pair of scorpions become docile in the presence of a specific plant. Quality statistical reasoning rarely gets…

## RNAfold and sequence length

I’ve been looking for a way to compare RNAfold [1] dG results for two RNA sequences, where the two sequences differ in length. My initial thought was to simply divide the computed dG’s by sequence length (i.e., normalize by sequence length) and then compare the results. The analysis presented below shows why this won’t work….

## comparing BLAST results by bit score ratio

I recently read that two separate BLAST alignments to the same reference sequence can be compared to each other by normalizing the alignments by the maximum bit score of the reference sequence BLASTed against itself [1]. In this procedure, the user first aligns the reference sequence to itself to find the maximum possible bit score,…
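The normalization described above can be sketched in a few lines of Python. This is only an illustration of the ratio itself, not the code from the original post; the function name `bit_score_ratio` and the bit scores below are made up for the example. In practice the scores would come from your BLAST output.

```python
def bit_score_ratio(alignment_bits, self_bits):
    """Normalize an alignment's bit score by the reference's self-alignment score.

    alignment_bits: bit score of a query aligned to the reference
    self_bits:      bit score of the reference aligned to itself (the maximum)
    """
    if self_bits <= 0:
        raise ValueError("self-alignment bit score must be positive")
    return alignment_bits / self_bits


# Hypothetical bit scores, chosen only for illustration.
reference_self_bits = 200.0
ratio_a = bit_score_ratio(180.0, reference_self_bits)  # query A vs. reference
ratio_b = bit_score_ratio(90.0, reference_self_bits)   # query B vs. reference

# The ratios share a common scale, so the two alignments can be compared.
better_match = "A" if ratio_a > ratio_b else "B"
```

Because both ratios share the same denominator, this comparison only makes sense for alignments against the same reference sequence.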

## DIY hydroponics

The coming freshwater supply crisis prompts a need to design food-growing methods that require less water than current methods do. Hydroponics provides one such method. Here I report on my recent effort to design and build a hydroponic strawberry grower. But first, what does this have to do with data science? Not much at the…

## when you lack potential (energy), drive fast to compensate

Each gear on a car with a manual transmission offers the driver a specific level of acceleration. The lower the gear, the greater the acceleration available. If we imagine each gear setting as a distinct configuration of the automobile allowing it to perform work (e.g., accelerate out of a hazardous traffic setting)…

## lunar ephemeris calculations with PyEphem

While analyzing data for my recent post demonstrating that the lunar cycle does not correlate with crime incidents, I needed to compute daily lunar ephemeris data to match with daily crime incident counts. To accomplish this I turned to PyEphem, a Python package that computes–among other things–lunar position and phase for any given date. I first…

## 21504 to 1 odds the sun will rise tomorrow: an illustration of Bayesian reasoning

The following preposterous case illustrates the Bayesian worldview: Prior estimate If you ask a mathematically gifted newborn for the probability that the sun will rise tomorrow, they might reply: “The probability that the sun will rise tomorrow follows a beta distribution with parameters a = b = 2.” Since the mean of the above distribution is…

## innovation’s long view

My favorite album, Midnight Oil’s Redneck Wonderland, contains no particularly memorable tracks. Had Sony demanded hits from the recording, it would never have left the studio. But taken together, the songs add up to one of the boldest albums ever made. Measuring personal innovation in discrete “units” (patents, daily activity, publications, Nobel Prizes, etc.) is like…

## DIY caffeine pharmacokinetics

A night of insomnia last weekend prompted me to build a mathematical model of my caffeine throughput. System dynamics provides the framework: Model description The stock and flow diagram shown above describes the basic system: “Pipes” represent caffeine flow into and out of “reservoirs” (the boxes) that store caffeine. The text labels denote system variables,…

## the lunar cycle: not a partner in crime

Emily Williams and Stacie Dutton, SETEC Astronomy, San Francisco, California, USA Despite abundant scientific evidence refuting the connection, the “lunar effect” persists as a common explanation for temporal variation in human behavior. Adherents of this idea implicate the lunar cycle in outcomes as diverse as lost elections and hemophilic episodes. We find the myth woven…

## werewolf transcriptome conjecture

Lycanthropy—the sudden transformation of individuals into wolf-human chimeras during full moon periods—remains one of the least understood medical conditions persisting today. Researchers find investigation of the phenomenon doubly confounded by social stigma (who wants to tell a scientist that they are a werewolf?) and sampling difficulty (how many werewolves will actually sit still for a…

## Data scientist makes peace with web programming

True to my hacker roots, I prefer command line interfaces (CLIs) to graphical user interfaces (GUIs). That sentiment compounds when the GUI is delivered through a web browser. However, I recently—finally—accepted the fact that the web browser is the most important user interface out there, and the only user interface that most scientists will bother…

## Stephen Colbert teaches proper data normalization

Colbert Report devotees recently witnessed a true miracle—Stephen Colbert spoke data science: Due to the massive volume [of suggestions received], we … used computers to crunch the data. Mr. Colbert had just received approximately 53,000 e-mailed suggestions from his minions proposing social issues for the newly formed Colbert Super PAC to address. Financial contributions to…

## Austin heat wave office wager

Data scientist enters the office betting pool: Whoever most accurately predicts the day Austin’s heat wave breaks (first day with a high temperature less than 100 degrees Fahrenheit) wins. Seeking a defensible approach, our hero generates an ARIMA time-series forecast based on the last eleven years of daily high temperatures: The model suggests September 1st…

## bioinformatics as data science

“…a new kind of professional has emerged, the data scientist, who combines the skills of software programmer, statistician and storyteller/artist to extract the nuggets of gold hidden under mountains of data.” – The Economist, 27 February 2010 [emphasis mine] The above quotation neatly summarizes the role and core competencies of a bioinformatics professional, under the…

## marketing to scientists will give Red Bull new wings

“I don’t always drink rocket fuel, but when I do, I prefer Red Bull.” I hereby offer Red Bull a unique opportunity to sponsor my science career: I’ll drink Red Bull while giving talks at scientific conferences and at my eventual Ig Nobel Prize acceptance ceremony, while Red Bull pays me lots of money. Scientists…