how I make a living: what is bioinformatics? (part #1)

I’m constantly asked to explain what I do for a living. Here is an attempt to do so in laypersons’ terms. I’ll assume my readers are non-scientists and non-engineers, but that they’ve taken a high school biology class. “Bioinformatics” is the application of mathematics and computer science to biological data, particularly molecular biology data. By […]

fast genomic coordinate comparison using PostgreSQL’s geometric operators

PostgreSQL provides operators for comparing geometric data types, for example for computing whether two boxes overlap or whether one box contains another. Such operators are quick compared to similar calculations implemented using normal comparison operators, which I’ll demonstrate below. Here I show use of such geometric data types and operators for determining whether one segment […]

Bayesian network modeling stock price change

Update 29 April 2018 I suspect this result is erroneous in that the graph often shows two arrows between any two given nodes, one inward and one outward. I’ll investigate this further and get back to you… – Emily Introduction Taking a cue from the systems biology folks, I decided to model stock price change […]

iBioSim: a CAD package for genetic circuits

iBioSim is a CAD package for the design, analysis, and simulation of genetic circuits. It can also be used for modeling metabolic networks, pathways, and other biological/chemical processes [1]. The tool provides a graphical user interface (GUI) for specifying circuit design and parameters, and a GUI for running simulations on the resulting models and viewing […]

clustering stocks by price correlation (part 2)

In my last post, “clustering stocks by price correlation (part 1)“, I performed hierarchical clustering of NYSE stocks by correlation in weekly closing price. I expected the stocks to cluster by industry, and found that they did not. I proposed several explanations for this observation, including that perhaps I chose a poor distance metric for […]

clustering stocks by price correlation (part 1)

I’ve been building my knowledge of clustering techniques to apply to genetic circuit engineering, and decided to try the same tools for stock price analysis. In this post I describe building a hierarchical cluster of stocks by pairwise correlation in weekly price, to see how well the stocks cluster by industry, and compare the derived […]

simulating RNA-seq read counts

The Challenge I want to explore the statistics of RNA sequencing (RNA-seq) on next-generation sequencing (NGS) platforms in greater detail, so I thought I’d start by simulating read counts to experiment with. This post details how I constructed a simulated set of read counts, examines its concordance with the expected negative binomial distribution of the […]

sending an attachment with Amazon SES using Java

Sometimes data scientists need to write software that sends out automated e-mails, and sometimes those e-mails must carry attachments. Here is how to do so in Java using Amazon’s Simple Email Service (SES), which is an inexpensive outbound email service built on Amazon’s cloud infrastructure. Note: Be sure to log into Amazon SES and verify […]

fuzzy logic toolkit in C++

I recently came across some old C++ code I wrote about 10 years ago to assist fuzzy logic reasoning. This program is now posted on GitHub at https://github.com/badassdatascience/fuzzy-logic-toolkit and is described below. An example of the tool in action (with code) follows the description. Fuzzy Logic Suppose we have a numerical value for distance to […]