machine learning in FOREX (part one: establishing a performance baseline)

Introduction: We’ve been applying machine learning to FOREX price prediction. The performance of our models varies widely, so to establish a baseline we created a simple linear regression model against which we can compare the performance of more sophisticated models. What We Are Trying To Do: Given a time-series of 26 four-hour price samples, we […]

rapidly identifying potential CRISPR/Cas9 off-target sites (part one)

Before we can score genome segments that have a small number of mismatches to a CRISPR for off-target risk, we must first find those segments. Searching for every possible mismatch permutation proves computationally expensive, so we apply the following heuristic: we only search for mismatches in the top positions relevant to CRISPR efficiency. […]
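To make the search problem concrete, here is a minimal Python sketch of a brute-force scan that slides a guide along a sequence and counts mismatches. It is not the code from the post: the function names, the toy sequences, and the optional `positions` argument (a stand-in for "only look at certain positions") are all assumptions made for illustration, and the excerpt does not say which positions the real heuristic uses.

```python
def count_mismatches(guide, segment, positions=None):
    """Count mismatching bases, optionally only at the given guide positions."""
    if len(guide) != len(segment):
        raise ValueError("guide and segment must have the same length")
    indices = range(len(guide)) if positions is None else positions
    return sum(1 for i in indices if guide[i] != segment[i])


def find_candidate_sites(genome, guide, max_mismatches, positions=None):
    """Return (start, segment, mismatches) for every window within the limit."""
    width = len(guide)
    hits = []
    for start in range(len(genome) - width + 1):
        segment = genome[start:start + width]
        mismatches = count_mismatches(guide, segment, positions)
        if mismatches <= max_mismatches:
            hits.append((start, segment, mismatches))
    return hits


# Toy data, invented for this example only.
genome = "GGACGTACGTGGACGTTCGTGG"
guide = "ACGTACGT"

# Compare all eight positions of the guide.
all_hits = find_candidate_sites(genome, guide, max_mismatches=1)

# Hypothetical restriction: compare only guide positions 4..7.
restricted_hits = find_candidate_sites(
    genome, guide, max_mismatches=1, positions=range(4, 8)
)

print(all_hits)
print([start for start, _, _ in restricted_hits])
```

In this toy setup, comparing fewer positions means less work per window, but it can also let through segments that differ elsewhere, so any restricted search still needs a full comparison before scoring.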

selecting travel trailers by regression

Data Scientist has been thinking recently of moving into a used travel trailer. However, the weight of the trailer to be purchased is limited by that which our hero’s truck can pull. But most online used travel trailer listings only specify length of the vehicle, not its weight. So Data Scientist needed a quick way […]

the humble sum of the squared errors

As part of my effort to master statistical theory, I’m deconstructing basic statistics principles in blog posts, on the idea that writing about the principles is the best way to learn them more deeply. The humble sum of the squared errors (SSE) calculation has been a workhorse of statistics for the past 200 years. Here […]
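As a quick illustration of the calculation itself: the SSE is the sum, over all observations, of the squared difference between the observed value and the value the model predicts. The Python sketch below uses made-up numbers and a hypothetical function name; it is only meant to show the arithmetic.

```python
def sum_squared_errors(observed, predicted):
    """Return the sum of squared differences between paired values."""
    if len(observed) != len(predicted):
        raise ValueError("observed and predicted must have the same length")
    return sum((o - p) ** 2 for o, p in zip(observed, predicted))


# Made-up values for illustration only.
observed = [2.0, 4.0, 6.0]
predicted = [2.5, 3.5, 6.0]

# (-0.5)^2 + (0.5)^2 + 0^2 = 0.25 + 0.25 + 0 = 0.5
sse = sum_squared_errors(observed, predicted)
print(sse)  # 0.5
```

Squaring keeps positive and negative errors from cancelling and penalizes large errors more heavily than small ones.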

overfitting in statistics and machine learning (part one)

Overfitting is a common risk when designing statistical and machine-learning models. Here I give a brief demonstration of overfitting in action, using simple regression models. A later post will more rigorously address how to quantify and avoid overfitting. We start by sampling data from the process using the R code: Then we produce a linear […]
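The post's own R code is not reproduced in this excerpt, so here is a small stand-in sketch in Python. Everything in it is an assumption chosen for illustration: the underlying process (y = 2x + 1), the hand-picked alternating noise of +1 and -1 (used instead of random sampling so the result is repeatable), and the choice of an interpolating polynomial as the overly flexible model. It compares that model with a straight-line fit on the training points and on held-out points taken from the noise-free process.

```python
def fit_line(xs, ys):
    """Ordinary least-squares straight line; returns a prediction function."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return lambda x: intercept + slope * x


def fit_interpolant(xs, ys):
    """Polynomial of degree len(xs) - 1 through every point (Lagrange form)."""
    def predict(x):
        total = 0.0
        for j, (xj, yj) in enumerate(zip(xs, ys)):
            weight = 1.0
            for k, xk in enumerate(xs):
                if k != j:
                    weight *= (x - xk) / (xj - xk)
            total += yj * weight
        return total
    return predict


def sse(model, xs, ys):
    """Sum of squared errors of a model over paired data."""
    return sum((y - model(x)) ** 2 for x, y in zip(xs, ys))


def true_process(x):
    # Assumed underlying process for this sketch.
    return 2.0 * x + 1.0


# Training data: the process plus alternating +1 / -1 noise.
train_x = [float(i) for i in range(10)]
train_y = [true_process(x) + (1.0 if i % 2 == 0 else -1.0)
           for i, x in enumerate(train_x)]

# Held-out data: noise-free values halfway between the training points.
test_x = [i + 0.5 for i in range(9)]
test_y = [true_process(x) for x in test_x]

line = fit_line(train_x, train_y)
wiggly = fit_interpolant(train_x, train_y)

print("train SSE  line:", sse(line, train_x, train_y),
      " interpolant:", sse(wiggly, train_x, train_y))
print("test SSE   line:", sse(line, test_x, test_y),
      " interpolant:", sse(wiggly, test_x, test_y))
```

The interpolating polynomial reproduces every training point, noise included, so its training error is zero, yet it swings far from the true line between the points. The straight line has the worse training error and the much better held-out error, which is the pattern overfitting produces.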

industrial diversity vs percent change in unemployment rate

This analysis may exceed the bounds of my statistics knowledge, but I will deliver it anyway in the name of “process” blogging. I welcome experienced critique of the method! Result: A modest positive correlation exists between a county’s industrial diversity and its percent change in unemployment rate over the period 2007-2010. Method: Several months ago […]