Before we can score genomic segments that have a small number of mismatches to a CRISPR guide sequence for their off-target risk, we must first find these segments. Searching for every possible combination of mismatches is computationally expensive, so we apply the following heuristic: we search for mismatches only at the positions most relevant to CRISPR efficiency. […]
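The heuristic above can be sketched as follows. This is a minimal illustration, not the post's actual code: it assumes the guide and genome are plain uppercase strings, scans only the forward strand (no reverse complement, no PAM check), and takes the "relevant positions" as an input, since the excerpt does not say which positions were used. The function names `restricted_mismatch_variants` and `find_candidate_sites` are hypothetical.

```python
from itertools import combinations, product

BASES = "ACGT"


def restricted_mismatch_variants(guide, positions, max_mismatches):
    """Yield every sequence that differs from `guide` at up to `max_mismatches`
    sites, where mismatch sites are drawn only from `positions` (0-based)."""
    yield guide  # the zero-mismatch (exact) sequence
    for k in range(1, max_mismatches + 1):
        for sites in combinations(sorted(set(positions)), k):
            # At each chosen site, substitute one of the other bases.
            choices = [[b for b in BASES if b != guide[i]] for i in sites]
            for subs in product(*choices):
                seq = list(guide)
                for i, b in zip(sites, subs):
                    seq[i] = b
                yield "".join(seq)


def find_candidate_sites(genome, guide, positions, max_mismatches):
    """Return start offsets of forward-strand windows matching any allowed variant.
    Hypothetical helper: forward strand only, no PAM filtering."""
    variants = set(restricted_mismatch_variants(guide, positions, max_mismatches))
    width = len(guide)
    return [i for i in range(len(genome) - width + 1) if genome[i:i + width] in variants]
```

To see why the restriction helps: for a 20-nt guide with up to two mismatches, allowing mismatches anywhere gives 1 + 20·3 + 190·9 = 1,771 sequences, while restricting them to a hypothetical set of 12 positions gives 1 + 12·3 + 66·9 = 631, and the gap widens quickly as the mismatch budget grows.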

# Category: engineering

## pseudo-harmonic FOREX prediction with machine learning (part one)

“Harmonic” trading methods seek patterns in the relationships between neighboring peaks and valleys in the time series. In particular, harmonic traders look for pre-specified ratios in the price differences among a series of peaks and valleys. For example, a trader might observe the following pattern: Let A, B, C, D, and E be the points in the […]
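The core measurement described above, ratios between successive price swings, can be sketched like this. It is an illustration under stated assumptions, not the post's method: swing points come from a naive three-point local-extremum test (a real zigzag filter would also merge plateaus and consecutive same-type extrema), and the target ratios and relative tolerance are hypothetical inputs.

```python
def swing_points(prices):
    """Indices of strict local peaks and valleys (naive 3-point test)."""
    pts = []
    for i in range(1, len(prices) - 1):
        is_peak = prices[i] > prices[i - 1] and prices[i] > prices[i + 1]
        is_valley = prices[i] < prices[i - 1] and prices[i] < prices[i + 1]
        if is_peak or is_valley:
            pts.append(i)
    return pts


def leg_ratios(prices, pts):
    """Size of each price swing divided by the size of the swing before it."""
    legs = [abs(prices[b] - prices[a]) for a, b in zip(pts, pts[1:])]
    # A zero-length leg gives an infinite ratio rather than being dropped,
    # so ratio positions stay aligned with the swing sequence.
    return [b / a if a else float("inf") for a, b in zip(legs, legs[1:])]


def matches_pattern(ratios, targets, tol=0.05):
    """True if the most recent len(targets) ratios are each within a
    relative tolerance `tol` of the corresponding (positive) target ratio."""
    if len(ratios) < len(targets):
        return False
    recent = ratios[-len(targets):]
    return all(abs(r - t) <= tol * t for r, t in zip(recent, targets))
```

Keeping detection, ratio computation, and matching separate means the same swing data can be checked against many candidate patterns.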

## picking stocks by graph database (part 2: machine learning)

In our last post, we demonstrated a graph database created to enable study of the stock market, particularly the study of causality relationships. So how do we proceed from there? At this stage we want to pick winning stocks, not write an academic paper, so our focus turns toward practical machine learning. Source Data: We start […]

## picking stocks by graph database (part one)

Historical stock price data is readily available at daily resolution, so we calculated the Granger causality for each pair of stocks we hold data for, at lags of one and two days (testing the question “does daily percent change in volume for stock X Granger cause daily percent change in adjusted close price for stock Y?”). […]
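The test described above can be sketched in pure Python as the standard Granger F-test: regress Y's percent change on its own lags (restricted model), then on its own lags plus X's lags (unrestricted model), and ask whether adding X's lags reduces the residual sum of squares by more than chance would. This is a minimal sketch, not the post's code; it assumes aligned, equal-length daily series, and the helper names are hypothetical. It returns only the F statistic, not a p-value.

```python
def pct_change(series):
    """Day-over-day fractional change, e.g. [100, 110] -> [0.1]."""
    return [(b - a) / a for a, b in zip(series, series[1:])]


def _rss(X, y):
    """Residual sum of squares of an OLS fit, solved via the normal equations."""
    k = len(X[0])
    A = [[sum(row[i] * row[j] for row in X) for j in range(k)] for i in range(k)]
    b = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(k)]
    # Gaussian elimination with partial pivoting on (X^T X) beta = X^T y.
    for c in range(k):
        p = max(range(c, k), key=lambda r: abs(A[r][c]))
        A[c], A[p], b[c], b[p] = A[p], A[c], b[p], b[c]
        for r in range(c + 1, k):
            f = A[r][c] / A[c][c]
            for j in range(c, k):
                A[r][j] -= f * A[c][j]
            b[r] -= f * b[c]
    beta = [0.0] * k
    for c in reversed(range(k)):
        beta[c] = (b[c] - sum(A[c][j] * beta[j] for j in range(c + 1, k))) / A[c][c]
    return sum((yi - sum(bj * xj for bj, xj in zip(beta, row))) ** 2
               for row, yi in zip(X, y))


def granger_f(x, y, lag):
    """F statistic for H0: lags 1..lag of x add no predictive power for y
    beyond an intercept and y's own lags 1..lag."""
    rows = range(lag, len(y))
    target = [y[t] for t in rows]
    restricted = [[1.0] + [y[t - i] for i in range(1, lag + 1)] for t in rows]
    unrestricted = [r + [x[t - i] for i in range(1, lag + 1)]
                    for r, t in zip(restricted, rows)]
    rss_r = _rss(restricted, target)
    rss_u = _rss(unrestricted, target)
    n, k_u = len(target), 1 + 2 * lag
    return ((rss_r - rss_u) / lag) / (rss_u / (n - k_u))
```

For the question in the post, `x` would be `pct_change` of stock X's daily volume and `y` would be `pct_change` of stock Y's adjusted close, evaluated at `lag=1` and `lag=2`. In practice, `grangercausalitytests` in `statsmodels.tsa.stattools` runs the same family of tests and also reports p-values.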

## a fashionista’s astronomy calculations

Cheers to all my fellow chic women in STEM! I take a lot of photos and video of myself, for reasons that mostly have to do with fashion. However, I have no control over my lighting, since I shoot outdoors without any equipment other than a video camera. So I try to shoot […]

## how I make a living: what is bioinformatics? (part #1)

I’m constantly asked to explain what I do for a living. Here is an attempt to do so in laypersons’ terms. I’ll assume my readers are non-scientists and non-engineers, but that they’ve taken a high school biology class. “Bioinformatics” is the application of mathematics and computer science to biological data, particularly molecular biology data. By […]

## fast genomic coordinate comparison using PostgreSQL’s geometric operators

PostgreSQL provides operators for comparing geometric data types, for example for computing whether two boxes overlap or whether one box contains another. Such operators are fast compared with equivalent calculations written using ordinary comparison operators, as I’ll demonstrate below. Here I show how to use such geometric data types and operators to determine whether one segment […]
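One way to picture the idea above: a genomic segment becomes a zero-height box, with position on the x axis and, as an assumption of this sketch, a numeric chromosome index on the y axis so that segments on different chromosomes can never overlap. The Python below mirrors what PostgreSQL's box operators `&&` (overlaps; a single shared point counts) and `@>` (contains) compute for such boxes. It is an illustration only; the SQL string uses real PostgreSQL functions (`box`, `point`) but hypothetical table and column names.

```python
from typing import NamedTuple


class Box(NamedTuple):
    """Axis-aligned box with closed bounds, like PostgreSQL's `box` type."""
    x1: float
    y1: float
    x2: float
    y2: float


def segment_to_box(chrom_index, start, end):
    # Hypothetical encoding: genomic position on x, chromosome index on y,
    # so each segment is a zero-height box.
    return Box(min(start, end), chrom_index, max(start, end), chrom_index)


def overlaps(a, b):
    """Mirrors `a && b` for boxes: true if they share at least one point."""
    return a.x1 <= b.x2 and b.x1 <= a.x2 and a.y1 <= b.y2 and b.y1 <= a.y2


def contains(a, b):
    """Mirrors `a @> b` for boxes: every point of b lies within a."""
    return a.x1 <= b.x1 and b.x2 <= a.x2 and a.y1 <= b.y1 and b.y2 <= a.y2


# Hypothetical query: segments_a / segments_b and their columns are illustrative.
OVERLAP_SQL = """
SELECT a.id, b.id
FROM segments_a AS a
JOIN segments_b AS b
  ON box(point(a.start_pos, a.chrom), point(a.end_pos, a.chrom))
  && box(point(b.start_pos, b.chrom), point(b.end_pos, b.chrom));
"""
```

Because `&&` treats touching boxes as overlapping, segments that merely share an endpoint match; whether that is desired depends on whether coordinates are stored as closed or half-open intervals.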

## iBioSim: a CAD package for genetic circuits

iBioSim is a CAD package for the design, analysis, and simulation of genetic circuits. It can also be used for modeling metabolic networks, pathways, and other biological/chemical processes [1]. The tool provides a graphical user interface (GUI) for specifying circuit design and parameters, and a GUI for running simulations on the resulting models and viewing […]

## clustering stocks by price correlation (part 2)

In my last post, “clustering stocks by price correlation (part 1)”, I performed hierarchical clustering of NYSE stocks by correlation in weekly closing price. I expected the stocks to cluster by industry, and found that they did not. I proposed several explanations for this observation, including that perhaps I chose a poor distance metric for […]

## clustering stocks by price correlation (part 1)

I’ve been building my knowledge of clustering techniques to apply to genetic circuit engineering, and decided to try the same tools for stock price analysis. In this post I describe building a hierarchical clustering of stocks by pairwise correlation in weekly price, to see how well the stocks cluster by industry, and compare the derived […]
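The procedure described above can be sketched as agglomerative clustering over a correlation-derived distance. The choices here are assumptions, not the posts' settings: distance is taken as 1 − r (Pearson correlation of the weekly closing-price series as given), clusters are merged by average linkage, and merging stops at a requested number of clusters rather than building the full dendrogram. Part 2 itself questions the distance metric, so treat 1 − r as one option among several.

```python
import math
from itertools import combinations


def pearson(a, b):
    """Pearson correlation of two equal-length, non-constant series."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb)


def average_linkage(series, n_clusters):
    """Cluster tickers (dict of ticker -> weekly closes) by distance 1 - r,
    merging the two clusters with the smallest mean pairwise distance
    until `n_clusters` remain."""
    names = list(series)
    dist = {frozenset((p, q)): 1 - pearson(series[p], series[q])
            for p, q in combinations(names, 2)}
    clusters = [[name] for name in names]

    def cluster_distance(c1, c2):
        return sum(dist[frozenset((p, q))] for p in c1 for q in c2) / (len(c1) * len(c2))

    while len(clusters) > max(n_clusters, 1):
        i, j = min(combinations(range(len(clusters)), 2),
                   key=lambda ij: cluster_distance(clusters[ij[0]], clusters[ij[1]]))
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return clusters
```

Recording each `(i, j)` merge and its distance inside the loop would yield the full dendrogram; comparing the final clusters against industry labels is then a simple lookup.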