Autocorrelation in FOREX

To inform the construction of a machine learning-based price prediction algorithm, we want to understand how many lags prove statistically significant with regard to autocorrelation in the seven major FOREX pairs. So we first choose 10,000 random time points between January 1, 2000 and January 1, 2017 for each of the seven pairs. Then we […]

Bayesian network modeling stock price change

Taking a cue from the systems biology folks, I decided to model stock price change interactions using a dynamic Bayesian network. For this analysis I focused on the members of the Dow Jones Industrial Average (DJIA) that are listed on the New York Stock Exchange (NYSE). Bayesian Networks: A Bayesian network is an acyclic directed […]

clustering stocks by price correlation (part 2)

In my last post, “clustering stocks by price correlation (part 1)”, I performed hierarchical clustering of NYSE stocks by correlation in weekly closing price. I expected the stocks to cluster by industry, and found that they did not. I proposed several explanations for this observation, including that perhaps I chose a poor distance metric for […]

clustering stocks by price correlation (part 1)

I’ve been building my knowledge of clustering techniques to apply to genetic circuit engineering, and decided to try the same tools for stock price analysis. In this post I describe building a hierarchical cluster of stocks by pairwise correlation in weekly price, to see how well the stocks cluster by industry, and compare the derived […]

net change of zero between closing and opening stock prices

I decided to investigate the variation between each trading day’s closing price and the following trading day’s opening price for stocks listed on the New York Stock Exchange. I started with data for all trading days between January 2nd 2000 and October 30th 2014. I then calculated the percent change between one […]

Apache Spark and stock price causality

The Challenge: I wanted to compute Granger causality (described below) for each pair of stocks listed on the New York Stock Exchange. Moreover, I wanted to analyze between one and thirty lags for each pair’s comparison. Needless to say, this requires massive computing power. I used Amazon EC2 as the computing platform, but needed a […]

industrial diversity correlates with population

It seems logical that U.S. counties with larger populations would support more diverse industry than counties with smaller populations. Perhaps this has been proven already, but I recently stumbled upon my own verification of the idea. The plot shows industry diversity (expressed in the form of Shannon entropy, discussed below) as a function of […]

test driving Amazon Web Services’ Elastic MapReduce

Hadoop provides software infrastructure for running MapReduce tasks, but taking full advantage of it requires substantial setup time and access to a compute cluster. Amazon’s Elastic MapReduce (EMR) solves these problems by delivering pre-configured Hadoop virtual machines running in the cloud for only the time they are required, and billing only for the computation minutes […]

the first Big Data recession

The “Great Recession” of 2007-2009 may be the first “Big Data” recession, i.e., the first recession that we can examine using the vast information made available by the advent of Big Data. Certainly the next recession will be studied through that lens. To test whether new data is available that can be cast in an economic […]

the future orientation index

I’ve recently discovered Google Trends and have been looking for an opportunity to use it. Today I found such an opportunity in a paper [1] published last April that computes countries’ “future orientation index” from Google Trends data and correlates it with national per-capita GDP. The authors report correlation for 2010; my experiment with Google Trends […]