Update 29 April 2018
I suspect this result is erroneous, in that the graph often shows two arrows between a given pair of nodes, one in each direction. I’ll investigate this further and get back to you… – Emily
Taking a cue from the systems biology folks, I decided to model stock price change interactions using a dynamic Bayesian network. For this analysis I focused on the members of the Dow Jones Industrial Average (DJIA) that are listed on the New York Stock Exchange (NYSE).
A Bayesian network is an acyclic directed graph where the nodes are random variables and the edges indicate conditional probabilistic relationships between the nodes. For example, a given node might represent the percent change in a stock’s price, and edges entering the node might represent factors that influence the node’s value.
The graph’s structure may be specified by a knowledgeable researcher, or may be learned from data using a search algorithm. We will demonstrate the latter case below. When learning from time series data, the graph need no longer be acyclic and the resulting graph is called a dynamic Bayesian network. The reason the acyclic requirement is dropped is that one node might influence a second node at one time point, while the second node influences the first node at another time point. This is a common occurrence in biological circuits which contain feedback loops.
The following is a dynamic Bayesian network learned from a year of NYSE DJIA stocks’ daily changes in closing prices. The method used to learn the network from the data is described below.
From this we see that the probability associated with stock JNJ’s daily change in closing price is conditional on stock V’s daily change in closing price, and subsequently the probability associated with stock XOM’s daily change in closing price is conditional on JNJ’s daily change in closing price. Because this is a dynamic Bayesian network, we also see pairs such as XOM and UTX which are mutually dependent on each other. This results from the fact that the first stock in the pair might influence the second at one date, and the second might influence the first on a different date.
These results are pure unsupervised machine learning; I have not yet analyzed whether the proposed interactions make sense economically. Such analysis will be required to validate the method.
The graph shown above was generated with Graphviz.
The following describes the procedure used to learn the above network from the data. Python code implementing the procedure is attached to the end of this post.
I started with a list of daily stock prices for all the stocks listed in the NYSE, e.g.,
I then filtered out all symbols not found in the DJIA and all dates outside of the range 29 October 2013 through 28 October 2014. I also discarded the open, high, low, and adjusted prices from the above data structure, as well as the volume, since we are only modeling change in closing price.
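A minimal sketch of this filtering step, using made-up records in place of the real NYSE price list (the symbol `ZZZ` and all prices are hypothetical):

```python
from datetime import date

# Keep only DJIA symbols and dates inside the study window, and keep only
# (symbol, date, close) from each record. The raw records below are
# hypothetical stand-ins for the full NYSE daily price list.

DJIA = {"JNJ", "V", "XOM", "UTX"}               # subset, for illustration
START, END = date(2013, 10, 29), date(2014, 10, 28)

raw = [
    {"symbol": "JNJ", "date": date(2014, 1, 2), "open": 91.6, "close": 92.0},
    {"symbol": "ZZZ", "date": date(2014, 1, 2), "open": 10.0, "close": 10.1},
    {"symbol": "XOM", "date": date(2015, 1, 2), "open": 92.0, "close": 92.5},
]

filtered = [
    (r["symbol"], r["date"], r["close"])
    for r in raw
    if r["symbol"] in DJIA and START <= r["date"] <= END
]
print(filtered)   # [('JNJ', datetime.date(2014, 1, 2), 92.0)]
```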
Next, I sorted each symbol’s closing price by date and computed a “percent change in closing price” time series for each symbol from the sorted closing price time series.
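The percent-change computation can be sketched as follows; the closing prices shown are illustrative values, not real data:

```python
# Day-over-day percent change in closing price for one symbol.

def pct_change(closes):
    """closes: list of closing prices sorted by date (oldest first).
    The result has one fewer entry than the input, since the first
    date has no prior close to compare against."""
    return [
        100.0 * (curr - prev) / prev
        for prev, curr in zip(closes, closes[1:])
    ]

jnj_closes = [100.0, 102.0, 101.0]      # hypothetical sorted closes
print(pct_change(jnj_closes))
```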
I then “discretized” the data as follows: For each trading day in the time series, I computed the mean percent change in closing price across all stocks. Using this I created a new time series for each stock, assigning a zero to the date if the percent change was below the daily mean, and a one to the date if the percent change was equal to or above the daily mean. This produced a series of ones and zeros for each stock.
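A sketch of this discretization step, assuming aligned dates across symbols (symbol names and values are hypothetical):

```python
# Each stock's percent change on a given day becomes 1 if it is at or
# above that day's mean across all stocks, else 0.

def discretize(series_by_symbol):
    """series_by_symbol: {symbol: [pct change per day]}, dates aligned.
    Returns {symbol: [0/1 per day]}."""
    symbols = list(series_by_symbol)
    n_days = len(series_by_symbol[symbols[0]])
    out = {s: [] for s in symbols}
    for day in range(n_days):
        day_mean = sum(series_by_symbol[s][day] for s in symbols) / len(symbols)
        for s in symbols:
            out[s].append(1 if series_by_symbol[s][day] >= day_mean else 0)
    return out

pct = {"JNJ": [2.0, -1.0], "XOM": [0.5, 3.0]}   # hypothetical changes
print(discretize(pct))   # {'JNJ': [1, 0], 'XOM': [0, 1]}
```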
I performed this discretization to keep the model simple while I was learning Bayesian network techniques, borrowing the idea from gene expression modeling, where whether a gene counts as expressed may depend on its expression level relative to a control gene’s expression level. A more sophisticated model could use a ternary discretization, or no discretization at all.
I next “transposed” the discretized time series as follows: I placed each stock’s time series as a row in a matrix, then transposed the matrix so that each stock is a column and each row is a “system state” for the stocks on a given day. (The “system state” is the vector of zeros and ones indicating each stock’s movement relative to the daily mean.) I assigned a probability (hereafter the “base state probability”) to each system state equal to one divided by the total number of system states, then merged duplicate system states, summing their base state probabilities.
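A sketch of building the system states and their base state probabilities, where duplicate daily states pool their probability mass (the four-day, two-stock data is hypothetical):

```python
from collections import Counter

# Each row of the transposed matrix is one day's 0/1 vector across
# stocks; each day contributes 1/T of probability, and duplicate states
# accumulate it.

def base_state_probabilities(daily_states):
    """daily_states: list of per-day tuples of 0/1, one entry per stock.
    Returns {state tuple: probability}."""
    T = len(daily_states)
    counts = Counter(daily_states)
    return {state: c / T for state, c in counts.items()}

days = [(1, 0), (0, 1), (1, 0), (1, 1)]   # 4 days, 2 stocks
print(base_state_probabilities(days))
# {(1, 0): 0.5, (0, 1): 0.25, (1, 1): 0.25}
```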
I then created an initial Bayesian network graph, assigning a node to each NYSE DJIA stock and no edges, as shown by the following networkx image:
The learning procedure applies a greedy algorithm to search for an optimal graph: mutating the graph (randomly adding, reversing, or removing an edge) and scoring it against the data. If the mutated graph’s score exceeds the previous graph’s score, the new graph is retained and is further mutated until a set number of iterations has run. The scoring procedure is described below:
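The greedy search loop can be sketched as follows. Here the graph is represented as a plain dict of parent sets rather than the networkx graph the post uses, `score_fn` is a stand-in for the scoring function described below, and the toy score at the end exists only to exercise the loop; all of these names are assumptions, not the post’s actual code:

```python
import random

def mutate(graph, rng):
    """Randomly add, remove, or reverse one edge; returns a new graph.
    graph: {node: set of parent nodes}."""
    new = {node: set(parents) for node, parents in graph.items()}
    a, b = rng.sample(list(new), 2)
    move = rng.choice(["add", "remove", "reverse"])
    if move == "add":
        new[b].add(a)                      # edge a -> b
    elif move == "remove":
        new[b].discard(a)
    elif move == "reverse" and a in new[b]:
        new[b].discard(a)                  # a -> b becomes b -> a
        new[a].add(b)
    return new

def greedy_search(symbols, score_fn, iterations=10_000, seed=0):
    """Start from the edgeless graph and keep only score-improving moves."""
    rng = random.Random(seed)
    graph = {s: set() for s in symbols}
    best = score_fn(graph)
    for _ in range(iterations):
        candidate = mutate(graph, rng)
        score = score_fn(candidate)
        if score > best:
            graph, best = candidate, score
    return graph, best

# Toy score that simply rewards edges, just to exercise the loop:
toy_score = lambda g: sum(len(p) for p in g.values())
g, s = greedy_search(["JNJ", "V", "XOM"], toy_score, iterations=500)
print(s)   # edge count of the best graph found
```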
Bayes’ theorem provides a means to relate the probability of observing the data given a graph structure to the probability of observing a graph given the data:

P(Graph|Data) = P(Data|Graph) P(Graph) / P(Data)
In this equation, P(Data) is a constant we can ignore, because we are simply trying to maximize P(Graph|Data). The prior P(Graph) is a function of the graph structure that we use to penalize graphs with many edges in favor of graphs with fewer edges, seeking the graph that provides the simplest explanation of the data. Taking the log of each side gives us the equation we use for scoring a graph given the data:

log P(Graph|Data) ∝ log P(Data|Graph) + log P(Graph)
We are left with the need to compute the likelihood P(Data|Graph), given by the following equation:

P(Data|Graph) = ∏_{t=1}^{T} ∏_{i=1}^{n} P(X_i(t) | parents of X_i at time t)
Here t = 1 through T are the individual time points, and i = 1 through n are the nodes (stocks). For each time point t, we compute the probability of Xi given the state of its parent nodes in the graph. This is computed using the base state probabilities defined above and the system state at the time point.
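A sketch of this log-likelihood computation. Here P(X_i | parents) is estimated empirically as the fraction of days sharing the same parent state in which X_i also takes the same value, which is one plausible reading of the procedure above, not the post’s exact code; the data and graph are hypothetical:

```python
import math

def log_likelihood(daily_states, graph, symbols):
    """daily_states: per-day tuples of 0/1 values, one entry per symbol.
    graph: {symbol: set of parent symbols}. Returns log P(Data|Graph)."""
    idx = {s: k for k, s in enumerate(symbols)}
    total = 0.0
    for row in daily_states:
        for s in symbols:
            parents = sorted(graph[s])
            pa_state = tuple(row[idx[p]] for p in parents)
            # days whose parent state matches this day's parent state
            matches = [r for r in daily_states
                       if tuple(r[idx[p]] for p in parents) == pa_state]
            # among those, days where X_i takes this day's value;
            # always >= 1 because the current day matches itself
            hits = sum(1 for r in matches if r[idx[s]] == row[idx[s]])
            total += math.log(hits / len(matches))
    return total

symbols = ["JNJ", "V"]
days = [(1, 1), (0, 0), (1, 1), (0, 1)]
graph = {"JNJ": {"V"}, "V": set()}        # edge V -> JNJ
print(log_likelihood(days, graph, symbols))
```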
I mutated the graph as described above 10,000 times, scoring each iteration against the previous high score, which produced the network shown in the Results section. For the final graph, however, I also computed P(Data|Graph) at each time point, and the resulting probabilities turned out to be extremely small:
These low probabilities make me suspicious of the resulting network. Again, analysis of how the particular stocks that the algorithm joined together relate economically is necessary to evaluate the method.