# hacking the stock market (part 1)

Caveat: I am not a technical investor, just a hobbyist, so take this analysis with a grain of salt. I am also just beginning my Master’s work in statistics.

I wanted to examine the correlation between changes in the daily closing price of the Dow Jones Industrial Average (DJIA) and lags of those changes, to see if there is a pattern I could use. First I downloaded the DJIA data from Yahoo using Pandas:

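```python
#
# imports
#
from datetime import datetime
import math

import matplotlib.pyplot as plt
import numpy as np
from pandas_datareader.data import DataReader  # older pandas: from pandas.io.data import DataReader
from scipy.stats import pearsonr, spearmanr

#
# load DJIA data from Yahoo server
# (Yahoo's historical-data service has changed over the years,
#  so this call may fail with current versions of pandas-datareader)
#
djia = DataReader("DJIA", "yahoo", datetime(2000, 1, 1), datetime.today())

# one-day differences of the adjusted close (the first value is NaN)
diff_1st_order = djia["Adj Close"].diff()
```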
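For readers new to differencing, here is a small sketch of what one-day differences and their signs look like, using pandas' `Series.diff` and NumPy's `np.sign` on a handful of made-up closing prices (hypothetical numbers, not DJIA data). It only illustrates the differencing and sign-taking described below.

```python
import numpy as np
import pandas as pd

# Hypothetical closing prices (made-up numbers, not DJIA data)
closes = pd.Series([100.0, 101.5, 101.0, 101.0, 102.25])

# One-day difference: close[t] - close[t-1]; the first entry has no predecessor (NaN)
diff_1st_order = closes.diff()

# Drop the leading NaN, then take the sign: +1 up, -1 down, 0 unchanged
diffs = diff_1st_order.dropna()
signs = np.sign(diffs)

print(diffs.tolist())  # [1.5, -0.5, 0.0, 1.25]
print(signs.tolist())  # [1.0, -1.0, 0.0, 1.0]
```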

I then generated the autocorrelation plots for the one-day differenced closing prices and the signs of the one-day differenced closing prices:

```python
#
# investigate the diff between DJIA closing prices
#
# drop the leading NaN produced by diff()
diff_1st_order_as_list = [d for d in diff_1st_order if not np.isnan(d)]

plt.subplot(2, 1, 1)
plt.acorr(diff_1st_order_as_list, maxlags=10)
plt.title("Autocorrelation of Diff of DJIA Adjusted Close")
plt.xlabel("Lag")
plt.ylabel("Correlation")

#
# sign of diff, not diff itself: +1 (up), -1 (down), 0 (unchanged)
# (np.sign avoids the 0/0 that d / abs(d) produces for an unchanged close)
#
diff_1st_order_sign = [float(np.sign(d)) for d in diff_1st_order_as_list]

plt.subplot(2, 1, 2)
plt.acorr(diff_1st_order_sign, maxlags=10)
plt.title("Autocorrelation of Sign of Diff of DJIA Adjusted Close")
plt.xlabel("Lag")
plt.ylabel("Correlation")
```

The plots show a very small negative correlation between the one-day closing price difference and its one-day lag, and an even smaller positive correlation between the one-day closing price difference and its three-day lag.
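The lag-1 and lag-3 values can also be computed directly instead of read off the plot. Below is a minimal sketch of the standard mean-subtracted sample autocorrelation, together with the usual rough ±1.96/√n band for white noise: values inside the band are not distinguishable from zero at roughly the 5% level. Note that matplotlib's `acorr` does not subtract the mean by default (no detrending), so its bars can differ slightly from these numbers. The helper names `sample_acf` and `white_noise_band` are my own, not library functions, and the example runs on simulated noise rather than DJIA data.

```python
import numpy as np

def sample_acf(x, max_lag):
    """Mean-subtracted sample autocorrelation r_k for k = 0..max_lag."""
    x = np.asarray(x, dtype=float)
    x = x - x.mean()
    denom = np.dot(x, x)  # lag-0 sum of squares, so r_0 == 1
    return np.array([np.dot(x[: len(x) - k], x[k:]) / denom
                     for k in range(max_lag + 1)])

def white_noise_band(n, z=1.96):
    """Approximate 95% band for r_k when the series is white noise."""
    return z / np.sqrt(n)

# Example on simulated white noise (a stand-in for diff_1st_order_as_list)
rng = np.random.default_rng(0)
noise = rng.normal(size=2000)
acf = sample_acf(noise, max_lag=10)
band = white_noise_band(len(noise))
for k in (1, 3):
    label = "outside band" if abs(acf[k]) > band else "within band"
    print(k, round(acf[k], 4), label)
```

On the real data, `sample_acf(diff_1st_order_as_list, 10)` gives the lag-1 and lag-3 entries to compare against `white_noise_band(len(diff_1st_order_as_list))`.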

So I set out to find the proportion of days on which the direction of the closing price reverses relative to the previous day, that is, how often the one-day difference and its one-day lag have opposite signs:

```python
#
# frequencies of 1-day lag changes in direction of closing price
#
count_opposite = 0
count_same = 0
i_list = []  # diff on day t
j_list = []  # diff on day t + 1
for i in range(len(diff_1st_order_as_list) - 1):
    price_diff_i = diff_1st_order_as_list[i]
    price_diff_j = diff_1st_order_as_list[i + 1]

    i_list.append(price_diff_i)
    j_list.append(price_diff_j)

    # np.sign gives +1, -1, or 0 (0 for an unchanged close)
    if np.sign(price_diff_i) == np.sign(price_diff_j):
        count_same += 1
    else:
        count_opposite += 1

print()
print("Correlation coefficients for the diff lists:")
print("\tPearson R: ", pearsonr(i_list, j_list)[0])
print("\tSpearman R:", spearmanr(i_list, j_list)[0])
print()

n_pairs = count_same + count_opposite
print("Proportion of time closing value direction remains the same:", round(count_same / n_pairs, 3))
amount_time_changes = count_opposite / n_pairs
print("Proportion of time closing value direction changes:", round(amount_time_changes, 3))

# 95% Agresti-Coull interval: add z^2/2 successes and z^2 trials, then use the normal approximation
z = 1.959964
n_tilde = n_pairs + z ** 2
p_tilde = (count_opposite + z ** 2 / 2) / n_tilde
half_width = z * math.sqrt(p_tilde * (1.0 - p_tilde) / n_tilde)
L = p_tilde - half_width
U = p_tilde + half_width
print("Agresti-Coull C.I.:", round(L, 3), "< p <", round(U, 3))
print()
```
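The reversal count in the loop above can also be written in vectorized NumPy. This is a minimal sketch, assuming the input is the NaN-free list of one-day differences; the function name `count_reversals` is hypothetical. As in the loop, a zero change counts as its own direction, so a move from 0 to +1 is treated as a change.

```python
import numpy as np

def count_reversals(diffs):
    """Return (number of day-to-day sign changes, number of consecutive pairs)."""
    signs = np.sign(np.asarray(diffs, dtype=float))  # +1, -1, or 0
    changed = signs[1:] != signs[:-1]                # compare day t+1 with day t
    return int(changed.sum()), len(changed)

# Small example with made-up differences: signs are +1, -1, 0, +1, +1
opposite, n_pairs = count_reversals([1.5, -0.5, 0.0, 1.25, 2.0])
print(opposite, n_pairs, opposite / n_pairs)  # 3 4 0.75
```

On the real data, `count_reversals(diff_1st_order_as_list)` should reproduce `count_opposite` and `count_same + count_opposite`.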

This analysis tells me that in the long run (at least over the period for which I pulled DJIA data), betting that the direction of the DJIA closing price reverses from one day to the next would slowly pay off, since such reversals occur more than half the time. (However, this analysis ignores the magnitudes of the changes; they may be too small to be worth the price of a trade. A future analysis will investigate this.) The correlation test between the two time series (the one-day difference and its one-day lag) shows near independence, so the Agresti-Coull confidence interval is appropriate, although this betting scheme relies on the thinnest of the correlations detected by the autocorrelation plot.
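For reference, the Agresti-Coull interval adds z²/2 pseudo-successes and z² pseudo-trials to the observed counts and then applies the usual normal approximation, while the plain Wald interval p̂ ± z√(p̂(1−p̂)/n) skips that adjustment. The sketch below implements both so they can be compared; the function names are my own, z = 1.959964 gives an approximately 95% interval, and the counts in the example are hypothetical, not results from the DJIA data.

```python
import math

Z_95 = 1.959964  # standard normal quantile for a two-sided 95% interval

def agresti_coull_interval(successes, trials, z=Z_95):
    """Agresti-Coull interval for a binomial proportion."""
    n_tilde = trials + z ** 2
    p_tilde = (successes + z ** 2 / 2.0) / n_tilde
    half_width = z * math.sqrt(p_tilde * (1.0 - p_tilde) / n_tilde)
    return p_tilde - half_width, p_tilde + half_width

def wald_interval(successes, trials, z=Z_95):
    """Plain normal-approximation (Wald) interval, for comparison."""
    p_hat = successes / trials
    half_width = z * math.sqrt(p_hat * (1.0 - p_hat) / trials)
    return p_hat - half_width, p_hat + half_width

# Hypothetical counts (e.g. count_opposite out of count_same + count_opposite)
lo, hi = agresti_coull_interval(530, 1000)
print("Agresti-Coull:", round(lo, 3), "< p <", round(hi, 3))
print("Wald:         ", *[round(v, 3) for v in wald_interval(530, 1000)])
```

If the whole interval for the reversal proportion lies above 0.5, reversals happened more often than a coin flip would predict over the sampled period.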

There was another possible pattern in the autocorrelation plots above: a positive correlation at a lag of three days alongside the negative correlation at a lag of one day. I decided to check how often the two combine, that is, how often the current day's change goes in the opposite direction from the one-day lag and in the same direction as the three-day lag. If up and down moves were independent coin flips, this combination would occur 25% of the time (1/2 × 1/2), so that is the null proportion a useful pattern would need to beat:

```python
#
# frequencies of combination of 1-day lag and 3-day lag changes in direction
# of closing price
#
count_matches = 0
count_non_matches = 0
i_list = []
j_list = []
k_list = []
for i in range(len(diff_1st_order_as_list) - 3):
    price_diff_i = diff_1st_order_as_list[i]      # 3-day lag (day t - 3)
    price_diff_j = diff_1st_order_as_list[i + 2]  # 1-day lag (day t - 1)
    price_diff_k = diff_1st_order_as_list[i + 3]  # current day (day t)

    i_list.append(price_diff_i)
    j_list.append(price_diff_j)
    k_list.append(price_diff_k)

    # +1, -1, or 0 for each diff
    sign_of_price_diff_i = np.sign(price_diff_i)
    sign_of_price_diff_j = np.sign(price_diff_j)
    sign_of_price_diff_k = np.sign(price_diff_k)

    # match: today reverses yesterday's direction but repeats the direction of 3 days ago
    if sign_of_price_diff_k != sign_of_price_diff_j and sign_of_price_diff_k == sign_of_price_diff_i:
        count_matches += 1
    else:
        count_non_matches += 1

print("Correlation coefficients for the diff lists:")
print("\t3-day lag vs 1-day lag, Pearson R: ", pearsonr(i_list, j_list)[0])
print("\t3-day lag vs 1-day lag, Spearman R:", spearmanr(i_list, j_list)[0])
print("\t3-day lag vs current,   Pearson R: ", pearsonr(i_list, k_list)[0])
print("\t3-day lag vs current,   Spearman R:", spearmanr(i_list, k_list)[0])
print("\t1-day lag vs current,   Pearson R: ", pearsonr(j_list, k_list)[0])
print("\t1-day lag vs current,   Spearman R:", spearmanr(j_list, k_list)[0])
print()

n_windows = count_matches + count_non_matches
amount_time_changes = count_matches / n_windows
print("Proportion of days whose change is opposite the 1-day lag and matches the 3-day lag:", round(amount_time_changes, 3))

# 95% Agresti-Coull interval (compare against the 25% null)
z = 1.959964
n_tilde = n_windows + z ** 2
p_tilde = (count_matches + z ** 2 / 2) / n_tilde
half_width = z * math.sqrt(p_tilde * (1.0 - p_tilde) / n_tilde)
L = p_tilde - half_width
U = p_tilde + half_width
print("Agresti-Coull C.I.:", round(L, 3), "< p <", round(U, 3))
print()
```
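The 25% null comes from treating daily directions as independent fair coin flips: given today's direction, the chance that yesterday went the other way is 1/2, and the chance that three days ago went the same way is an independent 1/2, so the combination occurs 1/2 × 1/2 = 1/4 of the time. The sketch below vectorizes the matching rule from the loop above and checks that null on simulated ±1 moves. The function name is hypothetical, and the simulation is only a sanity check, not DJIA data.

```python
import numpy as np

def count_lag1_lag3_pattern(diffs):
    """Count days t with sign(d[t]) != sign(d[t-1]) and sign(d[t]) == sign(d[t-3])."""
    signs = np.sign(np.asarray(diffs, dtype=float))
    current = signs[3:]   # day t
    lag1 = signs[2:-1]    # day t - 1
    lag3 = signs[:-3]     # day t - 3
    hits = (current != lag1) & (current == lag3)
    return int(hits.sum()), len(hits)

# Sanity check of the 25% null: independent fair up/down moves
rng = np.random.default_rng(0)
coin_flips = rng.choice([-1.0, 1.0], size=100_000)
hits, n = count_lag1_lag3_pattern(coin_flips)
print(hits / n)  # should be close to 0.25
```

Applied to `diff_1st_order_as_list`, the function should reproduce `count_matches` and `count_matches + count_non_matches`.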

To complete the code, we need to show the plot:

```python
#
# show the plot
#
plt.show()
```

I’m not certain the narrow margin of opportunity detected by this analysis is sufficient for the development of a trading strategy. More investigation is needed.