I decided to investigate the variation between trading days’ closing prices and the following trading days’ opening prices for stocks listed on the New York Stock Exchange. I started with data in the following format for all trading days between January 2nd 2000 and October 30th 2014:

I then calculated the percent change between one day’s closing price and the next day’s opening price for each day and each stock symbol in the data set, grouping these values into lists by the weekday of the opening. Finally, I plotted the list values in the following box plot:

The outliers are not shown in the box plot above (they are shown below). The median percent change for each weekday is approximately zero. Wilcoxon signed-rank tests suggested the medians are not exactly zero, but with samples this large even a tiny offset is statistically significant, so for practical purposes the medians are zero.
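The per-weekday median check can be sketched as follows. This is a minimal sketch, assuming the changes have already been grouped into lists keyed by weekday (0 = Monday), the way `all_stocks` is built in the script in the Code section. `median_zero_tests` is a hypothetical helper, and the Laplace-distributed data is synthetic, with an arbitrary 0.0002 offset chosen only to show how a very large sample makes a tiny offset significant.

```python
import numpy as np
from scipy.stats import wilcoxon

WEEKDAY_NAMES = ["Monday", "Tuesday", "Wednesday", "Thursday", "Friday"]


def median_zero_tests(changes_by_weekday):
    """Map weekday name -> (median, p-value) of a Wilcoxon signed-rank test
    whose null hypothesis is that the changes are symmetric about zero.

    `changes_by_weekday` maps 0 (Monday) .. 4 (Friday) to a list of changes.
    """
    results = {}
    for day, values in sorted(changes_by_weekday.items()):
        x = np.asarray(values, dtype=float)
        # The default zero_method="wilcox" discards exact zeros
        # (days where the open equals the previous close).
        result = wilcoxon(x)
        results[WEEKDAY_NAMES[day]] = (float(np.median(x)), float(result.pvalue))
    return results


if __name__ == "__main__":
    # Synthetic, hypothetical data: Laplace-distributed changes whose center
    # is shifted only 0.0002 (0.02 percent) away from zero.
    rng = np.random.default_rng(0)
    fake = {day: rng.laplace(loc=0.0002, scale=0.01, size=200_000) for day in range(5)}
    for name, (median, p) in median_zero_tests(fake).items():
        print(f"{name}: median = {median:+.5f}, p = {p:.2g}")
```

On data like this the medians print as practically zero while the p-values are typically tiny, which is the "statistically but not practically different from zero" situation described above.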

For the sake of completeness, here is the box plot with outliers included:

Here we see that the distributions have heavy right tails, with some extreme positive changes. However, for the sake of developing a trading strategy, I’m focusing on the interquartile range, which is centered near zero.
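One way to put numbers on the right tail is to count, for each weekday, how many points fall beyond the whiskers on each side of the box. The sketch below assumes matplotlib's default whisker rule (`whis=1.5`: points below Q1 - 1.5·IQR or above Q3 + 1.5·IQR are drawn as outliers). `tail_summary` is a hypothetical helper and the values in the usage line are made up.

```python
import numpy as np


def tail_summary(values, whis=1.5):
    """Quartiles, IQR, and counts of points beyond the Tukey fences
    Q1 - whis*IQR and Q3 + whis*IQR for one weekday's changes.

    whis=1.5 mirrors matplotlib's default boxplot whisker rule, so the two
    counts correspond to the outliers drawn below and above each box.
    """
    x = np.asarray(values, dtype=float)
    q1, median, q3 = np.percentile(x, [25, 50, 75])
    iqr = q3 - q1
    low_fence = q1 - whis * iqr
    high_fence = q3 + whis * iqr
    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        "iqr": iqr,
        "n_low_outliers": int(np.sum(x < low_fence)),
        "n_high_outliers": int(np.sum(x > high_fence)),
    }


if __name__ == "__main__":
    # Made-up values with one large positive jump, for illustration only.
    print(tail_summary([-0.004, -0.002, 0.0, 0.0, 0.001, 0.002, 0.003, 0.25]))
```

A right-heavy distribution shows up as `n_high_outliers` being much larger than `n_low_outliers`, while `q1`, `median`, and `q3` describe the central range used for the strategy.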

# What’s Missing

I visually checked the percent price change time series of two stocks for autocorrelation using autocorrelation plots and found no evidence of autocorrelation. However, this was not a comprehensive analysis covering all the stocks traded on the NYSE. The statistics course I am currently taking will soon cover a hypothesis test for autocorrelation that I will use in future analyses.
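One standard hypothesis test for autocorrelation is the Ljung-Box test. Its statistic is Q = n(n + 2) Σ_{k=1..h} r_k² / (n - k), where r_k is the lag-k sample autocorrelation, and under the null hypothesis of no autocorrelation Q follows a chi-squared distribution with h degrees of freedom. The sketch below is an assumption about what such a test could look like here, not necessarily the test the course will teach. `ljung_box` is a hypothetical helper (statsmodels ships a maintained version as `acorr_ljungbox` in `statsmodels.stats.diagnostic`), and the white-noise and AR(1) series are synthetic.

```python
import numpy as np
from scipy.stats import chi2


def ljung_box(x, lags=10):
    """Ljung-Box test for autocorrelation at lags 1..`lags`.

    Null hypothesis: no autocorrelation up to `lags`. Returns (Q, p-value),
    with the p-value taken from a chi-squared distribution with `lags`
    degrees of freedom.
    """
    x = np.asarray(x, dtype=float)
    n = x.size
    d = x - x.mean()
    denom = np.dot(d, d)
    q = 0.0
    for k in range(1, lags + 1):
        rho_k = np.dot(d[:-k], d[k:]) / denom  # sample autocorrelation at lag k
        q += rho_k ** 2 / (n - k)
    q *= n * (n + 2)
    return float(q), float(chi2.sf(q, df=lags))


if __name__ == "__main__":
    # Synthetic, hypothetical series: white noise vs. a strongly
    # autocorrelated AR(1) process built from the same noise.
    rng = np.random.default_rng(1)
    noise = rng.normal(size=2000)
    ar1 = np.zeros_like(noise)
    for t in range(1, noise.size):
        ar1[t] = 0.9 * ar1[t - 1] + noise[t]
    print("white noise:", ljung_box(noise))
    print("AR(1):      ", ljung_box(ar1))
```

Applied to every NYSE symbol, this means thousands of separate tests, so a handful of small p-values would be expected by chance alone; a multiple-comparison correction would be needed before reading much into them.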

# Code

The following Python code implements the calculation described above:

```python
#
# load useful libraries
#
import datetime

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from scipy.stats import wilcoxon
from scipy.stats.mstats import kruskalwallis

#
# load data
#
data = {}
with open('source_data.csv') as f:
    for line in f:
        fields = line.strip().split(',')
        symbol = fields[0]
        date_string = fields[1]
        open_price = float(fields[2])
        close_price = float(fields[5])
        year, month, day = (int(part) for part in date_string.split('-'))
        date = datetime.datetime(year, month, day)
        if symbol not in data:
            data[symbol] = {'date': [], 'open': [], 'close': [], 'weekday': []}
        data[symbol]['date'].append(date)
        data[symbol]['open'].append(open_price)
        data[symbol]['close'].append(close_price)
        data[symbol]['weekday'].append(date.weekday())  # 0 = Monday ... 4 = Friday

#
# convert to pandas dataframe, this ensures arrays are aligned by date
#
for symbol in sorted(data.keys()):
    df = pd.DataFrame({'close': data[symbol]['close'],
                       'open': data[symbol]['open'],
                       'weekday': data[symbol]['weekday']},
                      index=data[symbol]['date'])
    # sort chronologically so that row i - 1 is the previous trading day
    data[symbol] = df.sort_index()

#
# compute percent change
#
for symbol in sorted(data.keys()):
    df = data[symbol]
    closing_prices = df['close']
    opening_prices = df['open']
    percent_change_list = []
    for i in range(1, len(opening_prices)):
        price_at_open = opening_prices.iloc[i]
        price_at_close_day_before = closing_prices.iloc[i - 1]
        # stored as a fraction: 0.01 means a 1% change
        percent_change = (price_at_open - price_at_close_day_before) / price_at_close_day_before
        percent_change_list.append(percent_change)
    # the first row has no previous close, so it is left as NaN
    df['percent_change'] = pd.Series(percent_change_list, index=df.index[1:], dtype=float)

    #
    # test for autocorrelation
    #
    # plt.figure()
    # plt.acorr(df['percent_change'].dropna(), maxlags=30)
    # plt.show()

#
# generate percent change array by weekday for all stocks combined
#
all_stocks = {0: [], 1: [], 2: [], 3: [], 4: []}
for symbol, df in data.items():
    percent_change = df['percent_change']
    weekdays = df['weekday']
    for i in range(1, len(percent_change)):  # skip row 0, which is NaN
        all_stocks[int(weekdays.iloc[i])].append(percent_change.iloc[i])

#
# plot, once with outliers ("fliers") and once without
#
boxplot_data = [all_stocks[day] for day in range(5)]
weekday_names = ['Monday', 'Tuesday', 'Wednesday', 'Thursday', 'Friday']
for show_fliers, filename in [(True, 'percent_change_between_close_and_open.png'),
                              (False, 'percent_change_between_close_and_open__NO_FLIERS.png')]:
    plt.figure(figsize=(11, 9))
    plt.boxplot(boxplot_data, widths=0.9, showfliers=show_fliers)
    plt.xticks([1, 2, 3, 4, 5], weekday_names)
    plt.xlabel('Opening Weekday')
    plt.ylabel('Percent Change')
    plt.title("Percent Change from Previous Day's Close Price to Open Price, by Opening Weekday")
    plt.savefig(filename)
    plt.close()

#
# test for median of zero (Wilcoxon signed-rank test)
#
print()
for i, days_data in enumerate(boxplot_data):
    print(i, wilcoxon(np.array(days_data)))

#
# Kruskal-Wallis test for same medians across weekdays
#
print()
print(kruskalwallis(*[np.array(days_data) for days_data in boxplot_data]))

#
# print lengths of each array
#
print()
for i, days_data in enumerate(boxplot_data):
    print(i, len(days_data))
print()
```
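The per-row loop in the script above can also be written with vectorized pandas. The sketch below is an alternative formulation, not the code that produced the plots: the CSV rows and the `close_to_open_change` helper are hypothetical, and the column layout (symbol at position 0, date at 1, open at 2, close at 5) is assumed from how the script parses `source_data.csv`. Each row's value is (open - previous close) / previous close, with the previous close taken within the same symbol via `groupby(...).shift(1)`.

```python
import io

import pandas as pd

# Hypothetical sample rows, for illustration only. The column positions are
# assumed from the script above: 0 = symbol, 1 = date (YYYY-MM-DD),
# 2 = open, 5 = close. Columns 3 and 4 are placeholders.
SAMPLE_CSV = """\
AAA,2014-10-24,10.00,0,0,10.50
AAA,2014-10-27,10.60,0,0,10.40
AAA,2014-10-28,10.20,0,0,10.30
BBB,2014-10-24,20.00,0,0,19.00
BBB,2014-10-27,19.38,0,0,19.50
BBB,2014-10-28,19.89,0,0,20.00
"""


def close_to_open_change(csv_text):
    """Return one row per (symbol, date) with the fractional change from the
    previous row's close to this row's open and the weekday of the open
    (0 = Monday). Each symbol's first row has no previous close and is dropped."""
    raw = pd.read_csv(io.StringIO(csv_text), header=None)
    df = raw[[0, 1, 2, 5]].copy()
    df.columns = ["symbol", "date", "open", "close"]
    df["date"] = pd.to_datetime(df["date"])
    df = df.sort_values(["symbol", "date"])
    # Shift the close down one row within each symbol, so every row sees the
    # previous trading day's close for the same stock (never another stock's).
    prev_close = df.groupby("symbol")["close"].shift(1)
    df["change"] = (df["open"] - prev_close) / prev_close
    df["weekday"] = df["date"].dt.weekday
    return df.dropna(subset=["change"])


if __name__ == "__main__":
    changes = close_to_open_change(SAMPLE_CSV)
    # Per-weekday lists, the same grouping the box plot uses.
    by_weekday = {int(day): grp["change"].tolist() for day, grp in changes.groupby("weekday")}
    print(by_weekday)
```

As in the script, `shift(1)` works on rows, so if a symbol is missing a trading day the change is measured against its last available close.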

“between January 2nd 2000 and October 30th 2014”-

–

“The 50- and 100-day average price is a principal tool used in trading common stocks”…

–

As significant as reading daily stock movement is the influence of institutional investors, whose predictability is perhaps due to what is “happening through the Boys within the Back Room.”

–

“A rigged Game”!

–

I observe where the common stock (CS) price sits as a percentage below its annual top price and as a percentage above its annual bottom price. This gives me the perception that the CS will settle at a price somewhere between the annual bottom and the perceived annual maximum.

–

Evaluating the daily volume of stock traded indicates the ‘interest of the Boys within the room’.

–

I missed the semiconductor industry by ‘sitting on my hands’ and ignoring my GUT feeling. AMD is my primary example. QUICK is next… My thought is speculation, “act now,” and then I CHICKEN OUT… [I HATE this thought process]

“Cloud Computing” has arrived in force with EQIX, another GEM that has eclipsed the market in “Hardened Data Centers”…

–

My favorite semiconductor sector is “optical Wavelength Division Multiplexing chips”… however, the stocks are too high for my budget.

–

Pharmaceutical CS are a “crap shoot,” as I know nothing of their market activity… I cannot understand what these corporations accomplish, except by reading about the winners [Big Fukin Time] and losers… [ugly…].

–

“You can always pick your Friends,

You can always pick your Nose; but,

You cannot always pick your Friend’s Nose”