net change of zero between closing and opening stock prices

I decided to investigate the variation between each trading day’s closing price and the following trading day’s opening price for stocks listed on the New York Stock Exchange. I started with data in the following format for all trading days between January 2nd 2000 and October 30th 2014:

[Image: sample rows of the source data]
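The loading code at the end of this post reads the file line by line. A minimal alternative sketch using `pandas.read_csv` is shown below. It assumes the column layout implied by that code: symbol in column 0, a `YYYY-MM-DD` date in column 1, the opening price in column 2, the closing price in column 5, and no header row. Columns 3 and 4 are not used. The helper `load_prices` and the column names are hypothetical, and the two sample rows are invented.

```python
import io

import pandas as pd


def load_prices(path_or_buffer):
    """Load symbol, date, open and close into a DataFrame sorted by symbol and date.

    Assumed layout (inferred, not documented): col 0 = symbol, col 1 = date,
    col 2 = open, col 5 = close, no header row.
    """
    df = pd.read_csv(path_or_buffer, header=None, usecols=[0, 1, 2, 5])
    df.columns = ['symbol', 'date', 'open', 'close']
    df['date'] = pd.to_datetime(df['date'], format='%Y-%m-%d')
    # Monday=0 ... Friday=4, the same convention as datetime.date.weekday()
    df['weekday'] = df['date'].dt.weekday
    # Sorting makes "previous row" mean "previous trading day" within each symbol
    return df.sort_values(['symbol', 'date']).reset_index(drop=True)


# Usage with two invented rows, deliberately out of date order
sample = io.StringIO(
    "XYZ,2014-10-28,10.0,10.5,9.8,10.2\n"
    "XYZ,2014-10-27,9.9,10.1,9.7,10.0\n"
)
prices = load_prices(sample)
print(prices)
```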

I then calculated the percent change between one day’s closing price and the next day’s opening price for each day and each stock symbol in the data set, grouping these values by the weekday of the opening day. Finally, I plotted the grouped values in the following box plot:

[Image: box plot of percent change from previous close to open, by opening weekday, outliers not shown]

Outliers are omitted from the plot above; they are shown in the next plot. The median percent change for each weekday is approximately zero. Wilcoxon signed-rank tests rejected a median of exactly zero, but the deviation is too small to matter for practical purposes.
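The per-weekday calculation and median test can be sketched in vectorized pandas as follows. This is a minimal sketch, not the code that produced the plots: `close_to_open_pct` and `summarize_by_weekday` are hypothetical helpers, and they assume a DataFrame with columns `symbol`, `date`, `open`, `close` and `weekday`, sorted by symbol and date. The synthetic data at the end is invented, not NYSE results. It shows how a median shift of only 0.03 percentage points, negligible for trading, still gets a tiny Wilcoxon p-value once each weekday has around 100,000 observations.

```python
import numpy as np
import pandas as pd
from scipy.stats import wilcoxon


def close_to_open_pct(df):
    """Percent change from the previous row's close to this row's open, per symbol."""
    prev_close = df.groupby('symbol')['close'].shift(1)
    out = df.assign(pct_change=100.0 * (df['open'] - prev_close) / prev_close)
    # The first row of each symbol has no previous close, so its change is NaN
    return out.dropna(subset=['pct_change'])


def summarize_by_weekday(changes):
    """Sample size, median and Wilcoxon signed-rank p-value for each opening weekday."""
    rows = []
    for weekday, group in changes.groupby('weekday'):
        values = group['pct_change'].to_numpy()
        # H0: the values are symmetric about zero (so the median is zero)
        p_value = wilcoxon(values).pvalue
        rows.append({'weekday': weekday, 'n': len(values),
                     'median': float(np.median(values)), 'p_value': p_value})
    return pd.DataFrame(rows)


# Toy prices for two invented symbols
toy = pd.DataFrame({
    'symbol': ['A', 'A', 'A', 'B', 'B'],
    'date': pd.to_datetime(['2014-10-27', '2014-10-28', '2014-10-29',
                            '2014-10-27', '2014-10-28']),
    'open': [10.0, 10.5, 9.9, 50.0, 49.0],
    'close': [10.0, 10.0, 10.0, 50.0, 50.0],
})
toy['weekday'] = toy['date'].dt.weekday
changes = close_to_open_pct(toy)
print(changes['pct_change'].round(6).tolist())  # [5.0, -1.0, -2.0]

# Synthetic percent changes: tiny true shift (0.03) buried in noise of spread 1.0
rng = np.random.default_rng(0)
n = 500_000
synthetic = pd.DataFrame({
    'weekday': rng.integers(0, 5, n),
    'pct_change': rng.normal(0.03, 1.0, n),
})
summary = summarize_by_weekday(synthetic)
print(summary)
```

With this many observations, "statistically different from zero" and "practically zero" are separate questions, which is the distinction drawn in the paragraph above.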

For the sake of completeness, here is the box plot with outliers included:

[Image: box plot of percent change from previous close to open, by opening weekday, outliers included]

Here we see heavy tails on the right side of the distributions. However, for the purpose of developing a trading strategy, I’m focusing on the interquartile range, which is centered near zero.
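For reference, the boxes and outlier points follow Tukey's rule: the box spans the first to third quartile, and points more than 1.5 times the interquartile range (IQR) beyond the box are drawn individually. This matches matplotlib's default `whis=1.5`. The hypothetical helper `box_stats` below computes those quantities for one weekday's list of percent changes. The sample values are invented to show how a single large overnight gap becomes a flier while the IQR stays narrow around zero.

```python
import numpy as np


def box_stats(values, whis=1.5):
    """Quartiles, IQR and the fences beyond which a box plot draws outliers."""
    values = np.asarray(values, dtype=float)
    q1, median, q3 = np.percentile(values, [25, 50, 75])
    iqr = q3 - q1
    lower_fence = q1 - whis * iqr
    upper_fence = q3 + whis * iqr
    # Points strictly outside the fences are plotted as fliers
    n_outliers = int(np.sum((values < lower_fence) | (values > upper_fence)))
    return {'q1': q1, 'median': median, 'q3': q3, 'iqr': iqr,
            'lower_fence': lower_fence, 'upper_fence': upper_fence,
            'n_outliers': n_outliers}


# Invented percent changes with one large positive gap
stats = box_stats([-0.5, -0.2, 0.0, 0.1, 0.3, 6.0])
print(stats['n_outliers'])  # 1
```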

What’s Missing

I visually checked the percent-change time series of two stocks for autocorrelation using autocorrelation plots and found no evidence of it. However, this is not a comprehensive analysis covering all the stocks traded on the NYSE. The statistics course I am currently taking will soon cover a hypothesis test for autocorrelation, which I will use in future analyses.
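One standard hypothesis test for autocorrelation is the Ljung-Box test, which is not necessarily the test the course will teach. Its statistic is Q = n(n + 2) times the sum over lags k = 1..h of r_k^2 / (n - k), where r_k is the sample autocorrelation at lag k. Under the null hypothesis of no autocorrelation up to lag h, Q is approximately chi-squared with h degrees of freedom. The sketch below implements this directly with NumPy and SciPy; `ljung_box` is a hypothetical helper and `max_lag=10` an arbitrary default. statsmodels also provides this test as `statsmodels.stats.diagnostic.acorr_ljungbox`.

```python
import numpy as np
from scipy.stats import chi2


def ljung_box(x, max_lag=10):
    """Ljung-Box Q statistic and p-value. H0: autocorrelations at lags 1..max_lag are zero."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    if n <= max_lag:
        raise ValueError("series must be longer than max_lag")
    centered = x - x.mean()
    denom = np.dot(centered, centered)
    q = 0.0
    for k in range(1, max_lag + 1):
        # Sample autocorrelation at lag k
        r_k = np.dot(centered[k:], centered[:-k]) / denom
        q += r_k ** 2 / (n - k)
    q *= n * (n + 2)
    # Upper-tail probability of the chi-squared distribution with max_lag dof
    p_value = chi2.sf(q, df=max_lag)
    return q, p_value


# A strictly alternating series is strongly autocorrelated by construction
alternating = np.tile([1.0, -1.0], 100)
q, p = ljung_box(alternating, max_lag=5)
print(p < 0.001)  # True

# Independent noise for contrast (p-value varies with the random draw)
noise = np.random.default_rng(1).normal(size=1000)
print(ljung_box(noise))
```

Applied to one symbol's percent-change series (after dropping its leading NaN), a small p-value would be evidence of autocorrelation at some lag up to `max_lag`.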

Code

The following Python code implements the calculation described above:

#
# load useful libraries
#
import datetime
import pandas as pd
import matplotlib.pyplot as plt
from scipy.stats import wilcoxon
from scipy.stats.mstats import kruskalwallis
import numpy as np

#
# load data
#
data = {}
with open('source_data.csv') as f:
    for line in f:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        fields = line.split(',')
        symbol = fields[0]
        date_string = fields[1]
        open_price = float(fields[2])
        close_price = float(fields[5])

        # dates are stored as YYYY-MM-DD
        date = datetime.datetime.strptime(date_string, '%Y-%m-%d')

        # dict.has_key() no longer exists in Python 3; use the "in" operator
        if symbol not in data:
            data[symbol] = {'date' : [], 'open' : [], 'close' : [], 'weekday' : []}
        data[symbol]['date'].append(date)
        data[symbol]['open'].append(open_price)
        data[symbol]['close'].append(close_price)
        data[symbol]['weekday'].append(date.weekday())  # Monday=0

#
# convert to pandas dataframe, this ensures arrays are aligned by date
#
for symbol in sorted(data.keys()):
    closing_prices = data[symbol]['close']
    opening_prices = data[symbol]['open']
    weekdays = data[symbol]['weekday']
    date = data[symbol]['date']

    pd_closing_prices = pd.Series(closing_prices, index=date)
    pd_opening_prices = pd.Series(opening_prices, index=date)
    pd_weekdays = pd.Series(weekdays, index=date)

    d = {'close' : pd_closing_prices,
         'open' : pd_opening_prices,
         'weekday' : pd_weekdays}
    # sort by date so that row i-1 is always the trading day before row i
    df = pd.DataFrame(d).sort_index()

    del(data[symbol]['close'])
    del(data[symbol]['open'])
    del(data[symbol]['date'])
    del(data[symbol]['weekday'])

    data[symbol] = df

#
# compute percent change
#
for symbol in sorted(data.keys()):
    df = data[symbol]
    closing_prices = df['close']
    opening_prices = df['open']
    date_index = df.index

    percent_change_list = []
    for i in range(1, len(opening_prices)):
        # .iloc gives positional access; positional [] on a date-indexed Series is deprecated
        price_at_open = opening_prices.iloc[i]
        price_at_close_day_before = closing_prices.iloc[i-1]

        # multiply by 100 so the values are percentages, matching the plot's axis label
        percent_change = 100.0 * (price_at_open - price_at_close_day_before) / price_at_close_day_before
        percent_change_list.append(percent_change)

    percent_change_series = pd.Series(percent_change_list, index=date_index[1:])
    data[symbol]['percent_change'] = percent_change_series

    #
    # test for autocorrelation
    #
    #plt.figure()
    #plt.acorr(percent_change_series, maxlags=30)
    #plt.show()

#
# generate percent change array by weekday for all stocks combined
#
all_stocks = {0 : [], 1 : [], 2 : [], 3 : [], 4 : []}
for symbol in data.keys():
    df = data[symbol]
    percent_change = df['percent_change']
    weekdays = df['weekday']

    # skip row 0, which has no previous close (its percent change is NaN)
    for i in range(1, len(percent_change)):
        all_stocks[weekdays.iloc[i]].append(percent_change.iloc[i])

#
# plot
#
boxplot_data = [all_stocks[0], all_stocks[1], all_stocks[2], all_stocks[3], all_stocks[4]]
plt.figure(figsize=(11, 9))
plt.boxplot(boxplot_data, widths=0.9)
plt.xticks([1, 2, 3, 4, 5], ['Monday', 'Tuesday', 'Wednesday', 'Thursday', 'Friday'])
plt.xlabel('Opening Weekday')
plt.ylabel('Percent Change')
plt.title('Percent Change from Previous Day\'s Close Price to Open Price, by Opening Weekday')
plt.savefig('percent_change_between_close_and_open.png')
plt.close()

plt.figure(figsize=(11, 9))
plt.boxplot(boxplot_data, widths=0.9, showfliers=False)
plt.xticks([1, 2, 3, 4, 5], ['Monday', 'Tuesday', 'Wednesday', 'Thursday', 'Friday'])
plt.xlabel('Opening Weekday')
plt.ylabel('Percent Change')
plt.title('Percent Change from Previous Day\'s Close Price to Open Price, by Opening Weekday')
plt.savefig('percent_change_between_close_and_open__NO_FLIERS.png')
plt.close()

#
# test for median of zero
#
print()
for i, days_data in enumerate(boxplot_data):
    print(i, wilcoxon(np.array(days_data)))

#
# Kruskal-Wallis test for same medians
#
print()
print(kruskalwallis(np.array(boxplot_data[0]), np.array(boxplot_data[1]), np.array(boxplot_data[2]), np.array(boxplot_data[3]), np.array(boxplot_data[4])))

#
# print lengths of each array
#
print()
for i, days_data in enumerate(boxplot_data):
    print(i, len(days_data))
print()

