EC2 spot instance price change: no correlation with day of week

My plans for world domination involve heavy use of Amazon EC2 instances, but I have to be frugal about it so I’m running spot instances to save cash. Therefore a means of forecasting spot instance prices would be helpful.

Thus far I’ve had little success using mainstream forecasting tools such as ARIMA and exponential smoothing. So I’m now looking for explanatory variables to use in regressions. I thought I’d start with day of the week as a variable to see if there is a pattern.

I sampled 11 weeks of price changes (code below) from the two US West Coast EC2 regions and counted how many times prices went up or down on each day of the week. Doing so produced the following measurements:

[Table image: counts of price increases and decreases by day of the week]

From these results we see that price change variance goes down on weekends, and that price increases tend to occur slightly more frequently than price decreases on weekdays, though not by much.
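The counting step on its own is small enough to show in isolation. Here is a minimal sketch in Python 3; the function name `count_changes_by_weekday`, the `tolerance` parameter, and the sample prices are made up for illustration and are not the data behind the table above:

```python
import datetime
from collections import Counter


def count_changes_by_weekday(timestamps, prices, tolerance=1e-5):
    """Count price rises and falls, keyed by weekday (0 = Monday ... 6 = Sunday)."""
    rises, falls = Counter(), Counter()
    for i in range(1, len(prices)):
        change = prices[i] - prices[i - 1]
        weekday = timestamps[i].weekday()  # day on which the new price took effect
        if change > tolerance:
            rises[weekday] += 1
        elif change < -tolerance:
            falls[weekday] += 1
    return rises, falls


# Hypothetical, already time-sorted observations for one instance type and zone.
timestamps = [
    datetime.datetime(2013, 1, 7, 9, 0),    # Monday
    datetime.datetime(2013, 1, 7, 15, 0),   # Monday
    datetime.datetime(2013, 1, 8, 10, 0),   # Tuesday
    datetime.datetime(2013, 1, 8, 18, 0),   # Tuesday
    datetime.datetime(2013, 1, 12, 12, 0),  # Saturday
]
prices = [0.010, 0.012, 0.011, 0.011, 0.013]

rises, falls = count_changes_by_weekday(timestamps, prices)
print(dict(rises), dict(falls))
```

Repeated prices (the second Tuesday entry here) count as neither a rise nor a fall, which matches the small tolerance band used in the full script.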

I also logged the percent change in price for each price change and produced the following box plot (outliers are not displayed for now since they flood the image):

[Figure: box plot of percent change in spot price by day of the week, outliers hidden]

Again, these results show that price change variance decreases on weekends. However, the percent changes are centered on roughly zero for every day of the week, so I do not think the day of the week will be a useful predictor in a future model of EC2 spot prices. The weekend drop in variance might still be useful for adjusting forecast confidence intervals by day of the week, but I’m not sure yet.
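To put numbers on the weekday versus weekend spread, one option is to summarize the exported changes per day. This is a minimal Python 3 sketch, assuming the `weekday,percent.change` CSV layout written by the script in the Code section. The names `summarize_by_weekday` and `load_rows` and the sample values are illustrative, not results from this analysis:

```python
import csv
import statistics
from collections import defaultdict


def summarize_by_weekday(rows):
    """Map each weekday name to (median, sample standard deviation)."""
    grouped = defaultdict(list)
    for weekday, change in rows:
        grouped[weekday].append(float(change))
    summary = {}
    for weekday, values in grouped.items():
        # stdev needs at least two points; report 0.0 for a single observation
        spread = statistics.stdev(values) if len(values) > 1 else 0.0
        summary[weekday] = (statistics.median(values), spread)
    return summary


def load_rows(path):
    """Read (weekday, change) pairs from a CSV with the header weekday,percent.change."""
    with open(path, newline='') as f:
        return [(row['weekday'], row['percent.change']) for row in csv.DictReader(f)]


# Hypothetical values: wide swings on Monday, narrow ones on Saturday.
sample = [
    ('Monday', -0.10), ('Monday', 0.0), ('Monday', 0.10),
    ('Saturday', -0.01), ('Saturday', 0.0), ('Saturday', 0.01),
]
summary = summarize_by_weekday(sample)
for weekday, (median, spread) in summary.items():
    print(weekday, median, spread)
```

With real data, `summarize_by_weekday(load_rows('output/percent_change.csv'))` would give one row per day. A weekend standard deviation well below the weekday one would be the kind of evidence that could justify narrower forecast intervals on those days.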

Redrawing the box plot with the outliers displayed shows that the distributions are right-skewed. I haven’t yet figured out how these extreme values are balanced on the negative side, but there might be some useful information in this occurrence.

[Figure: box plot of percent change in spot price by day of the week, outliers shown]
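One likely contributor to the right skew is the measure itself. A change expressed relative to the old price cannot fall below -100%, but it has no upper limit, so a spike and the drop that undoes it do not cancel. Here is a short Python 3 sketch with made-up prices, plus a helper that counts points beyond the 1.5 × IQR fences on each side. Note that `statistics.quantiles` computes quartiles slightly differently from R's `boxplot`, so counts may differ a little at the margins:

```python
import math
import statistics


def fractional_change(old, new):
    """Change relative to the old price, as computed in the analysis script."""
    return (new - old) / old


def log_change(old, new):
    """Log ratio: a spike and the drop that reverses it sum to zero."""
    return math.log(new / old)


def count_outliers(values):
    """Return (low, high) counts of points beyond the 1.5 * IQR fences."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low = sum(1 for v in values if v < q1 - 1.5 * iqr)
    high = sum(1 for v in values if v > q3 + 1.5 * iqr)
    return low, high


# Hypothetical spike from $0.10 to $1.00 and back again.
spike_up = fractional_change(0.10, 1.00)    # roughly +9.0, i.e. +900%
spike_down = fractional_change(1.00, 0.10)  # roughly -0.9, i.e. -90%
log_round_trip = log_change(0.10, 1.00) + log_change(1.00, 0.10)  # roughly 0

# Hypothetical changes: mostly small moves plus two large upward spikes.
changes = [-0.03, -0.02, -0.01, -0.01, 0.0, 0.0, 0.0,
           0.01, 0.01, 0.02, 0.03, 0.9, 1.5]
low, high = count_outliers(changes)
print(spike_up, spike_down, log_round_trip, low, high)
```

Plotting log changes instead of fractional changes would make the positive and negative tails directly comparable, which may make the missing balance on the negative side easier to spot.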

Code

Here is Python code to fetch the source data from Amazon. Your results will vary as Amazon only provides a certain number of past values depending on when you run the code:

# import useful libraries
import datetime
import os

# user settings
days_to_go_back = 200
access_key = 'your access key'
secret_key = 'your secret key'

# compute the start time
today = datetime.date.today()
delta = datetime.timedelta(days=-days_to_go_back)
start_time = str(today + delta) + 'T00:00:00'

# produce the cmd
cmd = 'ec2-describe-spot-price-history -H --aws-access-key ' + access_key + ' --aws-secret-key ' + secret_key + ' --start-time ' + start_time

# regions
cmd_list = []
regions = ['us-east-1', 'us-west-1', 'us-west-2']
for region in regions:
    cmd_list.append(cmd + ' --region ' + region)

# execute commands
if not os.path.isdir('output'):
    os.makedirs('output')  # make sure the output directory exists
open('output/AWS_price_history_data.txt', 'w').close()  # start from an empty file
for c in cmd_list:
    os.system(c + ' >> output/AWS_price_history_data.txt')
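Building the command by string concatenation and running it through `os.system` hands the access keys to a shell for parsing. A possible alternative, sketched in Python 3 with the same CLI and flags as above, passes the arguments as a list to `subprocess.run`. The helper names `build_command`, `fetch_price_history`, and `start_time_days_ago` are illustrative:

```python
import datetime
import subprocess


def build_command(region, start_time, access_key, secret_key):
    """Return the CLI invocation as an argument list (no shell quoting needed)."""
    return [
        'ec2-describe-spot-price-history', '-H',
        '--aws-access-key', access_key,
        '--aws-secret-key', secret_key,
        '--start-time', start_time,
        '--region', region,
    ]


def fetch_price_history(regions, start_time, access_key, secret_key, out_path):
    """Run the CLI once per region, writing all output to a single file."""
    with open(out_path, 'w') as out:  # 'w' truncates data from any previous run
        for region in regions:
            cmd = build_command(region, start_time, access_key, secret_key)
            subprocess.run(cmd, stdout=out, check=True)


def start_time_days_ago(days, today=None):
    """Format midnight of the date `days` before `today` as the start time string."""
    today = today or datetime.date.today()
    return (today - datetime.timedelta(days=days)).isoformat() + 'T00:00:00'


# Example call (needs the EC2 API tools installed and real credentials):
# fetch_price_history(['us-west-1', 'us-west-2'], start_time_days_ago(200),
#                     'your access key', 'your secret key',
#                     'output/AWS_price_history_data.txt')
```

With `check=True`, a failed call raises an exception instead of silently leaving a partial file.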

Here is Python code to produce the table and data for the box plots:

# load useful libraries
import datetime
import pytz
from pytz import timezone
import math

# timezones
eastern = timezone('US/Eastern')
pacific = timezone('US/Pacific')

# load data
data = {}
f = open('output/AWS_price_history_data.txt')
for i, line in enumerate(f):
    line = line.strip()
    if line.find('AvailabilityZone') >= 0:  continue
    # columns 1-5: price, timestamp, instance type, product description, availability zone
    fields = line.split('\t')
    price, timestamp, instance_type, description, zone = fields[1:6]

    if zone.find('east') >= 0:  continue  # consider only west coast for now since date ranges differ between east and west

    year = int(timestamp.split('T')[0].split('-')[0])
    month = int(timestamp.split('T')[0].split('-')[1])
    day = int(timestamp.split('T')[0].split('-')[2])
    hour = int(timestamp.split('T')[1].split(':')[0])
    minute = int(timestamp.split('T')[1].split(':')[1])
    ts = datetime.datetime(year, month, day, hour, minute, 0, tzinfo=pytz.utc)

    if zone.find('east') >= 0:
        ts = ts.astimezone(eastern)
    if zone.find('west') >= 0:
        ts = ts.astimezone(pacific)

    # build the nested dict: instance type -> description -> zone
    # ("in" replaces dict.has_key, which no longer exists in Python 3)
    if instance_type not in data:
        data[instance_type] = {}
    if description not in data[instance_type]:
        data[instance_type][description] = {}
    if zone not in data[instance_type][description]:
        data[instance_type][description][zone] = {'price' : [], 'timestamp' : []}

    data[instance_type][description][zone]['price'].append(float(price))
    data[instance_type][description][zone]['timestamp'].append(ts)

f.close()

# sort the prices by date
for instance_type in data.keys():
    for description in data[instance_type].keys():
        for zone in data[instance_type][description].keys():

            price_list = data[instance_type][description][zone]['price']
            timestamp_list = data[instance_type][description][zone]['timestamp']

            indices = [i[0] for i in sorted(enumerate(timestamp_list), key=lambda x:x[1])]
            new_timestamp_list = []
            new_price_list = []
            for i in indices:
                new_timestamp_list.append(timestamp_list[i])
                new_price_list.append(price_list[i])

            data[instance_type][description][zone]['price'] = new_price_list
            data[instance_type][description][zone]['timestamp'] = new_timestamp_list

# get days of week
for instance_type in data.keys():
    for description in data[instance_type].keys():
        for zone in data[instance_type][description].keys():
            timestamp_list = data[instance_type][description][zone]['timestamp']
            day_of_week_list = []
            for t in timestamp_list:
                day_of_week_list.append(t.date().weekday())
            data[instance_type][description][zone]['day_of_week_list'] = day_of_week_list

# need to make sure there is an exact number of full weeks in the analysis
for instance_type in data.keys():
    for description in data[instance_type].keys():
        for zone in data[instance_type][description].keys():
            timestamp_list = data[instance_type][description][zone]['timestamp']
            days_diff = (timestamp_list[-1] - timestamp_list[0]).days
            weeks_diff_to_use = int(math.floor(float(days_diff) / 7.)) - 1  # subtracting one allows us to shift the days without error
            data[instance_type][description][zone]['weeks'] = weeks_diff_to_use

            shift_dt = datetime.timedelta(days = 3)  # we are shifting by three days to deal with possible artifacts at beginning of data set
            dt = datetime.timedelta(weeks = weeks_diff_to_use)
            cutoff_time = timestamp_list[0] + dt + shift_dt
            start_cutoff_time = timestamp_list[0] + shift_dt

            for i in range(0, len(timestamp_list)):
                if timestamp_list[i] > cutoff_time:
                    break

            for j in range(0, len(timestamp_list)):
                if timestamp_list[j] > start_cutoff_time:
                    break

            data[instance_type][description][zone]['price'] = data[instance_type][description][zone]['price'][j:i]
            data[instance_type][description][zone]['timestamp'] = data[instance_type][description][zone]['timestamp'][j:i]
            data[instance_type][description][zone]['day_of_week_list'] = data[instance_type][description][zone]['day_of_week_list'][j:i]

# count price rises
price_rises_by_weekday = {0 : 0, 1 : 0, 2 : 0, 3 : 0, 4 : 0, 5 : 0, 6 : 0}
price_falls_by_weekday = {0 : 0, 1 : 0, 2 : 0, 3 : 0, 4 : 0, 5 : 0, 6 : 0}
percent_change_by_weekday = {0 : [], 1 : [], 2 : [], 3 : [], 4 : [], 5 : [], 6 : []}
for instance_type in data.keys():
    for description in data[instance_type].keys():
        for zone in data[instance_type][description].keys():
            price_list = data[instance_type][description][zone]['price']
            day_of_week_list = data[instance_type][description][zone]['day_of_week_list']
            for i in range(1, len(price_list)):
                change_in_price = price_list[i] - price_list[i-1]

                percent_change = change_in_price / price_list[i-1]  # a fraction, not multiplied by 100 (0.05 means 5%)

                percent_change_by_weekday[day_of_week_list[i]].append(percent_change)

                if change_in_price > 0.00001:
                    price_rises_by_weekday[day_of_week_list[i]] += 1
                if change_in_price < -0.00001:
                    price_falls_by_weekday[day_of_week_list[i]] += 1

# output percent change
weekdays = {0 :'Monday', 1 : 'Tuesday', 2 : 'Wednesday', 3  : 'Thursday', 4 : 'Friday', 5 : 'Saturday', 6 : 'Sunday'}
f = open('output/percent_change.csv', 'w')
f.write('weekday,percent.change\n')
for i in sorted(percent_change_by_weekday.keys()):
    for value in percent_change_by_weekday[i]:
        f.write(weekdays[i] + ',' + str(value) + '\n')
f.close()

# output counts: rises, then falls, then net (rises minus falls) per weekday
# (the parenthesized print calls run under both Python 2 and Python 3)
for i in sorted(price_rises_by_weekday.keys()):
    print(weekdays[i] + ':  ' + str(price_rises_by_weekday[i]))
print('')
for i in sorted(price_falls_by_weekday.keys()):
    print(weekdays[i] + ':  ' + str(price_falls_by_weekday[i]))
print('')
for i in sorted(price_falls_by_weekday.keys()):
    print(weekdays[i] + ':  ' + str(price_rises_by_weekday[i] - price_falls_by_weekday[i]))

Here is the R code used to make the plots:

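# load the data (the Python script writes this file to output/percent_change.csv,
# so run R from that directory or adjust the path)
data <- read.csv("percent_change.csv")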
str(data)

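# reorder the factors (explicit levels also work when read.csv returns strings, as in R >= 4.0)
data$y = factor(data$weekday, levels=c("Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", "Sunday"))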

# boxplot with no outliers
boxplot(percent.change ~ y, data=data, outline=FALSE, main="Percent Change in Spot Instance Price for all Changes in 11-Week Period", ylab="Percent Change")

# boxplot with outliers
boxplot(percent.change ~ y, data=data, main="Percent Change in Spot Instance Price for all Changes in 11-Week Period", ylab="Percent Change")
