To inform the construction of a machine-learning-based price prediction algorithm, we want to know how many lags show statistically significant autocorrelation in the seven major FOREX pairs. We first choose 10,000 random time points between January 1, 2000 and January 1, 2017 for each of the seven pairs. Then, for each time point, we download historical price data from that point through the 500 points that follow it. This looks like:
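A minimal sketch of the sampling step, assuming the seven majors are EUR/USD, USD/JPY, GBP/USD, USD/CHF, AUD/USD, USD/CAD and NZD/USD, and that time points are drawn uniformly at one-minute resolution (the bar size is not stated above). The download itself depends on the data provider and is not shown; `sample_time_points` is a hypothetical helper, not the author's code.

```python
import random
from datetime import datetime, timedelta

# Assumed list of the seven major pairs
PAIRS = ["EURUSD", "USDJPY", "GBPUSD", "USDCHF", "AUDUSD", "USDCAD", "NZDUSD"]
START = datetime(2000, 1, 1)
END = datetime(2017, 1, 1)
N_SAMPLES = 10_000
WINDOW = 500  # number of price points to fetch after each sampled time point


def sample_time_points(n, start, end, rng):
    """Draw n uniformly random timestamps in [start, end) at one-minute resolution (assumed)."""
    total_minutes = int((end - start).total_seconds() // 60)
    return sorted(start + timedelta(minutes=rng.randrange(total_minutes)) for _ in range(n))


rng = random.Random(42)  # fixed seed so the sample is reproducible
samples = {pair: sample_time_points(N_SAMPLES, START, END, rng) for pair in PAIRS}
# Each timestamp in samples[pair] would then be used to fetch WINDOW points of price history.
```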
For one particular time point and currency pair, we review the autocorrelation function (ACF) plot and the partial autocorrelation function (PACF) plot, just to see where we stand:
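The two plots can also be produced numerically. The plain-Python sketch below computes the sample ACF and derives the PACF from it with the Durbin-Levinson recursion, which is one standard estimator; the tool behind the original plots may use a different one (for example Yule-Walker). The helper names and the synthetic random walk standing in for a 500-point price window are assumptions for illustration.

```python
import random


def sample_acf(x, nlags):
    """Return r_0..r_nlags, normalising every lag by the full sum of squared deviations."""
    n = len(x)
    mean = sum(x) / n
    dev = [v - mean for v in x]
    denom = sum(d * d for d in dev)
    return [sum(dev[t] * dev[t + k] for t in range(n - k)) / denom for k in range(nlags + 1)]


def pacf_durbin_levinson(r, nlags):
    """Partial autocorrelations phi_kk for k = 0..nlags, given autocorrelations r with r[0] == 1."""
    pacf = [1.0]
    phi_prev = []  # AR(k-1) coefficients phi_{k-1,1..k-1} from the previous step
    for k in range(1, nlags + 1):
        num = r[k] - sum(phi_prev[j - 1] * r[k - j] for j in range(1, k))
        den = 1.0 - sum(phi_prev[j - 1] * r[j] for j in range(1, k))
        phi_kk = num / den
        # Update: phi_{k,j} = phi_{k-1,j} - phi_kk * phi_{k-1,k-j}, then append phi_kk
        phi_prev = [phi_prev[j - 1] - phi_kk * phi_prev[k - j - 1] for j in range(1, k)] + [phi_kk]
        pacf.append(phi_kk)
    return pacf


# Sanity check on an AR(1) process: theoretical ACF 0.5**k gives PACF [1, 0.5, 0, 0, ...]
ar1_pacf = pacf_durbin_levinson([0.5 ** k for k in range(6)], 5)

# Synthetic random walk as a stand-in for one downloaded price window
rng = random.Random(0)
walk = [0.0]
for _ in range(499):
    walk.append(walk[-1] + rng.gauss(0.0, 1.0))
walk_acf = sample_acf(walk, 20)
walk_pacf = pacf_durbin_levinson(walk_acf, 20)
pacf_band = 1.96 / len(walk) ** 0.5  # approximate 95% band for the PACF
```

For a random walk the ACF decays slowly while the PACF typically spikes at lag 1 and then stays mostly inside the ±1.96/√n band.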
The ACF plot remains statistically significant through about lag 45, suggesting that a model using that many lags as features might capture useful predictive information.
We automate this eyeball analysis by calculating, for each sample, the lag at which the autocorrelation ceases to be statistically significant. Plotting the results to compare instruments:
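One way to automate the cutoff, sketched with NumPy under stated assumptions: the significance band uses Bartlett's formula (the approximation statsmodels uses by default for the shaded region of `plot_acf`), lags are tested up to a hypothetical maximum of 100, and synthetic random walks stand in for the downloaded 500-point windows. `acf_with_band` and `first_insignificant_lag` are illustrative names, not the author's code.

```python
import numpy as np


def acf_with_band(x, nlags, z=1.96):
    """Sample ACF r_0..r_nlags and the half-width of its Bartlett confidence band."""
    x = np.asarray(x, dtype=float)
    n = x.size
    dev = x - x.mean()
    denom = dev @ dev
    r = np.array([dev[: n - k] @ dev[k:] / denom for k in range(nlags + 1)])
    # Bartlett: var(r_1) = 1/n, var(r_k) = (1 + 2 * sum_{j<k} r_j^2) / n for k >= 2
    var = np.ones(nlags + 1) / n
    var[0] = 0.0
    var[2:] *= 1 + 2 * np.cumsum(r[1:-1] ** 2)
    return r, z * np.sqrt(var)


def first_insignificant_lag(x, nlags=100, z=1.96):
    """First lag k >= 1 whose |r_k| falls inside the band; nlags + 1 if none does."""
    r, band = acf_with_band(x, nlags, z)
    inside = np.abs(r[1:]) <= band[1:]
    hits = np.flatnonzero(inside)
    return int(hits[0]) + 1 if hits.size else nlags + 1


# Synthetic stand-ins: 20 random-walk windows of 500 points for two hypothetical pairs
rng = np.random.default_rng(0)
windows = {pair: [np.cumsum(rng.normal(size=500)) for _ in range(20)] for pair in ["EURUSD", "USDJPY"]}
cutoffs = {pair: [first_insignificant_lag(w) for w in ws] for pair, ws in windows.items()}
```

Returning `nlags + 1` when no lag falls inside the band caps the search; a larger `nlags` may be needed for very persistent series.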
A Kruskal-Wallis test returns a p-value < 0.0001, strongly suggesting that the median cutoff lag differs between at least two of the pairs. We arbitrarily select the 75th percentile of the whole sample space (lags = 26) for continuing model development.
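The test and the percentile choice can be sketched with SciPy's `kruskal` and NumPy's `percentile`. The per-pair cutoff arrays below are synthetic Poisson draws for three hypothetical pairs, not the study's data, and rounding the percentile up to a whole lag is an assumption.

```python
import numpy as np
from scipy.stats import kruskal

# Synthetic per-sample cutoff lags for three hypothetical pairs (not real results)
rng = np.random.default_rng(1)
cutoffs = {
    "EURUSD": rng.poisson(20, size=200),
    "USDJPY": rng.poisson(24, size=200),
    "GBPUSD": rng.poisson(18, size=200),
}

# Kruskal-Wallis H-test: one array per group
stat, p_value = kruskal(*cutoffs.values())

# Pool all samples and take the 75th percentile, rounded up to a whole lag
pooled = np.concatenate(list(cutoffs.values()))
lag_choice = int(np.ceil(np.percentile(pooled, 75)))
```

With all seven pairs, each pair's array of cutoffs would simply be passed as another argument to `kruskal`.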