While traversing the darker residuals of the blogosphere, Data Scientist happens upon a blogger in distress. Our hero quickly swallows a can of Red Bull-infused spinach and springs to action:
The Popeye Challenge
Dr. Mike Sutton of Dysology.org requested assistance demonstrating (or debunking) the proposed causal link between high spinach production and the popularity of Popeye cartoons during the 1930s in the United States.
In 1931, cartoon readers learned that Popeye gained his superpowers from eating spinach. Ever since, the spinach industry has regarded that event, along with the subsequent Popeye movie of 1936, as a driver of the record spinach production of the 1930s. Dr. Sutton wants to check the validity of this belief. In the text that follows, I tackle the challenge by testing for Granger causality between the Popeye-related events and the spinach production boost of the 1930s.
Dr. Sutton provided yearly data on US spinach production:
The first difference of the series is level-stationary according to the KPSS test, so I use it in place of the original time series:
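A minimal sketch of this step in Python, assuming NumPy only. The post does not say which software produced the KPSS result, so the function below (`kpss_level_stat`, a name chosen here for illustration) hand-computes the level-stationarity statistic with a Bartlett-weighted long-run variance. The `lags` truncation and the input series are assumptions: the series is a made-up placeholder, not the spinach data.

```python
import numpy as np

# Asymptotic critical values for the level-stationarity KPSS test
# (Kwiatkowski, Phillips, Schmidt and Shin, 1992).
KPSS_LEVEL_CRIT = {"10%": 0.347, "5%": 0.463, "2.5%": 0.574, "1%": 0.739}


def kpss_level_stat(x, lags=0):
    """KPSS statistic for the null hypothesis of stationarity around a constant."""
    x = np.asarray(x, dtype=float)
    n = x.size
    resid = x - x.mean()                  # residuals from a regression on a constant
    partial_sums = np.cumsum(resid)
    # Long-run variance estimate with Bartlett weights up to `lags`.
    long_run_var = np.sum(resid ** 2) / n
    for k in range(1, lags + 1):
        weight = 1.0 - k / (lags + 1.0)
        long_run_var += 2.0 * weight * np.sum(resid[k:] * resid[:-k]) / n
    return np.sum(partial_sums ** 2) / (n ** 2 * long_run_var)


# PLACEHOLDER series (assumption): a linear trend plus an alternating wiggle.
# Replace `levels` with the real yearly production figures.
years = np.arange(1920, 1940)
levels = 10.0 * (years - years[0]) + np.where(years % 2 == 0, 3.0, -3.0)
diffs = np.diff(levels)                   # first difference: x[t] - x[t-1]

for name, series in [("levels", levels), ("first difference", diffs)]:
    stat = kpss_level_stat(series, lags=2)
    rejected = stat > KPSS_LEVEL_CRIT["5%"]
    verdict = "stationarity rejected" if rejected else "stationarity not rejected"
    print(f"{name}: KPSS = {stat:.3f} ({verdict} at the 5% level)")
```

Note that the KPSS null hypothesis is stationarity, so a small statistic means the test found no evidence against it; that is weaker than a proof that the series is stationary.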
I modeled the announcement that spinach gives Popeye his superpowers as a unit “pulse” input in 1931, setting the signal’s value to zero for all other years, and then tested for Granger causality between the pulse input and the first-differenced spinach production time series:
The Granger test results show that the 1931 pulse Granger-causes the first-differenced production series, but not vice versa. I set the number of lags to two because the differenced data only goes back to 1929.
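For readers unfamiliar with the test, here is the textbook two-lag formulation, written out as a sketch. The symbols are chosen here and are not from the original analysis: \Delta y_t is the first-differenced production, p_t is the pulse (1 in 1931, 0 otherwise), and T is the number of observations left after the first two are used up as lags. The software behind the reported results may use a variant of this statistic.

```latex
% Restricted model: production change explained by its own past only
\[
\Delta y_t = a_0 + a_1 \Delta y_{t-1} + a_2 \Delta y_{t-2} + e_t
\]
% Unrestricted model: lagged pulse terms added
\[
\Delta y_t = a_0 + a_1 \Delta y_{t-1} + a_2 \Delta y_{t-2}
           + b_1 p_{t-1} + b_2 p_{t-2} + u_t
\]
% Null hypothesis "the pulse does not Granger-cause production change"
\[
H_0 : b_1 = b_2 = 0,
\qquad
F = \frac{(\mathrm{RSS}_r - \mathrm{RSS}_u)/2}{\mathrm{RSS}_u/(T - 5)}
\]
```

Under the null hypothesis and the usual regression assumptions, F is compared with an F(2, T - 5) distribution. Because p_t is a single pulse, p_{t-1} and p_{t-2} act as dummy variables for 1932 and 1933, so in this formulation the test asks whether the production changes in those two years are unusual relative to a two-lag autoregression.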
As a control, I repeated the test with the pulse placed in each year of the data set in turn, to see whether the result reported above is spurious. Combining the computed p-values in a table gives:
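The control can be sketched in Python as follows, assuming NumPy and SciPy. The helper names (`lag_matrix`, `rss`, `granger_pvalue`, `pulse_scan`) are hypothetical, the test is the plain F-test form of Granger causality rather than necessarily the routine used for the results above, and the series is a made-up placeholder with a response deliberately planted in 1932.

```python
import numpy as np
from scipy import stats


def lag_matrix(series, lags):
    """Columns hold `series` lagged by 1..lags, aligned with series[lags:]."""
    n = len(series)
    return np.column_stack([series[lags - k:n - k] for k in range(1, lags + 1)])


def rss(design, target):
    """Residual sum of squares of an ordinary least-squares fit."""
    coef, *_ = np.linalg.lstsq(design, target, rcond=None)
    resid = target - design @ coef
    return float(resid @ resid)


def granger_pvalue(cause, effect, lags=2):
    """p-value of the F-test that lagged `cause` adds nothing to an AR(lags) fit of `effect`."""
    cause = np.asarray(cause, dtype=float)
    effect = np.asarray(effect, dtype=float)
    target = effect[lags:]
    restricted = np.column_stack([np.ones(target.size), lag_matrix(effect, lags)])
    unrestricted = np.column_stack([restricted, lag_matrix(cause, lags)])
    df_resid = target.size - unrestricted.shape[1]
    # A pulse too close to either end leaves an all-zero lag column: test undefined.
    if df_resid <= 0 or np.linalg.matrix_rank(unrestricted) < unrestricted.shape[1]:
        return float("nan")
    rss_r, rss_u = rss(restricted, target), rss(unrestricted, target)
    if rss_u == 0.0:                      # perfect fit, F statistic undefined
        return float("nan")
    f_stat = ((rss_r - rss_u) / lags) / (rss_u / df_resid)
    return float(stats.f.sf(f_stat, lags, df_resid))


def pulse_scan(years, diffs, lags=2):
    """Rows of (year, p for pulse -> change, p for change -> pulse)."""
    rows = []
    for year in years:
        pulse = (years == year).astype(float)   # 1 in `year`, 0 elsewhere
        rows.append((int(year),
                     granger_pvalue(pulse, diffs, lags),
                     granger_pvalue(diffs, pulse, lags)))
    return rows


# PLACEHOLDER data (assumption): noise plus a jump planted in 1932.
# Replace `years` and `diffs` with the real first-differenced series.
rng = np.random.default_rng(0)
years = np.arange(1929, 1951)
diffs = rng.normal(0.0, 1.0, years.size)
diffs[years == 1932] += 50.0

for year, p_forward, p_reverse in pulse_scan(years, diffs):
    print(f"{year}  pulse->change p={p_forward:.3g}  change->pulse p={p_reverse:.3g}")
```

Years whose lagged pulse never falls inside the usable sample come back as NaN rather than as a p-value. Scanning every year also means running many tests, so a few small p-values can appear by chance alone.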
The 1931 pulse gives the second-lowest p-value for the “pulse Granger-causes production change” test (middle column in the table). The lowest value (1932) might result from continued cartoon discussion of Popeye’s spinach-based power source or from increased public awareness of Popeye’s secret during that year. Either way, since 1931 and 1932 show the lowest p-values for this test (each an order of magnitude below the third-lowest p-value), we have gained support for the idea that Popeye influenced spinach production during the 1930s.
The analysis presented above strengthens the argument that Popeye’s popularity during the 1930s prompted the record spinach production of that decade. By itself it cannot prove the link (see the caveats below), but it certainly adds to the conversation.
Time-series analysis is not my area of expertise, which is one reason I’m considering attending graduate school in statistics. Any comments or corrections of the method are extremely welcome!
The first-differenced time series is heteroscedastic; I’m not sure whether this matters.
More importantly, the Granger-causing pulse inputs of 1931 and 1932 do not definitively prove the Popeye/spinach production connection. Other events during these years could explain the observed Granger causality (since the pulse input represents an event, but not necessarily a Popeye-related event). For example, if the Great Depression changed eating habits in a way that favored spinach production, the result might be similar.