*While traversing the darker residuals of the blogosphere, Data Scientist happens upon a blogger in distress. Our hero quickly swallows a can of Red Bull-infused spinach and springs to action:*

## The Popeye Challenge

Dr. Mike Sutton of Dysology.org requested assistance demonstrating (or debunking) the proposed causal link between high spinach production and the popularity of Popeye cartoons during the 1930s in the United States.

In 1931, cartoon readers learned that Popeye gained his superpowers from eating spinach. Ever since, the spinach industry has regarded that event, along with the subsequent Popeye movie of 1936, as drivers of the record spinach production of the 1930s. Dr. Sutton wants to check the validity of this belief. In the text that follows I tackle the challenge by testing for *Granger causality* between the Popeye-related events and the spinach production boost of the 1930s.

Dr. Sutton provided yearly data on US spinach production.

The first difference is level-stationary according to the KPSS test, so we use it in place of the original time series:
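As a rough illustration of the differencing step, here is a minimal Python sketch. The function name `first_difference` and the production values are invented for the example; they are not Dr. Sutton's figures. The KPSS check itself is not reproduced here; in Python it could be run with `kpss` from `statsmodels.tsa.stattools`, which is not necessarily the tool used for this post.

```python
def first_difference(series):
    """Return (year, change) pairs: each value minus the previous year's value."""
    return [(y1, v1 - v0) for (_, v0), (y1, v1) in zip(series, series[1:])]


# Hypothetical toy values, NOT the real production data.
production = [(1928, 10.0), (1929, 12.0), (1930, 11.5), (1931, 15.0)]

changes = first_difference(production)
print(changes)
```

Differencing drops the first observation, so a series that starts in 1928 yields changes that start in 1929.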

I modeled the announcement that spinach gives Popeye his superpowers as a unit “pulse” input in 1931, setting the signal’s value for all other years to zero, and then tested the Granger causality between the pulse input signal and the first-differenced spinach production time-series:
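The pulse signal described here can be sketched in a few lines of Python. The helper name `unit_pulse` is hypothetical, and the end year of the range is an assumption made only so the example runs; the post states only that the differenced data starts in 1929.

```python
def unit_pulse(years, event_year):
    """Signal that is 1.0 in event_year and 0.0 in every other year."""
    return [1.0 if year == event_year else 0.0 for year in years]


# Start year taken from the post; the end year is an assumption for illustration.
years = list(range(1929, 1941))

pulse = unit_pulse(years, 1931)
print(list(zip(years, pulse)))
```

The pulse is aligned year by year with the differenced production series, so both inputs to the Granger test have the same length.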

The Granger test results show that the impulse at 1931 Granger-causes the first-differenced production results, but not vice versa. Here I set the number of lags considered to two since the differenced data only goes back to 1929.

As a control, I then tested the Granger causality of an impulse for every year in the data set to see if the result reported above is spurious. Combining the computed p-values in a table:

The 1931 impulse gives the second-lowest p-value for the “pulse Granger-causes production change” test (middle column in the table). The lowest value (1932) might result from continued cartoon discussion of Popeye’s spinach-based power source or from increased public awareness of Popeye’s secret during that year. Either way, since 1931 and 1932 show the lowest p-values for this test (each an order of magnitude below the third-lowest p-value), we have gained support for the idea that Popeye influenced spinach production during the 1930s.
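For readers who want to experiment with the pulse test and the per-year control, the following is a minimal sketch in plain Python. It is not the routine used to produce the results above. It assumes the common F-test formulation of Granger causality: regress the series on its own lags, then again with the lags of the candidate cause added, and compare the residual sums of squares. The names `ols_rss`, `granger_f`, and `pulse_scan` are hypothetical, and the `changes` series is made up.

```python
def ols_rss(X, y):
    """Residual sum of squares from least-squares regression of y on X's columns."""
    k = len(X[0])
    # Augmented normal equations [X'X | X'y].
    A = [[sum(row[i] * row[j] for row in X) for j in range(k)]
         + [sum(row[i] * yi for row, yi in zip(X, y))]
         for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(k):
        piv = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[piv] = A[piv], A[col]
        for r in range(k):
            if r != col:
                factor = A[r][col] / A[col][col]
                A[r] = [a - factor * b for a, b in zip(A[r], A[col])]
    beta = [A[i][k] / A[i][i] for i in range(k)]
    return sum((yi - sum(b * v for b, v in zip(beta, row))) ** 2
               for row, yi in zip(X, y))


def granger_f(cause, effect, lags=2):
    """F statistic for 'cause Granger-causes effect' using the given number of lags."""
    times = range(lags, len(effect))
    y = [effect[t] for t in times]
    # Restricted model: intercept plus the effect's own lags.
    own = [[1.0] + [effect[t - i] for i in range(1, lags + 1)] for t in times]
    # Unrestricted model: additionally the lags of the candidate cause.
    full = [row + [cause[t - i] for i in range(1, lags + 1)]
            for row, t in zip(own, times)]
    rss_r, rss_u = ols_rss(own, y), ols_rss(full, y)
    dof = len(y) - (2 * lags + 1)
    return ((rss_r - rss_u) / lags) / (rss_u / dof)


def pulse_scan(effect, lags=2):
    """F statistic for a unit pulse placed at each testable position."""
    n = len(effect)
    scores = {}
    # Edge positions are skipped: there a lagged pulse column would be
    # all zeros and the regression would be singular.
    for k in range(lags - 1, n - lags):
        pulse = [1.0 if t == k else 0.0 for t in range(n)]
        scores[k] = granger_f(pulse, effect, lags)
    return scores


# Made-up differenced series with one large jump at position 7 (NOT real data).
changes = [0.1, -0.2, 0.15, -0.1, 0.2, -0.15, 0.1,
           5.0, -0.1, 0.2, -0.2, 0.1, -0.1, 0.15]

scores = pulse_scan(changes)
for position, f_stat in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(position, round(f_stat, 2))
```

In this toy series, pulses at positions 5 and 6 both score highly, because the jump at position 7 lies within two lags of each. To turn an F statistic into a p-value, it would be compared against an F distribution with `lags` and `dof` degrees of freedom, for example with `scipy.stats.f.sf`.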

## Conclusion

The analysis presented above strengthens the argument that Popeye’s popularity during the 1930s prompted the record spinach production of that decade. This analysis itself cannot completely prove the link (see the caveats below) but it certainly adds to the conversation.

## Caveats

Time-series analysis is not my area of expertise, which is one reason I’m considering attending graduate school in statistics. Any comments or corrections of the method are extremely welcome!

The first-differenced time series is heteroscedastic; I am not sure whether this matters.

More importantly, the Granger-causing pulse inputs of 1931 and 1932 do not definitively prove the Popeye/spinach production connection. Other events during these years could explain the observed Granger causality (since the pulse input represents an event, but not necessarily a Popeye-related event). For example, if the Great Depression changed eating habits in a way that favored spinach production, the result might be similar.

## Comments

Very many thanks indeed.

So from what you have been able to achieve here I believe we can safely say that the apparent pattern is at least not random. Is that a correct synopsis of your findings?

I wrote a follow-up blog here to highlight your analysis and tentative conclusions: http://www.bestthinking.com/thinkers/science/social_sciences/sociology/mike-sutton?tab=blog

I am an undergraduate student doing my dissertation. I was shown this by Mike Sutton and wondered if you could apply the same kind of test to the relationship between two different trends. I am looking at the potential effect that media devices have on crime rates and have found that, as crime falls, internet use (as well as use of other devices) rises. I was wondering if you could apply this kind of test to see whether the relationship is significant. If not, it will obviously serve to refute the hypothesis. But if the relationship between the two trends is significant (which it would appear to be visually), it will be a more effective test to show that the hypothesis is plausible.

For a brief introduction to what I’m looking at, Mike Sutton has posted a basic blog about it on his Dysology website: http://dysology.org/page5.html

Would you be interested in helping me to find this out? If you are, please get in touch with me via the email address provided with details of what you’d need to do a significance check and I’ll get them to you.