Emily Williams and Stacie Dutton, SETEC Astronomy, San Francisco, California, USA
Despite abundant scientific evidence refuting the connection, the “lunar effect” persists as a common explanation for temporal variation in human behavior. Adherents of this idea implicate the lunar cycle in outcomes as diverse as lost elections and hemophilic episodes. We find the myth woven deeply into our cultural fabric, from obvious (and harmless) werewolf lore to the more subtle (and consequential) fear of menstruation.
Here we investigate the belief that crime in the United States increases during full moon periods. We report findings from both a frequency spectrum analysis and a regression analysis of over a decade of high-resolution Federal Bureau of Investigation (FBI) crime incident records, which we linked by incident date to the lunar cycle. Our findings agree with scientific consensus: no discernible correlation between lunar cycle and crime rate appears in our source data.
Competing Interests Statement
Stacie hunts vampires. Emily is a smooth criminal.
Data Source and Preparation
The FBI’s National Incident-Based Reporting System (NIBRS) tracks crime on a per-incident basis and makes the data easily available to researchers. Each of the system’s recorded crime incidents is unique, anonymised, geo-located, and time-stamped, making NIBRS an excellent data source from which to query the relationship between lunar cycle and criminal activity. Local law enforcement agencies report into NIBRS through a standardized protocol. While not all American law enforcement agencies participate in the program, enough do to create a reliable description of American crime.
We downloaded NIBRS data for 1996 through 2008, and then aggregated all the recorded incidents by incident date to produce a time-series of nationwide daily incident counts. Using the reporting agency information provided for each incident, we then extracted a second time-series containing incident counts for South Carolina.
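The aggregation step can be sketched as follows. This is a minimal illustration, not the code used for the study: the record layout (a timestamp paired with a state code) and the sample rows are hypothetical, and real NIBRS extracts have a different structure.

```python
from collections import Counter
from datetime import datetime

# Hypothetical incident records: (ISO timestamp, state code). Not real NIBRS rows.
incidents = [
    ("1996-01-01T00:00", "SC"),
    ("1996-01-01T14:00", "SC"),
    ("1996-01-01T23:00", "VA"),
    ("1996-01-02T00:00", "SC"),
    ("1996-01-02T09:00", "ID"),
]

def daily_counts(records, state=None):
    """Count incidents per calendar date, optionally for a single state."""
    counts = Counter()
    for timestamp, record_state in records:
        if state is not None and record_state != state:
            continue
        # Discard the hour and keep only the calendar date.
        counts[datetime.fromisoformat(timestamp).date()] += 1
    return dict(sorted(counts.items()))

nationwide = daily_counts(incidents)
south_carolina = daily_counts(incidents, state="SC")
```

Dates with no recorded incidents do not appear in the result, so a real time-series would need those days filled in with zeros before any spectral analysis.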
The Midnight Coding Artifact
NIBRS timestamps its incidents to the hour. However, many law enforcement agencies, when unsure of the time a crime occurred, will enter midnight as the crime’s hour instead of the “NA” code mandated by the data entry protocol. The result is a spike in recorded incidents every midnight. Although we would have preferred to work with hourly rather than daily time-series, this artifact distorted the hourly incident counts too much for our comfort, so we chose to use daily incident counts.
Frequency Spectrum Analysis
After preparing the nationwide daily incident count time-series from the NIBRS data, we removed its trend (resulting from new agencies joining the program over the study period) and then computed the fast Fourier transform:
To ground-truth the resulting frequency-domain representation, we verified that the expected annual and weekly frequencies show strong peaks in the amplitude spectrum:
The amplitude spectrum shows a peak at 30.44 days (the average civil month length), but not at 29.53 days (the lunar month):
Given the absence of a 29.53-day peak in the spectrum, we concluded that no discernible lunar cycle resides in the time series, making correlation between full moon appearance and criminal activity unlikely.
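The detrend-and-transform procedure described above can be sketched with NumPy. The series here is synthetic (a linear trend plus weekly and annual cycles), and the straight-line detrending is an assumption made for illustration; the text does not say which detrending method was used.

```python
import numpy as np

# Synthetic stand-in for the daily incident counts (not NIBRS data):
# a linear trend plus weekly and annual cycles.
n_days = 4749  # 1996-01-01 through 2008-12-31
t = np.arange(n_days)
counts = (
    1000.0
    + 0.2 * t                                  # growth as agencies join
    + 50.0 * np.sin(2 * np.pi * t / 7.0)       # weekly cycle
    + 80.0 * np.sin(2 * np.pi * t / 365.25)    # annual cycle
)

# Remove a fitted straight-line trend (an assumed detrending method).
slope, intercept = np.polyfit(t, counts, 1)
detrended = counts - (slope * t + intercept)

# One-sided amplitude spectrum; frequencies are in cycles per day.
amplitude = np.abs(np.fft.rfft(detrended)) * 2.0 / n_days
freqs = np.fft.rfftfreq(n_days, d=1.0)

def amplitude_near(period_days):
    """Amplitude in the frequency bin closest to the given period."""
    return amplitude[np.argmin(np.abs(freqs - 1.0 / period_days))]

weekly = amplitude_near(7.0)
annual = amplitude_near(365.25)
lunar = amplitude_near(29.53)
```

Because the synthetic series contains no 29.53-day component, `lunar` comes out far smaller than `weekly` or `annual`, which mirrors the ground-truthing check of looking for known cycles before searching for the lunar one.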
Two concerns motivated further exploration, however. First, we considered it plausible that the civil month frequency component masked the lunar cycle component, because civil months vary between 28 and 31 days in length. Second, we suspected that the once-per-day sampling rate of the incident count data limits our ability to resolve non-integer cycle periods. We therefore plan to repeat the frequency spectrum analysis with hourly incident counts once NIBRS becomes free of the midnight coding artifact for at least a decade. Until then, statistical modeling (below) provides a way to separately examine the correlations of civil month and lunar cycle to criminal activity in a manner robust to the resolution limits imposed by the daily sampling rate.
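As a rough gauge of how close the two candidate peaks sit in the spectrum, the arithmetic below compares their separation with the spacing between adjacent discrete Fourier transform bins for a daily record spanning 1996 through 2008. It is a back-of-the-envelope check only: it ignores spectral leakage and the month-length variation described above.

```python
from datetime import date

# Length of the study period in days (1996 through 2008, inclusive).
n_days = (date(2009, 1, 1) - date(1996, 1, 1)).days

lunar_period = 29.53   # days, from the text
civil_period = 30.44   # days, from the text

# Spacing between adjacent DFT frequency bins, in cycles per day.
bin_spacing = 1.0 / n_days

# Distance between the two candidate peaks, in cycles per day and in bins.
separation = abs(1.0 / lunar_period - 1.0 / civil_period)
separation_in_bins = separation / bin_spacing
```

With 4749 daily samples the two periods fall a little under five bins apart, so whether they can be told apart in practice depends on how much each peak is smeared across neighbouring bins.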
Here we construct a regression model from data that explicitly states the presence or absence of the full moon for each daily incident count, along with other plausible crime-related correlates (e.g., day of the week). While the frequency spectrum analysis presented above considered all incidents recorded in NIBRS, this analysis focuses particularly on South Carolina’s incidents.
We selected South Carolina for several reasons. First, the state piloted NIBRS in the early 1990s and therefore holds the cleanest data (due presumably to agency reporting experience). Second, limiting the regression dataset to only one state reduced this study’s demand on our computational resources. (The time-series analysis reported above required far less computing power to execute.) Finally, South Carolina’s relatively small size allowed us to assume that sunrise, sunset, moonrise, and moonset occur at the same time for each of the state’s counties without the risk of introducing too much noise into the model. (Compare this to the latitude variation across California, where such an assumption would prove far less appropriate.) Making this assumption greatly simplified ephemeris calculations.
We started by creating a stacked dataset containing variables that we suspected might explain daily crime incident counts, partially displayed in the image below. Each row specifies one day’s conditions.
Describing these variables in more detail:
We modeled “incident count” as a function of the other variables (with the exception of the row index “date”, which we excluded).
To capture trend information we included one-day and seven-day incident count lags as variables “incident count (lag 1)” and “incident count (lag 7)”, respectively.
“Daylight duration” stores the number of daylight hours for each day considered, thereby providing the primary seasonal indicator in the dataset. Dummy variables “Christmas Eve or Day”, “Cinco de Mayo”, “Halloween”, “New Year’s Eve or Day”, and “St. Patrick’s Day” account for celebration-related seasonality.
Dummy variables also record higher-frequency seasonal information such as which day of the week a row of data comes from, and whether a particular day is the first or last day of its month. In our exploratory analysis, we noticed that the starts and ends of months correlated with single-day jumps and dips in incident counts. We suspect this results from enforcement quota artifacts, reporting artifacts, or payday-related crime.
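The lag and calendar variables described above could be built along these lines. This is a sketch under stated assumptions: the column names, the helper functions `calendar_features` and `add_lags`, and the sample counts are all hypothetical, and the holiday and daylight variables are omitted for brevity.

```python
import calendar
from datetime import date, timedelta

WEEKDAYS = ("mon", "tue", "wed", "thu", "fri", "sat", "sun")

def calendar_features(day):
    """Dummy variables for weekday and for the first/last day of the month."""
    last_day = calendar.monthrange(day.year, day.month)[1]
    features = {f"is_{name}": int(day.weekday() == i)
                for i, name in enumerate(WEEKDAYS)}
    features["first_of_month"] = int(day.day == 1)
    features["last_of_month"] = int(day.day == last_day)
    return features

def add_lags(counts, lags=(1, 7)):
    """Pair each day's count with its lagged counts; skips the first max(lags) days."""
    rows = []
    for i in range(max(lags), len(counts)):
        row = {"incident_count": counts[i]}
        for lag in lags:
            row[f"incident_count_lag_{lag}"] = counts[i - lag]
        rows.append(row)
    return rows

# Hypothetical daily counts starting 2000-02-21 (illustrative numbers only).
start = date(2000, 2, 21)
counts = [310, 295, 300, 320, 410, 450, 330, 305, 290, 315]

rows = add_lags(counts)
for offset, row in enumerate(rows, start=7):
    day = start + timedelta(days=offset)
    row["date"] = day
    row.update(calendar_features(day))
```

The first seven days yield no rows because their seven-day lag is undefined. When all seven weekday dummies are used alongside an intercept, one of them is normally dropped to avoid perfect collinearity.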
In the time-series plot of South Carolina’s daily incident counts (below), we observed a clear change in the shape of the curve during years 2001 and 2002. We therefore added dummy variables “X2001” and “X2002” to account for this variation, which we suspect represents a reporting method artifact.
Finally, astronomical computations provide the daily lunar state central to this investigation. The variables “hours the moon is in the sky” and “hours the moon is in the night sky” indicate how much of the day (24-hour period) or night (period before sunrise and after sunset) the moon’s elevation from the point-of-view of a South Carolina observer exceeded zero degrees. The dummy variable “full moon” indicates whether the visible lunar surface at midnight for a given day exceeds 97% of the whole lunar surface. We also incorporated possible interaction effects into the model by adding variables for “full moon multiplied by hours the moon is in the sky” and “full moon multiplied by hours the moon is in the night sky”.
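A “full moon” dummy of this kind can be approximated without an ephemeris library, as sketched below. The sketch reads the 97% criterion as the illuminated fraction of the lunar disc and estimates that fraction from the mean synodic month and a reference new moon (6 January 2000, 18:14 UTC). It evaluates at midnight UTC rather than South Carolina local midnight, and the mean-cycle approximation can be off by several hours, so it is not a substitute for proper astronomical computation. The function names are hypothetical.

```python
import math
from datetime import datetime, timezone

SYNODIC_MONTH_DAYS = 29.530588853  # mean length of the lunar cycle
REFERENCE_NEW_MOON = datetime(2000, 1, 6, 18, 14, tzinfo=timezone.utc)

def illuminated_fraction(moment):
    """Approximate fraction of the lunar disc that is lit (0 = new, 1 = full)."""
    days = (moment - REFERENCE_NEW_MOON).total_seconds() / 86400.0
    phase = (days / SYNODIC_MONTH_DAYS) % 1.0  # 0 at new moon, 0.5 at full
    return (1.0 - math.cos(2.0 * math.pi * phase)) / 2.0

def full_moon_dummy(moment, threshold=0.97):
    """1 if the approximate illuminated fraction exceeds the threshold, else 0."""
    return int(illuminated_fraction(moment) > threshold)

near_full = full_moon_dummy(datetime(2000, 1, 21, tzinfo=timezone.utc))
near_new = full_moon_dummy(datetime(2000, 1, 7, tzinfo=timezone.utc))
```

Under this approximation the dummy is set for a little over three days in each lunar cycle.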
Linear regression on the stacked dataset containing these variables produced the following model. Gamma and Poisson regressions produced qualitatively similar estimates of each variable’s explanatory contribution:
Given the high p-values computed for each lunar variable, this result increased our suspicion that the lunar cycle fails to correlate with daily crime incident counts. However, to rule out the possibility that multicollinearity obscured the actual contribution of one or more of these astronomical variables, we then conducted stepwise model selection to see if the search algorithm would retain any of the lunar variables. It did not:
The model containing the lunar variables and the model without them explain nearly identical shares of the variation in daily incident counts (as demonstrated by their respective R-squared statistics). We therefore conclude that our data offers no evidence that the number of daily crime incidents correlates with lunar phase.
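The comparison of R-squared statistics between nested models can be illustrated on synthetic data. The sketch below is not stepwise selection and does not reproduce the study’s model: it fits ordinary least squares twice, with and without a crude stand-in lunar dummy, on simulated counts that depend only on the weekend. All variable names and numbers are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
n_days = 1000
t = np.arange(n_days)

# Synthetic data (not NIBRS): counts depend on weekends, not on the moon.
weekend = (t % 7 >= 5).astype(float)
full_moon = ((t % 29.53) < 3.0).astype(float)  # crude stand-in dummy
counts = 100.0 + 20.0 * weekend + rng.normal(0.0, 5.0, n_days)

def r_squared(predictors, response):
    """Fit ordinary least squares with an intercept and return R-squared."""
    design = np.column_stack([np.ones(len(response))] + predictors)
    coef, *_ = np.linalg.lstsq(design, response, rcond=None)
    residuals = response - design @ coef
    total = response - response.mean()
    return 1.0 - (residuals @ residuals) / (total @ total)

r2_with_moon = r_squared([weekend, full_moon], counts)
r2_without_moon = r_squared([weekend], counts)
```

Adding a predictor can never lower R-squared in a nested least-squares fit, so the informative quantity is how small the gain is when the lunar dummy is included.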
The frequency spectrum and regression analyses presented above provide no evidence that crime increases during the full moon. We will therefore stick with scientific consensus on this one.
We are physical scientists, not criminologists, and therefore welcome feedback on the methods presented.