python datetime time-series average python-datetime

Average day of the year across December-January

Imagine a time series that peaks cyclically around end-December/early-January. The maxima of the series will then have dates like those showed in dt1 or dt2 below. I need to compute the average day of the year (DOY) of those maxima.

The problem is that a normal average would give very different results for dt1 (211) and dt2 (356). The cause is obviously that some elements of dt1 are in January, so the corresponding DOYs are very small and bring the resulting average down.

I originally worked this around using another month as the origin to calculate the DOYs, but this created problems with other time series peaked around the new origin.

Is there a general, all-year-round solution to this problem?

dt1 = [datetime(2000, 12, 15), datetime(2001, 12, 16), datetime(2002,12,20), datetime(2004,1,2) , datetime(2005,1,1)]
dt2 = [datetime(2000, 12, 15), datetime(2001, 12, 16), datetime(2002,12,20), datetime(2003,12,31), datetime(2004,12,30)]
doys1 = np.array([dt.timetuple().tm_yday for dt in dt1])
doys2 = np.array([dt.timetuple().tm_yday for dt in dt2])
print doys1.mean()
print doys2.mean()

Thanks!

Solution

After a bit of googling, I've found that what you're looking for is a way to calculate a mean of circular quantities. Some more googling revealed that this is implemented in the scipy library. I've found it thanks to this answer, however I've failed in my attempts to locate some proper documentation on the function itself and reverted to inspecting the source code in order to find out how it should be invoked.

>>> import numpy as np
>>> from scipy import stats
>>> from datetime import datetime
>>> 
>>> dt1 = [datetime(2000, 12, 15), datetime(2001, 12, 16), datetime(2002,12,20), datetime(2004,1,2) , datetime(2005,1,1)]
>>> dt2 = [datetime(2000, 12, 15), datetime(2001, 12, 16), datetime(2002,12,20), datetime(2003,12,31), datetime(2004,12,30)]
>>> doys1 = np.array([dt.timetuple().tm_yday for dt in dt1])
>>> doys2 = np.array([dt.timetuple().tm_yday for dt in dt2])
>>>
>>> stats.circmean(doys1, high=365)
357.39332727199502
>>> stats.circmean(doys2, high=365)
356.79551148217894

EDIT:

Documentation for the scipy.stats.circmean function is available here.