I want to use pandas to calculate seasonal (three-month) averages from a dataset that lists the year, month, and value of interest in three separate columns. Normally I would just use .loc with isin() and groupby to do this, but one of the seasons I want to analyze is December-January-February, which spans two calendar years (e.g. Dec. 2000, Jan. 2001, Feb. 2001). Does anyone have suggestions on how to handle this?
2000 1 5
2000 2 6
2000 3 8
2000 4 10
2000 5 9
2000 6 11
2000 7 13
2000 8 6
2000 9 8
2000 10 7
2000 11 7
2000 12 4
2001 1 3
2001 2 5
(In this example, Jan. and Feb. 2000 would be ignored, and the averages would be MAM: 9, JJA: 10, SON: 7.33, DJF: 4.)
You can define custom quarters (seasons) by mapping each month to a label, then use groupby:
import pandas as pd

# Test data
df = pd.DataFrame({'month': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 1, 2],
                   'year': [2000, 2000, 2000, 2000, 2000, 2000, 2000, 2000, 2000, 2000, 2000, 2000, 2001, 2001],
                   'value': [5.0, 6.0, 8.0, 10.0, 9.0, 11.0, 13.0, 6.0, 8.0, 7.0, 7.0, 4.0, 3.0, 5.0]})
# Custom quarters definition: month number -> season label
quarters = {1: 'DJF', 2: 'DJF', 3: 'MAM', 4: 'MAM', 5: 'MAM', 6: 'JJA',
            7: 'JJA', 8: 'JJA', 9: 'SON', 10: 'SON', 11: 'SON', 12: 'DJF'}
# The dict is applied to the index, so make month the index
df = df.set_index(['month'])
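# Group by calendar year and quarter.
# Note: December is grouped with January and February of the same calendar year,
# so 2000 DJF is the mean of Jan., Feb. and Dec. 2000, and 2001 DJF covers only Jan. and Feb. 2001.
df.groupby(['year', quarters])['value'].mean()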
year
2000  DJF     5.000000
      JJA    10.000000
      MAM     9.000000
      SON     7.333333
2001  DJF     4.000000
# Or group by quarter only, pooling all years together
df.groupby(quarters)['value'].mean()
DJF 4.600000
JJA 10.000000
MAM 9.000000
SON 7.333333
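If each DJF should combine December with the January and February that follow it, one possible variation is to shift December into the next year before grouping. The sketch below is not part of the answer above: the column names season and season_year and the count == 3 filter for dropping incomplete seasons are assumptions made for illustration.

```python
import pandas as pd

# Same test data as in the question
df = pd.DataFrame({
    'year': [2000] * 12 + [2001] * 2,
    'month': list(range(1, 13)) + [1, 2],
    'value': [5.0, 6.0, 8.0, 10.0, 9.0, 11.0, 13.0, 6.0, 8.0, 7.0, 7.0, 4.0, 3.0, 5.0],
})

# Month number -> season label
season_of_month = {12: 'DJF', 1: 'DJF', 2: 'DJF',
                   3: 'MAM', 4: 'MAM', 5: 'MAM',
                   6: 'JJA', 7: 'JJA', 8: 'JJA',
                   9: 'SON', 10: 'SON', 11: 'SON'}

# Hypothetical helper columns (names chosen for this example)
df['season'] = df['month'].map(season_of_month)
# December counts toward the following year's winter,
# so Dec. 2000 lands in the same group as Jan. and Feb. 2001
df['season_year'] = df['year'] + (df['month'] == 12).astype(int)

# Mean and number of months per (season_year, season) group
stats = df.groupby(['season_year', 'season'])['value'].agg(['mean', 'count'])

# Assumption: keep only seasons with all three months present,
# which drops the incomplete Jan.-Feb. 2000 winter
seasonal_means = stats.loc[stats['count'] == 3, 'mean']
print(seasonal_means)
```

With the sample data this should leave MAM, JJA and SON for 2000 plus a DJF labelled 2001 with a mean of 4, which is what the question asks for. Labelling a winter by the year of its January is only a convention; subtracting 1 for January and February instead would label it by the year of its December.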