[SOLVED] Using Pandas to calculate December-January-February average

I want to use pandas to calculate seasonal (three-month) averages from a dataset that lists the year, month, and value of interest in three separate columns. Normally I would just use DataFrame.loc with isin() and groupby to do this, but one of the seasons I want to analyze is December-January-February, which spans two calendar years of data (e.g. Dec. 2000, Jan. 2001, Feb. 2001). Does anyone have suggestions for how to handle this?

year  month  value
2000  1      5
2000  2      6
2000  3      8
2000  4      10
2000  5      9
2000  6      11
2000  7      13
2000  8      6
2000  9      8
2000  10     7
2000  11     7
2000  12     4
2001  1      3
2001  2      5

(In this case, Jan. and Feb. 2000 would be ignored, and the averages would be MAM: 9, JJA: 10, SON: 7.33, DJF: 4.)

Solution

You can define custom quarters (seasons) by mapping each month number to a season label, and then use groupby:

import pandas as pd

# Test data
df = pd.DataFrame({'month': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 1, 2],
                   'year': [2000, 2000, 2000, 2000, 2000, 2000, 2000, 2000, 2000, 2000, 2000, 2000, 2001, 2001],
                   'value': [5.0, 6.0, 8.0, 10.0, 9.0, 11.0, 13.0, 6.0, 8.0, 7, 7, 4, 3.0, 5.0]})

# Custom quarters definition: month number -> season label
quarters = {12: 'DJF', 1: 'DJF', 2: 'DJF',
            3: 'MAM', 4: 'MAM', 5: 'MAM',
            6: 'JJA', 7: 'JJA', 8: 'JJA',
            9: 'SON', 10: 'SON', 11: 'SON'}

# Index by month so that groupby can look up each row's month in the dict
df = df.set_index(['month'])

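# Group by year and quarter. Note that this uses the calendar year, so
# Dec. 2000 is averaged with Jan. and Feb. 2000, not with Jan. and Feb. 2001.
df.groupby(['year', quarters])['value'].mean()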

year     
2000  DJF     5.000000
      JJA    10.000000
      MAM     9.000000
      SON     7.333333
2001  DJF     4.000000

# Or group by quarter only, pooling all years together
df.groupby(quarters)['value'].mean()

DJF     4.600000
JJA    10.000000
MAM     9.000000
SON     7.333333
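
To keep each December with the January and February that follow it, as the question asks, one option is to shift December into the next year before grouping. The following is only a sketch under a few assumptions: the names data, season_of_month, season_year and complete are made up for illustration, and a season is treated as complete only when all three of its months are present.

```python
import pandas as pd

# Same test data as above, kept in plain columns (no index change)
data = pd.DataFrame({
    'year':  [2000] * 12 + [2001] * 2,
    'month': list(range(1, 13)) + [1, 2],
    'value': [5.0, 6.0, 8.0, 10.0, 9.0, 11.0, 13.0, 6.0, 8.0, 7.0, 7.0, 4.0, 3.0, 5.0],
})

# Month number -> season label
season_of_month = {12: 'DJF', 1: 'DJF', 2: 'DJF',
                   3: 'MAM', 4: 'MAM', 5: 'MAM',
                   6: 'JJA', 7: 'JJA', 8: 'JJA',
                   9: 'SON', 10: 'SON', 11: 'SON'}
data['season'] = data['month'].map(season_of_month)

# Assumption: a winter is labelled by the year of its January, so December
# is counted toward the following year (Dec. 2000 -> season_year 2001).
data['season_year'] = data['year'] + (data['month'] == 12).astype(int)

# Mean and number of months available for every (season_year, season) pair
seasonal = data.groupby(['season_year', 'season'])['value'].agg(['mean', 'count'])

# Keep only seasons with all three months; this drops Jan.-Feb. 2000,
# which has no preceding December in the data.
complete = seasonal.loc[seasonal['count'] == 3, 'mean']
print(complete)
```

With the test data this leaves MAM, JJA and SON for 2000 plus a DJF entry labelled 2001 that averages Dec. 2000, Jan. 2001 and Feb. 2001. If partial seasons should be kept, skip the count filter and use seasonal['mean'] directly.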