python  pandas  time-series

Using Pandas to calculate December-January-February average


I want to use pandas to calculate three-month (seasonal) averages from a dataset that lists the year, month and value of interest in three separate columns. Normally I would just use DataFrame.loc with isin() and groupby() to do this, but one of the seasons I want to analyze is December-January-February, which spans two calendar years (i.e. Dec. 2000, Jan. 2001, Feb. 2001). Does anyone have suggestions on how to deal with this?

year  month  value
2000  1      5
2000  2      6
2000  3      8
2000  4      10
2000  5      9
2000  6      11
2000  7      13
2000  8      6
2000  9      8
2000  10     7
2000  11     7
2000  12     4
2001  1      3
2001  2      5

(i.e. in this situation, Jan. and Feb. 2000 would be ignored, and the averages would be MAM: 9, JJA: 10, SON: 7.33, and DJF: 4, taken over Dec. 2000, Jan. 2001 and Feb. 2001)


Solution

  • You can define custom quarters (a mapping from month number to season label) and pass it to groupby

    import pandas as pd

    # Test data
    df = pd.DataFrame({'month': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 1, 2],
                       'year': [2000, 2000, 2000, 2000, 2000, 2000, 2000, 2000, 2000, 2000, 2000, 2000, 2001, 2001],
                       'value': [5.0, 6.0, 8.0, 10.0, 9.0, 11.0, 13.0, 6.0, 8.0, 7, 7, 4, 3.0, 5.0]})
    
    # Custom quarters definition: month number -> season label
    quarters = {12: 'DJF', 1: 'DJF', 2: 'DJF',
                3: 'MAM', 4: 'MAM', 5: 'MAM',
                6: 'JJA', 7: 'JJA', 8: 'JJA',
                9: 'SON', 10: 'SON', 11: 'SON'}
    
    df = df.set_index(['month'])
    
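    # Can be grouped by calendar year and quarter (the dict is applied to the 'month' index).
    # Note: December stays with its own calendar year here, so "2000 DJF" averages
    # Jan, Feb and Dec 2000, and "2001 DJF" averages only Jan and Feb 2001.
    df.groupby(['year',quarters])['value'].mean()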
    
    year     
    2000  DJF     5.000000
          JJA    10.000000
          MAM     9.000000
          SON     7.333333
    2001  DJF     4.000000
    
    # Or group only by quarter, pooling all years together, depending on your needs
    df.groupby(quarters)['value'].mean()
    
    DJF     4.600000
    JJA    10.000000
    MAM     9.000000
    SON     7.333333
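
If December should be counted with the following January and February, as the question asks, one possible variation is to shift December into the next year before grouping. The sketch below is only an illustration built on the same test data: the column names `season` and `season_year` and the rule "keep only seasons with three months" are assumptions added here, not part of the answer above.

```python
import pandas as pd

# Same test data as in the answer above
df = pd.DataFrame({
    'year':  [2000] * 12 + [2001] * 2,
    'month': list(range(1, 13)) + [1, 2],
    'value': [5.0, 6.0, 8.0, 10.0, 9.0, 11.0, 13.0,
              6.0, 8.0, 7.0, 7.0, 4.0, 3.0, 5.0],
})

# Month number -> season label (same mapping as `quarters` above)
season_of_month = {12: 'DJF', 1: 'DJF', 2: 'DJF',
                   3: 'MAM', 4: 'MAM', 5: 'MAM',
                   6: 'JJA', 7: 'JJA', 8: 'JJA',
                   9: 'SON', 10: 'SON', 11: 'SON'}

df['season'] = df['month'].map(season_of_month)

# Hypothetical helper column: December is assigned to the next year,
# so Dec 2000, Jan 2001 and Feb 2001 share season_year 2001.
df['season_year'] = df['year'] + (df['month'] == 12).astype(int)

# Mean and number of months available for each (season_year, season) pair
seasonal = df.groupby(['season_year', 'season'])['value'].agg(['mean', 'count'])

# Assumption: drop incomplete seasons (e.g. Jan/Feb 2000 with no preceding December)
complete = seasonal.loc[seasonal['count'] == 3, 'mean']
print(complete)
```

With this data, the incomplete winter made up of Jan. and Feb. 2000 is dropped, and the remaining DJF group averages Dec. 2000, Jan. 2001 and Feb. 2001.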