pythonpandas

pandas pivot_table: can aggfunc work over a different grouping period from the table?


I have a pandas pivot table that shows payments made to different payees vs date, and I'm using a Grouper to group them into months, e.g.:

payee         payee_1  payee_2
date
2019-11-30     amount   amount
2019-12-31     amount   amount
2020-01-31     amount   amount
2020-02-29     amount   amount

I can use the standard aggfunc="sum" to show the "amounts" as the sum for each month, but what I want is to show the sum for 12 months up to each month. Is a custom aggregation function a way to do that? If so, how do I get the values to supply to the function?

This is a snippet of raw the data frame:

           date        payee   value
0    2020-10-13  payee_1      -43.74
1    2025-01-26  payee_1       -2.03
2    2022-04-08  payee_2       -9.54
3    2021-06-05  payee_1      -16.06
4    2021-12-08  payee_2         -20

And my pivot table definition:

pd.pivot_table(
    df,
    values="value",
    index=pd.Grouper(
        freq="ME",
        key="date"
    ),
    columns="payee",
    aggfunc="sum",
)

Solution

  • You can't reuse the same input data points for different output rows in a pivot_table. What you need is a rolling operation. Since you have a sum, you can perform it after the pivot_table. As months are not valid (non-fixed) frequencies for rolling operations, you would first need to reindex to fill the missing month-ends, then rolling.sum:

    idx = pd.date_range(*df['date'].agg(['min', 'max']), freq='ME')
    # DatetimeIndex(['2020-10-31', '2020-11-30', '2020-12-31', ...,
    #                '2024-10-31', '2024-11-30', '2024-12-31'],
    #              dtype='datetime64[ns]', freq='ME')
    
    out = (
        pd.pivot_table(
            df,
            values='value',
            index=pd.Grouper(freq='ME', key='date'),
            columns='payee',
            aggfunc='sum',
        )
        .reindex(idx.rename('date'))
        .rolling(12, min_periods=1).sum()
    )
    

    Output:

    payee       payee_1  payee_2
    date                        
    2020-10-31   -43.74      NaN
    2020-11-30   -43.74      NaN
    2020-12-31   -43.74      NaN
    2021-01-31   -43.74      NaN
    2021-02-28   -43.74      NaN
    2021-03-31   -43.74      NaN
    2021-04-30   -43.74      NaN
    2021-05-31   -43.74      NaN
    2021-06-30   -59.80      NaN
    2021-07-31   -59.80      NaN
    2021-08-31   -59.80      NaN
    2021-09-30   -59.80      NaN
    2021-10-31   -16.06      NaN
    2021-11-30   -16.06      NaN
    2021-12-31   -16.06   -20.00
    2022-01-31   -16.06   -20.00
    2022-02-28   -16.06   -20.00
    2022-03-31   -16.06   -20.00
    2022-04-30   -16.06   -29.54
    2022-05-31   -16.06   -29.54
    2022-06-30      NaN   -29.54
    2022-07-31      NaN   -29.54
    2022-08-31      NaN   -29.54
    2022-09-30      NaN   -29.54
    2022-10-31      NaN   -29.54
    2022-11-30      NaN   -29.54
    2022-12-31      NaN    -9.54
    2023-01-31      NaN    -9.54
    2023-02-28      NaN    -9.54
    2023-03-31      NaN    -9.54
    2023-04-30      NaN      NaN
    2023-05-31      NaN      NaN
    2023-06-30      NaN      NaN
    2023-07-31      NaN      NaN
    2023-08-31      NaN      NaN
    2023-09-30      NaN      NaN
    2023-10-31      NaN      NaN
    2023-11-30      NaN      NaN
    2023-12-31      NaN      NaN
    2024-01-31      NaN      NaN
    2024-02-29      NaN      NaN
    2024-03-31      NaN      NaN
    2024-04-30      NaN      NaN
    2024-05-31      NaN      NaN
    2024-06-30      NaN      NaN
    2024-07-31      NaN      NaN
    2024-08-31      NaN      NaN
    2024-09-30      NaN      NaN
    2024-10-31      NaN      NaN
    2024-11-30      NaN      NaN
    2024-12-31      NaN      NaN
    

    You could also perform a groupby.resample + unstack + rolling.sum:

    out = (df.set_index('date').groupby('payee').resample('ME')['value'].sum()
             .unstack('payee')
             .rolling(12, min_periods=1).sum()
          )
    

    Output:

    payee       payee_1  payee_2
    date                        
    2020-10-31   -43.74      NaN
    2020-11-30   -43.74      NaN
    2020-12-31   -43.74      NaN
    2021-01-31   -43.74      NaN
    2021-02-28   -43.74      NaN
    2021-03-31   -43.74      NaN
    2021-04-30   -43.74      NaN
    2021-05-31   -43.74      NaN
    2021-06-30   -59.80      NaN
    2021-07-31   -59.80      NaN
    2021-08-31   -59.80      NaN
    2021-09-30   -59.80      NaN
    2021-10-31   -16.06      NaN
    2021-11-30   -16.06      NaN
    2021-12-31   -16.06   -20.00
    2022-01-31   -16.06   -20.00
    2022-02-28   -16.06   -20.00
    2022-03-31   -16.06   -20.00
    2022-04-30   -16.06   -29.54
    2022-05-31   -16.06   -29.54
    2022-06-30     0.00   -29.54
    2022-07-31     0.00   -29.54
    2022-08-31     0.00   -29.54
    2022-09-30     0.00   -29.54
    2022-10-31     0.00   -29.54
    2022-11-30     0.00   -29.54
    2022-12-31     0.00    -9.54
    2023-01-31     0.00    -9.54
    2023-02-28     0.00    -9.54
    2023-03-31     0.00    -9.54
    2023-04-30     0.00      NaN
    2023-05-31     0.00      NaN
    2023-06-30     0.00      NaN
    2023-07-31     0.00      NaN
    2023-08-31     0.00      NaN
    2023-09-30     0.00      NaN
    2023-10-31     0.00      NaN
    2023-11-30     0.00      NaN
    2023-12-31     0.00      NaN
    2024-01-31     0.00      NaN
    2024-02-29     0.00      NaN
    2024-03-31     0.00      NaN
    2024-04-30     0.00      NaN
    2024-05-31     0.00      NaN
    2024-06-30     0.00      NaN
    2024-07-31     0.00      NaN
    2024-08-31     0.00      NaN
    2024-09-30     0.00      NaN
    2024-10-31     0.00      NaN
    2024-11-30     0.00      NaN
    2024-12-31     0.00      NaN
    2025-01-31    -2.03      NaN