I have a pandas pivot table that shows payments made to different payees vs date, and I'm using a Grouper to group them into months, e.g.:
payee payee_1 payee_2
date
2019-11-30 amount amount
2019-12-31 amount amount
2020-01-31 amount amount
2020-02-29 amount amount
I can use the standard aggfunc="sum" to show the "amounts" as the sum for each month, but what I want is to show the sum for 12 months up to each month. Is a custom aggregation function a way to do that? If so, how do I get the values to supply to the function?
This is a snippet of raw the data frame:
date payee value
0 2020-10-13 payee_1 -43.74
1 2025-01-26 payee_1 -2.03
2 2022-04-08 payee_2 -9.54
3 2021-06-05 payee_1 -16.06
4 2021-12-08 payee_2 -20
And my pivot table definition:
pd.pivot_table(
df,
values="value",
index=pd.Grouper(
freq="ME",
key="date"
),
columns="payee",
aggfunc="sum",
)
You can't reuse the same input data points for different output rows in a pivot_table
. What you need is a rolling operation. Since you have a sum, you can perform it after the pivot_table
. As months are not valid (non-fixed) frequencies for rolling operations, you would first need to reindex
to fill the missing month-ends, then rolling.sum
:
idx = pd.date_range(*df['date'].agg(['min', 'max']), freq='ME')
# DatetimeIndex(['2020-10-31', '2020-11-30', '2020-12-31', ...,
# '2024-10-31', '2024-11-30', '2024-12-31'],
# dtype='datetime64[ns]', freq='ME')
out = (
pd.pivot_table(
df,
values='value',
index=pd.Grouper(freq='ME', key='date'),
columns='payee',
aggfunc='sum',
)
.reindex(idx.rename('date'))
.rolling(12, min_periods=1).sum()
)
Output:
payee payee_1 payee_2
date
2020-10-31 -43.74 NaN
2020-11-30 -43.74 NaN
2020-12-31 -43.74 NaN
2021-01-31 -43.74 NaN
2021-02-28 -43.74 NaN
2021-03-31 -43.74 NaN
2021-04-30 -43.74 NaN
2021-05-31 -43.74 NaN
2021-06-30 -59.80 NaN
2021-07-31 -59.80 NaN
2021-08-31 -59.80 NaN
2021-09-30 -59.80 NaN
2021-10-31 -16.06 NaN
2021-11-30 -16.06 NaN
2021-12-31 -16.06 -20.00
2022-01-31 -16.06 -20.00
2022-02-28 -16.06 -20.00
2022-03-31 -16.06 -20.00
2022-04-30 -16.06 -29.54
2022-05-31 -16.06 -29.54
2022-06-30 NaN -29.54
2022-07-31 NaN -29.54
2022-08-31 NaN -29.54
2022-09-30 NaN -29.54
2022-10-31 NaN -29.54
2022-11-30 NaN -29.54
2022-12-31 NaN -9.54
2023-01-31 NaN -9.54
2023-02-28 NaN -9.54
2023-03-31 NaN -9.54
2023-04-30 NaN NaN
2023-05-31 NaN NaN
2023-06-30 NaN NaN
2023-07-31 NaN NaN
2023-08-31 NaN NaN
2023-09-30 NaN NaN
2023-10-31 NaN NaN
2023-11-30 NaN NaN
2023-12-31 NaN NaN
2024-01-31 NaN NaN
2024-02-29 NaN NaN
2024-03-31 NaN NaN
2024-04-30 NaN NaN
2024-05-31 NaN NaN
2024-06-30 NaN NaN
2024-07-31 NaN NaN
2024-08-31 NaN NaN
2024-09-30 NaN NaN
2024-10-31 NaN NaN
2024-11-30 NaN NaN
2024-12-31 NaN NaN
You could also perform a groupby.resample
+ unstack
+ rolling.sum
:
out = (df.set_index('date').groupby('payee').resample('ME')['value'].sum()
.unstack('payee')
.rolling(12, min_periods=1).sum()
)
Output:
payee payee_1 payee_2
date
2020-10-31 -43.74 NaN
2020-11-30 -43.74 NaN
2020-12-31 -43.74 NaN
2021-01-31 -43.74 NaN
2021-02-28 -43.74 NaN
2021-03-31 -43.74 NaN
2021-04-30 -43.74 NaN
2021-05-31 -43.74 NaN
2021-06-30 -59.80 NaN
2021-07-31 -59.80 NaN
2021-08-31 -59.80 NaN
2021-09-30 -59.80 NaN
2021-10-31 -16.06 NaN
2021-11-30 -16.06 NaN
2021-12-31 -16.06 -20.00
2022-01-31 -16.06 -20.00
2022-02-28 -16.06 -20.00
2022-03-31 -16.06 -20.00
2022-04-30 -16.06 -29.54
2022-05-31 -16.06 -29.54
2022-06-30 0.00 -29.54
2022-07-31 0.00 -29.54
2022-08-31 0.00 -29.54
2022-09-30 0.00 -29.54
2022-10-31 0.00 -29.54
2022-11-30 0.00 -29.54
2022-12-31 0.00 -9.54
2023-01-31 0.00 -9.54
2023-02-28 0.00 -9.54
2023-03-31 0.00 -9.54
2023-04-30 0.00 NaN
2023-05-31 0.00 NaN
2023-06-30 0.00 NaN
2023-07-31 0.00 NaN
2023-08-31 0.00 NaN
2023-09-30 0.00 NaN
2023-10-31 0.00 NaN
2023-11-30 0.00 NaN
2023-12-31 0.00 NaN
2024-01-31 0.00 NaN
2024-02-29 0.00 NaN
2024-03-31 0.00 NaN
2024-04-30 0.00 NaN
2024-05-31 0.00 NaN
2024-06-30 0.00 NaN
2024-07-31 0.00 NaN
2024-08-31 0.00 NaN
2024-09-30 0.00 NaN
2024-10-31 0.00 NaN
2024-11-30 0.00 NaN
2024-12-31 0.00 NaN
2025-01-31 -2.03 NaN