I've had a look and can't seem to find a solution to this issue. I'm wanting to calculate the rolling sum of the previous 30 days' worth of data at each date in the dataframe - by subgroup - for a set of data that isn't daily - it's spaced fairly irregularly. I've been attempting to use ChatGPT which is getting in a twist over it.
Initially the suggestion was that I'd not converted the Date column to datetime format to allow for the rolling calculation, but now from the code below:
import pandas as pd
from datetime import datetime, timedelta
import numpy as np
# Create a dataset with irregularly spaced dates spanning two years
np.random.seed(42)
date_rng = pd.date_range(start='2022-01-01', end='2023-12-31', freq='10D') # Every 10 days
data = {'Date': np.random.choice(date_rng, size=30),
'Group': np.random.choice(['A', 'B'], size=30),
'Value': np.random.randint(1, 30, size=30)}
df = pd.DataFrame(data)
# Sort DataFrame by date
df.sort_values(by='Date', inplace=True)
df['Date'] = pd.to_datetime(df['Date'])
# Calculate cumulative sum by group within the previous 30 days from each day
df['RollingSum_Last30Days'] = df.groupby('Group')['Value'].transform(lambda x: x.rolling(window='30D', min_periods=1).sum())
I'm getting an error of:
ValueError: window must be an integer 0 or greater
I've found conflicting comments online as to whether the format '30D' works in rolling windows but I'm none the wiser as to a solution to this. Any help appreciated.
Running in VSCode in Python 3.11.8.
The issue if that you need to specify which column to use as Date but don't have access to the Date with groupby.transform
.
You could use groupby.apply
:
# Calculate cumulative sum by group within the previous 30 days from each day
df['RollingSum_Last30Days'] = (df.groupby('Group', group_keys=False)
.apply(lambda x: x.rolling(window='30D', on='Date', min_periods=1)['Value'].sum())
)
Output:
Date Group Value RollingSum_Last30Days
9 2022-01-11 A 22 22.0
12 2022-01-11 A 22 44.0
6 2022-01-21 A 4 48.0
1 2022-05-21 B 14 14.0
23 2022-05-21 A 8 8.0
15 2022-07-20 B 26 26.0
4 2022-07-20 A 18 18.0
18 2022-07-30 B 10 36.0
7 2022-07-30 A 2 20.0
5 2022-08-19 A 8 10.0
10 2022-10-18 B 10 10.0
16 2022-11-17 B 12 12.0
11 2023-01-06 B 4 4.0
21 2023-02-15 B 16 16.0
26 2023-04-06 B 28 28.0
19 2023-04-26 A 4 4.0
28 2023-05-16 B 8 8.0
0 2023-05-26 B 3 11.0
8 2023-06-05 A 6 6.0
29 2023-06-25 A 21 27.0
17 2023-07-25 A 2 2.0
20 2023-08-04 B 14 14.0
22 2023-08-14 B 15 29.0
14 2023-08-14 B 18 47.0
3 2023-08-24 A 4 4.0
24 2023-09-03 B 14 47.0
25 2023-09-03 A 23 27.0
27 2023-09-03 A 25 52.0
13 2023-09-23 B 29 43.0
2 2023-12-12 A 17 17.0