I'm trying to calculate a moving sum in Python based off days, however I don't have every date listed in my data frame. In the example below the cum sum should account for the fact that the 2021-01-03 date is missing from the data frame
import pandas as pd
df = pd.DataFrame({'date':['2021-01-01', '2021-01-02', '2021-01-04', '2021-01-05'],
'value':[1, 1, 1, 1]})
df_correct = pd.DataFrame({'date':['2021-01-01', '2021-01-02', '2021-01-04', '2021-01-05'],
'value':[1, 1, 1, 1],
'2_day_cum_sum':[1, 2, 1, 2]})
print(df_correct)
I know how to calculate a rolling sum using the rolling function from pandas, but I don't know how to account for the missing date. My current vision is to create a data frame that has every combination of date and other variables (not in this basic example), and then merge the active data set to that, so I can effectively use the rolling function from pandas. However, I feel like there is a better approach that exists. Any thoughts?
I know you mentioned about the rolling function, not sure if you know that you could specify a date window in the rolling function.. Let's try:
# ensure date is in right format
df['date'] = pd.to_datetime(df.date)
# set date as index
df = df.set_index('date')
# rolling sum with 2 days as window
df['2_day_cum_sum'] = df['value'].rolling('2d').sum()
# reset index
df = df.reset_index()
I'm able to get your expected output and I think this suffices for your use, unless you'd like to input a different value for when the date is missing - e.g. set value as 5 if date is missing. If that is so then we'll need a different solution