pythonrolling-sum

How to calculate a moving sum in python of dates if every date doesn't exist in the data?


I'm trying to calculate a moving sum in Python based off days, however I don't have every date listed in my data frame. In the example below the cum sum should account for the fact that the 2021-01-03 date is missing from the data frame

import pandas as pd

df = pd.DataFrame({'date':['2021-01-01', '2021-01-02', '2021-01-04', '2021-01-05'],
                   'value':[1, 1, 1, 1]})

df_correct = pd.DataFrame({'date':['2021-01-01', '2021-01-02', '2021-01-04', '2021-01-05'],
                           'value':[1, 1, 1, 1],
                           '2_day_cum_sum':[1, 2, 1, 2]})

print(df_correct)

I know how to calculate a rolling sum using the rolling function from pandas, but I don't know how to account for the missing date. My current vision is to create a data frame that has every combination of date and other variables (not in this basic example), and then merge the active data set to that, so I can effectively use the rolling function from pandas. However, I feel like there is a better approach that exists. Any thoughts?


Solution

  • I know you mentioned about the rolling function, not sure if you know that you could specify a date window in the rolling function.. Let's try:

    # ensure date is in right format
    df['date'] = pd.to_datetime(df.date)
    
    # set date as index
    df = df.set_index('date')
    
    # rolling sum with 2 days as window
    df['2_day_cum_sum'] = df['value'].rolling('2d').sum()
    
    # reset index
    df = df.reset_index()
    

    I'm able to get your expected output and I think this suffices for your use, unless you'd like to input a different value for when the date is missing - e.g. set value as 5 if date is missing. If that is so then we'll need a different solution