[SOLVED] Rolling average for forward values keeps breaking

Rolling average for forward values keeps breaking

dataset sample: https://pastebin.com/DeLhAsKL

if df['Actual_v_Predicted'] == "Predicted", it is meant to do a rolling mean average of the last 7 values and forward fill the ratio column, however, this keeps breaking after reaching 01/07/2025.

the way I've done this atm is

df['ratio'] = np.where(df['Actual_v_Predicted'] == "Predicted", np.nan,df['ratio'] )
df['ratio'] = df['ratio'].fillna(df['ratio'].rolling(window=7, min_periods=1).mean())

I'm not sure why its breaking, but any help would be appreciated.

Solution

There are several things you might need to tackle:

Sorting date values

df = df.sort_values(by='Date')

Rolling Window Forward Fill

The rolling mean is computed on previous values within a defined window. If all previous values are NaN, the rolling mean will also produce NaN. You may need to forward-fill after applying the rolling mean.

So, instead of directly computing the rolling mean and assigning it back, you can try forward-filling explicitly for missing values after rolling mean:

df['ratio'] = np.where(df['Actual_v_Predicted'] == "Predicted", np.nan, df['ratio'])

# Compute rolling mean only on 'Actual' rows (already handled by np.where)
rolling_mean = df['ratio'].rolling(window=7, min_periods=1).mean()

# Fill in 'Predicted' rows using rolling mean and then forward fill if needed
df['ratio'] = df['ratio'].fillna(rolling_mean).ffill()

Might need to date format

df['Date'] = pd.to_datetime(df['Date'], format='%m/%d/%Y')

Check the edge cases

After 01/07/2025, if there are no "Actual" values to base the rolling mean on, you'll get NaN. In such cases, you may need to handle long gaps or future dates more carefully. One option is to always fill missing values forward:

df['ratio'] = df['ratio'].ffill()

I have tried the following code with your dataset and it executed with no issues:

import pandas as pd
import numpy as np

df = pd.read_csv('dataset.txt', sep='\t')
df['Date'] = pd.to_datetime(df['Date'], format='%m/%d/%Y')

df = df.sort_values(by='Date')
df['ratio'] = np.where(df['Actual_v_Predicted'] == "Predicted", np.nan, df['ratio'])

rolling_mean = df['ratio'].rolling(window=7, min_periods=1).mean()
df['ratio'] = df['ratio'].fillna(rolling_mean).ffill()

df.to_csv('updated_file.txt', sep='\t', index=False)

print(df)