I want to calculate the centered rolling average in Pandas, where the center value is excluded.
The code below works perfectly, but it includes the center value:
df.Value.rolling(window=11, center=True, min_periods=1).mean()
The window=11
and min_periods=1
are important, since I want to calculate the average for every value in df.
For example,
Here is a table with the expected results:
Values | Expected |
---|---|
13313 | 27180.6 |
12792 | 28897 |
20254 | 28770.14286 |
34915 | 27468.5 |
31410 | 29037 |
36532 | 30028.5 |
36958 | 34071.1 |
35471 | 36600.66667 |
33018 | 38950.625 |
38080 | 38804 |
44074 | 39037.33333 |
54165 | 37520.2 |
And calculation method from Excel:
You can modify my first approach of the initial answer to have a dynamic division:
win = 11
roll = df['Values'].rolling(window=win, center=True, min_periods=1)
df['out'] = roll.sum().sub(df['Values']).div(roll.count()-1)
Output:
Values Expected out
0 13313 27180.60000 27180.600000
1 12792 28897.00000 28897.000000
2 20254 28770.14286 28770.142857
3 34915 27468.50000 27468.500000
4 31410 29037.00000 29037.000000
5 36532 30028.50000 30028.500000
6 36958 34071.10000 34071.100000
7 35471 36600.66667 36600.666667
8 33018 38950.62500 38950.625000
9 38080 38804.00000 38804.000000
10 44074 39037.33333 39037.333333
11 54165 37520.20000 37520.200000
You have different ways, whether or not the operation is really a mean:
np.random.seed(0)
df = pd.DataFrame({'Value': np.random.randint(0, 10, 10)})
win = 3
df['mean'] = df['Value'].rolling(window=win, center=True, min_periods=1).mean()
df['mean_without_center'] = (df['Value'].rolling(window=win, center=True, min_periods=1)
.sum().sub(df['Value']).div(win-1)
)
half = win//2
mask = [False]*half+[True]+[False]*half
df['mean_without_center_apply'] = (df['Value'].rolling(window=win, center=True, min_periods=1)
.apply(lambda s: s.mask(mask[:len(s)]).mean())
)
Output:
Value mean mean_without_center mean_without_center_apply
0 5 2.500000 0.0 5.0
1 0 2.666667 4.0 4.0
2 3 2.000000 1.5 1.5
3 3 4.333333 5.0 5.0
4 7 6.333333 6.0 6.0
5 9 6.333333 5.0 5.0
6 3 5.666667 7.0 7.0
7 5 3.333333 2.5 2.5
8 2 3.666667 4.5 4.5
9 4 3.000000 1.0 2.0
Or with numpy's sliding_window_view
:
from numpy.lib.stride_tricks import sliding_window_view as swv
win = 3
half = win//2
df['swv_mean'] = np.nanmean(swv(np.pad(df['Value'].astype(float), (win-1, 0),
constant_values=np.nan),
win)
[:, np.r_[:half, half+1:win]],
axis=1
)
Output:
Value swv_mean
0 5 5.0
1 0 0.0
2 3 4.0
3 3 1.5
4 7 5.0
5 9 6.0
6 3 5.0
7 5 7.0
8 2 2.5
9 4 4.5