I have a dataframe with a timestamp column/index and I am calculating the moving average over the last 5 seconds.
df['Price'].rolling(window=time_diff, min_periods=1, closed='both').mean()
So far so good.
Now I also need to calculate the moving average for the next 5 seconds. However, my timestamps are not evenly spaced, so I can't simply shift the dataframe and recompute the average.
The data looks like this:
Index, Timestamp, Price, Start, Stop
0, 2019-01-02 08:30:00, 56.565, 0, 5
1, 2019-01-02 08:30:01, 56.565, 1, 6
2, 2019-01-02 08:30:02, 56.565, 2, 6
3, 2019-01-02 08:30:03, 56.540, 3, 7
4, 2019-01-02 08:30:04, 56.545, 4, 7
5, 2019-01-02 08:30:05, 56.545, 5, 8
6, 2019-01-02 08:30:07, 56.540, 6, 10
7, 2019-01-02 08:30:09, 56.550, 7, 12
8, 2019-01-02 08:30:10, 56.545, 8, 12
9, 2019-01-02 08:30:11, 56.550, 9, 12
10, 2019-01-02 08:30:12, 56.570, 10, 13
For example, at index 5 the average over the last 5 seconds would be 56.5542. I also need to compute the average over the next 5 seconds, excluding the current time, i.e. over indices 6, 7 and 8 (56.545).
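To make those two numbers reproducible, here is a small plain-Python check of the example at index 5. It is only a sketch under the window bounds implied above: the past window includes both endpoints, and the future window excludes the current row but includes a row exactly 5 seconds ahead. The `seconds` and `prices` lists are simply the sample rows retyped as offsets from 08:30:00.

```python
from statistics import mean

# Sample rows as seconds since 08:30:00 and their prices.
seconds = [0, 1, 2, 3, 4, 5, 7, 9, 10, 11, 12]
prices = [56.565, 56.565, 56.565, 56.540, 56.545, 56.545,
          56.540, 56.550, 56.545, 56.550, 56.570]

i = 5        # row at 08:30:05
window = 5   # window length in seconds
now = seconds[i]

# Past window: [now - 5s, now], both ends included.
past = [p for s, p in zip(seconds, prices) if now - window <= s <= now]

# Future window: (now, now + 5s], current row excluded.
future = [p for s, p in zip(seconds, prices) if now < s <= now + window]

past_mean = mean(past)      # six rows, indices 0 to 5
future_mean = mean(future)  # three rows, indices 6, 7 and 8
```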
Using df.index.get_indexer() I am able to extract the index of the last row to be included in the average:
df['stop'] = df.index.get_indexer(df['Date-Time-Exch'] + time_diff, method='bfill')
I was hoping that I could somehow use the values in 'start' and 'stop' for slicing with iloc, like this:
df.iloc[df['start']:df['stop'], 1].mean()
but this does not work.
Alternatively, I came up with this:
def get_indexes(time_index, offset):
    start, end = df.index.get_indexer([time_index, time_index + offset], method='bfill')
    avg = df.iloc[start + 1:end + 1, 1].mean()
    return avg
which, when used with .apply(), is sadly far too slow to be useful.
Hopefully you can help me because I have been stuck on this problem for some time now.
You can compute a forward-looking rolling mean by reversing your dataframe, computing the rolling average, and then reversing the result again. You also need to specify closed='left'
(see documentation) when doing this, since you don't want to include the current value in the forward average:
rolling = df.Price.rolling(window='5s', closed='both').mean().rename('Mean past')
rolling_forward = df[::-1].Price.rolling(window='5s', closed='left').mean()[::-1].rename('Mean future')
df[['Price']].merge(rolling, on='Timestamp').merge(rolling_forward, on='Timestamp')
This outputs:
Price Mean past Mean future
Timestamp
2019-01-02 08:30:00 56.565 56.565000 56.552000
2019-01-02 08:30:01 56.565 56.565000 56.548750
2019-01-02 08:30:02 56.565 56.565000 56.542500
2019-01-02 08:30:03 56.540 56.558750 56.543333
2019-01-02 08:30:04 56.545 56.556000 56.545000
2019-01-02 08:30:05 56.545 56.554167 56.545000
2019-01-02 08:30:07 56.540 56.547000 56.553750
2019-01-02 08:30:09 56.550 56.545000 56.555000
2019-01-02 08:30:10 56.545 56.545000 56.560000
2019-01-02 08:30:11 56.550 56.546250 56.570000
2019-01-02 08:30:12 56.570 56.551000 NaN
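As a cross-check on the 'Mean future' column that does not depend on reversing the frame, the same forward mean can be obtained from prefix sums plus a binary search per row. What follows is a minimal standard-library sketch rather than pandas API: `forward_mean` is a hypothetical helper name, and it assumes the timestamps are sorted ascending and that the window is (t, t + 5s], i.e. the current row is excluded and a row exactly 5 seconds ahead is included.

```python
from bisect import bisect_right
from datetime import datetime, timedelta
from itertools import accumulate


def forward_mean(times, values, window):
    """Mean of the values whose time lies in (t, t + window] for each t.

    Assumes `times` is sorted ascending. Returns None for rows whose
    forward window contains no data.
    """
    # prefix[k] is the sum of values[:k], so any slice sum is one subtraction.
    prefix = [0.0, *accumulate(values)]
    result = []
    for t in times:
        lo = bisect_right(times, t)           # first row strictly after t
        hi = bisect_right(times, t + window)  # one past the last row <= t + window
        count = hi - lo
        result.append((prefix[hi] - prefix[lo]) / count if count else None)
    return result


# Sample data from the question, retyped as offsets from 08:30:00.
base = datetime(2019, 1, 2, 8, 30, 0)
offsets = [0, 1, 2, 3, 4, 5, 7, 9, 10, 11, 12]
times = [base + timedelta(seconds=s) for s in offsets]
prices = [56.565, 56.565, 56.565, 56.540, 56.545, 56.545,
          56.540, 56.550, 56.545, 56.550, 56.570]

future = forward_mean(times, prices, timedelta(seconds=5))
```

The per-row work here is two binary searches and one subtraction. On arrays, numpy's searchsorted and cumsum perform the same two steps without a Python-level loop.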