python, pandas, pandas-rolling

Pandas: Get average of a dynamic number of rows


I have a dataframe with a timestamp column/index and I am calculating the moving average over the last 5 seconds (time_diff is a 5-second Timedelta):

df['Price'].rolling(window=time_diff, min_periods=1, closed='both').mean()

So far so good. Now I also need the moving average over the next 5 seconds. However, my timestamps are not evenly spaced, so I can't just shift the dataframe by a fixed number of rows to compute the second average.

The data looks like this:

    Timestamp            Price   Start  Stop
0   2019-01-02 08:30:00  56.565  0      5
1   2019-01-02 08:30:01  56.565  1      6
2   2019-01-02 08:30:02  56.565  2      6
3   2019-01-02 08:30:03  56.540  3      7
4   2019-01-02 08:30:04  56.545  4      7
5   2019-01-02 08:30:05  56.545  5      8
6   2019-01-02 08:30:07  56.540  6      10
7   2019-01-02 08:30:09  56.550  7      12
8   2019-01-02 08:30:10  56.545  8      12
9   2019-01-02 08:30:11  56.550  9      12
10  2019-01-02 08:30:12  56.570  10     13
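To make the question reproducible, the sample data can be rebuilt as below. This is a minimal sketch that assumes Timestamp is the DatetimeIndex and leaves out the derived Start/Stop helper columns.

```python
import pandas as pd

# Sample data from the question; Timestamp becomes a sorted DatetimeIndex
seconds = [0, 1, 2, 3, 4, 5, 7, 9, 10, 11, 12]
base = pd.Timestamp("2019-01-02 08:30:00")
index = pd.DatetimeIndex([base + pd.Timedelta(seconds=s) for s in seconds],
                         name="Timestamp")
prices = [56.565, 56.565, 56.565, 56.540, 56.545, 56.545,
          56.540, 56.550, 56.545, 56.550, 56.570]
df = pd.DataFrame({"Price": prices}, index=index)

# Time-based rolling windows and get_indexer(method=...) need a sorted index
assert df.index.is_monotonic_increasing
print(df)
```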

For example, at index 5 the average over the last 5 seconds (indices 0 to 5) is about 56.5542. I also need the average over the next 5 seconds excluding the current time, i.e. indices 6, 7 and 8, which is 56.545.
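The two windows in this example can be written out directly with boolean masks: the past window is [t - 5s, t] (current row included) and the future window is (t, t + 5s] (current row excluded). A minimal sketch on the sample data, with t taken as the row at index 5; it is only meant to pin down the window definitions, not to be fast.

```python
import pandas as pd

# Sample data from the question
seconds = [0, 1, 2, 3, 4, 5, 7, 9, 10, 11, 12]
base = pd.Timestamp("2019-01-02 08:30:00")
df = pd.DataFrame(
    {"Price": [56.565, 56.565, 56.565, 56.540, 56.545, 56.545,
               56.540, 56.550, 56.545, 56.550, 56.570]},
    index=pd.DatetimeIndex([base + pd.Timedelta(seconds=s) for s in seconds],
                           name="Timestamp"),
)

window = pd.Timedelta("5s")
t = df.index[5]  # 2019-01-02 08:30:05

# Past window [t - 5s, t]: rows 0..5
past = df.loc[(df.index >= t - window) & (df.index <= t), "Price"].mean()
# Future window (t, t + 5s]: rows 6, 7, 8
future = df.loc[(df.index > t) & (df.index <= t + window), "Price"].mean()
print(round(past, 5), round(future, 5))  # 56.55417 56.545
```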

Using df.index.get_indexer() I can get the position of the row at (or just after) the end of the forward window:

df['Stop'] = df.index.get_indexer(df.index + time_diff, method='bfill')
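Note what method='bfill' returns: the first row at or after t + 5s, or -1 when there is no such row. When no timestamp falls exactly on t + 5s, that row lies outside the window (for row 1, 08:30:06 maps to 08:30:07, six seconds ahead). method='ffill' returns the last row at or before t + 5s, which is the last row inside the window. A small sketch comparing the two on the sample data:

```python
import pandas as pd

# Sample data from the question
seconds = [0, 1, 2, 3, 4, 5, 7, 9, 10, 11, 12]
base = pd.Timestamp("2019-01-02 08:30:00")
df = pd.DataFrame(
    {"Price": [56.565, 56.565, 56.565, 56.540, 56.545, 56.545,
               56.540, 56.550, 56.545, 56.550, 56.570]},
    index=pd.DatetimeIndex([base + pd.Timedelta(seconds=s) for s in seconds],
                           name="Timestamp"),
)

targets = df.index + pd.Timedelta("5s")

# 'bfill': first row at or AFTER t + 5s; -1 when there is none
stop_bfill = df.index.get_indexer(targets, method="bfill")
# 'ffill': last row at or BEFORE t + 5s, i.e. the last row inside the window
stop_ffill = df.index.get_indexer(targets, method="ffill")

print(stop_bfill.tolist())  # [5, 6, 6, 7, 7, 8, 10, -1, -1, -1, -1]
print(stop_ffill.tolist())  # [5, 5, 6, 6, 7, 8, 10, 10, 10, 10, 10]
```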

I was hoping to use the values in 'Start' and 'Stop' to slice with iloc, like

df.iloc[df['Start']:df['Stop'], 1].mean()

but this does not work: iloc slice bounds must be single integers, not whole columns, so the per-row ranges cannot be applied in one call.
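Because iloc only accepts scalar bounds, the slicing has to happen once per row. A straightforward sketch of what the iloc attempt was aiming for, assuming the stop positions come from get_indexer with method='ffill' (so the stop row is the last row inside the window). It is correct but still loops in Python, so it scales like the .apply() approach.

```python
import pandas as pd

# Sample data from the question
seconds = [0, 1, 2, 3, 4, 5, 7, 9, 10, 11, 12]
base = pd.Timestamp("2019-01-02 08:30:00")
df = pd.DataFrame(
    {"Price": [56.565, 56.565, 56.565, 56.540, 56.545, 56.545,
               56.540, 56.550, 56.545, 56.550, 56.570]},
    index=pd.DatetimeIndex([base + pd.Timedelta(seconds=s) for s in seconds],
                           name="Timestamp"),
)

# Last row at or before t + 5s (ffill), i.e. the last row inside the window
stop = df.index.get_indexer(df.index + pd.Timedelta("5s"), method="ffill")

# iloc needs scalar bounds, so slice once per row: rows start + 1 .. stop
price = df["Price"]
future = [price.iloc[start + 1:end + 1].mean()
          for start, end in zip(range(len(df)), stop)]
df["Mean future"] = future
print(df)
```

The last row gets NaN because its future window is empty.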

Alternatively, I came up with this:

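def get_future_mean(time_index, offset):
    # Position of the current row (exact match in the index)
    start = df.index.get_loc(time_index)
    # Position of the last row at or before time_index + offset
    end = df.index.get_indexer([time_index + offset], method='ffill')[0]
    # Mean of the rows after the current one, up to and including `end`
    return df['Price'].iloc[start + 1:end + 1].mean()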

but calling it row by row with .apply() is unfortunately far too slow to be useful.
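One way to avoid the per-row call is to find every window's start and stop position at once with searchsorted and take the window sums from a cumulative sum. This swaps in a prefix-sum technique that is not from the question, and forward_mean is a hypothetical helper name; a sketch assuming a sorted DatetimeIndex:

```python
import numpy as np
import pandas as pd

# Sample data from the question
seconds = [0, 1, 2, 3, 4, 5, 7, 9, 10, 11, 12]
base = pd.Timestamp("2019-01-02 08:30:00")
df = pd.DataFrame(
    {"Price": [56.565, 56.565, 56.565, 56.540, 56.545, 56.545,
               56.540, 56.550, 56.545, 56.550, 56.570]},
    index=pd.DatetimeIndex([base + pd.Timedelta(seconds=s) for s in seconds],
                           name="Timestamp"),
)


def forward_mean(prices: pd.Series, window: pd.Timedelta) -> pd.Series:
    """Mean of the rows in (t, t + window] for every row t (hypothetical helper).

    Assumes prices has an ascending DatetimeIndex; empty windows give NaN.
    """
    times = prices.index
    # First position strictly after t (skips the current row and duplicates of t)
    lo = times.searchsorted(times, side="right")
    # One past the last position with timestamp <= t + window
    hi = times.searchsorted(times + window, side="right")
    # Prefix sums turn each window sum into a single subtraction
    csum = np.concatenate(([0.0], np.cumsum(prices.to_numpy(dtype=float))))
    counts = hi - lo
    sums = csum[hi] - csum[lo]
    # Divide only where the window is non-empty; the rest stays NaN
    means = np.full(len(prices), np.nan)
    nonempty = counts > 0
    means[nonempty] = sums[nonempty] / counts[nonempty]
    return pd.Series(means, index=times, name="Mean future")


result = forward_mean(df["Price"], pd.Timedelta("5s"))
df["Mean future"] = result
print(df)
```

This runs in O(n log n) with no Python-level loop. On very long series the cumulative sum can accumulate floating-point rounding, so results may differ from a direct mean in the last few digits.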

Hopefully you can help me because I have been stuck on this problem for some time now.


Solution

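  • You can compute the forward-looking average by reversing the dataframe, taking the rolling mean, and then reversing the result back. You also need to pass closed='left' (see the rolling documentation), since you don't want the current value in the forward average: on the reversed frame the current row is the window's right endpoint, so closed='left' excludes it while still including the row exactly 5 seconds ahead: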

    # Past window [t - 5s, t], current row included
    rolling = df.Price.rolling(window='5s', closed='both').mean().rename('Mean past')
    # Future window (t, t + 5s]: reverse, roll with closed='left', reverse back
    rolling_forward = df[::-1].Price.rolling(window='5s', closed='left').mean()[::-1].rename('Mean future')
    df[['Price']].merge(rolling, on='Timestamp').merge(rolling_forward, on='Timestamp')
    

    This outputs:

                        Price   Mean past   Mean future
    Timestamp           
    2019-01-02 08:30:00 56.565  56.565000   56.552000
    2019-01-02 08:30:01 56.565  56.565000   56.548750
    2019-01-02 08:30:02 56.565  56.565000   56.542500
    2019-01-02 08:30:03 56.540  56.558750   56.543333
    2019-01-02 08:30:04 56.545  56.556000   56.545000
    2019-01-02 08:30:05 56.545  56.554167   56.545000
    2019-01-02 08:30:07 56.540  56.547000   56.553750
    2019-01-02 08:30:09 56.550  56.545000   56.555000
    2019-01-02 08:30:10 56.545  56.545000   56.560000
    2019-01-02 08:30:11 56.550  56.546250   56.570000
    2019-01-02 08:30:12 56.570  56.551000   NaN
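As a sanity check, the reverse-and-roll result can be compared with a brute-force mask over each row's window (t, t + 5s]. This sketch assumes a pandas version whose time-based rolling accepts a monotonically decreasing index, as the output above suggests; the brute-force part is O(n²) and only meant for checking.

```python
import pandas as pd

# Sample data from the question
seconds = [0, 1, 2, 3, 4, 5, 7, 9, 10, 11, 12]
base = pd.Timestamp("2019-01-02 08:30:00")
df = pd.DataFrame(
    {"Price": [56.565, 56.565, 56.565, 56.540, 56.545, 56.545,
               56.540, 56.550, 56.545, 56.550, 56.570]},
    index=pd.DatetimeIndex([base + pd.Timedelta(seconds=s) for s in seconds],
                           name="Timestamp"),
)
window = pd.Timedelta("5s")

# The answer's approach: reverse, roll with closed='left', reverse back
mean_future = df["Price"][::-1].rolling(window="5s", closed="left").mean()[::-1]

# Brute force: mean of rows with t < timestamp <= t + 5s, one row at a time
brute = pd.Series(
    [df.loc[(df.index > t) & (df.index <= t + window), "Price"].mean()
     for t in df.index],
    index=df.index,
)

# Raises AssertionError if they differ (both have NaN in the last row)
pd.testing.assert_series_equal(mean_future, brute, check_names=False)
print(pd.DataFrame({"rolling": mean_future, "brute force": brute}))
```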