pythonpandasdataframerolling-sum

Python Pandas rolling sum place value at the top of window


I would like to use the rolling method of Pandas. I need a slight adjustment, however, I would like the 'value' placed at the top of the 'window'.

Currently, I am using this:

self.df['new_col'] = self.df['Zone3'].rolling(4).sum()

Which is producing this:

       Date     Time     Period  Zone4  Zone3  new_col
0   2018-02-23  00:00     900     11      2      NaN
1   2018-02-23  00:15     900     11      1      NaN
2   2018-02-23  00:30     900      7      3      NaN
3   2018-02-23  00:45     900      2      0      6.0
4   2018-02-23  01:00     900      3      2      6.0
5   2018-02-23  01:15     900      7      0      5.0
6   2018-02-23  01:30     900      2      4      6.0

What I would actually like:

       Date     Time     Period  Zone4  Zone3  new_col
0   2018-02-23  00:00     900     11      2      6.0
1   2018-02-23  00:15     900     11      1      6.0
2   2018-02-23  00:30     900      7      3      5.0
3   2018-02-23  00:45     900      2      0      6.0
4   2018-02-23  01:00     900      3      2      NaN
5   2018-02-23  01:15     900      7      0      NaN
6   2018-02-23  01:30     900      2      4      NaN

Notice the sum value is at the start of the window (position 1), and not at the end (position 4)

If rolling is the wrong method, great, any method would help. I know how to do this in a "pythonic" way (using a for loop), I was just hoping to do it in place with the dataframe using pandas.

Thank you in advance


Solution

  • Use shift:

    self.df['new_col'] = self.df['Zone3'].rolling(4).sum().shift(-3)
    

    Or more general:

    N = 4
    df['new_col'] = df['Zone3'].rolling(N).sum().shift(-N+1)
    print (df)
             Date   Time  Period  Zone4  Zone3  new_col
    0  2018-02-23  00:00     900     11      2      6.0
    1  2018-02-23  00:15     900     11      1      6.0
    2  2018-02-23  00:30     900      7      3      5.0
    3  2018-02-23  00:45     900      2      0      6.0
    4  2018-02-23  01:00     900      3      2      NaN
    5  2018-02-23  01:15     900      7      0      NaN
    6  2018-02-23  01:30     900      2      4      NaN
    

    N = 2
    df['new_col'] = df['Zone3'].rolling(N).sum().shift(-N+1)
    print (df)
             Date   Time  Period  Zone4  Zone3  new_col
    0  2018-02-23  00:00     900     11      2      3.0
    1  2018-02-23  00:15     900     11      1      4.0
    2  2018-02-23  00:30     900      7      3      3.0
    3  2018-02-23  00:45     900      2      0      2.0
    4  2018-02-23  01:00     900      3      2      2.0
    5  2018-02-23  01:15     900      7      0      4.0
    6  2018-02-23  01:30     900      2      4      NaN