python pandas sliding-window

How to use the apply function to return a list to a new column in Pandas


I have a Pandas dataframe:

import pandas as pd
import numpy as np

np.random.seed(150)
df = pd.DataFrame(np.random.randint(0, 10, size=(10, 2)), columns=['A', 'B'])

I want to add a new column "C" whose values are lists that combine each row of column "B" with the two rows before it (a sliding window of three rows). The following approach gives the result I need, but it is slow when the data is large.

>>> df['C'] = [df['B'].iloc[i-2:i+1].tolist() if i >= 2 else None for i in range(len(df))]
>>> df
   A  B          C
0  4  9       None
1  0  2       None
2  4  5  [9, 2, 5]
3  7  9  [2, 5, 9]
4  8  3  [5, 9, 3]
5  8  1  [9, 3, 1]
6  1  4  [3, 1, 4]
7  4  1  [1, 4, 1]
8  1  9  [4, 1, 9]
9  3  7  [1, 9, 7]

When I try to use rolling(...).apply instead, I get an error message:

df['C'] = df['B'].rolling(window=3).apply(lambda x: list(x), raw=False)

TypeError: must be real number, not list

As far as I remember, apply in Pandas cannot return a list here, so how do I do this? I searched the forum but couldn't find a suitable topic about returning a list from apply.


Solution

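  • rolling(...).apply expects the function to return a single number for each window, which is why returning a list raises the TypeError. You can use numpy's sliding_window_view instead (available since NumPy 1.20):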

    from numpy.lib.stride_tricks import sliding_window_view as swv
    
    N = 3
    df['C'] = pd.Series(swv(df['B'], N).tolist(), index=df.index[N-1:])
    

    Output:

       A  B          C
    0  4  9        NaN
    1  0  2        NaN
    2  4  5  [9, 2, 5]
    3  7  9  [2, 5, 9]
    4  8  3  [5, 9, 3]
    5  8  1  [9, 3, 1]
    6  1  4  [3, 1, 4]
    7  4  1  [1, 4, 1]
    8  1  9  [4, 1, 9]
    9  3  7  [1, 9, 7]
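
    For reuse, the same idea can be wrapped in a small function. This is only a sketch: rolling_lists is a hypothetical helper name (not part of pandas or NumPy), and the values of column "B" are typed in by hand from the table above rather than regenerated from the seed.

```python
import numpy as np
import pandas as pd
from numpy.lib.stride_tricks import sliding_window_view


def rolling_lists(series: pd.Series, window: int) -> pd.Series:
    """Return a Series whose values are lists of the last `window` items.

    Hypothetical helper for illustration; the first `window - 1` rows
    have no full window and are left as NaN.
    """
    if len(series) < window:
        # Not enough rows for even one window: nothing to collect.
        return pd.Series(np.nan, index=series.index, dtype=object)
    # Each row of `windows` is one window of consecutive values.
    windows = sliding_window_view(series.to_numpy(), window)
    # Label each window with the index of its last element.
    out = pd.Series(windows.tolist(), index=series.index[window - 1:])
    # Align back to the full index; missing leading rows become NaN.
    return out.reindex(series.index)


# Column "B" copied from the example output above.
df = pd.DataFrame({"B": [9, 2, 5, 9, 3, 1, 4, 1, 9, 7]})
df["C"] = rolling_lists(df["B"], 3)
print(df)
```

    The reindex step makes the NaN padding explicit instead of relying on index alignment during column assignment, so the helper can also be used on its own, outside a DataFrame.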