I have a pandas DataFrame:
import pandas as pd
import numpy as np
np.random.seed(150)
df = pd.DataFrame(np.random.randint(0, 10, size=(10, 2)), columns=['A', 'B'])
I want to add a new column "C" whose value in each row is a list of the last three values of column "B" (the current row and the two rows before it), with None for the first two rows. The following code does this, but it is slow when the data is large:
>>> df['C'] = [df['B'].iloc[i-2:i+1].tolist() if i >= 2 else None for i in range(len(df))]
>>> df
A B C
0 4 9 None
1 0 2 None
2 4 5 [9, 2, 5]
3 7 9 [2, 5, 9]
4 8 3 [5, 9, 3]
5 8 1 [9, 3, 1]
6 1 4 [3, 1, 4]
7 4 1 [1, 4, 1]
8 1 9 [4, 1, 9]
9 3 7 [1, 9, 7]
When I try to use rolling().apply instead, I get an error:
df['C'] = df['B'].rolling(window=3).apply(lambda x: list(x), raw=False)
TypeError: must be real number, not list
As far as I know, the function passed to rolling().apply has to return a single number per window, not a list, so how can I do this? I searched the forum but couldn't find a suitable topic about returning a list from apply.
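To illustrate that constraint: the same rolling call succeeds when the function reduces each window to one number. This sketch hard-codes the column "B" values from the output above instead of relying on the random seed, and uses sum purely as an example reduction.

```python
import pandas as pd

# Column "B" values copied from the example output above
df = pd.DataFrame({'B': [9, 2, 5, 9, 3, 1, 4, 1, 9, 7]})

# Rolling.apply expects one scalar per window, so a reduction such as sum works;
# the first window - 1 rows have no full window and come back as NaN
sums = df['B'].rolling(window=3).apply(lambda x: x.sum(), raw=True)
print(sums.tolist())
```

Returning list(x) instead of a scalar is what triggers "must be real number, not list".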
You can use NumPy's sliding_window_view (available since NumPy 1.20), which builds all windows of length N at once:
from numpy.lib.stride_tricks import sliding_window_view as swv
N = 3
df['C'] = pd.Series(swv(df['B'], N).tolist(), index=df.index[N-1:])
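A minimal sketch of what sliding_window_view returns: a read-only view of shape (len - N + 1, N), one row per window, so no data is copied until .tolist() is called. The helper rolling_lists is hypothetical (not part of pandas or NumPy); it pads the first n - 1 rows with None so the result matches the question's original output rather than NaN, and assumes a 1-D column.

```python
import numpy as np
import pandas as pd
from numpy.lib.stride_tricks import sliding_window_view as swv

b = np.array([9, 2, 5, 9, 3, 1, 4, 1, 9, 7])

# Each row of the result is one window of length 3
windows = swv(b, 3)
print(windows.shape)  # (8, 3)
print(windows[0])     # [9 2 5]


def rolling_lists(s: pd.Series, n: int) -> list:
    """Hypothetical helper: list of the last n values per row, None until n rows exist."""
    if len(s) < n:
        # sliding_window_view raises ValueError when the window exceeds the data
        return [None] * len(s)
    return [None] * (n - 1) + swv(s.to_numpy(), n).tolist()


df = pd.DataFrame({'B': b})
df['C'] = rolling_lists(df['B'], 3)
```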
Output (the first N-1 rows are NaN rather than None, because the new Series' index does not cover them):
A B C
0 4 9 NaN
1 0 2 NaN
2 4 5 [9, 2, 5]
3 7 9 [2, 5, 9]
4 8 3 [5, 9, 3]
5 8 1 [9, 3, 1]
6 1 4 [3, 1, 4]
7 4 1 [1, 4, 1]
8 1 9 [4, 1, 9]
9 3 7 [1, 9, 7]
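Before switching, you can check that both approaches produce the same lists on a larger column. The random data, its size and the seed below are arbitrary assumptions; to compare speed on your own data, wrap each approach in %timeit (IPython) or timeit.

```python
import numpy as np
import pandas as pd
from numpy.lib.stride_tricks import sliding_window_view as swv

N = 3
rng = np.random.default_rng(0)
df = pd.DataFrame({'B': rng.integers(0, 10, size=10_000)})

# Approach from the question: one Python-level slice per row
slow = [df['B'].iloc[i - N + 1:i + 1].tolist() if i >= N - 1 else None
        for i in range(len(df))]

# sliding_window_view approach: all windows built at once as a strided view,
# then aligned to the full index (missing rows become NaN)
fast = pd.Series(swv(df['B'].to_numpy(), N).tolist(), index=df.index[N - 1:])
fast = fast.reindex(df.index)
```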