I have the following pandas dataframe:
# Create the DataFrame
df = pd.DataFrame({
'id': [1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2],
'date': [1, 2, 3, 4, 5, 6, 7, 8, 5, 6, 7, 8, 9, 10, 11, 12],
'value': [11, 12, 13, 14, 15, 16, 17, 18, 21, 22, 23, 24, 25, 26, 27, 28]
})
df
id date value
0 1 1 11
1 1 2 12
2 1 3 13
3 1 4 14
4 1 5 15
5 1 6 16
6 1 7 17
7 1 8 18
8 2 5 21
9 2 6 22
10 2 7 23
11 2 8 24
12 2 9 25
13 2 10 26
14 2 11 27
15 2 12 28
I want to query the above dataframe, in a rolling window manner, for both ids
. The rolling window should be of size n
.
So, if n==2
, in the 1st iteration I would like to query this:
df.query('(id==1 and (date==1 or date==2)) or (id==2 and (date==5 or date==6))')
id date value
0 1 1 11
1 1 2 12
8 2 5 21
9 2 6 22
in the 2nd iteration I would like to query this:
df.query('(id==1 and (date==2 or date==3)) or (id==2 and (date==6 or date==7))')
id date value
1 1 2 12
2 1 3 13
9 2 6 22
10 2 7 23
in the 3rd iteration I would like to query this:
df.query('(id==1 and (date==3 or date==4)) or (id==2 and (date==7 or date==8))')
id date value
2 1 3 13
3 1 4 14
10 2 7 23
11 2 8 24
etc. How could I do that in pandas ? My data has around 500 ids
The exact expected logic is not fully clear, but assuming you want to loop over the groups/rolls, you could combine groupby.nth
with sliding_window_view
. By reusing the DataFrameGroupBy
object, will only need to compute the groups once:
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view as swv
n = 2
max_size = df['id'].value_counts(sort=False).max()
g = df.sort_values(by=['id', 'date']).groupby('id', sort=False)
for idx in swv(np.arange(max_size), n):
print(f'rows {idx}')
print(g.nth(idx))
Output:
rows [0 1]
id date value
0 1 1 11
1 1 2 12
8 2 5 21
9 2 6 22
rows [1 2]
id date value
1 1 2 12
2 1 3 13
9 2 6 22
10 2 7 23
rows [2 3]
id date value
2 1 3 13
3 1 4 14
10 2 7 23
11 2 8 24
rows [3 4]
id date value
3 1 4 14
4 1 5 15
11 2 8 24
12 2 9 25
rows [4 5]
id date value
4 1 5 15
5 1 6 16
12 2 9 25
13 2 10 26
rows [5 6]
id date value
5 1 6 16
6 1 7 17
13 2 10 26
14 2 11 27
rows [6 7]
id date value
6 1 7 17
7 1 8 18
14 2 11 27
15 2 12 28
Alternatively, and assuming groups with an identical size and sorted by id/date, using shifted indexing:
n = 2
ngroups = df['id'].nunique()
for idx in swv(np.arange(len(df)).reshape(-1, ngroups, order='F'), n, axis=0):
print(f'indices: {idx.ravel()}')
print(df.iloc[idx.flat])
Output:
indices: [0 1 8 9]
id date value
0 1 1 11
1 1 2 12
8 2 5 21
9 2 6 22
indices: [ 1 2 9 10]
id date value
1 1 2 12
2 1 3 13
9 2 6 22
10 2 7 23
indices: [ 2 3 10 11]
id date value
2 1 3 13
3 1 4 14
10 2 7 23
11 2 8 24
indices: [ 3 4 11 12]
id date value
3 1 4 14
4 1 5 15
11 2 8 24
12 2 9 25
indices: [ 4 5 12 13]
id date value
4 1 5 15
5 1 6 16
12 2 9 25
13 2 10 26
indices: [ 5 6 13 14]
id date value
5 1 6 16
6 1 7 17
13 2 10 26
14 2 11 27
indices: [ 6 7 14 15]
id date value
6 1 7 17
7 1 8 18
14 2 11 27
15 2 12 28