[SOLVED] Pandas keep every nth row with special rule

For example, I want to keep every 3rd row, but I must also keep every number divisible by 3 (or matching some similar special rule). A number divisible by 3 restarts the count: I start counting to 3 again from that row, unless I see another value divisible by 3 first. Example given below:

import pandas as pd
df = pd.DataFrame.from_dict({'x': [0, 1, 2, 3, 4, 5, 7, 8, 9, 11, 12, 13, 14, 17, 20, 23]})
filtered = pd.DataFrame.from_dict({'x': [0, 3,  7,  9,  12,  17]}) # this is the desired dataframe
print (df, '\n\n--------------\n\n', filtered)

     x
0    0
1    1
2    2
3    3
4    4
5    5
6    7
7    8
8    9
9   11
10  12
11  13
12  14
13  17
14  20
15  23 

--------------

     x
0   0
1   3
2   7
3   9
4  12
5  17
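
To pin down the rule, it can be restated as a plain loop. This is only a sketch of my reading of the question; the function name keep_with_restart and its parameters n and is_special are made up for illustration, and it assumes the very first row is always kept.

```python
def keep_with_restart(values, n=3, is_special=lambda v: v % 3 == 0):
    """Keep special values (they restart the count) and every n-th value
    counted from the most recent restart."""
    kept = []
    count = 0  # position of the current value since the last restart
    for v in values:
        if is_special(v):
            count = 0  # a special value restarts the count
        if count % n == 0:
            kept.append(v)
        count += 1
    return kept


values = [0, 1, 2, 3, 4, 5, 7, 8, 9, 11, 12, 13, 14, 17, 20, 23]
result = keep_with_restart(values)
print(result)  # [0, 3, 7, 9, 12, 17]
```

The printed list matches the x column of the desired dataframe above.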

Solution

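You can use a custom groupby.cumcount: flag the rows that restart the count, use the cumulative sum of that flag as the group key, then keep every third row within each group: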

# identify starts of groups
m1 = df['x'].mod(3).eq(0)

# for each group, get every third row
m2 = (df.groupby(m1.cumsum())
        .cumcount().mod(3).eq(0)
      )

out = df[m2]

Output:

     x
0    0
3    3
6    7
8    9
10  12
13  17

Intermediates:

     x     m1  m1.cumsum()  cumcount     m2
0    0   True            1         0   True
1    1  False            1         1  False
2    2  False            1         2  False
3    3   True            2         0   True
4    4  False            2         1  False
5    5  False            2         2  False
6    7  False            2         3   True
7    8  False            2         4  False
8    9   True            3         0   True
9   11  False            3         1  False
10  12   True            4         0   True
11  13  False            4         1  False
12  14  False            4         2  False
13  17  False            4         3   True
14  20  False            4         4  False
15  23  False            4         5  False
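
If the step size or the special rule may change, the same idea can be wrapped in a small helper. This is a sketch under assumptions, not part of the original answer: the name keep_every_nth and its parameters are hypothetical, and is_special is assumed to be a function that takes the column as a Series and returns a boolean Series. It also resets the index so the result matches the desired dataframe exactly.

```python
import pandas as pd


def keep_every_nth(df, col, n, is_special):
    """Keep rows where is_special is True (restarting the count there)
    plus every n-th row counted from the latest restart."""
    starts = is_special(df[col])                  # True where the count restarts
    pos = df.groupby(starts.cumsum()).cumcount()  # position within each group
    return df[pos.mod(n).eq(0)].reset_index(drop=True)


df = pd.DataFrame({'x': [0, 1, 2, 3, 4, 5, 7, 8, 9, 11, 12, 13, 14, 17, 20, 23]})
out = keep_every_nth(df, 'x', 3, lambda s: s.mod(3).eq(0))
print(out)
```

Drop the reset_index call if you want to keep the original row labels (0, 3, 6, 8, 10, 13).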