For example, I want to keep every 3rd row, but I must always keep numbers divisible by 3 (or some special rule like that). A number divisible by 3 restarts the count: I start counting to 3 again from that row, unless I see another value divisible by 3 first. An example is given below:
import pandas as pd
df = pd.DataFrame.from_dict({'x': [0, 1, 2, 3, 4, 5, 7, 8, 9, 11, 12, 13, 14, 17, 20, 23]})
filtered = pd.DataFrame.from_dict({'x': [0, 3, 7, 9, 12, 17]}) # this is the desired dataframe
print(df, '\n\n--------------\n\n', filtered)
x
0 0
1 1
2 2
3 3
4 4
5 5
6 7
7 8
8 9
9 11
10 12
11 13
12 14
13 17
14 20
15 23
--------------
x
0 0
1 3
2 7
3 9
4 12
5 17
You can use a custom groupby.cumcount:
# identify starts of groups
m1 = df['x'].mod(3).eq(0)
# for each group, get every third row
m2 = (df.groupby(m1.cumsum())
        .cumcount().mod(3).eq(0)
     )
out = df[m2]
Output:
x
0 0
3 3
6 7
8 9
10 12
13 17
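If the step size or the restart rule may change, the same idea can be wrapped in a small helper. This is only a sketch: the function name `keep_every_nth` and its parameters `restart` and `n` are made up for illustration and are not part of pandas. It also assumes that, when the first row is not a restart row, counting simply begins at the first row.

```python
import pandas as pd


def keep_every_nth(df, restart, n=3):
    """Keep every n-th row, restarting the count wherever `restart` is True.

    `restart` is a boolean Series aligned with `df` (hypothetical helper).
    """
    # every True in `restart` opens a new group
    group = restart.cumsum()
    # position of each row inside its group: 0, 1, 2, ...
    position = df.groupby(group).cumcount()
    # keep positions 0, n, 2n, ... of each group
    return df[position.mod(n).eq(0)]


df = pd.DataFrame({'x': [0, 1, 2, 3, 4, 5, 7, 8, 9, 11, 12, 13, 14, 17, 20, 23]})
out = keep_every_nth(df, df['x'].mod(3).eq(0), n=3)

# the original index is preserved; reset it to match the desired frame exactly
result = out.reset_index(drop=True)
print(result)
```

Passing the restart rule as a boolean mask keeps the helper independent of the "divisible by 3" condition, so any other special rule can be swapped in without touching the grouping logic.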
Intermediates:
x m1 m1.cumsum() cumcount m2
0 0 True 1 0 True
1 1 False 1 1 False
2 2 False 1 2 False
3 3 True 2 0 True
4 4 False 2 1 False
5 5 False 2 2 False
6 7 False 2 3 True
7 8 False 2 4 False
8 9 True 3 0 True
9 11 False 3 1 False
10 12 True 4 0 True
11 13 False 4 1 False
12 14 False 4 2 False
13 17 False 4 3 True
14 20 False 4 4 False
15 23 False 4 5 False
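To reproduce a table like the one above, the intermediate Series can be collected into one frame. This is a minimal sketch; the variable names `group`, `cumcount` and `steps` are chosen here for illustration only.

```python
import pandas as pd

df = pd.DataFrame({'x': [0, 1, 2, 3, 4, 5, 7, 8, 9, 11, 12, 13, 14, 17, 20, 23]})

# True where a new group starts (value divisible by 3)
m1 = df['x'].mod(3).eq(0)
# running group id: increases by one at every True in m1
group = m1.cumsum()
# 0-based position of each row within its group
cumcount = df.groupby(group).cumcount()
# keep every third row of each group, starting with the first
m2 = cumcount.mod(3).eq(0)

# gather all steps side by side for inspection
steps = pd.DataFrame({
    'x': df['x'],
    'm1': m1,
    'm1.cumsum()': group,
    'cumcount': cumcount,
    'm2': m2,
})
print(steps)
```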