I am looking to extract rows from my dataset based on a pattern-like condition. Specifically, I want to find periods in a battery's charging history where it discharged from 100% to 0% without charging in between.
For example, given the dataset below, I would like a function that returns only timestamps 7 to 12, since those rows contain a full discharge of the battery.
timestamp | Charge level (%) |
---|---|
1 | 50 |
2 | 55 |
3 | 40 |
4 | 60 |
5 | 80 |
6 | 100 |
7 | 100 |
8 | 85 |
9 | 60 |
10 | 55 |
11 | 40 |
12 | 0 |
13 | 20 |
The approach I have tried is to use loc in pandas to find the rows with a charge level of 0% and then backtrack from each one until I reach a row with a charge level of 100%.
However, I am struggling with the backtracking part of this approach.
The idea is to keep only the rows with a level of 0 or 100. Among those, the rows of interest are each 100 that is immediately followed by a 0, together with that 0.
I also extended your example with some harder cases, such as when the battery starts discharging and then starts charging again before it is fully discharged.
with pd.option_context('display.max_columns', None):
    display(df.T)
0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23
time 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24
level 20 0 40 60 80 100 100 85 60 55 40 0 100 50 20 55 100 100 0 100 60 30 0 20
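To reproduce this, the frame can be rebuilt from the values shown in the display above. This is only a sketch: the column names time and level match the output, but how the original frame was actually constructed is an assumption.

```python
import pandas as pd

# Charge levels copied from the transposed display above.
levels = [20, 0, 40, 60, 80, 100, 100, 85, 60, 55, 40, 0,
          100, 50, 20, 55, 100, 100, 0, 100, 60, 30, 0, 20]

# Assumption: a default RangeIndex and timestamps running from 1 to 24.
df = pd.DataFrame({"time": range(1, len(levels) + 1), "level": levels})
```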
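# Keep only the fully charged (100) and fully discharged (0) rows
mask = (df["level"] == 100) | (df["level"] == 0)
res = df[mask].copy()
# A 100 whose next remaining row is 0 starts a full discharge
mask1 = (res["level"] == 100) & (res["level"].shift(-1) == 0)
# A 0 whose previous remaining row is 100 ends it
mask2 = (res["level"] == 0) & (res["level"].shift(1) == 100)
res = res.loc[mask1 | mask2, ["time", "level"]]
display(res)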
time level
6 7 100
11 12 0
17 18 100
18 19 0
19 20 100
22 23 0
The result holds the start and end rows of every segment of interest.
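If you need the full rows of each segment rather than just the endpoints, the start and end labels can be paired up and used to slice the original frame. Note that the 0/100 filtering on its own does not notice a partial recharge that never gets back to 100 (for example 100, 50, 60, 0), so the sketch below adds a check that the level never rises inside a segment. The function name full_discharges and the col parameter are hypothetical, and the code assumes a sorted, unique index such as the default RangeIndex.

```python
import pandas as pd

def full_discharges(df, col="level"):
    """Return one DataFrame per 100 -> 0 stretch with no rise in between."""
    # Same filtering as above: keep only the 0 and 100 rows.
    res = df[df[col].isin([0, 100])]
    starts = res[(res[col] == 100) & (res[col].shift(-1) == 0)].index
    ends = res[(res[col] == 0) & (res[col].shift(1) == 100)].index

    segments = []
    # Every start is directly followed by its end in res, so the two pair up in order.
    for start, end in zip(starts, ends):
        seg = df.loc[start:end]  # label-based slice, end label included
        # Reject the segment if the level ever increases (charging in between).
        # Plateaus (equal consecutive values) are allowed.
        if (seg[col].diff().dropna() <= 0).all():
            segments.append(seg)
    return segments

# Sample data copied from the example above.
levels = [20, 0, 40, 60, 80, 100, 100, 85, 60, 55, 40, 0,
          100, 50, 20, 55, 100, 100, 0, 100, 60, 30, 0, 20]
df = pd.DataFrame({"time": range(1, len(levels) + 1), "level": levels})

spans = [(int(s["time"].iloc[0]), int(s["time"].iloc[-1]))
         for s in full_discharges(df)]
print(spans)  # [(7, 12), (18, 19), (20, 23)]
```

On the sample data all three segments pass the check, so the result matches the table above. A frame such as 100, 50, 60, 0 would return an empty list.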