pythonpandasdataframefiltergroup-by

Get last row that satisfies a condition using pandas groupby


I have a DataFrame like this:

date_from date_to item_id VALUE_NEW VALUE_OLD cost_var
1/1/1900 00:00:00 11/3/2022 15:31:18 452953 5366,46 4024,71 33.34%
11/3/2022 15:31:18 1/1/2200 00:00:00 452953 9122,57 5366,46 69.99%
1/1/1900 00:00:00 11/3/2022 15:31:18 452954 5366,46 4024,71 33.34%
11/3/2022 15:31:18 1/1/2200 00:00:00 452954 9122,57 5366,46 69.99%
1/1/1900 00:00:00 21/7/2021 16:30:46 452961 6170,98 4024,71 53.33%
21/7/2021 16:30:46 11/3/2022 15:31:09 452961 5312 6170,98 13.92%
11/3/2022 15:31:09 1/1/2200 00:00:00 452961 9122,57 5312 71.74%
1/1/1900 00:00:00 13/10/2021 14:39:55 801286 4052,1 1332,8 204.03%
13/10/2021 14:39:55 13/10/2021 14:43:09 801286 4,4732 4052,1 99.89%
13/10/2021 14:43:09 3/2/2022 17:16:23 801286 4473,2 4,4732 99900.00%
3/2/2022 17:16:23 1/1/2200 00:00:00 801286 4946,8 4473,2 10.59%

I need to check each item_id, and get the last row where cost_var is >60%. If it's the last row, that's ok, but if there is a next one, and it is <60%, I have to drop the last row>60%. Output should look like this:

date_from date_to item_id VALUE_NEW VALUE_OLD cost_var
11/3/2022 15:31:18 1/1/2200 00:00:00 452953 9122,57 5366,46 69.99%
11/3/2022 15:31:18 1/1/2200 00:00:00 452954 9122,57 5366,46 69.99%
11/3/2022 15:31:09 1/1/2200 00:00:00 452961 9122,57 5312 71.74%

Item 802186 returned no value, because last row>60% (99900.00%) has a next row and cost_var<60% (10.59%)... Is it possible to do? I couldn't find a way to solve it.


Solution

  • We can select the last row of each item_id using groupby and only select ones where cost_var > 60% using query.

    df.groupby('item_id', as_index=False).last().query("cost_var.str.rstrip('%').astype('float')>60")
    

    enter image description here