
Pandas: group by and select all rows except the last one in each group


I have a pandas df as follows:

MATERIAL    DATE         HIGH    LOW
AAA       2022-01-01     10      0
AAA       2022-01-02     0       0
AAA       2022-01-03     5       2
BBB       2022-01-01     0       0
BBB       2022-01-02     10      5
BBB       2022-01-03     8       4

I want to group by MATERIAL, sort by DATE within each group, and select all rows except the last one in each group. The result should be:

MATERIAL    DATE         HIGH    LOW
AAA       2022-01-01     10      0
AAA       2022-01-02     0       0
BBB       2022-01-01     0       0
BBB       2022-01-02     10      5

I have tried df.sort_values('DATE').groupby('MATERIAL').head(-1), but this returns an empty df. The DATE column holds datetime values. Thanks!


Solution

  • One way is to sort by date first, then group by MATERIAL and take every row except the last one in each group using positional indexing (iloc[:-1]):

    >>> df.sort_values("DATE").groupby("MATERIAL").apply(lambda group_df: group_df.iloc[:-1])
               MATERIAL        DATE  HIGH  LOW
    MATERIAL                                  
    AAA      0      AAA  2022-01-01    10    0
             1      AAA  2022-01-02     0    0
    BBB      3      BBB  2022-01-01     0    0
             4      BBB  2022-01-02    10    5
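
If the extra MATERIAL index level in the output above is unwanted, a boolean mask is one possible alternative. The sketch below is not part of the original answer: it rebuilds the sample frame from the question and uses groupby(...).cumcount(ascending=False) to number rows from the end of each group, so the last row per group gets 0 and can be filtered out. The names sorted_df, rows_from_end and result are illustrative only.

```python
import pandas as pd

# Sample data rebuilt from the question.
df = pd.DataFrame({
    "MATERIAL": ["AAA", "AAA", "AAA", "BBB", "BBB", "BBB"],
    "DATE": pd.to_datetime(["2022-01-01", "2022-01-02", "2022-01-03"] * 2),
    "HIGH": [10, 0, 5, 0, 10, 8],
    "LOW": [0, 0, 2, 0, 5, 4],
})

# Sort so that the "last" row of each group is the one with the latest DATE.
sorted_df = df.sort_values("DATE")

# Number the rows of each group counting from the end:
# the last row of a group gets 0, the one before it 1, and so on.
rows_from_end = sorted_df.groupby("MATERIAL").cumcount(ascending=False)

# Keep everything except the last row per group, then restore a readable order.
result = sorted_df[rows_from_end > 0].sort_values(["MATERIAL", "DATE"])
print(result)
```

This keeps the original flat index and the original columns, at the cost of one extra intermediate Series.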