I have a pandas df as follows:
MATERIAL DATE HIGH LOW
AAA 2022-01-01 10 0
AAA 2022-01-02 0 0
AAA 2022-01-03 5 2
BBB 2022-01-01 0 0
BBB 2022-01-02 10 5
BBB 2022-01-03 8 4
I want to groupby
MATERIAL
and sort_values
by DATE
and choose all rows except last one in the group.
The resulting result should be:
MATERIAL DATE HIGH LOW
AAA 2022-01-01 10 0
AAA 2022-01-02 0 0
BBB 2022-01-01 0 0
BBB 2022-01-02 10 5
I have tried df.sort_values('DATE').groupby('MATERIAL').head(-1)
but this results in an empty df.
The DATE
is a pd.datetime
object.
Thanks!
Another way is to sort by dates first, then group and take every row except the last one using indexing:
>>> df.sort_values("DATE").groupby("MATERIAL").apply(lambda group_df: group_df.iloc[:-1])
MATERIAL DATE HIGH LOW
MATERIAL
AAA 0 AAA 2022-01-01 10 0
1 AAA 2022-01-02 0 0
BBB 3 BBB 2022-01-01 0 0
4 BBB 2022-01-02 10 5