python date max lines-of-code

Check maximum date and keep that row only


I have a df with several duplicated values in column B.

What I need is to find the most recent date in column A for each value in column B and remove the rows that are not the most recent:

A            B          E
26/12/2023  apple         7,9
26/12/2022  apple         8,3
26/12/2023  pear          28,6
26/12/2022  orange        33,3
26/12/2023  wildberry     24,7
26/12/2022  wildberry     29,1
26/12/2023  grapes        17,1

The result should be:

A            B          E
26/12/2023  apple          7,9
26/12/2023  pear          28,6
26/12/2022  orange        33,3
26/12/2023  wildberry     24,7
26/12/2023  grapes        17,1

Could you help me find the correct formula? I am a beginner and got lost in a loc function.


Solution

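  • You can use groupby to get the most recent date in A for each value in B. Note that this returns only the maximum date per group, not the full rows, and A should be converted to datetime first so that the comparison is chronological rather than alphabetical: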

    df.groupby('B')['A'].max()
    

    https://scales.arabpsychology.com/stats/how-to-find-the-max-value-by-group-in-pandas/
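To keep the whole row rather than just the date, one option is to combine groupby with idxmax and loc. The following is only a minimal sketch: it rebuilds the sample data from the question, assumes A holds day/month/year strings, and uses the helper names `dates`, `latest_idx` and `result`, which are made up for illustration.

```python
import pandas as pd

# Sample data copied from the question. E is kept as text because it uses decimal commas.
df = pd.DataFrame({
    "A": ["26/12/2023", "26/12/2022", "26/12/2023", "26/12/2022",
          "26/12/2023", "26/12/2022", "26/12/2023"],
    "B": ["apple", "apple", "pear", "orange", "wildberry", "wildberry", "grapes"],
    "E": ["7,9", "8,3", "28,6", "33,3", "24,7", "29,1", "17,1"],
})

# Assumption: A is a day/month/year string. Parse it into a separate Series
# so dates are compared chronologically and column A itself stays unchanged.
dates = pd.to_datetime(df["A"], format="%d/%m/%Y")

# For each value of B, idxmax gives the index label of the row with the latest date.
latest_idx = dates.groupby(df["B"]).idxmax()

# Select those rows with loc; sort_index restores the original row order.
result = df.loc[latest_idx].sort_index()
print(result)
```

On the sample data this keeps one row per value of B, in the same order as the expected result in the question. If two rows of the same group share the latest date, idxmax keeps only the first of them, so a different approach would be needed to keep all ties.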