python pandas

Column to flag the row with the maximum value in each group


With this code:

import pandas as pd

data = pd.DataFrame({'car_plate':['1v','2f','3a','h0','m1'],
                     'car_brand':['Honda','Honda','Ford','Audi','Audi'],
                     'last_seen':['01/01/2020','01/02/2020','01/03/2020','01/04/2020','01/05/2020']})

data['last_seen'] = pd.to_datetime(data['last_seen'])

I have the following dataframe:

car_plate car_brand last_seen
0 1v Honda 2020-01-01 00:00:00
1 2f Honda 2020-01-02 00:00:00
2 3a Ford 2020-01-03 00:00:00
3 h0 Audi 2020-01-04 00:00:00
4 m1 Audi 2020-01-05 00:00:00

For every car_brand, I would like to mark with 1 the row that has the most recent last_seen, so that the output is:

car_plate car_brand last_seen recent
0 1v Honda 2020-01-01 00:00:00 0
1 2f Honda 2020-01-02 00:00:00 1
2 3a Ford 2020-01-03 00:00:00 1
3 h0 Audi 2020-01-04 00:00:00 0
4 m1 Audi 2020-01-05 00:00:00 1

Here 2f is the most recent Honda, 3a is the most recent Ford, and m1 is the most recent Audi.

My idea so far, in pseudocode: group by car_brand, get the max last_seen, get the index of that row, then create a column recent and set it to 1 at those indexes.

P.S.: I'm also wondering how this would work if I add more levels to the combination, e.g. car_owner, and want the most recent car_plate + car_owner combination for every car_brand.


Solution

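  • One option is to build a temporary Series holding the most recent last_seen per group, aligned to the original rows with transform, and then use a boolean mask to assign the 0s and 1s. Note that transform('last') would give the same result only if the rows are already sorted by last_seen within each group. If several rows in a group share the maximum date, all of them are flagged:

    # most recent last_seen per car_brand, broadcast back to every row
    latest = data.groupby('car_brand').last_seen.transform('max')
    # assign returns a new DataFrame; data itself is left unchanged
    data.assign(recent = data.last_seen.eq(latest).astype(int))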
    
      car_plate car_brand  last_seen  recent
    0        1v     Honda 2020-01-01       0
    1        2f     Honda 2020-01-02       1
    2        3a      Ford 2020-01-03       1
    3        h0      Audi 2020-01-04       0
    4        m1      Audi 2020-01-05       1
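
  • Another option, sketched here under a few assumptions, follows the pseudocode from the question more literally: groupby plus idxmax returns the index label of the most recent row in each group, and those labels are then set to 1. The helper name flag_most_recent and the car_owner values are hypothetical, made up for illustration. The sketch also assumes that the index labels are unique and that last_seen has no missing values. Passing a list of columns as keys is one possible reading of the P.S. (the most recent row per car_brand and car_owner pair).

```python
import pandas as pd

data = pd.DataFrame({
    'car_plate': ['1v', '2f', '3a', 'h0', 'm1'],
    'car_brand': ['Honda', 'Honda', 'Ford', 'Audi', 'Audi'],
    'last_seen': ['01/01/2020', '01/02/2020', '01/03/2020',
                  '01/04/2020', '01/05/2020'],
})
# Explicit month/day/year format, matching the output shown in the question
data['last_seen'] = pd.to_datetime(data['last_seen'], format='%m/%d/%Y')


def flag_most_recent(df, keys, date_col='last_seen', flag_col='recent'):
    """Hypothetical helper: flag the row with the latest date_col per group.

    keys is a column name or a list of column names to group by.
    Assumes unique index labels and no missing values in date_col.
    """
    # Index label of the row holding the maximum date in each group
    idx = df.groupby(keys)[date_col].idxmax()
    out = df.copy()          # leave the caller's frame untouched
    out[flag_col] = 0
    out.loc[idx.to_numpy(), flag_col] = 1
    return out


# One grouping level, as in the question
flagged = flag_most_recent(data, 'car_brand')

# Two grouping levels; the car_owner values are invented for this example
with_owner = data.assign(car_owner=['ann', 'ann', 'bob', 'cid', 'dee'])
by_owner = flag_most_recent(with_owner, ['car_brand', 'car_owner'])
```

    With the sample data, flagged has the same recent column as the table above. In by_owner, h0 is flagged as well, because it is the only row for its brand and owner pair. Unlike comparing each row against a per-group value, idxmax picks exactly one row per group (the first occurrence of the maximum), so ties are not flagged twice.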