[SOLVED] Column to flag the most max row of a group

Column to flag the most max row of a group

With this code:

data = pd.DataFrame({'car_plate':['1v','2f','3a','h0','m1'],
'car_brand':['Honda','Honda','Ford','Audi','Audi'],
'last_seen':['01/01/2020','01/02/2020','01/03/2020','01/04/2020','01/05/2020']})

data['last_seen'] = pd.to_datetime(data['last_seen'])

I have the following dataframe:

	car_plate	car_brand	last_seen
0	1v	Honda	2020-01-01 00:00:00
1	2f	Honda	2020-01-02 00:00:00
2	3a	Ford	2020-01-03 00:00:00
3	h0	Audi	2020-01-04 00:00:00
4	m1	Audi	2020-01-05 00:00:00

I would like that for ever car_brand, I mark to 1 the row that have the most recent last_seen, so the output is:

	car_plate	car_brand	last_seen	recent
0	1v	Honda	2020-01-01 00:00:00	0
1	2f	Honda	2020-01-02 00:00:00	1
2	3a	Ford	2020-01-03 00:00:00	1
3	h0	Audi	2020-01-04 00:00:00	0
4	m1	Audi	2020-01-05 00:00:00	1

As the 2f is the most recent Honda, 3a is the most recent Ford, and the m1 is the most recent Audi.

I've been thinking of this idea, pseudo code: group by car_brand, get max last_seen, get the index of that row, create a column seen and set to 1 those indexes.

P.S.: I'm also wondering how it would work if I add more levels to the combintion, e.g car_owner, and I would like the most recent car_plate + car_owner combination for every car_brand.

Solution

One option is to create a temporary column, containing the last row per group, and then use a boolean mask to assign the 0s and 1s:

last = data.groupby('car_brand').last_seen.transform('last')
data.assign(recent = data.last_seen.eq(last).astype(int))

  car_plate car_brand  last_seen  recent
0        1v     Honda 2020-01-01       0
1        2f     Honda 2020-01-02       1
2        3a      Ford 2020-01-03       1
3        h0      Audi 2020-01-04       0
4        m1      Audi 2020-01-05       1