With this code:
data = pd.DataFrame({'car_plate':['1v','2f','3a','h0','m1'],
'car_brand':['Honda','Honda','Ford','Audi','Audi'],
'last_seen':['01/01/2020','01/02/2020','01/03/2020','01/04/2020','01/05/2020']})
data['last_seen'] = pd.to_datetime(data['last_seen'])
I have the following dataframe:
car_plate | car_brand | last_seen | |
---|---|---|---|
0 | 1v | Honda | 2020-01-01 00:00:00 |
1 | 2f | Honda | 2020-01-02 00:00:00 |
2 | 3a | Ford | 2020-01-03 00:00:00 |
3 | h0 | Audi | 2020-01-04 00:00:00 |
4 | m1 | Audi | 2020-01-05 00:00:00 |
I would like that for ever car_brand
, I mark to 1
the row that have the most recent last_seen
, so the output is:
car_plate | car_brand | last_seen | recent | |
---|---|---|---|---|
0 | 1v | Honda | 2020-01-01 00:00:00 | 0 |
1 | 2f | Honda | 2020-01-02 00:00:00 | 1 |
2 | 3a | Ford | 2020-01-03 00:00:00 | 1 |
3 | h0 | Audi | 2020-01-04 00:00:00 | 0 |
4 | m1 | Audi | 2020-01-05 00:00:00 | 1 |
As the 2f
is the most recent Honda, 3a
is the most recent Ford, and the m1
is the most recent Audi.
I've been thinking of this idea, pseudo code: group by car_brand
, get max last_seen
, get the index of that row, create a column seen
and set to 1 those indexes.
P.S.: I'm also wondering how it would work if I add more levels to the combintion, e.g car_owner
, and I would like the most recent car_plate
+ car_owner
combination for every car_brand
.
One option is to create a temporary column, containing the last row per group, and then use a boolean mask to assign the 0s and 1s:
last = data.groupby('car_brand').last_seen.transform('last')
data.assign(recent = data.last_seen.eq(last).astype(int))
car_plate car_brand last_seen recent
0 1v Honda 2020-01-01 0
1 2f Honda 2020-01-02 1
2 3a Ford 2020-01-03 1
3 h0 Audi 2020-01-04 0
4 m1 Audi 2020-01-05 1