I'm working on a multi-label image classification task. I have a dataframe with two columns (id and labels). I want to create a new column that checks the ids for duplicates and, where an id is duplicated (which is the case here), collects the additional labels for that id. The result should be a new column containing all labels for each id. I'm struggling to write the labels into the new column as a list. Can anyone help me with this?
My df has the following structure:
| id | labels |
| -------- | -------------- |
| x.jpg | label_1 |
| x.jpg | label_2 |
Desired dataframe:
| id | labels | all_labels |
| -------- | -------------- | -------------- |
| x.jpg | label_1 | [label_1, label_2, and any others that exist] |
| x.jpg | label_2 | |
I think this does what you want, although the format is a bit different:
newdf = df.groupby('id')['labels'].agg(list).reset_index(name='labels')
produces
id labels
0 x.jpg [label_1, label_2]
1 y.jpg [label_3]
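If you would rather keep one row per original entry and just add an `all_labels` column, one possible approach is to build the per-id lists with the same `groupby` and then map them back onto the `id` column. This is only a sketch: the sample data is reconstructed from the tables above, and `labels_per_id` is a name I made up for illustration.

```python
import pandas as pd

# Sample data reconstructed from the question and the output above (assumption)
df = pd.DataFrame({
    "id": ["x.jpg", "x.jpg", "y.jpg"],
    "labels": ["label_1", "label_2", "label_3"],
})

# Series indexed by id, where each value is the list of labels for that id
labels_per_id = df.groupby("id")["labels"].agg(list)

# Look up each row's id in that Series, so every row keeps its own label
# and also gets the full list for its id
df["all_labels"] = df["id"].map(labels_per_id)

print(df)
```

Here both `x.jpg` rows get `[label_1, label_2]` in `all_labels`, and the `y.jpg` row gets `[label_3]`. Note that rows with the same id share the same list object, so copy the lists first if you plan to modify them in place.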