I have a dataframe with various columns, three of which are columns with lists (each cell has a list). These three columns have mutually exclusive values.
I have another column which has the label (A, B, C, D or E)
vot_in_favour vot_against vot_abstention label
0 [A, B, C] [] [D, E] A
1 [A, D, E] [C] [B] C
2 [B, C] [A] [D, E] D
I want a column vote
which has the name of the column according to the label, like the following:
vote
0 vot_in_favour
1 vot_against
2 vot_abstention
I tried something like df1['vote'] = df.drop("label", axis=1).isin(df["label"]).any(1)
, but do not know how to make this to match any value of the lists. I have visited similar issues, but the list columns are posing a challenge.
You will somehow need to get the values out of the lists, compare with the corresponding labels and map that with the column labels.
The simplest way is to explode the columns sequentially, compare the values with 'label'
column values (using eq
), get the column labels (mul
) and get back into original shape (stack
+ groupby.first
).
exp_df = df.explode('vot_in_favour').explode('vot_against').explode('vot_abstention')
df['vote'] = (
exp_df
.filter(like='vot')
.eq(exp_df['label'], axis=0)
.pipe(lambda x: x.mul(x.columns))
.stack()
.replace('', pd.NA)
.groupby(level=0).first()
)
Yet another method is to explode the columns, compare with 'label'
and filter out duplicate rows.
# explode all three columns
exp_df = df.explode('vot_in_favour').explode('vot_against').explode('vot_abstention')
# compare labels with votes to find matches and assign index to be used later to remove duplicates
exp_df = exp_df.eq(exp_df.pop('label'), axis=0).assign(index=lambda x: x.index)
# remove duplicates and all False rows and get the matches in each row
df['vote'] = exp_df[exp_df.any(axis=1) & ~exp_df.duplicated() & ~exp_df.pop('index').duplicated()].idxmax(axis=1)
Both version produce the desired output as follows:
vot_in_favour vot_against vot_abstention label vote
0 [A, B, C] [] [D, E] A vot_in_favour
1 [A, D, E] [C] [B] C vot_against
2 [B, C] [A] [D, E] D vot_abstention