I have a dataset with unique ID with columns which have several bool values for each ID. Therefore, I need to convert these columns into a single categorical variables concatenating multiple true bool values. And if out of mentioned bool ID have 3 true values we assign to categories as "Win"
ID | BoolCol_1 | BoolCol_2 | BoolCol_3 | BoolCol_4 | Other Col 1 | Other Col 2 |
---|---|---|---|---|---|---|
1 | 1 | 2 | 2 | 1 | x | Y |
2 | 2 | 1 | 1 | 1 | A | b |
1 -> True 2 -> False
ID are unique.
I am not able to think in my head how to solve this puzzle
Use the following approach:
bool_cols = ['BoolCol_1', 'BoolCol_2', 'BoolCol_3', 'BoolCol_4']
cnts = df[bool_cols].stack().groupby(level=0).value_counts().unstack()[1]
df['cat_col'] = pd.Series(np.where(cnts >= 3, 'W', 'L'), dtype='category')
Now cat_col
is categorical column with fixed values W
(win), L
(lose)
In [229]: df
Out[229]:
ID BoolCol_1 BoolCol_2 BoolCol_3 BoolCol_4 Other Col 1 Other Col 2 cat_col
0 1 1 2 2 1 x Y L
1 2 2 1 1 1 A b W