I have a dataset like the below which was got by traversing through a hierarchy:
id | name | rank | manager_id | manager_id_1 | manager_id_2 | ... | manager_id_20 |
---|
There is not a set number of levels, so the data might well go to manager_id_25 in future.
For every row, I want to find the first column from manager_id onwards which satisfies a condition. The condition is to check if a specific substring is present in the column.
For example, if the row data is:
id | name | rank | manager_id | manager_id_1 | manager_id_2 | manager_id_3 | ... | result |
---|---|---|---|---|---|---|---|---|
xx1 | namexx | T | xx2-TR | xx3-FR | xx4-FR | xx22-ER | ... | xx3-FR |
and I want the first column that contains "FR" in the data, then "xx3-FR" has to be populated in the result column.
First, find all columns that are relevant, i.e. start manager_id
:
manager_columns = [col for col in df.columns if col.startswith('manager_id')]
Now find your substring within your selection and return the first column value that satisfies the condition:
df['result'] = df[manager_columns].apply(lambda row: next((val for val in row if 'FR' in val), None), axis=1)