pythonpandasnumpy

How to get the first column that satisfies condition


I have a dataset like the below which was got by traversing through a hierarchy:

id name rank manager_id manager_id_1 manager_id_2 ... manager_id_20

There is not a set number of levels, so the data might well go to manager_id_25 in future.

For every row, I want to find the first column from manager_id onwards which satisfies a condition. The condition is to check if a specific substring is present in the column.

For example, if the row data is:

id name rank manager_id manager_id_1 manager_id_2 manager_id_3 ... result
xx1 namexx T xx2-TR xx3-FR xx4-FR xx22-ER ... xx3-FR

and I want the first column that contains "FR" in the data, then "xx3-FR" has to be populated in the result column.


Solution

  • First, find all columns that are relevant, i.e. start manager_id:

    manager_columns = [col for col in df.columns if col.startswith('manager_id')]
    

    Now find your substring within your selection and return the first column value that satisfies the condition:

    df['result'] = df[manager_columns].apply(lambda row: next((val for val in row if 'FR' in val), None), axis=1)