pythondataframelistfilterisin

Python Dataframe: For Each Row Pick One Value from Five Different Columns And Place In New Column Based on if Value is in List


Let's say I'm starting with the following input table below:

date id val1 val2 val3 val4 val5 val_final
2023-08-29 B3241 496C
2023-09-08 A3290 349C 078F 274F
2023-09-12 D2903 349C 072F 307C 170F 201D
2023-09-14 I13490 497C 0349 303F 101A

The code to create the initial input table is below:

import pandas as pd
df = pd.DataFrame({'date':["2023-08-29","2023-09-08","2023-09-12", "2023-09-14"],'id':["B3241","A3290","D2903", "I13490"],'val1':["496C","349C","349C", "497C"], 'val2':["","078F","072F", "0349"], 'val3':["","274F","307C", "303F"], 'val4':["","","170F", "101A"], 'val5':["","","201D",""]})

I want to look at columns "val1" through "val5" and see which rows contain a value from my code list. I want to populate the "val_final" column accordingly (if the value is in the list).

code_list = ['349C', '303F', '201D', '497C']

If multiple columns contain values from code_list I want to pick the one that's on the right most column.

Given the above logic, my desired output table would look like this:

date id val1 val2 val3 val4 val5 val_final
2023-08-29 B3241 496C
2023-09-08 A3290 349C 078F 274F 349C
2023-09-12 D2903 349C 072F 307C 170F 201D 201D
2023-09-14 I13490 497C 349C 303F 101A 303F

I searched all over StackOverflow to try and solve this problem but to no avail. I assume doing if then else is an option starting with column "val5" and going down to column "val1" but I want to learn a more efficient way to do this.


Solution

  • IIUC you can use apply for each row with a function that returns the first value that exists in the code_list starting with the right-most columns:

    # returns the first match
    def lambda_func(row, code_list):
      for x in row:
        if x in code_list:
          return x  
      return None
    
    # define your list of "val" columns
    val_cols = [x for x in df.columns if x.startswith('val')][::-1]
    
    df['val_final'] = df.apply(lambda row: lambda_func(row[val_cols], code_list), axis=1)
    print(df)
    

    Output:

             date      id  val1  val2  val3  val4  val5 val_final
    0  2023-08-29   B3241  496C                              None
    1  2023-09-08   A3290  349C  078F  274F                  349C
    2  2023-09-12   D2903  349C  072F  307C  170F  201D      201D
    3  2023-09-14  I13490  497C  0349  303F  101A            303F