
Fill a column with values from other columns based on conditions, for a long DataFrame


I want to fill a new column of a DataFrame with values taken from other columns, using conditions to decide which value goes in. When the value in the column 'Essentieel_Optioneel' == 'essentieel', the new column should get the value from perc_essentieel_skill. The same applies when the value is 'optioneel'.

When I run this, I get an error:

conditions = [
    (df1['Essentieel_Optioneel'] == 'essentieel'),
    (df1['Essentieel_Optioneel'] == 'optioneel'),
]

values = df1[['perc_essentieel_skill', 'perc_essentieel_skill']]
df1['vector'] = np.select(conditions, values)

df1
    811 'list of cases must be same length as list of conditions')
    813 # Now that the dtype is known, handle the deprecated select([], []) case
    814 if len(condlist) == 0:

ValueError: list of cases must be same length as list of conditions

I suspect my DataFrame is too long: it has 19913 rows and 12 columns.

I think I may have to use a for loop.
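A minimal sketch of where the error comes from, using a hypothetical three-row stand-in for df1 (the numbers are made up for illustration). np.select compares the number of conditions with the number of choices, and len() of a DataFrame is its number of rows, not its number of columns. With two conditions and a three-row frame the check fails, so the frame's length only matters because it is being counted as the number of choices; a loop is not required.

```python
import numpy as np
import pandas as pd

# Hypothetical 3-row stand-in for the question's df1 (the real one has 19913 rows)
df1 = pd.DataFrame({
    'Essentieel_Optioneel': ['essentieel', 'essentieel', 'optioneel'],
    'perc_essentieel_skill': [0.10, 0.20, 0.30],
})

# One boolean mask per category; == tests equality (<= would compare strings alphabetically)
conditions = [
    df1['Essentieel_Optioneel'] == 'essentieel',
    df1['Essentieel_Optioneel'] == 'optioneel',
]

# len() of a DataFrame is its number of ROWS, so this "list of cases" has 3 entries, not 2
values = df1[['perc_essentieel_skill', 'perc_essentieel_skill']]
print(len(conditions), len(values))  # 2 3

error_message = None
try:
    np.select(conditions, values)
except ValueError as exc:
    error_message = str(exc)
print(error_message)

# Transposing gives one row of choices per condition, so the lengths now match
choices = values.T.to_numpy()
print(choices.shape)  # (2, 3)
```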


Solution

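  • You should provide a complete example for clarity, but assuming you want to use two different columns as replacements, you need to transpose the values and convert them to a NumPy array. np.select needs exactly one choice per condition, and len() of a DataFrame counts its rows (19913 here), not its columns; that mismatch is what raises the ValueError, so the length of the frame is not a problem in itself and no loop is needed. After .T.values, the array has one row per selected column, i.e. one per condition: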

    values = df1[['perc_essentieel_skill1', 'perc_essentieel_skill2']].T.values
    df1['vector'] = np.select(conditions, values)
    

    Or pass the column for each condition explicitly:

    df1['vector'] = np.select(conditions, [df1['perc_essentieel_skill1'],
                                           df1['perc_essentieel_skill2']])
    

    Example:

      Essentieel_Optioneel perc_essentieel_skill1 perc_essentieel_skill2 vector
    0           essentieel                     A1                     B1     A1
    1           essentieel                     A2                     B2     A2
    2            optioneel                     A3                     B3     B3
    

    If you have discrete categories in "Essentieel_Optioneel", you could also refactor the code to use an indexing lookup that maps each category to its column name:

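    # map each category to the column holding its replacement value
    d = {'essentieel': 'perc_essentieel_skill1',
         'optioneel': 'perc_essentieel_skill2'}
    
    # idx: per-row position of the column to use; cols: the distinct column names
    idx, cols = pd.factorize(df1['Essentieel_Optioneel'].map(d))
    
    # for each row i, pick the value in column cols[idx[i]]
    df1['vector'] = df1.reindex(cols, axis=1).to_numpy()[np.arange(len(df1)), idx]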
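A self-contained version of the answer, assuming the two replacement columns are named perc_essentieel_skill1 and perc_essentieel_skill2 and hold the placeholder values from the example table. It is a sketch that checks the transposed-values approach, the explicit-list approach and the factorize lookup all produce the same vector column.

```python
import numpy as np
import pandas as pd

# Placeholder data copied from the example table
df1 = pd.DataFrame({
    'Essentieel_Optioneel': ['essentieel', 'essentieel', 'optioneel'],
    'perc_essentieel_skill1': ['A1', 'A2', 'A3'],
    'perc_essentieel_skill2': ['B1', 'B2', 'B3'],
})

# One boolean mask per category
conditions = [
    df1['Essentieel_Optioneel'] == 'essentieel',
    df1['Essentieel_Optioneel'] == 'optioneel',
]

# 1) Transposed array: shape (2, n_rows), one row of choices per condition
values = df1[['perc_essentieel_skill1', 'perc_essentieel_skill2']].T.values
vector_select = np.select(conditions, values)

# 2) Explicit list with one Series per condition
vector_list = np.select(conditions, [df1['perc_essentieel_skill1'],
                                     df1['perc_essentieel_skill2']])

# 3) Lookup: map category -> column name, then pick one cell per row
d = {'essentieel': 'perc_essentieel_skill1',
     'optioneel': 'perc_essentieel_skill2'}
idx, cols = pd.factorize(df1['Essentieel_Optioneel'].map(d))
vector_lookup = df1.reindex(cols, axis=1).to_numpy()[np.arange(len(df1)), idx]

df1['vector'] = vector_select
print(df1)
```

One difference between the variants: in the lookup, a category missing from d becomes NaN after .map(), and pd.factorize encodes NaN as -1 by default, which would silently select the last column. np.select instead falls back to its default value (0) for rows that match no condition.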