python pandas mask categorical

Add categorical variable based on conditional selections / dataframe masks


I made three conditional selections on my dataframe, for example:

final_df[(final_df['acceptance_advice'] == 'standard') & (final_df['acceptance'] == 'ok')]
final_df[(final_df['acceptance_advice'] == 'not accepted') & (final_df['acceptance'] == 'ok')]
final_df[(final_df['acceptance_advice'] == 'postponed') & (final_df['acceptance'] == 'declined')]
  

Now I want to add a categorical variable (the class I am going to use for prediction) based on these selections: the first selection should be class 1, the second class 2, and the third class 3.

I have tried:

cat_1 = final_df[(final_df['acceptance_advice'] == 'standard') & (final_df['acceptance'] == 'ok')]
cat_2 = final_df[(final_df['acceptance_advice'] == 'not accepted') & (final_df['acceptance'] == 'ok')]
cat_3 = final_df[(final_df['acceptance_advice'] == 'postponed') & (final_df['acceptance'] == 'declined')]

final_df['class'] = (cat_1 | cat_2 | cat_3).astype(int)

But this only works for two categories (e.g. 0 and 1), not for three.

final_df looks something like this:

id          feature1    feature2    acceptance_advice  acceptance
some value  some value  some value  some value         some value
some value  some value  some value  some value         some value
some value  some value  some value  some value         some value
some value  some value  some value  some value         some value

I want it to look like this:

id          feature1    feature2    acceptance_advice  acceptance  class
some value  some value  some value  some value         some value  1
some value  some value  some value  some value         some value  2
some value  some value  some value  some value         some value  1
some value  some value  some value  some value         some value  3

I want to add a column class, which should be the class to be predicted.


Solution

  • You can add the class column by applying a function to each row:

    def set_class(row):
        # Called once per row (axis=1); returns the class label as an int
        if row['acceptance_advice'] == 'standard' and row['acceptance'] == 'ok':
            return 1
        elif row['acceptance_advice'] == 'not accepted' and row['acceptance'] == 'ok':
            return 2
        elif row['acceptance_advice'] == 'postponed' and row['acceptance'] == 'declined':
            return 3
        # Rows matching none of the conditions fall through and return None

    final_df['class'] = final_df.apply(set_class, axis=1)
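A note on why the attempt in the question gives only two values: cat_1, cat_2 and cat_3 there are filtered DataFrames rather than boolean masks, and even with proper masks, combining them with | and casting to int can only say whether a row matched any condition (0 or 1), not which one. As a possible vectorized alternative to the row-wise function, the sketch below uses numpy.select. The sample rows and the default label 0 for unmatched rows are assumptions made for illustration; only the column names and conditions come from the question.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; only the two column names come from the question.
final_df = pd.DataFrame({
    "acceptance_advice": ["standard", "not accepted", "standard", "postponed", "postponed"],
    "acceptance": ["ok", "ok", "ok", "declined", "ok"],
})

# Each condition is a boolean Series (a mask), not a filtered DataFrame.
conditions = [
    (final_df["acceptance_advice"] == "standard") & (final_df["acceptance"] == "ok"),
    (final_df["acceptance_advice"] == "not accepted") & (final_df["acceptance"] == "ok"),
    (final_df["acceptance_advice"] == "postponed") & (final_df["acceptance"] == "declined"),
]
labels = [1, 2, 3]

# np.select assigns, per row, the label of the first condition that is True.
# Rows matching no condition get the default (0 here, an arbitrary choice).
final_df["class"] = np.select(conditions, labels, default=0)

print(final_df)
```

This avoids calling a Python function per row, which may matter on large dataframes. If unmatched rows should be excluded from training, they can be filtered out afterwards by the default label.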