[SOLVED] Issue with imblearn SMOTENC: "TypeError: '<' not supported between instances of 'int' and 'str'"

I am using SMOTENC to solve an unbalanced classification problem.

    df_train, df_test = train_test_split(input_table_1_df, test_size=0.25, stratify=input_table_1_df["Target_Variable_SX_FASCIA_1"])

    ###### SMOTE ######
    # Create features table and target table
    df_x = df_train.loc[ : , df_train.columns != "Target_Variable_SX_FASCIA_1"] 
    df_y = df_train.drop(["Target_Variable_SX_FASCIA_1"], axis=1)
    
    # From pandas to numpy arrays
    from imblearn.over_sampling import SMOTENC
    
    df_X=df_x.to_numpy()
    df_Y=df_y.to_numpy()
    
    column_name_x=list(df_x.columns) 
    column_name_y=list(df_y.columns) 
    
    # Resampling
    smote_nc = SMOTENC(categorical_features=[0,1,2,3,4,5], random_state=0,sampling_strategy=.2)
    X_resampled, Y_resampled = smote_nc.fit_resample(df_X, df_Y)
    X_resampled_df= pd.DataFrame(X_resampled,columns=column_name_x)
    Y_resampled_df= pd.DataFrame(Y_resampled,columns=column_name_y)
    Training_set_Passivi_Fascia_1 = pd.concat([X_resampled_df, Y_resampled_df], axis=1)

I got the following error at line:

X_resampled, Y_resampled = smote_nc.fit_resample(df_X, df_Y)

TypeError: '<' not supported between instances of 'int' and 'str'

I understand that this is a matter of variable types, but I cannot figure out how to fix the error. I have already tried to:

Replace all missing values
Fix every variable type misspecification I could find

Other useful information: the first six columns of the dataset are strings; the others are doubles or integers.

Just ask if you need further information.

Thanks in advance.

Solution

It would be helpful if you could print the head of df_x and df_y.

Here is what I can infer from this line:

df_y = df_train.drop(["Target_Variable_SX_FASCIA_1"], axis=1)

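This drops the target variable and keeps the predictors, so df_y ends up holding the same columns as df_x, including the six string columns mixed with the numeric ones. That mix is likely what triggers the comparison error when SMOTENC inspects the labels. Assuming "Target_Variable_SX_FASCIA_1" is the column name of the target variable, the line should instead be: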

df_y = df_train["Target_Variable_SX_FASCIA_1"].values