
y contains previously unseen labels: 'Male' in LabelEncoder


I am trying to convert the categorical columns of my dataset into numerical values using LabelEncoder.

[Image: dataset]

Here is the conversion code:

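from sklearn.preprocessing import LabelEncoder

encoder = LabelEncoder()  # assumed: one encoder shared by all columns
for i in cat_columns:
    df[i] = encoder.fit_transform(df[i])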

After the conversion, the dataset looks like this:

[Image: dataset after transformation]

The problem is that whenever I try to transform my test dataset, I get this error:

y contains previously unseen labels: 'Male'

Code for the transformation of the test data:

for i in cat_columns:
    df1[i] = encoder.transform(df1[i])

[Image: test data]
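The failure can be reproduced with a small sketch. The data below and the column names gender and smoking_status are hypothetical stand-ins for the screenshots, and encoder is assumed to be a single LabelEncoder shared by all columns. Printing encoder.classes_ after the fitting loop shows which labels the encoder still knows when the test data is transformed.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Hypothetical toy data standing in for the train (df) and test (df1) sets.
df = pd.DataFrame({
    "gender": ["Male", "Female", "Female"],
    "smoking_status": ["never smoked", "smokes", "formerly smoked"],
})
df1 = df.copy()
cat_columns = ["gender", "smoking_status"]

# Assumed setup: a single encoder shared by every column.
encoder = LabelEncoder()
for i in cat_columns:
    # fit_transform refits the encoder, replacing the previous classes_.
    df[i] = encoder.fit_transform(df[i])

# Only the labels of the last fitted column survive.
print(encoder.classes_)

error_message = None
try:
    for i in cat_columns:
        df1[i] = encoder.transform(df1[i])
except ValueError as err:
    # "gender" is transformed first, and its labels are unknown to the encoder.
    error_message = str(err)
    print(error_message)
```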

How can I solve this problem?


Solution

  • The problem is most likely that you are using the same encoder to fit all the different columns. Each call to fit_transform refits the encoder and replaces the labels it learned before. You should instead fit each column with its own encoder. For example, you can use a dictionary to store one encoder per column:

    from sklearn import preprocessing
    
    encoders = {}
    for i in cat_columns:
        encoders[i] = preprocessing.LabelEncoder()
        df[i] = encoders[i].fit_transform(df[i])
        
    for i in cat_columns:
        df1[i] = encoders[i].transform(df1[i])
    

    The error you encounter (previously unseen labels: 'Male') occurs because you are transforming the gender column with the encoder that was fitted last in the previous for loop, which in your case is probably the smoking_status label encoder.
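Per-column LabelEncoders still raise this error if the test set contains a category that never appears in the training set. scikit-learn documents LabelEncoder as intended for target labels (y, which is why the message says "y contains..."), and provides OrdinalEncoder for feature columns. The sketch below is an alternative under stated assumptions: scikit-learn 0.24 or later (for handle_unknown="use_encoded_value"), hypothetical toy data, and -1 chosen arbitrarily as the code for unseen categories.

```python
import pandas as pd
from sklearn.preprocessing import OrdinalEncoder

# Hypothetical toy data; "Other" in the test set never appears in training.
df = pd.DataFrame({
    "gender": ["Male", "Female", "Female"],
    "smoking_status": ["never smoked", "smokes", "formerly smoked"],
})
df1 = pd.DataFrame({
    "gender": ["Male", "Other"],
    "smoking_status": ["smokes", "never smoked"],
})
cat_columns = ["gender", "smoking_status"]

# One OrdinalEncoder learns a separate, sorted category list per column.
# Unseen categories are mapped to unknown_value instead of raising an error.
encoder = OrdinalEncoder(handle_unknown="use_encoded_value", unknown_value=-1)
df[cat_columns] = encoder.fit_transform(df[cat_columns])
df1[cat_columns] = encoder.transform(df1[cat_columns])

print(encoder.categories_)
print(df1)
```

Because a single fitted object holds the categories of every column, the same encoder can be reused on the test set without a dictionary; the encoded values come back as floats.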