I am trying to convert the categorical columns of my dataset to numerical values using LabelEncoder. (Screenshot: dataset)
Here is the conversion code:
for i in cat_columns:
    df[i] = encoder.fit_transform(df[i])
After the conversion, the dataset looks like this: (Screenshot: dataset after transformation)
But whenever I try to transform my test dataset, it raises this error:
y contains previously unseen labels: 'Male'
Code for the transformation on the test data:
for i in cat_columns:
    df1[i] = encoder.transform(df1[i])
How can I solve this problem?
I guess the problem is that you are using the same encoder to fit all the different columns, so each call to fit_transform overwrites the labels learned from the previous column. You should instead fit each column with its own encoder. For example, you can use a dictionary to store the different encoders:
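from sklearn import preprocessing

# One LabelEncoder per categorical column, keyed by column name
encoders = {}
for i in cat_columns:
    encoders[i] = preprocessing.LabelEncoder()
    df[i] = encoders[i].fit_transform(df[i])

# Reuse each column's own fitted encoder on the test data
for i in cat_columns:
    df1[i] = encoders[i].transform(df1[i])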
The error you encounter (previously unseen labels: 'Male') is caused by the fact that you are trying to transform the gender column using the last encoder fitted in the previous for loop, which in your case might be the smoking_status label encoder.
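To make the explanation above concrete, here is a minimal, self-contained sketch that reproduces the error with a shared encoder and then avoids it with one encoder per column. The toy DataFrames, their values, and the names train, test, shared, and encoded_test are hypothetical stand-ins, since the real data is not shown in the question.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Hypothetical toy data; column names and values are illustrative only.
train = pd.DataFrame({
    "gender": ["Male", "Female", "Female"],
    "smoking_status": ["smokes", "never smoked", "smokes"],
})
test = pd.DataFrame({
    "gender": ["Female", "Male"],
    "smoking_status": ["never smoked", "smokes"],
})
cat_columns = ["gender", "smoking_status"]

# Shared encoder: every fit_transform call refits it, so after the loop
# it only remembers the labels of the last column (smoking_status).
shared = LabelEncoder()
for col in cat_columns:
    shared.fit_transform(train[col])

try:
    shared.transform(test["gender"])
    shared_failed = False
except ValueError as err:
    # Raised because the gender labels are unknown to the refitted encoder
    shared_failed = True
    print(err)

# One encoder per column: each keeps the labels of its own column.
encoders = {col: LabelEncoder().fit(train[col]) for col in cat_columns}

encoded_test = test.copy()
for col in cat_columns:
    encoded_test[col] = encoders[col].transform(test[col])

print(encoded_test)
```

Keeping the encoders in a dictionary also lets you map the integers back to the original labels later with encoders[col].inverse_transform(...).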