I'm working on a classification task using the XGBoost classifier. My dataset contains categorical variables, and my target classes are 'Dropout', 'Enrolled', and 'Graduate'.
from xgboost import XGBClassifier
xgb = XGBClassifier(
n_estimators=200,
max_depth=6,
learning_rate=0.1,
subsample=0.8,
colsample_bytree=0.8,
eval_metric='mlogloss'
)
xgb.fit(X_train, y_train)
I get the following error:
ValueError: Invalid classes inferred from unique values of `y`. Expected: [0 1 2], got ['Dropout' 'Enrolled' 'Graduate']
After encoding the target with a label encoder, training works fine. But in production I need the original labels ['Dropout' 'Enrolled' 'Graduate']. How can I convert the predicted [0 1 2] back to ['Dropout' 'Enrolled' 'Graduate'] after training the XGBClassifier?
LabelEncoder can not only transform your original target into numerical values, it can also inverse_transform numerical values back into your original target values.
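To make the round trip concrete, here is a rough plain-Python sketch of the idea. It is an illustration only, not scikit-learn's implementation: the helper names `fit_classes`, `encode`, and `decode` are hypothetical, and the sample `y_train` is made up. The one detail it mirrors from LabelEncoder is that classes are sorted, so a label's position in the sorted list is its integer code.

```python
# Plain-Python sketch of the label <-> integer round trip.
# Illustrative only: helper names and sample data are hypothetical,
# and this is not how scikit-learn implements LabelEncoder internally.

def fit_classes(labels):
    """Return the sorted unique labels; a label's index is its code."""
    return sorted(set(labels))

def encode(labels, classes):
    """Map each label to its integer code (like transform)."""
    index = {label: code for code, label in enumerate(classes)}
    return [index[label] for label in labels]

def decode(codes, classes):
    """Map each integer code back to its label (like inverse_transform)."""
    return [classes[code] for code in codes]

y_train = ["Graduate", "Dropout", "Enrolled", "Graduate", "Dropout"]
classes = fit_classes(y_train)
encoded = encode(y_train, classes)
decoded = decode(encoded, classes)

print(classes)
print(encoded)
print(decoded)
```

With a real LabelEncoder, the fitted `classes_` attribute plays the role of `classes` here, so the same fitted encoder has to be kept around (for example, saved next to the model) to decode predictions later.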
So your code should look like this:
from sklearn.preprocessing import LabelEncoder
from xgboost import XGBClassifier

# encode the string labels as integers 0..n_classes-1
le = LabelEncoder()
y_train_encoded = le.fit_transform(y_train)

# fit on the encoded target
xgb = XGBClassifier(
    n_estimators=200,
    max_depth=6,
    learning_rate=0.1,
    subsample=0.8,
    colsample_bytree=0.8,
    eval_metric='mlogloss'
)
xgb.fit(X_train, y_train_encoded)

# prediction
pred = xgb.predict(X_test)
# map the integer predictions back to the original string labels
original_pred = le.inverse_transform(pred)
Official documentation here.