python machine-learning xgboost valueerror

ValueError: Invalid classes inferred from unique values of `y`. Expected: [0 1 2], got ['Dropout' 'Enrolled' 'Graduate']


I'm working on a classification task using the XGBoost classifier model. My dataset contains categorical variables, and my target classes are 'Dropout', 'Enrolled', and 'Graduate'.

from xgboost import XGBClassifier

xgb = XGBClassifier(
    n_estimators=200,
    max_depth=6,  
    learning_rate=0.1,  
    subsample=0.8,
    colsample_bytree=0.8,
    eval_metric='mlogloss'  
)

xgb.fit(X_train, y_train)

I get the following error:

ValueError: Invalid classes inferred from unique values of `y`. Expected: [0 1 2], 
got ['Dropout' 'Enrolled' 'Graduate']

After encoding the target with LabelEncoder, training works fine. But in production I need the original labels ['Dropout' 'Enrolled' 'Graduate']. How can I convert the predicted [0 1 2] back to ['Dropout' 'Enrolled' 'Graduate'] after training the XGBClassifier?


Solution

  • LabelEncoder can not only transform your original target into numerical values; its inverse_transform method can also map numerical values back to your original target values.

    So your code should look like this:

    from sklearn.preprocessing import LabelEncoder
    from xgboost import XGBClassifier

    # encode the string labels as integers 0..n_classes-1
    le = LabelEncoder()
    y_train_encoded = le.fit_transform(y_train)

    # fit on the encoded target, not the original strings
    xgb = XGBClassifier(
        n_estimators=200,
        max_depth=6,
        learning_rate=0.1,
        subsample=0.8,
        colsample_bytree=0.8,
        eval_metric='mlogloss'
    )

    xgb.fit(X_train, y_train_encoded)

    # prediction: map the integer predictions back to the original labels
    pred = xgb.predict(X_test)
    original_pred = le.inverse_transform(pred)
    

    See the official scikit-learn documentation for LabelEncoder.
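
    Since the labels are needed in production, the class order has to be available wherever predictions are decoded. Below is a minimal sketch of that round trip using only the standard library. It is a hand-rolled stand-in for the encoder, not scikit-learn's implementation. It relies on the fact that LabelEncoder assigns codes by sorted label order (exposed as le.classes_). The sample labels, the classes.json path, and the hard-coded pred list are assumptions for illustration only.

```python
import json
import os
import tempfile

# Hypothetical training labels; in the question these come from y_train.
y_train = ["Graduate", "Dropout", "Enrolled", "Graduate", "Dropout"]

# Sorted unique labels: index i is the integer code for classes[i].
# LabelEncoder uses the same sorted ordering for its classes_ attribute.
classes = sorted(set(y_train))
to_code = {label: code for code, label in enumerate(classes)}

# Encode the target before fitting the model.
y_train_encoded = [to_code[label] for label in y_train]

# Persist the class order next to the model so production code can decode.
# The temporary directory and file name are placeholders.
mapping_path = os.path.join(tempfile.mkdtemp(), "classes.json")
with open(mapping_path, "w") as f:
    json.dump(classes, f)

# In production: reload the class order and decode integer predictions.
with open(mapping_path) as f:
    loaded_classes = json.load(f)

pred = [2, 0, 1]  # stand-in for the output of xgb.predict(X_test)
original_pred = [loaded_classes[code] for code in pred]
print(original_pred)
```

    If you keep the fitted LabelEncoder itself, saving it together with the model (for example with pickle) and calling le.inverse_transform at prediction time should achieve the same thing.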