I am trying to build a multiclass classification model in Python using XGBoost with a OneVsRest (OvR) scheme, like below:
from xgboost import XGBClassifier
from sklearn.model_selection import train_test_split
from sklearn.multiclass import OneVsRestClassifier
from sklearn.metrics import roc_auc_score

# stratified 70/30 train/test split on the target
X_train, X_test, y_train, y_test = train_test_split(abt.drop("TARGET", axis=1),
                                                    abt["TARGET"],
                                                    train_size=0.70,
                                                    test_size=0.30,
                                                    random_state=123,
                                                    stratify=abt["TARGET"])

model_1 = OneVsRestClassifier(XGBClassifier())
model_1.fit(X_train, y_train)
When I use the above code I get huge overfitting: AUC_TRAIN: 0.9988, AUC_TEST: 0.7650.
So, I decided to use class_weight.compute_class_weight:
import numpy as np
from sklearn.utils import class_weight

class_weights = class_weight.compute_class_weight(class_weight='balanced',
                                                  classes=np.unique(y_train),
                                                  y=y_train)
model_1.fit(X_train, y_train, class_weight=class_weights)
roc_auc_score(y_train, model_1.predict_proba(X_train), multi_class='ovr')
roc_auc_score(y_test, model_1.predict_proba(X_test), multi_class='ovr')
Nevertheless, when I try to use class_weight.compute_class_weight as above, I get the following error: TypeError: fit() got an unexpected keyword argument 'class_weight'
How can I fix that? Or maybe you have some other idea how to avoid such huge overfitting in my multiclass classification model in Python?
The issue in your case is that OneVsRestClassifier.fit does not accept a class_weight keyword argument, and the wrapper does not forward extra fit parameters to the base estimator (see the scikit-learn doc).
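If you do want frequency-based weights despite that, here is a minimal sketch of one workaround (assuming X_train/y_train from your split): skip the OvR wrapper and pass per-sample weights to XGBClassifier.fit, which does accept a sample_weight argument; XGBClassifier handles multiclass targets natively.
# turn 'balanced' class weights into one weight per training row
from sklearn.utils.class_weight import compute_sample_weight
from xgboost import XGBClassifier

sample_weights = compute_sample_weight(class_weight='balanced', y=y_train)
clf = XGBClassifier()
clf.fit(X_train, y_train, sample_weight=sample_weights)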
A way around this is to set XGBoost's own scale_pos_weight parameter (a float) directly in the XGBClassifier definition; within each one-vs-rest binary problem it rescales the weight of the positive class relative to the negative one:
model_1 = OneVsRestClassifier(XGBClassifier(scale_pos_weight=1))
Note that 1 is the default (no re-weighting); for imbalanced data the XGBoost docs suggest a value around sum(negative instances) / sum(positive instances). From the parameter documentation:
scale_pos_weight (Optional[float]) – Balancing of positive and negative weights.
See also the doc: https://xgboost.readthedocs.io/en/stable/python/python_api.html
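With the wrapper defined that way, fitting and scoring work exactly as in your snippet:
model_1.fit(X_train, y_train)
roc_auc_score(y_train, model_1.predict_proba(X_train), multi_class='ovr')
roc_auc_score(y_test, model_1.predict_proba(X_test), multi_class='ovr')
That said, class weighting alone rarely closes a train/test AUC gap this large; reining in the model's capacity usually matters more. A minimal sketch, assuming XGBoost >= 1.6 (where early_stopping_rounds moved into the constructor) and with illustrative hyperparameter values you would tune by cross-validation:
from xgboost import XGBClassifier
from sklearn.model_selection import train_test_split

# carve a validation set out of the training data for early stopping
X_tr, X_val, y_tr, y_val = train_test_split(X_train, y_train,
                                            test_size=0.2,
                                            random_state=123,
                                            stratify=y_train)

clf = XGBClassifier(n_estimators=500,
                    learning_rate=0.05,     # smaller steps per boosting round
                    max_depth=4,            # shallower, less expressive trees
                    subsample=0.8,          # row subsampling per tree
                    colsample_bytree=0.8,   # feature subsampling per tree
                    reg_lambda=1.0,         # L2 penalty on leaf weights
                    early_stopping_rounds=20,
                    eval_metric='mlogloss')
clf.fit(X_tr, y_tr, eval_set=[(X_val, y_val)], verbose=False)
Shallower trees, a lower learning rate with early-stopped rounds, and row/column subsampling are the usual first levers against this kind of overfitting.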