Tags: python, optimization, xgboost, hyperparameters

changing early_stopping_rounds on xgboost doesn't affect the performance. what's wrong?


I have a binary classification dataset and I'm using xgboost. I changed the early_stopping_rounds value and refit, but it gave the same results every time. I've shared screenshots below. What is the reason for the identical results?

[screenshot: results with early_stopping_rounds=10]

[screenshot: results with early_stopping_rounds=16]

[screenshot: results with early_stopping_rounds=30]

and lastly the per-iteration eval_metric plots:
[image: eval_metric curves over boosting iterations]


Solution

  • Early stopping works by ending training with fewer trees than n_estimators if the most recent early_stopping_rounds trees added have not improved performance on the evaluation set.

    When there are multiple evaluation sets, the last one is the one used for early stopping, and when there are multiple evaluation metrics, again the last one is used. So here we need to pay attention to the Test AUC curve in your plots. Since it keeps increasing all the way to 100 trees (the default n_estimators), early stopping never kicks in, no matter how many rounds you tell it to wait for an improvement; the first sketch below shows a quick way to confirm this by printing best_iteration.

    Also, from the docs:

    Note that xgboost.train() will return a model from the last iteration, not the best one.

    That's from the native API documentation, and I'm not certain whether that note applies equally to the scikit-learn API; the second sketch below shows how to sidestep the caveat in the native API by predicting with the best iteration explicitly.
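Here is a minimal sketch of how to verify this, using hypothetical synthetic data from make_classification (your real X/y would go in its place), and assuming xgboost >= 1.6, where eval_metric and early_stopping_rounds are constructor arguments rather than fit() arguments:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from xgboost import XGBClassifier

# Hypothetical stand-in for the binary classification dataset in the question.
X, y = make_classification(n_samples=2000, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

for rounds in (10, 16, 30):
    model = XGBClassifier(
        n_estimators=100,       # the default cap; early stopping can only shorten it
        eval_metric="auc",      # with several metrics, the last one drives stopping
        early_stopping_rounds=rounds,
    )
    # With several eval sets, the last tuple is the one monitored for stopping.
    model.fit(
        X_train, y_train,
        eval_set=[(X_train, y_train), (X_test, y_test)],
        verbose=False,
    )
    # If the monitored test AUC keeps improving through all 100 rounds,
    # best_iteration stays at 99 for every patience value, which would
    # explain the identical results in the question.
    print(rounds, model.best_iteration, model.best_score)
```

If best_iteration prints as 99 (that is, n_estimators - 1) for every value of early_stopping_rounds, early stopping never fired, and changing the patience cannot change the fitted model.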
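And here is a sketch of the same check with the native API, where the last-iteration caveat from the docs definitely applies. It assumes a version of xgboost recent enough to support iteration_range in predict (introduced around 1.4, so worth checking against your install), which lets you slice predictions to the best iteration explicitly:

```python
import xgboost as xgb
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

# Hypothetical synthetic data again, standing in for the real dataset.
X, y = make_classification(n_samples=2000, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

dtrain = xgb.DMatrix(X_train, label=y_train)
dtest = xgb.DMatrix(X_test, label=y_test)

bst = xgb.train(
    {"objective": "binary:logistic", "eval_metric": "auc"},
    dtrain,
    num_boost_round=100,
    evals=[(dtrain, "train"), (dtest, "test")],  # last entry drives early stopping
    early_stopping_rounds=10,
    verbose_eval=False,
)

# train() returns the model from the last iteration, so slice predictions
# to the best iteration explicitly to avoid the caveat from the docs.
preds = bst.predict(dtest, iteration_range=(0, bst.best_iteration + 1))
print(bst.best_iteration, bst.best_score)
```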