python machine-learning xgboost

XGBoost does not predict properly on input that's identical to the training data


Why does this fairly simple XGBoost example predict all zeros, even on input identical to the training data? This looks like a trivial case that should not require any fine-tuning, but even if I tweak the hyperparameters (max_depth, eta, and so on) nothing changes.

import pandas as pd
import xgboost as xgb

X = pd.DataFrame([[0], [1], [2], [3], [4], [5]], columns=['x'])
y = pd.DataFrame([0, 1, 0, 1, 0, 1], columns=['y'])

model = xgb.XGBClassifier()
model.fit(X, y)
print(model.predict([[0], [1], [2], [3], [4], [5]]))

[0 0 0 0 0 0]

Solution

  • There is nothing wrong with the code or the Python package.

    Based on the comment below this post, XGBoost's default regularization is indeed more aggressive than scikit-learn's. On a small, non-monotonic dataset like yours, that regularization can prevent the model from making any splits at all, so the prediction collapses to a constant (all 0s). Loosening those regularization parameters (for example min_child_weight, reg_lambda, gamma) gives XGBoost enough flexibility to overfit, allowing it to capture the alternating pattern.

    I just tried scikit-learn's GradientBoostingClassifier on the same data and it works as intended.