python machine-learning xgboost

XGBoost does not predict properly on input that's identical to the training data


Why does this fairly simple XGBoost example predict all zeros, even on input identical to the training data? This looks like a trivial case that should not require any fine-tuning, but even if I tweak the hyperparameters (max_depth, eta, and so on) nothing changes.

import pandas as pd
import xgboost as xgb

X = pd.DataFrame([[0], [1], [2], [3], [4], [5]], columns=['x'])
y = pd.DataFrame([0, 1, 0, 1, 0, 1], columns=['y'])

model = xgb.XGBClassifier()
model.fit(X, y)
print(model.predict([[0], [1], [2], [3], [4], [5]]))

[0 0 0 0 0 0]

Solution

  • There is nothing wrong with the code or the Python package.

    Based on the comment below this post, XGBoost's default regularization is indeed more aggressive than scikit-learn's. On a small, non-monotonic dataset like yours, that regularization can prevent the model from making any splits at all, so the prediction collapses to a constant (all 0s). Loosening those regularization parameters (for example min_child_weight, reg_lambda, gamma) gives XGBoost enough flexibility to overfit, allowing it to capture the alternating pattern.

    I just tried scikit-learn's GradientBoostingClassifier on the same data and it works as intended.