I'm curious why a single-estimator AdaBoost "ensemble", a single-estimator Gradient Boosted "ensemble", and a single decision tree give different values.
The code below compares three models, all using the same base estimator (a regression tree with max_depth=4 and an MSE-based loss).
Extracting and inspecting the trees shows they are very different, even though each should have been trained in the same fashion.
from sklearn.datasets import load_diabetes
from sklearn.ensemble import AdaBoostRegressor, GradientBoostingRegressor
from sklearn.tree import DecisionTreeRegressor, export_text

data = load_diabetes()
X = data['data']
y = data['target']

# The same base learner in all three cases: a depth-4 regression tree.
simple_model = DecisionTreeRegressor(max_depth=4)
prototype = DecisionTreeRegressor(max_depth=4)
simple_ada = AdaBoostRegressor(prototype, n_estimators=1)
# criterion='mse' is spelled 'squared_error' in newer scikit-learn versions
simple_gbr = GradientBoostingRegressor(max_depth=4, n_estimators=1, criterion='mse')

simple_model.fit(X, y)
simple_ada.fit(X, y)
simple_gbr.fit(X, y)

# Pull the single fitted tree out of each "ensemble".
ada_one = simple_ada.estimators_[0]
gbr_one = simple_gbr.estimators_[0][0]

print(export_text(simple_model))
print(export_text(ada_one))
print(export_text(gbr_one))
AdaBoostRegressor performs weighted bootstrap sampling for each of its trees (unlike AdaBoostClassifier, which IIRC just fits the base classifier using sample weights): source. So there's no way to enforce that a single-tree AdaBoost regressor matches a single decision tree (without, I suppose, doing the bootstrap sampling manually and fitting the single decision tree on that sample, as sketched below).
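Here is a minimal sketch of what that manual bootstrap might look like. It is a simplification of, not a copy of, the library's internals; the seed, variable names, and the use of numpy's choice are my own assumptions, and with a single estimator the sample weights are still uniform, so this is just an ordinary bootstrap draw before fitting the tree.

import numpy as np
from sklearn.datasets import load_diabetes
from sklearn.tree import DecisionTreeRegressor

X, y = load_diabetes(return_X_y=True)
rng = np.random.RandomState(0)  # assumed seed, purely for illustration

n = X.shape[0]
sample_weight = np.full(n, 1.0 / n)  # AdaBoost starts from uniform weights

# Weighted bootstrap: draw n indices with replacement, with probability
# proportional to the current sample weights (uniform before the first tree).
bootstrap_idx = rng.choice(np.arange(n), size=n, replace=True, p=sample_weight)

manual_tree = DecisionTreeRegressor(max_depth=4, random_state=0)
manual_tree.fit(X[bootstrap_idx], y[bootstrap_idx])

This fitted tree will generally differ from a tree fit on the full data, which is the point: the resampling, not the boosting math, is what makes the single-tree AdaBoost regressor diverge from the plain tree.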
GradientBoostingRegressor has an initial value for each sample to boost from:
init : estimator or 'zero', default=None
An estimator object that is used to compute the initial predictions. init has to provide fit and predict. If 'zero', the initial raw predictions are set to zero. By default a DummyEstimator is used, predicting either the average target value (for loss='squared_error'), or a quantile for the other losses.
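You can check that default directly on the fitted object from the question's code (assuming simple_gbr, X, and y are still in scope; this check is mine, not part of the original snippet):

import numpy as np

# The default init_ predicts the mean of y, so boosting starts from y.mean()
# and the first tree is fit on the residuals y - y.mean().
init_pred = np.ravel(simple_gbr.init_.predict(X[:1]))[0]
print(init_pred, y.mean())  # the two numbers should match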
So the main difference between your tree and the single-estimator GBM is that the latter's leaf values are shifted by the average target value. Setting init='zero' gets us much closer, but I do see some differences in chosen splits further down the tree. That is due to ties in optimal split values, and can be fixed by setting a common random_state throughout, as in the sketch below.
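Putting the two fixes together, something like the following should make the two export_text dumps line up. This is a sketch under a couple of assumptions: criterion='squared_error' is the newer spelling of the question's criterion='mse', and the particular random_state value is arbitrary, it only needs to be the same for both models.

from sklearn.datasets import load_diabetes
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.tree import DecisionTreeRegressor, export_text

X, y = load_diabetes(return_X_y=True)

# Same depth, same split criterion, same seed for breaking ties between splits.
tree = DecisionTreeRegressor(criterion='squared_error', max_depth=4, random_state=0)
gbr = GradientBoostingRegressor(
    max_depth=4,
    n_estimators=1,
    criterion='squared_error',  # 'mse' in older scikit-learn versions
    init='zero',                # start boosting from 0 instead of y.mean()
    learning_rate=1.0,          # so gbr.predict is not a shrunken version of the tree
    random_state=0,
)

tree.fit(X, y)
gbr.fit(X, y)

# With init='zero' and a common random_state the two trees should now agree.
print(export_text(tree))
print(export_text(gbr.estimators_[0][0]))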