machine-learning, scikit-learn, decision-tree, ensemble-learning, adaboost

Why does AdaBoost or GradientBoosting ensemble with a single estimator give different values than the single estimator?


I'm curious why a single-estimator AdaBoost "ensemble", a single-estimator gradient-boosted "ensemble", and a single decision tree give different values.

The code below compares three models, all built on the same base estimator: a regression tree with max_depth=4 and a squared-error (MSE) loss.

  1. The base estimator as a bare tree model
  2. A single-estimator Adaboost using the base estimator as a prototype
  3. A single-estimator GBR using the base estimator as a prototype

Extracting and inspecting the trees shows that they are very different, even though each should have been trained in the same fashion.

from sklearn.datasets import load_diabetes
from sklearn.ensemble import AdaBoostRegressor, GradientBoostingRegressor
from sklearn.tree import DecisionTreeRegressor, export_text

data = load_diabetes()
X = data['data']
y = data['target']

simple_model = DecisionTreeRegressor(max_depth=4)   # the bare tree
prototype = DecisionTreeRegressor(max_depth=4)      # base estimator for AdaBoost
simple_ada = AdaBoostRegressor(prototype, n_estimators=1)
simple_gbr = GradientBoostingRegressor(max_depth=4, n_estimators=1,
                                       criterion='squared_error')

simple_model.fit(X, y)
simple_ada.fit(X, y)
simple_gbr.fit(X, y)

ada_one = simple_ada.estimators_[0]       # the single tree inside the AdaBoost ensemble
gbr_one = simple_gbr.estimators_[0][0]    # the single tree inside the GBR ensemble

print(export_text(simple_model))
print(export_text(ada_one))
print(export_text(gbr_one))

Solution

  • AdaBoostRegressor performs weighted bootstrap sampling for each of its trees (unlike AdaBoostClassifier, which, if I recall correctly, just fits the base classifier using sample weights); see the scikit-learn source. So there is no way to force a single-tree AdaBoost regressor to match a single decision tree, short of doing the weighted bootstrap sampling manually and fitting the single decision tree on that sample (a sketch of this follows after this list).


  • GradientBoostingRegressor has an initial value for each sample to boost from:

    init : estimator or ‘zero’, default=None
    An estimator object that is used to compute the initial predictions. init has to provide fit and predict. If ‘zero’, the initial raw predictions are set to zero. By default a DummyEstimator is used, predicting either the average target value (for loss=’squared_error’), or a quantile for the other losses.

    So the main difference between your tree and the single-estimator GBM is that the latter's leaf values are shifted by the average target value, because the first tree is fit to the residuals y - mean(y) rather than to y itself. Setting init='zero' gets us much closer, but I still see some differences in chosen splits further down the tree. Those are due to ties in the optimal split values and can be removed by setting a common random_state throughout; see the sketch below.
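
To illustrate the first point, here is a minimal sketch of the weighted bootstrap idea, assuming uniform first-round weights and an arbitrary seed. It only mimics what AdaBoostRegressor does internally and does not replay scikit-learn's exact RNG calls, so the resulting tree need not equal ada_one; the point is simply that a tree fit on a resampled training set generally differs from a tree fit on the full data.

import numpy as np
from sklearn.datasets import load_diabetes
from sklearn.tree import DecisionTreeRegressor, export_text

X, y = load_diabetes(return_X_y=True)

# First boosting round: all sample weights are uniform, so this is an
# ordinary bootstrap; later rounds would tilt p toward hard-to-fit samples.
rng = np.random.RandomState(0)           # illustrative seed, not sklearn's internal state
weights = np.full(len(y), 1 / len(y))
idx = rng.choice(len(y), size=len(y), replace=True, p=weights)

bootstrap_tree = DecisionTreeRegressor(max_depth=4, random_state=0)
bootstrap_tree.fit(X[idx], y[idx])

plain_tree = DecisionTreeRegressor(max_depth=4, random_state=0)
plain_tree.fit(X, y)

# A tree fit on the bootstrap sample generally differs from the tree fit on
# the original data, which is why the single-estimator AdaBoost tree does not
# match the bare tree.
print(export_text(bootstrap_tree) == export_text(plain_tree))  # typically False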
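
For the second point, here is a sketch of how close you can get once init='zero' and a common random_state are set; the seed value and the explicit criterion are my choices for illustration. With init='zero' the first tree is fit to the raw targets rather than to y - mean(y), and the shared seed breaks tied splits the same way in both models.

from sklearn.datasets import load_diabetes
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.tree import DecisionTreeRegressor, export_text

X, y = load_diabetes(return_X_y=True)

tree = DecisionTreeRegressor(max_depth=4, random_state=0)
tree.fit(X, y)

gbr = GradientBoostingRegressor(
    max_depth=4,
    n_estimators=1,
    criterion='squared_error',  # same split criterion as the plain tree
    init='zero',                # boost from 0 instead of from mean(y)
    random_state=0,             # resolve tied splits the same way
)
gbr.fit(X, y)

# With these settings the single boosted tree should line up with the bare tree.
print(export_text(tree) == export_text(gbr.estimators_[0][0]))  # expected: True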