python · scikit-learn · hyperparameters · gbm

How to find the optimum number of estimators using "OOB" method in sklearn boosting?


The gbm package in R has a function gbm.perf that finds the optimum number of trees for a model using different methods such as "Out-of-Bag" or "Cross-Validation" error, which helps avoid over-fitting.

Does gradient boosting in the scikit-learn library in Python also have a similar function to find the optimum number of trees using the "out-of-bag" method?

# R code

mod1 = gbm(var ~ ., data = dat, interaction.depth = 3)
best.iter = gbm.perf(mod1, method = "OOB")
scores = mean(predict(mod1, x, best.iter))

# Python code

modl = GradientBoostingRegressor(max_depth=3)
modl.fit(x, y)
scores = np.mean(modl.predict(dat))

Solution

  • Yes, gradient boosting in scikit-learn also has a way to find the best number of iterations using OOB estimates, just like in R: the fitted model exposes an oob_improvement_ attribute.

    "In order to use oob_improvement_ in GradientBoostingRegressor, subsample must be less than 1.0 (0.5 is a common choice)."

    import numpy as np
    from sklearn import ensemble

    # Fit regressor with out-of-bag estimates
    # (assumes x and y are already defined, as in the question)
    params = {
        "n_estimators": 1200,
        "max_depth": 3,
        "subsample": 0.5,
    }
    modl = ensemble.GradientBoostingRegressor(**params)
    modl.fit(x, y)  # oob_improvement_ is only populated after fitting

    n_estimators = params["n_estimators"]
    z = np.arange(n_estimators) + 1
    # negative cumulative sum of OOB improvements = cumulative OOB loss
    cumsum = -np.cumsum(modl.oob_improvement_)
    # iteration with minimum loss according to OOB
    oob_best_iter = z[np.argmin(cumsum)]
    print(oob_best_iter)

    # Refit with the OOB-optimal number of trees
    modl = ensemble.GradientBoostingRegressor(
        max_depth=3, subsample=0.5, n_estimators=oob_best_iter
    )
    modl.fit(x, y)
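To see the idea end to end without your own data, here is a minimal self-contained sketch on synthetic data (make_regression and the parameter values below are illustrative assumptions, not from the original question):

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor

# Synthetic regression problem standing in for the question's x and y
x, y = make_regression(n_samples=500, n_features=10, noise=10.0, random_state=0)

# subsample < 1.0 is required, otherwise oob_improvement_ is not computed
model = GradientBoostingRegressor(
    n_estimators=500, max_depth=3, subsample=0.5, random_state=0
)
model.fit(x, y)

# Cumulative OOB loss; its minimum marks the OOB-optimal iteration
cum_oob_loss = -np.cumsum(model.oob_improvement_)
best_iter = int(np.argmin(cum_oob_loss)) + 1
print(best_iter)
```

The OOB estimate is known to be pessimistic (it tends to stop too early compared with cross-validation), so treat best_iter as a cheap first guess rather than a final answer.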