The gbm package in R has a function, gbm.perf, that finds the optimum number of trees for a model using methods such as "Out-of-Bag" or "Cross-Validation" error, which helps to avoid over-fitting.
Does gradient boosting in the scikit-learn library in Python have a similar function for finding the optimum number of trees using the "out of bag" method?
#r code
mod1 = gbm(var~.,data=dat, interaction.depth = 3)
best.iter = gbm.perf(mod1,method="OOB")
scores = mean(predict(mod1,x,best.iter))
# Python code
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

modl = GradientBoostingRegressor(max_depth=3)
modl.fit(x, y)
scores = np.mean(modl.predict(x))
Yes, gradient boosting in scikit-learn also offers a way to find the best number of iterations using OOB estimates, just as in R. You can refer to the link below.
Note: to use oob_improvement_ in scikit-learn's gradient boosting, subsample must be less than 1.0.
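# Fit regressor with out-of-bag estimates
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

params = {
    "n_estimators": 1200,
    "max_depth": 3,
    "subsample": 0.5,  # must be < 1.0 for OOB estimates to be recorded
}
modl = GradientBoostingRegressor(**params)
# x, y: training features and target; fit before reading oob_improvement_
modl.fit(x, y)
n_estimators = params["n_estimators"]
z = np.arange(n_estimators) + 1
# negative cumulative sum of oob improvements
cumsum = -np.cumsum(modl.oob_improvement_)
# min loss according to OOB
oob_best_iter = z[np.argmin(cumsum)]
print(oob_best_iter)
# refit using only the OOB-optimal number of trees
modl = GradientBoostingRegressor(max_depth=3, subsample=0.5, n_estimators=int(oob_best_iter))
modl.fit(x, y)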
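
The R snippet in the question predicts with only the first best.iter trees instead of refitting. A rough scikit-learn equivalent, as a sketch, is to fit once and read the prediction at that stage from staged_predict. The synthetic dataset from make_regression, the smaller n_estimators, and the names model, best_iter and pred_at_best are illustrative assumptions, not part of the code above:

```python
from itertools import islice

import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor

# Synthetic data for illustration only (an assumption, not the asker's x and y)
X, y = make_regression(n_samples=500, n_features=10, noise=10.0, random_state=0)

# subsample < 1.0 is what makes scikit-learn record oob_improvement_
model = GradientBoostingRegressor(
    n_estimators=300, max_depth=3, subsample=0.5, random_state=0
)
model.fit(X, y)

# Negative cumulative OOB improvement; its minimum marks the OOB-optimal tree count
oob_loss = -np.cumsum(model.oob_improvement_)
best_iter = int(np.argmin(oob_loss)) + 1  # 1-based number of trees

# staged_predict yields predictions after 1, 2, ..., n_estimators trees,
# so the item at index best_iter - 1 uses exactly best_iter trees
pred_at_best = next(islice(model.staged_predict(X), best_iter - 1, None))
scores = np.mean(pred_at_best)
print(best_iter, scores)
```

This avoids a second fit. Both the gbm and the scikit-learn documentation caution that the OOB estimate tends to be conservative about the number of trees, so it may be worth comparing the result against a held-out set.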