rmachine-learninggradient-descentgbmboosting

R: implementing my own gradient boosting algorithm


I am trying to write my own gradient boosting algorithm. I understand there are existing packages like gbm and xgboost, but I wanted to understand how the algorithm works by writing my own.

I am using the iris data set, and my outcome is Sepal.Length (continuous). My loss function is mean(1/2*(y-yhat)^2) (basically the mean squared error with 1/2 in front), so my corresponding gradient is just the residual y - yhat. I'm initializing the predictions at 0.

library(rpart)
data(iris)

#Define gradient
grad.fun <- function(y, yhat) {return(y - yhat)}

mod <- list()

grad_boost <- function(data, learning.rate, M, grad.fun) {
  # Initialize fit to be 0
  fit <- rep(0, nrow(data))
  grad <- grad.fun(y = data$Sepal.Length, yhat = fit)

  # Initialize model
  mod[[1]] <- fit

  # Loop over a total of M iterations
  for(i in 1:M){

    # Fit base learner (tree) to the gradient
    tmp <- data$Sepal.Length
    data$Sepal.Length <- grad
    base_learner <- rpart(Sepal.Length ~ ., data = data, control = ("maxdepth = 2"))
    data$Sepal.Length <- tmp

    # Fitted values by fitting current model
    fit <- fit + learning.rate * as.vector(predict(base_learner, newdata = data))

    # Update gradient
    grad <- grad.fun(y = data$Sepal.Length, yhat = fit)

    # Store current model (index is i + 1 because i = 1 contain the initialized estiamtes)
    mod[[i + 1]] <- base_learner

  }
  return(mod)
}

With this, I split up the iris data set into a training and testing data set and fit my model to it.

train.dat <- iris[1:100, ]
test.dat <- iris[101:150, ]
learning.rate <- 0.001
M = 1000
my.model <- grad_boost(data = train.dat, learning.rate = learning.rate, M = M, grad.fun = grad.fun)

Now I calculate the predicted values from my.model. For my.model, the fitted values are 0 (vector of initial estimates) + learning.rate * predictions from tree 1 + learning rate * predictions from tree 2 + ... + learning.rate * predictions from tree M.

yhats.mymod <- apply(sapply(2:length(my.model), function(x) learning.rate * predict(my.model[[x]], newdata = test.dat)), 1, sum)

# Calculate RMSE
> sqrt(mean((test.dat$Sepal.Length - yhats.mymod)^2))
[1] 2.612972

I have a few questions

  1. Does my gradient boosting algorithm look right?
  2. Did I calculate the predicted values yhats.mymod correctly?

Solution

    1. Yes this looks correct. At each step you are fitting to the psuedo-residuals, which are computed as the derivative of loss with respect to the fit. You have correctly derived this gradient at the start of your question, and even bothered to get the factor of 2 right.
    2. This also looks correct. You are aggregating across the models, weighted by learning rate, just as you did during training.

    But to address something that was not asked, I noticed that your training setup has a few quirks.