
Does set.seed() inside a loop have any effect if foldid is fixed in cv.glmnet()?


I'm working with survival models (family = "cox") using the cv.glmnet() function from the glmnet package in R, and I'm trying to understand how set.seed() interacts with the foldid argument. This may be an easy question, but I'm quite confused by the results I obtained.

In the code below, I generate a fixed foldid before entering a loop that runs cross-validation multiple times with different seeds:

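library(glmnet)

# Fold assignment drawn once, before the loop
set.seed(849)
foldid <- sample(1:10, size = nrow(x), replace = TRUE)

n <- 100
lambdas <- NULL

for (i in 1:n) {
  set.seed(i)  # a different seed on every iteration
  fit <- cv.glmnet(x, y, family = "cox", alpha = best_alpha, foldid = foldid)
  # Store the full lambda path together with its mean cross-validated error
  errors <- data.frame(lambda = fit$lambda, cvm = fit$cvm)
  lambdas <- rbind(lambdas, errors)
}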

My goal was to observe how lambda.min varies across different cross-validation splits. But now I suspect that, since foldid is already fixed, cv.glmnet() never re-samples the folds. Am I just fitting the same model 100 times?

When I tried it, I got only minimally different results, which didn't clarify anything and confused me a little more.


Solution

  • Yes. This is the only piece of glmnet::cv.glmnet that assigns the folds:

    if (is.null(foldid)) 
       foldid = sample(rep(seq(nfolds), length = N))
    

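    So if you supply foldid, no resampling occurs and set.seed(i) does not change the fold assignment. Every iteration uses the same folds, so you are effectively repeating the same cross-validation 100 times.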
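To actually see how lambda.min varies across splits, the folds have to be re-drawn inside the loop, after set.seed(i). Below is a minimal sketch of that, assuming the glmnet package is installed. The data are simulated, and x, y, best_alpha, n_reps and the helper make_foldid() are illustrative placeholders: make_foldid() is not a glmnet function, it only mirrors the balanced sample(rep(...)) assignment quoted above. It uses 20 repetitions instead of 100 just to keep it quick.

```r
# Sketch: re-draw the folds on every iteration so each run uses a different split.
# Assumes glmnet is installed; the data below are simulated for illustration only.
library(glmnet)

set.seed(1)
n_obs <- 200
p <- 20
x <- matrix(rnorm(n_obs * p), nrow = n_obs, ncol = p)
# Hypothetical survival outcome: a two-column matrix with columns "time" and "status"
y <- cbind(time = rexp(n_obs, rate = exp(0.5 * x[, 1])),
           status = rbinom(n_obs, size = 1, prob = 0.7))

# Hypothetical helper mirroring cv.glmnet's default (balanced) fold assignment
make_foldid <- function(n_obs, nfolds = 10) {
  sample(rep(seq_len(nfolds), length.out = n_obs))
}

best_alpha <- 1  # placeholder; use your own tuned value
n_reps <- 20
lambda_min <- numeric(n_reps)
fold_list <- vector("list", n_reps)

for (i in seq_len(n_reps)) {
  set.seed(i)                     # the seed now determines the folds
  foldid <- make_foldid(nrow(x))  # new split on every iteration
  fold_list[[i]] <- foldid
  fit <- cv.glmnet(x, y, family = "cox", alpha = best_alpha, foldid = foldid)
  lambda_min[i] <- fit$lambda.min
}

summary(lambda_min)
```

An equivalent option is to drop foldid altogether and call set.seed(i) right before cv.glmnet(..., nfolds = 10), since the default branch quoted above then samples new folds from the current seed. The rep()-based assignment also gives folds of equal size, whereas sample(1:10, size = nrow(x), replace = TRUE) produces folds of random, unequal size.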