I'm working with survival models (family = "cox") using the cv.glmnet() function from the glmnet package in R, and I'm trying to understand how set.seed() interacts with the foldid argument. Maybe it's an easy question, but I'm pretty confused by the results I obtained.
In the code below, I generate a fixed foldid before entering a loop that runs cross-validation multiple times with different seeds:
set.seed(849)
foldid <- sample(1:10, size = nrow(x), replace = TRUE)
n <- 100
lambdas <- NULL
for (i in 1:n) {
  set.seed(i)
  fit <- cv.glmnet(x, y, family = "cox", alpha = best_alpha, foldid = foldid)
  errors <- data.frame(lambda = fit$lambda, cvm = fit$cvm)
  lambdas <- rbind(lambdas, errors)
}
My goal was to observe how lambda.min varies across different cross-validation splits. But now I'm not sure: since foldid is already fixed, cv.glmnet() does not re-sample the folds. Am I just fitting the same model 100 times?
I tried it once and obtained minimally different results, which did not shed any light and confused me a bit more.
Yes, this is the only relevant piece of code from glmnet::cv.glmnet:
if (is.null(foldid))
  foldid = sample(rep(seq(nfolds), length = N))
So, if you provide foldid, no further resampling occurs. You are fitting to the same folds each time.
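If the aim is to see how lambda.min moves across different splits, one option is to draw a new foldid inside the loop, so that set.seed(i) actually controls the fold assignment. The following is only a minimal sketch: the simulated x and y, best_alpha = 0.5, nfolds = 10 and the shorter n = 20 are stand-ins assumed for illustration, not the data or settings from the question.

```r
library(glmnet)

# Simulated stand-ins for the question's x, y and best_alpha (assumptions)
set.seed(849)
n_obs <- 200
x <- matrix(rnorm(n_obs * 10), nrow = n_obs)
# Two-column response with "time" and "status", as accepted by family = "cox"
y <- cbind(time   = rexp(n_obs, rate = exp(0.5 * x[, 1])),
           status = rbinom(n_obs, 1, 0.7))
best_alpha <- 0.5
nfolds <- 10

n <- 20
lambda_min <- numeric(n)
for (i in seq_len(n)) {
  set.seed(i)
  # Redraw the folds on every iteration, using the same balanced scheme
  # that cv.glmnet applies when foldid is NULL
  foldid <- sample(rep(seq(nfolds), length = nrow(x)))
  fit <- cv.glmnet(x, y, family = "cox", alpha = best_alpha, foldid = foldid)
  lambda_min[i] <- fit$lambda.min
}

# Spread of lambda.min across the different splits
summary(lambda_min)
```

Going by the snippet quoted above, dropping the foldid argument and keeping set.seed(i) inside the loop should have the same effect, since cv.glmnet then samples the folds itself.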