rmachine-learningregressionmodel-comparison

Is there a way in R to determine AIC from cv.glmnet?


I am using the glmnet package in R, and not(!) the caret package for my binary ElasticNet regression. I have come to the point where I would like to compare models (e.g. lambda set to lambda.1se or lambda.min, and models where k-fold is set to 5 or 10). But, I have not yet achieved to compute the AICc or BIC for my models. How do I do that? I tried this and this but it did not work for me, I only get an empty list. Code:

set.seed(123)
foldid <- sample(rep(seq(10), length.out = nrow(x.train)))
list.of.fits.df <- list()
for (i in 0:10){
     fit.name <- paste0("alpha", i/10) 
     list.of.fits.df[[fit.name]] <- cv.glmnet(x.train, y.train, type.measure = c("auc"), alpha = i/10, family = "binomial", nfolds = 10, foldid = foldid, parallel = TRUE)
 }
best.fit <- coef(list.of.fits.df[[fit.name]], s = list.of.fits.df[[fit.name]]$lambda.1se)
best.fit.min <- coef(list.of.fits.df[[fit.name]], s = list.of.fits.df[[fit.name]]$lambda.min)

#AICc & BIC
#???

How can I find the AICc and BIC for my best fit model?


Solution

  • You can alter the solution given in this answer slightly to obtain the desired result The reason it doesn't work "out of the box" is that the cv.glmnet function returns the result of several fits, but the individual results are stored in x$glmnet.fit, and we can use this to create a simple function for calculating AICc and BIC.

    glmnet_cv_aicc <- function(fit, lambda = 'lambda.1se'){
      whlm <- which(fit$lambda == fit[[lambda]])
      with(fit$glmnet.fit,
           {
             tLL <- nulldev - nulldev * (1 - dev.ratio)[whlm]
             k <- df[whlm]
             n <- nobs
             return(list('AICc' = - tLL + 2 * k + 2 * k * (k + 1) / (n - k - 1),
                         'BIC' = log(n) * k - tLL))
           })
    }
    

    All we'll then have to do is provide the model and get our estimated AICc.

    best.aicc <- glmnet_cv_aicc(list.of.fits.df[[fit.name]])
    best.aicc.min <- glmnet_cv_aicc(list.of.fits.df[[fit.name]], 'lambda.min')
    

    For a reproducible example, one could use one of the many examples provided in help(glmnet)

    n = 500
    p = 30
    nzc = trunc(p/10)
    x = matrix(rnorm(n * p), n, p)
    beta3 = matrix(rnorm(30), 10, 3)
    beta3 = rbind(beta3, matrix(0, p - 10, 3))
    f3 = x %*% beta3
    p3 = exp(f3)
    p3 = p3/apply(p3, 1, sum)
    g3 = glmnet:::rmult(p3)
    set.seed(10101)
    cvfit = cv.glmnet(x, g3, family = "multinomial")
    print(glmnet_cv_aicc(cvfit))
    # Output
    #$AICc
    #[1] -556.2404
    #
    #$BIC
    #[1] -506.3058
    print(glmnet_cv_aicc(cvfit, 'lambda.min'))
    # Output
    #$AICc
    #[1] -601.0234
    #
    #$BIC
    #[1] -506.4068