rglmnet

Lasso regression with the glmnet() function in R


I have a clarification question about the documentation for the glmnet package and its main function. At the bottom of page 7 of the glmnet package vignette, the authors write, "This [the output of glmnet()] displays the call that produced the object fit and a three-column matrix with columns Df (the number of nonzero coefficients), %dev (the percent deviance explained) and Lambda (the corresponding value of λ)." However, I find that Df does not actually represent the number of nonzero coefficients (or, equally likely, I am missing something).

Consider the following example. I run a lasso regression using glmnet(), extract the lambda for the model with Df = 15, and then extract the coefficients, only to find that there are 19 nonzero coefficients (+ an intercept). Any ideas on what's happening here? Any insight would be greatly appreciated.

# packages
library(glmnet)
library(tidyverse)

# generate random data
set.seed(100)
inputs <- matrix(runif(n = 10000, min = 1, max = 20), nrow = 100)
response <- runif(n = 100, min = 1, max = 20)

# run linear lasso regression
lasso_result <- glmnet(inputs,
                       response,
                       family = "gaussian",
                       nlambda = 200) 

# select model with 15 nonzero coefficients
model1 <- print(lasso_result) %>% filter(Df == 15)

# extract coefficients from model
model1_coef <- coef(lasso_result, s = model1$Lambda)

# remove coefficients shrunk to zero
# length should be 16 (15 nonzero coefficients + Intercept)
length(model1_coef[model1_coef[, 1] != 0,])

Solution

  • Per Trevor Hastie, the maintainer of glmnet:

    The printout rounds the lambda values, so model1 <- print(lasso_result) passes a rounded Lambda on to coef(). Because that rounded value does not exactly match any lambda in the fitted path, coef() interpolates between neighboring solutions, which is why extra nonzero coefficients appear. Instead, access the full-precision lambda directly, e.g. lasso_result$lambda[11], to select the right model.
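
    For illustration, a minimal sketch of that fix applied to the example above. It assumes the lasso_result object fitted earlier and uses the lambda and df components stored on a glmnet fit, rather than the rounded values from the printout:

    # pull the full-precision lambda(s) at which 15 coefficients are nonzero,
    # using the df component of the fit instead of the rounded printout
    exact_lambda <- lasso_result$lambda[lasso_result$df == 15]

    # extract coefficients at the unrounded lambda (first match, if several)
    model1_coef_exact <- coef(lasso_result, s = exact_lambda[1])

    # count nonzero coefficients; should now be 16 (15 predictors + intercept)
    sum(model1_coef_exact[, 1] != 0)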