I have a clarification question regarding the documentation for the glmnet package and its main function. At the bottom of page 7 of the glmnet package vignette, the authors write, "This [the output of glmnet()] displays the call that produced the object fit and a three-column matrix with columns Df (the number of nonzero coefficients), %dev (the percent deviance explained) and Lambda (the corresponding value of λ)." However, I find that Df does not actually match the number of nonzero coefficients (or, equally likely, I am missing something).
Consider the following example. I run a lasso regression using glmnet(), extract the lambda for the model with Df = 15, and then extract the coefficients at that lambda, only to find that there are 19 nonzero coefficients (plus an intercept). Any ideas on what's happening here? Any insight would be greatly appreciated.
# packages
library(glmnet)
library(tidyverse)
# generate random data
set.seed(100)
inputs <- matrix(runif(n = 10000, min = 1, max = 20), nrow = 100)
response <- runif(n = 100, min = 1, max = 20)
# run linear lasso regression
lasso_result <- glmnet(inputs,
                       response,
                       family = "gaussian",
                       nlambda = 200)
# select model with 15 nonzero coefficients
model1 <- print(lasso_result) %>% filter(Df == 15)
# extract coefficients from model
model1_coef <- coef(lasso_result, s = model1$Lambda)
# remove coefficients shrunk to zero(.)
# length should be 16 (15 nonzero coefficients + Intercept)
length(model1_coef[model1_coef[, 1] != 0,])
Per Trevor Hastie, the maintainer of glmnet:
The printout rounds the lambda, so model1 <- print(lasso_result) returns a rounded value. Instead, access the full-precision lambda directly with lasso_result$lambda[11] to select the right model.
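As a rough sketch of that advice, the step can be looked up from the fitted object itself rather than from the printed table. The df and lambda components are part of the object glmnet() returns; the names steps, step, exact_lambda, all_coefs and n_nonzero are illustrative only, and the sketch assumes at least one step on the path has Df equal to 15, as in the question.

```r
library(glmnet)

# same simulated data and fit as in the question
set.seed(100)
inputs <- matrix(runif(n = 10000, min = 1, max = 20), nrow = 100)
response <- runif(n = 100, min = 1, max = 20)
lasso_result <- glmnet(inputs, response, family = "gaussian", nlambda = 200)

# positions on the lambda path where Df equals 15 (there may be more than one)
steps <- which(lasso_result$df == 15)
step <- steps[1]  # assumption: at least one such step exists

# unrounded lambda stored on the fit, not the rounded value shown by print()
exact_lambda <- lasso_result$lambda[step]

# coef() without s returns one column of coefficients per lambda on the path,
# so indexing by column avoids any interpolation between neighbouring lambdas
all_coefs <- coef(lasso_result)

# count nonzero coefficients at that step, dropping the intercept in row 1
n_nonzero <- sum(all_coefs[-1, step] != 0)
n_nonzero
```

Indexing the coefficient matrix by column is a deliberate choice here: passing a rounded lambda through s makes coef() interpolate between the two neighbouring fits, which is one way the count of nonzero coefficients can end up larger than Df.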