I am using LASSO (from package glmnet) to select variables. I have fitted a glmnet
model and plotted the coefficients against lambda.
library(glmnet)
set.seed(47)
x = matrix(rnorm(100 * 3), 100, 3)
y = rnorm(100)
fit = glmnet(x, y)
plot(fit, xvar = "lambda", label = TRUE)
Now I want to get the order in which coefficients become 0. In other words, at what lambda does each coefficient become 0?
I can't find a function in glmnet that extracts this. How can I get it?
Function glmnetPath in my initial answer is now in an R package called solzy.
## you may need to first install package "remotes" from CRAN
remotes::install_github("ZheyuanLi/solzy")
## Zheyuan Li's R functions on Stack Overflow
library(solzy)
## use function `glmnetPath` for your example
glmnetPath(fit)
#$enter
#  i  j ord var     lambda
#1 3  2   1  V3 0.15604809
#2 2 19   2  V2 0.03209148
#3 1 24   3  V1 0.02015439
#
#$leave
#  i  j ord var     lambda
#1 1 23   1  V1 0.02211941
#2 2 18   2  V2 0.03522036
#3 3  1   3  V3 0.17126258
#
#$ignored
#[1] i   var
#<0 rows> (or 0-length row.names)
Interpretation of enter
As lambda decreases, variables (see i for numeric ID and var for variable names) enter the model in turn (see ord for the order). The corresponding lambda for the event is fit$lambda[j].
variable 3 enters the model at lambda = 0.15604809, the 2nd value in fit$lambda;
variable 2 enters the model at lambda = 0.03209148, the 19th value in fit$lambda;
variable 1 enters the model at lambda = 0.02015439, the 24th value in fit$lambda.
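For readers who want to see where these indices come from without installing solzy, here is a minimal base-R sketch. It is not the implementation of glmnetPath; it simply looks for the first column of fit$beta in which each coefficient is non-zero, and it assumes that a coefficient stays non-zero once it has entered. The names nz, j_enter and enter_sketch are made up for illustration.

```r
library(glmnet)

## same simulated data as in the question
set.seed(47)
x <- matrix(rnorm(100 * 3), 100, 3)
y <- rnorm(100)
fit <- glmnet(x, y)

## TRUE where a coefficient is non-zero;
## rows are variables, columns follow fit$lambda (decreasing)
nz <- as.matrix(fit$beta) != 0

## first column in which each variable is non-zero (NA if it never is)
j_enter <- unname(apply(nz, 1, function(r) match(TRUE, r)))

## hypothetical summary table, mimicking the layout of $enter
enter_sketch <- data.frame(i = seq_len(nrow(nz)),
                           j = j_enter,
                           var = rownames(nz),
                           lambda = fit$lambda[j_enter])

## sort by entry order; variables that never enter (NA) go last
enter_sketch <- enter_sketch[order(enter_sketch$j), ]
enter_sketch
```

If the assumption holds for your fit, the j and lambda columns should line up with those reported in $enter.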
Interpretation of leave
As lambda increases, variables (see i for numeric ID and var for variable names) leave the model in turn (see ord for the order). The corresponding lambda for the event is fit$lambda[j].
variable 1 leaves the model at lambda = 0.02211941, the 23rd value in fit$lambda;
variable 2 leaves the model at lambda = 0.03522036, the 18th value in fit$lambda;
variable 3 leaves the model at lambda = 0.17126258, the 1st value in fit$lambda.
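The leaving index can be sketched in the same spirit: for each variable, take the last column of fit$beta in which its coefficient is zero. Again, this is only an illustration under the assumption that each coefficient is zero for large lambda and stays non-zero after entering; it is not how glmnetPath is necessarily written, and nz, j_leave and leave_sketch are hypothetical names.

```r
library(glmnet)

## same simulated data as in the question
set.seed(47)
x <- matrix(rnorm(100 * 3), 100, 3)
y <- rnorm(100)
fit <- glmnet(x, y)

## TRUE where a coefficient is non-zero
nz <- as.matrix(fit$beta) != 0

## last column in which each variable is zero;
## NA if the variable is never zero or never non-zero
j_leave <- unname(apply(nz, 1, function(r) {
  if (!any(r) || all(r)) NA_integer_ else max(which(!r))
}))

## hypothetical summary table, mimicking the layout of $leave
leave_sketch <- data.frame(i = seq_len(nrow(nz)),
                           j = j_leave,
                           var = rownames(nz),
                           lambda = fit$lambda[j_leave])

## as lambda increases, the variable with the largest j leaves first
leave_sketch <- leave_sketch[order(-leave_sketch$j), ]
leave_sketch
```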
Interpretation of ignored
If this is not an empty data.frame, it lists variables that never enter the model. That is, they are effectively ignored. (Yes, this can happen!)
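A variable that never enters has a coefficient of zero at every lambda on the path, so one rough way to check for such variables by hand is to look for all-zero rows in fit$beta. The sketch below does that; never and ignored_sketch are illustrative names, not part of glmnet or solzy.

```r
library(glmnet)

## same simulated data as in the question
set.seed(47)
x <- matrix(rnorm(100 * 3), 100, 3)
y <- rnorm(100)
fit <- glmnet(x, y)

## TRUE where a coefficient is non-zero
nz <- as.matrix(fit$beta) != 0

## row indices of variables whose coefficient is zero along the whole path
never <- unname(which(rowSums(nz) == 0))

## hypothetical summary table, mimicking the layout of $ignored
ignored_sketch <- data.frame(i = never, var = rownames(nz)[never])
ignored_sketch
```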
Note: fit$lambda is decreasing, so j is in ascending order in enter but in descending order in leave.
To further explain indices i and j, take variable 2 as an example. It leaves the model (i.e., its coefficient becomes 0) at j = 18 and enters the model (i.e., its coefficient becomes non-zero) at j = 19. You can verify this:
fit$beta[2, 1:18]
## all zeros
fit$beta[2, 19:ncol(fit$beta)]
## all non-zeros
See Obtain variable selection order from glmnet for a more complicated example.