I have successfully run LASSO regressions on each of the n datasets in the 'datasets' list in my R workspace, using the enet function from the elasticnet package. To make sure my results do not depend on how I wrote that code, I want to repeat all of this using the glmnet function from the package of the same name.
My existing code is shown below:
library(elasticnet) # provides enet()
library(dplyr)      # provides select() and starts_with()
set.seed(100) # to ensure replicability
L_fits <- lapply(datasets, function(i)
  enet(x = as.matrix(select(i, starts_with("X"))),
       y = i$Y, lambda = 0, normalize = FALSE))
# This stores and prints out all of the regression
# equation specifications selected by LASSO when called
L_Coeffs <- lapply(L_fits,
                   # j is a fitted enet object, so no predictor matrix is passed here
                   function(j) predict(j, s = 0.1, mode = "fraction",
                                       type = "coefficients")[["coefficients"]])
### Write my own custom function which will separate out and return a
### new list containing just the Independent Variables/Factors/Predictors
### which are 'selected' or chosen for each individual dataset.
# A variable is selected when its coefficient is non-zero (negative ones included)
LASSOs_Selections <- lapply(L_Coeffs, function(k) names(k[k != 0]))
With glmnet I have gotten this far, but I can't figure out how to extract just the coefficients, and then just the variables selected by LASSO:
set.seed(100) # to ensure replicability
L_fits <- lapply(datasets, function(i)
  glmnet(x = as.matrix(select(i, starts_with("X"))),
         y = i$Y, alpha = 1)) # alpha = 1 is the LASSO penalty; alpha = 0 would fit ridge
One approach:
library(glmnet)
library(dplyr) ## for convenience filtering
## load glmnet sample data:
data(QuickStartExample)
## create two sample datasets
## (lists with components x, a 100 x 20 matrix, and y):
ex1 = ex2 = QuickStartExample
## perturb ex2's predictors with 5% multiplicative noise, one draw per cell
ex2$x = ex2$x * (1 + .05 * rnorm(length(ex2$x)))
glmnet models for example datasets:
L_fits = list(ex1, ex2) |>
  Map(f = \(ex) glmnet(x = ex$x, y = ex$y, alpha = 1))
|> is R's native pipe operator, and \(x) x is shorthand for function(x) x.
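For readers new to this syntax, here is a minimal base-R sketch of both features (they need R 4.1 or later). The names squares and doubled are made up for illustration and are not part of the approach itself.

```r
# Requires R >= 4.1 for both the native pipe |> and the \(x) shorthand.

# The pipe passes its left-hand side as the first argument of the call on the right:
squares <- 1:3 |> sapply(\(v) v^2)   # same as sapply(1:3, function(v) v^2)

# With Map(), f is matched by name, so the piped list becomes the
# argument that Map() iterates over:
doubled <- list(a = 1, b = 2) |> Map(f = \(v) v * 2)

print(squares)
print(doubled)
```

This is why the calls here write Map(f = ...) with the function named explicitly: without the name, the piped list would be taken as the function argument.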
L_coefs = L_fits |>
Map(f = \(model) coef(model, s = .1))
Note the specification of lambda via s = .1; without it, coef() returns a matrix with one column of coefficients for each lambda value on the fitted regularization path.
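To see the effect of s, the following sketch compares the shape of the coefficient matrix with and without it. The data are simulated purely for illustration (x, y and the effect sizes are assumptions, not the example datasets used above).

```r
library(glmnet)

set.seed(1)
x <- matrix(rnorm(100 * 20), 100, 20)   # simulated predictors (assumption)
y <- x[, 1] - 2 * x[, 2] + rnorm(100)   # simulated response (assumption)

fit <- glmnet(x, y, alpha = 1)          # LASSO fit over a whole lambda path

all_lambdas <- coef(fit)                # one column per lambda in fit$lambda
one_lambda  <- coef(fit, s = 0.1)       # a single column, for lambda = 0.1

dim(all_lambdas)   # 21 rows (intercept + 20 predictors), length(fit$lambda) columns
dim(one_lambda)    # 21 rows, 1 column
```

Both results are sparse matrices, which is why the filtering step converts them with as.matrix() first.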
L_coefs |>
Map(f = \(matr) matr |> as.matrix() |>
as.data.frame() |>
filter(s1 != 0)
)
Edit: to return only the variables retained by glmnet::glmnet, you can, e.g., keep only the filtered data frames' row names (which correspond to the variable names):
L_coefs |>
Map(f = \(matr) matr |> as.matrix() |>
as.data.frame() |>
filter(s1 != 0) |>
rownames()
)
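The same idea can be sketched for a list of data frames shaped like the one in the question, with a response column Y and predictors whose names start with X. Everything below is an assumption for illustration: make_df is a hypothetical helper that simulates such data, and s = 0.1 is an arbitrary lambda rather than a tuned one.

```r
library(glmnet)

set.seed(100)
# Hypothetical stand-in for the question's list of data frames
make_df <- function(n = 100) {
  X <- matrix(rnorm(n * 5), n, 5, dimnames = list(NULL, paste0("X", 1:5)))
  data.frame(Y = 2 * X[, 1] - 3 * X[, 2] + rnorm(n), X)
}
datasets <- list(make_df(), make_df())

# One LASSO fit (alpha = 1) per dataset
L_fits <- lapply(datasets, function(d) {
  x <- as.matrix(d[, startsWith(names(d), "X"), drop = FALSE])
  glmnet(x = x, y = d$Y, alpha = 1)
})

# Coefficients at lambda = 0.1, as a named numeric vector per dataset
L_coefs <- lapply(L_fits, function(fit) as.matrix(coef(fit, s = 0.1))[, 1])

# Names of the predictors with non-zero coefficients, intercept dropped
selected <- lapply(L_coefs, function(cf) {
  setdiff(names(cf)[cf != 0], "(Intercept)")
})

print(selected)
```

Indexing the first column avoids depending on the column being named s1, and setdiff() removes "(Intercept)", which would otherwise appear among the row names whenever the intercept is non-zero.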