Tags: r, machine-learning, replication, feature-selection, lasso-regression

How to replicate my results from running n LASSOs iteratively with elasticnet's enet, but now using glmnet


I have successfully run LASSO regressions on each of the n data sets in the 'datasets' list in my R workspace, using the enet function from the elasticnet package. To make sure my results do not depend on how I wrote my code, I want to repeat all of this using the glmnet function from the package of the same name.

My existing code is shown below:

set.seed(100)     # to ensure replicability
L_fits <- lapply(data, function(i) 
               enet(x = as.matrix(select(i, starts_with("X"))), 
                    y = i$Y, lambda = 0, normalize = FALSE))

# This stores and prints out all of the regression 
# equation specifications selected by LASSO when called
# (no x argument is needed when type = "coefficients")
L_Coeffs <- lapply(L_fits, 
                       function(j) predict(j, s = 0.1, mode = "fraction", 
                                           type = "coefficients")[["coefficients"]])

### Write my own custom function which will separate out and return a 
### new list containing just the Independent Variables/Factors/Predictors
### which are 'selected' or chosen for each individual dataset. 
### Use != 0 rather than > 0 so that negative coefficients are kept too.
LASSOs_Selections <- lapply(L_Coeffs, function(k) names(k[k != 0]))

I have gotten this far with glmnet, but I can't figure out how to extract just the coefficients, and then just the variables selected by LASSO:

set.seed(100)     # to ensure replicability
L_fits <- lapply(data, function(i) 
               glmnet(x = as.matrix(select(i, starts_with("X"))), 
                      y = i$Y, alpha = 1))   # alpha = 1 is LASSO; alpha = 0 would fit ridge


Solution

  • one approach:

    library(glmnet)
    library(dplyr) ## for convenience filtering
    
    ## load glmnet sample data:
    data(QuickStartExample)
    
    ## create two sample datasets:
    ## lists with components x (100 x 20 matrix) and y
    ex1 = ex2 = QuickStartExample
    ex2$x = ex2$x * (1 + .05 * rnorm(length(ex2$x)))  ## perturb every entry of x
    
    L_fits = list(ex1, ex2) |>
      Map(f = \(ex) glmnet(x = ex$x, y = ex$y, alpha = 1))
    

    |> is R's native pipe operator, and \(x) x is shorthand for function(x) x (both require R 4.1 or later)

    L_coefs = L_fits |> 
      Map(f = \(model) coef(model, s = .1))
    

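    note the specification of lambda via s = .1: without it, coef() returns a matrix with one column of coefficients for every lambda on the fitted path (up to 100 values by default). Also note that s here is a value of the penalty lambda itself, not the fraction of the L1 norm that predict.enet uses with mode = "fraction", so s = .1 in glmnet is not automatically equivalent to s = 0.1 in the enet code.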
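
    To see what the s argument changes, here is a minimal sketch on synthetic data. The data, the variable names X1 to X5, and the objects all_coefs and one_coef are made up for illustration and are not part of the question:

    ```r
    library(glmnet)

    # Synthetic data, invented purely for this illustration
    set.seed(1)
    x <- matrix(rnorm(100 * 5), nrow = 100,
                dimnames = list(NULL, paste0("X", 1:5)))
    y <- 2 * x[, 1] - 3 * x[, 2] + rnorm(100)

    fit <- glmnet(x, y, alpha = 1)   # alpha = 1 requests the LASSO penalty

    all_coefs <- coef(fit)           # one column per lambda on the fitted path
    one_coef  <- coef(fit, s = 0.1)  # a single column, evaluated at lambda = 0.1

    dim(all_coefs)  # 6 rows (intercept + 5 predictors), length(fit$lambda) columns
    dim(one_coef)   # 6 rows, 1 column
    ```

    Both objects are sparse matrices, which is why the next step converts them with as.matrix() before filtering.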

    L_coefs |>
      Map(f = \(matr) matr |> as.matrix() |> 
                      as.data.frame() |>
                      filter(s1 != 0)
          )
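
    Filtering on != 0 rather than > 0 matters because LASSO coefficients can be negative. A toy sketch with a made-up named coefficient vector (the values are invented, not taken from any fit):

    ```r
    # Hypothetical coefficients: X2 is retained with a negative sign, X3 is dropped
    coefs <- c(X1 = 1.8, X2 = -2.6, X3 = 0, X4 = 0.4)

    positive_only <- names(coefs[coefs > 0])   # "X1" "X4"       (X2 is lost)
    nonzero       <- names(coefs[coefs != 0])  # "X1" "X2" "X4"  (all retained variables)
    ```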
    

    edit

    to return only the variables retained by glmnet::glmnet, you can e.g. keep only the filtered dataframes' rownames (which correspond to the variable names):

    L_coefs |>
      Map(f = \(matr) matr |> as.matrix() |> 
                      as.data.frame() |>
                      filter(s1 != 0) |> 
                      rownames()
          )
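
    The row names returned this way include "(Intercept)" whenever the intercept is nonzero. If only predictor names are wanted, a small helper without dplyr could look like the sketch below. selected_vars is a hypothetical name introduced here, and the synthetic data is only for demonstration:

    ```r
    library(glmnet)

    # Hypothetical helper: names of predictors with a nonzero coefficient at lambda = s
    selected_vars <- function(model, s) {
      cf <- as.matrix(coef(model, s = s))        # dense (p + 1) x 1 matrix
      keep <- cf[, 1] != 0                       # nonzero, regardless of sign
      setdiff(rownames(cf)[keep], "(Intercept)") # drop the intercept row
    }

    # Synthetic data, invented purely for this illustration
    set.seed(1)
    x <- matrix(rnorm(100 * 5), nrow = 100,
                dimnames = list(NULL, paste0("X", 1:5)))
    y <- 2 * x[, 1] - 3 * x[, 2] + rnorm(100)

    fits <- list(glmnet(x, y, alpha = 1))
    selections <- lapply(fits, selected_vars, s = 0.5)
    selections
    ```

    Applied to the list of fits from above, this would be lapply(L_fits, selected_vars, s = .1).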