r machine-learning regression feature-selection lasso-regression

How to get just the variables selected by LASSO regressions run using glmnet on each of k datasets


I am running 40 LASSO regressions, one on each of the 40 datasets in a list object called 'datasets' in R:

library(data.table)   # fread
library(dplyr)        # select, starts_with
library(glmnet)       # glmnet

datasets <- lapply(filepaths_list, fread)
# rename the columns of every data frame in the list 'datasets'
datasets <- lapply(datasets, function(dataset_i) { 
  colnames(dataset_i) <- c("Y", paste0("X", 1:30))
  dataset_i })

... And I have just run those LASSOs using the following:

# Fit one LASSO regression on each of the 40 datasets stored in
# 'datasets'; each element of the result is the fitted glmnet
# object that R returns for that regression
set.seed(11)     # to ensure replicability
LASSO.fits <- lapply(datasets, function(i) 
               glmnet(x = as.matrix(select(i, starts_with("X"))), 
                      # alpha = 1 is the LASSO penalty; alpha = 0 would fit ridge
                      y = i$Y, alpha = 1))

However, neither of my following attempts returns just the coefficients:

LASSO.Coeffs <- lapply(LASSO.fits, coef.glmnet)

LASSO.Coeffs2 <- lapply(LASSO.fits, 
                        function(i) predict(i, s = 0.1, type = "coefficients"))

They both return lists. But what I need here is a list of 40 elements, each of which just contains the coefficient names and their estimates, so that from there I can finish up by executing the following line of code:

IVs_Selected_by_LASSO <- lapply(LASSO.Coeffs, function(i) names(i[i > 0]))

Solution

  • Try

    lapply(
      LASSO.fits,
      function(x){
        t(data.matrix(predict(x,s=0.1,type="coefficients")))
      }
    )
    

    should give you the coefficients of each fit as a one-row matrix with named columns.
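
    To go from there to the variable names, here is a minimal end-to-end sketch. It assumes simulated data in place of the 40 files (the `make_dataset` helper and the choice of 3 datasets are hypothetical, for illustration only), `alpha = 1` for the LASSO penalty, and the same `s = 0.1` as above. It converts each sparse coefficient matrix to a plain named vector and keeps the names of the nonzero entries.

```r
library(glmnet)

set.seed(11)

# Hypothetical stand-in for the real files: Y depends only on X1 and X2
make_dataset <- function(n = 200, p = 30) {
  X <- matrix(rnorm(n * p), n, p, dimnames = list(NULL, paste0("X", 1:p)))
  Y <- 3 * X[, "X1"] - 2 * X[, "X2"] + rnorm(n)
  data.frame(Y = Y, X)
}
datasets <- replicate(3, make_dataset(), simplify = FALSE)

# alpha = 1 requests the LASSO penalty
LASSO.fits <- lapply(datasets, function(d)
  glmnet(x = as.matrix(d[, -1]), y = d$Y, alpha = 1))

# coef() returns a sparse one-column matrix; turn it into a named numeric vector
LASSO.Coeffs <- lapply(LASSO.fits, function(fit) {
  m <- as.matrix(coef(fit, s = 0.1))
  setNames(m[, 1], rownames(m))
})

# Keep the names of the nonzero coefficients, dropping the intercept
IVs_Selected_by_LASSO <- lapply(LASSO.Coeffs, function(b) {
  b <- b[names(b) != "(Intercept)"]
  names(b)[b != 0]
})
```

    Testing `b != 0` rather than `b > 0` also keeps variables whose estimated coefficient is negative, such as X2 in this simulated example.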