r, glmnet, r-parsnip

Simple glmnet model, predict() results in 'Error in lambda[1] - s : non-numeric argument to binary operator'


So I've been trying to use predict() with various data frame formats, but none of them seem to work. I've tried 1) excluding the dependent variable, 2) including the dependent variable with sliced data, 3) including the dependent variable with NA values in it, and many other things.

R 4.1.0
RStudio 1.4.1717

The code below demonstrates 3).

library(tidyverse)
library(lubridate)
library(tidymodels)

df <- data.frame(y  = sample(5000000:120000000, 100, replace = TRUE),
                 yearr = sample(2015:2021, 100, replace = TRUE),
                 monthh = sample(1:12, 100, replace = TRUE),
                 dayy = sample(1:31, 100, replace = TRUE))

# attempt 3): new data with the dependent variable present but set to NA
df_slice = df |>
  slice(1:50) |>
  select(yearr, monthh, dayy) |>
  mutate(y = NA)

# elastic net via glmnet, with the penalty left unspecified (varying())
m = linear_reg(mode = 'regression', penalty = varying(), mixture = 0.6) |>
  set_engine("glmnet") |>
  fit(y ~ ., data = df)

# each of these throws the lambda error
predict(m, df_slice)
predict.model_fit(m, df_slice)
predict_raw(m, df_slice)

The last three lines of code all throw 'Error in lambda[1] - s : non-numeric argument to binary operator'. I made sure that all of the variables are numeric in both df and df_slice, but I'm still unsure what is going on. I just want to get the predicted/fitted values, as well as 'future' values if I were to do a train-test split. Why is this not working?


Solution

  • You are using glmnet, and the penalty argument corresponds to lambda in glmnet, i.e. the total amount of regularization applied (mixture corresponds to alpha, the L1/L2 blend); see the glmnet help page.
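
    If you want to see that mapping for yourself, translate() prints the parsnip specification together with the glmnet::glmnet() call template it generates, which makes the penalty/lambda and mixture/alpha correspondence visible (a quick sketch; penalty = 1 is just an example value):

    # inspect how the parsnip arguments are handed to glmnet
    linear_reg(penalty = 1, mixture = 0.6) %>%
      set_engine("glmnet") %>%
      translate()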

    If you set penalty = varying(), glmnet fits the model across a whole sequence of lambda values, and when you call predict() you have to tell it which lambda to use. So for your example you should not use penalty = varying() (which is deprecated in favour of tune() anyway) but provide a concrete value for the penalty:

    library(tidyverse)
    library(lubridate)
    library(tidymodels)
    
    # penalty must be a single numeric value here; 1 is just an example
    m = linear_reg(mode = 'regression', penalty = 1, mixture = 0.6) %>%
      set_engine("glmnet") %>%
      fit(y ~ ., data = df)
    
    predict(m, df_slice)
    
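    If you want predictions at more than one penalty value from the same fit, multi_predict() accepts a vector of penalties (a sketch; the values below are arbitrary, and this relies on glmnet fitting the full regularization path under the hood):

    # predictions for several candidate penalties at once
    multi_predict(m, df_slice, penalty = c(0.01, 0.1, 1))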

    Otherwise, you need to tune the model to find a suitable lambda and then pass that value back in when refitting:

    my_cv = vfold_cv(df)
    
    # leave the recipe unprepped; the workflow preps it when the model is fit
    rec = recipe(y ~ ., data = df)
    
    spec = linear_reg(mode = 'regression', penalty = tune(), mixture = 0.6) %>%
      set_engine("glmnet")
    
    wflow = workflow() %>%
      add_recipe(rec) %>%
      add_model(spec)
    
    # tune the penalty over the resamples and pick the best value by RMSE
    res = wflow %>% tune_grid(my_cv)
    
    best_params = res %>% select_best(metric = "rmse")
    
    # refit on the full data with the selected penalty
    m = wflow %>%
      finalize_workflow(best_params) %>%
      fit(data = df)
    
    predict(m, df_slice)
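
    Since you also mentioned wanting 'future' values from a train-test split, a minimal sketch with rsample (part of tidymodels) would look like this; the split proportion and seed are arbitrary:

    set.seed(123)
    split = initial_split(df, prop = 0.8)
    train_df = training(split)
    test_df = testing(split)
    
    # refit the finalized workflow on the training data only
    # (strictly, you would tune best_params on train_df only; reusing it here keeps the sketch short)
    m2 = wflow %>%
      finalize_workflow(best_params) %>%
      fit(data = train_df)
    
    # predictions on the held-out rows
    predict(m2, test_df)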