I've been trying to use predict() with data frames in various forms, but none of them seem to work. I've tried 1) excluding the dependent variable, 2) including the dependent variable with sliced data, 3) including the dependent variable with NA values in it, and many other things.
R 4.1.0
RStudio 1.4.1717
The code below demonstrates 3).
library(tidyverse)
library(lubridate)
library(tidymodels)
df <- data.frame(y = sample(5000000:120000000, 100, replace = TRUE),
                 yearr = sample(2015:2021, 100, replace = TRUE),
                 monthh = sample(1:12, 100, replace = TRUE),
                 dayy = sample(1:31, 100, replace = TRUE))
rm(df_slice)
df_slice = df |>
  slice(1:50) |>
  select(yearr, monthh, dayy) |>
  mutate(y = NA)
m = linear_reg(mode = 'regression', penalty = varying(), mixture = 0.6) |>
  set_engine("glmnet") |>
  fit(y ~ ., data = df)
predict(m, df_slice)
predict.model_fit(m, df_slice)
predict_raw(m, df_slice)
The last three lines of code each throw this error:
Error in lambda[1] - s : non-numeric argument to binary operator
I made sure that all of the variables are numeric in both df and df_slice, but I'm still unsure what is going on. I just want to get the predicted/fitted values, as well as 'future' values if I were to do a train-test split. Why is this not working?
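You are using glmnet, and the penalty you are tuning is the overall amount of regularization, which is known as lambda in glmnet (mixture corresponds to glmnet's alpha, the proportion of L1 penalty); see the help page.
If you set penalty = varying(), no single value of lambda is fixed. glmnet fits the model across a series of lambda values, and when you call predict you need to provide one value of lambda to predict at. The error message points at this: the s in lambda[1] - s is the lambda requested for prediction, and here it is not a number. So in your example you should not use penalty = varying() but provide a value of lambda: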
library(tidyverse)
library(lubridate)
library(tidymodels)
m = linear_reg(mode = 'regression', penalty = 1, mixture = 0.6) %>%
  set_engine("glmnet") %>%
  fit(y ~ ., data = df)
predict(m, df_slice)
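If you want to see predictions at other values of lambda without refitting, parsnip's glmnet models also take a penalty argument in predict(), and multi_predict() returns predictions for several penalties at once. Below is a minimal sketch under some assumptions: the data are simulated in the same shape as in the question, new_data is a hypothetical name for the rows to predict on, and the penalty values 1, 10 and 100 are arbitrary illustrations rather than recommended settings.

```r
library(tidymodels)

set.seed(1)
# Simulated data in the same shape as the question's df (values are arbitrary)
df <- data.frame(
  y      = sample(5000000:120000000, 100, replace = TRUE),
  yearr  = sample(2015:2021, 100, replace = TRUE),
  monthh = sample(1:12, 100, replace = TRUE),
  dayy   = sample(1:31, 100, replace = TRUE)
)
# Hypothetical new data: predictors only, no outcome column needed
new_data <- df[1:50, c("yearr", "monthh", "dayy")]

# penalty = 1 is the value predict() uses when no other penalty is given
m <- linear_reg(penalty = 1, mixture = 0.6) %>%
  set_engine("glmnet") %>%
  fit(y ~ ., data = df)

# Predict at a different single lambda without refitting
p_single <- predict(m, new_data, penalty = 10)

# Predict at several lambdas at once; .pred is a list-column of tibbles,
# one tibble per row of new_data with one row per penalty
p_multi <- multi_predict(m, new_data, penalty = c(1, 10, 100))
```

This only changes the lambda used at prediction time; it does not tell you which lambda is a good choice.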
Otherwise, you need to tune and find a suitable lambda, then pass this to refit the model:
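# 10-fold cross-validation resamples (the vfold_cv default)
my_cv = vfold_cv(df)
# An untrained recipe; the workflow trains it when fitting
rec = recipe(y ~ ., data = df)
spec = linear_reg(mode = 'regression', penalty = tune(), mixture = 0.6) %>%
  set_engine("glmnet")
wflow = workflow() %>%
  add_recipe(rec) %>%
  add_model(spec)
# Tune penalty over a default grid and keep the value with the lowest RMSE
res = wflow %>% tune_grid(my_cv)
best_params = res %>% select_best(metric = "rmse")
# Refit on the full data with the selected penalty
m = wflow %>%
  finalize_workflow(best_params) %>%
  fit(data = df)
predict(m, df_slice)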
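For the train-test split mentioned in the question, the same predict() call works on held-out rows. Here is a rough sketch, assuming simulated data like the question's, an arbitrary 75/25 split, and a fixed penalty of 1 chosen only for illustration (in practice you would use a tuned value as above):

```r
library(tidymodels)

set.seed(1)
# Simulated data in the same shape as the question's df (values are arbitrary)
df <- data.frame(
  y      = sample(5000000:120000000, 100, replace = TRUE),
  yearr  = sample(2015:2021, 100, replace = TRUE),
  monthh = sample(1:12, 100, replace = TRUE),
  dayy   = sample(1:31, 100, replace = TRUE)
)

# Hold out 25% of the rows as 'future' data (the proportion is an assumption)
split <- initial_split(df, prop = 0.75)
train <- training(split)
test  <- testing(split)

# Fit on the training rows only, with a fixed numeric penalty
m <- linear_reg(penalty = 1, mixture = 0.6) %>%
  set_engine("glmnet") %>%
  fit(y ~ ., data = train)

# Fitted values on the training rows
fitted_train <- predict(m, train)

# Predictions on the held-out rows, kept next to the observed outcome
pred_test <- predict(m, test) %>%
  bind_cols(test %>% select(y))

# Out-of-sample error on the held-out rows
rmse(pred_test, truth = y, estimate = .pred)
```

Because the outcome here is random noise, the error will be large; the point is only the shape of the workflow.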