r tidymodels

Tuning ridge regression with tidymodels using nested resampling


I want to tune a ridge regression using tidymodels. I have looked at this nested resampling tutorial, but I am not sure how to extend the tuning from one hyperparameter to two. Please see the example below:

Example data:

library("mlbench")
sim_data <- function(n) {
  tmp <- mlbench.friedman1(n, sd = 1)
  tmp <- cbind(tmp$x, tmp$y)
  tmp <- as.data.frame(tmp)
  names(tmp)[ncol(tmp)] <- "y"
  tmp
}
set.seed(9815)
train_dat <- sim_data(50)

Setting inner and outer folds:

library(tidymodels)
results_nested_resampling <- rsample::nested_cv(train_dat,
                                                outside = vfold_cv(v=10, repeats = 1),
                                                inside = vfold_cv(v=10, repeats = 1))

The function to fit the model and compute the RMSE works:

svm_rmse <- function(object, penalty = 1, mixture = 1) {
  y_col <- ncol(object$data)

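  # Fixed hyperparameter values are passed in (no tune() placeholders);
  # with glmnet, mixture = 0 is pure ridge and mixture = 1 is pure lasso.
  mod <-
    parsnip::linear_reg(penalty = penalty, mixture = mixture) %>%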
    parsnip::set_engine("glmnet") %>% 
    fit(y ~ ., data = analysis(object))
    
  holdout_pred <-
    predict(mod, assessment(object) %>% dplyr::select(-y)) %>%
    bind_cols(assessment(object) %>% dplyr::select(y))
  rmse(holdout_pred, truth = y, estimate = .pred)$.estimate
}

# In some cases, we want to parameterize the function over the tuning parameters:
rmse_wrapper <- function(penalty, mixture, object) svm_rmse(object, penalty, mixture)

# testing rmse_wrapper
rmse_wrapper(penalty=0.1, mixture=0.1, object=results_nested_resampling$inner_resamples[[5]]$splits[[1]])

But the function to tune over the two hyperparameters does not work:

tune_over_cost <- function(object) {
  
  glmn_grid <- base::expand.grid(
    penalty = 10^seq(-3, -1, length = 20),
    mixture = (0:5) / 5)
  
  
  df3_glmn_grid %>%
    mutate(RMSE = map_dbl(glmn_grid$penalty, glmn_grid$mixture, rmse_wrapper,  object = object))
}

tune_over_cost(object=results_nested_resampling$inner_resamples[[5]])

Solution

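  • Use map2_dbl() instead of map_dbl(). map_dbl() iterates over a single vector and treats its second argument as the function to apply, so it cannot walk penalty and mixture together; map2_dbl() iterates over two vectors in parallel.

    That is, change this line of code: mutate(RMSE = map_dbl(glmn_grid$penalty, glmn_grid$mixture, rmse_wrapper, object = object))

    to this line: mutate(RMSE = map2_dbl(penalty, mixture, rmse_wrapper, object = object))

    Also note that df3_glmn_grid is not defined in the code shown; the pipeline presumably should start from glmn_grid.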
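
As a rough illustration of the two-vector pattern, here is a minimal sketch that runs without glmnet. The names toy_score and tune_over_grid are hypothetical and not from the question; toy_score only stands in for rmse_wrapper so the grid logic can be checked on its own. The sketch also assumes the pipeline should start from glmn_grid.

```r
library(dplyr)
library(purrr)

# Hypothetical stand-in for rmse_wrapper() from the question: same argument
# order (penalty, mixture, object), but it only returns a dummy number.
toy_score <- function(penalty, mixture, object) {
  penalty + mixture + length(object)
}

# Hypothetical variant of tune_over_cost(); score_fn is an assumed extra
# argument so that the scoring function can be swapped.
tune_over_grid <- function(object, score_fn = toy_score) {
  # One row per (penalty, mixture) combination: 20 x 6 = 120 rows
  glmn_grid <- expand.grid(
    penalty = 10^seq(-3, -1, length.out = 20),
    mixture = (0:5) / 5
  )

  glmn_grid %>%
    # map2_dbl() walks the penalty and mixture columns in parallel, row by row;
    # object is passed unchanged to every call of score_fn
    mutate(RMSE = map2_dbl(penalty, mixture, score_fn, object = object))
}

res <- tune_over_grid(object = list())
head(res)
```

With the objects from the question, the same pattern would presumably be called on a single split, for example tune_over_grid(results_nested_resampling$inner_resamples[[5]]$splits[[1]], score_fn = rmse_wrapper). This is because rmse_wrapper() relies on analysis() and assessment(), which operate on one split rather than on a whole set of inner resamples.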