r tidymodels

workflow_set with a dplyr-style selector to choose the outcome and predictors in R


I'm looking to use workflow_set to iterate through different variables in my model creation.

In https://www.tmwr.org/workflow-sets it states that

There are three possible kinds of preprocessors:

but it only provides an example of a standard R formula. I would prefer to work in the dplyr style, so that if I write my own functions I can specify the outcome and predictors easily in the future. I have looked around and can't find any examples of the dplyr style.

I have found that recipe() %>% update_role() %>% update_role() works, but it is very long-winded (example 2).

My third approach, add_variables(), doesn't work, unfortunately (example 3).

Does anyone know how to write the dplyr style for workflow_set in a better way than I have found?

Example 1: R formula style

  library(tidymodels)

  set.seed(123)
  data <- data.frame(
    x1 = rnorm(100),
    x2 = rnorm(100),
    x3 = rnorm(100),
    x4 = rnorm(100),
    y = rnorm(100)
  )

  # Named list of formulas; each name becomes part of the workflow ID
  variables <- list(
    first = y ~ x1,
    second = y ~ x2,
    third = y ~ x1 + x2,
    fourth = y ~ x3 + x4
  )

  lm_model <-
    linear_reg() %>%
    set_engine("lm")

  location_models <- workflow_set(preproc = variables, models = list(lm = lm_model))
  location_models
  extract_workflow(location_models, id = "third_lm")

  # Fit every workflow and store the result in a new `fit` column
  location_models <-
    location_models %>%
    mutate(fit = map(info, \(x) fit(x$workflow[[1]], data)))
  location_models$fit[[4]]

Example 2: update_role()

  variables <- list(
    first = recipe(data) %>% update_role(y, new_role = "outcome") %>% update_role(x1, new_role = "predictor"),
    second = recipe(data) %>% update_role(y, new_role = "outcome") %>% update_role(x2, new_role = "predictor"),
    third = recipe(data) %>% update_role(y, new_role = "outcome") %>% update_role(c(x1, x2), new_role = "predictor"),
    fourth = recipe(data) %>% update_role(y, new_role = "outcome") %>% update_role(c(x3, x4), new_role = "predictor")
  )
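
Since the goal is to wrap this in your own functions, the repeated update_role() calls above could be folded into a small helper. This is only a sketch: make_recipe() is a hypothetical name, not part of tidymodels, and it assumes the outcome and predictors are passed as character vectors and selected with all_of().

```r
library(tidymodels)

set.seed(123)
data <- data.frame(
  x1 = rnorm(100),
  x2 = rnorm(100),
  x3 = rnorm(100),
  x4 = rnorm(100),
  y  = rnorm(100)
)

# Hypothetical helper: build a recipe with the given outcome and predictors.
# `outcome` and `predictors` are character vectors of column names.
make_recipe <- function(data, outcome, predictors) {
  recipe(data) %>%
    update_role(all_of(outcome), new_role = "outcome") %>%
    update_role(all_of(predictors), new_role = "predictor")
}

variables <- list(
  first  = make_recipe(data, "y", "x1"),
  second = make_recipe(data, "y", "x2"),
  third  = make_recipe(data, "y", c("x1", "x2")),
  fourth = make_recipe(data, "y", c("x3", "x4"))
)

# Inspect the roles assigned in one of the recipes
summary(variables$third)
```

The resulting list can be passed to workflow_set(preproc = variables, ...) in the same way as the hand-written version.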

Example 3: add_variables() (doesn't work)

  # Fails: add_variables() expects a workflow as its first argument
  variables <- list(
    first = add_variables(outcomes = y, predictors = x1),
    second = add_variables(outcomes = y, predictors = x2),
    third = add_variables(outcomes = y, predictors = c(x1, x2)),
    fourth = add_variables(outcomes = y, predictors = c(x3, x4))
  )
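
For context, add_variables() does work when it is piped onto an existing workflow, which is why the standalone calls above fail. A minimal sketch for a single workflow, with the data and model re-created here so it runs on its own:

```r
library(tidymodels)

set.seed(123)
data <- data.frame(
  x1 = rnorm(100),
  x2 = rnorm(100),
  y  = rnorm(100)
)

lm_model <-
  linear_reg() %>%
  set_engine("lm")

# add_variables() modifies a workflow, so it needs one as its first argument
wf_fit <-
  workflow() %>%
  add_variables(outcomes = y, predictors = c(x1, x2)) %>%
  add_model(lm_model) %>%
  fit(data)

# Coefficients of the underlying lm fit
coef(extract_fit_engine(wf_fit))
```

This only builds one workflow at a time, so it does not by itself give you a list that workflow_set() can consume.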

Solution

  • The help page at ?workflow_set has the answer. The preprocessor list (preproc) can contain formulas, recipes, or workflows::workflow_variables() objects.

    So you can use

    variables <- list(
      first  = workflow_variables(outcomes = y, predictors = x1),
      second = workflow_variables(outcomes = y, predictors = x2),
      third  = workflow_variables(outcomes = y, predictors = c(x1, x2)),
      fourth = workflow_variables(outcomes = y, predictors = c(x3, x4))
    )
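
    Putting it together with the data and model from the question, an end-to-end sketch might look like the following. It reuses the question's fitting code unchanged; only the preprocessor list differs.

```r
library(tidymodels)

set.seed(123)
data <- data.frame(
  x1 = rnorm(100),
  x2 = rnorm(100),
  x3 = rnorm(100),
  x4 = rnorm(100),
  y  = rnorm(100)
)

# dplyr-style selectors instead of formulas or recipes
variables <- list(
  first  = workflow_variables(outcomes = y, predictors = x1),
  second = workflow_variables(outcomes = y, predictors = x2),
  third  = workflow_variables(outcomes = y, predictors = c(x1, x2)),
  fourth = workflow_variables(outcomes = y, predictors = c(x3, x4))
)

lm_model <-
  linear_reg() %>%
  set_engine("lm")

location_models <- workflow_set(preproc = variables, models = list(lm = lm_model))

# Fit each workflow, as in example 1 of the question
location_models <-
  location_models %>%
  mutate(fit = map(info, \(x) fit(x$workflow[[1]], data)))

location_models$wflow_id

# Coefficients of the fourth model (y on x3 and x4)
fourth_coefs <- tidy(extract_fit_parsnip(location_models$fit[[4]]))
fourth_coefs
```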