I'm looking to use workflow_set() to iterate over different sets of variables when building my models.
https://www.tmwr.org/workflow-sets states that
There are three possible kinds of preprocessors:
but it only provides an example using a standard R formula. I would prefer to work in the dplyr style, so that if I write my own functions I can specify the outcome and predictors easily in the future. I have looked around and can't find any examples in the dplyr style.
I have found that recipe() %>% update_role() %>% update_role() works, but it is very long-winded (example 2).
My third approach, add_variables(), unfortunately doesn't work (example 3).
Does anyone know a better way to write the dplyr style for workflow_set() than what I have found?
Example 1: R formula style
library(tidymodels)

set.seed(123)
data <- data.frame(
  x1 = rnorm(100),
  x2 = rnorm(100),
  x3 = rnorm(100),
  x4 = rnorm(100),
  y = rnorm(100)
)

# One formula per candidate set of predictors
variables <- list(
  first = y ~ x1,
  second = y ~ x2,
  third = y ~ x1 + x2,
  fourth = y ~ x3 + x4
)

lm_model <-
  linear_reg() %>%
  set_engine("lm")

location_models <- workflow_set(preproc = variables, models = list(lm = lm_model))
location_models
extract_workflow(location_models, id = "third_lm")

# Fit every workflow in the set to the full data
location_models <-
  location_models %>%
  mutate(fit = map(info, \(x) fit(x$workflow[[1]], data)))
location_models$fit[[4]]
Example 2: update_role()
variables <- list(
  first = recipe(data) %>% update_role(y, new_role = "outcome") %>% update_role(x1, new_role = "predictor"),
  second = recipe(data) %>% update_role(y, new_role = "outcome") %>% update_role(x2, new_role = "predictor"),
  third = recipe(data) %>% update_role(y, new_role = "outcome") %>% update_role(c(x1, x2), new_role = "predictor"),
  fourth = recipe(data) %>% update_role(y, new_role = "outcome") %>% update_role(c(x3, x4), new_role = "predictor")
)
Example 3: add_variables() (doesn't work)
variables <- list(
  first = add_variables(outcomes = y, predictors = x1),
  second = add_variables(outcomes = y, predictors = x2),
  third = add_variables(outcomes = y, predictors = c(x1, x2)),
  fourth = add_variables(outcomes = y, predictors = c(x3, x4))
)
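The help page at ?workflow_set has the answer. The preprocessor list can contain:
a traditional R formula,
a recipe created with recipes::recipe(),
a selectors object created by workflows::workflow_variables().
add_variables() expects an existing workflow as its first argument, so it cannot be called on its own to build the list; workflow_variables() creates the standalone selectors object instead. You can use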
variables <- list(
  first = workflow_variables(outcomes = y, predictors = x1),
  second = workflow_variables(outcomes = y, predictors = x2),
  third = workflow_variables(outcomes = y, predictors = c(x1, x2)),
  fourth = workflow_variables(outcomes = y, predictors = c(x3, x4))
)
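Since the question mentions wrapping this in your own functions, here is a minimal sketch of one way that might look. It assumes the same simulated data and lm model as in the question; make_vars() is a hypothetical helper (not part of any package) that takes column names as character vectors and passes them through all_of(), and the two specs shown are just examples.

```r
library(tidymodels)

set.seed(123)
data <- data.frame(
  x1 = rnorm(100),
  x2 = rnorm(100),
  x3 = rnorm(100),
  x4 = rnorm(100),
  y = rnorm(100)
)

# Hypothetical helper: build a selectors object from character vectors.
# all_of() makes the selection strict, so a misspelled column name
# raises an error when the workflow is fitted.
make_vars <- function(outcome, predictors) {
  workflow_variables(
    outcomes = all_of(outcome),
    predictors = all_of(predictors)
  )
}

variables <- list(
  first = make_vars("y", "x1"),
  third = make_vars("y", c("x1", "x2"))
)

lm_model <-
  linear_reg() %>%
  set_engine("lm")

location_models <- workflow_set(preproc = variables, models = list(lm = lm_model))

# Pull one workflow out of the set, fit it, and inspect the coefficients
third_fit <-
  location_models %>%
  extract_workflow(id = "third_lm") %>%
  fit(data = data)

coefs <- tidy(extract_fit_parsnip(third_fit))
coefs
```

The list names still drive the workflow ids ("first_lm", "third_lm"), so the rest of the code from the question should work unchanged with this list.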