Tags: r, tidymodels, r-recipes, r-parsnip

Same Recipe and Model with Different Outcomes


I have a dataset with multiple columns for the outcome variables that I would like to predict with the same preprocessing steps and models. Is there a way to run the same recipe and models (with tuning - I'm using workflow_map()) on multiple outcome variables (separate models for each outcome)?

Essentially, I want to loop through the same preprocessing steps and models for each outcome, so that I can avoid having to do this:

model_recipe1 <- recipe(outcome_1 ~ ., data) %>%
                 step_1

model_recipe2 <- recipe(outcome_2 ~ ., data) %>%
                 step_1

model_recipe3 <- recipe(outcome_3 ~ ., data) %>%
                 step_1


and would instead like to do something like this:

model_recipe <- recipe(outcome[i] ~ ., data) %>%
                 step_1

Solution

  • I'm not sure if we 100% recommend the approach you are trying, but it will work in some circumstances:

    library(tidymodels)
    
    folds <- bootstraps(mtcars, times = 5)
    wf_set <- workflow_set(list(mpg ~ ., wt ~ ., disp ~ .), list(linear_reg()))
    workflow_map(wf_set, "fit_resamples", resamples = folds)
    #> # A workflow set/tibble: 3 × 4
    #>   wflow_id             info             option    result   
    #>   <chr>                <list>           <list>    <list>   
    #> 1 formula_1_linear_reg <tibble [1 × 4]> <opts[1]> <rsmp[+]>
    #> 2 formula_2_linear_reg <tibble [1 × 4]> <opts[1]> <rsmp[+]>
    #> 3 formula_3_linear_reg <tibble [1 × 4]> <opts[1]> <rsmp[+]>
    

    Created on 2022-08-04 by the reprex package (v2.0.1)

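    If the outcome names are already in a character vector, the list of formulas does not have to be typed out by hand. A minimal sketch, assuming base R's reformulate() from the stats package (not part of the original answer); the object names outcomes and formulas are only illustrative:

    ```r
    # Outcome names as strings (the same three mtcars columns used above)
    outcomes <- c("mpg", "wt", "disp")

    # reformulate() lives in the stats package, which is attached by default.
    # With termlabels = "." it builds `<outcome> ~ .` for each name.
    formulas <- lapply(outcomes, function(y) reformulate(".", response = y))

    # Name the list so each formula can be looked up by its outcome
    names(formulas) <- outcomes

    formulas$wt
    ```

    The named list could then replace the hand-written list(mpg ~ ., wt ~ ., disp ~ .) in the workflow_set() call above. Because the list is named, the workflow IDs should be built from the outcome names rather than formula_1, formula_2, and so on.
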
    To build many recipes iteratively, you'll need a bit of metaprogramming, for example with rlang. You can write a function that takes (in this case) the outcome name as a string and returns a recipe:

    library(rlang)
    
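    # Build a recipe for one outcome, given its name as a string
    my_recipe <- function(outcome) {
      # ensym() converts the supplied name to a symbol; the formula is `outcome ~ .`
      form <- new_formula(ensym(outcome), expr(.))
      recipe(form, data = mtcars) %>%
        step_normalize(all_numeric_predictors())
    }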
    

    And then you can use this function with purrr::map() across your outcomes:

    library(tidymodels)
    library(rlang)
    
    folds <- bootstraps(mtcars, times = 5)
    
    wf_set <- workflow_set(
      map(c("mpg", "wt", "disp"), my_recipe), 
      list(linear_reg())
      )
    
    workflow_map(wf_set, "fit_resamples", resamples = folds)
    #> # A workflow set/tibble: 3 × 4
    #>   wflow_id            info             option    result   
    #>   <chr>               <list>           <list>    <list>   
    #> 1 recipe_1_linear_reg <tibble [1 × 4]> <opts[1]> <rsmp[+]>
    #> 2 recipe_2_linear_reg <tibble [1 × 4]> <opts[1]> <rsmp[+]>
    #> 3 recipe_3_linear_reg <tibble [1 × 4]> <opts[1]> <rsmp[+]>
    

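
    One thing to watch when all outcomes live in the same data frame: in outcome ~ ., the dot expands to every other column, so the remaining outcome columns are treated as predictors (for example, wt ~ . uses mpg and disp to predict wt). If that is not what you want, the formula can list the predictors explicitly. The following is only a sketch; my_recipe_excl() and its all_outcomes argument are hypothetical names, and reformulate() is base R rather than something the answer above uses:

    ```r
    library(tidymodels)

    # All outcome columns (here, the three mtcars columns used above)
    outcomes <- c("mpg", "wt", "disp")

    # Hypothetical variant of my_recipe(): predictors are listed explicitly,
    # so the other outcome columns are not swept in by `.`
    my_recipe_excl <- function(outcome, all_outcomes = outcomes, data = mtcars) {
      predictors <- setdiff(names(data), all_outcomes)
      form <- reformulate(predictors, response = outcome)
      recipe(form, data = data) %>%
        step_normalize(all_numeric_predictors())
    }

    rec <- my_recipe_excl("wt")
    summary(rec)  # one row per variable, with its role
    ```

    This function can be mapped over the outcome names in the same way as my_recipe(), for example map(outcomes, my_recipe_excl), and the resulting list passed to workflow_set().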