
How to Extract Preprocessed Variables in Recipes from Tidymodels


I’m using the recipes package from tidymodels to preprocess my data. I’ve applied a series of preprocessing steps, defined with recipe(), to two different groups, and I want to compare which variables were preprocessed differently between the two groups.

My question:

How can I extract the list of variables that were transformed in each step?

Thank you!

I stored the prep() results and tried to find the variables processed in each step, but it was not clear to me how the output is organized.


Solution

  • I believe the best way for you to get that info is to tidy() the prepped recipe:

    library(recipes)
    
    rec <- 
      recipe(mpg ~ ., mtcars) |> 
      step_normalize(all_numeric_predictors())
    
    ## before preprocessing is applied:
    tidy(rec, number = 1)
    #> # A tibble: 1 × 4
    #>   terms                    statistic value id             
    #>   <chr>                    <chr>     <dbl> <chr>          
    #> 1 all_numeric_predictors() <NA>         NA normalize_M0mBp
    

    Before you apply prep(), the recipe does not store which columns were chosen (because they haven't been chosen yet), but after prep() the info is there:

    
    ## after preprocessing is applied:
    prep(rec) |> tidy(number = 1)
    #> # A tibble: 20 × 4
    #>    terms statistic   value id             
    #>    <chr> <chr>       <dbl> <chr>          
    #>  1 cyl   mean        6.19  normalize_M0mBp
    #>  2 disp  mean      231.    normalize_M0mBp
    #>  3 hp    mean      147.    normalize_M0mBp
    #>  4 drat  mean        3.60  normalize_M0mBp
    #>  5 wt    mean        3.22  normalize_M0mBp
    #>  6 qsec  mean       17.8   normalize_M0mBp
    #>  7 vs    mean        0.438 normalize_M0mBp
    #>  8 am    mean        0.406 normalize_M0mBp
    #>  9 gear  mean        3.69  normalize_M0mBp
    #> 10 carb  mean        2.81  normalize_M0mBp
    #> 11 cyl   sd          1.79  normalize_M0mBp
    #> 12 disp  sd        124.    normalize_M0mBp
    #> 13 hp    sd         68.6   normalize_M0mBp
    #> 14 drat  sd          0.535 normalize_M0mBp
    #> 15 wt    sd          0.978 normalize_M0mBp
    #> 16 qsec  sd          1.79  normalize_M0mBp
    #> 17 vs    sd          0.504 normalize_M0mBp
    #> 18 am    sd          0.499 normalize_M0mBp
    #> 19 gear  sd          0.738 normalize_M0mBp
    #> 20 carb  sd          1.62  normalize_M0mBp
    

    Created on 2024-05-26 with reprex v2.1.0
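
To compare two groups, one option is to prep the same recipe on each group, collect the tidy() output of every step, and diff the variable names. The following is only a sketch under assumptions: the two groups are illustrative subsets of mtcars (4-cylinder and 8-cylinder cars), step_zv() is added so that the selected columns can actually differ between groups, and make_prepped() and step_terms() are hypothetical helper names, not part of recipes.

```r
library(recipes)

# Two illustrative groups (an assumption; substitute your own data).
group_a <- mtcars[mtcars$cyl == 4, ]
group_b <- mtcars[mtcars$cyl == 8, ]

# Hypothetical helper: define the same recipe and prep it on one group.
make_prepped <- function(data) {
  recipe(mpg ~ ., data = data) |>
    step_zv(all_predictors()) |>
    step_normalize(all_numeric_predictors()) |>
    prep()
}

# Hypothetical helper: one row per (step, variable) in a prepped recipe.
step_terms <- function(prepped) {
  steps <- tidy(prepped)  # overview with one row per step
  per_step <- lapply(steps$number, function(i) {
    # Variables recorded by step i (deduplicated, e.g. mean and sd rows).
    vars <- unique(tidy(prepped, number = i)$terms)
    data.frame(
      number = rep(i, length(vars)),
      type   = rep(steps$type[i], length(vars)),
      terms  = vars
    )
  })
  do.call(rbind, per_step)
}

terms_a <- step_terms(make_prepped(group_a))
terms_b <- step_terms(make_prepped(group_b))

# Variables normalized in one group but not in the other.
norm_a <- terms_a$terms[terms_a$type == "normalize"]
norm_b <- terms_b$terms[terms_b$type == "normalize"]
setdiff(norm_a, norm_b)
setdiff(norm_b, norm_a)
```

With this particular split, vs is constant among the 8-cylinder cars, so step_zv() drops it there and step_normalize() only sees it in the 4-cylinder group. Note that for a filter step such as step_zv(), the terms column of tidy() lists the columns that were removed, so how you read terms depends on the step type.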