I’m using the recipes package from tidymodels to preprocess my data. I’ve applied a series of preprocessing steps using recipe() to two different groups, and I’m trying to find out which variables were preprocessed differently between the two groups.
My question:
How can I extract the list of variables that were transformed in each step?
Thank you!
I stored the prep() results and tried to find the variables processed in each step, but it was not clear to me how the output is organized.
I believe the best way for you to get that info is to tidy() the prepped recipe:
library(recipes)

rec <-
  recipe(mpg ~ ., mtcars) |>
  step_normalize(all_numeric_predictors())

## before preprocessing is applied:
tidy(rec, number = 1)
#> # A tibble: 1 × 4
#>   terms                    statistic value id
#>   <chr>                    <chr>     <dbl> <chr>
#> 1 all_numeric_predictors() <NA>         NA normalize_M0mBp
Before you apply prep(), the recipe does not store which columns were chosen (because they haven't been chosen yet), but after prep() the info is there:
## after preprocessing is applied:
prep(rec) |> tidy(number = 1)
#> # A tibble: 20 × 4
#>    terms statistic   value id
#>    <chr> <chr>       <dbl> <chr>
#>  1 cyl   mean        6.19  normalize_M0mBp
#>  2 disp  mean      231.    normalize_M0mBp
#>  3 hp    mean      147.    normalize_M0mBp
#>  4 drat  mean        3.60  normalize_M0mBp
#>  5 wt    mean        3.22  normalize_M0mBp
#>  6 qsec  mean       17.8   normalize_M0mBp
#>  7 vs    mean        0.438 normalize_M0mBp
#>  8 am    mean        0.406 normalize_M0mBp
#>  9 gear  mean        3.69  normalize_M0mBp
#> 10 carb  mean        2.81  normalize_M0mBp
#> 11 cyl   sd          1.79  normalize_M0mBp
#> 12 disp  sd        124.    normalize_M0mBp
#> 13 hp    sd         68.6   normalize_M0mBp
#> 14 drat  sd          0.535 normalize_M0mBp
#> 15 wt    sd          0.978 normalize_M0mBp
#> 16 qsec  sd          1.79  normalize_M0mBp
#> 17 vs    sd          0.504 normalize_M0mBp
#> 18 am    sd          0.499 normalize_M0mBp
#> 19 gear  sd          0.738 normalize_M0mBp
#> 20 carb  sd          1.62  normalize_M0mBp
Created on 2024-05-26 with reprex v2.1.0
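To compare two groups, one option is to call tidy() on every step of each prepped recipe and diff the resulting variable names. The following is only a rough sketch, not something from your actual data: step_terms() and make_rec() are hypothetical helpers I made up for illustration, the split of mtcars by cyl just stands in for your two groups, and step_zv() is added so that the two groups end up with different columns. It also assumes each step's tidy() output has a terms column, which is true for the two steps used here but worth checking for other steps.

```r
library(recipes)

# Hypothetical helper (not part of recipes): for a prepped recipe, return a
# named list holding the variables reported by tidy() for each step.
step_terms <- function(prepped) {
  steps <- tidy(prepped)  # one row per step: number, operation, type, ...
  out <- lapply(steps$number, function(i) {
    unique(tidy(prepped, number = i)$terms)
  })
  names(out) <- paste0(steps$number, "_", steps$type)
  out
}

# Hypothetical helper: the same recipe definition, prepped on one group's data.
make_rec <- function(data) {
  recipe(mpg ~ ., data = data) |>
    step_zv(all_predictors()) |>
    step_normalize(all_numeric_predictors()) |>
    prep()
}

# Stand-in groups (assumption): 8-cylinder cars versus all other cars.
prepped_a <- make_rec(mtcars[mtcars$cyl == 8, ])
prepped_b <- make_rec(mtcars[mtcars$cyl != 8, ])

terms_a <- step_terms(prepped_a)
terms_b <- step_terms(prepped_b)

# Per step: variables handled in one group but not in the other.
only_in_a <- Map(setdiff, terms_a, terms_b)
only_in_b <- Map(setdiff, terms_b, terms_a)

only_in_a
only_in_b
```

Both recipes use the same selectors, so any difference comes from how those selectors were resolved during prep() on each group's data. Note that for step_zv() the terms column lists the columns that were removed, while for step_normalize() it lists the columns that were normalized, so it is worth reading each step's tidy() documentation before interpreting the diff.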