r tidymodels r-recipes

Recipe formula in tidymodels


Imagine there is a formula for a simple regression: y~f1+f2+f3, where f1 is a factor with levels A, B, C, and f2 and f3 are numeric. Further, I'm using the following recipe:

recipe(y~f1+f2+f3, data) %>%
    step_dummy(f1) %>%
    step_log(f3)

Question 1. Eventually the initial formula turns into y~f1_A+f1_B+f1_C+f2+log(f3), right?

Question 2. If I had added

+step_pca(comp5)

it would become

y~PC1+PC2+..PC5?

Hope it makes sense.

Thanks in advance


Solution

  • For the first question

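    Eventually the initial formula turns into y~f1_A+f1_B+f1_C+f2+log(f3), right?

    Almost! The log step doesn't rename the variable (so the logged values are just in column f3). Also, with its default settings step_dummy() drops the first factor level as the reference, so you get f1_B and f1_C but no f1_A (unless you set one_hot = TRUE). The other parts are right.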
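
    To check this yourself, here is a minimal sketch on made-up data. The data frame toy and its values are hypothetical; only the column names and factor levels follow the question. It preps the recipe and inspects the baked columns:

    ```r
    library(recipes)

    # Hypothetical toy data: f1 is a factor with levels A, B, C; f2 and f3 are numeric
    toy <- data.frame(
      y  = c(1.2, 2.3, 3.1, 4.8, 5.0, 6.4),
      f1 = factor(c("A", "B", "C", "A", "B", "C")),
      f2 = c(0.5, 1.5, 2.5, 3.5, 4.5, 5.5),
      f3 = c(1, 10, 100, 1000, 10000, 100000)
    )

    rec <- recipe(y ~ f1 + f2 + f3, data = toy) %>%
      step_dummy(f1) %>%   # default settings: first level (A) is the reference
      step_log(f3) %>%     # natural log, written back into column f3
      prep()

    # new_data = NULL returns the processed training data
    baked <- bake(rec, new_data = NULL)

    names(baked)                          # column names after all steps
    "f1_A" %in% names(baked)              # is there a dummy for the reference level?
    all.equal(baked$f3, log(toy$f3))      # does f3 now hold the logged values?
    ```

    If the description above holds, there should be no f1_A or log(f3) column, and f3 should simply contain the logged values.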

    Question 2:

    If I had added

    +step_pca(comp5)

    it would become

    y~PC1+PC2+..PC5?

    Yes(ish). The names that come out of step_pca() are designed to be sortable. If you have fewer than 10 components, then the above is right. If you have 10 to 99 components, then they are zero-padded: PC01 ... PC99.

    Finally, recipes don't just make a formula to do these computations (you probably didn't mean that, but just to be sure). However, there is a little-known formula method that you can use on a recipe once it is prepared:
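
    A small sketch of the naming, using mtcars purely as a convenient all-numeric data set (it is not part of the question). The helper pc_names() is hypothetical, and the selector and num_comp argument are spelled out because the step_pca(comp5) in the question is shorthand:

    ```r
    library(recipes)

    # Hypothetical helper: names of the PCA columns when keeping n components.
    # With mpg as the outcome, mtcars has 10 numeric predictors.
    pc_names <- function(n) {
      rec <- recipe(mpg ~ ., data = mtcars) %>%
        step_normalize(all_numeric_predictors()) %>%
        step_pca(all_numeric_predictors(), num_comp = n) %>%
        prep()
      # Drop the outcome so that only the component columns remain
      setdiff(names(bake(rec, new_data = NULL)), "mpg")
    }

    pc_names(5)    # fewer than 10 components
    pc_names(10)   # two-digit count, so the names should be zero-padded
    ```

    The padding is what keeps an alphabetical sort of the names in component order.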

    library(tidymodels)
    
    pen_rec <- 
      recipe(island ~ species + body_mass_g, data = penguins) %>% 
      step_dummy(species) %>% 
      step_log(body_mass_g) %>% 
      prep()
    
    formula(pen_rec)
    #> island ~ body_mass_g + species_Chinstrap + species_Gentoo
    #> <environment: 0x125456170>
    

    Created on 2023-10-28 with reprex v2.0.2
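
    As a rough illustration of how that formula relates to the processed data, the sketch below pairs formula() with bake() and passes both to lm(). It uses the built-in iris data and a numeric outcome, which are my choices for the example and not part of the answer above:

    ```r
    library(recipes)

    iris_rec <-
      recipe(Sepal.Length ~ Species + Petal.Length, data = iris) %>%
      step_dummy(Species) %>%
      step_log(Petal.Length) %>%
      prep()

    # The formula only names the processed columns; it does not compute them
    iris_form <- formula(iris_rec)

    # The transformed values themselves come from baking the prepared recipe
    iris_baked <- bake(iris_rec, new_data = NULL)

    fit <- lm(iris_form, data = iris_baked)
    names(coef(fit))
    ```

    The formula alone would not work on the raw iris data, since the dummy columns only exist after baking.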