r tidymodels

Can I create new features from a matrix?


I'd like to create features from an I-spline basis in tidymodels. One way to do this is with step_mutate, as follows:

library(tidymodels)
library(tidyverse)
library(splines2)

data <- data.frame(x = seq(0, 10, 0.1))


rec <- recipe(.~x, data=data) %>% 
  step_mutate(
    model.matrix(~isp(x, df=5))
  ) %>% 
  prep()



bake(rec, new_data = data)
#> # A tibble: 101 × 2
#>        x model.matrix(~isp(x, df = 5…¹ [,"isp(x, df = 5)1"] [,"isp(x, df = 5)2"]
#>    <dbl>                         <dbl>                <dbl>                <dbl>
#>  1   0                               1               0                   0      
#>  2   0.1                             1               0.0776              0.00118
#>  3   0.2                             1               0.151               0.00461
#>  4   0.3                             1               0.219               0.0102 
#>  5   0.4                             1               0.284               0.0177 
#>  6   0.5                             1               0.344               0.0271 
#>  7   0.6                             1               0.400               0.0382 
#>  8   0.7                             1               0.453               0.0509 
#>  9   0.8                             1               0.502               0.0651 
#> 10   0.9                             1               0.548               0.0806 
#> # ℹ 91 more rows
#> # ℹ abbreviated name: ¹​`model.matrix(~isp(x, df = 5))`[,"(Intercept)"]
#> # ℹ 1 more variable: `model.matrix(~isp(x, df = 5))`[4:6] <dbl>

Created on 2024-06-29 with reprex v2.1.0

However, the result is a 101 × 2 data frame: the spline features are nested inside a single matrix column rather than spread across separate columns.

bake(rec, new_data = data) %>% 
  dim()
#> [1] 101   2

Is there a way to use step_mutate or other tidymodels functions to get each feature as its own column directly from the call to bake()?


Solution

  • The short answer is no.

    The long answer: step_mutate() is best reserved for creating a single column at a time. If you want to create several columns at once, I would suggest writing your own custom step.

    For this specific case, if you want I-splines from the {splines2} package, you can use step_spline_monotone(), which does exactly that.

    This also illustrates why step_mutate() isn't going to help here. A spline basis is a learned transformation: the step has to store information from the training data (such as the knot locations) so that the same transformation can be applied to new data.
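
As a minimal sketch of the step_spline_monotone() suggestion, assuming a version of {recipes} recent enough to include that step and with {splines2} installed: the recipe is prepped once on the training data and then baked on both the training data and a few new points. The objects train and new_points are hypothetical names introduced here for illustration, and deg_free = 5 is chosen to mirror the df = 5 in the question.

```r
library(tidymodels)  # attaches recipes; splines2 must also be installed

# Training data, same grid as in the question (hypothetical name `train`)
train <- data.frame(x = seq(0, 10, 0.1))

# prep() estimates the spline basis from `train` and stores it in the recipe
rec <- recipe(~ x, data = train) %>%
  step_spline_monotone(x, deg_free = 5) %>%
  prep()

# Basis columns for the training data: each spline feature is its own column
baked <- bake(rec, new_data = NULL)
dim(baked)

# The stored basis is reused for unseen values (hypothetical `new_points`)
new_points <- data.frame(x = c(2.5, 5, 7.5))
baked_new <- bake(rec, new_data = new_points)
baked_new
```

Because the basis is fixed at prep() time, baking new_points should give the same columns as the training data, evaluated at the new x values. Calling model.matrix() inside step_mutate() cannot guarantee this, since the spline would be recomputed from whatever data is passed to bake().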