r, r-caret, r-recipes

How to exclude certain variables from a recipe?


When I use the step_regex function to build a recipe for a model, it creates additional columns for patterns matched in the original column. Is there a way to exclude the original column from the recipe once I'm done with it?

In the example below, the baked result contains both the original description column and the two new columns created by step_regex. I want a solution that is integrated with the recipe object, so that I can pass the recipe directly to caret::train.

library(recipes)
data(covers)  # in recent releases this dataset ships with the modeldata package

rec <- recipe(~ description, covers) %>%
  step_regex(description, pattern = "(rock|stony)", result = "rocks") %>%
  step_regex(description, pattern = "ratake families")

rec2 <- prep(rec, training = covers)

# recent recipes releases call this argument `new_data` (older ones used `newdata`)
with_dummies <- bake(rec2, new_data = covers)

Solution

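  • Just found a solution: change the role of the columns that should not be used as predictors. Note that in current recipes releases add_role() adds a role on top of the existing one, while update_role() replaces it.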

    rec <- rec %>% add_role(description, new_role = "dont_use")
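
The two ways of keeping the original column out of the predictors can be sketched as follows. This is a minimal sketch, not the poster's exact code: it assumes a recent recipes release, and the `toy` data frame and its values are made up here to stand in for `covers`. Option 1 uses update_role() to give `description` a non-predictor role before the regex step; option 2 uses step_rm() to drop the column after the regex step has used it.

```r
library(recipes)

# Hypothetical stand-in for `covers`: one text column with made-up values.
toy <- data.frame(
  description = c("rock outcrop complex", "stony soil", "ratake families complex"),
  stringsAsFactors = FALSE
)

# Option 1: keep the column in the data, but replace its predictor role.
# The role is changed before any steps are added; step_regex still finds
# the column because it is selected by name, not by role.
rec_role <- recipe(~ description, data = toy) %>%
  update_role(description, new_role = "dont_use") %>%
  step_regex(description, pattern = "(rock|stony)", result = "rocks")

# Option 2: remove the column once the regex step no longer needs it.
rec_rm <- recipe(~ description, data = toy) %>%
  step_regex(description, pattern = "(rock|stony)", result = "rocks") %>%
  step_rm(description)

# Bake only the predictors for option 1; option 2 has nothing else left.
baked_role <- bake(prep(rec_role, training = toy), new_data = toy, all_predictors())
baked_rm   <- bake(prep(rec_rm, training = toy), new_data = toy)

names(baked_role)
names(baked_rm)
```

With the role-based option the column still travels with the data, which can be useful if it is needed later for inspection; step_rm() discards it entirely. Either recipe object could then be passed to caret::train in place of a formula, assuming the installed caret version supports recipes.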