r tidymodels recipe

Is there a way to tune parameters inside a recipe workflow?


I'd like to use the step_impute_knn function from the recipes package to impute some missing values in my data. I've tested it with the default parameters (neighbors = 5, and nthread = 1 and eps = 1e-08 passed via options) and can see that the resulting means and standard deviations for numerical variables (for example) are fairly close to those of the original data after imputation.

I'd like, however, to tune these parameters to see if there is an optimal set, but I don't even know how to start inside the recipes package. The answers here and here are too complex or specific for me to understand.

As far as I can see, the function step_impute_knn doesn't provide any tuning options, and I'd rather not do it manually. Is there a simple way to do this?

Sample data:

train <- structure(list(PassengerId = c("0001_01", "0002_01", "0003_01", 
"0003_02", "0004_01", "0005_01"), HomePlanet = c("Europa", "Earth", 
"Europa", "Europa", "Earth", NA), CryoSleep = c("False", 
"False", "False", "False", "False", "False"), Cabin = c("B/0/P", 
"F/0/S", "A/0/S", "A/0/S", "F/1/S", "F/0/P"), Destination = c("TRAPPIST-1e", 
"TRAPPIST-1e", "TRAPPIST-1e", "TRAPPIST-1e", "TRAPPIST-1e", "PSO J318.5-22"
), Age = c(39, 24, 58, 33, 16, 44), VIP = c("False", "False", 
"True", "False", "False", "False"), RoomService = c(0, 109, 43, 
0, 303, 0), FoodCourt = c(0, 9, 3576, 1283, 70, 483), ShoppingMall = c(0, 
25, 0, 371, 151, 0), Spa = c(0, 549, 6715, 3329, 565, 291), VRDeck = c(0, 
44, 49, 193, 2, 0), Name = c("Maham Ofracculy", "Juanna Vines", 
"Altark Susent", "Solam Susent", "Willy Santantines", "Sandie Hinetthews"
), Transported = c("False", "True", "False", "False", "True", 
"True")), row.names = c(NA, 6L), class = "data.frame")

What I have so far:

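library(tidymodels)

# Complete cases only, used to train the imputation step
train_no_na <- train %>%
  na.omit()

# Impute HomePlanet with KNN, using all predictors to find the neighbors
imp_knn_blueprint <- recipe(Transported ~ ., data = train_no_na) %>%
  step_impute_knn(HomePlanet,
                  impute_with = imp_vars(all_predictors()), neighbors = 5,
                  options = list(nthread = 1, eps = 1e-08))

# Estimate the step on the complete cases, then apply it to the full data
imp_knn_prep <- prep(imp_knn_blueprint, training = train_no_na)
imp_knn_5 <- bake(imp_knn_prep, new_data = train)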

Solution

  • Yes, you can (although we don't consider nthread or eps tuning parameters).

    You would give the parameter (here, neighbors) a value of tune() in the recipe and treat it like any other tuning parameter associated with the model.

    You would then use tune_grid() or one of the other tuning functions. tidymodels even understands what this particular parameter is and has a built-in default range for it (although you can pick the grid yourself).

    There's an example of tuning recipe parameters in the tidymodels book and also on the tune_grid help page (in the examples).
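
As a rough illustration of the approach described above, here is a minimal sketch of tuning neighbors in step_impute_knn. It is not taken from the question: the six-row sample is too small to resample, so the sketch assumes the credit_data set from the modeldata package (which contains missing values), a plain logistic regression as the downstream model, and ROC AUC as the metric. The object names (folds, knn_rec, wf, knn_grid, res, best, final_wf) are made up for the example.

```r
library(tidymodels)

set.seed(123)

# Assumption: credit_data (modeldata) stands in for the real data
data(credit_data, package = "modeldata")

# Resamples used to score each candidate value of neighbors
folds <- vfold_cv(credit_data, v = 5, strata = Status)

# Mark neighbors for tuning with tune() instead of a fixed number
knn_rec <- recipe(Status ~ ., data = credit_data) %>%
  step_impute_knn(all_predictors(), neighbors = tune()) %>%
  step_dummy(all_nominal_predictors())

# Assumption: logistic regression is only a placeholder model
wf <- workflow() %>%
  add_recipe(knn_rec) %>%
  add_model(logistic_reg())

# Hand-picked grid; the column name must match the tuned argument
knn_grid <- tibble(neighbors = c(1L, 3L, 5L, 10L))

res <- tune_grid(
  wf,
  resamples = folds,
  grid = knn_grid,
  metrics = metric_set(roc_auc)
)

# Inspect the results and plug the best value back into the workflow
show_best(res, metric = "roc_auc")
best <- select_best(res, metric = "roc_auc")
final_wf <- finalize_workflow(wf, best)
```

The imputation is scored indirectly here, through the performance of the model fitted on the imputed data, since that is what tune_grid() measures. Passing an integer such as grid = 5 instead of a tibble should let tune choose candidate values from the parameter's default range.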