r  tidymodels  r-recipes

Error in step_select(): The column 'normalized_used_price' is missing from 'new_data'


train data test

set.seed(123)
recipe_obj <- recipe(normalized_used_price ~ ., data = train) %>%
  step_select(-weight, -screen_size, -release_year, -normalized_new_price) %>% 
  step_string2factor(all_nominal_predictors()) %>% 
  step_impute_knn(all_predictors()) %>% 
  step_dummy(all_nominal_predictors()) %>% 
  step_scale(all_numeric_predictors())

[...]

final_wf <- workflow() %>% 
  add_recipe(recipe_obj) %>% 
  add_model(final_model)

fn_model <- fit(final_wf, train)

predict(fn_model, test)

I am trying to predict values using my test dataset. However, it's not working: predict() fails with the error shown in the title.


Solution

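  • Using step_select() with negative selections ends up selecting every remaining column, including the outcome normalized_used_price. The outcome is not available in new_data at prediction time, so the step fails, which is not what you want. Use step_rm() instead, which only drops the columns you name: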

    set.seed(123)
    recipe_obj <- recipe(normalized_used_price ~ ., data = train) %>%
      step_rm(weight, screen_size, release_year, normalized_new_price) %>% 
      step_string2factor(all_nominal_predictors()) %>% 
      step_impute_knn(all_predictors()) %>% 
      step_dummy(all_nominal_predictors()) %>% 
      step_scale(all_numeric_predictors())
    
    [...]
    
    final_wf <- workflow() %>% 
      add_recipe(recipe_obj) %>% 
      add_model(final_model)
    
    fn_model <- fit(final_wf, train)
    
    predict(fn_model, test)
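
For a self-contained check of the same idea, here is a minimal sketch that uses the built-in mtcars data instead of the phone-price data from the question. The train/test split and the removed columns (wt, qsec) are illustrative assumptions, not part of the original problem. It bakes new data that has no outcome column, which is roughly the situation predict() creates:

```r
library(recipes)

# Illustrative split of a built-in dataset (assumption: stands in for train/test)
train <- mtcars[1:24, ]
# Drop the outcome from the new data, as happens at prediction time
test <- mtcars[25:32, setdiff(names(mtcars), "mpg")]

rec <- recipe(mpg ~ ., data = train) %>%
  step_rm(wt, qsec) %>%                    # remove columns by name
  step_scale(all_numeric_predictors())     # scale the remaining predictors

prepped <- prep(rec, training = train)

# Works even though 'mpg' is absent from new_data
baked <- bake(prepped, new_data = test)
names(baked)
```

Because step_rm() only records which columns to drop, it does not need the outcome to be present in the new data.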