I'm trying to add a "step_woe" step to a recipe that already contains a "step_discretize_xgb" step, but I keep getting an error message about the types of the variables I need to transform with step_woe.
Here's a short example of my code, with only one variable.
library(embed)
library(tidymodels)
library(tidyverse)
library(xgboost)
TG <- sample(c(0,1), 1000, replace = TRUE)
V1 <- rnorm(1000)
train <- tibble(VARIABLE_1 = V1,
                TARGET = TG)
rec <- recipes::recipe(TARGET ~ .,
                       data = train) %>%
  step_discretize_xgb(all_numeric_predictors(),
                      outcome = vars(TARGET)) %>%
  step_woe(all_of("VARIABLE_1"),
           outcome = vars(TARGET)) %>%
  prep(training = train)
PS - I've checked that this variable is a factor and that it is binned. I also tried without the "all_of" and the quotes, i.e., just VARIABLE_1.
The message is:
Error in check_type():
! All columns selected for the step should be factor or character
Backtrace:
- ... %>% prep(training = train)
- recipes:::prep.recipe(., training = train)
- embed:::prep.step_woe(x$steps[[i]], training = training, info = x$term_info)
- recipes::check_type(training[, outcome_name], quant = FALSE)
Error in check_type(training[, outcome_name], quant = FALSE) :
This is an unfortunate error message from {embed}. You are getting this error because the outcome of step_woe() needs to be a categorical variable (factor or character), and in your example TARGET is numeric. Since TG appears to be a categorical variable, you can code it as such and it will work.
I have opened an issue to make this error clearer: https://github.com/tidymodels/embed/issues/147
library(embed)
library(tidymodels)
library(tidyverse)
library(xgboost)
TG <- sample(c("0", "1"), 1000, replace = TRUE)
V1 <- rnorm(1000)
train <- tibble(VARIABLE_1 = V1,
                TARGET = TG)
rec <- recipes::recipe(TARGET ~ .,
                       data = train) %>%
  step_discretize_xgb(all_numeric_predictors(),
                      outcome = vars(TARGET)) %>%
  step_woe(all_of("VARIABLE_1"),
           outcome = vars(TARGET)) %>%
  prep(training = train)
rec
#> Recipe
#>
#> Inputs:
#>
#> role #variables
#> outcome 1
#> predictor 1
#>
#> Training data contained 1000 data points and no missing data.
#>
#> Operations:
#>
#> Discretizing variables using xgboost VARIABLE_1 [trained]
#> WoE version against outcome TARGET for VARIABLE_1 [trained]
Created on 2022-11-21 with reprex v2.0.2
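If the target already sits in a data frame as numeric 0/1 and cannot easily be regenerated as character, recoding that column as a factor should have the same effect. The following is a minimal sketch in base R, assuming the column names from the question; the data frame here is a hypothetical stand-in, and the recipe itself is left out so the snippet needs no extra packages.

```r
# Hypothetical stand-in for data whose target is already stored as numeric 0/1
set.seed(1)
train <- data.frame(
  VARIABLE_1 = rnorm(1000),
  TARGET = sample(c(0, 1), 1000, replace = TRUE)
)

# Recode the numeric target as a factor so it is treated as categorical
train$TARGET <- factor(train$TARGET, levels = c(0, 1))

# Quick checks that the column now has the type step_woe() asks for
is.factor(train$TARGET)
levels(train$TARGET)
```

With TARGET stored as a factor, the recipe from the reprex above should be usable unchanged on this data.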