I am a little bit lost in tidymodels. I have some data from topic modeling:
I want to predict/classify the prevalent_topic based on value1 and value2:
prevalent_topic ~ value1 + value2
I started with multiclass classification using glmnet and nnet with tidymodels. Now I want to try "one-vs-rest" binary classification and created a recipe to begin with:
```r
dfFT_rec <- recipe(~ value1 + value2, data = dfFT_train) %>%
  step_dummy(prevalent_topic, one_hot = TRUE) %>%
  step_normalize(c(value1, value2))
```
The second step creates dummy variables that I would like to use as outcomes, e.g. "prevalent_topic_Topic_1", "prevalent_topic_Topic_2", ...
I tried to update the recipe's formula to "prevalent_topic_Topic_1 ~ value1 + value2", but that did not work. I also tried to fit a workflow to my data without specifying the outcome, but only got an error: "logistic_reg() was unable to find an outcome."
Is this possible at all? Or is there a different way to turn an outcome factor variable into dummy-coded outcome variables?
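If one-vs-rest is still the goal, one workaround is to build each binary outcome with dplyr before the recipe, so that the recipe formula can name it as the outcome. The sketch below assumes hypothetical simulated data in place of the real `dfFT_train` (which is not shown), and the helper `fit_one_vs_rest()` and the `is_topic` column are illustrative names, not part of any tidymodels API. It fits one `logistic_reg()` per topic and collects each model's probability for its own topic.

```r
library(tidymodels)

set.seed(123)
# Hypothetical stand-in for the topic-model output (the real data is not shown)
dfFT_train <- tibble(
  value1 = runif(150),
  value2 = runif(150),
  prevalent_topic = factor(sample(c("Topic_1", "Topic_2", "Topic_3"), 150, replace = TRUE))
)

# Hypothetical helper: one binary logistic regression, "this topic" vs. "all others"
fit_one_vs_rest <- function(topic, data) {
  # Build the binary outcome before the recipe so the formula can refer to it;
  # "yes" is the first level, which yardstick treats as the event by default
  binary_data <- data %>%
    mutate(is_topic = factor(if_else(prevalent_topic == .env$topic, "yes", "no"),
                             levels = c("yes", "no"))) %>%
    select(-prevalent_topic)

  rec <- recipe(is_topic ~ value1 + value2, data = binary_data) %>%
    step_normalize(all_numeric_predictors())

  workflow() %>%
    add_recipe(rec) %>%
    add_model(logistic_reg() %>% set_engine("glm")) %>%
    fit(data = binary_data)
}

topics <- levels(dfFT_train$prevalent_topic)
ovr_fits <- map(topics, fit_one_vs_rest, data = dfFT_train) %>% set_names(topics)

# One probability column per topic, each taken from that topic's binary model
ovr_probs <- map(topics, function(topic) {
  predict(ovr_fits[[topic]], new_data = dfFT_train, type = "prob") %>%
    select(.pred_yes) %>%
    set_names(topic)
}) %>%
  bind_cols()

# Predicted topic = the one whose binary model gives the highest probability
ovr_class <- topics[max.col(as.matrix(ovr_probs), ties.method = "first")]
```

Because the binary models are fit independently, the per-row probabilities in `ovr_probs` do not necessarily sum to 1; taking the column with the largest value is the usual one-vs-rest decision rule.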
As long as the values in prevalent_topic are mutually exclusive (and the column is a regular factor), you can use multinom_reg() to get a model. Instead of fitting a set of logistic regressions, you can model all of your categories simultaneously.
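A minimal sketch of that approach, assuming hypothetical simulated data in place of the real `dfFT_train` and the default `"nnet"` engine: the outcome stays a single factor, so no dummy coding of the outcome is needed, and `predict(type = "prob")` returns one probability column per topic.

```r
library(tidymodels)

set.seed(123)
# Hypothetical stand-in for the topic-model output (the real data is not shown)
dfFT_train <- tibble(
  value1 = runif(150),
  value2 = runif(150),
  prevalent_topic = factor(sample(c("Topic_1", "Topic_2", "Topic_3"), 150, replace = TRUE))
)

# The factor outcome goes directly on the left-hand side of the formula
multi_rec <- recipe(prevalent_topic ~ value1 + value2, data = dfFT_train) %>%
  step_normalize(all_numeric_predictors())

multi_spec <- multinom_reg() %>%
  set_engine("nnet") %>%
  set_mode("classification")

multi_fit <- workflow() %>%
  add_recipe(multi_rec) %>%
  add_model(multi_spec) %>%
  fit(data = dfFT_train)

# Hard class predictions and per-class probabilities (.pred_Topic_1, ...)
pred_class <- predict(multi_fit, new_data = dfFT_train)
pred_prob <- predict(multi_fit, new_data = dfFT_train, type = "prob")
```

Unlike a set of separate one-vs-rest models, the multinomial model's class probabilities for each row sum to 1.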
If they are not mutually exclusive (like a multiple-choice question), you would probably need to make a separate factor for each category and model each one separately. That "multilabel" structure isn't currently supported in tidymodels. You might look at the recipe step step_dummy_multi_choice() (https://recipes.tidymodels.org/reference/step_dummy_multi_choice.html) (followed by step_bin2factor() (https://recipes.tidymodels.org/reference/step_bin2factor.html)) to make the different outcome columns.
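A rough sketch of those two steps, assuming hypothetical multilabel data in which each document can carry up to two topics stored in `topic_1` and `topic_2` (`NA` when there is no second topic); the data and column names are illustrative only. `step_dummy_multi_choice()` creates one 0/1 column per topic that appears in any of the selected columns, and `step_bin2factor()` turns those into `"yes"`/`"no"` factors that can serve as classification outcomes.

```r
library(tidymodels)

# Hypothetical multilabel data: up to two topics per document
multi_df <- tibble(
  value1  = c(0.2, 0.8, 0.5, 0.1, 0.9, 0.4),
  value2  = c(1.5, 0.3, 0.7, 1.1, 0.2, 0.6),
  topic_1 = c("A", "B", "A", "C", "B", "C"),
  topic_2 = c("B", NA, "C", NA, "C", "A")
)

label_rec <- recipe(~ ., data = multi_df) %>%
  # One 0/1 column per topic, 1 if the topic appears in either column;
  # the original topic_1/topic_2 columns are dropped by default
  step_dummy_multi_choice(topic_1, topic_2) %>%
  # Convert the new 0/1 columns into "yes"/"no" factors
  step_bin2factor(starts_with("topic_")) %>%
  prep()

label_df <- bake(label_rec, new_data = NULL)
```

Each resulting `topic_*` column can then be used as the outcome of its own binary model (for example a separate recipe and `logistic_reg()` per column). The exact column names depend on the step's naming defaults, so it is worth checking `names(label_df)` before writing those formulas.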