Is it possible to use tidyselect to specify columns in a formula?
For example, something akin to:
lr_fit <- parsnip::logistic_reg() %>%
  fit(y ~ starts_with("a"), data = df)
Related: https://stats.stackexchange.com/questions/582174/how-select-multiple-columns-in-lm-in-r
The other solution I came up with is selecting the specific columns in the data frame first and then using y ~ . to include all remaining columns.
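A minimal sketch of that workaround, for reference. The data frame here is made up for illustration (the column names a1, a2, b1 and the random values are assumptions, not from my real data):

```r
library(dplyr)
library(parsnip)

# Hypothetical toy data: two predictors starting with "a", one that does not.
set.seed(1)
df <- tibble(
  y  = factor(sample(c("no", "yes"), 100, replace = TRUE)),
  a1 = rnorm(100),
  a2 = rnorm(100),
  b1 = rnorm(100)
)

# Keep the outcome plus the tidyselect-ed predictors, then let `.` pick them up.
df_a <- df %>% select(y, starts_with("a"))

lr_fit <- logistic_reg() %>%
  fit(y ~ ., data = df_a)

# The underlying glm object only sees a1 and a2 as predictors.
names(coef(lr_fit$fit))
```

This works, but the selection lives outside the model specification, which is why I am asking about doing it in the formula itself.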
The default engine for logistic_reg() is glm(), which has a formula interface, so parsnip simply passes the data and the formula on to it. glm() is not written to work with tidyselect, so that is where this journey stops.
You already had the right idea: use ~ . and select your variables before the data gets passed to glm(). You can extend that idea by using a recipe for preprocessing, which gives you access to much more than just select() with tidyselect syntax.
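Here is a rough sketch of what the recipe route could look like. The toy data (columns a1, a2, b1) is invented for illustration, and using step_rm() with a negated selector is just one of several ways to narrow down the predictors, so treat this as a starting point rather than the canonical approach:

```r
library(tidymodels)

# Hypothetical toy data: two predictors starting with "a", one that does not.
set.seed(1)
df <- data.frame(
  y  = factor(sample(c("no", "yes"), 100, replace = TRUE)),
  a1 = rnorm(100),
  a2 = rnorm(100),
  b1 = rnorm(100)
)

# Declare y as the outcome and everything else as predictors, then drop
# every predictor that does not start with "a" using tidyselect syntax.
rec <- recipe(y ~ ., data = df) %>%
  step_rm(all_predictors(), -starts_with("a"))

# Bundle the preprocessing and the model so they are fitted together.
lr_wf <- workflow() %>%
  add_recipe(rec) %>%
  add_model(logistic_reg())

lr_fit <- fit(lr_wf, data = df)

# Inspect which terms reached the underlying glm() fit.
names(coef(extract_fit_engine(lr_fit)))
```

The advantage over calling select() by hand is that the column selection becomes part of the workflow, so it is applied again automatically when you predict on new data.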