I'm comparing a few ML models on my dataset using tidymodels and workflowsets in R, and I'd also like to compare them against a heuristic baseline rule that's commonly used in the domain.
I thought it would be simple to specify the rule, e.g. y_pred = (x1 > 3) | (x2 < 1),
as a model on the same data, tune nothing (since it won't change), and then compare it to all the other models using yardstick etc., treating it as just another (poorly fitting) model.
But I cannot for the life of me figure out the right way to specify it cleanly up front, in the same way as the models that actually get fit.
The community-contributed parsnip extension package bespoke allows folks to define these sorts of models. Install it with:
pak::pak("macmillancontentscience/bespoke")
The main function, bespoke(), takes one main argument, fn: a function that accepts a data frame as input and returns a vector (integer, character, or factor) of predicted outcomes, one value per input row. A quick example of how that might look in action:
library(parsnip)
library(bespoke)
dat <- data.frame(
  y = factor(sample(c("a", "b"), 10, replace = TRUE)),
  x1 = rnorm(10),
  x2 = rnorm(10, .5)
)
# Heuristic rule: predict based on whether x1 exceeds x2.
# factor() sorts the levels FALSE, TRUE, so FALSE maps to "a" and TRUE to "b".
make_pred <- function(x) {
  y_pred <- x$x1 > x$x2
  factor(y_pred, labels = c("a", "b"))
}
model_spec <- bespoke(fn = make_pred)
model_spec
#> bespoke Model Specification (classification)
#>
#> Main Arguments:
#> fn = make_pred
#>
#> Computational engine: bespoke
model_fit <- model_spec %>% fit(y ~ x1 + x2, dat)
predict(model_fit, dat)
#> # A tibble: 10 × 1
#> .pred_class
#> <fct>
#> 1 b
#> 2 b
#> 3 b
#> 4 a
#> 5 a
#> 6 b
#> 7 a
#> 8 b
#> 9 a
#> 10 b
Created on 2024-03-20 with reprex v2.1.0
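Since predict() returns a standard parsnip prediction tibble, yardstick metrics work on it as usual. A minimal sketch, reusing model_fit from above (assumes dplyr is installed):

library(yardstick)

dat %>%
  dplyr::bind_cols(predict(model_fit, dat)) %>%
  accuracy(truth = y, estimate = .pred_class)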
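And here's a hypothetical sketch of the workflowsets comparison the question asks about, assuming a bespoke() spec drops into a workflow like any other parsnip spec. baseline_rule encodes the heuristic from the question; the mapping of the rule's TRUE/FALSE onto the outcome levels "a"/"b" is an assumption you'd adapt to your own data:

library(workflowsets)
library(rsample)

# The heuristic from the question; how TRUE/FALSE map to outcome
# levels is an assumption -- adjust for your own outcome.
baseline_rule <- function(x) {
  hit <- (x$x1 > 3) | (x$x2 < 1)
  factor(ifelse(hit, "a", "b"), levels = levels(dat$y))
}

folds <- vfold_cv(dat, v = 2)

wf_set <- workflow_set(
  preproc = list(formula = y ~ x1 + x2),
  models = list(
    baseline = bespoke(fn = baseline_rule),
    logistic = logistic_reg()
  )
)

# Resample every workflow (nothing to tune), then rank by metric.
wf_set %>%
  workflow_map("fit_resamples", resamples = folds) %>%
  rank_results()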
:)