r heuristics tidymodels

How to specify a dummy model/heuristic rule as a model in tidymodels?


I'm comparing a few ML models on my dataset using tidymodels and workflowsets in R, and I also want to compare them against a baseline heuristic rule that is commonly used in the domain.

I thought it would be simple to specify the rule, e.g. y_pred = (x1 > 3) | (x2 < 1), as a model on the same data, tune nothing (since it won't change), and then compare it to all the other models using yardstick etc., since it's effectively just a poorly fit model.

I cannot for the life of me figure out the right way to specify it cleanly up front, in the same way as the models that actually get fit.


Solution

  • The community-contributed parsnip extension package bespoke allows folks to define these sorts of models. Install with:

    pak::pak("macmillancontentscience/bespoke")
    

    The main function, bespoke(), takes a function (fn) that accepts a data frame as input and returns a vector (integer, character, or factor) of predicted outcomes, with one value per input row. A quick example of how that might look in action:

    library(parsnip)
    library(bespoke)
    
    dat <- data.frame(
      y = factor(sample(c("a", "b"), 10, replace = TRUE)), 
      x1 = rnorm(10), 
      x2 = rnorm(10, .5)
    )
    
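    # Rule-based "model": predict "b" when x1 > x2, otherwise "a"
    make_pred <- function(x) {
      y_pred <- x$x1 > x$x2
      # Set levels explicitly so this still works if every row gets the same class
      factor(y_pred, levels = c(FALSE, TRUE), labels = c("a", "b"))
    }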
    
    model_spec <- bespoke(fn = make_pred)
    
    model_spec
    #> bespoke Model Specification (classification)
    #> 
    #> Main Arguments:
    #>   fn = make_pred
    #> 
    #> Computational engine: bespoke
    
    model_fit <- model_spec %>% fit(y ~ x1 + x2, dat)
    
    predict(model_fit, dat)
    #> # A tibble: 10 × 1
    #>    .pred_class
    #>    <fct>      
    #>  1 b          
    #>  2 b          
    #>  3 b          
    #>  4 a          
    #>  5 a          
    #>  6 b          
    #>  7 a          
    #>  8 b          
    #>  9 a          
    #> 10 b
    

    Created on 2024-03-20 with reprex v2.1.0

    :)
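To connect this back to the rule in the question, here is a minimal base R sketch of what the prediction function could look like for y_pred = (x1 > 3) | (x2 < 1). The name heuristic_rule, the class labels "no"/"yes", and the toy data are all assumptions for illustration; swap in the levels of your own outcome. It only checks the rule by hand and does not touch bespoke itself.

```r
# Hypothetical helper: encodes the rule (x1 > 3) | (x2 < 1) from the question.
# The class labels "no"/"yes" are assumptions; use the levels of your outcome.
heuristic_rule <- function(x) {
  flagged <- (x$x1 > 3) | (x$x2 < 1)
  # Fix both levels so the factor is valid even when every row gets the same class
  factor(ifelse(flagged, "yes", "no"), levels = c("no", "yes"))
}

# Small hand-made data set so the result is easy to check by eye
toy <- data.frame(
  y  = factor(c("yes", "no", "yes", "yes"), levels = c("no", "yes")),
  x1 = c(4, 2, 1, 0),
  x2 = c(2, 2, 0, 3)
)

rule_pred <- heuristic_rule(toy)
rule_pred
#> [1] yes no  yes no 
#> Levels: no yes

# Plain accuracy of the rule against the observed outcome
mean(rule_pred == toy$y)
#> [1] 0.75
```

A function shaped like this (data frame in, one factor value per row out) is what you would presumably pass as bespoke(fn = heuristic_rule), following the pattern in the example above, so that the rule goes through fit() and predict() like the other models.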