I am testing the performance of a prediction model (binary outcome: 0 or 1) using tidymodels in R. Before fitting, I created importance weights for all individuals in my dataset. I have done the data splitting and built the full workflow, but I would also like my performance metrics (roc_auc and brier_class) to be weighted by those importance weights.
Right now I have:
set.seed(234)
data_folds <- group_vfold_cv(data, group = Hospital)

# Specify logistic regression
model <- logistic_reg(mode = "classification", engine = "glm")

# Workflow
current_wf <- workflow() |>
  add_case_weights(imp_weights) |>
  add_formula(Status30D ~ Max_NEWS) |>
  add_model(model)
current_wf

# Set up parallel processing
doParallel::registerDoParallel(cores = 6)
cntrl <- control_resamples(save_pred = TRUE)

# Internal-external validation of the current EWS
# (possibly checking demographic parity as well)
current_fit <- fit_resamples(
  current_wf,
  resamples = data_folds,
  metrics = metric_set(roc_auc, brier_class),
  control = cntrl
)
Is there a way to request weighted performance metrics in fit_resamples(), or somewhere else?
Our definition of importance weights is that they are used only during training. You can create your own case weight type to allow them to be used for both training and performance estimation.
First, define your new weight type: https://github.com/tidymodels/hardhat/blob/main/R/case-weights.R
Second, let tune know that they should be used when the model is evaluated: https://github.com/tidymodels/tune/blob/0bc1a5c2affd3b1eb69d4d25b06ee31e5e29501d/R/case_weights.R#L22
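Putting those two steps together, here is a minimal sketch. The class name `eval_weights` and the column construction are illustrative, not part of tidymodels; the real extension points are `hardhat::new_case_weights()` and the `.use_case_weights_with_yardstick()` generic that tune exports:

```r
library(hardhat)
library(tune)

# Step 1: define a custom case-weight type with hardhat's low-level
# constructor. "eval_weights" is a made-up class name for this example.
eval_weights <- function(x) {
  x <- vctrs::vec_cast(x, to = double())
  hardhat::new_case_weights(x, class = "eval_weights")
}

# Step 2: register an S3 method so tune passes this weight type on to
# yardstick when metrics are computed on the holdout predictions
# (the built-in method for importance weights returns FALSE).
.use_case_weights_with_yardstick.eval_weights <- function(x) TRUE
```

With that in place, build the weights column using the new constructor before fitting, e.g. `data$imp_weights <- eval_weights(raw_weights)` (here `raw_weights` stands in for your numeric weights), and keep `add_case_weights(imp_weights)` in the workflow as before; fit_resamples() should then report weighted roc_auc and brier_class.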