I am creating a logistic regression model using some data from patients in an Intensive Care Unit. The model seeks to predict if a patient is likely to live or die in the next 7 days based on their response to a certain treatment.
For this I am using the tidymodels suite in R. I have successfully trained and tuned an elastic net logistic regression model, but I want to see the specific models that have been created (i.e. which variables are in each model, how much weight it gives each variable, etc.). I am very close, but I just can't quite get the last little step.
My workflow is as follows:
Initial split of data:

```r
proning_initial_split28 <-
  raw_proning_mortality %>%
  initial_split(prop = 0.9, strata = mortality_28)

proning_modelTrain_28 <-
  proning_initial_split28 %>%
  training()
```
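A minimal sketch of what the split object provides, using `two_class_dat` from the modeldata package as a hypothetical stand-in for `raw_proning_mortality` (its `Class` column plays the role of `mortality_28`). The same split object is what `last_fit()` expects at the very end.

```r
library(tidymodels)
# Hypothetical stand-in data: Class plays the role of mortality_28
data(two_class_dat, package = "modeldata")

set.seed(123)
tc_split <- initial_split(two_class_dat, prop = 0.9, strata = Class)

tc_train <- training(tc_split)  # about 90% of rows, used for resampling and tuning
tc_test  <- testing(tc_split)   # held out, used once by last_fit()

# Stratification keeps the class balance similar in both parts
tc_train %>% count(Class) %>% mutate(prop = n / sum(n))
tc_test  %>% count(Class) %>% mutate(prop = n / sum(n))
```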
Creation of a k-fold object with 5 folds, repeated 5 times:

```r
lr_v_fold <-
  vfold_cv(data = proning_modelTrain_28,
           v = 5,
           repeats = 5,
           strata = mortality_28)
```
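Because `repeats = 5`, this produces 5 x 5 = 25 resamples rather than 5, which is why the tuning results have 25 entries in `.extracts`. A quick check on stand-in data (`two_class_dat` is an assumption for illustration):

```r
library(tidymodels)
data(two_class_dat, package = "modeldata")  # hypothetical stand-in data

set.seed(123)
folds <- vfold_cv(two_class_dat, v = 5, repeats = 5, strata = Class)

# 5 folds x 5 repeats = 25 rows; id is the repeat, id2 is the fold
nrow(folds)
folds %>% select(id, id2) %>% tail(1)
```

Assuming tune keeps the resamples in this order, item `[[25]]` of `.extracts` corresponds to Repeat5/Fold5.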
Creation of a recipe for data processing:

```r
lr_recipe <-
  recipe(proning_modelTrain_28, formula = mortality_28 ~ .) %>%
  step_rm(mortality_07) %>%
  step_dummy(all_factor_predictors(), -mortality_28) %>%
  step_impute_bag(all_predictors()) %>%
  step_corr(all_predictors(), threshold = 0.9) %>%
  step_zv(all_predictors())
# No prep() here: the workflow trains the recipe itself within each resample
```
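A sketch of the same recipe pattern left untrained, on a hypothetical `mtcars`-based stand-in. A workflow preps the recipe itself inside each resample (workflows expects an untrained recipe), and prepping once on the whole training set would let the assessment folds influence the imputation and correlation filter. This sketch also places `step_zv()` before `step_corr()`, since correlation is undefined for a constant column; that ordering is a suggestion, not part of the original.

```r
library(tidymodels)

# Hypothetical stand-in data: a factor outcome plus one factor predictor
cars <- mtcars %>%
  mutate(am = factor(am, labels = c("auto", "manual")),
         cyl = factor(cyl))

# Untrained recipe: this is what should go into add_recipe()
cars_rec <- recipe(am ~ ., data = cars) %>%
  step_dummy(all_factor_predictors()) %>%
  step_zv(all_predictors()) %>%
  step_corr(all_predictors(), threshold = 0.9)

# To inspect the processed predictors, prep and bake a copy outside the workflow
baked <- cars_rec %>% prep() %>% bake(new_data = NULL)
names(baked)
```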
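Creation of the model specification and tuning grid:

```r
lr_model_01 <-
  logistic_reg(mode = 'classification',
               engine = 'glmnet',
               penalty = tune(),
               mixture = tune()) %>%
  set_args(maxit = 1e+06)

# Note: `iter` is the number of optimiser iterations; the number of
# candidate (penalty, mixture) pairs is `size`, which defaults to 3
lr_tuning_grid <-
  grid_max_entropy(penalty(),
                   mixture(),
                   iter = 2000)
```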
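In dials, `iter` in `grid_max_entropy()` controls how long the optimiser that spreads the points out runs; the number of candidates is set by `size` (default 3), so the grid above likely holds only 3 (penalty, mixture) pairs. A quick check, where `size = 30` is an arbitrary illustrative value:

```r
library(tidymodels)

set.seed(42)
# Default size: only 3 candidates, however large iter is
grid_default <- grid_max_entropy(penalty(), mixture(), iter = 2000)
# Explicit size: 30 candidates
grid_30 <- grid_max_entropy(penalty(), mixture(), size = 30, iter = 2000)

nrow(grid_default)
nrow(grid_30)
range(grid_30$penalty)  # original scale, within penalty()'s default 1e-10 to 1
```

Recent dials releases also provide `grid_space_filling()`, which supersedes `grid_max_entropy()`.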
Creation of the final workflow to bring it all together. I passed `control_grid()` instructions to save both the predictions and the model fitted on each resample:

```r
lr_workflow <-
  workflow() %>%
  add_model(lr_model_01) %>%
  add_recipe(lr_recipe) %>%
  tune_grid(resamples = lr_v_fold,
            grid = lr_tuning_grid,
            metrics = metric_set(sens, spec, ppv, npv, roc_auc),
            control = control_grid(save_pred = TRUE,
                                   extract = extract_fit_parsnip))
```
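A sketch of how to dig into `.extracts`, on stand-in data (`two_class_dat`, a small hand-made grid, and 5 folds without repeats are all assumptions chosen to keep it fast). Each element of the outer `.extracts` column is a tibble with one row per candidate whose inner `.extracts` column holds the parsnip fit; `unnest()` flattens that, and parsnip's `tidy()` method for glmnet fits returns one coefficient per term at a requested `penalty`.

```r
library(tidymodels)
data(two_class_dat, package = "modeldata")  # hypothetical stand-in data

spec <- logistic_reg(penalty = tune(), mixture = tune()) %>%
  set_engine("glmnet")
wf <- workflow() %>% add_model(spec) %>% add_formula(Class ~ A + B)

set.seed(123)
folds <- vfold_cv(two_class_dat, v = 5, strata = Class)
grid  <- tibble(penalty = c(0.001, 0.01, 0.1), mixture = c(0.05, 0.5, 1))

res <- tune_grid(wf, resamples = folds, grid = grid,
                 control = control_grid(extract = extract_fit_parsnip))

# One row per resample x candidate; the inner .extracts column is the parsnip fit
extracts <- res %>%
  select(id, .extracts) %>%
  unnest(.extracts)

# Each glmnet fit stores the whole lambda path, so ask tidy() for the
# coefficients at that row's candidate penalty
coefs <- extracts %>%
  mutate(coefs = map2(.extracts, penalty, ~ tidy(.x, penalty = .y))) %>%
  select(id, mixture, coefs) %>%
  unnest(coefs)

coefs %>% filter(id == "Fold1")
```

Rows whose `estimate` is zero are terms the elastic net dropped at that penalty.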
I cannot access the models within this final workflow object. `lr_workflow$.extracts` seems to contain the models (example below), but getting any deeper is difficult.
The above image shows item `[[25]]`. I can get partway into it with `lr_workflow$.extracts[[25]]$.extracts[1]`, but the output I get is as follows:
```
[[1]]
parsnip model object

Call:  glmnet::glmnet(x = maybe_matrix(x), y = y, family = "binomial", alpha = ~0.0505244575906545, maxit = ~1e+06)

  Df %Dev Lambda
1  0 0.00 4.4620
2  1 0.16 4.0660
3  1 0.33 3.7050
```

(continues for a total of 100 rows)
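The printout above is glmnet's summary of its entire regularization path (one row per lambda, up to 100 by default), not a table of coefficients; in parsnip's glmnet interface the fit keeps every lambda and a single penalty is only picked when predicting or tidying. A sketch of reading coefficients at one point on the path, fitting one model directly on stand-in data instead of tuning (`two_class_dat`, `penalty = 0.01` and `mixture = 0.05` are illustrative assumptions). When indexing the tuning results, `[[1]]` rather than `[1]` returns the fit itself instead of a one-element list.

```r
library(tidymodels)
data(two_class_dat, package = "modeldata")  # hypothetical stand-in data

# One elastic net fit, the same kind of object stored in .extracts
fit <- logistic_reg(penalty = 0.01, mixture = 0.05) %>%
  set_engine("glmnet") %>%
  fit(Class ~ A + B, data = two_class_dat)

glmn <- extract_fit_engine(fit)  # the underlying glmnet object
length(glmn$lambda)              # number of rows in the Df / %Dev / Lambda printout

# Coefficients at a single point on the path
coef(glmn, s = 0.01)             # glmnet's own method (sparse matrix)
tidy(fit, penalty = 0.01)        # the same values as a tibble
```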
How can I get a better breakdown of any of the logistic regression models I have trained? By this I mean a breakdown that resembles the image below (the image is an illustrative example, unconnected to my specific data):
Here is a solution. The trick is to keep the untuned workflow (recipe plus model) in its own object, so that after tuning you can finalize it with the best hyperparameters and fit it again. Note that I changed the original `lr_workflow` so it contains only that workflow, and created a new object named `tune_wf` to hold the tuning results.
```r
# Create the workflow (recipe + model), not yet tuned
lr_workflow <-
  workflow() %>%
  add_model(lr_model_01) %>%
  add_recipe(lr_recipe)

# Tune parameters
tune_wf <- lr_workflow %>%
  tune_grid(resamples = lr_v_fold,
            grid = lr_tuning_grid,
            metrics = metric_set(sens, spec, ppv, npv, roc_auc),
            control = control_grid(save_pred = TRUE,
                                   extract = extract_fit_parsnip))

# Collect metrics
tune_wf %>%
  collect_metrics()

# Get the hyperparameters with the highest roc_auc (or another metric)
best_mod <- tune_wf %>%
  select_best(metric = "roc_auc")

# Finalize the workflow with those hyperparameters
final_wf <- lr_workflow %>%
  finalize_workflow(best_mod)

# Fit on the full training set and evaluate once on the held-out test set
final_fit <- final_wf %>%
  last_fit(proning_initial_split28)
```
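The coefficient breakdown the question asks for comes from the model stored inside the `last_fit()` result. A self-contained sketch on stand-in data (`two_class_dat`, the tiny grid and the `odds_ratio` column are illustrative assumptions). glmnet does not report standard errors or p-values, so the result is a table of penalized coefficients rather than a full `summary(glm())`-style table.

```r
library(tidymodels)
data(two_class_dat, package = "modeldata")  # hypothetical stand-in data

set.seed(123)
tc_split <- initial_split(two_class_dat, prop = 0.9, strata = Class)
folds    <- vfold_cv(training(tc_split), v = 5, strata = Class)

wf <- workflow() %>%
  add_model(logistic_reg(penalty = tune(), mixture = tune()) %>%
              set_engine("glmnet")) %>%
  add_formula(Class ~ A + B)

res <- tune_grid(wf, resamples = folds,
                 grid = tibble(penalty = c(0.001, 0.01, 0.1),
                               mixture = c(0.05, 0.5, 1)),
                 metrics = metric_set(roc_auc))

best      <- select_best(res, metric = "roc_auc")  # name the metric argument
final_res <- wf %>% finalize_workflow(best) %>% last_fit(tc_split)

# Coefficients of the model refit on the whole training set; the penalty is
# passed explicitly, plus odds ratios for easier reading
final_coefs <- final_res %>%
  extract_fit_parsnip() %>%
  tidy(penalty = best$penalty) %>%
  mutate(odds_ratio = exp(estimate))

final_coefs
collect_metrics(final_res)  # performance on the held-out test set
```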