r machine-learning logistic-regression tidymodels r-parsnip

How can I extract the logistic regression model from this workflow?


I am creating a logistic regression model using some data from patients in an Intensive Care Unit. The model seeks to predict if a patient is likely to live or die in the next 7 days based on their response to a certain treatment.

For this I am using the tidymodels suite in R. I have successfully trained and tuned an elastic net logistic regression model, but I want to see the specific models that have been created (i.e. which variables are in each model, how much weight each variable is given, etc.). I am very close, but I can't quite get the last step.

My workflow is as follows:

# initial split of data
proning_initial_split28 <- 
  raw_proning_mortality %>% 
  initial_split(prop = 0.9, strata = mortality_28)

proning_modelTrain_28 <- 
  proning_initial_split28 %>% 
  training()

Creation of a k-fold cross-validation object with 5 folds, repeated 5 times-

lr_v_fold <- 
  vfold_cv(data = proning_modelTrain_28,
           v = 5, 
           repeats = 5, 
           strata = mortality_28)

Creation of recipe for data processing-

lr_recipe <- 
  recipe(proning_modelTrain_28, formula = mortality_28 ~ .) %>% 
  step_rm(mortality_07) %>% 
  step_dummy(all_factor_predictors(), -mortality_28) %>% 
  step_impute_bag(all_predictors()) %>% 
  step_corr(all_predictors(), threshold = 0.9) %>% 
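  step_zv(all_predictors())
# Note: no prep() here. The workflow preps the recipe itself, and recent
# versions of workflows reject an already-trained recipe in add_recipe().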

Creation of model and tuning grid for model-

lr_model_01 <- 
  logistic_reg(mode = 'classification',
               engine = 'glmnet',
               penalty = tune(),
               mixture = tune()) %>% 
  set_args(maxit=1e+06)


lr_tuning_grid <- 
  grid_max_entropy(penalty(),
                   mixture(),
                   iter = 2000)

Creation of final workflow to bring it all together. I have passed the control_grid() instructions to both save the predictions made and to save the model which is generated at each step-

lr_workflow <- 
  workflow() %>% 
  add_model(lr_model_01) %>% 
  add_recipe(lr_recipe) %>% 
  tune_grid(resamples = lr_v_fold,
            grid = lr_tuning_grid,
            metrics = metric_set(sens, spec, ppv, npv, roc_auc),
            control = control_grid(save_pred = TRUE, 
                                   extract = extract_fit_parsnip))

I cannot access the models within this final workflow object. lr_workflow$.extracts seems to contain the models (example below).

[Image: example of part of the output from lr_workflow$.extracts]

However, getting much deeper is difficult.

The above image shows item [[25]]. I can get part of the way into it with lr_workflow$.extracts[[25]]$.extracts[1], but the output I get is as follows-

[[1]]
parsnip model object


Call:  glmnet::glmnet(x = maybe_matrix(x), y = y, family = "binomial",      alpha = ~0.0505244575906545, maxit = ~1e+06) 

    Df  %Dev Lambda
1    0  0.00 4.4620
2    1  0.16 4.0660
3    1  0.33 3.7050
(continues for a total of 100 rows)

How can I get a better breakdown of any of the logistic regression models I have trained? By this I mean a breakdown that resembles the image below (the image is an illustrative example, and unconnected to my specific data)-

[Image not reproduced]


Solution

  • Here is a solution. The trick is to save the untuned workflow (model plus recipe) in its own object, so that you can use it afterwards to get the final model. Note that I changed the original lr_workflow so that it contains only that workflow, and created a new object named tune_wf to hold the tuning results.

    # Create preprocessing workflow
    lr_workflow <- 
      workflow() %>% 
      add_model(lr_model_01) %>% 
      add_recipe(lr_recipe) 
    
    # Tune parameters
    tune_wf <- lr_workflow %>% 
      tune_grid(resamples = lr_v_fold,
                grid = lr_tuning_grid,
                metrics = metric_set(sens, spec, ppv, npv, roc_auc),
                control = control_grid(save_pred = TRUE, 
                                       extract = extract_fit_parsnip))
    
    # Collect metrics
    tune_wf %>% 
      collect_metrics()
    
    # Get the hyperparameters with the highest roc_auc (or another metric)
    best_mod <- tune_wf %>%
      select_best(metric = "roc_auc")
    
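    # Plug the best hyperparameters into the workflow
    final_wf <- lr_workflow %>% 
      finalize_workflow(best_mod)
    
    # Fit on the training set and evaluate on the test set of the
    # original split (proning_initial_split28 in the question)
    final_fit <- final_wf %>%
      last_fit(proning_initial_split28)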
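To see which variables the final model keeps and how much weight each one gets, one option is to store the result of last_fit(), pull the fitted workflow out of it, and tidy the underlying parsnip fit. The sketch below is self-contained rather than a copy of the pipeline above. It assumes tidymodels and glmnet are installed, uses the two_class_dat data set from modeldata as stand-in data, and hard-codes penalty and mixture values in place of the ones select_best() would return. The object names (toy_split, toy_wf, toy_fit, coefs) are made up for illustration.

```r
library(tidymodels)  # assumes tidymodels and glmnet are installed

set.seed(123)

# Stand-in data (assumption): two numeric predictors A, B and a
# two-level factor outcome Class
data(two_class_dat, package = "modeldata")
toy_split <- initial_split(two_class_dat, prop = 0.8, strata = Class)

# Unprepped recipe + model with fixed hyperparameters
# (assumption: these values stand in for the tuned ones)
toy_wf <- workflow() %>%
  add_recipe(
    recipe(Class ~ ., data = training(toy_split)) %>%
      step_normalize(all_numeric_predictors())
  ) %>%
  add_model(
    logistic_reg(penalty = 0.01, mixture = 0.5) %>%
      set_engine("glmnet")
  )

# Fit on the training set, evaluate on the test set
toy_fit <- last_fit(toy_wf, toy_split)

# Fitted workflow -> parsnip fit -> one row per term at the chosen penalty
coefs <- toy_fit %>%
  extract_workflow() %>%
  extract_fit_parsnip() %>%
  tidy()

coefs

# Terms that the elastic net kept (non-zero coefficients)
coefs %>% filter(estimate != 0)
```

The same last three steps should apply to the object returned by last_fit() on the finalized workflow above. Coefficients with an estimate of exactly 0 have been dropped by the penalty. Note that glmnet does not report standard errors or p-values, so the table will not match a classic glm() summary.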