logistic-regression r-caret rsample

Cannot Extract Information from glm model using tidy function from rsample package


I have been following the logistic regression chapter in Hands-On Programming with R. At first all of the code worked fine, but then I restarted my R session, and now when I run this code

tidy(model1)

it throws this error message.

`Error in UseMethod("tidy") : 
  no applicable method for 'tidy' applied to an object of class "c('glm', 'lm')"`

Here is my code up to the point where the error is thrown:

library(dplyr)
library(ggplot2)
library(rsample)
library(modeldata) #contains the attrition dataset
library(caret)
library(vip)

df <- attrition |> 
  mutate_if(is.ordered, factor, ordered = FALSE)

# create training (70%) and test (30%) sets
set.seed(191) # for reproducibility
churn_split <- initial_split(df, prop = 0.7, strata = 'Attrition')
churn_train <- training(churn_split)
churn_test <- testing(churn_split)

#model simple logistic regression (use 1 variable for prediction)
model1 <- glm(Attrition ~ MonthlyIncome, 
              family ='binomial',
              data = churn_train)
model2 <- glm(Attrition ~ OverTime, 
              family = 'binomial',
              data = churn_train)
summary(model1)
exp(coef(model1))

All of this code works fine, and tidy(model1) also worked until I restarted RStudio. I want to know whether I did something wrong or whether the function itself is misbehaving, and how to fix it.


Solution

  • I ran your code and got the same error. After loading the broom package, tidy() returned a tibble for both model1 and model2:

    set.seed(123)  # for reproducibility
    churn_split <- initial_split(df, prop = .7, strata = "Attrition")
    churn_train <- training(churn_split)
    churn_test  <- testing(churn_split)
    
    model1 <- glm(Attrition ~ MonthlyIncome, family = "binomial", data = churn_train) # prob. of attrition on income
    model2 <- glm(Attrition ~ OverTime, family = "binomial", data = churn_train) # prob. of attrition on overtime
    
    broom::tidy(model1) 
    # A tibble: 2 × 5
      term           estimate std.error statistic      p.value
      <chr>             <dbl>     <dbl>     <dbl>        <dbl>
    1 (Intercept)   -0.886    0.157         -5.64 0.0000000174
    2 MonthlyIncome -0.000139 0.0000272     -5.10 0.000000344 
    
    broom::tidy(model2)
    # A tibble: 2 × 5
      term        estimate std.error statistic  p.value
      <chr>          <dbl>     <dbl>     <dbl>    <dbl>
    1 (Intercept)    -2.13     0.119    -17.9  1.46e-71
    2 OverTimeYes     1.29     0.176      7.35 2.01e-13
    

    The error does not mean anything is wrong with your model. tidy() is an S3 generic: it looks at the class of its argument, here c("glm", "lm"), and dispatches to a matching method such as tidy.glm or tidy.lm, which turns the fit into a tidy data frame.

    Those methods are defined in the broom package, which your script never loads. One of the packages you attach, most likely rsample, re-exports the tidy() generic, which is why R finds a function called tidy at all, but it only supplies methods for its own objects, not for model fits. Before you restarted, broom was probably attached or loaded by something else in your session; after the restart it was not, so dispatch found no method for either "glm" or "lm" and stopped with "no applicable method".

    (You can type broom:::tidy.glm to look at the method and its arguments.)
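
    The dispatch behaviour described above can be reproduced in base R without broom. This is only a sketch: describe() is a hypothetical generic standing in for tidy(), and the built-in mtcars data replaces the attrition data so the snippet runs on its own.

    ```r
    # Hypothetical generic standing in for tidy(); not part of any package.
    describe <- function(x, ...) UseMethod("describe")

    # A small logistic regression on a built-in dataset (stand-in for model1).
    fit <- glm(am ~ wt, family = binomial, data = mtcars)
    class(fit)  # "glm" "lm"

    # No describe.glm, describe.lm or describe.default exists yet,
    # so dispatch fails in the same way tidy(model1) did.
    res <- tryCatch(describe(fit), error = function(e) conditionMessage(e))
    print(res)

    # Supplying a method for "lm" is enough: dispatch walks the class vector
    # from "glm" to "lm" and uses the first method it finds.
    describe.lm <- function(x, ...) "handled by describe.lm"
    describe(fit)
    ```

    The first call fails with a "no applicable method" message; once a method for any class in the vector exists, the same call succeeds. Loading broom plays the role of the describe.lm definition here.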

    You'll see that a model object of class glm will display the following results:

    summary(model1)
    Call:
    glm(formula = Attrition ~ MonthlyIncome, family = "binomial", # identifies the model
        data = churn_train)
    
    Coefficients:
                    Estimate Std. Error z value Pr(>|z|)    # calculates p-value.
    (Intercept)   -8.861e-01  1.572e-01  -5.636 1.74e-08 ***
    MonthlyIncome -1.386e-04  2.719e-05  -5.098 3.44e-07 ***
    ---
    Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
    
    (Dispersion parameter for binomial family taken to be 1) # Provides a dispersion parameter like `lm` objects
    
        Null deviance: 905.68  on 1027  degrees of freedom 
    Residual deviance: 870.83  on 1026  degrees of freedom
    AIC: 874.83 # Provides a statistic (AIC instead of the R-squared and the F-statistic)
    
    Number of Fisher Scoring iterations: 5
    

    So tidy(model1) works for both glm and lm fits as long as broom's methods are available. Without broom, R is correct to report that it has no tidy method for objects of class c("glm", "lm").

    The simple fix is to add library(broom) near the top of your script, or to call broom::tidy(model1) explicitly.
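
    As a quick check that the method is available after loading broom, something like the following should work. It assumes broom is installed and again uses mtcars as a stand-in for the attrition data.

    ```r
    library(broom)  # provides tidy.glm and tidy.lm

    # Stand-in model; replace with model1 from the question.
    fit <- glm(am ~ wt, family = binomial, data = mtcars)

    # TRUE once a tidy method for class "glm" is registered.
    has_method <- !is.null(getS3method("tidy", "glm", optional = TRUE))
    print(has_method)

    # Both forms dispatch to the same method.
    out <- tidy(fit)
    out_explicit <- broom::tidy(fit)
    print(out)
    ```

    Calling broom::tidy() with the namespace prefix has the side benefit that the script keeps working after a restart even if the library() call is forgotten.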