Tags: r, formula, glm, non-standard-evaluation

Non-standard evaluation in formula argument


Problem

I would like to fit multiple logistic regression models in R for different values of i:

glm(mpg_20  ~ poly(horsepower, i), data = Auto)

My problem is that the formula stored in the call component of each resulting model object is always

mpg_20  ~ poly(horsepower, i)

whereas I would need it to be, e.g.:

mpg_20  ~ poly(horsepower, 1)

Hence, I would like the value of i to be substituted into the formula before it is passed to glm(). Does anybody have tips for getting around this non-standard evaluation problem?

Reproducible example

# Load Auto data
data("Auto", package = "ISLR2")

# Create a binary indicator variable
Auto$mpg_20 <- as.numeric(Auto$mpg < 20)

# Create a list of models
mlist <- lapply(
  1:3, 
  \(i) {
    glm(mpg_20  ~ poly(horsepower, i), data = Auto)
  }
)

# Print call objects
lapply(mlist, \(x) x$call)

Output

[[1]]
glm(formula = mpg_20 ~ poly(horsepower, i), data = Auto)

[[2]]
glm(formula = mpg_20 ~ poly(horsepower, i), data = Auto)

[[3]]
glm(formula = mpg_20 ~ poly(horsepower, i), data = Auto)

Desired output

[[1]]
glm(formula = mpg_20 ~ poly(horsepower, 1), data = Auto)

[[2]]
glm(formula = mpg_20 ~ poly(horsepower, 2), data = Auto)

[[3]]
glm(formula = mpg_20 ~ poly(horsepower, 3), data = Auto)

Session Info

R version 4.5.0 (2025-04-11)
Platform: x86_64-apple-darwin20
Running under: macOS Sequoia 15.5

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

time zone: Europe/Stockholm
tzcode source: internal

attached base packages:
[1] stats     graphics  grDevices utils     datasets 
[6] methods   base     

loaded via a namespace (and not attached):
[1] compiler_4.5.0 tools_4.5.0   

Solution

    # Fit an intercept-only base model that update() can refit with a new formula
    fit <- glm(mpg_20  ~ 1, data = Auto)
    
    # Create a list of models
    mlist <- lapply(
      1:3, 
      \(i) {
        update(fit, 
               reformulate(
                 sprintf("poly(horsepower, %d)", i), 
                 "mpg_20")
        )
      }
    )
    
    
    lapply(mlist, \(x) x$call)
    #[[1]]
    #glm(formula = mpg_20 ~ poly(horsepower, 1), data = Auto)
    #
    #[[2]]
    #glm(formula = mpg_20 ~ poly(horsepower, 2), data = Auto)
    #
    #[[3]]
    #glm(formula = mpg_20 ~ poly(horsepower, 3), data = Auto)
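
The reason this works can be sketched in isolation: reformulate() builds the formula from character strings, so sprintf() has already written the number into the text before any formula exists. The snippet below is only an illustration of that step, using the same variable names as the question; it needs no data because the formula is never evaluated.

```r
# Build the formula from a string so the degree is a literal number,
# not the symbol i
i <- 2
f <- reformulate(sprintf("poly(horsepower, %d)", i), response = "mpg_20")

# The deparsed formula contains 2 rather than i
deparse(f)
# "mpg_20 ~ poly(horsepower, 2)"

# Only the data columns remain as variables; i is no longer referenced
all.vars(f)
```

update() then refits the base model with this formula and stores it in the call of the new model object.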
    

    Alternatively:

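    mlist <- lapply(
      1:3, 
      \(i) {
        # substitute() replaces the symbol i with its value, then eval() runs the call.
        # as.numeric() makes the call print as 1 rather than 1L, since 1:3 is integer.
        eval(
          substitute(
            glm(mpg_20  ~ poly(horsepower, i), data = Auto), 
            list(i = as.numeric(i))
          )
        )
      }
    )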
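
A third variant along the same lines, shown here only as a sketch, uses bquote(), which substitutes anything wrapped in .() before the call is evaluated. To keep the example runnable without ISLR2, it assumes the built-in mtcars data as a stand-in for Auto, with hp in place of horsepower. The names cars, mlist2 and calls are hypothetical.

```r
# Assumption: mtcars (shipped with R) stands in for Auto, hp for horsepower
cars <- mtcars
cars$mpg_20 <- as.numeric(cars$mpg < 20)

# bquote() replaces .(...) with its value, then eval() runs the finished call.
# as.numeric() is used because 1:3 is integer and would deparse as 1L, 2L, 3L.
mlist2 <- lapply(
  1:3,
  \(i) eval(bquote(glm(mpg_20 ~ poly(hp, .(as.numeric(i))), data = cars)))
)

# Collect the stored calls as strings for inspection
calls <- vapply(
  mlist2,
  \(m) paste(deparse(m$call), collapse = " "),
  character(1)
)
print(calls)
```

As in the question, glm() is called without a family argument, so these fits use the default Gaussian family. For an actual logistic regression, family = binomial would have to be added to the call, and it would then also appear in the stored call.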