r non-standard-evaluation

Call to weight in lm() within function doesn't evaluate properly


I'm writing a function that requires a weighted regression. I keep getting an error with the weights parameter; here is a minimal reproducible example:

wt_reg <- function(form, data, wts) {
  lm(formula = as.formula(form), data = data,
     weights = wts)
}

wt_reg(mpg ~ cyl, data = mtcars, wts = 1:nrow(mtcars))

This returns

Error in eval(extras, data, env) : object 'wts' not found

If you run the same lm() call at top level, outside the function, it works fine. I've dug into lm, and the issue appears to be the call to eval(mf, parent.frame()). Even though wts exists in parent.frame(), it doesn't seem to be found when the call is evaluated. Here's a little more detail:

mf is assigned such that it's the same as

stats::model.frame(formula = as.formula(form), data = data, weights = wts, 
    drop.unused.levels = TRUE)

When I run

parent.frame()$wts

it does return a numeric vector. But when I run

eval(stats::model.frame(formula = as.formula(form), data = data, weights = wts, 
    drop.unused.levels = TRUE), parent.frame()) 

it doesn't.

I can run

stats::model.frame(formula = as.formula(parent.frame()$form), 
    data = parent.frame()$data, weights = parent.frame()$wts, 
    drop.unused.levels = TRUE)

and it works. You can test this yourself if you want using the example from the top.

Any thoughts? I really have no idea what's going on here...


Solution

  • Formulas are special in R in that they not only keep track of symbol/variable names, they also keep track of the environment where they were created. Check out

    ff <- mpg ~ cyl
    environment(ff)
    # <environment: R_GlobalEnv>
    foo <- function() {
      ff <- mpg ~ cyl
      environment(ff)
    }
    foo()
    # <environment: 0x0000026172e505d8> private function environment (different each time)
    

    The problem is that lm looks up variables first in the data.frame you pass in and then in the environment where the formula was created, not in the frame that called lm. Since you write the formula in the call to wt_reg, which is made at top level, the formula holds on to the global environment. But wts only exists in the function's environment. You can alter your function to set the formula's environment to the local function environment, and then everything should work:

    wt_reg <- function(form, data, wts) {
      ff <- as.formula(form)
      environment(ff) <- environment()
      lm(formula = ff, data = data,
         weights = wts)
    }
    
    wt_reg(mpg ~ cyl, data = mtcars, wts = 1:nrow(mtcars))
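
    To see why the original version fails, it can help to inspect which environment the formula carries once it arrives inside the function. The following is a small diagnostic sketch, not part of the original code; check_formula_env is a hypothetical helper name.

    ```r
    # Hypothetical diagnostic helper: reports which environment the formula carries.
    check_formula_env <- function(form, wts) {
      ff <- as.formula(form)
      c(
        # The formula was written by the caller, so it carries the caller's environment.
        carries_caller_env = identical(environment(ff), parent.frame()),
        # It does not carry this function's environment, which is where wts lives.
        carries_local_env = identical(environment(ff), environment()),
        # After reassigning the environment (as in the fix above), it does.
        local_after_fix = {
          environment(ff) <- environment()
          identical(environment(ff), environment())
        }
      )
    }

    res <- check_formula_env(mpg ~ cyl, wts = 1:nrow(mtcars))
    print(res)
    ```

    The first and third entries should be TRUE and the second FALSE, which matches the explanation: until the environment is reassigned, the formula points at the caller, not at the function where wts is defined.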
    

    The eval(mf, parent.frame()) you are referring to in lm() is calling model.frame() with your formula. The description on the ?model.frame help page says: "All the variables in formula, subset and in ... are looked for first in data and then in the environment of formula (see the help for formula() for further details) and collected into a data frame". So it is again looking in the environment of the formula, not in the calling frame.
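
    The lookup rule quoted from ?model.frame can be checked directly, without lm. This is a minimal sketch under the assumption that no object named local_w exists in your global environment; make_formula and local_w are hypothetical names chosen for illustration.

    ```r
    make_formula <- function() {
      # local_w exists only inside this function's environment.
      local_w <- rep(2, nrow(mtcars))
      # The formula captures this function's environment when it is created.
      mpg ~ cyl
    }

    ff <- make_formula()

    # local_w is not a column of mtcars, so model.frame() falls back to
    # environment(ff), where it finds the vector defined above.
    mf <- model.frame(ff, data = mtcars, weights = local_w)
    head(mf[["(weights)"]])
    ```

    Here the weights are found even though local_w is invisible from the top level, because the formula carries the environment in which local_w was defined.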

    Alternatively, you could move the weights into the data object you pass to lm. Because variables are looked up in data first, this also works:

    wt_reg <- function(form, data, wts) {
      lm(form, data = cbind(data, wts=wts),
         weights = wts)
    }
    
    wt_reg(mpg ~ cyl, data = mtcars, wts = 1:nrow(mtcars))
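
    As a sanity check, both versions can be compared against a direct call to lm at top level. This is only a sketch: the names wt_reg_env and wt_reg_cbind are hypothetical, introduced here so that both variants can coexist in one session.

    ```r
    # Variant 1: point the formula's environment at the function environment.
    wt_reg_env <- function(form, data, wts) {
      ff <- as.formula(form)
      environment(ff) <- environment()
      lm(formula = ff, data = data, weights = wts)
    }

    # Variant 2: put the weights into the data so they are found there first.
    wt_reg_cbind <- function(form, data, wts) {
      lm(form, data = cbind(data, wts = wts), weights = wts)
    }

    w <- seq_len(nrow(mtcars))
    fit_direct <- lm(mpg ~ cyl, data = mtcars, weights = w)
    fit_env    <- wt_reg_env(mpg ~ cyl, data = mtcars, wts = w)
    fit_cbind  <- wt_reg_cbind(mpg ~ cyl, data = mtcars, wts = w)

    # Both wrappers should reproduce the coefficients of the direct fit.
    same_env   <- isTRUE(all.equal(coef(fit_direct), coef(fit_env)))
    same_cbind <- isTRUE(all.equal(coef(fit_direct), coef(fit_cbind)))
    print(c(same_env = same_env, same_cbind = same_cbind))
    ```

    One design consideration: with the first variant, any formula variable that is not a column of data is looked up starting from inside the wrapper rather than from where the formula was written. The second variant leaves the formula's environment untouched.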