I'm writing a function that requires a weighted regression. I keep getting an error with the weights parameter, and I've boiled it down to this minimal reproducible example:
wt_reg <- function(form, data, wts) {
lm(formula = as.formula(form), data = data,
weights = wts)
}
wt_reg(mpg ~ cyl, data = mtcars, wts = 1:nrow(mtcars))
This returns
Error in eval(extras, data, env) : object 'wts' not found
If you run this all separately, it works fine. I've dug into lm, and it appears the issue is a call to eval(mf, parent.frame()). Even though wts is in the parent.frame(), it doesn't appear to be evaluated correctly within the call. Here's a little more detail:
mf is assigned such that it's the same as
stats::model.frame(formula = as.formula(form), data = data, weights = wts,
drop.unused.levels = TRUE)
When I run
parent.frame()$wts
it does return a numeric vector. But when I run
eval(stats::model.frame(formula = as.formula(form), data = data, weights = wts,
drop.unused.levels = TRUE), parent.frame())
it doesn't.
I can run
stats::model.frame(formula = as.formula(parent.frame()$form),
data = parent.frame()$data, weights = parent.frame()$wts,
drop.unused.levels = TRUE)
and it works. You can test this yourself if you want using the example from the top.
Any thoughts? I really have no idea what's going on here...
Formulas are special in R in that they not only keep track of symbol/variable names, they also keep track of the environment where they were created. Check out
ff <- mpg ~ cyl
environment(ff)
# <environment: R_GlobalEnv>
foo <- function() {
ff <- mpg ~ cyl
environment(ff)
}
foo()
# <environment: 0x0000026172e505d8>
# (the function's private environment; the address is different on each call)
The problem is that lm will use the data.frame you pass in and the environment where the formula was created to look up variables, rather than the parent frame. Since you create the formula in the call to wt_reg, the formula holds on to the global environment. But wts only exists in the function's own environment. You can alter your function to set the formula's environment to the local function environment, and then everything should work:
wt_reg <- function(form, data, wts) {
ff <- as.formula(form)
environment(ff) <- environment()
lm(formula = ff, data = data,
weights = wts)
}
wt_reg(mpg ~ cyl, data = mtcars, wts = 1:nrow(mtcars))
The eval(mf, parent.frame()) you are referring to in lm() is calling model.frame() with your formula. The parent.frame() there only determines where the model.frame() call itself is evaluated; model.frame() captures the weights argument unevaluated and resolves it later on its own terms. From the description on the ?model.frame help page: "All the variables in formula, subset and in ... are looked for first in data and then in the environment of formula (see the help for formula() for further details) and collected into a data frame". So it is again looking in the environment of the formula, not the calling frame.
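To make the lookup rule above concrete, here is a small sketch. The helper names formula_env_check and mf_local are made up for illustration and are not part of stats. The first helper reports which environment the formula carries once it reaches the function; the second calls model.frame() directly after resetting that environment, so wts can be found.

```r
# Hypothetical helper: reports where the formula's environment points.
formula_env_check <- function(form) {
  caller <- parent.frame()   # the frame this function was called from
  ff <- as.formula(form)     # as.formula() keeps an existing formula's environment
  c(caller_env = identical(environment(ff), caller),
    local_env  = identical(environment(ff), environment()))
}

env_check <- formula_env_check(mpg ~ cyl)
env_check
# caller_env should be TRUE and local_env FALSE: the formula still points
# at the place where it was written, not at the function's own frame.

# Hypothetical helper: same fix as in wt_reg, applied to model.frame() directly.
mf_local <- function(form, data, wts) {
  ff <- as.formula(form)
  environment(ff) <- environment()  # now wts is visible from the formula's environment
  stats::model.frame(ff, data = data, weights = wts)
}

mf <- mf_local(mpg ~ cyl, data = mtcars, wts = 1:nrow(mtcars))
names(mf)  # the weights end up in a column named "(weights)"
```

Note that resetting the environment does not hide mpg or cyl, because those are found in data first.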
Alternatively, you could move the weights into the data object you are passing to lm itself. This would work:
wt_reg <- function(form, data, wts) {
lm(form, data = cbind(data, wts=wts),
weights = wts)
}
wt_reg(mpg ~ cyl, data = mtcars, wts = 1:nrow(mtcars))
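As a quick sanity check, the two versions above can be compared against a plain lm() call made where the weights are directly visible. This is only a sketch: the names wt_reg_env, wt_reg_col, w and direct are introduced here so both variants can live side by side, and the bodies are the same as the two wt_reg definitions in this answer.

```r
# Variant 1: point the formula's environment at the function's frame.
wt_reg_env <- function(form, data, wts) {
  ff <- as.formula(form)
  environment(ff) <- environment()
  lm(formula = ff, data = data, weights = wts)
}

# Variant 2: carry the weights inside the data itself.
wt_reg_col <- function(form, data, wts) {
  lm(form, data = cbind(data, wts = wts), weights = wts)
}

w <- 1:nrow(mtcars)

# Reference fit: w lives in the same environment as the formula, so lm() finds it.
direct <- lm(mpg ~ cyl, data = mtcars, weights = w)

# Both comparisons are expected to return TRUE.
all.equal(coef(wt_reg_env(mpg ~ cyl, data = mtcars, wts = w)), coef(direct))
all.equal(coef(wt_reg_col(mpg ~ cyl, data = mtcars, wts = w)), coef(direct))
```

One thing to keep in mind with the second variant: it assumes data does not already have a column called wts, since cbind() would then produce two columns with that name.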