I'm running R 4.4.1 and MASS 7.3-61 on a 14-inch MacBook Pro (Nov 2023) with macOS 14.6.
Here's a minimal reproducible example:
library(MASS)
set.seed(42)
x = rnorm(5)
y = rnorm(5)
df = data.frame(x, y)
lmod = lm(y~x, data=df)
boxcox(lmod)
This produces the error:
Error in model.frame.default(formula = y ~ x, data = df, drop.unused.levels = TRUE) :
'data' must be a data.frame, environment, or list
The variable df is clearly a data frame, so this error message is totally erroneous:
> class(df)
[1] "data.frame"
> is.data.frame(df)
[1] TRUE
I'm obviously specifying the model correctly, so that cause is not relevant. If I try the traceback() function, it yields the following:
16: stop("'data' must be a data.frame, environment, or list")
15: model.frame.default(formula = y ~ x + cat, data = df, drop.unused.levels = TRUE)
14: stats::model.frame(formula = y ~ x + cat, data = df, drop.unused.levels = TRUE)
13: eval(mf, parent.frame())
12: eval(mf, parent.frame())
11: lm(formula = y ~ x + cat, data = df, y = TRUE, qr = TRUE)
10: eval(call, parent.frame())
9: eval(call, parent.frame())
8: update.default(object, y = TRUE, qr = TRUE, ...)
7: update(object, y = TRUE, qr = TRUE, ...)
6: boxcox.lm(lmod, plotit = TRUE)
5: boxcox(lmod, plotit = TRUE) at test_boxcox.R#20
4: eval(ei, envir)
3: eval(ei, envir)
2: withVisible(eval(ei, envir))
1: source("~/Projects/non_repo_data/test_boxcox.R")
But going through the stats::model.frame.default function's source code did not reveal this stop command anywhere. I'm at a total loss for understanding why this is happening, or even whence the error is arising. Definitely feels like a bug, though.
tl;dr you have to name your data frame something other than df, so that it doesn't collide with a built-in R object.
The error itself arises from line 526 of src/library/stats/R/models.R.
This is arguably a bug, or at least an "infelicity" (sensu Bill Venables), in MASS::boxcox, but it is also an illustration of why it's good to avoid name overlaps between your variables and built-in objects. (I've submitted a bug report.)
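To make the name collision concrete, here is a minimal sketch showing what the built-in df is and that passing it as data makes model.frame() fail. The name builtin_df is only an illustrative variable, and the exact wording of the error message may vary between R versions.

```r
# Look up the built-in `df` directly in the stats namespace, so the result
# does not depend on whether a user-defined `df` exists in the workspace.
builtin_df <- get("df", envir = asNamespace("stats"))

is.function(builtin_df)    # stats::df is the F-distribution density function
is.data.frame(builtin_df)

# Handing that function to model.frame() as `data` raises an error;
# tryCatch() captures the message instead of stopping the script.
res <- tryCatch(
  model.frame(y ~ x, data = builtin_df),
  error = function(e) conditionMessage(e)
)
res
```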
Continuing with your example:
dff <- df ## rename your data frame
lmod <- lm(y~x, data=dff)
boxcox(lmod)
Error in boxcox.default(lmod) : response variable must be positive
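This error happens because you constructed a slightly inappropriate example (which was fine for showing what you wanted): the Box-Cox transformation requires a strictly positive response, and rnorm() produces negative values. Taking the absolute value of the response gets past that: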
lmod <- lm(abs(y)~x, data=dff)
boxcox(lmod) ## works
We can get a hint of what's going on by looking at the output of traceback():
12: stop("'data' must be a data.frame, environment, or list")
11: model.frame.default(formula = y ~ x, data = df, drop.unused.levels = TRUE)
10: stats::model.frame(formula = y ~ x, data = df, drop.unused.levels = TRUE)
9: eval(mf, parent.frame())
8: eval(mf, parent.frame())
7: lm(formula = y ~ x, data = df, y = TRUE, qr = TRUE)
6: eval(call, parent.frame())
5: eval(call, parent.frame())
4: update.default(object, y = TRUE, qr = TRUE, ...)
3: update(object, y = TRUE, qr = TRUE, ...)
2: boxcox.lm(lmod)
1: boxcox(lmod)
- boxcox is calling update() to make sure the fitted model has all the components it needs (especially the stored QR decomposition)
- update() is re-calling lm()
- lm() is calling model.frame()
- model.frame() is being evaluated in an environment where it sees the built-in df (in the stats package) before it sees your data frame (in .GlobalEnv)

It would take just a little more work than I feel like doing right now to establish exactly what all those parent.frame() invocations are seeing. From within the lm() call (you can get there by setting options(error = recover)), you can see that the enclosing environment of the parent frame, parent.frame()$enclos, is <environment: base>. I'm not quite sure how we get from there to <environment: namespace:stats>, which is where we're getting df from ...
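The general scoping effect can be sketched without MASS at all. This is not the actual chain of environments inside boxcox(), which I haven't traced; it only illustrates that the same name resolves differently depending on where the lookup starts. The helper what_is_df is hypothetical.

```r
# A user-defined data frame named `df`, as in the question.
df <- data.frame(x = 1:3, y = c(2, 4, 7))

# Hypothetical helper: report the class of whatever `df` resolves to.
what_is_df <- function() class(df)

# Lookup starts where the function was defined, so the data frame is found.
from_workspace <- what_is_df()

# Re-home the helper in the stats namespace (for illustration only):
# lookup now starts there and finds stats::df first.
environment(what_is_df) <- asNamespace("stats")
from_stats <- what_is_df()

from_workspace
from_stats
```

Renaming the data frame sidesteps the problem because no object called dff exists anywhere on the package side of the search path.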