I am trying to fit some lme4::lmer models in a drake plan, but I am getting this error:
'data' not found, and some variables missing from formula environment
If I substitute an lm model, it works.
Here is a reproducible example:
library(drake)
library(lme4)
#> Loading required package: Matrix
#>
#> Attaching package: 'Matrix'
#> The following object is masked from 'package:drake':
#>
#> expand
plan_lm <- drake_plan(
  dat = iris,
  mod = lm(Sepal.Length ~ Petal.Length, data = dat)
)
make(plan_lm)
#> ℹ Consider drake::r_make() to improve robustness.
#> ▶ target dat
#> ▶ target mod
plan_lmer <- drake_plan(
  dat1 = iris,
  mod1 = lmer(Sepal.Length ~ Petal.Length, data = dat1)
)
make(plan_lmer)
#> ▶ target dat1
#> ▶ target mod1
#> x fail mod1
#> Error: target mod1 failed.
#> diagnose(mod1)$error$message:
#> 'data' not found, and some variables missing from formula environment
#> diagnose(mod1)$error$calls:
#> lme4::lFormula(formula = Sepal.Length ~ Petal.Length, data = dat1,
#> control = list("nloptwrap", TRUE, 1e-05, TRUE, FALSE, list(
#> "ignore", "stop", "ignore", "stop", "stop", "message+drop.cols",
#> "warning", "stop"), list(list("warning", 0.002, NULL),
#> list("message", 1e-04), list("warning", 1e-06)), list()))
#> lme4:::checkFormulaData(formula, data, checkLHS = control$check.formula.LHS ==
#> "stop")
#> base::stop("'data' not found, and some variables missing from formula environment",
#> call. = FALSE)
Created on 2020-07-29 by the reprex package (v0.3.0)
Any suggestions?
This edge case is an instance of https://github.com/ropensci/drake/issues/1012 and https://github.com/ropensci/drake/issues/1163. drake creates its own environments to run commands, so the environment with dat is different from the environment where the model actually runs. There are good reasons drake does this, and the behavior is not going to change, so this issue is unfortunately permanent unless lme4 changes.
The best workaround I can offer is to create the formula in the target's environment at runtime, something like the reprex below. You have to manually force the data and the formula to be in the same environment. I recommend writing a custom function to do this.
library(drake)
suppressPackageStartupMessages(library(lme4))
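fit_lmer <- function(dat) {
  # The function's own environment, where the argument dat lives.
  envir <- environment()
  # Make the binding explicit so the data and the formula share one environment.
  envir$dat <- dat
  # Build the formula at runtime and attach it to that same environment.
  f <- as.formula("Reaction ~ Days + (Days | Subject)", env = envir)
  lme4::lmer(f, data = dat)
}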
plan <- drake_plan(
  dat = sleepstudy,
  mod = fit_lmer(dat)
)
make(plan)
#> ▶ target dat
#> ▶ target mod
Created on 2020-07-29 by the reprex package (v0.3.0)
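To see why building the formula inside the function helps, here is a minimal base R sketch (no drake or lme4 needed). It only illustrates that a formula remembers the environment it was created in. The names make_formula and toy are hypothetical and exist purely for this illustration; this is not what drake or lme4 do internally.

```r
# Hypothetical helper: builds a formula inside its own call frame.
make_formula <- function(dat) {
  # dat is an argument, so it lives in this function's environment;
  # the formula is attached to that same environment.
  as.formula("y ~ x", env = environment())
}

# Toy data, made up for the illustration.
toy <- data.frame(x = 1:3, y = c(2, 4, 6))

f_inside <- make_formula(toy)  # formula environment contains dat
f_outside <- y ~ x             # formula environment is wherever this line runs

# Look for dat directly in each formula's environment (no parent lookup).
found_inside <- exists("dat", envir = environment(f_inside), inherits = FALSE)
found_outside <- exists("dat", envir = environment(f_outside), inherits = FALSE)

print(found_inside)   # TRUE
print(found_outside)  # FALSE
```

In fit_lmer above, the formula and dat sit together in the function's environment in the same way, which appears to be what lets lmer find the data.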
By the way, please consider avoiding the iris dataset if you can: https://armchairecology.blog/iris-dataset/