r, lme4, drake-r-package

Fitting lmer models in a drake plan fails


I am trying to fit some lme4::lmer models in a drake plan, but I am getting this error:

'data' not found, and some variables missing from formula environment

If I substitute an lm model, it works.

Here is a reproducible example:

library(drake)
library(lme4)
#> Loading required package: Matrix
#> 
#> Attaching package: 'Matrix'
#> The following object is masked from 'package:drake':
#> 
#>     expand

plan_lm <- drake_plan(
  dat = iris,
  mod = lm(Sepal.Length ~ Petal.Length, data = dat)
)

make(plan_lm)
#> ℹ Consider drake::r_make() to improve robustness.
#> ▶ target dat
#> ▶ target mod

plan_lmer <- drake_plan(
  dat1 = iris,
  mod1 = lmer(Sepal.Length ~ Petal.Length, data = dat1)
)

make(plan_lmer)
#> ▶ target dat1
#> ▶ target mod1
#> x fail mod1
#> Error: target mod1 failed.
#> diagnose(mod1)$error$message:
#>   'data' not found, and some variables missing from formula environment
#> diagnose(mod1)$error$calls:
#>   lme4::lFormula(formula = Sepal.Length ~ Petal.Length, data = dat1, 
#>     control = list("nloptwrap", TRUE, 1e-05, TRUE, FALSE, list(
#>         "ignore", "stop", "ignore", "stop", "stop", "message+drop.cols", 
#>         "warning", "stop"), list(list("warning", 0.002, NULL), 
#>         list("message", 1e-04), list("warning", 1e-06)), list()))
#>   lme4:::checkFormulaData(formula, data, checkLHS = control$check.formula.LHS == 
#>     "stop")
#>   base::stop("'data' not found, and some variables missing from formula environment", 
#>     call. = FALSE)
Created on 2020-07-29 by the reprex package (v0.3.0)

Any suggestions?


Solution

  • This edge case is an instance of https://github.com/ropensci/drake/issues/1012 and https://github.com/ropensci/drake/issues/1163. drake creates its own environments to run commands, so the environment that holds dat is not the environment in which the model is actually fitted. drake does this for good reasons and the behavior is not going to change, so unless lme4 changes, this issue is unfortunately permanent. The best workaround I can offer is to build the formula at runtime inside the target's environment, so that the data and the formula are explicitly forced into the same environment, as in the reprex below. I recommend writing a custom function to do this.

    library(drake)
    suppressPackageStartupMessages(library(lme4))
    
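    fit_lmer <- function(dat) {
      # Take this function's environment and make sure `dat` is bound there.
      envir <- environment()
      envir$dat <- dat
      # Build the formula at runtime so that its environment is `envir`.
      f <- as.formula("Reaction ~ Days + (Days | Subject)", env = envir)
      lme4::lmer(f, data = dat)
    }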
    
    plan <- drake_plan(
      dat = sleepstudy,
      mod = fit_lmer(dat)
    )
    
    make(plan)
    #> ▶ target dat
    #> ▶ target mod
    

    Created on 2020-07-29 by the reprex package (v0.3.0)

    By the way, please consider avoiding the iris dataset if you can: https://armchairecology.blog/iris-dataset/
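
To see the underlying mechanism in isolation, here is a minimal base R sketch with no drake or lme4 involved. The names make_formula, data_env, and toy_dat are hypothetical and exist only for illustration. The snippet shows that a formula carries the environment it was created in, and that as.formula() with an explicit env argument lets you choose that environment. It is not a description of how drake or lme4 work internally.

```r
# Hypothetical helper: a formula created inside a function remembers
# that function's call frame as its environment.
make_formula <- function() {
  y ~ x
}

# A separate environment that holds the data (a stand-in for
# "wherever the data happens to live").
data_env <- new.env()
data_env$toy_dat <- data.frame(x = 1:5, y = c(2.1, 3.9, 6.2, 8.1, 9.8))

f <- make_formula()

# The data is not bound in the formula's own environment ...
in_formula_env <- exists("toy_dat", envir = environment(f), inherits = FALSE)  # FALSE
# ... but it is bound in data_env.
in_data_env <- exists("toy_dat", envir = data_env, inherits = FALSE)           # TRUE

# Building the formula from a string with an explicit `env` puts the
# formula and the data in the same environment.
g <- as.formula("y ~ x", env = data_env)
same_env <- identical(environment(g), data_env)                                # TRUE

print(c(in_formula_env, in_data_env, same_env))
```

Any code that looks for variables through environment(formula) will only succeed in the second case, which is why the workaround above constructs the formula inside the function that also holds dat.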