
Using glue inside purrr::map in an R Markdown document rendered in a new environment


Consider the following R Markdown document:

````markdown
---
title: "Environments"
author: "Me"
date: "2023-01-13"
output: html_document
---

```{r setup}
library(glue)
library(purrr)
```

```{r vars}
a <- 1
x <- list("`a` has the value: {a}")
```

```{r works}
glue(x[[1L]])
```

```{r does-not-work, error = TRUE}
map_chr(x, glue)
```
````

When using RStudio's knit button, everything works like a charm and the output is as follows:

[Image: the rendered HTML file, in which all chunks are rendered properly]

However, if I call rmarkdown's render() myself with my own environment, it fails:

```r
library(rmarkdown)
ne <- new.env()
render("env.Rmd", envir = ne)
```

[Image: the rendered HTML file, in which the `does-not-work` chunk produces an error]

So apparently glue trips over the environments when used within purrr::map.

How can I call render() with my own environment without generating this error? Ideally, I do not want to change the R Markdown document itself.

Update

Interestingly enough, if I wrap glue in my own function, things work smoothly again:

```
glue <- function(...) glue::glue(...)
map_chr(x, glue)
```
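
The effect of such a wrapper can be reproduced in base R. The sketch below is an illustration under stated assumptions, not a diagnosis of glue itself: it uses get() as a stand-in for glue(), since both look names up in the environment they are called from by default (glue via `.envir = parent.frame()`), which keeps the example runnable without glue or purrr installed. The names `e`, `answer` and `get_wrapper` are hypothetical.

```r
# Hypothetical environment standing in for the one passed to render()
e <- new.env()
assign("answer", 42, envir = e)

# Passing get directly: get is called from lapply's frame, whose parents
# are namespace:base and then .GlobalEnv, so `answer` in e is never seen.
direct <- tryCatch(
  local(lapply("answer", get), envir = e),
  error = function(err) conditionMessage(err)
)

# Passing a wrapper created inside e: get is now called from the wrapper's
# frame, whose enclosing environment is e, so the lookup reaches e.
wrapped <- local({
  get_wrapper <- function(...) get(...)
  lapply("answer", get_wrapper)
}, envir = e)

print(direct)   # the "object 'answer' not found" error message
print(wrapped)  # a list containing 42
```

The wrapper helps because it is created inside `e`, which makes `e` its enclosing environment; presumably the same holds for `glue <- function(...) glue::glue(...)` when it is defined inside the environment that render() evaluates the document in.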

Update 2

The problem seems not to be related to knitr/rmarkdown, but to be a general scoping issue that has to do with the environments in which the involved functions are defined:

```r
library(rlang)
library(purrr)
library(glue)
rm(list = ls())

e <- env(a = 1, x = "`a` has the value: {a}")
delayedAssign("res", map_chr(x, glue), e, e)
e$res
# Error:
# ℹ In index: 1.
# Caused by error:
# ! object 'a' not found

## as opposed to

a <- 1
x <- "`a` has the value: {a}"
delayedAssign("res", {
  map_chr(x, glue)
})
res
# [1] "`a` has the value: 1"
```

Solution

  • This has nothing to do with either RMarkdown or ‘glue’. It’s also not a bug, contrary to what I claimed previously. In fact, the issue can be reproduced by simply accessing a variable inside the environment e, e.g. via the get function:

    library(rlang)  # for env()
    e = env(a = 1)
    local(lapply("a", get), envir = e)
    # Error in FUN(X[[i]], ...) : object 'a' not found
    

    This is a consequence of R’s lexical scoping rules:

    lapply executes FUN (= get) inside its call frame.¹ Due to the way R scoping works,² FUN will look up variable names in its calling scope: get does this by default, and so does glue, whose .envir argument defaults to parent.frame(). This calling scope is the lapply call frame. Of course, a does not exist in the lapply call frame (by contrast, X and FUN do exist there, since they are parameter names of lapply).
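
    The claim that lapply calls FUN from inside its own frame can be checked directly. The sketch below uses a small hypothetical function, caller_frame, that simply returns parent.frame(), i.e. the environment it was called from (the same default that glue uses for .envir):

    ```r
    # Hypothetical helper: returns the environment it was called from
    caller_frame <- function(x) parent.frame()

    frame <- lapply(1, caller_frame)[[1]]

    # The calling frame holds lapply's own arguments X and FUN ...
    print(exists("X", envir = frame, inherits = FALSE))    # TRUE
    print(exists("FUN", envir = frame, inherits = FALSE))  # TRUE
    # ... and its parent is the environment lapply was defined in
    print(identical(parent.env(frame), environment(lapply)))  # TRUE
    ```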

    If R does not find a name in the local scope, it continues searching “upwards”, in the parent environment of the current environment. The parent environment of a call frame is the environment in which the function was defined. In the case of lapply, this is namespace:base.

    namespace:base also does not define the name a, so the search continues upwards. Its parent environment is .GlobalEnv.³ And that is why lapply("a", get) works (purely by accident!) if we defined a inside the global environment.⁴ However, in our case, where we defined a inside another environment, that environment is never searched unless we attach() it to the search path (but of course that’s a bad idea).
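
    The start of this lookup chain can be inspected with environment() and parent.env(). A minimal sketch; the environment e here is a fresh, hypothetical one:

    ```r
    # Enclosing environment of lapply: the base namespace
    base_ns <- environment(lapply)
    print(base_ns)               # <environment: namespace:base>
    print(isNamespace(base_ns))  # TRUE

    # Its parent is the global environment, so .GlobalEnv is searched next,
    # followed by the attached packages on the search path
    print(identical(parent.env(base_ns), globalenv()))  # TRUE

    # An environment created with new.env() is not on this chain unless it
    # is attached, so names defined only there are never reached
    e <- new.env()
    on_chain <- FALSE
    cur <- base_ns
    while (!identical(cur, emptyenv())) {
      if (identical(cur, e)) on_chain <- TRUE
      cur <- parent.env(cur)
    }
    print(on_chain)  # FALSE
    ```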

    The workaround is to invoke the function (glue, get, or whatever else needs to access local variables) inside an anonymous function. Strictly speaking, we should always do this, not just when working in a different environment:

    local(lapply("a", \(.) get(.)), envir = e)
    # [[1]]
    # [1] 1
    

    This works because the anonymous function \(.) get(.) is defined inside the calling scope which, in this example, is e. So when lapply executes this function, get first searches for the name a in the local scope of the anonymous function, doesn’t find it, and then walks up the chain of parent environments. The first of these parent environments is the environment in which the anonymous function was defined: e.
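
    A minimal sketch that checks this directly, writing the anonymous function as function(.) get(.), which is what the \(.) shorthand (available since R 4.1.0) stands for; e is again a hypothetical environment:

    ```r
    e <- new.env()
    assign("a", 1, envir = e)

    # Create the anonymous function while evaluating inside e, as local() does
    f <- local(function(.) get(.), envir = e)

    # Its enclosing environment is e, so get() reaches e right after the
    # function's own frame
    print(identical(environment(f), e))  # TRUE
    print(lapply("a", f))                # a list containing 1
    ```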

    Note, however, that we need to take care with the choice of our parameter name! Because the scope of the anonymous function is the first one that is searched, it takes precedence and can hide our intended variable:

    # Works:
    local(lapply("a", \(.) get(.)), envir = e)
    # [[1]]
    # [1] 1
    
    # Fails:
    local(lapply("a", \(a) get(a)), envir = e)
    # [[1]]
    # [1] "a"
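
    A different option, not part of the original answer, is to avoid both the caller-environment default and the naming pitfall by passing the environment explicitly: lapply forwards extra arguments to FUN, and get accepts an envir argument. For glue, the corresponding argument is .envir (e.g. map_chr(x, glue, .envir = e)), although that particular call is not exercised in this sketch:

    ```r
    e <- new.env()
    assign("a", 1, envir = e)

    # lapply passes `envir = e` on to get(), so the lookup starts in e
    # regardless of where get() is called from
    res1 <- lapply("a", get, envir = e)

    # With an explicit envir, the parameter name no longer matters: the
    # argument `a` holds the string "a", but get() looks in e, not in the
    # anonymous function's own frame
    res2 <- lapply("a", function(a) get(a, envir = e))

    print(res1)  # a list containing 1
    print(res2)  # a list containing 1
    ```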
    

    ¹ In fact, lapply is only a thin R function that hands the actual loop to C code via .Internal(lapply(X, FUN)); but for the sake of this discussion we can pretend that it is defined entirely in R as follows:

    lapply = function (X, FUN, ...) {
      FUN = match.fun(FUN)
      if (! is.vector(X) || is.object(X)) X = as.list(X)
      res = vector('list', length(X))
    
      for (i in seq_along(X)) res[[i]] = FUN(X[[i]], ...)
      res
    }
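
    The real definition can be printed from R itself. A quick check, assuming a current R version in which lapply delegates its loop to .Internal(lapply(X, FUN)):

    ```r
    # Print the R-level body of base::lapply
    print(body(lapply))

    # The loop over X is delegated to C through .Internal()
    uses_internal <- any(grepl(".Internal(lapply(X, FUN))",
                               deparse(body(lapply)), fixed = TRUE))
    print(uses_internal)  # TRUE
    ```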
    

    Also note that I am using lapply instead of map_chr, but analogous reasoning applies to map_chr.

    ² I would like to emphasise that R’s scoping rules make perfect sense and are internally consistent, even though they are inconvenient in this case. In fact, lexical scoping is generally superior to other scoping rules.

    ³ I’ve argued before that this is in fact a bug in R. At the very least it is a seriously questionable design decision which leads to errors and misunderstandings, and this question is a prime example. For this reason, my package ‘box’ defines module environments differently, specifically to avoid this behaviour.

    ⁴ For example (and to illustrate that the original code really only worked purely by accident!), consider the following, where we change the variable name a to sum:

    sum = 1
    lapply("sum", get)
    # [[1]]
    # function (..., na.rm = FALSE)  .Primitive("sum")
    

    … oops! get didn’t return the value of the global variable we defined but rather a function defined in namespace:base, because namespace:base comes before .GlobalEnv in the chain of parent environments that are searched. And the same is true in the case of purrr::map_chr.
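
    The lookup order behind this can be confirmed by checking that namespace:base itself defines sum. A minimal sketch:

    ```r
    sum <- 1  # a global variable that happens to share a name with base::sum

    # namespace:base, which is searched before .GlobalEnv, already defines sum
    base_ns <- environment(lapply)
    print(exists("sum", envir = base_ns, inherits = FALSE))  # TRUE

    # So get(), called from lapply's frame, finds base::sum first
    found <- lapply("sum", get)[[1]]
    print(is.function(found))           # TRUE
    print(identical(found, base::sum))  # TRUE

    rm(sum)  # remove the global variable again
    ```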