r lazy-evaluation

Why don't replacement functions use lazy evaluation?


Replacement functions, such as names<-, seem to not use lazy evaluation when called like names(x) <- c("a", "b").

To demonstrate, let's define a function to get the fractional part of a number and a corresponding replacement function - but inside the replacement function, include a line to print the defused value argument.

fractional <- function(x) { 
  x %% 1 
}

`fractional<-` <- function(x, value) {
  print(rlang::enexpr(value))
  invisible(x %/% 1 + value)
}

Now if we call fractional<- directly, it prints the expression we gave for value:

x <- 10.1
`fractional<-`(x, 0.2 + 0.2)
#> 0.2 + 0.2

But if we call it in the assignment form, it prints the result of evaluating the expression:

x <- 10.1
fractional(x) <- 0.2 + 0.2
#> [1] 0.4

The R Language Definition explains that a replacement call such as

names(x) <- c("a","b")

is equivalent to

`*tmp*` <- x
x <- "names<-"(`*tmp*`, value=c("a","b"))
rm(`*tmp*`)

But running that expansion by hand with fractional<- doesn't reproduce the behavior of the assignment form:

x <- 10.1

`*tmp*` <- x
x <- "fractional<-"(`*tmp*`, value=0.2 + 0.2)
rm(`*tmp*`)

#> 0.2 + 0.2

What is happening internally in <- that makes it so that value is passed to fractional<- after being evaluated, and is there any way to circumvent this behavior?


Edit: @SamR pointed out that using substitute captures the expression from the promise:

x <- 10.1
`fractional<-` <- function(x, value) {
  print(substitute(value))
  invisible(x %/% 1 + value)
}
fractional(x) <- 0.2 + 0.2

#> 0.2 + 0.2

So clearly I was mistaken to assume that value was being evaluated before being passed to fractional<-. However, I would still very much like to know why base::substitute works as expected here while rlang::enexpr and friends do not. After all, enexpr uses substitute internally:

enexpr <- function(arg) {
  .Call(ffi_enexpr, substitute(arg), parent.frame())
}

Debugging in RStudio shows that, whether it is called in assignment form as fractional(x) <- 0.2 + 0.2 or in prefix form as "fractional<-"(x, 0.2 + 0.2), fractional<- is passed an unevaluated promise for value:

[screenshot: RStudio debugger showing value on entry to fractional<-]

This remains unevaluated when called in prefix form:

[screenshot: RStudio debugger showing value after enexpr, prefix form]

But is evaluated after the call to enexpr when called in assignment form:

[screenshot: RStudio debugger showing value after enexpr, assignment form]

I'm wondering if it has to do with the fact that in the assignment form, the function is called by a primitive function, <-? But it's not clear why that would make a difference.


Solution

  • In R, the form

    f(x) <- y
    

    is known as complex assignment, a term which also applies to the various subsetting assignments such as:

    x$y <- y
    x[1] <- y
    x[[1]] <- y
    

    The R interpreter handles complex assignments in the underlying C code via the function applydefine, in the file src/main/eval.c.

    The code is difficult to follow without knowing some details of how R is implemented in C, but essentially, when the interpreter evaluates f(x) <- y, it rearranges the expression first. It appends <- to the function name f and builds a call to the function `f<-`. However, it does not simply rearrange the symbols and call `f<-`(x, y). Instead, it calls `f<-` with a temporary variable in place of x and a promise object in place of y. There are good reasons for this, as we will see below.

    Before we get into that, let's confirm the differences between calling the function directly and via complex assignment. We can write a function that simply prints the arguments it is called with and leaves x as it is:

    `f<-` <- function(x, value) {
      print(as.list(match.call()))
      invisible(x)
    }
    

    Calling this directly, we get no surprises:

    x <- 1
    `f<-`(x, "foo")
    #> [[1]]
    #> `f<-`
    #> 
    #> $x
    #> x
    #> 
    #> $value
    #> [1] "foo"
    

    But look what happens when we use the complex assignment syntax:

    f(x) <- "foo"
    #> [[1]]
    #> `f<-`
    #> 
    #> $x
    #> `*tmp*`
    #> 
    #> $value
    #> <promise: 0x0000024723abce98>
    

    We can see that x has been replaced by a variable called *tmp* and that "foo" has been replaced with a promise object.

    The *tmp* variable is referred to in the C code as R_TmpvalSymbol, and is written into the call as a way of holding intermediate calculations in case of nested complex assignments. This is described in the code comments:

    We need a temporary variable to hold the intermediate values in the computation. For efficiency reasons we record the location where this variable is stored. We need to protect the location in case the binding is removed from its environment by user code or an assignment within the assignment arguments
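
    To make the role of the temporary variable more concrete, here is a small sketch of a nested complex assignment expanded by hand, following the pattern given in the R Language Definition. The vectors x and y are made up for illustration, and the manual expansion only mimics what the interpreter does; it is not the actual C code path.

    ```r
    # Two identical named vectors (hypothetical example data).
    x <- c(a = 1, b = 2, c = 3)
    y <- x

    # Nested complex assignment, handled by the interpreter.
    names(x)[2] <- "B"

    # The same update written out by hand with an explicit temporary.
    `*tmp*` <- y
    y <- `names<-`(`*tmp*`, value = `[<-`(names(`*tmp*`), 2, value = "B"))
    rm(`*tmp*`)

    # Both routes should give the same result.
    identical(x, y)
    ```

    The inner call needs the current names of the object while the outer call needs the object itself, which is why a single stable name for the intermediate value is convenient.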

    As for the promise being used in place of the expression on the right hand side, the code comments explain that

    It's important that the rhs get evaluated first because assignment is right associative i.e. a <- b <- c is parsed as a <- (b <- c).

    However, R likes to use lazy evaluation (not evaluating code until it has to), and rather than passing the bare value of the right hand side, the C function applydefine wraps it in a promise. A promise object stores some code as a language object together with the environment in which that code should be evaluated (and, once forced, the resulting value). Doing almost anything with a promise will force its evaluation. Although it seems that complex assignment is not using lazy evaluation, it actually is; we just need to extract the language object from the promise before it is evaluated.
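
    As a rough illustration of how promises behave in general (separate from anything applydefine does), the sketch below builds one explicitly with delayedAssign. The names state, env and p are hypothetical and exist only for this example. It shows that substitute can read the stored code without forcing the promise, whereas fetching the value forces it.

    ```r
    # A small environment used as a counter for how often the code runs.
    state <- new.env()
    state$forced <- 0

    # Create a promise named "p" inside env; its code is not run yet.
    env <- new.env()
    delayedAssign(
      "p",
      { state$forced <- state$forced + 1; 0.2 + 0.2 },
      eval.env = environment(),
      assign.env = env
    )

    # substitute() reads the expression stored in the promise.
    code <- substitute(p, env)
    forced_before <- state$forced   # the counter has not moved

    # Fetching the value forces the promise, which runs the code once.
    value <- get("p", envir = env)
    forced_after <- state$forced    # the counter has been incremented
    ```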

    Fortunately, as SamR mentions in the comments, this is exactly what substitute does if you give it a promise: it returns the expression stored in the promise rather than its value (if you are interested, this happens in the C code behind substitute).

    `f<-` <- function(x, value) {
      substitute(value)
    }
    
    x <- 1
    f(x) <- 0.2 + 0.2
    x
    #> 0.2 + 0.2
    

    The reason why enexpr doesn't work here is that the argument being passed to value in the complex assignment is not the unevaluated language object 0.2 + 0.2, but a promise comprising both the language object 0.2 + 0.2 plus the evaluation environment. It is therefore this promise object rather than the raw code that is returned from rlang:::ffi_enexpr, the C function called from inside enexpr. Returning the promise forces its evaluation, so we get the numeric value 0.4 instead of the language object 0.2 + 0.2.
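
    Putting this together for the function in the question, a minimal sketch of a workaround is to capture the expression with substitute before value is used. The captured environment is a hypothetical helper added here only so that the expression seen in each call form can be compared afterwards.

    ```r
    # Hypothetical store for the most recently captured expression.
    captured <- new.env()

    `fractional<-` <- function(x, value) {
      # Grab the code behind `value` before anything forces it.
      captured$expr <- substitute(value)
      x %/% 1 + value
    }

    # Assignment form.
    x <- 10.1
    fractional(x) <- 0.2 + 0.2
    expr_assignment <- captured$expr

    # Prefix form.
    y <- `fractional<-`(10.1, 0.2 + 0.2)
    expr_prefix <- captured$expr

    # Both forms should see the same unevaluated expression.
    identical(expr_assignment, expr_prefix)
    ```

    Unlike the versions that only print, this one returns the updated number visibly, so x ends up holding the new value after the assignment.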