Why does `x[i]` return `x`, instead of raising a "recursive default argument reference" error, when `i` defaults to `i`?

The function below throws an error because of a recursive default argument reference. This is intended behavior: default arguments are evaluated inside the scope of the function, so the default i refers to the argument i itself, not to the global i.

i = 1
f1 = function(i=i) i
f1() # Error: promise already under evaluation: recursive default argument reference
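A minimal sketch of that scoping rule. The contrast function `f1b` and its argument name `idx` are hypothetical, added only for illustration: because the function frame has no local binding named `i`, the default `i` is looked up in the enclosing (global) environment instead of referring back to itself.

```r
i <- 1

# Default `i = i` is evaluated in f1's own frame, so it refers to the argument itself
f1 <- function(i = i) i

# Hypothetical contrast: argument `idx` has no local `i`, so the default
# expression `i` is found in the enclosing (global) environment
f1b <- function(idx = i) idx

res <- tryCatch(f1(), error = function(e) e)
inherits(res, "error")  # TRUE: the self-referential default fails
f1b()                   # 1: the global `i`
```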

But if i is used as a single-bracket index, there is no error, and the function returns x instead of the intended x[i]:

x = 1:5
i = 1:3
f2 = function(x, i=i) x[i]
f2(x) # No error, but returns x, not x[i]

This is a silent error that leads to subtle bugs. For example, in machine learning we may think we are subsetting the training set (x[i]), but are actually using both the training and testing sets (x). The behavior occurs even if the variable i doesn't exist:

x = 1:5
if (exists("i")) rm(i)
f3 = function(x, i=i) x[i]
f3(x) # No error, but returns x, even if i doesn't exist.
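One way to see what happens is to ask `missing()` inside the same kind of function. This is a diagnostic sketch; `f2_missing` is a hypothetical helper name. It suggests that `i` is treated as missing whether or not a global `i` exists, so the result is the same as `x[]`.

```r
x <- 1:5

# Hypothetical diagnostic: report whether `i` counts as missing, and what x[i] returns
f2_missing <- function(x, i = i) list(missing = missing(i), value = x[i])

i <- 1:3                         # a global `i` exists ...
with_global <- f2_missing(x)

rm(i)                            # ... and now it does not
without_global <- f2_missing(x)

with_global$missing              # TRUE
without_global$missing           # TRUE
identical(with_global$value, x)  # TRUE: same as x[]
```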

The behavior is more reasonable when i is inside double brackets ([[ instead of [), which throws a missing subscript error:

x = 1:5
i = 1:3
f4 = function(x, i=i) x[[i]]
f4(x) # Error: missing subscript

My questions are:

Is the behavior of f2 and f3 intended or is it a bug?
If it is not a bug, then can someone explain the reasoning behind why it is intended? I briefly looked over the R source code for subsetting, but my knowledge of C is not enough to understand the behavior of f2.

Solution

This is an edit to my original answer:

I think it's not a bug, though you can argue that it's a design flaw.

First, there's an easy explanation for the difference between x[i] and x[[i]] behaviour: x[] is legal, and returns x. x[[]] is not legal, because it says to extract something, but doesn't say what.
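A quick check of that difference, using only base R:

```r
x <- 1:5

# An empty single-bracket index selects everything
identical(x[], x)       # TRUE

# An empty double-bracket index is an error: nothing says which element to extract
res <- tryCatch(x[[]], error = function(e) e)
inherits(res, "error")  # TRUE
```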

Now, why did I say it is not a bug? Take a look at this example, a little simpler than yours, and not using the primitive function [:

f <- function(farg = farg) 
  if (missing(farg)) 
    message("Not an error:  farg is missing")

g <- function(garg)
  f(garg)

g()
#> Not an error:  farg is missing

Created on 2023-12-05 with reprex v2.0.2

The function f() tests its argument using missing(farg). This never evaluates the default value; it only reports whether the argument was missing. So f() never ends up trying to evaluate farg.
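A small variation on f and g makes both halves of this visible. The names `f_check`, `g_check` and `f_force` are hypothetical: the first pair returns the result of missing() instead of printing a message, and `f_force` actually evaluates the self-referential default.

```r
# Like f and g above, but f_check returns missing(farg) instead of printing
f_check <- function(farg = farg) missing(farg)
g_check <- function(garg) f_check(garg)

g_check()  # TRUE: missingness propagates from garg to farg

# Evaluating the self-referential default does fail,
# which is exactly the step that missing() avoids
f_force <- function(farg = farg) farg
res <- tryCatch(f_force(), error = function(e) e)
inherits(res, "error")  # TRUE
```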

The function [ is like f: if the index is missing, it just returns the whole vector and doesn't try to evaluate the index. Since it never tries to evaluate i, it doesn't generate an error.
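The same thing happens with an index argument that has no default at all. In this sketch, `h` is a hypothetical helper: a missing `j` reaches [ as an empty index, just like x[].

```r
x <- 1:5

# Hypothetical helper: `j` has no default
h <- function(x, j) x[j]

identical(h(x), x)      # TRUE: missing `j` behaves like x[]
identical(h(x, 2), 2L)  # TRUE: a supplied index is evaluated normally
```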

New addition:

But this explanation is incomplete. @Roland suggested looking at code like the following (I've changed the names to match my example more closely):

g1 <- function(x, i=1) 
  x[i]
g1(1:5)
#> [1] 1

Created on 2023-12-05 with reprex v2.0.2

Here i is missing in the call to g1(), but now it has a default value. When R evaluates x[i], it checks whether i is missing. That check happens in the context of g1(), where missing(i) would return TRUE; but because i has a non-missing default, the missingness doesn't propagate. The default value 1 is substituted for i, and R ends up evaluating x[1].
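A sketch that shows both facts at once; `g1_check` is a hypothetical extension of g1 that also reports missing(i):

```r
# Like g1 above, but also reports what missing(i) says
g1_check <- function(x, i = 1) list(missing = missing(i), value = x[i])

out <- g1_check(1:5)
out$missing  # TRUE: missing() reports that the default is being used
out$value    # 1: but [ sees a usable default and evaluates it
```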

Now what if the default had been i = i, as in the original question? In that case, when determining whether i is missing in x[i], R substitutes the default value (the symbol i) and checks that instead. It finds that yes, i is missing, so x[i] returns the same thing as x[].

So why is it a design flaw? My first answer is that there's a question of whether missingness should propagate through function calls. "Obviously" the argument is not missing in f(garg), and yet R sees it as missing because garg was missing. You would get the results you expected if missingness depended on the form of the call, not the value in it. But that's not how R works.

My second answer is that it's a flaw because propagation of missingness is handled in a pretty subtle way. Arguments with default values are replaced with their default before deciding what to propagate.
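A sketch of that substitution when the default names a different argument. The function `g3` and its argument `j` are hypothetical: `i` defaults to `j`, and `j` has no default, so when the call omits `j`, [ treats `i` as missing.

```r
x <- 1:5

# Hypothetical: `i` defaults to another argument `j`, which has no default
g3 <- function(x, j, i = j) x[i]

identical(g3(x), x)          # TRUE: i -> j, j is missing, so x[i] acts like x[]
identical(g3(x, j = 2), 2L)  # TRUE: once j is supplied, the default is usable
```

Silently returning all of x here, rather than raising an error, is the same pattern as f2 in the question.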