Tags: r, dataframe, error-handling, tidyeval, nse

Why are these error messages inconsistent when using NSE in R?


Question

Why do I get inconsistent error messages in R from a function that uses non-standard evaluation, and what is the best way to handle this when writing the function?

Initial setup that works as intended:

library(tidyverse)

df <- tibble(a = c(1,4,9),
             b = c("a", "b", "c"))
> df
# A tibble: 3 × 2
      a b    
  <dbl> <chr>
1     1 a    
2     4 b    
3     9 c    

myFun1 <- function(df, y){
  df <- df %>%
    mutate(c = str_to_upper({{y}}))
  return(df)
}

> myFun1(df, y = b)
# A tibble: 3 × 3
      a b     c    
  <dbl> <chr> <chr>
1     1 a     A    
2     4 b     B    
3     9 c     C   

Everything above works as intended. There are no errors with my function calls as shown.

Problem starts here

I made a mistake and did not pass the data frame's column as an argument in the call, but I received no error message:

> myFun1(df)     # Note the missing argument 'y = b' in this call.
# A tibble: 3 × 3
      a b     c    
  <dbl> <chr> <chr>
1     1 a     ""   
2     4 b     ""   
3     9 c     ""   

I expected a call with a missing argument to throw an error, but it does not. (Side question: why does column c contain empty strings ""?)

I then tried to write my own error message. An error was raised, but not in the way I intended:

myFun2 <- function(df, y){
  if(is.null(y)){            
    stop("You are missing 'y'")
  }
  df <- df %>%
    mutate(c = str_to_upper({{y}}))
  return(df)
}

> myFun2(df)
Error in myFun2(df) : argument "y" is missing, with no default

The error above is raised when is.null(y) evaluates y, not by my if(){stop()} call, so you are NOT seeing my intended error message. What interests me, though, is that this is exactly the error message I was hoping to see when I called myFun1(df). So it seems that using {{y}} in the function prevents this error from being raised(?)

Note: I realized later, after more reading, that my stop() message is triggered if I use if(missing(y)), but I'm more curious about the case above, where I get the error message I originally hoped for without any if(){stop()} check.

What if I try a different function, like sum()?

I tried a different function, as below (shown working properly first):

myFun3 <- function(data, x){
  df <- data %>%
    mutate(c = sum({{x}}))
  return(df)
}

> myFun3(df, x = a)
# A tibble: 3 × 3
      a b         c
  <dbl> <chr> <dbl>
1     1 a        14
2     4 b        14
3     9 c        14

...and now with the same mistake as before:

> myFun3(df)
Error in `mutate()`:
ℹ In argument: `c = sum()`.
Caused by error in `sum()`:
! invalid 'type' (symbol) of argument
Run `rlang::last_trace()` to see where the error occurred.

So now I get an error from the sum() call because the wrong type, (symbol), is detected, and that symbol comes from my {{x}}. That makes sense (I think), but then why don't I get the same error from str_to_upper({{y}})? Isn't {{y}} also a symbol in my original bad call, myFun1(df)? I am trying to understand this in the context of learning non-standard evaluation, and I am reading Advanced R, but this issue seems quite specific. I appreciate any insight. Thank you.


Solution

  • The main problem is that is.null has to evaluate its argument to find out if the value of that argument is NULL. An object having a NULL value is not the same thing as that object not existing at all.

    For example, I don't have an object called bananas in my global environment, so I expect to get an error if I do:

    bananas
    #> Error: object 'bananas' not found
    

    Of course, the same is true if I try to look up the value of bananas to find out if it is numeric:

    is.numeric(bananas)
    #> Error: object 'bananas' not found
    

    Or if its value is NA:

    is.na(bananas)
    #> Error: object 'bananas' not found
    

    Or if its value is NULL:

    is.null(bananas)
    #> Error: object 'bananas' not found
    

    By contrast, if we assign the name bananas, even to NULL, there is no such error, because now an object called bananas exists, albeit one with a NULL value.

    bananas <- NULL
    bananas
    #> NULL
    is.null(bananas)
    #> [1] TRUE
    

    As you have discovered yourself, the correct approach is to check missing(y), which tests whether the argument was supplied without evaluating it:

    myFun2 <- function(df, y) {
      if(missing(y)) stop("Argument y is missing")
      mutate(df, c = str_to_upper({{y}}))
    }
    
    myFun2(df)
    #> Error in myFun2(df) : Argument y is missing
    

    As for why str_to_upper({{y}}) doesn't throw an error when y is missing while sum({{y}}) does: it comes down to how the curly-curly operator works. {{y}} is shorthand for !!enquo(y), and enquo() captures y without evaluating it, so a missing y is captured instead of triggering the "argument is missing" error. That missing value ends up as an empty symbol (the same type of object you get with alist(a=)$a). A symbol can be converted into a character string (in this case the empty string ""), so as.character, or any of the many functions that coerce to character, will happily return an empty string here. str_to_upper("") is simply "", which mutate() then recycles to every row of column c. Type-checking functions such as is.numeric or is.na will also return a result rather than an error, but they are testing a symbol, not your data.

    On the other hand, if you try to do any math with the missing variable, you are trying to perform arithmetic on a symbol, which doesn't make sense, hence the "invalid 'type' (symbol) of argument" error from sum().

    As for why the curly-curly operator produces a valid quosure even when its argument is missing, I can't say for sure whether this is an unavoidable side effect of the mechanism or a deliberate design decision, but your plan to handle the case of a missing argument yourself is a good idea either way.
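A small base-R check of the explanation above; nothing from the tidyverse is needed. It assumes, as the answer says, that a missing y ends up as the empty symbol, and it produces that object directly with alist(a = )$a. The helpers f_null and f_missing are hypothetical and simply mirror the is.null(y) and missing(y) checks. The empty symbol is written inline each time because binding it to a variable makes that variable itself behave like a missing argument.

```r
# It is a symbol...
is_sym <- is.symbol(alist(a = )$a)          # TRUE

# ...which coerces to character without complaint, as in myFun1(df)
chr_result <- as.character(alist(a = )$a)   # ""

# ...but arithmetic on it fails, as sum() did in myFun3(df)
sum_msg <- tryCatch(sum(alist(a = )$a), error = conditionMessage)
# "invalid 'type' (symbol) of argument"

# Missing is not the same as NULL: is.null() must evaluate y, so it errors,
# while missing() inspects the argument without evaluating it.
f_null <- function(y) is.null(y)
f_missing <- function(y) missing(y)

null_msg <- tryCatch(f_null(), error = conditionMessage)
# "argument \"y\" is missing, with no default"
missing_check <- f_missing()   # TRUE
null_check <- f_null(NULL)     # TRUE: an explicit NULL is a real value
```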
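If you would rather test the captured quosure than the raw argument, here is a sketch assuming dplyr, stringr and rlang are installed. myFun4 is a hypothetical name, the data frame is rebuilt with data.frame() so the example stands on its own, and rlang::quo_is_missing() reports whether a quosure wraps the missing argument. It behaves like the missing(y) check; it is mainly convenient when you already hold the quosure for other reasons.

```r
library(dplyr)
library(stringr)
library(rlang)

df <- data.frame(a = c(1, 4, 9), b = c("a", "b", "c"))

# Hypothetical variant of myFun1 that checks the captured quosure itself
myFun4 <- function(df, y) {
  # enquo() is what {{ y }} uses under the hood; it does not evaluate y,
  # so capturing a missing argument does not error at this point
  y_quo <- enquo(y)
  if (quo_is_missing(y_quo)) {
    abort("Argument `y` is missing; supply a column, e.g. `y = b`.")
  }
  mutate(df, c = str_to_upper(!!y_quo))
}

myFun4(df, y = b)   # column c is "A", "B", "C"
try(myFun4(df))     # fails with the custom message
```

The custom message could equally be raised with stop(); abort() is used here only because rlang is already loaded.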