Tags: r, curly-braces, evaluate

Proper syntax for 'curly' brackets after the pipe operator in R


Forgive me if this is a simple question. I may be misunderstanding how curly brackets {} work in R, but I am seeing some odd behavior and would like to understand it so I can improve my programming. I am also not sure why the is.na call returns an unexpected result.

I have several columns of data with a number of NAs in one or more columns. After removing the rows containing NAs in one column, I want to check the data so that I know how many rows are left and can document that all the NAs are removed. I can do this in 3 separate lines, but am trying to use the pipe operator for simplicity.

library(magrittr)
library(dplyr)       #count() comes from dplyr, not magrittr

df <- data.frame(a=rnorm(10, 3, 5),   #create a quick data frame without any na values
                 b=rnorm(10, -3, 5))
df %>% head()        #works
df %>% count()       #works
df %>% sum(is.na())  #doesn't work - error
#Error in is.na() : 0 arguments passed to 'is.na' which requires 1

df %>% sum(is.na(.)) #returns random number (perhaps sum of all values) instead of zero??

Perhaps a separate question, but why doesn't the first one work, and why does the second one not return the result of the 'is.na' call? If I put curly braces around the third expression, it returns the correct value:

df %>% {             #works, but why is this different?
  sum(is.na(.))
}
#[1] 0

Now when I try and evaluate all 3, I don't understand the behavior I see:

df %>% {             #doesn't work - error
  head()
  count()
  sum(is.na())
}
# Error in checkHT(n, dx <- dim(x)) : 
#   argument "x" is missing, with no default
df %>% {             #returns appropriate na count of zero, but nothing else is evaluated
  head(.)
  count(.)
  sum(is.na(.))
}
# [1] 0
df %>% {             #returns first and third result, but not count(.)
  print(head(.))
  count(.)
  sum(is.na(.))
}
#    a           b
# 1  0.3555877  -7.29064483
# 2 -2.6278037   4.30943634
# 3  5.6163705 -10.31436769
# 4 -2.8920773  -4.83949384
# 5  9.0941861  -0.09287319
# 6  2.6118720 -11.86665105

# [1] 0
df %>% {             #returns all three like I want
  print(head(.))
  print(count(.))
  sum(is.na(.))
}
#    a           b
# 1  0.3555877  -7.29064483
# 2 -2.6278037   4.30943634
# 3  5.6163705 -10.31436769
# 4 -2.8920773  -4.83949384
# 5  9.0941861  -0.09287319
# 6  2.6118720 -11.86665105

#   n
# 1 10

# [1] 0

Thanks for any advice in how to interpret this behavior so I can improve my code for next time.


Solution

  • This stems from aspects of how braces behave both in magrittr and base R.

    First, why does df %>% sum(is.na(.)) return an unexpectedly large number, while df %>% {sum(is.na(.))} works as you expect? By default, %>% passes the left-hand side as the first argument of the function on the right-hand side, and this still happens when the . placeholder appears only inside a nested call. So df %>% sum(is.na(.)) is equivalent to sum(df, is.na(df)), which is the sum of every value in df plus the count of NAs (zero here). However, per the magrittr docs, this "behavior can be overruled by enclosing the right-hand side in braces." When the rhs is enclosed in braces, the lhs is only inserted where you explicitly add the . placeholder. So df %>% {sum(is.na(.))} is equivalent to sum(is.na(df)).
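
    The difference is easier to see with fixed numbers. This is a minimal sketch, assuming magrittr is installed; the small data frame below is made up for illustration and is not the random data from the question.

    ```r
    library(magrittr)

    # Illustrative data frame with known values and no NAs (hypothetical data)
    df <- data.frame(a = c(1, 2, 3), b = c(4, 5, 6))

    # No braces: df is still inserted as the first argument of sum()
    piped   <- df %>% sum(is.na(.))
    spelled <- sum(df, is.na(df))      # the same call written out by hand

    # Braces: df is used only where the dot appears
    braced  <- df %>% { sum(is.na(.)) }

    piped    # 21, the sum of all values plus zero NAs
    braced   # 0, the NA count alone
    ```

    Here piped and spelled agree, which shows that the unbraced pipe is adding up the data itself rather than only counting NAs.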

    Second, in

    df %>% {
      print(head(.))
      print(count(.))
      sum(is.na(.))
    }
    

    why do you have to wrap head(.) and count(.) in print(), but not sum()? Per the R docs, a block of expressions wrapped in { returns "the result of the last expression evaluated." So the result of sum(is.na(.)) is returned and automatically printed at the top level, while the results of the earlier expressions are computed and then discarded, so they must be explicitly print()ed.
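
    The same rule can be checked in base R without any pipe. This is a small sketch; the variable name result is only for illustration.

    ```r
    # A braced block evaluates every expression but returns only the last one
    result <- {
      1 + 1        # evaluated, value discarded
      "middle"     # evaluated, value discarded
      10 * 2       # last expression: this becomes the value of the block
    }

    result         # 20
    ```

    Only the final value survives, which is why the earlier head(.) and count(.) calls need print() to show anything.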

    Finally, you might be interested in the nakedpipe package, which adds more flexibility to using pipes with embraced blocks of functions.