r dataframe boolean addition

Boolean addition in R data frame produces a boolean instead of an integer


If I try to create a new column in an R dataframe by adding 3 boolean expressions in one step, it results in a boolean rather than an integer. If I use an intermediate step to first create columns for the 3 boolean expressions, I can add them up and get an integer. I don't understand why the two sets of code produce different results.

#The input is a dataframe with 3 variables that are sometimes missing
#and sometimes not.
subjid <- c(101,102,103,104,105,106,107,108)
var1 <- c(1,2,3,4,NaN,NaN,NaN,NaN)
var2 <- c(1,2,NaN,NaN,5,6,NaN,NaN)
var3 <- c(1,NaN,3,NaN,5,NaN,7,NaN)
df <- data.frame(subjid, var1, var2, var3)
df
  subjid var1 var2 var3
1    101    1    1    1
2    102    2    2  NaN
3    103    3  NaN    3
4    104    4  NaN  NaN
5    105  NaN    5    5
6    106  NaN    6  NaN
7    107  NaN  NaN    7
8    108  NaN  NaN  NaN
#This code was intended to count how many of the 3 variables were nonmissing
#But it produces an unexpected result
df$nonmissing_count_a <- !is.na(df$var1) + !is.na(df$var2) + !is.na(df$var3)
table(df$nonmissing_count_a)
FALSE  TRUE
    5     3
#This code is intended to obtain the same count of nonmissing variables
#And it works as expected
df$var1_nonmissing <- !is.na(df$var1)
df$var2_nonmissing <- !is.na(df$var2)
df$var3_nonmissing <- !is.na(df$var3)
df$nonmissing_count_b <- df$var1_nonmissing + df$var2_nonmissing + df$var3_nonmissing
table(df$nonmissing_count_b)
0 1 2 3
1 3 3 1

Solution

  • This happens because of operator precedence (see ?Syntax). Try wrapping each negation in parentheses:

    table((!is.na(df$var1)) + (!is.na(df$var2)) + (!is.na(df$var3)))
    
    0 1 2 3 
    1 3 3 1
    

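    The addition operator + has higher precedence than the negation operator !, so your original expression is parsed as !(is.na(df$var1) + !(is.na(df$var2) + !is.na(df$var3))). The outermost ! is applied last, which is why the result is logical (TRUE/FALSE) rather than a count.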
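
    The parsing rule can be checked on a small example. This is only a sketch: the vectors a and b are made-up stand-ins for the is.na() results, not data from the question.

    ```r
    # Toy logical vectors (illustrative values only, not from the question)
    a <- c(TRUE, FALSE, FALSE)
    b <- c(TRUE, TRUE, FALSE)

    # Without parentheses, ! binds more loosely than +,
    # so !a + !b is parsed as !(a + !b)
    unparenthesized <- !a + !b
    explicit_parse  <- !(a + !b)
    identical(unparenthesized, explicit_parse)  # TRUE

    # quote() exposes the parse tree: the outermost call is `!`, not `+`
    e <- quote(!a + !b)
    e[[1]]

    # With parentheses, each negation is evaluated first and then summed,
    # which coerces the logicals to integers
    parenthesized <- (!a) + (!b)
    class(unparenthesized)  # "logical"
    class(parenthesized)    # "integer"
    parenthesized           # 0 1 2
    ```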

    Keep in mind that what you actually want is a count, i.e. a sum of 1s and 0s (numeric), so you can also make the conversion explicit with as.numeric:

    table(as.numeric(!is.na(df$var1)) + 
          as.numeric(!is.na(df$var2)) + 
          as.numeric(!is.na(df$var3)))
    
    0 1 2 3 
    1 3 3 1
    

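    Alternatively, try rowSums, which counts the TRUE values in each row. Here df[,-1] drops the subjid column, so this assumes df still contains only subjid, var1, var2 and var3: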

    table(rowSums(!is.na(df[,-1])))
    
    0 1 2 3 
    1 3 3 1
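
If df already has extra columns (for example the helper columns created in the question), selecting the variables by name is one way to keep the rowSums approach from picking them up. A minimal sketch, where vars and nonmissing_count are illustrative names, not from the original post:

```r
# Rebuild the example data from the question
subjid <- c(101, 102, 103, 104, 105, 106, 107, 108)
var1 <- c(1, 2, 3, 4, NaN, NaN, NaN, NaN)
var2 <- c(1, 2, NaN, NaN, 5, 6, NaN, NaN)
var3 <- c(1, NaN, 3, NaN, 5, NaN, 7, NaN)
df <- data.frame(subjid, var1, var2, var3)

# Name the columns to count explicitly instead of relying on df[,-1]
vars <- c("var1", "var2", "var3")

# is.na() on a data frame returns a logical matrix; rowSums() counts TRUEs per row
df$nonmissing_count <- rowSums(!is.na(df[, vars]))

table(df$nonmissing_count)
```

Because the columns are chosen by name, rerunning this after adding more columns to df gives the same counts.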