rconditional-statementsrecode

Why aren't my conditional ifelse recodes returning the expected values?


I'm creating a new variable that uses conditional "ifelse" statements based on the values of several source variables.

The source variables are a survey question that changed names over successive yearly editions of the survey. So when one of the variables has a value, the other variables, corresponding to other years of the survey, have missing ("NA") values.

The data are:

      A  B
[1,]  1 NA
[2,]  2 NA
[3,] NA NA
[4,] NA  1
[5,] NA  2

What I want to do is create "C" according to these rules:

 If A = 1 or B = 1, C = 1
 If A = 2 or B = 2, C = 2
 If A and B equal something other than 1 or 2, C = NA

This is what I get:

    A  B  C
1:  1 NA  1
2:  2 NA NA
3: NA NA NA
4: NA  1  1
5: NA  2 NA

So, in lines 2 and 5, C should be "2", not "NA".

Here's a reproducible example:

require(tidyverse)
require(dplyr)
require(data.table)

#Create Data
A <- as.numeric(c(1,2,NA,NA,NA))
B <- as.numeric(c(NA,NA,NA,1,2))
df_ <- cbind(A,B)

#Conditional Recode with dplyr
df2 <- df_ %>% data.frame() %>%
  mutate(C = ifelse(A == 1 | B == 1, 1, 
              ifelse(A == 2 | B == 2, 2, NA)))

#Conditional Recode with data.table()
df2 <- df_ %>% data.table() %>% 
  .[ , C := ifelse(A == 1 | B == 1, 1, 
            ifelse(A == 2 | B == 2, 2, NA))] 

I suspect it has something to do with the way R handles NAs.

Can anyone throw me a bone, here? Thanks! :)


Solution

  • Use %in%, not == when NA's are in the mix:

    df_ %>% data.frame() %>%
      mutate(C = ifelse(A %in% 1 | B %in% 1, 1, 
                        ifelse(A %in% 2 | B %in% 2, 2, NA)))
    
    #    A  B  C
    # 1  1 NA  1
    # 2  2 NA  2
    # 3 NA NA NA
    # 4 NA  1  1
    # 5 NA  2  2
    

    The reason is that == returns NA whenever the element is NA because equality comparison is undefined, so returns NA - whereas %in% is testing each element in a set (though here the "set" only has one value). Since there is no direct match, it returns FALSE. You can see this with a simpler example here:

    x <- c("A", "B", NA, "D")
    
    x == "A"
    # [1]  TRUE FALSE NA FALSE
    
    x %in% "A"
    # [1]  TRUE FALSE FALSE FALSE
    

    Also, you may want to use dplyr::case_when instead of nested ifelse:

    df_ %>% data.frame() %>%
      mutate(C = case_when(
        A %in% 1 | B %in% 1 ~ 1,
        A %in% 2 | B %in% 2 ~ 2
      ))
    
    #    A  B  C
    # 1  1 NA  1
    # 2  2 NA  2
    # 3 NA NA NA
    # 4 NA  1  1
    # 5 NA  2  2