rdataframedplyrnacombn

Count pairs of non-NA observations by row in selected columns


I have a dataframe:

  id    cog com emo
AUD-002 12  34  24
PAR-044 NA  28  38
BRE-019 0   NA  51
2-1-GRE NA  31  68

I am interested in counting non-NA values per row between all pairs of columns cog, com, emo

My required output is:

  id     cog com  emo cog-com cog-emo com-emo
AUD-002  12  34   24   1        1       1
PAR-044  NA  28   38   0        0       1
BRE-019  0   NA   51   0        1       0
2-1-GRE  NA  31   68   0        0       1

I found that the following question might be related: Count non-NA observations by row in selected columns but they count overall non-NA entries per row and not by pairs of columns of that row. Also, I can achieve this by using multiple statements like this:

library(dplyr)
df = df %>%
  mutate(count_cog_com = rowSums(!is.na(select(., 2:3))) - 1)

df = df %>%
  mutate(count_cog_emo = rowSums(!is.na(select(., 2,4))) - 1)

df = df %>%
  mutate(count_com_emo = rowSums(!is.na(select(., 3:4))) - 1)

But I don't want to use these on my actual data because I have several columns. Is there an easy dplyr way to achieve this functionality? Can these statements be joined somehow? Thank you fo your help!

The dput is as below:

dput(df)

structure(list(id = structure(c(2L, 4L, 3L, 1L), 
               .Label = c("2-1-GRE", "AUD-002", "BRE-019", "PAR-044"), 
               class = "factor"), 
               cog = c(12L, NA, 0L, NA), 
               com = c(34L, 28L, NA, 31L), 
               emo = c(24L, 38L, 51L, 68L)), 
           row.names = c(NA, -4L), class = "data.frame")

Solution

  • Here is a base R way.
    Function combn returns the combinations of its 1st argument and optionally applies a function to them. In this case it computes the row sums minus 1. The column names are then assigned in a similar way.

    df <-
      structure(list(
        id = structure(c(2L, 4L, 3L, 1L), 
                       .Label = c("2-1-GRE", "AUD-002", "BRE-019", "PAR-044"), 
                       class = "factor"), 
        cog = c(12L, NA, 0L, NA), 
        com = c(34L, 28L, NA, 31L), 
        emo = c(24L, 38L, 51L, 68L)), 
        row.names = c(NA, -4L), class = "data.frame")
    
    tmp <- combn(df[-1], 2, \(x) rowSums(!is.na(x)) - 1L)
    colnames(tmp) <- combn(names(df)[-1], 2, paste, collapse = "_")
    df <- cbind(df, tmp)
    rm(tmp)
    
    df
    #>        id cog com emo cog_com cog_emo com_emo
    #> 1 AUD-002  12  34  24       1       1       1
    #> 2 PAR-044  NA  28  38       0       0       1
    #> 3 BRE-019   0  NA  51       0       1       0
    #> 4 2-1-GRE  NA  31  68       0       0       1
    

    Created on 2022-10-15 with reprex v2.0.2


    Edit

    Answering to the request in comment, yes, it is possible. Have the anonymous function called by combn compute the logical && and coerce the result to integer. This will return a 0 if any of the values is NA and 1 otherwise.

    The line that needs to be changed is this:

    tmp <- combn(df[-1], 2, \(x) +apply(!is.na(x), 1, \(y) y[1] && y[2]))
    

    A complete code run:

    tmp <- combn(df[-1], 2, \(x) +apply(!is.na(x), 1, \(y) y[1] && y[2]))
    colnames(tmp) <- combn(names(df)[-1], 2, paste, collapse = "_")
    df <- cbind(df, tmp)
    rm(tmp)
    
    df
    #>        id cog com emo cog_com cog_emo com_emo
    #> 1 AUD-002  12  34  24       1       1       1
    #> 2 PAR-044  NA  NA  38       0       0       0
    #> 3 BRE-019   0  NA  51       0       1       0
    #> 4 2-1-GRE  NA  31  68       0       0       1
    

    Created on 2022-10-15 with reprex v2.0.2

    More readable but equivalent:

    tmp <- combn(df[-1], 2, \(x) {
      not_na <- apply(!is.na(x), 1, \(y) y[1] && y[2])
      as.integer(not_na)
    })