I have a dataframe:
id cog com emo
AUD-002 12 34 24
PAR-044 NA 28 38
BRE-019 0 NA 51
2-1-GRE NA 31 68
For each row, I want to check every pair of the columns cog, com and emo and record whether both values in the pair are non-NA (1) or not (0).
My required output is:
id cog com emo cog-com cog-emo com-emo
AUD-002 12 34 24 1 1 1
PAR-044 NA 28 38 0 0 1
BRE-019 0 NA 51 0 1 0
2-1-GRE NA 31 68 0 0 1
I found a possibly related question, "Count non-NA observations by row in selected columns", but it counts the overall non-NA entries per row, not per pair of columns. I can get the result with one statement per pair, like this:
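library(dplyr)
# rowSums() counts the non-NA cells in the two selected columns;
# subtracting 1 gives 1 when both are present and 0 when only one is
df = df %>%
mutate(count_cog_com = rowSums(!is.na(select(., 2:3))) - 1)
df = df %>%
mutate(count_cog_emo = rowSums(!is.na(select(., 2,4))) - 1)
df = df %>%
mutate(count_com_emo = rowSums(!is.na(select(., 3:4))) - 1)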
But I don't want to do this on my actual data because I have several columns. Is there an easy dplyr way to achieve this? Can these statements be combined somehow? Thank you for your help!
The dput is as below:
dput(df)
structure(list(id = structure(c(2L, 4L, 3L, 1L),
.Label = c("2-1-GRE", "AUD-002", "BRE-019", "PAR-044"),
class = "factor"),
cog = c(12L, NA, 0L, NA),
com = c(34L, 28L, NA, 31L),
emo = c(24L, 38L, 51L, 68L)),
row.names = c(NA, -4L), class = "data.frame")
Here is a base R way.
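Function combn returns the combinations of its first argument and optionally applies a function to each of them. Here the first argument is df[-1], so each combination is a two-column data frame, and the function computes the row sums of its non-NA indicators minus 1. The column names are built the same way, by pasting each pair of names together with "_". Note that the \(x) shorthand for function(x) requires R 4.1.0 or later.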
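To see what combn passes to the function, here is a small illustration on a hypothetical two-row data frame called demo (not part of the original data):

```r
# On a character vector, FUN receives each pair of names
combn(c("cog", "com", "emo"), 2, paste, collapse = "_")
# gives the three names "cog_com", "cog_emo", "com_emo"

# Hypothetical two-row data frame, just for illustration
demo <- data.frame(cog = c(12L, NA), com = c(34L, 28L), emo = c(24L, 38L))
# On a data frame, each combination is a two-column data frame,
# so FUN = names returns the pair of column names
combn(demo, 2, names)
# a 2 x 3 character matrix: columns (cog, com), (cog, emo), (com, emo)
```

The combinations always come in the same order, which is why the values and the column names line up.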
df <-
structure(list(
id = structure(c(2L, 4L, 3L, 1L),
.Label = c("2-1-GRE", "AUD-002", "BRE-019", "PAR-044"),
class = "factor"),
cog = c(12L, NA, 0L, NA),
com = c(34L, 28L, NA, 31L),
emo = c(24L, 38L, 51L, 68L)),
row.names = c(NA, -4L), class = "data.frame")
tmp <- combn(df[-1], 2, \(x) rowSums(!is.na(x)) - 1L)
colnames(tmp) <- combn(names(df)[-1], 2, paste, collapse = "_")
df <- cbind(df, tmp)
rm(tmp)
df
#> id cog com emo cog_com cog_emo com_emo
#> 1 AUD-002 12 34 24 1 1 1
#> 2 PAR-044 NA 28 38 0 0 1
#> 3 BRE-019 0 NA 51 0 1 0
#> 4 2-1-GRE NA 31 68 0 0 1
Created on 2022-10-15 with reprex v2.0.2
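Because the real data has several columns, it may help to pair only the columns you name. A minimal sketch, assuming a hypothetical helper called add_pair_counts() that is not part of the answer. It uses the same row-sums-minus-1 rule, which gives -1 when both values of a pair are missing, and needs R 4.1.0 or later for the \(p) shorthand:

```r
# Hypothetical helper, not part of the answer above: pairs only the columns
# named in `cols`, so other columns in a wider data frame are left alone.
add_pair_counts <- function(data, cols) {
  # List of 2-element character vectors, e.g. c("cog", "com")
  pairs <- combn(cols, 2, simplify = FALSE)
  # Non-NA values in each pair minus 1, as in the answer:
  # 1 = both present, 0 = one present, -1 = both missing
  counts <- lapply(pairs, \(p) rowSums(!is.na(data[p])) - 1L)
  names(counts) <- vapply(pairs, paste, character(1), collapse = "_")
  cbind(data, as.data.frame(counts))
}

df <- data.frame(
  id  = c("AUD-002", "PAR-044", "BRE-019", "2-1-GRE"),
  cog = c(12L, NA, 0L, NA),
  com = c(34L, 28L, NA, 31L),
  emo = c(24L, 38L, 51L, 68L)
)
res <- add_pair_counts(df, c("cog", "com", "emo"))
res
```

Using simplify = FALSE keeps one vector per pair even when the data frame has a single row, where the simplified matrix form would collapse.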
To answer the request in the comments: yes, it is possible. Have the anonymous function called by combn compute the logical && of the two non-NA tests and coerce the result to integer. This returns 0 if either value is NA and 1 otherwise.
The line that needs to be changed is this:
tmp <- combn(df[-1], 2, \(x) +apply(!is.na(x), 1, \(y) y[1] && y[2]))
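A complete code run, starting from the original four columns but with com set to NA for PAR-044, so that one row has a pair in which both values are missing:
df <- data.frame(
  id  = factor(c("AUD-002", "PAR-044", "BRE-019", "2-1-GRE")),
  cog = c(12L, NA, 0L, NA),
  com = c(34L, NA, NA, 31L),
  emo = c(24L, 38L, 51L, 68L)
)
tmp <- combn(df[-1], 2, \(x) +apply(!is.na(x), 1, \(y) y[1] && y[2]))
colnames(tmp) <- combn(names(df)[-1], 2, paste, collapse = "_")
df <- cbind(df, tmp)
rm(tmp)
df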
#> id cog com emo cog_com cog_emo com_emo
#> 1 AUD-002 12 34 24 1 1 1
#> 2 PAR-044 NA NA 38 0 0 0
#> 3 BRE-019 0 NA 51 0 1 0
#> 4 2-1-GRE NA 31 68 0 0 1
A more readable, equivalent version:
tmp <- combn(df[-1], 2, \(x) {
  # TRUE for rows where both columns of the pair are non-NA
  not_na <- apply(!is.na(x), 1, \(y) y[1] && y[2])
  as.integer(not_na)
})
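If calling apply() once per row may be slow on a large data frame, a vectorised variant (a swapped-in technique, not the answerer's code) counts the NAs per row of each pair with rowSums(is.na(x)) and keeps rows where that count is 0. A minimal sketch, assuming the modified data used in the follow-up and R 4.1.0 or later:

```r
# Same modified data as in the follow-up: com is NA for PAR-044
df <- data.frame(
  id  = c("AUD-002", "PAR-044", "BRE-019", "2-1-GRE"),
  cog = c(12L, NA, 0L, NA),
  com = c(34L, NA, NA, 31L),
  emo = c(24L, 38L, 51L, 68L)
)
# For each pair (a two-column data frame x), count the NAs per row;
# a row scores 1 only when that count is 0, i.e. both values are present
tmp <- combn(df[-1], 2, \(x) as.integer(rowSums(is.na(x)) == 0L))
colnames(tmp) <- combn(names(df)[-1], 2, paste, collapse = "_")
df <- cbind(df, tmp)
rm(tmp)
df
```

Printing df should give the same cog_com, cog_emo and com_emo columns as the follow-up output. Because rowSums() works on the whole logical matrix at once, this avoids a function call per row.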