I have this data frame,
id <- seq(1:4)
var_1 <- c(TRUE, FALSE, FALSE, FALSE)
var_2 <- c(FALSE, TRUE, FALSE, FALSE)
var_3 <- c(FALSE, TRUE, FALSE, FALSE)
var_4 <- c(FALSE, FALSE, TRUE, FALSE)
var_5 <- c(FALSE, TRUE, TRUE, FALSE)
df <- data.frame(id, var_1, var_2, var_3, var_4, var_5)
df
> df
id var_1 var_2 var_3 var_4 var_5
1 1 TRUE FALSE FALSE FALSE FALSE
2 2 FALSE TRUE TRUE FALSE TRUE
3 3 FALSE FALSE FALSE TRUE TRUE
4 4 FALSE FALSE FALSE FALSE FALSE
Now, I want to look on every row and see if any of the columns have TRUE
values. If yes, I want to record it in a new column result
. For the first row, I have TRUE
in only var_1
. So I record, the variable name where TRUE
found, i.e. var_1
. For second row, I have TRUE
in var_2
, var_3
and var_5
. The entry in the result column should be var_2, var_3, var_5
. Lastly, row 4 has no TRUE
, so it should be NA
. The resulting data.frame should look like this:
id var_1 var_2 var_3 var_4 var_5 result
1 1 TRUE FALSE FALSE FALSE FALSE var_1
2 2 FALSE TRUE TRUE FALSE TRUE var_2, var_3, var_5
3 3 FALSE FALSE FALSE TRUE TRUE var_4, var_5
4 4 FALSE FALSE FALSE FALSE FALSE <NA>
I can do this,
library(tidyverse)
df |>
mutate(across(where(is.logical),
~ case_when(.x ~ cur_column()), .names = "{.col}")) |>
unite("result", var_1:var_5, sep = ", ", na.rm = TRUE)
id result
1 1 var_1
2 2 var_2, var_3, var_5
3 3 var_4, var_5
4 4
However, the problem is the number of variables should not be fixed to 5. It should be scalable to any variable number. Any idea how to achive that?
You can do the following:
left_join(
df,
pivot_longer(df,cols=starts_with("var_")) |>
filter(value==T) |>
reframe(result = toString(name),.by=id)
)
Output:
id var_1 var_2 var_3 var_4 var_5 result
1 1 TRUE FALSE FALSE FALSE FALSE var_1
2 2 FALSE TRUE TRUE FALSE TRUE var_2, var_3, var_5
3 3 FALSE FALSE FALSE TRUE TRUE var_4, var_5
4 4 FALSE FALSE FALSE FALSE FALSE <NA>
You can also do this without pivoting longer.. For example:
df$result <- apply(df[,-1], 1, \(x) toString(names(x)[x]))
Output:
id var_1 var_2 var_3 var_4 var_5 result
1 1 TRUE FALSE FALSE FALSE FALSE var_1
2 2 FALSE TRUE TRUE FALSE TRUE var_2, var_3, var_5
3 3 FALSE FALSE FALSE TRUE TRUE var_4, var_5
4 4 FALSE FALSE FALSE FALSE FALSE