rtidyverse

Creating a column containing variable names of a data frame based on values in the columns


I have this data frame,

id <- seq(1:4)
var_1 <- c(TRUE, FALSE, FALSE, FALSE)
var_2 <- c(FALSE, TRUE, FALSE, FALSE)
var_3 <- c(FALSE, TRUE, FALSE, FALSE)
var_4 <- c(FALSE, FALSE, TRUE, FALSE)
var_5 <- c(FALSE, TRUE, TRUE, FALSE)
df <- data.frame(id, var_1, var_2, var_3, var_4, var_5)
df
> df
  id var_1 var_2 var_3 var_4 var_5
1  1  TRUE FALSE FALSE FALSE FALSE
2  2 FALSE  TRUE  TRUE FALSE  TRUE
3  3 FALSE FALSE FALSE  TRUE  TRUE
4  4 FALSE FALSE FALSE FALSE FALSE

Now, I want to look on every row and see if any of the columns have TRUE values. If yes, I want to record it in a new column result. For the first row, I have TRUE in only var_1. So I record, the variable name where TRUE found, i.e. var_1. For second row, I have TRUE in var_2, var_3 and var_5. The entry in the result column should be var_2, var_3, var_5. Lastly, row 4 has no TRUE, so it should be NA. The resulting data.frame should look like this:

 id var_1 var_2 var_3 var_4 var_5              result
1  1  TRUE FALSE FALSE FALSE FALSE               var_1
2  2 FALSE  TRUE  TRUE FALSE  TRUE var_2, var_3, var_5
3  3 FALSE FALSE FALSE  TRUE  TRUE        var_4, var_5
4  4 FALSE FALSE FALSE FALSE FALSE                <NA>

I can do this,

library(tidyverse)
df |> 
  mutate(across(where(is.logical),
                ~ case_when(.x ~ cur_column()), .names = "{.col}")) |> 
  unite("result", var_1:var_5, sep = ", ", na.rm = TRUE)

  id              result
1  1               var_1
2  2 var_2, var_3, var_5
3  3        var_4, var_5
4  4  

However, the problem is the number of variables should not be fixed to 5. It should be scalable to any variable number. Any idea how to achive that?


Solution

  • You can do the following:

    left_join(
      df, 
      pivot_longer(df,cols=starts_with("var_")) |> 
        filter(value==T) |> 
        reframe(result = toString(name),.by=id)
    )
    

    Output:

      id var_1 var_2 var_3 var_4 var_5              result
    1  1  TRUE FALSE FALSE FALSE FALSE               var_1
    2  2 FALSE  TRUE  TRUE FALSE  TRUE var_2, var_3, var_5
    3  3 FALSE FALSE FALSE  TRUE  TRUE        var_4, var_5
    4  4 FALSE FALSE FALSE FALSE FALSE                <NA>
    

    You can also do this without pivoting longer.. For example:

    df$result <- apply(df[,-1], 1, \(x) toString(names(x)[x]))
    

    Output:

     id var_1 var_2 var_3 var_4 var_5              result
    1  1  TRUE FALSE FALSE FALSE FALSE               var_1
    2  2 FALSE  TRUE  TRUE FALSE  TRUE var_2, var_3, var_5
    3  3 FALSE FALSE FALSE  TRUE  TRUE        var_4, var_5
    4  4 FALSE FALSE FALSE FALSE FALSE