dplyr::select or dplyr::filter - if all values are TRUE


I have a data frame like this one, but with 14294 Gene_IDs and 36 Exp columns in total.

Gene_ID Exp_A   Exp_B   Exp_C    Exp_D
Gene1   TRUE    FALSE   FALSE    FALSE
Gene2   TRUE    TRUE    FALSE    TRUE
Gene3   TRUE    TRUE    FALSE    FALSE
Gene4   TRUE    FALSE   FALSE    TRUE
Gene5   FALSE   FALSE   FALSE    FALSE

I am trying to filter the rows by the number of TRUE values. I am interested in Gene_IDs where all but one value is TRUE, where exactly three values are TRUE, and so on.
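Every condition here (all but one, exactly three, and so on) reduces to the number of TRUE values in each row. A minimal base-R sketch of that count, assuming the sample above is rebuilt as a data frame named `df` with logical Exp_ columns (the names `df` and `n_true` are illustrative, not from the original data):

```r
# Hypothetical rebuild of the five-gene sample shown above
df <- data.frame(
  Gene_ID = paste0("Gene", 1:5),
  Exp_A = c(TRUE,  TRUE,  TRUE,  TRUE,  FALSE),
  Exp_B = c(FALSE, TRUE,  TRUE,  FALSE, FALSE),
  Exp_C = c(FALSE, FALSE, FALSE, FALSE, FALSE),
  Exp_D = c(FALSE, TRUE,  FALSE, TRUE,  FALSE)
)

# Keep only the Exp_ columns and count TRUE per row (TRUE = 1, FALSE = 0)
n_true <- rowSums(df[startsWith(names(df), "Exp")])
n_true
#> [1] 1 3 2 2 0
```

Filtering by any threshold is then a comparison on this count.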

I have tried filter(). I have also tried transposing the data frame so that each column is a Gene_ID, and then using select().

Nothing has worked so far, and some attempts produced errors. For example:

df %>% filter(if_any(everything(), ~ .x == TRUE))

throws these errors:

Error: evaluation nested too deeply: infinite recursion / options(expressions=)?
Error during wrapup: evaluation nested too deeply: infinite recursion / options(expressions=)?
Error: no more error handlers available (recursive errors?); invoking 'abort' restart
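Separately from the recursion error, whose cause is not evident from this snippet alone, if_any() answers a different question: it keeps a row when at least one selected column matches, while if_all() keeps it only when every selected column matches. Neither can express "exactly three" or "all but one". A sketch on a hypothetical rebuild of the sample, restricted to the Exp_ columns (with everything(), the character Gene_ID column would be tested as well):

```r
library(dplyr)

# Hypothetical rebuild of the five-gene sample from the question
df <- data.frame(
  Gene_ID = paste0("Gene", 1:5),
  Exp_A = c(TRUE,  TRUE,  TRUE,  TRUE,  FALSE),
  Exp_B = c(FALSE, TRUE,  TRUE,  FALSE, FALSE),
  Exp_C = c(FALSE, FALSE, FALSE, FALSE, FALSE),
  Exp_D = c(FALSE, TRUE,  FALSE, TRUE,  FALSE)
)

# if_any(): at least one Exp_ value is TRUE
any_true <- df |> filter(if_any(starts_with("Exp"), ~ .x))
any_true$Gene_ID
#> [1] "Gene1" "Gene2" "Gene3" "Gene4"

# if_all(): every Exp_ value is TRUE
all_true <- df |> filter(if_all(starts_with("Exp"), ~ .x))
nrow(all_true)
#> [1] 0
```

No gene in the sample has Exp_C set to TRUE, so the if_all() filter returns zero rows.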

Which dplyr function is best suited for this?


Solution

  • Assuming you have a data.frame with logical Exp_ columns, you can use pick() to select those columns and rowSums() to count the TRUE values in each row:

    library(dplyr)
    
    read.table(header = TRUE, text = 
    "Gene_ID Exp_A   Exp_B   Exp_C    Exp_D
    Gene1   TRUE    FALSE   FALSE    FALSE
    Gene2   TRUE    TRUE    FALSE    TRUE
    Gene3   TRUE    TRUE    FALSE    FALSE
    Gene4   TRUE    FALSE   FALSE    TRUE
    Gene5   FALSE   FALSE   FALSE    FALSE") |> 
      filter(rowSums(pick(starts_with("Exp")) == TRUE) == 3)
    #>   Gene_ID Exp_A Exp_B Exp_C Exp_D
    #> 1   Gene2  TRUE  TRUE FALSE  TRUE
    

    Created on 2024-11-11 with reprex v2.1.1
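The same rowSums() count covers the other thresholds from the question by changing only the comparison. A sketch on a hypothetical rebuild of the sample; `n_exp` is an illustrative helper holding the number of Exp_ columns, so "all but one" becomes `n_exp - 1` (35 for the real 36-column data). Because the columns are assumed to be logical, `== TRUE` is dropped and the values are summed directly:

```r
library(dplyr)

# Hypothetical rebuild of the five-gene sample from the question
df <- data.frame(
  Gene_ID = paste0("Gene", 1:5),
  Exp_A = c(TRUE,  TRUE,  TRUE,  TRUE,  FALSE),
  Exp_B = c(FALSE, TRUE,  TRUE,  FALSE, FALSE),
  Exp_C = c(FALSE, FALSE, FALSE, FALSE, FALSE),
  Exp_D = c(FALSE, TRUE,  FALSE, TRUE,  FALSE)
)

# Number of Exp_ columns (4 here)
n_exp <- sum(startsWith(names(df), "Exp"))

# All but one Exp_ value is TRUE
all_but_one <- df |>
  filter(rowSums(pick(starts_with("Exp"))) == n_exp - 1)
all_but_one$Gene_ID
#> [1] "Gene2"

# At least two Exp_ values are TRUE
at_least_two <- df |>
  filter(rowSums(pick(starts_with("Exp"))) >= 2)
at_least_two$Gene_ID
#> [1] "Gene2" "Gene3" "Gene4"
```

If the real data can contain NA, rowSums() returns NA for those rows and filter() drops them; passing `na.rm = TRUE` to rowSums() counts only the non-missing TRUE values instead.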