I have a data frame like the one below, but with a total of 14294 Gene_IDs and 36 Exp columns:
Gene_ID Exp_A Exp_B Exp_C Exp_D
Gene1 TRUE FALSE FALSE FALSE
Gene2 TRUE TRUE FALSE TRUE
Gene3 TRUE TRUE FALSE FALSE
Gene4 TRUE FALSE FALSE TRUE
Gene5 FALSE FALSE FALSE FALSE
I am trying to filter the rows by the number of TRUE values. I am interested in Gene_IDs where all but one value is TRUE, or where exactly three values are TRUE, and so on.
I have tried using filter(). I have also tried transposing the data frame so that each column is a Gene_ID, and then using select().
Neither approach worked, and I also got some errors. For example:
df %>% filter(if_any(everything(), ~ .x == TRUE))
gives me:
Error: evaluation nested too deeply: infinite recursion / options(expressions=)?
Error during wrapup: evaluation nested too deeply: infinite recursion / options(expressions=)?
Error: no more error handlers available (recursive errors?); invoking 'abort' restart
Is there a dplyr function that is best suited to this?
Assuming you do have a data.frame with logical Exp_ columns, you could use pick() for subsetting and rowSums() for counting:
library(dplyr)
read.table(header = TRUE, text =
"Gene_ID Exp_A Exp_B Exp_C Exp_D
Gene1 TRUE FALSE FALSE FALSE
Gene2 TRUE TRUE FALSE TRUE
Gene3 TRUE TRUE FALSE FALSE
Gene4 TRUE FALSE FALSE TRUE
Gene5 FALSE FALSE FALSE FALSE") |>
filter(rowSums(pick(starts_with("Exp")) == TRUE) == 3)
#> Gene_ID Exp_A Exp_B Exp_C Exp_D
#> 1 Gene2 TRUE TRUE FALSE TRUE
Created on 2024-11-11 with reprex v2.1.1
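The same idea extends to the other cases mentioned in the question ("all but one value is TRUE", "exactly n values are TRUE"). The following is only a sketch under a few assumptions: every experiment column starts with Exp_ and is logical, dplyr is version 1.1.0 or later (needed for pick()), and the names df, n_true and n_exp are made up for illustration. Storing the per-row count in a helper column makes it easy to reuse for several filters:

```r
library(dplyr)

# Toy data copied from the question
df <- read.table(header = TRUE, text =
"Gene_ID Exp_A Exp_B Exp_C Exp_D
Gene1 TRUE FALSE FALSE FALSE
Gene2 TRUE TRUE FALSE TRUE
Gene3 TRUE TRUE FALSE FALSE
Gene4 TRUE FALSE FALSE TRUE
Gene5 FALSE FALSE FALSE FALSE")

# Count the TRUE values per row once; n_true is a hypothetical helper column.
# rowSums() treats TRUE as 1 and FALSE as 0.
counted <- df |>
  mutate(n_true = rowSums(pick(starts_with("Exp_"))))

# Number of experiment columns (4 in the toy data, 36 in the real data)
n_exp <- sum(startsWith(names(df), "Exp_"))

# Rows where all but one Exp_ value is TRUE
all_but_one <- counted |> filter(n_true == n_exp - 1)
all_but_one$Gene_ID
#> [1] "Gene2"

# Rows where exactly two Exp_ values are TRUE
exactly_two <- counted |> filter(n_true == 2)
exactly_two$Gene_ID
#> [1] "Gene3" "Gene4"
```

Note that if_any() only tests whether at least one of the selected columns satisfies the condition, so it cannot express "exactly n"; a per-row count is what makes that kind of filter possible. If the real columns contain NA, rowSums(..., na.rm = TRUE) may be needed, depending on how missing values should be treated.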