r, select, dplyr, filter

Subset columns based on condition, on a subset of columns (dplyr)


I want to subset columns based on a condition using select like this:

select(where(~sum(.) != 0))

but I only want to apply it to a specific subset of columns, while keeping the other columns of my data frame.

In the following example, I want only the cx1_a column removed, but NOT id3, even though both sum to zero:

data <- data.frame(
  name1 = c("a", "b", "c", "d", "e"),
  name2 = c(6, 7, 4, 5, 5),
  id3 = c(0, 0, 0, 0, 0),
  cx1_a = c(0, 0, 0, 0, 0),
  cx1_b = c(1, 2, 0, 0, 0),
  cx2_a = c(0, 0, 1, 1, 0),
  cx2_b = c(4, 5, 0, 0, 0)
)

I tried to use this:

select(where(~ sum(.) != 0, .cols = starts_with("cx1_") | starts_with("cx2_")))

But .cols doesn't work. Does anyone have a solution? Thanks


Solution

  • Is this what you are looking for?

    library(tidyverse)
    
    data %>% 
      select(
        ., 
        # Drop only the cx1_*/cx2_* columns whose values sum to zero
        -any_of(
          select(., matches('^cx[12]')) %>% keep(~ sum(.x) == 0) %>% names()
        )
      )
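
A possible alternative, offered as a sketch rather than the definitive approach: tidyselect helpers can be combined with & (intersection) and ! (complement), so the condition can be restricted to the cx columns inside a single select() call. This assumes dplyr 1.0.0 or later. The is.numeric() guard is an addition of mine; it is there because where() may be evaluated on every column, including the character column name1, where sum() would fail. The name result is only illustrative.

```r
library(dplyr)

data <- data.frame(
  name1 = c("a", "b", "c", "d", "e"),
  name2 = c(6, 7, 4, 5, 5),
  id3 = c(0, 0, 0, 0, 0),
  cx1_a = c(0, 0, 0, 0, 0),
  cx1_b = c(1, 2, 0, 0, 0),
  cx2_a = c(0, 0, 1, 1, 0),
  cx2_b = c(4, 5, 0, 0, 0)
)

# Drop a column only if it BOTH matches the cx1_/cx2_ prefix
# AND is numeric with a zero sum; everything else is kept.
result <- data %>%
  select(!(matches("^cx[12]_") & where(~ is.numeric(.x) && sum(.x) == 0)))

names(result)
```

With the example data this should leave id3 in place and drop only cx1_a, since id3 does not match the prefix pattern.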