rdplyrtidyverse

Which IDs have only zero-counts in variable across all days?


In my dataset there is the variable "cigarettes per day" (CPD) for 21 days and several subjects (ID). I want to know how many and which subjects never smoked (e.g. have only 0 in CPD) across the 21 days.

Here is a example dataset for 3 subjects and 5 days

day <- c(1,2,3,4,5,1,2,3,4,5,1,2,3,4,5)
ID <- c(1,1,1,1,1,2,2,2,2,2,3,3,3,3,3)
CPD <- c(3,4,0,2,0,0,0,0,0,0,4,0,0,0,1)
df <- data.frame(day, ID, CPD)

what I want would be something like this:

   day ID CPD
1    1  2 0
2    2  2 0
3    3  2 0
4    4  2 0
5    5  2 0

Solution

  • We may use a group by all approach

    library(dplyr)
    df %>% 
       group_by(ID) %>%
       filter(all(CPD %in% 0)) %>%
       ungroup
    

    -output

    # A tibble: 5 × 3
        day    ID   CPD
      <dbl> <dbl> <dbl>
    1     1     2     0
    2     2     2     0
    3     3     2     0
    4     4     2     0
    5     5     2     0
    

    Or without grouping

    df %>% 
       filter(!ID %in% ID[CPD != 0])
      day ID CPD
    1   1  2   0
    2   2  2   0
    3   3  2   0
    4   4  2   0
    5   5  2   0
    

    Or with base R

    subset(df, !ID %in% ID[CPD != 0])