rdataframefilteradvanced-filter

Filter a column in R based on another column with 2 criteria


I have a column - "issue_slip" in R dataframe - "vouchers" with values/rows such as

Issue slip: IS/001, IS/001, IS/001, IS/002, IS/002, IS/002

and another column "rec_status" with values 0 or 1. Each issue_slip row can have rec_status 0 or 1. I would like to keep only those issue_slips that have all rec_status as 0 OR 0 or 1 --> remove issue_slip rows that have all rec_status as 1.

For example,

  1. IS/001 - 1,
  2. IS/001 - 0 ,
  3. IS/001 - 1

should show up and not get filtered out because at least one row has rec_status = 1

I tried using the filter and subset functions but could not figure out how to go about filtering this in the same column


Solution

  • Sample data

    quux <- data.frame(issue_slip = c("IS/001", "IS/001", "IS/001", "IS/002", "IS/002", "IS/002"), rec_status = c(0, 0, 1, 1, 1, 1))
    quux
    #   issue_slip rec_status
    # 1     IS/001          0
    # 2     IS/001          0
    # 3     IS/001          1
    # 4     IS/002          1
    # 5     IS/002          1
    # 6     IS/002          1
    

    base R

    ind <- ave(quux$rec_status, quux$issue_slip, FUN = function(z) any(z %in% 0)) > 0
    ind
    # [1]  TRUE  TRUE  TRUE FALSE FALSE FALSE
    quux[ind,]
    #   issue_slip rec_status
    # 1     IS/001          0
    # 2     IS/001          0
    # 3     IS/001          1
    

    dplyr

    library(dplyr)
    quux %>%
      group_by(issue_slip) %>%
      filter(any(rec_status %in% 0)) %>%
      ungroup()
    # # A tibble: 3 × 2
    #   issue_slip rec_status
    #   <chr>           <dbl>
    # 1 IS/001              0
    # 2 IS/001              0
    # 3 IS/001              1
    

    data.table

    library(data.table)
    as.data.table(quux)[, .SD[any(rec_status %in% 0),], by = issue_slip]
    #    issue_slip rec_status
    #        <char>      <num>
    # 1:     IS/001          0
    # 2:     IS/001          0
    # 3:     IS/001          1
    

    Note, I'm using rec_status %in% 0 instead of rec_status == 0 for a reason: since we have no sample data (and often even when we do), I have no assurance that there are not any NAs in the data; note that NA == 0 will return NA itself and therefore often fail non-defensive code, but NA %in% 0 returns false, which is often what we need (and I'm inferring it's what we want here).