rdplyrfill

Fill or carry LOCF up to a certain value


I have data frame df_have. I would like to create data frame df_want, where I retain the value of flag to fill in missing value until flag changes it value to a non-missing for each patient ID. I tried function fill( ,.direction=down), however, this results in carrying forward the previous value across all observations. I would like to stop filling the FLAG column once FLAG=="Y".

USUBJID <- c(1,1,1,1,1,1, 2,2,2,2,2,2 ,3,3,3,3,3,3)
FLAG <- c("N", NA, NA, "Y", NA, NA,    "N", NA, NA, "Y", NA, NA,   "N", NA, "Y", NA, NA, NA)

df_have<-data.frame(USUBJID, FLAG)
df_have

USUBJID <- c(1,1,1,1,1,1, 2,2,2,2,2,2 ,3,3,3,3,3,3)
FLAG <- c("N", "N", "N", "Y", NA, NA,    "N", "N", "N", "Y", NA, NA,   "N", "N", "Y", NA, NA, NA)

df_want<-data.frame(USUBJID, FLAG)
df_want

Solution

  • You could create a column filled, a copy of FLAG, then use fill() on this column. Finally check that values are in the first group of filled values, otherwise use the values from the original column FLAG.

    df_have %>%
      group_by(USUBJID) %>%
      mutate(filled = FLAG) %>%
      fill(filled) %>%
      mutate(FLAG = ifelse(consecutive_id(filled)==1, filled, FLAG)) %>%
      ungroup() %>%
      select(-filled)
    # A tibble: 18 × 2
       USUBJID FLAG 
         <dbl> <chr>
     1       1 N    
     2       1 N    
     3       1 N    
     4       1 Y    
     5       1 NA   
     6       1 NA   
     7       2 N    
     8       2 N    
     9       2 N    
    10       2 Y    
    11       2 NA   
    12       2 NA   
    13       3 N    
    14       3 N    
    15       3 Y    
    16       3 NA   
    17       3 NA   
    18       3 NA