I have data frame df_have. I would like to create data frame df_want, where I retain the value of flag to fill in missing value until flag changes it value to a non-missing for each patient ID. I tried function fill( ,.direction=down), however, this results in carrying forward the previous value across all observations. I would like to stop filling the FLAG column once FLAG=="Y".
USUBJID <- c(1,1,1,1,1,1, 2,2,2,2,2,2 ,3,3,3,3,3,3)
FLAG <- c("N", NA, NA, "Y", NA, NA, "N", NA, NA, "Y", NA, NA, "N", NA, "Y", NA, NA, NA)
df_have<-data.frame(USUBJID, FLAG)
df_have
USUBJID <- c(1,1,1,1,1,1, 2,2,2,2,2,2 ,3,3,3,3,3,3)
FLAG <- c("N", "N", "N", "Y", NA, NA, "N", "N", "N", "Y", NA, NA, "N", "N", "Y", NA, NA, NA)
df_want<-data.frame(USUBJID, FLAG)
df_want
You could create a column filled
, a copy of FLAG
, then use fill()
on this column. Finally check that values are in the first group of filled values, otherwise use the values from the original column FLAG
.
df_have %>%
group_by(USUBJID) %>%
mutate(filled = FLAG) %>%
fill(filled) %>%
mutate(FLAG = ifelse(consecutive_id(filled)==1, filled, FLAG)) %>%
ungroup() %>%
select(-filled)
# A tibble: 18 × 2
USUBJID FLAG
<dbl> <chr>
1 1 N
2 1 N
3 1 N
4 1 Y
5 1 NA
6 1 NA
7 2 N
8 2 N
9 2 N
10 2 Y
11 2 NA
12 2 NA
13 3 N
14 3 N
15 3 Y
16 3 NA
17 3 NA
18 3 NA