Selecting grouped samples satisfying desired conditions

I have a dataframe like

raw.df <- data.frame(id = c("X01", "X02", "X03", "X04", "X05", "X06", "X07", "X08", "X09", "X10"),
           subject = c("S01", "S01", "S01", "S02", "S02", "S03", "S04", "S04", "S05", "S06"),
           time = c("D0", "D1", "D2", "D0", "D2", "D0", "D0", "D2", "D2", "D2"),
           response = c("Y", "Y", "Y", "N", "N", "Y", "Y", "Y", "Y", "N"))

I want to create a new column to add "Y" for the samples satisfying conditions, and "N" for the others. If any subject with response="Y" has two samples from times D0 and D2, they should be selected. Based on that, I need to obtain the dataframe below with "selected" column.

final.df <- data.frame(id = c("X01", "X02", "X03", "X04", "X05", "X06", "X07", "X08", "X09", "X10"),
           subject = c("S01", "S01", "S01", "S02", "S02", "S03", "S04", "S04", "S05", "S06"),
           time = c("D0", "D1", "D2", "D0", "D2", "D0", "D0", "D2", "D2", "D2"),
           response = c("Y", "Y", "Y", "N", "N", "Y", "Y", "Y", "Y", "N"),
           selected = c("Y", "N", "Y", "N", "N", "N", "Y", "Y", "N", "N"))

I can select samples without pairing using case_when by;

final.df <- raw.df %>%
  mutate(selected = case_when(time %in% c("D0", "D2") & response = "Y" ~ "Y",
                              TRUE ~ "N"))

for pairing, I can use another code

final.df %>%
  filter(selected == "Y") %>%
  group_by(subject) %>%
  add_count() %>% ungroup() %>%
  mutate(n = if_else(n == 2, "Include", "Exclude")) %>%
  dplyr::rename(`paired` = n)

and then I can combine those, but I wonder if there is a better combined solution

Solution

One way could be to create a table of matching cases and join to it. The matching cases where are where time is D0 or D1, response is Y, and there are two distinct times per subject -- ie at least one Y for each of those two times. Then I put in N where the join doesn't have a match.

raw.df |>
  left_join(
    raw.df |>
      filter(time %in% c("D0", "D2"), response == "Y") |>
      distinct(time, subject) |>
      filter(n() == 2, .by = subject) |>
      mutate(selected = "Y")) |>
  mutate(selected = coalesce(selected, "N"))

Result

    id subject time response selected
1  X01     S01   D0        Y        Y
2  X02     S01   D1        Y        N
3  X03     S01   D2        Y        Y
4  X04     S02   D0        N        N
5  X05     S02   D2        N        N
6  X06     S03   D0        Y        N
7  X07     S04   D0        Y        Y
8  X08     S04   D2        Y        Y
9  X09     S05   D2        Y        N
10 X10     S06   D2        N        N