I have a data frame like this:
raw.df <- data.frame(id = c("X01", "X02", "X03", "X04", "X05", "X06", "X07", "X08", "X09", "X10"),
subject = c("S01", "S01", "S01", "S02", "S02", "S03", "S04", "S04", "S05", "S06"),
time = c("D0", "D1", "D2", "D0", "D2", "D0", "D0", "D2", "D2", "D2"),
response = c("Y", "Y", "Y", "N", "N", "Y", "Y", "Y", "Y", "N"))
I want to add a new column, "selected", that is "Y" for the samples satisfying a condition and "N" for all others. If a subject with response = "Y" has samples at both D0 and D2, those two samples should be selected. Based on that, I need to obtain the data frame below, with the "selected" column.
final.df <- data.frame(id = c("X01", "X02", "X03", "X04", "X05", "X06", "X07", "X08", "X09", "X10"),
subject = c("S01", "S01", "S01", "S02", "S02", "S03", "S04", "S04", "S05", "S06"),
time = c("D0", "D1", "D2", "D0", "D2", "D0", "D0", "D2", "D2", "D2"),
response = c("Y", "Y", "Y", "N", "N", "Y", "Y", "Y", "Y", "N"),
selected = c("Y", "N", "Y", "N", "N", "N", "Y", "Y", "N", "N"))
I can select samples without pairing using case_when:
library(dplyr)
final.df <- raw.df %>%
  mutate(selected = case_when(time %in% c("D0", "D2") & response == "Y" ~ "Y",
                              TRUE ~ "N"))
For the pairing, I can use a second snippet:
final.df %>%
  filter(selected == "Y") %>%
  group_by(subject) %>%
  add_count() %>%
  ungroup() %>%
  mutate(n = if_else(n == 2, "Include", "Exclude")) %>%
  dplyr::rename(`paired` = n)
I could then combine the two, but I wonder whether there is a better, combined solution.
One way is to build a table of the matching cases and join to it. The matching cases are those where time is D0 or D2, response is Y, and the subject has two distinct times, i.e. at least one Y at each of those two times. Rows with no match in the join then get N.
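raw.df |>
  left_join(
    raw.df |>
      # keep only the "Y" rows at the two time points of interest
      filter(time %in% c("D0", "D2"), response == "Y") |>
      distinct(time, subject) |>
      # keep subjects that have both time points
      filter(n() == 2, .by = subject) |>
      mutate(selected = "Y"),
    by = c("subject", "time")) |>
  # rows without a match in the join get "N"
  mutate(selected = coalesce(selected, "N"))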
Result
    id subject time response selected
1  X01     S01   D0        Y        Y
2  X02     S01   D1        Y        N
3  X03     S01   D2        Y        Y
4  X04     S02   D0        N        N
5  X05     S02   D2        N        N
6  X06     S03   D0        Y        N
7  X07     S04   D0        Y        Y
8  X08     S04   D2        Y        Y
9  X09     S05   D2        Y        N
10 X10     S06   D2        N        N
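If you would rather avoid the join, the same rule can probably be written as a single grouped mutate. This is only a sketch, not a tested drop-in: `wanted`, `paired` and `alt.df` are names I made up for illustration, and it assumes dplyr 1.1.0 or later (for `.by`) and R 4.1 or later (for `|>`). One difference from the join: here a row is marked "Y" only if its own response is "Y" as well, which gives the same answer on this data.

```r
library(dplyr)

raw.df <- data.frame(
  id = c("X01", "X02", "X03", "X04", "X05", "X06", "X07", "X08", "X09", "X10"),
  subject = c("S01", "S01", "S01", "S02", "S02", "S03", "S04", "S04", "S05", "S06"),
  time = c("D0", "D1", "D2", "D0", "D2", "D0", "D0", "D2", "D2", "D2"),
  response = c("Y", "Y", "Y", "N", "N", "Y", "Y", "Y", "Y", "N")
)

# Hypothetical helper: the two time points that must both be present
wanted <- c("D0", "D2")

alt.df <- raw.df |>
  mutate(
    # TRUE for every row of a subject whose "Y" rows cover both D0 and D2
    paired = all(wanted %in% time[response == "Y"]),
    # mark only the D0/D2 "Y" rows of those subjects
    selected = if_else(paired & time %in% wanted & response == "Y", "Y", "N"),
    .by = subject
  ) |>
  select(-paired)

alt.df
```

Dropping `select(-paired)` keeps the per-subject flag, which may be handy if you also want the "Include"/"Exclude" column from the question.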