I am trying to filter pairs (identified by id
) based on conditions. I have the following data frame,
id <- c(1,1,2,2,1,1,3,3,4,4)
PorF <- c("start","fail","start","pass","start","pass","start","pass","start","fail")
timest <- c(as.POSIXct("2021-05-08 08:15:07"),
as.POSIXct("2021-05-08 08:15:45"),
as.POSIXct("2021-05-28 08:17:09"),
as.POSIXct("2021-05-28 08:17:25"),
as.POSIXct("2021-05-28 08:32:07"),
as.POSIXct("2021-05-28 08:32:12"),
as.POSIXct("2021-05-28 08:33:14"),
as.POSIXct("2021-05-28 08:33:45"),
as.POSIXct("2021-05-28 08:34:12"),
as.POSIXct("2021-05-28 08:34:56"))
testdf <- data.frame(id, PorF, timest)
testdf
> testdf
id PorF timest
1 1 start 2021-05-08 08:15:07
2 1 fail 2021-05-08 08:15:45
3 2 start 2021-05-28 08:17:09
4 2 pass 2021-05-28 08:17:25
5 1 start 2021-05-28 08:32:07
6 1 pass 2021-05-28 08:32:12
7 3 start 2021-05-28 08:33:14
8 3 pass 2021-05-28 08:33:45
9 4 start 2021-05-28 08:34:12
10 4 fail 2021-05-28 08:34:56
I want to filter those IDs that has a start and a pass. All the pairs with start and fail should be filtered out.
My expected output should look like this,
> filtered_testdf
id PorF timest
2 start 2021-05-28 08:17:09
2 pass 2021-05-28 08:17:25
1 start 2021-05-28 08:32:07
1 pass 2021-05-28 08:32:12
3 start 2021-05-28 08:33:14
3 pass 2021-05-28 08:33:45
What I am trying is this which is not giving what I want,
testdf |>
group_by(id) |>
filter(PorF == "start" & PorF == "pass")
Any idea how to achieve the expected result?
You can do:
library(tidyverse)
testdf |>
mutate(id_helper = cumsum(PorF == "start")) |>
filter(any(PorF == "start") & any(PorF == "pass"), .by = id_helper)
Or equivalently:
testdf |>
mutate(id_helper = cumsum(PorF == "start")) |>
filter(!any(PorF == "fail"), .by = id_helper)
id PorF timest id_helper
1 2 start 2021-05-28 08:17:09 2
2 2 pass 2021-05-28 08:17:25 2
3 1 start 2021-05-28 08:32:07 3
4 1 pass 2021-05-28 08:32:12 3
5 3 start 2021-05-28 08:33:14 4
6 3 pass 2021-05-28 08:33:45 4
Note that I had to create an additional id variable since (see my comment above) there are several pairs of rows with the same id (i.e. id 1 has two pairs).