Consider the data frame
id <-c(1,1,1,2,2,3,3,3,3)
x1 <-c("no","yes","yes","no","no","no","no","no","yes")
x2 <-c(9,3,7,4,5,2,5,6,11)
df <- data.frame(id, x1, x2)
For each id I wish to keep the first row where x1="yes" and where x1 is completely "no" for a particular id I want to retain the last row for that id. I wrote the code
df %>% group_by(id) %>%
filter(if(x1 == "yes") x1 == "yes" else row_number() %in% c(n()))
but it runs into an error. The expect is
id x1 x2
1 yes 3
2 no 5
3 yes 11
I was wondering if this code could be improved to get this issue resolved? I also appreciate alternative approaches (e.g., data.table approach) of handling this. Thank you!
your code fails because filter produces a vector for each group instead of a single true/false.
so x1 == "yes"
produces a vector like c(FALSE, TRUE, TRUE)
while your if statement expects a singular TRUE
or FALSE
.
I recommend using slice for this use case as such:
df %>%
group_by(id) %>%
slice(if (any(x1 == "yes")) which.max(x1 == "yes") else n())
any(x1 == "yes")
makes it so that it checks the entire group and returns a single TRUE
or FALSE
and using slice()
picks rows by their position (eg: 1st row, 5th row)
so which.max(x1 == "yes")
finds the position of the first "yes"
and selects it.
and if none exist, it gives the position of the last "no"