I have the following issue that I can't find an answer to:
I have a df that looks like this:
id x1_t1 x1_t2 x1_t3 y1_t1 y1_t2 y1_t3
1 1 0 0 0 100 110 150
2 2 0 0 1 150 140 180
3 3 0 1 1 90 150 160
x1 and y1 are in their multiple times (different time points). Depending on when the x1 variable changes from 0 to 1, I want the respective y1 values, so that the resulting dataframe would look like this:
id y1_before y1_after
1 1 110 150
2 2 140 180
3 3 90 150
I have already messed around with dplyr quite a bit, but haven't found a working solution. Perhaps you could give me an idea. thanks in advance!!
1) dplyr Using dat
shown reproducibly in the Note at the end, define the x1_* columns as x1
and then y1_* columns as y1
and the index of the match to the first 1 (or the last y1
if none) as i
. Then use i
to extract the desired values from x1
and y1
.
library(dplyr)
dat %>%
mutate(
x1 = pick(starts_with("x1")),
y1 = pick(starts_with("y1")),
i = match(1, x1, length(x1)), .by = id) %>%
summarize(y1_before = y1[[i-1]], y1_after = y1[[i]], .by = id)
giving
id y1_before y1_after
1 1 110 150
2 2 140 180
3 3 90 150
2) Base R Pretty much the same logic can be used in base R:
do.call("rbind", by(dat, dat$id, \(z) with(z, {
x1 <- z[grep("x1", names(z))]
y1 <- z[grep("y1", names(z))]
i <- match(1, x1, length(x1))
data.frame(id, y1_before = y1[[i-1]], y1_after = y1[[i]])
})))
dat <- data.frame(
id = 1:3,
x1_t1 = integer(3),
x1_t2 = c(0L, 0L, 1L),
x1_t3 = c(0L, 1L, 1L),
y1_t1 = c(100L, 150L, 90L),
y1_t2 = c(110L, 140L, 150L),
y1_t3 = c(150L, 180L, 160L)
)