I have data that contains multiple replications of program output. For each replication, I need to sort by Name, then Time, then locate when the value in the Stage column changes. The row where the Stage value changes should then be written to a new dataframe. The new dataframe will contain all rows where that Stage variable changes value.
If my data looks like this:
Rep | Time | Name | Stage |
---|---|---|---|
1 | 33.075 | Tom | none |
1 | 85.972 | Tom | incubation |
1 | 86.139 | Tom | incubation |
1 | 29.120 | Sam | none |
1 | 33.099 | Sam | incubation |
1 | 90.568 | Sam | symptomatic |
2 | 33.075 | Tom | none |
2 | 85.972 | Tom | incubation |
2 | 29.120 | Tom | none |
2 | 33.099 | Sam | none |
2 | 90.568 | Sam | none |
3 | 32.515 | Tom | none |
3 | 90.863 | Joe | symptomatic |
3 | 86.139 | Joe | incubation |
3 | 85.972 | Tom | incubation |
3 | 29.120 | Joe | none |
My resulting dataframe needs to look like this:
Rep | Time | Name | Stage |
---|---|---|---|
1 | 85.972 | Tom | incubation |
1 | 33.099 | Sam | incubation |
1 | 90.568 | Sam | symptomatic |
2 | 85.972 | Tom | incubation |
3 | 85.972 | Tom | incubation |
3 | 86.139 | Joe | incubation |
3 | 90.863 | Joe | symptomatic |
It doesn't seem like it would be that complicated, but I have been playing around with it all day and can't get it working properly.
In base R
, we can try
Y =
X[order(X$Rep, -xtfrm(X$Name), X$Time), ] |>
transform(i = ave(Stage, Rep, Name, FUN=\(x) x!=c(NA, head(x, -1)))) |>
subset(as.logical(i), -i)
> Y Rep Time Name Stage 2 1 85.972 Tom incubation 5 1 33.099 Sam incubation 6 1 90.568 Sam symptomatic 8 2 85.972 Tom incubation 15 3 85.972 Tom incubation 14 3 86.139 Joe incubation 13 3 90.863 Joe symptomatic
Data made reproducible
X = structure(list(Rep = c(1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 3, 3,
3, 3, 3), Time = c(33.075, 85.972, 86.139, 29.12, 33.099, 90.568,
33.075, 85.972, 29.12, 33.099, 90.568, 32.515, 90.863, 86.139,
85.972, 29.12), Name = c("Tom", "Tom", "Tom", "Sam", "Sam", "Sam",
"Tom", "Tom", "Tom", "Sam", "Sam", "Tom", "Joe", "Joe", "Tom",
"Joe"), Stage = c("none", "incubation", "incubation", "none",
"incubation", "symptomatic", "none", "incubation", "none", "none",
"none", "none", "symptomatic", "incubation", "incubation", "none"
)), class = "data.frame", row.names = c(NA, -16L))