
Create a new variable indicating the last observation within each participant and day (group within group) in R


First, apologies: I'm fairly sure this is a duplicate, but the closest question/answer I found was this one, and I wasn't sure which part of the code in that answer was needed for transforming the time variable.

So, I have a dataset like this:

    id    <- rep(1:20, each = 16)
    index <- rep(1:16, times = 20)
    day   <- rep(rep(1:4, each = 4), times = 20)

    mock_df <- data.frame(id, index, day)

These are longitudinal data, with participants providing several reports per day over several days. id identifies the participant, day the day, and index the measurement occasion within each participant. In the mock data, each "participant" provided 4 reports per day for a total of 4 days.

(However, the actual data are unbalanced: participants have different numbers of reporting days and different numbers of reports per day.)

I need a new variable, "last", that is 1 for the last measurement occasion of each day for each participant and 0 otherwise. For example:

  id index day last
   1     1   1  0
   1     2   1  0
   1     3   1  0
   1     4   1  1
   1     5   2  0
   1     6   2  0
   1     7   2  0
   1     8   2  1
   1     9   3  0
  ...

I tried:

    mock_df <- mock_df |>
      group_by(id) |>
      mutate(last = day[which.max(day)]) |>
      ungroup()

But this just created a variable with the value 10 in every row, I assume because participants in the actual data reported for 10 days.
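The attempt groups by id only, so day[which.max(day)] picks the largest day value among each participant's rows and repeats it on all of them; index is never consulted, so no row-level 0/1 flag can come out. A minimal base R sketch reproducing that step on the mock data (using ave() in place of dplyr's group_by()/mutate(), purely for illustration) gives a constant, here 4 because each mock participant has 4 days:

```r
# Rebuild the mock data from the question
id    <- rep(1:20, each = 16)
index <- rep(1:16, times = 20)
day   <- rep(rep(1:4, each = 4), times = 20)
mock_df <- data.frame(id, index, day)

# Equivalent of group_by(id) |> mutate(last = day[which.max(day)]):
# within each participant, take the day value at the position of the largest day
attempt <- ave(mock_df$day, mock_df$id,
               FUN = function(d) d[which.max(d)])

# Every row gets the participant's final day number, not a 0/1 flag
unique(attempt)
# [1] 4
```

With 10 reporting days per participant, the same logic yields 10 everywhere, which matches the result described above.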


Solution

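  • Something like this should work. It groups by both id and day and flags the row with the largest index in each group (the .by argument requires dplyr 1.1.0 or later):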

    library(dplyr)

    # Within each id-day group, flag (1) the row whose index is the group maximum
    mock_df |>
      mutate(last = as.integer(max(index) == index), .by = c(id, day))
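If you'd rather avoid dplyr, or want to check the behaviour on unbalanced data, here is a base R sketch that swaps in ave() for the grouped mutate() but keeps the same comparison. The small unbalanced data frame is hypothetical, made up only for illustration. Like the dplyr version, it assumes index increases over time within each participant, so the largest index in an id-day group is that day's last report.

```r
# Hypothetical unbalanced data: participant 1 has 2 days (3 and 2 reports),
# participant 2 has 3 days (1, 4 and 2 reports)
unbal_df <- data.frame(
  id    = c(1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2),
  index = c(1, 2, 3, 4, 5, 1, 2, 3, 4, 5, 6, 7),
  day   = c(1, 1, 1, 2, 2, 1, 2, 2, 2, 2, 3, 3)
)

# For each id-day group, compute the maximum index and flag the matching row
unbal_df$last <- with(unbal_df,
  as.integer(ave(index, id, day, FUN = max) == index))

unbal_df$last
# [1] 0 0 1 0 1 1 0 0 0 1 0 1
```

If the rows are already sorted by id and index, as.integer(!duplicated(unbal_df[c("id", "day")], fromLast = TRUE)) gives the same flags without looking at index at all, since it marks the final row of each id-day combination.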