rdataframedplyrtime

Creating a variable indicating last report of the day for each id but some last reports are after midnight


I have a dataframe like this:

A tibble: 30 x 4
       id index   day time    
    <dbl> <int> <int> <chr>   
 1 238686     1     1 11:53:33
 2 238686     2     1 17:45:27
 3 238686     3     1 21:12:36
 4 238686     4     2 00:32:36
 5 238686     5     2 11:07:08
 6 238686     6     2 14:43:41
 7 238686     7     2 20:50:29
 8 238686     8     2 23:22:33
 9 238686     9     3 12:05:53
10 238686    10     3 14:48:50

id refers to a participant (each participant provides several reports per day for 10 days), index refers to the consequtive number of report per participant throughout the whole study, day refers to day of study for each participant, and time refers to the time of day of each report.

I'd need a new variable indicating the last report for each day for each participant. I got a very helpful reply from here to do this using

ex_data <- ex_data |>
     mutate(last=as.integer(max(index) == index), .by = c(id, day))

This worked great; however, I hadn't realized that some participants had provided the last report of the day after midnight on some days. I managed to mark those occasions using

ex_data$time2<-as.hms(ex_data$time)

ex_data <- ex_data |>
mutate(nighttime = if_else(time2 >= parse_hms("00:00:00") & time2 < parse_hms("03:00:01"), 1, 0))

But I can't come up with a way to create a variable that would indicate whether a report is "not last report of the day" or "last report of the day" for each participant that would include last reports of the day given before and after midnight.

Here's the data (it's a snippet of the actual data):

ex_data<-
structure(list(id = c(238686, 238686, 238686, 238686, 238686, 
238686, 238686, 238686, 238686, 238686, 238686, 238686, 238686, 
238686, 238686, 238686, 238686, 238686, 238686, 238686, 238686, 
238686, 238686, 238686, 238686, 238686, 238686, 238686, 238686, 
238686, 238686, 238686, 238686, 238686, 238686, 238686, 238686, 
239297, 239297, 239297), index = c(1L, 2L, 3L, 4L, 5L, 6L, 7L, 
8L, 9L, 10L, 11L, 12L, 13L, 14L, 15L, 16L, 17L, 18L, 19L, 20L, 
21L, 22L, 23L, 24L, 25L, 26L, 27L, 28L, 29L, 30L, 31L, 32L, 33L, 
34L, 35L, 36L, 37L, 1L, 2L, 3L), day = c(1L, 1L, 1L, 2L, 2L, 
2L, 2L, 2L, 3L, 3L, 3L, 4L, 4L, 4L, 4L, 4L, 5L, 5L, 6L, 6L, 6L, 
6L, 7L, 7L, 7L, 7L, 7L, 7L, 8L, 8L, 9L, 9L, 9L, 9L, 9L, 10L, 
10L, 1L, 1L, 1L), time = c("11:53:33", "17:45:27", "21:12:36", 
"00:32:36", "11:07:08", "14:43:41", "20:50:29", "23:22:33", "12:05:53", 
"14:48:50", "21:15:33", "12:09:46", "14:27:06", "18:01:24", "20:56:40", 
"23:17:18", "11:19:02", "17:32:18", "00:05:52", "11:45:11", "18:10:10", 
"20:08:09", "00:30:00", "11:17:36", "14:29:43", "18:12:06", "20:54:48", 
"23:20:32", "11:16:00", "17:26:45", "11:45:13", "14:30:26", "18:31:49", 
"20:31:42", "23:47:41", "14:16:07", "23:55:13", "11:16:34", "12:01:56", 
"14:33:38")), row.names = c(NA, -40L), class = c("tbl_df", "tbl", 
"data.frame"))

And what I'd like is

   id     index   day time       last  
  238686     1     1  11:53:33   0
  238686     2     1  17:45:27   0
  238686     3     1  21:12:36   0
  238686     4     2  00:32:36   1
  238686     5     2  11:07:08   0
  238686     6     2  14:43:41   0
  238686     7     2  20:50:29   0
  238686     8     2  23:22:33   1
  238686     9     3  12:05:53   0

Solution

  • So, we need to treat times before some cut-off time as if they were on the previous day. Create a new modified_day column that does this.

    day_cutoff = "03:00:01"
    ex_data |>
      mutate(day_modified = ifelse(time < day_cutoff, day - 1, day)) |>
      mutate(last = as.integer(row_number() == n()), .by = c(id, day_modified)) |>
      print(n = 20)
    # # A tibble: 40 × 6
    #        id index   day time     day_modified  last
    #     <dbl> <int> <int> <chr>           <dbl> <int>
    #  1 238686     1     1 11:53:33            1     0
    #  2 238686     2     1 17:45:27            1     0
    #  3 238686     3     1 21:12:36            1     0
    #  4 238686     4     2 00:32:36            1     1
    #  5 238686     5     2 11:07:08            2     0
    #  6 238686     6     2 14:43:41            2     0
    #  7 238686     7     2 20:50:29            2     0
    #  8 238686     8     2 23:22:33            2     1
    #  9 238686     9     3 12:05:53            3     0
    # 10 238686    10     3 14:48:50            3     0
    # 11 238686    11     3 21:15:33            3     1
    # 12 238686    12     4 12:09:46            4     0
    # 13 238686    13     4 14:27:06            4     0
    # 14 238686    14     4 18:01:24            4     0
    # 15 238686    15     4 20:56:40            4     0
    # 16 238686    16     4 23:17:18            4     1
    # 17 238686    17     5 11:19:02            5     0
    # 18 238686    18     5 17:32:18            5     0
    # 19 238686    19     6 00:05:52            5     1
    # 20 238686    20     6 11:45:11            6     0
    # # ℹ 20 more rows
    # # ℹ Use `print(n = ...)` to see more rows