I have a dataframe like this:
A tibble: 30 x 4
id index day time
<dbl> <int> <int> <chr>
1 238686 1 1 11:53:33
2 238686 2 1 17:45:27
3 238686 3 1 21:12:36
4 238686 4 2 00:32:36
5 238686 5 2 11:07:08
6 238686 6 2 14:43:41
7 238686 7 2 20:50:29
8 238686 8 2 23:22:33
9 238686 9 3 12:05:53
10 238686 10 3 14:48:50
id refers to a participant (each participant provides several reports per day for 10 days), index refers to the consequtive number of report per participant throughout the whole study, day refers to day of study for each participant, and time refers to the time of day of each report.
I'd need a new variable indicating the last report for each day for each participant. I got a very helpful reply from here to do this using
ex_data <- ex_data |>
mutate(last=as.integer(max(index) == index), .by = c(id, day))
This worked great; however, I hadn't realized that some participants had provided the last report of the day after midnight on some days. I managed to mark those occasions using
ex_data$time2<-as.hms(ex_data$time)
ex_data <- ex_data |>
mutate(nighttime = if_else(time2 >= parse_hms("00:00:00") & time2 < parse_hms("03:00:01"), 1, 0))
But I can't come up with a way to create a variable that would indicate whether a report is "not last report of the day" or "last report of the day" for each participant that would include last reports of the day given before and after midnight.
Here's the data (it's a snippet of the actual data):
ex_data<-
structure(list(id = c(238686, 238686, 238686, 238686, 238686,
238686, 238686, 238686, 238686, 238686, 238686, 238686, 238686,
238686, 238686, 238686, 238686, 238686, 238686, 238686, 238686,
238686, 238686, 238686, 238686, 238686, 238686, 238686, 238686,
238686, 238686, 238686, 238686, 238686, 238686, 238686, 238686,
239297, 239297, 239297), index = c(1L, 2L, 3L, 4L, 5L, 6L, 7L,
8L, 9L, 10L, 11L, 12L, 13L, 14L, 15L, 16L, 17L, 18L, 19L, 20L,
21L, 22L, 23L, 24L, 25L, 26L, 27L, 28L, 29L, 30L, 31L, 32L, 33L,
34L, 35L, 36L, 37L, 1L, 2L, 3L), day = c(1L, 1L, 1L, 2L, 2L,
2L, 2L, 2L, 3L, 3L, 3L, 4L, 4L, 4L, 4L, 4L, 5L, 5L, 6L, 6L, 6L,
6L, 7L, 7L, 7L, 7L, 7L, 7L, 8L, 8L, 9L, 9L, 9L, 9L, 9L, 10L,
10L, 1L, 1L, 1L), time = c("11:53:33", "17:45:27", "21:12:36",
"00:32:36", "11:07:08", "14:43:41", "20:50:29", "23:22:33", "12:05:53",
"14:48:50", "21:15:33", "12:09:46", "14:27:06", "18:01:24", "20:56:40",
"23:17:18", "11:19:02", "17:32:18", "00:05:52", "11:45:11", "18:10:10",
"20:08:09", "00:30:00", "11:17:36", "14:29:43", "18:12:06", "20:54:48",
"23:20:32", "11:16:00", "17:26:45", "11:45:13", "14:30:26", "18:31:49",
"20:31:42", "23:47:41", "14:16:07", "23:55:13", "11:16:34", "12:01:56",
"14:33:38")), row.names = c(NA, -40L), class = c("tbl_df", "tbl",
"data.frame"))
And what I'd like is
id index day time last
238686 1 1 11:53:33 0
238686 2 1 17:45:27 0
238686 3 1 21:12:36 0
238686 4 2 00:32:36 1
238686 5 2 11:07:08 0
238686 6 2 14:43:41 0
238686 7 2 20:50:29 0
238686 8 2 23:22:33 1
238686 9 3 12:05:53 0
So, we need to treat times before some cut-off time as if they were on the previous day. Create a new modified_day
column that does this.
day_cutoff = "03:00:01"
ex_data |>
mutate(day_modified = ifelse(time < day_cutoff, day - 1, day)) |>
mutate(last = as.integer(row_number() == n()), .by = c(id, day_modified)) |>
print(n = 20)
# # A tibble: 40 × 6
# id index day time day_modified last
# <dbl> <int> <int> <chr> <dbl> <int>
# 1 238686 1 1 11:53:33 1 0
# 2 238686 2 1 17:45:27 1 0
# 3 238686 3 1 21:12:36 1 0
# 4 238686 4 2 00:32:36 1 1
# 5 238686 5 2 11:07:08 2 0
# 6 238686 6 2 14:43:41 2 0
# 7 238686 7 2 20:50:29 2 0
# 8 238686 8 2 23:22:33 2 1
# 9 238686 9 3 12:05:53 3 0
# 10 238686 10 3 14:48:50 3 0
# 11 238686 11 3 21:15:33 3 1
# 12 238686 12 4 12:09:46 4 0
# 13 238686 13 4 14:27:06 4 0
# 14 238686 14 4 18:01:24 4 0
# 15 238686 15 4 20:56:40 4 0
# 16 238686 16 4 23:17:18 4 1
# 17 238686 17 5 11:19:02 5 0
# 18 238686 18 5 17:32:18 5 0
# 19 238686 19 6 00:05:52 5 1
# 20 238686 20 6 11:45:11 6 0
# # ℹ 20 more rows
# # ℹ Use `print(n = ...)` to see more rows