I have a ~38,000 observation dataframe (hr_clean) that has a POSIXct value for each row, this is fitbit data that tracks average Heart Rate over every minute for the duration of a month.
I want to check if the Date and Time of each row of hr_clean is during sleeping or waking hours. I have a dataframe of sleep ranges that I will use to make this comparison (sleep_ranges).
For each row I need to check if the Date and Time is in between the sleep time and wake time (POSIXct objects), to determine if the subject was asleep during that minute.
I wrote a for loop that effectively does this; however it is incredibly slow when running a comparison of 38,000 by 26 sleep ranges. Please help me make this code efficient.
For Loop:
# Set seed for reproducibility
set.seed(123)
# Define the start and end dates
start_date <- ymd_hms("2025-03-01 00:00:00")
end_date <- ymd_hms("2025-04-01 23:59:59")
hr_clean <- data.frame(dateTime = start_date + runif(10, 0, as.numeric(difftime(end_date, start_date, units = "secs"))))
sleep_ranges <- data.frame(startTime = ymd_hms(c("2025-03-15 00:58:00",
"2025-03-14 00:17:00",
"2025-03-12 23:38:00",
"2025-03-12 00:21:30")),
endTime = ymd_hms(c("2025-03-15 09:10:30",
"2025-03-14 07:30:30",
"2025-03-13 07:45:30",
"2025-03-12 07:26:30")))
is_sleeping_df <- data.frame(dateTime = character(),
is_sleeping = character())
for(minute in hr_clean$dateTime){
print(as_datetime(minute))
sleepTF = (any(minute > sleep_ranges$startTime & minute < sleep_ranges$endTime))
df <- data.frame(dateTime = as_datetime(minute),
is_sleeping = sleepTF)
is_sleeping_df <- rbind(is_sleeping_df, df)
}
You might use |
in Reduce
with data.table
.
> library(data.table)
> setDT(hr_clean); setDT(sleep_ranges);
>
> hr_clean[, asleep := Reduce(`|`, lapply(seq_len(nrow(sleep_ranges)), \(i)
+ dateTime %between% sleep_ranges[i]))][]
dateTime asleep
<POSc> <lgcl>
1: 2025-03-10 04:34:18 FALSE
2: 2025-03-26 04:37:47 FALSE
3: 2025-03-14 01:41:06 TRUE
4: 2025-03-29 05:16:26 FALSE
5: 2025-03-31 02:20:17 FALSE
6: 2025-03-02 10:56:30 FALSE
7: 2025-03-17 21:03:24 FALSE
8: 2025-03-29 12:29:06 FALSE
9: 2025-03-18 14:57:01 FALSE
10: 2025-03-15 14:13:24 FALSE
Same logic in base R, which is slightly slower (~1.5 times).
> hr_clean |>
+ transform(
+ asleep=Reduce(`|`, lapply(seq_len(nrow(sleep_ranges)), \(i)
+ dateTime > sleep_ranges[i, 1] &
+ dateTime < sleep_ranges[i, 2]))
+ )
dateTime asleep
1 2025-03-10 04:34:18 FALSE
2 2025-03-26 04:37:47 FALSE
3 2025-03-14 01:41:06 TRUE
4 2025-03-29 05:16:26 FALSE
5 2025-03-31 02:20:17 FALSE
6 2025-03-02 10:56:30 FALSE
7 2025-03-17 21:03:24 FALSE
8 2025-03-29 12:29:06 FALSE
9 2025-03-18 14:57:01 FALSE
10 2025-03-15 14:13:24 FALSE
Alternatively outer
in base R, but it's much slower.
> hr_clean |>
+ transform(
+ asleep=rowSums(
+ outer(hr_clean$dateTime,
+ asplit(sleep_ranges, 1),
+ Vectorize(\(x, y) x > y[1] & x < y[2]))
+ ) > 0
+ )
dateTime asleep
<POSc> <lgcl>
1: 2025-03-10 04:34:18 FALSE
2: 2025-03-26 04:37:47 FALSE
3: 2025-03-14 01:41:06 TRUE
4: 2025-03-29 05:16:26 FALSE
5: 2025-03-31 02:20:17 FALSE
6: 2025-03-02 10:56:30 FALSE
7: 2025-03-17 21:03:24 FALSE
8: 2025-03-29 12:29:06 FALSE
9: 2025-03-18 14:57:01 FALSE
10: 2025-03-15 14:13:24 FALSE