rdplyrsummarize

R dplyr summarise over intervals


This is probably an R dplyr summarise question. I have a data.frame with values recorded for subjects at 5 minute time intervals and it has these three columns: id: subject ID, value: the recorded value at the time point, and cum_time: the cumulative time value for each id:

library(dplyr)
set.seed(1)
df <- data.frame(id = c(rep("id1", 100), rep("id2", 100), rep("id3", 100)),
                 value = runif(300, 10, 20)) %>%
  dplyr::group_by(id) %>%
  dplyr::mutate(cum_time = 5 * (dplyr::row_number()-1))

I'd like to compute a data.frame with the medians of value over 60 minute intervals to give this resulting data.frame:

rbind(data.frame(id = "id1", median_value = c(median(dplyr::filter(df, id == "id1" & cum_time >= 0 & cum_time <= 60)$value),
                                              median(dplyr::filter(df, id == "id1" & cum_time >= 65 & cum_time <= 120)$value),
                                              median(dplyr::filter(df, id == "id1" & cum_time >= 125 & cum_time <= 180)$value),
                                              median(dplyr::filter(df, id == "id1" & cum_time >= 185 & cum_time <= 240)$value),
                                              median(dplyr::filter(df, id == "id1" & cum_time >= 245 & cum_time <= 300)$value)),
                 cum_time = c(60, 120, 180, 240, 300)),
      data.frame(id = "id2", median_value = c(median(dplyr::filter(df, id == "id2" & cum_time >= 0 & cum_time <= 60)$value),
                                              median(dplyr::filter(df, id == "id2" & cum_time >= 65 & cum_time <= 120)$value),
                                              median(dplyr::filter(df, id == "id2" & cum_time >= 125 & cum_time <= 180)$value),
                                              median(dplyr::filter(df, id == "id2" & cum_time >= 185 & cum_time <= 240)$value),
                                              median(dplyr::filter(df, id == "id2" & cum_time >= 245 & cum_time <= 300)$value)),
                 cum_time = c(60, 120, 180, 240, 300)),
      data.frame(id = "id3", median_value = c(median(dplyr::filter(df, id == "id3" & cum_time >= 0 & cum_time <= 60)$value),
                                              median(dplyr::filter(df, id == "id3" & cum_time >= 65 & cum_time <= 120)$value),
                                              median(dplyr::filter(df, id == "id3" & cum_time >= 125 & cum_time <= 180)$value),
                                              median(dplyr::filter(df, id == "id3" & cum_time >= 185 & cum_time <= 240)$value),
                                              median(dplyr::filter(df, id == "id3" & cum_time >= 245 & cum_time <= 300)$value)),
                 cum_time = c(60, 120, 180, 240, 300)))

    id median_value cum_time
1  id1     15.72853       60
2  id1     15.74687      120
3  id1     14.87811      180
4  id1     16.00048      240
5  id1     14.57858      300
6  id2     15.98761       60
7  id2     14.65317      120
8  id2     15.36035      180
9  id2     15.16835      240
10 id2     13.90954      300
11 id3     12.68951       60
12 id3     15.79852      120
13 id3     14.03968      180
14 id3     14.29187      240
15 id3     15.11250      300

Solution

  • Perhaps this approach works for you (with addition thanks to @thelatemail)?

    df %>% 
      filter(cum_time<=300) %>% 
      group_by(id, grp=cut(cum_time, seq(0, max(cum_time),60), include.lowest = T)) %>% 
      summarize(median_value = median(value), cum_time=max(cum_time), .groups = "drop") %>% 
      select(-grp)
    

    Output:

        id median_value cum_time
    1  id1     15.72853       60
    2  id1     15.74687      120
    3  id1     14.87811      180
    4  id1     16.00048      240
    5  id1     14.57858      300
    6  id2     15.98761       60
    7  id2     14.65317      120
    8  id2     15.36035      180
    9  id2     15.16835      240
    10 id2     13.90954      300
    11 id3     12.68951       60
    12 id3     15.79852      120
    13 id3     14.03968      180
    14 id3     14.29187      240
    15 id3     15.11250      300