This is probably an R
dplyr
summarise
question.
I have a data.frame
with values recorded for subjects at 5 minute time intervals and it has these three columns: id
: subject ID, value
: the recorded value at the time point, and cum_time
: the cumulative time value for each id
:
library(dplyr)
set.seed(1)
df <- data.frame(id = c(rep("id1", 100), rep("id2", 100), rep("id3", 100)),
value = runif(300, 10, 20)) %>%
dplyr::group_by(id) %>%
dplyr::mutate(cum_time = 5 * (dplyr::row_number()-1))
I'd like to compute a data.frame
with the median
s of value
over 60 minute intervals to give this resulting data.frame
:
rbind(data.frame(id = "id1", median_value = c(median(dplyr::filter(df, id == "id1" & cum_time >= 0 & cum_time <= 60)$value),
median(dplyr::filter(df, id == "id1" & cum_time >= 65 & cum_time <= 120)$value),
median(dplyr::filter(df, id == "id1" & cum_time >= 125 & cum_time <= 180)$value),
median(dplyr::filter(df, id == "id1" & cum_time >= 185 & cum_time <= 240)$value),
median(dplyr::filter(df, id == "id1" & cum_time >= 245 & cum_time <= 300)$value)),
cum_time = c(60, 120, 180, 240, 300)),
data.frame(id = "id2", median_value = c(median(dplyr::filter(df, id == "id2" & cum_time >= 0 & cum_time <= 60)$value),
median(dplyr::filter(df, id == "id2" & cum_time >= 65 & cum_time <= 120)$value),
median(dplyr::filter(df, id == "id2" & cum_time >= 125 & cum_time <= 180)$value),
median(dplyr::filter(df, id == "id2" & cum_time >= 185 & cum_time <= 240)$value),
median(dplyr::filter(df, id == "id2" & cum_time >= 245 & cum_time <= 300)$value)),
cum_time = c(60, 120, 180, 240, 300)),
data.frame(id = "id3", median_value = c(median(dplyr::filter(df, id == "id3" & cum_time >= 0 & cum_time <= 60)$value),
median(dplyr::filter(df, id == "id3" & cum_time >= 65 & cum_time <= 120)$value),
median(dplyr::filter(df, id == "id3" & cum_time >= 125 & cum_time <= 180)$value),
median(dplyr::filter(df, id == "id3" & cum_time >= 185 & cum_time <= 240)$value),
median(dplyr::filter(df, id == "id3" & cum_time >= 245 & cum_time <= 300)$value)),
cum_time = c(60, 120, 180, 240, 300)))
id median_value cum_time
1 id1 15.72853 60
2 id1 15.74687 120
3 id1 14.87811 180
4 id1 16.00048 240
5 id1 14.57858 300
6 id2 15.98761 60
7 id2 14.65317 120
8 id2 15.36035 180
9 id2 15.16835 240
10 id2 13.90954 300
11 id3 12.68951 60
12 id3 15.79852 120
13 id3 14.03968 180
14 id3 14.29187 240
15 id3 15.11250 300
Perhaps this approach works for you (with addition thanks to @thelatemail)?
df %>%
filter(cum_time<=300) %>%
group_by(id, grp=cut(cum_time, seq(0, max(cum_time),60), include.lowest = T)) %>%
summarize(median_value = median(value), cum_time=max(cum_time), .groups = "drop") %>%
select(-grp)
Output:
id median_value cum_time
1 id1 15.72853 60
2 id1 15.74687 120
3 id1 14.87811 180
4 id1 16.00048 240
5 id1 14.57858 300
6 id2 15.98761 60
7 id2 14.65317 120
8 id2 15.36035 180
9 id2 15.16835 240
10 id2 13.90954 300
11 id3 12.68951 60
12 id3 15.79852 120
13 id3 14.03968 180
14 id3 14.29187 240
15 id3 15.11250 300