I have a set of data representing a patient's hospital stays.
Some of these stays may contain an event (Boolean column). Starting with an "event" stay, the aim is to group together all the following stays if they are consecutive in time, in other words, if there is no day between the end of one and the start of the next. For these grouped stays, the costs of each stay is also added together to obtain an overall cost.
Here is an exemple:
library(tidyverse)
library(lubridate)
input <- tribble(
~ begin , ~ end , ~ event, ~ cost,
ymd("2022/11/21"), ymd("2023/11/30"), FALSE , 6,
ymd("2023/01/01"), ymd("2023/01/03"), TRUE , 8,
ymd("2023/01/07"), ymd("2023/01/10"), FALSE , 2,
ymd("2023/01/14"), ymd("2023/01/19"), TRUE , 3,
ymd("2023/01/19"), ymd("2023/01/25"), FALSE , 7,
ymd("2023/02/14"), ymd("2023/03/01"), FALSE , 5,
ymd("2023/04/03"), ymd("2023/04/11"), TRUE , 5,
ymd("2023/04/11"), ymd("2023/04/21"), FALSE , 2,
ymd("2023/04/21"), ymd("2023/04/23"), FALSE , 4,
ymd("2023/05/01"), ymd("2023/06/22"), FALSE , 15,
)
expected_ouput <- tribble(
~ begin , ~ end , ~ cost,
ymd("2023/01/01"), ymd("2023/01/03"), 8,
ymd("2023/01/14"), ymd("2023/01/25"), 10,
ymd("2023/04/03"), ymd("2023/04/23"), 11,
)
I have made up some boring solution - with nested while loops - that does not look like tidyverse or R style to me. Please enlighten me!
You may use the following -
library(dplyr)
input %>%
mutate(diff_days = as.integer(begin - lag(end)),
group = cumsum(event)) %>%
filter(event | diff_days == 0) %>%
summarise(begin = min(begin),
end = max(end),
cost = sum(cost), .by = group) %>%
select(-group)
# A tibble: 3 × 3
# begin end cost
# <date> <date> <dbl>
#1 2023-01-01 2023-01-03 8
#2 2023-01-14 2023-01-25 10
#3 2023-04-03 2023-04-23 11
We create diff_days
column to check if the days are consecutive, keep only those rows where event
is TRUE
or days are consecutive. I have also created a group
column to identify clear group of days based on event
variable. For each group
we then sum
the cost
and take min
and max
values of begin
and end
dates respectively.