So I have two data frames I am working with. The first contains information on a cohort of kids. These same kids have been measured multiple times on different days (so uniqueID AD_1 indicates child AD receiving their first set of measurements, and AD_2 indicates the same child receiving their second set of measurements). We are interested in the date windows between these measurement days, and have made columns start_date and end_date to define the windows, like so:
uniqueID | start_date | end_date |
---|---|---|
AD_1 | 2/22/2023 | 2/24/2023 |
KJ_1 | 2/22/2023 | 2/26/2023 |
AD_2 | 2/24/2023 | 3/8/2023 |
KJ_2 | 2/26/2023 | 3/2/2023 |
AD_3 | 3/8/2023 | 3/10/2023 |
KJ_3 | 3/2/2023 | 3/9/2023 |
The second data frame contains a continuous range of dates, and maximum daily temperatures for each date:
date | maxtemp |
---|---|
2/22/2023 | 23.3 |
2/23/2023 | 18.8 |
2/24/2023 | 25.2 |
2/25/2023 | 27.0 |
2/26/2023 | 19.1 |
2/27/2023 | 20.2 |
2/28/2023 | 20.7 |
3/1/2023 | 26.1 |
3/2/2023 | 27.2 |
3/3/2023 | 20.8 |
3/4/2023 | 22.4 |
3/5/2023 | 21.8 |
3/6/2023 | 23.9 |
3/7/2023 | 24.9 |
3/8/2023 | 25.1 |
3/9/2023 | 22.7 |
3/10/2023 | 25.5 |
What I need to be able to do is count the number of days over 25° for each growth window (i.e., between each start and end date) as specified in the first data frame, using the temperature data in the second data frame.
I attempted to do this using the following code:
df1 %>%
transmute(uniqueID, day = map2(start_date, end_date, seq, by = "1 day")) %>%
group_by(uniqueID) %>%
summarise(count = df2$maxtemp >= 25)
However, this seemed to just generate a massive list of T/F values based on whether each day in the date window had a maximum temperature over 25°. Is there a way to have it add up the occurrences of days over 25°, to generate a new column in the first data frame, like this?
uniqueID | start_date | end_date | daysover25 |
---|---|---|---|
AD_1 | 2/22/2023 | 2/24/2023 | 1 |
KJ_1 | 2/22/2023 | 2/26/2023 | 2 |
AD_2 | 2/24/2023 | 3/8/2023 | 5 |
KJ_2 | 2/26/2023 | 3/2/2023 | 2 |
AD_3 | 3/8/2023 | 3/10/2023 | 1 |
KJ_3 | 3/2/2023 | 3/9/2023 | 2 |
You can try
df1 %>%
left_join(
df2 %>%
filter(maxtemp >= 25),
by = join_by(start_date <= date, end_date >= date)
) %>%
reframe(daysover25 = n(), .by = uniqueID:end_date)
which gives
uniqueID start_date end_date daysover25
1 AD_1 2023-02-22 2023-02-24 1
2 KJ_1 2023-02-22 2023-02-26 2
3 AD_2 2023-02-24 2023-03-08 5
4 KJ_2 2023-02-26 2023-03-02 2
5 AD_3 2023-03-08 2023-03-10 2
6 KJ_3 2023-03-02 2023-03-09 2
> dput(df1)
structure(list(uniqueID = c("AD_1", "KJ_1", "AD_2", "KJ_2", "AD_3",
"KJ_3"), start_date = structure(c(19410, 19410, 19412, 19414,
19424, 19418), class = "Date"), end_date = structure(c(19412,
19414, 19424, 19418, 19426, 19425), class = "Date")), class = "data.frame", row.names = c(NA,
-6L))
> dput(df2)
structure(list(date = structure(c(19410, 19411, 19412, 19413,
19414, 19415, 19416, 19417, 19418, 19419, 19420, 19421, 19422,
19423, 19424, 19425, 19426), class = "Date"), maxtemp = c(23.3,
18.8, 25.2, 27, 19.1, 20.2, 20.7, 26.1, 27.2, 20.8, 22.4, 21.8,
23.9, 24.9, 25.1, 22.7, 25.5)), class = "data.frame", row.names = c(NA,
-17L))