rdataframedplyrdata.tablerollapply

rolling summarize conditioned to multiple variables on another dataframe


I have the following data example:

trap_data <- structure(list(site = c(1, 2, 3, 3), trap_date = structure(c(18809, 
18809, 18307, 18322), class = "Date")), class = "data.frame", row.names = c(NA, 
-4L))

climate <- structure(list(site = c(1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3), 
    V1 = c(0.1, 0.2, 0.1, 0.1, 0.5, 0.2, 0.3, 0.1, 0.2, 0.1, 
    0.1, 0.5), V2 = c(1, 1, 3, 1, 2, 3, 3, 1, 1, 3, 1, 2), date = structure(c(18779, 
    18790, 18789, 18792, 18791, 18790, 18789, 18792, 18305, 18306, 
    18307, 18308), class = "Date")), class = "data.frame", row.names = c(NA, 
-12L))

I would add columns in trap_data that summarize N, mean and sum, of V1 and V2 of climate data, conditioned by site and date. However, I would these new variables (N, mean and sum) are for the N days (these time lengths are variable. for example 20 current days) before the specified date in trap_data.

Thanks,


Solution

  • We may create a new column in the 'trap_data' - 20 days prior date and then do a join

    library(data.table)
    library(lubridate)
    setDT(trap_data)[, prev_date := trap_date - days(20)]
    setDT(climate)[trap_data, .(N = .N,
         MeanV1 = mean(V1, na.rm = TRUE),
         SumV1 = sum(V1, na.rm = TRUE),
         MeanV2 = mean(V2, na.rm = TRUE),
      SumV2 = sum(V2, na.rm = TRUE)),
       on = .(site, date >= prev_date, date <= trap_date), by = .EACHI]
    

    -output

        site       date       date     N    MeanV1 SumV1   MeanV2 SumV2
       <num>     <Date>     <Date> <int>     <num> <num>    <num> <num>
    1:     1 2021-06-11 2021-07-01     3 0.1333333   0.4 1.666667     5
    2:     2 2021-06-11 2021-07-01     4 0.2750000   1.1 2.250000     9
    3:     3 2020-01-26 2020-02-15     3 0.1333333   0.4 1.666667     5
    4:     3 2020-02-10 2020-03-01     4 0.2250000   0.9 1.750000     7