rmerge

Merge timeseries dataframes of different lengths and time intervals


I wish to join two dataframes of different lengths and time intevals based on common variable DateTime15 so that the Hourly_Rainfall_mm in df2 is merged to the df1 where the DateTime15 matches, which is every hour.

df1<-structure(list(Tag = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10), Date = structure(c(1721174400, 
1721174400, 1721174400, 1721174400, 1721174400, 1721174400, 1721174400, 
1721174400, 1721174400, 1721174400), class = c("POSIXct", "POSIXt"
), tzone = "UTC"), Time = structure(c(46820, 47764, 48665, 49566, 
50468, 51368, 52269, 53170, 54071, 54972), class = c("hms", "difftime"
), units = "secs"), Temp = c(17.9, 17.9, 17.9, 17.8, 17.8, 17.8, 
17.8, 17.7, 17.7, 17.7), `Baro (mb)` = c(1016, 1016, 1016, 1016, 
1016, 1016, 1016, 1016, 1016, 1016), pH = c(8.45, 8.42, 8.4, 
8.38, 8.38, 8.37, 8.37, 8.36, 8.36, 8.37), pHmV = c(-68.3, -67.1, 
-66.2, -65.2, -65, -64.8, -64.7, -64.4, -64.3, -64.5), `ORP (REDOX)` = c(225.8, 
212.5, 221.1, 229.1, 234, 237.5, 239.6, 240.6, 242.3, 242.6), 
    DO_Sat = c(107.4, 107.4, 106.6, 106.1, 106.3, 106.4, 106.5, 
    106.4, 106.7, 106.6), DO_mgL = c(10.16, 10.16, 10.08, 10.06, 
    10.08, 10.09, 10.09, 10.11, 10.14, 10.13), Conductivity = c(563, 
    561, 562, 561, 563, 560, 564, 564, 564, 565), `RES (Ohms.cm)` = c(2053, 
    2061, 2057, 2066, 2057, 2070, 2053, 2057, 2057, 2053), `TDS (mg/L)` = c(365, 
    364, 365, 364, 365, 364, 366, 366, 366, 367), `SAL (ppt)` = c(0.24, 
    0.24, 0.24, 0.24, 0.24, 0.23, 0.24, 0.24, 0.24, 0.24), `SSG (st)` = c(0, 
    0, 0, 0, 0, 0, 0, 0, 0, 0), NH3 = c(0.01, 0.01, 0.01, 0.01, 
    0.01, 0.01, 0.01, 0.01, 0.01, 0.01), cDOM = c(8.9, 8.9, 9.1, 
    9, 7.9, 8.7, 9.2, 8.7, 8.6, 8.9), Ammonium = c(0.13, 0.13, 
    0.13, 0.13, 0.13, 0.13, 0.13, 0.13, 0.13, 0.12), DateTime = structure(c(1721217620, 
    1721218564, 1721219465, 1721220366, 1721221268, 1721222168, 
    1721223069, 1721223970, 1721224871, 1721225772), class = c("POSIXct", 
    "POSIXt"), tzone = ""), DateTime15 = structure(c(1721217600, 
    1721218500, 1721219400, 1721220300, 1721221200, 1721222100, 
    1721223000, 1721223900, 1721224800, 1721225700), class = c("POSIXct", 
    "POSIXt"), tzone = "")), row.names = c(NA, 10L), class = "data.frame")

df2<-structure(list(DateTime = structure(c(1721221200.055, 1721224800.06, 
1721228400.065, 1721232000.07, 1721235600.075, 1721239200.08, 
1721242800.085, 1721246400.09, 1721250000.095, 1721253600.1), tzone = "UTC", class = c("POSIXct", 
"POSIXt")), Hourly_Rainfall_mm = c(0, 0, 0, 0, 0, 0.1, 0.1, 0.8, 
1.4, 1.1), DateTime15 = structure(c(1721221200.055, 1721224800.06, 
1721228400.065, 1721232000.07, 1721235600.075, 1721239200.08, 
1721242800.085, 1721246400.09, 1721250000.095, 1721253600.1), tzone = "UTC", class = c("POSIXct", 
"POSIXt"))), row.names = c(NA, -10L), class = c("tbl_df", "tbl", 
"data.frame"))

I've spent hours going through stackoverflow questions trying to find a solution and have applied code from a number of answers but I seem to be missing something as none of them have worked for my dataframes.

JoinedDF<-left_join(df1, df2, by="DateTime15") # Returns all NA for Hourly_Rainfall_mm

JoinedDF<- merge(df1, df2, by.x = df1$DateTime15, 
                                             by.y= df2$DateTime15, all=TRUE) # gives error in fix.by(by.x, x) : 'by' must match numbers of columns

JoinedDF<-merge(df1, df2, by="DateTime15", all=TRUE) # inserts an additional row in the dataframe 

I'd be very grateful for any help.


Solution

  • Compare df1$DateTime15 |> dput() with df2$DateTime15 |> dput(): in df1 your datetimes are aligned to seconds (and in fact to hours), while in df2 you have fractional seconds. (... Each measure being 1 hour + 1/200th of second [3600.005 seconds] after the preceding one. Presumably some earlier step in your process created those misalignments inadvertently.)

    We can join df1 to df2 with rounded seconds in DateTime15:

    library(dplyr); library(lubridate)
    df1 |>
      left_join(df2 |> mutate(across(DateTime15, ~round_date(., "second"))), 
                join_by(DateTime15))\
    

    Selecting just cols 1:5 and Hourly_Rainfall_mm:

       Tag       Date     Time Temp Baro (mb) Hourly_Rainfall_mm
    1    1 2024-07-17 13:00:20 17.9      1016                 NA
    2    2 2024-07-17 13:16:04 17.9      1016                 NA
    3    3 2024-07-17 13:31:05 17.9      1016                 NA
    4    4 2024-07-17 13:46:06 17.8      1016                 NA
    5    5 2024-07-17 14:01:08 17.8      1016                  0
    6    6 2024-07-17 14:16:08 17.8      1016                 NA
    7    7 2024-07-17 14:31:09 17.8      1016                 NA
    8    8 2024-07-17 14:46:10 17.7      1016                 NA
    9    9 2024-07-17 15:01:11 17.7      1016                  0
    10  10 2024-07-17 15:16:12 17.7      1016                 NA