I wish to join two dataframes of different lengths and time intevals based on common variable DateTime15 so that the Hourly_Rainfall_mm in df2 is merged to the df1 where the DateTime15 matches, which is every hour.
df1<-structure(list(Tag = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10), Date = structure(c(1721174400,
1721174400, 1721174400, 1721174400, 1721174400, 1721174400, 1721174400,
1721174400, 1721174400, 1721174400), class = c("POSIXct", "POSIXt"
), tzone = "UTC"), Time = structure(c(46820, 47764, 48665, 49566,
50468, 51368, 52269, 53170, 54071, 54972), class = c("hms", "difftime"
), units = "secs"), Temp = c(17.9, 17.9, 17.9, 17.8, 17.8, 17.8,
17.8, 17.7, 17.7, 17.7), `Baro (mb)` = c(1016, 1016, 1016, 1016,
1016, 1016, 1016, 1016, 1016, 1016), pH = c(8.45, 8.42, 8.4,
8.38, 8.38, 8.37, 8.37, 8.36, 8.36, 8.37), pHmV = c(-68.3, -67.1,
-66.2, -65.2, -65, -64.8, -64.7, -64.4, -64.3, -64.5), `ORP (REDOX)` = c(225.8,
212.5, 221.1, 229.1, 234, 237.5, 239.6, 240.6, 242.3, 242.6),
DO_Sat = c(107.4, 107.4, 106.6, 106.1, 106.3, 106.4, 106.5,
106.4, 106.7, 106.6), DO_mgL = c(10.16, 10.16, 10.08, 10.06,
10.08, 10.09, 10.09, 10.11, 10.14, 10.13), Conductivity = c(563,
561, 562, 561, 563, 560, 564, 564, 564, 565), `RES (Ohms.cm)` = c(2053,
2061, 2057, 2066, 2057, 2070, 2053, 2057, 2057, 2053), `TDS (mg/L)` = c(365,
364, 365, 364, 365, 364, 366, 366, 366, 367), `SAL (ppt)` = c(0.24,
0.24, 0.24, 0.24, 0.24, 0.23, 0.24, 0.24, 0.24, 0.24), `SSG (st)` = c(0,
0, 0, 0, 0, 0, 0, 0, 0, 0), NH3 = c(0.01, 0.01, 0.01, 0.01,
0.01, 0.01, 0.01, 0.01, 0.01, 0.01), cDOM = c(8.9, 8.9, 9.1,
9, 7.9, 8.7, 9.2, 8.7, 8.6, 8.9), Ammonium = c(0.13, 0.13,
0.13, 0.13, 0.13, 0.13, 0.13, 0.13, 0.13, 0.12), DateTime = structure(c(1721217620,
1721218564, 1721219465, 1721220366, 1721221268, 1721222168,
1721223069, 1721223970, 1721224871, 1721225772), class = c("POSIXct",
"POSIXt"), tzone = ""), DateTime15 = structure(c(1721217600,
1721218500, 1721219400, 1721220300, 1721221200, 1721222100,
1721223000, 1721223900, 1721224800, 1721225700), class = c("POSIXct",
"POSIXt"), tzone = "")), row.names = c(NA, 10L), class = "data.frame")
df2<-structure(list(DateTime = structure(c(1721221200.055, 1721224800.06,
1721228400.065, 1721232000.07, 1721235600.075, 1721239200.08,
1721242800.085, 1721246400.09, 1721250000.095, 1721253600.1), tzone = "UTC", class = c("POSIXct",
"POSIXt")), Hourly_Rainfall_mm = c(0, 0, 0, 0, 0, 0.1, 0.1, 0.8,
1.4, 1.1), DateTime15 = structure(c(1721221200.055, 1721224800.06,
1721228400.065, 1721232000.07, 1721235600.075, 1721239200.08,
1721242800.085, 1721246400.09, 1721250000.095, 1721253600.1), tzone = "UTC", class = c("POSIXct",
"POSIXt"))), row.names = c(NA, -10L), class = c("tbl_df", "tbl",
"data.frame"))
I've spent hours going through stackoverflow questions trying to find a solution and have applied code from a number of answers but I seem to be missing something as none of them have worked for my dataframes.
JoinedDF<-left_join(df1, df2, by="DateTime15") # Returns all NA for Hourly_Rainfall_mm
JoinedDF<- merge(df1, df2, by.x = df1$DateTime15,
by.y= df2$DateTime15, all=TRUE) # gives error in fix.by(by.x, x) : 'by' must match numbers of columns
JoinedDF<-merge(df1, df2, by="DateTime15", all=TRUE) # inserts an additional row in the dataframe
I'd be very grateful for any help.
Compare df1$DateTime15 |> dput()
with df2$DateTime15 |> dput()
: in df1 your datetimes are aligned to seconds (and in fact to hours), while in df2 you have fractional seconds.
(... Each measure being 1 hour + 1/200th of second [3600.005 seconds] after the preceding one. Presumably some earlier step in your process created those misalignments inadvertently.)
We can join df1 to df2 with rounded seconds in DateTime15:
library(dplyr); library(lubridate)
df1 |>
left_join(df2 |> mutate(across(DateTime15, ~round_date(., "second"))),
join_by(DateTime15))\
Selecting just cols 1:5 and Hourly_Rainfall_mm:
Tag Date Time Temp Baro (mb) Hourly_Rainfall_mm
1 1 2024-07-17 13:00:20 17.9 1016 NA
2 2 2024-07-17 13:16:04 17.9 1016 NA
3 3 2024-07-17 13:31:05 17.9 1016 NA
4 4 2024-07-17 13:46:06 17.8 1016 NA
5 5 2024-07-17 14:01:08 17.8 1016 0
6 6 2024-07-17 14:16:08 17.8 1016 NA
7 7 2024-07-17 14:31:09 17.8 1016 NA
8 8 2024-07-17 14:46:10 17.7 1016 NA
9 9 2024-07-17 15:01:11 17.7 1016 0
10 10 2024-07-17 15:16:12 17.7 1016 NA