rmergetemporal-database

Is it possible to merge two .csv files to the nearest hour?


I have two .csv files called 'tag.csv' and 'wind.csv' that I would like to merge based on the 'Date' column. However, as you can see from the data below the times do not match exactly.

tag.csv

Date
13/12/2014 05:11
13/12/2014 05:43
13/12/2014 06:34

wind.csv

Date
13/12/2014 05:00
13/12/2014 06:00
13/12/2014 07:00

I am using the following simple script.

tag<- read.csv("tag.csv")
wind<- read.csv("wind.csv")
myfulldata = merge(tag, wind)

Is there a command I could add so that the data will merge with the closest time? For example, 13/12/2014 05:11 will merge with 13/12/2014 05:00


Solution

  • data.table package has its own merging method and an option called roll that allows closest matches. e.g.

    library(data.table)
    dt1 <- data.table(a = c(1,2,3,4), b=1:4)
    dt2 <- data.table(a = c(1.6,2.1,3.2,4.9), c=1:4)
    setkeyv(dt1, "a")
    setkeyv(dt2, "a")
    dt1[dt2, roll="nearest"]
    

    See Join R data.tables where key values are not exactly equal--combine rows with closest times for another example.

    To be robust though, I would probably instead do some datetime manipulation to force them to both be to the nearest hour, and then do a nearest join if all else fails

    edit: this method won't work on data.frame objects, if you haven't used data.table before maybe stay simple and round the hours