rdataframejoin

How can I append my dataset based on timestamp?


I have two datasets, one I'll call the master and the other the supplemental. They both have observations at 10 minute intervals, but there are gaps in time in the master that I want to fill with the supplemental. I would normally just use a left_join, but the supplemental also has overlapping timestamps from the master. I only want to transfer over the unique timestamps from the supplemental to the master. Example code below:

TIMESTAMP <- c("2019-04-27 17:30:00", "2019-04-27 17:40:00", "2019-04-27 17:50:00", "2019-04-27 18:10:00", "2019-04-27 18:30:00", "2019-04-27 18:40:00")
dates<-c("2019-04-27", "2019-04-27", "2019-04-27", "2019-04-27", "2019-04-27", "2019-04-27")
Tower <- c("TW2", "TW2", "TW2", "TW2", "TW2", "TW2")
Camera <- c("C5N","C5N","C5N","C5N","C5N","C5N")
Cow <-c(0, 1, 0, 0, 0, 5)

Master<- data.frame(TIMESTAMP, dates, Tower, Camera, Cow)

TIMESTAMP <- c("2019-04-27 17:30:00", "2019-04-27 17:40:00", "2019-04-27 18:00:00", "2019-04-27 18:10:00", "2019-04-27 18:20:00","2019-04-27 18:30:00")
dates<-c("2019-04-27", "2019-04-27", "2019-04-27", "2019-04-27", "2019-04-27", "2019-04-27")
Tower <- c("TW2", "TW2", "TW2", "TW2", "TW2", "TW2")
Camera <- c("C4N","C4N","C4N","C4N","C4N","C4N")
Cow <-c(0, 1, 4, 0, 2, 0)

Supplemental<- data.frame(TIMESTAMP, dates, Tower, Camera, Cow)

I tried using anti_join in the dplyr package, but the resulting dataframe had only 2 observations, so something must be wrong.

require(dplyr)
Complete<-Master%>%
  anti_join(Supplemental, by=join_by(TIMESTAMP==TIMESTAMP))

Any help is appreciated!


Solution

  • anti_join() is a filtering join -- "get me rows from x that do not have a match in y", it never returns more rows than in the first frame. You can switch anti_join() argument order to get rows from Supplemental that do not have a matching TIMESTAMP in Master and append that subset to a Master:

    bind_rows(
      Master,
      anti_join(Supplemental, Master, by = join_by(TIMESTAMP))
    ) |> 
    arrange(TIMESTAMP)
    #>             TIMESTAMP      dates Tower Camera Cow
    #> 1 2019-04-27 17:30:00 2019-04-27   TW2    C5N   0
    #> 2 2019-04-27 17:40:00 2019-04-27   TW2    C5N   1
    #> 3 2019-04-27 17:50:00 2019-04-27   TW2    C5N   0
    #> 4 2019-04-27 18:00:00 2019-04-27   TW2    C4N   4
    #> 5 2019-04-27 18:10:00 2019-04-27   TW2    C5N   0
    #> 6 2019-04-27 18:20:00 2019-04-27   TW2    C4N   2
    #> 7 2019-04-27 18:30:00 2019-04-27   TW2    C5N   0
    #> 8 2019-04-27 18:40:00 2019-04-27   TW2    C5N   5
    

    Or just use base R subsetting:

    
    `%nin%` <- Negate(`%in%`)
    Complete <- rbind(Master, Supplemental[Supplemental$TIMESTAMP %nin% Master$TIMESTAMP,])
    Complete
    #>              TIMESTAMP      dates Tower Camera Cow
    #> 1  2019-04-27 17:30:00 2019-04-27   TW2    C5N   0
    #> 2  2019-04-27 17:40:00 2019-04-27   TW2    C5N   1
    #> 3  2019-04-27 17:50:00 2019-04-27   TW2    C5N   0
    #> 4  2019-04-27 18:10:00 2019-04-27   TW2    C5N   0
    #> 5  2019-04-27 18:30:00 2019-04-27   TW2    C5N   0
    #> 6  2019-04-27 18:40:00 2019-04-27   TW2    C5N   5
    #> 31 2019-04-27 18:00:00 2019-04-27   TW2    C4N   4
    #> 51 2019-04-27 18:20:00 2019-04-27   TW2    C4N   2
    
    Complete[order(Complete$TIMESTAMP),]
    #>              TIMESTAMP      dates Tower Camera Cow
    #> 1  2019-04-27 17:30:00 2019-04-27   TW2    C5N   0
    #> 2  2019-04-27 17:40:00 2019-04-27   TW2    C5N   1
    #> 3  2019-04-27 17:50:00 2019-04-27   TW2    C5N   0
    #> 31 2019-04-27 18:00:00 2019-04-27   TW2    C4N   4
    #> 4  2019-04-27 18:10:00 2019-04-27   TW2    C5N   0
    #> 51 2019-04-27 18:20:00 2019-04-27   TW2    C4N   2
    #> 5  2019-04-27 18:30:00 2019-04-27   TW2    C5N   0
    #> 6  2019-04-27 18:40:00 2019-04-27   TW2    C5N   5
    

    Created on 2025-04-04 with reprex v2.1.1