I have two datasets, one I'll call the master and the other the supplemental. They both have observations at 10 minute intervals, but there are gaps in time in the master that I want to fill with the supplemental. I would normally just use a left_join, but the supplemental also has overlapping timestamps from the master. I only want to transfer over the unique timestamps from the supplemental to the master. Example code below:
TIMESTAMP <- c("2019-04-27 17:30:00", "2019-04-27 17:40:00", "2019-04-27 17:50:00", "2019-04-27 18:10:00", "2019-04-27 18:30:00", "2019-04-27 18:40:00")
dates<-c("2019-04-27", "2019-04-27", "2019-04-27", "2019-04-27", "2019-04-27", "2019-04-27")
Tower <- c("TW2", "TW2", "TW2", "TW2", "TW2", "TW2")
Camera <- c("C5N","C5N","C5N","C5N","C5N","C5N")
Cow <-c(0, 1, 0, 0, 0, 5)
Master<- data.frame(TIMESTAMP, dates, Tower, Camera, Cow)
TIMESTAMP <- c("2019-04-27 17:30:00", "2019-04-27 17:40:00", "2019-04-27 18:00:00", "2019-04-27 18:10:00", "2019-04-27 18:20:00","2019-04-27 18:30:00")
dates<-c("2019-04-27", "2019-04-27", "2019-04-27", "2019-04-27", "2019-04-27", "2019-04-27")
Tower <- c("TW2", "TW2", "TW2", "TW2", "TW2", "TW2")
Camera <- c("C4N","C4N","C4N","C4N","C4N","C4N")
Cow <-c(0, 1, 4, 0, 2, 0)
Supplemental<- data.frame(TIMESTAMP, dates, Tower, Camera, Cow)
I tried using anti_join in the dplyr package, but the resulting dataframe had only 2 observations, so something must be wrong.
require(dplyr)
Complete<-Master%>%
anti_join(Supplemental, by=join_by(TIMESTAMP==TIMESTAMP))
Any help is appreciated!
anti_join()
is a filtering join -- "get me rows from x that do not have a match in y", it never returns more rows than in the first frame. You can switch anti_join()
argument order to get rows from Supplemental
that do not have a matching TIMESTAMP
in Master
and append that subset to a Master
:
bind_rows(
Master,
anti_join(Supplemental, Master, by = join_by(TIMESTAMP))
) |>
arrange(TIMESTAMP)
#> TIMESTAMP dates Tower Camera Cow
#> 1 2019-04-27 17:30:00 2019-04-27 TW2 C5N 0
#> 2 2019-04-27 17:40:00 2019-04-27 TW2 C5N 1
#> 3 2019-04-27 17:50:00 2019-04-27 TW2 C5N 0
#> 4 2019-04-27 18:00:00 2019-04-27 TW2 C4N 4
#> 5 2019-04-27 18:10:00 2019-04-27 TW2 C5N 0
#> 6 2019-04-27 18:20:00 2019-04-27 TW2 C4N 2
#> 7 2019-04-27 18:30:00 2019-04-27 TW2 C5N 0
#> 8 2019-04-27 18:40:00 2019-04-27 TW2 C5N 5
Or just use base R subsetting:
`%nin%` <- Negate(`%in%`)
Complete <- rbind(Master, Supplemental[Supplemental$TIMESTAMP %nin% Master$TIMESTAMP,])
Complete
#> TIMESTAMP dates Tower Camera Cow
#> 1 2019-04-27 17:30:00 2019-04-27 TW2 C5N 0
#> 2 2019-04-27 17:40:00 2019-04-27 TW2 C5N 1
#> 3 2019-04-27 17:50:00 2019-04-27 TW2 C5N 0
#> 4 2019-04-27 18:10:00 2019-04-27 TW2 C5N 0
#> 5 2019-04-27 18:30:00 2019-04-27 TW2 C5N 0
#> 6 2019-04-27 18:40:00 2019-04-27 TW2 C5N 5
#> 31 2019-04-27 18:00:00 2019-04-27 TW2 C4N 4
#> 51 2019-04-27 18:20:00 2019-04-27 TW2 C4N 2
Complete[order(Complete$TIMESTAMP),]
#> TIMESTAMP dates Tower Camera Cow
#> 1 2019-04-27 17:30:00 2019-04-27 TW2 C5N 0
#> 2 2019-04-27 17:40:00 2019-04-27 TW2 C5N 1
#> 3 2019-04-27 17:50:00 2019-04-27 TW2 C5N 0
#> 31 2019-04-27 18:00:00 2019-04-27 TW2 C4N 4
#> 4 2019-04-27 18:10:00 2019-04-27 TW2 C5N 0
#> 51 2019-04-27 18:20:00 2019-04-27 TW2 C4N 2
#> 5 2019-04-27 18:30:00 2019-04-27 TW2 C5N 0
#> 6 2019-04-27 18:40:00 2019-04-27 TW2 C5N 5
Created on 2025-04-04 with reprex v2.1.1