I was wondering whether someone knows if the dplyr
extension packages (dbplyr
and dtplyr
) allow non-equi joins within the usual dplyr workflow? I rarely need data.table
, but fast non-equi joins are the only moments where I always need to setDT
, then perform the join and then switch back with as_tibble()
. I scrolled through the issues in the packages on github but didn't find whether this is something that's planned or already implemented.
Non-equi joins are available since 1.1.0
via the function join_by
. To create non-equi joins, you can use <
, >
, >=
, <=
, or helpers between
, within
, overlaps
and nearest
.
library(dplyr)
#Example from https://github.com/tidyverse/dplyr/pull/5910.
set.seed(123)
dates <- as.Date("2019-01-01") + 0:4
needles <- tibble(dates = dates, x = sample(length(dates)))
set.seed(123)
lower <- as.Date("2019-01-01") + sample(6, 5, replace = TRUE)
upper <- lower + sample(2, 5, replace = TRUE)
haystack <- tibble(lower = lower, upper = upper, y = sample(length(lower)))
needles
#> # A tibble: 5 x 2
#> dates x
#> <date> <int>
#> 1 2019-01-01 3
#> 2 2019-01-02 2
#> 3 2019-01-03 5
#> 4 2019-01-04 4
#> 5 2019-01-05 1
haystack
#> # A tibble: 5 x 3
#> lower upper y
#> <date> <date> <int>
#> 1 2019-01-04 2019-01-06 1
#> 2 2019-01-07 2019-01-08 2
#> 3 2019-01-04 2019-01-05 3
#> 4 2019-01-03 2019-01-05 4
#> 5 2019-01-03 2019-01-05 5
# Non-equi join
# For each row in `needles`, find locations in `haystack` matching the condition
left_join(needles, haystack, by = join_by(dates >= lower, dates <= upper))
#> # A tibble: 12 x 5
#> dates x lower upper y
#> <date> <int> <date> <date> <int>
#> 1 2019-01-01 3 NA NA NA
#> 2 2019-01-02 2 NA NA NA
#> 3 2019-01-03 5 2019-01-03 2019-01-05 4
#> 4 2019-01-03 5 2019-01-03 2019-01-05 5
#> 5 2019-01-04 4 2019-01-04 2019-01-06 1
#> 6 2019-01-04 4 2019-01-04 2019-01-05 3
#> 7 2019-01-04 4 2019-01-03 2019-01-05 4
#> 8 2019-01-04 4 2019-01-03 2019-01-05 5
#> 9 2019-01-05 1 2019-01-04 2019-01-06 1
#> 10 2019-01-05 1 2019-01-04 2019-01-05 3
#> 11 2019-01-05 1 2019-01-03 2019-01-05 4
#> 12 2019-01-05 1 2019-01-03 2019-01-05 5