rdata.tablenon-equi-join

non equi join returns non existing column names


I am unable to do a basic non equi join in two data.tables in R without the error: argument specifying columns specify non existing column(s): cols[2]='abs(x.val - i.val)'

A min. example to show the error.

library(data.table)
set.seed(1); dt1 <- data.table(id= sample(letters[1:3],size = 10,replace = T),val = rnorm(10,mean = 5,sd = 2))
set.seed(2); dt2 <- data.table(id= sample(letters[1:3],size = 10,replace = T),val = rnorm(10,mean = 5,sd = 2))
dt1[dt2,on = .(id,abs(x.val - i.val) <1),nomatch=0]
#> Error in colnamesInt(x, names(on), check_dups = FALSE): argument specifying columns specify non existing column(s): cols[2]='abs(x.val - i.val)'

Created on 2022-05-17 by the reprex package (v2.0.1)

It must be something fundamental I am missing.

Note: I have deliberately kept column names identical in both data tables to test the x. and i. functionality of data.table.


Solution

  • In two steps as on only accepts =,< or > :

    dt1[dt2,on = .(id),nomatch=0,allow.cartesian=T][abs(val - i.val)<1]
    
       id      val    i.val
    1:  a 3.142866 3.175512
    2:  b 4.421077 3.494669
    3:  b 4.401570 3.494669
    4:  a 4.988466 4.428577
    5:  a 3.142866 2.931430
    6:  a 4.988466 4.943710
    7:  a 4.988466 4.267357
    8:  a 3.142866 2.769503
    

    Or to avoid cartesian product:

    dt2[,`:=`(val_min=val-1,val_max=val+1)]
    dt1[dt2,.(id,x.val,i.val),on = .(id,val>val_min,val<val_max),nomatch=0]
    
       id    x.val    i.val
    1:  a 3.142866 3.175512
    2:  b 4.421077 3.494669
    3:  b 4.401570 3.494669
    4:  a 4.988466 4.428577
    5:  a 3.142866 2.931430
    6:  a 4.988466 4.943710
    7:  a 4.988466 4.267357
    8:  a 3.142866 2.769503