rjoincartesian

Join leads to cartesian


I have two DT's. I want to join DT1 with DT2 based on a column and take a column from DT2.

DT1:
         id  place
     1: id1    A
     2: id2    B
     3: id3    B
     4: id4    C
     5: id5    C

DT2:

      place rank
   1:  A      3
   2:  B      2
   3:  C      1
   4:  D      3
   5:  E      2

Expected:

         id  place  rank
     1: id1    A       3
     2: id2    B       2
     3: id3    B       2
     4: id4    C       1 
     5: id5    C       1 

Right now, I have tried -

dt1[dt2, on=c('place'), nomatch=0]

I thought this will map all the rows based on the value in place column and add the rank column to it. But I get an error saying a cartesian has occured.

Error in vecseq(f__, len__, if (allow.cartesian || notjoin || !anyDuplicated(f__,  :
  Join results in <> rows; more than <> = nrow(x)+nrow(i). Check for duplicate key values in i each of which join to the same group in x over and over again. If that's ok, try by=.EACHI to run j for each group to avoid the large al
location. If you are sure you wish to proceed, rerun with allow.cartesian=TRUE. Otherwise, please search for this error message in the FAQ, Wiki, Stack Overflow and data.table issue tracker for advice.

My first question is I dont understand how is a cartesian getting formed here ? Is it because my first DT has multiple rows for the same date ?

Second - How do I achieve this correctly ?

I have also tried a right outer join instead of an inner join. I get the same error.


Solution

  • We can do

    library(data.table)
    dt1[dt2, on = .(place), rank := rank, mult = 'first']