Tags: r, matrix, set-intersection

Match rows of a matrix to the rows of another, regardless of the column order?


I have two two-column matrices, and I need to find out whether each row (a pair of values) of one matrix appears in the other, regardless of the order of the two values (A:B = B:A).

As an example, here are two matrices:

X <- matrix(c(23, 33, 4, 21, 5, 27, 47, 39, 37, 8, 30, 42, 59, 63, 53, 50, 49, 65, 53, 59), nrow = 10, ncol = 2, byrow = FALSE)
Y <- matrix(c(30, 21, 53, 23, 63, 37), nrow = 3, ncol = 2, byrow = FALSE)
> X
      [,1] [,2]
 [1,]   23   30
 [2,]   33   42
 [3,]    4   59
 [4,]   21   63
 [5,]    5   53
 [6,]   27   50
 [7,]   47   49
 [8,]   39   65
 [9,]   37   53
[10,]    8   59
> Y
     [,1] [,2]
[1,]   30   23
[2,]   21   63
[3,]   53   37

For example, I want to find Y[1,] in X whether it appears there as {30,23} or as {23,30}.
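For a single row, an order-insensitive lookup can be written as a plain vectorized comparison that checks both orientations of the pair. This is a minimal sketch; the names `target` and `hit` are hypothetical and only used for illustration.

```r
X <- matrix(c(23, 33, 4, 21, 5, 27, 47, 39, 37, 8,
              30, 42, 59, 63, 53, 50, 49, 65, 53, 59), ncol = 2)
Y <- matrix(c(30, 21, 53, 23, 63, 37), ncol = 2)

target <- Y[1, ]  # the pair {30, 23}

# A row of X matches if it equals the pair as-is OR with the two values swapped
hit <- (X[, 1] == target[1] & X[, 2] == target[2]) |
       (X[, 1] == target[2] & X[, 2] == target[1])

which(hit)  # row numbers of X holding {30, 23} in either order
```

This only handles one row of Y at a time, which is why the answers below normalize every row first.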

I have tried merge, intersect, and setdiff, but none of them finds all the matches, because each compares rows column by column, so {30,23} never matches {23,30}.

> merge(data.frame(X),data.frame(Y))
  X1 X2
1 21 63
> merge(data.frame(Y),data.frame(X))
  X1 X2
1 21 63
> intersect(data.frame(X),data.frame(Y))
  X1 X2
1 21 63
> intersect(data.frame(Y),data.frame(X))
  X1 X2
1 21 63
> setdiff(data.frame(Y),data.frame(X))
  X1 X2
1 30 23
2 53 37
> setdiff(data.frame(X),data.frame(Y))
  X1 X2
1 23 30
2 33 42
3  4 59
4  5 53
5 27 50
6 47 49
7 39 65
8 37 53
9  8 59 
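One way to make `merge` itself order-insensitive, sketched here as an alternative rather than as the accepted answer, is to normalize each pair before joining: put the smaller value in one column and the larger in the other with `pmin`/`pmax`. The column names `lo`, `hi` and `row` and the objects `Xn`, `Yn`, `m` are hypothetical.

```r
X <- matrix(c(23, 33, 4, 21, 5, 27, 47, 39, 37, 8,
              30, 42, 59, 63, 53, 50, 49, 65, 53, 59), ncol = 2)
Y <- matrix(c(30, 21, 53, 23, 63, 37), ncol = 2)

# Smaller value goes to lo, larger to hi, so {30, 23} and {23, 30}
# both become lo = 23, hi = 30
Xn <- data.frame(lo  = pmin(X[, 1], X[, 2]),
                 hi  = pmax(X[, 1], X[, 2]),
                 row = seq_len(nrow(X)))  # keep X's original row numbers
Yn <- data.frame(lo = pmin(Y[, 1], Y[, 2]),
                 hi = pmax(Y[, 1], Y[, 2]))

# merge joins on the shared columns lo and hi; row is carried through from Xn
m <- merge(Xn, Yn)
sort(m$row)  # rows of X that match some row of Y
```

`merge` sorts its result on the key columns by default, so the final `sort` just returns the row numbers in ascending order.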

The ultimate goal is to identify the rows in X that contain a match (in either column order). In pseudo-code:

for each row Y[i,]
    return the row number(s) of X holding the same pair, in either order
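The pseudo-code translates almost literally into R. This sketch compares each row of X with `Y[i, ]` after sorting both, so the column order does not matter; `matches` is a hypothetical name. It scans all of X once per row of Y, which is fine for small matrices but slower than the vectorized approach in the solution.

```r
X <- matrix(c(23, 33, 4, 21, 5, 27, 47, 39, 37, 8,
              30, 42, 59, 63, 53, 50, 49, 65, 53, 59), ncol = 2)
Y <- matrix(c(30, 21, 53, 23, 63, 37), ncol = 2)

# For each row of Y, return every row number of X whose two values
# equal Y[i, ] once both pairs are sorted
matches <- lapply(seq_len(nrow(Y)), function(i) {
  which(apply(X, 1, function(r) all(sort(r) == sort(Y[i, ]))))
})

unlist(matches)  # one entry per match found
```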


Solution

  • We can sort the values within each row of both matrices

    x1 <- t(apply(X, 1, sort))
    y1 <- t(apply(Y, 1, sort))
    

    and then match the pasted rows of y1 against the pasted rows of x1; match returns, for each row of Y, the index of the first matching row of X

    match(do.call(paste, as.data.frame(y1)), do.call(paste, as.data.frame(x1)))
    #[1] 1 4 9
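Since the stated goal is the rows of X that contain a match, the same sorted-and-pasted keys can also be used with `%in%`, which flags every matching row of X rather than only the first one per row of Y. The helper `row_key` and the objects `kx`, `ky`, `rows_in_X` are hypothetical names used for this sketch.

```r
X <- matrix(c(23, 33, 4, 21, 5, 27, 47, 39, 37, 8,
              30, 42, 59, 63, 53, 50, 49, 65, 53, 59), ncol = 2)
Y <- matrix(c(30, 21, 53, 23, 63, 37), ncol = 2)

# Order-insensitive key per row: sort the pair, then paste it into one string
row_key <- function(m) do.call(paste, as.data.frame(t(apply(m, 1, sort))))

kx <- row_key(X)
ky <- row_key(Y)

# Every row of X whose pair occurs somewhere in Y
rows_in_X <- which(kx %in% ky)
rows_in_X

# For each row of Y, the first matching row of X (NA if there is none)
match(ky, kx)
```

Both give 1 4 9 here. The difference shows up with duplicates: if X held the same pair in two rows, `which(kx %in% ky)` would return both, while `match` would return only the first.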