r dplyr data.table symmetric

How to create a symmetric matrix in R counting how often two rows both have the value 1?


Suppose I have a dataframe like this:

ID sp1 sp2 sp3
1  NA   1   1
2  0    0   1
3  1    NA  0
4  1    1   1

Here is what I want to get:

ID 1 2 3 4
1  2 1 0 2
2  1 1 0 1
3  0 0 1 1
4  2 1 1 3

which shows, for each pair of IDs (rows), the number of columns in which both have the value 1.

As the original dataframe is quite large, I hope to find an efficient way to address this.

Thank you very much for any efforts.


Solution

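  • To build a co-occurrence matrix from your data, first replace the NAs with 0s, then take the cross-product of the transposed data without the ID column. This multiplies the data by its own transpose, so entry [i, j] counts the columns in which rows i and j are both 1: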

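    # Example data from the question
    x <- data.frame(ID = 1:4, sp1 = c(NA, 0, 1, 1), sp2 = c(1, 0, NA, 1), sp3 = c(1, 1, 0, 1))
    # Treat missing values as 0 so they never count as a match
    x[is.na(x)] <- 0
    # Drop the ID column, transpose, and take the cross-product (rows times rows)
    crossprod(t(x[-1]))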
    
         [,1] [,2] [,3] [,4]
    [1,]    2    1    0    2
    [2,]    1    1    0    1
    [3,]    0    0    1    1
    [4,]    2    1    1    3
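
Since the question mentions a large dataframe and shows the result labelled by ID, here is a minimal sketch of the same idea done on a numeric matrix. It assumes the data are 0/1/NA as in the example; the names `m` and `co` are only illustrative. `tcrossprod(m)` computes `m %*% t(m)` directly, so no explicit transpose is needed.

```r
# Example data from the question
x <- data.frame(ID  = 1:4,
                sp1 = c(NA, 0, 1, 1),
                sp2 = c(1, 0, NA, 1),
                sp3 = c(1, 1, 0, 1))

# Convert the species columns to a numeric matrix, leaving the ID column out
m <- as.matrix(x[-1])

# Treat missing values as 0 so they never count as a match
m[is.na(m)] <- 0

# Entry [i, j] counts the columns in which rows i and j are both 1
co <- tcrossprod(m)

# Label rows and columns with the IDs (assumes the IDs are unique)
dimnames(co) <- list(as.character(x$ID), as.character(x$ID))
co
```

This gives the same counts as above, with rows and columns named after the IDs. The result has one row and one column per ID, so memory use grows with the square of the number of rows in the dataframe.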