r sparse-matrix

Efficiently obtaining nonzero element coordinates of a sparse matrix


I want to get the row-column coordinates for all nonzero elements in a matrix M. If M isn't too big, it's straightforward:

m <- matrix(sample(0:1, 25, TRUE, prob=c(0.75, 0.25)), 5, 5)

#     [,1] [,2] [,3] [,4] [,5]
#[1,]    0    0    0    0    0
#[2,]    1    1    0    0    0
#[3,]    0    0    0    1    0
#[4,]    0    0    1    0    0
#[5,]    0    0    0    0    0

nz <- which(m != 0)
cbind(row(m)[nz], col(m)[nz])

#     [,1] [,2]
#[1,]    2    1
#[2,]    2    2
#[3,]    4    3
#[4,]    3    4

However, in my case M is a sparse matrix (created using the Matrix package), whose dimensions can be very large. If I call row(M) and col(M) like above, I'll be generating a couple of dense matrices the same size as M, which I definitely don't want to do.

Is there a way of getting a result like the above without creating dense matrices along the way?


Solution

  • I think you want

    which(m != 0, arr.ind = TRUE)
    

    Looking at showMethods("which"), it seems that this is set up to work efficiently with sparse matrices. You can also get the answer more directly (but inscrutably) for a sparse, column-oriented matrix by manipulating its internal slots: @p (column pointers) and @i (zero-based row indices). This only works if the matrix is not stored internally as a symmetric matrix; see the comment below:

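    mm <- Matrix(m, sparse = TRUE)           # force sparse, column-compressed storage
    dp <- diff(mm@p)                         # number of nonzeros in each column
    cbind(mm@i + 1, rep(seq_along(dp), dp))  # @i is zero-based, so add 1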
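
To make the slot manipulation less inscrutable, here is a small worked sketch. It assumes the same nonzero pattern as the 5 x 5 example in the question, and it builds the matrix with sparseMatrix() (my choice, not part of the original answer) so that the storage is plain column-compressed with no symmetry detection.

```r
library(Matrix)

# Same nonzero pattern as the 5 x 5 example in the question.
mm <- sparseMatrix(i = c(2, 2, 4, 3), j = c(1, 2, 3, 4), x = 1, dims = c(5, 5))

mm@p                # cumulative nonzero counts per column: 0 1 2 3 4 4
mm@i                # zero-based row index of each nonzero: 1 1 3 2
dp <- diff(mm@p)    # nonzeros per column: 1 1 1 1 0

# Repeat each column number once per nonzero it holds, then pair with rows.
coords <- cbind(row = mm@i + 1L, col = rep(seq_along(dp), dp))
```

Here coords holds the pairs (2,1), (2,2), (4,3), (3,4), the same coordinates as the dense result in the question.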
    

    @tflutre comments:

    If m is symmetric, only which(m != 0, arr.ind=TRUE) is guaranteed to return the full list of nonzero coordinates. The code using mm@p may not! Indeed, mm <- as(m, "sparseMatrix") can "silently" detect that m is symmetric and, if so, store only the upper (or lower) triangular values: look at the slot mm@uplo.
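
If you still want to read the slots when the matrix might be stored as symmetric, one possible workaround is to expand it to general storage first. This is only a sketch: it assumes a reasonably recent version of Matrix in which as(x, "generalMatrix") is available, and the 3 x 3 matrix s is a made-up example. which(m != 0, arr.ind=TRUE) remains the simpler option.

```r
library(Matrix)

# A symmetric 3 x 3 pattern; Matrix() may store just one triangle of it.
s <- matrix(c(0, 1, 0,
              1, 0, 1,
              0, 1, 0), 3, 3)
ms <- Matrix(s, sparse = TRUE)

# Expand to general (non-symmetric) storage, then to triplet form,
# whose @i and @j slots hold zero-based row and column indices.
tt <- as(as(ms, "generalMatrix"), "TsparseMatrix")
coords <- cbind(row = tt@i + 1L, col = tt@j + 1L)
```

The triplet form trades a little memory (one column index per nonzero) for coordinates that can be read off directly, without decoding @p.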