rperformancercppprocessing-efficiencymemory-efficient

R: Efficient Way to partly modify diagonal of matrix


I have a square matrix with dimension ranging from 100x100 to 10,000x10,000. The matrix represents parameter values for a function. I go through a loop where I try various combinations of parameters. Every iteration W has a different value so I have to update the locations in the matrix which have the value W. This happens to be the even entries of the diagonal, so [2,2], [4,4], etc. I am curious if there is a more efficient way to do this than my current approach:

W<-treeDepth*newVar
for (iW in evenDiag) {
  matrixSource[iW,iW]<-W
}

So far I've only tested with 134x134 size matrices, but looping appears to be the fastest approach according to analysis with profvis.

When I try

diag(matrixSource)[evenDiag]<-W

It appears to take similar amounts of time, but it starts calling < GC> after an average of every 5 calls or so, but the actual times < GC> is called appear to be random.

I think < GC> is garbage collection, but whatever it is, it takes forever, and it's rarely if ever called when I have the loop version above.

Am I wrong to think there's a better approach than looping one by one. Would writing the loop in Rcpp make it faster? Hadley in "Advanced R" doesn't say that writing values into a matrix would be faster with Rcpp so it probably doesn't. If it did, how would I change that one little line(s) to Rcpp without making a C++ function or anything complicated (I don't know any C++).

According to my research it isn't possible to just write something like

matrixSource[evenDiag,evenDiag]<-W

but that if it were, R would excel at vectorized processing.

What is the best approach for this?

If it's helpful, the context is that the matrix needs to be fed into

negLogLik<- -mvtnorm::dmvnorm(flattenedData,sigma=matrixSource,log=T,checkSymmetry = FALSE)

and internally to that function it's fed into chol() (which sometimes calls < GC> in parallel)

So if there's a way I can modify that function to work with only partial matrices that aren't fully created so that maybe I only have to assign the value of W once, that'd also be good.

I also need a way to assign the diagonal directly above and below the main diagonal all the same value, Z. Is there a good way to do that?

Thank you <3


Solution

  • I have not found memory problems with square matrices up to dimension 1000.
    @MichaelChirico's suggestion seems to be the most efficient. It runs in time constant with the matrix dimension, unlike diag<-.
    The two methods used below are

    # create an index
    i <- rep(c(FALSE, TRUE), ncol(A) %/% 2)
    j <- which(i)
    
    # can also be diag(A)[i]
    diag(A)[j] <- W
    
    # cannot be cbind(i, i)
    A[cbind(j, j)] <- 0
    

    cbind is clearly more efficient.

    library(microbenchmark)
    library(ggplot2)
    
    testFun <- function(N) {
      out <- lapply(seq.int(N)[-1L], \(n) {
        A <- matrix(1:n^2, n, n)
        W <- rep(0L, ncol(A) %/% 2L)
        i <- rep(c(FALSE, TRUE), ncol(A) %/% 2)
        j <- which(i)
        mb <- microbenchmark(
          diag = {diag(A)[i] <- W},
          cbind = {A[cbind(j, j)] <- W}
        )
        mb$dim <- n
        mb
      })
      out <- do.call(rbind, out)
      aggregate(time ~ ., out, median)
    }
    
    testData <- testFun(1000)
    ggplot(testData, aes(dim, time, colour = expr)) +
      geom_line() +
      geom_point() +
      theme_bw()
    

    Created on 2024-01-14 with reprex v2.0.2


    Edit

    Another way of assigning the diagonal elements of a square matrix is proposed by user20650 in comment. Change the timimg instruction call above to

    mb <- microbenchmark(
      diag = {diag(A)[i] <- W},
      cbind = {A[cbind(j, j)] <- W},
      sq = {A[j*j] <- W}
    )
    

    and the plot (without diag) becomes

    enter image description here