I have a square matrix with dimension ranging from 100x100 to 10,000x10,000. The matrix represents parameter values for a function. I go through a loop where I try various combinations of parameters. Every iteration W
has a different value so I have to update the locations in the matrix which have the value W
. This happens to be the even entries of the diagonal, so [2,2]
, [4,4]
, etc. I am curious if there is a more efficient way to do this than my current approach:
W<-treeDepth*newVar
for (iW in evenDiag) {
matrixSource[iW,iW]<-W
}
So far I've only tested with 134x134 size matrices, but looping appears to be the fastest approach according to analysis with profvis
.
When I try
diag(matrixSource)[evenDiag]<-W
It appears to take similar amounts of time, but it starts calling < GC> after an average of every 5 calls or so, but the actual times < GC> is called appear to be random.
I think < GC> is garbage collection, but whatever it is, it takes forever, and it's rarely if ever called when I have the loop version above.
Am I wrong to think there's a better approach than looping one by one. Would writing the loop in Rcpp
make it faster? Hadley in "Advanced R" doesn't say that writing values into a matrix would be faster with Rcpp so it probably doesn't. If it did, how would I change that one little line(s) to Rcpp without making a C++ function or anything complicated (I don't know any C++).
According to my research it isn't possible to just write something like
matrixSource[evenDiag,evenDiag]<-W
but that if it were, R would excel at vectorized processing.
What is the best approach for this?
If it's helpful, the context is that the matrix needs to be fed into
negLogLik<- -mvtnorm::dmvnorm(flattenedData,sigma=matrixSource,log=T,checkSymmetry = FALSE)
and internally to that function it's fed into chol() (which sometimes calls < GC> in parallel)
So if there's a way I can modify that function to work with only partial matrices that aren't fully created so that maybe I only have to assign the value of W once, that'd also be good.
I also need a way to assign the diagonal directly above and below the main diagonal all the same value, Z. Is there a good way to do that?
Thank you <3
I have not found memory problems with square matrices up to dimension 1000.
@MichaelChirico's suggestion seems to be the most efficient. It runs in time constant with the matrix dimension, unlike diag<-
.
The two methods used below are
# create an index
i <- rep(c(FALSE, TRUE), ncol(A) %/% 2)
j <- which(i)
# can also be diag(A)[i]
diag(A)[j] <- W
# cannot be cbind(i, i)
A[cbind(j, j)] <- 0
cbind
is clearly more efficient.
library(microbenchmark)
library(ggplot2)
testFun <- function(N) {
out <- lapply(seq.int(N)[-1L], \(n) {
A <- matrix(1:n^2, n, n)
W <- rep(0L, ncol(A) %/% 2L)
i <- rep(c(FALSE, TRUE), ncol(A) %/% 2)
j <- which(i)
mb <- microbenchmark(
diag = {diag(A)[i] <- W},
cbind = {A[cbind(j, j)] <- W}
)
mb$dim <- n
mb
})
out <- do.call(rbind, out)
aggregate(time ~ ., out, median)
}
testData <- testFun(1000)
ggplot(testData, aes(dim, time, colour = expr)) +
geom_line() +
geom_point() +
theme_bw()
Created on 2024-01-14 with reprex v2.0.2
Another way of assigning the diagonal elements of a square matrix is proposed by user20650 in comment. Change the timimg instruction call above to
mb <- microbenchmark(
diag = {diag(A)[i] <- W},
cbind = {A[cbind(j, j)] <- W},
sq = {A[j*j] <- W}
)
and the plot (without diag
) becomes