I am converting a MATLAB script into R and regretting it so far, as it is slower at the moment. I'm trying to use "vectorized functions" as much as possible, but I'm relatively new to R and do not know what is meant by this. From my research for loops are only slower than the apply() method in R if you use loads of operators (including the parenthesis). Otherwise, I don't see what R could have done to slow down it further. Here is code that works that I want to speed up.
somPEs <- 9;
inputPEs <- 6;
initial_w <- matrix(1, nrow=somPEs, ncol=inputPEs)
w <- apply(initial_w, 1, function(i) runif(i));
# Reshape w to a 3D matrix of dimension: c(sqrt(somPEs), sqrt(somPEs), inputPEs)
nw <- array(0, dim=c(sqrt(somPEs), sqrt(somPEs), inputPEs))
for (i in 1:inputPEs) {
nw[,,i] <- matrix(w[i,], nrow=sqrt(somPEs), ncol=sqrt(somPEs), byrow=TRUE)
}
w <- nw;
In MATLAB, this code is executed by a built-in function called "reshape", as is done as below:
w = reshape(w,[sqrt(somPEs) sqrt(somPEs) inputPEs]);
I timed my current R code and it's actually super fast, but I'd still like to learn about vectorization and how to convert my code to apply() for readability's sake.
user system elapsed
0.003 0.000 0.002
The first step is to convert your array w
from 6x9
to 3x3x6
size, which in your case can be done by transposing and then changing the dimension:
neww <- t(w)
dim(neww) <- c(sqrt(somPEs), sqrt(somPEs), inputPEs)
This is almost what we want, except that the first two dimensions are flipped. You can use the aperm
function to transpose them:
neww <- aperm(neww, c(2, 1, 3))
This should be a good deal quicker than looping through the matrix and individually copying over data by row. To see this, let's look at a larger example with 10,000 rows and 100 columns (which will be mapped to a 10x10x10k matrix):
josilber <- function(w) {
neww <- t(w)
dim(neww) <- c(sqrt(dim(w)[2]), sqrt(dim(w)[2]), dim(w)[1])
aperm(neww, c(2, 1, 3))
}
OP <- function(w) {
nw <- array(0, dim=c(sqrt(dim(w)[2]), sqrt(dim(w)[2]), dim(w)[1]))
for (i in 1:(dim(w)[1])) {
nw[,,i] <- matrix(w[i,], nrow=sqrt(dim(w)[2]), ncol=sqrt(dim(w)[2]), byrow=TRUE)
}
nw
}
bigw <- matrix(runif(1000000), nrow=10000, ncol=100)
all.equal(josilber(bigw), OP(bigw))
# [1] TRUE
microbenchmark(josilber(bigw), OP(bigw))
# Unit: milliseconds
# expr min lq mean median uq max neval
# josilber(bigw) 8.483245 9.08430 14.46876 9.431534 11.76744 135.7204 100
# OP(bigw) 83.379053 97.07395 133.86606 117.223236 129.28317 1553.4381 100
The approach using t
, dim
, and aperm
is more than 10x faster in median runtime than the looping approach.