I have a large "big.matrix" and I need to remove a few columns from it. It was created from a CSV file (with 72 million rows) using
BigMat <- read.big.matrix("matrix.csv", type="double", header=TRUE,
backingfile="matrix.bin",
descriptorfile="matrix.desc")
This successfully loads the matrix into R, but I do not have enough memory to create a new object when trying to subset it:
BigMatSub <- BigMat[, 5:71]
It gave me: Error: cannot allocate vector of size 37.6 Gb.
Is there any way of removing the columns without hitting the memory limit? I need the result to still be a "big.matrix" object in the end, to use with biglasso().
The matrix is sparse with many zero values.
Any help is much appreciated.
So you are using the bigmemory package. No wonder you could hold the full matrix "in memory" in the first place.
I haven't used bigmemory before, but intuitively, if the subset we want to extract is still too large, we want a "big.matrix" after subsetting rather than a coercion to a regular dense matrix. The error message you got implies that the usual "[" does not respect the "big.matrix" class and attempts to return a dense matrix of 37.6 GB. Wow! That implies your "big.matrix" has roughly 75,322,188 rows!
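A quick back-of-the-envelope check of that figure (R reports allocation sizes in GiB, a double takes 8 bytes, and 5:71 covers 67 columns):

```r
bytes_needed <- 37.6 * 2^30     # 37.6 GiB in bytes
n_doubles    <- bytes_needed / 8
n_rows       <- n_doubles / 67  # 67 columns kept by 5:71
round(n_rows)
#> [1] 75322188
```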
Searching "subset" in the package's PDF manual, I find that you could try:
BigMatSubset <- deepcopy(BigMat, cols = 5:71)
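Note that deepcopy() also accepts backingfile / descriptorfile arguments, so the copy can be written to disk instead of RAM. The file names below are only placeholders I made up, mirroring your original call:

```r
library(bigmemory)

# Hypothetical file names -- adjust as needed. Pointing deepcopy()
# at new backing files keeps the subset file-backed, so the 37.6 GB
# never has to fit in RAM at once.
BigMatSubset <- deepcopy(BigMat, cols = 5:71,
                         backingfile    = "matrix_sub.bin",
                         descriptorfile = "matrix_sub.desc")
is.big.matrix(BigMatSubset)  # should still be TRUE
```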
Interestingly, the manual also documents "[". But it does not explicitly state whether we lose the "big.matrix" class and get a regular matrix instead. For verification, you could extract a very small subset:
what <- BigMat[1:10, 1:4]
and check whether what is a regular dense matrix.
Update
Searching "[r] deepcopy" gives only 7 posts (excluding this answer) so far. The most relevant one is:
I also discovered the function sub.big.matrix when reading those posts. Searching "[r] sub.big.matrix" gives only 2 posts so far (excluding this answer), both answered by Charles Determan, an author of bigmemory:
I am now convinced that sub.big.matrix is the better way to go: it returns a view on a contiguous block of the original matrix, backed by the same file, so no data is copied at all, whereas deepcopy writes out a full new copy.
All these posts are tagged with r-bigmemory. So I will edit your question to include this tag, too.