My regular matrix has an object size of 416 bytes, and when I use as(, "sparseMatrix")
to turn it into a sparse matrix, the size goes up to 1720 bytes.
Is this normal? Shouldn't a sparse matrix need less storage than the regular one?
Many thanks in advance!
matrix is one of the base data structures of R and can be stored with very little metadata: it is a sequence of values plus a length for each dimension and a data type.
A sparseMatrix object, however, carries more metadata, as str() shows in the examples below. Most importantly, the position of each non-zero value is stored alongside the value itself: in the dgCMatrix class used here, that is a row index per value (slot i) plus one pointer per column (slot p). In addition, the values are held as doubles (slot x) even when the original matrix was integer. Together this causes a threefold increase in memory use for a fully populated integer matrix: 12 bytes per value instead of 4. The overhead is only compensated when there are many zero values, as they are not stored at all.
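The arithmetic behind that factor can be sketched in a few lines of base R. This is a rough estimate, assuming the compressed-column layout (dgCMatrix) that str() reports in the examples of this answer: 8 bytes per stored value (slot x, double), 4 bytes per row index (slot i) and 4 bytes per column pointer (slot p). The function names sparse_payload and dense_payload are hypothetical, and fixed per-object overhead is ignored.

```r
# Rough payload estimates in bytes; object headers and slot overhead are ignored.
# Assumes a dgCMatrix: double values (x), integer row indices (i), column pointers (p).
sparse_payload <- function(nnz, ncol) {
  nnz * (8 + 4) + (ncol + 1) * 4
}

# A dense matrix stores every cell: 4 bytes per integer, 8 bytes per double.
dense_payload <- function(nrow, ncol, bytes_per_value) {
  nrow * ncol * bytes_per_value
}

# 1000 x 1000 integer matrix without any zeros
dense_payload(1000, 1000, 4)   # 4000000
sparse_payload(1e6, 1000)      # 12004004
```

Both estimates land close to the 4000216 and 12005504 bytes reported for mat2 and smat2 in this answer; the remainder is object overhead.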
Dense example
Compare for a matrix with no zero values:
> mat1 <- matrix(sample(3*3), nrow = 3, ncol = 3)
> smat1 <- as(mat1, "sparseMatrix")
> showMem(c('mat1', 'smat1'), bytes=T)
        size bytes
mat1   264 B   264
smat1 1.7 kB  1688
> mat1
     [,1] [,2] [,3]
[1,]    2    5    7
[2,]    8    6    1
[3,]    3    4    9
> str(mat1)
int [1:3, 1:3] 2 8 3 5 6 4 7 1 9
> str(smat1)
Formal class 'dgCMatrix' [package "Matrix"] with 6 slots
..@ i : int [1:9] 0 1 2 0 1 2 0 1 2
..@ p : int [1:4] 0 3 6 9
..@ Dim : int [1:2] 3 3
..@ Dimnames:List of 2
.. ..$ : NULL
.. ..$ : NULL
..@ x : num [1:9] 2 8 3 5 6 4 7 1 9
..@ factors : list()
Or a larger version of such a matrix:
> mat2 <- matrix(sample(1000*1000), nrow = 1000, ncol = 1000)
> smat2 <- as(mat2, "sparseMatrix")
> showMem(c('mat2', 'smat2'), bytes=T)
       size    bytes
mat2   4 MB  4000216
smat2 12 MB 12005504
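The showMem() helper used in these transcripts is not defined in this answer. A minimal stand-in, assuming it simply reports object.size() for each named object, might look like the sketch below; the argument names and the returned data frame layout are guesses based on the output shown, not the original implementation.

```r
# Hypothetical stand-in for showMem(); the original helper is not shown in this answer.
showMem <- function(names, bytes = FALSE, envir = parent.frame()) {
  force(envir)  # resolve the caller's environment before entering lapply()
  sizes <- lapply(names, function(n) object.size(get(n, envir = envir)))
  # Human-readable size with SI units, e.g. "264 B" or "1.7 kB"
  out <- data.frame(
    size = vapply(sizes, format, character(1), units = "auto", standard = "SI"),
    row.names = names,
    stringsAsFactors = FALSE
  )
  # Optionally add the raw byte count as a second column
  if (bytes) out$bytes <- vapply(sizes, as.numeric, numeric(1))
  out
}

m <- matrix(1:9, nrow = 3, ncol = 3)
showMem("m", bytes = TRUE)
```

Exact byte counts depend on the R version and platform, so they may differ slightly from the figures in the transcripts.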
Sparse example
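Here we create a sparser matrix, with 6 zeros and only 3 non-zero values. The str() output shows that the sparseMatrix stores only those 3 values, although at this tiny size the fixed overhead of the object still makes it larger than the regular matrix.
> mat3 <- matrix(sample(3*3)%%3%%2, nrow = 3, ncol = 3)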
> smat3 <- as(mat3, "sparseMatrix")
> showMem(c('mat3', 'smat3'), bytes=T)
        size bytes
mat3   344 B   344
smat3 1.6 kB  1560
> mat3
     [,1] [,2] [,3]
[1,]    0    1    0
[2,]    0    0    0
[3,]    1    0    1
> str(mat3)
num [1:3, 1:3] 0 0 1 1 0 0 0 0 1
> str(smat3)
Formal class 'dgCMatrix' [package "Matrix"] with 6 slots
..@ i : int [1:3] 2 0 2
..@ p : int [1:4] 0 1 2 3
..@ Dim : int [1:2] 3 3
..@ Dimnames:List of 2
.. ..$ : NULL
.. ..$ : NULL
..@ x : num [1:3] 1 1 1
..@ factors : list()
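To see how those slots encode positions, the sketch below decodes the i, p and x values copied from the str(smat3) output into (row, column, value) triplets using base R only. It assumes the compressed-column convention of dgCMatrix: i holds 0-based row indices and consecutive differences of p give the number of stored values per column. The variable names are illustrative.

```r
# Slots copied from the str(smat3) output
i <- c(2L, 0L, 2L)      # 0-based row index of each stored value
p <- c(0L, 1L, 2L, 3L)  # column pointers; diff(p) = stored values per column
x <- c(1, 1, 1)         # the stored (non-zero) values

# Expand the per-column counts into one column index per stored value
col <- rep(seq_len(length(p) - 1), times = diff(p))
row <- i + 1L           # convert to R's 1-based indexing

triplets <- data.frame(row = row, col = col, value = x)
triplets

# Rebuild the dense matrix from the triplets
dense <- matrix(0, nrow = 3, ncol = 3)
dense[cbind(row, col)] <- x
dense
```

The rebuilt matrix has ones at [3,1], [1,2] and [3,3], which matches mat3.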
And finally a case where the sparseMatrix gives the expected memory savings:
> mat4 <- matrix(sample(1000*1000)%%3%%2, nrow = 1000, ncol = 1000)
> smat4 <- as(mat4, "sparseMatrix")
> table(mat4)
mat4
     0      1
666666 333334
> showMem(c('mat4', 'smat4'), bytes=T)
      size   bytes
mat4  8 MB 8000216
smat4 4 MB 4005512
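With roughly 12 bytes per stored value against 8 bytes per cell of a double matrix, the break-even point sits near a fill ratio of two thirds (one third if the dense matrix is integer). The sketch below checks this empirically with random matrices from Matrix::rsparsematrix(); the chosen densities and the seed are arbitrary, and exact byte counts will vary by R version.

```r
library(Matrix)  # provides rsparsematrix() and the sparse matrix classes

set.seed(1)
n <- 1000
densities <- c(0.01, 0.1, 0.5, 0.9)

res <- do.call(rbind, lapply(densities, function(d) {
  sm <- rsparsematrix(n, n, density = d)  # random sparse matrix with fill ratio d
  data.frame(
    density      = d,
    dense_bytes  = as.numeric(object.size(as.matrix(sm))),  # same data, dense double
    sparse_bytes = as.numeric(object.size(sm))
  )
}))

# TRUE where the sparse representation is the smaller one
res$sparse_smaller <- res$sparse_bytes < res$dense_bytes
res
```

The sparse form should come out smaller for the first three densities and larger at 0.9, where almost every cell carries an index in addition to its value.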