
Sparse matrix size vs. regular matrix size in R


My regular matrix has an object size of 416 bytes, but when I use as(, "sparseMatrix") to turn it into a sparse matrix, the size goes up to 1720 bytes.

Is this normal? Shouldn't a sparse matrix need less storage than the regular one?

Many thanks in advance!


Solution

  • A matrix is one of the base data structures of R and can be stored with very little metadata: a sequence of values, a data type, and the length of each dimension.

    A sparseMatrix object, however, carries more metadata, as you'll see with str() in the examples below. In the dgCMatrix class produced here, every non-zero value is stored as a double in @x (8 bytes) together with its integer row index in @i (4 bytes), and @p holds one integer pointer per column plus one. This alone causes a roughly threefold increase in memory use if you're storing integers (4 bytes each in a regular matrix). The overhead is only recovered when there are many zero values, as they are not stored at all.
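
    The storage layout described above can be turned into a rough size estimate. The sketch below is a back-of-the-envelope calculation, not something provided by the Matrix package: the function names dense_bytes() and csc_bytes() are made up for illustration, and the fixed overhead of each R object is ignored.

    ```r
    # Hypothetical helpers (not from any package): approximate payload sizes in bytes.
    # Fixed per-object overhead (headers, attributes, S4 slots) is deliberately ignored.

    # Dense matrix: one value per cell, 4 bytes for integer, 8 bytes for double.
    dense_bytes <- function(nrow, ncol, bytes_per_value) {
      nrow * ncol * bytes_per_value
    }

    # dgCMatrix: per non-zero value an 8-byte double (@x) and a 4-byte row index (@i),
    # plus ncol + 1 column pointers of 4 bytes each (@p).
    csc_bytes <- function(nnz, ncol) {
      nnz * (8 + 4) + (ncol + 1) * 4
    }

    # Fully populated 1000 x 1000 integer matrix
    dense_bytes(1000, 1000, 4)    # value: 4000000
    csc_bytes(1000 * 1000, 1000)  # value: 12004004
    ```

    The two results differ by a factor of about three, which is the increase described above for integer data.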

    Dense example

    Compare the sizes for a matrix with no zero values:

    > mat1 = matrix( sample(3*3), c(3, 3))
    > smat1 <- as(mat1, "sparseMatrix")
    
    > showMem(c('mat1', 'smat1'), bytes=T)
            size bytes
    mat1   264 B   264
    smat1 1.7 kB  1688
    
    > mat1
         [,1] [,2] [,3]
    [1,]    2    5    7
    [2,]    8    6    1
    [3,]    3    4    9
    
    > str(mat1)
     int [1:3, 1:3] 2 8 3 5 6 4 7 1 9
    
    > str(smat1)
    Formal class 'dgCMatrix' [package "Matrix"] with 6 slots
      ..@ i       : int [1:9] 0 1 2 0 1 2 0 1 2
      ..@ p       : int [1:4] 0 3 6 9
      ..@ Dim     : int [1:2] 3 3
      ..@ Dimnames:List of 2
      .. ..$ : NULL
      .. ..$ : NULL
      ..@ x       : num [1:9] 2 8 3 5 6 4 7 1 9
      ..@ factors : list()
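
    showMem() is not a base R or Matrix function, and its definition is not part of this answer. A minimal stand-in, assuming it simply reports object.size() for each named object, might look like the sketch below. The argument names mirror the calls in the transcript, but the implementation is a guess.

    ```r
    # Hypothetical re-implementation of the showMem() helper used in this answer.
    showMem <- function(names, bytes = FALSE, envir = parent.frame()) {
      force(envir)  # resolve the caller's environment before entering lapply()
      sizes <- lapply(names, function(nm) object.size(get(nm, envir = envir)))
      out <- data.frame(
        # human-readable size, e.g. in B, kB or MB
        size = vapply(sizes, format, character(1), units = "auto", standard = "SI"),
        row.names = names
      )
      if (bytes) out$bytes <- vapply(sizes, as.numeric, numeric(1))
      out
    }

    mat <- matrix(1:9, 3, 3)
    showMem("mat", bytes = TRUE)
    ```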
    

    Or a larger version of such a matrix, where the threefold increase is clearly visible:

    > mat2 = matrix( sample(1000*1000), c(1000, 1000))
    > smat2 <- as(mat2, "sparseMatrix")
    
    > showMem(c('mat2', 'smat2'), bytes=T)
           size    bytes
    mat2   4 MB  4000216
    smat2 12 MB 12005504
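
    To see where the extra memory goes, the slots of the sparse object can be measured one by one. This is a sketch on a smaller 200 x 200 matrix (the size is an arbitrary choice to keep it quick), built the same way as the matrices above.

    ```r
    library(Matrix)  # recommended package, shipped with standard R installations

    set.seed(1)
    n <- 200
    dense  <- matrix(sample(n * n), c(n, n))  # integer matrix without zeros
    sparse <- as(dense, "sparseMatrix")        # values are converted to double

    # Size of each slot that grows with the data
    slot_bytes <- c(
      x = as.numeric(object.size(sparse@x)),  # values, stored as doubles
      i = as.numeric(object.size(sparse@i)),  # row index of every stored value
      p = as.numeric(object.size(sparse@p))   # column pointers, length ncol + 1
    )
    slot_bytes

    # Ratio of sparse payload to dense size; close to 3 for integer input
    sum(slot_bytes) / as.numeric(object.size(dense))
    ```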
    

    Sparse example

    Here we create a sparser matrix, with 6 zeros and only 3 non-zero values. str() shows that the sparseMatrix stores only those 3 values, although at this tiny size the sparse object is still the larger of the two.

    > mat3 = matrix( sample(3*3)%%3%%2, c(3, 3))
    > smat3 <- as(mat3, "sparseMatrix")
    
    > showMem(c('mat3', 'smat3'), bytes=T)
            size bytes
    mat3   344 B   344
    smat3 1.6 kB  1560
    
    > mat3
         [,1] [,2] [,3]
    [1,]    0    1    0
    [2,]    0    0    0
    [3,]    1    0    1
    
    > str(mat3)
     num [1:3, 1:3] 0 0 1 1 0 0 0 0 1
    
    > str(smat3)
    Formal class 'dgCMatrix' [package "Matrix"] with 6 slots
      ..@ i       : int [1:3] 2 0 2
      ..@ p       : int [1:4] 0 1 2 3
      ..@ Dim     : int [1:2] 3 3
      ..@ Dimnames:List of 2
      .. ..$ : NULL
      .. ..$ : NULL
      ..@ x       : num [1:3] 1 1 1
      ..@ factors : list()
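
    The dense matrix still wins here because a sparse matrix object has a fixed cost that does not depend on the data. One way to see it, sketched below, is to measure a sparse matrix with no stored values at all. The exact byte counts depend on the R and Matrix versions, so none are shown.

    ```r
    library(Matrix)

    empty_dense  <- matrix(0, 3, 3)
    empty_sparse <- as(empty_dense, "sparseMatrix")  # no non-zero values to store

    length(empty_sparse@x)     # 0: nothing is stored in the value slot
    object.size(empty_dense)   # header plus 9 doubles
    object.size(empty_sparse)  # almost entirely fixed object and slot overhead
    ```

    For a 3 x 3 matrix this fixed cost is larger than the whole dense matrix, so no amount of sparsity can make up for it.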
    

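    And finally a case where the sparseMatrix gives the expected memory savings. About two thirds of the values are zero, and the regular matrix holds doubles here (8 bytes per value, compare the num in str(mat3) above), so the sparse version is about half the size: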

    > mat4 = matrix( sample(1000*1000)%%3%%2, c(1000, 1000))
    
    > smat4 <- as(mat4, "sparseMatrix")
    
    > table(mat4)
    mat4
         0      1 
    666666 333334 
    
    > showMem(c('mat4', 'smat4'), bytes=T)
          size   bytes
    mat4  8 MB 8000216
    smat4 4 MB 4005512
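
    With 12 bytes per stored value against 8 bytes per cell of a double matrix, the break-even point should lie near a density of two thirds (one third for integer matrices). The sketch below checks this empirically for a few densities. compare_sizes() is a made-up helper name, and the 300 x 300 size is arbitrary.

    ```r
    library(Matrix)

    set.seed(42)
    n <- 300

    # Build an n x n double matrix with the requested share of non-zero cells
    # and return the dense and sparse object sizes in bytes.
    compare_sizes <- function(density) {
      dense <- matrix(0, n, n)
      dense[sample(n * n, round(density * n * n))] <- 1
      sparse <- as(dense, "sparseMatrix")
      c(density = density,
        dense   = as.numeric(object.size(dense)),
        sparse  = as.numeric(object.size(sparse)))
    }

    # One row per density; the sparse column overtakes the dense one near 2/3
    t(sapply(c(0.1, 1/3, 0.5, 2/3, 0.9), compare_sizes))
    ```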