
Evaluate the size in bytes of something in C++


I have a sparse matrix from the Eigen library, defined as:

Eigen::SparseMatrix<float> MyMatrix(1 << n, 1 << n);

I also call reserve():

MyMatrix.reserve(Eigen::VectorXi::Constant(1 << n, n + 1));

This matrix has n+1 nonzero entries per column and 2^n columns.

I want to know the actual number of bytes this object occupies.

If I use:

MyMatrix.size()

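It returns n_rows * n_columns, i.e. the number of coefficients, not the number of bytes.

I'm sure that is not the actual storage size, because I compared it against my computer's memory.

For example, I can create a 2^25 * 2^25 sparse matrix of floats this way on my computer. If size() reflected storage, it would occupy on the order of 10^15 bytes (2^50 coefficients * 4 bytes ≈ 4.5 * 10^15), which is simply impossible.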

If I write

sizeof(MyMatrix) 

it returns 72, whatever n I use. That is probably the size of the object itself (its fixed members), not of the data stored in it.
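That reading is right: sizeof is a compile-time property of the type, so it counts only the object's own members (sizes and pointers), never the heap arrays those pointers refer to. The same effect can be reproduced without Eigen. The sketch below uses std::vector<float> as a stand-in for the sparse matrix; the names Footprint, footprint and demo are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: splits a container's memory into the part sizeof sees
// (the object itself) and the heap buffer the object points to.
struct Footprint {
  std::size_t object_bytes;  // what sizeof reports; independent of the contents
  std::size_t heap_bytes;    // capacity() * sizeof(float); grows with the data
};

Footprint footprint(const std::vector<float>& v) {
  return {sizeof(v), v.capacity() * sizeof(float)};
}

// Two vectors with very different contents have the same sizeof,
// while their heap buffers differ by orders of magnitude.
void demo() {
  std::vector<float> small_v(10), large_v(1000000);
  const Footprint a = footprint(small_v);
  const Footprint b = footprint(large_v);
  assert(a.object_bytes == b.object_bytes);          // sizeof ignores the heap buffer
  assert(b.heap_bytes >= 1000000 * sizeof(float));   // capacity() >= size()
}
```

The same split applies to MyMatrix: the 72 bytes are the object_bytes part, and the real storage lives in separately allocated arrays.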

Update 2:

This is the right way to compute its size:

It's one float and one int per reserved (or used) element, plus two ints per column, plus the fixed overhead of sizeof(MyMatrix).
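Turned into code, that rule of thumb might look like the sketch below. The function names and the use of std::uint64_t are assumptions for illustration; the estimate ignores allocator bookkeeping and the one extra column-start entry, so it slightly underestimates the real figure.

```cpp
#include <cstdint>

// Rough memory estimate for a column-major sparse matrix after reserve():
//   per reserved element: one Scalar (value) + one StorageIndex (row index)
//   per column:           two StorageIndex (column start + used count)
//   plus a fixed object overhead such as sizeof(MyMatrix).
// Allocator bookkeeping and the extra (cols + 1)-th column start are ignored.
template <typename Scalar, typename StorageIndex>
constexpr std::uint64_t estimate_sparse_bytes(std::uint64_t cols,
                                              std::uint64_t reserved_per_col,
                                              std::uint64_t fixed_overhead) {
  return cols * reserved_per_col * (sizeof(Scalar) + sizeof(StorageIndex)) +
         cols * 2 * sizeof(StorageIndex) + fixed_overhead;
}

// Convert bytes to GiB (2^30 bytes).
constexpr double to_gib(std::uint64_t bytes) {
  return static_cast<double>(bytes) / static_cast<double>(1ull << 30);
}
```

For example, estimate_sparse_bytes<float, int>(1ull << 25, 26, 72) evaluates to 7,247,757,384 bytes on a platform with 4-byte float and int, i.e. 6.75 GiB once the 72-byte overhead is dropped.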


Solution

  • As discussed in the comments, here is how the sparse matrix format works.

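    You call reserve(), so the matrix is in the uncompressed variant of Eigen's sparse storage format. This means we have four arrays:

      - Values: one float per reserved element (2**25 * 26 of them)
      - InnerIndices: one int (the row index) per reserved element (2**25 * 26 of them)
      - OuterStarts: one int per column giving the offset where that column's elements begin (2**25 + 1 of them)
      - InnerNNZs: one int per column giving how many of that column's reserved elements are in use (2**25 of them)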

    That gives us (2 * 2**25 * 26 * 4 + 2 * 2**25 * 4) / 1024**3 = 6.75 GiB
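    In uncompressed mode, each column j owns a block of reserved slots starting at OuterStarts[j], of which InnerNNZs[j] are in use; the values and row indices of those slots live in Values and InnerIndices. A hypothetical, stripped-down model of that layout is sketched below. The class and member names are invented for illustration, and it does not attempt Eigen's sorted insertion, reallocation when a column fills up, or makeCompressed().

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical, simplified model of an uncompressed column-major sparse matrix.
// It mirrors the roles of Eigen's four arrays but is NOT Eigen's implementation.
class ToyUncompressedCSC {
 public:
  // Reserve the same number of slots in every column (like reserve() with a
  // constant vector): column j owns slots [outer_starts_[j], outer_starts_[j + 1]).
  ToyUncompressedCSC(int cols, int reserved_per_col)
      : values_(static_cast<std::size_t>(cols) * reserved_per_col),
        inner_indices_(values_.size()),
        outer_starts_(static_cast<std::size_t>(cols) + 1),
        inner_nnz_(static_cast<std::size_t>(cols), 0) {
    for (int j = 0; j <= cols; ++j) outer_starts_[j] = j * reserved_per_col;
  }

  // Store v at (row, col) in the next free reserved slot of column col.
  // Assumes (row, col) is not already present; no sorting or duplicate check.
  void insert(int row, int col, float v) {
    const int used = inner_nnz_[col];
    assert(used < outer_starts_[col + 1] - outer_starts_[col] && "column is full");
    const int slot = outer_starts_[col] + used;
    inner_indices_[slot] = row;
    values_[slot] = v;
    ++inner_nnz_[col];
  }

  // Read (row, col) by scanning only the used part of the column.
  float coeff(int row, int col) const {
    const int begin = outer_starts_[col];
    const int end = begin + inner_nnz_[col];
    for (int k = begin; k < end; ++k)
      if (inner_indices_[k] == row) return values_[k];
    return 0.0f;  // implicit zero
  }

  int nonzeros_in_column(int col) const { return inner_nnz_[col]; }

 private:
  std::vector<float> values_;       // Values: one float per reserved slot
  std::vector<int> inner_indices_;  // InnerIndices: row of each slot
  std::vector<int> outer_starts_;   // OuterStarts: first slot of each column (cols + 1)
  std::vector<int> inner_nnz_;      // InnerNNZs: used slots per column
};
```

    Because unused slots still occupy space in Values and InnerIndices, memory is proportional to the reserved count rather than the inserted count, which is why reserve() alone already accounts for the full estimate.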

    Let's put that to the test:

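    #include <Eigen/Dense>
    #include <Eigen/Sparse>
    
    #include <malloc.h>  // glibc-specific header that declares malloc_stats()
    
    int main()
    {
      const int size = 1 << 25;            // 2**25 rows and 2**25 columns
      const int nonzeros_per_column = 26;  // n + 1 with n = 25
      // Column-major by default, so reserve() takes one entry per column.
      Eigen::SparseMatrix<float> mat(size, size);
      // Reserving per column puts the matrix in uncompressed mode and
      // allocates the storage for all reserved elements up front.
      mat.reserve(Eigen::VectorXi::Constant(size, nonzeros_per_column));
      malloc_stats();  // glibc: prints allocator statistics to stderr
    }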
    

    This prints:

    Arena 0:
    system bytes     =     135168
    in use bytes     =      74400
    Total (incl. mmap):
    system bytes     = 2952941568
    in use bytes     = 2952880800
    max mmap regions =          4
    max mmap bytes   = 7247773696
    

    As you can see, there are four mmapped allocations with a total size of 7,247,773,696 bytes, which is 6.75 GiB.

    The reason this works on a laptop with less physical memory is that you don't use that memory yet. The memory is mmapped but never written, so the operating system maps all of it to the single zero page it keeps for exactly this purpose; real physical pages are only allocated once you write to them. See, for example, the question "Allocating more memory than there exists using malloc".

    Integer range concerns

    Note that this format uses a plain int to store the array offsets of the nonzero elements. This means the total number of nonzeros (including reserved elements) must not exceed 2**31-1, the maximum of a signed 32-bit int. With 2**25 * 26 = 872,415,232 elements, you are already close to 2**30 elements.
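    A quick way to check a given configuration against that limit is to form the product in a wider type before comparing it with the index type's maximum. The helper below is a hypothetical sketch; it treats the limit simply as "the total number of reserved slots must fit in the storage index type", which is the constraint described above.

```cpp
#include <cstdint>
#include <limits>

// Hypothetical helper: true if cols * reserved_per_col slots can be addressed
// by StorageIndex (Eigen's SparseMatrix uses int unless told otherwise).
// The product is formed in 64 bits, so it cannot overflow for inputs below 2^32 each.
template <typename StorageIndex>
constexpr bool fits_storage_index(std::uint64_t cols, std::uint64_t reserved_per_col) {
  return cols * reserved_per_col <=
         static_cast<std::uint64_t>(std::numeric_limits<StorageIndex>::max());
}
```

    For the question's shape (2^n columns, n + 1 reserved per column), a 32-bit int still fits at n = 26 (1,811,939,328 slots) but overflows at n = 27 (3,758,096,384 slots).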

    Unless you know that this is the absolute upper limit, with no room needed for growth, I recommend switching to Eigen::SparseMatrix<float, Eigen::ColMajor, Eigen::Index>, which uses Eigen::Index (by default std::ptrdiff_t) instead of int as the storage index. This raises memory use to about 10.25 GiB, but it removes any realistic concern about integer overflow.
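    Redoing the 6.75 GiB count with an 8-byte index (assuming a 64-bit platform, where std::ptrdiff_t is 8 bytes, and the same accounting of one value and one index per reserved element plus two indices per column) shows where the 10.25 GiB comes from:

```latex
\underbrace{(4 + 8)\cdot 26 \cdot 2^{25}}_{\text{float values + 8-byte inner indices}}
\;+\; \underbrace{2 \cdot 8 \cdot 2^{25}}_{\text{column starts + per-column counts}}
\;=\; 328 \cdot 2^{25}\ \text{bytes}
\;=\; \frac{328}{32}\ \text{GiB} \;=\; 10.25\ \text{GiB}
```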