There is an existing post regarding the use of MPI with Armadillo in C++: here
My question is: do LAPACK and OpenBLAS implement MPI support? I could not find anything about this in their documentation so far.
There seems to be a different library called ScaLAPACK, which uses MPI. Is this library compatible with Armadillo? Is it used any differently from LAPACK?
I need to diagonalise extremely large matrices (over 1 TB of memory), so I need to spread the memory over multiple nodes on a cluster using MPI, but I don't know how to do that with Armadillo.
Do you have any useful tips or references on how to approach this?
Any BLAS is single-process. Some BLAS implementations are multi-threaded, but MPI has nothing to do with this: in an MPI run, each process calls its own, non-distributed BLAS routine.
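To make that concrete, here is a minimal sketch (matrix size and seeding are placeholders): every MPI rank below owns and diagonalises its own, completely local Armadillo matrix, and the LAPACK routine behind `eig_sym()` never sees anything beyond that rank's memory.

```cpp
#include <cstdio>
#include <mpi.h>
#include <armadillo>

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);
    int rank = 0;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    // Each rank builds its own symmetric matrix (different seed per rank,
    // purely for illustration).
    arma::arma_rng::set_seed(rank);
    arma::mat A = arma::symmatu(arma::randu<arma::mat>(1000, 1000));

    arma::vec eigval;
    arma::eig_sym(eigval, A);   // plain, single-process LAPACK call underneath

    std::printf("rank %d: largest eigenvalue = %g\n", rank, eigval.max());

    MPI_Finalize();
    return 0;
}
```

Running this with `mpirun -np 4` simply does four independent diagonalisations; nothing is distributed.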
ScaLAPACK is a distributed-memory library based on MPI. It is very different from LAPACK: matrix handling (block-cyclic distribution over a process grid) is considerably more complicated. Some applications and libraries are able to use ScaLAPACK, but you cannot simply swap LAPACK out for ScaLAPACK: support for it has to be added explicitly.
Armadillo's documentation mentions threading support through OpenMP, but there is no mention of MPI. Therefore you cannot use Armadillo by itself to spread a matrix over multiple nodes.
If you want to do distributed eigenvalue calculations, take a look at the PETSc library and the SLEPc package on top of it. Those are written in C, so they can easily (though not entirely idiomatically) be used from C++.
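As a rough idea of what that looks like, here is a minimal SLEPc sketch (untested; the matrix dimension and the diagonal fill-in are placeholders). The matrix rows are distributed over the MPI processes, and the eigensolver can be configured from the command line:

```cpp
// Build against PETSc/SLEPc; run with e.g. `mpirun -np 64 ./eig -eps_nev 10`.
#include <slepceps.h>

int main(int argc, char **argv) {
    SlepcInitialize(&argc, &argv, nullptr, nullptr);

    const PetscInt n = 100000;   // global matrix dimension (placeholder)

    // Create a matrix whose rows are distributed over all MPI processes.
    Mat A;
    MatCreate(PETSC_COMM_WORLD, &A);
    MatSetSizes(A, PETSC_DECIDE, PETSC_DECIDE, n, n);
    MatSetFromOptions(A);
    MatSetUp(A);

    // Each process fills only the rows it owns.
    PetscInt rstart, rend;
    MatGetOwnershipRange(A, &rstart, &rend);
    for (PetscInt i = rstart; i < rend; ++i)
        MatSetValue(A, i, i, (PetscScalar)i, INSERT_VALUES);  // placeholder entries
    MatAssemblyBegin(A, MAT_FINAL_ASSEMBLY);
    MatAssemblyEnd(A, MAT_FINAL_ASSEMBLY);

    // Set up and run the eigensolver for a Hermitian problem.
    EPS eps;
    EPSCreate(PETSC_COMM_WORLD, &eps);
    EPSSetOperators(eps, A, nullptr);
    EPSSetProblemType(eps, EPS_HEP);
    EPSSetFromOptions(eps);      // solver type, number of eigenpairs, etc. at run time
    EPSSolve(eps);

    PetscInt nconv;
    EPSGetConverged(eps, &nconv);
    PetscPrintf(PETSC_COMM_WORLD, "Converged eigenpairs: %d\n", (int)nconv);

    EPSDestroy(&eps);
    MatDestroy(&A);
    SlepcFinalize();
    return 0;
}
```

Note that SLEPc is aimed at computing a subset of eigenpairs of large (typically sparse) matrices; if you really need the full spectrum of a dense 1 TB matrix, a ScaLAPACK-based route (or a library such as ELPA that builds on the same distribution) is the more usual choice.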