python, numpy, linear-algebra, svd, arpack

SVD MemoryError in Python


I want to perform an SVD on a big array M[159459,159459].

Since the SVD computation depends on the input matrix of shape (159459, 159459), this does not address my goal.

I have tried to use:

import numpy as np
from scipy import linalg

# M = np.load("M.npy")
M = np.random.rand(159459, 159459)
U, s, Vh = linalg.svd(M, check_finite=False, lapack_driver='gesvd')

However, I always get a MemoryError; everything fails. Ultimately, I want to compute the full SVD, because I want to perform a Procrustes analysis, i.e. if M is the matrix that I have now, I need M = USV'.

My system details:

$ cat /proc/meminfo
MemTotal: 527842404 kB
MemFree: 523406068 kB
MemAvailable: 521659112 kB


Solution

  • Memory size matters, but latency costs will hurt you next:

    Given mM.shape == [159459, 159459],
    given mM.dtype is by default float64,
    there will be a need to have about:
    203.42 [GB] for the original mM[159459, 159459], plus
    203.42 [GB] for the computed mU[159459, 159459], plus
    203.42 [GB] for the computed Vh[159459, 159459], plus
    0.0013 [GB] for the computed vS[159459]

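    As a cross-check, the figures above can be reproduced with a few lines of plain numpy arithmetic ( a sketch only; N and the variable names are just illustrative ):

    import numpy as np

    N     = 159459
    itemB = np.dtype( np.float64 ).itemsize      # 8 [B] per float64 cell

    print( N * N * itemB / 1E9 )                 # ~ 203.42 [GB] for each of mM, mU, Vh
    print( 3 * N * N * itemB / 1E9 )             # ~ 610.25 [GB] for all three together
    print( N * itemB / 1E9 )                     # ~   0.0013 [GB] for vS
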
    The cheapest-ever step, trying a linear-only downscaling by a factor of 2 ( and not more than 4 ) from float64 to float32 or even float16, is not the game changer and is even heavily penalised by numpy inefficiencies ( if not by internally performed back-conversions up to float64 again ). My own attempts were so bleeding on this that I share the resulting dissatisfaction here, so as to save you from repeating my own error of trying to start with the lowest-hanging fruit first ( a sketch follows below ) ...
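
    For completeness, this is what such an attempted dtype downscaling looks like ( a sketch only, on a small stand-in matrix ), just to document why it did not pay off for me:

    import numpy as np

    M64 = np.random.rand( 1000, 1000 )           # a small stand-in for mM
    M32 = M64.astype( np.float32 )               # halves   the per-cell storage ...
    M16 = M64.astype( np.float16 )               # quarters the per-cell storage ...

    print( M64.nbytes, M32.nbytes, M16.nbytes )  # 8000000 4000000 2000000
    # ... yet the LAPACK SVD drivers operate on float32 / float64 ( or complex ) data only,
    #     so a float16 copy gets up-cast again before any .svd() call can make use of it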

    In case your analysis can work with the vector vS alone, only the .svd( ..., compute_uv = False, ... ) flag will avoid about ~ 1/2 [TB] of RAM allocations, by not returning ( and thus not reserving the space for ) the instances of mU and Vh ( a sketch follows below ).
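
    If the singular values alone are indeed sufficient, the call from the question collapses into something like this sketch ( the file name is just illustrative ):

    import numpy as np
    from scipy import linalg

    M  = np.load( "M.npy" )                      # or however the mM actually gets loaded
    vS = linalg.svd( M,                          # returns just the 1-D vector of singular values
                     compute_uv    = False,      # ~ 0.0013 [GB] of results instead of ~ 1/2 [TB]
                     check_finite  = False,
                     lapack_driver = 'gesvd'
                     )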

    Even such a case does not mean your SLOC will survive as-is on the just-about-reported 0.5 [TB] RAM system. The scipy.linalg.svd() processing will internally allocate working resources that are outside of your scope of coding ( sure, unless you re-factor and re-design the scipy.linalg module on your own, which is fair to consider rather improbable, if not impossible ) and outside of your configuration control. So, be warned that even when you test the compute_uv = False mode of processing, the .svd() may still throw an error, if it fails to internally allocate the required, internally used data structures that do not fit into the current RAM.

    This also means that even the numpy.memmap() trick, which may successfully off-load the in-RAM representation of the original mM ( avoiding a remarkable part of the first needed ~ 203.4 [GB] from sitting in and blocking the usage of the host's RAM ), comes with costs you will have to pay for using it.

    My experiments with .memmap-s at smaller scales, used for matrix processing and in ML optimisations, yield about 1E4 ~ 1E6 times slower processing because, in spite of the smart caching, the numpy.memmap() instances are dependent on the disk I/O.
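
    A minimal sketch of that numpy.memmap() trick, assuming the original mM was stored with np.save() ( the file names are just illustrative ):

    import numpy as np

    # the original mM stays on disk; only the pages actually touched get cached in RAM
    M = np.load( "M.npy", mmap_mode = 'r' )      # a read-only numpy.memmap view of the .npy file
    # M = np.memmap( "M.dat",                    # or the same for a raw binary dump
    #                dtype = np.float64,
    #                mode  = 'r',
    #                shape = ( 159459, 159459 )
    #                )

    block = np.asarray( M[:1024, :1024] )        # only the touched pages hit the RAM,
                                                 # and every such touch pays the disk-I/O latency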

    The best results will come from using advanced, TB-sized, SSD-only storage devices, hosted right on the computing node over some fast and low-latency access bus ( M.2 or PCIe x16 ).

    The last piece of experience, which one might not yet want to hear here:

    Using a larger host-based RAM, which means using a multi-TB computing device, is the safest way to go further. Testing the above-proposed steps will help, if the reduced performance and the additional expenses are within your Project's budget. If not, go for the use of an HPC centre at your Alma Mater or at your Project's closest research centre, where such multi-TB computing devices are commonly in use.