Outer Product and Partial Trace of large Vectors exceeds Memory

I'm trying to find the outer product of a large complex-valued vector (of size 91204) to later on find the it's partial trace using np.einsum. However I get the following error:

numpy._core._exceptions._ArrayMemoryError: Unable to allocate 124. GiB for an array with shape (91204, 91204) and data type complex128

While trying to find a way to solve this, I reduced the size of my vector and the Memory Error came from np.einsum.

The code I'm using for this is the following:

def partial_trace(rho, keep, dims, optimize=False):
    keep = np.asarray(keep)
    dims = np.asarray(dims)
    Ndim = dims.size
    Nkeep = np.prod(dims[keep])

    idx1 = [i for i in range(Ndim)]
    idx2 = [Ndim+i if i in keep else i for i in range(Ndim)]
    rho_a = rho.reshape(np.tile(dims,2))
    rho_a = np.einsum(rho_a, idx1+idx2, optimize=optimize)
    return rho_a.reshape(Nkeep, Nkeep)

shape = (1,91204)

GS = np.random.uniform(-1, 1, shape) + 1.j * np.random.uniform(-1, 1, shape)
GS = GS/np.linalg.norm(GS)
GS
rho = np.outer(GS, GS.transpose().conj())
P = partial_trace(rho, [0], [Nb+1, 2, Nb+1, 2], True)

In this case, GS is one eigenvector of a (91204, 91204) matrix. Is there a more efficient way to calculate this either only using einsum or other approach?

Solution

Basic idea is just to work with GS directly to save on memory rather than compute the intermediate matrix. Consider also that the vast majority of elements on rho aren't used to calculate the partial trace, so there's a lot of wasted calculations.

So element rho[i, j] = GS[i]*GS.conj()[j], and given some Nb, rho has shape 4(Nb+1)^2 x 4(Nb+1)^2 which you split into 4(Nb+1) x 4(Nb+1) matrices and calculate the trace of each submatrix. If we call this submatrix length v = 4(Nb+1), then the each partial trace element P[i,j] = np.sum(GS[i*v:(i+1)*v]*GS.conj()[j*v:(j+1)*v].

Actual code:


Nb = 150
shape = (1, 4*(Nb+1)**2)

GS = np.random.uniform(-1, 1, shape) + 1.j * np.random.uniform(-1, 1, shape)
GS = GS/np.linalg.norm(GS)

P2 = np.zeros((Nb+1, Nb+1), dtype = np.complex128)

vector_length = 4*(Nb+1)

for i in range(Nb+1):
    for j in range(Nb+1):
        P2[i, j] = np.sum(GS[0, i*vector_length:(i+1)*vector_length]*GS.conj()[0, j*vector_length:(j+1)*vector_length])