Tags: python, memory, scipy, scipy-optimize-minimize

Scipy minimize returns an array too big


I am trying to minimize this function. Using scipy's minimize I expect to get back an array of the same size as the input, i.e. (92445,), but instead an array of size (92445, 92445) gets allocated and I get the following error:

MemoryError: Unable to allocate 63.7 GiB for an array with shape (92445, 92445) and data type float64

I don't understand whether I simply misunderstood how minimize works, or whether there is another problem that I don't see.

import numpy as np
from scipy.optimize import minimize

X = np.asarray(freezer.dropna())  # freezer is the input data (not defined here)
Xc = X
D = np.random.random((5,18489))
Z = np.random.random((5,5))
B = np.ones((5,5))
u = 0.5
us = np.sqrt(u)
l = 0.1

def P1(Di):
    Di = Di.reshape(D.shape)
    A = Z - np.matmul(Di,Xc) - B
    return (np.linalg.norm(A, 'fro'))**2

def P2(Xi):
    Xi = Xi.reshape(Xc.shape)
    A1 = np.vstack((X, us*(Z-B)))
    A2 = np.vstack((np.identity(18489), us*D))
    A3 = np.matmul(A2, Xi)
    A4 = A1 - A3
    return (np.linalg.norm(A4, 'fro'))**2

def P3(Zi):
    Zi = Zi.reshape(Z.shape)
    A = Zi - np.matmul(D, Xc) - B
    return (np.linalg.norm(A, 'fro'))**2 + (l/u)*np.linalg.norm(Zi, 1)

# minimize returns an OptimizeResult; the optimal point is in its .x attribute
Df = D.flatten()
Df = minimize(P1, Df).x
D = Df.reshape(D.shape)

Xcf = Xc.flatten()
Xcf = minimize(P2, Xcf).x
Xc = Xcf.reshape(Xc.shape)

Zf = Z.flatten()
Zf = minimize(P3, Zf).x
Z = Zf.reshape(Z.shape)

B = Z - np.matmul(D,Xc) - B

Solution

  • Nonlinear approach

    For the case of P1, notice that each row of the argument D only affects the corresponding row of A. Therefore, you can optimize each row of D separately and combine the rows at the end into a single D matrix. Solving an 18489-dimensional problem five times is much easier than solving one 92445-dimensional problem, so this is both faster and far less memory-hungry.

    The (92445, 92445) array in the error is not the return value of minimize: it is the dense inverse-Hessian approximation that the default BFGS solver builds internally, with one row and one column per parameter. I also found it necessary to switch to L-BFGS-B, a limited-memory variant that keeps only a short history of updates instead of that full matrix.
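
    A quick back-of-the-envelope check confirms this is exactly the allocation the error complains about:

    # 92445 parameters -> BFGS keeps a dense 92445 x 92445 inverse-Hessian estimate
    n = 92445
    print(n**2 * 8 / 2**30)   # float64 bytes in GiB -> about 63.7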

    Code:

    def P1(Di):
        Di = Di.reshape(D.shape)
        A = Z - np.matmul(Di,Xc) - B
        return (np.linalg.norm(A, 'fro'))**2
    
    
    def P1_simplified(Di, idx):
        """A version of P1 which operates on only one row of D,
        one row of Z, and one row of B."""
        Di = Di.reshape((1,) + (D.shape[1],))
        A = Z[idx] - np.matmul(Di,Xc) - B[idx]
        return (np.linalg.norm(A, 'fro'))**2
    
    
    def solve_P1(D):
        """Solve P1 one row of D at a time and reassemble the rows."""
        D_list = []
        for idx in range(D.shape[0]):
            print(f"solving row {idx}")
            result = minimize(
                P1_simplified,
                D[idx],
                args=(idx,),
                method='L-BFGS-B',
                options=dict(maxfun=1e7),
            )
            D_list.append(result.x)
        D = np.array(D_list)
        print(f"solution has frobenius norm {P1(D)}")
        return D

    D = solve_P1(D)
    

    This is enough to solve the problem in under a gigabyte of memory, reaching a Frobenius norm of ~10^-9. Since 0 is the best possible value, this is pretty close.

    I think a similar approach could be used for P2 and P3, though I haven't worked through it in detail.
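
    For example, here is a rough, unverified sketch of how the same splitting might carry over to P2: each column of Xi only affects the matching column of np.matmul(A2, Xi), so the five columns of Xc can be optimized independently. The helper names P2_simplified and solve_P2 are mine, and the identity block of A2 is applied implicitly so the 18489 x 18489 matrix never has to be built.

    def P2_simplified(xi_col, col):
        """P2 restricted to a single column of Xc.
        A2 @ xi_col is computed as [xi_col; us * D @ xi_col] instead of
        materializing the identity block of A2."""
        top = X[:, col] - xi_col                                    # rows from the identity block
        bottom = us * (Z - B)[:, col] - us * np.matmul(D, xi_col)   # rows from the us*D block
        return np.sum(top**2) + np.sum(bottom**2)


    def solve_P2(Xc):
        cols = []
        for col in range(Xc.shape[1]):
            result = minimize(
                P2_simplified,
                Xc[:, col],
                args=(col,),
                method='L-BFGS-B',
                options=dict(maxfun=1e7),
            )
            cols.append(result.x)
        return np.stack(cols, axis=1)

    Xc = solve_P2(Xc)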

  • Linear approach

    Another approach, which would be far faster, is to reformulate the problem as a linear regression.

    Here is a short sketch of a proof.

    Squaring the Frobenius norm is the same as summing the squares of every element of A. Since A = Z - np.matmul(Di,Xc) - B, minimizing it is equivalent to minimizing the sum of squared differences between np.matmul(Di,Xc) and (Z - B). Let Y = (Z - B). For a standard linear regression the parameters must sit on the right of the product, but in np.matmul(Di,Xc) they sit on the left; fix this by transposing both factors and swapping them to get np.matmul(Xc.T, Di.T), and transpose the target to match, so Y = (Z - B).T. We are now trying to find np.matmul(Xc.T, Di.T) ~ Y. This is an ordinary least-squares problem, and lstsq() solves it much faster than minimize().

    Code:

    def P1_lstsq(Xc, Z, B):
        # Solve Xc.T @ Di.T ~ (Z - B).T in the least-squares sense.
        Y = (Z - B).T
        Di, residuals, rank, s = np.linalg.lstsq(Xc.T, Y, rcond=None)
        return Di.T

    Di = P1_lstsq(Xc, Z, B)
    P1(Di)
    

    Not only does this solve the problem in under a millisecond, it also finds a smaller value of P1 than the minimize() solution.

    I think you could apply this to P2 and P3 as well. P3 looks to be equivalent to P1 except with the addition of an intercept to the linear regression. I don't quite understand P2, but since A1 and A2 are constant with respect to Xi, I think you could do something similar.
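
    As a rough illustration of that last point (an unverified sketch; the helper name P2_lstsq is mine): P2 already has the parameters on the right of the product, since A3 = np.matmul(A2, Xi), so it can be handed to lstsq() directly without any transposing. The main cost is that A2 contains an 18489 x 18489 identity block, which is about 2.5 GiB as a dense float64 array.

    def P2_lstsq(X, Z, B, D):
        # Solve A2 @ Xi ~ A1 in the least-squares sense; Xi is already on
        # the right-hand side of the product, so no transposing is needed.
        A1 = np.vstack((X, us * (Z - B)))
        A2 = np.vstack((np.identity(X.shape[0]), us * D))   # dense identity block: large!
        Xi, residuals, rank, s = np.linalg.lstsq(A2, A1, rcond=None)
        return Xi

    Xc_new = P2_lstsq(X, Z, B, D)
    P2(Xc_new.flatten())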