I'm trying to implement a block GMRES procedure (i.e., GMRES to solve Ax = b, but where b is not a vector but an n x r matrix, with r << n). My goal is to have a first implementation in Python (where I can compare against the SciPy implementation) so that I can translate it to C++ later.
I have a simple implementation of GMRES (with restart) that works quite well:
import numpy as np


def gmres_restart(A, b, x0=None, tol=1e-5, max_iters=100, restart_dim=30):
    n = len(b)
    if x0 is None:
        x = np.zeros(n)
    else:
        x = x0.copy()
    r = b - A @ x
    beta = np.linalg.norm(r)
    if beta < tol:
        return x
    for outer_it in range(0, max_iters, restart_dim):
        # Arnoldi process
        V = np.zeros((n, restart_dim + 1))
        H = np.zeros((restart_dim + 1, restart_dim))
        V[:, 0] = r / beta
        for j in range(restart_dim):
            w = A @ V[:, j]
            for i in range(j + 1):
                H[i, j] = np.dot(V[:, i], w)
                w -= H[i, j] * V[:, i]
            H[j + 1, j] = np.linalg.norm(w)
            if H[j + 1, j] < 1e-14:  # (lucky) breakdown
                break
            V[:, j + 1] = w / H[j + 1, j]
        # Solve least squares: min_y || beta * e1 - H y ||
        e1 = np.zeros(j + 2)
        e1[0] = beta
        H_small = H[:j + 2, :j + 1]
        y, *_ = np.linalg.lstsq(H_small, e1, rcond=None)
        # Update solution
        x += V[:, :j + 1] @ y
        # New residual
        r = b - A @ x
        beta = np.linalg.norm(r)
        print(f"GMRES iter {outer_it + restart_dim}, residual = {beta:.2e}")
        if beta < tol:
            print("Converged.")
            return x
    print("Did not converge.")
    return x
It uses a least-squares fit for the small projected problem, as described on Wikipedia.
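Since my goal is to compare against SciPy anyway, here is the kind of minimal check I use (my own snippet, not part of the implementation; note that the tolerance keyword of scipy.sparse.linalg.gmres changed from tol to rtol in recent SciPy versions, so I only pass arguments that exist in both):

import numpy as np
from scipy.sparse.linalg import gmres

n = 20
A = np.diag(np.linspace(1, 100, n)) + 0.001 * np.random.randn(n, n)
b = np.random.randn(n)

x_mine = gmres_restart(A, b, restart_dim=10)
x_ref, info = gmres(A, b, restart=10, maxiter=10)  # maxiter counts outer (restart) cycles
print(np.linalg.norm(x_mine - x_ref))  # should be small if both converged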
"Translating" this code to block GMRES requires to use block Arnoldi, which I implemented like this:
def block_arnoldi(A, V0, m):
    n, s = V0.shape
    V = np.zeros((n, s * (m + 1)))
    H = np.zeros((s * (m + 1), s * m))
    # Normalize the initial block
    Q, _ = np.linalg.qr(V0, mode='reduced')
    V[:, :s] = Q
    for j in range(m):
        # Compute W = A * V_j
        Vj = V[:, j * s : (j + 1) * s]
        Wj = A @ Vj
        # Orthogonalize against previous blocks (block modified Gram-Schmidt)
        for i in range(j + 1):
            Vi = V[:, i * s : (i + 1) * s]
            Hij = Vi.T @ Wj
            H[i * s : (i + 1) * s, j * s : (j + 1) * s] = Hij
            Wj -= Vi @ Hij
        # QR decomposition to get the next block of orthonormal vectors
        Q, R = np.linalg.qr(Wj, mode='reduced')
        V[:, (j + 1) * s : (j + 2) * s] = Q
        H[(j + 1) * s : (j + 2) * s, j * s : (j + 1) * s] = R
    return V, H
For those who have it, this is Algorithm 6.8 of "Numerical Methods for Large Eigenvalue Problems" (second edition) by Y. Saad (page 146).
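As a quick sanity check (my own snippet, not from Saad's book), one can verify the block Arnoldi relation A V_m = V_{m+1} H_m and the orthonormality of the basis, assuming no breakdown occurred:

n, s, m = 50, 3, 5
A = np.random.randn(n, n)
V0 = np.random.randn(n, s)
V, H = block_arnoldi(A, V0, m)
print(np.linalg.norm(A @ V[:, :s * m] - V @ H))       # ~1e-13: Arnoldi relation
print(np.linalg.norm(V.T @ V - np.eye(s * (m + 1))))  # ~1e-14: orthonormal basis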
And then, here is my implementation for block GMRES:
def block_gmres_restart(A, B, X0=None, tol=1e-5, max_iters=100, restart_dim=30):
    n, s = B.shape
    if X0 is None:
        X = np.zeros((n, s))
    else:
        X = X0.copy()
    R = B - A @ X
    beta = np.linalg.norm(R, ord='fro')
    if beta < tol:
        return X
    for outer in range(0, max_iters, restart_dim):
        V, H = block_arnoldi(A, R, restart_dim)
        # Solve least-squares
        e1 = np.zeros(((restart_dim + 1) * s, s))
        e1[:s, :] = np.linalg.norm(R, axis=0).reshape(-1, 1) * np.eye(s)  # <-- wrong, see EDIT below
        H_small = H[:(restart_dim + 1) * s, :restart_dim * s]
        Y, *_ = np.linalg.lstsq(H_small, e1, rcond=None)
        # Update solution
        X += V[:, :restart_dim * s] @ Y
        # New residual
        R = B - A @ X
        res_norm = np.linalg.norm(R, ord='fro')
        print(f"Block GMRES iter {outer + restart_dim}, residual = {res_norm:.2e}")
        if res_norm < tol:
            print("Converged.")
            return X
    print("Did not converge.")
    return X
The problem is the following: it never works for r > 1, and it sometimes works for r = 1. For example, this works:
np.random.seed(42) # fix randomness
n = 20
A = np.diag(np.linspace(1, 100, n)) + 0.001 * np.random.randn(n, n)
A = (A + A.T) / 2 # Ensure symmetry
b = np.random.randn(n, 1)
x = gmres_restart(A, b[:, 0], restart_dim=10)
x = block_gmres_restart(A, b, restart_dim=10)
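As an extra sanity check (my addition, not strictly needed), the block result can be compared against a dense direct solve, since A is small here:

X_direct = np.linalg.solve(A, b)     # reference solution, shape (n, 1)
print(np.linalg.norm(x - X_direct))  # x holds the block result from the last call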
... But when I set n=30, block_gmres_restart does not converge (the residual increases instead of decreasing), while gmres_restart does converge. My understanding at the moment is that X += V[:, :restart_dim * s] @ Y should sometimes be X -= V[:, :restart_dim * s] @ Y. Would anyone have an idea why?
EDIT: I'm pretty sure that this is related to the use of QR decomposition, but I have no idea how this plays out.
After a good look at "Iterative Methods for Sparse Linear Systems" (also by Y. Saad), I found my problem. One needs to perform the QR decomposition of R, as V0, R0 = np.linalg.qr(R) (as was done), and use V0 as the first block in the block Arnoldi procedure (as was also the case), but the e1 block must be built from R0, as e1[:s, :s] = R0, instead of from the column norms of R as incorrectly done in my code above (which happens to work in some cases). Indeed, for s = 1, R0 is ±||R||, and whenever the QR routine picks the negative sign, using +||R|| flips the sign of Y, which is exactly the X += versus X -= behavior observed above.
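To make this concrete, here is a minimal sketch of the corrected routine (my reconstruction from the description above, not the code from the gist). It reuses block_arnoldi as defined earlier; since that function already performs the QR of its first block internally and discards R0, the triangular factor is recovered afterwards as V[:, :s].T @ R, which equals R0 exactly because R = V[:, :s] @ R0:

def block_gmres_restart_fixed(A, B, X0=None, tol=1e-5, max_iters=100, restart_dim=30):
    n, s = B.shape
    X = np.zeros((n, s)) if X0 is None else X0.copy()
    R = B - A @ X
    for outer in range(0, max_iters, restart_dim):
        V, H = block_arnoldi(A, R, restart_dim)
        # block_arnoldi sets V[:, :s] to the Q factor of R (R = Q @ R0), so
        # R0 is recovered, with consistent signs, as V[:, :s].T @ R.
        E1 = np.zeros(((restart_dim + 1) * s, s))
        E1[:s, :s] = V[:, :s].T @ R  # = R0, replacing the column norms
        Y, *_ = np.linalg.lstsq(H, E1, rcond=None)
        X += V[:, :restart_dim * s] @ Y
        R = B - A @ X
        if np.linalg.norm(R, ord='fro') < tol:
            break
    return X

Recovering R0 through the stored basis keeps its signs consistent with V, which is precisely what the column-norm version got wrong.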
For those who are interested, a working implementation is available at: https://gist.github.com/pierre-24/d638dcb908e28cde759c04989be7b603