I'm trying to implement a block GMRES procedure (i.e., GMRES to solve Ax = b, but where b is not a vector but an n x r matrix, with r << n). My goal is to have a first implementation in Python (where I can compare against the SciPy implementation) so that I can translate it to C++ later.
I have a simple implementation of GMRES (with restart) that works quite well:
import numpy as np


def gmres_restart(A, b, x0=None, tol=1e-5, max_iters=100, restart_dim=30):
    n = len(b)
    if x0 is None:
        x = np.zeros(n)
    else:
        x = x0.copy()
    r = b - A @ x
    beta = np.linalg.norm(r)
    if beta < tol:
        return x
    for outer_it in range(0, max_iters, restart_dim):
        # Arnoldi process
        V = np.zeros((n, restart_dim + 1))
        H = np.zeros((restart_dim + 1, restart_dim))
        V[:, 0] = r / beta
        for j in range(restart_dim):
            w = A @ V[:, j]
            for i in range(j + 1):
                H[i, j] = np.dot(V[:, i], w)
                w -= H[i, j] * V[:, i]
            H[j + 1, j] = np.linalg.norm(w)
            if H[j + 1, j] < 1e-14:  # (lucky) breakdown
                break
            V[:, j + 1] = w / H[j + 1, j]
        # Solve least squares: min_y || beta * e1 - H y ||
        e1 = np.zeros(j + 2)
        e1[0] = beta
        H_small = H[:j + 2, :j + 1]
        y, *_ = np.linalg.lstsq(H_small, e1, rcond=None)
        # Update solution
        x += V[:, :j + 1] @ y
        # New residual
        r = b - A @ x
        beta = np.linalg.norm(r)
        print(f"GMRES iter {outer_it + restart_dim}, residual = {beta:.2e}")
        if beta < tol:
            print("Converged.")
            return x
    print("Did not converge.")
    return x
It uses a least-squares fit for the small projected problem, as described on Wikipedia.
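Since my goal is to compare against SciPy anyway, here is the kind of minimal check I use (my own snippet, not part of the implementation; note that the tolerance keyword of scipy.sparse.linalg.gmres changed from tol to rtol in recent SciPy versions, so I only pass arguments that exist in both):

import numpy as np
from scipy.sparse.linalg import gmres

n = 20
A = np.diag(np.linspace(1, 100, n)) + 0.001 * np.random.randn(n, n)
b = np.random.randn(n)

x_mine = gmres_restart(A, b, restart_dim=10)
x_ref, info = gmres(A, b, restart=10, maxiter=10)  # maxiter counts outer (restart) cycles
print(np.linalg.norm(x_mine - x_ref))  # should be small if both converged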
"Translating" this code to block GMRES requires to use block Arnoldi, which I implemented like this:
def block_arnoldi(A, V0, m):
    n, s = V0.shape
    V = np.zeros((n, s * (m + 1)))
    H = np.zeros((s * (m + 1), s * m))
    # Normalize the initial block
    Q, _ = np.linalg.qr(V0, mode='reduced')
    V[:, :s] = Q
    for j in range(m):
        # Compute W = A * V_j
        Vj = V[:, j * s : (j + 1) * s]
        Wj = A @ Vj
        # Orthogonalize against previous blocks (block modified Gram-Schmidt)
        for i in range(j + 1):
            Vi = V[:, i * s : (i + 1) * s]
            Hij = Vi.T @ Wj
            H[i * s : (i + 1) * s, j * s : (j + 1) * s] = Hij
            Wj -= Vi @ Hij
        # QR decomposition to get the next block of orthonormal vectors
        Q, R = np.linalg.qr(Wj, mode='reduced')
        V[:, (j + 1) * s : (j + 2) * s] = Q
        H[(j + 1) * s : (j + 2) * s, j * s : (j + 1) * s] = R
    return V, H
For those who have it, this is Algorithm 6.8 of "Numerical Methods for Large Eigenvalue Problems" (second edition) by Y. Saad (page 146).
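As a quick sanity check (my own snippet, not from Saad's book), one can verify the block Arnoldi relation A V_m = V_{m+1} H_m and the orthonormality of the basis, assuming no breakdown occurred:

n, s, m = 50, 3, 5
A = np.random.randn(n, n)
V0 = np.random.randn(n, s)
V, H = block_arnoldi(A, V0, m)
print(np.linalg.norm(A @ V[:, :s * m] - V @ H))       # ~1e-13: Arnoldi relation
print(np.linalg.norm(V.T @ V - np.eye(s * (m + 1))))  # ~1e-14: orthonormal basis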
And then, here is my implementation for block GMRES:
def block_gmres_restart(A, B, X0=None, tol=1e-5, max_iters=100, restart_dim=30):
    n, s = B.shape
    if X0 is None:
        X = np.zeros((n, s))
    else:
        X = X0.copy()
    R = B - A @ X
    beta = np.linalg.norm(R, ord='fro')
    if beta < tol:
        return X
    for outer in range(0, max_iters, restart_dim):
        V, H = block_arnoldi(A, R, restart_dim)
        # Solve least-squares
        e1 = np.zeros(((restart_dim + 1) * s, s))
        e1[:s, :] = np.linalg.norm(R, axis=0).reshape(-1, 1) * np.eye(s)  # <-- wrong, see EDIT below
        H_small = H[:(restart_dim + 1) * s, :restart_dim * s]
        Y, *_ = np.linalg.lstsq(H_small, e1, rcond=None)
        # Update solution
        X += V[:, :restart_dim * s] @ Y
        # New residual
        R = B - A @ X
        res_norm = np.linalg.norm(R, ord='fro')
        print(f"Block GMRES iter {outer + restart_dim}, residual = {res_norm:.2e}")
        if res_norm < tol:
            print("Converged.")
            return X
    print("Did not converge.")
    return X
The problem is the following: it never works for r > 1, and it sometimes works for r = 1. For example, this works:
np.random.seed(42) # fix randomness
n = 20
A = np.diag(np.linspace(1, 100, n)) + 0.001 * np.random.randn(n, n)
A = (A + A.T) / 2 # Ensure symmetry
b = np.random.randn(n, 1)
x = gmres_restart(A, b[:, 0], restart_dim=10)
x = block_gmres_restart(A, b, restart_dim=10)
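As an extra sanity check (my addition, not strictly needed), the block result can be compared against a dense direct solve, since A is small here:

X_direct = np.linalg.solve(A, b)     # reference solution, shape (n, 1)
print(np.linalg.norm(x - X_direct))  # x holds the block result from the last call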
... But when I set n=30, block_gmres_restart does not converge (the residual increases instead of decreasing), while gmres_restart does converge. My understanding at the moment is that X += V[:, :restart_dim * s] @ Y should sometimes be X -= V[:, :restart_dim * s] @ Y. Would anyone have an idea why?
EDIT: I'm pretty sure that this is related to the use of QR decomposition, but I have no idea how this plays out.
After a good look at "Iterative Methods for Sparse Linear Systems" (also by Y. Saad), I found my problem. One needs to perform the QR decomposition of R, as V0, R0 = np.linalg.qr(R) (as was done), and use V0 as the first block in the block Arnoldi procedure (as was also the case), but the e1 block must be built from R0, as e1[:s, :s] = R0, instead of from the column norms of R as incorrectly done in my code above (which happens to work in some cases). Indeed, for s = 1, R0 is ±||R||, and whenever the QR routine picks the negative sign, using +||R|| flips the sign of Y, which is exactly the X += versus X -= behavior observed above.
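To make this concrete, here is a minimal sketch of the corrected routine (my reconstruction from the description above, not the code from the gist). It reuses block_arnoldi as defined earlier; since that function already performs the QR of its first block internally and discards R0, the triangular factor is recovered afterwards as V[:, :s].T @ R, which equals R0 exactly because R = V[:, :s] @ R0:

def block_gmres_restart_fixed(A, B, X0=None, tol=1e-5, max_iters=100, restart_dim=30):
    n, s = B.shape
    X = np.zeros((n, s)) if X0 is None else X0.copy()
    R = B - A @ X
    for outer in range(0, max_iters, restart_dim):
        V, H = block_arnoldi(A, R, restart_dim)
        # block_arnoldi sets V[:, :s] to the Q factor of R (R = Q @ R0), so
        # R0 is recovered, with consistent signs, as V[:, :s].T @ R.
        E1 = np.zeros(((restart_dim + 1) * s, s))
        E1[:s, :s] = V[:, :s].T @ R  # = R0, replacing the column norms
        Y, *_ = np.linalg.lstsq(H, E1, rcond=None)
        X += V[:, :restart_dim * s] @ Y
        R = B - A @ X
        if np.linalg.norm(R, ord='fro') < tol:
            break
    return X

Recovering R0 through the stored basis keeps its signs consistent with V, which is precisely what the column-norm version got wrong.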
For those who are interested, a working implementation is available at: https://gist.github.com/pierre-24/d638dcb908e28cde759c04989be7b603