[python] [algorithm] [linear-algebra]

Why does this implementation of the Block Tridiagonal Thomas algorithm give such large errors?


See below my attempt at implementing the block tridiagonal Thomas algorithm. If you run it, however, the block TDMA gives a relatively large residual (~10^-2) compared to NumPy's direct solve (~10^-15), even for this very simple case. More complicated test cases give larger errors; I think the errors start growing during back substitution. Any help as to why would be much appreciated!
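For reference, the recurrences I am trying to implement are the standard block-Thomas (block TDMA) ones, writing $A_i$, $B_i$, $C_i$ for the sub-, main- and super-diagonal blocks of block row $i$ (so $A_i$ corresponds to a[i-1] in the code below, and there is no $C^*_{N-1}$ because the last block row has no super-diagonal block):

$$C_0^* = B_0^{-1} C_0, \qquad d_0^* = B_0^{-1} d_0,$$
$$C_i^* = (B_i - A_i C_{i-1}^*)^{-1} C_i, \qquad d_i^* = (B_i - A_i C_{i-1}^*)^{-1}\,(d_i - A_i d_{i-1}^*),$$
$$x_{N-1} = d_{N-1}^*, \qquad x_i = d_i^* - C_i^*\, x_{i+1}.$$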

import numpy as np

def solve_block_tridiagonal(a, b, c, d):
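    """Solve the block tridiagonal system A x = d using the block Thomas algorithm.

    a : (N-1, m, m) sub-diagonal blocks (a[i] belongs to block row i+1)
    b : (N,   m, m) diagonal blocks
    c : (N-1, m, m) super-diagonal blocks
    d : (N,   m)    right-hand-side blocks
    """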

    N = len(b)
    x = np.zeros_like(d)
    
    # Forward elimination with explicit C* and d* storage
    C_star = np.zeros_like(c)
    d_star = np.zeros_like(d)

    # Initial calculations for C_0* and d_0*
    C_star[0] = np.linalg.solve(b[0], c[0])
    d_star[0] = np.linalg.solve(b[0], d[0])

    # Forward elimination
    for i in range(1, N - 1):
        C_star[i] = np.linalg.solve(b[i] - a[i-1] @ C_star[i-1], c[i])
        d_star[i] = np.linalg.solve(b[i] - a[i-1] @ C_star[i-1], d[i] - a[i-1] @ d_star[i-1])

    # Last d_star update for the last block
    d_star[-1] = np.linalg.solve(b[-1] - a[-2] @ C_star[-2], d[-1] - a[-2] @ d_star[-2])

    # Backward substitution
    x[-1] = d_star[-1]
    for i in range(N-2, -1, -1):
        x[i] = d_star[i] - C_star[i] @ x[i+1]

    return x


def test_block_tridiagonal_solver():

    N = 4

    a = np.array([
        [[1, 0.5], [0.5, 1]],  
        [[1, 0.5], [0.5, 1]],
        [[1, 0.5], [0.5, 1]]
    ], dtype=np.float64)
    
    b = np.array([
        [[5, 0.5], [0.5, 5]],  
        [[5, 0.5], [0.5, 5]],
        [[5, 0.5], [0.5, 5]],
        [[5, 0.5], [0.5, 5]]
    ], dtype=np.float64)
    
    c = np.array([
        [[1, 0.5], [0.5, 1]],  
        [[1, 0.5], [0.5, 1]],
        [[1, 0.5], [0.5, 1]]
    ], dtype=np.float64)

    d = np.array([
        [1, 2], 
        [2, 3], 
        [3, 4], 
        [4, 5]
    ], dtype=np.float64)
    
    x = solve_block_tridiagonal(a, b, c, d)

    # Construct the equivalent full matrix A_full and right-hand side d_full
    A_full = np.block([
        [b[0], c[0], np.zeros((2, 2)), np.zeros((2, 2))],
        [a[0], b[1], c[1], np.zeros((2, 2))],
        [np.zeros((2, 2)), a[1], b[2], c[2]],
        [np.zeros((2, 2)), np.zeros((2, 2)), a[2], b[3]]
    ])
    
    d_full = d.flatten()  # Flatten d for compatibility with the full system

    # Solve using numpy's direct solve for comparison
    x_np = np.linalg.solve(A_full, d_full).reshape((N, 2))
    # Print the solutions and residuals for comparison
    print("Solution x from block tridiagonal solver (TDMA):\n", x,
          "\nResidual:", np.sum(np.abs(A_full @ x.flatten() - d_full)))
    print("Solution x from direct full matrix solver:\n", x_np,
          "\nResidual np:", np.sum(np.abs(A_full @ x_np.flatten() - d_full)))

# Run the test function
test_block_tridiagonal_solver()

Solution

  • It is dangerous to define a and c with a different number of elements from b, as it can lead to wrong indexing in the TDMA.

    If you change the line setting the last element of d_star to the following:

    d_star[-1] = np.linalg.solve(b[-1] - a[-1] @ C_star[-1], d[-1] - a[-1] @ d_star[-2])
    

    this should now work. Note the TWO occurrences of a[-1], NOT a[-2], and the use of C_star[-1] rather than C_star[-2]; these follow from having passed the under-length arrays a and c (N-1 blocks rather than N) into the routine. A way to remove this pitfall entirely is sketched below.
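    One way to avoid this class of off-by-one error altogether is to pad the sub-diagonal with a leading zero block, so that a[i] is always the sub-diagonal block of block row i and the last row needs no special case. A minimal sketch (the padded variant and its name are mine, assuming the same calling convention as the routine above):

    import numpy as np

    def solve_block_tridiagonal_padded(a, b, c, d):
        N = len(b)
        # Pad so a[i] is the sub-diagonal block of block row i (a[0] is unused zeros)
        a = np.concatenate([np.zeros_like(a[:1]), a])
        C_star = np.zeros_like(c)
        d_star = np.zeros_like(d)
        C_star[0] = np.linalg.solve(b[0], c[0])
        d_star[0] = np.linalg.solve(b[0], d[0])
        for i in range(1, N):
            denom = b[i] - a[i] @ C_star[i-1]
            if i < N - 1:  # the last block row has no super-diagonal block
                C_star[i] = np.linalg.solve(denom, c[i])
            d_star[i] = np.linalg.solve(denom, d[i] - a[i] @ d_star[i-1])
        x = np.zeros_like(d)
        x[-1] = d_star[-1]
        for i in range(N - 2, -1, -1):
            x[i] = d_star[i] - C_star[i] @ x[i+1]
        return x

    With the padding, the forward sweep is one uniform loop, and the hand-written special case for the last row (the line that contained the bug) disappears.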