I have a matrix that I know is triangular, and I want to diagonalize it. Since the eigenvalues of a triangular matrix are easy to find (they are just the diagonal entries), this problem is theoretically exactly solvable, so I figured that using numpy.linalg.eig might be overkill and might be slower or less accurate than it could be. That is, of course, unless eig is smart enough to work efficiently with a triangular matrix, but I couldn't find any documentation that assures this.
Technically I COULD use numpy.linalg.solve to find the eigenvectors by solving (A - lambda I)x = 0 for each eigenvalue lambda, but I have a decent number of matrices (~200), each with a decent number of eigenvalues (also ~200), so I suspect this wouldn't be optimal either. A rough sketch of what I mean is below.
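For concreteness, here is roughly the per-eigenvalue calculation I have in mind, as a sketch only: since A - lambda I is singular, numpy.linalg.solve can't be applied to it directly, so this version back-substitutes instead, and it assumes an upper triangular A with distinct diagonal entries:

import numpy as np

def upper_tri_eig(A):
    """Sketch: eigenvalues/eigenvectors of an upper triangular A with
    distinct diagonal entries, by per-eigenvalue back substitution.
    No safeguards against overflow or nearly-equal eigenvalues."""
    n = A.shape[0]
    lam = np.diag(A).copy()
    V = np.zeros((n, n), dtype=np.result_type(A.dtype, np.float64))
    for k in range(n):
        V[k, k] = 1.0
        # Solve rows k-1 .. 0 of (A - lam[k] * I) @ x = 0 for x[j],
        # given x[k] = 1 and x[j] = 0 for j > k.
        for j in range(k - 1, -1, -1):
            s = A[j, j + 1:k + 1] @ V[j + 1:k + 1, k]
            V[j, k] = -s / (A[j, j] - lam[k])
        V[:, k] /= np.linalg.norm(V[:, k])
    return lam, V

Each eigenvector costs O(n^2) here, so the whole thing is O(n^3) in pure Python loops, which is part of why I doubt it beats a library call.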
> Since the eigenvalues of a triangular matrix are easy to find (just the diagonal values), this problem is theoretically exactly solvable
I don't know if that really helps you. All of the ways I can think of for efficiently calculating eigenvectors also give you the eigenvalues for free.
Honestly, I would suggest just using numpy.linalg.eig(A). My testing showed that it runs about 10x-20x faster when A is upper triangular than when A is a full (dense) matrix, so it already benefits from the structure.
Reading the source code, the LAPACK driver (as built in OpenBLAS) appears to run the QR algorithm to reduce the matrix to upper triangular Schur form, then use the strevc3_ LAPACK routine to find the eigenvectors from that triangular form. That QR-based reduction is going to be very fast for a matrix that is already upper triangular.
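One way to see why: a matrix that is already upper triangular is its own QR factorization (with Q = I), so LAPACK's Householder reflections degenerate to the identity and the reduction has almost nothing to do. A quick sanity check of that claim:

import numpy as np
A = np.triu(np.random.normal(size=(5, 5)))
Q, R = np.linalg.qr(A)
# For triangular input, Householder QR should be a no-op:
print(np.allclose(Q, np.eye(5)), np.allclose(R, A))  # expect: True True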
Benchmark:
# Run in IPython/Jupyter: %timeit is an IPython magic.
import numpy as np
import scipy.linalg

np.random.seed(42)

for N in [100, 1000]:
    print(f"trying matrices of size {N}")

    print("full matrix")
    A = np.random.normal(size=(N, N))
    %timeit np.linalg.eig(A)
    %timeit np.linalg.qr(A)
    %timeit scipy.linalg.lu_factor(A)

    print("upper tri")
    A = np.triu(np.random.normal(size=(N, N)))
    %timeit np.linalg.eig(A)
    %timeit np.linalg.qr(A)
    %timeit scipy.linalg.lu_factor(A)
Results (within each section, the three timings correspond to eig, qr, and lu_factor, in that order):
trying matrices of size 100
full matrix
11.7 ms ± 56.8 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
737 µs ± 1.55 µs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)
91.7 µs ± 854 ns per loop (mean ± std. dev. of 7 runs, 10,000 loops each)
upper tri
438 µs ± 1.39 µs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)
94.5 µs ± 1.78 µs per loop (mean ± std. dev. of 7 runs, 10,000 loops each)
81.6 µs ± 1.36 µs per loop (mean ± std. dev. of 7 runs, 10,000 loops each)
trying matrices of size 1000
full matrix
1.31 s ± 22.3 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
94 ms ± 393 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
18.4 ms ± 136 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
upper tri
157 ms ± 1.8 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
80.1 ms ± 946 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
16.7 ms ± 425 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
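If accuracy is the concern, you can also check eig against the exact answer that the triangular structure gives you (using the upper triangular A from the benchmark above): the returned eigenvalues should match the diagonal of A, and the residual of the eigenvector equation should be small relative to the norm of A.

lam, V = np.linalg.eig(A)
# Eigenvalues of a triangular matrix are exactly its diagonal entries:
print(np.max(np.abs(np.sort(lam) - np.sort(np.diag(A)))))
# Residual of A @ v = lam * v over all eigenpairs (columns of V):
print(np.max(np.abs(A @ V - V * lam)))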
As an alternative, you might think about calling strevc3_ directly. However, I couldn't find a simple way to do this. Although both SciPy and NumPy include a strevc3_ (single precision) or dtrevc3 (double precision) routine internally, neither exposes a Python-accessible wrapper for it.