numpy, scipy, linear-regression, svd

Why don't the singular values of SciPy's `linalg.svd` and `linalg.lstsq` match?


SciPy's documentation says that `lstsq` returns the singular values of the observation matrix. But when I compute them directly with a singular value decomposition (using SciPy's own implementation, `scipy.linalg.svd`), I get a different set of values.

The trend is certainly the same for the two, but their minimum and maximum seem to differ. That matters in particular because it changes the condition number estimate. Why are they different?

Here's code to replicate this:

import numpy as np
from scipy.linalg import svd, lstsq
import matplotlib.pyplot as plt


# Let's generate some interesting X
X = np.arange(100*50, dtype=float).reshape(100,50)
X = np.sin(X) + np.tan(X) + np.cos(X)
X += np.random.normal(0,3, size=(100,50))

# And some function which we want to fit
# (for now it doesn't matter)
Y = np.sin(X)

# Let's compute the singular values of the observation matrix X
W, res, rank, s = lstsq(X, Y, cond=0) # cond=0 to deactivate sing-val truncation
_, S, _ = svd(X.T @ X)

# They should match exactly
plt.semilogy(S, label='from svd')
plt.semilogy(s, label='from lstsq')
plt.legend()

[Plot: singular values from svd vs. lstsq on a log scale]


Solution

  • You are computing, on the one hand, the singular values of X and, on the other, the singular values of XᵀX. So they are not the same thing.

    To be more accurate, the results of the second are the squares of the first, hence the factor of two between the curves on the log scale.

    If you want to convince yourself of that, just plot the square root of the svd result:

    _, S, _ = svd(X.T@X)
    plt.semilogy(np.sqrt(S), label='from svd')
    plt.semilogy(s, label='from lstsq')
    

    [Plot: √(singular values of XᵀX) now coincides with the lstsq singular values]

    Or compare with the correct computation:

    _, S, _ = svd(X)
    plt.semilogy(S, label='from svd')
    plt.semilogy(s, label='from lstsq')
    

    (same result)

    In your code, s holds the singular values of X, and S the singular values of X.T@X, so they are different things. The reason one is the square of the other comes from the definition of a singular value: the singular values of X are the square roots of the eigenvalues of XᴴX (here simply XᵀX, since all values are real). So the XᵀX part is already done inside svd.
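
    For example, you can verify that numerically (a minimal sketch, not from the original post; the random test matrix and the use of scipy.linalg.eigvalsh are my own choices):

    import numpy as np
    from scipy.linalg import svd, eigvalsh

    rng = np.random.default_rng(0)
    X = rng.normal(size=(100, 50))          # arbitrary test matrix

    s_x = svd(X, compute_uv=False)          # singular values of X (descending)
    lam = eigvalsh(X.T @ X)[::-1]           # eigenvalues of XᵀX, sorted descending

    print(np.allclose(s_x**2, lam))         # True, up to floating-point error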

    And if λ is an eigenvalue of XᵀX, that is if ∃u≠0 with XᵀXu = λu, then (XᵀX)ᵀ(XᵀX)u = (XᵀX)(XᵀX)u = XᵀX(λu) = λ²u, using the symmetry of XᵀX. So λ² is an eigenvalue of (XᵀX)ᵀ(XᵀX).
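
    A quick numeric illustration of that step (again a sketch with a test matrix of my own, not from the post): the eigenvalues of (XᵀX)ᵀ(XᵀX) come out as the squares of the eigenvalues of XᵀX.

    import numpy as np
    from scipy.linalg import eigvalsh

    rng = np.random.default_rng(0)
    X = rng.normal(size=(100, 50))
    G = X.T @ X                              # Gram matrix, symmetric

    lam    = eigvalsh(G)                     # eigenvalues of XᵀX (ascending)
    lam_sq = eigvalsh(G.T @ G)               # eigenvalues of (XᵀX)ᵀ(XᵀX)

    print(np.allclose(lam**2, lam_sq))       # True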

    So if s is a singular value of X, that is if s² is an eigenvalue of XᵀX, then s⁴ is an eigenvalue of (XᵀX)ᵀ(XᵀX), and therefore s² = √(s⁴) is a singular value of XᵀX.

    So, nothing surprising here: the singular values of XᵀX are the squares of the singular values of X, which is exactly what your graph shows.
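
    As a final check on the question's own setup (a sketch: X is regenerated with the same recipe as in the question, so the exact values differ because of the unseeded random noise), the singular values of XᵀX are the squares of those of X, and consequently the 2-norm condition number of XᵀX is the square of the condition number of X, which is why the condition number estimate looked off:

    import numpy as np
    from scipy.linalg import svd

    X = np.arange(100*50, dtype=float).reshape(100, 50)
    X = np.sin(X) + np.tan(X) + np.cos(X)
    X += np.random.normal(0, 3, size=(100, 50))

    s = svd(X, compute_uv=False)             # singular values of X
    S = svd(X.T @ X, compute_uv=False)       # singular values of XᵀX

    print(np.allclose(S, s**2))              # True: S = s²
    print(np.isclose(S[0] / S[-1], (s[0] / s[-1])**2))  # True: cond(XᵀX) = cond(X)²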