Given a matrix X of dimension D x N, I am interested in computing the eigenvalues of C = np.dot(X, X.T) / N using QR factorization. Based on the following (where V contains the eigenvectors of C), we expect the eigenvalues of C to be np.diag(np.dot(r.T, r)) / N, using:

q, r = np.linalg.qr(np.dot(X.T, V))
lambdas2 = np.diag(np.dot(r.T, r)) / N

However, the values in lambdas2 that I get using the code below are different from the ones in lambdas1:
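As a sanity check of that identity (a minimal sketch, taking V as the eigenvectors of C from np.linalg.eigh rather than from PCA): since X.T @ V = Q R implies R.T @ R = V.T @ X @ X.T @ V = N * V.T @ C @ V, the diagonal of R.T @ R divided by N recovers the eigenvalues of C.

```python
import numpy as np

rng = np.random.default_rng(0)
D, N = 4, 200
X = rng.standard_normal((D, N))

# C = X X^T / N and its eigendecomposition (eigenvalues ascending).
C = X @ X.T / N
eigvals, V = np.linalg.eigh(C)

# QR factorization of the projected data X^T V.
q, r = np.linalg.qr(X.T @ V)

# diag(R^T R) / N recovers the eigenvalues of C (up to ordering).
lambdas = np.diag(r.T @ r) / N
print(np.allclose(np.sort(lambdas), eigvals))
```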
from sklearn.decomposition import PCA
pca = PCA()
pca.fit(X)
lambdas1=pca.explained_variance_
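One detail to keep in mind when comparing against explained_variance_: sklearn divides the squared singular values of the centered data by n_samples - 1, not n_samples, so a formula that divides by N will differ by a factor of (N - 1) / N even when everything else matches. A quick check of this (a sketch, with an arbitrary samples-by-features array):

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(1)
X = rng.standard_normal((100, 5))  # samples x features
pca = PCA().fit(X)

# explained_variance_ equals the squared singular values of the
# centered data divided by (n_samples - 1), not by n_samples.
Xc = X - X.mean(axis=0)
s = np.linalg.svd(Xc, compute_uv=False)
print(np.allclose(pca.explained_variance_, s**2 / (X.shape[0] - 1)))
```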
The full example is:
import numpy as np
from sklearn.decomposition import PCA

if __name__ == "__main__":
    N = 1000
    D = 20
    X = np.random.rand(D, N)
    X_train_mean = X.mean(axis=0)
    X_train_std = X.std(axis=0)
    X_normalized = (X - X_train_mean) / X_train_std

    pca = PCA(n_components=D)
    cov_ = np.cov(X_normalized)  # A D x D array.
    pca.fit(cov_)
    lambdas1 = pca.explained_variance_

    projected_data = np.dot(pca.components_, X_normalized).T  # An N x n_components array.
    q, r = np.linalg.qr(projected_data)
    lambdas2 = np.sort(np.diag(np.dot(r.T, r)) / N)[::-1]
I guess that you need to pass X_normalized.T to the fit method of PCA, not the covariance matrix, because computing the covariance matrix is part of the PCA algorithm, and components_ / explained_variance_ are directly the eigenvectors/eigenvalues of that covariance matrix.
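A minimal check of that claim (a sketch using the same D x N layout as the question, but normalizing per feature, i.e. along axis=1 for a D x N array): fitting PCA on the samples-by-features array X_normalized.T makes explained_variance_ match the eigenvalues of np.cov(X_normalized), since both use the N - 1 denominator.

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
D, N = 20, 1000
X = rng.random((D, N))

# Normalize each feature (row) of the D x N array.
X_normalized = (X - X.mean(axis=1, keepdims=True)) / X.std(axis=1, keepdims=True)

# Fit on samples-by-features data, not on the covariance matrix.
pca = PCA(n_components=D)
pca.fit(X_normalized.T)
lambdas1 = pca.explained_variance_

# Eigenvalues of the covariance matrix, sorted descending, for comparison.
cov_ = np.cov(X_normalized)  # D x D, uses N - 1 denominator like sklearn
eigvals = np.sort(np.linalg.eigvalsh(cov_))[::-1]
print(np.allclose(lambdas1, eigvals))
```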