I have this code that finds the best fit line using SVD, but I want to know how to do the same thing using PCA. Everything I find online says the two are related, but not how they are related, or how their code differs when doing the exact same task.
I just want to see how PCA does this differently than SVD.
import numpy as np
points = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
# Centering the points
mean_point = np.mean(points, axis=0)
centered_points = points - mean_point
# Calculating the covariance matrix
covariance_matrix = np.cov(centered_points, rowvar=False)
# Performing the SVD
U, s, V = np.linalg.svd(covariance_matrix)
# The first column of U is the direction of maximal variance,
# i.e. the direction of the best fit line (not its normal)
direction = U[:, 0]
print("Best fit line:", direction)
tl;dr: SVD and PCA are often used as synonyms.
While Singular Value Decomposition refers to a specific mathematical operation (a matrix factorization, strictly speaking), Principal Component Analysis is more loosely defined: it is a method for finding linearly independent directions of maximum variability in the high-dimensional space where the data lives. One way to achieve this is to perform an SVD of the (centered) dataset matrix, which is why the two terms are used as synonyms in some scientific communities.
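As a sketch of that equivalence, here is an SVD applied to the centered data matrix itself, with no covariance matrix in sight, using the same toy data as the question (variable names are my own):

```python
import numpy as np

points = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]], dtype=float)
centered = points - points.mean(axis=0)

# SVD of the centered data matrix directly -- no covariance matrix needed.
# The rows of Vt are the principal directions, ordered by explained variance.
U, s, Vt = np.linalg.svd(centered, full_matrices=False)
direction = Vt[0]

# The singular values of the data relate to the covariance eigenvalues via
# eigenvalue_i = s_i**2 / (n - 1)
explained_variance = s**2 / (len(points) - 1)
print("Best fit line:", direction)
```

This gives the same direction (up to sign) as SVD-ing the covariance matrix, and for larger datasets it is numerically preferable, since the covariance matrix squares the condition number of the problem.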
Regarding your question: The line
U, s, V = np.linalg.svd(covariance_matrix)
performs an SVD, while the lines
# Centering the points
mean_point = np.mean(points, axis=0)
centered_points = points - mean_point
# Calculating the covariance matrix
covariance_matrix = np.cov(centered_points, rowvar=False)
# Performing the SVD
U, s, V = np.linalg.svd(covariance_matrix)
perform a PCA, since PCA operates on a zero-mean (centered) data matrix. In other words, your code already is a PCA, implemented via an SVD of the covariance matrix; the textbook formulation uses an eigendecomposition of the covariance matrix instead, and both yield the same principal directions.
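For comparison, the classic eigendecomposition route looks like this on the same toy data; since the covariance matrix is symmetric, its SVD and its eigendecomposition coincide, so the result matches your code up to sign (variable names are my own):

```python
import numpy as np

points = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]], dtype=float)
centered = points - points.mean(axis=0)
covariance = np.cov(centered, rowvar=False)

# Classic PCA: eigendecompose the (symmetric) covariance matrix.
# eigh returns eigenvalues in ascending order, so the last column of
# eigvecs is the direction of maximum variance.
eigvals, eigvecs = np.linalg.eigh(covariance)
direction = eigvecs[:, -1]
print("Best fit line:", direction)
```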