Tags: scikit-learn, pca

What does `fit_transform` do in the context of Scikit Learn PCA?


I don't understand what fit_transform does compared to fit in the context of Scikit Learn and PCA.

PCA takes some data and attempts to find a set of eigenvectors, where each vector is orthogonal to all the others and aligned in the direction of maximum remaining variance.

Put another way, the first eigenvector found is oriented along the axis of maximal data variance.

What transformation does fit_transform do, and what interpretation does it have in the context of PCA?

In other words, what transformation is being done by the transform step?


Solution

  • In simple terms: fit() learns the principal axes from the training data, transform() projects data onto those axes, and fit_transform() does both in a single call on the same data.

    In practice, scikit-learn’s PCA implementation runs a Singular Value Decomposition (SVD) on the centered X, which yields both the eigenvectors and the projected training data in one step during fit(), so fit_transform() can return the transformed data with no extra work. However, if you have new data to project into the principal-component space, you use the transform() method, which centers that data with the training mean and projects it onto the learned components (see the sketch below).

    Note on scikit-learn's terminology: the eigenvectors are stored in the components_ attribute.
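Here is a minimal sketch (with synthetic NumPy data, assuming scikit-learn and NumPy are installed) showing that fit_transform(X) matches fit(X) followed by transform(X), and that transform() is just centering with the training mean followed by a projection onto components_:

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))   # toy data: 100 samples, 5 features

pca = PCA(n_components=2)

# Option 1: fit, then transform the same data
Z1 = pca.fit(X).transform(X)

# Option 2: fit_transform in one step (same result on the training data)
Z2 = PCA(n_components=2).fit_transform(X)
print(np.allclose(Z1, Z2))      # True

# What transform() does under the hood (whiten=False):
# center with the training mean, then project onto the eigenvectors
Z_manual = (X - pca.mean_) @ pca.components_.T
print(np.allclose(Z1, Z_manual))  # True

# transform() is what you use for *new* data, e.g. a held-out test set
X_new = rng.normal(size=(10, 5))
Z_new = pca.transform(X_new)
```

The key point is that transform() reuses the mean and components learned during fit(), so new data is projected into the same principal-component space as the training data.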