machine-learning, pca, principal-components

1st principal component of 3 points on a line


I am a little confused about the first principal direction. Say I have three points in two-dimensional Euclidean space, (1,1), (2,2), and (3,3), and I want to calculate the first principal component.

First I see that the center is (2,2), so I move all points so that the center sits at the origin. Now (2,2) becomes (0,0), (1,1) becomes (-1,-1), and (3,3) becomes (1,1). This is the mean shift. I know from MATLAB that the first principal component is transpose((sqrt(2)/2, sqrt(2)/2)). But how is this calculated, and what does it mean?

Do you compute the covariance matrix, then find its eigenvalues, then the eigenvectors? Is the eigenvector the direction? Do you then normalize it?

So after the mean shift the points are (-1,-1), (0,0), and (1,1). We now compute the covariance matrix

[c(x,x) c(x,y); c(y,x) c(y,y)]

which is [0 1; 0 1]. We then take the largest eigenvalue, 1, and compute its eigenvector, which is [1;1]. Do we then normalize, i.e. divide by sqrt(1^2 + 1^2)?


Solution

  • The steps you describe are correct, but your covariance matrix is wrong. The mean shift is fine. Since the data is 2D, the covariance matrix describes how the two dimensions vary together, and each of its four entries is computed from all six centered values: (-1,0,1) along the x axis and (-1,0,1) along the y axis. So [0 1; 0 1] cannot be correct: a covariance matrix is symmetric, and here the x and y values are identical, so all four entries must be equal.

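    Once we have the covariance matrix, we can use the svd function in MATLAB to get its eigenvectors and eigenvalues (for a symmetric positive semidefinite matrix such as a covariance matrix, the singular value decomposition coincides with the eigendecomposition). The eigenvector with the largest eigenvalue is the direction of greatest variance, i.e. the first principal component, and svd returns it with unit length, so no separate normalization is needed. Together, the eigenvectors form a new orthonormal basis for the data: if you multiply the transposed eigenvector matrix by the centered data, you get the representation of the data in the new coordinate system.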

    The following MATLAB code illustrates this.

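    clear;
    % Original data: one point per row
    x = [1,1; 2,2; 3,3];
    x = x';                                     % one point per column (2 x 3)
    x = x - repmat(mean(x, 2), 1, size(x, 2));  % mean shift: subtract the mean of each dimension
    figure('name','original data')
    plot(x(1,:), x(2,:), '*')                   % centered data
    axis([-5 5 -5 5])
    % Covariance matrix with 1/n normalization, here [2/3 2/3; 2/3 2/3]
    sigma = x * x' / size(x, 2);
    % sigma is symmetric positive semidefinite, so the columns of U are its
    % eigenvectors and diag(S) its eigenvalues, sorted in descending order
    [U, S, V] = svd(sigma);
    pc1 = U(:,1);   % first principal component: [sqrt(2)/2; sqrt(2)/2] up to sign
    % PCA rotation: coordinates of the data in the eigenvector basis
    xRot = U' * x;
    figure('name','PCA data rotation')
    plot(xRot(1,:), xRot(2,:), '*')
    axis([-5 5 -5 5])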
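
    For reference, the same calculation can be worked by hand. This is a sketch that assumes the 1/n normalization used in the code above; the symbols X (centered data, one point per column), Σ, λ and v are introduced here only for illustration.

    ```latex
    % Centered data and covariance matrix (1/n normalization, n = 3)
    \[
    X = \begin{pmatrix} -1 & 0 & 1 \\ -1 & 0 & 1 \end{pmatrix},
    \qquad
    \Sigma = \frac{1}{3} X X^{\top}
           = \frac{1}{3} \begin{pmatrix} 2 & 2 \\ 2 & 2 \end{pmatrix}
    \]
    % Eigenvalues from the characteristic polynomial
    \[
    \det(\Sigma - \lambda I)
      = \left(\tfrac{2}{3} - \lambda\right)^{2} - \left(\tfrac{2}{3}\right)^{2}
      = \lambda \left(\lambda - \tfrac{4}{3}\right) = 0
    \quad\Longrightarrow\quad
    \lambda_1 = \tfrac{4}{3}, \;\; \lambda_2 = 0
    \]
    % Eigenvector for the largest eigenvalue, then normalization
    \[
    (\Sigma - \lambda_1 I)\, v = 0
    \;\Longrightarrow\;
    v \propto \begin{pmatrix} 1 \\ 1 \end{pmatrix},
    \qquad
    \frac{v}{\lVert v \rVert}
      = \frac{1}{\sqrt{1^{2} + 1^{2}}} \begin{pmatrix} 1 \\ 1 \end{pmatrix}
      = \begin{pmatrix} \sqrt{2}/2 \\ \sqrt{2}/2 \end{pmatrix}
    \]
    % Coordinates of the three points along the first principal component
    \[
    \frac{v^{\top}}{\lVert v \rVert}\, X
      = \begin{pmatrix} -\sqrt{2} & 0 & \sqrt{2} \end{pmatrix}
    \]
    ```

    The second eigenvalue is 0 because the points lie exactly on a line, so there is no variance left in the orthogonal direction. With the 1/(n-1) normalization instead (the default of MATLAB's cov), the matrix becomes [1 1; 1 1] and the eigenvalues become 2 and 0, but the eigenvectors are the same. The sign of the eigenvector is arbitrary, so software may return either (sqrt(2)/2, sqrt(2)/2) or its negative.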