Tags: r, plot, ggplot2, principal-components

PCA projection plot with ggplot2


I have a plot that illustrates projecting points onto the axis of greatest variance (the first principal component). The R code is pasted below; I need an initial pointer on how to reproduce this plot in ggplot2.

Figure-1

# Simulate data
library(mvtnorm)
set.seed(2014)
Sigma <- matrix(data = c(4, 2, 2, 3), ncol=2)
mu <- c(1, 2)
n <- 20
X <- rmvnorm(n = n, mean = mu, sigma = Sigma)

# Run PCA
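pca <- princomp(X)
load <- loadings(pca)
# Slope of each principal axis: ratio of its y-loading to its x-loading
slope <- load[2, ] / load[1, ]
cmeans <- colMeans(X)
# Each principal axis passes through the centroid of the data
intercept <- cmeans[2] - (slope * cmeans[1])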

# Plot data & 1st principal component
plot(X, pch = 20, asp = 1)
abline(intercept[1], slope[1])

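# Plot perpendicular segments
# (x2, y2) is the foot of the perpendicular from each point onto the line
# y = intercept[1] + slope[1] * x; the midpoint of the horizontal and vertical
# intersections would only be perpendicular when abs(slope[1]) == 1
x2 <- (X[, 1] + slope[1] * (X[, 2] - intercept[1])) / (1 + slope[1]^2)
y2 <- intercept[1] + slope[1] * x2
segments(X[, 1], X[, 2], x2, y2, col = "red")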

Solution

  • Put your matrix X and vectors x2 and y2 in one data frame.

    df <- data.frame(X, x2, y2)
    

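    Then map columns X1 and X2 (the names data.frame() gives to the unnamed columns of the matrix X) to x and y, and x2 and y2 to xend= and yend=. Add the points with geom_point(), the line with geom_abline(), and the segments with geom_segment(). coord_fixed() fixes the aspect ratio at 1, so one unit on the x axis is as long as one unit on the y axis, matching asp = 1 in the base plot.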

    library(ggplot2)
    ggplot(df, aes(X1, X2)) +
      geom_point() +
      geom_abline(intercept = intercept[1], slope = slope[1]) +
      geom_segment(aes(xend = x2, yend = y2), color = "red") +
      coord_fixed()
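
The segment endpoints can also be computed directly from the PCA loadings rather than from the slope and intercept. The following is a minimal sketch, not the answerer's code. It assumes a base-R simulation via a Cholesky factor (so the draws differ from the mvtnorm data above), and the names `v`, `center`, `scores`, and `proj` are illustrative. Each point is centered, projected onto the unit-length first loading vector, and mapped back, which should give the foot of the perpendicular on the first principal axis for any slope.

```r
library(ggplot2)

# Simulate correlated 2-D data with base R only (assumption: a Cholesky
# factor replaces mvtnorm::rmvnorm, so the draws differ from the question's)
set.seed(2014)
Sigma <- matrix(c(4, 2, 2, 3), ncol = 2)
mu <- c(1, 2)
n <- 20
Z <- matrix(rnorm(2 * n), ncol = 2)
X <- sweep(Z %*% chol(Sigma), 2, mu, "+")

# First principal axis: unit-length direction vector and data centroid
pca <- princomp(X)
v <- unname(unclass(loadings(pca))[, 1])
center <- unname(colMeans(X))

# Orthogonal projection: center, take the score along v, map back
centered <- sweep(X, 2, center, "-")
scores <- as.vector(centered %*% v)
proj <- sweep(outer(scores, v), 2, center, "+")

# Line through the centroid in the direction of v
slope1 <- v[2] / v[1]
intercept1 <- center[2] - slope1 * center[1]

df <- data.frame(x = X[, 1], y = X[, 2],
                 xend = proj[, 1], yend = proj[, 2])

p <- ggplot(df, aes(x, y)) +
  geom_point() +
  geom_abline(intercept = intercept1, slope = slope1) +
  geom_segment(aes(xend = xend, yend = yend), color = "red") +
  coord_fixed()
print(p)
```

The segments only look square to the line when the aspect ratio is 1, which is why coord_fixed() is kept in this variant as well.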
    
