I have a plot that illustrates projecting points onto the axis of greatest variance (the first principal component). The R code is below, and I need an initial pointer on how to reproduce this plot in ggplot2.
# Simulate data
library(mvtnorm)
set.seed(2014)
Sigma <- matrix(data = c(4, 2, 2, 3), ncol=2)
mu <- c(1, 2)
n <- 20
X <- rmvnorm(n = n, mean = mu, sigma = Sigma)
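If you would rather not depend on mvtnorm, a multivariate normal sample can also be drawn with base R: multiply i.i.d. standard normal noise by the Cholesky factor of Sigma and add the mean. The following is a minimal sketch; the names `n_big` and `X_sim` are illustrative. rmvnorm() factorises Sigma differently by default, so the draws will generally not match the question's for the same seed.

```r
# Sketch: multivariate normal draws with base R only (no mvtnorm).
set.seed(2014)
Sigma <- matrix(c(4, 2, 2, 3), ncol = 2)
mu <- c(1, 2)
n_big <- 1e5  # large sample so the empirical moments are easy to check

# chol() returns upper-triangular R with t(R) %*% R == Sigma, so rows of
# Z %*% R have covariance Sigma when Z holds i.i.d. N(0, 1) values
R <- chol(Sigma)
Z <- matrix(rnorm(n_big * 2), ncol = 2)
X_sim <- sweep(Z %*% R, 2, mu, "+")  # shift each column by its mean

print(round(colMeans(X_sim), 2))
print(round(cov(X_sim), 2))
```

With this many draws, the printed means and covariance should sit close to `mu` and `Sigma`.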
# Run PCA
pca <- princomp(X)
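load <- loadings(pca)
# Slope of each principal axis = 2nd / 1st component of its loading vector
slope <- load[2, ] / load[1, ]
# Every principal axis passes through the centroid (the column means)
cmeans <- colMeans(X)
intercept <- cmeans[2] - (slope * cmeans[1])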
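The first principal axis is the leading eigenvector of the covariance matrix, so its slope can be cross-checked without princomp(). The following sketch assumes base-R simulated data in place of mvtnorm, so the numbers differ from the question's. The names `slope_pca`, `slope_eig`, `v1` and `intercept_eig` are illustrative.

```r
# Sketch: recover the first-axis slope from eigen(cov(X)) and compare
# it with the slope taken from princomp() loadings.
set.seed(2014)
Sigma <- matrix(c(4, 2, 2, 3), ncol = 2)
mu <- c(1, 2)
n <- 20
# base-R stand-in for mvtnorm::rmvnorm (values differ from the question's)
X <- sweep(matrix(rnorm(n * 2), ncol = 2) %*% chol(Sigma), 2, mu, "+")

pca <- princomp(X)
slope_pca <- pca$loadings[2, 1] / pca$loadings[1, 1]

# eigen() sorts eigenvalues in decreasing order; column 1 is the leading axis
v1 <- eigen(cov(X), symmetric = TRUE)$vectors[, 1]
slope_eig <- v1[2] / v1[1]  # the ratio does not depend on the eigenvector's sign

# The axis passes through the centroid, which fixes the intercept
intercept_eig <- mean(X[, 2]) - slope_eig * mean(X[, 1])

print(c(slope_pca = slope_pca, slope_eig = slope_eig, intercept = intercept_eig))
```

princomp() divides the covariance by n rather than n - 1, but that rescaling leaves the eigenvectors, and therefore the slope, unchanged.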
# Plot data & 1st principal component
plot(X, pch = 20, asp = 1)
abline(intercept[1], slope[1])
# Plot perpendicular segments: (x2, y2) is the foot of the perpendicular
# from each point onto the line y = a + b * x
a <- intercept[1]
b <- slope[1]
x2 <- (X[, 1] + b * (X[, 2] - a)) / (1 + b^2)
y2 <- a + b * x2
segments(X[, 1], X[, 2], x2, y2, col = "red")
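The orthogonal projection onto the first principal axis can also be read directly from the PCA output. Each projected point is the centre plus that point's first score times the first loading vector. The following sketch uses base-R simulated data in place of the question's X, and the names `proj`, `v1`, `a` and `b` are illustrative. It checks that the projected points lie on the line and that each segment is perpendicular to it.

```r
# Sketch: orthogonal projection onto PC1 from princomp() scores, with checks.
set.seed(2014)
Sigma <- matrix(c(4, 2, 2, 3), ncol = 2)
mu <- c(1, 2)
n <- 20
X <- sweep(matrix(rnorm(n * 2), ncol = 2) %*% chol(Sigma), 2, mu, "+")

pca <- princomp(X)
v1 <- pca$loadings[, 1]  # unit-length direction of the first axis

# Projection = centre + (score on PC1) * (PC1 direction)
proj <- matrix(pca$center, n, 2, byrow = TRUE) + outer(pca$scores[, 1], v1)

# Line through the centroid with slope v1[2] / v1[1]
b <- v1[2] / v1[1]
a <- pca$center[2] - b * pca$center[1]

# Each projected point lies on y = a + b * x ...
on_line <- max(abs(proj[, 2] - (a + b * proj[, 1])))
# ... and the residual X - proj is orthogonal to the axis direction
orthogonal <- max(abs((X - proj) %*% v1))

print(c(on_line = on_line, orthogonal = orthogonal))
```

Both printed values should be at floating-point noise level. The first column of `proj` agrees with the closed form `(x + b * (y - a)) / (1 + b^2)` for the foot of the perpendicular.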
Put your matrix X and vectors x2 and y2 in one data frame.
df <- data.frame(X, x2, y2)
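The next step depends on data.frame() naming the two columns of an unnamed matrix X1 and X2, while plain vectors keep their variable names. The following is a tiny sketch with made-up values, not the question's data, that shows the resulting names.

```r
# Sketch: column names produced by data.frame() for an unnamed matrix
# followed by two plain vectors (illustrative values only).
X <- matrix(c(1, 2, 3, 4, 5, 6), ncol = 2)  # no column names, like the question's X
x2 <- c(0.5, 1.5, 2.5)
y2 <- c(4.5, 5.5, 6.5)

df <- data.frame(X, x2, y2)
print(names(df))
# [1] "X1" "X2" "x2" "y2"
```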
Then map columns X1 and X2 to the x and y values, and x2 and y2 to xend= and yend=. Points are added with geom_point(), the line with geom_abline() (the counterpart of abline()), and the segments with geom_segment(). coord_fixed() makes one unit on the x axis the same length as one unit on the y axis, like asp = 1 in plot().
library(ggplot2)
ggplot(df, aes(X1, X2)) +
  geom_point() +
  geom_abline(intercept = intercept[1], slope = slope[1]) +
  geom_segment(aes(xend = x2, yend = y2), color = "red") +
  coord_fixed()
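The following self-contained sketch builds the whole figure in one go, keeping the segment endpoints in the data frame next to the points. It assumes ggplot2 is installed, replaces mvtnorm with a base-R simulation (so the points differ from the question's), and uses the orthogonal projection. The column names `x`, `y`, `xend` and `yend` are illustrative.

```r
# Sketch: self-contained ggplot2 version of the projection plot.
library(ggplot2)

set.seed(2014)
Sigma <- matrix(c(4, 2, 2, 3), ncol = 2)
mu <- c(1, 2)
n <- 20
X <- sweep(matrix(rnorm(n * 2), ncol = 2) %*% chol(Sigma), 2, mu, "+")

# First principal axis: slope from the loading vector, through the centroid
pca <- princomp(X)
v1 <- pca$loadings[, 1]
b <- v1[2] / v1[1]
a <- pca$center[2] - b * pca$center[1]

# Foot of the perpendicular from each point onto y = a + b * x
df <- data.frame(x = X[, 1], y = X[, 2])
df$xend <- (df$x + b * (df$y - a)) / (1 + b^2)
df$yend <- a + b * df$xend

p <- ggplot(df, aes(x, y)) +
  geom_point() +
  geom_abline(intercept = a, slope = b) +
  geom_segment(aes(xend = xend, yend = yend), colour = "red") +
  coord_fixed()  # 1 unit on x has the same length as 1 unit on y

print(p)
```

Putting `xend` and `yend` into the data frame keeps every aesthetic in one place. This is convenient if you later facet or colour the segments by another variable.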