I am using the k-means algorithm in R to separate my data into clusters. I would like to plot the results with ggplot, which I managed to do; however, the results look different in ggplot and in cluster::clusplot.
So I wanted to ask what I am missing. For example, I know that the scaling is different, but I was wondering why all points are inside the bounds when using clusplot, while with ggplot they are not.
Is it just because of the scaling?
Are the two results below exactly the same?
library(cluster)
library(ggfortify)
x <- rbind(matrix(rnorm(2000, sd = 123), ncol = 2),
           matrix(rnorm(2000, mean = 800, sd = 123), ncol = 2))
colnames(x) <- c("x", "y")
x <- data.frame(x)
A <- kmeans(x, centers = 3, nstart = 50, iter.max = 500)
cluster::clusplot(cbind(x$x, x$y), A$cluster, color = T, shade = T)
autoplot(kmeans(x, centers = 3, nstart = 50, iter.max = 500), data = x, frame.type = 'norm')
For me, I get the same plot using either clusplot or ggplot. But with ggplot, you first have to run a PCA on your data in order to get the same plot as clusplot. Maybe that is where your issue lies.
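As a quick sanity check on the PCA step, here is a minimal sketch (the seed and the smaller sample size are my own choices, not part of the question). It suggests that princomp on two variables only recentres and rotates the data, so the pairwise distances that k-means relies on stay the same and only the axes of the plot change:

```r
# Assumption: arbitrary seed and a smaller sample, only to keep the sketch fast
set.seed(1)
x <- rbind(matrix(rnorm(200, sd = 123), ncol = 2),
           matrix(rnorm(200, mean = 800, sd = 123), ncol = 2))
colnames(x) <- c("x", "y")

# Covariance-based PCA: both components are kept, nothing is rescaled
pca_x <- princomp(x)
scores <- pca_x$scores

# Compare all pairwise Euclidean distances before and after the PCA
same_dist <- isTRUE(all.equal(as.vector(dist(x)), as.vector(dist(scores))))
print(same_dist)
```

If this prints TRUE, the cluster memberships found on the raw data apply unchanged to the PCA scores.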
Here, with your example, I did:
x <- rbind(matrix(rnorm(2000, sd = 123), ncol = 2),
           matrix(rnorm(2000, mean = 800, sd = 123), ncol = 2))
colnames(x) <- c("x", "y")
x <- data.frame(x)
A <- kmeans(x, centers = 3, nstart = 50, iter.max = 500)
cluster::clusplot(cbind(x$x, x$y), A$cluster, color = T, shade = T)
library(ggplot2)
# Plot the k-means clusters on the first two principal components
pca_x <- princomp(x)
x_cluster <- data.frame(pca_x$scores, A$cluster)
ggplot(x_cluster, aes(x = Comp.1, y = Comp.2, color = as.factor(A.cluster), fill = as.factor(A.cluster))) +
  geom_point() +
  stat_ellipse(type = "t", geom = "polygon", alpha = 0.4)
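On the question of why some points fall outside the ellipses in ggplot: as far as I understand the documentation, clusplot by default (span = TRUE) draws, for each cluster, the smallest ellipse that contains all of its points, whereas stat_ellipse draws a confidence ellipse (level = 0.95 by default) under a t or normal assumption, so a few points are expected to lie outside. Below is a hedged sketch of how one might draw spanning ellipses in ggplot using cluster::ellipsoidhull. The seed, the smaller sample, and the names km, scores and hulls are my own choices for illustration:

```r
library(cluster)
library(ggplot2)

# Assumption: arbitrary seed and a smaller sample, only to keep the sketch fast
set.seed(1)
x <- rbind(matrix(rnorm(200, sd = 123), ncol = 2),
           matrix(rnorm(200, mean = 800, sd = 123), ncol = 2))
colnames(x) <- c("x", "y")
km <- kmeans(x, centers = 3, nstart = 50, iter.max = 500)

# PCA scores plus the cluster label of each observation
scores <- as.data.frame(princomp(x)$scores)
scores$cluster <- factor(km$cluster)

# For each cluster, compute the smallest enclosing ellipse and
# sample points along its boundary
hulls <- do.call(rbind, lapply(levels(scores$cluster), function(k) {
  pts <- as.matrix(scores[scores$cluster == k, c("Comp.1", "Comp.2")])
  boundary <- predict(ellipsoidhull(pts))
  data.frame(Comp.1 = boundary[, 1],
             Comp.2 = boundary[, 2],
             cluster = factor(k, levels = levels(scores$cluster)))
}))

# Points coloured by cluster, with one spanning ellipse per cluster
p <- ggplot(scores, aes(x = Comp.1, y = Comp.2, colour = cluster)) +
  geom_point() +
  geom_path(data = hulls)
print(p)
```

With this variant the ellipses should enclose every point of their cluster, which is closer to what clusplot shows than the confidence ellipses from stat_ellipse.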
I hope this helps you figure out why your plots differ.