
K-means clustering in ggplot


I am using the k-means algorithm in R to separate data points into clusters. I would like to plot the results with ggplot, which I was able to manage; however, the results seem to differ between ggplot and cluster::clusplot.

So I wanted to ask what I am missing. For example, I know that the scaling is different, but I was wondering why all the points are inside the bounds when using clusplot, while with ggplot they are not.

Is it just because of the scaling?

So are the two results below exactly the same?

library(cluster)
library(ggfortify)


x <- rbind(matrix(rnorm(2000, sd = 123), ncol = 2),
           matrix(rnorm(2000, mean = 800, sd = 123), ncol = 2))
colnames(x) <- c("x", "y")
x <- data.frame(x)

A <- kmeans(x, centers = 3, nstart = 50, iter.max = 500)
cluster::clusplot(cbind(x$x, x$y), A$cluster, color = TRUE, shade = TRUE)
# Reuse the fitted object A so both plots show the same clustering
autoplot(A, data = x, frame.type = 'norm')

Solution

  • For me, clusplot and ggplot give the same plot. To get it with ggplot, though, you first have to run a PCA on your data, because that is what clusplot plots. Maybe that is where your issue comes from.

    Here, with your example, I did:

    library(ggplot2)

    x <- rbind(matrix(rnorm(2000, sd = 123), ncol = 2),
               matrix(rnorm(2000, mean = 800, sd = 123), ncol = 2))
    colnames(x) <- c("x", "y")
    x <- data.frame(x)
    
    A <- kmeans(x, centers = 3, nstart = 50, iter.max = 500)
    cluster::clusplot(cbind(x$x, x$y), A$cluster, color = TRUE, shade = TRUE)
    
    # Project the data onto its principal components, then plot the scores
    pca_x <- princomp(x)
    x_cluster <- data.frame(pca_x$scores, cluster = as.factor(A$cluster))
    ggplot(x_cluster, aes(x = Comp.1, y = Comp.2, color = cluster, fill = cluster)) +
      geom_point() +
      stat_ellipse(type = "t", geom = "polygon", alpha = 0.4)
    

    The plot using clusplot: [image]

    And the one using ggplot: [image]

    Hope this helps you figure out why your plots differ.
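
On the question of why every point sits inside the bounds in clusplot but not in ggplot: with its default `span = TRUE`, clusplot draws the smallest ellipse that contains all points of a cluster, whereas `stat_ellipse` draws a confidence ellipse at `level = 0.95` by default, so a few percent of the points are expected to fall outside it. Here is a minimal sketch of that second effect in base R. The data are simulated, the names `pts`, `d2` and `inside` are only illustrative, and the chi-squared cutoff is a large-sample approximation rather than the exact boundary ggplot2 computes.

```r
set.seed(42)

# Simulated 2-D cluster (illustrative data, not the data from the question)
pts <- matrix(rnorm(2000, sd = 123), ncol = 2)

# Squared Mahalanobis distance of each point from the cluster centre
d2 <- mahalanobis(pts, center = colMeans(pts), cov = cov(pts))

# Approximate 95% normal ellipse: the contour where d2 equals the
# chi-squared quantile with 2 degrees of freedom
inside <- d2 <= qchisq(0.95, df = 2)

# Fraction of points inside the ellipse: close to 0.95, not 1
mean(inside)
```

So even with identical PCA coordinates, the two kinds of ellipse will not cover the same points: one is a covering ellipse, the other a confidence region.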