r cluster-analysis ggpubr factoextra

Trouble visualizing k-means clusters with fviz_cluster()


I'm trying to visualize k-means clusters and running into a bit of trouble. I get this error message when I run the code below:

Error in fviz_cluster(res.km, data = nci[, 5], palette = c("#2E9FDF",  : 
  The dimension of the data < 2! No plot.

Here's my code:

library(dplyr)
library(tidyr)
library(ggplot2)
library(tidyverse)
library(hrbrthemes)
library(factoextra)
library(ggpubr)


nci <- read.csv('/Users/KyleHammerberg/Desktop/ML Extra Credit/nci.datanames.csv')

names(nci)[1] <- "gene"

# Compute k-means with k = 3
set.seed(123)
res.km <- kmeans(scale(nci[,2]), 3, nstart = 25)
# Cluster assignment of each observation
res.km$cluster

fviz_cluster(res.km, data = nci[,5 ],
             palette = c("#2E9FDF", "#00AFBB", "#E7B800"), 
             geom = "point",
             ellipse.type = "convex", 
             ggtheme = theme_bw()
)

res.km$cluster
   [1] 1 2 1 2 3 1 1 3 3 3 3 3 1 1 1 3 3 3 1 3 3 3 3 1 1 1 3 3 3 3 1 3 3 1 3 3 1 1 1 1 1 3
  [43] 1 3 3 3 1 1 1 1 3 3 3 3 3 3 3 1 1 3 3 1 1 1 1 1 1 1 3 1 3 1 1 1 3 3 1 2 1 1 3 2 1 3
  [85] 1 1 1 1 1 1 1 2 3 1 1 1 3 3 1 1 1 1 1 1 1 3 2 1 2 1 3 3 1 1 1 1 3 3 1 3 3 3 3 1 1 1
 [127] 3 3 1 3 1 1 1 3 1 1 1 2 2 2 1 2 2 2 3 1 1 3 3 1 3 1 2 1 3 3 3 3 3 3 1 1 3 1 1 3 3 3
 [169] 1 3 3 3 3 1 1 3 1 1 1 1 1 3 1 1 1 1 1 3 1 1 1 1 2 3 3 3 1 3 3 1 1 3 3 1 3 1 1 3 3 1
 [211] 3 1 3 1 3 3 1 3 3 1 1 1 1 3 3 1 3 1 3 3 3 3 1 1 1 1 1 3 3 1 3 1 3 1 3 1 3 1 3 3 3 3
 [253] 3 3 1 3 3 3 3 3 1 2 1 3 1 3 3 1 1 3 1 1 1 1 1 3 1 3 3 3 3 1 1 3 3 1 3 3 1 1 1 3 1 1
 [295] 2 3 1 3 1 3 1 3 1 3 3 3 1 3 3 3 3 3 3 3 1 1 1 1 3 1 1 1 3 1 3 1 1 1 1 3 3 1 3 1 1 1
 [337] 3 1 1 2 1 1 1 1 1 1 3 1 3 3 1 3 1 3 3 1 1 3 3 1 1 1 3 1 1 3 3 1 1 1 1 1 1 1 3 1 3 1
 [379] 1 1 1 1 1 1 1 1 3 3 1 3 1 1 1 2 1 1 1 3 1 1 1 1 1 3 3 1 3 3 3 1 1 1 1 1 1 1 1 1 3 1
 [421] 1 1 1 3 1 3 1 2 1 3 3 3 1 1 1 1 1 1 3 1 1 3 1 1 1 1 1 1 1 3 1 3 3 3 1 1 3 3 1 1 1 3
 [463] 3 3 1 3 3 1 3 3 3 3 1 3 1 1 1 3 1 3 3 3 3 3 3 3 3 3 1 3 1 1 3 3 1 1 3 3 3 3 3 3 3 3
 [505] 3 3 3 1 3 1 3 3 2 1 1 3 3 1 3 3 3 1 1 3 3 3 1 1 1 1 1 3 3 1 3 3 1 1 1 3 3 1 3 3 1 3
 [547] 1 1 1 1 3 3 3 1 3 3 3 3 3 3 1 2 1 1 3 3 3 3 1 1 3 3 3 3 3 1 3 1 1 3 1 3 3 3 3 3 3 3
 [589] 1 1 1 1 1 1 3 1 3 1 3 3 3 3 3 1 3 3 3 3 3 1 1 3 3 3 3 3 3 1 3 1 3 3 3 3 3 3 1 3 3 3
 [631] 3 3 3 3 3 1 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 1 3 1 3 1 3 3 1 3 3 3 1 3
 [673] 1 3 3 1 1 1 3 1 3 3 3 3 1 3 3 1 3 1 1 1 1 3 1 3 1 3 3 3 1 1 1 3 1 1 1 1 3 3 3 3 3 3
 [715] 1 1 1 1 1 1 1 3 1 1 1 3 1 1 3 3 1 1 3 1 1 1 1 1 1 1 3 1 1 1 1 1 1 1 1 3 1 3 1 1 3 3
 [757] 1 1 1 1 1 1 1 3 3 3 3 1 3 1 1 3 1 3 3 1 1 3 3 1 1 1 1 1 1 1 1 1 1 1 1 1 3 3 1 1 1 1
 [799] 1 1 1 1 1 1 1 1 3 1 1 1 1 3 1 1 3 3 1 3 3 1 3 1 3 1 3 1 3 1 3 1 3 1 1 1 1 3 3 1 3 3
 [841] 3 3 3 3 3 3 3 3 3 3 3 1 3 3 3 3 3 1 1 3 3 1 2 1 1 1 3 3 1 3 1 1 1 1 1 1 3 1 3 1 1 1
 [883] 1 1 1 1 1 1 3 1 1 1 1 3 3 1 1 3 3 3 3 3 3 1 1 2 1 3 1 1 1 1 1 1 1 3 1 3 1 3 1 1 1 1
 [925] 1 1 1 3 3 1 1 3 1 1 1 1 1 1 1 1 1 1 3 3 3 3 1 3 3 3 3 3 3 3 1 1 1 3 1 3 1 1 1 1 1 1
 [967] 1 1 1 3 1 1 3 1 3 1 3 1 1 3 1 3 3 3 3 3 3 3 1 3 1 3 3 3 3 1 3 1 1 1
 [ reached getOption("max.print") -- omitted 5830 entries ]
 

Here's a look at the data if that helps:

head(nci)
  gene    CNS    CNS.1  CNS.2     RENAL BREAST  CNS.3  CNS.4 BREAST.1  NSCLC NSCLC.1
1   g1  0.300 0.679961  0.940  2.80e-01  0.485  0.310 -0.830   -0.190  0.460   0.760
2   g2  1.180 1.289961 -0.040 -3.10e-01 -0.465 -0.030  0.000   -0.870  0.000   1.490
3   g3  0.550 0.169961 -0.170  6.80e-01  0.395 -0.100  0.130   -0.450  1.150   0.280
4   g4  1.140 0.379961 -0.040 -8.10e-01  0.905 -0.460 -1.630    0.080 -1.400   0.100
5   g5 -0.265 0.464961 -0.605  6.25e-01  0.200 -0.205  0.075    0.005 -0.005  -0.525
6   g6 -0.070 0.579961  0.000 -1.39e-17 -0.005 -0.540 -0.360    0.350 -0.700   0.360
  RENAL.1 RENAL.2 RENAL.3 RENAL.4 RENAL.5 RENAL.6 RENAL.7 BREAST.2 NSCLC.2 RENAL.8 UNKNOWN
1   0.270  -0.450  -0.030   0.710  -0.360  -0.210  -0.500   -1.060   0.150  -0.290  -0.200
2   0.630  -0.060  -1.120   0.000  -1.420  -1.950  -0.520   -2.190  -0.450   0.000   0.740
3  -0.360   0.150  -0.050   0.160  -0.030  -0.700  -0.660   -0.130  -0.320   0.050   0.080
4  -1.040  -0.610   0.000  -0.770  -2.280  -1.650  -2.610    0.000  -1.610   0.730   0.760
5   0.015  -0.395  -0.285   0.045   0.135  -0.075   0.225   -0.485  -0.095   0.385  -0.105
6  -0.040   0.150  -0.250  -0.160  -0.320   0.060  -0.050   -0.430  -0.080   0.390  -0.080
  OVARIAN MELANOMA PROSTATE OVARIAN.1 OVARIAN.2 OVARIAN.3 OVARIAN.4 OVARIAN.5 PROSTATE.1
1   0.430   -0.490   -0.530    -0.010     0.640    -0.480     0.140     0.640      0.070
2   0.500    0.330   -0.050    -0.370     0.550     0.970     0.720     0.150      0.290
3  -0.730    0.010   -0.230    -0.160    -0.540     0.300    -0.240    -0.170      0.070
4   0.600   -1.660    0.170     0.930    -1.780     0.470     0.000     0.550      1.310
5  -0.635   -0.185    0.825     0.395     0.315     0.425     1.715    -0.205      0.085
6  -0.430   -0.140    0.010    -0.100     0.810     0.020     0.260     0.290     -0.620
  NSCLC.3 NSCLC.4 NSCLC.5 LEUKEMIA K562B.repro X6K562B.repro  LEUKEMIA.1 LEUKEMIA.2
1   0.130   0.320   0.515    0.080       0.410        -0.200 -0.36998050     -0.370
2   2.240   0.280   1.045    0.120       0.000         0.000 -1.38998000      0.180
3   0.640   0.360   0.000    0.060       0.210         0.060 -0.05998047      0.000
4   0.680  -1.880   0.000    0.400       0.180        -0.070  0.07001953     -1.320
5   0.135   0.475   0.330    0.105      -0.255        -0.415 -0.07498047     -0.825
6   0.300   0.110  -0.155   -0.190      -0.110         0.020  0.04001953     -0.130
  LEUKEMIA.3 LEUKEMIA.4 LEUKEMIA.5       COLON COLON.1   COLON.2     COLON.3 COLON.4
1     -0.430     -0.380     -0.550 -0.32003900  -0.620 -4.90e-01  0.07001953  -0.120
2     -0.590     -0.550      0.000  0.08996101   0.080  4.20e-01 -0.82998050   0.000
3     -0.500     -1.710      0.100 -0.29003900   0.140 -3.40e-01 -0.59998050  -0.010
4     -1.520     -1.870     -2.390 -1.03003900   0.740  7.00e-02 -0.90998050   0.130
5     -0.785     -0.585     -0.215  0.09496101   0.205 -2.05e-01  0.24501950   0.555
6      0.520      0.120     -0.620  0.05996101   0.000 -1.39e-17 -0.43998050  -0.550
  COLON.5    COLON.6 MCF7A.repro   BREAST.3 MCF7D.repro BREAST.4     NSCLC.6 NSCLC.7
1  -0.290 -0.8100195       0.200 0.37998050   0.3100195    0.030 -0.42998050   0.160
2   0.030  0.0000000      -0.230 0.44998050   0.4800195    0.220 -0.38998050  -0.340
3  -0.310  0.2199805       0.360 0.65998050   0.9600195    0.150 -0.17998050  -0.020
4   1.500  0.7399805       0.180 0.76998050   0.9600195   -1.240  0.86001950  -1.730
5   0.005  0.1149805      -0.315 0.05498047  -0.2149805   -0.305  0.78501950  -0.625
6  -0.540  0.1199805       0.410 0.54998050   0.3700195    0.050  0.04001953  -0.140
  NSCLC.8 MELANOMA.1 BREAST.5    BREAST.6 MELANOMA.2 MELANOMA.3 MELANOMA.4 MELANOMA.5
1   0.010     -0.620   -0.380  0.04998047      0.650     -0.030     -0.270      0.210
2  -1.280     -0.130    0.000 -0.72001950      0.640     -0.480      0.630     -0.620
3  -0.770      0.200   -0.060  0.41998050      0.150      0.070     -0.100     -0.150
4   0.940     -1.410    0.800  0.92998050     -1.970     -0.700      1.100     -1.330
5  -0.015      1.585   -0.115 -0.09501953     -0.065     -0.195      1.045      0.045
6   0.270      1.160    0.180  0.19998050      0.130      0.410      0.080     -0.400
  MELANOMA.6 MELANOMA.7
1  -5.00e-02      0.350
2   1.40e-01     -0.270
3  -9.00e-02      0.020
4  -1.26e+00     -1.230
5   4.50e-02     -0.715
6  -2.71e-20     -0.340

Solution

  • nci[, 5] selects only one column, but fviz_cluster requires data with at least 2 columns. The check is performed in these lines: https://github.com/kassambara/factoextra/blob/master/R/fviz_cluster.R#L184-L203 .

    Using mtcars as an example:

    Passing a single column as data:

    res.km <- kmeans(scale(mtcars[,2]), 3, nstart = 25)
    factoextra::fviz_cluster(res.km, data = mtcars[,5],
                 palette = c("#2E9FDF", "#00AFBB", "#E7B800"), 
                 geom = "point",
                 ellipse.type = "convex", 
                 ggtheme = theme_bw())
    

    Error in factoextra::fviz_cluster(res.km, data = mtcars[, 5], palette = c("#2E9FDF", : The dimension of the data < 2! No plot.
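
Why a single-column selection counts as one-dimensional can be checked in base R. This is a minimal sketch on mtcars; the variable names (one_col, kept, two_col) are only illustrative. Selecting one column with `[, 5]` drops the data frame to a plain vector, and scaling it gives a one-column matrix, which is fewer than the 2 columns the plot needs.

```r
one_col <- mtcars[, 5]                 # plain numeric vector, no dim attribute
kept    <- mtcars[, 5, drop = FALSE]   # still a data frame, but only one column
two_col <- mtcars[, 5:6]               # data frame with two columns

is.null(dim(one_col))   # TRUE
ncol(scale(one_col))    # 1
ncol(kept)              # 1
ncol(two_col)           # 2
```

Note that `drop = FALSE` keeps the data-frame structure but does not help here, since the result still has a single column.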

    Passing two columns as data:

    factoextra::fviz_cluster(res.km, data = mtcars[,5:6],
                 palette = c("#2E9FDF", "#00AFBB", "#E7B800"), 
                 geom = "point",
                 ellipse.type = "convex", 
                 ggtheme = theme_bw())
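
In the examples above the clusters are computed on one column and plotted against others. A possibly more useful pattern is to cluster and plot the same set of numeric columns. The sketch below does that on mtcars; the column choice (mpg, hp, wt) and the name dat are assumptions for illustration only, not something taken from the question.

```r
# Use the same numeric columns for clustering and for plotting.
dat <- scale(mtcars[, c("mpg", "hp", "wt")])  # illustrative column choice

set.seed(123)
res.km <- kmeans(dat, centers = 3, nstart = 25)

# Sanity checks: at least two columns, one cluster label per row.
stopifnot(ncol(dat) >= 2, length(res.km$cluster) == nrow(dat))

# Plot only if factoextra is installed. With more than two columns,
# fviz_cluster draws the observations on the first two principal components.
if (requireNamespace("factoextra", quietly = TRUE)) {
  p <- factoextra::fviz_cluster(res.km, data = dat,
                                palette = c("#2E9FDF", "#00AFBB", "#E7B800"),
                                geom = "point",
                                ellipse.type = "convex",
                                ggtheme = ggplot2::theme_bw())
  print(p)
}
```

For the data in the question, the equivalent would be something like passing nci[, -1] to both kmeans() and fviz_cluster(), assuming every column except gene is numeric.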
    
