Currently trying to visualize k-means clusters and running into a bit of trouble. I'm getting this error message when I run the code below:
Error in fviz_cluster(res.km, data = nci[, 5], palette = c("#2E9FDF", :
The dimension of the data < 2! No plot.
Here's my code:
library(dplyr)
library(tidyr)
library(ggplot2)
library(tidyverse)
library(hrbrthemes)
library(factoextra)
library(ggpubr)
nci <- read.csv('/Users/KyleHammerberg/Desktop/ML Extra Credit/nci.datanames.csv')
names(nci)[1] <- "gene"
# Compute k-means with k = 3
set.seed(123)
res.km <- kmeans(scale(nci[,2]), 3, nstart = 25)
# K-means clusters showing the group of each individual
res.km$cluster
fviz_cluster(res.km, data = nci[,5 ],
palette = c("#2E9FDF", "#00AFBB", "#E7B800"),
geom = "point",
ellipse.type = "convex",
ggtheme = theme_bw()
)
res.km$cluster
[1] 1 2 1 2 3 1 1 3 3 3 3 3 1 1 1 3 3 3 1 3 3 3 3 1 1 1 3 3 3 3 1 3 3 1 3 3 1 1 1 1 1 3
[43] 1 3 3 3 1 1 1 1 3 3 3 3 3 3 3 1 1 3 3 1 1 1 1 1 1 1 3 1 3 1 1 1 3 3 1 2 1 1 3 2 1 3
[85] 1 1 1 1 1 1 1 2 3 1 1 1 3 3 1 1 1 1 1 1 1 3 2 1 2 1 3 3 1 1 1 1 3 3 1 3 3 3 3 1 1 1
[127] 3 3 1 3 1 1 1 3 1 1 1 2 2 2 1 2 2 2 3 1 1 3 3 1 3 1 2 1 3 3 3 3 3 3 1 1 3 1 1 3 3 3
[169] 1 3 3 3 3 1 1 3 1 1 1 1 1 3 1 1 1 1 1 3 1 1 1 1 2 3 3 3 1 3 3 1 1 3 3 1 3 1 1 3 3 1
[211] 3 1 3 1 3 3 1 3 3 1 1 1 1 3 3 1 3 1 3 3 3 3 1 1 1 1 1 3 3 1 3 1 3 1 3 1 3 1 3 3 3 3
[253] 3 3 1 3 3 3 3 3 1 2 1 3 1 3 3 1 1 3 1 1 1 1 1 3 1 3 3 3 3 1 1 3 3 1 3 3 1 1 1 3 1 1
[295] 2 3 1 3 1 3 1 3 1 3 3 3 1 3 3 3 3 3 3 3 1 1 1 1 3 1 1 1 3 1 3 1 1 1 1 3 3 1 3 1 1 1
[337] 3 1 1 2 1 1 1 1 1 1 3 1 3 3 1 3 1 3 3 1 1 3 3 1 1 1 3 1 1 3 3 1 1 1 1 1 1 1 3 1 3 1
[379] 1 1 1 1 1 1 1 1 3 3 1 3 1 1 1 2 1 1 1 3 1 1 1 1 1 3 3 1 3 3 3 1 1 1 1 1 1 1 1 1 3 1
[421] 1 1 1 3 1 3 1 2 1 3 3 3 1 1 1 1 1 1 3 1 1 3 1 1 1 1 1 1 1 3 1 3 3 3 1 1 3 3 1 1 1 3
[463] 3 3 1 3 3 1 3 3 3 3 1 3 1 1 1 3 1 3 3 3 3 3 3 3 3 3 1 3 1 1 3 3 1 1 3 3 3 3 3 3 3 3
[505] 3 3 3 1 3 1 3 3 2 1 1 3 3 1 3 3 3 1 1 3 3 3 1 1 1 1 1 3 3 1 3 3 1 1 1 3 3 1 3 3 1 3
[547] 1 1 1 1 3 3 3 1 3 3 3 3 3 3 1 2 1 1 3 3 3 3 1 1 3 3 3 3 3 1 3 1 1 3 1 3 3 3 3 3 3 3
[589] 1 1 1 1 1 1 3 1 3 1 3 3 3 3 3 1 3 3 3 3 3 1 1 3 3 3 3 3 3 1 3 1 3 3 3 3 3 3 1 3 3 3
[631] 3 3 3 3 3 1 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 1 3 1 3 1 3 3 1 3 3 3 1 3
[673] 1 3 3 1 1 1 3 1 3 3 3 3 1 3 3 1 3 1 1 1 1 3 1 3 1 3 3 3 1 1 1 3 1 1 1 1 3 3 3 3 3 3
[715] 1 1 1 1 1 1 1 3 1 1 1 3 1 1 3 3 1 1 3 1 1 1 1 1 1 1 3 1 1 1 1 1 1 1 1 3 1 3 1 1 3 3
[757] 1 1 1 1 1 1 1 3 3 3 3 1 3 1 1 3 1 3 3 1 1 3 3 1 1 1 1 1 1 1 1 1 1 1 1 1 3 3 1 1 1 1
[799] 1 1 1 1 1 1 1 1 3 1 1 1 1 3 1 1 3 3 1 3 3 1 3 1 3 1 3 1 3 1 3 1 3 1 1 1 1 3 3 1 3 3
[841] 3 3 3 3 3 3 3 3 3 3 3 1 3 3 3 3 3 1 1 3 3 1 2 1 1 1 3 3 1 3 1 1 1 1 1 1 3 1 3 1 1 1
[883] 1 1 1 1 1 1 3 1 1 1 1 3 3 1 1 3 3 3 3 3 3 1 1 2 1 3 1 1 1 1 1 1 1 3 1 3 1 3 1 1 1 1
[925] 1 1 1 3 3 1 1 3 1 1 1 1 1 1 1 1 1 1 3 3 3 3 1 3 3 3 3 3 3 3 1 1 1 3 1 3 1 1 1 1 1 1
[967] 1 1 1 3 1 1 3 1 3 1 3 1 1 3 1 3 3 3 3 3 3 3 1 3 1 3 3 3 3 1 3 1 1 1
[ reached getOption("max.print") -- omitted 5830 entries ]
Here's a look at the data if that helps:
head(nci)
gene CNS CNS.1 CNS.2 RENAL BREAST CNS.3 CNS.4 BREAST.1 NSCLC NSCLC.1
1 g1 0.300 0.679961 0.940 2.80e-01 0.485 0.310 -0.830 -0.190 0.460 0.760
2 g2 1.180 1.289961 -0.040 -3.10e-01 -0.465 -0.030 0.000 -0.870 0.000 1.490
3 g3 0.550 0.169961 -0.170 6.80e-01 0.395 -0.100 0.130 -0.450 1.150 0.280
4 g4 1.140 0.379961 -0.040 -8.10e-01 0.905 -0.460 -1.630 0.080 -1.400 0.100
5 g5 -0.265 0.464961 -0.605 6.25e-01 0.200 -0.205 0.075 0.005 -0.005 -0.525
6 g6 -0.070 0.579961 0.000 -1.39e-17 -0.005 -0.540 -0.360 0.350 -0.700 0.360
RENAL.1 RENAL.2 RENAL.3 RENAL.4 RENAL.5 RENAL.6 RENAL.7 BREAST.2 NSCLC.2 RENAL.8 UNKNOWN
1 0.270 -0.450 -0.030 0.710 -0.360 -0.210 -0.500 -1.060 0.150 -0.290 -0.200
2 0.630 -0.060 -1.120 0.000 -1.420 -1.950 -0.520 -2.190 -0.450 0.000 0.740
3 -0.360 0.150 -0.050 0.160 -0.030 -0.700 -0.660 -0.130 -0.320 0.050 0.080
4 -1.040 -0.610 0.000 -0.770 -2.280 -1.650 -2.610 0.000 -1.610 0.730 0.760
5 0.015 -0.395 -0.285 0.045 0.135 -0.075 0.225 -0.485 -0.095 0.385 -0.105
6 -0.040 0.150 -0.250 -0.160 -0.320 0.060 -0.050 -0.430 -0.080 0.390 -0.080
OVARIAN MELANOMA PROSTATE OVARIAN.1 OVARIAN.2 OVARIAN.3 OVARIAN.4 OVARIAN.5 PROSTATE.1
1 0.430 -0.490 -0.530 -0.010 0.640 -0.480 0.140 0.640 0.070
2 0.500 0.330 -0.050 -0.370 0.550 0.970 0.720 0.150 0.290
3 -0.730 0.010 -0.230 -0.160 -0.540 0.300 -0.240 -0.170 0.070
4 0.600 -1.660 0.170 0.930 -1.780 0.470 0.000 0.550 1.310
5 -0.635 -0.185 0.825 0.395 0.315 0.425 1.715 -0.205 0.085
6 -0.430 -0.140 0.010 -0.100 0.810 0.020 0.260 0.290 -0.620
NSCLC.3 NSCLC.4 NSCLC.5 LEUKEMIA K562B.repro X6K562B.repro LEUKEMIA.1 LEUKEMIA.2
1 0.130 0.320 0.515 0.080 0.410 -0.200 -0.36998050 -0.370
2 2.240 0.280 1.045 0.120 0.000 0.000 -1.38998000 0.180
3 0.640 0.360 0.000 0.060 0.210 0.060 -0.05998047 0.000
4 0.680 -1.880 0.000 0.400 0.180 -0.070 0.07001953 -1.320
5 0.135 0.475 0.330 0.105 -0.255 -0.415 -0.07498047 -0.825
6 0.300 0.110 -0.155 -0.190 -0.110 0.020 0.04001953 -0.130
LEUKEMIA.3 LEUKEMIA.4 LEUKEMIA.5 COLON COLON.1 COLON.2 COLON.3 COLON.4
1 -0.430 -0.380 -0.550 -0.32003900 -0.620 -4.90e-01 0.07001953 -0.120
2 -0.590 -0.550 0.000 0.08996101 0.080 4.20e-01 -0.82998050 0.000
3 -0.500 -1.710 0.100 -0.29003900 0.140 -3.40e-01 -0.59998050 -0.010
4 -1.520 -1.870 -2.390 -1.03003900 0.740 7.00e-02 -0.90998050 0.130
5 -0.785 -0.585 -0.215 0.09496101 0.205 -2.05e-01 0.24501950 0.555
6 0.520 0.120 -0.620 0.05996101 0.000 -1.39e-17 -0.43998050 -0.550
COLON.5 COLON.6 MCF7A.repro BREAST.3 MCF7D.repro BREAST.4 NSCLC.6 NSCLC.7
1 -0.290 -0.8100195 0.200 0.37998050 0.3100195 0.030 -0.42998050 0.160
2 0.030 0.0000000 -0.230 0.44998050 0.4800195 0.220 -0.38998050 -0.340
3 -0.310 0.2199805 0.360 0.65998050 0.9600195 0.150 -0.17998050 -0.020
4 1.500 0.7399805 0.180 0.76998050 0.9600195 -1.240 0.86001950 -1.730
5 0.005 0.1149805 -0.315 0.05498047 -0.2149805 -0.305 0.78501950 -0.625
6 -0.540 0.1199805 0.410 0.54998050 0.3700195 0.050 0.04001953 -0.140
NSCLC.8 MELANOMA.1 BREAST.5 BREAST.6 MELANOMA.2 MELANOMA.3 MELANOMA.4 MELANOMA.5
1 0.010 -0.620 -0.380 0.04998047 0.650 -0.030 -0.270 0.210
2 -1.280 -0.130 0.000 -0.72001950 0.640 -0.480 0.630 -0.620
3 -0.770 0.200 -0.060 0.41998050 0.150 0.070 -0.100 -0.150
4 0.940 -1.410 0.800 0.92998050 -1.970 -0.700 1.100 -1.330
5 -0.015 1.585 -0.115 -0.09501953 -0.065 -0.195 1.045 0.045
6 0.270 1.160 0.180 0.19998050 0.130 0.410 0.080 -0.400
MELANOMA.6 MELANOMA.7
1 -5.00e-02 0.350
2 1.40e-01 -0.270
3 -9.00e-02 0.020
4 -1.26e+00 -1.230
5 4.50e-02 -0.715
6 -2.71e-20 -0.340
nci[,5] is only a single column, but fviz_cluster requires data with at least 2 columns. This check is performed in these lines: https://github.com/kassambara/factoextra/blob/master/R/fviz_cluster.R#L184-L203
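To see why a single column trips this check, here is a minimal base-R sketch using the built-in mtcars data. The helper has_enough_dims() is hypothetical: it only approximates the condition behind the "dimension of the data < 2" error and is not factoextra's code.

```r
# Hypothetical helper: approximates the "dimension of the data < 2" condition
# reported by fviz_cluster. It is NOT factoextra's source code.
has_enough_dims <- function(data) {
  # as.matrix() turns a bare vector into a one-column matrix
  ncol(as.matrix(data)) >= 2
}

one_col <- mtcars[, 5]                      # single column: dropped to a plain vector
is.null(dim(one_col))                       # TRUE: no dimensions at all
has_enough_dims(one_col)                    # FALSE
has_enough_dims(mtcars[, 5, drop = FALSE])  # FALSE: still only one column
has_enough_dims(mtcars[, 5:6])              # TRUE
```

Note that drop = FALSE does not help here: it keeps a data frame, but that data frame still has only one column.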
Using mtcars as an example:

Passing a single column in data:
library(ggplot2)  # provides theme_bw(); fviz_cluster is called via factoextra::
res.km <- kmeans(scale(mtcars[,2]), 3, nstart = 25)
factoextra::fviz_cluster(res.km, data = mtcars[,5],
palette = c("#2E9FDF", "#00AFBB", "#E7B800"),
geom = "point",
ellipse.type = "convex",
ggtheme = theme_bw())
Error in factoextra::fviz_cluster(res.km, data = mtcars[, 5], palette = c("#2E9FDF", : The dimension of the data < 2! No plot.
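A related point: the clustering itself is one-dimensional, because kmeans() was fit on scale(mtcars[,2]) alone (and, in the question, on scale(nci[,2])). A quick base-R check of the fitted object:

```r
# Reproduces the single-column fit from the example above
set.seed(123)
res.km <- kmeans(scale(mtcars[, 2]), 3, nstart = 25)

dim(res.km$centers)     # 3 1: three centres, each a single number
length(res.km$cluster)  # 32: one cluster label per car
```

So plotting these labels over mtcars[,5:6] removes the error, but those two plotted variables played no part in forming the clusters.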
Passing two columns in data:
factoextra::fviz_cluster(res.km, data = mtcars[,5:6],
palette = c("#2E9FDF", "#00AFBB", "#E7B800"),
geom = "point",
ellipse.type = "convex",
ggtheme = theme_bw())
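If the goal is to see clusters that were actually formed from several variables, one option is to fit kmeans() on a multi-column matrix and pass that same matrix as data. The sketch below uses mtcars columns 1:7 purely as an illustrative assumption; for the question's data the analogue would presumably be scale(nci[, -1]), dropping the gene column. According to the factoextra documentation, fviz_cluster runs a PCA when data has more than two variables and plots the first two principal components; the base-R prcomp() plot here only approximates that view, and the factoextra call runs only if the package is installed.

```r
# Sketch: cluster on several columns, then give fviz_cluster the SAME matrix.
# Using mtcars columns 1:7 is an arbitrary choice for illustration.
set.seed(123)
X <- scale(mtcars[, 1:7])                 # 32 x 7 matrix, centred and scaled
res.km <- kmeans(X, centers = 3, nstart = 25)

# Base-R approximation of a PCA view: scores on the first two components
pcs <- prcomp(X)$x[, 1:2]                 # 32 x 2 matrix (PC1, PC2)
plot(pcs, col = res.km$cluster, pch = 19,
     main = "k-means clusters on PC1/PC2 (base R sketch)")

# The factoextra version, only if the package is available
if (requireNamespace("factoextra", quietly = TRUE)) {
  print(factoextra::fviz_cluster(res.km, data = X,
                                 palette = c("#2E9FDF", "#00AFBB", "#E7B800"),
                                 geom = "point",
                                 ellipse.type = "convex",
                                 ggtheme = ggplot2::theme_bw()))
}
```

Because data is the same matrix the model was fit on, the rows and the cluster labels line up, and every plotted point reflects all seven variables rather than one.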