I'm trying to apply K-means clustering to some data to find the optimal K and illustrate the process graphically, but I'm having trouble. I think it might have something to do with the structure of my data, but I'm very new to all of this.
Here's my code:
nci <- read.csv('/Users/myname/Desktop/ML/nci.datanames.csv')
names(nci)[1] <- "gene"
kmeans(nci, 10)
> head(nci)
gene CNS CNS.1 CNS.2 RENAL BREAST CNS.3 CNS.4 BREAST.1 NSCLC NSCLC.1
1 g1 0.300 0.679961 0.940 2.80e-01 0.485 0.310 -0.830 -0.190 0.460 0.760
2 g2 1.180 1.289961 -0.040 -3.10e-01 -0.465 -0.030 0.000 -0.870 0.000 1.490
3 g3 0.550 0.169961 -0.170 6.80e-01 0.395 -0.100 0.130 -0.450 1.150 0.280
4 g4 1.140 0.379961 -0.040 -8.10e-01 0.905 -0.460 -1.630 0.080 -1.400 0.100
5 g5 -0.265 0.464961 -0.605 6.25e-01 0.200 -0.205 0.075 0.005 -0.005 -0.525
6 g6 -0.070 0.579961 0.000 -1.39e-17 -0.005 -0.540 -0.360 0.350 -0.700 0.360
I get this error message:
Error in do_one(nmeth) : NA/NaN/Inf in foreign function call (arg 1)
In addition: Warning message:
In storage.mode(x) <- "double" : NAs introduced by coercion
How can I resolve it?
To reproduce your error, you can add a character column to a numeric data frame:
lala <- mtcars
lala$a <- rep_len(LETTERS, nrow(mtcars))  # recycle LETTERS, since mtcars has more than 26 rows
kmeans(lala, 3)
Error in do_one(nmeth) : NA/NaN/Inf in foreign function call (arg 1)
In addition: Warning message:
In storage.mode(x) <- "double" : NAs introduced by coercion
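To see which column triggers the coercion warning, it can help to inspect the column classes before calling kmeans. The following is a minimal sketch on the same mtcars-based example; the variable name non_numeric is just an illustrative choice.

```r
# Rebuild the example: mtcars plus one character column.
lala <- mtcars
lala$a <- rep_len(LETTERS, nrow(lala))

# Class of every column. Character columns turn into NA when
# kmeans coerces the data to double.
sapply(lala, class)

# Names of the columns that are not numeric.
non_numeric <- names(lala)[!sapply(lala, is.numeric)]
non_numeric
```

Running the same two sapply calls on nci should flag the gene column.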
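The first column of nci (gene) holds character labels, while kmeans needs purely numeric input. Coercing those labels to double produces the NAs reported in the warning, which then cause the error. Try removing the first column from the kmeans call: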
kmeans(nci[,2:ncol(nci)], 10)
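Since the goal is to find a reasonable K and show it graphically, one common approach is an "elbow" plot of the total within-cluster sum of squares against K. The sketch below is only an illustration under assumptions: it uses scaled mtcars with a made-up gene column as a stand-in for nci, and the range 1 to 10 and nstart = 20 are arbitrary choices, not recommendations for your data.

```r
set.seed(1)  # kmeans uses random starts; fix the seed for repeatability

# Hypothetical stand-in for nci: one character ID column plus numeric columns.
dat <- data.frame(gene = paste0("g", seq_len(nrow(mtcars))), scale(mtcars))

# Keep only the numeric columns before clustering.
num <- dat[, sapply(dat, is.numeric), drop = FALSE]

# Total within-cluster sum of squares for each candidate K.
ks <- 1:10
wss <- sapply(ks, function(k) kmeans(num, centers = k, nstart = 20)$tot.withinss)

# Elbow plot: look for the K after which the curve flattens out.
plot(ks, wss, type = "b",
     xlab = "Number of clusters K",
     ylab = "Total within-cluster sum of squares")
```

With your data, replacing num with nci[, 2:ncol(nci)] should give the corresponding plot. The elbow is a heuristic, so the chosen K is a judgment call rather than a definitive answer.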