Trouble applying K-means clustering in R


I'm trying to apply K-means clustering to some data, find the optimal K, and illustrate the process graphically, but I'm running into an error. I think it might have something to do with the structure of my data, but I'm very new to all of this.

Here's my code:

    nci <- read.csv('/Users/myname/Desktop/ML/nci.datanames.csv')
    names(nci)[1] <- "gene"
    kmeans(nci, 10)

    > head(nci)
      gene    CNS    CNS.1  CNS.2     RENAL BREAST  CNS.3  CNS.4 BREAST.1  NSCLC NSCLC.1
    1   g1  0.300 0.679961  0.940  2.80e-01  0.485  0.310 -0.830   -0.190  0.460   0.760
    2   g2  1.180 1.289961 -0.040 -3.10e-01 -0.465 -0.030  0.000   -0.870  0.000   1.490
    3   g3  0.550 0.169961 -0.170  6.80e-01  0.395 -0.100  0.130   -0.450  1.150   0.280
    4   g4  1.140 0.379961 -0.040 -8.10e-01  0.905 -0.460 -1.630    0.080 -1.400   0.100
    5   g5 -0.265 0.464961 -0.605  6.25e-01  0.200 -0.205  0.075    0.005 -0.005  -0.525
    6   g6 -0.070 0.579961  0.000 -1.39e-17 -0.005 -0.540 -0.360    0.350 -0.700   0.360

Getting this error message:

    Error in do_one(nmeth) : NA/NaN/Inf in foreign function call (arg 1)
    In addition: Warning message:
    In storage.mode(x) <- "double" : NAs introduced by coercion

How can I resolve it?


Solution

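  • You can reproduce the error by adding a character column to an otherwise numeric data frame:

    lala <- mtcars
    # rep_len() recycles the 26 letters so that all 32 rows get a value
    lala$a <- rep_len(LETTERS, nrow(mtcars))
    kmeans(lala, 3)
    # Error in do_one(nmeth) : NA/NaN/Inf in foreign function call (arg 1)
    # In addition: Warning message:
    # In storage.mode(x) <- "double" : NAs introduced by coercion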
    

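    kmeans() only works on numeric data. Your gene column holds text, so coercing it to double produces the NAs reported in the warning, and those NAs trigger the error. Drop the first column from the kmeans call:

    kmeans(nci[, 2:ncol(nci)], 10)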
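
Once the text column is out of the way, one common way to look for a reasonable K and show it graphically is an elbow plot of the total within-cluster sum of squares. The following is only a minimal sketch under stated assumptions: it uses mtcars as a stand-in because the nci file is not available here, and the `id` column, the scaling step, the seed, and `nstart = 20` are illustrative choices, not part of the original answer.

```r
# Stand-in data (assumption): mtcars plus a character column,
# mimicking the "gene" column in nci
dat <- mtcars
dat$id <- rownames(mtcars)

# Keep only the numeric columns, since kmeans() cannot use character data
is_num <- vapply(dat, is.numeric, logical(1))
x <- scale(dat[, is_num])  # scaling is a choice made for this sketch

# Total within-cluster sum of squares for K = 1..10
set.seed(42)               # kmeans() starts from random centers
ks <- 1:10
wss <- vapply(
  ks,
  function(k) kmeans(x, centers = k, nstart = 20)$tot.withinss,
  numeric(1)
)

# Elbow plot: look for the K after which the curve flattens
plot(ks, wss, type = "b",
     xlab = "Number of clusters K",
     ylab = "Total within-cluster sum of squares")
```

For the data in the question, the same loop should work with `x <- nci[, -1]` (optionally wrapped in `scale()`). Note that kmeans() clusters rows, so with this layout it would group genes rather than samples.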