r cluster-analysis k-means

K-means clustering function in R


I am trying to write my own k-means clustering function to be applied to an n by p matrix. The function should take four inputs:

  1. Datapoints: the n by p matrix containing all data points,
  2. ncluster: K, the number of clusters,
  3. initialClusters: a vector of length n whose element i corresponds to the cluster initially assigned to observation i (here n is the number of zones; zonal can be thought of as some weighted average),
  4. maxiter: the maximum number of iterations before stopping the algorithm.

Expected output: a list of length 2 whose first element is a K by p matrix containing the final cluster centroids obtained from the K-means algorithm, and whose second element is a vector of length n listing the cluster assigned to each observation.

I have tried the following code, but it is not working:

set.seed(345) 
KmeansClustering<-function(Datapoints, ncluster, initialClusters,maxiter) {  
   Datapoints<-LMPmatrix_t 
   ncluster<-2 
   initialClusters<-mean(LMPmatrix) 
   initialClusters 

   maxiter<-100 
   KmeansOut<-kmeans(Datapoints, ncluster, initialClusters,maxiter)  
   return(KmeansOut) 
}

Solution

  • kmeans takes either the number of clusters or a set of initial centers through its centers argument, but not both. Also, inside your function you overwrite every argument with objects from the global environment, which defeats the purpose of having arguments. Try something like this:

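    set.seed(345)
    KmeansClustering <- function(Datapoints, ncluster = NULL, initialClusters = NULL, maxiter) {
       # kmeans() accepts either a count or a matrix of centers, never both
       if (!is.null(ncluster) && !is.null(initialClusters)) {
          stop("only provide ncluster or initialClusters, not both")
       }
       if (!is.null(ncluster)) {
          # ncluster rows of Datapoints are chosen at random as starting centers
          KmeansOut <- kmeans(Datapoints, centers = ncluster, iter.max = maxiter)
       } else {
          # initialClusters holds one row per starting center
          KmeansOut <- kmeans(Datapoints, centers = initialClusters, iter.max = maxiter)
       }
       return(KmeansOut)
    }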
    
    set.seed(100)
    # use 3 observations as initial centers
    ini_centers = iris[sample(nrow(iris),3),-5]
    
    #works
    KmeansClustering(iris[,-5],ncluster=3,maxiter=10)
    #works
    KmeansClustering(iris[,-5],initialClusters=ini_centers,maxiter=10)
    #error
    KmeansClustering(iris[,-5],ncluster=3,initialClusters=ini_centers,maxiter=10)
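
The question asks for initialClusters to be a vector of initial assignments, not a matrix of centers, and for the result to be a list of length 2. One possible way to get there, as a rough sketch only: compute the starting centroids from the assignment vector, then hand them to kmeans. The function name KmeansFromAssignments and the list element names centroids and cluster are made up for this illustration, and the sketch assumes the labels are the integers 1..ncluster.

```r
# Hypothetical helper (not part of the original answer): accepts a vector of
# initial cluster assignments instead of a matrix of centers.
KmeansFromAssignments <- function(Datapoints, ncluster, initialClusters, maxiter) {
  Datapoints <- as.matrix(Datapoints)
  stopifnot(length(initialClusters) == nrow(Datapoints))

  # Assumption: labels are integers in 1..ncluster and every cluster is used
  groups <- as.integer(initialClusters)
  counts <- tabulate(groups, nbins = ncluster)
  if (anyNA(groups) || any(groups < 1L | groups > ncluster) || any(counts == 0L)) {
    stop("initialClusters must use every label in 1..ncluster at least once")
  }

  # Starting centroids: per-cluster column sums divided by cluster sizes.
  # rowsum() returns one row per label, sorted by label, so row k belongs
  # to cluster k and the result is a K by p matrix.
  initialCenters <- rowsum(Datapoints, groups) / counts

  # kmeans() stops with an error if these starting centers are not distinct
  fit <- kmeans(Datapoints, centers = initialCenters, iter.max = maxiter)

  # List of length 2: K by p centroid matrix, then length-n assignments
  list(centroids = fit$centers, cluster = fit$cluster)
}

# Arbitrary starting partition for illustration: three blocks of 50 rows
init <- rep(1:3, each = 50)
res <- KmeansFromAssignments(iris[, -5], ncluster = 3,
                             initialClusters = init, maxiter = 10)
str(res)
```

Because the starting centers are supplied explicitly, this call involves no random initialization, so set.seed is not needed for it.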