I am trying to write my own K-means clustering function to apply to an n by p matrix. The function should take four inputs: the data points, the number of clusters, the initial cluster centers, and the maximum number of iterations.
The expected output is a list of length 2: the first element is a K by p matrix containing the final cluster centroids obtained from the K-means algorithm, and the second element is a vector of length n giving the cluster assigned to each observation.
I tried the following code, but it does not work:
set.seed(345)
KmeansClustering<-function(Datapoints, ncluster, initialClusters,maxiter) {
Datapoints<-LMPmatrix_t
ncluster<-2
initialClusters<-mean(LMPmatrix)
initialClusters
maxiter<-100
KmeansOut<-kmeans(Datapoints, ncluster, initialClusters,maxiter)
return(KmeansOut)
}
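kmeans takes either a number of clusters or a set of initial centers through its centers argument, but not both. Also, inside your function you overwrite every argument with objects from the global environment (for example Datapoints<-LMPmatrix_t), so whatever the caller passes in is ignored, which defeats the purpose of writing a function. Try something like this: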
set.seed(345)
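KmeansClustering<-function(Datapoints,ncluster=NULL,initialClusters=NULL,maxiter=10) {
  # kmeans() accepts a cluster count or initial centers via `centers`, never both
  if(!is.null(ncluster) && !is.null(initialClusters)){
    stop("only provide ncluster or initialClusters, not both")
  }
  if(is.null(ncluster) && is.null(initialClusters)){
    stop("provide either ncluster or initialClusters")
  }
  if(!is.null(ncluster)){
    # a single number: kmeans() picks that many rows of Datapoints as starting centers
    KmeansOut<-kmeans(Datapoints, centers=ncluster, iter.max=maxiter)
  }else{
    # user-supplied starting centers, one row per cluster
    KmeansOut<-kmeans(Datapoints, centers=initialClusters, iter.max=maxiter)
  }
  return(KmeansOut)
}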
set.seed(100)
# use 3 observations as initial centers
ini_centers = iris[sample(nrow(iris),3),-5]
#works
KmeansClustering(iris[,-5],ncluster=3,maxiter=10)
#works
KmeansClustering(iris[,-5],initialClusters=ini_centers,maxiter=10)
#error: ncluster and initialClusters are both supplied
KmeansClustering(iris[,-5],ncluster=3,initialClusters=ini_centers,maxiter=10)
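If you need exactly the output described in the question (a list of length 2 with the K by p centroid matrix and the length-n assignment vector), you can pull those two pieces out of the object that kmeans returns. The following is only a sketch: the name KmeansAsList, the element names centroids and clusters, and the choice of rows 1, 51 and 101 as starting centers are illustrative assumptions, not part of the original code. It keeps all four inputs but only uses ncluster to check that it agrees with the number of rows in initialClusters.

```r
# Hypothetical helper (not from the original post): returns the two-element
# list described in the question instead of the full kmeans object.
KmeansAsList <- function(Datapoints, ncluster, initialClusters, maxiter = 10) {
  initialClusters <- as.matrix(initialClusters)
  # kmeans() infers K from the rows of `centers`, so the two inputs must agree
  stopifnot(nrow(initialClusters) == ncluster)
  fit <- kmeans(as.matrix(Datapoints), centers = initialClusters, iter.max = maxiter)
  list(centroids = fit$centers,  # K by p matrix of final centroids
       clusters  = fit$cluster)  # length-n vector of cluster assignments
}

# Assumed starting centers: one observation from each species
ini_centers <- iris[c(1, 51, 101), -5]
res <- KmeansAsList(iris[, -5], ncluster = 3,
                    initialClusters = ini_centers, maxiter = 10)
dim(res$centroids)    # K by p
length(res$clusters)  # n
```

Because the starting centers are fixed, the result does not depend on set.seed here.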