c++opencvk-meansflann

K-means Cluster Centers Are the Same Every Run Although It Is Randomly Initialized


I have an application where I need to implement K-means myself. I was trying my implementation in a small program to see how things are going; however, I find it really confusing that I get the exact same cluster centers every time I run the program. I am surprised because my cluster centers are randomly initialized, and I am using FLANN to update the clusters (both should make the output nondeterministic, right?). I'm probably missing something, and would appreciate some help.

cv::Mat kMeans(cv::Mat data, int k, int max_itr){
    cv::Mat centroids = cv::Mat(cv::Size(data.cols, k), data.type());
    std::vector<int> cluster_ids(data.rows, 0);

    //randomly initialize centroids
    for(int i = 0; i < k; i++)
    {
        int idx = rand() % data.rows;
        data.row(idx).copyTo(centroids.row(i));
    }

    int itr = 0;
    cv::flann::Index kdtree(centroids, cv::flann::KDTreeIndexParams(4));
    cv::Mat indices, dists;
    
    //TODO: add tolerance stopping condition
    while (itr < max_itr)
    {
        //assign data points to centroids
        for(int i = 0; i < data.rows; i++)
        {
            kdtree.knnSearch(data.row(i), indices, dists, 1, cv::flann::SearchParams(32));
            cluster_ids[i] = indices.data[0];
        } 
        //update centroids
        for(int i = 0; i < centroids.rows; i++)
        {
            cv::Mat sum = cv::Mat::zeros(cv::Size(data.cols, 1), data.type()); 
            int count = 0;
            for(int j = 0; j < data.rows; j++)
            {
                if(cluster_ids[j] != i) {continue;}
                sum += data.row(j);
                count++;
            }
            centroids.row(i) = sum / float(count);
        }
       itr += 1;
    }
return centroids;
}

int main()
{
    cv::Mat_<float> data;
    cv::Mat_<float> centroids;
    cv::Mat dst;
    //initialize with dummy data
    for(int i = 0; i < 1000; i++) 
    { 
        cv::Mat row = (cv::Mat_<float>(1, 3) << float(i), float(i), float(i));
        data.push_back(row);
    }
    centroids = kMeans(data, 20, 20);
    std::cout << centroids << std::endl;
 
    return 0;
}

Solution

  • You need this line:

    std::srand(std::time(nullptr)); // use current time as seed for random generator
    

    at the start of your program to get different sequences from the rand function every time you run the program.

    You can use some other seed if you want, but you still need to call srand with some source of randomness at the start of the program.