I've seen that many K-means implementations, like VLFeat k-means or OpenCV k-means, offer essentially two methods for choosing the starting centroids: plain random selection of k data points, and k-means++.
However, I don't understand in which cases one is better than the other, especially since the choice of starting method is considered important. Can you help me understand this point?
In theory, k-means++ is much nicer. It is a biased random sampling that prefers points far away from the centers already chosen, so it avoids picking centers that are close together. Plain random initialization may be unlucky and choose nearby centers.
So in theory, k-means++ should require fewer iterations and have a higher chance of finding the global optimum.
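For concreteness, here is a minimal NumPy sketch of the D² sampling behind k-means++ (the function name and signature are my own, not any particular library's API):

```python
import numpy as np

def kmeans_pp_init(X, k, rng=None):
    """k-means++ seeding sketch: pick the first center uniformly at random,
    then sample each further center with probability proportional to the
    squared distance (D^2) to its nearest already-chosen center."""
    rng = rng or np.random.default_rng()
    centers = [X[rng.integers(len(X))]]           # first center: uniform
    d2 = ((X - centers[0]) ** 2).sum(axis=1)      # squared distance to it
    for _ in range(1, k):
        idx = rng.choice(len(X), p=d2 / d2.sum())  # D^2-weighted sampling
        centers.append(X[idx])
        # keep the distance to the *nearest* center chosen so far
        d2 = np.minimum(d2, ((X - X[idx]) ** 2).sum(axis=1))
    return np.stack(centers)
```

Note that the loop computes distances from every point to each newly chosen center, which is exactly where the O(kn) seeding cost discussed below comes from.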
However, k-means++ is not exactly "free", and in my experiments I did not see any systematic difference between the two initializations when combined with the more advanced k-means algorithms. The k-means++ seeding requires O(kn) computation, about the cost of one full Lloyd iteration. But there are accelerated k-means variants (such as Elkan's and Hamerly's, which use triangle-inequality bounds to skip most distance computations) whose iterations cost much less than that. With these algorithms, k-means++ seeding may account for some 10-20% of the total runtime (with textbook k-means it is often less than 1%).
These days I tend to prefer simple random initialization: run multiple tries, give each try about 10 iterations to refine on a sample of the data, and then continue with the best result only, as in the sketch below.
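As an illustration of that recipe, here is a hedged sketch using scikit-learn (the question mentions VLFeat and OpenCV, but the idea is the same; the function name, number of tries, and sample size are arbitrary choices of mine):

```python
import numpy as np
from sklearn.cluster import KMeans

def best_of_random_restarts(X, k, n_tries=10, sample_size=10_000, rng=None):
    """Cheap random-init restarts on a subsample; refine only the winner."""
    rng = rng or np.random.default_rng()
    idx = rng.choice(len(X), size=min(sample_size, len(X)), replace=False)
    sample = X[idx]
    best = None
    for _ in range(n_tries):
        # one random seeding, refined for only ~10 iterations on the sample
        km = KMeans(n_clusters=k, init="random", n_init=1, max_iter=10).fit(sample)
        if best is None or km.inertia_ < best.inertia_:
            best = km
    # continue with the best centers only, now on the full data set
    return KMeans(n_clusters=k, init=best.cluster_centers_, n_init=1).fit(X)
```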