python-3.xmachine-learningscikit-learn

scikit-learn clustering: predict(X) vs. fit_predict(X)


In scikit-learn, some clustering algorithms have both predict(X) and fit_predict(X) methods, like KMeans and MeanShift, while others only have the latter, like SpectralClustering. According to the doc:

fit_predict(X[, y]):    Performs clustering on X and returns cluster labels.
predict(X): Predict the closest cluster each sample in X belongs to.

I don't really understand the difference between the two, they seem equivalent to me.


Solution

  • In order to use the 'predict' you must use the 'fit' method first. So using 'fit()' and then 'predict()' is definitely the same as using 'fit_predict()'. However, one could benefit from using only 'fit()' in such cases where you need to know the initialization parameters of your models rather than if you use 'fit_predict()', where you will just be obtained the labeling results of running your model on the data.