[SOLVED] scikit-learn clustering: predict(X) vs. fit

scikit-learn clustering: predict(X) vs. fit_predict(X)

In scikit-learn, some clustering algorithms have both predict(X) and fit_predict(X) methods, like KMeans and MeanShift, while others only have the latter, like SpectralClustering. According to the doc:

fit_predict(X[, y]):    Performs clustering on X and returns cluster labels.
predict(X): Predict the closest cluster each sample in X belongs to.

I don't really understand the difference between the two, they seem equivalent to me.

Solution

In order to use the 'predict' you must use the 'fit' method first. So using 'fit()' and then 'predict()' is definitely the same as using 'fit_predict()'. However, one could benefit from using only 'fit()' in such cases where you need to know the initialization parameters of your models rather than if you use 'fit_predict()', where you will just be obtained the labeling results of running your model on the data.