python cluster-analysis logistic-regression k-means unsupervised-learning

Is there a way to validate the integrity of KMeans clusters using binary classification methods?


Is there an established academic method for verifying that KMeans clusters are valid, i.e. that the data has been clustered properly, using a binary classification method? My idea was to fit a logistic regression model for each cluster and check whether it could predict cluster membership accurately. It worked fairly well: judging by accuracy and Gini scores, the models could recover the clusters. However, I couldn't find a paper on this subject. Any ideas would be appreciated.
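A minimal sketch of the check described above, assuming scikit-learn, synthetic data from `make_blobs` in place of the real dataset, and a one-vs-rest setup (one binary logistic regression per cluster). The Gini score is taken here as `2 * AUC - 1`; the data, the number of clusters, and all variable names are illustrative assumptions.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, roc_auc_score
from sklearn.model_selection import cross_val_predict

# Synthetic, well-separated data stands in for the real dataset (assumption).
X, _ = make_blobs(
    n_samples=600, centers=[[0, 0], [6, 0], [3, 6]], cluster_std=1.0, random_state=0
)
labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X)

scores = {}
for k in np.unique(labels):
    # Binary target: 1 if the point belongs to cluster k, 0 otherwise.
    y = (labels == k).astype(int)
    # Out-of-fold probabilities, so each point is scored by a model
    # that did not see it during fitting.
    proba = cross_val_predict(
        LogisticRegression(max_iter=1000), X, y, cv=5, method="predict_proba"
    )[:, 1]
    auc = roc_auc_score(y, proba)
    scores[int(k)] = {
        "accuracy": accuracy_score(y, (proba >= 0.5).astype(int)),
        "auc": auc,
        "gini": 2 * auc - 1,
    }

for k, s in scores.items():
    print(k, s)
```

One caveat with this kind of check: KMeans assigns each point to its nearest centroid, so the boundaries between clusters are linear by construction. A linear classifier can therefore score well even on data with no real cluster structure, so high accuracy alone may not say much about validity.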

Thank you very much


Solution

  • Not sure if this is what you expected, but this paper may be worth reading: "The area under the ROC curve as a measure of clustering quality" (https://link.springer.com/article/10.1007/s10618-022-00829-0)
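As a rough illustration of the idea behind that paper, as I understand it: treat every pair of points as one binary example, labelled 1 if the clustering puts both points in the same cluster, and use the pair's similarity as the score. The ROC AUC of that pairwise problem is then the quality measure. The sketch below assumes Euclidean distance (negated to act as a similarity) and synthetic data. The function name `pairwise_auc` is hypothetical, and this is not the authors' reference implementation.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import roc_auc_score


def pairwise_auc(X, labels):
    """ROC AUC over all point pairs: does closeness predict co-membership?"""
    labels = np.asarray(labels)
    # Indices of all unordered pairs (i < j).
    i, j = np.triu_indices(len(labels), k=1)
    same_cluster = (labels[i] == labels[j]).astype(int)
    distance = np.linalg.norm(X[i] - X[j], axis=1)
    # Negate distances so that larger scores mean "more similar".
    return roc_auc_score(same_cluster, -distance)


# Synthetic data stands in for the real dataset (assumption).
X, _ = make_blobs(
    n_samples=300, centers=[[0, 0], [6, 0], [3, 6]], cluster_std=1.0, random_state=0
)
kmeans_labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X)

# Baseline: the same labels shuffled, which destroys any structure.
rng = np.random.default_rng(0)
shuffled_labels = rng.permutation(kmeans_labels)

auc_kmeans = pairwise_auc(X, kmeans_labels)
auc_shuffled = pairwise_auc(X, shuffled_labels)
print(f"KMeans labels:   {auc_kmeans:.3f}")
print(f"Shuffled labels: {auc_shuffled:.3f}")
```

Comparing against shuffled labels gives a reference point: a value near 0.5 means the partition is no better than chance at grouping nearby points together.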