machine-learning  classification  supervised-learning  unsupervised-learning

Clustering vs. supervised classification for a very small dataset


I'm trying to classify/cluster subjects into two classes, healthy and sick, according to 4 features.

Two things to know: I know the label (class) of each subject, and I have only 40 subjects in total (training and testing sets combined).

What should I choose in this case, clustering or classification?


Solution

  • Clustering versus classification is not a choice of method but a choice of problem. What is the problem at hand? You have labeled data and want a model that can label new subjects; this is, by definition, classification. Which specific classification method to use is a separate, research-driven question rather than a simple programming issue. In particular, many classifiers fit some sort of generative model to the data (and thus learn about its structure even without labels), but in the end the labels are there and should be used.
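To make "use the labels" concrete, here is a minimal sketch of a supervised setup for a dataset this small. It is only an illustration, not a recommendation of a particular method: the nearest-centroid classifier and the leave-one-out evaluation are assumptions chosen for brevity, the function names are hypothetical, and the data is synthetic (20 "healthy" and 20 "sick" subjects with 4 made-up features), not the data from the question.

```python
import random
from statistics import fmean


def centroid(rows):
    """Mean of each feature column."""
    return [fmean(col) for col in zip(*rows)]


def predict(train_X, train_y, x):
    """Nearest-centroid rule: assign x to the class whose mean is closest."""
    best_label, best_dist = None, float("inf")
    for label in sorted(set(train_y)):
        c = centroid([r for r, lab in zip(train_X, train_y) if lab == label])
        dist = sum((a - b) ** 2 for a, b in zip(x, c))  # squared Euclidean
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label


def leave_one_out_accuracy(X, y):
    """Hold out each subject once, train on the rest, score the held-out one."""
    hits = 0
    for i in range(len(X)):
        train_X = X[:i] + X[i + 1:]
        train_y = y[:i] + y[i + 1:]
        hits += predict(train_X, train_y, X[i]) == y[i]
    return hits / len(X)


# Synthetic stand-in data (NOT from the question): 40 subjects, 4 features.
rng = random.Random(0)
X = [[rng.gauss(0.0, 1.0) for _ in range(4)] for _ in range(20)] + \
    [[rng.gauss(1.5, 1.0) for _ in range(4)] for _ in range(20)]
y = ["healthy"] * 20 + ["sick"] * 20

accuracy = leave_one_out_accuracy(X, y)
print(f"leave-one-out accuracy: {accuracy:.2f}")
```

With only 40 subjects, holding out one subject at a time lets every subject serve as test data once while the model is still trained on the other 39. Any other classifier could be dropped into `predict` without changing the evaluation loop.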