python apache-spark pyspark recommendation-engine multiclass-classification

Dealing with multi-class problem. Can Random Forest Classifier handle >100,000 classes?


I need to build a recommender system that can classify inputs into more than 100,000 unique classes.

Can anyone tell me if Random Forest Classifier can handle this problem?

From the numerous articles I have read on this topic, people report that the most classes they were able to classify with RFC was around 100-200.

Is there a way to work around this limitation with RFC, and how would it affect accuracy?

If not, which ML algorithm would you suggest I use?

Thank you in advance!


Solution

  • Beyond the problem mentioned, it is not a good idea to have a single model that classifies 100k classes. It's like having one translator who knows every language; it is preferable to have one translator per language pair. The same applies to your case: start with a first model that classifies into large groups.

    Consider the tree of life and a model capable of classifying all living species.

    Do you think it makes sense to build that as a single model? It is probably better to have one model that classifies by major branch, then sub-models specialized in classifying the minor branches, and finally models that identify the species (the leaves of the tree).

    The development work will probably take longer, but the results are likely to be better. You would not ask an ornithologist to identify a species of fish; you would ask an ichthyologist :-)

    As you can see, you can use several random forest classifiers, each specialized in one part of the job. I hope my explanation is clear, even though my answer does not provide usable code.
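
For illustration only, the staged idea described above (a first model for the major branches, then one specialist per branch) could be sketched roughly as follows. This is a minimal sketch under stated assumptions, not a tested recipe for 100k classes: `TwoLevelClassifier`, `group_of`, and the toy data are hypothetical names invented here, and `NearestCentroid` is only a tiny dependency-free stand-in so the example runs on its own. Any model exposing `fit(X, y)` (returning the fitted model) and `predict(X)`, such as scikit-learn's `RandomForestClassifier`, could be passed as `make_model` instead.

```python
from collections import defaultdict


class NearestCentroid:
    """Tiny stand-in classifier with a fit/predict interface (illustration only)."""

    def fit(self, X, y):
        sums, counts = {}, defaultdict(int)
        for row, label in zip(X, y):
            if label not in sums:
                sums[label] = [0.0] * len(row)
            sums[label] = [s + v for s, v in zip(sums[label], row)]
            counts[label] += 1
        # Mean feature vector per label.
        self.centroids_ = {
            label: [s / counts[label] for s in total]
            for label, total in sums.items()
        }
        return self

    def predict(self, X):
        def closest(row):
            # Label whose centroid has the smallest squared distance to row.
            return min(
                self.centroids_,
                key=lambda label: sum(
                    (a - b) ** 2 for a, b in zip(row, self.centroids_[label])
                ),
            )

        return [closest(row) for row in X]


class TwoLevelClassifier:
    """Route each sample to a coarse group, then to that group's specialist."""

    def __init__(self, make_model, group_of):
        self.make_model = make_model  # factory returning an unfitted model
        self.group_of = group_of      # assumed mapping: class label -> group

    def fit(self, X, y):
        groups = [self.group_of[label] for label in y]
        # Level 1: a model that only predicts the major branch.
        self.coarse_ = self.make_model().fit(X, groups)
        # Level 2: one specialist per branch, trained on that branch's rows only.
        by_group = defaultdict(lambda: ([], []))
        for row, label, group in zip(X, y, groups):
            by_group[group][0].append(row)
            by_group[group][1].append(label)
        self.specialists_ = {
            group: self.make_model().fit(rows, labels)
            for group, (rows, labels) in by_group.items()
        }
        return self

    def predict(self, X):
        predicted_groups = self.coarse_.predict(X)
        return [
            self.specialists_[group].predict([row])[0]
            for row, group in zip(X, predicted_groups)
        ]


# Toy data: two made-up numeric features per sample.
group_of = {"sparrow": "bird", "eagle": "bird", "trout": "fish", "shark": "fish"}
X = [[0.0, 0.0], [0.0, 1.0], [10.0, 0.0], [10.0, 1.0]]
y = ["sparrow", "eagle", "trout", "shark"]

model = TwoLevelClassifier(NearestCentroid, group_of).fit(X, y)
print(model.predict([[0.1, 0.9], [9.8, 0.2]]))  # ['eagle', 'trout']
```

One design consequence worth keeping in mind: a mistake at the first level cannot be corrected further down, because the sample is handed to the wrong specialist. The grouping therefore needs to be one that the coarse model can separate reliably.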