[SOLVED] How does CatBoost perform multiclass classification?

I am trying to figure out how CatBoost performs multiclass classification with the MultiClass loss function. As I understand it, MultiClass needs M values per prediction, one for each of the M classes. My questions are:

How are those M values obtained?
How are those M values converted into predicted probabilities?

My current hypothesis is that CatBoost builds a separate binary classifier for each of the M classes and then applies the softmax function to get the predicted probabilities.

If this is the case, are the sequences of trees for the individual classifiers the same or completely different?

Solution

Some other common GBMs that I've seen work the way your hypothesis describes: they build one-vs-rest classifiers (completely different from one another in general) and apply softmax at the end to recover the final predictions.

CatBoost, however, apparently builds a single set of multi-output trees:
https://github.com/catboost/catboost/issues/1806
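
To make the "multi-output trees plus softmax" idea concrete, here is a toy sketch in plain Python. It is not CatBoost's actual implementation: the tree layout, the `TOY_TREES` data, and the helper names `raw_scores` and `softmax` are all made up for illustration. The assumption is simply that every tree shares one structure across classes, each leaf stores a vector of M values, the raw score for a class is the sum of that class's leaf components over all trees, and softmax turns the M raw scores into probabilities.

```python
import math

# Toy stand-in for multi-output trees (hypothetical data, not CatBoost internals):
# each tree has a single split shared by all classes, and each leaf stores
# a vector of M values, here M = 3.
TOY_TREES = [
    {"feature": 0, "threshold": 0.5,
     "left": [0.4, -0.1, -0.3], "right": [-0.2, 0.3, -0.1]},
    {"feature": 1, "threshold": 1.0,
     "left": [0.1, 0.2, -0.3], "right": [-0.3, -0.1, 0.4]},
]


def raw_scores(x, trees, num_classes):
    """Sum the leaf vectors that x falls into, giving one raw score per class."""
    scores = [0.0] * num_classes
    for tree in trees:
        go_left = x[tree["feature"]] <= tree["threshold"]
        leaf = tree["left"] if go_left else tree["right"]
        for k in range(num_classes):
            scores[k] += leaf[k]
    return scores


def softmax(scores):
    """Map M raw scores to M probabilities that sum to 1."""
    # Subtracting the maximum avoids overflow and does not change the result.
    shift = max(scores)
    exps = [math.exp(s - shift) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


# One hypothetical object with two features.
x = [0.2, 3.0]
raw = raw_scores(x, TOY_TREES, num_classes=3)
probs = softmax(raw)
predicted_class = max(range(len(probs)), key=probs.__getitem__)
```

In this picture the trees are not separate per-class sequences: the splits are shared and only the leaf values differ between classes. With a trained CatBoost model you could check the softmax part yourself by comparing `model.predict(X, prediction_type='RawFormulaVal')` passed through a softmax against `model.predict_proba(X)`.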