Tags: python, scikit-learn, output, linear-regression

How can I get scikit-learn to ensure that all prediction outputs sum to 100%?


I have a 'MultiOutputRegressor' built on a 'LinearRegression' regressor. I use it to predict three outputs per row of X_data (much like a classifier), and these outputs represent the percentage likelihood of three outcomes.

The regressor is fitted on y_data, in which the three target values in each row sum to 100%.

Obviously the regressor doesn't really know that its three prediction outputs should sum to 100%; it only learns roughly what each value should be.

Is there a way to tell the regressor explicitly that all three prediction outputs must sum to 100%?


Solution

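  • In short, no. A regressor has no way to know about this constraint. Your problem is really a multi-class classification problem, so you should use a classifier. A classifier's predict_proba method returns one probability per class, and the three probabilities in each row sum to 1 (100%). Note that MultiOutputClassifier, linked below, fits one separate classifier per target column, so for a single target with three classes an ordinary multi-class classifier (for example LogisticRegression) is the more direct choice.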

    https://scikit-learn.org/stable/modules/generated/sklearn.multioutput.MultiOutputClassifier.html
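A minimal sketch of the classifier approach, assuming each row of percentage targets can be reduced to its single most likely outcome with argmax. That is a lossy assumption, since the soft percentages themselves are discarded. The synthetic data and the name y_pct are hypothetical stand-ins for the question's X_data and y_data.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Hypothetical stand-in for X_data: 300 rows, 3 features
X_data = rng.normal(size=(300, 3))

# Hypothetical stand-in for y_data: per-row percentages for three outcomes
# (a softmax of the features, scaled so each row sums to 100)
logits = 2.0 * X_data
y_pct = 100 * np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)

# Assumption: collapse each row of percentages to its most likely outcome (0, 1 or 2),
# because a classifier is trained on class labels, not on percentages
y_labels = y_pct.argmax(axis=1)

clf = LogisticRegression(max_iter=1000)
clf.fit(X_data, y_labels)

# predict_proba returns one column per class; each row sums to 1
proba = clf.predict_proba(X_data[:5])
percent = 100 * proba

print(percent.round(1))
print(percent.sum(axis=1))  # each row sums to 100, up to floating-point rounding
```

The sum-to-100% rule holds by construction here: the classifier normalizes its class scores into probabilities, so no extra constraint has to be passed in. If discarding the soft percentages is not acceptable for your data, this sketch does not cover that case.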