How to get coefficients of multinomial logistic regression?


I need to calculate the coefficients of a multinomial logistic regression using sklearn:

X =

x1          x2          x3   x4         x5    x6
0.300000    0.100000    0.0  0.0000     0.5   0.0
0.000000    0.006000    0.0  0.0000     0.2   0.0
0.010000    0.678000    0.0  0.0000     2.0   0.0
0.000000    0.333000    1.0  12.3966    0.1   4.0
0.200000    0.005000    1.0  0.4050     1.0   0.0
0.000000    0.340000    1.0  15.7025    0.5   0.0
0.000000    0.440000    1.0  8.2645     0.0   4.0
0.500000    0.055000    1.0  18.1818    0.0   4.0

The values of y are categorical class labels in the range 1 to 4.

y =

1
2
1
3
4
1
2
3

This is what I do:

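import numpy as np
import pandas as pd
from sklearn import linear_model

# X and y from the tables above
X = pd.DataFrame({
    "x1": [0.3, 0.0, 0.01, 0.0, 0.2, 0.0, 0.0, 0.5],
    "x2": [0.1, 0.006, 0.678, 0.333, 0.005, 0.34, 0.44, 0.055],
    "x3": [0.0, 0.0, 0.0, 1.0, 1.0, 1.0, 1.0, 1.0],
    "x4": [0.0, 0.0, 0.0, 12.3966, 0.405, 15.7025, 8.2645, 18.1818],
    "x5": [0.5, 0.2, 2.0, 0.1, 1.0, 0.5, 0.0, 0.0],
    "x6": [0.0, 0.0, 0.0, 4.0, 0.0, 0.0, 4.0, 4.0],
})
y = np.array([1, 2, 1, 3, 4, 1, 2, 3])

# A large C means very weak regularization
logreg = linear_model.LogisticRegression(C=1e5)

logreg.fit(X, y)

# print the coefficients
print(logreg.intercept_)
print(logreg.coef_)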

However, I get 6 columns in the output of logreg.intercept_ and 6 columns in the output of logreg.coef_. How can I get one coefficient per feature, i.e. the values a to f in:

y = a*x1 + b*x2 + c*x3 + d*x4 + e*x5 + f*x6

Also, I am probably doing something wrong, because y_pred = logreg.predict(X) returns 1 for every row.


Solution

  • Check the online documentation for LogisticRegression:

    coef_ : array, shape (1, n_features) or (n_classes, n_features)

    Coefficient of the features in the decision function.

    coef_ is of shape (1, n_features) when the given problem is binary.

    As @Xochipilli already mentioned in the comments, you are going to have (n_classes, n_features) coefficients, in your case (4, 6), and 4 intercepts (one for each class). So there is no single set of a to f values: each class gets its own row of six coefficients.

    Probably I am doing something wrong, because y_pred = logreg.predict(X) gives me the value of 1 for all rows.

    Yes. To judge how well the model works, you shouldn't evaluate it on the same data you used to train it. Split your data into training and test sets, train your model on the training set, and check its accuracy on the test set.
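A minimal sketch of reading those coefficients, assuming a recent scikit-learn and the eight rows from the question. It labels the (4, 6) coef_ matrix with the class labels (logreg.classes_) as rows and the feature names as columns, so each row is one class's set of a to f values, with that class's intercept alongside. It also checks that each class's score is its intercept plus the weighted sum of the features, which is what decision_function returns and what predict takes the argmax of. The max_iter=10000 setting is an added assumption, not part of the original code.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression

# The eight rows from the question
X = pd.DataFrame({
    "x1": [0.3, 0.0, 0.01, 0.0, 0.2, 0.0, 0.0, 0.5],
    "x2": [0.1, 0.006, 0.678, 0.333, 0.005, 0.34, 0.44, 0.055],
    "x3": [0.0, 0.0, 0.0, 1.0, 1.0, 1.0, 1.0, 1.0],
    "x4": [0.0, 0.0, 0.0, 12.3966, 0.405, 15.7025, 8.2645, 18.1818],
    "x5": [0.5, 0.2, 2.0, 0.1, 1.0, 0.5, 0.0, 0.0],
    "x6": [0.0, 0.0, 0.0, 4.0, 0.0, 0.0, 4.0, 4.0],
})
y = np.array([1, 2, 1, 3, 4, 1, 2, 3])

# max_iter is an assumption, raised to reduce the chance of a
# ConvergenceWarning with nearly unregularized, unscaled features
logreg = LogisticRegression(C=1e5, max_iter=10000)
logreg.fit(X, y)

# One row of coefficients (a..f) per class, plus that class's intercept
coefs = pd.DataFrame(logreg.coef_, index=logreg.classes_, columns=X.columns)
coefs["intercept"] = logreg.intercept_
print(coefs)

# Score for class k: intercept_[k] + sum_j coef_[k, j] * x_j.
# This equals decision_function, and predict picks the class with the largest score.
scores = X.to_numpy() @ logreg.coef_.T + logreg.intercept_
assert np.allclose(scores, logreg.decision_function(X))
assert (logreg.classes_[scores.argmax(axis=1)] == logreg.predict(X)).all()
```

Reading a single row of coefs gives the a to f values for that class's linear score, not for y itself.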
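A sketch of the train/test workflow described above. Eight rows are too few to split meaningfully, so this uses synthetic 4-class data from make_classification as a stand-in (an assumption, not the question's data); the 25% test size and the random_state values are arbitrary choices.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in data: 400 rows, 6 features, 4 classes (labels 0..3)
X, y = make_classification(n_samples=400, n_features=6, n_informative=4,
                           n_classes=4, random_state=0)

# Hold out 25% of the rows; stratify keeps class proportions similar in both parts
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0)

logreg = LogisticRegression(max_iter=1000)
logreg.fit(X_train, y_train)

# Accuracy measured only on rows the model never saw during fitting
y_pred = logreg.predict(X_test)
test_acc = accuracy_score(y_test, y_pred)
print(f"test accuracy: {test_acc:.2f}")
```

With real data, replace the make_classification call with your own X and y; stratify=y matters most when there are several classes and few rows per class.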