I am learning about ordered logit regression and I was wondering how the prediction works mathematically and how I can do it in Python by myself. I know that in Python I can simply use predict, but how can I make a prediction using only the coefs from model.summary()?
import pandas as pd
from statsmodels.miscmodels.ordinal_model import OrderedModel
data = pd.DataFrame({
'score': [3.2, 4.5, 5.6, 6.7, 7.8, 8.9, 9.1],
'rating': [1,2,3,4,5,6,6]
})
X = data[['score']]
y = data['rating']
ordinal_model = OrderedModel(y, X, distr='logit')
ordinal_results = ordinal_model.fit(method='bfgs')
print(ordinal_results.summary())
The outcome is:
Time: 17:05:52
No. Observations: 7
Df Residuals: 1
Df Model: 1
==============================================================================
                 coef    std err          z      P>|z|     [0.025     0.975]
------------------------------------------------------------------------------
score         66.3902   5669.125      0.012      0.991   -1.1e+04   1.12e+04
1/2          285.5835   2.56e+04      0.011      0.991  -4.98e+04   5.04e+04
2/3            4.2698     88.656      0.048      0.962   -169.493    178.032
3/4            4.1879    155.834      0.027      0.979   -301.241    309.617
4/5            4.3867    136.765      0.032      0.974   -263.668    272.442
5/6            3.4706    220.734      0.016      0.987   -429.161    436.102
==============================================================================
Using the coef vector, how do I get the same output as in
ordinal_results.model.predict(ordinal_results.params, exog = (4.3))
[[0.5264086 0.4735914 0. 0. 0. 0. ]]
I thought that I should simply apply softmax to the linear sum of coef and new data, but that didn't work.
You suggested the "linear sum of coef and new data", which is correct, but since you only have one feature it's just the coefficient times the new data value:
66.3902 * 4.3
>>> 285.47786
But the other entries of the coef column aren't really coefficients in the traditional sense (and there is no softmax); instead they represent cutoffs for the discrete targets.
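The prediction from the linear model (285.48 above) is taken as the location of a latent variable y with scale 1. Its distribution is set by the distr parameter: standard logistic here, since the model was fit with distr='logit' (the default, 'probit', would give a normal distribution with standard deviation 1). The probability of each target is the probability that y is between the associated cutoffs.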
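As a quick sanity check on which distribution is in play, here is a minimal sketch that takes the 1/2 row of the summary as the lowest cutoff and compares the logistic and normal CDFs at (cutoff minus linear prediction). The variable names are mine, and the numbers are the rounded values from the summary, so expect agreement with predict only to a few decimals.

```python
import math

beta = 66.3902           # 'score' coefficient from the summary (rounded)
first_cutoff = 285.5835  # '1/2' row, taken here as the lowest cutoff
x_new = 4.3

# Distance between the first cutoff and the linear prediction.
z = first_cutoff - beta * x_new

# P(y < first_cutoff) under a standard logistic vs. a standard normal latent variable.
p_logistic = 1.0 / (1.0 + math.exp(-z))
p_normal = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

print("logistic:", round(p_logistic, 4))
print("normal:  ", round(p_normal, 4))
```

Only the logistic value lands near the 0.5264 that predict reports for the first category, which is what you would expect for a model fit with distr='logit'.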
How those cutoffs are parameterized is easy to miss in the documentation: the first non-coefficient coef is the first cutoff, and each later entry is the log of the difference between consecutive cutoffs (statsmodels exponentiates these increments so that the cutoffs are always increasing). So
p_1 = P(y < 285.5835) ~= 0.5264086
p_2 = P(285.5835 < y < 285.5835 + exp(4.2698)) ~= 0.4735914
p_3 = P(285.5835 + exp(4.2698) < y < 285.5835 + exp(4.2698) + exp(4.1879)) ~= 0
etc.
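Putting the pieces together, here is a minimal NumPy-only sketch of the whole prediction. The helper name ordered_logit_probs is hypothetical (it is not part of statsmodels), and it assumes the parameterization where the first threshold parameter is the cutoff itself and the remaining ones are log-increments, with a standard logistic latent variable.

```python
import numpy as np

def ordered_logit_probs(x, beta, threshold_params):
    """Hypothetical helper: category probabilities for one observation.

    x, beta          : feature value(s) and matching coefficient(s)
    threshold_params : the '1/2', '2/3', ... rows of the summary, in order
    """
    threshold_params = np.asarray(threshold_params, dtype=float)
    # First entry is the lowest cutoff; the others are log-increments.
    cutoffs = np.concatenate(
        (threshold_params[:1], np.exp(threshold_params[1:]))
    ).cumsum()
    # Pad with -inf / +inf so every category has a lower and upper bound.
    cutoffs = np.concatenate(([-np.inf], cutoffs, [np.inf]))

    # Linear prediction (just beta * x when there is a single feature).
    xb = np.dot(np.atleast_1d(x), np.atleast_1d(beta))

    # Standard logistic CDF evaluated at each cutoff minus the linear prediction.
    with np.errstate(over="ignore"):
        cdf = 1.0 / (1.0 + np.exp(-(cutoffs - xb)))

    # Probability of each category = CDF difference between adjacent cutoffs.
    return np.diff(cdf)

# Rounded values copied from the summary table.
probs = ordered_logit_probs(
    4.3, 66.3902, [285.5835, 4.2698, 4.1879, 4.3867, 3.4706]
)
print(probs)
```

Because the inputs are the rounded coefficients from the summary, the first two probabilities should match the predict output to roughly four decimals rather than exactly; passing ordinal_results.params instead of the rounded numbers should close that gap.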