python statistics regression poisson polynomial-approximations

Polynomial Regression in Poisson Regression

I want to create a model which takes values from a polynomial regression and creates a Poisson Regression based upon the predicted values (by polynomial).

I only got the R code which would be something like this

glm(y ~ poly(x, 6), family = Poisson, data = data_set)

My approach so far is first calculate the Polynomial predictions for the discrete data points first and based upon this run a Poisson Regression. However, my results are a bit off.

import pandas as pd
from statsmodels.formula.api import glm
polynomial_model = np.poly1d(np.polyfit(x=bin_mids, y=count_per_bin, deg = 6))
model_values = [polynomial_model(i) for i in bin_mids]
df_1["model_values"] = model_values
poisson_model = glm("df_1['count_per_bin'] ~ df_1['model_values']" , data = df_1 ,family = sm.families.Poisson()).fit()

If anyone of you see my mistake I'd love to understand where I went wrong.

Cheers

Solution

Not very sure what you are doing with binning and rest of the code. What you are doing in R is a regression with orthogonal polynomials. You can see this for more info about why it's used.

If you are using statsmodels, you just need to provide the input matrix with transformed values of x, x^2 , x^3 then perform a QR decomposition, as also touch upon in this post. you can use sklearn to get the matrix:

import numpy as np
import statsmodels.api as sm
import pandas as pd
from sklearn.preprocessing import PolynomialFeatures

np.random.seed(111)
df = pd.DataFrame({'x':np.random.uniform(0,1,50),'y':np.random.poisson(5,50)})

xp = PolynomialFeatures(degree=6).fit_transform(df[['x']])
xp = np.linalg.qr(xp)[0][:,1:]

model = sm.GLM(df['y'],sm.add_constant(xp),family=sm.families.Poisson()).fit()
model.summary()

                     Generalized Linear Model Regression Results                  
==============================================================================
Dep. Variable:                      y   No. Observations:                   50
Model:                            GLM   Df Residuals:                       43
Model Family:                 Poisson   Df Model:                            6
Link Function:                    log   Scale:                          1.0000
Method:                          IRLS   Log-Likelihood:                -100.45
Date:                Thu, 04 Mar 2021   Deviance:                       33.846
Time:                        22:32:57   Pearson chi2:                     31.8
No. Iterations:                     4                                         
Covariance Type:            nonrobust                                         
==============================================================================
                 coef    std err          z      P>|z|      [0.025      0.975]
------------------------------------------------------------------------------
const          1.5458      0.066     23.495      0.000       1.417       1.675
x1            -0.2059      0.439     -0.469      0.639      -1.067       0.655
x2             0.8169      0.466      1.754      0.079      -0.096       1.730
x3             0.1178      0.442      0.267      0.790      -0.748       0.984
x4            -0.5503      0.454     -1.212      0.226      -1.440       0.340
x5             0.0035      0.457      0.008      0.994      -0.892       0.899
x6            -0.6878      0.455     -1.512      0.131      -1.579       0.204
==============================================================================

We write the data to csv

df.to_csv("data.csv")

and fit using R, you can see we get the same coefficients:

df = read.csv("data.csv",row.names=1)
summary(glm(y ~ poly(x, 6), family = poisson, data = df))

Call:
glm(formula = y ~ poly(x, 6), family = poisson, data = df)

Deviance Residuals: 
     Min        1Q    Median        3Q       Max  
-2.42648  -0.73970  -0.04625   0.61351   1.81234  

Coefficients:
             Estimate Std. Error z value Pr(>|z|)    
(Intercept)  1.545808   0.065794  23.495   <2e-16 ***
poly(x, 6)1 -0.205940   0.439457  -0.469   0.6393    
poly(x, 6)2  0.816910   0.465770   1.754   0.0794 .  
poly(x, 6)3  0.117815   0.441847   0.267   0.7897    
poly(x, 6)4 -0.550271   0.454099  -1.212   0.2256    
poly(x, 6)5 -0.003508   0.456691  -0.008   0.9939    
poly(x, 6)6  0.687751   0.454953   1.512   0.1306    
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Dispersion parameter for poisson family taken to be 1)

    Null deviance: 40.443  on 49  degrees of freedom
Residual deviance: 33.846  on 43  degrees of freedom
AIC: 214.91