I want to create a model which takes values from a polynomial regression and creates a Poisson Regression based upon the predicted values (by polynomial).
I only got the R code which would be something like this
glm(y ~ poly(x, 6), family = Poisson, data = data_set)
My approach so far is first calculate the Polynomial predictions for the discrete data points first and based upon this run a Poisson Regression. However, my results are a bit off.
import pandas as pd
from statsmodels.formula.api import glm
polynomial_model = np.poly1d(np.polyfit(x=bin_mids, y=count_per_bin, deg = 6))
model_values = [polynomial_model(i) for i in bin_mids]
df_1["model_values"] = model_values
poisson_model = glm("df_1['count_per_bin'] ~ df_1['model_values']" , data = df_1 ,family = sm.families.Poisson()).fit()
If anyone of you see my mistake I'd love to understand where I went wrong.
Cheers
Not very sure what you are doing with binning and rest of the code. What you are doing in R is a regression with orthogonal polynomials. You can see this for more info about why it's used.
If you are using statsmodels, you just need to provide the input matrix with transformed values of x, x^2 , x^3 then perform a QR decomposition, as also touch upon in this post. you can use sklearn to get the matrix:
import numpy as np
import statsmodels.api as sm
import pandas as pd
from sklearn.preprocessing import PolynomialFeatures
np.random.seed(111)
df = pd.DataFrame({'x':np.random.uniform(0,1,50),'y':np.random.poisson(5,50)})
xp = PolynomialFeatures(degree=6).fit_transform(df[['x']])
xp = np.linalg.qr(xp)[0][:,1:]
model = sm.GLM(df['y'],sm.add_constant(xp),family=sm.families.Poisson()).fit()
model.summary()
Generalized Linear Model Regression Results
==============================================================================
Dep. Variable: y No. Observations: 50
Model: GLM Df Residuals: 43
Model Family: Poisson Df Model: 6
Link Function: log Scale: 1.0000
Method: IRLS Log-Likelihood: -100.45
Date: Thu, 04 Mar 2021 Deviance: 33.846
Time: 22:32:57 Pearson chi2: 31.8
No. Iterations: 4
Covariance Type: nonrobust
==============================================================================
coef std err z P>|z| [0.025 0.975]
------------------------------------------------------------------------------
const 1.5458 0.066 23.495 0.000 1.417 1.675
x1 -0.2059 0.439 -0.469 0.639 -1.067 0.655
x2 0.8169 0.466 1.754 0.079 -0.096 1.730
x3 0.1178 0.442 0.267 0.790 -0.748 0.984
x4 -0.5503 0.454 -1.212 0.226 -1.440 0.340
x5 0.0035 0.457 0.008 0.994 -0.892 0.899
x6 -0.6878 0.455 -1.512 0.131 -1.579 0.204
==============================================================================
We write the data to csv
df.to_csv("data.csv")
and fit using R, you can see we get the same coefficients:
df = read.csv("data.csv",row.names=1)
summary(glm(y ~ poly(x, 6), family = poisson, data = df))
Call:
glm(formula = y ~ poly(x, 6), family = poisson, data = df)
Deviance Residuals:
Min 1Q Median 3Q Max
-2.42648 -0.73970 -0.04625 0.61351 1.81234
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) 1.545808 0.065794 23.495 <2e-16 ***
poly(x, 6)1 -0.205940 0.439457 -0.469 0.6393
poly(x, 6)2 0.816910 0.465770 1.754 0.0794 .
poly(x, 6)3 0.117815 0.441847 0.267 0.7897
poly(x, 6)4 -0.550271 0.454099 -1.212 0.2256
poly(x, 6)5 -0.003508 0.456691 -0.008 0.9939
poly(x, 6)6 0.687751 0.454953 1.512 0.1306
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
(Dispersion parameter for poisson family taken to be 1)
Null deviance: 40.443 on 49 degrees of freedom
Residual deviance: 33.846 on 43 degrees of freedom
AIC: 214.91