I have trained a GLM as follows:
fitGlm = smf.glm(listOfInModelFeatures, family=sm.families.Binomial(),
                 data=train, freq_weights=train['sampleWeight']).fit()
The results look good:
print(fitGlm.summary())
                 Generalized Linear Model Regression Results
==============================================================================
Dep. Variable:                 Target   No. Observations:              1065046
Model:                            GLM   Df Residuals:               4361437.81
Model Family:                Binomial   Df Model:                            7
Link Function:                  Logit   Scale:                          1.0000
Method:                          IRLS   Log-Likelihood:            -6.0368e+05
Date:                Sun, 25 Aug 2024   Deviance:                   1.2074e+06
Time:                        09:03:54   Pearson chi2:                 4.12e+06
No. Iterations:                     8   Pseudo R-squ. (CS):             0.1716
Covariance Type:            nonrobust
===========================================================================================
                              coef    std err          z      P>|z|      [0.025      0.975]
-------------------------------------------------------------------------------------------
Intercept                   3.2530      0.003   1074.036      0.000       3.247       3.259
feat1                       0.6477      0.004    176.500      0.000       0.641       0.655
feat2                       0.3939      0.006     71.224      0.000       0.383       0.405
feat3                       0.1990      0.007     28.294      0.000       0.185       0.213
feat4                       0.4932      0.009     54.614      0.000       0.476       0.511
feat5                       0.4477      0.005     90.323      0.000       0.438       0.457
feat6                       0.3031      0.005     57.572      0.000       0.293       0.313
feat7                       0.3711      0.004     87.419      0.000       0.363       0.379
===========================================================================================
I then tried to export the summary() output to a .png file, as suggested here:
Python: How to save statsmodels results as image file?
So, I have written this code:
fig, ax = plt.subplots(figsize=(16, 8))
summary = []
fitGlm.summary(print_fn=lambda x: summary.append(x))
summary = '\n'.join(summary)
ax.text(0.01, 0.05, summary, fontfamily='monospace', fontsize=12)
ax.axis('off')
plt.tight_layout()
plt.savefig('output.png', dpi=300, bbox_inches='tight')
But I get this error:
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
Cell In[57], line 57
55 fig, ax = plt.subplots(figsize=(16, 8))
56 summary = []
---> 57 fitGlm.summary(print_fn=lambda x: summary.append(x))
58 summary = '\n'.join(summary)
59 ax.text(0.01, 0.05, summary, fontfamily='monospace', fontsize=12)
TypeError: GLMResults.summary() got an unexpected keyword argument 'print_fn'
It looks like print_fn is not recognized by statsmodels?
Can someone help me, please?
I tried to find out where print_fn can be used and also checked the solution posted in the linked question, but I could not find print_fn anywhere in the statsmodels documentation.
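The traceback in the question says the same thing: GLMResults.summary() takes no print_fn argument. (Keras's model.summary() does accept one, which may be where the snippet came from, but that is a guess.) Since the summary object can be turned into a string directly, the callback is not needed. Below is a minimal sketch of the text-to-PNG step; save_text_as_png is a hypothetical helper name, and the demo string stands in for the real summary text.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend, so no display is required
import matplotlib.pyplot as plt


def save_text_as_png(text, path, fontsize=12, dpi=300):
    """Render a block of monospace text into a PNG file (hypothetical helper)."""
    fig, ax = plt.subplots(figsize=(16, 8))
    ax.axis("off")  # hide the axes so only the text is visible
    ax.text(0.01, 0.05, text, fontfamily="monospace", fontsize=fontsize)
    fig.savefig(path, dpi=dpi, bbox_inches="tight")
    plt.close(fig)  # release the figure's memory


# Stand-in text for illustration only; with the model from the question
# this would be fitGlm.summary().as_text() or str(fitGlm.summary()).
demo_text = "Intercept   3.2530\nfeat1       0.6477"
save_text_as_png(demo_text, "output.png")
```

With the fitted model from the question, the call would be save_text_as_png(fitGlm.summary().as_text(), 'output.png'). The monospace font matters here, because the summary relies on fixed-width columns for its alignment.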
As an alternative, I converted one of the summary tables to a matplotlib table in order to save it as a PNG:
from io import StringIO

import matplotlib.pyplot as plt
import pandas as pd

# Convert one summary table to a pandas DataFrame.
# model is the fitted results object (fitGlm in the question).
# Change tables[0] to tables[1] to get the second (coefficient) table.
# StringIO avoids the deprecation warning newer pandas versions raise for literal HTML.
html = model.summary().tables[0].as_html()
summary_df = pd.read_html(StringIO(html), header=0, index_col=0)[0]

# Create a new figure and remove the axes
fig, ax = plt.subplots()
ax.axis('off')

# Add the table to the axes, keeping the column headers and the row labels
# (the index column would otherwise be dropped by .values)
table = ax.table(
    cellText=summary_df.astype(str).values.tolist(),
    colLabels=[str(c) for c in summary_df.columns],
    rowLabels=[str(r) for r in summary_df.index],
    loc='center',
)

# Use a fixed font size and stretch the rows vertically
table.auto_set_font_size(False)
table.set_fontsize(10)
table.scale(1, 1.5)

# Save the figure as a PNG file
fig.savefig('summary2.png', dpi=300, bbox_inches='tight')
In my opinion, saving data as a PNG is a very unusual choice, because an image prevents others from reusing the numbers. The summary can instead be exported to CSV or LaTeX. If you are doing this manually, I would suggest exporting to CSV and pasting the result as an image, or even saving it as a text file and taking a screenshot.
For reference:
model.summary().as_csv()
# save as csv
with open('summary.csv', 'w') as file:
    file.write(model.summary().as_csv())
or
text = model.summary().as_text()
# save to txt
with open('summary.txt', 'w') as file:
    file.write(text)