I am estimating a fixed-effects model on panel data using the Python statsmodels package.
The data consist of regressors X1, X2, X3 and an outcome Y, observed over time for several companies. Below is a small excerpt; the actual data form a balanced panel of about 5,000 companies observed over one year.
| date | firm | X1 | X2 | X3 | Y |
|:---------- |:----:|:--:|:--:|:--:|--:|
| 2021-01-01 | A | 1 | 4 | 1 | 10|
| 2021-01-02 | A | 2 | 7 | 0 | 21|
| 2021-01-03 | A | 4 | 3 | 1 | 12|
| 2021-01-01 | B | 2 | 1 | 0 | 4 |
| 2021-01-02 | B | 3 | 7 | 1 | 9 |
| 2021-01-03 | B | 7 | 1 | 1 | 4 |
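Since the full data are described as a balanced panel, here is a quick pandas check that every firm is observed exactly once on every date. `is_balanced` is a hypothetical helper, not part of any library, and the column names `firm` and `date` are taken from the excerpt above.

```python
import pandas as pd


def is_balanced(panel: pd.DataFrame, entity: str = "firm", time: str = "date") -> bool:
    """Hypothetical helper: True if each entity appears exactly once in every period."""
    # A duplicated (entity, time) pair already breaks the panel structure
    if panel.duplicated([entity, time]).any():
        return False
    n_periods = panel[time].nunique()
    # Every entity must cover all distinct periods present in the data
    return bool((panel.groupby(entity)[time].nunique() == n_periods).all())


sample = pd.DataFrame({
    "date": pd.to_datetime(["2021-01-01", "2021-01-02", "2021-01-03"] * 2),
    "firm": ["A"] * 3 + ["B"] * 3,
    "X1": [1, 2, 4, 2, 3, 7],
    "X2": [4, 7, 3, 1, 7, 1],
    "X3": [1, 0, 1, 0, 1, 1],
    "Y": [10, 21, 12, 4, 9, 4],
})

print(is_balanced(sample))            # True
print(is_balanced(sample.iloc[:-1]))  # False: firm B is missing 2021-01-03
```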
Estimating a fixed-effects model that controls for firm effects with the code below works without any problems:
```python
mod = PanelOLS.from_formula('Y ~ X1 + X2 + X3 + EntityEffects',
                            data=df.set_index(['firm', 'date']))
result = mod.fit(cov_type='clustered', cluster_entity=True)
result.summary
```
[output]
However, the summary does not report an intercept term, and I would like to see it.
Is there an option to force the intercept to be included in the output?
It is not very clear from the GitHub repository, but the effects appear to be stored in `result.estimated_effects`. You should also mention that `PanelOLS` comes from `linearmodels`, not `statsmodels`.
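Why no single intercept shows up: with `EntityEffects` the fitted model is, in the usual notation (the symbols here are illustrative, not taken from the question),

$$
y_{it} = \alpha_i + x_{it}'\beta + \varepsilon_{it},
$$

so every firm $i$ has its own intercept $\alpha_i$. Writing $\alpha_i = c + (\alpha_i - c)$ for any constant $c$ leaves the fitted values unchanged, so a separate common intercept is not identified without an extra normalization. Once $\hat\beta$ is estimated on firm-demeaned data, each firm's effect follows from its means:

$$
\hat\alpha_i = \bar y_i - \bar x_i'\hat\beta .
$$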
```python
from linearmodels import PanelOLS
import pandas as pd

# Toy version of the panel from the question
df = pd.DataFrame({'date': ['2021-01-01', '2021-01-02', '2021-01-03',
                            '2021-01-01', '2021-01-02', '2021-01-03'],
                   'firm': ['A', 'A', 'A', 'B', 'B', 'B'],
                   'X1': [1, 2, 4, 2, 3, 7], 'X2': [4, 7, 3, 1, 7, 1],
                   'X3': [1, 0, 1, 0, 1, 1], 'Y': [10, 21, 12, 4, 9, 4]})
df['date'] = pd.to_datetime(df['date'])  # convert strings to datetimes for the time index

# Entity level first, time level second
mod = PanelOLS.from_formula('Y ~ X1 + X2 + X3 + EntityEffects',
                            data=df.set_index(['firm', 'date']))
result = mod.fit(cov_type='clustered', cluster_entity=True)

# Estimated firm effect for every observation
print(result.estimated_effects)
```
```
                 estimated_effects
firm date
A    2021-01-01           8.179545
     2021-01-02           8.179545
     2021-01-03           8.179545
B    2021-01-01           0.258438
     2021-01-02           0.258438
     2021-01-03           0.258438
```
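To see what `estimated_effects` contains, here is a minimal numpy/pandas sketch of the within (firm-demeaning) estimator. It does not use linearmodels and assumes entity effects only, with no time effects and no common constant. The slopes come from OLS on firm-demeaned data, and each firm's effect is its mean of Y minus its mean of the regressors times the slopes. On the toy data this reproduces the values printed above.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "date": pd.to_datetime(["2021-01-01", "2021-01-02", "2021-01-03"] * 2),
    "firm": ["A"] * 3 + ["B"] * 3,
    "X1": [1, 2, 4, 2, 3, 7],
    "X2": [4, 7, 3, 1, 7, 1],
    "X3": [1, 0, 1, 0, 1, 1],
    "Y": [10, 21, 12, 4, 9, 4],
})
xcols = ["X1", "X2", "X3"]

# Within transformation: subtract each firm's own mean from Y and the regressors
means = df.groupby("firm")[xcols + ["Y"]].transform("mean")
demeaned = df[xcols + ["Y"]] - means

# OLS without a constant on the demeaned data gives the slope estimates
beta, *_ = np.linalg.lstsq(demeaned[xcols].to_numpy(),
                           demeaned["Y"].to_numpy(), rcond=None)

# Firm effect (the firm's own intercept): mean(Y_i) - mean(X_i) @ beta
firm_means = df.groupby("firm")[xcols + ["Y"]].mean()
alpha = (firm_means["Y"] - firm_means[xcols].to_numpy() @ beta).rename("entity_effect")

print(pd.Series(beta, index=xcols))
print(alpha)  # A ≈ 8.179545, B ≈ 0.258438
```

Because `result.estimated_effects` repeats each firm's value on every date, `result.estimated_effects.groupby(level='firm').first()` collapses it to one intercept per firm.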