I have a .dta file of loan data over the course of 10 years, and want to run a Fama-Macbeth regression on the data to estimate risk premiums on loan returns.
For a quick overview of what Fama-MacBeth regression is, here's an excerpt from an older Stack Overflow post:
Fama-MacBeth regression refers to a procedure for running regressions on panel data (where there are N different individuals and each individual is observed over multiple periods T, e.g. days, months, years). So in total there are N x T observations. Note that it's OK if the panel is not balanced. The Fama-MacBeth procedure first runs a cross-sectional regression for each period, i.e. it pools the N individuals together in a given period t, and does this for t=1,...,T. So in total T regressions are run. This gives a time series of coefficients for each independent variable, and we can then perform hypothesis tests using that time series. Usually we take the time-series average as the final coefficient for each independent variable, and we use t-stats to test significance.
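The two steps described above can be sketched in a few lines of pandas and NumPy. This is only a minimal illustration, not the implementation used by asreg or linearmodels: the function name fama_macbeth_sketch, the toy panel, and its column names (firmid, period, x1, x2, ret) are all made up for the example, and the standard error is the plain (unadjusted) one.

```python
import numpy as np
import pandas as pd

def fama_macbeth_sketch(df, time_col, y_col, x_cols):
    """Hypothetical helper: per-period OLS, then time-series averages."""
    betas = []
    for _, g in df.groupby(time_col):
        # Step 1: cross-sectional OLS (with intercept) within one period
        X = np.column_stack([np.ones(len(g)), g[x_cols].to_numpy(dtype=float)])
        y = g[y_col].to_numpy(dtype=float)
        beta, *_ = np.linalg.lstsq(X, y, rcond=None)
        betas.append(beta)
    coefs = pd.DataFrame(betas, columns=["intercept"] + list(x_cols))
    # Step 2: average the T coefficient estimates and test the average
    n_periods = len(coefs)
    mean = coefs.mean()
    std_error = coefs.std(ddof=1) / np.sqrt(n_periods)
    return pd.DataFrame({"mean": mean, "std_error": std_error,
                         "tstat": mean / std_error})

# Toy balanced panel with known coefficients (0.5, 2.0, -1.0)
rng = np.random.default_rng(0)
n_firms, n_periods = 50, 40
panel = pd.DataFrame({
    "firmid": np.repeat(np.arange(n_firms), n_periods),
    "period": np.tile(np.arange(n_periods), n_firms),
    "x1": rng.normal(size=n_firms * n_periods),
    "x2": rng.normal(size=n_firms * n_periods),
})
noise = rng.normal(scale=0.1, size=len(panel))
panel["ret"] = 0.5 + 2.0 * panel["x1"] - 1.0 * panel["x2"] + noise

summary = fama_macbeth_sketch(panel, "period", "ret", ["x1", "x2"])
print(summary)
```

Because each period is regressed separately, an unbalanced panel simply means the per-period regressions use different numbers of observations.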
This process can be done in Stata using the asreg command. Running this on the data after declaring it as a panel gives us:
. sort FacilityID yyyymm
. xtset FacilityID yyyymm
Panel variable: FacilityID (unbalanced)
Time variable: yyyymm, 199908 to 200911, but with gaps
Delta: 1 unit
. asreg ExcessRet1 Mom STM, fmb newey(3)
Fama-MacBeth Two-Step procedure (Newey SE)       Number of obs     =     58608
(Newey-West adj. Std. Err. using lags(3))        Num. time periods =       124
                                                 F(  2,   121)     =      4.69
                                                 Prob > F          =    0.0109
                                                 avg. R-squared    =    0.1284
                                                 Adj. R-squared    =    0.1245
------------------------------------------------------------------------------
             |              Newey-FMB
  ExcessRet1 | Coefficient  std. err.      t    P>|t|     [95% conf. interval]
-------------+----------------------------------------------------------------
         Mom |    5.44422   1.793302     3.04   0.003     1.893906    8.994534
         STM |   .8705018   2.164802     0.40   0.688    -3.415295    5.156298
       _cons |  -.0756198   .1027633    -0.74   0.463    -.2790669    .1278273
------------------------------------------------------------------------------
However, running the same procedure in Python using the FamaMacBeth class from the linearmodels package (documentation here) gives very different results.
The dataframe was imported into Python using pd.read_stata().
After declaring the data as a panel, I ran the regression as follows:
# FamaMacBeth is a class, so it has to be imported with "from ... import"
from linearmodels.panel.model import FamaMacBeth as lm_fm
factors = ["Mom", "STM"]
# linearmodels expects a (entity, time) MultiIndex
df = df.set_index(["FacilityID", "yyyymm"])
formula = "ExcessRet1 ~ 1 + " + " + ".join(factors)
reg = lm_fm.from_formula(formula, data=df)
res = reg.fit(cov_type='kernel', kernel="newey-west")
print(res)
prints
                             Parameter Estimates
==============================================================================
            Parameter  Std. Err.     T-stat    P-value    Lower CI    Upper CI
------------------------------------------------------------------------------
Intercept     -0.2338     0.1223    -1.9107     0.0561     -0.4736      0.0060
Mom            5.8947     2.0287     2.9057     0.0037      1.9184      9.8709
STM            3.9649     3.1279     1.2676     0.2050     -2.1659      10.096
==============================================================================
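One visible difference between the two runs is that the Stata call uses newey(3), while the linearmodels call does not pass a lag length (fit is documented to accept a bandwidth argument, which is not set here). In a Fama-MacBeth setup the lag length only enters the standard errors, not the averaged coefficients, so on its own it would not account for the different point estimates. As a rough sketch of what the lag does, here is a Newey-West (Bartlett kernel) standard error for the mean of a coefficient time series. The function name and the simulated series are assumptions for illustration, and no small-sample adjustment that a particular package may apply is reproduced.

```python
import numpy as np

def newey_west_se_of_mean(series, lags):
    """Hypothetical helper: Bartlett-kernel standard error of a sample mean."""
    x = np.asarray(series, dtype=float)
    n = x.size
    e = x - x.mean()
    # Lag-0 autocovariance
    long_run_var = e @ e / n
    for j in range(1, lags + 1):
        weight = 1.0 - j / (lags + 1.0)      # Bartlett weight
        gamma_j = e[j:] @ e[:-j] / n         # lag-j autocovariance
        long_run_var += 2.0 * weight * gamma_j
    return np.sqrt(long_run_var / n)

# Simulated stand-in for 124 per-period slope estimates (not the real data)
rng = np.random.default_rng(1)
lam = rng.normal(loc=0.02, scale=0.1, size=124)

se_plain = newey_west_se_of_mean(lam, lags=0)   # no autocorrelation adjustment
se_nw3 = newey_west_se_of_mean(lam, lags=3)     # three lags, as in newey(3)
print(lam.mean(), se_plain, se_nw3)
```

With lags=0 the result reduces to the ordinary standard error of the mean (using a 1/T variance), so any gap between se_plain and se_nw3 comes only from the autocovariance terms.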
There is a significant difference between the results of the two regressions. Other solutions (such as the one posted in the older Stack Overflow post mentioned earlier, and an implementation of the regression I found on GitHub) all give the same results as the linearmodels attempt.
What is causing this difference in the results? Is it a change in the procedure? How would I have to change my code to get the results from the Stata implementation?
You might be doing something wrong, but I cannot replicate the discrepancy on my end. I am going to create some simulated data and post the results from my asreg program and from fama_macbeth in the finance_byu library.
import numpy as np
import pandas as pd
from finance_byu.fama_macbeth import fama_macbeth, fm_summary

n_firms = 1.0e2
n_periods = 1.0e2

def firm(fid):
    # One firm: a random return and three random factors for each period
    f = np.random.random((int(n_periods), 4))
    f = pd.DataFrame(f)
    f['period'] = f.index
    f['firmid'] = fid
    return f

df = [firm(i) for i in range(int(n_firms))]
df = pd.concat(df).rename(columns={0: 'ret', 1: 'exmkt', 2: 'smb', 3: 'hml'})
df.head()
        ret     exmkt       smb       hml  period  firmid
0  0.607847  0.264077  0.158241  0.025651       0       0
1  0.140113  0.215597  0.262877  0.953297       1       0
2  0.504742  0.531757  0.812430  0.937104       2       0
3  0.709870  0.299985  0.080907  0.624482       3       0
4  0.682049  0.455993  0.230743  0.368847       4       0
result = fama_macbeth(df,'period','ret',['exmkt','smb','hml'],intercept=True)
fm_summary(result)
               mean  std_error      tstat
intercept  0.483657   0.009682  49.956515
exmkt      0.017926   0.009364   1.914239
smb       -0.001474   0.010007  -0.147283
hml        0.001873   0.010330   0.181276
# Python: save the dataframe as a Stata data file
df.to_stata("example.dta")

* Stata: load the file, declare the panel, and run the Fama-MacBeth regression
use "example.dta"
tsset firmid period
asreg ret exmkt smb hml, fmb
Fama-MacBeth (1973) Two-Step procedure           Number of obs     =     10000
                                                 Num. time periods =       100
                                                 F(  3,    96)     =      1.23
                                                 Prob > F          =    0.3016
                                                 avg. R-squared    =    0.0293
                                                 Adj. R-squared    =   -0.0010
------------------------------------------------------------------------------
             |            Fama-MacBeth
         ret | Coefficient  std. err.      t    P>|t|     [95% conf. interval]
-------------+----------------------------------------------------------------
       exmkt |   .0179255   .0093643     1.91   0.059    -.0006625    .0365136
         smb |  -.0014739    .010007    -0.15   0.883    -.0213375    .0183898
         hml |   .0018726   .0103299     0.18   0.857    -.0186322    .0223773
       _cons |    .483657   .0096816    49.96   0.000     .4644393    .5028748
------------------------------------------------------------------------------
Both the regression coefficients and the standard errors are identical in the output of asreg and the finance_byu library.
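The agreement can also be checked numerically rather than by eye. The short sketch below copies the numbers from the two outputs above into arrays (the variable names are made up for the example) and compares them to six decimals, which is the precision printed by fm_summary.

```python
import numpy as np

# Values copied from the outputs above, in the order: intercept, exmkt, smb, hml
byu_mean = np.array([0.483657, 0.017926, -0.001474, 0.001873])
byu_se = np.array([0.009682, 0.009364, 0.010007, 0.010330])
asreg_coef = np.array([0.483657, 0.0179255, -0.0014739, 0.0018726])
asreg_se = np.array([0.0096816, 0.0093643, 0.010007, 0.0103299])

# Compare with an absolute tolerance matching the six printed decimals
coef_match = np.allclose(byu_mean, asreg_coef, rtol=0.0, atol=1e-6)
se_match = np.allclose(byu_se, asreg_se, rtol=0.0, atol=1e-6)
print(coef_match, se_match)
```

Running the same kind of comparison on the loan data, period by period if needed, is one way to narrow down where the Stata and Python results start to diverge.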
Here is the official page where more examples and uses of asreg can be found.