Tags: python, regression, stata, statsmodels, linearmodels

asreg (Stata) and linearmodels.panel.model.FamaMacBeth (Python) give different results


I have a .dta file of loan data covering 10 years, and I want to run a Fama-MacBeth regression on it to estimate risk premiums on loan returns.

For a quick overview of what Fama-MacBeth regression is, here is an excerpt from an older Stack Overflow post:

Fama-MacBeth regression is a procedure for running regressions on panel data (N different individuals, each observed over multiple periods T, e.g. days, months, or years), so there are N x T observations in total. The panel does not need to be balanced. First, run a cross-sectional regression for each period, i.e. pool the N individuals in a given period t, and do this for t = 1, ..., T, so that T regressions are run in total. This yields a time series of coefficients for each independent variable. Hypothesis tests are then performed on that time series: usually the average is taken as the final coefficient for each independent variable, and t-stats are used to test significance.
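The two steps described in the excerpt can be sketched in a few lines of pandas and NumPy. This is only an illustration under stated assumptions: the function name fama_macbeth_sketch, the simulated panel, and the column names are hypothetical, the standard errors are the plain (unadjusted) ones, and neither asreg nor linearmodels is claimed to work exactly this way.

```python
import numpy as np
import pandas as pd


def fama_macbeth_sketch(data, time_col, y_col, x_cols):
    """Hypothetical helper: one cross-sectional OLS per period, then averages."""
    rows = []
    for _, grp in data.groupby(time_col):
        grp = grp.dropna(subset=[y_col] + x_cols)
        # Skip periods with too few observations to identify the coefficients.
        if len(grp) <= len(x_cols) + 1:
            continue
        # Step 1: cross-sectional OLS (with intercept) for this period.
        X = np.column_stack([np.ones(len(grp)), grp[x_cols].to_numpy(dtype=float)])
        y = grp[y_col].to_numpy(dtype=float)
        beta, *_ = np.linalg.lstsq(X, y, rcond=None)
        rows.append(beta)
    betas = pd.DataFrame(rows, columns=["const"] + x_cols)
    # Step 2: average the per-period coefficients and test them with t-stats.
    mean = betas.mean()
    se = betas.std(ddof=1) / np.sqrt(len(betas))
    return pd.DataFrame({"mean": mean, "std_error": se, "tstat": mean / se})


# Simulated panel (assumption for illustration): 50 units over 40 periods.
rng = np.random.default_rng(0)
n_units, n_periods = 50, 40
panel = pd.DataFrame({
    "period": np.repeat(np.arange(n_periods), n_units),
    "x1": rng.normal(size=n_units * n_periods),
    "x2": rng.normal(size=n_units * n_periods),
})
panel["y"] = 0.5 + 2.0 * panel["x1"] - 1.0 * panel["x2"] + rng.normal(size=len(panel))

summary = fama_macbeth_sketch(panel, "period", "y", ["x1", "x2"])
print(summary)
```

Because each period is estimated separately, an unbalanced panel is handled naturally: every period simply uses whatever observations it has.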

This process can be done in Stata using the asreg command. Running this on the data after declaring it as a panel gives us:

. sort FacilityID yyyymm

. xtset FacilityID yyyymm

Panel variable: FacilityID (unbalanced)
 Time variable: yyyymm, 199908 to 200911, but with gaps
         Delta: 1 unit


.  asreg ExcessRet1 Mom STM, fmb newey(3)

Fama-MacBeth Two-Step procedure (Newey SE)       Number of obs     =     58608
(Newey-West adj. Std. Err. using lags(3))        Num. time periods =       124
                                                 F(  2,   121)     =      4.69
                                                 Prob > F          =    0.0109
                                                 avg. R-squared    =    0.1284
                                                 Adj. R-squared    =    0.1245
------------------------------------------------------------------------------
             |              Newey-FMB
  ExcessRet1 | Coefficient  std. err.      t    P>|t|     [95% conf. interval]
-------------+----------------------------------------------------------------
         Mom |    5.44422   1.793302     3.04   0.003     1.893906    8.994534
         STM |   .8705018   2.164802     0.40   0.688    -3.415295    5.156298
        cons |  -.0756198   .1027633    -0.74   0.463    -.2790669    .1278273
------------------------------------------------------------------------------
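For readers unfamiliar with the "Newey-West adj. Std. Err. using lags(3)" line, the adjustment is applied to the time series of per-period coefficients. The sketch below shows one common textbook form, assuming Bartlett weights and no small-sample correction. The function name newey_west_se_of_mean and the simulated series are hypothetical, and the exact formula used by asreg or linearmodels may differ in such details.

```python
import numpy as np


def newey_west_se_of_mean(series, lags):
    """Standard error of the sample mean using Bartlett-weighted autocovariances."""
    x = np.asarray(series, dtype=float)
    T = x.size
    u = x - x.mean()
    # Lag-0 term: the sample variance (divided by T, not T - 1).
    long_run_var = u @ u / T
    for k in range(1, lags + 1):
        weight = 1.0 - k / (lags + 1.0)   # Bartlett kernel weight
        gamma_k = u[k:] @ u[:-k] / T      # lag-k sample autocovariance
        long_run_var += 2.0 * weight * gamma_k
    return np.sqrt(long_run_var / T)


# Simulated, positively autocorrelated "coefficient series" (assumption).
rng = np.random.default_rng(1)
T = 500
coef_series = np.empty(T)
coef_series[0] = rng.normal()
for t in range(1, T):
    coef_series[t] = 0.6 * coef_series[t - 1] + rng.normal()

se_plain = newey_west_se_of_mean(coef_series, lags=0)
se_nw3 = newey_west_se_of_mean(coef_series, lags=3)
print(se_plain, se_nw3)
```

With lags=0 the function reduces to the usual standard error of a mean. The choice of lag length changes only the standard errors, not the averaged coefficients.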

However, running the same procedure in Python using the FamaMacBeth class from the linearmodels package gives very different results.

The dataframe was imported into Python using pd.read_stata() and then declared as a panel before running the regression:

# A class cannot be imported with "import a.b.Class as x"; use "from ... import"
from linearmodels.panel.model import FamaMacBeth as lm_fm

factors = ["Mom", "STM"]
df = df.set_index(["FacilityID", "yyyymm"])
formula = "ExcessRet1 ~ 1 + " + " + ".join(factors)

reg = lm_fm.from_formula(formula, data=df)
res = reg.fit(cov_type='kernel', kernel="newey-west")
print(res)

prints

                             Parameter Estimates                              
==============================================================================
            Parameter  Std. Err.     T-stat    P-value    Lower CI    Upper CI
------------------------------------------------------------------------------
Intercept     -0.2338     0.1223    -1.9107     0.0561     -0.4736      0.0060
Mom            5.8947     2.0287     2.9057     0.0037      1.9184      9.8709
STM            3.9649     3.1279     1.2676     0.2050     -2.1659      10.096
==============================================================================

There is a substantial difference between the results of the two regressions. Other solutions (such as the one in the older Stack Overflow post mentioned earlier, and an implementation of the regression I found on GitHub) all give the same results as the linearmodels attempt.

What is causing this difference in the results? Is it a change in the procedure? How would I have to change my code to get the results from the Stata implementation?


Solution

  • I cannot replicate the difference on my end, so something may be off in your setup. Below I create some simulated data and post the results from my asreg program and from fama_macbeth in the finance_byu library.

    Results from the finance_byu library

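    import numpy as np
    import pandas as pd
    from finance_byu.fama_macbeth import fama_macbeth, fm_summary

    n_firms = 1.0e2
    n_periods = 1.0e2

    def firm(fid):
        # One firm: n_periods rows of random returns and factor values
        f = np.random.random((int(n_periods), 4))
        f = pd.DataFrame(f)
        f['period'] = f.index
        f['firmid'] = fid
        return f

    # Stack all firms into one long panel and name the columns
    df = [firm(i) for i in range(int(n_firms))]
    df = pd.concat(df).rename(columns={0: 'ret', 1: 'exmkt', 2: 'smb', 3: 'hml'})
    df.head()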
    
             ret       exmkt         smb       hml  period  firmid
    0   0.607847    0.264077    0.158241    0.025651    0       0
    1   0.140113    0.215597    0.262877    0.953297    1       0
    2   0.504742    0.531757    0.812430    0.937104    2       0
    3   0.709870    0.299985    0.080907    0.624482    3       0
    4   0.682049    0.455993    0.230743    0.368847    4       0
    
    result = fama_macbeth(df,'period','ret',['exmkt','smb','hml'],intercept=True)
    fm_summary(result)
    
    
                  mean      std_error     tstat
    intercept   0.483657    0.009682    49.956515
    exmkt       0.017926    0.009364     1.914239
    smb        -0.001474    0.010007    -0.147283
    hml         0.001873    0.010330     0.181276
    

    Results from asreg

    # Python: save the dataframe as a Stata data file
    df.to_stata("example.dta")
    
    * Stata: load the file, declare the panel, and run the regression
    use "example.dta" 
    tsset firmid period
    
    asreg ret exmkt smb hml, fmb
    Fama-MacBeth (1973) Two-Step procedure           Number of obs     =     10000
                                                     Num. time periods =       100
                                                     F(  3,    96)     =      1.23
                                                     Prob > F          =    0.3016
                                                     avg. R-squared    =    0.0293
                                                     Adj. R-squared    =   -0.0010
    ------------------------------------------------------------------------------
                 |            Fama-MacBeth
             ret | Coefficient  std. err.      t    P>|t|     [95% conf. interval]
    -------------+----------------------------------------------------------------
           exmkt |   .0179255   .0093643     1.91   0.059    -.0006625    .0365136
             smb |  -.0014739    .010007    -0.15   0.883    -.0213375    .0183898
             hml |   .0018726   .0103299     0.18   0.857    -.0186322    .0223773
            cons |    .483657   .0096816    49.96   0.000     .4644393    .5028748
    ------------------------------------------------------------------------------
    

    Both the regression coefficients and the standard errors are identical in the output of asreg and the finance_byu library.

    Here is the official page where more examples and uses of asreg can be found.