I would like to control for a factor variable that contains over a hundred levels without outputting the results of that control to a summary table. Note, I am also interested in replicating the speed of Stata's command, rather than merely cosmetic changes to output.
In Stata I can use "absorb" like so:
use http://www.stata-press.com/data/r14/abdata.dta, clear
. xtreg n w k i.year, fe
Fixed-effects (within) regression Number of obs = 1,031
Group variable: id Number of groups = 140
R-sq: Obs per group:
within = 0.6277 min = 7
between = 0.8473 avg = 7.4
overall = 0.8346 max = 9
F(10,881) = 148.56
corr(u_i, Xb) = 0.5666 Prob > F = 0.0000
------------------------------------------------------------------------------
n | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
w | -.2731482 .0551503 -4.95 0.000 -.3813896 -.1649068
k | .5648036 .0212211 26.62 0.000 .5231537 .6064535
|
year |
1977 | -.0347963 .0188134 -1.85 0.065 -.0717206 .0021281
1978 | -.0583286 .0190916 -3.06 0.002 -.0957989 -.0208583
1979 | -.070047 .0190414 -3.68 0.000 -.1074187 -.0326752
1980 | -.0889378 .0189788 -4.69 0.000 -.1261867 -.0516889
1981 | -.1401502 .0186309 -7.52 0.000 -.1767163 -.1035841
1982 | -.1603768 .0188132 -8.52 0.000 -.1973008 -.1234528
1983 | -.1621103 .0222902 -7.27 0.000 -.2058585 -.1183621
1984 | -.1258136 .0282391 -4.46 0.000 -.1812373 -.0703899
|
_cons | 2.255419 .1772614 12.72 0.000 1.907515 2.603323
-------------+----------------------------------------------------------------
sigma_u | .64723143
sigma_e | .12836859
rho | .96215208 (fraction of variance due to u_i)
------------------------------------------------------------------------------
F test that all u_i=0: F(139, 881) = 126.32 Prob > F = 0.0000
Using absorb keeps the fixed effects in the model but drops their coefficients from the output:
. reghdfe n w k, absorb(id year)
(converged in 7 iterations)
HDFE Linear regression Number of obs = 1,031
Absorbing 2 HDFE groups F( 2, 881) = 362.67
Prob > F = 0.0000
R-squared = 0.9922
Adj R-squared = 0.9908
Within R-sq. = 0.4516
Root MSE = 0.1284
------------------------------------------------------------------------------
n | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
w | -.2731482 .0551503 -4.95 0.000 -.3813896 -.1649068
k | .5648036 .0212211 26.62 0.000 .5231537 .6064535
-------------+----------------------------------------------------------------
Absorbed | F(147, 881) = 120.660 0.000 (Joint test)
------------------------------------------------------------------------------
Absorbed degrees of freedom:
---------------------------------------------------------------+
Absorbed FE | Num. Coefs. = Categories - Redundant |
-------------+-------------------------------------------------|
id | 140 140 0 |
year | 8 9 1 |
---------------------------------------------------------------+
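For intuition, here is a rough base-R sketch of what absorbing a single factor amounts to: demean the outcome and the regressors within each group, then regress the demeaned variables on each other. The data are simulated and every name (df, id, x, y) is made up for illustration. It is not a description of how reghdfe is implemented internally.

```r
# Simulated panel; all names here (df, id, x, y) are hypothetical.
set.seed(1)
n_groups <- 50
n_per    <- 8
df <- data.frame(id = rep(seq_len(n_groups), each = n_per))
alpha <- rnorm(n_groups)                 # group-level fixed effects
df$x  <- rnorm(nrow(df)) + alpha[df$id]  # regressor correlated with the effects
df$y  <- 2 * df$x + alpha[df$id] + rnorm(nrow(df), sd = 0.1)

# Dummy-variable regression: one coefficient per level of id in the output.
fit_dummies <- lm(y ~ x + factor(id), data = df)

# Within transformation: subtract group means, then regress without an intercept.
df$y_dm <- df$y - ave(df$y, df$id)
df$x_dm <- df$x - ave(df$x, df$id)
fit_within <- lm(y_dm ~ x_dm - 1, data = df)

# Compare the slope on x from both fits.
slopes <- c(dummies = unname(coef(fit_dummies)["x"]),
            within  = unname(coef(fit_within)["x_dm"]))
print(slopes)
```

The two slopes should agree up to floating-point error. The standard errors from the plain lm on demeaned data do not, because lm does not know that the group means used up degrees of freedom. With two or more absorbed factors a single round of demeaning is generally not enough in unbalanced panels, which is presumably why the reghdfe output above reports a number of iterations.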
The best alternative I could find is the lfe package, which implements models with high-dimensional fixed effects and/or instrumental variables.
You can specify fixed effects after a vertical bar like so:
felm(n ~ w + k | year, data = df)
The year coefficients will be absorbed in the final specification. The problem with this method is that it does not allow you to predict new observations.
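To mirror absorb(id year) from the Stata call, both factors can go after the bar. Below is a minimal sketch on a simulated panel, since I am assuming the abdata file is not already loaded in R. The data frame df and the simulated effects are stand-ins, not the real data, and the comparison against lm with explicit dummies is only there as a sanity check.

```r
library(lfe)

# Simulated stand-in for the panel (hypothetical data, not abdata itself).
set.seed(42)
df <- expand.grid(id = 1:140, year = 1976:1984)
firm_fe <- rnorm(140)[df$id]                     # firm-level effect per row
year_fe <- rnorm(9, sd = 0.2)[df$year - 1975]    # year-level effect per row
df$w <- rnorm(nrow(df)) + 0.5 * firm_fe
df$k <- rnorm(nrow(df))
df$n <- -0.3 * df$w + 0.6 * df$k + firm_fe + year_fe + rnorm(nrow(df), sd = 0.1)
df$id   <- factor(df$id)
df$year <- factor(df$year)

# Absorb both factors: only w and k appear in the coefficient table.
est <- felm(n ~ w + k | id + year, data = df)
print(summary(est))

# Sanity check against the dummy-variable regression.
fit_lm <- lm(n ~ w + k + id + year, data = df)
print(all.equal(as.numeric(coef(est)),
                as.numeric(coef(fit_lm)[c("w", "k")]),
                tolerance = 1e-6))
```

In-sample fitted values are stored in est$fitted.values, and getfe(est) recovers estimates of the absorbed effects, so predictions for new rows could in principle be assembled by hand from those pieces. I have not found that convenient in practice.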
Edit: Update
The R library estimatr has the function lm_robust, which has a fixed_effects parameter that absorbs FE and works better than any library I've found online. Highly recommend.
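A short sketch of the lm_robust call, again on a simulated panel (df and its columns are hypothetical stand-ins for the real data). I set se_type = "classical" here on the assumption that you want standard errors comparable to the plain Stata output. The default in estimatr is a robust estimator.

```r
library(estimatr)

# Simulated stand-in for the panel (hypothetical data, not abdata itself).
set.seed(42)
df <- expand.grid(id = 1:140, year = 1976:1984)
firm_fe <- rnorm(140)[df$id]                     # firm-level effect per row
year_fe <- rnorm(9, sd = 0.2)[df$year - 1975]    # year-level effect per row
df$w <- rnorm(nrow(df)) + 0.5 * firm_fe
df$k <- rnorm(nrow(df))
df$n <- -0.3 * df$w + 0.6 * df$k + firm_fe + year_fe + rnorm(nrow(df), sd = 0.1)
df$id   <- factor(df$id)
df$year <- factor(df$year)

# fixed_effects takes a one-sided formula; those factors are projected out
# and do not show up in the summary table.
fit_abs <- lm_robust(n ~ w + k, data = df,
                     fixed_effects = ~ id + year,
                     se_type = "classical")
print(summary(fit_abs))

# Sanity check against the dummy-variable regression.
fit_lm <- lm(n ~ w + k + id + year, data = df)
print(all.equal(as.numeric(coef(fit_abs)),
                as.numeric(coef(fit_lm)[c("w", "k")]),
                tolerance = 1e-6))
```

Only w and k should be listed in the summary, with the same point estimates as the regression that spells out every dummy.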