r panel-data plm

Two-way fixed effect estimation via `plm` fails with an error message (but works via `lm`)


I am trying to run a two-way fixed-effects panel regression using plm in R. First, I randomly generate some data. Then I create time and firm indices (two-way indexing as usual in a panel dataset) and the explanatory variable of interest (zp.dummy). Then I create a panel data frame. Then I try to fit a two-way fixed-effects panel regression via plm:

library(plm)
set.seed(0); z=rnorm(40)        # generate random data
ztime=rep(c(1:10),4)            # time index
zp.dummy=as.numeric(ztime>5)    # a dummy to distinguish first 5 from last 5 time periods
zfirm=rep(sequence(4), each=10) # firm index
zp.rete=pdata.frame(cbind(ztime,zfirm,zp.dummy,z),index=c("ztime","zfirm"))
                                # create panel data frame indexed by time and firm
colnames(zp.rete)[4]="zp.rete"  # rename a column in the panel data frame
zm1p=plm(zp.rete~zp.dummy, data=zp.rete, index=c("ztime","zfirm"), model="within", effect="twoways")               
                                # run the panel regression via `plm`

When running the last line, I get this error message:

> Error in plm.fit(data, model, effect, random.method, random.models, random.dfcor,  : 
  empty model

Question: What am I doing wrong?

I think I can achieve the desired result via lm:

zftime=as.factor(ztime)         # turn time index into factor
zffirm=as.factor(zfirm)         # turn firm index into factor
zm1 = lm(zp.rete$zp.rete~-1+zp.dummy+zffirm+zftime) 
                                # two-way fixed effects regression via `lm`

How may I replicate the result from lm by plm?


Solution

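  • Look carefully at the output of the lm model: you will notice that one factor level is not estimable (its coefficient is NA). The reason is perfect collinearity: zp.dummy varies only over time, not across firms, so it is a linear combination of the time dummies and adds no information beyond the time fixed effects.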

    # NA coefficient:
    summary(zm1)
    model.matrix(zm1) ## looks suspicious
    plm::detect.lindep(model.matrix(zm1)) ## collinear columns
    

    Now, why does plm throw an error? It first transforms the data (two-way within transformation) and then runs a plain linear regression on the transformed data; the transformed right-hand side is called the model matrix. If we look at this model matrix (the data after transformation), we see that its only column, zp.dummy, consists entirely of zeros. A model whose only regressor is a zero-only column is not estimable, so plm rightly stops with the "empty model" error.

    library(plm)
    set.seed(0); z <- rnorm(40)        # generate random data
    ztime <- rep(c(1:10),4)            # time index
    zp.dummy <- as.numeric(ztime>5)    # a dummy to distinguish first 5 from last 5 time periods
    zfirm <- rep(sequence(4), each=10) # firm index
    zp.data <- pdata.frame(cbind(ztime, zfirm, zp.dummy, z),index=c("zfirm", "ztime"))
    # create panel data frame indexed by firm (individual) and time
    colnames(zp.data)[4] <- "zp.rete"  # rename a column in the panel data frame
    # create model frame
    mf <- model.frame(zp.data, zp.rete ~ zp.dummy)
    # create model matrix
    mm <- model.matrix(mf, model = "within", effect = "twoways")
    all(mm == 0) # TRUE
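
To see the same thing without plm, here is a minimal base-R sketch for this balanced panel. It first shows that zp.dummy is an exact linear combination of the time dummies in the design matrix that lm builds, and then applies the two-way within transformation by hand. The helper within_twoways is a hypothetical name introduced only for illustration (it is not part of plm), and the formula it uses (subtract firm means and time means, add back the overall mean) assumes a balanced panel.

```r
# Base-R sketch (no plm needed); same indices and dummy as in the question.
ztime    <- rep(1:10, 4)              # time index
zfirm    <- rep(1:4, each = 10)       # firm index
zp.dummy <- as.numeric(ztime > 5)     # 0 for periods 1-5, 1 for periods 6-10

# (1) Dummy-variable view: the design matrix of the lm specification.
X <- model.matrix(~ -1 + zp.dummy + factor(zfirm) + factor(ztime))
ncol(X)      # number of columns in the design matrix
qr(X)$rank   # smaller than ncol(X): one column is redundant

# zp.dummy is exactly the sum of the time dummies for periods 6 to 10
time_cols <- paste0("factor(ztime)", 6:10)
all(X[, "zp.dummy"] == rowSums(X[, time_cols]))   # TRUE

# (2) Within view: two-way within transformation by hand.
# Hypothetical helper, valid for balanced panels only:
# x_it - mean_i(x) - mean_t(x) + overall mean(x)
within_twoways <- function(x, id, time) {
  x - ave(x, id) - ave(x, time) + mean(x)
}
zp.dummy.tw <- within_twoways(zp.dummy, zfirm, ztime)
all(zp.dummy.tw == 0)                             # TRUE

# Contrast: removing firm means only leaves variation in the dummy.
zp.dummy.id <- zp.dummy - ave(zp.dummy, zfirm)
range(zp.dummy.id)                                # -0.5  0.5
```

The last two lines suggest that the dummy only disappears once time effects are swept out: with firm effects alone, the transformed zp.dummy still varies, so its coefficient would be identified in that specification.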