[SOLVED] How to deal with NA in a panel data regression?

I am trying to compute fitted values for data that contain NAs, using a model estimated with plm. Here's some sample code:

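library(plm)
# Unbalanced toy panel; x is missing in the second row
test.data <- data.frame(id = c(1, 1, 2, 2, 3), time = c(1, 2, 1, 2, 1),
                        y = c(1, 3, 5, 10, 8), x = c(1, NA, 3, 4, 5))
# Pooled OLS; na.exclude drops the incomplete row before estimation
model <- plm(y ~ x, data = test.data, index = c("id", "time"),
             model = "pooling", na.action = na.exclude)
yhat <- predict(model, test.data, na.action = na.pass)
# This assignment is the line that fails
test.data$yhat <- yhat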

When I run the last line I get an error stating that the replacement has 4 rows while data has 5 rows.

I have no idea how to get predict to return a vector of length 5...

If I fit the model with lm instead of plm (as in the line below), I get the expected result.

model <- lm(y ~ x, data=test.data, na.action=na.exclude)
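For comparison, here is a minimal sketch of the lm behaviour described above, using only base R and the same toy data. The names lm.model and lm.yhat are introduced here for illustration only. With na.exclude, predict on the full data returns one element per row, with NA where x is missing, so the assignment succeeds.

```r
test.data <- data.frame(id = c(1, 1, 2, 2, 3), time = c(1, 2, 1, 2, 1),
                        y = c(1, 3, 5, 10, 8), x = c(1, NA, 3, 4, 5))

# na.exclude drops row 2 for estimation but remembers its position
lm.model <- lm(y ~ x, data = test.data, na.action = na.exclude)

# na.pass keeps the incomplete row in newdata, so its prediction is NA
lm.yhat <- predict(lm.model, test.data, na.action = na.pass)

length(lm.yhat)             # same as nrow(test.data)
test.data$yhat <- lm.yhat   # lengths match, so this works
```

fitted(lm.model) is padded the same way, because the na.exclude information is stored in the fitted lm object.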

Solution

As of version 2.6.2 of plm (2022-08-16), this should work out of the box (see also: Predict out of sample on fixed effects model). From the NEWS file:

prediction implemented for fixed effects models incl. support for argument newdata and out-of-sample prediction. Help page (?predict.plm) added to specifically explain the prediction for fixed effects models and the out-of-sample case.

On earlier versions, I think this is something that predict.plm ought to handle for you (it seems like an oversight on the package authors' part), but you can use napredict (see ?napredict) to implement it yourself:

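# Predictions for the complete rows only (length 4)
pp <- predict(model, test.data)
# Record of the rows dropped by na.exclude, stored on the model frame
na.stuff <- attr(model$model, "na.action")
# Pad the predictions with NA at the dropped positions
(yhat <- napredict(na.stuff, pp))
## [1] 1.371429       NA 5.485714 7.542857 9.600000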
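To see what napredict is doing in the workaround above, here is a small sketch that does not need plm. The na.action object is built by hand purely for illustration (normally it comes from the fitted model), and the numbers in pp are made up. The padding only happens when the object has class "exclude", which is why the model is fitted with na.action=na.exclude rather than na.omit.

```r
# Four made-up predictions, one per row that survived NA removal
pp <- c(1.4, 5.5, 7.5, 9.6)

# Hand-built stand-in for attr(model$model, "na.action"): row 2 was dropped
na.exclude.info <- structure(2L, names = "2", class = "exclude")
na.omit.info    <- structure(2L, names = "2", class = "omit")

# Class "exclude": NA is re-inserted at the dropped position
padded <- napredict(na.exclude.info, pp)

# Class "omit": the vector comes back unchanged
unpadded <- napredict(na.omit.info, pp)

length(padded)
length(unpadded)
```

Once the padded vector has one element per row of the original data, test.data$yhat <- yhat no longer raises the length error.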