Issue with predict.plm() when using lag() inside the regression

I've recently encountered a problem with predict() function for plm regression that uses AR(1) term in form of lag().

Here is an toy example: 3 countries, 3 periods, current population as a function a lagged population, the models are estimated with pooled OLS.

library(plm)

data <- data.frame(country = rep(c("A", "B", "C"), each = 3),
                  year = rep(c(1, 2, 3), 3),
                  population = c(100, 150, 200, 1000, 1200, 1250, 10, 20, 45)) |>
        pdata.frame(index = c('country', 'year'))

data$lag_population <- plm::lag(data$population, k = 1)

lm_model <- lm(population ~ lag_population, data)
plm_model <- plm(population ~ plm::lag(population, k = 1), data, model = 'pooling')
plm_model2 <- plm(population ~ lag_population, data, model = 'pooling')

All three models are identical:

> print(lm_model)

Call:
lm(formula = population ~ lag_population, data = data)

Coefficients:
   (Intercept)  lag_population  
        31.633           1.079  

+ print(plm_model)

Model Formula: population ~ plm::lag(population, k = 1)

Coefficients:
                (Intercept) plm::lag(population, k = 1) 
                    31.6332                      1.0787 

+ print(plm_model2)

Model Formula: population ~ lag_population

Coefficients:
   (Intercept) lag_population 
       31.6332         1.0787

Yet, their forecasts for the same data are different:

lm() based forecast is exactly what is expected, with NA in first periods for each cross-section (b/c first periods don't have lagged observations)
plm() forecast that uses generated variable lag_population instead of the lag() function, is similar to that from lm(), yet, NAs are dropped
plm() forecast that uses lag() function is the most problematic: values are shifted 1 period back, plus some values are generated for last periods (the model is pooled, without fixed effects, so how are the forecasts generated for 1st periods, especially that fill.na = FALSE is used?)

predict_plm <- predict(plm_model, newdata = data, na.fill = FALSE)
predict_plm2 <- predict(plm_model2, newdata = data, na.fill = FALSE)
predict_lm <- predict(lm_model, newdata = data)

> print(predict_lm) 
       A-1        A-2        A-3        B-1        B-2        B-3        C-1        C-2        C-3 
        NA  139.50424  193.43973         NA 1110.34313 1326.08511         NA   42.42035   53.20745 
+ print(predict_plm2)
       A-2        A-3        B-2        B-3        C-2        C-3 
 139.50424  193.43973 1110.34313 1326.08511   42.42035   53.20745 
+ print(predict_plm)
       A-1        A-2        A-3        B-1        B-2        B-3        C-1        C-2        C-3 
 139.50424  193.43973  247.37522 1110.34313 1326.08511 1380.02060   42.42035   53.20745   80.17519

Has anyone encountered a similar issue? How can I make the plm predictions look the same as those based on lm()?

Solution

EDIT: I see, this is fixed in the newest plm CRAN release 2.6-6

This is fixed already in the development version of plm, see https://github.com/ycroissant/plm/

Install the development version by, e.g.:

# install.packages("remotes") # remove '#' if pkg 'remotes' is not installed
remotes::install_github("ycroissant/plm")

For your example, it yields:

predict(plm_model, newdata = data)
##       A-2        A-3        B-2        B-3        C-2        C-3 
##  139.50424  193.43973 1110.34313 1326.08511   42.42035   53.20745

Regards the case with NAs: currently, NA-padding (via na.action) is not supported. So, plm's predict corresponds to lm's predict with these arguments:

predict(lm_model, newdata = data, na.action = na.omit)
##       A-2        A-3        B-2        B-3        C-2        C-3 
## 139.50424  193.43973 1110.34313 1326.08511   42.42035   53.20745