rpredictionpanel-dataplm

Issue with predict.plm() when using lag() inside the regression


I've recently encountered a problem with predict() function for plm regression that uses AR(1) term in form of lag().

Here is an toy example: 3 countries, 3 periods, current population as a function a lagged population, the models are estimated with pooled OLS.

library(plm)

data <- data.frame(country = rep(c("A", "B", "C"), each = 3),
                  year = rep(c(1, 2, 3), 3),
                  population = c(100, 150, 200, 1000, 1200, 1250, 10, 20, 45)) |>
        pdata.frame(index = c('country', 'year'))

data$lag_population <- plm::lag(data$population, k = 1)

lm_model <- lm(population ~ lag_population, data)
plm_model <- plm(population ~ plm::lag(population, k = 1), data, model = 'pooling')
plm_model2 <- plm(population ~ lag_population, data, model = 'pooling')

All three models are identical:

> print(lm_model)

Call:
lm(formula = population ~ lag_population, data = data)

Coefficients:
   (Intercept)  lag_population  
        31.633           1.079  

+ print(plm_model)

Model Formula: population ~ plm::lag(population, k = 1)

Coefficients:
                (Intercept) plm::lag(population, k = 1) 
                    31.6332                      1.0787 

+ print(plm_model2)

Model Formula: population ~ lag_population

Coefficients:
   (Intercept) lag_population 
       31.6332         1.0787 


Yet, their forecasts for the same data are different:

predict_plm <- predict(plm_model, newdata = data, na.fill = FALSE)
predict_plm2 <- predict(plm_model2, newdata = data, na.fill = FALSE)
predict_lm <- predict(lm_model, newdata = data)

> print(predict_lm) 
       A-1        A-2        A-3        B-1        B-2        B-3        C-1        C-2        C-3 
        NA  139.50424  193.43973         NA 1110.34313 1326.08511         NA   42.42035   53.20745 
+ print(predict_plm2)
       A-2        A-3        B-2        B-3        C-2        C-3 
 139.50424  193.43973 1110.34313 1326.08511   42.42035   53.20745 
+ print(predict_plm)
       A-1        A-2        A-3        B-1        B-2        B-3        C-1        C-2        C-3 
 139.50424  193.43973  247.37522 1110.34313 1326.08511 1380.02060   42.42035   53.20745   80.17519 

Has anyone encountered a similar issue? How can I make the plm predictions look the same as those based on lm()?


Solution

  • This is fixed already in the development version of plm, see github.com/ycroissant/plmhttps://github.com/ycroissant/plm/

    Install the development version by, e.g.:

    # install.packages("remotes") # remove '#' if pkg 'remotes' is not installed
    remotes::install_github("ycroissant/plm")
    

    For your example, it yields:

    predict(plm_model, newdata = data)
    ##       A-2        A-3        B-2        B-3        C-2        C-3 
    ##  139.50424  193.43973 1110.34313 1326.08511   42.42035   53.20745 
    

    Regards the case with NAs: currently, NA-padding (via na.action) is not supported. So, plm's predict corresponds to lm's predict with these arguments:

    predict(lm_model, newdata = data, na.action = na.omit)
    ##       A-2        A-3        B-2        B-3        C-2        C-3 
    ## 139.50424  193.43973 1110.34313 1326.08511   42.42035   53.20745