rpanel

plm and factor(year) in panel data


I use the plm function in R to estimate various panel data models. To include time effects in the first-difference model, I use:

model <- plm(y ~ x1 + x2 + factor(year), data = df, model = "fd")

R automatically drops the first and the last time dummy. Is it possible to tell R which time dummies to drop? In my case, I need to drop the first and the second one.

See this example:

df <- data.frame(
  Country = rep(c("A", "B", "C", "D"), each = 5),
  y = c(12, 15, 19, 19, 29, 12, 59, 59, 9, 60, 1, 3, 7, 9, 3, 44, 66, 77, 88, 0),
  x = c(1 + 12*2, 1 + 15*3, 1 + 19*1, 1 + 90*3, 1 + 29*34,
        1 + 12*8, 1 + 59, 1 + 59, 1 + 9, 1 + 60,
        1 + 1, 1 + 3, 1 + 7, 1 + 9, 1 + 3,
        1 + 44, 1 + 66, 1 + 77, 1 + 88, 1 + 0),
  year = rep(c(1990, 1991, 1992, 1993, 1994), times = 4)
)

# Numeric unit identifier, one value per country
df$id <- rep(1:4, each = 5)

   Country  y   x year id
1        A 12  25 1990  1
2        A 15  46 1991  1
3        A 19  20 1992  1
4        A 19 271 1993  1
5        A 29 987 1994  1
6        B 12  97 1990  2
7        B 59  60 1991  2
8        B 59  60 1992  2
9        B  9  10 1993  2
10       B 60  61 1994  2
11       C  1   2 1990  3
12       C  3   4 1991  3
13       C  7   8 1992  3
14       C  9  10 1993  3
15       C  3   4 1994  3
16       D 44  45 1990  4
17       D 66  67 1991  4
18       D 77  78 1992  4
19       D 88  89 1993  4
20       D  0   1 1994  4

mydata <- pdata.frame(df, index = c("Country", "year"))

model <- plm(y ~ x + factor(year), data = mydata, model = "fd")

summary(model)


 Oneway (individual) effect First-Difference Model

Call:
plm(formula = y ~ x + factor(year), data = mydata, model = "fd")

Balanced Panel: n = 4, T = 5, N = 20
Observations used in estimation: 16

Residuals:
     Min.   1st Qu.    Median   3rd Qu.      Max. 
-65.23957 -13.70019  -0.28284  12.62938  65.88940 

Coefficients: (1 dropped because of singularities)
                  Estimate Std. Error t-value Pr(>|t|)
(Intercept)      -1.691089   8.819438 -0.1917   0.8514
x                 0.056626   0.049116  1.1529   0.2734
factor(year)1991 20.077837  14.768075  1.3595   0.2012
factor(year)1992 26.674648  17.650245  1.5113   0.1589
factor(year)1993 16.086244  15.558256  1.0339   0.3234

Total Sum of Squares:    15932
Residual Sum of Squares: 12394
R-Squared:      0.22209
Adj. R-Squared: -0.06079
F-statistic: 0.7851 on 4 and 11 DF, p-value: 0.55817

I want the estimation output to show the time effects for the years 1992, 1993, and 1994.


Solution

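  • Simply reorder the levels of your factor variable, then run the plm model. The first level is the reference category, and in the output above the last level is the one dropped because of the singularity, so put the two years you want omitted first and last:

    Factor (note how year maps to year_f: 1990 => level 5, 1991 => level 1, 1992 => level 2, ...)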

    df$year_f <- factor(df$year, levels = c("1991", "1992", "1993", "1994", "1990"))
    
    str(df)
    # 'data.frame': 20 obs. of  6 variables:
    #  $ Country: chr  "A" "A" "A" "A" ...
    #  $ y      : num  12 15 19 19 29 12 59 59 9 60 ...
    #  $ x      : num  25 46 20 271 987 97 60 60 10 61 ...
    #  $ year   : num  1990 1991 1992 1993 1994 ...
    #  $ id     : int  1 1 1 1 1 2 2 2 2 2 ...
    #  $ year_f : Factor w/ 5 levels "1991","1992",..: 5 1 2 3 4 5 1 2 3 4 ...
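
To see how the level order decides which dummy columns exist at all, here is a minimal base R sketch. It uses a single unit's years only, and the name `year_default` is illustrative rather than part of the original example. It assumes R's default treatment contrasts.

```r
# Years for one unit of the example panel.
year <- c(1990, 1991, 1992, 1993, 1994)

# Default factor: levels are sorted, so 1990 is the reference level.
year_default <- factor(year)

# Custom order: 1991 becomes the reference level and 1990 moves to the end.
year_f <- factor(year, levels = c("1991", "1992", "1993", "1994", "1990"))

print(levels(year_default))
print(levels(year_f))

# Under treatment contrasts the first level gets no dummy column;
# the remaining columns follow the level order.
print(colnames(model.matrix(~ year_default)))
print(colnames(model.matrix(~ year_f)))
```

Note that `relevel(year_default, ref = "1991")` would only move 1991 to the front and leave 1994 as the last level, which is why the full level vector is spelled out here.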
    

    Models (both variants return identical results)

    plm + index

    model <- plm(y ~ x + year_f, data = df, index = c("Country", "year"), model = "fd")
    summary(model)
    
    # Oneway (individual) effect First-Difference Model
    # 
    # Call:
    # plm(formula = y ~ x + year_f, data = df, model = "fd", index = c("Country", "year"))
    # 
    # Balanced Panel: n = 4, T = 5, N = 20
    # Observations used in estimation: 16
    # 
    # Residuals:
    #      Min.   1st Qu.    Median   3rd Qu.      Max. 
    # -65.23957 -13.70019  -0.28284  12.62938  65.88940 
    # 
    # Coefficients: (1 dropped because of singularities)
    #               Estimate Std. Error t-value Pr(>|t|)
    # (Intercept)  18.386748  16.783437  1.0955   0.2967
    # x             0.056626   0.049116  1.1529   0.2734
    # year_f1992  -13.481026  23.736104 -0.5680   0.5815
    # year_f1993  -44.147268  41.174228 -1.0722   0.3066
    # year_f1994  -80.311349  59.072302 -1.3595   0.2012
    # 
    # Total Sum of Squares:    15932
    # Residual Sum of Squares: 12394
    # R-Squared:      0.22209
    # Adj. R-Squared: -0.06079
    # F-statistic: 0.7851 on 4 and 11 DF, p-value: 0.55817
    

    pdata.frame + plm

    mydata <- pdata.frame(df, index = c("Country", "year"))
    model <- plm(y ~ x + year_f, data = mydata, model = "fd")
    summary(model)
    
    # Oneway (individual) effect First-Difference Model
    # 
    # Call:
    # plm(formula = y ~ x + year_f, data = mydata, model = "fd")
    # 
    # Balanced Panel: n = 4, T = 5, N = 20
    # Observations used in estimation: 16
    # 
    # Residuals:
    #      Min.   1st Qu.    Median   3rd Qu.      Max. 
    # -65.23957 -13.70019  -0.28284  12.62938  65.88940 
    # 
    # Coefficients: (1 dropped because of singularities)
    #               Estimate Std. Error t-value Pr(>|t|)
    # (Intercept)  18.386748  16.783437  1.0955   0.2967
    # x             0.056626   0.049116  1.1529   0.2734
    # year_f1992  -13.481026  23.736104 -0.5680   0.5815
    # year_f1993  -44.147268  41.174228 -1.0722   0.3066
    # year_f1994  -80.311349  59.072302 -1.3595   0.2012
    # 
    # Total Sum of Squares:    15932
    # Residual Sum of Squares: 12394
    # R-Squared:      0.22209
    # Adj. R-Squared: -0.06079
    # F-statistic: 0.7851 on 4 and 11 DF, p-value: 0.55817
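
As for why one time dummy is still reported as dropped: the sketch below rebuilds, in base R only, what the differenced time dummies look like for a single unit. It is an illustration of the linear dependence, not plm's internal code. The name `fd_design` is made up for the example, and the intercept column is added because the outputs above show one in the "fd" model.

```r
# One unit observed in five consecutive years, with the reordered levels.
year_f <- factor(1990:1994, levels = c("1991", "1992", "1993", "1994", "1990"))

# Dummy columns in levels; drop the intercept column (1991 is the reference).
dummies <- model.matrix(~ year_f)[, -1]

# First-difference every dummy column, then add an intercept column.
fd_design <- cbind(intercept = 1, apply(dummies, 2, diff))

print(fd_design)
print(ncol(fd_design))     # number of candidate coefficients
print(qr(fd_design)$rank)  # number of coefficients that can be identified
```

Five periods leave only four differenced rows per unit, so the five columns cannot all be linearly independent and one coefficient has to go. Changing the level order changes which year effects are reported and what they are measured against, while the coefficient on x and the residuals stay the same, as the two sets of output above show.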