r dplyr broom

Residuals from linear model with lag predictor by group as a new variable


I am trying to add a new variable (residual) to my original data frame based on the residuals of a first-order autoregressive model using lm().

residuals(lm(Var1 ~ lag(Var1), panel_data))

It is similar to the question "R: Replacement has [x] rows, data has [y] - residuals from a linear model in new variable", but with groups. I already tried the code proposed there with an added group_by line. However, it produces wrong residuals for the first observation of every group. How can I adapt the following code?

library(dplyr)
library(broom)

panel_data %>% 
  group_by(group) %>%
  lm(Var1 ~ lag(Var1), data = .) %>% 
  augment() %>% 
  select(.rownames, .std.resid) %>% 
  right_join(mutate(panel_data, row = as.character(row_number())), 
             by = c(".rownames" = "row"))

An example data set can be as follows:

library(data.table)

# Number of groups
num_groups <- 20

# Number of months
num_months <- 100

panel_data <- data.table(
  group = rep(1:num_groups, each = num_months), # Group IDs
  time = rep(1:num_months, times = num_groups), # Time period
  Var1 = rnorm(num_groups * num_months), # Variable 1
  Var2 = rnorm(num_groups * num_months)  # Variable 2
)

Solution

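  • Compute the lag within each group, drop the first row of each group (its lag is NA), fit the model separately per group, and then join the residuals back onto the original data so that the first observation of each group gets NA: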

    library(dplyr)
    library(broom)
    
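    panel_data %>% 
      group_by(group) %>%
      # lag() now runs within each group, so it never crosses a group boundary
      mutate(Var1_lag = lag(Var1)) %>%
      # drop the first observation of each group (no lagged value available)
      filter(!is.na(Var1_lag)) %>%
      # fit one AR(1) model per group and attach its residuals
      do({
        model_data <- .
        augment(lm(Var1 ~ Var1_lag, data = model_data), data = model_data)
      }) %>% 
      # bring back the dropped rows; they have no match, so .resid is NA there
      right_join(panel_data, by = c("group", "time", "Var1")) %>% 
      select(group, time, Var1, Var1_lag, .resid) %>%
      # safeguard: force NA wherever no lag exists
      mutate(.resid = ifelse(is.na(Var1_lag), NA, .resid)) %>% 
      ungroup()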
    
    # A tibble: 2,000 × 5
       group  time    Var1 Var1_lag .resid
       <int> <int>   <dbl>    <dbl>  <dbl>
     1     1     2  0.689   -0.269   0.671
     2     1     3  1.21     0.689   1.25 
     3     1     4  2.06     1.21    2.13 
     4     1     5 -0.292    2.06   -0.175
     5     1     6  1.44    -0.292   1.42 
     6     1     7 -0.938    1.44   -0.857
     7     1     8 -1.33    -0.938  -1.39 
     8     1     9 -0.0830  -1.33   -0.163
     9     1    10  0.273   -0.0830  0.266
    10     1    11 -0.466    0.273  -0.452
    # ℹ 1,990 more rows
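
As a possible alternative, the same per-group residuals can probably be obtained without the filter and join steps. The sketch below is only an illustration under a few assumptions: it uses a small simulated data frame rather than the original data, the helper name add_ar1_resid and the column name resid_ar1 are made up for this example, and it relies on dplyr's group_modify() together with na.action = na.exclude, which makes residuals() pad the dropped rows with NA.

```r
library(dplyr)

# Small simulated panel (illustrative only, not the original data)
set.seed(1)
panel_data <- data.frame(
  group = rep(1:3, each = 10),
  time  = rep(1:10, times = 3),
  Var1  = rnorm(30)
)

# Hypothetical helper: fits Var1 ~ Var1_lag on one group's rows.
# na.exclude drops the row with the missing lag for fitting, but
# residuals() returns NA in its place, so the length matches nrow(d).
add_ar1_resid <- function(d, key) {
  fit <- lm(Var1 ~ Var1_lag, data = d, na.action = na.exclude)
  d$resid_ar1 <- unname(residuals(fit))
  d
}

result <- panel_data %>%
  group_by(group) %>%
  mutate(Var1_lag = lag(Var1)) %>%   # lag computed within each group
  group_modify(add_ar1_resid) %>%    # one model per group
  ungroup()

head(result)
```

With this approach the rows stay in their original order and the first observation of each group keeps an NA residual, so no re-joining is needed.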