I am trying to add a new variable (residual) to my original data frame based on the residuals of a first-order autoregressive model using lm().
residuals(lm(Var1 ~ lag(Var1), data = panel_data))
It is similar to the question "R: Replacement has [x] rows, data has [y] - residuals from a linear model in new variable", but with groups. I already tried the code proposed there with an added group_by line. However, it produces wrong residuals for the first observation of every group. How can I adapt the following code?
library(dplyr)
library(broom)
panel_data %>%
group_by(group) %>%
lm(Var1 ~ lag(Var1), data = .) %>%
augment() %>%
select(.rownames, .std.resid) %>%
right_join(mutate(panel_data, row = as.character(row_number())),
by = c(".rownames" = "row"))
An example data set can be as follows:
library(data.table)
# Number of groups
num_groups <- 20
# Number of months
num_months <- 100
panel_data <- data.table(
group = rep(1:num_groups, each = num_months), # Group IDs
time = rep(1:num_months, times = num_groups), # Time period
Var1 = rnorm(num_groups * num_months), # Variable 1
Var2 = rnorm(num_groups * num_months) # Variable 2
)
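The lag has to be computed within each group before the model is fitted. Otherwise the first observation of each group is regressed on the last observation of the previous group. The code below creates the lag per group, fits one model per group, and joins the residuals back to the original data: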
library(dplyr)
library(broom)
panel_data %>%
group_by(group) %>%
mutate(Var1_lag = lag(Var1)) %>%
filter(!is.na(Var1_lag)) %>%
do({
model_data <- .
augment(lm(Var1 ~ Var1_lag, data = model_data), data = model_data)
}) %>%
right_join(panel_data, by = c("group", "time", "Var1")) %>%
select(group, time, Var1, Var1_lag, .resid) %>%
mutate(.resid = ifelse(is.na(Var1_lag), NA, .resid)) %>%
ungroup()
# A tibble: 2,000 × 5
   group  time    Var1 Var1_lag .resid
   <int> <int>   <dbl>    <dbl>  <dbl>
 1     1     2  0.689   -0.269   0.671
 2     1     3  1.21     0.689   1.25
 3     1     4  2.06     1.21    2.13
 4     1     5 -0.292    2.06   -0.175
 5     1     6  1.44    -0.292   1.42
 6     1     7 -0.938    1.44   -0.857
 7     1     8 -1.33    -0.938  -1.39
 8     1     9 -0.0830  -1.33   -0.163
 9     1    10  0.273   -0.0830  0.266
10     1    11 -0.466    0.273  -0.452
# ℹ 1,990 more rows
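If you would rather avoid the join, a shorter variant may also work. This is only a sketch under a few assumptions: the data are sorted by time within each group, one model per group is what you want, and the column name ar1_resid and the seed are my own illustrative choices. It relies on na.action = na.exclude, which makes resid() pad the dropped rows with NA so the result has the same length as the group.

```r
library(dplyr)

# Illustrative data with the same layout as in the question (seed is arbitrary)
set.seed(1)
num_groups <- 20
num_months <- 100
panel_data <- data.frame(
  group = rep(1:num_groups, each = num_months),
  time  = rep(1:num_months, times = num_groups),
  Var1  = rnorm(num_groups * num_months)
)

result <- panel_data %>%
  group_by(group) %>%
  arrange(time, .by_group = TRUE) %>%          # make sure the lag follows time order
  mutate(Var1_lag = dplyr::lag(Var1)) %>%      # lag within each group, NA for the first row
  mutate(
    # one AR(1)-style regression per group; na.exclude keeps the NA for the first row
    ar1_resid = as.numeric(resid(lm(Var1 ~ Var1_lag, na.action = na.exclude)))
  ) %>%
  ungroup()
```

Here the first observation of every group keeps an NA residual, and all rows stay in their original order, so no right_join is needed.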