I'm grouping some variables and lagging (4 time periods) but I get NA
values. I've tried using dplyr::lag
and plm::lag
test = structure(list(cnty90 = c(1005, 1005, 1005, 1005, 1005, 1005,
1005, 1005, 1005, 1005, 1005, 1005, 1005, 1005, 1005, 1005, 1015,
1015, 1015, 1015, 1015, 1015, 1015, 1015, 1015, 1015, 1015, 1015,
1015, 1015, 1015, 1015, 1031, 1031, 1031, 1031, 1031, 1031, 1031,
1031, 1031, 1031, 1031, 1031, 1031, 1031, 1031, 1031, 1039, 1039),
year = c(1868, 1872, 1876, 1880, 1884, 1888, 1892, 1896, 1900,
1904, 1908, 1912, 1916, 1920, 1924, 1928, 1868, 1872, 1876, 1880,
1884, 1888, 1892, 1896, 1900, 1904, 1908, 1912, 1916, 1920, 1924,
1928, 1868, 1872, 1876, 1880, 1884, 1888, 1892, 1896, 1900, 1904,
1908, 1912, 1916, 1920, 1924, 1928, 1868, 1872),
numdailies = c(0, 0, 0, 0, 2, 2, 2, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0,
0, 2,2, 1, 1, 2, 2, 2, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 1, 0, 0, 0, 0, 0)), row.names = c(NA, 50L), class = "data.frame")
using dplyr
test %>%
group_by(cnty90, year) %>%
mutate(numdailies_l1 = lag(numdailies, n = 1),
changedailies_for=lead(numdailies, n=1))
# A tibble: 50 x 6
# Groups: cnty90, year [50]
cnty90 year numdailies numdailies_l1 changedailies changedailies_for
<dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 1005 1868 0 NA NA NA
2 1005 1872 0 NA NA NA
3 1005 1876 0 NA NA NA
4 1005 1880 0 NA NA NA
5 1005 1884 2 NA NA NA
6 1005 1888 2 NA NA NA
7 1005 1892 2 NA NA NA
8 1005 1896 1 NA NA NA
9 1005 1900 1 NA NA NA
10 1005 1904 1 NA NA NA
11 1005 1908 1 NA NA NA
12 1005 1912 1 NA NA NA
13 1005 1916 1 NA NA NA
14 1005 1920 1 NA NA NA
15 1005 1924 1 NA NA NA
16 1005 1928 1 NA NA NA
17 1015 1868 0 NA NA NA
18 1015 1872 0 NA NA NA
19 1015 1876 0 NA NA NA
20 1015 1880 0 NA NA NA
using data.table
test = data.table::data.table(test)
test[, ":=" (numdailies_l1 = lag(numdailies, n = 1),
changedailies=numdailies-numdailies_l1,
changedailies_for=lead(numdailies, n=1)), by = .(cnty90, year)]
# A tibble: 20 x 6
cnty90 year numdailies numdailies_l1 changedailies changedailies_for
<int> <int> <int> <int> <int> <int>
1 1005 1868 0 NA NA 0
2 1005 1872 0 0 0 0
3 1005 1876 0 0 0 0
4 1005 1880 0 0 0 2
5 1005 1884 2 0 2 0
6 1005 1888 2 2 0 0
7 1005 1892 2 2 0 -1
8 1005 1896 1 2 -1 0
9 1005 1900 1 1 0 0
10 1005 1904 1 1 0 0
11 1005 1908 1 1 0 0
12 1005 1912 1 1 0 0
13 1005 1916 1 1 0 0
14 1005 1920 1 1 0 0
15 1005 1924 1 1 0 0
16 1005 1928 1 1 0 NA
17 1015 1868 0 NA NA 0
18 1015 1872 0 0 0 0
19 1015 1876 0 0 0 0
20 1015 1880 0 0 0 0
Try not to group by year. test %>% group_by(cnty90) %>%...
When you are grouping, you are actually splitting the original data frame into a series of smaller data frames, with each "new" piece holding the grouping variables equal.
In your provided sample:
cnty90 year numdailies numdailies_l1 changedailies changedailies_for
<dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 1005 1868 0 NA NA NA
2 1005 1872 0 NA NA NA
3 1005 1876 0 NA NA NA
4 1005 1880 0 NA NA NA
5 1005 1884 2 NA NA NA
For every combination of "Cntry90" and "year" there is only 1 row each, thus there are no other rows to lag, thus creating the NA.
By only grouping by cnty90 variable, there are now 16 rows for cnty90 =1005 thus the ability to perform the lag function. The same will also apply when cnty90 =1015