I have a large panel data set and I would like to lag and lead a variable by 1 month and 6 business days.
I know, for instance, from dplyr
there is the lag
or lead
function. However, I also need to group by data based on the "Names" in the panel data.
My data look like this:
structure(list(Date = c("01.08.2018", "02.08.2018", "03.08.2018",
"04.08.2018", "05.08.2018", "06.04.2019", "07.04.2019", "08.04.2019",
"01.08.2018", "02.08.2018", "03.08.2018", "04.08.2018", "06.04.2019",
"07.04.2019", "08.04.2019", "01.08.2018", "02.08.2018", "03.08.2018",
"04.08.2018", "05.08.2018", "07.04.2019", "08.04.2019"), Name = c("A",
"A", "A", "A", "A", "A", "A", "A", "A", "B", "B", "B", "B", "B",
"B", "C", "C", "C", "C", "C", "C", "C"), Rating = c(1L, 1L, 1L,
3L, 3L, 4L, 4L, 4L, 3L, 3L, 2L, 2L, 2L, 1L, 1L, 1L, 3L, 3L, 3L,
5L, 5L, 5L), Size = c(1234L, 24123L, 23L, 1L, 23L, 3L, 23L, 4L,
323L, 3424L, 523L, 234L, 35L, 354L, 45L, 23L, 46L, 456L, 546L,
24L, 134L, 1L)), class = "data.frame", row.names = c(NA, -22L
))
It is just a simplified version. My real data lasts from 01.08.2018 to 31.12.2021. How can I only lag and lead the variable called "Rating" by 1 month and 6 business days?
My difficulty is that I have 1 month and 6 business days and not just one variable in the dataframe. All the other variables should not be adjusted.
So far I tried this:
Data_2 <- Data %>%
group_by(Name) %>%
lag('Rating')
Data_3 <- Data %>%
group_by(Name) %>%
lead('Rating')
But this is not what I am aiming for.
EDIT:
My output should look like this in the case of lead: (I just used the first 5 rows to illustrate)
structure(list(Date = c("10.09.2018", "11.09.2018", "12.09.2018",
"13.09.2018", "14.09.2018"), Name = c("A", "A", "A", "A", "A"
), Rating = c(1L, 1L, 1L, 3L, 3L), Size = c("Size from 10.09.2018 would be here",
"Size from 11.09.2018 would be here", "Size from 12.09.2018 would be here",
"Size from 13.09.2018 would be here", "Size from 14.09.2018 would be here"
)), class = "data.frame", row.names = c(NA, -5L))
So for row 1 I added 1 month and 6 business days which gives me 10.09.2018 and so on. The "Rating" will then be the one from 01.08.2018 but the "Size" will be the figure that was actually also reported on 10.09.2018. Then, I would like to do the same but go backwards 1 month and 6 business days.
Here's an approach that would work for "x days later." In this case I use 2 days later to demonstrate on your data, but 35 days later might be good to get the 5 week later #, with same day of week and so should be another "business day" most of the time.
# Convert dates to a date format that can be calculated upon
Data2 <- Data %>% mutate(Date = lubridate::dmy(Date))
Data2 %>%
mutate(Date_future = Date + 2) %>%
left_join(Data2, by = c("Name", "Date_future" = "Date"),
suffix = c("_now", "_future"))
# pipe into line below to just show selected columns
# select(Date_future, Name, Rating_now, Size_future)
Result
Date Name Rating_now Size_now Date_future Rating_future Size_future
1 2018-08-01 A 1 1234 2018-08-03 1 23
2 2018-08-02 A 1 24123 2018-08-04 3 1
3 2018-08-03 A 1 23 2018-08-05 3 23
4 2018-08-04 A 3 1 2018-08-06 NA NA
5 2018-08-05 A 3 23 2018-08-07 NA NA
6 2019-04-06 A 4 3 2019-04-08 4 4
7 2019-04-07 A 4 23 2019-04-09 NA NA
8 2019-04-08 A 4 4 2019-04-10 NA NA
9 2018-08-01 A 3 323 2018-08-03 1 23
10 2018-08-02 B 3 3424 2018-08-04 2 234
11 2018-08-03 B 2 523 2018-08-05 NA NA
12 2018-08-04 B 2 234 2018-08-06 NA NA
13 2019-04-06 B 2 35 2019-04-08 1 45
14 2019-04-07 B 1 354 2019-04-09 NA NA
15 2019-04-08 B 1 45 2019-04-10 NA NA
16 2018-08-01 C 1 23 2018-08-03 3 456
17 2018-08-02 C 3 46 2018-08-04 3 546
18 2018-08-03 C 3 456 2018-08-05 5 24
19 2018-08-04 C 3 546 2018-08-06 NA NA
20 2018-08-05 C 5 24 2018-08-07 NA NA
21 2019-04-07 C 5 134 2019-04-09 NA NA
22 2019-04-08 C 5 1 2019-04-10 NA NA