I have a data set DF with the following data
Zone | Year | X | Y |
---|---|---|---|
1001 | 2018 | 10 | 5 |
1001 | 2019 | 20 | 10 |
1001 | 2020 | 30 | 20 |
1002 | 2018 | 15 | 10 |
1002 | 2019 | 25 | 20 |
1002 | 2020 | 35 | 40 |
I want to create a column Z = X + Y - Previous year's Y So it creates the following Table:
Zone | Year | X | Y | Z |
---|---|---|---|---|
1001 | 2018 | 10 | 5 | NA |
1001 | 2019 | 20 | 10 | 25 |
1001 | 2020 | 30 | 20 | 40 |
1002 | 2018 | 15 | 10 | NA |
1002 | 2019 | 25 | 20 | 35 |
1002 | 2020 | 35 | 40 | 55 |
I can use "mutate" from DPLYR to generate column Z: mutate(DF, Z = X + Y - lag(Y))
I can use tapply to apply recursively on DF. Can I create a function using DPLYR in a user-defined function to apply this using tapply later?
In dplyr
you can add group_by
to apply a function for every group (Zone
).
library(dplyr)
DF %>% group_by(Zone) %>% mutate(Z = X + Y - lag(Y))
# Zone Year X Y Z
# <int> <int> <int> <int> <int>
#1 1001 2018 10 5 NA
#2 1001 2019 20 10 25
#3 1001 2020 30 20 40
#4 1002 2018 15 10 NA
#5 1002 2019 25 20 35
#6 1002 2020 35 40 55
We can also write a function :
add_new_col = function(x, y) {
x + y - lag(y)
}
which can be used as :
DF %>% group_by(Zone) %>% mutate(Z = add_new_col(X, Y))
data
DF <- structure(list(Zone = c(1001L, 1001L, 1001L, 1002L, 1002L, 1002L
), Year = c(2018L, 2019L, 2020L, 2018L, 2019L, 2020L), X = c(10L,
20L, 30L, 15L, 25L, 35L), Y = c(5L, 10L, 20L, 10L, 20L, 40L)),
class = "data.frame", row.names = c(NA, -6L))