rdplyrtapply

Creating an iterative function using lag and mutate command in R


I have a data set DF with the following data

Zone Year X Y
1001 2018 10 5
1001 2019 20 10
1001 2020 30 20
1002 2018 15 10
1002 2019 25 20
1002 2020 35 40

I want to create a column Z = X + Y - Previous year's Y So it creates the following Table:

Zone Year X Y Z
1001 2018 10 5 NA
1001 2019 20 10 25
1001 2020 30 20 40
1002 2018 15 10 NA
1002 2019 25 20 35
1002 2020 35 40 55

I can use "mutate" from DPLYR to generate column Z: mutate(DF, Z = X + Y - lag(Y))

I can use tapply to apply recursively on DF. Can I create a function using DPLYR in a user-defined function to apply this using tapply later?


Solution

  • In dplyr you can add group_by to apply a function for every group (Zone).

    library(dplyr)
    DF %>% group_by(Zone) %>% mutate(Z = X + Y - lag(Y))
    
    #   Zone  Year     X     Y     Z
    #  <int> <int> <int> <int> <int>
    #1  1001  2018    10     5    NA
    #2  1001  2019    20    10    25
    #3  1001  2020    30    20    40
    #4  1002  2018    15    10    NA
    #5  1002  2019    25    20    35
    #6  1002  2020    35    40    55
    

    We can also write a function :

    add_new_col = function(x, y) {
      x + y - lag(y)
    }
    

    which can be used as :

    DF %>% group_by(Zone) %>% mutate(Z = add_new_col(X, Y))
    

    data

    DF <- structure(list(Zone = c(1001L, 1001L, 1001L, 1002L, 1002L, 1002L
    ), Year = c(2018L, 2019L, 2020L, 2018L, 2019L, 2020L), X = c(10L, 
    20L, 30L, 15L, 25L, 35L), Y = c(5L, 10L, 20L, 10L, 20L, 40L)), 
    class = "data.frame", row.names = c(NA, -6L))