rdataframeapplysapplytapply

How to use an apply() or equivalent function to perform math operations on current and adjacent data frame rows?


I am performing simple column-wise math operations on data frame rows that also involve accessing adjacent, previous data frame rows. Although the below code works, it's cumbersome (at least with respect to my liberal use of cbind() and subset() functions) and I wonder if there's a clean way to get the same results using an apply() or other super duper R function. In base R if possible.

I'm adding and subtracting column values in each data frame row (referring to the below columns, "plus1" + "plus 2" - "minus" = "total"), and if the id number is the same as you move down from one row to the next, adding in the plus1 from the prior row. See below illustration:

  id   plus1 plus2 minus total [total explained]
1  1     3     5    10    -2
2  2     4     5     9     0
3  3     8     5     8     5   [8 + 5 - 8 = 5, ignoring "plus1" in row 2 since "id" changed between rows 2 and 3]
4  3     1     4     7     6   [1 + 4 - 7, + 8 from "plus1" col in row 3 since "id" is same in rows 3 and 4, = 6]
5  3     2     5     6     2   [2 + 5 - 6, + 1 from "plus1" col in row 4 since "id" is same in rows 4 and 5, = 2]
6  5     3     6     5     4   [3 + 6 - 5 = 4, ignoring "plus1" in row 5 since "id" changed between rows 5 and 6]

Here is the code I used to generate the above:

data <- data.frame(id=c(1,2,3,3,3,5), 
                   plus1=c(3,4,8,1,2,3), 
                   plus2=c(5,5,5,4,5,6),
                   minus = c(10,9,8,7,6,5))

data <- cbind(data,
              tmp1=(data[ ,"plus1"] +
                    data[ ,"plus2"] -
                    data[ ,"minus"]
                   )
              )

grp <- with(rle(data$id), rep(seq_along(values), lengths))
data$tmp2 <- with(data,ave(plus1, grp, FUN = function(x) c(0, x[-length(x)])))

data <- cbind(data, total = round((data[ ,"tmp1"] + data[ ,"tmp2"]),2))
data <- subset(data, select = -c(tmp1,tmp2) )
data

I'm pursuing simplicity in the world of apply() because I'll be using many derivations of this sort thing in my current project. Looks like I'm mimicking Excel in R, which I am.


Solution

  • I think a simple way to do this is using the lag function from dplyr package. I used case_when to check if the id changed. If it didn't change, you add the extra term, otherwise you add 0.

    library(dplyr)
    
    data2<-data %>%
      mutate(extra=case_when(
        id==lag(id) ~ lag(plus1), 
        TRUE ~ 0
      )) %>%
      mutate(computed_total=plus1+plus2-minus+extra)