rdplyrmutated

In using dplyr in R, how to merge 2 separate mutate statements operating on the same object?


The below MWE code works as intended. In summary:

However, when deploying this in the more complete code this is meant for, I get an error when running another table that draws from the equivalent of this "data1" data frame, due to running two data1 <- ... processes. So, how do I collapse these 2 functions into one?

Output with calculations explained:

     id plusA plusB minusC minusD running_balance [explain calculations ...]
     1     3     5     10      5              -7   minus D = plusB, running bal = plusA + plusB - minusC - minusD
     2     4     5      9      5              -5   same formulas as above since id <> prior row id
     3     8     5      8      5               0   same formulas as above since id <> prior row id
     3     1     4      7      9             -11   since id = prior row id, minus D = plusB + prior row plus B, and running bal = running bal from prior row + plusA + plusB - minusC - minusD  
     3     2     5      6      9             -19   same formulas as above since id = prior row id
     5     3     6      5      6              -2   minus D = plusB, running bal = plusA + plusB - minusC - minusD

MWE code:

data <- data.frame(id=c(1,2,3,3,3,5), 
                   plusA=c(3,4,8,1,2,3), 
                   plusB=c(5,5,5,4,5,6),
                   minusC = c(10,9,8,7,6,5))

library(dplyr)

data1<- subset(
  data %>% mutate(extra=case_when(id==lag(id)~lag(plusB),TRUE ~ 0)) %>%
    mutate(minusD=plusB+extra),
  select = -c(extra) # remove temporary calculation column 
) 

data1 <- data1 %>% group_by(id) %>% mutate(running_balance = cumsum(plusA + plusB - minusC - minusD))

Solution

  • You may continue the chain with %>% instead of creating a temporary object.

    library(dplyr)
    
    data %>% 
      mutate(extra=case_when(id==lag(id)~lag(plusB),TRUE ~ 0),
             minusD=plusB+extra) %>%
      group_by(id) %>%
      mutate(running_balance = cumsum(plusA + plusB - minusC - minusD)) %>%
      ungroup %>%
      select(-extra)
    
    #     id plusA plusB minusC minusD running_balance
    #  <dbl> <dbl> <dbl>  <dbl>  <dbl>           <dbl>
    #1     1     3     5     10      5              -7
    #2     2     4     5      9      5              -5
    #3     3     8     5      8      5               0
    #4     3     1     4      7      9             -11
    #5     3     2     5      6      9             -19
    #6     5     3     6      5      6              -2