rdplyracross

Variable calculation changing by row in R


I have a data like:

library(dplyr)


set.seed(123)  
data <- data.frame(
  date = rep(1:3, each=3),
  grupo = rep(c("A", "B", "C"), 3),
  x = runif(9, 10, 50),
  y = runif(9, 20, 100),
  z = runif(9, 5, 30)
)

# Convert group into factor
data$grupo <- as.factor(data$grupo)

# Order data 
data <- data %>% arrange(grupo, date)

I want the first observation of each row to be (1+original_value/100). However, starting from the second row I want the formula to be modified to first_modified_value*(1 + second_observed_value/100), the third new row will do: second_modified_value*(1 + third_observed_value/100).

I have generated the following formula, but I am not able to state that lag() should take the modified value, and not the lag(original value):

# Apply formula in each group for variables, x,y,z
dataaa <- data %>% 
  group_by(grupo) %>% 
  mutate(across(c(x, y, z), ~if_else(row_number() == 1, (1 + ./100), lag(.)*(1 + ./100))))

Any help?


Solution

  • The cumprod() function would appear to be the answer here as you are multiplying each value by the previous result:

    dataaa <- data %>%
      group_by(grupo) %>%
      mutate(across(c(x, y, z), ~cumprod(1 + . / 100)))
    
    dataaa
    # A tibble: 9 × 5
    # Groups:   grupo [3]
       date grupo     x     y     z
      <int> <fct> <dbl> <dbl> <dbl>
    1     1 A      1.22  1.57  1.13
    2     2 A      1.77  2.73  1.38
    3     3 A      2.32  5.24  1.68
    4     1 B      1.42  1.97  1.29
    5     2 B      2.09  3.26  1.56
    6     3 B      3.04  4.55  1.91
    7     1 C      1.26  1.56  1.27
    8     2 C      1.41  2.00  1.65
    9     3 C      1.87  2.47  1.96