rdata-analysiscumsumcross-joinaccumulate

Using accumulate and/or cumsum properly for projections


I need to project each Z, W, Y value of each row in df1. To do this, I will use the growth rates in B from df2.

df1:
X    Y  year    Z      W      V
abc ab  2020    0.1  0.7    1.3
abc cd  2020    0.2  0.8    1.4
efg ef  2020    0.3  0.9    1.5
efg gh  2020    0.4  1      1.6

df2:
year    B       
2021    0.05    
2022    0.063   
2023    0.049   
2024    0.061   
2025    0.057   

I understand that a cross join might solve the problem. However, before that I need to be sure that the precious value is used for the next calculation.

I expect something like this:

from df1:
X    Y  year    Z      W      V
abc ab  2020    0.1    0.7    1.3

df3: (edited)
year    B       C       D       E       X   Y
2020    0       0.1000  0.7000  1.3000  abc ab 
2021    0.05    0.1050  0.7350  1.3650  abc ab 
2022    0.063   0.1116  0.7813  1.4510  abc ab 
2023    0.049   0.1171  0.8196  1.5221  abc ab 
2024    0.061   0.1242  0.8696  1.6149  abc ab 
2025    0.057   0.1313  0.9192  1.7070  abc ab  

Formulas for df3 (edited)
1    year   B       C               D               E
2    2020   0       0.1             0.7             1.3
3    2021   0.05    =+C2*($B$3+1)   =+D2*($B$3+1)   =+E2*($B$3+1)
4    2022   0.063   =+C3*($B$4+1)   =+D3*($B$4+1)   =+E3*($B$4+1)
5    2023   0.049   =+C4*($B$5+1)   =+D4*($B$5+1)   =+E4*($B$5+1)
6    2024   0.061   =+C5*($B$6+1)   =+D5*($B$6+1)   =+E5*($B$6+1)
7    2025   0.057   =+C6*($B$7+1)   =+D6*($B$7+1)   =+E6*($B$7+1)

To get df3, I have tried:

df3 <- df2 %>% mutate(example2 = accumulate(B, ~ 0.1 * ( .x +1)))

df3 <- df2 %>% mutate(example2 = cumsum(accumulate(B, ~ 0.1 * ( .x +1))))

The outcome is still not the expected.

At the end, I would need to pass it into the cross join and across functions, something like this:

df3 <- mutate(cross_join(df1[-3], df2), across(Z:V)*(H), H = NULL)

Solution

  • Here is a first take on doing it. There's likely a simpler way (but alas, there always is):

    # first, we do a few things to df2
    # 1. we add a 2020 column
    # 2. we transform the interest rate to 1 + the interest rate (to enable the cumulative product function to work
    # 3. we use the cumprod function to turn the B column into the cumulative products of B + 1
    df2 <- df2 %>% 
      add_row(year = 2020, B = 0, .before = 1) %>%
      mutate(B = cumprod(B + 1))
    
    # remove the year from df1, because it's unnecessary, and we'd have to remove it later anyway
    df1 <- df1 %>%
        select(-year)
    
    # df3
    cross_join(df1, df2) %>%
        mutate(C = Z * B,
            D = W * B,
            E = V * B) %>%
        select(year, B, C, D, E, X, Y)
    
    # A tibble: 24 × 7
        year     B     C     D     E X     Y    
       <dbl> <dbl> <dbl> <dbl> <dbl> <chr> <chr>
     1  2020  1    0.1   0.7    1.3  abc   ab   
     2  2021  1.05 0.105 0.735  1.37 abc   ab   
     3  2022  1.12 0.112 0.781  1.45 abc   ab   
     4  2023  1.17 0.117 0.820  1.52 abc   ab   
     5  2024  1.24 0.124 0.870  1.61 abc   ab   
     6  2025  1.31 0.131 0.919  1.71 abc   ab   
     7  2020  1    0.2   0.8    1.4  abc   cd   
     8  2021  1.05 0.21  0.84   1.47 abc   cd   
     9  2022  1.12 0.223 0.893  1.56 abc   cd   
    10  2023  1.17 0.234 0.937  1.64 abc   cd   
    # ℹ 14 more rows
    

    Update: simpler way:

    df2 %>%
        add_row(year = 2020, B = 0, .before = 1) %>%
        cross_join(df1) %>%
        group_by(X, Y) %>%
        mutate(across(c(Z, W, V), ~ . * cumprod(B + 1))) %>%
        rename("C" = Z, "D" = W, "E" = V) %>%
        ungroup()