rtidyversecumsum

How to calculate cumulative sum per group when groups are repeated in R?


I have a dataframe representing paneldata with ids and time. For each id and year I like to calculate a cummulative sum. But id and time groups are repeated.

df <- data.frame(id = rep(12345,15),
                 year = c(rep(2021,5), rep(2022,5), rep(2023,5)), 
                 value = c(rep(5,5), rep(3,5), rep(4,5)))

df

      id year value
1  12345 2021     5
2  12345 2021     5
3  12345 2021     5
4  12345 2021     5
5  12345 2021     5
6  12345 2022     3
7  12345 2022     3
8  12345 2022     3
9  12345 2022     3
10 12345 2022     3
11 12345 2023     4
12 12345 2023     4
13 12345 2023     4
14 12345 2023     4
15 12345 2023     4

So instead of this

df %>% 
  group_by(id, year) %>% 
  mutate(value_cumsum = cumsum(value))

# A tibble: 15 × 4
# Groups:   id, year [3]
      id  year value value_cumsum
   <dbl> <dbl> <dbl>        <dbl>
 1 12345  2021     5            5
 2 12345  2021     5           10
 3 12345  2021     5           15
 4 12345  2021     5           20
 5 12345  2021     5           25
 6 12345  2022     3            3
 7 12345  2022     3            6
 8 12345  2022     3            9
 9 12345  2022     3           12
10 12345  2022     3           15
11 12345  2023     4            4
12 12345  2023     4            8
13 12345  2023     4           12
14 12345  2023     4           16
15 12345  2023     4           20

I need value_cumsum to be 5, 5, 5, 5, 5, 8, 8, 8, 8, 8, 12, 12, 12, 12, 12.


Solution

  • you can try

    df %>%
        mutate(value_cumsum = value * !duplicated(value), .by = c("id", "year")) %>%
        mutate(value_cumsum = cumsum(value_cumsum))
    

    or

    df %>%
        distinct() %>%
        mutate(value_cumsum = cumsum(value)) %>%
        right_join(df)
    

    which gives

          id year value value_cumsum
    1  12345 2021     5            5
    2  12345 2021     5            5
    3  12345 2021     5            5
    4  12345 2021     5            5
    5  12345 2021     5            5
    6  12345 2022     3            8
    7  12345 2022     3            8
    8  12345 2022     3            8
    9  12345 2022     3            8
    10 12345 2022     3            8
    11 12345 2023     4           12
    12 12345 2023     4           12
    13 12345 2023     4           12
    14 12345 2023     4           12
    15 12345 2023     4           12