r, cumulative-sum, tapply

Using tapply and cumsum function for multiple vectors in R


I have a data frame, date_country, with four columns (country, date, pangolin_lineage, and n), plus a cum_country column that I added:

  country       date       pangolin_lineage       n      cum_country
1 Albania    2020-09-05      B.1.236              1           1
2 Algeria    2020-03-02      B.1                  2           2
3 Algeria    2020-03-08      B.1                  1           3
4 Algeria    2020-06-09      B.1.1.119            1           4
5 Algeria    2020-06-15      B.1                  1           5
6 Algeria    2020-06-15      B.1.36               1           6

I wanted to calculate the cumulative sum of n within each country, in date order. I was able to do that with this code:

date_country$cum_country <- as.numeric(unlist(tapply(date_country$n, date_country$country, cumsum)))

I would now like to do the same thing, but with the cumulative sum computed within each combination of country and pangolin_lineage, again in date order. I tried adding another vector to the call above, but tapply seems to accept only one input vector and one index argument. I get this error:

date_country$cum_country_pangol <- as.numeric(unlist(tapply(date_country$n, date_country$country, date_country$pangolin_lineage, cumsum)))
Error in match.fun(FUN) : 
  'date_country$pangolin_lineage' is not a function, character or symbol

Does anyone have any ideas on how to use cumsum in tapply across multiple vectors (country, pangolin_lineage, date)?


Solution

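  • If there is more than one grouping variable, wrap them in a list. Note, however, that tapply is a summarising function: with a function like cumsum, which returns one value per row rather than one value per group, the result comes back split up by group (an array of mode list), not as a vector aligned with the rows.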

     tapply(date_country$n, list(date_country$country, date_country$pangolin_lineage), cumsum)
    

    This is much easier with ave, though. If we want to create a new column, ave returns a vector in the original row order, so it avoids the hassle of unlist etc.

    ave(date_country$n, date_country$country, 
         date_country$pangolin_lineage, FUN = cumsum)
    #[1] 1 2 3 1 4 1
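
As a minimal, self-contained sketch of the ave approach, the following rebuilds the six sample rows shown in the question and assigns both cumulative columns. The column types (Date for date, numeric for n) and the explicit sort step are assumptions, since the question shows only printed output; the new column name cum_country_pangol is taken from the question's failed attempt.

```r
# Six sample rows copied from the question (column types are assumed).
date_country <- data.frame(
  country = c("Albania", "Algeria", "Algeria", "Algeria", "Algeria", "Algeria"),
  date = as.Date(c("2020-09-05", "2020-03-02", "2020-03-08",
                   "2020-06-09", "2020-06-15", "2020-06-15")),
  pangolin_lineage = c("B.1.236", "B.1", "B.1", "B.1.1.119", "B.1", "B.1.36"),
  n = c(1, 2, 1, 1, 1, 1)
)

# cumsum runs in row order, so sort by country and date first
# (assumption: the real data may not already be sorted).
date_country <- date_country[order(date_country$country, date_country$date), ]

# Cumulative sum within each country.
date_country$cum_country <- ave(date_country$n, date_country$country,
                                FUN = cumsum)

# Cumulative sum within each country / lineage combination.
# FUN must be passed by name, because ave takes the grouping vectors via `...`.
date_country$cum_country_pangol <- ave(date_country$n,
                                       date_country$country,
                                       date_country$pangolin_lineage,
                                       FUN = cumsum)

# For comparison: tapply with a list of groups returns a matrix of mode list,
# with one cell per country / lineage pair.
by_group <- tapply(date_country$n,
                   list(date_country$country, date_country$pangolin_lineage),
                   cumsum)
by_group[["Algeria", "B.1"]]
# [1] 2 3 4
```

Because ave returns its result in the same row order as its input, it can be assigned straight to a new column, whereas the tapply result is organised by group and would have to be mapped back to the rows.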