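I have a data frame with four columns (country, date, pangolin_lineage, and n). The output below also shows cum_country, which I added afterwards.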
country date pangolin_lineage n cum_country
1 Albania 2020-09-05 B.1.236 1 1
2 Algeria 2020-03-02 B.1 2 2
3 Algeria 2020-03-08 B.1 1 3
4 Algeria 2020-06-09 B.1.1.119 1 4
5 Algeria 2020-06-15 B.1 1 5
6 Algeria 2020-06-15 B.1.36 1 6
I wanted to calculate the cumulative sum of n within each country, ordered by date. I was able to do that with this code:
date_country$cum_country <- as.numeric(unlist(tapply(date_country$n, date_country$country, cumsum)))
I would now like to do the same thing, but with the cumulative sum computed within each combination of country and pangolin_lineage, ordered by date. I tried passing another vector to the call above, but it seems tapply accepts only one data vector and one index argument. I get this error:
date_country$cum_country_pangol <- as.numeric(unlist(tapply(date_country$n, date_country$country, date_country$pangolin_lineage, cumsum)))
Error in match.fun(FUN) :
'date_country$pangolin_lineage' is not a function, character or symbol
Does anyone have any ideas how to use cumsum in tapply across multiple vectors (country, pangolin_lineage, date)?
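If there is more than one grouping variable, wrap them in a list. Note, however, that tapply is meant for summarising functions: with a function like cumsum, which returns a vector for each group, the result is split into one element per combination of groups.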
tapply(date_country$n, list(date_country$country, date_country$pangolin_lineage), cumsum)
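To illustrate what that call returns, here is a minimal sketch. The data frame is a hypothetical reconstruction of the six rows shown in the question (the real data is presumably larger), and the names res and flat are just for illustration. It suggests why unlist on this result is awkward: the values come back grouped by lineage and country rather than in the original row order.

```r
# Hypothetical reconstruction of the six rows shown in the question
date_country <- data.frame(
  country = c("Albania", rep("Algeria", 5)),
  date = as.Date(c("2020-09-05", "2020-03-02", "2020-03-08",
                   "2020-06-09", "2020-06-15", "2020-06-15")),
  pangolin_lineage = c("B.1.236", "B.1", "B.1", "B.1.1.119", "B.1", "B.1.36"),
  n = c(1, 2, 1, 1, 1, 1)
)

# With two grouping variables, tapply returns a country x lineage
# matrix of list elements (NULL for combinations that do not occur)
res <- tapply(date_country$n,
              list(date_country$country, date_country$pangolin_lineage),
              cumsum)

dim(res)                  # 2 countries x 4 lineages
res[["Algeria", "B.1"]]   # running total for one group: 2 3 4

# Flattening walks the matrix column by column, so the values are
# no longer aligned with the rows of date_country
flat <- as.numeric(unlist(res))
flat                      # 2 3 4 1 1 1
```

Assigning flat back as a column would therefore put the running totals on the wrong rows unless the data frame were first sorted to match that grouped order.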
However, this is much easier with ave. If we want to create a new column, ave avoids the hassle of unlist etc., because it returns a vector of the same length and in the same order as the input:
ave(date_country$n, date_country$country,
date_country$pangolin_lineage, FUN = cumsum)
#[1] 1 2 3 1 4 1
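As a usage sketch, the result can be assigned straight back as a new column. The data frame below is again a hypothetical reconstruction of the six rows from the question, and the column name cum_country_pangol is taken from the attempt in the question. Since cumsum follows row order, the sketch sorts by country and date first; that step is an assumption about what "across date" should mean and can be dropped if the data is already sorted.

```r
# Hypothetical reconstruction of the six rows shown in the question
date_country <- data.frame(
  country = c("Albania", rep("Algeria", 5)),
  date = as.Date(c("2020-09-05", "2020-03-02", "2020-03-08",
                   "2020-06-09", "2020-06-15", "2020-06-15")),
  pangolin_lineage = c("B.1.236", "B.1", "B.1", "B.1.1.119", "B.1", "B.1.36"),
  n = c(1, 2, 1, 1, 1, 1)
)

# Sort chronologically within country so the running total follows dates
date_country <- date_country[order(date_country$country, date_country$date), ]

# ave() applies cumsum within each country/lineage combination and
# returns the values aligned with the original rows
date_country$cum_country_pangol <- ave(date_country$n,
                                       date_country$country,
                                       date_country$pangolin_lineage,
                                       FUN = cumsum)

date_country$cum_country_pangol
```

Note that FUN must be passed by name in ave, because any unnamed arguments after the first are treated as grouping variables.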