is there an analog to SQL analytic so that one could do aggregation without collapsing rows? For example, I want to do a sum for each group without GROUP BY, in SQL I can do this:
select group, x, sum(x) over(partition by group) group_sum
from mytable
I would like to be able to do something similar in R:
df <- data.frame(group=c('a', 'a', 'b', 'b'), x=c(1, 3, 10, 30))
df %>% mutate(group_sum = window_aggr(group_by=group, func=sum))
group x group_sum
a 1 4
a 3 4
b 10 10
b 30 40
where window_aggr is just a made-up function.
So is there a way to implement this in a single pipeline, without doing an actual aggregation and a join?
Thanks!
Best regards, Nikolai
You can use ave
which will in this case calculate the sum
of df$x
for the groups df$group
.
df$group_sum <- ave(df$x, df$group, FUN=sum)
df
# group x group_sum
#1 a 1 4
#2 a 3 4
#3 b 10 40
#4 b 30 40
Or using base pipes:
df |> transform(group_sum = ave(x, group, FUN=sum))
# group x group_sum
#1 a 1 4
#2 a 3 4
#3 b 10 40
#4 b 30 40
Or using dplyr
library(dplyr)
df %>% mutate(group_sum = ave(x, group, FUN=sum))
# group x group_sum
#1 a 1 4
#2 a 3 4
#3 b 10 40
#4 b 30 40