I have the following sample data frame `df`, consisting of a category (`Cat`) containing different articles (Pizza or Pasta) together with their sales for different calendar weeks (`CW`). In some weeks there is a promotion, which pushes sales up. The 6 largest sales values are flagged as such promotions (`promotion == 1`).
```r
# example df
set.seed(99999)
df <- data.frame(Cat = rep(c("A","B"),52),
                 Article = rep(c("Pizza","Pasta"),52))
df <- df[order(df$Cat),]
df$CW <- rep(1:52,2)
df$sales <- abs(2+rnorm(104))
df$promotion <- ifelse(rank(df$sales,ties.method="last")>98,1,0)
```
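A quick base-R check of what the sample data actually contains, rebuilding `df` with the same code. One thing it makes visible: because `rep(c("A","B"),52)` and `rep(c("Pizza","Pasta"),52)` alternate in step, every `A` row is Pizza and every `B` row is Pasta, so there are just two weekly series of 52 weeks each. With `ties.method = "last"`, `rank()` returns distinct ranks, so exactly six rows end up with `promotion == 1`.

```r
# Rebuild the example data from the question
set.seed(99999)
df <- data.frame(Cat = rep(c("A", "B"), 52),
                 Article = rep(c("Pizza", "Pasta"), 52))
df <- df[order(df$Cat), ]
df$CW <- rep(1:52, 2)
df$sales <- abs(2 + rnorm(104))
df$promotion <- ifelse(rank(df$sales, ties.method = "last") > 98, 1, 0)

# Only two Cat/Article combinations occur, each covering 52 weeks
table(df$Cat, df$Article)

# Exactly six rows are flagged: the six largest sales values
sum(df$promotion)  # 6
df[df$promotion == 1, c("Cat", "Article", "CW", "sales")]
```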
The challenge now is to calculate a "baseline" against which to judge the promotion. The baseline needs to meet the following requirements:
I have tried to solve this using existing posts on Stack Overflow, but without success, so I am asking for help.
A solution with dplyr and zoo could look like this:
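```r
library(dplyr)
library(zoo)

df2 <- df %>%
  arrange(Cat, Article, CW) %>%
  # regular and promotion weeks get separate rolling windows
  group_by(Cat, Article, promotion) %>%
  # mean of the previous 3 rows of the group (NA for its first 3 rows)
  mutate(Baseline = rollapplyr(sales, list(-(3:1)), mean, fill = NA)) %>%
  # promotion weeks take the baseline of the following week of the same series
  group_by(Cat, Article) %>%
  mutate(Baseline = ifelse(promotion == 1, lead(Baseline, n = 1L), Baseline)) %>%
  ungroup()
```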
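For comparison, a base-R sketch that needs neither dplyr nor zoo. It assumes the baseline rule is: a week's baseline is the mean sales of the three most recent earlier non-promotion weeks of the same `Cat`/`Article`, and `NA` when fewer than three such weeks exist. The helper `trailing_baseline()` is hypothetical and defined here; it computes promotion weeks directly instead of borrowing the next week's value with `lead()`.

```r
# Rebuild the example data from the question
set.seed(99999)
df <- data.frame(Cat = rep(c("A", "B"), 52),
                 Article = rep(c("Pizza", "Pasta"), 52))
df <- df[order(df$Cat), ]
df$CW <- rep(1:52, 2)
df$sales <- abs(2 + rnorm(104))
df$promotion <- ifelse(rank(df$sales, ties.method = "last") > 98, 1, 0)

# Hypothetical helper (assumed rule): for each week, average the sales of the
# k most recent *earlier* non-promotion weeks; NA if fewer than k exist.
# Expects sales/promotion of a single series, already sorted by CW.
trailing_baseline <- function(sales, promotion, k = 3) {
  vapply(seq_along(sales), function(i) {
    prev <- which(promotion[seq_len(i - 1)] == 0)  # earlier regular weeks
    if (length(prev) < k) return(NA_real_)
    mean(sales[tail(prev, k)])
  }, numeric(1))
}

# Apply the helper separately to each Cat/Article series
df <- df[order(df$Cat, df$Article, df$CW), ]
df$Baseline <- NA_real_
for (rows in split(seq_len(nrow(df)), list(df$Cat, df$Article), drop = TRUE)) {
  df$Baseline[rows] <- trailing_baseline(df$sales[rows], df$promotion[rows])
}

df[df$promotion == 1, c("Cat", "Article", "CW", "sales", "Baseline")]
```

For a promotion week followed by a regular week, this gives the same value as the `lead()` step in the dplyr version, because both average the three regular weeks just before the promotion. The two can differ when promotion weeks are adjacent or a promotion falls in the last week of a series.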