I have a code where there is a scenario (sc
column) and a code name (cd
column). I want to calculate the difference between all scenarios to the first one (so scenario 2-1
, 3-1
, etc. I have multiple more scenarios...). Is there a quick way to code which scenario will be compared, for example, if I now the scenario from which all the other are compared is now scenario 2, it would be 1-2
, 3-2
, etc.?
But I also want to make sure that between scenarios, I subtract only the same code names (cd
) so (sc1.cd1 - sc2.cd1
), (sc1.cd2 - sc2.cd2
), etc.
I also want the percentage difference between the sc
1 compared to the other scenarios for each code.
Right now, my code assumes that when I compare the scenarios, the cd
are in the same order. Would there be a way to compare the scenarios and code together rather than using a subset of the data?
set.seed(123456)
vals = as.vector(mapply(FUN = rpois, n = 10, lambda = c(5, 10, 100)))
df.test = data.frame(cd = rep(1:10,3), sc = rep(1:3, each =10 ), vals)
# New empty column
df.test$delta = NA
# Subset each scenarios and subtract for the first scenario
df.test[df.test$sc ==1,'delta'] = df.test[df.test$sc ==1,'vals']-df.test[df.test$sc ==1,'vals']
df.test[df.test$sc ==2,'delta'] = df.test[df.test$sc ==2,'vals']-df.test[df.test$sc ==1,'vals']
df.test[df.test$sc ==3,'delta'] = df.test[df.test$sc ==3,'vals']-df.test[df.test$sc ==1,'vals']
# Get difference
df.test$diff = abs(abs(df.test$delta) - df.test$vals)
# Calculate percentage
df.test$delta.perc = abs(df.test$delta) / df.test$diff +1
df.test$diff * df.test$delta.perc
# Percentage increase
df.test$delta.perc.increase = df.test$delta.perc *100
The problem is that you've hardcoded the differences between scenarios while assuming cd
values will align perfectly, which makes changing the base scenario not quite cumbersome, but definitely a hassle.
We can use dplyr
to handle the grouping and calculations, this way we can set any base scenario dynamically, like 1
, 2
, or whatever you want. We will also make sure that the differences are calculated only between matching cd
values, so there are no assumptions.
library(dplyr)
set.seed(123456)
vals <- as.vector(mapply(FUN = rpois, n = 10, lambda = c(5, 10, 100)))
df_test <- data.frame(cd = rep(1:10, 3), sc = rep(1:3, each = 10), vals)
base_scenario <- 1
df_test <- df_test %>%
group_by(cd) %>%
mutate(
base_vals = vals[sc == base_scenario],
delta = vals - base_vals,
delta_perc = ifelse(base_vals != 0, delta / base_vals * 100, NA)
) %>%
ungroup()
print(df_test)
this way you can set the base_scenario
to whatever value you want and it updates automagically, and by grouping by cd
we ensure that the delta
's are individually calculated for each cd
, so you get sc1.cd1 - sc2.cd2
rather than just subtracting rows in order. I also added the ifelse
to prevent any div/0
errors when the base value is 0
.
Doing it this way lets you change the base_scenario
to any value and it will change for it—so you'll get 1-2
, 3-2
, etc., when the base is set to 2
.
If I've not explained anything well enough, feel free to leave a question in the replies :)