rdiff

rolling difference between a selected base code compared to other code in R


I have a code where there is a scenario (sc column) and a code name (cd column). I want to calculate the difference between all scenarios to the first one (so scenario 2-1, 3-1, etc. I have multiple more scenarios...). Is there a quick way to code which scenario will be compared, for example, if I now the scenario from which all the other are compared is now scenario 2, it would be 1-2, 3-2, etc.?

But I also want to make sure that between scenarios, I subtract only the same code names (cd) so (sc1.cd1 - sc2.cd1), (sc1.cd2 - sc2.cd2), etc.

I also want the percentage difference between the sc 1 compared to the other scenarios for each code.

Right now, my code assumes that when I compare the scenarios, the cd are in the same order. Would there be a way to compare the scenarios and code together rather than using a subset of the data?

set.seed(123456)
vals = as.vector(mapply(FUN = rpois, n = 10, lambda = c(5, 10, 100)))
df.test = data.frame(cd = rep(1:10,3), sc = rep(1:3, each =10 ), vals)

# New empty column 
df.test$delta = NA

# Subset each scenarios and subtract for the first scenario
df.test[df.test$sc ==1,'delta'] = df.test[df.test$sc ==1,'vals']-df.test[df.test$sc ==1,'vals']
df.test[df.test$sc ==2,'delta'] = df.test[df.test$sc ==2,'vals']-df.test[df.test$sc ==1,'vals']
df.test[df.test$sc ==3,'delta'] = df.test[df.test$sc ==3,'vals']-df.test[df.test$sc ==1,'vals']

# Get difference
df.test$diff = abs(abs(df.test$delta) - df.test$vals)

# Calculate percentage
df.test$delta.perc = abs(df.test$delta) / df.test$diff +1
df.test$diff * df.test$delta.perc

# Percentage increase
df.test$delta.perc.increase = df.test$delta.perc *100

Solution

  • The problem is that you've hardcoded the differences between scenarios while assuming cd values will align perfectly, which makes changing the base scenario not quite cumbersome, but definitely a hassle.

    We can use dplyr to handle the grouping and calculations, this way we can set any base scenario dynamically, like 1, 2, or whatever you want. We will also make sure that the differences are calculated only between matching cd values, so there are no assumptions.

    library(dplyr)
    
    set.seed(123456)
    vals <- as.vector(mapply(FUN = rpois, n = 10, lambda = c(5, 10, 100)))
    df_test <- data.frame(cd = rep(1:10, 3), sc = rep(1:3, each = 10), vals)
    
    base_scenario <- 1
    
    df_test <- df_test %>%
      group_by(cd) %>%
      mutate(
        base_vals = vals[sc == base_scenario],
        delta = vals - base_vals,
        delta_perc = ifelse(base_vals != 0, delta / base_vals * 100, NA)
      ) %>%
      ungroup()
    
    print(df_test)
    

    this way you can set the base_scenario to whatever value you want and it updates automagically, and by grouping by cd we ensure that the delta's are individually calculated for each cd, so you get sc1.cd1 - sc2.cd2 rather than just subtracting rows in order. I also added the ifelse to prevent any div/0 errors when the base value is 0.

    Doing it this way lets you change the base_scenario to any value and it will change for it—so you'll get 1-2, 3-2, etc., when the base is set to 2.

    If I've not explained anything well enough, feel free to leave a question in the replies :)