rdplyrfiltermatchmutate

R index, filter then match on multiple criteria. Easy in excel, not so in r


I want to extract the 'untreated control' value for each variable and weed_type. In excel is easy to lookup the filter the value in the row and return the "untreated_contol" result for that weed_type. I would then like to calculate the % control (i.e. (untreated - predicted.value) / untreated * 100). Any suggestions to simplify this process, as I have a large number of "weed_types"?


df <- data.frame(Weed_type = c("weed1", "weed1", "weed1", "weed2", "weed2", "weed2", "weed3", "weed3", "weed3"),
                 Treatment = c("untreated control", "Treatment1", "Treatment2", "untreated control", "Treatment1", "Treatment2", "untreated control", "Treatment1", "Treatment2"),
                 predicted.value = c(23.3, 0.4, 0,  .9, .15, .01, 87, 12,2)
)
df

new to r, used a lengthy work around as follows:

extracted the untreated values for each weed type

weed1_c <- df %>% filter(Weed_type == 'weed1' & Treatment == "untreated control")  %>% pull(predicted.value)
weed2_c <- df %>% filter(Weed_type == 'weed2' & Treatment == "untreated control")  %>% pull(predicted.value)
weed3_c <- df %>% filter(Weed_type == 'weed3' & Treatment == "untreated control")  %>% pull(predicted.value)

added these values back into the dataframe

df <- df %>% 
      mutate(untreated = case_when(
        Weed_type=="weed1" ~   weed1_c,
        Weed_type=="weed2" ~   weed2_c,
        Weed_type=="weed3" ~   weed3_c,
      ))

calculate a percentage control

df$percentage_control <- (df$untreated  - df$predicted.value) / df$untreated *100
df  

would appreciate an easier way to get to this result!


Solution

  • Here are two possible options to achieve your desired result without the need to store the untreated value for each weed type in a separate variable:

    library(dplyr, warn.conflicts = FALSE)
    
    df |> 
      mutate(
        untreated = predicted.value[Treatment == "untreated control"],
        .by = Weed_type
      ) |> 
      mutate(
        percentage_control = (untreated  - predicted.value) / untreated * 100
      )
    #>   Weed_type         Treatment predicted.value untreated percentage_control
    #> 1     weed1 untreated control           23.30      23.3            0.00000
    #> 2     weed1        Treatment1            0.40      23.3           98.28326
    #> 3     weed1        Treatment2            0.00      23.3          100.00000
    #> 4     weed2 untreated control            0.90       0.9            0.00000
    #> 5     weed2        Treatment1            0.15       0.9           83.33333
    #> 6     weed2        Treatment2            0.01       0.9           98.88889
    #> 7     weed3 untreated control           87.00      87.0            0.00000
    #> 8     weed3        Treatment1           12.00      87.0           86.20690
    #> 9     weed3        Treatment2            2.00      87.0           97.70115
    
    
    df |> 
      filter(
        Treatment == "untreated control"
      ) |> 
      select(Weed_type, untreated = predicted.value) |> 
      right_join(x = df, by = "Weed_type") |> 
      mutate(
        percentage_control = (untreated  - predicted.value) / untreated * 100
      )
    #>   Weed_type         Treatment predicted.value untreated percentage_control
    #> 1     weed1 untreated control           23.30      23.3            0.00000
    #> 2     weed1        Treatment1            0.40      23.3           98.28326
    #> 3     weed1        Treatment2            0.00      23.3          100.00000
    #> 4     weed2 untreated control            0.90       0.9            0.00000
    #> 5     weed2        Treatment1            0.15       0.9           83.33333
    #> 6     weed2        Treatment2            0.01       0.9           98.88889
    #> 7     weed3 untreated control           87.00      87.0            0.00000
    #> 8     weed3        Treatment1           12.00      87.0           86.20690
    #> 9     weed3        Treatment2            2.00      87.0           97.70115