rdataframedplyrreplacemutate

Replacing several rows of data in a column efficiently using condition in a pipeline


I have the following dataframe:

df <- data.frame(
   Form=rep(c("Fast", "Medium", "Slow"), each  = 3),
   Parameter =rep(c("Fmax", "TMAX", "B"), times =3),
   Estimate = sample(1:10, 9, replace = TRUE))

It looks like this:

> df
    Form Parameter Estimate
1   Fast      Fmax        7
2   Fast      TMAX        5
3   Fast         B        1
4 Medium      Fmax        4
5 Medium      TMAX        7
6 Medium         B        4
7   Slow      Fmax        4
8   Slow      TMAX        6
9   Slow         B        7

I would like to overwrite the Estimate values from Form = Slow to Form = Medium in an efficient way, e.g. not having to pivot wide and then doing a mutate with case_when on every column (Fmax, TMAX, B), and I would like the solution to be sufficiently flexible so that I don't have to specify rows numbers explicitly (e.g. using the example pasted at the end of this question. My desired result would like this:

> df
    Form Parameter Estimate
1   Fast      Fmax        7
2   Fast      TMAX        5
3   Fast         B        1
4 Medium      Fmax        4
5 Medium      TMAX        6
6 Medium         B        7
7   Slow      Fmax        4
8   Slow      TMAX        6
9   Slow         B        7

Can you please help? Thank you.

Example specifying row numbers (which I don't want):

library(dplyr)

# Sample data
df <- data.frame(
  id = 1:10,
  colA = c(10, 20, 30, 40, 50, 60, 70, 80, 90, 100),
  colB = rep(NA, 10)
)

# Pipeline to assign values from colA rows 6-10 to colB rows 1-5
df_modified <- df %>%
  mutate(
    colB = replace(
      x = colB,
      list = 1:5,
      values = colA[6:10]
    )
  )

# View the result
print(df_modified)

Solution

  • Data (with seed) for reproducibility:

    set.seed(42)
    df <- data.frame(Form=rep(c("Fast", "Medium", "Slow"), each  = 3), Parameter =rep(c("Fmax", "TMAX", "B"), times =3), Estimate = sample(1:10, 9, replace = TRUE))
    df
    #     Form Parameter Estimate
    # 1   Fast      Fmax        1
    # 2   Fast      TMAX        5
    # 3   Fast         B        1
    # 4 Medium      Fmax        9
    # 5 Medium      TMAX       10
    # 6 Medium         B        4
    # 7   Slow      Fmax        2
    # 8   Slow      TMAX       10
    # 9   Slow         B        1
    

    Solution:

    df |>
      mutate(
        .by = Parameter,
        Estimate = if_else("Slow" %in% Form & Form == "Medium", 
                           Estimate[Form == "Slow"], Estimate)
      )
    #     Form Parameter Estimate
    # 1   Fast      Fmax        1
    # 2   Fast      TMAX        5
    # 3   Fast         B        1
    # 4 Medium      Fmax        2
    # 5 Medium      TMAX       10
    # 6 Medium         B        1
    # 7   Slow      Fmax        2
    # 8   Slow      TMAX       10
    # 9   Slow         B        1
    
    

    I'm inferring that this should be done per-Parameter and not just row-relative, ergo the .by = Parameter.