Tags: r, group-by, tidyverse, mutate

How can I use group_by and mutate to perform a subtraction calculation with specific groupings? Time 0 minus Time X for all groups


I have this dataframe:

mydf <- structure(list(Time = c("T0", "T3", "T3", "T0", "T3", "T3"), 
    Organism = c("BB", "BB", "BB", "CR", "CR", "CR"), MOF = c("MOF", 
    "MOF", "MOF", "MOF", "MOF", "MOF"), MOFTreatment = c("0", 
    "0", "1", "0", "0", "1"), std_dev_groups = c(" 0.9228677", 
    " 0.8464373", " 0.4491846", " 0.5988814", " 0.5845546", " 1.240182"
    ), Fold_reduction = c(" 1.8958800", " 1.7980552", " 1.3652684", 
    " 1.5145418", " 1.4995760", " 2.362283"), Perc_viability = c("52.7459535", 
    "55.6156459", "73.2456721", "66.0265686", "66.6855175", "42.331930"
    )), class = "data.frame", row.names = c(NA, -6L))

Which looks like this:

  Time Organism MOF MOFTreatment std_dev_groups Fold_reduction Perc_viability
1   T0       BB MOF            0      0.9228677      1.8958800     52.7459535
2   T3       BB MOF            0      0.8464373      1.7980552     55.6156459
3   T3       BB MOF            1      0.4491846      1.3652684     73.2456721
4   T0       CR MOF            0      0.5988814      1.5145418     66.0265686
5   T3       CR MOF            0      0.5845546      1.4995760     66.6855175
6   T3       CR MOF            1       1.240182       2.362283      42.331930

I want to perform a calculation on specific comparisons. I thought I should use group_by and mutate, but my attempt gives the following error:

mydf2 <- mydf %>% group_by(Time, Organism, MOFTreatment) %>%
  mutate(Log_loss = (log10(100/Fold_reduction[Time == "T0"]) - log10(100/Fold_reduction[Time != "T0"])))
    
Error in `mutate()`:
ℹ In argument: `Log_loss = (...)`.
ℹ In group 1: `Time = "T0"`, `Organism = "BB"`, `MOFTreatment = "0"`.
Caused by error in `100 / Fold_reduction[Time == "T0"]`:
! non-numeric argument to binary operator
Backtrace:
 1. mydf %>% group_by(Time, Organism, MOFTreatment) %>% ...
 3. dplyr:::mutate.data.frame(...)
 4. dplyr:::mutate_cols(.data, dplyr_quosures(...), by)
 6. dplyr:::mutate_col(dots[[i]], data, mask, new_columns)
 7. mask$eval_all_mutate(quo)
 8. dplyr (local) eval()

What I want is to perform these calculations on my dataframe:

    #Time 0 for organism BB and MOFTreatment 0 for organism BB
    log10(100/1.8958800) - log10(100/1.7980552)

    #Time 0 for organism BB and MOFTreatment 1 for organism BB
    log10(100/1.8958800) - log10(100/1.3652684)

    #Time 0 for organism CR and MOFTreatment 0 for organism CR
    log10(100/1.5145418) - log10(100/1.4995760)

    #Time 0 for organism CR and MOFTreatment 1 for organism CR
    log10(100/1.5145418) - log10(100/2.362283)

I want my final dataframe to have a new column named Log_loss that should have the result of the calculations. I recognize that some rows will not have a calculated value in column Log_loss and this is another issue I am not sure how to address.

I would appreciate help in figuring out how to properly write my code to perform these calculations.


Solution

  • How about this?

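    First, convert the columns of mydf to numeric where possible (as Friede suggested). Every column is currently stored as character, which is why 100 / Fold_reduction fails with "non-numeric argument to binary operator":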

    mydf[] <- lapply(mydf, type.convert, as.is = TRUE)
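
    To see what that conversion does, it can help to compare column classes before and after. This is a minimal sketch on a two-column stand-in for mydf; the name demo is only for illustration and is not part of the original data:

    ```r
    # Stand-in for mydf: numbers stored as character strings, as in the question
    demo <- data.frame(
      Time           = c("T0", "T3"),
      Fold_reduction = c(" 1.8958800", " 1.7980552")
    )

    # Before: both columns are character
    sapply(demo, class)

    # type.convert() with as.is = TRUE turns number-like strings into numeric
    # and keeps the remaining strings as character instead of making factors
    demo[] <- lapply(demo, type.convert, as.is = TRUE)

    # After: Fold_reduction is numeric, Time is still character
    sapply(demo, class)
    ```

    Assigning into demo[] rather than demo keeps the object a data frame while replacing every column.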
    

    Now the dplyr pipe:

    mydf |>
      mutate(
        .by = Organism,
        Log_loss = if_else(Time == "T0" | !"T0" %in% Time, NA_real_,
                           log10(100 / Fold_reduction[Time == "T0"]) - log10(100 / Fold_reduction))
      )
    #   Time Organism MOF MOFTreatment std_dev_groups Fold_reduction Perc_viability     Log_loss
    # 1   T0       BB MOF            0      0.9228677       1.895880       52.74595           NA
    # 2   T3       BB MOF            0      0.8464373       1.798055       55.61565 -0.023007825
    # 3   T3       BB MOF            1      0.4491846       1.365268       73.24567 -0.142592807
    # 4   T0       CR MOF            0      0.5988814       1.514542       66.02657           NA
    # 5   T3       CR MOF            0      0.5845546       1.499576       66.68552 -0.004312783
    # 6   T3       CR MOF            1      1.2401820       2.362283       42.33193  0.193050661
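
    As a sanity check, the first non-NA value can be compared with the hand calculation from the question. The sketch below uses the two BB values from the question (the names t0 and t3 are only illustrative) and also relies on the identity log10(100/a) - log10(100/b) = log10(b/a):

    ```r
    # Fold_reduction for organism BB at T0, and at T3 with MOFTreatment 0
    t0 <- 1.8958800
    t3 <- 1.7980552

    # The calculation exactly as written in the question
    by_hand <- log10(100 / t0) - log10(100 / t3)

    # The factor of 100 cancels, so this is the same quantity
    simplified <- log10(t3 / t0)

    # TRUE: both forms agree up to floating-point tolerance
    isTRUE(all.equal(by_hand, simplified))

    # Rounds to -0.023, in line with row 2 of the output above
    round(by_hand, 3)
    ```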
    

    I'm assuming your data are structured so that each Organism group has at most one T0 row, i.e. Fold_reduction[Time == "T0"] returns no more than one number per Organism group.
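
    If that assumption is in doubt, it can be checked before running the pipe. A minimal base R sketch, using a stand-in with the same Time/Organism layout as mydf (the names demo and t0_per_group are only illustrative):

    ```r
    # Stand-in with the same Time/Organism layout as mydf in the question
    demo <- data.frame(
      Time     = c("T0", "T3", "T3", "T0", "T3", "T3"),
      Organism = c("BB", "BB", "BB", "CR", "CR", "CR")
    )

    # Count the T0 rows within each Organism
    t0_per_group <- tapply(demo$Time == "T0", demo$Organism, sum)
    t0_per_group

    # Stop with an error if any Organism has more than one T0 baseline
    stopifnot(all(t0_per_group <= 1))
    ```

    With more than one T0 row in a group, Fold_reduction[Time == "T0"] would return several values, so how to pick or combine the baselines would need to be decided first.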