I have this dataframe:
mydf <- structure(list(Time = c("T0", "T3", "T3", "T0", "T3", "T3"),
Organism = c("BB", "BB", "BB", "CR", "CR", "CR"), MOF = c("MOF",
"MOF", "MOF", "MOF", "MOF", "MOF"), MOFTreatment = c("0",
"0", "1", "0", "0", "1"), std_dev_groups = c(" 0.9228677",
" 0.8464373", " 0.4491846", " 0.5988814", " 0.5845546", " 1.240182"
), Fold_reduction = c(" 1.8958800", " 1.7980552", " 1.3652684",
" 1.5145418", " 1.4995760", " 2.362283"), Perc_viability = c("52.7459535",
"55.6156459", "73.2456721", "66.0265686", "66.6855175", "42.331930"
)), class = "data.frame", row.names = c(NA, -6L))
Which looks like this:
  Time Organism MOF MOFTreatment std_dev_groups Fold_reduction Perc_viability
1   T0       BB MOF            0      0.9228677      1.8958800     52.7459535
2   T3       BB MOF            0      0.8464373      1.7980552     55.6156459
3   T3       BB MOF            1      0.4491846      1.3652684     73.2456721
4   T0       CR MOF            0      0.5988814      1.5145418     66.0265686
5   T3       CR MOF            0      0.5845546      1.4995760     66.6855175
6   T3       CR MOF            1       1.240182       2.362283      42.331930
I want to perform a calculation on specific comparisons. I thought I should use group_by and mutate, but my attempt gives the following error:
mydf2 <- mydf %>% group_by(Time, Organism, MOFTreatment) %>%
mutate(Log_loss = (log10(100/Fold_reduction[Time == "T0"]) - log10(100/Fold_reduction[Time != "T0"])))
Error in `mutate()`:
ℹ In argument: `Log_loss = (...)`.
ℹ In group 1: `Time = "T0"`, `Organism = "BB"`, `MOFTreatment = "0"`.
Caused by error in `100 / Fold_reduction[Time == "T0"]`:
! non-numeric argument to binary operator
Backtrace:
1. mydf %>% group_by(Time, Organism, MOFTreatment) %>% ...
3. dplyr:::mutate.data.frame(...)
4. dplyr:::mutate_cols(.data, dplyr_quosures(...), by)
6. dplyr:::mutate_col(dots[[i]], data, mask, new_columns)
7. mask$eval_all_mutate(quo)
8. dplyr (local) eval()
What I want is to perform these calculations on my dataframe:
#Time 0 for organism BB and MOFTreatment 0 for organism BB
log10(100/1.8958800) - log10(100/1.7980552)
#Time 0 for organism BB and MOFTreatment 1 for organism BB
log10(100/1.8958800) - log10(100/1.3652684)
#Time 0 for organism CR and MOFTreatment 0 for organism CR
log10(100/1.5145418) - log10(100/1.4995760)
#Time 0 for organism CR and MOFTreatment 1 for organism CR
log10(100/1.5145418) - log10(100/2.362283)
I want my final dataframe to have a new column named Log_loss that should have the result of the calculations. I recognize that some rows will not have a calculated value in column Log_loss, and this is another issue I am not sure how to address.
I would appreciate help in figuring out how to properly write my code to perform these calculations.
How about this?
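First, fix mydf so that its numeric columns are actually numeric (as Friede suggested). They are currently stored as character strings, which is what triggers the "non-numeric argument to binary operator" error: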
mydf[] <- lapply(mydf, type.convert, as.is = TRUE)
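To confirm that the conversion did what was intended, the column classes can be compared before and after. This is only a small sketch on a made-up two-row data frame (`toy` is a hypothetical name, not part of the question's data):

```r
# Toy data with the same problem: numbers stored as character strings
toy <- data.frame(
  Time           = c("T0", "T3"),
  Fold_reduction = c(" 1.8958800", " 1.7980552"),
  stringsAsFactors = FALSE
)

# Before: both columns are character, so 100 / Fold_reduction would fail
classes_before <- sapply(toy, class)

# Same conversion as above: each column is converted to its natural type
toy[] <- lapply(toy, type.convert, as.is = TRUE)

# After: Time stays character, Fold_reduction becomes numeric
classes_after <- sapply(toy, class)

print(classes_before)
print(classes_after)
```

Note that this converts every column it can, so a column such as MOFTreatment becomes integer as well.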
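Now the dplyr pipe. Grouping by Organism alone keeps each organism's T0 row in the same group as its T3 rows, so the baseline can be looked up within the group: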
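library(dplyr)  # the .by argument requires dplyr >= 1.1.0

mydf |>
  mutate(
    .by = Organism,
    # T0 rows are the baseline, so they get NA
    Log_loss = if_else(
      Time == "T0" | !"T0" %in% Time, NA_real_,
      log10(100 / Fold_reduction[Time == "T0"]) - log10(100 / Fold_reduction)
    )
  )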
#   Time Organism MOF MOFTreatment std_dev_groups Fold_reduction Perc_viability     Log_loss
# 1   T0       BB MOF            0      0.9228677       1.895880       52.74595           NA
# 2   T3       BB MOF            0      0.8464373       1.798055       55.61565 -0.023007825
# 3   T3       BB MOF            1      0.4491846       1.365268       73.24567 -0.142592807
# 4   T0       CR MOF            0      0.5988814       1.514542       66.02657           NA
# 5   T3       CR MOF            0      0.5845546       1.499576       66.68552 -0.004312783
# 6   T3       CR MOF            1      1.2401820       2.362283       42.33193  0.193050661
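If you would rather avoid dplyr, the same calculation can be sketched in base R with a named lookup vector. This is an alternative illustration, not the method above; `toy`, `baseline`, `base_per_row` and the organism "ZZ" (which has no T0 row) are made up for the example:

```r
# Toy data (hypothetical): organism "ZZ" has no T0 baseline row
toy <- data.frame(
  Time           = c("T0", "T3", "T3", "T3"),
  Organism       = c("BB", "BB", "BB", "ZZ"),
  Fold_reduction = c(1.8958800, 1.7980552, 1.3652684, 2.0)
)

# Named lookup vector: one baseline Fold_reduction per organism
is_base  <- toy$Time == "T0"
baseline <- setNames(toy$Fold_reduction[is_base], toy$Organism[is_base])

# Indexing by name returns NA for organisms that have no baseline
base_per_row <- unname(baseline[toy$Organism])

# Baseline rows themselves get NA; all other rows get the difference
toy$Log_loss <- ifelse(
  is_base, NA_real_,
  log10(100 / base_per_row) - log10(100 / toy$Fold_reduction)
)

print(toy)
```

Rows without a matching baseline end up as NA because the name lookup itself returns NA, so no separate guard is needed in this variant.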
I'm assuming that you control your data well enough that [Time == "T0"] will always return no more than one number per Organism group.
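That assumption is easy to check before running the calculation. A minimal sketch in base R, using a made-up data frame in which "CR" has two T0 rows and "ZZ" has none (`toy`, `n_T0`, `too_many` and `none` are hypothetical names):

```r
# Toy data (hypothetical): "CR" has two T0 rows, "ZZ" has none
toy <- data.frame(
  Time     = c("T0", "T3", "T0", "T0", "T3", "T3"),
  Organism = c("BB", "BB", "CR", "CR", "CR", "ZZ")
)

# Number of T0 (baseline) rows per organism
n_T0 <- tapply(toy$Time == "T0", toy$Organism, sum)

# Organisms that break the "at most one baseline" assumption
too_many <- names(n_T0)[n_T0 > 1]

# Organisms with no baseline at all
none <- names(n_T0)[n_T0 == 0]

print(n_T0)
print(too_many)
print(none)
```

If `too_many` is not empty, you would need to decide which T0 row counts as the baseline (or average them) before computing Log_loss.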