I am trying to write simple code that runs a one-way ANOVA for each column of data that looks like this:
| | PROTEIN A | PROTEIN B |
|---|---|---|
| A1 | Cell 1 | Cell 2 |
| A2 | Cell 3 | Cell 4 |
| B1 | Cell 5 | Cell 6 |
| B2 | Cell 7 | Cell 8 |
Data:
structure(list(acc = c("A", "A", "B", "B", "B", "C", "C", "C",
"D", "D"), A0A8L2QEN5 = c(130.3, 110.4, 123.3, 143.2, 110.4,
130.5, 109.1, 106.4, 19.5, 16.9), P63018 = c(97.1, 93.4, 103.1,
102.2, 110.9, 113, 122.7, 135.1, 60.6, 61.9), P85108 = c(99.1,
103.5, 97.9, 89.8, 87.8, 94.9, 87.8, 96.9, 121.5, 120.7), A0A8L2R7U3 = c(95.9,
101.1, 97.5, 96.6, 87.4, 97.9, 82.3, 103.7, 119.5, 118.1)), class = "data.frame", row.names = c(NA,
-10L))
The data frame has 10 rows and 300 columns. I have run an ANOVA comparing groups A, B, C, etc. Unfortunately, I have to call summary() on each ANOVA separately, e.g. summary(anovas$`PROTEIN A`), which means doing this manually 300 times. Is there a way to create a column (it can be in another data frame) where the p-value of each ANOVA is extracted automatically, so that I don't have to do this by hand? Here is my code for the ANOVA:
fit_aov <- function(col) {
aov(col ~ trt, data = df_long)
}
anovas <- map(df_long[, 2:ncol(df_long)], fit_aov)
summary(anovas$protein2)[[1]][1,5] yields only one readout: the p-value for that single protein.
A few things here:
- Build the model formula with reformulate, with the column name as the response.
- Pulling the p-value out of summary() is possible, but using broom::tidy() will make your life a little easier.
fit_aov <- function(col, trt = "acc") {
f <- reformulate(trt, response = col)
a <- aov(f, data = df_long)
## summary(a)[[1]][["Pr(>F)"]][1]
broom::tidy(a)[["p.value"]][1]
}
vars <- names(df_long)[-1]
fit_aov(vars[1]) ## test on the first response before mapping over all of them
purrr::map_dbl(vars, fit_aov)
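If you want the p-values as a column next to the protein names, as asked in the question, something along these lines should work. This is only a sketch in base R: it uses the commented-out summary() route instead of broom::tidy() so that it runs without extra packages, and the names anova_p and p_values are made up for illustration. The data frame is the sample data from the question.

```r
# Sample data from the question: first column is the group, the rest are proteins
df_long <- data.frame(
  acc = c("A", "A", "B", "B", "B", "C", "C", "C", "D", "D"),
  A0A8L2QEN5 = c(130.3, 110.4, 123.3, 143.2, 110.4, 130.5, 109.1, 106.4, 19.5, 16.9),
  P63018 = c(97.1, 93.4, 103.1, 102.2, 110.9, 113, 122.7, 135.1, 60.6, 61.9),
  P85108 = c(99.1, 103.5, 97.9, 89.8, 87.8, 94.9, 87.8, 96.9, 121.5, 120.7),
  A0A8L2R7U3 = c(95.9, 101.1, 97.5, 96.6, 87.4, 97.9, 82.3, 103.7, 119.5, 118.1)
)

# Hypothetical helper: fit a one-way ANOVA for one response column
# and return the p-value of the grouping term (first row of the ANOVA table)
anova_p <- function(col, trt = "acc", data = df_long) {
  f <- reformulate(trt, response = col)
  a <- aov(f, data = data)
  summary(a)[[1]][["Pr(>F)"]][1]
}

# Every column except the grouping column is treated as a response
vars <- setdiff(names(df_long), "acc")

# One row per protein, with its ANOVA p-value
p_values <- data.frame(
  protein = vars,
  p.value = unname(vapply(vars, anova_p, numeric(1))),
  row.names = NULL
)
p_values
```

With 300 proteins the result is a 300-row data frame that you can sort or filter by p.value.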