So I have data looking a little something like this:
Data:
Group | Al | Cd | Cu |
---|---|---|---|
A | 10000 | 0.2 | 30 |
A | 15000 | 0.5 | 25 |
A | NA | NA | NA |
B | 8000 | 1.1 | 55 |
B | 11000 | 0.2 | 40 |
B | 13000 | 0.1 | 40 |
etc.
And I want to do a Mann-Whitney U test between groups A and B, separately for each element/column.
I have managed to do this manually for each element individually according to this:
#Data is the above dataframe
Area_A <- subset(Data, Group %in% c("A"))
Area_B <- subset(Data, Group %in% c("B"))
WhitneyU_Al <- wilcox.test(Area_A$Al, Area_B$Al, na.rm = TRUE, paired = FALSE, exact = FALSE)
(I couldn't figure out how to do it based on the values in the grouping column within a single data frame, which is why I divided it into two subsets.)
Now, I have a lot more columns than just these three (43 to be exact), and I was wondering if there was some way to do this across all columns without changing it manually each time?
I tried a few variations of this:
WhitneyU <- wilcox.test(Area_A, Area_B, na.rm = TRUE, paired = FALSE, exact = FALSE)
#OR
WhitneyU <- wilcox.test(Area_A[2:43], Area_B[2:43], na.rm = TRUE, paired = FALSE, exact = FALSE)
But they both return the error that "'x' must be numeric".
I suspect the answer isn't this easy and that I am barking up the wrong tree? Either that, or the question/answer is too obvious and I am just not seeing it. When I tried looking up multiple tests most answers deal with how to do multiple tests if you have multiple "groups" (as in, they have area A, B, C and D). Sorry if this has been asked before and I didn't find it (or I didn't understand it). I did look.
Any help is appreciated.
Edit: Upon request, using dput() on part of my data it looks a bit like this:
structure(list(Group = c("A", "A", "A", "A",
"A", "B", "B", "B", "B", "B", "B"
), Al = c(NA, NA, NA, 18100, 18400, 32500, 33200, 31200,
17400, 13900, 14400), As = c(NA, NA, NA, 16.9, 14.6, 8.83, 8.59,
8.42, 13.4, 13.5, 13.7), B = c(NA, NA, NA, 18, 16, 14, 14, 11,
53, 87, 58), Bi = c(NA, NA, NA, 0.13, 0.12, 0.57, 0.55, 0.52, 0.22,
0.18, 0.21), Ca = c(NA, NA, NA, 5950, 5480, 6220, 6230, 5950,
6850, 8170, 7000), Cd = c(NA, NA, NA, 0.2, 0.2, 0.2, 0.2, 0.18,
0.31, 0.36, 0.46)), row.names = c(1L, 2L, 3L, 4L, 5L, 40L, 41L,
42L, 43L, 44L, 45L), class = c("tbl_df", "tbl", "data.frame"))
`wilcox.test` requires the first input (`x`) to be numeric, which is why passing a whole data frame fails. In R, factors have an integer value assigned to them "under the hood" (i.e., A = 1, B = 2, ...). So you can convert the group variable in your data frame `df` to a factor and use the formula interface. This should work to perform the test across all other columns:
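# Convert the grouping column to a factor
df$Group <- as.factor(df$Group)
# Run one test per remaining column (df[-1] drops Group, assumed to be the first column)
lapply(df[-1], function(x) {
  wilcox.test(x ~ df$Group)
})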
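
If you only need the p-values rather than the full test objects, something along these lines might help. It is a minimal sketch, not the only way to do it: the data frame is rebuilt from three of the columns in the `dput()` output above, `exact = FALSE` is carried over from the question (the data contain ties), and the names `tests` and `p_values` are just illustrative.

```r
# Example data: a subset of the columns from the dput() output in the question
df <- data.frame(
  Group = c("A", "A", "A", "A", "A", "B", "B", "B", "B", "B", "B"),
  Al = c(NA, NA, NA, 18100, 18400, 32500, 33200, 31200, 17400, 13900, 14400),
  As = c(NA, NA, NA, 16.9, 14.6, 8.83, 8.59, 8.42, 13.4, 13.5, 13.7),
  Cd = c(NA, NA, NA, 0.2, 0.2, 0.2, 0.2, 0.18, 0.31, 0.36, 0.46)
)
df$Group <- as.factor(df$Group)

# One test per element column; rows with NA are dropped by the formula interface
tests <- lapply(df[-1], function(x) {
  wilcox.test(x ~ df$Group, exact = FALSE)
})

# Collect the p-values into a named numeric vector, one entry per column
p_values <- sapply(tests, function(res) res$p.value)
print(p_values)
```

Each element of `tests` is still the full result for that column, so `tests$Al` gives the same kind of output as the single-column call in the question.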