r statistical-test

How to do the Mann-Whitney U test (wilcox.test) across several columns?


So I have data looking a little something like this:

Data:

Group Al Cd Cu
A 10000 0.2 30
A 15000 0.5 25
A NA NA NA
B 8000 1.1 55
B 11000 0.2 40
B 13000 0.1 40

etc.

And I want to do a Mann-Whitney U test between groups A and B, separately for each element/column.

I have managed to do this manually for each element individually according to this:

#Data is the above dataframe

Area_A <- subset(Data, Group %in% c("A"))
Area_B <- subset(Data, Group %in% c("B"))

WhitneyU_Al <- wilcox.test(Area_A$Al, Area_B$Al, na.rm = TRUE, paired = FALSE, exact = FALSE)

(I couldn't figure out how to do it based on the rows in the column "Group" within one data frame, which is why I divided it into two subsets.)

Now, I have a lot more columns than just these three (43 to be exact), and I was wondering if there was some way to do this across all columns without changing it manually each time?

I tried a few variations of this:

WhitneyU <- wilcox.test(Area_A, Area_B, na.rm = TRUE, paired = FALSE, exact = FALSE)

#OR

WhitneyU <- wilcox.test(Area_A[2:43], Area_B[2:43], na.rm = TRUE, paired = FALSE, exact = FALSE)

But they both return the error that "'x' must be numeric".

I suspect the answer isn't this easy and that I am barking up the wrong tree. Either that, or the question/answer is too obvious and I am just not seeing it. When I tried looking up multiple tests, most answers dealt with how to run them when you have multiple "groups" (as in, areas A, B, C and D). Sorry if this has been asked before and I didn't find it (or didn't understand it). I did look.

Any help is appreciated.

Edit: Upon request, here is the dput() output for part of my data:

structure(list(Group = c("A", "A", "A", "A", 
"A", "B", "B", "B", "B", "B", "B"
), Al = c(NA, NA, NA, 18100, 18400, 32500, 33200, 31200, 
17400, 13900, 14400), As = c(NA, NA, NA, 16.9, 14.6, 8.83, 8.59, 
8.42, 13.4, 13.5, 13.7), B = c(NA, NA, NA, 18, 16, 14, 14, 11, 
53, 87, 58), Bi = c(NA, NA, NA, 0.13, 0.12, 0.57, 0.55, 0.52, 0.22, 
0.18, 0.21), Ca = c(NA, NA, NA, 5950, 5480, 6220, 6230, 5950, 
6850, 8170, 7000), Cd = c(NA, NA, NA, 0.2, 0.2, 0.2, 0.2, 0.18, 
0.31, 0.36, 0.46)), row.names = c(1L, 2L, 3L, 4L, 5L, 40L, 41L, 
42L, 43L, 44L, 45L), class = c("tbl_df", "tbl", "data.frame"))

Solution

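  • wilcox.test requires its first argument (x) to be a numeric vector, so passing a whole data frame fails with "'x' must be numeric". Instead, use the formula interface with the grouping column as a factor. In R, factors are stored as integer codes “under the hood” (i.e., A = 1, B = 2, …). Convert the group variable in your data frame (called df here), then loop over the remaining columns to run the test on each one: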

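    # Treat the grouping column as a factor with levels A and B
    df$Group <- as.factor(df$Group)
    
    # df[-1] drops the first column (Group); returns a named list of test results
    lapply(df[-1], function(x) {
        wilcox.test(x ~ df$Group)
    })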
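
If you also want the p-values in one place, here is a minimal sketch of the same approach. It assumes the data frame is called df and that Group is its first column, and it uses only three of the columns from the dput() output in the question. exact = FALSE is carried over from the question's own call; the names tests, p_values and results are purely illustrative.

```r
# Subset of the data from the question's dput() output
df <- data.frame(
  Group = c("A", "A", "A", "A", "A", "B", "B", "B", "B", "B", "B"),
  Al = c(NA, NA, NA, 18100, 18400, 32500, 33200, 31200, 17400, 13900, 14400),
  As = c(NA, NA, NA, 16.9, 14.6, 8.83, 8.59, 8.42, 13.4, 13.5, 13.7),
  Cd = c(NA, NA, NA, 0.2, 0.2, 0.2, 0.2, 0.18, 0.31, 0.36, 0.46)
)
df$Group <- as.factor(df$Group)

# One test per element column (assumes Group is column 1);
# exact = FALSE requests the normal approximation, as in the question
tests <- lapply(df[-1], function(x) {
  wilcox.test(x ~ df$Group, exact = FALSE)
})

# Pull the p-value out of each result; names follow the column names
p_values <- sapply(tests, function(res) res$p.value)

# Small summary table: one row per element
results <- data.frame(element = names(p_values), p_value = unname(p_values))
print(results)
```

Each entry of tests is still the full result object, so something like tests$Al gives the complete output for a single element.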