I am using pairwise_wilcox_test() from the rstatix package on my data.frame:
data1 # shortened the data.frame
Firmicutes Proteobacteria Verrucomicrobiota cls
1 9916885 83115.37 0.0000 1
10 9923240 76759.73 0.0000 1
13 9897778 102222.14 0.0000 1
16 9887923 112077.44 0.0000 1
19 9832122 167423.55 454.1326 1
11 9717375 235007.98 47616.9546 2
14 9820485 150719.87 28794.7347 2
17 9805007 54276.39 140716.5721 2
2 9676859 320811.45 2329.3241 2
20 9636967 363032.82 0.0000 2
12 9581184 400989.93 17825.6204 3
15 9908333 87339.68 4327.6418 3
18 9624107 147003.76 228889.5762 3
21 9899086 67276.26 33638.1295 3
24 9827215 165133.37 7651.6540 3
When I apply it to a single column, it works fine:
WIL <- rstatix::pairwise_wilcox_test(Firmicutes ~ cls, data = data1, exact = TRUE, p.adjust.method = "bonferroni")
Output:
# A tibble: 3 × 9
.y. group1 group2 n1 n2 statistic p p.adj p.adj.signif
* <chr> <chr> <chr> <int> <int> <dbl> <dbl> <dbl> <chr>
1 Firmicutes 1 2 12 12 86 0.443 1 ns
2 Firmicutes 1 3 12 12 71 0.977 1 ns
3 Firmicutes 2 3 12 12 43 0.101 0.303 ns
Now I want to use apply() to run the test on every column of the table (the original table is longer), but I have a problem with the apply() call:
WIL <- apply(as.matrix(data1), 2, function(x){rstatix::pairwise_wilcox_test(x ~ cls, data = data1, exact = TRUE, p.adjust.method = "bonferroni")})
Output:
ℹ In index: 1.
ℹ With name: V1.
Caused by error in `pull()`:
! Can't extract columns that don't exist.
✖ Column `x` doesn't exist.
Run `rlang::last_trace()` to see where the error occurred.
Called from: signal_abort(cnd, .file)
I understand that there is no column "x", but I thought that x is defined by function(x).
Can somebody give me a hint about what I'm doing wrong?
I am fairly new to R and Stack Overflow, so maybe there is an obvious solution to this. I apologise in advance.
Thank you!
You can't use apply here, because x is the actual vector of values from your data frame, not the name of the column that you wish to test. In any case, the variable x inside the formula x ~ cls does not get substituted. A formula in R always records the symbol x, never its value. rstatix looks up the formula's variables as columns of data, so the function is literally looking for a column called x that doesn't exist. That is the pull() error you see.
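To see both points in isolation, here is a small base-R sketch. It uses a hypothetical toy data frame df, not the question's data, and does not call rstatix at all. It shows three things:

- apply() passes each column to the function as a bare numeric vector.
- A formula written inside that function records only the symbol x.
- Building the formula from a column name, with reformulate() or as.formula(paste(...)), yields the variable names you actually want looked up in data.

```r
# Hypothetical toy data (not the question's data1); only its shape matters here
df <- data.frame(Firmicutes = c(10, 12, 9, 15), cls = c(1, 1, 2, 2))

# apply() hands each column to the function as a plain numeric vector;
# the column name is not what x refers to inside function(x)
col_classes <- apply(as.matrix(df), 2, function(x) class(x))
print(col_classes)

# A formula records symbols, not values: x holds data in this environment,
# but the formula only remembers the name "x"
f_bad <- local({
  x <- df$Firmicutes
  x ~ cls
})
print(all.vars(f_bad))   # "x" "cls": a lookup in data would search for a column named x

# Building the formula from the column's name gives the intended variables
f_good  <- reformulate("cls", response = "Firmicutes")
f_paste <- as.formula(paste("Firmicutes", "~ cls"))
print(all.vars(f_good))  # "Firmicutes" "cls"
print(identical(all.vars(f_good), all.vars(f_paste)))
```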
Instead, you can take the column names of interest and turn each one into a correct formula inside lapply. You can then simply bind the results together into a single data frame:
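# names(data1)[1:3] are the taxa columns (everything except cls)
do.call('rbind',
        lapply(names(data1)[1:3], function(x) {
          # x is now a column name, e.g. "Firmicutes", so this builds Firmicutes ~ cls
          f <- as.formula(paste(x, '~ cls'))
          rstatix::pairwise_wilcox_test(data = data1, formula = f,
                                        exact = TRUE, p.adjust.method = "bonferroni")
        }))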
#> # A tibble: 9 x 9
#> .y. group1 group2 n1 n2 statistic p p.adj p.adj.signif
#> <chr> <chr> <chr> <int> <int> <dbl> <dbl> <dbl> <chr>
#> 1 Firmicutes 1 2 5 5 25 0.008 0.024 *
#> 2 Firmicutes 1 3 5 5 19 0.222 0.666 ns
#> 3 Firmicutes 2 3 5 5 10 0.69 1 ns
#> 4 Proteobacteria 1 2 5 5 6 0.222 0.666 ns
#> 5 Proteobacteria 1 3 5 5 10 0.69 1 ns
#> 6 Proteobacteria 2 3 5 5 15 0.69 1 ns
#> 7 Verrucomicrobiota 1 2 5 5 3 0.045 0.135 ns
#> 8 Verrucomicrobiota 1 3 5 5 0 0.01 0.029 *
#> 9 Verrucomicrobiota 2 3 5 5 11 0.841 1 ns
Created on 2023-09-11 with reprex v2.0.2
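You may prefer to keep looping over the column vectors themselves, as apply() does. In that case, one option is to swap in base R's stats::pairwise.wilcox.test(). It takes the values and the grouping as two vectors rather than a formula plus data, so no column name is needed.

This is a different function, not a drop-in replacement for rstatix:

- It returns a matrix of adjusted p-values per column, which the sketch below flattens into a data frame.
- It does not report rstatix's statistic or n1/n2 columns.

This is a minimal sketch under those assumptions. The names data_sub, test_cols, res and res_df are illustrative, and data_sub holds only the Firmicutes and Proteobacteria columns of the question's data. It also leaves exact at its default, under which wilcox.test() computes exact p-values for samples this small when there are no ties.

```r
# Subset of the question's data1: Firmicutes, Proteobacteria and cls only
data_sub <- data.frame(
  Firmicutes = c(9916885, 9923240, 9897778, 9887923, 9832122,
                 9717375, 9820485, 9805007, 9676859, 9636967,
                 9581184, 9908333, 9624107, 9899086, 9827215),
  Proteobacteria = c(83115.37, 76759.73, 102222.14, 112077.44, 167423.55,
                     235007.98, 150719.87, 54276.39, 320811.45, 363032.82,
                     400989.93, 87339.68, 147003.76, 67276.26, 165133.37),
  cls = c(1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 3, 3, 3, 3, 3)
)

# pairwise.wilcox.test() takes values and groups as vectors, so iterating
# over the columns themselves (as apply() does) works here
test_cols <- setdiff(names(data_sub), "cls")
res <- lapply(data_sub[test_cols], function(x) {
  pairwise.wilcox.test(x, data_sub$cls, p.adjust.method = "bonferroni")
})

# Flatten each lower-triangular p-value matrix (rows = group2, cols = group1)
res_df <- do.call(rbind, lapply(names(res), function(nm) {
  p <- res[[nm]]$p.value
  idx <- which(!is.na(p), arr.ind = TRUE)
  data.frame(.y. = nm,
             group1 = colnames(p)[idx[, "col"]],
             group2 = rownames(p)[idx[, "row"]],
             p.adj = p[idx],
             stringsAsFactors = FALSE)
}))
print(res_df)
```

For these two columns, the Bonferroni-adjusted p-values should match the p.adj column of the rstatix output above, up to rounding.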
Data from the question in reproducible format:
data1 <- structure(list(Firmicutes = c(9916885L, 9923240L, 9897778L, 9887923L,
9832122L, 9717375L, 9820485L, 9805007L, 9676859L, 9636967L, 9581184L,
9908333L, 9624107L, 9899086L, 9827215L), Proteobacteria = c(83115.37,
76759.73, 102222.14, 112077.44, 167423.55, 235007.98, 150719.87,
54276.39, 320811.45, 363032.82, 400989.93, 87339.68, 147003.76,
67276.26, 165133.37), Verrucomicrobiota = c(0, 0, 0, 0, 454.1326,
47616.9546, 28794.7347, 140716.5721, 2329.3241, 0, 17825.6204,
4327.6418, 228889.5762, 33638.1295, 7651.654), cls = c(1L, 1L,
1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 3L)),
class = "data.frame", row.names = c("1",
"10", "13", "16", "19", "11", "14", "17", "2", "20", "12", "15",
"18", "21", "24"))