How do I know that wilcox_test (package rstatix) pairs the right observations for each individual when doing a paired test?
Here is an example:
install.packages("rstatix")
install.packages("datarium")
library(rstatix)
library(datarium)
library(tidyr)  # gather()
data("mice2", package = "datarium")
mice2.long <- mice2 %>% gather(key = "group", value = "weight", before, after)
mice2.long %>% wilcox_test(weight ~ group, paired = TRUE)
It seems the test works correctly, but I never specified the column "id" that identifies each individual. How does the test know which observations form the pairs?
With paired = TRUE, the function pairs observations by position: the first value of before with the first value of after, the second with the second, and so on. It does not use the id column. If the long data are not ordered so that the i-th before value and the i-th after value belong to the same individual, wilcox_test will give incorrect results.
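If the pairing should follow id rather than row order, one safeguard is to sort the long data by group and id before testing. The sketch below uses base R only (stats::wilcox.test) on made-up data; the objects wide, long, shuffled and sorted are illustrative names, not part of rstatix. It shuffles the rows, restores the order by id, and checks that the result matches a test run directly on the wide columns.

```r
# Made-up wide data: one row per individual, identified by 'id'
set.seed(0)
wide <- data.frame(id = 1:10,
                   before = rnorm(10, 15, 2),
                   after  = rnorm(10, 15, 2))

# Equivalent long format: one row per individual and time point
long <- rbind(
  data.frame(id = wide$id, group = "before", weight = wide$before),
  data.frame(id = wide$id, group = "after",  weight = wide$after)
)

# Shuffle the rows so that row position no longer reflects the pairing
shuffled <- long[sample(nrow(long)), ]

# Restore the pairing: sort by group, then by id within each group
sorted <- shuffled[order(shuffled$group, shuffled$id), ]
after_w  <- sorted$weight[sorted$group == "after"]
before_w <- sorted$weight[sorted$group == "before"]

# Paired test on the id-sorted vectors vs. the original wide columns
res_sorted <- wilcox.test(after_w, before_w, paired = TRUE)
res_wide   <- wilcox.test(wide$after, wide$before, paired = TRUE)

res_sorted$statistic == res_wide$statistic  # TRUE: same pairs, same V
```

The same idea applies before calling rstatix::wilcox_test on long data: arrange the rows by id within each group first, so that position and individual agree.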
Here is a quick example:
library(dplyr)  # slice_sample()
library(tidyr)  # gather()
set.seed(0)
dat <- data.frame(id = 1:10, matrix(rnorm(20, 15, 2), ncol = 2)) |>
  setNames(c("id", "before", "after"))
dat %>%
  gather(key = "group", value = "weight", before, after) %>%
  rstatix::wilcox_test(weight ~ group, paired = TRUE)
# A tibble: 1 × 7
.y. group1 group2 n1 n2 statistic p
* <chr> <chr> <chr> <int> <int> <dbl> <dbl>
1 weight after before 10 10 14 0.193
Now shuffle the rows of the long data so that the first value of before no longer corresponds to the first value of after. The result changes:
set.seed(1)
dat %>%
  gather(key = "group", value = "weight", before, after) %>%
  slice_sample(n = 20) %>%
  rstatix::wilcox_test(weight ~ group, paired = TRUE)
# A tibble: 1 × 7
.y. group1 group2 n1 n2 statistic p
* <chr> <chr> <chr> <int> <int> <dbl> <dbl>
1 weight after before 10 10 15 0.232
Try again with a different seed and you will generally get yet another result.
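The effect of breaking the pairing can also be seen without reshaping at all. This base R sketch draws before and after values in the same order as dat above (so they should match its columns), computes V for the correct pairing, and then for five random re-pairings made by permuting before with sample(). The variable names are illustrative.

```r
set.seed(0)
before <- rnorm(10, 15, 2)
after  <- rnorm(10, 15, 2)

# Correct pairing: element i of 'after' belongs with element i of 'before'
v_paired <- wilcox.test(after, before, paired = TRUE)$statistic

# Broken pairing: permute 'before' so positions no longer match individuals
v_shuffled <- replicate(5, wilcox.test(after, sample(before), paired = TRUE)$statistic)

v_paired
v_shuffled  # V usually differs from v_paired and between re-pairings
```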
Note that this does not happen with your data: shuffling mice2 does not change the result. Why? Because every value of before is smaller than every value of after, i.e. the maximum of before is smaller than the minimum of after:
mice2$before|>max()
[1] 235
mice2$after|>min()
[1] 337
This is decisive for the Wilcoxon signed-rank statistic: whatever the pairing, every difference after - before is positive, so every rank counts as positive and the statistic V is simply the sum of all ranks, sum(1:10) = 55. With no ties, the exact two-sided p-value is then 2/2^10 ≈ 0.00195, which is what the output shows:
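This invariance can be checked directly. The sketch below uses hypothetical data (not mice2) in which every after value exceeds every before value, computes V by hand as the sum of the ranks of |d| over the positive differences d, and repeats this for 100 random re-pairings. signed_rank_v is a helper written only for this illustration.

```r
set.seed(42)
n <- 10
before <- runif(n, 170, 240)  # hypothetical values, all below 330
after  <- runif(n, 330, 450)  # hypothetical values, all above 330

# Signed-rank statistic V: sum of the ranks of |d| where d = x - y > 0
signed_rank_v <- function(x, y) {
  d <- x - y
  r <- rank(abs(d))
  sum(r[d > 0])
}

# Re-pair the data at random many times; every difference stays positive
v_values <- replicate(100, signed_rank_v(after, sample(before)))

unique(v_values)                                  # 55
n * (n + 1) / 2                                   # 55 = sum(1:10)
wilcox.test(after, before, paired = TRUE)$p.value # 2 / 2^10 = 0.001953125
```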
mice2 %>%
  gather(key = "group", value = "weight", before, after) %>%
  rstatix::wilcox_test(weight ~ group, paired = TRUE)
# A tibble: 1 × 7
.y. group1 group2 n1 n2 statistic p
1 weight after before 10 10 55 0.00195