I want to perform a kruskal_test
for multiple variables. I have been able to do this with anova tests as follows:
vars = c("Sepal.Length" , "Sepal.Width" ,"Petal.Length" , "Petal.Width")
for (i in vars) {
f <- reformulate(paste(i,'~Species'))
test = aov(data = iris, f )
print(i)
print(summary(test))
But I can't seem to use the same method for rstatix::kruskal_test
or kruskal.test
:
for (i in vars) {
f <- reformulate(paste(i,'~Species'))
test = kruskal.test(data = iris,formula= f )
test = kruskal_test(data = iris,formula= f )
print(i)
print(summary(test))
}
I get errors: Error in kruskal.test.default(data = iris, formula = f) : argument "x" is missing, with no default
or
Error in kruskal.test.formula(formula, data = data, ...) : 'formula' missing or incorrect
In aov
you need this kind of formula,
> reformulate(paste('Petal.Width','~Species'))
~(Petal.Width ~ Species)
whereas in kruskal.test
, it's
> reformulate('Petal.Width', 'Species')
Species ~ Petal.Width
The generic function appears to have an unexpected issue in its method dispatch mechanism, indicating a possible bug in its method selection process (R 4.3.2). Even though the arguments are named, the order matters (default order is kruskal.test(formula, data)
, we have kruskal.test(data, formula)
):
> kruskal.test(formula=Petal.Width ~ Species, data=iris)
Kruskal-Wallis rank sum test
data: Petal.Width by Species
Kruskal-Wallis chi-squared = 131.19, df = 2, p-value < 2.2e-16
However:
> kruskal.test(data=iris, formula=Petal.Width ~ Species)
Error in kruskal.test.default(data = iris, formula = Petal.Width ~ Species) :
argument "x" is missing, with no default
It works if you use the method directly
> for (i in vars) {
+ f <- reformulate(i, 'Species')
+ test <- stats:::kruskal.test.formula(data=iris, formula=f)
+ print(test)
+ }
Kruskal-Wallis rank sum test
data: Species by Sepal.Length
Kruskal-Wallis chi-squared = 107.29, df = 34, p-value = 1.596e-09
Kruskal-Wallis rank sum test
data: Species by Sepal.Width
Kruskal-Wallis chi-squared = 47.879, df = 22, p-value = 0.001125
...
Better use lapply
though:
> lapply(vars, \(v) stats:::kruskal.test.formula(data=iris, formula=reformulate(v, 'Species')))
[[1]]
Kruskal-Wallis rank sum test
data: Species by Sepal.Length
Kruskal-Wallis chi-squared = 107.29, df = 34, p-value = 1.596e-09
[[2]]
Kruskal-Wallis rank sum test
data: Species by Sepal.Width
Kruskal-Wallis chi-squared = 47.879, df = 22, p-value = 0.001125
...