rfor-loopkruskal-wallis

Perform multiple Kruskal tests in for loop


I want to perform a kruskal_test for multiple variables. I have been able to do this with anova tests as follows:

vars = c("Sepal.Length" , "Sepal.Width" ,"Petal.Length" ,  "Petal.Width")

for (i in vars) {
  
  f <- reformulate(paste(i,'~Species'))
  test = aov(data = iris, f )
  
  print(i)
  print(summary(test))

But I can't seem to use the same method for rstatix::kruskal_test or kruskal.test:

for (i in vars) {
  
  f <- reformulate(paste(i,'~Species'))
  test = kruskal.test(data = iris,formula= f )
  test = kruskal_test(data = iris,formula= f )
  print(i)
  print(summary(test))
  
}

I get errors: Error in kruskal.test.default(data = iris, formula = f) : argument "x" is missing, with no default

or

Error in kruskal.test.formula(formula, data = data, ...) : 'formula' missing or incorrect


Solution

  • In aov you need this kind of formula,

    > reformulate(paste('Petal.Width','~Species'))
    ~(Petal.Width ~ Species)
    

    whereas in kruskal.test, it's

    > reformulate('Petal.Width', 'Species')
    Species ~ Petal.Width
    

    The generic function appears to have an unexpected issue in its method dispatch mechanism, indicating a possible bug in its method selection process (R 4.3.2). Even though the arguments are named, the order matters (default order is kruskal.test(formula, data), we have kruskal.test(data, formula)):

    > kruskal.test(formula=Petal.Width ~ Species, data=iris)
    
        Kruskal-Wallis rank sum test
    
    data:  Petal.Width by Species
    Kruskal-Wallis chi-squared = 131.19, df = 2, p-value < 2.2e-16
    

    However:

    > kruskal.test(data=iris, formula=Petal.Width ~ Species)
    Error in kruskal.test.default(data = iris, formula = Petal.Width ~ Species) : 
      argument "x" is missing, with no default
    

    It works if you use the method directly

    > for (i in vars) {
    +   f <- reformulate(i, 'Species')
    +   test <- stats:::kruskal.test.formula(data=iris, formula=f)
    +   print(test)
    + }
    
        Kruskal-Wallis rank sum test
    
    data:  Species by Sepal.Length
    Kruskal-Wallis chi-squared = 107.29, df = 34, p-value = 1.596e-09
    
    
        Kruskal-Wallis rank sum test
    
    data:  Species by Sepal.Width
    Kruskal-Wallis chi-squared = 47.879, df = 22, p-value = 0.001125
    ...
    

    Better use lapply though:

    > lapply(vars, \(v) stats:::kruskal.test.formula(data=iris, formula=reformulate(v, 'Species')))
    [[1]]
    
        Kruskal-Wallis rank sum test
    
    data:  Species by Sepal.Length
    Kruskal-Wallis chi-squared = 107.29, df = 34, p-value = 1.596e-09
    
    
    [[2]]
    
        Kruskal-Wallis rank sum test
    
    data:  Species by Sepal.Width
    Kruskal-Wallis chi-squared = 47.879, df = 22, p-value = 0.001125
    ...