Tags: r, for-loop, kruskal-wallis

For loop in R: error in model.frame.default()


I'm quite new to loops, so please be patient with me :)

I have calculated alpha diversity indices (Observed, Shannon, InvSimpson, Evenness), and for each of them I want to run a Kruskal-Wallis test against the Month variable of my table.

My table (df) looks something like this:

Observed  Shannon  InvSimpson  Evenness  Month
     688   4.5538   23.365814  0.696963  February
     749   4.3815   15.162467  0.661992  February
     610   3.8291   11.178981  0.597054  February
     665   4.2011   16.284009  0.646343  March
     839   5.1855   43.198709  0.770260  March
     516   3.2393    4.765211  0.518611  April
     470   3.9677   11.614851  0.644873  April
     539   4.2995   15.593572  0.683583  April
     ...      ...         ...       ...  ...

Before trying a loop, I ran the test one index at a time, like so:

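library(dplyr)    # provides the %>% pipe
library(rstatix)  # provides kruskal_test()

# One Kruskal-Wallis test per index, each comparing the Month groups
obs <- df %>% kruskal_test(Observed ~ Month)
sha <- df %>% kruskal_test(Shannon ~ Month)
inv <- df %>% kruskal_test(InvSimpson ~ Month)
eve <- df %>% kruskal_test(Evenness ~ Month)
# Stack the four one-row result tibbles
res.kruskal <- rbind(obs, sha, inv, eve)
res.kruskal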

That worked, and this is the result I want to reproduce with the for loop:

# A tibble: 4 × 6
  .y.            n statistic    df       p method        
  <chr>      <int>     <dbl> <int>   <dbl> <chr>         
1 Observed      45      20.6     9 0.0144  Kruskal-Wallis
2 Shannon       45      24.0     9 0.00434 Kruskal-Wallis
3 InvSimpson    45      20.3     9 0.0159  Kruskal-Wallis
4 Evenness      45      22.0     9 0.00899 Kruskal-Wallis
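As an aside, the four separate calls above can probably be collapsed into one grouped call, with no loop at all. This is a different technique from the loop: a minimal sketch assuming tidyr's pivot_longer() to reshape the indices into long format and rstatix's support for grouped data frames (kruskal_test() then runs once per group). The column names index and value are arbitrary choices for illustration, and df here is only the eight example rows shown above.

```r
library(dplyr)    # group_by() and the %>% pipe
library(tidyr)    # pivot_longer()
library(rstatix)  # kruskal_test()

# Example data: the first rows of df shown above
df <- read.table(header = TRUE, text = "
Observed Shannon InvSimpson Evenness Month
688 4.5538 23.365814 0.696963 February
749 4.3815 15.162467 0.661992 February
610 3.8291 11.178981 0.597054 February
665 4.2011 16.284009 0.646343 March
839 5.1855 43.198709 0.770260 March
516 3.2393 4.765211 0.518611 April
470 3.9677 11.614851 0.644873 April
539 4.2995 15.593572 0.683583 April")

res.grouped <- df %>%
  # long format: one row per (sample, index) with columns Month, index, value
  pivot_longer(-Month, names_to = "index", values_to = "value") %>%
  # one group per index
  group_by(index) %>%
  # one Kruskal-Wallis test of value ~ Month inside each group
  kruskal_test(value ~ Month)

res.grouped
```

In the result, the index column carries the index name while .y. reads "value" for every row, and the rows are likely ordered by index name rather than in the order of the four separate calls.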

However, when I try it with a for loop, like so:

Indices <- c("Observed", "Shannon", "InvSimpson", "Evenness")
result.kruskal <- data_frame()

for (i in Indices) {
  kruskal <- df %>% kruskal_test(i ~ Month)
  result.kruskal <- rbind(result.kruskal, kruskal)
}

I get the following error:

Error in model.frame.default(formula = formula, data = data) : 
  variable lengths differ (found for 'Month')

Based on similar errors reported on the forum, I don't think the problem comes from the Month variable, despite what the error message says, and there are no NAs in df either. Am I writing the for loop wrong?
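A likely explanation for the error: a formula stores the symbol i, not the string it holds, so i ~ Month asks for a variable literally named i. Since df has no such column, model.frame() falls back to the formula's environment, where i is a character vector of length 1 that cannot be paired with the rows of Month. The sketch below illustrates this with base R only (stats::kruskal.test() and stats::reformulate() standing in for rstatix) on the eight example rows shown above, then builds each formula from the column name instead of writing i ~ Month.

```r
# Base-R sketch: stats::kruskal.test() stands in for rstatix::kruskal_test()
df <- read.table(header = TRUE, text = "
Observed Shannon InvSimpson Evenness Month
688 4.5538 23.365814 0.696963 February
749 4.3815 15.162467 0.661992 February
610 3.8291 11.178981 0.597054 February
665 4.2011 16.284009 0.646343 March
839 5.1855 43.198709 0.770260 March
516 3.2393 4.765211 0.518611 April
470 3.9677 11.614851 0.644873 April
539 4.2995 15.593572 0.683583 April")

Indices <- c("Observed", "Shannon", "InvSimpson", "Evenness")
i <- Indices[1]

# The formula keeps the symbol i; the string "Observed" is never substituted
all.vars(i ~ Month)    # "i" "Month"

# reformulate() builds the formula from the string stored in i
reformulate("Month", response = i)    # the formula Observed ~ Month

# The same loop, with the formula built from each column name
stat <- numeric(0)
for (i in Indices) {
  kt <- kruskal.test(reformulate("Month", response = i), data = df)
  stat[i] <- unname(kt$statistic)  # Kruskal-Wallis chi-squared
}
round(stat, 3)  # Observed 5.139, Shannon 2, InvSimpson 3.222, Evenness 1.444
```

The same reformulate("Month", response = i) call should also work as the formula argument of rstatix::kruskal_test().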

I would be thankful for any insight you might have. :)

Sophie


Solution

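  • Using the first rows of your dataset as an example, either lapply() or apply() can iterate over the columns, and bind_rows() then combines the results of the individual tests into a single data frame. Inside the anonymous function, x holds the column's values rather than its name, so every test labels its outcome "x" in the .y. column; select(-2) drops that column, and the variable column created by .id = "variable" holds the index names instead: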

    library(tidyverse)
    library(rstatix)
    Indices <- c("Observed", "Shannon", "InvSimpson", "Evenness")
    

    Using lapply():

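    result.kruskal <- bind_rows(
      # x is each index column as a numeric vector
      lapply(df[Indices], FUN = function(x) kruskal_test(df, x ~ Month)),
      .id = "variable") %>%   # list names become the "variable" column
      select(-2) %>%          # drop the .y. column, which reads "x"
      as.data.frame()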
    
    result.kruskal
    
     variable        n statistic    df   p    method        
    1 Observed       8      5.14     2 0.0766 Kruskal-Wallis
    2 Shannon        8      2        2 0.368  Kruskal-Wallis
    3 InvSimpson     8      3.22     2 0.2    Kruskal-Wallis
    4 Evenness       8      1.44     2 0.486  Kruskal-Wallis
    

    Or with apply():

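    result.kruskal <- bind_rows(
      # apply() converts df[Indices] to a numeric matrix; x is one column
      apply(df[Indices], 2, FUN = function(x) kruskal_test(df, x ~ Month)),
      .id = "variable") %>%   # column names become the "variable" column
      select(-2) %>%          # drop the .y. column, which reads "x"
      as.data.frame()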
    
    result.kruskal
    
     variable        n statistic    df   p    method        
    1 Observed       8      5.14     2 0.0766 Kruskal-Wallis
    2 Shannon        8      2        2 0.368  Kruskal-Wallis
    3 InvSimpson     8      3.22     2 0.2    Kruskal-Wallis
    4 Evenness       8      1.44     2 0.486  Kruskal-Wallis
    

    Example data

    df <- read.table(text = "Observed   Shannon InvSimpson  Evenness    Month
    688 4.5538  23.365814   0.696963    February
    749 4.3815  15.162467   0.661992    February
    610 3.8291  11.178981   0.597054    February
    665 4.2011  16.284009   0.646343    March
    839 5.1855  43.198709   0.770260    March
    516 3.2393  4.765211    0.518611    April
    470 3.9677  11.614851   0.644873    April
    539 4.2995  15.593572   0.683583    April", header = TRUE)
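If you would rather keep the structure of the original for loop, a minimal fix is to build each formula from the column name instead of writing i ~ Month. The sketch below assumes the same eight example rows and uses stats::reformulate() to turn each string into a formula; results are collected in a list and combined once with bind_rows(), which avoids growing a data frame inside the loop. Because each formula now names the real column, .y. already holds the index name and the select(-2) step is not needed.

```r
library(dplyr)    # bind_rows()
library(rstatix)  # kruskal_test()

# Example data: the first rows of df shown above
df <- read.table(header = TRUE, text = "
Observed Shannon InvSimpson Evenness Month
688 4.5538 23.365814 0.696963 February
749 4.3815 15.162467 0.661992 February
610 3.8291 11.178981 0.597054 February
665 4.2011 16.284009 0.646343 March
839 5.1855 43.198709 0.770260 March
516 3.2393 4.765211 0.518611 April
470 3.9677 11.614851 0.644873 April
539 4.2995 15.593572 0.683583 April")

Indices <- c("Observed", "Shannon", "InvSimpson", "Evenness")
results <- list()

for (i in Indices) {
  # Build e.g. Observed ~ Month from the string in i
  f <- reformulate("Month", response = i)
  results[[i]] <- kruskal_test(df, f)
}

# Stack the one-row tibbles; .y. holds the index names
result.kruskal <- bind_rows(results)
result.kruskal
```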