I'm quite new to loops, so please be patient with me :)
I have calculated alpha diversity indices (Observed, Shannon, InvSimpson, Evenness), and I want to run a Kruskal-Wallis test on each of them against the Month variable of my table.
My table (`df`) looks something like this:
Observed | Shannon | InvSimpson | Evenness | Month |
---|---|---|---|---|
688 | 4.5538 | 23.365814 | 0.696963 | February |
749 | 4.3815 | 15.162467 | 0.661992 | February |
610 | 3.8291 | 11.178981 | 0.597054 | February |
665 | 4.2011 | 16.284009 | 0.646343 | March |
839 | 5.1855 | 43.198709 | 0.770260 | March |
516 | 3.2393 | 4.765211 | 0.518611 | April |
470 | 3.9677 | 11.614851 | 0.644873 | April |
539 | 4.2995 | 15.593572 | 0.683583 | April |
... | ... | ... | ... | ... |
Before trying a loop, I performed the test one index at a time, like so:
```r
library(dplyr)    # for the %>% pipe
library(rstatix)  # for kruskal_test()

obs <- df %>% kruskal_test(Observed ~ Month)
sha <- df %>% kruskal_test(Shannon ~ Month)
inv <- df %>% kruskal_test(InvSimpson ~ Month)
eve <- df %>% kruskal_test(Evenness ~ Month)
res.kruskal <- rbind(obs, sha, inv, eve)
res.kruskal
```
It worked, and this is the result I want to get with the for loop:
```
# A tibble: 4 × 6
  .y.            n statistic    df       p method
  <chr>      <int>     <dbl> <int>   <dbl> <chr>
1 Observed      45      20.6     9 0.0144  Kruskal-Wallis
2 Shannon       45      24.0     9 0.00434 Kruskal-Wallis
3 InvSimpson    45      20.3     9 0.0159  Kruskal-Wallis
4 Evenness      45      22.0     9 0.00899 Kruskal-Wallis
```
However, when I try it with a for loop like this:
```r
Indices <- c("Observed", "Shannon", "InvSimpson", "Evenness")
result.kruskal <- data_frame()
for (i in Indices) {
  kruskal <- df %>% kruskal_test(i ~ Month)
  result.kruskal <- rbind(result.kruskal, kruskal)
}
```
I get the following error:
```
Error in model.frame.default(formula = formula, data = data) :
  variable lengths differ (found for 'Month')
```
Based on similar errors I found on the forum, I don't think my problem comes from the Month variable, despite what the error message says, and I don't have any NA values in `df` either. Am I writing the for loop wrong?
I would be thankful for any insight you might have. :)
Sophie
Using the first rows of your dataset as an example, both `lapply()` and `apply()` can be used to iterate over the columns. The results of the individual tests can then be combined into a single data frame with `bind_rows()`:
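```r
library(tidyverse)  # bind_rows(), select() and the %>% pipe
library(rstatix)    # kruskal_test()

Indices <- c("Observed", "Shannon", "InvSimpson", "Evenness")

# lapply() hands each column to the function as the numeric vector x,
# so x ~ Month pairs two vectors of the same length
result.kruskal <- bind_rows(
  lapply(df[Indices], FUN = function(x) kruskal_test(df, x ~ Month)),
  .id = "variable") %>%   # the list names become the "variable" column
  select(-2) %>%          # drop the .y. column, which only holds "x"
  as.data.frame()
result.kruskal
```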
```
    variable n statistic df      p         method
1   Observed 8      5.14  2 0.0766 Kruskal-Wallis
2    Shannon 8      2     2 0.368  Kruskal-Wallis
3 InvSimpson 8      3.22  2 0.2    Kruskal-Wallis
4   Evenness 8      1.44  2 0.486  Kruskal-Wallis
```
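Why does `x ~ Month` work here when `i ~ Month` failed in the loop? Inside the anonymous function, `x` is the numeric column itself, so the formula finds a vector with as many values as `Month`. A minimal base-R sketch of the same idea, swapping in `stats::kruskal.test()` for rstatix so it runs without extra packages (the names `tests` and `res` and the output column names are illustrative choices, not part of the answer above):

```r
# Example data: the first rows of the question's table
df <- read.table(text = "Observed Shannon InvSimpson Evenness Month
688 4.5538 23.365814 0.696963 February
749 4.3815 15.162467 0.661992 February
610 3.8291 11.178981 0.597054 February
665 4.2011 16.284009 0.646343 March
839 5.1855 43.198709 0.770260 March
516 3.2393 4.765211 0.518611 April
470 3.9677 11.614851 0.644873 April
539 4.2995 15.593572 0.683583 April", header = TRUE)

Indices <- c("Observed", "Shannon", "InvSimpson", "Evenness")

# x is not a column of df, so model.frame() looks it up in the formula's
# environment (the function call), where it is a full numeric column
tests <- lapply(df[Indices], function(x) kruskal.test(x ~ Month, data = df))

# Pull the statistic, degrees of freedom and p-value out of each htest object
res <- data.frame(
  variable  = names(tests),
  statistic = sapply(tests, function(t) unname(t$statistic)),
  df        = sapply(tests, function(t) unname(t$parameter)),
  p         = sapply(tests, function(t) t$p.value),
  row.names = NULL
)
res
```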
Or, equivalently, with `apply()` over the columns (`MARGIN = 2`):
```r
result.kruskal <- bind_rows(
  apply(df[Indices], 2, FUN = function(x) kruskal_test(df, x ~ Month)),
  .id = "variable") %>%
  select(-2) %>%
  as.data.frame()
result.kruskal
```
```
    variable n statistic df      p         method
1   Observed 8      5.14  2 0.0766 Kruskal-Wallis
2    Shannon 8      2     2 0.368  Kruskal-Wallis
3 InvSimpson 8      3.22  2 0.2    Kruskal-Wallis
4   Evenness 8      1.44  2 0.486  Kruskal-Wallis
```
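A note on the `apply()` version: `apply()` first converts its input to a matrix, which is why it is given `df[Indices]` rather than the whole `df`. With the character `Month` column included, the whole matrix would become character. A small check on the same example data, with a base-R `kruskal.test(x, g)` call swapped in for rstatix (the name `p_values` is illustrative):

```r
df <- read.table(text = "Observed Shannon InvSimpson Evenness Month
688 4.5538 23.365814 0.696963 February
749 4.3815 15.162467 0.661992 February
610 3.8291 11.178981 0.597054 February
665 4.2011 16.284009 0.646343 March
839 5.1855 43.198709 0.770260 March
516 3.2393 4.765211 0.518611 April
470 3.9677 11.614851 0.644873 April
539 4.2995 15.593572 0.683583 April", header = TRUE)

Indices <- c("Observed", "Shannon", "InvSimpson", "Evenness")

# Only numeric columns: the matrix stays numeric
typeof(as.matrix(df[Indices]))  # "double"
# Including the character Month column turns every cell into a string
typeof(as.matrix(df))           # "character"

# Each column arrives as a numeric vector; Month is passed as the grouping vector
p_values <- apply(df[Indices], 2, function(x) kruskal.test(x, df$Month)$p.value)
p_values
```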
Data (the first rows of the table in the question):
```r
df <- read.table(text = "Observed Shannon InvSimpson Evenness Month
688 4.5538 23.365814 0.696963 February
749 4.3815 15.162467 0.661992 February
610 3.8291 11.178981 0.597054 February
665 4.2011 16.284009 0.646343 March
839 5.1855 43.198709 0.770260 March
516 3.2393 4.765211 0.518611 April
470 3.9677 11.614851 0.644873 April
539 4.2995 15.593572 0.683583 April", header = TRUE)
```
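As for the for loop itself: inside the loop, `i` is just the string `"Observed"` (then `"Shannon"`, and so on), so `i ~ Month` pairs a single value with all the values of `Month`, hence `variable lengths differ`. One fix is to build the formula from the string with `reformulate()`. A minimal sketch, again swapping in base R's `kruskal.test()` so it runs without rstatix; passing the same formula object to rstatix's `kruskal_test()` should also work. The names `f`, `kt` and the output column names are illustrative:

```r
df <- read.table(text = "Observed Shannon InvSimpson Evenness Month
688 4.5538 23.365814 0.696963 February
749 4.3815 15.162467 0.661992 February
610 3.8291 11.178981 0.597054 February
665 4.2011 16.284009 0.646343 March
839 5.1855 43.198709 0.770260 March
516 3.2393 4.765211 0.518611 April
470 3.9677 11.614851 0.644873 April
539 4.2995 15.593572 0.683583 April", header = TRUE)

Indices <- c("Observed", "Shannon", "InvSimpson", "Evenness")

result.kruskal <- data.frame()
for (i in Indices) {
  # reformulate("Month", response = "Observed") gives Observed ~ Month,
  # so the column named by i is looked up in df instead of the string itself
  f  <- reformulate("Month", response = i)
  kt <- kruskal.test(f, data = df)
  # Append one row per index
  result.kruskal <- rbind(result.kruskal, data.frame(
    variable  = i,
    statistic = unname(kt$statistic),
    df        = unname(kt$parameter),
    p         = kt$p.value
  ))
}
result.kruskal
```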