rfor-looppurrriris-dataset

Using pvalue from list of anova results as condition


I want run a series of kruskal.tests, followed by a dunn_test where the Kruskal was significant. Then print the results of the significant dunn_tests.

library(rstatix)
library(purrr)

iris<- iris

for(i in 1:4){
  a<- colnames(iris)
  anova<- map(names(iris)[1:4], ~ kruskal.test(reformulate('Species', response=.x), data=iris))
  post<- dunn_test(iris, as.formula(paste(a[i], a[5], sep="~")), p.adjust.method= "bonferroni")
  
  if(any(post$p.adj < 0.05 & anova[[i]]$p.value < 0.05)){
    print(a)
    print(setNames(post, a))
  }
}

I'm wondering how I can print only the significant dunntests when the kruskal test of the same variable was significant.

I realize this wouldn't give me the result I'm looking for as the post and anova items are two separate objects, this was just my original line of thinking. Using anova[[i]]$p.valuealso results in a subscript out of bounds error.


Solution

  • Here is a way with a function to just get the post-hoc results you want. It works similar to a loop. I see you are familiar with as.formula and I think that’s the best approach here. I’ve used the same formula for the Kruskal-Wallis test and the Dunn test. The function just uses the column names you want to test and creates the formula from that. If you want to return all post-hoc tests if any one comparison is significant, any works with a small change. The output want captures all the post-hoc tests with at least one significant pairwise comparison.

    As an aside from your code, it’s not great practice to create object names that have the same name as functions in R. anova is a base function in R, so you wouldn’t want to then have your result named anova as well. You also create the object a over and over in your loop. It’s the same task, so it should be outside your loop. It’s not a huge deal, but if you start working with larger datasets it can add up.

    library(rstatix)
    #> 
    #> Attaching package: 'rstatix'
    #> The following object is masked from 'package:stats':
    #> 
    #>     filter
    
    iris <- iris
    
    a <- colnames(iris)
    a <- a[1:4]
    
    myfunc <- function(y, x){
      form <- as.formula(paste0(y, " ~ ", x))
      res <- rstatix::kruskal_test(form, data = iris)
      
      if (res$p < 0.05){
         post <- rstatix::dunn_test(data = iris, form, p.adjust.method= "bonferroni")
         
         if(any(post$p.adj < 0.05)) return(post)
      }
    }
    
    want <- lapply(a, FUN = myfunc, x = "Species")
    want <- do.call(rbind, want)
    

    If you only want the specific pairwise comparisons that were significant, because the output from dunn_test is a data.frame, you can just subset it by the adjusted p-values that are less than 0.05 and return that.

    myfunc <- function(y, x){
      form <- as.formula(paste0(y, " ~ ", x))
      res <- rstatix::kruskal_test(form, data = iris)
      
      if (res$p < 0.05){
         post <- rstatix::dunn_test(data = iris, form, p.adjust.method= "bonferroni")
         post <- post[post$p.adj < 0.05 ,]
         return(post)
      }
    }
    

    As a loop, this would be:

    res <- list()
    post <- list()
    post_sig <- list()
    for(i in 1:4){
      form <- as.formula(paste0(a[i], " ~ Species"))
      res[[i]] <- rstatix::kruskal_test(form, data = iris)
      post[[i]] <- rstatix::dunn_test(data = iris, form, p.adjust.method= "bonferroni")
      
      if (res[[i]]$p < 0.05 & any(post[[i]]$p.adj < 0.05)){
           print(post[[i]])
           ## could be blanks in the list here as the number i is used
           ## e.g. if var 2 was not significant, post_sig[[2]] is blank
           post_sig[[i]] <- post[[i]]
      }
    }
    
    ## if you want results as a dataframe
    res <- do.call(rbind, res)
    post <- do.call(rbind, post)
    post_sig <- do.call(rbind, post_sig)