Tags: r, dataframe, dplyr, multiple-columns, pairwise.wilcox.test

Wilcox.test error when running multiple times due to NAs (using dplyr in R)


I am using the example data.frame and code below. I would like to run a Wilcoxon test between two groups for each column, but when a column has no data in one of the groups, the whole call fails with an error. How can I skip the columns that do not have data in both groups and still run the test on the others?

 library(dplyr)
 df <- data.frame(group=c(rep(0,10),rep(1,10)),
      apple = as.numeric(c(runif(20, -1, 18))),
      pear = as.numeric(c(rep("NA",12), runif(8, 2, 7))),
      banana = as.numeric(c(runif(10, 1, 3), runif(10, 2.5, 6))),
      cherry = as.numeric(c(runif(9, 5, 12), rep("NA",11))))
 df_new <- df %>% summarise(across(!group, ~wilcox.test(.x ~ group)$p.value), exact=NULL) %>%
         bind_rows(., p.adjust(., method = 'BH')) %>%
         bind_rows(df, .) %>%
         mutate(group=replace(group, is.na(group), c('p.values', 'adjusted_p.values')))
 # Error in `summarise()`:
 # ! Problem while computing `..1 = across(!group, ~wilcox.test(.x ~ group)$p.value)`.
 # Caused by error in `across()`:
 # ! Problem while computing column `pear`.
 # Caused by error in `wilcox.test.formula()`:
 # ! grouping factor must have exactly 2 levels
 # Run `rlang::last_error()` to see where the error occurred.

Solution

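  • As noted in the question linked in the comments, you can wrap the test in try() or tryCatch() inside the function passed to across() to catch the error silently. The function returns the p-value when the test succeeds and returns nothing (NULL) when it fails, so that column is left out of the summary and the remaining variables are still tested. The skipped columns show NA in the final two rows because bind_rows() fills missing columns with NA.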

    df_new <- df %>%
      summarise(
        across(
          !group, 
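          function(x) {
            # try() catches the error raised when a column has data in only one group
            out <- try(wilcox.test(x ~ group, exact = NULL)$p.value, silent = TRUE)
            # Return the p-value on success; on failure fall through and return NULL
            if (!inherits(out, "try-error")) {
              return(out)
            }
          }
        )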
      ) %>%
      bind_rows(., p.adjust(., method = "BH")) %>%
      bind_rows(df, .) %>%
      mutate(group = replace(group, is.na(group), c("p.values", "adjusted_p.values")))
    
    df_new
    #>                group      apple     pear       banana    cherry
    #> 1                  0  7.8559712       NA 2.085272e+00  8.606734
    #> 2                  0 10.3555136       NA 1.021946e+00  9.759360
    #> 3                  0 10.9658917       NA 2.839778e+00  7.334473
    #> 4                  0  1.5944729       NA 2.371802e+00  7.262835
    #> 5                  0  7.6975703       NA 1.805847e+00  7.527016
    #> 6                  0 14.3173630       NA 2.029271e+00 10.610466
    #> 7                  0  6.0701846       NA 1.681383e+00  6.397823
    #> 8                  0 14.9290293       NA 1.668531e+00 10.541239
    #> 9                  0 11.0102237       NA 2.202353e+00  8.274926
    #> 10                 0  6.6343644       NA 2.613700e+00        NA
    #> 11                 1 10.9747881       NA 3.660303e+00        NA
    #> 12                 1 14.5043713       NA 3.064596e+00        NA
    #> 13                 1  5.3523349 4.165088 3.441590e+00        NA
    #> 14                 1 12.9285923 5.586389 3.768061e+00        NA
    #> 15                 1  9.0848274 4.119166 3.706076e+00        NA
    #> 16                 1  1.3605938 2.721709 3.894040e+00        NA
    #> 17                 1  1.8937699 6.235140 2.826658e+00        NA
    #> 18                 1 15.5570770 3.414619 2.702846e+00        NA
    #> 19                 1  0.6901305 6.319984 3.219524e+00        NA
    #> 20                 1 15.1907810 4.879057 4.147871e+00        NA
    #> 21          p.values  0.9705125       NA 4.330035e-05        NA
    #> 22 adjusted_p.values  0.9705125       NA 8.660071e-05        NA
    

    Created on 2022-08-11 by the reprex package (v2.0.1)
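
A variation on the same idea, offered as a minimal sketch rather than the definitive approach: use tryCatch() and return NA_real_ instead of NULL, so that skipped columns stay in the summary with an explicit NA. The helper name safe_wilcox_p is hypothetical, and set.seed(1) is an assumption added only to make the random example data reproducible. p.adjust() ignores NA values when it counts the number of comparisons, so the adjustment should still be based only on the tests that actually ran.

```r
library(dplyr)

set.seed(1)  # assumption: fixed seed, only for reproducibility
df <- data.frame(
  group  = rep(c(0, 1), each = 10),
  apple  = runif(20, -1, 18),
  pear   = c(rep(NA_real_, 12), runif(8, 2, 7)),   # no data in group 0
  banana = c(runif(10, 1, 3), runif(10, 2.5, 6)),
  cherry = c(runif(9, 5, 12), rep(NA_real_, 11))   # no data in group 1
)

# Hypothetical helper: p-value of the two-group Wilcoxon test,
# or NA_real_ if the test cannot be run (e.g. only one group has data).
safe_wilcox_p <- function(x, g) {
  tryCatch(
    wilcox.test(x ~ g)$p.value,
    error = function(e) NA_real_
  )
}

# One row of raw p-values; untestable columns are kept as NA
p_values <- df %>%
  summarise(across(!group, ~ safe_wilcox_p(.x, group)))

# BH adjustment on a named vector; NA entries stay NA
p_adjusted <- p.adjust(unlist(p_values), method = "BH")

# Two-row summary table with a label column in front
result <- bind_rows(p_values, as.data.frame(as.list(p_adjusted))) %>%
  mutate(group = c("p.values", "adjusted_p.values"), .before = 1)

result
```

Keeping the summary separate from the raw data also avoids turning the numeric group column into character, which happens when the label rows are bound onto df.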