r large-data kruskal-wallis

Running many Kruskal-Wallis tests with lapply takes too long. Is there a faster solution?


I have a data frame named KWR with 90 observations and 124306 variables, all numeric. I want to run a Kruskal-Wallis test on every column, comparing the groups. After my variables I added a column named "Group" that holds the group of each observation. To check that this works, I tested one peptide (named X2461) with this code:

kruskal.test(X2461 ~ Group, data = KWR)

This worked fine and returned a result instantly. However, I need to analyze all the variables. I used lapply, following this post: How to loop Bartlett test and Kruskal tests for multiple columns in a dataframe?

cols <- names(KWR)[1:124306]
allKWR <- lapply(cols, function(x) kruskal.test(reformulate("Group", x), data = KWR))

However, after 2 hours of R running nonstop, I killed the job. Is there a more efficient way of doing this?

Thanks in advance.

NB: first time poster, beginner in R


Solution

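  • Take a look at kruskaltests in the Rfast package, which runs the test on every column of a matrix in a single call. It takes a numeric matrix (observations in rows, variables in columns) and a numeric vector of group indicators, so for the KWR data.frame it would be something like: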

    allKWR <- Rfast::kruskaltests(as.matrix(KWR[,1:124306]), as.numeric(as.factor(KWR$Group)))
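
If installing Rfast is not an option, the same statistic can be computed for all columns at once in base R. The following is only a rough sketch, not the Rfast implementation: kw_columns is a hypothetical helper name, the demo data frame is simulated purely for illustration, and the code assumes there are no missing values. It ranks each column, sums the ranks per group with rowsum, and applies the usual tie correction before looking up the chi-squared p-value.

```r
# Hypothetical helper: Kruskal-Wallis test for every column of x at once.
# Assumes x is all numeric with no NAs and group has one entry per row.
kw_columns <- function(x, group) {
  x <- as.matrix(x)
  g <- droplevels(as.factor(group))
  n <- nrow(x)
  k <- nlevels(g)

  r <- apply(x, 2, rank)           # n x p matrix of ranks (ties get average ranks)
  rank_sums <- rowsum(r, g)        # k x p matrix: sum of ranks per group and column
  n_g <- as.vector(table(g))       # group sizes, in the same (level) order

  # H statistic per column, before tie correction
  h <- 12 / (n * (n + 1)) * colSums(rank_sums^2 / n_g) - 3 * (n + 1)

  # Tie correction: sum of (t^3 - t) over the tied value counts t in each column
  ties <- apply(x, 2, function(v) { t <- table(v); sum(t^3 - t) })
  h <- h / (1 - ties / (n^3 - n))

  data.frame(statistic = h,
             p.value = pchisq(h, df = k - 1, lower.tail = FALSE))
}

# Simulated stand-in for KWR: 90 observations, 50 variables, 3 groups
set.seed(1)
demo <- as.data.frame(matrix(rnorm(90 * 50), nrow = 90))
demo$Group <- rep(c("A", "B", "C"), each = 30)

res <- kw_columns(demo[, 1:50], demo$Group)

# Compare the first column against the built-in test
ref <- kruskal.test(V1 ~ Group, data = demo)
print(all.equal(unname(res$statistic[1]), unname(ref$statistic)))
print(all.equal(res$p.value[1], ref$p.value))
```

For the real data the call would presumably be kw_columns(KWR[, 1:124306], KWR$Group). It is worth checking a few columns against kruskal.test first, as done above for V1.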