I have two large data frames (around 19000 rows and 71 columns each), as follows:

df1

| | sample1 | sample2 | sample3 |
|---|---|---|---|
| gene1 | 5 | 10 | 15 |
| gene2 | 2 | 8 | 10 |
| gene3 | 3 | 9 | 10 |

df2

| | sample1 | sample2 | sample3 |
|---|---|---|---|
| gene1 | 40 | 50 | 65 |
| gene2 | 12 | 18 | 0 |
| gene3 | 31 | 19 | 10 |

I am trying to perform a paired Wilcoxon test on the rows with the same index (row i of df1 against row i of df2), but the code takes forever on Google Colab. My code so far:

```r
wilc_results <- c()
for (x in 1:nrow(df1)) {
  for (y in 1:nrow(df2)) {
    # paired test of row y of df2 against row x of df1
    result <- wilcox.test(as.numeric(df2[y, ]), as.numeric(df1[x, ]),
                          alternative = 'two.sided', paired = TRUE)
    wilc_results[length(wilc_results) + 1] <- result$p.value
  }
}
```

Is there a much faster way to get the desired output?
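To make the example reproducible, the two tables above can be rebuilt in R as below. This assumes the gene names are row names rather than a separate column. The snippet also checks what `paired = TRUE` actually does: `wilcox.test` then runs the Wilcoxon signed-rank test on the paired differences, so it should give the same p-value as a one-sample test on `x - y`.

```r
# Example tables from the question; gene names stored as row names
# (an assumption: the real data may keep them in a separate column)
df1 <- data.frame(sample1 = c(5, 2, 3), sample2 = c(10, 8, 9),
                  sample3 = c(15, 10, 10),
                  row.names = c("gene1", "gene2", "gene3"))
df2 <- data.frame(sample1 = c(40, 12, 31), sample2 = c(50, 18, 19),
                  sample3 = c(65, 0, 10),
                  row.names = c("gene1", "gene2", "gene3"))

x <- as.numeric(df1["gene1", ])
y <- as.numeric(df2["gene1", ])

# paired = TRUE: signed-rank test on the paired differences
p_paired <- wilcox.test(x, y, paired = TRUE)$p.value
# the same test written explicitly as a one-sample test on x - y
p_diff <- wilcox.test(x - y)$p.value
print(c(paired = p_paired, differences = p_diff))
```

For gene1 both p-values are 0.25. With only three samples per row, 0.25 is the smallest two-sided p-value the exact test can produce, so the toy tables cannot show a significant difference on their own.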
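There is no need to loop twice. You only want to compare row i of df1 with row i of df2, but the nested loop tests every row of df1 against every row of df2, which is 19000 × 19000 ≈ 361 million tests instead of 19000. A single loop over the row index is enough (the paired test also needs both data frames to have the same number of columns, which they do). It runs in about 10 seconds on a similarly sized dataset on my computer. Note that with `paired = TRUE`, `wilcox.test` performs the Wilcoxon signed-rank test, not the rank-sum test; use `paired = FALSE` if you really want the rank-sum test.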
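
```r
# one paired test per gene: row i of df1 against row i of df2
wilc_results <- numeric(nrow(df1))  # preallocate instead of growing a list
for (i in seq_len(nrow(df1))) {
  result <- wilcox.test(as.numeric(df1[i, ]), as.numeric(df2[i, ]),
                        alternative = 'two.sided', paired = TRUE)
  wilc_results[i] <- result$p.value
}
names(wilc_results) <- rownames(df1)
```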
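
If that is still too slow, a common further speed-up is to convert both data frames to numeric matrices once, since taking a row of a matrix is much cheaper than taking a row of a data frame, and to collect the p-values with `vapply`. Below is a minimal sketch, assuming all columns are numeric and gene names are row names; `wilcox_rows` is a hypothetical helper name, and the random data only stands in for the real 19000 × 71 tables. How much faster it is on your data is something to measure rather than assume.

```r
# Hypothetical helper: paired Wilcoxon (signed-rank) p-value for each pair of
# matching rows in two numeric tables with the same shape.
wilcox_rows <- function(a, b) {
  m1 <- as.matrix(a)  # assumes every column is numeric
  m2 <- as.matrix(b)
  stopifnot(identical(dim(m1), dim(m2)))
  p <- vapply(seq_len(nrow(m1)), function(i) {
    # suppressWarnings hides the "cannot compute exact p-value with
    # ties/zeroes" warnings; wilcox.test then uses the normal approximation
    suppressWarnings(
      wilcox.test(m1[i, ], m2[i, ], paired = TRUE)$p.value
    )
  }, numeric(1))
  names(p) <- rownames(m1)
  p
}

# Stand-in data (random, not the real tables): 1000 genes x 71 samples
set.seed(42)
n_genes <- 1000
n_samples <- 71
dims <- list(paste0("gene", seq_len(n_genes)),
             paste0("sample", seq_len(n_samples)))
df1 <- as.data.frame(matrix(rnorm(n_genes * n_samples), n_genes,
                            dimnames = dims))
df2 <- as.data.frame(matrix(rnorm(n_genes * n_samples), n_genes,
                            dimnames = dims))

wilc_results <- wilcox_rows(df1, df2)
head(wilc_results)
```

The result is a named numeric vector with one p-value per gene, which is easier to sort or filter than a list.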