I was wondering if there is a package or a new way that can perform KS-test between each columns of a matrix more quickly than use 2 loops? Thanks!
all <- c()
for (i in (1:(ncol(a)-1))){
for (j in ((i+1):ncol(a))){
res <- ks.test(i,j)
all <- rbind(all,res)
}
}
Suppose we have a 5 column matrix, each containing 10 draws from a different normal distribution:
set.seed(1)
a <- sapply(seq(1, 3, 0.5), function(x) rnorm(10, x))
a
#> [,1] [,2] [,3] [,4] [,5]
#> [1,] 0.3735462 3.0117812 2.9189774 3.858680 2.835476
#> [2,] 1.1836433 1.8898432 2.7821363 2.397212 2.746638
#> [3,] 0.1643714 0.8787594 2.0745650 2.887672 3.696963
#> [4,] 2.5952808 -0.7146999 0.0106483 2.446195 3.556663
#> [5,] 1.3295078 2.6249309 2.6198257 1.122940 2.311244
#> [6,] 0.1795316 1.4550664 1.9438713 2.085005 2.292505
#> [7,] 1.4874291 1.4838097 1.8442045 2.105710 3.364582
#> [8,] 1.7383247 2.4438362 0.5292476 2.440687 3.768533
#> [9,] 1.5757814 2.3212212 1.5218499 3.600025 2.887654
#> [10,] 0.6946116 2.0939013 2.4179416 3.263176 3.881108
We can get all unique combinations of 2 columns using combn
and apply ks.test
to these pairs of columns, retreiving all the p values in a single vector like this:
p <- apply(combn(ncol(a), 2), 2, function(x) ks.test(a[,x[1]], a[, x[2]])$p.val)
p
#> [1] 0.1678213427 0.0524475524 0.0020567668 0.0002165018 0.9944575548
#> [6] 0.4175236528 0.0123406006 0.1678213427 0.0524475524 0.4175236528
To make it clearer which comparison these p values refer to, we can store the results in a 5 x 5 matrix that will be the same as the result of your loop. Note though, that we have only had to run ks.test
10 times instead of 25 because the diagonal is already known to be p = 1, and the matrix is symmetrical:
m <- matrix(0, ncol(a), ncol(a))
m[lower.tri(m)] <- round(p, 3)
m <- t(m)
m[lower.tri(m)] <- round(p, 3)
diag(m) <- 1
m
#> [,1] [,2] [,3] [,4] [,5]
#> [1,] 1.000 0.168 0.052 0.002 0.000
#> [2,] 0.168 1.000 0.994 0.418 0.012
#> [3,] 0.052 0.994 1.000 0.168 0.052
#> [4,] 0.002 0.418 0.168 1.000 0.418
#> [5,] 0.000 0.012 0.052 0.418 1.000
Alternatively, you could give p
the names of the columns compared:
names(p) <- apply(combn(ncol(a), 2), 2, paste, collapse = "-")
p
#> 1-2 1-3 1-4 1-5 2-3
#> 0.1678213427 0.0524475524 0.0020567668 0.0002165018 0.9944575548
#> 2-4 2-5 3-4 3-5 4-5
#> 0.4175236528 0.0123406006 0.1678213427 0.0524475524 0.4175236528
Created on 2022-09-10 with reprex v2.0.2