[SOLVED] KS-test between each columns of a matrix in R

KS-test between each columns of a matrix in R

I was wondering if there is a package or a new way that can perform KS-test between each columns of a matrix more quickly than use 2 loops? Thanks!

all <- c()
for (i in (1:(ncol(a)-1))){
  for (j in ((i+1):ncol(a))){
    res <- ks.test(i,j)
    all <- rbind(all,res)
  }
}

Solution

Suppose we have a 5 column matrix, each containing 10 draws from a different normal distribution:

set.seed(1)

a <- sapply(seq(1, 3, 0.5), function(x) rnorm(10, x))

a
#>            [,1]       [,2]      [,3]     [,4]     [,5]
#>  [1,] 0.3735462  3.0117812 2.9189774 3.858680 2.835476
#>  [2,] 1.1836433  1.8898432 2.7821363 2.397212 2.746638
#>  [3,] 0.1643714  0.8787594 2.0745650 2.887672 3.696963
#>  [4,] 2.5952808 -0.7146999 0.0106483 2.446195 3.556663
#>  [5,] 1.3295078  2.6249309 2.6198257 1.122940 2.311244
#>  [6,] 0.1795316  1.4550664 1.9438713 2.085005 2.292505
#>  [7,] 1.4874291  1.4838097 1.8442045 2.105710 3.364582
#>  [8,] 1.7383247  2.4438362 0.5292476 2.440687 3.768533
#>  [9,] 1.5757814  2.3212212 1.5218499 3.600025 2.887654
#> [10,] 0.6946116  2.0939013 2.4179416 3.263176 3.881108

We can get all unique combinations of 2 columns using combn and apply ks.test to these pairs of columns, retreiving all the p values in a single vector like this:

p <- apply(combn(ncol(a), 2), 2, function(x) ks.test(a[,x[1]], a[, x[2]])$p.val)

p
#>  [1] 0.1678213427 0.0524475524 0.0020567668 0.0002165018 0.9944575548
#>  [6] 0.4175236528 0.0123406006 0.1678213427 0.0524475524 0.4175236528

To make it clearer which comparison these p values refer to, we can store the results in a 5 x 5 matrix that will be the same as the result of your loop. Note though, that we have only had to run ks.test 10 times instead of 25 because the diagonal is already known to be p = 1, and the matrix is symmetrical:

m <- matrix(0, ncol(a), ncol(a))

m[lower.tri(m)] <- round(p, 3)
m <- t(m)
m[lower.tri(m)] <- round(p, 3)
diag(m) <- 1

m
#>       [,1]  [,2]  [,3]  [,4]  [,5]
#> [1,] 1.000 0.168 0.052 0.002 0.000
#> [2,] 0.168 1.000 0.994 0.418 0.012
#> [3,] 0.052 0.994 1.000 0.168 0.052
#> [4,] 0.002 0.418 0.168 1.000 0.418
#> [5,] 0.000 0.012 0.052 0.418 1.000

Alternatively, you could give p the names of the columns compared:

names(p) <- apply(combn(ncol(a), 2), 2, paste, collapse = "-")

p
#>          1-2          1-3          1-4          1-5          2-3 
#> 0.1678213427 0.0524475524 0.0020567668 0.0002165018 0.9944575548 
#>          2-4          2-5          3-4          3-5          4-5 
#> 0.4175236528 0.0123406006 0.1678213427 0.0524475524 0.4175236528

^{Created on 2022-09-10 with reprex v2.0.2}