rmatrixcovariance-matrix

Variance-covariance matrix : difference between cov(X) and t(X)X(1/n-1)


First, let's get this straight: I'm not a mathematician or statistician.

To the problem. I want to compute a variance-covariance matrix for a set of values. The values are stored in a matrix such as:

M <- matrix(c(1,2,3,4,
              2,4,6,8,
              4,8,12,16),3,4, byrow = TRUE)

The reason for calculating a variance-covariance matrix is that I want to use mvrnorm() to generate simulated multivariate data. In my real problem I use cov() to do that, which worked fine. But, when I really wanted to know how a variance-covariance matrix is calculated I stumbled on its definition, which realised in R should look something like:

covM <- (t(M) %*% M)*(1/(3-1))

where 3 is the number of obs (rows) in M.

However, (t(M) %% M)(1/(3-1)) doesn't produce the same result for me. For the real problem, it produces almost similar as cov(M), but not exactly the same.

What am I doing or understanding wrong?


Solution

  • You have to center the columns of the matrix, that is to say, to subtract the column mean of each column. A possibility is:

    X = t(t(M) - colMeans(M))
    (t(X) %*% X)*(1/(3-1))
    

    which gives the same as cov(M).

    Cleaner:

    X <- sweep(M, 2, colMeans(M), FUN = `-`)