First, let's get this straight: I'm not a mathematician or statistician.
To the problem. I want to compute a variance-covariance matrix for a set of values. The values are stored in a matrix such as:
M <- matrix(c(1,2,3,4,
2,4,6,8,
4,8,12,16),3,4, byrow = TRUE)
The reason for calculating a variance-covariance matrix is that I want to use mvrnorm() to generate simulated multivariate data. In my real problem I use cov() to do that, which worked fine. But, when I really wanted to know how a variance-covariance matrix is calculated I stumbled on its definition, which realised in R should look something like:
covM <- (t(M) %*% M)*(1/(3-1))
where 3 is the number of obs (rows) in M.
However, (t(M) %% M)(1/(3-1)) doesn't produce the same result for me. For the real problem, it produces almost similar as cov(M), but not exactly the same.
What am I doing or understanding wrong?
You have to center the columns of the matrix, that is to say, to subtract the column mean of each column. A possibility is:
X = t(t(M) - colMeans(M))
(t(X) %*% X)*(1/(3-1))
which gives the same as cov(M)
.
Cleaner:
X <- sweep(M, 2, colMeans(M), FUN = `-`)