I have data points, each assigned to one cluster, and I am trying to calculate the empirical mean and variance of each cluster. I will use N as the batch size (in other words, the number of data points), D as the dimension of each data point, and K as the number of clusters.
x is the input data, which has a size of (N, D), and enc is the one-hot encoding of the data x, with a size of (N, K). Each row of enc is a one-hot vector. For example, if the i-th row of x is in the k-th cluster, the i-th row of enc will have 1 in the k-th column and 0 in all others.
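For context, this is roughly how I construct enc from integer cluster labels (labels here is just an illustrative (N,) tensor of cluster indices; my actual encoder may differ):

import torch
import torch.nn.functional as F

N, D, K = 128, 16, 8
x = torch.randn(N, D)                           # input data, one row per point
labels = torch.randint(0, K, (N,))              # illustrative cluster index for each point
enc = F.one_hot(labels, num_classes=K).float()  # (N, K) one-hot matrix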
In my code, I have calculated the mean and the variance as follows:
def get_mean_var(x, enc):
    # n_K: number of points in each cluster, shape (K, 1)
    n_K = (torch.sum(enc, dim=0) + 1e-10)[:, None]
    # mu_e: per-cluster mean, shape (K, D)
    mu_e = enc.t() @ x / n_K
    # encoded_mu: the mean of its own cluster for every point, shape (N, D)
    encoded_mu = torch.matmul(enc, mu_e)
    # var_e: per-cluster diagonal variance, shape (K, D)
    var_e = enc.t() @ ((x - encoded_mu) ** 2) / n_K + 1e-10
    return mu_e, var_e
n_K counts the number of elements in each cluster. mu_e sums x within each cluster and then divides by the corresponding n_K. encoded_mu gives the corresponding mu_e for each row of x, which is then used to calculate var_e. (I only need the diagonal of the covariance.) I added 1e-10 to prevent division by zero.
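For reference, this is how I check the function on small random data (a minimal sketch, reusing the x, labels, and enc from the snippet above and comparing against a plain per-cluster loop):

mu_e, var_e = get_mean_var(x, enc)
for k in range(K):
    members = x[labels == k]                     # rows of x assigned to cluster k
    if len(members) > 1:
        # unbiased=False matches the 1/n_K normalization used in get_mean_var
        print(k,
              torch.allclose(mu_e[k], members.mean(dim=0), atol=1e-5),
              torch.allclose(var_e[k], members.var(dim=0, unbiased=False), atol=1e-4))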
However, while using this code, I found that var_e often contains negative values, which should not be possible as far as I know. Before this, I used the following for var_e, which also produced negative values:
var_e = enc.t() @ (x ** 2) / n_K - mu_e ** 2 + 1e-10
In both cases, I want to know where I made a mistake. Also, if there is a better way to code this, I would be very happy to hear it.
Edit:
I found that the sample variance should use n_K - 1 as the denominator, so I changed the code slightly.
def get_mean_var(x, enc):
    n_K = (torch.sum(enc, dim=0) + 1e-10)[:, None]
    mu_e = enc.t() @ x / n_K
    encoded_mu = torch.matmul(enc, mu_e)
    var_e = enc.t() @ ((x - encoded_mu) ** 2) / (n_K - 1) + 1e-10
    return mu_e, var_e
However, it is still giving me negative values and I don't see why.
I think the problem with the code was numerical instability in floating-point arithmetic. When a result should be a very small positive number, it can actually come out slightly negative. That was already a concern of mine, which is why I added 1e-10 to each term that could be near zero, but after inspecting the offending values I concluded that 1e-10 was not large enough to absorb the error. Therefore, I changed the code to:
def get_mean_var(x, enc):
    n_K = (torch.sum(enc, dim=0) + 1e-6)[:, None]
    mu_e = enc.t() @ x / n_K
    encoded_mu = torch.matmul(enc, mu_e)
    var_e = enc.t() @ ((x - encoded_mu) ** 2) / (n_K - 1) + 1e-6
    return mu_e, var_e
Now the code is working as I intended, with no negative values, NaNs, or anything of the sort. So, in conclusion: just a slightly larger epsilon.
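For completeness, here is a small illustration of the kind of float cancellation I mean, using the E[x^2] - mu^2 form from my first attempt (the numbers are arbitrary and the exact output depends on dtype and hardware, but the naive form can come out slightly negative even though a true variance never can):

torch.manual_seed(0)
x_demo = 1e4 + 1e-3 * torch.rand(10000)                  # float32: large mean, tiny spread
naive_var = (x_demo ** 2).mean() - x_demo.mean() ** 2    # E[x^2] - mu^2, cancellation-prone
direct_var = ((x_demo - x_demo.mean()) ** 2).mean()      # E[(x - mu)^2], much more stable
print(naive_var.item(), direct_var.item())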