I have data points, each assigned to one cluster, and I am trying to calculate the empirical mean and variance of each cluster. I will use N as the batch size (in other words, the number of data points), D as the dimension of each data point, and K as the number of clusters.
x is the input data, which has a size of (N, D), and enc is the one-hot encoding of the data x, with a size of (N, K). Each row of enc is a one-hot vector. For example, if the i-th row of x is in the k-th cluster, the i-th row of enc will have 1 in the k-th column and 0 in all others.
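For context, this is roughly how I construct enc from integer cluster labels (labels here is just an illustrative (N,) tensor of cluster indices; my actual encoder may differ):

import torch
import torch.nn.functional as F

N, D, K = 128, 16, 8
x = torch.randn(N, D)                           # input data, one row per point
labels = torch.randint(0, K, (N,))              # illustrative cluster index for each point
enc = F.one_hot(labels, num_classes=K).float()  # (N, K) one-hot matrix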
In my code, I have calculated the mean and the variance as follows:
def get_mean_var(x, enc):
    # n_K: number of points in each cluster, shape (K, 1)
    n_K = (torch.sum(enc, dim=0) + 1e-10)[:, None]
    # mu_e: per-cluster mean, shape (K, D)
    mu_e = enc.t() @ x / n_K
    # encoded_mu: the mean of its own cluster for every point, shape (N, D)
    encoded_mu = torch.matmul(enc, mu_e)
    # var_e: per-cluster diagonal variance, shape (K, D)
    var_e = enc.t() @ ((x - encoded_mu) ** 2) / n_K + 1e-10
    return mu_e, var_e
n_K counts the number of elements in each cluster. mu_e sums x within each cluster and then divides by the corresponding n_K. encoded_mu gives the corresponding mu_e for each row of x, which is then used to calculate var_e. (I only need the diagonal of the covariance.) I added 1e-10 to prevent division by zero.
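For reference, this is how I check the function on small random data (a minimal sketch, reusing the x, labels, and enc from the snippet above and comparing against a plain per-cluster loop):

mu_e, var_e = get_mean_var(x, enc)
for k in range(K):
    members = x[labels == k]                     # rows of x assigned to cluster k
    if len(members) > 1:
        # unbiased=False matches the 1/n_K normalization used in get_mean_var
        print(k,
              torch.allclose(mu_e[k], members.mean(dim=0), atol=1e-5),
              torch.allclose(var_e[k], members.var(dim=0, unbiased=False), atol=1e-4))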
However, while using this code, I found that var_e often contains negative values, which should not be possible as far as I know. Before this, I used the following for var_e, which also produced negative values:
var_e = enc.t() @ (x ** 2) / n_K - mu_e ** 2 + 1e-10
In both cases, I want to know where I made a mistake. Also, if there is a better way to code this, I would be very happy to hear it.
Edit:
I found that the sample variance should use n_K - 1 as the denominator, so I changed the code slightly.
def get_mean_var(x, enc):
    n_K = (torch.sum(enc, dim=0) + 1e-10)[:, None]
    mu_e = enc.t() @ x / n_K
    encoded_mu = torch.matmul(enc, mu_e)
    var_e = enc.t() @ ((x - encoded_mu) ** 2) / (n_K - 1) + 1e-10
    return mu_e, var_e
However, it is still giving me negative values and I don't see why.
I think the problem with the code was numerical instability in floating-point arithmetic. When a result should be a very small positive number, it can actually come out slightly negative. That was already a concern of mine, which is why I added 1e-10 to each term that could be near zero, but after inspecting the offending values I concluded that 1e-10 was not large enough to absorb the error. Therefore, I changed the code to:
def get_mean_var(x, enc):
    n_K = (torch.sum(enc, dim=0) + 1e-6)[:, None]
    mu_e = enc.t() @ x / n_K
    encoded_mu = torch.matmul(enc, mu_e)
    var_e = enc.t() @ ((x - encoded_mu) ** 2) / (n_K - 1) + 1e-6
    return mu_e, var_e
Now the code is working as I intended, with no negative values, NaNs, or anything of the sort. So, in conclusion: just a slightly larger epsilon.
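For completeness, here is a small illustration of the kind of float cancellation I mean, using the E[x^2] - mu^2 form from my first attempt (the numbers are arbitrary and the exact output depends on dtype and hardware, but the naive form can come out slightly negative even though a true variance never can):

torch.manual_seed(0)
x_demo = 1e4 + 1e-3 * torch.rand(10000)                  # float32: large mean, tiny spread
naive_var = (x_demo ** 2).mean() - x_demo.mean() ** 2    # E[x^2] - mu^2, cancellation-prone
direct_var = ((x_demo - x_demo.mean()) ** 2).mean()      # E[(x - mu)^2], much more stable
print(naive_var.item(), direct_var.item())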