I'm trying to compute the KL divergence between two distributions using PyTorch, but the output is often negative, which shouldn't be the case:
import torch
import torch.nn.functional as F

x_axis_kl_div_values = []
for epoch in range(200):
    # each epoch generates 2 different distributions
    input_1 = torch.empty(10).normal_(mean=torch.randint(1, 50, (1,)).item(), std=0.5).unsqueeze(0)
    input_2 = torch.empty(10).normal_(mean=torch.randint(1, 50, (1,)).item(), std=0.5).unsqueeze(0)
    kl_divergence = F.kl_div(input_1.log(), input_2, reduction='batchmean')
    x_axis_kl_div_values.append(kl_divergence.item())
x_axis_kl_div_values
>>>
[324.4713134765625,
 -69.10758972167969,
 -92.42606353759766,
 ...]
From the PyTorch forum I found this post, which mentions that their issue was that the inputs were not proper distributions; that's not the case in my code, since I'm creating a normal distribution. From this SO thread it seems their issue was that nn.KLDivLoss expects the input to be log-probabilities, but again, I did that in my code. So I'm not sure what I'm missing.
normal_ fills the tensor with values drawn from a normal distribution, but that doesn't mean the resulting tensor represents a normal distribution, or even a valid probability distribution.
E.g. [0.2, 0.2, 0.2, 0.2, 0.2] is a valid uniform distribution over five outcomes: every entry is non-negative and they sum to 1. If you had used torch.empty(10).uniform_(), you would not get a tensor that represents a uniform distribution, only ten samples drawn from one.
For computing the KL divergence, each value in the tensor should represent the probability of that index occurring, not merely a sample from said distribution (as in your example).
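To make the distinction concrete, here is a small sanity check (the mean of 25 is just an arbitrary value for illustration):

import torch

# ten draws from N(25, 0.5) -- these are samples, not probabilities
samples = torch.empty(10).normal_(mean=25, std=0.5)
print(samples.sum())   # roughly 250, nowhere near 1.0

# five values that do sum to 1 -- a valid uniform distribution
probs = torch.full((5,), 0.2)
print(probs.sum())     # approximately 1.0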
In your code, you can normalize each tensor so its values sum to 1:
F.kl_div((input_1 / input_1.sum()).log(), input_2 / input_2.sum(), reduction='batchmean')
which gives a non-negative (and in practice positive) result.
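Putting it together, a sketch of the corrected loop (same setup as in your question, with the normalization applied to both tensors):

import torch
import torch.nn.functional as F

kl_div_values = []
for epoch in range(200):
    # raw samples; all positive here, since the means lie in [1, 50) with std 0.5
    input_1 = torch.empty(10).normal_(mean=torch.randint(1, 50, (1,)).item(), std=0.5).unsqueeze(0)
    input_2 = torch.empty(10).normal_(mean=torch.randint(1, 50, (1,)).item(), std=0.5).unsqueeze(0)
    # normalize so each row sums to 1 and is a valid probability vector
    p = input_1 / input_1.sum()
    q = input_2 / input_2.sum()
    kl_div_values.append(F.kl_div(p.log(), q, reduction='batchmean').item())

As a side note: if what you actually want is the KL divergence between the two underlying normal distributions themselves (rather than between two normalized sample vectors), torch.distributions has a closed-form implementation. The loc values below are just example means:

from torch.distributions import Normal, kl_divergence

p = Normal(loc=10.0, scale=0.5)
q = Normal(loc=30.0, scale=0.5)
print(kl_divergence(p, q))   # always >= 0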