machine-learning pytorch loss-function

Why is KL divergence negative in PyTorch?


I'm trying to compute the KL divergence between two distributions using PyTorch, but the output is often negative, which shouldn't be possible:

import torch 
import torch.nn.functional as F

x_axis_kl_div_values = []
for epoch in range(200):
    # each epoch generates 2 different distributions
    input_1 = torch.empty(10).normal_(mean=torch.randint(1,50,(1,)).item(),std=0.5).unsqueeze(0)
    input_2 = torch.empty(10).normal_(mean=torch.randint(1,50,(1,)).item(),std=0.5).unsqueeze(0)

    kl_divergence = F.kl_div(input_1.log(), input_2, reduction='batchmean')
    x_axis_kl_div_values.append(kl_divergence.item())

x_axis_kl_div_values 
>>> 
[324.4713134765625,
 -69.10758972167969,
 -92.42606353759766,

A thread on the PyTorch forum mentions that the problem there was that the inputs were not proper distributions, which I don't think applies to my code, since I'm drawing from a normal distribution. In an SO thread, the problem seemed to be that nn.KLDivLoss expects the input to be log-probabilities, but I already do that in my code. So I'm not sure what I'm missing.


Solution

  • normal_ fills the tensor with values drawn from a normal distribution, but that doesn't mean the resulting tensor represents a normal distribution, or even a valid probability distribution.

    E.g. [0.2, 0.2, 0.2, 0.2, 0.2] is a valid uniform distribution over 5 outcomes: every entry is non-negative and the entries sum to 1. torch.empty(10).uniform_(), by contrast, gives 10 random samples between 0 and 1, not a tensor that represents a uniform distribution.

    For computing the KL divergence, each value in the tensor should be the probability of that index occurring, not merely a sample drawn from the distribution (as in your example).

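    In your code, you could normalize each tensor so that its values sum to 1:

    F.kl_div((input_1 / input_1.sum()).log(), input_2 / input_2.sum(), reduction='batchmean')

    which gives a non-negative result, as long as every value is positive (true for almost all samples here, since the means are at least 1 and the standard deviation is 0.5).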
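
To make the point concrete, here is a minimal sketch of the normalization described above, rerunning the loop from the question. The helper name to_probabilities and the clamp to a small positive value are assumptions added for illustration; the clamp guards against the occasional non-positive sample (possible when the mean is 1), whose logarithm would otherwise be undefined. Note that F.kl_div(input, target) expects input as log-probabilities and target as probabilities, and computes KL(target || input).

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)  # fixed seed so repeated runs draw the same samples


def to_probabilities(samples: torch.Tensor) -> torch.Tensor:
    """Hypothetical helper: turn positive values into a distribution per row."""
    # Assumption: clamp to a tiny positive value so that log() stays finite
    # even if a sample happens to be zero or negative.
    positive = samples.clamp_min(1e-8)
    return positive / positive.sum(dim=-1, keepdim=True)


kl_values = []
for _ in range(200):
    mean_1 = torch.randint(1, 50, (1,)).item()
    mean_2 = torch.randint(1, 50, (1,)).item()
    raw_1 = torch.empty(10).normal_(mean=mean_1, std=0.5).unsqueeze(0)
    raw_2 = torch.empty(10).normal_(mean=mean_2, std=0.5).unsqueeze(0)

    p_1 = to_probabilities(raw_1)  # each row now sums to 1
    p_2 = to_probabilities(raw_2)

    # input must be log-probabilities, target plain probabilities
    kl = F.kl_div(p_1.log(), p_2, reduction='batchmean')
    kl_values.append(kl.item())

# Smallest value over all epochs; it should not be meaningfully below zero.
print(min(kl_values))
```

Dividing by the sum is only one way to obtain a valid distribution. Applying softmax to the raw values is another common choice, but it produces different probabilities, so the resulting KL values are not comparable between the two approaches.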