machine-learning, pytorch

Difference between sum and mean when reducing the loss before calling .backward() to backpropagate through the network


I know we should convert the tensor to a scalar before applying backward(), but when should I use sum and when mean?

some_loss_function.sum().backward()
-OR-
some_loss_function.mean().backward()

Solution

  • After some research I found the difference; hope this helps you out:

    some_loss_function.sum().backward() adds up all the loss values across the batch and backpropagates from that total. Each element contributes its full, unscaled gradient, so the magnitude of the overall gradient grows with the batch size. This couples your effective learning rate to the batch size, but it can be useful when losses should count in absolute terms, for example when summing token-level losses over sequences of different lengths.

    some_loss_function.mean().backward() averages the loss values across the batch and backpropagates from that mean. The resulting gradients are exactly the sum gradients divided by the batch size N, so the update magnitude stays roughly independent of the batch size. That makes the learning rate easier to tune and is why mean is the usual default reduction (built-in losses such as nn.MSELoss use reduction='mean' unless told otherwise).
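
    Here is a minimal sketch showing the scaling relationship (the weight values are arbitrary, made up just for illustration): the gradients from sum() come out exactly batch-size times larger than the gradients from mean().

import torch

# Four "examples" with different per-example losses (weights are arbitrary).
x = torch.ones(4, requires_grad=True)
per_example_loss = (x * torch.tensor([1.0, 2.0, 3.0, 4.0])) ** 2

# sum(): backpropagate the total; each element keeps its full gradient.
per_example_loss.sum().backward(retain_graph=True)
grad_sum = x.grad.clone()
x.grad.zero_()

# mean(): backpropagate the average; every gradient is scaled by 1/N.
per_example_loss.mean().backward()
grad_mean = x.grad.clone()

print(grad_sum)              # tensor([ 2.,  8., 18., 32.])
print(grad_mean)             # tensor([0.5000, 2.0000, 4.5000, 8.0000])
print(grad_sum / grad_mean)  # tensor([4., 4., 4., 4.]) -> the batch size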