I am optimizing over two loss functions which take very different values. To give an example:
loss1 = 1534
loss2 = 0.723
I want to optimize loss1 + loss2. Would rescaling loss1 to values closer to those of loss2 be a good idea? I tried the naive approach of multiplying loss2 by 1000 inside the overall (summed) loss, but as loss1 goes down (to around 600 or 500, say), the scaled loss2 term becomes too large in comparison.
My idea is to find a way to keep both loss terms in the same range during the whole optimization process. What is the best way of doing this?
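To make the problem with a fixed multiplier concrete, assume (hypothetically) that loss2 stays near 0.723 while loss1 falls from 1534 to 500. With the weight of 1000, the share of the total contributed by loss2 grows from about a third to more than half:

$$
\begin{aligned}
1534 + 1000 \times 0.723 &= 1534 + 723 = 2257, &\quad \tfrac{723}{2257} &\approx 32\% \\
500 + 1000 \times 0.723 &= 500 + 723 = 1223, &\quad \tfrac{723}{1223} &\approx 59\%
\end{aligned}
$$

A constant weight cannot keep the two terms balanced once their magnitudes change at different rates.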
Perhaps you could use a min-max scaler to scale both losses to the range [0, 1], for example:
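```python
import torch.nn.functional as F
loss1 = F.mse_loss(predicted, target)
loss2 = F.poisson_nll_loss(predicted, target)  # assumes `predicted` is a log-rate
# A scalar loss has min() == max(), so track extremes across steps (init lo=inf, hi=-inf)
lo1, hi1 = min(lo1, loss1.item()), max(hi1, loss1.item())
lo2, hi2 = min(lo2, loss2.item()), max(hi2, loss2.item())
loss1_scaled = (loss1 - lo1) / ((hi1 - lo1) or 1.0)  # "or 1.0": range is 0 on step one
loss2_scaled = (loss2 - lo2) / ((hi2 - lo2) or 1.0)
total_loss = loss1_scaled + loss2_scaled
```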