If I'm using an optimizer that uses momentum (e.g. AdamOptimizer) and I have a graph which splits at the end into two values that I'm trying to minimize simultaneously, I can call compute_gradients twice, once for each value. This produces two separate lists of gradients. If I simply concatenate the two lists into one long list and call apply_gradients on the combined list, what happens in terms of the momentum?

The same variable may be updated twice with two opposing values. Do the TensorFlow optimizers take this into account and place the momentum in an appropriate middle ground? Or does the optimizer treat the two gradients as two separate gradient update calls, each affecting the momentum (possibly leading to problems, since one of them may always be favored because it is applied last)? And if that's the case, how should I go about combining the gradients manually before applying them? A sketch of the setup I'm describing follows.
You can use a "joint loss" to train the network.
Suppose you have two tensors, loss1 and loss2. You can simply add them and run the optimizer on the combined loss, e.g. AdamOptimizer(...).minimize(loss1 + loss2).
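A minimal sketch of this joint-loss approach, again with the TF1-style tf.train API and the same illustrative losses (the loss definitions and learning rate are assumptions, not part of the original answer):

    import tensorflow as tf

    x = tf.Variable([1.0, 2.0])
    loss1 = tf.reduce_sum(tf.square(x - 1.0))
    loss2 = tf.reduce_sum(tf.square(x + 1.0))

    # Adding the losses yields a single scalar, so the optimizer computes one
    # gradient per variable (the sum of the per-loss gradients) and updates
    # its Adam/momentum statistics exactly once per step.
    joint_loss = loss1 + loss2
    train_op = tf.train.AdamOptimizer(learning_rate=0.01).minimize(joint_loss)

    with tf.Session() as sess:
        sess.run(tf.global_variables_initializer())
        for _ in range(100):
            sess.run(train_op)

Because gradients are linear, the gradient of loss1 + loss2 with respect to each variable equals the sum of the two individual gradients, so this is equivalent to combining the gradients manually before a single apply_gradients call.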