I don't understand whether MirroredStrategy has any impact on the training outcome.
By that, I mean: Is the model trained on a single device the same as a model trained on multiple devices?
I think it should be the same model, because it's just a distributed calculation of the gradients, isn't it?
Yes, the model trained on a single GPU and on multiple GPUs (on a single machine) is the same, given the same global batch size. As per the documentation, the variables in the model are replicated and kept in sync on all GPUs: each replica computes gradients on its share of the batch, the gradients are reduced (summed) across replicas, and the same resulting update is applied to every copy of the variables.
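To illustrate, here is a minimal sketch of the usual MirroredStrategy setup. The toy model and the random data are hypothetical, purely to make the example runnable; the relevant part is that variables created inside `strategy.scope()` are mirrored across GPUs:

```python
import tensorflow as tf

# MirroredStrategy detects the visible GPUs on this machine.
strategy = tf.distribute.MirroredStrategy()
print("Number of replicas:", strategy.num_replicas_in_sync)

# Variables created inside strategy.scope() are mirrored: one copy
# per GPU, kept in sync by all-reducing the gradients each step.
with strategy.scope():
    model = tf.keras.Sequential([
        tf.keras.Input(shape=(10,)),
        tf.keras.layers.Dense(64, activation="relu"),
        tf.keras.layers.Dense(1),
    ])
    model.compile(optimizer="adam", loss="mse")

# Hypothetical random data, just so the example runs end to end.
x = tf.random.normal((256, 10))
y = tf.random.normal((256, 1))

# Keras splits each global batch of 32 across the replicas; the
# summed gradients yield one identical update on every variable copy.
model.fit(x, y, batch_size=32, epochs=1)
```

Note that `batch_size` here is the global batch size: with 4 GPUs, each replica sees 8 examples per step. Keeping the global batch size fixed is what makes the single-GPU and multi-GPU runs equivalent; if you instead keep the per-replica batch size and add GPUs, the effective batch size grows and the training trajectory can differ.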