I implemented my PyTorch model with DataParallel for multi-GPU training, but the model doesn't consistently output the right dimension. In the training loop, it produced the correct output dimension for the first two batches, then failed on the third batch, causing an error when calculating the loss:
I also tried the solution from this post, but it didn't help.
It seems like you are left with only one sample in the last batch. Try setting drop_last=True in your DataLoader: this will discard the last "not-full" batch.
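
For illustration, here is a minimal sketch of that change. The dataset, batch size, and shapes are hypothetical placeholders, not taken from the original post:

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Hypothetical dataset: 101 samples, so a batch size of 10 would
# otherwise leave a final batch containing a single sample.
dataset = TensorDataset(torch.randn(101, 8), torch.randint(0, 2, (101,)))

# drop_last=True discards that final "not-full" batch entirely.
loader = DataLoader(dataset, batch_size=10, shuffle=True, drop_last=True)

for inputs, targets in loader:
    # Every batch now has exactly 10 samples.
    assert inputs.shape[0] == 10
```

With drop_last=True, the undersized final batch that appears to be tripping up your loss calculation never reaches the model at all.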