Tags: deep-learning, pytorch-lightning

Clarifying batch size when using multiple GPUs


After watching this video (only timestamps 00:25 to 1:20 are needed), I want to confirm which batch size must be passed to the DataLoader. The video states that the "effective batch size" = batch size × number of GPUs × number of nodes. I want to confirm whether I must pass the "effective batch size" or the "batch size" to the DataLoader.

For example, suppose I have 2 GPUs and 1 node and want a batch size of 512 (for this example's sake, say the model requires a batch size of 512; since I work on self-supervised learning, some of these models require large batch sizes).

Should my DataLoader then be set to

trainloader = DataLoader(... , batch_size = 1024)

or

trainloader = DataLoader(... , batch_size = 512)

In the first case, will Lightning automatically split each 1024-sample batch into two batches of 512 and send one to each GPU? And in the second case, will Lightning take the 512 as-is and send it to each GPU?

The Lightning Trainer for this example is trainer = pl.Trainer(..., devices=2, num_nodes=1).
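For concreteness, here is a minimal runnable sketch of the kind of setup I mean (the toy model and random dataset are just stand-ins I made up for this question; my real models are self-supervised and need the large batches):

import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset
import pytorch_lightning as pl

# Toy stand-ins so the example runs; the real model/dataset are much larger.
class ToyModel(pl.LightningModule):
    def __init__(self):
        super().__init__()
        self.layer = nn.Linear(32, 1)

    def training_step(self, batch, batch_idx):
        x, y = batch
        return nn.functional.mse_loss(self.layer(x), y)

    def configure_optimizers(self):
        return torch.optim.SGD(self.parameters(), lr=0.01)

train_dataset = TensorDataset(torch.randn(4096, 32), torch.randn(4096, 1))

# The question: is this 512 the per-GPU batch size or the effective batch size?
trainloader = DataLoader(train_dataset, batch_size=512, shuffle=True)

trainer = pl.Trainer(accelerator="gpu", devices=2, num_nodes=1, strategy="ddp")
trainer.fit(ToyModel(), trainloader)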

And does this work the same way with multiple nodes, e.g. 2 nodes with 2 GPUs each?
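For reference, applying the formula quoted from the video to that case (the variable names are mine, and I'm assuming batch_size is the value given to the DataLoader):

# Formula stated in the video: effective batch size = batch size * no. GPUs * no. nodes
batch_size = 512       # value passed to the DataLoader
gpus_per_node = 2
num_nodes = 2
effective_batch_size = batch_size * gpus_per_node * num_nodes
print(effective_batch_size)  # 2048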

Should I pass the "effective batch size" or the "batch size" to the DataLoader?


Solution

  • It's the latter of your two options. batch_size in the DataLoader specifies how many samples EACH GPU will process at once. The "effective batch size" is the TOTAL number of samples processed across all GPUs in one forward pass. Lightning automatically handles distributing the data:

    If you set batch_size=512 with 2 GPUs: