machine-learning, neural-network, deep-learning, keras, gradient-descent

How to calculate optimal batch size?


Sometimes I run into the following problem:

OOM when allocating tensor with shape

e.g.

OOM when allocating tensor with shape (1024, 100, 160)

where 1024 is my batch size, but I don't know what the remaining dimensions are. If I reduce the batch size or the number of neurons in the model, it runs fine.

Is there a generic way to calculate the optimal batch size from the model and the available GPU memory, so that the program doesn't crash?

In short: I want the largest batch size for my model that will fit into my GPU memory without crashing the program.


Solution

  • You can estimate the largest batch size using:

    Max batch size = available GPU memory bytes / 4 / (size of tensors + trainable parameters)

    Here the divisor 4 assumes 32-bit (4-byte) floats; "size of tensors" is the number of elements in the per-sample tensors, and the trainable parameters are counted once because their storage does not grow with the batch size. A code sketch of this estimate follows below.
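    As a minimal sketch of that estimate, assuming Keras on TensorFlow 2.x: the helper name estimate_max_batch_size, the toy model, and the 4 GiB free-memory figure are all illustrative, not from the original answer. The per-sample tensor size is approximated by summing each layer's output elements, which ignores optimizer state, gradients, and framework overhead, so treat the result as a rough upper bound and back off from it in practice.

    import numpy as np
    from tensorflow import keras

    def estimate_max_batch_size(model, available_bytes, bytes_per_element=4):
        """Rough upper bound on batch size from the formula above."""
        # Parameters are stored once, independent of the batch size.
        # Note: count_params() also includes non-trainable weights,
        # which is close enough for a rough bound.
        n_params = model.count_params()

        # Per-sample "size of tensors": total elements across all layer
        # outputs (the batch axis, shape[0], is excluded; None dims skipped).
        n_activations = 0
        for layer in model.layers:
            dims = [d for d in layer.output.shape[1:] if d is not None]
            n_activations += int(np.prod(dims))

        # max batch = memory bytes / bytes per float / (tensors + parameters)
        return available_bytes // (bytes_per_element * (n_activations + n_params))

    # Illustrative usage with a toy model and an assumed 4 GiB of free GPU memory:
    model = keras.Sequential([
        keras.layers.Dense(100, activation="relu", input_shape=(160,)),
        keras.layers.Dense(10),
    ])
    print(estimate_max_batch_size(model, available_bytes=4 * 1024**3))

    In practice it is common to take such an estimate only as a starting point and then probe downward (e.g. halving the batch size) until training no longer raises an OOM error.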