Tags: deep-learning, computer-science, cross-validation, fast-ai, mini-batch

What is 'mini-batch' in deep learning?


I'm taking the fast-ai course, and in "Lesson 2 - SGD" it says:

Mini-batch: a random bunch of points that you use to update your weights

And it also says that gradient descent uses mini-batches.

What is a mini-batch? What's the difference between a mini-batch and a regular batch?


Solution

  • Both are approaches to gradient descent, but in batch gradient descent you process the entire training set in one iteration, whereas in mini-batch gradient descent you process only a small subset of the training set in each iteration.

    Also compare stochastic gradient descent, where you process a single example from the training set in each iteration.

    Another way to look at it: they are all instances of the same approach to gradient descent with a batch size of m and a training set of size n. For stochastic gradient descent, m = 1. For batch gradient descent, m = n. For mini-batch, m = b with 1 < b < n, where b is typically small compared to n (see the code sketch below).

    Mini-batch adds the question of choosing the right size for b, but finding a good b can greatly improve your results.
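
    To make the relationship concrete, here is a minimal NumPy sketch (not from the course material) of gradient descent on a linear model, where the batch_size argument alone selects the variant: 1 gives stochastic gradient descent, the full training-set size gives batch gradient descent, and anything in between gives mini-batch. The function name, toy data, and hyperparameters are hypothetical and chosen only for illustration.

    ```python
    import numpy as np

    def gradient_descent(X, y, batch_size, lr=0.1, epochs=100):
        """Gradient descent on a linear model y ~ X @ w.

        The batch size m picks the variant:
          m = 1            -> stochastic gradient descent
          m = len(X)       -> (full) batch gradient descent
          1 < m < len(X)   -> mini-batch gradient descent
        """
        n, d = X.shape
        w = np.zeros(d)
        for _ in range(epochs):
            # Shuffle each epoch so every mini-batch is a random bunch of points.
            idx = np.random.permutation(n)
            for start in range(0, n, batch_size):
                batch = idx[start:start + batch_size]
                Xb, yb = X[batch], y[batch]
                # Gradient of the mean squared error, computed on this batch only.
                grad = 2 * Xb.T @ (Xb @ w - yb) / len(batch)
                w -= lr * grad
        return w

    # Toy data: y = 3*x0 - 2*x1 plus a little noise (made-up numbers).
    rng = np.random.default_rng(0)
    X = rng.normal(size=(200, 2))
    y = X @ np.array([3.0, -2.0]) + 0.01 * rng.normal(size=200)

    w_sgd  = gradient_descent(X, y, batch_size=1)       # stochastic
    w_mini = gradient_descent(X, y, batch_size=32)      # mini-batch
    w_full = gradient_descent(X, y, batch_size=len(X))  # batch
    print(w_sgd, w_mini, w_full)  # all should approach [3, -2]
    ```

    Note how the only difference between the three calls is the batch size: each iteration computes the gradient on just the selected subset and updates the weights, which is exactly what "a random bunch of points that you use to update your weights" refers to.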