I'm using batch normalization with batch size 10 for face detection.
Does batch normalization work with such small batch sizes? If not, what else can I use for normalization?
Yes, it works for smaller sizes; it will keep working even with very small batch sizes, although the batch statistics get noisier as the batch shrinks.
The trick is that the batch size also adds to the regularization effect, not only the batch norm layer. I will show you a few plots:
We track the batch loss on the same scale in both plots. The left-hand side is a model without the batch norm layer (black), the right-hand side is a model with the batch norm layer. Note how the regularization effect is evident even for bs=10.

When we set bs=64, the batch loss regularization is very evident. Note the y scale is always [0, 4].
My examination was purely on `nn.BatchNorm1d(10, affine=False)`, i.e. without the learnable parameters gamma and beta (the weight w and bias b).
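As a minimal sketch of what such an experiment can look like (the model, data, and training loop below are assumptions for illustration, not the exact setup behind the plots), you can compare the per-batch loss of two small regressors, one with `nn.BatchNorm1d(10, affine=False)` and one without:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Toy data: 10 input features, scalar regression target (illustrative only).
X = torch.randn(640, 10)
y = X.sum(dim=1, keepdim=True) + 0.1 * torch.randn(640, 1)

def make_model(with_bn: bool) -> nn.Sequential:
    layers = [nn.Linear(10, 10)]
    if with_bn:
        # Batch norm without learnable gamma/beta, as in the answer.
        layers.append(nn.BatchNorm1d(10, affine=False))
    layers += [nn.ReLU(), nn.Linear(10, 1)]
    return nn.Sequential(*layers)

def train(model: nn.Module, batch_size: int) -> list[float]:
    opt = torch.optim.SGD(model.parameters(), lr=0.01)
    loss_fn = nn.MSELoss()
    batch_losses = []
    for epoch in range(10):
        for i in range(0, len(X), batch_size):
            xb, yb = X[i:i + batch_size], y[i:i + batch_size]
            opt.zero_grad()
            loss = loss_fn(model(xb), yb)
            loss.backward()
            opt.step()
            batch_losses.append(loss.item())  # track the per-batch loss
    return batch_losses

# Compare bs=10 with and without the batch norm layer.
losses_plain = train(make_model(with_bn=False), batch_size=10)
losses_bn = train(make_model(with_bn=True), batch_size=10)
```

Plotting `losses_plain` and `losses_bn` on the same [0, 4] y scale gives curves analogous to the ones described above; repeating the runs with `batch_size=64` shows the stronger smoothing effect.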
This is why, even when you have a low batch size, it makes sense to use the BatchNorm layer.
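For the face-detection setting from the question (the block below is an assumed architecture fragment, not something from the original post), the same idea carries over to the 2D variant, e.g. keeping `nn.BatchNorm2d` inside a convolutional block trained with batch size 10:

```python
import torch
import torch.nn as nn

# Hypothetical convolutional block for a face detector; layer sizes are illustrative.
block = nn.Sequential(
    nn.Conv2d(3, 16, kernel_size=3, padding=1),
    nn.BatchNorm2d(16),   # batch statistics are computed from the 10 images in the batch
    nn.ReLU(),
)

images = torch.randn(10, 3, 64, 64)  # batch size 10, as in the question
features = block(images)
print(features.shape)  # torch.Size([10, 16, 64, 64])
```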