My generator always yields two images from my dataset at random, and then I calculate the loss using these two samples. Say I set `steps_per_epoch=40` and `epochs=5`: what's the difference if I instead set `steps_per_epoch=5` and `epochs=40`? (I use Adam as my optimizer.)
The `epochs` argument is the number of full passes over the whole training data. The `steps_per_epoch` argument is the number of batches (i.e. gradient updates, sometimes called iterations) drawn from the generator during one epoch. Therefore we have `steps_per_epoch = n_samples / batch_size`.
For example, if we have 1000 training samples and set the batch size to 10, then `steps_per_epoch = 1000 / 10 = 100`. The number of `epochs` can be set independently of the batch size and `steps_per_epoch`. In your specific case, both settings perform the same total number of gradient updates (40 × 5 = 5 × 40 = 200); they differ in how often end-of-epoch work (validation, callbacks, logging) runs.
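To make this concrete, here is a minimal sketch of the setup described in the question, assuming a TensorFlow/Keras model and a toy dataset; the model, the data shapes, and the `pair_generator` helper are illustrative assumptions, not taken from the question:

```python
import numpy as np
from tensorflow import keras

# Toy dataset: 1000 samples with 32 features each (shapes are assumptions).
x_train = np.random.rand(1000, 32)
y_train = np.random.rand(1000, 1)

def pair_generator(x, y):
    """Yield batches of two randomly chosen samples, indefinitely."""
    while True:
        idx = np.random.randint(0, len(x), size=2)
        yield x[idx], y[idx]

model = keras.Sequential([keras.Input(shape=(32,)), keras.layers.Dense(1)])
model.compile(optimizer="adam", loss="mse")

# Both settings below perform 200 gradient updates in total (40*5 == 5*40);
# they differ only in how often end-of-epoch work (validation, callbacks,
# logging) runs.
model.fit(pair_generator(x_train, y_train), steps_per_epoch=40, epochs=5)
# model.fit(pair_generator(x_train, y_train), steps_per_epoch=5, epochs=40)
```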
There is no single batch size that works for all scenarios. Usually, a very large batch size slows down training (i.e. it takes more time for the model to converge to a solution), and a very small batch size may not make good use of the available resources (i.e. GPU and CPU). Typical values are 32, 64, 128, 256, or 512 (powers of 2 help with faster GPU memory allocation). Also, there is an answer on SO that concerns this issue and cites relevant books and papers, and a question with answers on Cross Validated that gives a more complete definition of batch size.
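As a quick illustration of the formula above, here is a hypothetical loop printing `steps_per_epoch` for the common batch sizes just mentioned (using ceiling division so a partial final batch still counts as a step):

```python
import math

# Dataset size reused from the worked example; batch sizes are the
# typical power-of-2 values listed above.
n_samples = 1000
for batch_size in (32, 64, 128, 256, 512):
    steps = math.ceil(n_samples / batch_size)
    print(f"batch_size={batch_size:<4} -> steps_per_epoch={steps}")
```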