python, tensorflow, neural-network, training-data, mini-batch

Does the TensorFlow optimizer's minimize API implement mini-batch training?


Does TensorFlow's minimize API (for, say, GradientDescentOptimizer) already implement mini-batch training when we feed the input tensor with a minibatch-sized chunk of data?

I was reading this blog, which indicated that minibatching is not implemented in the minimize method: we have to call compute_gradients first, then accumulate the gradients, and finally call apply_gradients to finish the minibatch training.

import numpy as np
import tensorflow as tf

# x, y_, loss, accuracy and mnist are assumed to be defined earlier in the blog post
def train_standard(opt_type, learning_rate, image_set):
    # Arrays for logging accuracy and loss
    acc_log = np.zeros(len(image_set))
    loss_log = np.zeros(len(image_set))
    # Create optimizer
    opt = opt_type(learning_rate)
    #
    # no-minibatch (standard, simple) operation
    #
    minimize = opt.minimize(loss)
    # Create session to execute ops
    sess = tf.InteractiveSession()
    # Necessary initializations
    tf.set_random_seed(1234)
    tf.global_variables_initializer().run()
    # Train loop
    for i, batch in enumerate(image_set):
        sess.run(minimize, feed_dict={x: batch[0], y_: batch[1]})

        acc_log[i] = sess.run(accuracy,
                              feed_dict={x: mnist.test.images, y_: mnist.test.labels})
        loss_log[i] = sess.run(loss,
                               feed_dict={x: mnist.test.images, y_: mnist.test.labels})

    return acc_log, loss_log
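
For comparison, the accumulation approach the blog describes looks roughly like this (a simplified sketch, not the blog's exact code; names such as accum_grads, n_batches and minibatches are placeholders I made up, and x, y_, loss are the same as above):

opt = tf.train.GradientDescentOptimizer(learning_rate)
# Drop variables that receive no gradient from this loss
grads_and_vars = [(g, v) for g, v in opt.compute_gradients(loss) if g is not None]

# One non-trainable buffer per variable to accumulate gradients across minibatches
accum_grads = [tf.Variable(tf.zeros_like(v), trainable=False)
               for _, v in grads_and_vars]
zero_accum = [a.assign(tf.zeros_like(a)) for a in accum_grads]
accum_ops = [a.assign_add(g) for a, (g, _) in zip(accum_grads, grads_and_vars)]

# Single weight update using the averaged accumulated gradients
apply_accum = opt.apply_gradients(
    [(a / n_batches, v) for a, (_, v) in zip(accum_grads, grads_and_vars)])

sess.run(zero_accum)
for batch in minibatches:          # forward/backward passes only, no weight update yet
    sess.run(accum_ops, feed_dict={x: batch[0], y_: batch[1]})
sess.run(apply_accum)              # one update from all accumulated gradients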

However, when I ran the experiments, I found that the two approaches generate similar results. I wonder whether the minimize method already performs a minibatch update when the feed_dict contains a minibatch-sized matrix instead of just one row of training data.

Can anyone help clarify this and correct me if I am wrong?

Best regards


Solution

  • It depends on your definition of learning with minibatches. One way to do it is simply to sample a minibatch, perform a weight update (i.e., compute the forward and backward pass), sample another minibatch, perform another weight update, and so on. This is easily done with Optimizer.minimize() when you just feed one minibatch at a time; AFAIK, this is the most commonly used method (see the sketch below).

    The post you're linking to aims to do something else: compute the gradients on multiple minibatches (i.e. compute the forward and backward passes, but don't change the weights), and then perform a single weight update using all of the accumulated gradients. That is of course different, and more work to implement (as shown in the post).
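
    To make the first point concrete: minimize() is simply compute_gradients() followed by apply_gradients() in one call. If your loss is already an average over the batch dimension (e.g. built with tf.reduce_mean), then each minimize step on a minibatch-sized feed_dict is exactly one minibatch gradient-descent update. A minimal sketch (the loss definition is illustrative; logits, x, y_, image_set and learning_rate are stand-ins for the model in the question):

    # Batch-averaged loss: its gradient is the average gradient over the fed minibatch
    loss = tf.reduce_mean(
        tf.nn.softmax_cross_entropy_with_logits(labels=y_, logits=logits))

    opt = tf.train.GradientDescentOptimizer(learning_rate)
    train_op = opt.minimize(loss)       # same as compute_gradients + apply_gradients

    for batch in image_set:             # one weight update per fed minibatch
        sess.run(train_op, feed_dict={x: batch[0], y_: batch[1]})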