chainer, chainercv

How to accumulate gradients across mini-batches and then back-propagate in Chainer?


I am classifying video sequences, and I need two things:

  1. Because of limited GPU memory, I want to accumulate gradients across several mini-batches, average them, and only then apply the parameter update.

  2. I need to shuffle the order of mini-batches without shuffling the samples inside each mini-batch, because each video sequence must keep its frame order.


Solution

  • Question 1: You can run the forward and backward passes for each minibatch without calling optimizer.update(); Chainer adds newly computed gradients onto the gradients already stored in the parameters. After you have repeated forward and backward for the necessary number of minibatches, call optimizer.update() once to update the parameters based on the accumulated gradients.

    If you want to achieve this with the trainer module, I think you need to subclass StandardUpdater and override its update logic to do the above.

    Question 2: Are you using the trainer module? If so, you can define your own iterator to achieve this; see below for how to define an iterator class.