Tags: apache-spark, neural-network, deep-learning, h2o, sparkling-water

H2O sparkling water - DNN mini_batch_size parameter


I'm currently running Spark 2.3.0 with Sparkling Water 2.3.1. I found the documentation of the underlying H2O library by looking at the changelog, which links to this. So apparently it uses H2O 3.18.

By looking at the DNN I noticed the lack of a batch_size parameter; instead it offers a mini_batch_size parameter, which is not actually documented. The only documentation regarding this parameter that I found is here; it refers to H2O 2.4, and I assumed it still applies to the version I'm using (I don't know whether this assumption is correct).

mini batch

The number of training data rows to be processed per iteration. Note that independent of this parameter, each row is used immediately to update the model with (online) stochastic gradient descent. The mini batch size controls the synchronization period between nodes in a distributed environment and the frequency at which scoring and model cancellation can happen. For example, if mini-batch is set to 10,000 on H2O running on 4 nodes, then each node will process 2,500 rows per iteration, sampling randomly from their local data. Then, model averaging between the nodes takes place, and scoring can happen (dependent on scoring interval and duty factor). Special values are 0 for one epoch per iteration and -1 for processing the maximum amount of data per iteration. If “replicate training data” is enabled, N epochs will be trained per iteration on N nodes, otherwise one epoch.
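The arithmetic in the quoted example can be sketched as follows. This is a toy illustration of the stated semantics, not H2O code; the mini-batch value and node count are taken from the quote above:

```java
public class MiniBatchSplit {
    public static void main(String[] args) {
        int miniBatchSize = 10_000; // the "mini batch" value from the quoted example
        int nodes = 4;              // H2O cluster size from the quoted example

        // Per the docs, each node samples this many rows from its local data
        // per iteration; model averaging (and possibly scoring) follow.
        int rowsPerNode = miniBatchSize / nodes;
        System.out.println("rows per node per iteration = " + rowsPerNode); // 2500
    }
}
```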

From this I interpret that the batch size is effectively fixed to 1, since every row immediately triggers an online gradient descent update regardless of this parameter.
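A minimal sketch of what "each row is used immediately to update the model" means, i.e. per-row (online) SGD on a single linear weight. This illustrates the general technique only; it is not H2O's implementation, and all values are made up:

```java
public class OnlineSgd {
    public static void main(String[] args) {
        double w = 0.0;                 // single model weight
        double lr = 0.1;                // learning rate
        double[] x = {1.0, 2.0, 3.0};   // toy inputs
        double[] y = {2.0, 4.0, 6.0};   // toy targets (y = 2x)

        // Online SGD: the weight is updated after every single row,
        // independent of any mini-batch bookkeeping around the loop.
        for (int i = 0; i < x.length; i++) {
            double err = w * x[i] - y[i]; // prediction error on this row
            w -= lr * err * x[i];         // immediate per-row update
        }
        // w moves toward the true slope of 2.0 after one pass
        System.out.println("w after one pass = " + w);
    }
}
```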

I also started digging into the H2O source code to find its default value; as far as I understand, the default parameters are contained in this class.

From line 1694:

// stochastic gradient descent: mini-batch size = 1
// batch gradient descent: mini-batch size = # training rows
public int _mini_batch_size = 1;

So from the comment it seems that it doesn't actually perform online gradient descent, but instead behaves like a batch size. And a value of 1 makes no sense if we assume the H2O 2.4 documentation still applies.

Furthermore, from line 2173, where it sets the user-given parameters:

if (fromParms._mini_batch_size > 1) {
    Log.warn("_mini_batch_size", "Only mini-batch size = 1 is supported right now.");
    toParms._mini_batch_size = 1;
}
Admittedly I only had a quick look at the source code and I may be missing something, but I really cannot understand how the mini_batch_size parameter works and how it relates to the batch size. Can someone shed some light on this?


Solution

  • This parameter should not actually be used by a user, and there is a ticket to hide it here. For now, please leave mini_batch_size at 1 (the default value) so that you don't hit any warnings or errors.