Tags: tensorflow, machine-learning, keras, theano, lasagne

Convert Lasagne BatchNormLayer to Keras BatchNormalization layer


I want to convert a pretrained Lasagne (Theano) model to a Keras (Tensorflow) model, so all layers need to have the exact same configuration. From both documentations it is not clear to me how the parameters correspond. Let's assume a Lasagne BatchNormLayer with default settings:

class lasagne.layers.BatchNormLayer(incoming, axes='auto', epsilon=1e-4, alpha=0.1, beta=lasagne.init.Constant(0), gamma=lasagne.init.Constant(1), mean=lasagne.init.Constant(0), inv_std=lasagne.init.Constant(1), **kwargs)

And this is the Keras BatchNormalization layer API:

keras.layers.BatchNormalization(axis=-1, momentum=0.99, epsilon=0.001, center=True, scale=True, beta_initializer='zeros', gamma_initializer='ones', moving_mean_initializer='zeros', moving_variance_initializer='ones', beta_regularizer=None, gamma_regularizer=None, beta_constraint=None, gamma_constraint=None)

Most of it is clear, so I'll provide the corresponding parameters for future reference here:

(Lasagne -> Keras)
incoming -> (not needed, automatic)
axes -> axis
epsilon -> epsilon
alpha -> ?
beta -> beta_initializer
gamma -> gamma_initializer
mean -> moving_mean_initializer
inv_std -> moving_variance_initializer
? -> momentum
? -> center
? -> scale
? -> beta_regularizer
? -> gamma_regularizer
? -> beta_constraint
? -> gamma_constraint

I assume Lasagne simply does not support beta_regularizer, gamma_regularizer, beta_constraint and gamma_constraint, so the Keras default of None is correct. As for center and scale: in Lasagne these can in fact be disabled, by passing beta=None or gamma=None respectively, but with the defaults both are active, which matches Keras' center=True and scale=True.
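One subtlety the table above glosses over: Lasagne stores a running inverse standard deviation (inv_std), while Keras stores a running variance, so when transferring pretrained weights the stored arrays must be converted, not just copied one-to-one. A minimal NumPy sketch (the function name is mine; it assumes Lasagne computed inv_std = 1/sqrt(var + epsilon), which is what its source code does):

```python
import numpy as np

def lasagne_bn_weights_to_keras(beta, gamma, mean, inv_std, epsilon=1e-4):
    """Convert pretrained Lasagne BatchNormLayer parameters into the list
    Keras BatchNormalization.set_weights() expects when both center and
    scale are enabled: [gamma, beta, moving_mean, moving_variance].

    Assumes Lasagne stored inv_std = 1 / sqrt(var + epsilon), so the
    variance is recovered as 1 / inv_std**2 - epsilon.
    """
    moving_variance = 1.0 / np.square(inv_std) - epsilon
    return [gamma, beta, mean, moving_variance]
```

With that in hand, transferring one layer's weights would look like `keras_bn.set_weights(lasagne_bn_weights_to_keras(beta, gamma, mean, inv_std))`.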

That leaves Lasagne alpha and Keras momentum. From the Lasagne documentation for alpha:

Coefficient for the exponential moving average of batch-wise means and standard deviations computed during training; the closer to one, the more it will depend on the last batches seen

From the Keras documentation for momentum:

Momentum for the moving mean and the moving variance.

They seem to correspond -- but by which formula?


Solution

  • From the Lasagne code we see the usage of alpha like so:

    running_mean.default_update = ((1 - self.alpha) * running_mean +
                                   self.alpha * input_mean)
    running_inv_std.default_update = ((1 - self.alpha) *
                                      running_inv_std +
                                      self.alpha * input_inv_std)
    

    and from this issue discussing Keras batch norm 'momentum' we can see:

    def assign_moving_average(variable, value, decay, zero_debias=True, name=None):
        """Compute the moving average of a variable.
        The moving average of 'variable' updated with 'value' is:
          variable * decay + value * (1 - decay)
    
        ...
    

    where, as the issue notes, the TensorFlow term 'decay' is what takes on the value of 'momentum' from Keras.

    From this, it appears that what Lasagne calls 'alpha' is equal to 1 - 'momentum', since in Keras, 'momentum' is the multiplier of the existing variable (the existing moving average), while in Lasagne this multiplier is 1 - alpha.
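To make the correspondence concrete, here is a tiny pure-Python check of the two update rules quoted above (the function names are mine, for illustration only):

```python
def lasagne_update(running, batch_stat, alpha=0.1):
    # Lasagne rule: running := (1 - alpha) * running + alpha * batch_stat
    return (1 - alpha) * running + alpha * batch_stat

def keras_update(moving, batch_stat, momentum=0.99):
    # Keras/TF rule: moving := moving * momentum + batch_stat * (1 - momentum)
    return moving * momentum + batch_stat * (1 - momentum)

# With momentum = 1 - alpha the two rules produce identical values:
alpha = 0.1
print(lasagne_update(0.0, 5.0, alpha))             # 0.5
print(keras_update(0.0, 5.0, momentum=1 - alpha))  # 0.5
```

The two expressions are the same formula with the roles of the coefficients swapped, which is exactly the momentum = 1 - alpha relationship.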

Admittedly it is confusing, because the two parameters play opposite roles: Keras' 'momentum' is the weight on the old running value, while Lasagne's 'alpha' is the weight on the new batch statistic. Note also that the defaults do not agree: Lasagne's default alpha=0.1 corresponds to a Keras momentum of 0.9, not the Keras default of 0.99.
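Putting the pieces together, a default Lasagne BatchNormLayer on a channels-first convolutional input should correspond to roughly the following Keras layer. This is a sketch, assuming tf.keras; axis=1 reflects Lasagne's axes='auto', which on a 4D NCHW tensor normalizes over all axes except the second:

```python
from tensorflow.keras.layers import BatchNormalization

# Sketch of a Keras layer matching a default Lasagne BatchNormLayer.
bn = BatchNormalization(
    axis=1,        # Lasagne axes='auto' on a channels-first (NCHW) input
    momentum=0.9,  # = 1 - Lasagne alpha, with the Lasagne default alpha=0.1
    epsilon=1e-4,  # Lasagne default epsilon (Keras defaults to 1e-3)
    center=True,   # Lasagne default: beta is learned (beta=None would disable)
    scale=True,    # Lasagne default: gamma is learned (gamma=None would disable)
)
```

The initializers can stay at their Keras defaults, since they match the Lasagne defaults and are overwritten anyway when pretrained weights are loaded.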