[SOLVED] Keras: Difference between Kernel and Activity regularizers


I have noticed that weight_regularizer is no longer available in Keras and that, in its place, there are activity_regularizer and kernel_regularizer. I would like to know:

What are the main differences between kernel and activity regularizers?
Could I use activity_regularizer in place of weight_regularizer?

Solution

The activity regularizer works as a function of the layer's output and is mostly used to regularize hidden units, while weight_regularizer, as the name says, works on the weights (e.g. making them decay). In other words, you can express the regularization loss as a function of the output (activity_regularizer) or of the weights (weight_regularizer).
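To make the distinction concrete, here is a minimal pure-Python sketch (not Keras code) of an L2 penalty computed on the weights versus on the output of a small linear layer. The helper names (dense, l2_kernel_penalty, l2_activity_penalty) and the numbers are made up for illustration, and the sketch ignores any per-batch scaling that a real framework may apply to the activity penalty.

```python
def dense(x, kernel, bias):
    """Linear dense layer: y[j] = sum_i x[i] * kernel[i][j] + bias[j]."""
    return [
        sum(x_i * row[j] for x_i, row in zip(x, kernel)) + bias[j]
        for j in range(len(bias))
    ]


def l2_kernel_penalty(kernel, rate):
    """Penalty on the weights: depends only on the kernel, not on the data."""
    return rate * sum(w * w for row in kernel for w in row)


def l2_activity_penalty(outputs, rate):
    """Penalty on the layer output: changes with the input that was fed in."""
    return rate * sum(y * y for y in outputs)


# Hypothetical layer with 2 inputs and 2 units.
kernel = [[1.0, -2.0], [0.5, 3.0]]
bias = [0.0, 0.0]
rate = 0.01

small_input = [0.1, 0.1]
large_input = [10.0, 10.0]

# Same kernel, so the kernel penalty is identical for both inputs.
kernel_penalty = l2_kernel_penalty(kernel, rate)

# The activity penalty grows with the magnitude of the layer's output.
activity_small = l2_activity_penalty(dense(small_input, kernel, bias), rate)
activity_large = l2_activity_penalty(dense(large_input, kernel, bias), rate)

print(kernel_penalty, activity_small, activity_large)
```

The kernel penalty stays the same whatever data passes through the layer, whereas the activity penalty is much larger for the large input. This is why the two regularizers are not interchangeable.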

The new kernel_regularizer replaces weight_regularizer, although this is not very clear from the documentation.

From the definition of kernel_regularizer:

kernel_regularizer: Regularizer function applied to the kernel weights matrix (see regularizer).

And activity_regularizer:

activity_regularizer: Regularizer function applied to the output of the layer (its "activation"). (see regularizer).
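As a rough illustration of how the two arguments are passed, the sketch below attaches both regularizers to one Dense layer. It assumes a TensorFlow 2.x install where Keras is importable as tensorflow.keras (with standalone Keras, "import keras" would be used instead). The layer sizes and regularization rates are arbitrary.

```python
from tensorflow import keras

# Hidden layer with both kinds of regularization (rates chosen arbitrarily).
hidden = keras.layers.Dense(
    8,
    activation="relu",
    # L2 penalty on the layer's weight matrix (what weight_regularizer used to do).
    kernel_regularizer=keras.regularizers.l2(0.01),
    # L1 penalty on the layer's output, encouraging sparse activations.
    activity_regularizer=keras.regularizers.l1(0.001),
)

model = keras.Sequential([
    keras.Input(shape=(4,)),
    hidden,
    keras.layers.Dense(1),
])

# Both penalties are added to the training loss automatically.
model.compile(optimizer="adam", loss="mse")
```

Either argument can be omitted; to get the old weight_regularizer behaviour, only kernel_regularizer is needed.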

Important edit: Note that there was a bug in activity_regularizer that was only fixed in version 2.1.4 of Keras (at least with the TensorFlow backend). In older versions, the activity regularizer function is applied to the input of the layer instead of its output (the actual activations of the layer, as intended). So beware: if you are using a version of Keras older than 2.1.4, activity regularization will probably not work as intended.
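If you are unsure which side of that fix your install is on, a small check along these lines may help. The helper names are hypothetical, and the parsing only looks at the leading numeric part of the version string (so a suffix such as "-tf" is ignored).

```python
import re

FIXED_IN = (2, 1, 4)  # version named above as containing the fix


def version_tuple(version):
    """Parse the leading 'X.Y' or 'X.Y.Z' of a version string into ints."""
    match = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", version)
    if match is None:
        raise ValueError(f"unrecognised version string: {version!r}")
    # A missing patch component is treated as 0.
    return tuple(int(part) if part is not None else 0 for part in match.groups())


def activity_regularizer_fixed(version):
    """True if the given Keras version is at least 2.1.4."""
    return version_tuple(version) >= FIXED_IN


# Typical use (requires Keras to be installed):
#   import keras
#   print(activity_regularizer_fixed(keras.__version__))
```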

You can see the commit on GitHub

Five months ago François Chollet provided a fix to the activity regularizer, that was then included in Keras 2.1.4