keras, tensorflow2.0, keras-layer, activation-function

Adaptive Activation Function in TensorFlow 2: trained variable shared across multiple calls


So I want to try out an adaptive activation function for my neural network. This means I want a custom activation function that is similar to a standard one (like tanh or relu), but with some added trainable parameters.

Currently, I am trying to add this trainable parameter by creating the activation function as a custom layer:

import tensorflow as tf
from tensorflow import keras

class AdaptiveActivation(keras.layers.Layer):
    """
    Adaptive activation function that is changed during the training process.
    """
    def __init__(self, act="tanh"):
        super(AdaptiveActivation, self).__init__()
        # trainable parameter that scales the activation's input
        self.a = tf.Variable(0.1, dtype=tf.float32, trainable=True)
        # fixed scaling constant
        self.n = tf.constant(10.0, dtype=tf.float32)
        self.act = act

    def call(self, x):
        if self.act == "tanh":
            return keras.activations.tanh(self.a*self.n*x)
        elif self.act == "relu":
            return keras.activations.relu(self.a*self.n*x)

However, if I understood some test outputs correctly, this means that every instance of the activation layer creates its own parameter a, so every hidden layer ends up with a different a. What I want is one single a shared by all my activation functions: instead of, say, 9 different values of a per epoch, just one a that can change between epochs.
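
For example, with a toy model like the one below (purely illustrative, not my actual network), each AdaptiveActivation instance ends up with its own copy of a:

from tensorflow import keras

# toy model that uses the layer above twice
inputs = keras.Input(shape=(4,))
x = keras.layers.Dense(8)(inputs)
x = AdaptiveActivation()(x)
x = keras.layers.Dense(8)(x)
x = AdaptiveActivation()(x)
outputs = keras.layers.Dense(1)(x)
model = keras.Model(inputs, outputs)

# each instance holds a separate tf.Variable, so the two a's are trained independently
acts = [layer for layer in model.layers if isinstance(layer, AdaptiveActivation)]
for layer in acts:
    print(layer.a)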

Furthermore, is there an easy way to obtain the a from this layer for output during training?


Solution

  • OK, the solution was surprisingly easy: I can just pass a trainable TensorFlow variable into the layer from outside and assign it to self.a there.

    class AdaptiveActivation(keras.layers.Layer):
        """
        Adaptive activation function that is changed during the training process.
        """
        def __init__(self, a, act="tanh"):
            super(AdaptiveActivation, self).__init__()
            # `a` is a tf.Variable created outside the layer, so every
            # AdaptiveActivation instance that receives it shares the same parameter
            self.a = a
            # fixed scaling constant
            self.n = tf.constant(5.0, dtype=tf.float32)
            self.act = act

        def call(self, x):
            if self.act == "tanh":
                return keras.activations.tanh(self.a*self.n*x)
            elif self.act == "relu":
                return keras.activations.relu(self.a*self.n*x)
    

    This also solves the "issue" of tracking it (see the usage sketch at the end of this answer).

    It does feel very unnecessary, though: why couldn't I just have done this without having to implement a new layer first?
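
    For completeness, here is a minimal sketch of how I use it; the toy model, optimizer, and random data are only placeholders, and the PrintA callback is just one way to print a during training:

    import numpy as np
    import tensorflow as tf
    from tensorflow import keras

    # one shared trainable parameter, created outside the layers
    a = tf.Variable(0.1, dtype=tf.float32, trainable=True)

    # every AdaptiveActivation receives the same variable object
    inputs = keras.Input(shape=(4,))
    x = keras.layers.Dense(16)(inputs)
    x = AdaptiveActivation(a)(x)
    x = keras.layers.Dense(16)(x)
    x = AdaptiveActivation(a)(x)
    outputs = keras.layers.Dense(1)(x)
    model = keras.Model(inputs, outputs)

    class PrintA(keras.callbacks.Callback):
        """Print the current value of the shared `a` after every epoch."""
        def on_epoch_end(self, epoch, logs=None):
            print(f"epoch {epoch}: a = {a.numpy():.4f}")

    model.compile(optimizer="adam", loss="mse")
    x_train = np.random.rand(256, 4).astype("float32")
    y_train = np.random.rand(256, 1).astype("float32")
    model.fit(x_train, y_train, epochs=3, verbose=0, callbacks=[PrintA()])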