keras, tensorflow2.0, keras-layer, activation-function

Adaptive Activation Function in TensorFlow 2: trained variable shared across multiple calls


So I want to try out an adaptive activation function for my neural network. This means I want a custom activation function that is similar to a standard one (like tanh or relu), but with some added trainable parameters.

Currently, I am trying to add this trainable parameter by creating the activation function as a custom layer:

import tensorflow as tf
from tensorflow import keras

class AdaptiveActivation(keras.layers.Layer):
    """
    Adaptive activation function that is changed during the training process.
    """
    def __init__(self, act="tanh"):
        super(AdaptiveActivation, self).__init__()
        # trainable parameter that scales the activation's input
        self.a = tf.Variable(0.1, dtype=tf.float32, trainable=True)
        # fixed scaling constant
        self.n = tf.constant(10.0, dtype=tf.float32)
        self.act = act

    def call(self, x):
        if self.act == "tanh":
            return keras.activations.tanh(self.a*self.n*x)
        elif self.act == "relu":
            return keras.activations.relu(self.a*self.n*x)

However, if I understood some test outputs correctly, this means that every instance of the activation layer creates its own parameter a, so every hidden layer ends up with a different a. What I want is one single a shared by all my activation functions: instead of, say, 9 different values of a per epoch, just one a that can change between epochs.
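
For example, with a toy model like the one below (purely illustrative, not my actual network), each AdaptiveActivation instance ends up with its own copy of a:

from tensorflow import keras

# toy model that uses the layer above twice
inputs = keras.Input(shape=(4,))
x = keras.layers.Dense(8)(inputs)
x = AdaptiveActivation()(x)
x = keras.layers.Dense(8)(x)
x = AdaptiveActivation()(x)
outputs = keras.layers.Dense(1)(x)
model = keras.Model(inputs, outputs)

# each instance holds a separate tf.Variable, so the two a's are trained independently
acts = [layer for layer in model.layers if isinstance(layer, AdaptiveActivation)]
for layer in acts:
    print(layer.a)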

Furthermore, is there an easy way to obtain the a from this layer for output during training?


Solution

  • OK, the solution was surprisingly easy: I can just pass a trainable TensorFlow variable into the layer from outside and assign it to self.a there.

    class AdaptiveActivation(keras.layers.Layer):
        """
        Adaptive activation function that is changed during the training process.
        """
        def __init__(self, a, act="tanh"):
            super(AdaptiveActivation, self).__init__()
            # `a` is a tf.Variable created outside the layer, so every
            # AdaptiveActivation instance that receives it shares the same parameter
            self.a = a
            # fixed scaling constant
            self.n = tf.constant(5.0, dtype=tf.float32)
            self.act = act

        def call(self, x):
            if self.act == "tanh":
                return keras.activations.tanh(self.a*self.n*x)
            elif self.act == "relu":
                return keras.activations.relu(self.a*self.n*x)
    

    This also solves the "issue" of tracking it (see the usage sketch at the end of this answer).

    It does feel very unnecessary, though: why couldn't I just have done this without having to implement a new layer first?
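
    For completeness, here is a minimal sketch of how I use it; the toy model, optimizer, and random data are only placeholders, and the PrintA callback is just one way to print a during training:

    import numpy as np
    import tensorflow as tf
    from tensorflow import keras

    # one shared trainable parameter, created outside the layers
    a = tf.Variable(0.1, dtype=tf.float32, trainable=True)

    # every AdaptiveActivation receives the same variable object
    inputs = keras.Input(shape=(4,))
    x = keras.layers.Dense(16)(inputs)
    x = AdaptiveActivation(a)(x)
    x = keras.layers.Dense(16)(x)
    x = AdaptiveActivation(a)(x)
    outputs = keras.layers.Dense(1)(x)
    model = keras.Model(inputs, outputs)

    class PrintA(keras.callbacks.Callback):
        """Print the current value of the shared `a` after every epoch."""
        def on_epoch_end(self, epoch, logs=None):
            print(f"epoch {epoch}: a = {a.numpy():.4f}")

    model.compile(optimizer="adam", loss="mse")
    x_train = np.random.rand(256, 4).astype("float32")
    y_train = np.random.rand(256, 1).astype("float32")
    model.fit(x_train, y_train, epochs=3, verbose=0, callbacks=[PrintA()])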