python, machine-learning, pytorch, neural-network, activation-function

How to initialize parameters of an activation function?


I'm studying the basics of neural networks with PyTorch, and I'm having a hard time understanding how the activation function should work.

I don't understand what shape the trainable parameters of my activation function should have. Should they have the same shape as the input dataset, or the shape of a single element in the dataset?

From what I understand, the activation function takes the whole dataset as input, but I'm unsure how to initialize the parameters.

class customModel(nn.Module):
    def __init__(self, units):
        super().__init__()
        self.p1 = nn.Parameter(torch.ones(units))
        self.p2 = nn.Parameter(torch.ones(units))
        self.b1 = nn.Parameter(torch.zeros(units))
        self.b2 = nn.Parameter(torch.zeros(units))

    def forward(self, inputs):
        return myCustomActivationFunction(inputs, self.p1, self.p2, self.b1, self.b2)

Solution

  • In short

    An activation function introduces non-linearity between layers, which on their own are purely linear. We usually choose the activation based on the task: for example, ReLU between hidden layers to create non-linearity, and sigmoid in the output layer of a binary classifier to normalize values into the 0-1 range, with 0.5 as the threshold between the two classes.
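    A minimal sketch of this idea (the layer sizes here are arbitrary, chosen only for illustration):

```python
import torch
import torch.nn as nn

# Hypothetical binary classifier: ReLU between layers, sigmoid at the output
model = nn.Sequential(
    nn.Linear(2, 8),   # hidden layer: 2 input features, 8 neurons
    nn.ReLU(),         # non-linearity between the layers
    nn.Linear(8, 1),   # output layer
    nn.Sigmoid(),      # squashes the raw output into the (0, 1) range
)

x = torch.randn(4, 2)        # a batch of 4 samples
probs = model(x)             # shape (4, 1), every value in (0, 1)
preds = (probs > 0.5).int()  # 0.5 threshold -> class 0 or class 1
```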

    Long

    To fully grasp how an activation function is used in a neural network, I recommend understanding the linear regression model first, since that makes the role of the weights much easier to see.

    y = mx + b is a linear function that can be used to build a simple model for data with a linear correlation (we call this model linear regression).

    Here "x" is the input feature, "y" is the output, and "m" and "b" are the trainable parameters.
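    In PyTorch terms, m and b are exactly the kind of values nn.Parameter registers as trainable. A minimal sketch:

```python
import torch
import torch.nn as nn

class LinearRegression(nn.Module):
    """y = m*x + b, with m and b as trainable parameters."""
    def __init__(self):
        super().__init__()
        self.m = nn.Parameter(torch.ones(1))   # slope, updated by the optimizer
        self.b = nn.Parameter(torch.zeros(1))  # intercept, updated by the optimizer

    def forward(self, x):
        return self.m * x + self.b

model = LinearRegression()
y = model(torch.tensor([1.0, 2.0, 3.0]))  # with m=1, b=0 this returns [1., 2., 3.]
```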

    Assuming you are already familiar with linear regression: a neural network is like a chain of connected linear models. Each unit is called a neuron, and neurons are stacked into layers.

    Layer 1 example (the first layer is generally called the input layer, because it's the layer we put our features into):

    [x1]

    [x2]

    [x3]

    Each neuron in a layer has a "LINE" connecting it to every neuron in the next layer.

    Each "LINE" carries its own w (weight), which is a trainable parameter, just like the "m" and "b" we can train in y = mx + b.

    When computing, the inputs are placed into the X's of the input layer, then each is multiplied by the weight on the LINE connecting it to each neuron of the next layer, and the products are summed at the destination.

    The formula is

    Y_j = sum_i(X_i * W_ij)
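    This weighted sum is exactly what a matrix multiplication computes. A sketch with a 3-neuron input layer and a 2-neuron next layer (the weight values are made up for illustration):

```python
import torch

x = torch.tensor([1.0, 2.0, 3.0])  # input layer: x1, x2, x3
W = torch.tensor([[0.1, 0.4],      # W[i][j] = weight on the LINE
                  [0.2, 0.5],      #           from input neuron i
                  [0.3, 0.6]])     #           to next-layer neuron j
y = x @ W                          # Y_j = sum_i(X_i * W_ij)
# y[0] = 1*0.1 + 2*0.2 + 3*0.3 = 1.4
# y[1] = 1*0.4 + 2*0.5 + 3*0.6 = 3.2
```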

    To simplify, in the image below you can think of it as computing two linear regression models separately, summing their outputs, and using the sum as the input for the next linear regression model, which predicts the gender.

    IT IS AT THIS PART that we really need the activation function. Assume the previous layers provide the information needed to predict the BIOLOGICAL gender, given height and weight.

    The output layer is still just the

    y = mx + b formula,

    so its possible output ranges from -infinity to +infinity.

    How would you classify this output into two classes?

    The answer is an activation function such as sigmoid, which normalizes any range of values into the 0-1 range. Thinking of this as a probability, we can use a 0.5 cutoff threshold: outputs below 0.5 are classified as class "0" and outputs above 0.5 as class "1".
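    For instance, sigmoid maps any real value into (0, 1), so the 0.5 cutoff corresponds to a raw output of 0:

```python
import torch

raw = torch.tensor([-5.0, 0.0, 5.0])  # unbounded y = mx + b outputs
probs = torch.sigmoid(raw)            # roughly [0.0067, 0.5, 0.9933]
classes = (probs > 0.5).int()         # [0, 0, 1]
```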

    Simple Neural Network Image

    Summary

    As you can see, we don't train the activation function itself; it's the trainable parameters that get trained! The activation creates non-linearity between layers, and we usually choose it based on the task, such as sigmoid for binary classification.

    To implement a custom activation, just create a function that receives one input and returns something:

    def custom_act(x):
        return -x
    

    In case you need it to be trainable (which usually isn't necessary), this question already has a good explanation: Pytorch custom activation functions?
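    Tying this back to the question's module: the parameters are typically given the shape of a single sample's feature dimension (units), not the shape of the whole dataset — PyTorch broadcasting then applies them across the batch. A sketch (the parametric form below is made up purely for illustration):

```python
import torch
import torch.nn as nn

class TrainableActivation(nn.Module):
    """Per-unit trainable activation: parameters have shape (units,),
    matching one sample's feature dimension, not the whole dataset."""
    def __init__(self, units):
        super().__init__()
        self.p1 = nn.Parameter(torch.ones(units))
        self.b1 = nn.Parameter(torch.zeros(units))

    def forward(self, x):
        # x: (batch_size, units); the (units,) parameters broadcast over the batch
        return self.p1 * torch.tanh(x) + self.b1  # hypothetical parametric form

act = TrainableActivation(units=5)
out = act(torch.randn(32, 5))  # works for any batch size
```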

    Additional Information

    What Is Neural Network

    Using Sigmoid for Logistic Regression

    Activation Function Explain For Binary Classification and Keras Implementation