machine-learning, deep-learning, logistic-regression

Why is e used so much in NNs?


I don't understand why we use e so often in neural networks, be it in the sigmoid function or the softmax function.

In the sigmoid function we are essentially compressing the values y = mx + b into the range 0-1, so why do we specifically use e? Intuitively it seems like 2 would make more sense, since we are doing binary classification, right?
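To make the question about the base concrete, here is a minimal Python sketch (the function names are my own, not from any library) comparing the standard base-e sigmoid with a base-2 variant: both squash any input into (0, 1), and the base-2 curve is just the base-e curve with its input rescaled by ln 2, so the choice of base only changes a constant factor on the input.

```python
import math

def sigmoid_e(z):
    # Standard logistic sigmoid: 1 / (1 + e^(-z)); output lies in (0, 1)
    return 1.0 / (1.0 + math.exp(-z))

def sigmoid_base2(z):
    # Same S-shape with base 2: 1 / (1 + 2^(-z))
    return 1.0 / (1.0 + 2.0 ** (-z))

for z in (-4.0, -1.0, 0.0, 1.0, 4.0):
    # The base-2 curve equals the base-e curve with the input scaled by ln 2
    print(z, round(sigmoid_e(z), 4),
          round(sigmoid_base2(z), 4),
          round(sigmoid_e(z * math.log(2)), 4))
```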

Also, in the softmax function we compute e^x / sum(e^x). Why do we need to do that? We are trying to get the probability of which class x belongs to, so why can't we just do x / sum(abs(x)) instead?
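A quick sketch of the difference (plain Python again, names are illustrative): the proposed x / sum(abs(x)) can produce negative "probabilities" when some logits are negative, and its entries need not sum to 1, whereas exponentiating first guarantees positive values that always sum to 1.

```python
import math

def softmax(xs):
    # exp makes every value positive, so the normalized result is a valid distribution
    m = max(xs)                      # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def naive_normalize(xs):
    # The proposed x / sum(abs(x)): can yield negative "probabilities"
    s = sum(abs(x) for x in xs)
    return [x / s for x in xs]

logits = [2.0, -1.0, 0.5]
print(softmax(logits))          # all positive, sums to 1
print(naive_normalize(logits))  # contains a negative entry, does not sum to 1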


Solution

    1. Differentiability for backpropagation: e^x is its own derivative, so the sigmoid's gradient takes the clean form sigmoid(z) * (1 - sigmoid(z)). Any other base would also be differentiable, but it would drag an extra constant factor of ln(base) through every gradient.
    2. It maps any real input into the 0-1 range, so the output can be read directly as a probability (0-100%).
    3. Exaggeration of differences: the exponential amplifies the largest logit and suppresses the smaller ones, so softmax concentrates probability on the biggest value (see the sketch after this list).
    4. Stability: exp turns every input, negative or positive, into a positive number, and in practice the largest logit is subtracted before exponentiating, so the floats stay close together and rounding errors stay small.
    5. Softmax does the same for n classes: with two classes it reduces to the sigmoid applied to the difference of the logits.
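Points 3 and 4 can be seen in a few lines. This is only a sketch, not library code, reusing the softmax helper defined above: the exponential lets the largest logit dominate, and subtracting the maximum before exponentiating keeps exp() from overflowing on large inputs.

```python
import math

def softmax(xs):
    m = max(xs)                      # point 4: shift by the max so exp() never overflows
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

# Point 3: exp blows up differences, so the largest logit dominates.
print(softmax([1.0, 2.0, 3.0]))         # ~[0.09, 0.24, 0.67] even though inputs differ by only 1

# Point 4: without the max shift, math.exp(1000.0) would overflow.
print(softmax([1000.0, 999.0, 998.0]))  # ~[0.67, 0.24, 0.09], computed safely
```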