ReLU is not differentiable at 0, but PyTorch's implementation has to handle that case somehow. Is the derivative at 0 set to 0 by default, or to something else?
I tried setting the weights and bias (so that the input to ReLU is zero) before backpropagating, and the gradients of the weights are 0, except for the last conv layer in the residual block, where they are not.
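A minimal sketch of what I mean (just one zeroed conv followed by ReLU, not my actual residual block):

```python
import torch
import torch.nn as nn

# Zero the conv's weights and bias so the ReLU input is exactly 0 everywhere.
conv = nn.Conv2d(3, 3, kernel_size=3, padding=1)
nn.init.zeros_(conv.weight)
nn.init.zeros_(conv.bias)

x = torch.randn(1, 3, 8, 8)
out = torch.relu(conv(x))  # ReLU sees all zeros here
out.sum().backward()

print(conv.weight.grad.abs().max())  # prints 0 -> no gradient flows back through ReLU at 0
```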
PyTorch implements the derivative of ReLU at x = 0 by outputting zero. According to this article:
Even though the ReLU activation function is non-differentiable at 0, autograd libraries such as PyTorch or TensorFlow implement its derivative with ReLU'(0) = 0.
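You can verify this directly with autograd (a minimal check of my own, not taken from the article):

```python
import torch

# Ask autograd for the gradient of ReLU at exactly x = 0.
x = torch.tensor(0.0, requires_grad=True)
y = torch.relu(x)
y.backward()
print(x.grad)  # tensor(0.) -> PyTorch uses ReLU'(0) = 0
```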
The article goes on to elaborate on the differences in outcomes between using ReLU'(0) = 0 and ReLU'(0) = 1, and notes that the effect is stronger at lower numerical precision.
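For comparison, one way to see what the ReLU'(0) = 1 convention would do is a custom autograd Function. This is only an illustrative sketch of that convention, not anything PyTorch ships:

```python
import torch

class ReLUGradOneAtZero(torch.autograd.Function):
    """ReLU variant that uses the ReLU'(0) = 1 convention in backward."""

    @staticmethod
    def forward(ctx, x):
        ctx.save_for_backward(x)
        return x.clamp(min=0)

    @staticmethod
    def backward(ctx, grad_output):
        (x,) = ctx.saved_tensors
        # Let the gradient through wherever x >= 0, i.e. derivative 1 at x = 0.
        return grad_output * (x >= 0).to(grad_output.dtype)

x = torch.tensor(0.0, requires_grad=True)
ReLUGradOneAtZero.apply(x).backward()
print(x.grad)  # tensor(1.) under this convention, versus tensor(0.) for torch.relu
```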