I have seen straight through estimator (STE) in many Neural Network related papers e.g. this and this. But I cannot understand the concept. I wonder if anyone could explain STE.
A straight through estimator is a way of estimating gradients for a threshold operation in a neural network. The threshold could be as simple as the following function,
As we can see, the derivative of this threshold function will 0 and during back-propagation, the network will not learn anything since it gets 0 gradients and the weights won't get updated.
The concept of a straight through estimator is that you set the incoming gradients to a threshold function equal to it's outgoing gradients, disregarding the derivative of the threshold function itself. This has been shown to perform well in the results (Figure 2) in this paper you have referenced.