I am very new to Python and TensorFlow. Recently I ran into a problem while studying "MNIST For ML Beginners" (https://www.tensorflow.org/get_started/mnist/beginners).
In this tutorial, we use

    y = tf.nn.softmax(tf.matmul(X, W) + b)

to get our outputs.
My question is: say X is a [100, 784] matrix, W is a [784, 10] matrix, and b is a [10] tensor (like a [10, 1] matrix?). After we call tf.matmul(X, W) we get a [100, 10] matrix. How can a [100, 10] matrix be added to a [10] tensor? It does not make any sense to me.
I know why there are biases and why they need to be added. I just do not know how the "+" operator works in this case.
This is because of a concept called broadcasting, which exists in both NumPy and TensorFlow. At a high level, it works like this:
Suppose you're working with an op that supports broadcasting (e.g. + or *) and has two input tensors, X and Y. To decide whether the shapes of X and Y are compatible, the op compares their dimensions in pairs, starting from the right. A pair of dimensions is considered compatible if:

- they are equal, or
- one of them is 1 (or missing entirely).
Applying these rules to the add operation (+) and your inputs of shape [100, 10] and [10]:

- rightmost pair: 10 vs 10 -> equal, so compatible;
- next pair: 100 vs (missing) -> the second input has no dimension here, so compatible.
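Here's a minimal NumPy sketch of that pairing (the values are stand-ins for your tf.matmul(X, W) result and b; the same rules apply in TensorFlow):

    import numpy as np

    XW = np.zeros((100, 10))   # stand-in for the result of tf.matmul(X, W)
    b = np.zeros(10)           # bias vector of shape [10]

    # Right-aligned pairing of shapes (100, 10) and (10,):
    #   last dims:  10 vs 10     -> equal, compatible
    #   next dims: 100 vs (none) -> missing on b's side, compatible
    y = XW + b
    print(y.shape)  # (100, 10)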
If the shapes are compatible and one of the dimensions of an input is 1 or missing, the op will essentially tile that input to match the shape of the other input.
In your example, the add op will effectively tile b of shape [10] to shape [100, 10] before doing the addition.
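To see that tiling equivalence concretely, here is a small sketch using TensorFlow 2's eager mode (the tutorial you link uses the older TF 1.x session API, so treat the eager style as an assumption about your setup; tf.tile simply repeats b 100 times along the first axis):

    import tensorflow as tf

    xw = tf.zeros([100, 10])            # stand-in for tf.matmul(X, W)
    b = tf.range(10, dtype=tf.float32)  # bias of shape [10]

    # The broadcast add gives the same result as explicitly tiling b to [100, 10]
    broadcast_sum = xw + b
    tiled_sum = xw + tf.tile(tf.reshape(b, [1, 10]), [100, 1])

    print(tf.reduce_all(broadcast_sum == tiled_sum).numpy())  # True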
See the NumPy documentation on broadcasting for more detailed information (https://docs.scipy.org/doc/numpy-1.13.0/user/basics.broadcasting.html).