Tags: python, pytorch

How does multidimensional input to a nn.Linear layer work?


When sending a multidimensional tensor to an nn.Linear layer, how does it work in practice? Does it just process the input vector by vector, or does it actually perform matrix multiplication over the whole input at once?

If it's the former, is there a way to perform the latter operation (multiplying the input tensor by the weights as-is, rather than one vector at a time)? Would it be faster than going vector by vector?


Solution

  • A linear layer is, in the end, just a matrix multiplication, and therefore conceptually operates on one vector at a time: it multiplies the incoming vector by the weight matrix and outputs a vector (to which the bias is then added). In PyTorch, nn.Linear is specifically written to accept N-dimensional tensors as input (which isn't necessarily a standard feature of linear layers elsewhere). It then applies the same weight matrix and bias vector to the input vector by vector, as you already suggested. You cannot perform a matrix multiplication (which is what a linear layer is in the end) on more than 2 dimensions (it wouldn't be a matrix any more), so it cannot be applied to the whole input at once.

    PyTorch's code is usually heavily optimised, and this is almost certainly not a Python for-loop working vector by vector. Instead, it takes advantage of the parallel processing capabilities that CPUs and GPUs have for this kind of operation to apply the same weight matrix and bias vector to all of the input vectors at once.

    For example: if you have an input of shape (8, 32, 32, 3) and want to reduce the last dimension to size 1, you could use a linear layer nn.Linear(in_features=3, out_features=1). Your input can technically be taken apart into 8 × 32 × 32 = 8192 3-dimensional vectors. These vectors are then each multiplied by the same weight matrix of the linear layer to produce 8192 1-dimensional vectors, which are recombined into the (8, 32, 32, 1) shaped output; a runnable sketch of this follows below. See also this question and the linear C++ code here.
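As a minimal sketch of the above (shapes taken from the example; the equivalence check via reshape and a manual matmul is my own illustration of the semantics, not a claim about PyTorch's internals):

```python
import torch
import torch.nn as nn

# Batch of 8 "images" of shape 32x32 with 3 channels last, as in the example.
x = torch.randn(8, 32, 32, 3)
layer = nn.Linear(in_features=3, out_features=1)

# nn.Linear applies the same weight matrix and bias over the last dimension.
out = layer(x)
print(out.shape)  # torch.Size([8, 32, 32, 1])

# Equivalent "by hand": flatten to 8 * 32 * 32 = 8192 vectors, apply, reshape back.
flat = x.reshape(-1, 3)                       # (8192, 3)
manual = flat @ layer.weight.T + layer.bias   # weight has shape (1, 3)
manual = manual.reshape(8, 32, 32, 1)

print(torch.allclose(out, manual))  # True
```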
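And on the speed question: a rough timing sketch (CPU wall-clock; absolute numbers will vary by machine) comparing one batched call against an explicit Python loop over the 8192 vectors. The batched call should be dramatically faster, which is the practical answer to "will it be faster than doing it vector by vector":

```python
import time
import torch
import torch.nn as nn

x = torch.randn(8, 32, 32, 3)
layer = nn.Linear(3, 1)

# One batched call: nn.Linear handles the N-dimensional input directly.
t0 = time.perf_counter()
with torch.no_grad():
    out_batched = layer(x)
t1 = time.perf_counter()

# Explicit loop: 8192 separate tiny matmuls, one per 3-dimensional vector.
with torch.no_grad():
    flat = x.reshape(-1, 3)
    t2 = time.perf_counter()
    rows = [layer(v) for v in flat]
    out_loop = torch.stack(rows).reshape(8, 32, 32, 1)
    t3 = time.perf_counter()

print(f"batched: {t1 - t0:.6f}s, loop: {t3 - t2:.6f}s")
print(torch.allclose(out_batched, out_loop))  # True: same result, very different cost
```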