PyTorch: Adding a dimension to a tensor through padding

I am given several tensors, some with only 2 dimensions and some with 3. The first 2 dimensions always match. I want them all to have the same shape for further processing. Example:

tensor a: (1, 10) tensor b: (1, 10, 15)

Now my approach was to pad 'tensor a' with zeros to the shape of 'tensor b' without changing any information.

In the following snippet feature_tensor can be seen as tensor a and the padding_reference_tensor as tensor b.

For the padding I use torch.nn.functional.pad.

import torch.nn.functional as F

if feature_tensor.dim() == padding_reference_tensor.dim():
   padding_number = padding_reference_tensor.size()[2] - feature_tensor.size()[2]

elif feature_tensor.dim() < padding_reference_tensor.dim():
   padding_number = padding_reference_tensor.size()[2] - 1

feature_tensor = F.pad(feature_tensor, (0, padding_number), "constant", 0)

The case that feature_tensor.dim() > padding_reference_tensor.dim() can be ignored for now.

Now I'd like feature_tensor to go from (1,10) to (1,10,15), but instead it becomes (1,24).

I understand why that happens, but how do I efficiently add a dimension to feature_tensor when necessary?

Thanks a lot!

Solution

Padding does not add dimensions to a tensor; it adds elements along an existing dimension. For example, say you have a vector shaped (3,) with values [1, 2, 3] and want to multiply it by a tensor shaped (2, 3). If you just zero-pad it with 2 elements, you get a tensor shaped (5,) with values [1, 2, 3, 0, 0], which cannot be combined with the (2, 3) tensor.
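To make this concrete, here is a minimal sketch of what F.pad does to the shapes. The tensor names and values are made up for illustration; only the shapes mirror the ones discussed here and in the question.

```python
import torch
import torch.nn.functional as F

v = torch.tensor([1, 2, 3])            # shape (3,)

# Pad the last dimension: 0 elements on the left, 2 on the right.
padded = F.pad(v, (0, 2), "constant", 0)
print(padded.shape)                    # torch.Size([5])
print(padded.tolist())                 # [1, 2, 3, 0, 0]

# Same effect as in the question: (1, 10) grows to (1, 24), not (1, 10, 15).
a = torch.ones(1, 10)                  # illustrative stand-in for 'tensor a'
padded_a = F.pad(a, (0, 14), "constant", 0)
print(padded_a.shape)                  # torch.Size([1, 24])
```

The number of dimensions stays the same in both cases; only the size of the last dimension changes.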

You have two options for this:

Repeat the tensor across a new dimension. You can use Tensor.repeat() or the more efficient Tensor.expand(), which returns a view instead of copying the data, to get the tensor

[[1, 2, 3],
 [1, 2, 3]]

which you can then combine with any other tensor shaped (2, 3).
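A short sketch of the repeat and expand variants, assuming the same example vector as above (the names v, other, repeated and expanded are illustrative):

```python
import torch

v = torch.tensor([1, 2, 3])                      # shape (3,)
other = torch.tensor([[10, 20, 30],
                      [40, 50, 60]])             # shape (2, 3)

# repeat() copies the data: 2 copies along a new leading dimension.
repeated = v.repeat(2, 1)                        # shape (2, 3)

# expand() needs a size-1 dimension to stretch, so add one first.
# The result is a view on the original storage; nothing is copied.
expanded = v.unsqueeze(0).expand(2, -1)          # shape (2, 3)

print(repeated.tolist())                         # [[1, 2, 3], [1, 2, 3]]
print((expanded * other).tolist())               # [[10, 40, 90], [40, 100, 180]]
```

Because the expanded tensor shares memory with the original, it is best treated as read-only; call .clone() on it before writing to it in place.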

The most efficient and common way to do this is to convert your (3,) tensor to a (1, 3) tensor. You can do this with the unsqueeze() method, which adds a new dimension of size 1. Your (1, 3) tensor can then be combined with any tensor shaped (2, 3) through broadcasting. In your case, feature_tensor.unsqueeze(-1) turns (1, 10) into (1, 10, 1), which broadcasts against (1, 10, 15). I suggest you take a look at PyTorch's broadcasting semantics for more info.
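As a rough sketch, this is how unsqueeze() could be applied, first to the small example and then to the shapes from the question. The tensor contents are placeholders, and zero-padding after the unsqueeze is only one possible choice; whether padding or plain broadcasting is the better fit depends on the further processing, which the question does not specify.

```python
import torch
import torch.nn.functional as F

# Small example: (3,) -> (1, 3), then broadcast against (2, 3).
v = torch.tensor([1, 2, 3])
m = torch.tensor([[10, 20, 30],
                  [40, 50, 60]])
row = v.unsqueeze(0)                              # shape (1, 3)
product = row * m                                 # broadcast to shape (2, 3)

# Shapes from the question (placeholder values).
feature_tensor = torch.ones(1, 10)
padding_reference_tensor = torch.zeros(1, 10, 15)

# Add the missing trailing dimension first: (1, 10) -> (1, 10, 1).
if feature_tensor.dim() < padding_reference_tensor.dim():
    feature_tensor = feature_tensor.unsqueeze(-1)

# Now the last dimension exists and can be zero-padded up to the reference size.
padding_number = padding_reference_tensor.size(2) - feature_tensor.size(2)
feature_tensor = F.pad(feature_tensor, (0, padding_number), "constant", 0)

print(feature_tensor.shape)                       # torch.Size([1, 10, 15])
```

With this approach the original values end up at index 0 of the new last dimension and the remaining 14 positions are zeros. If the next step is an element-wise operation, the (1, 10, 1) tensor can be used directly without padding, since broadcasting handles the size-1 dimension.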