I am currently implementing the LoFTR model and came across the following code:
feature_c0.shape
-> torch.Size([1, 256, 60, 60])
rearrange(feature_c0, 'n c h w -> n (h w) c').shape
-> torch.Size([1, 3600, 256])
feature_c0.view(1, -1, 256).shape
-> torch.Size([1, 3600, 256])
I thought I understood the functionality of both tensor.view and rearrange. The problem: the output of these two calls is different, even though their shapes are the same. I don't really understand what is going on here.
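For example, with a small stand-in tensor (my own minimal check, not LoFTR code) the element order clearly differs:

import torch
from einops import rearrange

t = torch.arange(8).view(1, 2, 2, 2)  # n=1, c=2, h=2, w=2

rearrange(t, 'n c h w -> n (h w) c')
-> tensor([[[0, 4],
            [1, 5],
            [2, 6],
            [3, 7]]])

t.view(1, -1, 2)
-> tensor([[[0, 1],
            [2, 3],
            [4, 5],
            [6, 7]]])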
torch.Tensor.view reshapes a tensor by reading its elements in their existing memory order; a -1 tells PyTorch to infer that dimension from the total number of elements.
For example,
import torch

x = torch.arange(24)
x = x.view(1, 2, 3, 4)
>
tensor([[[[ 0, 1, 2, 3],
[ 4, 5, 6, 7],
[ 8, 9, 10, 11]],
[[12, 13, 14, 15],
[16, 17, 18, 19],
[20, 21, 22, 23]]]])
x_res = x.view(1, -1, 6) # x_res.shape = [1, 4, 6]
>
tensor([[[ 0, 1, 2, 3, 4, 5],
[ 6, 7, 8, 9, 10, 11],
[12, 13, 14, 15, 16, 17],
[18, 19, 20, 21, 22, 23]]])
from einops import rearrange

x_res = rearrange(x, 'a b c d -> a e f') # raises an error: e and f are never defined on the input side
With tensor.view() it is still possible to reshape to last_dimension=6 simply by consuming elements in memory order, while rearrange() requires every dimension that is reshaped, split, or grouped to be named explicitly in the pattern.
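By contrast, a pattern whose output axes are all named on the input side works, because the grouping is explicit (my own illustration, same x as above):

x_res = rearrange(x, 'a b c d -> a (b c) d') # OK: x_res.shape = [1, 6, 4]
x_res = rearrange(x, 'a b c d -> a b (c d)') # OK: x_res.shape = [1, 2, 12]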
In your case, feature_c0.view(1, -1, 256) just slices the flattened c * h * w = 256 * 60 * 60 memory into rows of 256 consecutive elements, which groups spatial values from within a channel rather than producing the (60*60) per-pixel 256-dimensional feature vectors you wanted; rearrange transposes the channel axis to the end before flattening, so each row really is one pixel's feature vector.
As a result, rearrange, which makes the axis reordering explicit, is the right function in your case.
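To make the difference concrete (a sketch, assuming the shapes from your question): rearrange here is equivalent to a permute followed by a reshape, while the plain view skips the permute.

import torch
from einops import rearrange

feature_c0 = torch.randn(1, 256, 60, 60)

a = rearrange(feature_c0, 'n c h w -> n (h w) c')
b = feature_c0.permute(0, 2, 3, 1).reshape(1, -1, 256)  # move channels last, then flatten
c = feature_c0.view(1, -1, 256)                         # flatten raw memory, no transpose

torch.equal(a, b)
-> True
torch.equal(a, c)
-> False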