I just found that I've been coding `some_data.unsqueeze(0).to(device)` and `some_data.to(device).unsqueeze(0)` interchangeably.

If I recall correctly, `torch.Tensor.to` involves a data transfer, something like `cudaMemcpyHostToDevice`?

Which makes me wonder: supposing `device` is set to a GPU, does the order of `.to(device)` relative to a tensor operation yield different performance?
Operations are executed in the order they are called:

- `some_data.unsqueeze(0).to(device)` first performs `unsqueeze`, then moves the result to `device`.
- `some_data.to(device).unsqueeze(0)` first moves `some_data` to `device`, then performs `unsqueeze`.
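A minimal sketch to confirm that both orders produce the same tensor (the shape of `some_data` here is an illustrative assumption; it falls back to the CPU when no GPU is available):

```python
import torch

# Hypothetical stand-in for some_data
some_data = torch.randn(3, 4)
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Order 1: unsqueeze on the host, then transfer the (1, 3, 4) tensor
a = some_data.unsqueeze(0).to(device)

# Order 2: transfer the (3, 4) tensor, then unsqueeze on the device
b = some_data.to(device).unsqueeze(0)

# Both orders yield the same shape and values
assert a.shape == b.shape == (1, 3, 4)
assert torch.equal(a, b)
```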
For the specific case of `unsqueeze`, you are only updating the tensor metadata (shape/stride), so the impact is minimal. However, for other operations the impact of operation order can be significant. Generally speaking, if you want to perform operations on `some_data` on `device`, you should move `some_data` to `device` first, then perform the operations.
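To illustrate why order matters for a compute-heavy operation, here is a sketch contrasting the two placements of `.to(device)` around a matrix multiply (the tensor name and `512×512` size are illustrative assumptions, and the script degrades gracefully to CPU-only):

```python
import torch

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
x = torch.randn(512, 512)  # hypothetical host-resident tensor

# Op first, then move: the matmul runs on the host CPU,
# and the full result is transferred afterwards.
res_op_first = (x @ x.T).to(device)

# Move first, then op: the input is transferred once,
# and the matmul runs on the device (the GPU, when available).
xd = x.to(device)
res_move_first = xd @ xd.T

# Same result either way, up to floating-point differences
# between CPU and GPU kernels.
assert torch.allclose(res_op_first, res_move_first, atol=1e-2)
```

With a GPU present, the second form is normally the one you want: the expensive computation runs on the accelerator instead of the host.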