I just found that I've been coding `some_data.unsqueeze(0).to(device)` and `some_data.to(device).unsqueeze(0)` interchangeably.

If I recall correctly, `torch.Tensor.to` involves a data transfer, something like `cudaMemcpyHostToDevice`?

Which makes me wonder: supposing `device` is set to a GPU, does the order of `.to(device)` relative to a tensor operation yield different performance?
Operations are executed in the order they are called:

- `some_data.unsqueeze(0).to(device)` first performs `unsqueeze`, then moves the result to `device`.
- `some_data.to(device).unsqueeze(0)` first moves `some_data` to `device`, then performs `unsqueeze`.
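A minimal sketch to confirm that both orders produce the same tensor (the shape of `some_data` here is an illustrative assumption; it falls back to the CPU when no GPU is available):

```python
import torch

# Hypothetical stand-in for some_data
some_data = torch.randn(3, 4)
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Order 1: unsqueeze on the host, then transfer the (1, 3, 4) tensor
a = some_data.unsqueeze(0).to(device)

# Order 2: transfer the (3, 4) tensor, then unsqueeze on the device
b = some_data.to(device).unsqueeze(0)

# Both orders yield the same shape and values
assert a.shape == b.shape == (1, 3, 4)
assert torch.equal(a, b)
```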
For the specific case of `unsqueeze`, you are only updating the tensor metadata (shape/stride), so the impact is minimal. However, for other operations the impact of operation order can be significant. Generally speaking, if you want to perform operations on `some_data` on `device`, you should move `some_data` to `device` first, then perform the operations.
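To illustrate why order matters for a compute-heavy operation, here is a sketch contrasting the two placements of `.to(device)` around a matrix multiply (the tensor name and `512×512` size are illustrative assumptions, and the script degrades gracefully to CPU-only):

```python
import torch

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
x = torch.randn(512, 512)  # hypothetical host-resident tensor

# Op first, then move: the matmul runs on the host CPU,
# and the full result is transferred afterwards.
res_op_first = (x @ x.T).to(device)

# Move first, then op: the input is transferred once,
# and the matmul runs on the device (the GPU, when available).
xd = x.to(device)
res_move_first = xd @ xd.T

# Same result either way, up to floating-point differences
# between CPU and GPU kernels.
assert torch.allclose(res_op_first, res_move_first, atol=1e-2)
```

With a GPU present, the second form is normally the one you want: the expensive computation runs on the accelerator instead of the host.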