When I use os.environ['CUDA_VISIBLE_DEVICES'] in PyTorch, I get the following message:
Warning: Device on which events/metrics are configured are different than the device on which it is being profiled. One of the possible reason is setting CUDA_VISIBLE_DEVICES inside the application.
What does this actually mean? How can I avoid this by using 'CUDA_VISIBLE_DEVICES' (not torch.cuda.set_device())?
Here is the code in test.py:
import torch
import torch.nn as nn
import os
os.environ['CUDA_VISIBLE_DEVICES'] = '1'
g = 1
c1 = 512
c2 = 512
input = torch.randn(64, c1, 28, 28).cuda()
model = nn.Sequential(
    nn.Conv2d(c1, c2, 1, groups=g),
    nn.ReLU(),
    nn.Conv2d(c1, c2, 1, groups=g),
    nn.ReLU(),
    nn.Conv2d(c1, c2, 1, groups=g),
    nn.ReLU(),
    nn.Conv2d(c1, c2, 1, groups=g),
    nn.ReLU(),
    nn.Conv2d(c1, c2, 1, groups=g),
    nn.ReLU(),
).cuda()
out = model(input)
and the command:
nvprof --analysis-metrics -o metrics python test.py
What does this actually mean?
It means that nvprof began configuring its events/metrics on a GPU context which your code then made unavailable by setting CUDA_VISIBLE_DEVICES inside the application, so the device being profiled no longer matches the device the profiler was configured on.
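For context, CUDA_VISIBLE_DEVICES filters and renumbers the GPUs a process can see, which is why the profiler's view of the devices and the application's view can disagree. A minimal sketch of the renumbering, assuming a machine with at least two GPUs:
import os
os.environ['CUDA_VISIBLE_DEVICES'] = '1'  # expose only physical GPU 1
import torch

# Inside this process the lone visible GPU is renumbered as device 0,
# so the index PyTorch reports no longer matches the physical index
# that nvprof configured its events/metrics on.
print(torch.cuda.device_count())      # 1
print(torch.cuda.current_device())    # 0 (the remapped index)
print(torch.cuda.get_device_name(0))  # name of physical GPU 1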
How can I avoid this by using CUDA_VISIBLE_DEVICES (not torch.cuda.set_device())?
Probably like this:
import os
os.environ['CUDA_VISIBLE_DEVICES'] = '1'
import torch
....
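Applied to the script in the question, the whole test.py would become (including the import torch.nn as nn line that the original snippet also needs):
import os
os.environ['CUDA_VISIBLE_DEVICES'] = '1'  # must run before torch is imported

import torch
import torch.nn as nn

g = 1
c1 = 512
c2 = 512

input = torch.randn(64, c1, 28, 28).cuda()
model = nn.Sequential(
    nn.Conv2d(c1, c2, 1, groups=g),
    nn.ReLU(),
    nn.Conv2d(c1, c2, 1, groups=g),
    nn.ReLU(),
    nn.Conv2d(c1, c2, 1, groups=g),
    nn.ReLU(),
    nn.Conv2d(c1, c2, 1, groups=g),
    nn.ReLU(),
    nn.Conv2d(c1, c2, 1, groups=g),
    nn.ReLU(),
).cuda()
out = model(input)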
I know nothing about PyTorch, but I would guess that importing the library triggers a lot of CUDA activity you don't see. If you import the library after you set CUDA_VISIBLE_DEVICES, I suspect the whole problem will disappear.
If that doesn't work, then you have no choice but to not set CUDA_VISIBLE_DEVICES within the Python code at all, and instead do this:
CUDA_VISIBLE_DEVICES=1 nvprof --analysis-metrics -o metrics python test.py
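If you go that route, delete the os.environ line from test.py entirely, so the profiler and the script inherit exactly the same device mapping from the shell. As a sketch, you could add a quick sanity check at the top of the script:
import torch
import torch.nn as nn

# Run as: CUDA_VISIBLE_DEVICES=1 nvprof --analysis-metrics -o metrics python test.py
# The single visible GPU is renumbered as device 0 inside the process.
assert torch.cuda.device_count() == 1
print(torch.cuda.get_device_name(0))  # should print the name of physical GPU 1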