I was looking at the basic implementation of DDP:
class ToyModel(nn.Module):
    def __init__(self):
        super(ToyModel, self).__init__()
        self.net1 = nn.Linear(10, 10)
        self.relu = nn.ReLU()
        self.net2 = nn.Linear(10, 5)

    def forward(self, x):
        return self.net2(self.relu(self.net1(x)))


def demo_basic(rank, world_size):
    print(f"Running basic DDP example on rank {rank}.")
    setup(rank, world_size)

    # create model and move it to GPU with id rank
    model = ToyModel().to(rank)
    ddp_model = DDP(model, device_ids=[rank])

    loss_fn = nn.MSELoss()
    optimizer = optim.SGD(ddp_model.parameters(), lr=0.001)

    optimizer.zero_grad()
    outputs = ddp_model(torch.randn(20, 10))
    labels = torch.randn(20, 5).to(rank)
    loss_fn(outputs, labels).backward()
    optimizer.step()

    cleanup()


def run_demo(demo_fn, world_size):
    mp.spawn(demo_fn,
             args=(world_size,),
             nprocs=world_size,
             join=True)
I'm wondering how PyTorch knows which GPU to put the model on based only on rank. Usually we pass a torch.device object to the model's to() method. How does PyTorch interpret an integer passed to to()?
By default, if an integer i is provided as an argument to torch.Tensor.to, it will be treated as the i-th CUDA device. Here is a test:
>>> torch.rand(0).to(0).device
device(type='cuda', index=0)
>>> torch.rand(0, device=0).device
device(type='cuda', index=0)
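This means .to(0) is the same as .to('cuda:0'). It also matches .to(torch.device('cuda')) and .cuda() as long as the current CUDA device has not been changed (for example with torch.cuda.set_device), because those two target the current device, which defaults to cuda:0.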
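To make the equivalence concrete, here is a minimal sketch of the same check in script form. The helper name `describe_device` is hypothetical (it is not part of PyTorch); it just spells out the device that an integer rank is assumed to refer to. The GPU parts are guarded, so the script should still run on a machine without CUDA.

```python
import torch


def describe_device(rank: int) -> torch.device:
    """Hypothetical helper: the explicit device an integer rank stands for."""
    # Building a torch.device object does not require the hardware to exist.
    return torch.device("cuda", rank)


if torch.cuda.is_available():
    rank = 0
    # Passing the bare integer and passing the explicit device should agree.
    t_int = torch.rand(2).to(rank)
    t_dev = torch.rand(2).to(describe_device(rank))
    assert t_int.device == t_dev.device == torch.device("cuda:0")

    if torch.cuda.device_count() >= 2:
        # .cuda() follows the *current* device, which is not always cuda:0.
        with torch.cuda.device(1):
            assert torch.rand(2).cuda().device == torch.device("cuda:1")
else:
    print("CUDA not available; skipping the GPU checks.")
```

In the DDP example, each spawned process receives its own rank as the first argument, so `.to(rank)` places each replica on a different GPU, assuming there is at least one GPU per process.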