I'm using JetBrains Datalore to train a basic VGG16 CNN, but when I run the training on a GPU machine I get the following error:
Traceback (most recent call last):
at block 20, line 1
at /data/workspace_files/train/trainer/training.py, line 115, in train(self, max_epochs)
at /data/workspace_files/train/trainer/training.py, line 46, in train_epoch(self, train_loader)
at /data/workspace_files/train/trainer/training.py, line 94, in forward_to_loss(self, step_images, step_labels)
at /opt/python/envs/default/lib/python3.8/site-packages/torch/nn/modules/module.py, line 1102, in _call_impl(self, *input, **kwargs)
at /data/workspace_files/models/vgg.py, line 49, in forward(self, x)
at /opt/python/envs/default/lib/python3.8/site-packages/torch/nn/modules/module.py, line 1102, in _call_impl(self, *input, **kwargs)
at /opt/python/envs/default/lib/python3.8/site-packages/torch/nn/modules/container.py, line 141, in forward(self, input)
at /opt/python/envs/default/lib/python3.8/site-packages/torch/nn/modules/module.py, line 1102, in _call_impl(self, *input, **kwargs)
at /opt/python/envs/default/lib/python3.8/site-packages/torch/nn/modules/linear.py, line 103, in forward(self, input)
at /opt/python/envs/default/lib/python3.8/site-packages/torch/nn/functional.py, line 1848, in linear(input, weight, bias)
RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cpu and cuda:0! (when checking argument for argument mat1 in method wrapper_addmm)
Here is the relevant code so you can review it.
use_cuda = torch.cuda.is_available()
device = torch.device("cuda" if use_cuda else "cpu")
model = model.to(device)
In the following fragment I use self.device because I pass the device as a parameter to the Train class:
for _, (data, target) in tqdm(enumerate(train_loader, 1)):
    self.optimizer.zero_grad()
    step_images, step_labels = data.to(self.device), target.to(self.device)
    step_output, loss = self.forward_to_loss(step_images, step_labels)
I haven't had this issue before, so I don't know whether something is missing in Datalore or my code is wrong.
Hope you can help me!
Can you try this?
step_output, loss = self.forward_to_loss(step_images.to(self.device), step_labels.to(self.device))
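If the inputs are already on self.device (the loop in the question does move them), the mismatch may be on the model side instead: the traceback ends in a Linear layer called from models/vgg.py, so it could be worth checking which device each parameter actually lives on. Below is a minimal sketch of such a check, assuming the model is a regular torch.nn.Module. The name devices_by_tensor is a hypothetical helper made up for illustration, not part of PyTorch; it only relies on named_parameters(), named_buffers(), and the .device attribute of tensors.

```python
from collections import defaultdict


def devices_by_tensor(model, **inputs):
    """Group parameter, buffer, and input names by the device they are on.

    Hypothetical debugging helper (not a PyTorch API).
    `model` is expected to behave like a torch.nn.Module; `inputs` are
    tensors passed by keyword, e.g. step_images=step_images.
    """
    found = defaultdict(list)

    # Parameters registered on the module (weights and biases).
    for name, param in model.named_parameters():
        found[str(param.device)].append("param:" + name)

    # Registered buffers (e.g. BatchNorm running statistics).
    for name, buf in model.named_buffers():
        found[str(buf.device)].append("buffer:" + name)

    # Any input tensors the caller wants to compare against the model.
    for name, tensor in inputs.items():
        found[str(tensor.device)].append("input:" + name)

    return dict(found)
```

Called right before forward_to_loss, for example as devices_by_tensor(model, step_images=step_images, step_labels=step_labels), it should return a single key such as 'cuda:0' when everything was moved. More than one key points at the tensors that were left behind. Note that this only sees registered parameters and buffers, so it will not catch tensors created inside forward().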