pytorch azure-virtual-machine

PyTorch doesn't detect GPU on Azure VM


It’s my first time trying to use PyTorch with a GPU and I’m struggling.

I’m using an Azure Virtual Machine (NCasT4_v3-series), which requires the NVIDIA GPU drivers to be installed manually as detailed here: https://learn.microsoft.com/en-us/azure/virtual-machines/windows/n-series-driver-setup. This appeared to succeed.

Then I had to download and install CUDA 12.1 from here: https://developer.nvidia.com/cuda-downloads. This also seemed to work.

Finally I installed Miniconda and recreated an existing environment (I’m transferring an existing project to this VM). I then ran the install command from https://pytorch.org/get-started/locally/, i.e. conda install pytorch torchvision torchaudio pytorch-cuda=12.1 -c pytorch -c nvidia, which also completed without errors.

But after all that, torch.cuda.is_available() returns False.

I’m at a loss as to what to do next or which troubleshooting steps to try. Any help would be much appreciated.


Solution

  • I've managed to get this working, although I still don't have any insight into what the problem was. On logging back into the machine I found a problem with the NVIDIA device driver.

    I uninstalled everything I had installed manually, and instead added the NVIDIA drivers via the Azure Portal as detailed here.
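For anyone hitting the same symptom, a small check like the one below may help narrow down whether the fault is at the driver level or in the PyTorch install. It is only a sketch: the function name diagnose_gpu and the report keys are made up for illustration, and it assumes nvidia-smi is on the PATH when the driver is installed, which may not hold on every Windows setup.

```python
import importlib
import shutil
import subprocess


def diagnose_gpu():
    """Collect basic facts about the NVIDIA driver and the installed PyTorch build.

    Hypothetical helper for illustration; the key names are arbitrary.
    """
    report = {}

    # 1. Driver level: nvidia-smi ships with the NVIDIA driver. If it is
    #    missing or exits with an error, the problem is likely below PyTorch.
    #    Assumption: the tool is on PATH when the driver is installed.
    smi = shutil.which("nvidia-smi")
    report["nvidia_smi_found"] = smi is not None
    report["nvidia_smi_ok"] = False
    if smi is not None:
        try:
            proc = subprocess.run(
                [smi], capture_output=True, text=True, timeout=30
            )
            report["nvidia_smi_ok"] = proc.returncode == 0
        except (OSError, subprocess.TimeoutExpired):
            pass  # leave nvidia_smi_ok as False

    # 2. PyTorch level: check which build is installed in this environment.
    try:
        torch = importlib.import_module("torch")
    except (ImportError, OSError):
        report["torch_installed"] = False
        return report

    report["torch_installed"] = True
    report["torch_version"] = torch.__version__
    # torch.version.cuda is None when the installed build is CPU-only.
    report["torch_built_with_cuda"] = torch.version.cuda
    report["cuda_available"] = torch.cuda.is_available()
    return report


if __name__ == "__main__":
    for key, value in diagnose_gpu().items():
        print(f"{key}: {value}")
```

If nvidia_smi_ok is False, the driver is the first thing to look at, which matches what I eventually found. If the driver looks fine but torch_built_with_cuda is None, the environment contains a CPU-only PyTorch build and reinstalling the driver will not change the result.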