PyTorch CUDA: the provided PTX was compiled with an unsupported toolchain


I am using an Nvidia V100 with the following specs:

(pytorch) [s.1915438@cl1 aneurysm]$ srun nvidia-smi
Sun Jul 17 16:17:27 2022       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 495.29.05    Driver Version: 495.29.05    CUDA Version: 11.5     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  Tesla V100-PCIE...  On   | 00000000:D8:00.0 Off |                    0 |
| N/A   31C    P0    25W / 250W |      0MiB / 16160MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+

The Python, PyTorch and CUDA versions are as follows:

Python 3.8.13 (default, Mar 28 2022, 11:38:47) 
[GCC 7.5.0] :: Anaconda, Inc. on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import torch
>>> torch.__version__
'1.12.0+cu113'

When I run a Python file containing a machine learning model, I get the following error:

(pytorch) [s.1915438@cl1 aneurysm]$ srun python aneurysm.py
terminate called after throwing an instance of 'std::runtime_error'
  what():  the provided PTX was compiled with an unsupported toolchain.
srun: error: ccs2114: task 0: Aborted

Is this some kind of compatibility issue? Should I fall back to CUDA 10.2, since the V100 is a fairly old GPU?


Solution

  • Anyone using an old GPU on an HPC cluster is probably out of luck unless the driver can be updated. In my case the cluster had Nvidia driver 495, which is not very old; it is the driver release that reports CUDA 11.5 in the nvidia-smi output above.

    Nvidia's official reply to a similar problem also recommends updating the driver. Most of the time, HPC centres won't update the driver at an individual user's request.
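
As a rough diagnostic, this error usually means some PTX was generated by a CUDA toolkit newer than the installed driver supports. The sketch below compares the CUDA version that nvidia-smi reports for the driver with the CUDA version encoded in the PyTorch wheel suffix. The helper names are hypothetical, and it assumes the wheel version ends in a "+cuXYZ" suffix like the one shown in the question; it only inspects strings and does not query the GPU.

```python
import re


def driver_cuda_version(nvidia_smi_text):
    """Return (major, minor) from the 'CUDA Version: X.Y' field of nvidia-smi, or None."""
    match = re.search(r"CUDA Version:\s*(\d+)\.(\d+)", nvidia_smi_text)
    return (int(match.group(1)), int(match.group(2))) if match else None


def wheel_cuda_version(torch_version):
    """Return (major, minor) from a '+cuXYZ' wheel suffix, or None if absent.

    Assumption: the last digit is the minor version, e.g. 'cu113' means 11.3.
    """
    match = re.search(r"\+cu(\d+)(\d)$", torch_version)
    return (int(match.group(1)), int(match.group(2))) if match else None


def toolkit_newer_than_driver(toolkit, driver):
    """True when the toolkit version exceeds what the driver reports supporting."""
    return toolkit > driver


# Values copied from the question.
smi_header = "| NVIDIA-SMI 495.29.05    Driver Version: 495.29.05    CUDA Version: 11.5     |"
driver = driver_cuda_version(smi_header)
wheel = wheel_cuda_version("1.12.0+cu113")
print("driver supports CUDA", driver)
print("wheel built for CUDA", wheel)
print("wheel newer than driver:", toolkit_newer_than_driver(wheel, driver))
```

For the versions in the question, the wheel (11.3) is not newer than what the driver reports (11.5), so this check alone does not explain the error. One possibility, stated here only as an assumption, is that the offending PTX comes from another library or JIT-compiled kernel used by the script that was built with a newer toolkit.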