I need to test this AI model:
https://github.com/sicxu/Deep3DFaceRecon_pytorch
on a CUDA server with the following configuration:
$ nvidia-smi
Tue Jun 18 18:28:37 2024
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 545.23.08              Driver Version: 545.23.08    CUDA Version: 12.3     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  NVIDIA GeForce RTX 3060        On  | 00000000:41:00.0 Off |                  N/A |
|  0%   40C    P8              13W / 170W |     39MiB / 12288MiB |      0%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+

+---------------------------------------------------------------------------------------+
| Processes:                                                                            |
|  GPU   GI   CI        PID   Type   Process name                            GPU Memory |
|        ID   ID                                                             Usage      |
|=======================================================================================|
|    0   N/A  N/A      1078      G   /usr/lib/xorg/Xorg                           16MiB |
|    0   N/A  N/A      1407      G   /usr/bin/gnome-shell                          3MiB |
+---------------------------------------------------------------------------------------+
But I'm receiving this warning while testing:
/home/arisa/.conda/envs/deep3d_pytorch/lib/python3.6/site-packages/torch/cuda/__init__.py:125: UserWarning: NVIDIA GeForce RTX 3060 with CUDA capability sm_86 is not compatible with the current PyTorch installation. The current PyTorch install supports CUDA capabilities sm_37 sm_50 sm_60 sm_61 sm_70 sm_75 compute_37. If you want to use the NVIDIA GeForce RTX 3060 GPU with PyTorch, please check the instructions at https://pytorch.org/get-started/locally/
I receive this error after the above warning:
RuntimeError: CUDA error: no kernel image is available for execution on the device
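For context, the failure mode can be sketched in plain Python. This is illustrative logic, not a PyTorch API: a CUDA binary (cubin) built for sm_XY only runs on devices with the same major compute capability and a minor version >= Y, so a build whose newest binary target is sm_75 has no kernel image for an sm_86 device.

```python
# Illustrative sketch (not a PyTorch API): why "no kernel image is
# available" occurs. A cubin built for sm_XY runs only on devices with
# the same major compute capability and a minor version >= Y.

# Binary targets listed in the warning above (PyTorch 1.6.0 build)
PYTORCH_160_CUBINS = ["sm_37", "sm_50", "sm_60", "sm_61", "sm_70", "sm_75"]

def cubin_runs_on(arch: str, device_major: int, device_minor: int) -> bool:
    """True if a cubin built for `arch` (e.g. 'sm_75') can run on the device."""
    arch_major, arch_minor = int(arch[3]), int(arch[4:])
    return arch_major == device_major and arch_minor <= device_minor

# RTX 3060 is compute capability 8.6 (sm_86): no major-8 cubin in the build
rtx3060_ok = any(cubin_runs_on(a, 8, 6) for a in PYTORCH_160_CUBINS)
print("RTX 3060 has a runnable kernel image:", rtx3060_ok)  # False

# A Colab Tesla T4 is compute capability 7.5, which sm_75 covers exactly
t4_ok = any(cubin_runs_on(a, 7, 5) for a in PYTORCH_160_CUBINS)
print("Tesla T4 has a runnable kernel image:", t4_ok)  # True
```

This also shows why the CUDA 12.2 vs. 12.3 difference is a red herring: the check depends on the GPU's compute capability and the build's binary targets, not on the toolkit version installed on the machine.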
I was able to test the same AI model on Google Colab with CUDA 12.2 without any problem, so I'm not sure why the server with CUDA 12.3 is the troublemaker. Why does CUDA 12.2 work fine while 12.3 throws warnings and errors?
So I thought I would build PyTorch 1.6.0 (the version required by the AI model) from source against CUDA 12.3. Is it possible to build PyTorch 1.6.0 with CUDA 12.3 without patching the source code?
As commented by @talonmies:
This has nothing to do with “CUDA versions”, which is irrelevant. The problem is clearly described in the error message - the Colab case is using a Tesla T4 (compute capability 7.5) for which the Pytorch build you have includes binary support, whereas the other GPU is a compute capability 8.6 device and there is no binary support in the same Pytorch build. There is nothing to do except get a build of the PyTorch version you want to use with compute 8.6 binary support included, if such a thing exists
I'm going to find the earliest PyTorch releases that include compute capability 8.6 support and then test whether the AI model can run with them...
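For each candidate build, a quick check is to query the compile-time targets at runtime. A sketch, assuming `torch.cuda.get_arch_list()` is available (it exists in recent PyTorch releases but not in very old ones, hence the fallback; the fallback values are just the list from the warning above):

```python
# Sketch: check whether an installed PyTorch build ships sm_86 kernels.
# torch.cuda.get_arch_list() reports the compute targets the build was
# compiled for. Falls back to demo values if torch is missing or too old
# to have that function.
try:
    import torch
    version = torch.__version__
    arch_list = torch.cuda.get_arch_list()
except (ImportError, AttributeError):
    version = "(torch unavailable; demo values from the warning above)"
    arch_list = ["sm_37", "sm_50", "sm_60", "sm_61", "sm_70", "sm_75"]

print(f"PyTorch {version} targets: {arch_list}")
print("sm_86 (RTX 3060) binary support:", "sm_86" in arch_list)
```

Running this in each candidate environment avoids loading the model at all: if `sm_86` is absent from the list, the build will hit the same "no kernel image" error on the RTX 3060.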