python pytorch openmmlab

OSError: CUDA_HOME environment variable is not set. Please set it to your CUDA install root


I have a strange problem that started occurring only today in my GitHub workflow. These are the relevant commands:

pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu117
pip3 install mmengine==0.6.0 mmcv==2.0.0rc3 mmdet==3.0.0rc5 mmaction2==1.0rc3

The first command succeeds. The second stops with the following error:

Collecting mmengine==0.6.0
  Using cached mmengine-0.6.0-py3-none-any.whl (360 kB)
Collecting mmcv==2.0.0rc3
  Using cached mmcv-2.0.0rc3.tar.gz (424 kB)
  Preparing metadata (setup.py) ... error
  error: subprocess-exited-with-error
  
  × python setup.py egg_info did not run successfully.
  │ exit code: 1
  ╰─> [18 lines of output]
      Traceback (most recent call last):
        File "<string>", line 2, in <module>
        File "<pip-setuptools-caller>", line 34, in <module>
        File "/tmp/pip-install-uml22xq3/mmcv_89a43e000b91495e88399ffe3c493514/setup.py", line 329, in <module>
          ext_modules=get_extensions(),
                      ^^^^^^^^^^^^^^^^
        File "/tmp/pip-install-uml22xq3/mmcv_89a43e000b91495e88399ffe3c493514/setup.py", line 290, in get_extensions
          ext_ops = extension(
                    ^^^^^^^^^^
        File "/home/github/.pyenv/versions/miniconda3-3.10-22.11.1-1/envs/heavi-analytic/lib/python3.11/site-packages/torch/utils/cpp_extension.py", line 1048, in CUDAExtension
          library_dirs += library_paths(cuda=True)
                          ^^^^^^^^^^^^^^^^^^^^^^^^
        File "/home/github/.pyenv/versions/miniconda3-3.10-22.11.1-1/envs/heavi-analytic/lib/python3.11/site-packages/torch/utils/cpp_extension.py", line 1179, in library_paths
          if (not os.path.exists(_join_cuda_home(lib_dir)) and
                                 ^^^^^^^^^^^^^^^^^^^^^^^^
        File "/home/github/.pyenv/versions/miniconda3-3.10-22.11.1-1/envs/heavi-analytic/lib/python3.11/site-packages/torch/utils/cpp_extension.py", line 2223, in _join_cuda_home
          raise EnvironmentError('CUDA_HOME environment variable is not set. '
      OSError: CUDA_HOME environment variable is not set. Please set it to your CUDA install root.
      [end of output]
  
  note: This error originates from a subprocess, and is likely not a problem with pip.
error: metadata-generation-failed

× Encountered error while generating package metadata.
╰─> See above for output.

note: This is an issue with the package mentioned above, not pip.
hint: See above for details.

Any ideas?

UPDATE 1: It turns out that the installed PyTorch version is 2.0.0, which is not the version I want.
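One way to catch this kind of silent upgrade early is to check the installed torch version right after the install step. The following is only a minimal sketch using the standard library; the helper names `major_version` and `check_torch_major` are hypothetical and not part of torch or pip.

```python
from importlib import metadata


def major_version(version: str) -> int:
    """Return the major component of a version string such as '1.13.1+cu117'."""
    public = version.split("+", 1)[0]  # drop a local tag like '+cu117'
    return int(public.split(".", 1)[0])


def check_torch_major(expected_major: int = 1) -> str:
    """Return the installed torch version, or raise if its major version differs.

    Hypothetical helper: it only reads package metadata and does not import torch.
    """
    try:
        installed = metadata.version("torch")
    except metadata.PackageNotFoundError:
        raise RuntimeError("torch is not installed in this environment") from None
    if major_version(installed) != expected_major:
        raise RuntimeError(f"expected torch {expected_major}.x, found {installed}")
    return installed
```

Calling `check_torch_major(1)` as a workflow step before installing mmcv would make the build fail with a clear message instead of the CUDA_HOME traceback.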


Solution

  • It turns out that torch 2.0 was released yesterday (March 15). Because the install command does not pin a version, the continuous build automatically picked up the latest torch.

    Pinning the torch version fixes everything:

    pip3 install torch==1.13.1+cu117 torchvision==0.14.1+cu117 \
      torchaudio==0.13.1 --extra-index-url https://download.pytorch.org/whl/cu117
    

    It installs torch 1.13.1 built for CUDA 11.7:

    Looking in indexes: https://pypi.org/simple, https://download.pytorch.org/whl/cu117
    Collecting torch==1.13.1+cu117
      Using cached https://download.pytorch.org/whl/cu117/torch-1.13.1%2Bcu117-cp310-cp310-linux_x86_64.whl (1801.8 MB)
    Collecting torchvision==0.14.1+cu117
      Using cached https://download.pytorch.org/whl/cu117/torchvision-0.14.1%2Bcu117-cp310-cp310-linux_x86_64.whl (24.3 MB)
    Collecting torchaudio==0.13.1
      Using cached https://download.pytorch.org/whl/cu117/torchaudio-0.13.1%2Bcu117-cp310-cp310-linux_x86_64.whl (4.2 MB)
    Collecting typing-extensions
      Using cached typing_extensions-4.5.0-py3-none-any.whl (27 kB)
    Collecting pillow!=8.3.*,>=5.3.0
      Using cached Pillow-9.4.0-cp310-cp310-manylinux_2_28_x86_64.whl (3.4 MB)
    Requirement already satisfied: requests in /home/github/.pyenv/versions/miniconda3-3.10-22.11.1-1/lib/python3.10/site-packages (from torchvision==0.14.1+cu117) (2.28.1)
    Collecting numpy
      Using cached numpy-1.24.2-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (17.3 MB)
    Requirement already satisfied: certifi>=2017.4.17 in /home/github/.pyenv/versions/miniconda3-3.10-22.11.1-1/lib/python3.10/site-packages (from requests->torchvision==0.14.1+cu117) (2022.12.7)
    Requirement already satisfied: urllib3<1.27,>=1.21.1 in /home/github/.pyenv/versions/miniconda3-3.10-22.11.1-1/lib/python3.10/site-packages (from requests->torchvision==0.14.1+cu117) (1.26.13)
    Requirement already satisfied: charset-normalizer<3,>=2 in /home/github/.pyenv/versions/miniconda3-3.10-22.11.1-1/lib/python3.10/site-packages (from requests->torchvision==0.14.1+cu117) (2.0.4)
    Requirement already satisfied: idna<4,>=2.5 in /home/github/.pyenv/versions/miniconda3-3.10-22.11.1-1/lib/python3.10/site-packages (from requests->torchvision==0.14.1+cu117) (3.4)
    Installing collected packages: typing-extensions, pillow, numpy, torch, torchvision, torchaudio
    Successfully installed numpy-1.24.2 pillow-9.4.0 torch-1.13.1+cu117 torchaudio-0.13.1+cu117 torchvision-0.14.1+cu117 typing-extensions-4.5.0
    

    EDIT 1:

    Sometimes pip3 does not succeed. In that case, use conda instead:

    conda install pytorch==1.13.1 torchvision==0.14.1 \
      torchaudio==0.13.1 cudatoolkit=11.7 pytorch-cuda=11.7 -c pytorch -c nvidia
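
If the CUDA_HOME error still appears after pinning, it may help to check whether a CUDA toolkit can be found on the build machine at all. The sketch below approximates, as far as I understand it, the lookup order torch uses on Linux: the CUDA_HOME or CUDA_PATH environment variables, then `nvcc` on PATH, then /usr/local/cuda. The function name `find_cuda_home` is hypothetical, and the real logic in `torch/utils/cpp_extension.py` may differ between versions.

```python
import os
import shutil
from typing import Optional


def find_cuda_home(default: str = "/usr/local/cuda") -> Optional[str]:
    """Guess the CUDA install root, or return None if nothing is found.

    Assumed lookup order: environment variables, nvcc on PATH, default path.
    """
    home = os.environ.get("CUDA_HOME") or os.environ.get("CUDA_PATH")
    if home:
        return home
    nvcc = shutil.which("nvcc")
    if nvcc:
        # nvcc normally lives in <cuda_home>/bin/nvcc, so go up two levels
        return os.path.dirname(os.path.dirname(nvcc))
    return default if os.path.exists(default) else None
```

A result of `None` suggests that no toolkit is installed or discoverable, which is the situation in which the error in the question is raised.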