python linux gpu deviceid

How to find the NVIDIA GPU IDs for a PyTorch CUDA setup?


A question I often hear from new data scientists and enthusiasts is how to find the GPU IDs to use in PyTorch code. A typical script only checks whether any CUDA device is available:

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

The GPU IDs can be found with the code below.


Solution

  • import sys
    from subprocess import call

    import torch

    # Interpreter and library versions
    print('__Python VERSION:', sys.version)
    print('__pyTorch VERSION:', torch.__version__)
    print('__CUDA VERSION')
    # call(["nvcc", "--version"]) does not work, so the shell escape is used instead.
    # Note: "!" is IPython/Jupyter syntax and is not valid in a plain .py script.
    ! nvcc --version
    print('__CUDNN VERSION:', torch.backends.cudnn.version())
    print('__Number CUDA Devices:', torch.cuda.device_count())
    # One CSV row per GPU; the "index" column is the GPU ID
    print('__Devices')
    call(["nvidia-smi", "--format=csv", "--query-gpu=index,name,driver_version,memory.total,memory.used,memory.free"])
    # Device that PyTorch currently uses by default
    print('Active CUDA Device: GPU', torch.cuda.current_device())
    print('Available devices ', torch.cuda.device_count())
    print('Current cuda device ', torch.cuda.current_device())
    

    This produces output like the following:

    __Python VERSION: 3.8.13 | packaged by conda-forge | (default, Mar 25 2022, 06:04:10) 
    [GCC 10.3.0]
    __pyTorch VERSION: 1.12.0a0+bd13bc6
    __CUDA VERSION
    nvcc: NVIDIA (R) Cuda compiler driver
    Copyright (c) 2005-2022 NVIDIA Corporation
    Built on Tue_Mar__8_18:18:20_PST_2022
    Cuda compilation tools, release 11.6, V11.6.124
    Build cuda_11.6.r11.6/compiler.31057947_0
    __CUDNN VERSION: 8400
    __Number CUDA Devices: 2
    __Devices
    index, name, driver_version, memory.total [MiB], memory.used [MiB], memory.free [MiB]
    
    0, Tesla V100-SXM2-32GB, 470.103.01, 32510 MiB, 3381 MiB, 29129 MiB
    1, Tesla V100-SXM2-32GB, 470.103.01, 32510 MiB, 684 MiB, 31826 MiB
    
    Active CUDA Device: GPU 0
    Available devices  2
    Current cuda device  0
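
The nvidia-smi table above is only printed, not returned to Python. As a minimal sketch, assuming nvidia-smi is on the PATH, the same query can be captured and parsed so the GPU IDs are available as data. The helper names `parse_gpu_csv` and `query_gpus` are hypothetical and chosen only for this illustration.

```python
import csv
import io
import subprocess

# Same fields as the nvidia-smi call in the snippet above.
QUERY_FIELDS = "index,name,driver_version,memory.total,memory.used,memory.free"


def parse_gpu_csv(text):
    """Turn nvidia-smi CSV text (header row first) into a list of dicts."""
    reader = csv.reader(io.StringIO(text), skipinitialspace=True)
    rows = [[cell.strip() for cell in row] for row in reader]
    rows = [row for row in rows if any(row)]  # drop blank lines
    header, records = rows[0], rows[1:]
    return [dict(zip(header, row)) for row in records]


def query_gpus():
    """Run nvidia-smi and return one dict per GPU (needs nvidia-smi on PATH)."""
    result = subprocess.run(
        ["nvidia-smi", "--format=csv", f"--query-gpu={QUERY_FIELDS}"],
        capture_output=True, text=True, check=True,
    )
    return parse_gpu_csv(result.stdout)
```

On a machine with NVIDIA drivers installed, `for gpu in query_gpus(): print(gpu["index"], gpu["name"])` would list each GPU ID next to its name.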
    

    Alternatively, list the devices with PyCUDA:

    import pycuda.driver as drv

    # Initialize the CUDA driver API before querying devices
    drv.init()
    print("%d device(s) found." % drv.Device.count())

    # Print each device ordinal (its GPU ID) and name
    for ordinal in range(drv.Device.count()):
        dev = drv.Device(ordinal)
        print(ordinal, dev.name())
    

    This produces the following output:

    2 device(s) found.
    0 Tesla V100-SXM2-32GB
    1 Tesla V100-SXM2-32GB
    

    If Python reports that the pycuda module cannot be found, install it with pip:

    pip install pycuda
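
Once the IDs are known, one way to map a specific GPU in PyTorch code is to pass its index in the device string, for example "cuda:1". The sketch below assumes that approach. The helper names `device_string` and `select_device` are hypothetical, and the fallback to the CPU is a design choice for illustration, not something the snippets above do.

```python
def device_string(gpu_id, device_count):
    """Return "cuda:<gpu_id>" if that index exists, otherwise "cpu"."""
    if 0 <= gpu_id < device_count:
        return f"cuda:{gpu_id}"
    return "cpu"


def select_device(gpu_id):
    """Build a torch.device for the requested GPU ID, falling back to the CPU."""
    import torch  # imported here so device_string stays usable without PyTorch

    count = torch.cuda.device_count() if torch.cuda.is_available() else 0
    device = torch.device(device_string(gpu_id, count))
    if device.type == "cuda":
        print("Using GPU", gpu_id, torch.cuda.get_device_name(gpu_id))
    return device
```

With the two V100 cards listed earlier, `device = select_device(1)` would give a device for the second GPU, which can then be used with `model.to(device)`. Note that the CUDA runtime's default device ordering is not guaranteed to match the order nvidia-smi reports; setting the environment variable `CUDA_DEVICE_ORDER=PCI_BUS_ID` before starting Python is the usual way to align them.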