cuda numba-pro

NVVM_ERROR_INVALID_OPTION when using a CUDA kernel with the NumbaPro API


I want to execute a CUDA kernel in Python using the NumbaPro API. I have this code:

import math
from timeit import default_timer as timer  # timer() is called in main()
import numpy
from numbapro import jit, cuda, int32, float32
from matplotlib import pyplot

@cuda.jit('void(float32[:], float32[:], float32[:], float32[:], float32, float32, float32, int32)')
def calculate_velocity_field(X, Y, u_source, v_source, x_source, y_source, strength_source, N):
    start  = cuda.blockIdx.x * cuda.blockDim.x + cuda.threadIdx.x
    end    = N
    stride = cuda.gridDim.x * cuda.blockDim.x
    for i in range(start, end, stride):
        u_source[i] = strength_source/(2*math.pi) * (X[i]-x_source)/((X[i]-x_source)**2 + (Y[i]-y_source)**2)
        v_source[i] = strength_source/(2*math.pi) * (Y[i]-y_source)/((X[i]-x_source)**2 + (Y[i]-y_source)**2)


def main():
    N = 200                                # number of points in each direction
    x_start, x_end = -4.0, 4.0            # boundaries in the x-direction
    y_start, y_end = -2.0, 2.0            # boundaries in the y-direction
    x = numpy.linspace(x_start, x_end, N)    # creates a 1D-array with the x-coordinates
    y = numpy.linspace(y_start, y_end, N)    # creates a 1D-array with the y-coordinates

    X, Y = numpy.meshgrid(x, y)              # generates a mesh grid

    strength_source = 5.0                      # source strength
    x_source, y_source = -1.0, 0.0             # location of the source

    start = timer()

    #calculate grid dimensions
    blockSize = 1024
    gridSize  = int(math.ceil(float(N)/blockSize))

    #transfer memory to device
    X_d        = cuda.to_device(X)
    Y_d        = cuda.to_device(Y)
    u_source_d = cuda.device_array_like(X)
    v_source_d = cuda.device_array_like(Y)

    #launch kernel
    calculate_velocity_field[gridSize,blockSize](X_d,Y_d,u_source_d,v_source_d,x_source,y_source,strength_source,N)

    #transfer memory to host
    u_source = numpy.empty_like(X)
    v_source = numpy.empty_like(Y)
    u_source_d.to_host(u_source)
    v_source_d.to_host(v_source)

    elapsed_time = timer() - start
    print("Exec time with GPU %f s" % elapsed_time)

if __name__ == "__main__":
    main()

It gives me this error:

NvvmError                                 Traceback (most recent call last)
<ipython-input-17-85e4a6e56a14> in <module>()
----> 1 @cuda.jit('void(float32[:], float32[:], float32[:], float32[:], float32, float32, float32, int32)')
      2 def calculate_velocity_field(X, Y, u_source, v_source, x_source, y_source, strength_source, N):
      3     start  = cuda.blockIdx.x * cuda.blockDim.x + cuda.threadIdx.x
      4     end    = N
      5     stride = cuda.gridDim.x * cuda.blockDim.x

~/.anaconda3/lib/python3.4/site-packages/numba/cuda/decorators.py in kernel_jit(func)
     89             # Force compilation for the current context
     90             if bind:
---> 91                 kernel.bind()
     92 
     93             return kernel

~/.anaconda3/lib/python3.4/site-packages/numba/cuda/compiler.py in bind(self)
    319         Force binding to current CUDA context
    320         """
--> 321         self._func.get()
    322 
    323     @property

~/.anaconda3/lib/python3.4/site-packages/numba/cuda/compiler.py in get(self)
    254         cufunc = self.cache.get(device.id)
    255         if cufunc is None:
--> 256             ptx = self.ptx.get()
    257 
    258             # Link

~/.anaconda3/lib/python3.4/site-packages/numba/cuda/compiler.py in get(self)
    226             arch = nvvm.get_arch_option(*cc)
    227             ptx = nvvm.llvm_to_ptx(self.llvmir, opt=3, arch=arch,
--> 228                                    **self._extra_options)
    229             self.cache[cc] = ptx
    230             if config.DUMP_ASSEMBLY:

~/.anaconda3/lib/python3.4/site-packages/numba/cuda/cudadrv/nvvm.py in llvm_to_ptx(llvmir, **opts)
    420     cu.add_module(llvmir.encode('utf8'))
    421     cu.add_module(libdevice.get())
--> 422     ptx = cu.compile(**opts)
    423     return ptx
    424 

~/.anaconda3/lib/python3.4/site-packages/numba/cuda/cudadrv/nvvm.py in compile(self, **options)
    211                                           for x in opts])
    212         err = self.driver.nvvmCompileProgram(self._handle, len(opts), c_opts)
--> 213         self._try_error(err, 'Failed to compile\n')
    214 
    215         # get result

~/.anaconda3/lib/python3.4/site-packages/numba/cuda/cudadrv/nvvm.py in _try_error(self, err, msg)
    229 
    230     def _try_error(self, err, msg):
--> 231         self.driver.check_error(err, "%s\n%s" % (msg, self.get_log()))
    232 
    233     def get_log(self):

~/.anaconda3/lib/python3.4/site-packages/numba/cuda/cudadrv/nvvm.py in check_error(self, error, msg, exit)
    118                 sys.exit(1)
    119             else:
--> 120                 raise exc
    121 
    122 

NvvmError: Failed to compile

libnvvm : error: -arch=compute_52 is an unsupported option
NVVM_ERROR_INVALID_OPTION

I tried other NumbaPro examples and the same error occurs. I don't know whether it's a bug in NumbaPro, which may not support compute capability 5.2, or a problem with NVIDIA's NVVM. Any suggestions?

In theory it should be supported, but I don't know what is happening.

I'm using Linux with CUDA 7.0 and driver version 346.29.


Solution

  • I finally found a solution. My first attempt was to update the toolkit through conda:

    conda update cudatoolkit

    Fetching package metadata: ....
    # All requested packages already installed.
    # packages in environment at ~/.anaconda3:
    #
    cudatoolkit               6.0                          p0
    

    It looks like updating the CUDA toolkit this way does not move it to CUDA 7.0: conda still reports version 6.0 as current. Installing the package from the numba channel does work:

    conda install -c numba cudatoolkit

    Fetching package metadata: ...... 
    Solving package specifications: .
    Package plan for installation in environment ~/.anaconda3:
    
    The following packages will be downloaded:
         package                    |            build
        ---------------------------|-----------------
        cudatoolkit-7.0            |                1       190.8 MB
    
    The following packages will be UPDATED:
    
        cudatoolkit: 6.0-p0 --> 7.0-1
    
    Proceed ([y]/n)? y
    

    Before:

    In [4]: check_cuda()
    ------------------------------libraries detection-------------------------------
    Finding cublas
        located at ~/.anaconda3/lib/libcublas.so.6.0.37
        trying to open library...   ok
    Finding cusparse
        located at ~/.anaconda3/lib/libcusparse.so.6.0.37
        trying to open library...   ok
    Finding cufft
        located at ~/.anaconda3/lib/libcufft.so.6.0.37
        trying to open library...   ok
    Finding curand
        located at ~/.anaconda3/lib/libcurand.so.6.0.37
        trying to open library...   ok
    Finding nvvm
        located at ~/.anaconda3/lib/libnvvm.so.2.0.0
        trying to open library...   ok
        finding libdevice for compute_20... ok
        finding libdevice for compute_30... ok
        finding libdevice for compute_35... ok
    -------------------------------hardware detection-------------------------------
    Found 1 CUDA devices
    id 0      b'GeForce GTX 970'                              [SUPPORTED]
                          compute capability: 5.2
                               pci device id: 0
                                  pci bus id: 7
    Summary:
        1/1 devices are supported
    PASSED
    Out[4]: True
    

    After:

    In [6]:  check_cuda()
    ------------------------------libraries detection-------------------------------
    Finding cublas
        located at ~/.anaconda3/lib/libcublas.so.7.0.28
        trying to open library...   ok
    Finding cusparse
        located at ~/.anaconda3/lib/libcusparse.so.7.0.28
        trying to open library...   ok
    Finding cufft
        located at ~/.anaconda3/lib/libcufft.so.7.0.35
        trying to open library...   ok
    Finding curand
        located at ~/.anaconda3/lib/libcurand.so.7.0.28
        trying to open library...   ok
    Finding nvvm
        located at ~/.anaconda3/lib/libnvvm.so.3.0.0
        trying to open library...   ok
        finding libdevice for compute_20... ok
        finding libdevice for compute_30... ok
        finding libdevice for compute_35... ok
    -------------------------------hardware detection-------------------------------
    Found 1 CUDA devices
    id 0      b'GeForce GTX 970'                              [SUPPORTED]
                          compute capability: 5.2
                               pci device id: 0
                                  pci bus id: 7
    Summary:
        1/1 devices are supported
    PASSED
    Out[6]:  True
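
The two reports above differ only in the library versions, which is easy to miss by eye. Here is a minimal sketch for comparing them, assuming the report text has been copied into a string. The helper names `library_versions` and `changed_libraries` are hypothetical and not part of NumbaPro; the code only parses text and never touches the GPU.

```python
import re

# Matches lines such as "located at ~/.anaconda3/lib/libnvvm.so.2.0.0"
# and captures the library name ("nvvm") and its version ("2.0.0").
LIB_LINE = re.compile(r"located at \S*/lib(\w+)\.so\.([\d.]+)")


def library_versions(report):
    """Map each library named in a check_cuda() report to its version string."""
    return {name: version for name, version in LIB_LINE.findall(report)}


def changed_libraries(before, after):
    """Return {name: (old, new)} for libraries whose version differs."""
    old, new = library_versions(before), library_versions(after)
    return {name: (old[name], new[name])
            for name in old
            if name in new and old[name] != new[name]}


# Abbreviated excerpts of the two reports shown above.
BEFORE = """\
Finding cublas
    located at ~/.anaconda3/lib/libcublas.so.6.0.37
Finding nvvm
    located at ~/.anaconda3/lib/libnvvm.so.2.0.0
"""

AFTER = """\
Finding cublas
    located at ~/.anaconda3/lib/libcublas.so.7.0.28
Finding nvvm
    located at ~/.anaconda3/lib/libnvvm.so.3.0.0
"""

for name, (old, new) in sorted(changed_libraries(BEFORE, AFTER).items()):
    print("%-8s %s -> %s" % (name, old, new))
```

On these excerpts it reports that libnvvm moved from 2.0.0 to 3.0.0. That matches the original error: the libnvvm shipped with cudatoolkit 6.0 rejected `-arch=compute_52`, and the kernel compiled once the 7.0 package was installed.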