Tags: python, cuda, index-error, pycuda

IndexError: "invalid subindex in axis 0" with PyCUDA


import math # all the libraries i import
import numpy as np
!pip install pycuda
import pycuda.gpuarray as gpu
import pycuda.cumath as cm

import pycuda.autoinit
import pycuda.driver as drv
from pycuda.compiler import SourceModule

I get an error when using PyCUDA GPUArrays inside a for loop. I defined a function PropagatorS that uses nested for loops; it ran fine when I was just using numpy, but fails after switching to CUDA.

def PropagatorS(N, L, area, z0):

  p = gpu.zeros((N,N), dtype = 'complex_')

  for ii in gpu.arange(0, N, 1):
      for jj in gpu.arange(0, N, 1):
          u = (ii - N/2 - 1)/area
          v = (jj - N/2 - 1)/area
          p[ii, jj] = cm.exp(1j*np.pi*L*z0*(u**2 + v**2))
  return p

Trying with some values:

p = PropagatorS(200, 700*10**-9, 0.002, 0.08)

returns "IndexError: invalid subindex in axis 0". The error happens in this line:

--->    p[ii, jj] = cm.exp(1j*np.pi*L*z0*(u**2 + v**2))

I am using Colab to run this code. I can't find any troubleshooting threads on this, hopefully someone can help. :)


Solution

  • This is not how you use cumath.

    cumath functions like exp take an entire array as their argument and perform the operation element-wise on that array. There is no need for the doubly-nested for loops.

    So:

    math.exp takes a scalar argument and returns e raised to the power of that argument.

    cumath.exp takes an input array and returns an array of the same shape, where each element of the returned array is e raised to the power of the corresponding element of the input array.

    Here is a trivial example:

    $ cat t31.py
    import math # all the libraries i import
    import numpy as np
    import pycuda.gpuarray as gpu
    import pycuda.cumath as cm
    
    import pycuda.autoinit
    import pycuda.driver as drv
    from pycuda.compiler import SourceModule
    
    def PropagatorS(N):
    
      q = gpu.zeros((N,N), dtype = np.float32)
      p = gpu.ones_like(q)
      cm.exp(p, out=q)
      return q
    
    
    p = PropagatorS(4)
    print(p)
    $ LD_LIBRARY_PATH=/usr/local/cuda-9.0/lib64 python t31.py
    [[ 2.71828175  2.71828175  2.71828175  2.71828175]
     [ 2.71828175  2.71828175  2.71828175  2.71828175]
     [ 2.71828175  2.71828175  2.71828175  2.71828175]
     [ 2.71828175  2.71828175  2.71828175  2.71828175]]
    $
    

    To do what you want, I think you have at least two options:

    1. create an array with your desired exponents in numpy. Transfer that numpy array to a GPU array. Then call cumath.exp on that GPU array.

    2. write a PyCUDA kernel to do it (a sketch of this approach is shown at the end of this answer).

    Here is one possible example of how to do it using method 1, i.e. cumath.exp:

    $ cat t32.py
    import math # all the libraries i import
    import numpy as np
    import pycuda.gpuarray as gpu
    import pycuda.cumath as cm
    
    import pycuda.autoinit
    import pycuda.driver as drv
    from pycuda.compiler import SourceModule
    
    def PropagatorS(N, L, area, z0):
    
      p = np.zeros((N,N), dtype = np.complex64)
    
      for ii in range(0, N, 1):
          for jj in range(0, N, 1):
              u = (ii - N/2 - 1)/area
              v = (jj - N/2 - 1)/area
              p[ii, jj] = 1j*np.pi*L*z0*(u**2 + v**2)
      q = gpu.to_gpu(p)
      r = cm.exp(q)
      return r
    
    
    p = PropagatorS(4, 700*10**-9, 0.002, 0.08)
    print(p)
    $ LD_LIBRARY_PATH=/usr/local/cuda-9.0/lib64 python t32.py
    [[ 0.70264995+0.71153569j  0.84094459+0.54112124j  0.90482706+0.42577928j
       0.92267275+0.385584j  ]
     [ 0.84094459+0.54112124j  0.93873388+0.34464294j  0.97591674+0.21814324j
       0.98456436+0.17502306j]
     [ 0.90482706+0.42577928j  0.97591674+0.21814324j  0.99613363+0.0878512j
       0.99903291+0.04396812j]
     [ 0.92267275+0.385584j    0.98456436+0.17502306j  0.99903291+0.04396812j
       1.00000000+0.j        ]]
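
    For completeness, here is a minimal sketch of method 2, i.e. doing the whole calculation in a PyCUDA kernel so the doubly-nested host loop disappears entirely. The kernel name propagator, the 16x16 block size, and the use of float2 to write into a complex64 gpuarray are my own illustrative choices, not anything PyCUDA requires; treat it as a starting point rather than a drop-in replacement.

    import numpy as np
    import pycuda.autoinit
    import pycuda.gpuarray as gpu
    from pycuda.compiler import SourceModule
    
    # One thread per (ii, jj) element. A complex64 gpuarray stores interleaved
    # (real, imag) float pairs, so the kernel writes float2 values into it.
    mod = SourceModule("""
    __global__ void propagator(float2 *p, int N, float L, float area, float z0)
    {
        int jj = blockIdx.x * blockDim.x + threadIdx.x;
        int ii = blockIdx.y * blockDim.y + threadIdx.y;
        if (ii < N && jj < N) {
            float u = (ii - N/2.0f - 1.0f) / area;
            float v = (jj - N/2.0f - 1.0f) / area;
            float phase = 3.14159265f * L * z0 * (u*u + v*v);
            // exp(i*phase) = cos(phase) + i*sin(phase)
            p[ii*N + jj] = make_float2(cosf(phase), sinf(phase));
        }
    }
    """)
    propagator = mod.get_function("propagator")
    
    def PropagatorS(N, L, area, z0):
      p = gpu.zeros((N, N), dtype=np.complex64)
      block = (16, 16, 1)
      grid = ((N + block[0] - 1) // block[0], (N + block[1] - 1) // block[1], 1)
      propagator(p.gpudata, np.int32(N), np.float32(L), np.float32(area),
                 np.float32(z0), block=block, grid=grid)
      return p
    
    p = PropagatorS(4, 700*10**-9, 0.002, 0.08)
    print(p)

    Like the cumath example above, this works in single precision (complex64); if you need the complex128 accuracy of your original dtype = 'complex_', the kernel would have to use double2, cos and sin instead.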