I'm new to CUDA and am trying to figure out whether PyCUDA (free) or NumbaPro CUDA Python (not free) would be better for me (assuming the library cost is not an issue).
Both seem to require that you use their respective Python dialects. But it seems that PyCUDA requires you to write the kernel function in C code, which would be more cumbersome than using NumbaPro, which seems to do all the hard work for you.
Is this indeed the case? Would there be notable performance differences?
Let's talk about each of these libraries:
PyCUDA:
PyCUDA is a Python programming environment for CUDA. It gives you access to NVIDIA's CUDA parallel computation API from Python. PyCUDA's base layer is written in C++, with a Python layer on top. You write the kernel itself in CUDA C; that code is compiled and executed on the NVIDIA chip, while the Python code compiles it, launches it, and gets the results back. PyCUDA also manages resources automatically, which makes it one of the more powerful CUDA libraries.
PyCUDA is slightly different from PyOpenCL: PyOpenCL can be used to run code on a variety of platforms, including Intel, AMD, NVIDIA, and ATI chips, whereas PyCUDA runs on NVIDIA chips only:
Python + CUDA = PyCUDA
Python + OpenCL = PyOpenCL
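To make the PyCUDA workflow concrete, here is a minimal sketch of a vector addition. The names `KERNEL_SOURCE`, `add_vectors`, `grid_size`, and `pycuda_add` are made up for illustration, and it assumes an NVIDIA GPU, the CUDA toolkit, and non-empty 1-D inputs. Note that the kernel is a CUDA C string, and Python only compiles and launches it:

```python
# CUDA C source for the kernel. PyCUDA compiles this string at runtime.
KERNEL_SOURCE = """
__global__ void add_vectors(float *out, const float *a, const float *b, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        out[i] = a[i] + b[i];
    }
}
"""


def grid_size(n, threads_per_block):
    """Number of blocks needed to cover n elements (ceiling division)."""
    return (n + threads_per_block - 1) // threads_per_block


def pycuda_add(a, b, threads_per_block=256):
    """Add two 1-D arrays on the GPU (hypothetical helper, not part of PyCUDA)."""
    # Imported here because importing pycuda.autoinit creates a CUDA context,
    # which needs a working GPU and driver.
    import numpy as np
    import pycuda.autoinit  # noqa: F401
    import pycuda.driver as drv
    from pycuda.compiler import SourceModule

    a = np.ascontiguousarray(a, dtype=np.float32)
    b = np.ascontiguousarray(b, dtype=np.float32)
    out = np.empty_like(a)
    n = a.size

    # Compile the CUDA C source and look up the kernel by name.
    kernel = SourceModule(KERNEL_SOURCE).get_function("add_vectors")

    # drv.In / drv.Out copy the host arrays to and from the device.
    kernel(
        drv.Out(out), drv.In(a), drv.In(b), np.int32(n),
        block=(threads_per_block, 1, 1),
        grid=(grid_size(n, threads_per_block), 1),
    )
    return out
```

On a machine with a GPU, something like `pycuda_add([1, 2, 3], [10, 20, 30])` should return the element-wise sum as a `float32` array.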
NUMBA/NumbaPro:
Numba: NumbaPro, or more recently Numba (NumbaPro has been deprecated, and its code generation features have been moved into open-source Numba), is an open-source, NumPy-aware optimizing compiler for Python sponsored by Anaconda, Inc. It uses the remarkable LLVM compiler infrastructure to compile Python syntax to machine code. Numba supports compiling Python to run on either CPU or GPU hardware, and it is fundamentally written in Python. It's easy to install and use.
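For comparison, here is a rough sketch of the same vector addition with Numba's CUDA target, where the kernel is ordinary Python instead of a C string. The names `blocks_needed`, `add_vectors`, and `numba_add` are made up for illustration, and it assumes Numba with CUDA support, an NVIDIA GPU, and non-empty 1-D inputs:

```python
def blocks_needed(n, threads_per_block):
    """Number of blocks needed to cover n elements (ceiling division)."""
    return (n + threads_per_block - 1) // threads_per_block


def numba_add(a, b, threads_per_block=256):
    """Add two 1-D arrays on the GPU (hypothetical helper, not part of Numba)."""
    # Imported here so this file can be loaded on a machine without Numba;
    # in a real script these would normally sit at the top of the module.
    import numpy as np
    from numba import cuda

    @cuda.jit
    def add_vectors(out, x, y):
        # cuda.grid(1) is the global index of this thread in a 1-D launch.
        i = cuda.grid(1)
        if i < out.shape[0]:
            out[i] = x[i] + y[i]

    a = np.ascontiguousarray(a, dtype=np.float32)
    b = np.ascontiguousarray(b, dtype=np.float32)

    # Explicit host-to-device copies, plus an uninitialized output buffer.
    d_a = cuda.to_device(a)
    d_b = cuda.to_device(b)
    d_out = cuda.device_array_like(a)

    # Launch configuration goes in square brackets: [blocks, threads].
    blocks = blocks_needed(a.size, threads_per_block)
    add_vectors[blocks, threads_per_block](d_out, d_a, d_b)

    return d_out.copy_to_host()
```

The kernel is compiled the first time it is launched, so the first call includes compilation time and should be left out of any timing comparison.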
As @Wang has mentioned, PyCUDA is faster than Numba.