I'm writing CUDA code, compiling it with nvcc in VS2022, generating a PTX file, and running the CUDA code from Embarcadero Delphi. To run the CUDA kernels from Delphi I have written an API to nvcuda.dll, which has been working very well. For example, I use functions like cuInit, cuMemAlloc, cuLaunchKernel, cuMemcpyDtoH_v2, cuMemcpyHtoD_v2 without any problem, all according to the CUDA driver API.
However, I have not been able to find cudaDeviceSynchronize() in nvcuda.dll (or libcuda.so). Although cudaDeviceSynchronize() is present in most CUDA demo programs to be compiled by nvcc, it does not seem to exist in the DLL.
How can I make the CPU wait for a CUDA kernel using the driver API (i.e. through the DLL, not a C program compiled by nvcc)?
… use functions like cuInit, cuMemAlloc, cuLaunchKernel, cuMemcpyDtoH_v2, cuMemcpyHtoD_v2 without any problem, all according to the CUDA Runtime API
Those are not runtime API functions; they are driver API functions. You find them in nvcuda.dll because that library is the driver API provider on Windows.
The reason you can't find cudaDeviceSynchronize is that it is a runtime API function. If you are actually using the driver API, the equivalent function is cuCtxSynchronize.
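As a rough illustration of where the call goes, here is a minimal sketch that binds to the driver library directly, much as a Delphi wrapper around nvcuda.dll would. It is written in Python with ctypes only because that makes the dynamic binding easy to show. The helper names (load_driver, check, wait_for_gpu) are made up for this example, and the library name libcuda.so.1 on Linux is an assumption about a typical install. The driver entry points themselves (cuInit, cuDeviceGet, cuCtxCreate_v2, cuCtxSynchronize, cuCtxDestroy_v2) are the exported driver API names.

```python
import ctypes
import sys


def load_driver():
    """Hypothetical helper: load the CUDA driver library, or return None if absent."""
    try:
        if sys.platform == "win32":
            # nvcuda.dll is the driver API provider on Windows.
            return ctypes.WinDLL("nvcuda.dll")
        # Assumed library name for a typical Linux driver install.
        return ctypes.CDLL("libcuda.so.1")
    except OSError:
        return None


def check(result, what):
    """Hypothetical helper: raise if a driver call did not return CUDA_SUCCESS (0)."""
    if result != 0:
        raise RuntimeError(f"{what} failed with CUresult {result}")


def wait_for_gpu(cuda):
    """Create a context on device 0, then block until all its work has finished."""
    check(cuda.cuInit(0), "cuInit")

    device = ctypes.c_int()
    check(cuda.cuDeviceGet(ctypes.byref(device), 0), "cuDeviceGet")

    context = ctypes.c_void_p()
    check(cuda.cuCtxCreate_v2(ctypes.byref(context), 0, device), "cuCtxCreate_v2")
    try:
        # ... cuModuleLoad / cuLaunchKernel calls would go here ...

        # Driver API counterpart of cudaDeviceSynchronize(): the CPU blocks
        # here until all preceding work in the current context is complete.
        check(cuda.cuCtxSynchronize(), "cuCtxSynchronize")
    finally:
        cuda.cuCtxDestroy_v2(context)
```

On a machine with an NVIDIA driver installed, wait_for_gpu(load_driver()) should return once the GPU is idle. In Delphi the same idea applies: declare cuCtxSynchronize as a parameterless function returning an integer result code, and call it after cuLaunchKernel.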