
When should NVRTC compilation produce a CUBIN?


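If I understand the workflow description in the NVRTC documentation correctly, here's how it works:

  • Compile the CUDA C++ source with NVRTC, which gives you PTX.
  • Link the PTX using the CUDA driver API, which gives you a cubin (machine code for a specific GPU).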

However, beginning with CUDA 11.3, NVRTC offers the following API call:

nvrtcResult nvrtcGetCUBIN ( nvrtcProgram prog, char* cubin );

So how can I get a cubin from compilation alone, without a link step?


Solution

  • Well, on the host side you get proper machine code from compilation alone, so why not on the device side?

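    It seems that cubin availability depends on what you targeted with your compilation:

      • If you target a virtual architecture (e.g. --gpu-architecture=compute_75), NVRTC produces PTX only, and nvrtcGetCUBINSize() reports a size of 0.
      • If you target a real architecture (e.g. --gpu-architecture=sm_75), NVRTC generates a cubin, which you can retrieve with nvrtcGetCUBIN().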

    Now, when you link using the CUDA driver, a context is involved, and a context is always associated with a physical GPU, i.e. a concrete micro-architecture. So linking necessarily gives you a cubin.

    PS:

    1. Other switches could also affect the availability of cubin output, e.g. --dlink-time-opt.
    2. Before CUDA 11.3, there was no nvrtcGetCUBIN() at all. This also seems to have affected how modules are created, i.e. whether you create a module from the PTX or from the cubin.