
Call graphs for CUDA


I am trying to generate call graphs for some CUDA code with egypt, but the usual approach doesn't work, since nvcc has no flag equivalent to gcc's -fdump-rtl-expand.
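
For reference, the usual egypt workflow on a plain C/C++ project looks like this (a sketch; the exact dump-file name varies with the gcc version, hence the glob):

    gcc -c -fdump-rtl-expand main.c
    egypt *.expand | dot -Tpng -o callgraph.png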

More details:

I have a really large code base (of which I am not the author) that spans multiple .cu files, and it would be easier for me to understand what it is doing if I had a call graph.

I bet that an answer to this question would be of use to other people as well.

Any ideas on how this can be done with CUDA (.cu) files?


Solution

  • You can do this with the CUDA support of clang 3.8.

    First, compile your CUDA code to emit LLVM IR (example on Windows with CUDA 7.5 installed):

    clang++ -S -emit-llvm main.cu --cuda-gpu-arch=sm_35 -o main.ll -I"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v7.5\include"
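
    Clang's CUDA mode compiles a host and a device module in one invocation; if it refuses to write a single output file, pin down one of the two (the call graph below comes from the host module; --cuda-device-only would give you the device IR instead):

    clang++ -S -emit-llvm --cuda-host-only main.cu --cuda-gpu-arch=sm_35 -o main.ll -I"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v7.5\include"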
    

    Then use the generated .ll file to build the call graph with opt:

    opt.exe main.ll -analyze -dot-callgraph
    

    Note that opt is not part of the default LLVM binary distribution, so you may need to build it yourself (I had a 3.7.1 build, and it was able to handle the .ll produced by 3.8).
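
    Since the project in the question spans multiple .cu files, you can compile each one to its own .ll as above and merge them with llvm-link before running opt, so that cross-file calls show up in a single graph (a sketch with placeholder file names):

    llvm-link -S a.ll b.ll -o combined.ll
    opt combined.ll -analyze -dot-callgraph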

    Example main.cu file:

    #include <cuda_runtime.h>
    __device__ int f() { return 1; }
    __device__ float g(float* a) { return a[f()]; }
    __device__ float h() { return 42.0f; }
    __global__ void kernel(int a, float* b)
    {
        int c = a + f();
        g(b);
        b[c] = h();
    }
    

    Generated dot file:

    digraph "Call graph" {
            label="Call graph";
    
            Node0x1e3d438 [shape=record,label="{external node}"];
            Node0x1e3d438 -> Node0x1e3cfb0;
            Node0x1e3d438 -> Node0x1e3ce48;
            Node0x1e3d438 -> Node0x1e3d0a0;
            Node0x1e3d438 -> Node0x1e3d258;
            Node0x1e3d438 -> Node0x1e3cfd8;
            Node0x1e3d438 -> Node0x1e3ce98;
            Node0x1e3d438 -> Node0x1e3d000;
            Node0x1e3d438 -> Node0x1e3cee8;
            Node0x1e3d438 -> Node0x1e3d078;
            Node0x1e3d000 [shape=record,label="{__cuda_module_ctor}"];
            Node0x1e3d000 -> Node0x1e3ce98;
            Node0x1e3d000 -> Node0x1e3d168;
            Node0x1e3d078 [shape=record,label="{__cuda_module_dtor}"];
            Node0x1e3d078 -> Node0x1e3cee8;
            Node0x1e3cfb0 [shape=record,label="{^A?f@@YAHXZ}"];
            Node0x1e3d0a0 [shape=record,label="{^A?h@@YAMXZ}"];
            Node0x1e3ce48 [shape=record,label="{^A?g@@YAMPEAM@Z}"];
            Node0x1e3ce48 -> Node0x1e3cfb0;
            Node0x1e3d258 [shape=record,label="{^A?kernel@@YAXHPEAM@Z}"];
            Node0x1e3d258 -> Node0x1e3cfb0;
            Node0x1e3d258 -> Node0x1e3ce48;
            Node0x1e3d258 -> Node0x1e3d0a0;
            Node0x1e3d168 [shape=record,label="{__cuda_register_kernels}"];
            Node0x1e3cee8 [shape=record,label="{__cudaUnregisterFatBinary}"];
            Node0x1e3cee8 -> Node0x1e3d528;
            Node0x1e3cfd8 [shape=record,label="{__cudaRegisterFunction}"];
            Node0x1e3cfd8 -> Node0x1e3d528;
            Node0x1e3ce98 [shape=record,label="{__cudaRegisterFatBinary}"];
            Node0x1e3ce98 -> Node0x1e3d528;
    }
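
    The labels are the raw mangled names (MSVC mangling here, hence the ?f@@YAHXZ form; the ^A is the \01 prefix LLVM uses for literal symbol names). To render the graph, feed the dot file that opt writes (callgraph.dot in this version) to Graphviz:

    dot -Tpng callgraph.dot -o callgraph.png

    On a Linux build the names are Itanium-mangled (_Z1fv and so on) and can be demangled before rendering by piping the file through c++filt:

    c++filt < callgraph.dot | dot -Tpng -o callgraph.png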