I'm working on a CUDA application that I'd like to profile. So far I've only used the command-line profiler, nvprof, which just displays summarized statistics.
I considered using the GUI profiler, NVVP. The problem is that the remote Linux node I'm running the application on has no GUI stack at all (not even X.org). Moreover, even if I managed to get an X11 stack onto the remote node, keeping my laptop connected for the whole profiling run would be tedious.
I tried collecting all the needed information in the following way:
nvprof --analysis-metrics -o application.nvprof ./myapplication
Then I copy the output file onto my laptop and view it in NVVP. This has three problems, though.
First of all, I don't get any file transfer information when I load the output file into NVVP. It's not shown at all in the NVVP window.
Secondly, the call graph is completely distorted. The gaps between kernel launches are at least 100x bigger than the kernel durations, which makes any dependency and flow analysis impossible.
Lastly, my application uses a lot of GPU memory. During profiling the device runs out of memory, which does not happen in a standalone run.
How should I properly profile my CUDA application on a headless node?
NVVP supports headless nodes as a first-class citizen. Remote profiling is a major feature of NVVP.
The way this works is that NVVP runs on your local GUI-enabled host machine and invokes nvprof on the headless machine, generates the required files there, copies the files over, and opens them. All of this happens transparently and automatically. You can run further analyses from NVVP as usual and it will repeat these steps for you.
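For illustration only, the steps NVVP automates can be approximated by hand as in the sketch below. This is not what NVVP runs internally. The host name `user@headless-node`, the directory `/path/to/app`, and the helper names `run` and `profile_remote` are hypothetical placeholders; the nvprof command line is the one from the question. The script defaults to a dry run that only prints the commands.

```shell
#!/bin/sh
# Manual approximation of remote profiling: run nvprof over SSH, then copy
# the result back. All values below are hypothetical placeholders.
REMOTE_HOST="${REMOTE_HOST:-user@headless-node}"
REMOTE_DIR="${REMOTE_DIR:-/path/to/app}"
OUTPUT="${OUTPUT:-application.nvprof}"
DRY_RUN="${DRY_RUN:-1}"   # 1 = only print the commands, 0 = execute them

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

profile_remote() {
    # Step 1: profile on the headless node (same nvprof flags as in the question).
    run ssh "$REMOTE_HOST" "cd $REMOTE_DIR && nvprof --analysis-metrics -o $OUTPUT ./myapplication"
    # Step 2: copy the profile to the local machine, where NVVP can import it.
    run scp "$REMOTE_HOST:$REMOTE_DIR/$OUTPUT" .
}

profile_remote
```

Setting `DRY_RUN=0` would execute the commands instead of printing them, assuming SSH access to the node and nvprof on the remote `PATH`.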
To use remote profiling, open NVVP, then choose File->New Session. Add a Connection instead of using Local, putting in the details of the headless machine. Click on Manage... to point NVVP to the toolkit path on the remote machine. Once this one-time setup is done, enter the path to the executable and run as usual.
You can read about remote profiling in the relevant documentation.