I'm running nvprof to profile GPU usage of a TensorRT server-client model. Here's what I'm doing:
1. Run nvprof on terminal 1 inside a Docker container with TensorRT enabled: nvprof --profile-all-processes -o results%p.nvvp
2. Run the TensorRT server on terminal 2 inside the same Docker container as in step 1.
3. Send a request from a client on terminal 3, inside a different Docker container from the one used in the first two steps.
When the third step finishes, the client exits normally, but the server and nvprof keep running. So, naturally, I closed the TensorRT server with Ctrl-C. When I do this, nvprof on terminal 1 reports that the application has had an internal profiling error, and the resulting output file contains no timeline information. (It is only about 380 KB, whereas files from other runs of about the same duration, 2-3 minutes, are at least a few MB.)
Ending the TensorRT server with Ctrl-C seemed to be the problem, so I tried giving nvprof a timeout option in the first step, namely nvprof --profile-all-processes -o results%p.nvvp --timeout 200 (200 seconds is more than enough for the whole process to finish). But while this does make nvprof print the message "Execution timeout, stopping the application...", it does not actually stop the TensorRT server.
Basically, I'd like to know whether there is any way to make a running TensorRT server exit normally without using Ctrl-C, or whether there is a workaround for this issue when using nvprof and TensorRT together.
Any help or push in the right direction would be greatly appreciated. Thanks!
P.S. Original question was posted here about 3 hours ago.
So it turns out TensorRT was not the problem. When creating and first running the Docker container for the server, I had not added the privileged option.
Running the Docker container with docker run --rm -it -d --gpus all --privileged ... lets nvprof profile the server's behavior even when the server program is killed with Ctrl-C.
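To check whether an already-running container was started with the privileged option before profiling, something like the following may help. It is only a minimal sketch: the helper name is_privileged and the container name trt-server are placeholders of mine, not anything from the setup above, and it relies on docker inspect reporting the HostConfig.Privileged field.

```shell
#!/bin/sh
# Succeeds (exit status 0) if the given container was started with --privileged.
# Usage: is_privileged <container-name-or-id>
# NOTE: is_privileged is a hypothetical helper name, not part of docker or nvprof.
is_privileged() {
    # docker inspect prints "true" or "false" for HostConfig.Privileged.
    [ "$(docker inspect --format '{{.HostConfig.Privileged}}' "$1")" = "true" ]
}

# Example usage ("trt-server" is a placeholder container name):
#   if is_privileged trt-server; then
#       echo "container is privileged"
#   else
#       echo "container is not privileged; recreate it with --privileged"
#   fi
```

Since the privileged setting is fixed when the container is created, a container that fails this check has to be recreated with the flag rather than changed in place.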