docker, tensorrt, nvidia-docker, nvprof, nvvp

How to stop a running TensorRT server without using Ctrl-C (for profiling with nvprof)


I'm running nvprof to profile GPU usage of a TensorRT server-client model. Here's what I'm doing:

  1. Run nvprof on terminal 1, inside a Docker container with TensorRT enabled: nvprof --profile-all-processes -o results%p.nvvp

  2. Run the TensorRT server on terminal 2, inside the same Docker container as in the first step.

  3. Request a service on terminal 3, inside a different Docker container from the one used in the first two steps.

When the third step finishes, the client exits normally, but the server and nvprof keep running. So, naturally, I closed the TensorRT server with Ctrl-C. When I do this, nvprof on terminal 1 reports that the application has had an internal profiling error, and the resulting output file contains no timeline information. (It is only about 380 KB, whereas files from other runs of about the same duration, 2-3 minutes, are at least a few MB.)

It seemed that ending the TensorRT server with Ctrl-C was the problem, so I tried giving nvprof a timeout option in the first step: nvprof --profile-all-processes -o results%p.nvvp --timeout 200 (200 seconds is more than enough for the whole process to finish). This does make nvprof print the message "Execution timeout, stopping the application...", but it does not actually stop the TensorRT server.

Basically, I'd like to know whether there is a way to make a running TensorRT server exit normally without using Ctrl-C, or whether there is a workaround for using nvprof and TensorRT together.
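For illustration, one way to end a foreground command after a fixed time without pressing Ctrl-C is the GNU coreutils timeout utility, which can deliver SIGINT (the signal Ctrl-C sends) when the time is up. This is only a sketch: SERVER_CMD is a hypothetical placeholder, and sleep stands in for the actual server command, which is not given in this post. It only automates the shutdown; it is not claimed to change how nvprof reacts to the signal.

```shell
#!/usr/bin/env bash
# Sketch: stop a long-running command after a fixed time by sending it SIGINT,
# the same signal that Ctrl-C delivers from a terminal.
# SERVER_CMD is a hypothetical placeholder; "sleep 5" stands in for the server.
SERVER_CMD=(sleep 5)

# timeout (GNU coreutils) sends the signal chosen with -s once the duration
# (here 1 second) expires, and exits with status 124 if the command timed out.
timeout -s INT 1 "${SERVER_CMD[@]}"
status=$?

echo "server stand-in exited with status $status"
```

In a real run, the duration would be chosen to be longer than the client's request, in the same spirit as the 200-second value used with nvprof above.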

Any help or push in the right direction would be greatly appreciated. Thanks!

P.S. Original question was posted here about 3 hours ago.


Solution

  • So it turns out TensorRT was not the problem. When I created and first ran the Docker container for the server, I had not added the --privileged option.

    Running the Docker container with docker run --rm -it -d --gpus all --privileged ... lets nvprof profile the server's behavior even when the server program is killed with Ctrl-C.
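To check whether an already running container was started with that option, something like the following may help. It is a minimal sketch: is_privileged is a hypothetical helper name, and the container name in the usage comment is a placeholder, not one taken from this post. It relies on docker inspect reporting the HostConfig.Privileged field as true or false.

```shell
#!/usr/bin/env bash
# Report whether a container was started with --privileged.
# The container name is supplied by the caller (hypothetical placeholder).
is_privileged() {
  local container="$1"
  # docker inspect prints "true" or "false" for HostConfig.Privileged.
  [ "$(docker inspect --format '{{.HostConfig.Privileged}}' "$container")" = "true" ]
}

# Usage (assumes a running container named trt-server exists):
#   is_privileged trt-server && echo "privileged" || echo "not privileged"
```

If the check reports that the container is not privileged, the container has to be recreated with the flag, since --privileged is set at docker run time.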