cudanvprof

nvprof is crashing as it writes a very large file to /tmp/ and runs out of disk space


How do I work-around an nvprof crash that occurs when running on a disk with a relatively small amount of space available?

Specifically, when profiling my cuda kernel, I use the following two commands:

# Generate the timeline
nvprof -f -o ~/myproj/profiling/timeline-`date -I`.out ~/myproj/build/myexe
# Generate profiling data
nvprof -f --kernels ::mykernel:1 --analysis-metrics -o ~/myproj/profiling/analysis-metrics-`date -I`.out ~/myproj/build/myexe

The first nvprof command works fine. The second nvprof needs to write a 12GB temporary file to /tmp before it can proceed. Since my 38GB cloud disk only has 6 GB available, nvprof crashes. Assuming I can't free up more disk space, how do I work around this problem?

Side note: It's mostly irrelevant to diagnosing the problem, but nvprof is reporting an Error: Application received signal 7, which is a "Bus error (bad memory access)" (see http://man7.org/linux/man-pages/man7/signal.7.html for more info).


Solution

  • One can direct nvprof to use a different temporary directory by setting the TMPDIR environment variable. This is helpful, because since Linux kernel 2.6, there's a decent chance that you have a RAM disk available at /dev/shm (see https://superuser.com/a/45509/363816 for more info). Thus, adding the following at the beginning of one's [bash] script will likely work-around your issue.

    export TMPDIR=/dev/shm