What is the most precise way to measure the GPU memory usage of an application that uses OpenACC with Managed Memory? I used two methods to do so. The first is
nvidia-smi
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 510.85.02 Driver Version: 510.85.02 CUDA Version: 11.6 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|===============================+======================+======================|
| 0 Tesla v100 ... Off | 00000000:01:00.0 Off | N/A |
| N/A 51C P5 11W / N/A | 10322MiB / 16160MiB | 65% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=============================================================================|
| 0 N/A N/A 2670 G ./myapp 398MiB |
+-----------------------------------------------------------------------------+
Regarding the output above: what is the difference between the overall memory usage (10322MiB / 16160MiB) and the per-process usage listed below it (./myapp 398MiB)?
The second method I used is:
#include <openacc.h>

// Running peak of (total - free) device memory, sampled over the run.
static size_t max_mem_usage = 0;

void measure_acc_mem_usage() {
    auto dev_ty = acc_get_device_type();
    auto dev_mem = acc_get_property(0, dev_ty, acc_property_memory);
    auto dev_free_mem = acc_get_property(0, dev_ty, acc_property_free_memory);
    auto mem = dev_mem - dev_free_mem;
    if (mem > max_mem_usage)
        max_mem_usage = mem;
}
This is a function I call many times during program execution.
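For illustration, here is a minimal sketch of how I drive that helper (names and structure simplified; it assumes the definitions above sit earlier in the same file, and that the code is built with something like nvc++ -acc -gpu=managed):

#include <cstdio>

// Assumes measure_acc_mem_usage() and max_mem_usage from the snippet above
// are defined earlier in this translation unit.
int main() {
    const int n = 1 << 24;
    double *a = new double[n];   // managed automatically with -gpu=managed

    #pragma acc parallel loop
    for (int i = 0; i < n; ++i)
        a[i] = 2.0 * i;

    measure_acc_mem_usage();     // sample after each compute region of interest

    std::printf("peak device memory use: %.1f MiB\n",
                max_mem_usage / (1024.0 * 1024.0));
    delete[] a;
    return 0;
}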
Neither of these methods seems to reflect the actual behaviour of the device (I base this on when saturation appears to occur, i.e. when the application starts running very slowly as the problem size increases), and they report very different values: for example, where the second method indicates 2 GB of memory usage, nvidia-smi reports 16 GB.
Not sure you'll be able to get a precise value of memory usage when using CUDA Unified Memory (aka managed). The nvidia-smi utility only shows cudaMalloc-allocated memory, and the OpenACC property function uses cudaMemGetInfo, which isn't accurate for UM.
Bob gives a good explanation as to why here: CUDA unified memory pages accessed in CPU but not evicted from GPU
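To illustrate the point, here is a rough, hypothetical sketch using the CUDA runtime API (assumptions: device 0, a Pascal-or-newer GPU, built with nvcc): with a managed allocation, the value from cudaMemGetInfo typically only moves once pages actually become resident on the device, e.g. after a prefetch, so it can lag or understate real usage.

#include <cstdio>
#include <cuda_runtime.h>

static void report(const char *when) {
    size_t free_b = 0, total_b = 0;
    cudaMemGetInfo(&free_b, &total_b);
    std::printf("%-30s used = %7.1f MiB\n", when,
                (total_b - free_b) / (1024.0 * 1024.0));
}

int main() {
    const size_t bytes = size_t(1) << 30;   // 1 GiB managed allocation
    double *buf = nullptr;

    report("before cudaMallocManaged:");
    cudaMallocManaged(&buf, bytes);
    report("after cudaMallocManaged:");     // often barely changes: pages not resident yet

    // Force the pages onto device 0; now cudaMemGetInfo reflects them.
    cudaMemPrefetchAsync(buf, bytes, 0);
    cudaDeviceSynchronize();
    report("after prefetch to device:");

    cudaFree(buf);
    return 0;
}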