Is there any option to profile a unified virtual memory CUDA application with Nsight Compute (NCU)? For example, I want to know the time spent handling page faults and migrations.
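I eventually figured out a solution myself. Note that it uses Nsight Systems (nsys) rather than Nsight Compute. The two --cuda-um-*-page-faults options turn on tracing of unified memory page faults on the GPU and the CPU, and --export=json
writes the profiling result to a JSON file, which contains the detailed page fault records.
The overall profiling command looks like this: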
nsys profile \
--force-overwrite=true \
--cuda-um-gpu-page-faults=true \
--cuda-um-cpu-page-faults=true \
--export=json \
./yourapplication
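
Once the JSON file is written, a small script can give a first look at what it contains. The following is only a rough sketch, not part of the nsys tooling: it assumes the export is either a single JSON document or one JSON object per line, and that page fault records carry a key whose name contains "pagefault" (case-insensitive). Both are assumptions, so check your own export for the actual field names. The function names `iter_records` and `count_keys_matching` are made up for this example.

```python
import json
from collections import Counter


def iter_records(text):
    """Yield JSON objects from text that is one JSON document or JSON Lines."""
    try:
        doc = json.loads(text)
    except json.JSONDecodeError:
        # Not a single document: fall back to one JSON object per line.
        for line in text.splitlines():
            line = line.strip()
            if line:
                yield json.loads(line)
        return
    if isinstance(doc, list):
        yield from doc
    else:
        yield doc


def count_keys_matching(text, needle="pagefault"):
    """Count keys at any nesting depth whose lowercased name contains needle.

    The default needle is an assumption about how page fault records are
    named in the export; adjust it after inspecting your own file.
    """
    counts = Counter()

    def walk(node):
        if isinstance(node, dict):
            for key, value in node.items():
                if needle in key.lower():
                    counts[key] += 1
                walk(value)
        elif isinstance(node, list):
            for item in node:
                walk(item)

    for record in iter_records(text):
        walk(record)
    return counts
```

To try it, read the exported file (whatever name nsys gave it) into a string and call `count_keys_matching(text)`. The result maps each matching key name to the number of times it appears, which is enough to confirm that page fault events were captured before digging into timestamps and durations.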